Contents
- 1 Building an Intelligent Question-Answering System with Tavily Search API, Chroma, Google Gemini LLMs, and LangChain
- 1.1 What This Means for You
- 1.2 Original Post
- 1.3 Code Summary and Explanation
- 1.3.1 Imports
- 1.3.2 API Key Handling and Logging Setup
- 1.3.3 Component Imports
- 1.3.4 Exception Classes and Utility Functions
- 1.3.5 Search Query Parser
- 1.3.6 EnhancedTavilyRetriever
- 1.3.7 SearchCache
- 1.3.8 Component Initialization
- 1.3.9 Prompt Templates and Instantiations
- 1.3.10 Getting the LLM, Output Parser, and Helper Functions
- 1.3.11 Advanced Chain and Additional Functions
- 1.4 Key Terms
Building an Intelligent Question-Answering System with Tavily Search API, Chroma, Google Gemini LLMs, and LangChain
In this tutorial, we demonstrate how to create a powerful and intelligent question-answering system by combining the strengths of Tavily Search API, Chroma, Google Gemini LLMs, and the LangChain framework. The pipeline leverages real-time web search, semantic document caching, and contextual response generation to provide accurate and relevant answers to user queries.
What This Means for You
- Leverage real-time web search and semantic document caching to build an intelligent QA system.
- Combine the strengths of Tavily Search API, Chroma, Google Gemini LLMs, and LangChain to create a powerful and flexible pipeline.
- Implement advanced features like domain-specific filtering, query analysis, and semantic vector caching for improved search and response quality.
- Learn how to integrate and orchestrate different components to enable efficient and context-aware conversational AI.
Original Post
…
Code Summary and Explanation
The tutorial code is summarized below, section by section, with brief explanations:
Imports
import os
import getpass
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import json
import logging
import time
from typing import List, Dict, Any, Optional
from datetime import datetime
API Key Handling and Logging Setup
if "TAVILY_API_KEY" not in os.environ:
    os.environ["TAVILY_API_KEY"] = getpass.getpass("Enter Tavily API key: ")
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter Google API key: ")
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)
Component Imports
from langchain_community.retrievers import TavilySearchAPIRetriever
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.memory import ConversationBufferMemory
Exception Classes and Utility Functions
class SearchQueryError(Exception):
    """Exception raised for errors in the search query."""
    pass

def format_docs(docs):
    formatted_content = []
    for i, doc in enumerate(docs):
        metadata = doc.metadata
        source = metadata.get('source', 'Unknown source')
        title = metadata.get('title', 'Untitled')
        score = metadata.get('score', 0)
        formatted_content.append(
            f"Document {i+1} [Score: {score:.2f}]:\n"
            f"Title: {title}\n"
            f"Source: {source}\n"
            f"Content: {doc.page_content}\n"
        )
    return "\n\n".join(formatted_content)
Search Query Parser
class SearchResultsParser:
    def parse(self, text):
        try:
            if isinstance(text, str):
                import re
                import json
                json_match = re.search(r'{.*}', text, re.DOTALL)
                if json_match:
                    json_str = json_match.group(0)
                    return json.loads(json_str)
                return {"answer": text, "sources": [], "confidence": 0.5}
            elif hasattr(text, 'content'):
                return {"answer": text.content, "sources": [], "confidence": 0.5}
            else:
                return {"answer": str(text), "sources": [], "confidence": 0.5}
        except Exception as e:
            logger.warning(f"Failed to parse JSON: {e}")
            return {"answer": str(text), "sources": [], "confidence": 0.5}
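The JSON-extraction step of the parser can be seen in isolation. A minimal, self-contained example (the sample model reply here is invented for illustration) of pulling the first JSON object out of free-form LLM text:

```python
import re
import json

# Hypothetical model reply that wraps a JSON object in prose
text = 'Here is the result: {"answer": "2017", "sources": [], "confidence": 0.9} Done.'

# re.DOTALL lets '.' span newlines inside the JSON object
match = re.search(r'{.*}', text, re.DOTALL)
parsed = json.loads(match.group(0)) if match else {"answer": text}
```

Note that the greedy `{.*}` pattern matches from the first `{` to the last `}` in the text, so it can produce invalid JSON when a reply contains several separate objects; that is why the parser wraps the extraction in a try/except and falls back to returning the raw text.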
EnhancedTavilyRetriever
class EnhancedTavilyRetriever:
...
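The class body is elided in this excerpt. As a rough sketch only (the injectable `base_retriever` parameter is an assumption made here so the class can run without an API key, not the tutorial's exact code), the wrapper would delegate searches to a `TavilySearchAPIRetriever` while recording per-query metrics for the `get_search_history()` calls used later:

```python
import time
from typing import Any, Dict, List, Optional

class EnhancedTavilyRetriever:
    """Sketch of a retriever wrapper that records per-query metrics.

    In the tutorial, the underlying retriever would be a
    TavilySearchAPIRetriever built from these settings; it is left
    injectable here so the class can be exercised without an API key.
    """

    def __init__(self, max_results: int = 5, search_depth: str = "basic",
                 include_domains: Optional[List[str]] = None,
                 exclude_domains: Optional[List[str]] = None,
                 base_retriever: Any = None):
        self.max_results = max_results
        self.search_depth = search_depth
        self.include_domains = include_domains or []
        self.exclude_domains = exclude_domains or []
        self.base_retriever = base_retriever
        self.search_history: List[Dict[str, Any]] = []

    def invoke(self, query: str) -> List[Any]:
        # Time the underlying search and append an entry to the history
        start = time.time()
        docs = self.base_retriever.invoke(query)
        self.search_history.append({
            "query": query,
            "num_results": len(docs),
            "response_time": time.time() - start,
        })
        return docs

    def get_search_history(self) -> List[Dict[str, Any]]:
        return list(self.search_history)
```

The history entries match the keys (`query`, `num_results`, `response_time`) that the reporting loop at the end of the tutorial reads.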
SearchCache
class SearchCache:
...
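This class body is also elided. The tutorial describes it as a semantic cache backed by a Chroma vector store, so similar queries can reuse earlier results; the sketch below shows only the shape of such a cache, using exact query matching plus a time-to-live policy to stay self-contained (the `get`/`put` method names and TTL behavior are assumptions, not the tutorial's code):

```python
import time
from typing import Any, Dict, List, Optional, Tuple

class SearchCache:
    """Sketch of a search-result cache with a time-to-live policy.

    The tutorial's version backs this with a Chroma vector store and
    embeddings so semantically similar queries can hit the cache; this
    sketch uses exact query matching to keep the example runnable.
    """

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[List[Any], float]] = {}

    def put(self, query: str, docs: List[Any]) -> None:
        self._store[query] = (docs, time.time())

    def get(self, query: str) -> Optional[List[Any]]:
        entry = self._store.get(query)
        if entry is None:
            return None
        docs, stored_at = entry
        if time.time() - stored_at > self.ttl_seconds:
            # Entry expired: evict it and report a cache miss
            del self._store[query]
            return None
        return docs
```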
Component Initialization
search_cache = SearchCache()
enhanced_retriever = EnhancedTavilyRetriever(max_results=5)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
Prompt Templates and Instantiations
system_template = """You are a research assistant that provides accurate answers based on the search results provided.
Follow these guidelines:
1. Only use the context provided to answer the question
2. If the context doesn't contain the answer, say "I don't have sufficient information to answer this question."
3. Cite your sources by referencing the document numbers
4. Don't make up information
5. Keep the answer concise but complete
Context: {context}
Chat History: {chat_history}
"""
system_message = SystemMessagePromptTemplate.from_template(system_template)
human_template = "Question: {question}"
human_message = HumanMessagePromptTemplate.from_template(human_template)
prompt = ChatPromptTemplate.from_messages([system_message, human_message])
Getting the LLM, Output Parser, and Helper Functions
def get_llm(model_name="gemini-2.0-flash-lite", temperature=0.2, response_mode="json"):
    ...

output_parser = SearchResultsParser()

def retrieve_with_fallback(query):
    ...

def summarize_documents(documents, query):
    ...
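The body of `retrieve_with_fallback` is elided above. A plausible sketch of its cache-first logic (the tutorial's version presumably closes over the module-level `enhanced_retriever` and `search_cache`; they are explicit parameters here, which is an assumption made so the function can be tested in isolation):

```python
from typing import Any, List

def retrieve_with_fallback(query: str, retriever: Any, cache: Any) -> List[Any]:
    """Sketch: serve results from the cache when possible, else search live."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    try:
        docs = retriever.invoke(query)
        cache.put(query, docs)  # store fresh results for later queries
        return docs
    except Exception:
        # Live search failed and nothing was cached: return no documents
        return []
```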
Advanced Chain and Additional Functions
def advanced_chain(query_engine="enhanced", model="gemini-1.5-pro", include_history=True):
    ...

def analyze_query(query):
    ...
print("Advanced Tavily-Gemini Implementation")
print("="*50)
query = "what year was breath of the wild released and what was its reception?"
print(f"Query: {query}")
try:
    ...
except Exception as e:
    print(f"Error in search: {e}")

history = enhanced_retriever.get_search_history()
print("\nSearch History:")
for i, h in enumerate(history):
    print(f"{i+1}. Query: {h['query']} - Results: {h['num_results']} - Time: {h['response_time']:.2f}s")

print("\nAdvanced search with domain filtering:")
specialized_retriever = EnhancedTavilyRetriever(
    max_results=3,
    search_depth="advanced",
    include_domains=["nintendo.com", "zelda.com"],
    exclude_domains=["reddit.com", "twitter.com"]
)
try:
    ...
except Exception as e:
    print(f"Error in specialized search: {e}")

print("\nSearch Metrics:")
plot_search_metrics(history)
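`plot_search_metrics` is called above but not defined in this excerpt. A minimal sketch, assuming each history entry carries the `response_time` and `num_results` keys read by the reporting loop earlier (the figure layout is an assumption, not the tutorial's plot):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

def plot_search_metrics(history):
    """Sketch: plot response time and result count for each recorded query."""
    if not history:
        print("No search history to plot")
        return None
    queries = list(range(1, len(history) + 1))
    times = [h["response_time"] for h in history]
    counts = [h["num_results"] for h in history]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(queries, times, marker="o")       # latency per query
    ax1.set_xlabel("Query #")
    ax1.set_ylabel("Response time (s)")
    ax2.bar(queries, counts)                   # result volume per query
    ax2.set_xlabel("Query #")
    ax2.set_ylabel("Results returned")
    fig.tight_layout()
    return fig
```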
Key Terms
- Conversational AI
- Question-Answering System
- Real-time Web Search
- Semantic Document Caching
- Tavily Search API
- Chroma
- Google Gemini LLMs
- LangChain framework
- Advanced prompt engineering
- Sentiment and entity analysis
- Dynamic vector store updates