Chatbot with web and proprietary data retrieval using LangChain
Before You Get Started
To use the You.com Search API, you will need an API key. Please visit api.you.com for more details.
Creating a chatbot that retrieves data from the web and a proprietary data source
Follow this guide to create basic chatbots in LangChain that retrieve context from proprietary data sources and the web using the You.com API. You will need to have the environment variables YDC_API_KEY and OPENAI_API_KEY set up to follow this guide.
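Before running anything else, it can help to confirm that both keys are visible to Python. The following is a minimal sketch using only the standard library; the helper name missing_api_keys is hypothetical and not part of LangChain or the You.com API.

```python
import os

# The two environment variables this guide relies on.
REQUIRED_KEYS = ("YDC_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None):
    """Return the required keys that are unset or empty in `env`.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling missing_api_keys() with no arguments checks the current process; an empty list means both keys are set. If you keep the keys in a .env file, python-dotenv (part of this guide's package list) can load them with load_dotenv() before the check.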
Installation
These are the packages and associated versions that need to be installed to follow this guide.
pip install langgraph==0.0.59
pip install pandas==2.2.2
pip install openai==1.30.3
pip install langchain==0.2.1
pip install langchain_community==0.2.1
pip install langchain_openai==0.1.7
pip install langchain_text_splitters==0.2.0
pip install langchain_core==0.2.1
pip install numpy==1.26.4
pip install python-dotenv==1.0.1
pip install faiss-cpu==1.8.0
Creating a chatbot using Chains in LangChain
Instantiate the You.com Retriever in LangChain
from langchain_community.retrievers.you import YouRetriever
ydc_retriever = YouRetriever(num_web_results = 10)
Proprietary Datasource Retriever
In this example, a CSV file is loaded and split into chunks, the chunks are embedded and stored in a Facebook AI Similarity Search (FAISS) vector store, and a LangChain retriever is created from the vector store. The LangChain docs list the other document loaders and retrievers that are available.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders.csv_loader import CSVLoader
loader = CSVLoader(file_path = "<Insert your csv file path here>")
data = loader.load()
# split the document into chunks, create a vector representation of the chunks, and store them in a FAISS vector store
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)
docs = text_splitter.split_documents(data)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents=docs, embedding=embeddings)
faiss_retriever = db.as_retriever()
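To make the chunk_size and chunk_overlap settings concrete, here is a simplified sketch of overlapping chunking. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break on separators such as paragraph and line breaks); the function sliding_chunks is a hypothetical helper that only illustrates how consecutive chunks share characters.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split `text` into fixed-size chunks whose edges overlap.

    Simplified illustration only: each chunk starts
    (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    if not text:
        return []

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With chunk_size=4 and chunk_overlap=1, the string "abcdefghij" becomes "abcd", "defg", "ghij": each chunk repeats the last character of the one before it, so text that straddles a boundary still appears intact in at least one chunk.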
Creating an Ensemble Retriever in LangChain
Create an ensemble retriever that combines the results of the FAISS vector store retriever and the You.com retriever.
from langchain.retrievers import EnsembleRetriever
ensemble_retriever = EnsembleRetriever(
    retrievers = [ydc_retriever, faiss_retriever], weights = [0.5, 0.5]
)
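LangChain documents EnsembleRetriever as re-ranking the combined results with weighted Reciprocal Rank Fusion (RRF). The sketch below illustrates that idea in plain Python on lists of document IDs; it is not LangChain's implementation, the function name weighted_rrf is hypothetical, and the constant c=60 is the value commonly used for RRF, assumed here.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns weight / (rank + c) from every list it appears in
    (rank starts at 1), and documents are ordered by their total score.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)

    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result IDs from the web retriever and the FAISS retriever.
web_results = ["a", "b", "c"]
faiss_results = ["b", "d"]
fused = weighted_rrf([web_results, faiss_results], weights=[0.5, 0.5])
```

Here "b" ends up first because it appears in both lists. With equal weights, as in the retriever above, neither source is favored; raising one weight pushes that retriever's results toward the top.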
Create chains
This block demonstrates how to create basic chains without chat history. It assumes that you have created a LangChain ChatPromptTemplate and stored it in a variable called qa_prompt.
To add chat history to your chatbot, you can check out the LangChain documentation on adding chat history to Q & A applications.
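If you do not yet have a prompt, the following is one possible sketch of qa_prompt. The wording of the system message is an assumption, and QA_MESSAGES and build_qa_prompt are hypothetical names; what matters is that the prompt contains a {context} placeholder, which create_stuff_documents_chain fills with the retrieved documents, and an {input} placeholder, which create_retrieval_chain fills with the user's question.

```python
# Message templates for a simple question-answering prompt.
# {context} receives the retrieved documents; {input} receives the question.
QA_MESSAGES = [
    (
        "system",
        "You are an assistant for question-answering tasks. "
        "Use the retrieved context below to answer the question. "
        "If the context does not contain the answer, say that you don't know."
        "\n\n{context}",
    ),
    ("human", "{input}"),
]


def build_qa_prompt():
    """Build the ChatPromptTemplate from QA_MESSAGES."""
    # Imported inside the function so the templates above can be inspected
    # even where LangChain is not installed.
    from langchain_core.prompts import ChatPromptTemplate

    return ChatPromptTemplate.from_messages(QA_MESSAGES)
```

Assigning qa_prompt = build_qa_prompt() then provides the variable that the chain code expects.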
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0.5)
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(ensemble_retriever, qa_chain)
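A chain built with create_retrieval_chain is invoked with a dictionary containing an "input" key and returns a dictionary that includes "answer" and the retrieved "context" documents. The helper below is a small sketch of how the result might be unpacked; answer_with_sources is a hypothetical name, and it works with any object that exposes a compatible invoke method.

```python
def answer_with_sources(chain, question):
    """Run the retrieval chain and return (answer, list of source metadata)."""
    result = chain.invoke({"input": question})
    # Each retrieved document carries a metadata dict (for example, the
    # CSV source and row, or the URL of a web result).
    sources = [getattr(doc, "metadata", {}) for doc in result.get("context", [])]
    return result["answer"], sources
```

For example, answer, sources = answer_with_sources(rag_chain, "<Insert your question here>") returns the model's reply together with the metadata of the documents it was given, which is useful for checking whether an answer came from the CSV data or from the web.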
Creating a chatbot using Agents in LangChain
Instantiate the You.com Tool in LangChain
from langchain_community.tools.you import YouSearchTool
from langchain_community.utilities.you import YouSearchAPIWrapper
api_wrapper = YouSearchAPIWrapper(num_web_results = 10)
ydc_tool = YouSearchTool(api_wrapper=api_wrapper)
Proprietary Datasource Retrieval Tool
Convert the proprietary datasource LangChain retriever we defined above into a tool.
from langchain.tools.retriever import create_retriever_tool
# convert this retriever into a tool
faiss_retriever_tool = create_retriever_tool(
    faiss_retriever,
    name = "custom_dataset_retriever",
    description = "Retrieve relevant context from a custom dataset."
)
Creating a LangGraph Agent
The FAISS vector store retriever tool and the You.com tool can be passed as tools to the LangGraph agent. LangGraph agents come with built-in persistence, so we can add memory to the chatbot and hold back-and-forth conversations. In this example, an in-memory checkpointer is used to track chat history.
from langgraph.prebuilt import chat_agent_executor
from langgraph.checkpoint import MemorySaver
# Create a checkpointer to use memory
memory = MemorySaver()
# the vector store representation of the CSV dataset and the You.com Search tool will both be passed as tools to the agent
tools = [faiss_retriever_tool, ydc_tool]
agent_executor = chat_agent_executor.create_tool_calling_executor(llm, tools, checkpointer=memory)
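Because the agent is created with a checkpointer, each call needs a thread_id in its config so that LangGraph knows which conversation's history to load and update. The following is a minimal sketch of a conversation helper; thread_config and ask_agent are hypothetical names, and the thread ID is any string you choose.

```python
def thread_config(thread_id):
    """Build the config that tells the checkpointer which conversation to use."""
    return {"configurable": {"thread_id": thread_id}}


def ask_agent(agent_executor, question, thread_id):
    """Send one user message to the agent and return the text of its reply."""
    # Imported inside the function so thread_config can be used on its own.
    from langchain_core.messages import HumanMessage

    state = agent_executor.invoke(
        {"messages": [HumanMessage(content=question)]},
        config=thread_config(thread_id),
    )
    # The agent's state holds the full message list; the last entry is the reply.
    return state["messages"][-1].content
```

Calling ask_agent(agent_executor, "<Insert your question here>", "demo-thread") twice with the same thread ID lets the second question refer back to the first, while a new thread ID starts a fresh conversation. Since MemorySaver keeps history in memory only, it is lost when the process exits.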