Before You Get Started

To use the You.com Search API, you will need an API key. Please visit api.you.com for more details.

Creating a chatbot that retrieves data from the web and a proprietary data source

Follow this guide to create basic chatbots in LangChain that retrieve context from proprietary data sources and the web using the You.com API. To follow this guide, you will need to set the environment variables YDC_API_KEY and OPENAI_API_KEY.
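A minimal sketch for checking both keys before building anything. It assumes you keep the keys either in your shell environment or in a local .env file loaded with python-dotenv's load_dotenv() (python-dotenv is in the install list below). The helper names load_env_file and missing_api_keys are hypothetical.

```python
import os

# The two keys this guide relies on.
REQUIRED_KEYS = ("YDC_API_KEY", "OPENAI_API_KEY")


def load_env_file():
    """Load variables from a local .env file if python-dotenv is installed.

    Returns True if python-dotenv was available, False otherwise.
    """
    try:
        from dotenv import load_dotenv
    except ImportError:
        return False
    # By default load_dotenv() does not override variables already set in the shell.
    load_dotenv()
    return True


def missing_api_keys(required=REQUIRED_KEYS):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Calling load_env_file() and then checking that missing_api_keys() returns an empty list gives a clear error up front, instead of an authentication failure deep inside a retriever call.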

Installation

These are the packages and associated versions that need to be installed to follow this guide.

pip install langgraph==0.0.59
pip install pandas==2.2.2
pip install openai==1.30.3
pip install langchain==0.2.1
pip install langchain_community==0.2.1
pip install langchain_openai==0.1.7
pip install langchain_text_splitters==0.2.0
pip install langchain_core==0.2.1
pip install numpy==1.26.4
pip install python-dotenv==1.0.1
pip install faiss-cpu==1.8.0
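Because the guide pins exact versions, an optional sanity check can compare what is installed against the pins using the standard library's importlib.metadata. This is a sketch; the function name version_mismatches is hypothetical, and the package names are written in their canonical hyphenated form (pip treats langchain_community and langchain-community as the same package).

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the install list above.
PINNED = {
    "langgraph": "0.0.59",
    "pandas": "2.2.2",
    "openai": "1.30.3",
    "langchain": "0.2.1",
    "langchain-community": "0.2.1",
    "langchain-openai": "0.1.7",
    "langchain-text-splitters": "0.2.0",
    "langchain-core": "0.2.1",
    "numpy": "1.26.4",
    "python-dotenv": "1.0.1",
    "faiss-cpu": "1.8.0",
}


def version_mismatches(pins=PINNED):
    """Map each package whose installed version differs from its pin to the
    installed version, or to None if the package is not installed."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = installed
    return mismatches
```

An empty result means every pinned package is installed at the expected version.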

Creating a chatbot using Chains in LangChain

Instantiate the You.com Retriever in LangChain

from langchain_community.retrievers.you import YouRetriever

ydc_retriever = YouRetriever(num_web_results = 10)

Proprietary Datasource Retriever

In this example, a CSV file is loaded and split into chunks. The chunks are then converted into vectors (embedded) and stored in a Facebook AI Similarity Search (FAISS) vector store, and a LangChain retriever is created from that vector store. You can check out the various other document loaders and retrievers in the LangChain docs.

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path = "<Insert your csv file path here>")
data = loader.load()
# split the document into chunks, create a vector representation of the chunks, and store them in a FAISS vector store
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 100)
docs = text_splitter.split_documents(data)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(documents=docs, embedding=embeddings)
faiss_retriever = db.as_retriever()
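To make chunk_size and chunk_overlap concrete: both are measured in characters here, and consecutive chunks share up to chunk_overlap characters so that text near a boundary appears in both. The sketch below is a deliberately simplified fixed-window splitter, not RecursiveCharacterTextSplitter itself, which additionally tries to break on separators such as blank lines, newlines, and spaces. Since CSVLoader produces one document per CSV row, rows shorter than chunk_size typically pass through as a single chunk. The function name chunk_text is hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Simplified sketch: it ignores separators and may cut mid-word.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

For example, chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1) yields "abcd", "defg", "ghij": each chunk repeats the last character of the previous one.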

Creating an Ensemble Retriever in LangChain

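An ensemble retriever combines the results of the FAISS vector store retriever and the You.com retriever. LangChain's EnsembleRetriever merges the ranked result lists using weighted Reciprocal Rank Fusion; here both retrievers are given equal weight.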

from langchain.retrievers import EnsembleRetriever

ensemble_retriever = EnsembleRetriever(
    retrievers = [ydc_retriever, faiss_retriever], weights = [0.5, 0.5]
)
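A simplified sketch of weighted Reciprocal Rank Fusion, the merging strategy EnsembleRetriever uses: each document scores the sum of weight / (rank + c) over every list it appears in, with ranks starting at 1 and c defaulting to 60. This is an illustration, not LangChain's implementation; it works on plain string IDs, whereas EnsembleRetriever deduplicates Document objects. The function name weighted_rrf is hypothetical.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Returns document IDs ordered from highest to lowest fused score.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            # Higher-ranked documents (small rank) contribute more.
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)
```

With weights [0.5, 0.5], a document returned by both the web retriever and the FAISS retriever usually outranks one that only a single retriever found; shifting the weights (for example to [0.3, 0.7]) favors one source's results.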

Create chains

This block demonstrates how to create basic chains without chat history. It assumes that you have created a LangChain ChatPromptTemplate and stored it in a variable called qa_prompt. The prompt should contain a {context} placeholder, which receives the retrieved documents, and an {input} placeholder, which receives the user's question. To add chat history to your chatbot, see the LangChain documentation on adding chat history to Q&A applications.

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0.5)
qa_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(ensemble_retriever, qa_chain)
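The chain is invoked with the question under the "input" key, for example rag_chain.invoke({"input": "your question"})["answer"], and returns a dict with "input", "context" (the retrieved documents), and "answer". The sketch below mimics that retrieve, stuff, generate flow with plain Python callables in place of the retriever and the LLM, and plain strings in place of Document objects; create_stuff_documents_chain joins documents with a blank line ("\n\n") by default. The names run_retrieval_chain and format_docs are hypothetical.

```python
def format_docs(docs, separator="\n\n"):
    """'Stuff' all retrieved texts into a single context string."""
    return separator.join(docs)


def run_retrieval_chain(question, retrieve, llm, template):
    """Retrieve documents, fill the prompt template, and call the model.

    retrieve: callable mapping a question to a list of document texts.
    llm: callable mapping a prompt string to an answer string.
    template: string with {context} and {input} placeholders.
    """
    docs = retrieve(question)
    prompt = template.format(context=format_docs(docs), input=question)
    return {"input": question, "context": docs, "answer": llm(prompt)}
```

Because every retrieved document is stuffed into one prompt, the number of results from the ensemble retriever directly affects prompt length and cost.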

Creating a chatbot using Agents in LangChain

Instantiate the You.com Tool in LangChain

from langchain_community.tools.you import YouSearchTool
from langchain_community.utilities.you import YouSearchAPIWrapper

api_wrapper = YouSearchAPIWrapper(num_web_results = 10)
ydc_tool = YouSearchTool(api_wrapper=api_wrapper)

Proprietary Datasource Retrieval Tool

Convert the FAISS retriever for the proprietary data source, defined above, into a tool that the agent can call.

from langchain.tools.retriever import create_retriever_tool

# convert this retriever into a tool
faiss_retriever_tool = create_retriever_tool(
    faiss_retriever,
    name = "custom_dataset_retriever",
    description = "Retrieve relevant context from a custom dataset."
)
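Conceptually, create_retriever_tool gives the retriever a name and description (which the model reads when deciding whether to call the tool) and, when called, returns the retrieved documents' text joined by a separator, "\n\n" by default. A more specific description, such as one that says what the CSV contains, helps the model pick the right tool. The sketch below models only that wrapping step; SimpleTool and make_retriever_tool are hypothetical stand-ins, not LangChain classes.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a LangChain tool: a name, a description, a function."""
    name: str
    description: str
    func: Callable[[str], str]

    def run(self, query: str) -> str:
        return self.func(query)


def make_retriever_tool(
    retrieve: Callable[[str], List[str]],
    name: str,
    description: str,
    separator: str = "\n\n",
) -> SimpleTool:
    """Wrap a retriever so that calling the tool returns one text blob."""
    def _run(query: str) -> str:
        return separator.join(retrieve(query))
    return SimpleTool(name=name, description=description, func=_run)
```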

Creating a LangGraph Agent

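The FAISS vector store retriever tool and the You.com tool can be passed as tools to the LangGraph agent. LangGraph supports persistence through checkpointers, so we can add memory to the chatbot and have back-and-forth conversations. In this example, an in-memory checkpointer (MemorySaver) tracks the chat history. The history is stored per conversation thread, identified by a thread_id passed in the config when the agent is invoked, and it is lost when the process exits.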

from langgraph.prebuilt import chat_agent_executor
from langgraph.checkpoint import MemorySaver

# Create an in-memory checkpointer so the agent remembers earlier turns
memory = MemorySaver()
# the vector store representation of the CSV dataset and the You.com Search tool will both be passed as tools to the agent
tools = [faiss_retriever_tool, ydc_tool]
# llm is the ChatOpenAI model created in the chains section above
agent_executor = chat_agent_executor.create_tool_calling_executor(llm, tools, checkpointer=memory)
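With a checkpointer attached, each call supplies a thread ID, for example agent_executor.invoke({"messages": [HumanMessage(content="your question")]}, config={"configurable": {"thread_id": "conversation-1"}}), where HumanMessage comes from langchain_core.messages. The sketch below illustrates the two ideas involved, not LangGraph's implementation: a tool-calling loop in which the model either requests a tool or answers, and a history store keyed by thread ID. InMemoryHistory, run_agent_turn, and the decide callable (a stand-in for the LLM's choice) are all hypothetical.

```python
from collections import defaultdict


class InMemoryHistory:
    """Hypothetical stand-in for a checkpointer: messages stored per thread ID."""

    def __init__(self):
        self._threads = defaultdict(list)

    def get(self, thread_id):
        return list(self._threads[thread_id])  # copy, so callers cannot mutate it

    def append(self, thread_id, message):
        self._threads[thread_id].append(message)


def run_agent_turn(user_text, thread_id, history, decide, tools, max_steps=5):
    """Run one user turn of a tool-calling loop.

    decide(messages) plays the model's role and returns either
    ("tool", tool_name, tool_input) or ("answer", text).
    """
    history.append(thread_id, ("user", user_text))
    for _ in range(max_steps):
        action = decide(history.get(thread_id))
        if action[0] == "answer":
            history.append(thread_id, ("assistant", action[1]))
            return action[1]
        _, tool_name, tool_input = action
        result = tools[tool_name](tool_input)
        # Tool output is added to the history so the next decision can use it.
        history.append(thread_id, ("tool", result))
    raise RuntimeError("no answer produced within max_steps")
```

Because the full thread history is passed to the model on every step, a follow-up question in the same thread can refer to earlier answers, while a new thread_id starts from an empty history.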