Q&A Bot Using Langchain, Huggingface Embedding, OpenAI LLM

Free huggingface embedding models for great Q&A bot results

LLMs have made it extremely easy to build chatbots. In this tutorial, I will show you how to use Langchain, Huggingface, and OpenAI to build a custom Q&A bot with a document of your choice as the data source. I will also show you how to use embedding models from Huggingface instead of OpenAI to save on API costs.

We'll start by talking about the specific tools we will be using:

  • Langchain - abstraction framework for working with AI models. The library makes it much easier to implement common AI tasks.
  • Huggingface - online open source community where AI experts host their models and datasets. We will be using some free embedding models from here.
  • OpenAI - needs no introduction at this point. We'll be using the GPT-3 LLM (text-davinci-003) from OpenAI.

Project Summary

We will load an earnings call transcript from Meta in PDF format, then ask the Q&A bot questions about the content of the document.

1. Install dependencies

!pip install langchain "langchain[docarray]" openai sentence_transformers pypdf

2. Load the document and split into chunks

We can use the PyPDFLoader provided by Langchain to load the text of a PDF file. We split this text into chunks because an LLM can only process a limited number of tokens in a single query; GPT-3 (text-davinci-003) accepts roughly 4,000 tokens, shared between the prompt and the completion. For the purpose of this tutorial, we'll split into much smaller chunks to demonstrate the mechanism.

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("./META-Q1-2023-Earnings-Call-Transcript.pdf")
docs = loader.load()

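text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,         # maximum characters per chunk
    chunk_overlap=20,       # characters shared between neighboring chunks
    length_function=len,    # measure chunk length in characters
)
texts = text_splitter.split_documents(docs)

# Inspect the first two chunks
print(texts[0])
print(texts[1])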


page_content='1 \n Meta Platforms , Inc. ( META ) \nFirst Quarter 2023 Results Conference Call  \nApril 26th, 202 3 \n \nDeborah Crawford, VP, Investor Relations' metadata={'source': '/content/drive/My Drive/Colab Notebooks/META-Q1-2023-Earnings-Call-Transcript.pdf', 'page': 0}
page_content='Thank you. Good afternoon and welcome to Meta Platforms first quarter 2023 earnings conference call. \nJoining me today to discuss our results are Mark Zuckerberg, CEO and Susan Li, CFO.' metadata={'source': '/content/drive/My Drive/Colab Notebooks/META-Q1-2023-Earnings-Call-Transcript.pdf', 'page': 0}

The document has now been split into 394 chunks. Each chunk has up to 200 characters, matching the chunk_size set above.
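To make the roles of chunk_size and chunk_overlap concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification, not how RecursiveCharacterTextSplitter is implemented (the real splitter prefers to break on separators such as paragraphs and newlines). The function name split_with_overlap and the sample text are hypothetical, chosen only for illustration.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=20):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighboring chunks.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Hypothetical sample text, repeated to get something longer than one chunk
sample = "Reels continues to grow quickly on both Facebook and Instagram. " * 10
chunks = split_with_overlap(sample, chunk_size=200, chunk_overlap=20)
print(len(chunks), [len(c) for c in chunks])
```

The overlap is what keeps a sentence that straddles a chunk boundary from being cut off in every chunk that contains part of it.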

3. Use HuggingFaceEmbeddings to build search

HuggingFace offers a number of very good open source models. The "sentence-transformers" family consists of text embedding models, and you can browse the full list on the HuggingFace website. I picked the most popular one, all-MiniLM-L6-v2, which creates a 384-dimensional vector. In comparison, OpenAI's text-embedding-ada-002 model creates a 1,536-dimensional vector.

The best part about using HuggingFace embeddings? The models are free to download and run on your own machine. OpenAI will charge you $0.0001 / 1K tokens for embeddings. This doesn't sound like a lot, but it adds up for large document collections.
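As a rough back-of-the-envelope check, the sketch below applies the per-token price quoted above to a few corpus sizes. The token counts are hypothetical, the function name embedding_cost_usd is made up for this example, and prices change, so treat the numbers as illustrative only.

```python
def embedding_cost_usd(num_tokens, price_per_1k_tokens=0.0001):
    """Estimate the API cost of embedding num_tokens tokens once."""
    return num_tokens / 1000 * price_per_1k_tokens


# Hypothetical corpus sizes, for illustration only
for tokens in (20_000, 10_000_000, 1_000_000_000):
    print(f"{tokens:>13,} tokens -> ${embedding_cost_usd(tokens):,.2f}")
```

A single transcript costs a fraction of a cent to embed; the cost becomes noticeable when you embed very large collections or re-embed them often.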

DocArrayInMemorySearch is an in-memory document index store. It's great for testing with small documents, but if you want to go to production I'd suggest going with a persistent vector store such as Pinecone or Deep Lake.

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

model_id = 'sentence-transformers/all-MiniLM-L6-v2'
model_kwargs = {'device': 'cpu'}
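hf_embedding = HuggingFaceEmbeddings(
    model_name=model_id,
    model_kwargs=model_kwargs,
)
# Embed every chunk and index the resulting vectors in memory
db = DocArrayInMemorySearch.from_documents(texts, hf_embedding)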

To test that this is working, you can run the following query code:

docs = db.similarity_search("how popular is ig reels?")


[Document(page_content='Reels continues to grow quickly on both Facebook and Instagram. Reels also continue to become more',...

You will see a list of the document chunks most relevant to the query, which shows that the vector index is working as expected.
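Conceptually, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to the query vector. Here is a minimal sketch of that idea in plain Python, assuming cosine similarity as the metric. The 3-dimensional vectors are made up for illustration (real all-MiniLM-L6-v2 vectors have 384 dimensions), and cosine_similarity and top_k are hypothetical helpers, not part of Langchain.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings", invented for this example
toy_index = [
    ("Reels continues to grow quickly", [0.9, 0.1, 0.0]),
    ("Total revenue for the quarter", [0.1, 0.9, 0.1]),
    ("Headcount and expenses", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "how popular is ig reels?"

results = top_k(query_vec, toy_index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A real vector store does the same ranking, just with much larger vectors and an index structure that avoids comparing the query against every chunk one by one.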

4. Use OpenAI LLM to answer question given db context

We will be using an OpenAI LLM to answer the user's question. I tried the LLMs hosted on HuggingFace, but you need a massive machine to run them. GPT-3 has 175 billion parameters, so you would need a TON of RAM to load a model that size. Even a 40-billion-parameter model like tiiuae/falcon-40b will not fit in memory on most current MacBooks.
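To put rough numbers on that, the sketch below estimates the memory needed just to hold a model's weights. It assumes 2 bytes per parameter (16-bit floats) and ignores activations and other runtime overhead, so real requirements are higher. The function name weights_memory_gb is hypothetical.

```python
def weights_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the model names mentioned in this tutorial
for name, params in (("GPT-3 (175B)", 175e9), ("falcon-40b (40B)", 40e9)):
    print(f"{name}: about {weights_memory_gb(params):.0f} GB at 16-bit precision")
```

Quantizing to fewer bits per parameter shrinks these figures, but even then a model in this size class is a poor fit for a typical laptop.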

from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

import os
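os.environ['OPENAI_API_KEY'] = "<INSERT>"  # replace with your own API key

llm = OpenAI(temperature=0.9)
# "stuff" puts all retrieved chunks into a single prompt for the LLM
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(),
)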

We can test how well the QA model is doing:

response = qa_stuff.run("How quickly is reels adoption growing?")


Reels continues to grow quickly on both Facebook and Instagram, and the number of Reels reshares has doubled over the last six months.

This matches what you find if you search through the PDF document! You can ask about revenue, risks, and so on in the same way, and spot-check the bot's answers against the transcript.
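Under the hood, a "stuff"-style retrieval chain (which is what the qa_stuff name refers to) retrieves the most relevant chunks and places them all into one prompt ahead of the question. The sketch below shows that idea in plain Python. The function build_stuff_prompt and the template wording are assumptions made for illustration, not Langchain's exact prompt.

```python
def build_stuff_prompt(question, chunks):
    """Combine retrieved chunks and the user question into a single prompt."""
    context = "\n\n".join(chunks)  # every retrieved chunk goes into the prompt
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Chunks as a retriever might return them (shortened, illustrative text)
retrieved = [
    "Reels continues to grow quickly on both Facebook and Instagram.",
    "Reels also continue to become more social.",
]
prompt = build_stuff_prompt("How quickly is reels adoption growing?", retrieved)
print(prompt)
```

Because everything is packed into a single prompt, the number and size of the retrieved chunks must stay within the LLM's token limit, which is one reason the document is split into chunks in the first place.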


This tutorial is a quick proof of concept on building a simple Q&A bot using Langchain, HuggingFace, and OpenAI. Modern AI tools make it really simple to build powerful applications. Additionally, when working with AI models, you need to think about the compute and data costs. You should always see if an open source model will suit your needs.