Artificial Intelligence · Tutorial
RAG Tutorial 2026: Build AI Chatbot with
LangChain & ChromaDB (Step-by-Step Guide)
Learn RAG step-by-step with LangChain and ChromaDB. Build a real AI chatbot with full code, examples, and architecture explained — complete RAG LangChain tutorial for 2026.
AI-N305 · June 2026 · Advanced · ~18 min read · By Ragavi S
2K+ lines of code · 9 steps · #1 AI topic 2026 · 1 GitHub repo

I've seen a lot of RAG tutorials that explain the concept beautifully — then leave you staring at a 10-line pseudocode example. Not useful. This one is different. We're building an actual working AI chatbot using LangChain and ChromaDB, the kind where you drop in a real PDF and start asking questions immediately. Every file. Every line. Explained. If you've been googling "RAG LangChain tutorial" or "ChromaDB vector database" and landing on half-finished Medium posts — stick around. This is the guide I wish existed when I first started.

"Most teams that think they need fine-tuning actually just need RAG. Fine-tuning costs thousands. RAG costs an API call."

01 · RAG Tutorial for Beginners — What Is Retrieval Augmented Generation?

Here's the thing nobody tells you when you first start building with LLMs — they're confidently wrong. GPT-4, Claude, Gemini, it doesn't matter. Ask any of them about your internal documentation, your last product release, or something that happened three months ago and you'll get either a hallucinated answer or a polite "I don't have access to that." Neither works in production. This RAG tutorial is how you fix that properly.

The core idea behind Retrieval Augmented Generation is honestly pretty elegant. Rather than trying to shove all your knowledge into model weights (expensive, slow, inflexible), you just retrieve the relevant pieces at query time and hand them to the model as context. Three steps, and you're done:

  1. Break your documents into chunks, embed them, store in a ChromaDB vector database
  2. When someone asks something, run a semantic similarity search to pull the relevant chunks
  3. Feed those chunks to your LLM as context — it answers from your actual data, not from training memory
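To make those three steps concrete before we touch the real stack, here's a toy sketch in pure Python. The "retrieval" is naive word overlap and the "LLM" is stubbed out — real embeddings and a real model come in the steps below — but the shape of the loop is exactly the same:

```python
# Toy RAG loop — pure Python, no embeddings, no API.
# Retrieval here is naive word overlap; the real pipeline
# below swaps in dense vectors and ChromaDB.

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Step 2: rank chunks by how many query words they share."""
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def answer(query: str, chunks: list[str]) -> str:
    """Step 3: hand the retrieved context to the 'LLM' (stubbed)."""
    context = "\n".join(retrieve(query, chunks))
    return f"Based on context: {context}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
]
print(answer("refund policy", docs))
```

Swap `retrieve` for a vector search and the stub for a chat model call, and you have the entire architecture of this tutorial.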

RAG Pipeline — End to End: 📄 Your Docs (PDF · TXT · Web) → ✂️ Chunking (split + overlap) → 🧮 Embeddings (dense vectors) → 🗄️ ChromaDB (vector store) → 🤖 LLM Answer (GPT / Claude)

02 · How to Build an AI Chatbot Using LangChain — Architecture
🔀 Document Loader — Your entry point. LangChain has 100+ loaders — PDF, Word, Notion, web scraping, you name it. For this tutorial we're using PyPDFLoader, but swapping it later takes one line.

✂️ Text Splitter — This part trips up most beginners. You can't feed a 50-page PDF to an LLM in one shot. You split it into small overlapping chunks — the overlap is what stops answers from getting cut off mid-thought.

🧮 Embedding Model — Converts text into numbers (vectors) that capture meaning. We'll use OpenAI's text-embedding-3-small — it's cheap, fast, and surprisingly good. Open-source options exist if you want to cut costs further.

🗄️ Vector Store — Where the vectors live. ChromaDB runs locally with zero setup — perfect for building and testing. When you're ready for production, Pinecone or Weaviate scale to millions of docs without changing much code.

03 · Environment Setup — LangChain RAG Python Installation

Python 3.10 or higher. Use a virtual environment — mixing global packages is how you end up spending an afternoon debugging import errors that have nothing to do with your actual RAG code.

TERMINAL — bash

# Create virtual environment
python -m venv rag-env
source rag-env/bin/activate

# Install all dependencies for RAG LangChain ChromaDB
pip install langchain langchain-community langchain-openai
pip install chromadb openai tiktoken pypdf
pip install python-dotenv streamlit
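One thing the install list includes but the files below never wire up is python-dotenv — you still need your OpenAI key available before anything runs. A minimal, hedged sketch (the helper name `require_api_key` is mine, not part of the repo; `OPENAI_API_KEY` is the variable the OpenAI and LangChain clients look for):

```python
# Hedged sketch: load the API key from a .env file or the shell.
# load_dotenv() reads a .env file in the working directory into
# os.environ; if python-dotenv isn't installed we just fall back
# to whatever the shell already exported.
import os

def require_api_key() -> str:
    try:
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # no python-dotenv — rely on the existing environment
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "Set OPENAI_API_KEY in .env or your shell before running the app."
        )
    return key
```

Put `OPENAI_API_KEY=sk-...` in a `.env` file next to `app.py` and never commit it.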

04 · Load & Split Documents with LangChain
document_processor.py — Python

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_and_split(file_path: str) -> list:
    loader = PyPDFLoader(file_path)
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""]
    )
    chunks = splitter.split_documents(documents)
    print(f"✅ {len(documents)} pages → {len(chunks)} chunks")
    return chunks

The chunk_size and chunk_overlap parameters here are things I've tweaked across multiple projects. 1000 characters per chunk works well for most documents — not too small that you lose context, not so large that retrieval becomes vague. The 200-character overlap? That's the bit most tutorials skip. Without it, you'll occasionally get answers that feel cut off because the relevant sentence happened to fall right at a chunk boundary.
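You can see the boundary problem with a hand-rolled splitter. This toy only slides a fixed character window — LangChain's RecursiveCharacterTextSplitter additionally respects the separators list — but it shows exactly what overlap buys you:

```python
# Toy character splitter demonstrating chunk overlap.
# With overlap=0, text at a chunk boundary is split cleanly in two;
# with overlap>0, each chunk re-covers the tail of the previous one,
# so a sentence straddling the boundary survives intact in one chunk.

def chunk(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

text = "abcdefghij"
print(chunk(text, size=4, overlap=0))  # ['abcd', 'efgh', 'ij'] — hard cuts
print(chunk(text, size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Overlap costs you some storage and embedding spend (every overlapping character is embedded twice), which is why 200 out of 1000 is a reasonable middle ground rather than something like 500.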

05 · ChromaDB Vector Database Explained — Build Your Vector Store

ChromaDB is what makes this whole setup so easy to get running locally. It's just a folder on your disk — no Docker, no cloud account, no sign-up. You embed your chunks, it saves them, and later you query by meaning instead of keyword matching. That last part is what makes vector search feel almost magical the first time you try it.

vectorstore.py — Python

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

def create_vectorstore(chunks, persist_dir="./chroma_db"):
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=persist_dir
    )
    return vectorstore

def load_vectorstore(persist_dir="./chroma_db"):
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    return Chroma(persist_directory=persist_dir, embedding_function=embeddings)
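If "query by meaning" still feels magical, here's what a vector store does conceptually, stripped to a toy. This is not Chroma's implementation — Chroma uses approximate nearest-neighbor indexes and persists to disk — and the two-dimensional hand-made vectors stand in for real embeddings, which for text-embedding-3-small have 1536 dimensions:

```python
# Conceptual toy of a vector store: keep (text, vector) pairs,
# answer queries by cosine similarity — closest meaning wins.
import math

class ToyVectorStore:
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self.items.append((text, vector))

    def query(self, vector: list[float], k: int = 1) -> list[str]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))
        ranked = sorted(self.items, key=lambda it: cosine(vector, it[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
store.add("cats are pets", [0.9, 0.1])        # pretend axis 0 ≈ 'animal-ness'
store.add("python is a language", [0.1, 0.9])  # pretend axis 1 ≈ 'tech-ness'
print(store.query([0.8, 0.2]))  # → ['cats are pets']
```

Chroma's `similarity_search("your question", k=4)` is this same idea, just with real embeddings and a real index behind it.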

06 · Build AI Chatbot with RAG — The Core Chain
rag_chain.py — Python

from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """Use ONLY the context below to answer.
If not in context, say "I don't have enough information."

Context: {context}

Question: {question}

Answer:"""

def build_rag_chain(vectorstore):
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        chain_type_kwargs={"prompt": PromptTemplate.from_template(PROMPT_TEMPLATE)},
        return_source_documents=True
    )
    return chain
⚠️ Keep temperature at 0. I know it's tempting to bump it up for more "creative" responses — don't. The whole point of RAG is factual grounding. A temperature above 0 lets the model start improvising around your retrieved context, which brings hallucinations right back in through the side door.
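One term worth demystifying here is `chain_type="stuff"`: all k retrieved chunks are literally stuffed into one prompt before the LLM sees anything. A pure-Python illustration using the same template shape (the chunks and question are made up for the demo):

```python
# What the "stuff" chain does under the hood: join the retrieved
# chunks, fill them into the prompt template, send one big prompt.
PROMPT_TEMPLATE = """Use ONLY the context below to answer.
If not in context, say "I don't have enough information."

Context: {context}

Question: {question}

Answer:"""

retrieved_chunks = [
    "Refunds are issued within 30 days of purchase.",
    "Shipping is free on orders over $50.",
]

prompt = PROMPT_TEMPLATE.format(
    context="\n\n".join(retrieved_chunks),  # k chunks, concatenated
    question="What is the refund window?",
)
print(prompt)
```

This is also why k matters: with k=4 chunks of ~1000 characters each, every question costs roughly 4,000 characters of context tokens on top of the question itself.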

07 · Streamlit Chat Interface — Run Your RAG AI Chatbot
app.py — Python

import streamlit as st
from vectorstore import load_vectorstore
from rag_chain import build_rag_chain

st.title("📄 RAG AI Chatbot — Document Aware")

@st.cache_resource
def load_chain():
    return build_rag_chain(load_vectorstore("./chroma_db"))

chain = load_chain()

if "messages" not in st.session_state:
    st.session_state.messages = []

for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Ask anything about your document..."):
    # Display and store the user's turn so chat history persists across reruns
    with st.chat_message("user"):
        st.write(prompt)
    st.session_state.messages.append({"role": "user", "content": prompt})

    result = chain.invoke({"query": prompt})  # calling chain(...) directly is deprecated
    answer = result["result"]
    with st.chat_message("assistant"):
        st.write(answer)
    st.session_state.messages.append({"role": "assistant", "content": answer})

08 · ChromaDB vs Pinecone vs Weaviate — Which Vector Database to Choose?

Quick honest answer: start with ChromaDB, switch to Pinecone when you ship. I've used all three — ChromaDB is unbeatable for local development because there's literally nothing to set up. Pinecone is where you go when you have real users and need reliability and scale. Weaviate is worth looking at if you need hybrid search (combining keyword + semantic). The good news is the LangChain abstraction means switching between them is maybe 3 lines of code.

Feature   | ChromaDB            | Pinecone             | Weaviate
Setup     | pip install · local | Managed cloud        | Docker / Cloud
Cost      | Free                | Paid after free tier | Open source
Scale     | Millions of docs    | Billions of vectors  | Enterprise scale
Best For  | Dev · Prototyping   | Production SaaS      | Hybrid search
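For a sense of how small the swap really is, here's a hedged sketch of a Pinecone version of `create_vectorstore` from vectorstore.py. It assumes the separate langchain-pinecone package, a Pinecone account with `PINECONE_API_KEY` set, and an index that already exists — the index name "rag-demo" is made up. Treat it as a shape, not a drop-in file:

```python
# Hedged sketch: ChromaDB → Pinecone swap. Only the store class and
# its destination argument change; chunking, embeddings, and the RAG
# chain are untouched. Imports live inside the function so this file
# parses even without langchain-pinecone installed.

def create_vectorstore_pinecone(chunks, index_name: str = "rag-demo"):
    from langchain_openai import OpenAIEmbeddings
    from langchain_pinecone import PineconeVectorStore  # pip install langchain-pinecone

    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    # Same call shape as Chroma.from_documents, but targeting a cloud
    # index instead of a local persist_directory.
    return PineconeVectorStore.from_documents(
        documents=chunks, embedding=embeddings, index_name=index_name
    )
```

Everything downstream — `as_retriever`, the RAG chain, the Streamlit app — works unchanged, because they only talk to the LangChain vector store interface.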

09 · Retrieval Augmented Generation Example — RAG vs Naive LLM
✗ Naive LLM
Training data only — no private docs
Confident hallucinations
Knowledge frozen at cutoff
No source attribution
✓ RAG System
Grounded in your documents
Cites exact source + page number
Update docs without retraining
Auditable · Enterprise-ready
ragavi-s / rag-langchain-chromadb
Public · Python · MIT License
rag-langchain-chromadb/
├── docs/                   # Put your PDFs here
├── chroma_db/              # Auto-generated vector store
├── document_processor.py   # Step 4 — Load & split
├── vectorstore.py          # Step 5 — ChromaDB logic
├── rag_chain.py            # Step 6 — RAG chain
├── app.py                  # Step 7 — Streamlit UI
├── requirements.txt
└── README.md
→ View on GitHub
Before You Close This Tab
What is RAG — Stop thinking of it as a fancy feature. It's just: retrieve relevant chunks, hand them to the LLM, get an answer that's actually grounded in your data.

Core Stack — LangChain handles the orchestration, ChromaDB holds your vectors, OpenAI does the embedding and generation. Each piece is replaceable.

Dev → Prod — Build everything with ChromaDB. When you're ready to ship, swap to Pinecone. Seriously, it's 3 lines.

Don't Skip This — temperature=0, chunk_overlap=200, k=4 chunks retrieved. These defaults took me a while to land on — trust them until you have a reason to change them.

What's Next — Once this works, look into re-ranking (it meaningfully improves answer quality) and hybrid search if your documents have lots of proper nouns or codes that semantic search struggles with.
👩‍💻 Written by Ragavi S
Independent tech writer based in India. I write about AI, Python, and developer tools — mostly things I've actually built and broken myself before writing about them. Founder of Tech Journalism. No sponsored opinions, no hype, just code that works.
If this saved you time, pass it on.

Built something with this? Drop a GitHub link in the comments — always curious to see how people extend it. And if something didn't work the way I described, tell me that too.

TECH JOURNALISM — Independent Tech Publication · mrs-journalism.blogspot.com · 2026