RAG Tutorial 2026: Build AI Chatbot with LangChain & ChromaDB (Step-by-Step Guide)
Tech Journalism
Artificial Intelligence · Tutorial
Learn RAG step-by-step with LangChain and ChromaDB. Build a real AI chatbot with full code, examples, and architecture explained — complete RAG LangChain tutorial for 2026.
AI-N305 · June 2026 · Advanced · ~18 min read · By Ragavi S
I've seen a lot of RAG tutorials that explain the concept beautifully — then leave you staring at a 10-line pseudocode example. Not useful. This one is different. We're building an actual working AI chatbot using LangChain and ChromaDB, the kind where you drop in a real PDF and start asking questions immediately. Every file. Every line. Explained. If you've been googling "RAG LangChain tutorial" or "ChromaDB vector database" and landing on half-finished Medium posts — stick around. This is the guide I wish existed when I first started.
"Most teams that think they need fine-tuning actually just need RAG. Fine-tuning costs thousands. RAG costs an API call."
01 · RAG Tutorial for Beginners — What Is Retrieval Augmented Generation?
Here's the thing nobody tells you when you first start building with LLMs — they're confidently wrong. GPT-4, Claude, Gemini, it doesn't matter. Ask any of them about your internal documentation, your last product release, or something that happened three months ago and you'll get either a hallucinated answer or a polite "I don't have access to that." Neither works in production. This RAG tutorial is how you fix that properly.
The core idea behind Retrieval Augmented Generation is honestly pretty elegant. Rather than trying to shove all your knowledge into model weights (expensive, slow, inflexible), you just retrieve the relevant pieces at query time and hand them to the model as context. Three steps, and you're done:
1. Break your documents into chunks, embed them, store in a ChromaDB vector database
2. When someone asks something, run a semantic similarity search to pull the relevant chunks
3. Feed those chunks to your LLM as context — it answers from your actual data, not from training memory
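The three steps above can be sketched in plain Python before any libraries enter the picture. This is a toy illustration of the mechanic only — the "embeddings" are hand-made two-number vectors and the LLM call is stubbed out as a prompt string, so nothing here is the real pipeline:

```python
import math

# Toy "embeddings": in the real pipeline these come from an embedding model.
docs = {
    "Refunds are processed within 5 business days.": [0.9, 0.1],
    "Our office is closed on public holidays.":      [0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, k=1):
    # Step 2: semantic search — rank stored chunks by cosine similarity
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

def answer(query, query_vec):
    # Step 3: hand the retrieved chunks to the LLM as context (stubbed here)
    context = "\n".join(retrieve(query_vec))
    return f"Context:\n{context}\nQuestion: {query}"

# A query about refunds embeds close to [1, 0] in this toy space
print(answer("How long do refunds take?", [0.95, 0.05]))
```

Everything that follows in this tutorial is a production-grade version of exactly this loop.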
RAG Pipeline — End to End: 📄 Your Docs (PDF · TXT · Web) → ✂️ Chunking (split + overlap) → 🧮 Embeddings (dense vectors) → 🗄️ ChromaDB (vector store) → 🤖 LLM Answer (GPT / Claude)
02 · How to Build an AI Chatbot Using LangChain — Architecture
🔀 Document Loader: Your entry point. LangChain has 100+ loaders — PDF, Word, Notion, web scraping, you name it. For this tutorial we're using PyPDFLoader, but swapping it later takes one line.

✂️ Text Splitter: This part trips up most beginners. You can't feed a 50-page PDF to an LLM in one shot. You split it into small overlapping chunks — the overlap is what stops answers from getting cut off mid-thought.

🧮 Embedding Model: Converts text into numbers (vectors) that capture meaning. We'll use OpenAI's text-embedding-3-small — it's cheap, fast, and surprisingly good. Open-source options exist if you want to cut costs further.

🗄️ Vector Store: Where the vectors live. ChromaDB runs locally with zero setup — perfect for building and testing. When you're ready for production, Pinecone or Weaviate scale to millions of docs without changing much code.
Python 3.10 or higher. Use a virtual environment — mixing global packages is how you end up spending an afternoon debugging import errors that have nothing to do with your actual RAG code.
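A minimal setup sketch along those lines. The exact package list isn't pinned down in this guide, so treat this as a reasonable set for the stack described (LangChain + OpenAI + ChromaDB + Streamlit), not the project's official requirements file:

```shell
# Create and activate an isolated environment (macOS/Linux shown)
python3 -m venv .venv
source .venv/bin/activate

# A reasonable package set for this stack — pin versions in a real project
pip install langchain langchain-openai langchain-community chromadb pypdf streamlit

# The OpenAI embedding and chat calls expect an API key in the environment
export OPENAI_API_KEY="sk-..."
```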
The chunk_size and chunk_overlap parameters here are things I've tweaked across multiple projects. 1000 characters per chunk works well for most documents — not too small that you lose context, not so large that retrieval becomes vague. The 200-character overlap? That's the bit most tutorials skip. Without it, you'll occasionally get answers that feel cut off because the relevant sentence happened to fall right at a chunk boundary.
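To make the boundary problem concrete, here's a stripped-down fixed-size chunker in plain Python. This is a sketch of the idea only — the tutorial's actual splitter (LangChain's RecursiveCharacterTextSplitter) additionally prefers splitting on paragraph and sentence boundaries rather than at raw character offsets:

```python
def chunk(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so neighbours share an overlap region."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With size 1000 and overlap 200, chunks start at offsets 0, 800, 1600, 2400 —
# any sentence near a boundary lands in two chunks instead of being sliced once
pieces = chunk("x" * 2500, chunk_size=1000, chunk_overlap=200)
print([len(p) for p in pieces])
```

Set chunk_overlap to 0 in this sketch and you can see the failure mode directly: each character belongs to exactly one chunk, so a sentence straddling offset 1000 gets cut in half.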
05 · ChromaDB Vector Database Explained — Build Your Vector Store
ChromaDB is what makes this whole setup so easy to get running locally. It's just a folder on your disk — no Docker, no cloud account, no sign-up. You embed your chunks, it saves them, and later you query by meaning instead of keyword matching. That last part is what makes vector search feel almost magical the first time you try it.
rag_chain.py

from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """Use ONLY the context below to answer.
If not in context, say "I don't have enough information."

Context: {context}

Question: {question}

Answer:"""

def build_rag_chain(vectorstore):
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        chain_type_kwargs={"prompt": PromptTemplate.from_template(PROMPT_TEMPLATE)},
        return_source_documents=True,
    )
    return chain
⚠️ Keep temperature at 0. I know it's tempting to bump it up for more "creative" responses — don't. The whole point of RAG is factual grounding. A temperature above 0 lets the model start improvising around your retrieved context, which brings hallucinations right back in through the side door.
07 · Streamlit Chat Interface — Run Your RAG AI Chatbot
app.py
import streamlit as st
from vectorstore import load_vectorstore
from rag_chain import build_rag_chain

st.title("📄 RAG AI Chatbot — Document Aware")

@st.cache_resource
def load_chain():
    return build_rag_chain(load_vectorstore("./chroma_db"))

chain = load_chain()

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far on every rerun
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Ask anything about your document..."):
    # Store and display the user's turn, not just the assistant's
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    result = chain({"query": prompt})
    answer = result["result"]
    with st.chat_message("assistant"):
        st.write(answer)
    st.session_state.messages.append({"role": "assistant", "content": answer})
08 · ChromaDB vs Pinecone vs Weaviate — Which Vector Database to Choose?
Quick honest answer: start with ChromaDB, switch to Pinecone when you ship. I've used all three — ChromaDB is unbeatable for local development because there's literally nothing to set up. Pinecone is where you go when you have real users and need reliability and scale. Weaviate is worth looking at if you need hybrid search (combining keyword + semantic). The good news is the LangChain abstraction means switching between them is maybe 3 lines of code.
| Feature  | ChromaDB            | Pinecone             | Weaviate         |
| -------- | ------------------- | -------------------- | ---------------- |
| Setup    | pip install · local | Managed cloud        | Docker / Cloud   |
| Cost     | Free                | Paid after free tier | Open source      |
| Scale    | Millions of docs    | Billions of vectors  | Enterprise scale |
| Best For | Dev · Prototyping   | Production SaaS      | Hybrid search    |
09 · Retrieval Augmented Generation Example — RAG vs Naive LLM
What is RAG: Stop thinking of it as a fancy feature. It's just: retrieve relevant chunks, hand them to the LLM, get an answer that's actually grounded in your data.

Core Stack: LangChain handles the orchestration, ChromaDB holds your vectors, OpenAI does the embedding and generation. Each piece is replaceable.

Dev → Prod: Build everything with ChromaDB. When you're ready to ship, swap to Pinecone. Seriously, it's 3 lines.

Don't Skip This: temperature=0, chunk_overlap=200, k=4 chunks retrieved. These defaults took me a while to land on — trust them until you have a reason to change them.

What's Next: Once this works, look into re-ranking (it meaningfully improves answer quality) and hybrid search if your documents have lots of proper nouns or codes that semantic search struggles with.
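Re-ranking, in sketch form: over-fetch candidates from the vector store, then re-score them with a stronger (but slower) relevance model and keep the best few. The relevance function below is a stand-in term-overlap scorer purely for illustration — in practice you'd plug in a real cross-encoder score, for example from the sentence-transformers library:

```python
def rerank(query: str, candidates: list[str], keep: int = 2) -> list[str]:
    """Re-order retrieved chunks by a second, finer-grained relevance score.
    relevance() is a toy stand-in for a cross-encoder; the surrounding
    over-fetch-then-trim pattern is the part that carries over to real code."""
    def relevance(doc: str) -> float:
        q_terms = set(query.lower().split())
        d_terms = set(doc.lower().split())
        return len(q_terms & d_terms) / len(q_terms)

    ranked = sorted(candidates, key=relevance, reverse=True)
    return ranked[:keep]

# Over-fetch (say k=4 from the vector store), then keep the top 2 after re-scoring
candidates = [
    "invoice id formats and billing codes",
    "refund policy for billing errors",
    "office holiday schedule",
    "billing errors and refund timelines",
]
print(rerank("refund for billing errors", candidates))
```

The shape to remember: retrieval casts a wide, cheap net; the re-ranker pays a higher per-document cost only on the handful of survivors.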
Written by Ragavi S
Independent tech writer based in India. I write about AI, Python, and developer tools — mostly things I've actually built and broken myself before writing about them. Founder of Tech Journalism. No sponsored opinions, no hype, just code that works.
Built something with this? Drop a GitHub link in the comments — always curious to see how people extend it. And if something didn't work the way I described, tell me that too.