RAG β Retrieval-Augmented Generation¶
Give a model access to your knowledge β documents, wikis, tickets β so it answers from facts instead of guessing. RAG is the most common way to build accurate, up-to-date AI features.
Overview¶
LLMs only know what was in their training data, up to a cutoff date, and they don't know your private documents at all. Retrieval-Augmented Generation (RAG) solves both: at query time you retrieve the most relevant pieces of your content and put them in the prompt, so the model generates an answer grounded in those facts β with citations you can verify.
Learning Objectives¶
By the end of this section you will be able to:
- Explain the RAG pipeline end-to-end and when to use it.
- Chunk documents effectively (the biggest quality lever).
- Store and search embeddings with a vector database.
- Improve retrieval with hybrid search and reranking.
- Evaluate a RAG system so you can improve it with confidence.
The RAG pipeline¶
RAG has two phases: an offline indexing phase and an online query phase.
flowchart TB
subgraph "Indexing (offline, once per update)"
D[Documents] --> C[Chunk] --> E1[Embed chunks] --> V[(Vector DB)]
end
subgraph "Query (online, per request)"
Q[User question] --> E2[Embed query]
E2 --> S[Search: find top-k<br/>relevant chunks]
V --> S
S --> RR[Optional: rerank]
RR --> P[Build prompt:<br/>question + chunks]
P --> L[LLM]
L --> A[Grounded answer<br/>+ citations]
end
What you'll learn¶
-
How you split documents determines what can be retrieved. The highest-leverage decision in RAG.
-
Store embeddings and find nearest neighbors fast, at scale.
-
Combine keyword + semantic search, then reorder for precision.
-
Measure retrieval and answer quality so you can actually improve.
When to use RAG (and when not to)¶
| Use RAG when⦠| Consider alternatives when⦠|
|---|---|
| Answers must come from your private/changing data | The knowledge is static and small β just put it in the prompt |
| You need citations and verifiability | You need new skills/behavior β fine-tuning |
| The knowledge base is large | The task needs actions β tools/agents |
RAG is a retrieval problem first
Most "the LLM gave a bad answer" problems in RAG are actually retrieval problems β the right information never made it into the prompt. Debug retrieval before blaming the model.
Prerequisites¶
This section assumes you understand Embeddings. If "cosine similarity" is unfamiliar, read that first.