RAG
Retrieval-Augmented Generation
Inject relevant context into an LLM prompt at runtime, retrieved from your own data, so the model answers from your knowledge instead of its training data.
RAG is the architecture that lets an LLM answer questions about data it was not trained on. The mechanism is straightforward: take the user’s question, retrieve the most relevant chunks from your corpus (via vector search, keyword search, or a hybrid), stuff them into the prompt, and let the model answer. It is the grounding layer underneath most useful AI agents.
The reason RAG exists: training a model on your private data is expensive, slow, and obsolete the moment your data changes. RAG sidesteps that by treating the data as runtime context. The trade-off is that retrieval quality becomes the bottleneck: a model with bad context produces confidently wrong answers.
Worked example of the classic failure: a company builds a support bot over its help docs, the demo is impressive, and then in production it confidently cites the wrong refund policy. The instinct is to blame the model and try a bigger one. The answers stay wrong, because the problem is upstream: the docs were chunked mid-sentence, the embeddings cannot tell “refund” from “return”, and there is no reranker to push the right passage to the top. Swap in better chunking, hybrid search, and a reranker and the same model suddenly looks smart. The lesson generalises: when RAG is wrong, suspect retrieval before generation almost every time.
The honest trade-off and where RAG breaks down: it excels at lookup-shaped questions answerable from a few passages, and degrades when the answer requires synthesising across many documents or reconciling contradictions in the corpus. At that point you want a smaller curated corpus, an agentic pipeline that reasons in steps, or fine-tuning, not a bigger retriever. The unsexy truth is that 80% of the work is making retrieval good (chunking, embeddings, reranking, hybrid search) and 20% is the model. Wrapping your data sources as MCP servers makes the retrieval layer portable, but it does not make it good. Vendors that pitch RAG as a one-click feature are pitching the easy part.