RAG (Retrieval-Augmented Generation), also referred to as **grounding** (especially in the Google ecosystem), is a method for supplying additional context to an LLM at inference time to improve its responses.
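A minimal sketch of the core pattern, with hypothetical `retrieve` and `generate` placeholders standing in for a real vector-store search and LLM call:

```python
def answer_with_rag(question: str, retrieve, generate, k: int = 3) -> str:
    """Minimal RAG loop: fetch relevant chunks, then ground the LLM's answer in them.

    `retrieve` and `generate` are placeholders (hypothetical) for your
    vector-store search and LLM call, respectively.
    """
    chunks = retrieve(question, k=k)      # top-k passages from the knowledge base
    context = "\n\n".join(chunks)         # concatenate retrieved text
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)               # the LLM sees the grounding context at inference time
```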
There are many flavors of RAG, with more emerging. A subfield that employs knowledge graphs is known as [[graph RAG]]. Cache RAG (cache-augmented generation) instead loads the entire knowledge base into the model's context, precomputed as key-value pairs in the KV cache so it does not have to be re-encoded with every prompt.
RAG involves a workflow that includes:
- [[chunking]] (sketched below)
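One common chunking strategy is fixed-size character windows with overlap; a minimal sketch follows (the window and overlap sizes are arbitrary choices, not prescribed values):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping fixed-size chunks.

    Overlap keeps sentences that straddle a boundary visible in both
    neighboring chunks, which helps retrieval recall.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```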
A dense retriever is the mechanism by which most modern RAG systems retrieve passages for Q&A. Earlier systems used keyword-based sparse methods, first TF-IDF and later BM25. A dense retriever instead encodes queries and passages into dense embedding vectors that capture semantics, and retrieves by vector similarity.
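A minimal sketch of dense retrieval, assuming the sentence-transformers library and the all-MiniLM-L6-v2 bi-encoder (any bi-encoder would do); a BM25-style retriever would instead score on exact term overlap:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Bi-encoder that maps text to dense vectors; the model choice is an assumption.
model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "The mitochondria is the powerhouse of the cell.",
    "BM25 ranks documents by keyword term frequency and inverse document frequency.",
    "Dense retrievers embed queries and passages into the same vector space.",
]

# Encode once, offline; normalized vectors make dot product equal cosine similarity.
passage_vecs = model.encode(passages, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages whose embeddings are most similar to the query's."""
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = passage_vecs @ query_vec   # cosine similarity via dot product
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

print(retrieve("How do semantic search systems represent text?"))
```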