RAG (Retrieval-Augmented Generation), also referred to as **grounding** (especially in the Google ecosystem), is a method for supplying additional context to an LLM at inference time to improve its responses.
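A minimal sketch of the core pattern, with hypothetical `retrieve` and `generate` placeholders standing in for a real vector-store search and LLM call:

```python
def answer_with_rag(question: str, retrieve, generate, k: int = 3) -> str:
    """Minimal RAG loop: fetch relevant chunks, then ground the LLM's answer in them.

    `retrieve` and `generate` are placeholders (hypothetical) for your
    vector-store search and LLM call, respectively.
    """
    chunks = retrieve(question, k=k)      # top-k passages from the knowledge base
    context = "\n\n".join(chunks)         # concatenate retrieved text
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)               # the LLM sees the grounding context at inference time
```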
There are many flavors of RAG, with more emerging. A subfield that employs knowledge graphs is known as [[graph RAG]]. Cache RAG (cache-augmented generation) instead loads the entire knowledge base into the model's context, precomputed as key-value pairs in the KV cache so it does not have to be re-encoded with every prompt.
RAG involves a workflow that includes:
- [[chunking]] (sketched below)
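One common chunking strategy is fixed-size character windows with overlap; a minimal sketch follows (the window and overlap sizes are arbitrary choices, not prescribed values):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping fixed-size chunks.

    Overlap keeps sentences that straddle a boundary visible in both
    neighboring chunks, which helps retrieval recall.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```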
A dense retriever is the mechanism by which most modern RAG systems retrieve passages for Q&A. Earlier systems used keyword-based sparse methods, first TF-IDF and later BM25. A dense retriever instead encodes queries and passages into dense embedding vectors that capture semantics, and retrieves by vector similarity.
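A minimal sketch of dense retrieval, assuming the sentence-transformers library and the all-MiniLM-L6-v2 bi-encoder (any bi-encoder would do); a BM25-style retriever would instead score on exact term overlap:

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Bi-encoder that maps text to dense vectors; the model choice is an assumption.
model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "The mitochondria is the powerhouse of the cell.",
    "BM25 ranks documents by keyword term frequency and inverse document frequency.",
    "Dense retrievers embed queries and passages into the same vector space.",
]

# Encode once, offline; normalized vectors make dot product equal cosine similarity.
passage_vecs = model.encode(passages, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages whose embeddings are most similar to the query's."""
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    scores = passage_vecs @ query_vec   # cosine similarity via dot product
    top = np.argsort(-scores)[:k]
    return [passages[i] for i in top]

print(retrieve("How do semantic search systems represent text?"))
```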