Retrieval-Augmented Generation (RAG)

RAG Fundamentals

Retrieval-Augmented Generation (RAG) enhances LLMs by retrieving relevant information from external knowledge sources and incorporating it into the generation process.

Core Advantages

  • Up-to-date Information

    LLMs can access information beyond their training cutoff date

  • Factual Grounding

    Reduces hallucinations by connecting responses to verified sources

  • Domain Adaptation

    Tailors general LLMs to specific domains without fine-tuning

  • Source Attribution

    Enables citation and verification of information sources

Why RAG over Fine-tuning?

Aspect                 | RAG                        | Fine-tuning
-----------------------|----------------------------|----------------------------
Information updates    | Dynamic & immediate        | Requires retraining
Implementation         | Simpler architecture       | Complex training process
Resource requirements  | Lower compute needs        | High compute & data needs
Data transparency      | Clear provenance           | Black-box knowledge
Best for               | Knowledge-intensive tasks  | Style & capability adaptation

In practice, RAG and fine-tuning are often used together for optimal results.

RAG Architecture

Basic RAG Pipeline

1. User Query

2. Retrieval System

Finds relevant documents

3. Context Augmentation

Enhances prompt with retrieved info

4. LLM Generation

Creates response using augmented context
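
A minimal end-to-end sketch of these four steps in plain Python; the keyword-overlap retriever and the llm callable are illustrative stand-ins for a real retrieval system and model client:

def retrieve(query, documents, k=3):
    # Toy retrieval step: rank documents by word overlap with the query
    query_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(query_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def rag_answer(query, documents, llm):
    # 2. Retrieval -> 3. Context augmentation -> 4. Generation
    context = "\n\n".join(retrieve(query, documents))
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return llm(prompt)  # llm: any callable mapping a prompt string to a completion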

Key Components

Indexing Pipeline

Processes and organizes knowledge sources for efficient retrieval

  1. Document ingestion (PDFs, websites, databases)
  2. Chunking into manageable segments
  3. Embedding generation with vector models
  4. Storage in vector databases for semantic search
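
The four steps above can be sketched with LangChain components; the loader class, file path, and chunking parameters are illustrative choices, not the only options:

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# 1. Ingest a source document (path is illustrative)
docs = PyPDFLoader("knowledge_base/handbook.pdf").load()

# 2. Chunk into manageable, slightly overlapping segments
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3. + 4. Embed the chunks and store them in a vector database
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings(),
                                    persist_directory="./index")
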
Retrieval Mechanisms

Methods for finding the most relevant context

Dense Retrieval

Uses semantic similarity between query and document embeddings

Sparse Retrieval

Keyword-based methods like BM25, TF-IDF

Hybrid Retrieval

Combines dense and sparse methods for better results

Re-ranking

Further refines initial search results for relevance
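
A minimal sketch of the score fusion behind hybrid retrieval, assuming per-document dense (e.g. cosine) and sparse (e.g. BM25) scores have already been computed; the weighting parameter alpha is illustrative:

def hybrid_scores(dense_scores, sparse_scores, alpha=0.5):
    # Normalize each score list to [0, 1] so the two scales are comparable
    def normalize(scores):
        lo, hi = min(scores), max(scores)
        return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

    dense, sparse = normalize(dense_scores), normalize(sparse_scores)
    # Weighted combination: alpha favours semantic similarity,
    # (1 - alpha) favours keyword match
    return [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]

Documents are then re-ordered by the combined score, and a re-ranker can be applied to the top results for further refinement.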

Integration with LLM

How retrieved information is used in generation

Prompt Engineering

Structuring retrieved content effectively in the prompt

Contextual Relevance

Ensuring retrieved information actually addresses the query

Citation & Attribution

Tracking sources through the generation process
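
One simple way to address all three points is to number each retrieved passage in the prompt and instruct the model to cite those numbers; the prompt wording below is only illustrative:

def build_cited_prompt(question, passages):
    # Number each retrieved passage so the model can refer back to it
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n] after each claim. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )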

Advanced RAG Techniques

Enhanced Retrieval

Query Transformation

Improving query effectiveness before retrieval

Query Expansion

Adding related terms to improve recall

Hypothetical Document Embeddings (HyDE)

Using an LLM to generate a hypothetical answer, then embedding that for retrieval

Query Decomposition

Breaking complex queries into simpler sub-queries
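
As an example of these transformations, HyDE fits in a few lines: the LLM drafts a hypothetical answer, and that draft (rather than the raw query) is embedded for the similarity search. Here llm is assumed to be any callable that returns text, and vectorstore any LangChain-style vector store exposing similarity_search:

def hyde_retrieve(query, llm, vectorstore, k=4):
    # 1. Ask the LLM to draft a plausible (possibly imperfect) answer
    hypothetical = llm(f"Write a short passage that answers: {query}")
    # 2. Retrieve real documents that are semantically close to that draft
    return vectorstore.similarity_search(hypothetical, k=k)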

Multi-step Retrieval

Iterative approaches to finding better context

Self-RAG

LLM evaluates and improves its own retrievals

FLARE (Forward-Looking Active REtrieval)

Dynamically retrieves information during generation

ReAct

Interleaving reasoning and retrieval actions
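
The common pattern behind these methods can be sketched as a retrieve-generate-check loop; retrieve, llm, and is_grounded below are hypothetical callables standing in for the concrete mechanisms each method defines (token probabilities in FLARE, self-critique in Self-RAG):

def iterative_rag(query, llm, retrieve, is_grounded, max_steps=3):
    # retrieve: returns a context string for a query
    # is_grounded: placeholder check that the draft is supported by the context
    context = retrieve(query)
    draft = ""
    for _ in range(max_steps):
        draft = llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
        if is_grounded(draft, context):
            break  # the draft is supported by the retrieved evidence
        # Retrieve again, using the draft to sharpen the next search
        context = retrieve(f"{query}\n{draft}")
    return draft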

Optimizing RAG Systems

Chunking Strategies

Better document segmentation for more precise retrieval

Fixed-size Chunks

Simple but may split related content

Semantic Chunking

Based on content meaning & structure

Hierarchical Chunking

Multiple levels of granularity

Sliding Window

Overlapping chunks to preserve context
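
For instance, the sliding-window strategy takes only a few lines of plain Python; the chunk size and overlap are illustrative and are usually tuned per corpus:

def sliding_window_chunks(text, chunk_size=500, overlap=100):
    # Each window starts (chunk_size - overlap) characters after the previous
    # one, so content near chunk boundaries appears in two chunks.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]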

Context Integration

Better ways to incorporate retrieved information

Fusion-in-Decoder

Processing multiple retrieved passages in parallel

Context Compression

Summarizing or distilling retrieved documents before use

Weighted Fusion

Prioritizing more relevant contexts in the final response
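
Context compression, for example, can be sketched as a summarization pass over the retrieved documents before they are placed in the prompt; llm is assumed to be any callable that maps a prompt string to text:

def compress_context(question, docs, llm, max_words=60):
    # Distill each retrieved document down to the parts that bear on the question
    summaries = [
        llm(f"In at most {max_words} words, summarize the parts of the "
            f"following text that are relevant to '{question}':\n\n{doc}")
        for doc in docs
    ]
    return "\n\n".join(summaries)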

Building Effective RAG Systems

Data Preparation

Considerations:

  • Content quality & relevance
  • Data cleaning & normalization
  • Metadata enrichment
  • Regular updates & versioning

Technical Implementation

Popular Tools:

  • LangChain
  • LlamaIndex
  • Pinecone
  • Weaviate
  • Chroma
  • Qdrant

Evaluation

Key Metrics:

  • Relevance of retrieved contexts
  • Answer correctness & completeness
  • Hallucination reduction
  • End-to-end latency

The quality of RAG systems is fundamentally limited by two factors: the quality of the retrieval component and the ability of the LLM to properly utilize the retrieved information.
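
Retrieval quality, the first of those two factors, can be tracked with a simple hit-rate metric over a labelled set of question/relevant-document pairs; the data layout here is illustrative:

def retrieval_hit_rate(eval_set, retrieve, k=5):
    # eval_set: list of (question, id_of_a_known_relevant_document) pairs
    # retrieve: function returning the ids of the top-k documents for a question
    hits = sum(1 for question, relevant_id in eval_set
               if relevant_id in retrieve(question, k=k))
    return hits / len(eval_set)
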
Implementation Example

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.runnable import RunnablePassthrough

# 1. Create vector store from documents
# ("documents" is assumed to be a list of already-loaded Document objects)
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

# 2. Create retriever
retriever = vectorstore.as_retriever()

# 3. Define RAG prompt template
template = """Answer based on the following context:
{context}

Question: {question}
Answer: """

# Helper: join the retrieved documents into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# 4. Create chain that combines retrieval and generation
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | PromptTemplate.from_template(template)
    | OpenAI()
)

# 5. Ask a question
answer = rag_chain.invoke("What are the advantages of RAG over fine-tuning?")
Case Studies

Enterprise Search

Connecting LLMs to internal documents, wikis, and knowledge bases

Examples: Perplexity AI, GitHub Copilot for Business

Legal Contract Analysis

RAG systems that connect to case law and precedent databases

Examples: Harvey AI, Casetext CoCounsel

Medical Decision Support

Systems that retrieve from medical literature and patient records

Examples: Mayo Clinic AI, Nabla Copilot