What is Retrieval-Augmented Generation?
Retrieval-Augmented Generation (RAG) is a prompting technique that enhances large language model (LLM) responses by dynamically retrieving relevant information from external knowledge sources before generating a response. Rather than relying solely on the model’s internal knowledge, RAG incorporates up-to-date, specific, and contextually relevant information from external databases, documents, or knowledge bases.
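Conceptually, the flow has three stages: retrieve relevant context for the query, augment the prompt with that context, and generate the answer. Here is a minimal sketch of that loop; the retriever and generator are passed in as placeholders rather than tied to any particular library:

```typescript
// Minimal RAG flow sketch. `retrieveChunks` and `generateAnswer` are
// placeholders for whatever vector store and LLM client you use.
async function answerWithRAG(
  question: string,
  retrieveChunks: (query: string, topK: number) => Promise<string[]>,
  generateAnswer: (prompt: string) => Promise<string>,
): Promise<string> {
  // 1. Retrieve: fetch the passages most relevant to the question
  const chunks = await retrieveChunks(question, 5)

  // 2. Augment: place the retrieved context ahead of the question
  const prompt = [
    'Answer the question using only the context below.',
    '--- Context ---',
    ...chunks,
    '--- Question ---',
    question,
  ].join('\n')

  // 3. Generate: the model answers grounded in the retrieved context
  return generateAnswer(prompt)
}
```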
Why Use Retrieval-Augmented Generation?
- Factual Accuracy: Access to external knowledge reduces hallucinations and factual errors
- Up-to-Date Information: Retrieves current information beyond the model’s training data
- Domain Specialization: Can access domain-specific knowledge not well-represented in general LLM training
- Knowledge Grounding: Provides citations and sources for statements to increase trustworthiness
- Scalable Knowledge: Can access vast amounts of knowledge without fine-tuning the base model
- Customizable Responses: Tailor responses based on your specific knowledge repositories
Basic Implementation in Latitude
Here’s a simple RAG implementation using Latitude:
```
---
provider: OpenAI
model: gpt-4o
temperature: 0.7
tools:
  - search_knowledge_base:
      description: Retrieve information from the knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve relevant information
---

# Knowledge-Enhanced Response

Answer the user's question about {{ topic }} by first retrieving relevant information.

## User Question:
{{ user_question }}

## Information Retrieval:
I'll search for the most relevant information to answer this question.

## Comprehensive Answer:
Based on the retrieved information, I'll now provide a complete and accurate answer to the question.
```
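At runtime, the caller is responsible for handling the `search_knowledge_base` tool call. The sketch below shows one way to do that with the Latitude SDK; the prompt path `rag/basic-example` and the stub handler are placeholders (a full Pinecone-backed handler appears in the SDK section later):

```typescript
import { Latitude } from '@latitude-data/sdk'

type Tools = { search_knowledge_base: { query: string } }

async function run() {
  const sdk = new Latitude(process.env.LATITUDE_API_KEY, {
    projectId: Number(process.env.PROJECT_ID),
    versionUuid: 'live',
  })

  // 'rag/basic-example' is a placeholder path; point it at your own prompt.
  const result = await sdk.prompts.run<Tools>('rag/basic-example', {
    parameters: { topic: 'quantum computing', user_question: 'What is a qubit?' },
    tools: {
      // Stub handler: replace with a real lookup against your knowledge base.
      search_knowledge_base: async ({ query }) =>
        `No knowledge base connected yet. Received query: "${query}"`,
    },
  })

  console.log(result.response.text)
}

run()
```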
Advanced Implementation with Multiple Sources
This example shows a more sophisticated RAG implementation that retrieves information from multiple sources and evaluates their relevance:
```
---
provider: OpenAI
model: gpt-4o
temperature: 0.3
tools:
  - search_knowledge_base:
      description: Retrieve information from the primary knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve relevant information
  - search_recent_documents:
      description: Retrieve information from recent documents for time-sensitive information
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query to retrieve recent information
---

<step>
# Initial Information Gathering

## User Question:
{{ user_question }}

## Primary Knowledge Search:
Let me search our main knowledge base for relevant information.

## Recent Documents Search:
I'll also check recent documents for any updates or new information.
</step>

<step>
# Information Synthesis

## Retrieved Information:
Let me analyze and synthesize the information from both sources:

1. **Main Knowledge Base Findings:**
   - Key facts and concepts retrieved
   - Relevant background information

2. **Recent Documents Findings:**
   - Updates or new information
   - Changes to previously established information

## Information Evaluation:
- Relevance score of each retrieved piece
- Consistency between sources
- Recency and reliability assessment
</step>

<step>
# Final Response Generation
Based on the synthesized information, I'll now provide a comprehensive answer:

## Answer:
[Comprehensive answer to the user's question]

## Sources:
[List of sources used with citations]

## Confidence Level:
[Assessment of confidence based on quality and consistency of retrieved information]
</step>
```
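On the caller side, both tools need handlers. The sketch below backs them with a single in-memory stub and uses a 30-day cutoff to keep `search_recent_documents` time-sensitive; the corpus, the cutoff, and the keyword matching are illustrative assumptions, and in practice each handler would query your vector store:

```typescript
// Sketch: handlers for the two retrieval tools declared in the prompt above.
// The in-memory corpus and keyword matching are stand-ins for a real vector store.
type Doc = { text: string; date: number }

// Placeholder corpus; in practice this lives in your vector index or document store.
const corpus: Doc[] = []

async function searchDocs(
  query: string,
  opts: { topK: number; newerThan?: number },
): Promise<Doc[]> {
  // Trivial keyword match as a stand-in for embedding-based retrieval.
  return corpus
    .filter((d) => (opts.newerThan ? d.date >= opts.newerThan : true))
    .filter((d) => d.text.toLowerCase().includes(query.toLowerCase()))
    .slice(0, opts.topK)
}

// Pass `tools` to sdk.prompts.run(...) as shown in the SDK section below.
const tools = {
  search_knowledge_base: async ({ query }: { query: string }) => {
    const hits = await searchDocs(query, { topK: 5 })
    return hits.map((h) => h.text).join('\n\n')
  },
  search_recent_documents: async ({ query }: { query: string }) => {
    // Only surface documents from the last 30 days so this tool stays time-sensitive.
    const cutoff = Date.now() - 30 * 24 * 60 * 60 * 1000
    const hits = await searchDocs(query, { topK: 5, newerThan: cutoff })
    return hits
      .map((h) => `[${new Date(h.date).toISOString()}] ${h.text}`)
      .join('\n\n')
  },
}
```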
Domain-Specific RAG Implementation
This example shows how to implement RAG for a specific domain (medical information):
```
---
provider: OpenAI
model: gpt-4o
temperature: 0.2
tools:
  - search_medical_database:
      description: Retrieve information from verified medical databases
      parameters:
        type: object
        additionalProperties: false
        required: ['query', 'search_depth']
        properties:
          query:
            type: string
            description: The medical search query
          search_depth:
            type: string
            enum: ['basic', 'detailed', 'comprehensive']
            description: The depth of search to perform
---

# Medical Information Assistant
I'll answer your medical question by retrieving information from verified medical databases.

## Medical Question:
{{ medical_question }}

## Information Retrieval:
Searching medical databases for clinically validated information...

## Medical Answer:
Based on information from verified medical sources:
[Detailed answer with medical references]

## Important Notice:
This information is for educational purposes only and not a substitute for professional medical advice. Always consult a healthcare provider for medical concerns.
```
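One way to honor the `search_depth` parameter on the caller side is to map it to retrieval breadth. The sketch below does exactly that; the mapping values and the `queryVerifiedSources` helper are assumptions for illustration, not part of the prompt or the Latitude SDK:

```typescript
// Sketch: a handler for `search_medical_database` that maps the prompt's
// `search_depth` onto how many passages are retrieved.
type SearchDepth = 'basic' | 'detailed' | 'comprehensive'

// Illustrative mapping; tune to your own sources and latency budget.
const TOP_K_BY_DEPTH: Record<SearchDepth, number> = {
  basic: 3,
  detailed: 8,
  comprehensive: 20,
}

// Stub: replace with retrieval against your curated medical sources.
async function queryVerifiedSources(
  query: string,
  topK: number,
): Promise<{ text: string; citation: string }[]> {
  return []
}

async function searchMedicalDatabase(args: {
  query: string
  search_depth: SearchDepth
}): Promise<string> {
  const topK = TOP_K_BY_DEPTH[args.search_depth]
  const results = await queryVerifiedSources(args.query, topK)
  // Return each passage with its citation so the prompt can surface sources.
  return results.map((r) => `${r.text}\n(Source: ${r.citation})`).join('\n\n')
}
```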
Best Practices for RAG
To implement retrieval-augmented generation effectively:
- **Optimize Search Queries**
  - Extract key entities and concepts from user questions
  - Use query expansion to find related information
  - Implement query reformulation techniques
- **Vector Database Setup**
  - Choose appropriate embedding models for your content
  - Implement chunking strategies based on content type (see the sketch after this list)
  - Use metadata filtering to improve retrieval precision
- **Result Processing**
  - Rank results by relevance and recency
  - Filter out irrelevant or low-quality retrievals
  - Rerank results based on semantic similarity (see the sketch after this list)
- **Source Integration**
  - Include source attribution in responses
  - Assess source credibility and prioritize reliable sources
  - Handle conflicting information from multiple sources
- **Information Synthesis**
  - Combine information from multiple sources coherently
  - Identify and resolve contradictions
  - Maintain context and factual consistency
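As a concrete illustration of two of these practices, the sketch below shows fixed-size chunking with overlap and a simple rerank that blends similarity with recency. The chunk size, overlap, weights, and score threshold are illustrative assumptions that should be tuned to your content:

```typescript
// Sketch of two best practices: overlapping chunking and a simple rerank step.

// Split text into fixed-size chunks with overlap so sentences that straddle a
// boundary still appear intact in at least one chunk.
function chunkText(text: string, chunkSize = 800, overlap = 100): string[] {
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize))
  }
  return chunks
}

type Retrieved = { content: string; score: number; updatedAt: number }

// Rerank retrieved chunks by blending vector similarity with recency, then
// drop anything below a minimum relevance threshold.
function rerank(results: Retrieved[], minScore = 0.6): Retrieved[] {
  const now = Date.now()
  const daysOld = (t: number) => (now - t) / (1000 * 60 * 60 * 24)
  const blended = (r: Retrieved) => 0.8 * r.score + 0.2 * (1 / (1 + daysOld(r.updatedAt)))
  return results
    .filter((r) => r.score >= minScore)
    .sort((a, b) => blended(b) - blended(a))
}
```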
Integrating RAG with the Latitude SDK
Here’s how to implement RAG using the Latitude SDK with external knowledge sources:
```typescript
import { Pinecone } from '@pinecone-database/pinecone'
import OpenAI from 'openai'
import { Latitude } from '@latitude-data/sdk'

type Tools = { search_knowledge_base: { query: string } }

const PINECONE_INDEX_NAME = 'your-knowledge-index'

async function run() {
  const sdk = new Latitude(process.env.LATITUDE_API_KEY, {
    projectId: Number(process.env.PROJECT_ID),
    versionUuid: 'live',
  })
  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })
  const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY })
  const pc = pinecone.Index(PINECONE_INDEX_NAME)

  const userQuestion = "What's the latest research on quantum computing applications?"

  const result = await sdk.prompts.run<Tools>('rag/quantum-research', {
    parameters: { user_question: userQuestion },
    tools: {
      search_knowledge_base: async ({ query }) => {
        // Create an embedding from the search query
        const embedding = await openai.embeddings
          .create({
            input: query,
            model: 'text-embedding-3-small',
          })
          .then((res) => res.data[0].embedding)

        // Search the vector database for the closest matches
        const searchResults = await pc.query({
          vector: embedding,
          topK: 5,
          includeMetadata: true,
        })

        // Format and return the retrieved content to the prompt
        return searchResults.matches
          .map((match) => match.metadata?.content)
          .join('\n\n')
      },
    },
  })

  console.log('AI Response:', result.response.text)
}

run()
```
Advanced RAG Techniques
Recursive Retrieval
Implement multi-hop retrieval for complex questions:
```
---
provider: OpenAI
model: gpt-4o
temperature: 0.5
tools:
  - search_knowledge_base:
      description: Retrieve information from the knowledge base
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
            description: The search query
---

<step>
# Question Decomposition

## Original Question:
{{ complex_question }}

## Sub-questions:
Let me break this down into smaller, answerable sub-questions:
1. [First sub-question]
2. [Second sub-question]
3. [Third sub-question]
</step>

<step>
# Progressive Retrieval
I'll search for information to answer each sub-question:

## First Sub-question Search:
[Search and retrieve information]

## Second Sub-question Search:
[Search and retrieve information]

## Third Sub-question Search:
[Search and retrieve information]
</step>

<step>
# Integrated Response
Based on all the information retrieved for each sub-question:

## Comprehensive Answer:
[Integrated answer that addresses the original complex question]
</step>
```
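The same multi-hop idea can also be driven from application code: retrieve for each sub-question in turn and let earlier findings enrich later queries. A minimal sketch follows, with `retrieve` and `askModel` passed in as placeholders for your vector search and model call:

```typescript
// Sketch of multi-hop retrieval: retrieve for each sub-question in sequence,
// feeding accumulated context into subsequent queries.
async function recursiveRetrieve(
  subQuestions: string[],
  retrieve: (query: string) => Promise<string[]>,
  askModel: (prompt: string) => Promise<string>,
): Promise<string> {
  let context = ''
  for (const question of subQuestions) {
    // Enrich each search with what has already been learned.
    const query = context ? `${question}\nKnown so far: ${context}` : question
    const passages = await retrieve(query)
    context += `\n\nQ: ${question}\n${passages.join('\n')}`
  }
  // Final synthesis over everything retrieved across the hops.
  return askModel(`Using the notes below, answer the original question.\n${context}`)
}
```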
Hybrid Retrieval
Combine different retrieval methods for better results:
```
---
provider: OpenAI
model: gpt-4o
temperature: 0.3
tools:
  - keyword_search:
      description: Traditional keyword-based search
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
  - semantic_search:
      description: Vector-based semantic search
      parameters:
        type: object
        additionalProperties: false
        required: ['query']
        properties:
          query:
            type: string
---

# Hybrid Retrieval System

## User Query:
{{ user_query }}

## Keyword Search:
I'll perform a traditional keyword search for exact matches.

## Semantic Search:
I'll also conduct a semantic search to find conceptually related information.

## Combined Results:
Analyzing and combining results from both search methods to provide the most comprehensive answer:
[Comprehensive answer combining both search approaches]
```
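A common way to merge the two result lists on the caller side is reciprocal rank fusion (RRF), which rewards documents that rank highly in either list. The sketch below is generic; the `Hit` shape and the constant `k = 60` are conventional assumptions rather than anything mandated by the prompt:

```typescript
// Sketch of hybrid retrieval: merge keyword and semantic result lists with
// reciprocal rank fusion (RRF).
type Hit = { id: string; content: string }

function reciprocalRankFusion(lists: Hit[][], k = 60): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>()
  for (const list of lists) {
    list.forEach((hit, rank) => {
      const entry = scores.get(hit.id) ?? { hit, score: 0 }
      // Higher-ranked hits (smaller rank) contribute more to the fused score.
      entry.score += 1 / (k + rank + 1)
      scores.set(hit.id, entry)
    })
  }
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .map((e) => e.hit)
}

// Usage (placeholders): fuse both tools' results before handing them to the model.
// const fused = reciprocalRankFusion([await keywordSearch(q), await semanticSearch(q)])
```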
Combining RAG with Other Techniques
Retrieval-Augmented Generation works well when combined with other prompting techniques:
- Chain-of-Thought with RAG: Combine retrieved information with step-by-step reasoning for complex problem-solving.
- Self-Consistency and RAG: Generate multiple RAG-enhanced responses and select the most consistent one.
- Few-Shot Learning with RAG: Augment few-shot examples with retrieved information to improve performance on specialized tasks.
- Constitutional AI with RAG: Use retrieved guidelines or policies to ensure AI responses comply with specific rules.
- Template-Based Prompting with RAG: Use retrieved information to fill in template slots for more accurate and contextual responses.
Real-World Applications
RAG is particularly valuable in these domains:
- Enterprise Knowledge Management: Access to internal documents, policies, and knowledge bases
- Legal Research: Retrieving relevant case law, statutes, and legal opinions
- Medical Information Systems: Accessing up-to-date medical research and clinical guidelines
- Customer Support: Retrieving product information and troubleshooting guides
- Educational Platforms: Providing accurate and source-backed answers to student questions
- Financial Analysis: Accessing market data and financial reports for informed analysis