RAG Implementation
Learn how to implement a Retrieval-Augmented Generation (RAG) workflow with Latitude
This guide demonstrates how to implement a RAG (Retrieval-Augmented Generation) workflow using Latitude, OpenAI, and Pinecone.
The full code for this guide is available in the RAG implementation example.
Overview
RAG is a technique that enhances LLM responses by providing relevant context from your data. The workflow consists of:
- Receiving a query from the LLM (issued as a tool call)
- Converting the query into an embedding using OpenAI
- Finding relevant documents in Pinecone using vector similarity search
- Sending the retrieved context back to the LLM
The example uses three main services:
- Latitude for LLM orchestration
- OpenAI for generating embeddings
- Pinecone as the vector database
Prerequisites
You’ll need:
- A Latitude API key and project ID
- An OpenAI API key
- A Pinecone API key and index
- Node.js and npm/yarn/pnpm installed
Implementation
Environment Variables
Set up these environment variables:
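For reference, a `.env` file along these lines covers the services used in this guide. The variable names here are assumptions; match them to whatever your code actually reads:

```
LATITUDE_API_KEY=your-latitude-api-key
LATITUDE_PROJECT_ID=your-latitude-project-id
OPENAI_API_KEY=your-openai-api-key
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_INDEX=your-pinecone-index-name
```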
Initialize the services
First, initialize the services:
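A minimal initialization sketch, assuming the `@latitude-data/sdk`, `openai`, and `@pinecone-database/pinecone` packages and the environment variables above (the Latitude constructor options shown here are an assumption; check the SDK reference for your version):

```typescript
import { Latitude } from '@latitude-data/sdk'
import OpenAI from 'openai'
import { Pinecone } from '@pinecone-database/pinecone'

// Latitude client, scoped to the project that contains the document we run below
const latitude = new Latitude(process.env.LATITUDE_API_KEY!, {
  projectId: Number(process.env.LATITUDE_PROJECT_ID),
})

// OpenAI client, used only to generate query embeddings
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

// Pinecone client and the index that holds the document embeddings
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! })
const index = pinecone.index(process.env.PINECONE_INDEX!)
```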
The RAG Query Tool
The `ragQueryTool` function handles the core RAG functionality:
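A sketch of what this function might look like, using OpenAI's embeddings API and Pinecone's `query` operation. The metadata field names (`title`, `content`) are assumptions based on the description below:

```typescript
// Converts the query into an embedding and retrieves the most similar documents
async function ragQueryTool(query: string) {
  // 1. Embed the query with OpenAI's text-embedding-3-small model
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  })
  const vector = embedding.data[0].embedding

  // 2. Look up the 10 nearest vectors in Pinecone, including their metadata
  const results = await index.query({
    vector,
    topK: 10,
    includeMetadata: true,
  })

  // 3. Return only the metadata of each match (assumed to hold title and content)
  return results.matches.map((match) => ({
    title: match.metadata?.title,
    content: match.metadata?.content,
  }))
}
```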
This function:
- Takes a query string and converts it to an embedding using OpenAI’s text-embedding-3-small model
- Searches Pinecone for the 10 most similar vectors
- Returns the matching documents’ metadata (title and content)
Handling the Conversation Flow
The conversation flow is managed through Latitude:
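A sketch of this flow is shown below. The Latitude SDK method names (`prompts.run`, `prompts.chat`) and the exact shapes of the conversation, tool-call, and tool-result objects are assumptions; check the Latitude SDK reference for the precise API of the version you install.

```typescript
// Runs the 'geography-quizz' document and resolves any RAG tool calls it emits.
// Method names and message shapes below are assumptions; adjust to your SDK version.
async function runGeographyQuiz() {
  // 1. Run the Latitude document that drives the conversation
  const result = await latitude.prompts.run('geography-quizz', { stream: false })
  if (!result) return

  // 2. Inspect the conversation history and its latest message
  const conversation = result.conversation
  const lastMessage: any = conversation[conversation.length - 1]

  // 3. If the assistant asked for retrieval via a tool call, resolve it
  if (lastMessage?.role === 'assistant' && lastMessage.toolCalls?.length) {
    const toolCall = lastMessage.toolCalls[0]
    // Tool-call arguments may arrive as a JSON string or an object (assumed shape)
    const args =
      typeof toolCall.arguments === 'string'
        ? JSON.parse(toolCall.arguments)
        : toolCall.arguments

    // 4. Fetch relevant documents from Pinecone
    const documents = await ragQueryTool(args.query)

    // 5. Send the retrieved context back through Latitude's chat API so the
    //    model can produce its final, grounded answer (message shape assumed)
    await latitude.prompts.chat(result.uuid, [
      {
        role: 'tool',
        content: JSON.stringify(documents),
        toolCallId: toolCall.id,
      } as any,
    ])
  }
}
```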
This section:
- Runs a Latitude document (in this case, ‘geography-quizz’)
- Gets the conversation history and latest message
- If the last message is from the assistant and includes tool calls:
  - Extracts the query from the tool call
  - Runs the RAG query tool
  - Sends the results back to the conversation using Latitude’s chat API
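With the pieces above in place, the whole flow can be kicked off from a simple entry point (the `runGeographyQuiz` function name comes from the sketch above):

```typescript
// Entry point: run the quiz and resolve any RAG tool calls it produces
runGeographyQuiz().catch((error) => {
  console.error('RAG workflow failed:', error)
  process.exit(1)
})
```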