Overview
This guide shows you how to get full LLM observability for ElevenLabs Agents in Latitude.
ElevenLabs Agents is a fully hosted platform: speech-to-text, the LLM loop, and text-to-speech all run on ElevenLabs’ infrastructure, and ElevenLabs does not export traces to third parties. When your agent uses an ElevenLabs-managed LLM, the model calls are not observable from outside the platform.
The way in is ElevenLabs’ Custom LLM feature: point your agent at an OpenAI-compatible server you run, and instrument that server with Latitude. Your server receives the real LLM traffic (system prompt, full conversation history, and tool definitions), so Latitude captures every turn with actual prompts, completions, token usage, and latency.
Your server is a thin streaming proxy in front of any OpenAI-compatible provider. The agent keeps running on ElevenLabs exactly as before; only the LLM calls route through code you can observe.
Using ElevenLabs only as the TTS/STT plugin inside LiveKit Agents? You
don’t need this guide — instrument the LiveKit side instead. See
LiveKit Agents.
Requirements
- A Latitude account and API key
- A Latitude project slug
- An ElevenLabs agent and an API key for an OpenAI-compatible LLM provider
- A publicly reachable URL for your server (use a tunnel like ngrok during development)
Steps
Run an instrumented custom LLM server
Expose a /v1/chat/completions endpoint that forwards requests to your LLM provider and streams the response back as Server-Sent Events. Initializing Latitude with the OpenAI instrumentation is all it takes for every forwarded call to be traced.
Run the Python server with uvicorn server:app --port 8013.
ElevenLabs requires streaming responses (Content-Type: text/event-stream), and your provider must support OpenAI-style function calling if your agent uses tools: ElevenLabs system tools (end_call, transfer_to_agent, etc.) arrive in the standard tools parameter.
Point your ElevenLabs agent at the server
In the ElevenLabs dashboard, open your agent’s settings:
- In the LLM dropdown, select Custom LLM
- Enter your Server URL (e.g. https://your-server.example.com/v1) and the Model ID your provider expects
- Under API key, create a secret with your LLM provider’s key; ElevenLabs forwards it as the Authorization header
- Publish the agent
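On the server side, handling this configuration might look like the following minimal sketch. It assumes the server rebuilds each incoming request for the provider using only the Python standard library and passes the forwarded Authorization header through unchanged. PROVIDER_BASE_URL and build_upstream_request are hypothetical names, and the Latitude instrumentation itself is omitted.

```python
import json
import urllib.request

# Hypothetical provider base URL; replace it with your OpenAI-compatible provider.
PROVIDER_BASE_URL = "https://api.example-provider.com/v1"


def build_upstream_request(body: dict, authorization: str) -> urllib.request.Request:
    """Build (but do not send) the provider request for one agent turn."""
    payload = dict(body)
    # ElevenLabs requires a streamed response, so always ask the provider to stream.
    payload["stream"] = True
    return urllib.request.Request(
        f"{PROVIDER_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
            # The secret configured in the dashboard arrives in this header.
            "Authorization": authorization,
        },
        method="POST",
    )
```

Building the request separately from sending it keeps the forwarding logic easy to unit test without network access.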
What you get
Because your server receives the exact requests ElevenLabs builds for the model, Latitude shows each conversation turn as a real LLM call:
- Input messages: the agent’s system prompt and the full conversation history so far
- Output messages: the assistant response text and any tool calls the model emitted
- Tool definitions: your agent tools and ElevenLabs system tools, as sent in the request
- Model, token usage, and latency: from the provider’s streamed response
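To show where the output and usage fields come from, here is a minimal sketch of folding a streamed response back into a single completion. It assumes the provider emits OpenAI-style chat completion chunks as data: lines terminated by data: [DONE], and that usage, when present, arrives on a chunk of its own (with OpenAI this requires stream_options.include_usage; other providers may differ). collect_stream is a hypothetical helper, not part of the Latitude SDK.

```python
import json


def collect_stream(sse_text: str) -> dict:
    """Fold OpenAI-style SSE chunks into text, tool-call names, model, and usage."""
    result = {"model": None, "content": "", "tool_calls": [], "usage": None}
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        result["model"] = chunk.get("model") or result["model"]
        if chunk.get("usage"):
            result["usage"] = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            result["content"] += delta.get("content") or ""
            for call in delta.get("tool_calls") or []:
                name = (call.get("function") or {}).get("name")
                if name:  # the name is sent once; later chunks carry arguments only
                    result["tool_calls"].append(name)
    return result
```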
Grouping turns into conversations
Each agent turn is a separate LLM call, so by default turns appear as separate traces. To group them, enable Custom LLM extra body in your agent’s LLM settings and pass identifiers (e.g. a conversation id) from your client as overrides. They arrive on the request as elevenlabs_extra_body. Use them with capture() to set session_id and user_id on the spans; see the Python SDK or TypeScript SDK guides.
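As a minimal sketch of the first half of that flow, the helper below pulls identifiers out of the request body. The key names conversation_id and user_id are assumptions about what your client sends, and extract_session is a hypothetical helper. The returned values are what you would pass to capture() as described in the SDK guides.

```python
from typing import Optional, TypedDict


class SessionInfo(TypedDict):
    session_id: Optional[str]
    user_id: Optional[str]


def extract_session(body: dict) -> SessionInfo:
    """Read grouping identifiers from elevenlabs_extra_body, if present."""
    extra = body.get("elevenlabs_extra_body")
    if not isinstance(extra, dict):
        extra = {}  # extra body disabled, or the client sent nothing
    # Assumed key names; use whatever your client passes as overrides.
    conversation_id = extra.get("conversation_id")
    user_id = extra.get("user_id")
    return {
        "session_id": str(conversation_id) if conversation_id is not None else None,
        "user_id": str(user_id) if user_id is not None else None,
    }
```

Returning None when an identifier is missing lets the server fall back to ungrouped traces instead of failing the turn.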
Seeing your traces
Once connected, traces appear automatically in Latitude:
- Open your project in the Latitude dashboard
- Each agent turn shows the LLM call with its input/output conversation
- Token usage and latency are aggregated at every level
ElevenLabs-managed LLMs (where you don’t bring your own endpoint) cannot be
traced this way — the calls never leave ElevenLabs’ infrastructure. For those
agents, conversation transcripts are available via ElevenLabs’
post-call webhooks
and Conversations API.