Evaluations Overview
Evaluations are automated monitors that score incoming traces. They track known issue patterns, catch regressions, and show whether a production problem is getting better or worse. Latitude can use different strategies depending on the issue. Some monitors check structural signals, while others use LLM judgment for behavior that requires semantic understanding. For issue-generated evaluations, Latitude chooses the strategy from the available examples and feedback.
What Is an Evaluation
An evaluation defines a quality check for traces. Each evaluation has:
- A name: The issue or behavior being monitored
- A description: What the evaluation is trying to detect
- A detection strategy: How Latitude decides whether a trace matches the issue
- A trigger configuration: Which traces to monitor and at what sample rate
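As a rough illustration, the four parts listed above could be modeled as a small data structure. This is only a sketch: the class names, field names (`sample_rate`, `filters`), and strategy labels are hypothetical and are not Latitude's actual schema or SDK.

```python
from dataclasses import dataclass, field


@dataclass
class TriggerConfig:
    """Hypothetical trigger configuration: which traces to monitor, and how often."""
    sample_rate: float = 1.0  # fraction of matching traces to evaluate (0.0 to 1.0)
    filters: dict[str, str] = field(default_factory=dict)  # assumed trace-attribute filters


@dataclass
class Evaluation:
    """Hypothetical model of the four parts of an evaluation."""
    name: str               # the issue or behavior being monitored
    description: str        # what the evaluation is trying to detect
    strategy: str           # assumed labels, e.g. "structural" or "llm_judge"
    trigger: TriggerConfig  # which traces to monitor and at what sample rate


evaluation = Evaluation(
    name="Empty responses",
    description="Detects traces where the assistant returned no text.",
    strategy="structural",
    trigger=TriggerConfig(sample_rate=0.25),
)
```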
How Evaluations Work
- A trace completes in your project.
- Latitude checks whether it matches each active evaluation’s trigger configuration.
- Matching evaluations analyze the trace.
- Each evaluation returns a pass/fail verdict with feedback.
- Latitude creates a score attached to the trace.
- Failed scores feed back into issue discovery.
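The steps above can be sketched as a small scoring loop. Everything here is an assumption made for illustration: the trace is a plain dictionary, the trigger check is reduced to sampling, and `empty_response_check` stands in for whatever detection strategy an evaluation uses. It is not Latitude's implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    feedback: str


@dataclass
class Score:
    trace_id: str
    evaluation: str
    passed: bool
    feedback: str


@dataclass
class Evaluation:
    name: str
    sample_rate: float                 # hypothetical trigger setting (0.0 to 1.0)
    detect: Callable[[dict], Verdict]  # hypothetical detection strategy


def empty_response_check(trace: dict) -> Verdict:
    """A structural check: fail when the trace has no output text."""
    text = (trace.get("output") or "").strip()
    if text:
        return Verdict(True, "Response contains text.")
    return Verdict(False, "Response is empty.")


def score_trace(trace: dict, evaluations: list[Evaluation], rng: random.Random) -> list[Score]:
    """Run every evaluation whose trigger selects this completed trace."""
    scores = []
    for ev in evaluations:
        # Trigger check, reduced to sampling in this sketch.
        if rng.random() >= ev.sample_rate:
            continue
        verdict = ev.detect(trace)  # pass/fail verdict with feedback
        scores.append(Score(trace["id"], ev.name, verdict.passed, verdict.feedback))
    return scores


def failed(scores: list[Score]) -> list[Score]:
    """Failed scores are the ones that would feed back into issue discovery."""
    return [s for s in scores if not s.passed]
```

A sample rate of `1.0` evaluates every trace in this sketch, and `0.0` evaluates none, because `rng.random()` returns a value in the half-open range from 0.0 up to but not including 1.0.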
Evaluation Strategies
Clear structural failures, such as tool errors or empty responses, can often be monitored directly. Semantic behavior, such as relevance, refusal quality, or whether an answer resolved the user’s request, may need LLM judgment. You do not need to choose the strategy manually for issue-generated evaluations. Latitude builds a monitor from the issue’s traces and scores.
Realignment
Evaluations improve as more evidence arrives. New annotations, flagger matches, evaluation results, and custom scores help Latitude keep monitors calibrated to recent examples and human judgment. See Alignment.
Creating Evaluations
From Issues
Generate an evaluation from an issue to monitor that failure pattern on future traces.
From Known Requirements
You can also create evaluations for behaviors you already know you want to enforce, such as answer completeness, policy compliance, formatting requirements, or task success.
Evaluation Lifecycle
- Active: Monitoring matching traces in real time
- Paused: Temporarily disabled by setting sampling to 0; configuration is preserved
- Archived: Read-only and no longer monitoring new traces
- Deleted: Removed from management views while historical results remain represented in analytics
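One way to read the lifecycle states above is as a value derived from a few settings. The sketch below assumes hypothetical `archived` and `deleted` flags alongside the sample rate, and an assumed precedence between them; it illustrates the relationships only and does not describe how Latitude stores state.

```python
from enum import Enum


class Lifecycle(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    ARCHIVED = "archived"
    DELETED = "deleted"


def lifecycle_state(sample_rate: float, archived: bool = False, deleted: bool = False) -> Lifecycle:
    """Derive the lifecycle state from hypothetical settings."""
    if deleted:
        return Lifecycle.DELETED   # removed from management views
    if archived:
        return Lifecycle.ARCHIVED  # read-only, no longer monitoring
    if sample_rate == 0:
        return Lifecycle.PAUSED    # sampling set to 0, configuration preserved
    return Lifecycle.ACTIVE


def monitors_new_traces(state: Lifecycle) -> bool:
    """Only active evaluations score incoming traces."""
    return state is Lifecycle.ACTIVE
```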