Scores

Scores are the universal measurement unit in Latitude. Every verdict on your agent’s interactions, whether from an automated evaluation, a human annotation, or your own code, is a score. Everything in Latitude’s reliability system is built on top of scores: issues, evaluation dashboards, annotation workflows, simulations, and analytics.

What Is a Score

A score is a verdict attached to a trace. Every score has:
  • Value: A number between 0 and 1
  • Pass / Fail: Whether the interaction met expectations
  • Feedback: Text explaining the verdict
  • Source: Where the score came from: evaluation, annotation, or custom
Scores can also carry resource usage fields like duration, token count, and cost. Scores are always associated with a trace. They can optionally be associated with a specific span, a session, a simulation, or an issue.
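As a rough mental model, a score record could be sketched as a TypeScript type like the one below. The field names and types are chosen only for illustration and are not Latitude's actual API schema.

```typescript
// Illustrative shape only -- field names and types are assumptions,
// not Latitude's actual API schema.
type ScoreSource = "evaluation" | "annotation" | "custom";

interface Score {
  value: number;       // between 0 and 1
  passed: boolean;     // whether the interaction met expectations
  feedback: string;    // text explaining the verdict
  source: ScoreSource; // evaluation, annotation, or custom

  // Optional resource usage fields
  durationMs?: number;
  tokens?: number;
  costUsd?: number;

  // Always associated with a trace; other associations are optional
  traceId: string;
  spanId?: string;
  sessionId?: string;
  simulationId?: string;
  issueId?: string;
}
```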

Score Sources

Every score has a source that identifies how it was produced:

Evaluation Scores

Produced by automated scripts that Latitude runs on your traces. When a trace matches an evaluation’s trigger configuration, the evaluation executes and writes a score. Evaluation scores are the backbone of continuous monitoring: they run on every matching trace automatically, giving you real-time quality visibility.

Annotation Scores

Produced by human reviewers. When someone annotates a trace through an annotation queue or inline from the trace view, their verdict becomes a score. Annotation scores serve as ground truth. They represent what a human actually thinks about the agent’s behavior and anchor evaluation alignment metrics.

Custom Scores

Submitted by your own code through the Latitude API. Use custom scores for domain-specific quality signals:
  • User satisfaction ratings
  • Task completion metrics
  • Business KPIs (conversion rates, resolution rates)
  • Downstream validation (was the agent’s output actually correct?)
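As a loose sketch, submitting a custom score over HTTP might look like the following. The endpoint URL, payload fields, and auth header here are assumptions for illustration only; the Scores API reference linked under Next Steps documents the real contract.

```typescript
// Hypothetical sketch of submitting a custom score over HTTP.
// The endpoint path, payload fields, and auth header are assumptions;
// consult the Scores API reference for the actual contract.
async function submitCustomScore(traceId: string, resolved: boolean): Promise<void> {
  const response = await fetch("https://gateway.latitude.so/scores", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LATITUDE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      traceId,
      value: resolved ? 1 : 0, // map a business signal onto the 0..1 range
      passed: resolved,
      feedback: resolved
        ? "Ticket resolved without escalation"
        : "Ticket escalated to a human agent",
      source: "custom",
    }),
  });
  if (!response.ok) {
    throw new Error(`Score submission failed: ${response.status}`);
  }
}
```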

How Scores Work

Scores from human annotations start as drafts. A draft score:
  • Persists immediately so it survives page refreshes
  • Is visible in annotation queue review and in-progress editing
  • Does not appear in analytics, issue discovery, or alignment metrics
  • Can be edited and revised while still in draft state
Drafts are finalized automatically after a quiet period (default: 5 minutes after the last edit). System-created drafts (from automatic queue classification) wait for explicit human review before finalization. Once a score is finalized, it becomes permanent and cannot be edited.
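The finalization rule can be pictured with a small sketch. The code below only models the behavior described above; it is not Latitude's implementation.

```typescript
// Illustrative only: models the draft finalization rule described above.
const QUIET_PERIOD_MS = 5 * 60 * 1000; // default: 5 minutes after the last edit

interface DraftScore {
  lastEditedAt: number;     // epoch milliseconds of the most recent edit
  createdBySystem: boolean; // e.g. automatic queue classification
  humanReviewed: boolean;   // has a human explicitly reviewed this draft?
}

function shouldFinalize(draft: DraftScore, now: number): boolean {
  // System-created drafts wait for explicit human review.
  if (draft.createdBySystem && !draft.humanReviewed) return false;
  // Otherwise, finalize once the quiet period has elapsed.
  return now - draft.lastEditedAt >= QUIET_PERIOD_MS;
}
```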

How Scores Flow Through the System

Scores feed forward into every part of Latitude:
  1. Issue Discovery: When scores fail, Latitude groups similar failures into issues: named, trackable failure patterns your team can investigate and resolve.
  2. Evaluation Generation: Issues can generate monitoring evaluations that watch for that failure pattern on live traffic, producing more scores.
  3. Alignment: Annotation scores are compared against evaluation scores for the same traces, producing alignment metrics that tell you how well automated evaluations match human judgment (see the sketch after this list).
  4. Analytics: Finalized scores feed into time-series dashboards showing quality trends across your project.
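As an illustration of step 3, one simple alignment metric is the agreement rate between human and automated pass/fail verdicts on the same traces. The sketch below assumes that framing; Latitude's actual alignment metrics may be computed differently.

```typescript
// Illustrative agreement-rate calculation between annotation and
// evaluation verdicts on the same traces. This only shows the general
// idea, not Latitude's actual alignment metric.
interface Verdict {
  traceId: string;
  passed: boolean;
}

function agreementRate(annotations: Verdict[], evaluations: Verdict[]): number {
  // Index the human verdicts by trace.
  const humanByTrace = new Map<string, boolean>();
  for (const a of annotations) humanByTrace.set(a.traceId, a.passed);

  let shared = 0;
  let agreed = 0;
  for (const ev of evaluations) {
    const human = humanByTrace.get(ev.traceId);
    if (human === undefined) continue; // no human verdict for this trace
    shared++;
    if (human === ev.passed) agreed++;
  }
  return shared === 0 ? 0 : agreed / shared;
}
```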

Next Steps

  • Annotations: How human reviewers create scores
  • Evaluations: How automated scripts create scores
  • Issues: How failed scores become trackable failure patterns
  • Analytics: Visualizing score trends
  • Scores API: Submit custom scores programmatically