Scores

Scores are the universal measurement unit in Latitude. Every verdict on your agent’s interactions, whether from an automated evaluation, a human annotation, or your own code, is a score. Everything in Latitude’s reliability system is built on top of scores: issues, evaluation dashboards, annotation workflows, simulations, and analytics.

What Is a Score

A score is a verdict attached to a trace. Every score has:
  • Value: A number between 0 and 1
  • Pass / Fail: Whether the interaction met expectations
  • Feedback: Text explaining the verdict
  • Source: Where the score came from: evaluation, annotation, or custom
Scores can also carry resource usage fields like duration, token count, and cost. Scores are always associated with a trace. They can optionally be associated with a specific span, a session, a simulation, or an issue.
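As a rough mental model, a score record could be sketched as a TypeScript type like the one below. The field names and types are chosen only for illustration and are not Latitude's actual API schema.

```typescript
// Illustrative shape only -- field names and types are assumptions,
// not Latitude's actual API schema.
type ScoreSource = "evaluation" | "annotation" | "custom";

interface Score {
  value: number;       // between 0 and 1
  passed: boolean;     // whether the interaction met expectations
  feedback: string;    // text explaining the verdict
  source: ScoreSource; // evaluation, annotation, or custom

  // Optional resource usage fields
  durationMs?: number;
  tokens?: number;
  costUsd?: number;

  // Always associated with a trace; other associations are optional
  traceId: string;
  spanId?: string;
  sessionId?: string;
  simulationId?: string;
  issueId?: string;
}
```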

Score Sources

Every score has a source that identifies how it was produced:

Evaluation Scores

Produced by automated scripts that Latitude runs on your traces. When a trace matches an evaluation’s trigger configuration, the evaluation executes and writes a score. Evaluation scores are the backbone of continuous monitoring: they run on every matching trace automatically, giving you real-time quality visibility.

Annotation Scores

Produced by human reviewers. When someone annotates a trace through an annotation queue or inline from the trace view, their verdict becomes a score. Annotation scores serve as ground truth. They represent what a human actually thinks about the agent’s behavior and anchor evaluation alignment metrics.

Custom Scores

Submitted by your own code through the Latitude API. Use custom scores for domain-specific quality signals:
  • User satisfaction ratings
  • Task completion metrics
  • Business KPIs (conversion rates, resolution rates)
  • Downstream validation (was the agent’s output actually correct?)
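As a loose sketch, submitting a custom score over HTTP might look like the following. The endpoint URL, payload fields, and auth header here are assumptions for illustration only; the Scores API reference linked under Next Steps documents the real contract.

```typescript
// Hypothetical sketch of submitting a custom score over HTTP.
// The endpoint path, payload fields, and auth header are assumptions;
// consult the Scores API reference for the actual contract.
async function submitCustomScore(traceId: string, resolved: boolean): Promise<void> {
  const response = await fetch("https://gateway.latitude.so/scores", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LATITUDE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      traceId,
      value: resolved ? 1 : 0, // map a business signal onto the 0..1 range
      passed: resolved,
      feedback: resolved
        ? "Ticket resolved without escalation"
        : "Ticket escalated to a human agent",
      source: "custom",
    }),
  });
  if (!response.ok) {
    throw new Error(`Score submission failed: ${response.status}`);
  }
}
```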

How Scores Work

Scores from human annotations start as drafts. A draft score:
  • Persists immediately so it survives page refreshes
  • Is visible in annotation queue review and in-progress editing
  • Does not appear in analytics, issue discovery, or alignment metrics
  • Can be edited and revised while still in draft state
Drafts are finalized automatically after a quiet period (default: 5 minutes after the last edit). System-created drafts (from automatic queue classification) wait for explicit human review before finalization. Once a score is finalized, it becomes permanent and cannot be edited.
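The finalization rule can be pictured with a small sketch. The code below only models the behavior described above; it is not Latitude's implementation.

```typescript
// Illustrative only: models the draft finalization rule described above.
const QUIET_PERIOD_MS = 5 * 60 * 1000; // default: 5 minutes after the last edit

interface DraftScore {
  lastEditedAt: number;     // epoch milliseconds of the most recent edit
  createdBySystem: boolean; // e.g. automatic queue classification
  humanReviewed: boolean;   // has a human explicitly reviewed this draft?
}

function shouldFinalize(draft: DraftScore, now: number): boolean {
  // System-created drafts wait for explicit human review.
  if (draft.createdBySystem && !draft.humanReviewed) return false;
  // Otherwise, finalize once the quiet period has elapsed.
  return now - draft.lastEditedAt >= QUIET_PERIOD_MS;
}
```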

How Scores Flow Through the System

Scores feed forward into every part of Latitude:
  1. Issue Discovery: When scores fail, Latitude groups similar failures into issues: named, trackable failure patterns your team can investigate and resolve.
  2. Evaluation Generation: Issues can generate monitoring evaluations that watch for that failure pattern on live traffic, producing more scores.
  3. Alignment: Annotation scores are compared against evaluation scores for the same traces, producing alignment metrics that tell you how well automated evaluations match human judgment (see the sketch after this list).
  4. Analytics: Finalized scores feed into time-series dashboards showing quality trends across your project.
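As an illustration of step 3, one simple alignment metric is the agreement rate between human and automated pass/fail verdicts on the same traces. The sketch below assumes that framing; Latitude's actual alignment metrics may be computed differently.

```typescript
// Illustrative agreement-rate calculation between annotation and
// evaluation verdicts on the same traces. This only shows the general
// idea, not Latitude's actual alignment metric.
interface Verdict {
  traceId: string;
  passed: boolean;
}

function agreementRate(annotations: Verdict[], evaluations: Verdict[]): number {
  // Index the human verdicts by trace.
  const humanByTrace = new Map<string, boolean>();
  for (const a of annotations) humanByTrace.set(a.traceId, a.passed);

  let shared = 0;
  let agreed = 0;
  for (const ev of evaluations) {
    const human = humanByTrace.get(ev.traceId);
    if (human === undefined) continue; // no human verdict for this trace
    shared++;
    if (human === ev.passed) agreed++;
  }
  return shared === 0 ? 0 : agreed / shared;
}
```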

Next Steps

  • Annotations: How human reviewers create scores
  • Evaluations: How automated scripts create scores
  • Issues: How failed scores become trackable failure patterns
  • Analytics: Visualizing score trends
  • Scores API: Submit custom scores programmatically