Where this fits: This is the verification step of Refine. It takes a dataset built from the traces behind a signal you have fixed and verifies that the fix holds, both before and after you ship.
A regression test replays a set of known inputs against your agent and checks the results, so a failure you already fixed cannot return unnoticed. In Latitude, the inputs come from a dataset of real traces, and the checks reuse the same evaluations that monitor production, so your test quality bar matches your production quality bar.

Fix and verify with your coding agent

The fastest path today pairs the MCP server with a dataset, so your coding agent does the work:
1. Bring the signal into your editor

Connect your coding agent (Claude, Cursor, and others) to Latitude through the MCP server. It can read the failing signal, inspect the example traces, and propose a fix in the same session.
2. Capture the failing traces as a dataset

Turn the traces behind the signal into a dataset, the seed of your regression test. Your agent can do this through the MCP server, or you can add the traces from the UI.
3. Add the expected behaviour

Record what the agent should have done by adding an expected output to each row you want to check precisely.
4. Replay and check

Run your agent against each row’s input to produce fresh outputs, then run the signal’s evaluations against them. The same check that found the failure in production now verifies the fix.
5. Gate and repeat

Pass the run when the results meet your quality bar, and fail it otherwise so a regression is blocked. Re-run whenever the agent, prompts, tools, or models change.
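The replay, check, and gate steps above can be sketched in a few lines. This is a minimal illustration, not Latitude's implementation: `Row`, `run_regression`, `fake_agent`, and `exact_or_nonempty` are hypothetical names, the agent is a stand-in function, and the check is a simple placeholder for the signal's real evaluations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Row:
    """One dataset row (hypothetical shape, not Latitude's schema)."""
    input: str
    expected_output: Optional[str] = None  # set only on rows you check precisely


@dataclass
class Result:
    row: Row
    output: str
    passed: bool


def run_regression(
    rows: List[Row],
    agent: Callable[[str], str],
    evaluate: Callable[[Row, str], bool],
    min_pass_rate: float = 1.0,
) -> Tuple[bool, List[Result]]:
    """Replay every row through the agent, evaluate each fresh output,
    and gate on the share of rows that pass."""
    results = []
    for row in rows:
        output = agent(row.input)  # fresh output, not the stored trace
        results.append(Result(row, output, evaluate(row, output)))
    if not results:
        return False, results  # an empty dataset verifies nothing
    pass_rate = sum(r.passed for r in results) / len(results)
    return pass_rate >= min_pass_rate, results


def fake_agent(text: str) -> str:
    """Stand-in for your real agent."""
    return text.strip().upper()


def exact_or_nonempty(row: Row, output: str) -> bool:
    """Placeholder check: exact match when an expected output is recorded,
    otherwise only require a non-empty answer."""
    if row.expected_output is not None:
        return output == row.expected_output
    return bool(output)
```

Calling `run_regression(rows, fake_agent, exact_or_nonempty)` returns a pass/fail flag plus per-row results. Lowering `min_pass_rate` tolerates some failing rows, which may suit evaluations that are not fully deterministic.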

Run it in CI

You can drive a regression test from a dataset in your own pipeline:
  • Export a dataset as CSV and replay its inputs in your own test harness.
  • Submit the results back as scores through the Scores API, so regression results live alongside your production data, and gate the build on the outcome.
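As a rough sketch of that pipeline, the harness below reads an exported CSV, replays each input, builds score records, and derives a build exit code. Everything specific here is an assumption: the `input` and `expected_output` column names, the fields of the score records, and the helper names are hypothetical, and the actual request to the Scores API is left out because its shape is not described on this page.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical export: real column names depend on your dataset.
SAMPLE_CSV = "input,expected_output\nhello,HELLO\nbye,\n"


def load_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse an exported dataset into one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def replay(rows: List[Dict[str, str]], agent: Callable[[str], str]) -> List[dict]:
    """Run the agent on every input and check the fresh output."""
    results = []
    for index, row in enumerate(rows):
        output = agent(row["input"])
        expected = row.get("expected_output") or ""
        # Exact match when an expected output exists, else require any answer.
        passed = output == expected if expected else bool(output)
        results.append({"row": index, "output": output, "passed": passed})
    return results


def to_score_records(results: List[dict], source: str = "ci-regression") -> List[dict]:
    """Shape results as score records (assumed fields, not the Scores API schema)."""
    return [
        {"source": source, "row": r["row"], "value": 1.0 if r["passed"] else 0.0}
        for r in results
    ]


def exit_code(results: List[dict]) -> int:
    """Return 0 only when there is at least one row and every row passed."""
    return 0 if results and all(r["passed"] for r in results) else 1
```

In a CI job you would pass your real agent to `replay`, send the output of `to_score_records` to the Scores API with whatever client you use, and finish with `sys.exit(exit_code(results))` so a failing row fails the build.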

Next step

  • Datasets: turn the traces behind a signal into a reusable test set.