Datasets

Where this fits: Datasets are part of Refine, after Signals. They turn real traces into reusable test cases for regression testing.

A dataset is a collection of rows you curate for testing and improving your agent. Each row holds an input, the agent’s output, an optional expected output, and arbitrary metadata. Teams use them as golden datasets: stable, known-good test sets that a fix has to keep passing.

Figure: The Datasets page, listing golden datasets with name, description, and last updated.

What a dataset row contains

Column	Description
Input	The input your agent received, for example the user message.
Output	What your agent actually returned.
Expected output	The correct or desired answer, used to check the agent's output. Optional; see Add expected output.
Metadata	Arbitrary fields carried alongside the row.

Beyond these built-ins you can add your own custom columns, and rename, reorder, or remove any column.
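To make the row shape concrete, here is a minimal Python sketch of one row as described in the table above. The class name, field names, and the `custom` field are assumptions made for illustration; they are not the product's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetRow:
    """Illustrative model of one dataset row (all names are hypothetical)."""

    input: str                               # what the agent received
    output: str                              # what the agent actually returned
    expected_output: Optional[str] = None    # optional known-good answer
    metadata: dict[str, Any] = field(default_factory=dict)  # arbitrary fields
    custom: dict[str, Any] = field(default_factory=dict)    # user-defined columns

    def has_expectation(self) -> bool:
        """Return True when the row can be checked against an expected output."""
        return self.expected_output is not None


row = DatasetRow(
    input="What is your refund window?",
    output="Refunds are accepted within 30 days.",
    expected_output="Refunds are accepted within 30 days.",
    metadata={"source": "production-trace"},
)
```

Keeping the expected output optional mirrors the table: a row is still valid without it, but it can only be checked against a known-good answer once one is added.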

Figure: A dataset detail view showing rows with input, output, and expected output columns.

Create a dataset

You can build a dataset in three ways:

From real traces

Select traces from the trace list, search results, or a signal, and add them to a dataset. The most realistic test cases come straight from production.

Manually

Open Datasets in your project and create a new dataset. Then either Import a CSV or use Add row to enter cases by hand.
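If you prepare cases in code before importing them, a CSV file can be generated with Python's standard `csv` module. This is a sketch under stated assumptions: the header names (`input`, `output`, `expected_output`, `metadata`) and the choice to store metadata as a JSON string are hypothetical, so match them to the columns your dataset actually uses.

```python
import csv
import io
import json

# Hypothetical column headers; adjust to the columns in your dataset.
FIELDNAMES = ["input", "output", "expected_output", "metadata"]

cases = [
    {
        "input": "What is your refund window?",
        "output": "Refunds are accepted within 30 days.",
        "expected_output": "Refunds are accepted within 30 days.",
        "metadata": {"source": "manual"},
    },
    {
        # No expected output yet: the column is simply left empty.
        "input": "Do you ship to Canada?",
        "output": "Yes, we ship to Canada.",
        "metadata": {},
    },
]


def build_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, encoding metadata as a JSON string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        record = dict(row)
        record["metadata"] = json.dumps(row.get("metadata", {}))
        writer.writerow(record)  # missing keys are written as empty cells
    return buffer.getvalue()


csv_text = build_csv(cases)
```

Writing `csv_text` to a file gives you something to try with Import a CSV; how the importer maps headers to columns is not covered here and should be checked in the product.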

From your coding agent

Through the MCP server, an agent like Claude or Cursor can create datasets and pull in the traces behind a signal for you.

How datasets are used

Regression testing: replay a dataset’s inputs against your agent and compare results to the expected outputs and your evaluations. See Regression testing.
Curating test sets: collect representative traces from Search and Signals into a stable, reusable set.
Sharing with your harness: export a dataset as CSV to drive tests in your own pipeline.
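As an illustration of driving tests from an exported CSV in your own pipeline, the sketch below replays each input against an agent function and collects mismatches. It assumes the export has `input` and `expected_output` headers, and it uses a plain string comparison as a stand-in for whatever evaluations you actually run. `fake_agent` is a hypothetical placeholder for your real agent call.

```python
import csv
import io
from typing import Callable


def run_regression(csv_text: str, agent: Callable[[str], str]) -> list[dict]:
    """Replay each input through `agent` and return rows whose output differs."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = (row.get("expected_output") or "").strip()
        if not expected:
            continue  # rows without an expected output cannot be checked here
        actual = agent(row["input"])
        if actual.strip() != expected:  # exact match: a deliberate simplification
            failures.append(
                {"input": row["input"], "expected": expected, "actual": actual}
            )
    return failures


def fake_agent(message: str) -> str:
    """Hypothetical stand-in for a call to your agent."""
    return "Refunds are accepted within 30 days."


# Inline sample standing in for an exported dataset file.
exported = (
    "input,expected_output\n"
    "What is your refund window?,Refunds are accepted within 30 days.\n"
    'Do you ship to Canada?,"Yes, we ship to Canada."\n'
    "Say hello,\n"
)

failures = run_regression(exported, fake_agent)
```

In a real harness you would read the exported file from disk and fail the build when `failures` is non-empty. Exact string matching is usually too strict for free-form agent output, so treat it as a placeholder for a more tolerant check.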

Next steps

Add traces to a dataset: build a test set from real production traces.
Custom columns: add, rename, reorder, or remove columns.
