Where this fits: Datasets are part of Refine, after Signals. They turn real traces into reusable test cases for regression testing.

What a dataset row contains
| Column | Description |
|---|---|
| Input | The input your agent received, for example the user message. |
| Output | What your agent actually returned. |
| Expected output | The correct or desired answer, used to check the agent's output. Optional; see Add expected output. |
| Metadata | Arbitrary fields carried alongside the row. |
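
To make the row shape concrete, here is a minimal Python sketch of one row as a data structure. The class name `DatasetRow`, the field names, and the sample values are illustrative assumptions that mirror the table above. They are not an official schema or SDK type.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetRow:
    """Hypothetical in-memory model of one dataset row (not an official schema)."""

    input: str                             # what the agent received
    output: str                            # what the agent actually returned
    expected_output: Optional[str] = None  # optional; may be added later
    metadata: dict[str, Any] = field(default_factory=dict)  # arbitrary fields

    def has_expectation(self) -> bool:
        """True when the row can be checked against an expected output."""
        return self.expected_output is not None


# Made-up example values, for illustration only.
row = DatasetRow(
    input="What is your refund window?",
    output="Refunds are accepted within 30 days.",
    metadata={"source": "production-trace"},
)
print(row.has_expectation())  # False: no expected output has been set yet
```

Keeping the expected output optional matches the table: a row captured from a trace is still useful before anyone has written down the correct answer.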

Create a dataset
You can build a dataset three ways:
From real traces
Select traces from the trace list, search results, or a signal, and add them to a dataset. The most realistic test cases come straight from production.
Manually
Open Datasets in your project, create a new dataset, then Import a CSV or Add row to enter cases by hand.
From your coding agent
Through the MCP server, an agent like Claude or Cursor can create datasets and pull in the traces behind a signal for you.
How datasets are used
- Regression testing: replay a dataset’s inputs against your agent and compare results to the expected outputs and your evaluations. See Regression testing.
- Curating test sets: collect representative traces from Search and Signals into a stable, reusable set.
- Sharing with your harness: export a dataset as CSV to drive tests in your own pipeline.
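As a rough illustration of driving a harness from an exported CSV, the sketch below replays each row's input through an agent and compares the result to the expected output. The column names (`input`, `output`, `expected_output`), the inline sample data, and `toy_agent` are assumptions made for this example; check the header of your actual export. Exact string matching stands in for whatever evaluations you really use.

```python
import csv
import io
from typing import Callable, Optional


def run_regression(csv_text: str, agent: Callable[[str], str]) -> list[dict]:
    """Replay each row's input through `agent` and compare to the expected output."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expected = (row.get("expected_output") or "").strip()
        actual = agent(row["input"])
        # Rows without an expected output cannot be judged, so mark them None.
        passed: Optional[bool] = (actual.strip() == expected) if expected else None
        results.append({"input": row["input"], "actual": actual, "passed": passed})
    return results


# Hypothetical export; the column names are assumed, not taken from the product.
SAMPLE_CSV = """input,output,expected_output
What is 2 + 2?,4,4
Capital of France?,Lyon,Paris
Say hello,Hello!,
"""


def toy_agent(message: str) -> str:
    """Stand-in for your real agent call."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}
    return answers.get(message, "Hello!")


results = run_regression(SAMPLE_CSV, toy_agent)
for r in results:
    print(r["input"], "->", r["passed"])
```

In a real pipeline you would read the exported file from disk instead of the inline string, and replace the equality check with a comparison suited to your agent's outputs.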
Next step
- Add traces to a dataset: build a test set from real production traces.