Overview

Datasets in Latitude are collections of test data, used primarily to run prompt experiments in the playground or to run evaluations. They let you test your prompts against a consistent set of inputs and expected outputs.

What is a Dataset?

A dataset consists of rows and columns, where:

Input Columns: Represent the parameters your prompt expects (e.g., customer_query, product_name).
Output/Label Columns (Optional): Contain the ground truth or expected outputs for specific inputs (e.g., expected_sentiment, ideal_summary). These are required for evaluations like Exact Match or Semantic Similarity.

Each row represents a single test case for your prompt.
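Because input columns must line up with the parameters your prompt expects, a quick local check can catch a missing column before you run anything. This is a minimal sketch, assuming you keep a copy of the test data in Python. The helper `missing_parameters` and all column and parameter names are hypothetical, not part of Latitude.

```python
# Hypothetical check that a dataset's input columns cover a prompt's parameters.
# Column and parameter names are illustrative only.

def missing_parameters(prompt_parameters: list[str], dataset_columns: list[str]) -> list[str]:
    """Return the prompt parameters that have no matching dataset column."""
    available = set(dataset_columns)
    return [name for name in prompt_parameters if name not in available]


# One row per test case: two input columns plus an optional label column.
columns = ["customer_query", "product_name", "expected_sentiment"]
rows = [
    {"customer_query": "My order arrived broken.", "product_name": "Desk Lamp", "expected_sentiment": "negative"},
    {"customer_query": "Setup took two minutes!", "product_name": "Router", "expected_sentiment": "positive"},
]

print(len(rows))  # 2 test cases
print(missing_parameters(["customer_query", "product_name"], columns))  # []
print(missing_parameters(["customer_query", "order_id"], columns))  # ['order_id']
```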

Creating Datasets

You can create datasets in Latitude in several ways:

1. Uploading CSV Files

This is the most common method for bringing existing test data into Latitude.

Navigate to the “Datasets” section in your project.
Click “Upload Dataset”.
Drag and drop your CSV file or browse to select it.
Preview and Configure: Latitude will show a preview of your data. You may need to confirm:
- Column headers are correctly identified.
- Data types are inferred correctly.
Give your dataset a descriptive name.
Click “Create Dataset”.
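The preview step checks that column headers are identified correctly, so it helps to upload a file with an explicit header row and properly quoted values. This is a minimal sketch, assuming you assemble test data in Python before uploading. The file name `support_queries.csv`, the helper `write_dataset_csv`, and the column names are hypothetical.

```python
import csv
import os
import tempfile


def write_dataset_csv(path: str, columns: list[str], rows: list[dict[str, str]]) -> None:
    """Write rows to a CSV file with a header row, ready to upload."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        # DictWriter raises ValueError if a row has a key not listed in columns.
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()  # the first line holds the column headers
        writer.writerows(rows)  # values containing commas are quoted automatically


columns = ["user_query", "expected_category"]
rows = [
    {"user_query": "Why was I charged twice?", "expected_category": "Billing"},
    {"user_query": "The app crashes on login, any ideas?", "expected_category": "Technical"},
]

path = os.path.join(tempfile.mkdtemp(), "support_queries.csv")
write_dataset_csv(path, columns, rows)

with open(path, newline="", encoding="utf-8") as f:
    print(f.readline().strip())  # user_query,expected_category
```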

2. Generating Synthetic Data

Latitude can use an AI model to generate synthetic datasets based on your specifications. This is useful for quickly creating test cases or exploring variations.

Navigate to the “Datasets” section.
Click “Generate Dataset”.
Describe the data you need:
- Specify the desired columns (e.g., user_query, expected_category).
- Provide instructions on the type of data for each column (e.g., “Generate realistic user support questions”, “Assign a category from [Billing, Technical, General]”).
- Indicate the number of rows to generate.
Click “Generate Dataset”.

The generator has limits on complexity and runtime. For large or very complex datasets, uploading a CSV is often more reliable. Start with small generation requests (e.g., 20-50 rows) and check the results before generating more.
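When you review a small generated batch, one useful check is whether every value in a constrained column stays inside the allowed set. The instructions above use [Billing, Technical, General] as that set. This is a minimal sketch, assuming you have downloaded the generated dataset as CSV text. The helper `invalid_rows` and the sample rows are hypothetical.

```python
import csv
import io

ALLOWED_CATEGORIES = {"Billing", "Technical", "General"}


def invalid_rows(csv_text: str, column: str, allowed: set[str]) -> list[int]:
    """Return 1-based data row numbers whose `column` value is not in `allowed`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # A missing column yields None, which is also reported as invalid.
    return [i for i, row in enumerate(reader, start=1) if row.get(column) not in allowed]


generated = """user_query,expected_category
How do I update my card?,Billing
Password reset email never arrives,Technical
Do you ship to Canada?,Shipping
"""

print(invalid_rows(generated, "expected_category", ALLOWED_CATEGORIES))  # [3]
```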

3. Saving Logs as Datasets

You can create a new dataset directly from existing production logs, which is excellent for evaluating prompts against real-world interactions.

Navigate to the “Logs” section of one of your prompts.
Select the logs you want to include in the dataset.
Click the “Save logs to Dataset” button (or similar option).
In the form, choose whether to create a new dataset or add the logs to an existing one.
Confirm your selection.
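Conceptually, each selected log becomes one dataset row: the parameters the prompt received become input columns. This sketch assumes the model's response is also kept as a candidate expected output. The `LogRecord` shape, the `logs_to_rows` helper, and the `expected_output` column name are all hypothetical and do not describe Latitude's actual log schema. The sketch only illustrates the mapping.

```python
from typing import TypedDict


class LogRecord(TypedDict):
    # Hypothetical log shape for illustration; not Latitude's real schema.
    parameters: dict[str, str]
    response: str


def logs_to_rows(logs: list[LogRecord], output_column: str = "expected_output") -> list[dict[str, str]]:
    """Turn each log into one dataset row: its parameters plus its response."""
    rows = []
    for log in logs:
        row = dict(log["parameters"])  # copy, so the log itself is not modified
        row[output_column] = log["response"]  # assumed: response stored as a label candidate
        rows.append(row)
    return rows


logs: list[LogRecord] = [
    {"parameters": {"customer_query": "Where is my refund?"}, "response": "Refunds usually take a few days."},
]
print(logs_to_rows(logs))
# [{'customer_query': 'Where is my refund?', 'expected_output': 'Refunds usually take a few days.'}]
```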

Managing Datasets

Once created, you can manage your datasets from the main “Datasets” page:

View: Click on a dataset name to view its contents.
Edit: Add, modify, or remove rows and columns.
Rename: Change the dataset’s name.
Download: Export the dataset as a CSV file.
Delete: Permanently remove a dataset.
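After downloading a dataset as CSV, you can inspect it locally. For example, you might count the rows and find empty cells in a label column before relying on it for evaluations. This is a minimal sketch using Python's standard `csv` module. The helper `summarize_dataset` and the sample columns are hypothetical.

```python
import csv
import io


def summarize_dataset(csv_text: str) -> dict:
    """Report columns, row count, and empty cells per column for an exported dataset."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    empty = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            # Short rows yield None for missing cells; treat None and blanks as empty.
            if not (row.get(name) or "").strip():
                empty[name] += 1
    return {"columns": columns, "rows": row_count, "empty_cells": empty}


exported = (
    "customer_query,ideal_summary\n"
    "Long complaint about a late delivery,Late delivery complaint\n"
    "Question about invoices,\n"
)
print(summarize_dataset(exported))
# {'columns': ['customer_query', 'ideal_summary'], 'rows': 2, 'empty_cells': {'customer_query': 0, 'ideal_summary': 1}}
```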

Marking an Expected Output Column as a Label

Label columns hold the ground truth that evaluations compare outputs against. To mark an expected output column as a label:

Click the edit button next to the column’s name.
Set the column’s role to “label”.
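A column's role determines whether its value is treated as a prompt input or as ground truth. The following sketch separates one row accordingly, assuming two roles. Only “label” comes from this page. The `ColumnRole` enum, its `PARAMETER` member, and the `split_row` helper are hypothetical, and so is the choice to treat columns without a role as inputs.

```python
from enum import Enum


class ColumnRole(Enum):
    # Hypothetical roles; only "label" is taken from the documentation.
    PARAMETER = "parameter"
    LABEL = "label"


def split_row(row: dict[str, str], roles: dict[str, ColumnRole]) -> tuple[dict[str, str], dict[str, str]]:
    """Separate prompt inputs from label (ground-truth) values in one row."""
    inputs: dict[str, str] = {}
    labels: dict[str, str] = {}
    for column, value in row.items():
        # Columns without an explicit role are treated as prompt inputs in this sketch.
        if roles.get(column, ColumnRole.PARAMETER) is ColumnRole.LABEL:
            labels[column] = value
        else:
            inputs[column] = value
    return inputs, labels


roles = {"customer_query": ColumnRole.PARAMETER, "expected_sentiment": ColumnRole.LABEL}
row = {"customer_query": "My order arrived broken.", "expected_sentiment": "negative"}
inputs, labels = split_row(row, roles)
print(inputs)  # {'customer_query': 'My order arrived broken.'}
print(labels)  # {'expected_sentiment': 'negative'}
```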

Linking Datasets to Evaluations

The primary use of datasets is to run evaluations in batch mode:

Go to the specific evaluation you want to run (under a prompt’s “Evaluations” tab).
Start an experiment from the evaluation.
Select the dataset you want to use.
If the evaluation requires ground truth (e.g., Exact Match), map the evaluation’s expected output requirement to the relevant column in your dataset (e.g., link expected_output to the ideal_summary column).

Latitude then runs the prompt for each row in the dataset and applies the evaluation, comparing the output to the corresponding data in the dataset row.
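The batch loop described above can be sketched locally for an Exact Match evaluation. In this sketch, `expected_column` plays the role of the mapping from the evaluation's expected output to a dataset column. `run_exact_match` and `fake_prompt` are hypothetical, and `fake_prompt` is an offline stand-in for a real model call. Latitude's actual evaluation may handle details such as whitespace differently.

```python
from typing import Callable


def run_exact_match(
    rows: list[dict[str, str]],
    run_prompt: Callable[[dict[str, str]], str],
    expected_column: str,
) -> float:
    """Run the prompt on every row and return the fraction of exact matches."""
    if not rows:
        return 0.0
    passed = 0
    for row in rows:
        # Every column except the mapped expected-output column is passed as prompt input.
        inputs = {k: v for k, v in row.items() if k != expected_column}
        output = run_prompt(inputs)
        if output == row[expected_column]:  # strict string equality
            passed += 1
    return passed / len(rows)


def fake_prompt(inputs: dict[str, str]) -> str:
    # Offline stand-in for a model call so the sketch runs without network access.
    return "negative" if "broken" in inputs["customer_query"] else "positive"


rows = [
    {"customer_query": "My order arrived broken.", "expected_sentiment": "negative"},
    {"customer_query": "Setup was painless.", "expected_sentiment": "positive"},
    {"customer_query": "It stopped working after a day.", "expected_sentiment": "negative"},
]
print(run_exact_match(rows, fake_prompt, expected_column="expected_sentiment"))
```

Here the third row fails because the stand-in prompt only detects the word “broken”. In a real run, mismatches like this are what the evaluation is meant to surface.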

Next Steps

Learn about establishing Golden Datasets for Regression Testing
Understand how to Run Evaluations
Explore Using Datasets for Fine-tuning