What is a Dataset?
A dataset consists of rows and columns, where:
- Input Columns: Represent the parameters your prompt expects (e.g., customer_query, product_name).
- Output/Label Columns (Optional): Contain the ground truth or expected outputs for specific inputs (e.g., expected_sentiment, ideal_summary). These are required for evaluations like Exact Match or Semantic Similarity.
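
As a rough illustration of this layout, the sketch below builds a tiny dataset in memory and serializes it to CSV using only Python's standard library. The column names come from the examples above; the row values and the helper name rows_to_csv are invented for illustration and are not part of Latitude.

```python
import csv
import io

# Input columns feed the prompt; label columns hold the expected output.
INPUT_COLUMNS = ["customer_query", "product_name"]
LABEL_COLUMNS = ["expected_sentiment"]

# Illustrative rows only; these values are invented for the example.
rows = [
    {
        "customer_query": "My order arrived broken.",
        "product_name": "Desk Lamp",
        "expected_sentiment": "negative",
    },
    {
        "customer_query": "Love it, works perfectly!",
        "product_name": "Desk Lamp",
        "expected_sentiment": "positive",
    },
]


def rows_to_csv(rows, columns):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, INPUT_COLUMNS + LABEL_COLUMNS)
print(csv_text)
```

The second row contains a comma inside customer_query, so the csv module quotes that field automatically; building CSV files by joining strings by hand would split it into two fields.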

Creating Datasets
You can create datasets in Latitude in several ways:
1. Uploading CSV Files
This is the most common method for bringing existing test data into Latitude.
- Navigate to the “Datasets” section in your project.
- Click “Upload Dataset”.
- Drag and drop your CSV file or browse to select it.
- Preview and Configure: Latitude will show a preview of your data. You may need to confirm:
  - Column headers are correctly identified.
  - Data types are inferred correctly.
- Give your dataset a descriptive name.
- Click “Create Dataset”.
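
Before uploading, it can help to check the file locally for the kinds of problems the preview step is meant to catch. The following is a minimal sketch, not Latitude's own validation: the function name check_csv and the specific checks (blank headers, duplicate headers, rows with the wrong number of fields) are assumptions chosen for illustration.

```python
import csv
import io


def check_csv(text):
    """Return a list of problems found in CSV text; an empty list means none found."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    header, data = rows[0], rows[1:]
    problems = []

    if any(not name.strip() for name in header):
        problems.append("blank column header")
    if len(set(header)) != len(header):
        problems.append("duplicate column header")

    # Every data row should have exactly one field per header column.
    for row_number, row in enumerate(data, start=1):
        if len(row) != len(header):
            problems.append(
                f"data row {row_number}: expected {len(header)} fields, got {len(row)}"
            )
    return problems


sample = "customer_query,product_name\nWhere is my order?,Desk Lamp\nBroken on arrival\n"
print(check_csv(sample))
```

For a real file, read its contents with open(path, newline="", encoding="utf-8") and pass the text to check_csv.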
2. Generating Synthetic Data
Latitude can use an AI model to generate synthetic datasets based on your specifications, which is useful for quickly creating test cases or exploring variations.
- Navigate to the “Datasets” section.
- Click “Generate Dataset”.
- Describe the data you need:
  - Specify the desired columns (e.g., user_query, expected_category).
  - Provide instructions on the type of data for each column (e.g., “Generate realistic user support questions”, “Assign a category from [Billing, Technical, General]”).
  - Indicate the number of rows to generate.
- Click “Generate Dataset”.
The generator has limits on complexity and runtime. For large or very complex
datasets, uploading a CSV is often more reliable. Start with smaller
generation requests (e.g., 20-50 rows) to test.
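
Generated rows may not always follow the column instructions, so a quick check on a small test batch is worthwhile. The sketch below assumes the generated dataset has been exported as CSV and that expected_category should hold one of the three categories from the example instruction above; the function name find_invalid_categories and the sample rows are hypothetical.

```python
import csv
import io

# Categories taken from the example instruction in this section.
ALLOWED_CATEGORIES = {"Billing", "Technical", "General"}


def find_invalid_categories(csv_text, column="expected_category"):
    """Return (data_row_number, value) pairs whose category is not allowed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    invalid = []
    for row_number, row in enumerate(reader, start=1):
        value = (row.get(column) or "").strip()
        if value not in ALLOWED_CATEGORIES:
            invalid.append((row_number, value))
    return invalid


# Invented sample standing in for an exported synthetic dataset.
generated = (
    "user_query,expected_category\n"
    "Why was I charged twice?,Billing\n"
    "The app crashes on login,Technical\n"
    "Can I change my plan?,Account\n"
)
print(find_invalid_categories(generated))
```

If the check flags rows, tightening the column instruction and regenerating a small batch is usually cheaper than fixing a large dataset by hand.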
3. Saving Logs as Datasets
You can create a new dataset directly from existing production logs, which is excellent for evaluating prompts against real-world interactions.
- Navigate to the Logs section of one of your prompts.
- Select the logs you want to include in the dataset.
- Click the Save logs to Dataset button (or similar option).
- In the form, choose whether to create a new dataset or add the logs to an existing one.
- Confirm your selection.
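
Conceptually, saving logs as a dataset turns each logged interaction into one row. The sketch below illustrates that idea only. The log record shape ({"parameters": ..., "response": ...}), the output column name, and the function logs_to_rows are all assumptions made for this example and do not describe Latitude's actual log format or export behavior.

```python
def logs_to_rows(logs):
    """Flatten hypothetical log records into dataset columns and rows.

    Each log is assumed to look like {"parameters": {...}, "response": "..."}.
    A parameter literally named "output" would collide with the response
    column; this sketch does not handle that case.
    """
    # Collect every parameter name, preserving first-seen order.
    columns = []
    for log in logs:
        for name in log["parameters"]:
            if name not in columns:
                columns.append(name)

    rows = []
    for log in logs:
        # Parameters missing from a given log become empty cells.
        row = {name: log["parameters"].get(name, "") for name in columns}
        row["output"] = log["response"]
        rows.append(row)
    return columns + ["output"], rows


# Invented sample logs for illustration.
sample_logs = [
    {"parameters": {"customer_query": "Where is my order?"}, "response": "It ships tomorrow."},
    {"parameters": {"customer_query": "Cancel my plan", "product_name": "Pro"}, "response": "Done."},
]
columns, rows = logs_to_rows(sample_logs)
print(columns)
print(rows[0])
```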
Managing Datasets
Once created, you can manage your datasets from the main “Datasets” page:
- View: Click on a dataset name to view its contents.
- Edit: Add, modify, or remove dataset rows or columns.
- Rename: Change the dataset’s name.
- Download: Export the dataset as a CSV file.
- Delete: Permanently remove a dataset.
Marking an Expected Output Column as a Label
To mark an expected output column as a label:
- Click the edit button next to the column’s name.
- Set the column’s role to “label”.
Linking Datasets to Evaluations
The primary use of datasets is to run evaluations in batch mode:
- Go to the specific evaluation you want to run (under a prompt’s “Evaluations” tab).
- Initiate an Experiment in the evaluation.
- Select the dataset you want to use.
- If the evaluation requires ground truth (e.g., Exact Match), map the evaluation’s expected output requirement to the relevant column in your dataset (e.g., link expected_output to the ideal_summary column).
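
To make the column mapping concrete, the sketch below shows roughly what an exact-match comparison over a dataset looks like once expected_output is linked to ideal_summary. It is an illustration under stated assumptions, not Latitude's implementation: the function name exact_match_scores, the whitespace trimming, the article column, and the hard-coded outputs list (standing in for model responses) are all hypothetical.

```python
def exact_match_scores(rows, outputs, column_mapping):
    """Compare each output to the label column mapped to "expected_output"."""
    label_column = column_mapping["expected_output"]
    return [
        output.strip() == row[label_column].strip()
        for row, output in zip(rows, outputs)
    ]


# Invented dataset rows; ideal_summary is the label column.
rows = [
    {"article": "Q3 report: sales up ten percent.", "ideal_summary": "Sales rose 10%."},
    {"article": "Q3 report: costs down.", "ideal_summary": "Costs fell."},
]

# Stand-ins for the responses a prompt would produce for each row.
outputs = ["Sales rose 10%.", "Costs went up."]

# Link the evaluation's expected output to the dataset's label column.
mapping = {"expected_output": "ideal_summary"}

scores = exact_match_scores(rows, outputs, mapping)
accuracy = sum(scores) / len(scores)
print(scores, accuracy)
```

Because exact match is all-or-nothing per row, small formatting differences in the label column count as failures, which is one reason to review label values before running a batch.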
Next Steps
- Learn about establishing Golden Datasets for Regression Testing
- Understand how to Run Evaluations
- Explore Using Datasets for Fine-tuning