Using Datasets for Fine-tuning
Export Latitude datasets to prepare data for fine-tuning language models.
While Latitude focuses on prompt engineering and evaluation, the datasets you create and curate within the platform can be valuable assets for fine-tuning language models using external tools and services.
Why Use Latitude Datasets for Fine-tuning?
- Curated Data: Datasets often contain carefully selected inputs paired with high-quality outputs, either expected outputs or manually reviewed model responses.
- Real-World Examples: Datasets created from production logs represent actual user interactions.
- Structured Format: Latitude datasets are already in a structured format (CSV), making them easier to process for fine-tuning.
Exporting Datasets from Latitude
- Navigate to the “Datasets” section in your project.
- Locate the dataset you want to use for fine-tuning.
- Find the option to Download or Export the dataset (usually represented by a download icon).
- Save the resulting CSV file to your local machine.
Preparing Data for Fine-tuning
Once exported, you’ll likely need to transform the CSV data into the specific format required by your chosen fine-tuning platform or library (e.g., OpenAI’s JSONL format, Hugging Face datasets format).
Common steps include:
- Selecting Columns: Identify the columns containing the input prompt/context and the desired completion/output.
- Formatting: Convert each row into the required structure. For OpenAI chat fine-tuning, for example, each row becomes a JSON object containing a `messages` array of user and assistant turns.
- Data Cleaning: Review the data for quality and consistency, and remove any low-quality or irrelevant examples.
- Splitting Data: You might need to split your exported dataset into training and validation sets.
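The formatting and splitting steps above can be sketched in Python. This is a minimal example, not a definitive implementation: the `input` and `output` column names are hypothetical placeholders for your dataset's actual columns, and the `messages` record structure assumes an OpenAI-style chat fine-tuning format — check your provider's documentation for the exact schema.

```python
import csv
import json
import random

def csv_to_jsonl(csv_path, train_path, valid_path, valid_fraction=0.1, seed=42):
    """Convert an exported CSV into chat-style JSONL train/validation files."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Shuffle deterministically, then hold out a validation slice.
    random.Random(seed).shuffle(rows)
    split = int(len(rows) * valid_fraction)
    valid, train = rows[:split], rows[split:]

    for path, subset in [(train_path, train), (valid_path, valid)]:
        with open(path, "w", encoding="utf-8") as f:
            for row in subset:
                # One JSON object per line: a user turn and an assistant turn.
                record = {
                    "messages": [
                        {"role": "user", "content": row["input"]},
                        {"role": "assistant", "content": row["output"]},
                    ]
                }
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Swap in your own column names and target record structure as needed; the shuffle-then-split pattern keeps the train/validation split reproducible via the fixed seed.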
Consult the documentation of your specific fine-tuning tool or platform for detailed formatting requirements.
Example Scenario
Imagine you have a Latitude dataset created from manually reviewed chat logs, with columns `input_query` and `high_quality_response`.
- Export: Download this dataset as a CSV from Latitude.
- Transform: Write a script (e.g., Python with pandas) to read the CSV and convert each row into the JSONL format required by the fine-tuning API you plan to use.
- Fine-tune: Upload the formatted JSONL file and run the fine-tuning job using the provider’s tools.
- Evaluate: After fine-tuning, you can even evaluate the new model’s performance back in Latitude by configuring it as a new provider/model and running evaluations against your datasets.
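The Transform step in this scenario might look like the following sketch using pandas. The `input_query` and `high_quality_response` column names come from the scenario above; the `messages` record structure again assumes an OpenAI-style chat format, so adapt it to whatever your fine-tuning API actually requires.

```python
import json

import pandas as pd

def transform_to_jsonl(csv_path: str, jsonl_path: str) -> int:
    """Read an exported CSV and write one chat-style JSONL record per row.

    Returns the number of records written.
    """
    df = pd.read_csv(csv_path)
    # Basic data cleaning: drop rows missing either the query or the response.
    df = df.dropna(subset=["input_query", "high_quality_response"])
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for row in df.itertuples(index=False):
            record = {
                "messages": [
                    {"role": "user", "content": row.input_query},
                    {"role": "assistant", "content": row.high_quality_response},
                ]
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(df)
```

From there, the resulting JSONL file is what you would upload to the provider's fine-tuning endpoint in the Fine-tune step.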
By leveraging the data curation work done in Latitude, you can streamline the preparation process for fine-tuning models for specialized tasks.
Next Steps
- Learn about Creating and Using Datasets in Latitude.
- Refer to external documentation for specific fine-tuning platforms (OpenAI, Hugging Face, etc.).