While Latitude focuses on prompt engineering and evaluation, the datasets you create and curate within the platform can be valuable assets for fine-tuning language models using external tools and services.

Why Use Latitude Datasets for Fine-tuning?

  • Curated Data: Datasets often contain carefully selected inputs paired with high-quality outputs (either expected outputs or manually reviewed model responses).
  • Real-World Examples: Datasets created from production logs represent actual user interactions.
  • Structured Format: Latitude datasets are already in a structured format (CSV), making them easier to process for fine-tuning.

Exporting Datasets from Latitude

  1. Navigate to the “Datasets” section in your project.
  2. Locate the dataset you want to use for fine-tuning.
  3. Find the option to Download or Export the dataset (usually represented by a download icon).
  4. Save the resulting CSV file to your local machine.

Preparing Data for Fine-tuning

Once exported, you’ll likely need to transform the CSV data into the specific format required by your chosen fine-tuning platform or library (e.g., OpenAI’s JSONL format, Hugging Face datasets format).
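For instance, if you plan to use Hugging Face tooling, the exported CSV can be loaded directly with the datasets library. A minimal sketch; the file name is a placeholder for whatever you saved during export:

    from datasets import load_dataset

    # Load the exported Latitude CSV; "latitude_dataset.csv" is a placeholder
    # for the file name you chose when downloading the dataset.
    ds = load_dataset("csv", data_files="latitude_dataset.csv")["train"]
    print(ds.column_names)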

Common steps include:

  1. Selecting Columns: Identify the columns containing the input prompt/context and the desired completion/output.
  2. Formatting: Convert each row into the required structure. For example, OpenAI fine-tuning expects one JSON object per line (JSONL). For prompt/completion models:
    {"prompt": "<Input from CSV column A>", "completion": "<Output from CSV column B>"}
    For chat models:
    {"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "<Input>"}, {"role": "assistant", "content": "<Output>"}]}
    
  3. Data Cleaning: Review the data for quality and consistency, removing any low-quality or irrelevant examples.
  4. Splitting Data: You might need to split your exported dataset into training and validation sets (see the sketch after this list).
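As an illustration of steps 2–4, here is a minimal Python sketch using pandas, targeting the chat format shown above. The column names input_query and high_quality_response (which match the example scenario below) and the file names are placeholders; adjust them to your dataset:

    import json
    import pandas as pd

    # Placeholder file and column names; adjust to your exported dataset.
    df = pd.read_csv("latitude_dataset.csv")

    # Cleaning: drop rows with missing values and exact duplicates.
    df = df.dropna(subset=["input_query", "high_quality_response"]).drop_duplicates()

    # Shuffle, then hold out 10% of rows for validation.
    df = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
    split = int(len(df) * 0.9)

    def to_chat_example(row):
        # One JSON object per line, matching the chat structure above.
        return {"messages": [
            {"role": "user", "content": row["input_query"]},
            {"role": "assistant", "content": row["high_quality_response"]},
        ]}

    for name, chunk in [("train.jsonl", df[:split]), ("validation.jsonl", df[split:])]:
        with open(name, "w") as f:
            for _, row in chunk.iterrows():
                f.write(json.dumps(to_chat_example(row)) + "\n")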

Consult the documentation of your specific fine-tuning tool or platform for detailed formatting requirements.

Example Scenario

Imagine you have a Latitude dataset created from manually reviewed chat logs, with columns input_query and high_quality_response.

  1. Export: Download this dataset as a CSV from Latitude.
  2. Transform: Write a script (e.g., Python with pandas) to read the CSV and convert each row into the JSONL format required by the fine-tuning API you plan to use (the sketch in the previous section shows one way to do this).
  3. Fine-tune: Upload the formatted JSONL file and run the fine-tuning job using the provider’s tools (see the sketch after this list).
  4. Evaluate: After fine-tuning, you can even evaluate the new model’s performance back in Latitude by configuring it as a new provider/model and running evaluations against your datasets.
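For step 3, here is a minimal sketch using the OpenAI Python SDK as one example provider. The file names and base model are assumptions; other providers have equivalent upload-and-train flows, so consult their current documentation for exact calls and parameters:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Upload the training and validation files produced earlier.
    train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    val_file = client.files.create(file=open("validation.jsonl", "rb"), purpose="fine-tune")

    # Start the fine-tuning job; the base model name is an assumption,
    # so check which models your account can fine-tune.
    job = client.fine_tuning.jobs.create(
        model="gpt-4o-mini-2024-07-18",
        training_file=train_file.id,
        validation_file=val_file.id,
    )
    print(job.id, job.status)

You can poll client.fine_tuning.jobs.retrieve(job.id) until the job finishes; the resulting fine-tuned model name is what you would then configure in Latitude for step 4.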

By leveraging the data curation work done in Latitude, you can streamline the data preparation required to fine-tune models for specialized tasks.

Next Steps

  • Learn about Creating and Using Datasets in Latitude.
  • Refer to external documentation for specific fine-tuning platforms (OpenAI, Hugging Face, etc.).