Export Latitude datasets to prepare data for fine-tuning language models.
While Latitude focuses on prompt engineering and evaluation, the datasets you create and curate within the platform can be valuable assets for fine-tuning language models using external tools and services. These datasets are well suited to fine-tuning for several reasons:

- **Curated Data**: Datasets often contain carefully selected inputs paired with high-quality outputs (either expected outputs or actual model responses that were reviewed manually).
- **Real-World Examples**: Datasets created from production logs reflect actual user interactions.
- **Structured Format**: Latitude datasets are already stored in a structured format (CSV), which makes them straightforward to process for fine-tuning.
Once exported, you’ll likely need to transform the CSV data into the specific format required by your chosen fine-tuning platform or library (e.g., OpenAI’s JSONL format, the Hugging Face Datasets format). Common steps include:
- **Selecting Columns**: Identify the columns containing the input prompt/context and the desired completion/output.
- **Formatting**: Convert each row into the required structure. For example, for OpenAI fine-tuning you might create JSON objects like:
{"prompt": "<Input from CSV column A>", "completion": "<Output from CSV column B>"}// or for chat models:{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "<Input>"}, {"role": "assistant", "content": "<Output>"}]}
- **Data Cleaning**: Review the data for quality and consistency, and remove any low-quality or irrelevant examples.
- **Splitting Data**: You might need to split your exported dataset into training and validation sets (see the sketch after this list).
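As a concrete illustration, here is a minimal sketch in Python using pandas that converts an exported CSV into OpenAI-style chat JSONL and holds out a validation split. The file name `latitude_export.csv` is a placeholder, and the column names `input_query` and `high_quality_response` are assumptions matching the example dataset discussed below; adjust all three to your own export.

```python
import json

import pandas as pd

# Placeholder file name -- point this at your Latitude CSV export.
df = pd.read_csv("latitude_export.csv")

# Shuffle deterministically, then hold out 10% of rows for validation.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
split = int(len(df) * 0.9)

def write_jsonl(rows: pd.DataFrame, path: str) -> None:
    """Write one chat-format fine-tuning example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for _, row in rows.iterrows():
            example = {
                "messages": [
                    # Assumed column names -- rename to match your dataset.
                    {"role": "user", "content": row["input_query"]},
                    {"role": "assistant", "content": row["high_quality_response"]},
                ]
            }
            f.write(json.dumps(example, ensure_ascii=False) + "\n")

write_jsonl(df.iloc[:split], "train.jsonl")
write_jsonl(df.iloc[split:], "validation.jsonl")
```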
Consult the documentation of your specific fine-tuning tool or platform for detailed formatting requirements.
Imagine you have a Latitude dataset created from manually reviewed chat logs, with columns `input_query` and `high_quality_response`.
1. **Export**: Download this dataset as a CSV from Latitude.
2. **Transform**: Write a script (e.g., Python with pandas) to read the CSV and convert each row into the JSONL format required by the fine-tuning API you plan to use.
3. **Fine-tune**: Upload the formatted JSONL file and run the fine-tuning job using the provider’s tools (see the sketch after this list).
4. **Evaluate**: After fine-tuning, you can even evaluate the new model’s performance back in Latitude by configuring it as a new provider/model and running evaluations against your datasets.
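If your provider is OpenAI, the fine-tune step might look like the following sketch using the OpenAI Python SDK (v1.x); other providers expose analogous APIs. The base model name is illustrative, and the JSONL file names are the ones produced by the transform sketch above.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training and validation files produced by the transform step.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
validation_file = client.files.create(file=open("validation.jsonl", "rb"), purpose="fine-tune")

# Start the fine-tuning job; the base model name is illustrative.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```

Once the job completes, the resulting model ID can be configured in Latitude as a new model for the evaluate step.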
By leveraging the data curation work already done in Latitude, you can streamline the preparation process for fine-tuning models on specialized tasks.