Changelog

2025-06-04

Numeric similarity evaluation metric

We’ve added a new evaluation metric called Numeric similarity. This metric allows you to compare the numeric values of two responses and determine how similar they are.

Realtime log updates toggle

We’ve added a toggle to enable/disable realtime log updates in the logs section. This was a much needed feature for users who were generating thousands of logs per minute.

Export all logs

Logs selector in the logs section now allows you to select any arbitrary combination of logs (previously it was only the logs visible in the current page). This feature works with both the Download logs and “Export to dataset” features.Exports that contain many logs will happen in the background and an email will be sent to the user when the export is complete.

We’ve added a new vertical sidebar to the app, that now gives you quick in-context access to all the documentation pages. You’ll also see little documentation buttons spread throughout the app, that will open the corresponding documentation page.

psst.. you might notice a sneak-peak to an upcoming feature we’re are working on in this screenshot. Stay tuned!

Per-subscription rate limits

We’ve added per-subscription rate limits to the API. Previously, all users had a flat rate limit of 10 requests per second. Now, paying customers enjoy rate limits of up to 10000 requests per minute.

Reworked examples documentation

We’ve put lots of effort into improving the examples documentation. With many examples of common use cases. Please, give it a look!

Consolidated evaluations API

We’ve deprecated the trigger and createResult methods in the Evaluations API and consolidated them into a single endpoint annotate endpoint.

Performance and stability improvements

In the past few weeks we’ve seen increased usage of the platform, putting a strain in our systems. As a results, we’ve started a company-wide effort to improve the performance and stability of our platform. We’re working on a number of improvements, including:

Improving the performance of the logs section
Improving the performance of the prompt editor
Improving the performance of the project overview section
Preparing the platform to scale for the future

Some concrete improvements have already been implemented, whereas others that require more work are still in progress. We’re committed to making the platform as fast and reliable as possible, and we expect this initiative to conclude by early next month, so stay tuned for more updates!

2025-05-07

Experiments

We’ve added a new Experiments tab to the playground. Now, whenever you run an experiment, you’ll be able to see the results in a dedicated tab that groups all the runs you’ve made for that experiment and it lets you compare them side by side.Additionally, when setting up an experiment, you can now add variants to quickly test different models, providers, and temperature settings; as well as run many instances of the same prompt without having to select a dataset.

Evaluations rework

We’ve completely revamped the evaluations section, here are some of the changes:

Evaluations can now be accessed from the sidebar, whenever you select a prompt you’ll see the evaluations associated with it.
You can now edit and test the prompt of LLM-as-Judge evaluations in a fully featured prompt playground
Now you can set a minimum (or maximum) scoring threshold for an evaluation to pass
Redesigned annotation experience for Human-in-the-Loop evaluations:
- You can now manually evaluate logs for several evaluations inside the logs table with all the filtering options
- You can now leave instructions to the annotators to help them make the right decision
Improved prompt refiner and suggestions
- Refiner now optimizes for a lower or higher evaluation score depending if the evaluation is negative
- New smarter algorithm to select the best evaluation results to improve the prompt
- You can now disable automatic suggestions from specific evaluations
You can now pass custom instructions to guide the evaluation generator
Evaluations are now versioned along with the project

You might remember from our changelog last month that we added a new evaluation type called “Rule-based evaluation”. This evaluation type allows you to create custom evaluations by defining a set of rules. Each rule can have a threshold and a metric.Since we have a lot of evaluation types, here’s a refresher of the different types and what they do:

Programmatic Rule: Evaluate responses using algorithmic metrics

Exact Match: Checks if the response is exactly the same as the expected output
Regular Expression: Checks if the response matches the regular expression
Schema Validation: Checks if the response follows the schema
Length Count: Checks if the response is of a certain length
Lexical Overlap: Checks if the response contains the expected output
Semantic Similarity: Checks if the response is semantically similar to the expected output

LLM-as-a-Judge: Evaluate responses using an LLM as a judge

Binary: Judges whether the response meets the criteria
Rating: Judges the response by rating it under a criteria
[New] Comparison: Judges the response by comparing the criteria to the expected output
Custom: Judges the response under a criteria using a custom prompt

Human-in-the-Loop: Evaluate responses using a human as a judge

Binary: Judges whether the response meets the criteria. The resulting score is “passed” or “failed”
Rating: Judges the response by rating it under a criteria. The resulting score is the rating

2025-04-21

New Onboarding

A fresh onboarding experience for newcomers
PR #1161

Quality of live Improvements (part 2)

We continue last changelog’s streak of minor updates to the general UX with the following updates:

Massively improved performance of editor in playground
Improve performance of overview section
Fixed charts displaying dates in timestamp format
Added more details to error when synthetic dataset generation preview fails
Added an “Improve my prompt” button to trigger a general suggestion improvement from Latitude’s Copilot
Improved startup latency of MCP server integrations

Google SSO support

Added support for Google single-sign-on (SSO)

Added several third-party MCP server integrations

Added 20+ new integrations from the official list of third-party MCP servers
PR 1150

Removed Telemetry and Traces section

We’ve removed this feature because of low usage.

New docs

We’ve revamped our docs in both structure and content.

2025-04-07

Added new providers

Added support for new providers: xAI, DeepkSeek, Amazon Bedrock, and Perplexity
PR #1075

Quality of life improvements

We have made several improvements to the quality of life of our users with a slew of small but important fixes:

In sidebar when creating folder/prompt save it on blur #1044
If theres only one folder in the project, display it open #1051
When creating a prompt put focus in the editor #1052
When open evaluations close parameters. And the other way #1055
Collapsed/Expanded parameters are broken #1000
Move to the center prompt Add message/Reset in playground #1056
Rename “Publish prompt” to “Share” in playground #1060
Move toasts to the left side #1069
PR #1078

Improved public Docker images

Shaved several GBs off the public Docker images
PR #1068

Other Updates

Added several new MCPs to the list of supported integrations.
Improved parameter mapping by using selected dataset columns as a fallback.
Refactored jobs to remove meta programming, potentially fixing a memory leak.
Introduced an impersonation feature for admin users.
Addressed a build issue with heapdump in amd64 environments.
Improved SDK’s error handling to address silent failures.

2025-03-28

New evaluation types

We’ve added a new evaluation type called “Rule-based evaluation”. This new evaluation type allows you to create custom evaluations by defining a set of rules. Each rule can have a threshold and a metric.Within this new evaluation type, we’ve added the following metrics:

Exact Match: Checks if the response is exactly the same as the expected output. The resulting score is “matched” or “unmatched”.
Regular Expression: Checks if the response matches the regular expression. The resulting score is “matched” or “unmatched”.
Schema Validation: Checks if the response follows the schema. The resulting score is “valid” or “invalid”.
Length Count: Checks if the response is of a certain length. The resulting score is the length of the response.
Lexical Overlap: Checks if the response contains the expected output. The resulting score is the percentage of overlap.
Semantic Similarity: Checks if the response is semantically similar to the expected output. The resulting score is the percentage of similarity.

Revamped dataset management

We’ve completely revamped the dataset management section to allow you to create, edit, and delete rows. You can also mark a column as a label or parameter and change its name.Additionally, now it’s very easy to create a dataset from a set of logs. Just go to the logs section, select the logs you want to add to the dataset and click on the “Save logs to dataset” button. You can also add rows to an existing dataset from the logs section.

Webhooks

We’ve added a new webhooks section in the settings page. This section allows you to create webhooks that are triggered when certain events happen in Latitude. Currently, we support the following event but we’ll be adding more in the future:

commitPublished: Triggered when a commit is published in a project. The webhook payload includes details about the commit, such as the commit ID, message, and project information.

Other stuff

We added support for new MCP servers:
- Exa
- Apify
- Audiense
- Redis
- Jira
- Attio
- Supabase

If you’re interested in adding support for a new MCP server, please let us know!

2025-03-17

Integrations

We’ve added a new integrations section in the settings page. This section allows you to connect your Latitude prompts to other tools and services such as Figma, Notion, Slack, and more. We’ve started with 20+ integrations and we are constantly adding more.

Under the hood, integrations are just MCP servers we automatically manage for you, so you will see a deployment status and some deploy-related logs in the integrations section. Most MCP servers take just a few seconds to deploy. We also have a special integration type to connect your prompts to existing MCP servers that you might be hosting yourself.Integrations can easily be made available to your prompts via the integrations section in the playground.

This feature is specially useful for agents that perform long running tasks and need access to several tools and services. For example, here’s an agent that surfaces the web for some AI-related news and sends a summary to a Slack channel.videoThe prompt is triggered everyday at 9AM, which brings us to the next topic…

Triggers

We’ve added support for triggers in the playground. Triggers are a new way to trigger your prompts based on certain heuristics. To start we’ve implemented 2 type of triggers:

Time-based triggers: these triggers are triggered based on a specific time of day, for example, you can trigger your prompts at 9AM everyday.
Email-based triggers: these triggers are triggered based on a specific email address, for example, you can trigger your prompts when a specific email address receives an email.

Subagents

We’ve added support for subagents. Subagents are a way for agents to delegate tasks to other agents. This allows you to create complex agents that can handle more complex tasks.

Other stuff

Drag and drop files/folders from the sidebar
Improved editor error reporting
Improved agentic behavior of agents running in Latitude
Added option to stop streaming from UI and Typescript SDK
General performance and stability improvements

2025-02-20

Built-in tools

We’ve added a set of built-in tools that are available in the playground.

Web search: if enabled, the model will be able to search the web for information.
Extract web content: great when paired with the web search tool, this will allow you to extract the content of a website.
Run code: if your prompt requires some code execution, you can now activate this tool and the model will be able to generate and run code in a sandboxed environment.

We’re working on adding more tools in the future, so if you have any suggestions, please let us know!

Playground improvements

Now you can see the results of your evaluations right there after running a prompt, and access the evaluation editor directly with that log already imported. This simplifies the workflow of iterating on both your prompts and your evaluations.

Additionally, we added a new prompt configuration section that allows you to tweak values like temperature, set up limits, and activate our built-in tools very easily.

Automatic refiner suggestions

We’ve added a new feature that suggests changes to your prompts based on the results of your evaluations, automatically. Now, whenever you run a promptfrom the playground or the APIthat has an evaluation connected, you’ll see suggestions appear next to the Refine button in the prompt editor. These suggestions are directly based on the evaluation results, so you can be sure they’ll improve your prompt’s quality.

Other stuff

General improvements to stability and performance
Updated list of available models
Anthropic’s cache configuration now works as expected
New endpoint to fetch all available prompts for a project
We added Vertex AI as a new provider

2025-01-22

Agents

We’re very happy to introduce the first version of Latitude Agents. An agent is a new way to build LLM-powered applications that can interact with the world and perform tasks.The easiest way to think about agents is as a prompt that runs in a loop and has:

Some instructionsincluding a goal and some contextjust like any other prompt
A set of tools that it can use to interact with the world
A tool the agent can call to signal that it’s finished its work

The agent runs in a loop, using the tools available to it to achieve its goal.

This is the first step in our agent journey. We’re already working on:

Adding support for using other agents as tools
Supporting code execution as a tool
Building a library of built-in tools ready to use (web browsing, search, etc.)
Agent tracing
Triggers

We’ll keep iterating on agents based on your feedback, so let us know what you think!Read more about agents in our docs.

Full support for tool calling

We’ve added full support for tool calling in Latitude. You can now add tools to your prompts and test them directly on the playground.Now, when a model calls a tool, the playground will prompt you to fill in the tool response so that the conversation can continue. You can either fill in the tool response manually or use our mock tool responses to test your prompts. These mock tool responses are AI-generated, so they are a great way to test your prompts without having to manually fill in the tool responses.Running prompts with tool calls in batch mode is also supported, with mock tool responses for each tool call.

Python SDK

We shipped the new Python SDK. Now you can interact with our API from your Python applications.It ships with support for:

Running prompts through our gateway
Pushing logs to Latitude
Telemetry, automatic tracing of your LLM calls

We’re also working on adding support for compiling prompts locally, so you can run them from your code. Stay tuned!Here’s the documentation to get started with the Python SDK.

Other stuff

General improvements to stability and performance
Added revert/reset commit actions to the Project’s History
Added full support for OpenAI’s o1 model family
The response of a step now is automatically parsed as JSON if the JSON output schema is defined.
Lots of tool-related fixes

2024-12-18

Latitude Telemetry

We’ve released the 1.0 version of Latitude’s Typescript SDK, which ships with support for OpenLLMetry. You can now trace all your LLM generations with a couple lines of code and start using Latitude’s powerful prompt manager by creating prompts from your traces.

Here’s the full list of supported providers/sdks at launch:

And here’s the documentation to get started with Telemetry.

Version history

We’ve added a new history section in projects. It displays the full history of prior committed versions of your project.

From a prompt, you can easily inspect its history by clicking on the history button at the top of the editor.

File / Image prompt parameters

We’ve introduced support for “file” content-type in PromptL, unlocking OpenAI input_audio audio files, Anthropic document PDF files and Google fileData audio, video and document files.And since Latitude uses PromptL, you can now upload files and images to prompts as prompt parameters.

Public prompts also support this feature, btw.

Other stuff

General improvements to stability and performance

2024-12-12

Welcome! A bit of a lighter update as we gear for a last big suprise before christmas, next week. Here’s this week’s highlights:

Create prompts from documents

Following up from the previous feature, we now allow you to create prompts by simply uploading a document. This is a great way to quickly create prompts from a document without having to write the prompt yourself.

Log Filters

We’ve added a new feature that allows you to filter your logs by a set of commonly requested criteria.

Keep an eye out for more filters to come!

Other stuff

We’ve increased our keep alive timeout setting to 10 minutes in order to match it with OpenAI’s default setting. This helps when you are trying to generate a big JSON response. That being said, we always recommend using streaming responses when you expect a large response, as it effectively means you have limitless timeout.
Updated some old documentation
Released a new homepage!
General improvements to stability and performance

2024-12-04

Public prompts

We have released a new feature that allows you to share your prompts with the world. Here’s an example of a public prompt we’ve created.To share your prompts, simply click the new share button in the prompt editor.

Export logs

You can now select logs from the logs section and automatically create a dataset from them or download them as a CSV file.

This allows you to easily create golden datasets of parameters for your prompts and use them to test new prompt iterations at scale.

Collapsible parameters

We have updated the prompt preview in our prompt editor. Now parameter values are automatically collapsed and can be expanded with a simple click.

Overview page

We have added a new project overview page that gives you a quick overview of the project’s overall cost, prompts, and evaluations.

Compile Latitude prompts locally

You can now compile Latitude prompts straight from your code using our SDK. This allows you to still use Latitude’s prompt editor, for iterating and evaluating your prompts at scale, while maintaining your existing provider integrations to run the prompts from your code. docs.

const { config, messages } = await latitude.prompts.render({
  prompt,
  parameters: {
    // Any parameters your document expects
  },
})

const response = await openai.chat.completions.create({
  ...config,
  messages,
})

Other improvements

Added tool calls to LLM-as-judge evaluations’ context
Fixed datasets containing JSON not working correctly in some scenarios
General improvements in stability and performance

2024-11-26

PromptL activated in Latitude

We have implemented the new version of our template syntax �a0PromptL �a0in Latitude. As a reminder, PromptL is a new template syntax that has native support for html/xml tags, contextless chain steps, as well as a slew of other improvements that make it writing prompts in Latitude the best way to write prompts. docs

Refine prompt directly from evaluation logs

One of the most powerful features of Latitude is the ability to improve your prompts based on results from evaluations �a0we call it Refiner. We have now made this process easier by directly allowing users to choose evaluation results from the evaluations page and trigger the refiner from there.

Evaluation results in Logs

You can now see evaluation results in the logs section of your prompts. For each log that has an evaluation result associated, the result will show up in the details section of that log.

Other improvements

You can now edit a version title and description before publishing it
You can now rename projects
Several improvements in stability and performance

2024-11-20

Human / Code evaluations

We have released a new type of evaluations: manual / code evaluations. This new evaluation type allows users to evaluate their LLM outputs with human feedback or code-based evaluations, and push the results to Latitude using our SDKs/API.You can also submit results directly from Latitude’s UI.

docs

New prompt template syntax

We have open sourced the new version of our prompt templating syntax and we’ve even given it a new name: PromptL. This new syntax introduces some highly requested features such as support for html/xml tags without needing to escape them, chain steps with custom contexts, and more.

<step as='researchPhase' provider='OpenAI' model='gpt-4'>
  <user>Research key points about {{ topic }} and create an outline.</user>
</step>

<step as='writing' isolated='true' temperature='0.7'>
    <user>
        Using this outline: {{ researchPhase.outline }}
        Write a detailed article.
    </user>
</step>

The new syntax will be enabled to all new prompts in Latitude by default starting Monday 25th November. Since the new syntax is not compatible with the old one, existing prompts will not get automatically upgraded to the new syntax and users will be in charge of updating them.

New parameters section for prompts

We have revamped our parameters section in prompts and introduced some highly requested features. Users can now choose between inputing parameters manually, from datasets, or from existing prompt logs. Moreover, any choice they make in any of these sections gets automatically stored in session so that you don’t lose track of the latest inputs you chose if you ever navigate to another section and later come back.

Prompt analytics

We have added some key metrics to the logs section of your prompts. You can now see at a glance the number of prompt runs, average latency and cost, and more.

Default provider and models

We have added a new section in the settings page where you can set default providers and models for your prompts. This allows you to quickly change the default settings for your prompts without having to go through the prompt creation flow every time.

More improvements

You can now get and create prompts from the SDK/API docs
You can now eject from simple LLM-as-judge evaluations into more complete advanced evaluations that give you complete control over the evaluation prompt
Updated UI code snippets on how to push logs and evaluations to Latitude
Several improvements in infrastructure stability and performance
Several improvements and fixes to UI/UX

2024-11-13

New evaluations playground

We have completely revamped our evaluations to make it super simple to create new evaluations from scratch. From now on youll only need to worry about typing the goal of your evaluationas well as any additional instructions that might be useful.

Latitude Cookbook

Weve started work on Latitudes Cookbook showcasing common use cases with Latitudes SDK. Here you can find the first examples.

Anthropic cache

We have added support for Anthropics prompt caching beta feature.

Rust SDK

Our community member @Dominik Spitzli has implemented a Rust port of Latitudes SDK!

Latitude Typescript SDK v1 released

Weve released the first major version of Latitudes SDK, v1.0.0, currently in beta. It adds support for evaluations, pushing logs, JSON API, and more.

Other improvements

Dramatically improved performance of the prompt editor on large prompts
Improved error reporting in the prompt editor
Long-lived modals no longer close on click-outside or hitting ESC key
Prompt input parameters are now stored in memory so that you can navigate to other sections and come back without losing the latest inputs you used in a specific prompt

2024-11-06

Upload external logs

Users have long asked us to evaluate their prompts without having to run them via Latitudes Gateway. Well, we now support this use case. You can now upload external logs to Latitude for evaluation so that, even if you run your prompts outside of Latitude, you can keep tracking their performance. We support uploading logs to Latitude both from the UI and our SDK/HTTP API.

Trigger evaluations from SDK

In cases where AI agents have long running conversations with users users only want to evaluate the agents performance at particular points in time (i.e when the conversation has finished). You can now trigger evaluations from our SDK / HTTP API, giving you the tools to trigger evaluations at the precise moment you require it.

JSON API

Weve released the v2 version of our Gateway API, which supports non-streaming responses for the run and chat endpoints. We’ve also released the v1 major version of our SDK, which introduces support for the new HTTP API version, as well as the features above described.

Other improvements

Improved performance of prompt editor in large prompts
Added code examples on how to use the SDK to the OSS repository
Improved and fixed documentation in several places
Several performance and stability improvements

Getting started

Prompts

Agents

Evaluations

Datasets

Experiments

Deployment

Self-Hosting

Support

​Numeric similarity evaluation metric

​Realtime log updates toggle

​Export all logs

​Documentation sidebar

​Per-subscription rate limits

​Reworked examples documentation

​Consolidated evaluations API

​Performance and stability improvements

​Experiments

​Evaluations rework

​New Onboarding

​Quality of live Improvements (part 2)

​Google SSO support

​Added several third-party MCP server integrations

​Removed Telemetry and Traces section

​New docs

​Added new providers

​Quality of life improvements

​Improved public Docker images

​Other Updates

​New evaluation types

​Revamped dataset management

​Webhooks

​Other stuff

​Integrations

​Triggers

​Subagents

​Other stuff

​Built-in tools

​Playground improvements

​Automatic refiner suggestions

​Other stuff

​Agents

​Full support for tool calling

​Python SDK

​Other stuff

​Latitude Telemetry

​Version history

​File / Image prompt parameters

​Other stuff

​Create prompts from documents

​Log Filters

​Other stuff

​Public prompts

​Export logs

​Collapsible parameters

​Overview page

​Compile Latitude prompts locally

​Other improvements

​PromptL activated in Latitude

​Refine prompt directly from evaluation logs

​Evaluation results in Logs

​Other improvements

​Human / Code evaluations

​New prompt template syntax

​New parameters section for prompts

​Prompt analytics

​Default provider and models

​More improvements

​New evaluations playground

​Latitude Cookbook

​Anthropic cache

​Rust SDK

​Latitude Typescript SDK v1 released

​Other improvements

​Upload external logs

​Trigger evaluations from SDK

​JSON API

​Other improvements

Numeric similarity evaluation metric

Realtime log updates toggle

Export all logs

Documentation sidebar

Per-subscription rate limits

Reworked examples documentation

Consolidated evaluations API

Performance and stability improvements

Experiments

Evaluations rework

New Onboarding

Quality of live Improvements (part 2)

Google SSO support

Added several third-party MCP server integrations

Removed Telemetry and Traces section

New docs

Added new providers

Quality of life improvements

Improved public Docker images

Other Updates

New evaluation types

Revamped dataset management

Webhooks

Other stuff

Integrations

Triggers

Subagents

Other stuff

Built-in tools

Playground improvements

Automatic refiner suggestions

Other stuff

Agents

Full support for tool calling

Python SDK

Other stuff

Latitude Telemetry

Version history

File / Image prompt parameters

Other stuff

Create prompts from documents

Log Filters

Other stuff

Public prompts

Export logs

Collapsible parameters

Overview page

Compile Latitude prompts locally

Other improvements

PromptL activated in Latitude

Refine prompt directly from evaluation logs

Evaluation results in Logs

Other improvements

Human / Code evaluations

New prompt template syntax

New parameters section for prompts

Prompt analytics

Default provider and models

More improvements

New evaluations playground

Latitude Cookbook

Anthropic cache

Rust SDK

Latitude Typescript SDK v1 released

Other improvements

Upload external logs

Trigger evaluations from SDK

JSON API

Other improvements