Latitude caches prompt responses to improve performance and reduce costs. This guide explains how caching works and when it is applied.

How Caching Works

When you execute a prompt, Latitude automatically caches the response if certain conditions are met. The cache key is generated based on:

  • The workspace ID
  • The prompt configuration
  • The conversation context

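As a result, running a prompt again with the same configuration and the same conversation context in the same workspace returns the cached response. A change to any of these three inputs produces a different cache key.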
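To make the idea of a cache key concrete, here is a minimal sketch of how the three inputs above could be combined into one key. It is not Latitude's actual implementation: the function name `cache_key`, the JSON serialization, and the SHA-256 hash are all assumptions chosen for illustration.

```python
import hashlib
import json


def cache_key(workspace_id, config, conversation):
    """Hypothetical key derivation: hash the three inputs together.

    Sorting the dictionary keys makes the serialization deterministic,
    so two configurations with the same settings in a different order
    still map to the same key.
    """
    payload = json.dumps(
        {
            "workspace": workspace_id,
            "config": config,
            "conversation": conversation,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same inputs (in any key order) give the same key.
key_a = cache_key("ws-1", {"model": "example-model", "temperature": 0}, ["Hello"])
key_b = cache_key("ws-1", {"temperature": 0, "model": "example-model"}, ["Hello"])

# A different workspace gives a different key.
key_c = cache_key("ws-2", {"model": "example-model", "temperature": 0}, ["Hello"])
```

In this sketch `key_a` and `key_b` are equal, while `key_c` differs because the workspace ID is part of the hashed payload.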

Cache Conditions

Caching is only applied when:

  • The temperature is set to 0 or not specified
  • The prompt execution is successful

The temperature condition exists because non-zero temperatures introduce randomness into responses. Each execution is intended to be unique, so returning a cached response would be less useful.
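The two conditions can be expressed as a single check. The sketch below assumes the prompt configuration is a dictionary with an optional `temperature` entry; the function name `is_cacheable` and the `succeeded` flag are hypothetical and are not part of Latitude's API.

```python
def is_cacheable(config, succeeded):
    """Return True only when both cache conditions hold.

    Assumption: `config` is a dict and a missing "temperature" entry
    means the temperature was not specified.
    """
    temperature = config.get("temperature")
    deterministic = temperature is None or temperature == 0
    return bool(succeeded) and deterministic
```

For example, `is_cacheable({"temperature": 0.7}, True)` is false because of the temperature, and `is_cacheable({"temperature": 0}, False)` is false because the execution failed.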

Benefits

Caching provides several advantages:

  • Reduced Costs: Cached responses don’t consume additional API tokens
  • Faster Response Times: Cached results are returned immediately
  • Consistency: Identical prompts that meet the cache conditions return the same response

Cache Duration

Currently, cached responses are stored indefinitely. However, you can force a fresh execution by:

  • Modifying any part of the prompt configuration
  • Changing the conversation context
  • Using a non-zero temperature
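The behavior described above can be simulated with a small in-memory cache that never expires. This is an illustrative sketch only: `run_prompt`, `fake_model`, and the dictionary-based cache are assumptions made for the example and do not reflect how Latitude stores responses.

```python
import hashlib
import json

_cache = {}  # entries are never evicted, mirroring "stored indefinitely"


def run_prompt(workspace_id, config, conversation, call_model):
    """Return a cached response when allowed, otherwise call the model."""
    temperature = config.get("temperature")
    cacheable = temperature is None or temperature == 0

    serialized = json.dumps([workspace_id, config, conversation], sort_keys=True)
    key = hashlib.sha256(serialized.encode("utf-8")).hexdigest()

    if cacheable and key in _cache:
        return _cache[key]

    # If call_model raises, nothing is stored, so failures are not cached.
    response = call_model(config, conversation)
    if cacheable:
        _cache[key] = response
    return response


calls = []


def fake_model(config, conversation):
    """Stand-in for a real provider call; records each invocation."""
    calls.append(config)
    return f"response #{len(calls)}"


base = {"model": "example-model", "temperature": 0}
first = run_prompt("ws-1", base, ["Hello"], fake_model)
second = run_prompt("ws-1", base, ["Hello"], fake_model)       # same key, served from cache
third = run_prompt("ws-1", base, ["Hello again"], fake_model)  # new context, fresh call
```

Here `second` reuses the response stored for `first`, while `third` triggers a new call because the conversation context changed. Passing a configuration with a non-zero temperature would skip the cache on every call.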