Evaluation Templates
Learn about the LLM-as-judge evaluation templates that come preconfigured in Latitude
Overview
Latitude comes with a set of preconfigured evaluations to help you start evaluating your LLM outputs quickly.
Here is the full list of evaluation templates that Latitude comes preconfigured with:
- Adaptability: Evaluate how well the response adapts to user preferences or context
- Bias and Fairness: Assess whether the response is free of bias or unfair generalizations
- Coherence and Fluency: Evaluate the clarity and flow of the response
- Conciseness: Assess whether the response is brief but informative
- Consistency: Check if the response is consistent with prior information or context
- Creativity: Evaluate the originality and imagination shown in the response
- Domain Expertise: Assess the response for accuracy and knowledge in a specific domain
- Engagement or User Experience: Rate how well the response engages the user or enhances the conversation
- Error Handling and Recovery: Evaluate how well the response corrects user errors or misunderstandings
- Ethical Compliance: Determine if the response follows ethical standards
- Explainability: Rate how clearly the response explains the concept or information
- Factuality: Evaluate whether the response is factually accurate
- Faithfulness to Instructions: Assess how well the response follows the given instructions
- Formality and Style: Evaluate whether the response matches the desired formality or style
- Hallucination Detection: Detect if the response introduces unsupported or false information
- Harmlessness and Ethical Considerations: Check if the response promotes ethical and non-harmful behavior
- Helpfulness and Informativeness: Rate how helpful and informative the response is
- Humor or Emotional Understanding: Rate whether the response appropriately uses humor or addresses emotional content
- Long-Term Consistency (in Multi-turn Dialogues): Check if the response remains consistent over multiple turns of dialogue
- Novelty: Assess the originality of the response in its content or style
- Persuasiveness: Rate how convincing the response is
- Redundancy: Check if the response repeats information unnecessarily
- Relevance: Rate how well the response addresses the given context or query
- Response Time or Latency: Measure whether the response time is suitable for real-time interaction
- Satisfaction: Rate overall satisfaction with the response
- Specificity: Evaluate how specific and relevant the response is to the query
- Toxicity and Safety: Check if the response contains harmful or inappropriate content
- Uncertainty or Confidence: Evaluate if the response expresses appropriate confidence or acknowledges uncertainty
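Each template above pairs a name with a criterion for a judge model to apply to a response. As a rough illustration of that idea only (this is not Latitude's template format or API), the Python sketch below represents a template as a hypothetical `JudgeTemplate` dataclass, renders it into a judge prompt, and parses a `SCORE: <n>` line from the judge's reply. The 1-to-5 scale, the prompt wording, and the reply format are all assumptions, and the actual call to a judge model is left out and replaced by a canned reply.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeTemplate:
    """Hypothetical LLM-as-judge template (an assumption, not Latitude's schema)."""

    name: str
    criterion: str
    min_score: int = 1  # assumed scale; real templates may score differently
    max_score: int = 5

    def render(self, user_input: str, response: str) -> str:
        """Build the prompt that would be sent to the judge model."""
        return (
            f"You are evaluating an AI response for: {self.name}.\n"
            f"Criterion: {self.criterion}.\n"
            f"Rate the response on a scale from {self.min_score} to {self.max_score}.\n"
            "Reply with a line of the form 'SCORE: <n>' followed by a short reason.\n\n"
            f"User input:\n{user_input}\n\n"
            f"Response:\n{response}\n"
        )

    def parse_score(self, judge_reply: str) -> int:
        """Extract the integer score from the judge's reply and check its range."""
        match = re.search(r"SCORE:\s*(-?\d+)", judge_reply)
        if match is None:
            raise ValueError("judge reply has no 'SCORE: <n>' line")
        score = int(match.group(1))
        if not self.min_score <= score <= self.max_score:
            raise ValueError(
                f"score {score} outside [{self.min_score}, {self.max_score}]"
            )
        return score


# Criterion text taken from the Conciseness template description above.
CONCISENESS = JudgeTemplate(
    name="Conciseness",
    criterion="Assess whether the response is brief but informative",
)

prompt = CONCISENESS.render("What is 2 + 2?", "2 + 2 equals 4.")
# The judge model call is out of scope here; a canned reply stands in for it.
score = CONCISENESS.parse_score("SCORE: 5\nThe answer is short and correct.")
print(score)  # 5
```

Requiring a fixed `SCORE:` line and validating its range makes malformed judge output fail loudly instead of silently producing a misleading metric.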
Custom evaluations
You can also create your own custom LLM-as-judge evaluations from scratch. Read the docs on custom evaluations to learn more.