- How it works: Team members manually review prompt outputs (logs) and assign scores or labels based on their judgment.
- Best for: Capturing nuanced human preferences, evaluating criteria difficult for LLMs to judge, initial quality assessment, creating golden datasets for other evaluation types.
- Requires: Setting up manual review workflows and criteria for reviewers.
Because HITL evaluations require manual input, they do not support automatic
live or batch execution like LLM-as-Judge or Programmatic Rules. Feedback
must be submitted individually for each log reviewed.
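To make the constraint concrete: each reviewed log produces exactly one submitted result, and nothing is scored automatically in the background. The sketch below is purely illustrative; `recordHumanFeedback`, the identifiers, and the payload shape are hypothetical placeholders rather than a Latitude API, since in practice reviewers submit this feedback through the UI, one log at a time.

```typescript
// Hypothetical sketch of the one-result-per-reviewed-log model.
// `recordHumanFeedback` and all identifiers here are placeholders, not a Latitude API.
type ReviewedLog = { logUuid: string; verdict: 'passed' | 'failed'; reason?: string };

function recordHumanFeedback(evaluationId: string, review: ReviewedLog): void {
  // Stand-in for the manual submission a reviewer performs in the Latitude UI.
  console.log(
    `[${evaluationId}] ${review.logUuid}: ${review.verdict} (${review.reason ?? 'no reason given'})`
  );
}

const reviews: ReviewedLog[] = [
  { logUuid: 'log-001', verdict: 'passed', reason: 'Accurate, concise answer' },
  { logUuid: 'log-002', verdict: 'failed', reason: 'Ignored the user constraint' },
];

// Feedback is attached log by log; there is no live or batch trigger.
for (const review of reviews) {
  recordHumanFeedback('hitl-quality-check', review);
}
```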
Setup
1. Go to evaluations tab: Open the evaluations tab on a prompt in one of your projects.
2. Add evaluation: In the top right corner, click the “Add evaluation” button.
3. Choose Human-in-the-Loop: Select the “Human-in-the-Loop” tab in the evaluation modal.
4. Choose a metric: Pick one of the metrics described below.

Metrics
- Binary: Judges whether the response meets the criteria. The resulting score is “passed” or “failed”.
- Rating: Judges the response by rating it against the criteria. The resulting score is the rating itself.
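The two metrics produce differently shaped results. As a rough sketch of the distinction (the type and field names below are illustrative assumptions, not Latitude's actual data model):

```typescript
// Illustrative only: what a single HITL result might look like per metric.
// Names and fields are assumptions, not Latitude's actual data model.
type BinaryResult = { metric: 'binary'; score: 'passed' | 'failed'; reason?: string };
type RatingResult = { metric: 'rating'; score: number; reason?: string };

const binaryExample: BinaryResult = {
  metric: 'binary',
  score: 'passed',
  reason: 'Answer matches the reference and follows the instructions',
};

const ratingExample: RatingResult = {
  metric: 'rating',
  score: 4, // the rating itself, e.g. on a 1-5 scale
  reason: 'Mostly correct, slightly verbose',
};
```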
Annotate logs in Latitude UI
Manually submitted results appear alongside other evaluation results:
- Logs View: Attached to the individual log entry.
- Evaluations Tab: Aggregated statistics and distributions for the HITL evaluation.
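The aggregated statistics shown in the Evaluations tab can be thought of as summary numbers computed over those individually submitted results. The following is a minimal, self-contained sketch of that idea, not Latitude's actual implementation:

```typescript
// Illustrative aggregation over individually submitted HITL results.
type BinaryResult = { metric: 'binary'; score: 'passed' | 'failed' };
type RatingResult = { metric: 'rating'; score: number };
type HitlResult = BinaryResult | RatingResult;

function summarize(results: HitlResult[]) {
  const binary = results.filter((r): r is BinaryResult => r.metric === 'binary');
  const ratings = results.filter((r): r is RatingResult => r.metric === 'rating');

  return {
    total: results.length,
    // Share of binary results marked "passed"
    passRate: binary.length
      ? binary.filter((r) => r.score === 'passed').length / binary.length
      : undefined,
    // Mean of submitted ratings
    averageRating: ratings.length
      ? ratings.reduce((sum, r) => sum + r.score, 0) / ratings.length
      : undefined,
  };
}

// Example: two pass/fail judgments and one rating.
console.log(summarize([
  { metric: 'binary', score: 'passed' },
  { metric: 'binary', score: 'failed' },
  { metric: 'rating', score: 4 },
]));
// -> { total: 3, passRate: 0.5, averageRating: 4 }
```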