Human-in-the-Loop (HITL) evaluations involve direct human review and assessment of prompt outputs. This method is essential for capturing nuanced judgments, user preferences, and criteria that are difficult for automated systems to evaluate.

  • How it works: Team members manually review prompt outputs (logs) and assign scores or labels based on their judgment.
  • Best for: Capturing nuanced human preferences, evaluating criteria difficult for LLMs to judge, initial quality assessment, creating golden datasets for other evaluation types.
  • Requires: Setting up manual review workflows and criteria for reviewers.

Because HITL evaluations require manual input, they do not support automatic live or batch execution the way LLM-as-Judge or Programmatic Rule evaluations do. Feedback must be submitted individually for each reviewed log.

Setup

1. Go to the evaluations tab

   Open the evaluations tab on a prompt in one of your projects.

2. Add evaluation

   In the top-right corner, click the “Add evaluation” button.

3. Choose Human-in-the-Loop

   Select the “Human-in-the-Loop” tab in the evaluation modal.

4. Choose a metric

   Pick one of the metrics described below.

Metrics

  • Binary: Judges whether the response meets the criteria. The resulting score is “passed” or “failed”.
  • Rating: Judges the response by rating it against a criterion. The resulting score is the rating.
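As a rough mental model, the two metrics produce differently shaped results. The TypeScript sketch below is purely illustrative; the type names and fields are assumptions, not Latitude’s actual result schema.

```typescript
// Illustrative only: hypothetical shapes for the two HITL metric results.
// Latitude's real schema may differ; check the API reference.
type BinaryResult = {
  metric: 'binary'
  score: 'passed' | 'failed' // whether the response met the criteria
  reason?: string            // optional reviewer comment
}

type RatingResult = {
  metric: 'rating'
  score: number              // e.g. a 1–5 rating given by the reviewer
  reason?: string
}

type HitlResult = BinaryResult | RatingResult
```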

Annotate logs in Latitude UI

Manually submitted results appear alongside other evaluation results:

  • Logs View: Attached to the individual log entry.
  • Evaluations Tab: Aggregated statistics and distributions for the HITL evaluation.

Capturing Feedback via API/SDK

See the guide on how to annotate a log. A log is the result of running your prompt, so a reviewer can annotate that result through the API or SDK to mark it as good or bad, or to assign a score.
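As a minimal sketch, the snippet below assumes the Latitude TypeScript SDK exposes an `evaluations.annotate` method that takes the log (conversation) UUID, a score, and the evaluation UUID. The method name, signature, and field names are assumptions here, so treat this as illustrative and confirm against the API reference.

```typescript
import { Latitude } from '@latitude-data/sdk'

// Hypothetical sketch: the annotate call shown here is an assumption,
// not the confirmed SDK surface. Check Latitude's docs before using.
const latitude = new Latitude(process.env.LATITUDE_API_KEY!, {
  projectId: 123, // your project id
})

async function annotateLog(conversationUuid: string, evaluationUuid: string) {
  // Submit a human judgment for a single log. For a binary metric the score
  // might be 1 (passed) or 0 (failed); for a rating metric, the rating value.
  await latitude.evaluations.annotate(conversationUuid, 1, evaluationUuid, {
    reason: 'Response fully answered the user question.',
  })
}
```

The same operation is also available over Latitude’s HTTP API if you are not using an SDK; the annotate-a-log guide linked above covers the exact endpoint and payload.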