What are human-in-the-loop evaluations?

Human-in-the-loop (HITL) evaluations give you full control over assessing the quality of your LLM outputs by incorporating manual, human-generated feedback. Unlike automated methods, these evaluations require direct human intervention and therefore do not support live or batch evaluation modes.

Using humans to judge

Use cases

Human-in-the-loop evaluations are best suited for scenarios where:

  • Hard Verification: You require detailed human insight to judge response quality, accuracy, or appropriateness.
  • User Feedback: The evaluation criteria involve unique, user-specific feedback that cannot be automated.
  • Complex Criteria: The criteria are too nuanced or multifaceted for algorithmic or LLM-based evaluations.

Trade-offs

While human-in-the-loop evaluations provide valuable and nuanced insights, they come with certain considerations:

  • Manual Intervention: Every result must be reviewed and submitted by a person, so live and batch evaluation modes are not supported.
  • Subjectivity: Feedback may vary between evaluators.
  • Resource Intensive: They are slower and require more resources than automated methods.

Creating a human-in-the-loop evaluation

You can create a human-in-the-loop evaluation by clicking the “Add evaluation” button in the Evaluations tab of your prompt. Select “Human-in-the-loop” as the evaluation type and configure the evaluation result you expect.

To learn more about how to run evaluations, check out the Running evaluations guide.

Submitting evaluation results through the dashboard

Every log, whether it comes from the API, the playground, or a dataset, appears in the evaluation’s dashboard. You can then review each log and submit the evaluation result from there.
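For instance, a response generated through the SDK produces a log that shows up in that dashboard for manual review. The sketch below is a minimal TypeScript illustration under assumptions: the `prompts.run` call, the project id, the prompt path, and the parameters are placeholders, so check the SDK reference for the exact API.

```typescript
// Minimal sketch, assuming the TypeScript SDK's `prompts.run` method.
// The project id, prompt path, and parameters below are placeholders.
import { Latitude } from '@latitude-data/sdk'

const latitude = new Latitude(process.env.LATITUDE_API_KEY!, {
  projectId: 123, // hypothetical project id
})

// Running a prompt through the SDK creates a log, which then appears in the
// human-in-the-loop evaluation's dashboard waiting for a reviewer.
const result = await latitude.prompts.run('onboarding/welcome-email', {
  parameters: { userName: 'Ada' },
})

// Keep the conversation identifier if you also plan to push results via the API.
console.log(result?.uuid)
```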

Submitting evaluation results through the API or SDK

You can push evaluation results using Latitude’s API or SDK, allowing you to integrate custom user feedback into your evaluation workflow.
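As an example, imagine your application collects thumbs-up/thumbs-down feedback from end users and you want to record it as the result of a human-in-the-loop evaluation. The TypeScript sketch below assumes an `evaluations.annotate`-style method and placeholder identifiers (`conversationUuid`, `evaluationUuid`); the exact method name, signature, and score format may differ, so check the SDK and HTTP API references.

```typescript
// Minimal sketch, assuming the TypeScript SDK exposes an annotate-style method
// for pushing human feedback as an evaluation result. The method name, its
// signature, and every identifier below are illustrative assumptions.
import { Latitude } from '@latitude-data/sdk'

const latitude = new Latitude(process.env.LATITUDE_API_KEY!, {
  projectId: 123, // hypothetical project id
})

// Called by your app whenever a user rates a response.
async function submitUserFeedback(conversationUuid: string, liked: boolean) {
  await latitude.evaluations.annotate(
    conversationUuid,       // uuid of the logged conversation being evaluated
    liked ? 1 : 0,          // score in whatever format the evaluation expects (assumed binary here)
    'your-evaluation-uuid', // placeholder uuid of the human-in-the-loop evaluation
    {
      reason: liked
        ? 'User marked the response as helpful'
        : 'User flagged the response',
    },
  )
}
```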