Implement a comprehensive QA system for customer support responses using Rating-based LLM evaluation, Exact Match rules, and Manual review
Helpfulness Assessment (LLM-as-Judge)
Configure the evaluation
Create an experiment from the evaluation
Create the synthetic dataset
Run the experiment
View experiment results
Required Information Validation (Programmatic Rule - Exact Match)
Configure the evaluation
Create dataset with expected output
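As a rough illustration of what an Exact Match rule checks, the sketch below compares each response with its expected output. The function name `exact_match`, the `strip` option, and the sample rows are hypothetical and are not taken from the platform; whether the real rule trims whitespace or ignores case is an assumption to verify.

```python
def exact_match(response: str, expected: str, *, strip: bool = True) -> bool:
    """Return True when the response equals the expected output exactly.

    Assumption: surrounding whitespace is ignored by default; the comparison
    is otherwise case-sensitive and character-for-character.
    """
    if strip:
        response, expected = response.strip(), expected.strip()
    return response == expected


# Hypothetical dataset rows pairing a generated response with its expected output.
dataset = [
    {"response": "Your refund has been processed.",
     "expected_output": "Your refund has been processed."},
    {"response": "Your refund is on its way!",
     "expected_output": "Your refund has been processed."},
]

# One boolean per row: True when the response matches the expected output.
results = [exact_match(row["response"], row["expected_output"]) for row in dataset]
print(results)
```

A strict comparison like this suits responses that must contain fixed wording; free-form answers are usually better served by the LLM-based or manual evaluations.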
Contains Ticket Number (Programmatic Rule - Regular Expression)
Configure the evaluation
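A minimal sketch of a regular-expression check for a ticket number, using Python's standard `re` module. The full pattern used in the tutorial is not recoverable here, so `TCKT-\d{4}-\d{4}` is an assumed format chosen only for illustration, and `contains_ticket_number` is a hypothetical helper name.

```python
import re

# Assumed ticket format for illustration only: "TCKT-" followed by two
# four-digit groups separated by a hyphen. Replace with the real pattern.
TICKET_PATTERN = re.compile(r"TCKT-\d{4}-\d{4}")


def contains_ticket_number(response: str) -> bool:
    """Return True when the response contains a ticket number anywhere in its text."""
    return TICKET_PATTERN.search(response) is not None


print(contains_ticket_number("Your ticket TCKT-2024-0042 is now open."))
print(contains_ticket_number("We have opened a ticket for you."))
```

`search` is used rather than `fullmatch` because the ticket number is expected to appear somewhere inside a longer support reply.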
Pattern (partial): TCKT- ... -\d{4}
Run the experiment
Manual Evaluation (HITL - Human in the Loop)
Configure the evaluation
Annotate past conversations (logs)
Annotate with the SDK
Minimum score
Manual evaluation results
A score of 1, but shown in green. This was before we set the minimum score to 3. The next one didn’t pass and is shown in red.
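The pass/fail behavior described here can be sketched as a simple threshold check. This assumes that a result passes when its score is at least the minimum score, and that every score passes while no minimum is set; both are assumptions, and `passes` is a hypothetical helper rather than part of any SDK.

```python
from typing import Optional


def passes(score: int, minimum_score: Optional[int] = None) -> bool:
    """Return True when a manual-evaluation score counts as a pass.

    Assumptions: with no minimum configured every score passes, and the
    threshold is inclusive (score >= minimum_score).
    """
    if minimum_score is None:
        return True
    return score >= minimum_score


# A score of 1 before any minimum was set, then the same score with a minimum of 3.
print(passes(1))
print(passes(1, minimum_score=3))
```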