Create a scoring rubric to evaluate a customer-facing chatbot across accuracy, tone, and safety, then score 20 sample conversations with it.
Evalon helps teams measure whether their chatbots are actually good. We need a rubric we can hand to a non-expert reviewer and still trust the resulting scores.
Deliverable: a scoring rubric covering accuracy, tone, and safety, with clear 1-5 anchors for each dimension (what a 2 looks like vs. a 4). Then apply it to the 20 sample conversations we provide and summarize where the bot is weakest.
You'll get the 20 transcripts, a short description of the bot's purpose, and one Q&A round with our evaluation lead.