Evaluate Chatbot Responses
Evaluate the quality of your chatbot's responses
Manual Evaluation
The simplest way to evaluate your chatbot's responses is to test it manually. With the MongoDB Chatbot Server, you can do this by running the server locally and querying it.
If you want to query it from a UI, you have the following options:
- Spin up the UI from the Quick Start guide.
- Build your own UI with the Chatbot UI components.
- Build a custom UI that queries the server directly. Refer to the API specification for details on the available endpoints.
Red Teaming
You can also evaluate your chatbot's responses by "red teaming" it. In a chatbot red teaming exercise, a team of people asks the chatbot a wide variety of questions, evaluating the quality of its responses and identifying areas for improvement.
To learn more about how you can red team a chatbot, refer to the documentation from Microsoft.
Automated Evaluation
You can evaluate your chatbot's responses using a variety of automated methods and tools.
On the team that builds the MongoDB Chatbot Framework, we use Braintrust as our automated evaluation tool.
You can search the project's repository for `.eval.ts` files to see how we use Braintrust for evaluation.
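At its core, an automated evaluation pairs a dataset of inputs and expectations with a task function and one or more scorers. The following is a minimal, self-contained sketch of that pattern; the names (`EvalCase`, `generateResponse`, `keywordCoverage`, `runEval`) are illustrative, not the framework's or Braintrust's actual API, and `generateResponse` is a stand-in for calling your chatbot.

```typescript
// One evaluation case: an input question and keywords we expect in the answer.
type EvalCase = { input: string; expected: string[] };

// Hypothetical stand-in for your chatbot. In a real evaluation, this would
// query your chatbot server instead of returning a canned response.
async function generateResponse(input: string): Promise<string> {
  return `You can evaluate responses manually, with red teaming, or with automated tools. (${input})`;
}

// Scorer: the fraction of expected keywords that appear in the response.
function keywordCoverage(response: string, expected: string[]): number {
  const hits = expected.filter((kw) =>
    response.toLowerCase().includes(kw.toLowerCase())
  );
  return hits.length / expected.length;
}

// Run every case through the task function and average the scores.
async function runEval(cases: EvalCase[]): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const response = await generateResponse(c.input);
    total += keywordCoverage(response, c.expected);
  }
  return total / cases.length;
}
```

A keyword scorer is a deliberately simple example; in practice you would swap in stronger scorers (semantic similarity, LLM-as-judge, etc.), which is what a tool like Braintrust provides out of the box.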
The MongoDB Chatbot Framework previously included an evaluation CLI for evaluating your chatbot's responses. We have deprecated this CLI in favor of Braintrust.