
Configuration Reference

This page contains reference documentation for configuring the MongoDB Chatbot Evaluation CLI.

An Evaluation CLI config file is a CommonJS file that exports a ConfigConstructor function as its default export.

For an example of setting up a configuration file, refer to the Configuration documentation.

You need to install the Evaluation CLI to configure it. Refer to the Installation documentation for instructions.

API Reference

For a full API reference of all modules exported by mongodb-chatbot-evaluation, refer to the API Reference documentation.

This page links to the key reference documentation for configuring the Evaluation CLI.


ConfigConstructor

The ConfigConstructor function is the root configuration type for the Evaluation CLI. It returns an EvalConfig object.

Data Stores


CommandMetadataStore

The CommandMetadataStore is an interface for storing metadata of each command run.


To create a CommandMetadataStore that stores data in MongoDB, use the constructor function makeMongoDbCommandMetadataStore().

import { makeMongoDbCommandMetadataStore } from "mongodb-chatbot-evaluation";

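const commandMetadataStore = makeMongoDbCommandMetadataStore({
  // Assumed connection parameters; confirm them in the API reference.
  // MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME are placeholders for your values.
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});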


GeneratedDataStore

The GeneratedDataStore is an interface for storing and working with generated evaluation data.


To create a GeneratedDataStore that stores data in MongoDB, use the constructor function makeMongoDbGeneratedDataStore().

import { makeMongoDbGeneratedDataStore } from "mongodb-chatbot-evaluation";

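const generatedDataStore = makeMongoDbGeneratedDataStore({
  // Assumed connection parameters; confirm them in the API reference.
  // MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME are placeholders for your values.
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});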


EvaluationStore

The EvaluationStore is an interface for storing and accessing the results of an evaluation.


To create an EvaluationStore that stores data in MongoDB, use the constructor function makeMongoDbEvaluationStore().

import { makeMongoDbEvaluationStore } from "mongodb-chatbot-evaluation";

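const evaluationStore = makeMongoDbEvaluationStore({
  // Assumed connection parameters; confirm them in the API reference.
  // MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME are placeholders for your values.
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});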


ReportStore

The ReportStore is an interface for storing reports on the results of evaluation runs.


To create a ReportStore that stores data in MongoDB, use the constructor function makeMongoDbReportStore().

import { makeMongoDbReportStore } from "mongodb-chatbot-evaluation";

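const reportStore = makeMongoDbReportStore({
  // Assumed connection parameters; confirm them in the API reference.
  // MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME are placeholders for your values.
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});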
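Each MongoDB-backed store above needs connection settings for the same deployment. One way to keep them consistent is to read the settings once from the environment and reuse them. The sketch below is a hypothetical helper, not part of mongodb-chatbot-evaluation; the readMongoDbSettings() name and the MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME environment variable names are assumptions.

```typescript
// Hypothetical helper (not part of mongodb-chatbot-evaluation): reads the
// MongoDB connection settings once so every store constructor can share them.
// The environment variable names are assumptions, not names the CLI requires.
interface MongoDbConnectionSettings {
  connectionUri: string;
  databaseName: string;
}

function readMongoDbSettings(
  env: Record<string, string | undefined>
): MongoDbConnectionSettings {
  const connectionUri = env["MONGODB_CONNECTION_URI"];
  const databaseName = env["MONGODB_DATABASE_NAME"];
  // Fail fast with a clear message rather than a later connection error.
  if (!connectionUri || !databaseName) {
    throw new Error(
      "Set MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME before running the CLI"
    );
  }
  return { connectionUri, databaseName };
}
```

In eval.config.ts you could call readMongoDbSettings(process.env) once and pass the result to each makeMongoDb*Store() call, assuming those constructors accept connectionUri and databaseName parameters.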

Test Cases

You must provide test cases to evaluate the chatbot. Pass them in the testCases field of an entry in the commands.generate property of the EvalConfig.

import {
  ConversationTestCase,
  EvalConfig,
  makeGenerateConversationData,
} from "mongodb-chatbot-evaluation";

const testCases: ConversationTestCase[] = [
  {
    name: `It understands "why the chicken crossed the road" jokes`,
    expectation: `
The ASSISTANT responds with a completion of the classic chicken crossing the road joke.
The joke should be completed in a way that is both humorous and appropriate.
`,
    tags: ["joke"],
    messages: [
      { role: "user", content: "Why did the chicken cross the road?" },
    ],
  },
];

const evalConfig: EvalConfig = {
  // ... other fields
  commands: {
    generate: {
      myTest: {
        type: "conversation",
        testCases: testCases,
        generator: makeGenerateConversationData({ /* ... */ }),
      },
    },
    evaluate: { /* ... */ },
    report: { /* ... */ },
  },
};

The mongodb-chatbot-evaluation package includes built-in support for the ConversationTestCase type. You can use this to evaluate the chatbot's performance on conversation data.
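Because each test case carries tags, you can select a subset of test cases before passing them to a commands.generate entry. The following sketch filters by tag using a local, simplified mirror of the ConversationTestCase fields shown in the example above. TestCaseSketch, ChatMessageSketch, filterByTags(), and the second test case are hypothetical names for illustration, and the package's real type may define additional fields.

```typescript
// Local, simplified mirror of the ConversationTestCase fields shown on this
// page (name, expectation, tags, messages). The package's real type may differ.
interface ChatMessageSketch {
  role: "user" | "assistant";
  content: string;
}

interface TestCaseSketch {
  name: string;
  expectation: string;
  tags?: string[];
  messages: ChatMessageSketch[];
}

// Returns the test cases that carry at least one of the requested tags.
function filterByTags(
  testCases: TestCaseSketch[],
  tags: string[]
): TestCaseSketch[] {
  const wanted = new Set(tags);
  return testCases.filter((testCase) =>
    (testCase.tags ?? []).some((tag) => wanted.has(tag))
  );
}

const allTestCases: TestCaseSketch[] = [
  {
    name: `It understands "why the chicken crossed the road" jokes`,
    expectation: "The ASSISTANT completes the classic joke appropriately.",
    tags: ["joke"],
    messages: [{ role: "user", content: "Why did the chicken cross the road?" }],
  },
  {
    // Hypothetical second test case, for illustration only.
    name: "It explains what an index is",
    expectation: "The ASSISTANT explains what a MongoDB index is.",
    tags: ["docs"],
    messages: [{ role: "user", content: "What is an index?" }],
  },
];

// Only the joke test case would be passed on to commands.generate.
const jokeTestCases = filterByTags(allTestCases, ["joke"]);
```

Filtering by tag keeps a single source of test cases while letting you run a focused subset.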

Load test cases from a file

You can load ConversationTestCase objects from a YAML file using the getConversationsTestCasesFromYaml() function.

import { getConversationsTestCasesFromYaml } from "mongodb-chatbot-evaluation";

const testCases = getConversationsTestCasesFromYaml("path/to/test-cases.yaml");

Command Executor Functions

These functions execute the commands in the evaluation pipeline. Each command (generate, evaluate, and report) takes a different kind of function.
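To make the division of labor concrete, here is a conceptual sketch of how the three kinds of functions hand data to one another: generate produces data, evaluate scores each item, and report aggregates the scores. The types and the runPipeline() helper are hypothetical simplifications, not the package's GenerateDataFunc, EvaluateQualityFunc, or ReportEvalFunc signatures. The CLI presumably keeps each stage's output in the data stores described above rather than chaining the stages in memory as this sketch does.

```typescript
// Conceptual sketch only: hypothetical, simplified types that are NOT the
// package's GenerateDataFunc / EvaluateQualityFunc / ReportEvalFunc signatures.
interface GeneratedItem {
  testCaseName: string;
  output: string;
}

interface EvalResult {
  testCaseName: string;
  result: number; // in this sketch, 1 = pass and 0 = fail
}

type GenerateStep = (testCaseNames: string[]) => Promise<GeneratedItem[]>;
type EvaluateStep = (item: GeneratedItem) => Promise<EvalResult>;
type ReportStep = (results: EvalResult[]) => Promise<string>;

async function runPipeline(
  testCaseNames: string[],
  generate: GenerateStep,
  evaluate: EvaluateStep,
  report: ReportStep
): Promise<string> {
  // generate: produce the data to evaluate (for example, chatbot responses)
  const generated = await generate(testCaseNames);
  // evaluate: score each generated item independently
  const results = await Promise.all(generated.map(evaluate));
  // report: aggregate the scores into a summary
  return report(results);
}
```

Keeping the stages as separate functions is what lets you swap one generator or evaluator for another without touching the rest of the pipeline.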


GenerateDataFunc

The GenerateDataFunc is a function that generates data to be evaluated.

Pass a GenerateDataFunc as the generator field of an entry in the commands.generate property of the EvalConfig.

The mongodb-chatbot-evaluation package includes the following GenerateDataFunc implementation functions:

  • makeGenerateConversationData(): Generates conversation data from the test cases. The function calls a MongoDB Chatbot Server API to create conversations and add messages. Because it exercises a running server, the results reflect how your actual app behaves.
  • makeGenerateLlmConversationData(): Generates conversation data from the test cases. The function calls a ChatLlm instance to generate responses. This is useful for seeing how a language model performs on a test case without retrieval-augmented generation.

Example of using makeGenerateConversationData():

// eval.config.ts
import { makeGenerateConversationData } from "mongodb-chatbot-evaluation";

const generateDataFunc = makeGenerateConversationData({
  // ... other parameters (see the API reference)
  httpHeaders: {
    Origin: "Testing",
  },
});

export default async function configConstructor() {
  return {
    // ... other configuration options
    commands: {
      generate: {
        conversations: {
          type: "conversation",
          testCases: someTestCases, // placeholder for your ConversationTestCase[] array
          generator: generateDataFunc,
        },
      },
      // ... other commands
    },
  };
}


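EvaluateQualityFunc

The EvaluateQualityFunc is a function that evaluates some quality of generated data.

Pass an EvaluateQualityFunc as the evaluator field of an entry in the commands.evaluate property of the EvalConfig.

The mongodb-chatbot-evaluation package includes EvaluateQualityFunc implementations such as makeEvaluateConversationQuality(), which uses an OpenAI client to evaluate generated conversation data.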

Example of using makeEvaluateConversationQuality():

// eval.config.ts

import { makeEvaluateConversationQuality } from "mongodb-chatbot-evaluation";
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";

// OPENAI_API_KEY is a placeholder for your API key (for example, read from process.env)
const evaluateQualityFunc = makeEvaluateConversationQuality({
  // ... other parameters (see the API reference)
  openAiClient: new OpenAIClient(
    new AzureKeyCredential(OPENAI_API_KEY)
  ),
});

export default async function configConstructor() {
  return {
    // ... other configuration options
    commands: {
      evaluate: {
        conversationQuality: {
          evaluator: evaluateQualityFunc,
        },
      },
      // ... other commands
    },
  };
}


ReportEvalFunc

The ReportEvalFunc is a function that generates a report from the evaluation data.

Pass a ReportEvalFunc as the reporter field of an entry in the commands.report property of the EvalConfig.

The mongodb-chatbot-evaluation package includes ReportEvalFunc implementations such as reportStatsForBinaryEvalRun().

Example of using reportStatsForBinaryEvalRun():

// eval.config.ts

import { reportStatsForBinaryEvalRun } from "mongodb-chatbot-evaluation";

export default async function configConstructor() {
  return {
    // ... other configuration options
    commands: {
      // ... other commands
      report: {
        binaryEvalRun: {
          reporter: reportStatsForBinaryEvalRun,
        },
      },
    },
  };
}
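Judging by its name, reportStatsForBinaryEvalRun() summarizes runs in which each evaluation result is a pass or a fail. The sketch below shows the kind of arithmetic such a report involves: counting passes and computing a pass rate. It is a hypothetical illustration, not the implementation of reportStatsForBinaryEvalRun(); BinaryResult, BinaryStats, and summarizeBinaryResults() are invented names, and representing results as 0 or 1 is an assumption.

```typescript
// Hypothetical sketch of a binary-score summary. It is NOT the implementation
// of reportStatsForBinaryEvalRun(); results are assumed to be 0 (fail) or 1 (pass).
interface BinaryResult {
  testCaseName: string;
  result: 0 | 1;
}

interface BinaryStats {
  total: number;
  passed: number;
  failed: number;
  passRate: number; // passed / total, or 0 when there are no results
}

function summarizeBinaryResults(results: BinaryResult[]): BinaryStats {
  const total = results.length;
  const passed = results.filter((r) => r.result === 1).length;
  return {
    total,
    passed,
    failed: total - passed,
    // Guard against dividing by zero for an empty run.
    passRate: total === 0 ? 0 : passed / total,
  };
}

const stats = summarizeBinaryResults([
  { testCaseName: "joke", result: 1 },
  { testCaseName: "indexes", result: 0 },
  { testCaseName: "aggregation", result: 1 },
  { testCaseName: "sharding", result: 1 },
]);
// stats → { total: 4, passed: 3, failed: 1, passRate: 0.75 }
```

Returning a pass rate of 0 for an empty run avoids dividing by zero; a real report might instead flag an empty run as an error.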