Configuration Reference
This page contains reference documentation for the configuration options of the MongoDB Chatbot Evaluation CLI.
An Evaluation CLI config file is a CommonJS file that exports a ConfigConstructor function as its default export.
For an example of setting up a configuration file, refer to the Configuration documentation.
You need to install the Evaluation CLI to configure it. Refer to the Installation documentation for instructions.
API Reference
For a full API reference of all modules exported by mongodb-chatbot-evaluation, refer to the API Reference documentation.
This page links to the key reference documentation for configuring the Evaluation CLI.
ConfigConstructor
The ConfigConstructor function is the root configuration type for the Evaluation CLI. It returns an EvalConfig object.
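For illustration, here is a minimal sketch of such a config file. It assumes the MongoDB-backed stores and commands described later on this page, and that the EvalConfig store fields are named metadataStore, generatedDataStore, evaluationStore, and reportStore; check the API Reference documentation for the exact shape.
// eval.config.ts -- a minimal sketch, not a complete configuration
import {
  EvalConfig,
  makeMongoDbCommandMetadataStore,
  makeMongoDbGeneratedDataStore,
  makeMongoDbEvaluationStore,
  makeMongoDbReportStore,
} from "mongodb-chatbot-evaluation";

export default async function configConstructor(): Promise<EvalConfig> {
  const storeParams = {
    // Placeholder values; see the environment variable example below.
    connectionUri: MONGODB_CONNECTION_URI,
    databaseName: MONGODB_DATABASE_NAME,
  };
  return {
    // Data stores, described in the following section.
    metadataStore: makeMongoDbCommandMetadataStore(storeParams),
    generatedDataStore: makeMongoDbGeneratedDataStore(storeParams),
    evaluationStore: makeMongoDbEvaluationStore(storeParams),
    reportStore: makeMongoDbReportStore(storeParams),
    // Command configurations, described in the sections below.
    commands: {
      generate: { /* ... */ },
      evaluate: { /* ... */ },
      report: { /* ... */ },
    },
  };
}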
Data Stores
CommandMetadataStore
The CommandMetadataStore is an interface for storing metadata of each command run.
MongoDBCommandMetadataStore
To create a CommandMetadataStore that stores data in MongoDB, use the constructor function makeMongoDbCommandMetadataStore().
import { makeMongoDbCommandMetadataStore } from "mongodb-chatbot-evaluation";
const commandMetadataStore = makeMongoDbCommandMetadataStore({
connectionUri: MONGODB_CONNECTION_URI,
databaseName: MONGODB_DATABASE_NAME,
});
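The MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME values in this and the following examples are placeholders. One common approach, not required by the CLI, is to load them from environment variables, for example:
// Hypothetical setup: read connection details from environment variables.
// The dotenv import is only needed if you keep them in a .env file.
import "dotenv/config";

const { MONGODB_CONNECTION_URI, MONGODB_DATABASE_NAME } = process.env;
if (!MONGODB_CONNECTION_URI || !MONGODB_DATABASE_NAME) {
  throw new Error("Set MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME");
}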
GeneratedDataStore
The GeneratedDataStore is an interface for storing and working with generated evaluation data.
MongoDBGeneratedDataStore
To create a GeneratedDataStore that stores data in MongoDB, use the constructor function makeMongoDbGeneratedDataStore().
import { makeMongoDbGeneratedDataStore } from "mongodb-chatbot-evaluation";
const generatedDataStore = makeMongoDbGeneratedDataStore({
connectionUri: MONGODB_CONNECTION_URI,
databaseName: MONGODB_DATABASE_NAME,
});
EvaluationStore
The EvaluationStore is an interface for storing and accessing the results of an evaluation.
MongoDBEvaluationStore
To create an EvaluationStore that stores data in MongoDB, use the constructor function makeMongoDbEvaluationStore().
import { makeMongoDbEvaluationStore } from "mongodb-chatbot-evaluation";
const evaluationStore = makeMongoDbEvaluationStore({
connectionUri: MONGODB_CONNECTION_URI,
databaseName: MONGODB_DATABASE_NAME,
});
ReportStore
The ReportStore is an interface for storing reports on the results of evaluation runs.
MongoDBReportStore
To create a ReportStore that stores data in MongoDB, use the constructor function makeMongoDbReportStore().
import { makeMongoDbReportStore } from "mongodb-chatbot-evaluation";
const reportStore = makeMongoDbReportStore({
connectionUri: MONGODB_CONNECTION_URI,
databaseName: MONGODB_DATABASE_NAME,
});
Test Cases
You must provide test cases to evaluate the chatbot. Pass the test cases to the commands.generate property in the EvalConfig.
import {
  ConversationTestCase,
  EvalConfig,
  makeGenerateConversationData,
} from "mongodb-chatbot-evaluation";

const testCases: ConversationTestCase[] = [
{
name: `It understands "why the chicken crossed the road" jokes`,
expectation: `
The ASSISTANT responds with a completion of the classic chicken crossing the road joke.
The joke should be completed in a way that is both humorous and appropriate.
`,
tags: ["joke"],
messages: [
{ role: "user", content: "Why did the chicken cross the road?" }
]
},
];
const evalConfig: EvalConfig = {
// ... other fields,
commands: {
generate: {
myTest: {
type: "conversation",
testCases: testCases,
generator: makeGenerateConversationData({ ... }),
},
},
evaluate: { /* ... */ },
report: { /* ... */ },
},
};
The mongodb-chatbot-evaluation package includes built-in support for the ConversationTestCase type. You can use this to evaluate the chatbot's performance on conversation data.
Load test cases from a file
You can load ConversationTestCase objects from a YAML file using the getConversationsTestCasesFromYaml() function.
import { getConversationsTestCasesFromYaml } from "mongodb-chatbot-evaluation";
const testCases = getConversationsTestCasesFromYaml("path/to/test-cases.yaml");
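For reference, here is a hypothetical test-cases.yaml that mirrors the fields of the ConversationTestCase object shown above. The file name and exact formatting are illustrative.
# test-cases.yaml -- hypothetical example mirroring the ConversationTestCase fields above
- name: It understands "why the chicken crossed the road" jokes
  expectation: |
    The ASSISTANT responds with a completion of the classic chicken
    crossing the road joke.
  tags:
    - joke
  messages:
    - role: user
      content: Why did the chicken cross the road?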
Command Executor Functions
The Evaluation CLI uses these functions to execute the commands in the pipeline. Each command has its own function type.
GenerateDataFunc
The GenerateDataFunc is a function that generates data to be evaluated. Pass a GenerateDataFunc to the commands.generate property in the EvalConfig.
The mongodb-chatbot-evaluation package includes the following GenerateDataFunc implementation functions:

- makeGenerateConversationData(): Generates conversation data from the test cases. The function calls a MongoDB Chatbot Server API to create conversations and add messages. This lets you evaluate the chatbot's performance on a running server to get behavior resembling how your actual app behaves.
- makeGenerateLlmConversationData(): Generates conversation data from the test cases. The function calls a ChatLlm instance to generate responses. This is useful to see how a language model without retrieval-augmented generation performs on a test case.
Example of using makeGenerateConversationData():
// eval.config.ts
import { makeGenerateConversationData } from "mongodb-chatbot-evaluation";
const generateDataFunc = makeGenerateConversationData({
  conversations, // conversations service instance, defined elsewhere in your config
httpHeaders: {
Origin: "Testing",
},
apiBaseUrl: CONVERSATIONS_SERVER_BASE_URL,
});
export default async function configConstructor() {
return {
// ... other configuration options
commands: {
generate: {
conversations: {
type: "conversation",
testCases: someTestCases,
generator: generateDataFunc,
},
},
// ... other commands
},
};
}
EvaluateQualityFunc
The EvaluateQualityFunc is a function that evaluates some quality of generated data. Pass an EvaluateQualityFunc to the commands.evaluate property in the EvalConfig.
The mongodb-chatbot-evaluation package includes the following EvaluateQualityFunc implementation functions:

- makeEvaluateConversationQuality(): Evaluates the quality of a conversation by comparing the generated response to a provided expectation. The function uses the OpenAI API to evaluate the quality of the responses.
- makeEvaluateConversationFaithfulness(): Evaluates the faithfulness of a conversation by comparing the generated response to the context information retrieved before generating an answer.
- evaluateConversationAverageRetrievalScore(): Evaluates the average retrieval score of a conversation by comparing the generated responses to a provided expectation.
Example of using makeEvaluateConversationQuality():
// eval.config.ts
import { makeEvaluateConversationQuality } from "mongodb-chatbot-evaluation";
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";
const evaluateQualityFunc = makeEvaluateConversationQuality({
deploymentName: OPENAI_CHAT_COMPLETION_DEPLOYMENT,
openAiClient: new OpenAIClient(
OPENAI_ENDPOINT,
new AzureKeyCredential(OPENAI_API_KEY)
),
});
export default async function configConstructor() {
return {
// ... other configuration options
commands: {
evaluate: {
conversationQuality: {
evaluator: evaluateQualityFunc,
},
},
// ... other commands
},
};
}
ReportEvalFunc
The ReportEvalFunc is a function that generates a report from the evaluation data. Pass a ReportEvalFunc to the commands.report property in the EvalConfig.
The mongodb-chatbot-evaluation package includes the following ReportEvalFunc implementation functions:

- reportStatsForBinaryEvalRun(): Generates a report for a binary evaluation run, one that has results of either 0 or 1.
- reportAverageScore(): Generates a report for the average score of a set of evaluation data.
Example of using reportStatsForBinaryEvalRun():
// eval.config.ts
import { reportStatsForBinaryEvalRun } from "mongodb-chatbot-evaluation";
export default async function configConstructor() {
return {
// ... other configuration options
commands: {
// ... other commands
report: {
binaryEvalRun: {
reporter: reportStatsForBinaryEvalRun,
},
},
},
};
}