Configuration Reference

This page contains reference documentation for the configuration options for the MongoDB Chatbot Evaluation CLI.

An Evaluation CLI config file is a CommonJS file that exports a ConfigConstructor function as its default export.

For an example of setting up a configuration file, refer to the Configuration documentation.

You need to install the Evaluation CLI to configure it. Refer to the Installation documentation for instructions.

API Reference

For a full API reference of all modules exported by mongodb-chatbot-evaluation, refer to the API Reference documentation.

This page links to the key reference documentation for configuring the Evaluation CLI.

`ConfigConstructor`

The ConfigConstructor function is the root configuration type for the Evaluation CLI. It returns an EvalConfig object.

Data Stores

`CommandMetadataStore`

The CommandMetadataStore is an interface for storing metadata of each command run.

`MongoDBCommandMetadataStore`

To create a CommandMetadataStore that stores data in MongoDB, use the constructor function makeMongoDbCommandMetadataStore().

import { makeMongoDbCommandMetadataStore } from "mongodb-chatbot-evaluation";

const commandMetadataStore = makeMongoDbCommandMetadataStore({
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});

`GeneratedDataStore`

The GeneratedDataStore is an interface for storing and working with generated evaluation data.

`MongoDBGeneratedDataStore`

To create a GeneratedDataStore that stores data in MongoDB, use the constructor function makeMongoDbGeneratedDataStore().

import { makeMongoDbGeneratedDataStore } from "mongodb-chatbot-evaluation";

const generatedDataStore = makeMongoDbGeneratedDataStore({
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});

`EvaluationStore`

The EvaluationStore is an interface for storing and accessing the results of an evaluation.

`MongoDBEvaluationStore`

To create an EvaluationStore that stores data in MongoDB, use the constructor function makeMongoDbEvaluationStore().

import { makeMongoDbEvaluationStore } from "mongodb-chatbot-evaluation";

const evaluationStore = makeMongoDbEvaluationStore({
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});

`ReportStore`

The ReportStore is an interface for storing reports on the results of evaluation runs.

`MongoDBReportStore`

To create a ReportStore that stores data in MongoDB, use the constructor function makeMongoDbReportStore().

import { makeMongoDbReportStore } from "mongodb-chatbot-evaluation";

const reportStore = makeMongoDbReportStore({
  connectionUri: MONGODB_CONNECTION_URI,
  databaseName: MONGODB_DATABASE_NAME,
});
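All four store examples above receive the same two options, connectionUri and databaseName, but do not show where MONGODB_CONNECTION_URI and MONGODB_DATABASE_NAME come from. The following is a minimal sketch of one way to read them, assuming they are provided as environment variables. The names StoreOptionsSketch, requireEnv, and readStoreOptions are hypothetical helpers for illustration and are not part of mongodb-chatbot-evaluation.

```typescript
// Hypothetical shape: the two options every store example above receives.
interface StoreOptionsSketch {
  connectionUri: string;
  databaseName: string;
}

// Hypothetical helper: read a required variable or fail fast with a clear error.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build the shared options once so every store can reuse the same values.
function readStoreOptions(
  env: Record<string, string | undefined>
): StoreOptionsSketch {
  return {
    connectionUri: requireEnv(env, "MONGODB_CONNECTION_URI"),
    databaseName: requireEnv(env, "MONGODB_DATABASE_NAME"),
  };
}

// In a real config file you would likely pass `process.env` here.
// The literal values below are placeholders for illustration.
const storeOptions = readStoreOptions({
  MONGODB_CONNECTION_URI: "mongodb://localhost:27017",
  MONGODB_DATABASE_NAME: "chatbot-eval",
});
```

The resulting storeOptions object has the same shape as the argument passed to makeMongoDbCommandMetadataStore() and the other store constructors shown above, so it could be reused for each of them.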

Test Cases

You must provide test cases to evaluate the chatbot. Pass the test cases to the commands.generate property in the EvalConfig.

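import {
  ConversationTestCase,
  EvalConfig,
  makeGenerateConversationData,
} from "mongodb-chatbot-evaluation";

const testCases: ConversationTestCase[] = [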
  {
    name: `It understands "why the chicken crossed the road" jokes`,
    expectation: `
      The ASSISTANT responds with a completion of the classic chicken crossing the road joke.
      The joke should be completed in a way that is both humorous and appropriate.
    `,
    tags: ["joke"],
    messages: [
      { role: "user", content: "Why did the chicken cross the road?" }
    ]
  },
];

const evalConfig: EvalConfig = {
  // ... other fields,
  commands: {
    generate: {
      myTest: {
        type: "conversation",
        testCases: testCases,
        generator: makeGenerateConversationData({ /* ... */ }),
      },
    },
    evaluate: { /* ... */ },
    report: { /* ... */ },
  },
};

The mongodb-chatbot-evaluation package includes built-in support for the ConversationTestCase type. You can use this to evaluate the chatbot's performance on conversation data.

Load test cases from a file

You can load ConversationTestCase objects from a YAML file using the getConversationsTestCasesFromYaml() function.

import { getConversationsTestCasesFromYaml } from "mongodb-chatbot-evaluation";

const testCases = getConversationsTestCasesFromYaml("path/to/test-cases.yaml");
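Because each test case can carry tags, you may want to run only a subset of the loaded test cases. The following is a minimal sketch of filtering by tag. It uses a local stand-in type, ConversationTestCaseSketch, that models only the fields shown in the test case example earlier on this page; the real ConversationTestCase type may differ. The filterTestCasesByTag helper is hypothetical and not part of the package.

```typescript
// Local stand-in for the package's ConversationTestCase type.
// Assumption: only the fields shown in the example on this page are modeled.
interface ConversationTestCaseSketch {
  name: string;
  expectation?: string;
  tags?: string[];
  messages: { role: string; content: string }[];
}

// Hypothetical helper: keep only the test cases that carry the given tag.
// Test cases without a tags array are treated as having no tags.
function filterTestCasesByTag(
  testCases: ConversationTestCaseSketch[],
  tag: string
): ConversationTestCaseSketch[] {
  return testCases.filter((testCase) =>
    (testCase.tags || []).some((t) => t === tag)
  );
}

const allTestCases: ConversationTestCaseSketch[] = [
  {
    name: "chicken joke",
    expectation: "The ASSISTANT completes the joke.",
    tags: ["joke"],
    messages: [
      { role: "user", content: "Why did the chicken cross the road?" },
    ],
  },
  {
    name: "untagged case",
    messages: [{ role: "user", content: "Hello" }],
  },
];

// Only the test case tagged "joke" remains.
const jokeTestCases = filterTestCasesByTag(allTestCases, "joke");
```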

Command Executor Functions

Command executor functions run the commands in the evaluation pipeline. Each command type (generate, evaluate, and report) has its own function type.
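The three function types described in this section feed into one another: generated data is evaluated, and evaluation results are reported. The following conceptual sketch illustrates that data flow only. Every type and function name in it is a hypothetical stand-in, the stages are synchronous for brevity, and none of it reflects the actual signatures of GenerateDataFunc, EvaluateQualityFunc, or ReportEvalFunc.

```typescript
// Conceptual stand-ins only; these are NOT the types exported by the package.
type GeneratedItemSketch = { id: string; output: string };
type EvalResultSketch = { generatedId: string; score: number };
type ReportSketch = { count: number; averageScore: number };

// Stage 1 (generate): turn test inputs into data to be evaluated.
// Toy rule: the "generated" output is just the trimmed input.
function generateSketch(inputs: string[]): GeneratedItemSketch[] {
  return inputs.map((input, i) => ({ id: `item-${i}`, output: input.trim() }));
}

// Stage 2 (evaluate): score one quality of each generated item.
// Toy rule: 1 if the output is non-empty, otherwise 0.
function evaluateSketch(item: GeneratedItemSketch): EvalResultSketch {
  return { generatedId: item.id, score: item.output.length > 0 ? 1 : 0 };
}

// Stage 3 (report): aggregate the evaluation results into a summary.
function reportSketch(results: EvalResultSketch[]): ReportSketch {
  const sum = results.reduce((acc, r) => acc + r.score, 0);
  return {
    count: results.length,
    averageScore: results.length === 0 ? 0 : sum / results.length,
  };
}

// Chain the stages: generate, then evaluate each item, then report.
const generatedItems = generateSketch(["hello", "   ", "world", "!"]);
const evalResults = generatedItems.map((item) => evaluateSketch(item));
const pipelineReport = reportSketch(evalResults);
```

In the Evaluation CLI itself, each stage is configured separately under commands.generate, commands.evaluate, and commands.report, as the sections below describe.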

`GenerateDataFunc`

The GenerateDataFunc is a function that generates data to be evaluated.

Pass a GenerateDataFunc to the commands.generate property in the EvalConfig.

The mongodb-chatbot-evaluation package includes the following GenerateDataFunc implementation functions:

makeGenerateConversationData(): Generates conversation data from the test cases. The function calls a MongoDB Chatbot Server API to create conversations and add messages. Because it runs against a live server, the results resemble how your actual app behaves.
makeGenerateLlmConversationData(): Generates conversation data from the test cases. The function calls a ChatLlm instance to generate responses. This is useful for seeing how a language model without retrieval-augmented generation performs on a test case.

Example of using makeGenerateConversationData():

// eval.config.ts
import { makeGenerateConversationData } from "mongodb-chatbot-evaluation";

const generateDataFunc = makeGenerateConversationData({
  conversations,
  httpHeaders: {
    Origin: "Testing",
  },
  apiBaseUrl: CONVERSATIONS_SERVER_BASE_URL,
});

export default async function configConstructor() {
  return {
    // ... other configuration options
    commands: {
      generate: {
        conversations: {
          type: "conversation",
          testCases: someTestCases,
          generator: generateDataFunc,
        },
      },
      // ... other commands
    },
  };
}

`EvaluateQualityFunc`

The EvaluateQualityFunc is a function that evaluates some quality of generated data.

Pass an EvaluateQualityFunc to the commands.evaluate property in the EvalConfig.

The mongodb-chatbot-evaluation package includes the following EvaluateQualityFunc implementation functions:

makeEvaluateConversationQuality(): Evaluates the quality of a conversation by comparing the generated response to a provided expectation. The function uses the OpenAI API to evaluate the quality of the responses.
makeEvaluateConversationFaithfulness(): Evaluates the faithfulness of a conversation by comparing the generated response to the context information retrieved before generating an answer.
evaluateConversationAverageRetrievalScore(): Evaluates the average retrieval score of a conversation by comparing the generated responses to a provided expectation.

Example of using makeEvaluateConversationQuality():

// eval.config.ts

import { makeEvaluateConversationQuality } from "mongodb-chatbot-evaluation";
import { OpenAIClient, AzureKeyCredential } from "@azure/openai";

const evaluateQualityFunc = makeEvaluateConversationQuality({
  deploymentName: OPENAI_CHAT_COMPLETION_DEPLOYMENT,
  openAiClient: new OpenAIClient(
    OPENAI_ENDPOINT,
    new AzureKeyCredential(OPENAI_API_KEY)
  ),
});

export default async function configConstructor() {
  return {
    // ... other configuration options
    commands: {
      evaluate: {
        conversationQuality: {
          evaluator: evaluateQualityFunc,
        },
      },
      // ... other commands
    },
  };
}

`ReportEvalFunc`

The ReportEvalFunc is a function that generates a report from the evaluation data.

Pass a ReportEvalFunc to the commands.report property in the EvalConfig.

The mongodb-chatbot-evaluation package includes the following ReportEvalFunc implementation functions:

reportStatsForBinaryEvalRun(): Generates a report for a binary evaluation run, one that has results of either 0 or 1.
reportAverageScore(): Generates a report for the average score of a set of evaluation data.

Example of using reportStatsForBinaryEvalRun():

// eval.config.ts

import { reportStatsForBinaryEvalRun } from "mongodb-chatbot-evaluation";

export default async function configConstructor() {
  return {
    // ... other configuration options
    commands: {
      // ... other commands
      report: {
        binaryEvalRun: {
          reporter: reportStatsForBinaryEvalRun,
        },
      },
    },
  };
}
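To make the idea of a binary evaluation run concrete, the following sketch summarizes a list of results that are each 0 or 1. It is not the implementation of reportStatsForBinaryEvalRun(), and the statistics that function actually reports are not specified on this page. BinaryRunStatsSketch and summarizeBinaryResults are hypothetical names used only for illustration.

```typescript
// Hypothetical shape for summary statistics of a binary evaluation run.
interface BinaryRunStatsSketch {
  total: number;
  passed: number;
  failed: number;
  passRate: number; // fraction in [0, 1]; 0 when there are no results
}

// Hypothetical helper: count passes (1) and failures (0) and compute the pass rate.
function summarizeBinaryResults(results: (0 | 1)[]): BinaryRunStatsSketch {
  const total = results.length;
  const passed = results.filter((r) => r === 1).length;
  return {
    total,
    passed,
    failed: total - passed,
    passRate: total === 0 ? 0 : passed / total,
  };
}

// Three of four evaluations passed.
const binaryStats = summarizeBinaryResults([1, 0, 1, 1]);
```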
