Datasets

The Education AI team maintains several datasets for use with AI systems. All datasets are published under MongoDB Education AI on Hugging Face.

Content

Content datasets can be useful for building RAG systems and training models.

| Name | Type | Description | Visibility | Use Cases | Links |
| --- | --- | --- | --- | --- | --- |
| Public documentation | Long-form content | Markdown version of docs and developer center content. | Public | RAG, model training | https://huggingface.co/datasets/mongodb-eai/docs |
| Code example dataset | Prompt-completion | Code examples extracted from the MongoDB docs and developer center with prompts that could be used to generate the code. | Public | Model fine-tuning | https://huggingface.co/datasets/mongodb-eai/code-example-prompts |
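
As a minimal sketch of how one of these datasets might be pulled into a RAG or fine-tuning pipeline, the snippet below derives the Hugging Face repository ID from a link in the table and streams a few rows with the `datasets` library. The helper names (`repo_id_from_url`, `load_first_rows`) are hypothetical, and the `"train"` split name is an assumption; check each dataset card for its actual splits and columns.

```python
from urllib.parse import urlparse


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repository ID from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"Not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_first_rows(url: str, n: int = 3, split: str = "train") -> list:
    """Stream the first n rows of a dataset without downloading all of it.

    The default split name is an assumption, not taken from the dataset cards.
    """
    # Imported lazily so repo_id_from_url works without the library installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id_from_url(url), split=split, streaming=True)
    return list(dataset.take(n))
```

For example, `load_first_rows("https://huggingface.co/datasets/mongodb-eai/docs")` would return the first three records as dictionaries, assuming the dataset exposes a `train` split. Streaming is used here so that inspecting a dataset does not require downloading it in full.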

Benchmarks

| Name | Type | Description | Visibility | Links |
| --- | --- | --- | --- | --- |
| Natural language-to-Node.js Mongosh | Code generation | Assess how well LLMs generate mongosh code given a natural language prompt and information about a database. | External | https://huggingface.co/datasets/mongodb-eai/natural-language-to-mongosh |