
Module: dataSources

Interfaces

Type Aliases

DataSource

Ƭ DataSource: Object

Represents a source of page data.

Type declaration

| Name | Type | Description |
| :--- | :--- | :--- |
| `name` | `string` | The unique name among registered data sources. |
| `fetchPages` | `() => Promise<Page[]>` | Fetches pages in the data source. |

Defined in

packages/mongodb-rag-core/src/dataSources/DataSource.ts:6
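
As an illustration of this shape, here is a minimal sketch of an object with a `name` and a `fetchPages` function. The `Page` type is not defined in this section, so the local `Page` stand-in below is an assumption (in particular the `body` field), and `inMemorySource` is a hypothetical name.

```typescript
// Local stand-ins for the documented types. The real Page type likely has
// more fields; `body` is an assumption made for this sketch.
type Page = {
  url: string;
  title?: string;
  body: string;
  sourceName: string;
};

type DataSource = {
  name: string;
  fetchPages: () => Promise<Page[]>;
};

// Hypothetical data source that serves a fixed list of pages.
const inMemorySource: DataSource = {
  name: "in-memory-example",
  fetchPages: async () => [
    {
      url: "https://example.com/getting-started",
      title: "Getting Started",
      body: "Welcome to the docs.",
      sourceName: "in-memory-example",
    },
  ],
};

inMemorySource.fetchPages().then((pages) => {
  console.log(`${inMemorySource.name}: ${pages.length} page(s)`);
});
```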


FilterFunc

Ƭ FilterFunc: (path: string) => boolean

Type declaration

▸ (path): boolean

Parameters
| Name | Type |
| :--- | :--- |
| `path` | `string` |
Returns

boolean

Defined in

packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:155
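
A filter of this type might look like the sketch below. It assumes that returning `true` means the path is kept, which this page does not state explicitly. The name `markdownOnly` and the sample paths are hypothetical.

```typescript
type FilterFunc = (path: string) => boolean;

// Hypothetical filter: accept .md/.mdx files, skip anything under node_modules.
// Assumption: `true` means "keep this path".
const markdownOnly: FilterFunc = (path) =>
  /\.mdx?$/.test(path) && !path.includes("/node_modules/");

const candidates = [
  "/docs/intro.md",
  "/node_modules/pkg/README.md",
  "/src/index.ts",
];
const kept = candidates.filter(markdownOnly);
console.log(kept);
```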


HandleHtmlPageFuncOptions

Ƭ HandleHtmlPageFuncOptions: Object

Type declaration

| Name | Type | Description |
| :--- | :--- | :--- |
| `extractMetadata?` | `(domDoc: Document) => Record<string, unknown>` | Extract metadata from the page DOM. Added to the Page.metadata field. If a key in the result of extractMetadata() is the same as a key in metadata, the extractMetadata() key will override it. |
| `extractTitle?` | `(domDoc: Document) => string \| undefined` | Extract Page.title from page content and path. |
| `metadata?` | `PageMetadata` | Page.metadata passed from config. Included in all documents. |
| `pathToPageUrl` | `(path: string) => string` | Construct the Page.url from the page path. |
| `postProcessMarkdown?` | `(markdown: string) => Promise<string>` | Transform Markdown once it has been generated. |
| `removeElements` | `(domDoc: Document) => Element[]` | Returns an array of DOM elements to be removed from the parsed document. |

Defined in

packages/mongodb-rag-core/src/dataSources/handleHtmlDocument.ts:7
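
The override rule described for extractMetadata can be illustrated with a plain object spread. This is a sketch of the documented precedence only, not the library's code: `mergeMetadata` is a hypothetical helper, and `PageMetadata` is assumed here to be a simple string-keyed record.

```typescript
// Assumption: PageMetadata behaves like a string-keyed record.
type PageMetadata = Record<string, unknown>;

// Hypothetical helper showing the documented precedence:
// keys from extractMetadata() override the same keys in `metadata`.
function mergeMetadata(
  configMetadata: PageMetadata | undefined,
  extracted: Record<string, unknown> | undefined
): PageMetadata {
  // Later spreads win, so extracted keys replace config keys on conflict.
  return { ...(configMetadata ?? {}), ...(extracted ?? {}) };
}

const merged = mergeMetadata(
  { product: "docs", version: "1.0" },
  { version: "2.0", author: "jane" }
);
console.log(merged);
```

Here `version` comes from the extracted metadata, while `product` is carried over from the config metadata unchanged.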


HandlePageFunc

Ƭ HandlePageFunc: (path: string, content: string) => Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>

Type declaration

▸ (path, content): Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>

Function to convert a file in the repo into a Page or Page[].

Parameters
| Name | Type | Description |
| :--- | :--- | :--- |
| `path` | `string` | Path to file in repo |
| `content` | `string` | Contents of file in repo |
Returns

Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>

Defined in

packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:16
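
A handler matching this signature could be sketched as follows. The local `Page` stand-in (including its `body` field), the handler name `handleMarkdownFile`, and the example.com URL scheme are all assumptions. Returning `undefined` is assumed to mean "skip this file", as documented for handleDocumentInRepo.

```typescript
// Local stand-in for Page; `body` is an assumed field.
type Page = { url: string; title?: string; body: string; sourceName: string };

type HandlePageFunc = (
  path: string,
  content: string
) => Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>;

// Hypothetical handler: one page per Markdown file, everything else skipped.
const handleMarkdownFile: HandlePageFunc = async (path, content) => {
  if (!path.endsWith(".md")) {
    return undefined; // assumed to mean "skip this file"
  }
  // Use the first level-1 heading as the title, if there is one.
  const heading = content.match(/^#\s+(.+)$/m);
  return {
    url: `https://example.com${path.replace(/\.md$/, "")}`,
    title: heading?.[1]?.trim(),
    body: content,
  };
};

handleMarkdownFile("/guide/intro.md", "# Intro\n\nHello").then((page) => {
  console.log(page);
});
```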


MakeCodeOnGithubTextDataSourceParams

Ƭ MakeCodeOnGithubTextDataSourceParams: Omit<MakeGitHubDataSourceArgs, "handleDocumentInRepo"> & { metadata?: PageMetadata }

Defined in

packages/mongodb-rag-core/src/dataSources/CodeOnGithubTextDataSource.ts:8


MakeGitHubDataSourceArgs

Ƭ MakeGitHubDataSourceArgs: Object

Type declaration

| Name | Type | Description |
| :--- | :--- | :--- |
| `filter?` | `MakeGitDataSourceParams["filter"]` | Filter function to filter out files from the repo. Using this overrides the repoLoaderOptions.ignorePaths option. Note that file paths will have a leading slash (e.g. /somedir/somefile.txt). |
| `name` | `string` | The data source name. |
| `repoLoaderOptions?` | `Partial<GithubRepoLoaderParams>` | Options for the GitHub repo loader, such as the branch to fetch. |
| `repoUrl` | `string` | The GitHub repo URL. |
| `handleDocumentInRepo` | `(document: Document<{ source: string }>) => Promise<undefined \| Omit<Page, "sourceName"> \| Omit<Page, "sourceName">[]>` | Handle a given file in the repo. Any number of Pages can be returned for a given file. The exact details depend on the given repo. Return undefined to skip this document. Page sourceName will be overridden by the name passed to makeGitHubDataSource. |

Defined in

packages/mongodb-rag-core/src/dataSources/GitHubDataSource.ts:7


MakeMdOnGithubDataSourceParams

Ƭ MakeMdOnGithubDataSourceParams: Omit<MakeGitHubDataSourceArgs, "handleDocumentInRepo"> & { extractMetadata?: (pageContent: string, frontMatter?: Record<string, unknown>) => PageMetadata ; extractTitle?: (pageContent: string, frontMatter?: Record<string, unknown>) => string | undefined ; filter?: MakeGitHubDataSourceArgs["filter"] ; frontMatter?: { format?: string ; process: boolean ; separator?: string } ; metadata?: PageMetadata ; pathToPageUrl: (pathInRepo: string, frontMatter?: Record<string, unknown>) => string }

Defined in

packages/mongodb-rag-core/src/dataSources/MdOnGithubDataSource.ts:10
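
To make the shape of these parameters concrete, the sketch below builds a params object against a local type that mirrors a subset of the alias above. It does not import from the package. `MdOnGithubParamsSketch`, the repo URL, the URL mapping, and the assumed shapes of `PageMetadata` and `filter` are all hypothetical.

```typescript
type PageMetadata = Record<string, unknown>; // assumed shape

// A subset of the fields in the type alias above, declared locally.
type MdOnGithubParamsSketch = {
  name: string;
  repoUrl: string;
  filter?: (path: string) => boolean; // assumed to match FilterFunc
  pathToPageUrl: (pathInRepo: string, frontMatter?: Record<string, unknown>) => string;
  extractTitle?: (pageContent: string, frontMatter?: Record<string, unknown>) => string | undefined;
  frontMatter?: { format?: string; process: boolean; separator?: string };
  metadata?: PageMetadata;
};

const params: MdOnGithubParamsSketch = {
  name: "example-docs",
  repoUrl: "https://github.com/example-org/example-docs",
  // File paths have a leading slash, as noted for MakeGitHubDataSourceArgs.
  filter: (path) => path.startsWith("/docs/") && path.endsWith(".md"),
  pathToPageUrl: (pathInRepo) =>
    "https://example.com" + pathInRepo.replace(/^\/docs/, "").replace(/\.md$/, ""),
  extractTitle: (_pageContent, frontMatter) => {
    const title = frontMatter?.["title"];
    return typeof title === "string" ? title : undefined;
  },
  frontMatter: { process: true },
  metadata: { productName: "Example Docs" },
};

console.log(params.pathToPageUrl("/docs/setup/install.md"));
```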

Functions

extractHtmlH1

extractHtmlH1(domDoc): undefined | string

Parameters

| Name | Type |
| :--- | :--- |
| `domDoc` | `Document` |

Returns

undefined | string

Defined in

packages/mongodb-rag-core/src/dataSources/handleHtmlDocument.ts:96


getAcquitTestsFromGithubRepo

getAcquitTestsFromGithubRepo(repoUrl, repoLoaderOptions): Promise<string[]>

Parameters

| Name | Type |
| :--- | :--- |
| `repoUrl` | `string` |
| `repoLoaderOptions` | `Partial<GithubRepoLoaderParams>` |

Returns

Promise<string[]>

Defined in

packages/mongodb-rag-core/src/dataSources/AcquitRequireMdOnGithubDataSource.ts:97


getRelevantFilePathsInDir

getRelevantFilePathsInDir(directoryPath, filter, fileList?): string[]

Parameters

| Name | Type | Default value |
| :--- | :--- | :--- |
| `directoryPath` | `string` | `undefined` |
| `filter` | `FilterFunc` | `undefined` |
| `fileList` | `string[]` | `[]` |

Returns

string[]

Defined in

packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:157
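
The signature suggests a directory walk that accumulates matching paths in `fileList`. The sketch below is one way such a function could work, not the library's implementation: `collectFilePaths` is a hypothetical name, and both the recursion into subdirectories and the use of full paths as filter input are assumptions.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

type FilterFunc = (path: string) => boolean;

// Hypothetical re-creation of the documented signature.
// Walks the directory recursively and appends matching file paths to fileList.
function collectFilePaths(
  directoryPath: string,
  filter: FilterFunc,
  fileList: string[] = []
): string[] {
  for (const entry of fs.readdirSync(directoryPath, { withFileTypes: true })) {
    const fullPath = path.join(directoryPath, entry.name);
    if (entry.isDirectory()) {
      collectFilePaths(fullPath, filter, fileList);
    } else if (filter(fullPath)) {
      fileList.push(fullPath);
    }
  }
  return fileList;
}

// Usage against a throwaway directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "paths-sketch-"));
fs.mkdirSync(path.join(root, "guides"));
fs.writeFileSync(path.join(root, "README.md"), "# Readme");
fs.writeFileSync(path.join(root, "guides", "intro.md"), "# Intro");
fs.writeFileSync(path.join(root, "guides", "diagram.png"), "");

const markdownFiles = collectFilePaths(root, (p) => p.endsWith(".md"));
console.log(markdownFiles.map((p) => path.relative(root, p)).sort());
```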


getRelevantFilesAsStrings

getRelevantFilesAsStrings(«destructured»): Promise<Record<string, string>>

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `Object` |
| › `directoryPath` | `string` |
| › `filter` | `FilterFunc` |

Returns

Promise<Record<string, string>>

Defined in

packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:178


getRepoLocally

getRepoLocally(«destructured»): Promise<void>

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `Object` |
| › `localPath` | `string` |
| › `options?` | `TaskOptions` |
| › `repoPath` | `string` |

Returns

Promise<void>

Defined in

packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:132


handleHtmlDocument

handleHtmlDocument(path, content, options): Promise<Omit<Page, "sourceName">>

Parameters

| Name | Type |
| :--- | :--- |
| `path` | `string` |
| `content` | `string` |
| `options` | `HandleHtmlPageFuncOptions` |

Returns

Promise<Omit<Page, "sourceName">>

Defined in

packages/mongodb-rag-core/src/dataSources/handleHtmlDocument.ts:31


makeAcquitRequireMdOnGithubDataSource

makeAcquitRequireMdOnGithubDataSource(«destructured»): Promise<DataSource>

Loads an MD/Acquit docs site from a GitHub repo. Acquit is a tool for writing tests in comments, and then extracting them into a test suite. This function loads the tests from the repo, and then transforms the document content to include tests from the test suite in the document. Acquit is used in the Mongoose ODM documentation. This data source assumes that the test files are in the same repo as the docs.

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `Omit<MakeGitHubDataSourceArgs, "handleDocumentInRepo"> & { acquitCodeBlockLanguageReplacement?: string ; metadata?: PageMetadata ; pathToPageUrl: (pathInRepo: string) => string ; testFileLoaderOptions: Partial<GithubRepoLoaderParams> }` |

Returns

Promise<DataSource>

Defined in

packages/mongodb-rag-core/src/dataSources/AcquitRequireMdOnGithubDataSource.ts:22


makeCodeOnGithubTextDataSource

makeCodeOnGithubTextDataSource(«destructured»): Promise<DataSource>

Loads source code files from a GitHub repo.

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `MakeCodeOnGithubTextDataSourceParams` |

Returns

Promise<DataSource>

Defined in

packages/mongodb-rag-core/src/dataSources/CodeOnGithubTextDataSource.ts:20


makeGitDataSource

makeGitDataSource(«destructured»): DataSource

Loads and processes files from a Git repo (can be hosted anywhere).

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `MakeGitDataSourceParams` |

Returns

DataSource

Defined in

packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:57


makeGitHubDataSource

makeGitHubDataSource(«destructured»): DataSource

Loads an arbitrary GitHub repo and converts its contents into pages.

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `MakeGitHubDataSourceArgs` |

Returns

DataSource

Defined in

packages/mongodb-rag-core/src/dataSources/GitHubDataSource.ts:50


makeLangChainDocumentLoaderDataSource

makeLangChainDocumentLoaderDataSource(«destructured»): DataSource

Creates a data source that loads pages from a LangChain document loader.

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `MakeLangChainDocumentLoaderDataSourceParams` |

Returns

DataSource

Defined in

packages/mongodb-rag-core/src/dataSources/LangchainDocumentLoaderDataSource.ts:37


makeMdOnGithubDataSource

makeMdOnGithubDataSource(«destructured»): Promise<DataSource>

Loads an .md/.mdx docs site from a GitHub repo.

Parameters

| Name | Type |
| :--- | :--- |
| `«destructured»` | `MakeMdOnGithubDataSourceParams` |

Returns

Promise<DataSource>

Defined in

packages/mongodb-rag-core/src/dataSources/MdOnGithubDataSource.ts:73


makeRandomTmp

makeRandomTmp(prefix): string

Parameters

| Name | Type | Description |
| :--- | :--- | :--- |
| `prefix` | `string` | Prefix for the temporary directory name |

Returns

string

Defined in

packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:123


pageBlobUrl

pageBlobUrl(args): string

Parameters

| Name | Type |
| :--- | :--- |
| `args` | `Object` |
| `args.branch` | `string` |
| `args.filePath?` | `string \| string[]` |
| `args.repoUrl` | `string` |

Returns

string

Defined in

packages/mongodb-rag-core/src/dataSources/CodeOnGithubTextDataSource.ts:72


removeMarkdownImagesAndLinks

removeMarkdownImagesAndLinks(content): string

Utility function to remove Markdown images and links from a string. Useful if you do not want to include images and links in content, since they can add significantly to the token count when creating embeddings while also diluting the semantic meaning of the content.

Parameters

| Name | Type |
| :--- | :--- |
| `content` | `string` |

Returns

string

Defined in

packages/mongodb-rag-core/src/dataSources/removeMarkdownImagesAndLinks.ts:7
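
The kind of transformation described above can be sketched with a single regular expression. This is not the library's implementation: `stripImagesAndLinks` is a hypothetical name, and keeping the link text while dropping the URL is an assumption about the intended behavior.

```typescript
// Hypothetical sketch, not the library's implementation.
// Images are dropped entirely; links are replaced by their link text (assumed).
function stripImagesAndLinks(content: string): string {
  return content.replace(
    /(!?)\[([^\]]*)\]\([^)]*\)/g,
    (_match: string, bang: string, text: string) => (bang === "!" ? "" : text)
  );
}

const input =
  "See the [install guide](https://example.com/install). ![diagram](./img/d.png)";
console.log(stripImagesAndLinks(input));
```

The single pattern handles both cases because a Markdown image is a link prefixed with `!`. Nested brackets and reference-style links are outside the scope of this sketch.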