Module: dataSources
Interfaces
Type Aliases
DataSource
Ƭ DataSource: Object
Represents a source of page data.
Type declaration
Name | Type | Description |
---|---|---|
name | string | The unique name among registered data sources. |
fetchPages | () => Promise<Page[]> | Fetches pages in the data source. |
Defined in
packages/mongodb-rag-core/src/dataSources/DataSource.ts:6
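As an illustration, here is a minimal sketch of an object that satisfies this type. The `Page` shape below is a simplified, hypothetical stand-in (the real `Page` type is defined elsewhere in the package and may carry more fields), and `staticDataSource` is an illustrative name, not part of the library.

```typescript
// Hypothetical, simplified stand-in for the package's Page type.
type Page = {
  url: string;
  title?: string;
  body: string;
  sourceName: string;
};

// Mirrors the DataSource type declaration documented above.
type DataSource = {
  name: string;
  fetchPages: () => Promise<Page[]>;
};

// A data source that serves a fixed, in-memory list of pages.
const staticDataSource: DataSource = {
  name: "static-example",
  fetchPages: async () => [
    {
      url: "https://example.com/hello",
      title: "Hello",
      body: "Hello, world.",
      sourceName: "static-example",
    },
  ],
};
```

Because `fetchPages` returns a promise, callers would typically `await staticDataSource.fetchPages()` or chain `.then(...)` on the result.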
FilterFunc
Ƭ FilterFunc: (path: string) => boolean
Type declaration
▸ (path): boolean
Parameters
Name | Type |
---|---|
path | string |
Returns
boolean
Defined in
packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:155
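A short sketch of a function matching this signature. It assumes that returning `true` means the path is kept, which this reference does not state explicitly; `markdownOnly` and the sample paths are hypothetical.

```typescript
// Mirrors the FilterFunc type alias documented above.
type FilterFunc = (path: string) => boolean;

// Hypothetical filter: keep .md/.mdx files, skip anything under node_modules.
// Assumption: a return value of true means "include this path".
const markdownOnly: FilterFunc = (path) =>
  /\.mdx?$/.test(path) && !path.split("/").includes("node_modules");

const candidates = [
  "/docs/intro.md",
  "/docs/guide.mdx",
  "/src/index.ts",
  "/node_modules/pkg/README.md",
];

// Only the two documentation files pass the filter.
const kept = candidates.filter(markdownOnly);
```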
HandleHtmlPageFuncOptions
Ƭ HandleHtmlPageFuncOptions: Object
Type declaration
Name | Type | Description |
---|---|---|
extractMetadata? | (domDoc: Document) => Record<string, unknown> | Extract metadata from page DOM. Added to the Page.metadata field. If a key in the result of extractMetadata() is the same as a key in metadata, the extractMetadata() key will override it. |
extractTitle? | (domDoc: Document) => string \| undefined | Extract Page.title from page content and path. |
metadata? | PageMetadata | Page.metadata passed from config. Included in all documents. |
pathToPageUrl | (path: string) => string | Construct the Page.url from page path. |
postProcessMarkdown? | (markdown: string) => Promise<string> | Transform Markdown once it's been generated. |
removeElements | (domDoc: Document) => Element[] | Returns an array of DOM elements to be removed from the parsed document. |
Defined in
packages/mongodb-rag-core/src/dataSources/handleHtmlDocument.ts:7
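The override rule between `metadata` and `extractMetadata()` can be pictured with object spread. This is only a sketch of the documented precedence, not the package's code: `mergeMetadata` is a hypothetical helper, and `PageMetadata` is approximated here as a plain record.

```typescript
// Hypothetical stand-in; the real PageMetadata type lives elsewhere in the package.
type PageMetadata = Record<string, unknown>;

// Keys from the extracted metadata win over keys from the configured metadata,
// because later spreads overwrite earlier ones.
function mergeMetadata(
  configured: PageMetadata = {},
  extracted: Record<string, unknown> = {}
): PageMetadata {
  return { ...configured, ...extracted };
}

const merged = mergeMetadata(
  { productName: "Example Docs", version: "1.0" },
  { version: "2.0", description: "From the page's meta tags" }
);
// merged.version comes from the extracted metadata;
// merged.productName is carried over from the configured metadata.
```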
HandlePageFunc
Ƭ HandlePageFunc: (path: string, content: string) => Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>
Type declaration
▸ (path, content): Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>
Function to convert a file in the repo into a Page or Page[].
Parameters
Name | Type | Description |
---|---|---|
path | string | Path to file in repo |
content | string | Contents of file in repo |
Returns
Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>
Defined in
packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:16
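A hedged sketch of a handler with this signature. The `Page` type is a simplified, hypothetical stand-in, `handleTextFile` and the `example.com` URL scheme are invented for illustration, and the sketch assumes that returning `undefined` skips the file (as documented for `handleDocumentInRepo`).

```typescript
// Hypothetical, simplified stand-in for the package's Page type.
type Page = {
  url: string;
  title?: string;
  body: string;
  sourceName: string;
};

// Mirrors the HandlePageFunc type alias documented above.
type HandlePageFunc = (
  path: string,
  content: string
) => Promise<undefined | Omit<Page, "sourceName"> | Omit<Page, "sourceName">[]>;

// Hypothetical handler: turn each .txt file into one page, skip everything else.
const handleTextFile: HandlePageFunc = async (path, content) => {
  if (!path.endsWith(".txt")) {
    // Assumption: undefined means "produce no page for this file".
    return undefined;
  }
  return {
    // Strip leading slashes so the URL has exactly one separator.
    url: `https://example.com/${path.replace(/^\/+/, "")}`,
    // Use the file name as a fallback title.
    title: path.split("/").pop(),
    body: content.trim(),
  };
};
```

Note that the returned object omits `sourceName`, matching the `Omit<Page, "sourceName">` return type.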
MakeCodeOnGithubTextDataSourceParams
Ƭ MakeCodeOnGithubTextDataSourceParams: Omit<MakeGitHubDataSourceArgs, "handleDocumentInRepo"> & { metadata?: PageMetadata }
Defined in
packages/mongodb-rag-core/src/dataSources/CodeOnGithubTextDataSource.ts:8
MakeGitHubDataSourceArgs
Ƭ MakeGitHubDataSourceArgs: Object
Type declaration
Name | Type | Description |
---|---|---|
filter? | MakeGitDataSourceParams["filter"] | Filter function to filter out files from the repo. Using this overrides the repoLoaderOptions.ignorePaths option. Note that file paths will have a leading slash (e.g. /somedir/somefile.txt). |
name | string | The data source name. |
repoLoaderOptions? | Partial<GithubRepoLoaderParams> | Options for the GitHub repo loader, such as the branch to fetch. |
repoUrl | string | The GitHub repo URL. |
handleDocumentInRepo | (document: Document<{ source: string }>) => Promise<undefined \| Omit<Page, "sourceName"> \| Omit<Page, "sourceName">[]> | Handle a given file in the repo. Any number of Pages can be returned for a given file. The exact details depend on the given repo. Return undefined to skip this document. Page sourceName will be overridden by the name passed to makeGitHubDataSource. |
Defined in
packages/mongodb-rag-core/src/dataSources/GitHubDataSource.ts:7
MakeMdOnGithubDataSourceParams
Ƭ MakeMdOnGithubDataSourceParams: Omit<MakeGitHubDataSourceArgs, "handleDocumentInRepo"> & { extractMetadata?: (pageContent: string, frontMatter?: Record<string, unknown>) => PageMetadata; extractTitle?: (pageContent: string, frontMatter?: Record<string, unknown>) => string | undefined; filter?: MakeGitHubDataSourceArgs["filter"]; frontMatter?: { format?: string; process: boolean; separator?: string }; metadata?: PageMetadata; pathToPageUrl: (pathInRepo: string, frontMatter?: Record<string, unknown>) => string }
Defined in
packages/mongodb-rag-core/src/dataSources/MdOnGithubDataSource.ts:10
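To make the callback signatures concrete, here is a sketch of `pathToPageUrl` and `extractTitle` implementations. Everything specific is an assumption for illustration: the `example.com` URL scheme, the `/docs/` repo layout, the `slug` and `title` front matter keys, and the idea that `pathInRepo` starts with a leading slash.

```typescript
type FrontMatter = Record<string, unknown>;

// Hypothetical pathToPageUrl: prefer a front matter slug, otherwise derive the
// URL from the repo path by dropping the "/docs/" prefix and the file extension.
const pathToPageUrl = (pathInRepo: string, frontMatter?: FrontMatter): string => {
  const slug = frontMatter?.slug;
  const pagePath =
    typeof slug === "string"
      ? slug
      : pathInRepo.replace(/^\/docs\//, "").replace(/\.mdx?$/, "");
  return `https://example.com/docs/${pagePath}`;
};

// Hypothetical extractTitle: prefer a front matter title, otherwise fall back
// to the first level-one Markdown heading, or undefined if there is none.
const extractTitle = (
  pageContent: string,
  frontMatter?: FrontMatter
): string | undefined => {
  const title = frontMatter?.title;
  if (typeof title === "string") {
    return title;
  }
  return pageContent.match(/^#\s+(.+)$/m)?.[1]?.trim();
};
```

Both functions could then be passed as the `pathToPageUrl` and `extractTitle` properties of the params object, alongside the required `name` and `repoUrl` fields.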
Functions
extractHtmlH1
▸ extractHtmlH1(domDoc): undefined | string
Parameters
Name | Type |
---|---|
domDoc | Document |
Returns
undefined | string
Defined in
packages/mongodb-rag-core/src/dataSources/handleHtmlDocument.ts:96
getAcquitTestsFromGithubRepo
▸ getAcquitTestsFromGithubRepo(repoUrl, repoLoaderOptions): Promise<string[]>
Parameters
Name | Type |
---|---|
repoUrl | string |
repoLoaderOptions | Partial<GithubRepoLoaderParams> |
Returns
Promise<string[]>
Defined in
packages/mongodb-rag-core/src/dataSources/AcquitRequireMdOnGithubDataSource.ts:97
getRelevantFilePathsInDir
▸ getRelevantFilePathsInDir(directoryPath, filter, fileList?): string[]
Parameters
Name | Type | Default value |
---|---|---|
directoryPath | string | undefined |
filter | FilterFunc | undefined |
fileList | string[] | [] |
Returns
string[]
Defined in
packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:157
getRelevantFilesAsStrings
▸ getRelevantFilesAsStrings(«destructured»): Promise<Record<string, string>>
Parameters
Name | Type |
---|---|
«destructured» | Object |
› directoryPath | string |
› filter | FilterFunc |
Returns
Promise<Record<string, string>>
Defined in
packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:178
getRepoLocally
▸ getRepoLocally(«destructured»): Promise<void>
Parameters
Name | Type |
---|---|
«destructured» | Object |
› localPath | string |
› options? | TaskOptions |
› repoPath | string |
Returns
Promise<void>
Defined in
packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:132
handleHtmlDocument
▸ handleHtmlDocument(path, content, options): Promise<Omit<Page, "sourceName">>
Parameters
Name | Type |
---|---|
path | string |
content | string |
options | HandleHtmlPageFuncOptions |
Returns
Promise<Omit<Page, "sourceName">>
Defined in
packages/mongodb-rag-core/src/dataSources/handleHtmlDocument.ts:31
makeAcquitRequireMdOnGithubDataSource
▸ makeAcquitRequireMdOnGithubDataSource(«destructured»): Promise<DataSource>
Loads an MD/Acquit docs site from a GitHub repo. Acquit is a tool for writing tests in comments, and then extracting them into a test suite. This function loads the tests from the repo, and then transforms the document content to include tests from the test suite in the document. Acquit is used in the Mongoose ODM documentation. This data source assumes that the test files are in the same repo as the docs.
Parameters
Name | Type |
---|---|
«destructured» | Omit<MakeGitHubDataSourceArgs, "handleDocumentInRepo"> & { acquitCodeBlockLanguageReplacement?: string; metadata?: PageMetadata; pathToPageUrl: (pathInRepo: string) => string; testFileLoaderOptions: Partial<GithubRepoLoaderParams> } |
Returns
Promise<DataSource>
Defined in
packages/mongodb-rag-core/src/dataSources/AcquitRequireMdOnGithubDataSource.ts:22
makeCodeOnGithubTextDataSource
▸ makeCodeOnGithubTextDataSource(«destructured»): Promise<DataSource>
Loads source code files from a GitHub repo.
Parameters
Name | Type |
---|---|
«destructured» | MakeCodeOnGithubTextDataSourceParams |
Returns
Promise<DataSource>
Defined in
packages/mongodb-rag-core/src/dataSources/CodeOnGithubTextDataSource.ts:20
makeGitDataSource
▸ makeGitDataSource(«destructured»): DataSource
Loads and processes files from a Git repo (can be hosted anywhere).
Parameters
Name | Type |
---|---|
«destructured» | MakeGitDataSourceParams |
Returns
DataSource
Defined in
packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:57
makeGitHubDataSource
▸ makeGitHubDataSource(«destructured»): DataSource
Loads an arbitrary GitHub repo and converts its contents into pages.
Parameters
Name | Type |
---|---|
«destructured» | MakeGitHubDataSourceArgs |
Returns
DataSource
Defined in
packages/mongodb-rag-core/src/dataSources/GitHubDataSource.ts:50
makeLangChainDocumentLoaderDataSource
▸ makeLangChainDocumentLoaderDataSource(«destructured»): DataSource
Creates a data source that loads pages from a LangChain document loader.
Parameters
Name | Type |
---|---|
«destructured» | MakeLangChainDocumentLoaderDataSourceParams |
Returns
DataSource
Defined in
packages/mongodb-rag-core/src/dataSources/LangchainDocumentLoaderDataSource.ts:37
makeMdOnGithubDataSource
▸ makeMdOnGithubDataSource(«destructured»): Promise<DataSource>
Loads an .md/.mdx docs site from a GitHub repo.
Parameters
Name | Type |
---|---|
«destructured» | MakeMdOnGithubDataSourceParams |
Returns
Promise<DataSource>
Defined in
packages/mongodb-rag-core/src/dataSources/MdOnGithubDataSource.ts:73
makeRandomTmp
▸ makeRandomTmp(prefix): string
Parameters
Name | Type | Description |
---|---|---|
prefix | string | Prefix for the temporary directory name. |
Returns
string
Defined in
packages/mongodb-rag-core/src/dataSources/GitDataSource.ts:123
pageBlobUrl
▸ pageBlobUrl(args): string
Parameters
Name | Type |
---|---|
args | Object |
args.branch | string |
args.filePath? | string \| string[] |
args.repoUrl | string |
Returns
string
Defined in
packages/mongodb-rag-core/src/dataSources/CodeOnGithubTextDataSource.ts:72
removeMarkdownImagesAndLinks
▸ removeMarkdownImagesAndLinks(content
): string
Utility function to remove Markdown images and links from a string. Useful if you do not want to include images and links in content, since they can add significantly to the token count when creating embeddings while also diluting the semantic meaning of the content.
Parameters
Name | Type |
---|---|
content | string |
Returns
string
Defined in
packages/mongodb-rag-core/src/dataSources/removeMarkdownImagesAndLinks.ts:7
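The general idea can be sketched with two regular expressions. This is not the package's implementation: `stripImagesAndLinks` is a hypothetical function, it only handles inline `![alt](url)` and `[text](url)` syntax, and keeping the link text while dropping the URL is an assumption about the desired behavior.

```typescript
// Hypothetical sketch; handles only inline image and link syntax,
// not reference-style links or nested brackets.
function stripImagesAndLinks(content: string): string {
  return content
    // Drop inline images such as ![alt](url) entirely.
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "")
    // Replace inline links such as [text](url) with just their text.
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1");
}

const cleaned = stripImagesAndLinks(
  "See the [guide](https://example.com/guide). ![diagram](./diagram.png)"
);
```

Images are removed before links so that the link pattern does not match the bracketed part of an image and leave a stray `!` behind.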