
Module: embed

Type Aliases

ChunkFunc

Ƭ ChunkFunc: (page: Page, options?: Partial<ChunkOptions>) => Promise<ContentChunk[]>

Type declaration

▸ (page, options?): Promise<ContentChunk[]>

A ChunkFunc is a function that takes a page and returns it in chunks.

Parameters

| Name | Type |
| :------ | :------ |
| page | Page |
| options? | Partial<ChunkOptions> |

Returns

Promise<ContentChunk[]>

Defined in

mongodb-rag-ingest/src/embed/chunkPage.ts:12
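
As a rough illustration of the ChunkFunc shape, the sketch below splits a page into paragraph-sized chunks. SimplePage, SimpleChunk, SimpleChunkOptions, and chunkByParagraph are hypothetical stand-ins defined only for this example; they are not the library's types, and field names such as url, body, and text are assumptions. Tokens are approximated by whitespace-separated words.

```typescript
// Simplified stand-ins for Page, ContentChunk, and ChunkOptions.
// These are assumptions for this sketch, not the library's definitions.
type SimplePage = { url: string; body: string };
type SimpleChunk = { url: string; text: string; tokenCount: number };
type SimpleChunkOptions = { maxChunkSize: number };

// Same shape as ChunkFunc: (page, options?) => Promise<chunks>.
type SimpleChunkFunc = (
  page: SimplePage,
  options?: Partial<SimpleChunkOptions>
) => Promise<SimpleChunk[]>;

// Hypothetical chunker: one chunk per paragraph, with paragraphs longer
// than maxChunkSize words split into consecutive pieces.
const chunkByParagraph: SimpleChunkFunc = async (page, options) => {
  const maxChunkSize = Math.max(1, options?.maxChunkSize ?? 100);
  const chunks: SimpleChunk[] = [];
  for (const paragraph of page.body.split(/\n\s*\n/)) {
    const words = paragraph.trim().split(/\s+/).filter(Boolean);
    for (let start = 0; start < words.length; start += maxChunkSize) {
      const piece = words.slice(start, start + maxChunkSize);
      chunks.push({ url: page.url, text: piece.join(" "), tokenCount: piece.length });
    }
  }
  return chunks;
};
```

Because options is a Partial, every option the function reads needs a fallback, which is why the sketch supplies a default for maxChunkSize.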


ChunkMetadataGetter

Ƭ ChunkMetadataGetter<T>: (args: { chunk: Omit<ContentChunk, "tokenCount"> ; metadata?: T ; page: Page ; text: string }) => Promise<T>

Type parameters

| Name | Type |
| :------ | :------ |
| T | extends Record<string, unknown> = Record<string, unknown> |

Type declaration

▸ (args): Promise<T>

Parameters

| Name | Type | Description |
| :------ | :------ | :------ |
| args | Object | - |
| args.chunk | Omit<ContentChunk, "tokenCount"> | - |
| args.metadata? | T | Previous metadata, if any. Omitting this from the return value should not overwrite previous metadata. |
| args.page | Page | - |
| args.text | string | The text of the chunk without metadata. |

Returns

Promise<T>

Defined in

mongodb-rag-ingest/src/embed/ChunkTransformer.ts:13
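
A getter with this shape might look like the sketch below, which computes a word count and carries forward any previous metadata. SimplePage, SimpleChunk, SimpleMetadataGetter, WordCountMetadata, and wordCountGetter are hypothetical names introduced for this example, and the page.title field is an assumption.

```typescript
// Simplified stand-ins; not the library's Page or ContentChunk types.
type SimplePage = { url: string; title?: string };
type SimpleChunk = { url: string; text: string };

// Same shape as ChunkMetadataGetter<T>.
type SimpleMetadataGetter<
  T extends Record<string, unknown> = Record<string, unknown>
> = (args: {
  chunk: SimpleChunk;
  metadata?: T;
  page: SimplePage;
  text: string;
}) => Promise<T>;

// Hypothetical metadata shape for this sketch.
type WordCountMetadata = { wordCount: number; pageTitle?: string };

const wordCountGetter: SimpleMetadataGetter<WordCountMetadata> = async ({
  page,
  text,
  metadata,
}) => {
  // Start from the previous metadata (if any), then overwrite wordCount.
  const result: WordCountMetadata = {
    ...metadata,
    wordCount: text.split(/\s+/).filter(Boolean).length,
  };
  // Only replace the previous title when the page actually has one.
  if (page.title !== undefined) result.pageTitle = page.title;
  return result;
};
```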


ChunkOptions

Ƭ ChunkOptions: Object

Options for converting a Page into ContentChunk[].

Type declaration

| Name | Type | Description |
| :------ | :------ | :------ |
| chunkOverlap | number | Number of tokens to overlap between chunks. If this is 0, chunks will not overlap. If this is greater than 0, chunks will overlap by this number of tokens. |
| maxChunkSize | number | Maximum chunk size before transform function is applied to it. If Page has more tokens than this number, it is split into smaller chunks. |
| minChunkSize? | number | Minimum chunk size before transform function is applied to it. If a chunk has fewer tokens than this number, it is discarded before ingestion. You can use this as a vector search optimization to avoid including chunks with very few tokens and thus very little semantic meaning. See the example following this table. |
| tokenizer | SomeTokenizer | Tokenizer to use to count number of tokens in text. |
| transform? | ChunkTransformer | Transform to be applied to each chunk as it is produced. Provides the opportunity to prepend metadata, etc. |
| yamlChunkSize? | number | If provided, this will override the maxChunkSize for openapi-yaml pages. This is useful because openapi-yaml pages tend to be very large, and we want to split them into smaller chunks than the default maxChunkSize. |

Example (minChunkSize)

You might set this to 15 to avoid including chunks that are just a few characters or words. For instance, you likely would not want to include a chunk that is just the closing of a code block (```), which occurs not infrequently if chunking using the Langchain RecursiveCharacterTextSplitter.

Chunk 1:

````text
```py
foo = "bar"
# more semantically relevant python code...
````

Chunk 2:

````text
```
````

Defined in

mongodb-rag-ingest/src/embed/chunkPage.ts:20
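
To make the interaction between maxChunkSize, chunkOverlap, and minChunkSize concrete, here is a minimal sliding-window sketch over an array of tokens. It is not the library's splitting algorithm (the description above mentions the Langchain RecursiveCharacterTextSplitter); WindowOptions and windowTokens are hypothetical names, and the rule that chunkOverlap must be smaller than maxChunkSize is an assumption of this sketch.

```typescript
// Hypothetical subset of ChunkOptions used by this sketch.
type WindowOptions = {
  maxChunkSize: number;
  chunkOverlap: number;
  minChunkSize?: number;
};

// Split tokens into windows of at most maxChunkSize tokens, where each
// window repeats the last chunkOverlap tokens of the previous one.
function windowTokens(tokens: string[], options: WindowOptions): string[][] {
  const { maxChunkSize, chunkOverlap, minChunkSize = 0 } = options;
  if (maxChunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= maxChunkSize) {
    throw new Error("expected maxChunkSize > chunkOverlap >= 0");
  }
  const step = maxChunkSize - chunkOverlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + maxChunkSize));
    // Stop once a window has reached the end of the input.
    if (start + maxChunkSize >= tokens.length) break;
  }
  // Discard chunks that are too small to carry much meaning.
  return chunks.filter((chunk) => chunk.length >= minChunkSize);
}
```

With eight tokens, maxChunkSize 4, and chunkOverlap 1, the windows start at positions 0, 3, and 6, so the last window holds only two tokens and would be dropped by a minChunkSize of 3.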


ChunkTransformer

Ƭ ChunkTransformer: (chunk: Omit<ContentChunk, "tokenCount">, details: { page: Page }) => Promise<Omit<ContentChunk, "tokenCount">>

Type declaration

▸ (chunk, details): Promise<Omit<ContentChunk, "tokenCount">>

Parameters

| Name | Type |
| :------ | :------ |
| chunk | Omit<ContentChunk, "tokenCount"> |
| details | Object |
| details.page | Page |

Returns

Promise<Omit<ContentChunk, "tokenCount">>

Defined in

mongodb-rag-ingest/src/embed/ChunkTransformer.ts:6
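
A transformer of this shape could, for example, prepend the page title to each chunk's text. The sketch below assumes simplified stand-in types (SimplePage, SimpleChunk, SimpleChunkTransformer) and a hypothetical page.title field; prependPageTitle is not part of the module.

```typescript
// Simplified stand-ins; not the library's Page or ContentChunk types.
type SimplePage = { url: string; title?: string };
type SimpleChunk = { url: string; text: string };

// Same shape as ChunkTransformer: (chunk, { page }) => Promise<chunk>.
type SimpleChunkTransformer = (
  chunk: SimpleChunk,
  details: { page: SimplePage }
) => Promise<SimpleChunk>;

// Hypothetical transformer: put the page title above the chunk text so
// that each chunk carries some context about where it came from.
const prependPageTitle: SimpleChunkTransformer = async (chunk, { page }) => {
  if (!page.title) return chunk;
  return { ...chunk, text: `${page.title}\n\n${chunk.text}` };
};
```

The transformer works on chunks without a tokenCount, which suggests the count is computed after the transform has changed the text, although this page does not state that explicitly.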


ContentChunk

Ƭ ContentChunk: Omit<EmbeddedContent, "embedding" | "updated">

Defined in

mongodb-rag-ingest/src/embed/chunkPage.ts:7


SomeTokenizer

Ƭ SomeTokenizer: Object

Type declaration

| Name | Type |
| :------ | :------ |
| encode | (text: string) => { bpe: number[] ; text: string[] } |

Defined in

mongodb-rag-ingest/src/embed/chunkPage.ts:80
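
Any object with a matching encode method satisfies this type. As a hedged illustration, the sketch below builds a toy whitespace tokenizer, which can be handy in tests; it is not a real BPE tokenizer, and SimpleTokenizer, makeWhitespaceTokenizer, and countTokens are hypothetical names.

```typescript
// Same shape as SomeTokenizer.
type SimpleTokenizer = {
  encode: (text: string) => { bpe: number[]; text: string[] };
};

// Toy tokenizer: each whitespace-separated word is one token, and token
// IDs are assigned in order of first appearance.
function makeWhitespaceTokenizer(): SimpleTokenizer {
  const ids = new Map<string, number>();
  return {
    encode(text: string) {
      const pieces = text.split(/\s+/).filter(Boolean);
      const bpe = pieces.map((piece) => {
        let id = ids.get(piece);
        if (id === undefined) {
          id = ids.size;
          ids.set(piece, id);
        }
        return id;
      });
      return { bpe, text: pieces };
    },
  };
}

// Counting tokens only needs the length of the encoded ID array.
function countTokens(tokenizer: SimpleTokenizer, text: string): number {
  return tokenizer.encode(text).bpe.length;
}
```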

Variables

defaultOpenApiSpecYamlChunkOptions

Const defaultOpenApiSpecYamlChunkOptions: ChunkOptions

Defined in

mongodb-rag-ingest/src/embed/chunkOpenApiSpecYaml.ts:13

Functions

chunkMd

chunkMd(page, options?): Promise<ContentChunk[]>

A ChunkFunc is a function that takes a page and returns it in chunks.

Parameters

| Name | Type |
| :------ | :------ |
| page | Page |
| options? | Partial<ChunkOptions> |

Returns

Promise<ContentChunk[]>

Defined in

mongodb-rag-ingest/src/embed/chunkPage.ts:12


chunkOpenApiSpecYaml

chunkOpenApiSpecYaml(page, options?): Promise<ContentChunk[]>

A ChunkFunc is a function that takes a page and returns it in chunks.

Parameters

| Name | Type |
| :------ | :------ |
| page | Page |
| options? | Partial<ChunkOptions> |

Returns

Promise<ContentChunk[]>

Defined in

mongodb-rag-ingest/src/embed/chunkPage.ts:12


chunkPage

chunkPage(page, options?): Promise<ContentChunk[]>

Returns the chunks of a content page.

Parameters

| Name | Type |
| :------ | :------ |
| page | Page |
| options? | Partial<ChunkOptions> |

Returns

Promise<ContentChunk[]>

Defined in

mongodb-rag-ingest/src/embed/chunkPage.ts:12


makeChunkFrontMatterUpdater

makeChunkFrontMatterUpdater<T>(getMetadata): ChunkTransformer

Create a function that adds or updates front matter metadata to the chunk text.

Type parameters

| Name | Type |
| :------ | :------ |
| T | extends Record<string, unknown> = Record<string, unknown> |

Parameters

| Name | Type |
| :------ | :------ |
| getMetadata | ChunkMetadataGetter<T> |

Returns

ChunkTransformer

Defined in

mongodb-rag-ingest/src/embed/ChunkTransformer.ts:36
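
The sketch below shows one way a factory like this could turn a metadata getter into a transformer that adds or updates a front matter block. It is an illustration under stated assumptions, not the module's implementation: the front matter format here is a hand-rolled `key: <JSON value>` list between `---` lines, and parseFrontMatter, makeFrontMatterUpdaterSketch, and the Simple* types are hypothetical.

```typescript
// Simplified stand-ins; not the library's types.
type SimplePage = { url: string; title?: string };
type SimpleChunk = { url: string; text: string };
type Metadata = Record<string, unknown>;
type SimpleMetadataGetter = (args: {
  chunk: SimpleChunk;
  metadata?: Metadata;
  page: SimplePage;
  text: string;
}) => Promise<Metadata>;
type SimpleChunkTransformer = (
  chunk: SimpleChunk,
  details: { page: SimplePage }
) => Promise<SimpleChunk>;

// Split "---\nkey: <json>\n---\n\nbody" into metadata and body.
// Only understands the format written by the updater below.
function parseFrontMatter(text: string): { metadata?: Metadata; body: string } {
  const match = /^---\n([\s\S]*?)\n---\n\n?/.exec(text);
  if (!match) return { body: text };
  const metadata: Metadata = {};
  for (const line of match[1].split("\n")) {
    const separator = line.indexOf(": ");
    if (separator > 0) {
      metadata[line.slice(0, separator)] = JSON.parse(line.slice(separator + 2));
    }
  }
  return { metadata, body: text.slice(match[0].length) };
}

function makeFrontMatterUpdaterSketch(
  getMetadata: SimpleMetadataGetter
): SimpleChunkTransformer {
  return async (chunk, { page }) => {
    const { metadata: previous, body } = parseFrontMatter(chunk.text);
    const next = await getMetadata({ chunk, metadata: previous, page, text: body });
    // Keys the getter omits keep their previous values.
    const merged: Metadata = { ...previous, ...next };
    const lines = Object.keys(merged)
      .filter((key) => merged[key] !== undefined)
      .map((key) => `${key}: ${JSON.stringify(merged[key])}`);
    return { ...chunk, text: `---\n${lines.join("\n")}\n---\n\n${body}` };
  };
}
```

Because each updater parses any existing block before writing a new one, two updaters built this way can be applied in sequence without duplicating the front matter.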


standardChunkFrontMatterUpdater

standardChunkFrontMatterUpdater(chunk, details): Promise<Omit<ContentChunk, "tokenCount">>

Parameters

| Name | Type |
| :------ | :------ |
| chunk | Omit<ContentChunk, "tokenCount"> |
| details | Object |
| details.page | Page |

Returns

Promise<Omit<ContentChunk, "tokenCount">>

Defined in

mongodb-rag-ingest/src/embed/ChunkTransformer.ts:6


standardMetadataGetter

standardMetadataGetter(args): Promise<{ [k: string]: unknown; codeBlockLanguages?: string[] ; hasCodeBlock: boolean ; pageTitle?: string ; tags?: string[] }>

Forms common metadata based on the chunk text, including info about any code examples in the text.

Parameters

| Name | Type | Description |
| :------ | :------ | :------ |
| args | Object | - |
| args.chunk | Omit<ContentChunk, "tokenCount"> | - |
| args.metadata? | Object | Previous metadata, if any. Omitting this from the return value should not overwrite previous metadata. |
| args.metadata.codeBlockLanguages? | string[] | - |
| args.metadata.hasCodeBlock | boolean | - |
| args.metadata.pageTitle? | string | - |
| args.metadata.tags? | string[] | - |
| args.page | Page | - |
| args.text | string | The text of the chunk without metadata. |

Returns

Promise<{ [k: string]: unknown; codeBlockLanguages?: string[] ; hasCodeBlock: boolean ; pageTitle?: string ; tags?: string[] }>

Defined in

mongodb-rag-ingest/src/embed/ChunkTransformer.ts:15
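
The hasCodeBlock and codeBlockLanguages fields could be derived from chunk text roughly as in the sketch below, which scans for Markdown code fences line by line. This is an assumption about the general approach, not the function's actual implementation; CodeBlockInfo and describeCodeBlocks are hypothetical names.

```typescript
// Subset of the metadata fields listed above.
type CodeBlockInfo = { hasCodeBlock: boolean; codeBlockLanguages?: string[] };

// Three backticks, built with repeat() to keep this example easy to embed.
const FENCE = "`".repeat(3);

// Scan for fence lines; an opening fence may name a language, and the
// next fence line closes the block.
function describeCodeBlocks(text: string): CodeBlockInfo {
  const languages: string[] = [];
  let blockCount = 0;
  let insideBlock = false;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith(FENCE)) continue;
    if (!insideBlock) {
      blockCount += 1;
      const language = trimmed.slice(FENCE.length).trim().split(/\s+/)[0];
      if (language && languages.indexOf(language) === -1) languages.push(language);
    }
    insideBlock = !insideBlock;
  }
  const info: CodeBlockInfo = { hasCodeBlock: blockCount > 0 };
  if (languages.length > 0) info.codeBlockLanguages = languages;
  return info;
}
```

A chunk that ends inside an unclosed code block still counts as having a code block here, since the opening fence alone increments the count.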


updateEmbeddedContent

updateEmbeddedContent(«destructured»): Promise<void>

(Re-)embeds the pages in the page store that have changed since the given date and stores the resulting embedded content in the embedded content store.

Parameters

| Name | Type |
| :------ | :------ |
| «destructured» | Object |
| › chunkOptions? | Partial<ChunkOptions> |
| › embeddedContentStore | EmbeddedContentStore |
| › embedder | Embedder |
| › pageStore | PageStore |
| › since | Date |
| › sourceNames? | string[] |

Returns

Promise<void>

Defined in

mongodb-rag-ingest/src/embed/updateEmbeddedContent.ts:16
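
The overall flow described above (select changed pages, chunk them, embed each chunk, store the result) can be sketched with in-memory stand-ins. Everything here is hypothetical: a plain array replaces the page store, a Map keyed by page URL replaces the embedded content store, a function replaces the embedder, and the real PageStore, EmbeddedContentStore, and Embedder interfaces are not shown on this page. Treating "since" as inclusive is also an assumption.

```typescript
// Simplified stand-ins; not the library's types.
type SimplePage = { url: string; sourceName: string; body: string; updated: Date };
type SimpleEmbeddedContent = {
  url: string;
  sourceName: string;
  text: string;
  embedding: number[];
  updated: Date;
};
type EmbedText = (text: string) => Promise<number[]>;

async function updateEmbeddedContentSketch(args: {
  pages: SimplePage[]; // stands in for pageStore
  store: Map<string, SimpleEmbeddedContent[]>; // stands in for embeddedContentStore
  embed: EmbedText; // stands in for embedder
  since: Date;
  sourceNames?: string[];
}): Promise<void> {
  const { pages, store, embed, since, sourceNames } = args;
  // Select pages updated on or after `since`, optionally limited by source.
  const changed = pages.filter(
    (page) =>
      page.updated.getTime() >= since.getTime() &&
      (!sourceNames || sourceNames.indexOf(page.sourceName) !== -1)
  );
  for (const page of changed) {
    // Naive chunking: one chunk per non-empty paragraph.
    const texts = page.body
      .split(/\n\s*\n/)
      .map((paragraph) => paragraph.trim())
      .filter(Boolean);
    const embedded: SimpleEmbeddedContent[] = [];
    for (const text of texts) {
      const embedding = await embed(text);
      embedded.push({
        url: page.url,
        sourceName: page.sourceName,
        text,
        embedding,
        updated: new Date(),
      });
    }
    // Replace whatever was previously stored for this page.
    store.set(page.url, embedded);
  }
}
```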


updateEmbeddedContentForPage

updateEmbeddedContentForPage(«destructured»): Promise<void>

Parameters

| Name | Type |
| :------ | :------ |
| «destructured» | Object |
| › chunkOptions? | Partial<ChunkOptions> |
| › embedder | Embedder |
| › page | PersistedPage |
| › store | EmbeddedContentStore |

Returns

Promise<void>

Defined in

mongodb-rag-ingest/src/embed/updateEmbeddedContent.ts:77