Namespace: Chunk
Type Aliases
ChunkFunc
Ƭ ChunkFunc: (page: Page, options?: Partial<ChunkOptions>) => Promise<ContentChunk[]>
Type declaration
▸ (page, options?): Promise<ContentChunk[]>
A ChunkFunc is a function that takes a page and returns it in chunks.
Parameters
Name | Type |
---|---|
page | Page |
options? | Partial<ChunkOptions> |
Returns
Promise<ContentChunk[]>
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:8
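As a rough illustration of the ChunkFunc shape, the sketch below implements a word-window chunker. The `Page`, `ContentChunk`, and `ChunkOptions` declarations are simplified stand-ins written for this example (the library's real types carry more fields), and `splitWords` and `chunkByWords` are hypothetical names, not library functions. Words stand in for tokens here, which is an assumption made to keep the example self-contained.

```typescript
// Simplified stand-ins for the library's types (assumptions for illustration).
interface Page { url: string; body: string }
interface ContentChunk { url: string; text: string; tokenCount: number }
interface ChunkOptions { maxChunkSize: number; chunkOverlap: number }

type ChunkFunc = (
  page: Page,
  options?: Partial<ChunkOptions>
) => Promise<ContentChunk[]>;

// Split words into windows of at most maxChunkSize words, where consecutive
// windows share chunkOverlap words.
function splitWords(
  words: string[],
  maxChunkSize: number,
  chunkOverlap: number
): string[][] {
  const step = Math.max(1, maxChunkSize - chunkOverlap);
  const windows: string[][] = [];
  for (let start = 0; start < words.length; start += step) {
    windows.push(words.slice(start, start + maxChunkSize));
    // Stop once the current window reaches the end of the input.
    if (start + maxChunkSize >= words.length) break;
  }
  return windows;
}

// Hypothetical ChunkFunc that treats each whitespace-separated word as a token.
const chunkByWords: ChunkFunc = async (page, options) => {
  const maxChunkSize = options?.maxChunkSize ?? 100;
  const chunkOverlap = options?.chunkOverlap ?? 0;
  const words = page.body.split(/\s+/).filter((word) => word.length > 0);
  return splitWords(words, maxChunkSize, chunkOverlap).map((win) => ({
    url: page.url,
    text: win.join(" "),
    tokenCount: win.length,
  }));
};
```

Keeping the windowing logic in a synchronous helper makes it easy to test separately from the async wrapper that satisfies the ChunkFunc signature.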
ChunkMetadataGetter
Ƭ ChunkMetadataGetter<T>: (args: { chunk: Omit<ContentChunk, "tokenCount">; metadata?: T; page: Page; text: string }) => Promise<T>
Type parameters
Name | Type |
---|---|
T | extends Record<string, unknown> = Record<string, unknown> |
Type declaration
▸ (args): Promise<T>
Parameters
Name | Type | Description |
---|---|---|
args | Object | - |
args.chunk | Omit<ContentChunk, "tokenCount"> | - |
args.metadata? | T | Previous metadata, if any. Omitting this from the return value should not overwrite previous metadata. |
args.page | Page | - |
args.text | string | The text of the chunk without metadata. |
Returns
Promise<T>
Defined in
packages/mongodb-rag-core/build/chunk/ChunkTransformer.d.ts:6
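A minimal sketch of a getter matching this signature might look like the following. The `Page` and `ContentChunk` declarations are simplified stand-ins for the library's types, and `SizeMetadata`, `describeText`, and `sizeMetadataGetter` are hypothetical names introduced only for this example.

```typescript
// Simplified stand-ins for the library's types (assumptions for illustration).
interface Page { url: string; body: string }
interface ContentChunk { url: string; text: string; tokenCount: number }

type ChunkMetadataGetter<
  T extends Record<string, unknown> = Record<string, unknown>
> = (args: {
  chunk: Omit<ContentChunk, "tokenCount">;
  metadata?: T;
  page: Page;
  text: string;
}) => Promise<T>;

// Hypothetical metadata shape, not part of the library.
type SizeMetadata = { charCount: number; pageUrl: string };

// Pure helper that derives the metadata from the chunk text and page URL.
function describeText(text: string, pageUrl: string): SizeMetadata {
  return { charCount: text.length, pageUrl };
}

// The getter receives the chunk text without front matter and returns only
// the keys it wants to set.
const sizeMetadataGetter: ChunkMetadataGetter<SizeMetadata> = async ({
  page,
  text,
}) => describeText(text, page.url);
```

`SizeMetadata` is declared as a type alias rather than an interface so that it satisfies the `Record<string, unknown>` constraint on `T`.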
ChunkOptions
Ƭ ChunkOptions: Object
Options for converting a Page into ContentChunk[].
Type declaration
Name | Type | Description |
---|---|---|
chunkOverlap | number | Number of tokens to overlap between chunks. If this is 0, chunks will not overlap. If this is greater than 0, chunks will overlap by this number of tokens. |
maxChunkSize | number | Maximum chunk size before the transform function is applied to it. If a Page has more tokens than this number, it is split into smaller chunks. |
minChunkSize? | number | Minimum chunk size before the transform function is applied to it. If a chunk has fewer tokens than this number, it is discarded before ingestion. You can use this as a vector search optimization to avoid including chunks with very few tokens and thus very little semantic meaning. Example: you might set this to 15 to avoid including chunks that are just a few characters or words. For instance, you likely would not want to include a chunk that is just the closing fence of a code block, which occurs fairly often when chunking with the Langchain RecursiveCharacterTextSplitter: chunk 1 holds the opening fence tagged py, the line foo = "bar", and more semantically relevant Python code, while chunk 2 holds only the closing fence. |
tokenizer | SomeTokenizer | Tokenizer to use to count number of tokens in text. |
transform? | ChunkTransformer | Transform to be applied to each chunk as it is produced. Provides the opportunity to prepend metadata, etc. |
yamlChunkSize? | number | If provided, this will override the maxChunkSize for openapi-yaml pages. This is useful because openapi-yaml pages tend to be very large, and we want to split them into smaller chunks than the default maxChunkSize. |
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:12
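The effect of minChunkSize can be sketched as a filter over chunk texts. This is an illustration of the documented behavior (chunks with fewer tokens than the minimum are discarded), not the library's implementation. `wordTokenizer` and `dropSmallChunks` are hypothetical names, and counting tokens as the length of the `bpe` array is an assumption.

```typescript
type SomeTokenizer = {
  encode: (text: string) => { bpe: number[]; text: string[] };
};

// Toy tokenizer for this example only: one token per whitespace-separated word.
const wordTokenizer: SomeTokenizer = {
  encode(text) {
    const words = text.split(/\s+/).filter((word) => word.length > 0);
    return { bpe: words.map((_, index) => index), text: words };
  },
};

// Keep only the chunk texts with at least minChunkSize tokens.
// When minChunkSize is not set, every chunk is kept.
function dropSmallChunks(
  texts: string[],
  tokenizer: SomeTokenizer,
  minChunkSize?: number
): string[] {
  const min = minChunkSize;
  if (min === undefined) return texts;
  return texts.filter((text) => tokenizer.encode(text).bpe.length >= min);
}
```

With a minimum of 2, a chunk that consists only of a closing code fence counts as one token under this toy tokenizer and is dropped, while a chunk of real code is kept.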
ChunkTransformer
Ƭ ChunkTransformer: (chunk: Omit<ContentChunk, "tokenCount">, details: { page: Page }) => Promise<Omit<ContentChunk, "tokenCount">>
Type declaration
▸ (chunk, details): Promise<Omit<ContentChunk, "tokenCount">>
Parameters
Name | Type |
---|---|
chunk | Omit<ContentChunk, "tokenCount"> |
details | Object |
details.page | Page |
Returns
Promise<Omit<ContentChunk, "tokenCount">>
Defined in
packages/mongodb-rag-core/build/chunk/ChunkTransformer.d.ts:3
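For illustration, a transformer with this signature could prepend the page title to each chunk. The `Page` and `ContentChunk` declarations below are simplified stand-ins (the optional `title` field is an assumption), and `withTitle` and `prependTitle` are hypothetical names.

```typescript
// Simplified stand-ins for the library's types (assumptions for illustration).
interface Page { url: string; title?: string }
interface ContentChunk { url: string; text: string; tokenCount: number }

type ChunkTransformer = (
  chunk: Omit<ContentChunk, "tokenCount">,
  details: { page: Page }
) => Promise<Omit<ContentChunk, "tokenCount">>;

// Prefix the text with a title line when a title is available.
function withTitle(text: string, title?: string): string {
  return title ? `Title: ${title}\n\n${text}` : text;
}

// Hypothetical transformer: returns a copy of the chunk with the new text.
const prependTitle: ChunkTransformer = async (chunk, { page }) => ({
  ...chunk,
  text: withTitle(chunk.text, page.title),
});
```

The transformer returns a new object rather than mutating its input, so the original chunk stays unchanged.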
ContentChunk
Ƭ ContentChunk: Omit<EmbeddedContent, "embeddings" | "updated">
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:4
SomeTokenizer
Ƭ SomeTokenizer: Object
Type declaration
Name | Type |
---|---|
encode | (text: string) => { bpe: number[]; text: string[] } |
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:66
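Any object with a matching `encode` method satisfies this type. The sketch below builds a toy word-level tokenizer to show the expected shape. It is not a real byte-pair encoder, and `makeWordTokenizer` and `countTokens` are hypothetical helpers. Treating the length of `bpe` as the token count is an assumption about how callers use the result.

```typescript
type SomeTokenizer = {
  encode: (text: string) => { bpe: number[]; text: string[] };
};

// Toy tokenizer: each distinct whitespace-separated word gets the next free
// integer ID the first time it is seen.
function makeWordTokenizer(): SomeTokenizer {
  const ids = new Map<string, number>();
  let nextId = 0;
  return {
    encode(text) {
      const words = text.split(/\s+/).filter((word) => word.length > 0);
      const bpe = words.map((word) => {
        let id = ids.get(word);
        if (id === undefined) {
          id = nextId++;
          ids.set(word, id);
        }
        return id;
      });
      return { bpe, text: words };
    },
  };
}

// Count tokens as the number of encoded IDs.
function countTokens(tokenizer: SomeTokenizer, text: string): number {
  return tokenizer.encode(text).bpe.length;
}
```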
Variables
defaultOpenApiSpecYamlChunkOptions
• Const defaultOpenApiSpecYamlChunkOptions: ChunkOptions
Defined in
packages/mongodb-rag-core/build/chunk/chunkOpenApiSpecYaml.d.ts:2
Functions
chunkCode
▸ chunkCode(page, options?): Promise<ContentChunk[]>
A ChunkFunc is a function that takes a page and returns it in chunks.
Parameters
Name | Type |
---|---|
page | Page |
options? | Partial<ChunkOptions> |
Returns
Promise<ContentChunk[]>
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:8
chunkMd
▸ chunkMd(page, options?): Promise<ContentChunk[]>
A ChunkFunc is a function that takes a page and returns it in chunks.
Parameters
Name | Type |
---|---|
page | Page |
options? | Partial<ChunkOptions> |
Returns
Promise<ContentChunk[]>
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:8
chunkOpenApiSpecYaml
▸ chunkOpenApiSpecYaml(page, options?): Promise<ContentChunk[]>
A ChunkFunc is a function that takes a page and returns it in chunks.
Parameters
Name | Type |
---|---|
page | Page |
options? | Partial<ChunkOptions> |
Returns
Promise<ContentChunk[]>
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:8
chunkPage
▸ chunkPage(page, options?): Promise<ContentChunk[]>
Returns a content page split into chunks.
Parameters
Name | Type |
---|---|
page | Page |
options? | Partial<ChunkOptions> |
Returns
Promise<ContentChunk[]>
Defined in
packages/mongodb-rag-core/build/chunk/chunkPage.d.ts:8
isSupportedLanguage
▸ isSupportedLanguage(str): str is LangchainSupportedCodePageFormat
Parameters
Name | Type |
---|---|
str | "txt" \| "md" \| "mdx" \| "restructuredtext" \| "csv" \| "json" \| "yaml" \| "toml" \| "xml" \| "openapi-yaml" \| "openapi-json" \| "graphql" \| "c" \| "cpp" \| "csharp" \| "go" \| "html" \| "java" \| "javascript" \| "kotlin" \| "latex" \| "objective-c" \| "php" \| "python" \| "ruby" \| "rust" \| "scala" \| "shell" \| "swift" \| "typescript" |
Returns
str is LangchainSupportedCodePageFormat
Defined in
packages/mongodb-rag-core/build/chunk/chunkCode.d.ts:7
makeChunkFrontMatterUpdater
▸ makeChunkFrontMatterUpdater<T>(getMetadata): ChunkTransformer
Create a function that adds front matter metadata to the chunk text or updates the existing front matter.
Type parameters
Name | Type |
---|---|
T | extends Record<string, unknown> = Record<string, unknown> |
Parameters
Name | Type |
---|---|
getMetadata | ChunkMetadataGetter<T> |
Returns
ChunkTransformer
Defined in
packages/mongodb-rag-core/build/chunk/ChunkTransformer.d.ts:23
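To make the factory pattern concrete, here is a deliberately simplified sketch of how a metadata getter can be turned into a transformer. It is not the library's implementation: it only prepends a new front matter block and does not parse or merge front matter that is already present. All types are simplified stand-ins, `renderFrontMatter` and `makeSimpleFrontMatterUpdater` are hypothetical names, and serializing each value with `JSON.stringify` is an assumption chosen to avoid a YAML dependency.

```typescript
// Simplified stand-ins for the library's types (assumptions for illustration).
interface Page { url: string }
interface ContentChunk { url: string; text: string; tokenCount: number }
type Chunk = Omit<ContentChunk, "tokenCount">;

type ChunkTransformer = (chunk: Chunk, details: { page: Page }) => Promise<Chunk>;
type ChunkMetadataGetter<
  T extends Record<string, unknown> = Record<string, unknown>
> = (args: { chunk: Chunk; metadata?: T; page: Page; text: string }) => Promise<T>;

// Render metadata as a front matter block followed by the chunk text.
// Each value is written as JSON, which keeps this sketch dependency-free.
function renderFrontMatter(
  metadata: Record<string, unknown>,
  text: string
): string {
  const lines = Object.keys(metadata).map(
    (key) => `${key}: ${JSON.stringify(metadata[key])}`
  );
  return ["---", ...lines, "---", "", text].join("\n");
}

// Hypothetical factory: wraps a metadata getter in a ChunkTransformer.
function makeSimpleFrontMatterUpdater<T extends Record<string, unknown>>(
  getMetadata: ChunkMetadataGetter<T>
): ChunkTransformer {
  return async (chunk, { page }) => {
    const metadata = await getMetadata({ chunk, page, text: chunk.text });
    return { ...chunk, text: renderFrontMatter(metadata, chunk.text) };
  };
}
```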
pageFormatToLanguage
▸ pageFormatToLanguage(format): SupportedTextSplitterLanguage | undefined
Parameters
Name | Type |
---|---|
format | "txt" \| "md" \| "mdx" \| "restructuredtext" \| "csv" \| "json" \| "yaml" \| "toml" \| "xml" \| "openapi-yaml" \| "openapi-json" \| "graphql" \| "c" \| "cpp" \| "csharp" \| "go" \| "html" \| "java" \| "javascript" \| "kotlin" \| "latex" \| "objective-c" \| "php" \| "python" \| "ruby" \| "rust" \| "scala" \| "shell" \| "swift" \| "typescript" |
Returns
SupportedTextSplitterLanguage | undefined
Defined in
packages/mongodb-rag-core/build/chunk/chunkCode.d.ts:8
standardChunkFrontMatterUpdater
▸ standardChunkFrontMatterUpdater(chunk, details): Promise<Omit<ContentChunk, "tokenCount">>
Parameters
Name | Type |
---|---|
chunk | Omit<ContentChunk, "tokenCount"> |
details | Object |
details.page | Page |
Returns
Promise<Omit<ContentChunk, "tokenCount">>
Defined in
packages/mongodb-rag-core/build/chunk/ChunkTransformer.d.ts:3
standardMetadataGetter
▸ standardMetadataGetter(args): Promise<{ [k: string]: unknown; codeBlockLanguages?: string[]; hasCodeBlock: boolean; pageTitle?: string; tags?: string[] }>
Forms common metadata based on the chunk text, including info about any code examples in the text.
Parameters
Name | Type | Description |
---|---|---|
args | Object | - |
args.chunk | Omit<ContentChunk, "tokenCount"> | - |
args.metadata? | Object | Previous metadata, if any. Omitting this from the return value should not overwrite previous metadata. |
args.metadata.codeBlockLanguages? | string[] | - |
args.metadata.hasCodeBlock | boolean | - |
args.metadata.pageTitle? | string | - |
args.metadata.tags? | string[] | - |
args.page | Page | - |
args.text | string | The text of the chunk without metadata. |
Returns
Promise<{ [k: string]: unknown; codeBlockLanguages?: string[]; hasCodeBlock: boolean; pageTitle?: string; tags?: string[] }>
Defined in
packages/mongodb-rag-core/build/chunk/ChunkTransformer.d.ts:6
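The code-related fields in the return type could be derived from the chunk text along the lines below. This is a sketch under stated assumptions, not the library's actual detection logic: it only recognizes fences made of three backticks at the start of a trimmed line, and `CodeMetadata`, `describeCodeBlocks`, and `codeMetadataGetter` are hypothetical names.

```typescript
// Hypothetical subset of the metadata returned by standardMetadataGetter.
type CodeMetadata = { hasCodeBlock: boolean; codeBlockLanguages?: string[] };

const FENCE = "`".repeat(3);

// Scan the text line by line, toggling in and out of fenced code blocks.
// The info string after an opening fence is recorded as the block language.
function describeCodeBlocks(text: string): CodeMetadata {
  const languages: string[] = [];
  let inFence = false;
  let blockCount = 0;
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.indexOf(FENCE) !== 0) continue;
    if (!inFence) {
      blockCount++;
      const language = trimmed.slice(FENCE.length).trim();
      if (language.length > 0 && languages.indexOf(language) === -1) {
        languages.push(language);
      }
    }
    inFence = !inFence;
  }
  const result: CodeMetadata = { hasCodeBlock: blockCount > 0 };
  if (languages.length > 0) result.codeBlockLanguages = languages;
  return result;
}

// Async wrapper shaped like a metadata getter that only looks at the text.
const codeMetadataGetter = async (args: {
  text: string;
}): Promise<CodeMetadata> => describeCodeBlocks(args.text);
```

A chunk that contains only an unmatched closing fence is still reported as having a code block under this sketch, which is one reason to pair it with a minimum chunk size.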