Kingfisher Architecture¶

Name: Kingfisher
Author: MongoDB

This document focuses on the runtime architecture of Kingfisher as implemented in this repository today.

It shows:

a high-level component map of the main crates, modules, command paths, and outputs
the execution flow for kingfisher scan

Component Map¶

flowchart LR
    User[User or CI] --> CLI[kingfisher CLI] --> Main[Dispatch and runtime]

    subgraph Commands[Commands]
        ScanCmd[scan]
        ValidateCmd[validate]
        RevokeCmd[revoke]
        AccessMapCmd[access-map]
        ViewCmd[view]
        RulesCmd[rules]
    end

    Main --> ScanCmd
    Main --> ValidateCmd
    Main --> RevokeCmd
    Main --> AccessMapCmd
    Main --> ViewCmd
    Main --> RulesCmd

    subgraph Inputs[Inputs]
        FS[Files and dirs]
        Git[Git repos and history]
        Hosts[Git hosts]
        Docs[Jira Confluence Slack Teams]
        Remote[S3 GCS Docker]
    end

    subgraph Pipeline[Scan pipeline]
        Runner[Scan runner]
        Enumerate[Enumerate and fetch]
        Process[Process blobs]
        Match[Match secrets]
        Store[FindingsStore]
        Filter[Dedup baseline safelist]
        Validate[Validate]
        Map[Access map]
        Report[Report]
        Viewer[Viewer]
    end

    subgraph Crates[Reusable crates]
        Core[kingfisher-core]
        Rules[kingfisher-rules]
        ScannerLib[kingfisher-scanner]
    end

    subgraph Engines[Engines]
        Vector[vectorscan]
        ScanPool[scanner pool]
        Tree[tree-sitter]
        Liquid[Liquid templates]
    end

    APIs[Provider APIs]
    Output[Terminal and report files]
    Browser[Browser UI]

    ScanCmd --> Runner --> Enumerate --> Process --> Match --> Store --> Filter
    Filter --> Validate
    Filter --> Report
    Validate --> Map
    Validate --> Report
    Map --> Report
    Report --> Output
    Report --> Viewer --> Browser

    FS --> Enumerate
    Git --> Enumerate
    Hosts --> Enumerate
    Docs --> Enumerate
    Remote --> Enumerate

    Core --> Process
    Core --> Match
    Rules --> Match
    ScannerLib --> Match
    ScannerLib --> Validate

    Match --> Vector --> ScanPool
    Match --> Tree
    Validate --> Liquid
    Validate --> APIs

    ValidateCmd --> Liquid
    ValidateCmd --> APIs
    RevokeCmd --> Liquid
    RevokeCmd --> APIs
    AccessMapCmd --> APIs
    ViewCmd --> Viewer

What Lives Where¶

src/main.rs: top-level command dispatch, Tokio runtime setup, allocator selection (mimalloc/jemalloc/system), update checks, and command routing.
src/scanner/runner.rs: the orchestration hub for scan, including repo enumeration, clone streaming, artifact fetching, validation setup, sequential or parallel scan execution (threshold: >10 git repos triggers parallel mode), reporting, and summary generation.
src/scanner/*: input enumeration (enumerate.rs), repository handling and artifact fetching (repos.rs), blob processing (processing.rs), validation coordination (validation.rs), scan summaries (summary.rs), Docker image scanning (docker.rs), and utilities (util.rs).
src/matcher/*: the main detection engine (mod.rs), including vectorscan callbacks, regex helpers, Base64 discovery (base64_decode.rs), capture group handling (captures.rs), dedup support (dedup.rs), filtering (filter.rs), and finding fingerprinting (fingerprint.rs).
src/parser.rs: tree-sitter integration for language-aware parsing, supporting 17+ languages (Bash, C, C#, C++, CSS, Go, HTML, Java, JavaScript, PHP, Python, Ruby, Rust, TOML, TypeScript, YAML, and regex).
src/scanner_pool.rs: thread-local vectorscan BlockScanner pool, providing safe reuse of compiled pattern databases across scan threads.
src/reporter.rs and src/reporter/*: report rendering for pretty, JSON, BSON, TOON, SARIF, and HTML outputs, plus the data model used by the viewer.
src/direct_validate.rs: direct validation of a known secret without going through pattern matching. Supports HTTP, AWS, Azure, GCP, JDBC, MongoDB, MySQL, PostgreSQL, JWT, and Coinbase validators, with Liquid template integration for custom validation logic.
src/direct_revoke.rs: direct revocation of a known secret without going through the scan pipeline. Uses Liquid templates for revocation configurations and supports multi-step HTTP revocation flows.
src/access_map.rs and src/access_map/*: standalone blast-radius mapping with 24 provider implementations including AWS, Azure, GCP, GitHub, GitLab, Slack, Bitbucket, Gitea, Hugging Face, Buildkite, Anthropic, OpenAI, and more.

Notes And Boundaries¶

The main CLI scan path is implemented primarily in the application modules under src/, not in kingfisher-scanner.
kingfisher-scanner is still important: it provides the embeddable scanner API plus shared validation and primitive functionality reused by the application.
Direct validate, revoke, and standalone access-map are sibling command paths. They are not downstream stages of FindingsStore.
Reporting is downstream from the datastore, which lets Kingfisher emit multiple output formats and drive the local viewer from the same finding set.
The matching layer is intentionally hybrid: vectorscan provides high-throughput SIMD-accelerated pattern detection, while regex helpers, Base64 support, and tree-sitter verification improve accuracy and reduce false positives.
FindingsStore uses an in-memory store with a Bloom filter for deduplication, replacing the earlier SQLite-based storage model.
Validation and revocation templates are rendered via Liquid, allowing rule authors to define HTTP request sequences, variable extraction, and multi-step flows in YAML without touching Rust code.