
Kingfisher Architecture

This document focuses on the runtime architecture of Kingfisher as implemented in this repository today.

It shows:

  • a high-level component map of the main crates, modules, command paths, and outputs
  • the execution flow for kingfisher scan

Component Map

flowchart LR
    User[User or CI] --> CLI[kingfisher CLI] --> Main[Dispatch and runtime]

    subgraph Commands[Commands]
        ScanCmd[scan]
        ValidateCmd[validate]
        RevokeCmd[revoke]
        AccessMapCmd[access-map]
        ViewCmd[view]
        RulesCmd[rules]
    end

    Main --> ScanCmd
    Main --> ValidateCmd
    Main --> RevokeCmd
    Main --> AccessMapCmd
    Main --> ViewCmd
    Main --> RulesCmd

    subgraph Inputs[Inputs]
        FS[Files and dirs]
        Git[Git repos and history]
        Hosts[Git hosts]
        Docs[Jira Confluence Slack Teams]
        Remote[S3 GCS Docker]
    end

    subgraph Pipeline[Scan pipeline]
        Runner[Scan runner]
        Enumerate[Enumerate and fetch]
        Process[Process blobs]
        Match[Match secrets]
        Store[FindingsStore]
        Filter[Dedup baseline safelist]
        Validate[Validate]
        Map[Access map]
        Report[Report]
        Viewer[Viewer]
    end

    subgraph Crates[Reusable crates]
        Core[kingfisher-core]
        Rules[kingfisher-rules]
        ScannerLib[kingfisher-scanner]
    end

    subgraph Engines[Engines]
        Vector[vectorscan]
        ScanPool[scanner pool]
        Tree[tree-sitter]
        Liquid[Liquid templates]
    end

    APIs[Provider APIs]
    Output[Terminal and report files]
    Browser[Browser UI]

    ScanCmd --> Runner --> Enumerate --> Process --> Match --> Store --> Filter
    Filter --> Validate
    Filter --> Report
    Validate --> Map
    Validate --> Report
    Map --> Report
    Report --> Output
    Report --> Viewer --> Browser

    FS --> Enumerate
    Git --> Enumerate
    Hosts --> Enumerate
    Docs --> Enumerate
    Remote --> Enumerate

    Core --> Process
    Core --> Match
    Rules --> Match
    ScannerLib --> Match
    ScannerLib --> Validate

    Match --> Vector --> ScanPool
    Match --> Tree
    Validate --> Liquid
    Validate --> APIs

    ValidateCmd --> Liquid
    ValidateCmd --> APIs
    RevokeCmd --> Liquid
    RevokeCmd --> APIs
    AccessMapCmd --> APIs
    ViewCmd --> Viewer
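The scan path through the diagram can be read as a chain of stages, each handing its output to the next. Below is a minimal, std-only Rust sketch of that chain. It is not Kingfisher's implementation. The `Blob` and `Finding` types, the toy `tok_` rule, and every function name are hypothetical. A literal-prefix check stands in for vectorscan and tree-sitter, and the validation, access-map, and viewer stages are left out.

```rust
use std::collections::HashSet;

// Hypothetical unit of scannable content produced by enumeration.
struct Blob {
    path: String,
    bytes: Vec<u8>,
}

// Hypothetical finding record produced by matching.
#[derive(Debug)]
struct Finding {
    rule_id: String,
    path: String,
    secret: String,
}

// Enumerate and fetch: inputs are already-loaded (path, contents) pairs here.
fn enumerate(inputs: &[(&str, &str)]) -> Vec<Blob> {
    inputs
        .iter()
        .map(|(path, contents)| Blob { path: path.to_string(), bytes: contents.as_bytes().to_vec() })
        .collect()
}

// Process blobs: keep only blobs that decode as UTF-8 text (a simplification).
fn process(blobs: Vec<Blob>) -> Vec<(String, String)> {
    blobs
        .into_iter()
        .filter_map(|Blob { path, bytes }| String::from_utf8(bytes).ok().map(|text| (path, text)))
        .collect()
}

// Hypothetical toy rule standing in for vectorscan plus verification:
// "tok_" followed by at least 8 ASCII alphanumerics.
fn is_toy_token(token: &str) -> bool {
    token
        .strip_prefix("tok_")
        .map_or(false, |rest| rest.len() >= 8 && rest.chars().all(|c| c.is_ascii_alphanumeric()))
}

// Match secrets: split on whitespace and apply the toy rule to each token.
fn match_secrets(texts: &[(String, String)]) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (path, text) in texts {
        for token in text.split_whitespace().filter(|t| is_toy_token(t)) {
            findings.push(Finding {
                rule_id: "toy.token".to_string(),
                path: path.clone(),
                secret: token.to_string(),
            });
        }
    }
    findings
}

// Store and filter: drop safelisted secrets, then keep only the first
// finding for each (rule, secret) pair.
fn store_and_filter(findings: Vec<Finding>, safelist: &HashSet<&str>) -> Vec<Finding> {
    let mut seen = HashSet::new();
    findings
        .into_iter()
        .filter(|f| !safelist.contains(f.secret.as_str()))
        .filter(|f| seen.insert((f.rule_id.clone(), f.secret.clone())))
        .collect()
}

// Report: one line per surviving finding.
fn report(findings: &[Finding]) -> Vec<String> {
    findings.iter().map(|f| format!("{} {} {}", f.rule_id, f.path, f.secret)).collect()
}

fn run_scan(inputs: &[(&str, &str)], safelist: &HashSet<&str>) -> Vec<String> {
    let texts = process(enumerate(inputs));
    let kept = store_and_filter(match_secrets(&texts), safelist);
    report(&kept)
}

fn main() {
    let inputs = [
        ("a.env", "key: tok_abc12345"),
        ("b.env", "dup tok_abc12345"),
        ("c.txt", "tok_short tok_SAFE0000"),
    ];
    let safelist: HashSet<&str> = ["tok_SAFE0000"].iter().copied().collect();
    for line in run_scan(&inputs, &safelist) {
        println!("{line}");
    }
}
```

Running it prints the single line `toy.token a.env tok_abc12345`. The repeat in `b.env` is deduplicated, `tok_short` fails the rule, and the safelisted token in `c.txt` is filtered out.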

What Lives Where

  • src/main.rs: top-level command dispatch and routing, Tokio runtime setup, allocator selection (mimalloc, jemalloc, or the system allocator), and update checks.
  • src/scanner/runner.rs: the orchestration hub for scan. It handles repository enumeration, clone streaming, artifact fetching, validation setup, sequential or parallel scan execution (parallel mode is used when there are more than 10 Git repositories to scan), reporting, and summary generation.
  • src/scanner/*: input enumeration (enumerate.rs), repository handling and artifact fetching (repos.rs), blob processing (processing.rs), validation coordination (validation.rs), scan summaries (summary.rs), Docker image scanning (docker.rs), and utilities (util.rs).
  • src/matcher/*: the main detection engine (mod.rs), including vectorscan callbacks, regex helpers, Base64 discovery (base64_decode.rs), capture group handling (captures.rs), dedup support (dedup.rs), filtering (filter.rs), and finding fingerprinting (fingerprint.rs).
  • src/parser.rs: tree-sitter integration for language-aware parsing, supporting 17+ languages (Bash, C, C#, C++, CSS, Go, HTML, Java, JavaScript, PHP, Python, Ruby, Rust, TOML, TypeScript, YAML, and regex).
  • src/scanner_pool.rs: thread-local vectorscan BlockScanner pool, providing safe reuse of compiled pattern databases across scan threads.
  • src/reporter.rs and src/reporter/*: report rendering for pretty, JSON, BSON, TOON, SARIF, and HTML outputs, plus the data model used by the viewer.
  • src/direct_validate.rs: direct validation of a known secret without going through pattern matching. Supports HTTP, AWS, Azure, GCP, JDBC, MongoDB, MySQL, PostgreSQL, JWT, and Coinbase validators, with Liquid template integration for custom validation logic.
  • src/direct_revoke.rs: direct revocation of a known secret without going through the scan pipeline. Uses Liquid templates for revocation configurations and supports multi-step HTTP revocation flows.
  • src/access_map.rs and src/access_map/*: standalone blast-radius mapping with 24 provider implementations, including AWS, Azure, GCP, GitHub, GitLab, Slack, Bitbucket, Gitea, Hugging Face, Buildkite, Anthropic, and OpenAI.
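The parallel-mode threshold in src/scanner/runner.rs can be illustrated with the sketch below. It assumes a simple policy: more than 10 repositories means the list is split across worker threads. It uses std scoped threads. `PARALLEL_REPO_THRESHOLD`, `scan_repo`, and the chunk-per-core strategy are hypothetical and are not taken from runner.rs.

```rust
use std::thread;

// Hypothetical constant mirroring the "more than 10 Git repositories" rule.
const PARALLEL_REPO_THRESHOLD: usize = 10;

#[derive(Debug, PartialEq, Eq)]
enum ScanMode {
    Sequential,
    Parallel,
}

fn choose_mode(repo_count: usize) -> ScanMode {
    if repo_count > PARALLEL_REPO_THRESHOLD {
        ScanMode::Parallel
    } else {
        ScanMode::Sequential
    }
}

// Hypothetical per-repository scan; returns a placeholder finding count.
fn scan_repo(name: &str) -> usize {
    name.len() % 3
}

fn scan_all(repos: &[String]) -> usize {
    match choose_mode(repos.len()) {
        ScanMode::Sequential => repos.iter().map(|r| scan_repo(r)).sum::<usize>(),
        ScanMode::Parallel => {
            // Split the repo list into one contiguous chunk per available core.
            let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
            let chunk_size = (repos.len() + workers - 1) / workers;
            thread::scope(|s| {
                let handles: Vec<_> = repos
                    .chunks(chunk_size)
                    .map(|chunk| s.spawn(move || chunk.iter().map(|r| scan_repo(r)).sum::<usize>()))
                    .collect();
                handles
                    .into_iter()
                    .map(|h| h.join().expect("scan worker panicked"))
                    .sum::<usize>()
            })
        }
    }
}

fn main() {
    let repos: Vec<String> = (0..25).map(|i| format!("repo-{i}")).collect();
    println!("{:?} for {} repos", choose_mode(repos.len()), repos.len());
    println!("total findings: {}", scan_all(&repos));
}
```

Both paths return the same total. The parallel path only changes how the work is distributed. Keeping small scans sequential avoids paying thread start-up cost when there is little to parallelize.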
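The src/scanner_pool.rs entry describes thread-local reuse of vectorscan scanners. A common shape for that pattern is sketched below under stated assumptions. The compiled database is immutable and shared through an `Arc`, while each thread lazily builds its own scanner with private, mutable scratch space. `Database`, `BlockScanner`, `with_scanner`, and `scan_on_threads` are hypothetical std-only stand-ins, and they do not reproduce the API of the vectorscan bindings.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for a compiled, read-only pattern database.
struct Database {
    patterns: Vec<&'static str>,
}

// Hypothetical per-thread scanner: shares the database, owns mutable scratch space.
struct BlockScanner {
    db: Arc<Database>,
    scratch: Vec<usize>,
}

// Counts how many scanners have been constructed (for demonstration only).
static SCANNERS_BUILT: AtomicUsize = AtomicUsize::new(0);

impl BlockScanner {
    fn new(db: Arc<Database>) -> Self {
        SCANNERS_BUILT.fetch_add(1, Ordering::SeqCst);
        BlockScanner { db, scratch: Vec::new() }
    }

    // Returns the indices of patterns that occur in `haystack`, reusing `scratch`.
    fn scan(&mut self, haystack: &str) -> Vec<usize> {
        self.scratch.clear();
        for (i, p) in self.db.patterns.iter().enumerate() {
            if haystack.contains(*p) {
                self.scratch.push(i);
            }
        }
        self.scratch.clone()
    }
}

thread_local! {
    // One lazily built scanner per thread. Assumes a single database per process:
    // the first `db` a thread sees is the one its scanner keeps.
    static SCANNER: RefCell<Option<BlockScanner>> = RefCell::new(None);
}

// Runs `f` with this thread's scanner, building it on first use.
// Calling `with_scanner` again from inside `f` would panic (RefCell double borrow).
fn with_scanner<R>(db: &Arc<Database>, f: impl FnOnce(&mut BlockScanner) -> R) -> R {
    SCANNER.with(|cell| {
        let mut slot = cell.borrow_mut();
        let scanner = slot.get_or_insert_with(|| BlockScanner::new(Arc::clone(db)));
        f(scanner)
    })
}

// Runs `calls` scans on each of `threads` fresh threads; returns all results and
// how many scanners were built in the meantime.
fn scan_on_threads(db: &Arc<Database>, threads: usize, calls: usize, text: &'static str) -> (Vec<Vec<usize>>, usize) {
    let before = SCANNERS_BUILT.load(Ordering::SeqCst);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let db = Arc::clone(db);
            thread::spawn(move || (0..calls).map(|_| with_scanner(&db, |s| s.scan(text))).collect::<Vec<_>>())
        })
        .collect();
    let results = handles.into_iter().flat_map(|h| h.join().expect("worker panicked")).collect();
    (results, SCANNERS_BUILT.load(Ordering::SeqCst) - before)
}

fn main() {
    let db = Arc::new(Database { patterns: vec!["AKIA", "ghp_", "xoxb-"] });
    let (results, built) = scan_on_threads(&db, 4, 3, "token=ghp_x and xoxb-y");
    println!("{} scans, {} scanners built", results.len(), built);
}
```

With four worker threads making three scans each, the counter shows four scanners built (one per thread) rather than twelve. The expensive compiled database exists only once.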

Notes And Boundaries

  • The main CLI scan path is implemented primarily in the application modules under src/, not in kingfisher-scanner.
  • kingfisher-scanner is still important: it provides the embeddable scanner API plus shared validation and primitive functionality reused by the application.
  • Direct validate, revoke, and standalone access-map are sibling command paths. They are not downstream stages of FindingsStore.
  • Reporting is downstream from the datastore, which lets Kingfisher emit multiple output formats and drive the local viewer from the same finding set.
  • The matching layer is intentionally hybrid: vectorscan provides high-throughput SIMD-accelerated pattern detection, while regex helpers, Base64 support, and tree-sitter verification improve accuracy and reduce false positives.
  • FindingsStore uses an in-memory store with a Bloom filter for deduplication, replacing the earlier SQLite-based storage model.
  • Validation and revocation templates are rendered via Liquid, allowing rule authors to define HTTP request sequences, variable extraction, and multi-step flows in YAML without touching Rust code.
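FindingsStore is described as an in-memory store that uses a Bloom filter for deduplication. The sketch below shows one way such a check can be arranged. It assumes, without confirmation from the source, that a Bloom-filter miss is trusted as "definitely new," while a hit is confirmed against an exact set, so a false positive never drops a distinct finding. The `Bloom` and `DedupStore` types, the sizes, and the double-hashing scheme over `DefaultHasher` are illustrative choices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Minimal Bloom filter: `k` bit positions per item, derived by double hashing.
struct Bloom {
    bits: Vec<u64>,
    m: u64, // total number of bits
    k: u64, // number of hash positions per item
}

impl Bloom {
    fn new(m_bits: usize, k: u64) -> Self {
        let words = (m_bits + 63) / 64;
        Bloom { bits: vec![0; words], m: (words * 64) as u64, k }
    }

    // Two base hashes h1, h2; position i is (h1 + i * h2) mod m.
    fn positions<T: Hash + ?Sized>(&self, item: &T) -> Vec<usize> {
        let mut s1 = DefaultHasher::new();
        item.hash(&mut s1);
        let h1 = s1.finish();
        let mut s2 = DefaultHasher::new();
        h1.hash(&mut s2);
        let h2 = s2.finish() | 1; // force a nonzero step
        (0..self.k)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % self.m) as usize)
            .collect()
    }

    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        for p in self.positions(item) {
            self.bits[p / 64] |= 1u64 << (p % 64);
        }
    }

    // False means "definitely not inserted"; true means "possibly inserted".
    fn maybe_contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        self.positions(item)
            .into_iter()
            .all(|p| self.bits[p / 64] & (1u64 << (p % 64)) != 0)
    }
}

// Hypothetical dedup layer: a Bloom miss skips the exact lookup entirely;
// a Bloom hit is confirmed against the exact set before anything is dropped.
struct DedupStore {
    bloom: Bloom,
    exact: HashSet<String>,
    exact_lookups: usize,
}

impl DedupStore {
    fn new() -> Self {
        DedupStore { bloom: Bloom::new(1 << 16, 4), exact: HashSet::new(), exact_lookups: 0 }
    }

    // Returns true if `fingerprint` is new (and records it), false if it is a duplicate.
    fn insert(&mut self, fingerprint: &str) -> bool {
        if self.bloom.maybe_contains(fingerprint) {
            self.exact_lookups += 1;
            if self.exact.contains(fingerprint) {
                return false;
            }
        }
        self.bloom.insert(fingerprint);
        self.exact.insert(fingerprint.to_string());
        true
    }
}

fn main() {
    let mut store = DedupStore::new();
    for fp in ["aws|AKIA1", "github|ghp_2", "aws|AKIA1"] {
        println!("{fp}: new = {}", store.insert(fp));
    }
}
```

In this arrangement, the Bloom filter only saves exact-set lookups for fingerprints that are new. The source does not say whether Kingfisher also keeps an exact set or how it treats Bloom false positives.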
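The hybrid matching layer described in the notes pairs a fast, broad first pass with a slower, exact second pass. A minimal std-only sketch of that two-stage shape follows. `str::match_indices` on a literal anchor stands in for vectorscan, and a hand-written shape and boundary check stands in for regex and tree-sitter verification. The AWS access key ID rule (`AKIA` plus 16 uppercase letters or digits) and the function names are illustrative assumptions, not Kingfisher's rule definitions.

```rust
// Stage 1: cheap, broad scan for every offset of a literal anchor (vectorscan stand-in).
fn prefilter(haystack: &str, anchor: &str) -> Vec<usize> {
    haystack.match_indices(anchor).map(|(i, _)| i).collect()
}

// Stage 2: precise check of the full token shape and its boundaries at `start`.
fn verify_aws_key_id(haystack: &str, start: usize) -> Option<&str> {
    let bytes = haystack.as_bytes();
    // `get` returns None if the 20-byte window runs past the end or splits a UTF-8 char.
    let candidate = haystack.get(start..start + 20)?;
    let body_ok = candidate.bytes().skip(4).all(|b| b.is_ascii_uppercase() || b.is_ascii_digit());
    let left_ok = start == 0 || !bytes[start - 1].is_ascii_alphanumeric();
    let right_ok = bytes.get(start + 20).map_or(true, |b| !b.is_ascii_alphanumeric());
    if body_ok && left_ok && right_ok {
        Some(candidate)
    } else {
        None
    }
}

// Only offsets that survive the cheap prefilter pay for the precise check.
fn find_aws_key_ids(haystack: &str) -> Vec<&str> {
    prefilter(haystack, "AKIA")
        .into_iter()
        .filter_map(|start| verify_aws_key_id(haystack, start))
        .collect()
}

fn main() {
    let text = "key=AKIAIOSFODNN7EXAMPLE; XAKIAIOSFODNN7EXAMPLE AKIAlower";
    println!("{:?}", find_aws_key_ids(text));
}
```

The prefilter reports three candidate offsets, but only the first survives and the program prints `["AKIAIOSFODNN7EXAMPLE"]`. The second candidate is glued to a preceding letter, and the third is too short to form a full key.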