Kingfisher Architecture¶
This document focuses on the runtime architecture of Kingfisher as implemented in this repository today.
It shows:
- a high-level component map of the main crates, modules, command paths, and outputs
- the execution flow for
kingfisher scan
Component Map¶
flowchart LR
User[User or CI] --> CLI[kingfisher CLI] --> Main[Dispatch and runtime]
subgraph Commands[Commands]
ScanCmd[scan]
ValidateCmd[validate]
RevokeCmd[revoke]
AccessMapCmd[access-map]
ViewCmd[view]
RulesCmd[rules]
end
Main --> ScanCmd
Main --> ValidateCmd
Main --> RevokeCmd
Main --> AccessMapCmd
Main --> ViewCmd
Main --> RulesCmd
subgraph Inputs[Inputs]
FS[Files and dirs]
Git[Git repos and history]
Hosts[Git hosts]
Docs[Jira Confluence Slack Teams]
Remote[S3 GCS Docker]
end
subgraph Pipeline[Scan pipeline]
Runner[Scan runner]
Enumerate[Enumerate and fetch]
Process[Process blobs]
Match[Match secrets]
Store[FindingsStore]
Filter[Dedup baseline safelist]
Validate[Validate]
Map[Access map]
Report[Report]
Viewer[Viewer]
end
subgraph Crates[Reusable crates]
Core[kingfisher-core]
Rules[kingfisher-rules]
ScannerLib[kingfisher-scanner]
end
subgraph Engines[Engines]
Vector[vectorscan]
ScanPool[scanner pool]
Tree[tree-sitter]
Liquid[Liquid templates]
end
APIs[Provider APIs]
Output[Terminal and report files]
Browser[Browser UI]
ScanCmd --> Runner --> Enumerate --> Process --> Match --> Store --> Filter
Filter --> Validate
Filter --> Report
Validate --> Map
Validate --> Report
Map --> Report
Report --> Output
Report --> Viewer --> Browser
FS --> Enumerate
Git --> Enumerate
Hosts --> Enumerate
Docs --> Enumerate
Remote --> Enumerate
Core --> Process
Core --> Match
Rules --> Match
ScannerLib --> Match
ScannerLib --> Validate
Match --> Vector --> ScanPool
Match --> Tree
Validate --> Liquid
Validate --> APIs
ValidateCmd --> Liquid
ValidateCmd --> APIs
RevokeCmd --> Liquid
RevokeCmd --> APIs
AccessMapCmd --> APIs
ViewCmd --> Viewer What Lives Where¶
src/main.rs: top-level command dispatch, Tokio runtime setup, allocator selection (mimalloc/jemalloc/system), update checks, and command routing.src/scanner/runner.rs: the orchestration hub forscan, including repo enumeration, clone streaming, artifact fetching, validation setup, sequential or parallel scan execution (threshold: >10 git repos triggers parallel mode), reporting, and summary generation.src/scanner/*: input enumeration (enumerate.rs), repository handling and artifact fetching (repos.rs), blob processing (processing.rs), validation coordination (validation.rs), scan summaries (summary.rs), Docker image scanning (docker.rs), and utilities (util.rs).src/matcher/*: the main detection engine (mod.rs), including vectorscan callbacks, regex helpers, Base64 discovery (base64_decode.rs), capture group handling (captures.rs), dedup support (dedup.rs), filtering (filter.rs), and finding fingerprinting (fingerprint.rs).src/parser.rs: tree-sitter integration for language-aware parsing, supporting 17+ languages (Bash, C, C#, C++, CSS, Go, HTML, Java, JavaScript, PHP, Python, Ruby, Rust, TOML, TypeScript, YAML, and regex).src/scanner_pool.rs: thread-local vectorscanBlockScannerpool, providing safe reuse of compiled pattern databases across scan threads.src/reporter.rsandsrc/reporter/*: report rendering for pretty, JSON, BSON, TOON, SARIF, and HTML outputs, plus the data model used by the viewer.src/direct_validate.rs: direct validation of a known secret without going through pattern matching. Supports HTTP, AWS, Azure, GCP, JDBC, MongoDB, MySQL, PostgreSQL, JWT, and Coinbase validators, with Liquid template integration for custom validation logic.src/direct_revoke.rs: direct revocation of a known secret without going through the scan pipeline. Uses Liquid templates for revocation configurations and supports multi-step HTTP revocation flows.src/access_map.rsandsrc/access_map/*: standalone blast-radius mapping with 24 provider implementations including AWS, Azure, GCP, GitHub, GitLab, Slack, Bitbucket, Gitea, Hugging Face, Buildkite, Anthropic, OpenAI, and more.
Notes And Boundaries¶
- The main CLI scan path is implemented primarily in the application modules under
src/, not inkingfisher-scanner. kingfisher-scanneris still important: it provides the embeddable scanner API plus shared validation and primitive functionality reused by the application.- Direct
validate,revoke, and standaloneaccess-mapare sibling command paths. They are not downstream stages ofFindingsStore. - Reporting is downstream from the datastore, which lets Kingfisher emit multiple output formats and drive the local viewer from the same finding set.
- The matching layer is intentionally hybrid: vectorscan provides high-throughput SIMD-accelerated pattern detection, while regex helpers, Base64 support, and tree-sitter verification improve accuracy and reduce false positives.
FindingsStoreuses an in-memory store with a Bloom filter for deduplication, replacing the earlier SQLite-based storage model.- Validation and revocation templates are rendered via Liquid, allowing rule authors to define HTTP request sequences, variable extraction, and multi-step flows in YAML without touching Rust code.