Kingfisher Library Crates

Kingfisher's functionality is available as a set of Rust library crates that can be embedded into other applications. This guide covers how to use these crates for secret scanning in your own Rust projects.

Crate Overview

Crate               Description
kingfisher-core     Core types: Blob, BlobId, Location, Origin, entropy calculation
kingfisher-rules    Rule definitions, YAML parsing, compiled rule database, builtin rules
kingfisher-scanner  High-level scanning API with Scanner and Finding types

Crate Relationships

flowchart LR
    App[Your Rust application]
    Core[kingfisher-core]
    Rules[kingfisher-rules]
    Scanner[kingfisher-scanner]

    App --> Core
    App --> Rules
    App --> Scanner
    Scanner --> Core
    Scanner --> Rules

Optional Features

The kingfisher-scanner crate supports optional validation features:

Feature          Description
validation       Core validation support (includes HTTP validation)
validation-http  HTTP-based validation for API tokens
validation-aws   AWS credential validation via STS GetCallerIdentity
validation-all   Enable all validation features

Quick Start

Add the crates to your Cargo.toml:

[dependencies]
kingfisher-core = { git = "https://github.com/mongodb/kingfisher" }
kingfisher-rules = { git = "https://github.com/mongodb/kingfisher" }
kingfisher-scanner = { git = "https://github.com/mongodb/kingfisher" }

Basic File Scanning

use std::sync::Arc;
use kingfisher_core::Blob;
use kingfisher_rules::{get_builtin_rules, RulesDatabase, Rule};
use kingfisher_scanner::Scanner;

fn main() -> anyhow::Result<()> {
    // 1. Load the builtin rules
    let rules = get_builtin_rules(None)?;

    // 2. Convert to Rule objects and compile into a database
    let rule_vec: Vec<Rule> = rules.iter_rules()
        .map(|syntax| Rule::new(syntax.clone()))
        .collect();
    let rules_db = Arc::new(RulesDatabase::from_rules(rule_vec)?);

    // 3. Create a scanner
    let scanner = Scanner::new(rules_db);

    // 4. Scan a file
    let findings = scanner.scan_file("path/to/file.txt")?;

    for finding in findings {
        println!(
            "Found {} at line {}",
            finding.rule_name,
            finding.location.line
        );
    }

    Ok(())
}

Scanning In-Memory Content

use std::sync::Arc;
use kingfisher_rules::{get_builtin_rules, RulesDatabase, Rule};
use kingfisher_scanner::Scanner;

fn scan_content(content: &[u8]) -> anyhow::Result<()> {
    let rules = get_builtin_rules(None)?;
    let rule_vec: Vec<Rule> = rules.iter_rules()
        .map(|syntax| Rule::new(syntax.clone()))
        .collect();
    let rules_db = Arc::new(RulesDatabase::from_rules(rule_vec)?);

    let scanner = Scanner::new(rules_db);

    // Scan bytes directly - no file I/O needed
    let findings = scanner.scan_bytes(content);

    for finding in &findings {
        println!("Secret: {} ({})", finding.rule_name, finding.confidence);
    }

    Ok(())
}

kingfisher-core

Core types and utilities for working with scannable content.

Core Structure

flowchart TD
    Core[kingfisher-core]
    Blob[blob module]
    Location[location module]
    Origin[origin module]
    Content[content_type module]
    Entropy[entropy module]
    GitMeta[git_commit_metadata module]
    Escape[bstring_escape module]
    Error[error module]

    Core --> Blob
    Core --> Location
    Core --> Origin
    Core --> Content
    Core --> Entropy
    Core --> GitMeta
    Core --> Escape
    Core --> Error

Blob - Content Abstraction

Blob represents content that can be scanned. It supports:

  • File-backed content with memory mapping for large files
  • In-memory content for programmatic use
  • Borrowed content for zero-copy scanning

use kingfisher_core::{Blob, BlobId};

// From a file (memory-mapped for efficiency)
let blob = Blob::from_file("secret.txt")?;

// From owned bytes
let blob = Blob::from_bytes(vec![0x41, 0x42, 0x43]);

// Access the content
let bytes: &[u8] = blob.bytes();
let id: BlobId = blob.id();  // SHA-1 based identifier

BlobId - Content Identity

BlobId provides a unique identifier for content, computed using a SHA-1 hash (compatible with Git's blob IDs):

use kingfisher_core::BlobId;

let id = BlobId::new(b"hello world");
println!("Blob ID: {}", id.hex());  // 40-character hex string

// Parse from hex
let id = BlobId::from_hex("2aae6c35c94fcfb415dbe95f408b9ce91ee846ed")?;
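
To make "compatible with Git's blob IDs" concrete: Git does not hash the raw bytes alone. It hashes a short header (`blob <length>` followed by a NUL byte) and then the content. The std-only sketch below implements SHA-1 by hand purely for illustration and computes both digests. The helper names `sha1`, `to_hex`, and `git_blob_id` are hypothetical and are not part of kingfisher-core. Whether `BlobId::new` applies the Git header internally is an assumption you should check against the crate.

```rust
/// Minimal SHA-1 (FIPS 180-4), for illustration only; use a vetted crate in real code.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for chunk in msg.chunks(64) {
        // Expand the 16 input words into the 80-word message schedule.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([
                chunk[4 * i],
                chunk[4 * i + 1],
                chunk[4 * i + 2],
                chunk[4 * i + 3],
            ]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let [mut a, mut b, mut c, mut d, mut e] = h;
        for i in 0..80 {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let t = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Lowercase hex encoding of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Git-style blob ID: SHA-1 over "blob <len>\0" followed by the content.
fn git_blob_id(content: &[u8]) -> String {
    let mut preimage = format!("blob {}\0", content.len()).into_bytes();
    preimage.extend_from_slice(content);
    to_hex(&sha1(&preimage))
}
```

For `b"hello world"`, `to_hex(&sha1(..))` gives the same 40-character string used in the `from_hex` example above, while `git_blob_id` gives a different value because of the header. Comparing `BlobId::new(..).hex()` against both is a quick way to see which convention the crate follows.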

Location - Source Positions

Track positions within scanned content:

use kingfisher_core::{LocationMapping, SourceSpan};

let content = b"line1\nline2\nline3";
let mapping = LocationMapping::new(content);

// Convert byte offset to line/column
let point = mapping.get_source_point(7);  // Returns (line: 2, column: 2)

// Get a span
let span = mapping.get_source_span(6..11);  // "line2"
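
If it helps to see how an offset-to-position mapping can work, here is a minimal std-only sketch. It records the byte offset at which each line starts and uses a binary search to resolve an offset. `LineIndex` and `line_col` are hypothetical names, not the kingfisher-core API, and the 1-based line and column convention is an assumption chosen to match the comment in the example above.

```rust
/// Hypothetical stand-in for a location mapping: remembers where each line starts.
struct LineIndex {
    line_starts: Vec<usize>,
}

impl LineIndex {
    fn new(content: &[u8]) -> Self {
        // Line 1 always starts at offset 0; every '\n' starts a new line after it.
        let mut line_starts = vec![0];
        for (i, &b) in content.iter().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// Returns a 1-based (line, column) pair for a byte offset.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // partition_point counts the line starts that are <= offset,
        // which is exactly the 1-based line number.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let column = offset - self.line_starts[line - 1] + 1;
        (line, column)
    }
}
```

Building the index is a single pass over the content, and each lookup is a binary search, which is why this layout suits scanners that report many matches per file.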

Entropy Calculation

Calculate Shannon entropy to filter high-randomness content:

use kingfisher_core::calculate_shannon_entropy;

let entropy = calculate_shannon_entropy(b"AKIAIOSFODNN7EXAMPLE");
println!("Entropy: {:.2} bits", entropy);  // ~4.0 for random-looking strings
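
For reference, Shannon entropy over bytes is H = -Σ p(b) · log2 p(b), where p(b) is the relative frequency of byte value b. The sketch below is a plain std-only version of that formula. The function name `shannon_entropy` is hypothetical, and the crate's own `calculate_shannon_entropy` may differ in details such as return type or normalization.

```rust
/// Shannon entropy of a byte slice, in bits per byte.
fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }

    // Count how often each byte value occurs.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }

    // Sum -p * log2(p) over the byte values that actually appear.
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A string made of one repeated byte scores 0, and a string of four distinct bytes scores 2. Short strings can never score high, because the maximum is log2 of the string length, so a `min_entropy` threshold needs to be chosen with the expected secret length in mind.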

Origin - Provenance Tracking

Track where content came from:

use kingfisher_core::{Origin, FileOrigin, GitRepoOrigin};
use std::path::PathBuf;

// File origin
let origin = Origin::File(FileOrigin {
    path: PathBuf::from("/path/to/file.txt"),
});

// Git repository origin
let origin = Origin::GitRepo(GitRepoOrigin {
    repo_path: PathBuf::from("/path/to/repo"),
    remote_url: Some("https://github.com/org/repo".into()),
});

kingfisher-rules

Rule definitions, YAML parsing, and the compiled rule database.

Rules Structure

flowchart TD
    Rules[kingfisher-rules]
    RuleMod[rule module]
    RulesMod[rules module]
    Db[rules_database module]
    Defaults[defaults module]
    Liquid[liquid_filters module]

    Rules --> RuleMod
    Rules --> RulesMod
    Rules --> Db
    Rules --> Defaults
    Rules --> Liquid

    RuleMod --> Syntax[Rule and RuleSyntax]
    RulesMod --> Collections[Rules collection and loading]
    Db --> Compiled[Compiled RulesDatabase]
    Defaults --> Builtins[Builtin rules]
    Liquid --> Filters[Template filters]

Loading Builtin Rules

Kingfisher comes with 700+ builtin rules for common secret types:

use kingfisher_rules::{get_builtin_rules, Confidence};

// Load all rules with Medium confidence or higher (default)
let rules = get_builtin_rules(None)?;

// Load only High confidence rules
let rules = get_builtin_rules(Some(Confidence::High))?;

println!("Loaded {} rules", rules.num_rules());

Loading Custom Rules

Load rules from YAML files or directories:

use kingfisher_rules::{Rules, Confidence};

// From a single file
let rules = Rules::from_paths(&["my-rules.yml"], Confidence::Medium)?;

// From a directory (recursively finds .yml files)
let rules = Rules::from_paths(&["rules/"], Confidence::Medium)?;

// Merge multiple sources
let mut rules = Rules::new();
rules.update(Rules::from_paths(&["builtin/"], Confidence::Medium)?);
rules.update(Rules::from_paths(&["custom/"], Confidence::Medium)?);

Rule Syntax YAML Format

rules:
  - name: My Custom API Key
    id: custom.myapi.1
    pattern: |
      (?i)
      myapi[_-]?key\s*[:=]\s*
      ["']?([A-Za-z0-9]{32})["']?
    min_entropy: 3.5
    confidence: high
    examples:
      - 'MYAPI_KEY=abc123def456ghi789jkl012mno345pq'
    validation:
      type: Http
      content:
        request:
          method: GET
          url: https://api.example.com/validate
          headers:
            Authorization: Bearer {{ TOKEN }}
          response_matcher:
            - type: StatusMatch
              status: [200]

Compiling Rules

The RulesDatabase compiles rules for efficient multi-pattern matching:

use std::sync::Arc;
use kingfisher_rules::{get_builtin_rules, RulesDatabase, Rule};

let rules = get_builtin_rules(None)?;

// Convert RuleSyntax to Rule objects
let rule_vec: Vec<Rule> = rules.iter_rules()
    .map(|syntax| Rule::new(syntax.clone()))
    .collect();

// Compile into a database (uses Vectorscan for fast matching)
let db = Arc::new(RulesDatabase::from_rules(rule_vec)?);

// Access compiled rules
println!("Compiled {} rules", db.num_rules());

// Look up rules by ID
if let Some(rule) = db.get_rule_by_text_id("kingfisher.aws.1") {
    println!("Found rule: {}", rule.name());
}

Confidence Levels

Rules have confidence levels indicating detection accuracy:

use kingfisher_rules::Confidence;

// Available levels (in order)
// Confidence::Low    - May have false positives
// Confidence::Medium - Balanced (default)
// Confidence::High   - High accuracy

let conf = Confidence::High;
if conf.is_at_least(&Confidence::Medium) {
    println!("Confidence is medium or higher");
}
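
The ordering behind `is_at_least` can be pictured with a small stand-alone enum. This is only a sketch: `Level` is a hypothetical mirror of `Confidence`, and the `FromStr` implementation is an illustration of parsing user input, not something the crate is confirmed to provide.

```rust
use std::str::FromStr;

/// Hypothetical mirror of a three-step confidence scale.
/// Deriving `PartialOrd`/`Ord` orders variants by declaration: Low < Medium < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Low,
    Medium,
    High,
}

impl Level {
    /// True when `self` is the same level as `other` or stricter.
    fn is_at_least(&self, other: &Level) -> bool {
        self >= other
    }
}

impl FromStr for Level {
    type Err = String;

    /// Case-insensitive parse of "low", "medium", or "high".
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "low" => Ok(Level::Low),
            "medium" => Ok(Level::Medium),
            "high" => Ok(Level::High),
            other => Err(format!("unknown confidence level: {}", other)),
        }
    }
}
```

With this shape, filtering rules by a minimum level is a single comparison, and a command-line value can be turned into a level with `"high".parse::<Level>()`.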

Liquid Filters for Validation

The crate includes Liquid template filters for HTTP validation:

use kingfisher_rules::register_liquid_filters;
use liquid::ParserBuilder;

let parser = register_liquid_filters(ParserBuilder::with_stdlib())
    .build()?;

let template = parser.parse("{{ secret | sha256 }}")?;

Available filters:

  • Encoding: b64enc, b64dec, b64url_enc, url_encode, json_escape, base62, base36
  • Hashing: sha256, crc32, crc32_dec, crc32_hex, crc32_le_b64
  • HMAC: hmac_sha256, hmac_sha384, hmac_sha1, hmac_sha256_b64key
  • Strings: prefix, suffix, replace, lstrip_chars, random_string, newline
  • Time: unix_timestamp, iso_timestamp, iso_timestamp_no_frac
  • Other: uuid, jwt_header

kingfisher-scanner

High-level scanning API that combines core types and rules.

Scanner Structure

flowchart TD
    Scanner[kingfisher-scanner]
    ScanMod[scanner module]
    FindingMod[finding module]
    PoolMod[scanner_pool module]
    Prim[primitives module]
    Validation[validation module]
    Core[kingfisher-core]
    Rules[kingfisher-rules]

    Scanner --> ScanMod
    Scanner --> FindingMod
    Scanner --> PoolMod
    Scanner --> Prim
    Scanner --> Validation
    Scanner --> Core
    Scanner --> Rules

    ScanMod --> API[Scanner and ScannerConfig]
    FindingMod --> Finding[Finding types]
    PoolMod --> Pool[ScannerPool]
    Prim --> Helpers[Matching helpers]
    Validation --> Validators[Optional validators]

Scanner Configuration

use std::sync::Arc;
use kingfisher_rules::{get_builtin_rules, RulesDatabase, Rule};
use kingfisher_scanner::{Scanner, ScannerConfig};

let rules = get_builtin_rules(None)?;
let rule_vec: Vec<Rule> = rules.iter_rules()
    .map(|syntax| Rule::new(syntax.clone()))
    .collect();
let rules_db = Arc::new(RulesDatabase::from_rules(rule_vec)?);

// Default configuration
let scanner = Scanner::new(Arc::clone(&rules_db));

// Custom configuration
let config = ScannerConfig {
    enable_base64_decoding: true,   // Decode and scan base64 content
    enable_dedup: true,             // Skip duplicate blobs
    min_entropy_override: Some(3.0), // Override minimum entropy
    redact_secrets: false,          // Don't redact in findings
    max_base64_depth: 2,            // Max nested base64 decoding
};
let scanner = Scanner::with_config(Arc::clone(&rules_db), config);
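
As a rough picture of what an option like `enable_dedup` implies, the sketch below remembers a hash of every blob it has seen and reports whether a blob is new. It is an assumption-laden illustration: `SeenBlobs` and `first_time` are hypothetical names, and a 64-bit `DefaultHasher` value stands in for a real content ID such as `BlobId`, so it can collide where a cryptographic hash would not.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical dedup helper: tracks which contents have already been scanned.
struct SeenBlobs {
    seen: HashSet<u64>,
}

impl SeenBlobs {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true the first time a given content is offered, false afterwards.
    fn first_time(&mut self, content: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        // `insert` returns false when the value was already present.
        self.seen.insert(hasher.finish())
    }
}
```

A caller would check `first_time(bytes)` before scanning and skip the blob when it returns false, which avoids reporting the same finding once per copy of a vendored or duplicated file.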

Scanning Methods

// Scan raw bytes
let findings = scanner.scan_bytes(b"AWS_SECRET_KEY=AKIAIOSFODNN7EXAMPLE");

// Scan a file
let findings = scanner.scan_file("config.yml")?;

// Scan a Blob
use kingfisher_core::Blob;
let blob = Blob::from_file("secrets.env")?;
let findings = scanner.scan_blob(&blob)?;

Working with Findings

use kingfisher_scanner::Finding;

for finding in findings {
    println!("Rule: {} ({})", finding.rule_name, finding.rule_id);
    println!("Secret: {}", finding.secret);
    println!(
        "Location: line {} col {} - line {} col {}",
        finding.location.line,
        finding.location.column,
        finding.location.end_line,
        finding.location.end_column
    );
    println!("Entropy: {:.2}", finding.entropy);
    println!("Confidence: {:?}", finding.confidence);
    println!("Fingerprint: {}", finding.fingerprint);

    // Named captures from the regex
    for (name, value) in &finding.captures {
        println!("  {}: {}", name, value);
    }
}

Parallel Scanning

The scanner is thread-safe and uses a thread-local scanner pool:

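use std::sync::Arc;
use rayon::prelude::*;
use kingfisher_scanner::Scanner;

// `rules_db` is the Arc<RulesDatabase> compiled as in the earlier examples
let scanner = Arc::new(Scanner::new(rules_db));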

let files = vec!["file1.txt", "file2.txt", "file3.txt"];

let all_findings: Vec<_> = files.par_iter()
    .flat_map(|file| {
        scanner.scan_file(file).unwrap_or_default()
    })
    .collect();
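
If you would rather not depend on rayon, the same sharing pattern works with plain threads. The sketch below uses only the standard library and a toy scanner so that it is self-contained. `ToyScanner` and `scan_all` are hypothetical stand-ins. The only property borrowed from the text above is that a scanner can be shared across threads behind an `Arc`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a scanner: immutable after construction,
/// so it can be shared freely between threads.
struct ToyScanner {
    needles: Vec<&'static str>,
}

impl ToyScanner {
    /// Returns every needle that occurs in the content.
    fn scan_bytes(&self, content: &[u8]) -> Vec<&'static str> {
        let text = String::from_utf8_lossy(content);
        self.needles
            .iter()
            .copied()
            .filter(|needle| text.contains(*needle))
            .collect()
    }
}

/// Scans each input on its own thread and concatenates the results in input order.
fn scan_all(scanner: Arc<ToyScanner>, inputs: Vec<Vec<u8>>) -> Vec<&'static str> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            // Each worker gets its own handle to the shared scanner.
            let scanner = Arc::clone(&scanner);
            thread::spawn(move || scanner.scan_bytes(&input))
        })
        .collect();

    handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

One thread per input is fine for a handful of files. For large directory trees, a bounded pool such as the rayon version above is the better fit because it caps the number of threads.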

Complete Example

Here's a complete CLI tool that scans files and directories for secrets with configurable options:

use std::sync::Arc;
use std::path::Path;
use walkdir::WalkDir;
use clap::Parser;

use kingfisher_rules::{get_builtin_rules, RulesDatabase, Rule, Confidence};
use kingfisher_scanner::{Scanner, ScannerConfig};

#[derive(Parser)]
#[command(name = "secret-scanner")]
#[command(about = "Scan files and directories for secrets using Kingfisher", long_about = None)]
struct Cli {
    /// Path to scan (file or directory)
    #[arg(value_name = "PATH")]
    path: String,

    /// Minimum confidence level (low, medium, high)
    #[arg(short, long, default_value = "medium")]
    confidence: String,

    /// Enable base64 decoding (pass `--base64 false` to disable)
    #[arg(short, long, default_value_t = true, action = clap::ArgAction::Set)]
    base64: bool,

    /// Redact secrets in output
    #[arg(short, long, default_value_t = false)]
    redact: bool,
}

fn main() -> anyhow::Result<()> {
    let cli = Cli::parse();

    // Parse confidence level
    let confidence = match cli.confidence.to_lowercase().as_str() {
        "low" => Confidence::Low,
        "medium" => Confidence::Medium,
        "high" => Confidence::High,
        _ => {
            eprintln!("Invalid confidence level. Use: low, medium, or high");
            std::process::exit(1);
        }
    };

    // Load builtin rules
    println!("Loading {} confidence rules...", cli.confidence);
    let rules = get_builtin_rules(Some(confidence))?;
    println!("Loaded {} rules", rules.num_rules());

    // Convert to Rule objects and compile into a database
    let rule_vec: Vec<Rule> = rules
        .iter_rules()
        .map(|syntax| Rule::new(syntax.clone()))
        .collect();
    let rules_db = Arc::new(RulesDatabase::from_rules(rule_vec)?);

    // Configure scanner
    let config = ScannerConfig {
        enable_base64_decoding: cli.base64,
        enable_dedup: true,
        redact_secrets: cli.redact,
        ..Default::default()
    };
    let scanner = Scanner::with_config(rules_db, config);

    // Scan the path
    let path = Path::new(&cli.path);

    if !path.exists() {
        eprintln!("Error: Path '{}' does not exist", cli.path);
        std::process::exit(1);
    }

    let mut total_findings = 0;
    let mut files_scanned = 0;

    if path.is_file() {
        // Scan single file
        files_scanned = 1;
        println!("\nScanning file: {}", path.display());
        match scanner.scan_file(path) {
            Ok(findings) => {
                print_findings(path, &findings);
                total_findings += findings.len();
            }
            Err(e) => eprintln!("Error scanning file: {}", e),
        }
    } else if path.is_dir() {
        // Scan directory recursively
        println!("\nScanning directory: {}\n", path.display());

        for entry in WalkDir::new(path)
            .into_iter()
            .filter_map(|e| e.ok())
            .filter(|e| e.file_type().is_file())
        {
            let file_path = entry.path();
            files_scanned += 1;

            match scanner.scan_file(file_path) {
                Ok(findings) if !findings.is_empty() => {
                    print_findings(file_path, &findings);
                    total_findings += findings.len();
                }
                Err(e) => {
                    // Silently skip files that can't be scanned (binary, etc.)
                    if std::env::var("DEBUG").is_ok() {
                        eprintln!("Error scanning {}: {}", file_path.display(), e);
                    }
                }
                _ => {}
            }
        }
    }

    // Print summary
    println!("\n{}", "=".repeat(60));
    println!("Scan complete!");
    println!("Files scanned: {}", files_scanned);
    println!("Total findings: {}", total_findings);

    if total_findings > 0 {
        println!("\n⚠️  WARNING: Secrets detected! Please review the findings above.");
        std::process::exit(1);
    } else {
        println!("✓ No secrets found.");
    }

    Ok(())
}

fn print_findings(path: &Path, findings: &[kingfisher_scanner::Finding]) {
    println!("\n📁 {}", path.display());
    println!("{}", "-".repeat(60));

    for finding in findings {
        println!("  🔍 {} ({})", finding.rule_name, finding.rule_id);
        println!("     Location: line {}:{} - {}:{}",
            finding.location.line,
            finding.location.column,
            finding.location.end_line,
            finding.location.end_column);
        println!("     Secret: {}", finding.secret);
        println!("     Entropy: {:.2}", finding.entropy);
        println!("     Confidence: {:?}", finding.confidence);
        println!("     Fingerprint: {}", finding.fingerprint);

        if !finding.captures.is_empty() {
            println!("     Captures:");
            for (name, value) in &finding.captures {
                println!("       {}: {}", name, value);
            }
        }
        println!();
    }
}

Add these dependencies to your Cargo.toml:

[package]
name = "secret-scanner"
version = "0.1.0"
edition = "2021"

[dependencies]
kingfisher-core = { git = "https://github.com/mongodb/kingfisher" }
kingfisher-rules = { git = "https://github.com/mongodb/kingfisher" }
kingfisher-scanner = { git = "https://github.com/mongodb/kingfisher" }
anyhow = "1.0"
walkdir = "2.5"
clap = { version = "4.5", features = ["derive"] }

Try it out:

# Scan a directory with medium confidence rules
cargo run -- -c medium ~/tmp

# Scan with high confidence only and redact secrets
cargo run -- -c high --redact ~/projects

# Scan a single file
cargo run -- config.yml

Credential Validation (Optional)

The kingfisher-scanner crate includes optional credential validation support, which lets you check whether detected secrets are still active.

Enabling Validation

Add the validation feature to your Cargo.toml:

[dependencies]
kingfisher-scanner = { git = "https://github.com/mongodb/kingfisher", features = ["validation"] }

Available Features

Feature          Description
validation       Core validation support with HTTP validation
validation-http  HTTP-based validation for API tokens
validation-aws   AWS credential validation via STS
validation-all   Enable all validation features

HTTP Validation Example

use kingfisher_scanner::validation::{
    build_request_builder, validate_response, CachedResponse,
    from_string, GLOBAL_USER_AGENT,
};
use kingfisher_rules::ResponseMatcher;
use reqwest::Client;
use std::collections::BTreeMap;
use std::time::Duration;

async fn validate_api_token(token: &str) -> bool {
    let client = Client::builder()
        .timeout(Duration::from_secs(10))
        .build()
        .unwrap();

    let parser = liquid::ParserBuilder::with_stdlib().build().unwrap();
    let mut globals = liquid::Object::new();
    globals.insert("TOKEN".into(), liquid_core::Value::scalar(token.to_string()));

    let url = reqwest::Url::parse("https://api.example.com/validate").unwrap();
    let mut headers = BTreeMap::new();
    headers.insert("Authorization".to_string(), "Bearer {{ TOKEN }}".to_string());

    let request = build_request_builder(
        &client,
        "GET",
        &url,
        &headers,
        &None,
        Duration::from_secs(10),
        &parser,
        &globals,
    ).unwrap();

    match request.send().await {
        Ok(resp) => {
            let status = resp.status();
            // Copy the headers before reading the body: `text()` consumes the response
            let response_headers = resp.headers().clone();
            let body = resp.text().await.unwrap_or_default();

            // Define matchers for valid response
            let matchers = vec![
                ResponseMatcher::StatusMatch {
                    r#type: "status-match".to_string(),
                    status: vec![200],
                    match_all_status: false,
                    negative: false,
                },
            ];

            validate_response(&matchers, &body, &status, &response_headers, false)
        }
        Err(_) => false,
    }
}

AWS Credential Validation

Enable the validation-aws feature to validate AWS credentials:

[dependencies]
kingfisher-scanner = { git = "https://github.com/mongodb/kingfisher", features = ["validation-aws"] }

use kingfisher_scanner::validation::{
    validate_aws_credentials, validate_aws_credentials_input,
    aws_key_to_account_number, set_aws_skip_account_ids,
};

async fn check_aws_key(access_key_id: &str, secret_key: &str) {
    // Validate format first
    if let Err(e) = validate_aws_credentials_input(access_key_id, secret_key) {
        println!("Invalid format: {}", e);
        return;
    }

    // Extract account number from the key
    if let Ok(account) = aws_key_to_account_number(access_key_id) {
        println!("AWS Account: {}", account);
    }

    // Validate credentials via STS
    match validate_aws_credentials(access_key_id, secret_key).await {
        Ok((true, arn)) => println!("Valid! ARN: {}", arn),
        Ok((false, msg)) => println!("Invalid: {}", msg),
        Err(e) => println!("Error: {}", e),
    }
}

// Skip validation for known canary/honeypot accounts
fn setup_skip_list() {
    set_aws_skip_account_ids(vec![
        "111122223333",  // Test account
        "444455556666",  // Canary account
    ]);
}
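
For background on how an account number can be read out of a key at all: public write-ups describe newer-style AWS access key IDs as a 4-character prefix followed by base32 data, with the account ID packed into the first six decoded bytes. The std-only sketch below follows that description. The function name `account_id_from_key` is hypothetical, and it is an assumption that `aws_key_to_account_number` uses the same decoding. Older keys do not follow this layout, so a result for them is meaningless.

```rust
/// Decodes the 12-digit account ID embedded in a newer-style AWS access key ID.
/// Returns None when the key is too short or contains non-base32 characters.
fn account_id_from_key(access_key_id: &str) -> Option<String> {
    const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Skip the 4-character prefix (for example AKIA or ASIA); the rest is base32.
    let encoded = access_key_id.get(4..)?;

    // Accumulate 5 bits per character until at least 48 bits (6 bytes) are available.
    let mut bits: u64 = 0;
    let mut bit_count = 0;
    for ch in encoded.bytes() {
        let upper = ch.to_ascii_uppercase();
        let value = ALPHABET.iter().position(|&a| a == upper)? as u64;
        bits = (bits << 5) | value;
        bit_count += 5;
        if bit_count >= 48 {
            break;
        }
    }
    if bit_count < 48 {
        return None;
    }

    // Keep exactly the first 48 bits, then mask and shift out the account ID.
    let first_six_bytes = bits >> (bit_count - 48);
    let account = (first_six_bytes & 0x7fff_ffff_ff80) >> 7;
    Some(format!("{:012}", account))
}
```

Because the decoding is purely local, it can be used to group findings by account or to apply a skip list before any network call is made.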

Validation Response Types

use kingfisher_scanner::validation::{
    CachedResponse, ValidationResponseBody,
    from_string, as_str, VALIDATION_CACHE_SECONDS,
};
use http::StatusCode;
use std::time::Duration;

// Create a validation response body
let body = from_string("Credential is valid");

// Create a cached response
let cached = CachedResponse::new(
    body,
    StatusCode::OK,
    true,  // is_valid
);

// Check if cache is still fresh
let cache_duration = Duration::from_secs(VALIDATION_CACHE_SECONDS);
if cached.is_still_valid(cache_duration) {
    println!("Using cached result: valid={}", cached.is_valid);
}
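
The freshness check can be understood as a timestamp comparison. The sketch below is a minimal std-only model of that idea. `CachedResult` is a hypothetical type, not the crate's `CachedResponse`, and storing an `Instant` at creation time is an assumption about how such a cache entry might track its age.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache entry: a validation verdict plus the moment it was stored.
struct CachedResult {
    is_valid: bool,
    stored_at: Instant,
}

impl CachedResult {
    fn new(is_valid: bool) -> Self {
        Self { is_valid, stored_at: Instant::now() }
    }

    /// True while the entry is younger than `max_age`.
    fn is_still_valid(&self, max_age: Duration) -> bool {
        self.stored_at.elapsed() < max_age
    }
}
```

Caching verdicts this way keeps repeated findings of the same credential from triggering a new request each time, at the cost of reporting a slightly stale status until the entry expires.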

API Stability

These crates are currently internal to Kingfisher. The API may change between versions. For stable integration, pin to a specific git commit or tag.

See Also