
Architecture

How Bundlebot transforms a raw support bundle into an evidence-backed diagnostic report.

Pipeline

  .tar.gz Bundle
        │
  ┌─────▼─────────┐
  │   Extract     │  Safe extraction: rejects symlinks, path traversal, size bombs
  └─────┬─────────┘
        │
  ┌─────▼─────────┐
  │  Parse +      │  File classification, K8s deserialization via apimachinery
  │  Manifest     │  Produces typed resource inventory
  └─────┬─────────┘
        │
  ┌─────▼─────────┐
  │  Resource     │  In-memory adjacency list: ownerRefs, selectors, volumes
  │  Graph        │  9 relationship types, 5 diagnostic queries
  └─────┬─────────┘
        │
  ┌─────▼─────────┐
  │  Pre-filter   │  7 deterministic heuristics — catches 60-70% of issues
  │               │  ~1000x token reduction (500K+ → ~1000 tokens)
  └─────┬─────────┘
        │
  DiagnosticDigest (ground truth)
        │
  ┌─────▼─────────┐
  │  Claude LLM   │  Reasons over digest with 5 investigation tools
  │  + Tool Use   │  File reading, log search, graph traversal, event lookup
  └─────┬─────────┘
        │
  ┌─────▼─────────┐
  │  Validator    │  Checks every resource ref + file citation against bundle
  └─────┬─────────┘
        │
  DiagnosticReport
        │
  ┌─────▼─────────┐
  │   Output      │  Rich CLI (Lipgloss) / Markdown / JSON
  └───────────────┘

Design Decisions

Go, not Python

Go 1.22+ with k8s.io/apimachinery

Replicated's entire stack is Go. Single static binary, zero runtime deps. k8s.io/apimachinery provides native K8s type deserialization.

Resource Graph

In-memory adjacency list with 9 relationship types

Enables cross-file correlation: Deployment → ReplicaSet → Pod → Node in a single traversal, linking resources that were parsed from separate YAML files.
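A minimal sketch of such an adjacency list follows. It assumes resources are keyed by kind, namespace, and name. Only four relationship types are shown, because the document mentions nine but does not enumerate them, so these names are illustrative. Walk is a breadth-first traversal filtered by relationship type, which is enough to follow the Deployment → ReplicaSet → Pod → Node chain.

```go
package main

import "fmt"

// RelType labels an edge; these four are illustrative stand-ins.
type RelType string

const (
	Owns        RelType = "owns"        // owner -> dependent, derived from ownerReferences
	ScheduledOn RelType = "scheduledOn" // pod -> node, from spec.nodeName
	Selects     RelType = "selects"     // service -> pod, from label selectors
	Mounts      RelType = "mounts"      // pod -> PVC/ConfigMap/Secret volume source
)

// Key identifies a resource; cluster-scoped kinds leave Namespace empty.
type Key struct{ Kind, Namespace, Name string }

type Edge struct {
	To  Key
	Rel RelType
}

// Graph is an adjacency list keyed by source resource.
type Graph struct{ adj map[Key][]Edge }

func NewGraph() *Graph { return &Graph{adj: map[Key][]Edge{}} }

func (g *Graph) Add(from, to Key, rel RelType) {
	g.adj[from] = append(g.adj[from], Edge{To: to, Rel: rel})
}

// Walk visits resources breadth-first from start, following only the given
// relationship types, and returns them in visit order.
func (g *Graph) Walk(start Key, rels ...RelType) []Key {
	allowed := map[RelType]bool{}
	for _, r := range rels {
		allowed[r] = true
	}
	seen := map[Key]bool{start: true}
	queue := []Key{start}
	var order []Key
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur)
		for _, e := range g.adj[cur] {
			if allowed[e.Rel] && !seen[e.To] {
				seen[e.To] = true
				queue = append(queue, e.To)
			}
		}
	}
	return order
}

func main() {
	g := NewGraph()
	dep := Key{"Deployment", "prod", "api"}
	rs := Key{"ReplicaSet", "prod", "api-7d9f"}
	pod := Key{"Pod", "prod", "api-7d9f-x2k4p"}
	node := Key{"Node", "", "worker-1"}
	g.Add(dep, rs, Owns)
	g.Add(rs, pod, Owns)
	g.Add(pod, node, ScheduledOn)
	for _, k := range g.Walk(dep, Owns, ScheduledOn) {
		fmt.Printf("%s/%s\n", k.Kind, k.Name)
	}
}
```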

Deterministic-First Pipeline

Pre-filter before LLM, not pure RAG or context stuffing

Bundles are structured data, not free text. Pre-filtering programmatically extracts the diagnostically relevant 2% (~1000x reduction). The LLM adds causal reasoning, not discovery.
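Here is a sketch of what "deterministic-first" can look like: pure functions over parsed resources whose findings are sorted into a stable digest. The document does not list its seven heuristics, so the two below are stand-ins. The names PodSummary, Finding, Heuristic, and Digest, and the restart threshold of 5, are all assumptions. The real pipeline works on full apimachinery types rather than this pared-down struct.

```go
package main

import (
	"fmt"
	"sort"
)

// PodSummary is a hypothetical, pared-down view of a parsed Pod.
type PodSummary struct {
	Namespace, Name string
	Phase           string
	Restarts        int32
	WaitingReason   string // e.g. "CrashLoopBackOff", "ImagePullBackOff"
}

// Finding is one deterministic fact destined for the digest.
type Finding struct {
	Heuristic string
	Resource  string
	Detail    string
}

// Heuristic is a named pure function over parsed resources.
type Heuristic struct {
	Name string
	Run  func([]PodSummary) []Finding
}

// Two illustrative heuristics; the restart threshold is an assumption.
var heuristics = []Heuristic{
	{"crashloop", func(pods []PodSummary) (out []Finding) {
		for _, p := range pods {
			if p.WaitingReason == "CrashLoopBackOff" || p.Restarts >= 5 {
				out = append(out, Finding{"crashloop", p.Namespace + "/" + p.Name,
					fmt.Sprintf("reason=%q restarts=%d", p.WaitingReason, p.Restarts)})
			}
		}
		return
	}},
	{"pending", func(pods []PodSummary) (out []Finding) {
		for _, p := range pods {
			if p.Phase == "Pending" {
				out = append(out, Finding{"pending", p.Namespace + "/" + p.Name, "pod not scheduled or not started"})
			}
		}
		return
	}},
}

// Digest runs every heuristic and sorts the findings so the same bundle
// always yields the same digest.
func Digest(pods []PodSummary) []Finding {
	var all []Finding
	for _, h := range heuristics {
		all = append(all, h.Run(pods)...)
	}
	sort.Slice(all, func(i, j int) bool {
		if all[i].Heuristic != all[j].Heuristic {
			return all[i].Heuristic < all[j].Heuristic
		}
		return all[i].Resource < all[j].Resource
	})
	return all
}

func main() {
	pods := []PodSummary{
		{"prod", "api-1", "Running", 12, "CrashLoopBackOff"},
		{"prod", "worker-0", "Pending", 0, ""},
		{"prod", "web-1", "Running", 0, ""},
	}
	for _, f := range Digest(pods) {
		fmt.Printf("[%s] %s: %s\n", f.Heuristic, f.Resource, f.Detail)
	}
}
```

Healthy pods produce no findings at all. That is where the token reduction comes from: the digest grows with the number of problems, not with the size of the bundle.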

Tool Use over Context Stuffing

5 Claude tools for on-demand investigation

Bundles are 500K-15M+ tokens. Tool use gives targeted access without front-loading everything into the prompt.

Externalized Prompts

prompts/*.txt loaded via embed.FS

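Diffable, versionable, and editable without touching Go code. Because embed.FS bakes the files into the binary at build time, prompt edits ship with the next build. The system prompt, analysis template, and examples are separate files.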

Competitive Landscape

Every existing AI K8s diagnostic tool requires live cluster access. Bundlebot is the first to apply AI to offline support bundles.

Tool  Live Cluster  Offline Bundle  AI/LLM  Cross-File  Open Source
K8sGPT  ✓ Apache-2.0
HolmesGPT  Partial  ✓ CNCF
Komodor
Kagent  ✓ CNCF
Troubleshoot.sh  ✓ rule-based  ✓ Apache-2.0
Bundlebot  ✓ graph  ✓ Apache-2.0

Hallucination Mitigation

Layer 1: Ground Truth

Pre-filter produces a DiagnosticDigest — deterministic, verifiable facts about the cluster state. The LLM explains and correlates; it does not discover.

Layer 2: Post-Validation

Every resource reference is checked against the graph. Every file citation is checked against the bundle. Failed checks are flagged in the output.
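Once the graph and the extracted file list exist, post-validation reduces to set membership. The sketch below assumes the report has already been parsed into claims with explicit citations. Claim, Problem, and validate are hypothetical names, and the key formats are assumptions: Kind/namespace/name, with the namespace omitted for cluster-scoped kinds, and bundle-relative file paths.

```go
package main

import "fmt"

// Claim is one statement in the LLM's report together with what it cites.
type Claim struct {
	Text      string
	Resources []string // e.g. "Pod/prod/api-1" or "Node/worker-1"
	Files     []string // bundle-relative paths
}

// Problem records a citation that could not be verified.
type Problem struct {
	Claim int    // index into the claims slice
	Kind  string // "resource" or "file"
	Ref   string
}

// validate checks every cited resource against the resources known to the
// graph and every cited file against the extracted bundle's file list.
func validate(claims []Claim, resources, files map[string]bool) []Problem {
	var probs []Problem
	for i, c := range claims {
		for _, r := range c.Resources {
			if !resources[r] {
				probs = append(probs, Problem{i, "resource", r})
			}
		}
		for _, f := range c.Files {
			if !files[f] {
				probs = append(probs, Problem{i, "file", f})
			}
		}
	}
	return probs
}

func main() {
	resources := map[string]bool{"Pod/prod/api-1": true, "Node/worker-1": true}
	files := map[string]bool{"cluster-resources/pods/prod.json": true}
	claims := []Claim{{
		Text:      "api-1 is crash-looping on worker-2",
		Resources: []string{"Pod/prod/api-1", "Node/worker-2"},
		Files:     []string{"cluster-resources/pods/prod.json"},
	}}
	for _, p := range validate(claims, resources, files) {
		fmt.Printf("claim %d: unverified %s %q\n", p.Claim, p.Kind, p.Ref)
	}
}
```

Rather than silently dropping a claim, validate returns each failed check together with the index of the claim it came from. That index is what the output needs in order to flag the claim next to its text.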

Data Security

01

Only the pre-filtered digest (~20-50K tokens) reaches the LLM — not the full bundle

02

--dry-run shows the exact prompt before sending

03

--no-llm runs entirely locally with zero external calls