My Approach

Extending, Not Forking

Ghostfolio is an active open-source project with its own maintainers and release cadence. The challenge was adding three major feature systems — AI education, tax computation, translation — without creating merge conflicts or coupling to upstream internals.

Every line of extension code lives in dedicated directories. Upstream files are untouched except for explicit wiring points (module imports, route registration), each marked with // PROJECT: AI Education comments. The result: zero merge conflicts across months of upstream updates.
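A wiring point might look like the following sketch (the module path and names here are assumptions for illustration, not the actual upstream file):

```typescript
// Illustrative wiring point in an upstream NestJS module: the only change
// to upstream code is a marked import plus one line of registration.
import { AiEducationModule } from './extensions/ai-education/ai-education.module'; // PROJECT: AI Education

@Module({
  imports: [
    // ...existing upstream modules, untouched...
    AiEducationModule, // PROJECT: AI Education
  ],
})
export class AppModule {}
```

Because every extension touch point carries the same marker comment, upstream diffs during a rebase are trivially auditable.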

Fat Tools, Thin Agent

Financial computation cannot hallucinate. If the FIFO capital gains engine produces incorrect numbers, someone files incorrect taxes. So the architecture inverts the typical AI pattern: every computation is a deterministic, unit-tested pure function. The LLM's job is routing (pick the right tool) and narration (explain what the numbers mean). It never computes.

This “fat tools, thin agent” pattern means high confidence in correctness comes from conventional software testing, not prompt engineering. The FIFO engine alone has 42 unit tests covering lot matching, fee proration, FX conversion, and edge cases.
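The core of the FIFO engine can be sketched as a pure function. The types and names below are assumptions, and the real engine additionally handles fee proration and FX conversion, but the shape of the computation is this:

```typescript
// Minimal FIFO lot-matching sketch (hypothetical types; the real engine
// also prorates fees and converts currencies).
interface Lot {
  quantity: number; // shares remaining in this lot
  unitCost: number; // cost basis per share
}

interface RealizedGain {
  quantitySold: number;
  costBasis: number;
  proceeds: number;
  gain: number;
}

// Pure function: consumes lots oldest-first and returns the realized gain
// plus the remaining (unconsumed) lots. Never mutates its inputs.
function matchFifo(
  lots: Lot[],
  sellQuantity: number,
  sellPrice: number
): { realized: RealizedGain; remainingLots: Lot[] } {
  let toSell = sellQuantity;
  let costBasis = 0;
  const remainingLots: Lot[] = [];

  for (const lot of lots) {
    if (toSell <= 0) {
      remainingLots.push({ ...lot });
      continue;
    }
    const consumed = Math.min(lot.quantity, toSell);
    costBasis += consumed * lot.unitCost;
    toSell -= consumed;
    if (lot.quantity > consumed) {
      remainingLots.push({
        quantity: lot.quantity - consumed,
        unitCost: lot.unitCost,
      });
    }
  }

  if (toSell > 0) {
    throw new Error('Sell quantity exceeds available lots');
  }

  const proceeds = sellQuantity * sellPrice;
  return {
    realized: {
      quantitySold: sellQuantity,
      costBasis,
      proceeds,
      gain: proceeds - costBasis,
    },
    remainingLots,
  };
}
```

Because the function is deterministic and side-effect free, each edge case (partial lot consumption, oversells, empty lot lists) is a one-line unit test rather than a prompt-engineering exercise.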

Safety as Infrastructure

Financial AI has a hard constraint: it must not give investment advice. Three safety interceptors run on every response — advice boundary detection, hallucination checking (are claims grounded in tool results?), and output validation. These are NestJS interceptors, not prompt instructions — they execute as code, not suggestions.
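The hallucination check can be sketched as a pure grounding function that the interceptor wraps. This is a simplified assumption of the real logic, using the idea that every numeric claim in a response must appear somewhere in the tool results:

```typescript
// Hedged sketch of the grounding check: every dollar figure or percentage
// the model states must also appear in the tool results it drew from.
// Function names and the exact matching rule are assumptions.
function extractNumericClaims(text: string): string[] {
  return text.match(/\$?\d[\d,]*(?:\.\d+)?%?/g) ?? [];
}

function normalizeClaim(claim: string): string {
  return claim.replace(/[$,%]/g, '');
}

function isGrounded(response: string, toolResults: string[]): boolean {
  const grounded = new Set(
    toolResults.flatMap(extractNumericClaims).map(normalizeClaim)
  );
  return extractNumericClaims(response).every((claim) =>
    grounded.has(normalizeClaim(claim))
  );
}
```

Running this as an interceptor rather than a prompt instruction means a response with an ungrounded number is caught in code before it reaches the user, regardless of how the model was prompted.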

The eval suite backs this up with 200+ scenarios across 9 categories. The CI pipeline runs evals on every change to AI education code, gating deployment on safety scores.
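The deployment gate reduces to a small check over eval results. The shape below is an assumption (field names, the per-category mean, and the 0.95 threshold are illustrative), but it shows how a safety score becomes a hard CI gate rather than a dashboard number:

```typescript
// Hypothetical CI gate: deployment proceeds only when every eval
// category's mean safety score clears the threshold. Names and the
// threshold value are assumptions for illustration.
interface EvalResult {
  category: string;
  safetyScore: number; // 0..1, higher is safer
}

function gateDeployment(results: EvalResult[], threshold = 0.95): boolean {
  const byCategory = new Map<string, number[]>();
  for (const r of results) {
    const scores = byCategory.get(r.category) ?? [];
    scores.push(r.safetyScore);
    byCategory.set(r.category, scores);
  }
  for (const scores of byCategory.values()) {
    const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
    if (mean < threshold) return false; // one failing category blocks the deploy
  }
  return true;
}
```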

Feature Flagging

Every feature is gated by an environment variable (ENABLE_FEATURE_AI_EDUCATION, etc.) that defaults to disabled. Features can be toggled in production without redeployment. This gives confidence to ship incrementally — a half-built feature behind a disabled flag is safe to deploy.
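The default-disabled behavior comes down to a strict equality check. The helper below is a sketch (the function name is an assumption; the variable name comes from the text):

```typescript
// Sketch of a default-disabled feature flag check. Anything other than an
// explicit "true" leaves the feature off, so an unset or misspelled
// variable fails safe.
function isFeatureEnabled(
  flag: string,
  env: Record<string, string | undefined> = process.env
): boolean {
  return env[flag] === 'true';
}

// isFeatureEnabled('ENABLE_FEATURE_AI_EDUCATION') is false unless the
// variable is set to exactly "true" in the environment.
```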

Observability

Every chat request produces a Braintrust span — latency, token usage, tool calls, safety scores, and user feedback (thumbs up/down). Langfuse provides additional tracing for the full request lifecycle. The eval dashboard gives a real-time view of quality across all scoring dimensions.
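The span payload and the wrapper that times each request can be sketched as follows. This is a hypothetical shape, not the Braintrust or Langfuse SDK API; the real code forwards these fields through those SDKs:

```typescript
// Hypothetical span shape for one chat request. Field names mirror the
// text above; the emit callback stands in for the real SDK calls.
interface ChatSpan {
  latencyMs: number;
  toolCalls: string[];
  tokenUsage: { prompt: number; completion: number };
  safetyScores: Record<string, number>;
  feedback?: 'up' | 'down';
}

type SpanFields = Omit<ChatSpan, 'latencyMs'>;

// Wraps a request handler, measures wall-clock latency, and emits one
// span per request with the fields the handler collected.
function withSpan<T>(
  run: () => { result: T; span: SpanFields },
  emit: (span: ChatSpan) => void
): T {
  const start = Date.now();
  const { result, span } = run();
  emit({ ...span, latencyMs: Date.now() - start });
  return result;
}
```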

Trade-offs

  • FIFO only: No LIFO, HIFO, or specific identification. FIFO is the IRS default and covers most retail investors. Adding lot selection methods is a v2 feature.
  • OpenAI direct over multi-provider: Started with configurable providers (OpenAI, OpenRouter) but standardized on OpenAI for simplicity. The abstraction layer is still there if needed.
  • Self-hosted NLLB: Running the translation model on a home GPU via Cloudflare Tunnel. Not production-grade infrastructure, but zero marginal cost for a personal project.
  • Regex advice boundary (~5% false positive): Patterns like /\b(buy|sell)\b/i occasionally flag educational text. Acceptable because false positives are safe — they're caught and softened, not blocking.
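The advice-boundary check and its softening behavior can be sketched as below. Only the `/\b(buy|sell)\b/i` pattern comes from the text; the extra alternations and the prepended-disclaimer form of "softening" are assumptions:

```typescript
// Sketch of the regex advice boundary. False positives are softened with a
// disclaimer rather than blocked, so over-matching is safe by design.
// Only buy|sell is from the described pattern; the rest is illustrative.
const ADVICE_PATTERN = /\b(buy|sell|should invest|recommend)\b/i;

function applyAdviceBoundary(response: string): string {
  if (ADVICE_PATTERN.test(response)) {
    return (
      'Note: this is educational information, not investment advice.\n\n' +
      response
    );
  }
  return response;
}
```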