My Approach

Problem Understanding

Biblical language study requires cross-referencing dozens of resources simultaneously — lexicons, manuscripts, translations, commentaries, cross-references. Existing tools are either too narrow (one lexicon at a time), too fragmented (separate apps for each resource), or web-dependent (unusable offline). Seminary students need a single workspace that integrates everything and works without an internet connection.

Why Desktop

The dataset is ~2GB across 10M+ records. Querying this over a network adds unacceptable latency for the kind of rapid cross-referencing scholars do — selecting a Hebrew word and instantly seeing every occurrence, lexicon entry, and manuscript variant. A local SQLite database with FTS5 delivers sub-millisecond lookups. Tauri keeps the binary small (~15MB) while providing native Rust performance for the query engine.
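
To make the lookup path concrete, here is a minimal sketch of an FTS5 occurrence query. It uses Python's built-in sqlite3 module for brevity, not the Rust query engine, and it assumes a SQLite build with FTS5 enabled. The table name, column names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per word occurrence. Names are illustrative only.
conn = sqlite3.connect(":memory:")  # a real app would open a database file on disk
conn.execute(
    "CREATE VIRTUAL TABLE occurrences USING fts5(ref UNINDEXED, lemma, gloss)"
)
conn.executemany(
    "INSERT INTO occurrences (ref, lemma, gloss) VALUES (?, ?, ?)",
    [
        ("Gen 1:1", "bara", "create"),      # transliterated sample data
        ("Gen 1:1", "elohim", "God"),
        ("Isa 40:26", "bara", "create"),
    ],
)


def occurrences_of(lemma: str) -> list[str]:
    """Return the references of every row whose lemma column matches."""
    escaped = lemma.replace('"', '""')      # FTS5 escapes quotes by doubling them
    rows = conn.execute(
        "SELECT ref FROM occurrences WHERE occurrences MATCH ? ORDER BY rowid",
        (f'lemma:"{escaped}"',),            # column filter restricts the match to lemma
    )
    return [ref for (ref,) in rows]
```

Calling `occurrences_of("bara")` returns the two references that contain that lemma, in insertion order.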

Progressive Disclosure

Ten analytical domains contain too much information to display at once. The UI follows a strict progressive disclosure pattern: every domain starts as a one-line summary, and expanding it reveals the full analysis. The workspace hierarchy (Verse → Pericope → Portfolio → LLM Analysis) lets researchers zoom from a single word to a full passage study without losing context.

This pattern also helps with performance. Only expanded domains trigger their full queries. A collapsed lexical domain doesn't load all 12 lexicon entries until the user asks for them.
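
The deferred-query idea can be sketched in a few lines. This is a language-neutral illustration written in Python rather than the app's React code. The `DomainPanel` class and the `load_lexicon_entries` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DomainPanel:
    """One analytical domain: a cheap summary plus a deferred full query."""

    name: str
    summary: str
    load_full: Callable[[], list]                 # hypothetical query callback
    _cache: Optional[list] = field(default=None, repr=False)

    def expand(self) -> list:
        # Run the expensive query on first expansion only, then reuse the result.
        if self._cache is None:
            self._cache = self.load_full()
        return self._cache


calls = []


def load_lexicon_entries() -> list:
    calls.append("lexical")                       # record that the query really ran
    return [f"lexicon entry {i}" for i in range(1, 13)]


panel = DomainPanel(
    "lexical", "12 lexicon entries available", load_full=load_lexicon_entries
)
```

Until `panel.expand()` is called, `calls` stays empty. Expanding the panel a second time reuses the cached entries instead of querying again.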

Zero-Code Extensibility

Biblical scholarship continuously produces new resources — updated lexicons, newly digitized manuscripts, fresh translations. The ETL pipeline is driven entirely by TOML configuration files. Adding a new lexicon means writing a TOML file that describes the source format, schema mapping, and indexing rules. No code changes needed.

More than 50 Python ETL loaders, one per data source, handle ingestion from diverse formats (CSV, JSON, XML, custom academic formats). Each loader is isolated, testable, and configured via its TOML file.

Multi-Tradition Text Comparison

Biblical texts exist in multiple manuscript traditions — Hebrew Masoretic Text, Dead Sea Scrolls, Greek Septuagint, Byzantine text type, and more. Seminary aligns these traditions at the verse level, allowing scholars to compare variants side by side. The textual-critical domain surfaces differences automatically, highlighting where traditions diverge.
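
Verse-level alignment and divergence detection can be sketched as a grouping step followed by a comparison. The tradition labels below follow common abbreviations (MT, DSS, LXX). The references and readings are placeholders, not real manuscript data, and the function names are hypothetical.

```python
from collections import defaultdict

# Placeholder readings as (tradition, verse reference, text). Not real data.
READINGS = [
    ("MT", "Verse A", "reading one"),
    ("DSS", "Verse A", "reading two"),
    ("LXX", "Verse A", "reading two"),
    ("MT", "Verse B", "same reading"),
    ("LXX", "Verse B", "same reading"),
]


def align(readings):
    """Group readings by verse so traditions can be compared side by side."""
    by_verse = defaultdict(dict)
    for tradition, ref, text in readings:
        by_verse[ref][tradition] = text
    return dict(by_verse)


def divergent(aligned):
    """Return the verses where at least two traditions disagree."""
    return [
        ref
        for ref, per_tradition in aligned.items()
        if len(set(per_tradition.values())) > 1
    ]


aligned = align(READINGS)
```

Here `divergent(aligned)` flags only "Verse A", because the traditions agree on "Verse B". A real comparison would also need text normalization before testing equality, which this sketch leaves out.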

Trade-offs

  • Zustand over Redux: The 10 domain components each hold independent state, and Zustand's store-per-domain pattern is simpler than Redux slices for this use case.
  • SQLite over PostgreSQL: Portability wins. The entire dataset ships with the app — no database server to install, no connection strings, no cloud dependency.
  • Regex parsers for COBOL-style academic formats: Many biblical data sources use non-standard formats from the 1990s. Simple regex parsers with good test coverage handle these reliably.
  • Multi-language codebase: Rust for the performance-critical backend, React for the UI, Python for ETL. That means three test suites and three build systems, but the complexity is justified: each language plays to its strength.
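
For the regex-parser bullet above, here is a sketch of what such a parser might look like. It assumes "COBOL-style" means fixed-width record lines. The layout (3-character book code, 3-digit chapter, 3-digit verse, then free text) is invented for illustration and does not describe any specific source.

```python
import re

# Hypothetical fixed-width layout: book code, chapter, verse, then free text.
RECORD = re.compile(
    r"^(?P<book>[A-Z0-9]{3})(?P<chapter>\d{3})(?P<verse>\d{3})\s+(?P<text>.+?)\s*$"
)


def parse_line(line: str):
    """Parse one record line, or return None if it does not fit the layout."""
    match = RECORD.match(line)
    if match is None:
        return None  # the caller decides how to report malformed lines
    return {
        "book": match["book"],
        "chapter": int(match["chapter"]),
        "verse": int(match["verse"]),
        "text": match["text"],
    }
```

Returning `None` for malformed lines keeps the parser easy to test: each fixture line either yields a known dictionary or is rejected.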

What I'd Build Next

Next would come LLM-powered analysis at the portfolio level, summarizing research findings across multiple passages and domains; a collaborative layer where study groups can share annotations and workspaces; and a morphological search engine that finds words by grammatical pattern (“all Hiphil imperatives in Isaiah”) rather than just lexical form.