LegacyLens
Custom RAG pipeline (~3K LOC, no LangChain) for querying legacy codebases across 7 languages. Parallel dense + sparse encoding with hybrid search (Qdrant RRF), cross-encoder reranking, and streaming LLM generation. 7 language-aware parsers (COBOL, C, Fortran, JCL, PL/I, RPG, plaintext) with syntax-boundary-preserving chunking. Metadata-header embedding strategy compensates for COBOL's absence from training data. 22,836 chunks across 6 codebases, retrieval in 149–259ms, full query in under 1.5s.
PythonFastAPIQdrantCerebrasVoyage AISPLADE
