RagVault – Offline AI-Powered RAG Knowledge Portal

Python · FastAPI · React · TypeScript · FAISS · Mistral Nemo 12B · Llamafile · LangChain · Nomic-Embed-v1.5 · BGE-Reranker-v2-M3 · BM25 · JWT · Tesseract OCR

Built a 100% offline, enterprise-grade RAG system delivering GPT-4-level document intelligence with zero data leaving the network, solving the privacy-vs-AI dilemma for the 78% of enterprises blocked by data regulations.
Engineered a hybrid retrieval pipeline combining BM25 keyword search (0.3×) with FAISS semantic vector search (0.7×) plus BGE-Reranker-v2-M3 neural reranking, achieving 87% Precision@5 (+19% over vector-only).
Integrated Mistral Nemo 12B via Llamafile (single binary, no Docker/cloud), Nomic-Embed-v1.5 for embeddings, and Tesseract/PaddleOCR for multi-format document ingestion (PDF, DOCX, TXT, PNG, JPG).
Implemented enterprise RBAC with department-isolated access control (bcrypt password hashing + JWT authentication), ensuring HR, Finance, and Engineering can access only their own document scopes.
Achieved 38% lower query latency than cloud RAG (2.8 s vs 4.5 s), $0 operating cost per query, and a 10-second cold start on a single machine with 8 GB RAM.
Designed an offline-first scaling roadmap: FAISS sharding, Llamafile load balancing, and an on-prem K3s cluster, to support 1M+ documents and 1000+ users without ever touching the cloud.
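
A minimal sketch of how the 0.3/0.7 weighting between BM25 and FAISS scores could be applied before reranking. Only the two weights come from the project description; the min-max normalization step, the function names (`min_max_normalize`, `hybrid_fuse`), and the dictionary-based inputs are assumptions made for illustration, and the neural reranking stage is not shown.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1] so BM25 and vector scores are comparable.

    Assumption: the project description does not say how scores are normalized.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; treat them as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    bm25_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    bm25_weight: float = 0.3,
    vector_weight: float = 0.7,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Combine keyword and semantic scores with a weighted sum.

    A document missing from one retriever contributes 0.0 for that side.
    """
    bm25_n = min_max_normalize(bm25_scores)
    vec_n = min_max_normalize(vector_scores)
    fused = {
        doc_id: bm25_weight * bm25_n.get(doc_id, 0.0)
        + vector_weight * vec_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vec_n)
    }
    # Sort by fused score (descending), breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    bm25 = {"a": 10.0, "b": 5.0, "c": 0.0}
    vec = {"b": 0.9, "c": 0.5, "d": 0.1}
    for doc_id, score in hybrid_fuse(bm25, vec):
        print(doc_id, round(score, 3))
```

In a pipeline like the one described, the fused top-k candidates would then be passed to the reranker, which reorders them using the query and the full passage text.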
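
The department isolation can be illustrated with a short sketch that filters retrieved chunks by the caller's department before they reach the LLM. The `Chunk` type, its `department` field, and `filter_by_department` are hypothetical names; in a system like this the department value would presumably come from a verified JWT claim, and token verification is deliberately left out here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage tagged with the department that owns it (hypothetical schema)."""
    doc_id: str
    department: str
    text: str


def filter_by_department(chunks: List[Chunk], user_department: str) -> List[Chunk]:
    """Keep only chunks owned by the user's department.

    Assumption: user_department was extracted from an already-verified token.
    """
    return [c for c in chunks if c.department == user_department]


if __name__ == "__main__":
    retrieved = [
        Chunk("hr-001", "HR", "Leave policy ..."),
        Chunk("fin-042", "Finance", "Q3 budget ..."),
        Chunk("hr-007", "HR", "Onboarding checklist ..."),
    ]
    for chunk in filter_by_department(retrieved, "HR"):
        print(chunk.doc_id)
```

Filtering after retrieval is the simplest option to show; keeping a separate index per department would be an alternative that avoids ever loading out-of-scope chunks.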