Fully local RAG system for a law firm. Ingests contracts, judicial records, audio transcriptions, and other documents, then generates structured legal briefs via agentic retrieval. Nothing leaves the lawyer's machine.
Domain-specific retrieval informed by a legal evidence taxonomy, a section-level index filter that removes low-signal boilerplate before embedding, and a structured layer for deterministic queries over case records and contract fields.
Each stage runs independently and in isolation. The hard separation is deliberate: the end user is a non-technical lawyer.
Extracts and normalizes all document types: PDF, DOCX, Excel, HTML, email, JSON, and compressed archives. Audio and video files are transcribed locally via speech-to-text. Source files are never modified.
Chunks are embedded with rich metadata (document type, section, probative relevance score). High-signal contract sections are prioritized; boilerplate is filtered before indexing. Structured entities (case numbers, parties, key financial terms) are extracted in parallel into a relational layer for deterministic queries.
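To make the "relational layer for deterministic queries" concrete, here is a minimal sketch that extracts case numbers and amounts with regular expressions and loads them into SQLite. The identifier formats, table layout, and function names are all hypothetical. The source does not describe the real extractor or schema, and production case-number formats depend on the courts involved.

```python
import re
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical patterns for illustration only; real formats depend on the
# courts and contracts involved, and a production extractor would be sturdier.
PATTERNS = {
    "case_number": re.compile(r"\b\d{4}-[A-Z]{2}-\d{5}\b"),
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
}

def extract_entities(doc_id: str, text: str) -> List[Tuple[str, str, str]]:
    """Return one (doc_id, kind, value) row per pattern match in the text."""
    rows = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            rows.append((doc_id, kind, match.group()))
    return rows

def build_entity_store(docs: Dict[str, str]) -> sqlite3.Connection:
    """Load extracted entities into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (doc_id TEXT, kind TEXT, value TEXT)")
    for doc_id, text in docs.items():
        conn.executemany(
            "INSERT INTO entities VALUES (?, ?, ?)", extract_entities(doc_id, text)
        )
    conn.commit()
    return conn

def documents_for_case(conn: sqlite3.Connection, case_number: str) -> List[str]:
    """Deterministic lookup: every document that mentions an exact case number."""
    cur = conn.execute(
        "SELECT DISTINCT doc_id FROM entities "
        "WHERE kind = 'case_number' AND value = ? ORDER BY doc_id",
        (case_number,),
    )
    return [row[0] for row in cur.fetchall()]
```

An exact-match SQL lookup always returns the same answer for the same case number. That guarantee is the point of keeping these fields out of the vector store.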
Hybrid retrieval (dense + sparse + reranker). Structured queries bypass the vector store entirely. For brief generation, each section independently retrieves its own evidence and cites sources. Runs locally or via API. Output: .docx + HTML with inline citations.
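The statement that structured queries bypass the vector store implies a routing step in front of retrieval. The sketch below shows one simple way to do that with keyword hints. The hint phrases, the case-number format, and the `Route` type are hypothetical, and the actual project may route differently, for example by letting the agent choose a tool. A case number found in a semantic question is kept as a metadata filter for the hybrid path.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical case-number format.
CASE_NUMBER = re.compile(r"\b\d{4}-[A-Z]{2}-\d{5}\b")
# Hypothetical phrases that signal an aggregate or lookup question.
STRUCTURED_HINTS = ("how many", "list all", "total", "count of")

@dataclass
class Route:
    path: str                          # "structured" (SQL) or "hybrid" (dense + sparse)
    case_number: Optional[str] = None  # metadata filter when a case is named

def route(query: str) -> Route:
    """Send lookup/aggregate questions to SQL; everything else to hybrid retrieval."""
    match = CASE_NUMBER.search(query)
    case_number = match.group() if match else None
    lowered = query.lower()
    if any(hint in lowered for hint in STRUCTURED_HINTS):
        return Route("structured", case_number)
    return Route("hybrid", case_number)
```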
Six sequential stages. Dense and sparse retrieval run in parallel; fusion merges them before a cross-encoder reranker. A final reorder step counteracts attention degradation when relevant content is buried in the middle of a long context window.
Multilingual dense vector — handles domain-specific legal terminology natively
Approximate nearest-neighbor search with metadata filtering by document type and case
Exact term retrieval — critical for legal identifiers, clause references, and proper nouns
Merges dense and sparse rankings without score normalization
Second-pass relevance scoring — evaluates query and chunk jointly before context injection
Highest-relevance chunks placed at start and end of the context window
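Two stages above are pure list manipulation and can be sketched directly. The fusion stage "merges rankings without score normalization," which matches Reciprocal Rank Fusion (RRF). The source does not name the method, so RRF and its commonly used `k = 60` are assumptions here. The reorder stage interleaves a best-first list so the strongest chunks open and close the context. The cross-encoder that runs between these two steps is omitted because it needs a model.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of chunk IDs using rank positions only.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in, so
    dense similarity scores and sparse BM25 scores never share a scale.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

def reorder_for_long_context(ranked: List[str]) -> List[str]:
    """Put the strongest chunks at both ends and the weakest in the middle.

    Expects a best-first list (e.g. reranker output). Items alternate between
    the front and the back, so rank 1 opens the context, rank 2 closes it,
    and the lowest-ranked chunks land in the middle.
    """
    front: List[str] = []
    back: List[str] = []
    for i, chunk_id in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk_id)
    return front + back[::-1]

if __name__ == "__main__":
    dense = ["a", "b", "c"]
    sparse = ["c", "a", "d"]
    print(reciprocal_rank_fusion([dense, sparse]))  # ['a', 'c', 'b', 'd']
    print(reorder_for_long_context(["r1", "r2", "r3", "r4", "r5"]))
    # ['r1', 'r3', 'r5', 'r4', 'r2']
```

Because RRF uses only rank positions, a chunk ranked well by both retrievers (here `c`) can outrank a chunk that only one retriever placed first.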
Contracts are dense with boilerplate that adds retrieval noise without legal value. A section classifier runs before indexing, filtering low-signal clauses and assigning each chunk a probative relevance score. Result: significantly smaller index with higher precision on the clauses that actually matter in litigation.
| Level | Section types | Action |
|---|---|---|
| Very High | Financial obligations, fee structures, disclosure requirements, IP transfer | Embedded + max priority |
| High | Rescission, non-compete, territorial exclusivity, termination triggers | Embedded + normal priority |
| Medium | Franchisee obligations, support commitments, supply requirements | Embedded + lower priority |
| Low / Very Low | Jurisdiction, confidentiality boilerplate, severability, waiver | Discarded from index |
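The filtering policy in the table can be expressed as a pre-indexing pass. This is a sketch only. The real system uses a section classifier, which is replaced here by a hypothetical lookup keyed on paraphrased section names. The numeric priority weights are also assumptions, since the source says only "max," "normal," and "lower." Unknown section types are kept at medium priority rather than dropped, which is another assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

# Stand-in for the section classifier: a lookup built from the table above.
# Key names are hypothetical paraphrases of the table's section types.
LEVEL_BY_SECTION = {
    "financial_obligations": "very_high",
    "fee_structure": "very_high",
    "disclosure_requirements": "very_high",
    "ip_transfer": "very_high",
    "rescission": "high",
    "non_compete": "high",
    "territorial_exclusivity": "high",
    "termination_triggers": "high",
    "franchisee_obligations": "medium",
    "support_commitments": "medium",
    "supply_requirements": "medium",
    "jurisdiction": "low",
    "confidentiality": "low",
    "severability": "low",
    "waiver": "low",
}
DEFAULT_LEVEL = "medium"  # assumption: unknown sections are kept, not discarded
# Hypothetical numeric weights for "max", "normal", and "lower" priority.
PRIORITY = {"very_high": 1.0, "high": 0.7, "medium": 0.4}

@dataclass
class Chunk:
    doc_id: str
    section_type: str
    text: str

def filter_for_index(chunks: List[Chunk]) -> List[Dict[str, object]]:
    """Drop low-signal sections and attach relevance metadata to the rest."""
    kept = []
    for chunk in chunks:
        level = LEVEL_BY_SECTION.get(chunk.section_type, DEFAULT_LEVEL)
        if level not in PRIORITY:  # any level without a priority is never embedded
            continue
        kept.append({
            "doc_id": chunk.doc_id,
            "section_type": chunk.section_type,
            "level": level,
            "priority": PRIORITY[level],
            "text": chunk.text,
        })
    return kept
```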
Template-driven brief generation covers the main document types used in litigation support. Each brief is assembled section by section, with independent retrieval and inline citations per claim.
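Section-by-section assembly with per-claim citations can be sketched as a loop over a template. Each section runs its own retrieval, and the draft's `[n]` markers are checked against that section's evidence. The `Retriever`/`Writer` interfaces, the `[n]` citation convention, and the source-string format are hypothetical plug points, not the project's actual API. Rendering to .docx and HTML is omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Evidence:
    source: str  # e.g. file name plus page or timestamp (hypothetical format)
    text: str

# Hypothetical plug points: any retriever (e.g. the hybrid pipeline) and any
# generator that cites evidence as [n] can be passed in.
Retriever = Callable[[str], List[Evidence]]
Writer = Callable[[str, Dict[int, Evidence]], str]

CITATION = re.compile(r"\[(\d+)\]")

def render_section(title: str, query: str, retrieve: Retriever, write: Writer) -> str:
    """Retrieve evidence for one section, draft it, and verify its citations."""
    evidence = retrieve(query)  # independent retrieval for this section only
    numbered = dict(enumerate(evidence, start=1))
    body = write(title, numbered)
    cited = {int(n) for n in CITATION.findall(body)}
    unknown = cited - set(numbered)
    if unknown:  # reject citations that point at evidence that was never retrieved
        raise ValueError(f"{title}: cites missing evidence {sorted(unknown)}")
    sources = "\n".join(f"[{n}] {numbered[n].source}" for n in sorted(cited))
    return f"{title}\n{body}\n{sources}"

def assemble_brief(template: Dict[str, str], retrieve: Retriever, write: Writer) -> str:
    """template maps each section title to the query that gathers its evidence."""
    return "\n\n".join(
        render_section(title, query, retrieve, write)
        for title, query in template.items()
    )
```

Validating citations per section means a drafted claim cannot cite a document that section never retrieved.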
Financial modeling, DCF valuation, performance indicators, and indemnification framework.
Statistical analysis of the opposing party's judicial history across multiple courts and case types.
Reconstruction of financial statements, balance sheet, and performance indicators vs. disclosed projections.
Documented evidence of operational support failures and contractual non-conformance.
Fund spend analysis vs. contractual commitments, campaign performance, and compliance audit.
Territory definition, market saturation, and proximity analysis.
Client project (confidential). Not open source.