Cognitive RAG for regulated work
Research notes on retrieval systems that preserve references, provenance, and reasoning context for professional review.
Cognitive RAG: Notes on Mendel’s Retrieval Architecture
A self-employed contractor asking Mendel what they owe the tax authority this year is not asking a single question. They are expressing a knowledge deficit that spans the Income Tax Act, the Health Insurance Act, relevant Czech National Bank guidance on self-employment income classification, and, depending on the nature of their work arrangement, case law on what constitutes dependent work under §2358 of the Civil Code. Vector similarity search over that natural-language query will retrieve something: topically close chunks. What it will not produce is the relational context across four interlocked legal domains, because standard RAG has no concept of relational context.
This is the core retrieval problem we are designing around.
Standard RAG pipelines treat the user query as a fixed, complete statement of the information need. They embed it, find the nearest neighbors in a flat vector index, and pass the retrieved chunks to the language model for generation. The architecture works well when queries map cleanly to a single source. It degrades under exactly the conditions that define Czech legal and financial queries: compressed, ambiguous natural language encoding a multi-hop information need that the user cannot fully articulate, because they lack the vocabulary of the domain they are trying to understand.
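The baseline described above fits in a few lines. This is a toy under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the "index" is a plain Python list, and `standard_rag_retrieve` and `build_prompt` are illustrative names, not any library's API. What it shows is that the raw query is embedded once and matched against every chunk, with nothing representing how the retrieved chunks relate to each other.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def standard_rag_retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # The raw query is embedded once, as-is, and scored against every chunk in a flat index.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Retrieved chunks are concatenated with nothing recording how they relate to each other.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"
```

With a query like "what tax do I owe as self-employed", an income-tax chunk ranks first and a health-insurance chunk second, purely on word overlap. Both are topically close; nothing in the loop records how the two statutes interact.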
The information science literature has a name for this state. Belkin called it an anomalous state of knowledge (ASK), and Ingwersen’s cognitive IR theory builds on the concept: the user recognizes a gap in their understanding but cannot formulate a well-structured query to address it. The query they submit is, in Ingwersen’s framing, a heavily compressed label that obscures the true depth of the information need. Standard RAG takes that label at face value and retrieves against its surface form.
Thinking, Fast and Slow
Daniel Kahneman’s dual-process theory from Thinking, Fast and Slow maps onto this problem more directly than most AI architecture discussions acknowledge. The framework distinguishes System 1 cognition (fast, associative, pattern-driven) from System 2 (deliberative, sequential, resource-intensive). Transformer-based LLMs are System 1 engines by architecture: they generate statistically probable continuations based on learned associations. They are fast, fluent, and structurally prone to what Kahneman calls substitution: faced with a target question it cannot confidently answer, the model answers a related but easier heuristic question instead. The output reads plausibly. The question that was actually asked does not get answered.
Standard RAG augments System 1 with retrieved context. If the retrieval is undirected, driven by surface-level semantic similarity against an undifferentiated index, you have added more text for the model to be fluent about without adding any deliberative structure. The model is still operating in System 1 mode.
Cognitive RAG is an architectural attempt to force System 2 behavior into the pipeline before the model generates a token. Kuhlthau’s Information Search Process model provides the structural map: users move through initiation and uncertainty, into exploration with its characteristic confidence dip, where inconsistent information creates confusion, and toward formulation, where a coherent hypothesis emerges. The architecture needs to mirror these stages. When the user is still exploring, the system should expand the query and retrieve associatively. When the user has reached a formulation, the system should rerank and focus. Standard RAG applies a single retrieval strategy regardless of where the user actually is in that process.
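One way to read the stage-dependent behavior is as a retrieval plan chosen per stage. A minimal sketch, assuming the user's stage has already been estimated upstream (how to estimate it is out of scope) and that a hypothetical `related` term map exists for expansion. Treating initiation like exploration is also an assumption, and the `top_k` values are placeholders.

```python
from enum import Enum


class ISPStage(Enum):
    # The three Kuhlthau stages the text maps to retrieval behavior.
    INITIATION = "initiation"
    EXPLORATION = "exploration"
    FORMULATION = "formulation"


def expand_query(terms: list[str], related: dict[str, list[str]]) -> list[str]:
    # Associative expansion: append related terms after each original, dropping duplicates.
    out: list[str] = []
    for term in terms:
        for candidate in [term, *related.get(term, [])]:
            if candidate not in out:
                out.append(candidate)
    return out


def plan_retrieval(stage: ISPStage, terms: list[str], related: dict[str, list[str]]) -> dict:
    if stage in (ISPStage.INITIATION, ISPStage.EXPLORATION):
        # Assumption: an uncertain user gets a wide, expanded candidate set and no reranking yet.
        # top_k values are placeholders, not tuned numbers.
        return {"terms": expand_query(terms, related), "top_k": 50, "rerank": False}
    # A formulated need keeps the query tight and reranks a short list.
    return {"terms": list(terms), "top_k": 10, "rerank": True}
```

The point of the split is that the same query terms produce two different retrieval jobs depending on where the user is, rather than one fixed nearest-neighbor lookup.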
The recent AAAI paper Human Cognition Inspired RAG with Knowledge Graph for Complex Problem Solving (arXiv:2503.06567) formalizes this into a concrete architecture: dual-hypergraph retrieval, mind-map query decomposition, and adversarial self-verification. The Cognitive RAG pipeline we are building for Mendel draws on this research and adapts it to the specific constraints of Czech public data.
The Architecture
The first intervention is query decomposition before retrieval. The raw user query does not go directly to a vector search. It first passes through a decomposition layer that constructs a structured problem map: the intermediate legal entities, regulatory relationships, and causal dependencies the question implies. This gives the retrieval engine a blueprint rather than a keyword. It also gives the verifier, later in the pipeline, something to check the final answer against.
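A structured problem map could be as simple as sub-questions carrying entities and dependencies. The sketch assumes the decomposition output has already been parsed into this shape; the classes `SubQuestion` and `ProblemMap` are hypothetical. It uses Python's standard `graphlib.TopologicalSorter` to order retrieval so prerequisites come first, and reuses the same nodes as the verifier's checklist.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


# Hypothetical structure for the decomposition layer's output.
@dataclass
class SubQuestion:
    id: str
    text: str
    entities: list[str]
    depends_on: list[str] = field(default_factory=list)


@dataclass
class ProblemMap:
    query: str
    nodes: list[SubQuestion]

    def retrieval_order(self) -> list[str]:
        # Prerequisites first: a sub-question is retrieved only after the ones it depends on.
        graph = {n.id: set(n.depends_on) for n in self.nodes}
        return list(TopologicalSorter(graph).static_order())

    def checklist(self) -> list[str]:
        # The same map doubles as the verifier's evidence checklist, one item per sub-question.
        return [n.text for n in self.nodes]


# Hand-built for illustration; in the pipeline an LLM decomposition step would produce this.
contractor_map = ProblemMap(
    query="What do I owe this year as a self-employed contractor?",
    nodes=[
        SubQuestion("tax", "Which income tax rules apply to this income?", ["Income Tax Act"], ["work_type"]),
        SubQuestion("work_type", "Is the arrangement self-employment or dependent work?", ["dependent work case law"]),
        SubQuestion("health", "Which health insurance payments follow from that income?", ["Health Insurance Act"], ["tax"]),
    ],
)
```

Here `contractor_map.retrieval_order()` puts `work_type` before `tax` before `health`. A cyclic decomposition raises `graphlib.CycleError`, which is arguably the right failure mode: a map with circular dependencies is not a usable blueprint.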
Retrieval runs against a layered knowledge graph, not a flat index. The graph operates at three levels of granularity.
At the top is a domain graph of roughly fifty to one hundred nodes: the high-level ontological map of Czech law, finance, and real estate, with their institutional relationships. This is the navigation layer. It is small enough to be built manually and maintained. It routes the query into the right subgraph before any expensive retrieval begins.
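Routing through a small, hand-maintained domain graph can be sketched as keyword seeding plus a bounded neighbor expansion. Everything here is hypothetical: the node names, the keyword sets, and the one-hop default are illustrative, not Mendel's actual ontology.

```python
# Hypothetical fragment of the domain graph; node names and keywords are illustrative only.
DOMAIN_GRAPH: dict[str, set[str]] = {
    "tax_law": {"health_insurance", "social_security", "employment_law"},
    "health_insurance": {"tax_law"},
    "social_security": {"tax_law"},
    "employment_law": {"tax_law", "civil_law"},
    "civil_law": {"employment_law", "real_estate"},
    "real_estate": {"civil_law", "cadastre"},
    "cadastre": {"real_estate"},
}

DOMAIN_KEYWORDS: dict[str, set[str]] = {
    "tax_law": {"income tax act"},
    "health_insurance": {"health insurance act"},
    "employment_law": {"dependent work"},
    "cadastre": {"čúzk", "cadastral map"},
}


def route(entities: list[str], hops: int = 1) -> set[str]:
    # Seed with domains whose keywords match an extracted entity, then widen by graph neighbors.
    names = {e.lower() for e in entities}
    selected = {d for d, kws in DOMAIN_KEYWORDS.items() if names & kws}
    frontier = set(selected)
    for _ in range(hops):
        frontier = {n for d in frontier for n in DOMAIN_GRAPH.get(d, set())} - selected
        selected |= frontier
    return selected
```

For an entity like "Income Tax Act", one hop reaches tax, health insurance, social security, and employment law, while cadastral subgraphs are never entered. Because the graph is small, this check is cheap enough to run before any expensive retrieval.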
Below it is an entity graph that grows automatically from the ingestion pipeline. Named entity recognition running over ingested documents extracts statutes, regulatory decisions, institutional actors, and their co-occurrences. §2358 of the Civil Code appearing alongside a specific ČNB regulation in the same document creates a relational edge. That edge strengthens each time the co-occurrence repeats. The entity graph is not designed in advance and it does not need to be. It grows from the data and from real query traffic.
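The co-occurrence rule reduces to counting entity pairs per document. A minimal sketch, assuming NER has already produced a set of normalized entity names per document. `strong_edges` and its `min_count` cutoff are hypothetical additions for deciding which edges are trusted during traversal.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(doc_entities: list[set[str]]) -> Counter:
    # Assumes each set is one document's NER output. Each document adds +1 to every
    # unordered pair it mentions together; sorting makes (A, B) and (B, A) the same edge.
    edges: Counter = Counter()
    for entities in doc_entities:
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def strong_edges(edges: Counter, min_count: int) -> dict[tuple[str, str], int]:
    # Hypothetical cutoff: keep only edges whose co-occurrence has repeated often enough.
    return {pair: n for pair, n in edges.items() if n >= min_count}
```

If §2358 of the Civil Code and a given ČNB regulation (a placeholder here) appear together in two documents, their edge reaches weight 2. Anything they co-occurred with only once stays at weight 1, below a `min_count=2` cutoff.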
At the base is the vector index itself: standard pgvector embeddings, but with each chunk linked back to its entity graph nodes. Retrieval does not scan the full index. It enters through the entity graph, traverses the nodes connected to the decomposed query’s entities, and surfaces only the chunks linked to that relational structure. The approach follows the dual-hypergraph model from the AAAI paper, separating macro-level thematic retrieval (which parts of the legal ontology are relevant) from micro-level entity diffusion (which specific documents connect them).
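Entering retrieval through the entity graph can be sketched in plain Python. Assumptions: the entity graph is an adjacency dict, each entity's linked chunks are known (`entity_chunks`, a hypothetical name), embeddings are plain float lists, and traversal is a hop-bounded breadth-first search. The paper's dual-hypergraph machinery is not modeled here.

```python
import math
from collections import deque


def neighborhood(graph: dict[str, set[str]], seeds: set[str], max_hops: int) -> set[str]:
    # Hop-bounded breadth-first traversal outward from the decomposed query's entities.
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def graph_constrained_search(query_vec, seeds, graph, entity_chunks, chunk_vecs, max_hops=1, k=5):
    reachable = neighborhood(graph, seeds, max_hops)
    # Only chunks linked to a reachable entity become candidates; no other embedding is compared.
    candidates = set().union(*(entity_chunks.get(e, set()) for e in reachable))
    # Highest cosine first; chunk id breaks ties so the order is deterministic.
    ranked = sorted(candidates, key=lambda cid: (-cosine(query_vec, chunk_vecs[cid]), cid))
    return ranked[:k]
```

A chunk can match the query vector perfectly and still never be returned if no path from the query's entities reaches it. In pgvector terms, the candidate filter would become a `WHERE` clause restricting chunk IDs ahead of an `ORDER BY embedding <=> $1 LIMIT k` cosine-distance sort.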
Adaptive routing across model tiers runs alongside retrieval. Query classification and graph traversal use lightweight models. Deep reasoning over retrieved evidence uses a stronger one. The decision about which tier to engage happens per query, based on the complexity of the decomposition output.
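Per-query tier selection might look like the following. The two tiers, light and strong, come from the paragraph above. `DecompositionStats`, `is_complex`, and all thresholds are hypothetical placeholders, not measured values.

```python
from dataclasses import dataclass

LIGHT, STRONG = "light", "strong"


@dataclass
class DecompositionStats:
    sub_questions: int
    domains: int
    dependency_depth: int


def is_complex(d: DecompositionStats) -> bool:
    # Hypothetical thresholds; real cutoffs would have to come from evaluation on real queries.
    return d.sub_questions > 2 or d.domains > 1 or d.dependency_depth >= 2


def model_tier(stage: str, d: DecompositionStats) -> str:
    # Query classification and graph traversal always run on the lightweight tier.
    if stage in ("classify", "traverse"):
        return LIGHT
    # Reasoning over retrieved evidence escalates only when the decomposition is complex.
    return STRONG if is_complex(d) else LIGHT
```

Keying the decision on decomposition output, rather than on the raw query's length or wording, means a short but multi-domain question like the contractor example still reaches the stronger model.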
Self-Verification
The retrieval architecture handles relevance. A separate layer handles truth.
Every answer Mendel generates must trace to a verifiable source. This is a structural constraint enforced at inference time by a dual-LLM protocol. A Reasoning LLM generates the response with explicit chain-of-thought against passage-memory pairs from the retrieval stage. An independent Verifier LLM audits the output before it reaches the user, checking factual consistency with the retrieved evidence, the logical soundness of the reasoning chain, and compliance with a structured evidence checklist derived from the query decomposition.
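The generate, audit, and correct cycle can be sketched with the two models as plain callables. This illustrates the control flow only: `Reasoner`, `Verifier`, and `answer` are hypothetical names, and `toy_verifier` replaces the Verifier LLM with string checks (checklist coverage by substring, citations as `[n]` indices into the retrieved passages). Once attempts run out, the loop returns a refusal instead of an unverified draft.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    ok: bool
    issues: list[str]


# Hypothetical interfaces; in practice both would wrap separate LLM calls.
Reasoner = Callable[[str, list[str], list[str]], str]      # (query, evidence, feedback) -> draft
Verifier = Callable[[str, list[str], list[str]], Verdict]  # (draft, evidence, checklist) -> verdict

REFUSAL = "Not answered: the draft could not be verified against the retrieved sources."


def toy_verifier(draft: str, evidence: list[str], checklist: list[str]) -> Verdict:
    # Toy stand-in for the Verifier LLM: checklist coverage by substring, citations by index.
    issues = [f"checklist item not addressed: {item}" for item in checklist if item.lower() not in draft.lower()]
    for ref in re.findall(r"\[(\d+)\]", draft):
        if int(ref) >= len(evidence):
            issues.append(f"citation [{ref}] points to no retrieved passage")
    return Verdict(ok=not issues, issues=issues)


def answer(query: str, evidence: list[str], checklist: list[str],
           reason: Reasoner, verify: Verifier, max_attempts: int = 3) -> str:
    feedback: list[str] = []
    for _ in range(max_attempts):
        draft = reason(query, evidence, feedback)
        verdict = verify(draft, evidence, checklist)
        if verdict.ok:
            return draft
        feedback = verdict.issues  # returned to the Reasoner for correction
    return REFUSAL  # stop rather than invent
```

Keeping `reason` and `verify` as separate arguments mirrors the independence requirement: nothing in the loop lets the Reasoner grade its own draft.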
The protocol uses Self-RAG reflection tokens (ISREL, ISSUP, ISUSE) to maintain granular per-claim control. Any claim the Reasoner generates without a supporting ISSUP token does not reach the user. The verifier scans for unsupported assertions and returns the response for correction before output. The setup is deliberately adversarial rather than relying on a single self-checking model, because a self-checking model can rationalize its own errors in ways an independent model is less likely to repeat.
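Per-claim gating on reflection tokens can be sketched as a filter. The label strings follow the Self-RAG paper's categories for ISREL (relevant, irrelevant) and ISSUP (fully supported, partially supported, no support). Three things here are illustrative assumptions: that only a fully supported claim on a relevant passage passes, and the `Claim` and `gate` names themselves. ISUSE, which Self-RAG uses as a response-level utility score, is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IsRel(Enum):
    RELEVANT = "Relevant"
    IRRELEVANT = "Irrelevant"


class IsSup(Enum):
    FULLY = "Fully supported"
    PARTIALLY = "Partially supported"
    NONE = "No support / Contradictory"


@dataclass
class Claim:
    text: str
    passage_id: Optional[str]  # retrieved passage the claim cites, if any
    isrel: IsRel
    issup: IsSup


def gate(claims: list[Claim]) -> tuple[list[Claim], list[Claim]]:
    passed, returned = [], []
    for c in claims:
        # Assumption: a claim passes only if it cites a relevant passage that fully supports it.
        if c.passage_id is not None and c.isrel is IsRel.RELEVANT and c.issup is IsSup.FULLY:
            passed.append(c)
        else:
            returned.append(c)  # sent back to the Reasoner instead of reaching the user
    return passed, returned
```

Whether "partially supported" should pass with an uncertainty flag rather than being returned is a policy choice; the strict version above is the one that matches "stops rather than invents".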
This is the mechanism behind Mendel’s “the system stops rather than invents” behavior. It is not a policy applied after generation. It is enforced in the generation pipeline.
The Knowledge Graph Challenge
This is the most honest section of this document.
Czech public data is massive and structurally fragmented. The full scope required for Mendel’s knowledge base spans e-Sbírka for statutory text, Supreme Court and Constitutional Court repositories for case law, ČNB regulatory archives, ČÚZK cadastral data, and spatial planning documents distributed across more than 1,500 municipal and regional government sites. Together that is millions of URL endpoints, each serving documents with different structure, encoding, and update frequency. The entity graph cannot be built manually at that scale, and it cannot be complete before launch.
The practical answer is lazy graph construction. The entity graph grows from real ingestion and real query traffic. The first time a statute is surfaced by a user query, a node is created. Each subsequent query that activates the same statute strengthens the edges connecting it to the entities retrieved alongside it. After several months of real traffic, the most queried parts of Czech law will be the most densely connected in the graph. That is the retrieval behavior you want: a graph shaped by what users actually need.
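Lazy construction means the graph holds nothing until traffic arrives. A minimal sketch, assuming each query yields the list of entities it surfaced. `LazyEntityGraph` and its methods are hypothetical, and edge weight here is a plain co-activation count with no decay or normalization.

```python
from collections import defaultdict
from itertools import combinations


class LazyEntityGraph:
    # Hypothetical query-driven graph: empty at launch, shaped entirely by traffic.
    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.weights: dict[tuple[str, str], int] = defaultdict(int)

    def record_query(self, surfaced: list[str]) -> None:
        # A node is created the first time a query surfaces an entity; nothing is pre-built.
        unique = sorted(set(surfaced))
        self.nodes.update(unique)
        # Every co-activated pair gets +1, so repeated traffic densifies the same edges.
        for a, b in combinations(unique, 2):
            self.weights[(a, b)] += 1

    def weight(self, a: str, b: str) -> int:
        key = (a, b) if a <= b else (b, a)
        return self.weights.get(key, 0)

    def densest(self, n: int = 3) -> list[str]:
        # Weighted degree: the most co-activated entities rank first, ties broken by name.
        degree: dict[str, int] = defaultdict(int)
        for (a, b), w in self.weights.items():
            degree[a] += w
            degree[b] += w
        return sorted(degree, key=lambda e: (-degree[e], e))[:n]
```

After two queries that both surface the income tax and health insurance acts, their edge has weight 2 and they top `densest()`. Statutes no one has asked about simply do not exist in the graph yet.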
The constraint this creates is that entity extraction quality at ingestion is the architectural bottleneck. A NER pipeline that confuses statute numbers with dates, or misidentifies organizational names, writes noise directly into the entity graph. Noise in the graph degrades retrieval for everything downstream. The engineering priority before scaling ingestion is precision on a controlled sample corpus. Measure extraction precision and recall, tune the pipeline, then scale. Scaling extraction errors is worse than starting with limited coverage.
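Measuring extraction quality on a controlled sample reduces to comparing predicted spans with a hand-labelled gold set. The sketch assumes exact matching on `(doc_id, start, end, label)`. The sample offsets are made up, and the 0.95 threshold in `ready_to_scale` is a hypothetical placeholder.

```python
Span = tuple[str, int, int, str]  # (doc_id, start, end, label)


def extraction_scores(gold: set[Span], predicted: set[Span]) -> dict[str, float]:
    # Exact match on document, offsets, and label: a statute tagged as DATE counts as both
    # a false positive (the DATE span) and a false negative (the missed STATUTE span).
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def ready_to_scale(scores: dict[str, float], min_precision: float = 0.95) -> bool:
    # Hypothetical gate on precision alone, since noisy edges compound downstream.
    return scores["precision"] >= min_precision


# Made-up sample: one statute mislabelled as a date, one spurious organization.
sample_gold = {("d1", 0, 12, "STATUTE"), ("d1", 40, 49, "STATUTE"), ("d1", 60, 63, "ORG")}
sample_pred = {("d1", 0, 12, "STATUTE"), ("d1", 40, 49, "DATE"), ("d1", 60, 63, "ORG"), ("d1", 70, 75, "ORG")}
sample_scores = extraction_scores(sample_gold, sample_pred)
```

On this sample, precision is 0.5 and recall about 0.67, so `ready_to_scale(sample_scores)` is false: the pipeline gets tuned before ingestion widens.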
Where This Is Now
Mendel’s initial release will implement source-grounded generation, explicit uncertainty flagging, and structured retrieval against the domain graph. The full Cognitive RAG pipeline, including the entity graph, hypergraph retrieval, and dual-LLM verification protocol, is in active development in parallel.
This article is a design document, not a research paper. The formal paper, with experimental data on retrieval precision, hallucination rates, and multi-hop query performance on real Czech legal data, will be published after the product ships. What we can say now is what we are building toward and why. Whether the specific implementation choices hold under real query load is what the testing will determine.
The architectural direction is right. Standard RAG retrieves by similarity. Legal reasoning requires relationships. Those are different problems and they need different tools.