A financial analyst asks your AI system:
"How did ACME's cash flow change after the CEO transition, and what was the board's response?"
Your vector database dutifully returns the top-K most similar chunks. You get a paragraph about ACME's cash flow from the 10-K. A press release about the CEO transition. Board meeting minutes from Q4. Three relevant chunks, three different documents.
The LLM reads them, generates a confident answer, and gets it wrong.
Not because the chunks were irrelevant. They were highly relevant. The problem is that the answer lives in the connections between them: the CEO transition caused the cash flow decline, and the board's response was to that decline. Vector search found the right pages. It had no way to follow the thread that ties them together.
This is the fundamental failure mode of vector search for enterprise questions. And no amount of better embeddings will fix it.
The Single-Hop Problem
Vector search performs exactly one operation: find the K text chunks most semantically similar to a query. That is one hop. Query in, chunks out.
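To make "one hop" concrete, here is a minimal sketch of top-K retrieval with cosine similarity. The three-dimensional embeddings are toy values for illustration; real systems use learned embeddings with hundreds or thousands of dimensions, but the operation is the same single ranking step.

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, k):
    """One hop: rank chunks by cosine similarity to the query, return top-k indices."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity per chunk
    return np.argsort(scores)[::-1][:k]  # indices of the k most similar chunks

# Toy 3-dimensional embeddings (illustrative only)
chunks = np.array([[0.9, 0.1, 0.0],    # cash-flow paragraph
                   [0.1, 0.9, 0.0],    # CEO press release
                   [0.8, 0.2, 0.1]])   # another finance chunk
query = np.array([1.0, 0.0, 0.0])
print(top_k_chunks(query, chunks, k=2))
```

Notice what this function cannot express: there is no way to say "then follow the result to a related entity." The output is a ranked list, and the operation ends there.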
Most enterprise questions require two to five hops.
Consider: "Which portfolio companies led by first-time founders raised follow-on rounds after their board composition changed?"
Answering this requires:
- Identify portfolio companies
- Check which have first-time founders (hop across founder bios)
- Find board composition changes (hop across governance documents)
- Check for follow-on rounds after those changes (hop across funding records, with a temporal constraint)
- Synthesize the pattern
Vector search can find chunks that mention "first-time founders" and chunks that mention "follow-on rounds." What it cannot do is traverse the chain from company to founder to board change to funding event. Each connection requires a hop through structured relationships that embeddings simply do not encode.
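If the knowledge were already structured, the chain above would be a few lookups. The sketch below uses hand-written toy records (company names, dates, and the flat dictionaries are all illustrative assumptions) to show how each hop becomes an explicit, checkable step rather than a similarity search.

```python
from datetime import date

# Hypothetical toy records standing in for founder bios, governance
# documents, and funding data (illustrative only)
companies = ["Alpha", "Beta", "Gamma"]
first_time_founder = {"Alpha": True, "Beta": False, "Gamma": True}
board_change_date = {"Alpha": date(2023, 3, 1), "Gamma": date(2023, 9, 1)}
follow_on_rounds = {"Alpha": [date(2023, 6, 15)], "Gamma": [date(2023, 5, 1)]}

def matches(company):
    """Hops 2-4: founder check, board-change lookup, temporally constrained funding check."""
    if not first_time_founder.get(company):
        return False                     # hop 2: first-time founder?
    change = board_change_date.get(company)
    if change is None:
        return False                     # hop 3: any board composition change?
    # hop 4: follow-on round strictly after the board change
    return any(r > change for r in follow_on_rounds.get(company, []))

result = [c for c in companies if matches(c)]
print(result)
```

Gamma fails the temporal constraint because its round closed before the board change. That distinction is trivial here and invisible to cosine similarity.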
This is not an edge case. In enterprise knowledge bases (legal filings, financial reports, compliance records, medical charts) the questions that matter almost always span multiple documents and require following entity relationships across them.
The Entity Resolution Problem
Your knowledge base contains references to "Amazon," "AMZN," "Amazon.com Inc.," "the Seattle-based e-commerce giant," and "Bezos's company." A human knows these all refer to the same entity. Vector search does not.
Each reference lives in a different chunk with a different embedding. A query about Amazon's logistics strategy might retrieve chunks mentioning "Amazon" but miss critical context buried in a paragraph that only says "AMZN reported..." or "the company announced..." Coreference and aliasing are invisible to cosine similarity.
A knowledge graph solves this by resolving all variants to a single canonical entity at ingestion time. When you query for Amazon, every fact about Amazon is connected, regardless of how the source document phrased it. The graph has one node, not five disconnected chunks.
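At its simplest, resolution at ingestion time is a mapping from surface mentions to one canonical entity. The alias table below is hand-written for illustration; production systems build it with coreference models and entity linkers, but the effect is the same: every fact attaches to one node.

```python
# Hypothetical alias table built at ingestion time (entries are illustrative)
ALIASES = {
    "amazon": "Amazon.com Inc.",
    "amzn": "Amazon.com Inc.",
    "amazon.com inc.": "Amazon.com Inc.",
    "the seattle-based e-commerce giant": "Amazon.com Inc.",
}

def canonical(mention: str) -> str:
    """Resolve a surface mention to its canonical entity (identity if unknown)."""
    return ALIASES.get(mention.strip().lower(), mention)

facts = [("AMZN", "reported_revenue", "$143B"),
         ("Amazon", "announced", "new fulfillment center")]
resolved = [(canonical(s), p, o) for s, p, o in facts]
print(resolved)
```

Both facts now hang off the same subject, so a query about Amazon's logistics strategy reaches the "AMZN reported..." fact it would otherwise have missed.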
The Temporal Problem
"What was the revenue trend before and after the acquisition?"
Vector search returns every chunk that mentions revenue. It has no concept of "before" or "after." A chunk from 2019 and a chunk from 2024 are equally "relevant" if they are semantically similar to the query.
A knowledge graph with temporal fields (valid_from, valid_to) can answer this precisely: retrieve revenue facts where the timestamp falls before the acquisition date, then retrieve those after. The temporal constraint is a first-class filter, not something the LLM has to infer from raw text.
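As a sketch of what "first-class filter" means, here is a fact record with a validity timestamp and a before/after split on an acquisition date. The figures and dates are invented for illustration; the point is that the partition is a comparison on structured fields, not an inference from prose.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date   # when this fact became true

# Toy revenue facts; the acquisition date is an assumption for illustration
acquisition = date(2022, 7, 1)
facts = [
    Fact("ACME", "revenue", "$40M", date(2022, 3, 31)),
    Fact("ACME", "revenue", "$45M", date(2022, 6, 30)),
    Fact("ACME", "revenue", "$52M", date(2022, 9, 30)),
]

before = [f for f in facts if f.valid_from < acquisition]
after = [f for f in facts if f.valid_from >= acquisition]
print([f.value for f in before], [f.value for f in after])
```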
This matters enormously in finance (quarterly reporting), legal (regulatory timelines), and healthcare (treatment sequences). Any domain where when something happened is as important as what happened.
The Relationship Problem
"Who reports to the CTO, and which of their teams exceeded Q3 targets?"
Vector search finds chunks that mention the CTO. Maybe some of them list direct reports. Maybe not. Even if they do, the LLM now has to parse org chart information from unstructured text, match names against performance data in other chunks, and filter by the Q3 time constraint.
A knowledge graph stores reports_to as a direct relationship. Traversing it is a single operation: start at the CTO node, follow the reports_to edges, collect the team leads, then follow their team_performance edges filtered by Q3. The answer is a structured traversal, not a hope that the right text chunks landed in the context window.
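The traversal can be sketched with plain dictionaries as the edge stores (names, targets, and the two-edge schema are illustrative assumptions, not a real org chart):

```python
# Hypothetical org graph: edges keyed by relationship type
reports_to = {"CTO": ["Alice", "Bob"]}     # CTO -> direct reports
team_performance = {                       # lead -> {quarter: (actual, target)}
    "Alice": {"Q3": (120, 100)},
    "Bob": {"Q3": (90, 100)},
}

def teams_exceeding(node, quarter):
    """Follow reports_to edges, then team_performance edges, filtered by quarter."""
    exceeded = []
    for lead in reports_to.get(node, []):
        perf = team_performance.get(lead, {}).get(quarter)
        if perf and perf[0] > perf[1]:
            exceeded.append(lead)
    return exceeded

print(teams_exceeding("CTO", "Q3"))
```

Two edge-follows and a filter. No text parsing, no hoping the org chart happened to be serialized into a retrievable chunk.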
The Synthesis Problem
Even when vector search retrieves all the right chunks, the LLM still faces an unsolved problem: figuring out how those chunks relate to each other.
Five chunks from five documents, dropped into a context window. The LLM has to discover that the person mentioned in chunk 2 is the same person referenced obliquely in chunk 4. That the event in chunk 1 preceded the outcome in chunk 3. That chunk 5 contradicts chunk 2 because it was written six months later.
This is asking the LLM to do research: the multi-hop, cross-referencing, timeline-building work that a human analyst spends hours on. LLMs are not good at this. They are very good at synthesis and reasoning over context they already have. They are poor at discovering structure within an unstructured pile of text.
The Goldman Sachs Analogy
At an investment bank, the research associate spends days cross-referencing SEC filings, following footnote chains across annual reports, building comparable tables from scattered data points, and assembling a structured research brief. The senior analyst reads that brief and writes the investment thesis.
You would never hand the senior analyst a stack of raw filings and say "figure it out." That is exactly what vector search does to an LLM.
The associate's job is discovery: finding the right facts, connecting them, ordering them, resolving contradictions. The analyst's job is synthesis: interpreting the structured evidence and forming a judgment.
Knowledge graph retrieval does the research associate's job. It pre-structures the relationships, resolves the entities, orders the timeline, and delivers a structured brief. The LLM's job becomes synthesis, which is exactly what large language models excel at.
When you hear people say "LLMs can't reason," what they often mean is "LLMs can't do multi-hop discovery from raw text." That is true. But it is not a reasoning failure; it is an architecture failure. Give an LLM structured, pre-connected evidence and it reasons over it remarkably well.
"But Context Windows Are Getting Bigger"
The most common objection: "Why bother with graphs when we can dump everything into a 2M token context window?"
Three reasons.
Cost. Pricing scales with tokens. Sending 2 million tokens per query instead of 1,500 well-chosen tokens is roughly 1,300x more expensive. At enterprise query volumes, this is the difference between a viable product and a bankruptcy filing.
Scale. Even 10 million tokens cannot hold an enterprise knowledge base. A mid-size company generates millions of documents. A law firm's case history, a hospital's patient records, a bank's regulatory filings: these corpora are measured in billions of tokens. No context window will ever be large enough.
Structure. This is the critical one. A bigger context window does not solve the relationship problem. You can give an LLM a million tokens of raw text, and it still has to figure out that "the company" in paragraph 847 refers to the same entity as "TechCorp" in paragraph 12. It still has to discover that the event on page 31 caused the outcome on page 94. Bigger windows give the model more hay. They do not help it find the needle.
Andrej Karpathy has articulated this well: a small cognitive core that fetches exactly what it needs will outperform a system that tries to hold everything in memory. The brain does not work by loading every memory simultaneously. It works by following associative links to retrieve the specific memories relevant to the current task.
What Works Instead
The solution is not better vector search. It is a different architecture entirely, one that structures knowledge into graphs, traverses relationships, and gives the LLM exactly the context it needs to reason.
Structured fact extraction. When documents are ingested, entities and relationships are extracted as subject-predicate-object triples. "ACME reported $50M revenue in Q3" becomes a structured fact: (ACME, reported_revenue, $50M, Q3_2025). Entities are resolved to canonical forms. Temporal metadata is attached. Aliases and coreferences are collapsed into single nodes.
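As a sketch of the ingestion side, the snippet below takes already-extracted triples (the extraction itself would be done by an LLM or information-extraction model; these tuples are hand-written for illustration) and files them under canonical entity nodes, collapsing aliases as it goes.

```python
from collections import defaultdict

# Hypothetical extracted facts with temporal metadata (hand-written here;
# a real pipeline would produce these from source documents)
extracted = [
    ("ACME", "reported_revenue", "$50M", "Q3_2025"),
    ("ACME Corp.", "appointed_ceo", "J. Doe", "Q2_2025"),  # alias of ACME
]
CANONICAL = {"acme corp.": "ACME", "acme": "ACME"}  # illustrative alias table

graph = defaultdict(list)
for s, p, o, t in extracted:
    node = CANONICAL.get(s.lower(), s)   # collapse aliases into one node
    graph[node].append((p, o, t))

print(dict(graph))  # a single "ACME" node holding both facts
```

Two source phrasings, one node. Every later traversal starting at ACME sees both facts.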
Graph traversal across documents. When a query arrives, the system does not search for similar text. It identifies the entities in the question, then traverses the graph to find connected facts across documents. The CEO transition connects to ACME, which connects to cash flow data, which connects to board meeting decisions. Each hop follows an explicit relationship, not a similarity score.
Multi-hop reasoning with decomposition. Complex questions are broken into sub-questions, each with targeted retrieval. "How did cash flow change after the CEO transition?" becomes: (1) When did the CEO transition happen? (2) What was cash flow before that date? (3) What was cash flow after? (4) What board actions occurred in response? Each sub-question retrieves precisely the facts it needs.
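The four sub-questions above can be sketched as targeted retrievals against a fact store. The store contents and dates below are invented for illustration, and in a real system each `retrieve` call would be a graph traversal rather than a dictionary lookup, but the decomposition pattern is the same.

```python
from datetime import date

# Toy fact store keyed by (entity, predicate); values are (date, detail) pairs
facts = {
    ("ACME", "ceo_transition"): [(date(2024, 1, 15), "New CEO appointed")],
    ("ACME", "cash_flow"): [(date(2023, 12, 31), "$12M"),
                            (date(2024, 3, 31), "$7M")],
    ("ACME", "board_action"): [(date(2024, 4, 10), "Cost-reduction plan approved")],
}

def retrieve(entity, predicate, after=None, before=None):
    """Targeted retrieval: one entity, one predicate, optional temporal bounds."""
    out = facts.get((entity, predicate), [])
    if after:
        out = [f for f in out if f[0] > after]
    if before:
        out = [f for f in out if f[0] <= before]
    return out

transition = retrieve("ACME", "ceo_transition")[0][0]           # (1) when?
cash_before = retrieve("ACME", "cash_flow", before=transition)  # (2) before
cash_after = retrieve("ACME", "cash_flow", after=transition)    # (3) after
board = retrieve("ACME", "board_action", after=transition)      # (4) response

print(cash_before, cash_after, board)
```

Each sub-question pulls a handful of facts instead of a pile of chunks, so the context handed to the model is small and already ordered.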
Pre-structured context for the LLM. The language model receives a structured brief: resolved entities, ordered timelines, explicit relationships, confidence scores, and source attribution. Not a pile of text chunks. Its job is synthesis and reasoning, the task it was built for.
Vector Search Path:
Query -> Embed -> Top-K Similar Chunks -> LLM (figure it out)

Knowledge Graph Path:
Query -> Decompose -> Entity Resolution -> Graph Traversal -> Temporal Filtering -> Relationship Following -> Structured Brief -> LLM (synthesize the answer)
The difference is architectural. The LLM does not have to be a research associate and a senior analyst. The knowledge infrastructure does the research. The LLM does the thinking.
The Punchline
Vector search is a useful tool. It finds similar text quickly and cheaply. For single-document, single-topic lookups, it works well.
But enterprise knowledge work is not about finding similar text. It is about following chains of relationships across documents, resolving entities, respecting timelines, and synthesizing structured evidence into answers. These are graph problems, not similarity problems.
The solution is not to build a better vector database. It is to build knowledge reasoning infrastructure, systems that structure knowledge into graphs, traverse relationships automatically, and deliver pre-structured context that lets language models do what they actually do best.
The senior analyst does not need a bigger desk. They need a better research team.
VRIN structures your enterprise knowledge into traversable graphs and delivers pre-structured context to language models. Try it at vrin.cloud.
Founder & CEO
Building knowledge reasoning infrastructure for enterprise AI at VRIN. We believe in transparent research and open benchmarks.