Open Source · Importance: High

Top-k search may fail at totals and counts in AI apps

r/LLMDevs · Jun 9, 2026

The Reddit post says retrieval-augmented generation (RAG) sends only the top few (top-k) most similar document chunks to the model. The author says this works for finding or explaining a single passage, but not for aggregation questions such as totals, counts, or “which client was billed most,” because those answers depend on every matching record, not only the k most similar ones. The suggested fix is to first extract each document into structured records, then answer those questions with database queries that compute the result and cite the source documents.
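To make the failure mode concrete, here is a minimal sketch with made-up invoice chunks. The `score` function is a hypothetical word-overlap stand-in for embedding similarity, and `k=2` is an arbitrary choice; the point is only that any total computed from the retrieved chunks covers just those chunks.

```python
# Hypothetical data: each chunk is one invoice, already tagged with client and amount.
chunks = [
    {"doc": "inv-001.pdf", "text": "Invoice for Acme Corp consulting", "client": "Acme Corp", "amount": 1200},
    {"doc": "inv-002.pdf", "text": "Acme Corp support retainer invoice", "client": "Acme Corp", "amount": 800},
    {"doc": "inv-003.pdf", "text": "Globex hosting invoice", "client": "Globex", "amount": 500},
    {"doc": "inv-004.pdf", "text": "Acme Corp training invoice", "client": "Acme Corp", "amount": 650},
    {"doc": "inv-005.pdf", "text": "Invoice Acme Corp travel expenses", "client": "Acme Corp", "amount": 300},
]

def score(query, text):
    # Stand-in for embedding similarity: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))

def top_k(query, chunks, k):
    # Rank every chunk by score and keep only the k best, as top-k retrieval does.
    return sorted(chunks, key=lambda c: score(query, c["text"]), reverse=True)[:k]

query = "total invoiced to Acme Corp"
retrieved = top_k(query, chunks, k=2)

# The best a model can do is add up the chunks it was shown...
partial_total = sum(c["amount"] for c in retrieved if c["client"] == "Acme Corp")
# ...while the correct answer needs every matching record.
true_total = sum(c["amount"] for c in chunks if c["client"] == "Acme Corp")
print(partial_total, true_total)  # prints: 2000 2950
```

Four chunks match Acme Corp equally well, but only two fit in the top-k window, so the answer is silently incomplete.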

Key points

Top-k RAG gives the model only a small, selected set of document chunks.
Aggregation questions need all relevant records, not just the most similar chunks.
Raising k means sending more text, which raises token cost and can exceed the model's context limit.
The post suggests extracting documents into a schema, then using DB queries for totals and counts.
Embeddings can still be useful for open-ended find-and-explain questions.
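A minimal sketch of the suggested structured-records approach, using Python's built-in `sqlite3`. The extraction step that turns each document into a row is not shown; the `invoices` table, its columns (`source_doc`, `client`, `amount`), and the data are hypothetical assumptions for illustration.

```python
import sqlite3

# Hypothetical rows, as if an extraction step had already mapped each
# invoice document onto a fixed schema.
records = [
    ("inv-001.pdf", "Acme Corp", 1200.0),
    ("inv-002.pdf", "Acme Corp", 800.0),
    ("inv-003.pdf", "Globex", 500.0),
    ("inv-004.pdf", "Acme Corp", 650.0),
    ("inv-005.pdf", "Initech", 2100.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (source_doc TEXT, client TEXT, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", records)

# "Which client was billed most?" computed over ALL rows, not a top-k sample.
# GROUP_CONCAT keeps the source documents so the answer can cite them.
client, total, n, sources = conn.execute(
    """
    SELECT client,
           SUM(amount) AS total,
           COUNT(*) AS n,
           GROUP_CONCAT(source_doc, ', ') AS sources
    FROM invoices
    GROUP BY client
    ORDER BY total DESC
    LIMIT 1
    """
).fetchone()

print(f"{client} was billed most: {total:.2f} across {n} invoices ({sources})")
```

Here the database does the arithmetic; a model, if used at all, would only phrase the result and pass the cited documents through. Note that SQLite does not guarantee the order of items inside `GROUP_CONCAT`.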

Quick term guide

aggregation: Combining many items to produce a result such as a count, sum, or average.
source citations: Links or notes that show where an answer came from.
AI agents: AI agents are AI tools that can carry out steps toward a goal, not just answer once.
token cost: The money or usage spent when sending text to an AI model and getting text back.
context limit: The maximum amount of text, measured in tokens, that an AI model can process in a single request. Input beyond it has to be truncated or shortened, or the request is rejected.
embeddings: A way of converting text into numbers so that similar meanings can be found and compared mathematically.
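Comparing embeddings is often done with cosine similarity, which is also how many top-k retrievers rank chunks. The sketch below is illustrative only: the 3-dimensional vectors are hand-made stand-ins for real embeddings, which come from an embedding model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical hand-made vectors standing in for chunk embeddings.
chunk_vectors = {
    "refund policy": [1.0, 0.0, 0.0],
    "return window": [0.7, 0.7, 0.0],
    "office hours": [0.0, 0.0, 1.0],
}
query_vector = [0.9, 0.3, 0.0]

# Rank chunks from most to least similar to the query; a top-k retriever keeps the first k.
ranked = sorted(chunk_vectors, key=lambda name: cosine(query_vector, chunk_vectors[name]), reverse=True)
print(ranked[:2])  # prints: ['refund policy', 'return window']
```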
