Top-k search may fail at totals and counts in AI apps

The Reddit post says retrieval-augmented generation (RAG) sends only the top few matching document chunks to the model. The author says this works for finding or explaining one passage, but not for aggregation questions such as totals, counts, or “which client was billed most.” The suggested fix is to turn each document into structured records first, then answer those questions with database queries and source citations.
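To make the failure mode concrete, here is a minimal sketch of top-k retrieval undercounting a total. Everything in it is an assumption for illustration: the invoice texts, the file names, and the helper functions are hypothetical, and plain word overlap stands in for the embedding similarity a real RAG system would use.

```python
import re

# Hypothetical document chunks as (source, text) pairs; not data from the post.
chunks = [
    ("invoice_001.txt", "Acme was billed 1200 USD for consulting"),
    ("invoice_002.txt", "Globex was billed 300 USD for hosting"),
    ("invoice_003.txt", "Acme was billed 800 USD for support"),
    ("invoice_004.txt", "Acme was billed 500 USD for training"),
]


def words(text):
    """Lowercased set of word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k(query, k):
    """Return the k chunks sharing the most words with the query.

    Word overlap is a stand-in for embedding similarity (assumption).
    """
    q = words(query)
    ranked = sorted(chunks, key=lambda c: len(q & words(c[1])), reverse=True)
    return ranked[:k]


def billed_amount(text):
    """Pull the billed amount out of one chunk of text."""
    return int(re.search(r"billed (\d+) USD", text).group(1))


query = "How much was Acme billed in total"

# Only the k best-matching chunks would reach the model.
retrieved = top_k(query, k=2)
partial_total = sum(billed_amount(t) for _, t in retrieved if "Acme" in t)

# The correct answer needs every Acme record, not just the top matches.
full_total = sum(billed_amount(t) for _, t in chunks if "Acme" in t)

print(partial_total, full_total)
```

With k=2, one of the three Acme invoices never reaches the model, so any total computed from the retrieved chunks comes out too low.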

Key points

  • Top-k RAG gives the model only a small, selected set of document chunks.
  • Aggregation questions need all relevant records, not just the most similar chunks.
  • Raising k means sending more text, which raises token cost and can exceed the context limit.
  • The post suggests extracting documents into a schema, then using database queries for totals and counts.
  • Embeddings can still be useful for open-ended find-and-explain questions.
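The extract-then-query approach in the key points could look roughly like the sketch below. It assumes the extraction step has already produced structured records. The table name, column names, sample rows, and helper functions are hypothetical and do not come from the post; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical records, as if already extracted from documents into a schema.
records = [
    ("Acme", 1200, "invoice_001.txt"),
    ("Globex", 300, "invoice_002.txt"),
    ("Acme", 800, "invoice_003.txt"),
    ("Acme", 500, "invoice_004.txt"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (client TEXT, amount_usd INTEGER, source TEXT)"
)
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", records)


def totals_by_client(conn):
    """Total billed and invoice count per client, largest total first."""
    return conn.execute(
        """
        SELECT client, SUM(amount_usd), COUNT(*)
        FROM invoices
        GROUP BY client
        ORDER BY SUM(amount_usd) DESC
        """
    ).fetchall()


def sources_for(conn, client):
    """Source documents behind a client's total, usable as citations."""
    rows = conn.execute(
        "SELECT source FROM invoices WHERE client = ? ORDER BY source",
        (client,),
    )
    return [row[0] for row in rows]


totals = totals_by_client(conn)
top_client, top_total, invoice_count = totals[0]
citations = sources_for(conn, top_client)

print(top_client, top_total, invoice_count, citations)
```

Because the database reads every row, the total and count cover all records, and the source column lets each answer point back to the documents it was computed from.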

Quick term guide

aggregation
Combining many items to produce a result such as a count, sum, or average.
source citations
Links or notes that show where an answer came from.
AI agents
AI agents are AI tools that can carry out steps toward a goal, not just answer once.
token cost
The money or usage spent when sending text to an AI model and getting text back.
context limit
The maximum amount of text an AI model can process at once, including the prompt and its reply. Text beyond the limit has to be cut or left out.
embeddings
A way of converting text into numbers so that similar meanings can be found and compared mathematically.