Optimizing your agent's search layer is an easy win most developers skip

When building AI agents, developers often leave the search step on default settings while fine-tuning everything else. Poor search pulls in irrelevant documents, which forces the LLM to process junk and wastes tokens. Tuning the search layer is one of the cheapest ways to raise answer quality and lower cost at the same time.

Most AI agents that answer from documents work in two steps: first search for relevant documents, then hand them to an LLM to generate an answer. Developers tend to spend time crafting prompts and formatting outputs, but rarely touch the search configuration. When search returns noisy or off-topic results, the LLM receives a bloated context window full of unhelpful text, which drives up token costs and degrades answer quality.
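The two steps can be sketched in a few lines of Python. This is a minimal illustration, not a real agent: the word-overlap scorer stands in for an actual search backend, the LLM call is left out, and the names `search`, `build_prompt`, and `top_k` are hypothetical. It only shows how the search settings decide how much text ends up in the prompt.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Step 1: rank documents by shared query words (toy scorer, an assumption)."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Step 2: pack the retrieved text into the prompt an LLM would receive."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


documents = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "The cafeteria menu changes every Monday.",
]
query = "How long do refunds take?"

# Same question, two search settings: the wider one drags in off-topic text.
narrow_prompt = build_prompt(query, search(query, documents, top_k=1))
wide_prompt = build_prompt(query, search(query, documents, top_k=3))
print(len(narrow_prompt), len(wide_prompt))
```

With `top_k=3` the prompt also carries the holiday and cafeteria lines, which add length without helping the answer.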

Optimizing the search layer can mean adding a reranking model to filter results by relevance, rewriting the query before it hits the search engine, or adjusting chunk size so only the right pieces of text are retrieved. These changes shrink the context passed to the LLM, which directly cuts costs and speeds up responses without changing the LLM itself.
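A rough sketch of how these three adjustments fit together, under loud assumptions: `rewrite_query` drops filler words where a real system might call an LLM, `relevance` is a toy word-overlap score standing in for a reranking model, and `chunk_size`, `top_n`, and `min_score` are hypothetical parameter names rather than settings of any particular library.

```python
import re

# Hypothetical filler-word list used only for this illustration.
FILLER = {"can", "you", "please", "tell", "me", "which", "a", "the", "do", "i"}


def words(text: str) -> list[str]:
    """Lowercase a string and return its alphanumeric words in order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def rewrite_query(query: str) -> str:
    """Toy query rewriting: keep only the words that are not filler."""
    return " ".join(w for w in words(query) if w not in FILLER)


def relevance(query: str, chunk: str) -> float:
    """Toy stand-in for a reranking model: share of query words found in the chunk."""
    query_words = set(words(query))
    if not query_words:
        return 0.0
    return len(query_words & set(words(chunk))) / len(query_words)


def rerank(query: str, candidates: list[str], top_n: int, min_score: float) -> list[str]:
    """Keep at most top_n candidates scoring at least min_score, best first."""
    scored = [(relevance(query, c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_n]]


text = (
    "Refunds are issued within 14 days. "
    "The cafeteria menu changes every Monday. "
    "Refund requests need an order number."
)
chunks = chunk_text(text, chunk_size=6)

raw_query = "Can you please tell me which order number a refund needs?"
rewritten = rewrite_query(raw_query)

context_raw = rerank(raw_query, chunks, top_n=2, min_score=0.5)
context_rewritten = rerank(rewritten, chunks, top_n=2, min_score=0.5)
print(rewritten, context_rewritten)
```

In this toy setup the raw question is diluted by filler words and no chunk clears the threshold, while the rewritten query keeps exactly one relevant chunk, so the LLM would receive one sentence instead of three.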

Key points

  • Leaving search on default settings sends irrelevant documents to the LLM, wasting tokens
  • Adding a reranking model keeps only the most relevant results before LLM processing
  • Query rewriting improves search precision and reduces context length
  • Better search quality lowers LLM costs and raises answer accuracy at the same time
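To make the cost point concrete, here is a back-of-the-envelope estimate. Every number in it is an assumption for illustration: roughly four characters per token is only a common rule of thumb for English text, the price is a made-up figure, and the chunk counts are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4                # rough rule of thumb, an assumption
PRICE_PER_1K_INPUT_TOKENS = 0.50   # hypothetical price, arbitrary currency units


def estimate_tokens(text: str) -> int:
    """Estimate token count from character count (approximation only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def input_cost(chunks: list[str]) -> float:
    """Estimated cost of sending the given chunks to the LLM as input."""
    tokens = sum(estimate_tokens(chunk) for chunk in chunks)
    return tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS


all_chunks = ["x" * 2000] * 10   # ten retrieved chunks of 2,000 characters each
kept_chunks = all_chunks[:3]     # three chunks kept after filtering

saving = 1 - input_cost(kept_chunks) / input_cost(all_chunks)
print(f"Input cost reduced by {saving:.0%}")
```

Under these assumptions, keeping three of ten equally sized chunks cuts the estimated input cost by 70%, regardless of the price chosen, because the saving depends only on how much context is dropped.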

Quick term guide

AI agents
AI agents are AI tools that can carry out steps toward a goal, not just answer once.
developers
Developers are people who build software, apps, or websites.
context window
The maximum amount of text, measured in tokens, that an AI model can take in and use in a single request.
token costs
Token costs are the fees paid for the text an AI model reads and writes, charged when text is sent to the model and when text comes back.
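search engine
Here, the system that looks up relevant documents for the agent. It works on the same idea as a web search engine like Google or Bing, but is applied to whatever document collection the agent uses.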
chunk size
The length of each text snippet stored for retrieval; too large or too small hurts search accuracy.
query rewriting
Rephrasing the original question so the search engine finds more relevant results.