AI agents crawl websites in wildly different ways — 11M event logs reveal the patterns
An analysis of 11 million access events over three months shows that AI agents vary dramatically in how they browse websites. Some dig deep into dozens of linked pages; others stop at the first page. Understanding these differences helps you build better agents or make your site easier for AI to read.
A dataset of 11 million real-world crawling events, collected over three months, has been shared publicly, revealing clear behavioral differences among AI agents. Some agents follow many links and explore a site several layers deep, while others make only a handful of requests and move on. The data includes which pages agents visit first, how many requests they send, and how far they go.
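As a rough illustration of how the three measurements mentioned above (first page visited, number of requests, how far an agent goes) could be pulled from access events, here is a minimal Python sketch. The source does not describe the dataset's schema, so the field names `agent`, `session`, `ts`, and `path` are hypothetical, the sample events are made up, and depth is approximated by counting URL path segments rather than actual link hops.

```python
from collections import defaultdict
from statistics import median


def path_depth(path):
    """Approximate depth as the number of non-empty path segments ('/' -> 0)."""
    return len([seg for seg in path.split("?")[0].split("/") if seg])


def summarize(events):
    """Group events into (agent, session) visits and report per-agent medians."""
    sessions = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        sessions[(ev["agent"], ev["session"])].append(ev["path"])

    per_agent = defaultdict(lambda: {"requests": [], "max_depth": [], "first": []})
    for (agent, _session), paths in sessions.items():
        stats = per_agent[agent]
        stats["requests"].append(len(paths))                          # requests in this visit
        stats["max_depth"].append(max(path_depth(p) for p in paths))  # deepest page reached
        stats["first"].append(paths[0])                               # earliest page by timestamp

    return {
        agent: {
            "sessions": len(s["requests"]),
            "median_requests": median(s["requests"]),
            "median_max_depth": median(s["max_depth"]),
            "first_pages": sorted(set(s["first"])),
        }
        for agent, s in per_agent.items()
    }


# Made-up sample events using the hypothetical schema described above.
SAMPLE_EVENTS = [
    {"agent": "agent-a", "session": "s1", "ts": 1, "path": "/"},
    {"agent": "agent-a", "session": "s1", "ts": 2, "path": "/docs"},
    {"agent": "agent-a", "session": "s1", "ts": 3, "path": "/docs/api/auth"},
    {"agent": "agent-b", "session": "s2", "ts": 1, "path": "/"},
]
summary = summarize(SAMPLE_EVENTS)
```

In this toy input, `agent-a` behaves like a deep explorer and `agent-b` like a single-page reader. Real logs would need a session definition, for example a time gap between requests, before this grouping makes sense.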
For anyone building AI agents, this is a practical reference for tuning crawling strategy — knowing what real agents do helps you decide how thorough or selective your own agent should be. For website owners, the data points to ways of structuring content so that important pages are discovered first and unnecessary server load from aggressive bots is reduced.
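For agent builders, the trade-off between thorough and selective crawling comes down to two limits: how deep to follow links and how many requests to spend. The sketch below is one way to express that, not a description of how any agent in the dataset works. It runs a breadth-first crawl over an in-memory link map instead of a real website; the names `crawl`, `get_links`, `max_depth`, `max_requests`, and the `SITE` map are all hypothetical.

```python
from collections import deque


def crawl(start, get_links, max_depth=2, max_requests=10):
    """Breadth-first crawl that stops expanding at max_depth and
    stops fetching once max_requests pages have been visited."""
    visited = []                 # pages fetched, in fetch order
    seen = {start}               # pages already queued, to avoid repeats
    queue = deque([(start, 0)])
    while queue and len(visited) < max_requests:
        page, depth = queue.popleft()
        visited.append(page)     # stands in for one HTTP request
        if depth == max_depth:
            continue             # do not follow links beyond the depth cap
        for link in get_links(page):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site structure: page -> pages it links to.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api"],
    "/blog": ["/blog/post-1"],
    "/docs/api": ["/docs/api/auth"],
}


def site_links(page):
    return SITE.get(page, [])


shallow = crawl("/", site_links, max_depth=0)                  # reads only the first page
deep = crawl("/", site_links, max_depth=3)                     # follows links several layers down
budgeted = crawl("/", site_links, max_depth=3, max_requests=2)  # deep, but request-capped
```

Setting `max_depth=0` mimics an agent that stops at the first page, while a larger depth with a request budget mimics one that explores but still bounds the load it puts on a server.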
Key points
- Based on 11 million real access events collected over 3 months — not simulated data.
- Different AI agents vary widely in crawl depth and number of requests sent.
- Some agents follow dozens of links per visit; others only read the first page.
- Website owners can use these patterns to surface key content earlier and cut wasted server traffic.
- Agent builders can benchmark their crawling strategy against real-world behavior.
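For site owners, one way to act on the point about surfacing key content is to check how many link hops each important page sits from the home page, since an agent that stops early will never reach deeply buried pages. The following is a minimal sketch under that assumption; the function names, the `LINKS` map, the page paths, and the depth threshold are all hypothetical.

```python
from collections import deque


def click_depths(links, home="/"):
    """Return the minimum number of link hops from home to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def buried_pages(links, key_pages, max_depth=1, home="/"):
    """Report key pages deeper than max_depth; unreachable pages map to None."""
    depths = click_depths(links, home)
    return {
        page: depths.get(page)
        for page in key_pages
        if depths.get(page) is None or depths[page] > max_depth
    }


# Hypothetical internal link map: page -> pages it links to.
LINKS = {
    "/": ["/about", "/products"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/pricing"],
}
report = buried_pages(LINKS, key_pages=["/pricing", "/about", "/contact"])
```

Here `/pricing` is three hops from the home page and `/contact` is not linked at all, so both are candidates for a direct link from the home page.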
Quick term guide
- AI agent
- An AI program that can carry out a series of steps toward a goal, such as fetching and reading web pages, rather than just answering once.
- dataset
- A large, organized collection of data ready to use for analysis or model training.
- crawling
- Automatically visiting websites to collect and save their data.
- reference
- Here it means a source you consult to find information or confirm facts while working.
- server
- A computer that hosts a website and sends its pages to the browsers, crawlers, and other clients that request them.
- surface
- Here it is a verb: to make content prominent and easy to find, so that visitors and AI agents reach it early.
- benchmark
- Here it is a verb: to measure your own results against a standard or against real-world data, for example to compare speed, quality, or cost.