AI crawlers appear to read websites in very different ways

A post in r/SaaS says the author tracked 11 million crawler log events across 34 websites over three months. The post says GPTBot almost never checked robots.txt, but repeatedly requested /llms.txt even when it did not exist. It says Google’s bot checked robots.txt very often, while ClaudeBot visits rose sharply from April to early June. The post also says live AI visits tied to user questions often fetch one specific page instead of browsing the whole site.
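The kind of tally the post describes can be sketched in a few lines of Python. This is only an illustration, not the author's method: it assumes logs in the common "combined" access-log format, it assumes Google's bot identifies itself as Googlebot, and the sample lines (IP addresses, dates, status codes) are made up.

```python
import re
from collections import Counter

# Assumption: logs use the "combined" access-log format, where the request
# line and the user agent are both wrapped in double quotes.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# User-agent substrings to look for. "Googlebot" is an assumption; the post
# only says "Google's bot".
BOT_TOKENS = ("GPTBot", "ClaudeBot", "Googlebot")
WATCHED_PATHS = ("/robots.txt", "/llms.txt")


def count_bot_requests(lines):
    """Count requests per (bot, path) for the watched paths only."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path not in WATCHED_PATHS:
            continue
        agent = match.group("agent").lower()
        for token in BOT_TOKENS:
            if token.lower() in agent:
                counts[(token, path)] += 1
                break
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [02/May/2025:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [02/May/2025:11:00:00 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [02/May/2025:11:05:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.9 - - [02/May/2025:11:06:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.44 - - [02/May/2025:11:07:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

counts = count_bot_requests(SAMPLE)
for (bot, path), total in sorted(counts.items()):
    print(f"{bot} {path}: {total}")
```

Matching on a user-agent substring is easy to spoof, so a real analysis would probably also verify the requesting IP ranges. The sketch skips that step.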

Key points

  • The author says they reviewed 11 million crawler log events from 34 websites over three months.
  • GPTBot reportedly checked robots.txt only a few times but kept asking for /llms.txt.
  • Google’s bot reportedly rechecked robots.txt thousands of times.
  • ClaudeBot visits reportedly rose from 7.3k in April to 64k in May and 168k in the first ten days of June.
  • AI visits linked to user questions may fetch the one page that answers the question.
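Because the June figure covers only ten days, the ClaudeBot totals are easier to compare as daily averages. The short Python sketch below does that arithmetic. It assumes "k" means thousands, that the rounded figures from the post are close enough to work with, and that the April and May figures cover the full months.

```python
# Reported ClaudeBot visits and the number of days each period covers.
# Visit counts are the rounded figures from the post (7.3k, 64k, 168k).
REPORTED = {
    "April": (7_300, 30),
    "May": (64_000, 31),
    "June (first 10 days)": (168_000, 10),
}


def daily_rates(reported):
    """Return average visits per day for each reporting period."""
    return {period: visits / days for period, (visits, days) in reported.items()}


rates = daily_rates(REPORTED)
for period, rate in rates.items():
    print(f"{period}: about {rate:,.0f} visits per day")
```

On these assumptions the daily average goes from roughly 240 in April to about 2,100 in May and 16,800 in early June, so the increase is steeper than the raw totals suggest.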

Quick term guide

crawler
A program that automatically visits websites and reads pages.
robots.txt
A small text file on a website that tells bots which pages they are or aren't allowed to visit.
/llms.txt
A proposed plain-text file, served at the path /llms.txt, meant to help AI language models quickly read and understand a website’s content.
server
A computer that hosts a website’s files and sends them to the browsers and crawlers that request them.