Most sites are wide open to bots — here's why that matters
A Reddit user scanned a set of websites and found that most had no protection against automated bots. Any bot, including AI data scrapers, can freely crawl these sites. A quick check of your own site's settings could prevent unwanted data collection.
A user on the r/StopBadBots subreddit scanned multiple websites and found that a large portion had no meaningful bot-blocking measures in place. Common gaps included missing or permissive robots.txt files and no User-Agent filtering, meaning automated programs can access and copy site content without any restriction.
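To make the idea of User-Agent filtering concrete, here is a minimal sketch in Python. It is not the method used in the Reddit scan. The function name `is_blocked` and the blocklist are made up for illustration; GPTBot and CCBot are crawler names that their operators publish, while the other entries are just examples of common scripting tools. Keep in mind that a bot can lie about its User-Agent, so this only stops bots that identify themselves honestly.

```python
from typing import Optional

# Hypothetical blocklist: lowercase substrings matched against the User-Agent header.
# These entries are illustrative assumptions, not a recommended or complete list.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot", "python-requests", "curl")


def is_blocked(user_agent: Optional[str]) -> bool:
    """Return True if a request should be refused based on its User-Agent header."""
    if not user_agent or not user_agent.strip():
        # A missing or blank User-Agent is treated as suspicious in this sketch.
        return True
    lowered = user_agent.lower()
    # Block when any listed token appears anywhere in the header value.
    return any(token in lowered for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    print(is_blocked("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0"))
```

In a real site this check would run inside the web server or application before a page is served, with blocked requests typically receiving a 403 response.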
This matters because AI training crawlers, price scrapers, and malicious bots all benefit from sites that don't push back. If you run a website, it's worth checking your robots.txt file and considering a service like Cloudflare to filter out unwanted bot traffic before it becomes a problem.
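One way to check a robots.txt file is with Python's standard `urllib.robotparser` module. The sketch below is an illustration, not a complete audit: the function name `allowed_crawlers`, the short `AI_CRAWLERS` list, and the `https://example.com/` address are assumptions chosen for the example. It takes the text of a robots.txt file and reports which of the listed crawlers would still be allowed to fetch the home page.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Illustrative list of crawler names to test; extend it with the bots you care about.
AI_CRAWLERS = ("GPTBot", "CCBot")


def allowed_crawlers(robots_txt: str, site_url: str = "https://example.com/") -> List[str]:
    """Return the crawlers from AI_CRAWLERS that the given robots.txt text lets fetch site_url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if parser.can_fetch(agent, site_url)]


if __name__ == "__main__":
    # An empty file (or no file at all) places no restriction on any crawler.
    print(allowed_crawlers(""))

    # A file that turns GPTBot away but leaves everyone else unrestricted.
    rules = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(allowed_crawlers(rules))
```

To check a live site, the same parser can load the file itself with `set_url()` followed by `read()`. A result listing a crawler only means robots.txt does not ask it to stay away; bots that ignore the file need to be blocked by other means.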
Key points
- Many websites have no bot-blocking rules set up at all
- Without a robots.txt file, even rule-following bots get no instruction to stay out; the file is only advisory, so bots that ignore it have to be blocked another way
- AI training data collectors are among the bots that benefit from open sites
- Services like Cloudflare can filter bot traffic at the network level
- Check your own site's robots.txt and bot protection settings now
Quick term guide
- scraper
- An automated program that visits websites and copies their content in bulk.
- subreddit
- A topic-specific community inside Reddit where people post and discuss related content.
- robots.txt
- A small text file on a website that tells bots which pages they are or aren't allowed to visit. Following it is voluntary, so it only stops bots that choose to obey it.
- User-Agent filtering
- A way to check the name a visiting program reports about itself (its User-Agent header) and block it if it looks like a bot.
- Cloudflare
- A service that protects websites and manages web traffic.