Our software does not block IP addresses of any AI crawler. However, users can choose to block these IP addresses at their discretion using our platform.
Fraud Blocker uses a combination of publicly available IP address blocks and User-Agent strings (a short bit of text that tells your server “who” is making the request) to determine if a visitor to your website is an AI crawler. This is an extremely fast-moving space, so these indicators are subject to change.
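To illustrate how a User-Agent check and an IP range check can be combined, here is a minimal Python sketch. It is not Fraud Blocker's actual implementation. The function name `classify_visit`, the labels it returns, and the IP range are assumptions made for illustration. The range 192.0.2.0/24 is a reserved documentation block, not a real crawler range; real ranges would come from each vendor's published list.

```python
import ipaddress

# Hypothetical placeholder data for this sketch only.
# 192.0.2.0/24 is a reserved documentation range, not a real crawler range.
KNOWN_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
# Lowercase User-Agent tokens to look for (GPTBot is one of the crawlers in this article).
KNOWN_UA_TOKENS = ["gptbot"]


def classify_visit(user_agent: str, ip: str) -> str:
    """Label a request using both indicators: User-Agent token and source IP."""
    ua = user_agent.lower()
    ua_match = any(token in ua for token in KNOWN_UA_TOKENS)

    address = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    ip_match = any(address in network for network in KNOWN_RANGES)

    if ua_match and ip_match:
        return "ai_crawler"           # both indicators agree
    if ua_match or ip_match:
        return "possible_ai_crawler"  # only one indicator matched
    return "other"
```

Treating a single matching indicator as only "possible" reflects the fact that a User-Agent string is set by the client and can be spoofed, while an IP range on its own may be shared with other traffic.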
Once crawlers are detected, you can view them in your Fraud Blocker reports.
Below is a list of the AI crawlers detected by Fraud Blocker
1. GPTBot
Purpose: Used by OpenAI to gather publicly available web data to improve their language models like GPT-4 and GPT-4o. This includes both their general crawler and their user-requested crawler.
User-Agent:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
IP Ranges:
https://platform.openai.com/docs/bots
2. Anthropic AI
Purpose: Used by Anthropic to train models such as Claude.
User-Agent:
anthropic-ai/1.0
3. Perplexity AI
Purpose: Gathers real-time data for Perplexity’s conversational search engine.
User-Agent:
PerplexityBot/1.0 (+https://www.perplexity.ai/bot)
IP Ranges:
https://docs.perplexity.ai/guides/bots
4. DuckAssist
Purpose: Powers DuckDuckGo’s instant AI answers by summarizing content using LLMs.
User-Agent:
DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)
5. NovaAct AI Bot
Purpose: Created by Amazon, NovaAct is used to power AI search and summarization.
User-Agent:
NovaBot/1.0 (+https://novaapp.ai/bot)
IP Ranges:
N/A
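The User-Agent strings listed above can be turned into a simple lookup, sketched here in Python. The tokens are copied from those strings. The case-insensitive substring match and the helper name `identify_crawler` are assumptions of this sketch, not a description of how Fraud Blocker matches them.

```python
from typing import Optional

# Lowercase tokens taken from the User-Agent strings in the list above,
# mapped to the crawler names used in this article.
UA_TOKENS = {
    "gptbot": "GPTBot",
    "anthropic-ai": "Anthropic AI",
    "perplexitybot": "Perplexity AI",
    "duckduckbot": "DuckAssist",
    "novabot": "NovaAct AI Bot",
}


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler name whose token appears in the User-Agent, if any."""
    ua = user_agent.lower()
    for token, name in UA_TOKENS.items():
        if token in ua:
            return name
    return None  # no listed AI crawler token found
```

Because these indicators are subject to change, a table like this would need to be kept in sync with each vendor's current documentation.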
AI crawlers not available today
Some crawlers are not available today on Fraud Blocker. Below is a list of them and the reasons we don't include them on our platform:
Google AI Crawler
Reason: Google does not yet provide a stand-alone bot for their Gemini AI. Its crawling is currently included with their general Googlebot.
Amazonbot
Reason: Mostly used to gather content for Alexa. The NovaAct bot (shown above) is generally the one used for their AI products.
Meta AI crawler
Reason: Meta uses their crawler for indexing and potentially for LLM training (e.g. LLaMA). We are awaiting more clarity.
Applebot
Reason: Currently used for Siri and Spotlight search results, not AI products.