How do you track AI crawlers (LLM bots)?

Our software does not block IP addresses of any AI crawler. However, users can choose to block these IP addresses at their discretion using our platform.

Fraud Blocker uses a combination of publicly available IP address blocks and User-Agent strings (a short bit of text that tells your server “who” is making the request) to determine whether a visitor to your website is an AI crawler. This is an extremely fast-moving space, so these indicators are subject to change.
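
As a rough illustration of how the two signals can be combined, here is a minimal Python sketch. It is not Fraud Blocker's actual implementation: the `CRAWLER_SIGNATURES` table and the `identify_ai_crawler` function are hypothetical names, and the IP blocks are reserved documentation ranges used as placeholders rather than real crawler ranges.

```python
import ipaddress
from typing import Optional

# Hypothetical signatures for illustration only. The CIDR blocks below are
# reserved documentation ranges (RFC 5737), NOT real crawler IP ranges.
CRAWLER_SIGNATURES = {
    "GPTBot": {"ua_token": "gptbot", "networks": ["192.0.2.0/24"]},
    "PerplexityBot": {"ua_token": "perplexitybot", "networks": ["198.51.100.0/24"]},
}


def identify_ai_crawler(user_agent: str, remote_ip: str) -> Optional[dict]:
    """Return the first crawler whose User-Agent token or IP block matches."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        ip = None  # Malformed address: rely on the User-Agent signal only.

    ua = user_agent.lower()
    for name, sig in CRAWLER_SIGNATURES.items():
        # Signal 1: the crawler's token appears in the User-Agent string.
        ua_match = sig["ua_token"] in ua
        # Signal 2: the request IP falls inside one of the published blocks.
        ip_match = ip is not None and any(
            ip in ipaddress.ip_network(cidr) for cidr in sig["networks"]
        )
        if ua_match or ip_match:
            return {"name": name, "ua_match": ua_match, "ip_match": ip_match}
    return None
```

Returning both flags lets the caller decide how strict to be. A User-Agent match without an IP match may indicate a spoofed User-Agent, since that header is set by the client.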

Once a crawler is detected, you can view it in your Fraud Blocker reports.

Below is a list of the AI crawlers detected by Fraud Blocker:

1. GPTBot

Purpose: Used by OpenAI to gather publicly available web data to improve their language models like GPT-4 and GPT-4o. This includes both their general crawler and their user-requested crawler.

User-Agent:
```
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
```

IP Ranges:
https://platform.openai.com/docs/bots

2. Anthropic AI

Purpose: Used by Anthropic to train models such as Claude.
User-Agent:
```
anthropic-ai/1.0
```
IP Ranges:
https://docs.anthropic.com/en/api/ip-addresses

3. Perplexity AI

Purpose: Gathers real-time data for Perplexity’s conversational search engine.

User-Agent:
```
PerplexityBot/1.0 (+https://www.perplexity.ai/bot)
```

IP Ranges:
https://docs.perplexity.ai/guides/bots

4. DuckAssist

Purpose: Powers DuckDuckGo’s instant AI answers by summarizing content using LLMs.

User-Agent:
```
DuckDuckBot/1.0; (+http://duckduckgo.com/duckduckbot.html)
```

IP Ranges:
https://duckduckgo.com/duckduckgo-help-pages/results/duckassistbot

5. NovaAct AI Bot

Purpose: Created by Amazon, NovaAct is used to power AI search and summarization.
User-Agent:
```
NovaBot/1.0 (+https://novaapp.ai/bot)
```
IP Ranges:
N/A
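
The User-Agent strings listed above each carry a `name/version` product token. The Python sketch below shows one possible way to pull that token out of a raw header. The `KNOWN_TOKENS` list is copied from the strings in this article, while `TOKEN_PATTERN` and `extract_crawler_token` are hypothetical names used only for illustration.

```python
import re
from typing import Optional, Tuple

# Product tokens copied from the User-Agent strings listed in this article.
KNOWN_TOKENS = ["GPTBot", "anthropic-ai", "PerplexityBot", "DuckDuckBot", "NovaBot"]

# Match "<token>/<version>", ignoring case. The lookbehind prevents a match
# when the token is only the tail of a longer word (e.g. "NotGPTBot/1.0").
TOKEN_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(token) for token in KNOWN_TOKENS)
    + r")/(\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def extract_crawler_token(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (token, version) for the first known crawler token, else None."""
    match = TOKEN_PATTERN.search(user_agent)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Because these indicators change often, a list like `KNOWN_TOKENS` would need to be kept up to date with each vendor's published documentation.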

AI crawlers not available today

Some crawlers are not available on Fraud Blocker today. Below is a list of them and the reasons we don't include them on our platform:

Google AI Crawler

Reason: Google does not yet provide a stand-alone bot for its Gemini AI. Its AI crawling is currently included with the general Googlebot.

Amazonbot

Reason: Mostly used to gather content for Alexa. Amazon's AI crawling is generally handled by the NovaAct bot (shown above).

Meta AI crawler

Reason: Meta uses its crawler for indexing and potentially for LLM training (e.g. LLaMA). We are awaiting more clarity on how it is used.

Applebot

Reason: Currently used for Siri and Spotlight search results, not AI products.
