
How do you track AI crawlers (LLM bots)?

Learn how we identify OpenAI, Anthropic, Perplexity and other AI bots and crawlers

Written by Mike
Updated over a month ago

Our software does not automatically block the IP addresses of any AI crawler. However, users can choose to block these IP addresses at their discretion using our platform.

Fraud Blocker uses a combination of publicly available IP address blocks and User-Agent strings (a short bit of text that tells your server “who” is making the request) to determine whether a visitor to your website is an AI crawler. This is an extremely fast-moving space, so these indicators are subject to change.
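
To illustrate how the two signals above can work together, here is a minimal sketch that flags a request as an AI crawler when its User-Agent contains a known bot token or its IP address falls inside a published range, and only blocks it if the site owner has opted in. This is not Fraud Blocker's implementation. The bot tokens come from the User-Agent strings listed in this article; the IP ranges are hypothetical placeholders (RFC 5737 documentation addresses) standing in for the ranges each crawler operator actually publishes, and the function names are illustrative.

```python
import ipaddress
from typing import Optional

# Bot tokens taken from the User-Agent strings listed in this article,
# mapped to the crawler names used in the list.
UA_TOKENS = {
    "GPTBot": "GPTBot",
    "PerplexityBot": "Perplexity AI",
    "NovaBot": "NovaAct AI Bot",
}

# HYPOTHETICAL placeholder ranges (RFC 5737 documentation blocks).
# Replace these with the ranges each crawler operator publishes.
IP_RANGES = {
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],
    "Perplexity AI": [ipaddress.ip_network("198.51.100.0/24")],
}


def detect_ai_crawler(ip: str, user_agent: str) -> Optional[str]:
    """Return the crawler name if either signal matches, else None."""
    # Signal 1: a known bot token appears in the User-Agent header.
    # Headers are easy to spoof, which is why an IP check is also useful.
    ua = user_agent.lower()
    for token, name in UA_TOKENS.items():
        if token.lower() in ua:
            return name
    # Signal 2: the source IP falls inside a published range.
    # An IPv6 address never matches an IPv4 network (the check returns False).
    addr = ipaddress.ip_address(ip)
    for name, networks in IP_RANGES.items():
        if any(addr in net for net in networks):
            return name
    return None


def should_block(ip: str, user_agent: str, blocked: frozenset = frozenset()) -> bool:
    """Detection alone never blocks: only crawlers the user opted to block are blocked."""
    name = detect_ai_crawler(ip, user_agent)
    return name is not None and name in blocked
```

With no crawlers opted in, `should_block` always returns `False`, mirroring the default described above; passing `frozenset({"GPTBot"})` blocks only requests detected as GPTBot.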

Once detected, these crawlers appear in your Fraud Blocker reports.

Below is a list of the AI crawlers detected by Fraud Blocker:

1. GPTBot

  • Purpose: Used by OpenAI to gather publicly available web data to improve its language models, such as GPT-4 and GPT-4o. This entry includes both OpenAI's general crawler and its user-requested crawler.

  • User-Agent:

    Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

2. Anthropic AI

3. Perplexity AI

  • Purpose: Gathers real-time data for Perplexity’s conversational search engine.

  • User-Agent:

    PerplexityBot/1.0 (+https://www.perplexity.ai/bot)

4. DuckAssist

5. NovaAct AI Bot

  • Purpose: Created by Amazon, NovaAct is used to power AI search and summarization.

  • User-Agent:

    NovaBot/1.0 (+https://novaapp.ai/bot)

  • IP Ranges:

    N/A
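
The three User-Agent strings listed above share the same shape: a `Name/Version` bot token plus a `+URL` contact link. Assuming that pattern holds, a short sketch like the one below can pull out the crawler name and version, for example to group hits in a report. The regular expression, `KNOWN_BOTS`, and `parse_bot_token` are illustrative assumptions, not Fraud Blocker's parser.

```python
import re
from typing import Optional, Tuple

# Bot tokens that appear in the User-Agent strings listed in this article.
KNOWN_BOTS = ("GPTBot", "PerplexityBot", "NovaBot")

# Matches a whole-word token such as "GPTBot/1.0" anywhere in the header.
# Matching is case-sensitive, so the lowercase "gptbot" in OpenAI's URL is ignored.
BOT_PATTERN = re.compile(r"\b(" + "|".join(KNOWN_BOTS) + r")/(\d+(?:\.\d+)*)")


def parse_bot_token(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (bot_name, version) for the first known bot token, else None."""
    match = BOT_PATTERN.search(user_agent)
    return (match.group(1), match.group(2)) if match else None
```

For the GPTBot string above, `parse_bot_token` returns `("GPTBot", "1.0")`; an ordinary browser User-Agent returns `None`.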


AI crawlers not available today

Some crawlers are not currently available on Fraud Blocker. Below is a list of them and the reasons we don't include them on our platform:

Google AI Crawler

  • Reason: Google does not yet provide a stand-alone bot for its Gemini AI; crawling for Gemini is currently included in the general Googlebot.

Amazonbot

  • Reason: Amazonbot is mostly used to gather content for Alexa. The NovaAct bot (listed above) is the one generally used for Amazon's AI products.

Meta AI crawler

  • Reason: Meta uses its crawler for indexing and potentially for LLM training (e.g., LLaMA). We are awaiting more clarity on how it is used.

Applebot

  • Reason: Currently used for Siri and Spotlight search results, not AI products.
