Report on LLM and AI-generated traffic
This use case article explores how to use the Customer Journey Analytics derived fields capability as a foundation to report on LLM (Large Language Model) and AI-generated traffic.
Detection methods
To detect LLM and AI-generated traffic, distinguish between:
- LLM crawlers: Collect data for training and retrieval augmented generation (RAG).
- AI agents: Function as interfaces that perform task on behalf of humans. AI agents prefer to interact via APIs, which bypasses web analytics tracking methods. Nonetheless, you can still analyze a significant portion of AI-generated traffic through websites.
Three common core detection methods to identify and monitor LLM and AI-generated traffic are:
- User agent identification: When a request is made to your server, the HTTP User-Agent header is extracted and analyzed against known AI crawler and agent patterns. This server-side method requires access to HTTP headers and is most effective when implemented at the data collection layer.
- Referrer classification: The HTTP Referrer header contains the URL of the previous webpage that linked to the current request. This header reveals when users click through to your site from web interfaces like ChatGPT or Perplexity.
- Query parameter detection: AI services can append URL parameters (particularly UTM parameters) to links. These parameters persist in the URL and can be detected through standard analytics implementations, making these URL parameters valuable indicators even in client-side tracking scenarios.
The following table illustrates how the detection methods can be used against different LLM and AI interaction scenarios.
GPTBot
, ClaudeBot
, and more) can be identified when server-side logging is implemented.ChatGPT-User
, claude-web
) can be identified when server-side logging captures headers.OAI-SearchBot
, PerplexityBot
) can be identified with server-side logging.Challenges
LLM & AI agents demonstrate complex and evolving behaviors when interacting with digital properties. These technologies operate inconsistently across platforms and versions. This inconsistency creates unique challenges for data professionals. The behavioral patterns vary significantly and depend on the specific AI platform, version, and interaction mode used. This operational diversity complicates efforts to track and categorize LLM and AI-generated traffic within standard analytics frameworks. The complex nature of these interactions, combined with their rapid evolution, requires nuanced detection and classification methods to maintain data integrity:
- Partial data collection: Some newer AI agents execute limited JavaScript, resulting in incomplete analytics data for client-side implementations. As a result, some interactions are tracked while other interactions are missed.
- Inconsistent session data: AI agents might execute JavaScript differently across sessions or page types. This execution difference creates fragmented user journeys in Customer Journey Analytics for client-side implementations.
- Detection challenges: With partial tracking, detection becomes unreliable as certain touchpoints might be invisible to analytics.
Detection signatures
As of August 2025, the following specific signals can be identified for each of the detection methods.
User agent identification
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/2.0; +https://openai.com/bot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ClaudeBot/1.0; +claudebot@anthropic.com
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; +Claude-User@anthropic.com)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +Claude-SearchBot@anthropic.com)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://www.perplexity.ai/useragent)
Mozilla/5.0 (compatible; Google-Extended/1.0; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; BingBot/1.0; +http://www.bing.com/bot.html)
Mozilla/5.0 (compatible; DuckAssistBot/1.0; +http://www.duckduckgo.com/bot.html)
Mozilla/5.0 (compatible; YouBot (+http://www.you.com))
Mozilla/5.0 (compatible; meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler))
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
Mozilla/5.0 (compatible; Applebot-Extended/1.0; +http://www.apple.com/bot.html)
Mozilla/5.0 (compatible; Bytespider/1.0; +http://www.bytedance.com/bot.html)
Mozilla/5.0 (compatible; MistralAI-User/1.0; +https://mistral.ai/bot)
Mozilla/5.0 (compatible; cohere-ai/1.0; +http://www.cohere.ai/bot.html)
Referrer classification
Query parameter detection
Implementation
You can report on LLM and AI-generated traffic within a typical Customer Journey Analytics setup (connection, data views, and workspace projects) through the specific setup and configuration of derived fields, segments, and workspace projects.
Derived fields
To configure detection methods and detection signals use derived fields as the foundation. For example, define derived fields for user agent identification, query parameter detection, and referrer classification.
LLM/AI user agent identification
Use the Case When derived field functions to define a derived field that identifies LLM/AI user agents.
LLM/AI query parameter detection
Use the URL Parse and Classify derived field functions to define a derived field that detects query parameters.
LLM/AI referrer classification
Use the URL Parse and Classify derived field functions to define a derived field that classifies referrers.
Segments
Set up dedicated segments that help you to identify events, sessions or people related to LLM and AI-generated traffic. For example, use the derived fields that you created earlier to define a segment that identifies LLM and AI-generated traffic.
Workspace project
Use the derived fields and segments to report and analyze on LLM and AI-generated traffic. For example, see the annotated project below.