Cloudflare Enhances AI Crawler Controls to Protect Content Creators

Cloudflare has rolled out a significant update to its bot management capabilities, introducing controls that give website owners more say over how artificial intelligence (AI) crawlers access their content. The new features sort AI traffic into three categories: Search, Agent, and Training. This granular approach is meant to help content creators protect their work from unauthorized scraping and use in AI model training while still allowing legitimate AI-driven services to function.

Previously, bot management often came down to a binary decision: allow or block all automated traffic. Cloudflare's new framework recognizes that different AI crawlers serve different purposes. By classifying bots by function (search engine indexing, agent-based interaction, or training data collection), website administrators can now apply more nuanced policies. This addresses a growing concern among content creators who want compensation for their work and control over its use without resorting to blanket bans on all automation.

The default settings will change beginning September 15, 2026. For new domains, Training and Agent crawlers will be blocked by default on pages that display advertisements. Search crawlers, such as those used by major search engines, will continue to be allowed by default. The shift is designed to protect ad-revenue-generating content from being indiscriminately scraped for AI training.
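
As a rough illustration, the defaults described above can be modeled as a small lookup. This is a sketch only: the names Category, default_action, and has_ads are hypothetical and are not part of any Cloudflare API, and the sketch assumes that pages without advertisements remain open to all categories, which the announcement does not spell out.

```python
from enum import Enum


class Category(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def default_action(category: Category, has_ads: bool) -> str:
    """Return "allow" or "block" under the described defaults for new domains."""
    # Search crawlers continue to be allowed by default.
    if category is Category.SEARCH:
        return "allow"
    # Training and Agent crawlers are blocked by default on pages with ads.
    if has_ads:
        return "block"
    # Assumption: pages without ads are not affected by the new default.
    return "allow"


if __name__ == "__main__":
    for category in Category:
        print(category.value, default_action(category, has_ads=True))
```

Keeping the rule in one function makes the assumption about ad-free pages easy to change if the actual behavior turns out to differ.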

Cloudflare also highlighted the challenge posed by multi-purpose crawlers whose functions span multiple categories. Under the new policy, these crawlers will be evaluated against all applicable policies. For instance, if a website blocks Training crawlers, a crawler that collects training data and also performs Search functions (like Googlebot or Bingbot) will be blocked by default. Website owners who prefer the current behavior can opt out of the new default settings before the September 15 deadline.
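
The "evaluated against all applicable policies" rule can be sketched as a set intersection: a crawler is blocked as soon as any one of its categories is blocked. The function name evaluate, the category strings, and the example crawler are hypothetical, chosen for illustration rather than taken from Cloudflare's implementation.

```python
def evaluate(crawler_categories: set, blocked_categories: set) -> str:
    """Return "block" if any category the crawler belongs to is blocked."""
    # A multi-purpose crawler must pass every policy that applies to it,
    # so a single blocked category is enough to block the whole crawler.
    if crawler_categories & blocked_categories:
        return "block"
    return "allow"


if __name__ == "__main__":
    # Hypothetical crawler that both indexes for search and collects training data.
    multi_purpose = {"search", "training"}
    print(evaluate(multi_purpose, {"training"}))  # block
    print(evaluate({"search"}, {"training"}))     # allow
```

Under this reading, allowing Search does not rescue a crawler that also falls under a blocked category, which matches the example given in the announcement.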

Beyond access controls, Cloudflare is enhancing visibility and management for its Enterprise Bot Management customers through BotBase. This searchable database provides a centralized view of known bots, including their classifications and behaviors, based on Cloudflare's updated taxonomy. Administrators can use BotBase to identify specific bots, filter traffic, and create custom security rules, offering a more comprehensive understanding of automated traffic patterns.

Further extending control, Cloudflare is introducing content use policies for Enterprise customers. These policies allow administrators to define how crawled content can be used after retrieval, with options ranging from 'Immediate' (no storage or reuse) to 'Reference' (indexing and excerpts) and 'Full' (summaries or reproduction). robots.txt will be extended with a new 'use' parameter to express these preferences. Cloudflare will monitor compliance through BotBase and may revoke 'Verified' status for bots that disregard the declared uses.
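
One way to reason about the three content use levels is as an ordered scale. The sketch below assumes the levels are cumulative, so that 'Full' includes everything 'Reference' permits and 'Reference' includes everything 'Immediate' permits; the announcement describes a range but does not confirm this. ContentUse and is_permitted are hypothetical names, and the sketch deliberately does not model the robots.txt syntax, which is not given here.

```python
from enum import IntEnum


class ContentUse(IntEnum):
    IMMEDIATE = 0  # no storage or reuse
    REFERENCE = 1  # indexing and excerpts
    FULL = 2       # summaries or reproduction


def is_permitted(declared: ContentUse, intended: ContentUse) -> bool:
    """Check whether a bot's intended use stays within the site's declared level."""
    # Assumption: levels are cumulative, so any use at or below the
    # declared level is acceptable.
    return intended <= declared


if __name__ == "__main__":
    site_policy = ContentUse.REFERENCE
    print(is_permitted(site_policy, ContentUse.IMMEDIATE))  # True
    print(is_permitted(site_policy, ContentUse.FULL))       # False
```

A compliance check of this kind is what would flag a bot that reproduces content from a site that declared only 'Reference' use.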

To foster trust and transparency in the evolving AI landscape, Cloudflare is also proposing a transitive trust model for AI agents. The model would let website owners apply access policies based on the original bot operator, even when requests are routed through intermediaries. By relying on standard HTTP headers, the approach seeks to provide a clearer chain of trust and accountability for AI-driven web interactions, though Cloudflare acknowledges it may not suit all privacy-sensitive use cases.