Cloudflare changes AI crawler access rules

Cloudflare introduced new controls that let website owners manage AI traffic across three categories: Search, Agent, and Training. The feature is available to all Cloudflare customers, including those on the Free plan, and gives website owners more control over how different types of AI crawlers access their content.

“Content owners still want to be able to protect their content, and they should be compensated for the original content that they work hard to create, curate, and share. We also know that locking down content isn’t a one-size-fits-all solution; website owners want more options than resorting to ‘block all automation, every time,’” Jin-He Lee, Product Manager, and Bryan Becker, Director of Product at Cloudflare, explained.

New AI crawler controls

The company expanded its bot controls beyond a simple allow-or-block model. Instead of classifying bots as AI or non-AI, it now categorizes them by function, including Search, Agent, and Training, allowing website owners to apply separate policies to each.


The updated framework accounts for how bots use content after crawling it. Cloudflare encourages providers to run separate crawlers for different functions, which improves transparency and makes access control more precise.

Starting September 15, 2026, the default settings for new domains will change. Training and Agent crawlers will be blocked by default on pages that display ads, while Search crawlers will remain allowed by default.

Multi-purpose crawlers that perform both Search and Training functions will be evaluated under both policies. If a website blocks Training crawlers, multi-purpose crawlers such as Googlebot, Applebot, and BingBot will be blocked, even when Search crawlers are allowed. This change takes effect on September 15, 2026.
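The multi-purpose rule can be illustrated with a minimal Python sketch. It is an illustration of the logic described above, not Cloudflare's implementation: the `Category` enum, the `is_allowed` function, and the policy dictionary are all hypothetical names chosen for this example.

```python
from enum import Enum


class Category(Enum):
    """The three functional categories named in the article."""
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


def is_allowed(bot_categories, policy):
    """Return True only if every category the bot belongs to is allowed.

    `policy` maps each Category to True (allow) or False (block).
    A crawler that serves several functions is checked against all of
    them, so a single blocked category blocks the crawler.
    """
    return all(policy[category] for category in bot_categories)


# Hypothetical site policy: Search allowed, Agent and Training blocked.
policy = {
    Category.SEARCH: True,
    Category.AGENT: False,
    Category.TRAINING: False,
}

search_only = {Category.SEARCH}
multi_purpose = {Category.SEARCH, Category.TRAINING}

print(is_allowed(search_only, policy))    # True
print(is_allowed(multi_purpose, policy))  # False
```

Under this reading, a dedicated search crawler passes, while a crawler that also gathers training data is blocked by the Training policy even though Search is allowed.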

Website owners who do not want the new default settings can opt out through the Security settings before September 15, 2026. Opting out preserves the current behavior for multi-purpose crawlers that perform both Search and Training functions. Cloudflare said it will notify customers before the change so they have time to review and update their settings.

BotBase expands Enterprise Bot Management visibility

BotBase is a searchable database of known bots, including Verified Bots and AI agents. It gives Enterprise Bot Management customers a centralized view of how those bots are classified and how they behave, based on Cloudflare’s updated bot taxonomy.

With BotBase, administrators can browse Cloudflare’s catalog of verified bots, search for specific bots, view their classifications, filter traffic by individual bots, and copy detection IDs for use in security rules.

The company plans to add more controls later this year, allowing customers to manage automated traffic from the same interface.

Content use controls

Cloudflare is adding content use controls that let Enterprise Bot Management customers define how bots may use content after crawling it. The three levels are Immediate (no storage or reuse), Reference (indexing, excerpts, and links back), and Full (summaries or reproduction).

The company is extending its Content Signals format in robots.txt with a new use parameter to express these preferences. Even though robots.txt does not enforce the setting, Cloudflare will report whether Verified Bots comply with the declared preferences through BotBase. Verified Bots that ignore these preferences or reproduce content in full may lose their Verified status.
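As a rough illustration, the Python sketch below parses a Content Signals line from robots.txt and checks a requested use against the declared level. The article does not give the exact syntax of the new use parameter, so the `use=reference` token on the `Content-Signal` line is an assumption. Treating the three levels as an ordered scale from most to least restrictive is also an assumption, and the function names are hypothetical.

```python
# Assumed ordering, most restrictive first. The article names the three
# levels but does not say that they form a strict hierarchy.
USE_LEVELS = ("immediate", "reference", "full")


def parse_content_signal(line):
    """Parse one 'Content-Signal: key=value, key=value' line into a dict."""
    name, _, value = line.partition(":")
    if name.strip().lower() != "content-signal":
        raise ValueError("not a Content-Signal line")
    signals = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if sep:  # skip fragments that have no '='
            signals[key.strip().lower()] = val.strip().lower()
    return signals


def use_permits(declared, requested):
    """True if the requested use is no broader than the declared level."""
    return USE_LEVELS.index(requested) <= USE_LEVELS.index(declared)


# Hypothetical robots.txt line; the 'use' token syntax is assumed.
signals = parse_content_signal(
    "Content-Signal: search=yes, ai-train=no, use=reference"
)
print(signals["use"])                         # reference
print(use_permits(signals["use"], "full"))    # False
```

A check like this only expresses the site's preference. As the article notes, robots.txt does not enforce anything on its own.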

Previously, all Verified Bots were allowed by default. Under the new model, verification confirms only a bot’s identity. Access depends on its classification and the website owner’s policies, including whether Search, Agent, or Training crawlers are permitted.

Cloudflare also plans to make the verification process more transparent and is developing tools that will help bot operators manage their classifications and Verified status.

Transitive trust for AI agents

Cloudflare proposed a transitive trust model for AI agents and bots operating through third-party platforms. The company plans to use the standard HTTP Forwarded header to identify the original requester behind an intermediary. The header can also include the operator’s declared content use, such as use=reference.

This would let website owners apply trust and access policies based on the original bot operator instead of the intermediary handling the request. The company acknowledged that the approach may not suit every use case, particularly where privacy or anonymous access is important.
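To make the idea concrete, here is a minimal Python sketch that parses a Forwarded header value (the format defined in RFC 7239) and picks out the first hop, which conventionally describes the original requester. How Cloudflare will actually encode the operator's identity is not specified in the article. The example header, the placement of `use=reference` as an extension parameter, and the function names are assumptions for illustration only.

```python
def parse_forwarded(header_value):
    """Split a Forwarded header value into a list of per-hop dicts.

    Hops are comma-separated and each hop holds semicolon-separated
    key=value pairs. This simplified parser strips surrounding quotes
    but does not handle commas or semicolons inside quoted values.
    """
    hops = []
    for element in header_value.split(","):
        params = {}
        for pair in element.split(";"):
            key, sep, val = pair.partition("=")
            if not sep:
                continue  # ignore fragments without '='
            params[key.strip().lower()] = val.strip().strip('"')
        if params:
            hops.append(params)
    return hops


def original_requester(header_value):
    """Return the first hop (the original requester), or None if absent."""
    hops = parse_forwarded(header_value)
    return hops[0] if hops else None


# Hypothetical header: the first hop is the original requester, the
# second is the intermediary platform that relayed the request.
header = "for=192.0.2.60;use=reference, for=198.51.100.17"
origin = original_requester(header)
print(origin["for"])      # 192.0.2.60
print(origin.get("use"))  # reference
```

A site could then key its trust and access policy on the first hop instead of on the intermediary that delivered the request. Header values are supplied by the sender, so any real deployment would also need a way to verify them.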
