
How to Deal with Cloudflare Known Bots

Published by Manuel Campos on December 30, 2025 • Updated on December 30, 2025

Cloudflare firewall rule: Match Known Bots

According to Cloudflare’s 2025 data, approximately 30% of all global web traffic is generated by bots.

Not all bots are created equal. This 30% can be categorized into two distinct groups:

  • Verified Bots (the “good” bots): automated services that site owners generally want to interact with. They follow rules like robots.txt and provide a service.
  • Unverified and malicious bots (the “bad” bots): the majority of the unwanted traffic. They ignore robots.txt and often hide their identity.

Just because a bot is known and verified doesn’t mean you want its traffic. These are some steps to deal with that kind of traffic.


Known Bots and Robots.txt

The relationship between bots and robots.txt is best described as an Honor System.

The file itself has no technical power to stop a bot; it is simply a “Keep Off the Grass” sign that bots choose to respect or ignore before entering a site.

For known bots like Googlebot, Bingbot, or DuckDuckGo, the robots.txt file is the law. These are known as “Verified Bots” in Cloudflare’s ecosystem.

The robots.txt file is always located in the root directory of a website.

For WordPress users, keep in mind that WordPress generates a virtual file if a physical one doesn’t exist.

You can add a physical robots.txt file by uploading it to the root of your WordPress installation.

I am sure there are plugins that take care of that from the dashboard too.
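If you do add a physical file, a minimal starting point could look like this. It is just a sketch; the sitemap URL is a placeholder you would swap for your own.

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

An empty Disallow line allows everything; the sections below show how to tighten that up.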


Blocking User Agents using Robots.txt Directives

Just because a bot has been verified doesn’t mean its crawl requests are welcome.

This is how you instruct one of Meta’s bots not to crawl your website:

User-Agent: meta-webindexer/1.1
Disallow: /

Even after a bot sees the rule, it might finish the “list” of pages it had already planned to visit before stopping.

You can block as many good bots as you want using robots.txt directives so you can focus on the bad actors.

Check the bots directory to learn more.
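For example, here is a sketch of a robots.txt that turns away two common SEO crawlers. The user-agent tokens below are the ones those bots publish, so double-check them in the directory before relying on them.

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /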


Blocking Paths using Robots.txt Directives

I include directives for both existing and non-existent directories as a formal “keep out” sign for bots, while my Cloudflare firewall rules handle the actual enforcement with a hard block.

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /tag/
Disallow: /feed/
Disallow: /category/
Disallow: /search/
Disallow: /author/
Disallow: /pages/
Disallow: /blog/
Disallow: /page/

It’s my way of telling the “good” bots: “I don’t want to cause you any trouble, and I’d appreciate it if you didn’t cause me any either.” By setting these rules early, we both know where the boundaries are.
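On the enforcement side, a Cloudflare firewall expression along these lines is one possible shape for that hard block. It is only a sketch covering two of the paths listed above; pair it with a Block action and extend the list as needed.

(starts_with(http.request.uri.path, "/wp-includes/")) or (starts_with(http.request.uri.path, "/author/"))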


Cloudflare Verified Bot Categories

The Verified Bots category is a whitelist of automated services that have been manually reviewed and confirmed as “legitimate” or “helpful” by Cloudflare.

Instead of just looking at a User-Agent string (which can be easily faked), Cloudflare verifies these bots at the network level, so a scraper can’t impersonate them simply by copying a name.

(cf.verified_bot_category eq "Page Preview")

You can check a list of all the bots included in that category by visiting the bots directory.

To ensure your content is correctly ranked, shared, and monetized, your firewall should always whitelist three key categories: Search Engine Crawlers, Page Previews, and Advertising & Marketing bots.
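One way to express that whitelist as a single allow rule is a category set like the one below. This is a sketch; the category strings are the ones Cloudflare uses, but verify the exact names in your dashboard.

(cf.verified_bot_category in {"Search Engine Crawler" "Page Preview" "Advertising & Marketing"})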

A targeted crawling strategy means allowing Search Engine Crawlers in general, but specifically directing bots like Yandex to back off via robots.txt.
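Following that strategy, asking Yandex to back off is a short robots.txt group; the “Yandex” token is the one its crawlers are documented to honor.

User-agent: Yandex
Disallow: /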

If you don’t want to whitelist an entire category, you can create a rule like this, which allows a user agent that contains a specified keyword and belongs to the known bots list.

(cf.client.bot and http.user_agent contains "AhrefsBot")

That rule confirms the bot’s identity so you aren’t accidentally letting in a malicious scraper using a fake User-Agent.


Block Access to File Extensions

Since all my images were converted to WebP, I have instructed known bots not to request them.

The site doesn’t require external stylesheets or scripts either, so there is no reason for bots to fetch them.

User-agent: *
Disallow: /*.jpg$
Disallow: /*.jpeg$
Disallow: /*.png$
Disallow: /*.js$
Disallow: /*.css$

Most of the hits to those file types come from bots; once the cache is cleared and a little time has passed, a real user will rarely request those files directly.
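If you want enforcement beyond the robots.txt hint, a firewall expression along these lines is one way to hard-block known bots from fetching a couple of those extensions. It is a sketch; extend the list to taste and pair it with a Block action.

(ends_with(http.request.uri.path, ".jpg") or ends_with(http.request.uri.path, ".png")) and cf.client.bot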


Manuel Campos

I'm a WordPress enthusiast. I document my journey and provide actionable insights to help you navigate the ever-evolving world of WordPress.
