Robots.txt Generator

Build a valid robots.txt with a visual editor. Add user-agent rules, allow/disallow paths, crawl-delay, and sitemap references — then copy or download.


Live Preview

# robots.txt generated by sunnypatel.co.uk/tools/robots-generator

User-agent: *
Disallow:

How it works

What is robots.txt?

A robots.txt file tells search engine crawlers which URLs on your site they can and cannot access. It lives at the root of your domain (e.g. example.com/robots.txt) and follows the Robots Exclusion Protocol.
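A minimal file (the domain and paths here are illustrative) looks like this:

```text
# Served at https://example.com/robots.txt
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```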

User-agent matching

Each rule block targets a specific crawler via the User-agent directive. Use * to match all crawlers, or name a specific bot such as Googlebot or GPTBot. A crawler obeys only the most specific group that matches it — if a Googlebot group exists, Googlebot follows that group and ignores the * group entirely.
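For instance (paths illustrative), Googlebot here obeys only its own group, so the /search/ rule in the * group does not apply to it:

```text
User-agent: *
Disallow: /search/

User-agent: Googlebot
Disallow: /drafts/
# Googlebot may crawl /search/ — the * group is ignored once a Googlebot group matches
```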

Allow vs Disallow

Disallow: /path/ blocks crawlers from that URL prefix. Allow: /path/ overrides a broader disallow for that specific prefix. An empty Disallow: means "allow everything".
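A typical override pattern (paths illustrative) — Google and Bing resolve Allow/Disallow conflicts by longest matching path, so the more specific Allow wins:

```text
User-agent: *
Disallow: /private/
Allow: /private/press-kit/
# Everything under /private/ is blocked except /private/press-kit/
```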

Crawl-delay

The Crawl-delay directive (in seconds) asks bots to wait between requests. Googlebot ignores this — configure crawl rate in Google Search Console instead. Bing and Yandex do respect it.
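A sketch of the directive in context (the delay value is illustrative):

```text
User-agent: Bingbot
Crawl-delay: 10
# Asks Bingbot to wait roughly 10 seconds between requests
```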

Common pitfalls

  • Robots.txt does not prevent indexing — use a noindex meta tag or X-Robots-Tag header for that.
  • Blocking CSS/JS files can hurt rendering and SEO — Google needs these to understand your pages.
  • Wildcards (* in paths) and end-of-URL markers ($) are supported by Google and Bing but not all bots.
  • The file must be served from the root domain — not a subdirectory.
  • If your robots.txt returns a 5xx error, Google treats it as if all URLs are blocked.
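You can sanity-check rules like these offline with Python's standard-library parser — a minimal sketch with made-up rules and URLs. Note that urllib.robotparser applies rules in file order (first match wins) rather than Google's longest-match logic, which is why the Allow line is listed before the Disallow it overrides:

```python
from urllib import robotparser

# Hypothetical rules: block /admin/ but allow its help page.
# Allow is listed first because this parser uses first-match ordering.
rules = """\
User-agent: *
Allow: /admin/help
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/admin/secret"))  # blocked
print(rp.can_fetch("*", "https://example.com/admin/help"))    # allowed
print(rp.can_fetch("*", "https://example.com/about"))         # no rule matches, allowed
```

This is handy for regression-testing a generated file before deploying it, though for production behavior you should verify against Google's own robots.txt tester since matching semantics differ between parsers.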

AI crawler blocking

To prevent AI training crawlers from scraping your content, add specific rules for GPTBot (OpenAI), ChatGPT-User (ChatGPT browsing), Google-Extended (Gemini training), and CCBot (Common Crawl, whose datasets are widely used for AI training). Use the "Block AI Crawlers" preset above to set this up.
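The generated rules are along these lines (a sketch using the bot names listed above):

```text
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```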