Robots.txt Generator
Build a valid robots.txt with a visual editor. Add user-agent rules, allow/disallow paths, crawl-delay, and sitemap references — then copy or download.
How it works
What is robots.txt?
A robots.txt file tells search engine crawlers which URLs on your site they can and cannot access. It lives at the root of your domain (e.g. example.com/robots.txt) and follows the Robots Exclusion Protocol.
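For example, a minimal file that lets every crawler fetch everything except a single path and advertises a sitemap might look like the following (example.com, /admin/, and the sitemap URL are placeholders):

```
# https://example.com/robots.txt
User-agent: *        # applies to all crawlers
Disallow: /admin/    # keep crawlers out of this path prefix
Sitemap: https://example.com/sitemap.xml
```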
User-agent matching
Each rule block targets a specific crawler via the User-agent directive. Use * to match all crawlers, or name a specific bot such as Googlebot or GPTBot. A crawler obeys only the most specific user-agent group that matches it and ignores the rest.
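As a sketch, the file below has a catch-all group and a Googlebot-specific group; Googlebot follows only its own group, so the /private/ rule does not apply to it (both paths are illustrative):

```
User-agent: *
Disallow: /private/

# Googlebot matches this group and ignores the * group above
User-agent: Googlebot
Disallow: /drafts/
```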
Allow vs Disallow
Disallow: /path/ blocks crawlers from that URL prefix. Allow: /path/ overrides a broader disallow for that specific prefix. An empty Disallow: means "allow everything".
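For instance, to block a directory while still allowing one file inside it, and to explicitly allow everything for another bot (the paths and bot name are illustrative):

```
User-agent: *
Disallow: /downloads/             # block the whole directory...
Allow: /downloads/brochure.pdf    # ...but allow this one file

User-agent: ExampleBot
Disallow:                         # empty value = allow everything
```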
Crawl-delay
The Crawl-delay directive (in seconds) asks bots to wait between requests. Googlebot ignores this — configure crawl rate in Google Search Console instead. Bing and Yandex do respect it.
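A sketch of a group that asks Bingbot to wait roughly ten seconds between requests (the value is arbitrary):

```
User-agent: Bingbot
Crawl-delay: 10   # seconds between requests; ignored by Googlebot
Disallow:
```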
Common pitfalls
- Robots.txt does not prevent indexing; use a noindex meta tag or an X-Robots-Tag header for that.
- Blocking CSS/JS files can hurt rendering and SEO, because Google needs these to understand your pages.
- Wildcards (* in paths) and end-of-URL markers ($) are supported by Google and Bing but not all bots (see the example after this list).
- The file must be served from the root of the domain, not from a subdirectory.
- If your robots.txt returns a 5xx error, Google treats it as if all URLs are blocked.
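To illustrate the wildcard and end-of-URL syntax mentioned above (the patterns are illustrative, and only crawlers that support these extensions will honour them):

```
User-agent: *
Disallow: /*?sessionid=   # any URL containing this query parameter
Disallow: /*.pdf$         # any URL whose path ends in .pdf
```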
AI crawler blocking
To prevent AI training crawlers from scraping your content, add specific rules for GPTBot (OpenAI), ChatGPT-User (ChatGPT browsing), Google-Extended (Gemini training), and CCBot (Common Crawl, whose data is widely used for AI training). Use the "Block AI Crawlers" preset above to set this up.
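A sketch of what such rules look like, with one group per crawler and everything disallowed:

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```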