Robots.txt Tester
Fetch any site's robots.txt and test whether a specific path is allowed for a specific crawler. We parse the file the way Google does — longest-match precedence, wildcards, and end-of-string anchors all supported.
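As a rough illustration of the matching behavior described above, here is a minimal Python sketch. It is not the tester's actual implementation: the function names `pattern_to_regex` and `is_allowed` and the sample rules are hypothetical. It assumes the rules for one user-agent group have already been extracted, that the longest matching pattern wins, and that Allow wins a tie.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be fetched under `rules`.

    `rules` is a list of (directive, pattern) pairs for a single
    user-agent group, e.g. ("Disallow", "/private/").
    """
    best_len = -1
    best_allow = True  # assumption: no matching rule means the path is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            # Longest pattern wins; on equal length, prefer Allow.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# Hypothetical rules for illustration only.
SAMPLE_RULES = [
    ("Disallow", "/private/"),
    ("Allow", "/private/public-page.html"),
    ("Disallow", "/*.pdf$"),
]

print(is_allowed(SAMPLE_RULES, "/private/secret.html"))       # False
print(is_allowed(SAMPLE_RULES, "/private/public-page.html"))  # True
print(is_allowed(SAMPLE_RULES, "/docs/file.pdf"))             # False
print(is_allowed(SAMPLE_RULES, "/docs/file.pdf?x=1"))         # True
```

The last case shows why the $ anchor matters: with a query string appended, the path no longer ends in .pdf, so the anchored rule does not apply.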
Crawl directives, not index control.
robots.txt tells crawlers which URLs they can fetch. It does not reliably keep pages out of the index: a page that's linked elsewhere can still appear in search results even if robots.txt blocks crawling. For real index control, use the noindex robots meta tag or the X-Robots-Tag HTTP header.
Allow / Disallow
Path patterns that start with /. The * wildcard matches any sequence of characters, and a trailing $ anchors the pattern to the end of the URL.
User-agent groups
Rules apply per crawler. Use User-agent: * for the default fallback.
Sitemap
Optional declarations pointing crawlers to your XML sitemap(s). One per line.
Crawl-delay
Honored by some engines, ignored by Google. Expressed in seconds between requests.
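To make the four directives concrete, the following Python sketch reads a small robots.txt and sorts its lines into user-agent groups, rules, crawl delays, and sitemaps. The function names `parse_robots` and `group_for`, the sample file, and the example.com URL are all hypothetical. It is deliberately simplified: it matches crawler names exactly and falls back to the * group, which is an assumption and not how every engine resolves groups.

```python
def parse_robots(text):
    """Split robots.txt text into user-agent groups and a list of sitemaps."""
    groups = []    # each: {"agents": [...], "rules": [...], "crawl_delay": None}
    sitemaps = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # Consecutive User-agent lines share one group.
            if not last_was_agent:
                current = {"agents": [], "rules": [], "crawl_delay": None}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
            continue
        if field == "sitemap":
            sitemaps.append(value)  # sitemaps are not tied to any group
            continue
        last_was_agent = False
        if current is None:
            continue  # rule appears before any User-agent line; ignore it
        if field in ("allow", "disallow"):
            current["rules"].append((field, value))
        elif field == "crawl-delay":
            try:
                current["crawl_delay"] = float(value)
            except ValueError:
                pass  # ignore a non-numeric delay
    return groups, sitemaps


def group_for(groups, agent):
    """Pick the group naming `agent` exactly, else the first '*' group."""
    agent = agent.lower()
    fallback = None
    for group in groups:
        if agent in group["agents"]:
            return group
        if fallback is None and "*" in group["agents"]:
            fallback = group
    return fallback


# Hypothetical file contents for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

User-agent: Googlebot
Allow: /
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
"""

groups, sitemaps = parse_robots(SAMPLE)
print(group_for(groups, "Googlebot")["rules"])
print(group_for(groups, "SomeOtherBot")["crawl_delay"])
print(sitemaps)
```

A crawler with no group of its own, such as the made-up SomeOtherBot here, picks up the rules and crawl delay from the * group.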
Three robots.txt bugs we see weekly.
Blocking your own JS/CSS
If you Disallow: /assets/ or /js/, Googlebot can't fetch the scripts and stylesheets it needs to render your page. Layout, JS-rendered content, and script-injected structured data all suffer.
Blocking entire site by accident
Disallow: / blocks every URL on the site, and it's usually a copy-paste from a staging robots.txt. Catastrophic. Always diff staging vs prod.
Trying to noindex via robots.txt
Robots.txt controls crawling, not indexing. A blocked URL can still show in SERPs via inbound links. Use a noindex meta tag instead, and leave the URL crawlable, because a crawler that can't fetch the page never sees the tag.
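The three bugs above can be caught with a simple pre-deploy check. The sketch below is one hypothetical way to do it in Python: the function name `lint_robots`, the `ASSET_HINTS` prefixes, and the warning wording are all assumptions, and the asset check is only a heuristic based on common directory names.

```python
# Assumed path prefixes that commonly hold render-critical files.
ASSET_HINTS = ("/assets", "/js", "/css", "/static")


def lint_robots(text):
    """Return (line_number, message) warnings for common robots.txt mistakes."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "disallow":
            if value in ("/", "/*"):
                warnings.append((number, "blocks the entire site"))
            elif value.lower().startswith(ASSET_HINTS):
                warnings.append((number, "may block JS/CSS needed for rendering"))
        elif field == "noindex":
            # Google does not support a noindex rule in robots.txt.
            warnings.append((number, "noindex belongs in a meta tag or header"))
    return warnings


# Hypothetical file contents for illustration only.
BAD_SAMPLE = """\
User-agent: *
Disallow: /
Disallow: /assets/
Noindex: /old/
Disallow: /admin/
"""

for number, message in lint_robots(BAD_SAMPLE):
    print(f"line {number}: {message}")
```

Running a check like this against the production file in CI is a cheap complement to diffing staging vs prod.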