Free Robots.txt Tester & Validator

Test any website's robots.txt file to see which pages search engines can and cannot crawl. Debug your SEO by verifying your robots directives are working correctly.

Last updated: May 2026

Robots.txt Validator

Enter a domain name (without http://)

What We Test

  • All User-agent groups
  • Allow & Disallow rules
  • Sitemap declarations
  • Crawl-delay directives
  • Google longest-match precedence

Want to control crawler access for every short link you create?

Create a free UseClick account to manage redirects, link-level robots controls, and privacy-first analytics across every campaign you ship.

What Is robots.txt?

robots.txt is a plain text file that lives at the root of a website (always at /robots.txt) and tells web crawlers which URLs they are allowed to fetch. It is the cornerstone of the Robots Exclusion Protocol, a voluntary standard formalized as RFC 9309 in 2022 and honored by every major search engine including Google, Bing, Yandex, DuckDuckGo, and Baidu, plus most modern AI crawlers like GPTBot, ClaudeBot, and PerplexityBot. The file uses simple directives such as User-agent, Disallow, Allow, Sitemap, and Crawl-delay to grant or deny crawl access on a per-bot or per-path basis. A robots txt checker like this one parses the file the same way real crawlers do, so you can verify a rule does exactly what you intended before search engines see it. robots.txt does not enforce security and is not a password gate, but for well-behaved crawlers it is the single most important file controlling which parts of your site are crawled, which in turn shapes how the site is discovered and indexed.

Why robots.txt Matters for SEO

Three reasons every site owner should validate their robots.txt with a proper tester.

1. Controls Crawl Budget

Search engines allocate a finite crawl budget to every domain. Disallowing low-value URLs (faceted navigation, internal search, duplicate parameters) frees Googlebot to spend that budget on the pages that actually rank and convert.

Up to 40% of crawl budget is wasted on duplicate URLs (Botify, 2024)

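2. Keeps Crawlers Out of Private Areas

Admin panels, staging environments, checkout funnels, and member-only areas should never appear in search results. robots.txt stops compliant crawlers from fetching those paths, but a blocked URL can still be indexed if other pages link to it. For pages that must stay out of the index, leave them crawlable and serve a noindex tag, and put anything truly sensitive behind authentication.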

7% of sites unintentionally expose private URLs in search

3. Must Be at the Domain Root

robots.txt only works when served from the exact path /robots.txt at the root of the host. A file at /folder/robots.txt is completely ignored by crawlers. Each subdomain needs its own file, and HTTPS and HTTP versions are treated separately.

Must return HTTP 200 from /robots.txt at the root host
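
As a rough illustration of the root-location rule, the sketch below derives the robots.txt URL that governs any given page. The function name robots_url_for and the example hosts are hypothetical, and this is not how the tester itself is implemented.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (illustrative helper)."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep the scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://shop.example.com/folder/page.html?x=1"))
print(robots_url_for("http://example.com/"))
```

Because the scheme and host are preserved, the subdomain and the HTTP version each resolve to their own file, which matches the rule described above.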

Common robots.txt Mistakes

Real-world errors we see every week. Test your file with the robots txt checker above to catch these before they hurt rankings.

Avoid These Critical Errors

Disallow: /
Blocks your entire site from every crawler. Usually leaked from staging to production.
Blocking CSS and JS
Disallowing /assets/ or /static/ prevents Google from rendering pages, which damages rankings.
Wildcard Misuse
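Disallow: *.pdf is not a valid pattern under RFC 9309 because patterns must start with /, so crawlers may ignore it or interpret it differently. Use Disallow: /*.pdf$ to block PDFs site-wide.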
Wrong File Path
Placed at /robots/robots.txt instead of /robots.txt. Crawlers will never find it.
Blocking + Noindex Combo
Disallowing a page prevents Google from seeing its noindex tag, keeping it indexed.
Case-Sensitive Paths
Disallow: /Admin does not block /admin. Paths are case-sensitive.
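
Two of the mistakes above are easy to detect mechanically. The following is a minimal sketch of such a check, assuming a hypothetical lint_robots helper; it only flags a site-wide block under User-agent: * and patterns that do not start with /, and it is not a description of what the tester does internally.

```python
def lint_robots(text: str) -> list[str]:
    """Return warnings for two easy-to-detect robots.txt mistakes."""
    warnings = []
    agents: list[str] = []   # user-agents of the group currently being read
    in_agent_run = False     # True while consecutive User-agent lines stack
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not in_agent_run:
                agents = []                   # a new group starts here
            agents.append(value.lower())
            in_agent_run = True
        elif field in ("allow", "disallow"):
            in_agent_run = False
            if field == "disallow" and value == "/" and "*" in agents:
                warnings.append(
                    f"line {number}: Disallow: / under User-agent: * blocks the whole site"
                )
            elif value and not value.startswith("/"):
                warnings.append(
                    f"line {number}: pattern {value!r} does not start with /"
                )
    return warnings


sample = "User-agent: *\nDisallow: /\nDisallow: *.pdf\nDisallow: /Admin\n"
for warning in lint_robots(sample):
    print(warning)
```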

robots.txt Syntax Reference

Every directive you need, with working examples you can copy into your own robots.txt today.

1

User-agent

Declares which crawler the following rules apply to. Use * for all bots, or name a specific product token like Googlebot, Bingbot, GPTBot, or ClaudeBot. Multiple User-agent lines can stack to share one rule block.

User-agent: *
User-agent: Googlebot
Disallow: /private/
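
To make the stacking behavior concrete, here is a small sketch that splits a file into groups, where consecutive User-agent lines share the rules that follow. The parse_groups name and the tuple layout are assumptions for illustration, and only Allow and Disallow lines are collected.

```python
def parse_groups(text: str) -> list[tuple[list[str], list[tuple[str, str]]]]:
    """Split robots.txt text into (user_agents, rules) groups."""
    groups = []
    agents: list[str] = []
    rules: list[tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:
                # A rule was already seen, so this User-agent starts a new group.
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


sample = """\
User-agent: *
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /tmp/
"""
for agents, rules in parse_groups(sample):
    print(agents, rules)
```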
2

Disallow

Tells the matched crawler not to fetch any URL beginning with the given path. An empty Disallow means allow everything. Use $ to anchor the end of the URL and * for wildcards.

Disallow: /admin/
Disallow: /*.pdf$
Disallow: /search?
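
One way to reason about * and $ is to translate a pattern into a regular expression. The sketch below does that under the assumption that everything other than * and a trailing $ is literal; path_matches and pattern_to_regex are hypothetical names, and real crawlers also normalize percent-encoding, which is skipped here.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start, so unanchored patterns act as prefixes.
    return pattern_to_regex(pattern).match(path) is not None


print(path_matches("/admin/", "/admin/users"))
print(path_matches("/*.pdf$", "/docs/guide.pdf"))
print(path_matches("/*.pdf$", "/docs/guide.pdf?download=1"))
print(path_matches("/search?", "/search?q=shoes"))
```

The third call shows why the $ anchor matters: with it, a PDF URL that carries a query string is no longer matched.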
3

Allow

Overrides a broader Disallow rule for specific paths. The most specific (longest) pattern wins, and when Allow and Disallow tie in length, Allow takes precedence per Google's spec.

User-agent: *
Disallow: /private/
Allow: /private/public-page.html
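
The precedence rule can be sketched in a few lines. This is an illustrative implementation of longest-match with the Allow tie-break, not the tester's actual code; is_allowed and the (kind, pattern) rule format are assumptions, and pattern length is measured as the raw character count.

```python
import re


def _matches(pattern: str, path: str) -> bool:
    """Prefix match with * as a wildcard and a trailing $ as an end anchor."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence; Allow wins when lengths tie."""
    best_len, allowed = -1, True   # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern or not _matches(pattern, path):
            continue               # an empty pattern matches nothing
        length = len(pattern)
        if length > best_len or (length == best_len and kind == "allow"):
            best_len, allowed = length, kind == "allow"
    return allowed


rules = [("disallow", "/private/"), ("allow", "/private/public-page.html")]
print(is_allowed(rules, "/private/public-page.html"))
print(is_allowed(rules, "/private/secret.html"))
```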
4

Sitemap

Points crawlers to your XML sitemap. Sitemap is a top-level directive (not bound to any User-agent group) and you can list multiple sitemaps. Always use the absolute URL.

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
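
Because Sitemap lines are independent of User-agent groups, they can be collected with a single pass over the file. The sitemap_urls helper below is a hypothetical sketch; note that it strips anything after a #, which would truncate a sitemap URL containing a fragment.

```python
def sitemap_urls(text: str) -> list[str]:
    """Collect every Sitemap URL, regardless of which group it appears near."""
    urls = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments (simplification)
        # Split on the first colon only, so the URL's own "://" stays intact.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


sample = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap-images.xml
"""
print(sitemap_urls(sample))
```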
5

Crawl-delay

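Requests a minimum number of seconds between successive crawler requests. Crawl-delay is not part of RFC 9309, so support varies by crawler: some, such as Bingbot, respect it, while Google ignores it. Google retired the crawl rate setting in Search Console in early 2024, and Googlebot now adjusts its crawl rate automatically based on how your server responds.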

User-agent: Bingbot
Crawl-delay: 10
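
If you want to read a Crawl-delay value programmatically, Python's standard library exposes it through urllib.robotparser. The sketch below parses an inline example file rather than fetching one. Keep in mind that this module does not implement wildcards or longest-match precedence, so it is used here only for the delay lookup.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is made here.
parser.parse("""\
User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: /private/
""".splitlines())

print(parser.crawl_delay("Bingbot"))    # delay declared for the Bingbot group
print(parser.crawl_delay("Googlebot"))  # falls back to the * group, which sets none
```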
6

# Comments

Any text after a # on a line is treated as a comment and ignored by crawlers. Use comments to document why a rule exists so future maintainers do not delete it accidentally.

# Block AI crawlers from training pages
User-agent: GPTBot
Disallow: / # full site block
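
A parser typically discards comments before it looks at directives. A minimal sketch of that step, using a hypothetical strip_comment helper on the example above:

```python
def strip_comment(line: str) -> str:
    """Remove everything from the first # onward, then trim whitespace."""
    return line.split("#", 1)[0].strip()


sample = """\
# Block AI crawlers from training pages
User-agent: GPTBot
Disallow: / # full site block
"""
# Full-line comments become empty strings and are filtered out.
directives = [text for text in map(strip_comment, sample.splitlines()) if text]
print(directives)
```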

Frequently Asked Questions

What is a robots.txt tester and why do I need one?

A robots.txt tester is a free online tool that fetches your site's robots.txt file, parses every User-agent, Allow, Disallow, Crawl-delay, and Sitemap directive, and then simulates what a specific crawler like Googlebot or Bingbot would do when it encounters a given URL. You need one because a single misplaced character in robots.txt can block crawlers from your entire site. The UseClick robots txt checker uses the same longest-match precedence algorithm that Google publishes in its open-source robots.txt parser, so the verdict you see here should match what Google Search Console reports. Whether you are debugging why pages disappeared from search results, validating a new disallow rule before pushing it live, or auditing a competitor's crawl strategy, a proper tester removes guesswork and turns robots.txt from a fragile config file into a tested, verifiable contract with search engines.

How does the tester decide whether a URL is allowed or blocked?

We follow RFC 9309 together with Google's documented interpretation of it. First, we find the most specific User-agent group that matches your chosen crawler. Group names are compared case-insensitively against the crawler's product token, so a group declared for 'Googlebot' applies to a crawler identifying as 'Googlebot/2.1', while 'Googlebot-Image' follows its own dedicated group if one is present. If no named group matches, the * group applies. Once the right group is selected, we compare your test path against every Allow and Disallow pattern within that group. Wildcards (*) match any sequence of characters and the end-of-string anchor ($) pins a pattern to the end of the URL. The matching pattern with the longest character count wins, and when an Allow rule and a Disallow rule are equally specific, the Allow rule takes precedence. This mirrors the behavior Google documents at developers.google.com/search/docs/crawling-indexing/robots.
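
The group-selection step can be sketched as follows. This is a simplified illustration that assumes prefix matching on the lowercased product token, with the longest matching group name winning and * as the fallback; select_group is a hypothetical name and the tester's real matching logic may differ.

```python
from typing import Optional


def select_group(group_names: list[str], crawler: str) -> Optional[str]:
    """Pick the most specific group name for a crawler, else fall back to *."""
    # "Googlebot/2.1" -> "googlebot": keep only the product token.
    token = crawler.split("/", 1)[0].strip().lower()
    best, best_len = None, -1
    for name in group_names:
        candidate = name.strip().lower()
        if not candidate or candidate == "*":
            continue
        # Assumption: a group applies when its name is a prefix of the token.
        if token.startswith(candidate) and len(candidate) > best_len:
            best, best_len = name, len(candidate)
    if best is not None:
        return best
    return "*" if "*" in group_names else None


names = ["*", "Googlebot", "Googlebot-Image"]
print(select_group(names, "Googlebot/2.1"))
print(select_group(names, "Googlebot-Image"))
print(select_group(names, "Bingbot/2.0"))
```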

Does blocking a page in robots.txt remove it from Google's index?

This is one of the most common SEO misunderstandings. The Disallow directive tells crawlers not to fetch a URL, but it does not tell search engines not to index it. If other pages link to a blocked URL, Google can still index the URL itself, often with the snippet 'No information is available for this page.' To truly prevent indexing, you must allow the crawler to fetch the page and then serve a noindex meta tag or X-Robots-Tag header. Blocking a page in robots.txt that already has a noindex tag actually prevents the noindex from being seen, which can keep the URL indexed indefinitely. Use our robots txt checker to confirm the page is fetchable, then verify the noindex tag with our HTTP Header Checker or Meta Analyzer tools.
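
The interaction between Disallow and noindex reduces to a small decision table. The sketch below encodes it with a hypothetical indexing_outcome function; the returned strings are illustrative summaries of the reasoning above, not messages from any search engine.

```python
def indexing_outcome(blocked_by_robots: bool, serves_noindex: bool) -> str:
    """Summarize what a compliant search engine can do with a URL."""
    if blocked_by_robots:
        # The page is never fetched, so any noindex on it goes unseen.
        return "not crawled; URL can still be indexed from external links"
    if serves_noindex:
        return "crawled; noindex is seen and the URL is kept out of the index"
    return "crawled; eligible for indexing"


for blocked in (True, False):
    for noindex in (True, False):
        print(blocked, noindex, "->", indexing_outcome(blocked, noindex))
```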

Should I block AI crawlers in robots.txt?

That depends entirely on your content strategy. Blocking AI crawlers like GPTBot, ClaudeBot, PerplexityBot, CCBot, and Google-Extended can keep your content from being used to train large language models or from appearing as cited sources in AI search products such as ChatGPT Search and Perplexity. Each token controls something different, so check the operator's documentation: Google-Extended, for example, governs use of your content for Google's generative AI models and does not remove pages from Google Search features such as AI Overviews, which follow Googlebot's rules. Publishers protecting unique editorial content often block these bots, while companies focused on brand visibility and discoverability typically allow them because AI search is rapidly becoming a major traffic source. Our robots.txt tester lets you simulate every major AI crawler so you can verify your access rules match your intent before AI bots crawl your site. For a deeper analysis of how AI crawlers see your domain, pair this tool with our AI Crawlability Checker.
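
A quick way to sanity-check an AI-crawler block locally is Python's urllib.robotparser. The rules and the example.com URL below are assumptions for illustration. Since this module handles only simple prefix rules, it is a reasonable fit for a plain Disallow: / check but not for wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Two stacked User-agent lines share the single Disallow rule below them.
parser.parse("""\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /
""".splitlines())

url = "https://example.com/articles/launch"
for bot in ("GPTBot", "ClaudeBot", "Googlebot"):
    print(bot, parser.can_fetch(bot, url))
```

With no * group in the file, crawlers that are not named stay allowed, which is the usual intent when only AI bots are being excluded.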

What are the most common robots.txt mistakes?

The single most destructive mistake is shipping 'Disallow: /' under 'User-agent: *' to production, which blocks every page on your site from every compliant crawler. This often happens when staging configurations leak into production deployments. Other frequent errors include blocking CSS and JavaScript directories, which prevents Google from rendering your pages properly and can hurt rankings; using wildcards incorrectly, such as 'Disallow: *.pdf' instead of the correct 'Disallow: /*.pdf$'; placing robots.txt at a subfolder URL when it must live at the exact root of the host; mixing case in paths, because directive names are case-insensitive but paths are case-sensitive; and forgetting that each User-agent group is independent, so a Disallow under one group does not apply to another. The UseClick robots txt checker helps you catch these patterns by letting you test the exact URL and user-agent combinations you care about.

Can I test another website's robots.txt?

Yes. robots.txt is a public file by design and must be served from the exact path /robots.txt at the root of every host. Our tester fetches it directly from the target domain, so you can analyze any publicly accessible site. This is useful for competitive research because robots.txt can reveal which sections of a competitor's site they consider low-priority for crawl budget, which third-party tools they integrate with based on bot-specific rules, whether they are keeping paid content away from crawlers, and which sitemap URLs they expose. Combined with our Meta Analyzer and HTTP Header Checker, you can build a complete picture of how a competitor structures their technical SEO. All checks run server-side, so they do not register as visits in the competitor's client-side analytics.

Ship Links Search Engines (and Humans) Actually Trust

Pair a clean robots.txt with branded short links that show a preview card before anyone clicks. UseClick makes both effortless, with real-time analytics and privacy-first tracking on every plan.

Crawler-Friendly

Short links honor your site's robots and SEO rules

Branded Domains

Use your domain instead of bit.ly

Full API Access

Automate links on every plan, including free

Create a Free UseClick Account
  • Privacy-first (GDPR compliant)
  • No credit card required
  • Setup in 60 seconds

Ready to track smarter?

UseClick.io makes link management effortless. Create branded short links that are clean, memorable, and built to strengthen your brand identity.