Sitemap Validator
Fetch any XML sitemap and check structure, URL count, lastmod data, and common issues. Supports both URL-set and sitemap-index formats.
Why sitemap hygiene matters.
A clean XML sitemap is how you tell search engines which URLs you want crawled and when they were last updated. A malformed sitemap can silently hurt crawl efficiency and let stale URLs linger in the index. Search engines treat the file as a hint, not a directive: when the URLs it lists are canonical, indexable, and accurately dated, crawlers are more likely to spend their budget on the pages you actually care about.
Format integrity
Valid XML, correct namespaces, well-formed <url> and <loc> elements.
URL count limits
The sitemaps protocol, which Google follows, caps each sitemap at 50,000 URLs or 50MB uncompressed. We flag over-limit files.
Lastmod accuracy
Search engines use lastmod to prioritize recrawls when the values prove reliable. Stale or missing values waste that signal.
Sitemap index
Index files reference several child sitemaps so large sites can stay under the per-file limits. We resolve each child and report its URL count.
From URL to verdict in four steps.
The validator mirrors the path a search engine takes when it discovers your sitemap — so the issues it surfaces are the same ones a crawler would hit.
Fetch
We request your sitemap URL with a crawler user-agent and follow at most one redirect.
Parse
The XML is parsed against the sitemaps.org 0.9 schema — namespaces, encoding, and well-formedness all verified.
Resolve
If it's a sitemap-index, we walk each child <sitemap> reference and tally the URLs they point to.
Report
You get URL counts, lastmod coverage, size against Google's limits, and a list of structural issues.
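The Parse, Resolve, and Report steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the validator's actual code: the function name inspect_sitemap and the shape of the returned report are assumptions, and the sketch takes bytes that have already been fetched instead of making a network request.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # the limit applies to the uncompressed file


def inspect_sitemap(xml_bytes: bytes) -> dict:
    """Summarize one already-fetched sitemap document (hypothetical helper)."""
    report = {"kind": None, "entries": 0, "lastmod_coverage": 0.0,
              "children": [], "issues": []}
    if len(xml_bytes) > MAX_BYTES:
        report["issues"].append("file is larger than 50MB uncompressed")

    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        report["issues"].append(f"XML is not well-formed: {exc}")
        return report

    # ElementTree exposes namespaced tags as "{namespace}localname".
    kinds = {
        f"{{{SITEMAP_NS}}}urlset": ("urlset", "url"),
        f"{{{SITEMAP_NS}}}sitemapindex": ("sitemapindex", "sitemap"),
    }
    if root.tag not in kinds:
        report["issues"].append(f"unexpected root element or namespace: {root.tag}")
        return report
    kind, entry_tag = kinds[root.tag]

    ns = {"sm": SITEMAP_NS}
    entries = root.findall(f"sm:{entry_tag}", ns)
    locs = [(e.findtext("sm:loc", default="", namespaces=ns) or "").strip()
            for e in entries]
    dated = sum(
        1 for e in entries
        if (e.findtext("sm:lastmod", default="", namespaces=ns) or "").strip()
    )

    if len(entries) > MAX_URLS:
        report["issues"].append(f"{len(entries)} entries exceeds the 50,000 limit")
    missing_loc = sum(1 for loc in locs if not loc)
    if missing_loc:
        report["issues"].append(f"{missing_loc} entries have no <loc>")

    report["kind"] = kind
    report["entries"] = len(entries)
    report["lastmod_coverage"] = dated / len(entries) if entries else 0.0
    # For an index, these are the child sitemaps a resolver would fetch next.
    report["children"] = locs if kind == "sitemapindex" else []
    return report
```

For a sitemap index, calling the same function on each URL in report["children"] and summing the entry counts gives the total across all child sitemaps.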
Six sitemap errors that quietly cost crawls.
These are the issues we see most often in real-world sitemaps — and exactly how to fix each one.
Over 50,000 URLs
A single sitemap exceeds Google's hard cap of 50,000 URLs or 50MB uncompressed.
Split into multiple sitemaps and reference them from a sitemap-index file.
Bad XML / encoding
Unescaped ampersands, stray characters, or a missing namespace declaration break parsing.
Escape special characters (& as &amp;, < as &lt;), declare the xmlns attribute, and serve the file as UTF-8.
Non-canonical URLs
The sitemap lists http URLs when the site is https, mixes www and non-www hosts, or includes URLs that return a 301 or 404.
List only canonical, 200-status URLs that exactly match your preferred host.
Stale or missing lastmod
lastmod values never change or are absent, so crawlers can't prioritize fresh pages.
Emit an accurate lastmod that reflects real content changes — not the build time.
Noindex / blocked URLs
The sitemap includes URLs that are noindexed or disallowed in robots.txt, which sends crawlers mixed signals.
Only submit indexable URLs. Remove anything you don't want in the index.
Not referenced anywhere
The sitemap exists but isn't in robots.txt or submitted in Search Console.
Add a Sitemap: line to robots.txt and submit the file in Google Search Console.
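Several of these fixes can be applied when the sitemap is generated. The sketch below is one possible approach, not a prescribed one: it splits a URL list into files that stay under the 50,000-URL cap, escapes special characters in each <loc>, builds a sitemap-index that references the parts, and produces the matching robots.txt line. The function names and the file names sitemap-1.xml and sitemap-index.xml are hypothetical, chosen only for illustration.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_urlset(urls):
    """Render one <urlset>; escape() converts &, < and > to entities."""
    rows = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{XML_DECL}<urlset xmlns="{SITEMAP_NS}">\n{rows}</urlset>\n'


def build_sitemaps(urls, base_url, chunk_size=MAX_URLS):
    """Split urls into files of at most chunk_size and index them.

    base_url is assumed to end with "/" and to be where the files are served.
    Returns (index_xml, {file_name: urlset_xml}, robots_txt_line).
    """
    files = {}
    for n, start in enumerate(range(0, len(urls), chunk_size), start=1):
        files[f"sitemap-{n}.xml"] = build_urlset(urls[start:start + chunk_size])

    index_rows = "".join(
        f"  <sitemap><loc>{escape(base_url + name)}</loc></sitemap>\n"
        for name in files
    )
    index_xml = (
        f'{XML_DECL}<sitemapindex xmlns="{SITEMAP_NS}">\n'
        f"{index_rows}</sitemapindex>\n"
    )
    robots_line = f"Sitemap: {base_url}sitemap-index.xml"
    return index_xml, files, robots_line
```

Write each string to disk as UTF-8 so the bytes match the encoding named in the XML declaration.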
The clean-sitemap checklist.
- One canonical version per URL — no http/https or www duplicates
- Every listed URL returns 200 and is indexable (no noindex, no robots block)
- Accurate <lastmod> dates that track real content changes
- At most 50,000 URLs and 50MB uncompressed per file; use a sitemap index above that
- Referenced in robots.txt and submitted in Google Search Console
- Gzip-compressed for large files to cut crawl bandwidth
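Two of the checklist items can be verified without any network access. The sketch below, with hypothetical helper names, flags URLs whose scheme or host differs from the preferred one and gzip-compresses a sitemap using Python's standard library. It assumes you pass the preferred host in lowercase.

```python
import gzip
from urllib.parse import urlsplit


def off_host_urls(urls, preferred_scheme, preferred_host):
    """Return the URLs whose scheme or host is not the preferred one."""
    flagged = []
    for url in urls:
        parts = urlsplit(url)
        # .hostname is lowercased and excludes any port or credentials.
        if parts.scheme != preferred_scheme or parts.hostname != preferred_host:
            flagged.append(url)
    return flagged


def gzip_sitemap(xml_bytes):
    """Compress a sitemap for serving as a .xml.gz file.

    Compression only saves bandwidth; the 50MB limit still applies to
    the uncompressed size.
    """
    return gzip.compress(xml_bytes)
```

For example, off_host_urls(urls, "https", "www.example.com") returns every listed URL that uses http or a host other than www.example.com.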