XML Sitemaps: The Complete Guide for Modern Sites
XML sitemaps help search engines discover and prioritize your content. Learn how to create, optimize, and submit sitemaps — including common mistakes to avoid.
Auxmeta Team
SEO Engineering · January 10, 2025
An XML sitemap is a roadmap you hand to search engines. It lists the URLs you want crawled and indexed, optionally with metadata about when they were last updated and how important they are relative to each other.
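In practice that roadmap is a small XML file using the sitemaps.org namespace. A minimal example with placeholder URLs (note that lastmod and priority are optional, and Google has said it largely ignores priority):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-10</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/xml-sitemaps-guide</loc>
    <lastmod>2025-01-08</lastmod>
  </url>
</urlset>
```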
Sitemaps are especially valuable for large sites, new sites with few external links, and sites with pages that aren't well-connected through internal linking. Even if Google could find all your pages through crawling, a sitemap makes discovery faster and more reliable.
What to Include (and What to Leave Out)
Only include URLs you want indexed. This sounds obvious, but it's a common mistake to auto-generate sitemaps that include paginated pages, filtered URLs, session-ID variants, or pages with noindex tags. Each of these wastes crawl budget and sends Googlebot mixed signals about which pages matter.
- Include: canonical versions of important pages
- Include: pages updated recently that need re-crawling
- Exclude: noindex pages
- Exclude: paginated pages (or include only page 1)
- Exclude: URLs with tracking parameters
- Exclude: redirect URLs
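These rules are easy to enforce in whatever script builds your sitemap. A sketch of such a filter, assuming each page is a dict with hypothetical fields (url, canonical, noindex, redirects_to, page_number) that your CMS would need to expose:

```python
from urllib.parse import urlparse, parse_qs

# Query parameters that mark a URL as a tracking/session variant
# (illustrative set; extend for your own analytics parameters).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "gclid"}

def sitemap_eligible(page: dict) -> bool:
    """Apply the include/exclude rules above to one page record."""
    url = page["url"]
    if page.get("noindex"):                       # exclude noindex pages
        return False
    if page.get("redirects_to"):                  # exclude redirect URLs
        return False
    if page.get("canonical") and page["canonical"] != url:
        return False                              # only canonical versions
    if page.get("page_number", 1) > 1:            # paginated: page 1 only
        return False
    query = parse_qs(urlparse(url).query)
    if TRACKING_PARAMS & set(query):              # exclude tracking parameters
        return False
    return True
```

Pages that pass the filter get written into the sitemap; everything else is silently dropped rather than marked up in any way.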
Sitemap Index Files for Large Sites
A single sitemap file may contain at most 50,000 URLs and must be no larger than 50 MB uncompressed. For larger sites, use a sitemap index file that references multiple individual sitemaps. This also lets you organize by content type—one sitemap for blog posts, one for product pages, one for images.
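A sitemap index uses the same namespace as a regular sitemap but lists child sitemap files instead of pages (filenames below are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2025-01-10</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
  </sitemap>
</sitemapindex>
```

You submit only the index file; crawlers follow it to the child sitemaps.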
Separating sitemaps by type gives you clearer data in Google Search Console on which content is being indexed and which is being ignored.
Submitting and Monitoring Your Sitemap
Submit your sitemap in Google Search Console under Indexing → Sitemaps. Check back weekly for errors. Common errors include URLs blocked by robots.txt, redirect URLs, and soft 404s—pages that return a 200 status code but show 'not found' content.
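One way to catch redirects and soft 404s before they show up as Search Console errors is to periodically fetch every URL in your own sitemap and flag anything that isn't a clean 200. A sketch of the extraction step, using the standard sitemap namespace:

```python
import xml.etree.ElementTree as ET

# The sitemaps.org namespace, in ElementTree's {uri}tag form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

# In a real audit you would then fetch each URL (e.g. with an HTTP
# client), following no redirects, and flag: non-200 status codes,
# redirect responses, and pages whose body contains "not found"
# wording despite returning 200 (soft 404s).
```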
Also add your sitemap URL to your robots.txt with Sitemap: https://yourdomain.com/sitemap.xml. This helps crawlers discover it even if they don't use Search Console.
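The directive must use an absolute URL and can sit anywhere in the file, outside any User-agent group:

```
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```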
Auxmeta automatically generates and monitors your XML sitemap, detecting new URLs that aren't included and flagging sitemap errors before they cause indexing gaps.
