XML Sitemaps: How to Build and Submit Them

You published a new case study three weeks ago. It still has zero impressions in Search Console. The page is fine, the content is solid, and yet Google acts like it does not exist. Often the cause is boring: Googlebot never found the URL, or it found it and decided the page was not worth crawling.

An XML sitemap is one of the cheapest fixes for that problem. It is a file that lists the URLs you want search engines to know about, along with a few hints about each one. For a small brochure site it barely moves the needle. For a B2B site with hundreds of service pages, blog posts, and case studies, it becomes a real tool for getting pages discovered and indexed faster.

This guide covers what goes in a sitemap, how to generate one for the platform you are on, how to submit it to Google and Bing, and the errors that quietly keep your pages out of the index.

What an XML sitemap actually does

A sitemap helps search engines with discovery, not ranking. Submitting a URL does not promise it will rank, or even get indexed. It tells Google "this page exists and I care about it." Whether the page gets indexed still depends on quality, internal links, and crawl budget.

Here is the minimal structure of a valid sitemap entry:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/services/ppc-management</loc>
    <lastmod>2026-05-12</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/cost-per-lead-benchmarks</loc>
    <lastmod>2026-04-30</lastmod>
  </url>
</urlset>

Only two things matter in practice: the <loc> tag, which holds the full canonical URL, and <lastmod>, the date the page last changed in a meaningful way. Google reads lastmod and uses it to prioritize what to recrawl, but only if you keep it honest. If every URL shows today's date on every export, Google learns to ignore the field entirely.

Two older tags, <changefreq> and <priority>, still appear in tutorials. Google has said publicly it ignores both, so including or dropping them changes nothing on Google's side. I leave them out to keep the file clean.

When you need one (and when you do not)

A 15-page site that is well linked internally probably does not need a sitemap to get crawled. Googlebot will follow your navigation and find everything.

You want a sitemap when at least one of these is true:

  • Your site has more than a few hundred URLs.
  • You publish frequently and want new pages found within hours, not days.
  • Some pages are poorly linked internally (orphan pages, deep archive content).
  • You run an international site with hreflang, or a large media library.
  • The site is new and has almost no inbound links yet, so discovery through external links is not happening.

Most B2B sites that take content seriously hit at least one of these. If you publish regularly, a sitemap is table stakes.

How to build one

You have three realistic paths. Pick by how your site is built.

CMS plugin or built-in feature

If you run WordPress, Yoast SEO and Rank Math both generate and maintain a sitemap automatically. The file usually lives at /sitemap_index.xml or /sitemap.xml. Shopify, Squarespace, Webflow, and Wix all generate one for you at /sitemap.xml with little or no setup. Modern frameworks like Next.js can output one at build time.

For most teams this is the right answer. The plugin updates lastmod when you edit a page, splits large sitemaps automatically, and removes noindexed URLs. You do almost nothing.

Crawler-generated

If your platform has no native option, a desktop crawler like Screaming Frog (free up to 500 URLs) or a service like Sitebulb will crawl your site and export a sitemap. This works, but the file is a snapshot. You have to regenerate and reupload it whenever the site changes, which people forget to do. Use it as a one-time fix or for an audit, not as your permanent setup.

Hand-written or scripted

For a static site or a custom app, you can generate the file from your own data. A short script that pulls published URLs from your database and writes the XML on each deploy keeps the sitemap accurate with zero manual work. This is the most reliable approach when you control the codebase, because the sitemap is always derived from the real source of truth.
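A minimal sketch of such a script, in Python with only the standard library. The `pages` list is hypothetical sample data standing in for whatever your database or CMS returns, and `build_sitemap` is a name chosen for illustration, not part of any library.

```python
from datetime import date
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as UTF-8 bytes) for (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text, so raw URLs are safe here.
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Hypothetical data; a real deploy would query published pages instead.
pages = [
    ("https://example.com/services/ppc-management", date(2026, 5, 12)),
    ("https://example.com/blog/cost-per-lead-benchmarks", date(2026, 4, 30)),
]

if __name__ == "__main__":
    print(build_sitemap(pages).decode("utf-8"))
```

In a deploy pipeline you would write the returned bytes to sitemap.xml in the web root instead of printing them.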

Whatever method you choose, the sitemap must reflect what you actually want indexed. That sounds obvious, yet it is the single most common thing teams get wrong.

The rules your sitemap must follow

A sitemap that breaks these rules can be ignored or flagged in Search Console.

Rule | Limit or requirement
URLs per file | 50,000 maximum
File size | 50 MB maximum, uncompressed
URL format | Absolute, including https:// and the full domain
Canonical only | List the canonical version of each URL, never duplicates or parameter variants
Indexable only | No URLs that are noindexed, redirected, blocked in robots.txt, or returning 404
Encoding | UTF-8, with special characters escaped

Limits per the sitemaps.org protocol; figures are current as of writing and worth confirming in Google's documentation.
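The mechanical rules in that table are easy to check before you publish the file. The sketch below is one possible pre-flight check in Python; `check_sitemap` is a hypothetical helper, and it covers only the count, size, absolute-URL, and duplicate rules. Whether a URL is canonical, noindexed, or redirected cannot be verified without fetching the page, so that part is left out.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def check_sitemap(urls, xml_bytes):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds the {MAX_URLS} limit")
    if len(xml_bytes) > MAX_BYTES:
        problems.append(f"file is {len(xml_bytes)} bytes, over the 50 MB limit")
    seen = set()
    for url in urls:
        parts = urlsplit(url)
        # An absolute URL needs both a scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {url}")
        if url in seen:
            problems.append(f"duplicate URL: {url}")
        seen.add(url)
    return problems
```

Running it in the same build step that writes the file means a bad export fails the deploy instead of reaching Search Console.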

When you cross 50,000 URLs or 50 MB, you split into multiple sitemaps and tie them together with a sitemap index file:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
  </sitemap>
</sitemapindex>

Splitting by content type, as above, has a side benefit. In Search Console you can see indexation rates per sitemap, so you find out that your blog is 95% indexed while your service pages sit at 60%. That is a signal worth acting on.
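If you generate sitemaps with a script, the split can be automated. This is a rough sketch in Python, assuming you already have a flat list of URLs per content type; `chunk` and `build_index` are illustrative names, and the child file names in the usage example are made up.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_index(sitemap_urls):
    """Return sitemap index XML (UTF-8 bytes) listing each child sitemap."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical example: 120,000 blog URLs need three child files.
blog_urls = [f"https://example.com/blog/post-{i}" for i in range(120_000)]
parts = list(chunk(blog_urls))
children = [
    f"https://example.com/sitemap-blog-{n}.xml" for n in range(1, len(parts) + 1)
]
index_xml = build_index(children)
```

Each list in `parts` would be written to its own child file, and `index_xml` becomes the single file you reference in robots.txt and submit.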

How to submit it

Step 1: Put it where crawlers look

Reference your sitemap in robots.txt so any crawler finds it without being told:

Sitemap: https://example.com/sitemap.xml

This one line covers Google, Bing, and everyone else. If you want a deeper look at what else belongs in that file, see the guide on configuring robots.txt for SEO.
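To confirm the line is readable the way a crawler would read it, you can parse the file with Python's standard `urllib.robotparser` module (the `site_maps()` method needs Python 3.8 or later). The robots.txt content below is an invented sample, parsed from a string so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; replace with the text of your own file.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A list of declared sitemap URLs, or None if there is no Sitemap line.
sitemaps = parser.site_maps()

# Also worth checking: the sitemap itself must not be disallowed.
sitemap_allowed = parser.can_fetch("*", "https://example.com/sitemap.xml")

print(sitemaps, sitemap_allowed)
```

Against a live site you would call `parser.set_url(...)` and `parser.read()` instead of `parse()`, which fetches the file over HTTP.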

Step 2: Submit in Google Search Console

Open Search Console, pick your property, go to Indexing > Sitemaps, and enter the sitemap path (for example sitemap.xml). Submit it. Google validates the format within minutes and starts reading URLs. If you have not set up Search Console yet, that comes first; the Search Console setup walkthrough covers verification.

Step 3: Submit to Bing

Bing Webmaster Tools has the same flow under Sitemaps. Bing also powers other engines and feeds some AI search tools, so submitting there is worth the five minutes. You can import your Search Console property into Bing to skip re-verifying.

After submission, do not refresh the report every hour. Google reads the file on its own schedule. Check back in a few days and look at how many submitted URLs moved to "indexed."

Reading the Search Console report

This is where the sitemap earns its keep. The Sitemaps report shows submitted versus discovered counts. The Pages report (under Indexing) shows what got indexed and, more usefully, what did not and why.

Common reasons a submitted URL stays unindexed:

  • Discovered, currently not indexed. Google knows the URL but has not crawled it, often a crawl-budget or quality signal. Strengthen internal links to that page.
  • Crawled, currently not indexed. Google looked and chose not to index, usually a thin-content or duplicate signal. Improve the page or consolidate it.
  • Excluded by noindex tag. The page has a noindex directive but sits in your sitemap. Pick one: index it (remove noindex) or remove it from the sitemap.
  • Alternate page with canonical. Google indexed a different URL it considers canonical. Check your canonical tags match the URLs in the sitemap.

A clean sitemap and a messy indexation report point straight at your real problems. If half your service pages are "crawled, currently not indexed," no amount of sitemap tweaking helps; the pages need work. That overlaps heavily with a broader technical SEO review, which is where most of these signals get diagnosed.

Common mistakes that waste the sitemap

The mistakes below are the ones I see most often on B2B sites during audits.

Including non-canonical or junk URLs. Faceted navigation, session IDs, UTM-tagged links, paginated archives. They bloat the file and dilute the signal. List one clean URL per piece of content.
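One way to keep such URLs out of a scripted sitemap is to filter on query parameters before writing the file. The sketch below is an assumption-heavy example in Python: the `JUNK_PARAMS` set is a guess at common parameter names and needs adjusting to what your own site actually uses.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed parameter names that mark a variant rather than a canonical page.
JUNK_PARAMS = {"sessionid", "sid", "page", "sort", "filter"}


def is_junk(url):
    """True if the URL carries a tracking, session, paging, or facet parameter."""
    query = urlsplit(url).query
    for name, _value in parse_qsl(query, keep_blank_values=True):
        name = name.lower()
        if name.startswith("utm_") or name in JUNK_PARAMS:
            return True
    return False


candidates = [
    "https://example.com/blog/cost-per-lead-benchmarks",
    "https://example.com/blog/cost-per-lead-benchmarks?utm_source=newsletter",
    "https://example.com/blog?page=2",
]
clean = [url for url in candidates if not is_junk(url)]
```

A parameter filter only catches variants that show up in the URL. Duplicates that live at different paths still need a canonical tag check.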

Listing redirected or dead URLs. When you run 301 redirects during a migration, the sitemap must point at the new destinations, not the old ones. A sitemap full of 301s and 404s tells Google your file is stale.
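If your migration already has a redirect map, you can run the old sitemap URLs through it and list only the final destinations. A small Python sketch, assuming the map is available as a plain dictionary of old URL to new URL; `resolve` is a hypothetical helper name.

```python
def resolve(url, redirects, max_hops=10):
    """Follow `url` through an {old: new} redirect map and return the final URL."""
    seen = set()
    while url in redirects:
        if url in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
        url = redirects[url]
    return url


# Hypothetical redirect map from a migration, including a two-step chain.
redirects = {
    "https://example.com/ppc": "https://example.com/services/ppc",
    "https://example.com/services/ppc": "https://example.com/services/ppc-management",
}
old_sitemap = ["https://example.com/ppc", "https://example.com/about"]

# dict.fromkeys removes duplicates while keeping the original order.
new_sitemap = list(dict.fromkeys(resolve(u, redirects) for u in old_sitemap))
```

This works from the redirect map alone and does not detect 404s; confirming that each destination returns 200 still requires requesting the live URLs.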

Letting lastmod lie. Auto-setting every date to the export date is worse than omitting the tag. Google notices and discounts the whole field.
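A common way to keep the date honest in a scripted setup is to change it only when the page content changes. The Python sketch below does this with a content hash; `update_lastmod` and the in-memory `store` dictionary are illustrative assumptions, and a real build would persist the store between deploys.

```python
import hashlib
from datetime import date


def update_lastmod(url, content, today, store):
    """Return the lastmod date for `url`, bumping it only if `content` changed.

    `store` maps url -> (content_hash, lastmod) and is updated in place.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = store.get(url)
    if previous is not None and previous[0] == digest:
        return previous[1]  # unchanged content keeps its old date
    store[url] = (digest, today)
    return today


store = {}
url = "https://example.com/blog/cost-per-lead-benchmarks"
first = update_lastmod(url, "original body", date(2026, 4, 30), store)
same = update_lastmod(url, "original body", date(2026, 5, 12), store)
edited = update_lastmod(url, "revised body", date(2026, 5, 12), store)
```

Hash the main body text and not the full rendered HTML, so that a changed footer or sidebar does not count as a meaningful edit.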

Forgetting to update after a relaunch. A site relaunch changes URLs. If the sitemap still lists the old structure, you are feeding Google a map to pages that moved.

Treating submission as the goal. Submitting a sitemap is step one. The work is reading the indexation report and fixing what it surfaces.

A quick visual of the flow

[Figure: XML sitemap workflow in four steps. 1. Generate the sitemap. 2. Reference it in robots.txt. 3. Submit it in Search Console. 4. Read the indexation report and fix issues.]

Frequently asked questions

Do I need an XML sitemap if my site is small?

Probably not for crawling. A tidy site under a couple hundred pages with good internal links gets found anyway. There is no harm in having one, and it gives you per-sitemap indexation data in Search Console, which is reason enough for most people.

How often should the sitemap update?

Whenever your content changes. A plugin or build script handles this automatically and updates lastmod correctly. If you maintain the file by hand, regenerate it after publishing or editing pages, not on a fixed calendar.

Will submitting a sitemap improve my rankings?

No. A sitemap affects discovery and crawling, not ranking. It can help a page get indexed faster, and a page cannot rank until it is indexed, so there is an indirect link. The ranking itself comes from relevance, content quality, and links.

What is the difference between an XML sitemap and an HTML sitemap?

An XML sitemap is for search engines and lists URLs in a machine-readable format. An HTML sitemap is a page for human visitors, a directory of links. They serve different audiences. Most sites need the XML version; the HTML one is optional and mainly helps navigation on large sites.

Should I include images and videos in my sitemap?

You can, using image and video sitemap extensions, and it helps if visual media is core to your business. For a typical B2B service site it is rarely worth the effort. Get your page sitemap clean first.

My pages are in the sitemap but not indexed. What now?

Open the Pages report in Search Console and read the reason. "Crawled, currently not indexed" usually means the page needs better content or stronger internal links. "Discovered, currently not indexed" points at crawl budget or weak linking. The sitemap did its job; the fix is on the page.

The short version

A sitemap is a discovery tool, not a ranking lever. Build it, keep it honest, and treat the indexation report as the real deliverable.

Quick checklist before you call it done:

  • Generated automatically by your CMS, a script, or (as a stopgap) a crawler.
  • Lists only canonical, indexable, live URLs. No noindex, no redirects, no 404s.
  • lastmod reflects real changes.
  • Referenced in robots.txt and submitted in both Search Console and Bing Webmaster Tools.
  • You have read the indexation report and acted on what it showed.

If your pages are sitting unindexed and the Search Console report is hard to read, that is usually a sign of deeper technical or content issues, not a sitemap problem. We help B2B teams find out which pages are losing traffic and why. Book a 30-minute technical SEO review of your site and we will tell you where the indexation is breaking and what to fix first.