In general, sitemaps provide an overview of the content of a website so that users and crawlers can find their way around the site. The XML sitemap should not be confused with the HTML sitemap. This blog post briefly explains the difference between the two and when and why an XML sitemap is important for SEO.
What’s the difference between XML and HTML sitemaps?
The HTML sitemap is primarily intended for the users of a website and is often linked in the footer. It gives users an overview of the content and information architecture of the site, lists the most important pages with their categories and subcategories and can help with usability. A few years ago, HTML sitemaps were more relevant than they are today, because menus at that time often lacked clarity. Today, a flat, well-structured navigation can often make the HTML sitemap unnecessary.
HTML Sitemaps and SEO
In terms of SEO, HTML sitemaps can be advantageous because they make it easier for crawlers to find the categories and subcategories and can pass link juice on to the subpages. However, their value for search engine optimization is controversial. At best, an HTML sitemap is a useful supplement to the XML sitemap; it does not replace it.
What is an XML Sitemap?
An XML sitemap is a text file in XML format that contains all relevant URLs of a website in a structured, machine-readable form. It is used by crawlers, such as Googlebot, to get an overview of the indexable pages. This sitemap is not visible to users. However, the existence of an XML sitemap is no guarantee that Google will include all of its URLs in the index. It only helps Googlebot to discover the content.
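To make the structure described above more concrete, here is a minimal sketch in Python that writes such a file using only the standard library. The function name build_sitemap and the example.com URLs are assumptions made for illustration; in practice a CMS extension would normally generate the file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (UTF-8 bytes) that lists the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Placeholder URLs, for illustration only.
sitemap_xml = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/xml-sitemaps",
])
print(sitemap_xml.decode("utf-8"))
```

Each page becomes one url element with a loc child; everything else in the protocol is optional.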
When should an XML Sitemap be used?
For domains with a few hundred or a few thousand URLs, an XML sitemap is not strictly necessary for the pages to be included in the Google index. However, it does offer webmasters the advantage of being able to see in the Google Search Console how many of the submitted URLs are indexed, and it can have a positive effect on search engine optimization. Last but not least, an XML sitemap is part of SEO best practice and should be on the "to-do list" when optimizing a website.
In addition, search engines can use an XML sitemap to detect changes more quickly, for example, when new URLs are added. Even if pages are not linked internally (which ideally should not happen), a sitemap can help Google and other search engines not to overlook them when crawling.
For domains with more than 5,000 URLs, an XML sitemap should always be used. It is important to note that a single XML sitemap can contain a maximum of 50,000 URLs and can be no larger than 50 MB uncompressed. If the website is larger, the XML sitemap must be split into several files. The individual sitemaps are then linked in a sitemap index file.
The Google Support pages also explain when it makes sense to use an XML sitemap.
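The splitting described above could be sketched roughly as follows. The helper names, the 120,000 example URLs and the file naming scheme sitemap-1.xml, sitemap-2.xml are assumptions for illustration. The sketch only enforces the URL limit, not the 50 MB file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # URL limit per sitemap file


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (UTF-8 bytes) that links the individual sitemaps."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,000 URLs.
all_urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(all_urls)

# Hypothetical naming scheme for the individual sitemap files.
sitemap_files = [
    f"https://www.example.com/sitemap-{i}.xml" for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(sitemap_files)
print([len(chunk) for chunk in chunks])
```

Each chunk would then be written to its own sitemap file, and only the index file needs to be submitted to the search engine.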
How is an XML Sitemap created?
An XML sitemap should not be created manually. This takes too much effort, and errors can quickly creep in because the file has to be updated by hand every time something changes, for example when new pages are added. Exceptions are relaunches or NoIndex sitemaps. Most content management systems, including Drupal, have an extension that creates sitemaps dynamically. The sitemap standard states that a sitemap can only reference URLs located in the same directory as the sitemap file or in its subdirectories, so care must be taken over where the file is stored. Whether the file was actually saved in the correct directory can be checked by entering the website URL followed by "/sitemap.xml". For example: https://www.1xinternet.de/sitemap.xml
Where is an XML Sitemap stored?
The XML sitemap is stored in the root directory (main directory) of the domain, where the robots.txt is also located. The robots.txt is the text file that determines which areas of a domain may be crawled by a web crawler and which may not. The robots.txt should therefore be supplemented with a line containing the sitemap URL (e.g. Sitemap: https://www.my-website.de/sitemap.xml). If the robots.txt is then called up via the browser (e.g. https://www.my-website.de/robots.txt), the entry for the sitemap URL should be present.
A sitemap should always be kept up to date and submitted to Google via the Search Console. If the sitemap has been submitted there, adding the sitemap URL to robots.txt is not absolutely necessary.
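Instead of checking the robots.txt in the browser, the sitemap entry can also be read programmatically. The following sketch uses Python's standard urllib.robotparser module (site_maps() requires Python 3.8 or later). The robots.txt content shown is an assumed example built around the placeholder domain used above.

```python
from urllib.robotparser import RobotFileParser

# Assumed example content of a robots.txt file.
robots_txt = """\
User-agent: *
Disallow: /admin/
Sitemap: https://www.my-website.de/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# List of sitemap URLs declared in the file, or None if there is no entry.
sitemaps = parser.site_maps()
print(sitemaps)

# The same file also controls which areas may be crawled.
admin_allowed = parser.can_fetch("*", "https://www.my-website.de/admin/settings")
print(admin_allowed)
```

For a live site, RobotFileParser.set_url() followed by read() would fetch the file over the network instead of parsing a string.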
What does not belong in an XML Sitemap
The following do not belong in an XML sitemap, because they can cause problems during crawling:
- Any encoding other than UTF-8 - the file must be UTF-8 encoded
- URLs of other domains / subdomains - each domain gets its own sitemap
- Incorrect timestamps (e.g. caused by the change between summer and winter time)
- Redirects - list only the final target URLs so as not to confuse Google
- URLs with status code 404 / 410 (error) - error pages must be removed from the sitemap or fixed
- Pictures, videos etc. - they belong in a separate sitemap
- Duplicates of a URL and URLs whose canonical tag points to a different URL
- URLs with NoIndex tags
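Several of the rules in this list can be checked automatically before a sitemap is written. The sketch below assumes that crawl data is already available as simple records. The Page class, its field names and the example URLs are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Page:
    """Hypothetical crawl record; the field names are illustrative."""
    url: str
    status_code: int
    noindex: bool = False
    canonical_url: Optional[str] = None  # None means no canonical tag


def belongs_in_sitemap(page: Page, site_host: str) -> bool:
    """Apply the exclusion rules from the list above to a single page."""
    if urlsplit(page.url).hostname != site_host:
        return False  # other domain or subdomain
    if page.status_code != 200:
        return False  # redirects (3xx) and error pages (404 / 410)
    if page.noindex:
        return False  # NoIndex tag
    if page.canonical_url is not None and page.canonical_url != page.url:
        return False  # canonical tag points to a different URL
    return True


def select_sitemap_urls(pages, site_host):
    """Return the unique, eligible URLs in their original order."""
    seen = set()
    selected = []
    for page in pages:
        if belongs_in_sitemap(page, site_host) and page.url not in seen:
            seen.add(page.url)
            selected.append(page.url)
    return selected


pages = [
    Page("https://www.example.com/", 200),
    Page("https://www.example.com/old", 301),
    Page("https://www.example.com/missing", 404),
    Page("https://www.example.com/internal", 200, noindex=True),
    Page("https://www.example.com/shoes?sort=price", 200,
         canonical_url="https://www.example.com/shoes"),
    Page("https://www.example.com/shoes", 200,
         canonical_url="https://www.example.com/shoes"),
    Page("https://shop.example.com/", 200),
    Page("https://www.example.com/", 200),  # duplicate entry
]
selected_urls = select_sitemap_urls(pages, "www.example.com")
print(selected_urls)
```

Encoding and timestamp problems are not covered here; they concern how the file is written rather than which URLs it contains.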
Is an XML Sitemap good for SEO?
A sitemap helps Google to find content more easily and to detect changes quickly, as described above, but it is not a direct ranking factor. For new websites, Google should be informed as soon as possible that there are new URLs (submit the sitemap via the Google Search Console, GSC). This way the pages of a website can be indexed faster, and the site owner signals directly which pages should be included in the index.
Submitting the XML sitemap in the Google Search Console (GSC) informs Google about all relevant URLs of a website. This helps to get subpages or content that is difficult to reach indexed.
The XML sitemap can also carry additional meta information that tells search engines when a subpage was last updated (<lastmod>) and how often it is expected to change (<changefreq>). According to Google's documentation, Google uses <lastmod> only if it is consistently accurate and ignores <changefreq>.
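As an illustration of this meta information, the following sketch adds a last-modified timestamp to a sitemap entry using the protocol's lastmod element. The function name url_entry and the example values are assumptions. Converting the timestamp to UTC is one possible way to avoid the summer / winter time errors mentioned earlier, not a requirement of the protocol.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def url_entry(loc, last_modified):
    """Build a <url> element with <loc> and a UTC <lastmod> timestamp."""
    if last_modified.tzinfo is None:
        # A naive datetime has no offset, so the result would be ambiguous.
        raise ValueError("last_modified must be timezone-aware")
    entry = ET.Element("url")
    ET.SubElement(entry, "loc").text = loc
    utc_time = last_modified.astimezone(timezone.utc)
    ET.SubElement(entry, "lastmod").text = utc_time.isoformat(timespec="seconds")
    return entry


# Placeholder URL and date, for illustration only.
entry = url_entry(
    "https://www.example.com/blog/xml-sitemaps",
    datetime(2024, 3, 31, 9, 30, tzinfo=timezone.utc),
)
print(ET.tostring(entry, encoding="unicode"))
```

The timestamp is written with an explicit offset (+00:00), so it stays unambiguous regardless of the server's local time zone.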
Other types of XML Sitemaps
Besides the standard XML sitemap, which contains the URLs of a website, it is also possible to provide image, video or news sitemaps.
An image sitemap is an XML file specifically for images. It is particularly worthwhile if a website is very large and has many images and subpages, or if the images are meant to be found via Google Image Search.
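A rough sketch of what generating such an image sitemap could look like in Python is shown below. It uses the image extension namespace published by Google. The function name build_image_sitemap and the example URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize the sitemap namespace as default and the image one as "image:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Return an image sitemap (UTF-8 bytes) from {page_url: [image_urls]}."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Placeholder page and image URLs, for illustration only.
image_sitemap_xml = build_image_sitemap({
    "https://www.example.com/products/chair": [
        "https://www.example.com/images/chair-front.jpg",
        "https://www.example.com/images/chair-side.jpg",
    ],
})
print(image_sitemap_xml.decode("utf-8"))
```

Each url entry still names the page itself; the images found on that page are listed as additional children of the entry.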
The video sitemap is similar to the image sitemap. It is usually a separate XML file that search engines use exclusively for indexing the video content of a website, which can improve the visibility of videos in “Google Video Search”. In addition, video content can be annotated with further attributes such as category, title and duration. Video entries can also be embedded in an existing XML sitemap. With regard to SEO or video optimization, a video sitemap should be considered an integral part of a multi-channel strategy.
Google News Sitemap
This type of sitemap is useful for a news portal, for example, and only works for sites that are approved for Google News (requires registration with Google News). A Google News sitemap is smaller than a regular XML sitemap, so Google can process it very quickly. News sitemaps have special rules; for example, they can only contain articles from the last two days. See the Google Support pages for more information.
Conclusion: For each website project it is important to consider whether a sitemap should be used and, if so, which type. Our project managers and our SEO team will be happy to advise you!
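The two-day rule could be applied with a small filter like the one below before the news sitemap is generated. The function name recent_articles, the data layout (pairs of URL and publication time) and the example dates are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)  # news sitemaps only cover the last two days


def recent_articles(articles, now):
    """Keep (url, published_at) pairs published within the last two days."""
    cutoff = now - MAX_AGE
    return [
        (url, published_at)
        for url, published_at in articles
        if cutoff <= published_at <= now
    ]


# Fixed reference time and placeholder articles, for illustration only.
now = datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc)
articles = [
    ("https://www.example.com/news/today",
     datetime(2024, 5, 10, 8, 0, tzinfo=timezone.utc)),
    ("https://www.example.com/news/two-days-ago",
     datetime(2024, 5, 8, 13, 0, tzinfo=timezone.utc)),
    ("https://www.example.com/news/last-week",
     datetime(2024, 5, 3, 9, 0, tzinfo=timezone.utc)),
]
news_urls = [url for url, _ in recent_articles(articles, now)]
print(news_urls)
```

In a real setup the current time would come from datetime.now(timezone.utc), and older articles would simply move to the regular XML sitemap.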