Sitemaps
Create automatically generated sitemap files to be referenced from your robots.txt. This helps with SEO and the discovery of new content. AEM can generate three types of sitemaps: without any configuration, based solely on a query index, or based on a manual sitemap configuration.
Creating a Sitemap without any configuration
If you don’t do anything, you will see your sitemap in sitemap.xml and have a sitemap index in sitemap.json. It will contain a list of all your published documents.
If you started with another type of sitemap and would like to switch to this type, you’ll have to delete the helix-sitemap.yaml configuration file (whether manually defined in GitHub or automatically generated) and reindex your site.
Domain name used in external URLs
To customize the domain used in creating external URLs, add a property named host or cdn.prod.host in your project configuration (named .helix/config when using Google Drive as backend or .helix/config.xlsx on SharePoint) and preview that file to activate it.
Generating a Sitemap configuration based on an index
Please see the document Indexing to learn more about indexing. In order to generate a sitemap configuration based on an index, please ensure that you have already set up an initial query index as explained there. This will generate a sitemap at the location:
https://<branch>--<repo>--<owner>.hlx.page/sitemap.xml
And a sitemap configuration at the following location:
https://<branch>--<repo>--<owner>.hlx.page/helix-sitemap.yaml
It is recommended that you create a sitemap-index.xml file that references all your sitemaps and keep it as part of your project code in your GitHub repo. This makes it easy to add new sitemaps as the project expands.
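For illustration, a sitemap index follows the sitemaps.org protocol: a sitemapindex root element with one sitemap/loc entry per sitemap file. The Python sketch below generates such a file. The function name build_sitemap_index and the www.mysite.com URLs are hypothetical, and writing the XML by hand works just as well.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return the text of a sitemap index that references each given sitemap URL."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a namespace prefix
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical sitemap locations, for illustration only.
index_xml = build_sitemap_index([
    "https://www.mysite.com/sitemap-en.xml",
    "https://www.mysite.com/sitemap-fr.xml",
])
print(index_xml)
```

Adding a sitemap later then only means adding one more URL to the list and regenerating the file.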
Manual setup of your Sitemap configuration
If you need more customization than your generated sitemap configuration file provides, you can copy its contents and paste them into a file named helix-sitemap.yaml in the root folder of your project.
Note: When using a manually configured index and sitemap (e.g. your code repo includes both a helix-query.yaml and a helix-sitemap.yaml file), your index definition must include the robots property to ensure the sitemap excludes pages with robots: noindex metadata. When using auto-generated index definitions, simply follow the recommendations in the indexing documentation so those pages are excluded from the index.
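The effect of the robots property can be pictured as a filter over index records. The following Python sketch is an illustration under assumptions, not AEM's implementation: it assumes each index record is a dictionary with path and robots keys, and the function name sitemap_candidates is hypothetical.

```python
def sitemap_candidates(records):
    """Return the paths of index records that are not marked noindex."""
    paths = []
    for record in records:
        # The robots value may hold several comma-separated directives.
        directives = [d.strip().lower() for d in record.get("robots", "").split(",")]
        if "noindex" not in directives:
            paths.append(record["path"])
    return paths


# Hypothetical index records, for illustration only.
records = [
    {"path": "/welcome", "robots": ""},
    {"path": "/drafts/internal", "robots": "noindex, nofollow"},
    {"path": "/about"},  # no robots value recorded
]
print(sitemap_candidates(records))
```

Without a robots value in the index, a filter like this has nothing to act on, which is why the property has to be part of a manually maintained index definition.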
The following sections contain the supported types of sitemaps.
Simple Sitemap
The following is a simple helix-sitemap.yaml. It assumes a single index containing all the pages that need to appear in the sitemap.
sitemaps:
  example:
    source: /query-index.json
    destination: /sitemap-en.xml
If you want last modification dates to be included in the URL entries of your sitemap, add a lastmod property that specifies a date format to your configuration.
sitemaps:
  example:
    source: /query-index.json
    destination: /sitemap-en.xml
    lastmod: YYYY-MM-DD
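As a rough illustration of the YYYY-MM-DD format, the Python sketch below renders one sitemap entry with a lastmod element. It assumes the last modification time is available as a Unix timestamp in seconds, which is an assumption about the index data rather than something stated here. The function name render_url_entry and the host name are hypothetical.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def render_url_entry(loc, last_modified_seconds):
    """Render a <url> entry whose lastmod uses the YYYY-MM-DD format."""
    # Convert in UTC so the date does not depend on the local time zone.
    moment = datetime.fromtimestamp(last_modified_seconds, tz=timezone.utc)
    lastmod = moment.strftime("%Y-%m-%d")
    return (
        "<url>\n"
        f"  <loc>{escape(loc)}</loc>\n"
        f"  <lastmod>{lastmod}</lastmod>\n"
        "</url>"
    )


# Hypothetical page URL and timestamp, for illustration only.
print(render_url_entry("https://www.mysite.com/welcome", 1700000000))
```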
Multiple Sitemaps
It is common to have sitemaps per section of the site and/or per country or language. AEM supports sitemaps including the corresponding hreflang references. In the following example, we assume that there is a one-to-one mapping between the indexes and the sitemap XML files.
sitemaps:
  example:
    languages:
      en:
        source: /en/query-index.json
        destination: /sitemap-en.xml
        hreflang: en
      fr:
        source: /fr/query-index.json
        destination: /sitemap-fr.xml
        hreflang: fr
        alternate: /fr/{path}
If two pages in the English and French sections share a common suffix, they will be related. For example, if you have a page /welcome in the English section and a page /fr/welcome in the French section, the resulting entry in /sitemap-en.xml will look like this:
<url>
  <loc>https://www.mysite.com/welcome</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://www.mysite.com/welcome"/>
  <xhtml:link rel="alternate" hreflang="fr" href="https://www.mysite.com/fr/welcome"/>
</url>
A similar entry will be available in /sitemap-fr.xml.
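The pairing by common suffix can be sketched in Python as follows. This is an illustrative model of the behavior described above, not AEM's implementation. It assumes that a language without an alternate property behaves as if its pattern were /{path}, and the helper names suffix_of and alternates_for are hypothetical.

```python
def suffix_of(path, pattern):
    """Return the part of path that {path} stands for in pattern, or None."""
    prefix, _, tail = pattern.partition("{path}")
    if path.startswith(prefix) and path.endswith(tail):
        return path[len(prefix):len(path) - len(tail)]
    return None


def alternates_for(page, lang, languages, indexes, host):
    """List (hreflang, url) pairs for every page sharing the suffix of page."""
    suffix = suffix_of(page, languages[lang]["alternate"])
    if suffix is None:
        return []
    links = []
    for code, config in languages.items():
        for other in indexes[code]:
            if suffix_of(other, config["alternate"]) == suffix:
                links.append((config["hreflang"], host + other))
    return links


languages = {
    "en": {"hreflang": "en", "alternate": "/{path}"},  # assumed default pattern
    "fr": {"hreflang": "fr", "alternate": "/fr/{path}"},
}
# Hypothetical index contents: page paths per language.
indexes = {"en": ["/welcome", "/about"], "fr": ["/fr/welcome"]}

for hreflang, url in alternates_for("/welcome", "en", languages, indexes, "https://www.mysite.com"):
    print(hreflang, url)
```

In this model, /about has no French counterpart, so it would only list itself as an alternate.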
Specifying the primary language manually
There might be situations where you have alternate versions of a page, but you’re unable to use a common suffix to identify them, possibly because you’re porting a legacy website that should not have its paths changed. In that situation, you can specify a primary-language-url for the alternate location in the metadata of the document.
Let’s assume our primary language is English, we have a page /welcome in the English section and /fr/bienvenu in the French section, and the latter is an alternate version of the former.
First, we add that information to the metadata of the document at /fr/bienvenu, using a primary-language-url entry that refers to the English page /welcome.
This can also be added to a global metadata sheet, as shown in Bulk Metadata.
Then, we add an indexed property primary-language-url to the French index:
primary-language-url:
  select: head > meta[name="primary-language-url"]
  value: attribute(el, "content")
Finally, we re-publish the French page and rebuild the sitemap.
Specifying the default language
Another common requirement is to specify the default language for a sitemap with multiple languages. This can be achieved by adding a default property to the sitemap:
sitemaps:
  example:
    default: en
    languages:
      en:
        source: /en/query-index.json
        destination: /sitemap-en.xml
        hreflang: en
      fr:
        source: /fr/query-index.json
        destination: /sitemap-fr.xml
        hreflang: fr
        alternate: /fr/{path}
In the resulting sitemap, all entries from the English subtree will have an extra alternate entry with hreflang x-default.
Specifying multiple hreflangs for one subtree
Sometimes, multiple hreflangs need to be mapped to a single language subtree. For example, suppose we want the following to appear in the resulting sitemap:
<url>
  <loc>https://myhost/la/page</loc>
  <xhtml:link rel="alternate" hreflang="es-VE" href="https://myhost/la/page"/>
  <xhtml:link rel="alternate" hreflang="es-SV" href="https://myhost/la/page"/>
  <xhtml:link rel="alternate" hreflang="es-PA" href="https://myhost/la/page"/>
</url>
Every page in our sitemap source should appear exactly once, but have multiple alternate hreflangs associated with it. To achieve this, specify an array of languages in the hreflang property:
sitemaps:
  example:
    languages:
      la:
        source: /la/query-index.json
        destination: /sitemap-la.xml
        hreflang:
          - es-VE
          - es-SV
          - es-PA
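The expansion from one page to several alternate links can be sketched in Python. This is an illustrative model that reproduces the link lines shown above, not AEM's implementation; the function name hreflang_links is hypothetical.

```python
def hreflang_links(hreflang, url):
    """Render one alternate link per hreflang value for the same page URL."""
    # Accept either a single string or a list, as in the configuration.
    values = [hreflang] if isinstance(hreflang, str) else list(hreflang)
    return [
        f'<xhtml:link rel="alternate" hreflang="{value}" href="{url}"/>'
        for value in values
    ]


for line in hreflang_links(["es-VE", "es-SV", "es-PA"], "https://myhost/la/page"):
    print(line)
```

The page URL stays the same in every link; only the hreflang value changes.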
Multiple Indexes Aggregated Into One Sitemap
There are cases where it is easier to have a single larger sitemap than several small, fragmented sitemaps, especially as there is a limit on the number of sitemaps that can be submitted to search engines per site.
The following example shows how to aggregate a number of separate indexes into a single sitemap.
sitemaps:
  example:
    languages:
      dk:
        source: /dk/query-index.json
        destination: /sitemap.xml
        hreflang: dk
        alternate: /dk/{path}
      no:
        source: /no/query-index.json
        destination: /sitemap.xml
        hreflang: no
        alternate: /no/{path}
By using the same destination for several languages, it is possible to combine multiple small sitemaps into one larger sitemap.
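To see which indexes end up in which file, it can help to group the configured sources by destination. The Python sketch below does this for the configuration above, written out as a plain dictionary. The function name sources_by_destination is hypothetical, and the grouping only mirrors the configuration; it does not describe how AEM builds the file.

```python
from collections import defaultdict


def sources_by_destination(languages):
    """Group index sources by the sitemap file they are written to."""
    grouped = defaultdict(list)
    for config in languages.values():
        grouped[config["destination"]].append(config["source"])
    return dict(grouped)


# The languages section of the example above, as a Python dictionary.
languages = {
    "dk": {"source": "/dk/query-index.json", "destination": "/sitemap.xml"},
    "no": {"source": "/no/query-index.json", "destination": "/sitemap.xml"},
}
print(sources_by_destination(languages))
```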
Including other sitemaps as input
In a mixed scenario, where not all languages in a sitemap are managed in AEM, you can include sitemaps from other language trees by specifying an XML path as source, as in:
sitemaps:
  example:
    languages:
      en:
        source: /en/query-index.json
        destination: /sitemaps/sitemap-en.xml
        hreflang: en
      fr:
        source: https://www.mysite.com/legacy/sitemap-fr.xml
        destination: /sitemaps/sitemap-fr.xml
        hreflang: fr
        alternate: /fr/{path}
In this example, we use an external French sitemap to calculate all sitemap locations. AEM will determine alternates for English sitemap URLs by deconstructing their French counterparts in the external sitemap using the alternate definition.
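The deconstruction step can be sketched in Python. This is an illustrative model, not AEM's implementation: it parses a sitemap in the standard sitemaps.org format and uses the alternate pattern to recover the {path} part of each URL. The sample French sitemap content and the function name external_suffixes are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def external_suffixes(sitemap_xml, pattern):
    """Map the {path} part of each external URL to its full URL."""
    prefix, _, tail = pattern.partition("{path}")
    result = {}
    for loc in ET.fromstring(sitemap_xml).findall("sm:url/sm:loc", NS):
        url = loc.text.strip()
        path = urlparse(url).path
        if path.startswith(prefix) and path.endswith(tail):
            result[path[len(prefix):len(path) - len(tail)]] = url
    return result


# Hypothetical content of the external French sitemap.
FRENCH_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.mysite.com/fr/welcome</loc></url>
  <url><loc>https://www.mysite.com/fr/contact</loc></url>
</urlset>"""

french = external_suffixes(FRENCH_SITEMAP, "/fr/{path}")
# Look up the French counterpart of an English page whose suffix is "welcome".
print(french.get("welcome"))
```

An English page whose suffix has no entry in the resulting mapping would simply have no French alternate in this model.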