Add site map and search engine robots

An attempt to generate and write the sitemap.xml file to the root directory results in the following error:

Please make sure that "/" is writable by the web-server.

With Adobe Commerce on cloud infrastructure, you can only write to specific directories, such as var, pub/media, pub/static, or app/etc. When you generate the sitemap.xml file using the Admin panel, you must specify the /media/ path.

You do not have to generate a robots.txt file because it generates the robots.txt content on demand and stores it in the database. You can view the content in your browser with the <domain.your.project>/robots.txt or <domain.your.project>/robots link.

This requires ECE-Tools version 2002.0.12 and later with an updated .magento.app.yaml file. See an example of these rules in the magento-cloud repository.

To generate a sitemap.xml file in version 2.2 and later:

  1. Access the Admin.

  2. On the Marketing menu, click Site Map in the SEO & Search section.

  3. In the Site Map view, click Add Sitemap.

  4. In the New Site Map view, enter the following values:

    • Filename:sitemap.xml
    • Path:/media/
  5. Click Save & Generate. The new site map becomes available in the Site Map grid.

  6. Click the path in the Link for Google column.

To add content to the robots.txt file:

  1. Access the Admin.
  2. On the Content menu, click Configuration in the Design section.
  3. In the Design Configuration view, click Edit for the website in the Action column.
  4. In the Main Website view, click Search Engine Robots.
  5. Update the Edit custom instruction of robots.txt field.
  6. Click Save Configuration.
  7. Verify the <domain.your.project>/robots.txt file or <domain.your.project>/robots URL in your browser.
NOTE
If the <domain.your.project>/robots.txt file generates a 404 error, Submit an Adobe Commerce Support ticket to remove the redirect from /robots.txt to /media/robots.txt.

Rewrite using Fastly VCL snippet

If you have different domains and you need separate site maps, you can create a VCL to route to the proper sitemap. Generate the sitemap.xml file in the Admin panel as described above, then create a custom Fastly VCL snippet to manage the redirect. See Custom Fastly VCL snippets.

NOTE
You can upload custom VCL snippets from the Admin UI or using the Fastly API. See Custom VCL snippet examples and tutorials.

Use a Fastly VCL snippet for redirect

Create a custom VCL snippet to rewrite the path for sitemap.xml to /media/sitemap.xml using the type and content key-value pairs.

{
  "name": "sitemapxml_rewrite",
  "dynamic": "0",
  "type": "recv",
  "priority": "90",
  "content": "if ( req.url.path ~ \"^/?sitemap.xml$\" ) { set req.url = \"/media/sitemap.xml\"; }"
}

The following example demonstrates how to rewrite the path for robots.txt and sitemap.xml to /media/robots.txt and /media/sitemap.xml

{
  "name": "sitemaprobots_rewrite",
  "dynamic": "0",
  "type": "recv",
  "priority": "90",
  "content": "if ( req.url.path ~ \"^/?sitemap.xml$\" ) { set req.url = \"/media/sitemap.xml\"; } else if (req.url.path ~ \"^/?robots.txt$\") { set req.url = \"/media/robots.txt\";}"
}

To use a Fastly VCL snippet for particular domain redirect:

Create a pub/media/domain_robots.txt file, where the domain is domain.com, and use the next VCL snippet:

{
  "name": "domain_robots",
  "dynamic": "0",
  "type": "recv",
  "priority": "90",
  "content": "if ( req.url.path == \"/robots.txt\" ) { if ( req.http.host ~ \"(domain).com$\" ) { set req.url = \"/media/\" re.group.1 \"_robots.txt\"; }}"
}

The VCL snippet routes http://domain.com/robots.txt and presents the pub/media/domain_robots.txt file.

To configure a redirect for robots.txt and sitemap.xml in a single snippet, create pub/media/domain_robots.txt and pub/media/domain_sitemap.xml files, where the domain is domain.com and use the next VCL snippet:

{
  "name": "domain_sitemaprobots",
  "dynamic": "0",
  "type": "recv",
  "priority": "90",
  "content": "if ( req.url.path == \"/robots.txt\" ) { if ( req.http.host ~ \"(domain).com$\" ) { set req.url = \"/media/\" re.group.1 \"_robots.txt\"; }} else if ( req.url.path == \"/sitemap.xml\" ) { if ( req.http.host ~ \"(domain).com$\" ) {  set req.url = \"/media/\" re.group.1 \"_sitemap.xml\"; }}"
}

In the sitemap admin config, you must specify the location of the file using pub/media/ rather than /.

Configure indexing by search engine

To activate robots.txt customizations in Production, you must enable the Indexing by search engines is On for <environment-name> option in your project settings.

Use the Cloud Console to manage environments

NOTE
  • Indexing by search engines can only be enabled in Production, but not in any of the lower environments.

  • If you are using PWA Studio and are unable to access your configured robots.txt file, add robots.txt to the Front Name Allowlist at Stores > Configuration > General > Web > UPWARD PWA Configuration.

recommendation-more-help
05f2f56e-ac5d-4931-8cdb-764e60e16f26