This article provides best practices for using robots.txt and sitemap.xml files in Adobe Commerce, including configuration and security. These files instruct web crawlers (typically search engine robots) how to crawl pages on a website. Configuring these files can improve site performance and search engine optimization.
These best practices apply to projects using the native Adobe Commerce storefront only. They do not apply to Adobe Commerce projects that use other storefront solutions (for example, Adobe Experience Manager, headless).
A default Adobe Commerce project contains a hierarchy that includes a single website, store, and store view. For more complex implementations, you can create additional websites, stores, and store views for a multi-site storefront.
Follow these best practices when configuring the robots.txt and sitemap.xml files for single-site storefronts:
Make sure that your project is using ece-tools version 2002.0.12 or later.
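If you are unsure which version is installed, one way to check is with Composer from the project root. This is a sketch and assumes ece-tools is installed as the magento/ece-tools Composer package, which is the default for cloud projects:

# Show the installed ece-tools package version (run from the project root).
composer show magento/ece-tools | grep versions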
Use the Admin application to add content to the robots.txt file.
View the auto-generated robots.txt file for your store at <domain.your.project>/robots.txt.
Use the Admin application to generate a sitemap.xml file. Due to the read-only file system on Adobe Commerce on cloud infrastructure projects, you must specify the pub/media path before generating the file.
Use a custom Fastly VCL snippet to redirect from the root of your site to the pub/media/ location for both files:
{
"name": "sitemaprobots_rewrite",
"dynamic": "0",
"type": "recv",
"priority": "90",
"content": "if ( req.url.path ~ \"^/?sitemap.xml$\" ) { set req.url = \"pub/media/sitemap.xml\"; } else if (req.url.path ~ \"^/?robots.txt$\") { set req.url = \"pub/media/robots.txt\";}"
}
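On Adobe Commerce on cloud infrastructure, custom VCL snippets are typically uploaded through the Fastly module in the Admin. If you manage snippets with the Fastly API directly instead, the upload might look like the following sketch, where FASTLY_SERVICE_ID, FASTLY_API_TOKEN, and SERVICE_VERSION are placeholders for your own values:

# Sketch: create the rewrite snippet on a draft (unlocked) service version via the Fastly API.
curl -X POST "https://api.fastly.com/service/${FASTLY_SERVICE_ID}/version/${SERVICE_VERSION}/snippet" \
  -H "Fastly-Key: ${FASTLY_API_TOKEN}" \
  -H "Accept: application/json" \
  --data-urlencode 'name=sitemaprobots_rewrite' \
  --data-urlencode 'type=recv' \
  --data-urlencode 'dynamic=0' \
  --data-urlencode 'priority=90' \
  --data-urlencode 'content=if ( req.url.path ~ "^/?sitemap.xml$" ) { set req.url = "pub/media/sitemap.xml"; } else if ( req.url.path ~ "^/?robots.txt$" ) { set req.url = "pub/media/robots.txt"; }'
# Remember to activate the service version after adding the snippet.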
Test the redirect by viewing the files in a web browser. For example, <domain.your.project>/robots.txt and <domain.your.project>/sitemap.xml. Make sure that you use the root path that you configured the redirect for, not a different path.
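You can also check from the command line; for example (the domain is a placeholder for your base URL):

# Both requests should return the file content served from pub/media.
curl -s https://<domain.your.project>/robots.txt
curl -s https://<domain.your.project>/sitemap.xml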
See Add site map and search engine robots for detailed instructions.
You can set up and run several stores with a single implementation of Adobe Commerce on cloud infrastructure. See Set up multiple websites or stores.
The same best practices for configuring the robots.txt and sitemap.xml files for single-site storefronts apply to multi-site storefronts, with two important differences:
Make sure that the robots.txt and sitemap.xml file names contain the names of the corresponding sites. For example:
domainone_robots.txt
domaintwo_robots.txt
domainone_sitemap.xml
domaintwo_sitemap.xml
Use a slightly modified custom Fastly VCL snippet to redirect from the root of your sites to the pub/media location for both files across your sites:
{
"name": "sitemaprobots_rewrite",
"dynamic": "0",
"type": "recv",
"priority": "90",
"content": "if ( req.url.path == \"/robots.txt\" ) { if ( req.http.host ~ \"(domainone|domaintwo).com$\" ) { set req.url = \"pub/media/\" re.group.1 \"_robots.txt\"; }} else if ( req.url.path == \"/sitemap.xml\" ) { if ( req.http.host ~ \"(domainone|domaintwo).com$\" ) { set req.url = \"pub/media/\" re.group.1 \"_sitemap.xml\"; }}"
}
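As a quick check after the snippet is active, each domain should serve its own file; for example (domainone.com and domaintwo.com are the placeholder domains from the snippet above):

# Each request should return the site-specific file from pub/media.
curl -s https://domainone.com/robots.txt
curl -s https://domaintwo.com/sitemap.xml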
Use the Admin application to configure the robots.txt and sitemap.xml files to prevent bots from scanning and indexing unnecessary content (see Search Engine Robots).
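For illustration only, the following robots.txt excerpt blocks a few paths that are commonly excluded on Commerce storefronts; the directives that make sense for your site depend on your catalog and SEO strategy:

User-agent: *
Disallow: /checkout/
Disallow: /customer/
Disallow: /catalogsearch/
Disallow: /wishlist/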
For on-premises deployments, the location where you write the files depends on how you installed Adobe Commerce. Write the files to /path/to/commerce/pub/media/ or /path/to/commerce/media, whichever is correct for your installation.
Do not expose your Admin path in your robots.txt file. Exposing the Admin path is a vulnerability that opens the site to hacking and potential loss of data. Remove the Admin path from the robots.txt file.
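For example, an entry like the following (the Admin path shown is hypothetical) advertises the Admin URL to anyone who reads the file and should be removed:

Disallow: /admin_1a2b3c/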
For steps to edit the robots.txt file and remove all entries of the Admin path, see Marketing User Guide > SEO and Search > Search Engine Robots.
If you need help, submit an Adobe Commerce Support ticket.