This tab explains how to identify whether bots are causing site problems and, if so, which bots.
robots.txt file or settings in a search engine console. Boundaries can restrict access to the site or parts of the site.
robots.txt file, or they may spoof a good bot through the request user agent field of the HTTP request. Some things that malicious bots do:
robots.txt field to restrict or limit the rate of site access.
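The boundaries and rate limits described above are written as robots.txt directives. The following minimal sketch uses Python's standard urllib.robotparser module to show how a well-behaved crawler would read a hypothetical robots.txt; the paths and the 10-second Crawl-delay are illustrative assumptions, not recommended Adobe Commerce values. Crawl-delay is a non-standard directive that not every crawler honors (Googlebot ignores it), and malicious bots ignore robots.txt altogether.

```python
from urllib import robotparser

# Hypothetical robots.txt content; the paths and delay are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /checkout/
Disallow: /customer/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from lines instead of fetching a URL

# A compliant crawler checks each URL path before requesting it.
print(parser.can_fetch("ExampleBot", "/checkout/cart"))       # False: disallowed path
print(parser.can_fetch("ExampleBot", "/catalog/shoes.html"))  # True: no rule matches
# Crawl-delay asks crawlers to wait this many seconds between requests.
print(parser.crawl_delay("ExampleBot"))                       # 10
```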
The Experimental Potential Malicious Bots frame runs more than 12 separate, complex queries. It detects malicious request signatures by IP address, then aggregates the results, sums them, and sorts them by count in descending order. The queries contain many data signatures of CVE exploits and other malicious requests. Even when security fixes and patches block the exploits so that they pose no threat to the site, the website still has to handle each request, and the volume of these requests can become significant in a short period of time. This frame does not show the total requests from an IP address, only the requests with signals indicating suspicious intent.
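The aggregation step can be pictured with the following minimal sketch. It assumes request records reduced to hypothetical (IP address, URL) pairs and uses a few placeholder signature strings; the actual frame runs its own, far more extensive queries. Like the frame, it counts only requests that match a signature, not every request from an IP address.

```python
from collections import Counter

# Placeholder request signatures for illustration only; the real frame
# uses its own set of CVE and malicious-request signatures.
SUSPICIOUS_SIGNATURES = ("/.env", "/wp-login.php", "../", "/phpunit/")

def suspicious_counts(requests):
    """Count signature-matching requests per IP, sorted by count descending.

    requests: iterable of (ip, url) tuples (hypothetical format).
    """
    counts = Counter(
        ip for ip, url in requests
        if any(sig in url for sig in SUSPICIOUS_SIGNATURES)
    )
    return counts.most_common()

sample = [
    ("203.0.113.7", "/.env"),
    ("203.0.113.7", "/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php"),
    ("198.51.100.2", "/wp-login.php"),
    ("198.51.100.2", "/catalog/shoes.html"),  # normal request, not counted
    ("192.0.2.10", "/checkout/cart"),         # normal request, not counted
]
print(suspicious_counts(sample))
# [('203.0.113.7', 2), ('198.51.100.2', 1)]
```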
Make sure to verify that the traffic is suspicious and that it does not originate from a Content Delivery Network (CDN) address, which may also be delivering valid requests. If the requests come from a CDN IP address, contact that CDN provider for help blocking the suspicious traffic through their network. If you need to block the address or request URL, refer to Block malicious traffic for Adobe Commerce on Fastly level in the Adobe Commerce Support Knowledge Base.
The Rate of HTTP request per second (top 25) during requested time period frame shows the IP addresses with the highest request rates (requests per second) during the selected time frame. If these addresses also appear in the table above, confirm that they are malicious and are not CDN addresses, then block them through Fastly.
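One way to compute a per-IP request rate from raw request data is sketched below. It assumes hypothetical (IP address, Unix timestamp) records and defines the rate as the peak number of requests an IP address made within any single one-second bucket; the frame's own calculation may differ.

```python
from collections import Counter, defaultdict

def top_request_rates(requests, top_n=25):
    """Return (ip, peak requests in one second) pairs, highest rate first.

    requests: iterable of (ip, unix_timestamp) tuples (hypothetical format).
    """
    # Count requests per (IP, whole second) bucket.
    per_second = Counter((ip, int(ts)) for ip, ts in requests)
    # Keep each IP's busiest second.
    peak = defaultdict(int)
    for (ip, _second), count in per_second.items():
        peak[ip] = max(peak[ip], count)
    return sorted(peak.items(), key=lambda item: item[1], reverse=True)[:top_n]

sample = [
    ("203.0.113.7", 1700000000.1),
    ("203.0.113.7", 1700000000.5),
    ("203.0.113.7", 1700000000.9),
    ("198.51.100.2", 1700000000.2),
    ("198.51.100.2", 1700000001.3),
]
print(top_request_rates(sample))
# [('203.0.113.7', 3), ('198.51.100.2', 1)]
```

Using the peak second rather than an average over the whole window makes short bursts stand out, which matches the goal of spotting addresses that hit the site hard in a short period.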
The Total Bot traffic by bot name during selected time period table contains the aggregated count of non-cached requests where the request_user_agent field value contains a bot name. The request may or may not come from the named bot, because the request_user_agent value can be spoofed. The Count column is the most important value in this table.
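The following minimal sketch mirrors that logic under stated assumptions: requests are hypothetical (user agent, cache status) pairs, a request counts as non-cached when its cache status is anything other than HIT, and the bot name list is an illustrative placeholder. Because it only matches strings, it counts spoofed user agents exactly like genuine ones, which is why the IP addresses behind each name still need verification.

```python
from collections import Counter

# Illustrative bot names to search for in the user agent (not exhaustive).
KNOWN_BOT_NAMES = ("Googlebot", "bingbot", "AhrefsBot", "SemrushBot")

def bot_traffic_by_name(requests):
    """Count non-cached requests per bot name found in the user agent.

    requests: iterable of (user_agent, cache_status) tuples (hypothetical format).
    """
    counts = Counter()
    for user_agent, cache_status in requests:
        if cache_status == "HIT":
            continue  # served from cache, so it is not counted
        ua = user_agent.lower()
        for name in KNOWN_BOT_NAMES:
            if name.lower() in ua:
                counts[name] += 1
                break  # count each request once
    return counts.most_common()

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
sample = [
    (GOOGLEBOT_UA, "MISS"),
    (GOOGLEBOT_UA, "HIT"),   # cached, ignored
    (BINGBOT_UA, "PASS"),
    (GOOGLEBOT_UA, "PASS"),
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "MISS"),  # not a bot
]
print(bot_traffic_by_name(sample))
# [('Googlebot', 2), ('bingbot', 1)]
```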
The Total Bot Traffic by Bot name/IP address during selected time period table shows the same data as the previous table but adds the IP addresses making the requests on behalf of the named bot. Because malicious bots spoof good bots, verify the IP addresses through websites that identify abusive IP addresses, through whois services, or through DNS lookups. For example, Google publishes its Googlebot IP addresses, and Microsoft provides a verification tool for Bingbot. To act on unwanted bot traffic, refer to How to block bot traffic on Fastly level OR manage bots through your robots.txt file, and Best practices for Adobe Commerce robots.txt.
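Google documents a reverse-then-forward DNS check for confirming that a request claiming to be Googlebot really comes from Google. The sketch below applies that check with Python's standard socket module: a reverse lookup of the IP address must return a host name under one of the allowed domains, and a forward lookup of that host name must return the original IP address. The domain list is an assumption to confirm against Google's current documentation, and gethostbyname_ex covers IPv4 only. The lookup functions are parameters so the check can be exercised without network access.

```python
import socket

# Domains assumed to be valid for Googlebot reverse DNS; confirm against
# Google's current documentation before relying on them.
GOOGLEBOT_DOMAINS = (".googlebot.com", ".google.com")

def verify_crawler_ip(ip, allowed_domains=GOOGLEBOT_DOMAINS,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Return True if ip passes a reverse-then-forward DNS check (IPv4 only)."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, so the IP cannot be verified
    if not hostname.endswith(allowed_domains):
        return False  # PTR points outside the crawler's domains
    try:
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    # Whoever controls an IP can set any PTR record, but cannot make a
    # host name in Google's domain resolve back to their own address.
    return ip in forward_ips
```

Calling verify_crawler_ip with only an IP address performs live DNS lookups; passing a different allowed_domains tuple adapts the same check to other crawlers whose operators document a reverse DNS domain.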
The Graph - Bots with HTTP status errors during selected time period graph shows errors for bots that identify themselves in the request user agent field. An error does not necessarily mean that traffic volume, from the bot or from other sources, caused it. The bot could be requesting content that does not exist, or there could be another problem with the request.
If errors spike for certain IP addresses during site instability or an outage, those IP addresses could be contributing to the problem.
The Table - IPs that do not identify as bots with HTTP status errors during selected time period table shows requests with non-200 HTTP status codes from IP addresses that DO NOT self-identify as bots in the request user agent field. These IP addresses could be malicious, especially if their counts are high for the selected time period.
If the non-200 HTTP status code counts are low and the IP addresses do not fall in similar ranges, the addresses might not be contributing to the site issues.
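Whether erroring addresses fall in similar ranges can be approximated by grouping them by network prefix. This sketch uses Python's standard ipaddress module and assumes a /24 prefix for IPv4 addresses, an arbitrary choice for illustration; a high count concentrated in one network deserves a closer look than low counts scattered across unrelated networks.

```python
import ipaddress
from collections import Counter

def errors_by_network(error_ips, prefix=24):
    """Group IPv4 addresses that received non-200 statuses by network prefix.

    error_ips: iterable of IP address strings, one per non-200 request.
    Returns (network, request_count) pairs, largest first.
    """
    counts = Counter(
        # strict=False masks off the host bits, e.g. 203.0.113.7 -> 203.0.113.0/24
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in error_ips
    )
    return [(str(net), count) for net, count in counts.most_common()]

error_ips = ["203.0.113.7", "203.0.113.8", "203.0.113.9", "198.51.100.2"]
print(errors_by_network(error_ips))
# [('203.0.113.0/24', 3), ('198.51.100.0/24', 1)]
```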
When IP addresses generate a high frequency of errors, ask what they are doing. The Table – Cache Status ‘ERROR’ detail table (what are these IPs doing?) frame shows the requested URL along with the HTTP status for requests that have a cache status of ERROR. The counts are faceted by URL, so each count may be low even though the IP address may be making thousands of requests during the selected time period. The table shows at most 2,000 requests for the time frame (the record display limit).
The Show 5XX status distribution across IP addresses (top 200 addresses) frame is especially useful. It shows the IP addresses that received 5XX HTTP status codes during the selected time period. 5XX HTTP status codes typically indicate that the site is struggling to respond to requests. If a high volume of requests affects the site to the point where it cannot handle the traffic, the IP addresses making the most frequent requests will typically also have the most errors.
The wider an IP address's bar segment, the larger its share of the total 5XX errors during that time period. Note: an IP address can have multiple segments in the graph if it received multiple HTTP status codes (for example, 502 and 503).
A typical distribution shows IP address segments of roughly equal width toward the right side of the bar, or a few wide segments with very low counts.
Hover over a bar segment to see the number of the indicated errors during the selected time period.
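The segment widths correspond to each IP address's share of all 5XX errors. The following sketch computes those shares from hypothetical (IP address, HTTP status) records; like the graph, it keeps a separate segment for each status code an IP address received.

```python
from collections import Counter

def five_xx_shares(requests):
    """Return ((ip, status), count, percent of all 5XX) tuples, largest first.

    requests: iterable of (ip, http_status) tuples (hypothetical format).
    """
    # One segment per (IP, status) pair, counting only 5XX responses.
    errors = Counter((ip, status) for ip, status in requests if 500 <= status <= 599)
    total = sum(errors.values())
    return [
        (key, count, round(100 * count / total, 1))
        for key, count in errors.most_common()
    ]

sample = [
    ("203.0.113.7", 503),
    ("203.0.113.7", 503),
    ("203.0.113.7", 502),
    ("198.51.100.2", 503),
    ("192.0.2.10", 200),  # not a 5XX, ignored
]
for segment in five_xx_shares(sample):
    print(segment)
# (('203.0.113.7', 503), 2, 50.0)
# (('203.0.113.7', 502), 1, 25.0)
# (('198.51.100.2', 503), 1, 25.0)
```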
The IP cache status (MISS, PASS, ERROR) and HTTP status during selected time period frame shows the count of non-cached requests and their HTTP status codes for each IP address across the selected time frame. This indicates the proportional load from each IP address as well as the total volume, and it highlights the IP addresses with the most requests.
Clicking the Error icon in the graph below lets you compare the last two graphs with each other, which can help indicate where load contributes to site problems.
The Graph - IPs that do not identify as bots without error during selected time period frame shows the request user agent field, the IP address, and the status code for requests where the request user agent field does not indicate a bot. This frame can show high-frequency requests from any IP address; pay particular attention to them when they occur during a period in which the site has issues.
The Graph - Suspicious Non-Bot traffic during selected time period graph looks for the request user agent value Go-http-client and will be extended to look for other suspicious request user agent values. Services use this user agent value when connecting to sites, so the traffic may be valid, but malicious bots use it as well.
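A minimal sketch of this kind of user agent check follows. Go-http-client is the default user agent of Go's standard net/http client; the other prefixes are hypothetical extensions (the default user agents of the python-requests library and curl). A match only marks a request for review, because legitimate integrations send these values too.

```python
# Go-http-client comes from the description above; the other entries are
# hypothetical extensions (default user agents of python-requests and curl).
SUSPICIOUS_UA_PREFIXES = ("go-http-client", "python-requests", "curl")

def is_suspicious_user_agent(user_agent):
    """Flag user agents that start with a generic HTTP client name."""
    return user_agent.strip().lower().startswith(SUSPICIOUS_UA_PREFIXES)

for ua in ("Go-http-client/1.1", "Mozilla/5.0 (X11; Linux x86_64)", "curl/8.4.0"):
    print(ua, is_suspicious_user_agent(ua))
# Go-http-client/1.1 True
# Mozilla/5.0 (X11; Linux x86_64) False
# curl/8.4.0 True
```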
The Graph - Bot traffic by Bot name during selected time period frame shows the same data as the Total Bot traffic by Bot name during selected time period table at the top of the tab, plotted on a timeline so that you can see when the bots make their requests and how those requests are distributed.
The Graph - Top 250 Bot Names and IP addresses during selected time period frame shows the same data as the Total Bot Traffic by Bot name/IP address during selected time period table at the top of the tab, plotted on a timeline and faceted by IP address. This shows when the bots make requests, which IP addresses make them, and how the requests are distributed.
The Blocked Bot name / IP addresses (in Fastly) during selected time period frame shows the bot names and IP addresses that are blocked, that is, bot traffic and IPs that were returned a 403 Forbidden HTTP status code. Use this graph to confirm that requests from these sources are blocked in Fastly from that point forward.
The Blocked non-Bot name / IP addresses (in Fastly) during selected time period frame shows IP addresses that do not identify as bots and have been blocked through Fastly, that is, non-bot traffic and IPs that were returned a 403 Forbidden HTTP status code.
Malicious bots often spoof other bots through the value of the Request User Agent field. This table shows how many unique values each IP address has sent in that field. The more unique Request User Agent values an IP address has, the more suspicious it is.
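The count in this table can be reproduced with a short sketch that collects the distinct user agent values seen for each IP address. The (IP address, user agent) record format is a hypothetical simplification.

```python
from collections import defaultdict

def unique_user_agents_per_ip(requests):
    """Return (ip, number of distinct user agents) pairs, most varied first.

    requests: iterable of (ip, user_agent) tuples (hypothetical format).
    """
    agents = defaultdict(set)
    for ip, user_agent in requests:
        agents[ip].add(user_agent)  # a set keeps each value only once
    counts = [(ip, len(uas)) for ip, uas in agents.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)

sample = [
    ("203.0.113.7", "Googlebot/2.1"),
    ("203.0.113.7", "bingbot/2.0"),
    ("203.0.113.7", "Mozilla/5.0 (Windows NT 10.0)"),
    ("198.51.100.2", "Mozilla/5.0 (Macintosh)"),
    ("198.51.100.2", "Mozilla/5.0 (Macintosh)"),
]
print(unique_user_agents_per_ip(sample))
# [('203.0.113.7', 3), ('198.51.100.2', 1)]
```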
The IP with non-200 status errors – without 403 status frame shows the distribution of IP addresses with HTTP status codes other than 200 (excluding 403) across the selected time frame. Higher values for a single IP address or group of IP addresses require further investigation.
The IP with 403 status codes frame shows non-cached requests without cache_status=ERROR that have an HTTP status of 403. This may indicate that the origin server, rather than a Fastly block, is the source of the 403 (Forbidden) response.
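The distinction drawn by these two frames can be expressed as a small classification rule. This sketch assumes, based on the descriptions above, that a 403 with cache status ERROR reflects a Fastly block while a 403 on a MISS or PASS request came from the origin server; both the rule and the labels are simplifying assumptions, not the dashboard's actual query.

```python
def classify_response(status, cache_status):
    """Roughly bucket a request the way the two frames above separate them.

    Assumption: a Fastly block shows cache_status "ERROR" with status 403,
    while a 403 on a MISS or PASS request was returned by the origin server.
    """
    if status == 200:
        return "ok"
    if status == 403:
        if cache_status == "ERROR":
            return "fastly_block"
        if cache_status in ("MISS", "PASS"):
            return "origin_403"
        return "other_403"  # e.g. a cached 403, outside both frames
    return "non_200_other"  # population of the "without 403" frame

sample = [(403, "ERROR"), (403, "PASS"), (503, "MISS"), (200, "HIT")]
print([classify_response(status, cache) for status, cache in sample])
# ['fastly_block', 'origin_403', 'non_200_other', 'ok']
```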
The Top 5 with non-200 status codes showing cache_status table shows request counts for each IP address and status code combination, along with the cache_status value.
The Pageview Latency frame shows page load and API response latency. Latency increases appear as spikes on this graph and may line up with the bot traffic.