The bots tab

Last update: June 27, 2023

CREATED FOR:

Experienced
Admin
Developer

This tab explains how to identify whether bots are causing site problems and, if so, which ones.

High-level overview of bots:

A bot is a piece of software that runs repetitive automated tasks. As artificial intelligence and machine learning evolve, the tasks, methods, and interactions of bots are changing. Good bots benefit sites by crawling them and adding them to internet search engines, which guides internet users to the site through search engine results. A good bot typically respects the boundaries placed on it by a robots.txt file or by settings in a search engine console. These boundaries can restrict access to the whole site or to parts of it.
Malicious bots ignore the robots.txt file, or they may spoof a good bot through the user agent field of the HTTP request. Some things that malicious bots do:
- Add load to a site to deny legitimate users access to the site.
- Scrape and reuse content without permission.
- Register fake accounts to flood email services or addresses, or to redirect users to other sites (spam bots).
- Create fake views (Viewbots).
- Buy up products or tickets (Focused bots).

Managing bots
- Observation for Adobe Commerce provides several views of bot traffic:
  - Total non-cached bot activity, which shows the load that each bot adds to a site and when that load occurs.
  - The bots that are generating errors. Typically, if a bot adds load that causes site problems, that bot or IP address has the highest frequency of errors.
  - Bot names (request user agent field values) and IP addresses, which you can manage through:
    - Fastly (rate limiting, or VCLs that block IP addresses, IP ranges, or bots by name).
    - Adding rules for good bots to the robots.txt file to restrict their access or limit their rate of access.
    - Managing Bing or Google bots through the search engine console.
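The following sketch shows how a crawler that honors robots.txt interprets a set of rules. It uses Python's standard urllib.robotparser module to parse a hypothetical robots.txt file. The ExampleBot name, the paths, and the 10-second Crawl-delay are illustrative assumptions, not recommended values for Adobe Commerce. Not every crawler honors Crawl-delay; Googlebot, for example, ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: slow down one (hypothetical) crawler and keep it out of
# checkout pages; every other crawler skips checkout and customer account pages.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /checkout/

User-agent: *
Disallow: /checkout/
Disallow: /customer/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content directly, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)
for agent, path in [("ExampleBot", "/checkout/cart"),
                    ("ExampleBot", "/customer/account"),
                    ("OtherBot", "/customer/account"),
                    ("OtherBot", "/catalog/shoes.html")]:
    print(f"{agent:<10} {path:<20} allowed={rp.can_fetch(agent, path)}")
print("ExampleBot crawl delay:", rp.crawl_delay("ExampleBot"))
```

ExampleBot matches its own group, so it ignores the `User-agent: *` group and may still fetch /customer/account. Any rule that should apply to every crawler has to be repeated in each named group.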

Experimental Potential Malicious Bots frame

The Experimental Potential Malicious Bots frame runs more than 12 separate, complex queries. It detects malicious IP request signatures, aggregates the results, and then sums and sorts them by count in descending order. The queries contain many data signatures of CVE exploits and other malicious requests. Even when security fixes or patches block these exploits and they pose no threat to the site, the website still has to handle each request, and the volume of requests can become significant in a short period of time. This frame does not show the total requests from an IP address. It shows only the requests with signals that indicate suspicious intent.
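The aggregation step can be illustrated with the following sketch. The frame's actual queries and signatures are not documented here, so several things are assumptions: the four regular expressions, the record fields (`client_ip`, `url`), and the use of Python in place of the frame's own query language. Each pattern stands in for one separate query. The per-IP results are summed and then sorted by count in descending order. As with summing separate query results, a request that matches two signatures is counted twice.

```python
import re
from collections import Counter

# Hypothetical signatures: each pattern stands in for one of the frame's
# separate queries. These are NOT the frame's real signatures.
SIGNATURES = {
    "path_traversal": re.compile(r"\.\./|%2e%2e%2f", re.IGNORECASE),
    "php_probe": re.compile(r"/(wp-login|xmlrpc)\.php", re.IGNORECASE),
    "env_file": re.compile(r"/\.env\b", re.IGNORECASE),
    "sql_injection": re.compile(r"union(\s|%20|\+)+select", re.IGNORECASE),
}


def suspicious_counts(requests):
    """Sum per-IP matches across all signature 'queries', highest count first."""
    total = Counter()
    for pattern in SIGNATURES.values():
        # One "query": count matching requests per client IP.
        per_query = Counter(r["client_ip"] for r in requests if pattern.search(r["url"]))
        total.update(per_query)  # aggregate the results by summing
    return total.most_common()  # sorted by count, descending
```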

Make sure to verify that the traffic is suspicious and that it does not originate from a Content Delivery Network (CDN) address, which may also be delivering valid requests. If the requests come from a CDN IP address, contact that service provider for help blocking the suspicious traffic through their network. If you need to block the address or request URL, refer to Block malicious traffic for Adobe Commerce on Fastly level in the Adobe Commerce Support Knowledge Base.
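Here is a minimal sketch of the CDN check, using Python's ipaddress module. `CDN_RANGES` is hypothetical. Its entries are documentation-only ranges (RFC 5737 for IPv4, RFC 3849 for IPv6) that stand in for the ranges your CDN provider actually publishes.

```python
import ipaddress

# Hypothetical CDN ranges. Replace with the ranges your CDN provider publishes.
# These are documentation-only ranges (RFC 5737 and RFC 3849).
CDN_RANGES = [ipaddress.ip_network(cidr) for cidr in ("198.51.100.0/24", "2001:db8::/32")]


def is_cdn_address(ip: str) -> bool:
    """True if the address falls inside any configured CDN range."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa), so mixed lists are safe.
    return any(addr in network for network in CDN_RANGES)


def split_block_candidates(ips):
    """Separate CDN addresses (contact the provider) from direct block candidates."""
    cdn, direct = [], []
    for ip in ips:
        (cdn if is_cdn_address(ip) else direct).append(ip)
    return cdn, direct
```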

Rate of HTTP request per second (top 25) during requested time period

The Rate of HTTP request per second (top 25) during requested time period frame shows the IP addresses with the highest request rate (requests per second) during the selected time frame. If these addresses also appear in the table above, verify that they are not CDN addresses and that they are malicious, and then block them through Fastly.
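One way to compute a per-IP request rate is to group requests into whole seconds and keep each IP address's busiest second, as this sketch does. The record fields (`client_ip`, plus `timestamp` as Unix epoch seconds) are assumptions. The frame itself may compute the rate differently, for example as an average over the selected period.

```python
from collections import Counter


def top_request_rates(requests, limit=25):
    """Return (client_ip, peak requests within a single second), highest first."""
    # Bucket each request by IP and by the whole second of its epoch timestamp.
    per_second = Counter((r["client_ip"], int(r["timestamp"])) for r in requests)
    peak = {}
    for (ip, _second), count in per_second.items():
        peak[ip] = max(peak.get(ip, 0), count)
    return sorted(peak.items(), key=lambda item: item[1], reverse=True)[:limit]
```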

Total Bot traffic by bot name:

Total Bot traffic by bot name during selected time period:

The Total Bot traffic by bot name during selected time period table contains the aggregated count of non-cached requests whose request_user_agent field value contains a bot name. A request may or may not come from the named bot, because the request_user_agent field value can be spoofed. The Count column is the most important value in this table.
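The following sketch shows the kind of grouping this table performs. Three things are assumptions for illustration: the rule for recognizing a bot name (a user agent token ending in "bot"), the set of cache statuses treated as non-cached, and the record fields. The frame's actual matching logic may differ.

```python
import re
from collections import Counter

# Assumption: a user agent names a bot when it contains a token ending in "bot"
# (Googlebot, bingbot, AhrefsBot, ...). Real detection rules may differ.
BOT_TOKEN = re.compile(r"([A-Za-z0-9_-]*bot)\b", re.IGNORECASE)
# Assumption: cache statuses that count as non-cached (origin-facing) requests.
NON_CACHED = {"MISS", "PASS", "ERROR"}


def bot_name(user_agent):
    """Return the first bot-like token in a user agent string, or None."""
    match = BOT_TOKEN.search(user_agent or "")
    return match.group(1) if match else None


def bot_traffic_by_name(requests):
    """Count non-cached requests per self-declared bot name, highest first."""
    counts = Counter()
    for r in requests:
        if r["cache_status"] in NON_CACHED:
            name = bot_name(r["request_user_agent"])
            if name:
                counts[name] += 1
    return counts.most_common()
```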

Total Bot Traffic by Bot name/IP address

Total Bot Traffic by Bot name/IP address during selected time period

Related articles: How to block bot traffic on Fastly level OR manage bots through your robots.txt file; Best practices for Adobe Commerce robots.txt

The Total Bot Traffic by Bot name/IP address during selected time period table shows the same data as the previous table, but adds the IP addresses that make requests on behalf of the named bot. Because malicious bots spoof good bots, verify the IP addresses through websites that identify abusive IP addresses, through whois services, or through DNS lookups. For example, Google publishes its Googlebot IP addresses, and Microsoft provides a verification tool for Bingbot.
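For Googlebot, the usual DNS verification has two steps. First, run a reverse DNS lookup on the IP address. Then run a forward lookup on the returned hostname and confirm that it resolves back to the same IP address. The following minimal sketch uses Python's socket module. The accepted domain suffixes are an assumption, so check them against Google's current documentation. Note that `gethostbyname_ex` resolves IPv4 addresses only. The lookup functions are passed in as parameters so that the logic can be tested without network access.

```python
import socket

# Assumption: reverse DNS names for genuine Googlebot hosts end in one of these
# suffixes. Confirm against Google's current crawler verification documentation.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False  # no usable PTR record: cannot verify
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False  # for example, a spoofed user agent from an unrelated network
    try:
        _name, _aliases, addresses = forward_lookup(hostname)  # IPv4 only
    except OSError:
        return False
    # The hostname must resolve back to the original address.
    return ip in addresses
```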

Graph - Bots with HTTP status errors

Graph - Bots with HTTP status errors during selected time period

The Graph - Bots with HTTP status errors during selected time period graph shows errors for bots that declare themselves in the request user agent field. This does not necessarily mean that request volume from the bot or from other traffic is causing the errors. The errors could occur because the bot is requesting information that does not exist or because of another problem with the request.

If errors from certain IP addresses spike during site instability or an outage, those addresses could be suspects in the site problem.

Table - IPs that do not identify as bots

Table - IPs that do not identify as bots with HTTP status errors during selected time period

The Table - IPs that do not identify as bots with HTTP status errors during selected time period table shows requests with non-200 HTTP status codes from IP addresses that do not self-identify as bots in the request user agent field. These could be malicious IP addresses, especially if their counts are high for the selected time period.

If the non-200 HTTP status code counts are low and the IP address ranges are not similar, the addresses might not be contributing to the site issues.
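Here is a minimal sketch of this view. It assumes records with hypothetical `client_ip`, `status`, and `request_user_agent` fields, and it treats any user agent that does not contain "bot" as a non-bot, which is a simplification.

```python
from collections import Counter


def non_bot_error_counts(requests):
    """Count non-200 responses per (IP, status) for user agents without "bot"."""
    counts = Counter()
    for r in requests:
        user_agent = (r.get("request_user_agent") or "").lower()
        # Simplification: anything not mentioning "bot" is treated as a non-bot.
        if "bot" not in user_agent and r["status"] != 200:
            counts[(r["client_ip"], r["status"])] += 1
    return counts.most_common()
```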

Table – Cache Status ‘ERROR’

Table – Cache Status 'ERROR' detail table (what are these IPs doing?)

When IP addresses generate a high frequency of errors, ask what they are doing. The Table – Cache Status ‘ERROR’ detail table (what are these IPs doing?) shows the requested URL and the HTTP status for requests that have a cache status of ERROR. Counts are faceted by URL, so each count may be low even though the IP address may have made thousands of requests during the selected time period. This view covers up to 2000 requests during the time frame (the record display limit).
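The following sketch shows why faceted counts can look small. It keeps only records with cache_status ERROR, stops after a record limit, and then counts by IP address, URL, and status. Applying the 2000-record limit before grouping is an assumption about how the limit works. The record fields are hypothetical.

```python
from collections import Counter
from itertools import islice

RECORD_LIMIT = 2000  # the record display limit mentioned for this table


def error_detail(requests, limit=RECORD_LIMIT):
    """Facet cache_status ERROR requests by (IP, URL, status) within a record limit."""
    errors = (r for r in requests if r["cache_status"] == "ERROR")
    # Assumption: the limit caps the records examined, so busy IPs can be undercounted.
    capped = islice(errors, limit)
    return Counter((r["client_ip"], r["url"], r["status"]) for r in capped).most_common()
```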

Show 5XX status distribution

Show 5XX status distribution across IP addresses (top 200 addresses)

The Show 5XX status distribution across IP addresses (top 200 addresses) frame is a powerful view. It shows the IP addresses that received 5XX HTTP status codes during the selected time period. 5XX status codes typically indicate that the site is struggling to respond to requests. If IP addresses make a high volume of requests and the site is impacted to the point where it cannot handle the traffic, the IP addresses that make the most requests typically receive the most errors.

The wider the bar segment, the larger the IP address's share of all 5XX errors during that time period. Note: an IP address might have multiple segments in the graph if it received multiple HTTP status codes (for example, 502 and 503).

A typical distribution appears toward the right side of the bar, where the IP address segments are roughly equal in width, or as a few wide segments with very low counts.

Hover over a bar segment to see the number of the indicated errors during the selected time period.
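Each bar segment's width corresponds to that (IP address, status) pair's share of all 5XX responses. The following sketch shows that arithmetic. It assumes hypothetical `client_ip` and `status` fields, and it ranks IP addresses by their total 5XX count before keeping the top 200.

```python
from collections import Counter


def five_xx_shares(requests, top=200):
    """Rows of (ip, status, count, percent of all 5XX), widest segments first."""
    segments = Counter(
        (r["client_ip"], r["status"]) for r in requests if 500 <= r["status"] <= 599
    )
    total = sum(segments.values())
    if total == 0:
        return []
    # Rank IP addresses by their combined 5XX count and keep the top ones.
    per_ip = Counter()
    for (ip, _status), count in segments.items():
        per_ip[ip] += count
    top_ips = {ip for ip, _count in per_ip.most_common(top)}
    rows = [
        (ip, status, count, round(100 * count / total, 1))
        for (ip, status), count in segments.items()
        if ip in top_ips
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```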

IP cache status (MISS, PASS, ERROR) and HTTP status

IP cache status (MISS, PASS, ERROR) and http status during selected time period

The IP cache status (MISS, PASS, ERROR) and http status during selected time period frame shows HTTP status code counts and non-cached requests by IP address across the selected time frame. This shows the proportional load from each IP address as well as the total volume, and it highlights the IP addresses with the most requests.

Fastly Cache Summary for selected time period

If you click the Error icon in the graph below, you can compare the last two graphs with each other. This can help show where load contributes to site problems.

Fastly Error check

Graph - IPs that do not identify as bots

IPs that do not identify as bots without error during selected time period

The Graph - IPs that do not identify as bots without error during selected time period frame shows the request user agent field, the IP address, and the status code for requests where the request user agent field does not indicate a bot. This frame may show high-frequency requests from any IP address. Pay close attention to them, especially during periods when the site may be having issues.

Graph - Suspicious Non-Bot traffic

Suspicious Non-Bot traffic during selected time period

The Graph - Suspicious Non-Bot traffic during selected time period graph looks for the request user agent value Go-http-client and will be extended to other suspicious request user agent values. Services that connect to sites use this request user agent value, so it may be valid, but malicious bots also use it.
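Here is a minimal sketch of this check. The prefix tuple holds only Go-http-client, as described above. Any additional values are a hypothetical extension for you to add. The record fields are also hypothetical.

```python
from collections import Counter

# Go-http-client is the value the frame looks for; append other values you
# consider suspicious (hypothetical extension).
SUSPICIOUS_UA_PREFIXES = ("Go-http-client",)


def is_suspicious_user_agent(user_agent):
    """True if the user agent starts with a listed prefix (e.g. "Go-http-client/1.1")."""
    return (user_agent or "").startswith(SUSPICIOUS_UA_PREFIXES)


def suspicious_non_bot_traffic(requests):
    """Count requests per IP whose user agent matches a suspicious prefix."""
    return Counter(
        r["client_ip"] for r in requests
        if is_suspicious_user_agent(r.get("request_user_agent"))
    ).most_common()
```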

Graph - Bot traffic by Bot name

Graph - Bot traffic by Bot name during selected time period

The Graph - Bot traffic by Bot name during selected time period frame shows the same data as the Total Bot traffic by Bot name during selected time period table at the top of the tab, but plots it on a timeline so that you can see when the bots make requests and how those requests are distributed.
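Plotting on a timeline amounts to counting requests per bot name in fixed time buckets. This sketch makes three assumptions: each record already carries a hypothetical `bot_name` field (None for non-bots), each record has a Unix epoch `timestamp`, and buckets are one minute long. The frame's own bucket size may differ.

```python
from collections import Counter, defaultdict


def bot_timeline(requests, bucket_seconds=60):
    """Map bot name -> {bucket start (epoch seconds): request count}."""
    series = defaultdict(Counter)
    for r in requests:
        name = r.get("bot_name")  # hypothetical pre-extracted field
        if name:
            # Round the timestamp down to the start of its bucket.
            bucket = int(r["timestamp"]) // bucket_seconds * bucket_seconds
            series[name][bucket] += 1
    return {name: dict(sorted(counts.items())) for name, counts in series.items()}
```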

Graph - Top 250 Bot Names and IP addresses

Top 250 Bot Names and IP addresses during selected time period

The Graph - Top 250 Bot Names and IP addresses during selected time period frame shows the same data as the Total Bot Traffic by Bot name/IP address during selected time period table at the top of the tab, but plots it on a timeline and facets it by IP address. This shows when the bots make requests, which IP addresses make them, and how the requests are distributed.

Blocked Bot name / IP addresses (in Fastly)

Blocked Bot name / IP addresses (in Fastly) during selected time period. This graph displays bot traffic and IPs that were returned a 403 Forbidden HTTP Status code

The Blocked Bot name / IP addresses (in Fastly) during selected time period frame shows the bot names and IP addresses that are blocked, meaning they were returned a 403 Forbidden HTTP status code. After you block them, you can use this graph to confirm that all of their subsequent requests are blocked in Fastly.

Blocked non-Bot name / IP addresses (in Fastly)

Blocked non-Bot name / IP addresses (in Fastly) during selected time period. This graph displays non-bot traffic and IPs that were returned a 403 Forbidden HTTP Status code

The Blocked non-Bot name / IP addresses (in Fastly) during selected time period frame shows IP addresses that do not identify as bots and that have been blocked through Fastly (returned a 403 Forbidden HTTP status code).

This table shows, for each IP address, the number of user agents and the number of successful, unsuccessful, and blocked requests:

Malicious bots often spoof other bots through the value of the Request User Agent field. This table shows how many unique values each IP address has in that field. The more unique Request User Agent values an IP address has, the more suspicious it is.
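The following sketch builds the per-IP summary. It assumes hypothetical `client_ip`, `request_user_agent`, and `status` fields. The text does not say how the frame classifies successful, unsuccessful, and blocked requests. This sketch assumes that 403 means blocked, 2XX and 3XX mean successful, and everything else is unsuccessful.

```python
from collections import defaultdict


def ip_user_agent_summary(requests):
    """Per IP: (ip, distinct user agents, successful, unsuccessful, blocked)."""
    stats = defaultdict(lambda: {"agents": set(), "ok": 0, "failed": 0, "blocked": 0})
    for r in requests:
        s = stats[r["client_ip"]]
        s["agents"].add(r.get("request_user_agent") or "")
        if r["status"] == 403:
            s["blocked"] += 1  # assumption: 403 counts as blocked
        elif 200 <= r["status"] < 400:
            s["ok"] += 1  # assumption: 2XX and 3XX count as successful
        else:
            s["failed"] += 1
    rows = [(ip, len(s["agents"]), s["ok"], s["failed"], s["blocked"])
            for ip, s in stats.items()]
    # Most distinct user agents (most suspicious) first.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```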

IP with non-200 status errors

IP with non-200 status errors – without 403 status

The IP with non-200 status errors – without 403 status frame shows the distribution, across the selected time frame, of IP addresses with HTTP status codes other than 200 (excluding 403). Higher values on a single IP address or group of IP addresses require further investigation.

IP with 403 status codes:

The IP with 403 status codes frame shows non-cached requests with an HTTP status of 403 whose cache_status is not ERROR. This may indicate that the origin server, rather than a Fastly block, is the source of the 403 (Forbidden) response.
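Here is a minimal sketch of this filter. It assumes, as the text implies, that 403 responses generated by a Fastly block are logged with cache_status ERROR. Under that assumption, 403s logged with MISS or PASS came from the origin. The field names are hypothetical.

```python
from collections import Counter

# Assumption: non-cached statuses other than ERROR, i.e. requests that reached the origin.
ORIGIN_STATUSES = {"MISS", "PASS"}


def origin_403_by_ip(requests):
    """Count 403 responses per IP that came back from the origin, highest first."""
    return Counter(
        r["client_ip"] for r in requests
        if r["status"] == 403 and r["cache_status"] in ORIGIN_STATUSES
    ).most_common()
```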