Sudden bandwidth spike in AEMaaCS

In Adobe Experience Manager as a Cloud Service, the customer’s WAF reported a sudden bandwidth spike despite stable traffic. The spike started on 25 Aug 2025 and persisted even after rolling back the AEM Cloud release. To fix this issue, optimize or replace the heavy asset causing the spike.

Description

Environment

  • Product: Adobe Experience Manager as a Cloud Service – Sites
  • Customer: ACE Experience Cloud
  • Instance: cm-p92036-e820059 (Production)
  • Upstream WAF: Imperva (customer-managed, in front of Adobe-managed Fastly CDN)

Issue/Symptoms

  • Customer’s WAF team reported a 108% increase in average bandwidth starting 25 Aug 2025
  • Hits / visits remained largely flat
  • No recent go-lives or traffic campaigns
  • Initial suspicion that Adobe automatic releases on 25 Aug and 8 Sep might be responsible

Cause

  • Bandwidth spike was triggered by a large image asset introduced on 25 Aug 2025:

    • /content/dam/mobileapp/home_screen/adobestock-317865631.jpeg
  • Asset characteristics:

    • Large file size
    • Requested tens of thousands of times per day
      • Likely due to placement on a heavily visited page or external deep-linking
  • Imperva calculates bandwidth at the WAF edge:

    • Combination of large size and high request volume caused 108% increase in measured Mbps
  • Overall site metrics remained stable:

    • Request volume and visits did not significantly change
    • AEM Cloud infrastructure and releases were functioning normally
  • Rollback of AEM Cloud release did not alter traffic pattern:

    • Confirms issue originated from customer content changes, not Adobe platform changes
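The interaction described above is simple arithmetic: a large asset requested tens of thousands of times per day adds a steady load of megabits per second at the WAF edge, even when hit counts look flat. A minimal back-of-envelope sketch, using an assumed asset size and request count (not figures from this incident):

```python
# Rough estimate of how one heavy asset inflates average Mbps at the WAF edge.
# asset_size_mb and requests_per_day are ASSUMED illustrative values.
asset_size_mb = 8            # assumed size of the heavy image, in megabytes
requests_per_day = 60_000    # assumed daily request count

bytes_per_day = asset_size_mb * 1_000_000 * requests_per_day
# Convert bytes/day to average megabits per second.
avg_mbps = bytes_per_day * 8 / 86_400 / 1_000_000

print(f"Added average bandwidth: {avg_mbps:.1f} Mbps")
```

At these assumed numbers the single asset adds roughly 44 Mbps of sustained bandwidth, which is how measured Mbps can double while request volume barely moves.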

Resolution

To fix this issue, follow these steps:

  1. Verify AEM platform health and release impact

    1. Check AEM production health (author/publish pods, error logs).

    2. Confirm that regular AEM Cloud monthly releases (25 Aug and 8 Sep) do not modify customer content or static assets, so a rollback alone is unlikely to reduce external Mbps.

      Note: Per customer request, rollback of Prod/Stage to version 2025.07.21772 was executed via SKYOPS-120450. Bandwidth spike persisted, ruling out the release as the root cause.

  2. Analyze CDN metrics to find the source of bandwidth

    1. Use Splunk/Skyline dashboards to compare content request counts before and after the spike.

    2. Monitor cache hit ratio and review request volume before and after 25 Aug:

      • Aug 18–23: 9K requests.
      • Aug 25–30: 29K requests.
      • Sep 1–6: 60K requests.
    3. Generate and share top URL and content-request reports.
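The week-over-week growth in the request counts above can be quantified with a short calculation (the counts are the rounded figures from the dashboards, so the factors are approximate):

```python
# Week-over-week growth in requests for the heavy asset, using the rounded
# counts reported in the CDN dashboards.
weekly_requests = {
    "Aug 18-23": 9_000,
    "Aug 25-30": 29_000,
    "Sep 1-6": 60_000,
}

counts = list(weekly_requests.values())
# Ratio of each period's count to the previous period's count.
growth = [counts[i + 1] / counts[i] for i in range(len(counts) - 1)]

for (label, _), factor in zip(list(weekly_requests.items())[1:], growth):
    print(f"{label}: {factor:.1f}x previous period")
```

Requests for the asset roughly tripled in the first week after 25 Aug and then doubled again, while overall site traffic stayed flat.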

  3. Run CDN/DaaS analysis to find URLs with large response sizes.

    1. Validate findings with Splunk and Databricks outputs.
    2. Optimize or remove the identified heavy asset.
  4. Confirm traffic normalization via Fastly and Imperva reports.
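The CDN-log analysis in steps 2–3 amounts to grouping edge-log records by URL and ranking by total bytes served. A minimal sketch, assuming simplified records with `url` and `resp_body_size` fields (real Fastly edge logs carry many more fields, and the non-image URLs below are invented for illustration):

```python
# Sketch of the "top URLs by resp_body_size" analysis run in Splunk/Databricks.
# The sample records and field names are assumptions for illustration.
from collections import defaultdict

edge_log = [
    {"url": "/content/dam/mobileapp/home_screen/adobestock-317865631.jpeg",
     "resp_body_size": 8_000_000},
    {"url": "/content/dam/mobileapp/home_screen/adobestock-317865631.jpeg",
     "resp_body_size": 8_000_000},
    {"url": "/etc.clientlibs/site/clientlibs/site-all.min.js",
     "resp_body_size": 250_000},
    {"url": "/content/site/us/en.html",
     "resp_body_size": 60_000},
]

# Sum bytes served per URL.
totals = defaultdict(int)
for rec in edge_log:
    totals[rec["url"]] += rec["resp_body_size"]

# Rank URLs by total bytes; the heavy asset should dominate the top of the list.
top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for url, total_bytes in top[:3]:
    print(f"{total_bytes:>12,}  {url}")
```

In the real investigation the same grouping over Fastly edge logs surfaced the AdobeStock image as the dominant contributor to bytes served.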

References

  • Jira: SKYOPS-122665 – “[ SLA3] Sudden 108% Bandwidth Spike – ACE Experience Cloud (cm-p92036-e820059)”

  • Related Jira: SKYOPS-120450 – rollback of Cloud Release for isolation (no impact on spike)

  • Internal tools:

    • Skyline / Splunk CDN dashboards (skyline__cdn__cache_hit_ratio_and_coverage_over_time, edge log searches)
    • DaaS Databricks notebook for top URLs by resp_body_size