Architecture
A typical AEM setup consists of an author and a publish environment. These environments have different requirements regarding the underlying hardware size and system configuration. Detailed considerations for both environments are described in the author environment and publish environment sections.
In a typical project setup, you have several environments on which to stage project phases:
- Development environment
  To develop new features or make significant changes. Best practice is to use one development environment per developer (usually a local installation on their personal system).
- Author test environment
  To verify changes. The number of test environments can vary depending on the project requirements (for example, separate environments for QA, integration testing, or user acceptance testing).
- Publish test environment
  Primarily for testing social collaboration use cases and/or the interaction between the author and multiple publish instances.
- Author production environment
  For authors to edit content.
- Publish production environment
  To serve published content.
Additionally, the environments may vary, ranging from a single-server system running AEM and an application server, through to a highly scaled set of multi-server, multi-CPU clustered instances. We recommend that you use a separate computer for each production system and that you do not run other applications on these computers.
Generic hardware sizing considerations
The sections below provide guidance on how to calculate hardware requirements, taking various considerations into account. For large systems we suggest that you perform a simple set of in-house benchmark tests on a reference configuration.
Performance optimization is a fundamental task that must be performed before any benchmarking for a specific project. Make sure to apply the advice provided in the Performance Optimization documentation before running any benchmark tests and using their results for hardware sizing calculations.
Hardware sizing requirements for advanced use cases need to be based on a detailed performance assessment of the project. Characteristics of advanced use cases requiring exceptional hardware resources include combinations of:
- high content payload / throughput
- extensive use of customized code, custom workflows or 3rd party software libraries
- integration with unsupported external systems
Disk Space / Hard Drive
The disk space required depends heavily on both the volume and type of your web application. The calculations should take into account:
- the quantity and size of pages, assets and other repository-stored entities such as workflows, profiles etc.
- the estimated frequency of content changes and therefore the creation of content versions
- the volume of DAM asset renditions that will be generated
- the overall growth of content over time
Disk space is continuously monitored during both Online and Offline Revision Cleanup. Should the available disk space drop below a critical value, the process is cancelled. The critical value is 25% of the current disk footprint of the repository and it is not configurable. It is recommended to size the disk at least two to three times larger than the repository size, including the estimated growth.
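The sizing guideline above can be expressed as a quick back-of-the-envelope calculation. In the sketch below, the function names and the compound-growth model are illustrative assumptions; only the 2-3x factor and the 25% cleanup threshold come from the text above:

```python
def recommended_disk_gb(repo_gb, annual_growth, years, safety_factor=3):
    """Size the disk at two to three times the projected repository size
    (safety_factor of 2-3), including estimated content growth."""
    projected_gb = repo_gb * (1 + annual_growth) ** years
    return projected_gb * safety_factor

def revision_cleanup_headroom_ok(free_gb, repo_gb):
    """Online/Offline Revision Cleanup is cancelled if free disk space drops
    below 25% of the current repository footprint (not configurable)."""
    return free_gb >= 0.25 * repo_gb

# Example: a 100 GB repository growing 40% per year, planned over 3 years.
print(round(recommended_disk_gb(100, 0.40, 3)))               # ~823 GB at a 3x factor
print(revision_cleanup_headroom_ok(free_gb=20, repo_gb=100))  # False: cleanup would abort
```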
Consider a redundant array of independent disks (RAID, for example RAID 10) for data redundancy.
Virtualization
AEM runs well in virtualized environments, but there can be factors, such as CPU or I/O performance, that cannot be directly equated to physical hardware. In general, choose a higher I/O speed, as this is a critical factor in most cases. Benchmarking your environment is necessary to get a precise understanding of the resources required.
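Because I/O is the critical factor, a quick probe of sequential disk throughput can help compare a virtualized environment against physical hardware. The sketch below is a crude illustration only (the read pass may be served from the OS page cache); for real sizing, use a dedicated tool such as fio against a repository-like workload:

```python
import os
import time

def disk_throughput_mb_s(path="io_probe.bin", size_mb=256, block_kb=1024):
    """Sequential write then read of size_mb, returning (write, read) MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = size_mb * 1024 // block_kb

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach the disk, not just the buffer
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass  # caveat: may hit the page cache rather than the disk
    read_s = time.perf_counter() - start

    os.remove(path)
    return size_mb / write_s, size_mb / read_s

w, r = disk_throughput_mb_s()
print(f"write: {w:.0f} MB/s, read: {r:.0f} MB/s")
```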
Parallelization of AEM Instances
Fail Safeness
A fail-safe website is deployed on at least two separate systems. If one system breaks down, another can take over and compensate for the failure.
System resources scalability
While all systems are running, increased computational performance is available. The additional performance does not necessarily scale linearly with the number of cluster nodes, as the relationship depends heavily on the technical environment; see the Cluster documentation for more information.
The estimation of how many cluster nodes are necessary is based on the basic requirements and specific use-cases of the particular web project:
- From the perspective of fail-safeness, determine for each environment how critical a failure is and the required compensation time, based on how long it takes a cluster node to recover.
- For the aspect of scalability, the number of write operations is generally the most important factor; see Authors Working in Parallel for the author environment and Social Collaboration for the publish environment. Load balancing can be established for operations that access the system solely for read operations; see Dispatcher for details. A rough estimate combining both criteria is sketched after this list.
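As a rough illustration of combining the two criteria above, the following sketch picks the larger of the failover minimum and the write-throughput requirement. The function and parameter names are hypothetical, and the per-node write capacity must come from your own benchmarks:

```python
import math

def cluster_nodes(peak_writes_per_hour, writes_per_node_per_hour,
                  min_nodes_for_failover=2):
    """Pick the larger of: the failover minimum (at least two nodes, so one
    can take over) and the node count needed for the peak write load.
    Write throughput does not scale linearly across nodes, so
    writes_per_node_per_hour should be measured on the target cluster."""
    for_writes = math.ceil(peak_writes_per_hour / writes_per_node_per_hour)
    return max(min_nodes_for_failover, for_writes)

# Example: 5000 write transactions/hour at peak, 3000/hour measured per node.
print(cluster_nodes(5000, 3000))  # 2
```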
Author environment specific calculations
For benchmarking purposes, Adobe has developed some benchmark tests for standalone author instances.
- Benchmark test 1
  Calculate the maximum throughput of a load profile where users perform a simple create-page exercise on top of a base load of 300 existing pages, all of a similar nature. The steps involved were logging in to the site, creating a page with a SWF and image/text, adding a tag cloud, then activating the page.
  - Result
    Maximum throughput for a simple page creation exercise such as the above (considered as one transaction) was found to be 1730 transactions/hour.
- Benchmark test 2
  Calculate the maximum throughput when the load profile has a mix of fresh page creation (10%), modification of an existing page (80%), and creation then modification of a page in succession (10%). The complexity of the pages remains the same as in the profile of benchmark test 1. Basic modification of the page is done by adding an image and modifying the text content. Again, the exercise was performed on top of a base load of 300 pages of the same complexity as defined in benchmark test 1.
  - Result
    Maximum throughput for such a mixed-operation scenario was found to be 3252 transactions/hour.
The above two tests clearly highlight that the throughput varies according to the type of operation. Use the activities on your environment as a basis for sizing your system. You will get better throughput with less intensive actions such as modify (which are also more common). One way to combine such per-operation measurements is sketched below.
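To turn per-operation benchmark figures into a capacity estimate for a mixed workload, one option is a harmonic blend of throughputs. The sketch below is illustrative: the function name, the 20/80 mix, and the 4000 transactions/hour modify rate are assumptions, not benchmark results; only the 1730 transactions/hour create rate comes from benchmark test 1 above.

```python
def blended_throughput(throughput_by_op, mix_by_op):
    """If operation `op` alone sustains throughput_by_op[op] transactions/hour,
    a workload spending fraction mix_by_op[op] of its transactions on `op`
    sustains roughly 1 / sum(f_op / t_op) transactions/hour."""
    return 1.0 / sum(mix_by_op[op] / throughput_by_op[op] for op in mix_by_op)

# 1730 tx/hour for create is from benchmark test 1; the modify rate is a
# placeholder -- measure it on your own environment.
rates = {"create": 1730, "modify": 4000}
mix = {"create": 0.2, "modify": 0.8}
print(round(blended_throughput(rates, mix)))  # ~3168 transactions/hour
```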
Caching
In the author environment, caching efficiency is typically much lower, because changes to the website are more frequent and the content is highly interactive and personalized. Using the Dispatcher, you can cache AEM libraries, JavaScript, CSS files, and layout images. This speeds up some aspects of the authoring process. Configuring the web server to additionally set browser-caching headers on these resources reduces the number of HTTP requests and so improves the system responsiveness as experienced by the authors.
Authors Working in Parallel
In the author environment the number of authors that work in parallel and the load their interactions add to the system are the main limiting factors. Therefore we recommend that you scale your system based on the shared throughput of data.
For such scenarios, Adobe executed benchmark tests on a two-node, shared-nothing cluster of author instances.
- Benchmark test 1a
  With an active-active, shared-nothing cluster of two author instances, calculate the maximum throughput with a load profile where users perform a simple create-page exercise on top of a base load of 300 existing pages, all of a similar nature.
  - Result
    Maximum throughput for a simple page creation exercise, such as the above (considered as one transaction), is found to be 2016 transactions/hour. This is an increase of approximately 16% when compared to a standalone author instance for the same benchmark test.
- Benchmark test 2b
  With an active-active, shared-nothing cluster of two author instances, calculate the maximum throughput when the load profile has a mix of fresh page creation (10%), modification of an existing page (80%), and creation then modification of a page in succession (10%). The complexity of the pages remains the same as in the profile of benchmark test 1. Basic modification of the page is done by adding an image and modifying the text content. Again, the exercise was performed on top of a base load of 300 pages of the same complexity as defined in benchmark test 1.
  - Result
    Maximum throughput for such a mixed-operation scenario was found to be 6288 transactions/hour. This is an increase of approximately 93% when compared to a standalone author instance for the same benchmark test.
The above two tests clearly highlight that AEM scales well for authors performing basic edit operations. In general, AEM is most effective at scaling read operations.
On a typical website, most authoring happens during the project phase. After the website goes live, the number of authors working in parallel usually drops to a lower (operational-mode) average.
You can calculate the number of computers (or CPUs) required for the author environment as follows:
n = numberOfParallelAuthors / 30
This formula can serve as a general guideline for scaling CPUs when authors are performing basic operations with AEM. It assumes that the system and the application are optimized. However, the formula will not hold true for advanced features such as MSM or Assets (see the sections below).
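A minimal sketch of this guideline calculation follows; rounding up to whole CPUs is an assumption, not part of the formula itself:

```python
import math

def author_cpus(number_of_parallel_authors, authors_per_cpu=30):
    """n = numberOfParallelAuthors / 30, rounded up to whole CPUs.
    Applies to basic operations on an optimized system; not valid for
    advanced features such as MSM or Assets."""
    return math.ceil(number_of_parallel_authors / authors_per_cpu)

print(author_cpus(75))  # 3 CPUs for 75 authors working in parallel
```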
Please also see the additional comments on Parallelization and Performance Optimization.