Minimal MongoDB Deployment for AEM

Below is a minimal deployment for AEM on MongoDB. For simplicity, SSL termination and HTTP Proxy components have been generalized. It consists of a single MongoDB replica set, with one primary and two secondaries.

[Figure: minimal AEM deployment on a single MongoDB replica set with one primary and two secondaries]

A minimal deployment requires three mongod instances configured as a replica set. One instance is elected primary and the others act as secondaries; the election is managed by mongod. Each instance has a local disk attached. For the cluster to support the load, a minimum throughput of 12 MB per second with more than 3000 I/O operations per second (IOPS) is recommended.
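The three-member layout described above can be written down as a replica set configuration document, the structure that is passed to rs.initiate() in the MongoDB shell. The following Python sketch only builds and prints such a document. The replica set name aem-rs and the hostnames are hypothetical placeholders, not values taken from this deployment.

```python
import json


def replica_set_config(name, hosts):
    """Build a replica set configuration document for the given hosts.

    No member is given a special priority, so mongod decides which
    instance becomes primary.
    """
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


# Hypothetical hostnames; replace them with the real mongod hosts.
hosts = [
    "mongodb1.example.com:27017",
    "mongodb2.example.com:27017",
    "mongodb3.example.com:27017",
]

config = replica_set_config("aem-rs", hosts)
print(json.dumps(config, indent=2))
```

The printed JSON can be reviewed before it is applied, which helps confirm that exactly three members are listed.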

Each AEM author instance connects to all three mongod instances. Writes are sent to the primary, and reads can be served by any of the instances. A Dispatcher distributes traffic, based on load, to any one of the active AEM author instances. The Oak data store is a FileDataStore, and MongoDB monitoring is provided by MMS or MongoDB Ops Manager, depending on the location of the deployment. Operating system level and log monitoring is provided by third-party solutions like Splunk or Ganglia.
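Because every author connects to all three mongod instances, the connection string normally lists every member of the replica set. The following Python sketch assembles a standard MongoDB connection URI. The hostnames, the database name aem-author, the replica set name aem-rs, and the choice of secondaryPreferred as the read preference are all assumptions made for illustration.

```python
def mongo_uri(hosts, database, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string that lists every replica set member."""
    seed_list = ",".join(hosts)
    return (
        f"mongodb://{seed_list}/{database}"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )


# Hypothetical hostnames; replace them with the real mongod hosts.
hosts = [
    "mongodb1.example.com:27017",
    "mongodb2.example.com:27017",
    "mongodb3.example.com:27017",
]

uri = mongo_uri(hosts, "aem-author", "aem-rs")
print(uri)
```

The read preference affects only reads. Writes go to the primary regardless of this setting.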

In this deployment, all the components are required for a successful implementation. Any missing component leaves the implementation non-functional.

Operating Systems

For a list of supported operating systems for AEM 6, see the Technical Requirements page.

Environments

Virtualized environments are supported provided there is good communication between the different technical teams running the project. These teams include the team running AEM, the team owning the operating system, and the team managing the virtualized infrastructure.

There are specific requirements covering the I/O capacity of the MongoDB instances that must be managed by the team managing the virtualized environment. If the project uses a cloud deployment, such as Amazon Web Services, instances must be provisioned with sufficient I/O capacity and consistency to support the MongoDB instances. Otherwise, the MongoDB processes and the Oak repository perform unreliably and erratically.

In virtualized environments, MongoDB requires specific I/O and VM configurations to ensure that the storage engine of MongoDB is not crippled by VMware resource allocation policies. A successful implementation ensures that there are no barriers between the various teams and that all of them are committed to delivering the required performance.

Hardware Considerations

Storage

To achieve the read and write throughput needed for best performance without premature horizontal scaling, MongoDB generally requires SSD storage or storage with performance equivalent to SSD.

RAM

MongoDB versions 2.6 and 3.0 that use the MMAP storage engine require that the working set of the database and its indexes fit into RAM.

Insufficient RAM results in a significant reduction of performance. The size of the working set and of the database is highly application-dependent. While some estimates can be made, the most reliable way of determining the amount of RAM required is building the AEM application and load testing it.

To assist with the load testing process, the following ratio of working set to total database size can be assumed:

  • 1:10 for SSD Storage
  • 1:3 for Hard Disk Storage

These ratios mean that for SSD deployments, 200 GB of RAM is required for a 2 TB database.
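The arithmetic behind this figure can be reproduced with a small helper. The following Python sketch applies the two ratios listed above as a starting point for load testing. The function name and the storage labels are illustrative, and sizes are in decimal gigabytes (2 TB = 2000 GB).

```python
# Working set to database size ratios from the list above, stored as the
# denominator of "1:N".
RATIO_DENOMINATOR = {"ssd": 10, "hdd": 3}


def estimated_ram_gb(database_size_gb, storage):
    """Estimate the RAM needed to hold the working set, in GB."""
    return database_size_gb / RATIO_DENOMINATOR[storage]


print(estimated_ram_gb(2000, "ssd"))            # 200.0
print(round(estimated_ram_gb(2000, "hdd"), 1))  # 666.7
```

These are only initial estimates. As noted above, load testing the actual AEM application remains the most reliable way to size RAM.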

While the same limitations apply to the WiredTiger storage engine in MongoDB 3.0, the correlation between the working set, RAM, and page faults is not as strong, because WiredTiger does not use memory mapping in the same way the MMAP storage engine does.

NOTE
Adobe recommends using the WiredTiger storage engine for AEM 6.1 deployments that are using MongoDB 3.0.

Data Store

Due to the MongoDB working set limitations, it is recommended that the data store be maintained independently of MongoDB. In most environments, a FileDataStore on a NAS available to all AEM instances should be used. Where Amazon Web Services is used, an S3 DataStore is also available. If, for any reason, the data store is maintained within MongoDB, the size of the data store should be added to the total database size and the working set calculations adjusted accordingly. This sizing may mean provisioning more RAM to maintain performance without page faults.
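To show how much an embedded data store can change the sizing, the following Python sketch compares the two cases using the 1:10 SSD ratio from the RAM section. The repository and data store sizes are hypothetical numbers chosen for illustration only.

```python
SSD_RATIO_DENOMINATOR = 10  # 1:10 working set to database size on SSD


def ram_for_ssd_gb(repository_gb, datastore_in_mongodb_gb=0):
    """Estimate RAM in GB; a data store kept in MongoDB counts toward the total."""
    total_gb = repository_gb + datastore_in_mongodb_gb
    return total_gb / SSD_RATIO_DENOMINATOR


# Hypothetical sizes: a 500 GB repository and 1500 GB of binaries.
external = ram_for_ssd_gb(500)
embedded = ram_for_ssd_gb(500, datastore_in_mongodb_gb=1500)
print(external, embedded)  # 50.0 200.0
```

With these example numbers, keeping the binaries in MongoDB quadruples the estimated RAM requirement.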

Monitoring

Monitoring is vital for a successful implementation of the project. With sufficient knowledge, it is possible to run AEM on MongoDB without monitoring. However, that knowledge is normally found only in engineers who specialize in each part of the deployment.

This specialized knowledge typically involves an R&D engineer working on the Apache Oak Core and a MongoDB specialist.

Without monitoring at all levels, detailed knowledge of the code base is required to diagnose issues. With monitoring in place and suitable guidance on the major statistics, implementation teams can react appropriately to anomalies.

While it is possible to use command-line tools to get a quick snapshot of the operation of a cluster, doing that in real time over many hosts is almost impossible. Command-line tools rarely give historical information beyond a few minutes and never allow cross-correlation between different types of metrics. For example, correlating a brief period of slow background mongod sync with I/O wait or with excessive write levels to a shared storage resource from an apparently unconnected virtual machine requires significant manual effort.
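The kind of cross-correlation that a monitoring system makes routine can be sketched in a few lines once both metrics are available as time series. The following Python example computes the Pearson correlation between two series. The metric names and the sample values are made up for illustration and do not come from a real deployment.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up per-minute samples, for illustration only.
replication_lag_s = [1, 1, 2, 8, 9, 2, 1]
io_wait_pct = [5, 6, 8, 40, 45, 9, 5]

r = pearson(replication_lag_s, io_wait_pct)
print(f"correlation: {r:.2f}")
```

A value close to 1 suggests that the two metrics rise and fall together, which is a prompt to investigate the shared storage. It does not prove a cause by itself.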