Minimal MongoDB Deployment for AEM
Below is a minimal deployment for AEM on MongoDB. For simplicity, SSL termination and HTTP Proxy components have been generalized. It consists of a single MongoDB replica set, with one primary and two secondaries.
A minimal deployment requires three mongod instances configured as a replica set. One instance is elected primary and the others act as secondaries, with the election managed by mongod. Each instance has a local disk attached. For the cluster to support the load, a minimum throughput of 12 MB per second with more than 3000 I/O operations per second (IOPS) is recommended.
The AEM authors are connected to the mongod instances, with each AEM author connecting to all three mongod instances. Writes are sent to the primary and reads may be served from any of the instances. Traffic is distributed based on load by a Dispatcher to any one of the active AEM author instances. The Oak data store is a FileDataStore, and MongoDB monitoring is provided by MMS or MongoDB Ops Manager depending on the location of the deployment. Operating system level and log monitoring is provided by third-party solutions like Splunk or Ganglia.
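Because each AEM author connects to all three mongod instances, the connection string it uses typically lists every replica set member so that the driver can locate the current primary. The following is a minimal sketch of building such a string in the standard MongoDB URI format. The host names, replica set name, and database name are hypothetical placeholders, not values from this deployment.

```python
from urllib.parse import quote_plus


def build_replica_set_uri(hosts, replica_set, database, port=27017):
    """Return a MongoDB connection string that lists every replica set member."""
    if len(hosts) < 3:
        raise ValueError("a minimal deployment needs three mongod instances")
    members = ",".join(f"{host}:{port}" for host in hosts)
    return (
        f"mongodb://{members}/{quote_plus(database)}"
        f"?replicaSet={quote_plus(replica_set)}"
    )


# Hypothetical host, replica set, and database names for illustration only.
uri = build_replica_set_uri(
    ["mongodb1.example.com", "mongodb2.example.com", "mongodb3.example.com"],
    replica_set="aem-rs",
    database="aem-author",
)
print(uri)
```

Listing all members, rather than only the primary, means the connection can still be established after an election moves the primary role to another instance.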
In this deployment, all the components are required for a successful implementation. Any missing component leaves the implementation non-functional.
Operating Systems
For a list of supported operating systems for AEM 6, see the Technical Requirements page.
Environments
Virtualized environments are supported provided there is good communication between the different technical teams running the project. This support includes the team that is running AEM, the team owning the operating system, and the team managing the virtualized infrastructure.
There are specific requirements covering the I/O capacity of the MongoDB instances that must be managed by the team managing the virtualized environment. If the project uses a cloud deployment, such as Amazon Web Services, instances must be provisioned with sufficient I/O capacity and consistency to support the MongoDB instances. Otherwise, the MongoDB processes and the Oak repository perform unreliably and erratically.
In virtualized environments, MongoDB requires specific I/O and VM configurations to ensure that the storage engine of MongoDB is not crippled by VMware resource allocation policies. A successful implementation ensures that there are no barriers between the various teams and that all are committed to delivering the performance required.
Hardware Considerations
Storage
To achieve the read and write throughput for best performance without the need for premature horizontal scaling, MongoDB generally requires SSD storage or storage with performance equivalent to SSD.
RAM
MongoDB versions 2.6 and 3.0 that use the MMAP storage engine require that the working set of the database and its indexes fit into RAM.
Insufficient RAM results in a significant reduction of performance. The size of the working set and of the database is highly application-dependent. While some estimates can be made, the most reliable way of determining the amount of RAM required is building the AEM application and load testing it.
To assist with the load testing process, the following ratio of working set to total database size can be assumed:
- 1:10 for SSD Storage
- 1:3 for Hard Disk Storage
These ratios mean that for SSD deployments, 200 GB of RAM is required for a 2 TB database.
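The arithmetic behind these ratios can be sketched as follows. This is a rough starting point for load testing, not a sizing guarantee. The function and dictionary names are illustrative, and 2 TB is treated as 2000 GB, which is what the 200 GB figure above implies.

```python
# Working set to total database size, expressed as 1:N, from the ratios above.
WORKING_SET_DIVISOR = {"ssd": 10, "hdd": 3}


def estimate_ram_gb(database_size_gb, storage):
    """Estimate the RAM (in GB) needed to hold the working set."""
    return database_size_gb / WORKING_SET_DIVISOR[storage]


print(estimate_ram_gb(2000, "ssd"))            # 2 TB database on SSD
print(round(estimate_ram_gb(2000, "hdd"), 1))  # same database on hard disk
```

The same 2 TB database on hard disk storage would need roughly 667 GB of RAM under the 1:3 ratio, which illustrates why SSD storage is generally preferred.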
While the same limitations apply to the WiredTiger storage engine in MongoDB 3.0, the correlation between the working set, RAM, and page faults is not so strong. WiredTiger does not use memory mapping in the same way the MMAP storage engine does.
Data Store
Due to the MongoDB working set limitations, it is recommended that the data store is maintained independently of MongoDB. In most environments, a FileDataStore using a NAS available to all AEM instances should be used. Where Amazon Web Services is used, there is also an S3 DataStore. If, for any reason, the data store is maintained within MongoDB, the size of the data store should be added to the total database size, and the working set calculations adjusted appropriately. This sizing may mean provisioning more RAM to maintain performance without page faults.
Monitoring
Monitoring is vital for a successful implementation of the project. With sufficient knowledge, it is possible to run AEM on MongoDB without monitoring. However, that knowledge is normally found in engineers specialized for each section of the deployment.
This specialized knowledge typically involves an R&D engineer working on the Apache Oak Core and a MongoDB specialist.
Without monitoring at all levels, detailed knowledge of the code base is required to diagnose issues. With monitoring in place and suitable guidance on the major statistics, implementation teams can react appropriately to anomalies.
While it is possible to use command-line tools to get a quick snapshot of the operation of a cluster, doing that in real time over many hosts is almost impossible. Command-line tools rarely give historical information beyond a few minutes and never allow cross correlation between different types of metrics. A brief period of slow background mongod sync requires significant manual effort to correlate against I/O Wait or excessive write levels to a shared storage resource from an apparently unconnected virtual machine.
MongoDB Cloud Manager
MongoDB Cloud Manager is a free service offered by MongoDB that allows monitoring and management of MongoDB instances. It provides a view into the performance and health of the MongoDB cluster in real time. It manages both cloud and privately hosted instances provided the instance can reach the Cloud Manager monitoring server.
It requires an agent installed on the MongoDB instance that connects to the monitoring server. There are three levels of the agent:
- An automation agent that can fully automate everything on the MongoDB server
- A monitoring agent that can monitor the mongod instance
- A backup agent that can perform scheduled backups of the data
Although using Cloud Manager for maintenance automation of a MongoDB cluster makes many of the routine tasks easier, it is not required, and neither is using it for backup. Monitoring, however, is required when Cloud Manager is chosen as the monitoring solution.
For more information regarding MongoDB Cloud Manager, consult the MongoDB documentation.
MongoDB Ops Manager
MongoDB Ops Manager is the same software as MongoDB Cloud Manager. Once registered, Ops Manager can be downloaded and installed locally in a private data center or on any other laptop or desktop machine. It uses a local MongoDB database to store data and communicates in the same way as Cloud Manager with the managed servers. If you have security policies that prohibit a monitoring agent, MongoDB Ops Manager should be used.
Operating System Monitoring
Operating system level monitoring is required to run an AEM MongoDB cluster.
Ganglia is a good example of such a system. It gives a picture of the range and detail of information required, which goes beyond basic health metrics like CPU, load average, and free disk space. To diagnose issues, lower-level information such as entropy pool levels, CPU I/O Wait, and sockets in FIN_WAIT2 state is required.
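As an illustration of one of these lower-level metrics, the sketch below derives the CPU I/O Wait share from two samples of the aggregate "cpu" line in /proc/stat. It assumes a Linux host, and the function name and sample values are made up for the example. A monitoring system such as Ganglia collects this kind of data continuously, so the snippet only shows what the metric means, not how to replace such a system.

```python
def iowait_fraction(before, after):
    """Fraction of CPU time spent in I/O wait between two samples of the
    aggregate "cpu" line from /proc/stat (Linux only)."""

    def fields(line):
        parts = line.split()
        if parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(value) for value in parts[1:]]

    first, second = fields(before), fields(after)
    deltas = [b - a for a, b in zip(first, second)]
    # Column order: user nice system idle iowait irq softirq steal ...
    # Only the first eight columns are summed; the trailing guest columns
    # are skipped on the assumption that they are already counted in
    # user and nice.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0


# Made-up samples taken a short interval apart.
sample_before = "cpu 1000 0 500 8000 200 0 50 0 0 0"
sample_after = "cpu 1100 0 550 8700 350 0 50 0 0 0"
print(iowait_fraction(sample_before, sample_after))
```

On a live host, the two samples would come from reading the first line of /proc/stat twice, a few seconds apart.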
Log Aggregation
With a cluster of multiple servers, central log aggregation is a requirement for a production system. Software like Splunk supports log aggregation and allows teams to analyze the patterns of behavior of the application without having to manually collect the logs.
Checklists
This section deals with various steps that you should take to ensure that your AEM and MongoDB deployments are properly set up before implementing your project.
Network
- All hosts have a DNS entry
- All hosts are resolvable by their DNS entry from all other routable hosts
- All MongoDB hosts are routable from all other MongoDB hosts in the same cluster
- MongoDB hosts can route packets to MongoDB Cloud Manager and the other monitoring servers
- AEM Servers can route packets to all MongoDB servers
- Packet latency between any AEM server and any MongoDB server is smaller than two milliseconds, with no packet loss and a standard deviation of one millisecond or less
- There are no more than two hops between an AEM server and a MongoDB server
- There are no more than two hops between two MongoDB servers
- There are no routers higher than OSI Level 3 between any core servers (MongoDB or AEM or any combination).
- If VLAN trunking or any form of network tunneling is used, it must comply with the packet latency checks.
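The packet latency item in this checklist can be evaluated from a set of round-trip measurements, for example those collected with ping between an AEM server and a MongoDB server. The sketch below is one possible reading of that item: it applies the two millisecond limit to the mean latency and interprets the spread requirement as a population standard deviation of one millisecond or less. Both interpretations, and the function name, are assumptions.

```python
import statistics


def latency_ok(rtts_ms, max_mean_ms=2.0, max_stdev_ms=1.0):
    """Check round-trip samples in milliseconds; None marks a lost packet."""
    if not rtts_ms:
        raise ValueError("at least one sample is required")
    if any(rtt is None for rtt in rtts_ms):
        return False  # the checklist allows no packet loss
    if statistics.mean(rtts_ms) >= max_mean_ms:
        return False  # latency must be smaller than two milliseconds
    return statistics.pstdev(rtts_ms) <= max_stdev_ms


print(latency_ok([0.4, 0.5, 0.6, 0.5]))  # steady, low latency
print(latency_ok([0.2, 0.2, 0.2, 4.0]))  # low mean but one large spike
print(latency_ok([0.4, None, 0.5]))      # one lost packet
```

The second call shows why the spread matters: the mean stays under two milliseconds, but the single spike pushes the standard deviation above one millisecond, so the check fails.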