From the Experience Manager Assets standpoint, monitoring should include observing and reporting on the following processes and technologies:
You can monitor Experience Manager Assets in two ways: live monitoring and long-term monitoring.
Perform live monitoring during the performance testing phase of your development or during high-load situations to understand the performance characteristics of your environment. Live monitoring typically relies on a suite of tools. Here are some recommendations:
Visual VM: Visual VM lets you view detailed Java VM information, including CPU usage and Java memory usage. It also lets you sample and evaluate code that runs on a deployment.
Top: Top is a Linux command that opens a dashboard displaying usage statistics, including CPU, memory, and IO usage. It provides a high-level overview of what is happening on an instance.
Htop: Htop is an interactive process viewer. It provides detailed CPU and memory usage in addition to what Top can provide. Htop can be installed on most Linux systems using yum install htop or apt-get install htop.
Iotop: Iotop is a detailed dashboard for disk IO usage. It displays bars and meters that depict the processes that use disk IO and the amount they use. Iotop can be installed on most Linux systems using yum install iotop or apt-get install iotop.
Iftop: Iftop displays detailed information about ethernet/network usage, including per-communication-channel statistics on the entities using ethernet and the amount of bandwidth they use. Iftop can be installed on most Linux systems using yum install iftop or apt-get install iftop.
Java Flight Recorder (JFR): A commercial tool from Oracle that you can use freely in non-production environments. For more details, see How to Use Java Flight Recorder to Diagnose CQ Runtime Problems.
error.log file: You can investigate the Experience Manager error.log file for details of errors logged in the system. Use the command tail -F quickstart/logs/error.log to identify errors to investigate.
Workflow console: Leverage the workflow console to monitor workflows that lag behind or get stuck.
Typically, you use these tools together to obtain a comprehensive view of the performance of your Experience Manager deployment.
These tools are standard tools and not directly supported by Adobe. They don’t require additional licenses.
Figure: Live monitoring using Visual VM tool.
Long-term monitoring of an Experience Manager deployment involves monitoring the same areas that you monitor live, but over a longer duration. It also includes defining alerts specific to your environment.
There are several tools available to aggregate logs, for example Splunk™ and Elasticsearch, Logstash, and Kibana (ELK). To evaluate the uptime of your Experience Manager deployment, you need to understand the log events specific to your system and create alerts based on them. A good knowledge of your development and operations practices helps you tune your log aggregation process to generate critical alerts.
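The following is a minimal Java sketch of alerting on log events. It assumes you already receive log lines in batches, for example one minute of error.log at a time. The class LogEventAlerter, its method names, the sample patterns, and the threshold are hypothetical. A log aggregation tool such as Splunk or ELK normally provides this matching and alerting for you.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: counts log lines matching named patterns and flags events over a threshold. */
final class LogEventAlerter {
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();
    private final int threshold;

    LogEventAlerter(int threshold) {
        this.threshold = threshold;
    }

    /** Registers a named event, for example "error" for lines that contain *ERROR*. */
    LogEventAlerter watch(String name, String regex) {
        patterns.put(name, Pattern.compile(regex));
        return this;
    }

    /** Counts how many of the given lines match each registered pattern. */
    Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : patterns.keySet()) {
            counts.put(name, 0);
        }
        for (String line : lines) {
            for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                if (e.getValue().matcher(line).find()) {
                    counts.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Returns the names of events whose count in this batch reaches the threshold. */
    List<String> alerts(List<String> lines) {
        List<String> firing = new ArrayList<>();
        count(lines).forEach((name, n) -> {
            if (n >= threshold) {
                firing.add(name);
            }
        });
        return firing;
    }
}
```

For example, new LogEventAlerter(10).watch("error", "\\*ERROR\\*") flags any batch that contains ten or more ERROR lines. Keeping all patterns in one registry makes it easy to add events as you learn which ones matter for your system.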
Environment monitoring includes monitoring the following:
You need external tools, such as New Relic™ and AppDynamics™, to monitor each item. Using these tools, you can define alerts specific to your system, for example high system utilization, workflow backlogs, health check failures, or unauthenticated access to your website. Adobe does not recommend any particular tool over others. Find the tool that works for you, and use it to monitor the items discussed.
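The following Java sketch illustrates what a "high system utilization" alert evaluates. It compares the one-minute system load average reported by the JVM's OperatingSystemMXBean against a per-processor threshold. The class name and any threshold you pass in are hypothetical, and a dedicated monitoring tool would normally evaluate this for you.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical high-utilization check based on the JVM's view of the operating system. */
final class SystemLoadCheck {
    /**
     * Returns true when the load average per available processor exceeds the threshold.
     * A negative load average means the platform does not report one (for example, on Windows),
     * in which case no alert is raised.
     */
    static boolean isOverloaded(double loadAverage, int processors, double perCpuThreshold) {
        if (loadAverage < 0 || processors <= 0) {
            return false;
        }
        return loadAverage / processors > perCpuThreshold;
    }

    /** Reads the current one-minute load average and processor count from the platform MXBean. */
    static boolean isOverloadedNow(double perCpuThreshold) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return isOverloaded(os.getSystemLoadAverage(), os.getAvailableProcessors(), perCpuThreshold);
    }
}
```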
Internal application monitoring includes monitoring the application components that make up the Experience Manager stack, including the JVM and the content repository, as well as monitoring through custom application code built on the platform. In general, it is performed through JMX MBeans that many popular monitoring solutions, such as SolarWinds™, HP OpenView™, Hyperic™, and Zabbix™, can monitor directly. For systems that do not support a direct connection to JMX, you can write shell scripts that extract the JMX data and expose it to these systems in a format that they natively understand.
Remote access to JMX MBeans is not enabled by default. For more information on monitoring through JMX, see Monitoring and Management Using JMX Technology.
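The following Java sketch shows one way to bridge JMX data to a system that cannot connect to JMX directly. It reads an MBean attribute and prints it as key=value lines that most tools can ingest. It is written in Java rather than shell for consistency with the other examples.

The class name JmxExporter, the output format, and the host and port passed to connect are assumptions. Remote access must first be enabled on the Experience Manager JVM, for example through the com.sun.management.jmxremote.* system properties described in the Oracle JMX guide.

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical bridge that reads JMX attributes and renders them as key=value lines. */
final class JmxExporter {
    /** Opens a remote JMX connection over RMI; host and port are placeholders. Close it when done. */
    static JMXConnector connect(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        return JMXConnectorFactory.connect(url);
    }

    /**
     * Reads one attribute and renders it as "objectName.attribute=value".
     * CompositeData values (such as HeapMemoryUsage) are flattened into one line per key.
     */
    static String export(MBeanServerConnection conn, String objectName, String attribute)
            throws IOException, JMException {
        Object value = conn.getAttribute(new ObjectName(objectName), attribute);
        String prefix = objectName + "." + attribute;
        if (value instanceof CompositeData) {
            CompositeData data = (CompositeData) value;
            StringBuilder out = new StringBuilder();
            for (String key : data.getCompositeType().keySet()) {
                out.append(prefix).append('.').append(key).append('=').append(data.get(key)).append('\n');
            }
            return out.toString();
        }
        return prefix + "=" + value + "\n";
    }
}
```

For example, export(connector.getMBeanServerConnection(), "java.lang:type=Memory", "HeapMemoryUsage") returns one line each for the committed, init, max, and used heap values. The same method also works against the in-process platform MBean server.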
In many cases, you need a baseline to monitor a statistic effectively. To create a baseline, observe the system under normal working conditions for a predetermined period and then identify the normal value of the metric.
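The following Java class is a minimal sketch of the baseline idea. It assumes you have already collected samples of a metric under normal working conditions. It uses the mean of those samples as the baseline and checks a new value against a multiple of it, such as the 150% threshold used for replication queues later in this article. Using the mean rather than, for example, a percentile is an assumption; metrics with spiky traffic may need a different statistic. The Baseline class is hypothetical.

```java
import java.util.List;

/** Hypothetical baseline built from samples taken under normal working conditions. */
final class Baseline {
    private final double mean;
    private final double stdDev;

    private Baseline(double mean, double stdDev) {
        this.mean = mean;
        this.stdDev = stdDev;
    }

    /** Computes the mean and population standard deviation of the samples. */
    static Baseline of(List<Double> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        double mean = sum / samples.size();
        double squares = 0;
        for (double s : samples) {
            squares += (s - mean) * (s - mean);
        }
        return new Baseline(mean, Math.sqrt(squares / samples.size()));
    }

    double mean() {
        return mean;
    }

    double stdDev() {
        return stdDev;
    }

    /** True when the value exceeds the given multiple of the baseline mean (1.5 means 150%). */
    boolean exceeds(double value, double multiple) {
        return value > mean * multiple;
    }
}
```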
As with any Java-based application stack, Experience Manager depends on the resources that are provided to it through the underlying Java Virtual Machine. You can monitor the status of many of these resources through Platform MXBeans that the JVM exposes. For more information on MXBeans, see Using the Platform MBean Server and Platform MXBeans.
Here are some baseline parameters that you can monitor for the JVM:
The information provided by this bean is expressed in bytes.
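JVM parameters like these are available in-process through java.lang.management.ManagementFactory. The following Java sketch collects a few of them into a map that you could forward to a monitoring system:

- heap usage in bytes
- live and peak thread counts
- deadlocked threads
- cumulative garbage collection counts and time

The JvmSnapshot class and the metric names are hypothetical, and the selection of parameters is only illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical point-in-time snapshot of a few JVM metrics from the platform MXBeans. */
final class JvmSnapshot {
    static Map<String, Long> take() {
        Map<String, Long> metrics = new LinkedHashMap<>();

        // MemoryUsage values are expressed in bytes; getMax() returns -1 when the maximum is undefined.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heap.used.bytes", heap.getUsed());
        metrics.put("heap.committed.bytes", heap.getCommitted());
        metrics.put("heap.max.bytes", heap.getMax());

        // Thread counts, plus the number of threads currently deadlocked (null means none).
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        metrics.put("threads.live", (long) threads.getThreadCount());
        metrics.put("threads.peak", (long) threads.getPeakThreadCount());
        long[] deadlocked = threads.findDeadlockedThreads();
        metrics.put("threads.deadlocked", deadlocked == null ? 0L : (long) deadlocked.length);

        // Cumulative GC count and time (ms) across all collectors; -1 means undefined and is ignored.
        long gcCount = 0;
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMs += Math.max(0, gc.getCollectionTime());
        }
        metrics.put("gc.count", gcCount);
        metrics.put("gc.time.ms", gcTimeMs);
        return metrics;
    }
}
```

Sampling this map at a fixed interval under normal load is one way to gather the data for the baselines described earlier.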
Monitor Experience Manager
Experience Manager also exposes a set of statistics and operations through JMX. These can help you assess system health and identify potential problems before they impact users. For more information, see the documentation on Experience Manager JMX MBeans.
Here are some baseline parameters that you can monitor for Experience Manager:
Instances: One Author and all publish instances (for flush agents)
Alarm threshold: When the value of QueueBlocked is true or the value of QueueNumEntries is greater than 150% of the baseline.
Alarm definition: A blocked queue in the system indicates that the replication target is down or unreachable. Often, network or infrastructure issues cause excessive entries to be queued, which can adversely impact system performance.
For the MBean and URL parameters, replace <AGENT_NAME> with the name of the replication agent you want to monitor.
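The alarm rule above can be expressed as a small Java check. This is a sketch under assumptions:

- The MBean name pattern com.adobe.granite.replication:type=agent,id="<AGENT_NAME>" may not match your instance.
- The attribute types (a Boolean QueueBlocked and a numeric QueueNumEntries) are assumed.

Verify both in the JMX console of your own instance. The ReplicationQueueAlarm class is hypothetical.

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical check for the replication-queue alarm: blocked queue or more than 150% of baseline entries. */
final class ReplicationQueueAlarm {
    /** Assumed MBean name for a replication agent; ObjectName.quote escapes the agent name safely. */
    static String objectName(String agentName) {
        return "com.adobe.granite.replication:type=agent,id=" + ObjectName.quote(agentName);
    }

    /** Pure alarm rule, kept separate from JMX access so it is easy to test. */
    static boolean shouldAlarm(boolean queueBlocked, long queueNumEntries, double baselineEntries) {
        return queueBlocked || queueNumEntries > 1.5 * baselineEntries;
    }

    /** Reads QueueBlocked and QueueNumEntries over JMX (assumed attribute types) and applies the rule. */
    static boolean check(MBeanServerConnection conn, String agentName, double baselineEntries)
            throws IOException, JMException {
        ObjectName name = new ObjectName(objectName(agentName));
        boolean blocked = Boolean.TRUE.equals(conn.getAttribute(name, "QueueBlocked"));
        long entries = ((Number) conn.getAttribute(name, "QueueNumEntries")).longValue();
        return shouldAlarm(blocked, entries, baselineEntries);
    }
}
```

Keeping the rule in shouldAlarm separate from the JMX calls lets you reuse it with values collected by any other monitoring tool.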
Health checks that are available in the operations dashboard have corresponding JMX MBeans for monitoring. You can also write custom health checks to expose additional system statistics.
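If your monitoring tool cannot browse MBeans itself, a small poller can read health check results over JMX. The Java sketch below queries every MBean that matches a pattern and returns the string value of one attribute per MBean. The default pattern org.apache.sling.healthcheck:type=HealthCheck,* and the attribute name ok are assumptions about how Sling health checks are registered. Confirm both in the JMX console of your instance. HealthCheckPoller is a hypothetical class.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical poller that reads one attribute from every MBean matching a pattern. */
final class HealthCheckPoller {
    /** Assumed pattern for Sling health check MBeans; verify it on your instance. */
    static final String DEFAULT_PATTERN = "org.apache.sling.healthcheck:type=HealthCheck,*";

    /** Returns each matching MBean name mapped to the string form of the requested attribute. */
    static Map<String, String> poll(MBeanServerConnection conn, String pattern, String attribute)
            throws IOException, JMException {
        Map<String, String> results = new TreeMap<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            results.put(name.toString(), String.valueOf(conn.getAttribute(name, attribute)));
        }
        return results;
    }
}
```

For example, poll(conn, HealthCheckPoller.DEFAULT_PATTERN, "ok") would map each health check MBean to "true" or "false" under those assumptions, which you can then feed into your alerting.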
Here are some out-of-the-box health checks that are helpful to monitor:
If you encounter issues while monitoring, here are some troubleshooting tasks that you can perform to resolve common issues with Experience Manager deployments:
If using TarMK, run Tar compaction often. For more details, see Maintain the repository.
Check the logs for OutOfMemoryError entries. For more information, see Analyze Memory Problems.
Check the logs for any references to unindexed queries, tree traversals, or index traversals. These indicate unindexed or inadequately indexed queries. For best practices on optimizing query and indexing performance, see Best practices for queries and indexing.
Use the workflow console to verify that your workflows perform as expected. If possible, condense multiple workflows into a single workflow.
Revisit live monitoring, and look for additional bottlenecks or heavy consumers of specific resources.
Investigate the egress points from the client network and the ingress points to the Experience Manager deployment network, including the dispatcher. These areas are frequently bottlenecks. For more information, see Assets network considerations.
Up-size your Experience Manager server. Your Experience Manager deployment may be inadequately sized. Adobe Customer Support can help you identify whether your server is undersized.
Check error.log files for entries around the time something went wrong. Look for patterns that can indicate custom code anomalies, and add them to the list of events you monitor.
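Two of the log checks above, finding traversal warnings and pulling the entries around the time of a failure, are easy to script. The Java sketch below rests on two assumptions:

- Each error.log entry starts with a timestamp such as 15.03.2024 10:15:30.123.
- Oak traversal warnings contain text like "Traversed 10000 nodes" or "consider creating an index".

Verify both against the logs of your version. The IncidentLogScanner class and its method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helpers for scanning error.log lines after an incident. */
final class IncidentLogScanner {
    // Assumed timestamp at the start of each entry, for example "15.03.2024 10:15:30.123".
    static final DateTimeFormatter TIMESTAMP = DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss.SSS");
    private static final int TIMESTAMP_LENGTH = 23;

    // Assumed wording of traversal warnings; adjust to the messages your version actually logs.
    static final Pattern TRAVERSAL =
            Pattern.compile("Traversed \\d+ nodes|consider creating an index", Pattern.CASE_INSENSITIVE);

    /** Returns the entries whose timestamp lies within plus or minus the window around the incident. */
    static List<String> around(List<String> lines, LocalDateTime incident, Duration window) {
        LocalDateTime from = incident.minus(window);
        LocalDateTime to = incident.plus(window);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.length() < TIMESTAMP_LENGTH) {
                continue; // too short to carry a timestamp
            }
            try {
                LocalDateTime at = LocalDateTime.parse(line.substring(0, TIMESTAMP_LENGTH), TIMESTAMP);
                if (!at.isBefore(from) && !at.isAfter(to)) {
                    hits.add(line);
                }
            } catch (DateTimeParseException e) {
                // Lines without a leading timestamp (for example stack trace frames) are skipped.
            }
        }
        return hits;
    }

    /** Returns the lines that look like traversal or missing-index warnings. */
    static List<String> traversalWarnings(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (TRAVERSAL.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

To use it, read the file with java.nio.file.Files.readAllLines(Path.of("quickstart/logs/error.log")). Passing the result to around(lines, incidentTime, Duration.ofMinutes(5)) gives a quick view of what was logged in the five minutes before and after an incident. Continuation lines such as stack trace frames are skipped, so follow up in the full log for details.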