Each update to the repository creates a content revision. As a result, with each update, the size of the repository grows. Old revisions must be cleaned up to free disk resources - this is important to avoid uncontrolled repository growth. This maintenance functionality is called Revision Cleanup. It has been available as an offline routine since Adobe Experience Manager (AEM) 6.0.
With AEM 6.3 and higher, an online version of this functionality called Online Revision Cleanup was introduced. Compared to Offline Revision Cleanup where the AEM instance has to be shut down, Online Revision Cleanup can be run while the AEM instance is online. Online Revision Cleanup is turned on by default and it is the recommended way of performing a revision cleanup.
Note: See the Video for an introduction and how to use Online Revision Cleanup.
The revision cleanup process consists of three phases: estimation, compaction, and cleanup. Estimation determines whether to run the next phase (compaction) based on how much garbage might be collected. During the compaction phase, segments and tar files are rewritten, leaving out any unused content. The cleanup phase then removes the old segments, including any garbage that they may contain. The offline mode can usually reclaim more space because the online mode must account for AEM’s working set, which prevents additional segments from being collected.
For more details regarding Revision Cleanup, see the official Oak documentation.
Online Revision Cleanup is the recommended way of performing revision cleanup. Offline Revision Cleanup should be used only in exceptional cases, for example, before migrating to the new storage format or if Adobe Customer Care asks you to run it.
Online Revision Cleanup is configured by default to automatically run once a day on both AEM Author and Publish instances. All you need to do is define the maintenance window during a period with the least user activity. You can configure the Online Revision Cleanup task as follows:
1. In the main AEM window, go to Tools - Operations - Dashboard - Maintenance or point your browser to: https://serveraddress:serverport/libs/granite/operations/content/maintenance.html
2. Hover over Daily Maintenance Window and click the Settings icon.
3. Enter the desired values (recurrence, start time, end time) and click Save.
Alternatively, if you want to run the revision cleanup task manually, you can:
1. Go to Tools - Operations - Dashboard - Maintenance or browse directly to https://serveraddress:serverport/libs/granite/operations/content/maintenance.html
2. Click the Daily Maintenance Window.
3. Hover over the Revision Cleanup icon.
4. Click Run.
The revision cleanup process reclaims old revisions by generations. This means that each time you run revision cleanup, a new generation is created and kept on the disk. There is a difference, however, between the two types of revision cleanup: offline revision cleanup keeps one generation while online revision cleanup keeps two generations. So, when you run online revision cleanup after offline revision cleanup, the first online run adds a second generation without reclaiming the one that offline cleanup left behind, and the repository grows. Subsequent runs reclaim the oldest generation as each new one is created.
Also, keep in mind that depending on the type and number of commits, each generation can vary in size compared to the previous one, so the final size can vary from one run to the other.
Due to this fact, it is recommended to size the disk at least two or three times larger than the initially estimated repository size.
AEM 6.5 introduces two new modes for the compaction phase of the Online Revision Cleanup process: full compaction and tail compaction.
These compaction modes constitute a trade-off between efficiency and resource consumption: while tail compaction is less effective it also has less impact on normal system operation. In contrast, full compaction is more effective but has a bigger impact on normal system operation.
AEM 6.5 also introduces a more efficient content deduplication mechanism during compaction, which further reduces the on-disk footprint of the repository.
The two charts below present results from internal laboratory testing that illustrate the reduction in average execution time and average on-disk footprint in AEM 6.5 compared to AEM 6.3:
The default configuration runs tail compaction on weekdays and full compaction on Sundays. The default configuration can be changed by using the new configuration value full.gc.days of the RevisionCleanupTask maintenance task.
When you configure the full.gc.days value, full compaction runs on the days defined in the value and tail compaction runs on the remaining days. For example, if you configure full compaction to run on Sunday, tail compaction runs Monday to Saturday. If you configure full compaction to run every day of the week, tail compaction does not run at all.
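As a rough illustration of the rule above, the following shell sketch decides which compaction mode would run on a given day. The function name compaction_mode_for_day and the space-separated list of day names are hypothetical and used only for illustration; they do not reflect the actual syntax of the full.gc.days value, which should be taken from the RevisionCleanupTask configuration itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of AEM): prints "full" if the given day is in
# the list of full-compaction days, otherwise prints "tail".
# Usage: compaction_mode_for_day "<day>" "<space-separated full-compaction days>"
compaction_mode_for_day() {
    day=$1
    full_days=$2
    for d in $full_days; do
        if [ "$d" = "$day" ]; then
            echo "full"
            return 0
        fi
    done
    echo "tail"
}

# Default described above: full compaction on Sundays only.
compaction_mode_for_day "Sunday" "Sunday"      # prints: full
compaction_mode_for_day "Wednesday" "Sunday"   # prints: tail
```

Listing all seven days in the second argument makes every call print "full", which mirrors the case where tail compaction does not run at all.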
When using the new compaction modes, keep in mind the following:
RevisionCleanupTaskHealthCheck indicates the overall health status of the Online Revision Cleanup. It works the same way as in AEM 6.3 and does not distinguish between full and tail compaction.
The log messages indicate which compaction mode is running, for example:
TarMK GC: running tail compaction
TarMK GC: no base state available, running full compaction instead
Sometimes, alternating between the tail and full compaction modes delays the cleanup process. More precisely, the repository will grow after a full compaction (it doubles in size). The extra space is reclaimed in the subsequent tail compaction, when the repository drops below the pre-full compaction size. Parallel maintenance task executions should also be avoided.
It is recommended to size the disk at least two or three times larger than the initially estimated repository size.
Questions | Answers |
What should I be aware of when I upgrade to AEM 6.5? | The persistence format of TarMK changes with AEM 6.5. These changes do not require a proactive migration step. Existing repositories go through a rolling migration, which is transparent to the user. The migration process is initiated the first time AEM 6.5 (or related tools) access the repository. Once the migration to the AEM 6.5 persistence format has been initiated, the repository cannot be reverted to the previous AEM 6.3 persistence format. |
Questions | Answers | |
Why do I need to migrate the repository? | In AEM 6.3 changes to the storage format were needed, especially for improving the performance and efficacy of Online Revision Cleanup. These changes are not backwards compatible, and repositories created with the old Oak Segment (AEM 6.2 and previous) must be migrated. Additional benefits of changing the storage format:
|
|
Is the previous Tar format still supported? | Only the new Oak Segment Tar is supported with AEM 6.3 or higher. | |
Is the content migration always mandatory? | Yes. Unless you start with a fresh instance, you will always have to migrate the content. | |
Can I upgrade to 6.3 or higher and do the migration later (for example, using another maintenance window)? | No, as explained above, the content migration is mandatory. | |
Can downtime be avoided when migrating? | No. This is a one time effort that cannot be done on a running instance. | |
What happens if I accidentally run against the wrong repository format? | If you try to run the oak-segment module against an oak-segment-tar repository (or conversely), startup fails with an IllegalStateException with the message "Invalid segment format". No data corruption occurs. | |
Will a reindex of the search indexes be necessary? | No. Migrating from oak-segment to oak-segment-tar introduces changes in the container format. The contained data is not affected and will not be modified. | |
How to best calculate the expected disk space needed during and after the migration? | The migration is equivalent to recreating the segmentstore in the new format. This can be used to estimate the additional disk space needed during migration. After the migration, the old segment store can be deleted to reclaim space. | |
How to best estimate the duration of the migration? | Migration performance can be greatly improved if offline revision cleanup is executed prior to the migration. All customers are advised to execute it as a pre-requisite of the upgrade process. In general, the duration of the migration should be similar to the duration of the offline revision cleanup task, assuming that the offline revision cleanup task has been executed before the migration. |
Questions | Answers | |
How frequently should Online Revision Cleanup be executed? | Once per day. This is the default configuration in the Operations Dashboard. | |
How can I configure the start time of the Online Revision Cleanup maintenance task ? | See the How to run Online Revision Cleanup section. | |
Is there a maximum frequency that should not be exceeded for Online Revision Cleanup? | It is recommended to run Online Revision Cleanup once per day, as configured by default. |
|
What are the key indicators that determine the frequency at which Online Revision Cleanup should be run? | There is no need to determine the frequency as Online Revision Cleanup is configured as a maintenance task and it automatically runs each day. | |
Why does Online Revision Cleanup not reclaim any space when run for the first time? | Online Revision Cleanup reclaims old revisions by generations. A fresh generation is generated every time revision cleanup runs. Only the content that is at least two generations old will be reclaimed, which means that on a first run there is nothing to reclaim. | |
Why does the first Online Revision Cleanup not reclaim any space when run after the Offline Revision Cleanup? | Offline Revision Cleanup reclaims everything but the latest generation, whereas Online Revision Cleanup reclaims everything but the latest two generations. On a fresh repository, Online Revision Cleanup does not reclaim any space when executed for the first time after the Offline Revision Cleanup because there is no generation old enough to be reclaimed. Also, read the "Running Online Revision Cleanup after Offline Revision Cleanup" section of this chapter. |
|
Would Author and Publish typically have different Online Revision Cleanup windows? | This depends on office hours and the traffic patterns of the customer online presence. The maintenance windows should be configured outside of the main production times to allow for the best cleanup efficacy. For multiple AEM Publish instances (TarMK Farm), maintenance windows for Online Revision Cleanup should be staggered. | |
Are there any prerequisites before running Online Revision Cleanup? | Online Revision Cleanup is available only with AEM 6.3 and higher releases. Also, if you are using an older version of AEM, you must migrate to the new Oak Segment Tar. |
|
What are the factors that determine the duration of the Online Revision Cleanup? | The factors are:
|
|
Can authors still work while Online Revision Cleanup is running? | Yes, Online Revision Cleanup can cope with concurrent writes. However, Online Revision Cleanup works faster and more efficiently without concurrent write transactions. Adobe recommends scheduling the Online Revision Cleanup maintenance task for a relatively quiet time without a lot of traffic. | |
What are the minimum requirements for disk space and heap memory when running Online Revision Cleanup? | Disk space is continuously monitored during Online Revision Cleanup. Should the available disk space drop below a critical value, the process is canceled. The critical value is 25% of the current disk footprint of the repository and it is not configurable. Adobe recommends you size the disk at least two or three times larger than the initially estimated repository size. Free heap space is continuously monitored during the cleanup process. Should the free heap space drop below a critical value, the process is canceled. The critical value is configured through org.apache.jackrabbit.oak.segment.SegmentNodeStoreService#MEMORY_THRESHOLD. The default value is 15%. Recommendations for minimum compaction heap sizing are not separated from the AEM memory sizing recommendations. Generally: If an AEM instance is sized enough to cope with the use cases and expected payload thereon, the cleanup process obtains enough memory. |
|
What is the expected performance impact while running Online Revision Cleanup? | Online Revision Cleanup is a background process that reads from and writes to the repository concurrently to normal system operations. In particular, it might need to acquire exclusive access to the repository for a short time period, preventing other threads from writing into the repository. | |
How long is the Online Revision Cleanup expected to run? | It should take no longer than two hours to be run according to the latest performance tests Adobe performed internally. | |
What should be done if Online Revision Cleanup takes longer? |
|
|
What happens if Online Revision Cleanup exceeds configured Maintenance Windows? | Make sure that other maintenance tasks are not delaying its execution. This could be the case if more maintenance tasks than Online Revision Cleanup are executed within the same maintenance window. Maintenance tasks are run sequentially without a configurable order. | |
Why is revision garbage collection skipped? | Revision Cleanup relies on an estimation phase to decide if there is enough garbage to be cleaned. The estimator compares the current size against the size of the repository after it was last compacted. If the size exceeds the configured delta, cleanup runs. The size delta is set at 1 GB. This effectively means that if the repository size did not grow by 1 GB since the last cleanup run, the new revision cleanup iteration is skipped. Below are the relevant log entries for the estimation phase:
|
|
Is it possible to safely abort the auto compaction if the performance impact is too high? | Yes. Since AEM 6.3, it can be safely stopped by way of the Maintenance Task Window within the Operations Dashboard or by way of JMX. | |
If the AEM instance is shut down during a scheduled cleanup task, does the process abort safely, or is the shutdown blocked until the compaction has finished ? | Revision Cleanup is interrupted and the repository shuts down safely. | |
What happens when the system crashes during Online Revision Cleanup? | There is no risk of data corruption in such cases. Garbage leftovers are cleaned up by a subsequent run. | |
What is the impact of not running Online Revision Cleanup? | Performance degradation over time. | |
Which revisions are being collected ? | By default, the Online Revision Cleanup only collects revisions that are at least 24 hours old. | |
What happens if there is too much interference from concurrent writes to the repository? | If there's write concurrency on the system, online revision cleanup might require exclusive write access to be able to commit the changes at the end of a compaction cycle. The system goes into forceCompact mode, as explained in more detail in the Oak documentation. During force compact, an exclusive write lock is acquired to finally commit the changes without any concurrent writes interfering. To limit the impact on response times, a time-out value can be defined. This value is set to one minute by default, which means that if force compact does not complete within one minute, the compaction process is aborted in favor of concurrent commits. The duration of force compact depends on the following factors:
|
|
How is Online Revision Cleanup executed on a standby instance? | In a cold standby setup, only the primary instance must be configured to run Online Revision Cleanup. On the standby instance, Online Revision Cleanup does not need to be scheduled specifically. The corresponding operation on a standby instance is the Automatic Cleanup, which corresponds to the cleanup phase of the Online Revision Cleanup. The Automatic Cleanup is run on the standby instance after the execution of the Online Revision Cleanup on the primary instance. Estimation and compaction phases are not run on a standby instance. | |
Is Offline Revision Cleanup able to free more disk space than Online Revision Cleanup? | Offline Revision Cleanup can immediately remove old revisions while Online Revision Cleanup must account for old revisions still being referenced by the application stack. The former can thus remove garbage more aggressively than the latter where the effect is amortized over the course of a few garbage collection cycles. Also, read the "Running Online Revision Cleanup after Offline Revision Cleanup" section of this chapter. |
|
Any considerations about memory mapped file operations? |
|
|
What must be monitored during Online Revision Cleanup? |
|
|
How to check if the Online Revision Cleanup has completed successfully? | You can check if the Online Revision Cleanup has completed successfully by checking the logs: one message reports the completion of the compaction phase and, correspondingly, another message reports the completion of the cleanup phase. |
|
Where can we find the statistics of the last Online Revision Cleanup executions? | Status, progress, and statistics are exposed via a JMX MBean. The statistics are only available since the last system start. External monitoring tooling could be used to keep the data beyond AEM uptime. See the AEM documentation for attaching health checks to Nagios as an example for an external monitoring tool. |
|
What are relevant log entries? |
Also, see the Troubleshooting Based on Error Messages section below. |
|
How to check how much space was reclaimed after Online Revision Cleanup has completed? | There is a message in the log at the end of the cleanup cycle: "TarMK GC #3: cleanup completed " that includes the size of the repository and the amount of reclaimed garbage. |
|
How to check the integrity of the repository after Online Revision Cleanup has completed? | A repository integrity check is not needed after the Online Revision Cleanup. However, you can perform the following actions to check the repository status after cleanup:
|
|
How to detect if Online Revision Cleanup has failed and what are the steps to recover? | Failure conditions are marked by WARN or ERROR log messages starting with "TarMK GC". Also, see the Troubleshooting Based on Error Messages section below. | |
What information is exposed in the Revision Cleanup Health Check? How and when do they contribute to the color coded status levels? | The Revision Clean-up Health Check is part of the Operations Dashboard. The status is GREEN if the last execution of the Online Revision Cleanup maintenance task has completed successfully. It is YELLOW if the Online Revision Cleanup maintenance task was canceled once. It is RED if the Online Revision Cleanup maintenance task was canceled three times in a row. In this case manual interaction is required or Online Revision Clean-up is likely to fail again. For more information, read the Troubleshooting section below. Also, the Health Check status is reset after a system restart. So, a freshly restarted instance shows a green status on the Revision Cleanup Health Check. External monitoring tooling could be used to keep the data beyond AEM uptime. See the AEM documentation for attaching health checks to Nagios as an example for an external monitoring tool. |
|
How to monitor Automatic Cleanup on a standby instance? | Status, progress, and statistics are exposed via a JMX MBean. The statistics are available only since the last system start. External monitoring tooling could be used to keep the data beyond the AEM uptime. See the AEM documentation for attaching health checks to Nagios as an example for an external monitoring tool. The log files can also be used to check the status, progress, and statistics of the Automatic Cleanup. | |
What must be monitored during Automatic Cleanup on a standby instance? |
|
What is the worst that can happen if you do not run Online Revision Cleanup? | The AEM instance runs out of disk space, which causes outages in production. | |
Is high user traffic problematic for running Online Revision Cleanup on a publish instance ? | High user traffic impacts whether the compaction phase is able to successfully finish or not. |
|
According to the Health Check and the log entries, Online Revision Cleanup has not completed successfully three times in a row. What is required to make Online Revision Cleanup complete successfully? | You can take several steps to find and fix the issue:
|
|
What must be done when the Healthcheck alert is on? | See the previous point. | |
What happens if Online Revision Cleanup runs out of time during the scheduled maintenance window? | Online Revision Cleanup is canceled and the leftovers are removed. It starts again next time the maintenance window is scheduled. | |
What is causing SegmentNotFoundException instances to be logged in the error.log and how can I recover? |
A
|
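Two of the thresholds quoted in the answers above (the 1 GB growth delta used by the estimation phase and the 25% free disk space limit) can be expressed as simple checks. The following is only a sketch: the function names are hypothetical, sizes are plain byte counts that you would have to obtain yourself, 1 GB is assumed to mean 1024^3 bytes, and the behavior at exactly 1 GB of growth is an assumption.

```shell
#!/bin/sh
# Sketch of two thresholds described in the answers above.
# Function names are hypothetical; all sizes are byte counts.

# Assumption: 1 GB is taken as 1024^3 bytes.
ONE_GB=$((1024 * 1024 * 1024))

# Succeeds (exit status 0) when the repository has grown by more than the
# 1 GB delta since it was last compacted, that is, when cleanup is not skipped.
estimation_allows_cleanup() {
    current_size=$1
    size_after_last_compaction=$2
    [ $((current_size - size_after_last_compaction)) -gt "$ONE_GB" ]
}

# Succeeds when free disk space is below 25% of the repository footprint,
# the point at which the cleanup process would be canceled.
disk_space_is_critical() {
    free_bytes=$1
    repo_bytes=$2
    [ $((free_bytes * 100)) -lt $((repo_bytes * 25)) ]
}

# Example: a repository that grew from 48 GB to 50 GB since the last compaction.
if estimation_allows_cleanup $((50 * ONE_GB)) $((48 * ONE_GB)); then
    echo "cleanup would run"
else
    echo "cleanup would be skipped"
fi
```

The disk check multiplies both sides instead of dividing so that integer arithmetic does not round the 25% limit down.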
The error.log is verbose if there are incidents during the online revision cleanup process. The following matrix aims to explain the most common messages and to provide possible solutions:
Phase | Log Messages | Explanation | Next Steps |
---|---|---|---|
Estimation | TarMK GC #2: estimation skipped because compaction is paused. | The estimation phase is skipped when compaction is disabled on the system by configuration. | Enable Online Revision Cleanup. |
Estimation | TarMK GC #2: estimation interrupted: ${REASON}. Skipping compaction. | The estimation phase terminated prematurely. Some examples of events that could interrupt the estimation phase: not enough memory or disk space on the host system. | Depends on the given reason. |
Compaction | TarMK GC #2: compaction paused. | As long as the compaction phase is paused by configuration, neither the estimation phase nor the compaction phase is run. | Enable online revision cleanup. |
Compaction | TarMK GC #2: compaction canceled: ${REASON}. | The compaction phase terminated prematurely. Some examples of events that could interrupt the compaction phase: not enough memory or disk space on the host system. Moreover, compaction can also be canceled by shutting down the system or by explicitly canceling it via administrative interfaces such as the Maintenance Window within the Operations Dashboard. | Depends on the given reason. |
Compaction | TarMK GC #2: compaction failed in 32.902 min (1974140 ms), after 5 cycles. | This message doesn't mean that there was an unrecoverable error, but only that compaction was terminated after some attempts. Also, read the following paragraph. | Read the following Oak documentation, and the last question of the Running Online Revision Cleanup section. |
Cleanup | TarMK GC #2: cleanup interrupted. | Cleanup has been canceled by shutting down the repository. No impact on consistency is expected. Also, disk space will most likely not be reclaimed to full extent. It will be reclaimed during next revision cleanup cycle. | Investigate why the repository has been shut down and going forward try to avoid shutting down the repository during maintenance windows. |
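To find the messages from the matrix above in a log file, a small filter can help. This is a minimal sketch, assuming that each log line carries its level as *WARN* or *ERROR*; check this against the layout of your own error.log. The function name tarmk_gc_problems and the log path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints WARN and ERROR log lines that mention "TarMK GC".
# Assumption: the log level appears in each line as *WARN* or *ERROR*.
# Returns a non-zero status when no such line is found (grep semantics).
tarmk_gc_problems() {
    log_file=$1
    grep -E '\*(WARN|ERROR)\*.*TarMK GC' "$log_file"
}

# Usage (the path is an assumption, adjust it to your installation):
#   tarmk_gc_problems install-folder/crx-quickstart/logs/error.log
```

Because the function simply passes through the exit status of grep, it can be used directly in an if statement to trigger an alert only when problem lines exist.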
Use an Oak-run tool release which has a version number (both major and minor) that matches the Oak core version of your AEM installation. For example, if your AEM instance has Oak core version 1.22.x you should use the latest version of Oak-run tool 1.22.x.
Adobe provides a tool called Oak-run to perform revision cleanup. It can be downloaded at the following location:
https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/
The tool is a runnable jar that can be manually run to compact the repository. The process is called offline revision cleanup because the repository must be shut down to properly run the tool. Make sure to plan the cleanup in accordance with your maintenance window.
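The version-matching note above (major and minor version of Oak-run equal to those of the Oak core in your AEM installation) can be checked mechanically once you know both version strings. A minimal sketch follows; same_major_minor is a hypothetical helper, and how you look up the Oak core version of your instance is left out.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when two dotted version strings share the same
# major and minor components (for example 1.22.3 and 1.22.15).
same_major_minor() {
    a=$(echo "$1" | cut -d. -f1,2)
    b=$(echo "$2" | cut -d. -f1,2)
    [ "$a" = "$b" ]
}

# Example with illustrative version numbers.
if same_major_minor "1.22.3" "1.22.15"; then
    echo "oak-run version matches the Oak core version"
else
    echo "oak-run version does not match, download a matching release"
fi
```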
For tips on how to increase the performance of the cleanup process, see Increasing the Performance of Offline Revision Cleanup.
You can also clear old checkpoints before the maintenance takes place (steps 2 and 3 in the procedure below). This is recommended only for instances that have more than 100 checkpoints.
1. Always make sure you have a recent backup of the AEM instance. Shut down AEM.
2. (Optional) Use the tool to find old checkpoints:
    java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore
3. (Optional) Then, delete the unreferenced checkpoints:
    java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore rm-unreferenced
4. Run the compaction and wait for it to complete:
    java -jar -Dsun.arch.data.model=32 oak-run.jar compact install-folder/crx-quickstart/repository/segmentstore
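The oak-run commands of the procedure above can be collected in a small wrapper. The sketch below is a dry run by default: it only prints the commands so that they can be reviewed first. The variable names, the run_step helper, and the DRY_RUN switch are hypothetical. The wrapper assumes that AEM is already shut down and backed up, and it leaves out the -Dsun.arch.data.model=32 option shown above.

```shell
#!/bin/sh
# Dry-run wrapper around the oak-run commands of the procedure above.
# OAK_RUN_JAR and SEGMENTSTORE are placeholders to adapt to your setup.
OAK_RUN_JAR=${OAK_RUN_JAR:-oak-run.jar}
SEGMENTSTORE=${SEGMENTSTORE:-install-folder/crx-quickstart/repository/segmentstore}
DRY_RUN=${DRY_RUN:-1}

# Prints the command when DRY_RUN is 1, otherwise executes it.
run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

offline_revision_cleanup() {
    # Optional steps: list the checkpoints, then remove the unreferenced ones.
    run_step java -jar "$OAK_RUN_JAR" checkpoints "$SEGMENTSTORE" || return 1
    run_step java -jar "$OAK_RUN_JAR" checkpoints "$SEGMENTSTORE" rm-unreferenced || return 1
    # Compaction itself; wait for it to complete before restarting AEM.
    run_step java -jar "$OAK_RUN_JAR" compact "$SEGMENTSTORE"
}
```

Calling offline_revision_cleanup with DRY_RUN left at 1 prints the three commands; setting DRY_RUN=0 executes them in order and stops at the first failing step.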
The oak-run tool introduces several features that aim to increase the performance of the revision cleanup process and minimize the maintenance window as much as possible.
The list includes several command-line parameters, as described below:
-mmap. You can set this as true or false. If set to true, memory mapped access is used. If set to false, file access is used. If not specified, memory mapped access is used on 64-bit systems and file access is used on 32-bit systems. On Windows, regular file access is always enforced and this option is ignored. This parameter has replaced the -Dtar.memoryMapped parameter.
-Dupdate.limit. Defines the threshold for the flush of a temporary transaction to disk. The default value is 10000.
-Dcompress-interval. Number of compaction map entries to keep until compressing the current map. The default is 1000000. You should increase this value to an even higher number for faster throughput, if enough heap memory is available. This parameter has been removed in Oak version 1.6 and has no effect.
-Dcompaction-progress-log. The number of compacted nodes that are logged. The default value is 150000, which means that the first 150000 compacted nodes are logged during the operation. Use this with the next parameter documented below.
-Dtar.PersistCompactionMap. Set this parameter to true to use disk space instead of heap memory for compaction map persistence. Requires oak-run tool version 1.4 or higher. For further details, see question 3 in the Offline Revision Cleanup Frequently Asked Questions section. This parameter has been removed in Oak version 1.6 and has no effect.
--force. Force compaction and ignore a non-matching segment store version.
Using the --force parameter upgrades the segment store to the latest version, which is incompatible with older Oak versions. Also, consider that no downgrade is possible. Generally, you should use these parameters with caution and only if you are knowledgeable about how to use them.
An example of the parameters in use:
java -Dupdate.limit=10000 -Dcompaction-progress-log=150000 -Dlogback.configurationFile=logback.xml -Xmx8g -jar oak-run-*.jar checkpoints <repository>
In addition to the methods presented above, you can also trigger the revision cleanup mechanism by using the JMX console as follows:
What are the factors that determine the duration of the Offline Revision Cleanup? | The repository size and the number of revisions that must be cleaned up determine the duration of the cleanup. |
What is the difference between a revision and a page version? |
|
How to speed up the Offline Revision Cleanup task if it does not complete within 8 hours ? | If the revision task does not complete within 8 hours and the thread dumps reveal that the main hotspot is InMemoryCompactionMap.findEntry , use the following parameter with the oak-run tool versions 1.4 or higher: -Dtar.PersistCompactionMap=true . The -Dtar.PersistCompactionMap parameter has been removed in Oak version 1.6. |