Revision Cleanup
Introduction
Each update to the repository creates a content revision, so the repository grows with every update. Old revisions must be cleaned up to free disk resources and to avoid uncontrolled repository growth. This maintenance functionality is called Revision Cleanup. It has been available as an offline routine since Adobe Experience Manager (AEM) 6.0.
AEM 6.3 introduced an online version of this functionality called Online Revision Cleanup. Unlike Offline Revision Cleanup, which requires the AEM instance to be shut down, Online Revision Cleanup can run while the AEM instance is online. Online Revision Cleanup is turned on by default and is the recommended way of performing a revision cleanup.
Note: See the Video for an introduction and how to use Online Revision Cleanup.
The revision cleanup process consists of three phases: estimation, compaction, and cleanup. Estimation determines whether to run the next phase (compaction), based on how much garbage might be collected. During the compaction phase, segments and tar files are rewritten, leaving out any unused content. The cleanup phase then removes the old segments, including any garbage that they may contain. The offline mode can usually reclaim more space because the online mode must account for AEM’s working set, which prevents additional segments from being collected.
For more details regarding Revision Cleanup, see the following links:
Also, you can read the official Oak documentation.
When to use Online Revision Cleanup as opposed to Offline Revision Cleanup?
Online Revision Cleanup is the recommended way of performing revision cleanup. Offline Revision Cleanup should be used only on an exceptional basis, for example, before migrating to the new storage format or if Adobe Customer Care requests it.
How to Run Online Revision Cleanup
Online Revision Cleanup is configured by default to automatically run once a day on both AEM Author and Publish instances. All you need to do is define the maintenance window during a period with the least user activity. You can configure the Online Revision Cleanup task as follows:
- In the main AEM window, go to Tools - Operations - Dashboard - Maintenance or point your browser to:
  https://serveraddress:serverport/libs/granite/operations/content/maintenance.html
- Hover over Daily Maintenance Window and click the Settings icon.
- Enter the desired values (recurrence, start time, end time) and click Save.
Alternatively, if you want to run the revision cleanup task manually, you can:
- Go to Tools - Operations - Dashboard - Maintenance or browse directly to:
  https://serveraddress:serverport/libs/granite/operations/content/maintenance.html
- Click the Daily Maintenance Window.
- Hover over the Revision Cleanup icon.
- Click Run.
Running Online Revision Cleanup After Offline Revision Cleanup
The revision cleanup process reclaims old revisions by generations. This means that each time you run revision cleanup a new generation is created and kept on the disk. There is a difference however between the two types of revision cleanup: offline revision cleanup keeps one generation while online revision cleanup keeps two generations. So, when you run online revision cleanup after offline revision cleanup the following happens:
- After the first online revision cleanup run, the repository size doubles. This happens because there are now two generations that are kept on disk.
- During the subsequent runs, the repository will temporarily grow while the new generation is created and then stabilize back to the size it had after the first run, as the online revision cleanup process reclaims the previous generation.
Also, keep in mind that depending on the type and number of commits, each generation can vary in size compared to the previous one, so the final size can vary from one run to the other.
Due to this fact, it is recommended to size the disk at least two or three times larger than the initially estimated repository size.
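The sizing recommendation above amounts to simple arithmetic. The following is a minimal sketch of it, not an official sizing tool; the function name and the use of gigabytes as the unit are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: derive a disk size range from an estimated repository size.
# The 2x and 3x factors come from the sizing recommendation above;
# the function name and the GB unit are illustrative assumptions.
recommended_disk_gb() {
  local estimated_gb="$1"
  local low=$(( estimated_gb * 2 ))
  local high=$(( estimated_gb * 3 ))
  echo "${low}-${high}"
}

# Example: an estimated 200 GB repository.
recommended_disk_gb 200   # prints "400-600"
```

Treat the result as a lower bound for planning, since each generation can vary in size from one run to the next.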
Full And Tail Compaction Modes
AEM 6.5 introduces two new modes for the compaction phase of the Online Revision Cleanup process:
- The full compaction mode rewrites all the segments and tar files in the whole repository. The subsequent cleanup phase can thus remove the maximum amount of garbage across the repository. Because full compaction affects the whole repository, it requires a considerable amount of system resources and time to complete. Full compaction corresponds to the compaction phase in AEM 6.3.
- The tail compaction mode rewrites only the most recent segments and tar files in the repository. The most recent segments and tar files are those that have been added since the last time either full or tail compaction ran. The subsequent cleanup phase can thus only remove the garbage contained in the recent part of the repository. Because tail compaction only affects a part of the repository, it requires considerably fewer system resources and less time to complete than full compaction.
These compaction modes constitute a trade-off between efficiency and resource consumption: while tail compaction is less effective it also has less impact on normal system operation. In contrast, full compaction is more effective but has a bigger impact on normal system operation.
AEM 6.5 also introduces a more efficient content deduplication mechanism during compaction, which further reduces the on-disk footprint of the repository.
The two charts below present results from internal laboratory testing that illustrate the reduction of average execution times and of the average footprint on disk in AEM 6.5 compared to AEM 6.3:
How To Configure Full and Tail Compaction
The default configuration runs tail compaction on weekdays and full compaction on Sundays. The default configuration can be changed by using the new configuration value full.gc.days of the RevisionCleanupTask maintenance task.
When you configure the full.gc.days value, full compaction runs on the days defined in the value and tail compaction runs on the days that are not defined in the value. For example, if you configure full compaction to run on Sunday, then tail compaction runs Monday to Saturday. Likewise, if you configure full compaction to run every day of the week, then tail compaction does not run at all.
Also, consider that:
- Tail compaction is less effective and it has less impact on normal system operations. It is thus intended to be run during business days.
- Full compaction is more effective but also has a bigger impact on normal system operations. It is thus intended to be run outside business days.
- Both tail compaction and full compaction should be scheduled to run during off-peak hours.
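The relationship between the days configured for full compaction and the days left for tail compaction can be sketched as below. This is only an illustration of the rule described above: the function name is hypothetical, and day names are used for readability and are not claimed to be the actual value format of full.gc.days.

```shell
#!/usr/bin/env bash
# Sketch: list the days on which tail compaction would run, given the days
# configured for full compaction. Day names are used for readability only;
# they are NOT claimed to be the actual value format of full.gc.days.
tail_compaction_days() {
  local full_days=" $* "
  local day result=""
  for day in Sunday Monday Tuesday Wednesday Thursday Friday Saturday; do
    # A day not listed for full compaction falls back to tail compaction.
    if [[ "$full_days" != *" $day "* ]]; then
      result+="${result:+ }$day"
    fi
  done
  echo "$result"
}

# Full compaction on Sunday leaves Monday to Saturday for tail compaction.
tail_compaction_days Sunday
```

Passing all seven days yields an empty result, which matches the case where tail compaction does not run at all.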
Troubleshooting
When using the new compaction modes, keep in mind the following:
- You can monitor the input/output (I/O) activity, for example: I/O operations, CPU waiting for I/O, commit queue size. This helps determine whether the system is becoming I/O bound and requires upsizing.
- The RevisionCleanupTaskHealthCheck indicates the overall health status of the Online Revision Cleanup. It works the same way as in AEM 6.3 and does not distinguish between full and tail compaction.
- The log messages carry relevant information about the compaction modes. For example, when Online Revision Cleanup starts, the corresponding log messages indicate the compaction mode. Also, in some corner cases, the system reverts to full compaction when it was scheduled to run a tail compaction, and the log messages indicate this change. The log samples below indicate the compaction mode and the change from tail to full compaction:
TarMK GC: running tail compaction
TarMK GC: no base state available, running full compaction instead
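As a rough aid for reading the log, the two sample messages above can be matched with a small shell function. This is a sketch under stated assumptions: it recognizes only the two messages quoted in this section, and the function name and the "unknown" fallback are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which compaction mode a revision cleanup run used, based on
# the two log messages shown above. Reads log lines from standard input.
# The function name and the "unknown" fallback are illustrative assumptions.
compaction_mode_from_log() {
  local mode="unknown" line
  while IFS= read -r line; do
    case "$line" in
      *"TarMK GC: no base state available, running full compaction instead"*)
        mode="full (fallback from tail)" ;;
      *"TarMK GC: running tail compaction"*)
        mode="tail" ;;
    esac
  done
  # The last matching message wins.
  echo "$mode"
}

# Example with the sample messages; in practice, pipe in error.log instead.
printf '%s\n' \
  "TarMK GC: running tail compaction" \
  "TarMK GC: no base state available, running full compaction instead" \
  | compaction_mode_from_log
```

Because the last matching message wins, a run that starts as tail compaction and then reverts is reported as a fallback to full compaction.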
Known Limitations
Sometimes, alternating between the tail and full compaction modes delays the cleanup process. More precisely, the repository grows after a full compaction (it doubles in size). The extra space is reclaimed in the subsequent tail compaction, when the repository drops below the size it had before the full compaction. Parallel maintenance task executions should also be avoided.
It is recommended to size the disk at least two or three times larger than the initially estimated repository size.
Online Revision Cleanup Frequently Asked Questions
AEM 6.5 Upgrade Considerations
Migrating to Oak Segment Tar
Running Online Revision Cleanup
Monitoring Online Revision Cleanup
Troubleshooting Online Revision Cleanup
Troubleshooting Based On Error Messages
The error.log is verbose if there are incidents during the online revision cleanup process. The following matrix aims to explain the most common messages and to provide possible solutions:
How to Run Offline Revision Cleanup
Adobe provides a tool called Oak-run to perform revision cleanup. It can be downloaded at the following location:
https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/
The tool is a runnable jar that can be manually run to compact the repository. The process is called offline revision cleanup because the repository must be shut down to properly run the tool. Make sure to plan the cleanup in accordance with your maintenance window.
For tips on how to increase the performance of the cleanup process, see Increasing the Performance of Offline Revision Cleanup.
- Always make sure you have a recent backup of the AEM instance. Shut down AEM.
- (Optional) Use the tool to find old checkpoints:
  java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore
- (Optional) Then, delete the unreferenced checkpoints:
  java -jar oak-run.jar checkpoints install-folder/crx-quickstart/repository/segmentstore rm-unreferenced
- Run the compaction and wait for it to complete:
  java -jar -Dsun.arch.data.model=32 oak-run.jar compact install-folder/crx-quickstart/repository/segmentstore
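The oak-run steps above can be collected into a small helper that prints the commands in order for a given segment store path. This is a dry-run sketch only: it echoes the commands and executes nothing, and the OAK_RUN_JAR variable and the function name are assumptions introduced for illustration. Backing up and shutting down AEM still have to happen first.

```shell
#!/usr/bin/env bash
# Sketch: print, in order, the oak-run commands from the steps above.
# It only echoes the commands (a dry run); nothing is executed.
# OAK_RUN_JAR and the function name are illustrative assumptions.
OAK_RUN_JAR="${OAK_RUN_JAR:-oak-run.jar}"

offline_cleanup_commands() {
  local segmentstore="$1"
  # (Optional) find old checkpoints.
  echo "java -jar $OAK_RUN_JAR checkpoints $segmentstore"
  # (Optional) delete the unreferenced checkpoints.
  echo "java -jar $OAK_RUN_JAR checkpoints $segmentstore rm-unreferenced"
  # Run the compaction.
  echo "java -jar -Dsun.arch.data.model=32 $OAK_RUN_JAR compact $segmentstore"
}

offline_cleanup_commands install-folder/crx-quickstart/repository/segmentstore
```

Printing the commands before running them makes it easier to review the repository path during a maintenance window.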
Increasing the Performance of Offline Revision Cleanup
The oak-run tool introduces several features that aim to increase the performance of the revision cleanup process and minimize the maintenance window as much as possible.
The list includes several command-line parameters, as described below:
- -mmap. You can set this to true or false. If set to true, memory mapped access is used. If set to false, file access is used. If not specified, memory mapped access is used on 64-bit systems and file access is used on 32-bit systems. On Windows, regular file access is always enforced and this option is ignored. This parameter has replaced the -Dtar.memoryMapped parameter.
- -Dupdate.limit. Defines the threshold for the flush of a temporary transaction to disk. The default value is 10000.
- -Dcompress-interval. Number of compaction map entries to keep until compressing the current map. The default is 1000000. You should increase this value to an even higher number for faster throughput, if enough heap memory is available. This parameter has been removed in Oak version 1.6 and has no effect.
- -Dcompaction-progress-log. The number of compacted nodes that are logged. The default value is 150000, which means that the first 150000 compacted nodes are logged during the operation. Use this with the next parameter documented below.
- -Dtar.PersistCompactionMap. Set this parameter to true to use disk space instead of heap memory for compaction map persistence. Requires the oak-run tool versions 1.4 and higher. For further details, see question 3 in the Offline Revision Cleanup Frequently Asked Questions section. This parameter has been removed in Oak version 1.6 and has no effect.
- --force. Force compaction and ignore a non-matching segment store version.
The --force parameter upgrades the segment store to the latest version, which is incompatible with older Oak versions. Also, consider that no downgrade is possible. Generally, you should use these parameters with caution and only if you are knowledgeable about how to use them.
An example of the parameters in use:
java -Dupdate.limit=10000 -Dcompaction-progress-log=150000 -Dlogback.configurationFile=logback.xml -Xmx8g -jar oak-run-*.jar checkpoints <repository>
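The rules given for the -mmap parameter can be summarized as a small decision function. This is a sketch of the documented behavior only, not oak-run code; the function name and the argument conventions (operating system name, architecture bits, and the -mmap value or an empty string) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the access mode that results from the -mmap rules described above.
# Arguments: OS name, architecture bits (32 or 64), and the -mmap value
# ("true", "false", or empty when not specified). The function name and
# argument conventions are illustrative assumptions, not part of oak-run.
effective_access_mode() {
  local os="$1" bits="$2" mmap="$3"
  if [ "$os" = "windows" ]; then
    echo "file"            # Windows: regular file access is always enforced
  elif [ "$mmap" = "true" ]; then
    echo "mmap"
  elif [ "$mmap" = "false" ]; then
    echo "file"
  elif [ "$bits" = "64" ]; then
    echo "mmap"            # default on 64-bit systems
  else
    echo "file"            # default on 32-bit systems
  fi
}

effective_access_mode linux 64 ""        # prints "mmap"
effective_access_mode windows 64 true    # prints "file"
```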
Additional Methods of Triggering Revision Cleanup
In addition to the methods presented above, you can also trigger the revision cleanup mechanism by using the JMX console as follows:
- Open the JMX Console by going to http://localhost:4502/system/console/jmx
- Click the RevisionGarbageCollection MBean.
- In the next window, click startRevisionGC() and then Invoke to start the Revision Garbage Collection job.