Use oak-run.jar to Manage Indexes

oak-run.jar’s index command consolidates a number of features for managing Oak indexes in AEM, from gathering index statistics and running index consistency checks to re/indexing the indexes themselves.

Within this article and its videos, the terms indexing and re-indexing are used interchangeably and are considered the same operation.

oak-run.jar index Command Basics

In this set of videos, we’ll take a look at the new index command that is now available with oak-run. Before we get started, the first thing we want to do is verify the version of Oak that our AEM instance is running. To do this, we can go to the System Overview, introduced in AEM 6.4, and identify the version of Oak that we’re using. In my case, we’re using 1.8.0, so I want to make sure that we’re using the matching oak-run version.
I’ve already downloaded version 1.8.0 of the oak-run jar and copied it into my AEM folder. Below this video, I’ll provide a download link for the oak-run jar. To execute the oak-run jar, simply use the java -jar command, passing in the oak-run jar file name. If you only have one oak-run jar in the context of this command, you can replace the version with a star, so that this command can be adapted to various versions of AEM.
Provide the index command, to let oak-run know that we’re interested in index operations, and then provide the help flag.
So as you can see, there are a variety of flags that can be used with the index command, and we’ll be taking a look at some of the more common functions. Typically you want to provide the path to the data store, be that a file data store, an Azure blob store or an S3 store. For this example, since I’m running TarMK with a file data store, we can use the --fds-path flag, and this can take the absolute file path to the file data store, or a relative path from the jar. In my case, the relative path is crx-quickstart/repository/datastore.
Next we want to tell oak-run where the segment store is located. In the case of TarMK this is the segmentstore folder; in the case of MongoMK, this would be the MongoDB address. Because I’m using TarMK, I can provide the absolute or relative path to the segment store. In my case, I’ll use the relative path again, just like the data store.
So we have on screen the parameter set for the more common oak-run index operations. Running this command without any other flags gives us some basic information about the installed indexes on our AEM instance.
  • The version of oak-run.jar used must match the version of Oak used on the AEM instance.

  • Managing indexes using oak-run.jar leverages the index command with various flags to support different operations.

    • java -jar oak-run*.jar index ...
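
As a minimal sketch of the basics described above, the following builds the help invocation and the flag-less invocation as strings and prints them for review. The data store and segment store paths are the relative paths mentioned in the video and are assumptions about your layout; the `--help` spelling of the help flag is also an assumption to confirm on your oak-run version.

```shell
# Assumed paths, relative to the AEM install folder; adjust for your instance.
DATASTORE="crx-quickstart/repository/datastore"
SEGMENTSTORE="crx-quickstart/repository/segmentstore"

# Lists the flags supported by the index command (help flag spelling assumed).
HELP_CMD="java -jar oak-run*.jar index --help"

# With no other flags, reports basic information about the installed indexes.
INFO_CMD="java -jar oak-run*.jar index --fds-path=$DATASTORE $SEGMENTSTORE"

# Dry run: print the commands so they can be checked before being executed.
echo "$HELP_CMD"
echo "$INFO_CMD"
```

Once the paths look right, run the printed lines from the folder that contains the oak-run jar.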

Index Statistics

oak-run.jar allows you to gather index statistics from shut-down or running AEM instances. In order to diagnose search performance issues, it’s often necessary to obtain the installed index definitions, index statistics and sometimes even the actual contents of an index. Much of this information is already available in various consoles in AEM; however, oak-run consolidates this gathering into a single simple command. There are three flags that can be used in any combination to gather the index information: the index info flag, which collects and dumps various statistics related to the indexes; the index definitions flag, which exports the index definitions; and the index dump flag, which dumps the index content.
When the index info flag is used, a variety of statistics are shown, including the state of the async indexes, as well as a brief summary of the indexes installed on the system, broken down by index type. Depending on index type, there may be more or less information available. For example, the Lucene indexes have a variety of information, including the last updated time, the size and the estimated entry count. Property indexes, however, are limited to the index path and the estimated entry count.
When the index definitions flag is used, a file named index-definitions.json is created. This provides a view of all of the Lucene indexes in AEM in a serialized JSON format. It’s worth noting that this JSON format can be used by another oak-run command to install or update Oak indexes on other AEM instances. Lastly, when the index dump flag is provided, a folder is created that in turn contains a folder for each Lucene index and its actual contents. Only index-details.txt is in a human-readable format, while data and suggest-data contain the raw data dumped from Lucene. Even so, the raw data can provide key insights for Adobe support when they’re helping troubleshoot index issues. Together, all of this information can provide powerful insights into your Oak indexes in AEM.
  • oak-run.jar dumps all index definitions, important index stats and index contents for offline analysis.
  • The index statistics gathering is safe to execute on in-use AEM instances.
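
The three statistics flags can be combined in one invocation, sketched below as a dry run. The flag names `--index-info`, `--index-definitions` and `--index-dump` follow Oak's oak-run documentation and should be confirmed against the help output for your version; the paths are assumed relative paths.

```shell
# Assumed paths, relative to the AEM install folder; adjust for your instance.
DATASTORE="crx-quickstart/repository/datastore"
SEGMENTSTORE="crx-quickstart/repository/segmentstore"

# Any combination of the three flags may be used; drop the ones you do not need.
STATS_FLAGS="--index-info --index-definitions --index-dump"

STATS_CMD="java -jar oak-run*.jar index --fds-path=$DATASTORE $SEGMENTSTORE $STATS_FLAGS"

# Dry run: print the command so it can be checked before being executed.
echo "$STATS_CMD"
```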

Index Consistency Check

oak-run.jar allows you to execute index consistency checks for Lucene indexes from shut-down or running AEM instances. Executing the index consistency operation checks the Lucene indexes and provides a report indicating which Lucene indexes are valid or invalid. The consistency check can be used by AEM support and AEM system administrators to quickly determine if any indexes are corrupt. This is especially useful, as index corruption, while rare, can have serious side effects requiring quick resolution.
  • oak-run.jar quickly determines if Lucene Oak indexes are corrupt.
  • The consistency check is safe to run on in-use AEM instances for consistency check levels 1 and 2.
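
A consistency check at a chosen level might be invoked as sketched here. The `--index-consistency-check=<level>` form follows Oak's oak-run documentation and is an assumption to verify against the help output; the paths are assumed relative paths.

```shell
# Assumed paths, relative to the AEM install folder; adjust for your instance.
DATASTORE="crx-quickstart/repository/datastore"
SEGMENTSTORE="crx-quickstart/repository/segmentstore"

# Consistency check level; the article names levels 1 and 2.
CHECK_LEVEL=1

CHECK_CMD="java -jar oak-run*.jar index --fds-path=$DATASTORE $SEGMENTSTORE --index-consistency-check=$CHECK_LEVEL"

# Dry run: print the command so it can be checked before being executed.
echo "$CHECK_CMD"
```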

TarMK Online indexing with oak-run.jar

In this video, we’ll take a look at online reindexing using oak-run, which allows TarMK-based AEM instances to be online during reindexing. Though AEM can be online during this reindexing process, indexing involves traversal of the entire repository, which increases the I/O load on AEM and adversely impacts runtime performance. Thus, it’s recommended that oak-run online reindexing be performed during quiet or maintenance periods. This indexing approach creates a set of generated raw index data files that can be immediately imported into AEM. Because of this, we need to make sure we have enough available disk space to store these temporary raw index files. To get an approximation of the required disk space, use oak-run with the index info flag to obtain the size of the current indexes, and make sure you have at least that much space with a healthy margin for error.
Before executing the oak-run command, we must create a checkpoint using AEM’s JMX checkpoint manager MBean. Use a high expiration value, for instance 30 days, or approximately 2.6 million seconds. Generate the checkpoint, and record the checkpoint value.
While AEM is still running, execute oak-run with the reindex flag and provide the checkpoint as well as the absolute paths to the indexes to reindex. Depending on the size of the repository and the coverage of the Oak indexes being reindexed, the oak-run reindexing process can take some time. During the reindexing, oak-run outputs the progress, so you can keep tabs on how far along the reindexing process is. When complete, a folder populated with the raw generated index files is created and ready for import into AEM. We’ll head back to AEM’s JMX console, and this time open the indexer MBean. From here we can provide the absolute file system path to the raw generated files; note that this path is provided by the oak-run reindex operation output.
Upon executing the indexer MBean operation, the raw generated index files from the oak-run index process are moved into AEM and immediately available. Note that any content changes that occurred after the checkpoint’s creation were naturally processed and indexed by AEM, so none of those changes were lost.
  • Online indexing of TarMK using oak-run.jar is faster than setting reindex=true on the oak:queryIndexDefinition node. Despite this performance increase, online indexing using oak-run.jar still requires a maintenance window to perform the indexing.

  • Online indexing of TarMK using oak-run.jar should not be executed against an AEM instance outside of that instance’s maintenance window.
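
The reindex step of the online approach could look roughly like the dry run below. It is a sketch under stated assumptions: the flag names `--reindex`, `--index-paths` and `--checkpoint` follow Oak's oak-run documentation and should be confirmed against the help output, the checkpoint value is a placeholder for the one recorded from the checkpoint manager MBean, and the index paths are examples only.

```shell
# Assumed paths, relative to the AEM install folder; adjust for your instance.
DATASTORE="crx-quickstart/repository/datastore"
SEGMENTSTORE="crx-quickstart/repository/segmentstore"

# Placeholder: replace with the checkpoint value recorded from the JMX MBean.
CHECKPOINT="REPLACE_WITH_CHECKPOINT_ID"

# Example index paths only; list the indexes to reindex, comma-delimited.
INDEX_PATHS="/oak:index/lucene,/oak:index/damAssetLucene"

REINDEX_CMD="java -jar oak-run*.jar index --reindex --index-paths=$INDEX_PATHS --checkpoint=$CHECKPOINT --fds-path=$DATASTORE $SEGMENTSTORE"

# Dry run: print the command so it can be checked before being executed.
echo "$REINDEX_CMD"
```

The import of the generated files still happens through the indexer MBean, using the output path that oak-run reports when the reindex finishes.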

TarMK Offline indexing with oak-run.jar

Oak-run supports re-indexing AEM on TarMK using several approaches. In this video, we’ll take a look at offline re-indexing using oak-run, which requires the TarMK-based AEM instance to be shut down. Oak-run has a single command to be issued that re-indexes one or more indexes. The re-index flag is used along with a comma-delimited list of Oak index paths as well as the read-write flag. Oak-run’s re-index logging is sufficiently verbose and will keep you apprised of the progress of the re-indexing as well as when it’s finished.
Once re-indexing is complete, the AEM instance can be restarted. Note that if this command is accidentally executed against a running TarMK AEM instance, the oak-run process will appear to hang.
  • Offline indexing of TarMK using oak-run.jar is the simplest oak-run.jar based indexing approach for TarMK, as it requires a single oak-run.jar command; however, it requires the AEM instance to be shut down.
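
The single offline command might be assembled as in this dry run. The flag names `--reindex`, `--index-paths` and `--read-write` follow Oak's oak-run documentation and should be confirmed against the help output; the index paths are examples and the repository paths are assumed relative paths. Run it only with the AEM instance shut down.

```shell
# Assumed paths, relative to the AEM install folder; adjust for your instance.
DATASTORE="crx-quickstart/repository/datastore"
SEGMENTSTORE="crx-quickstart/repository/segmentstore"

# Example index paths only; list the indexes to re-index, comma-delimited.
INDEX_PATHS="/oak:index/lucene,/oak:index/damAssetLucene"

# --read-write lets oak-run write the rebuilt indexes straight into the repository.
OFFLINE_CMD="java -jar oak-run*.jar index --reindex --index-paths=$INDEX_PATHS --read-write --fds-path=$DATASTORE $SEGMENTSTORE"

# Dry run: print the command so it can be checked before being executed.
echo "$OFFLINE_CMD"
```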

TarMK Out-of-band indexing with oak-run.jar

In this video, we’ll take a look at out-of-band indexing in AEM using oak-run. Out-of-band indexing is a great way to index or re-index TarMK-based AEM instances without affecting the running AEM instance’s performance, allowing it to continue running normally during the indexing period. The reason out-of-band indexing doesn’t affect AEM’s operational performance is that indexing is performed against a clone of the AEM instance, and not against the running AEM instance itself; thus the name out-of-band. This approach generates a set of raw index files from the cloned AEM instance that can be immediately imported into the running AEM instance. Because out-of-band indexing occurs against a cloned, unused repository, and oak-run is running in its own process, out-of-band indexing is the fastest way to re-index AEM on TarMK without interrupting availability. This is particularly welcome when the TarMK repository becomes so large that re-indexing it takes longer than the available maintenance periods.
Before we get started, I want to note that oak-run will generate raw index files to import into AEM. Because of this, we need to make sure we have enough disk space to hold these generated files. We can get an estimate of the space required by running oak-run with the index info flag and reviewing the current index size. Note that the size is only available for Lucene indexes.
The first thing we need to do is create a checkpoint on the running AEM instance before cloning, using the JMX checkpoint manager MBean. We’ll use a high expiration value, for instance 30 days or approximately 2.6 million seconds, and we’ll record the checkpoint value after creating it. Once the checkpoint is created, we need to clone our TarMK-based AEM instance. There are a variety of ways of cloning a TarMK-based AEM instance, but we’ll use the product’s backup feature, available via Tools > Operations > Backup.
Note that the backup can take some time to complete.
Once the backup is created, unzip it.
Keep in mind that the backup and the unzipped copy of the backup can become very large on disk, so make sure you have enough disk space. Keeping the backup offline, execute oak-run’s index operation against it, using the re-index flag, the checkpoint flag with the checkpoint ID generated from the JMX MBean, and a comma-delimited list of the AEM indexes to re-index. Oak-run will output the progress as indexing takes place, and when complete, we’ll be notified and a path will be provided to the raw index files, which can then be imported into AEM.
Back in the running AEM’s JMX console, import the generated index files via the indexer MBean.
The index import is immediate, and our running AEM instance now reflects the newly indexed indexes.
  • Out-of-band indexing on TarMK using oak-run.jar minimizes the impact of indexing on in-use AEM instances.
  • Out-of-band indexing is the recommended indexing approach for AEM installations where the time to re/index exceeds the available maintenance windows.
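
For the out-of-band approach, the oak-run step targets the unzipped clone rather than the running instance. The dry run below is a sketch only: `CLONE_DIR` is a hypothetical location for the unzipped backup, the layout beneath it is assumed to mirror a normal AEM install, the checkpoint value is a placeholder, the index paths are examples, and the flag names follow Oak's oak-run documentation and should be confirmed against the help output.

```shell
# Hypothetical location of the unzipped backup; adjust to where the clone lives.
CLONE_DIR="/path/to/unzipped-backup"

# Assumed repository layout inside the clone.
DATASTORE="$CLONE_DIR/crx-quickstart/repository/datastore"
SEGMENTSTORE="$CLONE_DIR/crx-quickstart/repository/segmentstore"

# Placeholder: the checkpoint created on the running instance before cloning.
CHECKPOINT="REPLACE_WITH_CHECKPOINT_ID"

# Example index paths only; list the indexes to re-index, comma-delimited.
INDEX_PATHS="/oak:index/lucene,/oak:index/damAssetLucene"

OOB_CMD="java -jar oak-run*.jar index --reindex --index-paths=$INDEX_PATHS --checkpoint=$CHECKPOINT --fds-path=$DATASTORE $SEGMENTSTORE"

# Dry run: print the command so it can be checked before being executed.
echo "$OOB_CMD"
```

The generated index files are then imported into the running instance through the indexer MBean, as with online indexing.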

MongoMK Online indexing with oak-run.jar

  • Online indexing with oak-run.jar is the recommended method for re/indexing MongoMK (and RDBMK) AEM installations. No other method should be used for MongoMK or RDBMK.
  • This indexing needs to be executed only against a single AEM instance in the cluster.
  • Online indexing of MongoMK is safe to execute against a running AEM cluster, as the repository traversal will occur on only a single MongoDB node, allowing the others to continue serving requests without significant performance impact.

The oak-run.jar index command to perform online indexing of MongoMK is the same as for TarMK online indexing with oak-run.jar, except that the segment store parameter points to the MongoDB instance that contains the node store.

java -jar oak-run*.jar index
 --fds-path=/path/to/datastore mongodb://server:port/aem

Supporting materials