CRX2Oak is a tool designed to migrate data between different repositories.
It can be used to migrate data from older CQ versions based on Apache Jackrabbit 2 to Oak, and it can also be used to copy data between Oak repositories.
You can download the newest version of crx2oak from the public Adobe repository at this location:
The list of changes and fixes for the newest version can be found in the CRX2Oak Release Notes.
For more information on Apache Oak and key concepts of AEM persistence, see Introduction to the AEM Platform.
The tool can be used for:
Support for migrating repositories using external Blob Stores (commonly known as Data Stores) is provided in different combinations. One possible migration path is from a CRX2 repository that is using an external
FileDataStore to an Oak repository using a
The diagram below illustrates all the migration combinations supported by CRX2Oak:
During AEM upgrades, CRX2Oak is called in a way that lets the user specify a predefined migration profile, which automates the reconfiguration of persistence modes. This is called the quickstart mode.
It can also be run separately if the migration requires more customization. Note, however, that in this mode changes are made only to the repository, and any additional reconfiguration of AEM must be performed manually. This is called the standalone mode.
Another thing to note is that with the default settings in standalone mode, only the Node Store will be migrated and the new repository will re-use the old binary storage.
Since AEM 6.3, CRX2Oak can handle user-defined migration profiles that can be configured with all the migration options already available. This provides both more flexibility and the ability to automate the configuration of AEM, neither of which is available when you use the tool in standalone mode.
To switch CRX2Oak to quickstart mode, define the path to the crx-quickstart folder of the AEM installation directory in this operating system environment variable:
For UNIX based systems and macOS:
The migration can be interrupted at any time, with the possibility to resume it afterwards.
Custom Java logic can also be implemented using
RepositoryInitializer classes can be implemented in order to initialize the repository with custom values.
CRX2Oak also supports memory mapped operations by default. Memory mapping greatly improves performance and should be used whenever possible.
Note, however, that memory mapped operations are not supported on Windows platforms. Therefore, it is recommended to add the --disable-mmap parameter when performing the migration on Windows.
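As a rough sketch of the recommendation above, a wrapper script could add --disable-mmap only when it detects a Windows shell environment. The mmap_flag helper, the uname patterns (Git Bash, MSYS, Cygwin), and the repository paths are assumptions made for illustration; they are not part of CRX2Oak.

```shell
# Hypothetical helper: print --disable-mmap for Windows-like uname strings,
# and nothing for other platforms.
mmap_flag() {
  case "$1" in
    MINGW*|MSYS*|CYGWIN*) echo "--disable-mmap" ;;
    *) echo "" ;;
  esac
}

# Assumed source and destination repository paths.
SRC="crx-quickstart/repository"
DST="crx-quickstart/oak-repository"

# Dry run: print the command instead of executing it.
# The unquoted substitution disappears entirely when the helper prints nothing.
echo java -jar crx2oak.jar "$SRC" "$DST" $(mmap_flag "$(uname -s)")
```

Remove the leading echo to run the migration for real once the paths match your installation.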
By default, the tool migrates the whole repository under the
"/" path. However, you have complete control over which content should be migrated.
If there is any part of the content that is not required on the new instance, you can use the
--exclude-paths parameter to exclude the content and optimize the upgrade procedure.
If data needs to be copied between two repositories and you have a content path that is different on both instances, you can define it in the
--merge-paths parameter. Once you do, CRX2Oak copies only the new nodes to the destination repository and keeps the existing ones in place.
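The path options described above could be combined on one command line roughly as follows. This is a sketch under assumptions: the repository paths and content paths are hypothetical, and the --option=value form is assumed to be accepted in the same way as by the open source oak-upgrade tool.

```shell
# Assumed source and destination repository paths.
SRC="crx-quickstart/repository"
DST="crx-quickstart/oak-repository"

# Hypothetical content paths; replace them with paths from your repository.
EXCLUDE="/content/old-site,/var/audit"
MERGE="/content/shared"

# Build the argument list, then print it as a dry run.
set -- java -jar crx2oak.jar "$SRC" "$DST" \
  --exclude-paths="$EXCLUDE" \
  --merge-paths="$MERGE"
CMD="$*"
echo "$CMD"
# To execute instead of printing, run: "$@"
```

Keeping the comma-separated lists in variables makes it easier to review which content is skipped or merged before the migration starts.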
By default, AEM creates a version of each node or page that gets modified and stores it in the repository. The versions can then be used to restore the page to an earlier state.
However, these versions are never purged even if the original page is deleted. When dealing with repositories that have been in operation for a long time, the migration might need to process a lot of redundant data caused by orphaned versions.
A useful feature for these types of situations is the addition of the
--copy-versions parameter. It can be used to skip the version nodes during migration or copy of a repository.
You can also choose whether to copy orphaned versions by adding
Both parameters also support a
YYYY-MM-DD date format, in case you want to copy versions no later than a specific date.
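To illustrate the date form of the version options, the sketch below passes the same cutoff date to both parameters. The repository paths and the date are hypothetical, and the --option=value syntax is an assumption based on the open source oak-upgrade tool.

```shell
# Assumed source and destination repository paths.
SRC="crx-quickstart/repository"
DST="crx-quickstart/oak-repository"

# Hypothetical cutoff date in the YYYY-MM-DD format.
CUTOFF="2015-01-01"

# Build the argument list, then print it as a dry run.
# Assumption: passing false instead of a date skips versions entirely,
# as documented for oak-upgrade; verify this against your CRX2Oak version.
set -- java -jar crx2oak.jar "$SRC" "$DST" \
  --copy-versions="$CUTOFF" \
  --copy-orphaned-versions="$CUTOFF"
CMD="$*"
echo "$CMD"
# To execute instead of printing, run: "$@"
```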
An open source version of CRX2Oak is available in the form of oak-upgrade. It supports all the features except for:
See the Apache Documentation for more information.
--cache: Cache size in MB (default is
--mmap: Enable memory mapped file access for Segment Store
--src-password: Password for the source RDB database
--src-user: User for the source RDB
--user: User for the target RDB
--password: Password for the target RDB.
--early-shutdown: Shuts down the source JCR2 repository after nodes are copied and before the commit hooks are applied
--fail-on-error: Forces a failure of the migration if the nodes cannot be read from the source repository.
--ldap: Migrates LDAP users from a CQ 5.x instance to an Oak based one. In order for this to work, the Identity Provider in the Oak configuration needs to be named ldap. For more information, see the LDAP documentation.
--ldap-config: Use this in conjunction with the
--ldap parameter for CQ 5.x repositories that used multiple LDAP servers for authentication. You can use it to point to the CQ 5.x
jaas.conf configuration files. The format is
--copy-orphaned-versions: Skips copying orphaned versions. Parameters supported are:
yyyy-mm-dd. Defaults to
--copy-versions: Copies the version storage. Parameters:
yyyy-mm-dd. Defaults to
--include-paths: Comma-separated list of paths to include during copy
--merge-paths: Comma-separated list of paths to merge during copy
--exclude-paths: Comma-separated list of paths to exclude during copy.
--src-datastore: The datastore directory to be used as a source
--src-fileblobstore: The datastore directory to be used as a source
--src-s3datastore: The datastore directory to be used for the source
--src-s3config: The configuration file for the source
--datastore: The datastore directory to be used as a target
--fileblobstore: The datastore directory to be used as a target
--s3datastore: The datastore directory to be used for the target
--s3config: The configuration file for the target
-?, -h, --help: Shows help information.
You can also enable debug information for the migration process in order to troubleshoot any issues that might appear during the process. You can do this differently depending on the mode you wish to run the tool in:
Quickstart mode: You can add the --log-level TRACE or --log-level DEBUG options to the command line when running CRX2Oak. In this mode, logs are automatically redirected to the upgrade.log file.
Standalone mode: Add the --trace option to the CRX2Oak command line to show TRACE events on standard output. You need to redirect the logs yourself, using the '>' redirection character or the 'tee' command, if you want to inspect them later.
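For the case where logs must be redirected manually, a capture with tee could look like the sketch below. The repository paths, the log file location, and the RUNNER dry-run switch are assumptions for illustration only.

```shell
# Assumed source and destination repository paths.
SRC="crx-quickstart/repository"
DST="crx-quickstart/oak-repository"

# Hypothetical log file location.
LOG="${TMPDIR:-/tmp}/crx2oak-trace.log"

# RUNNER defaults to a dry run that only echoes the command.
# Set RUNNER="" (empty) in the environment to execute the real tool.
RUNNER="${RUNNER-echo}"

# Show standard output and standard error on the console and
# keep a copy in the log file for later inspection.
$RUNNER java -jar crx2oak.jar "$SRC" "$DST" --trace 2>&1 | tee "$LOG"
```

Using tee rather than '>' keeps the progress visible in the terminal while the same output is written to the file.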
When migrating to a MongoDB replica set, make sure you set the
WriteConcern parameter to
2 on all connections to the Mongo databases.
You can do this by adding the
w=2 parameter at the end of the connection string, like this:
java -Xmx4092m -XX:MaxPermSize=1024m -jar crx2oak.jar crx-quickstart/repository/ "mongodb://localhost:27017/aem-author?replicaset=replica1&w=2"
For more information, see the MongoDB Connection String documentation on Write Concerns.
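If connection strings are assembled in a script, a small helper can make sure the w=2 option is present. The with_write_concern function below is a hypothetical sketch, not part of CRX2Oak, and it assumes the URI already contains a database path such as /aem-author.

```shell
# Hypothetical helper: append w=2 to a MongoDB connection string unless a
# write concern option is already present.
with_write_concern() {
  case "$1" in
    *\?w=*|*\&w=*) echo "$1" ;;      # a w= option is already set: keep it
    *\?*)          echo "$1&w=2" ;;  # other options exist: append with &
    *)             echo "$1?w=2" ;;  # no options yet: start the query string
  esac
}

URI=$(with_write_concern "mongodb://localhost:27017/aem-author?replicaset=replica1")
echo "$URI"
```

Always quote the resulting URI when passing it to the tool, because an unquoted & is interpreted by the shell as a request to run the command in the background.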