Handling Large Content Repositories


Copying a large number of blobs with the Content Transfer Tool (CTT) may take multiple days.
To significantly speed up the extraction and ingestion phases of a content transfer to AEM as a Cloud Service, CTT can use AzCopy as an optional pre-copy step. The pre-copy step is available when the source AEM instance is configured to use an Amazon S3 data store, an Azure Blob Storage data store, or a File Data Store. It is most effective for the first full extraction and ingestion. Using pre-copy for subsequent top-ups is not recommended when the top-up size is less than 200 GB, because it may add time to the overall process.

Once the pre-copy step is configured, AzCopy copies blobs from the Amazon S3, Azure Blob Storage, or File Data Store to the migration set blob store during the extraction phase. During the ingestion phase, AzCopy copies blobs from the migration set blob store to the destination AEM as a Cloud Service blob store.

Important Considerations before you Start

Review the following considerations before you start:

  • The source AEM version must be 6.3 through 6.5.

  • The source AEM data store is configured to use Amazon S3, Azure Blob Storage, or a File Data Store. For more details, refer to Configuring node stores and data stores in AEM 6.

  • Each migration set will copy the entire data store, so only a single migration set should be used.

  • You will need access to install AzCopy on the instance (or VM) running the source AEM instance.

  • Data Store Garbage Collection has been run within the previous 7 days on the source. For more details, refer to Data store garbage collection.

Additional Considerations if source AEM instance is configured to use an Amazon S3 or Azure Blob Storage Data Store

  • Since there is a cost associated with transferring data out of both Amazon S3 and Azure Blob Storage, the transfer cost will be relative to the total amount of data in your existing storage container (whether referenced in AEM, or not). Refer to Amazon S3 and Azure Blob Storage for more details.

  • You will need either an access key & secret key pair for the existing source Amazon S3 bucket, or a SAS URI for the existing source Azure Blob Storage container (read only access is fine).

Additional considerations if source AEM instance is configured to use File Data Store

  • The local system must have free space strictly greater than 1/256 of the size of the source datastore. For example, if the datastore is 3 TB, more than 11.72 GB of free space must exist in the crx-quickstart/cloud-migration folder on the source for AzCopy to work. At a minimum, the source system should have 1 GB of free space. To check free space, use the df -h command on Linux instances and the dir command on Windows instances.

  • Each time extraction is run with AzCopy enabled, the entire file datastore is flattened and copied to the cloud migration container. If your migration set is significantly smaller than the size of your datastore, then AzCopy extraction is not the optimal approach.

  • Once AzCopy has been used to copy over the existing datastore, disable it for delta or top-up extractions.
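
The free-space rule for a File Data Store can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of CTT: the function names are hypothetical, decimal units are assumed (which matches the 3 TB to 11.72 GB example), and treating the 1 GB minimum as a floor on the threshold is an interpretation of the text above.

```python
import shutil

GB = 10**9  # decimal gigabyte (assumption; matches the 3 TB -> 11.72 GB example)


def required_free_bytes(datastore_bytes: int) -> float:
    """Return the threshold that free space must strictly exceed.

    Uses 1/256 of the datastore size, with 1 GB as a floor
    (the floor is an interpretation of the guidance, not a documented rule).
    """
    return max(datastore_bytes / 256, 1 * GB)


def has_enough_space(free_bytes: int, datastore_bytes: int) -> bool:
    """True when free space is strictly greater than the threshold."""
    return free_bytes > required_free_bytes(datastore_bytes)


if __name__ == "__main__":
    # "." is a placeholder; point this at crx-quickstart/cloud-migration.
    free = shutil.disk_usage(".").free
    needed = required_free_bytes(3 * 10**12)  # 3 TB datastore
    print(f"need more than {needed / GB:.2f} GB, have {free / GB:.2f} GB")
```

The comparison is strict because the guidance asks for free space strictly greater than the threshold.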

Setting up to Use AzCopy as a Pre-Copy Step

Follow this section to learn how to set up to use AzCopy as a pre-copy step with Content Transfer Tool to migrate the content to AEM as a Cloud Service:

0. Determine total size of all content in the data store

It is important to determine the total size of the data store for two reasons:

  • If the source AEM is configured to use a File Data Store, the local system must have free space strictly greater than 1/256 of the size of the source data store.

Azure Blob Storage Data Store

From the existing container properties page in the Azure portal, use the Calculate size button to determine the size of all content in the container.


Amazon S3 Data Store

You can use the bucket’s Metrics tab to determine the size of all content in the bucket.


File Data Store

  • For macOS and UNIX systems, run the du command on the datastore directory to get its size:
    du -sh [path to datastore on the instance]. For example, if your datastore is located at /mnt/author/crx-quickstart/repository/datastore, the following command returns its size: du -sh /mnt/author/crx-quickstart/repository/datastore.

  • For Windows, use the dir command on the datastore directory to get its size:
    dir /a/s [location of datastore].
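
Where a single cross-platform check is preferred over du and dir, the datastore size can also be summed with a short script. This is a hedged sketch using only the Python standard library; the function name is hypothetical, and it adds up apparent file sizes, so the result may differ slightly from the disk usage that du reports.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # "." is a placeholder; replace it with the path to your datastore.
    print(f"{directory_size_bytes('.') / 10**9:.2f} GB")
```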

1. Install AzCopy

AzCopy is a command-line tool provided by Microsoft that needs to be available on the source instance to enable this feature.

In most cases, download the Linux x86-64 binary from the AzCopy docs page and extract it to a location such as /usr/bin.


Make note of where you placed the binary, as you will need the full path to it in a later step.

2. Install Content Transfer Tool (CTT) release with AzCopy support


The most recently released version of CTT should be used.

AzCopy support for Amazon S3, Azure Blob Storage, and File Data Store is included in the latest CTT release.
You can download the latest release of CTT from the Software Distribution portal.

3. Configure an azcopy.config file

On the source AEM instance, in crx-quickstart/cloud-migration, create a new file called azcopy.config.


The contents of this config file differ depending on whether your source AEM instance uses an Azure Blob Storage data store, an Amazon S3 data store, or a File Data Store.

Azure Blob Storage Data Store

Your azcopy.config file should include the azCopyPath and azureSas properties. Make sure to use the correct values for your instance.


If you’d rather not grant write access to the existing blob storage container, you can generate a new SAS URI which only has Read and List permissions.


Amazon S3 Data Store

Your azcopy.config file should include the following properties (make sure to use the correct values for your instance).


If your instance uses IAM Roles to enable AEM to access S3, you will need to create a policy and user with the ListBucket and GetObject actions enabled for the S3 bucket. Once set up, use this user’s access key and secret key.
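
A read-only policy with the two actions named above could look like the following sketch. It builds a standard IAM policy document as a Python dictionary and prints it as JSON. The bucket name example-aem-datastore and the function name are hypothetical, and the source only names ListBucket and GetObject, so treat this as a starting point rather than a verified minimum set of permissions.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy allowing ListBucket and GetObject on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # GetObject applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-aem-datastore" is a placeholder bucket name.
    print(json.dumps(read_only_bucket_policy("example-aem-datastore"), indent=2))
```

The two statements are split because ListBucket is granted on the bucket ARN while GetObject is granted on the object ARNs beneath it.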


File Data Store

Your azcopy.config file must contain the azCopyPath property, and an optional repository.home property that points to the location of the file datastore. Use the correct values for your instance.


The azCopyPath property must contain the full path to the location where the AzCopy command-line tool is installed on the source AEM instance. If the azCopyPath property is missing, the blob pre-copy step will not be performed.

If the repository.home property is missing from azcopy.config, the default datastore location /mnt/crx/author/crx-quickstart/repository/datastore is used to perform the pre-copy.
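
The rules above for a File Data Store can be expressed as a small validation routine. This is an illustrative sketch only: it assumes azcopy.config uses one key=value pair per line, which this page does not confirm, and the function names are hypothetical. The property names and the default location come from the text above.

```python
from typing import Optional, Tuple

# Default datastore location stated in the documentation above.
DEFAULT_REPOSITORY_HOME = "/mnt/crx/author/crx-quickstart/repository/datastore"


def parse_config(text: str) -> dict:
    """Parse key=value lines (assumed format); skip blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def file_datastore_settings(text: str) -> Optional[Tuple[str, str]]:
    """Return (azCopyPath, datastore location), or None if pre-copy is skipped."""
    props = parse_config(text)
    az_copy_path = props.get("azCopyPath")
    if not az_copy_path:
        return None  # without azCopyPath, the pre-copy step is not performed
    return az_copy_path, props.get("repository.home", DEFAULT_REPOSITORY_HOME)


if __name__ == "__main__":
    print(file_datastore_settings("azCopyPath=/usr/bin/azcopy\n"))
```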

4. Extracting with AzCopy

With the above configuration file in place, the AzCopy pre-copy phase will run as part of every subsequent extraction. To prevent it from running, you can rename this file or remove it.


If AzCopy is not configured correctly, the following message appears in the logs:
INFO c.a.g.s.m.c.a.AzCopyCloudBlobPreCopy - Blob pre-copy is not supported.

  1. Begin an extraction from the CTT UI. Refer to Getting Started with Content Transfer Tool and the Extraction Process for more details.

  2. Confirm the following line is printed in the extraction log:

c.a.g.s.m.commons.ContentExtractor - *************** Beginning AzCopy Pre-Copy phase ***************

Congratulations! This log entry means that your configuration was considered valid and that AzCopy is currently copying all blobs from the source container to the migration container.

The log entries from AzCopy will appear in the extraction log, and will be prefixed with c.a.g.s.m.c.azcopy.AzCopyBlobPreCopy - [AzCopy pre-copy]


For the first few minutes of an extraction, watch the extraction logs closely for any sign of an issue. As an example, here is what would be logged if the source Azure container could not be found:

[AzCopy pre-copy] failed to perform copy command due to error: cannot start job due to error: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azure/azure-storage-blob-go@v0.10.1-0.20210407023846-16cf969ec1c3/azblob/zc_storage_error.go:42
[AzCopy pre-copy] ===== RESPONSE ERROR (ServiceCode=ContainerNotFound) =====
[AzCopy pre-copy] Description=The specified container does not exist.
[AzCopy pre-copy] RequestId:5fb674b9-201e-001b-2a5b-527400000000
[AzCopy pre-copy] Time:2021-05-26T18:18:07.5931967Z, Details:
[AzCopy pre-copy] Code: ContainerNotFound

If AzCopy encounters an issue, the extraction fails immediately, and the extraction logs contain details on the failure.

Any blobs which were copied prior to the error will be skipped automatically by AzCopy on subsequent runs, and will not need to be copied again.
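
Watching the extraction log by hand can be supplemented with a simple scan for the messages quoted in this section. The sketch below is an assumption-laden helper, not a CTT feature: the function name and the returned status strings are hypothetical, and it matches only the three log fragments shown above.

```python
import re
from typing import Iterable

# Matches lines such as "===== RESPONSE ERROR (ServiceCode=ContainerNotFound) =====".
SERVICE_CODE = re.compile(r"RESPONSE ERROR \(ServiceCode=(\w+)\)")


def precopy_status(lines: Iterable[str]) -> str:
    """Classify the pre-copy state from extraction log lines."""
    status = "not started"
    for line in lines:
        if "Blob pre-copy is not supported" in line:
            return "not supported"
        if "Beginning AzCopy Pre-Copy phase" in line:
            status = "running"
        match = SERVICE_CODE.search(line)
        if match:
            return f"failed: {match.group(1)}"
    return status


if __name__ == "__main__":
    sample = [
        "c.a.g.s.m.commons.ContentExtractor - *************** Beginning AzCopy Pre-Copy phase ***************",
        "[AzCopy pre-copy] ===== RESPONSE ERROR (ServiceCode=ContainerNotFound) =====",
    ]
    print(precopy_status(sample))
```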

For File Data Store

When AzCopy is running against a source File Data Store, messages like the following appear in the logs, indicating that folders are being processed:
c.a.g.s.m.c.a.AzCopyFileSourceBlobPreCopy - [AzCopy pre-copy] Processing folder (1/24) crx-quickstart/repository/datastore/5d
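
The folder counter in that message can be turned into a rough progress figure. The following sketch assumes the log line keeps the "Processing folder (n/total) path" shape shown above; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Captures the current folder index, the folder count, and the folder path.
FOLDER_PROGRESS = re.compile(r"Processing folder \((\d+)/(\d+)\)\s+(\S+)")


def folder_progress(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (current, total, folder) from a progress line, or None."""
    match = FOLDER_PROGRESS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    line = (
        "c.a.g.s.m.c.a.AzCopyFileSourceBlobPreCopy - [AzCopy pre-copy] "
        "Processing folder (1/24) crx-quickstart/repository/datastore/5d"
    )
    current, total, folder = folder_progress(line)
    print(f"{current}/{total} folders started, now at {folder}")
```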

5. Ingesting with AzCopy

Refer to Ingesting Content into Target for general information about ingesting content into the target from the Cloud Acceleration Manager (CAM), including instructions on how to use AzCopy (pre-copy), or not, in the “New Ingestion” dialog.

To take advantage of AzCopy during ingestion, the target AEM as a Cloud Service environment must be on version 2021.6.5561 or later.

Refer to the “Ingestion Jobs” list in the Cloud Acceleration Manager and the ingestion logs to follow progress. Log entries for successful AzCopy tasks look similar to the following, allowing for some differences. Checking the logs occasionally can alert you to problems early and help you resolve them quickly.

*************** Beginning AzCopy pre-copy phase ***************
INFO: Scanning...
INFO: Failed to create one or more destination container(s). Your transfers may still succeed if the container already exists.
INFO: Any empty folders will not be processed, because source and/or destination doesn't have full folder support
INFO: azcopy: A newer version 10.11.0 is available to download

Job 419d98da-fc05-2a45-70cc-797fee632031 has started
Log file is located at: /root/.azcopy/419d98da-fc05-2a45-70cc-797fee632031.log

0.0 %, 0 Done, 0 Failed, 886 Pending, 0 Skipped, 886 Total,

Job 419d98da-fc05-2a45-70cc-797fee632031 summary
Elapsed Time (Minutes): 0.0334
Number of File Transfers: 886
Number of Folder Property Transfers: 0
Total Number of Transfers: 886
Number of Transfers Completed: 17
Number of Transfers Failed: 0
Number of Transfers Skipped: 869
TotalBytesTransferred: 248350
Final Job Status: CompletedWithSkipped

*************** Completed AzCopy pre-copy phase ***************
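
The job summary in the log above can be checked mechanically: completed, failed, and skipped transfers should add up to the total (17 + 0 + 869 = 886 in this sample). The sketch below parses only the summary portion of the log. The function names are hypothetical, and the field labels are copied from the sample output, which may differ between AzCopy versions.

```python
def parse_job_summary(text: str) -> dict:
    """Parse 'Label: value' lines from an AzCopy job summary into a dict."""
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            summary[key.strip()] = value.strip()
    return summary


def transfers_add_up(summary: dict) -> bool:
    """Check that completed + failed + skipped equals the total transfers."""
    parts = sum(
        int(summary[key])
        for key in (
            "Number of Transfers Completed",
            "Number of Transfers Failed",
            "Number of Transfers Skipped",
        )
    )
    return parts == int(summary["Total Number of Transfers"])


if __name__ == "__main__":
    sample = """Total Number of Transfers: 886
Number of Transfers Completed: 17
Number of Transfers Failed: 0
Number of Transfers Skipped: 869
Final Job Status: CompletedWithSkipped"""
    parsed = parse_job_summary(sample)
    print(parsed["Final Job Status"], transfers_add_up(parsed))
```

A large skipped count is expected on repeat runs, because blobs already present in the destination are not copied again.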

What’s Next

Now that you have learned how to handle large content repositories and speed up the extraction and ingestion phases of a content transfer to AEM as a Cloud Service, you are ready to learn about the extraction process. See Extracting Content from Source in Content Transfer Tool to learn how to extract your migration set from the Content Transfer Tool.
