Copying many blobs with the Content Transfer Tool (CTT) may take multiple days.
To speed up the extraction and ingestion phases of the content transfer activity to move content to AEM as a Cloud Service, CTT can use AzCopy as an optional pre-copy step. This pre-copy step can be used when the source AEM instance is configured to use an Amazon S3, Azure Blob Storage data store, or File Data Store. The pre-copy step is most effective for the first full extraction and ingestion. However, using pre-copy for subsequent top-ups is not recommended (if the top-up size is less than 200 GB) because it may add time to the entire process. Once this pre-step is configured, in the extraction phase, AzCopy copies blobs from Amazon S3, Azure Blob Storage, or File data store to the migration set blob store. In the ingestion phase, AzCopy copies blobs from the migration set blob store to the destination AEM as a Cloud Service blob store.
Follow the section below to understand the important considerations before starting:
Starting from version 2.0.16 of CTT, the precopy setup is done automatically when the bundle is installed. Also, if the migration set size is greater than 200 GB, the extraction process automatically uses the precopy feature. The azcopy.config file is created in the crx-quickstart/cloud-migration/ directory. You do not need to manually do the precopy setup if you are using CTT version 2.0.16 or later.
Source AEM version must be 6.3 - 6.5.
Source AEM’s data store is configured to use Amazon S3 or Azure Blob Storage. For more details, see Configuring node stores and data stores in AEM 6.
Each migration set copies the entire data store, so only a single migration set should be used.
You need access to install AzCopy on the instance (or VM) running the source AEM instance.
Data Store Garbage Collection has been run within the previous seven days on the source. For more details, see Data store garbage collection.
There is a cost associated with transferring data out of Amazon S3 and Azure Blob Storage. The transfer cost is relative to the total amount of data in your existing storage container (whether referenced in AEM, or not). See Amazon S3 and Azure Blob Storage for more details.
You need either an access key and secret key pair for the existing source Amazon S3 bucket, or a SAS URI for the existing source Azure Blob Storage container (read-only access is fine).
The local system must have free space strictly greater than 1/256 of the size of the source datastore. For example, if the size of the datastore is 3 terabytes, free space greater than 11.72 GB must exist in the crx-quickstart/cloud-migration folder on the source for AzCopy to work. At a minimum, the source system should have 1 GB of free space. You can check free space with the df -h command on Linux® instances and the dir command on Windows instances.
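The free-space rule above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of CTT: the function names are hypothetical, decimal units are assumed (so that 3 TB / 256 gives the 11.72 GB quoted above), and treating the 1 GB minimum as a floor on the 1/256 threshold is an interpretation of the text.

```python
import shutil

GB = 1000 ** 3  # decimal gigabyte (assumption, matches the 3 TB -> 11.72 GB example)


def required_free_bytes(datastore_bytes: int) -> float:
    """Threshold to exceed: 1/256 of the datastore size, floored at 1 GB (assumed)."""
    return max(datastore_bytes / 256, 1 * GB)


def has_enough_space(path: str, datastore_bytes: int) -> bool:
    """True if free space on the filesystem holding `path` is strictly greater
    than the threshold computed by required_free_bytes()."""
    free = shutil.disk_usage(path).free
    return free > required_free_bytes(datastore_bytes)


if __name__ == "__main__":
    three_tb = 3 * 1000 ** 4
    print(f"Need more than {required_free_bytes(three_tb) / GB:.2f} GB free")
```

To use it, pass the crx-quickstart/cloud-migration folder of the source instance as `path` and the datastore size in bytes.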
Each time extraction is run with AzCopy enabled, the entire file datastore is flattened and copied to the cloud migration container. If your migration set is smaller than the size of your datastore, then AzCopy extraction is not the optimal approach.
Once AzCopy has been used to copy over the existing datastore, disable it for delta or top-up extractions.
Starting from version 2.0.16 of CTT, the precopy setup is done automatically when the bundle is installed. Also, if the migration set size is greater than 200 GB, the extraction process automatically uses the precopy feature. The azcopy.config file is created in the crx-quickstart/cloud-migration/ directory. If you would like to update the configuration of the file manually, review the sections below.
Follow this section to learn how to set up AzCopy as a pre-copy step with the Content Transfer Tool when migrating content to AEM as a Cloud Service:
It is important to determine the total size of the data store for two reasons:
For Azure Blob Storage, from the existing container properties page in the Azure portal, use the Calculate size button to determine the size of all content in the container.
For Amazon S3, you can use the container’s Metrics tab to determine the size of all content in the container.
For Mac and UNIX® systems, run the du command on the datastore directory to get its size:
du -sh [path to datastore on the instance]
For example, if your datastore is at /mnt/author/crx-quickstart/repository/datastore, the following command gets you its size:
du -sh /mnt/author/crx-quickstart/repository/datastore
For Windows, use the dir command on the datastore directory to get its size:
dir /a/s [location of datastore]
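If you prefer a single script that works on both UNIX® and Windows, the same measurement can be approximated in Python. This is a rough sketch with a hypothetical function name. It sums apparent file sizes, so the result can differ slightly from what du reports, because du counts allocated disk blocks.

```python
import os


def datastore_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file disappeared or is unreadable; skip it.
                continue
    return total


if __name__ == "__main__":
    # Example path taken from the text above; adjust it for your instance.
    size = datastore_size_bytes("/mnt/author/crx-quickstart/repository/datastore")
    print(f"{size / 1000 ** 3:.2f} GB")
```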
AzCopy is a command-line tool provided by Microsoft® that must be available on the source instance to enable this feature.
In short, you want to download the Linux® x86-64 binary from the AzCopy docs page and untar it to a location such as /usr/bin.
Make note of where you placed the binary, because you need the full path to it in a later step.
The most recently released version of CTT should be used.
AzCopy support for Amazon S3, Azure Blob Storage, and File Data Store is included in the latest CTT release.
You can download the latest release of CTT from the Software Distribution portal.
Only versions 2.0.0 and higher are supported, and using the most recent version is advisable.
On the source AEM instance, in crx-quickstart/cloud-migration, create a file called azcopy.config.
The contents of this config file differ depending on whether your source AEM instance uses an Azure Blob Storage data store, an Amazon S3 data store, or a File Data Store.
Your azcopy.config file should include the following properties (make sure to use the correct azCopyPath and azureSas for your instance).
If you do not want to grant write access to the existing blob storage container, you can generate a new SAS URI that has only Read and List permissions.
Your azcopy.config file should include the following properties (make sure to use the correct values for your instance).
If your instance uses IAM Roles to enable AEM to access S3, you must create a policy and user with the ListBucket and GetObject actions enabled for the S3 bucket. Once set up, use this user’s access key and secret key.
azCopyPath=/usr/bin/azcopy
s3Bucket=aem-63
s3Region=us-west-2
s3AccessKey=--REDACTED--
s3SecretKey=--REDACTED--
Your azcopy.config file must contain the azCopyPath property, and can optionally contain a repository.home property that points to the location of the file datastore. Use the correct values for your instance.
File Data Store
The azCopyPath property must contain the full path of the location where the azCopy command-line tool is installed on the source AEM instance. If the azCopyPath property is missing, the blob precopy step is not performed.
If the repository.home property is missing from azcopy.config, then the default datastore location /mnt/crx/author/crx-quickstart/repository/datastore is used to perform precopy.
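Before starting an extraction, it can help to sanity-check azcopy.config for the properties described above. The sketch below is an illustration only and does not reflect how CTT itself reads the file: the function names are hypothetical, and the file is assumed to be simple key=value lines. Only the property names mentioned in this document are checked.

```python
from typing import Dict, List

S3_KEYS = ["s3Bucket", "s3Region", "s3AccessKey", "s3SecretKey"]


def parse_properties(text: str) -> Dict[str, str]:
    """Parse key=value lines; split on the first '=' so values such as
    SAS URIs, which contain '=' themselves, stay intact."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_config(props: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    if not props.get("azCopyPath"):
        problems.append("azCopyPath is missing, so the blob pre-copy step is not performed")
    present = [k for k in S3_KEYS if k in props]
    if present and len(present) < len(S3_KEYS):
        missing = [k for k in S3_KEYS if k not in props]
        problems.append("incomplete S3 settings, missing: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    sample = "azCopyPath=/usr/bin/azcopy\ns3Bucket=aem-63\ns3Region=us-west-2\n"
    print(check_config(parse_properties(sample)))
```

A check like this only catches missing properties. It cannot confirm that the credentials or the SAS URI are valid, so the extraction log remains the place to verify that pre-copy actually started.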
With the above configuration file in place, the AzCopy pre-copy phase runs as part of every subsequent extraction. To prevent it from running, you can rename this file or remove it.
If AzCopy is not configured correctly, you would see the following message in the logs:
INFO c.a.g.s.m.c.a.AzCopyCloudBlobPreCopy - Blob pre-copy is not supported.
Confirm that the following line is printed in the extraction log:
c.a.g.s.m.commons.ContentExtractor - *************** Beginning AzCopy Pre-Copy phase ***************
Congratulations! This log entry means that your configuration was considered valid and that AzCopy is copying all blobs from the source container to the migration container.
The log entries from AzCopy appear in the extraction log, and are prefixed with c.a.g.s.m.c.azcopy.AzCopyBlobPreCopy - [AzCopy pre-copy]
For the first few minutes of an extraction, watch the extraction logs closely for any sign of an issue. As an example, here is what would be logged if the source Azure container could not be found:
[AzCopy pre-copy] failed to perform copy command due to error: cannot start job due to error: cannot list files due to reason > github.com/Azure/azure-storage-blob-go/azblob.newStorageError, github.com/Azureemail@example.com/azblob/zc_storage_error.go:42
[AzCopy pre-copy] ===== RESPONSE ERROR (ServiceCode=ContainerNotFound) =====
[AzCopy pre-copy] Description=The specified container does not exist.
[AzCopy pre-copy] RequestId:5fb674b9-201e-001b-2a5b-527400000000
[AzCopy pre-copy] Time:2021-05-26T18:18:07.5931967Z, Details:
[AzCopy pre-copy] Code: ContainerNotFound
If there is an issue with AzCopy, the extraction fails immediately, and the extraction logs contain detail on the failure.
Any blobs that were copied before the error are skipped automatically by AzCopy on subsequent runs, and do not need to be copied again.
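Watching the log by hand works, but the three messages quoted in this section are distinctive enough to scan for automatically. The following is a small sketch with a hypothetical function name and hypothetical status labels. It matches only the message fragments shown in this document, case-insensitively, and makes no claim about other messages CTT may emit.

```python
from typing import Iterable


def precopy_status(lines: Iterable[str]) -> str:
    """Classify an extraction log by the AzCopy pre-copy messages it contains.

    Returns one of: "not_supported", "failed", "started", "not_seen".
    """
    status = "not_seen"
    for line in lines:
        lower = line.lower()
        if "blob pre-copy is not supported" in lower:
            return "not_supported"
        if "[azcopy pre-copy] failed to perform copy command" in lower:
            return "failed"
        if "beginning azcopy pre-copy phase" in lower:
            status = "started"
    return status


if __name__ == "__main__":
    sample = [
        "c.a.g.s.m.commons.ContentExtractor - *************** Beginning AzCopy Pre-Copy phase ***************",
    ]
    print(precopy_status(sample))
```

You could feed it the lines of the extraction log during the first few minutes of a run and stop the run early if the result is "not_supported" or "failed".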
An ingestion can now be scheduled to start automatically immediately after an extraction succeeds. See Ingesting Content into Target for more information.
When AzCopy is running against a source File Data Store, you should see messages like these in the logs, indicating that folders are being processed:
c.a.g.s.m.c.a.AzCopyFileSourceBlobPreCopy - [AzCopy pre-copy] Processing folder (1/24) crx-quickstart/repository/datastore/5d
See Ingesting Content into Target for general information about ingesting content into the target from the Cloud Acceleration Manager (CAM), including instructions on whether and how to use AzCopy (pre-copy) in the “New Ingestion” dialog.
To take advantage of AzCopy during ingestion, Adobe requires that you are on an AEM as a Cloud Service version that is at least version 2021.6.5561.
To follow the progress, see the “Ingestion Jobs” list in the Cloud Acceleration Manager and the ingestion’s logs. The log entries for successful AzCopy tasks look like the following (allowing for some differences). Checking the logs occasionally can alert you to problems early on and help you resolve them quickly.
*************** Beginning AzCopy pre-copy phase ***************
INFO: Scanning...
INFO: Failed to create one or more destination container(s). Your transfers may still succeed if the container already exists.
INFO: Any empty folders will not be processed, because source and/or destination does not have full folder support
INFO: azcopy: A newer version 10.11.0 is available to download
Job 419d98da-fc05-2a45-70cc-797fee632031 has started
Log file is located at: /root/.azcopy/419d98da-fc05-2a45-70cc-797fee632031.log
0.0 %, 0 Done, 0 Failed, 886 Pending, 0 Skipped, 886 Total,
Job 419d98da-fc05-2a45-70cc-797fee632031 summary
Elapsed Time (Minutes): 0.0334
Number of File Transfers: 886
Number of Folder Property Transfers: 0
Total Number of Transfers: 886
Number of Transfers Completed: 17
Number of Transfers Failed: 0
Number of Transfers Skipped: 869
TotalBytesTransferred: 248350
Final Job Status: CompletedWithSkipped
*************** Completed AzCopy pre-copy phase ***************
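The job summary in the log above has a regular label-and-value shape, so the counters can be pulled out with a short script when you want to track failed or skipped transfers across runs. This is an illustrative sketch: the function name and the result keys are hypothetical, and only the labels visible in the sample log are parsed.

```python
import re
from typing import Dict, Union

# Labels copied from the sample summary; the short keys are arbitrary.
COUNTERS = {
    "Total Number of Transfers": "total",
    "Number of Transfers Completed": "completed",
    "Number of Transfers Failed": "failed",
    "Number of Transfers Skipped": "skipped",
}


def parse_summary(log_text: str) -> Dict[str, Union[int, str]]:
    """Extract transfer counters and the final job status from AzCopy summary text."""
    result: Dict[str, Union[int, str]] = {}
    for label, key in COUNTERS.items():
        match = re.search(re.escape(label) + r":\s*(\d+)", log_text)
        if match:
            result[key] = int(match.group(1))
    status = re.search(r"Final Job Status:\s*(\w+)", log_text)
    if status:
        result["status"] = status.group(1)
    return result


if __name__ == "__main__":
    sample = (
        "Total Number of Transfers: 886\n"
        "Number of Transfers Completed: 17\n"
        "Number of Transfers Failed: 0\n"
        "Number of Transfers Skipped: 869\n"
        "Final Job Status: CompletedWithSkipped\n"
    )
    print(parse_summary(sample))
```

In the sample, 17 completed plus 869 skipped plus 0 failed adds up to the 886 total, which is consistent with the note that blobs copied in earlier runs are skipped rather than copied again.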
You have now learned how to handle large content repositories to speed up the extraction and ingestion phases of the content transfer activity when moving content to AEM as a Cloud Service. You are now ready to learn the extraction process using the Content Transfer Tool. See Extracting Content from Source in Content Transfer Tool to learn how to extract your migration set.