When sizing the environment for an Adobe Experience Manager Assets implementation, it is important to ensure that there are sufficient resources available in terms of disk, CPU, memory, IO, and network throughput. Sizing many of these resources requires an understanding of how many assets are being loaded into the system. If a better metric is not available, you can divide the size of the existing library by the age of the library to find the rate at which assets are created.
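The fallback calculation described above can be sketched as follows. This is a minimal illustration, not an Adobe tool: the function name and the choice of gigabytes and months as units are assumptions made for the example.

```python
def asset_ingestion_rate(library_size_gb: float, library_age_months: float) -> float:
    """Approximate ingestion rate in GB per month.

    Divides the size of the existing library by its age, as suggested
    when no better metric is available. Units are illustrative only.
    """
    if library_age_months <= 0:
        raise ValueError("library_age_months must be positive")
    return library_size_gb / library_age_months


# Hypothetical example: a 1200 GB library built up over 24 months.
rate = asset_ingestion_rate(1200, 24)
print(f"Approximate ingestion rate: {rate} GB/month")
```

The result is an average. If ingestion is seasonal or has recently accelerated, the average may understate the peak rate that the environment must sustain.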
A common mistake made when sizing the required disk space for an Assets implementation is to base the calculations on the size of the raw images to be ingested into the system. By default, Experience Manager creates three renditions in addition to the original image for use in rendering the Experience Manager user interface elements. In previous implementations, these renditions have been observed to occupy twice as much space as the ingested assets.
Most users define custom renditions in addition to the out-of-the-box renditions. In addition to the renditions, Assets lets you extract sub-assets from common file types, such as Adobe InDesign and Adobe Illustrator.
Finally, versioning capabilities of Experience Manager store duplicates of the assets in the version history. You can configure the versions to be purged often. However, many users choose to retain versions in the system for a long time, which consumes additional storage space.
Considering these factors, you require a methodology to estimate, with acceptable accuracy, the storage space needed for user assets.
Performing the above steps helps you determine the following:
You can specify these numbers in the Network Sizing spreadsheet to determine the total space required for your datastore. It is also a useful tool to determine the impact of maintaining asset versions or modifying assets in Experience Manager on disk growth.
The example data populated in the tool demonstrates how important it is to perform the steps mentioned. If you size the datastore based solely on the raw images being loaded (1 TB), you may underestimate the repository size by a factor of 15.
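To see how renditions, sub-assets, and versions compound, consider the rough sketch below. It is not the formula used by the sizing spreadsheet. The function, its parameters, and the sample ratios are hypothetical, and the model makes the pessimistic assumption that every retained version stores a full copy of the asset together with its renditions and sub-assets.

```python
def estimate_datastore_gb(
    raw_gb: float,
    rendition_ratio: float,
    subasset_ratio: float,
    versions_retained: int,
) -> float:
    """Rough upper-bound estimate of datastore size in GB.

    rendition_ratio and subasset_ratio express the extra space as a
    multiple of the raw asset size (assumed values, to be measured on
    a real sample). Each retained version is assumed to duplicate the
    full footprint, which is a deliberately pessimistic simplification.
    """
    per_version_gb = raw_gb * (1 + rendition_ratio + subasset_ratio)
    return per_version_gb * (1 + versions_retained)


# Hypothetical inputs: 1000 GB of raw assets, renditions at 2x the raw
# size, sub-assets at 0.5x, and two retained versions per asset.
print(estimate_datastore_gb(1000, 2.0, 0.5, 2))
```

Even with modest inputs, the estimate is several times the raw asset size, which is why sizing on raw images alone is misleading. Replace the ratios with values measured on a representative sample of your own assets.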
For large datastores, you can implement a shared datastore either through a shared file datastore on a network attached drive or through an Amazon S3 datastore. In this case, individual instances do not need to maintain a copy of the binaries. In addition, a shared datastore facilitates binary-less replication and helps reduce the bandwidth used to replicate assets to publish environments.
The datastore can be shared between a primary and standby author instance to minimize the amount of time that it takes to update the standby instance with changes made in the primary instance. You can also share the datastore between the author and publish instances to minimize the traffic during replication.
Owing to some pitfalls, sharing a datastore is not recommended in all cases.
A shared datastore introduces a single point of failure in an infrastructure. Consider a scenario where your system has one author and two publish instances, each with its own datastore. If any one of them crashes, the other two can still continue running. However, if the datastore is shared, a single disk failure can take down the entire infrastructure. Therefore, ensure that you maintain a backup of the shared datastore from which you can restore the datastore quickly.
Deploying the AWS S3 service for shared datastores is preferred because it significantly reduces the probability of failure compared to normal disk architectures.
Shared datastores also increase the complexity of operations, such as garbage collection. Normally, garbage collection for a standalone datastore can be initiated with a single click. However, shared datastores require mark sweep operations on each member that uses the datastore, in addition to running the actual collection on a single node.
For AWS operations, implementing a single central location (via Amazon S3), rather than building a RAID array of EBS volumes, can significantly offset the complexity and operational risks on the system.
A shared datastore requires the binaries to be stored on a network-mounted drive that is shared between all instances. Because these binaries are accessed over a network, the system performance is adversely impacted. You can partially mitigate the impact by using a fast network connection to a fast array of disks. However, this is an expensive proposition. In AWS deployments, all disks are remote and require network connectivity. Ephemeral volumes lose data when the instance starts or stops.
Latency in S3 implementations is introduced by the background writing threads. Backup procedures must account for this latency. In addition, Lucene indexes may remain incomplete when making a backup. This applies to any time-sensitive file that is written to an S3 datastore and accessed from another instance.
It is difficult to arrive at precise sizing figures for a NodeStore or DocumentStore because of the resources consumed by the following:
Because the binaries are stored in the datastore, each binary occupies some space. Most repositories are below 100 GB in size. However, there may be larger repositories up to 1 TB in size. Additionally, to perform offline compaction, you require enough free space on the volume to rewrite the compacted repository alongside the pre-compacted version. A good rule of thumb is to size the disk to 1.5 times the size expected for the repository.
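The rule of thumb above amounts to a single multiplication, sketched here. The function name and the default factor argument are illustrative; only the 1.5 multiplier comes from the guidance above.

```python
def repository_disk_gb(expected_repo_gb: float, headroom_factor: float = 1.5) -> float:
    """Disk size to provision for the repository volume, in GB.

    The default factor of 1.5 follows the rule of thumb above and
    leaves room to rewrite the repository during offline compaction.
    """
    if expected_repo_gb < 0:
        raise ValueError("expected_repo_gb must not be negative")
    return expected_repo_gb * headroom_factor


# A repository expected to reach 100 GB needs about 150 GB of disk.
print(repository_disk_gb(100))
```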
For the repository, use SSDs or disks with an IOPS level greater than 3000. To eliminate chances of IOPS introducing performance bottlenecks, monitor CPU IO Wait levels for early signs of issues.
Assets has several use cases that make network performance more important than it is on many other Experience Manager projects. A customer can have a fast server, but if the network connection lacks the bandwidth to support the load of the users who are uploading and downloading assets, the system still appears slow. A good methodology for determining the choke point in a user's network connection to Experience Manager is described in Assets considerations for user experience, instance sizing, workflow evaluation, and network topology.
When sizing an implementation, it is important to keep system limitations in mind. If the proposed implementation exceeds these limitations, employ creative strategies, such as partitioning the assets across multiple Assets implementations.
File size is not the only factor that contributes to out of memory (OOM) issues. It also depends on dimensions of the image. You can avoid OOM issues by providing a higher heap size when you start Experience Manager.
In addition, you can set the threshold size property of the com.day.cq.dam.commons.handler.StandardImageHandler component in Configuration Manager to a value greater than zero so that an intermediate temporary file is used.
The number of files that can exist in a datastore is limited to 2.1 billion due to filesystem limitations. The repository is likely to encounter problems caused by a large number of nodes long before it reaches the datastore limit.
If the renditions are incorrectly generated, use the Camera Raw library. However, in this case, the longest side of the image should not be greater than 65000 pixels. In addition, the image should not contain more than 512 MP (512 x 1024 x 1024 pixels). The size of the asset does not matter.
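The two limits above can be checked before ingestion with a few lines of code. This is a minimal sketch: the constant and function names are hypothetical, and reading the actual dimensions from a file is left to whatever image tooling you already use.

```python
# Limits taken from the guidance above.
MAX_LONGEST_SIDE_PX = 65000
MAX_TOTAL_PIXELS = 512 * 1024 * 1024  # 512 MP


def within_camera_raw_limits(width_px: int, height_px: int) -> bool:
    """Return True if the image dimensions satisfy both limits.

    File size is deliberately ignored because only the pixel
    dimensions matter for these limits.
    """
    longest_side_ok = max(width_px, height_px) <= MAX_LONGEST_SIDE_PX
    pixel_count_ok = width_px * height_px <= MAX_TOTAL_PIXELS
    return longest_side_ok and pixel_count_ok


print(within_camera_raw_limits(20000, 20000))  # 400 million pixels
print(within_camera_raw_limits(30000, 20000))  # 600 million pixels
print(within_camera_raw_limits(70000, 100))    # longest side too large
```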
It is difficult to accurately estimate the size of the TIFF file that Experience Manager supports out-of-the-box with a specific heap because additional factors, such as pixel size, influence processing. Experience Manager may be able to process a 255 MB file out-of-the-box, yet fail to process an 18 MB file because the latter contains an unusually high number of pixels compared to the former.
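One way to see why a small file can be harder to process than a large one is to compare uncompressed raster sizes. The sketch below uses the generic formula width x height x channels x bytes per channel. It does not model how Experience Manager actually allocates memory, and the sample dimensions are hypothetical.

```python
def decoded_size_bytes(
    width_px: int,
    height_px: int,
    channels: int = 4,
    bytes_per_channel: int = 1,
) -> int:
    """Size of the uncompressed pixel data for an image, in bytes.

    This is a generic lower bound on the memory needed to hold the
    decoded raster. Actual heap usage depends on the image handler.
    """
    return width_px * height_px * channels * bytes_per_channel


# Hypothetical images: a highly compressed file with many pixels
# versus a lightly compressed file with fewer pixels.
many_pixels = decoded_size_bytes(20000, 20000)
fewer_pixels = decoded_size_bytes(8000, 8000)
print(many_pixels, fewer_pixels)
```

A well-compressed image with very large dimensions can therefore require far more memory to decode than a much larger file with modest dimensions.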
By default, Experience Manager lets you upload assets of file size up to 2 GB. To upload very large assets in Experience Manager, see Configuration to upload very large assets.