Storage overview

The Oak storage layer provides an abstraction over the actual storage of the content.

Currently, there are two storage implementations available in AEM 6: Tar Storage and MongoDB Storage.

Tar Storage

The Tar storage uses tar files. It stores the content as various types of records within larger segments. Journals are used to track the latest state of the repository.
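
The journal's role can be pictured as an append-only log of root references: each time a new repository state is persisted, its root identifier is appended, and the last entry is the current head. The Python sketch below assumes that simplified model; the `Journal` class and the identifier strings are hypothetical and do not reproduce the file format Oak uses on disk.

```python
from typing import List, Optional


class Journal:
    """Hypothetical append-only journal of repository head references."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def append(self, root_record_id: str) -> None:
        # Entries are only ever appended, never rewritten in place.
        self._entries.append(root_record_id)

    def head(self) -> Optional[str]:
        # The most recent entry identifies the latest repository state.
        return self._entries[-1] if self._entries else None

    def history(self) -> List[str]:
        # Earlier entries still point at older, unchanged states.
        return list(self._entries)


journal = Journal()
journal.append("segment-a:0010")
journal.append("segment-b:0020")
print(journal.head())  # segment-b:0020
```

Because older entries are never overwritten, reading the last line is enough to find the current state, and earlier states remain reachable.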

It was built around several key design principles:

  • Immutable Segments

Content is stored in segments of up to 256 KB each. Segments are immutable, which makes it easy to cache frequently accessed segments and reduces the risk of system errors corrupting the repository.

Each segment is identified by a unique identifier (UUID) and contains a continuous subset of the content tree. Segments can also reference content stored in other segments; to support this, each segment keeps a list of the UUIDs of the segments it references.
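
The segment properties described above (a UUID, a 256 KB size limit, immutability, and a list of referenced segment UUIDs) can be modeled with a frozen Python dataclass. This is a minimal sketch; the `Segment` class and the `new_segment` helper are hypothetical and do not mirror Oak's actual segment binary format.

```python
import uuid
from dataclasses import dataclass
from typing import Iterable, Tuple

MAX_SEGMENT_SIZE = 256 * 1024  # 256 KB upper bound from the text above


@dataclass(frozen=True)
class Segment:
    """Hypothetical immutable segment: ID, serialized records, outgoing references."""

    segment_id: uuid.UUID
    data: bytes  # serialized records (format not modeled here)
    references: Tuple[uuid.UUID, ...] = ()  # UUIDs of segments this one points to

    def __post_init__(self) -> None:
        # Enforce the size limit once, at creation; the object can never grow later.
        if len(self.data) > MAX_SEGMENT_SIZE:
            raise ValueError(f"segment exceeds {MAX_SEGMENT_SIZE} bytes")


def new_segment(data: bytes, references: Iterable[uuid.UUID] = ()) -> Segment:
    # Each segment gets a fresh random (version 4) UUID.
    return Segment(uuid.uuid4(), data, tuple(references))


parent = new_segment(b"records for /content")
child = new_segment(b"records for /content/page", references=[parent.segment_id])
```

Because a `Segment` cannot be modified after creation, a cache can hand the same instance to any number of readers without copying or invalidation.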

  • Locality

Related records like a node and its immediate children are stored in the same segment. Doing so makes searching the repository fast and avoids most cache misses for typical clients that access more than one related node per session.
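
To see why locality matters for caching, the sketch below counts segment cache misses when a client reads a node and its three children, once with all four records in one segment and once with each record in a different segment. The `SegmentCache` class, its LRU eviction policy, and the node-to-segment placement maps are illustrative assumptions, not Oak's cache implementation.

```python
from collections import OrderedDict
from typing import Dict, Iterable


class SegmentCache:
    """Hypothetical LRU cache of immutable segments that counts misses."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, segment_id: str) -> bytes:
        if segment_id in self._cache:
            self._cache.move_to_end(segment_id)  # mark as recently used
            return self._cache[segment_id]
        self.misses += 1
        data = b"..."  # stand-in for reading the segment from disk
        self._cache[segment_id] = data
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return data


def read_nodes(paths: Iterable[str], placement: Dict[str, str], cache: SegmentCache) -> int:
    # Reading a node means fetching the segment that holds its record.
    for path in paths:
        cache.get(placement[path])
    return cache.misses


nodes = ["/content", "/content/a", "/content/b", "/content/c"]
colocated = {p: "seg-1" for p in nodes}
scattered = {p: f"seg-{i}" for i, p in enumerate(nodes)}

print(read_nodes(nodes, colocated, SegmentCache(capacity=2)))  # 1
print(read_nodes(nodes, scattered, SegmentCache(capacity=2)))  # 4
```

With the records colocated, the first read loads the segment and the remaining three reads hit the cache; scattering them costs one miss per node.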

  • Compactness

The formatting of records is optimized for size to reduce IO costs and to fit as much content in caches as possible.
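
One common way to optimize a binary format for size is variable-length integer encoding, where small numbers (such as short string lengths) occupy fewer bytes. The sketch below uses a generic LEB128-style varint purely to illustrate the idea; it is an assumption for illustration and not the record encoding Oak actually uses.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative int using 7 payload bits per byte (LEB128-style)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


# Small values take 1 byte instead of a fixed 4 or 8.
print(len(encode_varint(100)))     # 1
print(len(encode_varint(300)))     # 2
print(len(encode_varint(70_000)))  # 3
```

Since most lengths and counts in a content tree are small, an encoding like this keeps typical records short, so more of them fit in a cache.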

Mongo Storage

The MongoDB storage uses MongoDB for sharding and clustering. The repository tree is kept in one MongoDB database where each node is a separate document.

It has several distinguishing features:

  • Revisions

For each update (commit) of the content, a new revision is created. A revision is a string that consists of three elements:

  1. A timestamp derived from the system time of the machine it was generated on
  2. A counter to distinguish revisions created with the same timestamp
  3. The cluster node id where the revision was created
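
The three revision elements listed above can be sketched as a small value type plus a per-node generator. The string form `r<timestamp>-<counter>-<clusterId>` with hexadecimal fields is modeled on the revision strings Oak prints, but treat the exact format, as well as the `Revision` and `RevisionGenerator` names, as assumptions of this sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True, order=True)
class Revision:
    # order=True compares (timestamp, counter, cluster_id) in that sequence;
    # this sketch does not model how revisions from different nodes are reconciled.
    timestamp: int   # milliseconds, from the generating machine's clock
    counter: int     # distinguishes revisions created in the same millisecond
    cluster_id: int  # cluster node that created the revision

    def __str__(self) -> str:
        return f"r{self.timestamp:x}-{self.counter:x}-{self.cluster_id:x}"

    @staticmethod
    def parse(text: str) -> "Revision":
        if not text.startswith("r"):
            raise ValueError(f"not a revision string: {text!r}")
        ts, counter, cluster = text[1:].split("-")
        return Revision(int(ts, 16), int(counter, 16), int(cluster, 16))


def _now_ms() -> int:
    return int(time.time() * 1000)


class RevisionGenerator:
    """Hypothetical revision source for one cluster node."""

    def __init__(self, cluster_id: int, clock: Callable[[], int] = _now_ms) -> None:
        self.cluster_id = cluster_id
        self.clock = clock
        self._last_ts = -1
        self._counter = 0

    def new_revision(self) -> Revision:
        # Never let the timestamp go backwards, even if the system clock does.
        ts = max(self.clock(), self._last_ts)
        if ts == self._last_ts:
            self._counter += 1  # same millisecond as the previous revision
        else:
            self._last_ts, self._counter = ts, 0
        return Revision(ts, self._counter, self.cluster_id)


gen = RevisionGenerator(cluster_id=1, clock=lambda: 0x1BF5BD82B33)
r1, r2 = gen.new_revision(), gen.new_revision()
print(r1, r2)  # r1bf5bd82b33-0-1 r1bf5bd82b33-1-1
```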

  • Branches

Branches are supported, which allow a client to stage multiple changes and make them visible with a single merge call.
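
A branch can be pictured as a private set of staged changes layered over the state the branch started from; merging publishes all of them in one step. The `Store` and `Branch` classes below are a hypothetical, single-process sketch that omits the conflict detection a real merge needs.

```python
from typing import Dict, Optional


class Store:
    """Hypothetical store with a single, publicly visible head state."""

    def __init__(self) -> None:
        self.head: Dict[str, str] = {}

    def create_branch(self) -> "Branch":
        return Branch(self)


class Branch:
    """Stages changes privately until merge() publishes them together."""

    def __init__(self, store: Store) -> None:
        self._store = store
        self._base = dict(store.head)  # state the branch was created from
        self._changes: Dict[str, Optional[str]] = {}  # None marks a deletion

    def set(self, key: str, value: str) -> None:
        self._changes[key] = value

    def delete(self, key: str) -> None:
        self._changes[key] = None

    def read(self, key: str) -> Optional[str]:
        # Staged changes are visible inside the branch, but not in store.head.
        if key in self._changes:
            return self._changes[key]
        return self._base.get(key)

    def merge(self) -> None:
        # Build the complete new state first, then publish it with a single
        # assignment, so store.head shows either none or all of the changes.
        # A real implementation would also detect conflicting concurrent commits.
        new_head = dict(self._store.head)
        for key, value in self._changes.items():
            if value is None:
                new_head.pop(key, None)
            else:
                new_head[key] = value
        self._store.head = new_head
        self._base = dict(new_head)
        self._changes.clear()


store = Store()
branch = store.create_branch()
branch.set("/content/a", "1")
branch.set("/content/b", "2")
print(store.head)  # {}
branch.merge()
print(store.head)  # {'/content/a': '1', '/content/b': '2'}
```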

  • Previous documents

MongoDB storage adds data to a document with every modification, but it only deletes data when a cleanup is explicitly triggered. When a certain threshold is met, old data is moved out of the document into previous documents. Previous documents contain only immutable data, which means they hold only committed and merged revisions.
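
The following sketch illustrates the split described above: every write adds an entry keyed by revision, and once a document holds more than a threshold number of entries, older committed entries move into a read-only previous document while uncommitted ones stay in place. `NodeDocument`, `PreviousDocument`, integer revisions, and `SPLIT_THRESHOLD = 4` are assumptions made for illustration and are not Oak's actual split rules or thresholds.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Set

SPLIT_THRESHOLD = 4  # hypothetical value chosen for the example


@dataclass(frozen=True)
class PreviousDocument:
    # Read-only view: previous documents hold only committed data.
    entries: Mapping[int, str]


@dataclass
class NodeDocument:
    entries: Dict[int, str] = field(default_factory=dict)  # revision -> value
    committed: Set[int] = field(default_factory=set)
    previous: List[PreviousDocument] = field(default_factory=list)

    def write(self, revision: int, value: str) -> None:
        # Every modification adds data; nothing is overwritten or deleted.
        self.entries[revision] = value

    def commit(self, revision: int) -> None:
        self.committed.add(revision)
        self._maybe_split()

    def _maybe_split(self) -> None:
        if len(self.entries) <= SPLIT_THRESHOLD:
            return
        # Keep the newest committed entry and anything uncommitted in place;
        # move older committed entries into an immutable previous document.
        committed = sorted(r for r in self.entries if r in self.committed)
        movable = committed[:-1]
        if not movable:
            return
        moved = {r: self.entries.pop(r) for r in movable}
        self.previous.append(PreviousDocument(MappingProxyType(moved)))


doc = NodeDocument()
for rev in range(1, 6):
    doc.write(rev, f"v{rev}")
    doc.commit(rev)
doc.write(6, "pending")  # uncommitted, so it stays in the main document
print(sorted(doc.entries))              # [5, 6]
print(sorted(doc.previous[0].entries))  # [1, 2, 3, 4]
```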

  • Cluster node metadata

Data about active and inactive cluster nodes is kept in the database to facilitate cluster operations.
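
One common way to track which cluster nodes are active is a lease: each instance periodically renews an expiry timestamp in the shared database, and a node whose lease has lapsed is treated as inactive. The sketch below assumes such a lease scheme; `ClusterNodeInfo`, its field names, and the 120-second `LEASE_DURATION_MS` are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

LEASE_DURATION_MS = 120_000  # hypothetical lease length


@dataclass
class ClusterNodeInfo:
    cluster_id: int
    lease_end_ms: int = 0

    def renew(self, now_ms: int) -> None:
        # An active node periodically pushes its lease forward.
        self.lease_end_ms = now_ms + LEASE_DURATION_MS

    def is_active(self, now_ms: int) -> bool:
        return now_ms < self.lease_end_ms


def active_nodes(registry: Dict[int, ClusterNodeInfo], now_ms: int) -> List[int]:
    # Any instance can read the shared metadata to see which peers are alive.
    return sorted(cid for cid, info in registry.items() if info.is_active(now_ms))


registry = {1: ClusterNodeInfo(1), 2: ClusterNodeInfo(2)}
registry[1].renew(now_ms=0)
registry[2].renew(now_ms=0)
registry[1].renew(now_ms=100_000)  # node 2 stops renewing
print(active_nodes(registry, now_ms=150_000))  # [1]
```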

Figure: A typical AEM cluster setup with MongoDB storage.

What is different from Jackrabbit 2?

Because Oak is backward compatible with the JCR 1.0 standard, there are almost no changes at the user level. However, there are some noticeable differences that you must account for when setting up an Oak-based AEM installation:

  • Oak does not create indexes automatically. As such, custom indexes must be created when necessary.
  • Unlike Jackrabbit 2, where sessions always reflect the latest state of the repository, in Oak a session reflects a stable view of the repository as of the time the session was acquired. This is due to the multiversion concurrency control (MVCC) model on which Oak is based.
  • Same name siblings (SNS) are not supported in Oak.
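
The session behavior described in the second bullet can be sketched as follows: a session pins the latest revision at the moment it is acquired and keeps reading from it until it is explicitly refreshed. `Repository` and `Session` here are hypothetical Python classes, not the JCR API; in JCR the corresponding operation is `Session.refresh(boolean keepChanges)`.

```python
from typing import Dict, List, Optional


class Repository:
    """Hypothetical MVCC store: each commit appends a new immutable state."""

    def __init__(self) -> None:
        self._states: List[Dict[str, str]] = [{}]  # revision 0 is empty

    def commit(self, changes: Dict[str, str]) -> None:
        # Copy-on-write: earlier states are never modified.
        new_state = dict(self._states[-1])
        new_state.update(changes)
        self._states.append(new_state)

    def head_revision(self) -> int:
        return len(self._states) - 1

    def read(self, revision: int, path: str) -> Optional[str]:
        return self._states[revision].get(path)

    def login(self) -> "Session":
        return Session(self)


class Session:
    def __init__(self, repo: Repository) -> None:
        self._repo = repo
        self._revision = repo.head_revision()  # pinned when the session is acquired

    def get(self, path: str) -> Optional[str]:
        return self._repo.read(self._revision, path)

    def refresh(self) -> None:
        # Move this session's view forward to the latest committed state.
        self._revision = self._repo.head_revision()


repo = Repository()
repo.commit({"/content/page": "v1"})
session = repo.login()
repo.commit({"/content/page": "v2"})  # e.g. written through another session
print(session.get("/content/page"))  # v1: the view stays stable
session.refresh()
print(session.get("/content/page"))  # v2
```

Because each state is immutable, the session never observes a half-applied commit; the trade-off is that it must refresh to see newer content.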

For more information regarding the AEM platform, also check the articles below:
