How to debug SegmentNotFoundException when Issue is reported in AEM 6.x

This article helps you understand how to debug SegmentNotFoundException when Issue is reported in AEM 6.x by reverting to the last known good revision of the segment store.

Description description

Environment

Experience Manager 6.x

Issue/Symptoms

Error log shows a SegmentNotFound exception, such as in the following example:

org.apache.jackrabbit.oak.segment.SegmentNotFoundException: Segment d2c720c4-c146-4ab1-ac37-542aad93c33f not found at

org.apache.jackrabbit.oak.segment.file.FileStore$8.call(FileStore.java:602) at

org.apache.jackrabbit.oak.segment.file.FileStore$8.call(FileStore.java:542) at

org.apache.jackrabbit.oak.segment.SegmentCache.getSegment(SegmentCache.java:95) at

org.apache.jackrabbit.oak.segment.file.FileStore.readSegment(FileStore.java:542) at

org.apache.jackrabbit.oak.segment.SegmentId.getSegment(SegmentId.java:125) at

org.apache.jackrabbit.oak.segment.Record.getSegment(Record.java:70) at

org.apache.jackrabbit.oak.segment.MapRecord.compare(MapRecord.java:424) at

org.apache.jackrabbit.oak.segment.MapRecord.compare(MapRecord.java:433) at

org.apache.jackrabbit.oak.segment.MapRecord.compare(MapRecord.java:391) at

org.apache.jackrabbit.oak.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:608) at

org.apache.jackrabbit.oak.spi.commit.EditorDiff.childNodeChanged(EditorDiff.java:148) at

org.apache.jackrabbit.oak.segment.MapRecord$3.childNodeChanged(MapRecord.java:442) at

org.apache.jackrabbit.oak.segment.MapRecord.compare(MapRecord.java:490) at

org.apache.jackrabbit.oak.segment.MapRecord.compare(MapRecord.java:433) at

org.apache.jackrabbit.oak.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:608) at

org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:52) at

org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.updateIndex(AsyncIndexUpdate.java:695) at

org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.runWhenPermitted(AsyncIndexUpdate.java:543) at

org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:402) at

org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:118) at

org.quartz.core.JobRunShell.run(JobRunShell.java:202) at

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at

java.lang.Thread.run(Thread.java:745)

Resolution resolution

Steps to Resolve

There are two approaches to fix the issue and remove the inconsistencies in the repository

Revert to the last known good revision of the segment store.

First,  use the oak run tool, which is a runnable jar[ 1] that contains everything you need for a simple Oak installation and performing oak-related operations.

The check run mode of oak-run can be used to determine the last known good revision of a segment store.  That can be used to manually revert a corrupt segment store to its latest good revision.

Caution: This process will roll back the data in the system to a previous point in time.  If you would like to avoid losing changes in your system then you can try option B below instead.

To perform the check and restore:

  1. Download a version of oak-run that matches your oak core version from https://mvnrepository.com/artifact/org.apache.jackrabbit/oak-run

  2. To revert a corrupt segment store to its latest good state, change into CQ’s working directory (the one containing the crx-quickstartfolder) and back up all files in ./crx-quickstart/repository/segmentstore/.

  3. Run the consistency check,

    java -Xmx6000m -jar oak-run-*.jar check --bin=-1 /path/to/crx-quickstart/repository/segmentstore

    This searches backwards through the revisions until it finds a consistent one:

    Look for message like below:

    code language-none
    main INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Found latest good revision afdb922d-ba53-4a1b-aa1b-1cb044b535cf:234880
    
  4. Revert the repository to this revision by editing ./crx-quickstart/repository/segmentstore/journal.log and deleting all lines after the line containing the latest good revision.

  5. Remove all ./crx-quickstart/repository/segmentstore/*.bak files.

  6. Run checkpoint clean-up to remove orphaned checkpoints using the following command:

    code language-none
    java -Xmx6000m -jar oak-run-*.jar checkpoints /path/to/crx-quickstart/repository/segmentstore rm-unreferenced
    
  7. Finally compact the repository:

    java -Xmx6000m -jar oak-run-*.jar compact /path/to/crx-quickstart/repository/segmentstore/

There might be cases when the oak run check cannot find the good revision, and we get “ConsistencyChecker - No good revision found” while running the check command.

How to fix corruption when encountering “ConsistencyChecker - No good revision found” on consistency check

Remove corrupted nodes manually.

You can do the following in AEM, TarMK setups with no FileDatastore configured, and situations where corruption is in the binaries.

Caution: The procedure below is intended for advanced users.  When deleting the corrupt nodes, you need to ensure they are not system nodes (such as /home, /jcr:system, etc) .  If they are system nodes, you need to make sure you can restore them.  Please consult with the AEM Customer Care team for assistance with the steps documented here if you are unsure.

  1. Stop AEM.

  2. Use the Oak run console and load the childCount groovy script to identify the corrupt nodes in the segment store:

    Load the oak-run console shell:

    code language-none
    java -jar oak-run-*.jar console crx-quickstart/repository/segmentstore
    

    Run the two commands below in the shell to load the script and run it:

    code language-none
    :load https://gist.githubusercontent.com/stillalex/e7067bcb86c89bef66c8/raw/d7a5a9b839c3bb0ae5840252022f871fd38374d3/childCount.groovy
    countNodes(session.workingNode)
    

    This results in the following output indicating the path to the corrupted node(s):

    code language-none
    21:21:42.029 main ERROR o.a.j.o.p.segment.SegmentTracker - Segment not found: 63ae05a4-b506-445c-baa2-cfa1b13b6e2f. Creation date delta is 3 ms.
    warning unable to read node /content/dam/test.txt/jcr:content/renditions/original/jcr:content
    

    In some cases, the issue is linked to binary properties and the childCount groovy script cannot locate any corrupted nodes.  In these cases, you can use the following command instead, which will read the first 1024 bytes for every binary encountered during the traversal (Please note this command will be slower and should only be used when the one above doesn’t return the expected results):

    code language-none
    countNodes(session.workingNode,true)
    
  3. Remove all the identified corrupted nodes listed in the output of the last command using rmNodes.groovy
    Load the oak-run console shell using below command:

    code language-none
    java -jar oak-run-*.jar console crx-quickstart/repository/segmentstore
    

    Load the groovy script:

    code language-none
    :load
    https://gist.githubusercontent.com/stillalex/43c49af065e3dd1fd5bf/raw/9e726a59f75b46e7b474f7ac763b0888d5a3f0c3/rmNode.groovy
    

    Run the rmNode command to remove the corrupt node, replace /path/to/corrupt/node with the path to the corrupt node you need to remove.

    code language-none
    rmNode(session, "/path/to/corrupt/node")
    

    where the corrupt node path is the path obtained in step 2, for example: “/content/dam/test.txt/jcr:content/renditions/original/jcr:content/”

    Note: When using oak-run.jar version 1.6.13 and up, set --read-write JVM parameter if you run into an error such as:

    code language-none
    / rmNode(session,"/path/to/corrupt/node")    Removing node /path/to/corrupt/node    ERROR java.lang.UnsupportedOperationException:    Cannot write to read-only store    at org.apache.jackrabbit.oak.segment.SegmentWriterBuilder$1.execute (SegmentWriterBuilder.java:171)    at org.apache.jackrabbit.oak.segment.SegmentWriter.writeNode (SegmentWriter.java:318)    at org.apache.jackrabbit.oak.segment.SegmentNodeBuilder.getNodeState (SegmentNodeBuilder.java:111)    at org.apache.jackrabbit.oak.segment.SegmentNodeStore$Commit.init (SegmentNodeStore.java:581)    at org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge (SegmentNodeStore.java:333)    at org.apache.jackrabbit.oak.spi.state.NodeStore$merge.call (Unknown Source)    at groovysh_evaluate.rmNode (groovysh_evaluate:11)
    
  4. Repeat step 3 for all nodes found in step 2

    This above rmNode command should return true for the corrupt path, which means it deleted it. Make sure these found three corrupt paths get deleted by re-running the rmNode command on those paths. Next run it should return false.
    If you see that the same paths are still there in repository, then use the patched version of oak-run jar i.e. oak-run-1.2.18-NPR-17596. This version of jar skips unreadable binaries on compaction, replacing them with 0-byte binaries and logging the exception and the path to syserr. The resulting compacted repository should then pass oak-run check, and the node count script, and you should also be able to compact it again using a non-patched oak-run.

  5. Perform a checkpoint cleanup by listing checkpoints using below. If there are more than one checkpoint, then clean them up:

    code language-none
    nohup java -Xmx4096m -jar oak-run-1.2.18.jar checkpoints /app/AEM6/author/crx-quickstart/repository/segmentstore rm-allnohup.out
    
  6. Run an offline compaction. If you do not know how to run offline compaction then see here.

  7. Start the server wait for indexing completion.

Cause

A SegmentNotFoundException in the error log means a segment is not present any more although someone apparently tries to still access it. There are broadly three different root causes for this: the segment has been removed by manual intervention (e.g. rm -rf /), the segment has been removed by revision garbage collection or the segment cannot be found due to some bug in the code.

recommendation-more-help
3d58f420-19b5-47a0-a122-5c9dab55ec7f