SegmentNotFoundException and IllegalArgumentException
Running an Offline Compaction can fail with SegmentNotFoundException or IllegalArgumentException. This article discusses ways to resolve the errors and complete the Offline Compaction successfully. However, before you continue, please perform a full backup of your repository.
Description
Environment
Adobe Experience Manager (AEM)
Issue
Scenario 1
Running an Offline Compaction can fail with SegmentNotFoundException when there are integrity issues of the repository.
You observe SegmentNotFoundException in AEM log files and AEM isn’t working as expected.
Scenario 2
Running an Offline Compaction can fail with SegmentNotFoundException when there are integrity issues of the repository.
A stack trace similar to the one below shows in the logs:
13:51:21.523 [ main] ERROR o.a.j.o.p.segment.SegmentTracker - Segment not found: 4d139bc4-150c-4f0a-b82a-4867593098a. Creation date delta is 4 ms.
org.apache.jackrabbit.oak.plugins.segment.SegmentNotFoundException: Segment 4d139bc4-150c-4f0a-b82a-4867593098a not found
at org.apache.jackrabbit.oak.plugins.segment.file.FileStore.readSegment(FileStore.java:855) [ oak-run-1.0.22.jar:1.0.22]
at org.apache.jackrabbit.oak.plugins.segment.SegmentTracker.getSegment(SegmentTracker.java:134) ~[ oak-run-1.0.22.jar:1.0.22]
at org.apache.jackrabbit.oak.plugins.segment.SegmentId.getSegment(SegmentId.java:101) [ oak-run-1.0.22.jar:1.0.22]
...
Exception in thread "main" org.apache.jackrabbit.oak.plugins.segment.SegmentNotFoundException: Segment 4d139bc4-150c-4f0a-b82a-4867593098a not found
at org.apache.jackrabbit.oak.plugins.segment.file.FileStore.readSegment(FileStore.java:855)
at org.apache.jackrabbit.oak.plugins.segment.SegmentTracker.getSegment(SegmentTracker.java:134)
at org.apache.jackrabbit.oak.plugins.segment.SegmentId.getSegment(SegmentId.java:101)
...
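When triaging Scenario 1 or Scenario 2, it can help to list the distinct segment IDs that the log reports as missing. The following is a minimal sketch, assuming the log lines follow the "Segment not found: <id>." format shown in the stack trace above. The function name missing_segments and the log file location in the usage comment are assumptions for illustration.

```shell
# Print each distinct segment ID reported as missing in a log file.
# Assumes lines of the form "... Segment not found: <id>. Creation date ..."
# $1: path to the log file to scan
missing_segments() {
  sed -n 's/.*Segment not found: \([0-9a-fA-F-]*\)\..*/\1/p' "$1" | sort -u
}

# Example (assumed log location, adjust to your installation):
# missing_segments crx-quickstart/logs/error.log
```

A single ID that repeats often suggests one damaged segment, whereas many different IDs suggest broader corruption.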
Scenario 3
Running an Offline Compaction can fail with IllegalArgumentException when there are integrity issues of the repository.
A stack trace similar to the one below shows in the logs:
java.lang.IllegalArgumentException
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:77)
at org.apache.jackrabbit.oak.plugins.segment.ListRecord.<init>(ListRecord.java:41)
at org.apache.jackrabbit.oak.plugins.segment.ListRecord.getEntry(ListRecord.java:64)
at org.apache.jackrabbit.oak.plugins.segment.ListRecord.getEntries(ListRecord.java:81)
at org.apache.jackrabbit.oak.plugins.segment.SegmentStream.read(SegmentStream.java:153)
at org.apache.jackrabbit.oak.commons.IOUtils.readFully(IOUtils.java:53)
at org.apache.jackrabbit.oak.plugins.segment.Compactor.getBlobKey(Compactor.java:412)
at org.apache.jackrabbit.oak.plugins.segment.Compactor.compact(Compactor.java:362)
at org.apache.jackrabbit.oak.plugins.segment.Compactor.compact(Compactor.java:321)
at org.apache.jackrabbit.oak.plugins.segment.Compactor.access$500(Compactor.java:54)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.propertyAdded(Compactor.java:227)
at org.apache.jackrabbit.oak.plugins.segment.CancelableDiff.propertyAdded(CancelableDiff.java:47)
at org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:156)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:434)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.diff(Compactor.java:214)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:263)
at org.apache.jackrabbit.oak.plugins.segment.CancelableDiff.childNodeAdded(CancelableDiff.java:74)
at org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:161)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:434)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.diff(Compactor.java:214)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:263)
at org.apache.jackrabbit.oak.plugins.segment.CancelableDiff.childNodeAdded(CancelableDiff.java:74)
at org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:161)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:434)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.diff(Compactor.java:214)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:263)
at org.apache.jackrabbit.oak.plugins.segment.CancelableDiff.childNodeAdded(CancelableDiff.java:74)
at org.apache.jackrabbit.oak.plugins.memory.EmptyNodeState.compareAgainstEmptyState(EmptyNodeState.java:161)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.compareAgainstBaseState(SegmentNodeState.java:434)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.diff(Compactor.java:214)
at org.apache.jackrabbit.oak.plugins.segment.Compactor$CompactDiff.childNodeAdded(Compactor.java:263)
at org.apache.jackrabbit.oak.plugins.segment.CancelableDiff.childNodeAdded(CancelableDiff.java:74)
Resolution
There are multiple procedures we can follow to resolve the situation and complete the Offline Compaction successfully.
Important: Perform a full backup of your repository before following the steps below.
A. Revert to the last known good revision of the segment store.
The check run-mode of oak-run can be used to determine the last known good revision of a segment store.
That can be used to manually revert a corrupt segment store to its latest good revision.
Caution: This process rolls the data in the system back to a previous point in time.
If you would like to avoid losing changes in your system, then you can try Option B below instead.
To perform the check and restore, follow these steps:
- Download the oak-run jar file from https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/.
- Stop AEM.
- Run this command:
java -jar oak-run-*.jar check --bin=-1 crx-quickstart/repository/segmentstore/
This command searches backwards through the revisions until it finds a consistent one:
14:00:30.783 [ main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Found latest good revision afdb922d-ba53-4a1b-aa1b-1cb044b535cf:234880
(In case the ConsistencyChecker fails, go to the next section.)
- Revert the repository to this revision by editing:
/crx-quickstart/repository/segmentstore/journal.log
Delete all lines after the line containing the latest good revision.
If you would like to find out what date and time you are reverting the repository to, run this command in the segmentstore folder (replace afdb922d-ba53-4a1b-aa1b-1cb044b535cf with the latest good revision in your journal.log):
find . -type f -name "data*.tar" -exec sh -c "tar -tvf {} |grep afdb922d-ba53-4a1b-aa1b-1cb044b535cf" \; -print
The output would show you an approximate date and time of that revision.
- Remove all ./crx-quickstart/repository/segmentstore/*.bak files.
- If using AEM 6.0, then download the oak-run version matching what is installed in AEM for the remaining steps.
Download it from here https://repo1.maven.org/maven2/org/apache/jackrabbit/oak-run/.
- Run checkpoint clean-up to remove orphaned checkpoints:
java -jar oak-run-*.jar checkpoints ./crx-quickstart/repository/segmentstore rm-unreferenced
- Finally, compact the repository:
java -jar oak-run-*.jar compact ./crx-quickstart/repository/segmentstore/
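The journal.log edit in the revert step of option A can also be scripted rather than done by hand. The following is a minimal sketch, assuming each journal entry is a single line and that the revision string reported by the consistency check appears in it. The function name revert_journal and the backup file path are hypothetical. The backup is written outside the segmentstore folder so that it is not mistaken for a segment store file.

```shell
# Keep journal.log up to and including the first line that mentions the
# given revision; every line after it is dropped.
# $1: path to journal.log
# $2: latest good revision, as reported by the oak-run check
# $3: path for a backup copy of the original journal (outside segmentstore)
revert_journal() {
  journal=$1
  revision=$2
  backup=$3
  # Refuse to touch the file if the revision is not in the journal.
  grep -qF -- "$revision" "$journal" || {
    echo "revision $revision not found in $journal" >&2
    return 1
  }
  # Preserve the original journal before rewriting it.
  cp -p "$journal" "$backup" || return 1
  # Print every line, then stop right after the first matching line.
  awk -v rev="$revision" '{ print } index($0, rev) { exit }' "$backup" > "$journal"
}

# Example (run with AEM stopped; the backup path is hypothetical):
# revert_journal crx-quickstart/repository/segmentstore/journal.log \
#   afdb922d-ba53-4a1b-aa1b-1cb044b535cf:234880 /tmp/journal.log.before-revert
```

The function refuses to modify the journal when the revision is not found, which guards against a mistyped revision silently leaving the file unchanged or emptying it.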
B. Remove corrupted nodes manually.
In AEM TarMK setups with no FileDataStore configured, and in situations where the corruption is in the binaries, you can do the following.
Caution: The procedure below is intended for power users.
When deleting the corrupt nodes, make sure they aren’t system nodes (such as /home, /jcr:system, etc.).
If they are system nodes, make sure you can restore them.
If you are unsure, consult the AEM Customer Care team for assistance with the steps documented here.
- Stop AEM.
- Use the oak-run console and load the childCount groovy script to identify the corrupt nodes in the segment store:
Load the oak-run console shell:
java -jar oak-run-*.jar console crx-quickstart/repository/segmentstore
Run the two commands below in the shell to load the script and run it:
:load https://gist.githubusercontent.com/stillalex/e7067bcb86c89bef66c8/raw/d7a5a9b839c3bb0ae5840252022f871fd38374d3/childCount.groovy
countNodes(session.workingNode)
This results in the following output indicating the path to the corrupted node(s):
21:21:42.029 [ main] ERROR o.a.j.o.p.segment.SegmentTracker - Segment not found: 63ae05a4-b506-445c-baa2-cfa1b13b6e2f. Creation date delta is 3 ms.
warning unable to read node /content/dam/test.txt/jcr:content/renditions/original/jcr:content
In some cases the issue is linked to binary properties and the childCount groovy script is unable to locate any corrupted nodes.
In these cases you can use the following command instead, which reads the first 1024 bytes of every binary encountered during the traversal. This command is slower and should only be used when the one above doesn’t return the expected results:
countNodes(session.workingNode,true)
- Remove all the identified corrupted nodes listed in the output of the last command using rmNode.groovy.
Load the oak-run console shell:
java -jar oak-run-*.jar console crx-quickstart/repository/segmentstore
Load the groovy script:
:load https://gist.githubusercontent.com/stillalex/43c49af065e3dd1fd5bf/raw/9e726a59f75b46e7b474f7ac763b0888d5a3f0c3/rmNode.groovy
Run the rmNode command to remove the corrupt node, replacing /path/to/corrupt/node with the path to the corrupt node you need to remove:
rmNode(session, "/path/to/corrupt/node")
The corrupt node path is the path obtained in step 2, for example:
/content/dam/test.txt/jcr:content/renditions/original/jcr:content/
Note: When using oak-run.jar version 1.6.13 and up, set the --read-write parameter of oak-run if you run into an error such as:
/> rmNode(session,"/path/to/corrupt/node")
Removing node /path/to/corrupt/node
ERROR java.lang.UnsupportedOperationException: Cannot write to read-only store
at org.apache.jackrabbit.oak.segment.SegmentWriterBuilder$1.execute (SegmentWriterBuilder.java:171)
at org.apache.jackrabbit.oak.segment.SegmentWriter.writeNode (SegmentWriter.java:318)
at org.apache.jackrabbit.oak.segment.SegmentNodeBuilder.getNodeState (SegmentNodeBuilder.java:111)
at org.apache.jackrabbit.oak.segment.SegmentNodeStore$Commit.<init> (SegmentNodeStore.java:581)
at org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge (SegmentNodeStore.java:333)
at org.apache.jackrabbit.oak.spi.state.NodeStore$merge.call (Unknown Source)
at groovysh_evaluate.rmNode (groovysh_evaluate:11)
- Repeat step 3 for all nodes found in step 2.
The rmNode command should return true for a corrupt path, which means the node was deleted.
To confirm that each corrupt path is gone, re-run the rmNode command on it. This time it should return false.
If the same paths are still present in the repository, use the patched version of the oak-run jar (that is, oak-run-1.2.18-NPR-17596).
What does the patched version of the oak-run jar do?
This version of the jar skips unreadable binaries during compaction, replaces them with 0-byte binaries, and logs the exception and the path to syserr.
The repository compacted this way should then pass the oak-run check and the node count script, and you should also be able to compact it again using a non-patched oak-run.
- Perform a checkpoint cleanup. List the checkpoints and, if there is more than one, clean them up with the command below:
nohup java -Xmx4096m -jar oak-run-1.2.18.jar checkpoints /app/AEM6/author/crx-quickstart/repository/segmentstore rm-all >> nohup.out &
- Run an offline compaction.
If you don’t know how to run offline compaction, then see Oak offline compaction instructions on GitHub Gist.
- Start the server and wait for indexing completion.
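When the node count script in option B reports many unreadable paths, the matching rmNode calls can be generated instead of typed by hand. The following is a minimal sketch, assuming the console output was saved to a text file and that each corrupt path appears on a line starting with "warning unable to read node", as in the sample output above. The function name rmnode_commands and the file name in the usage comment are hypothetical. Paths under /home and /jcr:system are flagged rather than turned into commands, following the caution about system nodes; extend the pattern for any other system paths in your repository.

```shell
# Turn "warning unable to read node <path>" lines into rmNode(...) calls.
# Paths under /home or /jcr:system are printed as comments for manual review.
# $1: text file containing the saved output of countNodes
# Note: paths containing a double quote are not escaped by this sketch.
rmnode_commands() {
  sed -n 's/^warning unable to read node //p' "$1" | sort -u |
  while IFS= read -r path; do
    case $path in
      /home|/home/*|/jcr:system|/jcr:system/*)
        printf '// SKIPPED system node, review manually: %s\n' "$path" ;;
      *)
        printf 'rmNode(session, "%s")\n' "$path" ;;
    esac
  done
}

# Example (hypothetical file name):
# rmnode_commands countNodes-output.txt
```

Review the generated lines before pasting them into the oak-run console, since each rmNode call permanently deletes the node at that path.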
Cause
A SegmentNotFoundException is thrown when compaction attempts to read a node whose segment is not present.
There can be different root causes for this:
- The segment has been removed by manual intervention (For example: rm -rf /).
- The segment has been removed by revision garbage collection.
- The segment cannot be found due to some bug in the code.
If the issue is caused by revision garbage collection (Cause #2), make sure that online compaction is disabled to prevent further nodes from becoming corrupted.