Replication queue issues

This article addresses the AEM (version 6.4 and version 6.5) issue where AEM content replication appears to be blocked. To resolve this, do the following as discussed in the steps below:

  • Make sure that each replication agent is enabled and configured correctly
  • Restart replication agents and related bundles
  • Force the queue clearance by deleting corresponding Sling Jobs

Description description

Environment

  • Experience Manager 6.4
  • Experience Manager 6.5

Issue/Symptoms

The AEM content replication appears to be blocked.

Editors can create content, but the activated pages are not updated on the CQ5 publish instance.

Resolution resolution

A. Make sure that each replication agent is enabled and configured correctly.

  1. Go to the list of replication agents (/etc/replication/agents.author.html)

  2. For each replication agent, do the following:

    • Make sure that the agent is enabled.
    • Verify the connectivity with the publish instance by clicking Test Connection. If it fails, make sure that on TCP network level, the server hosting the AEM author instance can connect to the port of the publish instance.
    • Open the replication log via the “View Log” link and check when the last replication attempt was successful.
    • Note the first payload path in the replication queue. Then try to clear the first element of the replication queue. Then, verify whether the replication resumes. Once it resumes, activate the first payload noted in the queue again.
    • Check with the CRX Content Explorer, and make sure that there is no /bin/receive node on the publish instance. Otherwise, delete it.
    • Check with the CRX Content Explorer, and make sure that there is no /bin/replicate node on the author instance. Otherwise, delete it.

B. Restart replication agents and related bundles.

At that point, we can consider the replications agents are configured correctly. If the logs show no replication attempt for a few minutes, then try the following corrective actions to unblock the queues, in this order, checking between each step if the replication resumes.

  1. Disable the replication agent, then re-enable it.
  2. Restart the replication bundle in the Felix console (http://host:port/system/console/bundles/com.day.cq.cq-replication).
  3. Restart the Apache Sling Event Support bundle (http://host:port/system/console/bundles/org.apache.sling.event).
  4. Restart the Apache Felix EventAdmin (http://host:port/system/console/bundles/org.apache.felix.eventadmin).

C. Force the queue clearance by deleting corresponding Sling Jobs.

If the above fails, then clearing manually the queue might be the last option.

One can achieve this by getting rid directly of the Sling Jobs with topic =replication agent name.

Quickest way to do this is to use CRXDE Lite (http://host:port/crx/de/index.jsp), and delete the node:

/var/eventing/jobs/assigned/%INSTANCE-SLING-ID%/%REPLICATION-AGENT-FULL-ID%

So, for example with the default publish agent:

/var/eventing/jobs/assigned/e23dd09d-83f1-4735-a77c-394df479214c/com.day.cq.replication.job.publish

Note this is considered as an exceptional workaround action,contact AEM Support anyway if such a case occurs.

recommendation-more-help
3d58f420-19b5-47a0-a122-5c9dab55ec7f