This page provides information on how to troubleshoot replication issues.
Replication (non-reverse replication) is failing for some reason.
There are various reasons for replication to fail. This article explains the approach one might take when analyzing these issues.
Are replications getting triggered at all when clicking the Activate button? If not, do the following:
Are the replications getting queued up in the replication agent queues?
To check, go to /etc/replication/agents.author.html and click each replication agent to view its queue.
If one agent queue or a few agent queues are stuck:
Does the queue show a blocked status? If so, the publish instance might be down or unresponsive. Check the publish instance to see what is wrong with it: review the logs for an OutOfMemory error or another issue. If the instance is just slow, take thread dumps and analyze them.
Does the queue status show "Queue is active - # pending"? If so, the replication job could be stuck in a socket read, waiting for the publish instance or Dispatcher to respond. This could mean that the publish instance or Dispatcher is under high load or stuck in a lock. In this case, take thread dumps from both author and publish.
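The log check for a blocked queue can be partly scripted. The following is a minimal Python sketch, not an Adobe tool: it scans log lines for java.lang.OutOfMemoryError, the standard Java class name that appears when the JVM runs out of memory. The function name find_oom_lines is hypothetical and chosen for illustration only.

```python
from typing import Iterable, List, Tuple

# Standard Java class name reported when the JVM runs out of memory.
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that mentions an OutOfMemoryError.

    Line numbers are 1-based; trailing newlines are stripped from the text.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOM_MARKER in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

Because the function accepts any iterable of strings, an open file handle for the publish instance's error.log can be passed in directly, so the whole log does not need to be loaded into memory first.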
If all agent queues are stuck:
It is possible that a certain piece of content cannot be serialized under /var/replication/data due to repository corruption or some other issue. Check the logs/error.log for a related error. To clear out the bad replication item, do the following:
There might be something wrong with the Sling eventing framework job queues. Try restarting the org.apache.sling.event bundle in the /system/console.
Job processing might be turned off. You can check this in the Felix Console, on the Sling Eventing tab. Check whether it displays: Apache Sling Eventing (JOB PROCESSING IS DISABLED!)
The DefaultJobManager configuration might also have gotten into an inconsistent state. This can happen when someone manually modifies the ‘Apache Sling Job Event Handler’ configuration through the OSGi console (for example, by disabling and re-enabling the ‘Job Processing Enabled’ property and saving the configuration).
Create a replication.log
Sometimes it is helpful to send all replication logging to a separate log file at DEBUG level. To do this:
Go to https://host:port/system/console/configMgr and log in as admin.
Find the Apache Sling Logging Logger factory and create an instance by clicking the + button on the right of the factory configuration. This creates a new logging logger.
Set the configuration like this:
If you suspect that the problem is related to Sling eventing or jobs in any way, you can also add this Java™ package under categories: org.apache.sling.event
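As a rough sketch of what such a logger configuration might contain, the Python snippet below builds the properties as a dictionary. The property keys are those used by the Apache Sling Logging Logger factory configuration. The file name logs/replication.log is inferred from the section title, and the category com.day.cq.replication is an assumption (the usual AEM replication package), not a value stated in this article. The helper name replication_logger_config is hypothetical.

```python
from typing import Dict, List

# Property keys of the Apache Sling Logging Logger factory configuration.
LEVEL_KEY = "org.apache.sling.commons.log.level"
FILE_KEY = "org.apache.sling.commons.log.file"
NAMES_KEY = "org.apache.sling.commons.log.names"

# Assumption: the usual AEM replication package; not stated in this article.
REPLICATION_CATEGORY = "com.day.cq.replication"
# Named in this article for suspected Sling eventing/job problems.
EVENTING_CATEGORY = "org.apache.sling.event"


def replication_logger_config(include_eventing: bool = False) -> Dict[str, object]:
    """Build the properties for a DEBUG logger that writes to its own file."""
    categories: List[str] = [REPLICATION_CATEGORY]
    if include_eventing:
        # Optional extra category when eventing/jobs are suspected.
        categories.append(EVENTING_CATEGORY)
    return {
        LEVEL_KEY: "DEBUG",
        FILE_KEY: "logs/replication.log",  # assumed file name, relative to the instance
        NAMES_KEY: categories,
    }
```

The dictionary only mirrors the fields entered in the configuration form; it does not apply anything to a running instance.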
Sometimes it might be suitable to pause the replication queue to reduce load on the author system without disabling it. Currently, this is only possible through the hack of temporarily configuring an invalid port. From 5.4 onward, a pause button is available in the replication agent queue, but it has some limitations.
Page permissions are not replicated because they are stored under the nodes to which access is granted, not with the user.
In general, page permissions should not be replicated from the author to publish and are not by default. This is because access rights should be different in those two environments. Therefore, Adobe recommends that you configure ACLs on publish, separately from author.
Sometimes the replication queue is blocked when trying to replicate namespace information from the author instance to the publish instance. This happens because the replication user does not have the jcr:namespaceManagement privilege. To avoid this issue, make sure that the replication user has the jcr:namespaceManagement privilege at the repository level. You can grant the privilege as follows:
Log in to CRXDE (https://localhost:4502/crx/de/index.jsp) as administrator.
Select jcr:namespaceManagement from the privileges list.