Steps to mitigate ForcedDisconnectException
This video walks through the possible causes of ForcedDisconnectException and the steps to mitigate it.
Transcript
Hello everyone. At times you may have seen a ForcedDisconnectException in the logs of your application server, as shown here. This happens because AEM Forms and lifecycle use GemFire as the caching mechanism, and each node must respond to heartbeat requests from the membership coordinator within 5000 milliseconds to remain in the cluster. When a full garbage collection runs, application threads pause for the duration of the collection (a stop-the-world pause). This prevents GemFire from responding to heartbeats, so the node is forced out of the cluster and a ForcedDisconnectException appears in the logs. In this video, we show you how to mitigate this problem.

The first step is to generate GC statistics by adding the JVM parameters shown here. The next step is to capture the GC logs and analyze them to identify the time taken by full GC. If this value is greater than the interval between two heartbeats, we know that the full GC is what forced the node out of the cluster. For example, we can see in the given logs that full GC took 12.25 and 7.54 seconds, both longer than the default timeout of 5 seconds.

To solve the issue, we increase the default GemFire membership timeout through a JVM argument on the application server. In this case, we raise the GemFire membership timeout to 15,000 milliseconds, which is more than the time taken by full GC, so that the server has some buffer time to complete the full GC. Please note that every change to the JVM arguments requires a restart to take effect. Thank you for watching this video.
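The transcript refers to JVM parameters for generating GC statistics but does not spell them out. A typical set for a Java 8 HotSpot JVM (an assumption, since the video does not name the JVM version; the log path is a placeholder) would be:

```shell
# Assumed GC-logging flags for Java 8 HotSpot.
# On Java 9+ the unified option -Xlog:gc*:file=/path/to/gc.log replaces these.
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-Xloggc:/path/to/gc.log
```

Add these to the application server's JVM arguments and restart so the GC log is written from startup.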
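The timeout change itself is a single JVM argument. GemFire properties can generally be supplied as `gemfire.`-prefixed system properties, so the change described above would look like the following (a sketch assuming that mechanism; consult your AEM Forms version's documentation for the exact property name):

```shell
# Assumed sketch: raise GemFire's membership timeout from the 5000 ms
# default to 15000 ms so a long full GC no longer exceeds the heartbeat window.
# Add to the application server's JVM arguments, then restart.
-Dgemfire.member-timeout=15000
```

As the video notes, the new value should comfortably exceed the longest observed full-GC pause, and the application server must be restarted for the argument to take effect.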
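The analysis step above (find full-GC pause times and compare them to the heartbeat timeout) can be sketched as a small script. The log lines below are hypothetical samples in the Java 8 `-XX:+PrintGCDetails` style, using the 12.25 s and 7.54 s pauses mentioned in the video; a real log would have more detail per line.

```python
import re

# Hypothetical GC log excerpt; real -XX:+PrintGCDetails output looks similar.
GC_LOG = """\
2023-01-01T10:00:00.000+0000: 100.123: [GC (Allocation Failure) ... 0.0456789 secs]
2023-01-01T10:05:00.000+0000: 400.456: [Full GC (Ergonomics) ... 12.2500000 secs]
2023-01-01T10:10:00.000+0000: 700.789: [Full GC (Ergonomics) ... 7.5400000 secs]
"""

MEMBER_TIMEOUT_SECS = 5.0  # GemFire's default membership timeout is 5000 ms


def full_gc_pauses(log_text):
    """Return the pause duration in seconds of every Full GC entry."""
    pattern = re.compile(r"\[Full GC.*?(\d+\.\d+) secs\]")
    return [float(m.group(1)) for m in pattern.finditer(log_text)]


pauses = full_gc_pauses(GC_LOG)
# Any pause longer than the membership timeout can force the node out.
risky = [p for p in pauses if p > MEMBER_TIMEOUT_SECS]
print(pauses)  # [12.25, 7.54]
print(risky)   # both exceed the 5 s default, so the timeout must be raised
```

Both pauses exceed the 5-second default, which matches the diagnosis in the video.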