Cluster communication problems: Member has left cluster, or Member has been forcefully evicted from cluster, or A potential communication problem has been detected


This article applies to clustered Confluence 5.4 and earlier.

Symptoms

The Confluence cluster is not working as expected. For example, you cannot start more than one node without the new member being evicted from the cluster.

The following messages appear in atlassian-confluence.log:

2014-05-11 18:24:05,957 WARN [Logger@9233091 3.3.1/389] [Coherence] log 2014-05-11 18:24:05.957 Oracle Coherence GE 3.3.1/389 <Warning> (thread=PacketPublisher, member=2): A potential communication problem has been detected. A packet has failed to be delivered (or acknowledged) after 45 seconds, although other packets were acknowledged by the same cluster member (Member(Id=1, Timestamp=2014-05-11 18:17:48.519, Address=xxx.xxx.xxx.x:8090, MachineId=12345, Location=process:1234@CONFLUENCE01)) to this member (Member(Id=2, Timestamp=2014-05-11 18:23:16.19, Address=xxx.xxx.xxx.x:8090, MachineId=67891, Location=process:1234@CONFLUENCE02)) as recently as 0 seconds ago. It is possible that the packet size greater than 1468 is responsible; for example, some network equipment cannot handle packets larger than 1472 bytes (IPv4) or 1468 bytes (IPv6). Use the 'ping' command with the <size> option to verify successful delivery of specifically sized packets. Other possible causes include network failure, poor thread scheduling (see FAQ if running on Windows), an extremely overloaded server, a server that is attempting to run its processes using swap space, and unreasonably lengthy GC times.

2014-05-11 18:13:49,218 WARN [Logger@9226875 3.3.1/389] [Coherence] log 2014-05-11 18:13:49.218 Oracle Coherence GE 3.3.1/389 <Warning> (thread=PacketPublisher, member=2): Timeout while delivering a packet; the member appears to be alive, but exhibits long periods of unresponsiveness; removing Member(Id=1, Timestamp=2014-05-11 18:09:52.641, Address=xxx.xxx.xxx.x:8090, MachineId=41352, Location=process:1234@CONFLUENCE01)

2014-05-11 18:13:49,249 INFO [Cluster:EventDispatcher] [confluence.cluster.coherence.TangosolClusterManager] memberLeft Member has left cluster: Member(Id=1, Timestamp=2014-05-11 18:13:49.218, Address=xxx.xxx.xxx.x:8090, MachineId=12345, Location=process:1234@CONFLUENCE01)

2014-05-11 18:13:49,436 WARN [Logger@9226875 3.3.1/389] [Coherence] log 2014-05-11 18:13:49.436 Oracle Coherence GE 3.3.1/389 <Warning> (thread=Cluster, member=2): The member formerly known as Member(Id=1, Timestamp=2014-05-11 18:13:49.218, Address=xxx.xxx.xxx.x:8090, MachineId=12345, Location=process:1234@CONFLUENCE01) has been forcefully evicted from the cluster, but continues to emit a cluster heartbeat; henceforth, the member will be shunned and its messages will be ignored.
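
To check whether a log file contains these symptoms, you can search it for the message fragments quoted in the excerpts above. The following is a minimal sketch rather than an Atlassian tool; the function names and labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Substrings copied from the log excerpts above, mapped to short
# (hypothetical) labels used only by this sketch.
SYMPTOMS = {
    "A potential communication problem has been detected": "communication_problem",
    "Timeout while delivering a packet": "packet_timeout",
    "Member has left cluster": "member_left",
    "has been forcefully evicted from the cluster": "member_evicted",
}


def classify_line(line):
    """Return the labels of every symptom message found in one log line."""
    return [label for text, label in SYMPTOMS.items() if text in line]


def count_symptoms(lines):
    """Count how often each symptom appears across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        counts.update(classify_line(line))
    return counts
```

To use it, open your own atlassian-confluence.log and pass the file object to count_symptoms. A non-zero count for any label suggests the node is affected by the problem described in this article.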

 

Cause

There are multiple potential causes for this issue:

  1. The packet size is too large for the network configuration to handle
  2. Garbage collection
  3. Other environmental issues:
    1. Network failure
    2. A VM using swap space
    3. An otherwise overloaded server

Workaround

Start just one node and let it serve your users on its own while you investigate the root cause of the issue.

Resolution

Packet Size

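Run these commands from one cluster node against the other to confirm that larger packets are allowed through your network. Replace <other-node> with the address of the other node. The -l option sets the packet size on Windows; on Linux and macOS, use -s instead:

ping <other-node>
ping -l 1500 <other-node>
ping -l 3000 <other-node>

If any of these fail, ask your network administrators to allow larger packet sizes.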
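
The same check can be scripted so that it is easy to repeat from each node. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the sizes 1500 and 3000 come from the commands above, and 1468 is the size mentioned in the Coherence warning. It relies only on the standard size and count options of ping (-l and -n on Windows, -s and -c on Linux and macOS).

```python
import platform
import subprocess


def build_ping_command(host, size, system=None):
    """Build a single-probe ping command carrying `size` bytes of payload."""
    system = system or platform.system()
    if system == "Windows":
        # Windows: -n is the probe count, -l the payload size in bytes.
        return ["ping", "-n", "1", "-l", str(size), host]
    # Linux and macOS: -c is the probe count, -s the payload size in bytes.
    return ["ping", "-c", "1", "-s", str(size), host]


def check_packet_sizes(host, sizes=(1468, 1500, 3000)):
    """Ping `host` once per size; map each size to True if ping exited with 0."""
    results = {}
    for size in sizes:
        completed = subprocess.run(
            build_ping_command(host, size),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[size] = completed.returncode == 0
    return results
```

Call check_packet_sizes with the address of the other cluster node. Treat the exit code as a rough indicator only: if a size is reported as failing, rerun that ping by hand and read its output before involving your network administrators.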

Garbage Collection

  1. Enable GC logging as described in How to Enable Garbage Collection (GC) Logging
  2. Review the logs using a tool like GCViewer
  3. Raise a Support Request if you'd like Support to help you analyse the logs and determine whether GC pauses are causing the issue
  4. Follow these guidelines to reduce the size of your heap and bring the GC times down
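
For a quick first look before opening the logs in a viewer, you can list the longest pauses directly. The sketch below assumes the GC log was written by a HotSpot JVM with -XX:+PrintGCDetails, which ends each collection entry with a "[Times: user=... sys=..., real=... secs]" section. The function name and the 5-second default threshold are hypothetical; pick a threshold that makes sense for your environment.

```python
import re

# Matches the wall-clock part of a HotSpot "[Times: ... real=1.23 secs]" entry.
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def long_pauses(lines, threshold_secs=5.0):
    """Return (line_number, seconds) for every GC entry at or above the threshold."""
    found = []
    for number, line in enumerate(lines, start=1):
        for match in REAL_TIME.finditer(line):
            seconds = float(match.group(1))
            if seconds >= threshold_secs:
                found.append((number, seconds))
    return found
```

Pass an open GC log file to long_pauses. Pauses lasting many seconds leave a node unable to acknowledge cluster packets for that time, so their timestamps are worth comparing with the Coherence warnings in atlassian-confluence.log.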

Other environmental issues

Get your network and infrastructure administrators to investigate the current state of the network and the server itself.

Last updated: February 26, 2016
