How to Configure a Cluster Without Multicast Traffic
This article applies to clustered Confluence 5.4 or earlier. It does not apply to Confluence Data Center.
If you are using Confluence Data Center 5.9+ and are looking to switch from multicast to unicast for node discovery, please refer instead to: Change Node Discovery from Multicast to TCP/IP or AWS
A node in a cluster will fail to rejoin after failure, yet does not form a new cluster (i.e. no cluster panic occurs). WARN level logging shows:
Tangosol Coherence AE 3.2/365 (RC3) <Warning> (thread=Cluster, member=n/a): This Member(Id=0, Timestamp=2007-08-08 03:25:56.418, Address=10.254.228.126:8088, MachineId=4222) has been attempting to join the cluster at address 220.127.116.11:32365 with TTL 1 for 36 seconds without success; this could indicate a mis-configured TTL value, or it may simply be the result of a busy cluster or active failover.
Tangosol Coherence AE 3.2/365 (RC3) <Warning> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster that does not respond to join requests; this is usually caused by a network layer failure:
Use Wireshark or Snoop to capture UDP traffic from the rejoining node and the senior node in the cluster. The rejoining node shows the broadcast of multicast traffic, yet the senior node shows no such traffic arriving from this node.
The network 'cloud' is dropping multicast traffic from the rejoining node, yet is allowing it from the senior node.
Configure Coherence to use a set of well-known addresses. This means the members of your cluster are statically defined: you can't swap in nodes other than those you list (four in the example below), unless the new node is a straight replacement with the same IP address as the node it replaces. You also can't add nodes beyond your original configuration without stopping the cluster. This fixes the problem because multicast becomes unnecessary: when each node already knows the addresses of all the others, autodiscovery is no longer needed. Thus, only unicast (TCP) is used in communication between them.
The following procedure must be done on each node.
- Extract tangosol-coherence-override.xml from <confluence home>/confluence/WEB-INF/lib/confluence-x.y.jar. See How to edit files in Confluence JAR files.
- In tangosol-coherence-override.xml, comment out the multicast-listener element:
<!--
<multicast-listener>
  <time-to-live system-property="tangosol.coherence.ttl">0</time-to-live>
  <address system-property="tangosol.coherence.clusteraddress">18.104.22.168</address>
</multicast-listener>
-->
- Immediately following the commented-out element, add a new unicast-listener element like this:
<unicast-listener>
  <well-known-addresses>
    <socket-address id="1">
      <address>IP address of wiki01</address>
      <port>8088</port>
    </socket-address>
    <socket-address id="2">
      <address>IP address of wiki02</address>
      <port>8088</port>
    </socket-address>
    <socket-address id="3">
      <address>IP address of wiki03</address>
      <port>8088</port>
    </socket-address>
    <socket-address id="4">
      <address>IP address of wiki04</address>
      <port>8088</port>
    </socket-address>
  </well-known-addresses>
  <address>IP address of node you are making this change on</address>
  <port>8088</port>
</unicast-listener>
Use the IP address of each node, not its hostname. The address specified after the well-known-addresses element must be the IP address (not the node name) of the host whose configuration you are editing.
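Because the same element has to be written on every node with only the final address changed, it can help to generate it rather than hand-edit it. The following is a minimal sketch, not part of the official procedure: the function name build_unicast_listener and the 192.0.2.x addresses are hypothetical, and it assumes every node listens on port 8088 as in the example above.

```python
import ipaddress
import xml.etree.ElementTree as ET


def build_unicast_listener(node_ips, local_ip, port=8088):
    """Return <unicast-listener> XML text for one node (hypothetical helper)."""
    # ipaddress.ip_address raises ValueError for anything that is not a
    # literal IP, which enforces the "IP address, not name" rule.
    for ip in list(node_ips) + [local_ip]:
        ipaddress.ip_address(ip)

    listener = ET.Element("unicast-listener")
    wka = ET.SubElement(listener, "well-known-addresses")
    # One socket-address per cluster member, numbered from 1.
    for index, ip in enumerate(node_ips, start=1):
        sock = ET.SubElement(wka, "socket-address", id=str(index))
        ET.SubElement(sock, "address").text = ip
        ET.SubElement(sock, "port").text = str(port)
    # The trailing address/port pair identifies the node being configured.
    ET.SubElement(listener, "address").text = local_ip
    ET.SubElement(listener, "port").text = str(port)

    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(listener, space="  ")
    return ET.tostring(listener, encoding="unicode")


if __name__ == "__main__":
    # Placeholder addresses; substitute the real IPs of wiki01..wiki04.
    nodes = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
    print(build_unicast_listener(nodes, local_ip="192.0.2.2"))
```

Run it once per node with that node's own address as local_ip, then paste the output after the commented-out multicast-listener element.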
- Re-jar and re-deploy the file. Again see How to edit files in Confluence JAR files.
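The linked article describes the supported way to edit files inside the JAR. As a rough illustration of what re-jarring involves, here is a sketch that relies only on the fact that a JAR is a ZIP archive. The helper name replace_in_jar and the file names in the usage example are hypothetical, and the location of tangosol-coherence-override.xml inside the JAR is an assumption you should check by listing the archive first.

```python
import zipfile


def replace_in_jar(jar_path, member, new_bytes, out_path):
    """Copy jar_path to out_path, replacing the content of one member."""
    replaced = False
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == member:
                # Write the edited file under the original entry name.
                dst.writestr(info, new_bytes)
                replaced = True
            else:
                # Copy every other entry unchanged.
                dst.writestr(info, src.read(info.filename))
    if not replaced:
        raise KeyError(member)
    return out_path


if __name__ == "__main__":
    # Hypothetical file names; keep a backup of the original JAR.
    with open("tangosol-coherence-override.xml", "rb") as edited:
        replace_in_jar("confluence-x.y.jar",
                       "tangosol-coherence-override.xml",
                       edited.read(),
                       "confluence-x.y-patched.jar")
```

Writing to a new file instead of modifying the JAR in place leaves the original available for rollback; stop the node before swapping the patched JAR in.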