How to run a TCP network test between Confluence or Jira Data Center Nodes
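Platform notice: Data Center - This article applies to Atlassian products on the Data Center platform.
This knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for features that are not Data Center-specific may also work on Server versions of the product, but they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, see the Atlassian Server end of support announcement page to review your migration options.
*Except Fisheye and Crucible
Purpose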
When troubleshooting Hazelcast network connectivity between Data Center nodes, a predefined test script is useful for capturing consistent, real-world data that monitoring software may not catch.
The script below runs unattended and captures data for later analysis by support.
Do not use this script (proactively or otherwise) unless support explicitly requests it.
Environment
- Confluence Data Center
- Jira Data Center
- This test requires that your Data Center environment is configured to use TCP/IP for Node Discovery, as detailed in Change Node Discovery from Multicast to TCP/IP or AWS.
- This test also requires an open port for communication. A port other than the Hazelcast port is recommended (e.g. port 8888).
Details
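Before starting the long-running test, it can help to confirm that the chosen port is reachable from each node. The following is a minimal sketch, not part of the original script: the function name check_port is hypothetical, and it assumes an nc build that supports the -z (scan without sending data) and -w (timeout) options, which OpenBSD and traditional netcat do but some other builds may not. A listener must already be running on the target node for the check to succeed.

```shell
#!/bin/bash
# Hypothetical helper: report whether a TCP port on a remote node accepts connections.
# Assumes nc supports -z (connect only, send nothing) and -w (timeout in seconds).
check_port() {
  local host=$1 port=${2:-8888}
  if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo "$host:$port reachable"
  else
    echo "$host:$port NOT reachable"
    return 1
  fi
}

# Example (placeholder IP taken from the article's script):
#   check_port 10.0.0.2 8888
```

A "NOT reachable" result can mean a firewall is blocking the port or simply that no listener is running yet on the target node.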
This script uses the "nc" (netcat) command to test a TCP connection between two servers. In essence, one server sends a message to a remote server on a particular port, and the receiving server listens on that port and records what it receives to a file. The test needs to run continuously until a cluster incident occurs, after which the script output should be provided to support for analysis.
If the TCP stream fails (i.e. one node stops successfully sending TCP packets to another node), the script gets stuck on the sending node. The other nodes stop logging entries from the stuck node, and the timestamps in the output files on the receiving nodes show when the failure occurred. Each sending node produces one entry per second on each receiving node.
Note the scope of this test: it tracks one TCP stream between each pair of nodes. It does not track all TCP traffic, such as the connections Confluence uses for Hazelcast or other functions. In other words, if the script captures a failure, there are connection issues over TCP. The inverse is not true: if the script does not show a failure, that does not prove that all TCP connections are fine.
Steps to implement the TCP test script
Download the script tcptest.sh.zip, which contains the below:
#!/bin/bash
# TCP test for Confluence Data Center
# This script tests ongoing network communications on port '8888'
# (alternate from the hazelcast port 5801 to identify lower layer issues)

# local node's IP (use the IP assigned to the interface configured in confluence.cfg.xml)
localip=10.0.0.1

# Each cluster member node IP (found in confluence.cfg.xml)
remoteip1=10.0.0.2
remoteip2=10.0.0.3
remoteip3=10.0.0.4
remoteip4=10.0.0.5

#time in seconds between executions
sleep_duration=1

# netcat test:
while :
do
  echo "From $localip - $(date)" | nc $remoteip1 8888
  echo "From $localip - $(date)" | nc $remoteip2 8888
  echo "From $localip - $(date)" | nc $remoteip3 8888
  echo "From $localip - $(date)" | nc $remoteip4 8888
  sleep $sleep_duration
done
On Ubuntu, it may be necessary to use "nc -N" in place of "nc" in the script so that each connection is closed after the message is sent.
- Extract and edit the tcptest.sh file, setting the "localip" value to the IP of the node you are running it on (i.e. replace the 10.0.0.1 placeholder).
- Modify the remoteip values to match the cluster member IPs defined in the confluence.cfg.xml file.
- Double-check that those four IPs correctly represent each of the 4 nodes.
- The node count depends on the environment. Add or remove remoteip entries as necessary, along with the corresponding "echo" lines in the while loop.
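Because the node count varies, one way to avoid editing both the variable list and the loop body is to keep the remote IPs in a bash array. The following is a sketch of that variation, not the script shipped in tcptest.sh.zip: the array name remoteips, the function send_round, the port variable, and the --run argument are all hypothetical names introduced here, and the IPs are the same placeholders used above.

```shell
#!/bin/bash
# Sketch of an array-based variant of the sender (names are illustrative assumptions).
localip=10.0.0.1                                  # this node's IP (placeholder)
remoteips=(10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5)   # one entry per cluster member
port=8888
sleep_duration=1                                  # seconds between rounds

# Send one timestamped message to every node in the list.
send_round() {
  local ip
  for ip in "${remoteips[@]}"; do
    echo "From $localip - $(date)" | nc "$ip" "$port"
  done
}

# Only loop forever when explicitly asked to, e.g. "./tcptest-array.sh --run &"
if [ "${1:-}" = "--run" ]; then
  while :; do
    send_round
    sleep "$sleep_duration"
  done
fi
```

With this layout, adding or removing a node means changing only the remoteips line. As with the original script, a stalled connection still blocks the loop, which is what makes the gap visible on the receiving nodes.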
Run this script, which begins sending messages to each of the other nodes over TCP:
tcptest.sh &
In a separate shell session, run the following command. It listens continuously for traffic on port 8888 (make sure this port is open on each node) and writes what it receives to a file called "node1.out". On the other nodes, name the output file after the node (e.g. "node2.out", "node3.out"):
nc -k -l 8888 > node1.out &
- Repeat this on each of the other nodes, ensuring the scripts are updated to reflect the local node IP and node name.
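The messages carry only the sender's timestamp. If support also wants to know when each message arrived, the listener output can be piped through a small filter that prefixes the local receive time. This is an optional sketch and an assumption on our part, not part of the documented procedure: stamp_lines is a hypothetical function name, and it changes the format of the output file, so confirm with support before using it.

```shell
#!/bin/bash
# Hypothetical filter: prefix every received line with the local receive time.
stamp_lines() {
  local line
  while IFS= read -r line; do
    printf '%s | %s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$line"
  done
}

# Example on node 1 (same listener command, with one extra pipe stage):
#   nc -k -l 8888 | stamp_lines > node1.out &
```

Each line then contains both timestamps, which makes delayed delivery distinguishable from a sender that stopped sending.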
On each node there should be two running processes:
- One is the tcptest.sh script, which sends messages to all four nodes (including itself)
- The other is the listening command, which listens on 8888 and outputs whatever it receives into a file
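Since both processes need to survive for days, it may be worth starting them with nohup and recording their PIDs so they can be checked later. The helpers below are a sketch under our own assumptions: the function names start_tracked and is_running and the PID file names are hypothetical, and error output from the started command is discarded.

```shell
#!/bin/bash
# Hypothetical helpers for keeping track of the two long-running processes.

# usage: start_tracked <pidfile> <outfile> <command> [args...]
# Starts the command in the background, immune to hangups, and records its PID.
start_tracked() {
  local pidfile=$1 outfile=$2
  shift 2
  nohup "$@" > "$outfile" 2>/dev/null &
  echo $! > "$pidfile"
}

# usage: is_running <pidfile>
# Succeeds only if the PID file exists and that process is still alive.
is_running() {
  local pidfile=$1 pid
  [ -r "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example on node 1:
#   start_tracked listener.pid node1.out nc -k -l 8888
#   start_tracked sender.pid /dev/null ./tcptest.sh
#   is_running listener.pid && is_running sender.pid && echo "both running"
```

A failed is_running check on the sender does not by itself indicate a network fault; a stuck sender is still alive, so the output files remain the primary evidence.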
Keep both processes running continuously on all nodes. Once a day or two of data has been collected, support can review the output from all 4 nodes (node1.out, node2.out, node3.out, node4.out) for gaps in the timestamps, which indicate a possible network issue.
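To get a first impression of the data before sending it to support, the gap check can be roughly automated. The sketch below is illustrative only: find_gaps is a hypothetical function, it assumes GNU date (for "date -d") and that the sender's date output is in a format GNU date can parse back, which holds for the default C locale format but may not for others. The 5-second default threshold is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: report gaps between consecutive entries from one sender.
# usage: find_gaps <output file> <sender ip> [threshold in seconds, default 5]
# Assumes GNU date and lines containing "From <ip> - <date output>".
find_gaps() {
  local file=$1 sender=$2 threshold=${3:-5}
  local prev_epoch="" prev_stamp="" line stamp epoch
  while IFS= read -r line; do
    case $line in
      *"From $sender - "*) ;;        # entry from the sender we care about
      *) continue ;;                 # entry from another node, skip
    esac
    stamp=${line#*"From $sender - "}
    epoch=$(date -d "$stamp" +%s 2>/dev/null) || continue   # skip unparsable lines
    if [ -n "$prev_epoch" ] && [ $((epoch - prev_epoch)) -gt "$threshold" ]; then
      echo "gap of $((epoch - prev_epoch))s from $sender: $prev_stamp -> $stamp"
    fi
    prev_epoch=$epoch
    prev_stamp=$stamp
  done < "$file"
}

# Example: find_gaps node1.out 10.0.0.2 5
```

Run it once per sender IP against each node's output file. It calls date once per line, so it is slow on large files, and a clean result does not replace review by support.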
Additional Confluence Hazelcast Logging
When using the above script to troubleshoot network issues, support may also advise enabling health monitoring diagnostics for Hazelcast, which can be useful in the event of a network incident.
This must be applied to each node individually and takes effect after a restart:
- Add the following JVM startup parameter to <confluence_install>/bin/setenv.sh:
-Dhazelcast.health.monitoring.level=NOISY
- Add the following logging configuration to <confluence_install>/confluence/WEB-INF/classes/log4j.properties:
log4j.logger.com.hazelcast.util.HealthMonitor=TRACE
The additional logging will be output to the Confluence logging location (normally <confluence_home>/logs/atlassian-confluence.log) from each node.
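As a rough illustration of the two steps above, the JVM parameter is typically added to setenv.sh by extending CATALINA_OPTS, following the pattern of the other lines in that file (an assumption about your setenv.sh; adapt it to what is already there). After the restart, a quick count of matching log lines can confirm the logging is active. The function count_health_entries is a hypothetical helper, and it assumes the logger name appears in each log line.

```shell
#!/bin/bash
# Line to add to <confluence_install>/bin/setenv.sh (shown as a comment, not executed here):
#   CATALINA_OPTS="-Dhazelcast.health.monitoring.level=NOISY ${CATALINA_OPTS}"

# Hypothetical helper: count log lines that mention the Hazelcast HealthMonitor logger.
# Prints the count; exits non-zero when the count is 0 (grep -c behaviour).
count_health_entries() {
  grep -c 'HealthMonitor' "$1"
}

# Example:
#   count_health_entries <confluence_home>/logs/atlassian-confluence.log
```

A count of zero after a restart suggests that one of the two settings was not picked up on that node.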