How to run a TCP network test between Confluence or Jira Data Center Nodes


This Knowledge Base article was written specifically for Atlassian Data Center applications. The contents of this article do not apply to Server installations or the Atlassian Cloud platform.

Purpose

When troubleshooting Hazelcast network connectivity between Data Center nodes, a predefined test script is useful for capturing consistent, real-world information that monitoring software may not catch.

The script below can be left running unattended to capture data for later analysis by support.

This script should not be used (proactively or otherwise) unless explicitly requested by support.

Environment

  • Confluence Data Center 
  • Jira Data Center
  • This test requires that your Data Center environment is configured to use TCP/IP for Node Discovery, as detailed in Change Node Discovery from Multicast to TCP/IP or AWS.
  • This test also requires an open port for its communications. A port other than the Hazelcast port is recommended (e.g. port 8888).
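
Once a listener is running on a node, it can help to confirm from the other nodes that the chosen port is actually reachable. The following is a minimal sketch and not part of the script provided in this article: the function name `port_open` is hypothetical, and it assumes bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command are available.

```shell
#!/bin/bash
# Hypothetical helper (not part of tcptest.sh): returns 0 if a TCP connection
# to host:port can be opened within 3 seconds, non-zero otherwise.
port_open() {
  local host=$1 port=$2
  # bash treats /dev/tcp/HOST/PORT as a request to open a TCP connection
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example usage (10.0.0.2 and 8888 are the placeholder values used in this article):
#   port_open 10.0.0.2 8888 && echo "reachable" || echo "not reachable"
```

A "not reachable" result only means the connection could not be opened within the timeout. It does not distinguish between a firewall rule, a missing listener, or a routing problem.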

Details

The idea behind this script is to use the "nc" (netcat) command, which allows us to test a TCP connection between two servers. In essence, one server sends a message to a remote server on a particular port, and the receiving server is configured to listen on that port and record the output to a file. This needs to run continuously until a cluster incident occurs, after which the script output should be provided to support for analysis.

If there is a failure in the TCP stream (i.e. one node stops sending TCP packets successfully to another node), then the script will be stuck on the sending node. The other nodes will stop logging entries from the stuck node, and by analyzing the timestamps in the output of the receiving nodes we can see when the failure occurred. With the default settings, the script sends one entry per second to each node.
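
If a record on the sending side is also wanted, one possible variation is to bound each send with a timeout and log failed sends locally. This is only a sketch and not the script support provides: `send_probe` and `tcptest_send_failures.log` are hypothetical names, and the 5-second value passed to netcat's `-w` timeout option is an assumption. It also changes the behaviour described above, because the sender no longer hangs on a failed connection, so check with support before using it in place of the provided script.

```shell
#!/bin/bash
# Hypothetical variation on the sender: record failed sends in a local file.
localip=10.0.0.1                      # placeholder, as in tcptest.sh
faillog=tcptest_send_failures.log     # hypothetical file name

send_probe() {
  local remote=$1 port=${2:-8888}
  # -w 5: give up after 5 seconds instead of blocking indefinitely (assumed value)
  if ! echo "From $localip - $(date)" | nc -w 5 "$remote" "$port"; then
    echo "$(date) - send to $remote:$port failed" >> "$faillog"
    return 1
  fi
}

# Example usage: send_probe 10.0.0.2
```

The receiving nodes still show a gap for every message that did not arrive, so the analysis of their output files is unchanged.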

It is important to note the scope of this test: it is designed to track one TCP stream between each pair of nodes. It does not track all TCP traffic, including the connections Confluence uses for other functions. In other words, if the script captures a failure, that shows there are connection issues over TCP. However, the inverse is not true: if it does not show a failure, that does not prove that all TCP connections are fine.

Steps to implement the TCP test script

  1. Download the script tcptest.sh.zip, which contains the below: 

    #!/bin/bash
    # TCP test for Confluence Data Center
    # This script tests ongoing network communications on port '8888' (alternate from the hazelcast port 5801 to identify lower layer issues)
    
    
    # local node's IP (use the IP assigned to the interface configured in confluence.cfg.xml)
    localip=10.0.0.1
    
    
    # Each cluster member node IP (found in confluence.cfg.xml)
    remoteip1=10.0.0.2
    remoteip2=10.0.0.3
    remoteip3=10.0.0.4 
    remoteip4=10.0.0.5
    
    
    #time in seconds between executions
    sleep_duration=1
    
    
    # netcat test:
    while : 
    do 
      echo "From $localip - $(date)" | nc $remoteip1 8888 
      echo "From $localip - $(date)" | nc $remoteip2 8888 
      echo "From $localip - $(date)" | nc $remoteip3 8888 
      echo "From $localip - $(date)" | nc $remoteip4 8888 
      sleep $sleep_duration 
    done 


    1. For Ubuntu, it may be necessary to use nc -N to ensure connections are closed after sending the message:

      # netcat test:
      while :
      do
        echo "From $localip - $(date)" | nc -N $remoteip1 8888
        echo "From $localip - $(date)" | nc -N $remoteip2 8888
        echo "From $localip - $(date)" | nc -N $remoteip3 8888
        echo "From $localip - $(date)" | nc -N $remoteip4 8888
        sleep $sleep_duration
      done
    2. For Windows, use the following PowerShell script:

      # Powershell TCP test for Confluence Data Center
      
      #List of IPs:
      $local_host_IP="10.0.0.1"
      $remote_host_IP_A="10.0.0.2"
      $remote_host_IP_B="10.0.0.3"
      $remote_host_IP_C="10.0.0.4"
       
      #Create an output file to write to:
      $path_to_TCP_file="C:\Users\atlassian\Documents\Horlle\TCP_Test_File.txt"
      New-Item $path_to_TCP_file -type file
       
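      #Actual command loop
      #Note: without -Port, Test-NetConnection sends an ICMP ping; add -Port <port> to test a TCP connection instead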
      while ($true)
      {
      Write-Output "Test Connection from $local_host_IP at $(Get-Date) to $remote_host_IP_A - $(Test-NetConnection -ComputerName $remote_host_IP_A -InformationLevel Quiet)" | Out-File -FilePath $path_to_TCP_file -Append
      Write-Output "Test Connection from $local_host_IP at $(Get-Date) to $remote_host_IP_B - $(Test-NetConnection -ComputerName $remote_host_IP_B -InformationLevel Quiet)" | Out-File -FilePath $path_to_TCP_file -Append
      Write-Output "Test Connection from $local_host_IP at $(Get-Date) to $remote_host_IP_C - $(Test-NetConnection -ComputerName $remote_host_IP_C -InformationLevel Quiet)" | Out-File -FilePath $path_to_TCP_file -Append
      Start-Sleep -Seconds 3
      }
  2. Extract and edit the tcptest.sh file to set the "localip" value to the IP of the node you are running this on (i.e. replace the 10.0.0.1 placeholder).
  3. Modify the remoteip values to match the cluster member IPs defined in the confluence.cfg.xml file.
    1. Double-check that those four IPs correctly represent each of the 4 nodes.
    2. The node count depends on the environment. Add or remove remoteip variables as necessary, along with the corresponding "echo" line in the while loop.
  4. Run this script, which will begin sending messages to each of the other nodes over TCP (if the file is not executable, run chmod +x tcptest.sh first):

    ./tcptest.sh &
  5. In a separate shell session, run the following command. It listens continuously on port 8888 (make sure this port is open on each node) and writes whatever it receives to a file called "node1.out". On the other nodes, name the output file after the node ("node2.out", "node3.out", and so on):

    nc -k -l 8888 > node1.out &
  6. Repeat this on each of the other nodes, ensuring the scripts are updated to properly reflect the local node IP and node name.
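
Because the node count varies between environments, one way to avoid editing both the remoteip variables and the matching "echo" lines is to keep the addresses in a bash array. The following is a minimal sketch of that restructuring. It is an illustrative alternative rather than the downloadable script, and the `remoteips` array, `port` variable, and `send_round` function are hypothetical names.

```shell
#!/bin/bash
# Illustrative restructuring of tcptest.sh using an array (hypothetical names).
localip=10.0.0.1                                   # IP of this node (placeholder)
remoteips=(10.0.0.2 10.0.0.3 10.0.0.4 10.0.0.5)    # one entry per cluster member
port=8888
sleep_duration=1

# Send one timestamped message to every address in remoteips.
send_round() {
  local ip
  for ip in "${remoteips[@]}"; do
    echo "From $localip - $(date)" | nc "$ip" "$port"   # use 'nc -N' on Ubuntu
  done
}

# To run continuously, as the original script does, uncomment:
# while :; do send_round; sleep "$sleep_duration"; done
```

With this layout, adding or removing a node only requires changing the `remoteips` line.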


On each node there should be two running processes:

  1. One is the tcptest.sh script, which sends out messages to all four nodes (including itself)
  2. The other is the listening command, which listens on 8888 and outputs whatever it receives into a file

Warning: Please keep these running continuously on all nodes. Once a day or two's worth of data has been collected, support can review the output from all 4 nodes (node1.out, node2.out, node3.out, node4.out) daily for gaps in the timestamps, which indicate a possible network issue.
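
The gap check described above can be sketched as a small helper that scans one output file and reports, per sending node, any interval between consecutive entries that exceeds a threshold. This is an illustration only, not a tool support uses: `find_gaps` is a hypothetical name, the 5-second default threshold is an assumption, and the script assumes bash 4 or later, GNU `date` (for `date -d`), and lines in the "From <ip> - <date output>" format written by tcptest.sh.

```shell
#!/bin/bash
# Hypothetical helper: report gaps between consecutive messages from each sender.
# Usage: find_gaps FILE [MAX_GAP_SECONDS]
find_gaps() {
  local file=$1 max_gap=${2:-5}
  local re='^From ([^ ]+) - (.+)$'
  local line ip stamp epoch gap
  declare -A last_epoch last_stamp     # keyed by sender IP
  while IFS= read -r line; do
    [[ $line =~ $re ]] || continue     # skip lines that are not probe messages
    ip=${BASH_REMATCH[1]}
    stamp=${BASH_REMATCH[2]}
    # Convert the timestamp to epoch seconds; skip it if date cannot parse it
    epoch=$(date -d "$stamp" +%s 2>/dev/null) || continue
    if [[ -n ${last_epoch[$ip]:-} ]]; then
      gap=$(( epoch - ${last_epoch[$ip]} ))
      if (( gap > max_gap )); then
        echo "$ip: gap of ${gap}s between '${last_stamp[$ip]}' and '$stamp'"
      fi
    fi
    last_epoch[$ip]=$epoch
    last_stamp[$ip]=$stamp
  done < "$file"
}

# Example usage: find_gaps node1.out 5
```

Timestamps that GNU `date` cannot parse (for example, some time zone abbreviations or non-English locales) are skipped silently, so an empty result should be confirmed by inspecting the file directly.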

Additional Confluence Hazelcast Logging

When using the above script to troubleshoot network issues, support may also advise enabling health monitoring diagnostics for Hazelcast, which may be useful in the event of a network incident.

This will need to be applied individually to each node, and will be picked up on a restart:

  1. Add the following JVM startup parameter to <confluence_install>/bin/setenv.sh:
    -Dhazelcast.health.monitoring.level=NOISY


  2. Add the following logging configuration to <confluence_install>/confluence/WEB-INF/classes/log4j.properties:
    log4j.logger.com.hazelcast.util.HealthMonitor=TRACE
    

The additional logging will be output to the Confluence logging location (normally <confluence_home>/logs/atlassian-confluence.log) on each node.
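
Since both settings must be applied on every node, a quick check before restarting can catch a node that was missed. The following is a sketch under stated assumptions: `check_hazelcast_logging` is a hypothetical helper name, and it only searches the two files named above for the exact strings from this article.

```shell
#!/bin/bash
# Hypothetical helper: check that both diagnostic settings are present on this node.
# Usage: check_hazelcast_logging /path/to/confluence_install
check_hazelcast_logging() {
  local install_dir=$1 status=0
  if ! grep -qF -- '-Dhazelcast.health.monitoring.level=NOISY' \
        "$install_dir/bin/setenv.sh" 2>/dev/null; then
    echo "missing JVM parameter in bin/setenv.sh"
    status=1
  fi
  if ! grep -qF 'log4j.logger.com.hazelcast.util.HealthMonitor=TRACE' \
        "$install_dir/confluence/WEB-INF/classes/log4j.properties" 2>/dev/null; then
    echo "missing logger in log4j.properties"
    status=1
  fi
  return "$status"
}
```

The check is a plain text search, so a line that has been commented out still counts as present. It also cannot tell whether the node has been restarted since the change.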


Description

When troubleshooting Hazelcast network connectivity between Confluence Data Center nodes, a predefined test script is useful for capturing consistent, real-world information that monitoring software may not catch.

Product: Confluence Data Center

Last updated: June 27, 2019
