Cluster Cache replication health check fails with error SocketException: Broken pipe exception

お困りですか?

アトラシアン コミュニティをご利用ください。

コミュニティに質問

 

プラットフォームについて: Data Center のみ - この記事は、Data Center プラットフォームのアトラシアン製品にのみ適用されます。

この KB は Data Center バージョンの製品用に作成されています。Data Center 固有ではない機能の Data Center KB は、製品のサーバー バージョンでも動作する可能性はありますが、テストは行われていません。サーバー*製品のサポートは 2024 年 2 月 15 日に終了しました。サーバー製品を利用している場合は、アトラシアンのサーバー製品のサポート終了のお知らせページにて移行オプションをご確認ください。

*Fisheye および Crucible は除く

要約


Cluster Cache replication health check fails and the nodes cannot communicate with each other to replicate cache.


Name: Cluster Cache Replication
NodeId: null
Is healthy: false
Failure reason: ["The node node3 is not replicating","The node node2 is not replicating"]
Severity: CRITICAL


Exception from the atlassian-jira.log:

2021-06-04 18:26:25,830+0000 localq-reader-12 ERROR      [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat', action=PUT, key=node2, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1622831185825} from cache replication queue: [queueId=queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put, queuePath=/var/atlassian/application-data/jira-home/localq/queue_node1_2_164546f60261c7e4be0c5f5f9aaeec86_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.MarshalException: error marshalling arguments; nested exception is: 
    	java.net.SocketException: Broken pipe (Write failed)
com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpSender$UnrecoverableFailure: java.rmi.MarshalException: error marshalling arguments; nested exception is: 
	java.net.SocketException: Broken pipe (Write failed)
	at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:90)
	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.run(LocalQCacheOpReader.java:96)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.rmi.MarshalException: error marshalling arguments; nested exception is: 
	java.net.SocketException: Broken pipe (Write failed)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:158)
	at net.sf.ehcache.distribution.RMICachePeer_Stub.put(Unknown Source)
	at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.lambda$send$2(LocalQCacheOpRMISender.java:69)
	at com.atlassian.jira.cluster.distribution.localq.rmi.CachingRMICachePeerManager.withCachePeer(CachingRMICachePeerManager.java:94)
	... 6 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at java.io.ObjectOutputStream$BlockDataOutputStream.writeByte(ObjectOutputStream.java:1915)
	at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1576)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:351)
	at sun.rmi.server.UnicastRef.marshalValue(UnicastRef.java:294)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:153)


原因

The nodes are unable to communicate with each other due to a loop in the IP configuration of the system.

診断

Verify if the IP entries are correct under /etc/hosts. Ensure there are no duplicated entries pointing to the node with both external and internal IP addresses:

127.0.1.1 ip-node-1
127.0.0.1 ip-node-1

ソリューション

  1. Remove the nodes' hostname  from the local loopback IP 127.0.0.1 – Leaving only its external IP entries
  2. Repeat this for all nodes
  3. Jira を再起動します。
  4. Check if cluster cache replication health is resolved.


You may need to check if the IP was statically defined in other settings of the system, such as:

JIRA_LOCAL_HOME/cluster.properties, entry:

ehcache.listener.hostName

JIRA_INSTALL/bin/setenv.sh, entry:

-Djava.rmi.server.hostname



最終更新日 2021 年 7 月 29 日

この内容はお役に立ちましたか?

はい
いいえ
この記事についてのフィードバックを送信する
Powered by Confluence and Scroll Viewport.