Instance health checks report that at least one node in the Jira Data Center cluster is not replicating, even though it is replicating successfully

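Platform notice: Data Center only. This article applies only to Atlassian products on the Data Center platform.

This KB was written for the Data Center version of the product. Data Center KBs for features that are not specific to Data Center may also work on the Server version of the product, but this has not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, see the Atlassian Server end of support announcement page for your migration options.

*Except Fisheye and Crucible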

Summary

The instance health checks report that at least one node in the Jira Data Center cluster is not replicating, even though that node is replicating successfully.

Environment

Any Jira 8.x version
Data Center only

Diagnosis

  • The Jira application logs of one of the healthy nodes show the health check reporting that one or more Jira nodes are not replicating:

    grep -h 'is not replicating' atlassian-jira.log* | sort
    2021-09-07 19:34:52,751+0000 Caesium-1-1 ERROR ServiceRunner     [c.a.t.healthcheck.concurrent.SupportHealthCheckProcess] Health check 'Cluster Cache Replication' failed with severity 'critical': 'The node problematic-node-ID is not replicating'
    2021-09-07 20:34:52,706+0000 Caesium-1-3 ERROR ServiceRunner     [c.a.t.healthcheck.concurrent.SupportHealthCheckProcess] Health check 'Cluster Cache Replication' failed with severity 'critical': 'The node problematic-node-ID is not replicating'
    2021-09-07 21:34:52,768+0000 Caesium-1-1 ERROR ServiceRunner     [c.a.t.healthcheck.concurrent.SupportHealthCheckProcess] Health check 'Cluster Cache Replication' failed with severity 'critical': 'The node problematic-node-ID is not replicating'
  • However, the same logs show that cache replication to the "problematic node" (the one the health checks are reporting) completes successfully:

    2021-09-08 08:21:32,287+0000 localq-stats-0 INFO      [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [scheduled] Running cache replication queue stats for: 20 queues...
    2021-09-08 08:00:27,926+0000 localq-stats-0 INFO      [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [VIA-INVALIDATION] Cache replication queue stats per node: problematic-node-ID snapshot stats:
    ...
    2021-09-08 08:00:27,927+0000 localq-stats-0 INFO      [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [VIA-COPY] Cache replication replicatePutsViaCopy-queue stats per node: problematic-node-ID snapshot stats:
    ...
    2021-09-08 08:21:32,289+0000 localq-stats-0 INFO      [c.a.j.c.distribution.localq.LocalQCacheManager] [LOCALQ] [scheduled] ... done running cache replication queue stats for: 20 queues.
  • A new Jira ticket created while logged in directly to the "problematic node" can be found and opened when logged in directly to any other "healthy" node, which is another indication that replication is working properly.
  • Running a telnet command between all the Jira nodes of the cluster, using their hostname/IP address and the ehcache ports (configured in <JIRA_HOME>/cluster.properties on each node), confirms that all the nodes can communicate with each other.
  • On the Clustering page in ⚙ > System, the application status of the "problematic node" might be empty.
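
The connectivity test from the telnet step can also be scripted. The following is a minimal Python sketch, not an Atlassian tool: it attempts a plain TCP connection to each host and port pair, which is roughly what a manual telnet session verifies. The function names (can_connect, check_nodes) and the hosts and ports in the usage comment are hypothetical placeholders; take the real values from <JIRA_HOME>/cluster.properties on each node.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the "with" block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_nodes(hosts, ports, timeout: float = 3.0) -> dict:
    """Map every (host, port) pair to whether it is reachable from this machine."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in hosts
        for port in ports
    }


# Example usage (hypothetical hostnames and ports, replace with your own):
#
#   results = check_nodes(["jira-node-1.example.com", "jira-node-2.example.com"],
#                         [40001, 40011])
#   for (host, port), ok in sorted(results.items()):
#       print(f"{host}:{port} {'reachable' if ok else 'UNREACHABLE'}")
```

This only tests reachability from the machine it runs on, so it would need to be run on every node to cover all directions, as with the telnet check. A successful TCP connection shows that the port is open, not that cache replication itself is healthy.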

Cause

We have seen situations where the health check reports false positives about cluster cache replication for some nodes. The exact root cause of this issue is currently unknown.

Solution

Schedule a maintenance window and restart the "problematic" Jira node. After the restart, the health checks should stop reporting this node.
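
To confirm that the health check has stopped failing after the restart, the application log of a healthy node can be scanned for failures logged later than the restart time. The sketch below assumes the log line format shown in the Diagnosis section; the function name failures_after, the regular expression, and the file path and restart time in the usage comment are illustrative, not part of Jira.

```python
import re
from datetime import datetime

# Matches the 'Cluster Cache Replication' failure lines shown in the Diagnosis
# section and captures the timestamp, UTC offset, and reported node ID.
FAILURE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}(?P<tz>[+-]\d{4}) .*"
    r"Health check 'Cluster Cache Replication' failed .*"
    r"'The node (?P<node>\S+) is not replicating'"
)


def failures_after(lines, restart_time):
    """Return (timestamp, node_id) for each failure logged after restart_time.

    restart_time must be a timezone-aware datetime, because the log
    timestamps carry a UTC offset.
    """
    hits = []
    for line in lines:
        match = FAILURE_RE.search(line.strip())
        if match is None:
            continue
        logged_at = datetime.strptime(
            match["ts"] + match["tz"], "%Y-%m-%d %H:%M:%S%z"
        )
        if logged_at > restart_time:
            hits.append((logged_at, match["node"]))
    return hits


# Example usage (hypothetical path and restart time):
#
#   from datetime import timezone
#   restart = datetime(2021, 9, 8, 9, 0, tzinfo=timezone.utc)
#   with open("atlassian-jira.log", encoding="utf-8", errors="replace") as log:
#       for logged_at, node in failures_after(log, restart):
#           print(logged_at.isoformat(), node)
```

In the sample log lines the failure is recorded about once an hour, so an empty result is only meaningful once more than an hour has passed since the restart.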

Provide data to Atlassian Support

If restarting the node does not resolve the issue, contact Atlassian Support. To help the support team investigate the issue faster, attach a support zip from each node to the support ticket.


Last updated: September 24, 2021
