Jira Data Center Node state is showing as in Maintenance while the node is actually running and not re-indexing

Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15, 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

   

Summary

Normally, a node on a Jira Data Center cluster will show the "maintenance" status when the node is being reindexed and cannot currently serve users, as explained in Jira cluster monitoring:

{"state":"MAINTENANCE"}

However, there are some unexpected situations where a node shows the "maintenance" status even though it is running and not performing a re-indexing operation.
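As a quick check, the state returned by each node's /status endpoint can be parsed with a short script. The sketch below assumes the endpoint returns JSON of the shape quoted above (per the Jira cluster monitoring documentation); the helper names and the example hostname are our own.

```python
import json
from urllib.request import urlopen  # only needed for the live check shown below

def node_state(status_body: str) -> str:
    # The /status endpoint returns a small JSON document such as
    # {"state":"RUNNING"} or {"state":"MAINTENANCE"}
    return json.loads(status_body).get("state", "UNKNOWN")

def is_serving(status_body: str) -> bool:
    # Only nodes reporting RUNNING should receive load-balancer traffic
    return node_state(status_body) == "RUNNING"

# Live usage against a node (hostname and port are examples):
# with urlopen("http://jira-node1:8080/status") as resp:
#     print(node_state(resp.read().decode()))
```

Running this against each node's direct address (not the load balancer) shows which nodes are reporting MAINTENANCE.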

Environment

Any Jira Data Center version on 7.x or 8.x.

Cause

Root cause 1

Some of the nodes are configured with an incorrect value in the JVM startup parameter -Djava.rmi.server.hostname. For example, the hostname might be set to a non-resolvable domain, or to an incorrect IP address.

Since this JVM parameter overwrites the hostname configured in the <JIRA_HOME>/cluster.properties file, the node will end up using the incorrect hostname upon node startup.

As a result, cache replication between nodes can fail, which produces the errors shown in the Diagnosis section below.

Root cause 2

The database is configured with an unsupported database collation, and we are hitting the bug JRASERVER-65708.

Root cause 3

Jira is on version 8.19.1 or higher, and the node showing the MAINTENANCE status has indexes in an inconsistent state (out of sync with the Jira database). As explained in the feature request JRASERVER-66970, from Jira 8.19.1 onwards, if a Jira node has inconsistent indexes, the node will enter the "MAINTENANCE" state.

(warning) Please note that in this case, the "MAINTENANCE" status is expected by design, since it is meant to tell the load balancer not to route traffic to that node (because of the inconsistent index state).

Diagnosis

Diagnosis for Root cause 1

Check the Jira application log to see if you can find any trace of cache replication failure:

  • example of error 1

    2021-11-10 08:59:35,390-0800 localq-reader-16 ERROR      [c.a.j.c.distribution.localq.LocalQCacheOpReader] [LOCALQ] [VIA-COPY] Abandoning sending: LocalQCacheOp{cacheName='com.atlassian.jira.crowd.embedded.ofbiz.EagerOfBizUserCache.userCache', action=PUT, key={10100,brian_campbell}, value == null ? false, replicatePutsViaCopy=true, creationTimeInMillis=1636563571212} from cache replication queue: [queueId=queue_node2_5_78882aaeb08e9a4c81687b5de2add74f_put, queuePath=/vxxxx/atlassian/application-data/jira/localq/queue_node2_5_78882aaeb08e9a4c81687b5de2add74f_put], failuresCount: 1/1. Removing from queue. Error: java.rmi.NoSuchObjectException: no such object in table
    com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpSender$UnrecoverableFailure: java.rmi.NoSuchObjectException: no such object in table
    	at com.atlassian.jira.cluster.distribution.localq.rmi.LocalQCacheOpRMISender.send(LocalQCacheOpRMISender.java:90)
    	at com.atlassian.jira.cluster.distribution.localq.LocalQCacheOpReader.run(LocalQCacheOpReader.java:96)
    	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    	at java.base/java.lang.Thread.run(Unknown Source)
    Caused by: java.rmi.NoSuchObjectException: no such object in table
  • example of error 2

    2021-11-10 18:34:18,323-0800 HealthCheck:thread-6 WxxxxN ServiceRunner     [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node node3 does not seem to replicate its cache
    2021-11-10 18:34:18,324-0800 HealthCheck:thread-6 WxxxxN ServiceRunner     [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node node1 does not seem to replicate its cache
    2021-11-10 18:34:18,328-0800 support-zip ERROR      [c.a.t.healthcheck.concurrent.SupportHealthCheckProcess] Health check 'Cluster Cache Replication' failed with severity 'critical': '["The node node3 is not replicating","The node node1 is not replicating"]'

For each Jira node, check whether the -Djava.rmi.server.hostname JVM startup parameter is in use. If it is, check whether it is set to a correct IP address or a resolvable hostname. If the IP is incorrect or the hostname is not resolvable, then this root cause applies.
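One way to verify the configured value is to test whether it resolves from the node itself. A minimal sketch (the function name is ours; call it with the exact value passed to -Djava.rmi.server.hostname):

```python
import socket

def is_resolvable(host: str) -> bool:
    """Check whether the value of -Djava.rmi.server.hostname
    resolves to an address on this node."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

# Example (hostname is illustrative):
# is_resolvable("jira-node1.example.com")
```

Run this on the node itself, since resolution can differ between hosts.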

Diagnosis for Root cause 2

Go to the page ⚙ > System > Troubleshooting and support tools > Instance health > Database, and check if the health check is complaining about an unsupported collation.

Diagnosis for Root cause 3

  • Jira is running on version 8.19.1 or higher
  • The following WARNING/INFO can be found in the Jira application logs:

    2021-11-22 14:29:38,069+0000 http-nio-8080-exec-10 url: /status WARN anonymous XXXxXXXxX - XX.XXX.X.XXX /status [c.a.j.issue.index.IndexConsistencyUtils] Index consistency check failed for index 'Issue': expectedCount=875155; actualCount=713032
    2021-11-22 14:29:38,070+0000 http-nio-8080-exec-10 url: /status INFO anonymous XXXxXXXxX - XX.XXX.X.XXX /status [c.a.jira.servlet.ApplicationStateResolverImpl] Checking index consistency. Time taken: 160.9 ms
    2021-11-22 14:29:38,070+0000 http-nio-8080-exec-10 url: /status WARN anonymous XXXxXXXxX - XX.XXX.X.XXX /status [c.a.jira.servlet.ApplicationStateResolverImpl] The issue index is inconsistent. This node will report its status as MAINTENANCE. You will find information on how to resolve this problem here: https://jira.atlassian.com/browse/JRASERVER-66970
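The warning above compares the number of issues in the database (expectedCount) with the number of documents in the node's Lucene index (actualCount). A simplified model of that comparison follows; the tolerance parameter is our illustration, and Jira's actual threshold is internal:

```python
def index_is_consistent(expected: int, actual: int, tolerance: float = 0.0) -> bool:
    # expected: issue count from the database (expectedCount in the log)
    # actual:   document count in the node's Lucene index (actualCount)
    # With the counts from the log above (875155 vs 713032) the check fails,
    # and the node reports its status as MAINTENANCE.
    if expected == 0:
        return actual == 0
    return abs(expected - actual) / expected <= tolerance
```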

Solution

Solution for Root cause 1

For each affected node:

  • either remove the parameter -Djava.rmi.server.hostname from the JVM startup parameter, if a correct hostname value is already set up in the <JIRA_HOME>/cluster.properties file, and re-start the node
  • or change the value of this parameter to a correct IP address or resolvable hostname

For more detailed information, refer to the KB article JIRA Data Center Asynchronous Cache replication failing health check.
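For reference, the hostname that the JVM parameter overrides lives in <JIRA_HOME>/cluster.properties. A sketch of a typical file, where the node ID, address, and port are example values only:

```properties
# <JIRA_HOME>/cluster.properties (example values)
node.id = node1
ehcache.listener.hostName = 10.0.0.11
ehcache.listener.port = 40001
```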

Solution for Root cause 2

Refer to the workaround mentioned in the bug JRASERVER-65708.

Solution for Root cause 3

The solution consists of fixing the index inconsistency on the problematic node:

  1. Access the problematic node using its IP address via a browser
  2. Go to ⚙ > System > Indexing
  3. Select Full re-index and click Re-index
  4. Wait until the re-indexing completes and confirm that the status of this node changes to RUNNING

(warning) Note that it is possible to prevent the node from going into MAINTENANCE mode when the indexes are out of sync, as explained in the Current status section of the feature request JRASERVER-66970. If you want to ensure that, in the future, the node remains in RUNNING mode while having inconsistent indexes (which was the expected behavior prior to Jira 8.19.1), add the following JVM startup parameter to each Jira node and restart each node:

-Dcom.atlassian.jira.status.index.check=false
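On a typical Linux installation, JVM startup parameters are added in <JIRA_INSTALL>/bin/setenv.sh. A sketch, assuming the standard JVM_SUPPORT_RECOMMENDED_ARGS variable is used for custom arguments:

```shell
# <JIRA_INSTALL>/bin/setenv.sh -- append to the existing variable
JVM_SUPPORT_RECOMMENDED_ARGS="${JVM_SUPPORT_RECOMMENDED_ARGS} -Dcom.atlassian.jira.status.index.check=false"
```

After editing the file, restart the node for the parameter to take effect.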

Note: even if a Jira node is in MAINTENANCE mode, that node will still be accessible when browsing Jira directly through its IP address or hostname. Only when browsing Jira through the base URL bound to the load balancer will the node be inaccessible, since the load balancer will not route any requests to it.



Last updated: October 21, 2022
