Bamboo Data Center NodeAliveWatchdog shuts down Bamboo during DB scheduled backups
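Platform notice: Data Center - This article applies to Atlassian products on the Data Center platform.
This knowledge base article was written for the Data Center version of the product. Data Center knowledge base articles for features that are not Data Center-specific may also work for the Server version of the product, but they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, see the Atlassian Server end of support announcement page for your migration options.
*Except Fisheye and Crucible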
Summary
Bamboo Data Center shuts down with a message in <bamboo-home>/logs/atlassian-bamboo.log stating that it could not refresh its state in the database.
Environment
Bamboo Data Center 8.0 and later.
Diagnosis
The <bamboo-home>/logs/atlassian-bamboo.log file contains a message similar to:
2023-03-23 06:17:46,556 ERROR [scheduler_Worker-6] [NodeAliveWatchdog] Current node failed to refresh its state in DB within last 3 minutes. This node will now go down
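To check whether past shutdowns were caused by the watchdog, the log can be scanned for this message. The following is a minimal sketch, assuming the message keeps the format shown above; the function name `find_watchdog_shutdowns` and the regular expression are illustrative and not part of Bamboo.

```python
import re
from datetime import datetime

# Pattern derived from the sample log line above (an assumption about
# the log layout; adjust it if your log format differs).
WATCHDOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} ERROR \[[^\]]+\] "
    r"\[NodeAliveWatchdog\] Current node failed to refresh its state in DB "
    r"within last (?P<minutes>\d+) minutes"
)


def find_watchdog_shutdowns(lines):
    """Return (timestamp, timeout_minutes) for each watchdog shutdown message."""
    events = []
    for line in lines:
        match = WATCHDOG_PATTERN.match(line)
        if match:
            timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            events.append((timestamp, int(match.group("minutes"))))
    return events


sample_lines = [
    "2023-03-23 06:17:46,556 ERROR [scheduler_Worker-6] [NodeAliveWatchdog] "
    "Current node failed to refresh its state in DB within last 3 minutes. "
    "This node will now go down",
    "2023-03-23 06:17:47,001 INFO [main] [SomeOtherClass] unrelated message",
]

for timestamp, minutes in find_watchdog_shutdowns(sample_lines):
    print(f"{timestamp.isoformat()} watchdog shutdown after {minutes} minutes")
```

To scan a real log, pass an open file object for <bamboo-home>/logs/atlassian-bamboo.log in place of `sample_lines`. Comparing the reported timestamps with your database backup schedule helps confirm the cause.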
Cause
The Bamboo NodeAliveWatchdog monitors the database for read and write ability. If the database is unavailable or read-only for more than 3 minutes, the node shuts down to allow the cold standby node, if one is available, to take over.
Solution
Prior to Bamboo 9.5
If you expect your database to be unavailable for more than 3 minutes, you can increase or disable the NodeAliveWatchdog timeout by adding a Bamboo system property. For example, the snippet below sets the timeout to 5 minutes.
-Dbamboo.node.alive.watchdog.timeout=5
Setting the property to 0 disables the check; any number greater than 0 is the timeout in minutes. Disabling the check stops Bamboo from shutting down while it cannot obtain database connections, but it is not recommended because it only masks a potentially serious underlying issue.
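As an illustration of the rule above (0 disables the check, a positive number is the timeout in minutes), here is a minimal sketch that builds the JVM argument from a chosen value. The helper name `watchdog_timeout_arg` is hypothetical; only the property name comes from this article.

```python
def watchdog_timeout_arg(minutes: int) -> str:
    """Build the pre-9.5 watchdog system property for a timeout in minutes.

    0 disables the check; a positive value is the timeout in minutes.
    """
    if isinstance(minutes, bool) or not isinstance(minutes, int):
        raise TypeError("minutes must be an integer")
    if minutes < 0:
        raise ValueError("minutes must be 0 (check disabled) or greater")
    return f"-Dbamboo.node.alive.watchdog.timeout={minutes}"


# Example: a backup window expected to last about 5 minutes.
print(watchdog_timeout_arg(5))  # -Dbamboo.node.alive.watchdog.timeout=5
```

Add the resulting argument wherever your installation defines JVM system properties for Bamboo, then restart Bamboo for it to take effect.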
Bamboo 9.5 and later
You can disable the health check that causes the instance to shut down, and increase the node lock and cluster heartbeat timeouts, with the properties below:
-Dbamboo.node.alive.watchdog.enabled=false -Dbamboo.primary.node.lock.timeout.seconds=600 -Dbamboo.cluster.heartbeat.alive.timeout.seconds=600
With these settings, the nodes and the cluster remain active for up to 10 minutes (600 seconds), after which Bamboo shuts down if the database is still unavailable. You can set a higher timeout if you expect the database to be down for longer. This stops Bamboo from shutting down while it cannot obtain database connections, but it is not recommended because it only masks a potentially serious underlying issue.
-Dbamboo.node.alive.watchdog.enabled :- when enabled, the watchdog monitors the database for read and write ability, checking whether the database is unavailable or read-only.
-Dbamboo.cluster.heartbeat.alive.timeout.seconds :- the duration (in seconds) after which a node is considered dead if no heartbeat is received. Default: 300 seconds.
-Dbamboo.primary.node.lock.timeout.seconds :- how long the secondary nodes wait before they take over the primary role. Default: 120 seconds. A high value is not recommended in a warm standby setup because it delays secondary nodes from taking over.
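The three properties above can be assembled for a different outage window with a small helper. This is a minimal sketch, assuming you want the same value for both timeouts as in the example above; the function name `bamboo_95_outage_args` is hypothetical, and only the property names come from this article.

```python
from typing import List


def bamboo_95_outage_args(timeout_seconds: int = 600) -> List[str]:
    """Build the Bamboo 9.5+ properties for an expected database outage.

    The watchdog is disabled, and the node lock and cluster heartbeat
    timeouts are both set to timeout_seconds.
    """
    if isinstance(timeout_seconds, bool) or not isinstance(timeout_seconds, int):
        raise TypeError("timeout_seconds must be an integer")
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be greater than 0")
    return [
        "-Dbamboo.node.alive.watchdog.enabled=false",
        f"-Dbamboo.primary.node.lock.timeout.seconds={timeout_seconds}",
        f"-Dbamboo.cluster.heartbeat.alive.timeout.seconds={timeout_seconds}",
    ]


# Example: reproduce the 10-minute configuration shown above.
print(" ".join(bamboo_95_outage_args(600)))
```

Choose a value that covers the backup window with some margin, keeping in mind that a large node lock timeout delays failover to secondary nodes.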