Bamboo agents go offline and come back online without restart

Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

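This knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for features that are not Data Center-specific may also work for Server versions of the product, but they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, see the Atlassian Server end of support announcement to review your migration options.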

*Except Fisheye and Crucible

Summary

Bamboo remote agents go offline suddenly and come back online after a brief period without any manual intervention or restarting the agents.

Environment

Observed in Bamboo 9.6.1. Because the root cause is network-related, other versions may be affected as well.

Diagnosis

When agents start going offline or shutting down unexpectedly, check the <bamboo-home>/logs/atlassian-bamboo.log file to understand what is going wrong.

  • If you see log entries like the ones below, the ActiveMQ broker (the remote agent broker) has failed: a request sent to the ActiveMQ server did not receive a response within the timeout period:
2024-05-21 19:25:08,345 WARN [RemoteEventBroadcast-1] [RemoteBroadcastEventListener] Broadcast failed with timeout, backing off...
2024-05-21 19:25:08,362 INFO [RemoteEventBroadcast-1] [RemoteBroadcastEventListener] Caught UncategorizedJmsException
2024-05-21 19:25:08,362 INFO [RemoteEventBroadcast-1] [Emergency] Caught UncategorizedJmsException
org.springframework.jms.UncategorizedJmsException: Uncategorized exception occurred during JMS processing; nested exception is javax.jms.JMSException: org.apache.activemq.transport.RequestTimedOutIOException
  at org.springframework.jms.support.JmsUtils.convertJmsAccessException(JmsUtils.java:311) ~[spring-jms-5.3.33.jar:5.3.33]
...
Caused by: javax.jms.JMSException: org.apache.activemq.transport.RequestTimedOutIOException
  • These entries may be followed by messages like the ones below, which indicate that agents are being disconnected:
2024-05-21 19:44:33,364 INFO [scheduler_Worker-10] [PlanStatePersisterImpl] Updating delta states of build following NPV-1
2024-05-21 19:44:33,504 WARN [scheduler_Worker-10] [RemoteAgentManagerImpl] Detected that remote agent 'agent1' has been inactive since Tue May 21 19:20:15 UAT 2024
2024-05-21 19:45:33,604 WARN [scheduler_Worker-10] [RemoteAgentManagerImpl] Marking remote agent 'agent1' as unresponsive
2024-05-21 19:45:33,604 WARN [scheduler_Worker-10] [RemoteAgentManagerImpl] Detected that remote agent 'agent2' has been inactive since Tue May 21 19:22:15 UAT 2024
  • You may also find many network-related errors, such as:
2024-05-21 19:55:19,602 ERROR [http-nio-8085-exec-282] [ArtifactServlet] Exception when storing the artifact
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Connection reset by peer
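The checks above amount to searching atlassian-bamboo.log for a handful of message signatures. As a minimal sketch, assuming the log uses the line format shown in the excerpts above, the Python script below groups matching lines by symptom. The `scan` function and the category names are hypothetical, chosen for illustration; they are not part of Bamboo. Stack-trace lines have no timestamp of their own, so the script attributes them to the most recent timestamped log entry.

```python
import re
import sys
from collections import defaultdict

# Log entry prefix as seen in the excerpts, e.g. "2024-05-21 19:25:08,345 WARN ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s")

# Hypothetical categories; the patterns are the messages quoted in this article.
SIGNATURES = {
    "broker_timeout": re.compile(r"Broadcast failed with timeout|RequestTimedOutIOException"),
    "agent_inactive": re.compile(r"remote agent '([^']+)' has been inactive since"),
    "agent_unresponsive": re.compile(r"Marking remote agent '([^']+)' as unresponsive"),
    "network_error": re.compile(r"Connection reset by peer"),
}


def scan(lines):
    """Return {category: [(timestamp, line), ...]} for lines matching a signature.

    Lines without a timestamp (stack traces, "Caused by:") inherit the
    timestamp of the most recent log entry.
    """
    hits = defaultdict(list)
    current_ts = None
    for raw in lines:
        line = raw.rstrip("\n")
        entry = TIMESTAMP.match(line)
        if entry:
            current_ts = entry.group(1)
        for category, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[category].append((current_ts, line))
    return dict(hits)


# Hypothetical usage: python scan_bamboo_log.py <bamboo-home>/logs/atlassian-bamboo.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for category, entries in scan(log).items():
            print(f"{category}: {len(entries)} line(s), first at {entries[0][0]}")
```

If broker_timeout lines appear shortly before agent_inactive and agent_unresponsive lines, the log matches the pattern described in this article.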

Cause

These messages usually indicate a network connectivity issue that makes the ActiveMQ broker unresponsive, which in turn causes the agents to disconnect.

Solution

Check whether any network errors were reported between the Bamboo server and the remote agents during the time frame in which the agents went offline.
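To narrow down which time frame to check, you can derive it from atlassian-bamboo.log itself. The sketch below is a hypothetical helper, not part of Bamboo: `incident_window` collects the earliest "inactive since" time per agent plus the timestamps of the symptom messages shown earlier in this article, and returns the overall window. It assumes the log entry timestamps and the "inactive since" times are both in the Bamboo server's local time zone, and it drops the zone abbreviation (such as UAT) because Python's `strptime` `%Z` only accepts UTC, GMT, and the local `time.tzname` values.

```python
import re
from datetime import datetime

# "2024-05-21 19:44:33,504 WARN ..." -> "2024-05-21 19:44:33" (milliseconds dropped)
ENTRY_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s")

# "... remote agent 'agent1' has been inactive since Tue May 21 19:20:15 UAT 2024"
INACTIVE = re.compile(
    r"remote agent '([^']+)' has been inactive since "
    r"\w{3} (\w{3} \d{1,2} \d{2}:\d{2}:\d{2}) \S+ (\d{4})"
)

# Symptom messages quoted in this article.
SYMPTOMS = re.compile(
    r"Broadcast failed with timeout|RequestTimedOutIOException"
    r"|as unresponsive|has been inactive since|Connection reset by peer"
)


def incident_window(lines):
    """Return (start, end, {agent: inactive_since}) or None if nothing matched.

    Assumption: all times are naive datetimes in the server's local time zone.
    """
    inactive_since = {}
    symptom_times = []
    current_ts = None
    for line in lines:
        entry = ENTRY_TS.match(line)
        if entry:
            current_ts = datetime.strptime(entry.group(1), "%Y-%m-%d %H:%M:%S")
        # Stack-trace lines are attributed to the preceding log entry.
        if current_ts is not None and SYMPTOMS.search(line):
            symptom_times.append(current_ts)
        inactive = INACTIVE.search(line)
        if inactive:
            agent, month_day_time, year = inactive.groups()
            since = datetime.strptime(f"{month_day_time} {year}", "%b %d %H:%M:%S %Y")
            inactive_since[agent] = min(since, inactive_since.get(agent, since))
    candidates = symptom_times + list(inactive_since.values())
    if not candidates:
        return None
    return min(candidates), max(candidates), inactive_since
```

Compare the returned window with whatever network monitoring data you have for the same period; the per-agent times can help show whether only some agents (for example, those in one network segment) were affected.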


Last modified on May 22, 2024