Bamboo agents go offline and come back online without restart
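Platform notice: Data Center - This article applies to Atlassian products on the Data Center platform.
This knowledge base article was written for the Data Center version of the product. Data Center knowledge base articles for features that are not Data Center-specific may also work for the Server version of the product, but they have not been tested. Support for Server* products ended on February 15, 2024. If you are running a Server product, see the Atlassian Server end of support announcement page for your migration options.
*Except Fisheye and Crucible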
Summary
Bamboo remote agents suddenly go offline and come back online after a brief period, without any manual intervention or agent restart.
Environment
This was observed in Bamboo 9.6.1, but it may apply to other versions as well because the issue is network-related.
Diagnosis
When your agents start going offline or shutting down suddenly, check the <bamboo-home>/logs/atlassian-bamboo.log file to understand what is going wrong.
- If you see the logs below, there is an ActiveMQ (remote agent broker) failure. They indicate that a request sent to the ActiveMQ server did not receive a response within the timeout period:
2024-05-21 19:25:08,345 WARN [RemoteEventBroadcast-1] [RemoteBroadcastEventListener] Broadcast failed with timeout, backing off...
2024-05-21 19:25:08,362 INFO [RemoteEventBroadcast-1] [RemoteBroadcastEventListener] Caught UncategorizedJmsException
2024-05-21 19:25:08,362 INFO [RemoteEventBroadcast-1] [Emergency] Caught UncategorizedJmsException
org.springframework.jms.UncategorizedJmsException: Uncategorized exception occurred during JMS processing; nested exception is javax.jms.JMSException: org.apache.activemq.transport.RequestTimedOutIOException
at org.springframework.jms.support.JmsUtils.convertJmsAccessException(JmsUtils.java:311) ~[spring-jms-5.3.33.jar:5.3.33]
...
Caused by: javax.jms.JMSException: org.apache.activemq.transport.RequestTimedOutIOException
- These logs may be followed by the messages below, which indicate that the agents have disconnected:
2024-05-21 19:44:33,364 INFO [scheduler_Worker-10] [PlanStatePersisterImpl] Updating delta states of build following NPV-1
2024-05-21 19:44:33,504 WARN [scheduler_Worker-10] [RemoteAgentManagerImpl] Detected that remote agent 'agent1' has been inactive since Tue May 21 19:20:15 UAT 2024
2024-05-21 19:45:33,604 WARN [scheduler_Worker-10] [RemoteAgentManagerImpl] Marking remote agent 'agent1' as unresponsive
2024-05-21 19:45:33,604 WARN [scheduler_Worker-10] [RemoteAgentManagerImpl] Detected that remote agent 'agent2' has been inactive since Tue May 21 19:22:15 UAT 2024
- You may also find many network-related errors:
2024-05-21 19:55:19,602 ERROR [http-nio-8085-exec-282] [ArtifactServlet] Exception when storing the artifact
org.apache.catalina.connector.ClientAbortException: java.io.IOException: Connection reset by peer
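The three kinds of messages above can be counted in one pass over the log. The following is a minimal sketch, not an Atlassian tool: the category names and the `classify`, `summarize`, and `summarize_file` helpers are hypothetical, and the match strings are simply substrings copied from the excerpts in this article, so they may need adjusting for other Bamboo versions.

```python
from collections import Counter

# Substrings copied from the log excerpts in this article (assumed to be
# representative; the category names are made up for this sketch).
SIGNATURES = {
    "broker_timeout": ("Broadcast failed with timeout", "RequestTimedOutIOException"),
    "agent_inactive": ("has been inactive since", "as unresponsive"),
    "network_error": ("Connection reset by peer", "ClientAbortException"),
}


def classify(line):
    """Return the first category whose substring appears in the line, or None."""
    for category, needles in SIGNATURES.items():
        if any(needle in line for needle in needles):
            return category
    return None


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = Counter()
    for line in lines:
        category = classify(line)
        if category is not None:
            counts[category] += 1
    return counts


def summarize_file(path):
    """Summarize a log file, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize(handle)
```

For example, calling `summarize_file` with the path to your atlassian-bamboo.log returns a counter per category. Non-zero counts in all three categories around the same time would match the pattern described in this article.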
Cause
These messages are usually caused by a network connectivity issue that makes ActiveMQ unresponsive. This in turn causes the agents to disconnect.
Solution
Check whether any network errors were reported during the time frame in which the agents went offline.
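To narrow down the time frame to check, one option is to take the earliest and latest timestamps of the relevant messages in atlassian-bamboo.log. The sketch below assumes each log entry starts with a timestamp in the `2024-05-21 19:25:08,345` form shown in this article. The `incident_window` helper and the `MARKERS` list are hypothetical names introduced here for illustration.

```python
from datetime import datetime

# Substrings copied from the log excerpts in this article (an assumption;
# extend the list if your logs show other related messages).
MARKERS = (
    "Broadcast failed with timeout",
    "has been inactive since",
    "as unresponsive",
    "Connection reset by peer",
)

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TIMESTAMP_LENGTH = 23  # length of "2024-05-21 19:25:08,345"


def incident_window(lines):
    """Return (first, last) timestamps of matching lines, or None if none match."""
    stamps = []
    for line in lines:
        if not any(marker in line for marker in MARKERS):
            continue
        try:
            stamps.append(datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT))
        except ValueError:
            # Stack trace and continuation lines carry no leading timestamp.
            continue
    if not stamps:
        return None
    return min(stamps), max(stamps)
```

The resulting window uses the Bamboo server's local time, so account for any time zone difference when comparing it with network monitoring data.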