Build resiliency in Bamboo Data Center

In Bamboo versions earlier than 8.0, if the server's work was interrupted or the server was down for more than 5 minutes, Bamboo builds would fail because the building agent lost its connection to the server. Bamboo agents were designed to shut down when they couldn't connect to a server for longer than 5 minutes.

With Bamboo Data Center, the agent continues its work and finishes the build even if the connection with the server is lost. Once the build is done, the agent tries to connect to the server. If the server is back online, the agent sends the build results, logs, and artifacts to the server and picks up its next tasks. If the server is still down, the agent tries to reconnect after some time.

How many times does the agent try to reconnect?

If the agent is started with the agent wrapper, by default it tries to connect to the server up to 1440 times, or until it succeeds. You can change this limit by opening $BAMBOO_AGENT_HOME/conf/wrapper.conf and modifying the wrapper.max_failed_invocations value.
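To check which limit is currently configured, a small script can read the file. The following is a minimal sketch, not an Atlassian tool: it assumes wrapper.conf consists of plain key=value lines with # comments, the helper name read_max_failed_invocations is hypothetical, and the fallback of 1440 is simply the default stated above.

```python
import os
from pathlib import Path

# Default stated in this article; used when the property is not set.
DEFAULT_MAX_FAILED_INVOCATIONS = 1440


def read_max_failed_invocations(conf_text: str) -> int:
    """Return wrapper.max_failed_invocations from wrapper.conf text.

    Hypothetical helper: assumes simple key=value lines and '#' comments.
    """
    value = DEFAULT_MAX_FAILED_INVOCATIONS
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == "wrapper.max_failed_invocations":
            value = int(val.strip())  # assumption: the last occurrence wins
    return value


if __name__ == "__main__":
    agent_home = os.environ.get("BAMBOO_AGENT_HOME")
    if agent_home:
        conf = Path(agent_home) / "conf" / "wrapper.conf"
        if conf.is_file():
            print(read_max_failed_invocations(conf.read_text()))
```

Run it on the agent machine with BAMBOO_AGENT_HOME set. If the property is missing or commented out, the script reports the default.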

If the agent is not started with the agent wrapper, it tries to transmit the results 10 times at 5-minute intervals, and then terminates if unsuccessful. However, if you restart the agent manually, it enters the ‘retry’ loop again, provided the result has not been removed from the disk.
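The retry behaviour described above for an agent without the wrapper can be illustrated with a short sketch. This is not Bamboo's actual implementation: the function name transmit_with_retries and its parameters are hypothetical, and whether the agent waits after the final failed attempt is an assumption (here it does not).

```python
import time
from typing import Callable

MAX_ATTEMPTS = 10             # number of attempts stated in this article
RETRY_INTERVAL_SECONDS = 300  # 5-minute interval stated in this article


def transmit_with_retries(
    send: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = MAX_ATTEMPTS,
    interval: float = RETRY_INTERVAL_SECONDS,
) -> bool:
    """Call send() until it returns True or max_attempts is reached.

    Returns True on success, False when every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if send():
            return True
        if attempt < max_attempts:
            sleep(interval)  # wait before the next attempt
    return False
```

Under these assumptions, an agent that never reaches the server gives up after 10 attempts and 9 waits, which is about 45 minutes. The sleep parameter is injectable so the loop can be exercised without actually waiting.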

If the transmission problems are caused by a network failure, the effective timeout is considerably shorter, because in that case the server recognizes that the agent is offline and terminates the build on its end. This behaviour is configured by heartbeat timeouts. For more information, see Changing the remote agent heartbeat interval.

It is important to understand that this improved build resiliency to server failures will work only if the build process can be finished. Bamboo will not be able to finish the build if:

  • a child process is failing or stopped

  • an agent's process is stopped while the build is running

  • a resource required for the build process is unavailable (this includes resources provided by the Bamboo server, like REST endpoints and artifacts from other builds)

  • a build is failing because of intermittent infrastructure problems

Build resiliency with elastic agents

The same logic applies to agents started in the EC2 environment. To achieve this, the Bamboo agent is started using the Tanuki wrapper, which is also used by the remote agent. The wrapper makes it possible to restart the Bamboo agent when the Java process is interrupted by a connection timeout error.

If you’re using elastic images provided by Bamboo 8.0 (or based on them), elastic agents use the agent wrapper and can fully benefit from improved build resiliency. Old images are still functional but will work with the ‘short’ timeout only.

After a server restart, elastic agents that use the agent wrapper can fully resume their operation. Agents without the wrapper are allowed to return the result they worked on, but they then terminate.

Disabling the elastic tunnel is no longer a prerequisite for seamless restarts or improved build resiliency.

Last updated: August 25, 2021



