Cluster state replication

お困りですか?

アトラシアン コミュニティをご利用ください。

コミュニティに質問

The warm standby clustering architecture requires continuous state replication between all nodes in your Data Center cluster. In Bamboo, most of the state replication is based on invalidation events, which indicate the state change that has occurred on a node. Invalidation events don't carry the new data but provide information about which data needs to be refreshed. Nodes use that information to apply the required changes and rectify the state on all nodes (regardless of where the change originated).

To avoid the potential implications of events occurring in incorrect order, replication isn't immediate. Instead, state modifications are replicated asynchronously — they're added to local queues and then replicated in the background based on the order in which they were queued.

This approach helps:

  • クラスターの拡張性の向上
  • Reduce state inconsistencies between the nodes
  • Improve performance by separating replication from state modifications

On this page:

The state replication process consists of the following stages:

  1. Creating local queues
  2. Adding state modifications to the local queues
  3. Replicating state modifications from local queues to the other nodes

Creating local queues

Before we can queue state modifications, we need to create separate, local queues on each cluster node. We do this so that modifications are properly grouped and ordered.

The queues are created automatically. The nodes retrieve information about the whole cluster from the database by means of the following query:

select * from cluster_node_heartbeat;

Whenever you add a node to the cluster, the already existing nodes detect it and create another 8 queues for the new one in their file system. The new node will also create 8 queues for each remaining node.

By default, each node has 8 separate queues for each remaining node in the active state. We've opted for 8 queues to increase throughput— we can't speed up the replication of a single modification due to the number of actions that must be completed, but we can replicate 8 modifications at the same time.

In a 3-node cluster, the following queues will be created:

On the first node:

  • 8 queues for replication modifications to the second node
  • 8 queues for replicating modifications to the third node

On the second node:

  • 8 queues for replicating modifications to the first node
  • 8 queues for replicating modifications to the third node

On the third node:

  • 8 queues for replicating modifications to the first node
  • 8 queues for replicating modifications to the second node

The queues are created in the <local-bamboo-home>/localq directory on each node. The contents of this directory look similar to the following example:

queue_fa2dcb4cd9614f27ba7cf9cf8932a097_0_fefb71b80fc2701443da7258e8646bce
..
queue_fa2dcb4cd9614f27ba7cf9cf8932a097_2_fefb71b80fc2701443da7258e8646bce
queue_fa2dcb4cd9614f27ba7cf9cf8932a097_3_fefb71b80fc2701443da7258e8646bce
..
queue_fa2dcb4cd9614f27ba7cf9cf8932a097_7_fefb71b80fc2701443da7258e8646bce

The queues are named according to the following scheme:

queue_<node-id>_<node-queue-number>_<hashed-node-id>

ここで:

  • <node-id> is the unique identifier of each node in your Bamboo Data Center cluster
  • <node-queue-number> is the number of the local queue
  • <hashed-node-id> is the hashed node identifier
tip/resting Created with Sketch.

You can obtain the node identifier by inspecting the node.id field in the cluster-node.properties file on each node.

Adding state modifications to the local queues

Once all the queues have been created, state invalidation events can be added to the right queue. Let's use the previous example of the 3-node Bamboo Data Center cluster to illustrate how this occurs:

  1. A state invalidation event occurs on the first node (for example, you make changes to the plan structure).
  2. The changes are made in the database, and the local state related to that change is invalidated.
  3. A request to invalidate the state is added to the following local queues:
    • queue_<node2-id>_<node2-queue-number>_<hashed-node2-id>
    • queue_<node3_id>_<node3-queue-number>_<hashed-node3-id>
  4. The change is replicated to the other nodes based on the order in which it was queued.

Modifications occurring in a single thread are always added to the same queue. This is to keep the order in which they'll be replicated in case two events in the same thread are dependent on each other.

Replicating state modifications from local queues to the other nodes

After all state modifications have been added to the local queues, the replication is handled by dispatchers. Each queue is paired with a dispatcher that handles:

  • reading the state modification from the queue
  • delivering the state modification request to the receiver node over gRPC in a non-blocking manner
  • removing the state modification request from the queue

The dispatcher keeps a single gRPC channel between the publisher and the receiver nodes. The channel transmits only the modifications originating in the local queue the dispatcher was created for.

Following the previous example of the 3-node cluster, this is the process of replicating state modifications:

  1. The dispatcher responsible for the local queue_<node2-id>_<node2-queue-number>_<hashed-node2-id> reads the modifications from that queue and attempts to deliver them over the gRPC channel to the second node.
  2. Another dispatcher responsible for the local queue_<node3_id>_<node3-queue-number>_<hashed-node3-id> reads the modifications from that queue and attempts to deliver them over the gRPC channel to the third node.

If the modifications have been successfully replicated, they're removed from the queues. If the replication fails, error messages are written to the log file.

How to monitor cluster state replication

You can monitor state replication by reviewing statistics that are written in the log file. They’ll show you the size of the local queues, and whether state modifications are successfully replicated or persisted in the queues for too long. In most cases, monitoring just a few parameters will tell you if the replication is working properly.

Learn how to monitor cluster state replication

How to configure cluster state replication

The default state replication settings should be sufficient for most use cases, but if needed, you can adjust options such as the number of local queues per node or the frequency of collecting statistics.

Learn how to configure cluster state replication

最終更新日: 2024 年 2 月 8 日

この内容はお役に立ちましたか?

はい
いいえ
この記事についてのフィードバックを送信する
Powered by Confluence and Scroll Viewport.