Monitoring your mirror farm

There are a number of helpful tools and techniques you can use to monitor the health of your mirror farm. 

Performance monitoring using JMX metrics

Java Management Extensions (JMX) is a technology for monitoring and managing Java applications. JMX can be used to determine the overall health of each mirror node and of the mirror farm as a whole. These are the most important statistics to monitor:

  • Hosting tickets on mirror nodes

  • Mirror hosting tickets on the primary

  • Incremental sync time on mirror nodes

  • Snapshot sync time on mirror nodes

  • Disk space, CPU, and memory

Mirror farm JMX metrics

Learn more about mirror farm JMX counters and what they monitor:

For more information and a complete list of JMX metrics, check Enabling JMX counters for performance monitoring.

Timers

The values retained in timer metrics exhibit decaying behavior, with more recent values favored over older values. Unless stated otherwise, the attributes provided by these metrics represent a snapshot in time, reflecting the duration of an operation.

Name | Description
Mean | The mean duration of the operation
StdDev | The standard deviation of the operation duration
50thPercentile | The duration at the 50th percentile (median) of the distribution
75thPercentile | The duration at the 75th percentile of the distribution
95thPercentile | The duration at the 95th percentile of the distribution
98thPercentile | The duration at the 98th percentile of the distribution
99thPercentile | The duration at the 99th percentile of the distribution
999thPercentile | The duration at the 99.9th percentile of the distribution
Max | The maximum duration of the operation
Min | The minimum duration of the operation
DurationUnit | The unit of the duration attributes: milliseconds
RateUnit | The unit of the rate attributes: events per second
OneMinuteRate | The one-minute exponentially weighted moving average rate at which the operation has been called
MeanRate | The mean rate at which events have occurred
FifteenMinuteRate | The 15-minute exponentially weighted moving average rate at which the operation has been called
FiveMinuteRate | The five-minute exponentially weighted moving average rate at which the operation has been called
Count | The number of times the operation has been called since application startup
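The rate attributes (OneMinuteRate, FiveMinuteRate, FifteenMinuteRate) are exponentially weighted moving averages, which is why they react to changes at different speeds. The attribute names match those exposed by the Dropwizard Metrics library's JMX reporter, so the Python sketch below assumes that library's approach: a 5-second tick and a smoothing factor of 1 - e^(-5/60/N) for an N-minute average. This page does not state which library Bitbucket uses, so treat the EwmaRate class as a hypothetical illustration, not Bitbucket's implementation.

```python
import math

TICK_SECONDS = 5.0  # assumed tick interval, as in Dropwizard Metrics


class EwmaRate:
    """Hypothetical N-minute exponentially weighted moving average rate."""

    def __init__(self, minutes):
        # Smoothing factor: how much weight each new tick carries.
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / 60.0 / minutes)
        self.rate = 0.0          # events per second
        self.initialized = False
        self.uncounted = 0       # events recorded since the last tick

    def mark(self, n=1):
        """Record n completed operations (for example, n syncs)."""
        self.uncounted += n

    def tick(self):
        """Fold the events from the last tick interval into the average."""
        instant_rate = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.initialized:
            self.rate += self.alpha * (instant_rate - self.rate)
        else:
            # The first tick seeds the average with the observed rate.
            self.rate = instant_rate
            self.initialized = True


if __name__ == "__main__":
    one_minute, fifteen_minutes = EwmaRate(1), EwmaRate(15)
    # One sync every 5 seconds for two minutes (0.2 events per second)...
    for _ in range(24):
        for r in (one_minute, fifteen_minutes):
            r.mark()
            r.tick()
    # ...followed by one minute with no syncs at all.
    for _ in range(12):
        for r in (one_minute, fifteen_minutes):
            r.tick()
    print(f"1-minute rate:  {one_minute.rate:.4f}/s")
    print(f"15-minute rate: {fifteen_minutes.rate:.4f}/s")
```

After the idle minute, the one-minute rate has fallen to about 37% of its earlier value (roughly 0.074 events per second), while the fifteen-minute rate is still close to 0.187. A sudden stall in synchronization therefore shows up first in OneMinuteRate.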

The timer metrics are available for the following operations:

  1. Time taken to synchronize a repository across the mirror farm with the upstream

    1. Incremental sync: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=content,name=incremental

    2. Snapshot sync: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=content,name=snapshot

  2. Time taken to distribute ref changes to the mirror farm nodes and fetch objects from the upstream while syncing a repository

    1. Incremental sync: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=distribute-fetch,name=incremental

    2. Snapshot sync: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=distribute-fetch,name=snapshot

  3. Total time taken to detect and fix all inconsistent repositories on the mirror farm during a single farm vet run: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=vet,name=timer

  4. Time taken to synchronize project and/or repository metadata from the upstream on the mirror farm

    1. For syncing all projects: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=metadata,name=all-projects

    2. For syncing a single project: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=metadata,name=single-project

    3. For syncing a single repository: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=farm,category02=synchronize,category03=metadata,name=repository

  5. Time taken by a repository sync request, from its creation until it is successfully processed on the mirror farm: com.atlassian.bitbucket:type=metrics,category00=mirror,category01=request,category02=synchronize,name=cycle-time

  6. Time taken by mirror operations in different circumstances. A mirror operation is the unit of work done on each mirror node as part of a process initiated by one of the mirror nodes in the farm. There are different types of mirror operations, which are explained in the section below. The following metrics are collected for each operation type:

    1. For successful operations: com.atlassian.bitbucket:mirror.operation.local.<operation_name>.success

    2. For failed operations: com.atlassian.bitbucket:mirror.operation.local.<operation_name>.error

  7. Total time taken by all the nodes to perform a mirror operation and respond to the node that initiated the process. The following metrics are collected for each operation type, depending on the outcome:

    1. When all the nodes perform the mirror operation successfully and respond with the same result: com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.success

    2. When one or more nodes fail to perform the mirror operation: com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.error

    3. When one or more nodes fail to respond within the configured timeout for that operation: com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.timeout

    4. When all the nodes respond without error but return conflicting results for a given operation: com.atlassian.bitbucket:mirror.operation.distributed.<operation_name>.timeout
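Because the timer names follow a fixed pattern, it can help to keep them in one place when configuring a JMX client or exporter. The Python sketch below catalogues the ObjectNames listed above and builds the per-operation names exactly as written above from an operation name. The dictionary labels, the helper name operation_metric_names, and any operation name you pass in are hypothetical, because this page does not list the operation types. The trailing ",*" in ALL_FARM_METRICS_PATTERN is standard JMX ObjectName pattern syntax; it matches only the type=metrics names, not the mirror.operation names.

```python
# Common prefix of the type=metrics timers listed above.
_PREFIX = "com.atlassian.bitbucket:type=metrics,category00=mirror"

# Hypothetical short labels mapped to the documented ObjectNames.
MIRROR_TIMERS = {
    "content-sync-incremental": _PREFIX + ",category01=farm,category02=synchronize,category03=content,name=incremental",
    "content-sync-snapshot": _PREFIX + ",category01=farm,category02=synchronize,category03=content,name=snapshot",
    "distribute-fetch-incremental": _PREFIX + ",category01=farm,category02=distribute-fetch,name=incremental",
    "distribute-fetch-snapshot": _PREFIX + ",category01=farm,category02=distribute-fetch,name=snapshot",
    "farm-vet": _PREFIX + ",category01=farm,category02=vet,name=timer",
    "metadata-all-projects": _PREFIX + ",category01=farm,category02=synchronize,category03=metadata,name=all-projects",
    "metadata-single-project": _PREFIX + ",category01=farm,category02=synchronize,category03=metadata,name=single-project",
    "metadata-repository": _PREFIX + ",category01=farm,category02=synchronize,category03=metadata,name=repository",
    "sync-request-cycle-time": _PREFIX + ",category01=request,category02=synchronize,name=cycle-time",
}

# JMX property-list pattern: any ObjectName in the domain that has these
# key properties, plus any others.
ALL_FARM_METRICS_PATTERN = _PREFIX + ",*"


def operation_metric_names(operation_name):
    """Build the local and distributed timer names for one operation type.

    Only the success, error, and timeout variants are built here.
    """
    base = "com.atlassian.bitbucket:mirror.operation"
    names = {}
    for outcome in ("success", "error"):
        names["local." + outcome] = f"{base}.local.{operation_name}.{outcome}"
    for outcome in ("success", "error", "timeout"):
        names["distributed." + outcome] = f"{base}.distributed.{operation_name}.{outcome}"
    return names
```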


Synchronization and consistency

A repo-hashes endpoint is provided on both the mirror farm and the primary server. It is used to check the consistency of a mirror farm and its nodes with respect to the primary. It is the same endpoint that the mirror farm vet uses to repair inconsistencies, such as those caused by a missed webhook. Keep the following considerations in mind when using this endpoint:

  • The endpoint, rest/mirroring/latest/repo-hashes, is available on both the primary and the mirror nodes. It returns a stream of JSON containing a content and metadata hash for each repository. The content hash is a digest of the Git repository itself, while the metadata hash is a digest of the metadata that Bitbucket holds concerning the repository, such as the repository name.

  • To request only content hashes or only metadata hashes, call rest/mirroring/latest/repo-hashes/content or rest/mirroring/latest/repo-hashes/metadata, respectively.

  • This is what the payload looks like:

    {
      "projects": [
        {
          "id": 1,
          "public": false,
          "repositories": [
            {
              "id": 1,
              "hashes": {
                "content": "082a2ffa1520447bb6c0072f9f9d850c76f111c0ff9a08cca8838b12b0ccc31a",
                "metadata": "b8fae6cb4704174f8dafae601355279950f921ba55b7620f4bdaa1280e735d14"
              }
            }, 
            {
              "id": 2,
              "hashes": {
                "content": "0000000000000000000000000000000000000000",
                "metadata": "e80aeaf459a69e7000b9e785eb39640a5d929f7ec4f09512a9ab6fabf4a0c80a"
              }
            }
          ]
        }
      ]
    }
  • Generating content hashes is reasonably fast, but it has to run against every repository on the instance, which can take quite some time on larger instances. As an optimisation, when an upstream is first upgraded to a version that supports mirror farms, an “empty” content hash is generated for each repository. This empty hash appears as 0000000000000000000000000000000000000000, as shown in the content attribute of the second repository in the example above. When the farm vet encounters a repository with a content hash of 0000000000000000000000000000000000000000, it considers that repository up to date.

  • A mirror returns entries only for the projects and repositories it mirrors. For those entries, the content returned by a mirror and by the primary is the same, but the order of entries can differ. One way to sort the entries consistently for diffing is the jq query jq '.projects | sort | .[].repositories |= sort_by(.id)'
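The same comparison can be scripted in Python instead of with jq. The sketch below fetches the repo-hashes payload from a base URL, indexes each payload by (project ID, repository ID) so that entry order no longer matters, and reports mirror entries whose hashes differ from the primary. It assumes the endpoint returns a single JSON document shaped like the example above, and that the content check should be skipped when either side still has the all-zero placeholder. The function names and the optional headers argument (for whatever authentication your instance needs) are hypothetical.

```python
import json
import urllib.request

# All-zero placeholder content hash generated when an upstream is first
# upgraded to a mirror-farm-capable version.
EMPTY_CONTENT_HASH = "0" * 40


def fetch_repo_hashes(base_url, headers=None):
    """GET <base_url>/rest/mirroring/latest/repo-hashes and parse the JSON."""
    url = base_url.rstrip("/") + "/rest/mirroring/latest/repo-hashes"
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def index_by_repository(payload):
    """Map (project id, repository id) to that repository's hashes."""
    return {
        (project["id"], repo["id"]): repo["hashes"]
        for project in payload.get("projects", [])
        for repo in project.get("repositories", [])
    }


def find_inconsistencies(primary_payload, mirror_payload):
    """List (key, reason) pairs for mirror entries that differ from the primary.

    Only entries returned by the mirror are checked, because a mirror
    reports just the projects and repositories it mirrors.
    """
    primary = index_by_repository(primary_payload)
    problems = []
    for key, mirror_hashes in sorted(index_by_repository(mirror_payload).items()):
        primary_hashes = primary.get(key)
        if primary_hashes is None:
            problems.append((key, "missing on primary"))
            continue
        contents = (primary_hashes.get("content"), mirror_hashes.get("content"))
        # Assumption: skip the content check if either side still has the
        # empty placeholder, in line with how the farm vet treats it.
        if EMPTY_CONTENT_HASH not in contents and contents[0] != contents[1]:
            problems.append((key, "content"))
        if primary_hashes.get("metadata") != mirror_hashes.get("metadata"):
            problems.append((key, "metadata"))
    return problems
```

For example, find_inconsistencies(fetch_repo_hashes("https://primary.example.com"), fetch_repo_hashes("https://mirror-node-1.example.com")) lists the repositories worth investigating on that node; both hostnames are placeholders.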

Webhook

The mirror synchronized webhook can be used to trigger builds as soon as a mirror has finished synchronizing. It is also useful for monitoring the repositories in your mirror farm. Details of this repository event are on the Event payload page.
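As a starting point for monitoring with this webhook, the following Python sketch runs a small HTTP receiver built on the standard library and prints a one-line summary of each mirror synchronized event. The event key mirror:repo_synchronized and the fields it reads (eventKey, mirrorServer, syncType, repository) are assumptions based on the Bitbucket event payload reference; check them against the Event payload page for your version. The handler name, the serve helper, and its default port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed event key for the mirror synchronized event.
MIRROR_SYNC_EVENT = "mirror:repo_synchronized"


def summarize_event(payload):
    """Return a one-line summary of a mirror synchronized event, or None."""
    if not isinstance(payload, dict) or payload.get("eventKey") != MIRROR_SYNC_EVENT:
        return None
    mirror = (payload.get("mirrorServer") or {}).get("name", "unknown mirror")
    repo = payload.get("repository") or {}
    project = (repo.get("project") or {}).get("key", "?")
    sync_type = payload.get("syncType", "unknown")
    return f"{mirror} synchronized {project}/{repo.get('slug', '?')} ({sync_type} sync)"


class MirrorWebhookHandler(BaseHTTPRequestHandler):
    """Accepts webhook POSTs and logs mirror synchronized events."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        summary = summarize_event(payload)
        if summary:
            print(summary, flush=True)  # replace with your monitoring hook
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted (hypothetical default port)."""
    HTTPServer(("", port), MirrorWebhookHandler).serve_forever()
```

Calling serve() and pointing the webhook URL at this host and port is enough for a quick test; for production monitoring you would forward the summary to your alerting system instead of printing it.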

Monitoring the status of your mirrors

You can configure your load balancer to check each node’s status using the /status endpoint. The endpoint returns a 200 response code if the mirror node is in the SYNCHRONIZED state. If no nodes are in the SYNCHRONIZED state, it returns a 200 response code for any mirror node that is in one of the following states:

  • BOOTSTRAPPED

  • BOOTSTRAPPING

  • METADATA_SYNCHRONIZED


If you want a “strict” status endpoint, set the plugin.mirroring.strict.hosting.status configuration property to true. With this property set, the /status endpoint returns a 200 response code only if the mirror is in the SYNCHRONIZED state. Configuring this property is outside the scope of this document. Note that at least one mirror node should be accessible from the upstream server.
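The response rules above can be summarized as a small decision function. The Python sketch below models, for a single node, whether /status answers with 200 given that node's state, the states of all nodes in the farm, and whether plugin.mirroring.strict.hosting.status is enabled. It illustrates the documented behavior only; it is not Bitbucket's implementation, and the function and constant names are hypothetical.

```python
SYNCHRONIZED = "SYNCHRONIZED"
# States that still get a 200 while no node in the farm is SYNCHRONIZED.
FALLBACK_STATES = {"BOOTSTRAPPED", "BOOTSTRAPPING", "METADATA_SYNCHRONIZED"}


def status_returns_200(node_state, farm_states, strict=False):
    """Model whether a node's /status endpoint responds with 200.

    node_state:  this node's state, for example "METADATA_SYNCHRONIZED".
    farm_states: states of all nodes in the farm, including this one.
    strict:      value of plugin.mirroring.strict.hosting.status.
    """
    if node_state == SYNCHRONIZED:
        return True
    if strict:
        # Strict mode: only SYNCHRONIZED nodes are reported as available.
        return False
    # Default mode: fall back to partially synchronized nodes, but only
    # while no node in the farm has reached SYNCHRONIZED.
    return SYNCHRONIZED not in farm_states and node_state in FALLBACK_STATES
```

For example, in a farm with states ["SYNCHRONIZED", "METADATA_SYNCHRONIZED"], only the first node receives traffic. During the initial sync, when the farm is ["BOOTSTRAPPING", "METADATA_SYNCHRONIZED"], both nodes return 200 unless strict mode is enabled, which is what keeps the farm reachable before any node is fully synchronized.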

The table below lists each state and its description:

State | Description
STARTING | The Bitbucket application is starting.
STOPPING | The Bitbucket application is stopping.
BOOTSTRAPPING | The mirror component has started.
BOOTSTRAPPED | The mirror has joined the cluster. If this is the first time the mirror farm has been connected to a primary, the application waits in this state until it has been authorized.
METADATA_SYNCHRONIZED | Project or repository metadata has been synchronized from the primary, and synchronization of the Git repositories has started.
SYNCHRONIZED | The mirror farm has synchronized all Git repositories from the primary. This state does not change if new projects or repositories are later added to the mirror farm; it indicates that the initial set of projects or repositories configured at startup has been synchronized.
ERROR | There was an error starting the application node.

When you perform a GET request against the /status endpoint, it returns JSON with two properties, state and nodeCount.

For example: {"state":"SYNCHRONIZED","nodeCount":"4"}
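A monitoring script can read the same endpoint. The Python sketch below performs a GET against /status using only the standard library and returns the HTTP status code together with the parsed state and nodeCount. It converts nodeCount to an integer because the example response returns it as a string. The base URL, the timeout, and the function names are assumptions for illustration.

```python
import json
import urllib.error
import urllib.request


def parse_status(body):
    """Parse a /status response body into (state, node_count)."""
    data = json.loads(body)
    node_count = data.get("nodeCount")
    return data.get("state"), (int(node_count) if node_count is not None else None)


def probe(base_url, timeout=5.0):
    """GET <base_url>/status and return (http_code, state, node_count)."""
    url = base_url.rstrip("/") + "/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            code, body = response.status, response.read()
    except urllib.error.HTTPError as err:
        # Error responses raise HTTPError; the body may still carry a state.
        code, body = err.code, err.read()
    try:
        state, node_count = parse_status(body)
    except (ValueError, TypeError, AttributeError):
        state, node_count = None, None
    return code, state, node_count
```

For example, probe("https://mirror-node-1.example.com") (a placeholder host) might return (200, "SYNCHRONIZED", 4) for a healthy node in a four-node farm.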

Last updated: October 17, 2024
