How to perform a manual garbage collection on a repository

Platform notice: Server and Data Center only. This article applies only to Atlassian products on the Server and Data Center platforms.

Purpose

Note: Bitbucket runs garbage collection when needed. To avoid data loss, it should never be performed manually on the repositories.

The best course of action is to let Bitbucket run garbage collection rather than running it manually. Bitbucket runs git pack-refs, git repack and git prune once a repository no longer has any forks.


This page covers the steps required to allow Bitbucket to successfully run the garbage collection.

Solution

For Bitbucket 5+

Bitbucket implements its own garbage collection logic and no longer relies on git gc (this is achieved by setting [gc] auto = 0 on all repositories). When a fork is created, pruneexpire=never is added to the git configuration; it is removed when the last fork is deleted.
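To see which of these settings are currently present on a repository, the repository-level git configuration can be read directly. The following is a minimal sketch, not part of Bitbucket: the function name show_gc_config and its path argument are hypothetical, and it assumes the repository directory is itself the git directory (a bare repository).

```shell
# Print the repository-level GC settings mentioned above.
# $1: path to the repository directory (hypothetical argument; assumed to be a bare repository).
show_gc_config() {
    repo=$1
    for key in gc.auto gc.pruneexpire; do
        # "git config --get" exits non-zero when the key is not set,
        # so fall back to a visible marker instead of an empty value.
        value=$(git --git-dir="$repo" config --local --get "$key") || value="(unset)"
        printf '%s=%s\n' "$key" "$value"
    done
}
```

For example, show_gc_config "<repository path in the Bitbucket home directory>" prints one line per key. This only reads the configuration and changes nothing.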

For Bitbucket < 5.0

Bitbucket Server relies on git running auto gc on push. That doesn't necessarily mean git gc will actually run: git uses a heuristic to decide whether gc is necessary (the repository has either more than 6700 loose objects or more than 50 pack files). The number of loose objects is estimated by counting how many objects are in objects/17.
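The heuristic can be approximated from the shell to get a feel for whether a push would trigger auto gc. This is a rough sketch of the idea only, not git's actual implementation: the function names are hypothetical, the thresholds are the ones quoted above, and the factor 256 assumes objects are spread evenly across the 256 fan-out directories (00 to ff) under objects/.

```shell
# Estimate the total number of loose objects from the objects/17 sample.
# $1: path to the repository directory (hypothetical argument).
estimate_loose_objects() {
    dir="$1/objects/17"
    [ -d "$dir" ] || { echo 0; return; }
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    echo $((count * 256))
}

# Count the pack files of the repository.
count_pack_files() {
    dir="$1/objects/pack"
    [ -d "$dir" ] || { echo 0; return; }
    find "$dir" -type f -name '*.pack' | wc -l | tr -d ' '
}

# Succeed when either threshold from the text is exceeded.
gc_would_trigger() {
    [ "$(estimate_loose_objects "$1")" -gt 6700 ] ||
        [ "$(count_pack_files "$1")" -gt 50 ]
}
```

A call such as gc_would_trigger "<repository path>" && echo "auto gc likely" gives only an indication; git applies its own configuration values and rules when it makes the real decision.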

For repositories with forks, git's auto garbage collection is disabled by setting the gc.auto configuration option to 0 as soon as the first fork is created. This setting is removed, re-enabling auto gc, as soon as the last fork is removed.

More information on this subject

Check for the existence of forks

Since garbage collection can be performed only if there are no forks (to prevent necessary data from being removed), the first step is to check whether a repository has any forks.

This can be achieved by calling the following REST API endpoint (documented here):

curl --user <username>:<password> -H "Content-Type: application/json" -X GET <bitbucket_url>/rest/api/1.0/projects/<project_key>/repos/<repository_slug>/forks > ./rest_output.txt

The following is the result when the repository has no forks:

{"size":0,"limit":25,"isLastPage":true,"values":[],"start":0}

If the repository has forks, garbage collection (git gc or git prune) should never be run, to avoid any data loss.
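The check on the saved response can be scripted. The sketch below reads the "size" field from the file written by the curl command above. The function names are hypothetical, and it assumes the compact JSON layout shown in the sample, where the top-level "size" is the first one in the file. If the field cannot be read, it deliberately reports that forks exist, so an unreadable response never looks like permission to proceed.

```shell
# Print the "size" field of a saved forks response.
# $1: file holding the REST output, e.g. ./rest_output.txt
fork_count_on_page() {
    grep -o '"size":[[:space:]]*[0-9][0-9]*' "$1" | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Succeed (exit status 0) when the response lists at least one fork,
# or when the response cannot be parsed (fail safe).
repo_has_forks() {
    n=$(fork_count_on_page "$1" || true)
    if [ -z "$n" ]; then
        return 0
    fi
    [ "$n" -gt 0 ]
}
```

For example: repo_has_forks ./rest_output.txt && echo "forks present, do not run gc". Note that "size" is the number of entries on the returned page, which is enough to distinguish zero forks from one or more.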

Forks cannot be removed but I still want to run a repack

In production, the following steps can take some time. For this reason, it is recommended to check the potential gain on a copy of the repository first.

Also keep track of the time required to perform the full sequence of steps.

cd <repository path in the Bitbucket home directory>
cp -r * some/tmp/location
cd some/tmp/location
du -h
git fsck
git repack -Adln --keep-unreachable
du -h
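To keep track of the time each step takes, the commands can be run through a small wrapper. This is a minimal sketch using a hypothetical helper named timed. It measures wall-clock time in whole seconds with date +%s, which is widely available but not required by POSIX.

```shell
# Run a command, then report how long it took on stderr.
# The wrapped command's exit status is passed through unchanged.
timed() {
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s ($*)" >&2
    return "$status"
}
```

For example, timed git fsck runs git fsck as usual and prints the elapsed time afterwards. Adding up the reported times gives an estimate of the downtime to plan for.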

If the gain is significant, plan for the required downtime and proceed with the next steps.

Another way to check the gain is to run the following command before and after the repack:

git count-objects -v
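The two runs can be compared numerically. The sketch below adds up the size and size-pack lines of the command's output, which git reports in KiB, and prints the difference between two such totals. The function names disk_usage_kib and saved_kib are hypothetical.

```shell
# Read "git count-objects -v" output on stdin and print
# loose-object size plus pack size, in KiB.
disk_usage_kib() {
    awk -F': ' '$1 == "size" || $1 == "size-pack" { total += $2 } END { print total + 0 }'
}

# Print how many KiB were saved between two totals.
# $1: total before the repack, $2: total after the repack
saved_kib() {
    echo $(($1 - $2))
}
```

A possible usage: store before=$(git count-objects -v | disk_usage_kib) ahead of the repack and after=$(git count-objects -v | disk_usage_kib) once it has finished, then call saved_kib "$before" "$after".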


Perform the repack on the repository itself

  • Generate a backup of Bitbucket Server (Data recovery and backups)
  • Stop Bitbucket Server (this is not strictly required but always recommended)
  • Run the following commands:

    cd <repository path in the Bitbucket home directory>
    du -h
    git fsck
    git repack -Adln --keep-unreachable
    du -h
  • Start Bitbucket Server.

Description

This page covers the preliminary checks to understand if a garbage collection is a viable option on a repository, and the steps to perform it.

Product: Bitbucket
Last updated: August 20, 2019
