Missing commits in Bitbucket after a filesystem migration

Platform Notice: Server and Data Center Only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible




Summary

After migrating to a new filesystem, commits appear to be missing from the repositories.

The commits usually reported as missing are merge commits and commits that belong to merged pull requests. They are missing from both the source and the target branch.

Diagnosis

When moving to a new filesystem, some customers use rsync with a workflow similar to the following:

  • perform an initial rsync
  • (optional) perform additional rsyncs, scheduled or manual, to reduce the downtime needed to migrate to the new filesystem
  • perform a final rsync


Because the --delete rsync option is not used in this sequence, each pass only copies new files and changes to existing files. Files that were copied by an earlier rsync but have since been deleted on the source are not deleted on the target, and these leftover files cause unexpected conflicts in the Git repositories.
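The effect can be reproduced with two throwaway directories. The following is a minimal sketch, assuming `rsync` is installed; the function name `demo_stale_copy` and the file layout are hypothetical and only mimic a repository. It shows that a file removed from the source between two passes survives on the target unless `--delete` is used.

```shell
#!/bin/bash
# Hypothetical demo: a file removed from the source between two rsync passes
# stays on the target unless the second pass uses --delete.
# Prints the files present in the target directory after the second pass.
demo_stale_copy() {
  local delete_flag="$1"            # "--delete" or an empty string
  local src dst
  src="$(mktemp -d)"
  dst="$(mktemp -d)"

  mkdir -p "$src/refs/heads"
  echo "old-hash" > "$src/refs/heads/feature"      # a loose ref file
  rsync -a "$src/" "$dst/"                          # first pass

  rm "$src/refs/heads/feature"                      # ref packed on the source
  echo "new-hash refs/heads/feature" > "$src/packed-refs"

  # Second pass; ${delete_flag:+...} expands to nothing when the flag is empty
  rsync -a ${delete_flag:+"$delete_flag"} "$src/" "$dst/"

  (cd "$dst" && find . -type f | sort)
  rm -rf "$src" "$dst"
}

# Usage:
#   demo_stale_copy ""         # target still holds ./refs/heads/feature
#   demo_stale_copy --delete   # target holds only ./packed-refs
```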

Cause

At certain intervals, Bitbucket runs git pack-refs --all (see the git-pack-refs documentation), which moves the refs pointing to the tip of each branch from individual (loose) ref files into the $REPOSITORY_HOME/packed-refs file.

As part of this process, the loose ref file for each branch, stored as $REPOSITORY_HOME/refs/heads/<branch_name>, is removed.


If the first rsync ran before pack-refs and a later rsync ran after it without the --delete option, the target filesystem ends up with both the stale $REPOSITORY_HOME/refs/heads/<branch_name> file and the updated $REPOSITORY_HOME/packed-refs file.

When that happens, Git gives precedence to the loose ref and treats the hash in $REPOSITORY_HOME/refs/heads/<branch_name> as the tip of the branch. That hash is outdated, so the commits added to the branch after the first rsync are no longer reachable from it and appear as "missing commits".
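The precedence rule can be observed without rsync. The sketch below assumes a local `git` installation and uses a throwaway non-bare repository, so the refs live under `.git/` rather than directly under `$REPOSITORY_HOME` as in Bitbucket's bare repositories; `demo_shadowed_ref` is a hypothetical name. It saves a copy of the loose ref, adds a commit, packs the refs, and then puts the stale copy back, which is what a later rsync without `--delete` effectively does.

```shell
#!/bin/bash
# Hypothetical demo: a stale loose ref shadows the newer packed-refs entry.
# Prints two lines: the hash Git resolves for the branch, then the packed hash.
demo_shadowed_ref() {
  local repo stale
  repo="$(mktemp -d)"
  stale="$(mktemp)"
  gitd() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
               -c commit.gpgsign=false "$@"; }

  gitd init -q
  gitd symbolic-ref HEAD refs/heads/rsync_testing

  # First commit; keep a copy of the loose ref (what the first rsync copied)
  gitd commit -q --allow-empty -m "first commit"
  cp "$repo/.git/refs/heads/rsync_testing" "$stale"

  # Second commit, then pack all refs: the loose ref file is removed
  gitd commit -q --allow-empty -m "second commit"
  gitd pack-refs --all

  # Restore the stale loose file, as a second rsync without --delete leaves it
  cp "$stale" "$repo/.git/refs/heads/rsync_testing"

  gitd rev-parse refs/heads/rsync_testing                 # resolves to the first commit
  awk '$2 == "refs/heads/rsync_testing" { print $1 }' "$repo/.git/packed-refs"
  rm -rf "$repo" "$stale"
}
```

Calling `demo_shadowed_ref` prints two different hashes: Git resolves the branch to the stale loose value even though packed-refs records the newer commit.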

Solution

Important Notice:

The script provided on this page is not officially supported by Atlassian. We can't guarantee that it will not cause any side effects or unintended consequences. It is provided as-is, and users are responsible for any issues that may arise from its use.

Precautionary Steps:

1. Take a Backup:
  Before running the script, make sure you have a complete backup of your Bitbucket environment. This is crucial so that you can restore your system if anything goes wrong.

2. Turn Off Bitbucket:
  Shut down Bitbucket before running the script to avoid conflicts and protect data integrity.

3. Test Environment:
  Before executing the script in a production environment, it should be thoroughly tested in a non-production or test environment. This step is critical to verify the script's output and confirm that it works as intended without causing any disruptions.

The script is not a complete solution and cannot handle every case. It walks the filesystem and identifies unpacked (loose) references, which are stored as individual files, that are older than the packed-refs file where the packed references are stored and that also have an entry in packed-refs. It reports every such file. However, it compares only file timestamps, not file contents, so it will miss affected refs if the timestamps of the loose ref files have been modified. This can happen if rsync was run without preserving timestamps, or if a loose ref file was updated for any other reason.

This script, or any similar one, should be treated as a last-resort tool for salvaging the Git repository when no other option is available.

The correct approach to data migration is to always use tools that make an identical 1:1 copy of the original filesystem. In the rsync case, this means using the --delete option and preserving timestamps, ownership, and permissions.
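A minimal sketch of such a copy, assuming a reasonably recent `rsync` and placeholder paths; `mirror_tree` is a hypothetical wrapper, not an Atlassian tool. `-a` preserves timestamps, permissions, symlinks, and ownership (owner only when run as root), `-H` preserves hard links, and `--delete` removes files on the target that no longer exist on the source.

```shell
#!/bin/bash
# Hypothetical wrapper: make DST an identical mirror of SRC.
# The trailing slashes make rsync copy the contents of SRC into DST.
mirror_tree() {
  local src="$1" dst="$2"
  rsync -aH --delete "${src%/}/" "${dst%/}/"
}

# Example (placeholder paths; run with Bitbucket stopped):
#   mirror_tree /old-fs/bitbucket/shared /new-fs/bitbucket/shared
```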

By proceeding with the script, you acknowledge that you understand and accept the risks involved, and you agree to take full responsibility for any actions performed.

If Bitbucket has not been used on the new filesystem yet, the recommended action is to switch back to the previous filesystem, or to run rsync again with the --delete option to remove all stale refs from the target.
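Before re-running rsync with `--delete`, the files it would remove can be previewed. This is a sketch assuming placeholder paths and the hypothetical helper name `preview_deletions`: `--dry-run` makes no changes, and `--itemize-changes` prefixes each file that would be removed with `*deleting`. The `awk` filter assumes file names without spaces, which holds for typical ref names.

```shell
#!/bin/bash
# Hypothetical helper: list the paths that rsync --delete would remove from DST,
# without modifying either side.
preview_deletions() {
  local src="$1" dst="$2"
  rsync -aH --delete --dry-run --itemize-changes "${src%/}/" "${dst%/}/" |
    awk '$1 == "*deleting" { print $2 }'
}

# Example (placeholder paths):
#   preview_deletions /old-fs/bitbucket/shared /new-fs/bitbucket/shared
```

Stale loose refs such as refs/heads/<branch_name> under each repository directory are expected to appear in this list.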

The preferred option is to run rsync with --delete before any new work is performed, but this may not always be possible. In that case, the following script reviews the refs in each repository and identifies the ones that are older than the packed-refs file. As provided, it only lists the suspect refs; uncommenting the rm line removes them.

#!/bin/bash

# Script to check the repositories after an rsync without --delete and find
# loose ref files that are older than the latest packed-refs file.
# Uncomment the rm -f "$ref" line once you have confirmed the list looks good.

TMPOCFILE="/tmp/$$.oldcommits"

cd "$BITBUCKET_HOME/shared/data/repositories" || exit 1

for rep in [0-9]*
do
   # Skip anything that is not a repository directory with a packed-refs file
   [ -d "$rep" ] && [ -f "$rep/packed-refs" ] || continue
   cd "$rep" || continue
   echo "Checking $(pwd) [Repository $(grep project repository-config | awk '{print $3;}')/$(grep repository repository-config | awk '{print $3;}')]"

   # Loose refs that are not newer than packed-refs and also have a packed entry
   # (awk compares the ref name exactly instead of treating it as a regex)
   find refs -type f \! -newer packed-refs -print | while IFS= read -r ref
   do
      awk -v r="$ref" '$2 == r' packed-refs
   done > "$TMPOCFILE"
   if [ -s "$TMPOCFILE" ]
   then
      echo "Found newer refs in packed-refs"
      cat "$TMPOCFILE"
      awk '{print $2;}' "$TMPOCFILE" | while IFS= read -r ref
      do
         echo "$ref outside of packed-refs looks old"
         # rm -f "$ref"
      done
   fi
   cd ..
done

rm -f "$TMPOCFILE"


Removing the stale loose refs makes Git fall back to the entries in packed-refs, which should restore the hidden commits. Alternatively, users can push again all the changes they have made since the initial rsync.
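Because the script above compares only timestamps, a content-based comparison can serve as a cross-check. The following is a sketch, not an Atlassian-provided tool; `find_conflicting_refs` is a hypothetical name and it expects a bare repository directory such as `$BITBUCKET_HOME/shared/data/repositories/<id>`. It prints each loose ref whose hash differs from the packed-refs entry for the same name, together with whether the loose file is older or newer than packed-refs. A mismatch alone does not prove the loose ref is stale: a legitimate push after packing also produces a newer loose ref, so only "older" entries match the rsync pattern, and every result still needs manual review.

```shell
#!/bin/bash
# Hypothetical helper: compare loose ref contents with packed-refs entries.
# Output columns: ref name, loose hash, packed hash, "older"/"newer"
# (age of the loose file relative to packed-refs).
find_conflicting_refs() {
  local repo="$1"
  [ -f "$repo/packed-refs" ] || return 0
  (
    cd "$repo" || exit 1
    find refs -type f | sort | while IFS= read -r ref
    do
      loose="$(head -n 1 "$ref")"
      packed="$(awk -v r="$ref" '$2 == r { print $1 }' packed-refs)"
      # Skip refs that are not packed, or whose loose and packed hashes agree
      [ -n "$packed" ] && [ "$loose" != "$packed" ] || continue
      if [ "$ref" -nt packed-refs ]; then age=newer; else age=older; fi
      echo "$ref $loose $packed $age"
    done
  )
}
```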

Step by step example

This section walks through what happens to the Git repository on the server, step by step.

Push a commit

When a commit is pushed to the rsync_testing branch, the $REPOSITORY_HOME/refs/heads/rsync_testing file is updated with the hash of the commit at the tip of the branch:

cat refs/heads/rsync_testing
d17c361edcee69f6b4c25f2230896c7ef1673480


If this is a new branch, the packed-refs file does not contain any entry for the rsync_testing branch:

cat packed-refs | grep "refs/heads/rsync_testing"
# no results


First Rsync

After the first rsync, the $REPOSITORY_HOME/refs/heads/rsync_testing file and the $REPOSITORY_HOME/packed-refs file in the target environment exactly match the ones shown above.


Push a second commit

The loose ref file now contains the new hash:

cat refs/heads/rsync_testing
f8d2ce4e12becfdbc875defb81a3041ee9f4e421


The packed-refs file still does not contain any references to the branch:

cat packed-refs | grep "refs/heads/rsync_testing"
# no results


Merge the pull request

When a pull request is merged, a garbage collection is scheduled and, if the minimum interval has passed, the refs are packed:

cat refs/heads/rsync_testing
# The refs have been packed so the file does not exist anymore


The packed-refs file now has an entry for the rsync_testing branch and this points to the most recent commit:

cat packed-refs | grep "refs/heads/rsync_testing"
7622bf84bdfb2fae299609a38b09e7bfdea22b00 refs/heads/rsync_testing


Second Rsync

The $REPOSITORY_HOME/packed-refs file is synchronized, but the $REPOSITORY_HOME/refs/heads/rsync_testing file is not: it no longer exists on the source, and without --delete rsync does not remove the copy already on the target.

As a result, refs/heads/rsync_testing is still present in the target environment and contains the outdated hash copied by the first rsync:

cat refs/heads/rsync_testing
cbc34bf8ea2548c37d853a934b8af0a58e436fe1


The packed-refs file contains the up-to-date hash of the branch tip:

cat packed-refs | grep "refs/heads/rsync_testing"
7622bf84bdfb2fae299609a38b09e7bfdea22b00 refs/heads/rsync_testing


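Git gives precedence to the content of the loose ref (refs/heads/rsync_testing) over the entry in packed-refs, so in the target environment the branch history stops at the outdated commit and the later commits appear to be missing.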


Screenshots: commit history of the master and rsync_testing branches in the source environment and in the target environment.
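To list exactly which commits are hidden on a given branch in the target environment, the resolved ref can be compared with its packed-refs entry. This is a sketch assuming a local `git` installation; `show_hidden_commits` is a hypothetical name, and it uses `git rev-parse --git-path packed-refs` so that it works for both bare and non-bare repositories. It prints the hashes of commits reachable from the packed-refs entry but not from the ref Git actually resolves.

```shell
#!/bin/bash
# Hypothetical helper: print commits that are reachable from the packed-refs
# entry of REF but not from the value Git resolves (usually a stale loose ref).
show_hidden_commits() {
  local repo="$1" ref="$2"
  (
    cd "$repo" || exit 1
    resolved="$(git rev-parse "$ref")" || exit 1
    packed_file="$(git rev-parse --git-path packed-refs)"
    [ -f "$packed_file" ] || exit 0
    packed="$(awk -v r="$ref" '$2 == r { print $1 }' "$packed_file")"
    [ -n "$packed" ] || exit 0
    git log --format=%H "$resolved..$packed"
  )
}

# Example: show_hidden_commits "$BITBUCKET_HOME/shared/data/repositories/<id>" refs/heads/rsync_testing
```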


Other

Q: Does it matter whether the source branch was deleted during the merge?

A: No. What matters is the state of the ref files, not the content of the branches. In the example above, the source branch was not deleted.





Last modified on July 15, 2024
