How to optimize migration (export and import) of a large number of projects from one Bitbucket Server and Data Center instance to another


Platform notice: Server and Data Center only. This article only applies to Atlassian products on the Server and Data Center platforms.

Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

The content on this page relates to platforms which are not supported; consequently, Atlassian Support cannot guarantee providing any support for it. This material is provided for informational purposes only, and you use it at your own risk.

Summary

When migrating (exporting and importing) a large number of projects and repositories from one Bitbucket Server and Data Center instance to another, the process explained on the page Export and import projects and repositories should be optimized. Without optimization, the whole process is inefficient, can take much longer than necessary, and can result in duplicate Git repositories if repository forks are present.

Environment

8.9.3, but also applicable to other versions.

Solution

With many projects and repositories holding a lot of data to move, the general idea is to organize the exports into smaller chunks of several projects or repositories each, and to run those exports in parallel.

Optimization requires a bit of programming or scripting skill.

We use the Get projects and Get repositories for project REST API endpoints to construct the list of projects and repositories.
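
As an illustration, below is a minimal Python sketch (using the requests library) that walks both paged endpoints to build that list. The base URL and the personal access token are placeholders to adapt to your instance.

import requests

BASE_URL = "https://bitbucket.example.com"  # placeholder: your Bitbucket base URL
HEADERS = {"Authorization": "Bearer <PERSONAL_ACCESS_TOKEN>"}  # placeholder token

def get_paged(url):
    """Collect all values from a paged Bitbucket REST resource."""
    start, values = 0, []
    while True:
        resp = requests.get(url, headers=HEADERS,
                            params={"start": start, "limit": 100})
        resp.raise_for_status()
        page = resp.json()
        values.extend(page["values"])
        if page.get("isLastPage", True):
            return values
        start = page["nextPageStart"]

# "Complete list of projects and their repositories": project key -> repo slugs.
projects = get_paged(f"{BASE_URL}/rest/api/1.0/projects")
repos = {p["key"]: [r["slug"] for r in
                    get_paged(f"{BASE_URL}/rest/api/1.0/projects/{p['key']}/repos")]
         for p in projects}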

Possible challenges

According to the Start export job REST API documentation, there are two challenges:

  1. For every selected repository, its full fork hierarchy will be considered selected, even if parts of that hierarchy would otherwise not be matched by the provided selectors.
    For example, when you explicitly select a single repository only, but that repository is a fork, then its origin will be exported (and eventually imported), too.
  2. Only 2 concurrent exports are supported per cluster node.
    If a request ends up on a node that is already running that many export jobs, the request will be rejected and an error returned.

Automatic selection of fork hierarchy

Fork hierarchies are always migrated in full. Even if we select only project P1 and its repository R1 for export, the complete hierarchy will be migrated whenever R1 itself is a fork or forks of R1 exist. For example, if repository R2 in project P2 is a fork of R1, R2 will be exported, too. If another export then selects P2/R2, it will again include P1/R1, and we end up with the same repository exported, and later imported, twice.

In other words, this needs additional handling:

  • If we know how to split projects and repositories into several chunks so there are no duplicates due to fork hierarchies, we can do the splitting manually.
    For example, if we are sure that forks are not created across projects, we can do the splitting on a per-project level.
  • If we don't know how to split them, we will have to apply some programmatic procedure.

One such programmatic procedure could look like this (a sketch in code follows the list):

  1. Use Get projects and Get repositories for project REST APIs to create a "complete list of projects and their repositories".
  2. Split the "complete list of projects and repositories" into smaller chunks.
  3. Iterate over this list of "chunks" and call Preview export REST API to check what actually would be exported for a given "chunk" used as a project/repo selector.
    • Add the "chunk" (export selector) to the new "list of selectors for export".
    • Check the result of the Preview export and remove from the "complete list of projects and repositories" all repositories that would be selected as part of the fork hierarchy.
      This will ensure that automatically selected repositories won't be added again later.
  4. Repeat step 3 until you have iterated over all elements of the "complete list of projects and repositories" or that list becomes empty.
  5. At the end, the "list of selectors for export" holds the selectors to use, so you can launch the parallel exports with it.
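
Below is a hedged Python sketch of steps 2-4, continuing the example above. The shape of the preview request body (a repositoriesRequest with includes selectors) and of its response (a list of the repositories that would be selected) are assumptions derived from the Preview export and Start export job documentation; verify both against the REST docs for your version.

import requests

# BASE_URL, HEADERS and the `repos` mapping come from the earlier sketch.
CHUNK_SIZE = 5  # arbitrary chunk size; tune it for your data volume

# Flatten into (projectKey, slug) pairs, then take chunks off the front.
remaining = [(key, slug) for key, slugs in repos.items() for slug in slugs]
selectors_for_export = []

while remaining:
    chunk, remaining = remaining[:CHUNK_SIZE], remaining[CHUNK_SIZE:]
    selector = {"repositoriesRequest": {"includes": [
        {"projectKey": key, "slug": slug} for key, slug in chunk]}}
    resp = requests.post(f"{BASE_URL}/rest/api/1.0/migration/exports/preview",
                         headers=HEADERS, json=selector)
    resp.raise_for_status()
    selectors_for_export.append(selector)
    # Assumption about the response schema: it lists every repository that
    # the selectors (and their fork hierarchies) would export. Drop those
    # from `remaining` so no later chunk selects them again.
    previewed = {(r["project"]["key"], r["slug"])
                 for r in resp.json().get("repositories", [])}
    remaining = [pair for pair in remaining if pair not in previewed]

# `selectors_for_export` now holds the deduplicated "list of selectors for
# export" to use when launching the parallel export jobs.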

Only 2 concurrent exports are supported per cluster node

If a request ends up on a node that is already running two export jobs, the request will be rejected, and an error will be returned. You can use that as a signal to try again, hoping your request will end up on a different node.

Take care not to flood the instance with excessive requests: apply a back-off timer between retries, as in the sketch below.
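
For example, the Start export job call can be wrapped in a retry loop with an exponential back-off timer. This is a sketch under assumptions: any non-success status is treated as "node busy, retry", whereas in practice you should inspect the returned error before retrying.

import time
import requests

# BASE_URL and HEADERS as in the earlier sketches.

def start_export(selector, max_attempts=10):
    """POST a Start export job request, backing off while nodes are busy."""
    delay = 5.0  # initial back-off in seconds; an arbitrary starting point
    for _ in range(max_attempts):
        resp = requests.post(f"{BASE_URL}/rest/api/1.0/migration/exports",
                             headers=HEADERS, json=selector)
        if resp.ok:
            return resp.json()  # details of the started export job
        # Assumption: a rejection here means the node is already running two
        # exports; wait, then retry and hope to land on another node.
        time.sleep(delay)
        delay = min(delay * 2, 120)  # double the wait, capped at 2 minutes
    raise RuntimeError("export request was rejected on every attempt")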

With only two concurrent exports per cluster node, the maximum number of export chunks that can run in parallel is

2*<NUMBER_OF_CLUSTER_NODES>


Final notes

  • Please be sure to check all referenced documentation pages. They contain valuable information.
  • Using a shell script for the programmatic procedure that splits exports into chunks may not be optimal.
    A more capable scripting tool or language is a better fit.
  • Start export job REST API accepts both project and repository as export selectors.
    You can use multiple selectors to export several specific projects or repositories at once.
    The Exporting page has several examples, and a small selector sketch follows this list.
  • Preview export REST API accepts the same selectors as Start export job.
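
For instance, a single request body can combine a whole-project selector with a single-repository selector. This is a sketch: the wildcard slug follows the examples on the Exporting page, and the project keys and slug below are placeholders.

# One request body can carry several selectors at once; per the notes above,
# the same selectors work for Preview export and Start export job.
selector = {"repositoriesRequest": {"includes": [
    {"projectKey": "PROJECT_1", "slug": "*"},       # every repo in PROJECT_1
    {"projectKey": "PROJECT_2", "slug": "repo-x"},  # one specific repository
]}}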



