"git repack" uses a lot of memory producing "Out of memory: Killed process .... " Linux OOM messages with Bitbucket Data Center


Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.

Note that this KB was created for the Data Center version of the product. Data Center KBs for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.

*Except Fisheye and Crucible

Summary

"Git" processes are regularly started by Bitbucket's Mesh sidecar process or by remote Mesh nodes. Those external "git" processes execute all git-related repository operations, and some of them can allocate large amounts of RAM. The allocated memory can sometimes be so large that Linux's out-of-memory (OOM) killer kills some of the running processes. Which process gets killed depends mainly on how much memory each process has allocated - usually, Linux selects the process occupying the largest part of the RAM. Sometimes that is Bitbucket's Java process, sometimes it is one of the "git" processes. When a Linux OOM condition occurs, the Linux system logs and the output of the dmesg command will show messages like "Out of memory: Killed process .... (...)".

This article describes the case where large amounts of memory are allocated by one or more git repack processes and explains how to limit the memory git repack will use.

Environment

Bitbucket 8.19.0; this article is also applicable to other versions.

Diagnosis

There may be multiple reasons why a Bitbucket setup allocates more RAM than anticipated.
To identify whether git repack is the process using huge amounts of memory, look for the following signs; not all of them will necessarily be present:

  1. Linux out-of-memory events were recorded in system log files, and processes were killed. The killed processes may vary -  Java, git, or something else.

  2. The list of the processes running before the Linux OOM actually killed some of them shows at least one git process that allocated a huge amount of RAM - from several GB to several tens of GB.

    • Take the Linux system logs and dmesg command output and look for the strings "Tasks state (memory values in pages)" and "oom-kill".

    • Between those two strings, you will find the list of running processes.
      Usually, the process list mentions both "git" and "pack-objects" processes. The "RSS" column shows allocated memory; the numbers are given in memory pages, which are 4 KB in size. So, an allocated "RSS" of 5803369 pages is around 22.14 GB.

    • Right below the process list is the information about the killed process.

    • Example:

      [Mon Aug  5 10:58:44 2024] Tasks state (memory values in pages):
      [Mon Aug  5 10:58:44 2024] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name...
      ...
      [Mon Aug  5 10:58:44 2024] [1187772] 23029 1187772     3548        0    69632       54             0 git
      [Mon Aug  5 10:58:44 2024] [1187773] 23029 1187773  7664473  5803369 58511360  1354319             0 git
      [Mon Aug  5 10:58:44 2024] [1193095] 23029 1193095     3548        2    69632       42             0 git
      [Mon Aug  5 10:58:44 2024] [1193096] 23029 1193096     3192        1    69632       58             0 git-http-backen
      [Mon Aug  5 10:58:44 2024] [1193099] 23029 1193099     3985       12    73728      193             0 git
      [Mon Aug  5 10:58:44 2024] [1193110] 23029 1193110      184        4    36864       26             0 pack-objects
      [Mon Aug  5 10:58:44 2024] [1193111] 23029 1193111   151791      588   139264      157             0 git
      [Mon Aug  5 10:58:44 2024] [1194031] 23029 1194031    65470      181    81920        4             0 git
      [Mon Aug  5 10:58:44 2024] [1194034] 23029 1194034      184       27    40960        0             0 pack-objects
      [Mon Aug  5 10:58:44 2024] [1194035] 23029 1194035   816704    27502  3485696      191             0 git
      [Mon Aug  5 10:58:44 2024] [1194707] 23029 1194707   509776      131   487424        0             0 git
      [Mon Aug  5 10:58:44 2024] [1194934] 23029 1194934     3548       44    73728        0             0 git
      [Mon Aug  5 10:58:44 2024] [1194936] 23029 1194936     3192       59    73728        0             0 git-http-backen
      [Mon Aug  5 10:58:44 2024] [1194937] 23029 1194937     4353      224    81920        0             0 git
      [Mon Aug  5 10:58:44 2024] [1194993] 23029 1194993   509776      132   483328        0             0 git
      [Mon Aug  5 10:58:44 2024] [1195085] 23029 1195085   509776      131   471040        0             0 git
      [Mon Aug  5 10:58:44 2024] [1195120] 23029 1195120      184       27    36864        0             0 pack-objects
      [Mon Aug  5 10:58:44 2024] [1195121] 23029 1195121   151962      662   147456        0             0 git
      [Mon Aug  5 10:58:44 2024] [1195436] 23029 1195436     3548       44    77824        0             0 git
      [Mon Aug  5 10:58:44 2024] [1195441] 23029 1195441     3212       75    65536        0             0 git-http-backen
      [Mon Aug  5 10:58:44 2024] [1195444] 23029 1195444     5823      332    81920        0             0 git
      [Mon Aug  5 10:58:44 2024] [1195464] 23029 1195464     3548       45    73728        0             0 git
      [Mon Aug  5 10:58:44 2024] [1195484] 23029 1195484     3192       60    65536        0             0 git-http-backen
      [Mon Aug  5 10:58:44 2024] [1195486] 23029 1195486   300616      265   335872        0             0 git
      [Mon Aug  5 10:58:44 2024] [1195488] 23029 1195488      184       26    40960        0             0 pack-objects
      ...
      [Mon Aug  5 10:58:44 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-23029.slice/user@23029.service,task=git,pid=1187773,uid=23029
      [Mon Aug  5 10:58:44 2024] Out of memory: Killed process 1187773 (git) total-vm:30657892kB, anon-rss:23213476kB, file-rss:0kB, shmem-rss:0kB, UID:23029 pgtables:57140kB oom_score_adj:0
      [Mon Aug  5 10:58:47 2024] oom_reaper: reaped process 1187773 (git), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      [Mon Aug  5 11:01:59 2024] ENFORCEMENT WARNING: [port_table] failed to malloc a new entry
      [Mon Aug  5 11:05:09 2024] agent-linux-amd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
      
      
    • In this example, the "git" process with PID 1187773 was killed ("Killed process 1187773"), and at the time of killing it, had around 22GB RAM allocated ("anon-rss:23213476kB").
    • The existence of the "git" process with that massive amount of RAM allocated, irrespective of whether it was killed or not, is an indication of "git repack" using excessive amounts of memory.
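The pages-to-gigabytes conversion from the example above can be checked with a one-liner. The 4 KB page size is the common Linux default; `getconf PAGESIZE` reports the actual value on your system:

```shell
# Convert an RSS value given in 4 KB pages (from the OOM task dump) to GB.
# 5803369 pages is the large git process from the example above.
awk 'BEGIN { printf "%.2f GB\n", 5803369 * 4 / 1024 / 1024 }'
```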

  3. Even if there are no Linux OOM kills, you may observe unusually large RAM usage, either in the output of the top command or the ps xuaOr command - the latter shows the list of running processes sorted by allocated memory ("RSS").
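An equivalent way to spot the largest memory consumers, using the procps sorting option (a sketch; column layout may vary between ps versions):

```shell
# List the six largest processes by resident memory (RSS, in KB),
# largest first; the header line is included.
ps aux --sort=-rss | head -n 6
```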

  4. Bitbucket's Mesh sidecar log, the file BITBUCKET_HOME/mesh/log/atlassian-mesh.log, contains information about failing "git repack" commands.
    If the Linux out-of-memory killer from point 2 above killed the "git repack" process because it allocated too much RAM, the result will be recorded in the Mesh sidecar's log file as "died of signal 9", for example:

    2024-08-04 12:41:41,000 WARN  [git-gc:thread-2:ds/0/h/ac9f3261b51bc3c92ae1/r/41725] - c.a.b.m.g.g.DefaultGarbageCollectionManager [ac9f3261b51bc3c92ae1-41725] Abandoning garbage collection after 3 failed attempts
    com.atlassian.bitbucket.mesh.git.exception.CommandFailedException: [git repack -A -d -l -n --unpack-unreachable=72.hours.ago] exited with code 137 saying: error: pack-objects died of signal 9
            at com.atlassian.bitbucket.mesh.git.GitProcessCompletionHandler.onError(GitProcessCompletionHandler.java:222)
            at com.atlassian.bitbucket.mesh.git.GitProcessCompletionHandler.onComplete(GitProcessCompletionHandler.java:58)
            at com.atlassian.bitbucket.mesh.process.nu.StdioNuProcessHandler.callCompletionHandler(StdioNuProcessHandler.java:286)
            at com.atlassian.bitbucket.mesh.process.nu.StdioNuProcessHandler.finish(StdioNuProcessHandler.java:308)
            at com.atlassian.bitbucket.mesh.process.nu.StdioNuProcessHandler.onExit(StdioNuProcessHandler.java:100)
    ...
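
To see which repositories are affected, the Mesh log can be searched for the abandoned garbage-collection warnings; the r/<ID> token in those lines (r/41725 in the example above) is the repository ID. A sketch, assuming the default log location:

```shell
# Count abandoned garbage-collection attempts per repository in the Mesh log.
# The r/<ID> token in the WARN line is the repository ID (41725 in the example).
MESH_LOG="$BITBUCKET_HOME/mesh/log/atlassian-mesh.log"
grep 'Abandoning garbage collection' "$MESH_LOG" 2>/dev/null \
  | grep -o 'r/[0-9]*' | sort | uniq -c | sort -rn
```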
    
    

Cause

When the server's memory starts to run out, Linux first tries to use swap space as a virtual-memory extension. When all virtual memory (physical RAM plus swap space) is filled up and running processes still try to allocate more RAM, the Linux out-of-memory killer triggers. It chooses one of the processes to kill in order to reclaim memory - it selects the process whose killing will yield the most memory, and that is usually the process that has allocated the largest part of the RAM.

Bitbucket uses external "git" processes for all git-repository-related operations, and one of those is the git repack command, used to combine loose Git objects into more space-efficient, large "pack" files. For large Git repositories, the "git repack" command can allocate huge amounts of RAM, forcing the Linux OOM killer into action. The result is an unstable system with poor performance.

Usually, the Linux system-log OOM events and the large memory allocations of Git processes correlate with entries in the Mesh sidecar logs, where there may be a number of cases of the "git repack" command crashing while trying to repack loose Git objects.


Having adequate swap space may help avoid Linux OOM kills.

However, writing data to swap greatly impacts the server's performance and is one of the causes of slowdowns and high CPU utilization - while Linux is swapping parts of RAM to disk, almost all processes are stalled. To identify swapping, apart from the swap space filling up, look at the list of running processes with the "top" command - there will be "kswapd" or similar processes in the "D" state, and CPU I/O wait will be high.
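As a quick check, the amount of swap currently in use can be read from /proc/meminfo (a sketch; persistently growing usage here, together with kswapd activity, points to memory pressure):

```shell
# Swap in use = SwapTotal - SwapFree; /proc/meminfo reports both in kB.
awk '/^SwapTotal/ {t=$2} /^SwapFree/ {f=$2} END {printf "swap in use: %d kB\n", t-f}' /proc/meminfo
```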


Solution

The internal "git repack" algorithm may allocate large amounts of RAM when dealing with large Git repositories, but luckily, we can limit the amount of resources it is allowed to use.
There are per-repository Git configuration changes that we can apply to the repositories for which "git repack" uses excessive resources.

Before altering a production system, make sure you have a consistent, up-to-date backup of Git repositories and SQL database data.

Information on backing up Bitbucket is available on the page Data recovery and backups.


The first thing we need is to identify the repositories for which "git repack" uses huge amounts of RAM:

  1. In case of crashed "git repack" processes, we can find information in the BITBUCKET_HOME/mesh/log/atlassian-mesh.log file. In this example the repository ID is 41725:

    2024-08-04 12:41:41,000 WARN  [git-gc:thread-2:ds/0/h/ac9f3261b51bc3c92ae1/r/41725] - c.a.b.m.g.g.DefaultGarbageCollectionManager [ac9f3261b51bc3c92ae1-41725] Abandoning garbage collection after 3 failed attempts
    com.atlassian.bitbucket.mesh.git.exception.CommandFailedException: [git repack -A -d -l -n --unpack-unreachable=72.hours.ago] exited with code 137 saying: error: pack-objects died of signal 9
  2. In the case of still-running "git repack" operations, we can use the top and ps xuaOr commands to get the list of running processes and look for the "git" processes that are still running. We can check either the command line itself or look into the /proc filesystem to see the git process's working directory and open files; the paths to check are /proc/<GIT_PROCESS_PID>/cwd and /proc/<GIT_PROCESS_PID>/fd/.
    Those should be sufficient to determine the path of the Git repository on which "git repack" is operating.
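The /proc lookup can be sketched as follows; the PID used here is this shell's own, purely so the example runs - substitute the PID of the long-running git process from the top/ps output:

```shell
# Substitute PID with the git process ID from top/ps output
# (here: the current shell, for illustration only).
PID=$$
readlink "/proc/$PID/cwd"   # working directory; for git repack, the repository path
ls "/proc/$PID/fd/"         # open file descriptors (pack files being written, etc.)
```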


We will first test the configuration change on a copy of the canonical Git repository!

Step 1: test the solution

First, it is essential to test the configuration we want to implement and verify that the "git repack" process can complete its task. Here is what we recommend doing directly on one of the Bitbucket nodes. Test runs of "git repack" on copies of repositories will add some load to the server, so conduct them during off-peak hours! If you need to alter the configuration of more than one Git repository, it is best to copy and test only one repository at a time, to limit additional disk-space usage.

  1. Navigate to the repository on disk:

    cd $BITBUCKET_HOME/shared/data/repositories
  2. Make a copy of the repository within the same parent location - otherwise, due to the Git configuration files stored, the "git repack" process won’t be the same as the one that Bitbucket Server will run.
    The repository ID in this example is 41725.

    cp -r 41725 41725.copy
  3. Apply the recommended configuration changes:

    cd 41725.copy
    git config pack.threads 8
    git config pack.windowMemory 1g
    • pack.threads limits the number of threads the repack process can use (which has a default of 14).
    • pack.windowMemory limits how much memory each thread can use (which has a default of "unlimited").
    • Lowering both limits will slow down the "git repack" process for that repository, but it allows the process to complete without invoking the Linux OOM killer.

  4. Remove all tmp_pack files older than a few hours from the objects/pack directory. These files may exist as a result of repacks being killed before they could clean up their temporary files.

  5. Enable Git verbose output in the terminal window:

    export GIT_TRACE_PACKET=1
    export GIT_TRACE=1
    export GIT_CURL_VERBOSE=1
     
  6. Run the "git repack" command and monitor the memory usage, for example using the top and ps xuaOr commands:

    git repack -adfln --keep-unreachable --depth=20 --window=200
    • Make sure you collect the output generated by this command. It will be useful if further diagnostics are needed.

    • Since this operation will be long-running, you may wish to run it inside a screen or tmux session.
  7. If this operation completes successfully - without invoking the OOM Killer - you may proceed in the same way with the copies of other repositories that may need tuning.

  8. After completing all tests, you may remove copies of the canonical repositories - directories we used for testing.
    Ensure you are about to remove the copies, not the original directories!
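The test procedure above can be sketched end to end as a small script. To keep the sketch runnable, it targets a throwaway bare repository standing in for $BITBUCKET_HOME/shared/data/repositories/41725 (the ID is illustrative); on a real node you would start from the canonical repository directory instead:

```shell
set -e
# Demonstration setup: a throwaway bare repository standing in for
# $BITBUCKET_HOME/shared/data/repositories/41725 (illustrative ID).
REPOS=$(mktemp -d)
git init --bare -q "$REPOS/41725"
cd "$REPOS"

# Step 2: copy the repository within the same parent location.
cp -r 41725 41725.copy
cd 41725.copy

# Step 3: limit repack's thread count and per-thread window memory.
git config pack.threads 8
git config pack.windowMemory 1g

# Step 4: remove stale temporary pack files older than ~6 hours.
find objects/pack -name 'tmp_pack*' -mmin +360 -delete

# Steps 5-6: enable verbose output, then run the repack.
export GIT_TRACE=1 GIT_TRACE_PACKET=1
git repack -adfln --keep-unreachable --depth=20 --window=200
```

On a production node, monitor the repack with top or ps xuaOr in a second terminal, and remember to remove the .copy directory afterwards.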

Step 2: Apply the solution

If the test above completes the "git repack" successfully with reasonable memory consumption, it is safe to apply the configuration changes to the canonical repositories.
The repository ID in this example is the same: 41725.

  1. Navigate to the repository on disk:

    cd $BITBUCKET_HOME/shared/data/repositories/41725
  2. Apply configuration changes:

    git config pack.threads 8
    git config pack.windowMemory 1g
     
  3. Remove any old tmp_pack files from the objects/pack subdirectory.


  4. Do not manually invoke the "git repack" command on the canonical repositories!
    Allow Bitbucket to do this instead. The next "git push" to the Git repository will trigger the "git repack" process.
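When several repositories need the same tuning, the configuration steps can be applied in a loop. This is a sketch against throwaway bare repositories so it runs as-is; on a real system, REPOS would be $BITBUCKET_HOME/shared/data/repositories and REPO_IDS the IDs identified from the Mesh logs:

```shell
set -e
# Demonstration against throwaway bare repositories; on a real system,
# REPOS is $BITBUCKET_HOME/shared/data/repositories and REPO_IDS the
# repository IDs identified from the Mesh sidecar logs.
REPOS=$(mktemp -d)
REPO_IDS="41725 41726"
for id in $REPO_IDS; do git init --bare -q "$REPOS/$id"; done

for id in $REPO_IDS; do
  cd "$REPOS/$id"
  git config pack.threads 8
  git config pack.windowMemory 1g
  # Clean stale temporary packs left behind by killed repacks.
  find objects/pack -name 'tmp_pack*' -mmin +360 -delete
done
```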


After this modification, "git repack" commands run by Bitbucket on configured Git repositories will be resource-limited and should no longer trigger the Linux out-of-memory killer.

Other notes

Related to this issue of "git repack" using excessive amounts of memory, there is an open enhancement suggestion at BSERV-13611.
You can add yourself as a watcher to that ticket and vote for it.


Last updated: November 27, 2024
