Stash is reaching resource limits

Introduction

This article explains the meaning of the "Stash is reaching resource limits..." banner that you might see on your instance.

It is important to note that Stash might display two different banners:

  • A yellow banner indicating that Stash is queuing requests, or
  • A red banner indicating that requests are actually being rejected.

A yellow banner under heavy load is normal. Do not change any configuration parameters at this stage. Before increasing the configured limits, you should monitor the CPU, memory and/or I/O load on the server and confirm that it is not thrashing. Increasing the ticket limits can actually degrade the instance's performance; this is explained in more detail in the next sections.

Memory budget

When deciding how much memory to allocate for Stash, the most important factor to consider is the amount of memory required by the git operations. Stash's default JVM memory configuration (-Xms256m -Xmx768m) works well even for customers with large instances in our experience, because Stash's heap size has little impact on hosting operations. Most of the memory used by a server running Stash goes to the forked git processes; git operations are very expensive in terms of memory. So allocating more memory to the Stash JVM won't help your instance scale or perform better. The effect can be exactly the opposite: if your server doesn't have enough free memory for the forked git operations because the Stash JVM is using most of it, then as soon as concurrent git operations start, you will see a performance loss because the server cannot fork out more git processes.

OK, I get that I shouldn't tweak Stash's default memory settings in most cases. How should I budget my server memory, then?

It is a simple formula: to budget the memory usage, we need to know how many concurrent hosting tickets Stash allows, and how much memory a single clone uses.

  • Memory usage by a single clone operation

    As a rule of thumb, 1.5 x the repository size on disk (the contents of the .git/objects directory) is a rough estimate of the memory required for a single clone operation, for repositories up to about 400 MB. For larger repositories, memory usage flattens out at about 700 MB. During a single hosting operation, Stash's own memory usage remains roughly constant at about 800 MB (with the default configuration), while Git's memory usage climbs according to the rule just described.
    For a detailed analysis on server's resource usage, please read through Scaling Stash - Clones examined.

  • Number of concurrent operations allowed by Stash

    By default, Stash limits the number of Git operations that can be executed concurrently, in order to prevent performance for all clients from dropping below acceptable levels. These limits can be adjusted – see Stash config properties.
    The parameter that controls this (throttle.resource.scm-hosting) is based on the number of CPUs on your server; its default formula is 1.5 x cpu.

  • Awesome! Now I can calculate how much memory I need on my server to safely run Stash

    That's all you need to know. So, for a common Stash environment with 4 CPUs, the budget would be:
    • Stash: 768 MB
    • Git: 1.5 * (4 CPUs) * 700 MB = 4200 MB
    • Operating system: 1024 MB
    • Total: 5992 MB
    • 40% safety margin: ~8 GB

    A server running this instance should therefore have 8 GB of RAM, assuming it is only running Stash. Please remember that these numbers are just estimates; the actual numbers may vary depending on the I/O, CPU and memory characteristics of the underlying hardware and OS.

  • Please refer to Scaling Stash for more details.
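The budget above can be sketched as a quick calculation. This is only a sketch following the article's rule of thumb; the function name and defaults (700 MB per clone, 40% margin) are illustrative, taken from the example figures above, not an official sizing tool.

```python
# Rough memory budget for a Stash server, following the rule of thumb above.
# All numbers are illustrative estimates, not guarantees.

def stash_memory_budget_mb(cpus, per_clone_mb=700, jvm_mb=768, os_mb=1024,
                           safety_margin=0.4):
    # Default concurrent hosting tickets: throttle.resource.scm-hosting = 1.5 * cpu
    hosting_tickets = int(1.5 * cpus)
    git_mb = hosting_tickets * per_clone_mb
    subtotal = jvm_mb + git_mb + os_mb
    return subtotal, subtotal * (1 + safety_margin)

subtotal, with_margin = stash_memory_budget_mb(4)
print(subtotal)            # 5992 MB, matching the worked example above
print(round(with_margin))  # 8389 MB, i.e. roughly 8 GB of RAM
```

Plugging in a different CPU count shows how quickly the Git share of the budget grows, which is why adding hosting tickets without adding memory is risky.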

Questions and answers

On the UI, the message "Stash is queuing requests" appears, and it is later followed by "Stash is reaching resource limits". What does this mean?

Let's take the example described in the Memory budget section. At stage 1, with 4 CPUs, Stash will allow 6 SCM hosting operations to execute concurrently, forking each of them out as a Git process. If the server receives more than these 6 SCM requests, the extra requests are queued and are not forked out into Git processes, in order to avoid exhausting the server's memory. At stage 2, some of the Git processes initially forked will finish, and Stash takes requests from the SCM queue to fill those slots. At stage 3, Stash forks them out as new Git processes.

It is important to note that if requests are queued up for more than a minute, Stash will display the "Stash is queueing requests..." message.

If requests are queued up for more than 5 minutes (throttle.resource.scm-hosting.timeout), they are rejected and the git clone/fetch/push operation will fail. At this time, the "Stash is reaching resource limits..." message is displayed. The message will disappear when 5 minutes have passed in which no requests have been rejected (throttle.resource.busy.message.timeout or server.busy.on.ticket.rejected.within for Stash 3.0+). These parameters can be adjusted – see Stash config properties.
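The queue-then-reject behaviour described above can be modelled with a counting semaphore, which is essentially how a ticket bucket behaves. The class below is a toy sketch, not Stash's actual implementation; the names and the short timeout are illustrative.

```python
import threading

class TicketBucket:
    """Toy model of an scm-hosting ticket bucket: N concurrent slots;
    callers queue for a ticket and are rejected after a timeout."""

    def __init__(self, tickets, timeout_seconds):
        self._sem = threading.Semaphore(tickets)
        self._timeout = timeout_seconds

    def acquire(self):
        # Queue for up to `timeout_seconds` (throttle.resource.scm-hosting.timeout
        # defaults to 5 minutes); returning False models a rejected request.
        return self._sem.acquire(timeout=self._timeout)

    def release(self):
        self._sem.release()

bucket = TicketBucket(tickets=6, timeout_seconds=0.1)
held = [bucket.acquire() for _ in range(6)]  # all 6 tickets granted at once
print(all(held))         # True
print(bucket.acquire())  # False: the 7th request queues, then is rejected
```

Releasing a ticket before the timeout lets a queued request through instead, which is exactly why faster individual clones reduce rejections.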

Below are examples of the log entries recorded for each type of rejected SCM request.

A [scm-hosting] ticket could not be acquired (0/24)

atlassian-stash.log
2015-08-28 11:41:00,327 WARN [ssh-scm-request-handler] Access key user (drohan@localhost) @16NVUP5x701x17836x333 1z0lim3 10.10.10.122 SSH - git-upload-pack '/alpha/apple.git' c.a.s.i.t.SemaphoreThrottleService A [scm-hosting] ticket could not be acquired (0/24) 
2015-08-28 11:41:00,334 INFO [ssh-scm-request-handler] Access key user (drohan@localhost) @16NVUP5x701x17836x333 1z0lim3 10.10.10.122 SSH - git-upload-pack '/alpha/apple.git' c.a.s.s.t.ThrottledScmRequestFactory A scm-hosting request was denied due to heavy server load. Please see http://docs.atlassian.com/stash/docs-0212/Scaling+Stash for performance guidelines.

A [scm-command] ticket could not be acquired (0/1)

atlassian-stash.log
2015-08-28  11:41:05,327 WARN [http-nio-7990-exec-9] usera @16NVUP5x701x17836x3 0:0:0:0:0:0:0:1 "GET /projects/TEST/repos/test/commits HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService A [scm-command] ticket could not be acquired (0/1)
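To spot rejections without reading the log by hand, lines like the ones above can be counted with a small script. This is a sketch; point it at your own atlassian-stash.log and adjust the pattern if you also want to count scm-command rejections.

```python
import re

# Matches the WARN line Stash logs when a hosting ticket is rejected.
REJECTION = re.compile(r"A \[scm-hosting\] ticket could not be acquired")

def count_rejections(lines):
    # Count log lines where an scm-hosting ticket could not be acquired.
    return sum(1 for line in lines if REJECTION.search(line))

sample = [
    "2015-08-28 11:41:00,327 WARN ... A [scm-hosting] ticket could not be acquired (0/24)",
    "2015-08-28 11:41:05,327 WARN ... A [scm-command] ticket could not be acquired (0/1)",
]
print(count_rejections(sample))  # 1
```

If this count is regularly non-zero, the sizing advice in this article applies to your instance.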

Please refer to the diagrams below for an illustration of these stages.

We recommend reviewing these messages carefully. If requests are being rejected, and this is happening regularly, your instance is probably not sized appropriately.

Actions to take to avoid this problem

As explained above, the secret is to have hosting tickets processed quickly by the system. A small queue is fine, as long as it empties before tickets start being rejected. Below are common mistakes that lead to worse performance. Do not do these:

  • Increasing the amount of memory for the Stash JVM: Don't do that! Usually this allocates a lot of the server's memory to the JVM, leaving little free memory for the Git processes being forked by Stash. This is bad and can have side effects like the one described in git push fails - Out of memory, malloc failed, as the Git processes are choked by the lack of free memory.
  • Increasing the number of hosting tickets by tweaking throttle.resource.scm-hosting: Don't do that! The reason lies in what was previously explained: the system will wait up to 5 minutes (the default throttle.resource.scm-hosting.timeout) for a ticket to free up. Reducing the number of hosting tickets may result in some queuing, but the individual clones tend to be processed faster due to reduced I/O contention, which increases the likelihood that a ticket frees up before the timeout. Conversely, by increasing the number of hosting tickets Stash can hand out, you increase the time your CPUs spend processing forked Git processes, as there will be more of them, which in turn decreases the likelihood of a ticket freeing up before the timeout.

Below are a few actions that should help you get your Stash processing hosting tickets faster. Please consider these:

Continuous integration polling

Reduce the frequency with which your CI servers check out repositories from Stash. This is a common cause reported by customers hitting this issue. If you have a large number of Git operations happening on Stash, you're likely to see the "Stash is reaching resource limits..." message. Make sure your CI servers are not keeping Stash unnecessarily busy, and reduce polling where possible. Also make sure the SCM caching plugin is turned on and up to date, as described in Scaling Stash for Continuous Integration performance.

SSL processing

Running SSL all the way through to Stash is another common problem we see with customers hitting this issue. For better performance, you really need to set up a proxy in front of Stash. The Java SSL stack is nowhere near as efficient as a dedicated proxy such as nginx or Apache, and terminating SSL at the proxy results in much more efficient processing by Stash. Please see Proxying and securing Stash for how to make this change.

Ref advertisement cache

The ref advertisement feature is disabled by default as explained in Scaling Stash for Continuous Integration performance under the Caching section. However, it can produce a noticeable reduction in load when enabled. You shouldn't need to restart Stash to accomplish that; you can change it at any time using the REST API calls detailed in Enabling and disabling caching.

Server resources

It is important to verify the following:

  • The amount of memory
  • The number of CPUs

Budget your server accordingly. Refer to the Memory budget section as well as Scaling Stash for more details on how to do this.

  • Plugins: As explained, this issue has a lot to do with processing. Make sure you don't have any plugins affecting your performance. We've had experience with Awesome Graphs: it is a nice plugin, but the indexing it performs is CPU- and I/O-intensive and can dramatically affect the performance of a Stash instance. If your system is already under heavy load, we advise you to disable all user-installed plugins for a period of observation. Follow the instructions in Set UPM to safe mode.
  • Processing: One strategy here is to add more CPUs to the machine while keeping the system's current default concurrency limit. Say your system has 4 CPUs, so the current number of SCM hosting tickets is 6. By adding more CPUs while keeping the concurrency limit where it was, those 6 hosting tickets are processed faster, queue time for additional requests is reduced, and the flow of tickets improves. The server thus keeps the same capacity for hosting operations while improving overall performance.

You can keep the current concurrency limit by adding the following configuration to your stash-config.properties (create one if it doesn't exist, and restart Stash so it loads the new file):

# Limits the number of SCM hosting operations, meaning pushes and pulls over HTTP or SSH, which may be running concurrently.
# This is intended primarily to prevent pulls, which can be very memory-intensive, from pinning a server's resources.
# There is limited support for mathematical expressions; +,-,*,/ and () are supported. You can also use the 'cpu'
# variable which is resolved to the number of cpus that are available.
throttle.resource.scm-hosting=6


How to monitor SCM hosting ticket usage without thread dump analysis

There are two ways to monitor Stash's ticket counts, depending on what exactly you want from your monitoring.

The first approach is to write a plugin. Stash raises TicketAcquiredEvent, TicketReleasedEvent and TicketRejectedEvent at all the times you'd expect events with those names to be raised. Each event includes the bucket the ticket came from. You'd want to monitor the "scm-hosting" ones; "scm-command" ones are used for trivial operations like git diff and git rev-list and are never held for more than a couple of seconds at a time.

If you choose to go down this route, please take care to do as little processing as possible in your @EventListener method. If you need to do a lot of processing, please create an ExecutorService of your own to do it on. Tying up Stash's event threads can result in strange behavior.

The second approach is to add the following line to stash-config.properties and restart Stash:

logging.logger.com.atlassian.stash.internal.throttle.SemaphoreThrottleService=TRACE

This will enable trace logging specifically for the ticketing system. With that in place you'll start seeing messages like these:

2014-04-07 22:48:32,492 TRACE [http-bio-7990-exec-3] bturner 1368x89x0 qt8gqa 192.168.30.1 "POST /scm/AT/stash.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Acquired [scm-hosting] ticket (5/6)

2014-04-07 22:48:46,266 TRACE [http-bio-7990-exec-3] bturner 1368x89x0 qt8gqa 192.168.30.1 "POST /scm/AT/stash.git/git-receive-pack HTTP/1.1" c.a.s.i.t.SemaphoreThrottleService Released [scm-hosting] ticket (6/6)

The count at the end shows (available/total). In other words, the first line shows that one ticket is in use (not five) out of 6, and the second shows that all 6 tickets are available again. You'll also see lines for "scm-command" tickets, but those counts are unlikely to ever matter much; we've never had a customer run into issues with "scm-command" tickets, because their lifecycle is so brief.
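With TRACE logging enabled, the (available/total) counts at the end of each line can be extracted to chart ticket usage over time. This is a sketch assuming the log format shown above; adapt the pattern if your log layout differs.

```python
import re

# Matches the tail of the TRACE lines shown above, capturing the
# action and the (available/total) counts for scm-hosting tickets.
TICKET_LINE = re.compile(
    r"(Acquired|Released) \[scm-hosting\] ticket \((\d+)/(\d+)\)")

def parse_ticket_counts(lines):
    # Yield (action, available, total) for each scm-hosting TRACE line.
    for line in lines:
        m = TICKET_LINE.search(line)
        if m:
            yield m.group(1), int(m.group(2)), int(m.group(3))

sample = [
    "... SemaphoreThrottleService Acquired [scm-hosting] ticket (5/6)",
    "... SemaphoreThrottleService Released [scm-hosting] ticket (6/6)",
]
for action, available, total in parse_ticket_counts(sample):
    print(action, total - available, "in use")
# Acquired 1 in use
# Released 0 in use
```

Plotting "in use" against the timestamps gives a simple picture of how close the instance runs to its ticket limit.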

If you can't identify the cause of the performance problem

Please open an issue with Atlassian Support. Make sure of the following:

  • You downloaded the logparser tool and ran generate-access-logs-graph.sh on your Stash atlassian-stash-access*.log files, as per the instructions on its page. Attach the graphs to the issue. If you can't run the tool, that's OK; we will run it for you based on your support zip.
  • Debug logging is enabled on the instance.
  • Profile logging is enabled on the instance: this is very important, because it lets us see which operations are taking a long time to complete.
  • You generated a support zip (Admin > Atlassian Support Tools > Create) after the problem occurred. Attach it to the support ticket, and make sure the "Limit File Sizes" option is not selected.


Last updated: November 2, 2018
