Scaling Stash

ハードウェア要件

The type of hardware you require to run Stash depends on a number of factors:

ユーザーの数。
The size of your repositories. On large repositories, many operations require more memory and more CPUs.
The frequency of clone operations. Cloning a repository is one of the most demanding operations. One major source of clone operations is continuous integration. When your CI builds involve multiple parallel stages, Stash will be asked to perform multiple clones concurrently, putting significant load on your system.

The following are rough guidelines for choosing your hardware:

Estimate the number of concurrent clones that are expected to happen regularly (look at continuous integration). Add one CPU for every 2 concurrent clone operations.
Estimate or calculate the average repository size and allocate 1.5 x number of concurrent clone operations x min(repository size, 700MB) of memory.

Note that Stash does not currently support virtualised environments.

このページの内容

関連ページ

Using Stash in the enterprise

Understanding Stash's resource usage

Most of the things you do in Stash involve both the Stash server and one or more Git processes created by Stash. For instance, when you view a file in the Stash web application, Stash processes the incoming request, performs permission checks, creates a Git process to retrieve the file contents and formats the resulting webpage. In serving most pages, both the Stash server and Git processes are involved. The same is true for the 'hosting' operations: pushing your commits to Stash, cloning a repository from Stash or fetching the latest changes from Stash.

As a result, when configuring Stash for performance, CPU and memory consumption for both Stash and Git should be taken into account.

Stash basic flow

メモリ

When deciding on how much memory to allocate for Stash, the most important factor to consider is the amount of memory required for Git. Some Git operations are fairly expensive in terms of memory consumption, most notably the initial push of a large repository to Stash and cloning large repositories from Stash. For large repositories, it is not uncommon for Git to use up to 500 MB of memory during the clone process. The numbers vary from repository to repository, but as a rule of thumb 1.5 x the repository size on disk (contents of the .git/objects directory) is a rough estimate of the required memory for a single clone operation for repositories up to 400 MB. For larger repositories, memory usage flattens out at about 700 MB.

The clone operation is the most memory intensive Git operation. Most other Git operations, such as viewing file history, file contents and commit lists are lightweight by comparison.

Stash has been designed to have fairly constant memory usage. Any pages that could show large amounts of data (e.g. viewing the source of a multi-megabyte file) perform incremental loading or have hard limits in place to prevent Stash from holding on to large amounts of memory at any time. In general, the default memory settings (max. 768 MB) should be sufficient to run Stash. The maximum amount of memory available to Stash can be configured in setenv.sh or setenv.bat.

The memory consumption of Git is not managed by the memory settings in setenv.sh or setenv.bat. The Git processes are executed outside of the Java virtual machine, and as a result the JVM memory settings do not apply to Git.

CPU

In Stash, much of the heavy lifting is delegated to Git. As a result, when deciding on the required hardware to run Stash, the CPU usage of the Git processes is the most important factor to consider. And, as is the case for memory usage, cloning large repositories is the most CPU intensive Git operation. When you clone a repository, Git on the server side will create a pack file (a compressed file containing all the commits and file versions in the repository) that is sent to the client. While preparing a pack file, CPU usage will go up to 100% for one CPU.

For users that connect to Stash using SSH, the encryption of data adds to overall CPU usage. For day-to-day push and pull operations the overhead will not be significant, but when cloning repositories the overhead will be noticeable.

To get the maximum performance from Stash, we advise configuring automatic build tools to use the http or https protocol, if possible.

Clones examined

Since cloning a repository is the most demanding operation in terms of CPU and memory, it is worthwhile analyzing the clone operation a bit closer. The following graphs show the CPU and memory usage of a clone of a 220 MB repository:

Git process (blue line)

CPU usage goes up to 100% while the pack file is created on the server side.
CPU peaks at 120% when the pack file is compressed (multiple CPUs used).
CPU drops back to 0.5% while the pack file is sent back to the client.

Stash (red line)

CPU usage briefly peaks at 30% while the clone request is processed.
CPU drops back to 0% while Git prepares the pack file.
CPU hovers around 1% while the pack file is sent to the client.

Git process (blue line)

Memory usage slowly climbs to 270 MB while preparing the pack file.
Memory stays at 270 MB while the pack file is transmitted to the client.
Memory drops back to 0 when the pack file transmission is complete.

Stash (red line)

Memory usage hovers around 800 MB and is not affected by the clone operation.

This graph shows how concurrency affects average response times for clones:

Vertical axis: average response times.
Horizontal axis: number of concurrent clone operations.

The measurements for this graph were done on a 4 CPU server with 12 GB of memory.

Response times become exponentially worse as the number of concurrent clone operations

exceed the number of CPUs.

Configuring Stash scaling options and system properties

Stash only allows a fixed number of Git commands to be executed concurrently, to prevent the performance for all clients dropping below acceptable levels. Stash has two settings to control the number of Git processes that are allowed to process in parallel: one for the web UI and one for the 'hosting' operations (pushing and pulling commits and cloning a repository).

The settings can be overridden by creating a stash-config.properties in STASH_HOME with the following content:

STASH_HOME/stash-config.properties

# The maximum number of concurrent requests using git commands using the UI or REST services (e.g. git diff via the UI). Default value is 25.
throttle.resource.scm-command=20
 
# Controls how long threads will wait for SCM commands to complete when the system is already running the maximum number of SCM commands. Value is in seconds. Default is 2 seconds.
throttle.resource.scm-command.timeout=2
 
# The maximum number of concurrent requests using "hosting" commands, git clone, git push, git pull. Default value is 1.5*cpu (1.5 times the number of available cpus/cores).
throttle.resource.scm-hosting=20
 
# Controls how long threads will wait for SCM commands to complete when the system is already running the maximum number of SCM commands. Value is in seconds. Default is 300 seconds (5 minutes).
throttle.resource.scm-hosting.timeout=300

What happens when the limits are reached?

For the given resource, the request will wait until a currently running request has completed. If no request completes within a configurable timeout, the request will be rejected.

When requests while accessing the Stash UI are rejected, users will see either a 501 error page indicating the server is under load, or a popup indicating part of the current page failed.

When git client commands (pull/push/clone) are rejected, Stash does a number of things:

Stash will return an error message to the client which the user will see on the command line: "Stash is currently under heavy load and is not able to service your request. Please wait briefly and try your request again"
A warning message will be logged for every time a request is rejected due to the resource limits.
For five minutes after a request is rejected, Stash will display a red banner in the UI to warn that the server is under load.

Child pages