Infrastructure recommendations for enterprise Bitbucket instances on AWS
Knowing your load profile is useful for planning your instance's growth, looking for inflated metrics, or simply keeping it at a reasonable size. In Bitbucket Data Center load profiles, we showed you some simple guidelines for finding out if your instance was Small, Medium, Large, or XLarge. We based these size profiles on Server and Data Center case studies, covering varying infrastructure sizes and configurations.
If your instance is close to outgrowing its size profile, it may be time to consider upgrading your infrastructure. In most cases, upgrading involves deciding how to deploy your Bitbucket Server Data Center application, NFS, and database nodes. However, it's not always clear how to do that effectively.
To help you, we ran a series of performance tests against a typical Large-sized Bitbucket Server Data Center instance. We designed these tests to deliver useful, data-driven recommendations for your deployment's application and database nodes. These recommendations can help you plan a suitable environment, or check whether your current instance is adequate for the size of your content and traffic.
Note that large repositories might influence performance.
We advise that you monitor performance on a regular basis.
We ran all tests in AWS. This allowed us to easily define and automate multiple tests, giving us a large (and fairly reliable) sample.
Each part of our test infrastructure was provisioned from a standard AWS component available to all AWS users. This allows for easy deployment of recommended configurations. You can also use AWS Quick Starts for deploying Bitbucket Server Data Center.
It also means you can look up specifications in AWS documentation. This helps you find equivalent components and configurations if your organization prefers a different cloud platform or bespoke clustered solution.
To effectively benchmark Bitbucket on a wide range of configurations, we designed tests that could be easily set up and replicated. Accordingly, when referencing our benchmarks for your production environment, consider:
We didn't install apps on our test instances, as we focused on finding the right configurations for the core product. When designing your infrastructure, you need to account for the impact of apps you want to install.
We used RDS with default settings across all tests. This allowed us to get consistent results with minimal setup and tuning.
Our test environment used dedicated AWS infrastructure hosted on the same subnet. This helped minimize network latency.
We used an internal testing tool called Trikit to simulate the influx of git packets. This gave us the ability to measure git request speeds without having to measure client-side git performance. It also meant our tests didn’t unpack git refs, as the tool only receives and decrypts git data.
The performance (response times) of git operations will be affected largely by repository size. Our test repositories averaged 14.2MB in size. We presume that bigger repositories might require stronger hardware.
Due to limitations in AWS, we initialized EBS volumes (storage blocks) on the NFS servers before starting the test. Without disk initializations, there is a significant increase in disk latency, and test infrastructure slows for several hours.
We enabled analytics on each test instance to collect usage data. For more information, see Collecting analytics for Bitbucket Server.
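The EBS initialization mentioned above amounts to reading every block of a volume once, so that later reads don't pay the first-access penalty. AWS documents doing this with tools such as dd or fio; the Python below is only a minimal sketch of the same idea, not the procedure used in these tests. The function name and the 1 MiB block size are assumptions made for illustration.

```python
def read_all_blocks(path: str, block_size: int = 1024 * 1024) -> int:
    """Sequentially read every block of `path` and return the bytes read.

    `path` would normally be a block device (hypothetical example:
    /dev/xvdf), but any readable file works, which makes the function
    easy to try out safely.
    """
    total = 0
    # buffering=0 gives an unbuffered raw reader, so each read() call
    # goes straight to the underlying file or device.
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:  # empty bytes object means end of device/file
                break
            total += len(chunk)
    return total
```

On a real volume this would be run once per device before a test starts (with sufficient privileges to read the device), and it can take a long time on large volumes.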
Each test involved applying the same amount of traffic to a Bitbucket data set, but on a different AWS environment. We ran three series of tests, each designed to find optimal configurations for the following components:
Bitbucket application node
Database node
NFS node
To help ensure benchmark reliability, we initialized the EBS volumes and tested each configuration for three hours. We observed stable response times throughout each test. All tests used Bitbucket Data Center 5.16, which was the latest version at the time. We used a custom library running v1 protocol to simulate Git traffic.
We created a Large-sized Bitbucket Data Center instance with the following dimensions:
Traffic (git operations per hour)
Content and traffic profiles are based on Bitbucket Data Center load profiles, which put the instance’s overall load profile at the highest level of Large profile. We believe these metrics represent a majority of real-life, Large-sized Bitbucket Data Center instances.
More details about data set dimensions
Projects (including personal)
Comments on pull requests
Total pull requests
Pull requests open
Pull requests merged
(git operations per hour)
We used the following benchmark metrics for our tests.
Git throughput, or the number of git hosting operations (fetch/clone/push) per hour
32,700 (Minimum), the higher the better
This threshold is the upper limit of Large traffic defined in Bitbucket Data Center load profiles. We chose this limit due to the spiky nature of git traffic.
Average CPU utilization (for application nodes)
75% (Maximum), the lower the better
When the application nodes reach an average CPU usage of 75% or above, Bitbucket's adaptive throttling starts queuing Git hosting operations to keep the application responsive for interactive users. This slows down Git operations.
No nodes go offline
When the infrastructure can't handle the load, nodes may crash.
The test traffic had fixed sleep times to modulate the volume of git hosting operations. This means the benchmarked git throughput doesn’t represent the maximum each configuration can handle.
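The three benchmark criteria above can be combined into a single pass/fail check. The sketch below is one way to express that. The class and function names are hypothetical, and the thresholds are taken directly from the metrics listed above (32,700 git operations per hour minimum, average CPU below 75%, no nodes offline).

```python
from dataclasses import dataclass

# Thresholds as stated in the benchmark metrics above.
MIN_GIT_OPS_PER_HOUR = 32_700
MAX_AVG_CPU_PERCENT = 75.0


@dataclass
class BenchmarkResult:
    git_ops_per_hour: int     # fetch/clone/push operations per hour
    avg_cpu_percent: float    # average CPU across application nodes
    nodes_offline: int        # nodes that went offline during the test


def passes_thresholds(result: BenchmarkResult) -> bool:
    """Return True only if all three benchmark criteria are met."""
    return (
        result.git_ops_per_hour >= MIN_GIT_OPS_PER_HOUR
        # Throttling starts at 75% and above, so the limit is exclusive.
        and result.avg_cpu_percent < MAX_AVG_CPU_PERCENT
        and result.nodes_offline == 0
    )
```

For example, `passes_thresholds(BenchmarkResult(45_844, 45.0, 0))` evaluates a configuration with 45,844 operations per hour at 45% average CPU and no node failures.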
We tested each configuration on a freshly-deployed Bitbucket Server Data Center instance on AWS. Every configuration followed the same structure:
Virtual machine type
When testing m5.xlarge (16GB of RAM), we used 8GB for JVM heap. For all others, we used 12GB for JVM heap. Minimum heap (Xms) was set to 1G for all the tests.
If you don't have a large number of third-party plugins, a smaller JVM heap (2-3GB) is enough.
Also note that Git operations are expensive in terms of memory consumption and are executed outside of the Java virtual machine. See more on Scaling Bitbucket Server.
Each Bitbucket application used 30GB General Purpose SSD (gp2) for local storage. This disk had an attached EBS volume with a baseline of 100 IOPS, burstable to 3,000 IOPS.
We used Amazon RDS PostgreSQL version 9.4.15, with default settings. Each test only featured one node.
Our NFS server used a 900GB General Purpose SSD (gp2) for storage. This disk had an attached EBS volume with a baseline of 2700 IOPS, burstable to 3,000 IOPS. As mentioned, we initialized this volume at the start of each test.
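The baseline IOPS figures quoted for the two gp2 volumes follow from how AWS sizes gp2 performance: 3 IOPS per GiB of volume size, with a floor of 100 IOPS and a ceiling of 16,000 IOPS. The sketch below reproduces that arithmetic. The function name is hypothetical, and it treats the sizes quoted in this article (GB) as GiB, which is an assumption.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB,
    never below 100 and never above 16,000."""
    return min(max(100, 3 * size_gib), 16_000)


# The two volume sizes used in the tests described above.
app_node_iops = gp2_baseline_iops(30)    # local storage on each application node
nfs_node_iops = gp2_baseline_iops(900)   # NFS server storage
```

This is why the 30GB application volume sits at the 100 IOPS floor while the 900GB NFS volume gets a 2,700 IOPS baseline.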
We used AWS Elastic Load Balancer. At the time of performance testing, Application Load Balancer didn't handle SSH traffic.
We ran several case studies of real-life Large-sized Bitbucket Data Center instances to find optimal configurations for each component. In particular, we found many used m5 series virtual machine types (General Purpose Instances). As such, for the application node, we focused on benchmarking different series' configurations.
Refer to the AWS documentation on Instance Types (specifically, General Purpose Instances) for details on each virtual machine type used in our tests.
Recommendations and results for large-sized instances
We analyzed our benchmarks and came up with the following optimal configuration:
Best-performing and most cost-effective configuration
m5.4xlarge nodes x 4
Performance of this configuration
Git throughput: 45,844 per hour
Cost per hour 1: $4.168
Average CPU utilization: 45%
1 In our recommendations for Large-sized profiles, we quoted a cost per hour for each configuration. We provide this information to help inform you about the comparative price of each configuration. This cost only calculates the price of the nodes used for the Bitbucket application, database, and NFS nodes. It does not include the cost of using other components of the application like shared home and application load balancer.
These figures are in USD, and were correct as of February 2019.
We measured performance stability in terms of how far the instance’s average CPU utilization is from the 75% threshold. As mentioned, once an instance hits this threshold, git operations start to slow down. The further below 75% the instance runs, the less likely it is to slow down during sudden traffic spikes.
However, there are no disadvantages in using larger-size hardware (m5.12xlarge, for example), which will provide better performance.
We also found a low-cost configuration with acceptable performance at $3.044 per hour:
m5.4xlarge x 3
This low-cost configuration delivered a lower git throughput (43,099 git hosting calls per hour) than the optimal configuration. However, this is still above our minimum threshold of 32,700 git hosting calls per hour. The trade-off for the lower price is fault tolerance. If the instance loses one application node, CPU usage spikes to 85%, which is above our maximum threshold. The instance will survive, but performance will suffer.
More details about our recommendations
The following table shows all test configurations that passed our thresholds, that is, at least 32,700 git hosting operations per hour and below 75% CPU utilization, with no node crashes. We sorted the configurations by descending throughput.
m5.4xlarge x 6
m5.12xlarge x 2
m5.4xlarge x 4
m5.2xlarge x 8
m5.4xlarge x 3
m5.4xlarge x 3
m5.2xlarge x 6
m5.4xlarge x 3
m5.4xlarge x 3
As you can see, the configuration m5.4xlarge x 4 nodes for the application doesn’t provide the highest git throughput. However, configurations with higher throughput cost more and provide only marginal performance gains.
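One rough way to compare cost-effectiveness is cost per thousand git hosting operations. The sketch below applies this to the two configurations whose cost and throughput are both quoted in this article (the optimal one at $4.168 per hour and 45,844 operations per hour, and the low-cost one at $3.044 per hour and 43,099 operations per hour). The metric and the function name are illustrative assumptions, not something the benchmark itself reported.

```python
def cost_per_thousand_ops(cost_per_hour_usd: float, git_ops_per_hour: int) -> float:
    """USD spent per 1,000 git hosting operations at the benchmarked load."""
    if git_ops_per_hour <= 0:
        raise ValueError("git_ops_per_hour must be positive")
    return cost_per_hour_usd / git_ops_per_hour * 1000


# Figures quoted in this article (USD, as of February 2019).
optimal = cost_per_thousand_ops(4.168, 45_844)    # m5.4xlarge x 4
low_cost = cost_per_thousand_ops(3.044, 43_099)   # m5.4xlarge x 3
```

Because the test traffic was throttled with fixed sleep times, these ratios describe the benchmarked load only, not the maximum each configuration could sustain. They also ignore fault tolerance, which is the main reason to prefer the four-node setup.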
Our first test series focused on finding out which AWS virtual machine types to use (and how many) for the application node. For these tests, we used a single db.m4.4xlarge node for the database and single m4.4xlarge node for the NFS server.
Benchmarks show the best git throughput came from using m5.4xlarge (16 CPUs) and m5.12xlarge nodes (48 CPUs). You will need at least three nodes for m5.4xlarge and two nodes for m5.12xlarge.
More details about these test results
CPU is underutilized at 30% for the following application node configurations:
m5.4xlarge x 6
m5.12xlarge x 2
This demonstrates both configurations are overprovisioned. It would be more cost-effective to use three or four m5.4xlarge nodes for the application.
However, on the three-node m5.4xlarge set-up, the CPU usage would be at ~85% if one of the nodes failed. For this reason, we recommend the four-node m5.4xlarge set-up for better fault tolerance.
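The fault-tolerance reasoning above can be approximated with simple arithmetic: if load is spread evenly and CPU scales linearly with load, losing a node pushes each remaining node's CPU up by the ratio of original to remaining nodes. The sketch below encodes that simplified model. Both the linear-scaling assumption and the function name are illustrative, and the model is not claimed to reproduce the ~85% figure reported for the three-node setup.

```python
def cpu_after_node_loss(avg_cpu_percent: float, node_count: int,
                        nodes_lost: int = 1) -> float:
    """Estimate average CPU on the surviving nodes, assuming the same
    total load is redistributed evenly and CPU scales linearly."""
    remaining = node_count - nodes_lost
    if remaining <= 0:
        raise ValueError("at least one node must remain")
    return avg_cpu_percent * node_count / remaining


# Four m5.4xlarge nodes at the benchmarked 45% average CPU.
four_node_estimate = cpu_after_node_loss(45.0, 4)
```

Under this model the four-node configuration would stay below the 75% throttling threshold after losing one node, which is consistent with the recommendation above.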
Database node test results
From the application node test series, we found using three m5.4xlarge nodes for the application yielded optimal performance (even if it wasn’t the most fault tolerant). For our second test series, we tested this configuration against the following virtual machine types for the database:
As expected, the more powerful the virtual machine, the better the performance. We saw the biggest gains in CPU utilization. Git throughput also improved, but only marginally.
More details about these test results
Only db.m5.large failed the CPU utilization threshold. All other tested virtual machine types are acceptable, although db.m5.xlarge is pretty close to our CPU utilization threshold at 60%.
NFS node test results
In previous tests (where we benchmarked different application and database node configurations), we used m5.4xlarge for the NFS node (NFS protocol v3). During each of those tests, NFS node CPU remained highly underutilized at under 18%. We ran further tests to see if we could downgrade the NFS server (and, by extension, find more cost-effective recommendations). Results showed identical git throughput, using the downsized m5.2xlarge NFS node. This led to our low-cost recommendation.
More details about these test results
m5.4xlarge x 3
As mentioned, this recommendation costs $3.044 per hour but offers lower fault tolerance.
Based on other test results, we recommend using at least m5.2xlarge for the NFS node, with more than 1,500 IOPS.
Disk I/O performance is often a limiting factor, so we also paid attention to disk utilization. Our tests revealed the disk specifications we used for the NFS node were appropriate to our traffic:
900GB General Purpose SSD (gp2) for storage
Baseline of 2700 IOPS
Burstable to 3,000 IOPS.
As mentioned, we initialized this volume at the start of each test.
Please be aware this information is only a guideline, as IOPS requirements will depend on usage patterns.
More details about these tests
The table below shows the I/O impact of our tests on the NFS node’s disk:
Total throughput (Read + Write throughput)
Average queue length
Average read latency
Average write latency