Performance in EBS is widely misunderstood, leading many to write off EBS as a choice between “slow” and “expensive”. While there is some truth to that, I often see EBS implementations that fail to take advantage of cheap or even free options to boost performance.
This article focuses specifically on GP2 volumes, which are SSD-backed, low cost, and the default in many regions. As a result, they are the most widely deployed volume type by far. Let’s take a look at the documented performance characteristics of a GP2 volume:
Max IOPS per volume: 16,000
Max throughput per volume: 250MB/s
Max Burstable IOPS: 3,000 (volumes under 1,000GB)
At first glance, these look like great numbers, but they don’t tell the full story. The actual maximum IOPS for a specific volume depends on its size: 3 IOPS per GB, up to the 16,000 IOPS ceiling. That means only volumes of 5.3TB or larger can reach the maximum. And even when a GP2 volume is large enough to hit 16,000 IOPS, you still need to contend with the throughput limits.
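The size-to-IOPS rule can be written out directly. This is a minimal sketch that assumes the figures quoted in this article: 3 IOPS per GB, the 100 IOPS minimum baseline for very small volumes, and the 16,000 IOPS ceiling. Sizes are treated as whole GB, as they are elsewhere in the article, and the function names are hypothetical.

```python
import math

IOPS_PER_GB = 3      # GP2 baseline rate: 3 IOPS per GB of volume size
MIN_IOPS = 100       # floor that applies to very small volumes
MAX_IOPS = 16_000    # per-volume GP2 ceiling


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS for a GP2 volume of size_gb (whole GB)."""
    return min(max(IOPS_PER_GB * size_gb, MIN_IOPS), MAX_IOPS)


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest whole-GB GP2 volume whose baseline reaches target_iops."""
    if target_iops > MAX_IOPS:
        raise ValueError("a single GP2 volume tops out at 16,000 IOPS")
    if target_iops <= MIN_IOPS:
        return 1  # the 100 IOPS floor already covers this target
    return math.ceil(target_iops / IOPS_PER_GB)


print(gp2_baseline_iops(100))       # 300
print(gp2_baseline_iops(8_000))     # 16000 (capped at the ceiling)
print(gp2_size_for_iops(MAX_IOPS))  # 5334, i.e. roughly 5.3TB
```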
Baseline IOPS and Burst Limits
First, let’s look at smaller volumes, which often suffer from easily avoidable performance problems. Consider a 100GB GP2 volume. At that size, the volume has a baseline performance of 300 IOPS (3 x 100GB), comparable to a 7200RPM SATA drive. However, it can burst up to 3,000 IOPS using a leaky bucket credit system. Unfortunately, the bucket is small and replenishes slowly. At 100GB, our volume can run at 3,000 IOPS for approximately 30 minutes before depleting the bucket, but then needs about 5 hours with no I/O to refill it.
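The burst and refill times for the 100GB example follow from simple bucket arithmetic. This sketch assumes a full bucket holds 5.4 million I/O credits. That is the figure AWS documents for GP2, but this article does not state it. The sketch also assumes that credits keep accruing at the baseline rate while the volume bursts, so the bucket drains at 3,000 IOPS minus the baseline.

```python
import math

BUCKET_CREDITS = 5_400_000  # assumption: full GP2 credit bucket, per AWS docs
BURST_IOPS = 3_000          # burst ceiling for volumes under 1,000GB


def gp2_baseline_iops(size_gb: int) -> int:
    return min(max(3 * size_gb, 100), 16_000)


def burst_minutes(size_gb: int) -> float:
    """How long a full bucket sustains 3,000 IOPS."""
    base = gp2_baseline_iops(size_gb)
    if base >= BURST_IOPS:
        return math.inf  # baseline already meets the burst rate; no bucket needed
    # Credits are spent at 3,000/s but still earned at the baseline rate.
    return BUCKET_CREDITS / (BURST_IOPS - base) / 60


def refill_hours(size_gb: int) -> float:
    """Time to refill an empty bucket while the volume does no I/O at all."""
    return BUCKET_CREDITS / gp2_baseline_iops(size_gb) / 3600


print(f"{burst_minutes(100):.1f} minutes of burst")  # 33.3
print(f"{refill_hours(100):.1f} hours to refill")     # 5.0
```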
As a rule of thumb, the share of time a volume under 1,000GB can spend bursting at the full 3,000 IOPS is roughly its baseline divided by 3,000, which works out to about 10% for our 100GB volume. Over the long run, your average utilization cannot exceed the 300 IOPS baseline, or your rate of depletion will exceed the rate of replenishment. This is difficult to manage because something as benign as a noisy log file can keep your credit balance persistently low, and an isolated burst can steal performance for hours.
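A small leaky-bucket simulation shows why the long-run average is pinned to the baseline. This is a minimal sketch, not AWS’s actual accounting. It assumes the 5.4-million-credit bucket from the AWS documentation, starts with the bucket full, and advances in one-minute steps. The workload is hypothetical: a steady 1,000 IOPS demand on the 100GB volume for one week.

```python
BUCKET_CREDITS = 5_400_000  # assumption: full GP2 bucket, per AWS docs
BURST_IOPS = 3_000


def gp2_baseline_iops(size_gb: int) -> int:
    return min(max(3 * size_gb, 100), 16_000)


def simulate(size_gb: int, demand_iops: list[float], step_s: int = 60) -> list[float]:
    """Return the IOPS actually delivered in each step of demand_iops."""
    base = gp2_baseline_iops(size_gb)
    peak = max(BURST_IOPS, base)
    credits = float(BUCKET_CREDITS)  # start with a full bucket
    delivered = []
    for want in demand_iops:
        # In each step we can spend the stored credits plus the credits earned
        # during the step, but never faster than the peak rate.
        budget = min(peak * step_s, credits + base * step_s)
        ios = min(want * step_s, budget)
        credits = min(BUCKET_CREDITS, credits + base * step_s - ios)
        delivered.append(ios / step_s)
    return delivered


week = [1_000.0] * (7 * 24 * 60)  # hypothetical steady demand, one-minute steps
rates = simulate(100, week)
average = sum(rates) / len(rates)
print(f"first minute: {rates[0]:.0f} IOPS, last minute: {rates[-1]:.0f} IOPS")
print(f"weekly average: {average:.1f} IOPS")
```

Every I/O above the baseline in that week comes from the one bucket the volume started with. A longer window therefore pushes the average even closer to 300.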
Credit depletion becomes even more problematic with very small volumes such as instance root volumes. Small volumes not only have lower baselines but also slower rates of replenishment. Simple access logging will wipe out a small boot volume’s credits under even a moderate workload, as will a swap file. The minimum baseline, which applies to any volume of 33GB or less, is 100 IOPS, and that won’t get you far.
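The following sketch puts numbers on the small-volume problem. It compares a hypothetical 8GB root volume with the 100GB volume from earlier, both under the same hypothetical background load of 250 IOPS, such as logging or swap. It again assumes the 5.4-million-credit bucket from the AWS documentation.

```python
import math

BUCKET_CREDITS = 5_400_000  # assumption: full GP2 bucket, per AWS docs
BURST_IOPS = 3_000


def gp2_baseline_iops(size_gb: int) -> int:
    return min(max(3 * size_gb, 100), 16_000)


def hours_until_empty(size_gb: int, steady_iops: float) -> float:
    """Hours for a steady load to drain a full bucket (inf if it never does)."""
    base = gp2_baseline_iops(size_gb)
    drain = min(steady_iops, BURST_IOPS) - base
    if drain <= 0:
        return math.inf  # load fits within the baseline; credits never fall
    return BUCKET_CREDITS / drain / 3600


def refill_hours(size_gb: int) -> float:
    """Hours to refill an empty bucket with no I/O."""
    return BUCKET_CREDITS / gp2_baseline_iops(size_gb) / 3600


for size in (8, 100):
    print(f"{size}GB: {hours_until_empty(size, 250)} h to drain, "
          f"{refill_hours(size)} h to refill")
```

The 8GB volume drains in 10 hours and then delivers only 100 IOPS. Refilling it takes 15 hours of idle time. The 100GB volume absorbs the same load within its 300 IOPS baseline and never touches its credits.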
IOPS is a tricky metric, and you typically care more about throughput: the speed at which data moves on and off the volume. The problem is that assessing expected throughput is (surprise!) complicated. GP2 volumes have a throughput limit based on size, ranging from 128MB/s for small volumes up to 250MB/s for volumes of 334GB or larger.
Like IOPS, these numbers are limits, not expected results. Throughput is a function of IOPS, but it is also heavily influenced by the exact size of your I/O operations, the overall instance throughput, and other factors. For volumes under 334GB, baseline throughput falls linearly with size (3 IOPS per GB at up to 256KB per I/O), and the throughput limit drops as low as 128MB/s. Anything above the baseline depends on your available burst credits. As with IOPS, don’t size for throughput based on the burst.
Let’s consider a 100GB document repository with 16KB files. To meet user demand, you would like to achieve the maximum throughput of 250MB/s. Doing so would require 16,000 IOPS (250MB/s * 1024KB / 16KB), meaning you would need a 5.3TB volume. If you instead sized your volume at 100GB, your throughput limit would be 128MB/s, but your actual throughput would be limited by IOPS: a blazing 4.7MB/s at the 300 IOPS baseline (16KB * 300 IOPS / 1024KB).
Now let’s look at another 100GB repository, this time of 256KB files. Again, you would like to achieve maximum throughput. Doing so would require only 1,000 IOPS (250MB/s * 1024KB / 256KB), meaning you would need a 334GB volume. That’s far smaller than the 5.3TB volume needed to meet the I/O demands of the smaller files.
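Both repository examples use the same arithmetic. Required IOPS is the target throughput divided by the I/O size, and the GP2 volume size is that IOPS figure divided by 3. This sketch makes the same assumptions as the examples: each file is read in a single I/O equal to its size (so file size stands in for I/O size, up to 256KB), and 1MB = 1024KB. The helper names are hypothetical.

```python
import math

IOPS_PER_GB = 3
MAX_THROUGHPUT_MB_S = 250  # GP2 per-volume throughput ceiling


def required_iops(target_mb_s: float, io_kb: int) -> int:
    """IOPS needed to move target_mb_s when every I/O is io_kb in size."""
    return math.ceil(target_mb_s * 1024 / io_kb)


def gp2_size_for_iops(iops: int) -> int:
    """Smallest whole-GB GP2 volume whose 3 IOPS/GB baseline covers iops."""
    return max(1, math.ceil(iops / IOPS_PER_GB))


def iops_limited_mb_s(iops: int, io_kb: int) -> float:
    """Throughput at a given IOPS and I/O size, capped at 250MB/s.

    Smaller volumes have a lower cap, which this sketch ignores.
    """
    return min(iops * io_kb / 1024, MAX_THROUGHPUT_MB_S)


for io_kb in (16, 256):
    iops = required_iops(MAX_THROUGHPUT_MB_S, io_kb)
    print(f"{io_kb}KB I/O: {iops} IOPS -> {gp2_size_for_iops(iops)}GB volume")

print(f"100GB volume, 16KB I/O: {iops_limited_mb_s(300, 16):.1f}MB/s at baseline")
```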
With GP2, the maximum size of a single I/O operation is 256KB, so larger files require additional I/O operations. For example, reading any one file between 257KB and 512KB in size requires two I/O operations. As a result, a 257KB file consumes twice the IOPS of a 256KB file, a painful margin.
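Counting I/O operations per file is a ceiling division by the 256KB maximum. This is a simplification. It assumes each file is read sequentially in the largest I/Os GP2 allows, with no merging across files. `io_ops_for_file` is a hypothetical helper.

```python
import math

MAX_IO_KB = 256  # largest single GP2 I/O operation


def io_ops_for_file(size_kb: int) -> int:
    """I/O operations needed to read one file of size_kb kilobytes."""
    return max(1, math.ceil(size_kb / MAX_IO_KB))


for size in (16, 256, 257, 512, 513):
    print(f"{size}KB file -> {io_ops_for_file(size)} I/O operation(s)")
```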
Leave some margin for error and be sure to test your workload. The critical takeaway here is that you cannot look at just desired IOPS or desired throughput. Either can constrain you depending on your exact workload, even though their values are correlated.
This article is about GP2, and Provisioned IOPS (PIOPS, or the io1 volume type) is technically a different type of storage. However, PIOPS volumes are commonly, and often unnecessarily, used to achieve better performance than GP2 provides.
PIOPS let you pay for the exact IOPS you’d like for a volume. But few AWS services will break your budget faster than PIOPS. I would guess that 75% of the PIOPS volumes I’ve seen could have achieved the needed throughput for significantly less spend. This is because PIOPS are priced on both the size of the volume ($.125/GB per month, compared to $.1/GB for GP2) and the IOPS you reserve ($.065/IOPS per month). Those are small numbers, but they add up fast.
Let’s suppose you need to store 500GB of data and require 4,000 IOPS. With GP2, you would need to create a 1.33TB volume (4,000 IOPS / 3 IOPS/GB) at a cost of roughly $133/month. With PIOPS, you’d pay $322/month ($.125/GB * 500GB + $.065/IOPS * 4,000 IOPS) for the same performance. In other words, just making your volume bigger gets you the result you want for less. It may feel wrong to make your volume more than 2.5x larger than it needs to be, but it probably feels better than paying nearly 2.5x more for the same performance (and less storage!).
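The comparison is easy to reproduce. This sketch assumes the per-month prices quoted above ($.10/GB for GP2, and $.125/GB plus $.065/IOPS for io1). Actual prices vary by region and over time. The GP2 volume is sized as the larger of the data size and the size needed to reach the target IOPS.

```python
import math

GP2_PER_GB = 0.10     # $/GB-month (price quoted in this article)
IO1_PER_GB = 0.125    # $/GB-month
IO1_PER_IOPS = 0.065  # $/provisioned IOPS-month


def gp2_monthly_cost(data_gb: int, iops: int) -> float:
    if iops > 16_000:
        raise ValueError("a single GP2 volume tops out at 16,000 IOPS")
    # Grow the volume until 3 IOPS/GB covers the target.
    size_gb = max(data_gb, math.ceil(iops / 3))
    return size_gb * GP2_PER_GB


def io1_monthly_cost(data_gb: int, iops: int) -> float:
    # io1 bills storage and provisioned IOPS separately.
    return data_gb * IO1_PER_GB + iops * IO1_PER_IOPS


gp2 = gp2_monthly_cost(500, 4_000)
io1 = io1_monthly_cost(500, 4_000)
print(f"GP2: ${gp2:.2f}/month, io1: ${io1:.2f}/month, ratio {io1 / gp2:.1f}x")
```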
Provisioned IOPS do deliver a slightly more consistent experience, with a stated target of delivering within 10% of the provisioned IOPS 99.9% of the time, compared to 99% for GP2 over the same period. PIOPS also offer much higher maximum IOPS and throughput than GP2. Generally speaking, though, if you need 16,000 IOPS or fewer, the most economical way to get them is to size a GP2 volume at 3 IOPS/GB (up to 5.3TB for the full 16,000), regardless of the amount of data you need to store. By contrast, 16,000 provisioned IOPS cost a minimum of $1,040/month ($.065 * 16,000 IOPS) before storage.
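The “bigger GP2 volume” rule holds across the whole GP2 range. At the prices quoted in this article, an IOPS bought through GP2 capacity costs about half as much as a provisioned one, even before io1’s storage charge. The sketch below makes that comparison under the same price assumptions.

```python
import math

GP2_PER_GB = 0.10     # $/GB-month (price quoted in this article)
IO1_PER_IOPS = 0.065  # $/provisioned IOPS-month
IOPS_PER_GB = 3
GP2_MAX_IOPS = 16_000

# Each GB of GP2 buys 3 baseline IOPS, so one IOPS costs a third of a GB-month.
gp2_per_iops = GP2_PER_GB / IOPS_PER_GB

# Full 16,000 IOPS: a GP2 volume sized to reach it vs. io1's IOPS charge alone.
gp2_full = math.ceil(GP2_MAX_IOPS / IOPS_PER_GB) * GP2_PER_GB
io1_floor = GP2_MAX_IOPS * IO1_PER_IOPS

print(f"per IOPS-month: GP2 ${gp2_per_iops:.4f} vs io1 ${IO1_PER_IOPS:.4f}")
print(f"16,000 IOPS: GP2 ${gp2_full:.2f} vs io1 at least ${io1_floor:.2f}")
```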
Size volumes for both performance and capacity, oversizing volumes to achieve desired IOPS economically.
Consider RAID 0 to put together multiple volumes for increased IOPS and throughput, but be mindful of overall instance limits.
Replace older instance families and very old GP2 volumes to take advantage of EBS-optimized instances and improvements to GP2 volumes.
Use Nitro instances for high-I/O workloads.
Use CloudWatch to figure out why your performance isn’t meeting expectations; it may be constrained by the instance or other factors.
Don’t count on bursting for volumes under 1TB. Expect actual IOPS to be at or near the baseline.
Consider ST1 when you’re working with larger files.
If you absolutely have to use swap or page files, place them on their own volume.
This is not an exhaustive treatment of performance optimization in EBS; it focuses heavily on the difficulties we see people run into with typical I/O workloads. If you need to run extremely I/O-heavy workloads, AWS has a great, comprehensive deep dive available.