AWS Rapid Cost Savings Checklist
Cash is king right now for many companies. We’re working with every single one of our clients to identify AWS cost savings opportunities. Spend management in AWS is key: in many companies, whole teams are dedicated just to this. This post is not intended to be a comprehensive guide to cost savings. Rather, we are sharing our rapid triage checklist. When we start working with a new client, these are the quick-win items we look for, and they often make a huge difference with little effort or risk.
If you have already picked the low-hanging fruit and want to improve your maturity over time, follow Corey Quinn. Most of the conventional wisdom about AWS cost savings just leaves teams chasing their own tails. Not so with QuinnyPig.
EBS is fertile ground for cost savings.
- Delete unused volumes. Unused volumes tend to pile up over time as instances are torn down and one or more volumes get inadvertently left behind. Take out the trash on your unused volumes. If you have volumes you need to keep, ask whether they really need to be full volumes or if they can be converted to snapshots.
- Delete unneeded snapshots. Snapshot retention is often forgotten about, mostly because it used to be a pain. Tools like AWS Data Lifecycle Manager and Rackspace EBS Snapper make it easy–just set some tags on your instances and you’re done. For now, just focus on bulk deleting any obvious waste and worry about fine tuning your retention later.
- Question io1 volumes (PIOPS) every time. We see a lot of io1 volumes at new clients. Less than 10% actually need the luxuriously expensive PIOPS. Reasons for using them tend to include “things were slow, so we just threw a lot of IO at it” and “we did the math on exactly how many IOPS we needed”. Use the CloudWatch Metrics tab to see how many IOPS you’re actually using. If you aren’t scraping the roof of your PIOPS count, reduce it immediately. If you can, reduce it until you notice performance start to degrade at peak load, then add a little more back. Once you know your actual needs, make sure you wouldn’t be better off with a larger GP2 volume. See our post on understanding GP2 performance for more on that.
- Review your 10 largest volumes. Are they over-provisioned? You can grow volumes at any time, so don’t defensively add significantly more space than you need. Check the volume type, too. The st1 and sc1 volume types can often save a ton of money and are easy to migrate to. Consult the AWS documentation on EBS Volume Types. That said, in rapid triage mode we only recommend changing volume types if you can get significant savings.
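The EBS checks above can be roughed out in a few lines of Python. This is a minimal sketch, not a tool: it assumes you have already fetched volume records shaped like the entries in the EC2 DescribeVolumes response (for example with boto3's `describe_volumes`) and read a peak IOPS figure off CloudWatch yourself. The sample volume IDs, the peak value, and the function names are all hypothetical. The gp2 baseline used here is 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000.

```python
# Hypothetical sample shaped like entries from the EC2 DescribeVolumes response.
SAMPLE_VOLUMES = [
    {"VolumeId": "vol-aaa", "State": "in-use", "VolumeType": "io1", "Size": 500, "Iops": 10000},
    {"VolumeId": "vol-bbb", "State": "available", "VolumeType": "gp2", "Size": 200},
    {"VolumeId": "vol-ccc", "State": "available", "VolumeType": "gp2", "Size": 50},
]


def unattached_volumes(volumes):
    """Return volumes in the 'available' state, meaning attached to nothing."""
    return [v for v in volumes if v.get("State") == "available"]


def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, never below 100 or above 16,000."""
    return min(max(3 * size_gib, 100), 16000)


def unused_piops_fraction(provisioned_iops, peak_observed_iops):
    """Share of provisioned IOPS that was never touched, even at peak."""
    unused = max(provisioned_iops - peak_observed_iops, 0)
    return unused / provisioned_iops


if __name__ == "__main__":
    for vol in unattached_volumes(SAMPLE_VOLUMES):
        print(f"{vol['VolumeId']}: unattached, {vol['Size']} GiB")

    io1 = SAMPLE_VOLUMES[0]
    peak = 1200  # assumed peak IOPS read off CloudWatch for this volume
    print(f"{io1['VolumeId']}: {unused_piops_fraction(io1['Iops'], peak):.0%} of PIOPS unused")
    print(f"gp2 at the same size would give {gp2_baseline_iops(io1['Size'])} baseline IOPS")
```

If the gp2 baseline at the current size already clears your observed peak, that volume is a candidate for the io1 question above. Treat the result as a prompt for a closer look rather than a verdict, since this sketch ignores throughput limits and burst behavior.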
EC2 is where people spend most of their time and energy, but the savings opportunities can often be much harder to realize in practice than theory.
- Upgrade instance type families. As AWS rolls out new instance type families, cost savings are typically baked in. For example, a t3 instance is typically 10% cheaper than the corresponding t2 instance. Upgrading can often be as simple as shutting down the instance, changing the instance type, and powering it back up. Sometimes OS compatibility can be an issue, so testing is recommended. Still, this can be an easy win for at least some of your instances.
- Turn on detailed CloudWatch metrics. Adding cost can seem counterintuitive, but good visibility lets you make better, data-driven decisions about how to size your instances.
- Review your 10 most expensive instances. Your detailed AWS bill can let you quickly find those. Make sure they aren’t over-provisioned. Often these instances are beefed up for data processing and other types of periodic jobs. Do they need to be running 24×7? Have they seen peak load in the last 30 days? And does the instance family make sense? For example, if you have a RAM-heavy job but you’re using the c5 instance family, m5 or r5 may make more sense.
- Consider Graviton processors. If you’re running an app with a runtime (e.g., Java), the AWS Graviton instances are often at least as fast and much cheaper than their Intel peers. And often, you can simply switch over to them with no configuration changes whatsoever.
- Schedule non-production instances. It’s easier than you think to turn your resources on/off overnight. The AWS Instance Scheduler fully automates the process, and all you have to do is set a tag on the instances. You can always turn them back on if you need them in a pinch. If you don’t like the AWS option, there are dozens of open source schedulers out there with different bells and whistles.
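To get a feel for what scheduling non-production instances is worth, the arithmetic is simple enough to sketch. The example below assumes a fixed weekly schedule and a made-up hourly rate; the function names and the 12-hours-by-5-days schedule are illustrative, not something the AWS Instance Scheduler prescribes.

```python
HOURS_PER_WEEK = 24 * 7


def scheduled_hours_per_week(hours_per_day, days_per_week):
    """Hours an instance actually runs each week under a fixed schedule."""
    if not 0 <= hours_per_day <= 24 or not 0 <= days_per_week <= 7:
        raise ValueError("schedule must fit inside a 24x7 week")
    return hours_per_day * days_per_week


def schedule_savings_fraction(hours_per_day, days_per_week):
    """Fraction of compute hours avoided versus running 24x7."""
    running = scheduled_hours_per_week(hours_per_day, days_per_week)
    return 1 - running / HOURS_PER_WEEK


def weekly_cost(hourly_rate, hours_per_day=24, days_per_week=7):
    """Weekly compute cost for one instance on the given schedule."""
    return hourly_rate * scheduled_hours_per_week(hours_per_day, days_per_week)


if __name__ == "__main__":
    rate = 0.10  # hypothetical hourly rate in USD, not a real AWS price
    always_on = weekly_cost(rate)
    office_hours = weekly_cost(rate, hours_per_day=12, days_per_week=5)
    print(f"24x7: ${always_on:.2f}/week, 12x5: ${office_hours:.2f}/week")
    print(f"hours avoided: {schedule_savings_fraction(12, 5):.0%}")
```

This only counts instance hours. Attached EBS volumes keep billing while an instance is stopped, so the real saving on the total bill will be somewhat lower than the fraction printed here.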
S3 is cheap. Except when it’s not. Clients often fail to consider just how much S3 costs can add up, particularly since it can creep slowly over time.
- Use the Metrics tab to see how much data is in each bucket. Prioritize your most expensive buckets–do you actually need this data? And do you need it on demand? Infrequent access and Glacier are great options for long-term storage. Use Lifecycle Rules to transition your old data if appropriate.
- Check your usage of Infrequent Access and Glacier. If your average object size is under 128 KB, Infrequent Access will cost you more than the standard storage tier. Similarly, Glacier needs at least a 2 MB average object size before you can break even. If data is costing you more in those tiers, remove the lifecycle policies and use S3 Batch Operations to move it back into the standard storage tier (just be mindful that Batch Operations cost money in several ways).
- Use Lifecycle Rules to automate cleanup. Don’t rely on scheduled tasks to purge old data, particularly backups.
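As a rough illustration of the lifecycle advice above, here is one way to build a lifecycle configuration as plain Python data. The dictionary is shaped like the `LifecycleConfiguration` argument that boto3's `put_bucket_lifecycle_configuration` accepts, but nothing here calls AWS. The rule names, the `backups/` prefix, and the day counts are hypothetical. The size filter keeps small objects out of Infrequent Access, in line with the 128 KB point above, and expiration lives in a separate rule so that small objects still get cleaned up.

```python
import json


def build_lifecycle_config(name, prefix, ia_after_days, expire_after_days,
                           min_ia_size_bytes=128 * 1024):
    """Build two rules: move larger objects to Standard-IA, then expire everything."""
    if ia_after_days < 30:
        # S3 does not allow a Standard-IA transition before 30 days.
        raise ValueError("Standard-IA transition needs at least 30 days")
    if expire_after_days <= ia_after_days:
        raise ValueError("expiration should come after the transition")

    transition_rule = {
        "ID": f"{name}-to-ia",
        "Status": "Enabled",
        # Only objects under the prefix AND larger than the size threshold.
        "Filter": {"And": {"Prefix": prefix,
                           "ObjectSizeGreaterThan": min_ia_size_bytes}},
        "Transitions": [{"Days": ia_after_days, "StorageClass": "STANDARD_IA"}],
    }
    expiration_rule = {
        "ID": f"{name}-expire",
        "Status": "Enabled",
        # No size filter here, so small objects are purged too.
        "Filter": {"Prefix": prefix},
        "Expiration": {"Days": expire_after_days},
    }
    return {"Rules": [transition_rule, expiration_rule]}


if __name__ == "__main__":
    config = build_lifecycle_config("backups", "backups/", 30, 365)
    print(json.dumps(config, indent=2))
```

If you apply something like this with boto3, keep in mind that `put_bucket_lifecycle_configuration` replaces the bucket's existing lifecycle configuration, so merge in any rules you want to keep before sending it.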
General Savings Checklist
There are also overall tactics that can move the needle more than anything else.
- Use AWS Native Cost Management Tools. Look at Cost Explorer. Use AWS Budgets. Review the Trusted Advisor reports and use the “Exclude” feature to reduce the noise they generate.
- Prune unused, orphaned or abandoned resources regularly. Every AWS account has stuff that needs to be taken out with the trash. Don’t pay to keep it around. ELBs, EIPs, RDS snapshots, and S3 buckets are serial offenders on this list, as teams tear down the EC2 instances and leave all of the surrounding periphery untouched.
- Use Savings Plans and Reserved Instances. Savings Plans are your cost savings machete. Reserved Instances are your surgical knife. If you don’t have a lot of time to devote to managing RIs, Savings Plans are probably for you. Figure out the least you would possibly spend in AWS over a year and put a Savings Plan in place. But be mindful of future projects that might reduce your spend below that threshold and balance your risk appropriately.
- Use Reservations everywhere, not just EC2. RDS, Redshift, and ElastiCache all support reservations. Use them.
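- Sell Reservations you aren’t using. They have more value than you think. Note that resale goes through the Reserved Instance Marketplace, which covers EC2 Standard Reserved Instances.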
- Use cost allocation tags, if appropriate. If you have multiple teams in a single AWS account, cost allocation tags can create accountability by clearly showing each team’s spend. Prioritize cost allocation tags for your biggest areas of spend. Don’t get bogged down trying to get 100% coverage.
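The "least you would possibly spend" idea behind a Savings Plan commitment can be sketched as a floor over historical hourly spend, since Savings Plans are committed in dollars per hour. The numbers below are invented, the function names are ours, and the 10% safety margin is an arbitrary placeholder for the future-project risk mentioned above. The sketch also assumes the input is already limited to spend a Savings Plan can cover.

```python
def commitment_floor(hourly_spend, safety_margin=0.10):
    """Lowest observed hourly spend, reduced by a safety margin."""
    if not hourly_spend:
        raise ValueError("need at least one hourly data point")
    if not 0 <= safety_margin < 1:
        raise ValueError("safety_margin must be in [0, 1)")
    return min(hourly_spend) * (1 - safety_margin)


def hours_below(hourly_spend, commitment):
    """Count hours where spend fell under the commitment (paid for but unused)."""
    return sum(1 for spend in hourly_spend if spend < commitment)


if __name__ == "__main__":
    # Hypothetical hourly spend in USD; real data would span months, not six hours.
    history = [12.0, 9.5, 8.0, 15.2, 8.4, 11.1]
    floor = commitment_floor(history)
    print(f"candidate commitment: ${floor:.2f}/hour")
    print(f"hours under that commitment: {hours_below(history, floor)}")
    print(f"hours under a $9.00/hour commitment: {hours_below(history, 9.0)}")
```

If the history is expressed in on-demand dollars, the floor overstates the commitment needed to cover that usage, because the commitment is consumed at discounted Savings Plan rates. Use it as a conservative starting point and check it against the recommendations in Cost Explorer.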