Terraform On a New Cloud – Getting Started
When I started using Terraform and went looking for examples, I found plenty of code showing how to create individual resources or groups of them, but not much on how to structure the code for a whole organization. As a software engineer, I had seen many great examples of how to structure software projects, along with design patterns and principles for writing extensible but DRY code. Infrastructure projects bring in a variety of new concerns, and because infrastructure as code is so new, it lacks the tried-and-tested philosophies that come with maturity. I endeavor here to lay out what we’ve learned from Terraforming dozens of organizations.
This will be a multi-part series, first laying out the basic philosophy and structure of our Terraform repositories, then giving examples of importing existing resources, delineating services into “projects,” and adding the automation and safety checks that come with operational maturity. This is the first piece of the series and simply explains that philosophy and structure.
So you think you wanna Terraform?
So, your organization has adopted Terraform and you’ve begun the process of defining all the cloud infrastructure in code. You’ve read the Terraform best practices guide, checked out the modules on the Terraform registry and are intimately familiar with the Terraform docs. Where do you begin? How should you structure all this code; should it be in a single repository or distributed among multiple repositories? How do you handle shared resources among different teams, particularly when your infrastructure spans multiple accounts? Making all of an organization’s infrastructure open source is dangerous, and although it would be helpful for a new DevOps engineer like yourself, it’s not something that’s done. Fortunately, we at Rhythmic Tech have experience Terraforming many organizations on a variety of clouds and can provide some pointers.
I’m going to preface this blog post with a warning: Infrastructure as Code is not (yet) an exact science. As with everything in life, there are decisions to be made, trade-offs to be weighed and risks to be taken. The fact that you’re spending your time reading a blog about Terraform tells me that you’re probably an intellectual who embraces and even enjoys these challenges. Your own experience gives you knowledge and opinions that only you can bring to bear on them. I invite you to challenge me here, point for point, and submit your findings as issues here.
Why Infrastructure as Code?
Usage of HashiCorp’s configuration language, HCL, more than doubled between 2017 and 2018, as measured by GitHub contributors, and if you like being able to create the same infrastructure across multiple environments (or even clients), it pays to get familiar with this tool. As I see it, there are a few guiding principles that dictate why this should be a common practice in your organization:
One of the biggest payoffs for DevOps comes from being able to triage situations that have gone sour. The payoff does not come from being able to place blame on someone or something else; that should never be the purpose of triage, as it is rarely constructive and usually destructive. When your infrastructure is tagged so that you can find the code that created it, you can see the circumstances under which the event occurred.
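
As a rough sketch of that kind of tagging, assuming the AWS provider (version 3.38 or later, which supports default_tags), every resource can carry a pointer back to the code that created it. The tag keys, repository name and project name below are hypothetical placeholders, not a convention taken from this post.

```hcl
# Sketch only: tag everything this configuration creates with a pointer
# back to its source. All tag keys and values below are hypothetical.
provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = {
      managed_by = "terraform"
      repository = "example-org/example-account-terraform" # hypothetical repository
      project    = "network"                               # hypothetical project directory
    }
  }
}

# This VPC inherits the default tags above; its own tags are merged in.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "main"
  }
}
```

With tags like these, someone looking at a misbehaving resource in the console can go straight to the repository and project that define it.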
“Sharing is caring” has been drilled into our generation since kindergarten, so it should be self-evident that sharing is, as a general rule, a good thing. The success of open-source software should make anyone who doesn’t automatically consider sharing a good thing think twice about that assumption.
We have several clients that have a production environment and want to create identical demo environments, or they have an environment on the West Coast and want to migrate it to the East Coast. If your Terraform is written right, this is as easy as changing a variable and running terraform apply, but if you created the environment by hand, you’re out of luck!
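
A minimal sketch of what “written right” might look like, assuming AWS and a hypothetical variable named region: nothing in the configuration hardcodes a region or availability zone, so the same code can be pointed somewhere else.

```hcl
# Sketch only: the region is the single knob that moves the environment.
variable "region" {
  type        = string
  description = "AWS region to build this environment in"
  default     = "us-west-2"
}

provider "aws" {
  region = var.region
}

# Look up zones at plan time instead of hardcoding names like "us-west-2a".
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# Two /24 subnets carved out of the VPC range, one per zone.
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = data.aws_availability_zones.available.names[count.index]
}
```

Running terraform apply -var="region=us-east-1" would then build the same layout on the East Coast. In practice you would likely give the new environment its own state so that it is created alongside the old one rather than replacing it.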
By making infrastructure changes in Terraform, we can apply to our infrastructure the same code review process that’s worked so well for software engineers. Infrastructure changes no longer have to be made through an email to your sysadmin/wizard; they can be made as collaborative pull requests that undergo automated tests.
Where to begin?
Before defining your organization as code, there are a few architectural decisions to make and trade-offs to weigh. It’s common practice to have multiple accounts in an organization, each with a different purpose. If each of these accounts is a nearly identical copy of the others, it may make sense to manage all your accounts as different Terraform workspaces in a single repository. This minimizes both the drift between the accounts and the amount of duplicate code, but in the same stroke limits how much you can customize each account. If you use the same branch for all of your accounts, you also lose any isolation between them, which means changes to your development account code risk affecting your production account.
You also have to deal with the problem of the Terraform backend: Do you have a single backend for all these accounts, and therefore a super-user that has access to all of them, or do you have a different backend for each account? There’s no one-size-fits-all solution here, but because our customers vary widely (both from one customer to the next and between accounts within the same customer), we solve this by having a separate repository and a separate backend for each account. This does mean that we do a bit of copy-and-paste between different account repositories, but cooking copypasta is far easier than untangling spaghetti code.
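
To make the one-backend-per-account idea concrete, here is a minimal sketch assuming the S3 backend. The bucket name and key are hypothetical, and this is one plausible layout rather than the exact configuration we use.

```hcl
# Sketch only: this block lives in the development account's repository.
# The production repository would carry its own copy pointing at a bucket
# inside the production account. Bucket name and key are hypothetical.
terraform {
  backend "s3" {
    bucket  = "example-dev-account-tfstate"
    key     = "account/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}
```

Because each repository talks only to a bucket in its own account, the credentials used to run Terraform against development never need access to production state.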
Fortunately, there is less debate on what the structure of that repo should look like. Terraform refreshes its “state” on every plan and apply, so if you have a lot of resources defined in one place, you’re going to be waiting a while as Terraform makes lots of API calls. It also means there’s more chance for an errant resource to get altered while you’re hacking away in some unrelated part of your infrastructure. I’ve seen the scope of these unintended changes, caused by too many resources being defined in one project, referred to as the “blast radius.” Splitting the repository into smaller projects, each with its own state, keeps both the wait and the blast radius small. You can get a good starter template by cloning this sample-aws-project, which has tooling to cut down on monotonous tasks.
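
One way to keep projects small and still let them share resources is sketched below, assuming the S3 backend and two hypothetical projects named network and app. It also assumes the network project declares an output called vpc_id. The sample project's actual layout may differ.

```hcl
# Sketch only: the "app" project keeps its own small state file.
terraform {
  backend "s3" {
    bucket = "example-account-tfstate" # hypothetical bucket
    key    = "app/terraform.tfstate"   # one state key per project
    region = "us-east-1"
  }
}

# Read-only view of the outputs from the hypothetical "network" project.
# Nothing in that project can be changed from here.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-account-tfstate"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Assumes the network project exports an output named "vpc_id".
resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

A plan in the app project only refreshes the app resources, and a mistake there cannot alter the VPC, because the VPC lives in a different state file.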
We’ve got our Terraform repository, access to the accounts we need, and the green light to define our infrastructure as code! How do we do that? We want to define what already lives in our environments as code, but we also want to take care not to break any of it. There’s more than one way to do this, including tools like Terraforming. Such a tool can be a good place to start, but once you’ve got an idea of which Terraform resources need to be imported and how to import them, don’t waste time getting everything perfect.
The dirty secret is that your resource definitions don’t need to be perfect before you import the resources. If you import them and there’s a discrepancy between what’s defined in Terraform and what’s in the cloud, then simply running terraform plan will show you every attribute that differs. Add the changes from your plan to your resource definition, rinse and repeat. As long as you don’t run terraform apply and type “yes” when asked whether you’d like to apply the changes, you don’t have to worry about breaking anything that’s currently in your environment.
If you can’t get something just right but want to leave it defined in code, don’t be a stranger to Terraform’s lifecycle block, where you can use the ignore_changes list to ignore discrepancies between your code and your existing infrastructure. Just make sure to document why you’re doing this so that anyone who comes after you isn’t left scratching their head!
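
The import loop described above might look something like this for a single hand-built EC2 instance. The resource name, instance ID and AMI ID are hypothetical; the point is the shape of the definition and the documented ignore_changes entry.

```hcl
# Sketch only. Bring the existing instance under management with:
#   terraform import aws_instance.legacy_app i-0123456789abcdef0
# then run `terraform plan` and copy the reported differences into this
# block until the plan comes back clean. Do not apply along the way.
resource "aws_instance" "legacy_app" {
  ami           = "ami-0123456789abcdef0" # hypothetical; use the value the plan reports
  instance_type = "t3.micro"

  lifecycle {
    # Why: this instance was built by hand and its user_data cannot be
    # reproduced exactly, so we do not want plan to propose changing it.
    ignore_changes = [user_data]
  }
}
```

The comment inside the lifecycle block is the documentation: it travels with the exception, so whoever finds it later knows it was a deliberate choice.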