Delivering high-quality Terraform code is something we are proud of doing at sysdogs on a daily basis. Over the past years, we have gained a lot of knowledge and experience doing that for customers all over the world and across a variety of industries, doing our best to support other teams with high-quality infrastructure. This article is intended as a comprehensive list of bullet points pinning down things that should be avoided. It is a completely blameless post, though: treat it as a checklist of things to keep in mind while writing Terraform code.
- Use stable releases. Always. And wait for fixes before upgrading.
- Combine tools. Do not try to Terraform everything.
- Do not treat examples as well-defined solutions.
- Automate. Do not plan or apply manually.
- Do not do things manually. Never.
- Test additive and fresh-start changes.
- Separate modules wisely.
- Use tools to statically test code, resource quality and security.
- Use functions appropriately.
- Do not overkill modules with variables.
- Manage provider versions.
Use an early-bird release
Three years ago, we were building cloud infrastructure with Terraform 0.11. We waited literally years for Terraform 0.12, which brought for loops.
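To illustrate what 0.12 made possible, here is a minimal sketch of for expressions. The environment names and the naming scheme are hypothetical and only serve the example.

```hcl
locals {
  # Hypothetical list of environments, used only for illustration.
  environments = ["dev", "staging", "prod"]

  # List form: transform every element of a collection.
  environment_labels = [for env in local.environments : upper(env)]

  # Map form: build a map keyed by environment name.
  log_bucket_names = {
    for env in local.environments : env => "example-${env}-logs"
  }
}

output "log_bucket_names" {
  value = local.log_bucket_names
}
```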
Not everything is Terraform
Sadly, engineers still try to Terraform things that should not be Terraformed. Very complex configuration and bootstrap scripts for virtual machines are not what Terraform is designed for. It is important to marry Terraform with other Infrastructure as Code software designed for this task, such as Ansible (see On Code Quality with Ansible in the references), Salt or Chef.
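One possible division of labour, sketched under assumptions: Terraform creates the machines and exposes their addresses, and a configuration management tool takes over from there. The `ami_id` variable, the instance count and the output name below are hypothetical.

```hcl
variable "ami_id" {
  description = "Hypothetical AMI to launch; supplied by the caller."
  type        = string
}

# Terraform is responsible only for the existence of the machines.
resource "aws_instance" "app" {
  count         = 2
  ami           = var.ami_id
  instance_type = "t3.micro"
}

# The addresses are exposed so that another tool can configure the hosts.
output "app_private_ips" {
  description = "Private IPs of the application hosts."
  value       = aws_instance.app[*].private_ip
}
```

The addresses can then be read with `terraform output -json app_private_ips` and fed into an Ansible inventory, so the bootstrap logic lives in playbooks rather than in provisioners or long user-data scripts.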
Example as production
Terraform documentation has pretty nice examples of resource usage in the Examples section. We have noticed that many engineers treat this section as production-ready, ready-to-go code. Unfortunately, those examples are mostly insecure and present a very basic implementation which is, in most cases, not the best one, both technically and from the perspective of scale.
Example of the resource documentation
resource "aws_iam_role" "example" {
  name = "example"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "vpc-flow-logs.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}
Example of Terraform resource reference from official docs
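# Trust policy expressed as a data source instead of an inline JSON heredoc.
data "aws_iam_policy_document" "policy_assume_role" {
  statement {
    effect = "Allow"

    actions = [
      "sts:AssumeRole"
    ]

    principals {
      type = "Service"

      identifiers = [
        "vpc-flow-logs.amazonaws.com"
      ]
    }
  }
}

resource "aws_iam_role" "example" {
  name = "example"

  # A data source reference must include its type: data.<TYPE>.<NAME>.<ATTRIBUTE>.
  assume_role_policy = data.aws_iam_policy_document.policy_assume_role.json
}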
Employ automation
Unfortunately, we still see operations and development teams that do not use automation at all. They run plans and apply them from their local workstations, do not commit the code to the repository, and even... keep the state file with secrets in git repositories. This in turn creates a lot of security risks and misunderstandings among team members. There is no audit log and no way to verify the history of changes. It often causes confusion when someone runs a plan and sees infrastructure changes that are not yet present in the common codebase.
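As a starting point for keeping state out of git, a remote backend can be declared in code. This is a minimal sketch assuming an S3 bucket and a DynamoDB lock table that already exist; the bucket, key and table names are placeholders, not values from any real setup.

```hcl
terraform {
  backend "s3" {
    # Placeholder names; the bucket and table must be created beforehand.
    bucket = "example-terraform-state"
    key    = "network/terraform.tfstate"
    region = "eu-central-1"

    # Encrypt the state at rest, since it may contain secrets.
    encrypt = true

    # Lock the state so two runs cannot modify it at the same time.
    dynamodb_table = "example-terraform-locks"
  }
}
```

With the backend in place, a pipeline rather than an individual workstation can run `terraform plan` and `terraform apply`, and the state file never needs to be committed.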
Avoid doing things manually
Well. What else can I say? 🙂 We are fine with testing things manually and doing research and development on sandbox environments. But making a manual change in production in the twenty-first century is difficult to accept, if you'd like to keep high quality standards. Going even further, positioning yourself as "infrastructure as code adopter" while maintaining manual changes inside the infrastructure seems a little off.

Is it worth the time?
Source: https://imgs.xkcd.com/comics/is_it_worth_the_time.png
Additive and fresh-start
We have noticed that engineers mostly test only additive changes in Terraform. It is the easiest and most natural first approach: deliver what is expected of us. But it is extremely important to test clean-state applies too, as very often there are cyclic dependencies between states that will not allow a fresh environment to be created. This is crucial when planning disaster recovery procedures and worst-case scenarios.
No modules separation
One of the most common problems we've noticed while auditing our customers' code is a lack of responsibility enforcement and module separation. They duplicate resources between modules without consistent responsibility principles. Eliminating code repetition is only one of the abundant advantages of the modular approach. Another is the ability to enforce code standards and security compliance by, for example, forcing encryption on resources that support it.
Modules dependency hell
On the other hand, we've also noticed customers that, at some point, end up deep in the dependency hell problem. They made their modules so small that upgrading a specific module across the whole environment's source code is a living nightmare, because tons of dependent modules have to be upgraded to make a small change to one resource. By the way, we solved the problem of verifying module versions across the whole Terraform state with tooling and made it open source for everyone to use: https://github.com/sysdogs/tfmodvercheck.
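One way to keep upgrades manageable is to pin every module call to an explicit version, so that a change in one module is a deliberate, reviewable bump rather than a surprise. A minimal sketch, assuming a module published in a registry; the source address, version and input are hypothetical.

```hcl
module "logs_bucket" {
  # Hypothetical registry address; replace with your own module source.
  source = "example-org/s3-bucket/aws"

  # Exact pin: upgrading means changing this line in a reviewed commit.
  version = "1.4.2"

  # Hypothetical input of the module.
  name = "example-logs"
}
```

The `version` argument applies to registry sources; for modules pulled from git, the equivalent is pinning a tag with `?ref=` in the source address.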

An example of dependency hell
Source: https://webwereld.nl/cmsdata/features/3771627/yu082e99pupei7v9_thumb800.jpg
Poor code quality
"Keep it simple, stupid" and "Do not repeat yourself" are principles we strongly believe in for every Infrastructure as Code toolset. Make the code look good. Make it an art. Make it nice and clean. Make it consistent. Make it understandable at a first glance.
No tests
Automate all the things. Test all the things. While auditing our customers' code, we noticed the following problems: Terraform is still not treated as real code, and Terraform code is not tested like real code should be. A lack of static code analysis, deep inspection analysis and unit tests, together with poor code quality, may cause a lot of problems with infrastructure development in the future, and it is extremely important to keep this in mind while writing Terraform code. With a proper continuous integration pipeline in place, you can save a lot of time by flagging non-functional code, or possible security vulnerabilities, before you even apply it to the real environment.
Improper functions usage
We have noticed that engineers tend to use functions everywhere, even when they are not necessary. A good example is using element instead of indexing directly with count.index when we do not require the "wrap around" mechanism.
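The difference can be sketched as follows. The variables and resource names are hypothetical; the point is only the contrast between plain indexing and element.

```hcl
variable "ami_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

# One instance per subnet: the index is always in range, so plain
# indexing is enough, and an out-of-range index fails with an error.
resource "aws_instance" "per_subnet" {
  count = length(var.subnet_ids)

  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = var.subnet_ids[count.index]
}

# More instances than subnets: here the wrap-around of element() is
# actually wanted, because the index is taken modulo the list length.
resource "aws_instance" "spread" {
  count = length(var.subnet_ids) * 2

  ami           = var.ami_id
  instance_type = "t3.micro"
  subnet_id     = element(var.subnet_ids, count.index)
}
```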
Variables overkill
Modules and variables are great abstraction layers, but a double-edged sword too. We've seen hundreds of modules that configure literally each resource argument with variables. In the end, you're stuck with hundreds of variables for the module that's only responsible for provisioning S3 bucket resources. Same applies to outputs. Output only things that you will leverage in a different place of the codebase.
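A minimal sketch of a deliberately small bucket module: one input, one output, and encryption fixed inside the module rather than exposed as a variable. It assumes the 3.x AWS provider, where encryption is configured inline on aws_s3_bucket (newer provider versions use a separate resource). The variable and output names are hypothetical.

```hcl
variable "name" {
  description = "Name of the bucket."
  type        = string
}

resource "aws_s3_bucket" "this" {
  bucket = var.name
  acl    = "private"

  # Encryption is not configurable on purpose: every bucket created
  # through this module is encrypted at rest.
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "aws:kms"
      }
    }
  }
}

# Output only what other parts of the codebase actually consume.
output "arn" {
  description = "ARN of the bucket."
  value       = aws_s3_bucket.this.arn
}
```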
No provider management
Most of our customers do not pin provider versions when they start working with Terraform. This can lead to unexpected consequences if the provider we are using introduces breaking changes and we miss that one important delete inside the Terraform plan.
Unpinned provider version
provider "aws" {
  region = "eu-central-1"
}
Pinned provider version
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}
References
- For & For_each announcement - HashiCorp Official Website. (n. d.). https://www.hashicorp.com/blog/hashicorp-terraform-0-12-preview-for-and-for-each (accessed October 4, 2020).
- Terraform 0.13, issue 26226 on Github. (n. d.). https://github.com/hashicorp/terraform/issues/26226 (accessed October 4, 2020).
- Terraform 0.13, issue 26274 on Github. (n. d.). https://github.com/hashicorp/terraform/issues/26274 (accessed October 4, 2020).
- Terraform 0.13, issue 26070 on Github. (n. d.). https://github.com/hashicorp/terraform/issues/26070 (accessed October 4, 2020).
- Terraform 0.13, issue 26395 on Github. (n. d.). https://github.com/hashicorp/terraform/issues/26395 (accessed October 4, 2020).
- On Code Quality with Ansible. (n. d.). https://sysdogs.com/articles/how-to-write-quality-code-in-ansible (accessed October 4, 2020).
- Writing flexible code with the single responsibility principle - Severin Perez, Medium.com. (n. d.). https://severinperez.medium.com/writing-flexible-code-with-the-single-responsibility-principle-b71c4f3f883f (accessed October 4, 2020).
- https://github.com/sysdogs/tfmodvercheck. (n. d.). https://github.com/sysdogs/tfmodvercheck (accessed October 4, 2020).
- KISS Principle - Wikipedia. (n. d.). https://en.wikipedia.org/wiki/KISS_principle (accessed October 4, 2020).
- DRY Principle - Wikipedia. (n. d.). https://en.wikipedia.org/wiki/Don%27t_repeat_yourself (accessed October 4, 2020).