How to structure Terraform code for scale?

2020-10-12 | By Kamil Szczygieł | Code

Terraform is one of the most popular tools for describing infrastructure as code. Being able to incorporate typical software development patterns into infrastructure results in better predictability, higher quality, and a lower chance of human error. You can implement continuous integration pipelines that ensure the infrastructure code is properly structured and follows all coding guidelines and security compliance requirements in your organization. To go even further, you can develop unit tests to increase the chance of finding an error before it is applied to the real infrastructure. It is important to keep in mind that all these blessings come with drawbacks. You have to implement a change management process, a review process, a proper version control system, and a code structure. If the operations team in your organization has never worked with version control, or with any programming language at all, it is crucial to teach them how to properly develop high-quality code, test it, and finally, release it.

In this article, we will discuss why you should plan your Terraform code structure beforehand and what the best ways to do this are, as this is one of the most important decisions you're going to make.

Why should I plan the code structure beforehand?

Planning your code structure is a crucial part of the infrastructure as code pattern, as most of the decisions you make will be very hard to undo or change without significant effort. For example, you have to think through the following:

  • Single repository vs multiple repositories
  • Branching pattern
  • CI/CD integration
  • Testing methodologies
  • Environment promotion
  • Code consistency
  • Security compliance

The best way to approach this topic is to go through this list and weigh the pros and cons of each approach. So, let's begin.

Single repository versus multiple repositories

Mono-repo versus multi-repo is a very hot topic, and not exclusively in the Infrastructure as Code world. Let's start with an explanation.

Single repository

With this approach, all your code lives within a single version control repository. Look at the following example:

`terraform-environment` repository

modules/
  vpc/
    main.tf
    outputs.tf
    variables.tf
  networking/
    main.tf
    outputs.tf
    variables.tf
  db/
    main.tf
    outputs.tf
    variables.tf
envs/
  dev/
    main.tf
    outputs.tf
    variables.tf
  staging/
    main.tf
    outputs.tf
    variables.tf
  prod/
    main.tf
    outputs.tf
    variables.tf

All of your Terraform modules and environments live in separate directories. This simplifies the review process - you have only one repository to look at, and there is no need to jump between different origins. However, it is much harder to implement access control, as you cannot assign permissions to view only certain parts of the repository. In larger environments, the codebase can reach hundreds of thousands or even millions of lines, which increases the size of the repository. Implementing continuous integration / continuous deployment can be really painful, as you must work with repository paths to distinguish where the code must be applied (the complexity depends on the CI/CD toolset).

Multiple repositories

With this approach, your code is kept in multiple version control repositories. It may look like this:

Multi-repo approach - `terraform-env-dev` repository

vpc/
  main.tf
  outputs.tf
  variables.tf
networking/
  main.tf
  outputs.tf
  variables.tf
db/
  main.tf
  outputs.tf
  variables.tf

Multi-repo approach - `terraform-env-stg` repository

vpc/
  main.tf
  outputs.tf
  variables.tf
networking/
  main.tf
  outputs.tf
  variables.tf
db/
  main.tf
  outputs.tf
  variables.tf

Multi-repo approach - `terraform-env-prod` repository

vpc/
  main.tf
  outputs.tf
  variables.tf
networking/
  main.tf
  outputs.tf
  variables.tf
db/
  main.tf
  outputs.tf
  variables.tf

Keeping code in multiple repositories significantly increases the complexity of the review process, as you have to go through multiple places to review a single change. However, it allows you to control access to environments through repository permissions in a more granular fashion and enables you to work on the codebase like you'd work on a regular software package - you can apply versioning and packaging. At sysdogs, we follow the multiple repositories approach, and it is also the recommended way of structuring Terraform code according to HashiCorp's guidelines.
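For example, with modules split into their own repositories, an environment can pin an exact tagged release of a module. A minimal sketch - the repository URL, tag, and `cidr_block` input are hypothetical:

module "vpc" {
  # The module lives in its own repository and is pinned to a tagged
  # release, so this environment only changes when the pin is bumped.
  source = "git::https://github.com/myorg/terraform-aws-vpc.git?ref=v1.4.0"

  cidr_block = "10.10.0.0/16"
}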

Branching pattern

Multiple branching models can be used with Terraform. Take a look at the examples below:

Master branch as a single source of truth

In this approach, the master branch is the only source of truth for the infrastructure that is deployed. Code is developed on feature branches and, once ready and tested, merged to master and deployed.

When relying on the single repository method, the master branch controls all environments, while in the multiple repositories method each environment has its own repository, so its master branch controls only one environment.

By using master as a single source of truth, you have a quick and easy way to see what is actually deployed in each environment. It is much easier to audit and verify which modules, and in which versions, are present. Of course, it adds another layer of complexity and a requirement to update all modules at once to keep them in sync. In the end, the master branch reflects the current state of the infrastructure. At sysdogs, we follow this pattern and highly recommend it for infrastructures and businesses that focus on scale.

Branch per environment

In this approach, you have a single repository that controls the environments and you create a branch for each environment you wish to deploy to.

  • master branch - controls the production environment,
  • staging branch - controls the staging environment,
  • develop branch - controls the development environment.

This way, you still keep the principle of a single source of truth, as each branch reflects the current state of the infrastructure for a particular environment. You can enforce a promotion pattern that requires the code to pass through all environments until it reaches the master branch, to keep the version control history consistent. However, this can lead to conflicts when someone commits a change related to a single environment directly. Imagine something like this:

  1. Someone has to change the number of database instances in the staging environment.
  2. This someone creates a pull request from the feature branch straight to the staging branch.
  3. Git history diverges, and this change never makes it to the develop branch.

Monolith approach

The most common code structure we've seen is keeping everything in a single version control repository - starting from Terraform modules, through resources, to actual values. Resources reference modules through local paths within the same repository. You can find an example of the directory structure below:

Monolith approach

modules/
  vpc/
    main.tf
    outputs.tf
    variables.tf
  rds/
    main.tf
    outputs.tf
    variables.tf
  internet-gateway/
    main.tf
    outputs.tf
    variables.tf

dev/
  igw.tf
  main.tf
  rds.tf
  outputs.tf
  variables.tf
  vpc.tf
stg/
  igw.tf
  main.tf
  rds.tf
  outputs.tf
  variables.tf
  vpc.tf
prod/
  igw.tf
  main.tf
  rds.tf
  outputs.tf
  variables.tf
  vpc.tf

This pattern is flawed, and it's easy to see why - there is no way to version modules or promote changes across multiple environments in a controllable way. Changing a module impacts all environments at once, and reapplying the current state is not possible without introducing new changes. There is no way to reuse the code between environments, as each environment is a completely separate entity. This becomes overwhelming in larger infrastructures and quickly ends up in diverged and detached environments.
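To illustrate the problem, in this layout an environment references a module by a relative path. A sketch based on the tree above - the `cidr_block` input is hypothetical:

# dev/vpc.tf
module "vpc" {
  # A relative path means there is no version pin: any change to
  # modules/vpc immediately affects every environment on the next apply.
  source = "../modules/vpc"

  cidr_block = var.cidr_block
}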

Module approach

Another approach that we encounter very often is the module approach. It is similar to the monolith approach, but all modules are moved to a separate repository (or repositories). Here's an example:

`terraform-modules` repository

vpc/
  main.tf
  outputs.tf
  variables.tf
rds/
  main.tf
  outputs.tf
  variables.tf
internet-gateway/
  main.tf
  outputs.tf
  variables.tf

`terraform-environments` repository

dev/
  igw.tf
  main.tf
  rds.tf
  outputs.tf
  variables.tf
  vpc.tf
stg/
  igw.tf
  main.tf
  rds.tf
  outputs.tf
  variables.tf
  vpc.tf
prod/
  igw.tf
  main.tf
  rds.tf
  outputs.tf
  variables.tf
  vpc.tf

This way, you can version modules and repositories and control which module version is deployed in each environment. However, it is still impossible to share the code between environments and keep them in parity. This becomes overwhelming in larger infrastructures and quickly ends up in diverged environments.
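Pinning a different module version per environment then becomes a one-line change. A sketch - the registry path, versions, and `cidr_block` input are hypothetical:

# terraform-environments/stg/vpc.tf
module "vpc" {
  source  = "app.terraform.io/myorg/vpc/aws"
  version = "2.1.0" # staging verifies 2.1.0 while prod may still pin 2.0.3

  cidr_block = var.cidr_block
}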

Skeleton

After all those years of working with Terraform in multi-environment deployments, we've taken inspiration from a variety of programming patterns to create something new that fits us better. A skeleton is a repository that reflects a single environment in a very generic way. You can think of it as a place to keep Terraform code that is supposed to be present in every environment - starting from development, through staging, and finally production. Its main purpose is to keep every environment in sync and to treat the environment definition as an artifact - a versioned package.

Skeleton assumptions:

  • It describes a single environment in the most generic, flexible way.
  • Most variables should have a default value, to avoid getting stuck in variable passing hell. These can be overridden by the environment repository, which we will discuss in the next section (see the sketch after this list).
  • Every change to the skeleton creates a new, versioned release.
  • It can be split into multiple repositories based on the domain context. This gives you the ability to further tighten access control to a particular part of the infrastructure or delegate responsibility to a particular team.
  • You can treat it as a regular Terraform module.
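A minimal sketch of the "strong defaults" idea - both variable names are hypothetical:

# skeleton, app/variables.tf
variable "environment" {
  description = "Environment name, always supplied by the environment repository"
  type        = string
}

variable "rds_instance_count" {
  description = "Number of RDS instances, overridden only where needed"
  type        = number
  default     = 1
}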

You can find an example of directory structure within the skeleton below:

Example of `skeleton` directory structure

app/
  internet_gateway.tf
  main.tf
  nat_gateway.tf
  outputs.tf
  rds.tf
  subnet_app.tf
  subnet_rds.tf
  variables.tf
  vpc.tf

shared/
  ecr.tf
  main.tf
  outputs.tf
  variables.tf

Environment repository

The environment repository leverages the skeleton as an execution layer. It contains the same directory structure as the skeleton, sources the skeleton as a module, and describes components that are unique to a particular environment, e.g. a development database that shouldn't exist in all environments. Of course, you can split this repository into domain contexts the same way as the skeleton.

Environment directory structure

app/
  main.tf
  outputs.tf
  terraform.auto.tfvars
  variables.tf
shared/
  main.tf
  outputs.tf
  terraform.auto.tfvars
  variables.tf

An example of a `main.tf` file within the `terraform-env-dev/app` directory can be found below:

Example of `main.tf` file

terraform {
  backend "remote" {
    organization = "myorg"

    workspaces {
      name = "dev-app"
    }
  }
}

module "skeleton" {
  source  = "app.terraform.io/myorg/terraform-infra-skeleton/aws//app"
  version = "1.2.20"

  environment = var.environment
  cidr_block  = var.cidr_block
}
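On top of the skeleton, the same directory can declare resources that exist only in this environment. A hypothetical sketch of a development-only database - all names and values are illustrative:

# terraform-env-dev/app/rds_dev.tf - exists only in the dev repository
resource "aws_db_instance" "dev_scratch" {
  identifier          = "dev-scratch"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "devuser"
  password            = var.dev_db_password # assumed dev-only variable
  skip_final_snapshot = true
}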

The main concept is to pass an absolute minimum of values and to focus on having strong default values declared within the skeleton. By keeping an environment definition as an artifact, you can control change promotion in a predictable, safe manner, just as you would with a regular Terraform module. Additionally, you can be one hundred percent sure that your environment codebase will not diverge between development, staging, and production, as the artifact does not change in between.
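For completeness, the matching `terraform.auto.tfvars` can then be as small as this - the values are illustrative:

# terraform-env-dev/app/terraform.auto.tfvars
environment = "dev"
cidr_block  = "10.10.0.0/16"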

Conclusion

A lack of proper code structure planning very often ends up in hours of running `terraform import` against the new codebase. However, not all resources can be easily imported, and such a migration can be a really traumatic and painful experience. Anyway, you have just read another long write-up from sysdogs. A glass of wine and a lovely dog is something you really deserve. 🍷
