Ensure Cloud Best Practices through OPA for Terraform-Built AWS Infrastructure

Udaara Jayawardana
4 min read · Jul 12, 2022

In case you missed it, the OPA logo is a Viking Helmet! :P

Are you familiar with Terraform? Of course you are. Infrastructure-as-Code plays a large role in today’s cloud infrastructure provisioning, and provider-independent IaC tools like Terraform (Pulumi has also been turning heads recently) have a slight edge over Cloud Service Provider (CSP)-bound IaC tools like AWS’s CloudFormation, as many organizations opt for a multi-cloud-safe approach.

With Terraform (and with other IaC tools too!) we have the god-like ability to create, update, and delete large amounts of infrastructure with a single command. As Uncle Ben famously said, “With great power comes great responsibility”, and this is where an issue arises. We need to make sure the infrastructure created by the Terraform code adheres to security standards and the organization’s best practices. Otherwise, a missed or overlooked error in the code could put your entire infrastructure at risk.

So somebody needs to make sure the Terraform code about to be executed does not break any security or organizational cloud standard. Many organizations, however, do this via a Git PR review. No worries, that works, right? Yeah, I used to do this too. But no! Please don’t do this…

We are humans after all. Terraform code can become super complex, and it’s nearly impossible to read the code and map everything. You know how many ways we can pass a variable to a Terraform code block, right? Defaults, .tfvars files, TF_VAR_ environment variables, -var flags… A missed step in the variable hierarchy could create something completely different.

Wait, what if we print the Terraform plan output to the Git PR as well? You know, using a tool like Atlantis?
That makes things slightly easier, yes. But assume you are changing something that affects 30 resources. Can you really go through all 30 output sections and check them? We are humans, remember? We make mistakes, and mistakes are something we cannot afford in infrastructure. A single mistake could cost you millions of dollars.

Okay, jeez, calm down. I get it. So what should we do? Well, I’m glad you asked 🙂

There are many ways to automate Terraform code reviews. HashiCorp’s own Sentinel works great. But I’m looking into free alternatives.

You can try validation rules in Terraform modules. That way you can make sure things like naming standards are met.

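variable "instance_names" {
  description = "(Required) AWS host names of the instances"
  type        = list(string)

  validation {
    # Check the length of every name, not the number of names in the list
    condition     = alltrue([for name in var.instance_names : length(name) < 13])
    error_message = "Each instance name must be shorter than 13 characters."
  }
}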

But this still can’t do everything I want. What do we want, really? Well, ideally you’d want to:

  1. Make sure the Terraform code uses trusted modules from your organization’s Terraform module repo. An untested, publicly available Terraform module could carry serious compliance violations.
  2. Make sure the code, syntax and formatting are correct (terraform fmt and terraform validate), so that no unrelated formatting changes show up in the next PR.
  3. Make sure organizational practices are met. For example, your organization may require a specific set of mandatory AWS tags.
  4. Make sure organizational security standards are met. For example, all EBS volumes should be encrypted by default.
  5. Keep these rules somewhere that is easy to update and is versioned. Organizational practices and security standards are updated frequently, so they shouldn’t be hardcoded in places like modules.

This is where the Open Policy Agent (OPA) comes in. OPA is a general-purpose policy engine that automates and unifies the implementation of policies across IT environments, especially in cloud-native applications. You can use it for everything! But let’s not get carried away; let’s focus on Terraform.

OPA makes it possible to write policies that test the changes Terraform is about to make before it makes them. In their own words,

  • OPA tests can help individual developers sanity check their Terraform changes
  • Auto-approve run-of-the-mill infrastructure changes and reduce the burden of peer-review
  • Catch problems that arise when applying Terraform to production after applying it to staging

OPA policies work on the JSON representation of the Terraform plan, so you can write a policy for anything that shows up in the plan!
OPA policies can be easily stored in your Git repository. This way you can have a versioned policy repository for all your compliance needs!
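
To make the plan-JSON point concrete, here is a minimal sketch of the kind of data a policy gets to look at. It is written in Python purely for illustration (real OPA policies are written in Rego). It assumes the `resource_changes` layout produced by `terraform show -json`; the sample plan fragment, the `MANDATORY_TAGS` set and the `missing_tags` helper are all made up for this example, and the sketch naively treats every resource type as taggable.

```python
import json

# Hypothetical, heavily trimmed plan fragment; a real tfplan.json has many more fields.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {
      "address": "aws_instance.web",
      "type": "aws_instance",
      "change": {
        "actions": ["create"],
        "after": {"tags": {"Owner": "platform", "Environment": "dev"}}
      }
    },
    {
      "address": "aws_ebs_volume.data",
      "type": "aws_ebs_volume",
      "change": {
        "actions": ["create"],
        "after": {"tags": {"Owner": "platform"}}
      }
    }
  ]
}
""")

# Assumed organizational standard, for illustration only.
MANDATORY_TAGS = {"Owner", "Environment"}


def missing_tags(plan, required=MANDATORY_TAGS):
    """Map each created or updated resource address to the required tags it lacks."""
    problems = {}
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if not {"create", "update"} & set(change.get("actions", [])):
            continue  # skip no-op, read and delete-only changes
        after = change.get("after") or {}
        tags = after.get("tags") or {}
        absent = required - set(tags)
        if absent:
            problems[rc["address"]] = sorted(absent)
    return problems


if __name__ == "__main__":
    print(missing_tags(SAMPLE_PLAN))
```

A Rego policy would express the same idea declaratively: walk `input.resource_changes` and report every resource whose planned state breaks a rule.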

Now, how do you use OPA? Simple.

  1. Create and save a Terraform plan
    terraform init && terraform plan --out tfplan.binary
  2. Convert the Terraform plan into JSON
    terraform show -json tfplan.binary > tfplan.json
  3. Write the OPA policy to check the plan
  4. Evaluate the plan against the policy. There are a couple of things you can do here!
  • Check whether your code violates any policies
    opa exec --decision terraform/analysis/authz --bundle policy/ tfplan.json
  • Set up a score for each violation and check the total score of your plan! This can be used to define a blast radius (a threshold on the total score) and ignore some minor violations
    opa eval --data terraform.rego --input tfplan.json "data.terraform.analysis.score"
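
The scoring idea can be sketched as follows, again in Python rather than Rego. The weights, the `BLAST_RADIUS` threshold and the function names are assumptions chosen for illustration; they are not the contents of the `terraform.rego` file used in the command above.

```python
# Assumed per-resource-type weights: deleting is treated as riskier than creating.
WEIGHTS = {
    "aws_autoscaling_group": {"delete": 100, "create": 10, "update": 1},
    "aws_instance": {"delete": 10, "create": 1, "update": 1},
}

# Assumed threshold: plans scoring at or above it should go to a human reviewer.
BLAST_RADIUS = 30

# Hypothetical, trimmed plan: one create, one replace (delete + create), one update.
SAMPLE_PLAN = {
    "resource_changes": [
        {"type": "aws_instance", "change": {"actions": ["create"]}},
        {"type": "aws_instance", "change": {"actions": ["delete", "create"]}},
        {"type": "aws_autoscaling_group", "change": {"actions": ["update"]}},
    ]
}


def score(plan, weights=WEIGHTS):
    """Sum the weight of every planned action on a weighted resource type."""
    total = 0
    for rc in plan.get("resource_changes", []):
        type_weights = weights.get(rc.get("type"), {})
        for action in rc.get("change", {}).get("actions", []):
            total += type_weights.get(action, 0)
    return total


def within_blast_radius(plan, limit=BLAST_RADIUS):
    """True when the plan's total score stays below the allowed threshold."""
    return score(plan) < limit


if __name__ == "__main__":
    print(score(SAMPLE_PLAN), within_blast_radius(SAMPLE_PLAN))
```

With weights like these, a plan that only adds a few instances passes automatically, while one that deletes an autoscaling group is pushed over the threshold on its own.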

Here’s a sample OPA policy to make sure security group rules are not using the internet (0.0.0.0/0) as a source
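
The logic behind such a check can be sketched like this. Note that this is a Python illustration of the rule, not the Rego policy itself. It assumes the plan JSON exposes inline rules under `ingress` for `aws_security_group` and a `type` of `ingress` for standalone `aws_security_group_rule` resources; the sample plan and the `open_ingress_violations` helper are hypothetical, and IPv6 ranges are ignored.

```python
OPEN_CIDR = "0.0.0.0/0"

# Hypothetical, trimmed plan with one offending and one compliant resource.
SAMPLE_PLAN = {
    "resource_changes": [
        {
            "address": "aws_security_group.web",
            "type": "aws_security_group",
            "change": {
                "actions": ["create"],
                "after": {
                    "ingress": [
                        {"from_port": 443, "to_port": 443, "cidr_blocks": ["10.0.0.0/16"]},
                        {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]},
                    ]
                },
            },
        },
        {
            "address": "aws_security_group_rule.internal",
            "type": "aws_security_group_rule",
            "change": {
                "actions": ["create"],
                "after": {"type": "ingress", "cidr_blocks": ["10.1.0.0/16"]},
            },
        },
    ]
}


def open_ingress_violations(plan):
    """Return one message per resource that allows ingress from 0.0.0.0/0."""
    violations = []
    for rc in plan.get("resource_changes", []):
        after = rc.get("change", {}).get("after") or {}
        if rc.get("type") == "aws_security_group":
            rules = after.get("ingress") or []  # inline ingress blocks
        elif rc.get("type") == "aws_security_group_rule" and after.get("type") == "ingress":
            rules = [after]  # a standalone rule is its own single "rule"
        else:
            continue
        for rule in rules:
            if OPEN_CIDR in (rule.get("cidr_blocks") or []):
                violations.append(f"{rc['address']} allows ingress from {OPEN_CIDR}")
                break  # one message per resource is enough
    return violations


if __name__ == "__main__":
    for message in open_ingress_violations(SAMPLE_PLAN):
        print(message)
```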

OPA provides a variety of examples, so have a look at them too! And here are some policies written by me for AWS!

In summary, this post explains the basic features of OPA and how you can use it when creating AWS infrastructure with Terraform.

But the story does not end here. Yes, now we have a policy store and a way to check whether Terraform code violates those policies, but it still has some manual steps, as you need to manually run the OPA policies against the Terraform code. Well, I automated that too :)

Please read more about it in this post: An Automation to Evaluate Terraform Code PR using OPA and Jenkins

Udaara Jayawardana

A DevOps Engineer who specialises in the design and implementation of AWS and Containerized Infrastructure.