Testing Infrastructure as Code – 3 Lessons Learnt

In the last few weeks, I made a deep dive into Infrastructures as Code. I chose AWS and Terraform to write my provisioning scripts.

It came naturally to me, as a software engineer, to write Terraform code. A lot of software design principles (like KISS, DRY or even SOLID to some extent) can be adapted to write quality IaC. The intention is to end up with small, decoupled modules, used as building blocks for provisioning environments. Still, I felt a bit uncomfortable without TDD or automated tests at all. Are automated IaC tests useful (besides improving my well-being)?

Developers, DevOps, Software Engineers, we always verify if our code works as expected, even if not in an automated manner. In the case of Terraform, running terraform plan (a command that creates an execution plan, but doesn’t deploy anything) checks the syntax and if all resources are correctly defined. 

Manual functional testing involves logging into the management console and verifying all properties of the deployed resources, verifying access-control lists, connectivity tests, etc. This is time-consuming and cumbersome, but necessary.

Operability practices aim to support frequent deployments. This also means constant changes to the underlying infrastructure. In this case, manual testing of the IaC is inefficient and may not add as much value as expected. For this reason, I decided to take some time and test the automated testing tools for IaC. Below, I will talk about three valuable lessons I learned.

1. Testing modules with Terratest isn’t even close to unit testing.

The tool of choice for automated Terraform tests is Gruntwork’s Terratest. It is a Golang framework, actively growing and gaining popularity.

In the beginning, it was tempting to think about module tests like there were unit tests. When you unit-test a function used in your application, you don’t need to run the application. The tests are short, simple, and examine a particular piece of code in isolation (also with input values that yield errors.) Correctness means that the output of the function under test is as expected. We care about broad test coverage.

Module testing in Terratest is different. You write an example infrastructure code using the module you want to verify. Terratest deploys the code and runs your tests against it. The tests should answer the question: “does the infrastructure actually work?” For example: if your module deploys a server with running application, you could send some traffic to it to verify if it responds as expected.

Examining resource’s properties (loaded from an API) is rarely practiced with Terratest. It can be useful when false-negative test results may introduce some kind of high risk. As a result module testing with Terratest is looks almost like end-to-end testing.

2. There are other tools to complement Terratest.

Sometimes, end-to-end tests are not enough. For example, your private network accidentally has a route to the gateway. To confirm, that the private network is really private, it feels convenient to check if no routes in its routing table let public traffic. 

You could also picture a situation where the operation team lets the development teams create own resources. You may need to ensure that the implemented code follows security standards and compliance obligations. E.g all resources should be correctly tagged, all storage should be encrypted, some ports should never be open, etc.

In addition to Terratest, several other testing tools are more convenient to test specific resource properties. One of them is terraform-compliance. You can encode all your policies in “spec” files similar to Cucumber specs and run them against the output of terraform plan


Feature: Define AWS Security Groups
  In order to improve security
  As engineers
  We will use AWS Security Groups to control inbound traffic

  Scenario: Policy Structure
    Given I have AWS Security Group defined
    Then it must contain ingress

  Scenario: Only selected ports should be publicly open
    Given I have AWS Security Group defined
    When it contains ingress
    Then it must only have tcp protocol and port 22,443 for 0.0.0.0/0

This spec would yield an error if any of your security groups allow inbound traffic on a port different than 22 or 443.

If you feel more comfortable testing deployed resources, and you work with AWS, you could try AWSSpec. AWSSpec is built on top of Ruby’s RSpec testing framework. The tests are specs-alike, BDD style. The difference is that you run them against the real infrastructure. Similarly to the Terratest, if you would test modules you need to deploy examples first. You could automate the deployment and verification using Test-Kitchen (along with Kitchen-Terraform plugin). For example,  testing a private subnet may look like this:

require 'spec_helper'
describe subnet('Default Private Subnet') do
  it { should exist }
  its(:cidr_block) { should eq '10.10.2.0/24' }
  its(:state) { should eq 'available' }
end
describe route_table('Private Subnet Route Table') do
  it { should exist }
  it { should have_subnet('Default Private Subnet') }
  it { should have_route('10.10.0.0/16').target(gateway: 'local') }
  it { should_not have_route('0.0.0.0/0')}
end

Executing the test may show following output:

3. Automated IaC tests are expensive

The cost of IaC testing doesn’t only include the charges for the resources deployed for testing. For writing automated IaC tests you need good programming skills, that may go beyond one programming language. (Terratest uses Golang, terraform-compliance uses Python, AWSSpec uses Ruby, etc.)

Writing terraforms tests is time-consuming. The cloud APIs aren’t convenient to use, and helpers libraries may miss important functions. In the case of Terratest, and AWSSpec there is a lot of additional infrastructure code needed for module testing.

Many tools, although quite useful, aren’t yet mature. There is always a danger that they will cease to work with newer versions of Terraform or just be discontinued.

Summary

Should I recommend investing time and money into automated IaC testing? That depends. First of all, your team should focus on using Terraform the right way. This means no direct, manual changes to the infrastructure.

Once delivering IaC works well, your team may consider adding automated tests.

If a change may introduce any kind of risk that can’t be accepted, then it’s a good candidate for an automated test. Another factor to consider is the team topology. If the IaC ownership is decentralised then automated tests may help to ensure code consistency, compliance, and quality.

Is it OK to give up automated IaC testing for now? If you can’t introduce automated IaC testing you can rely on other confirmation techniques like green/blue deployments with comprehensive monitoring. Although they do not substitute each other both can help verify the correctness of the infrastructure.