HTTP Downloads in Old and New PowerShell Versions

There are primarily two ways to download files over HTTP in PowerShell: the Invoke-WebRequest cmdlet and the .NET WebClient class. They’re similar to Linux tools like wget and curl. Which one you need depends on whether you’re using older or newer PowerShell versions. In older versions, it also depends on the file’s size.

In older Posh, Invoke-WebRequest can be slow. Downloading a 1.2 GB Ubuntu ISO file took 1 hour. Downloading the same file with the WebClient class took less than 5 minutes. We tested on 5.1, the version that came out-of-box with Windows 10 Enterprise.

In newer Posh, Invoke-WebRequest performed the same as the WebClient class. We tested on 7.1 running in both Windows 10 and OS X.

If you’re downloading a small file or you’re using the latest version of Posh, use Invoke-WebRequest. We prefer this because it’s idiomatic PowerShell. Invoke-WebRequest is a built-in cmdlet. If you’re downloading a large file with the Posh that came out-of-box with Windows, you may need the WebClient class.

Digging through release notes, old documentation, and other sources didn't turn up the point in PowerShell's history where this changed, but it may have been this port. If you know where to find the specific change, we'd love to see it!

We’ll demonstrate each way with this URL and file name:

$Url = [uri]"https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-live-server-amd64.iso"
$FileName = $Url.Segments[-1]

For details on how we got the file name and what the [uri] prefix means, check out our article on splitting URLs.

Invoke-WebRequest (Preferred Way)

Invoke-WebRequest $Url -OutFile "./$FileName"

This creates an ubuntu-20.04.3-live-server-amd64.iso file in the current directory. It shows progress while it runs.

WebClient Class (Old Way for Large Files)

When we passed just a file name, or a file name with the ./ relative path, this downloaded to the $HOME folder. To avoid that, we constructed an absolute path using the directory of the script that ran our test code:

$LocalFilePath = Join-Path -Path $PSScriptRoot -ChildPath $FileName
(New-Object System.Net.WebClient).DownloadFile($Url, $LocalFilePath)

This creates an ubuntu-20.04.3-live-server-amd64.iso file in the same directory as the script that runs the code. It doesn’t show progress while it runs.

Happy automating!

Operating Ops


PowerShell: Splitting URL Strings

Sometimes, you have a full URL string but you only need one of its components. PowerShell makes it easy to split them out. Suppose we need the name of the ISO file from here:

$UbuntuIsoUrlString = "https://releases.ubuntu.com/20.04.3/ubuntu-20.04.3-desktop-amd64.iso"

We want the substring ubuntu-20.04.3-desktop-amd64.iso. We could use the split operator:

> $($UbuntuIsoUrlString -split "/")[-1]                                                       
ubuntu-20.04.3-desktop-amd64.iso

This divides the string into the components between the / characters and stores each one in an array. The [-1] index selects the last element of that array. This works for the ISO name, but it fails in other cases. Suppose we need the scheme:

> $($UbuntuIsoUrlString -split "/")[0] 
https:

The scheme is https, but we got https: (with a colon). We were splitting specifically on / (slash) characters. The colon isn’t a slash, so split counted it as part of the first component. Split doesn’t understand URLs. It just divides the string whenever it sees the character we told it to split on. We could strip the colon off after we split, but there’s a better way.
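
For completeness, the strip-it-off workaround would look something like this (not recommended, just to show what we're avoiding):

> $($UbuntuIsoUrlString -split "/")[0].TrimEnd(':')
https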

.NET has a class that can be instantiated to represent URI objects. URLs are a type of URI. If we cast our string to a URI, we can use the properties defined by that class to get the scheme:

> $UbuntuIsoUri = [System.Uri]$UbuntuIsoUrlString
> $UbuntuIsoUri.Scheme
https

This class understands URLs. It knows the colon is a delimiter character, not part of the scheme. It excludes that character for us.

We can shorten this a bit with the URI type accelerator:

> $UbuntuIsoUri = [uri]$UbuntuIsoUrlString
> $UbuntuIsoUri.Scheme
https

If we want to get the ISO name from this object, we can use the Segments property:

> $UbuntuIsoUri.Segments[-1]
ubuntu-20.04.3-desktop-amd64.iso

Segments returns an array of all the path segments. We get the last one with the [-1] index.

Let’s make the whole operation a one-liner so it’s easy to copy/paste:

> ([uri]$UbuntuIsoUrlString).Segments[-1]
ubuntu-20.04.3-desktop-amd64.iso

That’s the Posh way to process URLs! Cast to a URI object, then read whatever data you need from that object’s properties. As always, PowerShell is all about objects.

Happy automating!

Operating Ops


Which Way to Write IAM Policy Documents in Terraform

There are many ways to write IAM policy documents in terraform. In this article, we'll cover each one and explain why we do or don't use it.

For each pattern, we’ll create an example policy using the last statement of this AWS example. It’s a good test case because it references both an S3 bucket name and an IAM user name, which we’ll handle differently.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::bucket-name/home/${aws:username}",
                "arn:aws:s3:::bucket-name/home/${aws:username}/*"
            ]
        }
    ]
}


Inline jsonencode() Function

This is what we use. You’ll also see it in HashiCorp examples.

resource "aws_s3_bucket" "test" {
  bucket_prefix = "test"
  acl           = "private"
}

resource "aws_iam_policy" "jsonencode" {
  name = "jsonencode"
  path = "/"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "s3:*",
        ]
        Effect = "Allow"
        Resource = [
          "${aws_s3_bucket.test.arn}/home/$${aws:username}",
          "${aws_s3_bucket.test.arn}/home/$${aws:username}/*"
        ]
      },
    ]
  })
}
  • ${aws_s3_bucket.test.arn} interpolates the ARN of the bucket we’re granting access to.
  • $${aws:username} escapes interpolation to render a literal ${aws:username} string. ${aws:username} is an AWS IAM policy variable. IAM’s policy variable syntax collides with terraform’s string interpolation syntax. We have to escape it, otherwise terraform expects a variable named aws:username.
  • If you need it, the policy JSON can be referenced with aws_iam_policy.jsonencode.policy (see the sketch just below).
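
As a minimal sketch of that reference (the output name here is our own, not from the HashiCorp examples):

output "jsonencode_policy_json" {
  value = aws_iam_policy.jsonencode.policy
}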

Why we like this pattern:

  • It declares everything in one resource.
  • The policy is written in HCL. Terraform handles the conversion to JSON.
  • There are no extra lines or files like there are in the following patterns. It only requires the lines to declare the resource and the lines that will go into the policy.

aws_iam_policy_document Data Source

The next-best option is the aws_iam_policy_document data source. It’s 95% as good as jsonencode().

resource "aws_s3_bucket" "test" {
  bucket_prefix = "test"
  acl           = "private"
}

data "aws_iam_policy_document" "test" {
  statement {
    actions = [
      "s3:*",
    ]
    resources = [
      "${aws_s3_bucket.test.arn}/home/&{aws:username}",
      "${aws_s3_bucket.test.arn}/home/&{aws:username}/*",
    ]
  }
}

resource "aws_iam_policy" "aws_iam_policy_document" {
  name = "aws_iam_policy_document"
  path = "/"

  policy = data.aws_iam_policy_document.test.json
}
  • The bucket interpolation works the same as in the jsonencode() pattern above.
  • &{aws:username} is an alternate way to escape interpolation that's specific to this resource. See the note in the resource docs. Like above, it renders a literal ${aws:username} string. You can still use $${} interpolation in these resources. The &{} syntax is just another option.

Why we think this is only 95% as good as jsonencode():

  • It requires two resources instead of one.
  • It requires several more lines of code.
  • The different options for escaping interpolation can get mixed together in one declaration, which makes for messy code.
  • The alternate interpolation escape syntax is specific to this resource. If it’s used as a reference when writing other code, it can cause surprises.

These aren’t big problems. We’ve used this resource plenty of times without issues. It’s a fine way to render policies, we just think the jsonencode() pattern is a little cleaner.

Template File

Instead of writing the policy directly in one of your .tf files, you can put it in a .tpl template file and render it later with templatefile(). If you don't need any variables, you could use file() instead of templatefile().
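
For the no-variables case, a hypothetical file() version might look like this (the resource name and the test_policy.json file containing raw policy JSON are our own assumptions):

resource "aws_iam_policy" "raw_json_file" {
  name = "raw_json_file"
  path = "/"

  policy = file("${path.module}/test_policy.json")
}

The rest of this section covers the templatefile() approach, which is what you need once variables are involved.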

First, you need a template. We’ll call ours test_policy_jsonencode.tpl.

${jsonencode(
  {
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = "s3:*",
        Resource = [
          "${bucket}/home/$${aws:username}",
          "${bucket}/home/$${aws:username}/*"
        ]
      }
    ]
  }
)}

Then, you can render the template into your resources.

resource "aws_s3_bucket" "test" {
  bucket_prefix = "test"
  acl           = "private"
}

resource "aws_iam_policy" "template_file_jsonencode" {
  name = "template_file_jsonencode"
  path = "/"

  policy = templatefile(
    "${path.module}/test_policy_jsonencode.tpl",
    { bucket = aws_s3_bucket.test.arn }
  )
}
  • The interpolation and escape syntax is the same as in the jsonencode() example above.
  • The jsonencode() call wrapped around the contents of the .tpl file allows us to write HCL instead of JSON.
  • You could write a .tpl file containing raw JSON instead of using jsonencode() around HCL, but then you'd be mixing another language into your module. We recommend standardizing on HCL and letting terraform convert to JSON.
  • templatefile() requires you to explicitly pass every variable you want to interpolate in the .tpl file, like bucket in this example.

Why we don’t use this pattern:

  • It splits the policy declaration across two files. We find this makes modules harder to read.
  • It requires two variable references for every interpolation. One to pass it through to the template, and another to resolve it into the policy. These are tedious to maintain.

In the past, we used these for long policies to help keep our .tf files short. Today, we use the jsonencode() pattern and declare long aws_iam_policy resources in dedicated .tf files. That keeps the policy separate but avoids the overhead of passing through variables.

Heredoc Multi-Line String

You can use heredoc multi-line strings to construct JSON. The HashiCorp docs specifically say not to do this, so we won't include an example of using them to construct policy JSON. If you have policies rendered in blocks like this:

<<EOT
{
    "Version": "2012-10-17",
    ...
}
EOT

We recommend replacing them with the jsonencode() pattern.

Happy automating!

Operating Ops


Allowing AWS IAM Users to Manage their Passwords, Keys, and MFA

We do these three things for IAM users that belong to humans:

  • Set a console access password and rotate it regularly. We don’t manage resources in the console, but its graphical UI is handy for inspection and diagnostics.
  • Create access keys and rotate them regularly. We use these with aws-vault to run things like terraform.
  • Enable a virtual Multi-Factor Authentication (MFA) device. AWS accounts are valuable resources. It’s worthwhile to protect them with a second factor of authentication.

There’s much more to managing IAM users, like setting password policies and enforcing key rotation. These are just three good practices we follow.

Users with the AdministratorAccess policy can do all three, but that’s a lot of access. Often, we don’t need that much. Maybe we’re just doing investigation and ReadOnlyAccess is enough. Maybe users have limited permissions and instead switch into roles with elevated privileges (more on this in a future article). In cases like those, we need a policy that allows users to manage their own authentication. Here’s what we use.

This article is about enabling human operators to responsibly manage their accounts. Service accounts used by automation and security policy enforcement are both topics for future articles.


Console Access Policy Statements

This one is easy. The AWS docs have a limited policy that works.

{
    "Sid": "GetAccountPasswordPolicy",
    "Effect": "Allow",
    "Action": "iam:GetAccountPasswordPolicy",
    "Resource": "*"
},
{
    "Sid": "ChangeSelfPassword",
    "Effect": "Allow",
    "Action": "iam:ChangePassword",
    "Resource": "arn:aws:iam::[account id without hyphens]:user/${aws:username}"
}

Access Key Policy Statements

This one is also easy. The AWS docs have a limited policy that works. We made a small tweak.

{
    "Sid": "ManageSelfKeys",
    "Effect": "Allow",
    "Action": [
        "iam:UpdateAccessKey",
        "iam:ListAccessKeys",
        "iam:GetUser",
        "iam:GetAccessKeyLastUsed",
        "iam:DeleteAccessKey",
        "iam:CreateAccessKey"
    ],
    "Resource": "arn:aws:iam::[account id without hyphens]:user/${aws:username}"
}
  • The AWS policy uses * in the account ID component of the ARN. We like to set the account ID so we’re granting the most specific access we can. Security scanning tools also often check for * characters, and removing them reduces the number of flags.
  • Like above, ${aws:username} is an IAM policy variable. See the links there for how to handle this in terraform; a minimal sketch follows this list.
  • We changed the sid from “ManageOwn” to “ManageSelf” so it doesn’t sound like it allows taking ownership of keys for other users.
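
As a hedged sketch in terraform's jsonencode() pattern (the resource name and the aws_caller_identity lookup are our own additions, not part of the AWS example):

data "aws_caller_identity" "current" {}

resource "aws_iam_policy" "manage_self_keys" {
  name = "manage_self_keys"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "ManageSelfKeys"
        Effect = "Allow"
        Action = [
          "iam:UpdateAccessKey",
          "iam:ListAccessKeys",
          "iam:GetUser",
          "iam:GetAccessKeyLastUsed",
          "iam:DeleteAccessKey",
          "iam:CreateAccessKey"
        ]
        # $$ escapes terraform interpolation so the policy gets a literal ${aws:username}.
        Resource = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:user/$${aws:username}"
      }
    ]
  })
}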

MFA Device Policy Statements

This one was trickier. We based our policy on an example from the AWS docs, but we made several changes.

{
    "Sid": "ManageSelfMFAUserResources",
    "Effect": "Allow",
    "Action": [
        "iam:ResyncMFADevice",
        "iam:ListMFADevices",
        "iam:EnableMFADevice",
        "iam:DeactivateMFADevice"
    ],
    "Resource": "arn:aws:iam::[account id without hyphens]:user/${aws:username}"
},
{
    "Sid": "ManageSelfMFAResources",
    "Effect": "Allow",
    "Action": [
        "iam:DeleteVirtualMFADevice",
        "iam:CreateVirtualMFADevice"
    ],
    "Resource": "arn:aws:iam::[account id without hyphens]:mfa/${aws:username}"
}
  • Like we talked about above, our goal is to enable users to follow good practices. We selected statements that enable those practices, not ones that require them.
  • The AWS example included arn:aws:iam::*:mfa/* in the resources for iam:ListMFADevices. According to the AWS docs for the IAM service's actions, this permission only supports user in the resources list. We removed the mfa resource.
  • Also according to the AWS docs for the IAM service's actions, iam:DeleteVirtualMFADevice and iam:CreateVirtualMFADevice support different resources from iam:ResyncMFADevice and iam:EnableMFADevice. We split them into separate statements that limit each one to their supported resources. This probably doesn't change the access level, but our routine is to limit resource lists as much as possible. That helps make it clear to future readers what the policy enables.
  • Like above, ${aws:username} is an IAM policy variable. See links there for how to handle this in terraform.
  • We continued our convention from above of naming sids for “self” to indicate they’re limited to the user who has the policy.

Complete Policy Document

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GetAccountPasswordPolicy",
            "Effect": "Allow",
            "Action": "iam:GetAccountPasswordPolicy",
            "Resource": "*"
        },
        {
            "Sid": "ChangeSelfPassword",
            "Effect": "Allow",
            "Action": "iam:ChangePassword",
            "Resource": "arn:aws:iam::[account id without hyphens]:user/${aws:username}"
        },
        {
            "Sid": "ManageSelfKeys",
            "Effect": "Allow",
            "Action": [
                "iam:UpdateAccessKey",
                "iam:ListAccessKeys",
                "iam:GetUser",
                "iam:GetAccessKeyLastUsed",
                "iam:DeleteAccessKey",
                "iam:CreateAccessKey"
            ],
            "Resource": "arn:aws:iam::[account id without hyphens]:user/${aws:username}"
        },
        {
            "Sid": "ManageSelfMFAUserResources",
            "Effect": "Allow",
            "Action": [
                "iam:ResyncMFADevice",
                "iam:ListMFADevices",
                "iam:EnableMFADevice",
                "iam:DeactivateMFADevice"
            ],
            "Resource": "arn:aws:iam::[account id without hyphens]:user/${aws:username}"
        },
        {
            "Sid": "ManageSelfMFAResources",
            "Effect": "Allow",
            "Action": [
                "iam:DeleteVirtualMFADevice",
                "iam:CreateVirtualMFADevice"
            ],
            "Resource": "arn:aws:iam::[account id without hyphens]:mfa/${aws:username}"
        }
    ]
}

User Guide

  1. Replace [account id without hyphens] with the ID for your account in the policy above.
  2. Attach the policy to users (we like to do this through groups).
  3. Tell users to edit their authentication from My Security Credentials in the user dropdown. This policy won’t let them access their user through the IAM console. My Security Credentials may not appear in the dropdown if the user has switched into a role.

Happy automating!


Creating Terraform Resources in Multiple Regions

In most terraform modules, resources are created in one region using one provider declaration.

provider "aws" {
  region = "us-west-1"
}

data "aws_region" "primary" {}

resource "aws_ssm_parameter" "param" {
  name  = "/${data.aws_region.primary.name}/param"
  type  = "String"
  value = "notavalue"
}

Sometimes, you need to create resources in multiple regions. Maybe the module has to support disaster recovery to an alternate region. Maybe one of the AWS services you're using doesn't support your primary region. When this article was written, AWS Certificate Manager certificates had to be created in us-east-1 to work with Amazon CloudFront. In cases like these, terraform supports targeting multiple regions.

We recommend using this feature cautiously. Resources should usually be created in the same region. If you’re sure your module should target multiple, here’s how to do it.

  1. Declare a provider for the alternate region. You’ll now have two providers. The original one for your primary region, and the new one for your alternate.
  2. Give the new provider an alias.
  3. Declare resources that reference the new alias in their provider attribute with the format aws.[alias]. This also works for data sources, which is handy for dynamically interpolating region names into resource properties like their name.
provider "aws" {
  alias  = "alternate_region"
  region = "us-west-2"
}

data "aws_region" "alternate" {
  provider = aws.alternate_region
}

resource "aws_ssm_parameter" "alt_param" {
  provider = aws.alternate_region

  name  = "/${data.aws_region.alternate.name}/param"
  type  = "String"
  value = "notavalue"
}

terraform plan doesn’t show what regions it’ll create resources in, so this example interpolates the region name into the resource name to make it visible.

...
Terraform will perform the following actions:

  # aws_ssm_parameter.alt_param will be created
  + resource "aws_ssm_parameter" "alt_param" {
      + arn       = (known after apply)
      + data_type = (known after apply)
      + id        = (known after apply)
      + key_id    = (known after apply)
      + name      = "/us-west-2/param"
      + tags_all  = (known after apply)
      + tier      = "Standard"
      + type      = "String"
      + value     = (sensitive value)
      + version   = (known after apply)
    }
...

To confirm the resources ended up in the right places, we checked each region's parameters in the AWS web console, switching regions with the drop-down menu. We get one in us-west-1 and another in us-west-2, as expected.

Happy automating!


Terraform Map and Object Patterns

Terraform variables implement both a map and an object type. They mostly work the same. The docs even say, “The distinctions are only useful when restricting input values for a module or resource.” They can be defined and accessed in several ways. There’s some automatic conversion back and forth between them.

This article distills these details into patterns you can copy and paste, while highlighting some of the subtleties.

Here’s the main detail you need:

Maps contain many things of one type. Objects contain a specific set of things of many types.

This is a simplification. It doesn’t cover all the behavior of terraform’s maps and objects (like loss that can happen in conversions back and forth between them), but it’s enough for the patterns you’re likely to need day to day.


Style

Key Names

You can quote the key names in map definitions.

variable "quoted_map" {
  default = {
    "key_1" = "value_1"
    "key_2" = "value_2"
  }
}

But you don’t have to.

variable "unquoted_map" {
  default = {
    key_1 = "value_1"
    key_2 = "value_2"
  }
}

We prefer the unquoted format, partly because the syntax is lighter and partly because it only works if key names are valid identifiers, so it forces us to use ones that are. When the key names are identifiers, the interior of a map looks similar to the rest of our terraform variables, and we can also use dotted notation to reference them.

Commas

You can separate key/value pairs with commas.

variable "comma_map" {
  default = {
    key_1 = "value_1",
    key_2 = "value_2",
  }
}

But you don’t have to.

variable "no_comma_map" {
  default = {
    key_1 = "value_1"
    key_2 = "value_2"
  }
}

We prefer no commas because the syntax is lighter.

References

You can reference values by attribute name with quotes and square brackets.

output "brackets" {
  value = var.unquoted_map["key_2"]
}

But you can also use the dotted notation.

output "dots" {
  value = var.unquoted_map.key_2
}

We prefer the dotted notation because the syntax is lighter. This also requires the key names to be identifiers, but they will be if you use the unquoted pattern for defining them.

Patterns

  • Each pattern implements a map containing a value_2 string that we’ll read into an output.
  • Examples set values with variable default values, but they work the same with tfvars, etc.
  • The types of values in these examples are known, so they're set explicitly. There's also an any keyword for cases where you're not sure. We recommend explicit types whenever possible; a sketch of the any form follows this list.
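
If you do reach for any, a minimal sketch might look like this (the variable name is our own):

variable "flat_map_any_values" {
  default = {
    key_1 = "value_1"
    key_2 = 2 # terraform infers one element type for the whole map; here everything unifies to string
  }
  type = map(any)
}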

Untyped Flat Map

This is the simplest pattern. We don’t recommend it. Use a typed map instead.

variable "untyped_flat_map" {
  default = {
    key_1 = "value_1"
    key_2 = "value_2"
  }
}
output "untyped_flat_map" {
  value = var.untyped_flat_map.key_2
}

Typed Flat Map

This is sufficient for simple cases.

variable "typed_flat_map" {
  default = {
    key_1 = "value_1"
    key_2 = "value_2"
  }
  type = map(string)
}
output "typed_flat_map" {
  value = var.typed_flat_map.key_2
}

With the type set, if a caller mistakenly passes a value of a type our code wasn't expecting, terraform throws an error.

variable "typed_flat_map_bad_value" {
  default = {
    key_1 = []
    key_2 = "value_2"
  }
  type = map(string)
}
│ Error: Invalid default value for variable
│ 
│   on main.tf line 49, in variable "typed_flat_map_bad_value":
│   49:   default = {
│   50:     key_1 = []
│   51:     key_2 = "value_2"
│   52:   }
│ 
│ This default value is not compatible with the variable's type constraint: element "key_1": string required.

except when it doesn’t. If we set key_1 to a number or boolean, it’ll be automatically converted to a string. This is generic terraform behavior. It’s not specific to maps.

Untyped Nested Map

We don’t recommend this, either. Use a typed nested map instead.

variable "untyped_nested_map" {
  default = {
    key_1 = "value_1"
    key_2 = {
      nested_key_1 = "value_2"
    }
  }
}
output "untyped_nested_map" {
  value = var.untyped_nested_map.key_2.nested_key_1
}

Typed Nested Map, Values are Same Type

Like the flat map, this pattern protects us against types of inputs the code isn’t written to handle. This only works when the values of the keys within each map all share the same type.

variable "typed_nested_map_values_same_type" {
  default = {
    key_1 = {
      nested_key_1 = "value_1"
    }
    key_2 = {
      nested_key_2 = "value_2"
    }
  }
  type = map(map(string))
}
output "typed_nested_map_values_same_type" {
  value = var.typed_nested_map_values_same_type.key_2.nested_key_2
}

Typed Nested Map, Values are Different Types

This is where the differences between maps and objects start to show up in implementations. Remembering our distillation of the docs from the start:

Maps contain many things of one type. Objects contain a specific set of things of many types.

variable "typed_nested_map_values_different_types" {
  default = {
    key_1 = "value_1"
    key_2 = {
      nested_key_1 = "value_2"
    }
  }
  type = object({
    key_1 = string,
    key_2 = map(string)
  })
}
output "typed_nested_map_values_different_types" {
  value = var.typed_nested_map_values_different_types.key_2.nested_key_1
}

In this nested map, one value is a string and the other is a map. That means we need an object to define the constraint. We can’t do it with just a map, because maps contain one type of value and we need two.

Flexible Number of Typed Nested Maps, Values are Different Types

This is the most complex case. It lets us read in a map that has an arbitrary number of nested maps like the ones above.

variable "flexible_number_of_typed_nested_maps" {
  default = {
    map_1 = {
      key_1 = "value_1"
      key_2 = {
        nested_key_1 = "value_2"
      }
    }
    map_2 = {
      key_1 = "value_3"
      key_2 = {
        nested_key_1 = "value_4"
      }
    }
  }
  type = map(
    object({
      key_1 = string,
      key_2 = map(string)
    })
  )
}
output "flexible_number_of_typed_nested_maps" {
  value = var.flexible_number_of_typed_nested_maps.map_1.key_2.nested_key_1
}

We could add a map_3 (or as many more as we wanted) without getting type errors. Again remembering our simplification:

Maps contain many things of one type. Objects contain a specific set of things of many types.

Inside, we use objects because their keys have values that are different types. Outside, we use a map because we want an arbitrary number of those objects.

The inside objects all have the same structure. They can be defined with the same type expression. That passes the requirement that maps contain all the same type of thing.

Happy automating!

Operating Ops


Testing Azure Pipeline Artifacts

Azure Pipelines supports several types of artifacts. This is about Pipeline Artifacts, the ones managed with the publish and download tasks.

Any pipeline that builds code should do at least these two things:

  • Build an artifact that can be deployed later.
  • Test that artifact.

Specifically, it should test the artifact. Not just the same version of application code used to build the artifact, but the actual artifact file that was built. If it only tests the same version of code, it won’t detect bugs in how that code was built.

This is a case where surfacing more errors is better. If we hide errors at build time, we'll have to diagnose them at deploy time. That could cause release failures, and maybe outages. Like the Zen of Python says:

Errors should never pass silently.

Testing the artifact itself is a best practice for any build pipeline. It’s better to find out right away that the code wasn’t built correctly.

First we’ll create a pipeline that tests its code but not the artifact it builds. We’ll include an artificial bug in the build process and show that the pipeline passes tests but still creates a broken artifact. Then we’ll rework the pipeline to point the tests at the artifact, so the build bug gets caught by the tests and becomes visible.

These examples use Python’s tox, but the principles are the same for any tooling. Tox is a testing tool that creates isolated Python environments, installs the app being tested into those environments, and runs test commands.

Setup

First we need a Python package. We’ll make a super-simple one called app:

.
├── app
│   ├── __init__.py
│   └── main.py
├── setup.py
└── tox.ini

app is a single package with an empty __init__.py. The package contains one main.py module that defines one function:

def main():
    print('Success!')

setup.py contains config that lets us build app into a Python wheel file (an artifact that can be installed into a Python environment):

from setuptools import setup

setup(
    author='Operating Ops, LLC',
    license='MIT',
    description='Demo app.',
    name='app',
    packages=['app'],
    version='0.0.1'
)

tox.ini tells tox to call our main() function:

[tox]
envlist = py38

[testenv]
commands = python -c 'from app.main import main; main()'

That’s not a real test, but it’ll be enough to show the difference between exercising source code and built artifacts. A real project would use the unittest library or pytest or another tool here.
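
For example, a hypothetical real test of main() using the standard unittest library might look like this (the test module and its names are ours, not part of the demo app):

import unittest
from unittest.mock import patch

from app.main import main


class TestMain(unittest.TestCase):
    def test_main_prints_success(self):
        # Capture the call to print() instead of letting it write to stdout.
        with patch('builtins.print') as mock_print:
            main()
        mock_print.assert_called_once_with('Success!')


if __name__ == '__main__':
    unittest.main()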

This test passes locally:

(env3) PS /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts> tox -e py38
GLOB sdist-make: /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/setup.py
py38 recreate: /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/.tox/py38
py38 inst: /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/.tox/.tmp/package/1/app-0.0.1.zip
py38 installed: app @ file:///Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/.tox/.tmp/package/1/app-0.0.1.zip
py38 run-test-pre: PYTHONHASHSEED='3356214888'
py38 run-test: commands[0] | python -c 'from app.main import main; main()'
Success!
___________________________________________________________________________ summary ____________________________________________________________________________
  py38: commands succeeded
  congratulations :)

Negative Case

Our code works locally, now we need a build pipeline to make an artifact we can deploy. We’ll start with the negative case, a broken build that still passes tests:

jobs:
- job: Build
  pool:
    vmImage: ubuntu-20.04
  workspace:
    clean: outputs
  steps:
  - task: UsePythonVersion@0
    displayName: Use Python 3.8
    inputs:
      versionSpec: '3.8'
  - pwsh: pip install --upgrade pip setuptools wheel
    displayName: Install build tools
  - pwsh: Remove-Item app/main.py
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: BUILD BUG
  - pwsh: python setup.py bdist_wheel --dist-dir $(Build.BinariesDirectory)
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Build wheel
  - publish: $(Build.BinariesDirectory)/app-0.0.1-py3-none-any.whl
    displayName: Publish wheel
    artifact: wheel

- job: Test
  dependsOn: Build
  pool:
    vmImage: ubuntu-20.04
  steps:
  - task: UsePythonVersion@0
    displayName: Use Python 3.8
    inputs:
      versionSpec: '3.8'
  - pwsh: pip install --upgrade tox
    displayName: Install tox

    # This tests the version of code used to build the 'wheel' artifact, but it
    # doesn't test the artifact itself.
  - pwsh: tox -e py38
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Run tox on code

The Build and Test jobs both succeed even though the BUILD BUG task ran. An artifact is published to the pipeline, and we can download it. But if we install it and try to import from app, we get errors:

(env3) PS /Users/adam/Downloads> pip install ./app-0.0.1-py3-none-any.whl
Processing ./app-0.0.1-py3-none-any.whl
Installing collected packages: app
Successfully installed app-0.0.1
(env3) PS /Users/adam/Downloads> python
Python 3.8.3 (default, Jul  1 2020, 07:50:15) 
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from app.main import main
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'app.main'

It couldn’t find app.main because that module doesn’t exist. Our BUILD BUG task deleted it before the artifact was built. Later, the Test job checked out a fresh copy of the code, which included a fresh copy of the file we accidentally deleted in the Build job. Tox ran in that fresh environment and passed because all the files it needed were present. It was testing the code from the repo, not the artifact we built.

The Fix

If the artifact created by the pipeline doesn’t work, the pipeline should fail. To make tox test the app-0.0.1-py3-none-any.whl file built in the Build job, we need to do two things:

  • Download the artifact in the Test job.
  • Tell tox to test that artifact instead of the files from the repo. Normally, tox builds its own artifacts from source when it runs (that’s what you want when you’re testing locally). We can override this and tell it to install our pipeline’s artifact with the --installpkg flag.

First we need to modify the Test job from our pipeline:

- job: Test
  dependsOn: Build
  pool:
    vmImage: ubuntu-20.04
  steps:
  - task: UsePythonVersion@0
    displayName: Use Python 3.8
    inputs:
      versionSpec: '3.8'
  - pwsh: pip install --upgrade tox
    displayName: Install tox
  - download: current
    displayName: Download wheel
    artifact: wheel

    # This tests the version of code used to build the 'wheel' artifact, but it
    # doesn't test the artifact itself.
  - pwsh: tox -e py38
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Run tox on code

    # This tests the artifact built in the build job above.
    # https://tox.readthedocs.io/en/latest/config.html#conf-sdistsrc
  - pwsh: tox --installpkg $(Pipeline.Workspace)/wheel/app-0.0.1-py3-none-any.whl -e py38
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Run tox on artifact

We kept the old (invalid) test so we can compare it to the new test in the next run.

These two changes are needed for pretty much any build and test system. For Python and tox in this lab specifically, we also need to:

  • Recreate the Python environments between tests. In the “Run tox on code” task, tox will automatically build and install version 0.0.1 of app from the code in the repo. Unless we get rid of that, the “Run tox on artifact” task will see that version 0.0.1 of the app is already installed, so it won’t install the artifact we pass with --installpkg.
  • Change directory away from the repo root. Otherwise the test may import files from the current directory instead of the artifact we pass with --installpkg.

We can do this with two changes to tox.ini:

[tox]
envlist = py38

[testenv]
# Recreate venvs so previously-installed packages aren't importable.
# https://tox.readthedocs.io/en/latest/config.html#conf-recreate
recreate = true

# Change directory so packages in the current directory aren't importable.
# https://tox.readthedocs.io/en/latest/config.html#conf-changedir
# It's convenient to use the {toxworkdir}, but other directories work.
# https://tox.readthedocs.io/en/latest/config.html#globally-available-substitutions
changedir = {toxworkdir}

commands = python -c 'from app.main import main; main()'

The new test fails with the same ModuleNotFoundError we got when we installed the artifact manually and tried to import from it. That shows us the new test is exercising the artifact built in the Build job, not just the code that’s in the repo.

Now when there’s a bug in the build, the pipeline will fail at build time. Fixes can be engineered before release, and broken artifacts won’t go live.

Happy building!

Operating Ops


PowerShell: Sorting Version Strings

Recently we had a large array of version strings we needed to sort. Like this, but way too long to sort by hand:

$Versions = @(
    '2.1.3',
    '1.2.3',
    '1.2.12'
)

Piping this array to Sort-Object changes the order, but not correctly.

$Versions | Sort-Object
1.2.12
1.2.3
2.1.3

It thinks 1.2.12 comes before 1.2.3. Comparing character-by-character, that’s true. 1 is less than 3. We need it to interpret everything after the period as one number. Then it’ll see that 3 is less than 12.

We can do this by casting the elements of the array to version before sorting.

[version[]]$Versions | Sort-Object

Major  Minor  Build  Revision
-----  -----  -----  --------
1      2      3      -1
1      2      12     -1
2      1      3      -1

The components are parsed out and stored individually as Major, Minor, and Build. Now that we’re sending versions instead of strings to Sort-Object, it compares the 3 build to the 12 build and gets the order right.

Of course, now we have version objects instead of the strings we started with. We can convert back with the ToString() method.

[version[]]$Versions | Sort-Object | foreach {$_.ToString()}
1.2.3
1.2.12
2.1.3

That one-liner is usually all that’s needed. The main limitation is the version class. It works with up to four integer components delimited by dots. That doesn’t handle some common conventions.

Versions are often prefixed with v, like v1.2.3. Fortunately, that doesn’t change the sorting. Just trim it out.

'v1.2.3'.TrimStart('v')
1.2.3

TrimStart() removes the v from the start of the string if it’s present, otherwise it’s a no-op. It’s safe to call on a mix of prefixed and non-prefixed strings. Run it on everything and sort like before.
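
Putting that together, a minimal sketch (note it outputs plain version strings, so re-add the v prefix afterward if you need it):

[version[]]($Versions | foreach { $_.TrimStart('v') }) | Sort-Object | foreach { $_.ToString() }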

Some of the patterns defined in the ubiquitous semver allow more characters and delimiters.

  • 1.0.0-alpha.1
  • 1.0.0+21AF26D3----117B344092BD

One of these adds build metadata and semver doesn’t consider build metadata in precedence, so depending on your situation you might be able to just trim off the problem characters. If not, you’ll need a different parser.
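
For example, if build metadata is the only problem, dropping everything after the + before casting would look something like this:

'1.0.0+21AF26D3----117B344092BD'.Split('+')[0]
1.0.0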

Hope this helped!

Operating Ops


Azure Pipelines: Loops

This is about Azure YAML Pipelines, not Azure Classic Pipelines. The docs explain the differences.

Everything shown here is in the docs, but the pieces are scattered and the syntax is fiddly. It took some digging and testing to figure out the options. This article collects all the findings.

In Azure Pipelines, the matrix job strategy and the each keyword both do loop operations. This article shows how to use them in five patterns that dynamically create jobs and steps, but you’re not limited to these examples. The each keyword, especially, can be adapted to many other cases.


Jobs Created by a Hardcoded Matrix

Pipeline jobs with a matrix strategy dynamically create copies of themselves that each have different variables. This is essentially the same as looping over the matrix and creating one job for each set of those variables. Microsoft uses it for things like testing versions in parallel.

jobs:
- job: MatrixHardcoded
  pool:
    vmImage: ubuntu-20.04
  strategy:
    matrix:
      Thing1:
        thing: foo
      Thing2:
        thing: bar
  steps:
  - pwsh: Write-Output $(thing)
    displayName: Show thing

This creates MatrixHardcoded Thing1 and MatrixHardcoded Thing2 jobs that each print the value of their thing variable in a Show thing step.

Jobs Created by an Each Loop over an Array

Pipelines have an each keyword in their expression syntax that implements loops more similar to what’s in programming languages like PowerShell and Python. Microsoft has great examples of its uses in their azure-pipelines-yaml repo.

parameters:
- name: thingsArray
  type: object
  default:
  - foo
  - bar

jobs:
- ${{each thing in parameters.thingsArray}}:
  - job: EachArrayJobsThing_${{thing}}
    pool:
      vmImage: ubuntu-20.04
    steps:
    - pwsh: Write-Output ${{thing}}
      displayName: Show thing

Fiddly details:

  • The ${{ }} syntax resolves into values. Since those values are prefixed with a dash (-), YAML interprets them as elements of an array. You need that dash on both the expression line and the job definition line. This feels like it’ll create an array of arrays that each contain one job, instead of a flat array of jobs, which seems like it would break. Maybe the pipeline interprets this syntax as a flat array, maybe it handles a nested one. Either way, you need both those dashes.
  • The each line has to end with a colon (:), but references to the ${{thing}} loop variable after it don’t.
  • Parameters are different from variables. Parameters support complex types (like arrays we can loop over). Variables are always strings.
  • If you need variables in your loop code, you can reference them in the expression syntax.
  • Parameters are mostly documented in the context of templates, but they can be used directly in pipelines.

This is mostly the same as a hardcoded matrix, but it creates jobs from a parameter that can be passed in dynamically.

There are some cosmetic differences. Since we used an array of values instead of a map of keys and values, there are no ThingN keys to use in the job names. They’re differentiated with values instead (foo and bar). The delimiters are underscores because job names don’t allow spaces (we could work around this with the displayName property).
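
For example, a hedged sketch of that displayName workaround (the display name text is our own):

parameters:
- name: thingsArray
  type: object
  default:
  - foo
  - bar

jobs:
- ${{each thing in parameters.thingsArray}}:
  - job: EachArrayJobsThing_${{thing}}
    displayName: Each array jobs ${{thing}}
    pool:
      vmImage: ubuntu-20.04
    steps:
    - pwsh: Write-Output ${{thing}}
      displayName: Show thing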

We still get two jobs that each output their thing variable in a Show thing step.

Jobs Created by an Each Loop over a Map

This is the same as the previous pattern except it processes a map instead of an array.

parameters:
- name: thingsMap
  type: object
  default:
    Thing1: foo
    Thing2: bar

jobs:
- ${{each thing in parameters.thingsMap}}:
  - job: EachMapJobs${{thing.key}}
    pool:
      vmImage: ubuntu-20.04
    steps:
    - pwsh: Write-Output ${{thing.value}}
      displayName: Show thing

Since it’s processing a map, it references thing.key and thing.value instead of just thing. Again it creates two jobs with one step each.

Jobs Created by a Matrix Defined by an Each Loop over a Map

This combines the previous patterns to dynamically define a matrix using an each loop over a map parameter.

parameters:
- name: thingsMap
  type: object
  default:
    Thing1: foo
    Thing2: bar

jobs:
- job: MatrixEachMap
  pool:
    vmImage: ubuntu-20.04
  strategy:
    matrix:
      ${{each thing in parameters.thingsMap}}:
        ${{thing.key}}:
          thing: ${{thing.value}}
  steps:
  - pwsh: Write-Output $(thing)
    displayName: Show thing

Fiddly details:

  • We don’t need the YAML dashes (-) like we did in the two previous examples because we’re creating a map of config for the matrix, not an array of jobs. The ${{ }} syntax resolves to values that we want YAML to interpret as map keys, not array elements.
  • The each line still has to end with a colon (:).
  • We need a new colon (:) after ${{thing.key}} to tell YAML these are keys of a map.

This is the same as a hardcoded matrix except that its variables are populated dynamically from a map parameter.

Steps with an Each Loop over an Array

The previous patterns used loops to dynamically create multiple jobs. This statically defines one job and dynamically creates multiple steps inside of it.

parameters:
- name: thingsArray
  type: object
  default:
  - foo
  - bar

jobs:
- job: EachArraySteps
  pool:
    vmImage: ubuntu-20.04
  steps:
  - ${{each thing in parameters.thingsArray}}:
    - pwsh: Write-Output ${{thing}}
      displayName: Show thing

As expected, we get one job that contains two Show thing steps.

The differences between these patterns are syntactically small, but they give you a lot of implementation options. Hopefully these examples help you find one that works for your use case.

Happy automating!

Operating Ops


Four Guidelines for Valuable Documentation

📃 We’ve written a lot of documentation for a lot of projects. We’ve also read a lot of documentation for a lot of projects and had mixed experiences with what it taught us. Across that work, we’ve found four guidelines that make documentation easy to write and valuable to readers. Hopefully they save you some time and some frustration!

All four come from one principle:

Documentation exists to help users with generic experience learn your specific system.

Generic experience is a prerequisite. Documentation isn’t a substitute for knowing the basics of the tooling your project uses, it’s a quick way for knowledgeable readers to learn the specific ways your project uses those tools.

Don’t Write Click-by-Click Instructions

❌ This is way too much detail:

  1. Go to https://console.aws.amazon.com/cloudwatch/home
  2. Click Log Groups on the left
  3. Type “widgets-dev-async-processor” in the search box
  4. Click the magnifying glass icon
  5. Find the “widgets-dev-async-processor” in the search results
  6. Click “widgets-dev-async-processor”
  7. Click the first stream in the list
  8. Read the log entries

It’s frustratingly tedious for experienced users. Users who are so new that they need this level of detail are unlikely to get much from the logs it helps them find.

This will also go out of date as soon as the CloudWatch UI changes. You won’t always notice when it changes, and even if you do it’s easy to forget to update your docs.

Use simple text directions instead:

Open the widgets-dev-async-processor Log Group in the AWS CloudWatch web console.

That’s easy to read, tells the reader what they need and where to find it, and won’t go out of date until you change how your logs are stored.

Limit Use of Screenshots

🔍 Searches can’t see into images, so anything captured in a screenshot won’t show up in search results. Similarly, readers can’t copy/paste from images.

Also, like click-by-click instructions, screenshots are tedious for experienced readers, they don’t help new users understand the system, and they’re impractical to keep up to date.

Most of the time, simple text directions like the ones given above are more usable.

Link Instead of Duplicating

Duplicated docs always diverge. Here’s a common example:

Infrastructure code and application code live in different repos. Engineers of both need to export AWS credentials into their environment variables. Infra engineers need them to run terraform, app engineers need them to query DynamoDB tables. Trying to make it easy for everybody to find what they need, someone documents the steps in each repo. Later, the way users get their credentials changes. The engineer making that change only works on terraform and rarely uses the app repo. They forget to update its instructions. A new engineer joins the app team, follows those (outdated) instructions, and gets access errors. There’s churn while they diagnose.

It’s better to document the steps in one repo and link 🔗 to those steps from the other. Then, everyone is looking at the same document, not just the same steps. It’s easy to update all docs because there’s only one doc. Readers know they’re looking at the most current doc because there’s only one doc.

This is also true for upstream docs. For example, if it’s already covered in HashiCorp’s excellent terraform documentation, just link to it. A copy will go out of date. Always link to the specific sections of pages that cover the details your readers need. Don’t send them to the header page and force them to search.

Keep a Small Set of Accurate Documentation

If you write too many docs, they’ll eventually rot. You’ll forget to update some. You won’t have time to update others. Users will read those docs and do the wrong thing. Errors are inevitable. It’s better to have a small set of accurate docs than a large set of questionable ones. Only write as many docs as it’s practical to maintain.

Writing docs can be a lot of work, and sometimes they cause more errors than they prevent. Hopefully, these guidelines will make your docs easier to write and more valuable to your readers.

Happy documenting!

Operating Ops
