Passing Parameters to Docker Builds

Hello!

When I’m building Docker images, sometimes I need to pass data from the build agent (e.g. my CI pipeline) into the build process. Often, I also want to echo that data into the logs so I can use it for troubleshooting or validation later. Docker supports this!

These examples were all tested in Docker for Mac:

docker --version
Docker version 19.03.13, build 4484c46d9d

First, declare your build-time data as an ARG in your Dockerfile:

FROM alpine:3.7
 
ARG USEFUL_INFORMATION
ENV USEFUL_INFORMATION=$USEFUL_INFORMATION
RUN echo "Useful information: $USEFUL_INFORMATION"

In this example, I’ve also set an ENV variable so I can RUN a command to print out the new ARG.

Now, just build like usual:

docker build --tag test_build_args --build-arg USEFUL_INFORMATION=1337 .
Sending build context to Docker daemon  10.24kB
Step 1/4 : FROM alpine:3.7
 ---> 6d1ef012b567
Step 2/4 : ARG USEFUL_INFORMATION
 ---> Using cache
 ---> 18d20c437445
Step 3/4 : ENV USEFUL_INFORMATION=$USEFUL_INFORMATION
 ---> Using cache
 ---> b8bbdd03a1d1
Step 4/4 : RUN echo "Useful information: $USEFUL_INFORMATION"
 ---> Running in a2161bfb75cd
Useful information: 1337
Removing intermediate container a2161bfb75cd
 ---> 9ca56256cc19
Successfully built 9ca56256cc19
Successfully tagged test_build_args:latest

If you don’t pass in a value for the new ARG, it resolves to an empty string:

docker build --tag test_build_args .
Sending build context to Docker daemon  10.24kB
Step 1/4 : FROM alpine:3.7
 ---> 6d1ef012b567
Step 2/4 : ARG USEFUL_INFORMATION
 ---> Using cache
 ---> 18d20c437445
Step 3/4 : ENV USEFUL_INFORMATION=$USEFUL_INFORMATION
 ---> Running in 63e4b0ce1fb7
Removing intermediate container 63e4b0ce1fb7
 ---> 919769a93b7d
Step 4/4 : RUN echo "Useful information: $USEFUL_INFORMATION"
 ---> Running in 73e158d1bfa6
Useful information:
Removing intermediate container 73e158d1bfa6
 ---> f928fc025270
Successfully built f928fc025270
Successfully tagged test_build_args:latest
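One more detail: if you'd rather have a fallback than an empty string, ARG supports declaring a default value right in the Dockerfile. A small sketch (the "unknown" default here is just an example):

```dockerfile
# Used when the build doesn't pass --build-arg USEFUL_INFORMATION=...
ARG USEFUL_INFORMATION=unknown
```

Values passed with --build-arg still override the default.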

That’s it! Happy building,

Adam

Need more than just this article? We’re available to consult.

You might also want to check out these related articles:

PowerShell Scripts with Arguments

Hello!

I write a lot of utility scripts. Little helpers to automate repetitive work. Like going through all those YAML files and updating that one config item, or reading through all those database entries and finding the two that are messed up because of that weird bug I just found.

These scripts are usually small. I often don’t keep them very long. I also usually have to run them against multiple environments, and sometimes I have to hand them to other engineers. They need to behave predictably everywhere, and they need to be easy to read and run. They can’t be hacks that only I can use.

In my work, that means a script that takes arguments and passes them to internal functions that implement whatever I’m trying to do. Let’s say I need to find a thing with a known index, then reset it. Here’s the pattern I use in PowerShell:

[CmdletBinding()]
param(
    [int]$Index
)

function Get-Thing {
    [CmdletBinding()]
    param(
        [int]$Index
    )
    return "Thing$Index"
}

function Reset-Thing {
    [CmdletBinding()]
    param(
        [string]$Thing
    )
    # We'd do the reset here if this were a real script.
    Write-Verbose "Reset $Thing"
}

$Thing = Get-Thing -Index $Index
Reset-Thing -Thing $Thing

We can run that from a prompt with the Index argument:

./Reset-Thing.ps1 -Index 12 -Verbose
VERBOSE: Reset Thing12

Some details:

  • The param() call for the script has to be at the top. Posh throws errors if you put it down where the functions are invoked.
  • CmdletBinding() makes the script and its functions handle standard arguments like -Verbose. More details here.
  • This uses Write-Verbose to send informative output to the verbose “stream”. This is similar to setting the log level of a Python script to INFO. It allows the operator to select how much output they want to see. More details here.
  • As always, use verbs from Get-Verb when you’re naming things.
  • I could have written this with just straight commands instead of splitting them into Get and Reset functions, especially for an example this small, but it’s almost always better to separate out distinct pieces of logic. It’ll be easier to read if I have to hand it to someone else who’s not familiar with the operation. Same if I have to put it aside for a while and come back to it after I’ve forgotten how it works.

This is my starting point when I’m writing a helper script. It’s usually enough to let me sanely automate a one-off without getting derailed into full-scale application development.

Happy scripting,

Adam

PowerShell on OS X: Git Hooks

Hello!

PowerShell works great on Mac OS X. It’s my default shell. I usually only do things the Posh way, but sometimes the underlying system bubbles back up. Like when I’m writing git hooks.

In Posh, git hooks live in the same place and still have to be executable on your platform. That doesn’t change. But, the scripts themselves can be different. You have two options.

Option 1: Don’t Use PowerShell

Your existing hooks written in bash or zsh or whatever Linux-ey shell you were using will still work. That’s great if you already have a bunch and you don’t want to port them all.

If you’re writing anything new, though, use PowerShell. When I get into a mess on my Posh Apple, it’s usually because I mixed PowerShell with the legacy shell. You’re better off using just one.

Option 2: Update the Shebang

The shebang (#!) is the first line of executable scripts on Unix-like systems. It sets the program that’s used to run the script. We just need to write one in our hook script that points at pwsh (the PowerShell executable):

#!/usr/local/microsoft/powershell/7/pwsh
 
Write-Verbose -Verbose "We're about to commit!"

If you don’t have the path to your pwsh, you can find it with Get-Command pwsh.

After that, our hook works like normal:

git commit --allow-empty -m "Example commit."
VERBOSE: We're about to commit!
[master a905079] Example commit.

If you don’t set the shebang at all (leaving nothing but the Write-Verbose command in our example), your hook will run but OS X won’t treat it like PowerShell. You get “not found” errors:

git commit --allow-empty -m "Example commit."
.git/hooks/pre-commit: line 2: Write-Verbose: command not found
[master 1b2ebac] Example commit.

That’s actually good. If you have old hook scripts without shebang lines, they won’t break. Just make sure any new Posh scripts do have a shebang and everything should work.

Enjoy the Posh life!

Adam

A Checklist for Submitting Pull Requests

Hello!

Reviewing code is hard, especially because reviewers tend to inherit some responsibility for problems the code causes later. That can lead to churn while they try to develop confidence that new submissions are ready to merge.

I submit a lot of code for review, so I’ve been through a lot of that churn. Over the years I’ve found a few things that help make it easier for my reviewers to develop confidence in my submissions, so I decided to write a checklist. ✔️

The code I write lives in diverse repos governed by diverse requirements. A lot of the items in my checklist are there to help make sure I don’t mix up the issues I’m working on or the requirements of the repos I’m working in.

This isn’t a guide on writing good code. You can spend a lifetime on that topic. This is a quick checklist I use to avoid common mistakes.

This is written for Pull Requests submitted in git repos hosted on GitHub, but most of its steps are portable to other platforms (e.g. Perforce). It assumes common project features, like a contributing guide. Adjust as needed.

The Checklist

Immediately before submitting:

  1. Reread the issue.
  2. Merge the latest changes from the target branch (e.g. master).
  3. Reread the diff line by line.
  4. Rerun all tests. If the project doesn’t have automated tests, you can still:
    • Run static analysis tools on every file you changed.
    • Manually exercise new functionality.
    • Manually exercise existing functionality to make sure it hasn’t changed.
  5. Check if any documentation needs to be updated to reflect your changes.
  6. Check the rendering of any markup files (e.g. README.md) in the GitHub UI.
    • There are remarkable differences in how markup files render on different platforms, so it’s important to check them in the UI where they’ll live.
  7. Reread the project’s contributing guide.
  8. Write a description that:
    1. Links to the issue it addresses.
    2. Gives a plain English summary of the change.
    3. Explains decisions you had to make. Like:
      • Why you didn’t clean up that one piece of messy code.
      • How you chose the libraries you used.
      • Why you expanded an existing module instead of writing a new one.
      • How you chose the directory and file names you did.
      • Why you put your changes in this repo, instead of that other one.
    4. Lists all the tests you ran. Include relevant output or screenshots from manual tests.

There’s no perfect way to submit code for review. That’s why we still need humans to do it. The creativity and diligence of the engineer doing the work are more important than this checklist. Still, I’ve found that these reminders help me get code through review more easily.

Happy contributing!

Adam

How to Grep in PowerShell

Hello!

In oldschool Linux shells, you search files for a string with grep. You’re probably used to commands like this (example results from an OSS repo):

grep -r things .
./terraform.tfstate.backup:              "./count_things.py"
./count_things.py:def count_things(query):
./count_things.py:    count_things()
./terraform.tf:  program = ["python", "${path.module}/count_things.py"]

It outputs strings that concatenate the filename and the matching line. You can pipe those into awk or whatever other command to process them. Standard stuff.

You can achieve the same results in PowerShell, but it’s pretty different. Here’s the basic command:

Get-ChildItem -Recurse | Select-String 'things'
 
count_things.py:7:def count_things(query):
count_things.py:17:    count_things()
terraform.tf:6:  program = ["python", "${path.module}/count_things.py"]
terraform.tfstate.backup:25:              "./count_things.py"

This part is similar. Get-ChildItem recurses through the filesystem and passes the results to Select-String, which searches those files for the string things. The output looks the same. File on the left, matching line on the right. That’s just friendly formatting, though. Really what you’re getting is an array of objects that each represent one match. Posh summarizes that array with formatting that’s familiar, but actually processing these results is completely different.

We could parse out details the Linux way by piping into Out-String to convert the results into strings, splitting on :, and so on, but that’s not idiomatic PowerShell. Posh is object-oriented, so instead of manipulating strings we can just process whichever properties contain the information we’re searching for.

First, we need to know what properties are available:

Get-ChildItem -Recurse | Select-String 'things' | Get-Member
 
   TypeName: Microsoft.PowerShell.Commands.MatchInfo
 
Name               MemberType Definition
----               ---------- ----------
Equals             Method     bool Equals(System.Object obj)
GetHashCode        Method     int GetHashCode()
GetType            Method     type GetType()
RelativePath       Method     string RelativePath(string directory)
ToEmphasizedString Method     string ToEmphasizedString(string directory)
ToString           Method     string ToString(), string ToString(string directory)
Context            Property   Microsoft.PowerShell.Commands.MatchInfoContext Context {get;set;}
Filename           Property   string Filename {get;}
IgnoreCase         Property   bool IgnoreCase {get;set;}
Line               Property   string Line {get;set;}
LineNumber         Property   int LineNumber {get;set;}
Matches            Property   System.Text.RegularExpressions.Match[] Matches {get;set;}
Path               Property   string Path {get;set;}
Pattern            Property   string Pattern {get;set;}

Get-Member tells us the properties of the MatchInfo objects we piped into it. Now we can process them however we need.

Select One Property

If we only want the matched lines, not all the other info, we can select just the Line property with Select-Object.

Get-ChildItem -Recurse | Select-String 'things' | Select-Object 'Line'
 
Line
----
def count_things(query):
    count_things()
  program = ["python", "${path.module}/count_things.py"]
              "./count_things.py"

Sort Results

We can sort results by the content of a property with Sort-Object.

Get-ChildItem -Recurse | Select-String 'things' | Sort-Object -Property 'Line'
 
terraform.tfstate.backup:25:              "./count_things.py"
count_things.py:17:    count_things()
terraform.tf:6:  program = ["python", "${path.module}/count_things.py"]
count_things.py:7:def count_things(query):

Add More Filters

Often, I search for a basic pattern like ‘things’ and then chain in Where-Object to filter down to more specific results. It can be easier to chain matches as I go than to write a complex match pattern at the start.

Get-ChildItem -Recurse | Select-String 'things' | Where-Object 'Line' -Match 'def'
 
count_things.py:7:def count_things(query):

We’re not limited to filters on the matched text, either:

Get-ChildItem -Recurse | Select-String 'things' | Where-Object 'Filename' -Match 'terraform'
 
terraform.tf:6:  program = ["python", "${path.module}/count_things.py"]
terraform.tfstate.backup:25:              "./count_things.py"

There are tons of things you can do. The main detail to remember is that you need Get-Member to tell you what properties are available, then you can use any Posh command to process those properties.

Enjoy freedom from strings!

Adam

Tox: Testing Multiple Python Versions with Pyenv

Hello!

I use Python’s tox to orchestrate a lot of my tests. It lets you set a list of versions in a tox.ini file (in the same directory as your setup.py), like this:

[tox]
envlist = py37, py38
 
[testenv]
allowlist_externals = echo
commands = echo "success"

Then, when you run the tox command, it creates a venv for each version and runs your tests in each of those environments. It’s an easy way to ensure your code works across all the versions of Python you want to support.
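In a real project, commands usually runs a test suite instead of echo. A sketch of what that might look like, assuming pytest and a tests/ directory (both placeholders, not part of the demo below):

```ini
[tox]
envlist = py37, py38

[testenv]
deps = pytest
commands = pytest tests/
```

The demo in this article sticks with echo so the output stays short.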

But, if I install tox into a 3.8 environment and run the tox command in the directory where we created the tox.ini above, I get this:

tox
GLOB sdist-make: /Users/adam/Local/fiddle/setup.py
py37 create: /Users/adam/Local/fiddle/.tox/py37
ERROR: InterpreterNotFound: python3.7
py38 create: /Users/adam/Local/fiddle/.tox/py38
py38 inst: /Users/adam/Local/fiddle/.tox/.tmp/package/1/example-0.0.0.zip
py38 installed: example @ file:///Users/adam/Local/fiddle/.tox/.tmp/package/1/example-0.0.0.zip
py38 run-test-pre: PYTHONHASHSEED='2325607949'
py38 run-test: commands[0] | echo success
success
___________________________________________________________________________ summary ____________________________________________________________________________
ERROR:  py37: InterpreterNotFound: python3.7
  py38: commands succeeded

It found the 3.8 interpreter I ran it with, but it couldn’t find 3.7.

pyenv can get you past this. It’s a utility for installing and switching between multiple Python versions. I use it on OS X (⬅️ instructions to get set up, if you’re not already). Here’s how it looks when I have Python 3.6, 3.7, and 3.8 installed, and I’m using 3.8:

pyenv versions
  system
  3.6.11
  3.7.9
* 3.8.5 (set by /Users/adam/.pyenv/version)

Just having those versions installed isn’t enough, though. You still get the error from tox about missing versions. You have to specifically enable each version:

pyenv local 3.8.5 3.7.9
pyenv versions
  system
  3.6.11
* 3.7.9 (set by /Users/adam/Local/fiddle/.python-version)
* 3.8.5 (set by /Users/adam/Local/fiddle/.python-version)

This will create a .python-version file in the current directory that sets your Python versions. pyenv will read that file whenever you’re in that directory. You can also set versions that’ll be picked up in any folder with the pyenv global command.

Now, tox will pick up both versions:

tox
GLOB sdist-make: /Users/adam/Local/fiddle/setup.py
py37 inst-nodeps: /Users/adam/Local/fiddle/.tox/.tmp/package/1/example-0.0.0.zip
py37 installed: example @ file:///Users/adam/Local/fiddle/.tox/.tmp/package/1/example-0.0.0.zip
py37 run-test-pre: PYTHONHASHSEED='1664367937'
py37 run-test: commands[0] | echo success
success
py38 inst-nodeps: /Users/adam/Local/fiddle/.tox/.tmp/package/1/example-0.0.0.zip
py38 installed: example @ file:///Users/adam/Local/fiddle/.tox/.tmp/package/1/example-0.0.0.zip
py38 run-test-pre: PYTHONHASHSEED='1664367937'
py38 run-test: commands[0] | echo success
success
___________________________________________________________________________ summary ____________________________________________________________________________
  py37: commands succeeded
  py38: commands succeeded
  congratulations :)

That’s it! Now you can run your tests in as many versions of Python as you need.

Happy testing,

Adam

PowerShell on OS X: Setting Your Path Variable

Hello!

There are two tricky little problems when setting your path variable in PowerShell. Here’s how to get past them.

First, lots of guides show things like this:

$Env:Path += "PATH_STRING"

That works on Windows but won’t work on OS X. The variable name has to be all-caps:

$Env:PATH += "PATH_STRING"

Next, the separator between path elements on Windows is ;, but on OS X it’s :. Swap them and you should be good to go:

# Windows-only, won't work:
# $Env:PATH += ";/Users/adam/opt/bin"
 
# Works on OS X:
$Env:PATH += ":/Users/adam/opt/bin"

Small details, but they were remarkably fiddly to figure out the first time I ran into them. Lots of people use Posh on Windows, so lots of guides and docs won’t work on Mac. You may find similar compatibility problems in scripts, too. Hopefully this saves you from some frustration.

Happy scripting,

Adam

PowerShell: Python venv Missing Activate.ps1

Hello!

Ran into a weird problem this week: I created a Python 3.7.9 venv, but I couldn’t activate it in PowerShell (my shell of choice). The Activate.ps1 script was missing.

The core docs for 3.7 list VENV/Scripts/Activate.ps1 as the command to activate venvs in PowerShell (which seemed odd because I’m used to VENV/bin/activate from Bash, but whatever). The Scripts directory didn’t even exist:

gci ./test_venv_379/
 
    Directory: /Users/adam/Local/fiddle/test_venv_379
 
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----          10/22/2020  9:28 AM                bin
d----          10/22/2020  9:28 AM                include
d----          10/22/2020  9:28 AM                lib
-----          10/22/2020  9:28 AM             98 pyvenv.cfg

I recreated the venv and got the same results. I made new venvs with 3.7.8 and 3.6.11, and again the same results. When I made a 3.8.5 venv, though, it had a VENV/bin/Activate.ps1 (which works great).

gci ./test_venv_385/bin
 
    Directory: /Users/adam/Local/fiddle/test_venv_385/bin
 
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----          10/22/2020  9:13 AM           2236 activate
-----          10/22/2020  9:13 AM           1288 activate.csh
-----          10/22/2020  9:13 AM           2440 activate.fish
-----          10/22/2020  9:13 AM           8834 Activate.ps1
-----          10/22/2020  9:13 AM            263 easy_install
...

Then I read the docs for 3.8: VENV/Scripts/Activate.ps1 is the PowerShell activation script, but VENV/bin/Activate.ps1 is the PowerShell Core activation script. The 3.7 and 3.6 docs don’t make this distinction, which I’d bet is because PowerShell Core wasn’t supported until 3.8. I’m running Posh on Mac, so of course I’m running Posh Core (only Core supports Mac and Linux).

I suspect the VENV/Scripts/Activate.ps1 file was missing from both venvs because Python detected my shell was Posh Core, which it didn’t support. That would also explain why my 3.8 venv only had a VENV/bin/Activate.ps1 file, the file needed by Posh Core.

Anyway, if you upgrade to 3.8 (I used 3.8.5) you should be good to go.

If you can’t upgrade: upgrade! But if you really, really can’t, you can still use a 3.7 venv in Posh Core. Just call the executables by path instead of activating:

./test_venv_379/bin/python --version
Python 3.7.9

Hope that gets you past the problem!

Adam

Terratest Good Practices: Table-Driven Tests

Hello!

Terratest is a common way to run integration tests against terraform modules. I use it on many of the modules I develop. If you haven’t used it before, check out its quickstart for an example of how it works.

For simple cases, the pattern in that quickstart is all you need. But, bigger modules mean more tests and pretty soon you can end up swimming in all the cases you have to define. Go has a tool to help: table-driven tests. Here’s what you need to get them set up for terratest (Dave Cheney also has a great article on them if you want to go deeper).

First, let’s look at a couple simple tests that aren’t table-driven:

package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestOutputsExample(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: ".",
	}
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	one := terraform.Output(t, terraformOptions, "One")
	assert.Equal(t, "First.", one)
	two := terraform.Output(t, terraformOptions, "Two")
	assert.Equal(t, "Second.", two)
}

Easy. Just repeat the calls to terraform.Output and assert.Equal for each test and assert it’s what you expect. Not a problem, unless you have dozens or hundreds of tests. Then you end up with a lot of duplication.

You can de-duplicate the repeated calls by defining your test cases in a slice of structs (the “table”) and then looping over the cases, like this:

package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestOutputsTableDrivenExample(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: ".",
	}
	defer terraform.Destroy(t, terraformOptions)
	terraform.InitAndApply(t, terraformOptions)

	outputTests := []struct {
		outputName    string
		expectedValue string
	}{
		{"One", "First."},
		{"Two", "Second."},
	}

	for _, testCase := range outputTests {
		outputValue := terraform.Output(t, terraformOptions, testCase.outputName)
		assert.Equal(t, testCase.expectedValue, outputValue)
	}
}

Now, there’s just one statement each for terraform.Output and assert.Equal. With only two tests it actually takes a bit more code to use a table, but once you have a lot of tests it’ll save you.

That’s it! That’s all table-driven tests are. Just a routine practice in Go that works as well in terratest as anywhere.

Happy testing,

Adam

CloudWatch JSON Logs: aws_request_id

Hello!

In a previous article, I showed how to make AWS lambda functions log JSON objects to CloudWatch (⬅️ start there if you’re new to JSON logs). The pattern in that post had a flaw: it didn’t pass the aws_request_id. I was writing small functions to glue together bits of deployment automation and I didn’t need it. It was easy to correlate logs just with timestamps. Not every case is that simple. Sometimes you need that ID. Fortunately, Paulie Pena IV gave me some tips on how to pass it through.

We can look up the aws_request_id in the lambda function’s context object. To add it to log events, the python-json-logger library supports custom fields. To add a field for all log events, we need to subclass jsonlogger.JsonFormatter. Then we can save the ID into the new field and resolve it in the format string. Here’s the code from before with that change:

import logging
from pythonjsonlogger import jsonlogger
 
class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def __init__(self, *args, **kwargs):
        self.aws_request_id = kwargs.pop('aws_request_id')
        super().__init__(*args, **kwargs)
    def add_fields(self, log_record, record, message_dict):
        super().add_fields(log_record, record, message_dict)
        log_record['aws_request_id'] = self.aws_request_id
 
def setup_logging(log_level, aws_request_id):
    logger = logging.getLogger()
 
    # Testing showed lambda sets up one default handler. If there are more,
    # something has changed and we want to fail so an operator can investigate.
    assert len(logger.handlers) == 1
 
    logger.setLevel(log_level)
    json_handler = logging.StreamHandler()
    formatter = CustomJsonFormatter(
        fmt='%(aws_request_id)s %(asctime)s %(levelname)s %(name)s %(message)s',
        aws_request_id=aws_request_id
    )
    json_handler.setFormatter(formatter)
    logger.addHandler(json_handler)
    logger.removeHandler(logger.handlers[0])
 
def lambda_handler(event, context):
    setup_logging(logging.DEBUG, context.aws_request_id)
    logger = logging.getLogger()
    logger.info('Huzzah!')

Now our logs contain the aws_request_id:

[Screenshot from the original post: a CloudWatch log event with the aws_request_id field populated.]
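The original post shows the result in a screenshot. As a stand-in, here’s a stdlib-only sketch of the same idea, with the JSON encoding hand-rolled so it runs without python-json-logger (the field set and the 'example-request-id' value are illustrative, not the real lambda output):

```python
import json
import logging

class RequestIdJsonFormatter(logging.Formatter):
    """Stdlib-only stand-in for the CustomJsonFormatter above."""
    def __init__(self, aws_request_id):
        super().__init__()
        self.aws_request_id = aws_request_id

    def format(self, record):
        # Emit one JSON object per log event, with the request ID injected.
        return json.dumps({
            'aws_request_id': self.aws_request_id,
            'levelname': record.levelname,
            'name': record.name,
            'message': record.getMessage(),
        })

logger = logging.getLogger('demo')
handler = logging.StreamHandler()
handler.setFormatter(RequestIdJsonFormatter('example-request-id'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info('Huzzah!')
# Emits something like:
# {"aws_request_id": "example-request-id", "levelname": "INFO", "name": "demo", "message": "Huzzah!"}
```

Same principle as the lambda version: subclass a formatter, stash the ID at construction time, and add it to every event.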

Hope that helps,

Adam
