Testing Azure Pipeline Artifacts

Azure Pipelines supports several types of artifacts. This is about Pipeline Artifacts, the ones managed with the publish and download tasks.

Any pipeline that builds code should do at least these two things:

  • Build an artifact that can be deployed later.
  • Test that artifact.

Specifically, it should test the artifact itself: not just the same version of application code used to build it, but the actual artifact file that was built. If it only tests the code, it won’t detect bugs in how that code was built.

This is a case where more errors at build time are better. If we hide errors at build time, we’ll have to diagnose them at deploy time. That could mean release failures, and maybe outages. As the Zen of Python says:

Errors should never pass silently.

Testing the artifact itself is a best practice for any build pipeline. It’s better to find out right away that the code wasn’t built correctly.

First we’ll create a pipeline that tests its code but not the artifact it builds. We’ll include an artificial bug in the build process and show that the pipeline passes its tests but still creates a broken artifact. Then we’ll rework the pipeline to point the tests at the artifact, so the build bug gets caught by the tests and becomes visible.

These examples use Python’s tox, but the principles are the same for any tooling. Tox is a testing tool that creates isolated Python environments, installs the app being tested into those environments, and runs test commands.

Setup

First we need a Python package. We’ll make a super-simple one called app:

.
├── app
│   ├── __init__.py
│   └── main.py
├── setup.py
└── tox.ini

app is a single package with an empty __init__.py. The package contains one main.py module that defines one function:

def main():
    print('Success!')

setup.py contains config that lets us build app into a Python wheel file (an artifact that can be installed into a Python environment):

from setuptools import setup

setup(
    author='Operating Ops, LLC',
    license='MIT',
    description='Demo app.',
    name='app',
    packages=['app'],
    version='0.0.1'
)

tox.ini tells tox to call our main() function:

[tox]
envlist = py38

[testenv]
commands = python -c 'from app.main import main; main()'

That’s not a real test, but it’ll be enough to show the difference between exercising source code and built artifacts. A real project would use the unittest library or pytest or another tool here.

This test passes locally:

(env3) PS /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts> tox -e py38
GLOB sdist-make: /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/setup.py
py38 recreate: /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/.tox/py38
py38 inst: /Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/.tox/.tmp/package/1/app-0.0.1.zip
py38 installed: app @ file:///Users/adam/Local/laboratory/pipelines/testing_pipeline_artifacts/.tox/.tmp/package/1/app-0.0.1.zip
py38 run-test-pre: PYTHONHASHSEED='3356214888'
py38 run-test: commands[0] | python -c 'from app.main import main; main()'
Success!
___________________________________________________________________________ summary ____________________________________________________________________________
  py38: commands succeeded
  congratulations :)

Negative Case

Our code works locally; now we need a build pipeline to make an artifact we can deploy. We’ll start with the negative case, a broken build that still passes tests:

jobs:
- job: Build
  pool:
    vmImage: ubuntu-20.04
  workspace:
    clean: outputs
  steps:
  - task: UsePythonVersion@0
    displayName: Use Python 3.8
    inputs:
      versionSpec: '3.8'
  - pwsh: pip install --upgrade pip setuptools wheel
    displayName: Install build tools
  - pwsh: Remove-Item app/main.py
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: BUILD BUG
  - pwsh: python setup.py bdist_wheel --dist-dir $(Build.BinariesDirectory)
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Build wheel
  - publish: $(Build.BinariesDirectory)/app-0.0.1-py3-none-any.whl
    displayName: Publish wheel
    artifact: wheel

- job: Test
  dependsOn: Build
  pool:
    vmImage: ubuntu-20.04
  steps:
  - task: UsePythonVersion@0
    displayName: Use Python 3.8
    inputs:
      versionSpec: '3.8'
  - pwsh: pip install --upgrade tox
    displayName: Install tox

    # This tests the version of code used to build the 'wheel' artifact, but it
    # doesn't test the artifact itself.
  - pwsh: tox -e py38
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Run tox on code

The Build and Test jobs both succeed even though the BUILD BUG task ran. An artifact is published to the pipeline, and we can download it. But if we install it and try to import from app, we get errors:

(env3) PS /Users/adam/Downloads> pip install ./app-0.0.1-py3-none-any.whl
Processing ./app-0.0.1-py3-none-any.whl
Installing collected packages: app
Successfully installed app-0.0.1
(env3) PS /Users/adam/Downloads> python
Python 3.8.3 (default, Jul  1 2020, 07:50:15) 
[Clang 11.0.0 (clang-1100.0.33.16)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from app.main import main
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'app.main'

It couldn’t find app.main because that module doesn’t exist. Our BUILD BUG task deleted it before the artifact was built. Later, the Test job checked out a fresh copy of the code, which included a fresh copy of the file we accidentally deleted in the Build job. Tox ran in that fresh environment and passed because all the files it needed were present. It was testing the code from the repo, not the artifact we built.

The Fix

If the artifact created by the pipeline doesn’t work, the pipeline should fail. To make tox test the app-0.0.1-py3-none-any.whl file built in the Build job, we need to do two things:

  • Download the artifact in the Test job.
  • Tell tox to test that artifact instead of the files from the repo. Normally, tox builds its own artifacts from source when it runs (that’s what you want when you’re testing locally). We can override this and tell it to install our pipeline’s artifact with the --installpkg flag.

First we need to modify the Test job from our pipeline:

- job: Test
  dependsOn: Build
  pool:
    vmImage: ubuntu-20.04
  steps:
  - task: UsePythonVersion@0
    displayName: Use Python 3.8
    inputs:
      versionSpec: '3.8'
  - pwsh: pip install --upgrade tox
    displayName: Install tox
  - download: current
    displayName: Download wheel
    artifact: wheel

    # This tests the version of code used to build the 'wheel' artifact, but it
    # doesn't test the artifact itself.
  - pwsh: tox -e py38
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Run tox on code

    # This tests the artifact built in the build job above.
    # https://tox.readthedocs.io/en/latest/config.html#conf-sdistsrc
  - pwsh: tox --installpkg $(Pipeline.Workspace)/wheel/app-0.0.1-py3-none-any.whl -e py38
    workingDirectory: $(Build.SourcesDirectory)/pipelines/testing_pipeline_artifacts
    displayName: Run tox on artifact

We kept the old (invalid) test so we can compare it to the new test in the next run.

These two changes are needed for pretty much any build and test system. For Python and tox in this lab specifically, we also need to:

  • Recreate the Python environments between tests. In the “Run tox on code” task, tox will automatically build and install version 0.0.1 of app from the code in the repo. Unless we get rid of that, the “Run tox on artifact” task will see that version 0.0.1 of the app is already installed, so it won’t install the artifact we pass with --installpkg.
  • Change directory away from the repo root. Otherwise the test may import files from the current directory instead of the artifact we pass with --installpkg.

We can do this with two changes to tox.ini:

[tox]
envlist = py38

[testenv]
# Recreate venvs so previously-installed packages aren't importable.
# https://tox.readthedocs.io/en/latest/config.html#conf-recreate
recreate = true

# Change directory so packages in the current directory aren't importable.
# https://tox.readthedocs.io/en/latest/config.html#conf-changedir
# It's convenient to use the {toxworkdir}, but other directories work.
# https://tox.readthedocs.io/en/latest/config.html#globally-available-substitutions
changedir = {toxworkdir}

commands = python -c 'from app.main import main; main()'

The new test fails with the same ModuleNotFoundError we got when we installed the artifact manually and imported from it. That shows the new test is exercising the artifact built in the Build job, not just the code that’s in the repo.

Now when there’s a bug in the build, the pipeline will fail at build time. Fixes can be engineered before release, and broken artifacts won’t go live.

Happy building!

Operating Ops


PowerShell: Sorting Version Strings

Recently we had a large array of version strings we needed to sort. Like this, but way too long to sort by hand:

$Versions = @(
    '2.1.3',
    '1.2.3',
    '1.2.12'
)

Piping this array to Sort-Object changes the order, but not correctly.

$Versions | Sort-Object
1.2.12
1.2.3
2.1.3

It thinks 1.2.12 comes before 1.2.3. Compared character by character, that’s true: in the third component, the character 1 is less than 3. We need it to treat everything after the second period as one number. Then it’ll see that 3 is less than 12.

We can do this by casting the elements of the array to version before sorting.

[version[]]$Versions | Sort-Object

Major  Minor  Build  Revision
-----  -----  -----  --------
1      2      3      -1
1      2      12     -1
2      1      3      -1

The components are parsed out and stored individually as Major, Minor, and Build. Now that we’re sending versions instead of strings to Sort-Object, it compares the 3 build to the 12 build and gets the order right.

Of course, now we have version objects instead of the strings we started with. We can convert back with the ToString() method.

[version[]]$Versions | Sort-Object | foreach {$_.ToString()}
1.2.3
1.2.12
2.1.3

That one-liner is usually all that’s needed. The main limitation is the version class. It works with up to four integer components delimited by dots. That doesn’t handle some common conventions.

Versions are often prefixed with v, like v1.2.3. The version class can’t parse that prefix, but it doesn’t carry any ordering information, so just trim it off.

'v1.2.3'.TrimStart('v')
1.2.3

TrimStart() removes the v from the start of the string if it’s present, otherwise it’s a no-op. It’s safe to call on a mix of prefixed and non-prefixed strings. Run it on everything and sort like before.
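For example, here’s a minimal sketch that sorts a mix of prefixed and non-prefixed strings (the $Mixed array is just sample data):

# Trim any leading 'v', cast to [version], sort, then convert back to strings.
$Mixed = @(
    'v2.1.3',
    '1.2.12',
    'v1.2.3'
)
[version[]]($Mixed | foreach {$_.TrimStart('v')}) | Sort-Object | foreach {$_.ToString()}
1.2.3
1.2.12
2.1.3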

Some of the patterns defined in the ubiquitous semver allow more characters and delimiters.

  • 1.0.0-alpha.1
  • 1.0.0+21AF26D3----117B344092BD

The second one only adds build metadata, which semver doesn’t consider in precedence, so depending on your situation you might be able to just trim off everything after the +. If not (pre-release tags like -alpha.1 do affect precedence), you’ll need a different parser.
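If build metadata is the only extra piece, a sketch like this may be all you need (the $WithMetadata variable is just sample data; everything after the + is dropped before casting):

# Build metadata doesn't affect precedence, so drop it and cast what's left.
$WithMetadata = '1.0.0+21AF26D3'
$Trimmed = ($WithMetadata -split '\+')[0]
[version]$Trimmed

Major  Minor  Build  Revision
-----  -----  -----  --------
1      0      0      -1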

Hope this helped!

Operating Ops


Azure Pipelines: Loops

This is about Azure YAML Pipelines, not Azure Classic Pipelines. The docs explain the differences.

Everything shown here is in the docs, but the pieces are scattered and the syntax is fiddly. It took some digging and testing to figure out the options. This article collects all the findings.

In Azure Pipelines, the matrix job strategy and the each keyword both do loop operations. This article shows how to use them in five patterns that dynamically create jobs and steps, but you’re not limited to these examples. The each keyword, especially, can be adapted to many other cases.

Jobs Created by a Hardcoded Matrix

Pipeline jobs with a matrix strategy dynamically create copies of themselves that each have different variables. This is essentially the same as looping over the matrix and creating one job for each set of those variables. Microsoft uses it for things like testing versions in parallel.

jobs:
- job: MatrixHardcoded
  pool:
    vmImage: ubuntu-20.04
  strategy:
    matrix:
      Thing1:
        thing: foo
      Thing2:
        thing: bar
  steps:
  - pwsh: Write-Output $(thing)
    displayName: Show thing

This creates MatrixHardcoded Thing1 and MatrixHardcoded Thing2 jobs that each print the value of their thing variable in a Show thing step.

Jobs Created by an Each Loop over an Array

Pipelines have an each keyword in their expression syntax that implements loops more similar to what’s in programming languages like PowerShell and Python. Microsoft has great examples of its uses in their azure-pipelines-yaml repo.

parameters:
- name: thingsArray
  type: object
  default:
  - foo
  - bar

jobs:
- ${{each thing in parameters.thingsArray}}:
  - job: EachArrayJobsThing_${{thing}}
    pool:
      vmImage: ubuntu-20.04
    steps:
    - pwsh: Write-Output ${{thing}}
      displayName: Show thing

Fiddly details:

  • The ${{ }} syntax resolves into values. Since those values are prefixed with a dash (-), YAML interprets them as elements of an array. You need that dash on both the expression line and the job definition line. This feels like it should create an array of arrays that each contain one job instead of a flat array of jobs, which seems like it would break. Maybe the pipeline flattens it, maybe it handles the nesting. Either way, you need both of those dashes.
  • The each line has to end with a colon (:), but references to the ${{thing}} loop variable after it don’t.
  • Parameters are different from variables. Parameters support complex types (like arrays we can loop over). Variables are always strings.
  • If you need variables in your loop code, you can reference them in the expression syntax.
  • Parameters are mostly documented in the context of templates, but they can be used directly in pipelines.

This is mostly the same as a hardcoded matrix, but it creates jobs from a parameter that can be passed in dynamically.

There are some cosmetic differences. Since we used an array of values instead of a map of keys and values, there are no ThingN keys to use in the job names. They’re differentiated with values instead (foo and bar). The delimiters are underscores because job names don’t allow spaces (we could work around this with the displayName property).

We still get two jobs that each output their thing variable in a Show thing step.

Jobs Created by an Each Loop over a Map

This is the same as the previous pattern except it processes a map instead of an array.

parameters:
- name: thingsMap
  type: object
  default:
    Thing1: foo
    Thing2: bar

jobs:
- ${{each thing in parameters.thingsMap}}:
  - job: EachMapJobs${{thing.key}}
    pool:
      vmImage: ubuntu-20.04
    steps:
    - pwsh: Write-Output ${{thing.value}}
      displayName: Show thing

Since it’s processing a map, it references thing.key and thing.value instead of just thing. Again it creates two jobs with one step each.

Jobs Created by a Matrix Defined by an Each Loop over a Map

This combines the previous patterns to dynamically define a matrix using an each loop over a map parameter.

parameters:
- name: thingsMap
  type: object
  default:
    Thing1: foo
    Thing2: bar

jobs:
- job: MatrixEachMap
  pool:
    vmImage: ubuntu-20.04
  strategy:
    matrix:
      ${{each thing in parameters.thingsMap}}:
        ${{thing.key}}:
          thing: ${{thing.value}}
  steps:
  - pwsh: Write-Output $(thing)
    displayName: Show thing

Fiddly details:

  • We don’t need the YAML dashes (-) like we did in the two previous examples because we’re creating a map of config for the matrix, not an array of jobs. The ${{ }} syntax resolves to values that we want YAML to interpret as map keys, not array elements.
  • The each line still has to end with a colon (:).
  • We need a new colon (:) after ${{thing.key}} to tell YAML these are keys of a map.

This is the same as a hardcoded matrix except that its variables are generated dynamically from a map parameter.

Steps with an Each Loop over an Array

The previous patterns used loops to dynamically create multiple jobs. This statically defines one job and dynamically creates multiple steps inside of it.

parameters:
- name: thingsArray
  type: object
  default:
  - foo
  - bar

jobs:
- job: EachArraySteps
  pool:
    vmImage: ubuntu-20.04
  steps:
  - ${{each thing in parameters.thingsArray}}:
    - pwsh: Write-Output ${{thing}}
      displayName: Show thing

As expected, we get one job that contains two Show thing steps.

The differences between these patterns are syntactically small, but they give you a lot of implementation options. Hopefully these examples help you find one that works for your use case.

Happy automating!

Operating Ops


Four Guidelines for Valuable Documentation

📃 We’ve written a lot of documentation for a lot of projects. We’ve also read a lot of documentation for a lot of projects and had mixed experiences with what it taught us. Across that work, we’ve found four guidelines that make documentation easy to write and valuable to readers. Hopefully they save you some time and some frustration!

All four come from one principle:

Documentation exists to help users with generic experience learn your specific system.

Generic experience is a prerequisite. Documentation isn’t a substitute for knowing the basics of the tooling your project uses; it’s a quick way for knowledgeable readers to learn the specific ways your project uses those tools.

Don’t Write Click-by-Click Instructions

❌ This is way too much detail:

  1. Go to https://console.aws.amazon.com/cloudwatch/home
  2. Click Log Groups on the left
  3. Type “widgets-dev-async-processor” in the search box
  4. Click the magnifying glass icon
  5. Find the “widgets-dev-async-processor” in the search results
  6. Click “widgets-dev-async-processor”
  7. Click the first stream in the list
  8. Read the log entries

It’s frustratingly tedious for experienced users. Users who are so new that they need this level of detail are unlikely to get much from the logs it helps them find.

This will also go out of date as soon as the CloudWatch UI changes. You won’t always notice when it changes, and even if you do it’s easy to forget to update your docs.

Use simple text directions instead:

Open the widgets-dev-async-processor Log Group in the AWS CloudWatch web console.

That’s easy to read, tells the reader what they need and where to find it, and won’t go out of date until you change how your logs are stored.

Limit Use of Screenshots

🔍 Searches can’t see into images, so anything captured in a screenshot won’t show up in search results. Similarly, readers can’t copy/paste from images.

Also, like click-by-click instructions, screenshots are tedious for experienced readers, they don’t help new users understand the system, and they’re impractical to keep up to date.

Most of the time, simple text directions like the ones given above are more usable.

Link Instead of Duplicating

Duplicated docs always diverge. Here’s a common example:

Infrastructure code and application code live in different repos. Engineers of both need to export AWS credentials into their environment variables. Infra engineers need them to run terraform, app engineers need them to query DynamoDB tables. Trying to make it easy for everybody to find what they need, someone documents the steps in each repo. Later, the way users get their credentials changes. The engineer making that change only works on terraform and rarely uses the app repo. They forget to update its instructions. A new engineer joins the app team, follows those (outdated) instructions, and gets access errors. There’s churn while they diagnose.

It’s better to document the steps in one repo and link 🔗 to those steps from the other. Then, everyone is looking at the same document, not just the same steps. It’s easy to update all docs because there’s only one doc. Readers know they’re looking at the most current doc because there’s only one doc.

This is also true for upstream docs. For example, if it’s already covered in HashiCorp’s excellent terraform documentation, just link to it. A copy will go out of date. Always link to the specific sections of pages that cover the details your readers need. Don’t send them to the header page and force them to search.

Keep a Small Set of Accurate Documentation

If you write too many docs, they’ll eventually rot. You’ll forget to update some. You won’t have time to update others. Users will read those docs and do the wrong thing. Errors are inevitable. It’s better to have a small set of accurate docs than a large set of questionable ones. Only write as many docs as it’s practical to maintain.

Writing docs can be a lot of work, and stale docs can cause more errors than they prevent. Hopefully, these guidelines will make your docs easier to write and more valuable to your readers.

Happy documenting!

Operating Ops


PowerShell: The Programmer’s Shell

A couple years ago, I switched all my workstations to PowerShell. Folks often ask me why I did that. What made it worth the trouble of learning a new syntax? Here are the things about Posh that made me switch:

Mac, Linux, and Windows Support

I usually run PowerShell on Mac OS, but it also supports Linux and Windows. It gives me a standardized interface to all three operating systems.

Object Oriented Model

This is the big reason. It’s the thing that makes PowerShell a programming language like Python or Ruby instead of just a scripting language like Bash.

In Bash everything is a string. Say we’ve found something on the filesystem:

bash-3.2$ ls -l | grep tmp
drwxr-xr-x   4 adam  staff   128 Oct 15 18:10 tmp

If we need that Oct 15 date, we’d parse it out with something like awk:

bash-3.2$ ls -l | grep tmp | awk '{print $6, $7}'
Oct 15

That splits the line on whitespace and prints out the 6th and 7th fields. If the whitespace in that output string changes (like if you run this on someone else’s workstation and they’ve tweaked their terminal), this will silently break. It won’t error, it just won’t parse out the right data. You’ll get downstream failures in code that expected a date but got something different.

PowerShell is object oriented, so it doesn’t rely on parsing strings. If we find the same directory on the filesystem:

PS /Users/adam> Get-ChildItem | Where-Object Name -Match tmp

    Directory: /Users/adam

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----          10/15/2020  6:10 PM                tmp

It displays similarly, but that’s just formatting goodness. Underneath, it found an object that represents the directory. That object has properties (Mode, LastWriteTime, Length, Name). We can get them by reference:

PS /Users/adam> Get-ChildItem | Where-Object Name -Match tmp | Select-Object LastWriteTime

LastWriteTime
-------------
10/15/2020 6:10:55 PM

We tell the shell we want the LastWriteTime property and it gets the value. It’ll get the same value no matter how it was displayed. We’re referencing a property not parsing output strings.

This makes Posh less fragile, but also gives us access to the standard toolbox of programming techniques. Its functions and variable scopes and arrays and dictionaries and conditions and loops and comparisons and everything else work similarly to languages like Python and Ruby. There’s less Weird Stuff. Ever have to set and unset $IFS in Bash? You don’t have to do that in PowerShell.
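As a tiny illustration of that toolbox, here’s a sketch that uses a hashtable, a loop, and a conditional (the $Things data is made up):

# Loop over a hashtable and act only on matching entries.
$Things = @{
    Thing1 = 'foo'
    Thing2 = 'bar'
}
foreach ($Key in $Things.Keys) {
    if ($Things[$Key] -eq 'foo') {
        Write-Output "$Key holds foo"
    }
}
Thing1 holds foo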

Streams

Streams are a huge feature of PowerShell, and there are already great articles that cover the details. I’m only going to highlight one thing that makes me love them: they let me add informative output similar to a DEBUG log line in Python and other programming languages. Let’s convert our search for tmp into a super-simple script:

[CmdletBinding()]
param()

function Get-Thing {
    [CmdletBinding()]
    param()
    $AllItems = Get-ChildItem
    Write-Verbose "Filtering for 'tmp'."
    return $AllItems | Where-Object Name -Match 'tmp'
}

Get-Thing

If we run this normally we just get the tmp directory:

PS /Users/adam> ./streams.ps1

    Directory: /Users/adam

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----          10/15/2020  6:10 PM                tmp

If we run it with -Verbose, we also see our message:

PS /Users/adam> ./streams.ps1 -Verbose
VERBOSE: Filtering for 'tmp'.

    Directory: /Users/adam

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----          10/15/2020  6:10 PM                tmp

We can still pipe to the same command to get the LastWriteTime:

PS /Users/adam> ./streams.ps1 -Verbose | Select-Object LastWriteTime
VERBOSE: Filtering for 'tmp'.

LastWriteTime
-------------
10/15/2020 6:10:55 PM

The pipeline reads objects from a different stream, so we can send whatever we want to the verbose stream without impacting what the user may pipe to later. More on this in a future article. For today, I’m just showing that scripts can present information to the user without making it harder for them to use the rest of the output.
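Here’s a rough sketch of that in action, capturing the script’s output into a variable (the $Result name is just for illustration). The verbose message still prints to the console, but only the directory object lands in the variable:

PS /Users/adam> $Result = ./streams.ps1 -Verbose
VERBOSE: Filtering for 'tmp'.
PS /Users/adam> $Result.Name
tmp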

The closest you can get to this in Bash is stderr, but that stream is used for more than just informative messages, and you can’t always predict how writing to it will affect whatever consumes your script. Having a dedicated stream for verbose messages makes it trivial to provide information without disrupting behavior.

PowerShell is a big language and there’s a lot more to it than what I’ve covered here. These are just the things that I get daily value from. To me, they more than compensate for the (minimal) overhead of learning a new syntax.

Happy scripting!

Adam
Operating Ops


Separate Work and Personal Email

Good morning!

Recent security incidents reminded me of an important rule that often doesn’t make it onto security checklists:

Separate work and personal email.

In these incidents, workers used forwarding rules to send work email to personal accounts. Attackers used those rules to collect sensitive information. This is an example of exfiltration. Company security teams can do a lot to protect the email accounts they administer, but there’s not much they can do when data is forwarded from those accounts to outside services.

Here are (just a few) common examples of sensitive information attackers might get from email:

  • Password reset links. Most accounts that aren’t protected by MFA can be accessed by a password reset process that only requires you to click a link in an email. Inboxes are the gateway to many other systems.
  • Bug reports. Information sent between engineers, project managers, or other team members about flaws in your products can help attackers craft exploits.
  • Upgrade notifications. If you get an upgrade notification about any tool your company uses, that tells attackers you’re still using an old version of that tool. They can look for known vulnerabilities in that version and use them in attacks.
  • Personal information about workers who have privileged access. Phishing and other forms of social engineering are still common. Phishing was used in the incidents that prompted this post. The more attackers know about you, the more real they can pretend to be. They only need to fool one person who has access to production.
  • Personally identifying information (PII). Customer error reports, for example. They might contain names, email addresses, real addresses, IP addresses, etc. All it takes is a copy/paste of one database entry by an engineer trying to track down the root cause of a problem with the product, and you can have PII in your inbox. PII can be valuable to attackers (e.g. for scams), but it’s also subject to regulation. Sending it outside the company can cause big problems.

This applies to everyone, not just engineers. Project managers get bug reports. Customer service staff get customer error reports and any PII they contain. Upgrade notifications are often blasted out to distribution lists that include half the company. Even if you don’t have an engineering role, it’s still important to keep company email within the company.

Stay safe!

Adam


Passing Parameters to Docker Builds

Hello!

When I’m building Docker images, sometimes I need to pass data from the build agent (e.g. my CI pipeline) into the build process. Often, I also want to echo that data into the logs so I can use it for troubleshooting or validation later. Docker supports this!

These examples were all tested in Docker for Mac:

docker --version
Docker version 19.03.13, build 4484c46d9d

First, declare your build-time data as an ARG in your Dockerfile:

FROM alpine:3.7

ARG USEFUL_INFORMATION
ENV USEFUL_INFORMATION=$USEFUL_INFORMATION
RUN echo "Useful information: $USEFUL_INFORMATION"

In this example, I’ve also set an ENV variable so I can RUN a command to print out the new ARG.

Now, just build like usual:

docker build --tag test_build_args --build-arg USEFUL_INFORMATION=1337 .
Sending build context to Docker daemon  10.24kB
Step 1/4 : FROM alpine:3.7
 ---> 6d1ef012b567
Step 2/4 : ARG USEFUL_INFORMATION
 ---> Using cache
 ---> 18d20c437445
Step 3/4 : ENV USEFUL_INFORMATION=$USEFUL_INFORMATION
 ---> Using cache
 ---> b8bbdd03a1d1
Step 4/4 : RUN echo "Useful information: $USEFUL_INFORMATION"
 ---> Running in a2161bfb75cd
Useful information: 1337
Removing intermediate container a2161bfb75cd
 ---> 9ca56256cc19
Successfully built 9ca56256cc19
Successfully tagged test_build_args:latest

If you don’t pass in a value for the new ARG, it resolves to an empty string:

docker build --tag test_build_args .
Sending build context to Docker daemon  10.24kB
Step 1/4 : FROM alpine:3.7
 ---> 6d1ef012b567
Step 2/4 : ARG USEFUL_INFORMATION
 ---> Using cache
 ---> 18d20c437445
Step 3/4 : ENV USEFUL_INFORMATION=$USEFUL_INFORMATION
 ---> Running in 63e4b0ce1fb7
Removing intermediate container 63e4b0ce1fb7
 ---> 919769a93b7d
Step 4/4 : RUN echo "Useful information: $USEFUL_INFORMATION"
 ---> Running in 73e158d1bfa6
Useful information:
Removing intermediate container 73e158d1bfa6
 ---> f928fc025270
Successfully built f928fc025270
Successfully tagged test_build_args:latest

Some details:

  • ARG values only exist while the image is being built. Copying the value into an ENV variable, like the Dockerfile above does, is what makes it available in containers run from the image.
  • You can give an ARG a default value in the Dockerfile (e.g. ARG USEFUL_INFORMATION=unknown) so it doesn’t resolve to an empty string when no --build-arg is passed.

That’s it! Happy building,

Adam


PowerShell Scripts with Arguments

Hello!

I write a lot of utility scripts. Little helpers to automate repetitive work. Like going through all those YAML files and updating that one config item, or reading through all those database entries and finding the two that are messed up because of that weird bug I just found.

These scripts are usually small. I often don’t keep them very long. I also usually have to run them against multiple environments, and sometimes I have to hand them to other engineers. They need to behave predictably everywhere, and they need to be easy to read and run. They can’t be hacks that only I can use.

In my work, that means a script that takes arguments and passes them to internal functions that implement whatever I’m trying to do. Let’s say I need to find a thing with a known index, then reset it. Here’s the pattern I use in PowerShell:

[CmdletBinding()]
param(
    [int]$Index
)

function Get-Thing {
    [CmdletBinding()]
    param(
        [int]$Index
    )
    return "Thing$Index"
}

function Reset-Thing {
    [CmdletBinding()]
    param(
        [string]$Thing
    )
    # We'd do the reset here if this were a real script.
    Write-Verbose "Reset $Thing"
}

$Thing = Get-Thing -Index $Index
Reset-Thing -Thing $Thing

We can run that from a prompt with the Index argument:

./Reset-Thing.ps1 -Index 12 -Verbose
VERBOSE: Reset Thing12

Some details:

  • The param() call for the script has to be at the top. Posh throws errors if you put it down where the functions are invoked.
  • CmdletBinding() makes the script and its functions handle standard arguments like -Verbose.
  • This uses Write-Verbose to send informative output to the verbose “stream”. This is similar to setting the log level of a Python script to INFO. It allows the operator to select how much output they want to see.
  • As always, use verbs from Get-Verb when you’re naming things.
  • I could have written this with just straight commands instead of splitting them into Get and Reset functions, especially for an example this small, but it’s almost always better to separate out distinct pieces of logic. It’ll be easier to read if I have to hand it to someone else who’s not familiar with the operation. Same if I have to put it aside for a while and come back to it after I’ve forgotten how it works.

This is my starting point when I’m writing a helper script. It’s usually enough to let me sanely automate a one-off without getting derailed into full-scale application development.

Happy scripting,

Adam


PowerShell on OS X: Git Hooks

Hello!

PowerShell works great on Mac OS X. It’s my default shell. I usually only do things the Posh way, but sometimes the underlying system bubbles back up. Like when I’m writing git hooks.

Git hooks still live in the same place (.git/hooks) and still have to be executable on your platform; that doesn’t change. But the scripts themselves can be different. You have two options.

Option 1: Don’t Use PowerShell

Your existing hooks written in bash or zsh or whatever Linux-ey shell you were using will still work. That’s great if you already have a bunch and you don’t want to port them all.

If you’re writing anything new, though, use PowerShell. When I get into a mess on my Posh Apple, it’s usually because I mixed PowerShell with the legacy shell. You’re better off using just one.

Option 2: Update the Shebang

The shebang (#!) is the first line of executable scripts on Unix-like systems. It sets the program that’s used to run the script. We just need to write one in our hook script that points at pwsh (the PowerShell executable):

#!/usr/local/microsoft/powershell/7/pwsh

Write-Verbose -Verbose "We're about to commit!"

If you don’t have the path to your pwsh, you can find it with Get-Command pwsh.
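For example, something like this (the exact path will vary by install):

PS /Users/adam> (Get-Command pwsh).Source
/usr/local/microsoft/powershell/7/pwsh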

After that, our hook works like normal:

git commit --allow-empty -m "Example commit."
VERBOSE: We're about to commit!
[master a905079] Example commit.

If you don’t set the shebang at all (leaving nothing but the Write-Verbose command in our example), your hook will run but OS X won’t treat it like PowerShell. You get “not found” errors:

git commit --allow-empty -m "Example commit."
.git/hooks/pre-commit: line 2: Write-Verbose: command not found
[master 1b2ebac] Example commit.

That’s actually good. If you have old hook scripts without shebang lines, they won’t break. Just make sure any new Posh scripts do have a shebang and everything should work.

Enjoy the Posh life!

Adam


A Checklist for Submitting Pull Requests

Hello!

Reviewing code is hard, especially because reviewers tend to inherit some responsibility for problems the code causes later. That can lead to churn while they try to develop confidence that new submissions are ready to merge.

I submit a lot of code for review, so I’ve been through a lot of that churn. Over the years I’ve found a few things that help make it easier for my reviewers to develop confidence in my submissions, so I decided to write a checklist. ✔️

The code I write lives in diverse repos governed by diverse requirements. A lot of the items in my checklist are there to help make sure I don’t mix up the issues I’m working on or the requirements of the repos I’m working in.

This isn’t a guide on writing good code. You can spend a lifetime on that topic. This is a quick checklist I use to avoid common mistakes.

This is written for Pull Requests submitted in git repos hosted on GitHub, but most of its steps are portable to other platforms (e.g. Perforce). It assumes common project features, like a contributing guide. Adjust as needed.

The Checklist

Immediately before submitting:

  1. Reread the issue.
  2. Merge the latest changes from the target branch (e.g. master).
  3. Reread the diff line by line.
  4. Rerun all tests. If the project doesn’t have automated tests, you can still:
    • Run static analysis tools on every file you changed.
    • Manually exercise new functionality.
    • Manually exercise existing functionality to make sure it hasn’t changed.
  5. Check if any documentation needs to be updated to reflect your changes.
  6. Check the rendering of any markup files (e.g. README.md) in the GitHub UI.
    • There are remarkable differences in how markup files render on different platforms, so it’s important to check them in the UI where they’ll live.
  7. Reread the project’s contributing guide.
  8. Write a description that:
    1. Links to the issue it addresses.
    2. Gives a plain English summary of the change.
    3. Explains decisions you had to make. Like:
      • Why you didn’t clean up that one piece of messy code.
      • How you chose the libraries you used.
      • Why you expanded an existing module instead of writing a new one.
      • How you chose the directory and file names you did.
      • Why you put your changes in this repo, instead of that other one.
    4. Lists all the tests you ran. Include relevant output or screenshots from manual tests.

There’s no perfect way to submit code for review. That’s why we still need humans to do it. The creativity and diligence of the engineer doing the work are more important than this checklist. Still, I’ve found that these reminders help me get code through review more easily.

Happy contributing!

Adam
