PowerShell Scripts with Arguments

Hello!

I write a lot of utility scripts. Little helpers to automate repetitive work. Like going through all those YAML files and updating that one config item, or reading through all those database entries and finding the two that are messed up because of that weird bug I just found.

These scripts are usually small. I often don’t keep them very long. I also usually have to run them against multiple environments, and sometimes I have to hand them to other engineers. They need to behave predictably everywhere, and they need to be easy to read and run. They can’t be hacks that only I can use.

In my work, that means a script that takes arguments and passes them to internal functions that implement whatever I’m trying to do. Let’s say I need to find a thing with a known index, then reset it. Here’s the pattern I use in PowerShell:

[CmdletBinding()]
param(
    [int]$Index
)

function Get-Thing {
    [CmdletBinding()]
    param(
        [int]$Index
    )
    return "Thing$Index"
}

function Reset-Thing {
    [CmdletBinding()]
    param(
        [string]$Thing
    )
    # We'd do the reset here if this were a real script.
    Write-Verbose "Reset $Thing"
}

$Thing = Get-Thing -Index $Index
Reset-Thing -Thing $Thing

We can run that from a prompt with the Index argument:

./Reset-Thing.ps1 -Index 12 -Verbose
VERBOSE: Reset Thing12

Some details:

  • The script’s param() block has to be at the top. Posh throws errors if you put it further down, like where the functions are invoked.
  • CmdletBinding() makes the script and its functions support the standard common parameters like -Verbose. More details here.
  • This uses Write-Verbose to send informative output to the verbose “stream”. This is similar to setting the log level of a Python script to INFO. It allows the operator to select how much output they want to see. More details here.
  • As always, use verbs from Get-Verb when you’re naming things.
  • I could have written this with straight commands instead of splitting the logic into Get and Reset functions, especially for an example this small, but it’s almost always better to separate out distinct pieces of logic. It’s easier to read if I have to hand it to someone who’s not familiar with the operation, and easier to pick back up if I set it aside and come back after I’ve forgotten how it works.

This is my starting point when I’m writing a helper script. It’s usually enough to let me sanely automate a one-off without getting derailed into full-scale application development.

Happy scripting,

Adam

A Checklist for Submitting Pull Requests

Hello!

Reviewing code is hard, especially because reviewers tend to inherit some responsibility for problems the code causes later. That can lead to churn while they try to develop confidence that new submissions are ready to merge.

I submit a lot of code for review, so I’ve been through a lot of that churn. Over the years I’ve found a few things that help make it easier for my reviewers to develop confidence in my submissions, so I decided to write a checklist. ✔️

The code I write lives in diverse repos governed by diverse requirements. A lot of the items in my checklist are there to help make sure I don’t mix up the issues I’m working on or the requirements of the repos I’m working in.

This isn’t a guide on writing good code. You can spend a lifetime on that topic. This is a quick checklist I use to avoid common mistakes.

This is written for Pull Requests submitted in git repos hosted on GitHub, but most of its steps are portable to other platforms (e.g. Perforce). It assumes common project features, like a contributing guide. Adjust as needed.

The Checklist

Immediately before submitting:

  1. Reread the issue.
  2. Merge the latest changes from the target branch (e.g. master).
  3. Reread the diff line by line.
  4. Rerun all tests. If the project doesn’t have automated tests, you can still:
    • Run static analysis tools on every file you changed.
    • Manually exercise new functionality.
    • Manually exercise existing functionality to make sure it hasn’t changed.
  5. Check if any documentation needs to be updated to reflect your changes.
  6. Check the rendering of any markup files (e.g. README.md) in the GitHub UI.
    • There are remarkable differences in how markup files render on different platforms, so it’s important to check them in the UI where they’ll live.
  7. Reread the project’s contributing guide.
  8. Write a description that:
    1. Links to the issue it addresses.
    2. Gives a plain English summary of the change.
    3. Explains decisions you had to make. Like:
      • Why you didn’t clean up that one piece of messy code.
      • How you chose the libraries you used.
      • Why you expanded an existing module instead of writing a new one.
      • How you chose the directory and file names you did.
      • Why you put your changes in this repo, instead of that other one.
    4. Lists all the tests you ran. Include relevant output or screenshots from manual tests.

There’s no perfect way to submit code for review. That’s why we still need humans to do it. The creativity and diligence of the engineer doing the work are more important than this checklist. Still, I’ve found that these reminders help me get code through review more easily.

Happy contributing!

Adam

PowerShell: Sort Hash Table Into Ordered Dictionary

Hello!

PowerShell’s Hash Tables are unordered. The keys don’t always come back in the same order you entered them:

PS /Users/adam/Local/fiddle> $HashTable = @{                   
>>     'a' = 1
>>     'b' = 2
>>     'c' = 3
>>     'd' = 4
>> }
PS /Users/adam/Local/fiddle> $HashTable

Name                           Value
----                           -----
c                              3
b                              2
d                              4
a                              1

I created the hash in the order a, b, c, d but I got back c, b, d, a. That’s normal.

PowerShell also has Ordered Dictionaries that work like Hash Tables but preserve order:

PS /Users/adam/Local/fiddle> $OrderedDictionary = [ordered]@{
>>     'a' = 1
>>     'b' = 2
>>     'c' = 3
>>     'd' = 4
>> }
PS /Users/adam/Local/fiddle> $OrderedDictionary

Name                           Value
----                           -----
a                              1
b                              2
c                              3
d                              4

Today I had a large hash table and I needed to convert it to a dictionary sorted by key name. The hash was returned by a library I didn’t control, so I couldn’t just re-define it as a dictionary. I had to convert it. There are a couple of ways, but I found this was the cleanest:

$HashTable = @{
    'd' = 4
    'a' = 1
    'b' = 2
    'c' = 3
}
$OrderedDictionary = [ordered]@{}
foreach ($Item in ($HashTable.GetEnumerator() | Sort-Object -Property Key)) {
    $OrderedDictionary[$Item.Key] = $Item.Value
}
$OrderedDictionary

This outputs a dictionary that has been sorted by its keys:

PS /Users/adam/Local/fiddle> ./Ordering.ps1

Name                           Value
----                           -----
a                              1
b                              2
c                              3
d                              4

Because it’s a dictionary and not a regular hash, it’ll keep that ordering.

Happy scripting!

Adam

Don’t Import requests From botocore.vendored

Hello!

I’ve seen this anti-pattern scattered around plenty of DevOps code, especially in AWS lambda functions:

from botocore.vendored import requests

Vendoring libraries like requests into other libraries like botocore is arguably an anti-pattern in general, but reaching into botocore and importing its vendored copy in your own code is definitely one. Here are some of the reasons:

  • The maintainers may un-vendor it. This just happened! In newer versions of botocore you can still import requests but all that’s left are some bits of the error handling system. If you upgrade botocore your imports will still work but you’ll get errors when you try to use requests. Like this in version 1.13.15:
    >>> from botocore.vendored import requests
    >>> print(requests.get('https://google.com'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'botocore.vendored.requests' has no attribute 'get'
    

    I saw an import like this running in an AWS lambda function a few days ago and it worked but was showing deprecation warnings. When AWS upgrades it’ll break entirely.

  • The vendored version may be outdated and have security vulnerabilities that have been patched in newer versions. Even if the maintainers know about a vulnerability, they may not upgrade the vendored copy if they’re not using the vulnerable module themselves. Unless you check every usage to ensure you’re not using vulnerable modules at the vendored version, you should assume you are.
  • The vendored package may have been customized. This shouldn’t happen, but I’ve seen it in other packages plenty of times. Once the code has been copied into the repo it’s super easy for someone to tweak it for convenience. Vendored packages may no longer behave how you expect.

Instead of relying on botocore’s vendoring, add requests to your dependency chain like usual. Add a line to your requirements.txt file or update your setup.py. To package code with dependencies for AWS lambda functions, check out this post. To add dependencies to your setup.py, check out this one.
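
Here’s a minimal sketch of the direct approach (the URL is just for illustration):

# requirements.txt:
#   requests

# Then import the real package instead of reaching into botocore:
import requests

response = requests.get('https://example.com')
print(response.status_code)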

Happy automating!

Adam

PowerShell: Getting Properties From Objects In Arrays

In PowerShell, I often run commands that return arrays I need to filter. Once I’ve found the object I’m looking for, I need to read a property off of it. There are a few ways to do this. Here are three.

These are also good examples if you’re new to PowerShell and trying to switch from the Linux mindset of parsing strings to the PowerShell mindset of manipulating objects.

For these examples we’ll be using an array filtered from Get-Process:

Get-Process | Where-Object ProcessName -Match "update"

 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
      0     0.00      10.32     380.46     733   1 SoftwareUpdateN

Method 1: Select-Object

Select-Object reads properties from objects. We can pipe it the object we’ve found:

Get-Process | Where-Object ProcessName -Match "update" | Select-Object cpu

       CPU
       ---
380.761615

In this case, Where-Object returns one object and Select-Object reads the property off of it, but this still works if we match multiple processes. Then, Where-Object returns an array that gets unpacked by PowerShell’s pipeline and sent to Select-Object one at a time.

This is basically a property filter. It still returns an array of Process objects, but those objects only have the one property you selected. We can see this with Get-Member:

Get-Process | Where-Object ProcessName -Match "update" | Select-Object cpu | Get-Member

   TypeName: Selected.System.Diagnostics.Process

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
CPU         NoteProperty System.Double CPU=19.3147851

Four methods, but only one property.

Method 2: Subexpression

If we wrap our match command in a subexpression, we can access the result’s properties using dotted notation, like we would with any variable:

$(Get-Process | Where-Object ProcessName -Match "update").cpu
380.761615

This also works if our match returned multiple objects. The subexpression will contain an array and PowerShell will automatically get the cpu property from each object in that array.

Unlike the filter of Select-Object, this doesn’t return Process objects with a cpu property. Instead, it returns a double (number) with the actual value:

$(Get-Process | Where-Object ProcessName -Match "update").cpu.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Double                                   System.ValueType

Method 3: Loop

Like any output, we can pipe our match into a ForEach-Object loop and use the $_ variable to access properties of each item the loop sees:

Get-Process | Where-Object ProcessName -Match "update" | ForEach-Object {$_.cpu}
381.625785

The loop will of course work on multiple objects.

Just like the subexpression, this returns the actual value instead of an object with one property.

Happy automating!

Adam

How to Use Out-String in PowerShell: Don’t

In my PowerShell Help Commands For Linux Users post, I showed you this pattern for searching for command aliases:

Get-Alias | Out-String -stream | Select-String -Pattern 'move'

A beginner mistake! Here’s the problem:

I’m used to the Unix shells, like bash, where everything is a string. When you run alias in bash, you get this:

$ alias
alias hello='echo hello'
alias la='ls -a'
alias ll='ls -l'

A multiline string with one alias per line. You search those aliases like this:

$ alias | grep hello
alias hello='echo hello'

grep does a pattern match on those lines, one line at a time.

PowerShell is object-oriented. Its Get-Alias command doesn’t return a multiline string, it returns an array of objects. Those objects have properties like Name. If you want to find aliases whose names match a pattern, you just iterate the array:

Get-Alias | Where-Object Name -Match ".*move.*"

Where-Object checks the Name property of each object in the array to see if it matches the .*move.* pattern. (Technically, Where-Object isn’t doing the iteration; PowerShell unpacks the array and sends the objects through the pipe one at a time.)

This is so much better. It’s like writing Python. Standard data types and logic.

In my past post, I piped to Out-String, which converts objects into strings. That allowed me to imitate the Linux pattern by searching with Select-String (basically the PowerShell grep). Totally unnecessary. In PowerShell you can just match on object properties directly. You don’t need to cast to strings.

In a simple case like this PowerShell’s object-oriented nature is mostly a novelty, but in more complex cases it ends up being hugely powerful. More posts coming!

Happy automating,

Adam

Python: JSON Structured Logging

Hello!

If you’re setting up JSON logging in AWS lambda, check out this post instead. You need some extra code to prevent duplicate log messages.

Recently, I’ve been switching to logs structured as JSON. Using the sample command in my pattern for production-ready Python scripts, that means we replace delimited strings like these:

2019-09-29 19:54:44,243 | INFO | sample_scripts.good | Acting.
2019-09-29 19:54:49,244 | INFO | sample_scripts.good | Action complete.

With JSON objects like these:

{"asctime": "2019-09-29 19:53:28,654", "levelname": "INFO", "name": "sample_scripts.good", "message": "Acting."}
{"asctime": "2019-09-29 19:53:33,654", "levelname": "INFO", "name": "sample_scripts.good", "message": "Action complete."}

Or, pretty-printed for human-readability:

{
  "asctime": "2019-09-29 19:53:28,654",
  "levelname": "INFO",
  "name": "sample_scripts.good",
  "message": "Acting."
}
{
  "asctime": "2019-09-29 19:53:33,654",
  "levelname": "INFO",
  "name": "sample_scripts.good",
  "message": "Action complete."
}

This way, your log processor can reference keys in a JSON object instead of splitting strings and hoping the split works right. Modern log processors are pretty good with delimited strings, but I’ve still seen them break because of things like unexpected whitespace or messages that happen to contain delimiter characters.
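
As a quick illustration, reading a field out of one of those JSON lines is just a dictionary lookup, while the delimited version depends on the split landing where you expect. A small sketch using the lines above:

import json

json_line = '{"asctime": "2019-09-29 19:53:28,654", "levelname": "INFO", "name": "sample_scripts.good", "message": "Acting."}'
record = json.loads(json_line)
print(record['levelname'], record['message'])  # INFO Acting.

# The delimited version only works if no field ever contains the delimiter:
delimited_line = '2019-09-29 19:54:44,243 | INFO | sample_scripts.good | Acting.'
timestamp, level, name, message = [field.strip() for field in delimited_line.split('|')]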

The Python logging library doesn’t have native JSON support, so I use the python-json-logger library. Here’s how I set it up:

import logging

from pythonjsonlogger import jsonlogger

def setup_logging(log_level):
    logger = logging.getLogger(__name__)
    logger.setLevel(log_level)
    json_handler = logging.StreamHandler()
    formatter = jsonlogger.JsonFormatter(
        fmt='%(asctime)s %(levelname)s %(name)s %(message)s'
    )
    json_handler.setFormatter(formatter)
    logger.addHandler(json_handler)

That’s it! Just call setup_logging and then you can get loggers with logging.getLogger(__name__) as usual. If you’re not sure how to get and call loggers, check out the sample script in the production-readiness pattern I mentioned above. It has some code you can copy/paste.
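
Here’s a minimal usage sketch, assuming it runs in the same module as setup_logging() above:

import logging

setup_logging('INFO')                 # setLevel() accepts level names as strings
logger = logging.getLogger(__name__)  # the same logger setup_logging() configured
logger.info('Acting.')
logger.info('Action complete.')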

Technically you could do this in raw Python if you set up your loggers right, but you’d basically be re-implementing what the python-json-logger library already does so I don’t recommend that approach.
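
If you’re curious what that re-implementation would look like, it’s roughly a Formatter subclass that dumps the record’s fields as JSON. This sketch only covers the four fields used above; python-json-logger handles many more cases:

import json
import logging

class HomemadeJsonFormatter(logging.Formatter):
    def format(self, record):
        # Roughly what the library does for the four fields in the format string above.
        return json.dumps({
            'asctime': self.formatTime(record),
            'levelname': record.levelname,
            'name': record.name,
            'message': record.getMessage(),
        })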

I recommend switching to logs structured as JSON. It can reduce failures and complexity in log processing, and the output is cleaner overall.

Happy automating!

Adam

Replace Conditions with Instructions

Hello!

This pattern is loosely based on Martin Fowler’s article on Adaptive Models.

I often find that my code needs to handle a bunch of different cases. It can be tempting (but ultimately painful) to just write a bunch of if conditions to handle those cases. Imagine converting HTML files into markdown:

file_names = [
    'a.md',
    'a.html',
    'b.html'
]

def should_convert(file_name):
    extension = file_name.split('.')[1]
    if extension == 'md':
        return False
    elif extension == 'html':
        return True

def convert(file_name):
    print(f'{file_name} > converted_{file_name}')

if __name__ == '__main__':
    for file_name in file_names:
        if should_convert(file_name):
            convert(file_name)

We have to write logic that understands each case. This simplified example doesn’t look too bad, but in real life it’ll be worse.

Instead, I like to write an object that encodes the right action for each case and then look up that action as each case is processed:

should_convert = {
    'md': False,
    'html': True
}

file_names = [
    'a.md',
    'a.html',
    'b.html'
]

def convert(file_name):
    print(f'{file_name} > converted_{file_name}')

if __name__ == '__main__':
    for file_name in file_names:
        if should_convert[file_name.split('.')[1]]:
            convert(file_name)

Replacing the should_convert() function with a dictionary simplifies several things:

  • There’s less code to understand. We don’t have to read through the implementation of every case, we just read through the cases.
  • To support new cases, we just update the dictionary. We don’t have to write new code.
  • Because we don’t have to write new code to support new cases, we don’t need to write tests to assert that the new code works. The logic is the same; it just processes more instructions.

This also handles errors cleanly. Suppose we add a file with an extension the dictionary doesn’t cover (.doc) and run the script:

Traceback (most recent call last):
  File "./adaptive.py", line 18, in <module>
    if should_convert[file_name.split('.')[1]]:
KeyError: 'doc'

It’s immediately clear that it didn’t know if it should convert the .doc extension.

If you encode desired behavior into objects, you only need to write code that can follow those instructions. Here’s the principle I try to follow:

Conditions are for processing instructions, not for implementing them.
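
The same idea extends past booleans: the values in the instruction object can be callables, so each case maps straight to its behavior. A hypothetical sketch, reusing the example above:

# Hypothetical extension: map each extension to an action instead of a yes/no.
def convert(file_name):
    print(f'{file_name} > converted_{file_name}')

def skip(file_name):
    pass  # already markdown, nothing to do

actions = {
    'md': skip,
    'html': convert,
}

file_names = ['a.md', 'a.html', 'b.html']

for file_name in file_names:
    actions[file_name.split('.')[1]](file_name)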

This has simplified a ton of my code. I highly recommend it.

Happy automating!

Adam

How to Upgrade DevOps Code to Python 3

Python 2 is going away! It’s time to upgrade.

You shouldn’t run anything in prod that’s not actively supported. If there are security flaws you won’t have a sure path to remediation. Start the upgrade now so you have time to finish before support ends.

In DevOps you’re not usually writing much raw Python. A helper lambda function. A little boto3 script. If you’re writing lots of code, you’re probably making a mistake and you should be looking for an existing tool that already implements whatever you’re doing (terraform, troposphere, Ansible, Salt, paramiko, whatever).

Because of that, migrating DevOps code to Python 3 is usually easy. There are guides and a conversion tool. I usually just switch my interpreter to 3 and fix errors until there aren’t any more. A few old features have been replaced with new ones that are worth adopting. Here are highlights from the easy migrations I’ve done (keep reading for the one that wasn’t easy):

  • Virtual environments are now in core as venv. You don’t need to install virtualenv anymore.
  • basestring was replaced with str.
  • Use the print() function instead of the print statement. Printing output isn’t usually ideal, and this may be a good opportunity to upgrade to logging.
  • ConfigParser was renamed to configparser (to match the Python convention).
  • mock is now in core as unittest.mock.
  • The new f-strings are awesome. format() and the other string formatters still work, so f-strings aren’t a migration requirement, but they make cleaner code and I recommend switching to them.
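
Here’s a rough before-and-after sketch of a few of those changes, shown from the Python 3 side (the comments note the old Python 2 spellings):

import configparser             # Python 2: import ConfigParser
from unittest import mock       # Python 2: import mock (a separate package)

name = 'world'
print(f'hello, {name}')         # Python 2: print 'hello, %s' % name

config = configparser.ConfigParser()
fake_client = mock.MagicMock()  # same API as the old mock library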

Like always, lint your code before you run it!

One migration I got into wasn’t simple: I’d bodged together a script from snippets of a library that used Python 2 sockets to implement ping so I could watch the gears turn inside the black boxes of AWS Security Groups. I got into the weeds of unicode and not-unicode strings and then decided to just live with Python 2.

If that story reminded you of any of your own code, I recommend you don’t try to migrate that code. Look for a tool that already implements whatever you’re trying to do, find a way not to need to do whatever you were doing, something. In my case, that script wasn’t part of delivering product. I was just hacking around. I finished my experiments and deleted the script.

Happy upgrading!

Adam

Need more than just this article? We’re available to consult.

You might also want to check out these related articles:

Python DevOps Code Error Checking: Lint with Pyflakes

Hello!

For those unfamiliar with linting (static analysis), read Dan Bader’s introduction.

There are several linters for Python, but when I’m doing DevOps I use Pyflakes. I love the opening sentence of its design principles:

Pyflakes makes a simple promise: it will never complain about style, and it will try very, very hard to never emit false positives.

I’m not generally rigid about style. And, when I enforce it, I use the code review process and not a static analysis tool. The Python interpreter doesn’t care about style. Style is for humans; humans are the best tools to analyze it. Linters turn what should be a human process into something robotic.

Style is especially hard to enforce in DevOps, where you’re often working with a mix of languages and frameworks and tools that all have different style conventions. For example, lots of folks use Chef in AWS. Chef is a Ruby framework. They also need lambda helper functions, but lambda doesn’t support Ruby so they write those functions in Python and now half their code is Ruby and half is Python. And that’s if you ignore all the HCL in their terraform modules… You can go insane trying to configure your linters to keep up with the variation.

More than that, in DevOps you’re not usually writing much code. A helper lambda function. A little boto3 script. If you’re writing lots of code, you’re probably making a mistake and you should be looking for an existing tool that already implements whatever you’re doing (terraform, troposphere, Ansible, Salt, paramiko, whatever).

Pyflakes is great because it catches syntax errors before execution time but won’t suck you into The Bog of Style Sorrow. It’ll quickly tell you if you misplaced a quote mark, and then it exits. So if you do this:

bad_variable = 'Oops I forgot to close the string.

You get an error:

pyflakes test.py
test.py:1:51: EOL while scanning string literal
bad_variable = 'Oops I forgot to close the string.
                                                  ^

You also get some handy stuff like checking for unused imports. So if you do this:

import logging
good_variable = 'Huzzah! I remembered to close the string.'

You also get an error:

pyflakes test.py
test.py:1: 'logging' imported but unused

Crucially, Pyflakes will pass if you do this:

list_style_one = ['a', 'b']
list_style_two = [ 'a', 'b' ]

It’s a little funky to do both those patterns right next to each other, and if I were writing that code myself I’d fix it, but I don’t want my linter to error. The code works fine and I can read it easily. I prefer consistency, but not to the point that I want a robot to generate build failures.

I recommend running Pyflakes on all your Python DevOps code because it’s a quick win. Pretty much anything it errors on you should fix before you try to use the code, and it’s usually faster to run Pyflakes than to deploy a new version of the code and see if it works. I like things that are fast. 😁
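
One way to wire that in is a tiny wrapper you can run locally or in CI. This is just a sketch, and it assumes the pyflakes command is on your PATH:

import pathlib
import subprocess
import sys

# Run Pyflakes over every .py file under the current directory.
files = [str(path) for path in pathlib.Path('.').rglob('*.py')]
if not files:
    sys.exit(0)  # nothing to check
result = subprocess.run(['pyflakes', *files])
sys.exit(result.returncode)  # non-zero if Pyflakes reported anything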

Happy automating!

Adam
