In a previous article, I showed how to make AWS lambda functions log JSON objects to CloudWatch (⬅️ start there if you’re new to JSON logs). The pattern in that post had a flaw: it didn’t pass the aws_request_id. I was writing small functions to glue together bits of deployment automation and I didn’t need it. It was easy to correlate logs just with timestamps. Not every case is that simple. Sometimes you need that ID. Fortunately, Paulie Pena IV gave me some tips on how to pass it through.
We can look up the aws_request_id in the lambda function’s context object. To add it to log events, the python-json-logger library supports custom fields. To add a field for all log events, we need to subclass jsonlogger.JsonFormatter. Then we can save the ID into the new field and resolve it in the format string. Here’s the code from before with that change:
import logging

from pythonjsonlogger import jsonlogger

class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def __init__(self, *args, **kwargs):
        self.aws_request_id = kwargs.pop('aws_request_id')
        super().__init__(*args, **kwargs)

    def add_fields(self, log_record, record, message_dict):
        super().add_fields(log_record, record, message_dict)
        log_record['aws_request_id'] = self.aws_request_id

def setup_logging(log_level, aws_request_id):
    logger = logging.getLogger()

    # Testing showed lambda sets up one default handler. If there are more,
    # something has changed and we want to fail so an operator can investigate.
    assert len(logger.handlers) == 1

    logger.setLevel(log_level)

    json_handler = logging.StreamHandler()
    formatter = CustomJsonFormatter(
        fmt='%(aws_request_id)s %(asctime)s %(levelname)s %(name)s %(message)s',
        aws_request_id=aws_request_id
    )
    json_handler.setFormatter(formatter)

    logger.addHandler(json_handler)
    logger.removeHandler(logger.handlers[0])

def lambda_handler(event, context):
    setup_logging(logging.DEBUG, context.aws_request_id)
    logger = logging.getLogger()
    logger.info('Huzzah!')
It takes a few pieces to assemble a working lambda action for CodePipeline. I like to start from a simple example and build up to what I need. Here’s the code I use as a starting point.
First, a few notes:
My pipeline lambda functions are usually small, often only a few dozen lines (more than that is usually a signal that I’m implementing an anti-pattern). Because my resources are small, I just drop the code into CloudFormation’s ZipFile. That saves me from building a package. In more complex cases you may want to expand your lambda function resource.
One of the most common problems I see in lambda action development is unhandled exceptions. Read more in my article here.
This example focuses on the minimum resources, permissions, and code for a healthy lambda action. I skipped some of the usual good practices like template descriptions and parameterized config.
I put the S3 Bucket and CloudWatch Logs Log Group in the same template as the function so it was easy to see for this example. Usually I put them in a separate template because they don’t share the same lifecycle. I don’t want rollbacks or reprovisions to delete my artifacts or logs.
My demo function doesn’t do anything with the pipeline artifact; it just logs the user parameter string passed to it (there’s a minimal handler sketch after these notes). When I’m using custom actions like these, it’s often for non-artifact tasks like passing notifications to outside systems, and this is all I need.
You’ll have to upload a file as my_artifact to the bucket this creates so the pipeline’s source action has something to pull. The bucket will be named for your account ID and region to prevent collisions with other people’s buckets (S3’s namespace is global to all AWS customers).
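To make these notes concrete, here’s a minimal sketch of the kind of handler I mean, the one that just logs its user parameters. It’s not the full template’s code, and the names and log setup are only illustrative, but the CodePipeline.job event structure and the put_job_success_result / put_job_failure_result calls are boto3’s real CodePipeline API:

import logging

import boto3

def lambda_handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    codepipeline = boto3.client('codepipeline')
    job_id = event['CodePipeline.job']['id']

    try:
        # UserParameters is the free-form string configured on the pipeline action.
        configuration = event['CodePipeline.job']['data']['actionConfiguration']['configuration']
        logger.info('User parameters: %s', configuration.get('UserParameters'))

        # Real work (e.g. notifying an outside system) would go here.

        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as error:
        # Always report back, or the action hangs until CodePipeline times out.
        logger.exception(error)
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={'type': 'JobFailed', 'message': str(error)}
        )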
Today’s post is a little “gotcha” that sometimes still gets me when I’m developing AWS lambda functions: if you want to stream the function’s logs to CloudWatch, the log group’s name has to follow a specific convention (/aws/lambda/<function name>).
Suppose I’m creating a lambda function with this CloudFormation snippet:
The IAM role attached to your function of course still needs permissions to send logs, and there’s another gotcha there that can lead to orphaned log groups.
It takes a few pieces to assemble a working CloudFormation Custom Resource. I like to start from a simple example and build up to what I need. Here’s the code I use as a starting point.
First, a few notes:
My custom resources are usually small, often only a few dozen lines (more than that is usually a signal that I’m implementing an anti-pattern). Because my resources are small, I just drop the code into CloudFormation’s ZipFile. That saves me from building a package and from porting my own version of cfn-response (there’s a minimal handler sketch after these notes). In more complex cases you may want to expand your lambda function resource.
When I tested the code in this article, the current version of AWS’s cfn-response module threw this warning into logs:
DeprecationWarning: You are using the put() function from 'botocore.vendored.requests'
The vendored version of requests in botocore was never really meant for use outside of botocore itself, and it was recently deprecated. AWS knows about this, but they haven’t updated their cfn-response code yet.
One of the most common problems I see in custom resource development is unhandled exceptions. Read more in my article here.
This example focuses on the minimum resources, permissions, and code for a healthy custom resource. I skipped some of the usual good practices like template descriptions and parameterized config.
I put the CloudWatch Logs Log Group in the same template as the custom resource so it was easy to see for this example. Usually I put them in a separate template because they don’t share the same lifecycle. I don’t want rollbacks or reprovisions to delete my logs.
The custom resource Type can have several different values. Check out the details here.
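To make these notes concrete, here’s a minimal sketch of the kind of handler that goes in the ZipFile. The names and logging are illustrative, but cfnresponse is the module AWS injects for functions defined inline with ZipFile, and send() with SUCCESS or FAILED is its real interface:

import logging

import cfnresponse  # Provided by AWS for functions defined inline with ZipFile.

def lambda_handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    try:
        request_type = event['RequestType']  # Create, Update, or Delete.
        logger.info('Handling %s request.', request_type)

        # Real provisioning logic would go here.

        cfnresponse.send(event, context, cfnresponse.SUCCESS, {})
    except Exception as error:
        # Always send a response, otherwise the stack hangs until CloudFormation times out.
        logger.exception(error)
        cfnresponse.send(event, context, cfnresponse.FAILED, {})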
I’ve seen this anti-pattern scattered around plenty of DevOps code, especially in AWS lambda functions:
from botocore.vendored import requests
Vendoring libraries like requests into other libraries like botocore is arguably an anti-pattern in general, but reaching into botocore and importing its vendored copy in your own code is definitely one. Here are some of the reasons:
The maintainers may un-vendor it. This just happened! In newer versions of botocore you can still import requests but all that’s left are some bits of the error handling system. If you upgrade botocore your imports will still work but you’ll get errors when you try to use requests. Like this in version 1.13.15:
>>> from botocore.vendored import requests
>>> print(requests.get('https://google.com'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'botocore.vendored.requests' has no attribute 'get'
I saw an import like this running in an AWS lambda function a few days ago; it worked, but it was showing deprecation warnings. When AWS upgrades, it’ll break entirely.
The vendored version may be outdated and have security vulnerabilities that have been patched in newer versions. Even if maintainers realize there’s a vulnerability, if they know they’re not using a vulnerable module of the package they may still not upgrade. Unless you check every usage to ensure you’re not using vulnerable modules at the vendored version, you should assume you are.
The vendored package may have been customized. This shouldn’t happen, but I’ve seen it in other packages plenty of times. Once the code has been copied into the repo it’s super easy for someone to tweak it for convenience. Vendored packages may no longer behave how you expect.
Instead of relying on botocore’s vendoring, add requests to your dependency chain like usual. Add a line to your requirements.txt file or update your setup.py. To package code with dependencies for AWS lambda functions, check out this. To add dependencies to your setup.py check out this.
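As a sketch, the swap itself is just the usual dependency workflow; requests here stands in for whatever you were importing from botocore.vendored:

# requirements.txt (or install_requires in setup.py) gains one line:
#   requests
# Then import it directly instead of reaching into botocore:
import requests

response = requests.get('https://example.com')
print(response.status_code)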
When you need to publish logs to CloudWatch (e.g. from a lambda function), you need an IAM role with access to CloudWatch. It’s tempting to use a simple policy like the one in the AWS docs. You might write a CloudFormation template like this:
Obviously, the role is too permissive: arn:aws:logs:*:*:* lets the function write to any log group in any region.
But, there’s another problem: it grants logs:CreateLogGroup.
Here’s what happens:
Launch a stack from this template
Run demo-function
Because we granted it permission, demo-function automatically creates the /aws/lambda/demo-function log group in CloudWatch Logs
Delete the stack
CloudFormation doesn’t delete the /aws/lambda/demo-function log group
CloudFormation doesn’t know about the function’s log group because it didn’t create that group, so it doesn’t know anything needs to be deleted. Unless an operator deletes it manually, it’ll live in the account forever.
It seems like we can fix that by having CloudFormation create the log group:
But, if the function still has logs:CreateLogGroup, I’ve seen race conditions where the stack deletes the group before the lambda function, and the function recreates that group before it gets deleted itself.
Plus, there aren’t any errors if you forget to define the group in CloudFormation. The stack launches. The lambda function runs. We even get logs; they’ll just be orphaned if we ever delete the stack.
That’s why it’s a problem to grant logs:CreateLogGroup. It allows lambda (or EC2 or whatever else is logging) to log into unmanaged groups.
All resources in AWS should be managed by CloudFormation (or terraform or whatever resource manager you use). Including log groups. So, you should never grant logs:CreateLogGroup except to your resource manager. Nothing else should need that permission.
And that’s the other reason: lambda doesn’t need logs:CreateLogGroup because it should be logging to groups that already exist. You shouldn’t grant permissions that aren’t needed.
Here’s the best practice: always manage your CloudWatch Logs groups and never grant permission to create those groups except to your resource manager.
#!/usr/bin/env bash
# Builds a lambda package from a single Python 3 module with pip dependencies.
# This is a modified version of the AWS packaging instructions:
# https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html#python-package-dependencies
# https://stackoverflow.com/a/246128
SCRIPT_DIRECTORY="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pushd "${SCRIPT_DIRECTORY}" > /dev/null
rm -rf .package function.zip
mkdir .package
pip install --target .package --requirement requirements.txt
pushd .package > /dev/null
zip --recurse-paths "${SCRIPT_DIRECTORY}/function.zip" .
popd > /dev/null
zip --grow function.zip function.py
popd > /dev/null
Pip-Installable Python Packages
In lambda with Python 2 this used to be harder, but in lambda with Python 3 it’s as easy as a flat module. If you’re still on Python 2, it’s time to upgrade.
File tree:
.
├── function
│   ├── __init__.py
│   └── function.py (defines a 'lambda_handler' function)
├── package.sh
└── setup.py
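The setup.py in that tree can stay tiny. Here’s a sketch of roughly what mine looks like; the name, version, and (empty) install_requires are placeholders for your own values:

from setuptools import setup

setup(
    name='function',
    version='0.1.0',
    packages=['function'],
    # Anything listed here gets installed into the zip by package.sh below.
    install_requires=[],
)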
#!/usr/bin/env bash
# Builds a lambda package from a pip-installable Python 3 package.
# This is a modified version of the AWS packaging instructions:
# https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html#python-package-dependencies
# https://stackoverflow.com/a/246128
SCRIPT_DIRECTORY="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pushd "${SCRIPT_DIRECTORY}" > /dev/null
rm -rf .package function.zip
mkdir .package
pip install --target .package .
pushd .package > /dev/null
zip --recurse-paths "${SCRIPT_DIRECTORY}/function.zip" .
popd > /dev/null
popd > /dev/null
That’s it! In the latest versions of lambda it’s pretty easy; you just have to do a little fiddling to make the scripts easy to use and to figure out the handler paths (with the layout above, the packaged version’s handler setting ends up as function.function.lambda_handler, versus function.lambda_handler for a flat module).
This way, your log processor can reference keys in a JSON object instead of splitting strings and hoping the split works right. Modern log processors are pretty good with delimited strings, but I’ve still seen them break because of things like unexpected whitespace or messages that happen to contain delimiter characters.
CloudWatch Logs reads JSON out of the box. To send it JSON from a Python lambda function, we need to do two things:
Configure a JSON handler for the root logger. I use the python-json-logger library.
Remove lambda’s default handler so messages don’t log twice (more on that below).
First, the full example. Then the details. Here’s the code:
import logging

from pythonjsonlogger import jsonlogger

def setup_logging(log_level):
    logger = logging.getLogger()

    # Testing showed lambda sets up one default handler. If there are more,
    # something has changed and we want to fail so an operator can investigate.
    assert len(logger.handlers) == 1

    logger.setLevel(log_level)

    json_handler = logging.StreamHandler()
    formatter = jsonlogger.JsonFormatter(
        fmt='%(asctime)s %(levelname)s %(name)s %(message)s'
    )
    json_handler.setFormatter(formatter)

    logger.addHandler(json_handler)
    logger.removeHandler(logger.handlers[0])

def lambda_handler(event, context):
    setup_logging(logging.DEBUG)
    logger = logging.getLogger()
    logger.info('Huzzah!')
For more details on what that assert is and how it works, check out this.
By default, CloudWatch renders this function’s JSON messages on one line:
But you can expand them into pretty-printed objects:
The expanded form is extra nice when you have large log messages or lots of fields.
As we saw in my previous post, lambda’s out-of-box setup logs delimited strings. I figured out from testing that it does this with a handler on the root logger. If we added a JSON handler and did nothing else, we’d get duplicate log messages, one from each handler. We have to call removeHandler() on the default one.
To see how this works, let’s add debug messages to our setup function (and switch to logging.DEBUG level, of course):
...

def setup_logging(log_level):
    logger = logging.getLogger()

    # Testing showed lambda sets up one default handler. If there are more,
    # something has changed and we want to fail so an operator can investigate.
    assert len(logger.handlers) == 1

    logger.setLevel(log_level)

    json_handler = logging.StreamHandler()
    formatter = jsonlogger.JsonFormatter(
        fmt='%(asctime)s %(levelname)s %(name)s %(message)s'
    )
    json_handler.setFormatter(formatter)

    logger.debug(f'Number of handlers: {len(logger.handlers)}')

    logger.debug('Adding JSON handler.')
    logger.addHandler(json_handler)

    logger.debug(f'Number of handlers: {len(logger.handlers)}')
    logger.debug('This message logs twice because there are two handlers.')

    logger.debug('Removing original handler.')
    logger.removeHandler(logger.handlers[0])

    logger.debug('This message logs once because we removed the original handler.')

...
Now our output shows what’s happening with the handlers:
The default handler logs an aws_request_id that our JSON handler doesn’t. My functions didn’t need that ID. If yours do, check out this article.
As we also saw in another previous post, one of the reasons to use Python’s logging library is that it automatically captures messages from third-party tools like boto. We still get that:
I recommend switching to logs structured as JSON. It can reduce failures and complexity in log processing, and the output is cleaner overall.
If you’re new to CodePipeline lambda actions check out this complete example first.
There’s a gotcha when writing CodePipeline lambda functions that’s easy to miss, and if you miss it your pipeline can get stuck in timeout loops you can’t cancel. Here’s how to avoid that.
This article assumes you’re familiar with CodePipeline and lambda and that you’ve granted the right IAM permissions to both. You may also want to check out lambda function logging.
This is Python 3. Python 2 is out of support.
CodePipeline uses a callback pattern for running lambda functions: it invokes the function and then waits for that function to call back with either put_job_success_result or put_job_failure_result.
But, the pipeline action takes 20 minutes to time out. The CodePipeline limits doc says lambda functions take 1 hour to time out, and that used to apply to functions that didn’t send results (I tested it). Sadly, I didn’t think to keep screenshots back then. In my latest tests it took 20 minutes:
It doesn’t matter what the lambda function’s timeout is. Mine was set to 3 seconds. We’re hitting a timeout that’s internal to CodePipeline.
At least the action’s details link gives an error saying specifically that it didn’t receive a result:
There’s a workaround. You should usually only catch specific errors that you know how to handle. It’s an anti-pattern to use except Exception. But, in this case we need to guarantee that the callback always happens. In this one situation (not in general) we need to catch all exceptions:
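Here’s a sketch of what that looks like. The deliberate ValueError and the names are just for demonstration, but put_job_success_result and put_job_failure_result are the real boto3 CodePipeline calls:

import logging

import boto3

def lambda_handler(event, context):
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    codepipeline = boto3.client('codepipeline')
    job_id = event['CodePipeline.job']['id']

    try:
        # Remove this; it's only here to exercise the failure path.
        raise ValueError('This demonstrates an unexpected failure.')
        codepipeline.put_job_success_result(jobId=job_id)
    except Exception as error:
        logger.exception(error)
        codepipeline.put_job_failure_result(
            jobId=job_id,
            failureDetails={'type': 'JobFailed', 'message': str(error)}
        )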
(logger.exception(error) logs the exception and its stack trace. Even though we’re catching all errors, we shouldn’t let them pass silently.)
Now the failure will be visible to CodePipeline and the action won’t get stuck waiting.
The failureDetails message will appear in the CodePipeline UI. We send the exception message so it’s visible to operators:
Of course, you’ll want to remove that ValueError. It’s just to demonstrate the handling.
You should use this pattern in every lambda action: catch all exceptions and return a JobFailed result to the pipeline. You can still catch more specific exceptions inside the catchall try/except, ones specific to the feature you’re implementing, but you need that catchall to ensure the result returns when the unexpected happens.
If you’re writing a regular script (i.e. not a lambda function), check out this article.
This pattern outputs traditional delimited strings. If you want to upgrade that into output structured as JSON objects, check out this article.
For those custom cases that don’t fit into Terraform or CloudFormation, a little bit of Python and some boto3 in a lambda function can save you. Lambda captures the output of both print() and logging.Logger calls into CloudWatch so it’s easy to log information about what your code is doing. When things go wrong, though, I often find that just the output I wrote doesn’t give me enough to diagnose the problem. In those cases, it’s helpful to see the log output both for your code and boto3. Here’s how you do that.
Use the logging library. It’s a Python core library that provides standard features like timestamped prefixes and support for levels (e.g. INFO or DEBUG). For simple deployment helpers this is usually all you need:
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.info('Message at the INFO level.')
logger.debug('Message at the DEBUG level.')
This sets the root logger (which sees all log messages) to the INFO level. Normally you’d have to configure the root logger, but lambda does that automatically (which is actually annoying if you need to change your formatter, but that’s for another post). Now, logger.info() calls will show up in the logs and logger.debug() calls won’t. If you increase the level to DEBUG you’ll see both.
Because logging is the standard Python way to handle log output, maintainers of libraries like boto3 use it throughout their code to show what the library is doing (and they’re usually smart about choosing what to log at each level). By setting a level on the root logger, you’re choosing which of your output to capture and which of boto3’s output to capture. Powerful when you’re diagnosing a failure.
Here’s a demo function to show how the output looks. You might notice that it puts the logger setup calls inside the handler even though the AWS docs tell you to put them just under the imports. Function calls made directly in modules (i.e. not inside functions declared within the module) are import side effects, and import side effects are an anti-pattern. I put the calls in the handler so they only run when the handler is called. This isn’t likely to matter much in a lambda function, but I like to stick to good patterns.
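Here’s a sketch of the kind of demo function I mean; the STS call is just an arbitrary boto3 call I picked so there’s some library activity to log, and at DEBUG level you’ll see boto3’s own messages alongside yours:

import logging

import boto3

def lambda_handler(event, context):
    # Setup lives in the handler, not at module level, to avoid import side effects.
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)

    logger.info('Starting.')
    # Any boto3 call works; its internal logging flows through the root logger we configured.
    identity = boto3.client('sts').get_caller_identity()
    logger.debug('Running as %s.', identity['Arn'])
    logger.info('Done.')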