Demystifying DevOps

My second week at 2nd Watch, it happened.  I was minding my own business, working to build a framework for our products and how they’d be released when suddenly, an email dinged into my inbox. I opened it and gasped. It read in sum:

  • We want to standardize on our DevOps tooling
  • We have this right now
  • We need to pick one from each bucket
  • We want to do this next week with everyone in the room, so we don’t create silos

This person sent me the CNCF Landscape and asked us to choose which tool to use from each area to enable “DevOps” in their org. Full stop.

I spent the better part of the day attempting to respond to the message without condescension or simply saying ‘no.’ I mean, the potential client had a point. He said they had a “planet of tools” and wanted to narrow it down to 1 for each area. A noble cause. They wanted a single resource from 2nd Watch who had used all the tools to help them make a decision. They had very honorable intentions while they were clearly misguided from the start.

The thing is, this situation hasn’t just come up once in my career in the DevOps space. It’s a rampant misunderstanding of the part tooling, automation, and standardization play into the DevOps transformation. Normalizing the tech stack across your enterprise wastes time and creates enemies. This is not DevOps.

In this blog, I’ll attempt to share my views on how to approach your transformation. Hint: We’re not spending time in the CNCF Landscape.

Let’s start with a definition.

DevOps is a set of cultural values and organizational practices that improve business outcomes by increasing collaboration and feedback between Business Stakeholders, Development, QA, IT Operations, and Security.

A true DevOps transformation includes an evolution of your company culture, automation and tooling, processes, collaboration, measurement systems, and organizational structure.

DevOps does not have an ‘end state’ or level of maturity. The goal is to create a culture of continuous improvement.

DevOps methods were initially formed to bridge the gap between Development and Operations so that teams could increase speed to delivery as well as quality of product at the same time. So, it does seem fair that many feel a tool can bridge this gap. However, without increasing communication and shaping company culture to enable reflection and improvement, tooling ends up being a one-time fix for a consistently changing and evolving challenge. Imagine trying to patch a hole in a boat with the best patch available while the rest of the crew is poking holes in the hull. That’s a good way to imagine how implementing tooling to fix cultural problems works.

Simply stated – No, you cannot implement DevOps by standardizing on tooling.

Why is DevOps so difficult?

Anytime you have competing priorities between departments you end up with conflict and difficulty. Development is focused on speed and new features while Operations is required to keep the systems up and running. The two teams have different goals, some of which undo each other.

For example: As the Dev team is trying to get more features to market faster, the Ops team is often causing a roadblock so that they can keep to their Service Level Agreement (SLA). An exciting new feature delivered by Dev may be unstable, causing downtime in Ops. When there isn’t a shared goal of uptime and availability, both teams suffer.

CHART: Dev vs. Ops

Breaking Down Walls

Many equate the goal of conflict between Dev and Ops to a wall. It’s a barrier between the two teams, and oftentimes Dev, in their effort to increase velocity, will throw the new feature or application over the proverbial wall to Ops and expect them to deal with running the app at scale.

Companies attempt to fix this problem by increasing communications between the two teams, though, without shared responsibility and goals, the “fast to market” Dev team and the “keep the system up and running” Ops team are still at odds.

CI/CD ≠ DevOps

I’ll admit, technology can enable your DevOps transformation, though don’t be fooled by the marketing that suggests simply implementing continuous integration and continuous delivery (CI/CD) tooling will kick-start your DevOps journey. Experience shows us that implementing automation without changing culture just results in speeding up our time to failure.

CI/CD can enable better communication, automate the toil or rework that exists, and make your developers and ops team happier. Though, it takes time and effort to train them on the tooling, evaluate processes and match or change your companies’ policies, and integrate security. One big area that is often overlooked is executive leadership acceptance and support of these changes.

The Role of Leadership in DevOps

The changes implemented during a DevOps Transformation all have a quantifiable impact on a company’s profit, productivity and market share. It is no wonder that organizational leaders and executives have impact on these changes. The challenge is their impact is often overlooked up front in the process causing a transformation to stall considerably and producing mixed results.

According to the book Accelerate: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, PhD, Jez Humble, and Gene Kim, the “characteristics of transformational leadership are highly correlated with software delivery performance.” Leaders need to up their game by enabling cross-functional collaboration, building a climate of learning, and supporting effective use and choice of tools (not choosing for the team like we saw at the beginning of this blog).

Leaders should be asking themselves upfront “how can I create a culture of learning and where do we start?” The answer to this question is different for every organization, though some similarities exist. One of the best ways to figure out where to start is to figure out where you are right now.

2nd Watch offers a 2-week DevOps Assessment and Strategy engagement that helps you identify and assess a single value stream to pinpoint the areas for growth and transformation. This is a good start to identify key areas for improvement. This assessment follows the CALMS framework originally developed by Damon Edwards and John Willis at DevOpsDays 2010 and then enhanced by Jez Humble. It’s an acronym representing the fundamental elements of DevOps:  Culture, Automation, Lean, Metrics, and Sharing.

  • Culture – a culture of shared responsibility, owned by everyone
  • Automation – eliminating toil, make automation continuous
  • Lean – visualize work in progress, limit batch sizes and manage queue lengths
  • Metrics – collect data on everything and provide visibility into systems and data
  • Sharing – encourage ongoing communication, build channels to enable this

While looking at each of these pillars we overlay the people, process and technology aspects to ensure full coverage of the methodology.

  • People:We assess your current team organizational structure, culture, and the role security plays. We also assess what training may be required to get your team off on the right foot. ​
  • Process:We review your collaboration methods, processes and best practices. ​
  • Technology:We identify any gaps in your current tool pipeline including image creation, CI/CD, monitoring, alerting, log aggregation, security and compliance and configuration management. ​

What’s Next?

DevOps is over 10 years old and it’s moving out of the phase of concept and ‘nice to have’ into necessity for companies to keep up in the digital economy. New technology like serverless, containers, and machine learning and a focus on security is shifting the landscape, making it more essential that companies adopt the growth mindset expected in a DevOps culture.

Don’t be left behind. DevOps is no longer a fad diet, it’s a lifestyle change. And if you can’t get enough of Demystifying Devops, check out our webinar on ‘Demystifying Devops: Identifying Your Path to Faster Releases and Higher Quality’ on-demand.

-Stefana Muller, Sr Product Manager


Troubleshooting VMware HCX

I was on a project recently where we had to set up VMware HCX in an environment to connect the on-premises datacenter to VMware Cloud on AWS for a Proof of Concept migration.  The workloads were varied, ranging from 100 MB to 5TB in size.  The customer wanted to stretch two L2 subnets and have the ability to migrate slowly, in waves.  During the POC, we found problems with latency between the two stretched networks and decided that the best course of action would be to NOT stretch the networks and instead, do an all-at-once migration.

While setting up this POC, I had occasion to do some troubleshooting on HCX due to connectivity issues.  I’m going to walk through some of the troubleshooting I needed to do.

The first thing we did was enable SSH on the NSX manager.  To perform this action, you go into the HCX manager appliance GUI and under Appliance Summary, start the SSH service.  Once SSH is enabled, you can then login to the appliance CLI, which is where the real troubleshooting can begin.

You’ll want to login to the appliance using “admin” as the user name and the password entered when you installed the appliance.  SU to “root” and enter the “root” password.  This gives you access to the appliance, which has a limited set of Linux commands.

You’ll want to enter the HCX Central CLI (CCLI) to use the HCX commands.  Since you’re already logged in as “root,” you just type “ccli” at the command prompt.  After you’re in the CCLI, you can type “help” to get a list of commands.

One of the first tests to run would be the Health Checker. Type “hc” at the command prompt, and the HCX manager will run through a series of tests to check on the health of the environment.

“list” will give you a list of the HCX appliances that have been deployed.

You’ll want to connect to an appliance to run the commands specific to that appliance.  As shown above, if you want to connect to the Interconnect appliance, you would type “go 0,” which would connect you to node 0.  From here, you can run a ton of commands, such as “show ipsec status,” which will show a plethora of information related to the tunnel.  Type “q” to exit this command.

You can also run the Health Check on this node from here, export a support bundle (under the debug command), and a multitude of other “show” commands.  Under the “show” command, you can get firewall information, flow runtime information, and a lot of other useful troubleshooting info.

If you need to actually get on the node and run Linux commands for troubleshooting, you’ll enter “debug remoteaccess enable,” which enables SSH on the remote node.  Then you can just type “ssh” and it will connect you to the interconnect node.

Have questions about this process? Contact us or leave a reply.

-Michael Moore, Associate Cloud Consultant


Automating Windows Server 2016 Builds with Packer

It might not be found on a curated list of urban legends, but trust me, IT IS possible (!!!) to fully automate the building and configuration of Windows virtual machines for AWS.  While Windows Server may not be your first choice of operating systems for immutable infrastructure, sometimes you as the IT professional are not given the choice due to legacy environments, limited resources for re-platforming, corporate edicts, or lack of internal *nix knowledge.  Complain all you like, but after the arguing, fretting, crying, and gnashing of teeth is done, at some point you will still need to build an automated deployment pipeline that requires Windows Server 2016.  With some PowerShell scripting and HashiCorp Packer, it is relatively easy and painless to securely build and configure Windows AMIs for your particular environment.

Let’s dig into an example of how to build a custom configured Windows Server 2016 AMI.  You will need access to an AWS account and have sufficient permissions to create and manage EC2 instances.  If you need an AWS account, you can create one for free.

I am using VS Code and built-in terminal with Windows Subsystem for Linux to create the Packer template and run Packer, however Packer is available for several many Linux distros, Mac OS, and Windows.

First, download and unzip Packer:

~/packer-demo:\> wget 
Resolving (,,, ... 
Connecting to (||:443... connected. 
HTTP request sent, awaiting response... 200 OK 
Length: 28851840 (28M) [application/zip] 
Saving to: ‘’ 
‘’ saved [28851840/28851840] 
~/packer-demo:\> unzip 
 inflating: packer

Now that we have Packer unzipped, verify that it is executable by checking the version:

~/packer-demo:\> packer --version

Packer can build machine images for a number of different platforms.  We will focus on the amazon-ebs builder, which will create an EBS-backed AMI.  At a high level, Packer performs these steps:

  1. Read configuration settings from a json file
  2. Uses the AWS API to stand up an EC2 instance
  3. Connect to the instance and provision it
  4. Shut down and snapshot the instance
  5. Create an Amazon Machine Image (AMI) from the snapshot
  6. Clean up the mess

The amazon-ebs builder can create temporary keypairs, security group rules, and establish a basic communicator for provisioning.  However, in the interest of tighter security and control, we will want to be prescriptive for some of these settings and use a secure communicator to reduce the risk of eavesdropping while we provision the machine.

There are two communicators Packer uses to upload scripts and files to a virtual machine: SSH (the default) and WinRM.  While there is a nice Win32 port of OpenSSH for Windows, it is not currently installed by default on Windows machines, but WinRM is available natively in all current versions of Windows, so we will use that to provision our Windows Server 2016 machine.

Let’s create and edit the userdata file that Packer will use to bootstrap the EC2 instance:

Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope LocalMachine -Force -ErrorAction Ignore
$ErrorActionPreference = "stop"
# Remove any existing Windows Management listeners
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse
# Create self-signed cert for encrypted WinRM on port 5986
$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "packer-ami-builder"
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force
# Configure WinRM
cmd.exe /c winrm quickconfig -q
cmd.exe /c winrm set "winrm/config" '@{MaxTimeoutms="1800000"}'
cmd.exe /c winrm set "winrm/config/winrs" '@{MaxMemoryPerShellMB="1024"}'
cmd.exe /c winrm set "winrm/config/service" '@{AllowUnencrypted="false"}'
cmd.exe /c winrm set "winrm/config/client" '@{AllowUnencrypted="false"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/client/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{CredSSP="true"}'
cmd.exe /c winrm set "winrm/config/listener?Address=*+Transport=HTTPS" "@{Port=`"5986`";Hostname=`"packer-ami-builder`";CertificateThumbprint=`"$($Cert.Thumbprint)`"}"
cmd.exe /c netsh advfirewall firewall add rule name="WinRM-SSL (5986)" dir=in action=allow protocol=TCP localport=5986
cmd.exe /c net stop winrm
cmd.exe /c sc config winrm start= auto
cmd.exe /c net start winrm

There are four main things going on here:

  1. Set the execution policy and error handling for the script (if an error is encountered, the script terminates immediately)
  2. Clear out any existing WS Management listeners, just in case there are any preconfigured insecure listeners
  3. Create a self-signed certificate for encrypting the WinRM communication channel, and then bind it to a WS Management listener
  4. Configure WinRM to:
    • Require an encrypted (SSL) channel
    • Enable basic authentication (usually this not secure as the password goes across in plain text, but we are forcing encryption)
    • Configure the listener to use port 5986 with the self-signed certificate we created earlier
    • Add a firewall rule to open port 5986

Now that we have userdata to bootstrap WinRM, let’s create a Packer template:

~/packer-demo:\> touch windows2016.json

Open this file with your favorite text editor and add this text:

    "variables": {
        "build_version": "{{isotime \"2006.01.02.150405\"}}",
        "aws_profile": null,
        "vpc_id": null,
        "subnet_id": null,
        "security_group_id": null
    "builders": [
            "type": "amazon-ebs",
            "region": "us-west-2",
            "profile": "{{user `aws_profile`}}",
            "vpc_id": "{{user `vpc_id`}}",
            "subnet_id": "{{user `subnet_id`}}",
            "security_group_id": "{{user `security_group_id`}}",
            "source_ami_filter": {
                "filters": {
                    "name": "Windows_Server-2016-English-Full-Base-*",
                    "root-device-type": "ebs",
                    "virtualization-type": "hvm"
                "most_recent": true,
                "owners": [
            "ami_name": "WIN2016-CUSTOM-{{user `build_version`}}",
            "instance_type": "t3.xlarge",
            "user_data_file": "userdata.ps1",
            "associate_public_ip_address": true,
            "communicator": "winrm",
            "winrm_username": "Administrator",
            "winrm_port": 5986,
            "winrm_timeout": "15m",
            "winrm_use_ssl": true,
            "winrm_insecure": true

Couple of things to call out here.  First, the variables block at the top references some values that are needed in the template.  Insert values here that are specific to your AWS account and VPC:

  • aws_profile: I use a local credentials file to store IAM user credentials (the file is shared between WSL and Windows). Specify the name of a credential block that Packer can use to connect to your account.  The IAM user will need permissions to create and modify EC2 instances, at a minimum
  • vpc_id: Packer will stand up the instance in this VPC
  • aws_region: Your VPC should be in this region. As an exercise, change this value to be set by a variable instead
  • user_data_file: We created this file earlier, remember? If you saved it in another location, make sure the path is correct
  • subnet_id: This should belong to the VPC specified above. Use a public subnet if needed if you do not have a Direct Connect
  • security_group_id: This security group should belong to the VPC specified above. This security group will be attached to the instance that Packer stands up.  It will need, at a minimum, inbound TCP 5986 from where Packer is running

Next, let’s validate the template to make sure the syntax is correct, and we have all required fields:

~/packer-demo:\> packer validate windows2016.json
Template validation failed. Errors are shown below.
Errors validating build 'amazon-ebs'. 1 error(s) occurred:
* An instance_type must be specified

Whoops, we missed instance_type.  Add that to the template and specify a valid EC2 instance type.  I like using beefier instance types so that the builds get done quicker, but that’s just me.

            "ami_name": "WIN2016-CUSTOM-{{user `build_version`}}",
            "instance_type": "t3.xlarge",
            "user_data_file": "userdata.ps1",

Now validate again:

~/packer-demo:\> packer validate windows2016.json
Template validated successfully.

Awesome.  Couple of things to call out in the template:

  • source_ami_filter: we are using a base Amazon AMI for Windows Server 2016 server. Note the wildcard in the AMI name (Windows_Server-2016-English-Full-Base-*) and the owner (801119661308).  This filter will always pull the most recent match for the AMI name.  Amazon updates its AMIs about once a month
  • associate_public_ip_address: I specify true because I don’t have Direct Connect or VPN from my workstation to my sandbox VPC. If you are building your AMI in a public subnet and want to connect to it over the internet, do the same
  • ami_name: this is a required property, and it must also be unique within your account. We use a special function here (isotime) to generate a timestamp, ensuring that the AMI name will always be unique (e.g. WIN2016-CUSTOM-2019.03.01.000042)
  • winrm_port: the default unencrypted port for WinRM is 5985. We disabled 5985 in userdata and enabled 5986, so be sure to call it out here
  • winrm_use_ssl: as noted above, we are encrypting communication, so set this to true
  • winrm_insecure: this is a rather misleading property name. It really means that Packer should not check if encryption certificate is trusted.  We are using a self-signed certificate in userdata, so set this to true to skip certificate validation

Now that we have our template, let’s inspect it to see what will happen:

~/packer-demo:\> packer inspect windows2016.json
Optional variables and their defaults:
  aws_profile       = default
  build_version     = {{isotime "2006.01.02.150405"}}
  security_group_id = sg-0e1ca9ba69b39926
  subnet_id         = subnet-00ef2a1df99f20c23
  vpc_id            = vpc-00ede10ag029c31e0
Note: If your build names contain user variables or template
functions such as 'timestamp', these are processed at build time,
and therefore only show in their raw form here.

Looks good, but we are missing a provisioner!  If we ran this template as is, all that would happen is Packer would make a copy of the base Windows Server 2016 AMI and make you the owner of the new private image.

There are several different Packer provisioners, ranging from very simple (the windows-restart provisioner just reboots the machine) to complex (the chef client provisioner installs chef client and does whatever Chef does from there).  We’ll run a basic powershell provisioner to install IIS on the instance, reboot the instance using windows-restart, and then we’ll finish up the provisioning by executing sysprep on the instance to generalize it for re-use.

We will use the PowerShell cmdlet Enable-WindowsOptionalFeature to install the web server role and IIS with defaults, add this text to your template below the builders [ ] section:

"provisioners": [
            "type": "powershell",
            "inline": [
                "Enable-WindowsOptionalFeature -Online -FeatureName IIS-WebServerRole",
                "Enable-WindowsOptionalFeature -Online -FeatureName IIS-WebServer"
            "type": "windows-restart",
            "restart_check_command": "powershell -command \"& {Write-Output 'Machine restarted.'}\""
            "type": "powershell",
            "inline": [
                "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
                "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"

Couple things to call out about this section:

  • The first powershell provisioner uses the inline Packer simply appends these lines in order into a file, then transfers and executes the file on the instance using PowerShell
  • The second provisioner, windows-restart, simply reboots the machine while Packer waits. While this isn’t always necessary, it is helpful to catch instances where settings do not persist after a reboot, which was probably not your intention
  • The final powershell provisioner executes two PowerShell scripts that are present on Amazon Windows Server 2016 AMIs and part of the EC2Launch application (earlier versions of Windows use a different application called EC2Config). They are helper scripts that you can use to prepare the machine for generalization, and then execute sysprep

After validating your template again, let’s build the AMI!

~/packer-demo:\> packer validate windows2016.json
Template validated successfully.
~/packer-demo:\> packer build windows2016.json
amazon-ebs output will be in this color.
==> amazon-ebs: Prevalidating AMI Name: WIN2016-CUSTOM-2019.03.01.000042
    amazon-ebs: Found Image ID: ami-0af80d239cc063c12
==> amazon-ebs: Creating temporary keypair: packer_5c78762a-751e-2cd7-b5ce-9eabb577e4cc
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
    amazon-ebs: Adding tag: "Name": "Packer Builder"
    amazon-ebs: Instance ID: i-0384e1edca5dc90e5
==> amazon-ebs: Waiting for instance (i-0384e1edca5dc90e5) to become ready...
==> amazon-ebs: Waiting for auto-generated password for instance...
    amazon-ebs: It is normal for this process to take up to 15 minutes,
    amazon-ebs: but it usually takes around 5. Please wait.
    amazon-ebs: Password retrieved!
==> amazon-ebs: Using winrm communicator to connect:
==> amazon-ebs: Waiting for WinRM to become available...
    amazon-ebs: WinRM connected.
    amazon-ebs: #> CLIXML
    amazon-ebs: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first 
use.0-1-1Completed-1 1Preparing modules for first use.0-1-1Completed-1 
==> amazon-ebs: Connected to WinRM!
==> amazon-ebs: Provisioning with Powershell...
==> amazon-ebs: Provisioning with powershell script: /tmp/packer-powershell-provisioner561623209
    amazon-ebs: Path          :
    amazon-ebs: Online        : True
    amazon-ebs: RestartNeeded : False
    amazon-ebs: Path          :
    amazon-ebs: Online        : True
    amazon-ebs: RestartNeeded : False
==> amazon-ebs: Restarting Machine
==> amazon-ebs: Waiting for machine to restart...
    amazon-ebs: Machine restarted.
    amazon-ebs: EC2AMAZ-LJV703F restarted.
    amazon-ebs: #> CLIXML
    amazon-ebs: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first 
==> amazon-ebs: Machine successfully restarted, moving on
==> amazon-ebs: Provisioning with Powershell...
==> amazon-ebs: Provisioning with powershell script: /tmp/packer-powershell-provisioner928106343
    amazon-ebs: TaskPath                                       TaskName                          State
    amazon-ebs: --------                                       --------                          -----
    amazon-ebs: \                                              Amazon Ec2 Launch - Instance I... Ready
==> amazon-ebs: Stopping the source instance...
    amazon-ebs: Stopping instance, attempt 1
==> amazon-ebs: Waiting for the instance to stop...
==> amazon-ebs: Creating unencrypted AMI WIN2016-CUSTOM-2019.03.01.000042 from instance i-0384e1edca5dc90e5
    amazon-ebs: AMI: ami-0d6026ecb955cc1d6
==> amazon-ebs: Waiting for AMI to become ready...
==> amazon-ebs: Terminating the source AWS instance...
==> amazon-ebs: Cleaning up any extra volumes...
==> amazon-ebs: No volumes to clean up, skipping
==> amazon-ebs: Deleting temporary keypair...
Build 'amazon-ebs' finished.
==> Builds finished. The artifacts of successful builds are:
==> amazon-ebs: AMIs were created:
us-west-2: ami-0d6026ecb955cc1d6

The entire process took approximately 5 minutes from start to finish.  If you inspect the output, you can see where the first PowerShell provisioner installed IIS (Provisioning with powershell script: /tmp/packer-powershell-provisioner561623209), where sysprep was executed (Provisioning with powershell script: /tmp/packer-powershell-provisioner928106343) and where Packer yielded the AMI ID for the image (ami-0d6026ecb955cc1d6) and cleaned up any stray artifacts.

Note: most of the issues that occur during a Packer build are related to firewalls and security groups.  Verify that the machine that is running Packer can reach the VPC and EC2 instance using port 5986.  You can also use packer build -debug win2016.json to step through the build process manually, and you can even connect to the machine via RDP to troubleshoot a provisioner.

Now, if you launch a new EC2 instance using this AMI, it will already have IIS installed and configured with defaults.  Building and securely configuring Windows AMIs with Packer is easy!

For help getting started securely building and configuring Windows AMIs for your particular environment, contact us.

-Jonathan Eropkin, Cloud Consultant


How to use waiters in boto3 (And how to write your own!)

What is boto3?

Boto3 is the python SDK for interacting with the AWS api. Boto3 makes it easy to use the python programming language to manipulate AWS resources and automation infrastructure.

What are boto3 waiters and how do I use them?

A number of requests in AWS using boto3 are not instant. Common examples of boto3 requests are deploying a new server or RDS instance. For some long running requests, we are ok to initiate the request and then check for completion at some later time. But in many cases, we want to wait for the request to complete before we move on to the subsequent parts of the script that may rely on a long running process to have been completed. One example would be a script that might copy an AMI to another account by sharing all the snapshots. After sharing the snapshots to the other account, you would need to wait for the local snapshot copies to complete before registering the AMI in the receiving account. Luckily a snapshot completed waiter already exists, and here’s what that waiter would look like in Python:

As far as the default configuration for the waiters and how long they wait, you can view the information in the boto3 docs on waiters, but it’s 600 seconds in most cases. Each one is configurable to be as short or long as you’d like.

Writing your own custom waiters.

As you can see, using boto3 waiters is an easy way to setup a loop that will wait for completion without having to write the code yourself. But how do you find out if a specific waiter exists? The easiest way is to explore the particular boto3 client on the docs page and check out the list of waiters at the bottom. Let’s walk through the anatomy of a boto3 waiter. The waiter is actually instantiated in botocore and then abstracted to boto3. So looking at the code there we can derive what’s needed to generate our own waiter:

  1. Waiter Name
  2. Waiter Config
    1. Delay
    2. Max Attempts
    3. Operation
    4. Acceptors
  3. Waiter Model

The first step is to name your custom waiter. You’ll want it to be something descriptive and, in our example, it will be “CertificateIssued”. This will be a waiter that waits for an ACM certificate to be issued (Note there is already a CertificateValidated waiter, but this is only to showcase the creation of the waiter). Next we pick out the configuration for the waiter which boils down to 4 parts. Delay is the amount of time it will take between tests in seconds. Max Attempts is how many attempts it will try before it fails. Operation is the boto3 client operation that you’re using to get the result your testing. In our example, we’re calling “DescribeCertificate”. Acceptors is how we test the result of the Operation call. Acceptors are probably the most complicated portion of the configuration. They determine how to match the response and what result to return. Acceptors have 4 parts: Matcher, Expected, Argument, State.

  • State: This is what the acceptor will return based on the result of the matcher function.
  • Expected: This is the expected response that you want from the matcher to return this result.
  • Argument: This is the argument sent to the matcher function to determine if the result is expected.
  • Matcher: Matchers come in 5 flavors. Path, PathAll, PathAny, Status, and Error. The Status and Error matchers will effectively check the status of the HTTP response and check for an error respectively. They return failure states and short circuit the waiter so you don’t have to wait until the end of the time period when the command has already failed. The Path matcher will match the Argument to a single expected result. In our example, if you run DescribeCertificate you would get back a “Certificate.Status” as a result. Taking that as an argument, the desired expected result would be “ISSUED”. Notice that if the expected result is “PENDING_VALIDATION” we set the state to “retry” so it will continue to keep trying for the result we want. The PathAny/PathAll matchers work with operations that return a python list result. PathAny will match if any item in the list matches, PathAll will match if all the items in the list match.

Once we have the configuration complete, we feed this into the Waiter Model call the “create_waiter_with_client” request. Now our custom waiter is ready to wait. If you need more examples of waiters and how they are configured, check out the botocore github and poke through the various services. If they have waiters configured, they will be in a file called waiters-2.json. Here’s the finished code for our customer waiter.

And that’s it. Custom waiters will allow you to create automation in a series without having to build redundant code of complicated loops. Have questions about writing custom waiters or boto3? Contact us

-Coin Graham, Principal Cloud Consultant


Using Docker Containers to Move Your Internal IT Orgs Forward

Many people are looking to take advantage of containers to isolate their workloads on a single system. Unlike traditional hypervisor-based virtualization, which utilizes the same operating system and packages, Containers allow you to segment off multiple applications with their own set of processes on the same instance.

Let’s walk through some grievances that many of us have faced at one time or another in our IT organizations:

Say, for example, your development team is setting up a web application. They want to set up a traditional 3 tier system with an app, database, and web servers. They notice there is a lot of support in the open source community for their app when it is run on Ubuntu Trusty (Ubuntu 14.04 LTS) and later. They’ve developed the app in their local sandbox with an Ubuntu image they downloaded, however, their company is a RedHat shop.

Now, depending on the type of environment you’re in, chances are you’ll have to wait for the admins to provision an environment for you. This often entails (but is not limited to) spinning up an instance, reviewing the most stable version of the OS, creating a new hardened AMI, adding it to Packer, figuring out which configs to manage, and refactoring provisioning scripts to utilize aptitude and Ubuntu’s directory structure (e.g Debian has over 50K packages to choose from and manage). In addition to that, the most stable version of Ubuntu is missing some newer packages that you’ve tested in your sandbox that need to be pulled from source or another repository. At this point, the developers are procuring configuration runbooks to support the app while the admin gets up to speed with the OS (not significant but time-consuming nonetheless).

You can see my point here. A significant amount of overhead has been introduced, and it’s stagnating development. And think about the poor sysadmins. They have other environments that they need to secure, networking spaces to manage, operations to improve, and existing production stacks they have to monitor and support while getting bogged down supporting this app that is still in the very early stages of development. This could mean that mission-critical apps are potentially losing visibility and application modernization is stagnating. Nobody wins in this scenario.

Now let us revisit the same scenario with containers:

I was able to run my Jenkins build server and an NGINX web proxy, both running on a hardened CentOS7 AMI provided by the Systems Engineers with docker installed.  From there I executed a docker pull  command pointed at our local repository and deployed two docker images with Debian as the underlying OS.

$ docker pull
$ docker pull


$ docker ps

7478020aef37   “/sbin/tini — /us …”  16 minutes ago   Up 16 minutes ago  8080/tcp,>80/tcp, 50000/tcp jenkins

d68e3b96071e “nginx -g ‘daemon of…” 16 minutes ago Up 16 minutes>80/tcp,>443/tcp nginx

$ sudo systemctl status jenkins-docker

jenkins-docker.service – Jenkins
Loaded: loaded (/etc/systemd/system/jenkins-docker.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2018-11-08 17:38:06 UTC; 18min ago
Process: 2006 ExecStop=/usr/local/bin/jenkins-docker stop (code=exited, status=0/SUCCESS)

The processes above were executed on the actual instance. Note how I’m able to execute a cat of the OS release file from within the container

sudo docker exec d68e3b96071e cat /etc/os-release
PRETTY_NAME=”Debian GNU/Linux 9 (stretch)”
NAME=”Debian GNU/Linux”
VERSION=”9 (stretch)”

I was able to do so because Docker containers do not have their own kernel, but rather share the kernel of the underlying host via linux system calls (e.g setuid, stat, umount, ls) like any other application. These system calls (or syscalls for short) are standard across kernels, and Docker supports version 3.10 and higher. In the event older syscalls are deprecated and replaced with new ones, you can update the kernel of the underlying host, which can be done independently of an OS upgrade. As far as containers go, the binaries and aptitude management tools are the same as if you installed Ubuntu on an EC2 instance (or VM).

Q: But I’m running a windows environment. Those OS’s don’t have a kernel. 

Yes, developers may want to remove cost overhead associated with Windows licenses by exploring running their apps on Linux OS. Others may simply want to modernize their .NET applications by testing out the latest versions on Containers. Docker allows you to run Linux VM’s on Windows 10 and Server 2016. As docker was written to initially execute on Linux distributions, in order to take advantage of multitenant hosting, you will have to run Hyper-V containers, which provision a thin VM on top of your hosts. You can then manage your mixed environment of Windows and Linux hosts via the –isolate option. More information can be found in the Microsoft and Docker documentation.


IT teams need to be able to help drive the business forward. Newer technologies and security patches are procured on a daily basis. Developers need to be able to freely work on modernizing their code and applications. Concurrently, Operations needs to be able to support and enhance the pipelines and platforms that get the code out faster and securely. Leveraging Docker containers in conjunction with these pipelines further helps to ensure these are both occurring in parallel without the unnecessary overhead. This allows teams to work independently in the early stages of the development cycle and yet more collaboratively to get the releases out the door.

For help getting started leveraging your environment to take advantage of containerization, contact us.

-Sabine Blair, Systems Engineer & Cloud Consultant


Fully Coded And Automated CI/CD Pipelines: The Weeds

The Why

In my last post we went over why we’d want to go the CI/CD/Automated route and the more cultural reasons of why it is so beneficial. In this post, we’re going to delve a little bit deeper and examine the technical side of tooling. Remember, a primary point of doing a release is mitigating risk. CI/CD is all about mitigating risk… fast.

There’s a Process

The previous article noted that you can’t do CI/CD without building on a set of steps, and I’m going to take this approach here as well. Unsurprisingly, we’ll follow the steps we laid out in the “Why” article, and tackle each in turn.

Step I: Automated Testing

You must automate your testing. There is no other way to describe this. In this particular step however, we can concentrate on unit testing: Testing the small chunks of code you produce (usually functions or methods). There’s some chatter about TDD (Test Driven Development) vs BDD (Behavior Driven Development) in the development community, but I don’t think it really matters, just so long as you are writing test code along side your production code. On our team, we prefer the BDD style testing paradigm. I’ve always liked the symantically descriptive nature of BDD testing over strictly code-driven ones. However, it should be said that both are effective and any is better than none, so this is more of a personal preference. On our team we’ve been coding in golang, and our BDD framework of choice is the Ginkgo/Gomega combo.

Here’s a snippet of one of our tests that’s not entirely simple:

Describe("IsValidFormat", func() {
  for _, check := range AvailableFormats {
    Context("when checking "+check, func() {
      It("should return true", func() {
  Context("when checking foo", func() {
    It("should return false", func() {

So as you can see, the Ginkgo (ie: BDD) formatting is pretty descriptive about what’s happening. I can instantly understand what’s expected. The function IsValidFormat, should return true given the range (list) of AvailableFormats. A format of foo (which is not a valid format) should return false. It’s both tested and understandable to the future change agent (me or someone else).

Step II: Continuous Integration

Continuous Integration takes Step 1 further, in that it brings all the changes to your codebase to a singular point, and building an object for deployment. This means you’ll need an external system to automatically handle merges / pushes. We use Jenkins as our automation server, running it in Kubernetes using the Pipeline style of job description. I’ll get into the way we do our builds using Make in a bit, but the fact we can include our build code in with our projects is a huge win.

Here’s a (modified) Jenkinsfile we use for one of our CI jobs:

def notifyFailed() {
  slackSend (color: '#FF0000', message: "FAILED: '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})")
  label: 'fooProject-build',
  containers: [
      name: 'jnlp',
      image: '',
      args: '${computer.jnlpmac} ${}',
      alwaysPullImage: true,
      name: 'image-builder',
      image: '',
      ttyEnabled: true,
      alwaysPullImage: true,
      command: 'cat'
  volumes: [
      hostPath: '/var/run/docker.sock',
      mountPath: '/var/run/docker.sock'
      hostPath: '/home/jenkins/workspace/fooProject',
      mountPath: '/home/jenkins/workspace/fooProject'
      secretName: 'jenkins-creds-for-aws',
      mountPath: '/home/jenkins/.aws-jenkins'
      hostPath: '/home/jenkins/.aws',
      mountPath: '/home/jenkins/.aws'
  node ('fooProject-build') {
    try {
      checkout scm
      wrap([$class: 'AnsiColorBuildWrapper', 'colorMapName': 'XTerm']) {
          stage('Prep') {
            sh '''
              cp /home/jenkins/.aws-jenkins/config /home/jenkins/.aws/.
              cp /home/jenkins/.aws-jenkins/credentials /home/jenkins/.aws/.
              make get_images
          stage('Unit Test'){
            sh '''
              make test
              make profile
            $class:              'CoberturaPublisher',
            autoUpdateHealth:    false,
            autoUpdateStability: false,
            coberturaReportFile: 'report.xml',
            failUnhealthy:       false,
            failUnstable:        false,
            maxNumberOfBuilds:   0,
            sourceEncoding:      'ASCII',
            zoomCoverageChart:   false
          stage('Build and Push Container'){
            sh '''
              make push
        container('image-builder') {
          sh '''
            make deploy_integration
            make toggle_integration_service
        try {
          wrap([$class: 'AnsiColorBuildWrapper', 'colorMapName': 'XTerm']) {
            container('image-builder') {
              sh '''
                sleep 45
                export KUBE_INTEGRATION=https://fooProject-integration
                export SKIP_TEST_SERVER=true
                make integration
        } catch(e) {
            sh '''
              make clean
      stage('Deploy to Production'){
        container('image-builder') {
          sh '''
            make clean
            make deploy_dev
    } catch(e) {
        sh '''
          make clean
      currentBuild.result = 'FAILED'

There’s a lot going on here, but the important part to notice is that I grabbed this from the project repo. The build instructions are included with the project itself. It’s creating an artifact, running our tests, etc. But it’s all part of our project code base. It’s checked into git. It’s code like all the other code we mess with. The steps are somewhat inconsequential for this level of topic, but it works. We also have it setup to run when there’s a push to github (AND nightly). This ensures that we are continuously running this build and integrating everything that’s happened to the repo in a day. It helps us keep on top of all the possible changes to the repo as well as our environment.

Hey… what’s all that make_ crap?_


Our team uses a lot of tools. We ascribe to the maxim: Use what’s best for the particular situation. I can’t remember every tool we use. Neither can my teammates. Neither can 90% of the people that “do the devops.” I’ve heard a lot of folks say, “No! We must solidify on our toolset!” Let your teams use what they need to get the job done the right way. Now, the fear of experiencing tool “overload” seems like a legitimate one in this scenario, but the problem isn’t the number of tools… it’s how you manage and use use them.

Enter Makefiles! (aka: make)

Make has been a mainstay in the UNIX world for a long time (especially in the C world). It is a build tool that’s utilized to help satisfy dependencies, create system-specific configurations, and compile code from various sources independent of platform. This is fantastic, except, we couldn’t care less about that in the context of our CI/CD Pipelines. We use it because it’s great at running “buildy” commands.

Make is our unifier. It links our Jenkins CI/CD build functionality with our Dev functionality. Specifically, opening up the docker port here in the Jenkinsfile:

volumes: [
    hostPath: '/var/run/docker.sock',
    mountPath: '/var/run/docker.sock'

…allows us to run THE SAME COMMANDS WHEN WE’RE DEVELOPING AS WE DO IN OUR CI/CD PROCESS. This socket allows us to run containers from containers, and since Jenkins is running on a container, this allows us to run our toolset containers in Jenkins, using the same commands we’d use in our local dev environment. On our local dev machines, we use docker nearly exclusively as a wrapper to our tools. This ensures we have library, version, and platform consistency on all of our dev environments as well as our build system. We use containers for our prod microservices so production is part of that “chain of consistency” as well. It ensures that we see consistent behavior across the horizon of application development through production. It’s a beautiful thing! We use the Makefile as the means to consistently interface with the docker “tool” across differing environments.

Ok, I know your interest is peaked at this point. (Or at least I really hope it is!)
So here’s a generic makefile we use for many of our projects:

CONTAINER=$(shell basename $$PWD | sed -E 's/^ia-image-//')
.PHONY: install install_exe install_test_exe deploy test
    docker pull$(CONTAINER)
    docker tag$(CONTAINER):latest $(CONTAINER):latest
    if [[ ! -d $(HOME)/bin ]]; then mkdir -p $(HOME)/bin; fi
    echo "docker run -itP -v \$$PWD:/root $(CONTAINER) \"\$$@\"" > $(HOME)/bin/$(CONTAINER)
    chmod u+x $(HOME)/bin/$(CONTAINER)
    if [[ ! -d $(HOME)/bin ]]; then mkdir -p $(HOME)/bin; fi
    echo "docker run -itP -v \$$PWD:/root $(CONTAINER)-test \"\$$@\"" > $(HOME)/bin/$(CONTAINER)
    chmod u+x $(HOME)/bin/$(CONTAINER)
    docker build -t $(CONTAINER)-test .
    captain push

This is a Makefile we use to build our tooling images. It’s much simpler than our project Makefiles, but I think this illustrates how you can use Make to wrap EVERYTHING you use in your development workflow. This also allows us to settle on similar/consistent terminology between different projects. %> make test? That’ll run the tests regardless if we are working on a golang project or a python lambda project, or in this case, building a test container, and tagging it as whatever-test. Make unifies “all the things.”

This also codifies how to execute the commands. ie: what arguments to pass, what inputs etc. If I can’t even remember the name of the command, I’m not going to remember the arguments. To remedy, I just open up the Makefile, and I can instantly see.

Step III: Continuous Deployment

After the last post (you read it right?), some might have noticed that I skipped the “Delivery” portion of the “CD” pipeline. As far as I’m concerned, there is no “Delivery” in a “Deployment” pipeline. The “Delivery” is the actual deployment of your artifact. Since the ultimate goal should be Depoloyment, I’ve just skipped over that intermediate step.

Okay, sure, if you want to hold off on deploying automatically to Prod, then have that gate. But Dev, Int, QA, etc? Deployment to those non-prod environments should be automated just like the rest of your code.

If you guessed we use make to deploy our code, you’d be right! We put all our deployment code with the project itself, just like the rest of the code concerning that particular object. For services, we use a Dockerfile that describes the service container and several yaml files (e.g. deployment_<env>.yaml) that describe the configurations (e.g. ingress, services, deployments) we use to configure and deploy to our Kubernetes cluster.

Here’s an example:

apiVersion: extensions/v1beta1
kind: Deployment
    app: sweet-aws-service
    stage: dev
  name: sweet-aws-service-dev
  namespace: sweet-service-namespace
  replicas: 1
        app: sweet-aws-service
      name: sweet-aws-service
      - name: sweet-aws-service
        imagePullPolicy: Always
          - name: PORT
            value: "50000"
          - name: TLS_KEY
                name: grpc-tls
                key: key
          - name: TLS_CERT
                name: grpc-tls
                key: cert

This is an example of a deployment into Kubernetes for dev. That %> make deploy_dev from the Jenkinsfile above? That’s pushing this to our Kubernetes cluster.


There is a lot of information to take in here, but there are two points to really take home:

  1. It is totally possible.
  2. Use a unifying tool to… unify your tools. (“one tool to rule them all”)

For us, Point 1 is moot… it’s what we do. For Point 2, we use Make, and we use Make THROUGH THE ENTIRE PROCESS. I use Make locally in dev and on our build server. It ensures we’re using the same commands, the same containers, the same tools to do the same things. Test, integrate (test), and deploy. It’s not just about writing functional code anymore. It’s about writing a functional process to get that code, that value, to your customers!

And remember, as with anything, this stuff get’s easier with practice. So once you start doing it you will get the hang of it and life becomes easier and better. If you’d like some help getting started, download our datasheet to learn about our Modern CI/CD Pipeline.

-Craig Monson, Sr Automation Architect


jQuery(document).ready(function(){ jQuery("#videotest").prop('muted', true); jQuery("#videotest").prop('autoplay', true); });