
Think Cloud portable: Let Applications drive the Model

In our last intro to Modeling with Juju post we didn’t pay any attention to the hardware needed to run our workloads. We ran with Juju’s default values for the hardware characteristics of the cloud instances. Rather than define a bunch of YAML about each machine needed to run the infrastructure, Juju picks sane defaults that allow you to prototype and test things out with a lot less work.

The reason this is important is that Juju is a modeling tool. When it comes time to make decisions about your hardware infrastructure, it’s key to focus on the needs of the Applications in your Model. If we’re going to operate PostgreSQL in production then we need to look at the needs of PostgreSQL. How much disk space do we need to adequately store our data and leave room for backups and the like? How much CPU power do we need in our PostgreSQL instance? How much memory is required to give PostgreSQL the best chance of serving requests from data already in memory?

All of these questions fall under the hardware characteristics required to run a specific Application, and the answers will be different for each Application we operate as part of our infrastructure. Juju shines here because it’s built around the Cloud idea of infrastructure as an API. There are no plugins required. Whenever you deploy an Application, Juju natively makes the underlying Cloud API calls to make sure you get the right type of hardware. Infrastructure is allocated as needed and there's no requirement to pre-load software on the machines before you use them.

Juju provides a tool called “Constraints” to manage these details in the Model. In their simplest form, Constraints are just options you can pass during the deploy command. Let’s compare the outcome of these two commands:

$ juju deploy postgresql
$ juju deploy postgresql pgsql-constrained --constraints mem=32G

$ juju status
…
App                Version  Status  Scale  Charm       Store       Rev  OS      Notes
pgsql-constrained  9.5.9    active      1  postgresql  jujucharms  163  ubuntu
postgresql         9.5.9    active      1  postgresql  jujucharms  163  ubuntu

Unit                  Workload  Agent  Machine  Public address   Ports     Message
pgsql-constrained/0*  active    idle   1        35.190.190.119   5432/tcp  Live master (9.5.9)
postgresql/0*         active    idle   0        104.196.135.211  5432/tcp  Live master (9.5.9)

Machine  State    DNS              Inst id        Series  AZ          Message
0        started  104.196.135.211  juju-dd8186-0  xenial  us-east1-b  RUNNING
1        started  35.190.190.119   juju-dd8186-1  xenial  us-east1-c  RUNNING

From juju status, these two commands don’t look like they did anything very different. They’ve both introduced a PostgreSQL server into our Model, available to take part in Relations to other Applications and provide them with a database to use. Juju encourages focusing on what’s running at the Application level over the minutiae of hardware. However, if we compare the machine details we can see these commands definitely changed PostgreSQL’s ability to perform for us.

$ juju show-machine 0
machines:
  "0":
    ...
    hardware: arch=amd64 cores=1 cpu-power=138 mem=1700M root-disk=10240M availability-zone=us-east1-b

$ juju show-machine 1
machines:
  "1":
    ...
    constraints: mem=32768M
    hardware: arch=amd64 cores=8 cpu-power=2200 mem=52000M root-disk=10240M availability-zone=us-east1-c

Notice the different output? The constrained machine knows that the Model states there’s an active Constraint. We want 32G of memory for this PostgreSQL Application, and if we were to run a juju add-unit command the next machine would know there’s an active Constraint and respect it. In this way Constraints are part of the active Model: even if we were to remove the pgsql-constrained Unit, the Constraint would still be there as part of the Model.
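Constraints also stay queryable and adjustable at the Application level. Here’s a minimal sketch using the Application from above (the exact output formatting may vary between Juju versions):

$ juju get-constraints pgsql-constrained
mem=32768M

$ juju set-constraints pgsql-constrained mem=48G

After that set-constraints call, any future juju add-unit of pgsql-constrained asks for at least 48G of memory, while the existing Units keep the hardware they already have.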

The next thing to note is that we’ve got more memory than we asked for. We wanted to make sure our PostgreSQL had 32G of RAM vs the default instance type, which has 1.7G. To do that, Juju had to request a larger instance, which also brought in 8 cores, and we ended up with 52G of RAM. Looking at the GCE instance types available, n1-standard-8 only has 30G of memory, so Juju went looking for a better match and pulled up the n1-highmem-8 instance type. In this way we receive the best matching instance that meets or exceeds the Constraints we’ve requested. Juju walks through all the various instance types to do this.

As a user, we focus on what our Application needs to run at its best and Juju takes care of finding the most cost-effective instance type that will meet these goals. This is key to allowing our Model to remain Cloud agnostic and maintain performance expectations, regardless of which Cloud we use this Model on. Let’s try this same deploy on AWS and see what happens.

$ juju add-model pgsql-aws aws
$ juju deploy postgresql pgsql-on-aws --constraints mem=32G
$ juju show-machine 0
machines:
  "0":
    ...
    constraints: mem=32768M
    hardware: arch=amd64 cores=8 cpu-power=2800 mem=62464M root-disk=8192M availability-zone=us-east-1a

Notice here we ended up with an instance that has 8 cores and 62G of RAM. That’s the best-fit instance type on AWS for the Constraints we need to make sure our system performs at its best.

Other Constraints

As you’d expect, we can set other Constraints to customize the characteristics of the hardware our Applications leverage. Let’s start with the number of CPU cores. This is vital in today’s world of applications that can put those extra cores to use. It’s also key when we’re going to run LXD machine containers or other VM technology on a machine; keeping a core per VM is a great way to leverage the hardware available.

$ juju deploy postgresql pgsql-4core --constraints cores=4
$ juju show-machine 2
machines:
  "2":
    ...
    constraints: cores=4
    hardware: arch=amd64 cores=4 cpu-power=1100 mem=3600M root-disk=10240M availability-zone=us-east1-d

Voila, Juju helps get the best instance for those requirements. The most cost-effective instance available with 4 cores provides 3.6G of memory.
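If 3.6G of memory isn’t enough for the workload, Constraints combine naturally. A quick sketch (the application name here is just an example) asking for both cores and memory:

$ juju deploy postgresql pgsql-4core-8g --constraints "cores=4 mem=8G"

Juju again picks the cheapest instance type that satisfies every Constraint listed.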

root-disk sizing

$ juju deploy postgresql pgsql-bigdisk --constraints root-disk=1T
$ juju show-machine 3
machines:
  "3":
    ...
    constraints: root-disk=1048576M
    hardware: arch=amd64 cores=1 cpu-power=138 mem=1700M root-disk=1048576M availability-zone=us-east1-b

One thing that’s not set as part of the instance type is the disk space allocated to the instance. Some Applications focus on CPU usage, but others will want to track state and store that on disk. We can specify the size of the disk that the instances are allocated through the root-disk Constraint.

The root-disk Constraint only means that Juju will make sure the instance has a Volume of that size when it comes up. This is especially useful for Big Data Applications and Databases.

Remembering Constraints in the Model

From here let’s combine our Constraints and produce a true production grade PostgreSQL server. Then we’ll create the bundle that will deploy this in a repeatable fashion that we can reuse on any Cloud that we want to take it to. 

$ juju deploy postgresql pgsql-prod --constraints "cores=4 mem=24G root-disk=1T"

The bundle representation of this Model, with our Constraints remembered, looks like this:

series: xenial
applications: 
  "pgsql-prod": 
    charm: "cs:postgresql-163"
    num_units: 1
    constraints: "cores=4 mem=24576 root-disk=1048576"
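Saving that bundle to a file (the pgsql-prod.yaml name here is just an example) means we can stand up the same production-shaped PostgreSQL on any other Cloud we have credentials for:

$ juju add-model pgsql-prod-aws aws
$ juju deploy ./pgsql-prod.yaml

The Constraints travel with the bundle, so the new Model asks its Cloud for equivalent hardware.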

Constraints on Controllers

Once we understand how Constraints work, one area where they’re really valuable is in the sizing of our own Juju Controllers. If we’re operating a true multi-user Controller serving many Models to the folks in our business, we’ll want to move up from the default instance sizes. Juju’s defaults are conservative so that folks testing out and experimenting with Juju don’t end up running much larger instances than expected; there are a lot more test or trial Controllers out there than Controllers that run true production, much like any product really.

Controller Constraints are supplied during bootstrap using the --bootstrap-constraints flag. For example:

$ juju bootstrap --bootstrap-constraints="mem=16G cores=4 root-disk=400G" google

Some considerations when you’re operating a controller include: 

  • root-disk - Juju runs a persistent database and tracks logs and such in that database. You want to make sure to provide space for running Juju backups, for storing the Charms and Resources used in Models (they’re cached on the Controller), for logs that are rotated over time, etc.
  • mem - as with any database, the more of it that fits in memory the better; the Controller will respond more quickly to events through the system. During operations such as upgrades and migrations there can be a lot of memory consumption as entire models are processed as part of those actions.
  • cores - as more clients and Units talk over the API, the more cores the Controller has the faster it can respond. Here core count matters more than raw CPU speed, since most of the work is fetching and processing documents rather than pure computation.
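Once the Controller is bootstrapped, we can verify what hardware those bootstrap Constraints actually got us with the same show-machine command from earlier, this time against the controller model:

$ juju show-machine -m controller 0

The hardware line in that output should meet or exceed the Constraints we passed at bootstrap time.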

Cloud-specific Constraints

There are also Cloud-specific Constraints. MAAS, for instance, allows deploying via tags, because MAAS uses tags to classify bare metal machines where you might have optimized a hardware purchase for network applications, storage applications, etc. Make sure to check out the documentation to see all available Constraints and which Clouds they are supported on.
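As a sketch, assuming a MAAS Cloud where some machines have been tagged "storage" by the MAAS administrator, a deploy targeting those machines could look like:

$ juju deploy postgresql --constraints tags=storage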

The great thing about the Constraint system is that it keeps the focus on the requirements of the Applications and what they actually need to get their jobs done. We’re not trying to force Applications onto existing hardware that might not fit their operating profile. It’s an idea that really excels in the IaaS Cloud world.

Give it a try and if you have any questions or hit any problems let us know. You can find us in the #juju IRC channel on Freenode or on the Juju mailing list. You can reach me on Twitter. Don’t hesitate to reach out. 

Modeling the infrastructure while keeping security and flexibility in mind

Juju allows the user to model their infrastructure in a clean, simple, repeatable way. Often deployments are repeated across different clouds and regions. Sometimes they're repeated from dev to staging to production. Regardless of how it's repeated, there are some solid practices that users need to follow when taking a model and reusing it. Some bits need to be unique to each deployment. Most of these are security details that need to differ from deployment to deployment. There might also be some specific bits of configuration that regularly vary: in staging the URL in the Apache config might be staging.jujucharms.com and in production it's jujucharms.com. You need to be able to reuse the model of how the applications, constraints, and common configuration work, but make sure there's a clean and simple method of providing the extra unique bits each time you bring up another model.

Let's walk through an example model I've created. I'm going to monitor a pair of Ubuntu machines with Telegraf feeding system details to Prometheus. We'll then use Grafana to visualize those metrics. Finally, we want to set up an HAProxy front end for Grafana so we can provide a proper SSL-terminated web site. After it's up and running it looks a bit like this.

[Image: bundle-view.jpg]

The first thing we need to do is use the Juju GUI to export a bundle that will be a dump of our model. That's an EXACT dump of everything we've got. We need to edit out the bits of the model that need to be unique from deployment to deployment. Once it's all cleaned up it looks a bit like this.

applications:
  ubuntu:
    charm: "cs:ubuntu"
    num_units: 2
  telegraf:
    charm: "cs:telegraf"
  prometheus:
    charm: "cs:prometheus"
  grafana:
    charm: "cs:grafana"
    options:
      admin_password: CHANGEME
  haproxy:
    charm: "cs:haproxy"
    expose: true
relations:
  - - "ubuntu:juju-info"
    - "telegraf:juju-info"
  - - "prometheus:target"
    - "telegraf:prometheus-client"
  - - "prometheus:grafana-source"
    - "grafana:grafana-source"
  - - "grafana:website"
    - "haproxy:reverseproxy"

One thing to note in there is the config value for the Grafana admin password; it's set to CHANGEME to make clear it should be changed. Other than that, it's a pretty plain model. Where it gets fun is when we leverage new bundle features in Juju 2.2.3.

Overriding config values at deploy time

Juju 2.2.3 provides a new argument to the deploy command, --bundle-config. This flag allows you to pass a filename, and that file will override config for the applications in the bundle you're deploying. You might use it like this:

juju deploy ./bundle.yaml --bundle-config=production.yaml

So what can we use this for? Well, let's set a unique password for our Grafana admin user. To provide a file with updated config we just mirror the bundle format and point at the application we're targeting. Let's edit the production.yaml file to look like this:

applications:
  grafana:
    options:
      admin_password: ImChanged

Note it looks just like the bundle file above, with the same keys; we're just setting an admin password of "ImChanged" to prove the override works. We can then deploy the bundle with the --bundle-config argument and, once everything is up, check that the value was set.

$ juju config grafana admin_password
ImChanged

Reading complex data from a file

That's handy, but sometimes you don't want to set a simple string value, you want to read content from a file. Prometheus can be configured to scrape custom jobs; we've used this in the past to scrape Prometheus data from Juju controllers themselves. To set this up we need to add a YAML declaration for the job that Prometheus will process. Let's find the IP of our Juju controller and add that job using another new bundle feature: include-file://

Using include-file:// you can specify a path on disk that will be read and passed to the config value in your bundle. In this way you can easily send complicated multi-line data (like YAML) to a config value in a clean way. First let's set up our new scrape job definition.

juju show-machine -m controller 0
...
ip-addresses:
    - 10.0.0.8
    
vim scrapejobs.yaml

- job_name: juju
  metrics_path: /introspection/metrics
  scheme: https
  static_configs:
    - targets: ['10.0.0.8:17070']
  basic_auth:
    username: user-prometheus
    password: testing

Now let's update our production.yaml file to also read this new scrapejobs.yaml file during deployment.

applications:
  grafana:
    options:
      admin_password: ImChanged
  prometheus:
    options:
      scrape-jobs: include-file://scrapejobs.yaml

For this to work the file needs to be in the current working directory; if you want it elsewhere, provide a full path to the file. Now when we run our deploy command we'll both set the Grafana password and read in the new job for Prometheus.

juju deploy ./bundle.yaml --bundle-config=production.yaml
...
juju config prometheus scrape-jobs
  - job_name: juju
    metrics_path: /introspection/metrics
    scheme: https
    static_configs:
      - targets: ['10.0.0.8:17070']
    basic_auth:
      username: user-prometheus
      password: testing

Awesome. Now we can template out that file and reuse it, providing unique IP addresses for targets as well as custom usernames and passwords for each deployment, while keeping the basics of the model intact and reusable.
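As a rough sketch of what that templating could look like (the template file and placeholder names here are just examples, not part of any charm or bundle format):

cat scrapejobs.yaml.tmpl
- job_name: juju
  metrics_path: /introspection/metrics
  scheme: https
  static_configs:
    - targets: ['CONTROLLER_IP:17070']
  basic_auth:
    username: user-prometheus
    password: PROM_PASSWORD

sed -e "s/CONTROLLER_IP/10.0.0.8/" -e "s/PROM_PASSWORD/testing/" scrapejobs.yaml.tmpl > scrapejobs.yaml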

base64 the included files

There's a third option for including content into this production.yaml: include-base64://. This reads a local file and base64-encodes the contents before setting them into the config. It's helpful for things like SSL keys that are unique to different deployments. In our demo case we want to pass in an SSL key to be used with HAProxy so that we can provide HTTPS access to the Grafana dashboard. To do this we need to set the ssl_key and ssl_cert config values in the HAProxy charm. Let's update the production.yaml file for this final bit of configuration overriding.

applications:
  grafana:
    options:
      admin_password: ImChanged
  prometheus:
    options:
      scrape-jobs: include-file://scrapejobs.yaml
  haproxy:
    options:
      ssl_key: include-base64://ssl.key
      ssl_cert: include-base64://ssl.crt

With this in place, the next time we deploy the config values are updated with the base64'd contents.

juju deploy ./bundle.yaml --bundle-config=production.yaml
...wait a bit...
juju config haproxy ssl_key
LS0tLS1CRUdJTiBSU0EgUFJJV...
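If you don't have a real certificate handy while testing this out, a self-signed pair is enough to exercise the include-base64:// path (a sketch; the file names match the production.yaml above and the CN is just a placeholder):

openssl req -x509 -newkey rsa:2048 -nodes -keyout ssl.key -out ssl.crt -days 365 -subj "/CN=grafana.example.com"

Swap in your real key and certificate for an actual production deployment.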

Now we've constructed a shareable model that can be reused while following the best practice of keeping passwords and keys out of the model itself, where they might leak. These tools give you a great way to collaborate on the operations of software at scale, and I can't wait to hear how you're using them to build out the next level of your operations best practices.

Hit a question or want to share a story? Tell us about it in IRC, on the mailing list, or just bug me on twitter @mitechie.

IRC: #juju on Freenode
Mailing list: https://lists.ubuntu.com/mailman/listinfo/juju

Upgrading Juju using model migrations

Since Juju 2.0 there's been a new feature, model migrations, intended to provide a bulletproof upgrade process. The operator stays in control throughout, and numerous sanity checks help provide confidence along the upgrade path. Model migrations allow an operator to bring up a new controller on a new version of Juju and then migrate models from an older controller one at a time. These migrations go through the process of putting agents into a quiet state and queueing any changes that want to take place. The state is then dumped out into a standard format and shipped to the new controller. The new controller loads the state and verifies it matches by checking it against the state from the older controller. Finally, the agents on each managed machine are checked to make sure they can communicate with the new controller and that any state matches expectations before those agents update themselves to report to the new controller for duty.

Once this is all complete the handoff is finished, and the old controller can be taken down once its last model has been migrated away. To show how this works, I've got a controller running Juju 2.1.3 and we're going to upgrade the models running on that controller by migrating them to a brand new Juju 2.2 controller.

One thing to remember is that Juju controllers are the kings of state. Juju is an event-based system: an agent runs on each managed machine or cloud instance and communicates with the controller. Events from those agents are processed, and the controller updates the state of applications, triggers future events, or just takes note of messages in the system. When we talk about migrating a model, we're only moving where that state is managed. None of the workloads are moved. All instances and machines stay exactly where they are and there's no impact on the workloads themselves.

$ juju models -c juju2-1
Controller: juju2-1

Model       Cloud/Region   Status     Machines  Cores  Access  Last connection
controller  aws/us-east-1  available         1      1  admin   just now
gitlab      aws/us-east-1  available         2      2  admin   49 seconds ago
k8s*        aws/us-east-1  available         3      2  admin   39 seconds ago

This is our controller running Juju 2.1.3, and it has on it a pair of models running important workloads. One is a model running a Kubernetes workload and the other is running GitLab, hosting our team's source code. Let's upgrade to the new Juju 2.2 release. The first thing we need to do is bootstrap a new controller to move the models to.

GitLab running in the gitlab model to host my team's source code.

First we upgrade our local Juju client to Juju 2.2 by getting the updated snap from the stable channel. 

sudo snap refresh juju --classic

Now we can bootstrap a new controller making sure to match up the cloud and region of the models we want to migrate. They were in AWS in the us-east-1 region so we'll need to make sure to bootstrap there.

$ juju bootstrap aws/us-east-1 juju2-2

Looking at this new controller, we have the two out-of-the-box models a new controller always has.

$ juju models
Controller: juju2-2

Model       Cloud/Region   Status     Machines  Cores  Access  Last connection
controller  aws/us-east-1  available         1      1  admin   2 seconds ago
default*    aws/us-east-1  available         0      -  admin   8 seconds ago

To migrate models to this new controller we need to be on the older controller. 

$ juju switch juju2-1

With that done we can now ask Juju to migrate the gitlab model to our new controller. 

$ juju migrate gitlab juju2-2
Migration started with ID "44d8626e-a829-48f0-85a8-bd5c8b7997bb:0"

The migration kicks off, and the ID is provided as a way of keeping track when there are potentially several migrations going on. If a migration fails for any reason, the model resumes its previous state, and we can make corrections and try again. Watching the output of juju status while the migration processes is interesting; once it's done you'll find that status errors out because the model is no longer there.
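One way to keep an eye on it while it runs (assuming the watch utility is available on your client machine):

$ watch -n 5 juju status -m juju2-1:gitlab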

$ juju list-models -c juju2-1
Controller: juju2-1
    
Model       Cloud/Region   Status     Machines  Cores  Access  Last connection
controller  aws/us-east-1  available         1      1  admin   just now
k8s         aws/us-east-1  available         3      2  admin   54 minutes ago

Here we can see our model is gone! Where did it go?

$ juju list-models -c juju2-2
Controller: juju2-2

Model       Cloud/Region   Status     Machines  Cores  Access  Last connection
controller  aws/us-east-1  available         1      1  admin   just now
default*    aws/us-east-1  available         0      -  admin   45 minutes ago
gitlab      aws/us-east-1  available         2      2  admin   1 minute ago

There we go, it's over on the new juju2-2 controller. The controller managing state is on Juju 2.2, but now it's time to update the agents running on each of the machines/instances to the new version as well.

$ juju upgrade-juju -m juju2-2:gitlab
started upgrade to 2.2

$ juju status | head -n 2
Model    Controller  Cloud/Region   Version    SLA
default  juju2-2     aws/us-east-1  2.2        unsupported

Our model is now all running on Juju 2.2 and we can use any of the new features that are useful to us against this model and the applications in it. With that upgraded let's go ahead and finish up by migrating the Kubernetes model. The process is exactly the same as the gitlab model. 

$ juju migrate k8s juju2-2
Migration started with ID "2e090212-7b08-494a-859f-96639928fb02:0"

...one, two, skip a few...

$ juju models -c juju2-2
Controller: juju2-2

Model       Cloud/Region   Status     Machines  Cores  Access  Last connection
controller  aws/us-east-1  available         1      1  admin   just now
default*    aws/us-east-1  available         0      -  admin   47 minutes ago
gitlab      aws/us-east-1  available         2      2  admin   1 minute ago
k8s         aws/us-east-1  available         3      2  admin   57 minutes ago

$ juju upgrade-juju -m juju2-2:k8s
started upgrade to 2.2

The controller juju2-1 is no longer required since it's not in control of any models. There's no state for it to track and manage any longer. 

$ juju destroy-controller juju2-1

Give model migrations a try and keep your Juju up to date. If you hit any issues the best place to look is in the juju debug-log from the controller model. 

$ juju switch controller
$ juju debug-log
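If the interesting part has already scrolled out of the live log, the --replay flag re-emits the stored log history from the start:

$ juju debug-log --replay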

Make sure to reach out and let us know if it works well or if you hit a bug in the system. You can also use model migrations to move models among the same version of Juju to help balance load or for maintenance purposes. You can find the team in #juju on Freenode or send a message to the mailing list. 

Designing for long term operations, upgrades, and rebalancing

When you're building a tool for a user there's a huge amount of design, decision making, and focus put on getting the user started. There's good reason for this. Users have an ocean full of tools at their disposal, and if you're building something for others to use you need to win the first five minute test.

You've got about a five minute window for a user to make useful headway in understanding your tool and visualizing how it will aid them in their work. Often this manifests in tools that demo well, but that then have to be left behind when the real work hits. I like to use text editors as an example of this. So many users start out with notepad, nano, or some other really light editor. In time, they learn more and the learning curve of something like VIM, Eclipse, or Emacs really pays off. There's a big gap in folks that make that leap though. If you want to get the most folks in the door it needs to have that hit the ground running feel to it.

When you're designing a tool to help users deploy and run software there's a lot of focus on the install process. Nearly every hosting provider has worked with some sort of "one button install" tool. They're great because it gets users started quickly in that five minute window. Over time though, users find that those one click tools end up being very shallow. They need to add users, create new databases, back up data, restart daemons when appropriate, the list goes on and on. Two operational tasks which are particularly interesting are upgrades and rebalancing.

Juju is an operations tool that can track the state of many different models (the state of the operated applications), potentially run by many different users. These models evolve over time and run production workloads for years; we know of many models that are nearly a year and a half old. In that same time Juju has had five 1.2x minor versions and three 2.x minor versions. That means you’d want to keep up with improvements in performance, features, and security by upgrading nearly every other month. Aiding in this, the Juju 2.1 release includes a new feature, known as model migrations, specifically to help operators manage their infrastructure over the long term.

The general idea is that the largest danger lies in complex upgrade steps, such as database migrations, and in making sure that everything running is able to communicate with the new version of the software. In Juju 2.1, the Juju infrastructure that manages the state of the models as well as API connections from clients (known as the “controller”) uses model migrations to let an operator bring up a new controller and migrate the model state over to the new version. In that process the data is shipped over, run through any migrations that need to occur, and sanity checked between the old and new controllers, and the system is put together in such a way that if anything fails to check out it can roll back and use the previously running controller. That's some reassurance to lean on when you're doing an important upgrade. Since one controller operates many models, the operator controls which models get migrated at what time, allowing a very controlled rollout to the new software and a safe check that everything remains in the green as the new version of Juju is adopted.

Another big use case for model migrations is to balance out the load on Juju controllers over time. As organizations grow, their needs change, and over time it's important to be able to move to new hardware, shift services that generate heavy load to dedicated hardware, and so on. Juju is tested to perform at thousands of managed machines, but that depends on factors such as the size of the controller machines that track state, and a normal part of a growing organization is to bring in machines with newer CPUs, more memory, or just flat out beefier hardware with more cores.

I'd love to hear about the tools you use, which ones have fallen short in aiding you with your long term operational needs, and what other types of long term operations you want your tools to assist with. Hit me up on Twitter @mitechie.

In a follow up post I'll walk through exactly how to perform a migration to the new Juju 2.2 release using the built in model migration feature.

Open Source software and operational usability

A friend of mine linked me to Yaron Haviv's article "Did Amazon Just Kill Open Source" and I can't help but want to shout a reply:

NO!

There is something to the premise though, and it's something I keep trying to push, as it's directly related to the work I do in my day job at Canonical.

Clouds have been shifting from IaaS to SaaS as surely as can be. They've gone from just giving you quick access to hardware on a "pay for what you use" basis to providing full self-service access to the applications you used to have to run on that hardware. They've been doing this by taking Open Source software, wrapping it with their own API layers, and charging you for operating your essential services.

When I look at RDS I see lovely images for PostgreSQL, MySQL, and MariaDB. I can find Elasticsearch, Hadoop, and more. It's a good carrot: use our APIs and you don't have to operate that software on the EC2 platform any more. You don't have to worry about software upgrades, backups, or scale-out operational concerns. Amazon isn't the only cloud doing this either. Each cloud is finding the services it can provide directly to developers to build products on.

They've taken Open Source software and fixed delivery to make it easy for most folks to consume. That ties directly into the trend we have been working on at Canonical. As software has moved more and more to Open Source and the cost of software in the average IT budget has dropped, the cost of finding folks who can operate it at production scale has gotten much higher. How many folks can really say they can run Hadoop and Kubernetes and Elasticsearch at professional production levels? How hard is it to find them and how expensive is it to retain them?

We need to focus on how we can provide this same service, but straight from the Open Source communities. If you want users of your software to be users of YOUR software, and not some wrapped API service, then you need to take those operational concerns into account. We can do more than Open Source the code that is running; we can work together as a community to Open Source the best practices for operating that software over its full life cycle. Too often, projects stop short at how to install the software, yet the user has to worry about it long after it's installed.

I have hope that tools like Juju and Kubernetes can provide a way for the communities around the software we use and love to contribute, and to avoid the lock-in of some vendored wrapping of the Open Source projects we participate in today.

Operations in Production

As we stand up our important infrastructure we set up monitoring and alerting and keep a close eye on it. When the services we provide grow and need to expand, or failures in hardware attempt to wreak havoc, we’re ready because of the due diligence that’s gone into monitoring the deployed infrastructure and applications. We often bake this monitoring into our deployment and configuration management tooling. One thing I often see is that folks forget to monitor the tools that coordinate all of that deployment and configuration management. It’s a bit of a case of “who watches the watcher?”