By Adrian Bridgett | 2023-01-16
25 years is a long time. It’s flown by. In that time we’ve gone from sysadmins managing dozens of computers to DevOps engineers and SREs looking after tens of thousands of computers. Let’s take a quick look at where we’ve come from and where we’re going.
The dark ages and coming dawn
When I started my career in the mid-90s, most servers were installed manually. Maybe there was a document to follow, or if you were lucky, some shell script to copy and run. Basic configuration was possible but pretty raw - generally limited to basic disk setup, package selection and... running a shell script. The installation media was a CD, or if you were ahead of the game, PXE booting over the network. (I still love network booting - it seems quite magical, and I find it useful to be able to select rescue or memtest images.) These methods are still pretty standard, though PXE managers such as Foreman, and cloud-init configuration, have improved things.
Everyone had their own bundle of scripts to install machines, but maintenance devolved into manual changes. The result was “configuration drift”: supposedly identical servers ran different versions of software and firmware, with different configuration files.
The first widespread standard was CFEngine. This gave administrators a way to say “ensure this file has those permissions and contains this text; if it’s wrong, fix it, then restart that program”. CFEngine wasn’t too user-friendly, and it wasn’t until Puppet hit the scene in 2005 that configuration management took off. Around this time the “DevOps” movement was born - bringing good software “Dev”elopment practices into the “Ops” (operations) space.
This drastically increased the number of machines an admin could look after, from dozens to hundreds. Machines were built consistently and quickly. I remember building full machines from a blank box - formatting disks, installing the OS, applications and configuration - in under five minutes, two of which were spent waiting for the BIOS to boot.
A new paradigm
In 2006, a book vendor (called Amazon) launched AWS, where you could rent servers in the “cloud”. This has utterly revolutionised the way companies run. Gone are three-month lead times for servers, and running your own datacentre - you can rent rather than run. AWS now supports hundreds of services - from storage and databases to call centres and satellite connections.
As a cloud engineer, there’s a balance to strike between paying a cloud provider to run a service and running it yourself, which can be cheaper and/or better. Either way, the velocity with which a service can be stood up and maintained has increased by an order of magnitude or more.
Maybe you don’t even want to run a computer any more - in which case there’s Function-as-a-Service, more commonly called serverless, with AWS Lambda being almost synonymous with it. This will run your snippet of code on demand, in complete isolation, removing much of the underlying OS and hardware from your concern.
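The programming model really is just a function. A sketch of what an AWS Lambda handler looks like in Python (the event payload shape here is invented for illustration - in practice it depends on what triggers the function):

```python
import json

def handler(event, context):
    # Lambda invokes this function on demand; "event" carries the
    # request payload and "context" carries runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

Everything below the function - OS patching, scaling, the machine itself - is the provider’s problem, which is precisely the appeal.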
Virtualisation - in its current meaning of running multiple “machines” on a single server - has been around since the 1960s. In the 2000s it became widely used as a way to reduce costs. Besides increasing utilisation (more bang for the buck), it also aids management: machine consistency via golden images built with Packer, and better isolation from hardware failures via snapshotting or even (live) migration to a different underlying host.
This brings us to containers, of which Docker is by far the most famous. They are a half-way house between serverless and full virtualisation - a container might be as light as a single program, or as heavy as a full operating system (minus the kernel).
Containers brought a level of consistency that was previously slow (building a whole VM) or difficult to realise. Running a container means that everyone runs the same code, with the same libraries and supporting assets. “Works on my laptop” now finally stands a chance of working in production.
Kubernetes is the supermassive black hole of the industry at the moment, devouring everything. It’s essentially a platform for running containers - at potentially massive scale, with a lot of the details taken care of in a standard way. This power and flexibility does bring with it a lot of complexity to cover everyone’s needs (often those of large hyperscalers rather than typical shops), both to run Kubernetes (outsource it if you can to AWS/Google/etc) and to run applications in Kubernetes.
PXE and server configuration still seem to be the way to build datacentres. Oxide could be interesting, though given how horrific IPMI and BMCs are, I think generic servers will remain stuck in the dark ages for a long time.
Server configuration management has lost most of its value in the face of containers, which are more composable, more isolated and more auditable. Almost its only remaining use is to bootstrap servers to run Kubernetes.
Infrastructure as code remains essential for configuring all the cloud resources, and there’s been a battle between domain-specific languages such as Terraform’s HCL and those that expose normal programming interfaces, such as Pulumi and CDK. My opinion here is that the rigidity of the DSLs can be a great benefit for readability and simplicity.
VMs will become a niche product for those applications that aren’t yet available in container form.
Containers and Kubernetes (or something similar) will dominate for the next ten years. There are certainly improvements to make:
- Dockerfiles remind me of the dark ages - ugliness everywhere. Buildpacks are language-specific (a pro and a con); Buildah would at least be composable.
- Kubernetes is overcomplex for many use cases. Writing Helm Go templates makes me sad (counting indents isn’t fun) - I prefer kustomize. However, we need something much simpler - like Heroku buildpacks: “please run my simple Python app”. Acorn or Cuber are the kind of thing I have in mind. Perhaps we just need some simple CRDs, or even just a K8s config generator that could be overridden when needed.
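That config-generator idea might look something like this sketch in Python - generate a sane default Deployment manifest for the common case, and let callers override fields when they need more (the function name, defaults and shallow-merge behaviour are all my own invention here):

```python
def deployment_manifest(app, image, replicas=2, overrides=None):
    """Generate a minimal Kubernetes Deployment as a plain dict.

    Covers the "please run my simple app" case; callers who need
    more can pass overrides, merged over the generated defaults.
    """
    manifest = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app, "labels": {"app": app}},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": app}},
            "template": {
                "metadata": {"labels": {"app": app}},
                "spec": {"containers": [{"name": app, "image": image}]},
            },
        },
    }
    # Note: a real tool would deep-merge rather than replace top-level keys.
    manifest.update(overrides or {})
    return manifest
```

Serialise the result to YAML and you have something `kubectl apply` understands - without anyone counting template indents.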