Zuora, the monolith that we are currently reinventing into microservices, used Jenkins as its automation server for continuous integration (CI). Engineers would stand up instances of Jenkins as they needed and load them up with relevant jobs, all manually.
This worked relatively well at our previous scale, but entropy crept into these configurations over time. For example, one Jenkins server would be bound to LDAP while another would use Jenkins’ own user database. This added to the cognitive overhead of using these systems: if I wanted to use dev-jenkins1.zuora.com I would log in as admin, but if I wanted to use ops-jenkins1.zuora.com I would use my LDAP credentials.
Enter Reinvent Initiative
As we moved to Docker and microservices, we implemented another CI solution, also based on Docker. That solution fell short in two areas:
Lack of data persistence. We wanted to see trends over time beyond just success or failure. For example, we wanted to see code coverage reports that would highlight dead code, duplicate code, possible bugs, and everything in between. We also wanted to store those results indefinitely.
Lack of plugins. While the other CI solution offered many plugins and its maintainers were willing to support more, the Jenkins ecosystem is second to none. In fact, as of this writing, https://plugins.jenkins.io/ reports 1436 available plugins. The raw count is not necessarily meaningful, but most users will agree that it includes a lot of useful plugins.
So now what?
We quickly realized that we liked Jenkins; we just didn’t like managing it. Our first reaction was to approach a Jenkins vendor to see if we could offload the management entirely. During those conversations, we realized we could build something in-house that would meet our needs and reduce the overhead of managing Jenkins, so we decided to build our own management interface for Jenkins.
Enter Homer. Homer is our internal code name for a Python Flask application that productizes Jenkins for Zuora. Homer allows users to create and manage their Jenkins clusters (a cluster consists of a Jenkins master and its corresponding slaves) from a central location in an automated and standardized way. We still allow customizations, but in a controlled fashion. A typical user would log in to the Homer UI, create a cluster, and seed it with CI jobs for a particular microservice.
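To make the idea of standardized clusters with controlled customization a little more concrete, here is a minimal sketch in plain Python. It is not Homer's actual code: the function name, the setting names, and the default values are all hypothetical, chosen only to show how a fixed baseline can be combined with an allowlist of user overrides.

```python
# Hypothetical baseline applied to every cluster (illustrative values only).
STANDARD_SETTINGS = {
    "auth": "ldap",                      # every master is bound to LDAP
    "master_instance_type": "t2.small",  # assumed "small EC2 instance"
    "slave_count": 2,                    # assumed default slave count
}

# Hypothetical allowlist: the only settings a user may override.
CUSTOMIZABLE = {"slave_count"}


def build_cluster_spec(name, service, overrides=None):
    """Merge user overrides into the standard settings, rejecting the rest."""
    overrides = overrides or {}
    rejected = sorted(set(overrides) - CUSTOMIZABLE)
    if rejected:
        raise ValueError("settings not customizable: " + ", ".join(rejected))
    spec = dict(STANDARD_SETTINGS)  # copy so the baseline is never mutated
    spec.update(overrides)
    spec["name"] = name
    spec["service"] = service
    return spec


spec = build_cluster_spec("billing-ci", "billing", {"slave_count": 4})
```

An override of a setting outside the allowlist, such as `auth`, raises a `ValueError` instead of silently producing a non-standard cluster.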
Being Mindful of Cost
As you would expect, giving end users the ability to provision resources means we need to be mindful of cost. To minimize and control costs, we leverage AWS Spot Fleet for the slaves. The idea is that the master Jenkins instance is purely for orchestration and metadata, and can therefore run on a small EC2 instance, while the heavily discounted Spot Fleet slaves provide the compute capacity. For more details on how we did this, take a look at this blog post.
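As a rough illustration of what provisioning the slaves could look like, the sketch below builds a request dictionary in the shape of the `SpotFleetRequestConfig` argument to boto3's `request_spot_fleet`. The helper name, AMI ID, IAM role ARN, instance types, capacity, and price are placeholders for illustration, not the values we actually use.

```python
def build_spot_fleet_config(ami_id, fleet_role_arn, instance_types,
                            target_capacity, max_price_per_hour):
    """Build a Spot Fleet request for Jenkins slaves (hypothetical helper)."""
    if target_capacity < 1:
        raise ValueError("target_capacity must be at least 1")
    if not instance_types:
        raise ValueError("at least one instance type is required")
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        # The EC2 API expects the maximum hourly price as a string.
        "SpotPrice": str(max_price_per_hour),
        "AllocationStrategy": "lowestPrice",
        "TerminateInstancesWithExpiration": True,
        # One launch specification per candidate instance type.
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": instance_type}
            for instance_type in instance_types
        ],
    }


# All argument values below are placeholders.
config = build_spot_fleet_config(
    ami_id="ami-00000000",
    fleet_role_arn="arn:aws:iam::123456789012:role/example-fleet-role",
    instance_types=["m4.large", "c4.large"],
    target_capacity=4,
    max_price_per_hour=0.05,
)

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)
```

Listing more than one instance type gives the fleet more Spot pools to draw from.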
The Evolution of Homer
Since its inception, Homer has evolved to include a large feature set. Here are some of the features:
Homer can create and delete Jenkins clusters on demand (duh)
This includes preseeded credentials for our Git and Maven repositories, Slack, and much more
Commonly used plugins already installed
Bound to LDAP for authentication
Standard packages installed (Java, Lua, NodeJS, Ansible, Terraform and much more)
Automated backups with self-service restore (EBS snapshots)
Self-service Spot Fleet updates for custom compute requirements
Templated job creation (you can create Jenkins jobs directly from Homer)
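For a sense of how templated job creation can work, here is a minimal sketch that renders a Jenkins freestyle job definition (`config.xml`) from a template using only the Python standard library. The template contents, the `render_job` helper, and the example values are assumptions for illustration and not Homer's real templates.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Hypothetical, heavily trimmed freestyle job template.
JOB_TEMPLATE = Template("""<project>
  <description>$description</description>
  <builders>
    <hudson.tasks.Shell>
      <command>$command</command>
    </hudson.tasks.Shell>
  </builders>
</project>
""")


def render_job(description, command):
    """Fill the template, escaping values so the result stays valid XML."""
    return JOB_TEMPLATE.substitute(
        description=escape(description),
        command=escape(command),
    )


job_xml = render_job("Build billing-service", "mvn -B verify && echo done")

# Parse the result back to confirm it is well-formed XML.
root = ET.fromstring(job_xml.encode("utf-8"))
```

Jenkins accepts a job definition like this as the XML body of a POST to its `createItem` endpoint, which is one way a tool could create the job on a master.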
Since a picture is worth 1,000 words, here are 2,000 of them :-)
Figure 1 - List of all provisioned Masters
Figure 2 - Spot Fleet Editor UI