Continuous Integration (CI) systems build and test software automatically and regularly. Though their primary benefit lies in avoiding long periods between build and test runs, CI systems can also simplify and automate the execution of many otherwise tedious tasks. These include cross-platform testing, the regular running of slow, data-intensive, or difficult-to-configure tests, verification of proper performance on legacy platforms, detection of infrequently failing tests, and the regular production of up-to-date release products. And, because build and test automation is necessary for implementing continuous integration, CI is often a first step towards a continuous deployment framework wherein software updates can be deployed quickly to live systems after testing.
Continuous integration is a timely subject, not least because of its prominence in the Agile software methodology. There has been an explosion of open source CI tools in recent years, in and for a variety of languages, implementing a huge range of features in the context of a diverse set of architectural models. The purpose of this chapter is to describe common sets of features implemented in continuous integration systems, discuss the architectural options available, and examine which features may or may not be easy to implement given the choice of architecture.
Below, we will briefly describe a set of systems that exemplify the extremes of architectural choices available when designing a CI system. The first, Buildbot, is a master/slave system; the second, CDash, is a reporting server model; the third, Jenkins, uses a hybrid model; and the fourth, Pony-Build, is a Python-based decentralized reporting server that we will use as a foil for further discussion.
The space of architectures for continuous integration systems seems to be dominated by two extremes: master/slave architectures, in which a central server directs and controls remote builds; and reporting architectures, in which a central server aggregates build reports contributed by clients. All of the continuous integration systems of which we are aware have chosen some combination of features from these two architectures.
Our example of a centralized architecture, Buildbot, is composed of two parts: the central server, or buildmaster, which schedules and coordinates builds between one or more connected clients; and the clients, or buildslaves, which execute builds. The buildmaster provides a central location to which to connect, along with configuration information about which clients should execute which commands in what order. Buildslaves connect to the buildmaster and receive detailed instructions. Buildslave configuration consists of installing the software, identifying the master server, and providing connection credentials for the client to connect to the master. Builds are scheduled by the buildmaster, and output is streamed from the buildslaves to the buildmaster and kept on the master server for presentation via the Web and other reporting and notification systems.
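To make this concrete, the following is a minimal sketch of what a buildmaster configuration might look like, using (approximately) the Buildbot 0.8-era Python API; module paths, class names, and options vary between Buildbot versions, and pieces such as schedulers and status targets are omitted, so treat this as illustrative rather than exact:

# master.cfg -- illustrative sketch of a buildmaster configuration
# (roughly the Buildbot 0.8-era API; names and modules vary by version;
# schedulers and status reporting are omitted for brevity).
from buildbot.buildslave import BuildSlave
from buildbot.process.factory import BuildFactory
from buildbot.steps.source import SVN
from buildbot.steps.shell import ShellCommand
from buildbot.config import BuilderConfig

c = BuildmasterConfig = {}

# Buildslaves connect to the master with these credentials.
c['slaves'] = [BuildSlave("linux-slave", "sekrit")]
c['slavePortnum'] = 9989

# The master scripts every remote step explicitly, in order.
factory = BuildFactory()
factory.addStep(SVN(svnurl="http://some.project.url"))
factory.addStep(ShellCommand(command=["python", "setup.py", "build"]))
factory.addStep(ShellCommand(command=["python", "setup.py", "test"]))

c['builders'] = [
    BuilderConfig(name="linux-build",
                  slavenames=["linux-slave"],
                  factory=factory),
]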
On the opposite side of the architecture spectrum lies CDash, which is used for the Visualization Toolkit (VTK)/Insight Toolkit (ITK) projects by Kitware, Inc. CDash is essentially a reporting server, designed to store and present information received from client computers running CMake and CTest. With CDash, the clients initiate the build and test suite, record build and test results, and then connect to the CDash server to deposit the information for central reporting.
Finally, a third system, Jenkins (known as Hudson before a name change in 2011), provides both modes of operation. With Jenkins, builds can either be executed independently with the results sent to the master server; or nodes can be slaved to the Jenkins master server, which then schedules and directs the execution of builds.
Both the centralized and decentralized models have some features in common, and, as Jenkins shows, both models can co-exist in a single implementation. However, Buildbot and CDash exist in stark contrast to each other: apart from the commonalities of building software and reporting on the builds, essentially every other aspect of the architecture is different. Why?
Further, to what extent does the choice of architecture seem to make certain features easier or harder to implement? Do some features emerge naturally from a centralized model? And how extensible are the existing implementations—can they easily be modified to provide new reporting mechanisms, or scale to many packages, or execute builds and tests in a cloud environment?
The core functionality of a continuous integration system is simple: build software, run tests, and report the results. The build, test, and reporting can be performed by a script running from a scheduled task or cron job: such a script would just check out a new copy of the source code from the VCS, do a build, and then run the tests. Output would be logged to a file, and either stored in a canonical location or sent out via e-mail in case of a build failure. This is simple to implement: in UNIX, for example, this entire process can be implemented for most Python packages in a seven line script:
cd /tmp && \
svn checkout http://some.project.url && \
cd project_directory && \
python setup.py build && \
python setup.py test || \
echo build failed | sendmail notification@project.domain
cd /tmp && rm -fr project_directory
In Figure 9.1, the unshaded rectangles represent discrete subsystems and functionality within the system. Arrows show information flow between the various components. The cloud represents potential remote execution of build processes. The shaded rectangles represent potential coupling between the subsystems; for example, build monitoring may include monitoring of the build process itself and aspects of system health (CPU load, I/O load, memory usage, etc.).
Figure 9.1: Internals of a Continuous Integration System
But this simplicity is deceptive. Real-world CI systems usually do much more. In addition to initiating or receiving the results of remote build processes, continuous integration software may support any of the following additional features:
A high-level view of all of these potential components of a CI system is shown in Figure 9.1. CI software usually implements some subset of these components.
Continuous integration systems also need to interact with other systems. There are several types of potential interactions:
Buildbot and CDash have chosen opposite architectures, and implement overlapping but distinct sets of features. Below we examine these feature sets and discuss how features are easier or harder to implement given the choice of architecture.
Figure 9.2: Buildbot Architecture
Buildbot uses a master/slave architecture, with a single central server and multiple build slaves. Remote execution is entirely scripted by the master server in real time: the master configuration specifies the commands to be executed on each remote system and runs them in sequence, starting each command only when the previous one has finished. Scheduling and build requests are not only coordinated through the master but directed entirely by the master. No built-in recipe abstraction exists, except for basic version control system integration ("our code is in this repository") and a distinction between commands that operate on the build directory vs. within the build directory. OS-specific commands are typically specified directly in the configuration.
Buildbot maintains a constant connection with each buildslave, and manages and coordinates job execution between them. Managing remote machines through a persistent connection adds significant practical complexity to the implementation, and has been a long-standing source of bugs. Keeping robust long-term network connections running is not simple, and testing applications that interact with the local GUI is challenging over a network connection. OS alert windows are particularly difficult to deal with. However, this constant connection makes resource coordination and scheduling straightforward, because slaves are entirely at the disposal of the master for execution of jobs.
The kind of tight control designed into the Buildbot model makes centralized build coordination between resources very easy. Buildbot implements both master and slave locks on the buildmaster, so that builds can coordinate system-global and machine-local resources. This makes Buildbot particularly suitable for large installations that run system integration tests, e.g. tests that interact with databases or other expensive resources.
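As a rough illustration of how this coordination is expressed (again using approximately the 0.8-era API, so names may differ in other versions), a master lock serializes access to a shared resource across all slaves, while a slave lock limits concurrency on a single machine:

# Illustrative use of Buildbot locks (roughly the 0.8-era API).
from buildbot.locks import MasterLock, SlaveLock
from buildbot.steps.shell import ShellCommand

# Only one build at a time may touch the shared test database, cluster-wide.
db_lock = MasterLock("test-database")
# At most one heavy job at a time per buildslave.
cpu_lock = SlaveLock("cpu", maxCount=1)

integration_tests = ShellCommand(
    command=["python", "setup.py", "test"],
    locks=[db_lock.access('exclusive'), cpu_lock.access('counting')],
)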
The centralized configuration causes problems for a distributed use model, however. Each new buildslave must be explicitly allowed for in the master configuration, which makes it impossible for new buildslaves to dynamically attach to the central server and offer build services or build results. Moreover, because each build slave is entirely driven by the build master, build clients are vulnerable to malicious or accidental misconfigurations: the master literally controls the client entirely, within the client OS security restrictions.
One limitation of Buildbot is that there is no simple way to return build products to the central server. For example, code coverage statistics and binary builds are kept on the remote buildslave, and there is no API to transmit them to the central buildmaster for aggregation and distribution. It is not clear why this feature is absent. It may be a consequence of the limited set of command abstractions distributed with Buildbot, which are focused on executing remote commands on the build slaves. Or, it may be due to the decision to use the connection between the buildmaster and buildslave as a control system, rather than as an RPC mechanism.
Another consequence of the master/slave model and this limited communications channel is that buildslaves do not report system utilization and the master cannot be configured to be aware of high slave load.
External notification of build results is handled entirely by the buildmaster, and new notification services need to be implemented within the buildmaster itself. Likewise, new build requests must be communicated directly to the buildmaster.
Figure 9.3: CDash Architecture
In contrast to Buildbot, CDash implements a reporting server model. In this model, the CDash server acts as a central repository for information on remotely executed builds, with associated reporting on build and test failures, code coverage analysis, and memory usage. Builds run on remote clients on their own schedule, and submit build reports in an XML format. Builds can be submitted both by "official" build clients and by non-core developers or users running the published build process on their own machines.
This simple model is made possible because of the tight conceptual integration between CDash and other elements of the Kitware build infrastructure: CMake (a build configuration system), CTest (a test runner), and CPack (a packaging system). This software provides a mechanism by which build, test, and packaging recipes can be implemented at a fairly high level of abstraction in an OS-agnostic manner.
CDash's client-driven model simplifies many aspects of the client-side CI process. The decision to run a build is made by build clients, so client-side conditions (time of day, high load, etc.) can be taken into account by the client before starting a build. Clients can appear and disappear as they wish, easily enabling volunteer builds and builds "in the cloud". Build products can be sent to the central server via a straightforward upload mechanism.
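The upload itself need not be anything more elaborate than an HTTP request carrying the report. The following Python sketch is purely illustrative; the URL scheme, project name, and report format are placeholders rather than CDash's actual submission API:

# Hypothetical client-side upload of a build report; the endpoint,
# project name, and report format are placeholders for illustration.
import urllib.request

def submit_report(server_url, project, xml_path):
    with open(xml_path, "rb") as f:
        report = f.read()
    request = urllib.request.Request(
        "%s/submit?project=%s" % (server_url, project),
        data=report,
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status

# e.g. submit_report("http://cdash.example.org", "MyProject", "Build.xml")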
However, in exchange for this reporting model, CDash lacks many convenient features of Buildbot. There is no centralized coordination of resources, nor can this be implemented simply in a distributed environment with untrusted or unreliable clients. Progress reports are also not implemented: to do so, the server would have to allow incremental updating of build status. And, of course, there is no way to both globally request a build, and guarantee that anonymous clients perform the build in response to a check-in—clients must be considered unreliable.
Recently, CDash added functionality to enable an "@Home" cloud build system, in which clients offer build services to a CDash server. Clients poll the server for build requests, execute them upon request, and return the results to the server. In the current implementation (October 2010), builds must be manually requested on the server side, and clients must already be connected before the server can make use of their services. However, it is straightforward to extend this to a more generic scheduled-build model in which builds are requested automatically by the server whenever a relevant client is available. The "@Home" system is very similar in concept to the Pony-Build system described later.
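A client in such a scheduled-build model needs very little machinery: it polls the server, runs whatever it is asked to run, and posts the results back. The endpoints and fields in this Python sketch are invented for illustration and do not correspond to CDash's actual "@Home" API:

# Hypothetical polling client for an "@Home"-style build service.
import json, subprocess, time, urllib.request

SERVER = "http://cdash.example.org/at_home"   # placeholder URL

def poll_and_build():
    with urllib.request.urlopen(SERVER + "/pending") as response:
        pending = json.load(response)
    for job in pending:
        result = subprocess.run(job["command"], capture_output=True, text=True)
        report = json.dumps({
            "job_id": job["id"],
            "success": result.returncode == 0,
            "output": result.stdout + result.stderr,
        }).encode("utf-8")
        urllib.request.urlopen(urllib.request.Request(
            SERVER + "/results", data=report,
            headers={"Content-Type": "application/json"}))

if __name__ == "__main__":
    while True:                 # the client decides when it is available
        poll_and_build()
        time.sleep(60)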
Jenkins is a widely used continuous integration system implemented in Java; until early 2011, it was known as Hudson. It is capable of acting either as a standalone CI system with execution on a local system, or as a coordinator of remote builds, or even as a passive receiver of remote build information. It takes advantage of the JUnit XML standard for unit test and code coverage reporting to integrate reports from a variety of test tools. Jenkins originated with Sun, but is very widely used and has a robust open-source community associated with it.
Jenkins operates in a hybrid mode, defaulting to master-server build execution but allowing a variety of methods for executing remote builds, including both server- and client-initiated builds. Like Buildbot, it is primarily designed for central server control, but it has been adapted to support a wide variety of distributed job initiation mechanisms, including virtual machine management.
Jenkins can manage multiple remote machines through a connection initiated by the master via SSH, or from the client via JNLP (Java Web Start). This connection is two-way, and supports the communication of serialized objects and data.
Jenkins has a robust plugin architecture that abstracts the details of this connection, which has allowed the development of many third-party plugins to support the return of binary builds and more significant result data.
For jobs that are controlled by a central server, Jenkins has a "locks" plugin to discourage jobs from running in parallel, although as of January 2011 it is not yet fully developed.
Figure 9.4: Pony-Build Architecture
Pony-Build is a proof-of-concept decentralized CI system written in Python. It is composed of three core components, which are illustrated in Figure 9.4. The results server acts as a centralized database containing build results received from individual clients. The clients independently contain all configuration information and build context, coupled with a lightweight client-side library to help with VCS repository access, build process management, and the communication of results to the server. The reporting server is optional, and contains a simple Web interface, both for reporting on the results of builds and potentially for requesting new builds. In our implementation, the reporting server and results server run in a single multithreaded process but are loosely coupled at the API level and could easily be altered to run independently.
This basic model is decorated with a variety of webhooks and RPC mechanisms to facilitate build and change notification and build introspection. For example, rather than tying VCS change notification from the code repository directly into the build system, remote build requests are directed to the reporting system, which communicates them to the results server. Likewise, rather than building push notification of new builds out to e-mail, instant messaging, and other services directly into the reporting server, notification is controlled using the PubSubHubbub (PuSH) active notification protocol. This allows a wide variety of consuming applications to receive notification of "interesting" events (currently limited to new builds and failed builds) via a PuSH webhook.
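For consumers, subscribing to these notifications only requires a small webhook endpoint. The sketch below follows the general shape of the PubSubHubbub protocol (echoing hub.challenge during subscription verification and accepting pushed content via POST), but the handler logic is a hypothetical placeholder rather than code from Pony-Build itself:

# Minimal sketch of a PubSubHubbub (PuSH) subscriber callback.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class PushCallback(BaseHTTPRequestHandler):
    def do_GET(self):
        # Subscription verification: echo back the hub.challenge value.
        params = parse_qs(urlparse(self.path).query)
        challenge = params.get("hub.challenge", [""])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(challenge.encode("utf-8"))

    def do_POST(self):
        # Content notification: e.g. a new or failed build was published.
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        print("received notification:", payload[:200])   # placeholder action
        self.send_response(204)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), PushCallback).serve_forever()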
The advantages of this very decoupled model are substantial:
Unfortunately, there are also many serious disadvantages, as with the CDash model:
Two other aspects of CI systems raised by our work on Pony-Build were how best to implement recipes, and how to manage trust. These are intertwined issues, because recipes execute arbitrary code on build clients.
Build recipes add a useful level of abstraction, especially for software built in a cross-platform language or using a multi-platform build system. For example, CDash relies on a strict kind of recipe; most, or perhaps all, software that uses CDash is built with CMake, CTest, and CPack, and these tools are built to handle multi-platform issues. This is the ideal situation from the viewpoint of a continuous integration system, because the CI system can simply delegate all issues to the build tool chain.
However, this is not true for all languages and build environments. In the Python ecosystem, there has been increasing standardization around distutils and distutils2 for building and packaging software, but as yet no standard has emerged for discovering and running tests, and collating the results. Moreover, many of the more complex Python packages add specialized build logic into their system, through a distutils extension mechanism that allows the execution of arbitrary code. This is typical of most build tool chains: while there may be a fairly standard set of commands to be run, there are always exceptions and extensions.
Recipes for building, testing, and packaging are therefore problematic, because they must solve two problems: first, they should be specified in a platform independent way, so that a single recipe can be used to build software on multiple systems; and second, they must be customizable to the software being built.
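One way to reconcile these two requirements is to express a recipe as a small, declarative list of named steps, and let a client-side library resolve platform-specific details at run time. The format below is a hypothetical sketch for illustration, not Pony-Build's actual recipe format:

# Hypothetical platform-independent recipe: the client library maps
# each abstract step onto concrete commands for the local platform.
import subprocess, sys

RECIPE = {
    "name": "example-package",
    "vcs": ("svn", "http://some.project.url"),
    "steps": [
        ("build", [sys.executable, "setup.py", "build"]),
        ("test",  [sys.executable, "setup.py", "test"]),
    ],
}

def run_recipe(recipe, workdir):
    results = []
    for step_name, command in recipe["steps"]:
        proc = subprocess.run(command, cwd=workdir,
                              capture_output=True, text=True)
        results.append({"step": step_name,
                        "success": proc.returncode == 0,
                        "output": proc.stdout + proc.stderr})
        if proc.returncode != 0:
            break                      # later steps depend on earlier ones
    return results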
Recipes also raise a third problem: widespread use of recipes by a CI system introduces a second party that must be trusted by the system. Not only must the software itself be trustworthy (because the CI clients are executing arbitrary code), but the recipes must also be trustworthy (because they, too, must be able to execute arbitrary code).
These trust issues are easy to handle in a tightly controlled environment, e.g. a company where the build clients and CI system are part of an internal process. In other development environments, however, interested third parties may want to offer build services, for example to open source projects. The ideal solution would be to support the inclusion of standard build recipes in software on a community level, a direction that the Python community is taking with distutils2. An alternative solution would be to allow for the use of digitally signed recipes, so that trusted individuals could write and distribute signed recipes, and CI clients could check to see if they should trust the recipes.
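As a rough illustration of the signed-recipe idea, a client could refuse to execute any recipe whose signature does not verify against a key it already trusts. The sketch below uses a shared-secret HMAC purely for brevity; a real deployment would use public-key signatures (e.g. GPG-signed recipe files) so that recipe authors need not share secrets with every client:

# Illustrative check of a signed recipe using a shared-secret HMAC.
# (A real system would verify a public-key signature instead.)
import hashlib, hmac

def recipe_is_trusted(recipe_bytes, signature_hex, trusted_keys):
    for key in trusted_keys:
        expected = hmac.new(key, recipe_bytes, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, signature_hex):
            return True
    return False

# A client would call this before executing anything from the recipe:
# if not recipe_is_trusted(recipe_bytes, sig, [b"secret-key"]):
#     raise RuntimeError("untrusted recipe; refusing to run it")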
In our experience, a loosely coupled RPC or webhook callback-based model for continuous integration is extremely easy to implement, as long as one ignores any requirements for tight coordination that would involve complex coupling. Basic execution of remote checkouts and builds has similar design constraints whether the build is being driven locally or remotely; collection of information about the build (success/failure, etc.) is primarily driven by client-side requirements; and tracking information by architecture and result involves the same basic requirements. Thus a basic CI system can be implemented quite easily using the reporting model.
We found the loosely coupled model to be very flexible and expandable, as well. Adding new results reporting, notification mechanisms, and build recipes is easy because the components are clearly separated and quite independent. Separated components have clearly delegated tasks to perform, and are also easy to test and easy to modify.
The only challenging aspect of remote builds in a CDash-like loosely-coupled model is build coordination: starting and stopping builds, reporting on ongoing builds, and coordinating resource locks between different clients is technically demanding compared to the rest of the implementation.
It is easy to reach the conclusion that the loosely coupled model is "better" all around, but obviously this is only true if build coordination is not needed. This decision should be made based on the needs of projects using the CI system.
While thinking about Pony-Build, we came up with a few features that we would like to see in future continuous integration systems.
The continuous integration systems described above implemented features that fit their architecture, while the hybrid Jenkins system started with a master/slave model but added features from the more loosely coupled reporting architecture.
It is tempting to conclude that architecture dictates function. This is nonsense, of course. Rather, the choice of architecture seems to canalize or direct development towards a particular set of features. For Pony-Build, we were surprised at the extent to which our initial choice of a CDash-style reporting architecture drove later design and implementation decisions. Some implementation choices, such as the avoidance of a centralized configuration and scheduling system in Pony-Build, were driven by our use cases: we needed to allow dynamic attachment of remote build clients, which is difficult to support with Buildbot. Other features we didn't implement, such as progress reports and centralized resource locking in Pony-Build, were desirable but simply too complicated to add without a compelling requirement.
Similar logic may apply to Buildbot, CDash, and Jenkins. In each case there are useful features that are absent, perhaps due to architectural incompatibility. However, from discussions with members of the Buildbot and CDash communities, and from reading the Jenkins website, it seems likely that the desired features were chosen first, and the system was then developed using an architecture that permitted those features to be easily implemented. For example, CDash serves a community with a relatively small set of core developers, who develop software using a centralized model. Their primary consideration is to keep the software working on a core set of machines, and secondarily to receive bug reports from tech-savvy users. Meanwhile, Buildbot is increasingly used in complex build environments with many clients that require coordination to access shared resources. Buildbot's more flexible configuration file format with its many options for scheduling, change notification, and resource locks fits that need better than the other options. Finally, Jenkins seems aimed at ease of use and simple continuous integration, with a full GUI for configuring it and configuration options for running on the local server.
The sociology of open source development is another confounding factor in correlating architecture with features: suppose developers choose open source projects based on how well the project architecture and features fit their use case? If so, then their contributions will generally reflect an extension of a use case that already fits the project well. Thus projects may get locked into a certain feature set, since contributors are self-selected and may avoid projects with architectures that don't fit their own desired features. This was certainly true for us in choosing to implement a new system, Pony-Build, rather than contributing to Buildbot: the Buildbot architecture was simply not appropriate for building hundreds or thousands of packages.
Existing continuous integration systems are generally built around one of two disparate architectures, and generally implement only a subset of desirable features. As CI systems mature and their user populations grow, we would expect them to grow additional features; however, implementation of these features may be constrained by the base choice of architecture. It will be interesting to see how the field evolves.
We thank Greg Wilson, Brett Cannon, Eric Holscher, Jesse Noller, and Victoria Laidler for interesting discussions on CI systems in general, and Pony-Build in particular. Several students contributed to Pony-Build development, including Jack Carlson, Fatima Cherkaoui, Max Laite, and Khushboo Shakya.