The Architecture of Open Source Applications (Volume 1)Continuous Integration

Continuous Integration (CI) systems are systems that build and test software automatically and regularly. Though their primary benefit lies in avoiding long periods between build and test runs, CI systems can also simplify and automate the execution of many otherwise tedious tasks. These include cross-platform testing, the regular running of slow, data-intensive, or difficult-to-configure tests, verification of proper performance on legacy platforms, detection of infrequently failing tests, and the regular production of up-to-date release products. And, because build and test automation is necessary for implementing continuous integration, CI is often a first step towards a continuous deployment framework wherein software updates can be deployed quickly to live systems after testing.

Continuous integration is a timely subject, not least because of its prominence in the Agile software methodology. There has been an explosion of open source CI tools in recent years, in and for a variety of languages, implementing a huge range of features in the context of a diverse set of architectural models. The purpose of this chapter is to describe common sets of features implemented in continuous integration systems, discuss the architectural options available, and examine which features may or may not be easy to implement given the choice of architecture.

Below, we will briefly describe a set of systems that exemplify the extremes of architectural choices available when designing a CI system. The first, Buildbot, is a master/slave system; the second, CDash is a reporting server model; the third Jenkins, uses a hybrid model; and the fourth, Pony-Build, is a Python-based decentralized reporting server that we will use as a foil for further discussion.

9.1. The Landscape

The space of architectures for continuous integration systems seems to be dominated by two extremes: master/slave architectures, in which a central server directs and controls remote builds; and reporting architectures, in which a central server aggregates build reports contributed by clients. All of the continuous integration systems of which we are aware have chosen some combination of features from these two architectures.

Our example of a centralized architecture, Buildbot, is composed of two parts: the central server, or buildmaster, which schedules and coordinates builds between one or more connected clients; and the clients, or buildslaves, which execute builds. The buildmaster provides a central location to which to connect, along with configuration information about which clients should execute which commands in what order. Buildslaves connect to the buildmaster and receive detailed instructions. Buildslave configuration consists of installing the software, identifying the master server, and providing connection credentials for the client to connect to the master. Builds are scheduled by the buildmaster, and output is streamed from the buildslaves to the buildmaster and kept on the master server for presentation via the Web and other reporting and notification systems.

On the opposite side of the architecture spectrum lies CDash, which is used for the Visualization Toolkit (VTK)/Insight Toolkit (ITK) projects by Kitware, Inc. CDash is essentially a reporting server, designed to store and present information received from client computers running CMake and CTest. With CDash, the clients initiate the build and test suite, record build and test results, and then connect to the CDash server to deposit the information for central reporting.

Finally, a third system, Jenkins (known as Hudson before a name change in 2011), provides both modes of operation. With Jenkins, builds can either be executed independently with the results sent to the master server; or nodes can be slaved to the Jenkins master server, which then schedules and directs the execution of builds.

Both the centralized and decentralized models have some features in common, and, as Jenkins shows, both models can co-exist in a single implementation. However, Buildbot and CDash exist in stark contrast to each other: apart from the commonalities of building software and reporting on the builds, essentially every other aspect of the architecture is different. Why?

Further, to what extent does the choice of architecture seem to make certain features easier or harder to implement? Do some features emerge naturally from a centralized model? And how extensible are the existing implementations—can they easily be modified to provide new reporting mechanisms, or scale to many packages, or execute builds and tests in a cloud environment?

9.1.1. What Does Continuous Integration Software Do?

The core functionality of a continuous integration system is simple: build software, run tests, and report the results. The build, test, and reporting can be performed by a script running from a scheduled task or cron job: such a script would just check out a new copy of the source code from the VCS, do a build, and then run the tests. Output would be logged to a file, and either stored in a canonical location or sent out via e-mail in case of a build failure. This is simple to implement: in UNIX, for example, this entire process can be implemented for most Python packages in a seven line script:

cd /tmp && \
svn checkout http://some.project.url && \
cd project_directory && \
python setup.py build && \
python setup.py test || \
echo build failed | sendmail notification@project.domain
cd /tmp && rm -fr project_directory

In Figure 9.1, the unshaded rectangles represent discrete subsystems and functionality within the system. Arrows show information flow between the various components. The cloud represents potential remote execution of build processes. The shaded rectangles represent potential coupling between the subsystems; for example, build monitoring may include monitoring of the build process itself and aspects of system health (CPU load, I/O load, memory usage, etc.)

Figure 9.1: Internals of a Continuous Integration System

But this simplicity is deceptive. Real-world CI systems usually do much more. In addition to initiating or receiving the results of remote build processes, continuous integration software may support any of the following additional features:

Checkout and update: For large projects, checking out a new copy of the source code can be costly in terms of bandwidth and time. Usually, CI systems update an existing working copy in place, which communicates only the differences from the previous update. In exchange for this savings, the system must keep track of the working copy and know how to update it, which usually means at least minimal integration with a VCS.
Abstract build recipes: A configure/build/test recipe must be written for the software in question. The underlying commands will often be different for different operating systems, e.g. Mac OS X vs. Windows vs. UNIX, which means either specialized recipes need to be written (introducing potential bugs or disconnection from the actual build environment) or some suitable level of abstraction for recipes must be provided by the CI configuration system.
Storing checkout/build/test status: It may be desirable for details of the checkout (files updated, code version), build (warnings or errors) and test (code coverage, performance, memory usage) to be stored and used for later analysis. These results can be used to answer questions across the build architectures (did the latest check-in significantly affect performance on any particular architecture?) or over history (has code coverage changed dramatically in the last month?) As with the build recipe, the mechanisms and data types for this kind of introspection are usually specific to the platform or build system.
Release packages: Builds may produce binary packages or other products that need to be made externally available. For example, developers who don't have direct access to the build machine may want to test the latest build in a specific architecture; to support this, the CI system needs to be able to transfer build products to a central repository.
Multiple architecture builds: Since one goal of continuous integration is to build on multiple architectures to test cross-platform functionality, the CI software may need to track the architecture for each build machine and link builds and build outcomes to each client.
Resource management: If a build step is resource intensive on a single machine, the CI system may want to run conditionally. For example, builds may wait for the absence of other builds or users, or delay until a particular CPU or memory load is reached.
External resource coordination: Integration tests may depend on non-local resources such as a staging database or a remote web service. The CI system may therefore need to coordinate builds between multiple machines to organize access to these resources.
Progress reports: For long build processes, regular reporting from the build may also be important. If a user is primarily interested in the results of the first 30 minutes of a 5 hour build and test, then it would be a waste of time to make them wait until the end of a run to see any results.

A high-level view of all of these potential components of a CI system is shown in Figure 9.1. CI software usually implements some subset of these components.

9.1.2. External Interactions

Continuous integration systems also need to interact with other systems. There are several types of potential interactions:

Build notification: The outcomes of builds generally need to be communicated to interested clients, either via pull (Web, RSS, RPC, etc.) or push notification (e-mail, Twitter, PubSubHubbub, etc.) This can include notification of all builds, or only failed builds, or builds that haven't been executed within a certain period.
Build information: Build details and products may need to be retrieved, usually via RPC or a bulk download system. For example, it may be desirable to have an external analysis system do an in-depth or more targeted analysis, or report on code coverage or performance results. In addition, an external test result repository may be employed to keep track of failing and successful tests separately from the CI system.
Build requests: External build requests from users or a code repository may need to be handled. Most VCSs have post-commit hooks that can execute an RPC call to initiate a build, for example. Or, users may request builds manually through a Web interface or other user-initiated RPC.
Remote control of the CI system: More generally, the entire runtime may be modifiable through a more-or-less well-defined RPC interface. Either ad hoc extensions or a more formally specified interface may need to be able to drive builds on specific platforms, specify alternate source branches in order to build with various patches, and execute additional builds conditionally. This is useful in support of more general workflow systems, e.g. to permit commits only after they have passed the full set of CI tests, or to test patches across a wide variety of systems before final integration. Because of the variety of bug tracker, patch systems, and other external systems in use, it may not make sense to include this logic within the CI system itself.

9.2. Architectures

Buildbot and CDash have chosen opposite architectures, and implement overlapping but distinct sets of features. Below we examine these feature sets and discuss how features are easier or harder to implement given the choice of architecture.

9.2.1. Implementation Model: Buildbot

Figure 9.2: Buildbot Architecture

Buildbot uses a master/slave architecture, with a single central server and multiple build slaves. Remote execution is entirely scripted by the master server in real time: the master configuration specifies the command to be executed on each remote system, and runs them when each previous command is finished. Scheduling and build requests are not only coordinated through the master but directed entirely by the master. No built-in recipe abstraction exists, except for basic version control system integration ("our code is in this repository") and a distinction between commands that operate on the build directory vs. within the build directory. OS-specific commands are typically specified directly in the configuration.

Buildbot maintains a constant connection with each buildslave, and manages and coordinates job execution between them. Managing remote machines through a persistent connection adds significant practical complexity to the implementation, and has been a long-standing source of bugs. Keeping robust long-term network connections running is not simple, and testing applications which interact with the local GUI is challenging through a network connection. OS alert windows are particularly difficult to deal with. However, this constant connection makes resource coordination and scheduling straightforward, because slaves are entirely at the disposal of the master for execution of jobs.

The kind of tight control designed into the Buildbot model makes centralized build coordination between resources very easy. Buildbot implements both master and slave locks on the buildmaster, so that builds can coordinate system-global and machine-local resources. This makes Buildbot particularly suitable for large installations that run system integration tests, e.g. tests that interact with databases or other expensive resources.

The centralized configuration causes problems for a distributed use model, however. Each new buildslave must be explicitly allowed for in the master configuration, which makes it impossible for new buildslaves to dynamically attach to the central server and offer build services or build results. Moreover, because each build slave is entirely driven by the build master, build clients are vulnerable to malicious or accidental misconfigurations: the master literally controls the client entirely, within the client OS security restrictions.

One limiting feature of Buildbot is that there is no simple way to return build products to the central server. For example, code coverage statistics and binary builds are kept on the remote buildslave, and there is no API to transmit them to the central buildmaster for aggregation and distribution. It is not clear why this feature is absent. It may be a consequence of the limited set of command abstractions distributed with Buildbot, which are focused on executing remote commands on the build slaves. Or, it may be due to the decision to use the connection between the buildmaster and buildslave as a control system, rather than as an RPC mechanism.

Another consequence of the master/slave model and this limited communications channel is that buildslaves do not report system utilization and the master cannot be configured to be aware of high slave load.

External CPU notification of build results is handled entirely by the buildmaster, and new notification services need to be implemented within the buildmaster itself. Likewise, new build requests must be communicated directly to the buildmaster.

9.2.2. Implementation Model: CDash

Figure 9.3: CDash Architecture

In contrast to Buildbot, CDash implements a reporting server model. In this model, the CDash server acts as a central repository for information on remotely executed builds, with associated reporting on build and test failures, code coverage analysis, and memory usage. Builds run on remote clients on their own schedule, and submit build reports in an XML format. Builds can be submitted both by "official" build clients and by non-core developers or users running the published build process on their own machines.

This simple model is made possible because of the tight conceptual integration between CDash and other elements of the Kitware build infrastructure: CMake, a build configuration system, CTest, a test runner, and CPack, a packaging system. This software provides a mechanism by which build, test, and packaging recipes can be implemented at a fairly high level of abstraction in an OS-agnostic manner.

CDash's client-driven process simplifies many aspects of the client-side CI process. The decision to run a build is made by build clients, so client-side conditions (time of day, high load, etc.) can be taken into account by the client before starting a build. Clients can appear and disappear as they wish, easily enabling volunteer builds and builds "in the cloud". Build products can be sent to the central server via a straightforward upload mechanism.

However, in exchange for this reporting model, CDash lacks many convenient features of Buildbot. There is no centralized coordination of resources, nor can this be implemented simply in a distributed environment with untrusted or unreliable clients. Progress reports are also not implemented: to do so, the server would have to allow incremental updating of build status. And, of course, there is no way to both globally request a build, and guarantee that anonymous clients perform the build in response to a check-in—clients must be considered unreliable.

Recently, CDash added functionality to enable an "@Home" cloud build system, in which clients offer build services to a CDash server. Clients poll the server for build requests, execute them upon request, and return the results to the server. In the current implementation (October 2010), builds must be manually requested on the server side, and clients must be connected for the server to offer their services. However, it is straightforward to extend this to a more generic scheduled-build model in which builds are requested automatically by the server whenever a relevant client is available. The "@Home" system is very similar in concept to the Pony-Build system described later.

9.2.3. Implementation Model: Jenkins

Jenkins is a widely used continuous integration system implemented in Java; until early 2011, it was known as Hudson. It is capable of acting either as a standalone CI system with execution on a local system, or as a coordinator of remote builds, or even as a passive receiver of remote build information. It takes advantage of the JUnit XML standard for unit test and code coverage reporting to integrate reports from a variety of test tools. Jenkins originated with Sun, but is very widely used and has a robust open-source community associated with it.

Jenkins operates in a hybrid mode, defaulting to master-server build execution but allowing a variety of methods for executing remote builds, including both server- and client-initiated builds. Like Buildbot, however, it is primarily designed for central server control, but has been adapted to support a wide variety of distributed job initiation mechanisms, including virtual machine management.

Jenkins can manage multiple remote machines through a connection initiated by the master via an SSH connection, or from the client via JNLP (Java Web Start). This connection is two-way, and supports the communication of objects and data via serial transport.

Jenkins has a robust plugin architecture that abstracts the details of this connection, which has allowed the development of many third-party plugins to support the return of binary builds and more significant result data.

For jobs that are controlled by a central server, Jenkins has a "locks" plugin to discourage jobs from running in parallel, although as of January 2011 it is not yet fully developed.

9.2.4. Implementation Model: Pony-Build

Figure 9.4: Pony-Build Architecture

Pony-Build is a proof-of-concept decentralized CI system written in Python. It is composed of three core components, which are illustrated in Figure 9.4. The results server acts as a centralized database containing build results received from individual clients. The clients independently contain all configuration information and build context, coupled with a lightweight client-side library to help with VCS repository access, build process management, and the communication of results to the server. The reporting server is optional, and contains a simple Web interface, both for reporting on the results of builds and potentially for requesting new builds. In our implementation, the reporting server and results server run in a single multithreaded process but are loosely coupled at the API level and could easily be altered to run independently.

This basic model is decorated with a variety of webhooks and RPC mechanisms to facilitate build and change notification and build introspection. For example, rather than tying VCS change notification from the code repository directly into the build system, remote build requests are directed to the reporting system, which communicates them to the results server. Likewise, rather than building push notification of new builds out to e-mail, instant messaging, and other services directly into the reporting server, notification is controlled using the PubSubHubbub (PuSH) active notification protocol. This allows a wide variety of consuming applications to receive notification of "interesting" events (currently limited to new builds and failed builds) via a PuSH webhook.

The advantages of this very decoupled model are substantial:

Ease of communication: The basic architectural components and webhook protocols are extremely easy to implement, requiring only a basic knowledge of Web programming.
Easy modification: The implementation of new notification methods, or a new reporting server interface, is extremely simple.
Multiple language support: Since the various components call each other via webhooks, which are supported by most programming languages, different components can be implemented in different languages.
Testability: Each component can be completely isolated and mocked, so the system is very testable.
Ease of configuration: The client-side requirements are minimal, with only a single library file required beyond Python itself.
Minimal server load: Since the central server has virtually no control responsibilities over the clients, isolated clients can run in parallel without contacting the server and placing any corresponding load it, other than at reporting time.
VCS integration: Build configuration is entirely client side, allowing it to be included within the VCS.
Ease of results access: Applications that wish to consume build results can be written in any language capable of an XML-RPC request. Users of the build system can be granted access to the results and reporting servers at the network level, or via a customized interface at the reporting server. The build clients only need access to post results to the results server.

Unfortunately, there are also many serious disadvantages, as with the CDash model:

Difficulty requesting builds: This difficulty is introduced by having the build clients be entirely independent of the results server. Clients may poll the results server, to see if a build request is operative, but this introduces high load and significant latency. Alternatively, command and control connections need to be established to allow the server to notify clients of build requests directly. This introduces more complexity into the system and eliminates the advantages of decoupled build clients.
Poor support for resource locking: It is easy to provide an RPC mechanism for holding and releasing resource locks, but much more difficult to enforce client policies. While CI systems like CDash assume good faith on the client side, clients may fail unintentionally and badly, e.g. without releasing locks. Implementing a robust distributed locking system is hard and adds significant undesired complexity. For example, in order to provide master resource locks with unreliable clients, the master lock controller must have a policy in place for clients that take a lock and never release it, either because they crash or because of a deadlock situation.
Poor support for real-time monitoring: Real-time monitoring of the build, and control of the build process itself, is challenging to implement in a system without a constant connection. One significant advantage of Buildbot over the client-driven model is that intermediate inspection of long builds is easy, because the results of the build are communicated to the master interface incrementally. Moreover, because Buildbot retains a control connection, if a long build goes bad in the middle due to misconfiguration or a bad check-in, it can be interrupted and aborted. Adding such a feature into Pony-Build, in which the results server has no guaranteed ability to contact the clients, would require either constant polling by the clients, or the addition of a standing connection to the clients.

Two other aspects of CIs that were raised by Pony-Build were how best to implement recipes, and how to manage trust. These are intertwined issues, because recipes execute arbitrary code on build clients.

9.2.5. Build Recipes

Build recipes add a useful level of abstraction, especially for software built in a cross-platform language or using a multi-platform build system. For example, CDash relies on a strict kind of recipe; most, or perhaps all, software that uses CDash is built with CMake, CTest, and CPack, and these tools are built to handle multi-platform issues. This is the ideal situation from the viewpoint of a continuous integration system, because the CI system can simply delegate all issues to the build tool chain.

However, this is not true for all languages and build environments. In the Python ecosystem, there has been increasing standardization around distutils and distutils2 for building and packaging software, but as yet no standard has emerged for discovering and running tests, and collating the results. Moreover, many of the more complex Python packages add specialized build logic into their system, through a distutils extension mechanism that allows the execution of arbitrary code. This is typical of most build tool chains: while there may be a fairly standard set of commands to be run, there are always exceptions and extensions.

Recipes for building, testing, and packaging are therefore problematic, because they must solve two problems: first, they should be specified in a platform independent way, so that a single recipe can be used to build software on multiple systems; and second, they must be customizable to the software being built.

9.2.6. Trust

This raises a third problem. Widespread use of recipes by a CI system introduces a second party that must be trusted by the system: not only must the software itself be trustworthy (because the CI clients are executing arbitrary code), but the recipes must also be trustworthy (because they, too, must be able to execute arbitrary code).

These trust issues are easy to handle in a tightly controlled environment, e.g. a company where the build clients and CI system are part of an internal process. In other development environments, however, interested third parties may want to offer build services, for example to open source projects. The ideal solution would be to support the inclusion of standard build recipes in software on a community level, a direction that the Python community is taking with distutils2. An alternative solution would be to allow for the use of digitally signed recipes, so that trusted individuals could write and distribute signed recipes, and CI clients could check to see if they should trust the recipes.

9.2.7. Choosing a Model

In our experience, a loosely coupled RPC or webhook callback-based model for continuous integration is extremely easy to implement, as long as one ignores any requirements for tight coordination that would involve complex coupling. Basic execution of remote checkouts and builds has similar design constraints whether the build is being driven locally or remotely; collection of information about the build (success/failure, etc.) is primarily driven by client-side requirements; and tracking information by architecture and result involves the same basic requirements. Thus a basic CI system can be implemented quite easily using the reporting model.

We found the loosely coupled model to be very flexible and expandable, as well. Adding new results reporting, notification mechanisms, and build recipes is easy because the components are clearly separated and quite independent. Separated components have clearly delegated tasks to perform, and are also easy to test and easy to modify.

The only challenging aspect of remote builds in a CDash-like loosely-coupled model is build coordination: starting and stopping builds, reporting on ongoing builds, and coordinating resource locks between different clients is technically demanding compared to the rest of the implementation.

It is easy to reach the conclusion that the loosely coupled model is "better" all around, but obviously this is only true if build coordination is not needed. This decision should be made based on the needs of projects using the CI system.

9.3. The Future

While thinking about Pony-Build, we came up with a few features that we would like to see in future continuous integration systems.

A language-agnostic set of build recipes: Currently, each continuous integration system reinvents the wheel by providing its own build configuration language, which is manifestly ridiculous; there are fewer than a dozen commonly used build systems, and probably only a few dozen test runners. Nonetheless, each CI system has a new and different way of specifying the build and test commands to be run. In fact, this seems to be one of the reasons why so many basically identical CI systems exist: each language and community implements their own configuration system, tailored to their own build and test systems, and then layers on the same set of features above that system. Therefore, building a domain-specific language (DSL) capable of representing the options used by the few dozen commonly used build and test tool chains would go a long way toward simplifying the CI landscape.
Common formats for build and test reporting: There is little agreement on exactly what information, in what format, a build and test system needs to provide. If a common format or standard could be developed it would make it much easier for continuous integration systems to offer both detailed and summary views across builds. The Test Anywhere Protocol, TAP (from the Perl community) and the JUnit XML test output format (from the Java community) are two interesting options that are capable of encoding information about number of tests run, successes and failures, and per-file code coverage details.
Increased granularity and introspection in reporting: Alternatively, it would be convenient if different build platforms provided a well-documented set of hooks into their configuration, compilation, and test systems. This would provide an API (rather than a common format) that CI systems could use to extract more detailed information about builds.

9.3.1. Concluding Thoughts

The continuous integration systems described above implemented features that fit their architecture, while the hybrid Jenkins system started with a master/slave model but added features from the more loosely coupled reporting architecture.

It is tempting to conclude that architecture dictates function. This is nonsense, of course. Rather, the choice of architecture seems to canalize or direct development towards a particular set of features. For Pony-Build, we were surprised at the extent to which our initial choice of a CDash-style reporting architecture drove later design and implementation decisions. Some implementation choices, such as the avoidance of a centralized configuration and scheduling system in Pony-Build were driven by our use cases: we needed to allow dynamic attachment of remote build clients, which is difficult to support with Buildbot. Other features we didn't implement, such as progress reports and centralized resource locking in Pony-Build, were desirable but simply too complicated to add without a compelling requirement.

Similar logic may apply to Buildbot, CDash, and Jenkins. In each case there are useful features that are absent, perhaps due to architectural incompatibility. However, from discussions with members of the Buildbot and CDash communities, and from reading the Jenkins website, it seems likely that the desired features were chosen first, and the system was then developed using an architecture that permitted those features to be easily implemented. For example, CDash serves a community with a relatively small set of core developers, who develop software using a centralized model. Their primary consideration is to keep the software working on a core set of machines, and secondarily to receive bug reports from tech-savvy users. Meanwhile, Buildbot is increasingly used in complex build environments with many clients that require coordination to access shared resources. Buildbot's more flexible configuration file format with its many options for scheduling, change notification, and resource locks fits that need better than the other options. Finally, Jenkins seems aimed at ease of use and simple continuous integration, with a full GUI for configuring it and configuration options for running on the local server.

The sociology of open source development is another confounding factor in correlating architecture with features: suppose developers choose open source projects based on how well the project architecture and features fit their use case? If so, then their contributions will generally reflect an extension of a use case that already fits the project well. Thus projects may get locked into a certain feature set, since contributors are self-selected and may avoid projects with architectures that don't fit their own desired features. This was certainly true for us in choosing to implement a new system, Pony-Build, rather than contributing to Buildbot: the Buildbot architecture was simply not appropriate for building hundreds or thousands of packages.

Existing continuous integration systems are generally built around one of two disparate architectures, and generally implement only a subset of desirable features. As CI systems mature and their user populations grow, we would expect them to grow additional features; however, implementation of these features may be constrained by the base choice of architecture. It will be interesting to see how the field evolves.

9.3.2. Acknowledgments

We thank Greg Wilson, Brett Cannon, Eric Holscher, Jesse Noller, and Victoria Laidler for interesting discussions on CI systems in general, and Pony-Build in particular. Several students contributed to Pony-Build development, including Jack Carlson, Fatima Cherkaoui, Max Laite, and Khushboo Shakya.