The Architecture of Open Source Applications (Volume 2)Twisted

Twisted is an event-driven networking engine in Python. It was born in the early 2000s, when the writers of networked games had few scalable and no cross-platform libraries, in any language, at their disposal. The authors of Twisted tried to develop games in the existing networking landscape, struggled, saw a clear need for a scalable, event-driven, cross-platform networking framework and decided to make one happen, learning from the mistakes and hardships of past game and networked application writers.

Twisted supports many common transport and application layer protocols, including TCP, UDP, SSL/TLS, HTTP, IMAP, SSH, IRC, and FTP. Like the language in which it is written, it is "batteries-included"; Twisted comes with client and server implementations for all of its protocols, as well as utilities that make it easy to configure and deploy production-grade Twisted applications from the command line.

21.1. Why Twisted?

In 2000, glyph, the creator of Twisted, was working on a text-based multiplayer game called Twisted Reality. It was a big mess of threads, 3 per connection, in Java. There was a thread for input that would block on reads, a thread for output that would block on some kind of write, and a "logic" thread that would sleep while waiting for timers to expire or events to queue. As players moved through the virtual landscape and interacted, threads were deadlocking, caches were getting corrupted, and the locking logic was never quite right—the use of threads had made the software complicated, buggy, and hard to scale.

Seeking alternatives, he discovered Python, and in particular Python's select module for multiplexing I/O from stream objects like sockets and pipes (the Single UNIX Specification, Version 3 (SUSv3) describes the select API). At the time, Java didn't expose the operating system's select interface or any other asynchronous I/O API (The java.nio package for non-blocking I/O was added in J2SE 1.4, released in 2002). A quick prototype of the game in Python using select immediately proved less complex and more reliable than the threaded version.

An instant convert to Python, select, and event-driven programming, glyph wrote a client and server for the game in Python using the select API. But then he wanted to do more. Fundamentally, he wanted to be able to turn network activity into method calls on objects in the game. What if you could receive email in the game, like the Nethack mailer daemon? What if every player in the game had a home page? Glyph found himself needing good Python IMAP and HTTP clients and servers that used select.

He first turned to Medusa, a platform developed in the mid-'90s for writing networking servers in Python based on the asyncore module. asyncore is an asynchronous socket handler that builds a dispatcher and callback interface on top of the operating system's select API.

Glyph was facing the prospect of implementing a networking platform himself and realized that Twisted Reality had opened the door to a problem that was just as interesting as his game.

Over time, Twisted Reality the game became Twisted the networking platform, which would do what existing networking platforms in Python didn't:

21.2. The Architecture of Twisted

Twisted is an event-driven networking engine. Event-driven programming is so integral to Twisted's design philosophy that it is worth taking a moment to review what exactly event-driven programming means.

Event-driven programming is a programming paradigm in which program flow is determined by external events. It is characterized by an event loop and the use of callbacks to trigger actions when events happen. Two other common programming paradigms are (single-threaded) synchronous and multi-threaded programming.

Let's compare and contrast single-threaded, multi-threaded, and event-driven programming models with an example. Figure 21.1 shows the work done by a program over time under these three models. The program has three tasks to complete, each of which blocks while waiting for I/O to finish. Time spent blocking on I/O is greyed out.

In the single-threaded synchronous version of the program, tasks are performed serially. If one task blocks for a while on I/O, all of the other tasks have to wait until it finishes and they are executed in turn. This definite order and serial processing are easy to reason about, but the program is unnecessarily slow if the tasks don't depend on each other, yet still have to wait for each other.

In the threaded version of the program, the three tasks that block while doing work are performed in separate threads of control. These threads are managed by the operating system and may run concurrently on multiple processors or interleaved on a single processor. This allows progress to be made by some threads while others are blocking on resources. This is often more time-efficient than the analogous synchronous program, but one has to write code to protect shared resources that could be accessed concurrently from multiple threads. Multi-threaded programs can be harder to reason about because one now has to worry about thread safety via process serialization (locking), reentrancy, thread-local storage, or other mechanisms, which when implemented improperly can lead to subtle and painful bugs.

The event-driven version of the program interleaves the execution of the three tasks, but in a single thread of control. When performing I/O or other expensive operations, a callback is registered with an event loop, and then execution continues while the I/O completes. The callback describes how to handle an event once it has completed. The event loop polls for events and dispatches them as they arrive, to the callbacks that are waiting for them. This allows the program to make progress when it can without the use of additional threads. Event-driven programs can be easier to reason about than multi-threaded programs because the programmer doesn't have to worry about thread safety.

It is also a good choice when an application has to share mutable data between tasks, because no synchronization has to be performed.

Networking applications often have exactly these properties, which is what makes them such a good fit for the event-driven programming model.

Reusing Existing Applications

Many popular clients and servers for various networking protocols already existed when Twisted was created. Why did glyph not just use Apache, IRCd, BIND, OpenSSH, or any of the other pre-existing applications whose clients and servers would have to get re-implemented from scratch for Twisted?

The problem is that all of these server implementations have networking code written from scratch, typically in C, with application code coupled directly to the networking layer. This makes them very difficult to use as libraries. They have to be treated as black boxes when used together, giving a developer no chance to reuse code if he or she wanted to expose the same data over multiple protocols. Additionally, the server and client implementations are often separate applications that don't share code. Extending these applications and maintaining cross-platform client-server compatibility is harder than it needs to be.

With Twisted, the clients and servers are written in Python using a consistent interface. This makes it is easy to write new clients and servers, to share code between clients and servers, to share application logic between protocols, and to test one's code.

The Reactor

Twisted implements the reactor design pattern, which describes demultiplexing and dispatching events from multiple sources to their handlers in a single-threaded environment.

The core of Twisted is the reactor event loop. The reactor knows about network, file system, and timer events. It waits on and then handles these events, abstracting away platform-specific behavior and presenting interfaces to make responding to events anywhere in the network stack easy.

A reactor based on the poll API (decribed in the Single UNIX Specification, Version 3 (SUSv3)) is the current default on all platforms. Twisted additionally supports a number of platform-specific high-volume multiplexing APIs. Platform-specific reactors include the KQueue reactor based on FreeBSD's kqueue mechanism, an epoll-based reactor for systems supporting the epoll interface (currently Linux 2.6), and an IOCP reactor based on Windows Input/Output Completion Ports.

Examples of polling implementation-dependent details that Twisted takes care of include:

Twisted's reactor implementation also takes care of using the underlying non-blocking APIs correctly and handling obscure edge cases correctly. Python doesn't expose the IOCP API at all, so Twisted maintains its own implementation.

Managing Callback Chains

Callbacks are a fundamental part of event-driven programming and are the way that the reactor indicates to an application that events have completed. As event-driven programs grow, handling both the success and error cases for the events in one's application becomes increasingly complex. Failing to register an appropriate callback can leave a program blocking on event processing that will never happen, and errors might have to propagate up a chain of callbacks from the networking stack through the layers of an application.

Let's examine some of the pitfalls of event-driven programs by comparing synchronous and asynchronous versions of a toy URL fetching utility in Python-like pseudo-code:

Synchronous URL fetcher:

import getPage

def processPage(page):
    print page

def logError(error):
    print error

def finishProcessing(value):
    print "Shutting down..."
    exit(0)

url = "http://google.com"
try:
    page = getPage(url)
    processPage(page)
except Error, e:
    logError(error)
finally:
    finishProcessing()

Asynchronous URL fetcher:

from twisted.internet import reactor
import getPage

def processPage(page):
    print page
    finishProcessing()

def logError(error):
    print error
    finishProcessing()

def finishProcessing(value):
    print "Shutting down..."
    reactor.stop()

url = "http://google.com"
# getPage takes: url, 
#    success callback, error callback
getPage(url, processPage, logError)

reactor.run()

In the asynchronous URL fetcher, reactor.run() starts the reactor event loop. In both the synchronous and asynchronous versions, a hypothetical getPage function does the work of page retrieval. processPage is invoked if the retrieval is successful, and logError is invoked if an Exception is raised while attempting to retrieve the page. In either case, finishProcessing is called afterwards.

The callback to logError in the asynchronous version mirrors the except part of the try/except block in the synchronous version. The callback to processPage mirrors else, and the unconditional callback to finishProcessing mirrors finally.

In the synchronous version, by virtue of the structure of a try/except block exactly one of logError and processPage is called, and finishProcessing is always called once; in the asynchronous version it is the programmer's responsibility to invoke the correct chain of success and error callbacks. If, through programming error, the call to finishProcessing were left out of processPage or logError along their respective callback chains, the reactor would never get stopped and the program would run forever.

This toy example hints at the complexity frustrating programmers during the first few years of Twisted's development. Twisted responded to this complexity by growing an object called a Deferred.

Deferreds

The Deferred object is an abstraction of the idea of a result that doesn't exist yet. It also helps manage the callback chains for this result. When returned by a function, a Deferred is a promise that the function will have a result at some point. That single returned Deferred contains references to all of the callbacks registered for an event, so only this one object needs to be passed between functions, which is much simpler to keep track of than managing callbacks individually.

Deferreds have a pair of callback chains, one for success (callbacks) and one for errors (errbacks). Deferreds start out with two empty chains. One adds pairs of callbacks and errbacks to handle successes and failures at each point in the event processing. When an asynchronous result arrives, the Deferred is "fired" and the appropriate callbacks or errbacks are invoked in the order in which they were added.

Here is a version of the asynchronous URL fetcher pseudo-code which uses Deferreds:

In this version, the same event handlers are invoked, but they are all registered with a single Deferred object instead of spread out in the code and passed as arguments to getPage.

The Deferred is created with two stages of callbacks. First, addCallbacks adds the processPage callback and logError errback to the first stage of their respective chains. Then addBoth adds finishProcessing to the second stage of both chains. Diagrammatically, the callback chains look like Figure 21.2.

Deferreds can only be fired once; attempting to re-fire them will raise an Exception. This gives Deferreds semantics closer to those of the try/except blocks of their synchronous cousins, which makes processing the asynchronous events easier to reason about and avoids subtle bugs caused by callbacks being invoked more or less than once for a single event.

Understanding Deferreds is an important part of understanding the flow of Twisted programs. However, when using the high-level abstractions Twisted provides for networking protocols, one often doesn't have to use Deferreds directly at all.

The Deferred abstraction is powerful and has been borrowed by many other event-driven platforms, including jQuery, Dojo, and Mochikit.

Transports

Transports represent the connection between two endpoints communicating over a network. Transports are responsible for describing connection details, like being stream- or datagram-oriented, flow control, and reliability. TCP, UDP, and Unix sockets are examples of transports. They are designed to be "minimally functional units that are maximally reusable" and are decoupled from protocol implementations, allowing for many protocols to utilize the same type of transport. Transports implement the ITransport interface, which has the following methods:

Decoupling transports from procotols also makes testing the two layers easier. A mock transport can simply write data to a string for inspection.

Protocols

Procotols describe how to process network events asynchronously. HTTP, DNS, and IMAP are examples of application protocols. Protocols implement the IProtocol interface, which has the following methods:

The relationship between the reactor, protocols, and transports is best illustrated with an example. Here are complete implementations of an echo server and client, first the server:

Running the server script starts a TCP server listening for connections on port 8000. The server uses the Echo protocol, and data is written out over a TCP transport. Running the client makes a TCP connection to the server, echoes the server response, and then terminates the connection and stops the reactor. Factories are used to produce instances of protocols for both sides of the connection. The communication is asynchronous on both sides; connectTCP takes care of registering callbacks with the reactor to get notified when data is available to read from a socket.

Applications

Twisted is an engine for producing scalable, cross-platform network servers and clients. Making it easy to deploy these applications in a standardized fashion in production environments is an important part of a platform like this getting wide-scale adoption.

To that end, Twisted developed the Twisted application infrastructure, a re-usable and configurable way to deploy a Twisted application. It allows a programmer to avoid boilerplate code by hooking an application into existing tools for customizing the way it is run, including daemonization, logging, using a custom reactor, profiling code, and more.

The application infrastructure has four main parts: Services, Applications, configuration management (via TAC files and plugins), and the twistd command-line utility. To illustrate this infrastructure, we'll turn the echo server from the previous section into an Application.

Service

A Service is anything that can be started and stopped and which adheres to the IService interface. Twisted comes with service implementations for TCP, FTP, HTTP, SSH, DNS, and many other protocols. Many Services can register with a single application.

Our echo service uses TCP, so we can use Twisted's default TCPServer implementation of this IService interface.

Application

An Application is the top-level service that represents the entire Twisted application. Services register themselves with an Application, and the twistd deployment utility described below searches for and runs Applications.

TAC Files

When managing Twisted applications in a regular Python file, the developer is responsible for writing code to start and stop the reactor and to configure the application. Under the Twisted application infrastructure, protocol implementations live in a module, Services using those protocols are registered in a Twisted Application Configuration (TAC) file, and the reactor and configuration are managed by an external utility.

To turn our echo server into an echo application, we can follow a simple algorithm:

The code for managing the reactor will be taken care of by twistd, discussed below. The application code ends up looking like this:

twistd

twistd (pronounced "twist-dee") is a cross-platform utility for deploying Twisted applications. It runs TAC files and handles starting and stopping an application. As part of Twisted's batteries-included approach to network programming, twistd comes with a number of useful configuration flags, including daemonizing the application, the location of log files, dropping privileges, running in a chroot, running under a non-default reactor, or even running the application under a profiler.

In this simplest case, twistd starts a daemonized instance of the application, logging to twistd.log. After starting and stopping the application, the log looks like this:

Running a service using the Twisted application infrastructure allows developers to skip writing boilerplate code for common service functionalities like logging and daemonization. It also establishes a standard command line interface for deploying applications.

Plugins

An alternative to the TAC-based system for running Twisted applications is the plugin system. While the TAC system makes it easy to register simple hierarchies of pre-defined services within an application configuration file, the plugin system makes it easy to register custom services as subcommands of the twistd utility, and to extend the command-line interface to an application.

To extend a program using the Twisted plugin system, all one has to do is create objects which implement the IPlugin interface and put them in a particular location where the plugin system knows to look for them.

Having already converted our echo server to a Twisted application, transformation into a Twisted plugin is straightforward. Alongside the echo module from before, which contains the Echo protocol and EchoFactory definitions, we add a directory called twisted, containing a subdirectory called plugins, containing our echo plugin definition. This plugin will allow us to start an echo server and specify the port to use as arguments to the twistd utility:

Our echo server will now show up as a server option in the output of twistd --help, and running twistd echo --port=1235 will start an echo server on port 1235.

Twisted comes with a pluggable authentication system for servers called twisted.cred, and a common use of the plugin system is to add an authentication pattern to an application. One can use twisted.cred AuthOptionMixin to add command-line support for various kinds of authentication off the shelf, or to add a new kind. For example, one could add authentication via a local Unix password database or an LDAP server using the plugin system.

twistd comes with plugins for many of Twisted's supported protocols, which turns the work of spinning up a server into a single command. Here are some examples of twistd servers that ship with Twisted:

twistd makes it easy to spin up a server for testing clients, but it is also pluggable, production-grade code.

In that respect, Twisted's application deployment mechanisms via TAC files, plugins, and twistd have been a success. However, anecdotally, most large Twisted deployments end up having to rewrite some of these management and monitoring facilities; the architecture does not quite expose what system administrators need. This is a reflection of the fact that Twisted has not historically had much architectural input from system administrators—the people who are experts at deploying and maintaining applications.

Twisted would be well-served to more aggressively solicit feedback from expert end users when making future architectural decisions in this space.

21.3. Retrospective and Lessons Learned

Twisted recently celebrated its 10th anniversary. Since its inception, inspired by the networked game landscape of the early 2000s, it has largely achieved its goal of being an extensible, cross-platform, event-driven networking engine. Twisted is used in production environments at companies from Google and Lucasfilm to Justin.TV and the Launchpad software collaboration platform. Server implementations in Twisted are the core of numerous other open source applications, including BuildBot, BitTorrent, and Tahoe-LAFS.

Twisted has had few major architectural changes since its initial development. The one crucial addition was Deferred, as discussed above, for managing pending results and their callback chains.

There was one important removal, which has almost no footprint in the current implementation: Twisted Application Persistence.

Twisted Application Persistence

Twisted Application Persistence (TAP) was a way of keeping an application's configuration and state in a pickle. Running an application using this scheme was a two-step process:

This process was inspired by Smalltalk images, an aversion to the proliferation of seemingly ad hoc configuration languages that were hard to script, and a desire to express configuration details in Python.

TAP files immediately introduced unwanted complexity. Classes would change in Twisted without instances of those classes getting changed in the pickle. Trying to use class methods or attributes from a newer version of Twisted on the pickled object would crash the application. The notion of "upgraders" that would upgrade pickles to new API versions was introduced, but then a matrix of upgraders, pickle versions, and unit tests had to be maintained to cover all possible upgrade paths, and comprehensively accounting for all interface changes was still hard and error-prone.

TAPs and their associated utilities were abandoned and then eventually removed from Twisted and replaced with TAC files and plugins. TAP was backronymed to Twisted Application Plugin, and few traces of the failed pickling system exist in Twisted today.

The lesson learned from the TAP fiasco was that to have reasonable maintainability, persistent data needs an explicit schema. More generally, it was a lesson about adding complexity to a project: when considering introducing a novel system for solving a problem, make sure the complexity of that solution is well understood and tested and that the benefits are clearly worth the added complexity before committing the project to it.

web2: a lesson on rewrites

While not primarily an architectural decision, a project management decision about rewriting the Twisted Web implementation has had long-term ramifications for Twisted's image and the maintainers' ability to make architectural improvements to other parts of the code base, and it deserves a short discussion.

In the mid-2000s, the Twisted developers decided to do a full rewrite of the twisted.web APIs as a separate project in the Twisted code base called web2. web2 would contain numerous improvements over twisted.web, including full HTTP 1.1 support and a streaming data API.

web2 was labelled as experimental, but ended up getting used by major projects anyway and was even accidentally released and packaged by Debian. Development on web and web2 continued concurrently for years, and new users were perennially frustrated by the side-by-side existence of both projects and a lack of clear messaging about which project to use. The switchover to web2 never happened, and in 2011 web2 was finally removed from the code base and the website. Some of the improvements from web2 are slowly getting ported back to web.

Partially because of web2, Twisted developed a reputation for being hard to navigate and structurally confusing to newcomers. Years later, the Twisted community still works hard to combat this image.

The lesson learned from web2 was that rewriting a project from scratch is often a bad idea, but if it has to happen make sure that the developer community understands the long-term plan, and that the user community has one clear choice of implementation to use during the rewrite.

If Twisted could go back and do web2 again, the developers would have done a series of backwards-compatible changes and deprecations to twisted.web instead of a rewrite.

Keeping Up with the Internet

The way that we use the Internet continues to evolve. The decision to implement many protocols as part of the core software burdens Twisted with maintaining code for all of those protocols. Implementations have to evolve with changing standards and the adoption of new protocols while maintaining a strict backwards-compatibility policy.

Twisted is primarily a volunteer-driven project, and the limiting factor for development is not community enthusiasm, but rather volunteer time. For example, RFC 2616 defining HTTP 1.1 was released in 1999, work began on adding HTTP 1.1 support to Twisted's HTTP protocol implementations in 2005, and the work was completed in 2009. Support for IPv6, defined in RFC 2460 in 1998, is in progress but unmerged as of 2011.

Implementations also have to evolve as the interfaces exposed by supported operating systems change. For example, the epoll event notification facility was added to Linux 2.5.44 in 2002, and Twisted grew an epoll-based reactor to take advantage of this new API. In 2007, Apple released OS 10.5 Leopard with a poll implementation that didn't support devices, which was buggy enough behavior for Apple to not expose select.poll in its build of Python. Twisted has had to work around this issue and document it for users ever since.

Sometimes, Twisted development doesn't keep up with the changing networking landscape, and enhancements are moved to libraries outside of the core software. For example, the Wokkel project, a collection of enhancements to Twisted's Jabber/XMPP support, has lived as a to-be-merged independent project for years without a champion to oversee the merge. An attempt was made to add WebSockets to Twisted as browsers began to adopt support for the new protocol in 2009, but development moved to external projects after a decision not to include the protocol until it moved from an IETF draft to a standard.

All of this being said, the proliferation of libraries and add-ons is a testament to Twisted's flexibility and extensibility. A strict test-driven development policy and accompanying documentation and coding standards help the project avoid regressions and preserve backwards compatibility while maintaining a large matrix of supported protocols and platforms. It is a mature, stable project that continues to have very active development and adoption.

Twisted looks forward to being the engine of your Internet for another ten years.

`write`	Write some data to the physical connection, in sequence, in a non-blocking fashion.
`writeSequence`	Write a list of strings to the physical connection.
`loseConnection`	Write all pending data and then close the connection.
`getPeer`	Get the remote address of this connection.
`getHost`	Get the address of this side of the connection.

`makeConnection`	Make a connection to a transport and a server.
`connectionMade`	Called when a connection is made.
`dataReceived`	Called whenever data is received.
`connectionLost`	Called when the connection is shut down.

`startService`	Start the service. This might include loading configuration data, setting up database connections, or listening on a port
`stopService`	Shut down the service. This might include saving state to disk, closing database connections, or stopping listening on a port