The Architecture of Open Source Applications (Volume 1)Asterisk

Asterisk¹ is an open source telephony applications platform distributed under the GPLv2. In short, it is a server application for making, receiving, and performing custom processing of phone calls.

The project was started by Mark Spencer in 1999. Mark had a company called Linux Support Services and he needed a phone system to help operate his business. He did not have a lot of money to spend on buying one, so he just made his own. As the popularity of Asterisk grew, Linux Support Services shifted focus to Asterisk and changed its name to Digium, Inc.

The name Asterisk comes from the Unix wildcard character, *. The goal for the Asterisk project is to do everything telephony. Through pursuing this goal, Asterisk now supports a long list of technologies for making and receiving phone calls. This includes many VoIP (Voice over IP) protocols, as well as both analog and digital connectivity to the traditional telephone network, or the PSTN (Public Switched Telephone Network). This ability to get many different types of phone calls into and out of the system is one of Asterisk's main strengths.

Once phone calls are made to and from an Asterisk system, there are many additional features that can be used to customize the processing of the phone call. Some features are larger pre-built common applications, such as voicemail. There are other smaller features that can be combined together to create custom voice applications, such as playing back a sound file, reading digits, or speech recognition.

1.1. Critical Architectural Concepts

This section discusses some architectural concepts that are critical to all parts of Asterisk. These ideas are at the foundation of the Asterisk architecture.

1.1.1. Channels

A channel in Asterisk represents a connection between the Asterisk system and some telephony endpoint (Figure 1.1). The most common example is when a phone makes a call into an Asterisk system. This connection is represented by a single channel. In the Asterisk code, a channel exists as an instance of the ast_channel data structure. This call scenario could be a caller interacting with voicemail, for example.

Figure 1.1: A Single Call Leg, Represented by a Single Channel

1.1.2. Channel Bridging

Perhaps a more familiar call scenario would be a connection between two phones, where a person using phone A has called a person on phone B. In this call scenario, there are two telephony endpoints connected to the Asterisk system, so two channels exist for this call (Figure 1.2).

Figure 1.2: Two Call Legs Represented by Two Channels

When Asterisk channels are connected like this, it is referred to as a channel bridge. Channel bridging is the act of connecting channels together for the purpose of passing media between them. The media stream is most commonly an audio stream. However, there may also be a video or a text stream in the call. Even in the case where there is more than one media stream (such as both audio and video), it is still handled by a single channel for each end of the call in Asterisk. In Figure 1.2, where there are two channels for phones A and B, the bridge is responsible for passing the media coming from phone A to phone B, and similarly, for passing the media coming from phone B to phone A. All media streams are negotiated through Asterisk. Anything that Asterisk does not understand and have full control over is not allowed. This means that Asterisk can do recording, audio manipulation, and translation between different technologies.

When two channels are bridged together, there are two methods that may be used to accomplish this: generic bridging and native bridging. A generic bridge is one that works regardless of what channel technologies are in use. It passes all audio and signalling through the Asterisk abstract channel interfaces. While this is the most flexible bridging method, it is also the least efficient due to the levels of abstraction necessary to accomplish the task. Figure 1.2 illustrates a generic bridge.

A native bridge is a technology specific method of connecting channels together. If two channels are connected to Asterisk using the same media transport technology, there may be a way to connect them that is more efficient than going through the abstraction layers in Asterisk that exist for connecting different technologies together. For example, if specialized hardware is being used for connecting to the telephone network, it may be possible to bridge the channels on the hardware so that the media does not have to flow up through the application at all. In the case of some VoIP protocols, it is possible to have endpoints send their media streams to each other directly, such that only the call signalling information continues to flow through the server.

The decision between generic bridging and native bridging is done by comparing the two channels when it is time to bridge them. If both channels indicate that they support the same native bridging method, then that will be used. Otherwise, the generic bridging method will be used. To determine whether or not two channels support the same native bridging method, a simple C function pointer comparison is used. It's certainly not the most elegant method, but we have not yet hit any cases where this was not sufficient for our needs. Providing a native bridge function for a channel is discussed in more detail in Section 1.2. Figure 1.3 illustrates an example of a native bridge.

Figure 1.3: Example of a Native Bridge

1.1.3. Frames

Communication within the Asterisk code during a call is done by using frames, which are instances of the ast_frame data structure. Frames can either be media frames or signalling frames. During a basic phone call, a stream of media frames containing audio would be passing through the system. Signalling frames are used to send messages about call signalling events, such as a digit being pressed, a call being put on hold, or a call being hung up.

The list of available frame types is statically defined. Frames are marked with a numerically encoded type and subtype. A full list can be found in the source code in include/asterisk/frame.h; some examples are:

VOICE: These frames carry a portion of an audio stream.
VIDEO: These frames carry a portion of a video stream.
MODEM: The encoding used for the data in this frame, such as T.38 for sending a FAX over IP. The primary usage of this frame type is for handling a FAX. It is important that frames of data be left completely undisturbed so that the signal can be successfully decoded at the other end. This is different than AUDIO frames, because in that case, it is acceptable to transcode into other audio codecs to save bandwidth at the cost of audio quality.
CONTROL: The call signalling message that this frame indicates. These frames are used to indicate call signalling events. These events include a phone being answered, hung up, put on hold, etc.
DTMF_BEGIN: Which digit just started. This frame is sent when a caller presses a DTMF key² on their phone.
DTMF_END: Which digit just ended. This frame is sent when a caller stops pressing a DTMF key on their phone.

1.2. Asterisk Component Abstractions

Asterisk is a highly modularized application. There is a core application that is built from the source in the main/ directory of the source tree. However, it is not very useful by itself. The core application acts primarily as a module registry. It also has code that knows how to connect all of the abstract interfaces together to make phone calls work. The concrete implementations of these interfaces are registered by loadable modules at runtime.

By default, all modules found in a predefined Asterisk modules directory on the filesystem will be loaded when the main application is started. This approach was chosen for its simplicity. However, there is a configuration file that can be updated to specify exactly which modules to load and in what order to load them. This makes the configuration a bit more complex, but provides the ability to specify that modules that are not needed should not be loaded. The primary benefit is reducing the memory footprint of the application. However, there are some security benefits, as well. It is best not to load a module that accepts connections over a network if it is not actually needed.

When the module loads, it registers all of its implementations of component abstractions with the Asterisk core application. There are many types of interfaces that modules can implement and register with the Asterisk core. A module is allowed to register as many of these different interfaces as it would like. Generally, related functionality is grouped into a single module.

1.2.1. Channel Drivers

The Asterisk channel driver interface is the most complex and most important interface available. The Asterisk channel API provides the telephony protocol abstraction which allows all other Asterisk features to work independently of the telephony protocol in use. This component is responsible for translating between the Asterisk channel abstraction and the details of the telephony technology that it implements.

The definition of the Asterisk channel driver interface is called the ast_channel_tech interface. It defines a set of methods that must be implemented by a channel driver. The first method that a channel driver must implement is an ast_channel factory method, which is the requester method in ast_channel_tech. When an Asterisk channel is created, either for an incoming or outgoing phone call, the implementation of ast_channel_tech associated with the type of channel needed is responsible for instantiation and initialization of the ast_channel for that call.

Once an ast_channel has been created, it has a reference to the ast_channel_tech that created it. There are many other operations that must be handled in a technology-specific way. When those operations must be performed on an ast_channel, the handling of the operation is deferred to the appropriate method from ast_channel_tech. Figure 1.2 shows two channels in Asterisk. Figure 1.4 expands on this to show two bridged channels and how the channel technology implementations fit into the picture.

Figure 1.4: Channel Technology and Abstract Channel Layers

The most important methods in ast_channel_tech are:

requester: This callback is used to request a channel driver to instantiate an ast_channel object and initialize it as appropriate for this channel type.
call: This callback is used to initiate an outbound call to the endpoint represented by an ast_channel.
answer: This is called when Asterisk decides that it should answer the inbound call associated with this ast_channel.
hangup: This is called when the system has determined that the call should be hung up. The channel driver will then communicate to the endpoint that the call is over in a protocol specific manner.
indicate: Once a call is up, there are a number of other events that may occur that need to be signalled to an endpoint. For example, if the device is put on hold, this callback is called to indicate that condition. There may be a protocol specific method of indicating that the call has been on hold, or the channel driver may simply initiate the playback of music on hold to the device.
send_digit_begin: This function is called to indicate the beginning of a digit (DTMF) being sent to this device.
send_digit_end: This function is called to indicate the end of a digit (DTMF) being sent to this device.
read: This function is called by the Asterisk core to read back an ast_frame from this endpoint. An ast_frame is an abstraction in Asterisk that is used to encapsulate media (such as audio or video), as well as to signal events.
write: This function is used to send an ast_frame to this device. The channel driver will take the data and packetize it as appropriate for the telephony protocol that it implements and pass it along to the endpoint.
bridge: This is the native bridge callback for this channel type. As discussed before, native bridging is when a channel driver is able to implement a more efficient bridging method for two channels of the same type instead of having all signalling and media flow through additional unnecessary abstraction layers. This is incredibly important for performance reasons.

Once a call is over, the abstract channel handling code that lives in the Asterisk core will invoke the ast_channel_tech hangup callback and then destroy the ast_channel object.

1.2.2. Dialplan Applications

Asterisk administrators set up call routing using the Asterisk dialplan, which resides in the /etc/asterisk/extensions.conf file. The dialplan is made up of a series of call rules called extensions. When a phone call comes in to the system, the dialed number is used to find the extension in the dialplan that should be used for processing the call. The extension includes a list of dialplan applications which will be executed on the channel. The applications available for execution in the dialplan are maintained in an application registry. This registry is populated at runtime as modules are loaded.

Asterisk has nearly two hundred included applications. The definition of an application is very loose. Applications can use any of the Asterisk internal APIs to interact with the channel. Some applications do a single task, such as Playback, which plays back a sound file to the caller. Other applications are much more involved and perform a large number of operations, such as the Voicemail application.

Using the Asterisk dialplan, multiple applications can be used together to customize call handling. If more extensive customization is needed beyond what is possible in the provided dialplan language, there are scripting interfaces available that allow call handling to be customized using any programming language. Even when using these scripting interfaces with another programming language, dialplan applications are still invoked to interact with the channel.

Before we get into an example, let's have a look at the syntax of an Asterisk dialplan that handles calls to the number 1234. Note that the choice of 1234 here is arbitrary. It invokes three dialplan applications. First, it answers the call. Next, it plays back a sound file. Finally, it hangs up the call.

; Define the rules for what happens when someone dials 1234.
;
exten => 1234,1,Answer()
    same => n,Playback(demo-congrats)
    same => n,Hangup()

The exten keyword is used to define the extension. On the right side of the exten line, the 1234 means that we are defining the rules for when someone calls 1234. The next 1 means this is the first step that is taken when that number is dialed. Finally, Answer instructs the system to answer the call. The next two lines that begin with the same keyword are rules for the last extension that was specified, which in this case is 1234. The n is short for saying that this is the next step to take. The last item on those lines specifies what action to take.

Here is another example of using the Asterisk dialplan. In this case, an incoming call is answered. The caller is played a beep, and then up to 4 digits are read from the caller and stored into the DIGITS variable. Then, the digits are read back to the caller. Finally, the call is ended.

exten => 5678,1,Answer()
    same => n,Read(DIGITS,beep,4)
    same => n,SayDigits(${DIGITS})
    same => n,Hangup()

As previously mentioned, the definition of an application is very loose—the function prototype registered is very simple:

int (*execute)(struct ast_channel *chan, const char *args);

However, the application implementations use virtually all of the APIs found in include/asterisk/.

1.2.3. Dialplan Functions

Most dialplan applications take a string of arguments. While some values may be hard coded, variables are used in places where behavior needs to be more dynamic. The following example shows a dialplan snippet that sets a variable and then prints out its value to the Asterisk command line interface using the Verbose application.

exten => 1234,1,Set(MY_VARIABLE=foo)
    same => n,Verbose(MY_VARIABLE is ${MY_VARIABLE})

Dialplan functions are invoked by using the same syntax as the previous example. Asterisk modules are able to register dialplan functions that can retrieve some information and return it to the dialplan. Alternatively, these dialplan functions can receive data from the dialplan and act on it. As a general rule, while dialplan functions may set or retrieve channel meta data, they do not do any signalling or media processing. That is left as the job of dialplan applications.

The following example demonstrates usage of a dialplan function. First, it prints out the CallerID of the current channel to the Asterisk command line interface. Then, it changes the CallerID by using the Set application. In this example, Verbose and Set are applications, and CALLERID is a function.

exten => 1234,1,Verbose(The current CallerID is ${CALLERID(num)})
    same => n,Set(CALLERID(num)=<256>555-1212)

A dialplan function is needed here instead of just a simple variable since the CallerID information is stored in data structures on the instance of ast_channel. The dialplan function code knows how to set and retrieve the values from these data structures.

Another example of using a dialplan function is for adding custom information into the call logs, which are referred to as CDRs (Call Detail Records). The CDR function allows the retrieval of call detail record information, as well as adding custom information.

exten => 555,1,Verbose(Time this call started: ${CDR(start)})
    same => n,Set(CDR(mycustomfield)=snickerdoodle)

1.2.4. Codec Translators

In the world of VOIP, many different codecs are used for encoding media to be sent across networks. The variety of choices offers tradeoffs in media quality, CPU consumption, and bandwidth requirements. Asterisk supports many different codecs and knows how to translate between them when necessary.

When a call is set up, Asterisk will attempt to get two endpoints to use a common media codec so that transcoding is not required. However, that is not always possible. Even if a common codec is being used, transcoding may still be required. For example, if Asterisk is configured to do some signal processing on the audio as it passes through the system (such as to increase or decrease the volume level), Asterisk will need to transcode the audio back to an uncompressed form before it can perform the signal processing. Asterisk can also be configured to do call recording. If the configured format for the recording is different than that of the call, transcoding will be required.

Codec Negotiation

The method used to negotiate which codec will be used for a media stream is specific to the technology used to connect the call to Asterisk. In some cases, such as a call on the traditional telephone network (the PSTN), there may not be any negotiation to do. However, in other cases, especially using IP protocols, there is a negotiation mechanism used where capabilities and preferences are expressed and a common codec is agreed upon.

For example, in the case of SIP (the most commonly used VOIP protocol) this is a high level view of how codec negotiation is performed when a call is sent to Asterisk.

An endpoint sends a new call request to Asterisk which includes the list of codecs it is willing to use.
Asterisk consults its configuration provided by the administrator which includes a list of allowed codecs in preferred order. Asterisk will respond by choosing the most preferred codec (based on its own configured preferences) that is listed as allowed in the Asterisk configuration and was also listed as supported in the incoming request.

One area that Asterisk does not handle very well is that of more complex codecs, especially video. Codec negotiation demands have gotten more complicated over the last ten years. We have more work to do to be able to better deal with the newest audio codecs and to be able to support video much better than we do today. This is one of the top priorities for new development for the next major release of Asterisk.

Codec translator modules provide one or more implementations of the ast_translator interface. A translator has source and destination format attributes. It also provides a callback that will be used to convert a chunk of media from the source to the destination format. It knows nothing about the concept of a phone call. It only knows how to convert media from one format to another.

For more detailed information about the translator API, see include/asterisk/translate.h and main/translate.c. Implementations of the translator abstraction can be found in the codecs directory.

1.3. Threads

Asterisk is a very heavily multithreaded application. It uses the POSIX threads API to manage threads and related services such as locking. All of the Asterisk code that interacts with threads does so by going through a set of wrappers used for debugging purposes. Most threads in Asterisk can be classified as either a Network Monitor Thread, or a Channel Thread (sometimes also referred to as a PBX thread, because its primary purpose is to run the PBX for a channel).

1.3.1. Network Monitor Threads

Network monitor threads exist in every major channel driver in Asterisk. They are responsible for monitoring whatever network they are connected to (whether that is an IP network, the PSTN, etc.) and monitor for incoming calls or other types of incoming requests. They handle the initial connection setup steps such as authentication and dialed number validation. Once the call setup has been completed, the monitor threads will create an instance of an Asterisk channel (ast_channel), and start a channel thread to handle the call for the rest of its lifetime.

1.3.2. Channel Threads

As discussed earlier, a channel is a fundamental concept in Asterisk. Channels are either inbound or outbound. An inbound channel is created when a call comes in to the Asterisk system. These channels are the ones that execute the Asterisk dialplan. A thread is created for every inbound channel that executes the dialplan. These threads are referred to as channel threads.

Dialplan applications always execute in the context of a channel thread. Dialplan functions almost always do, as well. It is possible to read and write dialplan functions from an asynchronous interface such as the Asterisk CLI. However, it is still always the channel thread that is the owner of the ast_channel data structure and controls the object lifetime.

1.4. Call Scenarios

The previous two sections introduced important interfaces for Asterisk components, as well as the thread execution model. In this section, some common call scenarios are broken down to demonstrate how Asterisk components operate together to process phone calls.

1.4.1. Checking Voicemail

One example call scenario is when someone calls into the phone system to check their Voicemail. The first major component involved in this scenario is the channel driver. The channel driver will be responsible for handling the incoming call request from the phone, which will occur in the channel driver's monitor thread. Depending on the telephony technology being used to deliver the call to the system, there may be some sort of negotiation required to set up the call. Another step of setting up the call is determining the intended destination for the call. This is usually specified by the number that was dialed by the caller. However, in some cases there is no specific number available since the technology used to deliver the call does not support specifying the dialed number. An example of this would be an incoming call on an analog phone line.

If the channel driver verifies that the Asterisk configuration has extensions defined in the dialplan (the call routing configuration) for the dialed number, it will then allocate an Asterisk channel object (ast_channel) and create a channel thread. The channel thread has the primary responsibility for handling the rest of the call (Figure 1.5).

Figure 1.5: Call Setup Sequence Diagram

The main loop of the channel thread handles dialplan execution. It goes to the rules defined for the dialed extension and executes the steps that have been defined. The following is an example extension expressed in the extensions.conf dialplan syntax. This extension answers the call and executes the VoicemailMain application when someone dials *123. This application is what a user would call to be able to check messages left in their mailbox.

exten => *123,1,Answer()
    same => n,VoicemailMain()

When the channel thread executes the Answer application, Asterisk will answer the incoming call. Answering a call requires technology specific processing, so in addition to some generic answer handling, the answer callback in the associated ast_channel_tech structure is called to handle answering the call. This may involve sending a special packet over an IP network, taking an analog line off hook, etc.

The next step is for the channel thread to execute VoicemailMain (Figure 1.6). This application is provided by the app_voicemail module. One important thing to note is that while the Voicemail code handles a lot of call interaction, it knows nothing about the technology that is being used to deliver the call into the Asterisk system. The Asterisk channel abstraction hides these details from the Voicemail implementation.

There are many features involved in providing a caller access to their Voicemail. However, all of them are primarily implemented as reading and writing sound files in response to input from the caller, primarily in the form of digit presses. DTMF digits can be delivered to Asterisk in many different ways. Again, these details are handled by the channel drivers. Once a key press has arrived in Asterisk, it is converted into a generic key press event and passed along to the Voicemail code.

One of the important interfaces in Asterisk that has been discussed is that of a codec translator. These codec implementations are very important to this call scenario. When the Voicemail code would like to play back a sound file to the caller, the format of the audio in the sound file may not be the same format as the audio being used in the communication between the Asterisk system and the caller. If it must transcode the audio, it will build a translation path of one or more codec translators to get from the source to the destination format.

Figure 1.6: A Call to VoicemailMain

At some point, the caller will be done interacting with the Voicemail system and hang up. The channel driver will detect that this has occurred and convert this into a generic Asterisk channel signalling event. The Voicemail code will receive this signalling event and will exit, since there is nothing left to do once the caller hangs up. Control will return back to the main loop in the channel thread to continue dialplan execution. Since in this example there is no further dialplan processing to be done, the channel driver will be given an opportunity to handle technology specific hangup processing and then the ast_channel object will be destroyed.

1.4.2. Bridged Call

Another very common call scenario in Asterisk is a bridged call between two channels. This is the scenario when one phone calls another through the system. The initial call setup process is identical to the previous example. The difference in handling begins when the call has been set up and the channel thread begins executing the dialplan.

The following dialplan is a simple example that results in a bridged call. Using this extension, when a phone dials 1234, the dialplan will execute the Dial application, which is the main application used to initiate an outbound call.

exten => 1234,1,Dial(SIP/bob)

The argument specified to the Dial application says that the system should make an outbound call to the device referred to as SIP/bob. The SIP portion of this argument specifies that the SIP protocol should be used to deliver the call. bob will be interpreted by the channel driver that implements the SIP protocol, chan_sip. Assuming the channel driver has been properly configured with an account called bob, it will know how to reach Bob's phone.

The Dial application will ask the Asterisk core to allocate a new Asterisk channel using the SIP/bob identifier. The core will request that the SIP channel driver perform technology specific initialization. The channel driver will also initiate the process of making a call out to the phone. As the request proceeds, it will pass events back into the Asterisk core, which will be received by the Dial application. These events may include a response that the call has been answered, the destination is busy, the network is congested, the call was rejected for some reason, or a number of other possible responses. In the ideal case, the call will be answered. The fact that the call has been answered is propagated back to the inbound channel. Asterisk will not answer the part of the call that came into the system until the outbound call was answered. Once both channels are answered, the bridging of the channels begins (Figure 1.7).

Figure 1.7: Block Diagram of a Bridged Call in a Generic Bridge

During a channel bridge, audio and signalling events from one channel are passed to the other until some event occurs that causes the bridge to end, such as one side of the call hanging up. The sequence diagram in Figure 1.8 demonstrates the key operations that are performed for an audio frame during a bridged call.

Figure 1.8: Sequence Diagram for Audio Frame Processing During a Bridge

Once the call is done, the hangup process is very similar to the previous example. The major difference here is that there are two channels involved. The channel technology specific hangup processing will be executed for both channels before the channel thread stops running.

Footnotes

http://www.asterisk.org/
DTMF stands for Dual-Tone Multi-Frequency. This is the tone that is sent in the audio of a phone call when someone presses a key on their telephone.

The Architecture of Open Source Applications (Volume 1)
Asterisk