Asterisk [1] is an open source telephony applications platform distributed under the GPLv2. In short, it is a server application for making, receiving, and performing custom processing of phone calls.
The project was started by Mark Spencer in 1999. Mark had a company called Linux Support Services and he needed a phone system to help operate his business. He did not have a lot of money to spend on buying one, so he just made his own. As the popularity of Asterisk grew, Linux Support Services shifted focus to Asterisk and changed its name to Digium, Inc.
The name Asterisk comes from the Unix wildcard character, *. The
goal for the Asterisk project is to do everything
telephony. Through pursuing this goal, Asterisk now supports a long
list of technologies for making and receiving phone calls. This
includes many VoIP (Voice over IP) protocols, as well as both analog
and digital connectivity to the traditional telephone network, or the
PSTN (Public Switched Telephone Network). This ability to get many
different types of phone calls into and out of the system is one of
Asterisk's main strengths.
Once phone calls are made to and from an Asterisk system, there are many additional features that can be used to customize the processing of the phone call. Some features are larger pre-built common applications, such as voicemail. There are other smaller features that can be combined together to create custom voice applications, such as playing back a sound file, reading digits, or speech recognition.
This section discusses some architectural concepts that are critical to all parts of Asterisk. These ideas are at the foundation of the Asterisk architecture.
A channel in Asterisk represents a connection between the Asterisk
system and some telephony endpoint (Figure 1.1). The most common
example is when a phone makes a call into an Asterisk system. This
connection is represented by a single channel. In the Asterisk code, a
channel exists as an instance of the ast_channel data structure. This
call scenario could be a caller interacting with voicemail, for
example.
Figure 1.1: A Single Call Leg, Represented by a Single Channel
Perhaps a more familiar call scenario would be a connection between two phones, where a person using phone A has called a person on phone B. In this call scenario, there are two telephony endpoints connected to the Asterisk system, so two channels exist for this call (Figure 1.2).
Figure 1.2: Two Call Legs Represented by Two Channels
When Asterisk channels are connected like this, it is referred to as a channel bridge. Channel bridging is the act of connecting channels together for the purpose of passing media between them. The media stream is most commonly an audio stream. However, there may also be a video or a text stream in the call. Even in the case where there is more than one media stream (such as both audio and video), it is still handled by a single channel for each end of the call in Asterisk. In Figure 1.2, where there are two channels for phones A and B, the bridge is responsible for passing the media coming from phone A to phone B, and similarly, for passing the media coming from phone B to phone A. All media streams are negotiated through Asterisk. Anything that Asterisk does not understand and have full control over is not allowed. This means that Asterisk can do recording, audio manipulation, and translation between different technologies.
When two channels are bridged together, there are two methods that may be used to accomplish this: generic bridging and native bridging. A generic bridge is one that works regardless of what channel technologies are in use. It passes all audio and signalling through the Asterisk abstract channel interfaces. While this is the most flexible bridging method, it is also the least efficient due to the levels of abstraction necessary to accomplish the task. Figure 1.2 illustrates a generic bridge.
A native bridge is a technology specific method of connecting channels together. If two channels are connected to Asterisk using the same media transport technology, there may be a way to connect them that is more efficient than going through the abstraction layers in Asterisk that exist for connecting different technologies together. For example, if specialized hardware is being used for connecting to the telephone network, it may be possible to bridge the channels on the hardware so that the media does not have to flow up through the application at all. In the case of some VoIP protocols, it is possible to have endpoints send their media streams to each other directly, such that only the call signalling information continues to flow through the server.
The decision between generic bridging and native bridging is done by comparing the two channels when it is time to bridge them. If both channels indicate that they support the same native bridging method, then that will be used. Otherwise, the generic bridging method will be used. To determine whether or not two channels support the same native bridging method, a simple C function pointer comparison is used. It's certainly not the most elegant method, but we have not yet hit any cases where this was not sufficient for our needs. Providing a native bridge function for a channel is discussed in more detail in Section 1.2. Figure 1.3 illustrates an example of a native bridge.
Figure 1.3: Example of a Native Bridge
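The function pointer comparison described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the Asterisk source: struct channel_tech, struct channel, choose_bridge, and the two placeholder bridge functions are hypothetical names invented here, and the real ast_channel_tech carries many more fields.

```c
#include <stddef.h>

struct channel;  /* simplified stand-in for ast_channel */

/* Hypothetical native bridge callback type. */
typedef int (*bridge_fn)(struct channel *a, struct channel *b);

/* Hypothetical, trimmed-down stand-in for a channel technology. */
struct channel_tech {
    const char *name;
    bridge_fn bridge;   /* NULL when the technology has no native bridge */
};

struct channel {
    const struct channel_tech *tech;
};

enum bridge_kind { BRIDGE_GENERIC, BRIDGE_NATIVE };

/* Two placeholder native bridge implementations (bodies do nothing). */
static int rtp_native_bridge(struct channel *a, struct channel *b)
{
    (void)a; (void)b;
    return 0;
}

static int hw_native_bridge(struct channel *a, struct channel *b)
{
    (void)a; (void)b;
    return 0;
}

/* Native bridging is chosen only when both channels expose the same
 * non-NULL bridge function; everything else falls back to generic. */
static enum bridge_kind choose_bridge(const struct channel *a,
                                      const struct channel *b)
{
    if (a->tech->bridge != NULL && a->tech->bridge == b->tech->bridge)
        return BRIDGE_NATIVE;
    return BRIDGE_GENERIC;
}
```

Two channels whose technologies both point at rtp_native_bridge would be bridged natively in this sketch, while a pairing of rtp_native_bridge with hw_native_bridge, or any pairing involving a NULL callback, would use the generic path.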
Communication within the Asterisk code during a call is done by using
frames, which are instances of the ast_frame data structure. Frames
can either be media frames or signalling frames.
During a basic phone call, a stream of media frames containing audio
would be passing through the system. Signalling frames are used to send
messages about call signalling events, such as a digit being pressed, a
call being put on hold, or a call being hung up.
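As a rough picture of the idea, a frame can be modelled as a tagged record: a type, a subtype, and an optional payload. The sketch below is an assumption-laden simplification, not the real ast_frame. The enum values, the struct frame layout, and frame_is_media are hypothetical names, and treating MODEM frames as media is a choice made for this example only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame types; the authoritative list lives in
 * include/asterisk/frame.h. */
enum frame_type {
    FRAME_VOICE,
    FRAME_VIDEO,
    FRAME_MODEM,
    FRAME_CONTROL,
    FRAME_DTMF_BEGIN,
    FRAME_DTMF_END
};

/* Simplified stand-in for ast_frame: a type, a subtype, and a payload. */
struct frame {
    enum frame_type type;
    int subtype;          /* e.g. the digit for DTMF, the event for CONTROL */
    const void *data;     /* media payload; NULL for pure signalling */
    size_t datalen;
};

/* Media frames carry a piece of a stream; the rest report call events.
 * Counting MODEM as media is an assumption of this sketch. */
static bool frame_is_media(const struct frame *f)
{
    switch (f->type) {
    case FRAME_VOICE:
    case FRAME_VIDEO:
    case FRAME_MODEM:
        return true;
    default:
        return false;
    }
}
```

Code that only cares about signalling can test frame_is_media first and skip the payload entirely.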
The list of available frame types is statically defined. Frames are
marked with a numerically encoded type and subtype. A full list can
be found in the source code in include/asterisk/frame.h; some
examples are:
VOICE: These frames carry a portion of an audio stream.
VIDEO: These frames carry a portion of a video stream.
MODEM: The subtype identifies the encoding used for the data in this frame, such as T.38 for sending a FAX over IP. The primary usage of this frame type is for handling a FAX. It is important that frames of data be left completely undisturbed so that the signal can be successfully decoded at the other end. This is different from VOICE frames, where it is acceptable to transcode into other audio codecs to save bandwidth at the cost of audio quality.
CONTROL: The subtype identifies the call signalling message that this frame indicates. These frames are used to indicate call signalling events. These events include a phone being answered, hung up, put on hold, etc.
DTMF_BEGIN: The subtype identifies which digit just started. This frame is sent when a caller presses a DTMF (Dual-Tone Multi-Frequency) key on their phone.
DTMF_END: The subtype identifies which digit just ended. This frame is sent when a caller stops pressing a DTMF key on their phone.
Asterisk is a highly modularized application. There is a core application that is built from the source in the main/ directory of the source tree. However, it is not very useful by itself. The core application acts primarily as a module registry. It also has code that knows how to connect all of the abstract interfaces together to make phone calls work. The concrete implementations of these interfaces are registered by loadable modules at runtime.
By default, all modules found in a predefined Asterisk modules directory on the filesystem will be loaded when the main application is started. This approach was chosen for its simplicity. However, there is a configuration file that can be updated to specify exactly which modules to load and in what order to load them. This makes the configuration a bit more complex, but provides the ability to specify that modules that are not needed should not be loaded. The primary benefit is reducing the memory footprint of the application. However, there are some security benefits, as well. It is best not to load a module that accepts connections over a network if it is not actually needed.
When the module loads, it registers all of its implementations of component abstractions with the Asterisk core application. There are many types of interfaces that modules can implement and register with the Asterisk core. A module is allowed to register as many of these different interfaces as it would like. Generally, related functionality is grouped into a single module.
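The registration step can be pictured with a small registry that a module's load hook fills in. Everything in the sketch below is hypothetical: core_registry, core_register, struct module, and example_module_load are names made up for illustration, and the real Asterisk core keeps separate, more elaborate registries per interface type.

```c
#include <stddef.h>
#include <string.h>

#define MAX_INTERFACES 16

/* Kinds of interface a module might provide (illustrative subset). */
enum interface_kind {
    IFACE_CHANNEL_DRIVER,
    IFACE_APPLICATION,
    IFACE_TRANSLATOR
};

struct interface_entry {
    enum interface_kind kind;
    const char *name;
};

/* Hypothetical core registry: a fixed table filled as modules load. */
struct core_registry {
    struct interface_entry entries[MAX_INTERFACES];
    size_t count;
};

/* Returns 0 on success, -1 if the table is full or the name is taken. */
static int core_register(struct core_registry *core,
                         enum interface_kind kind, const char *name)
{
    if (core->count == MAX_INTERFACES)
        return -1;
    for (size_t i = 0; i < core->count; i++) {
        if (core->entries[i].kind == kind &&
            strcmp(core->entries[i].name, name) == 0)
            return -1;
    }
    core->entries[core->count].kind = kind;
    core->entries[core->count].name = name;
    core->count++;
    return 0;
}

/* A module is a name plus a load hook that registers what it offers. */
struct module {
    const char *name;
    int (*load)(struct core_registry *core);
};

/* Example module that registers two related interfaces in one go. */
static int example_module_load(struct core_registry *core)
{
    if (core_register(core, IFACE_CHANNEL_DRIVER, "EXAMPLE") != 0)
        return -1;
    return core_register(core, IFACE_APPLICATION, "ExampleApp");
}
```

Loading the example module once leaves two entries in the registry; loading it a second time fails on the duplicate name, which is one simple way a core could reject conflicting registrations.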
The Asterisk channel driver interface is the most complex and most important interface available. The Asterisk channel API provides the telephony protocol abstraction which allows all other Asterisk features to work independently of the telephony protocol in use. This component is responsible for translating between the Asterisk channel abstraction and the details of the telephony technology that it implements.
The definition of the Asterisk channel driver interface is called the ast_channel_tech interface. It defines a set of methods that must be implemented by a channel driver. The first method that a channel driver must implement is an ast_channel factory method, which is the requester method in ast_channel_tech. When an Asterisk channel is created, either for an incoming or outgoing phone call, the implementation of ast_channel_tech associated with the type of channel needed is responsible for instantiation and initialization of the ast_channel for that call.
Once an ast_channel has been created, it has a reference to the ast_channel_tech that created it. There are many other operations that must be handled in a technology-specific way. When those operations must be performed on an ast_channel, the handling of the operation is deferred to the appropriate method from ast_channel_tech. Figure 1.2 shows two channels in Asterisk. Figure 1.4 expands on this to show two bridged channels and how the channel technology implementations fit into the picture.
Figure 1.4: Channel Technology and Abstract Channel Layers
The most important methods in ast_channel_tech are:
requester: This callback is used to request a channel driver to instantiate an ast_channel object and initialize it as appropriate for this channel type.
call: This callback is used to initiate an outbound call to the endpoint represented by an ast_channel.
answer: This is called when Asterisk decides that it should answer the inbound call associated with this ast_channel.
hangup: This is called when the system has determined that the call should be hung up. The channel driver will then communicate to the endpoint that the call is over in a protocol-specific manner.
indicate: Once a call is up, there are a number of other events that may occur that need to be signalled to an endpoint. For example, if the device is put on hold, this callback is called to indicate that condition. There may be a protocol-specific method of indicating that the call has been put on hold, or the channel driver may simply initiate the playback of music on hold to the device.
send_digit_begin: This function is called to indicate the beginning of a digit (DTMF) being sent to this device.
send_digit_end: This function is called to indicate the end of a digit (DTMF) being sent to this device.
read: This function is called by the Asterisk core to read back an ast_frame from this endpoint. An ast_frame is an abstraction in Asterisk that is used to encapsulate media (such as audio or video), as well as to signal events.
write: This function is used to send an ast_frame to this device. The channel driver will take the data and packetize it as appropriate for the telephony protocol that it implements and pass it along to the endpoint.
bridge: This is the native bridge callback for this channel type. As discussed before, native bridging is when a channel driver is able to implement a more efficient bridging method for two channels of the same type instead of having all signalling and media flow through additional unnecessary abstraction layers. This is incredibly important for performance reasons.
Once a call is over, the abstract channel handling code that lives in the Asterisk core will invoke the ast_channel_tech hangup callback and then destroy the ast_channel object.
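The relationship between the core and a channel driver can be sketched as a table of function pointers plus a few core-side helpers that defer to it. The code below is a heavily reduced illustration and not the real interface: struct channel_tech has only three of the callbacks, the signatures are invented, and core_request, core_answer, core_hangup, and the toy_ driver are all hypothetical. Having the core set the back-reference to the technology is also an assumption of this sketch.

```c
#include <stdlib.h>

struct channel;

/* Hypothetical, much-reduced version of the channel driver interface. */
struct channel_tech {
    const char *type;
    struct channel *(*requester)(const char *dest);  /* factory method */
    int (*answer)(struct channel *chan);
    int (*hangup)(struct channel *chan);
};

/* Simplified stand-in for ast_channel. */
struct channel {
    const struct channel_tech *tech;  /* the driver that created us */
    int answered;
};

/* --- a toy driver --- */
static int toy_hangup_calls;  /* lets a caller observe the callback */

static struct channel *toy_requester(const char *dest)
{
    (void)dest;
    return calloc(1, sizeof(struct channel));
}

static int toy_answer(struct channel *chan)
{
    chan->answered = 1;
    return 0;
}

static int toy_hangup(struct channel *chan)
{
    (void)chan;
    toy_hangup_calls++;
    return 0;
}

static const struct channel_tech toy_tech = {
    "TOY", toy_requester, toy_answer, toy_hangup
};

/* --- core-side helpers: defer the technology-specific work --- */
static struct channel *core_request(const struct channel_tech *tech,
                                    const char *dest)
{
    struct channel *chan = tech->requester(dest);
    if (chan != NULL)
        chan->tech = tech;
    return chan;
}

static int core_answer(struct channel *chan)
{
    return chan->tech->answer(chan);
}

/* Invoke the driver's hangup callback, then destroy the channel. */
static int core_hangup(struct channel *chan)
{
    int res = chan->tech->hangup(chan);
    free(chan);
    return res;
}
```

Code written against core_request, core_answer, and core_hangup never needs to know which driver sits behind the channel, which is the property the abstraction is meant to provide.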
Asterisk administrators set up call routing using the Asterisk
dialplan, which resides in the /etc/asterisk/extensions.conf file.
The dialplan is made up of a series of call rules called
extensions. When a phone call comes in to the system, the dialed
number is used to find the extension in the dialplan that should be
used for processing the call. The extension includes a list of
dialplan applications which will be executed on the channel. The
applications available for execution in the dialplan are maintained in
an application registry. This registry is populated at runtime as
modules are loaded.
Asterisk has nearly two hundred included applications. The definition
of an application is very loose. Applications can use any of the
Asterisk internal APIs to interact with the channel. Some applications
do a single task, such as Playback, which plays back a sound file to
the caller. Other applications are much more involved and perform a
large number of operations, such as the Voicemail application.
Using the Asterisk dialplan, multiple applications can be used together to customize call handling. If more extensive customization is needed beyond what is possible in the provided dialplan language, there are scripting interfaces available that allow call handling to be customized using any programming language. Even when using these scripting interfaces with another programming language, dialplan applications are still invoked to interact with the channel.
Before going further, let's have a look at the syntax of an
Asterisk dialplan that handles calls to the number 1234. Note that the
choice of 1234 here is arbitrary. It invokes three
dialplan applications. First, it answers the call. Next, it plays back
a sound file. Finally, it hangs up the call.
; Define the rules for what happens when someone dials 1234.
;
exten => 1234,1,Answer()
    same => n,Playback(demo-congrats)
    same => n,Hangup()
The exten keyword is used to define the extension. On the right side of the exten line, the 1234 means that we are defining the rules for when someone calls 1234. The next 1 means this is the first step that is taken when that number is dialed. Finally, Answer instructs the system to answer the call. The next two lines, which begin with the keyword "same", are rules for the last extension that was specified, which in this case is 1234. The n is short for saying that this is the next step to take. The last item on those lines specifies what action to take.
Here is another example of using the Asterisk dialplan. In this case,
an incoming call is answered. The caller is played a beep, and then up
to 4 digits are read from the caller and stored into the DIGITS
variable. Then, the digits are read back to the caller. Finally, the
call is ended.
exten => 5678,1,Answer()
    same => n,Read(DIGITS,beep,4)
    same => n,SayDigits(${DIGITS})
    same => n,Hangup()
As previously mentioned, the definition of an application is very loose—the function prototype registered is very simple:
int (*execute)(struct ast_channel *chan, const char *args);
However, the application implementations use virtually all of the
APIs found in include/asterisk/.
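To make the prototype concrete, the sketch below implements two toy applications with that signature and dispatches to them by name. It is an illustration under assumptions: the struct ast_channel defined here is a tiny stand-in and not the real definition, and app_registry, run_app, toy_answer_exec, and toy_playback_exec are hypothetical names. Real applications would call into the Asterisk APIs rather than set fields directly.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for ast_channel with only what this sketch needs;
 * NOT the real structure. */
struct ast_channel {
    int answered;
    char last_played[64];
};

/* Same shape as the execute prototype shown above. */
typedef int (*app_execute_fn)(struct ast_channel *chan, const char *args);

struct app_entry {
    const char *name;
    app_execute_fn execute;
};

static int toy_answer_exec(struct ast_channel *chan, const char *args)
{
    (void)args;
    chan->answered = 1;
    return 0;
}

static int toy_playback_exec(struct ast_channel *chan, const char *args)
{
    if (args == NULL || *args == '\0')
        return -1;  /* a sound file name is required */
    strncpy(chan->last_played, args, sizeof(chan->last_played) - 1);
    chan->last_played[sizeof(chan->last_played) - 1] = '\0';
    return 0;
}

/* Hypothetical registry contents; the real registry is populated at
 * runtime as modules are loaded. */
static const struct app_entry app_registry[] = {
    { "Answer", toy_answer_exec },
    { "Playback", toy_playback_exec },
};

/* Look an application up by name and run it on the channel. */
static int run_app(struct ast_channel *chan, const char *name,
                   const char *args)
{
    size_t n = sizeof(app_registry) / sizeof(app_registry[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(app_registry[i].name, name) == 0)
            return app_registry[i].execute(chan, args);
    }
    return -1;  /* no such application */
}
```

A dialplan step such as Playback(demo-congrats) would correspond to run_app(chan, "Playback", "demo-congrats") in this model: the application name selects the function and the text in parentheses arrives as the args string.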
Most dialplan applications take a string of arguments. While some
values may be hard coded, variables are used in places where behavior
needs to be more dynamic. The following example shows a dialplan
snippet that sets a variable and then prints out its value to the
Asterisk command line interface using the Verbose application.
exten => 1234,1,Set(MY_VARIABLE=foo)
    same => n,Verbose(MY_VARIABLE is ${MY_VARIABLE})
Dialplan functions are invoked by using the same ${...} syntax as the variable in the previous example. Asterisk modules are able to register dialplan functions that can retrieve some information and return it to the dialplan. Alternatively, these dialplan functions can receive data from the dialplan and act on it. As a general rule, while dialplan functions may set or retrieve channel metadata, they do not do any signalling or media processing. That is left as the job of dialplan applications.
The following example demonstrates usage of a dialplan function.
First, it prints out the CallerID of the current channel to the
Asterisk command line interface. Then, it changes the CallerID by
using the Set application. In this example, Verbose and Set are
applications, and CALLERID is a function.
exten => 1234,1,Verbose(The current CallerID is ${CALLERID(num)})
    same => n,Set(CALLERID(num)=<256>555-1212)
A dialplan function is needed here instead of just a simple variable
since the CallerID information is stored in data structures on the
instance of ast_channel. The dialplan function code knows how
to set and retrieve the values from these data structures.
Another example of using a dialplan function is for adding custom
information into the call logs, which are referred to as CDRs (Call
Detail Records). The CDR function allows the retrieval of call
detail record information, as well as adding custom information.
exten => 555,1,Verbose(Time this call started: ${CDR(start)})
    same => n,Set(CDR(mycustomfield)=snickerdoodle)
In the world of VoIP, many different codecs are used for encoding media to be sent across networks. The variety of choices offers tradeoffs in media quality, CPU consumption, and bandwidth requirements. Asterisk supports many different codecs and knows how to translate between them when necessary.
When a call is set up, Asterisk will attempt to get two endpoints to use a common media codec so that transcoding is not required. However, that is not always possible. Even if a common codec is being used, transcoding may still be required. For example, if Asterisk is configured to do some signal processing on the audio as it passes through the system (such as to increase or decrease the volume level), Asterisk will need to transcode the audio back to an uncompressed form before it can perform the signal processing. Asterisk can also be configured to do call recording. If the configured format for the recording is different than that of the call, transcoding will be required.
Codec Negotiation
The method used to negotiate which codec will be used for a media stream is specific to the technology used to connect the call to Asterisk. In some cases, such as a call on the traditional telephone network (the PSTN), there may not be any negotiation to do. However, in other cases, especially using IP protocols, there is a negotiation mechanism used where capabilities and preferences are expressed and a common codec is agreed upon.
For example, in the case of SIP (the most commonly used VoIP protocol), this is a high-level view of how codec negotiation is performed when a call is sent to Asterisk.
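One way to picture the outcome of such a negotiation is as an intersection of two lists, resolved by preference order. The sketch below assumes that the local preference order decides and that codecs are compared by name; both are simplifications chosen for illustration, and negotiate_codec is a hypothetical helper rather than an Asterisk function.

```c
#include <stddef.h>
#include <string.h>

/* Pick the first codec in our preference-ordered list that the remote
 * side also offered. Returns NULL when there is no codec in common. */
static const char *negotiate_codec(const char *const *preferred,
                                   size_t n_preferred,
                                   const char *const *offered,
                                   size_t n_offered)
{
    for (size_t i = 0; i < n_preferred; i++) {
        for (size_t j = 0; j < n_offered; j++) {
            if (strcmp(preferred[i], offered[j]) == 0)
                return preferred[i];
        }
    }
    return NULL;
}
```

With a local preference of g722, ulaw, alaw and a remote offer of alaw, ulaw, gsm, this function selects ulaw: it is the highest-ranked local choice that the other side also supports. If nothing overlaps, the NULL result corresponds to a call that cannot agree on a codec.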
One area that Asterisk does not handle very well is that of more complex codecs, especially video. Codec negotiation demands have gotten more complicated over the last ten years. We have more work to do to be able to better deal with the newest audio codecs and to be able to support video much better than we do today. This is one of the top priorities for new development for the next major release of Asterisk.
Codec translator modules provide one or more implementations of the
ast_translator interface. A translator has source and
destination format attributes. It also provides a callback that will
be used to convert a chunk of media from the source to the destination
format. It knows nothing about the concept of a phone call. It only
knows how to convert media from one format to another.
For more detailed information about the translator API, see
include/asterisk/translate.h and main/translate.c. Implementations of
the translator abstraction can be found in the codecs directory.
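The shape of a translator (a source format, a destination format, and a conversion callback) can be illustrated with a toy example. The code below is a sketch and does not reproduce the real ast_translator: struct translator, the two format constants, and linear16_to_linear8 are hypothetical, and the conversion itself is a deliberately trivial rescaling rather than a real codec.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy formats for illustration; not Asterisk's real format identifiers. */
enum media_format { FMT_LINEAR16, FMT_LINEAR8 };

/* Hypothetical stand-in for a translator: source, destination, callback. */
struct translator {
    enum media_format src;
    enum media_format dst;
    /* Converts at most out_cap samples; returns the number written. */
    size_t (*convert)(const void *in, size_t in_len,
                      void *out, size_t out_cap);
};

/* Toy conversion: scale each signed 16-bit sample down to 8 bits. */
static size_t linear16_to_linear8(const void *in, size_t in_len,
                                  void *out, size_t out_cap)
{
    const int16_t *src = in;
    int8_t *dst = out;
    size_t samples = in_len / sizeof(int16_t);

    if (samples > out_cap)
        samples = out_cap;
    for (size_t i = 0; i < samples; i++)
        dst[i] = (int8_t)(src[i] / 256);
    return samples;
}

static const struct translator toy_translator = {
    FMT_LINEAR16, FMT_LINEAR8, linear16_to_linear8
};
```

Note that nothing in the struct refers to a call or a channel. The callback sees only a buffer of media in one format and produces a buffer in another, which matches the description above.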
Asterisk is a very heavily multithreaded application. It uses the POSIX threads API to manage threads and related services such as locking. All of the Asterisk code that interacts with threads does so by going through a set of wrappers used for debugging purposes. Most threads in Asterisk can be classified as either a Network Monitor Thread, or a Channel Thread (sometimes also referred to as a PBX thread, because its primary purpose is to run the PBX for a channel).
Network monitor threads exist in every major channel driver in Asterisk. They are responsible for monitoring whatever network they are connected to (whether that is an IP network, the PSTN, etc.) for incoming calls or other types of incoming requests. They handle the initial connection setup steps such as authentication and dialed number validation. Once the call setup has been completed, the monitor threads will create an instance of an Asterisk channel (ast_channel), and start a channel thread to handle the call for the rest of its lifetime.
As discussed earlier, a channel is a fundamental concept in Asterisk. Channels are either inbound or outbound. An inbound channel is created when a call comes in to the Asterisk system. These channels are the ones that execute the Asterisk dialplan. A thread is created for every inbound channel that executes the dialplan. These threads are referred to as channel threads.
Dialplan applications always execute in the context of a channel
thread. Dialplan functions almost always do, as well. It is
possible to read and write dialplan functions from an asynchronous
interface such as the Asterisk CLI. However, it is still always the
channel thread that is the owner of the ast_channel data structure
and controls the object lifetime.
The previous two sections introduced important interfaces for Asterisk components, as well as the thread execution model. In this section, some common call scenarios are broken down to demonstrate how Asterisk components operate together to process phone calls.
One example call scenario is when someone calls into the phone system to check their Voicemail. The first major component involved in this scenario is the channel driver. The channel driver will be responsible for handling the incoming call request from the phone, which will occur in the channel driver's monitor thread. Depending on the telephony technology being used to deliver the call to the system, there may be some sort of negotiation required to set up the call. Another step of setting up the call is determining the intended destination for the call. This is usually specified by the number that was dialed by the caller. However, in some cases there is no specific number available since the technology used to deliver the call does not support specifying the dialed number. An example of this would be an incoming call on an analog phone line.
If the channel driver verifies that the Asterisk configuration has
extensions defined in the dialplan (the call routing configuration)
for the dialed number, it will then allocate an Asterisk channel
object (ast_channel) and create a channel thread. The channel
thread has the primary responsibility for handling the rest of the
call (Figure 1.5).
Figure 1.5: Call Setup Sequence Diagram
The main loop of the channel thread handles dialplan execution. It
goes to the rules defined for the dialed extension and executes the
steps that have been defined. The following is an example extension
expressed in the extensions.conf dialplan syntax. This extension
answers the call and executes the VoicemailMain application when
someone dials *123. This application is what a
user would call to be able to check messages left in their mailbox.
exten => *123,1,Answer()
    same => n,VoicemailMain()
When the channel thread executes the Answer application, Asterisk
will answer the incoming call. Answering a call requires
technology-specific processing, so in addition to some generic answer
handling, the answer callback in the associated ast_channel_tech
structure is called to handle answering the
call. This may involve sending a special packet over an IP network,
taking an analog line off hook, etc.
The next step is for the channel thread to execute VoicemailMain
(Figure 1.6). This application is provided by the app_voicemail
module. One important thing to
note is that while the Voicemail code handles a lot of call
interaction, it knows nothing about the technology that is being used
to deliver the call into the Asterisk system. The Asterisk channel
abstraction hides these details from the Voicemail implementation.
There are many features involved in providing a caller access to their Voicemail. However, all of them are primarily implemented as reading and writing sound files in response to input from the caller, primarily in the form of digit presses. DTMF digits can be delivered to Asterisk in many different ways. Again, these details are handled by the channel drivers. Once a key press has arrived in Asterisk, it is converted into a generic key press event and passed along to the Voicemail code.
One of the important interfaces in Asterisk that has been discussed is that of a codec translator. These codec implementations are very important to this call scenario. When the Voicemail code would like to play back a sound file to the caller, the format of the audio in the sound file may not be the same format as the audio being used in the communication between the Asterisk system and the caller. If it must transcode the audio, it will build a translation path of one or more codec translators to get from the source to the destination format.
Figure 1.6: A Call to VoicemailMain
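The idea of building a translation path out of one or more translators can be sketched as a search over a graph in which formats are nodes and translators are directed edges. The code below uses a breadth-first search that minimizes the number of steps. That criterion is an assumption made for this example and says nothing about how Asterisk actually ranks paths; the format constants, struct translator_edge, and translation_steps are hypothetical names.

```c
#include <stddef.h>

#define NUM_FORMATS 4

/* Illustrative format IDs; the set and the names are assumptions. */
enum { FMT_ULAW, FMT_SLIN, FMT_GSM, FMT_G722 };

/* One available translator: converts from src to dst. */
struct translator_edge {
    int src;
    int dst;
};

/* Breadth-first search for the shortest chain of translators from src
 * to dst. Returns the number of translation steps, or -1 if no chain
 * exists. Format IDs must be in the range 0..NUM_FORMATS-1. */
static int translation_steps(const struct translator_edge *edges,
                             size_t n_edges, int src, int dst)
{
    int dist[NUM_FORMATS];
    int queue[NUM_FORMATS];   /* each format is enqueued at most once */
    size_t head = 0, tail = 0;

    for (int i = 0; i < NUM_FORMATS; i++)
        dist[i] = -1;
    dist[src] = 0;
    queue[tail++] = src;

    while (head < tail) {
        int cur = queue[head++];
        if (cur == dst)
            return dist[cur];
        for (size_t e = 0; e < n_edges; e++) {
            if (edges[e].src == cur && dist[edges[e].dst] == -1) {
                dist[edges[e].dst] = dist[cur] + 1;
                queue[tail++] = edges[e].dst;
            }
        }
    }
    return -1;
}
```

If the only translators available convert GSM to and from an uncompressed format and ulaw to and from that same format, a sound file stored as GSM reaches a ulaw caller in two steps by way of the uncompressed form.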
At some point, the caller will be done interacting with the Voicemail
system and hang up. The channel driver will detect that this has
occurred and convert this into a generic Asterisk channel signalling
event. The Voicemail code will receive this signalling event and will
exit, since there is nothing left to do once the caller hangs up.
Control will return back to the main loop in the channel thread to
continue dialplan execution. Since in this example there is no further
dialplan processing to be done, the channel driver will be given an
opportunity to handle technology specific hangup processing and then
the ast_channel object will be destroyed.
Another very common call scenario in Asterisk is a bridged call between two channels. This is the scenario when one phone calls another through the system. The initial call setup process is identical to the previous example. The difference in handling begins when the call has been set up and the channel thread begins executing the dialplan.
The following dialplan is a simple example that results in a bridged
call. Using this extension, when a phone dials 1234, the dialplan
will execute the Dial application, which is the main
application used to initiate an outbound call.
exten => 1234,1,Dial(SIP/bob)
The argument specified to the Dial application says that the system should make an outbound call to the device referred to as SIP/bob. The SIP portion of this argument specifies that the SIP protocol should be used to deliver the call. The bob portion will be interpreted by the channel driver that implements the SIP protocol, chan_sip. Assuming the channel driver has been properly configured with an account called bob, it will know how to reach Bob's phone.
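Splitting an argument such as SIP/bob into its technology and resource parts is straightforward. The helper below is a hypothetical sketch, not code from the Dial application: split_dial_string is an invented name, and it handles only a single TECH/resource pair with no further options.

```c
#include <stddef.h>
#include <string.h>

/* Split a "TECH/resource" dial string into its two parts.
 * Returns 0 on success, -1 if the string is malformed or a buffer
 * is too small to hold its part. */
static int split_dial_string(const char *dial,
                             char *tech, size_t tech_cap,
                             char *resource, size_t res_cap)
{
    const char *slash = strchr(dial, '/');

    /* Require a non-empty technology and a non-empty resource. */
    if (slash == NULL || slash == dial || slash[1] == '\0')
        return -1;

    size_t tech_len = (size_t)(slash - dial);
    size_t res_len = strlen(slash + 1);
    if (tech_len >= tech_cap || res_len >= res_cap)
        return -1;

    memcpy(tech, dial, tech_len);
    tech[tech_len] = '\0';
    memcpy(resource, slash + 1, res_len + 1);  /* copies the terminator too */
    return 0;
}
```

In this model the technology part would select the channel driver, and the resource part would be handed to that driver unchanged, since only the driver knows what bob means.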
The Dial application will ask the Asterisk core to allocate a new Asterisk channel using the SIP/bob identifier. The core will request that the SIP channel driver perform technology-specific initialization. The channel driver will also initiate the process of making a call out to the phone. As the request proceeds, it will pass events back into the Asterisk core, which will be received by the Dial application. These events may include a response that the call has been answered, the destination is busy, the network is congested, the call was rejected for some reason, or a number of other possible responses. In the ideal case, the call will be answered. The fact that the call has been answered is propagated back to the inbound channel. Asterisk will not answer the part of the call that came into the system until the outbound call has been answered. Once both channels are answered, the bridging of the channels begins (Figure 1.7).
Figure 1.7: Block Diagram of a Bridged Call in a Generic Bridge
During a channel bridge, audio and signalling events from one channel are passed to the other until some event occurs that causes the bridge to end, such as one side of the call hanging up. The sequence diagram in Figure 1.8 demonstrates the key operations that are performed for an audio frame during a bridged call.
Figure 1.8: Sequence Diagram for Audio Frame Processing During a Bridge
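The relay behaviour of a generic bridge can be approximated with a loop that reads a frame from one channel and writes it to the other until a hangup appears. The sketch below is a strong simplification under stated assumptions: the channels are scripted arrays rather than live endpoints, the loop strictly alternates sides instead of waiting on whichever channel has data, and struct frame, struct channel, channel_read, channel_write, and generic_bridge are hypothetical names.

```c
#include <stddef.h>

enum frame_type { FRAME_VOICE, FRAME_CONTROL_HANGUP };

struct frame {
    enum frame_type type;
    int payload;
};

/* Toy channel: a script of incoming frames plus a write counter. */
struct channel {
    const struct frame *incoming;
    size_t n_incoming;
    size_t next;
    int written;       /* number of frames written to this channel */
};

/* Read the next frame; a channel with nothing left reports a hangup. */
static struct frame channel_read(struct channel *c)
{
    if (c->next < c->n_incoming)
        return c->incoming[c->next++];
    return (struct frame){ FRAME_CONTROL_HANGUP, 0 };
}

static void channel_write(struct channel *c, const struct frame *f)
{
    (void)f;
    c->written++;
}

/* Alternate between the two channels, relaying frames until either
 * side hangs up. Returns the total number of frames relayed. */
static int generic_bridge(struct channel *a, struct channel *b)
{
    int relayed = 0;

    for (;;) {
        struct frame fa = channel_read(a);
        if (fa.type == FRAME_CONTROL_HANGUP)
            break;
        channel_write(b, &fa);
        relayed++;

        struct frame fb = channel_read(b);
        if (fb.type == FRAME_CONTROL_HANGUP)
            break;
        channel_write(a, &fb);
        relayed++;
    }
    return relayed;
}
```

Because every frame passes through the loop, this is also the point where a generic bridge could record, transcode, or otherwise process the media before handing it to the other side.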
Once the call is done, the hangup process is very similar to the previous example. The major difference here is that there are two channels involved. The channel technology specific hangup processing will be executed for both channels before the channel thread stops running.
The architecture of Asterisk is now more than ten years old. However, the fundamental concepts of channels and flexible call handling using the Asterisk dialplan still support the development of complex telephony systems in an industry that is continuously evolving. One area that the architecture of Asterisk does not address very well is scaling a system across multiple servers. The Asterisk development community is currently developing a companion project called Asterisk SCF (Scalable Communications Framework) which is intended to address these scalability concerns. In the next few years, we expect to see Asterisk, along with Asterisk SCF, continue to take over significant portions of the telephony market, including much larger installations.
[1] http://www.asterisk.org/