Working with the Channel Event Log

Written by Luit van Drongelen on 28th April 2016

In the previous post about our telephony platform VoIPGRID’s Call Detail Records, I explained that our new call detail registration system was built using the Channel Event Log. In this long-awaited post, I’ll explain a bit about the challenges I faced in designing and implementing this system.

So most of what Asterisk does happens targeting a channel, or a pair of channels. These channels, as explained in the aforementioned post, come in two flavors: internal and external. The internal channels are the so-called Local channels. And in our case, the only external channel type we use is the SIP channel.

The Channel Event Log (abbreviated CEL) contains a log of all the events that happened to all the channels that Asterisk has handled. Not that surprising. There are several types of events. When Asterisk creates a channel, it emits a CHAN_START event in the CEL. Destroying a channel (no longer keeping data about this channel in memory) results in a CHAN_END event. If a channel is answered, ANSWER appears in the log. Events are logged when channels are bridged, a call is transferred, and in many more situations.

With this description, extracting data from the log of events might sound trivial: for each channel in the log get a list of events. With these events, find out what the channel was used for and by whom, and when and how long it lived. Group some of those channels together, and store this data.

I wish it was that simple.

Subtle implementation details from Asterisk seep through to the CEL, so to understand some of the difficulties I had with the CEL, let’s first explain some Asterisk basics.

Channel name

Each channel in Asterisk has a name. Every event that’s logged in the CEL is linked to a channel identified by this name. The name is generated based on a couple of values. For reference, a few example channel names taken from one of our servers are:

  • Local/ID291131@osvpi_route_phoneaccount-000ce2a2;1
  • SIP/129500008-000f384b
  • SIP/voipgrid-siproute-ams-000f3850

These names can be separated in 3 parts:

  1. channel technology (i.e. Local and SIP in our case)
  2. a channel technology-dependent identifier I’ll explain below
  3. a counter

As you can see in the examples given above, this is not completely true for the Local channel: it’s suffixed ;1 (pronounce semi-one). This is because a Local channel is actually two semi-channels that sit back-to-back passing their data to one another. Data flowing into the channel ending in ;1 is passed through to the channel with the same name but ending in;2, and vice versa.

The meaning of the second part of a channel’s name is different for each channel technology. For a Local channel, it describes the extension and a context, separated by an @. For SIP channels it describes the peer it’s connected to. This peer can either be one of our OpenSIPS proxies routing traffic to the telephony network (e.g. voipgrid-siproute-ams), or an account code (e.g. my desk phone: 129500008).

As the counter never gives out the same number twice in the same run of Asterisk, the channel names are guaranteed to be unique within that instance of Asterisk. Restart Asterisk and you’ll see this counter restart at zero.

This last fact is slightly annoying. I wanted to take a globally unique identifier from a channel, so our new Call Detail Records could use those as its primary key (yes, we store our derived data in an SQL database). As these names can clash with restarts we can’t simply take this name + the server hostname as its primary key. So, we’ll look for another.

Channel uniqueid

Another identifier given with each event is the channel’s uniqueid. This identifier has more uniqueness guarantees than the channel’s name has. A uniqueid consists of three values concatenated to form one string:

  1. the hostname of the machine Asterisk runs on
  2. the UNIX timestamp of the second the channel was created
  3. a global counter

For example, one such uniqueid is vgua0-grq-1449581657.2686311. These identifiers are globally unique under the following assumptions:

  • the hostnames you use are unique
  • Asterisk doesn’t do the following within one second: start, create a channel, restart, create another channel

The first assumption is easy to work with. It’s smart to have different hostnames anyway, so we didn’t need to do anything here. The second assumption, for us, is even easier; our Asterisk can’t start that fast (loading its configuration from an SQL database).

So you could say that the uniqueid is a better identifier to use when storing information about channels. And with some caution it is, but tracking a channel’s state still has to happen based on its name. Why?

Bridging and masquerading

Confusing things happen when calls are bridged. When channels are answered, Asterisk puts them together in a bridge, that will connect the channels, letting data flow between them in both directions. When bridging two SIP channels it’s easy to understand what will happen. Most of the time, though, there are some Local channels involved.

When setting up a call, it’s quite convenient to use Local channels. Whenever the dialed channel answers, the call setup phase is over, and dialer and dialed channels are bridged. With Local channels in between, this might mean SIP/alice-00000001 is bridged to Local/234@something-00000001;1, and Local/234@something-00000001;2 is bridged toSIP/bob-00000002.

Keeping Local channels alive when a call is fully setup and connected isn’t preferred, so Asterisk will start something called a Local Channel Optimization as soon as the first audio travels through this chain. This optimization means that Local channels will step out of the chain through a mechanism called a masquerade. This masquerade is quite a strange thing and a direct result of some design choices in Asterisk’s core. This process goes as follows:

With the above example of SIP/alice calling SIP/bob, Local/234@something will masquerade. What this means is that it’ll take the guts of the semi-one’s peer channel (so, SIP/alice-00000001) and replace the guts of its semi-two channel with that data. The guts of the semi-two end up where SIP/alice-00000001 used to be, which means both ends of the Local channel end up in one bridge, and both SIP channels in the other. The Local channel and the bridge it’s in are now useless, so they’ll disappear, and the masquerade is then complete. The same mechanism works if there are multiple Local channels between the SIP channels, it’ll simply take multiple masquerades to get the desired result.

Why is this important?

The channels that are moved by a masquerade will still have their own name, but the uniqueid will change. This is because the name of a channel is kept in the data that is moved, and the uniqueid is linked to an Asterisk channel-thread in which this data is placed. This means that things happening to a channel will always be identified by the channel name, but can suddenly show a different uniqueid after a masquerade.

To work around this, our Call Detail registration process looks at both. To give a channel a globally unique identifier we take the uniqueid that the channel is created in. After creation, we’ll have to track the channel by its name though, to make sure we’re looking at the right one.

Easy? No!

You’d think something like the Channel Event Log would be an abstraction. Something that’s hiding the inner workings of Asterisk, so you’ll only have to work with whatever data the CEL presents you. Or has documentation of these rules that apply to things like channel names and uniqueids.

Turns out the CEL is quite a leaky abstraction, requiring deep knowledge of Asterisk to understand how and why some things happen in the log. Channel name and uniqueid is one such detail I had to find out the hard way. There are many more. To add insult to injury, there’s very little documentation for CEL, and no publicly documented cases of companies using CEL for call detail registration and/or call billing.

To build a system that uses the CEL for call detail registration was quite a daunting and drawn-out task because of this. All I explained in this article is stuff I had to figure out by myself, instead of through the CEL’s documentation.

Worth it? Yes!

Even though it’s hard, I think this approach is miles ahead of the standard approach of deriving your data from Asterisk’s CDRs. Since we’ve switched, the call detail records we derive from CEL have proven themselves to be reliable and correct on many occasions.

My next article will explain more about this new call detail registration system. If you have any questions about this article or thoughts on the subject, I’d like to hear about them in the comments below.

Your thoughts

  • Written by Josh on 18th December 2015

    Nice write-up… have you looked at differences in how CEL/CDR are handled in Asterisk 13?

    Any reason to continue with the Asterisk 11-based approach?

  • Written by Luit van Drongelen on 6th January 2016

    I have, a bit. At the time I started working with the CEL, Asterisk 12 wasn’t released yet, but getting close. As we’re only interested in working with LTS releases, we then had to wait for Asterisk 13, in addition to migrating everything from 11 to 13, which probably isn’t that simple.

    Long story short: I did look at it, I did like it, and we’ll probably one day use it, but not real soon.

  • Written by Simone Palomba on 25th May 2017

    Hi…I had to calculate time between call is answered by Ivr and agent Pick The call…
    Can i ask if Cel in your opinioni is The Best way?
    Thanks

Devhouse Spindle