Bughunt: glibc should follow
Where to log?
Developing and operating software that others have built sometimes comes with surprises. Some time ago, on a sunny afternoon, one of our central logging servers complained that it was being flooded with log lines from some of our Asterisk servers. This is, generally speaking, not a good sign. On inspection, it turned out to be receiving many lines like
ua0 asterisk: syslog: unknown facility/priority: 404
This basically means that syslog is saying: I want to log something, but I don't know where to! There were, however, no other signs of trouble. All services operated normally.
Still, this issue needed to be solved, since the flood obscured our view of other things going on, some of which could be important. We quickly realized that this behavior had started when we upgraded one of our production telephony servers to the latest version of Asterisk 11. But how could this have become a problem? Since placing calls using Asterisk servers is the bread and butter of our operations, we have a cautious upgrade process in place.
Before initiating an upgrade, we test the new version thoroughly and read the Asterisk changelog meticulously. The changes in the code and changelog looked innocent enough this time. However, on revisiting the changelog once more, we noticed that it included a fix for
core/logging: Fix logging to more than one syslog channel.
A bug from the distant past
Since the complaint logged on our central log server mentioned syslog, we zoomed in on this issue. After reading some comments and code, and doing some searching and digging, we found this comment. It stated:
This bug was fixed 15 years ago in BSD, glibc should follow.
And with that change, actually originating from 1997, our bug was born!
Writing the fix
So what happened is that the version of Asterisk we deployed had changed its assumption about how to log to syslog. As a quick fix, we changed the syslog facility level that Asterisk was logging to. Next, since we prefer to do things right, we wrote a fix for Asterisk and offered it upstream.
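To make the arithmetic behind the 404 in the log message concrete, here is a minimal sketch. It rests on assumptions that are not spelled out above: that the priority was built with the LOG_MAKEPRI macro, that glibc before 2.17 defined that macro with an extra left shift of the facility, that the number in the message is hexadecimal, and that the facility and level involved were local0 and warning (inferred only from decoding 0x404). The function names are hypothetical; the constants mirror the values in <syslog.h> on Linux.

```python
# Constants mirroring <syslog.h> on Linux (written out by hand, so the
# sketch does not depend on the platform-specific syslog module).
LOG_WARNING = 4          # severity level
LOG_LOCAL0 = 16 << 3     # facility, already stored shifted left by 3 (= 128)
LOG_PRIMASK = 0x07       # bits that may hold the level
LOG_FACMASK = 0x03F8     # bits that may hold the facility


def makepri_old(fac, pri):
    """Assumed pre-2.17 glibc behavior: the facility is shifted a second time."""
    return (fac << 3) | pri


def makepri_fixed(fac, pri):
    """BSD-style behavior: the facility is already shifted, so just OR it in."""
    return fac | pri


def is_valid(priority):
    """Sketch of the sanity check: no bits outside the level and facility masks."""
    return (priority & ~(LOG_PRIMASK | LOG_FACMASK)) == 0


if __name__ == "__main__":
    for name, make in (("old", makepri_old), ("fixed", makepri_fixed)):
        value = make(LOG_LOCAL0, LOG_WARNING)
        print(f"{name}: {value:x} valid={is_valid(value)}")
```

Under these assumptions the double-shifted value falls outside the bits syslog accepts, which would explain why it complains about an unknown facility/priority while everything else keeps working.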
Also, glibc did follow
After almost 15 years, glibc did follow! More recent Linux distributions, which ship with glibc 2.17 or higher, behave as expected. In addition, our patch was merged in Asterisk 11.23. And with that, we can safely forget about the specifics of this bug. A takeaway from this story is that changes in code can have consequences more than a decade later!
You can find the original bug here.