The following is a cleaned up log I wrote for myself while debugging a bug. Writing a log while working helps me keep track of the debugging effort in case I’m interrupted (life, sleep, work, etc.). It also requires me to explain all findings to myself in fully spelled out sentences, making my thoughts considerably easier to follow.
In addition to serving as an example of a personal debug log, I hope it can be useful as an introduction to debugging segfaults or other low-level bugs.
How to reproduce
I have three computers running Ubuntu 11.04. Mopidy revision 9c23949 consistently crashes with a segfault on one of them when I do the following:
- Start Mopidy.
- Connect with an MPD client, e.g.
- Search for anything, e.g.
- Wait less than 10 seconds for the segfault to happen. It always happens
directly after a log message from Mopidy’s Spotify backend stating that it
Rule out the obvious
I check that I’m actually using the latest versions of the important pieces of software, not just some old version I installed by hand and forgot about.
I try uninstalling pyspotify v1.1+mopidy20110405 installed from my own Debian package, and install the latest revision (c6e2a02) from Git instead. Still the same error.
Nobody has reported the same problem, even though both other developers and users should have been on the same mix of versions for the last couple of months.
Luckily I can consistently reproduce the segfault in less than 20 seconds of work. I’m not too familiar with debugging C programs and especially the infamous segfaults, but I’m taking on this fight.
I expect the problem to be in pyspotify, as the rest of the code is either pure Python or more well-tested and broadly used libraries. Also the segfault consistently happens directly after a log message from Mopidy’s Spotify backend.
Antoine Pierlot-Garcin, the new main contributor to pyspotify, says to rebuild pyspotify with
-g will include debug information that
understand in the resulting binary.
-O0 will override the
of the pyspotify build system, and turn off any optimizations to ease the
Googling “debugging segfaults” yields a nice howto on debugging segfaults.
In summary: Get a core dump, inspect it with
Possible causes: “There are four common mistakes that lead to segmentation faults: dereferencing NULL, dereferencing an uninitialized pointer, dereferencing a pointer that has been freed (or deleted, in C++) or that has gone out of scope (in the case of arrays declared in functions), and writing off the end of an array.”
Rerunning Mopidy nothing more interesting happens, except the usual segfault. No core dump is produced.
Hum, how to get a core dump? I’ve done it before, but I can’t remember. Google helps.
ulimit -c 50000 to get a core dump of maximum 50MB, and then
reproduce the segfault.
python mopidy -v, connect with
ncmpcpp, search for
foo, wait less
than 10 seconds. Kaboom:
Loads the core dump up in
gdb complains that the core dump is truncated, and that is should be
man bash and search for
ulimit. Aha. I can specify
I’m not allowed to? Restart shell, run
ulimit -c unlimited again. Works.
Rerun Mopidy to get an untruncated core dump.
Analyzing the core dump
gdb loads debug symbols from all the linked libraries that got them
available. Notably, the proprietary
libspotify library does not include
debug symbols. So what does
Verifying a hypothesis
According to the libspotify docs for the album subsystem
sp_album_add_ref takes an
sp_album struct and increases the reference count
of the album. Looking back at the list of reasons for segfaults from the
segfault debugging howto, it may sound as we increase the reference count of
NULL, which obviously isn’t good. Let’s see if that hypothesis is correct…
Looking in the howto for ways to proceed,
backtrace reveals the events
happening just before the segfault:
Moving one step up the stack, we get to the pyspotify code:
Our guess was that we’re increasing the reference count on an album that is
null. Let’s see what
album actually was…
A pointer to an
sp_album struct at the address
0x0, also known as
Finding a solution
Let’s take a look at the pyspotify code in question:
We create a pointer to a
sp_album struct. Given a Spotify track, we request a
reference to the related album, and assign the result to our pointer. Then we
Album Python object, before we increase the reference count on the
sp_album and give the reference to the Python object. Finally, the Python
object is returned.
Lets try just returning
NULL if we get
If we rebuild pyspotify and try to reproducing the segfault, we now get a familiar Python traceback instead:
Note that we get a
SystemError and not e.g.
object has no attribute 'year' which we would expect if
None. This is because we return
Track_album, which Python
considers an “error return”, without setting an exception code first. If we
want to return
None on failure instead of throwing an exception, we can
track.c as follows:
Rebuilding pyspotify again, and reproducing the error, we now get the expected Python error which we can handle nicely:
By this point, I consider the bug squashed and ready to be reported.
The only part left is to decide how to best fix it. E.g. if
None, throw an exception, or maybe return an empty album object.
That’s another story.