Thursday, August 23, 2007

Compiler Bug, or Not?

I'm a firmware engineer. We use C++ for our firmware. C++ has been disparaged by some great programmers, but they are also the ones who talk about how great Ruby and JavaScript are. Try writing real-time firmware with those! But I digress. Yesterday I had an interesting experience with a bug that took me far to long to find (the fix was easy). Let me see if I can explain this well enough to get some helpful feedback from the C++ geniuses that I know are lurking out there.

I added a new class to our firmware a couple days ago, and had it instantiated inside another class. The two classes can be simplified to look something like this:


class Bar
{
public:
    Bar();
    void initialize();
private:
    int x;
};

class Foo
{
public:
    Foo();
    void start();
private:
    Bar * bar;  
};

Not exactly like that, but close. Firmware isn't very complicated, right? So anyway, here are the implementations for each, roughly:


Foo::Foo()
{
    Bar * bar = new Bar();
}

void Foo::start()
{
    bar->initialize();
}

And for Bar:


void Bar::initialize()
{
    x = 5;
}

And, more or less, that's it. Some other class instantiates Foo and calls it's start method, and things blow up. Or, at least they should. Can anyone see why? You get the job if you can. OK, I'll tell you. It's the little re-declaration of the pointer, bar, inside the Foo constructor. It creates a new Bar, and stores the pointer to it in a variable local to the constructor. It goes out of scope and disappears when the constructor is done. Oops!

So then, when Foo::start() is called, the private member variable, bar, is uninitialized and what happens? Well, not quite what I would expect. You see, according to the crash dump and the debugger, it seemed to still have run the Bar::initialize() method! Apparently, and looking at the assembly code seemed to confirmed this, the function is found at compile time and the address of it is stuck right in the code. The pointer isn't used to find it at runtime at all. I never knew that.

Back to the crash. The crash happened because the hidden "this" pointer that was passed to the method had the wrong value. That's not a big surprise. Looking at the assembly code, "this" was used as an offset to calculate where to store that 5, also not a surprise. The 5 didn't end up inside the Bar class because the value of "this" was zero. Crash!

I was first led astray by the method call succeeding. If that had failed I would have immediately looked at the pointer, but since it worked, I assumed pointer == good. The other reason I had such a hard time catching this bug of mine was because our "debug" build worked! For some reason the compiler was still storing the pointer to the Bar instance inside Foo's private member variable. When Bar::initialize() was called it received the correct value for "this," and everything worked just like I intended. Guessing my intent is impressive and all, but not very useful if the optimized build doesn't include that feature too ;-). It took a few conversations with co-workers for me to realize what was going on and finally find the bug. At least we all learned something...

So my question for the panel: is what the debug build did a compiler bug on top of my code bug? Or am I just grasping for some dignity here, in light of my blunder? What about executing the Bar::initialize() method off of a bad pointer? Do other compilers do that?

1 comment:

Yann said...

Hi,
maybe I was a bit fast at looking at the problem, but since initialize() is not virtual, there is no reason (and actually it would be wrong) to resolve the call using the pointer (hence the dynamic type of the object). Therefore, there is no need for any check before calling the method, the static type of the variable (Bar *) is sufficient to generate the code.
Since the pointer is actually invalid, the crash occurs the first time it is really needed (when a member is accessed)

Hope this helps :)