
Debugging strategy: easy stuff first

I've been writing software for quite a while now, but the software I write still has bugs. One uses various strategies to avoid bugs, but bugs still creep in. Bugs happen all the time. I sometimes believe only programmers are truly aware of how flawed human reasoning really is and how many mistakes a person can make without even noticing. We programmers are confronted with our own mistakes every day. The real world tends to be more forgiving of slight mistakes than the virtual world of software.

So, your software has a bug.

When you have a bug, you look for its cause. Often you have a pretty good idea about what the problem is, and you know you can fix it quickly. You go in and fix the code and, hopefully, add an automated test so the code can't break again in the same way in the future. Done.

Often, though, you have no clue what the problem is. This can happen after you've looked at a likely cause and found out it wasn't that after all. "That's bizarre! That shouldn't happen!" You don't know where to start debugging. You have some ideas about likely causes, perhaps, but you know you are going to have to sit down for this one. What to do then?

One thing that has helped me in this case is the following simple strategy: check the things that are easy to check first. Not because these things are likely to be the cause of the problem -- they may in fact be quite unlikely -- but simply because they're easy to check.

Check stupid things. Perhaps you didn't save the file. Perhaps you didn't restart your program before testing. Check whether you're really working on the right files. Perhaps some symlinks got crossed. Are the versions of the libraries you're using really correct? Check whether the web application that has the bug is really being run by the server you thought it was. Perhaps there's another server hanging around serving the buggy app and all your efforts have no effect. Check whether the API you know so very well really works in the way you thought it did. It might take just a one-line program to check. Yeah, it's unlikely, but you won't waste a lot of time, so just do it.
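A few of these checks can be scripted. The following is only a minimal sketch, assuming a Python project; the function name `describe_environment` is made up for illustration, and `json` merely stands in for whichever library you suspect.

```python
import importlib
import os
import sys


def describe_environment(module_name="json"):
    """Collect the cheap facts worth confirming before any deep debugging."""
    module = importlib.import_module(module_name)
    return {
        # Which interpreter is actually running this code?
        "python": sys.executable,
        # Which directory are we really working in?
        "cwd": os.getcwd(),
        # Which file was the module really imported from?
        "module_file": getattr(module, "__file__", None),
        # The version, if the module advertises one (many do not).
        "module_version": getattr(module, "__version__", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the printed module path points somewhere you did not expect, you may have found your crossed symlink or stale checkout in under a minute.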

Why should we be checking unlikely causes for our bugs? Even if the chance that something is the cause is very small, the chance still exists. And since it's quick and easy to check, just make sure. If it was the cause of the bug, you're lucky and might've saved yourself hours of head scratching only to slap your forehead in the end. If it wasn't the cause of the bug, at least you've not wasted a lot of time to exclude it, and you can move on.

Don't start with unlikely causes that are hard to check. You may end up checking those anyway, as it could be them after all. But not initially. First check the easy stuff.

Also, don't start with likely causes that are hard to verify. You think it's probably a threading issue? Oh no, that's difficult to debug! What if that wasn't the real cause of the bug after all? You've just spent hours testing for it. What if it turned out the bug was really caused by something you thought unlikely, and you could've excluded it with just a minute of work? You would've wished you had spent that minute straight away. It wouldn't have been a big loss if not, and you might've hit the jackpot.
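The trade-off can be put into rough numbers. This is a sketch only: the causes, probabilities and durations below are invented for illustration, and it assumes exactly one of the listed causes is the real one.

```python
def expected_minutes(checks):
    """Expected time until the cause is found, checking in the given order.

    Each check is (name, probability_it_is_the_cause, minutes_to_check).
    Assumes exactly one cause and probabilities that sum to 1.
    """
    elapsed = 0.0
    expected = 0.0
    for _name, probability, minutes in checks:
        elapsed += minutes  # time spent once this check is finished
        expected += probability * elapsed
    return expected


# Hypothetical numbers, invented for illustration only.
CHECKS = [
    ("threading issue", 0.60, 180),
    ("logic error in new code", 0.30, 60),
    ("file not saved", 0.05, 1),
    ("wrong server running", 0.05, 2),
]

likely_first = sorted(CHECKS, key=lambda check: check[1], reverse=True)
easy_first = sorted(CHECKS, key=lambda check: check[2])

if __name__ == "__main__":
    print("likely first:", round(expected_minutes(likely_first), 1), "minutes")
    print("easy first:  ", round(expected_minutes(easy_first), 1), "minutes")
```

With these made-up numbers the easy-first order comes out ahead. Other numbers could tip the balance the other way, but the one-minute checks add almost nothing to the total in either order, which is why they are worth doing up front.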

So check the things that are easy to check first.

Of course the best way to deal with bugs is not to create them in the first place. If you find yourself dealing with the same type of bug over and over again, consider whether you can change your way of working to avoid them altogether. But as we all know, there will always be bugs...


Comments:

Posted by Daniel Nouri at Sat Apr 14 2007 21:47

It's an interesting observation that the real world is more forgiving
of slight mistakes. But of course programmers aren't the only ones
noticing the tendency of human reasoning to err. From the
disciplines of the *real* world, psychology and history come to my
mind. Historians are able to trace back events and false reasoning
easily by studying human history. One could argue that the flaws of
human reasoning that occur in programming are relatively harmless
compared to bad human reasoning on a historical scale. (Of course,
some software bugs are so big they make history.)

I find it very annoying when the simple things fail and I waste time
looking for the problem in the wrong place or at the wrong level. In
which case checking that the simple things work first usually helps.

The simple things do not fail equally often for me in the course of a
software project. When I set up a development environment for the
first time, simple mistakes are more likely than when I'm in the midst
of implementing a feature for a project where the environment itself
does not change, e.g. where I don't update the source tree while
developing.

Anyway, finding the problem in stupid places and thus having to check
stupid things constantly is annoying. These are my strategies against
stupid mistakes that I like to employ for my work:

- Don't rely too much on automation of the development environment.
I don't like fancy IDEs or build tools (I'm lucky I'm a Python
programmer). If I have to automate my setup, I want to understand
every bit of what is automated and how.

While many people in the Plone community have come to like tools
like instancemanager or buildout, I tend to stay away from them.
I don't want to exclude the possibility that I *could* be more
effective at setting up a Zope development environment with one of
these tools. However, the effort of *learning* these tools
thoroughly so that I know them by heart and keeping up with
changes in use / API that they might go through is pretty big.

I find that understanding every part of my setup is very helpful
for finding problems with it.

- Learn what your code does -- I want to understand what every line
of my code is doing. This problem is apparent in code generators
and program templates: They automate simple things for you, and as
a result, you like to ignore these things. In the worst case,
they're generating code for you that you don't understand and that

If you need to use program generators, understand every line of
code that they generate. Program generators will never be able to
hide away the complexity of your platform. This will be apparent
to you the first time you see a traceback in your program and
you've never written a line of code. For this reason, I like to
think of the ArchGenXML code generator as a *false friend*.

Just like you want to understand every piece of your development
environment setup, try to understand as much of your own code as you can.
Else, you're less able to identify in which part of the stack your
actual problem lies.

- On a programming level, I find that writing tests for the stupid
things helps a lot. If you cover stupid things, you are more
likely to detect that the problem is at a lower level than you
would have expected it to be. If you only write tests for the
higher level things, you won't know which part of the puzzle is at
fault.

- In a project that I worked on this week I made the mistake of
starting to debug (as in pdb) too early. There was a bug in a
certain part of the project that I thought I knew by heart, but
where in fact some substantial, yet subtle, changes had been
made recently; I didn't understand the big picture.

The rule that I think would help for this: Communicate a lot with
your team and do code audits. This will also help programmers
understand the bigger picture and take responsibility for code:
Have more than one person (not just the author) look at the code,
which helps avoid more than just stupid mistakes.
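The point above about writing tests for the stupid things might look like the following sketch. The helper `read_setting` and the `key=value` file format are hypothetical, chosen only to have something trivial to test.

```python
import os
import tempfile
import unittest


def read_setting(path, key):
    """Hypothetical helper: return the value for `key` from a key=value file."""
    with open(path) as handle:
        for line in handle:
            name, _, value = line.strip().partition("=")
            if name == key:
                return value
    return None


class StupidThingsTest(unittest.TestCase):
    def setUp(self):
        # Write a tiny throwaway config file for each test.
        handle = tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False)
        handle.write("port=8080\n")
        handle.close()
        self.path = handle.name
        self.addCleanup(os.remove, self.path)

    def test_config_file_exists(self):
        self.assertTrue(os.path.exists(self.path))

    def test_setting_is_read_back(self):
        self.assertEqual(read_setting(self.path, "port"), "8080")

    def test_missing_setting_is_none(self):
        self.assertIsNone(read_setting(self.path, "host"))


if __name__ == "__main__":
    unittest.main()
```

When a higher-level test fails while these still pass, the lowest layer is already ruled out and the search can start one level up.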

Posted by Martijn Faassen at Sat Apr 14 2007 23:56

Hey Daniel!

There are many strategies to avoid problems. Checking the easy things first is for when you already know you have a problem and you don't really know where to go looking for it.

Automation can both help and harm with problem prevention. It can help as it can raise the abstraction level and avoid human mistakes with the small details. It can also harm as the magic of the automation may not be perfect enough, and a new level of abstraction requires more learning. A code generator where you need to edit the generated code is an example of something that in my opinion harms more than it helps. A code generator (such as a compiler) where you don't need to see the generated code can be fine, though. Where the balance lies depends on the quality of the tools and the preferences of the programmer.



Unless otherwise noted, all content licensed by Martijn Faassen
under a Creative Commons License.