The Conifer Systems Blog

Continuous Process Improvement

With the possible bankruptcy of the US Big Three automakers in the news, it’s interesting to think about the analogies between making cars and making software.  There is no single reason why the Big Three have been in decline for many years now, but surely one of the most important is that American consumers decided that Japanese cars are generally higher-quality than American cars.

These days, the Japanese carmakers’ attitude towards quality and process improvement is so mainstream that it’s almost clichéd.  It can easily be taken too far; for example, it is clearly not the case that increasing quality always saves money.  Instead of getting involved in the “software quality” debate, I’d like to focus right now on the idea of “process improvement.”

Whether you realize it or not, you have a process for building software.  Oftentimes very little conscious thought has been put into this process, and it is frequently ineffective and wasteful.

Here are some basic elements of your software engineering process to think about:

  • Who decides whether a feature is going to be added or a bug is going to be fixed, and if so, in which release?
  • How many outstanding branches/codelines do you have?  When do you create a new one?  When do you shut down an existing one?  Who does integrations between them, and how often?
  • How do you build your software, all the way from the original source code to the final CD/DVD image or downloadable installer, for all of your target platforms/build configurations?  How do you detect and/or prevent build breaks?
  • How do you find bugs in your software?  Customer bug reports?  Internal QA?  Code reviews/code reading?  Static analysis tools and compiler warnings?  Asserts?  Other?
  • What do engineers do before they commit their changes, e.g., what platforms do they build and test on, and what tests do they run?
  • What happens after a change is committed?  What automated or manual builds and tests are run on it?
  • How do you verify that a bug is really fixed?
  • If a previously committed change seems to be causing problems, what do you do?  How much time do you let the change stay in place while you try to debug the problem?  Or do you “back it out first and ask questions later”, putting the responsibility on the original change’s author to figure out what went wrong and reapply a correct version of the change later?
  • Top engineers are often 10x more productive than average engineers.  Is your process geared towards allowing those top engineers to flourish, at the risk of occasional mistakes slipping through, or is it geared towards preventing the average engineer from making mistakes, at the risk of reducing your top engineers’ productivity?

Whatever your current process, if you want to improve the effectiveness of your software development organization, you should be looking for ways to enhance it.  A very simple way to do this, pioneered in manufacturing by the Japanese automakers, is to look for the root cause of each problem and fix the root cause so the problem cannot happen again.  (I’ve written previously on this topic.)  One simple method Toyota adopted to identify root causes is called the “5 Whys.”  The important thing is not the specific method you use, but that you do dig down to understand why problems are happening.

This isn’t just the responsibility of management.  Individual engineers should be looking for opportunities for process improvement, too.  Any time you find or fix a bug, for example, this gives you an opportunity to ask a bunch of questions:

  • When and how was the bug introduced?
  • Could we have prevented this bug from being introduced in the first place?
  • Could we have detected this bug sooner?
  • From the time this bug was reported, could we have fixed it sooner?
  • Was the bug prioritized appropriately?
  • This could be one of a family of related bugs.  Are there other, similar bugs elsewhere we should look for?

We can ask much the same questions any time someone’s build is broken:

  • Who broke the build?
  • Why wasn’t the build break discovered before commit?  How could it have been prevented?
  • How quickly was the build break detected?  How quickly was it fixed?
  • Are other build configurations or other components’ builds broken too?
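
The "how quickly" questions are easier to answer if you actually record the timestamps.  Here is a minimal Python sketch, assuming you can obtain three times for each break: when the offending change was committed, when the break was noticed, and when the fix landed.  The function name and the sample timestamps are hypothetical, made up purely for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple


def break_latencies(
    committed: datetime, detected: datetime, fixed: datetime
) -> Tuple[timedelta, timedelta]:
    """Return (time to detect, time to fix) for one build break.

    Hypothetical helper: assumes the three events happened in this order.
    """
    if not committed <= detected <= fixed:
        raise ValueError("expected committed <= detected <= fixed")
    return detected - committed, fixed - detected


# Illustrative timestamps only, not real data.
time_to_detect, time_to_fix = break_latencies(
    committed=datetime(2008, 11, 25, 9, 0),
    detected=datetime(2008, 11, 25, 9, 45),
    fixed=datetime(2008, 11, 25, 11, 15),
)
print("detected after", time_to_detect, "and fixed after", time_to_fix)
```

Tracking these two numbers over time is one way to tell whether a process change actually helped.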

To give a more concrete example, maybe a build break wasn’t discovered before commit because the developer only did builds on a subset of the target platforms/configurations.  Perhaps the debug build passed and the release build failed.  Or perhaps the Windows build passed and the Linux build failed.  What possible process improvements does this suggest?

  • Require people to test-build all configurations before committing.  I would probably not recommend this; the cost can easily exceed the benefit.  Also, engineers are likely to “forget” to follow such a requirement, either intentionally or unintentionally, or are likely to make “one last change” after doing all their tests and not go back and fully test everything again.
  • Reduce the number of supported build configurations.  Debug/release is pretty typical, but suppose you’re still supporting some ancient operating system that no one cares about any more; perhaps you can finally retire your old DOS or Win9x or MacOS9 build, for example?  Or perhaps you can have a single binary for all versions of Linux rather than a separate binary for each supported Linux distro?
  • Disable “warnings as errors.”  This one is a double-edged sword.  On one hand it prevents warnings from creeping in.  On the other hand it makes your builds more brittle.  It’s up to you to make the right choice.
  • Set up a system like Cascade that will reject the commit of any change that breaks a build.
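
To make the last option more concrete, here is a rough Python sketch of the general idea of a commit gate: build every platform/variant combination and accept the change only if all of them succeed.  This is not how Cascade itself is implemented; `BuildConfig`, `build_matrix`, `gate_commit`, and the stand-in `fake_build` function are all hypothetical names invented for this example, and a real gate would invoke your actual build system instead.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class BuildConfig:
    platform: str  # e.g. "windows" or "linux"
    variant: str   # e.g. "debug" or "release"


def build_matrix(platforms: Sequence[str], variants: Sequence[str]) -> List[BuildConfig]:
    """Enumerate every platform/variant combination."""
    return [BuildConfig(p, v) for p, v in product(platforms, variants)]


def gate_commit(
    configs: Sequence[BuildConfig], build: Callable[[BuildConfig], bool]
) -> Tuple[bool, List[BuildConfig]]:
    """Build every configuration; accept only if none of them fails."""
    failed = [config for config in configs if not build(config)]
    return (not failed, failed)


def fake_build(config: BuildConfig) -> bool:
    """Stand-in for a real build: pretend only the Linux release build breaks."""
    return not (config.platform == "linux" and config.variant == "release")


configs = build_matrix(["windows", "linux"], ["debug", "release"])
accepted, failed = gate_commit(configs, fake_build)
if accepted:
    print("commit accepted")
else:
    print("commit rejected; broken configurations:", failed)
```

The point of the design is that the check runs automatically on every configuration, so nobody has to remember to do it by hand before committing.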

We can never achieve process perfection, but over time we can improve our process so that we don’t make preventable mistakes.  We should be able to avoid making the same mistake twice, for example.  We also need to watch that our process doesn’t get overly bureaucratic and burdensome.  Every so often it may be useful to “deregulate” your process: toss out some of the rules, especially the ones that you think might have a poor cost/benefit ratio, and see what happens.

Written by Matt

November 25th, 2008 at 4:04 pm
