The Conifer Systems Blog

Machine-Independent Builds

One of the most important reasons to use revision control is to ensure that all developers are working from the same source base.  If I reproduce a bug in my tree, I want to be sure that someone else can reproduce the same bug on their system by reconstructing the exact same source tree.  Reconstructing a tree is as simple as knowing the tree’s revision number.

Unfortunately, it is all too easy to partially undercut this benefit of revision control if your build system does not generate the same binaries when you run it from two different computers.  Suppose one person is building with Visual Studio 2005 and another with Visual Studio 2008.  Or they’re both using VS2008, but one has SP1 installed and the other doesn’t.  The resulting binaries will be subtly different.

No big deal, right?  I mean, how often do you really run into a compiler bug?

Well, not so fast.  In large, complex software projects, changing even the most insignificant-seeming variables can alter the behavior of the system in unexpected ways.  I’ll never forget a bug I spent well over a week tracking down: depending on the length of the program’s command line, the addresses of subsequent memory allocations shifted around.  Someone was accessing an uninitialized structure element on the stack, and the garbage contents of this stack element (which were actually quite deterministic) would change as the program’s command line length changed.  The bug would appear and disappear as people checked in unrelated changes to the source code (which in turn shifted the addresses around again), but with any given exact set of binaries, the bug would either consistently happen or consistently not happen.
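
To make that failure mode concrete, here is a minimal C sketch of the same class of bug.  This is an illustration I constructed, not the original code, and whether it misbehaves on any given system depends on how your compiler lays out the stack:

    #include <stdio.h>
    #include <string.h>

    struct config {
        int threads;
        int verbose;   /* the bug: run() never initializes this */
    };

    /* Copying the command line leaves its bytes on the stack, where a
       later call's locals may land. */
    static void scratch(const char *cmdline)
    {
        char buf[64];
        strncpy(buf, cmdline, sizeof buf - 1);
        buf[sizeof buf - 1] = '\0';
    }

    static void run(void)
    {
        struct config c;
        c.threads = 4;             /* c.verbose is forgotten */
        if (c.verbose)             /* reads garbage, but deterministic garbage: */
            puts("verbose mode");  /* it flips only when stack contents shift */
    }

    int main(int argc, char **argv)
    {
        scratch(argc > 1 ? argv[1] : "");
        run();
        return 0;
    }

Whether “verbose mode” prints can hinge on the length of argv[1], which is exactly the sort of hidden dependency that makes a bug like this come and go as unrelated things change.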

Likewise, my experience on large projects — let’s say, any project larger than 100K lines of code — has been that switching/upgrading compilers always causes at least one unexpected gotcha.  This is especially true for C++ (as opposed to C) projects.  Again, I could dredge up any number of obscure problems from past experience.  Upgrading compilers is not a no-brainer; it’s a decision with pros and cons that has to be made carefully, and if you upgrade, all of the members of your team should upgrade at the same time.

If we’re talking about Visual Studio, I’d also be remiss not to mention the C runtime library issues.  If you build with 2005, you need the 2005 runtime installed on any machine that runs the binaries; if you build with 2008, you need the 2008 runtime.  The runtimes also have service pack levels, so a 2005 SP1 app requires the 2005 SP1 runtime.  Again, it really does matter which exact compiler you are using, including service pack level.
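
You can check which CRT a binary actually picked up with the dumpbin tool that ships with Visual Studio; a VS2008 build linked against the dynamic CRT will report MSVCR90.dll, a VS2005 build MSVCR80.dll.  (The application name below is hypothetical, and the output is abbreviated.)

    > dumpbin /dependents myapp.exe
    ...
    Image has the following dependencies:
        MSVCR90.dll
        KERNEL32.dll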

Let’s return to the topic of revision control.  What, exactly, should we put in the revision control system?  We all know that our source code goes there, as do our build scripts, including any scripts required to create our installers.  Test cases and test scripts often go there too.  But this is still incomplete.  In my view, your compiler, linker, and related tools also belong in revision control.  So do all of the system header files you include and all the system libraries you link against.  So do the headers and libraries for any third-party SDKs you rely on.
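
For concreteness, here is one way such a tree might be laid out.  All of these names are hypothetical; the point is that the tools live in the tree, right next to the sources:

    trunk/
        src/          source code
        build/        build scripts, installer scripts
        test/         test cases and test scripts
        tools/vc9/    compiler, linker, system headers, system libraries
        sdk/          headers and libraries for third-party SDKs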

I believe it is a serious mistake to pick up whichever compiler, linker, headers, and libraries happen to be installed on the computer doing the build or happen to be pointed at by the user’s environment variables.  This is a good way to get machine-dependent builds: two people build the same source code and get different results.  “Different results” could mean one person gets a compile error and the other doesn’t.  Or, it could mean that their builds both succeed, but they produce different binaries that behave differently.

Machine-dependent builds may be tolerable in small, informal projects, but if the software project is key to your company’s livelihood, do you really want to take that chance?  In science we speak of “reproducibility of results”: if one person cannot independently validate another’s research, we are inclined to distrust that research.  Thus, it’s important to keep a full record of all the methodology you use in your research, so that someone else can set up the exact same experiment you did.  It’s no different with computers and software engineering.

There are other advantages to machine-independent builds, beyond reproducibility of results.

  • You don’t have to “install” any software on your OS to get a build up and running — you just check out a tree and you’re good to go.
  • You don’t have to roll out patches to the team and make sure that everyone installs them; you just commit a change that updates the tool and everyone gets it for free.
  • It’s much more likely that you will be able to accurately reconstruct old builds.  For example, suppose Version 1.0 was built with VS2008 and Version 1.1 with VS2008 SP1.  If you want to go back and track down a Version 1.0 issue, you really should switch back to the original compiler you used for that release, without the service pack installed.  If the tools are in revision control, there’s no chance that you will forget to do this.

One good way to get your machine-independent builds up and running is to use a virtual machine.  (There are a number of free virtualization products out there that will do the trick.)  Set up a clean OS install in a virtual machine.  Without locally installing any software on your virtual machine, check out a tree and build it; if it complains about something, chances are you’re still relying on some local tools by accident rather than pulling everything from source control.  (You can do this without a virtual machine, but it’s so easy to forget about some obscure step you did long ago when you first set up your computer.  With a virtual machine you can force yourself to start with a clean slate.)
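
Run inside the fresh virtual machine, the whole test should amount to something like this (the repository URL and script name are hypothetical):

    > svn checkout http://buildserver/repo/trunk
    > cd trunk
    > build.cmd

If that sequence alone doesn’t produce working binaries, something is still being pulled from the local machine.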

Admittedly, there are some components where you simply cannot avoid a local install.  The two main ones I’ve run into are the Visual C runtime library and the .NET Framework.  In these cases, you simply have to document that users must install these on their build machines before doing a build.  Again, using a virtual machine is valuable: it’s one of the best ways to discover missing steps in your build documentation.

One final note: this post has been fairly Windows-centric.  Machine-independent builds are possible on Linux and Mac also, although each OS has its own gotchas.  Probably a topic for a future post…


Written by Matt

September 15th, 2008 at 12:12 pm
