The Conifer Systems Blog

Interesting insight on large projects


Mark Lucovsky’s description of the software engineering processes used in Windows NT development is old, but a good primer for anyone who hasn’t worked on projects this large.  Software development projects become much more difficult as they grow, especially as you start to exceed 100 developers — and Windows 2000 had 1400 developers (plus another 1700 testers) working on 29 million lines of code.

It would be interesting to get an update on how things have changed since this was written: what challenges Windows Vista and Windows 7 development have faced, and how the processes built for Windows 2000 have scaled over the last decade.  But some things are safe bets: software projects tend to get bigger and harder over time.  Software complexity has been increasing faster than our ability to deal with that complexity.  Many of the basic tools people use in day-to-day software development are remarkably primitive.

Many of the problems Mark describes are ones that Cascade attacks head-on:

  • The source tree was very large (50GB, and recall that disks were not as big at the time).  It took a very long time (1 week) to download a brand new tree, and 2 hours to get each day's updates.  Performance improved greatly after a move to a new source control system, but even so, who wants to wait 3 hours to set up a new tree?  With a file system-based approach like Cascade, a tree consumes negligible disk space.  Setting up a new tree or updating an existing one takes just seconds rather than minutes or hours.
  • Slow builds.  Some things never change: the full OS build took 5 hours on NT 3.1, and while hardware got faster, the tree got larger, so the build took 8 hours on Windows 2000, even on a very high-end machine (4 CPUs).  I’ve had the same experience on my own projects: computers do get faster every year, but build times always seem to get worse, not better.
  • Frequent regressions (build breaks, boot failures, etc.) that shut down the whole team.  Even the smartest engineers make mistakes.  Automated build and test labs of the sort Mark describes certainly help, but such systems typically detect breaks only after the fact.  Wouldn’t it be better if developers could know what their changes might break before they commit, or if the system could actively prevent breaks from being committed?  (A rough sketch of that idea follows this list.)
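
Cascade’s own mechanism isn’t spelled out here, but the general idea of a pre-commit gate is easy to illustrate.  The following is a minimal sketch, not Cascade’s actual interface: the build command and test script (“make -j8”, “run_tests.py”) are hypothetical placeholders, and the gate simply refuses to let a change proceed unless every check succeeds.

    #!/usr/bin/env python3
    # Minimal sketch of a pre-commit gate.  The commands below are
    # hypothetical placeholders; a real system would run them against
    # the candidate change in an isolated tree, not the developer's
    # working copy.
    import subprocess
    import sys

    CHECKS = [
        ["make", "-j8"],             # does the tree still build?
        ["python", "run_tests.py"],  # do the smoke tests still pass?
    ]

    def gate() -> int:
        for cmd in CHECKS:
            print("running:", " ".join(cmd))
            if subprocess.run(cmd).returncode != 0:
                print("check failed; refusing the commit", file=sys.stderr)
                return 1
        print("all checks passed; commit may proceed")
        return 0

    if __name__ == "__main__":
        sys.exit(gate())

Hooked into the source control system, a gate like this turns “the build is broken, everyone stop” into “your change was rejected, fix it and resubmit” — which is exactly the difference the last bullet is describing.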

While it’s a little short on technical details, a Wall Street Journal article from a few years back seems to confirm that these problems haven’t gone away, and that they may have even contributed to Vista’s delays:

In making large software programs engineers regularly bring together all the new unfinished features into a single “build,” a sort of prototype used to test how the features work together. Ideally, engineers make a fresh build every night, fix any bugs and go back to refining their features the next day. But with 4,000 engineers writing code each day, testing the build became a Sisyphean task. When a bug popped up, trouble-shooters would often have to manually search through thousands of lines of code to find the problem.


Written by Matt

September 10th, 2008 at 10:06 am
