The Conifer Systems Blog

Where Does All That Disk Space Go?


Here’s an interesting blog post about why a Windows OS install can be so big.  One reason has to do with the WinSxS folder, which stores various versions of important system DLLs.  Why more than one version?  Well, each time a new patch, service pack, etc. is released, a DLL might be updated, and the new DLL is saved here in addition to the old one.

“[Y]our next question is probably to ask why we don’t remove the older versions of the components. The short answer to that is reliability. The component store, along with other information on the system, allows us to determine at any given time what the best version of a component to project is. That means that if you uninstall a security update we can install the next highest version on the system – we no longer have an ‘out of order uninstall’ problem. It also means that if you decide to install an optional feature, we don’t just choose the RTM version of the component, we’ll look to see what the highest available version on the system is. As each component on the system changes state that may in turn trigger changes in other components, and because the relationships between all the components are described on the system we can respond to those requirements in ways that we couldn’t in previous OS versions.”

It’s downright tricky to design a system where you can install and uninstall both various components and various patches to those components in arbitrary orders, and then expect everything to work.

When you install your OS, you fill your computer’s hard drive with all sorts of stuff.  Realistically, you may never use a lot of that stuff.  As you install and uninstall more applications and updates, stuff tends to accumulate, a lot of it rarely used.  This isn’t a Microsoft/Windows thing.  This can happen on any OS.

Well, disks are big and cheap, so who cares?  I would have agreed a few years back, but there’s an important game-changing technology that makes me care again about conserving disk space: flash.  Solid state drives can be many times faster than traditional hard drives, but they also cost a lot more per gigabyte and aren’t available in extremely large sizes.  OS virtualization is another reason I might care: historically I had just one OS install on my hard drive, maybe two if I dual-booted.  Now it’s common for me to have any number of virtual machines floating around, each of which is its own independent OS install.

Cascade suggests a better way to solve this problem — a new model for software deployment for those of us whose computers are always connected to the Internet.  Instead of installing a copy of each component on the local disk, the file system should cache components on the local disk as needed.  Using CFS terminology, the installer would just set up a CFS tree/mount point pointing to a public (available on the Internet) Subversion repository containing the released software binaries.  As you used the software, CFS would automatically download the files you actually touched and cache them locally.  Of course, the cached files would stay around after rebooting your system.
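To make the idea concrete, here is a minimal sketch of fetch-on-first-use caching in Python.  It is not how CFS is implemented; the `OnDemandCache` class, the `fetch` callback, and the in-memory `REMOTE` dictionary standing in for a public repository are all assumptions made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable


class OnDemandCache:
    """Fetch a file the first time it is read, then serve it from local disk."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch      # hypothetical: downloads one file from the repository
        self.downloads = 0      # number of cache misses, kept only for illustration

    def read(self, path: str) -> bytes:
        local = self.cache_dir / path
        if not local.exists():
            # Cache miss: download the file and keep a copy on local disk.
            local.parent.mkdir(parents=True, exist_ok=True)
            local.write_bytes(self.fetch(path))
            self.downloads += 1
        # Cache hit (or freshly filled entry): served from local disk.
        return local.read_bytes()


# Stand-in for a remote repository of released binaries (assumption).
REMOTE = {"bin/app.exe": b"main program", "help/manual.pdf": b"rarely opened"}

cache = OnDemandCache(Path(tempfile.mkdtemp()), REMOTE.__getitem__)
first = cache.read("bin/app.exe")    # downloaded on first touch
second = cache.read("bin/app.exe")   # served from the local cache
```

Only `bin/app.exe` ever lands on disk here; the manual that nobody opened costs nothing.  Because the cache is just files in a directory, it also survives a reboot.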

The initial installer you would download would be tiny, since it wouldn’t contain any of the files, just a pointer to their location.  You’d never have to worry about whether you should install all of the features or just a subset to save disk space; all of the features would be available on demand at no cost in disk space.

In corporate environments, the downloads would go through Cascade Proxy, so you wouldn’t be downloading the same files over and over again.

To update the software to pick up a patch, you would simply point CFS at a newer revision of that repository.  To “uninstall” the patch, you could always roll back to an old revision.
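Patching and rolling back then amount to moving a single revision pointer.  The sketch below illustrates that under simplifying assumptions: `REPO` is a made-up in-memory stand-in for a versioned repository, and `Mount`, `update`, and `rollback` are hypothetical names, not CFS or Subversion APIs.

```python
from dataclasses import dataclass, field

# Stand-in repository (assumption): revision number -> {path: contents}.
REPO = {
    1: {"lib/core.dll": b"v1"},
    2: {"lib/core.dll": b"v2-patched"},
}


@dataclass
class Mount:
    """A mount point pinned to one revision of the repository."""

    revision: int
    history: list = field(default_factory=list)  # previously pinned revisions

    def read(self, path: str) -> bytes:
        # Every read resolves against whichever revision is currently pinned.
        return REPO[self.revision][path]

    def update(self, revision: int) -> None:
        # "Install" a patch: remember the old revision, then repoint.
        self.history.append(self.revision)
        self.revision = revision

    def rollback(self) -> None:
        # "Uninstall" the patch: return to the previously pinned revision.
        self.revision = self.history.pop()


mount = Mount(revision=1)
before = mount.read("lib/core.dll")
mount.update(2)
patched = mount.read("lib/core.dll")
mount.rollback()
after = mount.read("lib/core.dll")
```

Since each revision is a complete, consistent snapshot, there is no order-of-uninstall problem to solve: the mount is always at exactly one revision.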

Most importantly, your CFS cache can easily fit on an SSD, so you could get the performance benefits of an SSD without worrying about running out of disk space as you install more applications.  For virtual machines, you could use a small CFS cache; cache misses can be serviced quickly out of a larger Cascade Proxy cache running on your host OS.
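The small-cache-in-front-of-a-big-cache arrangement can be sketched as two chained LRU caches.  This is an illustration only, not Cascade Proxy’s actual design: the `LRUCache` class and the `origin` function are hypothetical, and capacity is counted in entries rather than bytes to keep the example short.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """A least-recently-used cache that falls through to a backing source."""

    def __init__(self, capacity: int, backing: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.backing = backing
        self.entries = OrderedDict()   # key -> bytes, oldest first
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.backing(key)             # fall through to the next tier
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return value


origin_requests = []


def origin(key: str) -> bytes:
    """Stand-in for the remote repository (assumption)."""
    origin_requests.append(key)
    return key.encode()


proxy = LRUCache(capacity=100, backing=origin)  # large cache on the host OS
vm = LRUCache(capacity=2, backing=proxy.get)    # small cache inside the VM

for name in ["a", "b", "c", "a"]:
    vm.get(name)
```

The second request for `a` misses in the tiny VM cache, because `a` was evicted to make room for `c`, but the proxy still holds it, so nothing goes back out to the network.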

Written by Matt

September 30th, 2008 at 12:14 pm

2 Responses to 'Where Does All That Disk Space Go?'


  1. Solved on Ubuntu/Debian systems:

    $ sudo apt-get autoremove

    “remove packages that were automatically installed to satisfy dependencies for some package and that are no more needed.”

    Nigel Stewart

    1 Oct 08 at 12:53 am

  2. Windows Installer does reference counting of components, also, so when you uninstall a program that relies on shared components, they are automatically removed when the last reference goes away. Of course, this only works if everyone uses .msi/Windows Installer installers — which they probably should, but often don’t. There are still lots of people using non-.msi installers such as Inno Setup.

    But either way, that only indirectly addresses the issue. On my Ubuntu system, I have lots of packages installed that I rarely or never use. Maybe I installed a package because I needed to use it exactly once, and now it’s just sitting there. Not a big deal given that I have a 500GB hard drive in that computer, but if I was trying to squeeze into a 32GB or 64GB SSD…


    3 Oct 08 at 1:01 pm
