The Conifer Systems Blog

Cascade File System and Build Performance


A common worry with Cascade File System is that building through a file system layer like CFS will be much slower than building off a regular local disk using a file system like NTFS.  After all, building off a network file system like SMB/CIFS or NFS is typically much slower than building off a local disk.

Rather than just speculating, let’s take a look at some real numbers.  I measured wall-clock time for the same build in three scenarios: building off a local NTFS drive, building off CFS, and building over SMB off a Samba server on the same LAN with a ping of about 300 microseconds.  (These are actually times from the second of two clean builds in a row; the first build primes the caches in the system, eliminating sources of variability in the second run.)

NTFS: 51s
CFS: 53s
SMB: 428s
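The warm-run methodology above can be sketched in a few lines. This is a hedged illustration, not the actual harness used for the measurements; the build command here is a stand-in (a no-op Python invocation) so the script runs anywhere.

```python
import subprocess
import sys
import time

# Placeholder build command; substitute your project's real build step.
BUILD_CMD = [sys.executable, "-c", "pass"]

def timed_build():
    start = time.monotonic()
    subprocess.run(BUILD_CMD, check=True)
    return time.monotonic() - start

cold = timed_build()   # first run primes the caches; its time is discarded
warm = timed_build()   # second, warm run is the wall-clock figure reported
print(f"warm build: {warm:.2f}s")
```

Discarding the first run removes the variability of cold OS and CFS caches, which is exactly why the numbers above come from the second of two clean builds.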

As these numbers make clear, using a network file system as an analogy for CFS performance isn’t quite right.  If the files and metadata you need are already in your CFS cache, CFS will not generate any network traffic.  Further, CFS cache entries don’t “time out” or “expire”, and the main part of your CFS cache (the file data) will persist even after an OS reboot.

Does CFS have overhead?  Sure, of course it does.  There’s plenty of performance tuning that can still be done on CFS.  At the same time, CFS also has at least one big performance advantage over modern disk-based file systems like NTFS and ext3: it’s not journaled.  A CFS tree is just a workspace; the real data that needs absolute protection is the data in your repository and in your Cascade Manager checkpoints.  If your OS crashes or your computer loses power, it’s no big deal: you can just clone from your last checkpoint (checkpoints are cheap, so you can create them as often as you’d like).  Journaled file systems, on the other hand, go to great lengths to ensure that once certain types of data have been written to disk, they cannot be lost even in an OS crash or power loss.  Flushing data out to a hard drive is expensive: you have to wait for the drive to spin and seek to the right spot, which can take milliseconds per flush.  CFS can skip all of this extra work.
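You can see the cost of forcing data to stable storage for yourself. The sketch below (a rough illustration, not a benchmark of any particular file system) compares plain buffered writes with writes that are each followed by `fsync()`, which is roughly the kind of flush a journal must issue. Absolute numbers vary widely by hardware; on a spinning disk each flush can cost milliseconds.

```python
import os
import time

def write_blocks(path, n, sync):
    """Write n 4 KiB blocks; optionally fsync after each one."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    start = time.monotonic()
    for _ in range(n):
        os.write(fd, b"x" * 4096)
        if sync:
            os.fsync(fd)  # force the block to the device before continuing
    elapsed = time.monotonic() - start
    os.close(fd)
    os.remove(path)
    return elapsed

buffered = write_blocks("scratch.bin", 100, sync=False)
synced = write_blocks("scratch.bin", 100, sync=True)
print(f"buffered: {buffered:.3f}s   fsync each block: {synced:.3f}s")
```

The buffered case lets the OS batch and defer the disk work, which is essentially the freedom a non-journaled workspace like a CFS tree enjoys.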

Now compare to a network file system.  The details differ from file system to file system, but many network file systems don’t make much of an effort to cache, since someone else might change the files on the server at any time.  Some do limited amounts of caching but “time out” cache entries after, say, 30 seconds and then ask the server again for the information.  (Of course, this leaves a 30-second window in which a query can return a stale answer!)  Some send a request to the server each time you open a file, asking whether the file has changed since it was last cached, but this still costs a network round trip.  Some support “oplock” functionality, where the client asks the server to notify it when its cached data falls out of date; but not all servers support oplocks, those that do may limit how many can be outstanding, and the server can refuse or break an oplock at any time.  Finally, the cached data lives in memory, so it is lost on reboot, or even sooner if the OS’s VM system needs to free up pages for other data.
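The “time out after 30 seconds” strategy can be modeled in a few lines. This toy cache (purely illustrative, not any real client's implementation) answers from cache while an entry is within its TTL, which is exactly what creates the staleness window:

```python
import time

class TTLCache:
    """Toy client-side cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # name -> (value, fetch_time)

    def get(self, name, fetch_from_server):
        hit = self.entries.get(name)
        now = time.monotonic()
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # answered from cache: fast, but possibly stale
        value = fetch_from_server(name)  # a round trip to the server
        self.entries[name] = (value, now)
        return value

# Simulated server whose file changes after the client has cached it.
server = {"a.c": "v1"}
cache = TTLCache(ttl=30)
print(cache.get("a.c", server.__getitem__))  # "v1", fetched from the server
server["a.c"] = "v2"                         # someone edits the file remotely
print(cache.get("a.c", server.__getitem__))  # still "v1": stale for up to 30s
```

CFS avoids this dilemma entirely because its cache entries never expire: a cached entry is valid until CFS itself knows otherwise, so a hit generates no traffic and no stale window.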

There are also typically many other inefficiencies in network file system stacks — for example, packing and unpacking requests and replies, the TCP/IP stack, breaking up large requests into smaller ones to satisfy limits in the protocol, limited numbers of requests in flight (a serious problem if combined with network latency), overly “chatty” protocols that require many round trips to do simple operations, and sometimes just poorly-optimized client or server software.
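A quick back-of-the-envelope calculation shows why limited requests in flight combine so badly with latency. The model and figures below are illustrative (the 0.3 ms RTT matches the LAN ping from the measurements above; the request count is a made-up round number for a large build):

```python
def wire_time(requests, rtt_ms, in_flight):
    """Approximate seconds spent purely on round-trip latency."""
    return requests / in_flight * rtt_ms / 1000.0

# 50,000 metadata requests for a hypothetical large build, 0.3 ms LAN RTT:
print(wire_time(50_000, 0.3, 1))    # one request at a time: 15 s of pure latency
print(wire_time(50_000, 0.3, 16))   # 16 in flight: under 1 s
```

Even on a fast LAN, a chatty protocol that serializes its requests can spend more time waiting on the wire than doing useful work, which is consistent with the 428-second SMB result above.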

Bottom line: once the files you need are in your CFS cache, CFS’s performance is similar to that of a local disk-based file system such as NTFS.  CFS is much faster than a network file system like SMB.


Written by Matt

September 16th, 2008 at 2:11 pm

Posted in Cascade

2 Responses to 'Cascade File System and Build Performance'


  1. I understand the desire to reduce measurement variability by doing only warm runs, but is a fully populated cache the situation that most users will find themselves in?

    Nick Carter

    20 Sep 08 at 12:49 pm

  2. A valid concern. Of course, we have to compare apples to apples: if you count the time to service cache misses with CFS, then you also have to count the time to check out or update your tree without CFS. That’s a more complex issue; I’m just trying to clarify that CFS (by itself) does not impose a significant performance hit.

    Matt

    22 Sep 08 at 2:44 am
