Cascade 1.2.0.1069 Manual: Introduction to Cascade


Table of Contents:

  1. What is Cascade?
    1. Cascade File System
    2. Cascade Proxy
    3. Cascade Manager and Cascade Worker
    4. Summary
  2. A Detailed Introduction to Cascade File System
    1. Trees, Mount Points, and Revisions
    2. Destroying a CFS Tree
    3. Cloning Trees
    4. The /results Mount Point
    5. Other Features
  3. Cascade Repository URLs
    1. Subversion
    2. Perforce
    3. Cascade Manager (/results tree)
  4. Checkpointing vs. Committing
  5. Configuring Cascade
  6. Acknowledgements

What is Cascade?

[Figure: A diagram of a Cascade installation]

Cascade is a suite of tools that aims to speed up software development. Here are the primary components that make up Cascade:

Cascade File System

Normally, when you access your source control repository, the first step is to "check out" a tree. When you check out a tree, the client downloads the latest revision of all of the files and stores a copy of each file on your local disk. In some systems, like Subversion, the system stores not one but two copies of each file. You are only allowed to modify one of these two copies. The system can determine which files you've edited by comparing each file against the second copy, which contains the original, unmodified file contents.

Checking out a large tree can take a long time and consume a lot of disk space. Once you've checked out a tree, further updates are generally incremental, meaning that you only need to download new files or files that have changed since your last update/checkout. Even so, for large projects, an "update" can take a long time.

Cascade File System (sometimes abbreviated CFS) provides a more efficient way to access your source control repository. Instead of spending minutes or hours downloading a tree, you can be up and running in just seconds with Cascade. You don't need to download all the files up front. Instead, CFS provides you a view of the tree where it looks like all of the files are present, even though in reality—behind the scenes—a file is only downloaded the first time you access it.

Because CFS interfaces with the operating system kernel as a file system driver rather than as a shell/GUI (e.g. Windows Explorer) plugin, all of your applications can access files inside CFS just like any other file on your hard drive, even if those applications are not aware of CFS. CFS simply looks like another hard drive with a different drive letter (on Windows) or another file system with a different mount point (on Linux or Macintosh).

Once CFS downloads a file for the first time, the file is cached locally on your hard drive, so subsequent accesses to that file are just as fast as your local disk.

Cascade Proxy

When you work from a site with a slow network link to your source control repository, downloading files can be even slower. Cascade provides a proxy server called Cascade Proxy that caches files that have already been downloaded by other users. With Cascade Proxy, you don't have to waste time downloading files that other people already downloaded earlier.

You don't need to explicitly install a Cascade Proxy server. The Cascade Proxy server is built into the Cascade File System service, so any computer running CFS can also potentially act as a proxy server. To use a proxy, you just need to specify the proxy server's hostname during the Cascade install process, and Cascade will automatically route all requests through the proxy.

Using Cascade Proxy is entirely optional. All Cascade functionality works with or without the proxy server.

Cascade Manager and Cascade Worker

Cascade provides an automated build and regression test environment, sometimes known as a "continuous integration" system. You can set up builds and tests through its Cascade Manager web interface and run these builds and tests on a farm of PCs running the Cascade Worker software. When someone breaks a build or regression test, Cascade Manager can send out an email letting you know. You can see the status of these builds and tests, look at their log files, and download their results from the web interface.

Once you've set up your automated builds and tests, you can clone a pre-built, pre-tested CFS tree from Cascade Manager. Cloning is nearly instantaneous, because you don't need to download any files from the repository to clone. The cloned tree contains not just all the files in your source control repository but also all of the output files produced by your builds and tests. You can access these output files just like any other file in the file system. Cloning allows you to get started on development and testing right away, without having to wait for a build to complete.

When you clone a tree, you can tell Cascade Manager to give you the last known good revision rather than the latest revision. The last known good revision is the most recent revision where all builds and tests have completed and passed. By cloning last known good, you know that you are starting with a tree that works, rather than grabbing the latest revision and hoping that someone hasn't broken things recently.
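
The "last known good" selection described above can be sketched in a few lines of Python. This is only an illustration of the rule, not Cascade Manager's actual implementation: the RevisionStatus type, its fields, and the state names "passed", "failed", and "running" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RevisionStatus:
    """Hypothetical record of task outcomes for one revision."""
    revision: int
    # Maps task name -> "passed", "failed", or "running" (assumed state names).
    task_states: Dict[str, str]


def last_known_good(history: List[RevisionStatus]) -> Optional[int]:
    """Return the most recent revision whose tasks have all completed and passed."""
    good = [
        status.revision
        for status in history
        if status.task_states
        and all(state == "passed" for state in status.task_states.values())
    ]
    return max(good) if good else None


history = [
    RevisionStatus(998, {"build": "passed", "test": "passed"}),
    RevisionStatus(999, {"build": "passed", "test": "failed"}),
    RevisionStatus(1000, {"build": "running", "test": "running"}),
]
print(last_known_good(history))  # 998
```

Here revision 1000 is the latest, but revision 998 is the one you would get when cloning last known good, because 999 has a failing test and 1000 has not finished yet.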

After cloning a tree and making changes to it, you can checkpoint your changes, that is, upload them to Cascade Manager for safekeeping. Unlike a commit to your repository, a checkpoint will never break anything. A checkpoint is a lightweight way to save your changes without permanently recording them for posterity in the repository. Once you've created a checkpoint, you or another person can clone a new tree from the checkpoint, so checkpointing is a fast and easy way to hand off changes between engineers.

Cascade Manager can kick off the builds and tests affected by a checkpointed change. This way, you can know what your change might break before you commit it, rather than hoping for the best after you commit it. This is especially helpful for cross-platform development, where it can take a lot of time and effort to manually verify that your software builds and runs on each of your target platforms.

Finally, Cascade Manager can impose a commit policy to prevent broken changes from being committed to the repository. By default, Cascade Manager will only allow you to commit a change if all of the builds and tests affected by the change have completed and passed.

Summary

Cascade File System and Cascade Proxy provide faster, more efficient access to your source control repository. Cascade Manager tracks changes to your repository and kicks off builds and tests affected by those changes on various Cascade Worker clients. Once the entire system is set up, users can benefit from Cascade's powerful tools for cloning trees and for checkpointing and committing changes.

A Detailed Introduction to Cascade File System

[Figure: Cascade File System is composed of trees, which in turn contain mount points]

Cascade File System is a file system driver. It exposes a tree of files and directories that any program running on your computer can access, just like you would access files on your local disk. The difference is that the files you see in Cascade File System are generally not backed by storage allocated on your local hard disk—instead, they're only cached on your local hard drive as needed.

The first time you access a file, it will be downloaded from the repository. Subsequent accesses will obtain the file from the local cache. Eventually, if you don't use a file for a long time, it may be evicted from your cache to make room for other files. All of the files will still appear to be present, even the ones that aren't in your cache; the only way to tell that a file is cached is that accessing it is faster.
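
To make the download-on-first-access behavior concrete, here is a minimal Python sketch of such a cache. It is not how CFS is implemented (CFS is a file system driver). The LazyFileCache class, the fetch callback, and the least-recently-used eviction policy are assumptions made for illustration; this manual does not say which eviction policy CFS uses.

```python
from collections import OrderedDict
from typing import Callable, List


class LazyFileCache:
    """Fetch-on-first-access cache with least-recently-used eviction (assumed policy)."""

    def __init__(self, fetch: Callable[[str], bytes], max_entries: int) -> None:
        self._fetch = fetch              # stands in for a download from the repository
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def read(self, path: str) -> bytes:
        if path in self._entries:
            # Cache hit: no download, just mark the file as recently used.
            self._entries.move_to_end(path)
            return self._entries[path]
        data = self._fetch(path)         # cache miss: download the file once
        self._entries[path] = data
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used file
        return data

    def is_cached(self, path: str) -> bool:
        return path in self._entries


downloads: List[str] = []


def fake_fetch(path: str) -> bytes:
    """Hypothetical repository download that records what was requested."""
    downloads.append(path)
    return ("contents of " + path).encode()


cache = LazyFileCache(fake_fetch, max_entries=2)
cache.read("/svn/trunk/foo.c")
cache.read("/svn/trunk/foo.c")   # served from the cache, no second download
cache.read("/svn/trunk/bar.c")
cache.read("/svn/trunk/baz.c")   # cache is full, so foo.c is evicted
print(downloads)
```

Reading foo.c again after it has been evicted still works; it simply triggers another download, which matches the description above that every file appears to be present whether or not it is cached.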

On Windows, you can assign Cascade File System its own drive letter. By default, it uses the X: drive, but you can choose any unused drive letter at install time. On Linux and Macintosh, Cascade File System is typically available at the /mnt/cfs directory.

Trees, Mount Points, and Revisions

Cascade File System is organized into trees and mount points.

A mount point maps a repository at a particular revision. While source control terminology often differs from vendor to vendor, in Cascade, revisions count the total number of changes made to the repository since its creation, not the number of changes made to a particular file. (This is similar to how Subversion uses the term "revision." Perforce uses the term "changelist" to describe the same concept. In general, Cascade tries to model its terminology on Subversion's as closely as possible.) A mount point might map to repository X at revision 1000; this means that you will see the contents of the repository as they existed after 1000 commits, just as though you had done an svn checkout -r 1000 or p4 sync @1000.

In addition to reading them, you can modify, add, or delete files under CFS trees. Each CFS tree you create is independent of all your other CFS trees. When you make changes to files under the mount points, those changes will be private to just that tree. Also, CFS will never commit your changes to the repository unless you specifically ask it to.

Destroying a CFS Tree

There is one important "gotcha" to remember about destroying a CFS tree. Normally, you would delete a directory tree on your hard drive by typing rm -rf tree (Unix) or rmdir /s tree (Windows), or by clicking on it and selecting "Delete" from the context menu or typing Shift-Delete in Windows Explorer. Destroying a CFS tree this way will work, but this is suboptimal: these commands will recursively walk through the entire directory structure, deleting one file at a time. Since it's unlikely that you have the entire directory structure in your cache, this will take a while.

A much faster way to destroy a CFS tree is to type rmdir tree. CFS tweaks the normal file system semantics slightly and allows an rmdir operation on an entire tree to succeed, even though the directory underneath is not empty. Under Windows Explorer, the Cascade shell extension provides a "Delete Tree" option in its right-click context menu that will do the same thing.

CFS enforces normal rmdir semantics inside a tree, failing this operation if the directory is not empty. You can only use this shortcut to delete an entire CFS tree.

Cloning Trees

The mount point structure of each tree is entirely independent from that of the others—there is no restriction that all trees must have the same mount point structure. In practice, however, you will almost always want to set up the same mount point structure, and it would be tedious to type these directory names and their associated repositories over and over. Instead, you can clone a tree from Cascade Manager. Here, you fill in the paths of the mount points (relative to the root of their associated CFS trees) and their associated repositories only once. Then, when you clone a tree, the Cascade client software will set up this same mount point structure.

Cloning is particularly useful when working with more than one repository. If you tell Cascade Manager to watch more than one repository, it will look at the timestamps of the changes committed to each repository and interleave them appropriately into a single unified timeline. Instead of saying "revision 300 from repository A and revision 400 from repository B", you would simply clone revision 700 from the unified timeline of revisions that Cascade Manager builds up.
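
The interleaving described above amounts to merging several sorted commit lists by timestamp and numbering the result. The Python sketch below illustrates the idea only. The tuple layout, the choice to start numbering at 1, and the handling of equal timestamps (ordered by repository name) are assumptions, not documented Cascade Manager behavior.

```python
import heapq
from typing import Iterable, List, Tuple

# (timestamp, repository name, repository-local revision)
Commit = Tuple[int, str, int]


def unified_timeline(*repos: Iterable[Commit]) -> List[Tuple[int, str, int]]:
    """Interleave per-repository commit lists by timestamp.

    Each input must already be sorted by timestamp. Returns a list of
    (unified revision, repository name, repository-local revision).
    """
    merged = heapq.merge(*repos)  # tuples compare by timestamp first
    return [(n, repo, rev) for n, (_, repo, rev) in enumerate(merged, start=1)]


repo_a = [(100, "A", 1), (300, "A", 2)]
repo_b = [(200, "B", 1), (400, "B", 2), (500, "B", 3)]
for unified, repo, rev in unified_timeline(repo_a, repo_b):
    print(unified, repo, rev)
```

With two commits in repository A and three in repository B, the unified timeline has five revisions, in the same way that revision 300 of A together with revision 400 of B corresponds to unified revision 700 in the example above.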

Note that even with just one repository, Cascade Manager's revision numbers can and probably will diverge from the underlying repository revision numbers. This can happen because:

The /results Mount Point

There is one special mount point that exists in all cloned trees. This mount point is called /results. Its directory structure parallels that of all your other mount points. The contents of this mount point are all of the output files of all of the tasks you have asked Cascade Manager to run. For example, if you have a source file mapped at /svn/trunk/foo.c, you have a task that compiles foo.c into foo.o, and you've told Cascade Manager to archive the file /svn/trunk/foo.o as an output file, you can find the file foo.o underneath any cloned CFS tree at /results/svn/trunk/foo.o.
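
Since /results parallels the other mount points, finding an output file is a matter of prefixing its path. A small Python sketch, assuming paths are written relative to the tree root with forward slashes (the results_path helper is hypothetical, not a Cascade API):

```python
import posixpath


def results_path(output_path: str) -> str:
    """Map an archived output file's path to its location under /results."""
    if not output_path.startswith("/"):
        raise ValueError("expected a path relative to the tree root, starting with '/'")
    return posixpath.join("/results", output_path.lstrip("/"))


print(results_path("/svn/trunk/foo.o"))  # /results/svn/trunk/foo.o
```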

The /results mount point has a few special properties that make it unlike other mount points.

Cascade does not attempt to update /results immediately as you edit tasks' input files. You must checkpoint your changes first before /results will reflect those changes. Cascade will not warn you if /results is out of sync with the edits you've made since your last checkpoint.

Other Features

You can access different versions of a file using @ suffixes on filenames.

Note that even though they are legal filename characters on both Windows and Unix, some programs have been observed to get confused by @ characters in paths.

Cascade Repository URLs

Cascade refers to repositories and files in repositories by their "URL." This URL has a similar (but not identical) format to that of the URLs you would use in your web browser.

This release of Cascade supports Perforce and Subversion repositories. If you are interested in using Cascade with another type of repository, please contact us.

Subversion

Subversion already uses URLs to identify repositories and files. Cascade's Subversion URLs are slightly different from normal Subversion URLs.

Subversion supports several different repository access methods. Cascade supports the http://, https://, and svn:// repository access methods. For the http:// and https:// repository access methods, to turn your Subversion URL into a Cascade repository URL, add svn- in front of it. For example:

svn-http://svn.collab.net/repos/svn
svn-https://svn.collab.net/repos/svn

For the svn:// repository access method, you can use the Subversion URL unmodified. For example:

svn://my-server/var/svn/repos

Cascade does not natively support the svn+ssh:// repository access method, but you can emulate it and achieve the same level of security with your own SSH tunnel. Use ssh (on Windows, you can use the free PuTTY SSH client) to log in to the target machine, with a tunnel set up to forward local port 3690 to remote port 3690, and run svnserve -d at the prompt. This will start an svnserve server process on port 3690.

Once started, the server will continue to run until someone explicitly shuts it down or the system is rebooted, so you don't need to restart it each time you log in. Also, several users wanting to access the same repository can all share a single svnserve process. If port 3690 is already in use, the server is probably already running and you probably don't need to start a new server. If someone else happens to be using port 3690 for something unrelated, you can use svnserve's --listen-port argument to specify a custom port number; in this case, you will need to specify this same port number as the remote port when setting up your SSH tunnel (you should still use port 3690 as the local port).

Once you have set up your SSH tunnel, your repository URL is simply svn://localhost/path rather than svn+ssh://server/path. You will need to leave this ssh tunnel running all the time in the background.
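
As a sketch of the tunnel setup described above, the following Python helpers build the ssh command line and the resulting repository URL. They assume the OpenSSH command-line client and its -L local port forwarding option (PuTTY configures tunnels differently); the helper names and the my-server hostname are hypothetical.

```python
from typing import List

DEFAULT_SVNSERVE_PORT = 3690


def ssh_tunnel_command(server: str, remote_port: int = DEFAULT_SVNSERVE_PORT) -> List[str]:
    """Build an OpenSSH command forwarding local port 3690 to svnserve on the server.

    remote_port only differs from 3690 if svnserve was started with --listen-port.
    """
    forward = f"{DEFAULT_SVNSERVE_PORT}:localhost:{remote_port}"
    return ["ssh", "-L", forward, server]


def tunneled_url(repository_path: str) -> str:
    """Repository URL to use once the tunnel is up, instead of svn+ssh://server/path."""
    return "svn://localhost/" + repository_path.lstrip("/")


print(" ".join(ssh_tunnel_command("my-server")))  # ssh -L 3690:localhost:3690 my-server
print(tunneled_url("/var/svn/repos"))             # svn://localhost/var/svn/repos
```

The command is returned as a list so that it could be handed to subprocess.Popen unchanged; after logging in this way you would still run svnserve -d at the remote prompt if the server is not already running.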

You can further simplify the use of an svn+ssh:// repository with Cascade Proxy. When Cascade is configured with a proxy, it will send queries to the proxy rather than directly to the repository. This means that you only need to set up the ssh tunnel on the machine acting as the proxy server, rather than on every single machine running Cascade File System. (If several proxies are chained together, the only one that needs the ssh tunnel is the final one in the chain.) If you set up a proxy on a computer that is "always on", you can start the tunnel once and simply leave it running. Cascade Proxy's network protocol is not secure, but you can secure it by using a VPN or by forwarding Cascade Proxy traffic over an SSH tunnel on port 4187.

Cascade does not support the file:// Subversion repository access method.
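
The Subversion URL rules in this section can be summarized in a short Python function. This is a sketch of the rules as stated here, not code from Cascade; the function name cascade_svn_url is hypothetical.

```python
from urllib.parse import urlsplit


def cascade_svn_url(svn_url: str) -> str:
    """Turn a Subversion URL into a Cascade repository URL, per the rules above."""
    scheme = urlsplit(svn_url).scheme.lower()
    if scheme in ("http", "https"):
        return "svn-" + svn_url          # http:// and https:// get an svn- prefix
    if scheme == "svn":
        return svn_url                   # svn:// is used unmodified
    if scheme == "svn+ssh":
        raise ValueError(
            "svn+ssh:// is not supported natively; "
            "set up an SSH tunnel and use svn://localhost/path instead"
        )
    raise ValueError(f"unsupported Subversion access method: {svn_url!r}")


print(cascade_svn_url("https://svn.collab.net/repos/svn"))  # svn-https://svn.collab.net/repos/svn
print(cascade_svn_url("svn://my-server/var/svn/repos"))     # svn://my-server/var/svn/repos
```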

Perforce

To construct a Cascade URL for a Perforce server, take the value of the P4PORT environment variable and add p4:// in front of it.

For example, if your P4PORT is perforce:1666, your repository URL would be p4://perforce:1666.
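
If you want to derive this URL in a script, a minimal Python sketch might look like the following. The cascade_p4_url helper is hypothetical; only the P4PORT variable and the p4:// prefix come from the rule above.

```python
import os
from typing import Mapping


def cascade_p4_url(environ: Mapping[str, str] = os.environ) -> str:
    """Build a Cascade repository URL from the P4PORT environment variable."""
    p4port = environ.get("P4PORT")
    if not p4port:
        raise KeyError("P4PORT is not set")
    return "p4://" + p4port


print(cascade_p4_url({"P4PORT": "perforce:1666"}))  # p4://perforce:1666
```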

Cascade Manager (/results tree)

The special-purpose /results mount point in each Cascade tree is like a repository in many ways, so it, too, has a URL. This mount point is hosted by your Cascade Manager server, so its URL is the same as that of your Cascade Manager server but with csc- in front. For example, it might be:

csc-http://hostname:8080
csc-https://hostname/cascade
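
The same prefixing rule, sketched in Python. The results_url helper is hypothetical, and the check that the Cascade Manager URL starts with http:// or https:// is an assumption based on the two examples above.

```python
def results_url(manager_url: str) -> str:
    """Build the /results repository URL from a Cascade Manager URL."""
    if not manager_url.startswith(("http://", "https://")):
        raise ValueError("expected an http:// or https:// Cascade Manager URL")
    return "csc-" + manager_url


print(results_url("http://hostname:8080"))  # csc-http://hostname:8080
```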

Checkpointing vs. Committing

One of the key concepts Cascade introduces is checkpointing changes. It's important to understand how checkpointing a change differs from committing it to the repository. In both cases, you are uploading a change to a server, but there are also some key differences between the two operations.

Commits are permanent. When you commit a change to the repository, it becomes part of the permanent archive of all the changes made to the files stored in your repository. This is a good thing if your changes are done, but usually not if your changes are not finalized yet. For example, before committing, because this is going to become part of the permanent record, you might want to clean up any clutter in your code: delete temporary debugging code, write better comments and clean up spelling and grammatical errors, and so on. Also, commits permanently grow the repository and require the server hosting the repository to do more work. This is usually not a concern for small projects, but on very large projects, if too many people commit too much code too frequently, this can slow down the server.

Checkpoints are immutable (once created, they cannot be modified) but temporary. Cascade Manager stores them in its database for safekeeping, and you can keep them around as long as you want, but typically administrators will purge old checkpoints every so often to free up disk space and remove clutter. For example, you might purge checkpoints more than a month old once a week during the weekend. Checkpointing also does not store any data in the repository, nor does it require the repository server to do any work. Checkpoints are therefore an ideal way to save off changes that aren't done yet and that don't yet meet the level of professionalism you would expect in a commit. You also don't need to worry about how often you checkpoint. Whereas you might commit a few times per day, you can checkpoint every few minutes if you so desire.
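
The purge policy mentioned above (remove checkpoints more than a month old) can be expressed as a simple age filter. The Python sketch below only selects which checkpoints would be purged. The checkpoint names, the 30-day reading of "a month", and the function itself are illustrative assumptions; this section does not describe how Cascade Manager actually purges checkpoints.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def checkpoints_to_purge(
    created: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=30),
) -> List[str]:
    """Return the names of checkpoints older than max_age, in sorted order."""
    cutoff = now - max_age
    return sorted(name for name, when in created.items() if when < cutoff)


created = {
    "cp-1": datetime(2008, 4, 15),
    "cp-2": datetime(2008, 5, 20),
}
print(checkpoints_to_purge(created, now=datetime(2008, 6, 1)))  # ['cp-1']
```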

Committing a change pushes it onto all other users working in the same codeline (project and branch) when they update their trees. This can be good or bad. If the change fixes an important bug, for instance, the sooner people pick it up, the better. On the other hand, if the change introduces a bug, this is dangerous. Either way, this change now becomes part of the baseline that future commits to the codeline must be relative to. This shifts the responsibility for merging. As long as a change stays in your local tree, it is your responsibility to merge your changes with changes that other people commit. Once you commit a change to the repository, other people are now responsible for merging their changes with yours.

Checkpointing a change does not push it on anyone, nor does it affect who is responsible for merging. If another user wants to grab your checkpointed change and use it as a baseline, they can; this is simply a matter of cloning a new tree from the checkpoint. You are still responsible for merging your checkpointed changes with other users' commits. It is OK to checkpoint code that is buggy or that doesn't even compile, since no one else is forced to pick it up.

One way of looking at checkpointing is that it is a lightweight way to create a temporary branch. Traditionally, if you wanted to checkpoint your incomplete work in progress several times while working on a large change, you might create your own "private branch" and commit to it each time you wanted to save off your work. Then, when your change was finished, you would merge your private branch back into the main codeline and (optionally) delete your private branch. Checkpointing is a similar way of accomplishing the same result with less overhead.

Checkpoints are a natural way to build pre-commit workflows. For example, you might have a convention that, before someone is allowed to commit their changes, they must first obtain a code review from another engineer, and the code must compile and pass a certain list of tests. Of course, this relies on trust and requires discipline: it's tempting to make "one last change" to your code before committing it and not rerun all the test cases.

By splitting a commit into separate "checkpoint" and "commit the checkpointed change" phases, we can prevent these sorts of problems. Because checkpoints are immutable, if you make another change after getting your code review or running your tests, you must create a new checkpoint, and the tools can see that the review and the test results belong to the older one. That might be OK, and you might want to allow the commit anyway (that's a decision your project has to make for itself), but it makes it possible for the tools to enforce a higher level of discipline.

In particular, Cascade's default policy is that a checkpointed change cannot be committed until Cascade has run through all of the tasks (builds and tests) affected by that change to demonstrate that the change doesn't break any of them. This allows you to keep your project at a guaranteed minimum level of quality. Cascade cannot write high-quality software or test cases for you, nor can it stop people from bypassing it altogether and committing directly to the underlying repository, but Cascade can help ensure that you don't get stuck waiting for someone to fix a careless, preventable build or regression test break.
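
The default policy can be pictured as a check over the tasks affected by a checkpoint. The Python sketch below is an illustration under stated assumptions, not Cascade Manager's implementation: the Checkpoint type, the rule that a task is "affected" when one of its input files was changed, and the "passed" state name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Checkpoint:
    """Immutable snapshot of a change; any further edit requires a new checkpoint."""
    checkpoint_id: int
    changed_files: FrozenSet[str]


def affected_tasks(checkpoint: Checkpoint, task_inputs: Dict[str, FrozenSet[str]]) -> List[str]:
    """Tasks with at least one input file touched by the checkpoint (assumed rule)."""
    return sorted(
        task for task, inputs in task_inputs.items()
        if inputs & checkpoint.changed_files
    )


def may_commit(
    checkpoint: Checkpoint,
    task_inputs: Dict[str, FrozenSet[str]],
    results: Dict[Tuple[int, str], str],
) -> bool:
    """Allow the commit only if every affected task has completed and passed."""
    return all(
        results.get((checkpoint.checkpoint_id, task)) == "passed"
        for task in affected_tasks(checkpoint, task_inputs)
    )


task_inputs = {
    "build-linux": frozenset({"/svn/trunk/foo.c"}),
    "build-docs": frozenset({"/svn/trunk/README"}),
}
results = {(1, "build-linux"): "passed"}

reviewed = Checkpoint(1, frozenset({"/svn/trunk/foo.c"}))
one_last_change = Checkpoint(2, frozenset({"/svn/trunk/foo.c"}))

print(may_commit(reviewed, task_inputs, results))         # True
print(may_commit(one_last_change, task_inputs, results))  # False
```

Because results are keyed by checkpoint, the "one last change" after testing shows up as a new checkpoint with no passing results of its own, which is what lets a tool refuse the commit until the affected tasks have been rerun.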

Configuring Cascade

A Cascade installation needs to know various configuration information, such as where to find Cascade Manager on the network (in order to be able to clone a tree), or what diff program to launch when the user types csc diff. Cascade refers to these settings as configuration variables.

On Windows, when looking up a configuration variable, Cascade first looks for an environment variable with that name. If it can't find one, it looks in the Windows registry under HKEY_CURRENT_USER\Software\Conifer Systems\Cascade for a REG_SZ (i.e. a string, not a DWORD) registry value with that name. If it can't find it there, it looks in the registry under HKEY_LOCAL_MACHINE\Software\Conifer Systems\Cascade. Finally, if it doesn't exist in any of these three locations, it may fall back to a default value.

On Linux or Macintosh, Cascade first looks for an environment variable. If it can't find one, it looks in the file /etc/cascade.conf. Each line in this file has the format var=value. (Note that this file format is very restrictive. The file format does not support comments, and it is sensitive to whitespace. For example, if you put a space before and after the =, the variable name would end with a space, and the variable's value would begin with a space.)
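
The lookup order and the strictness of the file format can be illustrated with a short Python sketch. It mimics the rules described above rather than reproducing Cascade's own code. The variable names EXAMPLE_VAR and SPACED are made up for the example, and skipping lines that contain no = is an assumption.

```python
import os
from typing import Dict, Mapping, Optional


def parse_conf(text: str) -> Dict[str, str]:
    """Parse var=value lines literally: no comments, no whitespace trimming."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # assumption: lines without '=' are ignored
        name, value = line.split("=", 1)
        settings[name] = value
    return settings


def lookup(
    name: str,
    environ: Mapping[str, str] = os.environ,
    conf_text: str = "",
    default: Optional[str] = None,
) -> Optional[str]:
    """Environment variable first, then the configuration file, then a default."""
    if name in environ:
        return environ[name]
    return parse_conf(conf_text).get(name, default)


conf_text = "EXAMPLE_VAR=from-file\nSPACED = value\n"

print(lookup("EXAMPLE_VAR", {}, conf_text))                          # from-file
print(lookup("EXAMPLE_VAR", {"EXAMPLE_VAR": "from-env"}, conf_text))  # from-env
print(repr(lookup("SPACED", {}, conf_text)))                          # None
print(repr(lookup("SPACED ", {}, conf_text)))                         # ' value'
```

The last two lines show the whitespace pitfall: with spaces around the =, the variable is stored under the name "SPACED " (with a trailing space), so a lookup of "SPACED" finds nothing. In real use, conf_text would be the contents of /etc/cascade.conf.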

Some of the configuration variables are set up by the Cascade installer, and you may never need to touch their values after installation.

Here are some of the important configuration variables you should know about:

Acknowledgements

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.org/)

This product includes TortoiseMerge and TortoiseOverlays, developed by the TortoiseSVN project.


Comments or questions about the manual? Please email info@conifersystems.com with your feedback.

Copyright © 2008 Conifer Systems LLC. All rights reserved.

Cascade contains valuable trade secrets and other confidential information belonging to Conifer Systems LLC. This software and its associated documentation may not be copied, duplicated or disclosed to third parties without the express written permission of Conifer Systems LLC.