Cascade 1.1.0.943 Manual: Introduction to Cascade

Table of Contents:

  1. What is Cascade?
  2. Introduction to Cascade File System
    1. Trees, Mount Points, and Revisions
    2. Destroying a CFS Tree
    3. Cloning Trees
    4. The /results Mount Point
    5. Other Features
  3. Cascade Repository URLs
    1. Subversion
    2. Perforce
    3. Cascade Manager (/results tree)
  4. Checkpointing vs. Committing
  5. Configuring Cascade
  6. Acknowledgements

What is Cascade?

[Figure: A diagram of a Cascade installation]

Cascade is a set of tools that aims to speed up software development. In particular, here are some of the ways Cascade can help you:

You can use the individual pieces of Cascade by themselves—for example, you could use Cascade File System by itself as a more efficient way to access a source control repository, or Cascade Manager and Cascade Worker by themselves to set up a regression system—but Cascade is most powerful when you combine all of its pieces and use them together.

Introduction to Cascade File System

[Figure: Cascade File System is composed of trees, which in turn contain mount points]

Cascade File System is a file system driver. It exposes a tree of files and directories that any program running on your computer can access, just like they would access files on your local disk. The difference is that the files you see in Cascade File System are generally not backed by storage allocated on your local hard disk—instead, they're only cached on your local hard drive as needed.

The first time you access a file, it will be downloaded from the repository. Subsequent accesses will obtain the file from the local cache. Eventually, if you don't use a file for a long time, it may be evicted from your cache to make room for other files. All of the files will still appear to be present, even the ones that aren't in your cache; the only way to tell is that accessing a cached file is faster.
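The caching behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Cascade's implementation: the class name ReadThroughCache, the fetch callable (standing in for the repository), and the least-recently-used eviction policy are all assumptions. The manual says only that files unused for a long time may be evicted.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy model of a fetch-on-first-access cache with LRU eviction (assumed policy)."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # callable: path -> bytes (stands in for the repository)
        self._capacity = capacity    # maximum number of files kept in the cache
        self._entries = OrderedDict()

    def read(self, path):
        if path in self._entries:
            # Cache hit: mark the file as most recently used and serve it locally.
            self._entries.move_to_end(path)
            return self._entries[path]
        # Cache miss: download from the repository, then keep a local copy.
        data = self._fetch(path)
        self._entries[path] = data
        if len(self._entries) > self._capacity:
            # Evict the least recently used file to make room.
            self._entries.popitem(last=False)
        return data

    def is_cached(self, path):
        return path in self._entries
```

A read of an evicted file still succeeds; it simply goes back to the repository, which is why every file appears to be present at all times.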

On Windows, you can assign Cascade File System its own drive letter. By default, it uses the X: drive, but you can choose any unused drive letter at install time. On Linux and Macintosh, Cascade File System is typically available at the /mnt/cfs directory.

Trees, Mount Points, and Revisions

Cascade File System is organized into trees and mount points.

A mount point maps a repository at a particular revision. Source control terminology differs from vendor to vendor; in Cascade, a revision number counts the total number of changes committed to the repository since it was created, not the number of changes made to a particular file. (This is similar to how Subversion uses the term "revision." The comparable term in Perforce is "changelist number." In general, Cascade models its terminology as closely as possible on Subversion's.) For example, a mount point might map to repository X at revision 1000; you will then see the contents of the repository as they existed after 1000 commits, just as though you had run svn checkout -r 1000 or p4 sync @1000.

Each CFS tree you create is independent of all your other CFS trees. When you make changes to files under the mount points, those changes will be private to just that tree. Also, CFS will never commit your changes to the repository unless you specifically ask it to.

Destroying a CFS Tree

There is one important "gotcha" to remember about destroying a CFS tree. Normally, you would delete a directory tree on your hard drive by typing rm -rf tree (Unix) or rmdir /s tree (Windows), or by clicking on it and selecting "Delete" from the context menu or typing Shift-Delete in Windows Explorer. Destroying a CFS tree this way will work, but it is extremely suboptimal: these commands will recursively walk through the entire directory structure, deleting one file at a time. Since it's unlikely that you have the entire directory structure in your cache, this will take a while.

A much faster way to destroy a CFS tree is to type rmdir tree. CFS "cheats" very slightly with normal file system semantics and allows an rmdir operation on an entire tree to succeed, even though the directory underneath is not empty. Under Windows Explorer, the Cascade shell extension provides a "Delete Tree" option in its right-click context menu that will do the same thing.

CFS enforces normal rmdir semantics inside a tree, failing this operation if the directory is not empty. You can only use this shortcut to delete an entire CFS tree.
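For scripts, the same shortcut amounts to a single rmdir call on the tree root rather than a recursive delete. The sketch below is a minimal illustration; the function name destroy_cfs_tree is hypothetical, and the behavior on a non-empty directory depends on the path actually being the root of a CFS tree.

```python
import os


def destroy_cfs_tree(tree_path):
    """Remove a whole CFS tree with one rmdir call instead of a recursive walk."""
    # On CFS, rmdir on the root of a tree is documented to succeed even when
    # the tree is not empty. On an ordinary file system, or on a directory
    # inside a tree, os.rmdir raises OSError if the directory is not empty.
    os.rmdir(tree_path)
```

By contrast, shutil.rmtree would visit every file and directory, which is the slow path described above.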

Cloning Trees

The mount point structure of each tree is entirely independent from that of the others—there is no restriction that all trees must have the same mount point structure. In practice, however, you will almost always want to set up the same mount point structure, and it would be tedious to type these directory names and their associated repositories over and over. Instead, you can clone a tree from Cascade Manager. Here, you fill in the paths of the mount points (relative to the root of their associated CFS trees) and their associated repositories once, and then when you clone a tree, the Cascade client software will set up this same mount point structure.

Cloning is particularly useful when working with more than one repository. If you tell Cascade Manager to watch more than one repository, it will look at the timestamps of the changes committed to each repository and interleave them appropriately into a single unified timeline. Instead of saying "revision 300 from repository A and revision 400 from repository B", you would simply clone revision 700 from the unified timeline of revisions that Cascade Manager builds up itself.
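The interleaving idea can be sketched as a timestamp merge. This is an illustration under stated assumptions, not Cascade Manager's actual algorithm: the Commit record, the function names, and the rule that a unified revision number is simply a commit's 1-based position in the merged list are all hypothetical, and the manual does not say how ties between equal timestamps are broken.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Commit(NamedTuple):
    timestamp: float   # commit time, e.g. seconds since the epoch
    repo: str          # repository label, e.g. "A"
    revision: int      # revision number within that repository


def unified_timeline(*histories: Iterable[Commit]) -> List[Commit]:
    """Interleave per-repository histories (each already sorted by time).

    The unified revision number of a commit is its 1-based position in the result.
    """
    return list(heapq.merge(*histories, key=lambda c: c.timestamp))


def repo_revisions_at(timeline: List[Commit], unified_revision: int) -> dict:
    """Return the latest revision of each repository as of a unified revision."""
    latest = {}
    for commit in timeline[:unified_revision]:
        latest[commit.repo] = commit.revision
    return latest
```

With three commits in repository A and two in repository B, the unified timeline has five revisions, and unified revision 4 might correspond to revision 2 of A together with revision 2 of B, depending on the timestamps.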

Note that even with just one repository, Cascade Manager's revision numbers can and probably will diverge from the underlying repository revision numbers.

The /results Mount Point

There is one very special mount point that exists in all cloned trees. This mount point is called /results. Its directory structure parallels that of all your other mount points. The contents of this mount point are all of the output files of all of the tasks you have asked Cascade Manager to run on your repository. For example, if you have a source file mapped at /svn/trunk/foo.c, you have a task that compiles foo.c into foo.o, and you've told Cascade Manager to archive the file /svn/trunk/foo.o as an output file, you can find the file foo.o underneath any cloned CFS tree at /results/svn/trunk/foo.o.
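The path mapping in the example above is a simple prefixing rule. A minimal sketch, assuming paths are written relative to the root of the CFS tree with a leading slash (the function name results_path is hypothetical):

```python
import posixpath


def results_path(output_path):
    """Map a tree-relative output path to its location under /results."""
    if not output_path.startswith("/"):
        raise ValueError("expected a path relative to the tree root, starting with '/'")
    return posixpath.join("/results", output_path.lstrip("/"))
```

For example, results_path("/svn/trunk/foo.o") gives "/results/svn/trunk/foo.o".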

The /results mount point has a few special properties that make it unlike other mount points.

Other Features

You can access different versions of a file using @ suffixes on filenames.

Note that even though they are legal filename characters on both Windows and Unix, some programs have been observed to get confused by @ characters in paths.

Cascade Repository URLs

Cascade refers to repositories and files in repositories by their "URL." This URL has a similar (but not identical) format to that of the URLs you would use in your web browser.

This release of Cascade supports Perforce and Subversion repositories. If you are interested in using Cascade with another type of repository, please contact us.

Subversion

Subversion already uses URLs to identify repositories and files. Cascade's Subversion URLs differ slightly from the ones Subversion uses, so that they can be distinguished from the URLs of other applications that use the HTTP protocol.

Subversion supports several different repository access methods. Cascade supports the http://, https://, and svn:// repository access methods. For the http:// and https:// repository access methods, to turn your Subversion URL into a Cascade repository URL, add svn- in front of it. For example:

svn-http://svn.collab.net/repos/svn
svn-https://svn.collab.net/repos/svn

For the svn:// repository access method, you can use the Subversion URL unmodified. For example:

svn://my-server/var/svn/repos

Cascade does not natively support the svn+ssh:// repository access method, but you can emulate it and achieve the same level of security with your own SSH tunnel. Use ssh (on Windows, you can use the free PuTTY SSH client) to log in to the target machine, with a tunnel set up to forward local port 3690 to remote port 3690, and run svnserve -d at the prompt. This will start an svnserve server process on port 3690. Once started, the server will continue to run until someone explicitly shuts it down or the system is rebooted, so you don't need to restart it each time you log in. Also, several users wanting to access the same repository can all share a single svnserve process.

If port 3690 is already in use, the server is probably already running and you probably don't need to start a new server. If someone else happens to be using port 3690 for something unrelated, you can use svnserve's --listen-port argument to specify a custom port number; in this case, you will need to specify this same port number as the remote port when setting up your SSH tunnel (you should still use port 3690 as the local port).

Once you have set up your SSH tunnel, your repository URL is simply svn://localhost/path rather than svn+ssh://server/path. You will need to leave this ssh tunnel running all the time in the background.
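The tunnel setup and URL rewrite can be sketched as two small helpers. This is an illustration only: the function names and the login string are hypothetical, and the sketch assumes the OpenSSH command-line client, whose -L option forwards a local port to a host and port as seen from the remote machine.

```python
from urllib.parse import urlsplit

SVN_PORT = 3690  # default svnserve port, used as the local end of the tunnel


def tunnel_command(login, remote_port=SVN_PORT):
    """Build an ssh argument list forwarding local port 3690 to remote_port on the server."""
    # "localhost" here is resolved on the remote machine, where svnserve listens.
    return ["ssh", "-L", f"{SVN_PORT}:localhost:{remote_port}", login]


def tunneled_url(svn_ssh_url):
    """Rewrite svn+ssh://server/path to the svn://localhost/path form used through the tunnel."""
    parts = urlsplit(svn_ssh_url)
    if parts.scheme != "svn+ssh":
        raise ValueError("expected an svn+ssh:// URL")
    return "svn://localhost" + parts.path
```

If svnserve was started with a custom --listen-port, pass that number as remote_port; the local port stays 3690, so the rewritten URL does not change.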

You can further simplify the use of an svn+ssh:// repository with Cascade Proxy. When Cascade is configured with a proxy, it will send queries to the proxy rather than directly to the repository. This means that you only need to set up the ssh tunnel on the machine running Cascade Proxy, rather than on every single machine running Cascade File System. (If several proxies are chained together, the only one that needs the ssh tunnel is the final one in the chain.) If you set up a proxy on a computer that is "always on", you can start the tunnel once and simply leave it running. Cascade Proxy's network protocol is not secure, but you can secure it by using a VPN or by forwarding Cascade Proxy over an SSH tunnel on port 4187.

Cascade does not support the file:// Subversion repository access method.
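The Subversion rules in this section can be summarized as a small conversion function. It is a sketch of the rules as stated here, not part of Cascade; the function name cascade_svn_url is hypothetical.

```python
def cascade_svn_url(svn_url):
    """Convert a Subversion URL into a Cascade repository URL."""
    if "://" not in svn_url:
        raise ValueError("not a URL: " + svn_url)
    scheme = svn_url.split("://", 1)[0]
    if scheme in ("http", "https"):
        # http:// and https:// URLs get an svn- prefix.
        return "svn-" + svn_url
    if scheme == "svn":
        # svn:// URLs are used unmodified.
        return svn_url
    # svn+ssh:// needs an SSH tunnel (then use svn://localhost/path);
    # file:// is not supported at all.
    raise ValueError("unsupported Subversion access method: " + scheme + "://")
```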

Perforce

To construct a Cascade URL for a Perforce server, take the value of the P4PORT environment variable and add p4:// in front of it.

For example, if your P4PORT is perforce:1666, your repository URL would be p4://perforce:1666.

Cascade Manager (/results tree)

The special-purpose /results mount point in each Cascade tree is like a repository in many ways, so it, too, has a URL. This mount point is hosted by your Cascade Manager server, so its URL is the same as that of your Cascade Manager server but with csc- in front. For example, it might be:

csc-http://hostname:8080
csc-https://hostname/cascade
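The Perforce and Cascade Manager rules are both simple prefixing rules and can be sketched together. The function names are hypothetical, and the check that a Cascade Manager URL starts with http:// or https:// is an assumption based on the two examples above.

```python
def cascade_p4_url(p4port):
    """Build a Perforce repository URL from the value of P4PORT."""
    return "p4://" + p4port


def cascade_results_url(manager_url):
    """Build the /results URL from the URL of the Cascade Manager server."""
    if not manager_url.startswith(("http://", "https://")):
        raise ValueError("expected an http:// or https:// Cascade Manager URL")
    return "csc-" + manager_url
```

In a script, the Perforce value would typically come from the environment, for example cascade_p4_url(os.environ["P4PORT"]).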

Checkpointing vs. Committing

One of the key concepts Cascade introduces is checkpointing changes. It's important to understand how checkpointing a change differs from committing it to the repository. In both cases, you are uploading a change to a server, but there are also some key differences between the two operations.

Commits are permanent. When you commit a change to the repository, it becomes part of the permanent archive of all the changes made to the files stored in your repository. This is a good thing if your changes are done, but usually not if they are not yet final. Because a commit becomes part of the permanent record, you might first want to clean up any clutter in your code: delete temporary debugging code, write better comments, fix spelling and grammatical errors, and so on. Also, commits permanently grow the repository and require the server hosting the repository to do more work. This is usually not a concern for small projects, but on very large projects, if too many people commit too much code too frequently, this can slow down the server.

Checkpoints are immutable (once created, they cannot be modified) but temporary. Cascade Manager stores them in its database for safekeeping, and you can keep them around as long as you want, but typically administrators will purge old checkpoints every so often to free up disk space and remove clutter. For example, you might purge checkpoints more than a month old once a week during the weekend. Checkpointing also does not store any data in the repository, nor does it require the repository server to do any work. Checkpoints are therefore an ideal way to save off changes that aren't done yet and that don't yet meet the level of professionalism you would expect in a commit. You also don't need to worry about how often you checkpoint. Whereas you might commit a few times per day, you can checkpoint every few minutes if you so desire.

Committing a change pushes it to all other users working in the same codeline (project and branch) when they update their trees. This can be good or bad. If the change fixes an important bug, for instance, the sooner people pick it up, the better. On the other hand, if the change introduces a bug, this is dangerous. Either way, the change becomes part of the baseline on which all future commits to the codeline must build. This shifts the responsibility for merging. As long as a change stays in your local tree, it is your responsibility to merge your changes with changes that other people commit. Once you commit a change to the repository, other people are responsible for merging their changes with yours.

Checkpointing a change does not push it on anyone, nor does it affect who is responsible for merging. If another user wants to grab your checkpointed change and use it as a baseline, they can; this is simply a matter of cloning a new tree from the checkpoint. You are still responsible for merging your checkpointed changes with other users' commits. It is OK to checkpoint code that is buggy or that doesn't even compile, since no one else is forced to pick it up.

One way of looking at checkpointing is that it is a lightweight way to create a temporary branch. Traditionally, if you wanted to checkpoint your incomplete work in progress several times while working on a large change, you might create your own "private branch" and commit to it each time you wanted to save off your work. Then, when your change was finished, you would merge your private branch back into the main codeline and (optionally) delete your private branch. Checkpointing is a similar way of accomplishing the same result with less overhead.

Checkpoints are a natural way to build pre-commit workflows. For example, you might have a convention that, before someone is allowed to commit their changes, they must first obtain a code review from another engineer, and the code must compile and pass a certain list of tests. Of course, this relies on trust and requires discipline: it's tempting to make "one last change" to your code before committing it and not rerun all the test cases.

By separating a commit into separate "checkpoint" and "commit the checkpointed change" phases, we can prevent these sorts of problems. Because checkpoints are immutable, if you make another change after getting your code review or running your tests, you must create a new checkpoint. That might be OK, and you might want to allow the commit anyway—that's a decision your project has to make for itself—but it makes it possible for the tools to enforce a higher level of discipline.

In particular, Cascade's default policy is that a checkpointed change cannot be committed until Cascade has run through all of the tasks (builds and tests) affected by that change to demonstrate that the change doesn't break any of them. This allows you to keep your project at a guaranteed minimum level of quality. Cascade cannot write high-quality software or test cases for you, nor can it stop people from bypassing it altogether and committing directly to the underlying repository, but Cascade can help ensure that you don't get stuck waiting for someone to fix a careless, preventable build or regression test break.
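The default policy can be pictured as a gate over an immutable checkpoint record. The sketch below is an assumption-laden illustration, not Cascade's data model: the Checkpoint class, its fields, and the can_commit function are hypothetical, and task results are reduced to a pass/fail flag per task name.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Checkpoint:
    checkpoint_id: int
    affected_tasks: FrozenSet[str]  # builds and tests affected by the change


def can_commit(checkpoint: Checkpoint, results: Dict[str, bool]) -> bool:
    """Allow the commit only if every affected task has run and passed."""
    # A task missing from results has not run yet, so it blocks the commit.
    return all(results.get(task) is True for task in checkpoint.affected_tasks)
```

Because the record is frozen, "one last change" cannot be slipped into a checkpoint that has already been tested; it requires a new checkpoint with its own task results.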

Configuring Cascade

A Cascade installation needs to know various configuration information, such as where to find Cascade Manager on the network (in order to be able to clone a tree), or what diff program to launch when the user types csc diff. Cascade refers to these settings as configuration variables.

On Windows, when looking up a configuration variable, Cascade first looks for an environment variable with that name. If it can't find one, it looks in the Windows registry under HKEY_CURRENT_USER\Software\Conifer Systems\Cascade for a REG_SZ (i.e. a string, not a DWORD) registry value. If it can't find it there, it looks in the registry under HKEY_LOCAL_MACHINE\Software\Conifer Systems\Cascade. Finally, if it doesn't exist in any of these three locations, it may fall back to a default value.
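The lookup order can be sketched as a first-match search over an ordered list of sources. This is a simplified model, not Cascade's code: the function name lookup and the variable name EXAMPLE_VAR in the usage note are hypothetical, and plain dictionaries stand in for the environment and the two registry keys (on a real Windows system the registry values could be read with Python's winreg module).

```python
def lookup(name, environ, hkcu, hklm, defaults):
    """Return the first value found for name, in the documented order, else None.

    environ  -- environment variables
    hkcu     -- values under HKEY_CURRENT_USER\\Software\\Conifer Systems\\Cascade
    hklm     -- values under HKEY_LOCAL_MACHINE\\Software\\Conifer Systems\\Cascade
    defaults -- built-in default values, if any
    """
    for source in (environ, hkcu, hklm, defaults):
        if name in source:
            return source[name]
    return None
```

For example, lookup("EXAMPLE_VAR", {}, {"EXAMPLE_VAR": "user"}, {"EXAMPLE_VAR": "machine"}, {}) returns "user", because the per-user registry key is consulted before the machine-wide one.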

On Linux or Macintosh, Cascade first looks for an environment variable. If it can't find one, it looks in the file /etc/cascade.conf. Each line in this file has the format var=value. (Note that this file format is very restrictive: it does not support comments, and it is sensitive to whitespace. For example, if you put a space before and after the =, the variable name would end with a space, and the variable's value would begin with a space.)
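The whitespace sensitivity is easiest to see in a literal parser for the var=value format. This is a sketch of the format as described, not Cascade's parser: the function name and the variable names in the usage note are hypothetical, and two details are assumptions, namely that each line is split at the first = and that lines without an = are ignored.

```python
def parse_cascade_conf(text):
    """Parse var=value lines literally: no comments, no whitespace trimming."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # assumption: lines without '=' are skipped
        # Split at the first '=' only (assumption); nothing is stripped.
        name, value = line.split("=", 1)
        settings[name] = value
    return settings
```

Parsing the two lines "EXAMPLE_VAR=value" and "OTHER_VAR = value" yields the names "EXAMPLE_VAR" and "OTHER_VAR " (with a trailing space), and the second value is " value" (with a leading space), which is exactly the pitfall described above.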

Some of the configuration variables are set up by the Cascade installer, and you may never need to touch their values after installation.

Here are some of the important configuration variables you should know about:

Acknowledgements

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.org/)

This product includes TortoiseMerge and TortoiseOverlays, developed by the TortoiseSVN project.


Comments or questions about the manual? Please email info@conifersystems.com with your feedback.

Copyright © 2008 Conifer Systems LLC. All rights reserved.

Cascade contains valuable trade secrets and other confidential information belonging to Conifer Systems LLC. This software and its associated documentation may not be copied, duplicated or disclosed to third parties without the express written permission of Conifer Systems LLC.