I wrote previously on the topic of small commits. So when and why would I advise bunching small changes together into bigger ones, aside from the obvious case of changes that must be done atomically to avoid breaking something?
One example is a change that causes a compatibility break. Suppose you want to change an API, network protocol, file format, database schema, etc. If you're going to make one change to it already, this is a great opportunity to make other desirable changes at the same time. If people will have to upgrade their clients, servers, file parsers, file writers, databases, and/or database queries anyway, you might as well batch these changes up to reduce the total number of compatibility breaks and the total pain they cause. The worst case is if the intermediate API, protocol, database, etc. is released outside your organization: you might then have to support another version for the rest of time.
That’s not to say that you should make unnecessary or gratuitous changes at that time, but if you know you’re going to have to add 2 columns to a table in your database, you might as well add them both at once, rather than doing 2 separate changes to add one column at a time.
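As a minimal sketch of this idea, here is a schema migration that adds both columns in one version bump rather than two. The table and column names, and the use of SQLite's `user_version` pragma to track schema versions, are hypothetical illustrations, not anything from a real project:

```python
import sqlite3

# Hypothetical "users" table that needs two new columns. Batching both
# ALTERs into a single schema-version bump means downstream code only
# has to cope with one compatibility break, not two.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("PRAGMA user_version = 1")

def migrate_v1_to_v2(conn):
    # One migration, two columns -- not two separate migrations.
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN created_at TEXT")
    conn.execute("PRAGMA user_version = 2")

migrate_v1_to_v2(conn)
cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(cols)  # ['id', 'name', 'email', 'created_at']
```

The point is not the mechanics of SQLite but the shape of the history: one version 1-to-2 step instead of a 1-to-2 and a 2-to-3 step that every deployed client would have to survive.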
If a bug requires making an identical change in a bunch of different places in your source code, I'd likewise advise making a single commit. If the same code has been copied and pasted to a bunch of different locations, for example, and each copy has the same bug, I'd advise fixing them all at once. The last thing you want, certainly, is to be in the middle of fixing the bug when someone checks in a new change adding yet another copy of the same buggy code, simply because you didn't commit all of your changes right away. A single commit also makes it clear from the revision history that the changes are connected to one another.
However, if you find yourself in such a situation, where a "simple" bug fix requires changing a lot of similar logic all over the place, I might also suggest that you look at your design more carefully and refactor your code to reduce the duplication of logic. Any time you are copying and pasting code around, you are usually doing something wrong.
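To make the refactoring concrete, here is a contrived sketch. The function names and the "bug" (trimming whitespace before lowercasing) are invented for illustration; the point is that once the duplicated logic lives in one helper, the next fix is a one-place change instead of a hunt through the codebase:

```python
# Before the refactor, each handler had its own pasted copy of the
# normalization logic, and each copy had the same bug. After extracting
# a shared helper, the fix lives in exactly one place.

def normalize_username(name):
    # The single shared copy: strip whitespace *before* lowercasing,
    # so "  Alice " and "alice" compare equal.
    return name.strip().lower()

def create_user(name):
    return {"action": "create", "user": normalize_username(name)}

def rename_user(name):
    return {"action": "rename", "user": normalize_username(name)}

print(create_user("  Alice "))  # {'action': 'create', 'user': 'alice'}
```

With the helper in place, a fix to the normalization rule is one commit touching one function, and there is no window in which someone can paste in a fresh copy of the buggy version.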
One of the often-claimed benefits of large commits is that they amortize fixed per-commit overhead. A typical example of this overhead is a mandatory code review: if every change must be emailed out and you must wait for another engineer to reply approving your change, this can take a while.
Fixed per-commit overhead, which is very real in many organizations, makes it very tempting to batch up your changes. I’d advise against this. If you are finding that fixed per-commit overhead is forcing you to batch up unrelated changes into a single atomic commit, I would contend that you have a process issue that you need to address.
Sometimes fixed per-commit overhead is simply unnecessary bureaucracy: paranoid management enforcing commit policies that have no logical connection to the actual risk of a change. My view is that a manager needs to be able to trust his employees’ judgment. If you don’t trust your employees to make good decisions and to ask around for help when they don’t know the right answer, I’d suggest that you have a much bigger problem in your organization and that your commit policies are just a band-aid.
These policies tend to drag down the productivity of your best engineers. If your best engineers are often 5-10x more productive than your average engineer, then you can ill afford to have them waste time on every commit, just to prevent your worst engineers from checking in bad code. The real solution is to get rid of the bad engineers or to mentor them so that they don’t need extensive babysitting.
I’ve worked in several organizations with these kinds of overkill commit policies, and my general approach as an engineer was simply to ignore the policies, which were rarely enforced, and use my own best judgment instead. (No… it really isn’t necessary to run a long, comprehensive test suite if all you’ve done is change a comment in the source code.)
In other cases, while the commit policy itself was basically reasonable, the time it took to run through the builds and tests was excessive. In this case the answer is to optimize your processes. If it takes several hours to build and test a change before committing it, forget the question of big vs. small commits — you’re killing your engineers’ productivity across the board.
For example, if your software needs to run on Windows, Linux, and Macintosh, it’s perfectly reasonable to expect that everyone’s changes should compile and pass a simple test on all three platforms before they are committed. But building and testing your changes on all three platforms can take a while, and done manually, it’s error-prone (are you sure you copied the exact same files back and forth between your 3 source trees? are you sure the final change you committed is the same one you tested?). This is where better tools like Cascade can help: instead of doing these builds and tests manually, you can simply “checkpoint” your changes and Cascade will take care of running them all.
If you’ve exhausted all the possible process improvements and commits are still taking a while, one final approach is to pipeline your work. Once you’ve kicked off builds and tests for a change, you shouldn’t have to just go off and browse the web while waiting for them to complete. You ought to be able to start working on another, unrelated change in another tree. Again, Cascade can help. Traditionally, having more trees has been expensive: you have to check out and update the extra trees, and then you still have to build each tree independently (even though the build results should be the same). With Cascade, cloning a new tree takes just seconds, and each tree comes prepopulated with the results of all your builds and tests.