The Push/Merge Problem
Linus may be overstating it when he asserts, “If you like using CVS, you should be in some kind of mental institution […]”. You can create great software with CVS. Using it does not make you stupid. Many great projects have used, and continue to use, CVS. The entire FreeBSD project is versioned using CVS [1]. And hell, CVS sure beats no version control at all.
What’s wrong with CVS
CVS, however, doesn’t do much to keep your data safe. I’m not referring to server authentication or UNIX file permissions: I’m talking about cvs. Forget branching and merging for the moment. A simple cvs up is a dangerous, destructive operation. It’s not a problem when you’ve got an old copy of the source tree lying around and you want to sync it with the latest development version. But you don’t need CVS — or any other version control system for that matter — to do that; rsync will work just fine.
The problem comes after you’ve just finished developing your feature and you want to share it with the world (or perhaps just with your team). As it is a centralized version control system, CVS does not allow you to commit an outdated working copy. There is no way to say, “Hey CVS, I’ve just finished this feature. It works for me, all the tests pass. Won’t you please remember the state of my checkout that I just spent a week perfecting?”. That is, unless nobody else committed in the last week. If you are so lucky, cvs ci your changes and you are good to go.
Otherwise cvs up rears its ugly head. Before committing, you have to pull any changes committed to the repository since you last updated. The way that you do this in CVS is with cvs up. This is where the problem begins.
cvs up will destructively alter your changes without a second thought. After it has finished doing its dirty deeds, you no longer have your beautiful, test-suite-passing checkout which you worked a full 40 hours to perfect. That, my friend, is forever gone. Instead, your working copy has been transformed into a chimera whose head and arms comprise your changes, whose legs consist of foreign changes, and whose torso is some unholy combination of the two, littered with conflict markers and who knows what else.
Not a problem. You did make a backup of your working copy before cvs up-ing, right? No? Well, you’ll probably be able to figure out what changed from the conflict markers and make everything work OK again. Just hope that whatever changes you pulled didn’t break any assumptions your changes made.
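To make the chimera concrete, here is a minimal sketch of an in-place update. It is a toy model, not CVS's actual merge algorithm: the function `update_in_place`, the file names, and the simplified conflict markers are all made up for illustration. The point is only that the merge result overwrites the working copy, so the pre-update state survives only if you copied it yourself.

```python
def update_in_place(working, base, incoming):
    """Toy in-place update: fold incoming changes directly into `working`.

    `base` is the revision the working copy started from. Nothing records
    what `working` looked like before the call.
    """
    for name, theirs in incoming.items():
        mine = working.get(name)
        if mine == base.get(name):
            # We never touched this file: silently take the foreign version.
            working[name] = theirs
        elif mine != theirs:
            # Both sides changed it: the file now holds simplified conflict markers.
            working[name] = f"<<<<<<<\n{mine}\n=======\n{theirs}\n>>>>>>>"
    return working


base = {"parser.py": "v1", "lexer.py": "v1"}
working = {"parser.py": "my tested rewrite", "lexer.py": "v1"}
incoming = {"parser.py": "teammate fix", "lexer.py": "v2"}

backup = dict(working)   # the manual backup you hopefully remembered to make
update_in_place(working, base, incoming)
```

After the call, `working` is neither your tested tree nor the upstream tree, and `backup` is the only place your original state still lives.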
Times have changed, right?
It’s 2008. You couldn’t possibly think I believe anyone would seriously consider choosing CVS to version control their project today. Even Subversion is no longer in vogue. As Linus so eloquently and persuasively contends, we all should be using a distributed revision control system. Subversion won’t work; not even Perforce is good enough. It has got to be distributed, no two ways about it.
A number of distributed revision control tools have sprung up in the last couple of years, and largely thanks to Linus’s evangelism, people have started to join the distributed revision control bandwagon.
Well, at least they are using the distributed tools. It seems that people still want their revision control systems to work like CVS. And they’re surprised when they don’t.
One source of confusion is why their beloved cvs up has become so difficult. Taking Mercurial as an example: it’s hg pull; hg update; hg merge; hg commit, you say? Care to run that one by me again?
Sure thing:
hg pull: Bring any foreign changesets into your repository.
hg update: Alter your working copy by syncing it with the tip of the branch your working copy is part of. One hopes that you have committed any outstanding changes before updating your working copy, but if you haven’t, Mercurial will merge your changes.
If you haven’t made any changes since last updating your working copy, you’re done. In fact, hg pull; hg update can safely be reduced to hg pull -u.
hg merge: This is where your distributed version control system adds value. Stated otherwise, this is where CVS gets things horribly wrong. At this point, your changes are safe in the version control history as a distinct state of the repository. You are now creating a new changeset representing an entirely distinct state of the repository which reflects the combination of your changes and whatever other changesets you pulled.
hg commit: Only necessary if you merged as above. This is the step where you tell your version control system that you are introducing a distinct changeset which combines two branches of development.
Why is this workflow better than cvs up? Because it protects your data. You have the option of utilizing conflict markers in your merge, but your known-working changeset is preserved in the repository history. And no chimeras are involved.
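The difference can be sketched with a toy history graph. This is only an illustration under my own assumptions, not Mercurial's real data model or API: `Changeset`, `ToyRepo`, the identifiers, and the per-file merge rule are all hypothetical. What it shows is that a merge adds a new changeset with two parents and leaves both parents untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """One immutable repository state; `parents` link it into the history."""
    ident: str
    files: tuple          # sorted (filename, content) pairs
    parents: tuple = ()


class ToyRepo:
    """A toy history store (hypothetical, not how Mercurial stores data)."""

    def __init__(self):
        self.history = {}

    def commit(self, ident, files, parents=()):
        cs = Changeset(ident, tuple(sorted(files.items())), tuple(parents))
        self.history[ident] = cs
        return cs

    def merge(self, ident, ours, theirs, base):
        """Record a NEW changeset with two parents; neither parent is altered."""
        b = dict(self.history[base].files)
        o = dict(self.history[ours].files)
        t = dict(self.history[theirs].files)
        combined = {}
        for name in set(o) | set(t):
            # Toy rule: if we left the file alone, take theirs; otherwise keep ours.
            chosen = t.get(name) if o.get(name) == b.get(name) else o.get(name)
            if chosen is not None:
                combined[name] = chosen
        return self.commit(ident, combined, parents=(ours, theirs))


repo = ToyRepo()
repo.commit("base", {"app.py": "v1"})
repo.commit("mine", {"app.py": "v1", "feature.py": "tested"}, parents=("base",))
repo.commit("theirs", {"app.py": "v2"}, parents=("base",))   # arrived via pull
merged = repo.merge("merged", ours="mine", theirs="theirs", base="base")
```

Even if the merge turns out badly, `repo.history["mine"]` still holds exactly the state that passed your tests, so you can always go back to it and try again.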
Once you’ve committed your merge, you can now push your merged branch upstream. Continuing with the Mercurial example, this would consist of hg push.
The push/merge problem
“But wait”, you object. “It’s wonderful that my changes are safely tucked away in the repository’s history. But man, this is still incredibly inconvenient. I just merged my week’s worth of work with the central repository, but when I tried to push my changes, someone else had already pushed their own changes since I last pulled!”
Enter the push/merge problem.
Unfortunately, this problem is inherent to the centralized workflow that is borrowed from CVS. CVS makes it easy because it does very little to protect your data. Distributed tools like Mercurial assume a different philosophy: “Your data is important”. As a consequence, this workflow is inconvenient, as it is inherently prone to races as described above.
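The race is easy to model. The sketch below is a simulation under stated assumptions, not anything Mercurial provides: `ToyServer`, `publish`, and the `rival_pushes` schedule are invented for illustration, and the server's rule (refuse any push not built on its current tip) stands in for the way a shared repository typically rejects a push that would leave it with a second head.

```python
class ToyServer:
    """A central repository that only accepts pushes built on its current tip."""

    def __init__(self):
        self.tip = 0   # number of the newest changeset on the server

    def push(self, based_on):
        """Accept the push only if nobody else has pushed since `based_on`."""
        if based_on != self.tip:
            return False   # stale: the pusher must pull and merge again
        self.tip += 1
        return True


def publish(server, rival_pushes):
    """Pull, merge, and push until the push lands; return the attempt count.

    `rival_pushes` is a hypothetical schedule: entry i says how many
    teammates push while we are busy merging during attempt i.
    """
    attempts = 0
    rivals = iter(rival_pushes)
    while True:
        attempts += 1
        seen_tip = server.tip                 # "pull" and "merge" against this tip
        for _ in range(next(rivals, 0)):      # teammates win the race meanwhile
            server.push(server.tip)
        if server.push(seen_tip):             # lands only if nobody raced us
            return attempts


server = ToyServer()
attempts = publish(server, rival_pushes=[1, 2, 0])
```

With that schedule the first two pushes are rejected and the third lands. On a busy repository nothing bounds the number of rounds, which is exactly the inconvenience described above.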
Now, I’m not suggesting that centralized workflows must be eschewed entirely. If your project can afford to have a gatekeeper whose job it is to hold exclusive write access to an authoritative repository and merge everyone else’s changes into it, then by all means, avoid hg push. It works for Linus (well, he uses Git, but you get the point).
If you can’t have a gatekeeper, then your distributed tools will still support a centralized workflow. And they will do it better than CVS ever could. But please, don’t blame the tools for protecting your data.
[1] They are in the process of switching to Subversion. For the purposes of this discussion, however, CVS and Subversion are equivalent.