I have to admit that I’m kind of a tools guy. Every now and then I get this hankering to try out new tools and see if there is a better way to work. For the past few weeks I’ve been researching the latest and greatest revision control tools that are available Free and Open Source. I’ve done some quick comparisons of what appear to be the front-runners as of this third month of 2007.
Before this adventure of mine I had only used three revision control systems. At work I have used CVS on a couple projects, and currently my team uses Clearcase. At home I’ve used CVS and Subversion (svn) for my various little bits of code. All are annoying in their own special ways. You’ve probably heard the annoyances before: CVS doesn’t keep history of renaming files; Clearcase is complicated, slow, completely reliant on the LAN, and complicated; and Subversion, well, I’ll get to that in a second. First I need to talk about their good points.
All of these systems I’ve used did have good points. CVS is simple, fast, and well documented. Subversion has all that, and adds rename support and a cool web server interface. Clearcase’s good point is that it introduced me to the real fun of branching. Branching is handy in that when you need to develop a tricky new feature, you can create a branch, work on your feature and check minor changes in as you go with no regard to breaking the build or disturbing other developers, and then when you are done you simply merge the new feature back into the main branch. Other changes and bugfixes can be done at the same time on other branches. It’s very nice.
So back to Subversion’s annoyances. This may or may not be something Subversion inherited from CVS, I never looked into it, but Subversion does do branching. I tried to use a branch with Subversion for a big change on a home project of mine, and it actually went pretty well. I learned something quite disconcerting though. Subversion doesn’t track merges automatically. The hassle of tracking merged revisions manually was just too much to ask. I was discouraged from branching with Subversion ever again. There had to be a better way.
Turns out there is a better way in the world of open source revision control. About a million better ways, in fact. Check out this comparison chart.
After reading a lot of comparisons like the above, I concluded that of all those revision control choices, the ones that are being actively developed and used by fairly big and/or prominent projects are Bazaar (AKA bazaar-ng), Mercurial, Darcs, and git. And actually I’ll state right here that it’s somewhat questionable whether Darcs meets my criteria (though I wanted it to, really). What follows is my comparison of these four systems. I compared them for speed and ease of use, the two things I cared about most. I also briefly looked at how they work over a network and their windoze support, but not in any great detail. That would be a good sequel to this review.
By performance, I mean that I used the UNIX time command to see how long various basic operations took. Performing the various basic operations gave me some insight into the usability of each as well. For this test I used a directory with 266 MB of files, 258 KB of which were text files, with the rest being image files. I know, kind of weird to version all those binary files, but that was the project I was interested in testing this out on. Your mileage may vary and all that. Here’s a table summarizing the real times reported by time(1):
|Tool||initialize repository||initial file import||initial commit||branch/clone repository||non-conflicting merge||total|
As you can see, Mercurial (hg) was the fastest. I was a little disappointed in git, whose whole purpose in life (depending on what you read) is to be fast. I’m thinking maybe it just doesn’t handle the binary files as well. Whatever. In the end I decided performance wasn’t that important a feature for me. (But it still totally rocks that an app written in Python (well, most of hg is Python) kicked the pants off an app written in bare-metal, hard-core, “efficient” C. OK, I’ll stop being juvenile now.)
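If you want to repeat the timings without typing time in front of everything, here is a minimal sketch of the same idea in Python. It is not what I ran for the numbers above; the time_command helper is a made-up name, and the stand-in command just starts a Python interpreter so the script runs on any machine.

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, check=True, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. Swap in something like
    # ["hg", "init"] or ["git", "init"] inside a scratch directory.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"real {elapsed:.3f}s")
```

This only measures the equivalent of the "real" figure. A single run is noisy, so repeating each operation a few times would give a fairer picture.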
The general workflow and command set for each is very similar. Darcs is the only outlier here, having chosen to diverge from familiar cvs-like commands in favor of “record” instead of “commit,” and “changes” instead of “log” or “history,” and “whatsnew” instead of “diff.”
The other area where they are slightly different is in handling merges. For both git and darcs, it’s a one-command operation, as long as there are no conflicts. For hg you do an hg pull from one branch to the other, and then there is an hg merge command, followed by an hg commit to finalize the merge. With bzr it’s similar, except you use bzr pull only if the branches haven’t diverged; if they have, it will tell you that you need to use bzr merge instead (let’s all shake our heads at that one together…if it knows, and can tell you about it, why doesn’t it just do it?). Then after your merge you have to do a bzr commit. Bazaar also throws in a few extra steps when resolving conflicts that the others don’t have. It tells you all about those when the need arises, similar to the pull vs. merge issue.
I should also mention the nefarious and notorious git index. Technically, when you make a change to a file, you can’t just commit it; you need to add it to “the index” first. I never dug deep enough to fully understand what that really was and why it was needed, because you can just add a -a to the git commit command, and then it automatically adds changed files to the index and works just like everything else. But before I figured that out it was pretty annoying.
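To make the step counts concrete, here is a small sketch that writes the merge sequences down as data. The hg and bzr sequences are the ones described above. The exact git and darcs commands are my assumption (a plain pull for each), and SOURCE and ../parent are placeholder names.

```python
# Merge sequences for a non-conflicting merge between diverged branches.
# SOURCE is a placeholder for the branch being merged from. The git and
# darcs entries are an assumption: the text only says each needs one command.
MERGE_STEPS = {
    "git": [["git", "pull", "SOURCE"]],
    "darcs": [["darcs", "pull", "SOURCE"]],
    "hg": [["hg", "pull", "SOURCE"], ["hg", "merge"], ["hg", "commit"]],
    "bzr": [["bzr", "merge", "SOURCE"], ["bzr", "commit"]],
}


def merge_commands(tool, source):
    """Return the command lines needed to merge `source` using `tool`."""
    return [
        " ".join(source if word == "SOURCE" else word for word in step)
        for step in MERGE_STEPS[tool]
    ]


if __name__ == "__main__":
    for tool in MERGE_STEPS:
        steps = merge_commands(tool, "../parent")
        print(f"{tool}: {len(steps)} step(s): " + " && ".join(steps))
```

The script only prints the commands; it does not run them, so it is safe to try without any of the tools installed.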
Lastly on the topic of general usability, I had a strange thing happen amidst all this version control software testing. I’d get this really happy feeling when I was using mercurial, even though it was doing some weird things like requiring three steps to merge something and having difficulty with file and directory renames (more on this later). I’d get a similar happy feeling when using git, even though it has some UI oddities of its own (namely, the index). Upon further consideration, pondering, and meta-cognition, I believe it’s because hg and git are so dang easy to type, and bzr and darcs are one-handed contortions to type. Try it:
Weird, but it made a difference. Pondering a little further, I realized that git and hg are also just so dang fast compared to bzr and darcs. That makes a big difference too, at least when running these little test cases in rapid succession. There’s a software usability lesson to be learned here somewhere.
The other part of my evaluation was to see how each of these tools handled file and directory renaming. I came up with some scenarios that may seem pathological, but I’m pretty sure I’ve seen, or at least come close to seeing, each one of these in my Clearcase usage (and usually it’s quite impressive in its handling of them).
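The typing theory is easy to check with a few lines of Python. This is only a sketch, and it assumes a QWERTY keyboard with the usual touch-typing hand assignments; the helper names are made up for the occasion.

```python
# Assumes a QWERTY layout and standard touch-typing hand assignments.
LEFT_HAND = set("qwertasdfgzxcvb")


def hands(word):
    """Return one 'L' or 'R' per character for the hand that types it.

    Anything that is not a left-hand letter is counted as right hand.
    """
    return "".join("L" if ch in LEFT_HAND else "R" for ch in word.lower())


def is_one_handed(word):
    """True when every letter of the word falls under the same hand."""
    return len(set(hands(word))) == 1


if __name__ == "__main__":
    for tool in ["hg", "git", "bzr", "darcs"]:
        label = "one hand" if is_one_handed(tool) else "both hands"
        print(f"{tool:6} {hands(tool)}  {label}")
```

On that layout hg and git alternate hands on every letter, while bzr and darcs are typed entirely with the left hand.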
In each case I created a repository and made a branch of that repository. I refer to the initial repository as the parent branch, or just parent, and the branch as the child branch, or just child. Read on to see how it all came out.
- rename a file in parent
- edit same file in child
- merge from parent to child
It was expected that the merge would preserve the edits made in child and the file would be renamed properly. That would be, “the right thing.”
How they did
bzr did the right thing. It has a bzr mv command to rename files or directories. When you do the rename and then diff it just tells you that the file was renamed. In the child branch the merge worked flawlessly.
hg started asking me confusing questions on the merge that I didn’t want to have to think about, and we both got confused. I ended up with two copies of the file in the child, one with the old name and one with the new. At least the default version that apt-get installed on Ubuntu Edgy Eft, 0.9.1, did this. I downloaded and installed the latest version, 0.9.3, and repeated the exercise. It did the right thing that time. One complaint though, is that when you hg mv a file and then do hg status it shows you a delete of the original file name and an add of the new one. An hg diff shows you the entire contents of the file, twice, once for the deleted one, once for the added one.
git did the right thing. The interesting thing about git is that you can either use the git mv command to do this operation, just like any other version control tool, or you can use the regular old UNIX ‘mv’ command. After renaming the file with mv, git will notice that you have a new file which you can then ‘git add’. It will then figure out that it’s really just a renamed version of the original file. When you go do the merge in the child directory it does the right thing, either way you do it. I should note that using just the UNIX ‘mv’ command you get the full file when you do a git diff, similar to hg. If you use ‘git mv’ then ‘git diff’ will just say the file was renamed.
darcs did the right thing, very similar to bzr but with fewer commands needed.
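If you want to replay this first scenario yourself, here is a rough sketch that drives git from Python in a scratch directory. It is not the procedure I followed by hand, and it assumes a reasonably recent git on your PATH rather than the 2007 version tested here. The file names, the throwaway identity, and the rename_vs_edit helper are all made up for the example.

```python
import pathlib
import shutil
import subprocess
import tempfile


def git(cwd, *args):
    """Run one git command in cwd with a throwaway identity."""
    identity = ["-c", "user.name=Example", "-c", "user.email=example@example.invalid"]
    subprocess.run(["git", *identity, *args], cwd=cwd, check=True,
                   capture_output=True, text=True)


def rename_vs_edit(workdir):
    """Replay the scenario and return the file names left in the child."""
    root = pathlib.Path(workdir)
    parent, child = root / "parent", root / "child"
    parent.mkdir()
    git(parent, "init")
    (parent / "old.txt").write_text("".join(f"line {i}\n" for i in range(20)))
    git(parent, "add", "old.txt")
    git(parent, "commit", "-m", "initial import")
    git(root, "clone", "parent", "child")

    # Rename the file in the parent.
    git(parent, "mv", "old.txt", "new.txt")
    git(parent, "commit", "-m", "rename old.txt to new.txt")

    # Edit the same file, still under its old name, in the child.
    with open(child / "old.txt", "a") as f:
        f.write("edited in child\n")
    git(child, "commit", "-a", "-m", "edit old.txt")

    # Merge from parent to child (a merge, not a rebase, with no editor).
    git(child, "pull", "--no-rebase", "--no-edit")
    return sorted(p.name for p in child.iterdir() if p.name != ".git")


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git is not installed; nothing to do")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            print(rename_vs_edit(tmp))
            merged = pathlib.Path(tmp) / "child" / "new.txt"
            print(merged.read_text().splitlines()[-1])
```

The right thing here means the child ends up with only the renamed file and that file still carries the child's edit. The same skeleton should adapt to the other scenarios by changing the rename and edit steps.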
- rename a directory in parent
- edit a file in that directory in child
- merge from parent to child
It was expected that the merge would rename the directory, and preserve the changes made in the file under that directory. That would be the right thing.
How they did
bzr did the right thing.
hg did the right thing (using 0.9.3 from here on out), but the child ended up with two copies of the directory, one under the original name and one under the new name. If I clone the child though, it comes out with just the newly renamed directory and the correctly edited file. Kinda weird, but I would be surprised if it’s not fixed soon.
git did the right thing, using ‘git mv’ or just ‘mv’ to rename the directory. That’s just so cool how it figures these renames out like that.
darcs did it just fine.
- move a file from one directory to another in parent
- edit that file in the child
- merge from parent to child
It was expected that the file would be moved in the child while preserving the edits made to that file in the child.
How they did
bzr handled it just fine.
hg handled it just fine.
git handled it just fine.
darcs handled it just fine.
- rename file in parent
- edit same file in parent
- make a conflicting edit of same file in child (no rename)
- merge from parent to child
It was expected that on the merge, the file would be renamed with some sort of conflict resolution taking place.
How they did
A few more details this time, to try and give a feel for how working with each one is.
bzr: I remembered to merge, not pull, and it informed me there was a conflict in the file, with the new filename. I then manually opened the file, found the cvs-like conflict markers it had inserted in the file, and resolved the conflict. Then I couldn’t just commit after resolving the conflict; I had to ‘bzr resolve FILE’, then ‘bzr commit’. A lot of steps, but at least it was helpful and walked me through it. It could have been even more helpful and just done it for me!
hg said it was merging the oldfilename with the newfilename, and then popped up the three-way diff tool that I had configured (emacs ediff, awesome tool, by the way). After resolving the conflicts with that (no manual editing or cvs-like conflict markers needed) the diff showed the whole freaking file, twice. Not very helpful. Then I checked in and everything was fine.
git told me there was a conflict in the newly renamed file. I then manually opened the file, found the cvs-like conflict markers, and resolved the conflict. After resolving the conflict the diff just listed the new filename, twice, which is kinda weird. Then it just needed a ‘git commit -a’.
darcs informed me there was a conflict in the file, with the new filename. I then manually opened the file, found the cvs-like conflict markers, and resolved the conflict. Then it just needed a ‘darcs record’. Very straightforward.
All four of these revision control tools handle merging as well as Clearcase, without the need for a dedicated IT professional supporting a specialized server. They also do renames, as well as all the other basics you expect from a revision control tool. They also have some innovative new features beyond branching and renaming that I haven’t talked about, things like emailed patches, bisect, tarball exports, hooks, plugins, and so forth. You really should try one of them out.
But which one? All seem to be reasonably usable. Darcs still reportedly has a deep, serious bug. Don’t use it (though it is nice). The other three have slight differences. Git supports the easiest renaming and moving of files, because you can just use the UNIX commands to do it all, then a single ‘git add’ to pick up all the changes. However, its diffs don’t show you what happened with all the renaming as well as bzr’s do. Hg’s diffs are just as unhelpful as git’s, maybe even less helpful. So for a project where I expect to do a lot of renaming and moving of files, hg probably isn’t the way to go for now. I’m leaning slightly toward bzr because of the more straightforward diff output. For a project where files are pretty much going to stay put I’ll probably use hg because it’s so fun to type, and it’s just so dang fast. In the end you are a big boy or girl. You can decide for yourself.
P.S. O CSS wizards, my table is too wide for my blogger template. I tried really hard to get Google to tell me how to fix it. I thought maybe I could use the overflow property to make it scroll horizontally, like my preformatted text, but to no avail. Any help would be greatly appreciated.
UPDATE: Thanks for the CSS tips in the comments! Wrapping the table with a div and then adding the overflow:auto for the div was what finally worked. Well, at least on Firefox. Who cares about anything else, right? ;-)