A Simplistic Comparison of Distributed Revision Control Systems by Example
Lately I've been wasting a lot of time reading articles about Distributed Revision Control Systems, trying to figure out which one is right for me. After reading dozens of diverging and/or outdated opinions, hateful rants and Linus-fanboy love letters on the topic, I finally gave up and decided to find the right one™ by myself. In this article I've recorded the most important properties of the different DRCSs that helped me decide on one, so others can draw their own conclusions from my findings. The properties I'm discussing are of course just a select few; however, they should be typical enough to draw conclusions about the general behavior of the different systems.
The attributes of the greatest importance to me were:
- Ease-of-Use
I don't want to bother with anything that makes getting the job done more annoying than it has to be or that is overly unintuitive to use.
- Good Documentation
Without good documentation, learning something new can be a pain.
- Portability
While I personally am happy as long as it runs on Linux, it is rather likely that I will want to develop software with others (e.g. Windows users). Having an RCS that is equally at home in both worlds suddenly sounds like a good idea.
- Extensibility / Plugin-Systems
If it lacks a feature, I want to be able to hack it until it does what I want, or easily use the work of somebody else who had the same problem.
- Performance
While I don't really care if one DRCS is 10% faster or slower than another, things should stay reasonable.
The DRCSs I tested:
- Git
- Mercurial
- Bazaar
I left out darcs because it appears to have some serious performance issues (at least I've read so multiple times; however, this information might be outdated or the issue might get fixed in the future) and three systems are already more than enough for a side-by-side comparison. Also, I don't have a clue about Haskell but have quite some experience with the languages the others use (Git: C+Perl+sh, Mercurial: Python+C, Bazaar: Python), which counts as a lack of “Extensibility” on my personal attribute list. I also left out monotone because apparently nobody is using it and the syntax looked rather cumbersome to me.
I will be using the following example pseudo project to demonstrate the differences between the DRCSs.
phil@straylight:~/tmp/project % ls
bar.py  bar.pyc  foo.py
bar.pyc contains Python bytecode; the .py files contain ordinary Python source code.
foo.py
#! /usr/bin/env python
import bar
bar.hello()
bar.py
def hello():
    print "hello"
Documentation
- Mercurial
http://www.selenic.com/mercurial/wiki/
Against my expectations, the documentation of all projects was pretty good, even Git's, about which I had read rather negative things. I especially liked the Bazaar project's explanation of the different possible workflows (note: most of these are also possible with Mercurial).
Portability
Mercurial and Bazaar run wherever you want; Git, however, still has issues with that. Officially, Git currently runs on Windows only under Cygwin, which is rather annoying to install for a single program. Luckily there is a fork of Git which can be compiled using MinGW and should soon be merged into the official Git tree. This should solve most issues except for the less-than-great performance on Windows (I haven't benchmarked this myself, but the consensus appears to be that Git was written using functions and system calls which are fast on Linux, but not on Windows).
Also, all tested systems have more or less advanced TortoiseCVS-clones for Windows, called git-cheetah, TortoiseHg and TortoiseBZR, so getting Windows users with a dislike for command lines to use these should be a non-issue.
Extensibility
Performance
I didn't perform any benchmarking myself because I don't expect that there will be any noticeable performance differences on the projects I'm likely to work on. However, Git is supposed to be the fastest (as long as it runs on Linux at least) and Bazaar the slowest.
Ease of Use
Here I will take a look at the most common operations and how they manage to annoy me.
Getting started
In this section I will create a repository, add the project's files to it and make some minor changes to the code (using vim).
Git
phil@straylight:~/tmp/git % cp ../project/* .
phil@straylight:~/tmp/git % git init
Initialized empty Git repository in .git/
phil@straylight:~/tmp/git % git add .
phil@straylight:~/tmp/git % git commit
Created initial commit 3fdb29b: initial commit
 3 files changed, 7 insertions(+), 0 deletions(-)
 create mode 100644 bar.py
 create mode 100644 bar.pyc
 create mode 100644 foo.py
phil@straylight:~/tmp/git % vim bar.py
phil@straylight:~/tmp/git % git commit
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   bar.py
#
no changes added to commit (use "git add" and/or "git commit -a")
phil@straylight:~/tmp/git % git commit -a
Created commit 006f2f7: 2nd commit
 1 files changed, 1 insertions(+), 1 deletions(-)
This is the point where most people start wondering why git commit doesn't do what they expect it to do. The reason is that “Git tracks content, not files”, or to quote the explanation from the Git tutorial:
Many revision control systems provide an “add” command that tells the system to start tracking changes to a new file. Git's “add” command does something simpler and more powerful: git add is used both for new and newly modified files, and in both cases it takes a snapshot of the given files and stages that content in the index, ready for inclusion in the next commit.
So once I changed a file and want to commit it I've got to add it again first or explicitly call commit with the -a option?
Personally I think this is just plain annoying instead of “simpler and more powerful”. But to be honest I just don't get the use-case here; it must have something to do with the special circumstances of ultra-hierarchical kernel development or something.
UPDATE: This thread on reddit explains a possible use-case; while I don't see why this should be the default behavior it is at least an explanation.
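To make the behavior above easier to picture, here is a deliberately tiny Python 3 model of the working tree / index / commit split. It is only a sketch: the class ToyRepo and its methods are invented for illustration and say nothing about how Git is actually implemented, and details such as deleted files are ignored.

```python
class ToyRepo:
    """Toy model of the working tree / index / commit split (not Git's real internals)."""

    def __init__(self):
        self.worktree = {}  # file name -> content currently on disk
        self.index = {}     # file name -> content staged for the next commit
        self.commits = []   # snapshots of the index, oldest first

    def add(self, name):
        # Like `git add`: copy the file's current content into the index.
        self.index[name] = self.worktree[name]

    def commit(self, all_tracked=False):
        # all_tracked=True mimics `git commit -a`: restage every tracked file first.
        if all_tracked:
            for name in list(self.index):
                if name in self.worktree:
                    self.index[name] = self.worktree[name]
        head = self.commits[-1] if self.commits else {}
        if self.index == head:
            return False  # "no changes added to commit"
        self.commits.append(dict(self.index))
        return True


repo = ToyRepo()
repo.worktree["bar.py"] = 'def hello(): print "hello"'
repo.add("bar.py")
first = repo.commit()                  # staged content gets committed

repo.worktree["bar.py"] = 'def hello(): print "hello world"'
second = repo.commit()                 # edit was never staged, nothing to commit
third = repo.commit(all_tracked=True)  # restaged automatically, then committed
```

In this model the second commit is refused because only the index is compared against the last snapshot, which mirrors the "no changes added to commit" message in the transcript.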
Mercurial
phil@straylight:~/tmp/mercurial % cp ../project/* .
phil@straylight:~/tmp/mercurial % hg init
phil@straylight:~/tmp/mercurial % hg add
adding bar.py
adding bar.pyc
adding foo.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % vim bar.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
No surprises here.
Bazaar
phil@straylight:~/tmp/bzr % cp ../project/* .
phil@straylight:~/tmp/bzr % bzr init
phil@straylight:~/tmp/bzr % bzr add
added bar.py
added foo.py
ignored 1 file(s).
If you wish to add some of these files, please add them by name.
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
added bar.py
added foo.py
Committed revision 1.
phil@straylight:~/tmp/bzr % vim bar.py
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
modified bar.py
Committed revision 2.
bar.pyc is ignored by default – nice.
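Git and Mercurial both added bar.pyc in the transcripts above, so with them the same rule has to be spelled out: a .gitignore file containing the line *.pyc, or an .hgignore file containing "syntax: glob" followed by *.pyc. What such a glob rule does to the example project can be sketched in a few lines of Python 3. The helper split_ignored is a made-up name for illustration and is not code from any of the three tools.

```python
from fnmatch import fnmatchcase


def split_ignored(filenames, patterns):
    """Return (kept, ignored) after applying shell-style glob patterns."""
    kept, ignored = [], []
    for name in filenames:
        # A file is ignored as soon as any one pattern matches its name.
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            ignored.append(name)
        else:
            kept.append(name)
    return kept, ignored


# The example project from this article, filtered with a single rule.
kept, ignored = split_ignored(["bar.py", "bar.pyc", "foo.py"], ["*.pyc"])
```

Here kept holds the two source files and ignored holds bar.pyc, which matches what bzr add reported.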
Branching
In this section I will create an additional branch in the current repository called foo, make some changes in it and then merge it into the main branch.
Git
phil@straylight:~/tmp/git % git branch foo
phil@straylight:~/tmp/git % git branch
  foo
* master
phil@straylight:~/tmp/git % git checkout foo
Switched to branch "foo"
phil@straylight:~/tmp/git % git branch
* foo
  master
phil@straylight:~/tmp/git % vim bar.py
phil@straylight:~/tmp/git % git commit -a
Created commit 03de70a: 1st exp commit
 1 files changed, 1 insertions(+), 1 deletions(-)
phil@straylight:~/tmp/git % git checkout master
Switched to branch "master"
phil@straylight:~/tmp/git % git merge foo
Updating 006f2f7..03de70a
Fast forward
 bar.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
Mercurial
phil@straylight:~/tmp/mercurial % hg branch foo
marked working directory as branch foo
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % hg branches
foo                            1:84125d4a839f
default                        0:7da72b11a288 (inactive)
phil@straylight:~/tmp/mercurial % vim bar.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % hg up default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
phil@straylight:~/tmp/mercurial % hg merge foo
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
Bazaar
Bazaar doesn't support inline branching. However, lacking this feature, you can still just put your branches in their own repositories. While this is a somewhat less cool solution than real in-repository branching, it's IMHO easier to get into and more intuitive to use. On the downside, this makes it harder to work on multiple branches with others.
Development with a Central Server
This is the most important section for me because this is what people spend most of their time with and where they will run into the most problems. To demonstrate this I will first pull non-conflicting changes from a remote repository and then pull conflicting changes and merge them into the local repository.
Git
phil@straylight:~/tmp/git % git remote add remote-rep ssh://localhost/~/tmp/git-remote
phil@straylight:~/tmp/git % git pull remote-rep master
Updating 03de70a..d6745a4
Fast forward
 bar.py  |    4 +++-
 bar.pyc |  Bin 214 -> 284 bytes
 2 files changed, 3 insertions(+), 1 deletions(-)
While it is possible to create aliases for repositories from the commandline you've got to edit .git/config by hand to specify a default repository.
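For what it's worth, the same settings can probably be written with git config instead of an editor: the keys branch.<name>.remote and branch.<name>.merge are what a bare git pull consults. The Python 3 sketch below only builds the two command lines. The function name default_remote_commands is invented here, and the code assumes the remote branch has the same name as the local one.

```python
def default_remote_commands(branch, remote):
    """Build the `git config` calls that let a bare `git pull` on `branch` use `remote`."""
    return [
        # Which remote to fetch from when no remote is given.
        ["git", "config", f"branch.{branch}.remote", remote],
        # Which remote branch to merge (assumed to carry the same name).
        ["git", "config", f"branch.{branch}.merge", f"refs/heads/{branch}"],
    ]


# The remote alias and branch used in the transcript above.
commands = default_remote_commands("master", "remote-rep")
```

Each entry could be handed to subprocess.run(cmd, check=True) from inside the repository; keeping the commands as plain lists makes them easy to inspect before anything is executed.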
phil@straylight:~/tmp/git % git pull remote-rep master
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
Auto-merged bar.py
CONFLICT (content): Merge conflict in bar.py
Automatic merge failed; fix conflicts and then commit the result.
This leaves bar.py looking like this:
def hello():
<<<<<<< HEAD:bar.py
    print "hello local!"
=======
    print "hello remote!"
>>>>>>> 3cb2676a226b4399172b97367f8e791fc7a11c9a:bar.py
Alternatively you can use git mergetool to choose between different graphical merging tools (the default being xxdiff…). IMHO this is rather cumbersome for something that should be the default behavior.
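The markers Git writes into bar.py follow a simple layout: the local side sits between the <<<<<<< line and the ======= line, the remote side between ======= and >>>>>>>. A minimal Python 3 sketch of reading that layout follows; split_conflict is a hypothetical helper, it handles only the first conflict in a file, and the four-space indentation of the print lines is an assumption.

```python
def split_conflict(text):
    """Return (ours, theirs) line lists for the first conflict in text, or None."""
    ours, theirs = [], []
    state = "outside"
    for line in text.splitlines():
        if state == "outside":
            if line.startswith("<<<<<<< "):
                state = "ours"
        elif state == "ours":
            if line == "=======":
                state = "theirs"
            else:
                ours.append(line)
        else:  # state == "theirs"
            if line.startswith(">>>>>>> "):
                return ours, theirs
            theirs.append(line)
    return None  # no complete conflict block found


conflicted = '''def hello():
<<<<<<< HEAD:bar.py
    print "hello local!"
=======
    print "hello remote!"
>>>>>>> 3cb2676a226b4399172b97367f8e791fc7a11c9a:bar.py
'''
result = split_conflict(conflicted)
```

A check like this can also serve as a cheap guard before committing: if split_conflict returns anything other than None, the file still contains an unresolved conflict.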
phil@straylight:~/tmp/git % git commit -a
Created commit 08e16e1: Merge branch 'master' of ssh://localhost/~/tmp/git-remote
Mercurial
phil@straylight:~/tmp/mercurial % hg pull ssh://localhost/tmp/mercurial-remote/
pulling from ssh://localhost/tmp/mercurial-remote/
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
(run 'hg update' to get a working copy)
phil@straylight:~/tmp/mercurial % hg update
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Again you've got to edit a configuration file by hand to specify the default repository (.hg/hgrc in this case).
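The entry in question is a [paths] section with a default key. As a small Python 3 sketch, the fragment can be generated with the standard configparser module; the function name hgrc_with_default_path is made up for this example, and the result would still have to be written or appended to .hg/hgrc by hand.

```python
import configparser
import io


def hgrc_with_default_path(url):
    """Return an hgrc fragment that makes url the default pull/push location."""
    # interpolation=None keeps any '%' in the URL from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser["paths"] = {"default": url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# The remote repository used in the transcript above.
fragment = hgrc_with_default_path("ssh://localhost/tmp/mercurial-remote/")
```

With that section in place, a plain hg pull no longer needs the URL on the command line.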
phil@straylight:~/tmp/mercurial % hg pull ssh://localhost/tmp/mercurial-remote/
pulling from ssh://localhost/tmp/mercurial-remote/
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)
phil@straylight:~/tmp/mercurial % hg merge
merging bar.py
What follows is the nicest default behavior I've seen so far: the graphical diff/merge tool meld starts and you can conveniently resolve any conflicts with it.
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
Bazaar
phil@straylight:~/tmp/bzr % bzr pull --remember bzr+ssh://localhost/home/phil/tmp/bzr-remote/
 M  bar.py
All changes applied successfully.
Now on revision 2.
--remember specifies a default repository – nice and simple.
phil@straylight:~/tmp/bzr % bzr merge
Merging from remembered location bzr+ssh://localhost/home/phil/tmp/bzr-remote/
 M  bar.py
Text conflict in bar.py
1 conflicts encountered.
phil@straylight:~/tmp/bzr % ls
bar.py  bar.py.BASE  bar.pyc  bar.py.OTHER  bar.py.THIS  foo.py
Which leaves bar.py like this:
bar.py
def hello():
<<<<<<< TREE
    print "hello local!"
=======
    print "hello remote!"
>>>>>>> MERGE-SOURCE
…and also creates the following files:
bar.py.BASE
def hello():
    print "hello"
bar.py.OTHER
def hello():
    print "hello remote!"
bar.py.THIS
def hello():
    print "hello local!"
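The three files correspond to the three inputs of a three-way merge: the common ancestor (BASE), the local version (THIS) and the remote version (OTHER). The textbook rule for a single chunk can be sketched in Python 3 as below. This is the general idea only, not Bazaar's actual merge code, and merge3 is a name chosen for this example.

```python
def merge3(base, this, other):
    """Three-way merge of one chunk; returns (result, conflict)."""
    if this == other:
        return this, False   # both sides agree (or neither changed)
    if this == base:
        return other, False  # only the remote side changed
    if other == base:
        return this, False   # only the local side changed
    return None, True        # both changed differently: conflict


base = 'print "hello"'
this = 'print "hello local!"'
other = 'print "hello remote!"'

conflicting = merge3(base, this, other)  # both sides edited the same line
clean = merge3(base, base, other)        # only the remote side edited it
```

The first call reports a conflict because both sides changed the same line in different ways, which is the situation in the transcript; the second call shows why the earlier, non-conflicting pull went through without any questions.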
If you've got the extmerge-plugin installed you can also use bzr extmerge --all to resolve the conflict using your favorite graphical mergetool.
phil@straylight:~/tmp/bzr % bzr resolve
All conflicts resolved.
phil@straylight:~/tmp/bzr % ls
bar.py  bar.pyc  foo.py
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
modified bar.py
Committed revision 3.
Conclusion
Personally I prefer Bazaar because it's easy to use, has the features I need plus a plugin system, and generally just isn't in the way when I want to get something done.
The second place goes to Mercurial. Overall it may be more complex, but it's still a nice system that tries not to get in your way too much. The ability to use inline branching is also quite nice if you want to share multiple branches with others.
The last place goes to Git. It might have many nice features, but the overall lack of concern for usability, unnecessary exposure of internals, not having a plugin system and many odd choices (commit -a, using SHA1 hashes as the only revision ID, …) ruined it for me. Maybe my usual workflow just differs too much from everything Git was ever intended for. Still, Git has its good sides and might be the right choice for you, since apparently many people are happy with it (assuming they are not all just fanboys who follow the hype blindly).
Disagreeing with me? Drawing a different conclusion? Just feel like flaming me? Please feel free to comment below.








Discussion
This was an interesting read. And I agree that Git may only win in “5. Performance”, at least from what you read on the intarweb. It's almost like learning vim: the docs are getting better but are still not what you would expect, and I probably still don't know enough about it to get the most use out of it. However, I found a nice article which IMHO gives a good explanation of the “commit -a” issue you have. Personally I like it that way, because it gives me total control over what I commit. On the other hand, I've only used darcs and svn up to now, so I can barely say anything about Bazaar or Mercurial.
You should really give Darcs a try. Its command line user interface is really interactive, which makes it a joy to use. AFAICS, the other VCSs don't have anything comparable to that. When I used Darcs, for the first time in my life I really grasped the concept of a changeset-based, distributed VCS.
Well, I now stick with Mercurial because it performs much better on projects with big files. However, Darcs is perfectly suited for typical small projects, and it is by far the most user-friendly VCS.
I just got a small demo of darcs from a friend ,) It looked interesting but I personally still prefer Bazaar. However, I guess I really have to give darcs a chance once version 2.0 is released.
Regarding the interactive merge of Mercurial, it gets in the way if you are performing a large merge (e.g. 1900 files in my case). There is no easy way to tell mercurial to just get on with it and leave conflicts in the file. What seems to be a nice UI at first quickly becomes a nightmare for large merges.
First, to address some of your Git complaints:
Q1 - Doesn't work on Windows.
A1 - msysgit is quite nice and has a nice installer. I've been using it for quite some time on a large project, and it worked without a glitch. Even performance was acceptable (if you run git gc from time to time).
Q2 - So once I changed a file and want to commit it I've got to add it again first or explicitly call commit with the -a option?
A2 - It is easier to run git config --global alias.submit 'commit -a'; after that you can use git submit instead of git commit -a. Aliases are quite a nice thing.
Q3 - While it is possible to create aliases for repositories from the commandline you've got to edit .git/config by hand to specify a default repository.
A3 - False again. Just name your repo “origin” and it is implicitly the default.
@Q1: I'm sorry but you must've misunderstood me there. I was actually quite pleased with msysgit and hope that it becomes the default way to run Git on Windows soon.
@Q2: Thanks for the hint but this is IMHO more a question of usability. Sure, it's great that I can create aliases, but shouldn't that behavior be there from the start without user-intervention?
@Q3: Thanks again. However, this is IMHO still quite suboptimal usability-wise and also not that easy to grasp just from the documentation (unless I've somehow missed that part of the user manual or the FAQ).
@Q2 - It's more a matter of personal experience. If you had never used CVS/SVN/Mercurial or others before choosing Git, who's to say that Git's behavior wouldn't be more natural? Between the fact that Git overall is designed for a quite different workflow (learn to use the index and you'll be much happier), and the fact that Git specifically gets away from the 'standard' command sets that haven't changed since CVS, this tiny point always seems to get blown way out of proportion.
Please don't take offense at this, as I'm not implying anything about you, but there are too many people who blow off Git because they don't learn the reasoning behind the way Git does what it does, how the index works, etc. It's really not meant to be used the way you would use SVN/Mercurial, and if you make the effort to really get used to Git, I all but guarantee your views on how it works will change for the better.
@Q3 - In most cases, a user will be starting their repository by cloning an existing remote, in which case 'git clone <url>' will automatically create a remote named 'origin' in your repository, so that at any point, 'git pull|push|fetch' will default to working from the remote that you first cloned from. Since Git was really designed for everyday use by people who are working from clones of a remote repository, the behavior of defaulting to creating/using 'origin' is mostly transparent to the user, and in most cases, users will rarely need to specify a default remote in their configuration.
@Q2 My issue was that I found it hard to understand that adding files to the index is the default procedure instead of an optional feature (let's say something named 'git iadd && git commit --index'). But I fully agree with you that Git is clearly designed for a completely different workflow, one which is very different from what I and most people are used to (which is not a bad thing!). But IMHO because of this Git is also just not as suited for some projects as other, more flexible DRCSs are. This doesn't mean that Git is bad; it is just a rather specialized tool that could be so much more than that with just a little care for overall usability (IMHO).
@Q3 True, but in the end this is the same intended-workflow vs. usability-for-all issue as in Q2 IMHO.
I guess next time I should write <blink>USABILITY</blink> into the post's topic and make my focus clear from the start ,)
@Suraj Barkale
Since hg uses an external tool to do the merging, you can disable the smart merging (HGMERGE="/usr/bin/merge" hg merge). Otherwise, for big merges, the imerge extension is very useful: http://www.selenic.com/mercurial/wiki/index.cgi/ImergeExtension