A Simplistic Comparison of Distributed Revision Control Systems by Example

Lately I've been wasting a lot of time reading articles about Distributed Revision Control Systems, trying to figure out which one is right for me. After reading dozens of diverging and/or outdated opinions, hateful rants and Linus-fanboy love letters on the topic, I finally gave up and decided to find the right one™ by myself. In this article I've recorded the most important properties of the different DRCSs that helped me decide on one, so that others can draw their own conclusions from my findings. The properties I'm discussing are of course just a select few; however, they should be typical enough to draw conclusions about the general behavior of the different systems.

The attributes of the greatest importance to me were:

  1. Ease-of-Use
    I don't want to bother with anything that makes getting the job done more annoying than it has to be or is overly unintuitive to use.
  2. Good Documentation
    Without good documentation, learning something new can be a pain.
  3. Portability
    While I personally am happy as long as it runs on Linux, it is rather likely that I will want to develop software with others (e.g. Windows users). Having an RCS that is equally at home in both worlds suddenly sounds like a good idea.
  4. Extensibility / Plugin-Systems
    If it lacks a feature, I want to be able to hack it till it does what I want, or easily use the work of somebody else who had the same problem.
  5. Performance
    While I don't really care if one DRCS is 10% faster or slower than the other, things should stay reasonable.

The DRCSs I tested: Git, Mercurial and Bazaar.

I left out darcs because it appears to have some serious performance issues (at least I've read so multiple times; however, this information might be outdated or the issue might get fixed in the future) and 3 systems are already more than enough for a side-by-side comparison. Also, I don't have a clue about Haskell but have quite some experience with the languages the others use (Git: C+Perl+sh, Mercurial: Python+C, Bazaar: Python), which counts as a lack of “Extensibility” on my personal attribute list. I also left out monotone because apparently nobody is using it and the syntax looked rather cumbersome to me.

I will be using the following example pseudo project to demonstrate the differences between the DRCSs.

phil@straylight:~/tmp/project % ls
bar.py  bar.pyc  foo.py

bar.pyc contains Python bytecode; the .py files contain normal Python source code.

foo.py

#! /usr/bin/env python
import bar
bar.hello()

bar.py

def hello():
	print "hello"

Documentation

Against my expectations, the documentation of all three projects was pretty good, even Git's, about which I had read rather negative things. I especially liked the Bazaar project's explanation of the different possible workflows (Note: most of these are also possible with Mercurial).

Portability

Mercurial and Bazaar run wherever you want; Git, however, still has issues with that. Officially, Git currently runs on Windows only with Cygwin, which is rather annoying to install for a single program. Luckily there is a fork of Git which can be compiled using MinGW and should soon be merged into the official Git tree. This should solve most issues except for the less-than-great performance on Windows (I haven't benchmarked this myself, but the consensus appears to be that Git was written using functions and system calls which are fast on Linux but not on Windows).

Also, all tested systems have more or less advanced TortoiseCVS clones for Windows (git-cheetah, TortoiseHg and TortoiseBZR), so getting Windows users with a dislike for command lines to use them should be a non-issue.

Extensibility

Both Mercurial and Bazaar support plugins. Git doesn't.

Performance

I didn't perform any benchmarking myself because I don't expect that there will be any noticeable performance differences on the projects I'm likely to work on. However, Git is supposed to be the fastest (as long as it runs on Linux at least) and Bazaar the slowest.

Ease of Use

Here I will take a look at the most common operations and how they manage to annoy me.

Getting started

In this section I will create a repository, add the project's files to it and make some minor changes to the code (using vim).

Git

phil@straylight:~/tmp/git % cp ../project/* .
phil@straylight:~/tmp/git % git init
Initialized empty Git repository in .git/
phil@straylight:~/tmp/git % git add .
phil@straylight:~/tmp/git % git commit
Created initial commit 3fdb29b: initial commit
 3 files changed, 7 insertions(+), 0 deletions(-)
 create mode 100644 bar.py
 create mode 100644 bar.pyc
 create mode 100644 foo.py
phil@straylight:~/tmp/git % vim bar.py
phil@straylight:~/tmp/git % git commit
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   bar.py
#
no changes added to commit (use "git add" and/or "git commit -a")
phil@straylight:~/tmp/git % git commit -a
Created commit 006f2f7: 2nd commit
 1 files changed, 1 insertions(+), 1 deletions(-)

This is the point where most people start wondering why git commit doesn't do what they expect it to do. The reason is that “Git tracks content not files”, or to quote the explanation from the Git tutorial:

Many revision control systems provide an “add” command that tells the system to start tracking changes to a new file. Git's “add” command does something simpler and more powerful: git add is used both for new and newly modified files, and in both cases it takes a snapshot of the given files and stages that content in the index, ready for inclusion in the next commit.

So once I've changed a file and want to commit it, I've got to add it again first or explicitly call commit with the -a option? Personally I think this is just plain annoying instead of “simpler and more powerful”. But to be honest I just don't get the use-case here; it must have something to do with the special circumstances of ultra-hierarchical kernel development or something.

UPDATE: This thread on reddit explains a possible use-case; while I don't see why this should be the default behavior it is at least an explanation.
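To make the staging behavior a bit more tangible, here is a toy model in Python 3. It is only a sketch of the idea as I understand it: the class ToyRepo and its methods are made up for illustration and have nothing to do with Git's real data structures.

```python
# Toy model of Git's index (staging area); NOT Git's real implementation.
class ToyRepo:
    def __init__(self):
        self.worktree = {}   # path -> current file content on disk
        self.index = {}      # path -> content snapshot staged for the next commit
        self.commits = []    # list of committed snapshots (plain dicts)

    def add(self, path):
        # "git add": snapshot the file's current content into the index
        self.index[path] = self.worktree[path]

    def commit(self, all_tracked=False):
        if all_tracked:
            # "git commit -a": re-snapshot every already tracked file first
            for path in self.index:
                self.index[path] = self.worktree[path]
        head = self.commits[-1] if self.commits else {}
        if self.index == head:
            return False     # corresponds to "no changes added to commit"
        self.commits.append(dict(self.index))
        return True


repo = ToyRepo()
repo.worktree["bar.py"] = 'print "hello"'
repo.add("bar.py")
first = repo.commit()                    # initial commit succeeds

repo.worktree["bar.py"] = 'print "hello world"'   # edit without "git add"
second = repo.commit()                   # nothing staged, nothing committed
third = repo.commit(all_tracked=True)    # the "-a" variant picks up the edit
print(first, second, third)
```

The point of the sketch: a commit records whatever is in the index, not whatever is on disk, so an edited but un-added file is simply invisible to a plain commit.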

Mercurial

phil@straylight:~/tmp/mercurial % cp ../project/* .
phil@straylight:~/tmp/mercurial % hg init
phil@straylight:~/tmp/mercurial % hg add
adding bar.py
adding bar.pyc
adding foo.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % vim bar.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead

No surprises here.

Bazaar

phil@straylight:~/tmp/bzr % cp ../project/* .
phil@straylight:~/tmp/bzr % bzr init
phil@straylight:~/tmp/bzr % bzr add
added bar.py
added foo.py
ignored 1 file(s).
If you wish to add some of these files, please add them by name.
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
added bar.py
added foo.py
Committed revision 1.
phil@straylight:~/tmp/bzr % vim bar.py
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
modified bar.py
Committed revision 2.

bar.pyc is ignored by default – nice.
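Roughly speaking, a bare add with ignore rules boils down to matching file names against a list of glob patterns. The following Python 3 sketch shows that idea; the pattern list and the function name split_ignored are my own assumptions for illustration, not Bazaar's actual defaults or code.

```python
from fnmatch import fnmatch

# Hypothetical ignore list for illustration; Bazaar's real defaults differ.
IGNORE_PATTERNS = ["*.pyc", "*~", "*.o"]


def split_ignored(filenames, patterns=IGNORE_PATTERNS):
    """Return (to_add, ignored), mimicking a bare 'add' that honors ignore rules."""
    to_add, ignored = [], []
    for name in filenames:
        if any(fnmatch(name, pattern) for pattern in patterns):
            ignored.append(name)
        else:
            to_add.append(name)
    return to_add, ignored


to_add, ignored = split_ignored(["bar.py", "bar.pyc", "foo.py"])
print("added:", to_add)
print("ignored:", ignored)
```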

Branching

In this section I will create an additional branch in the current repository called foo, make some changes in it and then merge it into the main branch.

Git

phil@straylight:~/tmp/git % git branch foo
phil@straylight:~/tmp/git % git branch
  foo
* master
phil@straylight:~/tmp/git % git checkout foo
Switched to branch "foo"
phil@straylight:~/tmp/git % git branch
* foo
  master
phil@straylight:~/tmp/git % vim bar.py
phil@straylight:~/tmp/git % git commit -a
Created commit 03de70a: 1st exp commit
 1 files changed, 1 insertions(+), 1 deletions(-)
phil@straylight:~/tmp/git % git checkout master
Switched to branch "master"
phil@straylight:~/tmp/git % git merge foo
Updating 006f2f7..03de70a
Fast forward
 bar.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
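The "Fast forward" in the output above means that master had no commits of its own since foo branched off, so Git only had to move master to foo's tip instead of creating a merge commit. A minimal Python 3 sketch of that check, using the commit IDs from my transcript (the graph dictionary and the function names are illustrative assumptions, not Git's internals):

```python
# Toy commit graph: commit id -> list of parent ids (ids from the transcript above).
PARENTS = {
    "3fdb29b": [],
    "006f2f7": ["3fdb29b"],
    "03de70a": ["006f2f7"],
}


def is_ancestor(parents, ancestor, descendant):
    """True if 'ancestor' can be reached from 'descendant' by following parents."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False


def merge_kind(parents, current_tip, other_tip):
    """Classify what merging other_tip into current_tip requires."""
    if is_ancestor(parents, other_tip, current_tip):
        return "already up to date"
    if is_ancestor(parents, current_tip, other_tip):
        return "fast forward"
    return "merge commit needed"


# master is at 006f2f7, foo is at 03de70a
print(merge_kind(PARENTS, "006f2f7", "03de70a"))
```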

Mercurial

phil@straylight:~/tmp/mercurial % hg branch foo
marked working directory as branch foo
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % hg branches
foo                            1:84125d4a839f
default                        0:7da72b11a288 (inactive)
phil@straylight:~/tmp/mercurial % vim bar.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % hg up default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
phil@straylight:~/tmp/mercurial % hg merge foo
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead

Bazaar

Bazaar doesn't support inline branching. However, lacking this feature, you can still just put your branches in their own repositories. While this is a somewhat less cool solution than real in-repository branching, it's IMHO easier to get into and more intuitive to use. On the downside, this makes it harder to work on multiple branches with others.

Development with a Central Server

This is the most important section for me because this is what people spend most of their time on and where they will run into the most problems. To demonstrate this I will first pull non-conflicting changes from a remote repository and then pull conflicting changes and merge them into the local repository.

Git

phil@straylight:~/tmp/git % git remote add remote-rep ssh://localhost/~/tmp/git-remote
phil@straylight:~/tmp/git % git pull remote-rep master
Updating 03de70a..d6745a4
Fast forward
 bar.py  |    4 +++-
 bar.pyc |  Bin 214 -> 284 bytes
 2 files changed, 3 insertions(+), 1 deletions(-)

While it is possible to create aliases for repositories from the command line, you've got to edit .git/config by hand to specify a default repository.

phil@straylight:~/tmp/git % git pull remote-rep master
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
Auto-merged bar.py
CONFLICT (content): Merge conflict in bar.py
Automatic merge failed; fix conflicts and then commit the result.

This leaves bar.py looking like this:

def hello():
<<<<<<< HEAD:bar.py
	print "hello local!"
=======
	print "hello remote!"
>>>>>>> 3cb2676a226b4399172b97367f8e791fc7a11c9a:bar.py

Alternatively you can use git mergetool to choose between different graphical merging tools (the default being xxdiff…). IMHO this is rather cumbersome for something that should be the default behavior.
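If you would rather script around the conflict markers left in bar.py above, they are easy to take apart. Here is a small Python 3 sketch that splits a conflicted file into the local ("ours") and remote ("theirs") versions; the function name split_conflicts is my own, and it assumes plain two-way markers exactly like the ones shown above.

```python
def split_conflicts(text):
    """Return (ours, theirs): the file with every conflict resolved to one side."""
    ours, theirs = [], []
    state = "common"                      # "common", "ours" or "theirs"
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            state = "ours"
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>>") and state == "theirs":
            state = "common"
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
        else:                             # unconflicted line belongs to both
            ours.append(line)
            theirs.append(line)
    return "\n".join(ours), "\n".join(theirs)


conflicted = '''def hello():
<<<<<<< HEAD:bar.py
\tprint "hello local!"
=======
\tprint "hello remote!"
>>>>>>> 3cb2676a226b4399172b97367f8e791fc7a11c9a:bar.py
'''
ours, theirs = split_conflicts(conflicted)
print(ours)
print(theirs)
```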

phil@straylight:~/tmp/git % git commit -a
Created commit 08e16e1: Merge branch 'master' of ssh://localhost/~/tmp/git-remote

Mercurial

phil@straylight:~/tmp/mercurial % hg pull ssh://localhost/tmp/mercurial-remote/
pulling from ssh://localhost/tmp/mercurial-remote/
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
(run 'hg update' to get a working copy)
phil@straylight:~/tmp/mercurial % hg update
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

Again you've got to edit a configuration file by hand to specify the default repository (.hg/hgrc in this case).
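Since .hg/hgrc is an INI-style file, the hand-editing can be scripted. The Python 3 sketch below writes a default entry into the [paths] section, which is where Mercurial looks up the default pull location. The helper name set_default_path and the throwaway demo directory are my own assumptions; also note that rewriting the file through configparser drops any comments it contained.

```python
import configparser
import os
import tempfile


def set_default_path(repo_dir, url):
    """Write 'default = <url>' into the [paths] section of <repo_dir>/.hg/hgrc."""
    hgrc = os.path.join(repo_dir, ".hg", "hgrc")
    config = configparser.ConfigParser()
    config.read(hgrc)                     # a missing file is silently skipped
    if not config.has_section("paths"):
        config.add_section("paths")
    config.set("paths", "default", url)
    with open(hgrc, "w") as handle:       # rewrites the file; comments are lost
        config.write(handle)
    return hgrc


# Demo on a throwaway directory standing in for a real repository.
demo_repo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_repo, ".hg"))
hgrc_path = set_default_path(demo_repo, "ssh://localhost/tmp/mercurial-remote/")
with open(hgrc_path) as handle:
    print(handle.read())
```

With such an entry in place, a plain hg pull should no longer need the URL on the command line.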

phil@straylight:~/tmp/mercurial % hg pull ssh://localhost/tmp/mercurial-remote/
pulling from ssh://localhost/tmp/mercurial-remote/
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)
phil@straylight:~/tmp/mercurial % hg merge
merging bar.py

What follows is the nicest default behavior I've seen so far: the graphical diff/merge tool meld starts and you can conveniently resolve any conflicts with it.

[Screenshot: merging with meld]

0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead

Bazaar

phil@straylight:~/tmp/bzr % bzr pull --remember bzr+ssh://localhost/home/phil/tmp/bzr-remote/
 M  bar.py
All changes applied successfully.
Now on revision 2.

--remember specifies a default repository – nice and simple.

phil@straylight:~/tmp/bzr % bzr merge
Merging from remembered location bzr+ssh://localhost/home/phil/tmp/bzr-remote/
 M  bar.py
Text conflict in bar.py
1 conflicts encountered.
phil@straylight:~/tmp/bzr % ls
bar.py  bar.py.BASE  bar.pyc  bar.py.OTHER  bar.py.THIS  foo.py

Which leaves bar.py like this:

bar.py

def hello():
<<<<<<< TREE
	print "hello local!"
=======
	print "hello remote!"
>>>>>>> MERGE-SOURCE

…and also creates the following files:

bar.py.BASE

def hello():
	print "hello"

bar.py.OTHER

def hello():
	print "hello remote!"

bar.py.THIS

def hello():
	print "hello local!"
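The three files above are exactly the inputs of a three-way merge: BASE is the common ancestor, THIS is my version and OTHER is the remote one. The following Python 3 sketch shows the basic decision rule on a line-by-line basis. It is a deliberately naive illustration (it assumes all three versions have the same number of lines and is not how Bazaar actually aligns changes), and merge3_lines is a made-up name.

```python
def merge3_lines(base, this, other):
    """Naive line-by-line three-way merge; assumes equal line counts everywhere."""
    merged, conflicts = [], 0
    for b, t, o in zip(base, this, other):
        if t == o:            # both sides agree (same change, or no change)
            merged.append(t)
        elif t == b:          # only OTHER changed this line
            merged.append(o)
        elif o == b:          # only THIS changed this line
            merged.append(t)
        else:                 # both changed it differently: emit conflict markers
            conflicts += 1
            merged += ["<<<<<<< TREE", t, "=======", o, ">>>>>>> MERGE-SOURCE"]
    return merged, conflicts


base = ['def hello():', '\tprint "hello"']
this = ['def hello():', '\tprint "hello local!"']
other = ['def hello():', '\tprint "hello remote!"']
merged, conflicts = merge3_lines(base, this, other)
print("\n".join(merged))
print(conflicts, "conflicts encountered.")
```

Had only one side touched the print line, the rule would have taken that side's change silently, which is what happened in the non-conflicting pull earlier.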

If you've got the extmerge-plugin installed you can also use bzr extmerge --all to resolve the conflict using your favorite graphical mergetool.

phil@straylight:~/tmp/bzr % bzr resolve
All conflicts resolved.
phil@straylight:~/tmp/bzr % ls
bar.py  bar.pyc  foo.py
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
modified bar.py
Committed revision 3.

Conclusion

Personally I prefer Bazaar because it's easy to use, has the features I need plus a plugin system, and generally just isn't in the way when I want to get something done.

The second place goes to Mercurial. Overall it may be more complex, but it's still a nice system that tries not to get in your way too much. The ability to use inline branching is also quite nice if you want to share multiple branches with others.

The last place goes to Git. It might have many nice features, but the overall lack of concern for usability, unnecessary exposure of internals, lack of a plugin system and many odd choices (commit -a, using SHA1 hashes as the only revision ID, …) ruined it for me; maybe my usual workflow just differs too much from everything Git was ever intended for. Still, Git has its good sides and might be the right choice for you, since apparently many people are happy with it (assuming they are not all just fanboys who follow the hype blindly).

Disagreeing with me? Drawing a different conclusion? Just feel like flaming me? Please feel free to comment below.