Lately I've been wasting a lot of time reading articles about Distributed Revision Control Systems (DRCSs), trying to figure out which one is right for me. After reading dozens of diverging and/or outdated opinions, hateful rants and linus-fanboy-loveletters on the topic, I finally gave up and decided to find the right one™ by myself. In this article I've recorded the properties of the different DRCSs that mattered most to my decision, so that others can draw their own conclusions from my findings. The properties I discuss are of course just a select few; however, they should be typical enough to say something about the general behavior of the different systems.
The attributes of the greatest importance to me were documentation, portability, performance, usability and extensibility.
The DRCSs I tested were Git, Mercurial and Bazaar.
I left out darcs because it appears to have some serious performance issues (at least I've read so multiple times; however, this information might be outdated or the issue might get fixed in the future) and because three systems are already more than enough for a side-by-side comparison. Also, I don't have a clue about Haskell but have quite some experience with the languages the others are written in (Git: C+Perl+sh, Mercurial: Python+C, Bazaar: Python), which counts as a lack of “Extensibility” on my personal attribute list. I also left out monotone because apparently nobody is using it and the syntax looked rather cumbersome to me.
I will be using the following example pseudo project to demonstrate the differences between the DRCSs.
phil@straylight:~/tmp/project % ls
bar.py  bar.pyc  foo.py
bar.pyc contains Python bytecode and the .py files contain normal Python source code.
foo.py
#! /usr/bin/env python
import bar
bar.hello()
bar.py
def hello():
    print "hello"
Contrary to my expectations, the documentation of all projects was pretty good, even that of Git, about which I had read rather negative things. I especially liked the Bazaar project's explanation of the different possible workflows (note: most of these are also possible with Mercurial).
Mercurial and Bazaar run wherever you want; Git, however, still has issues with that. Officially, Git currently runs on Windows only under cygwin, which is rather annoying to install for a single program. Luckily there is a fork of Git which can be compiled using MinGW and should soon be merged into the official Git tree. This should solve most issues except for the less-than-great performance on Windows (I haven't benchmarked this myself, but the consensus appears to be that Git was written using functions and system calls which are fast on Linux but not on Windows).
Also, all tested systems have more or less advanced TortoiseCVS clones for Windows, called git-cheetah, TortoiseHg and TortoiseBZR, so getting Windows users with a dislike for command lines to use them should be a non-issue.
I didn't perform any benchmarking myself because I don't expect that there will be any noticeable performance differences on the projects I'm likely to work on. However, Git is supposed to be the fastest (as long as it runs on Linux at least) and Bazaar the slowest.
Here I will take a look at the most common operations and how they manage to annoy me.
In this section I will create a repository, add the project's files to it and make some minor changes to the code (using vim).
phil@straylight:~/tmp/git % cp ../project/* .
phil@straylight:~/tmp/git % git init
Initialized empty Git repository in .git/
phil@straylight:~/tmp/git % git add .
phil@straylight:~/tmp/git % git commit
Created initial commit 3fdb29b: initial commit
 3 files changed, 7 insertions(+), 0 deletions(-)
 create mode 100644 bar.py
 create mode 100644 bar.pyc
 create mode 100644 foo.py
phil@straylight:~/tmp/git % vim bar.py
phil@straylight:~/tmp/git % git commit
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#
#       modified:   bar.py
#
no changes added to commit (use "git add" and/or "git commit -a")
phil@straylight:~/tmp/git % git commit -a
Created commit 006f2f7: 2nd commit
 1 files changed, 1 insertions(+), 1 deletions(-)
This is the point where most people start wondering why git commit doesn't do what they expect it to do. The reason is that “Git tracks content, not files”, or to quote the explanation from the Git tutorial:
Many revision control systems provide an “add” command that tells the system to start tracking changes to a new file. Git's “add” command does something simpler and more powerful: git add is used both for new and newly modified files, and in both cases it takes a snapshot of the given files and stages that content in the index, ready for inclusion in the next commit.
So once I've changed a file and want to commit it, I've got to add it again first or explicitly call commit with the -a option?
Personally I think this is just plain annoying rather than “simpler and more powerful”. But to be honest I just don't get the use-case here; it must have something to do with the special circumstances of ultra-hierarchical kernel development or something.
UPDATE: This thread on reddit explains a possible use-case; while I don't see why this should be the default behavior it is at least an explanation.
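To make the behavior above a bit more concrete, here is a toy model of the working tree / index / commit split, written in Python 3. It is only a sketch of the idea and has nothing to do with how Git is actually implemented; the class ToyRepo and its all_tracked parameter are made up for this illustration.

```python
class ToyRepo:
    """Toy model of the working tree / index / commit split (not real Git)."""

    def __init__(self):
        self.worktree = {}   # file name -> current content "on disk"
        self.index = {}      # file name -> content snapshot taken by add()
        self.commits = []    # list of committed snapshots (plain dicts)

    def add(self, name):
        # Snapshot the file's content *as it is right now* into the index.
        self.index[name] = self.worktree[name]

    def commit(self, all_tracked=False):
        if all_tracked:
            # Hypothetical stand-in for "commit -a": re-snapshot every file
            # that is already in the index before committing.
            for name in self.index:
                self.index[name] = self.worktree[name]
        last = self.commits[-1] if self.commits else {}
        if self.index == last:
            return False     # nothing staged, so nothing gets committed
        self.commits.append(dict(self.index))
        return True


repo = ToyRepo()
repo.worktree["bar.py"] = 'print "hello"'
repo.add("bar.py")
first = repo.commit()                    # staged snapshot gets committed

repo.worktree["bar.py"] = 'print "hello local!"'   # edit, but no add
second = repo.commit()                   # index unchanged: nothing to commit
third = repo.commit(all_tracked=True)    # re-snapshots bar.py, then commits
```

In this model the second commit is refused because the edit never reached the index, which mirrors the "no changes added to commit" message in the transcript above.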
phil@straylight:~/tmp/mercurial % cp ../project/* .
phil@straylight:~/tmp/mercurial % hg init
phil@straylight:~/tmp/mercurial % hg add
adding bar.py
adding bar.pyc
adding foo.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % vim bar.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
No surprises here.
phil@straylight:~/tmp/bzr % cp ../project/* .
phil@straylight:~/tmp/bzr % bzr init
phil@straylight:~/tmp/bzr % bzr add
added bar.py
added foo.py
ignored 1 file(s).
If you wish to add some of these files, please add them by name.
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
added bar.py
added foo.py
Committed revision 1.
phil@straylight:~/tmp/bzr % vim bar.py
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
modified bar.py
Committed revision 2.
bar.pyc is ignored by default – nice.
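As a rough illustration of what such an ignore rule does, the following Python 3 sketch filters file names against a list of glob patterns. The pattern list and the function name split_ignored are assumptions made up for this example; they are not Bazaar's actual defaults or code.

```python
from fnmatch import fnmatch

# Hypothetical ignore list for illustration only.
IGNORE_PATTERNS = ["*.pyc", "*.pyo", "*~"]


def split_ignored(names, patterns=IGNORE_PATTERNS):
    """Split file names into (added, ignored) according to glob patterns."""
    added, ignored = [], []
    for name in names:
        if any(fnmatch(name, pattern) for pattern in patterns):
            ignored.append(name)
        else:
            added.append(name)
    return added, ignored


added, ignored = split_ignored(["bar.py", "bar.pyc", "foo.py"])
```

With the example project, bar.pyc lands in the ignored list and the two .py files in the added list, matching the bzr add output above. Git and Mercurial can be told to do the same with a `*.pyc` glob in .gitignore or .hgignore, but neither ships that rule by default.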
In this section I will create an additional branch in the current repository called foo, make some changes in it and then merge it into the main branch.
phil@straylight:~/tmp/git % git branch foo
phil@straylight:~/tmp/git % git branch
  foo
* master
phil@straylight:~/tmp/git % git checkout foo
Switched to branch "foo"
phil@straylight:~/tmp/git % git branch
* foo
  master
phil@straylight:~/tmp/git % vim bar.py
phil@straylight:~/tmp/git % git commit -a
Created commit 03de70a: 1st exp commit
 1 files changed, 1 insertions(+), 1 deletions(-)
phil@straylight:~/tmp/git % git checkout master
Switched to branch "master"
phil@straylight:~/tmp/git % git merge foo
Updating 006f2f7..03de70a
Fast forward
 bar.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
phil@straylight:~/tmp/mercurial % hg branch foo
marked working directory as branch foo
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % hg branches
foo                            1:84125d4a839f
default                        0:7da72b11a288 (inactive)
phil@straylight:~/tmp/mercurial % vim bar.py
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/mercurial % hg up default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
phil@straylight:~/tmp/mercurial % hg merge foo
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
Bazaar doesn't support inline branching. However, lacking this feature, you can still just put your branches in their own repositories. While this is a somewhat less cool solution than real in-repository branching, it's IMHO easier to get into and more intuitive to use. On the downside, this makes it harder to work on multiple branches with others.
This is the most important section for me because this is what people spend most of their time on and where they will run into the most problems. To demonstrate it, I will first pull non-conflicting changes from a remote repository, then pull conflicting changes and merge them into the local repository.
phil@straylight:~/tmp/git % git remote add remote-rep ssh://localhost/~/tmp/git-remote
phil@straylight:~/tmp/git % git pull remote-rep master
Updating 03de70a..d6745a4
Fast forward
 bar.py  |    4 +++-
 bar.pyc |  Bin 214 -> 284 bytes
 2 files changed, 3 insertions(+), 1 deletions(-)
While it is possible to create aliases for repositories from the commandline you've got to edit .git/config by hand to specify a default repository.
phil@straylight:~/tmp/git % git pull remote-rep master
remote: Counting objects: 5, done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
Auto-merged bar.py
CONFLICT (content): Merge conflict in bar.py
Automatic merge failed; fix conflicts and then commit the result.
This leaves bar.py looking like this:
def hello():
<<<<<<< HEAD:bar.py
    print "hello local!"
=======
    print "hello remote!"
>>>>>>> 3cb2676a226b4399172b97367f8e791fc7a11c9a:bar.py
Alternatively you can use git mergetool to choose between different graphical merging tools (the default being xxdiff…). IMHO this is rather cumbersome for something that should be the default behavior.
phil@straylight:~/tmp/git % git commit -a
Created commit 08e16e1: Merge branch 'master' of ssh://localhost/~/tmp/git-remote
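When resolving a conflict like this by hand, it is easy to commit a file that still contains markers. Here is a minimal Python 3 sketch that extracts both sides of every conflict hunk from a file's text. The function name find_conflicts is made up for this example, and the parser only assumes the marker layout shown in the conflicted bar.py above.

```python
def find_conflicts(text):
    """Return a list of (ours, theirs) line lists, one pair per conflict hunk."""
    hunks = []
    ours, theirs = [], []
    state = None  # None = outside a hunk, "ours" / "theirs" = inside one
    for line in text.splitlines():
        if line.startswith("<<<<<<< "):
            state, ours, theirs = "ours", [], []
        elif line.startswith("=======") and state == "ours":
            state = "theirs"
        elif line.startswith(">>>>>>> ") and state == "theirs":
            hunks.append((ours, theirs))
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    return hunks


conflicted = '''def hello():
<<<<<<< HEAD:bar.py
    print "hello local!"
=======
    print "hello remote!"
>>>>>>> 3cb2676a226b4399172b97367f8e791fc7a11c9a:bar.py
'''
hunks = find_conflicts(conflicted)
```

An empty result means no complete conflict hunk is left in the text, which is a cheap sanity check before committing the merge.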
phil@straylight:~/tmp/mercurial % hg pull ssh://localhost/tmp/mercurial-remote/
pulling from ssh://localhost/tmp/mercurial-remote/
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
(run 'hg update' to get a working copy)
phil@straylight:~/tmp/mercurial % hg update
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
Again you've got to edit a configuration file by hand to specify the default repository (.hg/hgrc in this case).
phil@straylight:~/tmp/mercurial % hg pull ssh://localhost/tmp/mercurial-remote/
pulling from ssh://localhost/tmp/mercurial-remote/
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)
phil@straylight:~/tmp/mercurial % hg merge
merging bar.py
What follows is the nicest default behavior I've seen so far: the graphical diff/merge tool meld starts and you can conveniently solve any conflicts with it.
0 files updated, 1 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
phil@straylight:~/tmp/mercurial % hg ci
No username found, using 'phil@straylight' instead
phil@straylight:~/tmp/bzr % bzr pull --remember bzr+ssh://localhost/home/phil/tmp/bzr-remote/
 M  bar.py
All changes applied successfully.
Now on revision 2.
--remember specifies a default repository – nice and simple.
phil@straylight:~/tmp/bzr % bzr merge
Merging from remembered location bzr+ssh://localhost/home/phil/tmp/bzr-remote/
 M  bar.py
Text conflict in bar.py
1 conflicts encountered.
phil@straylight:~/tmp/bzr % ls
bar.py  bar.py.BASE  bar.pyc  bar.py.OTHER  bar.py.THIS  foo.py
Which leaves bar.py like this:
bar.py
def hello():
<<<<<<< TREE
    print "hello local!"
=======
    print "hello remote!"
>>>>>>> MERGE-SOURCE
…and also creates the following files:
bar.py.BASE
def hello():
    print "hello"
bar.py.OTHER
def hello():
    print "hello remote!"
bar.py.THIS
def hello():
    print "hello local!"
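The BASE, THIS and OTHER files are the three inputs of a three-way merge. The following Python 3 sketch shows the basic decision rule on a per-line basis. It is a deliberately naive toy that assumes all three versions have the same number of lines (real merge tools align changed regions first), and the function name merge3_lines is made up for this example; it is not how Bazaar implements merging.

```python
def merge3_lines(base, this, other):
    """Naive line-by-line three-way merge of equal-length line lists.

    Returns (merged_lines, number_of_conflicts).
    """
    if not (len(base) == len(this) == len(other)):
        raise ValueError("toy merge only handles equal-length inputs")
    merged, conflicts = [], 0
    for b, t, o in zip(base, this, other):
        if t == o:        # both sides agree (or neither changed the line)
            merged.append(t)
        elif t == b:      # only OTHER changed the line: take OTHER
            merged.append(o)
        elif o == b:      # only THIS changed the line: take THIS
            merged.append(t)
        else:             # both changed it differently: emit a conflict hunk
            conflicts += 1
            merged += ["<<<<<<< TREE", t, "=======", o, ">>>>>>> MERGE-SOURCE"]
    return merged, conflicts


base = ['def hello():', '    print "hello"']
this = ['def hello():', '    print "hello local!"']
other = ['def hello():', '    print "hello remote!"']
merged, conflicts = merge3_lines(base, this, other)
```

Fed with the three versions above, the sketch reports one conflict and produces the same marker layout as the conflicted bar.py, because both sides changed the print line relative to BASE.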
If you've got the extmerge-plugin installed you can also use bzr extmerge --all to resolve the conflict using your favorite graphical mergetool.
phil@straylight:~/tmp/bzr % bzr resolve
All conflicts resolved.
phil@straylight:~/tmp/bzr % ls
bar.py  bar.pyc  foo.py
phil@straylight:~/tmp/bzr % bzr ci
Committing to: /home/phil/tmp/bzr/
modified bar.py
Committed revision 3.
Personally I prefer Bazaar because it's easy to use, has the features I need plus a plugin system, and generally just isn't in the way when I want to get something done.
The second place goes to Mercurial. Overall it may be more complex, but it's still a nice system that tries not to get in your way too much. The ability to use inline branching is also quite nice if you want to share multiple branches with others.
The last place goes to Git. It might have many nice features, but the overall lack of concern for usability, the unnecessary exposure of internals, the lack of a plugin system and many odd choices (commit -a, using SHA1 hashes as the only revision ID, …) ruined it for me. Maybe my usual workflow just differs too much from everything Git was ever intended for. Still, Git has its good sides and might be the right choice for you, since apparently many people are happy with it (assuming they are not all just fanboys who follow the hype blindly).
Disagreeing with me? Drawing a different conclusion? Just feel like flaming me? Please feel free to comment below.