Systems still requiring detailed evaluations

Influence graph

Proofs

Prove that with rename as a primitive, there is a canonical form for any edit of tree nodes.

Peter Miller:

> Thought experiment:
> Sort all the files in the snapshot by name and
> catenate all the contents together end-to-end.
>
> Do a textual diff from one snapshot like this to another…
>
> File renames show up as block moves on file boundaries.
> Functions moving between files show up as block moves, too.
>
> This is what makes me think file tree diffs and file diffs have a lot in
> common.
>
> If you have container identifiers, and sort by container identifier
> instead of file name, then renames don't result in a block move at all,
> no inferences required.
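
The thought experiment above can be tried on a toy scale. The following is a minimal sketch, assuming a snapshot is a list of records with hypothetical `id`, `name`, and `lines` fields (none of these names come from any real tool). It flattens two snapshots both ways and counts how many lines a textual diff reports as changed after a pure rename.

```python
import difflib

def flatten(snapshot, key):
    """Concatenate file contents end-to-end, ordering files by `key`."""
    lines = []
    for entry in sorted(snapshot, key=key):
        lines.extend(entry["lines"])
    return lines

def changed_line_count(old, new):
    """Count lines that a textual diff reports as deleted or inserted."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")

# Hypothetical snapshots: same contents, but file 1 is renamed a.c -> z.c.
before = [
    {"id": 1, "name": "a.c", "lines": ["int a;", "int aa;"]},
    {"id": 2, "name": "m.c", "lines": ["int m;", "int mm;"]},
]
after = [
    {"id": 1, "name": "z.c", "lines": ["int a;", "int aa;"]},
    {"id": 2, "name": "m.c", "lines": ["int m;", "int mm;"]},
]

def by_name(entry):
    return entry["name"]

def by_id(entry):
    return entry["id"]

moved_by_name = changed_line_count(flatten(before, by_name), flatten(after, by_name))
moved_by_id = changed_line_count(flatten(before, by_id), flatten(after, by_id))
print("sorted by name:", moved_by_name, "changed lines")
print("sorted by container id:", moved_by_id, "changed lines")
```

Sorted by name, the rename shows up as a block of deleted and re-inserted lines on a file boundary. Sorted by container identifier, the two flattened snapshots are identical and the diff is empty.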

Prove that duality breaks in the presence of renames.

Test suite

Develop a test suite to pound on these systems, especially the funky edge cases like merge-through-rename.

<bos>   you were asking about rename support earlier.
<esr>   Still working hard on the survey paper.
<esr>   Yes?
<bos>   bugs aside, hg has more powerful rename support than bzr or mtn.
<esr>   (Saw your 2006 Google talk earlier. Well done.)
<bos>   thanks.
<esr>   I'm listening?
<esr>   !
<bos>   there are a few places where this crops up.
<bos>   one is if two people add a new file, for example by both applying a patch.
<bos>   (i know this doesn't sound like a rename case, but wait a sec.)
<esr>   No, I'm with you.
<esr>   I mean I do see how it relates.
<bos>   bzr requires every file to have the equivalent of an inode, a persistent identity that is independent of the name.
<bos>   but hg doesn't.
<esr>   Explain to me why not having inodes is good.
<bos>   so if you add a file, and i add it and make edits, one of our histories must vanish in an every-file-has-an-inode scheme.
<esr>   Hmmmm....
<bos>   at least, this is what happens in every inode-like system i've seen.
<Debolaz>       bos: How does mercurial named branches compare to git branches now?
<esr>   Don't see it. Aren't those histories still going to be attached to the parent revs after the merge?
<bos>   so mercurial lets me rename "a" to "b", and it lets you add a new file "b", and it gets the merge right, and doesn't lose history.
<bos>   esr: they'll be attached to the parent revs, but only one child rev will be visible, so the other will be orphaned.
<bos>   Debolaz: not fully implemented yet.
<bos>   esr: in other words, only one inode can exist in the merge revision (unless you resolve the conflict in some other way).
* Debolaz       is now unable to work with anything else than lightweight branches.
* bos   understands the appeal of lightweight branches, but is a bit busy to do anything about them.
<esr>   I don't see why the "orphaned branch" problem isn't resolved by merging. Maybe I'm just not understanding.
<bos>   esr: in the parent revs, i have inode 77, named "b", you have inode 99, named "b". in the merge, there's only one file named "b", and it can't be both 77 and 99 at the same time, so one must win.
<Debolaz>       One repository per branch just isn't feasible when the repo size reaches a significant size and you branch aggressively.
<esr>   No, I got that alright.
<esr>   What I don't understand is your assertion that history is lost. I don't see how.
<bos>   history isn't lost, it's just disconnected.
<esr>   Disconnected as in?
<esr>   Am I unable to find it if I look on both 77 and 99 branches?
<bos>   you might want to concoct a test case of this form with a few tools, and see which ones can actually tell you in the UI that there were formerly two files with the same name.
<bos>   it at least *used* to be the case that with bzr, you could only see the history of one name of the file.
<bos>   maybe that is no longer the case; my knowledge is, hmm, 18 months old.
<esr>   bos: In fact that's one of several similar test cases I'm thinking about coding up.
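
The collision bos describes (inode 77 and inode 99 both named "b") can be modelled in a few lines. This is a toy sketch, assuming a manifest is simply a mapping from file name to a persistent file id. The function `merge_manifests` is hypothetical and is not how bzr, hg, or any other tool implements merging; it only shows why a one-id-per-name scheme has to pick a winner.

```python
def merge_manifests(ours, theirs):
    """Merge two {name: file_id} manifests.

    Returns (merged, conflicts). A name claimed by two different ids
    cannot be placed in the merged manifest without choosing a winner.
    """
    merged, conflicts = {}, []
    for name in sorted(set(ours) | set(theirs)):
        ids = {m[name] for m in (ours, theirs) if name in m}
        if len(ids) == 1:
            merged[name] = ids.pop()
        else:
            conflicts.append((name, sorted(ids)))
    return merged, conflicts

# Both sides independently added a file named "b" (ids are illustrative).
ours = {"README": 1, "b": 77}
theirs = {"README": 1, "b": 99}

merged, conflicts = merge_manifests(ours, theirs)
print(merged)     # only the uncontested name survives automatically
print(conflicts)  # "b" is claimed by both 77 and 99
```

Whichever id is chosen for "b" in the merge revision, the history attached to the other id still exists in the parent revisions but is no longer reachable by following the surviving file, which matches bos's "disconnected, not lost" description.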

Benchmarking

Do comparative benchmarking. First problem: What to measure?

The Emacs codebase is large enough to make a good test load.

Emacs tarball from Savannah: rsync -av cvs.gnu.org::sources/emacs

Git repo of emacs: git clone git://git.sv.gnu.org/emacs.git
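
One candidate answer to "what to measure" is plain wall-clock time for equivalent operations in each system. The harness below is a minimal sketch under that assumption. The helper `time_command` is hypothetical, and the command timed here is a placeholder (the Python interpreter doing nothing) so the example runs anywhere; a real run would substitute commands such as a clone or a log against the Emacs repository.

```python
import statistics
import subprocess
import sys
import time

def time_command(argv, repeats=3, cwd=None):
    """Run `argv` several times and report wall-clock timings in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, cwd=cwd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return {"min": min(samples), "median": statistics.median(samples)}

# Placeholder command; replace with the VCS operation under test.
result = time_command([sys.executable, "-c", "pass"])
print(result)
```

Reporting the minimum alongside the median helps separate the cost of the operation itself from cache and scheduling noise. Disk footprint and network traffic would need separate measurements.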

BitKeeper

Does it have true commit-before-merge? Graydon argues not, but I don't completely understand the argument or Larry's counter.

Graydon:

> After listening to Ted, that point is not as clear to me as I used to
> think; if you have access to Larry you can get a more precise answer. I
> thought it forced users to perform merges before/during any
> inter-repository communication event, to arrive at a reconciliation or
> history-linearization in each repository.
>
> If that's not actually true — if it permits propagating multi-headed,
> unmerged DAGs forward between repositories — then the only sorta
> technical, low-level difference in the graph organization is the CHF-DAG
> vs. UUID-DAG issue.

git

Go ahead, read the docs, I dare you. Then have a nice lie-down and put an ice pack on your forehead and you'll feel better in no time.

Linus liked monotone's design but found its performance "horribly bad" and couldn't use it.

Linus describes Mercurial as "the only other version-control system worth looking at" and "pretty good".

Ted Ts'o argues for git:

> The other thing which I would note as being why I ultimately migrated
> from hg to git, is that git's ability to allow commits to effectively be
> mutable before they are published is git's killer feature over hg.   So
> the fact that I can do "git commit --amend" to change the most recently
> made commit, or use "git rebase --interactive" to fold commits
> together, or reorder commits, or edit specific commits in a portion of
> the repository that hasn't been released yet, is the killer feature that
> you can't do very well in Mercurial.
>
> So too is the ability to do work in a branch, and then when it comes
> time to fold that branch into the mainline, git makes it easy to
> *either* do a traditional merge of the branch, *or* rebase the branch so
> that the new commits are based off the tip of the development branch.
> This means that when I push a branch to Linus, I can effectively
> do the merging work by rebasing my branch to his latest, instead of
> letting Linus do the merge (and possibly get it wrong, not to mention
> centralizing more merging work into the central merging point, which is
> not a scalable thing to do).
>
> So *this* is, I believe, the key advantage and insight which git has --- the
> fact that before a particular set of commits have become widely
> published, you can revise them as many times as you like (or more
> properly, create new versions of the commits with different
> cryptographic hashes, but as long as you haven't given those commits to
> anyone else, you can revise them to your heart's content), and that by
> making it easy to rebase a branch before pushing it to the central
> maintainer, you have a more scalable solution.
>
> Note that this is a very interesting compromise between the
> merge-before-commit and commit-before-merge models.  This is effectively
> a commit-merge-recommit-push model.  And the reason why this is really
> good is because it allows the merging work to be done by the developer who
> made the branch and coded the new feature in the first place; that
> developer is much more likely to be able to test that feature works, and
> can much more easily do the merge correctly than forcing the central
> maintainer to do the merge.
>
> Yes, you can use hg + quilt, or hg + mq to do effectively the same
> thing.  But hg's storage model doesn't really allow you to do this at
> all efficiently.  The fact that git uses an object storage model with
> garbage collection makes it much more suited to creating branches, doing
> work, rebasing the work as necessary, editing and recreating commits as
> necessary, and all as first-class git operations.  When you are done,
> the garbage collection (which now is automatically triggered when git
> notices there are too many loose objects that should be packed) takes
> care of the earlier draft versions of the commits --- with no muss, no
> fuss and no dirty dishes....
>
> E2fsprogs has gone through a large number of systems, from CVS (first
> started April 29, 1997), to BitKeeper (first started June 16, 2001), to
> hg (first started July 19, 2005), and now to git (first started June 30,
> 2007).  Through the years, I've managed to migrate all of the data from
> CVS all the way to git, which has been an amazing historical resource.
> It also means that I have a pretty good perspective about the various
> tools.
>
> I originally used hg because git at the time wasn't as space efficient
> as hg (it was before git had packs and delta compression), and because
> git's user interface would scare small children.  It was approximately
> 9 months ago that I decided that git's usability had improved
> sufficiently, and that hg's design decisions about its data structures
> limited what it could do in terms of keeping up with git's features,
> that I started using git for all new projects, and approximately 3
> months later I converted e2fsprogs to use git.
>
> It is true that git is still harder to use than Mercurial (but hey, if
> emacs users can learn to type "C-c r o" to add whitespace in front of a
> series of lines when quoting text, they should have absolutely no
> problems learning git :-), and that git is not yet well supported under
> Windows (the main reason why Mercurial was chosen by Mozilla; but I
> suspect that's not a major consideration for emacs).  So it certainly
> isn't perfect.  But for a project which is primarily focused on
> Linux/Unix developers, and where the developers aren't afraid of a
> slightly steeper learning curve in trade for significantly more
> power in the hands of an expert, I think git is definitely the best tool
> of the bunch.
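
Ted's point that revising a commit really means creating a new commit with a different cryptographic hash, with garbage collection cleaning up the drafts, can be illustrated with a toy content-addressed store. This is a sketch only: the `Repo` class and its methods are hypothetical, and the hashed payload is far simpler than git's actual commit object format (which also records author, committer, and timestamps).

```python
import hashlib

def commit_id(parent, message, tree):
    """Derive a commit id from its content, in the spirit of git's hashing."""
    payload = f"parent {parent}\ntree {tree}\n\n{message}".encode()
    return hashlib.sha1(payload).hexdigest()

class Repo:
    def __init__(self):
        self.objects = {}  # commit id -> (parent id, message, tree)
        self.head = None

    def _store(self, parent, message, tree):
        cid = commit_id(parent, message, tree)
        self.objects[cid] = (parent, message, tree)
        self.head = cid
        return cid

    def commit(self, message, tree):
        """Append a new commit on top of the current head."""
        return self._store(self.head, message, tree)

    def amend(self, message):
        """Replace the head commit with a rewritten copy (new id)."""
        parent, _, tree = self.objects[self.head]
        return self._store(parent, message, tree)

    def gc(self):
        """Drop every commit that is not reachable from head."""
        reachable, cur = set(), self.head
        while cur is not None:
            reachable.add(cur)
            cur = self.objects[cur][0]
        self.objects = {k: v for k, v in self.objects.items() if k in reachable}

repo = Repo()
first = repo.commit("initial import", "tree-1")
draft = repo.commit("fix tpyo", "tree-2")
final = repo.amend("fix typo")
stored_before_gc = len(repo.objects)
repo.gc()
print(draft != final, stored_before_gc, len(repo.objects))
```

The amended commit gets a new id because its content changed. The draft stays in the store until garbage collection finds it unreachable, which is why such rewriting is only safe before the draft has been given to anyone else.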

Mercurial

bos's Google tech talk in 2006 claimed that container identity was coming in hg, but a comment in Mark Shuttleworth's entry on renaming says it was still not there as of 7 Jun 2007.

I tried to learn how to edit old commit comments in hg:

<ThomasAH>      esr: you can use  qimport -r tip  to transfer the changeset to mq control, then use  hg qrefresh -e  to edit and hg qdel -r tip to move it away from mq back to hg
<esr>   Rollback won't work, the typos are too far back.
<ThomasAH>      esr: far back? Did you push the changes already to somewhere public?
<esr>   Sounds like the first method will only work on a tip changeset.
<ThomasAH>      esr: you can qimport a bunch of changesets at once. I don't know how well this currently works with merge changesets though
<esr>   It's a linear repository, so that might work. That is, if I can repeatedly pop off the tip changeset. I bet I'll lose my timestamps though, won't I?
<ThomasAH>      esr: no, timestamps are preserved ... and you should enable git style diffs if you have binaries, changed flags (e.g. executable) or renames/copies
<ThomasAH>      esr: and instead of repeating you can just do: hg qimport -r tip:1234 (to qimport many), then qpop 1234 (to qpop repeatedly to 1234)
<ThomasAH>      esr: oh, hg qpop 1234.diff (not just 1234)

Research links

Comparisons

http://www.ada-france.org/debian/distributed-version-control-systems.html
http://weblogs.mozillazine.org/preed/2007/04/version_control_system_shootou_1.html
http://blog.racklabs.com/?p=28 (arch, darcs, Mercurial)
https://zooko.com/revision_control_quick_ref.html
http://changelog.complete.org/posts/528-Whose-Distributed-VCS-Is-The-Most-Distributed.html (svn, darcs, git, Mercurial, arch, bzr)
http://better-scm.berlios.de/comparison/comparison.html
http://kylecordes.com/2007/10/11/intro-dvcs/ (bzr, hg, git)
http://bryan-murdock.blogspot.com/2007/03/cutting-edge-revision-control.html (bzr, darcs, hg, git)
http://bramcohen.livejournal.com/17319.html (darcs, arch, git, Codeville, monotone)
http://www.cincomsmalltalk.com/userblogs/avi/blogView?showComments=true&entry=3279248343
http://www.gnuarch.org/gnuarchwiki/SubVersionAndCvsComparison (svn, cvs)
http://www.ibm.com/developerworks/linux/library/l-vercon/ (CVS, Subversion, arch, git)
http://www.nongnu.org/arx/codecon/img12.html

Merge theory

https://zooko.com/badmerge/simple.html

Arch

http://en.wikipedia.org/wiki/GNU_arch
http://osdir.com/Article1687.phtml

git

http://opensolaris.org/os/community/tools/scm/git-report-final.txt
http://community.livejournal.com/evan_tech/tag/vcs
http://cubiclemuses.com/cm/blog/2007/git.html
http://lwn.net/Articles/245678/
http://bazaar-vcs.org/BzrVsGit
http://lists.cairographics.org/archives/cairo/2006-February/006255.html
http://www.kernel.org/pub/software/scm/git/docs/tutorial.html
http://blog.moertel.com/articles/2007/12/10/how-i-stopped-missing-darcs-and-started-loving-git

Mercurial

http://www.ussg.iu.edu/hypermail/linux/kernel/0504.2/0670.html
http://hgbook.red-bean.com/hgbook.html
http://lists.cairographics.org/archives/cairo/2006-February/006255.html
http://www.bsdcan.org/2006/papers/DistributedVCS-paper.pdf
http://lwn.net/Articles/151624/
http://video.google.com/videoplay?docid=-7724296011317502612

bzr

https://wiki.ubuntu.com/MeetingLogs/openweekfeisty/Bazaar
http://bazaar-vcs.org/BzrVsGit
http://www.bsdcan.org/2006/papers/DistributedVCS-paper.pdf

darcs

https://zooko.com/darcs_demystified.html
http://lwn.net/Articles/109719/
http://darcs.net/
http://blog.moertel.com/articles/2007/12/10/how-i-stopped-missing-darcs-and-started-loving-git

Other systems

http://durak.org/sean/pubs/software/producingoss/a3139.html
http://www.cmcrossroads.com/cgi-bin/cmwiki/view/CM/HistoryOfCM#1971_FUNDAMENTALS_OF_CONFIGURATI

Skepticism

http://blog.ianbicking.org/distributed-vs-centralized-scm.html

More questions to ask

http://computerroriginaliascience.blogspot.com/2007/08/how-to-evaluate-dvcs.html