Thursday, March 3, 2016

Best Part of Distributed Version Control


I switched jobs recently and I am now using git on a day-to-day basis.  My previous jobs had used either subversion (boo) or mercurial (which I really liked).  Transitioning to git has been relatively easy.  I've created several aliases to do things I used to do in mercurial (well, as close as I can get for some of them) and to turn certain common git operations into one command instead of command --option --option argument [argument], and it's not too bad.  Once I learned how to "bring back" "lost" commits (aka move branch pointers around with git reset) I lost my fear of losing work.  I do still have some fear when I interact with our "central" git repo, because it's not always clear to me what exactly git push is going to do to the remote repo, but it's becoming clearer the more I do it.
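As a rough illustration of the "bring back lost commits" trick, here is a minimal sketch that runs in a throwaway repo. The temp directory, the file name, the demo identity, and the branch name main are all assumptions made up for the example; the point is only that git reset moves a branch pointer and the reflog remembers where it used to be.

```shell
#!/bin/sh
set -eu

# Throwaway identity and repo so nothing real is touched (all hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # pick a branch name regardless of git's default

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)

# "Lose" the second commit by moving the branch pointer back one step.
git reset -q --hard HEAD~1

# The reflog still records where HEAD has been; HEAD@{1} is the previous position.
git reflog -n 3
recovered=$(git rev-parse 'HEAD@{1}')

# Move the branch pointer forward again to bring the commit back.
git reset -q --hard "$recovered"
git log --oneline
```

The same idea works with any commit hash you can still find in git reflog, not just HEAD@{1}.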

In all my googling to learn how to do the things I want with git I came across "Unorthodocs: Abandon your DVCS and Return to Sanity."  I have to agree with some of what Benjamin says there.  For me, sane branching and merging was the number one reason I was first attracted to distributed version control, and Benjamin is right: good branching and merging could be provided by a centralized tool.  In fact, most people seem to be using decentralized tools just like they used their centralized tools in the past (see: github, gitlab, bitbucket, even hgweb).

I have found, however, that the longer I've used mercurial (and now git), the thing I love most about them is local commits.  I'm pretty sure that local commits are really the thing people want when they talk about needing good branching and merging.  99% of the time, people just want a way to commit their work but not inflict it on the rest of the team.  Then they would like to do some testing, commit and checkpoint their work some more, and repeat that until they are sure it's ready to share.  With old centralized tools the only way to do that is with branches and merges (it's actually the only way with DVCS tools too, but they have the ability to mostly hide that from you).

The longer we used mercurial at my last job, the less and less we used branches.  The workflow was basically: do some work, commit it, post the changes to Review Board for review, and then once you have tested and had your code reviewed, rebase it onto the main branch (after folding all the work-in-progress intermediate commits together) and push.  The history in our main repo was one straight line.  It was easy to look at and find the changes in the history you cared about.

The more advanced workflow might have involved downloading a patch from Review Board and importing it as a local commit to test it out in your local clone, or sending a patch directly to someone else for them to import as a local commit in their local clone to test.  In either case you could then push that new commit (imported from the patch) or strip it if you didn't like it.  You could also make modifications, amend the commit with those modifications, etc., etc.

The cognitive load of that workflow was so small, and nothing you did in the experimental development stage could affect anyone else.  Your own work was safe, your co-workers' work was safe, yet you could share work with each other very easily too.  The commands you had to know were literally:

hg log # -G was sometimes nice

hg commit # maybe with --amend

hg incoming # to preview a pull

hg pull --rebase

rbt post # code review

hg outgoing # to preview a push

hg push

That's it!  Advanced commands were:

hg update # to jump to another revision

hg export > patch-name

hg import patch-name

hg strip

Notice the lack of HEAD^ and reset --hard and checkout -b --track.  Man, those were the days.  Despite the more obtuse commands, you can use that workflow with git too, and I'll probably learn how, because right now everything we do is create a branch (which includes inventing a name for it), push to the central server, pull (or should I fetch?) from the central server, and merge on top of merge on top of merge.  It's a lot more to think about and keep straight in your mind, even without git's complex and unintuitive commands.
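For what it's worth, here is a rough sketch of how the simple mercurial workflow above might map onto git without creating any extra branches. It is one possible translation under stated assumptions, not a recommended or official recipe: the bare repo central.git stands in for the central server, the directories mine and theirs, the branch name main, and the file names are all invented for the demo, and the code review step (rbt post) is left out.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A bare repo plays the role of the "central" server (hypothetical setup).
git init -q --bare central.git
git --git-dir=central.git symbolic-ref HEAD refs/heads/main

# My clone, seeded with one shared commit.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/main
echo base > README
git add README
git commit -q -m "initial"
git remote add origin ../central.git
git push -q -u origin main

# A co-worker pushes something in the meantime.
git clone -q ../central.git ../theirs
( cd ../theirs
  echo theirs > other.txt
  git add other.txt
  git commit -q -m "coworker change"
  git push -q origin main )

# Local draft commits, made straight on main with no new branch.
echo wip1 > feature.txt
git add feature.txt
git commit -q -m "wip: start feature"
echo wip2 >> feature.txt
git commit -q -am "wip: more feature"

# Roughly "hg incoming": fetch, then list what the server has that I don't.
git fetch -q origin
git log --oneline HEAD..origin/main

# Roughly "hg pull --rebase": replay my drafts on top of the server's history.
git pull -q --rebase

# Fold the work-in-progress commits into one before sharing.
git reset -q --soft origin/main
git commit -q -m "Add feature"

# Roughly "hg outgoing": list what I have that the server doesn't, then push.
git log --oneline origin/main..HEAD
git push -q origin main

git log --oneline --graph
```

The final log should come out as one straight line with no merge commits. The fetch-then-log pairs are natural candidates for aliases, since git has no built-in incoming or outgoing command.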

Having the ability to have those local commits, commits that are essentially in a draft state (not intended to be inflicted on the whole team), is the real killer feature of distributed version control tools.  Yes, you can have that draft state even in a centralized tool by committing to branches, but the amazing thing about DVCSs is you don't *need* to use an explicit branch.  You just commit, right on to trunk/master/default (whatever you call it), and it's local.  A draft.  A work-in-progress.  That's the default mode of operation.  And isn't that how it should be?  The default, no effort, no cognitive load mode of operation should be: create a private, draft commit.  When you are ready to put that commit into production, then a little cognitive load is OK.

When you use git and the Very Branchy development model, you keep much of the cognitive load of centralized systems, where branches are how you maintain your work in progress.  The trick with DVCS tools is that you don't have to think about branches at all.  Just commit.  A simple pull --rebase is all it takes to integrate your changes with others, still privately, still preserving your original commit in case you need to go back.  Do the simplest thing that could possibly work.  I think I've heard that somewhere before.

3 comments:

Jordan said...

Ostensibly, the point of branches is to be able to jump back and forth between experiments easily. The point of naming those branches is to make it easier to remember which experiment is which. If you are only working on one thing at a time, then no, branches are not going to help. If you have some other way of remembering which commit path is which, then no, named branches are not going to help.

So if you are only working on one thing at a time, the workflow you outline is awesome. What do you do when you need to change gears mid-experiment?

Display name said...

"If you have some other way of remembering which commit path is which, then no, named branches are not going to help."

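In Git, commits that no branch or tag points to (made on a detached HEAD, or disconnected commit trees, or unnamed "branches"... not sure what the correct term is) will eventually be garbage-collected, i.e. deleted. The default prune grace period is two weeks, though the reflog usually keeps such commits reachable for longer than that. One of the fundamental ideas of Git is that branching is cheap and fast.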

I think the best practice is, branch first, commit often (in small commits, using hunks staged with Magit), rebase regularly.

And if you're using Emacs, be sure to use Magit! It really is like magic!

Bryan said...

Jordan, with git you have no choice: you have to have a branch name for each commit path (commit path, I like that description) or else git will hide your commits and eventually garbage collect them, as Display name pointed out. With mercurial that is not needed, but how do you remember which commit path is which? Mercurial (and git) labels each commit with a unique ID (a sha1 hash). You are also prompted to describe each commit with a commit message each time you create a commit. For me, back in my mercurial days, that commit message was where I put a reminder to myself. I still do that with git, and so a branch name feels redundant.

Mercurial also has a helpful command, heads, that shows you the head commit of each commit path. That took the place, for me, of listing branches in git. The process to switch which branch I was working on was: hg heads, read commit messages if needed, hg update to the revision I wanted. It also helps that hg provides simple numbered commit IDs alongside the sha1 hashes, so it was a simple hg update 457, not hg update 34ab39e3...