Distributed Version Control (DVCS) is generating lots of interest, buzz, and rhetoric. You’ve heard of Git, Mercurial (Hg), and Bazaar (Bzr). You’ve probably been told that all the cool kids are using them. Unfortunately for these tools, and for you, their proponents too often do a terrible job of explaining the benefits (at least in my opinion). This post explains why I like DVCS, from several perspectives. I’ve used Bazaar in the past, and have been using Mercurial for the past several months.
If I get to choose the version control for a project, I will choose Mercurial. The rest of this post explains why.
DVCS is similar in many ways to classic VCS, where you have a single repository. You get code, you modify code, you check in code. All the DVCS tools I’ve used support the Update / Modify / Commit model used by other systems, like SVN or CVS.
The difference is that DVCS tools keep multiple copies of the entire repository. Every team member who works with the code has a local copy of the full repo. Instead of one copy of the full history, there are several. I think this supports new workflows that make everyone on the team more productive.
While many proponents say that DVCS means you don’t have a master server, I find that does a disservice to DVCS. You can have a master server, and in fact, every use I’ve seen does. At some point, you’re going to build and deploy the software you’re creating. That build has to come from some single, known place. It’s the central server. You will replicate that central server, including all the history (see below). But that is a very different statement than saying there isn’t a central server.
DVCS adds two new commands you’ll need: Push and Pull. Pull brings changes from a central repository into your local repository. Push takes the changes in your local repository and commits them (merging if necessary) into the central repository.
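To make the two commands concrete, here is a minimal toy model in Python. It is only a sketch: the `Repo` class and the changeset names are hypothetical, a repository is reduced to a list of changeset ids, and merging is ignored entirely. It is not how Mercurial or any other tool actually stores history.

```python
class Repo:
    """Toy repository: an ordered list of changeset ids (hypothetical model)."""

    def __init__(self, name, changesets=None):
        self.name = name
        self.changesets = list(changesets or [])

    def commit(self, changeset):
        # A commit only ever touches this repository.
        self.changesets.append(changeset)

    def pull(self, other):
        # Copy every changeset we are missing from the other repository.
        for cs in other.changesets:
            if cs not in self.changesets:
                self.changesets.append(cs)

    def push(self, other):
        # In this model, a push is a pull seen from the other side.
        other.pull(self)


central = Repo("central", ["c1", "c2"])
local = Repo("local")

local.pull(central)   # local now holds the full history
local.commit("c3")    # recorded locally only
assert "c3" not in central.changesets
local.push(central)   # now the central repository has it too
```

The point of the sketch is the middle step: between the commit and the push, the change is safely recorded but nobody else has to deal with it.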
Key Point: DVCS, in practice, doesn’t mean “there’s no central server”. It means “The central repository can be replicated as many times as you need.”
You can also use the Push and Pull commands to share changes among individual developers. I’ll discuss why I like that later.
Nerd Note: I know other commands exist as well. Notably, I’m ignoring clone. This isn’t meant to be a how-to or a tutorial, but rather a conceptual discussion around what I perceive as the benefits of DVCS. If you want a great tutorial on how DVCS works, I suggest this tutorial by our friends at EdgeCase.
DVCS has made it easier to experiment, restart, try different designs, and not lose anything. As soon as I feel like I’m on the wrong road, I’ll check in my changes to my local repository. I won’t push those changes to the shared repository. I just want to make sure I don’t lose them. After that commit, I can delete the code that made me think I had taken a wrong turn, and start down a new path. I have my changes safely in the source archive, and I haven’t negatively affected others. It makes it easier for me to try different approaches, and eventually settle on the best one.
DVCS also means I can do small spikes with other developers. I can start on a feature, ask someone to pair, and share code between our local repositories. I’ll make a change, and my pair can pull that change into her local repository. She’ll make changes, and I’ll pull those changes into my repo. Neither of us pushes to the central repository yet. One of us will do that when we’ve hit a delivery point (finishing a task, a story, or whatever the smallest unit of delivery into the main repository is).
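The pairing exchange can be sketched the same way. This is a deliberately tiny model, assuming each repository is just a Python set of changeset ids; the names `mine`, `hers`, and the `spike-*` ids are made up for illustration.

```python
central = {"c1"}
mine = set(central)   # my clone
hers = set(central)   # my pair's clone

mine.add("spike-1")   # I commit locally
hers |= mine          # she pulls from me
hers.add("spike-2")   # she commits locally
mine |= hers          # I pull from her

# Work that exists only in our two repositories so far.
in_progress = mine - central

central |= mine       # one of us pushes at the delivery point
```

Until that last line runs, the central repository is untouched, which is exactly why a short spike between two people doesn’t put the team’s build at risk.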
In short, the best feature of a DVCS is that I can submit works-in-progress that aren’t ready to share with anyone, or with the whole team. Later, after committing and pushing the finished set of changes, all the interim steps are available to the team. They can see the missteps and trials as well as the finished version. It provides a more complete history of what happened.
DVCS, used properly, means the shared repository is broken less often. I want developers on a project to commit with the highest frequency possible. Those smaller changes enable faster integration. That leads to greater stability.
But concentrating on smaller changes makes it hard to investigate alternatives, try different designs, and make early-stage mistakes. People are too afraid of “breaking the build.” DVCS avoids that: commit, but don’t push. Push when you reach a slightly larger-grained (but not too large) checkpoint that should integrate well.
Also, as I mentioned above, DVCS makes it trivial to create sub-teams to attack smaller problems. They work on their own branch, and when that is ready to integrate, the new feature gets pushed into the main branch.
Here, using DVCS enables people to experiment, and enables sub-teams to collaborate for short spikes.
DVCS takes some of the pressure off the IT infrastructure. Every developer has a reasonably up to date copy of the entire source archive.
If our nightly backup failed AND the server storing the main repo failed AND a recent backup couldn’t be found elsewhere, we still should be able to put together a pretty reasonable version of the latest software. This is not to say that regular infrastructure isn’t important, but one more backup safeguard is a good thing.
In addition, using simple commands built into the DVCS, we can deliver the entire history of the project to our customers, in cases where we have work-for-hire agreements. It’s much easier than developing an out-of-band facility to move code and history from one organization to another.
Here, DVCS means I have created several low-friction backups, including copies and archives located offsite with the customer.
I’ve been extolling the virtues of DVCS for several paragraphs. However, it’s not for everyone. DVCS tools use a workflow that often requires merging changes. The more frequently every member of the team syncs with the main archive, the less friction you’ll encounter.
That means if you have a team that contains people who hide for weeks at a time while developing new features, a DVCS will expose more pain. In practice, I’ve had very few merge conflicts that were not resolved automatically, and correctly. However, our teams practice quick, short cycles. I think almost everyone tries to sync up with the main repository on a daily basis, at the longest. The longer you go between merges, the more painful it can get.
I’ve also found that large corporations are somewhat concerned about DVCS. The fact that every developer’s laptop contains the complete change history of some important project sends shivers down the corporate IT spine. What many small companies view as a great feature, these organizations view as a scary, irresponsible design decision.
As I said in my opening, every time I’ve used DVCS, I’ve had a central server. If your team can’t agree on a single location from which to build the deliverable code, you’re not ready for a DVCS. DVCS provides new ways to work, and enables people to experiment. If your team is going to use it to hide changes from each other or the central build mechanism, then it’s a bad idea.
I spent this post saying why I like DVCS in general. I do find the benefits, when properly applied, greatly outweigh the concerns. The single biggest benefit is that all these small changes team members have made get folded into the central repository. I view DVCS as a way to have a local repository AND a central repository AND have them work together. It’s not about ripping apart everything we like about classic Version Control Systems; it’s about supporting more workflows.
I’ve used Bzr, Mercurial, and Git. Of those three, Mercurial gave me the best experience. That’s what I’m using on every project that I can.