0xDE (11011110) wrote,

Version control

Although originally developed for managing large software systems, version control systems can be very helpful in writing scientific papers: they provide mechanisms for managing and tracking revisions to papers, allow multiple authors to work on the same files at once without having to take turns playing "who has the token" email games, provide easy mobility of content between different machines at home and in the office, and by replicating your content on multiple machines provide a form of backup making data losses less likely. And anyone who's recently written an NSF proposal knows that the NSF now requires a data management plan, and version control is an important part of any such plan.

Until now, I've been using CVS for version control for my single-author papers and most of my within-UCI collaborations, and SVN for some collaborations with other co-authors who have set up their own SVN repositories. But both CVS and SVN are getting old and creaky, so this weekend I started playing with Git instead.

In the context of academic writing, I think switching to Git will have some important advantages:

- The distributed version control model means that it will be possible to work offline (e.g. on an airplane) and still have access to the whole version history, not just the latest version. For the same reason, the whole history is backed up by multiple replicated copies, not just the most recent version. The distributed model also solves the problem of "do we host this at my institution or yours": we can do both!

- Software such as Gitolite should make it possible to manage co-authors from other institutions and give them access to shared master copies of Git repositories without having to create departmental logins for them and without having to deal with managing an Apache installation. And tools such as cvs2git should make it possible to transfer all our old history to the new system seamlessly.

- The default setup for Git repositories doesn't have the cumbersome branches/tags/trunk subdirectory structure that SVN has, which I never found to be particularly useful. Instead tags are handled as separate first-class objects in Git. I hadn't been using tags much in CVS/SVN but I think in Git they should be a good way of tracking major events in the lifetime of a paper such as submission to a journal or uploading to a preprint server.

- Git is being actively maintained by the open source community and is growing in popularity (e.g. see Wikimedia's move from SVN to Git) making it likely that it will continue to work well on whatever platforms I'm likely to use in the near to mid future.
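As a rough illustration of the tagging and "host it at both institutions" ideas above, here is a minimal Python sketch that builds the Git commands for marking a paper milestone and pushing it to more than one remote. The remote names `uci` and `coauthor`, the branch name `master`, and the `event-date` tag naming scheme are all assumptions made for the example, not anything prescribed by Git. By default it only prints the commands; nothing is run unless `dry_run` is turned off inside a real repository.

```python
import subprocess
from datetime import date


def milestone_tag_command(event, when):
    """Return the argv for an annotated tag such as submitted-2012-03-05.

    The event-date naming scheme is a hypothetical convention.
    """
    tag_name = f"{event}-{when.isoformat()}"
    message = f"Paper milestone: {event} on {when.isoformat()}"
    return ["git", "tag", "-a", tag_name, "-m", message]


def push_commands(remotes, tag_name, branch="master"):
    """Return one push per remote, sending the branch and the tag.

    The branch name "master" is an assumption; adjust it to your repository.
    """
    return [
        ["git", "push", remote, branch, f"refs/tags/{tag_name}"]
        for remote in remotes
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    tag_cmd = milestone_tag_command("submitted", date.today())
    tag_name = tag_cmd[3]
    # "uci" and "coauthor" are hypothetical remote names.
    run_all([tag_cmd] + push_commands(["uci", "coauthor"], tag_name))
```

Because each remote receives the same branch and tag, either copy can serve as the shared master copy, and `git tag -l` later gives a quick list of the paper's milestones.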

I also looked at Mercurial and Bazaar, which are in many ways similar, but the greater popularity of Git was a winning factor for me.

ETA: See also a post by Thore Husfeldt on getting Git metadata into TeX, and (via a comment on that post) the gitinfo package for doing the same thing.
Tags: tools