Time to leave CVS ?

2007/04/14

A friend sent me link two days ago making me aware that Mozilla decided to change the source code control system (= SCCS) they use. When such large and important project as Mozilla moves ahead and changes something so basic and fundamental to development as the SCCS, there must be good reason behind it. What was even more interesting was the system selected: no, it was not Subversion, but something much less established – Mercurial. Actually, until they selected it I was barely aware of its existence.

For many years, the synonym for Version Control was CVS. At least in the open source area and really large projects, CVS was the SCCS. Then things started to change and today if you look around in large open source projects, you will see that CVS and Subversion probably still lead but several really large projects are using very different tools e.g. Linux kernel is using  Git.

So – is it time to review our technology toolkit in such basic tool as source code control ? Writing is on the wall – in one of the few remaining computer magazines in Chapters I glanced over today (forgot the name, something Linux related) – was a review of the current free version control systems. The article was comparing and evaluating RCS, CVS, Subversion, GIT, Bazaar, Monotone. In their evaluation the best marks were assigned to Subversion – it got 9/10 points. Mainly because ecosystem, support, documentation, user base and add-on/tools support. The next runner up was one of the new kids on the block – distributed version control system (DVCS) named Bazaar. Good old CVS ended up with 6 points and granddaddy RCS with 3/10.

Among the new tools, two important new trends are visible: new approaches to workflow by using distributed and decentralized VCS (= DVCS) rather than central server based and shift of the implementation platform from traditional very low level languages (C) and low level static languages (C++, Java) to dynamic and interpreted scripting languages.

Using dynamic, high level languages for the SCCS is natural result of increasing computing power of hardware – which makes speed and memory limitations disappear – as well as new, more complex usage scenarios which need more complicated software to address it. The dynamic languages such as Python or Ruby also have excellent support for networking and support most of communication protocols right out of the box and allow inherent portability between all supported platforms. This was often an issue for older systems written in C/C++. Bazaar – for example – is written in Python and can therefore runs on almost any platform.The shift towards decentralized and distributed systems is a logical continuation of the trend that started with abandoning the explicit (reserved) check-out mode of work. Anybody remembers the joys of Visual SourceSafe and issues when your colleague left for two weeks vacations and left checked-out few critical files ? In the systems using reserved checkout, every programmer had read-only copy of the code and could edit only files explicitly checked out – and checkout was limited to at most one person at any given time. The big change of CVS was allowing everybody having all code writable (in his/her own sandbox) and allowing parallel changes to the same file. The simultaneous changes from different people were merged using two step process – update (transfer the changes from repository to local sandbox) and commit – upload the local changes to single, central repository. The possible conflicts were resolved between update and commit.

What is the fundamental difference between centralized and decentralized VCS ? In my limited understanding of how the DVCS work (as I have not worked with any of them yet on real project): unlike with CVS where there was one central repository and every developer had local copy of single state only (sandbox), with DVCS every developer has own copy of whole repository and therefore access to all versions of all files without depending on the centralized server. Every node is a repository and every commit is (or can be) local. Rather than synchronizing the local sandbox with the central repository, the independent repositories are synchronized. Nodes can synchronize directly as long as the changes are made available to other nodes by “publishing” them- usually using HTTP or SSH/FTP. The content of your repository will thus depend on number of nodes you synchronized with – and their own synchronization history.

New models of interaction are just one of the new features emerging. The DVCS can easily simulate workflows and processes from the traditional VCS but also allows several workflows impossible with VCS – like committing changes and getting differences without any connectivity to remote repository. (To be completely fair, you can do that to certain extent with Subversion – because latest status from repository is locally cached, you can always do diff and see local changes made – or undo them, but only for one revision). The DVCS is also very flexible – allows very dynamic creation of groups and branches and does not have single point of failure.

So what is the answer for the question from the title ? Should we throw away traditional VCS and start using new distributed decentralized tools ? I am not sure. With flexibility and freedom comes overhead and responsibility. Synchronizing large repositories can be expensive – both in time and network bandwidth. What is more important, it may require changes to the project management style and additional effort invested into keeping codebase under control – see the Development metodologies in Bazaar manual. When no repository is central, it is much harder to say where is the latest, most complete version of the system’s code. And – unless you address it with your process – it may be hard to tell whether it is currently even available. The DVCS seems to make sense mostly when the team is very large and geographically distributed. I do not see many advantages for the smaller development team working in one location – other than fixing several CVS/Subversion annoyances and possibly providing better and easier merging. What is also important is tools availability and IDE integration. All this need to be better understood. Right now the answer for me to whether to switch: no.¬† But to start looking into DVCS and evaluating to see the pros and cons – absolutely yes.

Nice comparison of features of the various SCCS is available here.

Advertisements