Leading the SVN to Git migration

Once upon a time I started a new challenge within a new company. This company grew up so fast that the development processes and tools were not suitable anymore. Most projects started with only one developer and, at the time the company chose Subversion (SVN) as the main Concurrent Versions System (CVS) and a classic Waterfall process (linear) to manage projects. This choices froze things for years..

Coming from a large team (10 members) in a totally Agile environment the gap I faced was kind of huge. The lack of productivity that the company faced resulted from those ancient choices. A colleague and I decided to change things by introducing several development tools such as Git, Gitlab (GitHub alike), Jenkins (Continuous Integration), Composer (Dependency Manager for PHP) and Scrum (Agile development framework).

The SVN to GIT transition can be pretty hard to handle, depending on the size of your company, the number of repositories at stake and the number branches, tags created…

To avoid losing time and money, things have to be well prepared. In this post I will share my experience and some tricks to lead successfully the SVN to GIT transition.

Why companies still use Subversion?

Despite the popularity of GIT nowadays there is still a tremendous amount of company that still use Subversion

Git is not better than Subversion. But is also not worse. It’s different. There are many reasons possible why companies still use Subversion:

  1. Subversion’s initial release was in 2000 while Git was in 2005. Any Software Company created before 2005 might therefore have used Subversion (It is the case of eBay, inc. for example)
  2. Subversion is less complex and suit better developers working alone. Indeed Git adds complexity. Two modes of creating repositories, checkout vs. clone, commit vs. push… You have to know which commands work locally and which work with “the remote”
  3. The company grew up too quickly: The lack of time and money that most start-ups face can lead to underestimating developers needs and making unreasonable architecture choices.
  4. The fear of changing well explained by Kurt Lewin being a three-stage process:
    1. Unfreezing: dismantling the existing “mind set”, defense mechanisms have to be bypassed
    2. Movement: When the change occurs, this is typically a period of confusion and transition
    3. Freezing: The new mindset is crystallizing and one’s comfort level is returning to previous levels

What does Git do better than SVN?

I invite you to watch this video about Linus Torvalds (Git creator and Linux project’s coordinator) tech talk at Google.

If you like using CVS, you should be in some kind of mental institution or somewhere else. Linus Torvalds

Git is Scalable

Git is perfect to handle medium to large projects because it is scalable, the more files and developers involved in your project the more you can leverage from it.

Git is Distributed

Git is a DVCS (Distributed Version Control System), meaning that every user has a complete copy of the repository data (including history) stored locally (While SVN has one Central repository). Developers can therefore commit off-line, and synchronize their work to distant repositories when back online.

Git branching and merging support is a lot better

SVN isn’t branch-centric while Git is designed around the idea of branching. Making branches, using branches, merging two branches together, merging three branches together, merging branches from local and remote repositories together – Git is good at branches.

Git Flow

Git-Flow is a set of git extensions that handle most high-level repository operations (feature, hotfix and release) based on Vincent Driessen’s branching model. I recommend using this tool but only if you understand Git basic commands (git branch, git checkout, git pull etc.) otherwise in some cases you will get lost.

OSX Installation
$ brew install git-flow
Linux Installation
$ apt-get install git-flow
Windows (Cygwin) Installation
$ wget -q -O - --no-check-certificate https://github.com/nvie/gitflow/raw/develop/contrib/gitflow-installer.sh | bash

Git is Speed

Git operations are much faster than on SVN. All operations (except for push and pull) are done locally, there is therefore no network latency involved for most of the daily routine commands (git diff, git log, git commit, git branch, git merge etc.). A Git repository is also around 30x smaller than a SVN repository which is not negligible when cloning (backup).

Preparation

Teaching

Being on GitHub does not mean being competent at GIT!

Teaching Git to developers is essential. Git can be hard to learn, especially if you are used to SVN. You need to insist on what’s better than SVN and the things that you now can do with Git that you could not with SVN. I have seen developers on Github that were lost using Git in a work environment, being on GitHub does not mean being competent at GIT!

Here is some of the slides I presented to developers:

[iframely]http://www.slideshare.net/julienrenaux/git-presentation-31666204[/iframely]

Creating a authors.txt file to map SVN users to Git users.

SVN and Git do not store the same way the developer identity when committing. SVN stores the username while Git stores the full name and the email address.

Therefore prior to migrating to Git, you need to create an author mapping file that has the following format:

fdeveloper  = First Developer <first.developer@company.com>
sdeveloper = Second Developer <second.developer@company.com>
tdeveloper  = Third Developer <third.developer@company.com>
etc..

If you have missed this step and already migrated to Git, don’t worry, we still can rewrite history using the command git filter-branch!

git filter-branch --commit-filter '
        if [ "$GIT_COMMITTER_NAME" = "fdeveloper" ]; // SVN username
        then
                GIT_COMMITTER_NAME="First Developer";
                GIT_AUTHOR_NAME="First Developer";
                GIT_COMMITTER_EMAIL="first.developer@company.com";
                GIT_AUTHOR_EMAIL="first.developer@company.com";
                git commit-tree "$@";
        else
                git commit-tree "$@";
        fi' HEAD

This command will go through every single commit and if necessary change the developer information.

Using this command could be pretty slow depending of the number of commit involved.

Migrating

git svn is a simple conduit for changesets between Subversion and git. It provides a bidirectional flow of changes between a Subversion and a git repository.

Clone

For a complete transparent transition (importing commit history) you will need to use GitSvn

$ git svn clone -A ~/Desktop/authors.txt svn://IP@/Project/trunk .

You can also tell git svn not to include the metadata that Subversion normally imports, by passing –no-metadata

$ git svn clone -A ~/Desktop/authors.txt svn://IP@/Project/trunk . --no-metadata

Migrate tags

This takes the references that were remote branches that started with tag/ and makes them real (lightweight) tags.

$ git for-each-ref refs/remotes/tags | cut -d / -f 4- | grep -v @ | while read tagname; do git tag "$tagname" "tags/$tagname"; git branch -r -d "tags/$tagname"; done

Conclusion

Do not underestimate the unwillingness of your coworkers!

If you want to succeed your migration I suggest you spend a lot of time teaching Git to developers, confront them with real cases and things that Git do better than SVN, do not underestimate the unwillingness of your coworkers!

Once everybody is up to date with Git, then you can start working on your migration. If you use GitSvn things should be pretty smooth but do not forget to make backups, just in case 😉

Sources