August 23, 2020

SVN to Git Migration

Recently, we decided to migrate one of our "legacy" products from an SVN repository to git, to be hosted on Bitbucket. The full history had to be preserved. While researching this topic, I was surprised to find out that git has built-in support for SVN!

Steps

Note that I'm only migrating the trunk in this post, but you can also migrate the full SVN structure including branches and tags.

  1. Get the list of users that have committed to SVN (from PowerShell): PS C:\MySVNRepo> svn log --quiet | ? { $_ -notlike '-*' } | % { "{0} = {0} <{0}>" -f ($_ -split ' \| ')[1] } | Select-Object -Unique | Out-File 'authors-transform.txt' -Encoding utf8
  2. Open the authors-transform.txt generated above and add the email addresses of the users. This step is optional, but git records an email address in every commit, and having the real ones helps the remote map commits to users once the repo is pushed.
  3. "Clone" the SVN repo as a git repo using the git svn clone command, e.g. in bash: git svn clone http://<svn repo URL>/trunk --prefix=svn/ --no-metadata --authors-file "authors-transform.txt" /c/repos/migrated-repo --username <svn user name>. (It will prompt for the SVN password.)
  4. Create an empty repository on the git server, such as Azure Repos/Bitbucket/GitHub, and get the clone URL.
  5. Add the clone URL as origin in the git repo: git remote add origin <clone URL>
  6. Push to the git server: git push -u origin master
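
For anyone working in bash rather than PowerShell, step 1 could be sketched roughly as below. This is an assumption-laden equivalent, not the command used in this migration: the function name authors_from_svn_log is made up for illustration, and it assumes the usual "rNNN | user | date" layout of svn log --quiet. Shell redirection writes no BOM, which matters later on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `svn log --quiet` output (read from stdin)
# into "user = user <user>" lines, one per distinct committer.
authors_from_svn_log() {
  awk -F'|' '/^r[0-9]+ / {
    gsub(/^ +| +$/, "", $2)            # trim the spaces around the user name
    print $2 " = " $2 " <" $2 ">"
  }' | sort -u
}

# Usage, from inside an SVN working copy:
#   svn log --quiet | authors_from_svn_log > authors-transform.txt
```

The placeholder email in angle brackets is still meant to be edited by hand, as in step 2.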

If you plan to use Git LFS, you can run git lfs migrate between steps 3 and 4.

Migration..?

Now, as you might know already, when you hear the word "migration", it never goes smoothly. Below are some of the issues that I had to deal with.

Git 2.27.0

At the time, I was using git 2.27.0, and of course, git svn is broken in that release. I switched to the 2.28 release candidate, since 2.28 wasn't officially out yet. I suppose I could've downgraded as well.

Author Not Defined

In the middle of the process, the clone aborted with the message, Author: [user] not defined in authors-transform.txt file. The [user] was the first user listed in the file. I had used the -Encoding utf8 option in the PowerShell command, which writes a byte order mark (BOM) at the start of the file, and git didn't like the BOM. So I opened the file in Notepad++ and converted it to UTF-8 without BOM, and that did the trick.
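
The same fix can be done from bash instead of Notepad++. A minimal sketch, assuming the BOM is the standard three UTF-8 bytes (EF BB BF) at the very start of the file; strip_bom is a hypothetical helper name, not a git or SVN command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove a leading UTF-8 BOM from a file in place.
# Files without a BOM are left untouched.
strip_bom() {
  local file="$1"
  # Read the first three bytes and render them as lowercase hex.
  if [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
    # Keep everything from byte 4 onward.
    tail -c +4 "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  fi
}

# Usage:
#   strip_bom authors-transform.txt
```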

Hanging, Timeouts and Other Errors

The SVN repo had a lot of revisions, riddled with large binaries. After fetching some revisions, the clone hung: no activity for a while. In Task Manager, perl was using about 50% of the CPU, and the folder was not growing. So I canceled the command, cd'ed into the folder, and ran git svn fetch --authors-file "../authors-transform.txt" as per this article. That made it continue from where it left off: it detected that the last retrieved revision was incomplete and started over from that revision. All the errors below were resolved the same way:

  • Connection timed out: Connection timed out at C:/Program Files/Git/mingw64/share/perl5/Git/SVN/Ra.pm line 312
  • 1 [main] perl 44975 cygwin_exception: Dumping stack trace to perl.exe.stackdump
  • Failed to commit, invalid old:
  • Name or service not known at C:/Program Files/Git/mingw64/share/perl5/Git/SVN/Ra.pm line 312.
    (This can happen if your DNS goes down, yes, it happened.)
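
Since every one of these errors was cleared by re-running git svn fetch, the re-running can be automated. The following is only a sketch: retry and RETRY_DELAY are hypothetical names, and it only helps when the command exits with an error. A hang still has to be canceled by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds,
# giving up after a maximum number of attempts.
# RETRY_DELAY (seconds between attempts) is an assumed knob, default 30.
retry() {
  local max_attempts="$1"; shift
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# Usage, from inside the cloned folder:
#   retry 20 git svn fetch --authors-file "../authors-transform.txt"
```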

Checksum Mismatch

This was a tricky one, and it occurred on random files. After many trials, I was able to avoid this error by running the git svn clone on the SVN server machine itself, e.g.:

git svn clone file:///c/SVNData/MyProject/trunk --prefix=svn/ --no-metadata --authors-file "authors-transform.txt" /c/gitrepo/my-project

I really don't know what the root cause is. Maybe our internal network is unstable, or the SVN web server (CollabNet) had some issues. It was a large SVN repo, though: the clone took about 24 hours to complete.

New Changes in SVN

For this migration, I didn't have to worry about two-way support, since we were planning to retire SVN after migrating to git. But I did have to get some new changes from SVN after the initial migration to git.

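I used git svn fetch as above to get the latest SVN changes into git. This is not enough, though: the fetch only updates the remote-tracking branch, so I still needed to bring those changes into the master branch. This SO answer seems incorrect, or at least it did not work for this setup, most likely because the --no-metadata option used in the clone leaves out the git-svn-id lines that git svn rebase looks for in the commit messages. Running git svn rebase -l gave the following error message after running for a while: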

$ git svn rebase -l
Unable to determine upstream SVN information from working tree history

Simply running git merge remotes/svn/git-svn worked.

LFS

If you run LFS migrate, you may no longer be able to run git svn fetch again and do a merge to master to bring it up to the latest:

$ git merge remotes/svn/git-svn
fatal: refusing to merge unrelated histories

LFS migrate rewrites commits, so new hashes are created for them, and git will think the master branch and the svn remote branch are completely separate since they no longer share a common ancestor (remember, LFS rewrites the very first commit to add the .gitattributes file). There are ways around it, such as specifying the --allow-unrelated-histories option, but it may get ugly since all commits are technically different, and git will warn you about merging binaries as well:

$ git merge remotes/svn/git-svn --allow-unrelated-histories
warning: Cannot merge binary files: Libraries/MyLibrary.dll (HEAD vs. remotes/svn/git-svn)
warning: Cannot merge binary files: Libraries/MyLibrary.exe (HEAD vs. remotes/svn/git-svn)
...
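
Before attempting such a merge, it is possible to check whether the two branches still share any history. A small sketch using git merge-base, which exits with a non-zero status when it finds no common ancestor; have_common_ancestor is a hypothetical wrapper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) only if the two refs
# share at least one common ancestor commit.
have_common_ancestor() {
  git merge-base "$1" "$2" > /dev/null 2>&1
}

# Usage, from inside the migrated repo:
#   have_common_ancestor master remotes/svn/git-svn \
#     || echo "master and the SVN branch no longer share history"
```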
...

One way to handle this would be by making backups before each step, and then you can go back to the point right before you ran LFS migrate, run git svn fetch, merge to master, then run LFS migrate again.
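
A backup here can be as simple as copying the whole repo folder, which also keeps the git svn bookkeeping stored under .git. A minimal sketch; backup_repo and the .bak-<timestamp> naming are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a repository folder (including .git)
# to a timestamped sibling folder and print the backup path.
backup_repo() {
  local repo="${1%/}"                      # drop a trailing slash, if any
  local dest="${repo}.bak-$(date +%Y%m%d-%H%M%S)"
  cp -a "$repo" "$dest" && printf '%s\n' "$dest"
}

# Usage, right before running LFS migrate:
#   backup_repo /c/repos/migrated-repo
```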
