A move from Hg to Git

How to convert a Mercurial repository into a Git repository

Our team once had a Mercurial (Hg) monorepo containing 3 significant projects and several smaller ones, with an 8-year history of about 30,000 commits. We set out to split the monorepo into separate repositories and migrate each one to Git while preserving the full history. The goal looked unreachable at first, but the solution turned out to be quite simple.

Here’s how an Hg monorepo can be split into separate repos and converted to Git.

Be careful with your data: make backups, and don’t forget to check everything at the end of the process.

1. Split the Hg repository (optional)

Let’s imagine an Hg repo with the following structure:

.
├── readme.txt
├── .hgignore
├── Project1
├── Project2
└── Src
    ├── run.sh
    ├── Module1
    ├── Module2
    ├── Module3
    └── Project3

and the Project1 directory needs to be split off into its own repo with its full history preserved.

The Convert extension can do this. It ships with Mercurial but has to be enabled before use. Here is how to apply it:

1.1. Prepare the mapping file

Create a mapping file, map.txt, that describes which directories or files to include in the resulting repo (or exclude from the original repo). For example, to create a separate repo that contains only the Project1 files, it could look like this:

exclude "readme.txt"
exclude "Project2"
exclude "Src"
rename "Project1" .

With this file, everything except the files in the Project1 directory is excluded, and the files from Project1 are moved to the root directory. However, the resulting repo will contain a lot of unnecessary commits and branches, which bloats its size.

A better way is to include only the necessary files. For example:

include "Project1"
include ".hgignore"
rename "Project1" .

In this case the resulting repo will contain only the commits and branches related to the included files.

1.2. Divide the repository

Run the following command, passing the mapping file as a parameter:

hg convert --filemap map.txt ${path_to_origin_hg_repo} ${path_to_create_the_result_repo}

When the process completes, the target path will contain a new Hg repo with only the included files and their history.

My first conversion attempt excluded the unnecessary directories and files; my second attempt included only the necessary ones. The first result was about 500 times larger than the second (2 GB vs. 4 MB) and had twice as many branches. This probably happened because I didn’t exclude some directories or files that had once been part of the project structure but had already been removed by the time of the conversion.
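
Since the goal was to split several projects out of one monorepo, steps 1.1 and 1.2 can be wrapped in a small function and called once per project. This is only a minimal sketch, not the exact script we used: the function name split_project, the maps/ subdirectory and the paths in the usage comment are hypothetical, and it assumes each project lives in a top-level directory of the monorepo.

```shell
# Sketch: split one top-level directory of a monorepo into its own Hg repo.
# split_project and the maps/ directory are hypothetical names.
split_project() (
    src_repo="$1"      # path to the original Hg monorepo
    project="$2"       # top-level directory to extract, e.g. Project1
    out_root="$3"      # directory that will hold the split repositories

    mkdir -p "$out_root/maps" || exit 1
    map_file="$out_root/maps/$project.txt"

    # Include-style filemap: keep the project and the ignore file,
    # then move the project's files to the repository root.
    cat > "$map_file" <<EOF
include "$project"
include ".hgignore"
rename "$project" .
EOF

    # Enable the bundled convert extension for this invocation only.
    hg --config extensions.convert= convert \
        --filemap "$map_file" "$src_repo" "$out_root/$project"
)

# Usage (assumed paths):
#   for p in Project1 Project2; do
#       split_project /path/to/monorepo "$p" ./split || break
#   done
```

A nested project such as Src/Project3 would need its own output name, which this sketch does not handle.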

2. Convert Hg repository into Git

2.1. Prepare the script

Get the hg-fast-export.sh script by cloning the repository referenced in the instructions on git-scm.com:

git clone https://github.com/frej/fast-export.git

2.2. Prepare the authors mapping file (optional)

Create an authors.txt mapping file if you want to add emails to commit authors or fix them. Git is stricter about the contents of the author field; for example, systems like GitHub or GitLab may not link commits to your account if they lack an email. Commits made by the same person under different names can also be rewritten under a single name.

To build this file, you can start with the command below:

hg log | grep '^user:' | sort -u | sed 's/^user: *//' > authors.txt

This command lists all the authors in the Hg repository, for example:

johndoe
JohnDoe@desktop
johndoe@JD-PC
janesmith
Jane@PC-12311 <janesmith@company.com>

The resulting file has to contain pairs of an old author and a new author, and each old author has to be written exactly as it appears in the Hg history. So, in this case, the final authors.txt file could be:

"johndoe"="John Doe <johndoe@company.com>"
"JohnDoe@desktop"="John Doe <johndoe@company.com>"
"johndoe@JD-PC"="John Doe <johndoe@company.com>"
"janesmith"="Jane Smith <janesmith@company.com>"
"Jane@PC-12311 <janesmith@company.com>"="Jane Smith <janesmith@company.com>"

We had many different names for each user in our repo, so this was very useful.
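
With many author variants, typing the mapping by hand is error-prone. The sketch below shows one possible way to generate a starting template and to find authors that have no entry yet. Both function names are hypothetical, and it assumes the raw list has one Hg author per line, as produced by the command above.

```shell
# Sketch: helpers around the authors mapping file (hypothetical names).

# Turn a raw author list (one Hg author per line) into a template where each
# author maps to itself; edit the right-hand sides by hand afterwards.
make_authors_template() {
    sed -n '/./s/.*/"&"="&"/p' "$1"
}

# Print Hg authors that have no entry in the mapping file.
# A key has to match the Hg author string exactly to be found here.
find_unmapped_authors() (
    raw_list="$1"
    mapping="$2"
    while IFS= read -r author; do
        grep -qF -- "\"$author\"=" "$mapping" || printf '%s\n' "$author"
    done < "$raw_list"
)

# Usage (assumed file names):
#   make_authors_template raw_authors.txt > authors.txt
#   find_unmapped_authors raw_authors.txt authors.txt
```

Running the second helper before the conversion catches keys that differ from the Hg author string by a single character.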

2.3. Initialize Git repository

Create a new Git repo in the target directory:

git init

2.4. Make a conversion

Run the hg-fast-export.sh script from the directory with the newly created Git repo to perform the conversion:

../${path_to}/hg-fast-export.sh -r ${path_to_hg_repo} -A ../${path_to}/authors.txt --force
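
Steps 2.3 and 2.4 can be combined into one function when several repos have to be converted. This is a sketch under assumptions: convert_to_git is a hypothetical name, all paths are expected to be absolute, and the flags are the same ones used in the command above. The final git checkout HEAD follows the fast-export README, because the import leaves the working tree empty.

```shell
# Sketch: create a Git repo and import an Hg repo into it with hg-fast-export.
# convert_to_git is a hypothetical helper; the flags are the ones used above.
convert_to_git() (
    hg_repo="$1"        # absolute path to the (split) Hg repository
    git_repo="$2"       # directory for the new Git repository
    fast_export="$3"    # absolute path to the cloned fast-export directory
    authors="$4"        # absolute path to authors.txt

    mkdir -p "$git_repo" && cd "$git_repo" || exit 1
    git init || exit 1

    # Import the whole history, rewriting authors on the way.
    "$fast_export/hg-fast-export.sh" -r "$hg_repo" -A "$authors" --force || exit 1

    # The import only writes Git objects and refs; check out the files.
    git checkout HEAD
)

# Usage (assumed paths):
#   convert_to_git /work/split/Project1 /work/git/Project1 /work/fast-export /work/authors.txt
```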

2.5. Change the ignore file

Rename .hgignore to .gitignore and commit the change to a dev branch.

By the way, the rename can also be done during the Hg repo split by adding the line rename ".hgignore" ".gitignore" to the mapping file.
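
A plain rename is enough only when the patterns mean the same thing to both tools. Mercurial reads .hgignore entries as regular expressions unless a "syntax: glob" line switches the mode, while .gitignore uses glob-style patterns. The sketch below is one possible way to handle this, not something we ran during the migration: the hypothetical helper convert_hgignore copies glob patterns as they are and comments out regexp patterns for manual translation. It does not handle per-line prefixes such as glob: or re:, and copied glob patterns still deserve a check.

```shell
# Sketch: derive a .gitignore from a .hgignore (convert_hgignore is a
# hypothetical helper). Glob patterns are copied as they are; regexp patterns
# are commented out so they can be translated by hand.
convert_hgignore() {
    awk '
        BEGIN { mode = "regexp" }   # Mercurial treats patterns as regexps by default
        /^syntax:[ \t]*glob[ \t]*$/   { mode = "glob";   next }
        /^syntax:[ \t]*regexp[ \t]*$/ { mode = "regexp"; next }
        /^[ \t]*$/ || /^#/            { print; next }   # keep blanks and comments
        mode == "glob"                { print; next }
        { print "# TODO translate regexp by hand: " $0 }
    ' "$1"
}

# Usage: convert_hgignore .hgignore > .gitignore
```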

3. Upload Git repository to server

3.1. Create the remote

Create an empty project on your hosting service (GitHub, GitLab, Gitea, Bitbucket or something else) and copy the URL of the remote repo.

3.2. Add the remote

Add the remote URL to your converted Git repo:

git remote add origin ${remote_url}

3.3. Push

Push all branches:

git push --all origin

or only the necessary ones (for example, main and develop):

git push origin main develop
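
Note that git push --all covers branches only. If the conversion produced tags that you want on the server, they need a separate push. Steps 3.2 and 3.3 could then be sketched as follows; push_everything is a hypothetical helper name, and the remote URL in the usage comment is a placeholder.

```shell
# Sketch: push every branch and then every tag to the new remote.
# push_everything is a hypothetical helper around plain git commands.
push_everything() (
    git_repo="$1"       # path to the converted Git repository
    remote_url="$2"     # URL of the empty project created on the hosting

    cd "$git_repo" || exit 1
    git remote add origin "$remote_url" || exit 1
    git push --all origin || exit 1    # all local branches
    git push --tags origin             # tags are not covered by --all
)

# Usage (placeholder URL):
#   push_everything /work/git/Project1 git@example.com:team/project1.git
```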

Subrepositories

If you used subrepositories in Hg, you will need to convert them too.

In my case, 2 libraries were connected to the main repo as subrepositories. I converted each of them to Git and, because they were libraries, added CI that uploads every release to the package manager. Then I added these packages as dependencies of the main project. So package management replaced subrepositories for me because it suited my case better (and it may work for you too), but of course it does not replace everything subrepositories can do.

If you still want to use subrepositories after migrating to Git, you can try submodules and convert your .hgsub into .gitmodules. If that does not fit your needs, the git-subrepo project offers an alternative way to manage subrepositories in Git. In both cases you need to convert the main Hg repo and the subrepos, then reconnect them with one of these tools.

Possible errors

  • I had a problem with the encoding of commit messages when running hg-fast-export.sh on Windows and couldn’t find a way to fix it, so I ran the script only on Linux.
  • A Mercurial repo can look empty after hg convert, but it is not: the history is there, only the working directory has not been updated yet.
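
To check a converted repo that looks empty, it is usually enough to look at the log and then update the working directory. A minimal sketch, where show_converted_repo is a hypothetical helper name and plain hg update is assumed to pick a suitable revision:

```shell
# Sketch: confirm a converted Hg repo has history, then populate its files.
# show_converted_repo is a hypothetical helper name.
show_converted_repo() (
    cd "$1" || exit 1
    hg log --limit 3 || exit 1   # the history is there even if no files are visible
    hg update                    # check out the working copy
)

# Usage (assumed path):
#   show_converted_repo ./split/Project1
```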

Conclusion

After this move, our team started working with Git. The projects were divided into separate repositories, which simplified our workflows.

Work became a lot easier (and safer) once developers no longer had to pull 8 projects when they only needed one. Another reason for the move was that this was our last repo in Hg; our other projects were already in Git. We also gained Merge Requests and the ability to use powerful tools like GitLab.

The entire 8-year commit history of each project was still available.

The only problem was the amount of unnecessary garbage in the migrated Git repos. We had a lot of remote branches (about 3000) in Hg because every feature branch was pushed to the server, and remote branches in Hg cannot be removed, only closed. I could not separate the important ones from the unimportant ones, so I had to push them all to Git.

It would have been better to remove closed and unused branches from the converted repo and to filter out “empty” commits, such as merge commits or hg close commits, before the push.

Maybe I’ll write a follow-up note about cleaning up the Git repo.

Thanks

Thanks to my coworker Eugene for sharing the following links with me