Scanning the Alexa Top 1M for git daemons

Git comes with a "git daemon" command that makes it easy to provide access to a repository so that it can be cloned with git clone git://host/repo.git. In this blog post I'll share the results of scanning the Alexa Top 1M websites for such exposed git daemons.

About git daemons

While working on a project with some colleagues, I wanted to share a git repository with them over the local network. A quick Google search led me to the git documentation describing the functionality provided by the git daemon command.

For example, you can run the following command from within a git repository or a folder that contains several repositories:

gehaxelt@LagTop /t/gitdaemons (master)> git daemon --reuseaddr --export-all --base-path=./

The --base-path parameter tells the daemon to treat all incoming request paths as relative to the given directory. The --export-all flag instructs the daemon to serve all repositories within that directory; otherwise one would need to put a file called git-daemon-export-ok into each repository that should be exported.
The documentation describes some more security-related features that are enabled by default, such as read-only access.
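As a sketch of the more restrictive default (the repository name below is just an example), one would instead mark each repository for export explicitly and omit --export-all:

touch ./myrepo.git/git-daemon-export-ok
git daemon --reuseaddr --base-path=./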

Anyway, once the daemon is up and running, it listens on port 9418/tcp:

gehaxelt@LagTop /t/localhost (master)> sudo netstat -tulpen|grep git
tcp        0      0 0.0.0.0:9418            0.0.0.0:*               LISTEN      1000       3462547    28396/git-daemon

The repository can then be cloned with the help of the git:// protocol:

gehaxelt@LagTop /tmp> git clone git://localhost/
Cloning into 'localhost'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.

Scanning for git daemons

As always, I decided to give it a try and see whether some hosts within the infamous Alexa Top 1M use this feature to export git repositories but did not get the security aspects right.

I logged in to my Raspberry Pi, downloaded the Alexa Top 1M list, extracted all domains and ran the following nmap command:

pi@raspberrypi ~/scan> sudo nmap -sS -p 9418 -iL top-1m.csv -oA gitdaemon.scan -v --open -Pn -n 

After a couple of days I checked back to see that the scan had finished (it actually took about 3.3 days to complete oO):

Nmap done: 999795 IP addresses (999786 hosts up) scanned in 287936.81 seconds
Raw packets sent: 1740141 (76.566MB) | Rcvd: 326678 (15.301MB)

With the finished nmap scan data saved in all three output formats, we can extract the IPs from the .nmap file and randomize the obtained IP addresses so that we do not send consecutive requests to the same network:

pi@raspberrypi ~/scan> cat gitdaemon.scan.nmap  | grep "scan report" | wc -l                                                                                                                                                                 
8549 

pi@raspberrypi ~/scan> cat gitdaemon.scan.nmap  | grep "scan report" | grep -oP "(\d+\.\d+\.\d+\.\d+)" | sort | uniq > gitport.ips.sorted

pi@raspberrypi ~/scan> cat gitport.ips.sorted | sort -R > gitport.ips

pi@raspberrypi ~/scan> wc -l gitport.ips
2690 gitport.ips 
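As a side note, -oA also produced grepable output, so the same list could presumably be derived from the gitdaemon.scan.gnmap file as well, for example:

grep "9418/open" gitdaemon.scan.gnmap | awk '{print $2}' | sort -u > gitport.ips.sorted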

The initially discovered ~8,500 domains map to only about 2,690 IPs, so on average about three websites share a server with an open git daemon port. Afterwards we use the following fish "one-liner" to iterate over all IPs and try to clone a potentially exposed repository:

pi@raspberrypi ~/scan> cat gitport.ips | while read ip
    timeout -k 30s 15s git clone "git://$ip/" "./findings/$ip/" 2>&1 | tee -a output.log 
end

Apparently the git clone command does not have a built-in timeout, so we use the timeout command to terminate a clone process after 15 seconds if it hangs for some reason (and to SIGKILL it another 30 seconds later if it ignores the SIGTERM). The short timeframe might not be enough to clone huge repositories, but we would see those aborted clones in the log file and could then re-issue them, for example as shown below.
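As a sketch of such a re-run (the IP placeholder and the more generous limits are only examples), one could simply repeat the command for the affected host with a larger timeout:

timeout -k 60s 600s git clone "git://XXX.ZZZ.YYY.QQQ/" "./findings/XXX.ZZZ.YYY.QQQ/" 2>&1 | tee -a output.log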

As we do not know any (possible) repository names, we simply try to clone the default/root repository (/) if there is one. The whole process took about a whole night.

The results

Spoiler: There was not a single cloned repository in the ./findings/ directory, but let's have a look at some numbers and observations in this section anyway!

The output.log file contained several different error messages:

Cloning into './findings/XXX.ZZZ.YYY.QQQ'...
Cloning into './findings/XXX.ZZZ.YYY.QQQ'...
Cloning into './findings/XXX.ZZZ.YYY.QQQ'...

A huge chunk of the log file consisted of such lines without any follow-up error, which indicates that we ran into the timeout and no cloning happened. Maybe the nmap scan was not fully reliable and/or our IP had already been blacklisted (?)

Cloning into './findings/XXX.ZZZ.YYY.QQQ'...
fatal: read error: Connection reset by peer

This error occurred about 600 times and implies that the remote server reset the connection. A possible reason might be that the service on that port was not a git daemon, or something else simply did not want to talk to us.

Cloning into './findings/XXX.ZZZ.YYY.QQQ'...
fatal: unable to connect to XXX.ZZZ.YYY.QQQ:
XXX.ZZZ.YYY.QQQ[0: XXX.ZZZ.YYY.QQQ]: errno=Connection refused

We received this error 39 times; the connection to the service was actively refused.

Cloning into './findings/XXX.ZZZ.YYY.QQQ'...
fatal: remote error: access denied or repository not exported: /

We got this error more than 200 times; it tells us that we either do not have access to clone anything or the specified repository does not exist. However, this at least seems promising, because it means some kind of git daemon is actually running on the system.

Cloning into './findings/XXX.ZZZ.YYY.QQQ'...
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

We got this error message exactly 36 times; it states much the same thing as the previous one.
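For reference, counts like the ones above can be reproduced straight from the log file by grepping for the respective error strings, e.g.:

grep -c "read error: Connection reset by peer" output.log
grep -c "errno=Connection refused" output.log
grep -c "access denied or repository not exported" output.log
grep -c "Could not read from remote repository" output.log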

All in all, we didn't successfully clone any repository, but we got several strong indicators of where a repository could exist. To be honest, I was a bit disappointed with that result, but it seems the default options of the git daemon command were chosen wisely and with security in mind to prevent such unwanted leaks. At the very least, --export-all must be set explicitly or a git-daemon-export-ok file created in a repository to export it. If the --base-path parameter is not set, we would also need to know the full path to the repository on the remote system. Furthermore, only read-only access is granted by default, so an attacker cannot push malicious code unless --enable=receive-pack is set.
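For comparison, write access has to be enabled explicitly by the administrator, e.g. with something along these lines (clearly not something you want on an internet-facing host):

git daemon --reuseaddr --export-all --base-path=./ --enable=receive-pack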

Ideas for the future

Here are some ideas that crossed my mind while analyzing the data, but I didn't have the time to further explore them.

Brute forcing repository names

One idea was to compile a list of commonly used repository names and then brute-force the endpoints with it, hoping to find a correct path to an exposed repository, along the lines of the sketch below.
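A rough fish sketch of that idea, assuming a hypothetical wordlist repo-names.txt with entries such as repo.git, project.git or website.git:

cat gitport.ips | while read ip
    cat repo-names.txt | while read name
        timeout -k 30s 15s git clone "git://$ip/$name" "./findings/$ip/$name/" 2>&1 | tee -a brute.log
    end
end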

Finding anonymous repositories

Some open source projects allow anonymous, read-only access to their repositories. One example is Cygwin, which can be found within our results:

Cloning into './findings/209.132.180.131'...
fatal: remote error: access denied or repository not exported: /

The IP address belongs to cygwin.com and sourceware.org:

gehaxelt@LagTop /b/g/h/r/g/scan> grep 209.132.180.131 gitdaemon.scan.nmap
Nmap scan report for cygwin.com (209.132.180.131)
Nmap scan report for sourceware.org (209.132.180.131)

The Cygwin website points to the anonymous git repository, and we can indeed clone it with the right repository name:

gehaxelt@LagTop /tmp > git clone git://209.132.180.131/git/newlib-cygwin.git
Cloning into 'newlib-cygwin'...
remote: Counting objects: 172787, done.
remote: Compressing objects: 100% (35598/35598), done.
remote: Total 172787 (delta 137212), reused 170722 (delta 135264)
Receiving objects: 100% (172787/172787), 116.39 MiB | 4.57 MiB/s, done.
Resolving deltas: 100% (137212/137212), done.

So maybe one can pinpoint more such repositories by having a closer look at the IP addresses and projects/websites/companies behind them.
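A starting point, reusing the grep from above, could be a small fish loop that maps every discovered IP back to the domains in the scan report:

cat gitport.ips | while read ip
    grep "($ip)" gitdaemon.scan.nmap >> ip-to-domains.txt
end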

-=-