
Demystifying the Pl0x GitHub attack

Mike Doyle
Head of Security Research
August 17, 2022
Mike Doyle earned a Computer Science degree in 2003, just in time to watch the post-bubble job market dry up. Handy with a bash prompt, he found work as a system admin in an attempt to edge back into development. Instead, he moved toward security consulting and penetration testing (which is what he always wanted to do anyway). Doyle believes that hard problems require elegant solutions.

TL;DR

What appeared at first blush to be a massive repository compromise turned out to be a little automation mixed with a little knowledge of some esoteric git commands. If certain other actions had followed, like creating backdoored pull requests, things could’ve gotten much worse.


Introduction: Understanding the Pl0x GitHub attack

Software engineer Stephen Lacy found 35,000 surprises a week before Patch Tuesday. Many thousands of repos had code inserted that sent environment variables to a Russian virtual private server and ran code from that server. Strangely, these identically written backdoors appeared to have been committed by many different developers, in commits going back years.

Lacy notified GitHub, which quickly zapped the affected repos and reported the good news: they were all just clones of real repositories, and no real accounts had been compromised.

Shortly thereafter, a new, anonymous Twitter account replied to Lacy’s thread taking credit for the cloned repos, claiming that rather than trying to hack the planet, they were a security researcher in pursuit of a bug bounty, and that a report from them was forthcoming.

Patch Tuesday has come and gone and there’s been no follow-up from the security researcher. That leaves us with the puzzle: how did the researcher (hereinafter: pl0x) commit backdoors that GitHub attributed to other accounts?

Breaking down the details: attack of the clones

GitHub taking swift action on the repos is good operational security because it prevents anyone from mistakenly downloading and executing the backdoor, but it’s bad for security research because it makes the threat difficult to analyze. Fortunately, GitHub left a backdoored clone of the open-source development platform nanobox up long enough for the Wayback Machine to archive it. That may be because the clone has the word “malware” right in its title, so it’s rather unlikely that anyone would think it is a legitimate copy of nanobox.

Figure 1: Infected clone of nanobox. Nice of them to let us know!

This repo is an almost exact clone of nanobox. Except for the last commit, the dates, hashes, author names, and contents of all the commits are identical.  

GitHub has a useful feature that allows anyone to produce a “fork” of anyone else’s repo, but that feature wasn’t used to produce these clones. You can tell because a forked repo will have a link to the original repo. This is a useful security feature because it lets anyone browsing repos know that they might not be looking at the genuine article.

Figure 2: Acidburn0zzz's fork of nanobox.

How were these clones created if not by forking? One way to do it is to add a second remote to a local clone. This is a useful feature for producing mirrors. Say you want to have the same code on GitHub as on Bitbucket. You could do this:

$ git clone https://github.com/arnica-ext/GitGoat
$ cd GitGoat
$ git remote add bb-mirror https://bitbucket.org/arnica-ext/GitGoat

But there’s no rule saying you can’t have your remotes all pointing to the same service.  
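To make the mechanics concrete, here is a minimal sketch that never touches the network. The directory names `original` and `mirror.git` and the remote name `mirror` are made up for illustration; they stand in for two repositories that could just as easily live on the same hosting service.

```shell
#!/usr/bin/env bash
# Sketch: one local clone with two remotes. Local repos in a temp
# directory stand in for the hosted repositories (hypothetical names).
set -euo pipefail
work=$(mktemp -d)

# "original" plays the role of the upstream repo.
git init -q "$work/original"
git -C "$work/original" -c user.name="Example Dev" -c user.email="dev@example.com" \
    commit -q --allow-empty -m "initial commit"

# "mirror.git" plays the role of a second, empty hosted repo.
git init -q --bare "$work/mirror.git"

# Clone the original, then add the second remote and push to it.
git clone -q "$work/original" "$work/clone"
cd "$work/clone"
git remote add mirror "$work/mirror.git"
git push -q mirror HEAD

# Both remotes now hold the same tip commit.
branch=$(git symbolic-ref --short HEAD)
origin_tip=$(git rev-parse HEAD)
mirror_tip=$(git -C "$work/mirror.git" rev-parse "refs/heads/$branch")
remote_count=$(git remote | grep -c .)
git remote -v
echo "origin tip: $origin_tip"
echo "mirror tip: $mirror_tip"
```

Nothing in the pushed history records where it was cloned from, which is why the second repo carries no "forked from" link.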

Why would you do this? One reason is to prevent GitHub from letting other users know it’s a clone, but there might be another reason. Let’s look at that last commit.

Forging a backdoor

The below table documents the similarities and differences between the last commit of the original Nanobox repo and the cloned repo:

| Category | Original Nanobox | Cloned Nanobox |
| --- | --- | --- |
| Author name | Tyler Flint | Tyler Flint |
| Author email | tylerflint@gmail.com | tylerflint@gmail.com |
| Commit time | Mon Oct 21 15:58:15 2019 | Mon Oct 21 15:58:15 2019 |
| Commit scope | 2 files | 281 files |
| Commit changes | 4 lines added, 4 deleted | 8430 lines added, 4 deleted |

Table 1: Nanobox original vs clone comparison.
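A comparison like this can be made with plain git. The sketch below builds two throwaway local repos as fixtures (the names `original` and `altered` and the file names are invented for the example) and then reads off the same fields as the table: author, date, scope, and where the histories diverge.

```shell
#!/usr/bin/env bash
# Sketch: compare the tip commits of two repos that share history.
# All repos, files, and identities are throwaway fixtures.
set -euo pipefail
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

git init -q "$work/original"
cd "$work/original"
echo one > a.txt; git add a.txt; git commit -q -m "first"
echo two > a.txt; git commit -q -am "second"

# Fixture: a copy whose last commit has been altered to include more files.
git clone -q "$work/original" "$work/altered"
cd "$work/altered"
echo extra > b.txt; git add b.txt
git commit -q --amend --no-edit

# Print "name|email|author date" for the tip commit of the repo in $1.
summary() { git -C "$1" log -1 --format='%an|%ae|%ad'; }

orig_meta=$(summary "$work/original")
alt_meta=$(summary "$work/altered")
orig_parent=$(git -C "$work/original" rev-parse HEAD~1)
alt_parent=$(git -C "$work/altered" rev-parse HEAD~1)
orig_tip=$(git -C "$work/original" rev-parse HEAD)
alt_tip=$(git -C "$work/altered" rev-parse HEAD)

echo "original: $orig_meta"
git -C "$work/original" show --shortstat --format= HEAD
echo "altered:  $alt_meta"
git -C "$work/altered" show --shortstat --format= HEAD
```

In this fixture the author fields and the parent hashes match while the tip hashes and change counts differ, which is the same pattern the table shows.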

GitHub links the clone commit to the same author’s page. Clearly, Tyler Flint didn’t commit to the real nanobox in October 2019 and, at the exact same second, produce a backdoored clone. Something else is going on.

Figure 3: GitHub links the backdoored commit to the original author.

First, git makes it trivially easy to make commits as someone else. Just change your username and email.

$ git config user.name "Nir Valtman"
$ git config user.email "nirvaltman@arnica.io"

Using this one weird trick that your DevSecOps team doesn’t want you to know about will let you produce new commits as anyone, but pl0x didn’t do that. They backdoored an existing commit. Modifying an arbitrary commit is tedious yet possible, but changing only the last commit is a one-liner:

$ git commit --amend

Here’s an example:

$ cd GitGoat
$ gsed -i 's/CODEOWNERS/CODESTEALERS/g' run.py
$ git commit -a --amend --no-edit
  [main 779a2f0] updated secrets fetching bug
   Author: nir-valtman <nirvaltman@arnica.io>
   Date: Wed May 11 22:27:30 2022 -0400
   2 files changed, 6 insertions(+), 5 deletions(-)

Notice that git keeps the original commit date, even though I just modified the commit. It also keeps the original commit’s author name and email.

Also, consider that since I’ve amended the commit on a local clone, I can add a second remote and push to GitHub. The local clone matters because GitHub’s web UI doesn’t expose the commit amend feature.
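Under the hood, every commit stores two identities, an author and a committer, each with its own timestamp. The following sketch, using made-up names and a throwaway repo, shows which of those fields an amend leaves alone and which it rewrites.

```shell
#!/usr/bin/env bash
# Sketch: what "git commit --amend" keeps and what it rewrites.
# The repo, file, identities, and dates are illustrative fixtures.
set -euo pipefail
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Original commit, backdated so the dates are easy to tell apart.
echo "print('hello')" > run.py
git add run.py
GIT_AUTHOR_DATE="2020-01-01 12:00:00 +0000" GIT_COMMITTER_DATE="2020-01-01 12:00:00 +0000" \
git -c user.name="Original Author" -c user.email="author@example.com" \
    commit -q -m "add run.py"

# A different identity edits the file and amends that commit.
echo "print('tampered')" > run.py
git -c user.name="Someone Else" -c user.email="else@example.com" \
    commit -q -a --amend --no-edit

author=$(git log -1 --format='%an <%ae>')
committer=$(git log -1 --format='%cn <%ce>')
author_date=$(git log -1 --format='%aI')
committer_date=$(git log -1 --format='%cI')
printf 'author:    %s  %s\n' "$author" "$author_date"
printf 'committer: %s  %s\n' "$committer" "$committer_date"
```

The author line still names the original author and date, while the committer line reflects whoever ran the amend. Note that a script which first sets `user.name` and `user.email` to the author’s values, as the one later in this post does, makes the committer identity match too, so the committer field alone would not flag that variant.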

Putting it all together

Make a local clone, add a remote that you control, amend the last commit with a backdoor, push. It might look like the following shell script:

#!/usr/bin/env bash
# Call me with the source org/repo and the destination repo name.
# Have an ssh agent authed to the dest org.
set -euo pipefail
src=$1
dst=$2
dir=$(basename "$src")

# Clone the source locally.
git clone "https://github.com/$src" "$dir"

# Set my git username and email to the name and email of
# the most recent commit's author.
cd "$dir"
username=$(git log -1 --format='%an')
useremail=$(git log -1 --format='%ae')
git config user.name "$username"
git config user.email "$useremail"

echo "Infect the files here and press return, or Ctrl-C to exit"
read -r

# Amend the malware to the last commit.
# (-a only picks up changes to tracked files; new files need git add first.)
git commit -a --amend --no-edit

# Create the remote repo and push to it.
gh repo create "$dst" --private -r malware -s . --push

I ran this against our open-source repo testing tool GitGoat. You can see the results on GitHub, here.

Notice that GitHub links to Nir’s page. GitHub takes the author name and email recorded in the last commit and trusts them, acting as a confused deputy.

Figure 4: GitHub says that this is Nir's code, but it isn't.

Conclusion: There is a right and wrong way to 'bug bounty' & some best practices

What’s the takeaway from all this?

First and foremost, security researchers should all know by now that infecting packages with code that steals environment variables is not an innocent way to look for bug bounties. In containerized deployments, environment variables tend to contain secrets. In fact, this attack vector was used in the HauteLook attack I blogged about previously. Then again, depending on how much innocence you like to presume, an anonymous claim that you were just pursuing a bug bounty sounds a lot like plausible deniability for a hacker who got caught in the first phase of a massive reposquatting attack. If anyone had mistaken one of the cloned repos for a real repo, this would’ve been a nasty breach for them.

Besides reposquatting, this sort of clone could be used to create backdoored PRs for the original projects, although a fork instead of a clone would work just as well.  

We at Arnica have anticipated this sort of attack and have been working on a feature in arnica.io to identify and report when the code pusher differs from the code author. If you’d like to beta this with us, please reach out. Sign-up is free.

Figure 5: Arnica identifies when the code pusher differs from the code author.

Second, git is a more flexible VCS than a lot of folks realize. Like the Matrix, some of its rules can be bent, some can be broken.  

Finally, the author of a commit might not be the person who pushes it. Signing and verifying your commits with GPG is a good practice. Almost no one does it, though, and it only prevents this sort of attack when it is done consistently. We at Arnica will have an upcoming announcement related to this problem.
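As a rough illustration of what checking for signatures can look like, git’s log format has a `%G?` placeholder that reports the signature status of each commit (`N` means no signature, `G` a good one). The sketch below runs it against a throwaway repo with a single unsigned commit; the repo and identity are fixtures, and the signing setup is shown only as comments because it needs a real key.

```shell
#!/usr/bin/env bash
# Sketch: list the signature status of commits in a repo.
# To sign going forward you would typically run (key ID is a placeholder):
#   git config user.signingkey <YOUR_KEY_ID>
#   git config commit.gpgsign true
set -euo pipefail
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Fixture: one deliberately unsigned commit.
git -c user.name="Example Dev" -c user.email="dev@example.com" -c commit.gpgsign=false \
    commit -q --allow-empty -m "unsigned commit"

# Status letter, short hash, and author name for every commit.
git log --format='%G? %h %an'

# Count the commits that carry no signature at all.
status=$(git log -1 --format='%G?')
unsigned=$(git log --format='%G?' | grep -c '^N$' || true)
echo "unsigned commits: $unsigned"
```

A forged author name on an unsigned commit looks exactly like a genuine one, so a report like this is only meaningful in a project where every legitimate commit is expected to be signed.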
