Malicious Code Campaign on GitHub Repos: Is it Hype or a Dire Threat?

Nir Valtman
CEO & Co-Founder
March 5, 2024
Nir is an experienced information & application security leader, most recently as VP security at Finastra and CISO at Kabbage. Nir is a frequent public speaker at leading conferences globally, including Black Hat, Defcon, BSides, and RSA.

TL;DR

So, you've probably heard the buzz about the attack that created hundreds of thousands of malicious open-source repositories on GitHub, right? It's been all over the news, with articles warning us about the dangers lurking in forked repos. But let's take a step back and assess this situation.  

Is the attack being overhyped, or are we looking at a genuine cybersecurity apocalypse? Let's dive in.

What is a repo confusion attack?

Picture this: threat actors decide to fork a reputable repository. Then they sprinkle some malicious code into the mix. They automate that process to fork each repo thousands of times. Now, an unsuspecting developer clones one of these compromised repos and runs the program, malicious code included, which might hand over data on a silver platter or, worse, infect the developer's own system!

Scary, right?

Well, that is what happened across 100,000+ GitHub repositories in late February 2024. But before we all panic, let's not get carried away just yet...

What is the real risk of a repo confusion attack?  

Here's where things get a bit more nuanced. For this dastardly plan to work, you'd first need to somehow convince a developer to clone the forked repo on GitHub that contains the malicious code, rather than the original open-source repo. Remember, the original is just fine. It's a bit like convincing someone to take a wrong turn on their drive to work: possible, but it requires some social engineering effort.

For the attack to execute, the developer needs to run the code from the spoofed repo on their workstation. Even then, the impact is limited to that one workstation, assuming the malware doesn't propagate across the network.

If the developer decides to import the repo into the organization's source code management solution, the spoofed repo is now a company asset, and other developers might trust it as legitimate by default. That's when the alarm bells should really start ringing.

But let's be clear: this attack isn't executed when packages are merely downloaded via pip or other package managers; it requires cloning from GitHub and the execution of the code within those cloned (spoofed) repos.  
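Since the whole attack hinges on cloning the wrong repo, one cheap check is to compare a clone's origin URL against the upstream you meant to use. Here is a minimal sketch of that idea. The function names and the "owner/repo" slug format are illustrative assumptions, not part of any tool mentioned in this post, and the URL you pass in would typically come from `git remote get-url origin`.

```python
from urllib.parse import urlparse


def github_slug(remote_url: str) -> str:
    """Return 'owner/repo' (lowercased) for an HTTPS or SSH GitHub remote URL."""
    url = remote_url.strip()
    if url.startswith("git@github.com:"):
        # scp-style SSH remote, e.g. git@github.com:owner/repo.git
        path = url[len("git@github.com:"):]
    else:
        parsed = urlparse(url)
        if parsed.hostname != "github.com":
            raise ValueError(f"not a github.com URL: {remote_url}")
        path = parsed.path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[:-4]
    parts = path.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"cannot find owner/repo in: {remote_url}")
    return "/".join(parts).lower()


def is_expected_origin(remote_url: str, expected_slug: str) -> bool:
    """True only if the clone points at the upstream the developer intended."""
    return github_slug(remote_url) == expected_slug.lower()
```

A fork that lives under a different owner fails the comparison even when the repo name is identical, which is exactly the case a repo confusion attack relies on.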

Defending against a repo confusion attack

As we just saw, the path for this type of attack to work is long, with many opportunities for effective security controls to limit its impact.  

Git posture hardening

First off, harden your git posture. Limit who can create repos in your organization to prevent unwanted "guests." Do this by going to your organization settings on GitHub, then to “Member Privileges” and prevent repository creation by members.

This step gives you the ability to narrow the group of people within your organization who can import or create repos, thus proportionately limiting your org’s exposure to this type of repo confusion attack.
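If you manage many organizations, clicking through settings gets old fast. The same restriction can likely be applied through GitHub's REST API, which exposes a `members_can_create_repositories` field on its "Update an organization" endpoint. The sketch below only builds the request; the helper name is hypothetical, and the token is assumed to belong to an organization owner with the right scopes.

```python
import json
import urllib.request


def build_disable_repo_creation_request(org: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that stops members creating repos."""
    body = json.dumps({"members_can_create_repositories": False}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.github.com/orgs/{org}",
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

To apply it, pass the returned request to `urllib.request.urlopen(...)`. Keeping the build step separate makes it easy to review the payload before anything is changed in a live organization.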

Identify indicators of compromise in source code

Next up, make sure that you are scanning all source code repos for indicators of compromise. And if you spot something fishy, notify your developers and the security team straightaway.
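To make "scanning for indicators of compromise" a little more concrete, here is a rough sketch of what such a scan could look like. The indicator used here is an assumption chosen purely for illustration: a dynamic-execution call (`exec` or `eval`) pushed far off-screen by a long run of whitespace. It is a hypothetical heuristic, not Arnica's rule, and a real scan would rely on a vetted set of indicators.

```python
import re
from pathlib import Path

# Hypothetical indicator: 100+ spaces/tabs followed by exec( or eval( on one line.
PADDED_EXEC = re.compile(r"[ \t]{100,}.*\b(exec|eval)\s*\(")


def suspicious_lines(source: str) -> list[int]:
    """Return 1-based line numbers that match the padded-exec heuristic."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if PADDED_EXEC.search(line)
    ]


def scan_tree(root: str) -> dict[str, list[int]]:
    """Scan every .py file under root and map file path -> suspicious line numbers."""
    findings: dict[str, list[int]] = {}
    for path in Path(root).rglob("*.py"):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the whole scan
        hits = suspicious_lines(text)
        if hits:
            findings[str(path)] = hits
    return findings
```

Running `scan_tree(".")` from the root of a freshly cloned repo, before executing anything in it, gives you a list of files and lines worth a second look.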

At Arnica, we embedded the indicators of compromise in our platform for all of our customers, but you can detect them as well by simply running a custom Semgrep rule that we developed.

Here is the code:

Pipelineless application security testing

A pipelineless approach to application security closes a major gap in CI/CD- or IDE-based risk detection: coverage. Every repo is scanned from day one, with no pipeline or plugin to set up first. This means that even if malicious code was pushed by mistake, no matter where it was pushed, the risk would be scanned and detected. A pipelineless security approach even gives you the opportunity to message the developer directly so that they can fix the problem right after it is pushed.

Security training

Training is key. Educate your devs about the signs of a low-reputation repo – think fewer stars than a cloudy night, no recent commits, or lack of releases. It's like teaching someone to spot a fake designer bag; the details matter!
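The signs above can also be turned into a quick automated sanity check. The sketch below assumes you already have a repo's metadata as a dictionary shaped like GitHub's "Get a repository" REST response (`fork`, `stargazers_count`, `pushed_at`) plus a release count fetched separately. The function name and the thresholds are hypothetical and worth tuning to your own risk appetite.

```python
from datetime import datetime, timedelta, timezone


def low_reputation_signals(
    repo: dict,
    release_count: int,
    now: datetime,
    min_stars: int = 50,       # assumed threshold, not an industry standard
    max_idle_days: int = 365,  # assumed threshold, not an industry standard
) -> list[str]:
    """Return human-readable warning signs for a repository."""
    signals = []
    if repo.get("fork"):
        signals.append("repository is a fork")
    if repo.get("stargazers_count", 0) < min_stars:
        signals.append("few stars")
    # GitHub reports timestamps as ISO 8601 in UTC, e.g. 2024-03-01T12:00:00Z
    pushed_at = datetime.strptime(repo["pushed_at"], "%Y-%m-%dT%H:%M:%SZ")
    pushed_at = pushed_at.replace(tzinfo=timezone.utc)
    if now - pushed_at > timedelta(days=max_idle_days):
        signals.append("no recent commits")
    if release_count == 0:
        signals.append("no releases")
    return signals
```

No single signal proves a repo is malicious, since plenty of honest projects are small forks. The point is to prompt a second look before anyone clones and runs the code.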

So, are we getting too worked up about repo confusion attacks?

Look, I'm not saying there's no risk. What I am saying is that with the right precautions, the threat becomes far less menacing.

The media may love a good cyber scare story, but the reality is, with a little bit of savvy and a lot of precaution, we can avoid and otherwise navigate repo confusion attacks with our usual cool-headed competence. Malicious repositories on GitHub represent a real threat, but add a bit of awareness and some good practices, and voilà, you're already on your way to safeguarding your development ecosystem. So, let's keep our heads, educate and empower our teams, and remember that not every headline spells imminent doom.
