Defending Against Source Code Exfiltration, Fast and Slow
It was early one Monday morning when Raquel, the marketing manager of a medium-sized global enterprise, googled her own name. There, amongst links to her social media pages and aging articles about her academic achievements, was a curious use of her name in reference to a development ticket she made some months prior. She clicked the link and was soon staring at a source code file for one of her company’s software products, the development of which was outsourced to a vendor. A little sleuthing led her to the discovery that all the source files (the entire repository) of this closed-source application, which they were paying millions of dollars to produce, were publicly available for download… by anybody.
Like any data breach, source code exfiltration (the disclosure of the human-written materials that comprise your software applications) can be accidental or intentional. The attacker can be a disgruntled employee or Lapsus$. The sources disclosed can be partial or complete. They can be disclosed to a select few, such as a competitor, or to the entire globe. But defending against it is a unique challenge.
Your developers are smarter than you!
Early in my security career, I had rolled out some expensive enterprise endpoint protection software. One of its features prevented the use of unauthorized USB drives. One day, the head of development showed me a cool trick. If he plugged in his thumb drive, the protection system popped up a window threatening to encrypt the drive’s contents in sixty seconds, like a little anti-virus ED-209. So he ejected it and plugged in a USB hub. Then, when he plugged his thumb drive into the hub, he could read and write files with no interference.
Lots of network-based solutions promise to detect exfiltration of different types of important data. A lot of them do good work stopping, say, credit card theft. But your developers are uniquely privileged to bypass these sorts of controls. They can move files between systems and to the cloud. Try to detect them, and they can zip them. Disallow transfer of zip files and they can encrypt them. Prohibit encryption and they can copy the text into a spreadsheet. Stop that from happening and they can write a small substitution program on the command line:
$ cat hello.go | tr 'packgeminfu(){t.Prl"S,Wod?}' 'abcdefghijklmnopqrstuvwxyz1'
abcdbef gbhi
hgaxro tjgot
jkic gbhilm n
jgopqrhiosiltukav wxrsyztm
1
That doesn’t look like source code at all, but swapping the order of the parameters to ‘tr’ is all you need to turn it back into code.
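The same round trip can be sketched in Python. This is only an illustration of the substitution above: the two character sets are copied from the ‘tr’ command, the scrambled text is the output shown above, and the encode and decode helper names are made up for this example. One caveat worth knowing: the reversal is only lossless when every character of the input that appears in the second set also appears in the first, which happens to hold for this small file.

```python
# Character sets copied from the tr command above.
PLAIN = 'packgeminfu(){t.Prl"S,Wod?}'
CIPHER = 'abcdefghijklmnopqrstuvwxyz1'

# str.maketrans pairs up characters by position, just like tr does.
ENCODE_TABLE = str.maketrans(PLAIN, CIPHER)
DECODE_TABLE = str.maketrans(CIPHER, PLAIN)


def encode(text: str) -> str:
    """Scramble text; characters outside PLAIN pass through unchanged."""
    return text.translate(ENCODE_TABLE)


def decode(text: str) -> str:
    """Reverse encode() by swapping the two sets."""
    return text.translate(DECODE_TABLE)


# The scrambled output shown above.
SCRAMBLED = (
    "abcdbef gbhi\n"
    "hgaxro tjgot\n"
    "jkic gbhilm n\n"
    "jgopqrhiosiltukav wxrsyztm\n"
    "1\n"
)

if __name__ == "__main__":
    # Prints the recovered Go source.
    print(decode(SCRAMBLED))
```

No network filter looking for Go keywords or curly braces would match the scrambled form, yet recovering the code takes one line.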
Somehow stop them from doing that and they can photograph the source code and OCR it. Your smartphone already does this automatically!
If you merely attempt to stop source code exfiltration at the time of breach, you are playing a game of cat and mouse where you’re the mouse. Your adversary only has to get it right once to win; you have to get it right 100 percent of the time.
Castle walls don’t defend what never enters through the castle gate.
Raquel found her name in code that was written by a third-party vendor. That code was born and grew up outside the network of her employer. Once the leak happened, there was nothing her employer could do.
Okta’s source code was compromised from its hosted repository service. Snapchat’s source code was compromised through its iOS app.
There are many valid reasons for your source code to exist temporarily and permanently outside of your control.
Accidents happen.
Raquel’s vendor didn’t mean to let the source code slip public. They had misconfigured their repo. Snapchat accidentally included source code in their app’s .ipa package.
A preventative security control is ineffective when it’s accidentally turned off.
Defending, Fast and Slow.
In his seminal 2011 book Thinking, Fast and Slow, psychologist and Nobel laureate in economics Daniel Kahneman describes two systems of thought. System 1 is fast, instinctive, intuitive, and emotional. System 2 is slow, analytical, deliberative, and logical.
Your security controls can be similarly classified into two sets. Many of your network controls are System 1: firewalls, WAFs, network intrusion and extrusion detectors, and DLP systems all prevent security incidents at the time of the incident. For this reason, they must all operate at or near wire speeds. When they don’t and they become a burden to operations, they get turned off. Even so-called “deep-packet inspection” systems don’t have time to deliberate.
System 1 controls are naturally susceptible to bypass. Marcus Ranum used to have a running joke about how the ultimate firewall/IPS/DPI/WAF was a set of wire cutters. And Marcus has earned the right to joke, having written the first commercial internet firewall product.
System 2 controls don’t operate at the time of incident; they operate either well before or shortly after. Let’s take a look at some System 2 controls that are effective at stopping source code exfiltration.
How to defend, fast and slow, against source code exfiltration
Lawyer up and paper up.
Anyone who touches source code should be made to sign a non-disclosure agreement before they get access. Third-party vendors should be made aware of the monetary value that they will be liable for in the event of accidental or intentional disclosure, and may be required to be insured for that value.
Minimize access permissions.
Reduce the number of people and pipelines with access to source code to the minimum viable number, but be sure to do so in a way that also minimizes friction.
Harden all the configs.
Make sure that users of repos, dashboards, pipelines, and build systems must be authenticated to access them and continually validate that access. There are various inexpensive services you can use to validate that access, or you can put some variation of
curl -s "$page" | grep -qi "access denied" || echo "$page is visible" | mailx -s "repo visibility check" devsecops@example.com
in your enterprise scheduler of choice.
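A slightly sturdier variation of that check can be sketched in Python with only the standard library. It judges the HTTP status as well as the page text, since some services answer 401, 403, or 404 to anonymous visitors while others return 200 and render a login page. Everything specific here is an assumption for illustration: the example.com URLs, the looks_public and exposed_pages names, and the "access denied" and "sign in" text markers, which you would adjust to how your own repo host or dashboard actually responds.

```python
import urllib.request

# Hypothetical pages that should never be readable without logging in.
PAGES = [
    "https://git.example.com/acme/secret-app",
    "https://ci.example.com/job/secret-app/",
]


def looks_public(status, body):
    """Return True when an unauthenticated response seems to expose the page."""
    if status in (401, 403, 404):
        return False
    # Heuristic: some services answer 200 but render a login or error page.
    lowered = body.lower()
    if "access denied" in lowered or "sign in" in lowered:
        return False
    return 200 <= status < 300


def fetch(url, timeout=10.0):
    """Fetch url without credentials and return (status, body text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions carrying the status code.
        return err.code, ""
    except OSError:
        # DNS failures, refused connections, and timeouts: nothing was exposed.
        return 0, ""


def exposed_pages(pages, fetcher=fetch):
    """Return the subset of pages that look readable by anyone."""
    return [page for page in pages if looks_public(*fetcher(page))]
```

A scheduled job could call exposed_pages(PAGES) and mail the result whenever the list is non-empty. The fetcher parameter exists so the decision logic can be exercised without touching the network.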
Implement automated anomaly detection.
Automated anomaly detection tools are the most sophisticated defense against source code exfiltration. These agents deliberatively spot suspicious developer and source code access patterns after the fact. They flag unusual behavior and prevent the malicious activity from continuing until the alert has been triaged.
Policy-driven anomaly detection systems make decisions about access attempts based on predefined rules and knowledge of past behavior. You can, for example, automatically block clients that try to pull large numbers of repositories, guess different repo URLs, or initiate downloads from previously unseen networks.
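To make the idea concrete, here is a toy sketch of such rules in Python. It does not describe how any particular product works; the event fields, the thresholds, and the RepoAccessMonitor name are all assumptions chosen for illustration. It flags a client that pulls more than a set number of distinct repositories inside a sliding time window, or that connects from a network it has not been seen on before. It also assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    timestamp: float  # seconds since some epoch
    user: str
    repo: str
    network: str      # however you bucket sources, e.g. a subnet or office name


class RepoAccessMonitor:
    def __init__(self, max_repos=5, window_seconds=3600.0):
        self.max_repos = max_repos
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)        # user -> (timestamp, repo) pairs
        self._known_networks = defaultdict(set)  # user -> networks already vetted

    def learn_network(self, user, network):
        """Record a network as normal for this user, e.g. after triage."""
        self._known_networks[user].add(network)

    def check(self, event):
        """Return the reasons this event looks anomalous (empty list if none)."""
        reasons = []
        if event.network not in self._known_networks[event.user]:
            reasons.append("unseen network")

        recent = self._recent[event.user]
        recent.append((event.timestamp, event.repo))
        # Drop pulls that have aged out of the sliding window.
        while recent and event.timestamp - recent[0][0] > self.window_seconds:
            recent.popleft()
        distinct_repos = {repo for _, repo in recent}
        if len(distinct_repos) > self.max_repos:
            reasons.append("too many repositories")
        return reasons
```

A caller would deny or pause the request whenever check returns a non-empty list, and call learn_network only once a human has triaged the alert. That keeps an unfamiliar network flagged until someone has looked at it.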
Search for unique identifiers.
Raquel found her company’s source code by googling her name. If you have particularly distinctive variable identifiers or filenames in your code, set up automated searches for them.
Early in the software career that preceded my security career, I wrote a program that used “mr_cruncher” as a variable name. It’s hard to describe what the code did, but that variable was something Donald Knuth might call an accumulator: its value mutated over different arithmetic operations. Calling it an accumulator was too fancy for me because I’m no Donald Knuth, so instead I called it “mr_cruncher” because it was where the numbers got crunched. My boss didn’t like it but couldn’t think of a suitable replacement, so it stayed.
To this day, all I have to do is search the internet for “mr_cruncher” to see if that hideous code of mine ever gets leaked, or worse, open sourced!
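Choosing which identifiers to watch can itself be automated. The Python sketch below is one assumed approach rather than an established method: it pulls identifier-like tokens out of source text, keeps the ones that are long enough and not on a list of common words, and then checks whether any of them show up in a blob of text such as a search-result page or a paste. The COMMON_WORDS set, the length cutoff, and the function names are placeholders. How you fetch search results is deliberately left out, because it depends on the search service you use.

```python
import re

# Matches C-style identifiers such as mr_cruncher or totalCount.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Placeholder stop list; a real one would hold language keywords and everyday names.
COMMON_WORDS = {"function", "returned", "contains", "response", "password", "username"}


def distinctive_identifiers(source, min_length=8):
    """Return identifiers long and unusual enough to serve as search terms."""
    tokens = set(IDENTIFIER.findall(source))
    return {
        token
        for token in tokens
        if len(token) >= min_length and token.lower() not in COMMON_WORDS
    }


def find_leaks(identifiers, text):
    """Return, sorted, the watched identifiers that appear as whole words in text."""
    return sorted(
        ident
        for ident in identifiers
        if re.search(r"\b" + re.escape(ident) + r"\b", text)
    )


if __name__ == "__main__":
    sample = "mr_cruncher = 0\nfor value in values:\n    mr_cruncher += value\n"
    watched = distinctive_identifiers(sample)
    print(find_leaks(watched, "found in a paste: mr_cruncher += value"))
```

The whole-word match is a deliberate choice: it avoids alerting on longer names that merely contain a watched identifier.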
Conclusion
Source code exfiltration can be costly and embarrassing. Trying to detect and block it at the time of exfil is doomed to failure. Instead, put legal agreements in place, minimize access to source early on, keep configurations tight and validate them with continuous monitoring, and use anomaly detection and automated searches to find potential leaks.