Source code exfiltration is a rising form of cybercrime. It occurs when business competitors, disgruntled employees, or malicious actors obtain access to and disclose the contents of private code repositories. Exfiltration is commonly used as a form of extortion, but it also facilitates commercial espionage and intellectual property theft. To detect source code exfiltration, you need tools that can spot anomalous activity, provide immediate alerts, and help you investigate suspected historical losses.
Source code exfiltration occurs when closed-source components of a company's systems are leaked outside the organization. It can happen as a result of malware attacks, cyber espionage, or blackmail attempts by disgruntled employees and contractors.
Developers and security personnel often remain unaware that exfiltration has occurred until an attacker demands a ransom or leaks the code online. Being able to anticipate and prevent exfiltration is essential because code losses can be more devastating than physical asset theft: they can let competitors replicate your IP or expose sensitive information about your system’s operations or customers. Threat actors could also use their inside knowledge of the vulnerabilities in your code to plan further attacks.
In this article, we'll explore the risks of source code exfiltration and share some mitigation techniques. We'll look at how to defend robustly against exfiltration attempts by implementing tools that block code misappropriation and detect breaches quickly and effectively.
The loss of source code should be treated as an existential threat. Any code leaks diminish the unique value of your product. When exfiltration occurs as part of a commercial espionage campaign, your competitors may use your assets to develop their own platforms. Less scrupulous firms might copy your product wholesale and resell it with modified branding.
If your source contains hardcoded secrets, those credentials will also be exposed, permitting further attacks against your infrastructure.
As if that weren’t enough, cybercriminals also use code exfiltration as a means of extortion. In 2022, hacker group Lapsus$ made headlines for stealing source code from companies including NVIDIA, Samsung, and Microsoft. The group demanded ransoms before publishing the stolen data. This kind of exfiltration damages your reputation and compliance standing, and can result in costly regulatory penalties.
Exfiltration can be carried out in multiple ways, often unpredictably. Source can leave organizations as a result of deliberate human actions, or due to oversights such as setting improper access controls in source management systems.
Here are a few ways in which exfiltration can happen:
The variability in how exfiltration occurs makes incident detection more challenging. Nonetheless, using a combination of automated tools and employee vigilance can allow you to recognize exfiltration attempts as they occur.
While ideally source code exfiltration would be prevented in all cases, in reality you need to be prepared for it to occur by setting up alert mechanisms so that you can detect it as it happens. You also need tools that retrospectively report the sequence of events leading up to an attempt, and help you find stolen code that has been published online. Here are four techniques for finding and blocking source exfiltration attempts:
Audit trails are a simple but effective way to analyze source code movements. Enabling audit logging for your repositories lets you track who has accessed a project, when they were active, and the location from which they logged in. You can use this information as the basis for detecting abnormal clones and pulls.
Major source control providers offer integrated support for audit logs on their team and organization plans. GitHub, GitLab, and Bitbucket all have similar event-based systems that record when repositories are downloaded, mirrored, migrated, and forked. Enabling these features in your environment will provide vital information if you suspect unauthorized source access has occurred. They also provide evidence to support your case in any disciplinary or legal proceedings that may arise.
Audit trails alone are not enough to defend against exfiltration, though. They’re a historical record of access attempts, useful when you’re forensically investigating known or suspected breaches. Audit logs must be combined with other tools to detect exfiltration and alert you when it happens.
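As a rough illustration of using audit data to spot abnormal clones, the sketch below counts how many distinct repositories each user has cloned and flags anyone above a threshold. The event fields (`actor`, `action`, `repo`), the `git.clone` action string, and the threshold are assumptions made for this example; each provider's audit export has its own schema.

```python
from collections import defaultdict

# Hypothetical audit events; real exports from GitHub, GitLab, or Bitbucket
# use provider-specific field names and action strings.
SAMPLE_EVENTS = [
    {"actor": "alice", "action": "git.clone", "repo": "org/api"},
    {"actor": "bob", "action": "git.clone", "repo": "org/api"},
    {"actor": "bob", "action": "git.clone", "repo": "org/web"},
    {"actor": "bob", "action": "git.clone", "repo": "org/billing"},
    {"actor": "bob", "action": "git.push", "repo": "org/web"},
]


def actors_cloning_many_repos(events, max_repos=2):
    """Return {actor: sorted repos} for actors who cloned more than max_repos distinct repos."""
    cloned = defaultdict(set)
    for event in events:
        if event["action"] == "git.clone":
            cloned[event["actor"]].add(event["repo"])
    return {
        actor: sorted(repos)
        for actor, repos in cloned.items()
        if len(repos) > max_repos
    }


flagged = actors_cloning_many_repos(SAMPLE_EVENTS)
print(flagged)
```

A fixed threshold is the simplest possible rule. In practice you would likely compare each user against their own historical baseline, since a release engineer and a front-end developer have very different normal clone volumes.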
Unfortunately, not every instance of code exfiltration is systematically detectable as it occurs. Some events are inherently invisible, such as when an employee locally copies a repo they've previously cloned, then re-uploads it to the internet. No audit event will be recorded in this circumstance.
You can gain awareness of this kind of activity by monitoring public repos for the presence of unique portions of your code. Use the search APIs of GitHub, GitLab, and similar platforms to detect stolen code and analyze whether it is being spread further across the internet. Write your own script that regularly calls these APIs to find occurrences of uniquely named files from your repositories.
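A minimal sketch of such a script is shown below, using GitHub's REST code search endpoint with the `filename:` qualifier. The token, organization name, and filenames are placeholders you would supply yourself, and the response handling assumes the `items`, `repository.full_name`, `path`, and `html_url` fields. Check the provider's current documentation for authentication requirements and rate limits before relying on it.

```python
import json
import urllib.parse
import urllib.request

GITHUB_SEARCH_URL = "https://api.github.com/search/code"


def build_search_request(filename, token):
    """Build an authenticated GitHub code search request for one filename."""
    query = urllib.parse.urlencode({"q": f"filename:{filename}"})
    return urllib.request.Request(
        f"{GITHUB_SEARCH_URL}?{query}",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def external_matches(search_response, own_org):
    """Return (repo, path, url) tuples for hits outside repositories owned by own_org."""
    prefix = own_org.lower() + "/"
    matches = []
    for item in search_response.get("items", []):
        full_name = item["repository"]["full_name"]
        if not full_name.lower().startswith(prefix):
            matches.append((full_name, item["path"], item["html_url"]))
    return matches


def scan(filenames, token, own_org):
    """Query GitHub once per filename; intended to be run on a schedule (for example, cron)."""
    findings = []
    for name in filenames:
        with urllib.request.urlopen(build_search_request(name, token)) as resp:
            findings.extend(external_matches(json.load(resp), own_org))
    return findings
```

Calling `scan(["billing_reconciler_v2.py"], token, "acme")` with a distinctive filename from your own codebase would return any matches hosted outside the hypothetical `acme` organization. Generic names such as `utils.py` will produce too many false positives to be useful.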
Repository scans are a useful additional defense layer, but are naturally limited in scope. Stolen code might not appear online for months after the theft, if it shows up at all. Attackers that exfiltrate code to incorporate into their own products may never publicize the theft, leaving you oblivious to the incident.
Automated anomaly detection tools are the most sophisticated defense against source code exfiltration. These agents spot suspicious source code access patterns, alert relevant developers and security teams, and apply automatic actions that prevent the malicious activity from continuing until the alert has been triaged.
Policy-driven anomaly detection systems make decisions about access attempts based on predefined rules and knowledge of permissible past behavior. You can, for example, automatically block clients that try to pull a large number of repositories, guess different repo URLs, or initiate a download from an unknown location.
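The three example rules above could be expressed roughly as follows. This is only a sketch of the idea, not how any particular product works: the `AccessAttempt` and `Policy` types, the thresholds, and the per-user set of known countries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessAttempt:
    user: str
    country: str                # where the request originated
    repos_pulled_last_hour: int
    unknown_repo_requests: int  # requests for repo URLs that do not exist


@dataclass(frozen=True)
class Policy:
    # Illustrative thresholds; tune them to your team's normal behavior.
    max_repos_per_hour: int = 10
    max_unknown_repo_requests: int = 3


def evaluate(attempt, policy, known_countries):
    """Return the list of violated rules; an empty list means the attempt is allowed."""
    violations = []
    if attempt.repos_pulled_last_hour > policy.max_repos_per_hour:
        violations.append("bulk_pull")
    if attempt.unknown_repo_requests > policy.max_unknown_repo_requests:
        violations.append("url_guessing")
    if attempt.country not in known_countries.get(attempt.user, set()):
        violations.append("unknown_location")
    return violations


# Locations each user has previously connected from (hypothetical data).
known = {"alice": {"DE"}, "bob": {"US"}}

normal = evaluate(AccessAttempt("alice", "DE", 2, 0), Policy(), known)
suspicious = evaluate(AccessAttempt("bob", "BR", 40, 5), Policy(), known)
print(normal)      # []
print(suspicious)  # ['bulk_pull', 'url_guessing', 'unknown_location']
```

Returning the list of violated rules, instead of a bare allow or block decision, gives whoever triages the alert the reason it fired.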
Automated anomaly detection empowers security teams by increasing the speed and precision with which exfiltration attempts can be addressed. You're informed each time code reaches your perimeter, without having to manually inspect audit logs or wait for it to appear in public repos.
Preventing exfiltration starts with knowing when it's happening, using the strategies outlined above. Once you've established this basic informational layer, you can implement additional hardening measures to increase your protection.
Collectively, these techniques allow you to stay ahead of code exfiltration by limiting the situations in which it's possible, alerting you when it happens, and allowing you to investigate suspected but undetected leaks. It's still not possible to catch every incident, however; there's always the risk of lone developers copying repos onto private storage devices, or malware succeeding in slipping through your net.
Reducing the number of people with access to source code helps to lower the risk of its exposure. Restricting repos to essential users means there are fewer credentials that could be stolen and less chance of a disillusioned employee misappropriating your assets. Non-developers, such as support and admin staff, rarely read or modify code, so they’re unlikely to require access.
For developers, testers, security teams, and others who must interact with code, you should precisely scope their repository access to the minimum set of privileges they require. Very few engineers need to download complete archives of a repository regularly, or set up mirroring to another server, for example.
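One way to keep access scoped is to periodically compare the permissions each person actually holds against what their role is documented to need. The sketch below assumes you can export both as simple mappings; the role names, their ordering, and the sample data are illustrative only.

```python
# Ordered from least to most privileged; the role names are illustrative.
ROLE_RANK = {"read": 0, "write": 1, "admin": 2}


def over_privileged(grants, required):
    """Return (user, repo, granted, needed) tuples where a grant exceeds the documented need.

    needed is None when the user has no documented need for that repo at all.
    """
    findings = []
    for (user, repo), granted in sorted(grants.items()):
        needed = required.get((user, repo))
        if needed is None or ROLE_RANK[granted] > ROLE_RANK[needed]:
            findings.append((user, repo, granted, needed))
    return findings


# Hypothetical exports: what each user currently has vs. what their job requires.
grants = {
    ("dana", "org/api"): "admin",
    ("eli", "org/api"): "write",
    ("sam-support", "org/api"): "read",
}
required = {
    ("dana", "org/api"): "write",
    ("eli", "org/api"): "write",
}

findings = over_privileged(grants, required)
for finding in findings:
    print(finding)
```

Here `dana` holds more than the documented need and `sam-support` has access with no documented need at all, so both would be candidates for review.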
Take steps to secure your supply chain by scanning for malicious and vulnerable packages. Malware can be unwittingly added by legitimate developers; once it has entered your project, the code could exfiltrate details of its surroundings. Avoid using external dependencies from sources you don't trust.
Developer anomaly detection automates the discovery and prevention of exfiltration attempts. Centralized policy management and tuned awareness of your team's normal access patterns increase the accuracy and coverage of the alerts you receive. You can choose to respond with immediate actions, such as revoking an access token that has been used in a suspicious location.
Source code exfiltration differs from other kinds of cyber threats because it relates to information leaving your organization, as opposed to DDoS attacks, account takeovers, and ransomware, which are normally targeted attacks from outside. Preventing code from crossing company perimeters requires a dedicated prevention strategy.
Addressing the challenge of code exfiltration starts with deploying tools that can detect when exfiltration happens. Combining public repo monitoring, granular audit logs, and network traffic analysis solves this part of the problem, but still leaves you several steps behind the perpetrator.
Anomalous developer behavior detection is the most comprehensive method for defending against source exfiltration. These tools work in real time, identifying exfiltration attempts as they occur and providing relevant administrators with immediate alerts and possible actions. Try Arnica today to explore how early, effective developer anomaly detection can secure your repositories.