Secrets in source code continue to be a serious problem as company after company falls victim to source code leaks. An organization’s codebase is a critical asset that may contain concentrated pieces of intellectual property, sensitive information about its software’s vulnerabilities, and potentially hardcoded secrets that can be exploited to gain access to valuable resources like cloud compute power or databases with sensitive customer information. This article focuses on how to gain control of secrets in source code, why secrets end up there, and what tools are available to help keep your resources secure in the event of a leak.
Your organization’s codebase is very likely to contain hardcoded secrets: AWS/Azure/GCP credentials, API tokens, cryptographic keys, GitHub webhook secrets, and even raw passwords could all be lurking in your application and configuration files. Large and growing organizations that have not yet adopted a proactive policy around source code security are even more likely to have these secrets available to anyone with access to the source code.
Imagine a software supply chain that evolves to include common additional dependencies, such as:
Over time, the growing complexity of production and development environments, combined with an increase in the number of developers working within this ecosystem, will inevitably lead to secret sprawl: an uncontrolled growth in the number of secrets in source code.
More and more companies are getting hit with source code leaks and exfiltration that could expose the contents of their codebase to malicious actors or the public. This means that for many organizations, developing a security posture around secrets management (secret detection and prevention of secret sprawl) can be mission-critical.
Before we talk about solutions to managing secrets in code, it is helpful to understand how secrets get there in the first place. Here are some key reasons why secret sprawl is common to so many organizations and why it can be difficult to prevent entirely.
Software development happens at a quick pace, and software supply chain security is not always a priority when developers are working through a problem or building on a tight deadline. Many small and growing companies with constrained resources often do not have an Application Security team, or perhaps have a small one to simply check a box. Software supply chain security is also an emerging practice and not yet as mainstream as, say, cloud security posture management. In organizations with no formal policies or guidance in place related to secrets management, developers themselves can be a major source of secrets in source code.
This is especially true when developers are working with a prototype or proof of concept using a new integration or service; it can often simply be faster to put API keys and credentials in the code. Developers may also reason that if the source code has access controls, it is a safe place to put sensitive information.
Additionally, in more than 40% of the companies integrated with Arnica that have data scientists, we identified secrets in their source code, as their notebooks or scripts tend to read and write data from various sources.
Even if developers are making a conscious effort to prevent secrets from being committed to source code, it is unfortunately still very easy for them to end up in commits, hidden inside pull requests.
For example, imagine a developer who habitually makes many incremental commits and needs to push to an experimental branch to see changes in another system. To shorten their development workflow, they ignore best practices and include hardcoded API credentials in the commits. After verifying that their changes work as expected, the developer adds a few more commits to clean up their code, including removing the API credentials, and puts it up for review without squashing any of the intermediate commits.
When looking at the developer’s changes, the reviewer often focuses only on the final changed lines of a pull request; they do not normally go through intermediate commits to check for leaked credentials. Business pressure and development velocity often reign supreme over meticulous code reviews, especially for large PRs. As a result, the reviewer approves the merge, after which the intermediate commits that contain the raw API credentials become a permanent part of the commit history of that repo.
Intermediate commits that accidentally contain secrets are such a common occurrence that GitHub has documented a tool for removing these secrets from your commit history. However, having to do this adds yet another headache: you must ensure the secrets are purged from the repository while limiting the impact on other developers, who will need to rebase their work onto the rewritten history.
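To illustrate why reviewing only the final diff misses these cases, here is a minimal sketch in Python that walks patch text commit by commit and flags added lines that look like secrets. It assumes the input resembles the output of `git log -p`; the function name and the two patterns are illustrative assumptions, not the rule set of any particular scanner.

```python
import re

# Illustrative patterns only; real scanners ship far larger, maintained rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"]([^'\"]{16,})['\"]"
    ),
}


def find_secrets_in_patch(patch_text):
    """Return (commit, rule, line) for every added line that matches a pattern.

    `patch_text` is assumed to look like the output of `git log -p`.
    """
    findings = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            # Remember which commit the following diff lines belong to.
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            # "+" marks an added line; "+++" is the file header, not content.
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    findings.append((commit, rule, line[1:].strip()))
    return findings
```

Because every commit is inspected, a credential that was added in one commit and removed in a later one is still reported, which is exactly the case a reviewer looking at the final pull request diff would not see.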
For an organization to commit to securing secrets, there are additional barriers. While the need to keep secrets secure and protected from exfiltration is a relatively modern development, system-to-system authentication has been around for decades. However, even with the most sophisticated secrets management system in place, organizations will still need to invest in removing hardcoded secrets and replacing them with more secure access patterns.
Popular secrets management solutions include open-source vaults such as Knox, released by Pinterest, and fully managed solutions like HashiCorp Vault. Some cloud providers also offer their own key management solutions, including Azure Key Vault. GitHub Encrypted Secrets provides the ability to encrypt and store secrets securely in the configuration of a repository, manage them through the GitHub Actions Secrets REST API, and access them in workflows.
However, these products will not automatically migrate your secrets for you if your codebase is already littered with hardcoded credentials. Not only will you need to set up the key management solution, but you will also have to manually move and test every migrated credential used in any critical system to ensure nothing breaks.
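As a rough sketch of what one migrated credential might look like on the application side, the snippet below reads a secret from the process environment instead of a hardcoded string. It assumes the secrets manager or CI system injects the value as an environment variable; the helper name `get_required_secret` and the variable name `PAYMENTS_API_KEY` are hypothetical.

```python
import os


def get_required_secret(name, environ=None):
    """Fetch a secret from the environment and fail loudly if it is missing.

    `environ` defaults to os.environ; passing a dict makes the helper testable.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if not value:
        # Report only the variable name, never a secret value.
        raise RuntimeError(f"Missing required secret: {name}")
    return value


# Hypothetical usage: the value is injected at deploy time, not committed.
# api_key = get_required_secret("PAYMENTS_API_KEY")
```

Failing at startup when the variable is absent makes it easier to test each migrated credential, since a missed migration surfaces immediately rather than as a runtime authentication error.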
Another new offering is GitHub secret scanning. By partnering with third-party services that publish their secret patterns, GitHub can detect known secrets in public repositories; check out GitHub’s supported secret scanning partners here. GitHub also allows users to add their own custom patterns within their own repositories. And by implementing specific, configurable admin roles, GitHub enables an organization to implement fine-grained access controls to secrets management within their organization. This service is free for public repositories. However, for private repositories or those inside your GitHub organization, you will need to use the GitHub Enterprise license for GitHub Advanced Security, and this option is not free.
Given the difficulty of gaining control over secrets in your source code, are the risks worth the investment? The answer is yes! Failing to get control over secret sprawl could result in real risks to your business. Not only can material damages result from compromised resources and privacy breaches, but there could also be very real damage to the company brand and customer trust. Furthermore, a company could find itself the subject of litigation by a customer or a partner implicated in a data leak as a result of exposed secrets in source code.
Here are some recent examples of companies that experienced such consequences:
It could be argued that if these companies had prioritized secret detection and kept sensitive data out of their source code, these incidents would have had less of an impact in terms of source code exposure.
Protecting secrets entails an investment in security, and as with all investments, organizations want to know if it will deliver a worthwhile ROI. It may be prohibitively expensive, unnecessary, or simply impossible to ensure there are no secrets in the code base.
Below, we offer up some tips on how to determine the right security posture for your organization.
Having a secret in your possession does not always mean you can use it to cause serious damage. Let’s take a look at a few examples of secrets and how, depending on the scope of their access, they might be harmful or benign to an organization if leaked.
Higher-risk credentials
These secrets include:
Access to this category of credentials can have disastrous effects, including but not limited to:
Consider the difference between root access and access to a very restricted test user. If root access were leaked, there is no telling how the system could be compromised. But a test user with only the most limited access needed to accomplish their tasks will present a much lower risk in the event of a leak.
Lower-risk credentials
Some secrets are simply not going to be targeted because of their limited exploitation value.
These include, for example, Slack webhook access or similar credentials that are highly dependent on data formats and context. Secrets that give only limited access to resources are also in this low-risk category.
Aside from the fact that not all secrets grant access to useful assets, being in possession of a secret does not necessarily grant broad access. This is achieved via the principle of least privilege: credentials should be issued only to the users who need them to do their job and be valid only for that purpose. Excessive permissions, such as a shared API key that has no identification or expiration policy, can compromise security. Implementing least privilege means that credentials are narrow in scope and, in theory, can do less damage than broad generic ones. This can help lower the risk of exposure in the event of a leak.
Credentials for internal systems may also be harder to exploit, such as those valid only within applications that are isolated by subnets or virtual private clouds (VPCs). For example, in the 2022 Samsung source code leak, it was determined that 90% of the 6,600 secrets contained in the source code were for internal systems, and only 10% were for external systems like GitHub, AWS, etc.
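The factors discussed above (privilege level, reachability from outside the network, and expiration) can be combined into a rough triage rule. The sketch below is only one possible way to do this; the field names, weights, and thresholds are assumptions chosen for illustration and would need tuning for a real organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretFinding:
    name: str
    privileged: bool            # e.g. root/admin rather than a restricted test user
    externally_reachable: bool  # usable from the internet, not only inside a VPC/subnet
    expires: bool               # covered by an expiration or rotation policy


def risk_level(finding):
    """Classify a finding as "high", "medium", or "low" using assumed weights."""
    score = 0
    if finding.privileged:
        score += 2
    if finding.externally_reachable:
        score += 2
    if not finding.expires:
        score += 1
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"
```

Under these assumed weights, a leaked root credential for an internet-facing service ranks as high risk, while an expiring test-user credential that only works inside a VPC ranks as low.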
Being able to understand if an exposed secret has been accessed is also critical to assessing the impact of a breach. An audit trail showing the history of a secret can help determine the blast radius, but not all source code managers have a full audit trail:
Protecting secrets requires investment. Therefore, knowing what secrets are the most important, and which ones have been accessed, will result in the most effective security posture. Many commercially available scanners have limitations. These include a high rate of false positives, which can lead to developers finding ways to bypass warnings altogether, and a lack of organizational context. Also, most scanners can only access public repositories.
Without full insight into the secrets landscape of your organization, including what each one is used for and who has access to it, the effectiveness of these solutions can come up short.
The ideal solution to secrets management is a centrally managed tool that not only has context based on the repositories and branches in your organization, but can also detect and mitigate the introduction of new secrets in real time. This real-time aspect is crucial to ensuring that the growth of your backlog of secrets is slowed or even frozen. By locking the number of secrets found in your source code, the focus can then shift to eliminating the backlog altogether.
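One simple way to picture "locking" the backlog is a baseline comparison: findings that are already known are tracked separately, and only findings that are not in the baseline block a change. The sketch below illustrates that idea only; it is not a description of how Arnica or any other product implements it, and the function names and the (path, rule, secret) tuple shape are assumptions.

```python
import hashlib


def fingerprint(path, rule, secret):
    """Hash a finding so the baseline never stores the raw secret value."""
    material = f"{path}:{rule}:{secret}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def new_findings(current, baseline):
    """Return the findings whose fingerprint is not in the known backlog.

    `current` is an iterable of (path, rule, secret) tuples from a scan;
    `baseline` is a set of fingerprints recorded when the backlog was frozen.
    """
    return [f for f in current if fingerprint(*f) not in baseline]
```

A check run on each push could fail whenever `new_findings` returns a non-empty list, so the backlog can only shrink while the existing entries are worked off.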
Where other secret scanners have limitations, Arnica Secret Detection & Mitigation:
Our secret detection and validation services are free for everyone (as they should be!). That way you know how exposed you are and how much work mitigating these risks by hand will be. Learn more about our approach here.