The New York Times has confirmed that its internal source code was leaked on the 4chan message board. The leaked data, which included over 5,000 repositories, was accessed using stolen GitHub credentials.
On June 6th, 2024, VX-Underground identified 270 GB of data stolen from the New York Times and leaked on 4chan. The 3.6 million files that comprised the stolen data contained internal source code from over 5,000 repositories, of which only 30 were encrypted. According to the New York Times, the breach occurred months earlier, in January 2024. The leaked data contained a readme file in which the hacker claims to have used exposed GitHub tokens to access the New York Times source code repositories.
Security researcher Alex Ivanovs has gone through much of the leaked content and verified its authenticity.
Judging by the contents of the source code, the leaked data appears to come from the IT & Infrastructure organization rather than the core news business.
It is important to credit the New York Times for taking responsible steps to disclose the incident, since details like these help the global security community avoid similar cases. Though the company shared few specifics, it published a statement two days after the leak was announced and months after the breach itself:
In January 2024, a credential for a third-party cloud-based code platform was accidentally exposed. We identified the issue quickly and took necessary actions. There is no evidence of unauthorized access to Times-owned systems or any impact on our operations. We continuously monitor for any unusual activity to ensure security.
The third-party cloud-based code platform in this case is GitHub. And while, yes, they did release a statement, it comes across as pretty nonchalant considering the sensitivity of the leaked data.
The New York Times monetizes its platform through subscriptions (digital and print), advertising, and services like games, content licensing, and events. At a baseline, leaked source code raises the possibility that competitors could easily replicate certain games and launch similar services.
Beyond the source code itself, the breach represents a serious dent in user trust, which could have a negative impact on subscriptions.
In GitHub, the default base permission for an organization is Read, which allows all members to clone every repository in the organization. The more secure approach is to set the base permission to “No permission,” which lets members clone only public and internal repositories. The downside of this change is that companies using an “inner sourcing” approach (i.e., allowing any developer in the company to contribute to any project) will need to grant repository access explicitly instead.
Check out the GitHub docs for how to set the base permissions for an organization.
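If you prefer to manage this setting as code, below is a minimal sketch of the same change made through GitHub’s REST API (“Update an organization” endpoint). The organization name and token environment variable are placeholders:

```python
# Sketch: set an organization's default repository permission to "none"
# via GitHub's "Update an organization" REST endpoint.
import os

import requests

ORG_NAME = "your-org"  # placeholder organization name
token = os.environ["GITHUB_ADMIN_TOKEN"]  # org admin token (placeholder env var)

resp = requests.patch(
    f"https://api.github.com/orgs/{ORG_NAME}",
    headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    },
    json={"default_repository_permission": "none"},
    timeout=10,
)
resp.raise_for_status()
print("Base permission is now:", resp.json()["default_repository_permission"])
```

Once the base permission is “No permission,” members need explicit repository or team grants to read private repositories.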
Leverage Dynamic Developer Permission Management with Arnica: Arnica automatically identifies excessive permissions based on developer behavior. By automatically or manually eliminating or reducing unused and underused permissions, you can effectively reduce permission risk in your development environment. To learn more about how Lemonade leverages Arnica’s permissions management to ensure least privilege while maintaining developer velocity, check out our Lemonade case study.
GitHub allows developers to create one of two token types: classic personal access tokens and fine-grained personal access tokens. The fine-grained token is more secure because it lets the developer scope the accessible repositories and permissions more narrowly.
Organizational policies for personal access tokens can be found under the “Third-party Access” section of the organization settings – currently in beta.
It is important to assess the current usage of classic tokens within your organization to ensure a smooth migration to fine-grained personal access tokens.
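One way to build that picture is GitHub’s organization-level personal access token API, which lists the fine-grained tokens that have been granted access to your organization’s resources. The sketch below is illustrative only; the organization name and admin token are placeholders, and the endpoint’s availability may depend on the beta status mentioned above:

```python
# Sketch: list fine-grained personal access tokens with access to an
# organization's resources, following pagination via the Link header.
import os

import requests

ORG_NAME = "your-org"  # placeholder organization name
token = os.environ["GITHUB_ADMIN_TOKEN"]  # org admin token (placeholder env var)

url = f"https://api.github.com/orgs/{ORG_NAME}/personal-access-tokens?per_page=100"
headers = {
    "Authorization": f"Bearer {token}",
    "Accept": "application/vnd.github+json",
}

while url:
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    for grant in resp.json():
        # Each entry describes one fine-grained token grant: who owns it,
        # which repositories it was approved for, and when it expires.
        print(
            grant.get("owner", {}).get("login"),
            grant.get("repository_selection"),
            grant.get("token_expires_at"),
        )
    url = resp.links.get("next", {}).get("url")
```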
The New York Times data breach highlights the need for organizations to systematically eliminate hard-coded secrets from their code and prevent new ones from being added. Regular validation of existing secrets helps prioritize the mitigation of active and valid credentials.
Preventing New Secrets in Code with Real-Time Detection
Real-time secret detection is essential to prevent new secrets from being committed to code. This proactive approach stops the continuous addition of secrets, allowing security and development teams to focus on high-severity historical issues and reducing the backlog. Detecting secrets at every code change, not just at pull requests, is key.
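As a rough illustration of the idea (not a description of how Arnica implements it), the sketch below is a client-side git pre-commit hook that blocks a commit when a staged change matches a few well-known token patterns. The patterns are examples only, not an exhaustive ruleset:

```python
#!/usr/bin/env python3
# Sketch of a pre-commit hook that blocks commits containing likely secrets.
# Save as .git/hooks/pre-commit and make it executable.
import re
import subprocess
import sys

# Illustrative patterns: GitHub personal access tokens and AWS access key IDs.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),           # classic GitHub PAT
    re.compile(r"github_pat_[A-Za-z0-9_]{22,}"),  # fine-grained GitHub PAT
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key ID
]


def staged_additions() -> list[str]:
    # Inspect only the lines being added in this commit.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [
        line for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def main() -> int:
    findings = [
        line.strip()
        for line in staged_additions()
        for pattern in SECRET_PATTERNS
        if pattern.search(line)
    ]
    if findings:
        print("Potential secrets detected in staged changes:")
        for finding in findings:
            print("  ", finding)
        print("Commit blocked. Remove or rotate the credential first.")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```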
Eliminating Historical Secrets in Code
Secrets likely exist in your source code, with 96% of organizations reportedly affected. Effective tooling is needed to identify, prioritize, and mitigate critical hard-coded secrets, and regularly re-validating findings helps avoid spending time on secrets that have already been rotated.
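As a simple illustration of validating a finding, the sketch below checks whether a GitHub personal access token discovered in code still authenticates against the API. A live token should be rotated with priority, while an already-revoked one can be deprioritized. The token value shown is a placeholder:

```python
# Sketch: check whether a discovered GitHub token is still active by calling
# the authenticated /user endpoint. 200 means live; 401 means revoked/invalid.
import requests


def github_token_is_active(token: str) -> bool:
    resp = requests.get(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        timeout=10,
    )
    return resp.status_code == 200


if __name__ == "__main__":
    suspect = "ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # placeholder value
    print("still active" if github_token_is_active(suspect) else "revoked or invalid")
```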
Read our comprehensive guide on selecting a secret detection and mitigation solution!
Arnica is built explicitly to avoid scenarios like the New York Times breach. By combining secret detection and mitigation with developer permissions management, anomaly detection, code risk scanning (SAST, SCA, IaC), and more, you can build a robust application security program that works for your organization.
Book a demo with the Arnica team to see how Arnica can consolidate your application security tooling into a single, powerful platform that not only identifies risks, but helps you fix the most important risks, fast.