ATTACK

New York Times Data Breach Reveals Secrets & Source Code

Simon Wenet
Head of Growth
July 10, 2024
Simon has spent the last decade in security leading product management & growth teams at various companies focused on DNS security, DLP, and now application security.

TL;DR

The New York Times has confirmed that their internal source code was leaked on a 4chan message board. The leaked data, which included over 5,000 repositories, was accessed using stolen GitHub credentials.


What happened in the New York Times data breach?

On June 6th, 2024, VX-Underground identified 270 GB of data stolen from the New York Times and leaked on 4chan. The stolen data comprised 3.6 million files containing internal source code from over 5,000 repositories, of which only 30 were encrypted. According to the New York Times, the breach occurred about five months earlier, in January 2024. The leaked data contained a README file in which the hacker claims to have used exposed GitHub tokens to access the New York Times source code repositories.

Security researcher Alex Ivanovs has gone through much of the leaked content and confirmed its authenticity. The leak includes:

  • The source code for Wordle – the viral word game acquired by the New York Times for “low seven figures”
  • A database of 1,500 New York Times Education users including names, emails, and passwords
  • Internal Slack communications
  • Authentication URLs, passwords, secret keys, and tokens, some of which were exposed and still active
  • Private user keys for authentication 

Considering the contents of the source code, the leaked data seems to come from the IT & Infrastructure organization rather than the core News business.

New York Times’s response to the data breach  

It is important to recognize the New York Times for taking responsible steps to disclose the incident, which helps the global security community avoid similar cases. Though it did not share any details of the incident, it published a statement two days after the leak was announced and about five months after the breach:

In January 2024, a credential for a third-party cloud-based code platform was accidentally exposed. We identified the issue quickly and took necessary actions. There is no evidence of unauthorized access to Times-owned systems or any impact on our operations. We continuously monitor for any unusual activity to ensure security.

The third-party cloud-based code platform in this case is GitHub. And while I guess you can say that they did release a statement, it seems pretty nonchalant considering the sensitivity of the released data. 

Understanding the breach’s impact for New York Times 

The New York Times monetizes its platform through subscriptions (digital and print), advertising, and services like games, content licensing, and events. As a baseline, the leaked source code raises the possibility that competitors will be able to replicate certain games easily and launch similar services.

Beyond the source code, the breach represents a serious dent in user trust, which could have a negative impact on subscriptions.

Takeaways from the New York Times data breach 

Reduce Organization-Level Base Permissions

In GitHub, the default base permission for an organization is Read, which allows every organization member to clone every repository in the organization. The more secure approach is to configure “No Permission”, which allows members to clone only public and internal repositories unless they are granted access explicitly. The downside of reducing the base permission to “No Permission” is that companies using an “Inner Sourcing” approach (i.e. allowing any developer in the company to contribute to any project) will no longer be able to work that way by default.

Check out the GitHub docs for how to set the base permissions for an organization.

Setting org-level base permissions in GitHub
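
The same setting can also be changed through GitHub's REST API, which is handy if you manage many organizations. The snippet below is a minimal sketch, not an official script: it assumes the "update an organization" endpoint (PATCH /orgs/{org}) with its default_repository_permission field, and the organization name and token are placeholders. It only builds the request so you can inspect it before sending anything.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"

# Values accepted by the default_repository_permission field.
ALLOWED_PERMISSIONS = {"read", "write", "admin", "none"}


def build_base_permission_request(
    org: str, token: str, permission: str = "none"
) -> urllib.request.Request:
    """Build (but do not send) a request that sets an org's base permission."""
    if permission not in ALLOWED_PERMISSIONS:
        raise ValueError(f"unsupported base permission: {permission!r}")
    body = json.dumps({"default_repository_permission": permission}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GITHUB_API}/orgs/{org}",
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # "example-org" and the token are hypothetical placeholders.
    req = build_base_permission_request("example-org", "<org-owner-token>")
    print(req.get_method(), req.full_url, req.data.decode())
    # To apply the change, send it: urllib.request.urlopen(req)
```

Review which teams rely on the current base permission before applying this, since members lose implicit access to private repositories as soon as it takes effect.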

Leverage Dynamic Developer Permission Management with Arnica: Arnica automatically identifies excessive permissions for you based on developer behavior. By either automatically or manually eliminating or reducing unused or underused permissions, you can effectively reduce permission risk in your development environment. To learn more about how Lemonade leverages Arnica’s permissions management to ensure least privilege while maintaining developer velocity, check out our Lemonade case study.

Restrict Access Tokens

GitHub lets developers create two types of personal access tokens: classic and fine-grained. The fine-grained token is more secure, as it allows the developer to scope the accessible repositories and permissions more narrowly.

Organization policies for personal access tokens can be found under the “Third Party Access” section within organization settings. At the time of writing, this feature is in beta.

Managing personal access tokens in GitHub

Before enforcing a policy, take stock of how classic tokens are currently used within your organization to ensure a smooth migration to fine-grained personal access tokens.
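
One rough way to take that inventory is to classify the tokens you already know about (for example, ones found in CI variables or secret scans) by prefix. The sketch below relies on GitHub's token prefixes: classic personal access tokens start with ghp_ and fine-grained ones start with github_pat_. The function names and sample values are made up for illustration.

```python
from collections import Counter

# Longest prefix first, so the more specific match wins.
TOKEN_PREFIXES = (
    ("github_pat_", "fine-grained personal access token"),
    ("ghp_", "classic personal access token"),
)


def classify_token(token: str) -> str:
    """Return the GitHub personal access token type implied by the prefix."""
    for prefix, kind in TOKEN_PREFIXES:
        if token.startswith(prefix):
            return kind
    return "other or unknown"


def summarize(tokens) -> Counter:
    """Count how many tokens of each type appear in an iterable."""
    return Counter(classify_token(t) for t in tokens)


if __name__ == "__main__":
    # Fake values for illustration only; never print real tokens.
    sample = ["ghp_exampleexample", "github_pat_exampleexample", "not-a-token"]
    for kind, count in summarize(sample).items():
        print(f"{kind}: {count}")
```

A high share of classic tokens suggests the migration needs a longer runway and clear guidance for developers.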

Implementing Effective Secret Detection & Mitigation 

The New York Times data breach highlights the need for organizations to systematically eliminate hard-coded secrets from their code and prevent new ones from being added. Regular validation of existing secrets helps prioritize the mitigation of active and valid credentials.

Preventing New Secrets in Code with Real-Time Detection

Real-time secret detection is essential to prevent new secrets from being committed to code. This proactive approach stops the continuous addition of secrets, allowing security and development teams to focus on high-severity historical issues and reducing the backlog. Detecting secrets at every code change, not just at pull requests, is key.
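
To make the idea concrete, here is a minimal sketch of a check that could run on every change, for example against the output of git diff --cached in a pre-commit hook. It is an illustration rather than how any particular product works: the regular expressions are approximations of a few well-known token formats, and a real detector needs far more patterns plus entropy and context checks.

```python
import re

# Illustrative, approximate patterns; not an exhaustive or authoritative list.
PATTERNS = {
    "GitHub classic token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "GitHub fine-grained token": re.compile(r"\bgithub_pat_[A-Za-z0-9_]{20,}"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_diff(diff_text: str):
    """Return (diff_line_number, pattern_name) for secrets in added lines."""
    findings = []
    for line_no, line in enumerate(diff_text.splitlines(), start=1):
        # Only inspect added lines; skip the "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


if __name__ == "__main__":
    fake_token = "ghp_" + "a" * 36  # fabricated value for the demo
    demo_diff = "\n".join(
        [
            "+++ b/config.py",
            f"+GITHUB_TOKEN = '{fake_token}'",
            " DEBUG = False",
        ]
    )
    for line_no, name in scan_diff(demo_diff):
        print(f"diff line {line_no}: possible {name}")
```

Scanning only the added lines keeps the check fast enough to run on each push or commit, while the historical backlog is handled separately.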

Eliminating Historical Secrets in Code

Secrets likely exist in your source code, with 96% of organizations reportedly affected. Effective tools are needed to identify, prioritize, and mitigate critical hard-coded secrets. Regularly validating the secrets you find helps avoid flagging ones that have already been rotated.
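
As a rough illustration of validation, the sketch below checks whether a GitHub token found in your own code base is still live by calling the authenticated-user endpoint (GET /user). It assumes a 200 response means the token is active and a 401 means it is invalid or revoked; anything else is treated as inconclusive. The function names are hypothetical, and this should only be run against credentials your organization owns.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status from GET /user to a triage label."""
    if status == 200:
        return "active"
    if status == 401:
        return "invalid or revoked"
    return "inconclusive"  # e.g. rate limiting or other access restrictions


def check_github_token(token: str, timeout: float = 10.0) -> str:
    """Ask GitHub whether the token still authenticates (makes a network call)."""
    req = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        return classify_status(err.code)
    except urllib.error.URLError:
        return "inconclusive"  # network failure; retry later


if __name__ == "__main__":
    # Demonstrates the triage mapping without touching the network.
    for status in (200, 401, 403):
        print(status, "->", classify_status(status))
```

Findings labeled "active" go to the top of the rotation queue, while "invalid or revoked" ones can be deprioritized and cleaned up later.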

Read our comprehensive guide on selecting a secret detection and mitigation solution!

Arnica’s Comprehensive Application Security Platform 

Arnica is built explicitly to avoid scenarios like the New York Times breach. By combining secret detection and mitigation with developer permissions management along with anomaly detection, code risks (SAST, SCA, IaC), and more, you’re able to build a robust application security program that works for your organization. 

Book a demo with the Arnica team to see how Arnica can consolidate your application security tooling into a single, powerful platform that not only identifies risks, but helps you fix the most important risks, fast.
