

Over 100,000 GitHub repos have leaked API or cryptographic keys

A scan of billions of files from 13 percent of all GitHub public repositories over a period of six months has revealed that over 100,000 repos have leaked API tokens and cryptographic keys, with thousands of new repositories leaking new secrets on a daily basis.

The scan was the subject of academic research carried out by a team from North Carolina State University (NCSU), and the study’s results have been shared with GitHub, which acted on the findings to accelerate its work on a new security feature called Token Scanning, currently in beta.

Academics scanned billions of GitHub files

The NCSU study is the most comprehensive and in-depth GitHub scan to date and exceeds any previous research of its kind.

NCSU academics scanned GitHub accounts for a period of nearly six months, between October 31, 2017, and April 20, 2018, and looked for text strings formatted like API tokens and cryptographic keys.

They didn’t just use the GitHub Search API to look for these text patterns, like other previous research efforts, but they also looked at GitHub repository snapshots recorded in Google’s BigQuery database.
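For readers who want a feel for the BigQuery side of this approach, the sketch below shows how strings shaped like AWS access key IDs can be pulled out of Google's public GitHub snapshot with Python. The dataset name (bigquery-public-data.github_repos) and the regular expression are illustrative assumptions, not the exact queries used in the study.

```python
# Minimal sketch: search Google's public GitHub snapshot on BigQuery for
# strings shaped like AWS access key IDs. The dataset name and the regex are
# assumptions for illustration, not the NCSU team's exact queries.
from google.cloud import bigquery

client = bigquery.Client()  # requires Google Cloud credentials with BigQuery access

QUERY = r"""
SELECT f.repo_name, f.path
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c
  ON f.id = c.id
WHERE NOT c.binary
  AND REGEXP_CONTAINS(c.content, r'AKIA[0-9A-Z]{16}')
LIMIT 100
"""

for row in client.query(QUERY):
    print(row.repo_name, row.path)
```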

Across the six-month period, researchers analyzed billions of files from millions of GitHub repositories.

In a research paper published last month, the three-man NCSU team said they captured and analyzed 4,394,476 files representing 681,784 repos using the GitHub Search API, and another 2,312,763,353 files from 3,374,973 repos that had been recorded in Google’s BigQuery database.

NCSU team scanned for API tokens from 11 companies

Inside this gigantic pile of files, researchers looked for text strings that were in the format of particular API tokens or cryptographic keys.

Since not all API tokens and cryptographic keys are in the same format, the NCSU team decided on 15 API token formats (from 15 services belonging to 11 companies, five of which were from the Alexa Top 50), and four cryptographic key formats.

This included API key formats used by Google, Amazon, Twitter, Facebook, Mailchimp, MailGun, Stripe, Twilio, Square, Braintree, and Picatic.
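The scan worked because these key formats have distinctive, well-documented structures. As a rough illustration (the patterns below approximate a few publicly documented formats and are not the paper's exact rule set), a handful of regular expressions is enough to flag candidate secrets in a blob of text:

```python
# Illustrative patterns for a few well-known key formats; these approximate
# publicly documented structures and are not the NCSU paper's exact rule set.
import re

TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key":    re.compile(r"\bAIza[0-9A-Za-z\-_]{35}\b"),
    "stripe_live_key":   re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    "pem_private_key":   re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_text(text):
    """Return (token_type, matched_string) pairs for every candidate secret found."""
    return [
        (name, match.group(0))
        for name, pattern in TOKEN_PATTERNS.items()
        for match in pattern.finditer(text)
    ]

if __name__ == "__main__":
    sample = "aws_key = 'AKIAIOSFODNN7EXAMPLE'  # accidentally committed"
    print(scan_text(sample))  # [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```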

Image: NCSU GitHub scan, tested APIs (Meli et al.)

Results came back right away, with thousands of leaked API and cryptographic keys being found every day of the research project.

In total, the NCSU team said they found 575,456 API and cryptographic keys, of which 201,642 were unique, all spread over more than 100,000 GitHub projects.

Image: NCSU GitHub scan results (Meli et al.)

One observation the research team made in their academic paper was that the “secrets” found via the GitHub Search API and those found via the Google BigQuery dataset had little overlap.

“After joining both collections, we determined that 7,044 secrets, or 3.49% of the total, were seen in both datasets. This indicates that our approaches are largely complementary,” researchers said.
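Assuming the percentage is computed against the 201,642 unique secrets, the arithmetic checks out: 7,044 / 201,642 ≈ 0.0349, or roughly 3.49 percent.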

Furthermore, most of the API tokens and cryptographic keys (93.58 percent) came from single-owner accounts, rather than multi-owner repositories.

This means the vast majority of API and cryptographic keys found by the NCSU team were most likely valid tokens and keys used in the real world, since multi-owner repositories tend to contain test tokens used in shared testing environments and in-development code.

Leaked API and crypto keys to hang around for weeks

Because the research project took place over a six-month period, researchers also had a chance to observe if and when account owners realized they had leaked API and cryptographic keys and removed the sensitive data from their code.

The team said that six percent of the API and cryptographic keys they tracked were removed within an hour of being leaked, suggesting that these GitHub owners realized their mistake right away.

Over 12 percent of keys and tokens were gone after a day, while 19 percent stayed for as much as 16 days.

“This also means 81% of the secrets we discover were not removed,” researchers said. “It is likely that the developers for this 81% either do not know the secrets are being committed or are underestimating the risk of compromise.”

Image: NCSU GitHub scan timeline (Meli et al.)

Research team uncovers some high-profile leaks

The extraordinary quality of these scans was evident when researchers started looking at what some of these leaks exposed and where they were originating.

“In one case, we found what we believe to be AWS credentials for a major website relied upon by millions of college applicants in the United States, possibly leaked by a contractor,” the NCSU team said.

“We also found AWS credentials for the website of a major government agency in a Western European country. In that case, we were able to verify the validity of the account, and even the specific developer who committed the secrets. This developer claims in their online presence to have nearly 10 years of development experience.”

In another case, researchers found 564 Google API keys that were being used by an online site to skirt YouTube rate limits and download YouTube videos that it would later host on another video-sharing portal.

“Because the number of keys is so high, we suspect (but cannot confirm) that these keys may have been obtained fraudulently,” NCSU researchers said.

Last, but not least, researchers also found 7,280 RSA keys inside OpenVPN config files. Judging by the other settings found inside these configuration files, researchers said, the vast majority of users had disabled password authentication and were relying solely on the RSA keys for authentication, meaning anyone who found these keys could have gained access to thousands of private networks.
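The kind of configuration check involved is straightforward. A minimal sketch (using standard OpenVPN client directives, not the researchers' exact heuristics) is to flag any config that embeds a private key inline but never asks for a username and password:

```python
# Minimal sketch: flag OpenVPN client configs that embed a private key inline
# while never prompting for a username/password. Directive names are standard
# OpenVPN syntax; the paper's exact heuristics are not reproduced here.
def relies_on_embedded_key_only(config_text: str) -> bool:
    lines = [line.strip() for line in config_text.splitlines()]
    has_inline_key = "<key>" in lines  # inline private-key block
    asks_for_password = any(line.startswith("auth-user-pass") for line in lines)
    return has_inline_key and not asks_for_password

if __name__ == "__main__":
    sample = """client
remote vpn.example.org 1194
<key>
-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
</key>"""
    print(relies_on_embedded_key_only(sample))  # True: key embedded, no password prompt
```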

The high quality of the scan results was also apparent when researchers used other API token-scanning tools to analyze their own dataset, to determine the efficiency of their scan system.

“Our results show that TruffleHog is largely ineffective at detecting secrets, as its algorithm only detected 25.236% of the secrets in our Search dataset and 29.39% in the BigQuery dataset,” the research team said.
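Part of the explanation is that TruffleHog, at the time of the study, leaned heavily on flagging high-entropy strings. A minimal sketch of that style of heuristic (the 4.5-bit threshold below is illustrative, not the tool's actual setting) shows why a structured key drawn from a constrained alphabet can slip under a fixed entropy cutoff that a format-aware regex would still catch:

```python
# Minimal sketch of an entropy-only heuristic of the kind TruffleHog relied on
# at the time. The threshold is illustrative, not the tool's actual setting.
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

def looks_like_secret(token: str, threshold: float = 4.5) -> bool:
    return shannon_entropy(token) >= threshold

if __name__ == "__main__":
    aws_key_id = "AKIAIOSFODNN7EXAMPLE"  # structured key, limited alphabet
    print(round(shannon_entropy(aws_key_id), 2), looks_like_secret(aws_key_id))
    # Prints roughly 3.68 False: the key's entropy falls below the cutoff,
    # even though a format-aware regex would flag it immediately.
```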

GitHub is aware and on the job

In an interview with ZDNet today, Brad Reaves, Assistant Professor at the Department of Computer Science at North Carolina State University, said they shared the study’s results with GitHub in 2018.

“We have discussed the results with GitHub. They initiated an internal project to detect and notify developers about leaked secrets right around the time we were wrapping up our study. This project was publicly acknowledged in October 2018,” Reaves said.

“We were told they are monitoring additional secrets beyond those listed in the documentation, but we weren’t given further details.

“Because leakage of this type is so pervasive, it would have been very difficult for us to notify all affected developers. One of the many challenges we faced is that we simply didn’t have a way to obtain secure contact information for GitHub developers at scale,” Reaves added.

“At the time our paper went to press, we were trying to work with GitHub to do notifications, but given the overlap between our token scanning and theirs, they felt an additional notification was not necessary.”

API key leaks: a known issue

The problem of developers leaving their API and cryptographic keys in apps and websites’ source code is not a new one. Amazon has been urging web devs to search their code and remove any AWS keys from public repos since as far back as 2014, and has even released a tool to help them scan repos before committing any code to a public repo.
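Client-side scanning of this sort can be as simple as a git pre-commit hook. The sketch below (a hypothetical hook, not Amazon's actual tool) refuses any commit whose staged diff contains an AWS-style access key ID:

```python
#!/usr/bin/env python3
# Hypothetical pre-commit hook (save as .git/hooks/pre-commit and make it
# executable): refuse commits whose staged changes contain an AWS-style
# access key ID. A sketch of the approach, not Amazon's actual tool.
import re
import subprocess
import sys

AWS_KEY_ID = re.compile(rb"AKIA[0-9A-Z]{16}")

def staged_diff() -> bytes:
    # Diff of what is about to be committed (index vs. HEAD).
    return subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, check=True,
    ).stdout

if __name__ == "__main__":
    if AWS_KEY_ID.search(staged_diff()):
        sys.stderr.write("Refusing to commit: possible AWS access key in staged changes.\n")
        sys.exit(1)
```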

Some companies have taken it upon themselves to scan GitHub and other code-sharing repositories for accidentally exposed API keys, and revoke the tokens even before API key owners notice the leak or abuse.

What the NCSU study has done is provide the most in-depth look at this problem to date.

The paper that Reaves authored together with Michael Meli and Matthew R. McNiece is titled “How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories,” and is available for download in PDF format.

“Our findings show that credential management in open-source software repositories is still challenging for novices and experts alike,” Reaves told us.



GigaOm Radar for Disaster Recovery as a Service (DRaaS)

Very few organizations see disaster recovery (DR) for their IT systems as a business differentiator, so they often prefer to outsource the process and consume it as a service (DRaaS) that’s billed monthly. There are many DRaaS providers with varying backgrounds, whose services are often shaped by that background. Products that started as customer-managed DR applications tend to have the most mature orchestration and automation, but vendors may face challenges transforming their application into a consumable service. Backup as a Service (BaaS) providers typically have great consumption models and off-site data protection, but they might be lacking in rich orchestration for failover. Other DRaaS providers come from IaaS backgrounds, with well-developed, on-demand resource deployment for recovery and often a broader platform with automation capabilities.

Before you invest in a DRaaS solution, you should attempt to be clear on what you see as its value. If your motivation is simply not to operate a recovery site, you probably want a service that uses technology similar to what you’re using at the protected site. If the objective is to spend less effort on DR protection, you will be less concerned about similarity and more with simplicity. And if you want to enable regular and granular testing of application recovery with on-demand resources, advanced failover automation and sandboxing will be vital features.

Be clear as well on the scale of disaster you are protecting against. On-premises recovery will protect against shared component failure in your data center. A DRaaS location in the same city will allow a lower RPO and provide lower latency after failover, but might be affected by the same disaster as your on-premises data center. A more distant DR location would be immune to your local disaster, but what about the rest of your business? It doesn’t help to have operational IT in another city if your only factory is under six feet of water.

DR services are designed to protect enterprise application architectures that are centered on VMs with persistent data and configuration. A lift-and-shift cloud adoption strategy leads to enterprise applications in the cloud, requiring cloud-to-cloud DR that is very similar to DRaaS from on-premises. Keep in mind, however, that cloud-native applications have different DR requirements.

How to Read this Report

This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.




GigaOm Radar for DDoS Protection

With ransomware getting all the news coverage when it comes to internet threats, it is easy to lose sight of distributed denial of service (DDoS) attacks even as these attacks become more frequent and aggressive. In fact, the two threats have recently been combined in a DDoS ransom attack, in which a company is hit with a DDoS and then a ransom demanded in exchange for not launching a larger DDoS. Clearly, a solid mechanism for thwarting such attacks is needed, and that is exactly what a good DDoS protection product will include. This will allow users, both staff and customers, to access their applications with no indication that a DDoS attack is underway. To achieve this, the DDoS protection product needs to know about your applications and, most importantly, have the capability to absorb the massive bandwidth generated by botnet attacks.

All the DDoS protection vendors we evaluated have a cloud-service element in their products. The scale-out nature of cloud platforms is the right response to the scale-out nature of DDoS attacks, which use botnets made up of thousands of compromised computers and/or embedded devices. A DDoS protection network that is larger, faster, and more distributed will defend better against larger DDoS attacks.

Two public cloud platforms we review have their own DDoS protection, both providing it for applications running on their public cloud and offering only cloud-based protection. We also look at two content delivery networks (CDNs) that offer only cloud-based protection but also have a large network of locations for distributed protection. Many of the other vendors offer both on-premises and cloud-based services that are integrated to provide unified protection against the various attack vectors that target the network and application layers.

Some of the vendors have been protecting applications since the early days of the commercial internet. These vendors tend to have products with strong on-premises protection and integration with a web application firewall or application delivery capabilities. These companies may not have developed their cloud-based protections as fully as the born-in-the-cloud DDoS vendors.

In the end, you need a DDoS protection platform equal to the DDoS threat that faces your business, keeping in mind that such threats are on the rise.

How to Read this Report

This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.



GigaOm Radar for Security Information and Event Management (SIEM) Solutions

The security information and event management (SIEM) solution space is mature and competitive. Most vendors have had well over a decade to refine their products, and the differentiation among basic SIEM functions is fairly small.

In response, SIEM vendors are developing advanced platforms that ingest more data, provide greater context, and deploy machine learning and automation capabilities to augment security analysts’ efforts. These solutions deliver value by giving security analysts deeper and broader visibility into complex infrastructures, increasing efficiency and decreasing the time to detection and time to respond.

Vendors offer SIEM solutions in a variety of forms, such as on-premises appliances, software installed in the customers’ on-premises or cloud environments, and cloud-hosted SIEM-as-a-Service. Many vendors have developed multi-tenant SIEM solutions for large enterprises or for managed security service providers. Customers often find SIEM solutions challenging to deploy, maintain, or even operate, leading to a growing demand for managed SIEM services, whether provided by the SIEM vendor or third-party partners.

SIEM solutions continue to vie for space with other security solutions, such as endpoint detection and response (EDR), security orchestration automation and response (SOAR), and security analytics solutions. All SIEM vendors support integrations with other security solutions. Many vendors also offer tightly integrated solution stacks, allowing customers to choose the solutions they need most, whether just a SIEM, a SIEM and a SOAR, or some other combination. Other vendors are incorporating limited EDR- or SOAR-like capabilities into their SIEM solutions for customers who want the extra features but are not ready to invest in multiple solutions.

With so many options, choosing a SIEM solution is challenging. You will have to consider several key factors, starting with your existing IT infrastructure. Is an on-premises SIEM the right choice for you, or do you want a cloud-based or hybrid solution? Which systems and devices will be sending data to your SIEM, and how much data will it need to collect, correlate, analyze, and store? You should also consider the relative importance of basic capabilities and advanced features, bearing in mind that the basic capabilities may be considerably easier to deploy, maintain, and operate. Will your IT and security teams be able to deploy, maintain, and operate the solution on their own, or should you look for managed services to handle those tasks?

This GigaOm Radar report details the key SIEM solutions on the market, identifies key criteria and evaluation metrics for selecting a SIEM, and identifies vendors and products that excel. It will give you an overview of the key SIEM offerings and help decision-makers evaluate existing solutions and decide where to invest.

How to Read this Report

This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding consider reviewing the following reports:

Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics—such as scalability, performance, and TCO—that drive purchase decisions.

GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.

Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.
