
Over 100,000 GitHub repos have leaked API or cryptographic keys


A scan of billions of files from 13 percent of all GitHub public repositories over a period of six months has revealed that over 100,000 repos have leaked API tokens and cryptographic keys, with thousands of new repositories leaking new secrets on a daily basis.

The scan was the object of academic research carried out by a team from the North Carolina State University (NCSU), and the study’s results have been shared with GitHub, which acted on the findings to accelerate its work on a new security feature called Token Scanning, currently in beta.

Academics scanned billions of GitHub files

The NCSU study is the most comprehensive and in-depth GitHub scan to date and exceeds any previous research of its kind.

NCSU academics scanned GitHub accounts for a period of nearly six months, between October 31, 2017, and April 20, 2018, and looked for text strings formatted like API tokens and cryptographic keys.

Unlike previous research efforts, they didn’t rely solely on the GitHub Search API to look for these text patterns; they also examined GitHub repository snapshots recorded in Google’s BigQuery dataset.

Across the six-month period, researchers analyzed billions of files from millions of GitHub repositories.

In a research paper published last month, the three-man NCSU team said they captured and analyzed 4,394,476 files representing 681,784 repos using the GitHub Search API, and another 2,312,763,353 files from 3,374,973 repos that had been recorded in Google’s BigQuery database.
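For readers curious what the BigQuery half of such a scan might look like, here is a minimal sketch against Google's public `bigquery-public-data.github_repos` snapshot. The query, the example regex, and the result handling are illustrative assumptions, not the researchers' actual pipeline.

```python
# Rough sketch: pull candidate files from Google's public GitHub snapshot and
# flag contents matching one well-known token format. Scanning the full
# `contents` table processes a very large amount of data and is billed accordingly.
from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

QUERY = r"""
SELECT f.repo_name, f.path
FROM `bigquery-public-data.github_repos.files` AS f
JOIN `bigquery-public-data.github_repos.contents` AS c
  ON f.id = c.id
WHERE NOT c.binary
  AND REGEXP_CONTAINS(c.content, r'AKIA[0-9A-Z]{16}')  -- AWS access key ID format
LIMIT 100
"""

for row in client.query(QUERY).result():
    print(row["repo_name"], row["path"])  # files containing a candidate secret
```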

NCSU team scanned for API tokens from 11 companies

Inside this gigantic pile of files, researchers looked for text strings that were in the format of particular API tokens or cryptographic keys.

Since not all API tokens and cryptographic keys are in the same format, the NCSU team decided on 15 API token formats (from 15 services belonging to 11 companies, five of which were from the Alexa Top 50), and four cryptographic key formats.

This included API key formats used by Google, Amazon, Twitter, Facebook, Mailchimp, MailGun, Stripe, Twilio, Square, Braintree, and Picatic.
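The paper's exact regular expressions aren't reproduced here, but the pattern-matching step can be sketched with publicly documented key formats for a few of the services listed above (AWS, Google, Stripe); these regexes are illustrative, whereas the paper defines its own set of 15 token formats and four key formats.

```python
# Minimal sketch of regex-based secret detection over a file tree.
import os
import re

PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "google_api_key":    re.compile(r"AIza[0-9A-Za-z\-_]{35}"),
    "stripe_secret_key": re.compile(r"sk_live_[0-9a-zA-Z]{24,}"),
}

def scan_tree(root="."):
    """Yield (path, pattern_name, match) for every candidate secret found under root."""
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue
            for label, pattern in PATTERNS.items():
                for match in pattern.findall(text):
                    yield path, label, match

for hit in scan_tree():
    print(hit)
```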

Image: APIs tested in the NCSU GitHub scan (Meli et al.)

Results came back right away, with thousands of leaked API and cryptographic keys being found every day of the research project.

In total, the NCSU team said they found 575,456 API and cryptographic keys, of which 201,642 were unique, all spread over more than 100,000 GitHub projects.

Image: NCSU GitHub scan results (Meli et al.)

An observation the research team made in their academic paper was that the “secrets” found via the GitHub Search API and those found via the Google BigQuery dataset had little overlap.

“After joining both collections, we determined that 7,044 secrets, or 3.49% of the total, were seen in both datasets. This indicates that our approaches are largely complementary,” researchers said.
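As a quick sanity check, that 3.49 percent is simply the intersection of the two secret sets divided by the total number of distinct secrets; a minimal sketch in Python:

```python
# Minimal arithmetic check: with the paper's counts, 100 * 7_044 / 201_642 ≈ 3.49.
def overlap_percentage(search_secrets, bigquery_secrets):
    union = search_secrets | bigquery_secrets
    return 100.0 * len(search_secrets & bigquery_secrets) / len(union) if union else 0.0

print(overlap_percentage({"a", "b", "c"}, {"c", "d"}))  # toy sets -> 25.0
```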

Furthermore, most of the API tokens and cryptographic keys (93.58 percent) came from single-owner accounts rather than multi-owner repositories.

What this means is that the vast majority of API and cryptographic keys found by the NCSU team were most likely valid tokens and keys used in the real world, as multi-owner repositories usually tend to contain test tokens used in shared testing environments and in-development code.

Leaked API and crypto keys hang around for weeks

Because the research project took place over a six-month period, researchers also had a chance to observe if and when account owners realized they had leaked API and cryptographic keys and removed the sensitive data from their code.

The team said that six percent of the API and cryptographic keys they tracked were removed within an hour of being leaked, suggesting that these repository owners realized their mistake right away.

Over 12 percent of keys and tokens were gone after a day, while 19 percent stayed for as long as 16 days.

“This also means 81% of the secrets we discover were not removed,” researchers said. “It is likely that the developers for this 81% either do not know the secrets are being committed or are underestimating the risk of compromise.”

Image: NCSU GitHub scan timeline (Meli et al.)

Research team uncovers some high-profile leaks

The extraordinary quality of these scans became evident when researchers started looking at what some of these leaks contained and where they originated.

“In one case, we found what we believe to be AWS credentials for a major website relied upon by millions of college applicants in the United States, possibly leaked by a contractor,” the NCSU team said.

“We also found AWS credentials for the website of a major government agency in a Western European country. In that case, we were able to verify the validity of the account, and even the specific developer who committed the secrets. This developer claims in their online presence to have nearly 10 years of development experience.”

In another case, researchers found 564 Google API keys being used by an online service to skirt YouTube rate limits and download YouTube videos that it would later host on another video-sharing portal.

“Because the number of keys is so high, we suspect (but cannot confirm) that these keys may have been obtained fraudulently,” NCSU researchers said.

Last, but not least, researchers also found 7,280 RSA keys inside OpenVPN config files. By looking at the other settings found inside these configuration files, researchers determined that the vast majority of users had disabled password authentication and were relying solely on the RSA keys for authentication, meaning anyone who found these keys could have gained access to thousands of private networks.
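As a rough illustration of how such configs could be triaged, the sketch below flags OpenVPN client files that embed a private key but never set the `auth-user-pass` directive; the heuristic is an assumption for illustration, not the researchers' methodology.

```python
# Sketch: flag OpenVPN client configs that carry an inline private key but do
# not require password authentication. `<key>...</key>` blocks and the
# `auth-user-pass` directive are standard OpenVPN config syntax.
import glob
import re

KEY_BLOCK = re.compile(r"<key>.*?</key>", re.DOTALL)

def key_only_configs(pattern="**/*.ovpn"):
    for path in glob.glob(pattern, recursive=True):
        with open(path, "r", errors="ignore") as fh:
            text = fh.read()
        has_inline_key = bool(KEY_BLOCK.search(text)) or "-----BEGIN" in text
        uses_password = "auth-user-pass" in text
        if has_inline_key and not uses_password:
            yield path  # whoever holds this file can likely authenticate with the key alone

for path in key_only_configs():
    print("key-only OpenVPN config:", path)
```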

The high quality of the scan results was also apparent when researchers used other API token-scanning tools to analyze their own dataset, to determine the efficiency of their scan system.

“Our results show that TruffleHog is largely ineffective at detecting secrets, as its algorithm only detected 25.236% of the secrets in our Search dataset and 29.39% in the BigQuery dataset,” the research team said.

GitHub is aware and on the job

In an interview with ZDNet today, Brad Reaves, Assistant Professor at the Department of Computer Science at North Carolina State University, said they shared the study’s results with GitHub in 2018.

“We have discussed the results with GitHub. They initiated an internal project to detect and notify developers about leaked secrets right around the time we were wrapping up our study. This project was publicly acknowledged in October 2018,” Reaves said.

“We were told they are monitoring additional secrets beyond those listed in the documentation, but we weren’t given further details.

“Because leakage of this type is so pervasive, it would have been very difficult for us to notify all affected developers. One of the many challenges we faced is that we simply didn’t have a way to obtain secure contact information for GitHub developers at scale,” Reaves added.

“At the time our paper went to press, we were trying to work with GitHub to do notifications, but given the overlap between our token scanning and theirs, they felt an additional notification was not necessary.”

API key leaks: a known issue

The problem of developers leaving their API and cryptographic keys in apps and websites’ source code is not a new one. Amazon has urged web devs to search their code and remove any AWS keys from public repos since as far back as 2014, and has even released a tool to help them scan code before committing it to a public repo.

Some companies have taken it upon themselves to scan GitHub and other code-sharing repositories for accidentally exposed API keys, and to revoke the tokens before their owners even notice the leak or any abuse.

What the NCSU study has done is provide the most in-depth look at this problem to date.

The paper that Reaves authored together with Michael Meli and Matthew R. McNiece is titled “How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories,” and is available for download in PDF format.

“Our findings show that credential management in open-source software repositories is still challenging for novices and experts alike,” Reaves told us.


Defeating Distributed Denial of Service Attacks


It seems like every day the news brings new stories of cyberattacks, whether ransomware, malware, crippling viruses, or, more frequently of late, distributed denial of service (DDoS) attacks. According to Infosec magazine, the first half of 2020 saw a 151% increase in the number of DDoS attacks compared to the same period the previous year. The same report states that experts predict as many as 15.4 million DDoS attacks within the next two years.

These attacks can be difficult to detect until it’s too late, and then they can be challenging to defend against. There are solutions available, but there is no one magic bullet. As Alastair Cooke points out in his recent “GigaOm Radar for DDoS Protection” report, there are different categories of DDoS attacks.

And different types of attacks require different types of defenses. You’ll want to adopt each of these three defense strategies against DDoS attacks to a certain degree, as attackers are never going to limit themselves to a single attack vector:

Network Defense: Attacks targeting the OS and network operate at either Layer 3 or Layer 4 of the OSI stack. These attacks don’t flood the servers with application requests but attempt to exhaust TCP/IP resources on the supporting infrastructure. DDoS protection solutions defending against network attacks identify the attack behavior and absorb it into the platform.

Application Defense: Other DDoS attacks target the website itself or the web server application, overwhelming the site with random data and wasting resources. DDoS protection against these attacks might handle SSL decryption with hardware-based cryptography and prevent invalid data from reaching web servers; a simple per-client rate limiter, sketched after this list, is one small building block of this kind of defense.

Defense by Scale: There have been massive DDoS attacks, and they show no signs of stopping. The key to successfully defending against a DDoS attack is to have a scalable platform capable of deflecting an attack led by a million bots with hundreds of gigabits per second of network throughput.
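As a deliberately tiny illustration of the application-defense building block referenced above, here is a per-client token-bucket rate limiter; the rate, burst size, and client identifier are illustrative assumptions, and a real DDoS platform applies this kind of logic at the network edge and at far greater scale.

```python
# Minimal token-bucket rate limiter: each client earns RATE tokens per second up
# to BURST; a request is served only if a token is available. Values are
# illustrative, not recommendations from the report.
import time
from collections import defaultdict

RATE = 5.0    # tokens replenished per second, per client
BURST = 10.0  # maximum bucket size (allowed burst)

_buckets = defaultdict(lambda: {"tokens": BURST, "stamp": time.monotonic()})

def allow_request(client_id):
    """Return True if the request should be served, False if it should be shed."""
    bucket = _buckets[client_id]
    now = time.monotonic()
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["stamp"]) * RATE)
    bucket["stamp"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False

# A burst of 100 back-to-back requests from one client: roughly the first 10 are
# served, the rest are shed, while other clients remain unaffected.
served = sum(allow_request("203.0.113.7") for _ in range(100))
print(f"served {served} of 100 requests")
```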

Table 1. Impact of Features on Metrics

DDoS attacks are growing more frequent, more powerful, and more sophisticated. Amazon reports mitigating a massive DDoS attack a couple of years ago in which peak traffic volume reached 2.3 Tbps. Deploying DDoS protection across the spectrum of attack vectors is no longer a “nice to have,” but a necessity.

In his report, Cooke concludes that “Any DDoS protection product is only part of an overall strategy, not a silver bullet for denial-of-service hazards.” Evaluate your organization and your needs, read more about each solution evaluated in the Radar report, and carefully match the right DDoS solutions to best suit your needs.

Learn More About the Reports: GigaOm Key Criteria for DDoS and GigaOm Radar for DDoS


Assessing Providers of Low-Power Wide Area Networks



Companies are taking note of how Low-Power Wide Area Networks (LPWAN) can provide long-distance communications for certain use cases. While their slow data transfer rates and high latency aren’t going to drive high-intensity video streaming or other bandwidth-hungry applications, these networks can provide inexpensive, low-power, long-distance communication.

According to Chris Grundemann and Logan Andrew Green’s recent report “GigaOm Radar for LPWAN Technology Providers (Unlicensed Spectrum) v1.0,” this growing communications technology is suitable for use cases with the following characteristics:

  • Requirement for long-distance transmission—10 km/6 miles or more wireless connectivity from sensor to gateway
  • Low power consumption, with battery life lasting up to 10 years
  • Terrain and building penetration to circumvent line-of-sight issues
  • Low operational costs (device management or connection subscription cost)
  • Low data transfer rate of roughly 20 kbps
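To see why figures like these are compatible with a decade of battery life, here is a rough back-of-the-envelope sketch; the payload size and reporting interval are assumptions for illustration, not numbers from the report.

```python
# Back-of-the-envelope airtime for a small LPWAN sensor message. Payload size
# and reporting interval are assumed; the ~20 kbps link rate comes from the list above.
PAYLOAD_BYTES = 12        # e.g., one compact sensor reading
LINK_RATE_BPS = 20_000    # roughly 20 kbps
READINGS_PER_DAY = 24     # one uplink per hour

airtime_s = PAYLOAD_BYTES * 8 / LINK_RATE_BPS   # ~0.0048 s per message
daily_airtime_s = airtime_s * READINGS_PER_DAY  # ~0.12 s of radio time per day

print(f"airtime per message: {airtime_s * 1000:.1f} ms")
print(f"radio time per day:  {daily_airtime_s:.2f} s")
```

With only a fraction of a second of transmit time per day, the radio spends almost all of its life asleep, which is what makes the 10-year battery claim plausible.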

These use cases could include large-scale IoT deployments within heavy industry, manufacturing, government, and retail. The LPWAN technology providers evaluated in this Radar report are currently filling a gap in the IoT market. They are certainly poised to benefit from the anticipated rapid adoption of LPWAN solutions.

Depending on the use case you’re looking to fulfill, you can select from four basic deployment models from these LPWAN providers:

  • Physical Appliance: This option would require a network server on-premises to receive sensor data from gateways.
  • Virtual Appliance: Network servers could also be deployed as virtual appliances, running either on-premises or in the cloud.
  • Network Stack as a Service: With this option, the LPWAN provider fully manages your network stack and provides you with the service. You only need devices and gateways to satisfy your requirements.
  • Network as a Service: This option is provided by mobile network operators, with the provider operating the network stack and gateways. You would only need to connect to the LPWAN provider.

Figure 1. LPWAN Connectivity

The LPWAN providers evaluated in this report are well-positioned from both a business and technical perspective, as they can function as a single point of contact for building IoT solutions. Instead of cobbling together other solutions to satisfy connectivity protocols, these providers can set up your organization with a packaged IoT solution, reducing time to market and virtually eliminating any compatibility issues.

The unlicensed spectrum aspect is also significant. The LPWAN technology providers evaluated in this Radar report use at least one protocol in the unlicensed electromagnetic spectrum bands. There’s no need to buy FCC licenses for specific frequency bands, which also lowers costs.

Learn More: GigaOm Enterprise Radar for LPWAN


The Benefits of a Price Benchmark for Data Storage


Why Price Benchmark Data Storage?

Customers, understandably, are highly driven by budget when it comes to data storage solutions. The costs of switching, upkeep, and upgrades are high-risk factors for businesses, so decision makers need to look for longevity in their chosen solution. Many factors influence how data needs to be handled in storage, from frequently accessed data to rarely accessed legacy data.

Storage performance may also be shaped by geographic location, whether for remote work or for global enterprises that need to access and share data instantly, or by the necessity of automation. Each element presents a new price point that needs to be considered by customers and by vendors.

A benchmark gives a comparison of system performance based on a key performance indicator, such as latency, capacity, or throughput. Competitor systems are analyzed in like-for-like situations that optimize the solution, allowing a clear representation of the performance. Price benchmarks for data storage are ideal for marketing, showing customers exactly how much value for money a solution has against competitor vendors.

Benchmark tests reinforce marketing collateral and tenders with verifiable evidence of performance capabilities and how the transactional costs relate to them. Customers are more likely to invest in long-term solutions with demonstrable evidence that can be corroborated. Fully disclosed testing environments, processes, and results, give customers the proof they need and help vendors stand out from the crowd.

The Difficulty in Choosing

Storage solutions vary greatly, from cloud options to those that utilize on-premises software. Data warehouses have different focuses which impact the overall performance, and they can vary in their pricing and licensing models. Customers find it difficult to compare vendors when the basic data storage configurations differ and price plans vary. With so many storage structures available, it’s hard to explain to customers how output relates to price, appeal to their budget, and maintain integrity, all at the same time.

Switching storage solutions is also a costly, high-risk decision that requires careful consideration. Vendors need to create compelling and honest arguments that provide reassurance of ROI and high quality performance.

Vendors should begin by pitching their costs at the right level; they need to be profitable but also appealing to the customer. Benchmarking can give an indication of how competitor cost models are calculated, allowing vendors to make judgements on their own price plans to keep ahead of the competition. 

Outshining the Competition

Benchmark testing gives an authentic overview of storage transaction-based price-performance, carrying out the test in environments that imitate real-life. Customers can gain a higher understanding of how the product works in terms of transactions per second, and how competitors process storage data in comparison.

The industry standard for benchmarking is the TPC Benchmark E (TPC-E), a recognized standard for storage vendors. Tests need to be performed in credible environments; by giving full transparency on their construction, vendors and customers can understand how the results are derived. This can also prove systems have been configured to offer the best performance of each platform.
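To make the price-performance idea concrete: TPC-E results report throughput (tpsE) alongside a price/performance figure, which is simply total system cost divided by that throughput. The sketch below applies the formula to two hypothetical vendors; the cost and throughput figures are invented for illustration and come from no published result.

```python
# Price-performance in the TPC-E sense: total system cost divided by measured
# throughput (lower is better). All numbers below are hypothetical.
def price_performance(total_cost_usd, throughput_tps):
    return total_cost_usd / throughput_tps

vendor_a = price_performance(450_000, 3_000)  # 150.0 USD per tps
vendor_b = price_performance(600_000, 5_000)  # 120.0 USD per tps
print(f"Vendor A: ${vendor_a:.2f}/tps, Vendor B: ${vendor_b:.2f}/tps")
```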

A step-by-step account allows tests to be recreated by external parties given the information provided. This transparency in reporting provides more trustworthy and reliable outcomes that offer a higher level of insight to vendors. Readers can also examine the testing and results themselves, to draw independent conclusions.

Next Steps

Price is the driving factor in business decisions, and the selection of data storage is no different. Businesses often look toward low-cost solutions that offer high capacity, and current trends have pushed customers toward cloud solutions, which are often cheaper and more flexible. The marketplace is full of options: new start-ups are continually emerging, and long-serving vendors need to reinvent and upgrade their systems to keep pace.

Vendors need evidence of price-performance so customers can be reassured that their choice will offer longevity and functionality at an affordable price point. Industry-standard benchmarking identifies how performance is impacted by price and which vendors are best in the market, the confirmation customers need to invest.

 
