A scan of billions of files from 13 percent of all GitHub public repositories over a period of six months has revealed that over 100,000 repos have leaked API tokens and cryptographic keys, with thousands of new repositories leaking new secrets on a daily basis.
The scan was part of an academic research project carried out by a team from North Carolina State University (NCSU), and the study's results were shared with GitHub, which acted on the findings to accelerate work on a new security feature called Token Scanning, currently in beta.
Academics scanned billions of GitHub files
The NCSU study is the most comprehensive and in-depth GitHub scan to date and exceeds any previous research of its kind.
NCSU academics scanned GitHub accounts for a period of nearly six months, between October 31, 2017, and April 20, 2018, and looked for text strings formatted like API tokens and cryptographic keys.
They didn't just use the GitHub Search API to look for these text patterns, as previous research efforts have done; they also looked at GitHub repository snapshots recorded in Google's BigQuery database.
Across the six-month period, researchers analyzed billions of files from millions of GitHub repositories.
In a research paper published last month, the three-man NCSU team said they captured and analyzed 4,394,476 files representing 681,784 repos using the GitHub Search API, and another 2,312,763,353 files from 3,374,973 repos that had been recorded in Google’s BigQuery database.
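For a sense of what the Search API side of such a collection pipeline can look like, here is a minimal Python sketch against GitHub's public code-search endpoint; the endpoint and headers are GitHub's documented REST API, while the query string, token placeholder, and single-page fetch are illustrative assumptions rather than the NCSU team's actual tooling.

```python
# Minimal sketch of a GitHub code-search query (not the NCSU pipeline).
import requests

GITHUB_TOKEN = "ghp_..."  # hypothetical personal access token

def search_code(query, per_page=100):
    """Return the first page of code-search results for `query`."""
    resp = requests.get(
        "https://api.github.com/search/code",
        params={"q": query, "per_page": per_page},
        headers={
            "Authorization": f"token {GITHUB_TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

# Example: files containing the AWS access-key prefix "AKIA".
for item in search_code('"AKIA" in:file'):
    print(item["repository"]["full_name"], item["path"])
```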
NCSU team scanned for API tokens from 11 companies
Inside this gigantic pile of files, researchers looked for text strings that were in the format of particular API tokens or cryptographic keys.
Since not all API tokens and cryptographic keys are in the same format, the NCSU team decided on 15 API token formats (from 15 services belonging to 11 companies, five of which were from the Alexa Top 50), and four cryptographic key formats.
This included API key formats used by Google, Amazon, Twitter, Facebook, Mailchimp, MailGun, Stripe, Twilio, Square, Braintree, and Picatic.
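To make the pattern-matching step concrete, the short sketch below shows regular expressions approximating a few well-known key formats; these are illustrative patterns, not the exact expressions used in the paper.

```python
# Illustrative key-format patterns (approximations, not the paper's exact regexes).
import re

TOKEN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key":    re.compile(r"\bAIza[0-9A-Za-z\-_]{35}\b"),
    "stripe_live_key":   re.compile(r"\bsk_live_[0-9a-zA-Z]{24}\b"),
    "rsa_private_key":   re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}

def find_candidate_secrets(text):
    """Return (kind, match) pairs for every candidate secret found in `text`."""
    hits = []
    for kind, pattern in TOKEN_PATTERNS.items():
        hits.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return hits
```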
Results came back right away, with thousands of leaking API and cryptographic keys being found every day of the research project.
In total, the NCSU team said they found 575,456 API and cryptographic keys, of which 201,642 were unique, all spread over more than 100,000 GitHub projects.
An observation the research team made in their academic paper was that the “secrets” found via the GitHub Search API and the ones found via the Google BigQuery dataset had little overlap.
“After joining both collections, we determined that 7,044 secrets, or 3.49% of the total, were seen in both datasets. This indicates that our approaches are largely complementary,” researchers said.
Furthermore, most of the API tokens and cryptographic keys (93.58 percent) came from single-owner accounts rather than multi-owner repositories.
What this means is that the vast majority of API and cryptographic keys found by the NCSU team were most likely valid tokens and keys used in the real world, as multi-owner repositories usually tend to contain test tokens used for shared testing environments and in-development code.
Leaked API and crypto keys to hang around for weeks
Because the research project also took place over a six-month period, researchers also had a chance to observe if and when account owners would realize they’ve leaked API and cryptographic keys, and remove the sensitive data from their code.
The team said that six percent of the API and cryptographic keys they tracked were removed within an hour of leaking, suggesting that these GitHub owners realized their mistake right away.
Over 12 percent of keys and tokens were gone after a day, while 19 percent stayed for as much as 16 days.
“This also means 81% of the secrets we discover were not removed,” researchers said. “It is likely that the developers for this 81% either do not know the secrets are being committed or are underestimating the risk of compromise.”
Research team uncovers some high-profile leaks
The extraordinary quality of these scans became evident when researchers started looking at what some of these leaks contained and where they were originating.
“In one case, we found what we believe to be AWS credentials for a major website relied upon by millions of college applicants in the United States, possibly leaked by a contractor,” the NCSU team said.
“We also found AWS credentials for the website of a major government agency in a Western European country. In that case, we were able to verify the validity of the account, and even the specific developer who committed the secrets. This developer claims in their online presence to have nearly 10 years of development experience.”
In another case, researchers also found 564 Google API keys that were being used by an online site to skirt YouTube rate limits and download YouTube videos, which it would later host on another video-sharing portal.
“Because the number of keys is so high, we suspect (but cannot confirm) that these keys may have been obtained fraudulently,” NCSU researchers said.
Last, but not least, researchers also found 7,280 RSA keys inside OpenVPN config files. By looking at the other settings found inside these configuration files, researchers determined that the vast majority of the users had disabled password authentication and were relying solely on the RSA keys for authentication, meaning anyone who found these keys could have gained access to thousands of private networks.
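As a rough illustration of that last finding, an OpenVPN client config could be triaged along these lines; this is a hypothetical sketch, not the paper's methodology, and it only checks for an embedded private key and the absence of the standard auth-user-pass directive.

```python
# Hypothetical triage check for OpenVPN client configs.
def config_relies_on_key_only(config_text: str) -> bool:
    """Flag configs that embed a private key but show no password authentication."""
    has_inline_key = "<key>" in config_text or "-----BEGIN" in config_text
    uses_password = "auth-user-pass" in config_text
    return has_inline_key and not uses_password
```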
The high quality of the scan results was also apparent when researchers used other API token-scanning tools to analyze their own dataset, to determine the efficiency of their scan system.
“Our results show that TruffleHog is largely ineffective at detecting secrets, as its algorithm only detected 25.236% of the secrets in our Search dataset and 29.39% in the BigQuery dataset,” the research team said.
GitHub is aware and on the job
In an interview with ZDNet today, Brad Reaves, Assistant Professor at the Department of Computer Science at North Carolina State University, said they shared the study’s results with GitHub in 2018.
“We have discussed the results with GitHub. They initiated an internal project to detect and notify developers about leaked secrets right around the time we were wrapping up our study. This project was publicly acknowledged in October 2018,” Reaves said.
“We were told they are monitoring additional secrets beyond those listed in the documentation, but we weren’t given further details.
“Because leakage of this type is so pervasive, it would have been very difficult for us to notify all affected developers. One of the many challenges we faced is that we simply didn’t have a way to obtain secure contact information for GitHub developers at scale,” Reaves added.
“At the time our paper went to press, we were trying to work with GitHub to do notifications, but given the overlap between our token scanning and theirs, they felt an additional notification was not necessary.”
API key leaks: a known issue
The problem of developers leaving their API and cryptographic keys in apps and websites' source code is not a new one. Amazon has been urging web devs to search their code and remove any AWS keys from public repos since as far back as 2014, and has even released a tool to help them scan repos before committing any code to a public repo.
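The same idea can be wired into a local workflow; below is a minimal, hypothetical pre-commit hook in Python that refuses a commit when a staged file matches a known key pattern. Amazon's own git-secrets tool covers the AWS case far more thoroughly, so treat this only as a sketch of the approach.

```python
#!/usr/bin/env python3
# Hypothetical pre-commit hook: block commits that appear to contain secrets.
import re
import subprocess
import sys

PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                      # AWS access key ID
    re.compile(r"-----BEGIN (RSA|EC) PRIVATE KEY-----"),  # private key material
]

def staged_files():
    """List the paths of files staged for the current commit."""
    out = subprocess.run(["git", "diff", "--cached", "--name-only"],
                         capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line]

def main():
    flagged = []
    for path in staged_files():
        try:
            text = open(path, errors="ignore").read()
        except OSError:
            continue
        if any(p.search(text) for p in PATTERNS):
            flagged.append(path)
    if flagged:
        print("Possible secrets found in:", ", ".join(flagged), file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```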
Some companies have taken it upon themselves to scan GitHub and other code-sharing repositories for accidentally exposed API keys, and revoke the tokens even before API key owners notice the leak or abuse.
What the NCSU study has done is provide the most in-depth look at this problem to date.
The paper that Reaves authored together with Michael Meli and Matthew R. McNiece is titled “How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories,” and is available for download in PDF format.
“Our findings show that credential management in open-source software repositories is still challenging for novices and experts alike,” Reaves told us.
Five Top Tips for Radar Briefings
Inspired by Harley Manning’s excellent advice on vendor briefings for evaluations, I thought I would document some of my recent experiences. Let’s be realistic: GigaOm is not the gorilla in the analyst market. Plus, we have some curious differences from other analyst firms — not least that we major in practitioner-led evaluation, bringing in an expert rather than (as Chris Mellor points out) “a team of consultants”. Nothing wrong with either approach, as I have said before, they’re just different.
So, what would be my top tips for vendors looking to brief us for a Radar report?
1. Make it technical
At GigaOm we care less about market share or ‘positioning’, and more about what the product or solution actually does. Our process involves considerable up-front effort pulling together and peer reviewing a research proposal, following which (every time) we produce a Key Criteria report — for subscribers, this offers a how-to guide for writing an RFP.
By the time we’re onto the Radar, we’re mainly thinking, “Does it do the thing, and how well?” If we can get our technical experts in a virtual room with your technical experts, we can all get out of the way. See also: provide a demo.
2. Understand the scoring
Behind GigaOm’s model is a principle that technology commoditizes over time: this year’s differentiating product feature may be next year’s baseline. For this reason, we score against a general level, with two plusses given if a vendor delivers on a feature or quality. A vendor doing better than the rest will gain points (and we say why), and the converse is true. If we’re saying something, we need to be able to defend it — in this case, in the strengths and weaknesses in the report.
3. Make it defensible
Speaking of which, a vendor can make our lives simpler by telling us why a particular feature is better than everyone else’s. Sorry, we’re not looking for an easy ride, but to say what makes something special gives us something to talk about (as opposed to “but everyone thinks so,” etc). Note that customer proof points carry much more weight than general statements — if a customer says it to us directly, we’re far more likely to take it on board.
4. Tell us scenarios
At GigaOm, we’re scenario-led — which means we’re looking at how technology categories address particular problems. Many vendors solve specific problems particularly well (note, I don’t believe there’s such a thing as a top-right shortlist of vendors to suit all needs). Often in briefings, I ask ‘magic’ questions like, “Why do your customers love you?” which cut through generalist website hype and focus on where the solution is particularly strong.
5. Focus on the goal
A Radar briefing shouldn’t be perceived as a massive overhead — we want to know what your product does, not how well your media-trained speakers can present. Once done, our experts will be able to complete their work, then run the resulting one-pager back past you for a fact check. For sure, we’d love as much information as you can provide, and we have an extensive set of questionnaires for that purpose.
I’ve just flicked back through Harley’s ten points, and there’s a lot in there about being respectful, aiming to hit dates, not arguing over every judgment, and so on. Wise words, which we get just as often, I wager. I also recognize that even as we have published schedules, methodologies, planned improvements, and so on, you also have your own challenges and priorities.
All of which means that together, our primary goals should be effectiveness, such that we are presenting you, the vendor, correctly with respect to the category, and efficiency, in that a small amount of effort in the right places can benefit all of us. Which probably means, let’s talk.
Achieve more with GigaOm
As we have grown substantially over the past two years, we are often asked who (even) GigaOm is, what the company does, how it differentiates, and so on. These are fair questions—many people still remember what we can call GigaOm 1.0, that fine media company born of the blogging wave.
We’ve been through the GigaOm 2.0 “boutique analyst firm” phase, before deciding we wanted to achieve more. That decision put us on a journey to where we are today, ten times the size in terms of headcount and still growing, and covering as many technology categories as the biggest analyst firms.
Fuelling our growth has been a series of interconnected decisions. First, we asked technology decision-makers—CIOs, CTOs, VPs of Engineering and Operations, and so on—what they needed, and what was missing: unanimously, they said they needed strategic technical information based on practical experience, that is, not just theory. Industry analysts, it has been said, can be like music critics who have never played in an orchestra. Sure, there’s a place for that, but it leaves a gap for practitioner-led insights.
Second, and building on this, we went through a test-and-learn phase to try various report models. Enrico Signoretti, now our VP of Product, spearheaded the creation of the Key Criteria and Radar document pair, based on his experience in evaluating solutions for enterprise clients. As we developed this product set in collaboration with end-user strategists, we doubled down on the Key Criteria report as a how-to guide for writing a Request For Proposals.
Doing this led to the third strand, expanding this thinking to the enterprise decision-making cycle. Technology decision-makers don’t wake up one morning and say, “I think I need some Object Storage.”
Rather, they will be faced with a challenge, a situation, or some other scenario – perhaps existing storage products are not scaling sufficiently, applications are being rationalized, or a solution has reached end of life. These scenarios dictate a need: often, the decision-maker will not only have to define a response but will also have to justify the spending.
This reality dictates the first product in the GigaOm portfolio, the GigaBrief, which is (essentially) a how-to guide for writing a business case. Once the decision maker has confirmed the budget, they can get on with writing an RFP (cf the Key Criteria and Radar), and then consider running a proof of concept (PoC).
We have a how-to guide for these as well, based on our Benchmarks, field tests, and Business Technology Impact (BTI) reports. We know that, alongside thought leadership, decision-makers need hard numbers for costs and benefits, so we double down on these.
For end-user organizations, our primary audience, we have therefore created a set of tools to make decisions and unblock deployments: our subscribers come to us for clarity and practitioner-led advice, which helps them work both faster and smarter and achieve their goals more effectively. Our research is high-impact by design, which is why we have an expanding set of partner organizations using it to enable their clients.
Specifically, learning companies such as Pluralsight and A Cloud Guru use GigaOm reports to help subscribers set direction and lock down the solutions they need to deliver. By its nature, our how-to approach to report writing has created a set of strategic training tools, which directly feed more specific technical training.
Meanwhile, channel companies such as Ingram Micro and Transformation Continuum use our research to help their clients lock down the solutions they need, together with a practitioner-led starting point for supporting frameworks, architectures, and structures. And we work together with media partners like The Register and The Channel Company to support their audiences with research and insights.
Technology vendors, too, benefit from end-user decision-makers who are better equipped to make decisions. Rather than generic market making or long-listing potential vendors, our scenario-led materials directly impact buying decisions, taking procurement from a shortlist to a conclusion. Sales teams at systems, service, and software companies tell us how they use our reports when discussing options with prospects, not to evangelize but to explore practicalities and help reach a conclusion.
All these reasons and more enable us to say with confidence how end-user businesses, learning, channel and media companies, and indeed technology vendors are achieving more with GigaOm research. In a complex and constantly evolving landscape, our practitioner- and scenario-led approach brings specificity and clarity, helping organizations reach further, work faster and deliver more.
Our driving force is the value we bring; at the same time, we maintain a connection with our media heritage, which enables us to scale beyond traditional analyst models. We also continue to learn, reflect, and change — our open and transparent model welcomes feedback from all stakeholders so that we can drive improvements in our products, our approach, and our outreach.
This is to say, if you have any thoughts, questions, raves, or rants, don’t hesitate to get in touch with me directly. My virtual door, and my calendar, are always open.
Pragmatic view of Zero Trust
Traditionally we have taken the approach that we trust everything in the network, everything in the enterprise, and put our security at the edge of that boundary. Pass all of our checks and you are in the “trusted” group. That worked well when the opposition was not sophisticated, most end user workstations were desktops, the number of remote users was very small, and we had all our servers in a series of data centers that we controlled completely, or in part. We were comfortable with our place in the world, and the things we built. Of course, we were also asked to do more with less and this security posture was simple and less costly than the alternative.
Starting around the time of Stuxnet, this started to change. Security went from a poorly understood, accepted cost discussed in back rooms to a topic raised with interest in board rooms and at shareholder meetings. Overnight, the executive level went from being able to be ignorant of cybersecurity to having to be knowledgeable about the company’s disposition on cyber. Attacks increased, and the major news organizations started reporting on cyber incidents. Legislation changed to reflect this new world, and more is coming. How do we handle this new world and all of its requirements?
Zero Trust is that change in security. Zero Trust is a fundamental change in cybersecurity strategy. Whereas before we focused on boundary control and built all our security around the idea of inside and outside, now we need to focus on every component and every person potentially being a Trojan Horse. It may look legitimate enough to get through the boundary, but in reality it could be hosting a threat actor waiting to attack. Even better, your applications and infrastructure could be a time bomb waiting to blow, where the code used in those tools is exploited in a “Supply Chain” attack, leaving the organization vulnerable through no fault of its own. Zero Trust says: “You are trusted only to take one action, one time, in one place, and the moment that changes you are no longer trusted and must be validated again, regardless of your location, application, userID, etc.” Zero Trust is exactly what it says: “I do not trust anything, so I validate all the things.”
That is a neat theory, but what does that mean in practice? We need to restrict users to the absolute minimum required access: to networks that have a tight series of ACLs, to applications that can only communicate with the things they must communicate with, to devices segmented to the point they think they are alone on private networks, while being dynamic enough to have their sphere of trust changed as the organization evolves, and still enable management of those devices. The overall goal is to reduce the “blast radius” any compromise would allow in the organization, since it is not a question of “if” but “when” for a cyber attack.
So if my philosophy changes from “I know that and trust it” to “I cannot believe that is what it says it is,” then what can I do? Especially when I consider I did not get 5x budget to deal with 5x more complexity. I look to the market. Good news! Every single security vendor is now telling me how they solve Zero Trust with their tool, platform, service, or new shiny thing. So I ask questions. It seems to me they only really solve it according to marketing. Why? Because Zero Trust is hard. It is very hard. It is complex: it requires change across the organization, not just tools but the full trifecta of people, process, and technology, and not just my technology team but the entire organization; not one region, but globally. It is a lot.
All is not lost though, because Zero Trust isn’t a fixed outcome, it is a philosophy. It is not a tool, or an audit, or a process. I cannot buy it, nor can I certify it (no matter what people selling things will say). So that shows hope. Additionally, I always remember the truism; “Perfection is the enemy of Progress”, and I realize I can move the needle.
So I take a pragmatic view of security, through the lens of Zero Trust. I don’t aim to do everything all at once. Instead I look at what I am able to do and where I have existing skills. How is my organization designed? Am I a hub and spoke, with a core organization providing shared services to largely independent business units? Maybe I have a mesh, where the BUs are distributed and were organically integrated and staffed as we went through years of M&A. Maybe we are fully integrated as an organization, with one standard for everything. Maybe it is none of those.
I start by considering my capabilities and mapping my current state. Where is my organization on the NIST security framework model? Where do I think I could get with my current staff? Who do I have in my partner organization that can help me? Once I know where I am I then fork my focus.
One fork is the low-hanging fruit that can be resolved in the short term. Can I add some firewall rules to better restrict VLANs that do not need to communicate? Can I audit user accounts and make sure we are following best practices for organization and permission assignment? Does MFA exist, and can I expand its use, or implement it for some critical systems?
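To make one of those quick wins concrete, here is a small sketch of the MFA audit, assuming an AWS environment with boto3 credentials already configured; the account-hygiene checks my organization needs will obviously differ from yours.

```python
# Sketch: list IAM users with no MFA device enrolled (assumes AWS + boto3).
import boto3

iam = boto3.client("iam")

def users_without_mfa():
    """Return the names of IAM users that have no MFA device enrolled."""
    flagged = []
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            devices = iam.list_mfa_devices(UserName=user["UserName"])["MFADevices"]
            if not devices:
                flagged.append(user["UserName"])
    return flagged

if __name__ == "__main__":
    for name in users_without_mfa():
        print(f"{name}: no MFA device enrolled")
```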
My second fork is to develop an ecosystem of talent, organized around a security-focused operating model, otherwise known as my long-term plan. DevOps becomes SecDevOps, where security is integrated and first. My partners become more integrated, and I look for, and acquire relationships with, new partners that fill my gaps. My teams are reorganized to support security by design AND practice. And I develop a training plan that pairs what we can do today (partner lunch-and-learns) with long-term strategy (which may be upskilling my people with certifications).
This is the phase where we begin looking at a tools rationalization project. Which of my existing tools do not perform as needed in the new Zero Trust world? These will likely need to be replaced in the near term. Which tools work well enough but will need to be replaced when their contracts end? And which tools will we retain?
Finally, where do we see the big, hard rocks being placed in our way? It is a given that our networks will need some redesign, and will need to be designed with automation in mind, because the rules, ACLs, and VLANs will be far more complex than before, and changes will happen at a far faster pace than before. Automation is the only way this will work. The best part is that modern automation is self-documenting.
The wonderful thing about being pragmatic is we get to make positive change, have a long term goal in mind that we can all align on, focus on what we can change, while developing for the future. All wrapped in a communications layer for executive leadership, and an evolving strategy for the board. Eating the elephant one bite at a time.