

Intel and Argonne National Lab on ‘exascale’ and their new Aurora supercomputer – TechCrunch



The scale of supercomputing has grown almost too large to comprehend, with millions of compute units performing calculations at rates requiring, for the first time, the exa prefix, which denotes quintillions per second. How was this accomplished? With careful planning… and a lot of wires, say two people close to the project.

Having noted the news that Intel and Argonne National Lab were planning to take the wrapper off a new exascale computer called Aurora (one of several being built in the U.S.) earlier this year, I recently got a chance to talk with Trish Damkroger, head of Intel’s Extreme Computing Organization, and Rick Stevens, Argonne’s associate lab director for computing, environment and life sciences.

The two discussed the technical details of the system at the Supercomputing conference in Denver, where, probably, most of the people who can truly say they understand this type of work already were. So while you can read in industry journals and the press release about the nuts and bolts of the system, including Intel’s new Xe architecture and Ponte Vecchio general-purpose compute chip, I tried to get a little more of the big picture from the two.

It should surprise no one that this is a project long in the making — but you might not guess exactly how long: more than a decade. Part of the challenge, then, was to establish computing hardware that was leagues beyond what was possible at the time.

“Exascale was first being started in 2007. At that time we hadn’t even hit the petascale target yet, so we were planning like three to four magnitudes out,” said Stevens. “At that time, if we had exascale, it would have required a gigawatt of power, which is obviously not realistic. So a big part of reaching exascale has been reducing power draw.”

Intel’s supercomputing-focused Xe architecture is based on a 7-nanometer process, pushing the very edge of Newtonian physics — much smaller and quantum effects start coming into play. But the smaller the gates, the less power they take, and microscopic savings add up quickly when you’re talking billions and trillions of them.

But that merely exposes another problem: If you increase the power of a processor by 1000x, you run into a memory bottleneck. The system may be able to think fast, but if it can’t access and store data equally fast, there’s no point.

“By having exascale-level computing, but not exabyte-level bandwidth, you end up with a very lopsided system,” said Stevens.

And once you clear both those obstacles, you run into a third: what’s called concurrency. High performance computing is as much about synchronizing a task between huge numbers of computing units as it is about making those units as powerful as possible. The machine operates as a whole, and as such every part must communicate with every other part, which becomes something of a problem as you scale up.

“These systems have many thousands of nodes, and the nodes have hundreds of cores, and the cores have thousands of computation units, so there’s like, billion-way concurrency,” Stevens explained. “Dealing with that is the core of the architecture.”
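To see how those layers multiply out, here is a rough back-of-the-envelope sketch. The three figures are placeholder orders of magnitude chosen to match Stevens' wording ("many thousands", "hundreds", "thousands"); they are assumptions for illustration, not Aurora's actual specifications.

```python
# Illustrative orders of magnitude only; these are NOT Aurora's real specifications.
NODES = 10_000           # "many thousands of nodes"
CORES_PER_NODE = 100     # "hundreds of cores"
UNITS_PER_CORE = 1_000   # "thousands of computation units"


def total_concurrency(nodes, cores_per_node, units_per_core):
    """Multiply the layers to get the number of units working at once."""
    return nodes * cores_per_node * units_per_core


total = total_concurrency(NODES, CORES_PER_NODE, UNITS_PER_CORE)
print(f"{total:,} concurrent units")
```

With these placeholder values the product lands at a billion, which is the scale Stevens describes.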

How they did it, I, being utterly unfamiliar with the vagaries of high performance computing architecture design, would not even attempt to explain. But they seem to have done it, as these exascale systems are coming online. The solution, I’ll only venture to say, is essentially a major advance on the networking side. The level of sustained bandwidth between all these nodes and units is staggering.

Making exascale accessible

While even in 2007 you could predict that we’d eventually reach such low-power processes and improved memory bandwidth, other trends would have been nearly impossible to predict — for example, the exploding demand for AI and machine learning. Back then it wasn’t even a consideration, and now it would be folly to create any kind of high performance computing system that wasn’t at least partially optimized for machine learning problems.

“By 2023 we expect AI workloads to be a third of the overall HPC server market,” said Damkroger. “This AI-HPC convergence is bringing those two workloads together to solve problems faster and provide greater insight.”

To that end the architecture of the Aurora system is built to be flexible while retaining the ability to accelerate certain common operations, for instance the type of matrix calculations that make up a great deal of certain machine learning tasks.

“But it’s not just about performance, it has to be about programmability,” she continued. “One of the big challenges of an exascale machine is being able to write software to use that machine. oneAPI is going to be a unified programming model. It’s based on an open standard of Open Parallel C++, and that’s key for promoting use in the community.”

Summit, as of this writing the most powerful single computing system in the world, is very dissimilar to many of the systems developers are used to working on. If the creators of a new supercomputer want it to have broad appeal, they need to bring it as close to being like a “normal” computer to operate as possible.

“It’s something of a challenge to bring x86-based packages to Summit,” Stevens noted. “The big advantage for us is that, because we have x86 nodes and Intel GPUs, this thing is basically going to run every piece of software that exists. It’ll run standard software, Linux software, literally millions of apps.”

I asked about the costs involved, since it’s something of a mystery with a system like this how a half-billion-dollar budget gets broken down. Really I just thought it would be interesting to know how much of it went to, say, RAM versus processing cores, or how many miles of wire they had to run. Though both Stevens and Damkroger declined to comment, the former did note that “the backlink bandwidth on this machine is many times the total of the entire internet, and that does cost something.” Make of that what you will.

Aurora, unlike its cousin El Capitan at Lawrence Livermore National Lab, will not be used for weapons development.

“Argonne is a science lab, and it’s open, not classified science,” said Stevens. “Our machine is a national user resource; we have people using it from all over the country. A large amount of time is allocated via a process that’s peer reviewed and priced to accommodate the most interesting projects. About two thirds is that, and the other third Department of Energy stuff, but still unclassified problems.”

Initial work will be in climate science, chemistry, and data science, with 15 teams between them signed up for major projects to be run on Aurora — details to be announced soon.



Apple and Google’s AI wizardry promises privacy—at a cost




Since the dawn of the iPhone, many of the smarts in smartphones have come from elsewhere: the corporate computers known as the cloud. Mobile apps sent user data cloudward for useful tasks like transcribing speech or suggesting message replies. Now Apple and Google say smartphones are smart enough to do some crucial and sensitive machine learning tasks like those on their own.

At Apple’s WWDC event this month, the company said its virtual assistant Siri will transcribe speech without tapping the cloud in some languages on recent and future iPhones and iPads. During its own I/O developer event last month, Google said the latest version of its Android operating system has a feature dedicated to secure, on-device processing of sensitive data, called the Private Compute Core. Its initial uses include powering the version of the company’s Smart Reply feature built into its mobile keyboard that can suggest responses to incoming messages.

Apple and Google both say on-device machine learning offers more privacy and snappier apps. Not transmitting personal data cuts the risk of exposure and saves time spent waiting for data to traverse the internet. At the same time, keeping data on devices aligns with the tech giants’ long-term interest in keeping consumers bound into their ecosystems. People who hear their data can be processed more privately might become more willing to agree to share more data.

The companies’ recent promotion of on-device machine learning comes after years of work on technology to constrain the data their clouds can “see.”

In 2014, Google started gathering some data on Chrome browser usage through a technique called differential privacy, which adds noise to harvested data in ways that restrict what those samples reveal about individuals. Apple has used the technique on data gathered from phones to inform emoji and typing predictions and for web browsing data.
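One textbook way to add this kind of noise is randomized response, sketched below. This is a minimal illustration of the general idea, not the specific mechanism Google or Apple deploy; the 50/50 probabilities, the function names, and the simulated population are all assumptions made for the example.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report the true answer half the time, and a fair coin flip otherwise.

    Any single report is deniable, because it may be pure noise.
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list[bool]) -> float:
    """Recover the population rate from the noisy reports.

    P(report is True) = 0.5 * p + 0.25, so p = 2 * (observed - 0.25).
    """
    observed = sum(reports) / len(reports)
    return 2 * (observed - 0.25)


rng = random.Random(0)
# Hypothetical population in which 30% of users have some sensitive trait.
population = [i < 3_000 for i in range(10_000)]
reports = [randomized_response(answer, rng) for answer in population]
print(f"estimated rate: {estimate_true_rate(reports):.2f}")
```

The collector learns an aggregate rate close to the true one, while no individual report reveals with certainty what that person actually answered.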

More recently, both companies have adopted a technology called federated learning. It allows a cloud-based machine learning system to be updated without scooping in raw data; instead, individual devices process data locally and share only digested updates. As with differential privacy, the companies have discussed using federated learning only in limited cases. Google has used the technique to keep its mobile typing predictions up to date with language trends; Apple has published research on using it to update speech recognition models.
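The "process locally, share only digested updates" loop can be sketched with a toy version of federated averaging. Everything here is an assumption for illustration: a one-parameter model, made-up per-device data, and a single gradient step per round. Production systems from Google or Apple are far more involved.

```python
def local_update(weight, data, lr=0.1):
    """One gradient-descent step on a device for the model y = weight * x.

    Only the updated weight leaves the device; the raw (x, y) pairs do not.
    """
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad


def federated_average(updates):
    """Server side: average device weights, weighted by sample count."""
    total = sum(count for _, count in updates)
    return sum(weight * count for weight, count in updates) / total


# Hypothetical per-device datasets, all generated from y = 3x.
devices = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (1.0, 3.0), (2.0, 6.0)],
]

global_weight = 0.0
for _ in range(50):
    updates = [(local_update(global_weight, d), len(d)) for d in devices]
    global_weight = federated_average(updates)

print(f"learned weight: {global_weight:.3f}")
```

The server only ever sees one number per device per round, yet the shared model still converges toward the slope that fits every device's data.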

Rachel Cummings, an assistant professor at Columbia who has previously consulted on privacy for Apple, says the rapid shift to do some machine learning on phones has been striking. “It’s incredibly rare to see something going from the first conception to being deployed at scale in so few years,” she says.

That progress has required not just advances in computer science but for companies to take on the practical challenges of processing data on devices owned by consumers. Google has said that its federated learning system only taps users’ devices when they are plugged in, idle, and on a free internet connection. The technique was enabled in part by improvements in the power of mobile processors.

Beefier mobile hardware also contributed to Google’s 2019 announcement that voice recognition for its virtual assistant on Pixel devices would be wholly on-device, free from the crutch of the cloud. Apple’s new on-device voice recognition for Siri, announced at WWDC this month, will use the “neural engine” the company added to its mobile processors to power up machine learning algorithms.

The technical feats are impressive. It’s debatable how much they will meaningfully change users’ relationship with tech giants.

Presenters at Apple’s WWDC said Siri’s new design was a “major update to privacy” that addressed the risk associated with accidentally transmitting audio to the cloud, saying that was users’ largest privacy concern about voice assistants. Some Siri commands—such as setting timers—can be recognized wholly locally, making for a speedy response. Yet in many cases transcribed commands to Siri—presumably including from accidental recordings—will be sent to Apple servers for software to decode and respond. Siri voice transcription will still be cloud-based for HomePod smart speakers commonly installed in bedrooms and kitchens, where accidental recording can be more concerning.

Google also promotes on-device data processing as a privacy win and has signaled it will expand the practice. The company expects partners such as Samsung that use its Android operating system to adopt the new Private Compute Core and use it for features that rely on sensitive data.

Google has also made local analysis of browsing data a feature of its proposal for reinventing online ad targeting, dubbed FLoC and claimed to be more private. Academics and some rival tech companies have said the design is likely to help Google consolidate its dominance of online ads by making targeting more difficult for other companies.

Michael Veale, a lecturer in digital rights at University College London, says on-device data processing can be a good thing but adds that the way tech companies promote it shows they are primarily motivated by a desire to keep people tied into lucrative digital ecosystems.

“Privacy gets confused with keeping data confidential, but it’s also about limiting power,” says Veale. “If you’re a big tech company and manage to reframe privacy as only confidentiality of data, that allows you to continue business as normal and gives you license to operate.”

A Google spokesperson said the company “builds for privacy everywhere computing happens” and that data sent to the Private Compute Core for processing “needs to be tied to user value.” Apple did not respond to a request for comment.

Cummings of Columbia says new privacy techniques and the way companies market them add complexity to the trade-offs of digital life. Over recent years, as machine learning has become more widely deployed, tech companies have steadily expanded the range of data they collect and analyze. There is evidence some consumers misunderstand the privacy protections trumpeted by tech giants.

A forthcoming survey study from Cummings and collaborators at Boston University and the Max Planck Institute showed descriptions of differential privacy drawn from tech companies, media, and academics to 675 Americans. Hearing about the technique made people about twice as likely to report they would be willing to share data. But there was evidence that descriptions of differential privacy’s benefits also encouraged unrealistic expectations. One-fifth of respondents expected their data to be protected against law enforcement searches, something differential privacy does not do. Apple’s and Google’s latest proclamations about on-device data processing may bring new opportunities for misunderstandings.

This story originally appeared on



Amazon joins Apple, Google by reducing its app store cut



The Amazon Fire HD 8 tablet, which runs Amazon’s Fire OS.

Apparently following the lead of Apple and Google, Amazon has announced that it will take a smaller revenue cut from apps developed by teams earning less than $1 million annually from their apps on the Amazon Appstore. The same applies to developers who are brand-new to the marketplace.

The new program from Amazon, called the Amazon Appstore Small Business Accelerator Program, launches in Q4 of this year, and it will reduce the cut Amazon takes from app revenue, which was previously 30 percent. (Developers making over $1 million annually will continue to pay the original rate.) For some, it’s a slightly worse deal than Apple’s or Google’s, and for others, it’s better.

Amazon’s new indie-friendly rate is 20 percent, in contrast to Apple’s and Google’s 15 percent. Amazon seeks to offset this difference by granting developers 10 percent of their Appstore revenue in the form of a credit for AWS. For certain developers who use AWS, it could mean that Amazon’s effective cut is actually 10 percent, not 15 or 20 percent.

But for some, it amounts to something more like giving the developer a coupon on a purchase of services from Amazon than actually putting more cash in their pockets. It leaves small developers who aren’t spending a bunch of money on Amazon’s services with a worse deal than they’d get on Apple’s or Google’s marketplaces.

As with Apple’s program—but not Google’s—the lower rate applies to developers only if they made $1 million or less in total (in this case, the numbers assessed are those from the previous year). Crossing that threshold will lead developers to pay the older, higher rate on all of their earnings. In contrast, Google always takes a smaller cut of the first million in a given year and then applies the bigger cut to revenues after $1 million without changing the amount it took from the first million.
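The three fee schemes can be compared with a short calculation. This is a simplified sketch built only from the rates quoted above: it applies the threshold to the same year's revenue (the article notes Amazon's and Apple's eligibility is actually assessed on the previous year), works in whole dollars, and the function names and the `spends_credit_on_aws` flag are hypothetical.

```python
THRESHOLD = 1_000_000  # dollars per year, as described in the article


def amazon_cut(revenue: int, spends_credit_on_aws: bool = False) -> int:
    """20% under the threshold, 30% on everything above it.

    The 10% AWS credit only offsets the cut for developers who would
    have spent that money on AWS anyway.
    """
    if revenue > THRESHOLD:
        return revenue * 30 // 100
    cut = revenue * 20 // 100
    if spends_credit_on_aws:
        cut -= revenue * 10 // 100
    return cut


def apple_cut(revenue: int) -> int:
    """15% if at or under the threshold, otherwise 30% on all earnings."""
    rate = 15 if revenue <= THRESHOLD else 30
    return revenue * rate // 100


def google_cut(revenue: int) -> int:
    """15% on the first million, 30% only on revenue beyond it."""
    first = min(revenue, THRESHOLD)
    rest = max(revenue - THRESHOLD, 0)
    return first * 15 // 100 + rest * 30 // 100


for revenue in (500_000, 1_500_000):
    print(
        revenue,
        amazon_cut(revenue),
        amazon_cut(revenue, spends_credit_on_aws=True),
        apple_cut(revenue),
        google_cut(revenue),
    )
```

The marginal structure of Google's scheme shows up clearly once revenue crosses the threshold: only the portion above $1 million is charged the higher rate.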

The Amazon Appstore primarily exists as the app store for Amazon’s Android-based Fire OS software that runs on tablets. It’s also offered as an alternative app store for users of other Android-based operating systems.

All three companies are facing various forms of regulatory scrutiny, and that scrutiny was likely a factor in Apple’s decision to cut the fees it applies to apps released by small developers on the Apple App Store. Google followed shortly afterward for its Google Play marketplace.



Microsoft’s Linux repositories were down for 18+ hours



In 2017, Tux was sad that he had a Microsoft logo on his chest. In 2021, he’s mostly sad that Microsoft’s repositories were down for most of a day.

Jim Salter

Yesterday, the repository from which Microsoft serves software installers for Linux distributions including CentOS, Debian, Fedora, OpenSUSE, and more went down hard, and it stayed down for around 18 hours. The outage impacted users trying to install .NET Core, Microsoft Teams, Microsoft SQL Server for Linux (yes, that’s a thing), and more, as well as Azure’s own devops pipelines.

We first became aware of the problem Wednesday evening when we saw 404 errors in the output of apt update on an Ubuntu workstation with Microsoft Teams installed. The outage is somewhat better documented in a .NET Core issue report on GitHub, with many users from all around the world sharing their experiences and theories.

The short version is, the entire repository cluster which serves all Linux packages for Microsoft was completely down—issuing a range of HTTP 404 (content not found) and 500 (Internal Server Error) messages for any URL—for roughly 18 hours. Microsoft engineer Rahul Bhandari confirmed the outage roughly five hours after it was initially reported, with a cryptic comment about the infrastructure team “running into some space issues.”

Eighteen hours after the issue was reported, Bhandari reported that the mirrors were once again available—although with temporarily degraded performance, likely due to cold caches. In this update, Bhandari said that the original cause of the outage was “a regression in [apt repositories] during some feature migration work that resulted in those packages becoming unavailable on the mirrors.”

We’re still waiting for a comprehensive incident report, since Bhandari’s status updates provide clues but no real explanations. The good news is, we can confirm that the repository is indeed up once again, and it is serving packages as it should.
