Cerebras’ wafer-size chip is 10,000 times faster than a GPU

Cerebras Systems and the U.S. Department of Energy's National Energy Technology Laboratory (NETL) announced that the company's CS-1 system is more than 10,000 times faster than a graphics processing unit (GPU).

On a practical level, that means AI neural networks that once took months to train can now train in minutes on the Cerebras system.

Cerebras makes the world’s largest computer chip, the WSE. Normally, chip makers slice a wafer from a 12-inch diameter ingot of silicon to process in a chip factory. Once processed, the wafer is sliced into hundreds of separate chips that can be used in electronic hardware.

But Cerebras, started by SeaMicro founder Andrew Feldman, takes that wafer and makes a single, massive chip out of it. Each piece of the chip, dubbed a core, is interconnected in a sophisticated way to other cores. The interconnections are designed to keep it all functioning at high speeds so the transistors all work together as one.

Cerebras' CS-1 system uses the WSE wafer-size chip, which has 1.2 trillion transistors, the basic on-off electronic switches that are the building blocks of silicon chips. Intel's first processor, the 4004, had 2,300 transistors in 1971, and the Nvidia A100 80GB chip, announced yesterday, has 54 billion transistors.

Feldman said in an interview with VentureBeat that the CS-1 was also 200 times faster than the Joule Supercomputer, which is No. 82 on the list of the top 500 supercomputers in the world.

“It shows record-shattering performance,” Feldman said. “It also shows that wafer scale technology has applications beyond AI.”

Above: The Cerebras WSE has 1.2 trillion transistors compared to Nvidia’s largest GPU, the A100 at 54.2 billion transistors.

Those are the fruits of the radical approach of Los Altos, California-based Cerebras, which created a silicon wafer with 400,000 AI cores on it instead of slicing that wafer into individual chips. The unusual design makes demanding tasks far more efficient because the processors and memory sit close to each other with enormous bandwidth connecting them, Feldman said. The question remains how widely applicable it is to different computing tasks.

In a paper based on the results of Cerebras’ work with the federal lab, the parties said that the CS-1 can deliver performance that is unattainable with any number of central processing units (CPUs) and GPUs, which are both commonly used in supercomputers (Nvidia’s GPUs are used in 70% of the top supercomputers now). And that is “no matter how large that supercomputer is,” Feldman said.

Cerebras is making the presentation at the SC20 supercomputing online event this week. The CS-1 beat the Joule Supercomputer at a workload for computational fluid dynamics, which simulates the movement of fluids in places such as a carburetor. The Joule Supercomputer costs tens of millions of dollars to build. It has 84,000 CPU cores, spread over dozens of racks, and it consumes 450 kilowatts of power.
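Computational fluid dynamics workloads like this typically advance a discretized field through repeated nearest-neighbor "stencil" updates over a grid, so each step is dominated by memory traffic between adjacent cells rather than by arithmetic; that is exactly where wafer-scale on-chip memory and bandwidth pay off. A minimal, hypothetical 1-D diffusion stencil sketches the pattern (pure Python for illustration; this is not the NETL/Cerebras workload itself):

```python
# Minimal 1-D diffusion stencil: the nearest-neighbor update at the
# heart of many CFD-style solvers. Each step reads only adjacent
# cells, so per-step cost is dominated by memory access, not math.
# Hypothetical illustration; not Cerebras' or NETL's actual code.

def diffusion_step(u, alpha=0.25):
    """One explicit time step of du/dt = alpha * d2u/dx2 (unit dx, dt)."""
    return [
        u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
        if 0 < i < len(u) - 1 else u[i]  # fixed (Dirichlet) boundaries
        for i in range(len(u))
    ]

if __name__ == "__main__":
    # A hot spike in the middle of a cold rod spreads out over time.
    u = [0.0] * 9
    u[4] = 1.0
    for _ in range(100):
        u = diffusion_step(u)
    print([round(x, 3) for x in u])
```

Scaling such a solver means distributing grid cells across cores, where each core only ever exchanges boundary cells with its neighbors; Cerebras maps that communication pattern onto its on-wafer fabric instead of a multi-rack network.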

Above: Cerebras has a half-dozen or so supercomputing customers.

In this demo, the Joule Supercomputer used 16,384 cores, and the Cerebras computer was 200 times faster than that, according to Brian Anderson, director of the energy lab.

“For these workloads, the wafer-scale CS-1 is the fastest machine ever built,” Feldman said. “And it is faster than any other combination or cluster of other processors.”

A single Cerebras CS-1 is 26 inches tall, fits in one-third of a standard rack, and is powered by the industry's only wafer-scale processing engine, Cerebras' WSE. It combines fast on-chip memory with massive bandwidth and low-latency interprocessor communication in an architecture optimized for high-bandwidth data movement.

The research was led by Dirk Van Essendelft, machine learning and data science engineer at NETL, and Michael James, Cerebras cofounder and chief architect of advanced technologies. The results came after months of work.

In September 2019, the Department of Energy announced its partnership with Cerebras, including deployments with Argonne National Laboratory and Lawrence Livermore National Laboratory.

The Cerebras CS-1 was announced in November 2019. The CS-1 is built around the WSE, which is 56 times larger, has 54 times more cores, 450 times more on-chip memory, 5,788 times more memory bandwidth, and 20,833 times more fabric bandwidth than the leading GPU competitor, Cerebras said.

Above: Cerebras at the Lawrence Livermore National Lab.

Depending on workload, from AI to HPC, the CS-1 delivers hundreds or thousands of times more compute than legacy alternatives, and it does so at a fraction of the power draw and space.

Feldman noted that the CS-1 can run certain simulations faster than real time. That means it can begin simulating a power plant's reaction core at the moment the physical reaction starts and finish the simulation before the reaction itself finishes.

“These dynamic modeling problems have an interesting characteristic,” Feldman said. “They scale poorly across CPU and GPU cores. In the language of the computational scientist, they do not exhibit ‘strong scaling.’ This means that beyond a certain point, adding more processors to a supercomputer does not yield additional performance gains.”
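The limit Feldman describes is often modeled with Amdahl's law: for a fixed-size problem, any serial or communication-bound fraction of the work caps the achievable speedup no matter how many processors are added. A quick sketch (the 5% serial fraction is an illustrative assumption, not a Cerebras figure):

```python
# Amdahl's-law sketch of "strong scaling": speedup from adding
# processors to a fixed-size problem when a fraction of the work
# is serial (or communication-bound) and cannot be parallelized.
# The serial fraction here is made up for illustration.

def amdahl_speedup(n_processors: int, serial_fraction: float) -> float:
    """Ideal speedup of a fixed workload spread over n processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

if __name__ == "__main__":
    for n in (1, 16, 256, 4096, 65536):
        s = amdahl_speedup(n, serial_fraction=0.05)
        print(f"{n:>6} processors -> {s:6.1f}x speedup")
    # With even 5% serial work, speedup plateaus near 20x:
    # past a certain point, more processors yield almost nothing.
```

This is why simply building a bigger CPU/GPU cluster stops helping on these dynamic modeling problems, and why Cerebras argues for putting the whole problem on one tightly coupled wafer instead.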

Cerebras has raised $450 million and it has 275 employees.
