AI: In the latest performance test, Nvidia ousts the competition


Chip giant Nvidia was already casting a long shadow over the world of artificial intelligence. And yet, its ability to squeeze competition out of the market could grow, if the latest benchmark test results are to be believed.

MLCommons, the industry consortium that oversees a popular machine-learning performance test, MLPerf, released its latest numbers on Wednesday for the “training” of artificial neural networks. For the first time in three years, Nvidia faced only one competitor: processor giant Intel.

In previous rounds, including the most recent one in June, Nvidia faced two or more competitors. Among them were Intel, of course, but also Google, with its “Tensor Processing Unit” (TPU) chip, and the British start-up Graphcore. In the past, Chinese telecommunications giant Huawei also took part.

Nvidia grabs the high scores

With the competition largely absent, Nvidia this time swept all the top scores, whereas in June the company had shared first place with Google.

Nvidia showed off systems using its A100 GPU, which has been out for several years, as well as its newest H100, known as the “Hopper” GPU – in honor of computing pioneer Grace Hopper. The H100 received the highest score in one of eight benchmark tests, for so-called recommender systems commonly used to suggest products on the web.

Intel offered two systems using its Habana Gaudi2 chips, as well as systems labeled “preview” that featured its upcoming Xeon server chip, codenamed “Sapphire Rapids.” Intel’s systems proved to be much slower than Nvidia’s.

World records

“H100 (aka Hopper) GPUs set world records for training models in all eight MLPerf enterprise workloads. They delivered up to 6.7x more performance than previous-generation GPUs when those were first submitted for MLPerf training. By the same comparison, today’s A100 GPUs deliver 2.5 times more performance, thanks to software advances,” Nvidia said in a press release.

Dave Salvator, Nvidia’s senior product manager for AI and cloud, walked reporters through Hopper’s performance gains and the software tweaks made to the A100, showing both how much Hopper speeds things up compared with the A100 – a test of Nvidia against Nvidia, in other words – and how it outpaces Intel’s Gaudi2 chips and Sapphire Rapids at the same time.

Google and Graphcore absent from the competition

While the absence of other vendors is noteworthy this time around, it is not in itself the sign of a trend. Indeed, in previous rounds of MLPerf, individual vendors had already sat out the competition, only to return in a later round.

Asked by ZDNet about its non-participation in this round, Google declined to comment.

Graphcore told ZDNet that it had decided to spend its engineers’ time on things other than the weeks or months it takes to prepare submissions for MLPerf.

“The issue of diminishing returns has been raised,” Graphcore communications manager Iain McKenzie told ZDNet, “in the sense that there is an inevitable leapfrogging ad infinitum, with ever-lower times in seconds and ever-bigger system configurations being put forward.”

Graphcore “may participate in future rounds of MLPerf, but at this time it does not reflect the areas of AI where we see the most exciting progress,” he adds. Instead, “we really want to focus our energies” on “unlocking new capabilities for AI practitioners,” he says. To that end, the communications chief says, “you can expect to see some exciting progress soon” from Graphcore.

Different levels of performance

Besides Nvidia’s chips dominating the competition, all of the top-scoring computer systems were built by Nvidia itself rather than by its partners. This is also a change from previous editions of the benchmark. Usually, certain vendors, such as Dell, earn top marks for systems they have built with Nvidia chips. This time around, no system vendor managed to beat Nvidia using Nvidia’s own chips.

MLPerf’s training tests report the number of minutes it takes to adjust the “weights”, or neural parameters, of a neural network until the program reaches a required minimum accuracy on a given task, a process called “training”; a shorter time is better.
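To make that metric concrete, here is a minimal, purely illustrative Python sketch (not MLPerf code): it trains a toy logistic-regression classifier on synthetic data and records how long it takes to reach an accuracy threshold, which is the general shape of the time-to-train measurement MLPerf reports. The toy model, the synthetic data, the 0.95 accuracy target and the learning rate are all hypothetical choices.

import time

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two well-separated Gaussian blobs, one per class.
X = np.vstack([rng.normal(-2.0, 1.0, (500, 2)), rng.normal(2.0, 1.0, (500, 2))])
y = np.concatenate([np.zeros(500), np.ones(500)])

w, b = np.zeros(2), 0.0
target_accuracy = 0.95   # the minimum quality the run must reach
learning_rate = 0.1

start = time.perf_counter()
for epoch in range(1, 1001):
    # One full-batch gradient step of logistic regression.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= learning_rate * (X.T @ (p - y)) / len(y)
    b -= learning_rate * np.mean(p - y)

    # Stop the clock as soon as the quality threshold is met.
    accuracy = np.mean(((X @ w + b) > 0) == y)
    if accuracy >= target_accuracy:
        break

elapsed = time.perf_counter() - start
print(f"Reached {accuracy:.3f} accuracy after {epoch} epochs in {elapsed:.4f} s")

Real MLPerf submissions apply the same time-to-quality idea to full-scale workloads such as ResNet-50 image classification, with strict rules on the target quality and on how the clock is started and stopped.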

Although top scores often grab the headlines – and are hyped by vendors – in reality, MLPerf results span a wide variety of systems and a wide range of scores, not just a single score designating the best product.

Beyond the high scores

David Kanter, the executive director of MLCommons, told ZDNet that readers should not focus only on the high scores. In his view, the value of the test suite for companies considering buying AI hardware lies in offering a wide range of systems of different sizes and different levels of performance.

The entries, which number in the hundreds, range from machines with a few ordinary microprocessors to machines with thousands of AMD host processors and thousands of Nvidia GPUs, the type of systems that score the best.

“When it comes to ML training and inference, there is a wide variety of needs for all different levels of performance,” David Kanter tells ZDNet, “and part of the goal is to provide performance metrics that can be used at all these different scales.”

“There is as much value in information about some of the smaller systems as there is in larger-scale systems,” he adds. “These systems are all equally relevant and important, but perhaps for different people.”

Reacting to the absence of Graphcore and Google this time around, David Kanter says he would like to “see more submissions.” “I understand that for many companies there may be a choice to be made about the investment of engineering resources,” he acknowledges nonetheless.

“I think you’ll see these things come and go over time in different cycles” of the benchmark, he adds.

Scores that sometimes regress

The scarcity of competition for Nvidia produced an interesting side effect: the top scores for some training tasks did not necessarily improve on the previous round; in some cases they actually regressed.

For example, in the venerable ImageNet task, where a neural network is trained to assign classification labels to millions of images, the best result this time around matched the third-place finish from June: a system built by Nvidia that took 19 seconds to train. In June, that result had been beaten by systems based on Google’s TPU chips, which took only 11.5 seconds and 14 seconds.

Asked about resubmitting a previous result, Nvidia explained that its focus this time is on the H100 chip, not the A100. Nvidia also points to the progress made since its very first MLPerf results in 2018: in that round, an eight-way Nvidia system took almost 40 minutes to train ResNet-50. In this week’s results, that time had been reduced to less than 30 minutes.

Setting standards

Asked about the lack of competing submissions and about MLPerf’s continued viability, Dave Salvator told reporters, “That’s a good, fair question,” before responding that the company does “everything [it can] to encourage participation; industry benchmarks thrive on participation.”

“We hope that as new solutions continue to be released by others, they will want to show the benefits and quality of those solutions in an industry-standard benchmark, rather than putting forward their own one-off performance claims, which are very difficult to verify,” he says.

According to him, one of the key elements of MLPerf is the rigorous publication of test configurations and code, so that the results are clear and consistent across the hundreds of submissions from dozens of companies.

Along with the MLPerf training results, MLCommons on Wednesday also released results for its supercomputing benchmark, covering scientific computing on supercomputers. Those submissions included a mix of systems from Nvidia and its partners, as well as Fujitsu’s Fugaku supercomputer, which uses Fujitsu’s own chips.

TinyML: GreenWaves from Grenoble stands out

A third competition, called TinyML, measures the performance of low-power and embedded chips on inference, the part of machine learning where a trained neural network makes predictions.

This competition, which Nvidia has so far not entered, features an interesting diversity of chips and entries from vendors such as chipmakers Silicon Labs and Qualcomm, European tech giant STMicroelectronics, and start-ups OctoML, Syntiant and GreenWaves Technologies.

In one of the TinyML tests, an image-recognition test using the CIFAR dataset and the ResNet neural network, GreenWaves (headquartered in Grenoble) achieved the lowest latency in processing the data and producing a prediction. The company submitted its Gap9 artificial-intelligence accelerator paired with a RISC processor.

According to GreenWaves, Gap9 “allows extraordinarily low power consumption on neural networks of medium complexity such as the MobileNet series in classification and detection tasks, but also on complex mixed-precision recurrent neural networks.”
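To make the latency metric concrete, here is a minimal, purely illustrative Python sketch (not the MLPerf Tiny harness): it times single forward passes through a toy classifier and reports the median, which is the kind of per-inference latency the TinyML results measure. The toy two-layer model, its shapes and the number of repetitions are hypothetical.

import time

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy two-layer network standing in for an image classifier
# (a flattened 32x32x3 input and 10 classes, as in CIFAR-10).
W1 = rng.normal(size=(3072, 64)).astype(np.float32)
W2 = rng.normal(size=(64, 10)).astype(np.float32)

def predict(x):
    hidden = np.maximum(x @ W1, 0.0)    # ReLU hidden layer
    return int(np.argmax(hidden @ W2))  # predicted class index

x = rng.normal(size=(3072,)).astype(np.float32)

predict(x)  # warm up once before timing
latencies = []
for _ in range(1000):
    start = time.perf_counter()
    predict(x)
    latencies.append(time.perf_counter() - start)

print(f"Median single-inference latency: {np.median(latencies) * 1e6:.1f} µs")

In the benchmark itself, this kind of measurement is taken on the embedded device, and a lower latency is better.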

Source: ZDNet.com