Computing power: Nvidia, Dell and Qualcomm are moving upmarket on AI tasks

Graphical representation of AI tasks. Image: Nvidia.

Last Tuesday, MLCommons, the consortium that runs the MLPerf tests, presented the latest benchmark of how fast a neural network can run to make predictions. Major vendors such as Nvidia, Dell, Qualcomm, and Supermicro submitted computer systems with different chip configurations to determine which systems would earn top marks.

The test ranks systems on three metrics: the number of queries they can answer per second (throughput), the time it takes to return a response (latency), and the power they draw while doing so (energy efficiency).

A group of startups also participated, including Neural Magic, xFusion, cTuning, Nettrix, Neuchips, Moffett, and Krai.

MLPerf Inference 3.0

Dubbed “MLPerf Inference 3.0,” the test emulates the computational operations that occur when a trained neural network receives new data and must output conclusions. The benchmark measures how quickly a computer can produce an answer for a number of tasks, including ImageNet, where the challenge is for the neural network to assign a label describing the object in a photo, such as a cat or a dog.
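The metrics an inference benchmark reports can be sketched in a few lines of code. This is a minimal illustration, not the MLPerf harness: the workload function is a hypothetical stand-in for a trained network's forward pass, and the query count is arbitrary.

```python
import time

def fake_model(query):
    # Hypothetical stand-in for a neural network's forward pass.
    return sum(i * i for i in range(10_000))

latencies = []
start = time.perf_counter()
for q in range(200):
    t0 = time.perf_counter()
    fake_model(q)
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

throughput = len(latencies) / elapsed  # queries answered per second
p99_latency = sorted(latencies)[int(0.99 * len(latencies)) - 1]
print(f"throughput: {throughput:.0f} qps, p99 latency: {p99_latency * 1000:.2f} ms")
```

MLPerf's actual scenarios (server, offline, single-stream) differ in how queries arrive, but the quantities reported come down to the same two measurements: queries per second and a latency percentile.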

The test results follow those of MLPerf Inference 2.1, presented in September 2022.

In a press release, MLCommons notes that results submitted by multiple vendors show “significant performance gains of over 60% in some benchmark tests.”

More organizations are embracing benchmark testing

The results relate to computer systems operating in data centers or in the context of edge computing, a term that now encompasses a variety of computer systems other than traditional data center machines. A spreadsheet lists the results.

The results show that more and more organizations are participating in the benchmark tests. According to MLCommons, “a record number of 25 organizations” submitted “more than 6,700 performance results and more than 2,400 performance and energy efficiency metrics.” That is up from 5,300 performance results and 2,400 energy efficiency measurements in September.

Submissions are grouped into two categories: “closed” submissions and “open” submissions.

  • In the first category, submitters follow strict rules governing the AI software used, which allows the most direct comparison of systems on an equal footing.
  • In the second, submitters may use unique software approaches that do not conform to the standard benchmark rules, leaving room to showcase innovation.

Nvidia always in the spotlight

As is often the case, Nvidia, the main supplier of GPUs used for AI, won numerous honors for its performance in most tests. Nvidia’s system running two Intel Xeon processors and eight “Hopper” GPU chips took first place in five of six benchmark tasks, including running the Google BERT language model, a precursor to ChatGPT. In the sixth task, a Dell system using an almost identical configuration of Intel and Nvidia chips took the top spot.

For more on Nvidia’s results, check out the company’s blog.

Qualcomm achieved a three-fold increase in query throughput for the BERT language program compared to its results in the 2.1 round, the company said. A system submitted by Qualcomm using two AMD EPYC server chips and 18 “AI100” AI accelerator chips earned the highest score in the open category of data center computers on the BERT task. Its result, a throughput of 53,024 requests per second on the BERT network, is only slightly lower than Nvidia’s first-place result in the closed category.

cTuning gets first place for lowest latency

Among the new entrants was cTuning, a Paris-based nonprofit that develops open-source tools for AI programmers to replicate benchmark test results across different hardware platforms.

cTuning took first place for the lowest latency, meaning the shortest time between submitting a request and receiving the response, for four of the five tasks in the edge computing benchmark, in the closed category.

Neural Magic, a startup co-founded by MIT researcher Nir Shavit, has once again put to use its software that can determine which “neural weights” of a neural network can be left unused, so that the computer chip skips them entirely, saving compute.
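The idea of leaving weights unused can be illustrated with a simple magnitude-pruning sketch. This is a generic technique, not Neural Magic's actual algorithm, and the matrix here is random toy data: weights whose magnitude falls below a threshold are zeroed, and every zeroed weight is a multiply-add the chip never has to perform.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256))  # toy weight matrix

# Zero out the 90% of weights with the smallest magnitude.
sparsity = 0.90
threshold = np.quantile(np.abs(weights), sparsity)
pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)

kept = np.count_nonzero(pruned) / pruned.size
print(f"fraction of weights kept: {kept:.2%}")
```

Sparse-aware runtimes exploit exactly this structure: with roughly 10% of the weights kept, about 90% of the arithmetic in that layer can be skipped, which is what makes CPU-only inference competitive.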

x86 gains ground on GPUs

The company’s DeepSparse software can run on the host CPU alone, an x86 chip from Intel or AMD (and, in the future, ARM-based chips), without any help from Nvidia’s GPUs.

In the BERT language test in the open category for edge computing, Neural Magic’s DeepSparse software used two AMD EPYC server processors to get 5,578 responses per second from the Google BERT neural network. This result is only slightly lower than the second place obtained by Supermicro’s computer in the closed category, which was composed of two Xeon processors and an Nvidia Hopper GPU.

The company says that relying on x86 chips rather than more expensive GPUs will help spread AI to more businesses and institutions by reducing the overall cost of running programs.

“You can get a lot more out of this consumer hardware,” Michael Goin, product engineering manager at Neural Magic, said in an interview with ZDNET. “These are the same AMD chips that companies are already using in their store or retail site to manage sales, inventory and logistics.”

To learn more about Neural Magic’s approach, see the company’s blog post.

To go further on MLPerf

AI: Google and Nvidia dominate the machine learning market
