MLPerf Inference v3.0 is out, showcasing new accelerator technology. Highlights include the NVIDIA H100 and L4 GPUs, along with several new accelerators. As always, there is also a huge segment of smaller edge systems.
MLPerf Inference v3.0 Shows New Accelerators
MLPerf Inference has a large number of results. While the NVIDIA H100 made it into this list, and we still see the Qualcomm accelerators, the bigger items are really GPUs like the NVIDIA L4 and L40 that are new to this list. It appears as though the L4 is going to deliver more than 3x the performance of the T4, especially if NVIDIA gets its usual gains from software optimization over the next 6-12 months.
On the H100 side, the Dell PowerEdge XE9680 8x NVIDIA H100 configuration was disappointing, beating the NVIDIA DGX H100 in only four of the 12 benchmark results where the two were submitted head-to-head. Still, there was a LibriSpeech RNN-T speech-to-text result where it outpaced the NVIDIA system by a wide margin, and Dell was using lower-end Intel Xeon Platinum 8470 CPUs instead of the DGX's Platinum 8480C's.
One of the cool results is that the NVIDIA RTX 4090 was submitted, but we did not see submissions for NVIDIA's professional GPUs. NVIDIA is touting the AI capabilities of those cards for things like creative-application AI inference, yet they are not finding their way into MLPerf Inference. That shows just how far the benchmark still has to go.
There were also results for Apple Silicon chips, the AMD Ryzen 9 7950X, and more, thanks to community submissions.
Other interesting submissions show the AI inference performance of both the 4th Generation Intel Xeon "Sapphire Rapids" and AMD EPYC 9004 "Genoa" parts, using the top-bin Xeon Platinum 8480+ and AMD EPYC 9654 CPUs.
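For a rough sense of what these CPU submissions are measuring, here is a minimal offline-style throughput sketch using PyTorch and torchvision's ResNet-50, one of the MLPerf Inference vision workloads. This is not the official MLPerf LoadGen harness; the batch size, iteration counts, and bfloat16 autocast are illustrative assumptions, not compliant settings.

```python
# Minimal offline-style CPU inference throughput sketch (not the MLPerf harness).
# Assumes PyTorch + torchvision are installed; batch size and iteration counts are
# illustrative choices, not MLPerf-compliant settings.
import time
import torch
import torchvision

model = torchvision.models.resnet50(weights=None).eval()
batch = torch.randn(32, 3, 224, 224)  # random data stands in for the real dataset

with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    for _ in range(5):                # warm-up iterations
        model(batch)
    start = time.perf_counter()
    iters = 50
    for _ in range(iters):
        model(batch)
    elapsed = time.perf_counter() - start

print(f"~{iters * batch.shape[0] / elapsed:.1f} samples/sec")
```

On CPUs, the bfloat16 path is where features like Intel AMX and wide vector units show up, which is why the official CPU submissions lean so heavily on low-precision data types.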
We also saw a few newer accelerators like the Moffett cards, with the S10, S30, and S40 parts scaling up to 80GB of onboard memory in this list.
The NeuChips RecAccel N3000, designed specifically for recommender systems, also showed up on the list.
There was even the South Korean Rebellions Atom accelerator in the results.
There were a ton of lower-end platform submissions, and somewhere in the middle were the NVIDIA Jetson Orin NX and Jetson AGX Orin. We just covered the NVIDIA Jetson Orin Nano Developer Kit on STH.
There is also a new "network" division. Right now, it is a throughput benchmark where NVIDIA submitted one InfiniBand + A100 result and Qualcomm submitted three, but it is one we can see becoming a latency benchmark in the future. NVIDIA has a lot of influence in MLPerf, and it is building InfiniBand-attached GPUs and future BlueField-4 DPUs with AI inference, so it feels like that is coming.
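To make the throughput-versus-latency distinction concrete, here is a toy sketch, not MLPerf LoadGen code, showing how the same inference call can be scored two ways: total queries per second in one long offline-style pass, versus a tail-latency percentile over individual queries. The run_inference function and its 2ms delay are placeholders for a real model plus network round trip.

```python
# Toy illustration of throughput vs. latency scoring (not MLPerf LoadGen).
# run_inference() is a placeholder for any model/network inference call.
import time
import statistics

def run_inference(query):
    time.sleep(0.002)  # stand-in for real model + network round-trip time
    return query

queries = list(range(500))

# Throughput view: how many queries per second over one long pass.
start = time.perf_counter()
for q in queries:
    run_inference(q)
throughput = len(queries) / (time.perf_counter() - start)

# Latency view: per-query times, scored at a tail percentile (e.g. p99).
latencies = []
for q in queries:
    t0 = time.perf_counter()
    run_inference(q)
    latencies.append(time.perf_counter() - t0)
p99 = statistics.quantiles(latencies, n=100)[98]

print(f"throughput: {throughput:.0f} queries/sec, p99 latency: {p99 * 1000:.2f} ms")
```

A throughput score rewards keeping the accelerator fully fed; a latency score penalizes every slow outlier, which is exactly where network hops between the host and the accelerator start to matter.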
Final Words
While training has gotten a lot of attention over the years, that market seems to be coalescing around NVIDIA for the general market, Google TPUs for its cloud, perhaps Intel Habana as an alternative, Cerebras for large-scale training where wafer-scale compute is important, and perhaps a few others like Intel’s Xeon as they add AI-focused acceleration to their CPUs. Inference is, by far, the more exciting application. There is going to be a market for AI inference everywhere from sub-1W chips up to 1kW+ chips in the coming years.
MLPerf Inference results are still focused on what NVIDIA, and perhaps Google, see as their strengths. Eventually, results will have to be delineated by power envelope. That is probably where Qualcomm will have more success than with an add-in card for servers.
The L4, in particular, looks really good; both its performance and its power consumption are impressive.