Intel Shows GPU Max 1550 Performance and Gaudi3 AI Updates at SC23

Intel SC23 Updated Habana Gaudi3 Slide Memory BW Not Capacity

At SC23, Intel had a number of announcements, but it also took a different approach. Instead of having its performance team show off the performance of its new hardware, it let Argonne National Lab show off performance data using the systems it has access to as a major US supercomputing center.

Intel Shows GPU Max 1550 Performance at SC23

Last year, we were shown the OAM modules for the Intel Data Center GPU Max Series, which are now the GPU Max 1550 series.

Intel SC22 Data Center GPU Max 1350 And 1550 OAM

This year, with Aurora installed, Argonne has performance numbers on portions of its new supercomputer. One way it is showing that is by comparing the Max 1550 against the AMD MI250 and the NVIDIA A100 that it has in other platforms.

Intel SC23 Argonne Intel AMD NVIDIA Comparison 1

It was a bit of a bummer that we did not get to see H100 numbers here. With FP64 performance, however, NVIDIA does not get the same generational scaling it can show on the AI side, where it drops from FP16 on the A100 to the FP8 Transformer Engine on the H100 (and L40S). As a result, the A100 is a better comparison point here than it would be on the AI side.

Intel SC23 Argonne Intel AMD NVIDIA Comparison 2

Aurora is built but still being tuned, so we are not going to get a full-system Top500 submission to crack the #1 spot. Still, that has freed up the capacity to sell the GPUs to other parties and through OAM modules.

Intel SC23 Where To Get HPC Accelerators

We saw a number of these during our Touring the Intel AI Playground: Inside the Intel Developer Cloud piece. We are even using one of the Max 1100 series Supermicro systems from the IDC tour as the YouTube cover image.

Still, a big focus is on AI with Gaudi.

Intel Gaudi2, Gaudi3, and Falcon Shores at SC23

Intel just had a big win with Gaudi2, with NVIDIA’s marketing slide showing that Intel Gaudi2 has 4x better performance per dollar than its H100. We are not going to cover that again, but here is another test from MLPerf Inference. On GPT-J-6B inference, Intel is in the same ballpark as the current NVIDIA H100 generation, with both the Gaudi2 and the Max 1550 series a step beyond the A100.

Intel SC23 Intel Data Center Max GPU 1550 And Gaudi To NVIDIA H100

Intel showed off the 96GB HBM2E Gaudi2 architecture.

Intel SC23 Gaudi2 Overview

Today, NVIDIA H200 with 141GB of HBM3e was launched at SC23. What should be interesting for folks is that Intel will have its own 144GB part in 2024 with Gaudi3. Update 2024-11-15: There was a mistake on the original slide that said 1.5x HBM Capacity. That is now “BW” or Bandwidth. 144GB (96GB x 1.5x) seemed strange in the eight HBM packages shown. Our best guess is that Gaudi3 might be targeting 8x 16GB stacks for 128GB, but it would be exciting if we were incorrect on that. 

Intel SC23 Updated Habana Gaudi3 Slide Memory BW Not Capacity

That could make things really interesting on the AI side. What should give us pause is the Gaudi3 diagram above. It has eight HBM stacks in that photo, so getting 1.5x HBM capacity with 1.33x as many HBM stacks puts us into an interesting capacity point. The networking side is important since, unlike NVIDIA solutions where one generally uses InfiniBand, Gaudi’s approach from the outset was to run over Ethernet and integrate that in the accelerator.
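For those who want to check the memory math, here is a quick sketch of the arithmetic behind the capacity discussion. The 96GB Gaudi2 figure, the eight Gaudi3 stacks, and the 1.5x multiplier come from the slides discussed here. The six-stack Gaudi2 count is our assumption, implied by the 1.33x stack ratio (8 / 6), and the variable names are purely illustrative.

```python
# Back-of-the-envelope HBM math for Gaudi2 vs. Gaudi3.
# Figures from the article: 96GB on Gaudi2, eight stacks pictured on Gaudi3,
# and the "1.5x" multiplier from the original slide.
GAUDI2_CAPACITY_GB = 96
GAUDI3_STACKS = 8
CAPACITY_MULTIPLIER = 1.5

# Assumption: Gaudi2 uses six HBM stacks, implied by the 1.33x stack ratio.
GAUDI2_STACKS = 6

# Capacity per stack on Gaudi2.
gaudi2_gb_per_stack = GAUDI2_CAPACITY_GB / GAUDI2_STACKS

# Ratio of Gaudi3 stacks to Gaudi2 stacks.
stack_ratio = GAUDI3_STACKS / GAUDI2_STACKS

# Scenario A: the slide really meant 1.5x capacity.
capacity_at_1_5x_gb = GAUDI2_CAPACITY_GB * CAPACITY_MULTIPLIER
gb_per_stack_needed = capacity_at_1_5x_gb / GAUDI3_STACKS

# Scenario B: eight stacks at the same per-stack capacity as Gaudi2.
capacity_same_stack_size_gb = GAUDI3_STACKS * gaudi2_gb_per_stack

print(f"Gaudi2 per-stack capacity: {gaudi2_gb_per_stack:g}GB")
print(f"Stack ratio (Gaudi3 / Gaudi2): {stack_ratio:.2f}x")
print(f"1.5x capacity: {capacity_at_1_5x_gb:g}GB, "
      f"or {gb_per_stack_needed:g}GB per stack across eight stacks")
print(f"Eight stacks at Gaudi2 stack size: {capacity_same_stack_size_gb:g}GB")
```

Under these assumptions, a 1.5x capacity bump across eight stacks requires a per-stack size that does not match Gaudi2's, which is why the original capacity figure looked strange to us.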

Intel SC23 Updated Falcon Shores Slide With HBM3e

Something that seems a bit odd in the announcement is that Falcon Shores, the unification of the GPU and AI lines of today, is still HBM3, not HBM3e. It is still some time off in ~2025 so perhaps that will change, but NVIDIA should be past the first-gen HBM3e equipped H200 at that point and AMD is not sitting still. Update 2024-11-15: There was a mistake on the original slide that said HBM3. Intel updated the slide to say HBM3e.

Final Words

Overall, it is a bummer that Aurora is not going to take the #1 spot on the Top500 this week, but it is great to see Intel get more competitive on the accelerator side.

Intel also trumpeted the benefits of the Data Center MAX CPU series, but we already did an Intel Xeon MAX 9480 Deep-Dive with 64GB HBM2e onboard.
