Intel Xeon MAX 9480 Deep-Dive: 64GB HBM2e Onboard Like a GPU or AI Accelerator

Intel Xeon Max 9480 Power Consumption and Cooling

We managed to get our dual Intel Xeon Max 9480 developer system up to almost 1kW of power consumption at the wall when loaded in a test configuration. At the same time, there is room to go up or down from there depending on configuration.

Intel Xeon Max Dev Platform With DDR5 DIMMs

By far one of the biggest opportunities is HBM2e-only mode. Removing DDR5 from a system cuts anywhere from a few hundred dollars (for 16GB DIMMs) to a few thousand dollars of cost, but it also reduces power consumption. In HBM2e-only mode on a dual-socket server, we saw at least 40W per socket, and often 80-100W total, lower power consumption. Some vendors budget 10W per DIMM, so removing all sixteen DIMMs of a one-DIMM-per-channel dual-socket configuration saves roughly 160W.
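To put rough numbers on that, here is a minimal back-of-the-envelope sketch in C. The 10W-per-DIMM figure is a vendor planning number rather than a measurement, and the sixteen-DIMM count assumes a dual-socket Sapphire Rapids board populated one DIMM per channel, so treat the output as an estimate only.

```c
#include <stdio.h>

int main(void) {
    /* Assumptions: ~10 W per DDR5 DIMM (vendor planning figure),
     * 8 channels per socket, 2 sockets, 1 DIMM per channel. */
    const int sockets = 2;
    const int channels_per_socket = 8;
    const double watts_per_dimm = 10.0;

    int dimms = sockets * channels_per_socket;
    printf("DIMMs removed in HBM2e-only mode: %d\n", dimms);
    printf("Estimated power savings: %.0f W\n", dimms * watts_per_dimm);
    return 0;
}
```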

Intel Aurora Sapphire Rapids And Ponte Vecchio HPE Cray EX Node ISC 2022 2x SPR

There is, however, one use case where this can also shine. In liquid-cooled systems, one can use a cold plate to remove heat from the CPUs. Often, these same systems have DIMM cold plates to cool DDR5 memory. Removing the DIMMs from a system also removes the need to cool them. When running in HBM2e-only mode, the heatsink, or the cold plate in a liquid-cooled system, cools both the CPU and the memory.

Intel Xeon Max No DIMMs 2

That may not seem like a big deal, but folks who do liquid cooling in servers rarely speak fondly of liquid cooling DIMMs. HBM2e-only mode means a significant reduction in overall memory capacity, but also a much easier path to liquid cooling.

Getting Crazy with Intel Xeon MAX

At the launch of the 4th Gen Intel Xeon Scalable "Sapphire Rapids" parts, the chips were realistically still ramping in manufacturing, so Intel likely did not push them as hard as it could have at the time. Intel's initial batch of Xeon MAX was destined for the Aurora supercomputer, which we think is likely to take the #1 spot on the November 2023 Top500 list. The impact of this is that many Xeon server buyers do not know these chips exist, or they think Xeon MAX is an HPC-only part. That is false.

Intel Xeon Max No DIMMs 1

Intel Xeon MAX is still a Xeon, and almost anything runs on Xeon CPUs. We wanted to show a crazy case that we doubt Intel has tested, so we installed Proxmox VE, a popular open-source virtualization, container, Ceph, and clustering solution built upon Debian Linux. It went through the normal installer routine and immediately ran without issue in HBM2e-only mode with no DDR5 installed.

2P Intel Xeon MAX 9480 HBM2e Cache Mode In Proxmox VE Running Ubuntu VM

Above, you can see that not only is the Debian base OS running, but we also have an Ubuntu virtual machine running. Again, Xeon MAX is a drop-in replacement for standard Xeons in many servers.

We then added DDR5 memory back in.

Intel Xeon Max Dev Platform Airflow With DIMMs 1

Here we can see our memory total is up to 256GB of DDR5, with the HBM2e now acting as a transparent cache, because the system is running in cache mode. We did not have to change any BIOS settings. We installed the memory, turned the system on, and it worked.
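One quick way to sanity-check which mode a box booted into is simply to look at the visible memory total. Below is a minimal, Linux-only C sketch that reads /proc/meminfo; the thresholds assume a dual Xeon MAX 9480 system where HBM2e-only mode exposes roughly 128GB (2x 64GB of HBM2e) and cache mode exposes only the DDR5 capacity, so adjust them for other configurations.

```c
#include <stdio.h>

/* Rough check of which memory mode a dual Xeon MAX box booted in,
 * based on total visible memory. Assumption: in cache mode the
 * 128GB of HBM2e is invisible to the OS, so MemTotal reflects DDR5
 * only; in HBM2e-only mode MemTotal is roughly the HBM2e itself. */
int main(void) {
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) { perror("meminfo"); return 1; }

    char line[256];
    long total_kb = 0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "MemTotal: %ld kB", &total_kb) == 1)
            break;
    }
    fclose(f);

    double total_gb = total_kb / (1024.0 * 1024.0);
    printf("MemTotal: %.1f GiB\n", total_gb);
    if (total_gb < 140.0)
        printf("Looks like HBM2e-only mode (no DDR5 visible).\n");
    else
        printf("DDR5 visible; HBM2e likely acting as cache.\n");
    return 0;
}
```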

2P Intel Xeon MAX 9480 HBM2e Cache Mode In Proxmox VE

Having seen a lot of Intel's marketing on the Xeon MAX, this simple fact feels absent from it. Assuming your server supports the higher TDP and Intel Xeon MAX, one can drop the chip into the same server and start experiencing HBM-accelerated Xeon compute without any changes. That is the power of caching mode and even HBM2e-only mode.

Of course, caching mode is more relevant here, but the point is that both caching and HBM2e-only modes worked out of the box as a direct replacement for standard high-end Xeons.
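For a closer look at how each mode presents memory to the operating system, here is a hedged sketch using libnuma (build with -lnuma). The expectation, based on how these modes are described, is that in HBM2e-only mode each socket's NUMA node reports its ~64GB of HBM2e, while in cache mode the nodes report DDR5 capacity and the HBM2e stays invisible; node numbering and counts will vary with BIOS settings such as SNC.

```c
#include <numa.h>
#include <stdio.h>

/* List NUMA nodes and their memory sizes. On a dual Xeon MAX box in
 * HBM2e-only mode, the assumption is that each socket reports ~64GB
 * (its HBM2e); in cache mode, nodes report DDR5 capacity instead.
 * Build with: gcc numa_list.c -lnuma */
int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }
    int max_node = numa_max_node();
    for (int node = 0; node <= max_node; node++) {
        long long free_bytes = 0;
        long long size = numa_node_size64(node, &free_bytes);
        if (size < 0)
            continue; /* node has no memory or is offline */
        printf("node %d: %.1f GiB total, %.1f GiB free\n",
               node, size / 1073741824.0, free_bytes / 1073741824.0);
    }
    return 0;
}
```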

Final Words

Summing this up, the “winged” Intel Xeon MAX processors come with 64GB of HBM2e memory packaged with one 16GB HBM2e stack per compute tile.

Intel Xeon Max Chip 3

Despite the "wings," the processors are drop-in options for many 4th Gen Intel Xeon Scalable sockets. One can run the Xeon MAX either in HBM2e-only mode, where no DDR5 is installed alongside the CPU, or with DDR5 installed to increase overall memory capacity.

Intel Xeon Max Dev Platform 12

For workloads that depend on memory performance, adding HBM2e memory to a socket can increase the performance of the system by a significant amount, whether in traditional HPC workloads, AI workloads, or even in applications not typically discussed alongside these chips. It all comes down to how effectively the HBM2e memory can be used.
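If you want a first-order read on how much bandwidth your code can actually pull, a STREAM-style triad is the usual starting point. Here is a minimal C/OpenMP sketch, not the official STREAM source: array sizes and timing are simplified, and you would want something like gcc -O3 -fopenmp plus arrays far larger than cache for the number to mean anything.

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal STREAM-style triad: a[i] = b[i] + s * c[i].
 * Size the arrays well beyond the on-die caches; 1 GiB each here. */
#define N (1L << 27) /* 128M doubles per array */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    const double s = 3.0;
    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + s * c[i];
    double t1 = omp_get_wtime();

    /* Triad moves three arrays: read b, read c, write a. */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("Triad: %.1f GB/s\n", gbytes / (t1 - t0));

    free(a); free(b); free(c);
    return 0;
}
```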

Intel Xeon Max Dev Platform Angle

Given that these CPUs are options for many servers, and that they can be used transparently via default features like caching mode, they are something we would recommend looking at if you are buying new servers. If you think you might benefit from HBM2e, our best advice is to try Xeon MAX and see how well it works for your application, even if you plan on doing little to no traditional HPC work.

11 COMMENTS

  1. Terabyte per second STREAM is spectacular – this is comparable speed, from a single server, to running STREAM across an entire Altix 3700 with 512 Itanium processors in 2004, and rather faster than the NEC SX-7, which was the last cry of vector supercomputers.

  2. Despite what Intel stated about power states, I’d have at least tried booting the Xeon Max chip on a workstation board. Worth a try, and it would open up a slew of workstation/desktop-style benchmarks. While entirely inappropriate for a chip of this caliber, I’m curious how an HBM2e-only chip would run Starfield, as it has some interesting scaling affected by memory bandwidth and latency. It’d be different to have that HBM2e comparison for the subject.

  3. The OpenFOAM results don’t match between the two plots: one says HBM2e-only is 1.85 times faster and the other says it’s only 1.05 times faster.

  4. Can these be plugged into a normal workstation motherboard socket? As in, in a few years when these come on the market and mortals can buy them off of eBay, we want to play with them in normal motherboards with normal air cooling solutions.

  5. I had no idea that they’re able to run virtualization. I remember that I’d seen them at launch, but I was under the impression that they were only for HPC and lacked virtualization acceleration because of it. We’re not a big IT outfit, only buying around 1,000 servers/year, but we’re going to check this out. Even at our scale it could be useful.

  6. Is that a real Proxmox VE pic? I didn’t think these could run virtual machines. Why didn’t Intel just call these out as an option, if so? That 32-core 64GB part sounds chill.

  7. It’s possible virtualization is not an advertised feature because there are too many information-leaking side channels.

    At any rate, as demonstrated by the Fujitsu A64FX a couple years ago, placing HBM on the CPU package makes GPUs unnecessary and is easier to program. After the technology has been monetised at the high end, I expect on-package HBM will be cheaper than GPU acceleration as well.

  8. Thank god there’s a good review of this tech that normal people can understand. This is the right level, STH. I’m finally understanding this tech after years of hearing about it.

  9. That STREAM benchmark result is impressive.

    My 4GHz 16-core desktop computer copies values between double arrays at 58GB/s, according to my STREAM build with MSVC, and I consider that pretty decent, because it works out to roughly 15 bytes per CPU clock cycle.

    The Intel compiler should optimize STREAM’s double-array copy loop with very efficient SIMD instructions.
