Intel Xeon Max 9480 Power Consumption and Cooling
We managed to get our dual Intel Xeon Max 9480 developer system up to almost 1kW of power consumption at the wall when we had it loaded in a test configuration. There is, however, room to go both higher and lower from there depending on configuration.
By far one of the biggest opportunities is HBM2e-only mode. Removing DDR5 from a system can cut anywhere from a few hundred dollars (16GB DIMMs) to a few thousand dollars of cost, but it also reduces power consumption. On a dual-socket server, we saw power consumption drop by at least 40W per socket, and often by 80-100W, in HBM2e-only mode. Some vendors budget 10W per DIMM, which works out to 160W of savings from removing the memory.
There is, however, one use case where this mode can really shine. In liquid-cooled systems, one can use a cold plate to remove heat from the CPUs. Often, these same systems have DIMM cold plates to cool DDR5 memory. Removing the DIMMs from a system also removes the need to cool them. When running in HBM2e-only mode, the heatsink, or the cold plate in a liquid-cooled system, cools both the CPU and the memory.
That may not seem like a big deal, but folks who do liquid cooling in servers rarely speak fondly of liquid-cooling DIMMs. HBM2e-only mode means a significant reduction in overall memory capacity, but also a much easier path to liquid cooling.
Getting Crazy with Intel Xeon MAX
At the launch of the 4th Gen Intel Xeon Scalable Sapphire Rapids, the chips were realistically still ramping in manufacturing, so Intel likely did not push them as hard as it could have at the time. Intel’s initial batch of Xeon MAX was destined for the Aurora supercomputer, which we think is likely to take the #1 spot on the November 2023 Top500 list. The result is that many Xeon server buyers do not know these parts exist, or they think Xeon MAX is an HPC-only part. That is false.
Intel Xeon MAX is still a Xeon, and almost anything runs on Xeon CPUs. We wanted to show a crazy case that we doubt Intel has tested, so we installed Proxmox VE, a popular open-source virtualization, container, Ceph, and clustering solution built upon Debian Linux. It went through the normal installer routine immediately and ran without issue in HBM2e-only mode with no DDR5 installed.
Above, you can see that not only is the Debian base OS running, but we also have an Ubuntu virtual machine running. Again, Xeon MAX is a drop-in replacement for Xeon in many servers.
We then added DDR5 memory back in.
Here we can see our memory total is up to 256GB because the system is running in cache mode. We did not have to change any BIOS settings. We installed memory, turned the system on, and it was working.
Having seen a lot of Intel’s marketing on the Xeon MAX, we feel this simple fact has been absent. Assuming your server supports Intel Xeon MAX and its higher TDP, one can drop it into the same server and start getting HBM-accelerated Xeon compute without any other changes. That is the power of caching mode and even HBM2e-only mode.
Of course, caching mode is more relevant here, but the point is that both caching and HBM2e-only modes worked out of the box as a direct replacement for standard high-end Xeons.
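If you want to confirm which mode a system actually booted in, per-NUMA-node memory totals are a quick tell: in caching mode only the DDR5 capacity is typically reported as node memory, while in HBM2e-only or flat mode the HBM2e shows up directly. The short C sketch below is just an illustration (not anything Intel ships) that reads the standard Linux sysfs layout; numactl --hardware reports the same information.

```c
/* Minimal sketch: print per-NUMA-node memory totals from Linux sysfs.
 * Assumption: the standard /sys/devices/system/node/nodeN/meminfo layout.
 * In caching mode only the DDR5 capacity typically appears as node memory;
 * in HBM2e-only or flat mode the HBM2e shows up here directly. */
#include <stdio.h>

int main(void) {
    for (int node = 0; ; node++) {
        char path[80];
        snprintf(path, sizeof path, "/sys/devices/system/node/node%d/meminfo", node);
        FILE *f = fopen(path, "r");
        if (!f)
            break; /* no more NUMA nodes */

        char line[256];
        while (fgets(line, sizeof line, f)) {
            int n;
            long long kib;
            /* Lines look like: "Node 0 MemTotal:  263856224 kB" */
            if (sscanf(line, "Node %d MemTotal: %lld kB", &n, &kib) == 2) {
                printf("node %d: %.1f GiB\n", n, kib / (1024.0 * 1024.0));
                break;
            }
        }
        fclose(f);
    }
    return 0;
}
```

On a dual-socket Xeon MAX box in HBM2e-only mode, one would expect roughly 64GB per socket reported here; with DDR5 installed in caching mode, only the DIMM capacity (256GB in our case) shows up.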
Final Words
Summing this up, the “winged” Intel Xeon MAX processors come with 64GB of HBM2e memory, packaged as one 16GB HBM2e stack per compute tile across the four tiles.
Despite the “wings,” the processors are drop-in options for many 4th Gen Intel Xeon Scalable sockets. One has the option to run the Xeon MAX either in HBM2e-only mode, where no DDR5 is installed alongside the CPU, or with DDR5 to increase overall memory capacity.
For workloads that depend on memory performance, adding HBM2e memory to a socket can increase the performance of the system by a significant amount, whether in traditional HPC workloads, AI workloads, or even in applications not typically discussed alongside these chips. It all comes down to how effectively the HBM2e memory can be used.
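To make “depends on memory performance” concrete, the STREAM-style triad below is the kind of loop where the cores spend most of their time waiting on memory, so feeding it from HBM2e rather than DDR5 is what moves the needle. This is a minimal sketch for illustration, not the official STREAM benchmark.

```c
/* STREAM-style triad sketch (illustration only, not the official benchmark).
 * Bandwidth-bound: each iteration moves 24 bytes for one multiply-add,
 * so throughput is set by the memory system rather than the cores. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1u << 27) /* ~134M doubles per array, ~1GB each: far larger than cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    /* Initialize in parallel so pages land on the nodes that will use them */
    #pragma omp parallel for
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i]; /* triad: 2 reads + 1 write per element */
    double t1 = omp_get_wtime();

    /* Count 3 arrays x 8 bytes per element moving across the memory bus */
    double gbytes = 3.0 * N * sizeof(double) / 1e9;
    printf("Triad: %.1f GB/s\n", gbytes / (t1 - t0));

    free(a); free(b); free(c);
    return 0;
}
```

Compiled with something like gcc -O3 -fopenmp -march=native, and run with the memory bound to different node types (for example via numactl --membind), a loop like this shows how much of a workload is really bandwidth-limited.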
Given that these CPUs are options for many servers, and that they can be used transparently via default features like caching mode, they are worth looking at if you are buying new servers. If you think you might benefit from HBM2e, our best advice is to see if you can try Xeon MAX and find out how well it works for your application, even if you plan on doing little to no traditional HPC work.
Terabyte per second STREAM is spectacular – a single server is now comparable to running STREAM across an entire Altix 3700 with 512 Itanium processors in 2004, and rather faster than the NEC SX-7, which was the last cry of the vector supercomputers.
Thanks for the power state info – I was wondering about 14 core/16GB HBM/dual memory consumer version. Oh well!
Despite what Intel stated about power states, I’d have at least tried booting the Xeon Max chip on a workstation board. Worth a try, and it would open up a slew of workstation/desktop style benchmarks. While entirely inappropriate for a chip of this caliber, I’m curious how an HBM2e-only chip would run Starfield, as it has some interesting scaling affected by memory bandwidth and latency. It’d be different to have that HBM2e comparison for the subject.
The OpenFOAM results don’t match between the two plots, where one says HBM2e-only is 1.85 times faster and the other says it’s only 1.05 times faster.
Can these be plugged into a normal workstation motherboard socket? As in, in a few years when these come on the market and mortals can buy them off of eBay, we want to play with them in normal motherboards with normal air cooling solutions.
I had no idea that they’re able to run virtualization. I remember that I’d seen them at launch, but I was under the impression that they’re only for HPC and that virtualization and acceleration had been left out because of it. We’re not a big IT outfit, only buying around 1,000 servers/year, but we’re going to check this out. Even at our scale it could be useful.
@Todd, Shhhhh! Quiet! Lest Intel hear you and fuse off the functionality as they used to do…
Is that a real Proxmox VE pic? I didn’t think these could run virtual machines. Why didn’t Intel just call these an option if so? That 32c 64GB part sounds chill.
It’s possible virtualization is not an advertised feature because there are too many information-leaking side channels.
At any rate, as demonstrated by the Fujitsu A64FX a couple years ago, placing HBM on the CPU package makes GPUs unnecessary and is easier to program. After the technology has been monetised at the high end, I expect on-package HBM will be cheaper than GPU acceleration as well.
Thank god there’s a good review of this tech that normal people can understand. This is the right level STH. I’m finally understanding this tech after years of hearing about it.
That STREAM benchmark result is impressive.
My 4GHz 16-core desktop computer copies values of double arrays at 58GB/sec, according to my STREAM build with MSVC, and I consider that pretty decent because it copies about 15 bytes per CPU clock cycle.
The Intel compiler should optimize STREAM’s double-array copy loop with very efficient SIMD instructions.