Last week, we covered the AMD Instinct MI200 and AMD's claim that it offers up to a 4.9x speedup over the NVIDIA A100 in some FP64 workloads. This week is Supercomputing 2021 (SC21), and STH is on-site, so here is a look at the AMD Instinct MI250X.
Behold the AMD Instinct MI250X OAM at SC21
AMD’s booth at the show looked like many others: an open space with a few chairs. Instead of a large display of EPYC and GPU systems, AMD had printed banners in the background, branding for the roughly 3,000 attendees (for scale, SC19 was closer to 14,000). Relatively bare booths with printed banners seem common among companies that did not send large delegations because of the pandemic.
What set AMD’s booth apart was an HPE Cray EX235a node on full display. The node pairs AMD EPYC CPUs with the new MI200 series GPUs/accelerators and also includes the HPE Slingshot interconnect.
HPE’s placard identified this as the AMD Instinct MI250X, so that is what we are tagging it as. There are two main compute dies, each surrounded by four HBM stacks.
In AMD’s architecture, each of these GPU dies has an interconnect to a CPU die (CCD) on the EPYC CPU. In total, the dual-GPU package carries two compute dies and eight HBM stacks.
Liquid cooling plates cover all of the main components of the OAM GPUs. The PCBs for the Slingshot interface are also visible on the CPU side.
From what we saw, the tray has two CPUs, each paired with four OAM accelerators.
Final Words
AMD and HPE did not disclose which EPYC CPU is in this node, so we do not know whether it is Trento or something else. All we can see is that it is still an 8-channel memory CPU (based on the RAM coolers), so it is likely a Milan-era design.
There is not a lot of hardware here at SC21, so expect our coverage to be muted compared to a normal year. Not having Dell, Lenovo, Intel, AMD (really), NVIDIA, Arm, and others here is taking its toll. The companies that did come generally have much sparser displays than in a normal year, so it is hard to find much HPE/Cray hardware even though HPE is here.
Also, a quick note: Ian Cutress at AnandTech asked for these photos, so I am sending them over, and you may see them there soon as well.
OK. Good. It is water cooled. I can put a rack in the basement and rig up cool water from the domestic supply. It won’t sound quite like a stack of jet engines. Start training models that will revolutionize the world.
Now. Can I buy these things, and where?
emerth: probably not those, or not yet, but water-cooled servers are usually sold by Fujitsu and Supermicro, and both also serve small markets…
It is quite common in HPC though.
For example, the Swiss supercomputer pumps water directly from Lake Lugano.
In that case the cooling is almost for free.
Necro update. The processor is the 7713 per the TOP 500 list.
Since the heyday of Egenera, SGI, and Cray, and except for the radicals at GRCooling, this is one of the finest hardware layouts I’ve seen in years. I expect nothing less than thoughtful and thorough hardware and software design from anything wearing the Cray badge. I wish OCP gear was as nice.