AMD MI250X and Topologies Explained at HC34

AMD MI250X MVM At HC34 Floorplan

At Hot Chips 34, AMD talked a bit more about making its MI250X GPU and the system topologies built around it. We first saw the AMD Instinct MI250X OAM at SC21, and since then the HPE and AMD powered Frontier has topped the Top500, ushering in exascale. This is the GPU and systems architecture powering the #1 Top500 supercomputer today.

Note: We are doing this piece live at HC34 during the presentation so please excuse typos.

AMD MI250X and CDNA 2 HC34

AMD showed the floorplan of the MI250X. This is 58B transistors on TSMC's 6nm process; for comparison, the NVIDIA H100 is 80B transistors on N4. Here we get the key features like the 128GB of HBM2e, the PCIe/ Infinity Fabric links, coherent links to the CPU, and more. Between the two GPU dies on the MI250X there is a 400GB/s link on the OAM package. The PCIe links allow a NIC to be directly attached to each half of the GPU.

AMD MI250X MVM At HC34 Floorplan

Just to keep things in perspective, the MI250X is a single OAM package, but it is effectively two GPUs tied together. When you see “eight stacks of HBM2E” the way to think of this is that each GPU die has four of them. AMD says the total package is 10 chips and uses an elevated fan-out bridge.
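
Since each OAM shows up to software as two devices, here is a minimal HIP sketch (our own illustration, assuming a ROCm system with both GCDs visible) that enumerates the devices and enables peer access over the on-package link:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    hipGetDeviceCount(&count);  // one MI250X OAM reports two GCDs
    printf("Visible GPU devices: %d\n", count);

    // Enable peer access between every visible pair; on a single OAM
    // this covers the on-package Infinity Fabric link between the GCDs.
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int canAccess = 0;
            hipDeviceCanAccessPeer(&canAccess, a, b);
            if (canAccess) {
                hipSetDevice(a);
                hipDeviceEnablePeerAccess(b, 0);  // flags must be 0
                printf("Peer access enabled: GPU %d -> GPU %d\n", a, b);
            }
        }
    }
    return 0;
}
```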

HC34 AMD Instinct MI200 Series Key Innovations

AMD also split its GPU architectures into RDNA for graphics and CDNA for compute. We call the MI250X a GPU, but perhaps it is more of a dedicated compute architecture.

HC34 AMD CDNA Domain Specific

Between the CDNA optimizations and the new improvements in CDNA 2, AMD gets around twice the performance per watt on double precision workloads compared to the MI100 generation.

HC34 AMD CDNA 2 2x Performance Per Watt

With CDNA 2, each compute unit gets a matrix core unit as well. This matrix compute feature is at a high level similar to other matrix math features from companies like NVIDIA and Intel. AMD has a different implementation that was designed to drive efficiency as well as performance.
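
In practice, the matrix cores are usually reached through libraries rather than hand-written kernels. As a rough sketch, assuming ROCm's rocBLAS (which dispatches GEMMs to the matrix units on CDNA 2 hardware), a double-precision GEMM looks something like this:

```cpp
#include <rocblas/rocblas.h>  // older ROCm installs use <rocblas.h>
#include <hip/hip_runtime.h>
#include <vector>

int main() {
    const int n = 1024;                  // square matrices for simplicity
    const double alpha = 1.0, beta = 0.0;

    std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0);
    double *dA, *dB, *dC;
    hipMalloc(&dA, n * n * sizeof(double));
    hipMalloc(&dB, n * n * sizeof(double));
    hipMalloc(&dC, n * n * sizeof(double));
    hipMemcpy(dA, hA.data(), n * n * sizeof(double), hipMemcpyHostToDevice);
    hipMemcpy(dB, hB.data(), n * n * sizeof(double), hipMemcpyHostToDevice);

    rocblas_handle handle;
    rocblas_create_handle(&handle);
    // C = alpha * A * B + beta * C; FP64 GEMMs like this are the kind of
    // work the CDNA 2 matrix cores are built to accelerate
    rocblas_dgemm(handle, rocblas_operation_none, rocblas_operation_none,
                  n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    hipDeviceSynchronize();

    rocblas_destroy_handle(handle);
    hipFree(dA); hipFree(dB); hipFree(dC);
    return 0;
}
```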

HC34 AMD CDNA 2 Compute Unit With Enhanced Matrix Cores

A major implementation detail is how the L2 cache and memory work. Each GCD gets 8MB of L2 cache, and that L2 design is what keeps the compute units fed with data.
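
To make the 8MB figure concrete, here is a quick back-of-the-envelope sketch (our own illustration, not AMD guidance) of how large an FP64 tile fits in a GCD's L2 for a blocked GEMM:

```cpp
#include <cstdio>

int main() {
    const size_t l2_bytes = 8 * 1024 * 1024;   // 8MB L2 per GCD (per the talk)
    const size_t elem = sizeof(double);        // FP64 workloads

    // For a blocked GEMM, three square tiles (A, B, and C blocks) share
    // the L2: 3 * tile^2 * 8 bytes <= 8MB, so grow the tile until it no
    // longer fits.
    size_t tile = 1;
    while (3 * (tile + 1) * (tile + 1) * elem <= l2_bytes) ++tile;
    printf("Max square FP64 tile per GCD L2: %zux%zu\n", tile, tile);
    return 0;
}
```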

HC34 AMD CDNA 2 Memory And Cache Hierarchy

We have seen this before, but AMD says that its new version is faster than its old version. What is also interesting here is that the MI250X is effectively two co-packaged GPUs, so a 2x speedup over the previous generation feels more like having twice the GPUs than having a single faster GPU.

AMD MI250X MVM At HC34 Performance

AMD’s 3rd generation Infinity Architecture looks very cool in the photo, but what it effectively does is map an AMD EPYC 7003 Milan CCD to one of the GPU halves on the MI250/ MI250X.
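
Software typically exploits that CCD-to-GCD mapping by pinning each GPU's host process to the cores of the directly linked CCD. A minimal Linux sketch follows; the core range is hypothetical, since the real mapping comes from the system topology (e.g., lstopo):

```cpp
#include <sched.h>    // cpu_set_t, sched_setaffinity (Linux/glibc)
#include <cstdio>

int main() {
    // Hypothetical mapping: assume GCD 0 is linked to the CCD holding
    // cores 0-7. Query the actual topology on a real system.
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int core = 0; core < 8; ++core)
        CPU_SET(core, &set);

    // Pin this process to the CCD with the coherent link to GCD 0 so
    // host-side work stays on the directly attached Infinity Fabric path.
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("Pinned to cores 0-7 (CCD linked to GCD 0)\n");
    return 0;
}
```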

HC34 AMD Infinity Architecture Gen3

The biggest theme we hear about from AMD is minimizing data movement.
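
One concrete example of minimizing data movement: with a coherent CPU-GPU link, a single managed allocation can be touched by both sides without explicit staging copies. Here is a minimal HIP sketch (on non-coherent systems the runtime falls back to page migration instead):

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void scale(double* x, double s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    double* x = nullptr;
    // One allocation visible to both CPU and GPU; with a coherent
    // CPU-GPU link there is no explicit hipMemcpy staging.
    hipMallocManaged(&x, n * sizeof(double));

    for (int i = 0; i < n; ++i) x[i] = 1.0;        // CPU writes
    hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, 0,
                       x, 2.0, n);                 // GPU reads and writes
    hipDeviceSynchronize();
    printf("x[0] = %f\n", x[0]);                   // CPU reads the result

    hipFree(x);
    return 0;
}
```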

Building Systems with AMD MI250X and MI250 OAM GPU Topologies

This is the HPC topology from AMD that is being used in Frontier. AMD says that the EPYC CPU is customized but has not provided much detail on what those customizations are.

HC34 AMD MI250X HPC Topology

The HPE Cray EX235a is the node powering Frontier and other top modern supercomputers.

HC34 AMD MI250X HPE Cray EX235A

You may have also seen this in our Top 10 Showcases of SC21 piece as well as our How Liquid Cooling is Prototyped and Tested in the CoolIT Liquid Lab Tour. These nodes are heavy!

HPE Cray Shasta Node In The CoolIT Liquid Lab 2

The AMD-HPE blade has the #1 and #3 spots on the Top500 list and leads on the efficiency-focused Green500 as well.

HC34 AMD MI250X HPE Cray EX235a Efficiency

AMD also has a PCIe Gen4 compatible platform using PCIe switches. The PCIe switches allow the NICs to do RDMA network transfers directly to the GPUs.

HC34 AMD MI250 Mainstream MI250 GPU Topology

AMD also has an AMD Instinct MI250 machine learning topology. This has 8x GPUs and 2x EPYC CPUs.

HC34 AMD MI250 AI ML MI250 GPU Topology

Gigabyte has a mainstream topology 2U server called the G262-ZO0.

HC34 AMD MI250 AI ML MI250 GPU Topology Gigabyte G262 ZO0

The Supermicro AS-4124GQ-TNMI is also a mainstream topology MI250 server.

HC34 AMD MI250 AI ML MI250 GPU Topology Supermicro AS 4124GQ TNMI

This is very interesting since they should be in similar power envelopes to the Dell EMC PowerEdge XE8545 and Supermicro SYS-220GQ-TNAR+ Redstone platforms we have looked at.

Final Words

This is a very cool presentation. We have been watching the MI250X and MI250 progress for a few years now, culminating in the GPUs running the #1 Top500 supercomputer. In the background, we are also working on getting some AMD GPU content online. Stay tuned for AMD GPU server reviews that may be our first since the 2015 ASRock Rack server review with 8x AMD FirePro GPUs!

AMD FirePro W9100

AMD has sure come a long way since then!
