At the AMD AI event where we featured the EPYC 9005 Turin CPU, AMD also launched a new AI GPU, the AMD Instinct MI325X. This accelerator is an upgrade over the MI300X, but it has also changed a bit since we last saw it. AMD also discussed its mid-2025 accelerator, the AMD Instinct MI355X.
AMD Instinct MI355X Announced and MI325X Launched
First off, here is the OAM packaged MI325X with its air-cooled heatsink.
It is wild to think that STH was showing similar giant OAM heatsinks back in March 2019 and now they are commonplace.
Underneath, we can see that the OAM module is labeled as the MI325X IFX OAM. We can also see this is using HBM3E memory.
In terms of specs, AMD said it is getting some benefit from tuning core frequencies, but the swap to HBM3E is the bigger change. AMD now has 256GB of HBM3E onboard, far more than the NVIDIA H200's 141GB, along with a bit more memory bandwidth.
Packaged in the OAM module, the GPUs are not really meant to be deployed one at a time. Instead, they are designed to be installed in the Universal Baseboard or UBB platform with eight of the accelerators.
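As a rough sketch of why that capacity matters, here is a back-of-envelope calculation using the 256GB and 141GB figures above. The bytes-per-parameter value is a generic FP16/BF16 assumption, not AMD or NVIDIA guidance, and the estimate is weights-only:

```python
# Back-of-envelope: how many model parameters fit in HBM at a given precision.
# Per-GPU memory figures are from the text above. This ignores KV cache,
# activations, and framework overhead, so real-world limits are lower.

GB = 1024**3

def max_params_billions(memory_gb: float, bytes_per_param: float) -> float:
    """Approximate parameter count (in billions) that fits in memory_gb."""
    return memory_gb * GB / bytes_per_param / 1e9

mi325x_gb = 256      # AMD Instinct MI325X, HBM3E
h200_gb = 141        # NVIDIA H200
ubb_gpus = 8         # eight MI325X OAMs on a UBB

# FP16/BF16 weights take 2 bytes per parameter
print(f"MI325X, FP16:        ~{max_params_billions(mi325x_gb, 2):.0f}B params")
print(f"H200, FP16:          ~{max_params_billions(h200_gb, 2):.0f}B params")
print(f"8x MI325X UBB, FP16: ~{max_params_billions(mi325x_gb * ubb_gpus, 2):.0f}B params")
```

The roughly 137B versus 76B per-GPU figures illustrate the pitch: models that need two H200s for weights alone can fit on a single MI325X.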
This quarter, the MI325X's key competitor is the NVIDIA H200, and AMD is showing it can compete here. Of course, NVIDIA is also working to ramp Blackwell.
Next year, AMD plans to launch the AMD Instinct MI350 series with FP4 and FP6 support, built on a 3nm-class process.
AMD plans to use the CDNA 4 architecture to achieve a speedup on the 2025 parts.
Between all of these advancements, AMD is claiming a 35x performance increase for its next generation.
In 2026, AMD will have a next-gen architecture with the MI400 series.
Our sense is that NVIDIA will have an update in 2026 as well.
Final Words
Looking forward to 2026, the AMD Instinct MI400 will be the big architectural overhaul. What is cool is that we are in a full-scale GPU arms race in the industry. AMD's AI accelerator sales are significantly behind NVIDIA's but also significantly ahead of Intel's and those of many AI startups. AMD currently seems to be selling its GPUs on high memory capacity. With around 80% more memory capacity than the NVIDIA H200, AMD can fit larger models on single GPUs and in single systems, and staying on a single package has many performance, power, and cost implications for inference at scale.