Last week, AMD flew press and analysts from around the world to Los Angeles, only a few blocks from where I lived in law school. At the event, AMD detailed three key architectures that are making it into 2024 products. AMD Zen 5 is the next generation of CPU cores used across most of AMD’s product segments, including servers. AMD RDNA 3.5 is a significantly faster GPU IP block that applies lessons from AMD’s licensing of GPU IP to Samsung for its mobile phones, bringing that efficiency to the new integrated GPU. Finally, XDNA 2 is the Xilinx-derived NPU accelerator. All three of these are AMD’s newest chip building blocks, which we will cover in turn.
AMD Zen 5 – New CPU Cores
AMD is taking a very different approach to the market compared to Intel. While Intel is bifurcating its architecture into performance cores and smaller power-efficient cores with different capabilities, AMD is building a single Zen 5 core in two flavors. Zen 5 is the performance variant with full cache, while Zen 5c is the area-optimized variant with less cache. AMD’s reasoning is twofold. First, having the same ISA on every core makes management easier, both within a single system and across an ecosystem, than having two core architectures. Second, developing a single architecture and scaling it in this manner is much less costly.
AMD is claiming an IPC uplift of 16% across a basket of workloads.
AMD showed this later in its presentation, but here is where Zen 5 gets its generational improvements.
The smaller bucket is the improved fetch and branch prediction capabilities on Zen 5. A good chunk of performance, however, comes from better dual decode pipes and the op cache.
The big bucket is a wider dispatch and execution engine. Going wider is a fairly common technique to get more throughput from the same number of cores.
AMD has also done work on the L1 cache to ensure it keeps the execution units fed. In Zen 5, the L1 and L2 caches are private to each core, and the L3 cache is the shared cache level. Current designs will have separate L3 caches for Zen 5 and Zen 5c cores so that each can hit different performance and power optimization targets.
AMD has a full 512-bit data path for things like AVX-512 instead of “double pumping” a 256-bit path.
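To make the "double pumping" distinction concrete, here is a toy model of how many passes through the data path one vector operation needs. It is only a sketch: the function name is ours, and it ignores scheduling, clock speeds, and everything else that determines real AVX-512 throughput.

```python
import math

def passes_per_vector_op(vector_bits: int, datapath_bits: int) -> int:
    """Toy model: passes through the data path needed to process one vector op."""
    return math.ceil(vector_bits / datapath_bits)

# A 512-bit AVX-512 operation on a 256-bit path is split into two halves.
double_pumped = passes_per_vector_op(512, 256)
# The same operation on a full 512-bit path goes through in one pass.
full_width = passes_per_vector_op(512, 512)

print(f"256-bit path: {double_pumped} passes per 512-bit op")
print(f"512-bit path: {full_width} pass per 512-bit op")
```

In this simplified view, the full-width path halves the number of passes per 512-bit operation, which is where the potential throughput gain comes from.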
Of course, Zen 5 and Zen 5c will underpin the AMD Turin family coming later in 2024. A 16% IPC uplift and 33% more cores mean that a 128-core Zen 5 part might be around 50% faster than the 96-core Genoa, before any other factors come into play.
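The back-of-the-envelope math behind that estimate can be written out in a few lines. This is a minimal sketch, assuming throughput scales linearly with both IPC and core count and that clocks, memory bandwidth, and power limits stay the same; the function name is ours for illustration.

```python
def estimated_speedup(ipc_uplift: float, old_cores: int, new_cores: int) -> float:
    """Naive throughput ratio: per-core IPC gain times the core-count ratio."""
    return (1.0 + ipc_uplift) * (new_cores / old_cores)

# Figures from the article: 16% IPC uplift, 96-core Genoa vs. a 128-core Zen 5 part.
core_ratio = 128 / 96
speedup = estimated_speedup(0.16, old_cores=96, new_cores=128)

print(f"Core-count ratio: {core_ratio:.2f}x")
print(f"Estimated speedup: {speedup:.2f}x")
```

Under those assumptions the product lands at roughly 1.55x, which is in line with the "around 50%" figure. Real results will depend on clocks, memory, and the workload.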
Zen 5 will be a big deal for AMD. AMD is set to lose the server performance crown when Intel rolls out its 128 P-core Granite Rapids-AP line, and our best guess is that Granite Rapids-AP will launch around Intel Innovation in September 2024. AMD will not want to fall that far behind for that long, so we expect Turin to debut in time for Supercomputing 2024 in November. This will be the first time in seven years that AMD and Intel will be at P-core count parity in the data center.
Still, the Los Angeles event was about desktop and mobile, so let us get to the RDNA 3.5 update.
AMD RDNA 3.5 – Updated iGPU Graphics
AMD RDNA 3.5 is a big enough update over the AMD Radeon 780M we have been using for some time that it earns a “.5” version. At the same time, it is not a big enough jump to be called RDNA 4. The big focus was on improving not just performance, but also performance per watt.
AMD says that its RDNA 3.5 parts are 19-32% faster, which is a huge jump. AMD is likely picking very favorable comparison points here.
AMD RDNA 3.5 gets its performance from a number of different areas. One interesting point was that AMD looked at how its phone GPU IP used optimization techniques to streamline memory requests, since memory accesses tend to use a lot of power.
The net result is a more power-efficient GPU, but the headline feature of the AMD Ryzen AI 300 series is really the XDNA 2 NPU, so let us get to that next.
The problem I have is that the desktop version of Zen 5 will use RDNA 2, which really is a shame. My computer is not only a workstation; I do need the CPU power to do compiles. I don’t use the graphics capabilities so much that I would need an additional graphics card. The main problem with RDNA is the use of DisplayPort 1.4.