Architecture Trifecta AMD Zen 5 RDNA 3.5 and XDNA 2

AMD XDNA Spatial Architecture With Flexible Dataflow

Last week, AMD flew global press and analysts to Los Angeles, only a few blocks from where I lived in law school. At the event, AMD detailed three key architectures that are making it into 2024 products. AMD Zen 5 is the next generation of CPU cores used across most AMD segments, including servers. AMD RDNA 3.5 is a significantly faster GPU IP block that takes learnings from AMD’s licensing of GPU IP to Samsung for its mobile phones and brings that efficiency to the new integrated GPU. Finally, XDNA 2 is the Xilinx-derived NPU. All three are AMD’s newest chip building blocks, which we will cover in turn.

AMD Zen 5 – New CPU Cores

AMD is taking a very different approach to the market compared to Intel. While Intel is bifurcating its lineup into performance cores and small power-efficient cores with different capabilities, AMD is building one Zen 5 core in two flavors. Zen 5 is the performance variant with full cache, while Zen 5c is the area-optimized variant with less cache. AMD’s reasoning is twofold. First, having the same ISA on every core makes management easier, both within a single system and across an ecosystem, than having two core architectures. Second, developing a single architecture and scaling it in this manner is much less costly.

AMD Zen 5 Roadmap

AMD is claiming an IPC uplift of 16% across a basket of workloads.

AMD Zen 5 Performance Uplift Summary

AMD showed this later in its presentation, but here is where Zen 5 gets its generational improvements.

AMD Zen 5 Uplift Breakdown

The smaller bucket is the improved fetch and branch prediction capabilities on Zen 5. A good chunk of performance, however, comes from better dual decode pipes and the op cache.

AMD Zen 5 Overview 1 Dual Pipe Fetch With Branch Prediction

The big bucket is a wider dispatch and execute engine. This is a fairly common technique to get more throughput from the same number of cores.

AMD Zen 5 Overview 2 Wider Dispatch And Execute

AMD has also done work on the L1 cache to ensure it keeps the execution units fed. In Zen 5, the L1 and L2 caches are private to each core, and the L3 cache is the shared level. Current designs will have separate L3 caches for Zen 5 and Zen 5c cores so that each can hit different performance and power optimization targets.

AMD Zen 5 Overview 3 Increased Data Bandwidth

AMD has a full 512-bit data path for things like AVX-512 instead of “double pumping” a 256-bit path.
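To make the "double pumping" distinction concrete, here is a minimal sketch of the arithmetic. It is a simplified model, not a description of AMD's actual pipeline: the function names `passes_per_op` and `lanes` are ours, and real throughput also depends on factors such as execution ports, clocks, and memory bandwidth.

```python
import math


def passes_per_op(vector_bits: int, datapath_bits: int) -> int:
    """Passes needed to push one vector operation through a datapath.

    Simplified model (our assumption): a vector wider than the datapath
    is split into datapath-sized chunks that are processed back to back.
    """
    return math.ceil(vector_bits / datapath_bits)


def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector."""
    return vector_bits // element_bits


# A 512-bit AVX-512 operation on a 256-bit path takes two passes
# ("double pumping"), while a full 512-bit path takes one.
double_pumped = passes_per_op(512, 256)
full_width = passes_per_op(512, 512)

# A 512-bit vector holds sixteen 32-bit elements.
fp32_lanes = lanes(512, 32)

print(double_pumped, full_width, fp32_lanes)
```

In this toy model, the full-width path halves the number of passes per 512-bit operation, which is the intuition behind the change.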

AMD Zen 5 Overview 4 512 Bit AI Datapath

Of course, Zen 5 and Zen 5c will underpin the AMD Turin family coming later in 2024. A 16% IPC uplift and 33% more cores mean that the 128-core Zen 5 part might be around 50% faster than the 96-core Genoa part, without any other factors coming into play.
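The back-of-the-envelope math behind that estimate can be sketched as follows. This assumes equal clock speeds and perfect scaling with core count, neither of which is guaranteed in practice, and the function name `relative_throughput` is ours for illustration.

```python
def relative_throughput(ipc_uplift: float, old_cores: int, new_cores: int) -> float:
    """Naive generational speedup: IPC gain multiplied by core-count gain.

    Assumptions (ours): identical clocks, perfect multi-core scaling,
    and no memory or power limits.
    """
    return (1.0 + ipc_uplift) * (new_cores / old_cores)


# 16% IPC uplift, going from a 96-core Genoa part to a 128-core Zen 5 part.
estimate = relative_throughput(0.16, old_cores=96, new_cores=128)
print(f"{estimate:.2f}x")  # prints "1.55x"
```

The naive product lands a bit above 1.5x, which is why "around 50% faster" is a reasonable, slightly conservative shorthand before clocks, power limits, and memory bandwidth are considered.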

AMD EPYC Turin 2H 2024

Zen 5 will be a big deal for AMD. AMD is set to lose the server performance crown in September 2024 as Intel rolls out its 128 P-core Granite Rapids-AP line. Our best guess is that Granite Rapids-AP will launch around Intel Innovation in September. AMD will not want to fall that far behind for that long, so we expect Turin to debut in time for Supercomputing 2024 in November. This will be the first time in seven years that AMD and Intel will be at P-core count parity in the data center.

Still, the Los Angeles event was about desktop and mobile, so let us get to the RDNA 3.5 update.

AMD RDNA 3.5 – Updated iGPU Graphics

AMD RDNA 3.5 is a big enough update over the RDNA 3 graphics in the AMD Radeon 780M, which we have been using for some time, to earn a “.5” version. At the same time, it is not a big enough jump to be called RDNA 4. The big focus was on improving not only performance but also performance per watt.

AMD RDNA 3.5 Overview

AMD says that its RDNA 3.5 parts are 19-32% faster, which is a huge jump. AMD is likely picking very favorable comparison points here.

AMD RDNA 3.5 Performance

AMD RDNA 3.5 gets its performance from a number of different areas. One interesting point was that AMD looked at how its phone GPU IP used optimization techniques to streamline requests to memory, which tend to use a lot of power.

AMD RDNA 3.5 2x

The net is that we get a more power-efficient GPU, but the prominent feature of the AMD Ryzen AI 300 series is really the XDNA 2 NPU, so let us get to that next.

1 COMMENT

  1. The problem I have is that the desktop version of Zen5 will use RDNA2. Which really is the shame. My computer is not only a workstation, I do need the CPU power to do compiles. I don’t use the graphics capabilities so much that I would need an additional graphics card. The main problem with RDNA is the use of DisplayPort 1.4.
