Architecture Trifecta AMD Zen 5 RDNA 3.5 and XDNA 2

AMD XDNA 2 – Faster NPU

This was perhaps the coolest slide that we have seen on “Why an NPU?” The performance per watt of an NPU is high, albeit with capabilities narrowly focused on specific data types and types of calculations.

AMD XDNA AI Why NPU

AMD says its new XDNA 2 offers 5x the compute capacity at 2x the power efficiency.

AMD XDNA 2 Compute And Power

It is worth noting that the AMD XDNA AI architecture feels like an evolution of the Xilinx AI Engines. There are a bunch of AI Engines with a fabric and memory. The NPU memory is distinct from the rest of the chip's caches, and AMD does not include it in its other cache figures.

AMD XDNA Architecture 2H 2024

With XDNA 2, one can partition the NPU so that different sets of AI Engines accelerate different features. It also has a dataflow architecture and a programmable interconnect. The partitioning is interesting because the AI PC demos of 2024 accelerate only one or two applications (perhaps one on the NPU and one on an integrated GPU). In the future, we will see many models running on a PC simultaneously.

AMD XDNA Spatial Architecture With Flexible Dataflow

Still, XDNA 2 is a big jump in performance. Part of that comes from having more AI Engine tiles, and another part comes from optimization. One of the more interesting questions is why AMD stopped at 32 AI Engine tiles instead of making an aggressive move against Qualcomm and Intel with a 40-tile or larger NPU.

AMD XDNA To XDNA 2 Architecture

AMD is offering Block Floating Point 16, or Block FP16, as its data format to address the accuracy/performance tradeoffs of other data types. Of note, this is Block FP16, not bfloat16.
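To make the distinction from bfloat16 concrete, here is a minimal sketch of generic block floating point quantization in Python. In a block format, a group of values shares a single exponent and each value keeps only a small integer mantissa. The block size of 8, the 8-bit mantissa width, and the function names below are assumptions for illustration; the article does not specify AMD's exact Block FP16 layout, and this is not AMD's implementation.

```python
import math

# Assumed layout for illustration only (not stated in the article):
BLOCK_SIZE = 8      # number of values that share one exponent
MANTISSA_BITS = 8   # signed integer mantissa width per value


def quantize_block(values):
    """Quantize one block of floats to (shared_exponent, integer mantissas)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, e = math.frexp(max_abs)
    # Pick the exponent so the largest magnitude uses the full mantissa range.
    shared_exp = e - (MANTISSA_BITS - 1)
    limit = 2 ** (MANTISSA_BITS - 1) - 1  # 127 for 8-bit signed mantissas
    mantissas = []
    for v in values:
        q = round(math.ldexp(v, -shared_exp))  # v / 2**shared_exp
        mantissas.append(max(-limit, min(limit, q)))  # clamp to the range
    return shared_exp, mantissas


def dequantize_block(shared_exp, mantissas):
    """Rebuild approximate floats: mantissa * 2**shared_exp."""
    return [math.ldexp(m, shared_exp) for m in mantissas]


block = [0.5, -0.25, 0.125, 1.0, 0.75, -1.0, 0.0, 0.3]
shared_exp, mantissas = quantize_block(block)
restored = dequantize_block(shared_exp, mantissas)
for original, approx in zip(block, restored):
    print(f"{original:+.6f} -> {approx:+.6f}")
```

Because the exponent is chosen from the largest value in the block, small values that sit next to much larger ones lose the most precision. That is the basic tradeoff of any shared-exponent format. By contrast, bfloat16 gives every value its own exponent.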

AMD XDNA 2 Block FP16

AMD says that Block FP16 yields model sizes and throughput comparable to INT8, but with much better accuracy.
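The size claim can be checked with simple arithmetic. The sketch below assumes a block layout of 8 values sharing one 8-bit exponent, each with an 8-bit mantissa; that layout and the 7-billion-parameter model are assumptions for illustration, not figures from the article.

```python
def bits_per_value(mantissa_bits, shared_exponent_bits, block_size):
    """Average storage per value when one exponent is shared by a block."""
    return mantissa_bits + shared_exponent_bits / block_size


def model_size_gb(num_params, bits):
    """Weight storage in gigabytes (10**9 bytes), ignoring any metadata."""
    return num_params * bits / 8 / 1e9


PARAMS = 7_000_000_000  # hypothetical model size, for illustration only

formats = {
    "FP32": 32.0,
    "FP16 / bfloat16": 16.0,
    # Assumed layout: 8-bit mantissas, 8-bit exponent shared by 8 values.
    "Block FP16 (assumed layout)": bits_per_value(8, 8, 8),
    "INT8": 8.0,
}

for name, bits in formats.items():
    print(f"{name}: {bits:g} bits/value, {model_size_gb(PARAMS, bits):.2f} GB")
```

Under these assumptions, a block format averages 9 bits per value, which is much closer to INT8's 8 bits than to FP16's 16.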

AMD XDNA 2 Block FP16 Leadership

The Block FP16 accuracy can be very close to the FP32 baseline. This is important for things like LLMs and diffusion models since it means that the outputs tend to be much more usable.

AMD XDNA 2 Block FP16 To FP32 Baseline Accuracy

One of the big reasons AMD says it is using Block FP16 is that it allows FP16, FP32, and bfloat16 models to be brought to AMD NPUs.

Final Words

AMD has a trifecta of new architectures covering CPU, GPU, and NPU IP. Zen 5 and Zen 5c will power everything from high-end Turin server CPUs to Ryzen 9000 desktop and Ryzen AI 300 mobile CPUs and embedded parts.

We cannot wait to start using these new parts and sharing our experiences with the STH family.

1 COMMENT

  1. The problem I have is that the desktop version of Zen5 will use RDNA2. Which really is the shame. My computer is not only a workstation, I do need the CPU power to do compiles. I don’t use the graphics capabilities so much that I would need an additional graphics card. The main problem with RDNA is the use of DisplayPort 1.4.
