NVIDIA Blackwell Platform at Hot Chips 2024

NVIDIA Roadmap 2024-08

It is easy to say that NVIDIA Blackwell will sell like hotcakes in 2025. The company went into the platform architecture a bit more at Hot Chips 2024, and Blackwell is something a lot of folks in the industry are excited about. As a quick note, NVIDIA's slides had an unusual layout for some reason, so there is a lot of white space in our quick captures. This was the one strange PDF of the over a dozen posted today. Sorry for that. On the plus side, NVIDIA showed its latest data center roadmap.

Please note that we are doing these live at Hot Chips 2024 this week, so please excuse typos.

NVIDIA Blackwell Platform at Hot Chips 2024

NVIDIA is not talking about the individual GPU as much as it is talking about the cluster level for AI. That makes a lot of sense especially if you see talks from large AI shops like the OpenAI Keynote on Building Scalable AI Infrastructure at Hot Chips 2024.

NVIDIA Blackwell Hot Chips 2024_Page_04

NVIDIA does not just focus on building the hardware cluster; it also builds the software with optimized libraries.

NVIDIA Blackwell Hot Chips 2024_Page_05

The NVIDIA Blackwell platform spans from CPU and GPU compute to the different types of networks used for interconnects. This is chips to racks and interconnects, not just a GPU.

NVIDIA Blackwell Hot Chips 2024_Page_06

We did a fairly in-depth look at Blackwell during the NVIDIA GTC 2024 Keynote earlier this year.

NVIDIA Blackwell Hot Chips 2024_Page_08

The GPU is huge. One of the big features is the NVLink-C2C to the Grace CPU.

NVIDIA Blackwell Hot Chips 2024_Page_09

As NVIDIA’s newest GPU, it is also its highest-performance one.

NVIDIA Blackwell Hot Chips 2024_Page_10

NVIDIA uses the NVIDIA High-Bandwidth Interface (NV-HBI) to provide 10TB/s of bandwidth between the two GPU dies.

NVIDIA Blackwell Hot Chips 2024_Page_11

The NVIDIA GB200 Superchip is the NVIDIA Grace CPU and two NVIDIA Blackwell GPUs in a half-width platform. Two of these side-by-side means that each compute tray has four GPUs and two Arm CPUs.

NVIDIA Blackwell Hot Chips 2024_Page_12

NVIDIA has new FP4 and FP6 precisions. Lowering the precision of compute is a well-known way to increase performance.
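To make the idea concrete, here is a rough sketch of rounding a tensor onto a 4-bit floating-point grid. The E2M1-style value set and the simple per-tensor scale are assumptions for illustration only, not NVIDIA's actual FP4 format or kernels.

```python
import numpy as np

# Hypothetical FP4 E2M1-style value grid (sign + 2 exponent + 1 mantissa bits).
# Illustration only; not NVIDIA's implementation.
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_POS[:0:-1], FP4_POS])  # mirror the negatives

def quantize_fp4(x):
    """Scale x so its max magnitude maps to 6.0, then round to the grid."""
    scale = np.abs(x).max() / 6.0
    idx = np.abs(x[:, None] / scale - FP4_GRID).argmin(axis=1)
    return FP4_GRID[idx] * scale

x = np.array([0.12, -0.8, 2.7, 5.9])
print(quantize_fp4(x))  # coarse near zero, but large values survive well
```

Note how few representable values there are: 16 codes total, so small activations collapse while the dynamic range stays wide, which is why careful scaling matters so much at this precision.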

NVIDIA Blackwell Hot Chips 2024_Page_15

NVIDIA Quasar Quantization is used to figure out what can use lower precision, and therefore less compute and storage.
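In miniature, that kind of decision can be thought of as quantizing a tensor at several bit widths and keeping the cheapest one that stays inside an error budget. This is a hypothetical heuristic to illustrate the idea, not NVIDIA's Quasar algorithm.

```python
import numpy as np

def quant_error(x, bits):
    """Max absolute error after symmetric integer quantization at `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.abs(x - np.round(x / scale) * scale).max()

def pick_precision(x, tol=0.05, widths=(4, 8, 16)):
    """Return the narrowest width whose quantization error is within `tol`."""
    for bits in sorted(widths):  # try the cheapest format first
        if quant_error(x, bits) <= tol:
            return bits
    return max(widths)

smooth = np.linspace(-1.0, 1.0, 64)
print(pick_precision(smooth))  # 4-bit error exceeds tol here; 8-bit passes
```

The real system presumably works per tensor or per layer and uses far more sophisticated error metrics, but the shape of the trade-off is the same: pay bits only where accuracy demands them.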

NVIDIA Blackwell Hot Chips 2024_Page_16

NVIDIA says FP4 for inference can get close to BF16 quality in some cases.

NVIDIA Blackwell Hot Chips 2024_Page_17

Here is an image generation task using FP16 inference and FP4. These rabbits are not the same, but they are fairly close at a quick glance.

NVIDIA Blackwell Hot Chips 2024_Page_18

NVIDIA says AI models are growing.

NVIDIA Blackwell Hot Chips 2024_Page_20

The PHY has become a big deal because part of NVIDIA’s secret sauce is being able to ship data around different parts of systems over NVLink more efficiently than with other technologies.

NVIDIA Blackwell Hot Chips 2024_Page_21

The NVLink Switch Chip and NVLink switch tray are designed to push a ton of data at lower power than simply using an off-the-shelf solution like Ethernet.

NVIDIA Blackwell Hot Chips 2024_Page_22

NVLink has scaled since 2016 from eight GPUs to 72 GPUs in the current generation. Conveniently, the 16-GPU NVSwitch DGX-2 topology covered in the NVIDIA NVSwitch Details at Hot Chips 30 talk was left out.

NVIDIA Blackwell Hot Chips 2024_Page_23

NVIDIA is showing the GB200 NVL72 and NVL36. The NVL36 is the 36-GPU version for data centers that cannot handle 120kW racks.

NVIDIA Blackwell Hot Chips 2024_Page_24

With Spectrum-X, Spectrum-4 (similar to the Marvell Teralynx 10, a 51.2T Ethernet switch) plus BlueField-3 yield a combined solution for RDMA networking over Ethernet. In a way, NVIDIA is already doing some of the things that the Ultra Ethernet Consortium will introduce in future generations.

NVIDIA Blackwell Hot Chips 2024_Page_25

The GB200 NVL72 is designed for trillion-parameter AI models.

NVIDIA Blackwell Hot Chips 2024_Page_26

With increasing model sizes, splitting workloads across multiple GPUs is imperative.
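A toy version of that split, using plain NumPy arrays as a stand-in for multiple devices: shard a layer's weight matrix by columns, compute partial outputs independently, then concatenate the results (the all-gather step). Shapes and names here are illustrative, not NVIDIA's parallelism stack.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # activations, replicated on every "GPU"
w = rng.standard_normal((8, 6))        # full weight matrix of one layer

shards = np.split(w, 2, axis=1)        # each device keeps half the columns
partials = [x @ s for s in shards]     # computed independently, no communication
y = np.concatenate(partials, axis=1)   # the all-gather that reassembles the output

assert np.allclose(y, x @ w)           # identical to the unsplit computation
```

The math is exact; the cost of splitting shows up only as the communication step at the end, which is exactly why fast interconnects like NVLink dominate the cluster-level story.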

NVIDIA Blackwell Hot Chips 2024_Page_27

Blackwell is big enough to keep individual expert models on one GPU.

NVIDIA Blackwell Hot Chips 2024_Page_28

NVIDIA is showing the GPT-MoE 1.8T performance.

NVIDIA Blackwell Hot Chips 2024_Page_29

Here is the new NVIDIA Roadmap slide. With 1.6T ConnectX-9 in 2026, NVIDIA seems to be pointing to the need for PCIe Gen7, since a PCIe Gen6 x16 link cannot handle a 1.6T network connection. Perhaps one could use multi-host, but this is exciting.
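The arithmetic behind that claim, using raw signaling rates (usable throughput is lower still once encoding and protocol overhead are counted, which only strengthens the point):

```python
def pcie_x16_gbps(gt_per_lane):
    """Raw x16 link bandwidth in Gbit/s, one bit per transfer per lane."""
    return gt_per_lane * 16

gen6 = pcie_x16_gbps(64)    # PCIe Gen6: 64 GT/s per lane  -> 1024 Gbit/s
gen7 = pcie_x16_gbps(128)   # PCIe Gen7: 128 GT/s per lane -> 2048 Gbit/s
nic = 1600                  # a 1.6T ConnectX-9 port, in Gbit/s

print(gen6 < nic, nic < gen7)  # True True: Gen6 x16 falls short, Gen7 fits
```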

NVIDIA Roadmap 2024-08

Here is the quick summary.

NVIDIA Blackwell Hot Chips 2024_Page_32

Always good to see, and hopefully we see more Blackwell arrive this year.

Final Words

A lot of this we have seen before, save perhaps the roadmap slide. What is somewhat interesting is that we are sitting in a conference where there are a lot of AI accelerators. At the same time, NVIDIA is not just building clusters, but it is also optimizing everything, including the interconnects, switch chips, and even the deployment models. A challenge AI startups have is that NVIDIA is not just making today’s chips, switches, NICs, and more. Instead, it is doing frontier research so that its next-generation products meet the needs of future models at a cluster level. That is a big difference.
