NVIDIA H100 NVL for High-End AI Inference Launched

NVIDIA H100 NVL

The NVIDIA H100 NVL may look like something we have seen before, but there is a big difference. We asked NVIDIA, and the company says that logically this is two GPUs to the OS, but that NVLink allows the full 188GB of memory to be used by the system.
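
Since the card presents as two devices, an application still has to opt in to span both memory pools from a single process. Here is a minimal CUDA sketch of what that looks like, assuming standard runtime peer access over the NVLink bridge; the device indices are assumptions for a single-pair system, not an NVL-specific API:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Minimal sketch: the OS sees an H100 NVL pair as two CUDA devices.
// Enabling peer access lets one process address both memory pools,
// with the traffic between them riding the NVLink bridge.
// Device indices 0 and 1 are assumptions for a single-pair system.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("CUDA devices visible: %d\n", count); // expect 2 per NVL pair

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (canAccess) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0); // GPU 0 may map GPU 1's memory
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0); // and vice versa
        printf("Peer access enabled between GPU 0 and GPU 1\n");
    }
    return 0;
}
```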

The new NVIDIA H100 NVL brings two NVIDIA H100 PCIe cards together with NVLink, and adds a twist. The NVL version has 94GB of HBM3 memory per GPU for a total of 188GB. That likely means the sixth 16GB HBM3 stack is activated, but with only 14GB of it available, giving 94GB of the 96GB physically present (80GB + 14GB = 94GB per GPU, or 188GB for the pair).
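
If that reading is correct, the per-device capacity is easy to sanity check at runtime. A minimal CUDA sketch, assuming a stock runtime; the roughly 94GB per device is the expectation from the spec sheet, not something we have measured:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Sketch: report usable memory per GPU. On an H100 NVL, each device
// should show roughly 94GB (six 16GB HBM3 stacks = 96GB physical,
// minus the disabled ~2GB), i.e. 188GB across the pair.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        size_t freeB = 0, totalB = 0;
        cudaSetDevice(d);
        cudaMemGetInfo(&freeB, &totalB);
        printf("GPU %d: %.1f GB usable HBM3\n", d, totalB / 1e9);
    }
    return 0;
}
```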

NVIDIA H100 NVL

What is also really interesting is the TDP. These are 350W to 400W TDP PCIe cards. Generally, 300W is the top end we see from most other vendors' PCIe cards, since many servers cannot handle 400W in the PCIe form factor. That is a big driver for the higher-end OAM/SXM form factors.
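
The configurable part of that TDP is visible from software as well. Here is a minimal sketch using NVML, the library underneath nvidia-smi, to query each card's power-limit window; the 350W-400W range on the NVL is our expectation from the spec table below, not something we have measured:

```cuda
#include <cstdio>
#include <nvml.h>  // link with -lnvidia-ml

// Sketch: query each GPU's current and configurable power limits.
// NVML reports values in milliwatts.
int main() {
    unsigned int count = 0;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        unsigned int cur = 0, minMw = 0, maxMw = 0;
        nvmlDeviceGetHandleByIndex(i, &dev);
        nvmlDeviceGetPowerManagementLimit(dev, &cur);
        nvmlDeviceGetPowerManagementLimitConstraints(dev, &minMw, &maxMw);
        printf("GPU %u: limit %uW (configurable %uW-%uW)\n",
               i, cur / 1000, minMw / 1000, maxMw / 1000);
    }
    nvmlShutdown();
    return 0;
}
```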

| | H100 SXM | H100 PCIe | H100 NVL |
|---|---|---|---|
| FP64 | 34 teraFLOPS | 26 teraFLOPS | 68 teraFLOPS |
| FP64 Tensor Core | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| FP32 | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| TF32 Tensor Core | 989 teraFLOPS¹ | 756 teraFLOPS¹ | 1,979 teraFLOPS¹ |
| BFLOAT16 Tensor Core | 1,979 teraFLOPS¹ | 1,513 teraFLOPS¹ | 3,958 teraFLOPS¹ |
| FP16 Tensor Core | 1,979 teraFLOPS¹ | 1,513 teraFLOPS¹ | 3,958 teraFLOPS¹ |
| FP8 Tensor Core | 3,958 teraFLOPS¹ | 3,026 teraFLOPS¹ | 7,916 teraFLOPS¹ |
| INT8 Tensor Core | 3,958 TOPS¹ | 3,026 TOPS¹ | 7,916 TOPS¹ |
| GPU memory | 80GB | 80GB | 188GB |
| GPU memory bandwidth | 3.35TB/s | 2TB/s | 7.8TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 14 NVDEC, 14 JPEG |
| Max thermal design power (TDP) | Up to 700W (configurable) | 300-350W (configurable) | 2x 350-400W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each | Up to 14 MIGs @ 12GB each |
| Form factor | SXM | PCIe, dual-slot air-cooled | 2x PCIe, dual-slot air-cooled |
| Interconnect | NVLink: 900GB/s, PCIe Gen5: 128GB/s | NVLink: 600GB/s, PCIe Gen5: 128GB/s | NVLink: 600GB/s, PCIe Gen5: 128GB/s |

¹ With sparsity.

Based on the specs, and assuming the NVIDIA H100 NVL figures are for the 400W configuration, the PCIe NVL version looks vastly superior to the H100 SXM5, albeit without the higher-end 900GB/s NVLink interface. The quoted compute specs are 2x those of the H100 SXM, but the NVL version has more memory and higher memory bandwidth, and it uses similar power for the performance.

Final Words

Our sense is that the NVL will have to be de-rated, or the H100 SXM5 will need a spec bump soon to match. This is very strange positioning. Still, NVIDIA says that OpenAI, which is using DGX A100s for ChatGPT today, could replace up to 10x DGX A100 systems with four NVIDIA H100 NVL pairs for its inferencing. It will be interesting to see which way NVIDIA goes over time.

5 COMMENTS

  1. “Based on the specs, it seems like, assuming the NVIDIA H100 NVL specs are for 400W, that the PCIe versions are vastly superior to the H100 SXM5 versions given they are 300W less but without the higher-end 900GB/s NVLINK interfaces.”

    TDP is a design specification. Just because the SXM part can be configured to operate at 700 W doesn't mean it needs 700 W to operate. A processor's performance can be limited by various things, power being just one of them. Just because a processor can draw 700 W doesn't mean it needs to in order to perform a certain workload as fast as one that can only draw 400 W. The SXM form factor may also allow for a much better cooling solution than the PCIe card, which would allow increased performance on workloads that are power-limited.

  2. “The compute specs are 2x the H100 SXM, but the NVL version has more memory, higher memory bandwidth, and uses 57% the power for the same performance.”

    I think there is an error in your reporting here. The specs quoted here are for the two cards **together**. As such, the H100 SXM still enjoys a clear position in the market if you ask me.

  3. Seems like a VRAM bump, and one has to buy in pairs to get it, which is just more market segmentation in my book.

    If, as the marketing vaguely implies, it were a transparent shared address space between the two GPU chips, that would be cool. But it actually looks like a pair of H100s connected by, OMG!, NVLink.
