Intel Ponte Vecchio is a Spaceship of a GPU

6
Intel Architecture Day 2021 Ponte Vecchio OAM Module
Intel Architecture Day 2021 Ponte Vecchio OAM Module

At Intel Architecture Day 2021, Intel disclosed more about its new HPC-focused GPU offering: Ponte Vecchio. If there was any question of Intel’s desire to embrace multi-die/ multi-tile and multi-fab, this is the product that will put those questions to rest. The Intel Ponte Vecchio Xe HPC GPU is a Spaceship of a GPU. Whereas we previously called those working on making chips chip designers, if Ponte Vecchio is a “spaceship” that would make those working on it “chipbuilders.”

Intel Ponte Vecchio is a Spaceship of a GPU

Intel is targeting more than a small departure with its Xe HPC GPU codenamed Ponte Vecchio. A few years ago the company recognized that Xeon Phi was not going to get the company to exascale fast enough. Instead, it needed a GPU architecture with a radically different approach. Here is the improvement that the company is targeting:

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Performance Expectation 1
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Performance Expectation 1

The green line is NVIDIA’s top-end GPUs like the NVIDIA A100’s we looked at in Liquid Cooling Next-Gen Servers Getting Hands-on with 3 Options and Inspur NF5488A5 8x NVIDIA A100 HGX Platform Review. The blue line does not have a scale, but Intel’s point is that it is expecting big gains.

Intel Xe HPC Architecture

The Intel Xe HPC architecture incorporates a number of different IP tiles, but perhaps the main one is based around the Xe-core. This is the compute unit of the Xe HPC GPU. There are 8x 512-bit vector engines and 8x 4096-bit matrix engines. These execution units get a sizable L1 cache and the ability to move data quickly in order to keep them fed.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Core
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Core

Here are the specs for the Vector Engine and Matrix Engine.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Vector And Matrix Engine
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Vector And Matrix Engine

The building block of the Xe HPC core is then assembled into the Xe HPC Slice that aggregates 16 of these cores together. Since each has 512KB of L1 cache, that gives us 8MB of L1 cache total. A key challenge in GPU designs is memory bandwidth, or put more simply, keeping execution units primed with data to perform useful work. Much of the Xe HPC architecture is designed to keep the execution units primed as much as possible.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Slice
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Slice

In the middle of the Xe HPC Slice are the 16 Ray Tracing units. Also as a quick note, the hardware context here is designed to help accelerate things like virtualized/ multi-tenant GPU usage.

Up to four Xe HPC Slices are brought together with a large L2 cache, 4x HBM2e memory controllers, a media engine, along with the Xe Links to form the Xe HPC Slice. In the bottom corner, one will notice the “Stack-to-Stack” block.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Stack
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Stack

That is for the Xe HPC 2-Stack that combines two of these stacks for a total of 128x Xe HPC cores/ 8x slices together. As a result, we get eight times the slice figures for things such as 128x ray tracing units as well. We also double features such as the Xe Links (16) and HBM2e memory controllers (8.)

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe 2 Stack
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe 2 Stack

The Xe Link is the high-speed coherent unified fabric that allows for GPU to GPU transfer. We are not going to say this is Intel’s answer to NVLink, but a reader can draw their own conclusion.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Link
Intel Architecture Day 2021 Xe HPC Ponte Vecchio Xe Link

The Xe Link can be used to connect two, four, six, or eight GPUs together.

Intel Architecture Day 2021 Xe HPC Ponte Vecchio 8x Link Scaling
Intel Architecture Day 2021 Xe HPC Ponte Vecchio 8x Link Scaling

If one has eight Xe HPC GPUs in their full configurations (there are many “up to’s” in the slides), here are the performance figures:

Intel Architecture Day 2021 Xe HPC Ponte Vecchio 8x System Rate
Intel Architecture Day 2021 Xe HPC Ponte Vecchio 8x System Rate

Now that we have looked at the architecture, let us get to the Intel Ponte Vecchio Chipbuilding.

6 COMMENTS

  1. “The Xe Link is the high-speed coherent unified fabric …”
    Previous presentations on Ponte Vecchio mentioned CXL, and I presumed the Sapphire Rapids chips would be responsible for coherency. I didn’t hear CXL mentioned for Ponte Vecchio today, nor for any of the other chips except Sapphire Rapids.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.