Intel Gaudi 3 Going GA for Scale-out AI Acceleration

8x Intel Gaudi 3 OAM In UBB 1

At STH, we have been covering the Intel Gaudi 3 launch for some time. Along with its Xeon 6900P launch, the company also announced that the Gaudi 3 AI accelerator is reaching general availability. For Intel, this is a big deal, as it needs to show that its AI offering is growing following IBM Cloud's recent announcement that it will deploy Gaudi 3.

We attended a pre-brief event last week in Oregon for this, so we are going to say this is sponsored.

Intel Gaudi 3 Goes GA for Scale-out AI Acceleration

Intel is fully on the AI bandwagon with AI ranging from the PC to data center clusters.

Intel Gaudi 3 RAG AI Everywhere

In April, we showed the Intel Gaudi 3 128GB HBM2e AI chip in the wild. The new chips are hitting GA in October with systems from several vendors. Dell has its PowerEdge XE9680, one of the least serviceable AI systems, but Dell has a big customer base.

Dell and Supermicro Support Intel Gaudi 3

Supermicro showed off its X14 Gaudi 3 system in April 2024, and it was a functional system at that time. Other vendors brought cardboard and NVIDIA systems to Intel’s Analyst event in April, so the system we photographed was likely one of the first running and available models waiting for GA on the accelerators.

Supermicro SYS 822GA NGR3 Intel Gaudi 3 8 Way 2

We also saw Wiwynn’s Gaudi 3 system in June, and other vendors have Gaudi 3 systems as well.

Systems are a big deal since not many of these are going to be deployed as single OAM modules. In fact, it would be strange to see even a single 8-GPU Gaudi 3 system deployed by itself these days. As a result, Intel is talking about building moderately sized clusters of up to 1024 nodes, or 8192 accelerators.

Intel Gaudi 3 Scalable System Blocks

Intel is using Ethernet as its scale-out fabric and has a relatively moderate-density rack configuration, taking 15 racks to hold the storage, networking, control plane, and 256 Gaudi 3 accelerators.

Intel Gaudi 3 32 Node Configuration
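
The cluster figures quoted above follow from simple multiplication, which this short Python sketch works through. It assumes eight accelerators per node, as in the 8-way OAM systems shown earlier, and 128GB of HBM2e per accelerator. The constant and function names are ours for illustration, and the per-rack number is only an average since the 15 racks also hold storage, networking, and the control plane.

```python
# Back-of-the-envelope sizing for the Gaudi 3 cluster figures quoted in the article.
# Names are illustrative and do not come from Intel.

ACCELERATORS_PER_NODE = 8      # 8x OAM modules on one UBB
HBM_GB_PER_ACCELERATOR = 128   # HBM2e capacity per Gaudi 3


def accelerators(nodes: int) -> int:
    """Total accelerators in a cluster of `nodes` 8-way systems."""
    return nodes * ACCELERATORS_PER_NODE


def nodes_for(accelerator_count: int) -> int:
    """Nodes needed to host `accelerator_count` accelerators (rounded up)."""
    return -(-accelerator_count // ACCELERATORS_PER_NODE)


def total_hbm_gb(accelerator_count: int) -> int:
    """Aggregate HBM capacity in GB across all accelerators."""
    return accelerator_count * HBM_GB_PER_ACCELERATOR


if __name__ == "__main__":
    print(accelerators(1024))      # 8192 accelerators at the 1024-node maximum
    print(nodes_for(256))          # 32 nodes in the 15-rack configuration
    print(round(256 / 15, 1))      # 17.1 accelerators per rack on average
    print(total_hbm_gb(8192))      # 1048576 GB of HBM at full scale
```

The average of roughly 17 accelerators per rack is what makes this a moderate-density design: that is about two 8-way nodes per rack once the supporting racks are counted.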

That is part of Intel’s plan to help customers and partners deliver validated RAG experiences for enterprises.

Intel Gaudi 3 Full Stack RAG Delivery

While we have focused on the HL-325L OAM card and the systems built around the HLB-325 UBB (shown in the cover image to this article), Intel also has a PCIe card.

Intel Gaudi 3 Launch Products

The Intel HL-338 is a Gaudi 3 card in a PCIe CEM form factor. We should be very clear here that not every server will support this 128GB HBM2e card. It is a 600W TDP, dual-slot, passively cooled card that requires a lot of chassis power and airflow.

Intel Gaudi 3 Launch PCIe CEM HL 338

Intel Gaudi 3 Launch PCIe CEM HL 338 Block Diagram

These cards can use QSFP-DD networking for two 400GbE links externally and then can use a backplane for local card-to-card transfers. The four-card block above is an interesting architecture.
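
To make the chassis power and airflow point concrete, here is a rough Python sketch of what a four-card HL-338 block adds up to. It uses only the per-card figures quoted above (600W TDP, 128GB HBM2e, two 400GbE external links) and assumes that the two external links are per card and that all four cards are identical. The class and function names are ours, and the totals ignore CPUs, fans, and power supply losses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcieCard:
    """Per-card figures as quoted in the article for the HL-338."""
    tdp_watts: int = 600
    hbm_gb: int = 128
    external_links: int = 2   # assumed to be per card
    link_gbps: int = 400


def block_totals(card: PcieCard, cards: int = 4) -> dict:
    """Sum power, memory, and external bandwidth across a block of cards."""
    return {
        "power_watts": cards * card.tdp_watts,
        "hbm_gb": cards * card.hbm_gb,
        "external_gbps": cards * card.external_links * card.link_gbps,
    }


if __name__ == "__main__":
    totals = block_totals(PcieCard())
    print(totals["power_watts"])    # 2400 W of accelerator TDP alone
    print(totals["hbm_gb"])         # 512 GB of HBM2e
    print(totals["external_gbps"])  # 3200 Gb/s of external Ethernet
```

At 2.4kW for the accelerators alone, a four-card block is well beyond what a typical general-purpose server is built to power and cool, which is why chassis support matters.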

Of course, NVIDIA is the big player these days in AI accelerators. Intel says, however, that its cards can be very competitive with the NVIDIA H100 on both a performance and a performance-per-dollar basis. Realistically, Intel needs to discount its cards compared to the NVIDIA H100 given the current market dynamics, and it seems like these are notably less expensive.

Intel Gaudi 3 Launch Price Performance

A few weeks ago, we went into detail on the Gaudi 3 architecture at Hot Chips 2024. Intel again showed a lot of that information on how the cards work.

Intel Gaudi 3 Launch Spec And Block Diagram

We are just going to post the slides here since we went over this about a month ago.

Intel Gaudi 3 Launch Matrix Multiplication And Vector Engines

Gaudi 3 certainly puts a lot of emphasis on the memory subsystem, as is common in the space. The use of 200GbE for scaling out is one of the more interesting features. Most NVIDIA HGX H100 systems have a NIC per GPU, which adds more cost and power consumption to a system. Plus, it creates another internal hop from GPU to NIC over PCIe. With Gaudi 3, the networking is onboard. A fun thought is that if you were to buy PCIe NICs with the same amount of bandwidth as the Gaudi 3 offers, you would likely spend as much, if not more, on the NICs than on the Gaudi 3.

Intel Gaudi 3 Launch Memory Subsystem
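
The NIC comparison can be roughed out in a few lines of Python. This article only says that Gaudi 3 uses 200GbE. The port count of 24 used below is our assumption, taken from Intel's published Gaudi 3 specifications rather than from anything stated here, so treat it as a parameter. No prices are included since none are given, and the function names are ours.

```python
import math

GAUDI3_PORT_GBPS = 200   # 200GbE, as stated in the article
ASSUMED_PORTS = 24       # assumption: not stated in this article


def onboard_bandwidth_gbps(ports: int = ASSUMED_PORTS,
                           port_gbps: int = GAUDI3_PORT_GBPS) -> int:
    """Aggregate onboard Ethernet bandwidth of one accelerator in Gb/s."""
    return ports * port_gbps


def nics_to_match(total_gbps: int, nic_port_gbps: int,
                  ports_per_nic: int = 1) -> int:
    """PCIe NICs needed to reach `total_gbps`, rounded up."""
    return math.ceil(total_gbps / (nic_port_gbps * ports_per_nic))


if __name__ == "__main__":
    total = onboard_bandwidth_gbps()
    print(total)                                        # 4800 Gb/s under the 24-port assumption
    print(nics_to_match(total, 400))                    # 12 single-port 400GbE NICs
    print(nics_to_match(total, 200, ports_per_nic=2))   # 12 dual-port 200GbE NICs
```

Under that assumption, matching one accelerator's onboard bandwidth would take a dozen 400GbE NICs, before counting the PCIe slots and lanes needed to host them.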

Intel has been working with the Gaudi team since the Habana Labs acquisition, so this is now a third-generation product.

Intel Gaudi 3 Launch MME TPE Pipeline

Here are the specs on the HLS3, the eight-GPU reference server.

Intel Gaudi 3 Launch Networking

Intel has an entire software suite for the Gaudi line, as one would expect.

Intel Gaudi 3 Launch Software

It also supports major frameworks.

Intel Gaudi 3 Launch Models

Something fun is that this is available in the Intel Tiber Developer Cloud.

You might have seen the Intel Developer Cloud with Gaudi accelerators in our Touring the Intel AI Playground Inside the Intel Developer Cloud piece.

Final Words

Overall, it is great to see the Gaudi 3 hit general availability. Intel needs its AI accelerator business to take off, hopefully before Falcon Shores arrives. I asked if Intel was looking to package end-to-end solutions so that companies could just cart in clusters and have enterprise RAG applications running. It sounds like that is not the plan; instead, Intel is looking to power those applications through partners. Hopefully, we get to show you more Gaudi 3 systems in the future.
