Tesla DOJO Exa-Scale Lossy AI Network using the Tesla Transport Protocol over Ethernet TTPoE

The Tesla Dojo Training Tile Wired And Plumbed

At Hot Chips 2024, we got to learn about TTPoE, or the Tesla Transport Protocol over Ethernet. The talk largely covers V1 of TTP, which is the version Tesla can share at Hot Chips. Instead of using TCP, Tesla decided to make its own networking protocol for its AI cluster.

Please excuse typos. These are being done in real-time during Hot Chips at Stanford.


For Tesla’s DOJO supercomputer, the company did not just make an AI accelerator, but also its own transport protocol over Ethernet: the aptly named Tesla Transport Protocol over Ethernet (TTPoE).

Tesla Dojo Hot Chips 2024_Page_01

Tesla says TCP/IP is too slow, while RDMA that relies on PFC to make the fabric lossless impacts the rest of the network.

Tesla Dojo Hot Chips 2024_Page_02

TTPoE is a peer-to-peer transport layer protocol executed in hardware. One advantage is that Tesla does not need special switches since it is mostly using them for Layer 2 transport.

Tesla Dojo Hot Chips 2024_Page_03

Here is the OSI layer model for DOJO. We can see that Tesla is replacing the transport layer.

Tesla Dojo Hot Chips 2024_Page_04

Here are the TTP transition examples over the TTP Link.

Tesla Dojo Hot Chips 2024_Page_05

This is the TCP state machine versus the TTP state machine.

Tesla Dojo Hot Chips 2024_Page_06

Here is the TTP header frame built on Ethernet-II framing.

Tesla Dojo Hot Chips 2024_Page_07
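
To make the idea of a transport header riding directly on Ethernet-II framing more concrete, here is a minimal Python sketch. Only the Ethernet-II part (6-byte destination MAC, 6-byte source MAC, 2-byte EtherType) is standard. The transport header fields (opcode, flags, two sequence numbers) and their sizes are hypothetical stand-ins and are not taken from Tesla's slide. The EtherType 0x88B5 is the IEEE local experimental value, used here purely as a placeholder.

```python
import struct

# IEEE 802 "Local Experimental" EtherType, used only as a placeholder.
# The real TTPoE EtherType is not stated in these notes.
PLACEHOLDER_ETHERTYPE = 0x88B5

# Standard Ethernet-II header: destination MAC, source MAC, EtherType.
ETH_HDR = struct.Struct("!6s6sH")

# HYPOTHETICAL transport header, for illustration only:
# opcode (1 byte), flags (1 byte), TX sequence (4 bytes), RX/ack sequence (4 bytes).
HYPOTHETICAL_HDR = struct.Struct("!BBII")


def build_frame(dst_mac: bytes, src_mac: bytes, opcode: int,
                tx_seq: int, rx_seq: int, payload: bytes) -> bytes:
    """Pack an Ethernet-II frame carrying the hypothetical transport header."""
    eth = ETH_HDR.pack(dst_mac, src_mac, PLACEHOLDER_ETHERTYPE)
    hdr = HYPOTHETICAL_HDR.pack(opcode, 0, tx_seq, rx_seq)
    return eth + hdr + payload


def parse_frame(frame: bytes) -> dict:
    """Unpack a frame produced by build_frame and return its fields."""
    dst, src, ethertype = ETH_HDR.unpack_from(frame, 0)
    if ethertype != PLACEHOLDER_ETHERTYPE:
        raise ValueError("not a frame for this sketch's transport")
    opcode, flags, tx_seq, rx_seq = HYPOTHETICAL_HDR.unpack_from(frame, ETH_HDR.size)
    payload = frame[ETH_HDR.size + HYPOTHETICAL_HDR.size:]
    return {"dst": dst, "src": src, "opcode": opcode, "flags": flags,
            "tx_seq": tx_seq, "rx_seq": rx_seq, "payload": payload}
```

Because there is no IP or UDP header between the Ethernet header and the transport header in a layout like this, a standard Layer 2 switch can forward the frame based on MAC addresses alone.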

Unlike lossless RDMA networks, TTPoE expects to lose packets and retry packet transmission. This is not UDP, instead it is more like TCP.

Tesla Dojo Hot Chips 2024_Page_08
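
As a rough illustration of what "expect loss and retry" means in practice, here is a small Python simulation of a sender that keeps retransmitting each packet until it sees an acknowledgment, while the receiver discards duplicates. This is a generic stop-and-wait sketch and not Tesla's algorithm. The loss rate, attempt limit, and function name are all assumptions made for the example.

```python
import random


def send_reliably(payloads, loss_rate=0.3, max_attempts=50, seed=0):
    """Deliver payloads in order over a simulated lossy link.

    Returns (delivered_payloads, total_transmissions).
    Both data frames and acks can be dropped with probability loss_rate.
    """
    rng = random.Random(seed)      # seeded so runs are repeatable
    delivered = []                 # what the receiver hands to the application
    expected_seq = 0               # next sequence number the receiver accepts
    transmissions = 0              # data frames put on the wire, including retries

    for seq, data in enumerate(payloads):
        for _ in range(max_attempts):
            transmissions += 1
            if rng.random() < loss_rate:
                continue           # data frame lost; sender times out and retries
            if seq == expected_seq:
                delivered.append(data)
                expected_seq += 1
            # A frame with seq < expected_seq is a duplicate: it is not
            # delivered again, but the receiver still acknowledges it.
            if rng.random() < loss_rate:
                continue           # ack lost; sender retries the same frame
            break                  # ack received; move on to the next payload
        else:
            raise RuntimeError(f"gave up on sequence {seq}")

    return delivered, transmissions
```

With a loss rate of zero the sender transmits each packet exactly once. As the loss rate rises, the transmission count grows, but the receiver still sees every payload once and in order, which is the TCP-like behavior described above.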

Congestion management is handled by local link channels instead of being done at the network or switch level. Tesla said TTP supports QoS, but it has been turned off.

Tesla Dojo Hot Chips 2024_Page_09

Tesla put this IP block in FPGA and silicon and it is designed to just blast packets across a wire.

Tesla Dojo Hot Chips 2024_Page_10

Here is the TTP microarchitecture. Something unique is that it looks a lot like an L3 cache. The 1MB of TX buffer was described as “in this generation,” so there is a good chance it has changed in a newer generation. The last line of HBM2HBM fabric memory is a very popular feature.

Tesla Dojo Hot Chips 2024_Page_11

The 100Gbps NIC for Dojo is Mojo, which runs at under 20W and has 8GB of DDR4 memory as well as the Dojo DMA engine onboard. We covered this in the Tesla Dojo Custom AI Supercomputer at HC34.

Tesla Dojo Hot Chips 2024_Page_12

Tesla is now going back to that 2022 presentation showing the D1 Die.

Tesla Dojo Hot Chips 2024_Page_13

It is now showing the 5×5 array of D1 chips that are packaged together.

Tesla Dojo Hot Chips 2024_Page_14

There is also a 32GB HBM Dojo Interface processor with TTPoE. The 900GB/s TTP interface is internal. TTPoE is wrapped in the Ethernet frame.

Tesla Dojo Hot Chips 2024_Page_15

Tesla showed how Dojo is connected.

Tesla 100G NICs To V1 Dojo Interface Cards To Dojo

It starts with the assembly that houses the D1 tiles all packaged together with SerDes cabled connections.

The Tesla Dojo Training Tile Wired And Plumbed

Those go to the Interface Cards.

Tesla V1 Dojo Interface Processor Cards 2

They are then attached to the low cost 100G NICs.

Tesla Dojo 100G NICs

Here is another view of what was on the table.

Tesla Dojo Hot Chips 2024_Page_16

Here is the Mojo Dojo Compute Hall or MDCH in New York. We can see 2U compute nodes without any front 2.5″ storage, which is really interesting.

Tesla Dojo Hot Chips 2024_Page_17

This is the 4 ExaFLOP engineering system with 40PB of local storage, and lots of bandwidth and compute. It is also somewhat crazy to have a 4EF (BF16/FP16) engineering system.

Tesla Dojo Hot Chips 2024_Page_18

Arista has been providing switches for this. As the network scales up and adds more hops, the added latency starts to impact the achievable bandwidth.

Tesla Dojo Hot Chips 2024_Page_19
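
The link between latency and bandwidth follows from the usual bandwidth-delay relationship: a sender that can only keep a fixed amount of unacknowledged data in flight tops out at window / round-trip time. The sketch below assumes, as a simplification, that the 1MB TX buffer mentioned earlier bounds the in-flight data on a 100Gbps link. The round-trip times are made-up illustrative values and are not figures from the talk.

```python
def max_throughput_gbps(window_bytes: int, rtt_seconds: float,
                        line_rate_gbps: float = 100.0) -> float:
    """Upper bound on throughput for a sender limited to window_bytes in flight."""
    window_limited_gbps = window_bytes * 8 / rtt_seconds / 1e9
    return min(line_rate_gbps, window_limited_gbps)


# ASSUMPTION: the 1MB TX buffer is treated as the in-flight window.
TX_BUFFER_BYTES = 1 * 1024 * 1024

# Hypothetical round-trip times, in microseconds, for illustration only.
for rtt_us in (10, 50, 100, 200, 500):
    gbps = max_throughput_gbps(TX_BUFFER_BYTES, rtt_us * 1e-6)
    print(f"RTT {rtt_us:>3} us -> at most {gbps:6.2f} Gbps")
```

Under these assumptions, the link stays at line rate until the round trip passes roughly 84 microseconds (1MB at 100Gbps), after which every additional hop's latency directly lowers the achievable bandwidth.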

Tesla is joining the Ultra Ethernet Consortium (UEC) and is offering TTPoE publicly. Very cool!

Tesla Dojo Hot Chips 2024_Page_20

Tesla Dojo Hot Chips 2024_Page_21

It looks like Tesla is using Arista switches in the photos as well.

Tesla Dojo Hot Chips 2024_Page_22

Here is something interesting. Tesla is also saying that TTPoE can have lower one-way write latency over a switch than the alternatives it compared against, and that comparison includes NVLink.

Tesla Dojo Hot Chips 2024_Page_23

Tesla’s takeaway is that they are in the microsecond realm.

Final Words

This is one of those interesting talks, but at some point it would be cool if this was used beyond just Dojo. It feels like a lot of heavy lifting to build custom NICs, custom protocols, and so forth for one system without trying to benefit from economies of scale. It was cool to see that Tesla is bringing this to the Ultra Ethernet Consortium.

4 COMMENTS

  1. One slide makes a reference to 40PB of storage. Of course the storage must support TTPoE. Which kind of protocol is used to access the storage?

  2. Great to hear this is being taken to the Ultra Ethernet Cons. The ambition of lossless Ethernet is admirable but may have taken it beyond a reasonable timeframe.

    Making everything work while the real world happens is the way to go.

    Now to get testing how much loss, latency, jitter, etc is TOO much.
