Today, we get more on the Enflame DTU 1.0. This is Enflame’s AI compute chip meant for servers. We will note that the Enflame DTU 1.0 is a 2018/2019 chip, so the talk is covering an older design. We heard about Enflame previously in our OCP China Day 2020 Interview with Bill Carter and Shen Rong of Inspur. Like other pieces, we are doing this live at Hot Chips 33, so please excuse typos.
Enflame DTU 1.0 AI Compute Chip at Hot Chips 33
The Enflame DTU 1.0 is a 12nm FinFET chip that has a PCIe Gen4 x16 interface along with 200GB/s interconnects.
The package itself includes HBM2 onboard, but we did not get the capacity.
The DTU 1.0 SoC has four clusters and 32 AI compute cores. It also has data transfer engines and high-speed interconnects for chip-to-chip communication.
This is the base look at the VLIW cores.
The cores have Tensor ALUs to accelerate matrix/vector operations.
One of the big aspects of Enflame’s architecture is exploiting sparsity. Enflame had a number of detailed slides (~9) on those concepts.
The key part here is that Enflame is able to skip instructions and data that do not need to be executed due to sparsity.
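To make the general idea concrete, here is a minimal Python sketch of zero-skipping in a dot product. It is only an illustration of the concept, not Enflame’s hardware mechanism: the function name `sparse_dot` and the sample values are made up for this example, and the talk did not describe the skipping logic at this level.

```python
def sparse_dot(weights, activations):
    """Dot product that skips multiply-accumulates with a zero operand.

    Hypothetical illustration only; not Enflame's actual implementation.
    Returns the result and the number of multiply-accumulates executed.
    """
    total = 0.0
    executed = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # a zero operand contributes nothing, so skip the work
        total += w * a
        executed += 1
    return total, executed


# Made-up sample data: 8 element pairs, most with at least one zero operand.
weights = [0.5, 0.0, -1.0, 0.0, 2.0, 0.0, 0.0, 1.5]
activations = [1.0, 3.0, 0.0, 2.0, 4.0, 1.0, 0.0, 2.0]

result, executed = sparse_dot(weights, activations)
print(result, executed, len(weights))
```

In this toy case only 3 of the 8 multiply-accumulates run, and the result matches what a dense dot product would give. The benefit depends entirely on how sparse the real data is.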
Here is the flow in the data pipeline:
The interconnect is not cache coherent, but it is Enflame’s own interconnect that can directly connect up to four GPUs to one another and scale to 8 GPUs, much like the 3rd Generation Intel Xeon Scalable Cooper Lake 4P and 8P topologies.
These can be cabled between 8x DTU chassis to make bigger training clusters.
The training accelerator card comes in the CloudBlazer T10 for PCIe or the CloudBlazer T11 for OAM.
Notable here is that all of the system photos are of the PCIe version, not the OAM version. OAM and the UBB are designed to scale out to multiple systems.
Enflame says that it gets fairly linear training even with 160 accelerators or 20 chassis worth of accelerators.
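As a quick sanity check on those figures, the arithmetic can be sketched in Python. The 160 accelerators and 20 chassis come from the talk; the `scaling_efficiency` helper and its normalized throughput inputs are our own hypothetical definitions, and Enflame did not give us numbers to plug in beyond the claim of fairly linear scaling.

```python
ACCELERATORS = 160  # from the talk
CHASSIS = 20        # from the talk

# 160 accelerators across 20 chassis works out to 8 per chassis.
per_chassis = ACCELERATORS // CHASSIS


def scaling_efficiency(single_throughput, cluster_throughput, n):
    """Fraction of ideal linear speedup achieved by n accelerators.

    Hypothetical helper: 1.0 means perfectly linear scaling.
    """
    return cluster_throughput / (single_throughput * n)


# Idealized case only (not measured data): if one accelerator delivers a
# normalized throughput of 1.0, perfectly linear scaling would give 160.0.
ideal = scaling_efficiency(1.0, 160.0, ACCELERATORS)
print(per_chassis, ideal)
```

"Fairly linear" would mean a value somewhat below 1.0 in practice, but the exact figure was not something we could pull from the slides.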
Enflame has DTU 2.0 as of July, but is not sharing many details other than saying it has performance around FP32 and 3x memory bandwidth and 4x memory capacity. It says the new product will be shipping soon.
Final Words
We do not often get to see the Enflame solution. While most of the talks are looking at current or future technology, this one shows an older chip from 2019. Still, it is interesting to see what the previous generation was and to get some sense of the next generation as well.
Would be really curious to see STH test the 2nd gen DTU – is there any chance on the horizon, where you might be able to put a server with DTU 2.0’s through its paces?
In this age of 500W GPUs I applaud the company’s bravery in choosing its name for this product.