Esperanto ET-SoC-1 1092 RISC-V AI Accelerator Solution at Hot Chips 33

HC33 Six Esperanto ET SoC 1 Chips Deployed In Yosemite V2 On Glacier Point V2

The Esperanto ET-SoC-1 is kicking off Hot Chips 33 Day 2 with a different kind of solution. The company is effectively taking a many-core RISC-V architecture designed for AI acceleration and building small chips that can work in parallel on a single card for AI inference. As with our other Hot Chips coverage, this is being done live.

Esperanto ET-SoC-1 1092 RISC-V AI Accelerator Solution at Hot Chips 33

Here is a quick summary of the Esperanto ET-SoC-1. The slide is a wall of text, so we are not going to just transcribe it.

HC33 Esperanto ET SoC 1 Overview

Esperanto’s approach is to use many smaller chips, each with its own DRAM, instead of one larger chip.

HC33 Esperanto ET SoC 1 Different Approach

Since each Esperanto chip has a lower power budget, the goal is to fit many of these smaller chips into a single slot with a 120W power budget.

HC33 Esperanto ET SoC 1 Six Chips In 120W

Esperanto can get better performance per Watt by running at lower clocks and power, but to get solid benchmark results, each chip runs at 20W.
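As a quick sanity check on the power figures above, here is a minimal sketch of the budget arithmetic. The function name is ours, and it assumes the whole slot budget goes to the chips, which ignores any other power draw on the card.

```python
# Power-budget arithmetic using only the figures quoted in the article.
SLOT_BUDGET_W = 120  # power budget for the slot
CHIP_POWER_W = 20    # per-chip operating point used for the benchmarks


def chips_per_slot(slot_budget_w: int, chip_power_w: int) -> int:
    """Whole chips that fit in the budget (hypothetical helper, chips only)."""
    if chip_power_w <= 0:
        raise ValueError("chip power must be positive")
    return slot_budget_w // chip_power_w


print(chips_per_slot(SLOT_BUDGET_W, CHIP_POWER_W))  # 6
```

Running each chip at a lower operating point would leave headroom in the same slot, which is the trade-off Esperanto is describing.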

HC33 Esperanto ET SoC 1 Efficiency

The main core, of which there are 1,088, is the ET-Minion. These are basically small in-order cores, each with big vector/tensor units and its own local L1 cache. Esperanto is using RISC-V here, so it has a customized core with a general-purpose instruction set plus Esperanto’s vector/tensor instructions. Esperanto says RISC-V also takes the fewest gates to implement, so it makes for a smaller, lower-power core.

HC33 Esperanto ET SoC 1 ET Minion RISC V CPU With Vector Tensor Unit

Eight of these cores form a neighborhood and have 32KB of shared cache.

HC33 Esperanto ET SoC 1 8 ET Minion Neighborhood

Thirty-two of these cores share 4x 1MB banks of SRAM and have a mesh stop to the rest of the chip. In the “not inspired by the Hobbit” category, each of these blocks is called a Minion Shire.

HC33 Esperanto ET SoC 1 32 ET Minion CPUs And 4MB Memory Minion Shire

The mesh network allows the Shires to talk to each other.

HC33 Esperanto ET SoC 1 Shire Mesh Interconnect

Here is what the block diagram looks like for the entire chip, with its 34 Minion Shires and other Shires such as the PCIe, Memory, and I/O Shires.
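To see how the hierarchy described above adds up, here is a small sketch that multiplies out the quoted figures. The class and field names are ours, not Esperanto's, and it only counts the 4x 1MB SRAM banks per Minion Shire, not any other on-die memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinionTopology:
    """Core and SRAM counts as quoted in the article (names are ours)."""
    cores_per_neighborhood: int = 8
    cores_per_shire: int = 32
    sram_banks_per_shire: int = 4
    sram_mb_per_bank: int = 1
    minion_shires: int = 34

    @property
    def neighborhoods_per_shire(self) -> int:
        return self.cores_per_shire // self.cores_per_neighborhood

    @property
    def minion_cores(self) -> int:
        return self.cores_per_shire * self.minion_shires

    @property
    def shire_sram_mb(self) -> int:
        per_shire = self.sram_banks_per_shire * self.sram_mb_per_bank
        return per_shire * self.minion_shires


topo = MinionTopology()
print(topo.neighborhoods_per_shire)  # 4
print(topo.minion_cores)             # 1088
print(topo.shire_sram_mb)            # 136
```

The 1,088 result matches the ET-Minion count. The remaining four cores in the headline's 1092 are presumably the out-of-order cores discussed later in the talk.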

HC33 Esperanto ET SoC 1 Full Chip Block Diagram

The chip has a PCIe Gen4 link and uses LPDDR4x memory.

HC33 Esperanto ET SoC 1 External Chip Interfaces

The ET-SoC-1 is then put onto cards with multiple chips per card. Specifically, Esperanto is focusing on the six-chip configuration. More on that soon.

HC33 Esperanto ET SoC 1 Six Chips For Large Sparse Recommendation

Specifically, six of these chips can fit on a Glacier Point V2 card. That means one can fit 6,558 RISC-V cores and 192GB of DRAM per card while running at around 120W.
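Dividing the card-level figures back down to a single chip is a useful cross-check. This is a minimal sketch using only the numbers quoted above, with a hypothetical `per_chip` helper of our own.

```python
# Card-level figures from the article, divided back down to one chip.
CHIPS_PER_CARD = 6
CARD_CORES = 6558
CARD_DRAM_GB = 192
CARD_POWER_W = 120
HEADLINE_CORES_PER_CHIP = 1092  # from the article title


def per_chip(card_total: int, chips: int = CHIPS_PER_CARD) -> int:
    """Split a card total evenly across chips (hypothetical helper)."""
    value, remainder = divmod(card_total, chips)
    if remainder:
        raise ValueError(f"{card_total} does not divide evenly by {chips}")
    return value


cores = per_chip(CARD_CORES)
print(cores)                            # 1093
print(per_chip(CARD_DRAM_GB))           # 32
print(per_chip(CARD_POWER_W))           # 20
print(cores - HEADLINE_CORES_PER_CHIP)  # 1
```

The per-chip core count comes out one higher than the 1092 in the headline. The text here does not explain the difference. Our assumption is that the card figure also counts a small RISC-V service processor on each chip.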

HC33 Esperanto ET SoC 1 Six Double Width M2 On Glacier Point V2

We have covered Twin Lakes and Glacier Point V2/Yosemite V2 before. Facebook has been on Yosemite V3 for some time.

HC33 Esperanto ET SoC 1 Deployed In Yosemite V2

Esperanto is focusing on hardware in this talk, but it also has a software stack. Very little emphasis was put on the software, which is strange since CUDA is a big competitive advantage for NVIDIA. Perhaps that is just because this is a hardware conference.

HC33 Esperanto ET SoC 1 Software Support

Here is a recommendation benchmark in which Esperanto is showing better performance.

HC33 Esperanto ET SoC 1 Performance NVIDIA T4 A10 Comparison

Here is a ResNet-50 benchmark where even the Habana Goya is included. Goya also looks very good here. Just as a quick note, when we did the NVIDIA Tesla T4 AI inferencing GPU benchmarks and review a long time ago, we did not see 70W of power draw. Esperanto may be using TDP rather than measured power.

HC33 Esperanto ET SoC 1 Performance Image Classification

While the ET-Minion is an in-order core designed to effectively exploit the vector/tensor units, there are also out-of-order cores, the ET-Maxions. These are the chip’s higher-end cores for when that kind of performance is required.

HC33 Esperanto ET SoC 1 ET Maxions OOO RISC V

Here is the summary of stats of the TSMC 7nm chip.

HC33 Esperanto ET SoC 1 Stats Summary

Just a quick note: Esperanto is only now getting silicon, so the performance results are estimates.

Final Words

Overall, this is a very interesting solution. Esperanto has a modular architecture to scale. It is also interesting to see a RISC-V chip. RISC-V is generating a lot of buzz. It has taken time for that buzz to turn into product. As Arm has transitioned from the exciting new architecture to the more mature legacy architecture, RISC-V is turning into the next hot new thing.
