Untether.AI Boqueria 1458 RISC-V Core AI Accelerator

August 23, 2022

At Hot Chips 34, Untether.AI showed off its newest AI accelerator. The Untether.AI Boqueria is a 1458 RISC-V core AI accelerator that tries to match compute with memory.

Note: We are doing this piece live at HC34 during the presentation so please excuse typos.

Untether.AI Boqueria 1458 RISC-V Core AI Accelerator at Hot Chips 34

A recurring theme for Untether.AI is that data movement is expensive in both performance and power consumption. Part of the goal is to bring compute closer to memory to minimize that movement.

HC34 Untether AI Boqueria At Memory Compute Is The Sweet Spot For AI

Boqueria is a 2 PFLOPS FP8 processor built on TSMC 7nm. Its 238MB of on-chip SRAM gives the chip around 1PB/s of SRAM bandwidth, and it can also access external memory. FP8 support matters because it is a key part of Boqueria’s architecture.
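To put those headline numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the peak figures quoted above. The decimal units and the even split across the 1458 cores are our assumptions for illustration, not something Untether.AI disclosed.

```python
# Peak figures as quoted in the article; decimal units are an assumption.
FP8_FLOPS = 2e15      # 2 PFLOPS of FP8 compute
SRAM_BYTES = 238e6    # 238MB of on-chip SRAM
SRAM_BW = 1e15        # roughly 1PB/s of aggregate SRAM bandwidth
RISCV_CORES = 1458    # core count from the headline

# Peak FLOPs available per byte of SRAM traffic.
flops_per_sram_byte = FP8_FLOPS / SRAM_BW

# Even split across cores: an illustrative assumption, not a disclosed layout.
sram_kb_per_core = SRAM_BYTES / RISCV_CORES / 1e3
sram_gbps_per_core = SRAM_BW / RISCV_CORES / 1e9

# Time to read the entire SRAM once at full bandwidth, in nanoseconds.
sram_sweep_ns = SRAM_BYTES / SRAM_BW * 1e9

print(f"FLOPs per SRAM byte:  {flops_per_sram_byte:.1f}")
print(f"SRAM per core:        {sram_kb_per_core:.0f} KB")
print(f"Bandwidth per core:   {sram_gbps_per_core:.0f} GB/s")
print(f"Full SRAM sweep time: {sram_sweep_ns:.0f} ns")
```

The point of the ratio is that, at peak, the chip only needs about two FP8 operations per byte it pulls from SRAM to stay busy, which is what a compute-at-memory design is going after.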

Each Memory Bank (the memory/compute clusters on the NOC) has two multi-threaded RISC-V cores. All of these Memory Banks are connected via the NOC.
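Combining the two-cores-per-bank detail with the 1458-core headline gives an implied Memory Bank count. This is our own arithmetic in Python, assuming every RISC-V core on the chip sits inside a Memory Bank.

```python
import math

RISCV_CORES = 1458   # from the headline
CORES_PER_BANK = 2   # two multi-threaded RISC-V cores per Memory Bank

# Assumes all cores live in Memory Banks, with none reserved elsewhere.
memory_banks, leftover_cores = divmod(RISCV_CORES, CORES_PER_BANK)

# Check whether the bank count happens to be a perfect square.
side = math.isqrt(memory_banks)
is_square = side * side == memory_banks

print(f"Implied Memory Banks: {memory_banks} (leftover cores: {leftover_cores})")
print(f"Perfect square: {is_square} (side = {side})")
```

The bank count being a perfect square is only an arithmetic observation here. We are not claiming the NOC is laid out as a square grid.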

HC34 Untether.AI Boqueria Memory Bank RISC V

Here is a diagram of how Boqueria puts SRAM and compute together.

HC34 Untether.AI Boqueria Compute At Memory RISC V

A big insight and design principle of Untether.AI is that FP8 is suitable for inference. It says that FP8 is more efficient than INT8 to design for.

HC34 Untether.AI Boqueria FP8 For AI Inference

Untether.AI says FP8 has only a small accuracy impact on inference, so it chose FP8 for its combination of efficiency and low accuracy loss.
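One intuition for why an 8-bit float can hold up well is that its exponent keeps relative error roughly constant across a wide range of magnitudes, while a single INT8 scale crushes small values. The Python sketch below illustrates that idea only. The talk summary here does not specify Boqueria's exact FP8 format, so the 4-bit exponent, 3-bit mantissa layout, the helper names, and the sample values are all our assumptions, and this is not Untether.AI's method or data.

```python
import math

def fp8_like(x, mant_bits=3, exp_min=-6, max_val=448.0):
    """Round x to a simplified FP8-style value (assumed E4M3-like limits)."""
    if x == 0.0:
        return 0.0
    mag = abs(x)
    _, e = math.frexp(mag)              # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, exp_min)           # fixed spacing below the smallest normal
    step = 2.0 ** (exp - mant_bits)     # spacing between representable values
    q = min(round(mag / step) * step, max_val)  # saturate at the largest value
    return math.copysign(q, x)

def int8_symmetric(values):
    """Quantize with one symmetric per-tensor scale, then dequantize."""
    scale = (max(abs(v) for v in values) / 127.0) or 1.0
    return [max(-127, min(127, round(v / scale))) * scale for v in values]

def mean_relative_error(reference, approx):
    pairs = [(r, a) for r, a in zip(reference, approx) if r != 0.0]
    return sum(abs(a - r) / abs(r) for r, a in pairs) / len(pairs)

# Hypothetical values spanning three orders of magnitude.
values = [0.01, 0.1, 1.0, 10.0]
fp8_err = mean_relative_error(values, [fp8_like(v) for v in values])
int8_err = mean_relative_error(values, int8_symmetric(values))

print(f"FP8-like mean relative error: {fp8_err:.4f}")
print(f"INT8 mean relative error:     {int8_err:.4f}")
```

Real deployments use per-channel scales and calibration for INT8, which narrows the gap, so treat this as a toy showing the dynamic-range effect rather than a benchmark.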

HC34 Untether.AI Boqueria Memory Bank RISC V Accuracy Degradation FP8 V INT8

The RISC-V processor implements the RV32EMC instruction set plus custom instructions. That extensibility is part of the power of RISC-V.
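For those not used to reading RISC-V ISA strings, RV32EMC breaks down into a base ISA plus single-letter standard extensions. Here is a minimal Python sketch that decodes such a string. The tables cover only a handful of standard bases and extensions, the function name is ours, and the custom Untether.AI instructions are not modeled since they were not detailed here.

```python
# A small subset of standard RISC-V base ISAs and extensions.
BASE_ISAS = {
    "RV32I": {"xlen": 32, "int_regs": 32},
    "RV32E": {"xlen": 32, "int_regs": 16},  # embedded base, half the registers
    "RV64I": {"xlen": 64, "int_regs": 32},
}
EXTENSIONS = {
    "M": "integer multiply/divide",
    "A": "atomic instructions",
    "F": "single-precision floating point",
    "D": "double-precision floating point",
    "C": "compressed 16-bit instructions",
}

def describe_isa(isa):
    """Split an ISA string such as 'RV32EMC' into its base and extensions."""
    isa = isa.upper()
    base, letters = isa[:5], isa[5:]
    if base not in BASE_ISAS:
        raise ValueError(f"unknown base ISA: {base}")
    unknown = [c for c in letters if c not in EXTENSIONS]
    if unknown:
        raise ValueError(f"unknown extensions: {unknown}")
    return {"base": base, **BASE_ISAS[base],
            "extensions": [EXTENSIONS[c] for c in letters]}

info = describe_isa("RV32EMC")
print(info)
```

In other words, this is a small 32-bit embedded core with 16 integer registers, hardware multiply/divide, and compressed instructions, with the AI-specific work handled by the custom additions.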

HC34 Untether.AI Boqueria RISC V Instruction Set And Processor

Here is more detail about the on-chip NOCs.

HC34 Untether.AI Boqueria High Bandwidth IO And Connectivity

The company says its architecture scales from very low power to higher power devices. It is not discussing 500W+ chips but is instead targeting M.2-type power envelopes.

HC34 Untether.AI Boqueria Scaling Architecture

The idea is to then aggregate a number of these smaller chips to achieve more performance. Note that this will be a PCIe Gen5 device as well.

HC34 Untether.AI Boqueria 6 Chip PCIe Card
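As a rough sense of what aggregation could look like, the Python sketch below multiplies out peak figures for a six-chip card. It assumes, purely for illustration, that each chip on the card matches the 2 PFLOPS and 238MB figures quoted earlier and that peak numbers add linearly. Neither was confirmed in the talk, and real workloads will not scale perfectly.

```python
def aggregate_peak(per_chip_pflops, per_chip_sram_mb, chips):
    """Ideal linear scaling of peak compute and on-chip SRAM across chips."""
    return {
        "chips": chips,
        "peak_pflops": per_chip_pflops * chips,
        "sram_mb": per_chip_sram_mb * chips,
    }

# Hypothetical card: six chips, each assumed to match the headline figures.
card = aggregate_peak(per_chip_pflops=2.0, per_chip_sram_mb=238, chips=6)
print(card)
```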

The company’s software is called the imAIgine SDK.

As with most AI accelerators, the compiler needs to be highly optimized for the hardware.

HC34 Untether.AI ImAIgine SDK Spacial Compilation Optimizations

With that, the company says it can have higher performance than a GPU.

The company also showed throughput and energy efficiency comparisons.

Of course, one has to remember that the GPU being compared is a more general-purpose accelerator device that is currently commercially available.

Final Words

Every year at Hot Chips we see a number of AI startups. We usually do not cover startups that simply try to match what NVIDIA is doing at a lower price. We thought this one was interesting not just because of the inference accelerator angle, but also because it is using RISC-V. These are the types of applications where RISC-V can make inroads on Arm’s market before trying to go into more mainstream markets.
