At FMS 2022, we saw a new DPU. Two products at the show claimed to be DPUs: Kalray’s was one, while the other company was showing a fake DPU. As a result, we wanted to show Kalray’s DPU solution and explain why we are going to classify it as a DPU even though it uses something very different from the others in the market.
Kalray K200-LP DPU with Coolidge MPPA3-80 DPU Chip at FMS 2022
Most of us will see the Kalray K200-LP DPU as the -LP, or low-profile, PCIe Gen4 x16 card.
Onboard is the Kalray MPPA3-80 DPU chip. We were told that this chip is running two versions of Linux simultaneously to provide DPU functionality, and that is something very different.
Here is the back side of the card where we can see more memory packages.
Here are the two QSFP28 100GbE ports.
What makes the Kalray MPPA3-80 DPU so interesting is that it is not using Arm, MIPS, or x86. Instead, it is using the company’s own Coolidge cores and has 80 of them on the chip. The cores are surrounded by caches, accelerators, and other devices. Each chip is made up of five clusters of 16 cores. Kalray has one of these 16-core clusters running the card’s management function in its own Linux environment. The company told us that, for applications, the other 64 cores are running another Linux environment. Since these are not mainline cores, the Linux distribution is custom compiled for the Kalray cores.
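As a quick sanity check on the core counts above, here is a minimal Python sketch. The numbers come from what we were told at the show; the variable names are ours and purely illustrative, not anything from Kalray’s tooling.

```python
# Core layout as described in the article; names are illustrative only.
CLUSTERS = 5
CORES_PER_CLUSTER = 16
MANAGEMENT_CLUSTERS = 1  # one cluster runs the card's management environment

total_cores = CLUSTERS * CORES_PER_CLUSTER
management_cores = MANAGEMENT_CLUSTERS * CORES_PER_CLUSTER
application_cores = total_cores - management_cores

print(total_cores, management_cores, application_cores)  # 80 16 64
```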
This is an interesting one because we typically have only seen Arm, MIPS, and x86 solutions in this space. Kalray has something different, so we wanted to run it through our DPU framework outlined in What is a DPU? A Data Processing Unit Quick Primer.
- High-speed networking connectivity (usually multiple 100Gbps-200Gbps interfaces in this generation) – 2x 100GbE on the K200-LP.
- High-speed packet processing with specific acceleration and often programmable logic (P4/ P4-like is common) – This was left to the cores, but seems to be less of a focus for Kalray since it is focused on storage.
- A CPU core complex (often Arm or MIPS based in this generation) – This is the Coolidge core complex with 80 cores.
- Memory controllers (commonly DDR4 but we also see HBM and DDR5 support) – The K200-LP has DDR4-3200 support and we can see the memory on the card.
- Accelerators (often for crypto or storage offload) – Each cluster has these on the MPPA3-80.
- PCIe Gen4 lanes (run as either root or endpoints) – There is an x16 interface on the chip and card.
- Security and management features (offering a hardware root of trust as an example) – This is not the focus of the card, but it does offer the second environment for managing the infrastructure so we are giving it a pass here.
- Runs its own OS separate from a host system (commonly Linux, but the subject of VMware Project Monterey ESXi on Arm as another example) – Here we confirmed the card is running two Linux OSs.
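For readers who like to see the scorecard in one place, the checklist above can be sketched as a small Python structure. This is only our own illustration: the `Criterion` class and the "met" / "caveat" labels are assumptions we introduce here to summarize the bullets, not an official scoring scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    status: str  # "met" or "caveat"; our own labels, not an official score
    note: str


# Entries paraphrase the checklist bullets for the K200-LP.
K200_LP_CHECKLIST = [
    Criterion("High-speed networking", "met", "2x 100GbE"),
    Criterion("Packet processing acceleration", "caveat",
              "left to the cores; storage is the focus"),
    Criterion("CPU core complex", "met", "80 Coolidge cores"),
    Criterion("Memory controllers", "met", "DDR4-3200"),
    Criterion("Accelerators", "met", "present in each cluster"),
    Criterion("PCIe Gen4 lanes", "met", "x16 on chip and card"),
    Criterion("Security and management", "caveat",
              "given a pass for the separate management environment"),
    Criterion("Own OS separate from host", "met", "runs Linux on the card"),
]


def summarize(checklist):
    """Split criterion names into those met outright and those with caveats."""
    met = [c.name for c in checklist if c.status == "met"]
    caveats = [c.name for c in checklist if c.status == "caveat"]
    return met, caveats


met, caveats = summarize(K200_LP_CHECKLIST)
print(f"{len(met)} of {len(K200_LP_CHECKLIST)} criteria met outright")
for name in caveats:
    print("caveat:", name)
```

The two caveats are the same ones called out in the bullets: packet processing and security/management.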
This is a storage-focused DPU that, like the Fungible solution, aims to provide a storage alternative to a traditional CPU-based system. There is less of a focus on creating an infrastructure-wide solution. Still, we are going to include this in our DPU coverage going forward since it is close to what we would call a DPU.
Kalray DPU Solutions
The company also showed storage solutions based on its DPU. One was the Kalray Flashbox that we think is made by Viking.
This is powered by two nodes, each with four Kalray K200-LP DPUs.
At the rear of the box, we can see that there are two of these four DPU controller nodes. In this picture, one of the nodes had only three DPUs.
Atop the hardware, Kalray has its software, which is based on SPDK and is run and accelerated by the DPUs.
The other solution was from Pixitmedia, and was that company’s PixStor box.
PixStor uses DPUs in the company’s NVMe tier for adding higher-speed storage to the overall solution.
Where Fungible focused more on selling its own solution, Kalray also seems to be looking at more OEM opportunities.
Final Words
This is one of those really interesting solutions because it is not using an Arm or x86 CPU. On one hand, for the purpose of just doing storage, the Kalray DPU using something different may make sense. It is great to see different types of technologies in the marketplace.
At the same time, we can also see the benefit of using Arm or something more general purpose, since such a DPU can be more easily maintained and extended in the future. For flexibility, we looked at the FPGA plus Intel Xeon D IPU that JD.com is using in This Changes Networking Intel IPU Hands-on with Big Spring Canyon. We also did a hands-on with ZFS without a Server Using the NVIDIA BlueField-2 DPU.
It will be interesting to see how Kalray fares in the market with its very different DPU.
It’s highly likely that those Coolidge cores are just tweaked ARM cores.
I don’t see anyone doing a whole new backend on open-source tool chains (gcc/llvm/gdb/binutils) without that being reflected in their source.
They might have done some tweaks to existing backends to accommodate some special instructions that use special-purpose hardware onboard and that’s it.
No, from their product page (https://www.kalrayinc.com/products/processors-many-core/) it looks like they’re VLIW cores (fascinating!)
VLIW is fast and cheap if your problem fits (dsps, gpus) but is pretty bad for general purpose compute, but that is probably fine for targeted applications like storage.
@Onibra
Hi, it’s not an ARM architecture. It is a patented MPPA technology from CEA Leti (Atomic Energy Commission, Grenoble, France), one of the main microelectronics laboratories in the world, at the origin of the company STMicroelectronics among others.
Interesting article despite a few mistakes…
For example, there’s a single instance of Linux running on one of the clusters, and it’s dedicated only to the control and management plane. The other clusters are running a lightweight run-to-completion proprietary OS (called ClusterOS) with libc and minimal pthread support, on top of which SPDK has been ported.
And indeed the cores are proprietary VLIW with dedicated instructions to accelerate compute-intensive tasks like Erasure Coding and AI (e.g. CNN), with support for GCC, LLVM … and upstreaming is a WIP ;-)
Patented, proprietary, niche CPU, no support in upstream Linux kernel – hard pass.
@Onibra
Apparently, they do: by looking at their github they have ported binutils, gcc, llvm, linux, gdb, …
Their gcc is even available on godbolt (under the name KVX GCC. why ?), and it really does look like a VLIW instead of ARM instruction set.
@Nix
In fact Coolidge (aka MPPA 3) is the name of the SoC, not the name of the core’s architecture.
The architecture of the core(s) is named kv3-1 (meaning 3rd generation, version 1), and it’s of the kvx family thus the kvx name everywhere.
@Onibra
Tell me you don’t know anything about Kalray without telling me you don’t know anything about Kalray.