Molex’s BittWare division has a number of interesting FPGA solutions. While many in the industry believe we are moving to smarter computational storage SSDs, with the compute happening on the SSD itself, others see the accelerators sitting on the PCIe fabric. As we move to CXL in PCIe Gen5 systems, the fabric approach may have additional merit. The Molex BittWare 250-M2D adds a Xilinx FPGA, along with DRAM, to an M.2 card in order to move computational storage acceleration to the OCP-friendly form factor.
Molex BittWare 250-M2D
BittWare (a Molex company) is using a Xilinx Kintex UltraScale+ (KU3P in a B784 package) to put the FPGA into a very common form factor. The module includes local DDR4 DRAM as well. It is still a PCIe Gen3 x4 M.2 device, not PCIe Gen4.
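To put the Gen3 x4 versus Gen4 point in numbers, here is a rough sketch of the link arithmetic. It uses the standard 8 GT/s and 16 GT/s per-lane rates and 128b/130b line encoding, and it ignores packet and protocol overhead, so real-world throughput will be lower. The function name is ours, purely for illustration.

```python
def pcie_bandwidth_gb_per_s(gt_per_s, lanes, payload_bits=128, total_bits=130):
    """Approximate one-direction link bandwidth in GB/s.

    Only accounts for line encoding (128b/130b for Gen3 and Gen4).
    TLP headers, flow control, and other protocol overhead are ignored.
    """
    return gt_per_s * lanes * (payload_bits / total_bits) / 8


# Gen3 runs at 8 GT/s per lane, Gen4 at 16 GT/s per lane.
gen3_x4 = pcie_bandwidth_gb_per_s(8.0, 4)
gen4_x4 = pcie_bandwidth_gb_per_s(16.0, 4)

print(f"PCIe Gen3 x4: {gen3_x4:.2f} GB/s per direction")
print(f"PCIe Gen4 x4: {gen4_x4:.2f} GB/s per direction")
```

That works out to a bit under 4 GB/s each way for a Gen3 x4 M.2 device, or half of what a Gen4 x4 slot could offer.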
For those wondering, the UltraScale+ is the currently shipping, but older-generation, 16nm Xilinx FPGA. Xilinx is making a major move to upgrade its portfolio to the 7nm Xilinx Versal ACAPs. Putting an FPGA in an M.2 device helps it integrate into many servers. One can quickly browse the huge number of server reviews STH does and see the M.2 form factor on most new servers.
Putting an FPGA in an M.2 slot, especially behind a PCIe switch, allows the FPGA to do tasks such as compression, encryption, and even AI inferencing by pulling data over the PCIe bus in peer-to-peer transfers. This model can offload a significant amount of work from the CPU.
The M.2 form factor also works very well in Open Compute Project (OCP) designs. Facebook, in particular, has been keen to use M.2 accelerators such as the Intel NNP-I 1000, as well as using M.2 for storage. There are even Glacier Point V2 modules designed to pack M.2 devices into Facebook’s front-end web servers. Microsoft also uses M.2 heavily. If you look at the cover image for BittWare’s accelerator here, you can see it has the mechanical assembly bits to fit into the OCP M.2 slots in Glacier Point V2.
Facebook is transitioning to Yosemite V3 as we covered in Facebook Introduces Next-Gen Cooper Lake Intel Xeon Platforms. As you can see, each node has two M.2 SSDs that have slightly different physical parts to aid in mechanical servicing versus what the BittWare 250-M2D shows.
Facebook’s Yosemite V3 has the ability to add more accelerators. However, one will notice that the BittWare heatsink is oriented more for Yosemite V2 than for V3 designs. That may give us a tip as to the installations BittWare is targeting.
Still, adding M.2 FPGAs is a truly interesting use case.
Final Words
Aside from being cool hardware, BittWare is partnering with Eideticom to bring NoLoad storage, and with Myrtle.ai to bring its SEAL accelerator for recommender systems, to the platform. BittWare can pre-load that IP onto the accelerator modules. One can, of course, also design and load one's own IP onto the BittWare 250-M2D and update it in the future, making the module flexible and upgradable.
Overall, this is a cool platform. The big question is adoption. From a model perspective, this is something we hope Xilinx and Intel will do more of. STH is featuring an Intel N3000 FPGA network accelerator, which is sold pre-loaded with IP. Xilinx now has a Live Video Transcoding Product Line. Hopefully what BittWare is doing becomes the standard model going forward. We have a standards-based M.2 form factor (they make U.2 as well) with a menu of IP packages that can be more easily integrated. This is a truly interesting product.
“As we move to CXL in PCIe Gen5 systems, the fabric approach may have additional merit.” -> As we move to CXL i nPCIe Gen5 systems, the fabric…
“Mertyle.ai’s SEAL accelerator for recommender systems” -> recommended*
Some fixes… Are articles reviewed before hitting submit?
Expansion in a standard PC was via ISA / EISA / PCI slots. Smaller setups, including laptops, have not had this luxury.
Using an M.2 or spare NVMe slot is an improvement, but I’d like to see more. For example, a high-speed ADC / DAC (over 150 Msps, 12 or even 14 bit) combined with DSP on an FPGA would make for a great way to get real-world data in and out of a notebook PC.
Judging by the device photo, the heatsink can dissipate a bit more than the 11.55W that the spec defines as normal for an M.2 Key M device… Moreover, the device’s max TDP is listed as 14.85W on the BittWare site. But still, the device is very interesting :)
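For anyone checking those two power figures, a quick sketch of the arithmetic follows. It assumes both numbers come from the 3.3V M.2 supply rail at 3.5A and 4.5A respectively. That breakdown is an assumption that happens to reproduce the quoted wattages, not something stated on the BittWare site.

```python
M2_RAIL_VOLTS = 3.3  # M.2 supply rail voltage


def rail_power_watts(amps, volts=M2_RAIL_VOLTS):
    """Power drawn from a single supply rail, in watts."""
    return volts * amps


# Assumed currents: 3.5 A for the spec figure, 4.5 A for the listed max TDP.
spec_watts = rail_power_watts(3.5)
card_watts = rail_power_watts(4.5)
headroom_needed = card_watts - spec_watts

print(f"Spec figure:  {spec_watts:.2f} W")
print(f"Card max TDP: {card_watts:.2f} W")
print(f"Difference:   {headroom_needed:.2f} W")
```

Under that assumption, the card needs roughly 3.3W more than the figure quoted from the spec, which is one extra amp on the 3.3V rail.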