Get Excited Marvell Structera CXL Memory with Arm Neoverse V2

4
Marvell Structera A DLRM Example With 2 Devices
Marvell Structera A DLRM Example With 2 Devices

The new Marvell Structera CXL family includes both CXL Type-3 Memory expansion devices with a few exciting twists. What makes the family more intriguing is that the company has another version, called a near-memory accelerator with up to 16 Arm Neoverse V2 cores onboard. I open very few briefing decks where I immediately get excited. On the way back from the Olympics, I opened this announcement half asleep and felt that sense of excitement. Let us get into it.

Marvell Structera CXL Family

The two sides of the family currently are the Structera A and Stuctera X. The A feels like it is for accelerators and the X is for memory eXpansion.

Marvell Structera Launch 2024 Product Line
Marvell Structera Launch 2024 Product Line

These are CXL 2.0 and PCIe Gen5 devices designed for today’s servers built on TSMC 5nm. Let us get into it.

Marvell Structera A 2504: CXL DDR5 Memory Accelerator with Arm

Let us start with the coolest one first. The Marvell Structera A 2504 is, at its heart, a CXL 2.0 memory expansion device, but with a huge twist. First, it can support up to four channels, not just two. Next, it has 16x Arm Neoverse V2 cores. Those are the same cores as NVIDIA Grace uses, so these are not low-power and low-performance cores.

Marvell Structera A 2504 Overview
Marvell Structera A 2504 Overview

Marvell also has built-in LZ4 compression and decompression to maximize the capacity of the memory, which can be up to DDR5-6400.

Marvell Structera A Block Diagram
Marvell Structera A Block Diagram

Of course, folks may wonder why one would want to put Arm cores on a memory expansion card. The basic idea is that scaling cores only leaves you with less memory bandwidth per core. Scaling memory capacity only does not help add more processing power. That is why some applications will want to scale both. If you saw ourĀ Memory Bandwidth Per Core and Per Socket for Intel Xeon and AMD EPYC piece, that demonstrates what is going on in the industry.

Marvell Structera A Scale Compute And Memory Bandwidth
Marvell Structera A Scale Compute And Memory Bandwidth

Here is an example of a deep learning recommendation model (DLRM) server. Here, we can see the host CPU with 64 cores, an 8-channel memory solution, and an XPU for AI acceleration. DLRM is still a huge space, even though folks often talk a lot about generative AI.

Marvell Structera A DLRM Server Without SA
Marvell Structera A DLRM Server Without SA

Adding one accelerator adds 25% more cores and 50% more memory bandwidth.

Marvell Structera A DLRM Example With 1 Device
Marvell Structera A DLRM Example With 1 Device

Adding two of these accelerators would add memory capacity and also double the memory bandwidth while adding 50% more cores (32 on cards plus 64 on the CPU.)

Marvell Structera A DLRM Example With 2 Devices
Marvell Structera A DLRM Example With 2 Devices

The big advantage here is that the server gets more cores and more memory bandwidth that can scale by adding more CXL controllers.

We are just going to note here that Marvell makes chips for hyper-scale clients. The example above is specific enough that it does not feel like Marvell is making this as a product, hoping that someone configures it in an HPE ProLiant one day.

Still, not all servers and applications will want to scale cores like this. For that, Marvell has new CXL memory expanders.

Marvell Structera X 2404: CXL 2.0 DDR4 4-channel Memory Expander

This is one that solves a huge challenge in the industry. The Structera X 2404 is a 4-channel DDR4 memory expansion device. What is more, one can plug up to three DDR4 devices per channel. That means, one controller can take up to 12 DIMMs.

Marvell Structera X 2404 4 Channel DDR4 Memory Expansion Device
Marvell Structera X 2404 4 Channel DDR4 Memory Expansion Device

Here is the block diagram for the memory expander.

Marvell Structera X 2404 Block Diagram
Marvell Structera X 2404 Block Diagram

Of course, the big use case is DDR4 memory recycling. If you are a hyper-scaler with tons of DDR4 servers still running, a very plausible model will be to pull DDR4 out of decommissioned servers and put the DIMMs into shelves with the Structera X 2404. DRAM is an enormous cost in hyper-scale racks, making this is an ample opportunity to lower costs through recycling. That is especially the case with LZ4 compression/ decompression.

Marvell Structera X 2404 CXL Enabled DDR4 Memory Recycling
Marvell Structera X 2404 CXL Enabled DDR4 Memory Recycling

Depending on who you talk to in the industry, the CXL Type-3 DDR5 expanders for performance applications will be another major use case. Marvell also has a variant for that.

Marvell Structera X 2504: CXL 2.0 DDR5 4-channel Memory Expander

In many ways this is very similar to the 2404, except that it uses DDR5. Supporting up to DDR5-6400 in 4-channel 2-DIMMs per channel mode adds a lot of performance to the part.

Marvell Structera X 2504 Overview CXL Memory Expander
Marvell Structera X 2504 Overview CXL Memory Expander

Here is the block diagram.

Marvell Structera X 2504 Overview CXL Memory Expander Block Diagram
Marvell Structera X 2504 Overview CXL Memory Expander Block Diagram

The memory expansion bit is very interesting. Depending on your CPU and how you will want to integrate CXL memory expansion, you may need different capacities and performance profiles for your memory. Even Intel Xeon 6 variants have different CXL memory capacity capabilities for running in different modes (e.g., running CXL memory as its own pool versus an interleaved memory source.)

Marvell Structera X 2504 Overview CXL Memory Expander With DDR5
Marvell Structera X 2504 Overview CXL Memory Expander With DDR5

Whereas the DDR4 version is more about re-using an inexpensive source of memory, the DDR5 version can aim more for performance.

Final Words

We have previously shown off two channel devices from Astera Labs on STH. We will at least note that Astera Labs has had working units for a long time, and Marvell is a few months from launch. Still, Marvell is doing something different with its CXL memory expanders and something very different with the Arm Neoverse V2 core accelerator version.

Marvell Structera Summary
Marvell Structera Summary

This is a family where we cannot wait to get to use all three variants in action. Especially with the CXL 2.0 capability and CXL 2.0 switches starting to become a reality, the possibilities get a bit more exciting. Imagine a shelf with a CXL 2.0 switch and several of these accelerators adding more Arm cores to a system.

If you want to learn more about CXL, feel free to read ourĀ CXL 3.1 Specification Aims for Big Topologies piece to see where this is headed. The big question is when will we see them in the market with a sampling timeframe of Q4 2024.

4 COMMENTS

  1. Any plans to acquire and test these? It would be interesting to see performance on various pools or tiered memory.

  2. The X cards are intriguing. CXL memory seems like it should be able to offer capital-L large memory pools. These are an improvement ov the two DIMM stuff I’ve seen. It’s early days still I suppose.

    The A cards, would not they be rather niche? Not everyone can leverage those cores.

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.