Ampere AmpereOne A192-32X Review: A 192 Arm Core Server CPU

Ampere AmpereOne A160 30 At Computex 2023 1

It is time for the piece many have been waiting for, the Ampere AmpereOne A192-32X review. In this review, we are going to go into the performance, power consumption, and perhaps most importantly, what it is like using platforms like the Supermicro MegaDC ARS-211M-NR and what it means for the industry. We have a lot here, so let us get to it.

Ampere AmpereOne A192-32X Overview

The AmpereOne A192-32X is important to keep in context. It is a 192-core 3.2GHz (hence A192-32X) part, which seems mundane by 2024 standards. Allegedly, it was first sold in 2022-2023, mainly on the Oracle Cloud. That initial volume going to cloud providers means that it took quite some time to get into the hands of other customers. In 2024, that has changed, and now we have servers like the Supermicro MegaDC ARS-211M-NR.
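As a rough illustration of the naming above, here is a minimal sketch that splits a model name into core count and clock speed. The pattern is inferred only from the A192-32X example in this review and is not an official Ampere specification. The `parse_sku` helper and `AmpereOneSku` type are hypothetical names, and the meaning of the trailing letter is not covered here, so it is carried through as-is.

```python
import re
from typing import NamedTuple


class AmpereOneSku(NamedTuple):
    """Hypothetical container for the fields encoded in a model name."""
    cores: int
    ghz: float
    suffix: str  # trailing letter, meaning not stated in the review


# Assumed pattern: "A" + core count + "-" + clock in tenths of a GHz + optional letter.
# Inferred from "A192-32X" only; other SKUs may not follow it.
_SKU_PATTERN = re.compile(r"^A(\d+)-(\d{2})([A-Z]?)$")


def parse_sku(name: str) -> AmpereOneSku:
    """Split a model name such as 'A192-32X' into cores, GHz, and suffix."""
    match = _SKU_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    cores, clock_tenths, suffix = match.groups()
    return AmpereOneSku(int(cores), int(clock_tenths) / 10, suffix)


if __name__ == "__main__":
    sku = parse_sku("A192-32X")
    print(f"{sku.cores} cores at {sku.ghz} GHz (suffix {sku.suffix!r})")
```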

Ampere AmpereOne A192 32X In Supermicro Socket LGA5964 2

That may not seem like a big deal, but it is the difference between AmpereOne hitting the enterprise market with 192 cores when that was a lot and hitting it today, when Intel is at 144 E-cores at 250W and launched 128 P-cores (256 threads) in Q3 2024. AMD, for its part, reached 192 cores/384 threads per socket in early Q4. Or let us put it this way: in 2022-2023, a 192-core Arm CPU was otherworldly. In 2024, the x86 crew has largely caught up.

AmpereOne A192 32X Lscpu Output

Ampere is focused on providing a chip that can be partitioned via containers or VMs for multiple customers at once. For all of its performance claims, let us get real for a moment. Ampere is not trying to build an HPC CPU. This is a cloud-native chip.

AMD EPYC Siena Bergamo Ampere AmpereOne Intel Xeon 6700E Sierra Forest 1

One area in which Ampere moved up the stack with AmpereOne is pricing. AmpereOne pricing is higher than Altra Max but with more performance. Still, Intel, AMD, and NVIDIA do not consider a $10K list price for their chips a ceiling in any way.

AmpereOne SKU List And Pricing Large

The other big difference between AmpereOne and Altra Max is that the feature set saw a huge revision. This is the original 2022 slide; note that the A192-32X is a 400W part. Still, things like nested virtualization are new with AmpereOne. We also get PCIe Gen5 and DDR5 support.

Ampere Altra To AmpereOne Products

We went into more detail in our Ampere AmpereOne Architecture at Hot Chips 2024 coverage, but Ampere also changed how it is making chips. The center chip that you see has the cores and caches on TSMC 5nm. Around that main chip are smaller chips that handle PCIe and DDR5 connectivity. Eventually, with AmpereOne M, Ampere will add two more DDR5 chips and get to 12-channel DDR5, matching AMD and Intel. For now, we are looking at the 8-channel DDR5 machine.

Ampere AmpereOne Hot Chips 2024_Page_10

The cloud-native design also shapes the cores and caches. The center compute tile is a sea of 192 cores in 24 8-core clusters. Each core gets its own 2MB L2 cache and does not utilize SMT, so one core is one thread. For an organization worried about a future Spectre/Meltdown-style vulnerability, running one thread per core removes the risk of leaking data between sibling threads on the same core. It is telling that Intel and NVIDIA have taken this approach as well.

Supermicro MegaDC ARS 211M NR Topology Ampere AmpereOne A192 32X Base Config

Something very different with this chip versus an Intel Xeon 6 Granite Rapids-AP (or even Sapphire Rapids/ Emerald Rapids) or an AMD EPYC 9005 “Turin” is that there is a tiny shared L3 cache at 64MB. That is much smaller than even the 144-core Intel Xeon 6700E and minuscule compared to AMD’s L3 caches. Again, this is designed to be partitioned off and sold to multiple customers, so having a large shared L3 cache conceptually can be challenging in that model. Plus, a large L3 cache takes up a lot of die area.
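To put the cache figures above in perspective, here is a quick back-of-the-envelope sketch using only numbers from this review: 24 clusters of 8 cores, 2MB of private L2 per core, and 64MB of shared L3. The 8-core tenant slice is a hypothetical example, and the even L3 split is an illustrative assumption. The L3 is shared, and nothing here implies it is partitioned in hardware.

```python
# Figures taken from the review text.
CLUSTERS = 24
CORES_PER_CLUSTER = 8
L2_PER_CORE_MB = 2
SHARED_L3_MB = 64


def cache_summary(tenant_cores: int = 8) -> dict:
    """Summarize private vs. shared cache for a hypothetical tenant size."""
    total_cores = CLUSTERS * CORES_PER_CLUSTER
    total_l2_mb = total_cores * L2_PER_CORE_MB
    tenants = total_cores // tenant_cores
    return {
        "total_cores": total_cores,
        "total_l2_mb": total_l2_mb,
        "l3_per_core_mb": SHARED_L3_MB / total_cores,
        "l2_to_l3_ratio": total_l2_mb / SHARED_L3_MB,
        "tenants": tenants,
        # Private L2 that travels with the tenant's cores.
        "tenant_l2_mb": tenant_cores * L2_PER_CORE_MB,
        # Notional even share of L3; assumption for illustration only.
        "tenant_l3_share_mb": SHARED_L3_MB / tenants,
    }


if __name__ == "__main__":
    for key, value in cache_summary().items():
        print(f"{key}: {value}")
```

The takeaway is that the private L2 pool (384MB in total) is six times the size of the shared L3, so most of the cache a tenant sees moves with its cores rather than being contended with neighbors.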

Still, one benefit of this approach is that core-to-core latency can be better than on Intel and AMD parts because there is a single compute tile.

Ampere AmpereOne A192 32X C2C Latency Run 1 Results

Next, let us get to the performance.
