Today, a quick one. We have been covering Ampere Computing’s progress in making Arm data center CPUs for some time. As some examples, we covered an Ampere Altra powered Wiwynn Mt. Jade server designed for cloud deployments, and we have also covered Oracle Cloud adopting the 80-core Arm CPUs. Oracle Cloud is even giving away Ampere Arm A1 instances in its always free tier. While 80 cores are impressive, the Ampere Altra Max M128-30 is designed to take on the latest x86 processors from Intel and AMD with its 128 cores.
Ampere Altra Max M128-30 Spotted
Recently the new Ampere Altra Max M128-30 was spotted alongside the AMD EPYC 7003 “Milan” and Intel Xeon Ice Lake parts. For those wondering, the M seems to mean “Max”, the 128 is for the 128 cores, and the 30 is for 3.0GHz.
Previously at STH, we saw the 2019-2020 era AMD EPYC 7002, Ampere Altra Q80-30 (80 core, 3.0GHz), and Intel Xeon Cascade Lake together.
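As a small illustration of that naming scheme, here is a minimal sketch that splits a model string into its parts. The pattern (one letter, a core count, a dash, then the clock in tenths of a GHz) is only inferred from the M128-30 and Q80-30 names mentioned here, and `parse_model` is a hypothetical helper, not anything published by Ampere.

```python
import re

# Assumed pattern, inferred from "M128-30" and "Q80-30":
# one series letter, the core count, a dash, the clock in tenths of a GHz.
MODEL_RE = re.compile(r"([A-Z])(\d+)-(\d+)")


def parse_model(name):
    """Split a model string such as 'M128-30' into (series, cores, ghz)."""
    match = MODEL_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    series, cores, tenths_ghz = match.groups()
    return series, int(cores), int(tenths_ghz) / 10


for model in ("M128-30", "Q80-30"):
    print(model, parse_model(model))
```

The series letter is returned as-is, since only the meaning of "M" is guessed at in the text.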
Seeing the Ampere Altra Max in Q3 2021 seems to align with the timeline the company set forth last year, under which the product would be shipping in 2021.
Here is the pin side of the Ampere Altra Max with its contemporaries. In comparison, the Altra Max is a larger, heavier package.
Unlike its counterparts from AMD and Intel, the M128-30 fits a full 128 Arm Neoverse N1 cores onto a single processor package. AMD EPYC 7002 and 7003 series CPUs utilized 64 cores with SMT=2 to reach 128 threads per socket.
As one can see, this is the AC-212825002 part, which is listed at 250W. That means it has a lower TDP than the top-end AMD EPYC 7763 or Intel Xeon Platinum 8380. Here are the Ampere Altra Max SKUs.
• AC-212825002 (128 cores, 250 W)
• AC-212823002 (128 cores, 230 W)
• AC-212819002 (128 cores, 190 W)
• AC-211224002 (112 cores, 240 W)
• AC-211221002 (112 cores, 210 W)
• AC-211218002 (112 cores, 180 W)
• AC-209623502 (96 cores, 235 W)
• AC-209622002 (96 cores, 220 W)
• AC-209619002 (96 cores, 190 W)
• AC-209617002 (96 cores, 170 W)
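The part numbers in this list appear to encode the core count and TDP directly. As a rough sketch, assuming the layout is "AC-2", three digits of cores, three digits of watts, then "02" (a pattern inferred only from the ten SKUs above, not from Ampere documentation), a hypothetical `decode_part_number` helper could look like this:

```python
import re

# Assumed layout, inferred from the SKU list only:
# "AC-2" + 3-digit core count + 3-digit TDP in watts + "02".
PART_RE = re.compile(r"AC-2(\d{3})(\d{3})02")


def decode_part_number(part):
    """Return (cores, tdp_watts) for a part number such as 'AC-212825002'."""
    match = PART_RE.fullmatch(part)
    if match is None:
        raise ValueError(f"part number does not fit the assumed layout: {part!r}")
    return int(match.group(1)), int(match.group(2))


SKUS = [
    "AC-212825002", "AC-212823002", "AC-212819002",
    "AC-211224002", "AC-211221002", "AC-211218002",
    "AC-209623502", "AC-209622002", "AC-209619002", "AC-209617002",
]

for sku in SKUS:
    cores, watts = decode_part_number(sku)
    print(f"{sku}: {cores} cores, {watts} W")
```

Every SKU listed above decodes to the core count and TDP shown next to it, which is the only evidence for the assumed layout.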
Although the 128-core option is the one pictured here, the list above also includes 112-core and 96-core options at several TDP levels.
Ampere Altra Max Key Features
Here are some of the key specs for the Altra Max pictured above.
- 128 Armv8.2+ 64-bit CPU cores up to 3.0GHz maximum
- 64KB L1 I-cache, 64KB L1 D-cache per core
- 1MB L2 cache per core
- 16MB System Level Cache (SLC)
- 2x full-width (128b) SIMD
- Coherent mesh-based interconnect – Distributed snoop filtering
- 8x 72-bit DDR4-3200 channels
- ECC, Symbol-based ECC, and DDR4 RAS features
- Up to 16 DIMMs and 4 TB/socket
- Full interrupt virtualization (GICv3)
- Full I/O virtualization (SMMUv3)
- Enterprise server-class RAS
- Up to 128 lanes of PCIe Gen4 per CPU
- 4 x16 PCIe + 4 x16 PCIe/CCIX with Extended Speed Mode (ESM) support for data transfers at 20/25 GT/s
- 32 controllers to support up to 32 x4 links
- 128 PCIe lanes in 1P configuration
- 192 PCIe lanes in 2P configuration
- Coherent multi-socket support
- 4 x16 CCIX lanes
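To put a few of these specs in perspective, here is a back-of-the-envelope sketch of the per-socket totals they imply. It assumes 64 of the 72 bits on each DDR4 channel carry data (with the rest used for ECC), so the bandwidth figure is a theoretical peak rather than a measured result, and all variable names are made up for illustration.

```python
# Figures taken from the spec list above.
CORES = 128
L2_PER_CORE_MB = 1
SLC_MB = 16
DDR4_CHANNELS = 8
TRANSFERS_PER_SEC = 3200 * 10**6  # DDR4-3200
MAX_DIMMS = 16
MAX_MEMORY_TB = 4

# Assumption: each 72-bit channel carries 64 data bits (8 bytes) plus ECC.
DATA_BYTES_PER_TRANSFER = 8

total_l2_mb = CORES * L2_PER_CORE_MB
slc_per_core_kb = SLC_MB * 1024 / CORES
peak_bandwidth_gb_s = (
    DDR4_CHANNELS * TRANSFERS_PER_SEC * DATA_BYTES_PER_TRANSFER / 1e9
)
gb_per_dimm = MAX_MEMORY_TB * 1024 / MAX_DIMMS

print(f"Total L2 cache:        {total_l2_mb} MB")
print(f"SLC per core:          {slc_per_core_kb:.0f} KB")
print(f"Peak memory bandwidth: {peak_bandwidth_gb_s:.1f} GB/s (theoretical)")
print(f"DIMM size for 4 TB:    {gb_per_dimm:.0f} GB")
```

Under these assumptions, reaching the 4 TB maximum means populating all 16 slots with 256 GB DIMMs.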
Beyond the basic CPU cores, the memory and I/O specs around them are certainly enough to build very high-end solutions.
Final Words
This was a quick teaser, but an important one. The Intel and AMD server CPUs have large launch events. Ampere is more focused on selling to cloud customers. As a result, the product launches are a bit different, to the point where folks may not know exactly when chips are out. We can confirm that the chips are out in the wild now (albeit this is still marked as ES silicon).
Patrick sent me a few photos and said “can you write something on this?” So as one can imagine, more is coming on this topic in the future. Still, it is very cool to see the new server chips rolling out. Ampere is not aiming to exactly match Intel and AMD’s strategy. Instead, the company is forging ahead doing something different. Stay tuned for more.
1. Interesting, the package is bigger even though the die size should be smaller? I/O and memory channels are about the same as AMD.
2. The original Max was supposed to be aiming at 280/300W. So 250W is a little underwhelming.
The cache hierarchy looks like it is tuned for nginx or so. 128MB of L2 and 16MB L3 for communication between cores? Reminds me of the under-cached SPARC T1/T2.
Giv me 2. Aka where to get them… !?
@KarelG- The Altra Max has a larger L1 cache, the same size L2 cache, and a similar size L3/SLC cache compared to Intel Xeon Cascade Lake. The AMD Rome requires a larger L3 in order to avoid the memory latency penalty from the Zen2 architecture. The AMD Rome L3 is allocated as 16 MB per 4-core CCX block, and all L3 cache accesses larger than 16 MB must go through the IO die and incur a latency penalty. This is based on Anandtech’s review of Rome.
@Ksec – why would a datacenter prefer a CPU that draws more power and produces the same level of performance? Altra Max leads the industry in perf/watt.
Looks like a perfect match for hyperscalers, less cost for an instance