Supermicro MegaDC ARS-211M-NR Performance
Now for what everyone wants to know: what happens when you get an Ampere AmpereOne A192-32X processor? There are 192 cores without SMT, so one gets 192 threads. This is similar to Intel Xeon 6 6700E Sierra Forest insofar as SMT is absent, and in x86 parlance this is more of an E-core than a P-core design.
One area where this CPU does extremely well is pegging all of the cores at 3.2GHz. Here, stress-ng is running across all 192 cores, and every one of them is holding the 3.2GHz clock speed.
In some server CPU architectures, one might see a few cores run faster while others run a bit slower. AmpereOne is designed so all cores can run up to the same speed.
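For those who want to check per-core clocks on their own system while a load generator such as stress-ng is running, here is a minimal sketch. It assumes a Linux system that exposes the standard cpufreq files under /sys/devices/system/cpu, which not every platform or kernel configuration does. The function names are our own for illustration, and this is not the tooling used for the result above.

```python
from pathlib import Path


def read_core_freqs_khz(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_index: current frequency in kHz} for cores exposing cpufreq.

    Assumes the Linux cpufreq sysfs layout (cpuN/cpufreq/scaling_cur_freq).
    Cores without that file are simply skipped.
    """
    freqs = {}
    for path in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        # path.parent.parent.name looks like "cpu17"; strip the "cpu" prefix.
        cpu_index = int(path.parent.parent.name[3:])
        freqs[cpu_index] = int(path.read_text().strip())
    return freqs


def summarize_ghz(freqs_khz):
    """Return (min, max, spread) in GHz, or None if no cores were read."""
    if not freqs_khz:
        return None
    lo = min(freqs_khz.values()) / 1e6
    hi = max(freqs_khz.values()) / 1e6
    return lo, hi, hi - lo


if __name__ == "__main__":
    summary = summarize_ghz(read_core_freqs_khz())
    if summary is None:
        print("No cpufreq data found on this system")
    else:
        lo, hi, spread = summary
        print(f"min {lo:.2f} GHz, max {hi:.2f} GHz, spread {spread:.2f} GHz")
```

On a system where every core holds the same clock under load, the spread should sit close to zero. A large spread would suggest some cores are boosting while others are throttled back.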
SPEC CPU2017 Results
SPEC CPU2017 is perhaps the most widely known and used benchmark in server RFPs. We do our own SPEC CPU2017 testing, and our results are usually a few percentage points lower than what OEMs submit as official results. The gap is a consistent ~5% because of all of the optimization work OEMs do for these important benchmarks. Since there are official numbers at this point, it feels right to use them.
We are using the official results here, so that means optimized compilers. Ampere would suggest using gcc across the board, and it shows its own numbers de-rating AMD and Intel to gcc figures for this benchmark. That discussion is like debating religion. One could argue gcc is the least common denominator, so that is the right way to look at this. On the flip side, when we look at the AI space, almost everything is using optimized compilers and toolchains. In the end, data exists for all of the major CPUs in the cloud-native market, and the benchmark is open to compilers, so we are just going to show the official runs. Here, the 144-core Intel Xeon 6780E is close to the 192-core AmpereOne.
STH nginx CDN Performance
On the nginx CDN test, we are using an old snapshot and access patterns from the STH website, with DRAM caching disabled, to show what the performance looks like fetching data from disks. This requires low-latency nginx operation plus an additional step of low-latency I/O access, which makes it interesting at a server level. Here is a quick look at the distribution:
We certainly get a nice generational lift here. Just as a quick note, the configuration we use is a snapshot of our live configuration. nginx is one of the workloads that is very well optimized for Arm, but we probably have some room to grow in terms of tuning our own configuration for Arm. Still, it is about what we would expect, with AmpereOne being roughly core-for-core competitive with Sierra Forest and ahead of AMD EPYC Bergamo on a per-socket basis. We also get a little better than per-core scaling here over Altra Max.
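To make "better than per-core scaling" concrete, here is a minimal sketch of the arithmetic. The scores below are placeholders and not our measured results, and the 128-core figure for Altra Max is an assumption about the comparison part. A ratio above 1.0 means each core is delivering more throughput than a core in the older chip did.

```python
def per_core_scaling(score_new, cores_new, score_old, cores_old):
    """Ratio of per-core throughput between a new and an old CPU.

    1.0 means throughput grew exactly in line with core count.
    Above 1.0 means each core is doing more work than before.
    """
    return (score_new / cores_new) / (score_old / cores_old)


if __name__ == "__main__":
    # Hypothetical throughput numbers, for illustration only.
    old_score, old_cores = 100.0, 128   # assumed 128-core Altra Max
    new_score, new_cores = 165.0, 192   # 192-core AmpereOne, placeholder score
    ratio = per_core_scaling(new_score, new_cores, old_score, old_cores)
    print(f"per-core scaling: {ratio:.2f}x")
```

With these placeholder inputs, a 1.65x score gain on 1.5x the cores works out to 1.10x per core.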
MariaDB Pricing Analytics
This is a very interesting one for me personally. The origin of this test is that we have a workload that runs deal management pricing analytics on a set of data that has been anonymized from a major data center OEM. The application effectively looks for pricing trends across product lines, regions, and channels to determine good deal/bad deal guidance based on market trends to inform real-time BOM configurations. If this seems very specific, the big difference between this and something deployed at a major vendor is the data we are using. This is the kind of application that has moved to AI inference methodologies, but it is a great real-world example of something a business may run in the cloud.
This is very similar to the nginx test in terms of optimization notes. Open databases are widely used in cloud instances, so the underlying software may be better optimized than our port of the application at this point. Still, this is effectively a real-world tool that has had tens of billions of dollars of data center hardware deals run through it (of course using different data), making it a very real-world business application. Zen 4c does well here, but AmpereOne is in the same orbit.
We are going to have more in our full AmpereOne A192-32X review coming shortly, but that should give some idea of the performance. The item to keep in mind is that this is the $5,555 list price part, while the AMD EPYC 9754 and Intel Xeon 6780E have list prices around twice that figure.
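As a quick way to frame the pricing point, here is a minimal sketch of a performance-per-dollar comparison. The $5,555 list price is from above. The competitor price is simply set to twice that figure as a stand-in for "around twice", and the scores are placeholders, so treat the output as an illustration of the math and not as a result.

```python
def perf_per_dollar(score, list_price_usd):
    """Benchmark score divided by CPU list price."""
    return score / list_price_usd


def relative_value(score_a, price_a, score_b, price_b):
    """How many times more score per dollar CPU A delivers than CPU B."""
    return perf_per_dollar(score_a, price_a) / perf_per_dollar(score_b, price_b)


if __name__ == "__main__":
    ampereone_price = 5555                  # list price from the article
    competitor_price = 2 * ampereone_price  # assumption: "around twice that figure"

    # Placeholder scores: assume the competitor is 10% faster per socket.
    ampereone_score = 100.0
    competitor_score = 110.0

    ratio = relative_value(ampereone_score, ampereone_price,
                           competitor_score, competitor_price)
    print(f"AmpereOne score per dollar vs. competitor: {ratio:.2f}x")
```

The takeaway from the arithmetic is that at roughly half the list price, a part can trail by a meaningful margin on raw score and still come out ahead per dollar. Street pricing and platform costs are not captured here.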
Next, let us get to the power consumption.
That’s great performance per dollar at least
I don’t get why you’d want a 1S not a 2S for these. If you’re trying to save money, then 2S 1 NIC shares common components except CPUs and mem so that’s much cheaper.
At idle, Ampere’s power consumption seems/feels “broken”. Phoronix, in its latest benchmarks with EPYC 5c, makes this same observation.