Earlier this month, Arm held a Tech Day where it discussed next-generation technologies. Frankly, a lot of ground was covered. For STH readers who are interested in the next generation of Arm CPU technology, there was a lot in the Arm Tech Day 2021 presentations that we are going to cover. We also recognize that we have many practical readers looking to evaluate what they can purchase. Today’s announcements are focused on future products rather than today’s products and, more precisely, on the pieces of the total package required to make future processors. As a result, we are going to cover the summary on page 1, then go into more detail on subsequent pages. If you read STH’s forward-looking articles simply to stay informed, feel free to skim the first page and we will link relevant bits when they become relevant in future STH coverage.
Arm Neoverse N2 and V1 at Arm Tech Day 2021: The Overview
First off, we need to level-set on what is being presented. Arm builds processor IP. That IP is then licensed to partners, who combine it with third-party IP or IP they create themselves to build products. Bringing this back to other ecosystems, this is Arm discussing the parts of its IP that will be competing with AMD EPYC 7004 Genoa, Intel Xeon Sapphire Rapids, and perhaps in some small segments IBM POWER10. With that said, here is the current Arm Neoverse Roadmap.
The key takeaways are that Arm has the E1 platform for lower-power or higher-efficiency cores, the V-series for high performance per core or lower-efficiency cores, and the N-series which is in the middle, but closer to the V1. We often discuss CPUs in terms of EPYCs, Xeons, Ampere Altra (Max), AWS Graviton, and others. It is important to remember that there is a much larger (and perhaps more exciting) market out there as 5G creates new infrastructure demands and new security/network processing needs. A great example of this is the DPU. We have a lot of DPU coverage on STH already and have several NVIDIA BlueField-2 cards running for a future piece. In Arm’s presentation, the picture of the Marvell Octeon SmartNIC struck me as a great example of how the shape of computing is changing. When your NIC is running a Linux distribution such as Ubuntu and is addressable independently, the paradigm of computing changes.
To that end, Arm is investing in a number of technologies on a number of fronts to enable its ecosystem. Again, the more Arm can expand its ecosystem, the more IP it licenses, so when we hear Intel discuss oneAPI for its xPU strategy (some powered by Arm cores), we have to remember that Arm needs to do the same, and more, for its ecosystem partners.
Part of that is Project Cassini. This is an effort by Arm to increase standardization among its partners. It may not necessarily stop Apple from doing its own design with M1, but creating standards helps with adoption up-front. Something many people forget is that many devices outside of the data center end up being deployed for many years. One area that x86 does extremely well in is the ability to support devices that have been deployed for a decade or more. If you have an Intel Xeon 5500 series system, you can install Ubuntu on it, run it, and keep it up-to-date without any additional work. In the Arm ecosystem, this has often been derided as supporting legacy code, but we also need to remember not everything is a phone, desktop, or server. Also, not everyone has the financial means to upgrade every 3-5 years. An eventual outcome of Project Cassini is increased standardization, which will help maintain deployed endpoints longer. In a broader sense, this is as important as, or maybe more important than, an individual core generational upgrade.
What Arm focused on at Arm Tech Day 2021 was its new V1 and N2 platforms. From what we heard, the Neoverse V1 IP was available 3-4 quarters before N2. It is designed to be a less efficient but higher performance-per-core solution that targets x86-style compute more directly. The Neoverse N2 is designed to be the cloud core of the future.
Something that Arm tends to do is focus on AWS Graviton2 performance. This is a somewhat strange comparison since AWS builds the Graviton2 and sets its pricing for highly locked-in customers. While there is a switching cost from x86 to Arm, for applications like Nginx, there is a bigger switching cost to go from AWS to Azure. This is somewhat akin to Intel running a cloud of Xeon servers built on chips and systems that you could not buy directly, then showing Intel Xeon performance and cost versus AMD EPYC in its own cloud. That is a somewhat crazy proposition, but it is analogous to what comparing AWS Graviton2 instance pricing to AWS x86 instance pricing is doing. It is not wrong, but one has to be very conscious of the context.
Likewise, Arm discusses its performance in a per-thread context. Since Arm is not using SMT/Hyper-Threading, every thread has a full core behind it. As a result, when compared thread-for-thread at a similar thread count, it can come out ahead. If we are comparing what you can get access to commercially today, it is the Arm Neoverse N1 64C/64T (usually Graviton2) versus the “Traditional 2021 40c, 80t” which is the projection for the Intel Xeon Platinum 8380, and the “Traditional 2021 64c, 128t” which is likely the EPYC 7763 or other EPYC 7003 part. So in this chart, Arm is showing it believes next-generation processors that partners build around its IP will exceed today’s x86 CPUs on a per-thread basis and on a per-socket basis. Again, there is a lot of context required to read these.
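To put a number on that per-thread context, here is a minimal Python sketch. It uses only the thread counts from the chart labels above. The function name and the idea of a "break-even" socket ratio are our own illustration, not something Arm presented. It shows what fraction of an x86 socket's total throughput a 64-thread Arm socket needs just to tie on a per-thread basis.

```python
# Thread counts come from the chart labels quoted in the article.
ARM_N1_THREADS = 64  # Neoverse N1, 64C/64T, no SMT
X86_THREADS = {
    "Traditional 2021 40c, 80t": 80,
    "Traditional 2021 64c, 128t": 128,
}


def parity_socket_ratio(arm_threads: int, x86_threads: int) -> float:
    """Hypothetical helper: the share of the x86 socket's throughput the
    Arm socket must deliver so that (socket score / threads) is equal."""
    return arm_threads / x86_threads


for label, threads in X86_THREADS.items():
    ratio = parity_socket_ratio(ARM_N1_THREADS, threads)
    print(f"{label}: per-thread tie at {ratio:.0%} of socket throughput")
```

In other words, a 64C/64T part can tie per thread while delivering only 80% or 50% of the socket throughput of the two x86 parts, which is why the per-socket view needs to be read alongside the per-thread one.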
Specifically, Arm’s two lines, the Neoverse V1 and N2, are designed for slightly different markets, with the V1 focusing on higher performance per thread while the N2 is focused on scale-out.
As we typically see with a new generation, Arm believes its next-generation V1 part is going to deliver around 48% higher IPC than the previous-generation N1.
Something one will notice is that Arm says the N2 came in better than expected, so it is more like a 32% IPC uplift over N1. Looking at these two charts, one may immediately think that the V1 is only around 12% faster than the N2, and wonder why, given that it is not designed for the same scale-out and efficiency target. The answer is simply that V1 is the slightly older IP and is focused heavily on SVE performance for more HPC-style workloads. That is important because both Intel and AMD are effectively supporting this style of computing on their respective lines.
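For those who want to check how the two uplift figures relate, here is a quick Python sketch. The 48% and 32% inputs are the numbers quoted above. The helper name is hypothetical, and we are assuming both uplifts are measured against the same N1 baseline on comparable workloads. The point is that the V1-over-N2 gap is the ratio of the two, not the 16-point difference.

```python
# IPC uplifts over Neoverse N1, as quoted in the article.
V1_UPLIFT_OVER_N1 = 0.48
N2_UPLIFT_OVER_N1 = 0.32


def relative_uplift(uplift_a: float, uplift_b: float) -> float:
    """Hypothetical helper: uplift of A over B when both uplifts are
    expressed against the same baseline (assumed here to be N1)."""
    return (1.0 + uplift_a) / (1.0 + uplift_b) - 1.0


v1_over_n2 = relative_uplift(V1_UPLIFT_OVER_N1, N2_UPLIFT_OVER_N1)
print(f"V1 IPC over N2: {v1_over_n2:.1%}")  # prints "V1 IPC over N2: 12.1%"
```

That works out to 1.48 / 1.32, or roughly 12%, which is where the "around 12% faster" reading of the two charts comes from.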
Coming full circle, Arm again thinks future partner solutions built around V1 and N2 will be faster than today’s x86 compute from Intel and AMD.
The major distinction for our readers to take away is to think of the Neoverse V1 as more of the HPC-style compute core while the N2 will be the scale-out cloud core. If you are running a web server, you will want N2 over V1.
With that summary, let us start to get into the details. First, we are going to look at the V1. We are then going to look at the N2. Finally, we are going to focus on the coherent mesh network Arm is bringing to its ecosystem. If you are not interested in CPU details, the mesh network portion is perhaps one level higher and is more impactful for system design.
It looks like a 64-core EPYC Milan offers over 2 TFLOPs (Source: https://www.microway.com/knowledge-center-articles/detailed-specifications-of-the-amd-epyc-milan-cpus/) per CPU and costs less than the dual-CPU PrimeHPC FX700, which offers 3.072 TFLOPs and costs US$40K (Source: https://www.fujitsu.com/global/products/computing/servers/supercomputer/specifications/).
Excellent for ARM CPUs to hold the top Supercomputer spot, but cost / power / physical space / etc. all come into play – (highly capable) monkeys on typewriters.
If I wanted to pay almost 2x I could go with IBM’s POWER instead of AMD’s Milan.
Hurray for ARM, but I am skeptical about their ‘win’ (and I’m not talking about their placement in the TOP 500 spot, which I don’t dispute). Based on the FX700’s pricing and the software (even hardware) infrastructure I would choose the competition.
But, competition is great (for the customer); whichever team you prefer.