Arm Neoverse E1 Core Architectural Details
Alongside the Arm Neoverse N1, Arm launched a second CPU: the Arm Neoverse E1. It is designed for 5G infrastructure and edge compute, and I came away more excited about it than I expected to be when the Tech Day started.
If one thinks of the Arm Neoverse N1 as the chip designed to attack the IPC of Intel Xeon, the Arm Neoverse E1 is going in the other direction. It targets throughput and making accelerators intelligent.
Arm also claims large throughput gains for the Neoverse E1 over previous-generation parts. Instead of the Cortex-A72, the comparison points for the Neoverse E1 are the Cortex-A53 and Cortex-A55.
The entire industry is working hard to address the 5G rollout. Everyone from the analog-to-digital converter specialists to the DSP and FPGA vendors is targeting the new performance requirements that come with new spectrum coming online. Beyond that, more connected devices with more bandwidth create the need to process data further out, closer to endpoints. This is the market the Arm Neoverse E1 is targeting.
Arm Neoverse E1 Architecture
Like the N1, the Arm Neoverse E1 is an Armv8.2-compliant core. Instead of chasing higher clock speeds, it is designed to run at low power and be deployed in clusters of cores.
Here, the Arm Neoverse E1 uses small out-of-order cores with SMT to keep throughput high. For comparison, the original ThunderX (ThunderX1) was an in-order design, as were older Intel Atom chips such as the Atom S1260 “Centerton”.
Commonly deployed in clusters of up to 8 CPUs, Neoverse E1 cores share components to maximize space and power efficiency.
The pipeline is only 10 stages long, and you will notice that it is much less complex than the Arm Neoverse N1 pipeline.
Compared with the Cortex-A53 and Cortex-A55, the E1 adds SMT. SMT was also a major feature of the ThunderX2, the incarnation of the Broadcom Vulcan design, which used 4-way SMT so that its 32-core CPUs present 128 threads. The Neoverse E1 uses 2-way SMT, so each core appears to the OS as two separate CPUs.
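The thread arithmetic is simple: the OS sees one logical CPU per SMT thread. Here is a minimal Python sketch of that relationship; the `logical_cpus` helper is a hypothetical name used only for illustration, and the 16-core E1 configuration is the one Arm described for its expected edge SoCs.

```python
def logical_cpus(cores: int, smt_ways: int) -> int:
    """Hypothetical helper: each physical core exposes smt_ways hardware
    threads, and the OS schedules each thread as a separate logical CPU."""
    if cores < 1 or smt_ways < 1:
        raise ValueError("cores and smt_ways must be positive")
    return cores * smt_ways


# ThunderX2: 32 cores with 4-way SMT.
print(logical_cpus(32, 4))  # 128
# Neoverse E1: 2-way SMT, so one core looks like two CPUs...
print(logical_cpus(1, 2))   # 2
# ...and a 16-core E1 complex presents 32 threads.
print(logical_cpus(16, 2))  # 32
```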
Features such as the 128-bit load/store path, twice the width of previous generations, and the non-blocking pipeline also help performance.
The instruction cache is likely to be smaller in Neoverse E1 designs to save silicon area. Arm is again betting that aggressive branch prediction will keep its caches and cores fed.
Arm is using an out-of-order architecture here, which is designed to reduce CPU stalls and maintain throughput. Frankly, Arm needed an OoO model for this type of core.
Like the Neoverse N1, the E1 executes faster than its predecessors. We heard that the Neoverse E1 is technically an Armv8.2 design, but some features were pulled in from Armv8.3.
We are told that most vendors will use 1MB of L3 cache per cluster, but the design supports up to 4MB. The Neoverse E1 can support up to 16 outstanding transactions to help keep the cores fed. We were told of memory speedups of 4.9x going from the Cortex-A53 to the E1 and 2.2x going from the Cortex-A55 to the E1.
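Why do outstanding transactions matter? Little's law ties sustainable bandwidth to the number of requests in flight: bandwidth equals bytes in flight divided by latency. The sketch below is a back-of-the-envelope illustration, not an Arm figure. The 16 outstanding transactions come from Arm; the 64-byte cache line and 100ns memory latency are our own assumptions, and we treat the result as a ceiling for whatever unit the 16 requests apply to.

```python
def max_bandwidth_gb_s(outstanding: int, line_bytes: int, latency_ns: float) -> float:
    """Little's law bound: bytes in flight divided by latency.

    Bytes per nanosecond equals GB/s (1 GB = 1e9 bytes)."""
    bytes_in_flight = outstanding * line_bytes
    return bytes_in_flight / latency_ns


LINE_BYTES = 64     # assumption: 64-byte cache lines
LATENCY_NS = 100.0  # assumption: ~100ns round trip to memory

for outstanding in (8, 16):
    bw = max_bandwidth_gb_s(outstanding, LINE_BYTES, LATENCY_NS)
    print(f"{outstanding} outstanding: {bw:.2f} GB/s")
# 8 outstanding: 5.12 GB/s
# 16 outstanding: 10.24 GB/s
```

At a fixed latency, doubling the requests in flight doubles the bandwidth ceiling, which is the sense in which more outstanding transactions help keep the cores fed.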
Where the Neoverse N1 core was designed for 1-1.8W per core, the Neoverse E1 is both smaller and designed to run at 183mW per core, roughly 5-10x lower. Clock speeds are lower as well, with 2.5GHz or below as the optimal target.
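A quick check of the 5-10x figure, using only the per-core numbers Arm quoted, as a minimal Python sketch (the `power_ratio_range` helper is a hypothetical name for illustration):

```python
N1_WATTS_PER_CORE = (1.0, 1.8)  # Neoverse N1 design range quoted by Arm
E1_WATTS_PER_CORE = 0.183       # Neoverse E1 figure quoted by Arm


def power_ratio_range(n1_range, e1_watts):
    """Return how many times more power per core the N1 uses,
    at the low and high ends of its quoted range."""
    low, high = n1_range
    return low / e1_watts, high / e1_watts


low_ratio, high_ratio = power_ratio_range(N1_WATTS_PER_CORE, E1_WATTS_PER_CORE)
print(f"{low_ratio:.1f}x to {high_ratio:.1f}x")  # 5.5x to 9.8x
```

So "roughly 5-10x" holds across the whole quoted N1 range.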
To put this all in perspective, Arm essentially made an architectural leap akin to Intel's move from the dual-core Intel Atom S1260 “Centerton” to the Atom C2000 “Avoton” parts. That delta was enormous in September 2013, and from what Arm is showing, the Neoverse E1 has the potential to deliver a similar leap if its customers build chips accordingly.
Arm Neoverse E1 Impact and Target Market
Arm is using a low-power core in the Neoverse E1 instead of the N1 because it sees a developing market at the edge. With the explosion of endpoints pushing data back to the network, Arm sees a need to put intelligence closer to the device, either to act on that data with lower latency or to filter the data going back to the cloud. If the Arm Neoverse N1 messaging was about making a CPU for hyperscale cloud vendors, the Arm Neoverse E1 is, in a way, the chip that ensures data from endpoints does not need to make it back to the cloud at all. Arm wants to make CPUs for the entire value chain, so that is an interesting, albeit valid, position to take.
We did not see a Neoverse E1 edge platform reference design at the Tech Day as we did with the Neoverse N1 platform. Arm told us that it expects sub-15W SoCs with less than 4W dedicated to 16 Neoverse E1 cores providing 32 threads. That is intriguing because Arm was quoting 0.183W per core, so one would expect 16 cores to draw under 3W (16 × 0.183W ≈ 2.93W). The per-core figure, however, leaves out other components needed for a fully functioning CPU.
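The arithmetic behind that observation, as a short Python sketch. The "unaccounted" figure is simply the gap between the bare core power and Arm's 4W upper bound; what fills that gap (shared cache, interconnect, or something else) is our assumption, since Arm did not break it down.

```python
CORES = 16
WATTS_PER_CORE = 0.183  # per-core figure Arm quoted for the Neoverse E1
CPU_BUDGET_W = 4.0      # Arm's upper bound for the 16-core CPU portion of the SoC

core_power = CORES * WATTS_PER_CORE
# Gap between the bare core figure and the CPU budget; presumably shared
# components such as L3 and interconnect, but Arm did not itemize it.
unaccounted = CPU_BUDGET_W - core_power

print(f"cores alone: {core_power:.3f} W")   # 2.928 W
print(f"unaccounted: {unaccounted:.3f} W")  # 1.072 W
```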
One of the key points is that Arm is pushing its Server Base System Architecture (SBSA). SBSA is a push for the standards that many edge computing devices need to help the overall ecosystem flourish, not just the ecosystem of one chipmaker.
Arm envisions 25GbE devices that can sit atop lamp posts and serve as 5G endpoints. That means the Neoverse E1 reference platform targets very power-constrained deployments that must also deliver high throughput.
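To put "high throughput" in numbers, here is a back-of-the-envelope Python sketch of a worst-case 25GbE load: minimum-size 64-byte Ethernet frames, each also occupying 20 bytes of preamble and inter-frame gap on the wire. The 16 cores at 2.5GHz combine Arm's core count and clock target from above as an upper bound; SMT, offload engines, and real traffic mixes are deliberately ignored.

```python
LINE_RATE_BPS = 25e9     # 25GbE line rate
MIN_FRAME_BYTES = 64     # minimum Ethernet frame size
OVERHEAD_BYTES = 8 + 12  # preamble/SFD plus inter-frame gap on the wire


def packets_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Frames per second at line rate, counting per-frame wire overhead."""
    wire_bits = (frame_bytes + OVERHEAD_BYTES) * 8
    return line_rate_bps / wire_bits


def cycles_per_packet(cores: int, clock_hz: float, pps: float) -> float:
    """Aggregate core cycles available per packet across all cores."""
    return cores * clock_hz / pps


pps = packets_per_second(LINE_RATE_BPS, MIN_FRAME_BYTES)
budget = cycles_per_packet(16, 2.5e9, pps)  # assumption: 16 cores at 2.5GHz
print(f"{pps / 1e6:.1f} Mpps, ~{budget:.0f} cycles per packet across 16 cores")
# 37.2 Mpps, ~1075 cycles per packet across 16 cores
```

About a thousand core cycles per minimum-size packet, shared with everything else the SoC has to do, is a tight budget, which is consistent with Arm's emphasis on pairing E1 cores with hardware offload.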
Beyond the Neoverse E1 cores themselves, the company expects its chipmaker customers to integrate other hardware accelerators with the E1 cores for greater efficiency via hardware offload. Small cores let Arm provide OS handling and control logic while accelerators handle the tasks they are best at.
Now that 25GbE and 100GbE are mature and have become the de facto standards for what organizations are striving to deploy today, Arm is also looking at what it would take to go beyond 100GbE. That could be a mesh including both Neoverse E1 and Neoverse N1 cores alongside dedicated hardware.
Arm sees benefits in leveraging low-power, standardized cores. Frankly, we wish all lower-end development boards adopted Neoverse E1 CPUs to get everything standardized as soon as possible.
If one takes a step back, the Arm Neoverse E1 is exactly the type of core that we will need in quantity going forward. If data from sensors such as surveillance cameras explodes as expected, intelligence at the edge must increase to avoid unnecessary data movement and storage. Chipmakers have an additional incentive here: deploying SBSA-compliant Arm Neoverse E1 cores in their devices will make those devices more standardized, which in turn will help the Arm ecosystem grow.
Next, we are going to end with our market perspective and final thoughts on the platform.
This was a great long read.
STH is now like a mix of the technical side of Anandtech, the business side of TNP, and its own hands-on experience working with this hw. I can’t wait for your N1 review.
This article took me 2 hours to read this morning between meetings and tea. Great read STH.
Amazing article!
Arm is set to dominate the EDGE. I don’t really see how Intel hopes to gain any market share with the power draw of the x86 ecosystem. Given how much money they can put into R&D, we should expect to see something from them … and the big.LITTLE design using Atom little cores doesn’t sound like the right approach.
M4r1k, I don’t know if Arm really has a power consumption advantage when it comes to bigger chips.
https://www.servethehome.com/updated-cavium-thunderx2-power-consumption-results/
Both STH and AT measured crazy high power draw for the ThunderX1 too.
Maybe in mobile they’re way better, but when they’re trying to reach feature parity they haven’t proven that they use less power than modern x86.
How much is a motherboard with an 8- or 16-core Arm Neoverse N1 CPU?
Risky89 – Arm Neoverse N1 CPUs will be coming out in a few quarters. The development board with the Neoverse N1 CPU is a low production unit that is primarily going to companies that are building chips.
thanks Patrick