The November 2021 Top500 list was very interesting. At STH we have been looking at a subset of every list for years now. Specifically, we always look at the net-new systems since that shows quite a lot in terms of industry trends. Many systems will add nodes over time, but we generally like to look at the net-new. In this list, Lenovo dominated. That dominance actually brings out some trends that would be overlooked otherwise.
For those wanting to take a trip down into the archives, you can find our previous pieces here: June 2021, November 2020, June 2020, November 2019, June 2019. We are going to reference the June 2021 piece quite a bit here, so that may be one to open in another tab or window.
Top500 New System CPU Architecture Trends
In this section, we simply look at CPU architecture trends by looking at what new systems enter the Top500 and the CPUs that they use. Let us start by looking at the vendor breakdown.
Unlike the past two editions, the November 2021 list was all AMD and Intel. From this view, it looks like Intel led handily, but there is more to it than that. If you recall our video on the June 2021 list, one may look at the above and exclaim that the tables have turned.
When we get into more of the details, things get more interesting. Intel Ice Lake launched in roughly the same timeframe as AMD’s Milan. What we see though is that Intel Ice Lake only represents 3 of 42 of Intel’s new systems, for a paltry 7%. Supercomputers do tend to lag, but AMD saw a 50% transition to its new chips. Here is a breakdown by architecture:
Something that is also interesting is that we are not seeing any pre-2019 architectures as we have with earlier lists. This is solely the two newest architectures from Intel and AMD. While Arm holds the top spot with Supercomputer Fugaku by Fujitsu and RIKEN and is powering Chinese supercomputers, it is not getting new systems onto the list. That will change as Europe builds Arm supercomputer chips from non-EU Arm IP (UK/Japan, unless NVIDIA purchases it) that are, from what we have heard, currently slated for TSMC fabs. Still, part of efforts like the Euroexa project should include some EU IP.
Back to the asterisk on the claim that Intel is beating AMD. Here is another way to look at the data, by looking at cores instead of just by the number of systems that use each type of CPU. As we can see, this does not look like Intel’s dominance.
One other very interesting note here. Of the 42 Intel-based systems, 41 are made by Lenovo and one by Dell. Also, only 10% of the accelerated systems (e.g. with a GPU) utilize Intel Xeon CPUs.
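The percentages quoted in this section are simple ratios of system counts. As a minimal sketch, using only the counts stated above (the `share` helper and the variable names are our own illustrative choices, not part of any Top500 data format), the arithmetic can be reproduced like this:

```python
def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be a positive count")
    return 100.0 * part / whole


# Counts taken from the text above.
intel_new_systems = 42      # new Intel-based systems on the list
ice_lake_systems = 3        # of those, systems using Ice Lake
lenovo_intel_systems = 41   # of those, systems built by Lenovo

ice_lake_share = share(ice_lake_systems, intel_new_systems)
lenovo_share_of_intel = share(lenovo_intel_systems, intel_new_systems)

print(f"Ice Lake share of Intel's new systems: {ice_lake_share:.1f}%")
print(f"Lenovo share of Intel-based new systems: {lenovo_share_of_intel:.1f}%")
```

The first figure rounds to the 7% quoted for Ice Lake. The second shows how closely the Intel numbers on this list track a single vendor.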
CPU Cores Per Socket
Here is an intriguing chart, looking at the new systems and the number of cores they have per socket.
24 cores remains an important point. What is really interesting here is how this has changed. With Ice Lake being less represented than the previous list, we now have the 48 and 64 core AMD systems. 32 cores is mixed between the vendors and 24 cores has a single EPYC 7413 system. Otherwise, it is generally Intel on the left and AMD on the right. Just for comparison, we also see lower core count systems are gone. Here is the June 2021 list where a number of new systems used <24 cores:
So there certainly has been a lot of movement toward either what would be a modern low core count CPU, at 24-32 cores, or just going to 64 cores.
Here are the actual SKUs used:
The AMD EPYC 7V12 is Microsoft Azure’s custom SKU. The Azure team is building a dedicated HPC infrastructure in the cloud and will be deploying Milan-X soon.
Perhaps the shocking bit is that 14 of the systems use the Intel Xeon Platinum 8280, but none are using the new Platinum 8380. It seems a bit strange that those Platinum 8280s are on the list given they now feel dated in a world with the EPYC 7763.
Accelerators or Just NVIDIA?
Unlike the June 2021 list, NVIDIA is not the only accelerator vendor for the new systems. Here is a breakdown in our accelerator by vendor chart:
Here is a breakdown of the new accelerated systems by accelerator:
First off, NVIDIA has transitioned to the A100, as we are not seeing V100 systems anymore. Quite a few of these systems are actually the Dell EMC PowerEdge XE8545 we reviewed. Indeed, three of Dell EMC’s four entries used this system:
The bigger story is perhaps the decline of NVIDIA here. In June 2021, 22 of the 58 new systems used NVIDIA accelerators (down from 28 of the 58 new systems in June 2020). Now NVIDIA is at 19 of 70. Putting that in perspective, roughly six quarters ago, ~48% of new systems on the Top500 used NVIDIA accelerators. Now, that is down to 27%.
Acceleration is still an NVIDIA game, but Exascale systems are coming soon. We know about AMD with Frontier and El Capitan 2, along with Intel Xe HPC GPUs for that era, so we may see a change over the next few lists as high-end systems get more diverse with accelerators.
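For readers who want to check the trend, here is a small sketch that recomputes NVIDIA's share of new systems from the counts quoted above. The dictionary layout and names are our own illustrative assumptions. Only the three count pairs come from the text.

```python
# (NVIDIA-accelerated new systems, total new systems) per list, from the text.
nvidia_new_systems = {
    "June 2020": (28, 58),
    "June 2021": (22, 58),
    "November 2021": (19, 70),
}

# Share of new systems using NVIDIA accelerators, as a percentage.
nvidia_share = {
    edition: 100.0 * accelerated / total
    for edition, (accelerated, total) in nvidia_new_systems.items()
}

for edition, pct in nvidia_share.items():
    print(f"{edition}: {pct:.1f}% of new systems used NVIDIA accelerators")

# Change between the oldest and newest list, in percentage points.
decline_points = nvidia_share["June 2020"] - nvidia_share["November 2021"]
print(f"Decline since June 2020: {decline_points:.1f} percentage points")
```

Note that the denominator changes between lists (58 new systems versus 70), which is why comparing shares rather than raw counts gives the fairer picture.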
Above we have what Intel is doing on the HPC side and we discussed the AMD Instinct MI200 and saw the AMD Instinct MI250X OAM at SC21.
The strange thing is that while there are some fairly strong signs the HPC market has actually been pulling back from NVIDIA, Intel and AMD do not have their GPUs on this list.
Fabric and Networking Trends
Here is one that many regulars to this piece will identify with. On the interconnect side, Ethernet is by far the most common solution.
In June 2020 we saw Ethernet at 53% of the new systems. In June 2021 it was only 19% of new systems using Ethernet. In November 2021 we are now back to 54%.
While Omni-Path had some uptake on the November 2020 list, we again do not have any Intel OPA or Cornelis Networks Omni-Path systems here.
When we look at a breakdown by generation, here is what we get:
As a quick note, there were four Russian systems that simply were marked with “Infiniband” so we are calling these RU Infiniband. Our best guess would be HDR, but we wanted to bucket these separately in the event that assumption is incorrect. Also, on the Atos BXI V2, it is certainly well behind the HPE-Cray Slingshot in terms of adoption.
If we drill into which manufacturers are using 10GbE, 25GbE, 40GbE, and 100GbE we get an interesting picture:
Lenovo tends to be the top Ethernet user for new systems on the list. This list is no different except for two factors. First, no other vendor is using Ethernet. Second, we finally have a list with no 10GbE/40GbE. There are still 38 Ethernet systems, but only 8 are 25GbE.
Lenovo tends to take clusters of non-traditional HPC systems, run Linpack on them, and then submit them to the Top500 in order to claim the biggest vendor numbers. All 38 of these Ethernet systems just happen to also be Intel Xeon Cascade Lake systems.
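The Ethernet figures in this section can be tied together with a few lines of arithmetic. This is only a sketch: the total of 70 new systems is taken from the NVIDIA "19/70" figure earlier in the piece, and the variable names are hypothetical. We deliberately label the remaining Ethernet systems as "not 25GbE" rather than assuming a specific speed.

```python
# Counts from the article.
total_new_systems = 70        # implied by the "19/70" NVIDIA figure
ethernet_systems = 38         # new systems using Ethernet
ethernet_25gbe_systems = 8    # of those, systems on 25GbE

# Ethernet systems that are not 25GbE (speed intentionally not assumed here).
ethernet_not_25gbe = ethernet_systems - ethernet_25gbe_systems

ethernet_share = 100.0 * ethernet_systems / total_new_systems
share_25gbe_of_ethernet = 100.0 * ethernet_25gbe_systems / ethernet_systems

print(f"Ethernet share of new systems: {ethernet_share:.1f}%")
print(f"25GbE share of Ethernet systems: {share_25gbe_of_ethernet:.1f}%")
print(f"Ethernet systems that are not 25GbE: {ethernet_not_25gbe}")
```

The first figure rounds to the 54% quoted for November 2021.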
Final Words
When we look at the vendor picture, we can get a sense of what is happening in the market:
Lenovo is #1 again, but again, 38 of the 42 systems it has on this list are Ethernet-based, Intel Xeon Cascade Lake, non-accelerated systems installed at software vendors or service providers. Lenovo has so many of these systems that they make up 54% of the new systems on the Top500 list. When we discuss Intel, AMD, Arm, or NVIDIA winning in the new Top500 systems, Lenovo’s point is perhaps none of them win. Instead, anything using technology more exotic than a 2.5-year-old+ CPU and Ethernet is technically a minority system.
Even though we know the list is not a perfect representation of what is going on, it is still fun to do a bit of analysis around the changes happening in the industry. Hopefully, our readers enjoyed this one. Let us also hope that ISC 2022 brings some big changes.
What? It’s almost like Lenovo’s gaming the system.
Or do other vendors lack game?
Big losers are ICE and NVIDIA.
EU just said they’re doing EU only Exa to get funding. They all knew they’d need an Asia fab and IP. Arm is a Japanese company, not UK. UK HQ, but owned in Japan. They’re using Asia IP and Asia fab but sold politicians on made in EU.
So refreshing to have someone actually pull this view and not pander to the SC $$$. That’s why I read STH like a religion. I don’t agree with everything, but I’ll appreciate that you’ve got a perspective.
I don’t understand what’s relevant about only looking at the new systems on the list. Why not look at the whole thing? This is only what’s new in like half a year.
The well-known paraphrase of Goodhart’s law, “when a measure becomes a target, it ceases to be a good measure,” explains well the irrelevance of the Top 500 for measuring scientific computing facilities.
The fact that the two exascale systems in China aren’t on the list coupled with the fact that many of the machines which did make the list aren’t used for science further illustrates the irrelevance.
Thoroughly enjoyed it. Thanks!
I think they do this to show trends, Apron.