Supermicro ARS-210M-NR Performance
In terms of performance, we largely liken the Ampere Altra Max to something between an AMD EPYC 7763 and an AMD EPYC 7773X on the integer side. On the floating point side, AMD is faster, and that will change this week with Genoa. Given that, we recently tested NVIDIA’s Ampere HPC development kit, and we wanted to compare the single socket M128-30 performance between that system and this one.
Something we just wanted to show quickly is that these results are all very close, but we do see a slightly lower average than we saw on the 2U edge box that did not have four GPUs running. Here was that result:
Overall, these are very low test variations, but we just wanted to show what we found.
On the GPU side, we instead wanted to look at an NVIDIA A16’s performance compared to an x86 server, so we tried the same GPU in one of the 2U Supermicro servers we used for our 4th Gen Intel Xeon Scalable Sapphire Rapids launch piece.
This is a bit larger of a delta than we saw with the AMD EPYC Milan generation and the A100s. On the other hand, the 4th Gen Intel Xeon Scalable “Sapphire Rapids” is a more modern design with DDR5, PCIe Gen5, and more. While losing 1-5% GPU performance may seem like a lot here, the real benefit is on the CPU side, running native Arm for cloud gaming on Android or other Arm-heavy OSes.
Next, let us get to OS and then power consumption.
Supermicro ARS-210M-NR OS Support
Something we were very excited about with this server was the OS support. When we first started reviewing Arm servers, everything was delicate, and things often did not work. Here is the server running the VMware ESXi on Arm Fling. This worked just like it did on the 2U edge platform which makes sense since it is the same underlying motherboard.
Also, these days, one can just install OSes via IPMI. That may sound trivial, but often during the Cavium ThunderX days, installing Ubuntu required a BIOS update, maybe some firmware updates, a few patches, and so forth. Now the experience is easy.
Let us be 100% clear, the experience is close to x86, but even simple things are a bit harder. On the previous system, the Ubuntu OS drive was corrupt, so we had to re-install it prior to doing the Raspberry Pi cluster v. Ampere Altra video’s Linpack test. With this system, Ubuntu’s installer stalled, we rebooted, and then it gave us an installation error using LVM, so we had to install without that. The average Ubuntu mirror also has x86 desktop and server images as of this review’s publication date for Ubuntu 20.04.5, but if you want Arm, you go to the Ubuntu on Arm page and are happy to download Ubuntu 20.04.1.
Also, boot times can be in the 10+ minute range just for Ubuntu 20.04 server. There are Arm zealots who will say that the experience is exactly x86. That is categorically untrue. It is getting much closer, but it is not the same. One key tip is to utilize the OpenBMC Serial Over LAN (SOL) feature as trying to see what is happening via the KVM console often does not help.
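For readers who have not used Serial Over LAN before, here is a minimal sketch of attaching to the serial console by driving ipmitool from Python. It assumes the BMC has IPMI-over-LAN enabled and that ipmitool is installed on your workstation. The BMC address and username are placeholders, and the helper names are ours, not something Supermicro or OpenBMC ships.

```python
import os
import subprocess


def build_sol_command(bmc_host: str, user: str) -> list[str]:
    """Return the ipmitool argv that attaches to the BMC's serial console.

    -I lanplus selects the IPMI v2.0 LAN interface, which SOL requires.
    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, so it does not appear in the process list.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "sol", "activate"]


def attach(bmc_host: str, user: str) -> int:
    """Run the SOL session in the foreground and return ipmitool's exit code."""
    if "IPMI_PASSWORD" not in os.environ:
        raise RuntimeError("export IPMI_PASSWORD before attaching")
    return subprocess.run(build_sol_command(bmc_host, user)).returncode


if __name__ == "__main__":
    # 192.0.2.10 and "admin" are placeholders, not values from this review.
    print(" ".join(build_sol_command("192.0.2.10", "admin")))
```

Once attached, the boot messages stream to the terminal as text, which is usually more informative than a blank KVM window. Typing ~. ends the ipmitool session.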
There are still some other strange things. For example, Canonical, the company behind Ubuntu, has an Android distribution called Anbox (Android-in-a-Box) that is designed to run Android in the cloud. This system will have proof points using Anbox, where it will have 128 Android containers running 1080p60 at ~59fps. That is a key use case for a server like this. If you go to the main Anbox page and start installing from snaps, you will find x86 (amd64) is available, but arm64 is not.
That will likely be the first error you encounter, and you will hit something again when trying to do the PPA add:
sudo add-apt-repository ppa:morphis/anbox-support
This is one of those use cases where ideally, one would go to the main documentation page, follow the steps, and everything would work correctly. In this case, it takes more effort. There is still a noticeable gap between Arm and x86 in everyday usability.
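One small thing that saves time here is checking the host architecture against what a package actually publishes before starting an install. The sketch below is ours and rests on assumptions: the architecture set is hard-coded from what we describe above (amd64 present, arm64 absent) rather than fetched from the Snap Store, and the function names are hypothetical.

```python
import platform

# Kernel machine names (as reported by uname -m) mapped to Debian/snap names.
MACHINE_TO_DEB_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def deb_arch(machine: str) -> str:
    """Translate a uname-style machine string to a Debian architecture name.

    Unknown machine strings are passed through lowercased and unchanged.
    """
    key = machine.lower()
    return MACHINE_TO_DEB_ARCH.get(key, key)


def is_installable(published_archs: set[str], machine: str) -> bool:
    """True if the package lists a build for this machine's architecture."""
    return deb_arch(machine) in published_archs


if __name__ == "__main__":
    # Assumption: mirrors the situation described in the text, not live data.
    anbox_archs = {"amd64"}
    host = platform.machine()
    verdict = "ok" if is_installable(anbox_archs, host) else "no build for this arch"
    print(host, "->", verdict)
```

On an Altra Max system, platform.machine() reports aarch64, which maps to arm64, so the check fails early instead of partway through an install.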
For those looking to build cloud gaming clusters or build something that specifically needs Arm, these are things that do not take a lot of time to work around. For others, this is still a challenge in 2023 that will keep them from Arm. Then again, this is a Supermicro MegaDC server, so it is one designed for large-scale cloud gaming deployments, and those folks will leverage small efforts to get things working across a sufficient scale.
With 128 cores, 512GB of memory, and four NVIDIA A16s, the next question is power consumption. Let us get to that next.
STH review of NVIDIA A16 when? I loved your old GRID M40’s https://www.servethehome.com/nvidia-grid-m40-4x-maxwell-gpus-16gb-ram-cards/
But… can it run 16 copies of Crysis at 1080?
Title is kind of confusing as there are not 16x GPUs in the system reviewed.
The A16 is a card that contains 4xA2s each with 16GB of dedicated ram. With 4xA16 you would have 16x A2s.
We ran the Bombsquad-stress on this system, which ran up to 128 instances (concurrent users) at 1080P@60fps.
When running the high-quality Genshin Impact this system can support 48 instances and all running 1080P@60fps.
If this has 16x GPUs, then the First Gen EPYC dual-socket configuration had 8-CPUs.
It’s how Nvidia wants to market it but it’s not how we’ve discussed these sorts of products in the past. If one of the GPUs goes bad, how many do you need to replace? 1? Nope, you need to replace 4.
DDR5 almost had this problem but we’ve all settled on calling the current systems by the number of DIMMS instead of sub-channels.