ASRock Rack 1U4G-ROME AMD EPYC 4x GPU 1U with EDSFF


ASRockRack 1U4G-ROME Management

In our hardware overview, we showed the out-of-band management port. This allows OOB management features such as IPMI and also provides access to a web management page. ASRock Rack seems to be using a lightly skinned MegaRAC SP-X interface. Since we have covered this a number of times, and it is standard on ASRock Rack servers, here is the quick overview.

This interface is a more modern HTML5 UI that performs more like today’s web pages and less like pages from a decade ago. We like this change. Here is the dashboard.

ASRock Rack MegaRAC SP X Dashboard

Going through the options, the ASRock Rack solution seems as though it is following the SP-X package very closely. As a result, we see more of the standard set of features and options.

ASRock Rack MegaRAC SP X System Inventory

One nice feature is that we get a modern HTML5 iKVM solution. Some other vendors have implemented HTML5 iKVM clients but did not include virtual media support in them at the outset. ASRock Rack has this functionality as well as power on/off control directly from the window.

ASRock Rack IPMI HTML5 IKVM BIOS

Many large system vendors such as HPE, Dell EMC, and Lenovo charge for iKVM functionality. This feature is an essential tool for remote system administration these days. ASRock’s inclusion of the functionality as a standard feature is great for customers who have one less license to worry about.

Beyond the iKVM functionality, there are also remote firmware updates enabled on the platform. You can update the BIOS and BMC firmware directly from the web interface. This is something that Supermicro charges extra for.

ASRock Rack MegaRAC SP X BIOS Update

Something that we did want to mention here is that ASRock Rack is not vendor locking CPUs as we find in servers from vendors such as Lenovo and Dell EMC. We covered how AMD PSB Vendor Locks EPYC CPUs for Enhanced Security at a Cost. ASRock Rack has a more eco-friendly design that does not vendor-lock AMD EPYC CPUs to ASRock Rack-only servers. You can learn more about what is happening, and why it is important to know whether your server vendor is doing this, either in that article or in the video below.

This is important for the industry so we wanted to point it out to our readers.

Next, we are going to move on to the performance of the server.

ASRock Rack 1U4G-ROME Performance

Something that we had a fairly unique opportunity to do here was to test the 1U versus 2U 4x GPU solution. Just recently we tested the ASRock Rack 2U4G-ROME/2T and we were able to test the system using the same components, including the same NVIDIA A100 GPUs. The density implications of moving from 2U to 1U with the same four GPUs was easy

ASRock Rack 2U4G ROME 2T Internal Riser Room 1

We wanted to validate that the cooling was again able to cool all of the accelerators since that is a major feature for the server. Here we are using four passively cooled NVIDIA A100 40GB PCIe cards that we put into one of our 8x GPU boxes. We also reviewed the ASUS RS720A-E11-RS24U dual-socket system and have a Dell EMC PowerEdge R750xa review coming. These have seen some mileage.

ASRock Rack A100 Nvidia Smi

In terms of cooling, we saw that this system was able to keep the GPUs cool in our testing but there was a bit of a delta that was noticeable.

ASRock Rack 1U4GPU ROME GPU Performance 4x A100

Overall, we see that we had performance we would characterize as slightly lower than the 2U 4x GPU ASRock Rack server. This is likely a cooling delta and instinctively makes sense. It was also a bit more pronounced in our training workloads and we did see the FP32 ResNet-50 training test go a bit outside what we may forgive as just a test variation. Of course, the other side of this is that losing 1-2.5% of GPU performance to double the density in a data center is a trade-off that will make sense for many looking at these 1U servers.
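The trade-off above can be put into rough numbers. The following is a minimal sketch, assuming the 1-2.5% figure applies uniformly to all four GPUs and that the 2U system runs at full speed; the function name `relative_throughput_per_u` is hypothetical and only used for illustration.

```python
def relative_throughput_per_u(gpus, rack_units, perf_loss):
    """GPU throughput per rack unit, measured in full-speed GPU equivalents.

    perf_loss is the fractional slowdown per GPU (0.025 means 2.5% slower).
    """
    return gpus * (1.0 - perf_loss) / rack_units


# Assumption: the 2U 4x GPU server is the full-speed baseline.
baseline_2u = relative_throughput_per_u(gpus=4, rack_units=2, perf_loss=0.0)

# The two ends of the 1-2.5% range quoted in the text.
for loss in (0.01, 0.025):
    dense_1u = relative_throughput_per_u(gpus=4, rack_units=1, perf_loss=loss)
    gain = dense_1u / baseline_2u
    print(f"{loss:.1%} loss: {gain:.2f}x GPU throughput per U versus the 2U system")
```

Under these assumptions, the 1U server still delivers roughly 1.95x to 1.98x the GPU throughput per rack unit, which is why the small performance delta is an easy trade for many deployments.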

One other feature we wanted to mention is that the NVIDIA A100s we have here have MIG, or Multi-Instance GPU, support. This allows one to partition a GPU into multiple smaller slices. Here is an example partitioning a single 40GB A100 into two 20GB slices. This is actually a big deal for inferencing workloads. Those inferencing workloads also tended to be much closer in performance between the 1U and 2U servers than the training workloads.

ASRock Rack 2U4G ROME 2T GPU A100 Nvidia Smi MIG

This is important since it practically means that one can get up to seven GPU instances per card. With four A100s, that means this server could have up to 28 A100 slices. For inferencing workloads, a single A100 is often too much. Using MIG means one can often get the benefit of having many smaller GPUs, like NVIDIA T4s, without having to install so many physical cards.
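To make the slice counts concrete, here is a small sketch that tallies how many MIG instances a four-GPU server could expose for each partition size. The profile names and per-GPU limits are taken from NVIDIA's MIG documentation for the A100 40GB rather than from our testing, and the helper `max_instances_per_server` is a hypothetical name used only for this example.

```python
# Maximum number of instances of each MIG profile on one A100 40GB,
# per NVIDIA's MIG documentation (assumption: not measured in this review).
A100_40GB_MAX_INSTANCES = {
    "1g.5gb": 7,   # smallest slice, up to seven per card
    "2g.10gb": 3,
    "3g.20gb": 2,  # the two 20GB slices shown in the example above
    "4g.20gb": 1,
    "7g.40gb": 1,  # the whole GPU as a single instance
}


def max_instances_per_server(profile, gpus_per_server=4):
    """Upper bound on same-sized MIG instances across all GPUs in a server."""
    return A100_40GB_MAX_INSTANCES[profile] * gpus_per_server


for profile in A100_40GB_MAX_INSTANCES:
    print(f"{profile}: up to {max_instances_per_server(profile)} instances with 4 GPUs")
```

With the smallest profile this gives the 28 slices mentioned above, while the two-way 20GB split gives eight larger instances across the server.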

One item we could not test in this system was NVLink. The four separate PCIe GPU risers meant that we could not use NVLink bridges on the A100s as we did in this photo. The same would hold true for the AMD Radeon Instinct MI100 bridges. With the GPUs spread throughout the system, the short PCB point-to-point links that sit on top of the GPUs do not fit.

2x NVIDIA A100 PCIe With NVLink Bridges Installed (Does not work in this server)

Overall though, we saw good CPU and GPU performance in this platform.

Next, we are going to move to our power consumption, server spider, and final words.

2 COMMENTS

  1. The EDSFF storage and connectors are a thing of real beauty. It would be wonderful if this tech supplants m.2 in the desktop world also.
