AMD EPYC 9005 Turin Turns Transcendent Performance with 768 Threads Per Server


Removing Bottlenecks with HUGE CPUs

Something we need to call out here, and something we expect everyone reviewing the platform today to run into, is that we saw significant bottlenecks elsewhere in the system once we hit 192 cores per socket. We have been running many of our workloads for so long that we have a decent idea of how they should perform. At 128 and 192 cores, we started to see an impact from swapping our normal PCIe Gen4 NVMe SSDs for newer-generation PCIe Gen5 drives. We had a few of the new Solidigm D7-PS1010 drives in the lab, and since they are new and fast, we decided to do a quick generational comparison.

Solidigm D7 PS1010 Side By Side
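
If you want to sanity-check a setup like this, the first thing to confirm is that the Gen5 drives actually negotiated a Gen5 link. Here is a minimal sketch, assuming a Linux host with the standard sysfs PCIe attributes; the paths are generic, not specific to our test system:

```python
# Minimal sketch (assumption: Linux host with NVMe controllers under
# /sys/class/nvme). Prints the negotiated PCIe link speed and width so a
# Gen4 drive is not silently mistaken for a Gen5 one.
from pathlib import Path

for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
    pci_dev = ctrl / "device"  # symlink to the controller's PCI device
    try:
        speed = (pci_dev / "current_link_speed").read_text().strip()
        width = (pci_dev / "current_link_width").read_text().strip()
    except FileNotFoundError:
        continue  # non-PCIe or virtual controllers have no link attributes
    print(f"{ctrl.name}: {speed}, x{width}")
```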

At 64 cores running our nginx workload, we did not see a huge benefit from the new drives. By the time we got to the 192-core AMD EPYC 9965, we were seeing over 8% better performance.

STH nginx CDN AMD EPYC 9005 Turin SSD Sensitivity by SKU Solidigm D7-PS1010

On our pricing analytics workload, we saw slightly better performance, especially at 192 cores:

MariaDB AMD EPYC 9005 Turin SSD Sensitivity by SKU Solidigm D7-PS1010

That may not seem like a lot, but using a newer generation of drives effectively gave us a performance benefit similar to adding 5-19 cores. That is a huge deal.
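
One way to make that comparison concrete is simple arithmetic: divide the uplift from the drives by the marginal gain an extra core buys in that SKU range. Here is a toy sketch; the per-core scaling figure is an illustrative assumption, not a number from our charts:

```python
# Toy core-equivalence math. Both figures below are illustrative assumptions,
# not measured values from our charts.
ssd_uplift_pct = 8.0             # ~8% faster with the PCIe Gen5 drives at 192 cores
gain_per_extra_core_pct = 0.8    # assume ~0.8% more throughput per added core here

equivalent_cores = ssd_uplift_pct / gain_per_extra_core_pct
print(f"Gen5 SSD uplift is roughly like adding {equivalent_cores:.0f} cores")
```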

Solidigm D7 PS1010 And D7 PS1030 Specs

We grabbed these drives because we knew that they were new and very fast. Still, the high core-count CPUs are really showing bottlenecks where we might not have seen them previously.

Something similar happened on the networking side. After seeing the storage impact, we thought that the new, faster 192-core CPUs might need more networking than just one 100GbE link per CPU. Since we had the new Broadcom 400GbE NICs in the lab, we installed them in the AMD Volcano platform.

Broadcom 400GbE OCP NIC 3.0 Angle 2

Unfortunately, we only had one of each card, but we could still get a total of 400Gbps to each CPU (1x 400GbE on one socket and 2x 200GbE on the other). Not perfect, but it is what we had.
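
For a sense of scale, here is the rough per-core bandwidth math for that setup, assuming each socket ends up with about 400Gbps of aggregate networking:

```python
# Rough bandwidth-per-core arithmetic for the setup described above.
# Assumes each socket sees ~400Gbps of aggregate network bandwidth.
cores_per_socket = 192        # AMD EPYC 9965
new_gbps = 400                # 1x 400GbE or 2x 200GbE per socket
old_gbps = 100                # the single 100GbE link per CPU we used before

print(f"new: {new_gbps / cores_per_socket:.2f} Gbps per core")   # ~2.08
print(f"old: {old_gbps / cores_per_socket:.2f} Gbps per core")   # ~0.52
```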

Broadcom Dual Port 200GbE 400GbE Generation NIC Cover

As you might imagine, hitting our SLA on the STH nginx CDN benchmark was easier with faster networking.

STH nginx CDN AMD EPYC 9005 Turin Network Sensitivity

We saw a smaller impact on the pricing analytics side.

MariaDB AMD EPYC 9005 Turin SSD Sensitivity by SKU Network

These NICs are also relatively low power and more power efficient on a pJ/bit basis than the 100GbE ConnectX-6 NICs we often use in the lab.
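
For those who have not worked with the pJ/bit metric, it is simply NIC power divided by data moved. A quick sketch with placeholder wattages; these are for illustration only, not measured figures from this review:

```python
# Energy-per-bit sketch: pJ/bit is NIC power divided by throughput.
# The wattages are placeholders for illustration, not measured values.
def pj_per_bit(watts: float, gbps: float) -> float:
    # 1 W at 1 Gbps = 1e-9 J/bit = 1000 pJ/bit
    return watts / gbps * 1000.0

print(f"400GbE NIC at 24 W: {pj_per_bit(24, 400):.0f} pJ/bit")   # 60
print(f"100GbE NIC at 18 W: {pj_per_bit(18, 100):.0f} pJ/bit")   # 180
```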

Introducing The Broadcom 400GbE RDMA NIC

This was cool to be able to show, but it was also a bit frustrating. We only had a limited amount of time with the system, and three sets of CPUs to test, so finding something like this put us behind. On the other hand, it is a really valuable insight, and probably a step beyond the “more cores = more better” message that we would have expected with this review.

Let us next get to performance.
