AMD EPYC 7552 Test Configurations
For our testing, we utilized both single and dual-socket configurations to show different aspects of performance, check the 1P story, and gain access to both of our comparison sets.
Test Configuration: Single Socket
For most of our charts, we are using the Tyan Transport SX TS65A-B8036.
- Platform: Tyan Transport SX TS65A-B8036
- CPU: AMD EPYC 7552
- RAM: 8x 32GB Micron DDR4-3200 RDIMMs
- OS SSD: 400GB Intel DC S3700
- Data SSD: 960GB Intel Optane 905P
You are going to see more about this platform, but this is a PCIe Gen4 single-socket platform from Tyan that has 16x front U.2 NVMe bays, 10x front SATA/SAS bays, two rear 2.5″ SATA OS SSD bays, six expansion slots on risers, and an OCP mezzanine slot. All this is achieved using a single AMD EPYC 7002 series CPU.
Some servers are not going to be designed to accept such high-power CPUs so one needs to watch out for those when shopping for servers. On the single-socket side, it is less of a concern. In form factors such as 2U 4-node, it will be a bigger concern.
Just to note, the topology picture above was using this server to show a single CPU to keep things simple.
Test Configuration: Dual Socket
We also had a test configuration for dual-socket processors. We have been using the AMD “Daytona” reference platform and we had the latest AGESA for this as well.
- Platform: AMD “Daytona” Reference Platform
- CPUs: 2x AMD EPYC 7552
- RAM: 16x 32GB Micron DDR4-3200 RDIMMs
- OS SSD: 400GB Intel DC S3700
- Data SSD: 960GB Intel Optane 905P
Note, this platform looks a lot like the systems in our piece Quanta AMD EPYC Rome Servers Set to Make a Splash, which we covered on a Q4 2019 visit to Taipei, Taiwan.
Next, let us get to performance before moving on to our market analysis section.
AMD EPYC 7552 Performance
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results we have been using for years as well as several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable refresh benchmarks, we are adding a number of our workload testing features to the mix as the next evolution of our platform.
At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.
We are going to show off a few results, and highlight a number of interesting data points in this article.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we have a standard configuration file and the Linux 4.4.2 kernel from kernel.org, and we make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
This is something we are going to see a lot in these charts. The EPYC 7552 is somewhere between a number of different options. It is not quite as fast as the other 48-core EPYC 7002 parts, yet it is a significant step up from the 32-core offerings.
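As a rough illustration of the compiles per hour metric, here is a minimal Python sketch. It is not the Linux-Bench script itself. The helper names (`make_command`, `compiles_per_hour`, `time_build`) are hypothetical, and the sketch assumes the kernel source is already unpacked and configured in `source_dir`.

```python
import os
import subprocess
import time


def make_command(threads=None):
    """Build a `make` invocation that uses every hardware thread."""
    # os.cpu_count() reports logical CPUs; fall back to 1 if it is unknown.
    jobs = threads or os.cpu_count() or 1
    return ["make", f"-j{jobs}"]


def compiles_per_hour(elapsed_seconds):
    """Convert the wall-clock time of one compile into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def time_build(source_dir, threads=None):
    """Run one build in source_dir (assumed pre-configured) and score it."""
    start = time.perf_counter()
    subprocess.run(make_command(threads), cwd=source_dir, check=True)
    return compiles_per_hour(time.perf_counter() - start)
```

A compile that finishes in 90 seconds works out to 40 compiles per hour, which is why a higher bar is better in this chart.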
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
This is a benchmark that we started to use several years ago. There are architectural reasons the AMD Zen and Zen 2 chips perform extremely well here. Instead of looking at AMD versus Intel, it is best to look at AMD v. AMD here.
We can see performance more closely aligns with the number of cores here which puts this part in the 48-core space behind the AMD EPYC 7642.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
As we go through these benchmarks, you can look to the Intel Xeon Platinum 8280 numbers as being similar to the Xeon Gold 6258R representing the top-end of the Intel Xeon line at this time. Our Gold 6258R review is coming and is already submitted to the publishing queue.
NAMD Performance
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. With GROMACS we have been working hard to support AVX-512 and AVX2 architectures. Here are the comparison results for the legacy data set:
Although we generally see much more performance with these parts versus the Intel Xeon parts, we will note that AMD has the EPYC 7702P and EPYC 7742 along with a few other 64-core parts on the market. We do not expect Intel to be competitive on a core count basis until 2022, giving these 2019-2020 chips a large lead.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. It is an important component of many server stacks. We first look at our sign tests:
Here are the verify results:
This is a workload that Intel may argue can be offloaded to QAT accelerators. Still, most architectures today do not have QAT accelerators since Intel uses this as an add-on sale, either through accelerator cards or upgraded PCHs. Here we can see that when one is not using an accelerator, either QAT or another dedicated offload device, the EPYC 7552 performance is excellent.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging. However, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results:
This was an interesting result: unexpectedly, the EPYC 7642 and EPYC 7552 were very close. The data center temperatures were within 0.2C at the server inlet, with the same relative humidity, for the test runs, but perhaps we are seeing a small delta due to that. We would expect the EPYC 7642 to be slightly faster on a consistent basis rather than effectively showing even performance.
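One simple way to judge whether two parts are "effectively even" is to compare the gap between their mean scores with the run-to-run spread. The sketch below is an assumption about how one might do that, not a description of how these charts were produced. The function names and the threshold `k` are hypothetical.

```python
from statistics import mean, stdev


def relative_delta(runs_a, runs_b):
    """Difference of the mean scores, as a fraction of the mean of runs_b."""
    return (mean(runs_a) - mean(runs_b)) / mean(runs_b)


def within_noise(runs_a, runs_b, k=2.0):
    """True if the gap between means is at most k times the larger sample
    standard deviation. k=2.0 is an arbitrary illustrative threshold."""
    spread = max(stdev(runs_a), stdev(runs_b))
    return abs(mean(runs_a) - mean(runs_b)) <= k * spread
```

Each argument is a list of scores from repeated runs of the same test on one system. With at least two runs per system, a gap that stays inside the spread suggests the two results should be read as a tie rather than a ranking.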
Chess Benchmarking
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and now use the results in our mainstream reviews:
We should note that the 48 cores do not make the EPYC 7552 50% faster than the 32-core EPYC 7502 on all of our tests. There are still TDP and clock limitations that meter the performance per core. At the same time, we do get a notable increase in performance across the chips. The result of this is that if one has a per-core licensed application set, these are not the chips one would want to run on. Instead, the AMD EPYC 7F52 and other “F” parts are more appropriate.
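The 50% figure comes from the core count ratio: 48 cores over 32 cores is 1.5x. A minimal sketch of that arithmetic follows, with hypothetical function names. It assumes scores where higher is better, and any scores passed in are the reader's own measurements rather than figures from this review.

```python
def ideal_speedup(cores_new, cores_base):
    """Speedup if performance scaled perfectly with core count."""
    return cores_new / cores_base


def scaling_efficiency(score_new, score_base, cores_new, cores_base):
    """Observed speedup divided by the ideal core-count speedup.

    1.0 means perfect scaling; lower values mean per-core performance
    dropped, for example because of TDP or clock limits.
    """
    observed = score_new / score_base
    return observed / ideal_speedup(cores_new, cores_base)
```

For example, a 48-core part that scores 1.35x a 32-core part has a scaling efficiency of about 0.9, meaning each core delivers roughly 90% of what the smaller part's cores do on that test.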
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
As a quick note here, since we only have 8 cores, this is not memory limited on either Intel or AMD platforms.
As we can see, the 48 cores help significantly in these scenarios. There is always a case to be made between this as a 48-core part and a higher-end 64-core part if one wants to push virtualization consolidation ratios. At the same time, the performance per core and price per core make the EPYC 7552 very attractive.
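To make the per-core comparison concrete, here is a small sketch that normalizes a VM count and a list price by core count. The `CpuResult` class and its fields are hypothetical, and the numbers in the usage note are placeholders, not measured results or real prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuResult:
    """One CPU's result in a VM consolidation test (illustrative only)."""
    name: str
    cores: int
    price_usd: float
    vms_online: int  # VMs running while still meeting the target SLA

    @property
    def vms_per_core(self):
        return self.vms_online / self.cores

    @property
    def price_per_core(self):
        return self.price_usd / self.cores

    @property
    def price_per_vm(self):
        return self.price_usd / self.vms_online
```

With placeholder inputs such as `CpuResult("example", 48, 4800.0, 96)`, the three properties give 2.0 VMs per core, $100 per core, and $50 per VM. Running the same calculation for a 64-core part shows whether the extra cores pay for themselves at a given consolidation ratio.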
Next, we are going to get into the “so what” and discuss market positioning for the processor before giving our final words.
For real data center work, performance per watt of power consumption is also an important metric.