AMD EPYC 3351 Benchmarks
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results we have been using for years as well as several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable benchmarks, we are adding a number of our workload testing features to the mix as the next evolution of our platform.
At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.
We are going to show off a few results, and highlight a number of interesting data points in this article.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take a standard configuration file and the Linux 4.4.2 kernel from kernel.org, then build the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
Here we see the power of cores. We are actually getting more performance than the AMD EPYC 7232P here. That part is the entry-level in AMD’s mainstream server systems.
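The compiles-per-hour metric used above is a simple conversion from the wall-clock time of one build. Below is a minimal sketch of that arithmetic. The function name and the 120-second sample time are illustrative assumptions, not values taken from the Linux-Bench scripts or from this review.

```python
def compiles_per_hour(elapsed_seconds: float) -> float:
    """Convert the wall-clock time of one kernel compile into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


# Hypothetical example: a build that finishes in 120 seconds (not a measured result).
example = compiles_per_hour(120.0)
print(f"{example:.1f} compiles per hour")  # 30.0 compiles per hour
```

Expressing the result as a rate rather than a time means higher is better, which matches the other charts in this section.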
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
Here the additional core count plus AMD Zen architecture allows this chip to pull well ahead of even the 16-core Intel Xeon D-1587.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Here we can see performance well above the AMD EPYC 3251. Although per-core clocks are moderated compared to the single-die part, we still see nice scaling. Also, one needs to keep in mind that this is not just a core count increase from 8 to 12. It also entails doubling I/O and memory capacity.
NAMD Performance
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. With GROMACS we have been working hard to support AVX-512 and AVX2 architectures. Here are the comparison results for the legacy data set:
Here there is an interesting comparison to be made. The Intel Xeon D-2141I is an 8-core embedded part that has a slightly higher list price than the EPYC 3351. If your appliance can handle more cores, you can get more cores and more I/O at a lower price.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
Here the performance of the chip is just between the Xeon D-2141I and the Intel Atom C3955. As we have noted previously, this tends to be a test that performs very well on the Atom.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. This is an important protocol in many server stacks. We first look at our sign tests:
Here are the verify results:
Comparing this part to the Intel Xeon D-1557, which is a 12-core embedded CPU, one can see that the EPYC 3351 performs better and at a lower cost ($539 vs. $697). The EPYC 3351, although hitting mass production in late 2019, was an early 2018 generation product based on a 2017 core. We see these two chips battling it out in the market. For its part, the D-1557 has much more limited I/O and memory capacity, but at a lower TDP. CPU performance is only part of the equation.
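To put the list-price gap in perspective, here is a small sketch of the arithmetic using only the figures quoted above ($539 and $697, both 12-core parts). The helper names are illustrative. The comparison deliberately ignores TDP, I/O, and memory capacity, which the text notes are also part of the equation.

```python
# List prices and core counts as quoted in the article.
EPYC_3351_PRICE = 539.0
XEON_D_1557_PRICE = 697.0
CORES = 12  # both parts are 12-core


def percent_cheaper(price: float, reference: float) -> float:
    """Return how much cheaper `price` is than `reference`, as a percentage."""
    return (reference - price) / reference * 100.0


def price_per_core(price: float, cores: int) -> float:
    """Return the list price divided by the number of cores."""
    return price / cores


print(f"EPYC 3351 is {percent_cheaper(EPYC_3351_PRICE, XEON_D_1557_PRICE):.1f}% cheaper")  # 22.7% cheaper
print(f"EPYC 3351:   ${price_per_core(EPYC_3351_PRICE, CORES):.2f} per core")    # $44.92 per core
print(f"Xeon D-1557: ${price_per_core(XEON_D_1557_PRICE, CORES):.2f} per core")  # $58.08 per core
```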
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results:
Here we see a fairly significant drop-off from the 16-core parts, which is expected with fewer cores and a lower TDP.
Chess Benchmarking
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and now use the results in our mainstream reviews:
Here we again see what we would expect from the embedded part with a solid level of performance somewhere between the 8 and 16 core embedded parts and closer to an 8-core mainstream socketed server part.
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization-based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
Here we can see scaling just below the 16-core AMD EPYC 3451 but also above the Xeon D-1557 and even the Xeon D-1587 parts. This shows the power of the newer architecture and the extra clock speed, caches, as well as memory bandwidth. These VMs are larger and more resource hungry as they are designed for larger server platforms. Still, we can see these embedded parts able to handle many of these
Next, we are going to discuss market positioning and impact before getting to our final words.
In the test config, shouldn’t the 3351 have only 12 cores?
Can someone just make that exact board. Double stack the SFP+ cages and add a BMC for oob management?
Is this a commercially available board? It looks awesome
> These are $539 list price parts.
hmm. list price for what? for epyc cpu the list price was $450
also, what is the name of this board? what is being tested here?
Andres/alt: this is AMD Wallaby platform as mentioned in the text. I guess it is not commercially available if you do not like over-priced engineering boards made by/for SoC vendor to show stuff to its customers (which are other hardware vendors). Just wait for common vendors (Kontron/Advantech/Supermicro etc.) to add this into their portfolio.
I think I am with Nate77 and alt. Let’s have a commercially available board and let it have 8 10 gig Ethernet ports.
Testing an AMD demo board is all very nice, but XEON D is on the shelves NOW.
alt – that is the 1K list price AMD sent as of a few weeks ago for the EPYC 3351 SoC.
Also, yes, this was done on the Wallaby development platform.
These reviews are always interesting. The embedded AMD parts are making me want to replace my home lab… but it doesn’t seem there are any groups selling them. SuperMicro only has lower end ones. It would be great if as part of these reviews you could reach out to OEMs to see what their ETA for availability is.
@Nate77 @emerth – do you think a software solution can saturate 8 10 Gbit links, while doing something meaningful?
If so, please include some benchmark data.
So where can we buy this???