AMD EPYC 7351P Single Socket CPU Linux Benchmarks and Review

AMD EPYC And Xeon Scalable In Trays

Today we have our AMD EPYC 7351P Linux benchmarks and review. This is part of a much larger series, as many of our longtime readers may have seen with our Intel Xeon Silver 4114 benchmarks earlier this week. Make no mistake, the AMD EPYC 7351P performance is very good. So much so that it is going to make some of our readers feel a bit uncomfortable about purchases they may have been planning to make.

Background: Heavy Legwork to Build a Useful Dataset

At STH, we are working on a major project. We have over $100,000 worth of current generation AMD EPYC and Intel Xeon Scalable CPUs in the lab, with several racks and 6kW in the data center dedicated to the project. We have the CPUs in-house for over 40% of all the single and dual socket AMD EPYC and Intel Xeon Scalable configurations. It is a huge project in which we have already invested over $250,000, and we will be detailing it further soon. Perhaps one of the more interesting areas across all of these different CPUs is AMD EPYC’s single socket parts. There are three such EPYC SKUs, the 7351P, 7401P and 7551P, that are identical to their dual socket counterparts except in two areas. First, they are single socket only and cannot be used in dual socket configurations. Second, they are priced at an enormous discount. Today we are going to publish our first EPYC numbers for a single socket only part, the AMD EPYC 7351P.

Up to this point, the vast majority of benchmarks found online have been ad-hoc, at best, in their comparisons. Running so many servers to generate data sets is expensive, and we have bought CPUs and systems to accelerate our testing schedule. Beyond that, we also have an extremely controlled data center environment where we monitor temperature and humidity, as they are key inputs to overall server performance and power consumption. By scaling up our efforts, we are able to quickly provide a complete comparison set.

For comparison with the AMD EPYC 7351P today, we have other AMD EPYC CPUs in the sub-$1000 price range. We also have the entire Xeon Silver range represented in single and, where applicable, dual socket configurations. These are Intel’s offerings in the sub-$1000 segment (save the Bronze 3104 and 3106 that we already covered). Today is when the industry moves from ad-hoc one-to-one comparisons to actionable comparisons. Our goal is that as we release even more of our giant data set, buyers will be able to make informed decisions looking at incremental price and performance.

AMD EPYC 7251 In Socket And Carrier

Key stats for the AMD EPYC 7351P: 16 cores / 32 threads, 2.4GHz base and 2.9GHz turbo with a whopping 64MB L3 cache. The CPU features a 170W TDP. Here is the AMD product page with the feature set. Here is the lscpu output for the processor:

AMD EPYC 7351p Lscpu

Since the AMD EPYC architecture is going to be new for many, we wanted to provide that CPU feature set output. Although you may see 8MB L3 cache in the lscpu output, the chip actually carries a staggering 64MB of L3 cache in total. That means that this ~$750 CPU has more L2+L3 cache than Intel’s top of the line Xeon Scalable 28 core part. AMD achieves this by using four dies per package instead of Intel’s single-die design, which you can read about in our AMD EPYC and Intel Xeon Scalable Architecture Ultimate Deep Dive.
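To make the cache comparison concrete, here is a rough sketch of the arithmetic. The 16 cores, 64MB L3 total and the 8MB lscpu figure come from the article. The per-core L2 sizes and the 28-core Xeon L3 size are assumptions taken from public vendor specifications, not from our test data, and the function name is purely illustrative.

```python
# Back-of-the-envelope cache totals, in MB.
# From the article: 16 cores, 64MB total L3, 8MB L3 shown by lscpu.
# Assumptions (public vendor specs, not this article): 512KB L2 per EPYC core;
# 1MB L2 per core and 38.5MB L3 on the 28-core Xeon Scalable part.

def total_cache_mb(cores, l2_per_core_mb, l3_total_mb):
    """Return the combined L2 + L3 capacity in MB."""
    return cores * l2_per_core_mb + l3_total_mb

epyc_7351p_mb = total_cache_mb(cores=16, l2_per_core_mb=0.5, l3_total_mb=64.0)
xeon_28c_mb = total_cache_mb(cores=28, l2_per_core_mb=1.0, l3_total_mb=38.5)

# Number of 8MB L3 slices implied by the 64MB total.
l3_slices = 64 // 8

print(f"EPYC 7351P L2+L3: {epyc_7351p_mb} MB")
print(f"28-core Xeon L2+L3: {xeon_28c_mb} MB")
print(f"Implied 8MB L3 slices: {l3_slices}")
```

Under those assumptions the EPYC 7351P comes out at 72MB against 66.5MB for the 28-core Xeon, and the 8MB lscpu figure is consistent with eight separate L3 slices rather than one shared pool.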

Test Configuration

By the end of September, we will have every AMD EPYC SKU tested on a common Tyan EPYC platform and work started on another platform. Here is the base hardware configuration we are using:

  • CPU: AMD EPYC 7351P
  • Server Barebones: Tyan Transport SX TN70A-B8026 (B8026T70AE24HR)
  • RAM: 128GB (8x 16GB) DDR4-2666 RDIMMs (Samsung)
  • SSD: 1x Intel DC S3710 400GB SATA SSD
  • NIC: 1x Mellanox ConnectX-3 Pro EN VPI
Tyan Transport SX B8026T70AE24HR Front And Rear

Key to this system is that it supports 24x U.2 NVMe SSDs without using Broadcom PLX PCIe expanders. That is 96 lanes of PCIe 3.0 directly from a single CPU. One of the key advantages AMD EPYC has is that a single EPYC CPU can use 128x PCIe lanes, the same number as the dual socket configuration. Tyan has responded to this opportunity by offering a single-socket system that can handle 24x NVMe drives and still have I/O available for 10/25/40/50/100GbE.
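The lane math behind that claim can be sketched as follows. The 24 drives and 128 lanes are from the article. The x4 link per U.2 drive is an assumption (it is the usual width for U.2 NVMe and matches the 96-lane figure), and the helper name is hypothetical.

```python
LANES_PER_U2_DRIVE = 4   # assumption: each U.2 NVMe SSD uses a PCIe x4 link
EPYC_TOTAL_LANES = 128   # per the article: lanes available from one EPYC CPU

def lane_budget(drive_count,
                lanes_per_drive=LANES_PER_U2_DRIVE,
                total_lanes=EPYC_TOTAL_LANES):
    """Return (lanes used by drives, lanes left over) for a direct-attach layout."""
    used = drive_count * lanes_per_drive
    if used > total_lanes:
        raise ValueError("drives need more lanes than the CPU provides")
    return used, total_lanes - used

nvme_lanes, remaining_lanes = lane_budget(24)
print(f"NVMe lanes: {nvme_lanes}, remaining for NICs and other I/O: {remaining_lanes}")
```

With these inputs the drives take 96 lanes and 32 remain. Treat that remainder as an upper bound, since a real platform may reserve some lanes for onboard devices.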

Tyan Transport SX B8026T70AE24HR Internal 1

AMD and Tyan originally suggested that we use a Samsung SSD (as pictured). However, to aid consistency, we are using our lab standard Intel DC S3710 400GB SSDs.

AMD EPYC 7401 In Tyan 24 Bay NVMe 2U

In our forthcoming system review, we will have data on every CPU from the AMD EPYC 7251 to the EPYC 7601 for those looking at the system.

21 COMMENTS

  1. I’m in IT procurement for a big F100 company in the US. We’re struggling with AMD to Intel now and our IT staff is trying to quantify for the business case. You may have done their jobs for them. +1+1 for the timely article. I’m emailing them this now

  2. Just what I needed. Your point on completeness is good. It’s also good that you’ve got a consistent setup and enviro for testing. We use SPEC06 for our purchases but it’s tough to use. You know it’s the OEM with the best tuning team getting the best results. I’m also sure that the OEMs have some special cooling that they’re using or something for their runs to get the best #’s. Having AMD and Intel H2H is a value to the IT community. Scripted + same rack = consistent. We know we’re buying nodes. This is helpful for gauging relative performance improvement.

    3 ?’s:
    — Ever consider using less stodgy lingo? It sounds like you’re a professor.
    — You mention a larger data set. I can see you’re running more than you’re reporting here. Are you accepting inquiries to purchase an expanded set?
    — Are these nodes going on DemoEval?

  3. So for power draw.
    7351P max load 340W?
    Dual Xeon Silver 4110 under 200W?
    The 2 seem to have same performance on some tests.
    Is that correct?

    Thank you.

  4. @patrick Do you know if an Epyc P part will run fine as the single processor in a dual socket board? Or it must be installed on a single socket board?

  5. I would like to see how these perform for workloads like medium sized databases (postgres / MySQL) compared to Intel, I suppose these kinds of workloads may suffer a lot more from NUMA than the ones in your testsuite.

  6. That Tyan server looks amazing with all that NVMe connectivity, and the BCM5720 NIC with dual 1G RJ45. We are using the Silver 4110 in a lot of configurations, so I can see the EPYC P-processors as a good alternative, offering greater value, performance and connectivity. Often I go for single-socket, which is advantageous from a licensing perspective, but it often means a compromise on I/O.

    The article mentions “The AMD EPYC 7351P can handle 16x DDR4-2666 DIMMs (8 channel)”. In the presentations I have seen, EPYC only supports DDR4-2666 with 1 DPC, and DDR4-2133 with 2 DPC. Has this been improved with microcode improvements similar to desktop Ryzen?

  7. Could you post some benchmarks under Windows 10 Pro / Windows Server like Cinebench or FFmpeg? I think EPYC processors can be very useful for 3d content production and multimedia production. Thanks in advance and congrats for the reviews!

  8. @Francesco F
    I second that, but understand that Servethehome is focused on server benchmarks.
    And yes EPYC is a wonderful platform for a program like Davinci Resolve which runs mainly on GPU.
    Staxrip is a good program to see how the multi-thread speed is (lot better than handbrake).

  9. A stellar article read this morning. Thanks for doing Xeon D as well. Xeon D costs more for 16c but it’s got a minuscule tdp of 65w for 16c. Intel really needs some cheaper performance parts in the silver line. I’m all for low power but there’s no performance option in Silver or Bronze.

    We don’t need every number but a good set of relative numbers is all we need. This hit the spot. I don’t like seeing places use geekbench or others with user submittals since ya never know how different the systems being tested are or if there’s something else running.

  10. @Misha Engel
    I understand, but some software like V-Ray (Cinebench test), Adobe Media Encoder (FFmpeg test) works primarily on CPU. STH have already done some Cinebench test in the past.

  11. +Franchesco F don’t they have c-ray here and show Cinebench can’t handle high spec nodes in that linked article? Cinebench is worthless on servers now. I’m reading the words in the article and not comprehending your comment? Or you just want to see Cinebench and Windows on a CPU where Windows 10 Pro will occupy 0.00000000001% of the install base?

    I’d like 24c results.

  12. Seems my reply got stuck, anyway well done, nice benchmarks!
    Over at phoronix they also did some tests with numactl which essentially took a single 7601 epyc ahead of a dual 6137 gold server, which is quite impressive.
    I really had high hopes for the new xeons but cutting away the avx units drove me to order some new broadwell nodes.
    As soon as there is a quad node in 2U server for epyc available, we will switch to epyc.

  13. Wow this is the best. you’ve got so much comparison data for a site that isn’t aggregating user data. i’m seeing like every server cpu combo under $1K. i like the c3958 and c3955 and d-1567 too for other 16 core intel. it appears cheap by comparison. good job amd too on delivering cheap parts with cores and performance

  14. @Misha
    it seems like SuperMicro is working on something: H11DST-B
    But it is too late for us this year. What was nice, even our distributor preferred the broadwell over the silver cpus.
    Do you happen to know a good MKL alternative? I know ATLAS and openBLAS, but had no time to test it against MKL, the biggest problem will be a good fortran compiler, I guess IFORT still beats gfortran by a large margin

  15. @Nino
    It depends mostly on the size of the data set. If it fits completely in RAM then only writes are i/o bound, although this might be interesting by itself since of course I/O also suffers from NUMA, especially with fast NVMe drives.
