A project we work on with every new processor generation is building a comparative dataset around intra- and intergenerational CPU performance. Today we are providing some initial benchmarks of the Quad Intel Xeon Platinum 8180 configuration, since we wanted to start with the top end of the Intel Xeon Scalable Processor Family. Over the next few days, we will release plenty of additional data. We are also working to run some of our larger benchmarks, along with a comparison to AMD EPYC when those systems are ready. While you can buy this Quad Intel Xeon Platinum 8180 configuration today (July 11, 2017) from several vendors, AMD EPYC is weeks or months away from availability.
Since this is STH after all, we wanted to get some numbers up for launch week.
Test Configuration
Our test configuration used an Intel platform.
- CPU(s): 4x Intel Xeon Platinum 8180 28 core/ 56 thread CPUs (112 cores/ 224 threads total) with 2.5GHz base and 3.7GHz turbo clocks, 38.5MB L3 cache each
- Platform: Intel S4PR1SY2B
- RAM: 768GB in 24x SK Hynix 32GB DDR4-2666 2Rx4 DIMMs
- OS SSD(s): 1x Intel DC S3700 400GB
- OSes: Ubuntu 14.04 LTS, Ubuntu 17.04, CentOS 7.2
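For readers who want to sanity-check the totals in the list above, here is a minimal sketch of the arithmetic. The constant and function names are ours, chosen for illustration; the values come straight from the configuration list.

```python
# Values taken from the test configuration list above.
SOCKETS = 4
CORES_PER_CPU = 28
THREADS_PER_CORE = 2   # 56 threads on each 28 core CPU
DIMM_COUNT = 24
DIMM_SIZE_GB = 32
L3_PER_CPU_MB = 38.5


def system_totals():
    """Return system-wide totals derived from the per-CPU figures."""
    total_cores = SOCKETS * CORES_PER_CPU
    return {
        "cores": total_cores,
        "threads": total_cores * THREADS_PER_CORE,
        "ram_gb": DIMM_COUNT * DIMM_SIZE_GB,
        "dimms_per_socket": DIMM_COUNT // SOCKETS,
        "l3_mb": SOCKETS * L3_PER_CPU_MB,
    }


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value}")
```

The six DIMMs per socket line up with the six memory channels on each of these CPUs, so this configuration populates one DIMM per channel.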
This is the platform that Intel sent us for review. For those wondering, the max power we saw on this system was 1336W on our 208V lab racks.
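If you are planning rack power, the peak figure above can be turned into an approximate current draw. This is a rough sketch only: it assumes a single-phase circuit and a power factor of 1.0, neither of which we measured, and the function name is ours.

```python
def line_current_amps(power_watts, line_volts, power_factor=1.0):
    """Approximate current draw: I = P / (V * PF).

    power_factor=1.0 is an assumption for illustration, not a measured value.
    """
    if line_volts <= 0 or not 0 < power_factor <= 1:
        raise ValueError("line_volts must be > 0 and power_factor in (0, 1]")
    return power_watts / (line_volts * power_factor)


if __name__ == "__main__":
    # 1336W peak on a 208V circuit, as observed in our lab.
    amps = line_current_amps(1336, 208)
    print(f"Approximate peak draw: {amps:.2f} A")
```

Under those assumptions the peak works out to roughly 6.4 A on a 208V circuit.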
The numbers we have should be comparable to what you will see with a quad Intel Xeon Platinum 8180M setup, as the clock speeds are the same. The one slight difference is that one may use a different memory configuration with that SKU, which may have a minor impact on performance.
Quad Intel Xeon Platinum 8180 Benchmarks
For our testing, we are using Linux-Bench scripts which help us see cross platform “least common denominator” results. We are using gcc due to its ubiquity as a default compiler. One can see details of each benchmark here. We are already testing the next-generation Linux-Bench that can be driven via Docker and uses newer kernels to support newer hardware. The next generation benchmark suite also has an expanded benchmark set that we are running regressions on. For now, we are using the legacy version that now has over 100,000 test runs under its belt.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we have a standard configuration file, the Linux 4.4.2 kernel from kernel.org, and make utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
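To make the compiles per hour metric concrete, here is a minimal sketch of how a timed build could be converted into that figure. This is not the Linux-Bench script itself: the function names and the plain `make -jN` invocation are assumptions for illustration.

```python
import os
import subprocess
import time


def compiles_per_hour(elapsed_seconds):
    """Convert the wall-clock time of one kernel build into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def make_command(threads=None):
    """Build a make invocation with one job per hardware thread (assumed)."""
    jobs = threads or os.cpu_count() or 1
    return ["make", f"-j{jobs}"]


def time_kernel_build(source_dir, threads=None):
    """Time one build in an already configured kernel source tree."""
    start = time.perf_counter()
    subprocess.run(make_command(threads), cwd=source_dir, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Example: a build that takes 90 seconds equals 40 compiles per hour.
    print(compiles_per_hour(90.0))
```

Expressing the result as a rate means higher is better, which keeps these charts reading in the same direction as the other benchmarks.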
The quad Intel Xeon Platinum 8180 setup is an absolute beast here. It easily bests the fastest Broadwell system we tested, the quad Intel Xeon E7-8890 V4 Dell PowerEdge R930.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.
C-ray 1.1 generally fits within caches so it will scale just about as fast as one can throw cores at it. Here is the key takeaway: we are adding an 8K resolution to the next batch of testing. The quad Intel Platinum machines obliterate what we had as a “hard” test in 2012. What a difference five years make.
7-zip Compression Performance
7-zip is a widely used compression/ decompression program that works cross platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
In terms of compression performance, one can see that the quad Xeon Platinum machines perform well as is expected. There is a nice bump over the Xeon E7-8890 V4 generation.
NAMD Performance
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. We are going to augment this with GROMACS in the next-generation Linux-Bench and are building a dataset of those results for future publication.
In terms of NAMD, it is not a huge surprise that the fastest chips perform the best. We added the 8x Intel Xeon E7-8870 V2 results in there just to show the impact for those looking to upgrade. One can get by with half of the CPU sockets of just three generations ago.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
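For those unfamiliar with this kind of test, the sysbench CPU test is generally understood to exercise the processor by checking numbers for primality up to a configurable limit. The sketch below illustrates that style of workload in Python. It is our own illustration with hypothetical function names, not sysbench code, and its timings are not comparable to sysbench results.

```python
import math
import time


def count_primes(max_prime):
    """Count primes from 3 up to max_prime using trial division."""
    count = 0
    for candidate in range(3, max_prime + 1):
        is_prime = True
        # Only divisors up to the square root need to be checked.
        for divisor in range(2, math.isqrt(candidate) + 1):
            if candidate % divisor == 0:
                is_prime = False
                break
        if is_prime:
            count += 1
    return count


def events_per_second(max_prime, events):
    """Run the prime search `events` times and report the rate."""
    start = time.perf_counter()
    for _ in range(events):
        count_primes(max_prime)
    elapsed = time.perf_counter() - start
    return events / elapsed


if __name__ == "__main__":
    print(f"{events_per_second(10000, 20):.1f} events/s on one thread")
```

Because each event is independent and needs almost no memory, running one copy per thread scales close to linearly with thread count, which is why high core count systems do so well in this kind of test.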
We left in an older Nehalem result in this set along with some Sandy Bridge results. If you were using a lower-end quad E5-4620 V1 machine, we are at the point where you can consolidate multiple machines onto dual socket Xeon Platinum and Gold.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. It is an important library in many server stacks. We first look at our sign tests:
Here are the verify results:
The bottom line here is simple: we are seeing a fairly massive improvement in OpenSSL speeds. These tests were done without using an onboard Intel QuickAssist PCH. In this generation, PCH capabilities can greatly enhance some OpenSSL performance.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Among our longest running tests are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone benchmarks. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so it is a good comparison point.
And Whetstone:
We left the single thread results in here just to show how comical it is getting. Remember, the quad Intel Xeon Platinum 8180 system is a 224 thread machine. At some point, single threaded performance matters, but TCO business cases are going to be made largely on consolidation. Having more threads helps.
Single Redis Instance Benchmarks
We unleash a single Redis instance for these benchmarks and generate set/ get requests against the instance. This is more of a frequency plus memory bandwidth bound workload rather than a CPU speed bound result.
Lots of memory bandwidth helps the Platinum 8180 stay on top of these results. The speeds are fairly well grouped. We will be using this as a base for one of our multi-application Docker tests in the next-generation Linux-Bench.
Inter-Socket Latency with Intel Memory Latency Checker
We did want to touch upon one hot topic, especially in light of our recent NUMA piece with Intel and AMD. We are going to have more on this soon with dual socket results and AMD EPYC results. Here is a teaser of what one can expect:
Putting this into perspective, Intel's inter-socket idle latency is actually better than what our AMD EPYC 7601 system with DDR4-2400 is currently delivering. That is a phenomenal result and will help explain some of the performance findings we have later.
Final Words
Overall, this system is a beast, and it should be. The list price for this configuration is likely around $50,000 and up, so it is not inexpensive. On the other hand, if you need a scale up node, perhaps due to licensing costs, this system is hard to beat. Even with the 28 core die Intel is able to raise the TDP of the chips to 205W and maintain very respectable clock speeds.
In the near future, we are going to have several other performance data points on some of the larger applications we test. The above should provide at least a relative sense of performance on Intel’s top end 4-socket Skylake-SP configuration.
You can read more about the new chips at our Intel Xeon Scalable Processor Family launch coverage headquarters.
Just btw.. the 8180M is a Platinum Xeon which means it should have dual AVX512 units per core – and the above benchmarks probably don’t even support AVX512 instructions, 256-bit wide AVX2 at the most. When they do, this thing should really fly (at least the non-QA OpenSSL, NAMD, C-Ray should).
How does it compare to the EPYC line?
Well, first off, you cannot yet buy EPYC servers and you can buy the Platinum 8180 today.
We are waiting until we have final production firmware on one of the commercial EPYC systems we have in the lab before publishing. We are not publishing EPYC numbers based on an AMD supplied unit since you cannot buy one of those on the open market.
Note that for high end servers which require large amounts of RAM, the primary difference between the US$10,000 Platinum 8180 and the US$13,000 Platinum 8180M is that the 8180M can support a maximum of 1.5 TB of RAM per CPU.
I don't really support such heavy market segmentation, but here is some info which you may find helpful.
- M series: extra high memory capacity
- Xeon Platinum: allows up to octa CPU, DDR4-2666
- Xeon Gold: allows up to quad CPU, DDR4-2400
- Xeon Bronze/Silver: allows up to dual CPU, DDR4-2133
Also note that Platinum/Gold has 2 AVX512 FPUs per core while Silver/Bronze only has one AVX512 FPU per core.
What a mess this is. Intel thinks they are running a CPU olympics :)
Most people would be buying Xeon Silver/Bronze/i9/i7 since they are far cheaper, and since they have fewer but more powerful cores, they are more economical for per core licenses.
A Xeon Gold 6134 at 18C 36T 3.20GHz Base clocks would be the best choice. High base clocks per core, and still has many cores.
Hope this helps.
Koh_GT
Hi Koh_GT – that is mostly accurate. There are a few items such as Gold 6100 series supports DDR4-2666 and the Silver series supports DDR4-2400 that readers should be aware of.
We have the most complete set of technical information and benchmarks of the new CPUs which you can find centrally located via our Intel Xeon Scalable Processor Family Coverage Central.
What motherboard was used for quad CPU support for all four Platinum 8180s?
I’d like to know the motherboard too. I’ve no idea why such an important component would be left out from a review.
Hi Patrick,
Thanks for this review.
Quad E7-8870 V2 is present twice in the NAMD chart, with different values.
We are not publishing EPYC numbers based on an AMD supplied unit since you cannot buy one of those on the open market.
Tec Industrial – that was July 2017. Quite a few things have changed since then.