Quad Intel Xeon Platinum 8276L Benchmarks
For this exercise, we are using our legacy Linux-Bench scripts which help us see cross-platform “least common denominator” results we have been using for years as well as several results from our updated Linux-Bench2 scripts. Starting with our 2nd Generation Intel Xeon Scalable benchmarks, we are adding a number of our workload testing features to the mix as the next evolution of our platform.
At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.
We are going to show off a few results, and highlight a number of interesting data points in this article.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we have a standard configuration file and the Linux 4.4.2 kernel from kernel.org, and we make the standard auto-generated configuration utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
Here we see the quad Intel Xeon Platinum 8276L just about where we would expect with a slight speed bump over the Xeon Platinum 8176.
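For readers who want to reproduce the compiles-per-hour figure from their own timings, the conversion is simple arithmetic. The following is a minimal sketch; the function name and the 45-second example timing are illustrative assumptions, not STH tooling or an STH measurement.

```python
def compiles_per_hour(seconds_per_compile: float) -> float:
    """Convert the wall-clock time of one kernel build into compiles per hour."""
    if seconds_per_compile <= 0:
        raise ValueError("compile time must be positive")
    return 3600.0 / seconds_per_compile


# Hypothetical timing for illustration only, not an STH result.
example_seconds = 45.0
print(f"{compiles_per_hour(example_seconds):.1f} compiles per hour")
```

Expressing the result as a rate means a higher bar on the chart is better, which matches how most of the other results in this article read.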
Although our tests have evolved since STH was doing Intel Xeon E7 testing, here is a very interesting comparison to keep in mind:
The Intel Xeon Platinum 82xxL parts, like the Intel Xeon Platinum 8276L we have here, mark the first time that Intel has offered a maximum RAM increase in this segment since the 2017 Intel Xeon E7-88xx series.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
We did not have the c-ray 8K test when we did our Intel Xeon E7 testing such as with the Dell PowerEdge R930.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
We wanted to provide a chart “de-noised” from previous generations. Here one can see a quick comparison with nice scaling between the quad Intel Xeon Platinum 8276L, quad Intel Xeon Platinum 8260, and quad Intel Xeon Gold 6242.
NAMD Performance
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here. We are going to augment this with GROMACS in the next-generation Linux-Bench in the near future. With GROMACS, we have been working hard to support both Intel’s Skylake AVX-512 and the AVX2-supporting AMD Zen architecture. Here are the comparison results for the legacy data set:
Here we see solid scaling putting the quad Intel Xeon Platinum 8276L between the quad Intel Xeon Platinum 8180 and Platinum 8176 figures.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. This is an important protocol in many server stacks. We first look at our sign tests:
Here are the verify results:
Here we see great performance. The Intel Xeon E7-8890 V4 is almost a predecessor in the Intel Xeon Platinum 8276 swim lane with around the same price tag. That Intel Xeon E7-8870 V4 supports 3TB of memory, which is more than the Intel Xeon Platinum 8276. The newer CPUs get more cores and higher clock speeds, which improve performance significantly.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Some of the longest-running tests at STH are the venerable UnixBench 5.1.3 Dhrystone 2 and Whetstone results. They are certainly aging; however, we constantly get requests for them, and many angry notes when we leave them out. UnixBench is widely used so we are including it in this data set. Here are the Dhrystone 2 results:
Here are the Whetstone results:
These benchmarks are absolutely too old for a modern quad-socket system. However, we are simply going to present the data. Perhaps one of the more interesting aspects is that they are still scaling. Many virtualized workloads are legacy applications optimized years ago, if at all.
GROMACS STH Medium AVX2/AVX-512 Enabled
We have a small GROMACS molecule simulation we previewed in the first AMD EPYC 7601 Linux benchmarks piece. In Linux-Bench2 we are using a “medium” test across quad socket nodes. Our GROMACS test will use the AVX-512 and AVX2 extensions if available.
We very rarely use our medium case since our “small” case tends to work very well across the range of single and dual CPU configurations we test. Here, one can see very strong performance.
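Since the GROMACS test uses AVX-512 or AVX2 when available, it can be useful to confirm what a given Linux host actually exposes before comparing numbers. The following is a rough sketch that reads the CPU flags from /proc/cpuinfo; the helper names are hypothetical, and this is not how GROMACS itself selects its SIMD kernels, only a quick host check. It assumes an x86 Linux system where the `flags` line is present.

```python
def simd_level(cpu_flags: str) -> str:
    """Return the widest SIMD level advertised in a space-separated flag string."""
    flags = set(cpu_flags.split())
    if "avx512f" in flags:  # AVX-512 Foundation, the base AVX-512 feature flag
        return "AVX-512"
    if "avx2" in flags:
        return "AVX2"
    return "baseline"


def read_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Read the first 'flags' line from /proc/cpuinfo (x86 Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


# Example with a made-up flag string rather than a real host.
print(simd_level("fpu sse2 avx avx2"))
```

On a real system, `simd_level(read_cpu_flags())` reports the level for the first logical CPU listed.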
Chess Benchmarking
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
Here we have fewer results again simply because this was added later than some of our other tests. The quad Intel Xeon Platinum 8276L configuration again performed well here.
STH STFB KVM Virtualization Testing
One of the other workloads we wanted to share is from one of our DemoEval customers. We have permission to publish the results, but the application itself being tested is closed source. This is a KVM virtualization based workload where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker.
There is an interesting result here. Even though the workload is mostly CPU bound, we see an ever so slight departure, consistent between runs, at the “small” VM sizes due to using pure DDR4 versus DDR4 + Intel Optane DCPMM in the quad Intel Xeon Platinum 8276L configuration.
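The metric for this workload, the number of VMs that can be online while work still completes under the target SLA, can be sketched as a simple selection over measured runs. This is only an illustration: the function name, the data layout, and the numbers below are hypothetical, and the customer's actual application and SLA are not public.

```python
def max_vms_under_sla(runs, sla_seconds):
    """Return the largest VM count whose completion time met the SLA.

    runs: iterable of (vm_count, completion_seconds) pairs.
    Returns 0 if no run met the SLA.
    """
    passing = [vms for vms, seconds in runs if seconds <= sla_seconds]
    return max(passing, default=0)


# Made-up measurements for illustration only.
example_runs = [(8, 40.0), (16, 52.0), (32, 61.5), (64, 75.0)]
print(max_vms_under_sla(example_runs, sla_seconds=60.0))
```

With these made-up numbers the 32 and 64 VM runs miss the 60-second target, so the reported figure would be 16.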
The company also has a CPU-light back-end workload that is mostly dependent on Redis performance and memory capacity with less of a CPU stressor.
Since we are looking at the quad Intel Xeon Platinum 8276L here, we wanted to show off the impact of the “L” in terms of memory capacity with Intel Optane DCPMM. One can see much higher utilization in the larger VM sizes mostly driven by larger memory capacity. Using DDR4 only, the results are much closer.
Next, we are going to discuss market positioning before our final words.
Hi,
When you are talking about STH budgets and pricing, are you suggesting you are actually purchasing these systems retail just to review them?
Shouldn't Intel (or the other brands) be providing you with samples?
Navi
Hi Navi – we purchase six figures worth of hardware each year in addition to what vendors supply for reviews. It takes a lot to do this testing. Just the data center costs we have are well over $50K annually using low cost providers.