Gigabyte E251-U70 Performance
For this exercise, we are using our legacy Linux-Bench scripts, which give us the cross-platform “least common denominator” results we have been using for years, along with several results from our updated Linux-Bench2 scripts.
At this point, our benchmarking sessions take days to run and we are generating well over a thousand data points. We are also running workloads for software companies that want to see how their software works on the latest hardware. As a result, this is a small sample of the data we are collecting and can share publicly. Our position is always that we are happy to provide some free data but we also have services to let companies run their own workloads in our lab, such as with our DemoEval service. What we do provide is an extremely controlled environment where we know every step is exactly the same and each run is done in a real-world data center, not a test bench.
We are going to show off a few results and highlight a number of interesting data points in this article.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. The task is simple: we take the Linux 4.4.2 kernel from kernel.org and build it with the standard auto-generated configuration from our standard configuration file, utilizing every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read:
We wanted to get a good sense of the range of performance depending on the configuration. We also wanted to test 205W TDP chips in the platform such as the Intel Xeon Gold 6258R.
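As a rough illustration of the compiles-per-hour metric used above, here is a minimal sketch of how a single build's wall-clock time could be converted into that figure. This is not the Linux-Bench script itself; the function names and the sample timings are hypothetical, and the `make -jN` invocation is only an assumption about how "every thread in the system" might be used.

```python
import os


def make_command(threads=None):
    """Build a make invocation that uses every hardware thread.

    Assumption: the benchmark drives the build with `make -jN`.
    The actual Linux-Bench invocation is not shown in the article.
    """
    if threads is None:
        threads = os.cpu_count() or 1
    return ["make", f"-j{threads}"]


def compiles_per_hour(elapsed_seconds):
    """Convert one kernel build's wall-clock time into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical timings for illustration only, not measured results.
    for seconds in (90.0, 120.0, 240.0):
        rate = compiles_per_hour(seconds)
        print(f"{seconds:.0f} s per build -> {rate:.1f} compiles/hour")
```

Expressing the result as a rate means a higher number is better, which matches how the other charts in this section read.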
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads. We are going to use our 8K results which work well at this end of the performance spectrum.
We have two pre-refresh parts on this chart. A tip we have is to use SKUs from the big 2nd Gen Intel Xeon Scalable Refresh. They are much lower cost than the original 2019 2nd Generation Intel Xeon Scalable SKUs. One will not miss the lack of a UPI link since this is a single-socket platform. Also, with only 8x DIMM slots, there are few configurations that will require high-memory SKUs.
We should mention that there are “U” SKUs in Intel’s lineup that are single-socket optimized and directly aimed at AMD’s “P” series SKUs. Those are ideal in a platform like this, but we tend to have the non-U variants just so we can test single and dual-socket configurations.
7-zip Compression Performance
7-zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Even though the Gigabyte E251-U70 is designed for efficient GPU to NIC communication, we would still suggest getting a higher-end part than the Xeon Bronze series. The Gold 622xR and Gold 52xxR parts are good matches for this platform.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. This is an important component in many server stacks. We first look at our sign tests:
Here are the verify results:
Here we are testing crypto on the CPU. In many of the configurations where one will use this system, the add-in cards will handle cryptographic offload. Still, this gives us some sense of scaling performance.
Chess Benchmarking
Chess is an interesting use case since it has almost unlimited complexity. Over the years, we have received a number of requests to bring back chess benchmarking. We have been profiling systems and are ready to start sharing results:
To us, the Xeon Scalable Refresh has made a lot of the Xeon Gold 6000R series very accessible. We did not get to test the U series, but there is an alternative to the Xeon Gold 6226R shown above, which is a $1300 list price SKU. The Intel Xeon Gold 6208U is a $989 part with identical 16 cores and clock speeds. We think the Gold 6208U is a great match (at the Gold 6226R performance level) for this server.
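To put the two list prices quoted above side by side, here is a small sketch of the arithmetic. The function name is hypothetical, and the only inputs are the $1300 and $989 list prices from the text; street pricing will differ.

```python
def list_price_savings(baseline, alternative):
    """Return (dollars saved, percent saved) when choosing the cheaper SKU."""
    saved = baseline - alternative
    return saved, saved / baseline * 100.0


if __name__ == "__main__":
    # List prices quoted in the article: Gold 6226R vs. Gold 6208U.
    saved, pct = list_price_savings(1300.0, 989.0)
    print(f"Gold 6208U saves ${saved:.0f} ({pct:.1f}%) versus the Gold 6226R")
```

Since both parts have the same core count and clock speeds, that difference comes without an expected performance penalty in a single-socket platform.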
Overall, we are late in the LGA3647 cycle with the socket first launched in mid-2017. As a result, we know what to expect with a socket like this.
Next, we are going to move to our power consumption, server spider, and final words.
The combination of a powered riser card with PCIe switch, rather than a passive riser using some higher-density proprietary edge connector, and a bunch of fully populated but mechanically unusable PCIe slots interests me. I definitely wouldn’t expect to see an active PCIe switch dragged in to a cost-optimized design that isn’t starved for CPU-provided lanes; and when pennies really need to be pinched even having the connectors populated isn’t a given.
Do you know if this system just shares a motherboard with one or more other Gigabyte units that require all the slots and it was cheaper to avoid SKU proliferation than it was to cut the redundant headers; or if there is a variant of the riser card module that just provides a bunch of half-height slots? The one slot closest to the RAM looks like it’s blocked by non-removable chassis metal; but the rest of them certainly look like they could be configured as half-height slots with an appropriate rear plate slotted in.
Who is this “one” I keep seeing in all STH articles? Neo?