Today we have benchmarks of dual Intel Xeon E5-2697 V4 processors. Each Intel Xeon E5-2697 V4 has 18 cores, 36 threads and 45MB of L3 cache. With 2.3GHz base clock speeds, these chips are very similar to the E5-2699 V3 processors which were top of the line in the last generation. We have already published Intel Xeon E5-2699 V4 Benchmarks and Intel Xeon E5-2698 V4 Benchmarks, so now it is time to take a look at the third highest model. Of note, Intel also has an E5-2697A V4 which is a 16 core / 32 thread part. Despite the similar naming, we are benchmarking the 18 core parts in a dual socket configuration.
Test Configuration
Our test platform was a standard EATX motherboard upgraded for Xeon E5 V4 support via a simple BIOS upgrade. This is one of the NVMe servers we use in the Fremont colocation that we brought offline and upgraded to the V4 part.
- CPU: Intel Xeon E5-2697 V4
- Chassis: Intel R2208WTTYS 2U
- Memory: 64GB – 4x Samsung 16GB DDR4 2400MHz ECC RDIMMs
- SSD: 1x Intel DC S3700 400GB, 4x Intel DC P3600 800GB
- Operating System: Ubuntu 14.04.3 LTS
As another note, we tried picking some interesting comparisons out of our data set and did have legacy E5-2697 V2 and V3 information for most of the benchmarks we run.
Intel Xeon E5-2697 V4 Benchmarks
For our testing we are using Linux-Bench scripts which help us see cross platform “least common denominator” results. We are using gcc due to its ubiquity as a default compiler. One can see details of each benchmark here. We are likely going to update Linux-Bench in the near future with a few new tests as well as a revision that is simpler to use and faster, but for now, we are using our old Ubuntu 14.04.3 LTS version.
Python Linux 4.4.2 Kernel Compile Benchmark
This is one of the most requested benchmarks for STH over the past few years. We (finally) have a Linux kernel compile benchmark script that is consistent. Expect to see this functionality migrate into Linux-Bench soon (we are just awaiting the parser work on it). The task is simple: we take a standard configuration file and the Linux 4.4.2 kernel from kernel.org, then run make with every thread in the system. We are expressing results in terms of compiles per hour to make the results easier to read.
As you can see, the dual Xeon E5-2697 V4 system is the fourth fastest on our charts and likely similar to what we would see if we were able to re-test the E5-2699 V3 on our new benchmark. Unfortunately, we did not have a set of the older CPUs available for re-testing the Linux kernel compile benchmark.
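As a rough illustration of the "compiles per hour" metric, here is a minimal Python sketch. It is not the STH benchmark script: the function names `compiles_per_hour` and `time_kernel_build` are hypothetical, and it assumes an already configured kernel source tree and a build driven by `make -jN` with N equal to the number of hardware threads.

```python
import os
import subprocess
import time
from typing import Optional


def compiles_per_hour(elapsed_seconds: float) -> float:
    """Convert the wall-clock time of one full build into compiles per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 3600.0 / elapsed_seconds


def time_kernel_build(source_dir: str, jobs: Optional[int] = None) -> float:
    """Time one `make -jN` run in source_dir and return the elapsed seconds.

    Assumption: source_dir already holds a configured kernel tree (.config present).
    """
    jobs = jobs or os.cpu_count() or 1  # default to every thread in the system
    start = time.perf_counter()
    subprocess.run(["make", f"-j{jobs}"], cwd=source_dir, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # A build that takes 90 seconds corresponds to 40 compiles per hour.
    print(compiles_per_hour(90.0))
```

Expressing the result as a rate rather than a duration means higher is better, which matches how the other charts read.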
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular to show differences in processors under multi-threaded workloads.
The Intel Xeon E5-2697 V4 parts are so fast that we see similar performance to the E5-2699 V3 parts, which were also 18 core / 36 thread parts.
7-zip Compression Performance
7-zip is a widely used compression / decompression program that works cross platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
Compression is a major operation we see in today’s workloads and is also highly threaded. We omitted the Cavium ThunderX 48 core result, as we explained in our 96 core Cavium ThunderX benchmark piece. The Intel Xeon E5-2697 V4 does show a continued trend towards IPC improvements in these benchmarks.
NAMD Performance
NAMD is a molecular modeling benchmark developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. More information on the benchmark can be found here.
Somewhat surprisingly, we find the Intel Xeon E5-2697 V4 performs better than several of the older configurations we have tested. Performance was very good on this complex benchmark.
Sysbench CPU test
Sysbench is another one of those widely used Linux benchmarks. We specifically are using the CPU test, not the OLTP test that we use for some storage testing.
We sorted this chart on the multi-threaded results. In practice, that means sorting instead by the blue bars, which represent single threaded performance, would change the ranking. The single threaded results are bounded in a fairly tight range because these are all Intel CPUs, mostly ranging between the Sandy Bridge and Broadwell architectures.
OpenSSL Performance
OpenSSL is widely used to secure communications between servers, which makes it an important component in many server stacks. We first look at our sign tests:
Moving to the verify results:
At this point we see a fourth place finish for the E5-2697 V4. However, we do notice this is an area where the V4 architecture is strong, and there is a relatively modest gap between the E5-2697 V4 and the more expensive E5-2699 V4 parts.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Of course, these chips are not meant for heavy compute but we pick out the UnixBench 5.1.3 Dhrystone 2 and Whetstone results to show some of the raw performance they are capable of. UnixBench is widely used so it is a good comparison point.
Here are the single threaded workloads:
As you can see, the results are relatively tight on our single threaded benchmarks likely due to the
Now the E5 V4’s sweet spot, the multi-threaded workloads:
In multi-threaded tasks, the Intel Xeon E5-2697 V4 performs near the top of the pack. There is very little reason to upgrade from Intel Xeon E5-2600 V3 to V4 parts. However, one can see massive gains over the E5-2670 V1 parts, which were mainstream in their day, to the point that the E5-2697 V4 provides close to 3x the floating point performance. A 3x performance gain starts to make a compelling case for an upgrade.
Conclusion
With the new E5 V4 CPUs, we can see substantial performance gains with the higher end parts. Starting with the E5-2697 V4, we can see modest performance gains over the similar core count V3 parts (e.g. the E5-2699 V3). However, the E5-2697 V4 parts are going to be much more accessible in terms of price and availability. Intel has the luxury of being able to turn off 6 of the 24 cores on the HCC (high core count) die, or 25% of what the silicon has. This means the E5-2697 V4 is going to be a high yield part for Intel. With the E5-2699 V3, we saw fewer chips on the market due to yield constraints. We expect the E5-2697 V4 to become widely adopted this generation.
You can find more STH Xeon E5 V4 coverage here:
- Intel Xeon E5-2600 V4 Line-up and Architectural Overview
- Intel Xeon E5-2699 V4 Benchmarks
- Intel Xeon E5-2698 V4 Benchmarks
- Intel DC P3700 and D3600 dual port NVMe
- Intel DC P3520 and DC P3320 NVMe SSDs
Subscribe to STH to get the latest benchmarks and platform reviews as they are published. We have a huge backlog of content coming.
The E5-2697 v4 per core performance improvement over a v1 is not breathtaking. It is sometimes faster per core and is sometimes slower per core.
Core count ratio, 2697v4 / 2670v1 (dual socket): 36 cores / 16 cores = 2.25
The best comparison is OpenSSL, 2697v4 / 2670v1: 4500 / 1600 = 2.81
That is a 25% improvement per core (over the 2.25 ratio).
7-zip is about 4% slower per core: 142000 / 66000 = 2.15
Linux compile scaled with the cores, 2.25 (per socket, 2697v4 / 2670v1: 18 / 8 = 2.25).
The benchmark numbers above may be off a bit; they were eyeballed from the article charts.
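The per-core arithmetic above can be reproduced with a short Python sketch. The helper name `per_core_change` is hypothetical, and the scores are the same eyeballed chart values quoted in this comment, not measured data.

```python
def per_core_change(new_score: float, old_score: float,
                    new_cores: int, old_cores: int) -> float:
    """Return the fractional per-core change of the new system versus the old.

    The raw score ratio is divided by the core count ratio, so 0.0 means
    the score scaled exactly with the number of cores.
    """
    score_ratio = new_score / old_score
    core_ratio = new_cores / old_cores
    return score_ratio / core_ratio - 1.0


# Dual socket core counts: 2x 18 core E5-2697 V4 versus 2x 8 core E5-2670 V1.
CORES_V4, CORES_V1 = 36, 16

# Scores eyeballed from the article charts (approximate).
openssl = per_core_change(4500, 1600, CORES_V4, CORES_V1)
sevenzip = per_core_change(142000, 66000, CORES_V4, CORES_V1)

print(f"OpenSSL per core: {openssl:+.1%}")  # prints +25.0%
print(f"7-zip per core: {sevenzip:+.1%}")   # prints -4.4%
```

Dividing by the core ratio only isolates per-core throughput; it does not account for clock speed differences between the two parts.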
It would be good to see operations / watt.