Cavium just announced that its newest 64-bit Arm CPU, the ThunderX2, has hit general availability. At STH, we have benchmarks and a full review ready, but we have not yet been given the OK to publish them. The piece is comprehensive, at over 5,000 words with dozens of tests and images. The company’s press release just hit the wire, so we wanted to cover that and highlight a few points in the meantime.
The Cavium ThunderX2 family is really interesting. Instead of targeting lower-performance workloads like other Arm offerings, the ThunderX2 is designed to be a high-performance chip. The partners named in the official press release are Atos, Cray, and HPE: in short, HPC shops. The new chips have 8-channel memory, up to 4-way SMT, and dual-socket configurations. With up to 32 cores per socket, that means you can get up to 256 threads per server.
We go into the history of Cavium ThunderX2 in our full review, but we can say this: the new generation is on par with the AMD EPYC 7000 series and Intel Xeon Scalable in terms of performance, while topping out at a $1795 price tag.
Cavium ThunderX2 Key Specs
Here are the key specs from the press release:
- Single chip system on a chip (SoC) server CPU
- Core and socket level performance comparable to highest end Xeon Skylake Platinum CPUs
- Second generation of full custom Cavium Arm core
- Quad Issue, Fully Out of Order
- Full SMT support – 1, 2, 4 threads per core
- Up to 2.5 GHz in normal mode, up to 3 GHz in Turbo mode
- 3X single thread performance compared to ThunderX®
- Up to 32 cores per socket delivering > 2.5-3X socket level performance compared to ThunderX
- Cache:
  - 32 KB L1 instruction and data cache, 256 KB L2 per core
  - 32 MB distributed L3 cache
- Advanced server class RAS features covering memory, CPU, cache, CCPI2 and PCIe interfaces
- Advanced power management:
  - On-chip management engine for dynamic voltage and frequency scaling across the chip
  - Full Turbo mode support
- Single and dual socket configuration support using 2nd generation of Cavium Coherent Interconnect with > 2.5X coherent bandwidth compared to ThunderX
- System Memory:
  - 8 DDR4 memory controllers per socket
  - Dual DIMM per memory controller, for a total of 16 DIMMs per socket
  - Up to 4 TB of memory in dual socket configuration
  - 33% higher memory bandwidth and memory capacity compared to Xeon Skylake Platinum CPUs
- Flexible IO:
  - Integrated 56 lanes of PCIe Gen3 interfaces, x1, x4, x8 and x16 support, 14 integrated PCIe controllers
  - Integrated SATAv3, GPIOs, USB interfaces
  - 16% higher IO bandwidth compared to Xeon Skylake Platinum CPU
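
Several of the headline numbers above follow from simple arithmetic on the spec list, so here is a quick sanity check as a minimal sketch. The ThunderX2 figures come from the press release; the Skylake figures (6 memory channels and 48 PCIe Gen3 lanes per socket) are our assumption and are not stated in the release. The names `percent_more` and `spec_summary` are just illustrative helpers.

```python
# Figures from the press-release spec list.
CORES_PER_SOCKET = 32
SMT_THREADS_PER_CORE = 4
SOCKETS = 2
DDR4_CONTROLLERS_PER_SOCKET = 8
DIMMS_PER_CONTROLLER = 2
MAX_MEMORY_TB_DUAL_SOCKET = 4
PCIE_GEN3_LANES = 56

# ASSUMPTION (not in the press release): per-socket Skylake-SP figures.
SKYLAKE_MEMORY_CHANNELS = 6
SKYLAKE_PCIE_GEN3_LANES = 48


def percent_more(ours, theirs):
    """Return how much larger `ours` is than `theirs`, in percent."""
    return (ours / theirs - 1) * 100


def spec_summary():
    """Derive thread, DIMM, and relative-advantage figures from the specs."""
    threads_per_socket = CORES_PER_SOCKET * SMT_THREADS_PER_CORE
    dimms_per_socket = DDR4_CONTROLLERS_PER_SOCKET * DIMMS_PER_CONTROLLER
    total_dimms = dimms_per_socket * SOCKETS
    return {
        "threads_per_socket": threads_per_socket,
        "threads_per_server": threads_per_socket * SOCKETS,
        "dimms_per_socket": dimms_per_socket,
        # DIMM size needed to reach the quoted 4 TB in a dual-socket system.
        "gb_per_dimm_for_max_memory": MAX_MEMORY_TB_DUAL_SOCKET * 1024 / total_dimms,
        "memory_channel_advantage_pct": percent_more(
            DDR4_CONTROLLERS_PER_SOCKET, SKYLAKE_MEMORY_CHANNELS
        ),
        "pcie_lane_advantage_pct": percent_more(
            PCIE_GEN3_LANES, SKYLAKE_PCIE_GEN3_LANES
        ),
    }


if __name__ == "__main__":
    for name, value in spec_summary().items():
        print(f"{name}: {value:g}")
```

Under those assumptions, the 33% memory and 16% IO claims line up with channel and lane counts alone (8 versus 6 channels, 56 versus 48 lanes), and the 4 TB ceiling implies 128 GB DIMMs in all 32 slots.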
Nice! But price is everything, and if it is on par with Intel SP on price as well, then EPYC is cheaper and will probably win. I’m personally sorry about that, since it looks like a nice SoC, but the ThunderXStation goes for ~10k British pounds, and that’s in the realm of a very high-end amd64 workstation…
The 32 core chips are $1795.
Patrick, OK, but the workstation with the 32-core chip inside costs 9.5k British pounds. At least here: https://www.avantek.co.uk/store/avantek-thunderx2-arm-workstation-thunderx2station.html And that’s a configuration with just 32GB RAM, so someone tell me where the additional 7-8k pounds go…
GA for the parts is today. Sometimes developer workstations pre-GA cost more and are lower volume. I did give them feedback on developer workstation pricing at OCP Summit.
I suspect as systems roll out in higher volume, prices for platforms will go down.
Is there any news on the NVIDIA version of the Arm server SoC?
“That means you can get up to 256 threads per socket” Umm, 32 cores x 4-way SMT = 128 per socket, 256 per two-socket motherboard, right?
Patrick, let’s see if you are right. I’m afraid this ThunderX2 platform may become another POWER8/9, which competes with Intel Xeon on paper but does not really compete, as it costs basically the same for the same performance… If Cavium wants to take off in server/workstation sales, then it needs to be considerably cheaper than Intel. E.g., do what Intel did with the Pentium Pro/II in the ’90s, which was considerably cheaper than Unix workstations for the same or better performance…
So it is an HPC CPU, but they do not indicate the GFLOPS?
And what about a comparison to EPYC as well? It seems to me that they have about the same characteristics, and EPYC pricing is quite good.
Ziple – over 1 TFLOP dual socket. I believe we have some numbers in our review, but we were told we need to move to 1 thread per core for better performance on Linpack, for example.
At the launch event where I spoke yesterday, the next presenters were from Cray, HPE’s Apollo 70, and Atos. All HPC shops.