Today TACC Frontera was officially unveiled. If this sounds a bit strange, there is a good reason. Frontera already placed at #5 on the Top500 list in June 2019. We also heard about Frontera’s liquid-cooled design in our piece Dell EMC Talks Deep Learning and AI Q3 2019. Still, when you spend a lot of money on a supercomputer, you want to get all of the PR possible.
A Look Behind TACC Frontera
Instead of regurgitating a press release, we wanted to show what is powering the system since that is more interesting.
Starting with storage, the primary tier is a disk-backed capacity system, and there is also a faster NVMe scratch tier. The 4PB “fast scratch” NVMe-backed system is rated at 1.5TB/s. For disk, there is 50PB rated at 300GB/s.
The interconnect is Mellanox InfiniBand HDR-100 for the primary compute nodes and the liquid-submerged GPU systems. The Longhorn subsystem, which uses IBM Power9 nodes and NVIDIA Tesla V100 GPUs, utilizes Mellanox EDR InfiniBand.
In terms of primary compute, things here are fascinating. There are 8008 dual-socket Intel Xeon Platinum 8280 nodes, each with 192GB of memory. That seems to indicate 6x 16GB DIMMs per socket. With liquid-cooled Dell nodes, one would have thought this is the perfect installation candidate for Intel’s Xeon Platinum 9200 series. Low memory capacities and high core counts with liquid cooling for HPC are exactly what the Platinum 9200 series is designed for.
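As a quick sanity check on that DIMM inference, here is a minimal sketch of the arithmetic. The node count, memory per node, and 28 cores per socket come from this article. The 16GB DIMM size is our inference rather than a confirmed configuration, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of Frontera's per-socket memory layout.
NODES = 8008               # dual-socket Platinum 8280 nodes (from the article)
SOCKETS_PER_NODE = 2
MEMORY_PER_NODE_GB = 192   # from the article
CORES_PER_SOCKET = 28      # Platinum 8280 core count
DIMM_SIZE_GB = 16          # ASSUMPTION: inferred, not a confirmed spec

memory_per_socket_gb = MEMORY_PER_NODE_GB // SOCKETS_PER_NODE
dimms_per_socket = memory_per_socket_gb // DIMM_SIZE_GB

total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
total_memory_gb = NODES * MEMORY_PER_NODE_GB

print(f"Memory per socket: {memory_per_socket_gb}GB")
print(f"DIMMs per socket at {DIMM_SIZE_GB}GB each: {dimms_per_socket}")
print(f"Total cores: {total_cores:,}")
print(f"Total memory: {total_memory_gb:,}GB")
```

Six DIMMs per socket lines up with the six memory channels on each Platinum 8280, which would mean one DIMM per channel.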
The use of Intel Xeon Platinum 8280s is interesting for another reason. If you look at the new systems on the Top500 list, 20 cores per socket is the most common configuration by far, meaning the 28-core parts have 40% more cores per socket than we normally see. Here is the Top500 November 2018 Our New Systems Analysis CPU cores per socket:
Here is the Top500 June 2019 Our New Systems Analysis, where Frontera is one of the 28-core machines:
The TACC systems tend to be ones that Dell and Intel win. Our sense is that Intel is providing significant discounts on the Platinum 8280 to win this deal. At list price, the Platinum 8280s alone would cost over $160 million.
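For those who want to see where the list-price figure comes from, here is a rough sketch. The node and socket counts are from this article. The per-CPU price of $10,009 is an assumption based on Intel's published list pricing for the Platinum 8280 at launch, and it says nothing about what TACC actually paid.

```python
# Rough list-price estimate for Frontera's CPUs only.
NODES = 8008                 # from the article
SOCKETS_PER_NODE = 2         # dual-socket nodes
LIST_PRICE_USD = 10_009      # ASSUMPTION: launch list price per Platinum 8280

cpu_count = NODES * SOCKETS_PER_NODE
total_list_price_usd = cpu_count * LIST_PRICE_USD

print(f"CPUs: {cpu_count:,}")
print(f"CPU list price total: ${total_list_price_usd / 1e6:.1f} million")
```

Under that assumption, the CPUs alone land just over $160 million at list, before any memory, networking, storage, or cooling.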
Beyond the compute nodes, there is a single-precision subsystem aimed at AI and other work that does not require double precision. This subsystem is headlined by 360 NVIDIA Quadro RTX 5000 GPUs. The cooling is perhaps the most unusual part, as they are using liquid submersion cooling here. That is likely a pilot for what is to come in future generations.
Final Words
Frontera is a cool system, but in the next few years, it will be absolutely dwarfed as we enter the exascale era. Still, systems doing research today are important. Frontera is clearly researching technologies and design principles for future generations of supercomputers.
They must have already locked into Intel or their software is Intel-specific because they would have gotten a greater compute/power density with EPYCs…?
Considering early access in March 2019 this is probably a machine ordered Q3 2018 at the latest unless they really pushed it and ordered it early Q4 if they had building and power ready.
So 9200 wasn’t released and EPYC was gen1.
The sheer power of that system is mind-boggling.