NVIDIA made a number of announcements Monday at Supercomputing 2018 (SC18). We wanted to briefly cover some of the major points. Some of our readers will be disappointed at the lack of new hardware. On the other hand, the software is where NVIDIA is extending its lead.
More NVIDIA Accelerated Supercomputers
Summit and Sierra are the world’s fastest supercomputers, powered by NVIDIA Tesla Volta GPUs along with Power9 processors and EDR InfiniBand. We covered how the IBM Power9 chips are able to speak NVLink natively during Hot Chips 30. In addition, we tested our first 8x GPU NVLink server in our Gigabyte G481-S80 8x NVIDIA Tesla GPU Server Review.
The overall trend is that NVIDIA is now in 127 of the Top500 systems, up from 83 systems a year ago. In our recent piece Top500 November 2018 Our New Systems Analysis we saw NVIDIA dominate the accelerator market again.
NVIDIA also notes it has 22 of the 25 greenest systems in the world and that half of the world’s Top500 compute is coming from NVIDIA accelerated systems.
NVIDIA Tesla T4 on Google Cloud
NVIDIA announced that its newest Tesla T4 inferencing engine is available in the Google Cloud. This is not a general availability (GA) announcement, but rather an early access announcement.
Admittedly, I had to double-check that we had not covered this before. When NVIDIA releases a new GPU like this, and with all the buzz the Tesla T4 has garnered in the industry, I thought this had already happened.
On the hardware side, NVIDIA also announced that the DGX-2 and HGX-2 platforms are making their way into national labs and cloud providers. We are going to do a separate piece on these solutions.
NVIDIA Weather Simulation and Modeling Acceleration
Weather simulation and modeling is a top 10 HPC application. NVIDIA mentions that COSMO, WRF, and MPAS are all NVIDIA accelerated and are seeing massive performance increases.
Multi-Node NGC Containers
Perhaps the most exciting announcement for us at the show is that NVIDIA is making its NGC containers multi-node aware.
With the new NGC containers, one can scale beyond the GPUs found in a single box to multiple nodes. New NGC multi-node containers support Mellanox InfiniBand for high-speed, low-latency communication between nodes.
Final Words
What we did not get is a new major supercomputer announcement or a new GPU announcement. NVIDIA has an enormous software stack advantage after a decade of GPU accelerated computing that upstarts, and even well-known companies like AMD, cannot match. Our sense is that NVIDIA will not announce the V100’s successor until its next GTC event. The new GPU will either be optimized for exascale computing or be the GPU architecture generation before exascale systems come online. What is clear is that on the hardware side NVIDIA is getting a lot of mileage out of the Tesla V100, significantly more than a typical CPU generation gets these days.