Live at the NVIDIA GTC 2016 Keynote

April 5, 2016

We are here live at the NVIDIA GTC 2016 keynote where Jen-Hsun Huang is about to take the stage. We will keep this post updated with the latest information and some commentary throughout the presentation. We are going to focus coverage on non-VR applications.

8:58 AM – Getting set up here in the press area. More to come as soon as the keynote gets started. We have high hopes of seeing new VR announcements, NVLink announcements like the one QCT pre-announced yesterday, and of course, Pascal.

NVIDIA CEO Jen-Hsun Huang takes the stage.

GTC 2016 – CUDA 2016 growth 4x since 2012. 97% of all new supercomputers will be using NVIDIA GPUs.

Announcement #1 – NVIDIA SDK – helping developers

GTC 2016 – NVIDIA SDK

NVIDIA ComputeWorks – CUDA 8 is coming in June. cuDNN 5 for training neural networks, faster and with new features. nvGRAPH for quickly analyzing graph information. IndeX for indexing large amounts of information.

GPU Inference Engine (GIE) will be available in May. It will work on the ARM-based Jetson TX1, which is meant for embedded applications such as self-driving cars and robots. GIE has increased performance from 4 images/s/W to 24 images/s/W.
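As a quick sanity check on those efficiency figures, the arithmetic can be sketched in a few lines of Python. The function name `efficiency_gain` is made up for illustration; only the 4 and 24 images/s/W values come from the keynote.

```python
def efficiency_gain(before: float, after: float) -> float:
    """Return how many times more efficient the new figure is than the old one."""
    if before <= 0:
        raise ValueError("baseline efficiency must be positive")
    return after / before


# Figures quoted on stage, in images per second per watt.
gie_gain = efficiency_gain(4.0, 24.0)
print(f"GIE efficiency gain: {gie_gain:.1f}x")
```

In other words, the quoted numbers amount to a sixfold improvement in inference throughput per watt.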

GTC 2016 Jetson TX1 – NVIDIA ComputeWorks

Announcement #2 – VR, the start of a new platform

Beyond gaming, VR will transform prototyping and communications (e.g. Microsoft HoloLens). One example is Everest VR, which used 108 billion pixels of image data to reconstruct Everest in 3D.

GTC 2016 – Everest VR

This used Unreal Engine and PhysX to simulate swirling snow. Very impressive even in 2D.

Mars 2030 – 8km of Martian terrain imaged. Rover is simulated. HDR applied.

GTC 2016 – Mars 2030 VR

These full screen demos are awesome.

GTC 2016 – Mars 2030 Full Screen

The Woz is live and is going to try navigating Mars in 3D.

GTC 2016 – Mars 2030 Full Screen with Woz

Woz: “I am getting dizzy, I am about to fall out of this chair.” The reply from the stage: “That was not a helpful comment.”

That demo was on an NVIDIA Titan. Need something better for photo real. Today we are announcing IRAY VR. This is not ray tracing. Render light probes throughout the room. See how light would emanate from that spot. Each render does a 4K snap. We render 100 of these probes. Next, using a Quadro M6000 (large frame buffer), we render from the point of the eye, and the probes are mixed, filtered, and processed so the result is appropriate for that eye.

Rendering NVIDIA’s new headquarters building using IRAY and 3D. 500,000 square feet in phase 1.

GTC 2016 – IRAY VR NVIDIA HQ

IRAY VR Lite – desktop ray tracing VR.

GTC 2016 – IRAY VR LITE

Announcement #3 – A Deep Learning Chip

2016 will be a landmark year in AI and Deep Learning. NVIDIA has gone “all in” on Deep Learning. This is a new computing model. Deep learning can program things we do not yet know how to write programs for.

GTC 2016 – AI Ecosystem

“Cloud platforms of the future are going to be powered by AI”. $5B invested in AI start-ups. We will have an industrial Internet connected to artificial intelligence.

GTC 2016 – 500B AI Opportunity

Deep learning is a relatively easy concept to apply. Train your own network with the data you have. Every company and every organization should be able to apply this technology. Achieve superhuman results without super humans to train them.

$500B Opportunity over 10 years.

GTC 2016 – Tesla M4 and M40

Tesla M40 for training and Tesla M4 for making inferences. M4 is less than 50 watts and works in 1U computers. 20 images per second per watt. “No reason to use FPGAs”.

Tesla P100: a purpose-built AI processor. $2B in R&D. Thousands of engineers working for years.

15 billion transistors
5.3 TFLOPS FP64
10.6 TFLOPS FP32
21.2 TFLOPS FP16
14MB SM register files
4MB L2 cache
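The three peak throughput numbers in this list follow a simple pattern, which a short Python sketch makes explicit. The dictionary and function names here are illustrative only; the TFLOPS values are the ones quoted in the keynote.

```python
# Peak throughput figures quoted in the keynote, in TFLOPS.
P100_PEAK_TFLOPS = {"FP64": 5.3, "FP32": 10.6, "FP16": 21.2}


def ratio_to_fp64(peaks: dict) -> dict:
    """Express each precision's peak rate as a multiple of the FP64 rate."""
    base = peaks["FP64"]
    return {name: round(value / base, 2) for name, value in peaks.items()}


ratios = ratio_to_fp64(P100_PEAK_TFLOPS)
for name, ratio in ratios.items():
    print(f"{name}: {ratio}x the FP64 rate")
```

Each halving of precision doubles the quoted peak rate, so FP16 comes out at four times the FP64 figure.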

New instructions for deep learning.

GTC 2016 – Tesla P100

Key features:

Unified memory
Pascal Architecture
16nm FinFET
CoWoS with HBM2
NVLink – 160GB/s
New AI Algorithms
Preemption
600mm^2 “huge chip”
4000 wires (384 on Maxwell) to connect to memory

Wow! This thing is amazing.

Tesla P100 is in production now. Server vendors will release systems in Q1 2017. Between now and then, hyperscale vendors will consume all that NVIDIA can produce.

Announcement #4 – GPU accelerated DDL for every market.

NVIDIA DGX-1

170 TFLOPS in a box. 2 PF in a rack
8x Tesla P100 in a hybrid cube mesh using NVLink
3200 watts. 8 GPUs, 2x Xeon processors
Quad 100Gbps InfiniBand, 2x 10GbE
7TB SSDs
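The headline numbers can be cross-checked against the Tesla P100 figures quoted earlier. The sketch below is back-of-the-envelope arithmetic, not anything NVIDIA presented: it assumes the 170 TFLOPS figure is the sum of eight FP16 peaks, and it assumes a "2 PF rack" is simply a whole number of boxes. The rack count and rack power are extrapolations from those assumptions.

```python
import math

GPUS_PER_BOX = 8             # from the DGX-1 list above
P100_FP16_TFLOPS = 21.2      # from the Tesla P100 figures quoted earlier
QUOTED_BOX_TFLOPS = 170      # "170 TFLOPS in a box"
BOX_POWER_WATTS = 3200       # from the DGX-1 list above
RACK_TARGET_TFLOPS = 2000    # "2 PF in a rack"

# Assumption: the box figure is eight FP16 peaks added together.
box_tflops = GPUS_PER_BOX * P100_FP16_TFLOPS

# Assumption: the rack figure is reached by stacking whole boxes.
boxes_per_rack = math.ceil(RACK_TARGET_TFLOPS / QUOTED_BOX_TFLOPS)
rack_power_kw = boxes_per_rack * BOX_POWER_WATTS / 1000

print(f"8 x P100 FP16 peak: {box_tflops:.1f} TFLOPS")
print(f"Boxes needed for 2 PF: {boxes_per_rack}")
print(f"Implied rack power: {rack_power_kw:.1f} kW")
```

The sum lands within rounding of the quoted 170 TFLOPS, which suggests the box figure is an FP16 peak number.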

AlexNet training time drops from 150 hours (dual Xeon plus GPU) to 2 hours. It would take 250 traditional nodes to train in 2 hours versus one DGX-1 node. 12.5X speedup year over year, comparing 4x Maxwell GPUs with 8x Pascal GPUs.
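For readers who want the ratios behind those training times, here is a small Python sketch. The 150-hour, 2-hour, and 12.5x values are from the keynote. The 4x Maxwell training time is not something we saw on stage: it is derived here on the assumption that the 12.5x figure is measured against the 2-hour DGX-1 result.

```python
BASELINE_HOURS = 150.0   # quoted AlexNet training time before DGX-1
DGX1_HOURS = 2.0         # quoted AlexNet training time on one DGX-1
YOY_SPEEDUP = 12.5       # quoted year-over-year speedup


def speedup(old_hours: float, new_hours: float) -> float:
    """Return how many times faster the new training run is."""
    if new_hours <= 0:
        raise ValueError("training time must be positive")
    return old_hours / new_hours


baseline_speedup = speedup(BASELINE_HOURS, DGX1_HOURS)

# Assumption: 12.5x compares the 4x Maxwell system with the 2-hour DGX-1 run.
implied_maxwell_hours = DGX1_HOURS * YOY_SPEEDUP

print(f"Speedup over the 150-hour baseline: {baseline_speedup:.0f}x")
print(f"Implied 4x Maxwell training time: {implied_maxwell_hours:.0f} hours")
```

Under that assumption, last year's 4x Maxwell setup would have needed roughly a day for the same job.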