We are here live at the NVIDIA GTC 2016 keynote where Jen-Hsun Huang is about to take the stage. We will keep this post updated with the latest information and some commentary throughout the presentation. We are going to focus coverage on non-VR applications.
8:58 AM – Getting set up here in the press area. More to come soon as the keynote gets started. We have high hopes for new VR announcements, NVLink announcements like the one QCT pre-announced yesterday, and of course, Pascal.
NVIDIA CEO Jen-Hsun Huang takes the stage.
GTC 2016 – CUDA in 2016: 4x growth since 2012. 97% of all new supercomputers will be using NVIDIA GPUs.
Announcement #1 – NVIDIA SDK – helping developers
NVIDIA ComputeWorks – CUDA 8 is coming in June. cuDNN 5 for training neural networks, faster and with new features. nvGRAPH for quickly analyzing graph information. IndeX for indexing large amounts of information.
GPU Inference Engine (GIE) – available in May. It will work on the ARM-based Jetson TX1, which is meant for embedded applications such as self-driving cars and robots. GIE has increased performance from 4 images/s/W to 24 images/s/W.
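To put the GIE numbers in perspective, here is a quick back-of-the-envelope sketch. It only uses the two images/s/W figures quoted on stage. Since a watt is a joule per second, images/s/W is the same as images per joule, so the inverse gives energy per image. The function name is mine, not NVIDIA's.

```python
def joules_per_image(images_per_second_per_watt: float) -> float:
    """Energy per image. 1 W = 1 J/s, so (images/s)/W equals images per joule."""
    return 1.0 / images_per_second_per_watt


BEFORE = 4.0   # images/s/W, as quoted in the keynote
AFTER = 24.0   # images/s/W with GIE, as quoted in the keynote

efficiency_gain = AFTER / BEFORE

print(f"efficiency gain: {efficiency_gain:.0f}x")
print(f"energy per image before: {joules_per_image(BEFORE):.3f} J")
print(f"energy per image after:  {joules_per_image(AFTER):.3f} J")
```

That works out to a 6x efficiency gain on the quoted figures.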
Announcement #2 – VR, the start of a new platform
Beyond gaming, VR will transform prototyping and communications (e.g. Microsoft HoloLens). One example is Everest VR, which uses 108 billion pixels of image data to reconstruct Everest in 3D.
This used Unreal Engine and PhysX to simulate swirling snow. Very impressive even in 2D.
Mars 2030 – 8km of Martian terrain imaged. Rover is simulated. HDR applied.
These full screen demos are awesome.
The Woz is live and is going to try navigating Mars in 3D.
Woz “I am getting dizzy, I am about to fall out of this chair” – “That was not a helpful comment”
That demo was on an NVIDIA Titan. Something better is needed for photoreal rendering. Today we are announcing IRAY VR. This is not ray tracing. Light probes are rendered throughout the room to see how light would emanate from each spot. Each probe render is a 4K snap, and we render 100 of these probes. Next, a Quadro M6000 (large frame buffer) renders from the point of view of each eye, with the probes mixed, filtered and processed so the result is appropriate for that eye.
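The keynote did not spell out how the probes get "mixed" for each eye, so the following is only my own illustration of the general idea, not NVIDIA's method. It assumes a hypothetical 10 x 10 grid of probe positions (100 probes, matching the count quoted on stage) and blends the nearest probes with inverse-distance weights. The function name, the grid layout and the choice of four neighbours are all assumptions.

```python
import math


def blend_weights(eye, probes, k=4):
    """Inverse-distance weights for the k probes nearest to the eye position."""
    nearest = sorted((math.dist(eye, p), i) for i, p in enumerate(probes))[:k]
    # If the eye sits exactly on a probe, use that probe alone.
    if nearest[0][0] == 0.0:
        return {nearest[0][1]: 1.0}
    inverse = {i: 1.0 / d for d, i in nearest}
    total = sum(inverse.values())
    return {i: w / total for i, w in inverse.items()}


# Hypothetical layout: probes one metre apart at an eye height of 1.6 m.
probes = [(float(x), float(y), 1.6) for x in range(10) for y in range(10)]

# An eye position in the middle of a grid cell picks its four corner probes.
weights = blend_weights((2.5, 3.5, 1.6), probes)
print(weights)
```

A real renderer would blend the 4K probe images themselves using weights like these, and would likely account for occlusion and view direction as well.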
Rendering NVIDIA’s new headquarters building using IRAY and 3D. 500,000 square feet in phase 1.
IRAY VR Lite – desktop ray tracing VR.
Announcement #3 – A Deep Learning Chip
2016 will be a landmark year in AI and Deep Learning. NVIDIA has gone “all in” on Deep Learning. This is a new computing model. Deep learning can program things we do not yet know how to write programs for.
“Cloud platforms of the future are going to be powered by AI”. $5B invested in AI start-ups. We will have an industrial Internet connected to artificial intelligence.
Deep learning is a relatively easy concept to apply. Train your own network with the data you have. Every company and every organization should be able to apply this technology. Achieve superhuman results without super humans to train them.
$500B Opportunity over 10 years.
Tesla M40 for training and Tesla M4 for making inferences. M4 is less than 50 watts and works in 1U computers. 20 images per second per watt. “No reason to use FPGAs”.
Tesla P100 – a purpose-built AI processor. $2B in R&D. Thousands of engineers for years.
- 15 billion transistors.
- 5.3 TFLOPS FP64
- 10.6 TFLOPS FP32
- 21.2 TFLOPS FP16
- 14MB SM register files
- 4MB L2 Cache
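The three throughput figures on the slide follow a simple pattern: each halving of precision doubles the peak rate. A small sketch makes that explicit and also shows how many values of each width would fit in the L2 cache. The throughput numbers are from the slide. Treating "4MB" as 4 * 2**20 bytes is my assumption, and the helper names are mine.

```python
# Peak throughput figures as quoted on the keynote slide (TFLOPS).
PEAK_TFLOPS = {"FP64": 5.3, "FP32": 10.6, "FP16": 21.2}

# Storage size of one value at each precision (IEEE 754 widths).
BYTES_PER_VALUE = {"FP64": 8, "FP32": 4, "FP16": 2}

# Assumption: the slide's "4MB" L2 cache means 4 * 2**20 bytes.
L2_CACHE_BYTES = 4 * 2**20


def throughput_ratio(precision: str, baseline: str = "FP64") -> float:
    """Peak throughput of a precision relative to the baseline precision."""
    return PEAK_TFLOPS[precision] / PEAK_TFLOPS[baseline]


def values_in_l2(precision: str) -> int:
    """How many values of the given precision fit in the L2 cache."""
    return L2_CACHE_BYTES // BYTES_PER_VALUE[precision]


for name in PEAK_TFLOPS:
    print(name, throughput_ratio(name), values_in_l2(name))
```

So FP16 gives four times the FP64 peak rate while also fitting four times as many values into the same cache, which is why half precision matters so much for deep learning.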
New instructions for deep learning.
Key features:
- Unified memory
- Pascal Architecture
- 16nm FinFET
- CoWoS with HBM2
- NVLink – 160GB/s
- New AI Algorithms
- Preemption
- 600mm^2 “huge chip”
- 4000 wires (384 on Maxwell) to connect to memory
Wow! This thing is amazing.
Tesla P100 is in production now. Server vendors will release systems in Q1 2017. Between now and then, hyperscale vendors will consume all NVIDIA can produce.
Announcement #4 – GPU-accelerated DL for every market.
NVIDIA DGX-1
- 170 TFLOPS in a box. 2 PFLOPS in a rack
- 8x Tesla P100 – hybrid cube mesh using NVLink
- 3200 watts. 2x Xeon processors
- Quad 100Gbps InfiniBand, 2x 10GbE
- 7TB SSDs
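The 170 TFLOPS figure lines up with eight P100s at the quoted FP16 peak. Here is a rough sketch of that arithmetic, plus what "2 PFLOPS in a rack" would imply. I am assuming the rack figure is simply FP16 peak summed across boxes and that each box draws the quoted 3200 watts. Neither assumption was confirmed on stage.

```python
import math

GPUS_PER_BOX = 8
FP16_TFLOPS_PER_GPU = 21.2   # quoted P100 FP16 peak
BOX_POWER_WATTS = 3200       # quoted DGX-1 power
RACK_TARGET_TFLOPS = 2000.0  # "2 PFLOPS in a rack"

# Aggregate FP16 peak of one DGX-1 (rounds to the quoted 170 TFLOPS).
box_tflops = GPUS_PER_BOX * FP16_TFLOPS_PER_GPU

# Assumption: the rack figure is boxes summed, so round up to whole boxes.
boxes_per_rack = math.ceil(RACK_TARGET_TFLOPS / box_tflops)

# Implied rack power if every box draws its quoted 3200 W.
rack_power_kw = boxes_per_rack * BOX_POWER_WATTS / 1000

print(f"{box_tflops:.1f} TFLOPS per box")
print(f"{boxes_per_rack} boxes per rack, about {rack_power_kw:.1f} kW")
```

Under those assumptions a 2 PFLOPS rack needs 12 boxes, which is a lot of power and cooling for a single rack.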
AlexNet training time is down from 150 hours (dual Xeon plus GPU) to 2 hours. It would take 250 traditional nodes to train in 2 hours versus one DGX-1 node. That is a 12.5x speedup year over year, comparing 4x Maxwell GPUs with 8x Pascal GPUs.
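The training numbers are easier to digest with the arithmetic written out. This sketch uses only the figures quoted on stage. The implied Maxwell training time and the scaling efficiency of the 250-node cluster are my own derivations from those figures, not numbers NVIDIA gave.

```python
BASELINE_HOURS = 150.0       # quoted single-node baseline
DGX1_HOURS = 2.0             # quoted DGX-1 training time
NODES_TO_MATCH = 250         # quoted traditional node count for 2 hours
YEAR_OVER_YEAR_SPEEDUP = 12.5

# Speedup of one DGX-1 over the single-node baseline.
speedup_vs_baseline = BASELINE_HOURS / DGX1_HOURS

# Derived: training time implied for last year's 4x Maxwell system.
implied_maxwell_hours = DGX1_HOURS * YEAR_OVER_YEAR_SPEEDUP

# Derived: if 250 nodes only deliver this speedup, each added node
# contributes this fraction of its ideal share.
cluster_scaling_efficiency = speedup_vs_baseline / NODES_TO_MATCH

print(f"speedup vs baseline: {speedup_vs_baseline:.0f}x")
print(f"implied 4x Maxwell time: {implied_maxwell_hours:.0f} hours")
print(f"cluster scaling efficiency: {cluster_scaling_efficiency:.0%}")
```

On these figures the DGX-1 is 75x faster than the baseline, and the 250-node comparison implies roughly 30% scaling efficiency for the traditional cluster.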
- Baidu “we expect a 30x improvement with the P100”
- Google TensorFlow – open sourced, with the most stars on GitHub
DGX-1 price – $129,000 – wow!
Here is the Tesla family now:
Self-driving cars – NVIDIA Drive PX – AI Car Computer
Drive PX – 180fps multi-point tracking of cars around your vehicle.
Baidu's self-driving car computer cluster, which has been replaced by the new Drive PX2.
Drive PX2 – two Tegra processors and two unannounced Pascal GPUs make a GPU computer in a box.
I think this is something we are going to hear a lot more about this year.
That’s a wrap. I will get some extra items up later today.