At GTC 2017 in China (NVIDIA holds multiple GTCs each year), NVIDIA announced that CUDA 9 is now available. That is a major milestone for the HPC and AI industries, since each new CUDA release generally brings support for new architectures along with libraries optimized for the most cutting-edge applications. NVIDIA CUDA 9 has been available in release candidate form for some time, but we are finally seeing the general availability (GA) release of the new tooling.
New NVIDIA CUDA 9 Features
If you want to get a full overview, the NVIDIA Parallel Forall blog has an in-depth look at the new features of NVIDIA CUDA 9. We suggest giving it a read:
https://devblogs.nvidia.com/parallelforall/cuda-9-features-revealed/
The key features, as listed on the NVIDIA Developer site, are:
- Speed up high-performance computing (HPC) and deep learning apps with new GEMM kernels in cuBLAS
- Execute image and signal processing apps faster with performance optimizations across multiple GPU configurations in cuFFT and NVIDIA Performance Primitives
- Solve linear and graph analytics problems common in HPC with new algorithms in cuSOLVER and nvGRAPH
- Express rich parallel algorithms with threads from sub-tiles to warps, blocks, and grids
- Manage and reuse threads efficiently within an application with new API and function primitives
- Optimize and pre-fetch memory access by identifying source code causing page faults in unified memory
- Inspect unified memory performance bottlenecks with new event filters based on virtual address, migration reason and page fault access type
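The two items above about expressing parallel algorithms "from sub-tiles to warps, blocks, and grids" and managing threads with new primitives describe what NVIDIA calls Cooperative Groups. As a rough illustration, here is a minimal sketch of a sum reduction that partitions each thread block into 32-thread tiles. The names tile_sum, sum_kernel, and sum_on_gpu are our own hypothetical helpers, not part of CUDA. The launch configuration is arbitrary, the block size is assumed to be a multiple of 32, and error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cooperative_groups.h>

namespace cg = cooperative_groups;

// Hypothetical helper: sum a value across one 32-thread tile using the
// tile's own shuffle. Only the thread with thread_rank() == 0 ends up
// holding the complete sum.
__device__ float tile_sum(cg::thread_block_tile<32> tile, float v) {
    for (int offset = tile.size() / 2; offset > 0; offset /= 2) {
        v += tile.shfl_down(v, offset);
    }
    return v;
}

// Hypothetical kernel: each thread accumulates a partial sum, each tile
// reduces its partials, and one thread per tile adds the result to *out.
__global__ void sum_kernel(const float* in, float* out, int n) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);

    // Grid-stride loop so any grid size covers all n elements.
    float v = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        v += in[i];
    }

    v = tile_sum(tile, v);
    if (tile.thread_rank() == 0) {
        atomicAdd(out, v);
    }
}

// Hypothetical host wrapper: copies the input to the GPU, runs the kernel,
// and returns the sum. Block size (256) is a multiple of the tile size.
float sum_on_gpu(const std::vector<float>& host) {
    const int n = static_cast<int>(host.size());
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));

    sum_kernel<<<4, 256>>>(d_in, d_out, n);

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling sum_on_gpu on a std::vector<float> returns the total computed on the device. The point of the sketch is that the tile is an explicit object passed to tile_sum, so the function states which threads it synchronizes with instead of relying on implicit warp behavior.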
CUDA 9 also adds a number of Volta and NVLink support items:
- Replace warp-synchronous programming with a robust programming model on Kepler architecture and above
- Execute AI applications faster with Tensor Cores performing 5X faster than Pascal GPUs
- Scale multi-GPU applications with next-generation NVLink delivering 2X throughput of prior generation
- Increase GPU utilization with Volta Multi-Process Service (MPS)
- Profile PCIe usage by analyzing bandwidth of memory transfers, latency, and comparison with NVLink
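The first item in the list above, replacing warp-synchronous programming, is easiest to see with the `_sync` warp primitives that CUDA 9 introduces, which take an explicit mask of participating lanes. Below is a minimal sketch using __ballot_sync to count how many values exceed a threshold. The names count_above and count_above_on_gpu are our own hypothetical helpers. The sketch assumes a one-dimensional block whose size is a multiple of 32, so every warp is full and the full mask is valid, and it skips error checking.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread casts one vote, each warp collects its
// votes with __ballot_sync, and lane 0 adds the warp's tally to *count.
__global__ void count_above(const float* in, int n, float threshold,
                            unsigned int* count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range threads still take part in the vote, voting "no".
    bool hit = (i < n) && (in[i] > threshold);

    // All 32 lanes of the warp reach this call, so the full mask applies.
    unsigned int votes = __ballot_sync(0xffffffffu, hit);

    // Lane 0 of each warp (1-D block assumed) publishes the set-bit count.
    if ((threadIdx.x & 31) == 0) {
        atomicAdd(count, static_cast<unsigned int>(__popc(votes)));
    }
}

// Hypothetical host wrapper: one thread per element, 256 threads per block.
unsigned int count_above_on_gpu(const std::vector<float>& host,
                                float threshold) {
    const int n = static_cast<int>(host.size());
    float* d_in = nullptr;
    unsigned int* d_count = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_count, sizeof(unsigned int));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_count, 0, sizeof(unsigned int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    count_above<<<blocks, threads>>>(d_in, n, threshold, d_count);

    unsigned int result = 0;
    cudaMemcpy(&result, d_count, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_count);
    return result;
}
```

Older code often called the unmasked vote and shuffle intrinsics and assumed that all lanes of a warp run in lockstep. Passing the mask explicitly removes that assumption, which matters on Volta where threads within a warp can be scheduled independently.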
STH will be updating many of our nvidia-docker images with the new CUDA 9 after testing.