Nvidia's Volta GPU: Why Gamers Should Care

Friday, 12 May, 2017

Nvidia hopes performance figures like the ones below will draw developers in droves, a pitch it made earlier this week during its annual GTC conference.

Nvidia says that for matrix-matrix multiplication, an operation at the heart of neural-network training, the V100 can be more than 9x faster than the Pascal-based P100 GPU.
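A comparison along those lines can be approximated with a handful of lines of cuBLAS. The sketch below is a hypothetical micro-benchmark, not Nvidia's own test: it times one large half-precision GEMM under cuBLAS's default math mode and again with Tensor Core math enabled (the CUBLAS_TENSOR_OP_MATH mode that ships with CUDA 9). The 4096-square matrix size and the timing scaffolding are our assumptions.

    // Hypothetical micro-benchmark: half-precision GEMM with and without
    // Tensor Core math. Requires CUDA 9+ and a Volta-class GPU; error
    // checking is omitted for brevity.
    #include <cublas_v2.h>
    #include <cuda_fp16.h>
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const int n = 4096;  // assumed matrix size; GEMM speed is data-independent,
        half *A, *B;         // so the uninitialised inputs don't affect timing
        float *C;
        cudaMalloc(&A, n * n * sizeof(half));
        cudaMalloc(&B, n * n * sizeof(half));
        cudaMalloc(&C, n * n * sizeof(float));

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;

        // Run the same GEMM under both math modes and compare wall time.
        for (cublasMath_t mode : {CUBLAS_DEFAULT_MATH, CUBLAS_TENSOR_OP_MATH}) {
            cublasSetMathMode(handle, mode);
            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);
            cudaEventRecord(start);
            // FP16 inputs with FP32 accumulation: the mix Tensor Cores accelerate.
            cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                         &alpha, A, CUDA_R_16F, n, B, CUDA_R_16F, n,
                         &beta,  C, CUDA_R_32F, n,
                         CUDA_R_32F, CUBLAS_GEMM_DEFAULT);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            printf("math mode %d: %.2f ms\n", (int)mode, ms);
        }
        cublasDestroy(handle);
        return 0;
    }

Built with something like nvcc -arch=sm_70 gemm_bench.cu -lcublas, a sketch like this is how the FP32-versus-Tensor-Core gap can be measured on real hardware, though Nvidia's published 9x figure covers full training workloads rather than a single kernel.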

"To make one chip work per 12-inch wafer, I would characterise it as 'unlikely.' So the fact that this is manufacturable is an incredible feat".

Volta, designed to deliver speed and scalability for AI inference and training, is an architecture built from 21 billion transistors.

Indeed, Huang made the case that as conventional central processing units like those made by Intel Corp. run up against the limits of Moore's Law, the decades-long trend of doubling the number of transistors on a single chip every two years, Nvidia's parallel style of processing is positioned to take the lead in chip progress.

Hardware partners are already lining up. "To address these challenges, HPE is introducing new optimized GPU compute platforms, an enhanced collaboration with NVIDIA and HPE Pointnext Services from the Core Datacenter to the Intelligent Edge," HPE said.

In raw specs, the V100 is seriously impressive. It is a huge chip - 815 square millimeters, or about as big as an Apple Watch face. Developed at a cost of $3 billion in R&D, the chip is fabricated on TSMC's 12nm process and uses the highest-speed memory available from Samsung.

Riding a wave of excitement for all things AI, NVIDIA has launched the Volta GPU architecture.

The company said that this result is due to the custom crafting of the Tensor Cores and their data paths to maximize their floating-point performance with a minimal increase in power consumption. The V100 chip announced this week is the culmination of that effort, and features a new kind of core specialized for accelerating deep-learning math: the Tensor Core. By pairing CUDA cores and the new Volta Tensor Cores in a unified architecture, a single server with Tesla V100 GPUs can "replace hundreds of commodity CPUs for traditional HPC", says the company.

The V100 is equipped with 5,120 CUDA cores and 640 Tensor Cores. It delivers 7.5 TFLOPS of 64-bit and 15 TFLOPS of 32-bit floating-point performance, plus 120 TFLOPS of mixed-precision deep-learning throughput from the Tensor Cores, backed by a 16GB HBM2 memory bank with a bandwidth of 900GB per second. NVIDIA also unveiled the new DGX Station, a deskside system built around the chip.

The V100 will start shipping by the end of the year, in several different configurations, to data centers owned by Amazon, Microsoft, and other cloud computing providers. In partnership with Microsoft Azure, Nvidia has also developed a cloud-friendly box, the HGX-1, with eight V100s that can be flexibly configured for a variety of cloud computing needs. "My job is to ensure people use the Azure Cloud, and people want to use what's available immediately, without waiting," a Microsoft executive said of the partnership.

One caveat: although Volta is more efficient than Pascal at running deep-learning workloads, Nvidia didn't compare it with Google's TPU ASIC.
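Under the hood, each Tensor Core performs a 4x4 matrix fused multiply-add, D = A x B + C, on FP16 inputs with FP32 accumulation; 640 of them executing 64 FMAs (128 floating-point operations) per clock at roughly 1.45GHz works out to about 120 TFLOPS. For readers curious what that looks like in code, below is a minimal sketch using the WMMA API that arrives with CUDA 9 for Volta. The single 16x16 output tile, the matrix layouts, and the kernel name are our own illustrative choices, not Nvidia sample code.

    // Minimal sketch of programming Volta's Tensor Cores via the CUDA 9
    // WMMA API. Computes one 16x16 output tile, D = A * B + C, with FP16
    // inputs and FP32 accumulation. Launch with a single warp (32 threads).
    #include <mma.h>
    #include <cuda_fp16.h>
    using namespace nvcuda;

    __global__ void tensor_core_tile(const half* A, const half* B,
                                     const float* C, float* D) {
        // Fragments describing a warp-wide 16x16x16 multiply-accumulate.
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        // Load the input tiles (leading dimension 16 for a lone tile).
        wmma::load_matrix_sync(a_frag, A, 16);
        wmma::load_matrix_sync(b_frag, B, 16);
        wmma::load_matrix_sync(c_frag, C, 16, wmma::mem_row_major);

        // One matrix multiply-accumulate, executed on the Tensor Cores.
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

        // Write the FP32 result tile back to memory.
        wmma::store_matrix_sync(D, c_frag, 16, wmma::mem_row_major);
    }

Launched as tensor_core_tile<<<1, 32>>>(dA, dB, dC, dD) and compiled for sm_70, a single warp cooperatively owns the whole tile; a full matrix multiply tiles the problem across many warps and blocks, which is what libraries like cuBLAS do internally when Tensor Core math is enabled.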