Subject: Graphics Cards | October 31, 2017 - 09:58 PM | Scott Michaud
Tagged: nvidia, amazon, google, pascal, Volta, gv100, tesla v100
Remember last month? Remember when I said that Google’s introduction of Tesla P100s would be good leverage over Amazon, as the latter is still back in the Kepler days (because Maxwell was 32-bit focused)?
To compare the two parts, the Tesla P100 has 3584 CUDA cores, yielding just under 10 TFLOPs of single-precision performance. The Tesla V100, with its ridiculous die size, pushes that up over 14 TFLOPs. Same as Pascal, they also support full 1:2:4 FP64:FP32:FP16 performance scaling. It also has access to NVIDIA’s tensor cores, which are specialized for 16-bit, 4x4 multiply-add matrix operations that are apparently common in neural networks, both training and inferencing.
Amazon allows up to eight of them at once (with their P3.16xlarge instances).
So that’s cool. While Google has again been quickly leapfrogged by Amazon, it’s good to see NVIDIA getting wins in multiple cloud providers. This keeps money rolling in that will fund new chip designs for all the other segments.
Subject: Graphics Cards | May 10, 2017 - 01:32 PM | Ryan Shrout
Tagged: v100, tesla, nvidia, gv100, gtc 2017
During the opening keynote to NVIDIA’s GPU Technology Conference, CEO Jen-Hsun Huang formally unveiled the latest GPU architecture and the first product based on it. The Tesla V100 accelerator is based on the Volta GPU architecture and features some amazingly impressive specifications. Let’s take a look.
|Tesla V100||GTX 1080 Ti||Titan X (Pascal)||GTX 1080||GTX 980 Ti||TITAN X||GTX 980||R9 Fury X||R9 Fury|
|GPU||GV100||GP102||GP102||GP104||GM200||GM200||GM204||Fiji XT||Fiji Pro|
|Base Clock||-||1480 MHz||1417 MHz||1607 MHz||1000 MHz||1000 MHz||1126 MHz||1050 MHz||1000 MHz|
|Boost Clock||1455 MHz||1582 MHz||1480 MHz||1733 MHz||1076 MHz||1089 MHz||1216 MHz||-||-|
|ROP Units||128 (?)||88||96||64||96||96||64||64||64|
|Memory Clock||878 MHz (?)||11000 MHz||10000 MHz||10000 MHz||7000 MHz||7000 MHz||7000 MHz||500 MHz||500 MHz|
|Memory Interface||4096-bit (HBM2)||352-bit||384-bit G5X||256-bit G5X||384-bit||384-bit||256-bit||4096-bit (HBM)||4096-bit (HBM)|
|Memory Bandwidth||900 GB/s||484 GB/s||480 GB/s||320 GB/s||336 GB/s||336 GB/s||224 GB/s||512 GB/s||512 GB/s|
|TDP||300 watts||250 watts||250 watts||180 watts||250 watts||250 watts||165 watts||275 watts||275 watts|
|Peak Compute||15 TFLOPS||10.6 TFLOPS||10.1 TFLOPS||8.2 TFLOPS||5.63 TFLOPS||6.14 TFLOPS||4.61 TFLOPS||8.60 TFLOPS||7.20 TFLOPS|
While we are low on details today, it appears that the fundamental compute units of Volta are similar to that of Pascal. The GV100 has 80 SMs with 40 TPCs and 5120 total CUDA cores, a 42% increase over the GP100 GPU used on the Tesla P100 and 42% more than the GP102 GPU used on the GeForce GTX 1080 Ti. The structure of the GPU remains the same GP100 with the CUDA cores organized as 64 single precision (FP32) per SM and 32 double precision (FP64) per SM.
Click to Enlarge
Interestingly, NVIDIA has already told us the clock speed of this new product as well, coming in at 1455 MHz Boost, more than 100 MHz lower than the GeForce GTX 1080 Ti and 25 MHz lower than the Tesla P100.
Click to Enlarge
Volta adds in support for a brand new compute unit though, known as Tensor Cores. With 640 of these on the GPU die, NVIDIA directly targets the neural network and deep learning fields. If this is your first time hearing about Tensor, you should read up on its influence on the hardware markets, bringing forth an open-source software library for machine learning. Google has invested in a Tensor-specific processor already, and now NVIDIA throws its hat in the ring.
Adding Tensor Cores to Volta allows the GPU to do mass processing for deep learning, on the order of a 12x improvement over Pascal’s capabilities using CUDA cores only.
For users interested in standard usage models, including gaming, the GV100 GPU offers 1.5x improvement in FP32 computing, up to 15 TFLOPS of theoretical performance and 7.5 TFLOPS of FP64. Other relevant specifications include 320 texture units, a 4096-bit HBM2 memory interface and 16GB of memory on-module. NVIDIA claims a memory bandwidth of 900 GB/s which works out to 878 MHz per stack.
Maybe more impressive is the transistor count: 21.1 BILLION! NVIDIA claims that this is the largest chip you can make physically with today’s technology. Considering it is being built on TSMC's 12nm FinFET technology and has an 815 mm2 die size, I see no reason to doubt them.
Shipping is scheduled for Q3 for Tesla V100 – at least that is when NVIDIA is promising the DXG-1 system using the chip is promised to developers.
I know many of you are interested in the gaming implications and timelines – sorry, I don’t have an answer for you yet. I will say that the bump from 10.6 TFLOPS to 15 TFLOPS is an impressive boost! But if the server variant of Volta isn’t due until Q3 of this year, I find it hard to think NVIDIA would bring the consumer version out faster than that. And whether or not NVIDIA offers gamers the chip with non-HBM2 memory is still a question mark for me and could directly impact performance and timing.