Turing vs Volta: Two Chips Enter. No One Dies.

Subject: Graphics Cards | August 21, 2018 - 08:43 PM |
Tagged: nvidia, Volta, turing, tu102, gv100

In the past, when NVIDIA launched a new GPU architecture, they would make a few designs for each of their market segments. All SKUs would be one of those chips, with varying amounts of it disabled or re-clocked to hit multiple price points. The mainstream enthusiast (GTX -70/-80) chip of each generation is typically 300mm2, and the high-end enthusiast (Titan / -80 Ti) chip is often around 600mm2.

nvidia-2016-gtc-pascal-banner.png

Kepler used quite a bit of that die space for FP64 calculations, but that did not happen with consumer versions of Pascal. Instead, GP100 supported 1:2:4 FP64:FP32:FP16 performance ratios. This is great for the compute community, such as scientific researchers, but games are focused on FP32. Shortly thereafter, NVIDIA released GP102, which had the same number of FP32 cores (3840) as GP100 but with much-reduced 64-bit performance… and much-reduced die area. GP100 was 610mm2, but GP102 was just 471mm2.

At this point, I’m thinking that NVIDIA is pulling scientific computing chips away from the common user to increase the value of their Tesla parts. There was no reason to make a cheap 6XXmm2 card available to the public when a 471mm2 part could take the performance crown, so why not reap extra dies from your wafer (and be able to clock them higher because of better binning)?
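To put a rough number on "extra dies from your wafer", here is a minimal sketch using a common textbook approximation for gross dies per wafer. The 300 mm wafer diameter and the edge-loss formula are my assumptions, not anything NVIDIA or TSMC has published for these parts, and the result ignores yield, scribe lines, and the die's aspect ratio.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Estimate candidate dies per wafer (before yield).

    Uses the usual approximation: wafer area divided by die area,
    minus a term for partial dies lost around the wafer edge.
    The 300 mm default is an assumption for illustration.
    """
    radius = wafer_diameter_mm / 2.0
    area_term = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(area_term - edge_loss)


# Die areas quoted in the post.
gp100_dies = gross_dies_per_wafer(610.0)
gp102_dies = gross_dies_per_wafer(471.0)

print("GP100 candidates per wafer:", gp100_dies)
print("GP102 candidates per wafer:", gp102_dies)
print("GP102 / GP100 ratio: %.2f" % (gp102_dies / gp100_dies))
```

Under those assumptions the smaller die yields roughly a third more candidates per wafer, and that is before accounting for the fact that smaller dies also tend to yield better.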

nvidia-2017-sc17-japanaisuper.jpg

And then Volta came out. And it was massive (815mm2).

At this point, you really cannot manufacture a larger integrated circuit. You are at the limit of what TSMC (and other fabs) can focus onto your silicon. Again, it’s a 1:2:4 FP64:FP32:FP16 ratio. Again, there is no consumer version in sight. Again, it looked as if NVIDIA was going to fragment their market and leave consumers behind.

And then Turing was announced. Apparently, NVIDIA still plans on making big chips for consumers… just not with 64-bit performance. The big draw of this 754mm2 chip is its dedicated hardware for raytracing. We knew this technology was coming, and we knew that the next generation would have technology to make this useful. I figured that meant consumer-Volta, and NVIDIA had somehow found a way to use Tensor cores to cast rays. Apparently not… but, don’t worry, Turing has Tensor cores too… they’re just for machine-learning gaming applications. Those are above and beyond the raytracing ASICs, and the CUDA cores, and the ROPs, and the texture units, and so forth.

nvidia-2018-geforce-rtx-turing-630-u.jpg

But, raytracing hype aside, let’s think about the product stack:

  1. NVIDIA now has two ~800mm2-ish chips… and
  2. They serve two completely different markets.

In fact, I cannot see either FP64 or raytracing going anywhere any time soon. As such, it’s my assumption that NVIDIA will maintain two different architectures of GPUs going forward. The only way that I can see this changing is if they figure out a multi-die solution, because neither design can get any bigger. And even then, what workload would it even perform? (Moment of silence for 10km x 10km video game maps.)

What do you think? Will NVIDIA keep two architectures going forward? If not, how will they serve all of their customers?

GTC 2018: NVIDIA Announces Volta-Powered Quadro GV100

Subject: General Tech | March 27, 2018 - 03:30 PM |
Tagged: nvidia, GTC, quadro, gv100, GP100, tesla, titan v, v100, votla

One of the big missing markets for NVIDIA with their slow rollout of the Volta architecture was professional workstations. Today, NVIDIA announced they are bringing Volta to the Quadro family with the Quadro GV100 card.

27-gv100-gpu.jpg

Powered by the same GV100 GPU that was announced at last year's GTC in the Tesla V100, and late last year in the Titan V, the Quadro GV100 represents a leap forward in computing power for workstation-level applications. While these users could currently be using TITAN V for similar workloads, as we've seen in the past, Quadro drivers generally provide big performance advantages in these sorts of applications. That said, we'd love to see NVIDIA repeat their move of bringing these optimizations to the TITAN lineup as they did with the TITAN Xp.

As it is a Quadro, we would expect this to be NVIDIA's first Volta-powered product which provides certified, professional driver code paths for applications such as CATIA, Solid Edge, and more.

quadro-gv100.png

NVIDIA also heavily promoted the idea of using two of these GV100 cards in one system, utilizing NVLink. Considering the lack of NVLink support for the TITAN V, this is also the first time we've seen a Volta card with display outputs supporting NVLink in more standard workstations.

More importantly, this announcement brings NVIDIA's RTX technology to the professional graphics market. 

With popular rendering applications like V-Ray already announcing and integrating support for NVIDIA's OptiX raytracing denoiser in their beta branch, it seems only a matter of time before we'll see a broad suite of professional applications supporting RTX technology for real-time work, such as raytraced renders of items being designed in CAD and modeling applications.

This sort of speed represents a potential massive win for professional users, who won't have to waste time waiting for preview renderings to complete to continue iterating on their projects.

The NVIDIA Quadro GV100 is available now directly from NVIDIA for a price of $8,999, which puts it squarely in the same price range as the previous highest-end Quadro GP100.

Source: NVIDIA

NVIDIA Partners with AWS for Volta V100 in the Cloud

Subject: Graphics Cards | October 31, 2017 - 09:58 PM |
Tagged: nvidia, amazon, google, pascal, Volta, gv100, tesla v100

Remember last month? Remember when I said that Google’s introduction of Tesla P100s would be good leverage over Amazon, as the latter is still back in the Kepler days (because Maxwell was 32-bit focused)?

Amazon has leapfrogged them by introducing Volta-based V100 GPUs.

nvidia-2017-voltatensor.jpg

To compare the two parts, the Tesla P100 has 3584 CUDA cores, yielding just under 10 TFLOPs of single-precision performance. The Tesla V100, with its ridiculous die size, pushes that up over 14 TFLOPs. As with Pascal, it supports full 1:2:4 FP64:FP32:FP16 performance scaling. It also has access to NVIDIA’s tensor cores, which are specialized for 16-bit, 4x4 multiply-add matrix operations that are apparently common in neural networks, both training and inferencing.
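For anyone wondering what a "4x4 multiply-add matrix operation" actually is, here is a minimal sketch in plain Python. It computes D = A x B + C on 4x4 matrices, which is the shape of work a tensor core is built around. The function name is hypothetical, and the sketch uses ordinary Python floats rather than 16-bit inputs, so it illustrates the arithmetic pattern only, not the hardware's precision or speed.

```python
def tensor_op_4x4(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as lists of rows."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    fused_multiply_adds = 0
    for i in range(n):
        for j in range(n):
            acc = c[i][j]  # start from the accumulator matrix C
            for k in range(n):
                acc += a[i][k] * b[k][j]  # one multiply-add
                fused_multiply_adds += 1
            d[i][j] = acc
    return d, fused_multiply_adds


# Illustrative inputs: identity times all-ones, plus all-ones.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]

result, fma_count = tensor_op_4x4(identity, ones, ones)
print(result)
print("multiply-adds per operation:", fma_count)
```

Each such operation is 4 x 4 x 4 = 64 multiply-adds, which is why dedicating hardware to it pays off for the dense matrix math in neural networks.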

Amazon allows up to eight of them at once (with their P3.16xlarge instances).

So that’s cool. While Google has again been quickly leapfrogged by Amazon, it’s good to see NVIDIA getting wins in multiple cloud providers. This keeps money rolling in that will fund new chip designs for all the other segments.

Source: Amazon

NVIDIA Announces Tesla V100 with Volta GPU at GTC 2017

Subject: Graphics Cards | May 10, 2017 - 01:32 PM |
Tagged: v100, tesla, nvidia, gv100, gtc 2017

During the opening keynote to NVIDIA’s GPU Technology Conference, CEO Jen-Hsun Huang formally unveiled the latest GPU architecture and the first product based on it. The Tesla V100 accelerator is based on the Volta GPU architecture and features some amazingly impressive specifications. Let’s take a look.

| | Tesla V100 | GTX 1080 Ti | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GV100 | GP102 | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro |
| GPU Cores | 5120 | 3584 | 3584 | 2560 | 2816 | 3072 | 2048 | 4096 | 3584 |
| Base Clock | - | 1480 MHz | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz |
| Boost Clock | 1455 MHz | 1582 MHz | 1480 MHz | 1733 MHz | 1076 MHz | 1089 MHz | 1216 MHz | - | - |
| Texture Units | 320 | 224 | 224 | 160 | 176 | 192 | 128 | 256 | 224 |
| ROP Units | 128 (?) | 88 | 96 | 64 | 96 | 96 | 64 | 64 | 64 |
| Memory | 16GB | 11GB | 12GB | 8GB | 6GB | 12GB | 4GB | 4GB | 4GB |
| Memory Clock | 878 MHz (?) | 11000 MHz | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz |
| Memory Interface | 4096-bit (HBM2) | 352-bit | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) |
| Memory Bandwidth | 900 GB/s | 484 GB/s | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s |
| TDP | 300 watts | 250 watts | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts |
| Peak Compute | 15 TFLOPS | 10.6 TFLOPS | 10.1 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS |
| Transistor Count | 21.1B | 12.0B | 12.0B | 7.2B | 8.0B | 8.0B | 5.2B | 8.9B | 8.9B |
| Process Tech | 12nm | 16nm | 16nm | 16nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | lol | $699 | $1,200 | $599 | $649 | $999 | $499 | $649 | $549 |

While we are low on details today, it appears that the fundamental compute units of Volta are similar to those of Pascal. The GV100 has 80 SMs with 40 TPCs and 5120 total CUDA cores, roughly 43% more than either the GP100 GPU used on the Tesla P100 or the GP102 GPU used on the GeForce GTX 1080 Ti (3584 cores each). The structure of the GPU remains the same as GP100, with the CUDA cores organized as 64 single precision (FP32) per SM and 32 double precision (FP64) per SM.
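The core counts fall straight out of the per-SM figures, so here is a quick arithmetic sketch to check them. The constant names are mine; the numbers are the ones quoted in this post (80 SMs, 64 FP32 and 32 FP64 units per SM, and 3584 cores on the comparison parts).

```python
SM_COUNT = 80
FP32_UNITS_PER_SM = 64
FP64_UNITS_PER_SM = 32
PASCAL_COMPARISON_CORES = 3584  # Tesla P100 and GTX 1080 Ti, per the post

fp32_cores = SM_COUNT * FP32_UNITS_PER_SM
fp64_units = SM_COUNT * FP64_UNITS_PER_SM
increase_pct = (fp32_cores / PASCAL_COMPARISON_CORES - 1.0) * 100.0

print("FP32 CUDA cores:", fp32_cores)
print("FP64 units:", fp64_units)
print("Increase over 3584 cores: %.1f%%" % increase_pct)
```

The 2:1 ratio of FP32 to FP64 units per SM is also where the half-rate double-precision throughput comes from.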

image7.png

Interestingly, NVIDIA has already told us the clock speed of this new product as well, coming in at 1455 MHz Boost, more than 100 MHz lower than the GeForce GTX 1080 Ti and 25 MHz lower than the Tesla P100.
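With a core count and a clock speed in hand, the theoretical peak is simple arithmetic: cores times two floating-point operations per clock (one fused multiply-add) times clock frequency. The sketch below is my own back-of-the-envelope check rather than NVIDIA's stated method, and the helper name is hypothetical.

```python
def peak_fp32_tflops(cuda_cores, clock_mhz):
    """Theoretical FP32 peak, assuming one fused multiply-add
    (2 floating-point operations) per core per clock."""
    flops_per_second = cuda_cores * 2 * clock_mhz * 1e6
    return flops_per_second / 1e12


# Tesla V100 at its quoted boost clock.
v100_tflops = peak_fp32_tflops(5120, 1455)

# GTX 1080 Ti at its base clock, for comparison.
gtx_1080_ti_tflops = peak_fp32_tflops(3584, 1480)

print("Tesla V100:  %.2f TFLOPS" % v100_tflops)
print("GTX 1080 Ti: %.2f TFLOPS" % gtx_1080_ti_tflops)
```

The V100 result lands at about 14.9 TFLOPS, which rounds to the quoted 15. The GTX 1080 Ti's commonly quoted 10.6 TFLOPS matches its 1480 MHz base clock rather than its boost clock, so the two headline numbers are not quite apples to apples.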

SXM2-VoltaChipDetails.png

Volta adds in support for a brand new compute unit though, known as Tensor Cores. With 640 of these on the GPU die, NVIDIA directly targets the neural network and deep learning fields. If this is your first time hearing about Tensor, you should read up on TensorFlow, the open-source software library for machine learning, and its influence on the hardware markets. Google has invested in a Tensor-specific processor already, and now NVIDIA throws its hat in the ring.

Adding Tensor Cores to Volta allows the GPU to do mass processing for deep learning, on the order of a 12x improvement over Pascal’s capabilities using CUDA cores only.

07.jpg

For users interested in standard usage models, including gaming, the GV100 GPU offers a 1.5x improvement in FP32 computing, up to 15 TFLOPS of theoretical performance, and 7.5 TFLOPS of FP64. Other relevant specifications include 320 texture units, a 4096-bit HBM2 memory interface and 16GB of memory on-module. NVIDIA claims a memory bandwidth of 900 GB/s, which works out to 878 MHz per stack.
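Here is how the 878 MHz figure relates to the 900 GB/s claim, as a quick sketch. It assumes HBM2 transfers data twice per clock (double data rate) across the full 4096-bit interface. The helper name is mine, and the clock is the estimate from this post, not a published spec.

```python
def memory_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s for a given bus width and clock.

    transfers_per_clock=2 assumes double-data-rate signaling.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


v100_bandwidth = memory_bandwidth_gb_s(4096, 878)
print("Tesla V100 bandwidth: %.1f GB/s" % v100_bandwidth)
```

That comes out to just over 899 GB/s, close enough to NVIDIA's 900 GB/s that 878 MHz is a reasonable guess at the memory clock.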

Maybe more impressive is the transistor count: 21.1 BILLION! NVIDIA claims that this is the largest chip you can make physically with today’s technology. Considering it is being built on TSMC's 12nm FinFET technology and has an 815 mm2 die size, I see no reason to doubt them.

03.jpg

Shipping is scheduled for Q3 for Tesla V100; at least, that is when NVIDIA is promising the DGX-1 system using the chip to developers.

I know many of you are interested in the gaming implications and timelines – sorry, I don’t have an answer for you yet. I will say that the bump from 10.6 TFLOPS to 15 TFLOPS is an impressive boost! But if the server variant of Volta isn’t due until Q3 of this year, I find it hard to think NVIDIA would bring the consumer version out faster than that. And whether or not NVIDIA offers gamers the chip with non-HBM2 memory is still a question mark for me and could directly impact performance and timing.

More soon!!

Source: NVIDIA