Google Introduces Tesla P100 to Cloud Platform

Subject: Graphics Cards | September 23, 2017 - 12:16 AM |
Tagged: google, nvidia, p100, GP100

NVIDIA seems to have scored a fairly large customer, as Google has just added Tesla P100 GPUs to its cloud infrastructure. Effective immediately, you can attach up to four of these GPUs to your rented servers on an hourly or monthly basis. According to Google's pricing calculator, each GPU adds $2.30 per hour to your server's fee in Oregon and South Carolina, which isn't a lot if you only need them for short periods of time.
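As a quick sanity check on that pricing, the math is just GPU count times the hourly rate times runtime. A minimal sketch, using the $2.30/hour figure from the article (the 730-hour month is an assumption for illustration, not a quoted billing period):

```python
# Rough add-on cost estimate for attaching Tesla P100s to a cloud instance.
# Sketch only -- real Google Cloud billing includes the base instance price
# and per-region variation not modeled here.
P100_HOURLY_USD = 2.30  # per-GPU hourly rate quoted for Oregon / South Carolina

def gpu_cost(gpus: int, hours: float) -> float:
    """Added cost of attaching `gpus` Tesla P100s for `hours` of runtime."""
    return gpus * P100_HOURLY_USD * hours

print(f"{gpu_cost(1, 1):.2f}")    # 2.30    -- one GPU for one hour
print(f"{gpu_cost(4, 730):.2f}")  # 6716.00 -- the four-GPU maximum for a ~730-hour month
```

The second figure shows why the sustained-use discounts matter: at list rate, a fully loaded instance run around the clock adds up quickly.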

google-2017-cloudplatformlogo.png

If you need to use them long-term, though, Google has also announced "sustained use discounts" in the same blog post.

While NVIDIA has technically launched a successor to the P100, the Volta-based V100, the Pascal-based part is still quite interesting. The main focus of the GP100 design was raising FP64 performance to its theoretical maximum of 1/2 the FP32 rate. It also has very high memory bandwidth thanks to its HBM2 stacks, addressing what is often a huge bottleneck for GPU-based applications.

For NVIDIA, selling high-end GPUs is obviously good. The enterprise market is lucrative, and it validates their push into very large die sizes. For Google, it gives interested parties a strong reason to consider them over simply defaulting to Amazon. AWS has GPU instances, but they're currently limited to Kepler and Maxwell (though Amazon also offers FPGA-based acceleration). Amazon can always catch up, but they haven't yet, and that's good for Google.

Source: Google
Manufacturer: NVIDIA

NVIDIA P100 comes to Quadro

At the start of the SOLIDWORKS World conference this week, NVIDIA took the cover off of a handful of new Quadro cards targeting professional graphics workloads. Though the bulk of NVIDIA’s discussion covered lower cost options like the Quadro P4000, P2000, and below, the most interesting product sits at the high end, the Quadro GP100.

As you might guess from the name alone, the Quadro GP100 is based on the GP100 GPU, the same silicon used on the Tesla P100 announced back in April of 2016. At the time, the GP100 GPU was specifically billed as an HPC accelerator for servers. It had a unique form factor with a passive cooler that required additional chassis fans. Just a couple of months later, a PCIe version with the same specifications was released under the Tesla P100 brand.

quadro2017-2.jpg

Today that GPU hardware gets a third iteration as the Quadro GP100. Let’s take a look at the Quadro GP100 specifications and how it compares to some recent Quadro offerings.

|   | Quadro GP100 | Quadro P6000 | Quadro M6000 | Full GP100 |
|---|---|---|---|---|
| GPU | GP100 | GP102 | GM200 | GP100 (Pascal) |
| SMs | 56 | 60 | 48 | 60 |
| TPCs | 28 | 30 | 24 | (30?) |
| FP32 CUDA Cores / SM | 64 | 64 | 64 | 64 |
| FP32 CUDA Cores / GPU | 3584 | 3840 | 3072 | 3840 |
| FP64 CUDA Cores / SM | 32 | 2 | 2 | 32 |
| FP64 CUDA Cores / GPU | 1792 | 120 | 96 | 1920 |
| Base Clock | 1303 MHz | 1417 MHz | 1026 MHz | TBD |
| GPU Boost Clock | 1442 MHz | 1530 MHz | 1152 MHz | TBD |
| FP32 TFLOPS (SP) | 10.3 | 12.0 | 7.0 | TBD |
| FP64 TFLOPS (DP) | 5.15 | 0.375 | 0.221 | TBD |
| Texture Units | 224 | 240 | 192 | 240 |
| ROPs | 128? | 96 | 96 | 128? |
| Memory Interface | 1.4 Gbps 4096-bit HBM2 | 9 Gbps 384-bit GDDR5X | 6.6 Gbps 384-bit GDDR5 | 4096-bit HBM2 |
| Memory Bandwidth | 716 GB/s | 432 GB/s | 316.8 GB/s | ? |
| Memory Size | 16 GB | 24 GB | 12 GB | 16 GB |
| TDP | 235 W | 250 W | 250 W | TBD |
| Transistors | 15.3 billion | 12 billion | 8 billion | 15.3 billion |
| GPU Die Size | 610 mm² | 471 mm² | 601 mm² | 610 mm² |
| Manufacturing Process | 16nm | 16nm | 28nm | 16nm |

There are some interesting stats here that may not be obvious at first glance. Most notable is that, despite the pricing and segmentation, the GP100 is not automatically the fastest Quadro card from NVIDIA; it depends on your workload. With 3584 CUDA cores at a rated Boost clock of 1442 MHz, the single-precision (32-bit) rating for the Quadro GP100 is 10.3 TFLOPS, less than the recently released P6000 card. Based on GP102, the P6000 has 3840 CUDA cores at a Boost clock around 1530 MHz, for a total of 12 TFLOPS.
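The single-precision figures above fall out of a simple formula: each CUDA core can retire one fused multiply-add (two FLOPs) per clock, so peak TFLOPS = cores × 2 × boost clock. A quick sketch using the table's numbers:

```python
def peak_tflops(cuda_cores: int, boost_clock_mhz: float,
                flops_per_core_per_clock: int = 2) -> float:
    """Peak throughput in TFLOPS: cores x FLOPs/clock (FMA counts as 2) x clock."""
    return cuda_cores * flops_per_core_per_clock * boost_clock_mhz * 1e6 / 1e12

print(round(peak_tflops(3584, 1442), 1))  # 10.3 -- Quadro GP100, matches the table
print(round(peak_tflops(3840, 1530), 1))  # 11.8 -- Quadro P6000 at rated Boost
```

Note the P6000 only reaches the marketed 12 TFLOPS at a slightly higher effective clock than the rated 1530 MHz Boost, which is typical of GPU Boost behavior.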

gp102-blockdiagram.jpg

GP100 (full) Block Diagram

Clearly the placement of the Quadro GP100 is based around its 64-bit, double-precision performance and its ability to run real-time simulations on more complex workloads than other Pascal-based Quadro cards can handle. The Quadro GP100 offers a 1/2 DP compute rate, totaling 5.15 TFLOPS. The P6000, on the other hand, is only capable of 0.375 TFLOPS at the standard, consumer-level 1/32 DP rate. The GP100 also includes ECC memory support, something no other recent Quadro card offers.
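The double-precision gap follows directly from the per-SM ratios in the spec table: GP100 carries 32 FP64 cores per SM against 64 FP32 cores (a 1/2 rate), while GP102 and GM200 carry only 2 (a 1/32 rate). A sketch of how the FP64 figures derive from the FP32 ones:

```python
def fp64_tflops(fp32_tflops: float, dp_ratio: float) -> float:
    """Derive peak FP64 throughput from the FP32 figure and the DP:SP rate."""
    return fp32_tflops * dp_ratio

print(fp64_tflops(10.3, 1 / 2))            # 5.15  -- Quadro GP100, 1/2 rate
print(fp64_tflops(12.0, 1 / 32))           # 0.375 -- Quadro P6000, 1/32 rate
print(round(fp64_tflops(7.0, 1 / 32), 3))  # 0.219 -- Quadro M6000 (table lists 0.221)
```

The roughly 14x double-precision advantage over the P6000 is the whole sales pitch for this card.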

quadro2017-3.jpg

Raw graphics performance and throughput remain an open question until someone does some testing, but it seems likely that the Quadro P6000 will still be the best solution there, by at least a slim margin. With a higher CUDA core count, higher clock speeds, and an equivalent architecture, the P6000 should run games, graphics rendering, and design applications very well.

There are other important differences in the GP100. Its memory system is built around a 16GB HBM2 implementation, which means more total memory bandwidth but a lower capacity than the 24GB Quadro P6000. That 66% bandwidth advantage gives the GP100 an edge in applications that are pixel-throughput bound, as long as the compute capability keeps up on the back end.
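The bandwidth figures in the spec table follow from bus width and per-pin data rate: bandwidth (GB/s) = data rate (Gbps) × bus width (bits) / 8. A quick check against the table's numbers:

```python
def mem_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s from per-pin data rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8

print(round(mem_bandwidth_gbs(1.4, 4096), 1))  # 716.8 -- GP100 HBM2, quoted as 716
print(mem_bandwidth_gbs(9.0, 384))             # 432.0 -- P6000 GDDR5X
print(round(mem_bandwidth_gbs(6.6, 384), 1))   # 316.8 -- M6000 GDDR5
```

The numbers show how HBM2 gets there: a modest 1.4 Gbps per pin, but across a 4096-bit interface rather than the 384-bit bus of GDDR5/GDDR5X.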


Continue reading our preview of the new Quadro GP100!