First, Some Background

The rumored GP102 is the first of its kind since Fermi. How big of a change could it be?

 
TL;DR:
NVIDIA's Rumored GP102
 
Based on two rumors, NVIDIA seems to be planning a new GPU, called GP102, that sits between GP100 and GP104. This would be a change from how their product stack has flowed since Fermi and Kepler. GP102's performance, in both single-precision and double-precision, will likely signal NVIDIA's product plans going forward.
  • GP100's ideal 1 : 2 : 4 FP64 : FP32 : FP16 ratio is inefficient for gaming
  • GP102 either extends GP104's gaming lead or bridges GP104 and GP100
  • If GP102 is a bigger GP104, the future is unclear for smaller GPGPU devs
    • That is, unless GP100 can be significantly up-clocked for gaming.
  • If GP102 matches (or outperforms) GP100 in gaming, and has better than 1 : 32 double-precision performance, then GP100 would mark the first time that NVIDIA has designed an enterprise-only, high-end GPU.
 

 

When GP100 was announced, Josh and I were discussing, internally, how it would make sense for the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a grain of salt, claiming that NVIDIA was planning a second high-end chip, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claims to have been contacted by the developers of AIDA64, seemingly on the record, also citing a GP102 design.

I will retell chunks of the rumor, but also add my opinion to it.

In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive tuning for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these was available in Tesla, Quadro, and GeForce cards, including Titans.

Maxwell was interesting, though. NVIDIA was unable to leave 28nm, the node Kepler launched on, so they created a second architecture for it. To increase performance without access to more feature density, you need to make your designs bigger, more optimized, or simpler. GM200 was giant and optimized but, to reach the performance levels it achieved, it also needed to be simpler. Something had to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.

GPU manufacturers are now jumping from 28nm, past 20nm, down to 14nm (AMD) and 16nm (NVIDIA). This double-jump in fabrication technology gives them a lot of room to add features, such as more shader cores and other accelerators (video decode, simultaneous multi-projection, etc.). Alternatively, they can produce a smaller chip with the same amount of performance, yielding more dies per wafer.

One thing that we knew NVIDIA was planning to add to Pascal was 16-bit (FP16) support. This allows developers to trade a reduction in precision for a boost in speed, by pushing two 16-bit calculations through a space that's designed for 32-bit values, but no specific details were given at the time. 64-bit (FP64) would also be supported but, historically, it has run at some fraction of 32-bit performance, especially on gaming SKUs.
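To make that packing idea concrete, here is a minimal CUDA sketch; the kernel name and structure are my own illustration, but the __half2 type and __hadd2 intrinsic are what CUDA exposes for packed half-precision math:

```cuda
#include <cuda_fp16.h>

// Illustrative sketch: each __half2 packs two 16-bit floats into one 32-bit
// register, and __hadd2 performs both additions with a single instruction on
// hardware that has native FP16 arithmetic (compile for sm_53 or newer).
__global__ void add_half2(const __half2 *a, const __half2 *b,
                          __half2 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __hadd2(a[i], b[i]);  // two half-precision adds at once
    }
}
```

Only hardware that implements FP16 math at a high rate turns that packing into extra throughput, which is exactly the distinction between GP100 and GP104 discussed below.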

Then they announced that GP100 would have a performance ratio of 4 FP16 : 2 FP32 : 1 FP64.
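To put rough numbers on that ratio, here's the back-of-the-envelope math, assuming the Tesla P100 figures NVIDIA has published (3584 enabled FP32 cores at a boost clock of roughly 1480 MHz); treat it as an illustration of the ratio rather than a benchmark:

```latex
% Peak rate = 2 ops (one FMA) x cores x clock, scaled by the 1 : 2 : 4 ratio:
\begin{align*}
\text{FP32:}\quad & 2 \times 3584 \times 1.48\,\text{GHz} \approx 10.6\ \text{TFLOPS} \\
\text{FP64:}\quad & 10.6 \div 2 \approx 5.3\ \text{TFLOPS} \\
\text{FP16:}\quad & 10.6 \times 2 \approx 21.2\ \text{TFLOPS}
\end{align*}
```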

Okay then… that's a lot of die area that is not being used for single-precision. If you're a researcher or another high-performance computing customer, then this is music to your ears. Otherwise, that's a lot of performance for calculations that your software will basically never make… that is, unless the games industry was about to change in a dramatic way.

It isn't. NVIDIA announced the GeForce GTX 1080 and its GP104 processor.

In terms of performance, this architecture has the same ratio as Maxwell, 32 FP32 : 1 FP64, and, while FP16 is supported, it's only there for compatibility; it runs at a tiny fraction of the FP32 rate, so you don't want to use half-precision on it. FP32 is the only first-class citizen in GP104. That said, GP100 is twice the size (and transistor count) of GP104, yet it only has 50% more shaders. Actual performance may even be lower, too, if the bigger chip requires a lower clock rate due to the higher chance of manufacturing defects occurring within each die's boundaries.
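For a rough sense of scale, here is the arithmetic behind that comparison, assuming the commonly cited full-die figures (GP100: ~610 mm² and 15.3 billion transistors with 3840 shaders; GP104: ~314 mm² and 7.2 billion transistors with 2560 shaders):

```latex
% GP100 relative to GP104, assumed full-die figures:
\begin{align*}
\text{Die area:}\quad     & 610\ \text{mm}^2 \div 314\ \text{mm}^2 \approx 1.9\times \\
\text{Transistors:}\quad  & 15.3\ \text{billion} \div 7.2\ \text{billion} \approx 2.1\times \\
\text{FP32 shaders:}\quad & 3840 \div 2560 = 1.5\times
\end{align*}
```

Roughly double the transistor budget for half again the shader count; the difference presumably goes to the FP64 units, the larger register files, and the HBM2 interface.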

What This Means

GP100 is the first chip from NVIDIA to reach the ideal 1 : 2 : 4 ratio between 64-bit, 32-bit, and 16-bit calculations. (Workstation Fermi did 1 : 2 in FP64 : FP32, but not FP16.) This push might have been encouraged by Intel's Xeon Phi co-processor, which has secured some super-computer design wins thanks to double-precision performance that is, likewise, half of its single-precision rate. (As far as I know, AVX-512 doesn't support FP16 instructions.) I can see why NVIDIA would want FP64 to return as a full-speed data type to keep customers away from Intel, with its high-tech fabrication processes and x86-everywhere mindset. GP100 is theoretically faster than Knights Landing, but that doesn't mean anything if your customers have already written their software, and did so exclusively for an x86, many-threaded architecture. It also pulls their designs away from the needs of gaming.

When GP100 was announced, I saw three possible outcomes:

  1. Gaming shifts into new features, like 64-bit world coordinates, to use capacity
  2. GP100's wasted die area is, as with GF110, low enough for NVIDIA to justify ignoring it
  3. NVIDIA diverges their gaming and professional designs

The first outcome, where gaming suddenly embraces 64-bit computation on the GPU, was incinerated when GP104 was announced. I figured that, due to the double-jump in fabrication nodes, it would be the best time to take the hit (and still show a performance increase) if they knew something was on the horizon. Apparently they don't. This leaves the fight between the last two points. Designing an extra chip would take effort, not to mention alienate enthusiasts who want NVIDIA's “best” chip, but GP100 is quite expensive in ways that might not make sense for home users.

We're now hearing about GP102, which is rumored to be the big gaming chip of this generation. It is said to slide in between GP100 and GP104 in terms of die area, but we don't know whether it will use HBM2 or GDDR5X memory. Whenever we get a new Titan, and perhaps a GTX 1080 Ti, or whatever they're called, this seems to be the silicon that will power them.

The thing is, GP102 also drives a wedge between gamers and GP100, depending on how it's tuned. As we said earlier, there might not be a large gaming gap between GP104 and GP100, which makes me wonder whether the rest of the product stack will actually perform below GP100, or around it. Either GP100 still has a lot of headroom, and will take the crown in a second generation of Pascal, or GP102 performs equivalently (or better) in gaming scenarios, clearly dividing the market between the two chips.

This is where we stop and ponder what NVIDIA's future product stack will look like. The original Titan (and Titan Black) introduced enthusiasts to a high-performance GPU compute card that also happened to be the best gaming card available. GK110, again, didn't have an ideal 1 : 2 FP64 ratio, but 1 : 3 is pretty close. GP102 could be closer to GK110, at a 1 : 4 or 1 : 8 ratio, but that would take even more effort on NVIDIA's part for a probably low-volume chip. If this is the case, GP102 could be the compromise that bridges the two product categories: better gaming performance with a taste of high-performance compute.

On the other hand, it could be a scaled-up GP104, with a 1 : 32 ratio. This would be easier to develop, but it would also cast aside anyone looking for a cheap but GPGPU-friendly middle ground. Whatever we get should reveal NVIDIA's product strategy going forward.