Subject: Graphics Cards | February 7, 2019 - 03:30 PM | Jeremy Hellstrom
Tagged: VRAM, video card, Vega 20, Vega, radeon vii, radeon, pcie, opencl, HBM2, graphics card, gaming, compute, amd, 7nm, 16GB
While enjoying the pictures and tests Sebastian ran on the new AMD Radeon VII, did you notice we missed a game that is near and dear to your heart? If so, one of the reviews below may cover it. The list even includes Linux performance for those on that side of the silicon. For instance, over at The Tech Report you can check out Monster Hunter: World, Forza Horizon 4 and the impressive results the new 7nm card offers in Battlefield V.
"AMD's Radeon VII is the first gaming graphics card powered by a 7 nm GPU: Vega 20. This hopped-up Vega chip comes linked up with 16 GB of HBM2 RAM good for 1 TB/s of memory bandwidth. We put this potent combination to the test to see if it can beat out Nvidia's GeForce RTX 2080."
Here are some more Graphics Card articles from around the web:
- AMD Radeon VII @ Guru of 3D
- AMD Radeon VII 16GB Video Card Review @ Legit Reviews
- AMD Radeon VII: A 7nm-long step in the right direction, but is that enough? @ Ars Technica
- AMD Radeon VII 1440p, 4K & Ultrawide Gaming Performance @ Techgage
- AMD Radeon VII Review: RTX Killer or Flop? @ Techspot
- AMD Radeon VII 16 GB @ TechPowerUp
- AMD Radeon VII @ Kitguru
- AMD Radeon VII Linux Benchmarks - Powerful Open-Source Graphics For Compute & Gaming @ Phoronix
Overview and Specifications
After a month-long wait following its announcement during the AMD keynote at CES, the Radeon VII is finally here. By now you probably know that this is the world’s first 7nm gaming GPU, and it is launching today at a price equal to NVIDIA’s GeForce RTX 2080 at $699.
The AMD Radeon VII in action on the test bench
More than a gaming card, the Radeon VII is also being positioned by AMD as a card for content creators. Its 16GB of fast HBM2 memory and enhanced compute capabilities complement what should be significantly improved gaming performance compared to the RX Vega 64.
Vega at 7nm
At the heart of the Radeon VII is the Vega 20 GPU, introduced back in November with the Radeon Instinct MI60 and MI50 compute cards for the professional market. The move to 7nm shrinks the die from 495 mm² with Vega 10 to 331 mm² with Vega 20. However, the new GPU is more than a die shrink. Its most notable improvement is memory throughput, which is significantly higher with Vega 20.
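For a sense of scale, here is a minimal Python sketch that turns the two die areas quoted above into a percentage. It is plain arithmetic on the published figures. Because Vega 20 also adds memory controllers, the result should not be read as a pure process-scaling factor.

```python
# Die areas quoted in the text, in square millimetres.
VEGA10_DIE_MM2 = 495
VEGA20_DIE_MM2 = 331


def area_reduction(old_mm2: float, new_mm2: float) -> float:
    """Return the fractional reduction in die area from old to new."""
    return 1 - new_mm2 / old_mm2


reduction = area_reduction(VEGA10_DIE_MM2, VEGA20_DIE_MM2)
print(f"Vega 20 is about {reduction:.0%} smaller than Vega 10")
```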
Double the HBM2, more than double the bandwidth
Effective memory speeds have improved only slightly, from 1.89 Gbps to 2.0 Gbps. Far more impactful is the addition of two more 4GB HBM2 stacks. These raise total memory to 16GB and bring two additional memory controllers, which double the interface width from 2048-bit to 4096-bit. The result is a whopping 1 TB/s (1024 GB/s) of memory bandwidth, up from 483.8 GB/s on the RX Vega 64.
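The 1 TB/s figure follows directly from interface width and per-pin data rate. Below is a short Python sketch of that arithmetic. It assumes the usual definition of theoretical peak bandwidth as bus width times per-pin rate, divided by eight bits per byte. Real-world throughput will be lower.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: pins x per-pin rate / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


radeon_vii = peak_bandwidth_gbs(4096, 2.0)   # four HBM2 stacks at 2.0 Gbps
rx_vega_64 = peak_bandwidth_gbs(2048, 1.89)  # two HBM2 stacks at 1.89 Gbps

print(f"Radeon VII: {radeon_vii:.1f} GB/s")
print(f"RX Vega 64: {rx_vega_64:.1f} GB/s")
print(f"Ratio: {radeon_vii / rx_vega_64:.2f}x")
```

Doubling the bus width alone doubles bandwidth. The small per-pin speed bump accounts for the rest, which is why the gain is slightly "more than double".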
Subject: Graphics Cards | June 5, 2018 - 11:58 PM | Tim Verry
Tagged: Vega, machine learning, instinct, HBM2, gpu, computex 2018, computex, amd, 7nm
AMD showed off its first 7nm GPU in the form of the expected AMD Radeon Instinct RX Vega graphics product: an RX Vega GPU with 32GB of HBM2 memory. The new GPU uses the Vega architecture along with the open source ecosystem built by AMD to enable both graphics and GPGPU workloads. AMD demonstrated the 7nm RX Vega GPU running ray tracing in a cool demo that rendered realistic reflections and shadows in a model on a per-pixel basis. Granted, we are still a long way from seeing that kind of detail in real-time gaming, but it is still cool to see glimpses of that ray-traced future.
According to AMD, the 32GB of HBM2 memory will greatly benefit creators and enterprise clients who work with large datasets and need to make quick changes and updates to models before doing a final render. The larger memory buffer will also help HPC applications, because more of a large dataset can be kept close to the GPU and processed over the wide HBM2 memory bus. Further, HBM2 has physical size and energy efficiency benefits that will pique the interest of datacenters focused on keeping TCO low.
Dr. Lisa Su came on stage towards the end of the 7nm Vega demonstration to show off the GPU in person, and you can see that it is rather tiny for the compute power it provides! For example, it is shorter than the two stacks of HBM2 dies on either side.
Of course AMD did not disclose all the nitty-gritty specifications of the new machine learning graphics card that enthusiasts want to know. We will have to wait a bit longer for that information unfortunately!
As for other 7nm offerings? As Ryan discussed during CES in January, 2018 will primarily be the year of the machine learning-focused Radeon Instinct RX Vega 7nm GPU, with other consumer-focused GPUs on the smaller process node likely coming in 2019. Whether those 2019 7nm GPUs will be a refreshed Vega or the new Navi is still up for debate. However, AMD's graphics roadmap certainly doesn't rule out Navi. In any case, AMD did state during the livestream that it intends to release a new GPU every year, alternating between a new architecture and a new process node.
What are your thoughts on AMD's graphics roadmap and its first 7nm Vega GPU?
Subject: Memory | January 12, 2018 - 05:46 PM | Tim Verry
Tagged: supercomputing, Samsung, HPC, HBM2, graphics cards, aquabolt
Samsung recently announced that it has begun mass production of its second-generation HBM2 memory, which it calls “Aquabolt”. Samsung has refined the design of its 8GB HBM2 packages so they reach an impressive 2.4 Gbps per-pin data transfer rate without needing more power than its first-generation 1.2V HBM2.
Reportedly, Samsung is using new TSV (through-silicon via) design techniques and adding extra thermal bumps between dies to improve clocks and thermal control. Each 8GB HBM2 “Aquabolt” package is made up of eight 8Gb dies. Each die is vertically interconnected using 5,000 TSVs, a huge number considering how small and tightly packed these dies are. Further, Samsung has added a new protective layer at the bottom of the stack to reinforce the package’s physical strength. The press release did not go into detail, but it does mention that Samsung had to overcome challenges relating to “collateral clock skewing” as a result of the sheer number of TSVs.
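The capacity figure mixes gigabits (Gb, per die) and gigabytes (GB, per package), which is easy to misread. Here is a tiny Python check that uses only the die count and density from the paragraph above:

```python
DIES_PER_STACK = 8      # eight dies per Aquabolt package, per Samsung
DIE_DENSITY_GBIT = 8    # each die is 8 Gb (gigabits)
BITS_PER_BYTE = 8

stack_gbit = DIES_PER_STACK * DIE_DENSITY_GBIT
stack_gbyte = stack_gbit / BITS_PER_BYTE
print(f"{stack_gbit} Gb per package = {stack_gbyte:.0f} GB")
```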
On the performance front, Samsung claims that Aquabolt offers a 50% increase in per-package performance versus its first-generation “Flarebolt” memory, which ran at 1.6Gbps per pin and 1.2V. Interestingly, Aquabolt is also faster than Samsung’s 2.0Gbps-per-pin HBM2 product (which needed 1.35V) without needing additional power. Samsung also compares Aquabolt to GDDR5. According to Samsung, a single package of HBM2 offers 9.6 times the bandwidth of a GDDR5 chip, at 307 GB/s versus 32 GB/s. With four packages, products such as graphics cards can take advantage of 1.2 TB/s of bandwidth.
This second-generation HBM2 memory is a decent step up in performance. First-generation HBM reached 128 GB/s per package and first-generation HBM2 reached 256 GB/s, or 512 GB/s and 1 TB/s respectively with four packages. The interesting bit is that Aquabolt is faster without needing more power. The increased bandwidth and data transfer speeds will be a boon to the HPC and supercomputing market and useful for working with massive databases, simulations, neural networks and AI training, and other “big data” tasks.
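The per-package and four-package figures in the last two paragraphs all follow from one formula. The Python sketch below makes two assumptions. First, each HBM/HBM2 stack exposes a 1024-bit interface, which is the standard stack width but is not stated in Samsung's release. Second, the 32 GB/s GDDR5 comparison point is a 32-bit chip at 8 Gbps per pin. That is our guess at how Samsung arrived at the number.

```python
STACK_WIDTH_BITS = 1024  # assumed: each HBM/HBM2 stack has a 1024-bit interface


def stack_bandwidth_gbs(gbps_per_pin: float, width_bits: int = STACK_WIDTH_BITS) -> float:
    """Peak bandwidth of one memory package in GB/s."""
    return width_bits * gbps_per_pin / 8


generations = {
    "HBM (1.0 Gbps)": 1.0,
    "HBM2 Flarebolt (1.6 Gbps)": 1.6,
    "HBM2 (2.0 Gbps)": 2.0,
    "HBM2 Aquabolt (2.4 Gbps)": 2.4,
}

for name, rate in generations.items():
    one = stack_bandwidth_gbs(rate)
    print(f"{name:28s} {one:6.1f} GB/s per stack, {4 * one:7.1f} GB/s with four")

aquabolt = stack_bandwidth_gbs(2.4)
flarebolt = stack_bandwidth_gbs(1.6)
gddr5_chip = 32 * 8.0 / 8  # assumed: 32-bit GDDR5 chip at 8 Gbps per pin

print(f"Aquabolt vs Flarebolt: +{aquabolt / flarebolt - 1:.0%}")
print(f"Aquabolt vs GDDR5 chip: {aquabolt / gddr5_chip:.1f}x")
```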
Aquabolt looks particularly promising for the mobile market, though. Future products succeeding the current mobile Vega GPU in Kaby Lake-G processors, Ryzen Mobile APUs, and eventually discrete Vega mobile graphics cards could get a nice performance boost. It’s likely too late for AMD to use this new HBM2 on these specific products, but future refreshes or generations may be able to take advantage of it. I’m sure it will also see use in the SoCs in Intel’s and NVIDIA’s driverless car projects.
Subject: General Tech | December 27, 2017 - 11:42 AM | Jeremy Hellstrom
Tagged: nvidia, Intel, HBM2, deep learning
AMD has never been afraid to try new things. It was first to hit 1GHz, it created a true multicore processor, and most recently it adopted HBM and HBM2 in its graphics cards. That move contributed to some of AMD's recent difficulties with the current generation of GPUs, because HBM is more expensive to produce and more challenging to implement. AMD was the first to implement HBM, but it is NVIDIA and Intel who are benefiting from AMD's experimental nature. Their new generation of HPC solutions, the Tesla P100, Quadro GP100 and Lake Crest, all use HBM2 and benefit from the experience Hynix, Samsung and TSMC gained fabbing the first generation. Vega products, meanwhile, offer slightly less memory bandwidth and lag behind in overall performance, a drawback of being first.
On a positive note, AMD have now had more experience designing chips which make use of HBM and this could offer a new hope for the next generation of cards, both gaming and HPC flavours. DigiTimes briefly covers the two processes manufacturers use in the production of HBM here.
"However, Intel's release of its deep-learning chip, Lake Crest, which came following its acquisition of Nervana, has come with HMB2. This indicates that HBM-based architecture will be the main development direction of memory solutions for HPC solutions by GPU vendors."
Here is some more Tech News from around the web:
- FCC Approves First Wireless 'Power-At-A-Distance' Charging System @ Slashdot
- Guidemaster: Everything Amazon’s Alexa can do, plus the best skills to enable @ Ars Technica
- INQ's best and worst of tech in 2017
- Linksys LAPAC2600 AC2600 Dual Band MU-MIMO Access Point Review @ NikKTech
- Acoustic Attacks on HDDs Can Sabotage PCs, CCTV Systems, ATMs, More @ Slashdot
- 10 Tech Products That Are Next to Impossible to Repair @ Techspot
- noblechairs ICON PU Faux Leather Chair @ TechPowerUp
Subject: General Tech, Graphics Cards | December 4, 2017 - 05:47 PM | Tim Verry
Tagged: navi, HBM2, hbm, gddr6, amd
WCCFTech reports that AMD is working on a GDDR6 memory controller for its upcoming graphics cards. The story began when an AMD technical engineer listed GDDR6 in his portfolio. The site claims it has since verified, through sources familiar with the matter, that AMD is supporting the new graphics memory standard and will use its own controller for it rather than licensing one.
AMD is not abandoning HBM2 memory, though. The company is sticking to its previously released roadmaps, and Navi will still use HBM2 memory, at least on the high-end SKUs. So far AMD has only released the RX Vega 64 and RX Vega 56 graphics cards. It may well release lower-end Vega-based cards with GDDR5 at some point, although for now the Polaris architecture is handling the lower end. GDDR6 support is a good thing for AMD and should enable cheaper mid-range cards. These cards would not be limited by supply shortages of High Bandwidth Memory, which is more expensive but offers much higher bandwidth. Such shortages have seemingly plagued both NVIDIA and AMD at various points in time.
GDDR6 also offers several advantages over GDDR5. It runs at almost twice the speed (16 Gbps versus 9 Gbps) at lower voltage (1.35V versus 1.5V), and it brings more density and underlying technology optimizations than even GDDR5X. G5X memory can hit the same 16 Gbps launch speed as GDDR6. However, the newer memory technology offers up to 32Gb dies* versus 16Gb, along with a two-channel design. That design ends up being a bit more efficient and easier to produce, and GPU manufacturers should find it easier to wire up. GDDR6 will be a nice speed bump for mid-range cards. The very low end may well stick with GDDR5, except for mobile parts, which could benefit from the lower-power GDDR6. GDDR6 should also give AMD somewhat better profit margins on these lower-margin SKUs and let it produce more cards to satisfy demand. HBM2 is nice to have, but right now it is better suited to compute-oriented cards for workstation and data center use than to gaming. GDDR6 can offer better price-to-performance for consumer gaming cards.
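The speed and density claims above can be turned into per-chip numbers with the sketch below. The per-pin rates (9 and 16 Gbps) and die densities (16Gb and 32Gb) come from the text. The 32-bit device interface for both GDDR5 and GDDR6, split into two 16-bit channels on GDDR6, is our assumption based on how these memories are typically organized. The figures are per chip. A card's total bandwidth depends on how many chips sit on its bus, and that is not given here.

```python
CHIP_WIDTH_BITS = 32  # assumed: one GDDR5/GDDR6 device presents 32 data bits


def chip_bandwidth_gbs(gbps_per_pin: float, width_bits: int = CHIP_WIDTH_BITS) -> float:
    """Peak bandwidth of one memory device in GB/s."""
    return width_bits * gbps_per_pin / 8


def die_capacity_gbyte(density_gbit: int) -> float:
    """Convert a die density in gigabits to gigabytes."""
    return density_gbit / 8


gddr5 = chip_bandwidth_gbs(9.0)
# GDDR6 splits the same 32 bits into two independent 16-bit channels.
gddr6 = 2 * chip_bandwidth_gbs(16.0, width_bits=16)

print(f"GDDR5 @ 9 Gbps:  {gddr5:.0f} GB/s per chip")
print(f"GDDR6 @ 16 Gbps: {gddr6:.0f} GB/s per chip ({gddr6 / gddr5:.2f}x)")
print(f"16Gb die: {die_capacity_gbyte(16):.0f} GB, 32Gb die: {die_capacity_gbyte(32):.0f} GB")
```

The two-channel split does not change peak bandwidth. What it buys is two narrower, independently addressed channels, which is the efficiency argument made above.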
As for the question of why AMD would want to design their own GDDR6 memory controller rather than license one, I think that comes down to AMD thinking long-term. It will be more expensive up front to design their own controller, but AMD will be able to more fully integrate it and tune it to work with their graphics cards such that it can be more power efficient. Also, having their own GDDR6 memory controller means they can use it in other areas such as their APUs and SoCs offered through their Semi Custom Business Unit (e.g. the SoCs used in gaming consoles). Being able to offer that controller to other companies in their semi-custom SoCs free of third party licensing fees is a good thing for AMD.
With GDDR6 becoming readily available early next year, there is a good chance AMD will be ready to use the new memory technology as soon as Navi arrives. That will likely not happen until closer to the end of 2018 or early 2019, when AMD launches new lower and mid-range (consumer-level) gaming cards based on Navi and/or Vega.
*At launch it appears that GDDR6 from the big three (Micron, Samsung, and SK Hynix) will use 16Gb dies, but the standard allows for up to 32Gb dies. The G5X standard allows for up to 16Gb dies.
- (Leak) AMD Vega 10 and Vega 20 Information Leaked
- Micron Pushes GDDR5X To 16Gbps, Expects To Launch GDDR6 In Early 2018
- Micron Planning To Launch GDDR6 Graphics Memory In 2017
- Podcast #436 - ECS Mini-STX, NVIDIA Quadro, AMD Zen Arch, Optane, GDDR6 and more!
- AMD Q3 2017 Earnings: A Pleasant Surprise
Subject: Processors | November 6, 2017 - 02:00 PM | Josh Walrath
Tagged: radeon, Polaris, mobile, kaby lake, interposer, Intel, HBM2, gaming, EMIB, apple, amd, 8th generation core
In what is probably considered one of the worst kept secrets in the industry, Intel has announced a new CPU line for the mobile market that integrates AMD’s Radeon graphics. For the past year or so rumors of such a partnership were freely flowing, but now we finally get confirmation as to how this will be implemented and marketed.
Intel’s record in designing GPUs has been rather pedestrian. Intel has kept up with the competition, but a slew of small issues and incompatibilities have plagued each generation. Performance is also a problem when competing with AMD’s APUs and with discrete mobile graphics from both AMD and NVIDIA. Software and driver support is another area where Intel has been unable to compete. That is largely due to economics and to the competition’s decades of experience in this area.
Many of these significant issues have been solved in one fell swoop. Intel has partnered with AMD’s Semi-Custom Group to develop a modern and competent GPU that can be closely connected to the Intel CPU while using HBM2 memory to improve overall performance. The packaging of this product uses Intel’s EMIB (Embedded Multi-die Interconnect Bridge) tech.
EMIB is an interposer-like technology that integrates silicon bridges into the PCB instead of relying upon a large interposer. This allows a bit more flexibility in layout of the chips as well as lowers the Z height of the package as there is not a large interposer sitting between the chips and the PCB. Just as interposer technology allows the use of chips from different process technologies to work seamlessly together, EMIB provides that same flexibility.
The GPU looks to be based on the Polaris architecture, which is a slight step back from AMD’s cutting-edge Vega architecture. Polaris does not implement the Infinity Fabric component that Vega does and is more conventional in terms of data communication. It is still a step beyond what AMD has provided for Sony and Microsoft, who each use a semi-custom design for their latest console chips. AMD is able to integrate the HBM2 controller featured in Vega. HBM2 provides a tremendous amount of bandwidth along with power savings compared to traditional GDDR5 memory modules. It also saves dramatically on PCB space, allowing for smaller form factors.
EMIB provides nearly all of the advantages of the interposer while keeping the optimal z-height of the standard PCB substrate.
Intel did have to do quite a bit of extra work on the power side of the equation. AMD uses its latest Infinity Fabric for fine-grained power control in its upcoming Raven Ridge-based Ryzen APUs. Intel had to modify its current hardware to do much the same work with third-party silicon. This is no easy task, because the CPU needs to monitor and continually adjust for GPU usage in a variety of scenarios. This type of work takes time and a lot of testing to fine-tune, as well as the inevitable hardware revisions to get things working correctly. It then needs to be balanced against the GPU driver stack, which also tends to take control of power usage in mobile scenarios.
This combination of EMIB, an Intel Kaby Lake CPU, HBM2, and a current AMD GPU is very interesting for the mobile and small form factor markets. The EMIB form factor provides very fast interconnect speeds, and integrating HBM2 memory gives it a smaller footprint. The mature AMD Radeon software stack for both Windows and macOS gives Intel another feature with which to sell its parts in areas where it previously was not considered. The 8th Gen Kaby Lake CPU provides the very latest CPU design on the new 14nm++ process for greater performance and better power efficiency.
This is one of those rare instances where cooperation between intense rivals actually improves the situation for both. AMD gets a financial shot in the arm by signing a large and important customer for its Semi-Custom division. Console products are seasonal, so the royalty income from this partnership should be more consistent than income from the console manufacturers. This will have a very material effect on AMD’s bottom line for years to come. Intel gets a solid silicon solution with higher performance than it can offer on its own, as well as the aforementioned mature software stack for multiple operating systems. Finally, throw in HBM2 memory support for better power efficiency and a smaller form factor, and it is a clear win for all parties involved.
The PCB savings plus faster interconnects will allow these chips to power smaller form factors with better performance and battery life.
One of the unknowns here is what process node the GPU portion will be manufactured on. We do not know which foundry Intel will use, or whether it will stay in-house. Currently TSMC manufactures the latest console SoCs, while GLOBALFOUNDRIES handles the latest GPUs from AMD. At first one would expect Intel to build the GPU in-house. However, the current rumor is that AMD will produce the chips with one of its traditional foundry partners. Once the chip is manufactured, it is sent to Intel to be integrated into the final product.
Apple is one of the obvious candidates for this particular form factor and combination of parts. Apple has a long history with Intel on the CPU side and AMD on the GPU side. This product provides all of the solutions Apple needs to manufacture high performance products in smaller form factors. Gaming laptops also get a boost from such a combination that will offer relatively high performance with minimal power increases as well as the smaller form factor.
The potential (leaked) performance of the 8th Gen Intel CPU with Radeon Graphics.
The data above could very well be wrong about the potential performance of this combination, but what we see is pretty compelling. The Intel/AMD product performs like a higher-end CPU and discrete GPU combo. It is faster than an NVIDIA GTX 1050 Ti and trails the GTX 1060. It is also significantly faster than a desktop AMD RX 560 part. We can also see that it will be much faster than the flagship 15 watt TDP AMD Ryzen 7 2700U. We do not yet know how it compares to the rumored 65 watt TDP Raven Ridge-based APUs that AMD will likely release next year. What will be fascinating is how much power the new Intel combination draws compared to discrete solutions using NVIDIA graphics.
To reiterate, this is Intel as a customer for AMD’s Semi-Custom group rather than a licensing agreement between the two companies. They are working hand in hand in developing this solution and then both profiting from it. AMD getting royalties from every Intel package sold that features this technology will have a very positive effect on earnings. Intel gets a cutting edge and competent graphics solution along with the improved software and driver support such a package includes.
Update: We have been informed that AMD is producing the chips and selling them directly to Intel for integration into these new SKUs. There are no royalties or licensing, but the Semi-Custom division should still receive the revenue for these specialized products made only for Intel.
Subject: General Tech | March 28, 2017 - 01:04 PM | Jeremy Hellstrom
Tagged: amd, Vega, rumour, HBM2
The Inquirer have posted a tiny bit of information about AMD's upcoming Vega. Rumours about the new GPU are hard to find, so it is the best we have at the moment. AMD claims that the second-generation HBM on the 4GB and 8GB models could offer memory bandwidth equivalent to a GTX 1080 Ti. That makes perfect sense: the GTX 1080 Ti offers 484 GB/s of memory bandwidth, while the first-generation HBM on AMD's R9 series already offers 512 GB/s. The real trick is filling that pipeline so AMD's HBM2-based cards get a chance to shine, and that depends on software developers as much as on the hardware. The Inquirer also discusses Vega's possible efficiency advantages, which could result in smaller cards as well as an effective mobile product. Pop over to take a look at the current rumours. Here is hoping we can provide more detailed information in the near future.
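Both bandwidth numbers can be reproduced from bus width and per-pin data rate. In this Python sketch, the bus widths and rates come from the cards' public specifications, not from The Inquirer's piece. We assume a 352-bit GDDR5X bus at 11 Gbps for the GTX 1080 Ti and four first-generation HBM stacks (4096 bits) at 1 Gbps for the R9 Fury X. The last line plugs in two HBM2 stacks at 1.89 Gbps, the memory speed the RX Vega 64 eventually shipped with. It shows that two stacks of second-generation HBM land right around the 1080 Ti.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return bus_width_bits * gbps_per_pin / 8


gtx_1080_ti = peak_bandwidth_gbs(352, 11.0)  # assumed: 352-bit GDDR5X at 11 Gbps
r9_fury_x = peak_bandwidth_gbs(4096, 1.0)    # assumed: four HBM1 stacks at 1 Gbps
two_hbm2 = peak_bandwidth_gbs(2048, 1.89)    # two HBM2 stacks at 1.89 Gbps

print(f"GTX 1080 Ti:          {gtx_1080_ti:.0f} GB/s")
print(f"R9 Fury X (HBM1):     {r9_fury_x:.0f} GB/s")
print(f"Two HBM2 @ 1.89 Gbps: {two_hbm2:.1f} GB/s")
```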
"AMD HAS TEASED more information about its forthcoming Vega-based graphics cards, revealing that they will come with either 4GB or 8GB memory and hinting that a launch is imminent."
Here is some more Tech News from around the web:
- iPhone-havers think they're safe. But they're not @ The Register
- FYI Docs.com users: You may have leaked passwords, personal info – thousands have @ The Register
- LastPass scrambles to fix another major flaw – once again spotted by Google's bugfinders @ The Register
- Johnny Depp signs on to play John McAfee in a film of his life @ The Inquirer
- Samsung 4K Blu-ray Player @ Hardware Secrets
- Futuremark Ends Support for 3DMark Vantage and PCMark Vantage @ [H]ard|OCP
- Konica Minolta Unveils the Future of Work, Or At Least Its Version @ Kitguru
- Win a PC hardware bundle with Gigabyte AORUS, HyperX and KitGuru
NVIDIA P100 comes to Quadro
At the start of the SOLIDWORKS World conference this week, NVIDIA took the cover off of a handful of new Quadro cards targeting professional graphics workloads. Though the bulk of NVIDIA’s discussion covered lower cost options like the Quadro P4000, P2000, and below, the most interesting product sits at the high end, the Quadro GP100.
As you might guess from the name alone, the Quadro GP100 is based on the GP100 GPU, the same silicon used in the Tesla P100 announced back in April 2016. At the time, the GP100 GPU was specifically billed as an HPC accelerator for servers. It had a unique form factor with a passive cooler that required additional chassis fans. Just a couple of months later, a PCIe version of the GP100 was released under the Tesla P100 brand with the same specifications.
Today that GPU hardware gets a third iteration as the Quadro GP100. Let’s take a look at the Quadro GP100 specifications and how it compares to some recent Quadro offerings.
| | Quadro GP100 | Quadro P6000 | Quadro M6000 | Full GP100 |
|---|---|---|---|---|
| FP32 CUDA Cores / SM | 64 | 128 | 128 | 64 |
| FP32 CUDA Cores / GPU | 3584 | 3840 | 3072 | 3840 |
| FP64 CUDA Cores / SM | 32 | 4 | 4 | 32 |
| FP64 CUDA Cores / GPU | 1792 | 120 | 96 | 1920 |
| Base Clock | 1303 MHz | 1417 MHz | 1026 MHz | TBD |
| GPU Boost Clock | 1442 MHz | 1530 MHz | 1152 MHz | TBD |
| FP32 TFLOPS (SP) | 10.3 | 12.0 | 7.0 | TBD |
| FP64 TFLOPS (DP) | 5.15 | 0.375 | 0.221 | TBD |
| Memory Interface | 4096-bit HBM2 at 1.4 Gbps | 384-bit GDDR5X at 9 Gbps | 384-bit GDDR5 at 6.6 Gbps | 4096-bit HBM2 |
| Memory Bandwidth | 716 GB/s | 432 GB/s | 316.8 GB/s | ? |
| Memory Size | 16 GB | 24 GB | 12 GB | 16 GB |
| TDP | 235 W | 250 W | 250 W | TBD |
| Transistors | 15.3 billion | 12 billion | 8 billion | 15.3 billion |
| GPU Die Size | 610 mm² | 471 mm² | 601 mm² | 610 mm² |
There are some interesting stats here that may not be obvious at first glance. Most interesting is that despite the pricing and segmentation, the GP100 is not the de facto fastest Quadro card from NVIDIA depending on your workload. With 3584 CUDA cores running at somewhere around 1400 MHz at Boost speeds, the single precision (32-bit) rating for GP100 is 10.3 TFLOPS, less than the recently released P6000 card. Based on GP102, the P6000 has 3840 CUDA cores running at something around 1500 MHz for a total of 12 TFLOPS.
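The TFLOPS figures in the table follow a standard rule of thumb: each CUDA core can retire one fused multiply-add, or two floating-point operations, per clock. Here is a minimal Python sketch that applies that rule to the core counts and boost clocks in the table. It gives peak theoretical numbers only. Sustained throughput depends on the workload.

```python
def peak_tflops(cores: int, boost_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak TFLOPS: cores x FLOPs per clock x clock (MHz), converted to tera."""
    return cores * flops_per_clock * boost_mhz / 1e6


cards = {
    # name: (FP32 CUDA cores, boost clock in MHz), from the table
    "Quadro GP100": (3584, 1442),
    "Quadro P6000": (3840, 1530),
    "Quadro M6000": (3072, 1152),
}

for name, (cores, boost) in cards.items():
    print(f"{name}: {peak_tflops(cores, boost):.2f} TFLOPS FP32")
```

With these inputs the sketch gives roughly 10.34, 11.75 and 7.08 TFLOPS. The P6000 result comes out a little under the 12.0 in the table. NVIDIA's rating presumably uses a somewhat different reference clock, but we have not confirmed which.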
GP100 (full) Block Diagram
Clearly the Quadro GP100 is positioned around its 64-bit, double-precision performance and its ability to run real-time simulations on more complex workloads than other Pascal-based Quadro cards can handle. The Quadro GP100 offers a 1/2 DP compute rate, totaling 5.2 TFLOPS. The P6000, on the other hand, is only capable of 0.375 TFLOPS with the standard, consumer-level 1/32 DP rate. ECC memory support on GP100 is also something no other recent Quadro card has.
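Double-precision throughput is simply the single-precision figure scaled by the DP rate. Here is a short Python illustration using the SP numbers and rates quoted above:

```python
FP32_TFLOPS = {"Quadro GP100": 10.3, "Quadro P6000": 12.0}
DP_RATE = {"Quadro GP100": 1 / 2, "Quadro P6000": 1 / 32}

# FP64 peak = FP32 peak x DP rate
fp64 = {name: FP32_TFLOPS[name] * DP_RATE[name] for name in FP32_TFLOPS}
for name, tflops in fp64.items():
    print(f"{name}: {tflops:.3f} TFLOPS FP64")

advantage = fp64["Quadro GP100"] / fp64["Quadro P6000"]
print(f"GP100 double-precision advantage: {advantage:.1f}x")
```

The roughly 13.7x gap is why GP100 makes sense for simulation work even though the P6000 is faster in single precision.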
Raw graphics performance and throughput will remain an open question until someone does some testing. Still, the Quadro P6000 will likely stay the best solution for that, by at least a slim margin. With a higher CUDA core count, higher clock speeds and an equivalent architecture, the P6000 should run games, graphics rendering and design applications very well.
The GP100 offers other important differences. Its memory system is built around a 16GB HBM2 implementation. That means more total memory bandwidth but lower capacity than the 24GB Quadro P6000. The 66% bandwidth advantage gives the GP100 an edge in applications bound by pixel throughput, as long as its compute capability keeps up on the back end.
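The 66% figure can be checked from bus width and per-pin rate. This Python sketch assumes a 4096-bit HBM2 interface at 1.4 Gbps for the GP100 and a 384-bit GDDR5X interface at 9 Gbps for the P6000. The P6000 numbers are our assumption, taken from its public specifications. The sketch also divides bandwidth by capacity, which makes the trade-off described above concrete: GP100 has far more bandwidth per gigabyte but less total memory.

```python
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return bus_width_bits * gbps_per_pin / 8


cards = {
    # name: (bus width in bits, Gbps per pin, capacity in GB)
    "Quadro GP100": (4096, 1.4, 16),
    "Quadro P6000": (384, 9.0, 24),  # assumed: 384-bit GDDR5X at 9 Gbps
}

bandwidth = {}
for name, (width, rate, capacity) in cards.items():
    bandwidth[name] = peak_bandwidth_gbs(width, rate)
    per_gb = bandwidth[name] / capacity
    print(f"{name}: {bandwidth[name]:.1f} GB/s, {per_gb:.1f} GB/s per GB of memory")

extra = bandwidth["Quadro GP100"] / bandwidth["Quadro P6000"] - 1
print(f"GP100 offers {extra:.0%} more bandwidth")
```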
93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, to which we added what we know about a full GP100:
| | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |
This table is designed for developers interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, in roughly the same die area, 610 mm², up from 601 mm².
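"About twice the transistors in roughly the same area" is another way of saying that density nearly doubled. Here is a quick Python calculation from the table's transistor counts and die sizes. The results are whole-die averages, so they blur differences between logic, cache and I/O regions.

```python
# (transistor count, die area in mm^2), as listed in the table
chips = {
    "GM200 (28 nm)": (8.0e9, 601),
    "GP100 (16 nm)": (15.3e9, 610),
}

density = {}
for name, (transistors, area_mm2) in chips.items():
    density[name] = transistors / area_mm2 / 1e6  # million transistors per mm^2
    print(f"{name}: {density[name]:.1f} M transistors/mm^2")

ratio = density["GP100 (16 nm)"] / density["GM200 (28 nm)"]
print(f"Density gain: {ratio:.2f}x")
```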
A full GP100 processor will have 60 streaming multiprocessors (SMs), compared to GM200's 24, although each Pascal SM holds half as many shaders (64 versus 128). The GP100 part listed in the table above is actually partially disabled, with four of the sixty SMs switched off. This leaves 3584 single-precision (32-bit) CUDA cores, up from 3072 in GM200. The full GP100 architecture will have 3840 of these FP32 CUDA cores, but we don't know when or where we'll see that. The base clock is also significantly higher than Maxwell's: 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti. However, Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TFLOPS is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
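The core counts, the "93%" in the headline and the FLOPS comparison in this paragraph can all be reproduced from SM counts, cores per SM and clocks. This Python sketch uses the boost clock from the table for the Tesla P100 and Ryan's ~1390 MHz overclock for GM200. As before, it assumes two FLOPs (one FMA) per core per clock.

```python
CORES_PER_SM = {"GM200": 128, "GP100": 64}  # FP32 CUDA cores per SM, from the table


def fp32_cores(sm_count: int, arch: str) -> int:
    """Total FP32 CUDA cores for a chip with the given number of enabled SMs."""
    return sm_count * CORES_PER_SM[arch]


def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_mhz / 1e6


full_gp100 = fp32_cores(60, "GP100")
tesla_p100 = fp32_cores(60 - 4, "GP100")  # four SMs disabled
gm200 = fp32_cores(24, "GM200")

print(f"Tesla P100 enables {tesla_p100 / full_gp100:.0%} of a full GP100")

p100_tflops = peak_tflops(tesla_p100, 1480)  # Tesla P100 boost clock
gm200_oc_tflops = peak_tflops(gm200, 1390)   # overclocked GM200
print(f"Tesla P100: {p100_tflops:.1f} TFLOPS, GM200 @ 1390 MHz: {gm200_oc_tflops:.1f} TFLOPS")
print(f"P100 advantage: {p100_tflops / gm200_oc_tflops - 1:.0%}")
```

With these inputs the gain comes out near 24%, in the same ballpark as the rough "about 20%" above.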