Subject: General Tech | March 28, 2017 - 01:04 PM | Jeremy Hellstrom
Tagged: amd, Vega, rumour, HBM2
The Inquirer has posted a small amount of information about AMD's upcoming Vega, and as rumours about the new GPU are hard to find, it is the best we have at the moment. AMD's claim is that the second generation HBM present on the 4GB and 8GB models could offer memory bandwidth equivalent to a GTX 1080 Ti, which makes perfect sense. The GTX 1080 Ti offers 484 GB/s of memory bandwidth, while the first generation HBM on AMD's R9 series offers 512 GB/s. The real trick is filling that pipeline to give AMD's HBM2 based cards a chance to shine, which depends on software developers as much as it does on the hardware. The Inquirer also discusses the possible efficiency advantages that Vega will have, which could result in smaller cards as well as an effective mobile product. Pop over to take a look at the current rumours; here is hoping we can provide more detailed information in the near future.
"AMD HAS TEASED more information about its forthcoming Vega-based graphics cards, revealing that they will come with either 4GB or 8GB memory and hinting that a launch is imminent."
Here is some more Tech News from around the web:
- iPhone-havers think they're safe. But they're not @ The Register
- FYI Docs.com users: You may have leaked passwords, personal info – thousands have @ The Register
- LastPass scrambles to fix another major flaw – once again spotted by Google's bugfinders @ The Register
- Johnny Depp signs on to play John McAfee in a film of his life @ The Inquirer
- Samsung 4K Blu-ray Player @ Hardware Secrets
- Futuremark Ends Support for 3DMark Vantage and PCMark Vantage @ [H]ard|OCP
- Konica Minolta Unveils the Future of Work, Or At Least Its Version @ Kitguru
- Win a PC hardware bundle with Gigabyte AORUS, HyperX and KitGuru
NVIDIA P100 comes to Quadro
At the start of the SOLIDWORKS World conference this week, NVIDIA took the cover off of a handful of new Quadro cards targeting professional graphics workloads. Though the bulk of NVIDIA’s discussion covered lower cost options like the Quadro P4000, P2000, and below, the most interesting product sits at the high end, the Quadro GP100.
As you might guess from the name alone, the Quadro GP100 is based on the GP100 GPU, the same silicon used on the Tesla P100 announced back in April of 2016. At the time, the GP100 GPU was specifically billed as an HPC accelerator for servers. It had a unique form factor with a passive cooler that required additional chassis fans. Just a couple of months later, a PCIe version of the GP100 was released under the Tesla P100 brand with the same specifications.
Today that GPU hardware gets a third iteration as the Quadro GP100. Let’s take a look at the Quadro GP100 specifications and how it compares to some recent Quadro offerings.
| Specification | Quadro GP100 | Quadro P6000 | Quadro M6000 | Full GP100 |
|---|---|---|---|---|
| FP32 CUDA Cores / SM | 64 | 64 | 64 | 64 |
| FP32 CUDA Cores / GPU | 3584 | 3840 | 3072 | 3840 |
| FP64 CUDA Cores / SM | 32 | 2 | 2 | 32 |
| FP64 CUDA Cores / GPU | 1792 | 120 | 96 | 1920 |
| Base Clock | 1303 MHz | 1417 MHz | 1026 MHz | TBD |
| GPU Boost Clock | 1442 MHz | 1530 MHz | 1152 MHz | TBD |
| FP32 TFLOPS (SP) | 10.3 | 12.0 | 7.0 | TBD |
| FP64 TFLOPS (DP) | 5.15 | 0.375 | 0.221 | TBD |
| Memory Interface | 1.4 Gbps | | | |
| Memory Bandwidth | 716 GB/s | 432 GB/s | 316.8 GB/s | ? |
| Memory Size | 16 GB | 24 GB | 12 GB | 16 GB |
| TDP | 235 W | 250 W | 250 W | TBD |
| Transistors | 15.3 billion | 12 billion | 8 billion | 15.3 billion |
| GPU Die Size | 610 mm2 | 471 mm2 | 601 mm2 | 610 mm2 |
There are some interesting stats here that may not be obvious at first glance. Most interesting is that despite the pricing and segmentation, the GP100 is not the de facto fastest Quadro card from NVIDIA depending on your workload. With 3584 CUDA cores running at somewhere around 1400 MHz at Boost speeds, the single precision (32-bit) rating for GP100 is 10.3 TFLOPS, less than the recently released P6000 card. Based on GP102, the P6000 has 3840 CUDA cores running at something around 1500 MHz for a total of 12 TFLOPS.
GP100 (full) Block Diagram
Clearly the placement for Quadro GP100 is based around its 64-bit, double precision performance, and its ability to offer real-time simulations on more complex workloads than other Pascal-based Quadro cards can offer. The Quadro GP100 offers a 1/2 DP compute rate, totaling 5.2 TFLOPS. The P6000, on the other hand, is only capable of 0.375 TFLOPS with the standard, consumer level 1/32 DP rate. ECC memory support on GP100 is also something no other recent Quadro card has.
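The single and double precision ratings quoted above can be reproduced from the core counts and boost clocks in the table. The following is a minimal sketch, assuming the usual convention that each CUDA core retires one fused multiply-add (counted as 2 FLOPs) per clock; the function and dictionary names are illustrative only.

```python
def peak_tflops(cores: int, boost_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in TFLOPS: cores x clock x FLOPs per core per clock.

    flops_per_clock=2 assumes one fused multiply-add per core per cycle.
    """
    return cores * boost_mhz * 1e6 * flops_per_clock / 1e12


# Core counts and boost clocks taken from the Quadro table above.
quadro = {
    "Quadro GP100": {"fp32_cores": 3584, "fp64_cores": 1792, "boost_mhz": 1442},
    "Quadro P6000": {"fp32_cores": 3840, "fp64_cores": 120, "boost_mhz": 1530},
}

results = {}
for name, spec in quadro.items():
    fp32 = peak_tflops(spec["fp32_cores"], spec["boost_mhz"])
    fp64 = peak_tflops(spec["fp64_cores"], spec["boost_mhz"])
    results[name] = (fp32, fp64)
    print(f"{name}: FP32 {fp32:.2f} TFLOPS, FP64 {fp64:.3f} TFLOPS")
```

With these inputs the GP100 lands at roughly 10.3 TFLOPS FP32 and 5.2 TFLOPS FP64, while the P6000 comes out slightly under the rounded 12.0 and 0.375 figures NVIDIA lists, which suggests the official numbers are rounded or based on a marginally different clock.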
Raw graphics performance and throughput are going to be questionable until someone does some testing, but it seems likely that the Quadro P6000 will still be the best solution for that by at least a slim margin. With a higher CUDA core count, higher clock speeds and equivalent architecture, the P6000 should run games, graphics rendering and design applications very well.
There are other important differences offered by the GP100. The memory system is built around a 16GB HBM2 implementation which means more total memory bandwidth but at a lower capacity than the 24GB Quadro P6000. Offering 66% more memory bandwidth does mean that the GP100 offers applications that are pixel throughput bound an advantage, as long as the compute capability keeps up on the backend.
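The 66% figure follows from the bandwidth numbers in the table, which in turn come from bus width multiplied by per-pin data rate. Here is a rough sketch of that arithmetic. The 4096-bit HBM2 interface is taken from the Tesla P100 table further down and the 1.4 Gbps rate from the Quadro table; the 384-bit, 9 Gbps GDDR5X configuration for the P6000 is an assumption not stated in this article.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin rate (Gbps) / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Quadro GP100: 4096-bit HBM2 at 1.4 Gbps per pin.
gp100_bw = bandwidth_gb_s(4096, 1.4)

# Quadro P6000: assumed 384-bit GDDR5X at 9 Gbps per pin (not from the article).
p6000_bw = bandwidth_gb_s(384, 9.0)

# Relative advantage of the GP100 over the P6000.
advantage = gp100_bw / p6000_bw - 1

print(f"GP100: {gp100_bw:.1f} GB/s, P6000: {p6000_bw:.1f} GB/s")
print(f"GP100 advantage: {advantage:.0%}")
```

The wide-but-slow HBM2 bus and the narrow-but-fast GDDR5X bus are two routes to the same quantity, and here the wide bus gives roughly two thirds more bandwidth.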
93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, to which we added what we know about a full GP100:
| Specification | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |
This table is designed for developers that are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm2, up from 601 mm2.
A full GP100 processor will have 60 streaming multiprocessors (SMs), compared to GM200's 24, although each Pascal SM holds half as many shaders (64 versus 128). The GP100 part that is listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPs is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
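The numbers in this paragraph and the "93%" in the title can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, assuming 2 FLOPs per CUDA core per clock (one fused multiply-add) and using the ~1390 MHz overclock mentioned above for GM200; the variable names are illustrative.

```python
CORES_PER_SM = 64          # FP32 CUDA cores per Pascal SM (from the table)
REGFILE_KB_PER_SM = 256    # register file per SM (from the table)
FULL_SMS = 60              # SMs on a full GP100
DISABLED_SMS = 4           # SMs fused off on the Tesla P100

enabled_sms = FULL_SMS - DISABLED_SMS
p100_cores = enabled_sms * CORES_PER_SM
full_cores = FULL_SMS * CORES_PER_SM
enabled_fraction = enabled_sms / FULL_SMS
p100_regfile_kb = enabled_sms * REGFILE_KB_PER_SM


def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming 2 FLOPs (one FMA) per core per clock."""
    return cores * clock_mhz * 1e6 * 2 / 1e12


p100_tflops = peak_tflops(p100_cores, 1480)    # Tesla P100 at boost clock
gm200_oc_tflops = peak_tflops(3072, 1390)      # GM200 at the ~1390 MHz overclock
gain = p100_tflops / gm200_oc_tflops - 1

print(f"Tesla P100: {enabled_sms} SMs, {p100_cores} cores "
      f"({enabled_fraction:.1%} of {full_cores}), {p100_regfile_kb} KB registers")
print(f"P100 {p100_tflops:.2f} TFLOPS vs overclocked GM200 "
      f"{gm200_oc_tflops:.2f} TFLOPS: +{gain:.0%}")
```

Under these assumptions the gap to an overclocked GM200 works out to a little under 25%, in line with the "about 20%" estimate in the text.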
Some Hints as to What Comes Next
On March 14 at the Capsaicin event at GDC AMD disclosed their roadmap for GPU architectures through 2018. There were two new names in attendance as well as some hints at what technology will be implemented in these products. It was only one slide, but some interesting information can be inferred from what we have seen and what was said in the event and afterwards during interviews.
Polaris is the next generation of GCN products from AMD that have been shown off for the past few months. Previously, in December and at CES, we saw the Polaris 11 GPU on display. Very little is known about this product except that it is small and extremely power efficient. Last night we saw the Polaris 10 being run, and we only know that it is competitive with current mainstream performance and is larger than the Polaris 11. These products are purportedly based on Samsung/GLOBALFOUNDRIES 14nm LPP.
The source of near endless speculation online.
In the slide AMD showed it listed Polaris as having 2.5X the performance per watt over the previous 28 nm products in AMD’s lineup. This is impressive, but not terribly surprising. AMD and NVIDIA both skipped the 20 nm planar node because it just did not offer up the type of performance and scaling to make sense economically. Simply put, the expense was not worth the results in terms of die size improvements and more importantly power scaling. 20 nm planar just could not offer the type of performance overall that GPU manufacturers could achieve with 2nd and 3rd generation 28nm processes.
What was missing from the slide is any mention of whether Polaris will integrate either HBM1 or HBM2. Vega, the architecture after Polaris, does in fact list HBM2 as the memory technology it will be packaged with. It promises another tick up in terms of performance per watt, but that is going to come more from aggressive design optimizations and likely improvements on FinFET process technologies. Vega will be a 2017 product.
Beyond that we see Navi. It again boasts an improvement in perf per watt as well as the inclusion of a new memory technology beyond HBM. Current conjecture is that this could be HMC (hybrid memory cube). I am not entirely certain of that particular conjecture as it does not necessarily improve upon the advantages of current generation HBM and upcoming HBM2 implementations. Navi will not show up until 2018 at the earliest. This *could* be a 10 nm part, but considering the struggle that the industry has had getting to 14/16nm FinFET I am not holding my breath.
AMD provided few details about these products other than what we see here. From here on out is conjecture based upon industry trends, analysis of known roadmaps, and the limitations of the process and memory technologies that are already well known.
Subject: Graphics Cards | March 15, 2016 - 02:02 AM | Ryan Shrout
Tagged: vulkan, raja koduri, Polaris, HBM2, hbm, dx12, crossfire, amd
After hosting the AMD Capsaicin event at GDC tonight, the SVP and Chief Architect of the Radeon Technologies Group Raja Koduri sat down with me to talk about the event and offered up some additional details on the Radeon Pro Duo, upcoming Polaris GPUs and more. The video below has the full interview but there are several highlights that stand out as noteworthy.
- Raja claimed that one of the reasons to launch the dual-Fiji card as the Radeon Pro Duo for developers rather than pure Radeon, aimed at gamers, was to “get past CrossFire.” He believes we are at an inflection point with APIs. Where previously you would abstract two GPUs to appear as a single GPU to the game engine, with DX12 and Vulkan the problem is more complex than that, as we have seen in testing with early titles like Ashes of the Singularity.
But with the dual-Fiji product mostly developed and prepared, AMD was able to find a market between the enthusiast and the creator to target, and thus the Radeon Pro branding was born.
Raja further expands on it, telling me that in order to make multi-GPU useful and productive for the next generation of APIs, getting multi-GPU hardware solutions in the hands of developers is crucial. He admitted that CrossFire in the past has had performance scaling concerns and compatibility issues, and that getting multi-GPU correct from the ground floor here is crucial.
- With changes in Moore’s Law and the realities of process technology and processor construction, multi-GPU is going to be more important for the entire product stack, not just the extreme enthusiast crowd. Why? Because realities are dictating that GPU vendors build smaller, more power efficient GPUs, and to scale performance overall, multi-GPU solutions need to be efficient and plentiful. The “economics of the smaller die” are much better for AMD (and we assume NVIDIA) and by 2017-2019, this is the reality and will be how graphics performance will scale.
Getting the software ecosystem going now is going to be crucial to ease into that standard.
- The naming scheme of Polaris (10, 11…) has no equation, it’s just “a sequence of numbers” and we should only expect it to increase going forward. The next Polaris chip will be bigger than 11, that’s the secret he gave us.
There have been concerns that AMD was only going to go for the mainstream gaming market with Polaris but Raja promised me and our readers that we “would be really really pleased.” We expect to see Polaris-based GPUs across the entire performance stack.
- AMD’s primary goal here is to get many millions of gamers VR-ready, though getting the enthusiasts “that last millisecond” is still a goal and it will happen from Radeon.
- No solid date on Polaris parts at all – I tried! (Other than the launches start in June.) Though Raja did promise that after tonight, he will not have his next alcoholic beverage until the launch of Polaris. Serious commitment!
- Curious about the HBM2 inclusion in Vega on the roadmap and what that means for Polaris? Though he didn’t say it outright, it appears that Polaris will be using HBM1, leaving me to wonder about the memory capacity limitations inherent in that. Has AMD found a way to get past the 4GB barrier? We are trying to figure that out for sure.
Why is Polaris going to use HBM1? Raja pointed towards the extreme cost and expense of building the HBM ecosystem prepping the pipeline for the new memory technology as the culprit and AMD obviously wants to recoup some of that cost with another generation of GPU usage.
Speaking with Raja is always interesting and the confidence and knowledge he showcases is still what gives me assurance that the Radeon Technologies Group is headed in the correct direction. This is going to be a very interesting year for graphics, PC gaming and for GPU technologies, as showcased throughout the Capsaicin event, and I think everyone should be looking forward to it.
Subject: Graphics Cards | March 11, 2016 - 05:03 PM | Sebastian Peak
Tagged: rumor, report, pascal, nvidia, HBM2, gtx1080, GTX 1080, gtx, GP104, geforce, gddr5x
We are expecting news of the next NVIDIA graphics card this spring, and as usual whenever an announcement is imminent we have started seeing some rumors about the next GeForce card.
(Image credit: NVIDIA)
Pascal is the name we've all been hearing about, and along with this next-gen core we've been expecting HBM2 (second-gen High Bandwidth Memory). This makes today's rumor all the more interesting, as VideoCardz is reporting (via BenchLife) that a card called either the GTX 1080 or GTX 1800 will be announced, using the GP104 GPU core with 8GB of GDDR5X - and not HBM2.
The report also claims that NVIDIA CEO Jen-Hsun Huang will have an announcement for Pascal in April, which leads us to believe a shipping product based on Pascal is finally in the works. Taking in all of the information from the BenchLife report, VideoCardz has created this list to summarize the rumors (taken directly from the source link):
- Pascal launch in April
- GTX 1080/1800 launch in May 27th
- GTX 1080/1800 has GP104 Pascal GPU
- GTX 1080/1800 has 8GB GDDR5X memory
- GTX 1080/1800 has one 8pin power connector
- GTX 1080/1800 has 1x DVI, 1x HDMI, 2x DisplayPort
- First Pascal board with HBM would be GP100 (Big Pascal)
Rumored GTX 1080 Specs (Credit: VideoCardz)
The alleged single 8-pin power connector with this GTX 1080 would place the power limit at 225W, though it could very well require less power. The GTX 980 is only a 165W part, with the GTX 980 Ti rated at 250W.
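The 225W ceiling comes from adding the slot and connector budgets together. Here is a small sketch of that calculation, assuming the commonly cited PCI Express limits of 75W from the slot, 75W per 6-pin connector and 150W per 8-pin connector (these limits are not stated in the article); the names are hypothetical.

```python
# Commonly cited PCI Express power budgets in watts (assumed, not from the article).
PCIE_SLOT_W = 75
CONNECTOR_W = {"6pin": 75, "8pin": 150}


def board_power_limit(connectors: list[str]) -> int:
    """In-spec power ceiling: slot power plus each auxiliary connector."""
    return PCIE_SLOT_W + sum(CONNECTOR_W[c] for c in connectors)


rumored_limit = board_power_limit(["8pin"])   # the rumored GTX 1080 layout
gtx_980_tdp = 165                             # from the article
gtx_980_ti_tdp = 250                          # from the article

print(f"Single 8-pin ceiling: {rumored_limit} W")
print(f"Headroom over a 165 W GTX 980: {rumored_limit - gtx_980_tdp} W")
print(f"Could a 250 W part fit? {gtx_980_ti_tdp <= rumored_limit}")
```

A single 8-pin layout therefore rules out a 980 Ti class power draw, which is consistent with GP104 being positioned as the successor to the GTX 980 rather than to the 980 Ti.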
As always, only time will tell how accurate these rumors are. VideoCardz points out that "BenchLife stories are usually correct", but they are skeptical of the report based on the name GTX 1080 (even though this would follow the current naming scheme of GeForce cards).
Subject: Memory | February 15, 2016 - 05:59 PM | Jeremy Hellstrom
Tagged: Samsung, HBM2, Data Memory Systems
Samsung is ready to roll out the next generation of High Bandwidth Memory, aka HBM2, for your desktop and not just your next generation of GPU. They have already begun production on 4GB HBM2 DRAM and promise 8GB DIMMs by the end of this year. The modules will provide double the bandwidth of HBM1, up to 256GB/s, which is very impressive compared to the up to 70GB/s that DDR4-3200 theoretically offers.
Not only is this technology going to appear in the next generation of NVIDIA and AMD GPUs, but it could also work its way into main system memory. Of course these DIMMs are not going to work with any desktop or mobile processor currently on the market, but we will hopefully see new processors with compatible memory controllers in the near future. You can also expect this to come with a cost, not just in expensive DIMMs at launch but also a comparable increase in CPU prices, as they will cost more to manufacture initially.
It will be very interesting to see how this affects the overall market; will we see a split similar to what is currently seen in mainstream GPUs, a lower cost DDR version and a standard GDDR version? The new market could see DDRx and HBMx models of CPUs and motherboards and could do the same for the GPU market, with the end of DDR on graphics cards. If so, will it spell the end of DDR5 development? Interesting times to be living in; we should be hearing more from Samsung in the near future.
Subject: Graphics Cards, Memory | January 19, 2016 - 11:01 PM | Scott Michaud
Tagged: Samsung, HBM2, hbm
Samsung has just announced that they have begun mass production of 4GB HBM2 memory modules. When used on GPUs, four packages can provide 16GB of Video RAM with very high performance. They do this with a very wide data bus, which trades off frequency for transferring huge chunks. Samsung's offering is rated at 256 GB/s per package, which is twice what each HBM1 package on the Fury X could do.
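To make the "wide bus, low frequency" trade-off concrete, here is a short sketch of how per-package and per-GPU figures add up. It assumes a 1024-bit interface per HBM package, with per-pin rates of 1 Gbps for HBM1 and 2 Gbps for HBM2; those interface details are assumptions drawn from the published HBM specifications rather than from Samsung's announcement.

```python
INTERFACE_BITS_PER_PACKAGE = 1024  # assumed HBM interface width per package


def package_bandwidth_gb_s(pin_rate_gbps: float) -> float:
    """Bandwidth of one HBM package in GB/s: width (bits) x pin rate (Gbps) / 8."""
    return INTERFACE_BITS_PER_PACKAGE * pin_rate_gbps / 8


hbm1_per_package = package_bandwidth_gb_s(1.0)   # assumed 1 Gbps per pin
hbm2_per_package = package_bandwidth_gb_s(2.0)   # assumed 2 Gbps per pin

# A four-package GPU built from the 4GB HBM2 parts described above.
packages = 4
total_capacity_gb = packages * 4
total_bandwidth_gb_s = packages * hbm2_per_package

print(f"HBM1: {hbm1_per_package:.0f} GB/s per package")
print(f"HBM2: {hbm2_per_package:.0f} GB/s per package")
print(f"4 x HBM2: {total_capacity_gb} GB at {total_bandwidth_gb_s:.0f} GB/s")
```

Four such packages would give 16GB at roughly 1 TB/s, which lines up with the bandwidth that has been rumored for the top Pascal part.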
They also expect to mass produce 8GB HBM2 packages within this calendar year. I'm guessing that this means we'll see 32GB GPUs in the late-2016 or early-2017 time frame unless "within this year" means very, very soon (versus Q3/Q4). They will likely be for workstation or professional cards, but, in NVIDIA's case, those are usually based on architectures that are marketed to high-end gaming enthusiasts through some Titan offering. There are a lot of ways this could go, but a 32GB Titan seems like a bit much; I wouldn't expect that this affects the enthusiast gamer segment. It might mean that professionals looking to upgrade from the Kepler-based Tesla K-series might be waiting a little longer, maybe even until GTC 2017. Alternatively, they might get new cards, just with a 16GB maximum until a refresh next year. There's not enough information to know one way or the other, but it's something to think about when more of it starts rolling in.
Samsung's HBM2 packages are compatible with ECC, although I believe that was also true for at least some HBM1 modules from SK Hynix.
Subject: Graphics Cards | January 11, 2016 - 06:05 PM | Sebastian Peak
Tagged: rumor, report, pascal, nvidia, HBM2, hbm, GP104
A delivery of GPUs and related test equipment from Taiwan to Bangalore has led to speculation about NVIDIA's upcoming GP104 Pascal GPU.
Image via Zauba.com
How much information can be gleaned from an import shipping manifest (linked here)? The data indicates a chip with a 37.5 x 37.5 mm package and 2152 pins, which is being attributed to the GP104 based on knowledge of “earlier, similar deliveries” (or possible inside information). This has prompted members of the 3dcenter.org forums (German language) to speculate on the use of GDDR5 or GDDR5X memory based on the likelihood of HBM being implemented on a die of this size.
Of course, NVIDIA has stated that Pascal will implement 3D memory, and the upcoming GP100 will reportedly be on a 55 x 55 mm package using HBM2. Could this be a new, lower-cost part using the existing GDDR5 standard or the faster GDDR5X instead? VideoCardz and WCCFtech have posted stories based on the 3DCenter report, and to quote directly from the VideoCardz post on the subject:
"3DCenter has a theory that GP104 could actually not use HBM, but GDDR5(X) instead. This would rather be a very strange decision, but could NVIDIA possibly make smaller GPU (than GM204) and still accommodate 4 HBM modules? This theory is not taken from the thin air. The GP100 aka the Big Pascal, would supposedly come in 55x55mm BGA package. That’s 10mm more than GM200, which were probably required for additional HBM modules. Of course those numbers are for the whole package (with interposer), not just the GPU."
All of this is a lot to take from a shipping record that might not even be related to an NVIDIA product, but the report has made the rounds at this point so now we’ll just have to wait for new information.
GPU Enthusiasts Are Throwing a FET
NVIDIA is rumored to launch Pascal in early (~April-ish) 2016, although some are skeptical that it will even appear before the summer. The design was finalized months ago, and unconfirmed shipping information claims that chips are being stockpiled, which is typical when preparing to launch a product. It is expected to compete against AMD's rumored Arctic Islands architecture, which will, according to its also rumored numbers, be very similar to Pascal.
This architecture is a big one for several reasons.
Image Credit: WCCFTech
First, it will jump two full process nodes. Current desktop GPUs are manufactured at 28nm, which was first introduced with the GeForce GTX 680 all the way back in early 2012, but Pascal will be manufactured on TSMC's 16nm FinFET+ technology. Smaller features have several advantages, but a huge one for GPUs is the ability to fit more complex circuitry in the same die area. This means that you can include more copies of elements, such as shader cores, and do more in fixed-function hardware, like video encode and decode.
That said, we got a lot more life out of 28nm than we really should have. Chips like GM200 and Fiji are huge, relatively power-hungry, and complex, which is a terrible idea to produce when yields are low. I asked Josh Walrath, who is our go-to for analysis of fab processes, and he believes that FinFET+ is probably even more complicated today than 28nm was in the 2012 timeframe, which was when it launched for GPUs.
It's two full steps forward from where we started, but we've been tiptoeing since then.
Image Credit: WCCFTech
Second, Pascal will introduce HBM 2.0 to NVIDIA hardware. HBM 1.0 was introduced with AMD's Radeon Fury X, and it helped in numerous ways -- from smaller card size to a triple-digit percentage increase in memory bandwidth. The 980 Ti can talk to its memory at about 300GB/s, while Pascal is rumored to push that to 1TB/s. Capacity won't be sacrificed, either. The top-end card is expected to contain 16GB of global memory, which is twice what any console has. This means less streaming, higher resolution textures, and probably even left-over scratch space for the GPU to generate content in with compute shaders. Also, according to AMD, HBM is an easier architecture to communicate with than GDDR, which should mean a savings in die space that could be used for other things.
Third, the architecture includes native support for three levels of floating point precision. Maxwell, due to how limited 28nm was, saved on complexity by reducing 64-bit IEEE 754 floating-point performance to 1/32nd of the 32-bit rate, because FP64 values are rarely used in video games. This saved transistors, but was a huge, order-of-magnitude step back from the 1/3rd ratio found on the Kepler-based GK110. While it probably won't be back to the 1/2 ratio that was found in Fermi, Pascal should be much better suited for GPU compute.
Image Credit: WCCFTech
Mixed precision could help video games too, though. Remember how I said it supports three levels? The third one is 16-bit, which is half of the format that is commonly used in video games. Sometimes, that is sufficient. If so, Pascal is said to do these calculations at twice the rate of 32-bit. We'll need to see whether enough games (and other applications) are willing to drop down in precision to justify the die space that these dedicated circuits require, but it should double the performance of anything that does.
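The precision ratios discussed above translate directly into throughput. The sketch below applies them to a purely hypothetical 10 TFLOPS FP32 baseline (not a figure for any real card) to show what each FP64 ratio, and the rumored double-rate FP16, would mean in practice.

```python
from fractions import Fraction

# FP64 rates relative to FP32, as quoted in the text above.
FP64_RATIO = {
    "Fermi": Fraction(1, 2),
    "Kepler GK110": Fraction(1, 3),
    "Maxwell": Fraction(1, 32),
}
# Rumored FP16 rate for Pascal relative to FP32.
FP16_RATIO_PASCAL = Fraction(2, 1)

# Hypothetical FP32 baseline, chosen only to make the ratios easy to read.
BASELINE_FP32_TFLOPS = 10.0


def scaled_tflops(fp32_tflops: float, ratio: Fraction) -> float:
    """Throughput at another precision, given its rate relative to FP32."""
    return fp32_tflops * float(ratio)


fp64 = {arch: scaled_tflops(BASELINE_FP32_TFLOPS, r) for arch, r in FP64_RATIO.items()}
fp16_pascal = scaled_tflops(BASELINE_FP32_TFLOPS, FP16_RATIO_PASCAL)

# How far Maxwell stepped back from GK110 in relative FP64 rate.
step_back = FP64_RATIO["Kepler GK110"] / FP64_RATIO["Maxwell"]

for arch, tflops in fp64.items():
    print(f"{arch}: {tflops:.2f} FP64 TFLOPS per 10 FP32 TFLOPS")
print(f"Pascal FP16 (rumored): {fp16_pascal:.1f} TFLOPS per 10 FP32 TFLOPS")
print(f"GK110 to Maxwell FP64 step back: {float(step_back):.1f}x")
```

The roughly 10.7x gap between the 1/3 and 1/32 ratios is the "order-of-magnitude step back" described for Maxwell.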
So basically, this generation should provide the massive jump in performance that enthusiasts have been waiting for. GPU memory bandwidth and the amount of features that can be printed into the die are two major bottlenecks for most modern games and GPU-accelerated software, and Pascal is expected to increase both. We'll need to wait for benchmarks to see how the theoretical maps to practical, but it's a good sign.