Subject: Graphics Cards | June 5, 2018 - 11:58 PM | Tim Verry
Tagged: Vega, machine learning, instinct, HBM2, gpu, computex 2018, computex, amd, 7nm
AMD showed off its first 7nm GPU in the form of the expected AMD Radeon Instinct RX Vega graphics product and RX Vega GPU with 32GB of HBM2 memory. The new GPU uses the Vega architecture along with the open source ecosystem built by AMD to enable both graphics and GPGPU workloads. AMD demonstrated using the 7nm RX Vega GPU for ray tracing in a cool demo that showed realistic reflections and shadows being rendered on a per pixel basis in a model. Granted, we are still a long way away from seeing that kind of detail in real time gaming, but it is still cool to see glimpses of that ray traced future.
According to AMD, the 32GB of HBM2 memory will greatly benefit creators and enterprise clients that need to work with large datasets and quickly make changes and updates to models before doing a final render. The larger memory buffer will also help in HPC applications, where more of a large database can be kept close to the GPU for processing over the wide HBM2 memory bus. Further, HBM2 has physical size and energy efficiency benefits which will pique the interest of datacenters focused on minimizing TCO.
Dr. Lisa Su came on stage towards the end of the 7nm Vega demonstration to show off the GPU in person, and you can see that it is rather tiny for the compute power it provides! It is shorter than the two stacks of HBM2 dies on either side, for example.
Of course AMD did not disclose all the nitty-gritty specifications of the new machine learning graphics card that enthusiasts want to know. We will have to wait a bit longer for that information unfortunately!
As for other 7nm offerings? As Ryan talked about during CES in January, 2018 will primarily be the year for the machine learning-focused Radeon Instinct RX Vega 7nm GPU, with other consumer-focused GPUs using the smaller process node likely coming out in 2019. Whether those 7nm GPUs in 2019 will be a refreshed Vega or the new Navi is still up for debate, however AMD's graphics roadmap certainly doesn't rule out Navi as a possibility. In any case, AMD did state during the livestream that it intends to release a new GPU every year with the GPUs alternating between new architecture and new process node.
What are your thoughts on AMD's graphics roadmap and its first 7nm Vega GPU?
Subject: Memory | January 12, 2018 - 05:46 PM | Tim Verry
Tagged: supercomputing, Samsung, HPC, HBM2, graphics cards, aquabolt
Samsung recently announced that it has begun mass production of its second generation HBM2 memory, which it is calling “Aquabolt”. Samsung has refined the design of its 8GB HBM2 packages, allowing them to achieve an impressive 2.4 Gbps per pin data transfer rate without needing more power than its first generation 1.2V HBM2.
Reportedly Samsung is using new TSV (through-silicon via) design techniques and adding additional thermal bumps between dies to improve clocks and thermal control. Each 8GB HBM2 “Aquabolt” package is comprised of eight 8Gb dies, each of which is vertically interconnected using 5,000 TSVs, a huge number considering how small and tightly packed these dies are. Further, Samsung has added a new protective layer at the bottom of the stack to reinforce the package’s physical strength. While the press release did not go into detail, it does mention that Samsung had to overcome challenges relating to “collateral clock skewing” as a result of the sheer number of TSVs.
On the performance front, Samsung claims that Aquabolt offers up a 50% increase in per package performance versus its first generation “Flarebolt” memory which ran at 1.6Gbps per pin and 1.2V. Interestingly, Aquabolt is also faster than Samsung’s 2.0Gbps per pin HBM2 product (which needed 1.35V) without needing additional power. Samsung also compares Aquabolt to GDDR5 stating that it offers 9.6-times the bandwidth with a single package of HBM2 at 307 GB/s and a GDDR5 chip at 32 GB/s. Thanks to the 2.4 Gbps per pin speed, Aquabolt offers 307 GB/s of bandwidth per package and with four packages products such as graphics cards can take advantage of 1.2 TB/s of bandwidth.
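The per-package figures above follow from simple arithmetic, sketched here in Python. This is an illustrative calculation, not anything from Samsung's release: it assumes the standard 1024-bit data interface per HBM2 stack and, for the GDDR5 comparison, a 32-bit chip running at 8 Gbps per pin (the rate that yields the quoted 32 GB/s). The function and constant names are made up for this example.

```python
# Peak bandwidth of one memory device: data pins * per-pin rate (Gbps) / 8 bits per byte.
HBM2_PINS_PER_STACK = 1024  # assumed: data interface width of one HBM2 stack
GDDR5_PINS_PER_CHIP = 32    # assumed: data interface width of one GDDR5 chip


def bandwidth_gb_s(pins: int, gbps_per_pin: float) -> float:
    """Return peak bandwidth in GB/s for a device with `pins` data pins."""
    return pins * gbps_per_pin / 8


aquabolt = bandwidth_gb_s(HBM2_PINS_PER_STACK, 2.4)   # second generation HBM2
flarebolt = bandwidth_gb_s(HBM2_PINS_PER_STACK, 1.6)  # first generation, 1.2V part
gddr5 = bandwidth_gb_s(GDDR5_PINS_PER_CHIP, 8.0)      # assumed 8 Gbps GDDR5 chip

print(f"Aquabolt per package: {aquabolt:.1f} GB/s")
print(f"Four packages: {4 * aquabolt / 1000:.2f} TB/s")
print(f"Gain over Flarebolt: {(aquabolt / flarebolt - 1) * 100:.0f}%")
print(f"Versus one GDDR5 chip: {aquabolt / gddr5:.1f}x")
```

Under those assumptions the script lands on the 307 GB/s per package, roughly 1.2 TB/s for four packages, 50% generational gain, and 9.6x GDDR5 ratio quoted in the article.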
This second generation HBM2 memory is a decent step up in performance (with HBM hitting 128GB/s and first generation HBM2 hitting 256 GB/s per package and 512 GB/s and 1 TB/s with four packages respectively), but the interesting bit is that it is faster without needing more power. The increased bandwidth and data transfer speeds will be a boon to the HPC and supercomputing market and useful for working with massive databases, simulations, neural networks and AI training, and other “big data” tasks.
Aquabolt looks particularly promising for the mobile market, though, with future products succeeding the current mobile Vega GPU in Kaby Lake-G processors, Ryzen Mobile APUs, and eventually discrete Vega mobile graphics cards getting a nice performance boost (it’s likely too late for AMD to go with this new HBM2 on these specific products, but future refreshes or generations may be able to take advantage of it). I’m sure it will also see usage in the SoCs used in Intel’s and NVIDIA’s driverless car projects.
Subject: General Tech | December 27, 2017 - 11:42 AM | Jeremy Hellstrom
Tagged: nvidia, Intel, HBM2, deep learning
AMD has never been afraid to try new things, from hitting 1GHz first, to creating a true multicore processor, to most recently adopting HBM and HBM2 in their graphics cards. That move contributed to some of their recent difficulties with the current generation of GPUs; HBM is more expensive to produce and more of a challenge to implement. While they were the first to implement HBM, it is NVIDIA and Intel which are benefiting from AMD's experimental nature. Their new generation of HPC solutions, the Tesla P100, Quadro GP100 and Lake Crest, all use HBM2 and benefit from the experience Hynix, Samsung and TSMC gained fabbing the first generation. Vega products offer slightly less memory bandwidth as well as lagging behind in overall performance, a drawback to being first.
On a positive note, AMD have now had more experience designing chips which make use of HBM and this could offer a new hope for the next generation of cards, both gaming and HPC flavours. DigiTimes briefly covers the two processes manufacturers use in the production of HBM here.
"However, Intel's release of its deep-learning chip, Lake Crest, which came following its acquisition of Nervana, has come with HMB2. This indicates that HBM-based architecture will be the main development direction of memory solutions for HPC solutions by GPU vendors."
Here is some more Tech News from around the web:
- FCC Approves First Wireless 'Power-At-A-Distance' Charging System @ Slashdot
- Guidemaster: Everything Amazon’s Alexa can do, plus the best skills to enable @ Ars Technica
- INQ's best and worst of tech in 2017
- Linksys LAPAC2600 AC2600 Dual Band MU-MIMO Access Point Review @ NikKTech
- Acoustic Attacks on HDDs Can Sabotage PCs, CCTV Systems, ATMs, More @ Slashdot
- 10 Tech Products That Are Next to Impossible to Repair @ Techspot
- noblechairs ICON PU Faux Leather Chair @ TechPowerUp
Subject: General Tech, Graphics Cards | December 4, 2017 - 05:47 PM | Tim Verry
Tagged: navi, HBM2, hbm, gddr6, amd
WCCFTech reports that AMD is working on a GDDR6 memory controller for its upcoming graphics cards. Starting with an AMD Technical Engineer listing GDDR6 on his portfolio, the site claims to have verified through sources familiar with the matter that AMD is, in fact, supporting the new graphics memory standard and will be using their own controller to support it (rather than licensing one).
AMD is not abandoning HBM2 memory though. The company is sticking to its previously released roadmaps and Navi will still utilize HBM2 memory, at least on the high-end SKUs. While AMD has so far only released RX Vega 64 and RX Vega 56 graphics cards, the company may well release lower-end Vega-based cards with GDDR5 at some point, although for now the Polaris architecture is handling the lower end. AMD supporting GDDR6 is a good thing and should enable cheaper mid-range cards that are not limited by the supply shortages of the more expensive (albeit much higher bandwidth) High Bandwidth Memory that have seemingly plagued both NVIDIA and AMD at various points in time.
GDDR6 further offers several advantages over GDDR5, with almost twice the speed (16 Gbps versus 9 Gbps) at a lower voltage (1.35V versus 1.5V), along with more density and underlying technology optimizations than even GDDR5X. While G5X memory is capable of hitting the same 16 Gbps launch speeds as GDDR6, the newer memory technology offers up to 32Gb dies* versus 16Gb and a two channel design (which ends up being a bit more efficient and easier for GPU manufacturers to produce and wire up).
GDDR6 will represent a nice speed bump for mid-range cards (the very low end may well stick with GDDR5, save for mobile parts which could benefit from the lower power GDDR6) while letting AMD earn slightly better profit margins on these lower margin SKUs and produce more cards to satisfy demand. HBM2 is nice to have, but right now it is better suited to compute-oriented cards for workstation and data center usage than to gaming, and GDDR6 can offer better price-to-performance for consumer gaming cards.
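To see why the faster per-pin rate can translate into cheaper boards, here is a rough Python sketch that counts how many memory chips a card would need to reach a given bandwidth. It is only an illustration: the 32-bit per-chip interface, the 256 GB/s target, and every name in the code are assumptions made for this example, not figures from AMD or WCCFTech.

```python
import math

# Assumed per-chip data width; GDDR6 splits it into two 16-bit channels.
CHIP_INTERFACE_BITS = 32


def chip_bandwidth_gb_s(gbps_per_pin: float) -> float:
    """Peak bandwidth of a single memory chip in GB/s."""
    return CHIP_INTERFACE_BITS * gbps_per_pin / 8


def chips_needed(target_gb_s: float, gbps_per_pin: float) -> int:
    """Smallest whole number of chips whose combined bandwidth meets the target."""
    return math.ceil(target_gb_s / chip_bandwidth_gb_s(gbps_per_pin))


TARGET_GB_S = 256  # hypothetical mid-range bandwidth target

gddr5_chips = chips_needed(TARGET_GB_S, 9.0)   # GDDR5 at 9 Gbps per pin
gddr6_chips = chips_needed(TARGET_GB_S, 16.0)  # GDDR6 at 16 Gbps per pin

print(f"GDDR5 chips needed: {gddr5_chips}")
print(f"GDDR6 chips needed: {gddr6_chips}")
```

With these assumed numbers the GDDR6 card needs half as many chips (and a bus half as wide) as the GDDR5 card, which is the kind of saving in parts and board routing the paragraph above alludes to.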
As for the question of why AMD would want to design their own GDDR6 memory controller rather than license one, I think that comes down to AMD thinking long-term. It will be more expensive up front to design their own controller, but AMD will be able to more fully integrate it and tune it to work with their graphics cards such that it can be more power efficient. Also, having their own GDDR6 memory controller means they can use it in other areas such as their APUs and SoCs offered through their Semi Custom Business Unit (e.g. the SoCs used in gaming consoles). Being able to offer that controller to other companies in their semi-custom SoCs free of third party licensing fees is a good thing for AMD.
With GDDR6 becoming readily available early next year, there is a good chance AMD will be ready to use the new memory technology as soon as Navi but likely not until closer to the end of 2018 or early 2019 when AMD launches new lower and mid-range gaming cards (consumer-level) based on Navi and/or Vega.
*At launch it appears that GDDR6 from the big three (Micron, Samsung, and SK Hynix) will use 16Gb dies, but the standard allows for up to 32Gb dies. The G5X standard allows for up to 16Gb dies.
- (Leak) AMD Vega 10 and Vega 20 Information Leaked
- Micron Pushes GDDR5X To 16Gbps, Expects To Launch GDDR6 In Early 2018
- Micron Planning To Launch GDDR6 Graphics Memory In 2017
- Podcast #436 - ECS Mini-STX, NVIDIA Quadro, AMD Zen Arch, Optane, GDDR6 and more!
- AMD Q3 2017 Earnings: A Pleasant Surprise
Subject: Processors | November 6, 2017 - 02:00 PM | Josh Walrath
Tagged: radeon, Polaris, mobile, kaby lake, interposer, Intel, HBM2, gaming, EMIB, apple, amd, 8th generation core
In what is probably considered one of the worst kept secrets in the industry, Intel has announced a new CPU line for the mobile market that integrates AMD’s Radeon graphics. For the past year or so rumors of such a partnership were freely flowing, but now we finally get confirmation as to how this will be implemented and marketed.
Intel’s record on designing GPUs has been rather pedestrian. While they have kept up with the competition, a slew of small issues and incompatibilities have plagued each generation. Performance is also an issue when trying to compete with AMD’s APUs as well as discrete mobile graphics offerings from both AMD and NVIDIA. Software and driver support is another area where Intel has been unable to compete, due largely to economics and the competition’s decades of experience in this area.
There are many significant issues that have been solved in one fell swoop. Intel has partnered with AMD’s Semi-Custom Group to develop a modern and competent GPU that can be closely connected to the Intel CPU all the while utilizing HBM2 memory to improve overall performance. The packaging of this product utilizes Intel’s EMIB (Embedded Multi-die Interconnect Bridge) tech.
EMIB is an interposer-like technology that integrates silicon bridges into the PCB instead of relying upon a large interposer. This allows a bit more flexibility in layout of the chips as well as lowers the Z height of the package as there is not a large interposer sitting between the chips and the PCB. Just as interposer technology allows the use of chips from different process technologies to work seamlessly together, EMIB provides that same flexibility.
The GPU looks to be based on the Polaris architecture, which is a slight step back from AMD’s cutting edge Vega architecture. Polaris does not implement the Infinity Fabric component that Vega does. It is more conventional in terms of data communication. It is a step beyond what AMD has provided for Sony and Microsoft, who each utilize a semi-custom design for the latest console chips. AMD is able to integrate the HBM2 controller that is featured in Vega. Using HBM2 provides a tremendous amount of bandwidth along with power savings as compared to traditional GDDR5 memory modules. It also saves dramatically on PCB space, allowing for smaller form factors.
EMIB provides nearly all of the advantages of the interposer while keeping the optimal z-height of the standard PCB substrate.
Intel did have to do quite a bit of extra work on the power side of the equation. AMD utilizes their latest Infinity Fabric for fine grained power control in their upcoming Raven Ridge based Ryzen APUs. Intel had to modify their current hardware to be able to do much the same work with 3rd party silicon. This is no easy task as the CPU needs to monitor and continually adjust for GPU usage in a variety of scenarios. This type of work takes time and a lot of testing to fine tune, as well as the inevitable hardware revisions to get things to work correctly. This then needs to be balanced by the GPU driver stack, which also tends to take control of power usage in mobile scenarios.
The combination of EMIB, an Intel Kaby Lake CPU, HBM2, and a current AMD GPU makes this a very interesting product for the mobile and small form factor markets. The EMIB packaging provides very fast interconnect speeds, and the integration of HBM2 memory allows a smaller footprint. The mature AMD Radeon software stack for both Windows and macOS environments gives Intel another feature with which to sell their parts in areas where previously they were not considered. The 8th Gen Kaby Lake CPU provides the very latest CPU design on the new 14nm++ process for greater performance and better power efficiency.
This is one of those rare instances where such cooperation between intense rivals actually improves the situation for both. AMD gets a financial shot in the arm by signing a large and important customer for their Semi-Custom division. The royalty income from this partnership should be more consistent as compared to the console manufacturers due to the seasonality of the console product. This will have a very material effect on AMD’s bottom line for years to come. Intel gets a solid silicon solution with higher performance than they can offer, as well as the aforementioned mature software stack for multiple operating systems. Finally, throw in the HBM2 memory support for better power efficiency and a smaller form factor, and it is a clear win for all parties involved.
The PCB savings plus faster interconnects will allow these chips to power smaller form factors with better performance and battery life.
One of the unknowns here is what process node the GPU portion will be manufactured on. We do not know which foundry Intel will use, or if they will stay in-house. Currently TSMC manufactures the latest console SoCs while GLOBALFOUNDRIES handles the latest GPUs from AMD. Initially one would expect Intel to build the GPU in house, but the current rumor is that AMD will work to produce the chips with one of their traditional foundry partners. Once the chip is manufactured, it is sent to Intel to be integrated into their product.
Apple is one of the obvious candidates for this particular form factor and combination of parts. Apple has a long history with Intel on the CPU side and AMD on the GPU side. This product provides all of the solutions Apple needs to manufacture high performance products in smaller form factors. Gaming laptops also get a boost from such a combination that will offer relatively high performance with minimal power increases as well as the smaller form factor.
The potential (leaked) performance of the 8th Gen Intel CPU with Radeon Graphics.
The data above could very well be wrong about the potential performance of this combination. What we see is pretty compelling though. The Intel/AMD product performs like a higher end CPU with discrete GPU combo. It is faster than an NVIDIA GTX 1050 Ti and trails the GTX 1060. It also is significantly faster than a desktop AMD RX 560 part. We can also see that it is going to be much faster than the flagship 15 watt TDP AMD Ryzen 7 2700U. We do not yet know how it compares to the rumored 65 watt TDP Raven Ridge based APUs from AMD that will likely be released next year. What will be fascinating here is how much power the new Intel combination will draw as compared to the discrete solutions utilizing NVIDIA graphics.
To reiterate, this is Intel as a customer for AMD’s Semi-Custom group rather than a licensing agreement between the two companies. They are working hand in hand in developing this solution and then both profiting from it. AMD getting royalties from every Intel package sold that features this technology will have a very positive effect on earnings. Intel gets a cutting edge and competent graphics solution along with the improved software and driver support such a package includes.
Update: We have been informed that AMD is producing the chips and selling them directly to Intel for integration into these new SKUs. There are no royalties or licensing, but the Semi-Custom division should still receive the revenue for these specialized products made only for Intel.
Subject: General Tech | March 28, 2017 - 01:04 PM | Jeremy Hellstrom
Tagged: amd, Vega, rumour, HBM2
The Inquirer have posted a tiny bit of information about AMD's upcoming Vega, and as rumours about the new GPU are hard to find, it is the best we have at the moment. AMD's claim is that the second generation HBM present on the 4GB and 8GB models could offer memory bandwidth equivalent to a GTX 1080 Ti, which makes perfect sense. The GTX 1080 Ti offers 484 GB/s of memory bandwidth while the first generation HBM on AMD's R9 series offers 512 GB/s. The real trick is filling that pipeline to give AMD's HBM2 based cards a chance to shine, which depends on software developers as much as it does on the hardware. As well, The Inquirer discusses the possible efficiency advantages that Vega will have, which could result in smaller cards as well as an effective mobile product. Pop over to take a look at the current rumours; here is hoping we can provide more detailed information in the near future.
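As a sanity check on those bandwidth figures, the short Python sketch below reproduces them from bus width and per-pin rate, then works backwards to the per-pin speed a two-stack HBM2 card would need to match the GTX 1080 Ti. The bus widths and rates are the commonly published specifications for those cards; the two-stack Vega configuration is purely a hypothetical for illustration, since AMD had not confirmed any layout at the time of this post.

```python
def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s from total bus width and per-pin data rate."""
    return bus_width_bits * gbps_per_pin / 8


def required_gbps_per_pin(target_gb_s: float, stacks: int, bits_per_stack: int = 1024) -> float:
    """Per-pin rate each HBM stack must run at to reach the target bandwidth."""
    return target_gb_s * 8 / (stacks * bits_per_stack)


# GTX 1080 Ti: 352-bit GDDR5X at 11 Gbps per pin.
gtx_1080_ti = bandwidth_gb_s(352, 11.0)

# R9 series with first generation HBM: four 1024-bit stacks at 1 Gbps per pin.
r9_hbm1 = bandwidth_gb_s(4 * 1024, 1.0)

# Hypothetical Vega layout: two HBM2 stacks (an assumption, not a confirmed spec).
hbm2_rate = required_gbps_per_pin(gtx_1080_ti, stacks=2)

print(f"GTX 1080 Ti: {gtx_1080_ti:.0f} GB/s")
print(f"R9 with HBM1: {r9_hbm1:.0f} GB/s")
print(f"Two HBM2 stacks need {hbm2_rate:.2f} Gbps per pin to match the 1080 Ti")
```

The required rate comes out a little under 2 Gbps per pin, which is within what HBM2 is specified to deliver, so the claim of 1080 Ti class bandwidth is plausible on paper.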
"AMD HAS TEASED more information about its forthcoming Vega-based graphics cards, revealing that they will come with either 4GB or 8GB memory and hinting that a launch is imminent."
Here is some more Tech News from around the web:
- iPhone-havers think they're safe. But they're not @ The Register
- FYI Docs.com users: You may have leaked passwords, personal info – thousands have @ The Register
- LastPass scrambles to fix another major flaw – once again spotted by Google's bugfinders @ The Register
- Johnny Depp signs on to play John McAfee in a film of his life @ The Inquirer
- Samsung 4K Blu-ray Player @ Hardware Secrets
- Futuremark Ends Support for 3DMark Vantage and PCMark Vantage @ [H]ard|OCP
- Konica Minolta Unveils the Future of Work, Or At Least Its Version @ Kitguru
- Win a PC hardware bundle with Gigabyte AORUS, HyperX and KitGuru
NVIDIA P100 comes to Quadro
At the start of the SOLIDWORKS World conference this week, NVIDIA took the cover off of a handful of new Quadro cards targeting professional graphics workloads. Though the bulk of NVIDIA’s discussion covered lower cost options like the Quadro P4000, P2000, and below, the most interesting product sits at the high end, the Quadro GP100.
As you might guess from the name alone, the Quadro GP100 is based on the GP100 GPU, the same silicon used on the Tesla P100 announced back in April of 2016. At the time, the GP100 GPU was specifically billed as an HPC accelerator for servers. It had a unique form factor with a passive cooler that required additional chassis fans. Just a couple of months later, a PCIe version of the GP100 was released under the Tesla P100 brand with the same specifications.
Today that GPU hardware gets a third iteration as the Quadro GP100. Let’s take a look at the Quadro GP100 specifications and how it compares to some recent Quadro offerings.
| | Quadro GP100 | Quadro P6000 | Quadro M6000 | Full GP100 |
|---|---|---|---|---|
| FP32 CUDA Cores / SM | 64 | 64 | 64 | 64 |
| FP32 CUDA Cores / GPU | 3584 | 3840 | 3072 | 3840 |
| FP64 CUDA Cores / SM | 32 | 2 | 2 | 32 |
| FP64 CUDA Cores / GPU | 1792 | 120 | 96 | 1920 |
| Base Clock | 1303 MHz | 1417 MHz | 1026 MHz | TBD |
| GPU Boost Clock | 1442 MHz | 1530 MHz | 1152 MHz | TBD |
| FP32 TFLOPS (SP) | 10.3 | 12.0 | 7.0 | TBD |
| FP64 TFLOPS (DP) | 5.15 | 0.375 | 0.221 | TBD |
| Memory Interface | 1.4 Gbps | | | |
| Memory Bandwidth | 716 GB/s | 432 GB/s | 316.8 GB/s | ? |
| Memory Size | 16 GB | 24 GB | 12 GB | 16 GB |
| TDP | 235 W | 250 W | 250 W | TBD |
| Transistors | 15.3 billion | 12 billion | 8 billion | 15.3 billion |
| GPU Die Size | 610 mm2 | 471 mm2 | 601 mm2 | 610 mm2 |
There are some interesting stats here that may not be obvious at first glance. Most interesting is that despite the pricing and segmentation, the GP100 is not the de facto fastest Quadro card from NVIDIA depending on your workload. With 3584 CUDA cores running at somewhere around 1400 MHz at Boost speeds, the single precision (32-bit) rating for GP100 is 10.3 TFLOPS, less than the recently released P6000 card. Based on GP102, the P6000 has 3840 CUDA cores running at something around 1500 MHz for a total of 12 TFLOPS.
GP100 (full) Block Diagram
Clearly the placement for Quadro GP100 is based around its 64-bit, double precision performance, and its ability to offer real-time simulations on more complex workloads than other Pascal-based Quadro cards can offer. The Quadro GP100 offers a 1/2 DP compute rate, totaling 5.2 TFLOPS. The P6000, on the other hand, is only capable of 0.375 TFLOPS with the standard, consumer level 1/32 DP rate. Inclusion of ECC memory support on GP100 is also something no other recent Quadro card has.
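The single and double precision ratings discussed above can be approximated from the core counts and boost clocks in the table. The Python sketch below assumes the usual convention of two floating point operations per core per clock (one fused multiply-add); the function and dictionary names are made up for this example, and NVIDIA's official ratings may be derived from slightly different clocks or rounding.

```python
def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical peak TFLOPS, assuming 2 FLOPs (one FMA) per core per clock."""
    return cores * 2 * clock_mhz / 1e6


# Core counts and boost clocks taken from the comparison table.
cards = {
    "Quadro GP100": {"fp32": 3584, "fp64": 1792, "boost_mhz": 1442},
    "Quadro P6000": {"fp32": 3840, "fp64": 120, "boost_mhz": 1530},
    "Quadro M6000": {"fp32": 3072, "fp64": 96, "boost_mhz": 1152},
}

# Map each card name to its (FP32, FP64) peak throughput in TFLOPS.
results = {
    name: (
        peak_tflops(spec["fp32"], spec["boost_mhz"]),
        peak_tflops(spec["fp64"], spec["boost_mhz"]),
    )
    for name, spec in cards.items()
}

for name, (sp, dp) in results.items():
    print(f"{name}: {sp:.2f} TFLOPS FP32, {dp:.3f} TFLOPS FP64")
```

The computed values track the table closely for the GP100 and M6000 and come in slightly under the quoted 12.0 and 0.375 TFLOPS for the P6000, which suggests the official P6000 figures assume a somewhat higher clock than the listed boost speed.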
Raw graphics performance and throughput is going to be questionable until someone does some testing, but it seems likely that the Quadro P6000 will still be the best solution for that by at least a slim margin. With a higher CUDA core count, higher clock speeds and equivalent architecture, the P6000 should run games, graphics rendering and design applications very well.
There are other important differences offered by the GP100. The memory system is built around a 16GB HBM2 implementation which means more total memory bandwidth but at a lower capacity than the 24GB Quadro P6000. Offering 66% more memory bandwidth does mean that the GP100 offers applications that are pixel throughput bound an advantage, as long as the compute capability keeps up on the backend.
93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, to which we added what we know about a full GP100:
| | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |
This table is designed for developers that are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm2, up from 601 mm2.
A full GP100 processor will have 60 shader modules, compared to GM200's 24, although Pascal has half as many shaders per SM (64 versus 128). The GP100 part that is listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPs is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
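The numbers in that paragraph can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes two floating point operations per CUDA core per clock and uses the Tesla P100 boost clock for the 10.6 TFLOPS figure; the variable names are invented for this example.

```python
CORES_PER_SM = 64  # FP32 CUDA cores per Pascal SM, from the table
FULL_SMS = 60      # SMs on a full GP100
ENABLED_SMS = 56   # SMs left after four are disabled on the Tesla P100

full_cores = FULL_SMS * CORES_PER_SM
p100_cores = ENABLED_SMS * CORES_PER_SM
enabled_fraction = ENABLED_SMS / FULL_SMS


def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 TFLOPS, assuming 2 FLOPs (one FMA) per core per clock."""
    return cores * 2 * clock_mhz / 1e6


p100 = peak_tflops(p100_cores, 1480)  # Tesla P100 at its boost clock
gm200_oc = peak_tflops(3072, 1390)    # GM200 at the ~1390 MHz overclock
advantage = p100 / gm200_oc - 1

print(f"Cores: {p100_cores} of {full_cores} ({enabled_fraction:.0%} enabled)")
print(f"Tesla P100: {p100:.1f} TFLOPS, overclocked GM200: {gm200_oc:.1f} TFLOPS")
print(f"P100 advantage: {advantage:.0%}")
```

The enabled fraction is where the "93%" in the headline comes from, and the computed advantage over an overclocked GM200 lands in the same neighbourhood as the rough 20% quoted above.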
Some Hints as to What Comes Next
On March 14 at the Capsaicin event at GDC AMD disclosed their roadmap for GPU architectures through 2018. There were two new names in attendance as well as some hints at what technology will be implemented in these products. It was only one slide, but some interesting information can be inferred from what we have seen and what was said in the event and afterwards during interviews.
Polaris is the next generation of GCN products from AMD that have been shown off for the past few months. Previously in December and at CES we saw the Polaris 11 GPU on display. Very little is known about this product except that it is small and extremely power efficient. Last night we saw the Polaris 10 being run and we only know that it is competitive with current mainstream performance and is larger than the Polaris 11. These products are purportedly based on Samsung/GLOBALFOUNDRIES 14nm LPP.
The source of near endless speculation online.
In the slide AMD showed it listed Polaris as having 2.5X the performance per watt over the previous 28 nm products in AMD’s lineup. This is impressive, but not terribly surprising. AMD and NVIDIA both skipped the 20 nm planar node because it just did not offer up the type of performance and scaling to make sense economically. Simply put, the expense was not worth the results in terms of die size improvements and more importantly power scaling. 20 nm planar just could not offer the type of performance overall that GPU manufacturers could achieve with 2nd and 3rd generation 28nm processes.
What was missing from the slide is any mention that Polaris will integrate either HBM1 or HBM2. Vega, the architecture after Polaris, does in fact list HBM2 as the memory technology it will be packaged with. It promises another tick up in terms of performance per watt, but that is going to come more from aggressive design optimizations and likely improvements on FinFET process technologies. Vega will be a 2017 product.
Beyond that we see Navi. It again boasts an improvement in perf per watt as well as the inclusion of a new memory technology beyond HBM. Current conjecture is that this could be HMC (hybrid memory cube). I am not entirely certain of that particular conjecture as it does not necessarily improve upon the advantages of current generation HBM and upcoming HBM2 implementations. Navi will not show up until 2018 at the earliest. This *could* be a 10 nm part, but considering the struggle that the industry has had getting to 14/16nm FinFET I am not holding my breath.
AMD provided few details about these products other than what we see here. From here on out is conjecture based upon industry trends, analysis of known roadmaps, and the limitations of the process and memory technologies that are already well known.
Subject: Graphics Cards | March 15, 2016 - 02:02 AM | Ryan Shrout
Tagged: vulkan, raja koduri, Polaris, HBM2, hbm, dx12, crossfire, amd
After hosting the AMD Capsaicin event at GDC tonight, the SVP and Chief Architect of the Radeon Technologies Group Raja Koduri sat down with me to talk about the event and offered up some additional details on the Radeon Pro Duo, upcoming Polaris GPUs and more. The video below has the full interview but there are several highlights that stand out as noteworthy.
- Raja claimed that one of the reasons to launch the dual-Fiji card as the Radeon Pro Duo for developers rather than pure Radeon, aimed at gamers, was to “get past CrossFire.” He believes we are at an inflection point with APIs. Where previously you would abstract two GPUs to appear as a single GPU to the game engine, with DX12 and Vulkan the problem is more complex than that, as we have seen in testing with early titles like Ashes of the Singularity.
But with the dual-Fiji product mostly developed and prepared, AMD was able to find a market between the enthusiast and the creator to target, and thus the Radeon Pro branding was born.
Raja further expands on it, telling me that in order to make multi-GPU useful and productive for the next generation of APIs, getting multi-GPU hardware solutions in the hands of developers is crucial. He admitted that CrossFire in the past has had performance scaling concerns and compatibility issues, and that getting multi-GPU correct from the ground floor here is crucial.
- With changes in Moore’s Law and the realities of process technology and processor construction, multi-GPU is going to be more important for the entire product stack, not just the extreme enthusiast crowd. Why? Because realities are dictating that GPU vendors build smaller, more power efficient GPUs, and to scale performance overall, multi-GPU solutions need to be efficient and plentiful. The “economics of the smaller die” are much better for AMD (and we assume NVIDIA) and by 2017-2019, this is the reality and will be how graphics performance will scale.
Getting the software ecosystem going now is going to be crucial to ease into that standard.
- The naming scheme of Polaris (10, 11…) has no equation, it’s just “a sequence of numbers” and we should only expect it to increase going forward. The next Polaris chip will be bigger than 11, that’s the secret he gave us.
There have been concerns that AMD was only going to go for the mainstream gaming market with Polaris but Raja promised me and our readers that we “would be really really pleased.” We expect to see Polaris-based GPUs across the entire performance stack.
- AMD’s primary goal here is to get many millions of gamers VR-ready, though getting the enthusiasts “that last millisecond” is still a goal and it will happen from Radeon.
- No solid date on Polaris parts at all – I tried! (Other than the launches start in June.) Though Raja did promise that after tonight, he will not have his next alcoholic beverage until the launch of Polaris. Serious commitment!
- Curious about the HBM2 inclusion in Vega on the roadmap and what that means for Polaris? Though he didn’t say it outright, it appears that Polaris will be using HBM1, leaving me to wonder about the memory capacity limitations inherent in that. Has AMD found a way to get past the 4GB barrier? We are trying to figure that out for sure.
Why is Polaris going to use HBM1? Raja pointed towards the extreme cost and expense of building the HBM ecosystem prepping the pipeline for the new memory technology as the culprit and AMD obviously wants to recoup some of that cost with another generation of GPU usage.
Speaking with Raja is always interesting and the confidence and knowledge he showcases is still what gives me assurance that the Radeon Technologies Group is headed in the correct direction. This is going to be a very interesting year for graphics, PC gaming and for GPU technologies, as showcased throughout the Capsaicin event, and I think everyone should be looking forward to it.