Subject: Graphics Cards | September 5, 2018 - 05:50 PM | Jeremy Hellstrom
Tagged: amd, GCN, R9 290X, r9 390x, R9 Fury X, RX VEGA 64
[H]ard|OCP have been examining the generational performance differences between GPUs, starting with NVIDIA and moving on to AMD. In this review they compare Hawaii GCN 1.1, Fiji GCN 1.3 and Vega10 GCN 1.5 on a wide variety of games. AMD is a more interesting case as they have made more frequent changes to their architecture, while at the same time tending towards mid-range performance as opposed to aiming for the high end of performance and pricing. This has led to interesting results, with certain GCN versions offering more compelling upgrade paths than others. Take a close look to see how AMD's GPUs have changed over the past five years.
"Wonder how much performance you are truly getting from GPU to GPU upgrade in games? We take GPUs from AMD and compare performance gained from 2013 to 2018. This is our AMD GPU Generational Performance Part 1 article focusing on the Radeon R9 290X, Radeon R9 390X, Radeon R9 Fury X, and Radeon RX Vega 64 in 14 games."
Here are some more Graphics Card articles from around the web:
- The New 3GB GeForce GTX 1050: Good Product or Misleading Product? @ TechSpot
- Razer Core X @ Kitguru
- Blackmagic external GPU review: A very Apple graphics solution @ Ars Technica
Subject: Graphics Cards | April 25, 2017 - 03:11 AM | Tim Verry
Tagged: sapphire, RX 580, RX 550, pulse, Polaris, nitro+, GCN
Earlier this month Sapphire announced a new budget-oriented series of graphics cards it calls PULSE. The new series slides in below the premium Nitro+ series to offer cheaper graphics cards that retain many of the high-quality hardware components but lack the flashy extras on the coolers, come in at lower factory overclocks, and have fewer PCI-E power inputs which, in theory, means lower overclocking headroom. The new graphics card series is currently made up of five Polaris-based cards: the Sapphire Pulse RX 580 (in 8GB and 4GB versions), RX 570, RX 570 ITX, and RX 550.
According to Sapphire, Pulse graphics cards use many of the same high-end components as the Nitro+ cards, including Black Diamond Chokes 4, long lasting capacitors, fuse protection, and intelligent fan control. The new graphics cards have aluminum backplates and removable Quick Connect fans with semi-passive cooling technology that allows the fans to turn off when the card is under light load. The RX 580 and RX 570 use Dual-X coolers and the RX 570 ITX and RX 550 use single fan shrouded coolers.
Compared to Nitro+, the coolers are a bit less flashy and there are no Nitro+ Glow LEDs. If you are not a fan of bling or do not have a windowed case, the Pulse cards might save you a bit of money while getting you most of the performance if Sapphire’s claims are accurate.
Speaking of performance, the Pulse branded graphics cards are factory overclocked, just not as much. The Sapphire Pulse RX 580 with its 2,304 cores comes with a boost clock of 1,366 MHz, the RX 570 and RX 570 ITX come with GPU boost clocks of 1,284 MHz and 1,244 MHz respectively, and the RX 550 has a boost clock of 1,206 MHz. Memory clocks sit at 8,000 MHz for the RX 580 and 7,000 MHz for the remaining Pulse cards (RX 570, RX 570 ITX, and RX 550).
Along with the introduction of its new Pulse series of graphics cards, Sapphire has entered a “strategic partnership” with motherboard manufacturer Asrock. The new graphics cards are shipping now and will be available at retailers shortly. Pricing for the RX 550 isn’t available, but prices for the other cards have appeared online as follows: Pulse RX 580 8GB for $229.99, Pulse RX 580 4GB for $199.99, Pulse RX 570 for $179.99, Pulse RX 570 ITX for $169.99.
In all, the Pulse cards appear to be about $20 cheaper than the Nitro+ variant. We will have to wait and see if those prices hold up once retailers get stock in.
Bristol Ridge Takes on Mobile: E2 Through FX
It is no secret that AMD has faced an uphill battle since the release of the original Core 2 processors from Intel. While AMD stayed mostly competitive through the Phenom II years, they hit some major performance issues when moving to the Bulldozer architecture. While on paper the idea of Chip Multi-Threading sounded fantastic, AMD was never able to get the per thread performance up to expectations. While their CPUs performed well in heavily multi-threaded applications, they just were never seen in as positive of a light as the competing Intel products.
The other part of the performance equation that has hammered AMD is the lack of a new process node that would allow it to more adequately compete with Intel. When AMD was at 32 nm PD-SOI, Intel had introduced its 22nm TriGate/FinFET. AMD then transitioned to a 28nm HKMG planar process that was more size optimized than 32nm, but did not drastically improve upon power and transistor switching performance.
So AMD had a double whammy on their hands with an underperforming architecture and limited to no access to advanced process nodes that would actually improve their power and speed situation. They could not force their foundry partners to spend billions on a crash course in FinFET technology to bring that to market faster, so they had to iterate and innovate on their designs.
Bristol Ridge is the fruit of that particular labor. It is also the end point to the architecture that was introduced with Bulldozer way back in 2011.
Subject: Editorial, Graphics Cards | May 18, 2016 - 01:18 PM | Tim Verry
Tagged: rumor, Polaris, opinion, HDMI 2.0, gpu, gddr5x, GDDR5, GCN, amd, 4k
While Nvidia's Pascal has held the spotlight in the news recently, it is not the only new GPU architecture debuting this year. AMD will soon be bringing its Polaris-based graphics cards to market for notebooks and mainstream desktop users. While several different code names have been thrown around for these new chips, they are consistently referred to, in general terms, as Polaris 10 and Polaris 11. AMD's Raja Koduri stated in an interview with PC Perspective that the numbers used in the naming scheme hold no special significance, but eventually Polaris will be used across the entire performance lineup (low end to high end graphics).
Naturally, there are going to be many rumors and leaks as the launch gets closer. In fact, Tech Power Up recently came into a number of interesting details about AMD's plans for Polaris-based graphics in 2016 including specifications and which areas of the market each chip is going to be aimed at.
Citing the usual "industry sources" familiar with the matter (take that for what it's worth, but the specifications do not seem out of the realm of possibility), Tech Power Up revealed that there are two lines of Polaris-based GPUs that will be made available this year. Polaris 10 will allegedly occupy the mid-range (mainstream) graphics option in desktops as well as being the basis for high end gaming notebook graphics chips. On the other hand, Polaris 11 will reportedly be a smaller chip aimed at thin-and-light notebooks and mainstream laptops.
Now, for the juicy bits of the leak: the rumored specifications!
AMD's "Polaris 10" GPU will feature 32 compute units (CUs) which TPU estimates – based on the assumption that each CU still contains 64 shaders on Polaris – works out to 2,048 shaders. The GPU further features a 256-bit memory interface along with a memory controller supporting GDDR5 and GDDR5X (though not at the same time heh). This would leave room for cheaper Polaris 10 derived products with less than 32 CUs and/or cheaper GDDR5 memory. Graphics cards would have as much as 8GB of memory initially clocked at 7 Gbps. Reportedly, the full 32 CU GPU is rated at 5.5 TFLOPS of single precision compute power and runs at a TDP of no more than 150 watts.
Compared to the existing Hawaii-based R9 390X, the upcoming R9 400 Polaris 10 series GPU has fewer shaders and less memory bandwidth. The memory is clocked 1 GHz higher, but the GDDR5X memory bus is half that of the 390X's 512-bit GDDR5 bus which results in 224 GB/s memory bandwidth for Polaris 10 versus 384 GB/s on Hawaii. The R9 390X has a slight edge in compute performance at 5.9 TFLOPS versus Polaris 10's 5.5 TFLOPS; however, the Polaris 10 GPU is using much less power and easily wins at performance per watt! It almost reaches the same level of single precision compute performance at nearly half the power, which is impressive if it holds true!
| ||R9 390X||R9 390||R9 380||R9 400-Series "Polaris 10"|
|GPU Code name||Grenada (Hawaii)||Grenada (Hawaii)||Antigua (Tonga)||Polaris 10|
|Rated Clock||1050 MHz||1000 MHz||970 MHz||~1343 MHz|
|Memory Clock||6000 MHz||6000 MHz||5700 MHz||7000 MHz|
|Memory Bandwidth||384 GB/s||384 GB/s||182.4 GB/s||224 GB/s|
|TDP||275 watts||275 watts||190 watts||150 watts (or less)|
|Peak Compute||5.9 TFLOPS||5.1 TFLOPS||3.48 TFLOPS||5.5 TFLOPS|
|MSRP (current)||~$400||~$310||~$199||$ unknown|
Note: Polaris GPU clocks estimated using assumption of 5.5 TFLOPS being peak compute and accurate number of shaders. (Thanks Scott.)
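The back-of-the-envelope arithmetic behind these figures can be sketched in a few lines of Python. This is a rough sketch, assuming 64 shaders per CU (the same assumption TPU made) and two floating point operations per shader per clock (one fused multiply-add); the function names are hypothetical and do not come from AMD or TPU.

```python
# Rough estimates only; function names are illustrative, not from any vendor tool.

SHADERS_PER_CU = 64             # assumption carried over from earlier GCN parts
FLOPS_PER_SHADER_PER_CLOCK = 2  # assumes one fused multiply-add per clock


def shader_count(compute_units: int) -> int:
    """Total shaders for a given number of compute units."""
    return compute_units * SHADERS_PER_CU


def clock_mhz_from_tflops(tflops: float, shaders: int) -> float:
    """Clock speed (MHz) needed to hit a peak single precision rating."""
    flops = tflops * 1e12
    return flops / (shaders * FLOPS_PER_SHADER_PER_CLOCK) / 1e6


def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Bus width (bits) times per-pin data rate (Gbps), converted to GB/s."""
    return bus_width_bits * data_rate_gbps / 8


polaris10_shaders = shader_count(32)
print(polaris10_shaders)                                     # 2048
print(round(clock_mhz_from_tflops(5.5, polaris10_shaders)))  # 1343
print(memory_bandwidth_gbs(256, 7.0))                        # 224.0 (rumored Polaris 10)
print(memory_bandwidth_gbs(512, 6.0))                        # 384.0 (R9 390X)
```

The same helpers reproduce the R9 380's 182.4 GB/s from its 256-bit bus and 5.7 Gbps memory, so the rumored numbers are at least internally consistent.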
Another comparison that can be made is to the Radeon R9 380 which is a Tonga-based GPU with similar TDP. In this matchup, the Polaris 10 based chip will – at a slightly lower TDP – pack in more shaders, twice the amount of faster clocked memory with 23% more bandwidth, and provide a 58% increase in single precision compute horsepower. Not too shabby!
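The relative figures quoted in these comparisons follow directly from the table values. Here is a minimal sketch of that arithmetic; the helper name is hypothetical, the Polaris 10 inputs are rumored, and rated TDP is used as a stand-in for real power draw, which it only approximates.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Values taken from the table above; Polaris 10 figures are rumored.
bandwidth_gain = percent_increase(224.0, 182.4)  # Polaris 10 vs. R9 380, GB/s
compute_gain = percent_increase(5.5, 3.48)       # Polaris 10 vs. R9 380, TFLOPS

# Performance per watt in GFLOPS/W, using rated TDP rather than measured power.
r9_390x_gflops_per_watt = 5.9 * 1000 / 275
polaris10_gflops_per_watt = 5.5 * 1000 / 150

print(f"{bandwidth_gain:.0f}% more bandwidth")
print(f"{compute_gain:.0f}% more compute")
print(f"{r9_390x_gflops_per_watt:.1f} vs {polaris10_gflops_per_watt:.1f} GFLOPS/W")
```

On these inputs the R9 390X works out to roughly 21.5 GFLOPS per watt against roughly 36.7 for the rumored Polaris 10 part.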
Likely, a good portion of these increases are made possible by the move to a smaller process node and utilizing FinFET "tri-gate" like transistors on the Samsung/Globalfoundries 14LPP FinFET manufacturing process, though AMD has also made some architecture tweaks and hardware additions to the GCN 4.0 based processors. A brief high level introduction is said to be made today in a webinar for their partners (though AMD has come out and said preemptively that no technical nitty-gritty details will be divulged yet). (Update: Tech Altar summarized the partner webinar. Unfortunately there were no major reveals other than that AMD will not be limiting AIB partners from pushing for the highest factory overclocks they can get).
Moving on from Polaris 10 for a bit, Polaris 11 is rumored to be a smaller GCN 4.0 chip that will top out at 14 CUs (estimated 896 shaders/stream processors) and 2.5 TFLOPS of single precision compute power. These chips aimed at mainstream and thin-and-light laptops will have 50W TDPs and will be paired with up to 4GB of GDDR5 memory. There is apparently no GDDR5X option for these, which makes sense at this price point and performance level. The 128-bit bus is a bit limiting, but this is a low end mobile chip we are talking about here...
| ||R7 370||R7 400 Series "Polaris 11"|
|GPU Code name||Trinidad (Pitcairn)||Polaris 11|
|Rated Clock||925 MHz base (975 MHz boost)||~1395 MHz|
|Memory||2 or 4GB||4GB|
|Memory Clock||5600 MHz||? MHz|
|Memory Bandwidth||179.2 GB/s||? GB/s|
|TDP||110 watts||50 watts|
|Peak Compute||1.89 TFLOPS||2.5 TFLOPS|
|MSRP (current)||~$140 (less after rebates and sales)||$?|
Note: Polaris GPU clocks estimated using assumption of 2.5 TFLOPS being peak compute and accurate number of shaders. (Thanks Scott.)
Fewer details were unveiled concerning Polaris 11, as you can see from the chart above. From what we know so far, it should be a promising successor to the R7 370 series even with the memory bus limitation and lower shader count, as the GPU should be clocked higher (it also might have more shaders than the M series mobile variants of the 370 and lower mobile series) and run at a much lower TDP for at least equivalent performance, if not a decent increase. The lower power usage in particular will be hugely welcomed in mobile devices as it should result in longer battery life under the same workloads. I picked the R7 370 as the comparison because it has 4 gigabytes of memory and not that many more shaders, and because, as a desktop chip, it may be more widely familiar to readers. Polaris 11 also appears to sit between the R7 360 and R7 370 in terms of shader count and other features, but it is allegedly going to be faster than both of them while using (at least on paper) less than half the power.
Of course these are still rumors until AMD makes Polaris officially, well, official with a product launch. The claimed specifications appear reasonable though, and based on that there are a few important takeaways and thoughts I have.
The first thing on my mind is that AMD is taking an interesting direction here. While NVIDIA has chosen to start out its new generation at the top by announcing "big Pascal" GP100 and actually launching the GP104 GTX 1080 (one of its highest end consumer chips/cards) yesterday, and then introducing lower end products over the course of the year, AMD has opted for the opposite approach. AMD will be starting closer to the lower end with a mainstream notebook chip and a high end notebook/mainstream desktop GPU (Polaris 11 and 10 respectively), then fleshing out its product stack over a year (remember Raja Koduri stated Polaris and GCN 4 would be used across the entire product stack) and building up with bigger and higher end GPUs over time, finally topping off with its highest end consumer (and professional) GPUs based on "Vega" in 2017.
This means, and I'm not sure if this was planned by either Nvidia or AMD or just how it happened to work out based on them following their own GPU philosophies (but I'm thinking the latter), that for some time after both architectures are launched AMD and NVIDIA's newest architectures and GPUs will not be directly competing with each other. Eventually they should meet in the middle (maybe late this year?) with a mid-range desktop graphics card and it will be interesting to see how they stack up at similar price points and hardware levels. Then, of course once "Vega" based GPUs hit (sadly probably in time for NV's big Pascal to launch heh. I'm not sure if Vega is Fury X replacement only or even beyond that to 1080Ti or even GP100 competitor) we should see GCN 4 on the new smaller process node square up against NVIDIA and its 16nm Pascal products across the board (entire lineup). Which will have the better performance, which will win out in power usage and performance/watt and performance/$? All questions I wish I knew the answers to, but sadly do not!!
Speaking of price and performance/$... Polaris is actually looking pretty good so far at hitting much lower TDPs and power usage targets while delivering at least similar performance if not a good bit more. Both AMD and NVIDIA appear to be bringing out GPUs better than I expected to see as far as technological improvements in performance and power usage (these die shrinks have really helped even though from here on out that trend isn't really going to continue...). I hope that AMD can at least match NV in these areas at the mid range even if they do not have a high end GPU coming out soon (not until sometime after these cards launch and not really until Vega, the high end GCN GPU successor). At least on paper based on the leaked information the GPUs so far look good. My only worry is going to be pricing which I think is going to make or break these cards. AMD will need to price them competitively and aggressively to ensure their adoption and success.
I hope that doing the rollout this way (starting with lower end chips) helps AMD to iron out the new smaller process node and that they are able to get good yields so that they can be aggressive with pricing here and eventually at the high end!
I am looking forward to more information on AMD's Polaris architecture and the graphics cards based on it!
- AMD Capsaicin GDC Live Stream and Live Blog TODAY!!
- AMD GPU Roadmap: Capsaicin Names Upcoming Architectures
- AMD's Raja Koduri talks moving past CrossFire, smaller GPU dies, HBM2 and more.
- AMD High-End Polaris Expected for 2016
- CES 2016: AMD Shows Polaris Architecture and HDMI FreeSync Displays
I will admit that I am not 100% up on all the rumors and I apologize for that. With that said, I would love to hear what your thoughts are on AMD's upcoming GPUs and what you think about these latest rumors!
Lower Power, Same Performance
AMD is in a strange position in that there is a lot of excitement about their upcoming Zen architecture, but we are still many months away from that introduction. AMD obviously needs to keep the dollars flowing in, and part of that means that we get refreshes now and then of current products. The “Kaveri” products that have been powering the latest APUs from AMD have received one of those refreshes. AMD has done some redesigning of the chip and tweaked the process technology used to manufacture them. The resulting product is the “Godavari” refresh that offers slightly higher clockspeeds as well as better overall power efficiency as compared to the previous “Kaveri” products.
One of the first refreshes was the A8-7670K that hit the ground in November of 2015. This is a slightly cut down part that features 6 GPU compute units vs. the 8 that a fully enabled Godavari chip has. This continues to be a FM2+ based chip with a 95 watt TDP. The clockspeed of this part goes from 3.6 GHz to 3.9 GHz. The GPU portion runs at the same 757 MHz that the original A10-7850K ran at. It is interesting to note that it is still a 95 watt TDP part with essentially the same clockspeeds as the 7850K, but with two fewer GPU compute units.
The other product being covered here is a bit more interesting. The A10-7860K looks to be a larger improvement from the previous 7850K in terms of power and performance. It shares the same CPU clockspeed range as the 7850K (3.6 GHz to 3.9 GHz), but improves upon the GPU clockspeed by hitting around 800 MHz. At first this seems underwhelming until we realize that AMD has lowered the TDP from 95 watts down to 65 watts. Less power consumed and less heat produced for the same performance from the CPU side and improved performance from the GPU seems like a nice advance.
AMD continues to utilize GLOBALFOUNDRIES 28 nm Bulk/HKMG process for their latest APUs and will continue to do so until Zen is released late this year. This is not the same 28 nm process that we were introduced to over four years ago. Over that time improvements have been made to improve yields and bins, as well as optimize power and clockspeed. GF also can adjust the process on a per batch basis to improve certain aspects of a design (higher speed, more leakage, lower power, etc.). They cannot produce miracles though. Do not expect 22 nm FinFET performance or density with these latest AMD products. Those kinds of improvements will show up with Samsung/GF’s 14nm LPP and TSMC’s 16nm FF+ lines. While AMD will be introducing GPUs on 14nm LPP this summer, the Zen launch in late 2016 will be the first AMD CPU to utilize that advanced process.
Subject: Processors | April 5, 2016 - 06:30 AM | Josh Walrath
Tagged: mobile, hp, GCN, envy, ddr4, carrizo, Bristol Ridge, APU, amd, AM4
Today AMD is “pre-announcing” their latest 7th generation APU. Codenamed “Bristol Ridge”, this new SOC is based off of the Excavator architecture featured in the previous Carrizo series of products. AMD provided very few details as to what is new and different in Bristol Ridge as compared to Carrizo, but they did offer a few nice hints.
They were able to provide a die shot of the new Bristol Ridge APU and, at first glance, there appear to be some interesting differences between it and the previous Carrizo. Unfortunately, there really are no changes that we can see from this shot. Those new functional units that you are tempted to speculate about? For some reason AMD decided to widen out the shot of this die. Those extra units around the border? They are the adjacent dies on the wafer. I was bamboozled at first, but happily Marc Sauter pointed it out to me. No new functional units for you!
This is the Carrizo shot. It is functionally identical to what we see with Bristol Ridge.
AMD appears to be using the same 28 nm HKMG process from GLOBALFOUNDRIES. This is not going to give AMD much of a jump, but from information in the industry GLOBALFOUNDRIES and others have put an impressive amount of work into several generations of 28 nm products. TSMC is on their third iteration which has improved power and clock capabilities on that node. GLOBALFOUNDRIES has continued to improve their particular process and likely Bristol Ridge is going to be the last APU built on that node.
All of the competing chips are rated at 15 watts TDP. Intel has the compute advantage, but AMD is cleaning up when it comes to graphics.
The company has also continued to improve upon their power gating and clocking technologies to keep TDPs low, yet performance high. AMD recently released the Godavari APUs to the market which exhibit better clocking and power characteristics from the previous Kaveri. Little was done on the actual design, rather it was improved process tech as well as better clock control algorithms that achieved these advances. It appears as though AMD has continued this trend with Bristol Ridge.
We likely are not seeing per clock increases, but rather higher and longer sustained clockspeeds providing the performance boost that we are seeing between Carrizo and Bristol Ridge. In these benchmarks AMD is using 15 watt TDP products. These are mobile chips and any power improvements will show off significant gains in overall performance. Bristol Ridge is still a native quad core part with what looks to be an 8 module GCN unit.
Again with all three products at a 15 watt TDP we can see that AMD is squeezing every bit of performance it can with the 28 nm process and their Excavator based design.
The basic core and GPU design look relatively unchanged, but obviously there were a lot of tweaks applied to give the better performance at comparable TDPs.
AMD is announcing this along with the first product that will feature this APU: the HP Envy X360. This convertible tablet offers some very nice features and looks to be one of the better implementations that AMD has seen using its latest APUs. Carrizo had some wins, but taking marketshare back from Intel in the mobile space has been tortuous at best. AMD obviously hopes that Bristol Ridge in the sub-35 watt range will continue to show fight for the company in this important market. Perhaps one of the more interesting features is the option for the PCIe SSD. Hopefully AMD will send out a few samples so we can see what a more “premium” type convertible can do with the AMD silicon.
The HP Envy X360 convertible in all of its glory.
Bristol Ridge will be coming to the AM4 socket infrastructure in what appears to be a Computex timeframe. These parts will of course feature higher TDPs than what we are seeing here with the 15 watt unit that was tested. It seems at that time AMD will announce the full lineup from top to bottom and start seeding the market with AM4 boards that will eventually house the “Zen” CPUs that will show up in late 2016.
Subject: Graphics Cards | December 31, 2015 - 01:41 PM | Sebastian Peak
Tagged: rumor, report, radeon, Polaris, graphics card, gpu, GCN, amd
A report claims that Polaris will succeed GCN (Graphics Core Next) as the next AMD Radeon GPU core, which will power the 400-series graphics cards.
Image via VideoCardz.com
As these rumors go, this is about as convoluted as it gets. VideoCardz has published the story, sourced from WCCFtech, who was reporting on a post with supposedly leaked slides at HardwareBattle. The primary slide in question has since been pulled, and appears below:
Image via HWBattle.com
Of course the name does nothing to provide architectural information on this presumptive GCN replacement, and a new core for the 400-series GPUs was expected anyway after the 300-series was largely a rebranded 200-series (that's a lot of series). Let's hope actual details emerge soon, but for now we can speculate on mysterious tweets from certain interested parties:
— Raja Koduri (@GFXChipTweeter) November 26, 2015
Subject: Graphics Cards | November 26, 2015 - 03:09 PM | Scott Michaud
Tagged: amd, graphics drivers, GCN, terascale
The Graphics Core Next (GCN) architecture is now a minimum requirement for upcoming AMD graphics drivers. If your graphics card (or APU) uses the TeraScale family of microarchitectures, then your last expected WHQL driver is AMD Catalyst 15.7.1 for Windows 7, 8.x, and 10. You aren't entirely left out of Radeon Software Crimson Edition, however. The latest Crimson Edition Beta driver is compatible with TeraScale, but the upcoming certified one will not be.
GCN was introduced with the AMD Radeon HD 7000 series, although it was only used in the Radeon HD 7700 series GPUs and above. The language doesn't seem to rule out an emergency driver release, such as if Microsoft breaks something in a Windows 10 update that causes bluescreens and fire on older hardware, but they also don't say that they will either. NVIDIA made a similar decision to deprecate pre-Fermi architectures back in March of 2014, which applied to the release of GeForce 343 Drivers in September of that year. Extended support for NVIDIA's old cards ends on April 1st, 2016.
I wonder why AMD chose a beta driver to stop with, though. If AMD intended to support TeraScale with Crimson, then why wouldn't they keep it supported until at least the first WHQL-certified version? If they didn't intend to support TeraScale, then why go through the effort of supporting it with the beta driver? This implies that AMD reached a hurdle with TeraScale that they didn't want to overcome. That may not be the case, but it's the first thing that comes to my mind none-the-less. Probably the best way to tell is to see how people with Radeon HD 6000-series (or lower-end 7000/8000-series) cards work with Radeon Software Crimson Beta.
Likely the last drivers that users with Radeon HD 6000-series graphics need are 15.7.1 or Radeon Software Crimson Edition Beta. We will soon learn which of the two will be best long-term.
Or, of course, you can buy a newer GPU / APU when you get a chance.
Subject: Graphics Cards | November 16, 2015 - 09:34 PM | Scott Michaud
Tagged: amd, radeon, GCN
Late last week, Forbes published an editorial by Patrick Moorhead, who spoke with Raja Koduri about AMD's future in the GPU industry. Patrick was a Corporate Vice President at AMD until late 2011. He then created Moor Insights and Strategy, which provides industry analysis. He regularly publishes editorials to Forbes and CIO. Raja Koduri is the head of the Radeon Technologies Group at AMD.
I'm going to be focusing on a brief mention a little more than half-way through, though. According to the editorial, Raja stated that AMD will release two new GPUs in 2016. “He promised two brand new GPUs in 2016, which are hopefully going to both be 14nm/16nm FinFET from GlobalFoundries or TSMC and will help make Advanced Micro Devices more power and die size competitive.”
We have been expecting AMD's Arctic Islands to arrive at some point in 2016, which will compete with NVIDIA's Pascal architecture at the high end. AMD's product stack has been relatively stale for a while, with most of the innovation occurring at the top end and pushing the previous top-end down a bit. Two new GPU architectures almost definitely mean that a second one will focus on the lower end of the market, making more compelling products on smaller processes to be more power efficient, cheaper per unit, and include newer features.
Add the recent report of the Antigua architecture, which I assume is in addition to AMD's two architecture announcement, and AMD's product stack could look much less familiar next year.
To the Max?
Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but they disable it (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.
AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.
So what is it?
Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
Kollock also notes that compute is becoming more important in the graphics pipeline, and it is possible to completely bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will completely bypass it -- maybe even their engine, several years down the road.
But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
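That idle measurement can be sketched in a few lines of Python. This is a minimal sketch, assuming a profiler has already produced a list of (start, end) busy intervals in milliseconds for one hardware unit over a single frame; the interval data and function names are hypothetical and do not come from any real profiling tool.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_ms, end_ms) during which the unit was busy


def idle_percent(busy: List[Interval], frame_ms: float) -> float:
    """Share of the frame during which the unit was idle, in percent."""
    busy_ms = 0.0
    last_end = 0.0
    for start, end in sorted(busy):
        start = max(start, last_end)  # skip time already counted (overlaps)
        end = min(end, frame_ms)      # clip to the end of the frame
        if end > start:
            busy_ms += end - start
            last_end = end
    return (frame_ms - busy_ms) / frame_ms * 100.0


def max_extra_work_percent(idle_pct: float) -> float:
    """Upper bound on extra work, relative to the work already being done."""
    busy_pct = 100.0 - idle_pct
    if busy_pct == 0.0:
        return float("inf")  # a unit that never ran has no baseline to compare
    return idle_pct / busy_pct * 100.0


# Hypothetical 16 ms frame with overlapping busy intervals on the shader units.
frame = 16.0
shader_busy = [(0.0, 6.0), (5.0, 9.0), (11.0, 14.0)]
idle = idle_percent(shader_busy, frame)
print(f"idle: {idle:.1f}% of the frame")
print(f"ceiling on extra work: {max_extra_work_percent(idle):.1f}%")
```

In this made-up frame the shader units are idle for a quarter of the time, which caps the additional work at about a third more than they are doing now. Real gains will land below that ceiling, since independent tasks still compete for shared resources.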
And, of course, game developers profile GPUs from time to time...
According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware and so this amount may increase or decrease on the PC. In an informal chat with a developer at Epic Games (so a massive grain of salt is required), his late night ballpark “totally speculative” guesstimate was that, on the Xbox One, the GPU could theoretically accept a maximum of ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.
This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.
It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient ingestion of shader code. As always, painfully always, time will tell.