To the Max?
Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark uses Asynchronous Shaders in DirectX 12, but its developer, Oxide Games, disables them (by Vendor ID) on NVIDIA hardware. Oxide says this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.
AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.
So what is it?
Simply put, Asynchronous Shaders allow a graphics driver to cram workloads into portions of the GPU that would otherwise sit idle. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to execute on the same core at the same time, as long as the core has spare capacity for it.
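To make the idea concrete, here is a toy timing model. It is a sketch under loud assumptions: each pass is tagged with the single unit class it is bound on (the `Pass` type, the unit names, and the millisecond figures are all hypothetical), and passes bound on different units are assumed to overlap perfectly. Real GPUs share registers, caches, and memory bandwidth between concurrent work, so treat the overlapped figure as a best case.

```python
from dataclasses import dataclass


@dataclass
class Pass:
    name: str
    unit: str       # hypothetical: the unit class this pass is bound on ("rop", "shader")
    time_ms: float  # hypothetical: time the pass takes when run alone


def serial_time(passes):
    """Without async compute, passes run back to back."""
    return sum(p.time_ms for p in passes)


def overlapped_time(passes):
    """Idealized async compute: passes bound on different unit classes run
    side by side, while passes on the same unit class still queue up."""
    per_unit = {}
    for p in passes:
        per_unit[p.unit] = per_unit.get(p.unit, 0.0) + p.time_ms
    return max(per_unit.values())


# Made-up frame: a ROP-heavy graphics pass plus independent shader-bound work.
frame = [
    Pass("graphics pass", "rop", 4.0),
    Pass("physics", "shader", 3.0),
    Pass("post-processing", "shader", 2.0),
]
print(serial_time(frame))      # 9.0
print(overlapped_time(frame))  # 5.0
```

In this made-up frame the shader-bound work (5 ms) becomes the new bottleneck, which is the HyperThreading point in miniature: overlap only helps while another unit has spare capacity.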
Kollock also notes that compute is becoming more important in the graphics pipeline, and that it is possible to bypass the graphics pipeline altogether. The fixed-function bits may never go away, but it's possible that at least some engines will bypass them completely -- maybe even Oxide's own engine, several years down the road.
But, as always, you will not get infinite performance by reducing waste. You are always bound by the theoretical limits of your components, and you cannot optimize past them (short of changing the workload itself). The interesting part is that you can measure this. You can directly observe how long a GPU is idle and express it as a percentage of a time span (typically a frame).
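As a rough illustration of that measurement, the sketch below takes GPU busy intervals for one frame (for example, start/end timestamps from a profiler or GPU timestamp queries; the `idle_fraction` helper and the sample numbers are hypothetical), merges overlapping intervals, and reports the idle share of the frame. Note that it only sees whole-GPU gaps; the idle ROPs or shader units that Asynchronous Shaders target show up only in per-unit hardware counters, which this does not model.

```python
def idle_fraction(busy_intervals, frame_start, frame_end):
    """Fraction of [frame_start, frame_end) with no GPU work in flight.

    busy_intervals: (start, end) timestamps, e.g. in microseconds. They may
    overlap, since several queues can be busy at the same time.
    """
    frame_len = frame_end - frame_start
    # Clip each interval to the frame and drop any that fall outside it.
    clipped = sorted(
        (max(s, frame_start), min(e, frame_end))
        for s, e in busy_intervals
        if min(e, frame_end) > max(s, frame_start)
    )
    busy = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the running span: bank it and start a new one.
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlaps the running span: extend it instead of double counting.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        busy += cur_end - cur_start
    return 1.0 - busy / frame_len


# A 16 ms frame (in microseconds) with busy spans 0-6, 5-9 and 12-14 ms.
print(idle_fraction([(0, 6000), (5000, 9000), (12000, 14000)], 0, 16000))  # 0.3125
```

The merged busy time here is 11 ms out of 16 ms, so the GPU sits idle for 31.25% of the frame.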
And, of course, game developers profile GPUs from time to time...
Kollock says he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware, so the gain may be larger or smaller on the PC. In an informal chat with a developer at Epic Games (so a massive grain of salt is required), his late-night, “totally speculative” ballpark guesstimate was that, on the Xbox One, the GPU could theoretically accept at most ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that Epic is interested in and investigating, though.
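One way to read that estimate, under a simplifying assumption that is mine and not the developer's: if the GPU is busy for a fraction u of each frame, and extra work could run only in the idle remainder at the same rate, then the extra work it could accept, relative to its current load, follows directly from u.

```latex
\begin{aligned}
\Delta W &= \frac{1 - u}{u} \quad\Longleftrightarrow\quad u = \frac{1}{1 + \Delta W} \\
\Delta W = 0.10 &\;\Rightarrow\; u = \tfrac{1}{1.10} \approx 0.91 \\
\Delta W = 0.25 &\;\Rightarrow\; u = \tfrac{1}{1.25} = 0.80
\end{aligned}
```

Read this way, the 10-25% guess corresponds to a GPU that is already busy roughly 80 to 91% of the frame, before accounting for the memory bandwidth limits he mentioned.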
This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.
It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient ingestion of shader code. As always, painfully always, time will tell.
Subject: Processors | February 24, 2015 - 06:18 PM | Jeremy Hellstrom
Tagged: Puma+, Puma, Kaveri, ISSCC 2015, ISSCC, GCN, Excavator, Carrizo-L, carrizo, APU, amd
While it is utterly inconceivable that Josh might have missed something in his look at Carrizo, that hasn't stopped certain Canadians from talking about Gila County, Arizona. AMD's upcoming processor launch is a little more interesting than just another Phenom II launch, especially for those worried about power consumption. With Adaptive Voltage and Frequency Scaling, the new Excavator-based chips will run very well in the sub-15W per core pair range, which is perfect for point-of-sale terminals, in-flight entertainment, and even casino machines. The GPU portion suits those usage scenarios, though you can't expect an R9 295X2 at that wattage. Check out Hardware Canucks' coverage right here.
"AMD has been working hard on their mobile Carrizo architecture and they're now releasing some details about these Excavator architecture-equipped next generation APUs."
Here are some more Processor articles from around the web:
- AMD's new Carrizo: The x86 notebook processor that thinks it's a GPU @ The Register
- AMD Carrizo APU Details Revealed @ TechARP
- AMD FX-8320E Performance On Linux @ Phoronix
- Intel Broadwell HD Graphics 5500: Windows 8.1 vs. Linux @ Phoronix
- Preliminary Tests Of Intel Sandy Bridge & Ivy Bridge vs. Broadwell @ Phoronix
AMD Details Carrizo Further
Some months back AMD introduced us to their “Carrizo” product. Details were slim, but we learned that this would be another 28 nm part that has improved power efficiency over its predecessor. It would be based on the new “Excavator” core that will be the final implementation of the Bulldozer architecture. The graphics will be based on the latest iteration of the GCN architecture as well. Carrizo would be a true SOC in that it integrates the southbridge controller. The final piece of information that we received was that it would be interchangeable with the Carrizo-L SOC, which is an extremely low-power APU based on the Puma+ cores.
A few months later we were invited by AMD to their CES meeting rooms to see early Carrizo samples in action. These products were running a variety of applications very smoothly, but we were not informed of speeds and actual power draw. All that we knew is that Carrizo was working and able to run pretty significant workloads like high quality 4K video playback. Details were yet again very scarce other than the expected timeline of release, the TDP ratings of these future parts, and how it was going to be a significant jump in energy efficiency over the previous Kaveri based APUs.
AMD is presenting more information on Carrizo at the ISSCC 2015 conference. This information dives a little deeper into how AMD has made the APU smaller, more power efficient, and faster overall than the previous 15 watt to 35 watt APUs based on Kaveri. AMD claims that it has a product that will increase power efficiency to a degree never before seen from the company. This is particularly important considering that Carrizo is still a 28 nm product.
Subject: General Tech, Graphics Cards | December 2, 2014 - 03:11 AM | Scott Michaud
Tagged: amd, GCN, dice, frostbite
Inverse trigonometric functions are difficult to compute, and their use is often avoided like the plague. If, however, the value is absolutely necessary, it will probably be computed with an approximation or, if possible, by replacing the inverse function with cheaper ones through clever use of trig identities.
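For a flavor of the approximation route (this is a textbook example, not necessarily one of the methods in Lagarde's post), the sketch below evaluates a classic cubic fit for acos from Abramowitz and Stegun's Handbook of Mathematical Functions, written in Python for readability rather than as shader code. It costs one square root and a few multiply-adds, and it uses the identity acos(-x) = pi - acos(x) to cover negative inputs.

```python
import math

# Cubic-fit coefficients from Abramowitz & Stegun (formula 4.4.45), valid on [0, 1].
A0, A1, A2, A3 = 1.5707288, -0.2121144, 0.0742610, -0.0187293


def acos_approx(x: float) -> float:
    """Approximate acos(x) for x in [-1, 1]."""
    ax = abs(x)
    # acos(|x|) ~= sqrt(1 - |x|) * (A0 + A1*|x| + A2*|x|^2 + A3*|x|^3), in Horner form.
    r = math.sqrt(1.0 - ax) * (A0 + ax * (A1 + ax * (A2 + ax * A3)))
    # acos(-x) = pi - acos(x) extends the fit to negative inputs.
    return r if x >= 0.0 else math.pi - r
```

Checked against `math.acos` on a fine grid over [-1, 1], the absolute error stays below 1e-4; whether that is accurate enough depends on what the result feeds into.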
If you want to see how the experts approach this problem, Sébastien Lagarde, a senior developer of the Frostbite engine at DICE, goes into detail in a blog post. By detail, I mean you will see some GPU assembly being stepped through by the end of it. What makes this particularly interesting are the diagrams at the end, which show each method's output as the shading of a sphere.
If you are feeling brave, take a look.
Filling the Product Gaps
In the first several years of my PCPer employment, I typically handled most of the AMD CPU refreshes. These were rather standard affairs that involved small jumps in clockspeed and performance. These happened every 6 to 8 months, with the bigger architectural shifts happening some years apart. We are finally seeing a new refresh of the AMD APU parts after the initial release of Kaveri to the world at the beginning of this year. This update is different. Unlike in previous years, none of the new parts is faster than the already available A10-7850K.
This refresh deals with fleshing out the rest of the Kaveri lineup with products that address different TDPs, markets, and prices. The A10-7850K is still the king when it comes to performance on the FM2+ socket (as long as users do not pay attention to the faster CPU performance of the A10-6800K). The initial launch in January also announced another part that is only now becoming available: the A8-7600, which was supposed to ship some months ago. The 7600 part was unique in that it had a configurable TDP that went from 65 watts down to 45 watts. The 7850K, on the other hand, was configurable from 95 watts down to 65 watts.
So what are we seeing today? AMD is releasing three parts to address the lower power markets that it hopes to expand into. As mentioned, the A8-7600 was detailed back in January but was not released until recently. The other two parts are brand new. The A10-7800 is a 65 watt TDP part with a cTDP that goes down to 45 watts. The other new chip is the A6-7600K, which is unlocked, has a configurable TDP, and looks to compete directly with Intel’s recently released 20th Anniversary Pentium G3258.
Subject: General Tech | July 17, 2014 - 11:37 PM | Tim Verry
Tagged: quarterly earnings, GCN, financial results, APU, amd
Today, AMD posted financial results for its second quarter of 2014. The company posted quarterly revenue of $1.44 billion, operating income of $63 million, and ultimately a net loss of $36 million (or $0.05 loss per share). Revenue and operating income improved over the previous quarter, and the results are a marked improvement over the same quarter last year.
The chart below compares the second quarter results to the previous quarter (Q1'14) and the same quarter last year (Q2'13). AMD saw increased revenue and operating income, but a higher net loss versus last quarter. Unfortunately, AMD is still saddled with a great deal of debt, which actually increased from $2.14 billion in Q1 2014 to $2.21 billion at the end of the second quarter.
| |Q2 2014|Q1 2014|Q2 2013|
|---|---|---|---|
|Revenue|$1.44 Billion|$1.40 Billion|$1.16 Billion|
|Operating Income|$63 Million|$49 Million|($29 Million)|
|Net Profit/(Loss)|($36 Million)|($20 Million)|($74 Million)|
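The headline revenue changes can be checked against the table. This quick sketch works from the rounded figures shown (AMD's exact reported numbers carry more digits, so the results are approximate) and only compares revenue, since a percentage change on a negative base, such as last year's operating loss, is not meaningful.

```python
def pct_change(new: float, old: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


revenue = {"Q2 2014": 1.44, "Q1 2014": 1.40, "Q2 2013": 1.16}  # $ billions, rounded

qoq = pct_change(revenue["Q2 2014"], revenue["Q1 2014"])
yoy = pct_change(revenue["Q2 2014"], revenue["Q2 2013"])
print(f"Revenue: {qoq:.1f}% QoQ, {yoy:.1f}% YoY")  # Revenue: 2.9% QoQ, 24.1% YoY
```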
The Computing Solutions division saw revenue rise 1% over last quarter, but revenue fell 20% year over year due to fewer chips being sold.
On the bright side, the Graphics and Visual Solutions group saw quarterly revenue increase by 5% over last quarter and 141% YoY. The massive YoY increase is due, in part, to AMD's Semi-Custom Business unit and the SoCs that have come out of there (including the chips used in the latest gaming consoles).
Further, the company is currently sourcing 50% of its wafers from GlobalFoundries.
“Our transformation strategy is on track and we expect to deliver full year non-GAAP profitability and year-over-year revenue growth. We continue to strengthen our business model and shape AMD into a more agile company offering differentiated solutions for a diverse set of markets.”
-AMD CEO Rory Read
AMD expects third quarter revenue to increase by 2% (plus or minus 3%). After next quarter, AMD will begin production of its Seattle ARM processors. Perhaps even more interesting will be 2016, when AMD is slated to introduce new x86 and GCN processors on a 20nm process.
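Applied to the Q2 2014 revenue of $1.44 billion, guidance of 2% growth plus or minus 3% works out to the range below. This is simple arithmetic on the rounded figure, not a dollar amount AMD stated.

```python
q2_revenue = 1.44              # $ billions, Q2 2014 (rounded)
midpoint, spread = 0.02, 0.03  # +2% guidance, plus or minus 3%

low = q2_revenue * (1 + midpoint - spread)
mid = q2_revenue * (1 + midpoint)
high = q2_revenue * (1 + midpoint + spread)
print(f"${low:.2f}B to ${high:.2f}B (midpoint ${mid:.2f}B)")
```

This prints `$1.43B to $1.51B (midpoint $1.47B)`, so even the low end of the guidance is roughly flat with Q2.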
The company is working towards being more efficient and profitable, and the end-of-year results will be interesting to see.
Also read: AMD Restructures. Lisa Su Is Now COO @ PC Perspective
Subject: Processors | July 9, 2014 - 05:42 PM | Josh Walrath
Tagged: nvidia, msi, Luxmark, Lightning, hsa, GTX 580, GCN, APU, amd, A88X, A10-7850K
When I first read many of the initial AMD A10 7850K reviews, my primary question was how the APU would behave with a different GPU installed in the system, one that does not use the CrossFire X functionality that AMD talked about. Typically, when a user installs a standalone graphics card on the AMD FM2/FM2+ platform, they disable the graphics portion of the APU. They also have to uninstall the AMD Catalyst driver suite. This leaves the APU as a CPU only, and all of that graphics silicon is left silent and dark.
Who in their right mind would pair a high end graphics card with the A10-7850K? This guy!
Does this need to be the case? Absolutely not! The GCN-based graphics unit on the latest Kaveri APUs is pretty powerful when used in GPGPU/OpenCL applications. The 4 CPU cores (2 modules) and 8 GCN compute units can push out around 856 GFLOPS when fully utilized. We must also consider that the APU is the first fully compliant HSA (Heterogeneous System Architecture) chip, and it handles memory accesses much more efficiently than standalone GPUs. The shared memory space with the CPU gets rid of a lot of the workarounds typically needed for GPGPU-type applications. It makes sense that users would want to leverage the performance potential of a fully functioning APU while upgrading their overall graphics performance with a higher-end standalone GPU.
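Where does a figure like 856 GFLOPS come from? Peak throughput is just units × FLOPs per cycle × clock. The sketch below assumes specifications the article does not state: 512 stream processors (8 compute units of 64) at 720 MHz, each retiring one fused multiply-add (2 FLOPs) per cycle, plus four CPU cores at 3.7 GHz doing 8 single-precision FLOPs per cycle each. Under those assumptions the two halves add up to the quoted number.

```python
def peak_gflops(units: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak single-precision GFLOPS: units x FLOPs/cycle x clock (GHz)."""
    return units * flops_per_cycle * clock_ghz


# Assumed GPU side: 8 GCN compute units x 64 stream processors, 2 FLOPs (FMA) each, 720 MHz.
gpu = peak_gflops(8 * 64, 2, 0.720)
# Assumed CPU side: 4 cores at 3.7 GHz, 8 single-precision FLOPs per core per cycle.
cpu = peak_gflops(4, 8, 3.7)
print(f"GPU {gpu:.1f} + CPU {cpu:.1f} = {gpu + cpu:.1f} GFLOPS")
```

Roughly 86% of the peak sits on the GPU side, which is why leaving that silicon dark wastes most of the chip's theoretical compute.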
Getting this to work is very simple. Assuming that the user has been using the APU as their primary graphics controller, they should update to the latest Catalyst drivers. If the user is going to use an AMD card, then they should completely uninstall the Catalyst driver and re-install it only after the new card is installed. After this is completed, restart the machine, go into the UEFI, and change the primary video boot device from the integrated unit to PEG (PCI-Express Graphics). Save the setting and shut down the machine. Insert the new video card and attach the monitor cable(s) to it. Boot the machine and either re-install the Catalyst suite if an AMD card is used, or install the latest NVIDIA drivers if that is the graphics choice.
Windows 7 and Windows 8 allow users to install multiple graphics drivers from different vendors. In my case I utilized a last generation GTX 580 (the MSI N580GTX Lightning) along with the AMD A10 7850K. These products coexist happily together on the MSI A88X-G45 Gaming motherboard. The monitor is attached to the NVIDIA card and all games are routed through that since it is the primary graphics adapter. Performance seems unaffected with both drivers active.
I find it interesting that the GPU portion of the APU is named "Spectre". Who owns those 3dfx trademarks anymore?
When I load up Luxmark I see three entries: the APU (CPU and GPU portions), the GPU portion of the APU, and then the GTX 580. Luxmark defaults to the GPUs. We see these GPUs listed as “Spectre”, which is the GCN portion of the APU, and the NVIDIA GTX 580. Spectre supports OpenCL 1.2 while the GTX 580 is an OpenCL 1.1 compliant part.
With both GPUs active I can successfully run the Luxmark “Sala” test. The two units perform better together than when they are run separately. Adding in the CPU does increase the score, but not by very much (my guess here is that the APU is going to be very memory bandwidth bound in such a situation). Below we can see the results of the different units separate and together.
These results make me hopeful about the potential of AMD’s latest APU. It can run side by side with a standalone card, and applications can leverage the performance of this unit. Now all we need is more HSA-aware software. More time and testing are needed for setups such as this, and we need to see whether HSA-enabled software really does get a boost from using the GPU portion of the APU, compared to pure CPU code or code that runs on the standalone GPU.
Personally, I find the idea of a heterogeneous solution such as this appealing. The standalone graphics card handles the actual graphics work, the CPU handles general-purpose code, and the HSA software can then fully utilize the graphics portion of the APU in a very efficient manner. Unfortunately, we do not have hard numbers on the handful of HSA-aware applications out there, especially when used in conjunction with standalone graphics. We know in theory that this can work (and should work), but until developers get out there and really optimize their code for such a solution, we simply do not know if having an APU will really net the user big gains as compared to something like the i7 4770 or 4790 running pure x86 code.
In the meantime, at least we know that these products work together without issue. The mixed mode OpenCL results make a nice case for improving overall performance in such a system. I would imagine with more time and more effort from developers, we could see some really interesting implementations that will fully utilize a system such as this one. Until then, happy experimenting!
FM2+ Has a High End?
AMD faces a bit of a quandary when it comes to their products. Their APUs are great at graphics, but not so great at general CPU performance. Their products are all under $200 for the CPU/APU but these APUs are not popular with the enthusiast and gaming crowd. Yes, they can make excellent budget gaming systems for those who do not demand ultra-high resolutions and quality settings, but it is still a tough sell for a lot of the mainstream market; the primary way AMD pushes these products is price.
Perhaps the irony here is that AMD is extremely competitive with Intel when it comes to chipset features. The latest A88X Fusion Control Hub is exceptionally well rounded with four native USB 3.0 ports, ten USB 2.0 ports, and eight SATA-6G ports. Performance of this chipset is not all that far off from what Intel offers with the Z87 chipset (USB and SATA-6G are slower, but not dramatically so). The chip also offers RAID 0, 1, 5, and 10 support as well as a 10/100/1000 Ethernet MAC (but a physical layer chip is still required).
Now we get back to price. AMD is not charging a whole lot for these FCH units, even the top end A88X. I do not have the exact number, but it is cheap as compared to the competing Intel option. Intel’s chipset business has made money for the company for years, but AMD does not have that luxury. AMD needs to bundle effectively to be competitive, so it is highly doubtful that the chipset division makes a net profit at the end of the day. Their job is to help push AMD’s CPU and APU offerings as much as possible.
These low cost FCH chips allow motherboard manufacturers to place a lot of customization on their board, but they are still limited in what they can do. A $200+ motherboard simply will not fly with consumers for the level of overall performance that even the latest AMD A10 7850K APU provides in CPU bound workloads. Unfortunately, HSA has not yet taken off to leverage the full potential of the Kaveri APU. We have had big developments, just not big enough that the majority of daily users out there will require an AMD APU. Until that happens, AMD will not be viewed favorably when it comes to its APU offerings in gaming or high performance systems.
The quandary, then, is how AMD and its motherboard partners can create inexpensive motherboards that are feature packed, yet will not break the bank or become a burden on APU sales. The FX series of processors from AMD do have a bit more leeway, as the performance of the high end FX-8350 is not considered bad, and it is a decent overclocker. That platform can sustain higher motherboard costs due to this performance. The APU side, not so much. The answer to this quandary is tradeoffs.
Subject: General Tech | June 7, 2014 - 04:32 AM | Scott Michaud
Tagged: microsoft, xbox one, xbone, gpgpu, GCN
Shortly after the Kinect deprecation, Microsoft announced that a 10% boost in GPU performance will be coming to Xbox One. This, of course, is the platform allowing developers to avoid the typical overhead which Kinect requires for its various tasks. Updated software will allow game developers to reclaim some or all of that compute time.
Still looks like Wall-E grew a Freddie Mercury 'stache.
While it "might" (who am I kidding?) be used to berate Microsoft for ever forcing the Kinect upon users in the first place, this functionality was planned from before launch. Pre-launch interviews stated that Microsoft was looking into scheduling their compute tasks while the game was busy, for example, hammering the ROPs and leaving the shader cores idle. This could be that, and only that, or it could be a bit more if developers are allowed to opt out of most or all Kinect computations altogether.
The theoretical maximum GPU compute and shader performance of the Xbox One GPU is still about 29% less than its competitor, the PS4. Still, 29% less is better than about 36% less. Not only that, but the final result will always come down to the amount of care and attention spent on any given title by its developers. This will give them more breathing room, though.
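The 29% and 36% figures can be reproduced from publicly reported GPU configurations that the article does not list: 12 GCN compute units at 853 MHz for the Xbox One and 18 at 800 MHz for the PS4. Modeling the Kinect reservation as a flat 10% of the Xbox One's GPU throughput is my simplification, not Microsoft's description.

```python
def gcn_tflops(compute_units: int, clock_mhz: float) -> float:
    """Peak single-precision TFLOPS for a GCN GPU: 64 stream processors per
    compute unit, each doing one fused multiply-add (2 FLOPs) per cycle."""
    return compute_units * 64 * 2 * clock_mhz / 1e6


xbox_one = gcn_tflops(12, 853)  # assumed configuration, ~1.31 TFLOPS
ps4 = gcn_tflops(18, 800)       # assumed configuration, ~1.84 TFLOPS

deficit_full = 1 - xbox_one / ps4            # games get the whole GPU
deficit_reserved = 1 - 0.9 * xbox_one / ps4  # assumed: 10% held back for Kinect
print(f"{deficit_full:.0%} behind with the full GPU, {deficit_reserved:.0%} with 10% reserved")
```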
Then, of course, the PC has about 3x the shader performance of either of those systems in a few single-GPU products. Everything should be seen in perspective.
AMD Brings Kabini to the Desktop
Perhaps we are performing a study of opposites? Yesterday Ryan posted his R9 295X2 review, which covers the 500 watt, dual GPU monster that will be retailing for $1499. It is a card meant only for the extreme enthusiast who has plenty of room in their case, plenty of knowledge about their power supply, and plenty of electricity and air conditioning to keep this monster at bay. The product that I am reviewing could not be any more different: it is inexpensive, cool running, power efficient, and can fit pretty much anywhere. These products can almost be viewed as polar opposites.
The interesting thing, of course, is that it shows how flexible AMD’s GCN architecture is. GCN can efficiently and effectively power the highest performing product in AMD’s graphics portfolio, as well as its lowest power offerings in the APU market. Performance scales nearly linearly as more GCN compute cores are added.
The products that I am, of course, referring to are the latest Athlon and Sempron APUs based on the Kabini architecture, which fuses Jaguar x86 cores with GCN compute cores. These APUs were announced last month, but we did not have the chance at the time to test them. Since then these products have popped up in a couple of places around the world, but this is the first time that reviewers have officially received product from AMD and their partners.