All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
Polaris 10 Specifications
It would be hard at this point to NOT know about the Radeon RX 480 graphics card. AMD and the Radeon Technologies Group has been talking publicly about the Polaris architecture since December of 2015 with lofty ambitions. In the precarious position that the company rests, being well behind in market share and struggling to compete with the dominant player in the market (NVIDIA), the team was willing to sacrifice sales of current generation parts (300-series) in order to excite the user base for the upcoming move to Polaris. It is a risky bet and one that will play out over the next few months in the market.
Since then AMD continued to release bits of information at a time. First there were details on the new display support, then information about the 14nm process technology advantages. We then saw demos of working silicon at CES with targeted form factors and then at events in Macau, showed press the full details and architecture. At Computex they announced rough performance metrics and a price point. Finally, at E3, AMD discussed the RX 460 and RX 470 cousins and the release date of…today. It’s been quite a whirlwind.
Today the rubber meets the road: is the Radeon RX 480 the groundbreaking and stunning graphics card that we have been promised? Or does it struggle again to keep up with the behemoth that is NVIDIA’s GeForce product line? AMD’s marketing team would have you believe that the RX 480 is the start of some kind of graphics revolution – but will the coup be successful?
Join us for our second major graphics architecture release of the summer and learn for yourself if the Radeon RX 480 is your next GPU.
AMD gets aggressive
At its Computex 2016 press conference in Taipei today, AMD has announced the branding and pricing, along with basic specifications, for one of its upcoming Polaris GPUs shipping later this June. The Radeon RX 480, based on Polaris 10, will cost just $199 and will offer more than 5 TFLOPS of compute capability. This is an incredibly aggressive move obviously aimed at continuing to gain market share at NVIDIA's expense. Details of the product are listed below.
|RX 480||GTX 1070||GTX 980||GTX 970||R9 Fury||R9 Nano||R9 390X||R9 390|
|GPU||Polaris 10||GP104||GM204||GM204||Fiji Pro||Fiji XT||Hawaii XT||Grenada Pro|
|Rated Clock||?||1506 MHz||1126 MHz||1050 MHz||1000 MHz||up to 1000 MHz||1050 MHz||1000 MHz|
|Memory Clock||8000 MHz||8000 MHz||7000 MHz||7000 MHz||500 MHz||500 MHz||6000 MHz||6000 MHz|
|Memory Interface||256-bit||256-bit||256-bit||256-bit||4096-bit (HBM)||4096-bit (HBM)||512-bit||512-bit|
|Memory Bandwidth||256 GB/s||256 GB/s||224 GB/s||196 GB/s||512 GB/s||512 GB/s||384 GB/s||384 GB/s|
|TDP||150 watts||150 watts||165 watts||145 watts||275 watts||175 watts||275 watts||230 watts|
|Peak Compute||> 5.0 TFLOPS||5.7 TFLOPS||4.61 TFLOPS||3.4 TFLOPS||7.20 TFLOPS||8.19 TFLOPS||5.63 TFLOPS||5.12 TFLOPS|
The RX 480 will ship with 36 CUs totaling 2304 stream processors based on the current GCN breakdown of 64 stream processors per CU. AMD didn't list clock speeds and instead is only telling us that the performance offered will exceed 5 TFLOPS of compute; how much is still a mystery and will likely change based on final clocks.
The memory system is powered by a 256-bit GDDR5 memory controller running at 8 Gbps and hitting 256 GB/s of throughput. This is the same resulting memory bandwidth as NVIDIA's new GeForce GTX 1070 graphics card.
AMD also tells us that the TDP of the card is 150 watts, again matching the GTX 1070, though without more accurate performance data it's hard to assume anything about the new architectural efficiency of the Polaris GPUs built on the 14nm Global Foundries process.
Obviously the card will support FreeSync and all of AMD's VR features, in addition to being DP 1.3 and 1.4 ready.
AMD stated that the RX 480 will launch on June 29th.
I know that many of you will want us to start guessing at what performance level the new RX 480 will actually fall, and trust me, I've been trying to figure it out. Based on TFLOPS rating and memory bandwidth alone, it seems possible that the RX 480 could compete with the GTX 1070. But if that were the case, I don't think even AMD is crazy enough to set the price this far below where the GTX 1070 launched, $379.
I would expect the configuration of the GCN architecture to remain mostly unchanged on Polaris, compared to Hawaii, for the same reasons that we saw NVIDIA leave Pascal's basic compute architecture unchanged compared to Maxwell. Moving to the new process node was the primary goal and adding to that with drastic shifts in compute design might overly complicate product development.
In the past, we have observed that AMD's GCN architecture tends to operate slightly less efficiently in terms of rated maximum compute capability versus realized gaming performance, at least compared to Maxwell and now Pascal. With that in mind, the >5 TFLOPS offered by the RX 480 likely lies somewhere between the Radeon R9 390 and R9 390X in realized gaming output. If that is the case, the Radeon RX 480 should have performance somewhere between the GeForce GTX 970 and the GeForce GTX 980.
AMD claims that the RX 480 at $199 is set to offer a "premium VR experience" that has previously be limited to $500 graphics cards (another reference to the original price of the GTX 980 perhaps...). AMD claims this should have a dramatic impact on increasing the TAM (total addressable market) for VR.
In a notable market survey, price was a leading barrier to adoption of VR. The $199 SEP for select Radeon™ RX Series GPUs is an integral part of AMD’s strategy to dramatically accelerate VR adoption and unleash the VR software ecosystem. AMD expects that its aggressive pricing will jumpstart the growth of the addressable market for PC VR and accelerate the rate at which VR headsets drop in price:
- More affordable VR-ready desktops and notebooks
- Making VR accessible to consumers in retail
- Unleashing VR developers on a larger audience
- Reducing the cost of entry to VR
AMD calls this strategy of starting with the mid-range product its "Water Drop" strategy with the goal "at releasing new graphics architectures in high volume segments first to support continued market share growth for Radeon GPUs."
So what do you guys think? Are you impressed with what Polaris looks like its going to be now?
GP104 Strikes Again
It’s only been three weeks since NVIDIA unveiled the GeForce GTX 1080 and GTX 1070 graphics cards at a live streaming event in Austin, TX. But it feels like those two GPUs, one of which hasn't even been reviewed until today, have already drastically shifted the landscape of graphics, VR and PC gaming.
Half of the “new GPU” stories are told, with AMD due to follow up soon with Polaris, but it was clear to anyone watching the enthusiast segment with a hint of history that a line was drawn in the sand that day. There is THEN, and there is NOW. Today’s detailed review of the GeForce GTX 1070 completes NVIDIA’s first wave of NOW products, following closely behind the GeForce GTX 1080.
Interestingly, and in a move that is very uncharacteristic of NVIDIA, detailed specifications of the GeForce GTX 1070 were released on GeForce.com well before today’s reviews. With information on the CUDA core count, clock speeds, and memory bandwidth it was possible to get a solid sense of where the GTX 1070 performed; and I imagine that many of you already did the napkin math to figure that out. There is no more guessing though - reviews and testing are all done, and I think you'll find that the GTX 1070 is as exciting, if not more so, than the GTX 1080 due to the performance and pricing combination that it provides.
Let’s dive in.
First, Some Background
NVIDIA's Rumored GP102
When GP100 was announced, Josh and I were discussing, internally, how it would make sense in the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a dash of salt, that claimed NVIDIA was planning a second architecture, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claims to have been contacted by the developers of AIDA64, seemingly on-the-record, also citing a GP102 design.
I will retell chunks of the rumor, but also add my opinion to it.
In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive performance for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these were available in Tesla, Quadro, and GeForce cards, especially Titans.
Maxwell was interesting, though. NVIDIA was unable to leave 28nm, which Kepler launched on, so they created a second architecture at that node. To increase performance without having access to more feature density, you need to make your designs bigger, more optimized, or more simple. GM200 was giant and optimized, but, to get the performance levels it achieved, also needed to be more simple. Something needed to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.
A new architecture with GP104
Table of Contents
- Asynchronous compute discussion
- Is only 2-Way SLI supported?
- Overclocking over 2.0 GHz
- Dissecting the Founders Edition
- Benchmarks begin
- VR Testing
- Impressive power efficiency
- Performance per dollar discussion
- Ansel screenshot tool
The summer of change for GPUs has begun with today’s review of the GeForce GTX 1080. NVIDIA has endured leaks, speculation and criticism for months now, with enthusiasts calling out NVIDIA for not including HBM technology or for not having asynchronous compute capability. Last week NVIDIA’s CEO Jen-Hsun Huang went on stage and officially announced the GTX 1080 and GTX 1070 graphics cards with a healthy amount of information about their supposed performance and price points. Issues around cost and what exactly a Founders Edition is aside, the event was well received and clearly showed a performance and efficiency improvement that we were not expecting.
The question is, does the actual product live up to the hype? Can NVIDIA overcome some users’ negative view of the Founders Edition to create a product message that will get the wide range of PC gamers looking for an upgrade path an option they’ll take?
I’ll let you know through the course of this review, but what I can tell you definitively is that the GeForce GTX 1080 clearly sits alone at the top of the GPU world.
NVIDIA's Ansel Technology
“In-game photography” is an interesting concept. Not too long ago, it was difficult to just capture the user's direct experience with a title. Print screen could only hold a single screenshot at a time, which allowed Steam and FRAPS to provide a better user experience. FRAPS also made video more accessible to the end-user, but it output huge files and, while it wasn't too expensive, it needed to be purchased online, which was a big issue ten-or-so years ago.
Seeing that their audience would enjoy video captures, NVIDIA introduced ShadowPlay a couple of years ago. The feature allowed users to, not only record video, but also capture the last few minutes. It did this with hardware acceleration, and it did this for free (for compatible GPUs). While I don't use ShadowPlay, preferring the control of OBS, it's a good example of how NVIDIA wants to support their users. They see these features as a value-add, which draw people to their hardware.
History and Specifications
The Radeon Pro Duo had an interesting history. Originally shown as an unbranded, dual-GPU PCB during E3 2015, which took place last June, AMD touted it as the ultimate graphics card for both gamers and professionals. At that time, the company thought that an October launch was feasible, but that clearly didn’t work out. When pressed for information in the Oct/Nov timeframe, AMD said that they had delayed the product into Q2 2016 to better correlate with the launch of the VR systems from Oculus and HTC/Valve.
During a GDC press event in March, AMD finally unveiled the Radeon Pro Duo brand, but they were also walking back on the idea of the dual-Fiji beast being aimed at the gaming crowd, even partially. Instead, the company talked up the benefits for game developers and content creators, such as its 8192 stream processors for offline rendering, or even to aid game devs in the implementation and improvement of multi-GPU for upcoming games.
Anyone that pays attention to the graphics card market can see why AMD would make the positional shift with the Radeon Pro Duo. The Fiji architecture is on the way out, with Polaris due out in June by AMD’s own proclamation. At $1500, the Radeon Pro Duo will be a stark contrast to the prices of the Polaris GPUs this summer, and it is well above any NVIDIA-priced part in the GeForce line. And, though CrossFire has made drastic improvements over the last several years thanks to new testing techniques, the ecosystem for multi-GPU is going through a major shift with both DX12 and VR bearing down on it.
So yes, the Radeon Pro Duo has both RADEON and PRO right there in the name. What’s a respectable PC Perspective graphics reviewer supposed to do with a card like that if it finds its way into your office? Test it of course! I’ll take a look at a handful of recent games as well as a new feature that AMD has integrated with 3DS Max called FireRender to showcase some of the professional chops of the new card.
The Dual-Fiji Card Finally Arrives
This weekend, leaks of information on both WCCFTech and VideoCardz.com have revealed all the information about the pending release of AMD’s dual-GPU giant, the Radeon Pro Duo. While no one at PC Perspective has been briefed on the product officially, all of the interesting data surrounding the product is clearly outlined in the slides on those websites, minus some independent benchmark testing that we are hoping to get to next week. Based on the report from both sites, the Radeon Pro Duo will be released on April 26th.
AMD actually revealed the product and branding for the Radeon Pro Duo back in March, during its live streamed Capsaicin event surrounding GDC. At that point we were given the following information:
- Dual Fiji XT GPUs
- 8GB of total HBM memory
- 4x DisplayPort (this has since been modified)
- 16 TFLOPS of compute
- $1499 price tag
The design of the card follows the same industrial design as the reference designs of the Radeon Fury X, and integrates a dual-pump cooler and external fan/radiator to keep both GPUs running cool.
Based on the slides leaked out today, AMD has revised the Radeon Pro Duo design to include a set of three DisplayPort connections and one HDMI port. This was a necessary change as the Oculus Rift requires an HDMI port to work; only the HTC Vive has built in support for a DisplayPort connection and even in that case you would need a full-size to mini-DisplayPort cable.
The 8GB of HBM (high bandwidth memory) on the card is split between the two Fiji XT GPUs on the card, just like other multi-GPU options on the market. The 350 watts power draw mark is exceptionally high, exceeded only by AMD’s previous dual-GPU beast, the Radeon 295X2 that used 500+ watts and the NVIDIA GeForce GTX Titan Z that draws 375 watts!
Here is the specification breakdown of the Radeon Pro Duo. The card has 8192 total stream processors and 128 Compute Units, split evenly between the two GPUs. You are getting two full Fiji XT GPUs in this card, an impressive feat made possible in part by the use of High Bandwidth Memory and its smaller physical footprint.
|Radeon Pro Duo||R9 Nano||R9 Fury||R9 Fury X||GTX 980 Ti||TITAN X||GTX 980||R9 290X|
|GPU||Fiji XT x 2||Fiji XT||Fiji Pro||Fiji XT||GM200||GM200||GM204||Hawaii XT|
|Rated Clock||up to 1000 MHz||up to 1000 MHz||1000 MHz||1050 MHz||1000 MHz||1000 MHz||1126 MHz||1000 MHz|
|Memory||8GB (4GB x 2)||4GB||4GB||4GB||6GB||12GB||4GB||4GB|
|Memory Clock||500 MHz||500 MHz||500 MHz||500 MHz||7000 MHz||7000 MHz||7000 MHz||5000 MHz|
|Memory Interface||4096-bit (HMB) x 2||4096-bit (HBM)||4096-bit (HBM)||4096-bit (HBM)||384-bit||384-bit||256-bit||512-bit|
|Memory Bandwidth||1024 GB/s||512 GB/s||512 GB/s||512 GB/s||336 GB/s||336 GB/s||224 GB/s||320 GB/s|
|TDP||350 watts||175 watts||275 watts||275 watts||250 watts||250 watts||165 watts||290 watts|
|Peak Compute||16.38 TFLOPS||8.19 TFLOPS||7.20 TFLOPS||8.60 TFLOPS||5.63 TFLOPS||6.14 TFLOPS||4.61 TFLOPS||5.63 TFLOPS|
|Transistor Count||8.9B x 2||8.9B||8.9B||8.9B||8.0B||8.0B||5.2B||6.2B|
The Radeon Pro Duo has a rated clock speed of up to 1000 MHz. That’s the same clock speed as the R9 Fury and the rated “up to” frequency on the R9 Nano. It’s worth noting that we did see a handful of instances where the R9 Nano’s power limiting capability resulted in some extremely variable clock speeds in practice. AMD recently added a feature to its Crimson driver to disable power metering on the Nano, at the expense of more power draw, and I would assume the same option would work for the Pro Duo.
93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, which we added what we know about a full GP100 to:
|Tesla K40||Tesla M40||Tesla P100||Full GP100|
|GPU||GK110 (Kepler)||GM200 (Maxwell)||GP100 (Pascal)||GP100 (Pascal)|
|FP32 CUDA Cores / SM||192||128||64||64|
|FP32 CUDA Cores / GPU||2880||3072||3584||3840|
|FP64 CUDA Cores / SM||64||4||32||32|
|FP64 CUDA Cores / GPU||960||96||1792||1920|
|Base Clock||745 MHz||948 MHz||1328 MHz||TBD|
|GPU Boost Clock||810/875 MHz||1114 MHz||1480 MHz||TBD|
|Memory Interface||384-bit GDDR5||384-bit GDDR5||4096-bit HBM2||4096-bit HBM2|
|Memory Size||Up to 12 GB||Up to 24 GB||16 GB||TBD|
|L2 Cache Size||1536 KB||3072 KB||4096 KB||TBD|
|Register File Size / SM||256 KB||256 KB||256 KB||256 KB|
|Register File Size / GPU||3840 KB||6144 KB||14336 KB||15360 KB|
|TDP||235 W||250 W||300 W||TBD|
|Transistors||7.1 billion||8 billion||15.3 billion||15.3 billion|
|GPU Die Size||551 mm2||601 mm2||610 mm2||610mm2|
|Manufacturing Process||28 nm||28 nm||16 nm||16nm|
This table is designed for developers that are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm2, up from 601 mm2.
A full GP100 processor will have 60 shader modules, compared to GM200's 24, although Pascal stores half of the shaders per SM. The GP100 part that is listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPs is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
Why things are different in VR performance testing
It has been an interesting past several weeks and I find myself in an interesting spot. Clearly, and without a shred of doubt, virtual reality, more than any other gaming platform that has come before it, needs an accurate measure of performance and experience. With traditional PC gaming, if you dropped a couple of frames, or saw a slightly out of sync animation, you might notice and get annoyed. But in VR, with a head-mounted display just inches from your face taking up your entire field of view, a hitch in frame or a stutter in motion can completely ruin the immersive experience that the game developer is aiming to provide. Even worse, it could cause dizziness, nausea and define your VR experience negatively, likely killing the excitement of the platform.
My conundrum, and the one that I think most of our industry rests in, is that we don’t yet have the tools and ability to properly quantify the performance of VR. In a market and a platform that so desperately needs to get this RIGHT, we are at a point where we are just trying to get it AT ALL. I have read and seen some other glances at performance of VR headsets like the Oculus Rift and the HTC Vive released today, but honest all are missing the mark at some level. Using tools built for traditional PC gaming environments just doesn’t work, and experiential reviews talk about what the gamer can expect to “feel” but lack the data and analysis to back it up and to help point the industry in the right direction to improve in the long run.
With final hardware from both Oculus and HTC / Valve in my hands for the last three weeks, I have, with the help of Ken and Allyn, been diving into the important question of HOW do we properly test VR? I will be upfront: we don’t have a final answer yet. But we have a direction. And we have some interesting results to show you that should prove we are on the right track. But we’ll need help from the likes of Valve, Oculus, AMD, NVIDIA, Intel and Microsoft to get it right. Based on a lot of discussion I’ve had in just the last 2-3 days, I think we are moving in the correct direction.
Why things are different in VR performance testing
So why don’t our existing tools work for testing performance in VR? Things like Fraps, Frame Rating and FCAT have revolutionized performance evaluation for PCs – so why not VR? The short answer is that the gaming pipeline changes in VR with the introduction of two new SDKs: Oculus and OpenVR.
Though both have differences, the key is that they are intercepting the draw ability from the GPU to the screen. When you attach an Oculus Rift or an HTC Vive to your PC it does not show up as a display in your system; this is a change from the first developer kits from Oculus years ago. Now they are driven by what’s known as “direct mode.” This mode offers improved user experiences and the ability for the Oculus an OpenVR systems to help with quite a bit of functionality for game developers. It also means there are actions being taken on the rendered frames after we can last monitor them. At least for today.
A system worthy of VR!
Early this year I started getting request after request for hardware suggestions for upcoming PC builds for VR. The excitement surrounding the Oculus Rift and the HTC Vive has caught fire across all spectrums of technology, from PC enthusiasts to gaming enthusiasts to just those of you interested in a technology that has been "right around the corner" for decades. The requests for build suggestions spanned our normal readership as well as those that had previously only focused on console gaming, and thus the need for a selection of build guides began.
I launched build guides for $900 and $1500 price points earlier in the week, but today we look at the flagship option, targeting a budget of $2500. Though this is a pricey system that should not be undertaken lightly, it is far from a "crazy expensive" build with multiple GPUs, multiple CPUs or high dollar items unnecessary for gaming and VR.
With that in mind, let's jump right into the information you are looking for: the components we recommend.
|VR Build Guide
$2500 Spring 2016
|Component||Amazon.com Link||B&H Photo Link|
|Processor||Intel Core i7-5930K||$527||$578|
|Motherboard||ASUS X99-A USB 3.1||$264||$259|
|Memory||Corsair Dominator Platinum 16GB DDR4-3000||$169|
|Graphics Card||ASUS GeForce GTX 980 Ti STRIX||$659||$669|
|Storage||512GB Samsung 950 Pro
Western Digital Red 4TB
|Power Supply||Corsair HX750i Platinum||$144||$149|
|CPU Cooler||Corsair H100i v2||$107||$107|
|Case||Corsair Carbide 600C||$149||$141|
|Total Price||Full cart - $2,519|
For those of you interested in a bit more detail on the why of the parts selection, rather than just the what, I have some additional information for you.
Unlike the previous two builds that used Intel's consumer Skylake processors, our $2500 build moves to the Haswell-E platform, an enthusiast design that comes from the realm of workstation products. The Core i7-5930K is a 6-core processor with HyperThreading, allowing for 12 addressable threads. Though we are targeting this machine for VR gaming, the move to this processor will mean better performance for other tasks as well including video encoding, photo editing and more. It's unlocked too - so if you want to stretch that clock speed up via overclocking, you have the flexibility for that.
Update: Several people have pointed out that the Core i7-5820K is a very similar processor to the 5930K, with a $100-150 price advantage. It's another great option if you are looking to save a bit more money, and you don't expect to want/need the additional PCI Express lanes the 5930K offers (40 lanes versus 28 lanes).
With the transition to Haswell-E we have an ASUS X99-A USB 3.1 motherboard. This board is the first in our VR builds to support not just 2-Way SLI and CrossFire but 3-Way as well if we find that VR games and engines are able to consistently and properly integrate support for multi-GPU. This recently updated board from ASUS includes USB 3.1 support as you can tell from the name, includes 8 slots for DDR4 memory and offers enough PCIe lanes for expansion in all directions.
Looking to build a PC for the very first time, or need a refresher? You can find our recent step-by-step build videos to help you through the process right here!!
For our graphics card we have gone with the ASUS GeForce GTX 980 Ti Strix. The 980 Ti is the fastest single GPU solution on the market today and with 6GB of memory on-board should be able to handle anything that VR can toss at it. In terms of compute performance the 980 Ti is more than 40% faster than the GTX 980, the GPU used in our $1500 solution. The Strix integration uses a custom cooler that performs much better than the stock solution and is quieter.
Some Hints as to What Comes Next
On March 14 at the Capsaicin event at GDC AMD disclosed their roadmap for GPU architectures through 2018. There were two new names in attendance as well as some hints at what technology will be implemented in these products. It was only one slide, but some interesting information can be inferred from what we have seen and what was said in the event and afterwards during interviews.
Polaris the the next generation of GCN products from AMD that have been shown off for the past few months. Previously in December and at CES we saw the Polaris 11 GPU on display. Very little is known about this product except that it is small and extremely power efficient. Last night we saw the Polaris 10 being run and we only know that it is competitive with current mainstream performance and is larger than the Polaris 11. These products are purportedly based on Samsung/GLOBALFOUNDRIES 14nm LPP.
The source of near endless speculation online.
In the slide AMD showed it listed Polaris as having 2.5X the performance per watt over the previous 28 nm products in AMD’s lineup. This is impressive, but not terribly surprising. AMD and NVIDIA both skipped the 20 nm planar node because it just did not offer up the type of performance and scaling to make sense economically. Simply put, the expense was not worth the results in terms of die size improvements and more importantly power scaling. 20 nm planar just could not offer the type of performance overall that GPU manufacturers could achieve with 2nd and 3rd generation 28nm processes.
What was missing from the slide is mention that Polaris will integrate either HMB1 or HBM2. Vega, the architecture after Polaris, does in fact list HBM2 as the memory technology it will be packaged with. It promises another tick up in terms of performance per watt, but that is going to come more from aggressive design optimizations and likely improvements on FinFET process technologies. Vega will be a 2017 product.
Beyond that we see Navi. It again boasts an improvement in perf per watt as well as the inclusion of a new memory technology behind HBM. Current conjecture is that this could be HMC (hybrid memory cube). I am not entirely certain of that particular conjecture as it does not necessarily improve upon the advantages of current generation HBM and upcoming HBM2 implementations. Navi will not show up until 2018 at the earliest. This *could* be a 10 nm part, but considering the struggle that the industry has had getting to 14/16nm FinFET I am not holding my breath.
AMD provided few details about these products other than what we see here. From here on out is conjecture based upon industry trends, analysis of known roadmaps, and the limitations of the process and memory technologies that are already well known.
Shedding a little light on Monday's announcement
Most of our readers should have some familiarity with GameWorks, which is a series of libraries and utilities that help game developers (and others) create software. While many hardware and platform vendors provide samples and frameworks, taking the brunt of the work required to solve complex problems, this is NVIDIA's branding for their suite of technologies. Their hope is that it pushes the industry forward, which in turn drives GPU sales as users see the benefits of upgrading.
This release, GameWorks SDK 3.1, contains three complete features and two “beta” ones. We will start with the first three, each of which target a portion of the lighting and shadowing problem. The last two, which we will discuss at the end, are the experimental ones and fall under the blanket of physics and visual effects.
The first technology is Volumetric Lighting, which simulates the way light scatters off dust in the atmosphere. Game developers have been approximating this effect for a long time. In fact, I remember a particular section of Resident Evil 4 where you walk down a dim hallway that has light rays spilling in from the windows. Gamecube-era graphics could only do so much, though, and certain camera positions show that the effect was just a translucent, one-sided, decorative plane. It was a cheat that was hand-placed by a clever artist.
GameWorks' Volumetric Lighting goes after the same effect, but with a much different implementation. It looks at the generated shadow maps and, using hardware tessellation, extrudes geometry from the unshadowed portions toward the light. These little bits of geometry sum, depending on how deep the volume is, which translates into the required highlight. Also, since it's hardware tessellated, it probably has a smaller impact on performance because the GPU only needs to store enough information to generate the geometry, not store (and update) the geometry data for all possible light shafts themselves -- and it needs to store those shadow maps anyway.
Even though it seemed like this effect was independent of render method, since it basically just adds geometry to the scene, I asked whether it was locked to deferred rendering methods. NVIDIA said that it should be unrelated, as I suspected, which is good for VR. Forward rendering is easier to anti-alias, which makes the uneven pixel distribution (after lens distortion) appear more smooth.
A start to proper testing
During all the commotion last week surrounding the release of a new Ashes of the Singularity DX12 benchmark, Microsoft's launching of the Gears of War Ultimate Edition on the Windows Store and the company's supposed desire to merge Xbox and PC gaming, a constant source of insight for me was one Andrew Lauritzen. Andrew is a graphics guru at Intel and has extensive knowledge of DirectX, rendering, engines, etc. and has always been willing to teach and educate me on areas that crop up. The entire DirectX 12 and Unified Windows Platform was definitely one such instance.
Yesterday morning Andrew pointed me to a GitHub release for a tool called PresentMon, a small sample of code written by a colleague of Andrew's that might be the beginnings of being able to properly monitor performance of DX12 games and even UWP games.
The idea is simple and it's implementation even more simple: PresentMon monitors the Windows event tracing stack for present commands and records data about them to a CSV file. Anyone familiar with the kind of ETW data you can gather will appreciate that PresentMon culls out nearly all of the headache of data gathering by simplifying the results into application name/ID, Present call deltas and a bit more.
Gears of War Ultimate Edition - the debated UWP version
The "Present" method in Windows is what produces a frame and shows it to the user. PresentMon looks at the Windows events running through the system, takes note of when those present commands are received by the OS for any given application, and records the time between them. Because this tool runs at the OS level, it can capture Present data from all kinds of APIs including DX12, DX11, OpenGL, Vulkan and more. It does have limitations though - it is read only so producing an overlay on the game/application being tested isn't possible today. (Or maybe ever in the case of UWP games.)
What PresentMon offers us at this stage is an early look at a Fraps-like performance monitoring tool. In the same way that Fraps was looking for Present commands from Windows and recording them, PresentMon does the same thing, at a very similar point in the rendering pipeline as well. What is important and unique about PresentMon is that it is API independent and useful for all types of games and programs.
PresentMon at work
The first and obvious question for our readers is how this performance monitoring tool compares with Frame Rating, our FCAT-based capture benchmarking platform we have used on GPUs and CPUs for years now. To be honest, it's not the same and should not be considered an analog to it. Frame Rating and capture-based testing looks for smoothness, dropped frames and performance at the display, while Fraps and PresentMon look at performance closer to the OS level, before the graphics driver really gets the final say in things. I am still targeting for universal DX12 Frame Rating testing with exclusive full screen capable applications and expect that to be ready sooner rather than later. However, what PresentMon does give us is at least an early universal look at DX12 performance including games that are locked behind the Windows Store rules.
Things are about to get...complicated
Earlier this week, the team behind Ashes of the Singularity released an updated version of its early access game, which updated its features and capabilities. With support for DirectX 11 and DirectX 12, and adding in multiple graphics card support, the game featured a benchmark mode that got quite a lot of attention. We saw stories based on that software posted by Anandtech, Guru3D and ExtremeTech, all of which had varying views on the advantages of one GPU or another.
That isn’t the focus of my editorial here today, though.
Shortly after the initial release, a discussion began around results from the Guru3D story that measured frame time consistency and smoothness with FCAT, a capture based testing methodology much like the Frame Rating process we have here at PC Perspective. In that post on ExtremeTech, Joel Hruska claims that the results and conclusion from Guru3D are wrong because the FCAT capture methods make assumptions on the output matching what the user experience feels like. Maybe everyone is wrong?
First a bit of background: I have been working with Oxide and the Ashes of the Singularity benchmark for a couple of weeks, hoping to get a story that I was happy with and felt was complete, before having to head out the door to Barcelona for the Mobile World Congress. That didn’t happen – such is life with an 8-month old. But, in my time with the benchmark, I found a couple of things that were very interesting, even concerning, that I was working through with the developers.
FCAT overlay as part of the Ashes benchmark
First, the initial implementation of the FCAT overlay, which Oxide should be PRAISED for including since we don’t have and likely won’t have a DX12 universal variant of, was implemented incorrectly, with duplication of color swatches that made the results from capture-based testing inaccurate. I don’t know if Guru3D used that version to do its FCAT testing, but I was able to get some updated EXEs of the game through the developer in order to the overlay working correctly. Once that was corrected, I found yet another problem: an issue of frame presentation order on NVIDIA GPUs that likely has to do with asynchronous shaders. Whether that issue is on the NVIDIA driver side or the game engine side is still being investigated by Oxide, but it’s interesting to note that this problem couldn’t have been found without a proper FCAT implementation.
With all of that under the bridge, I set out to benchmark this latest version of Ashes and DX12 to measure performance across a range of AMD and NVIDIA hardware. The data showed some abnormalities, though. Some results just didn’t make sense in the context of what I was seeing in the game and what the overlay results were indicating. It appeared that Vsync (vertical sync) was working differently than I had seen with any other game on the PC.
For the NVIDIA platform, tested using a GTX 980 Ti, the game seemingly randomly starts up with Vsync on or off, with no clear indicator of what was causing it, despite the in-game settings being set how I wanted them. But the Frame Rating capture data was still working as I expected – just because Vsync is enabled doesn’t mean you can look at the results in capture formats. I have written stories on what Vsync enabled captured data looks like and what it means as far back as April 2013. Obviously, to get the best and most relevant data from Frame Rating, setting vertical sync off is ideal. Running into more frustration than answers, I moved over to an AMD platform.
Caught Up to DirectX 12 in a Single Day
I'm not just talking about the specification. Members of the Khronos Group have also released compatible drivers, SDKs and tools to support them, conformance tests, and a proof-of-concept patch for Croteam's The Talos Principle. To reiterate, this is not a soft launch. The API, and its entire ecosystem, is out and ready for the public on Windows (at least 7+ at launch but a surprise Vista or XP announcement is technically possible) and several distributions of Linux. Google will provide an Android SDK in the near future.
I'm going to editorialize for the next two paragraphs. There was a concern that Vulkan would be too late. The thing is, as of today, Vulkan is now just as mature as DirectX 12. Of course, that could change at a moment's notice; we still don't know how the two APIs are being adopted behind the scenes. A few DirectX 12 titles are planned to launch in a few months, but no full, non-experimental, non-early access game currently exists. Each time I say this, someone links the Wikipedia list of DirectX 12 games. If you look at each entry, though, you'll see that all of them are either: early access, awaiting an unreleased DirectX 12 patch, or using a third-party engine (like Unreal Engine 4) that only list DirectX 12 as an experimental preview. No full, released, non-experimental DirectX 12 game exists today. Besides, if the latter counts, then you'll need to accept The Talos Principle's proof-of-concept patch, too.
But again, that could change. While today's launch speaks well to the Khronos Group and the API itself, it still needs to be adopted by third party engines, middleware, and software. These partners could, like the Khronos Group before today, be privately supporting Vulkan with the intent to flood out announcements; we won't know until they do... or don't. With the support of popular engines and frameworks, dependent software really just needs to enable it. This has not happened for DirectX 12 yet, and, now, there doesn't seem to be anything keeping it from happening for Vulkan at any moment. With the Game Developers Conference just a month away, we should soon find out.
But back to the announcement.
Vulkan-compatible drivers are launching today across multiple vendors and platforms, but I do not have a complete list. On Windows, I was told to expect drivers from NVIDIA for Windows 7, 8.x, 10 on Kepler and Maxwell GPUs. The standard is compatible with Fermi GPUs, but NVIDIA does not plan on supporting the API for those users due to its low market share. That said, they are paying attention to user feedback and they are not ruling it out, which probably means that they are keeping an open mind in case some piece of software gets popular and depends upon Vulkan. I have not heard from AMD or Intel about Vulkan drivers as of this writing, one way or the other. They could even arrive day one.
On Linux, NVIDIA, Intel, and Imagination Technologies have submitted conformant drivers.
Drivers alone do not make a hard launch, though. SDKs and tools have also arrived, including the LunarG SDK for Windows and Linux. LunarG is a company co-founded by Lens Owen, who had a previous graphics software company that was purchased by VMware. LunarG is backed by Valve, who also backed Vulkan in several other ways. The LunarG SDK helps developers validate their code, inspect what the API is doing, and otherwise debug. Even better, it is also open source, which means that the community can rapidly enhance it, even though it's in a releasable state as it is. RenderDoc,
the open-source graphics debugger by Crytek, will also add Vulkan support. ((Update (Feb 16 @ 12:39pm EST): Baldur Karlsson has just emailed me to let me know that it was a personal project at Crytek, not a Crytek project in general, and their GitHub page is much more up-to-date than the linked site.))
The major downside is that Vulkan (like Mantle and DX12) isn't simple.
These APIs are verbose and very different from previous ones, which requires more effort.
Image Credit: NVIDIA
There really isn't much to say about the Vulkan launch beyond this. What graphics APIs really try to accomplish is standardizing signals that enter and leave video cards, such that the GPUs know what to do with them. For the last two decades, we've settled on an arbitrary, single, global object that you attach buffers of data to, in specific formats, and call one of a half-dozen functions to send it.
Compute APIs, like CUDA and OpenCL, decided it was more efficient to handle queues, allowing the application to write commands and send them wherever they need to go. Multiple threads can write commands, and multiple accelerators (GPUs in our case) can be targeted individually. Vulkan, like Mantle and DirectX 12, takes this metaphor and adds graphics-specific instructions to it. Moreover, GPUs can schedule memory, compute, and graphics instructions at the same time, as long as the graphics task has leftover compute and memory resources, and / or the compute task has leftover memory resources.
This is not necessarily a “better” way to do graphics programming... it's different. That said, it has the potential to be much more efficient when dealing with lots of simple tasks that are sent from multiple CPU threads, especially to multiple GPUs (which currently require the driver to figure out how to convert draw calls into separate workloads -- leading to simplifications like mirrored memory and splitting workload by neighboring frames). Lots of tasks aligns well with video games, especially ones with lots of simple objects, like strategy games, shooters with lots of debris, or any game with large crowds of people. As it becomes ubiquitous, we'll see this bottleneck disappear and games will not need to be designed around these limitations. It might even be used for drawing with cross-platform 2D APIs, like Qt or even webpages, although those two examples (especially the Web) each have other, higher-priority bottlenecks. There are also other benefits to Vulkan.
The WebGL comparison is probably not as common knowledge as Khronos Group believes.
Still, Khronos Group was criticized when WebGL launched as "it was too tough for Web developers".
It didn't need to be easy. Frameworks arrived and simplified everything. It's now ubiquitous.
In fact, Adobe Animate CC (the successor to Flash Pro) is now a WebGL editor (experimentally).
Open platforms are required for this to become commonplace. Engines will probably target several APIs from their internal management APIs, but you can't target users who don't fit in any bucket. Vulkan brings this capability to basically any platform, as long as it has a compute-capable GPU and a driver developer who cares.
Thankfully, it arrived before any competitor established market share.
Early testing for higher end GPUs
UPDATE 2/5/16: Nixxes released a new version of Rise of the Tomb Raider today with some significant changes. I have added another page at the end of this story that looks at results with the new version of the game, a new AMD driver and I've also included some SLI and CrossFire results.
I will fully admit to being jaded by the industry on many occasions. I love my PC games and I love hardware but it takes a lot for me to get genuinely excited about anything. After hearing game reviewers talk up the newest installment of the Tomb Raider franchise, Rise of the Tomb Raider, since it's release on the Xbox One last year, I've been waiting for its PC release to give it a shot with real hardware. As you'll see in the screenshots and video in this story, the game doesn't appear to disappoint.
Rise of the Tomb Raider takes the exploration and "tomb raiding" aspects that made the first games in the series successful and applies them to the visual quality and character design brought in with the reboot of the series a couple years back. The result is a PC game that looks stunning at any resolution, but even more so in 4K, that pushes your hardware to its limits. For single GPU performance, even the GTX 980 Ti and Fury X struggle to keep their heads above water.
In this short article we'll look at the performance of Rise of the Tomb Raider with a handful of GPUs, leaning towards the high end of the product stack, and offer up my view on whether each hardware vendor is living up to expectations.
Are Computers Still Getting Faster?
It looks like CES is starting to wind down, which makes sense because it ended three days ago. Now that we're mostly caught up, I found a new video from The 8-Bit Guy. He doesn't really explain any old technologies in this one. Instead, he poses an open question about computer speed. He was able to have a functional computing experience on a ten-year-old Apple laptop, which made him wonder if the rate of computer advancement is slowing down.
I believe that he (and his guest hosts) made great points, but also missed a few important ones.
One of his main arguments is that software seems to have slowed down relative to hardware. I don't believe that is true, but I believe it's looking in the right area. PCs these days are more than capable of doing just about anything in terms of 2D user interface that we would want to, and do so with a lot of overhead for inefficient platforms and sub-optimal programming (relative to the 80's and 90's at the very least). The areas that require extra horsepower are usually doing large batches of many related tasks. GPUs are key in this area, and they are keeping up as fast as they can, despite some stagnation with fabrication processes and a difficulty (at least before HBM takes hold) in keeping up with memory bandwidth.
For the last five years to ten years or so, CPUs have been evolving toward efficiency as GPUs are being adopted for the tasks that need to scale up. I'm guessing that AMD, when they designed the Bulldozer architecture, hoped that GPUs would have been adopted much more aggressively, but even as graphics devices, they now have a huge effect on Web, UI, and media applications.
These are also tasks that can scale well between devices by lowering resolution (and so forth). The primary thing that a main CPU thread needs to do is figure out the system's state and keep the graphics card fed before the frame-train leaves the station. In my experience, that doesn't scale well (although you can sometimes reduce the amount of tracked objects for games and so forth). Moreover, it is easier to add GPU performance, compared to single-threaded CPU, because increasing frequency and single-threaded IPC should be more complicated than planning out more, duplicated blocks of shaders. These factors combine to give lower-end hardware a similar experience in the most noticeable areas.
So, up to this point, we discussed:
- Software is often scaling in ways that are GPU (and RAM) limited.
- CPUs are scaling down in power more than up in performance.
- GPU-limited tasks can often be approximated with smaller workloads.
- Software gets heavier, but it doesn't need to be "all the way up" (ex: resolution).
- Some latencies are hard to notice anyway.
Back to the Original Question
This is where “Are computers still getting faster?” can be open to interpretation.
Tasks are diverging from one class of processor into two, and both have separate industries, each with their own, multiple goals. As stated, CPUs are mostly progressing in power efficiency, which extends (an assumed to be) sufficient amount of performance downward to multiple types of devices. GPUs are definitely getting faster, but they can't do everything. At the same time, RAM is plentiful but its contribution to performance can be approximated with paging unused chunks to the hard disk or, more recently on Windows, compressing them in-place. Newer computers with extra RAM won't help as long as any single task only uses a manageable amount of it -- unless it's seen from a viewpoint that cares about multi-tasking.
In short, computers are still progressing, but the paths are now forked and winding.
AMD Polaris Architecture Coming Mid-2016
In early December, I was able to spend some time with members of the newly formed Radeon Technologies Group (RTG), which is a revitalized and compartmentalized section of AMD that is taking over all graphics work. During those meetings, I was able to learn quite a bit about the plans for RTG going forward, including changes for AMD FreeSync and implementation of HDR display technology, and their plans for the GPUOpen open-sourced game development platform. Perhaps most intriguing of all: we received some information about the next-generation GPU architecture, targeted for 2016.
Codenamed Polaris, this new architecture will be the 4th generation of GCN (Graphics Core Next), and it will be the first AMD GPU that is built on FinFET process technology. These two changes combined promise to offer the biggest improvement in performance per watt, generation to generation, in AMD’s history.
Though the amount of information provided about the Polaris architecture is light, RTG does promise some changes to the 4th iteration of its GCN design. Those include primitive discard acceleration, an improved hardware scheduler, better pre-fetch, increased shader efficiency, and stronger memory compression. We have already discussed in a previous story that the new GPUs will include HDMI 2.0a and DisplayPort 1.3 display interfaces, which offer some impressive new features and bandwidth. From a multimedia perspective, Polaris will be the first GPU to include support for h.265 4K decode and encode acceleration.
This slide shows us quite a few changes, most of which were never discussed specifically that we can report, coming to Polaris. Geometry processing and the memory controller stand out as potentially interesting to me – AMD’s Fiji design continues to lag behind NVIDIA’s Maxwell in terms of tessellation performance and we would love to see that shift. I am also very curious to see how the memory controller is configured on the entire Polaris lineup of GPUs – we saw the introduction of HBM (high bandwidth memory) with the Fury line of cards.
May the Radeon be with You
In celebration of the release of The Force Awakens as well as the new Star Wars Battlefront game from DICE and EA, AMD sent over some hardware for us to use in a system build, targeted at getting users up and running in Battlefront with impressive quality and performance, but still on a reasonable budget. Pairing up an AMD processor, MSI motherboard, Sapphire GPU with a low cost chassis, SSD and more, the combined system includes a FreeSync monitor for around $1,200.
Holiday breaks are MADE for Star Wars Battlefront
Though the holiday is already here and you'd be hard pressed to build this system in time for it, I have a feeling that quite a few of our readers and viewers will find themselves with some cash and gift certificates in hand, just ITCHING for a place to invest in a new gaming PC.
The video above includes a list of components, the build process (in brief) and shows us getting our gaming on with Star Wars Battlefront. Interested in building a system similar the one above on your own? Here's the hardware breakdown.
|AMD Powered Star Wars Battlefront System|
|Processor||AMD FX-8370 - $197
Cooler Master Hyper 212 EVO - $29
|Motherboard||MSI 990FXA Gaming - $137|
|Memory||AMD Radeon Memory DDR3-2400 - $79|
|Graphics Card||Sapphire NITRO Radeon R9 380X - $266|
|Storage||SanDisk Ultra II 240GB SSD - $79|
|Case||Corsair Carbide 300R - $68|
|Power Supply||Seasonic 600 watt 80 Plus - $69|
|Monitor||AOC G2460PF 1920x1080 144Hz FreeSync - $259|
|Total Price||Full System (without monitor) - Amazon.com - $924|
For under $1,000, plus another $250 or so for the AOC FreeSync capable 1080p monitor, you can have a complete gaming rig for your winter break. Let's detail some of the specific components.
AMD sent over the FX-8370 processor for our build, a 4-module / 8-core CPU that runs at 4.0 GHz, more than capable of handling any gaming work load you can toss at it. And if you need to do some transcoding, video work or, heaven forbid, school or productivity work, the FX-8370 has you covered there too.
For the motherboard AMD sent over the MSI 990FXA Gaming board, one of the newer AMD platforms that includes support for USB 3.1 so you'll have a good length of usability for future expansion. The Cooler Master Hyper 212 EVO cooler was our selection to keep the FX-8370 running smoothly and 8GB of AMD Radeon DDR3-2133 memory is enough for the system to keep applications and the Windows 10 operating system happy.