All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
GPU Enthusiasts Are Throwing a FET
NVIDIA is rumored to launch Pascal in early (~April-ish) 2016, although some are skeptical that it will even appear before the summer. The design was finalized months ago, and unconfirmed shipping information claims that chips are being stockpiled, which is typical when preparing to launch a product. It is expected to compete against AMD's rumored Arctic Islands architecture, which will, according to its also rumored numbers, be very similar to Pascal.
This architecture is a big one for several reasons.
Image Credit: WCCFTech
First, it will jump two full process nodes. Current desktop GPUs are manufactured at 28nm, which was first introduced with the GeForce GTX 680 all the way back in early 2012, but Pascal will be manufactured on TSMC's 16nm FinFET+ technology. Smaller features have several advantages, but a huge one for GPUs is the ability to fit more complex circuitry in the same die area. This means that you can include more copies of elements, such as shader cores, and do more in fixed-function hardware, like video encode and decode.
That said, we got a lot more life out of 28nm than we really should have. Chips like GM200 and Fiji are huge, relatively power-hungry, and complex, which is a terrible idea to produce when yields are low. I asked Josh Walrath, who is our go-to for analysis of fab processes, and he believes that FinFET+ is probably even more complicated today than 28nm was in the 2012 timeframe, which was when it launched for GPUs.
It's two full steps forward from where we started, but we've been tiptoeing since then.
Image Credit: WCCFTech
Second, Pascal will introduce HBM 2.0 to NVIDIA hardware. HBM 1.0 was introduced with AMD's Radeon Fury X, and it helped in numerous ways -- from smaller card size to a triple-digit percentage increase in memory bandwidth. The 980 Ti can talk to its memory at about 300GB/s, while Pascal is rumored to push that to 1TB/s. Capacity won't be sacrificed, either. The top-end card is expected to contain 16GB of global memory, which is twice what any console has. This means less streaming, higher resolution textures, and probably even left-over scratch space for the GPU to generate content in with compute shaders. Also, according to AMD, HBM is an easier architecture to communicate with than GDDR, which should mean a savings in die space that could be used for other things.
Third, the architecture includes native support for three levels of floating point precision. Maxwell, due to how limited 28nm was, saved on complexity by reducing 64-bit IEEE 754 decimal number performance to 1/32nd of 32-bit numbers, because FP64 values are rarely used in video games. This saved transistors, but was a huge, order-of-magnitude step back from the 1/3rd ratio found on the Kepler-based GK110. While it probably won't be back to the 1/2 ratio that was found in Fermi, Pascal should be much better suited for GPU compute.
Image Credit: WCCFTech
Mixed precision could help video games too, though. Remember how I said it supports three levels? The third one is 16-bit, which is half of the format that is commonly used in video games. Sometimes, that is sufficient. If so, Pascal is said to do these calculations at twice the rate of 32-bit. We'll need to see whether enough games (and other applications) are willing to drop down in precision to justify the die space that these dedicated circuits require, but it should double the performance of anything that does.
So basically, this generation should provide a massive jump in performance that enthusiasts have been waiting for. Increases in GPU memory bandwidth and the amount of features that can be printed into the die are two major bottlenecks for most modern games and GPU-accelerated software. We'll need to wait for benchmarks to see how the theoretical maps to practical, but it's a good sign.
When approached a couple of weeks ago by Microsoft with the opportunity to take an early look at an upcoming performance benchmark built on a DX12 game pending release later this year, I of course was excited for the opportunity. Our adventure into the world of DirectX 12 and performance evaluation started with the 3DMark API Overhead Feature Test back in March and was followed by the release of the Ashes of the Singularity performance test in mid-August. Both of these tests were pinpointing one particular aspect of the DX12 API - the ability to improve CPU throughput and efficiency with higher draw call counts and thus enabling higher frame rates on existing GPUs.
This game and benchmark are beautiful...
Today we dive into the world of Fable Legends, an upcoming free to play based on the world of Albion. This title will be released on the Xbox One and for Windows 10 PCs and it will require the use of DX12. Though scheduled for release in Q4 of this year, Microsoft and Lionhead Studios allowed us early access to a specific performance test using the UE4 engine and the world of Fable Legends. UPDATE: It turns out that the game will have a fall-back DX11 mode that will be enabled if the game detects a GPU incapable of running DX12.
This benchmark focuses more on the GPU side of DirectX 12 - on improved rendering techniques and visual quality rather than on the CPU scaling aspects that made Ashes of the Singularity stand out from other graphics tests we have utilized. Fable Legends is more representative of what we expect to see with the release of AAA games using DX12. Let's dive into the test and our results!
Pack a full GTX 980 on the go!
For many years, the idea of a truly mobile gaming system has been attainable if you were willing to pay the premium for high performance components. But anyone that has done research in this field would tell you that though they were named similarly, the mobile GPUs from both AMD and NVIDIA had a tendency to be noticeably slower than their desktop counterparts. A GeForce GTX 970M, for example, only had a CUDA core count that was slightly higher than the desktop GTX 960, and it was 30% lower than the true desktop GTX 970 product. So even though you were getting fantastic mobile performance, there continued to be a dominant position that desktop users held over mobile gamers in PC gaming.
This fall, NVIDIA is changing that with the introduction of the GeForce GTX 980 for gaming notebooks. Notice I did not put an 'M' at the end of that name; it's not an accident. NVIDIA has found a way, through binning and component design, to cram the entirety of a GM204-based Maxwell GTX 980 GPU inside portable gaming notebooks.
The results are impressive and the implications for PC gamers are dramatic. Systems built with the GTX 980 will include the same 2048 CUDA cores, 4GB of GDDR5 running at 7.0 GHz and will run at the same base and typical GPU Boost clocks as the reference GTX 980 cards you can buy today for $499+. And, while you won't find this GPU in anything called a "thin and light", 17-19" gaming laptops do allow for portability of gaming unlike any SFF PC.
So how did they do it? NVIDIA has found a way to get a desktop GPU with a 165 watt TDP into a form factor that has a physical limit of 150 watts (for the MXM module implementations at least) through binning, component selection and improved cooling. Not only that, but there is enough headroom to allow for some desktop-class overclocking of the GTX 980 as well.
Specs and Hardware
The AMD Radeon Nano graphics card is unlike any product we have ever tested at PC Perspective. As I wrote and described to the best of my ability (without hardware in my hands) late last month, AMD is targeting a totally unique and different classification of hardware with this release. As a result, there is quite a bit of confusion, criticism, and concern about the Nano, and, to be upfront, not all of it is unwarranted.
After spending the past week with an R9 Nano here in the office, I am prepared to say this immediately: for users matching specific criteria, there is no other option that comes close to what AMD is putting on the table today. That specific demographic though is going to be pretty narrow, a fact that won’t necessarily hurt AMD simply due to the obvious production limitations of the Fiji and HBM architectures.
At $650, the R9 Nano comes with a flagship cost but it does so knowing full well that it will not compete in terms of raw performance against the likes of the GTX 980 Ti or AMD’s own Radeon R9 Fury X. However, much like Intel has done with the Ultrabook and ULV platforms, AMD is attempting to carve out a new market that is looking for dense, modest power GPUs in small form factors. Whether or not they have succeeded is what I am looking to determine today. Ride along with me as we journey on the roller coaster of a release that is the AMD Radeon R9 Nano.
To the Max?
Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but they disable it (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.
AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.
So what is it?
Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
Kollock also notes that compute is becoming more important in the graphics pipeline, and it is possible to completely bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will completely bypass it -- maybe even their engine, several years down the road.
But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
And, of course, game developers profile GPUs from time to time...
According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware and so this amount may increase or decrease on the PC. In an informal chat with a developer at Epic Games, so massive grain of salt is required, his late night ballpark “totally speculative” guesstimate is that, on the Xbox One, the GPU could theoretically accept a maximum ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.
This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.
It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient injestion of shader code. As always, painfully always, time will tell.
The Tiniest Fiji
Way back on June 16th, AMD held a live stream event during E3 to announce a host of new products. In that group was the AMD Radeon R9 Fury X, R9 Fury and the R9 Nano. Of the three, the Nano was the most intriguing to most of the online press as it was the one we knew the least about. AMD promised a full Fiji GPU in a package with a 6-in PCB and a 175 watt TDP. Well today, AMD is, uh, re-announcing (??) the AMD Radeon R9 Nano with more details on specifications, performance and availability.
First, let’s get this out of the way: AMD is making this announcement today because they publicly promised the R9 Nano for August. And with the final days of summer creeping up on them, rather than answer questions about another delay, AMD is instead going the route of a paper launch, but one with a known end date. We will apparently get our samples of the hardware in early September with reviews and the on-sale date following shortly thereafter. (Update: AMD claims the R9 Nano will be on store shelves on September 10th and should have "critical mass" of availability.)
Now let’s get to the details that you are really here for. And rather than start with the marketing spin on the specifications that AMD presented to the media, let’s dive into the gory details right now.
|R9 Nano||R9 Fury||R9 Fury X||GTX 980 Ti||TITAN X||GTX 980||R9 290X|
|GPU||Fiji XT||Fiji Pro||Fiji XT||GM200||GM200||GM204||Hawaii XT|
|Rated Clock||1000 MHz||1000 MHz||1050 MHz||1000 MHz||1000 MHz||1126 MHz||1000 MHz|
|Memory Clock||500 MHz||500 MHz||500 MHz||7000 MHz||7000 MHz||7000 MHz||5000 MHz|
|Memory Interface||4096-bit (HBM)||4096-bit (HBM)||4096-bit (HBM)||384-bit||384-bit||256-bit||512-bit|
|Memory Bandwidth||512 GB/s||512 GB/s||512 GB/s||336 GB/s||336 GB/s||224 GB/s||320 GB/s|
|TDP||175 watts||275 watts||275 watts||250 watts||250 watts||165 watts||290 watts|
|Peak Compute||8.19 TFLOPS||7.20 TFLOPS||8.60 TFLOPS||5.63 TFLOPS||6.14 TFLOPS||4.61 TFLOPS||5.63 TFLOPS|
AMD wasn’t fooling around, the Radeon R9 Nano graphics card does indeed include a full implementation of the Fiji GPU and HBM, including 4096 stream processors, 256 texture units and 64 ROPs. The GPU core clock is rated “up to” 1.0 GHz, nearly the same as the Fury X (1050 MHz), and the only difference that I can see in the specifications on paper is that the Nano is rated at 8.19 TFLOPS of theoretical compute performance while the Fury X is rated at 8.60 TFLOPS.
Retail Card Design
AMD is in an interesting spot right now. The general consensus is that both the AMD Radeon R9 Fury X and the R9 Fury graphics cards had successful launches into the enthusiast community. We found that the performance of the Fury X was slightly under that of the GTX 980 Ti from NVIDIA, but also that the noise levels and power draw were so improved on Fiji over Hawaii that many users would dive head first into the new flagship from the red team.
The launch of the non-X AMD Fury card was even more interesting – here was a card with a GPU performing better than the competition in a price point that NVIDIA didn’t have an exact answer. The performance gap between the GTX 980 and GTX 980 Ti resulted in a $550 graphics card that AMD had a victory with. Add in the third Fiji-based product due out in a few short weeks, the R9 Nano, and you have a robust family of products that don’t exactly dominate the market but do put AMD in a positive position unlike any it has seen in recent years.
But there are some problems. First and foremost for AMD, continuing drops in market share. With the most recent reports from multiple source claiming that AMD’s Q2 2015 share has dropped to 18%, an all-time low in the last decade or so, AMD needs some growth and they need it now. Here’s the catch: AMD can’t make enough of the Fiji chip to affect that number at all. The Fury X, Fury and Nano are going to be hard to find for the foreseeable future thanks to production limits on the HBM (high bandwidth memory) integration; that same feature that helps make Fiji the compelling product it is. I have been keeping an eye on the stock of the Fury and Fury X products and found that it often can’t be found anywhere in the US for purchase. Maybe even more damning is the fact that the Radeon R9 Fury, the card that is supposed to be the model customizable by AMD board partners, still only has two options available: the Sapphire, which we reviewed when it launched, and the ASUS Strix R9 Fury that we are reviewing today.
AMD’s product and financial issues aside, the fact is that the Radeon R9 Fury 4GB and the ASUS Strix iteration of it are damned good products. ASUS has done its usual job of improving on the design of the reference PCB and cooler, added in some great features and packaged it up a price that is competitive and well worth the investment for enthusiast gamers. Our review today will only lightly touch on out-of-box performance of the Strix card mostly because it is so similar to that of the initial Fury review we posted in July. Instead I will look at the changes to the positioning of the AMD Fury product (if any) and how the cooler and design of the Strix product helps it stand out. Overclocking, power consumption and noise will all be evaluated as well.
Another Maxwell Iteration
The mainstream end of the graphics card market is about to get a bit more complicated with today’s introduction of the GeForce GTX 950. Based on a slightly cut down GM206 chip, the same used in the GeForce GTX 960 that was released almost 8 months ago, the new GTX 950 will fill a gap in the product stack for NVIDIA, resting right at $160-170 MSRP. Until today that next-down spot from the GTX 960 was filled by the GeForce GTX 750 Ti, the very first iteration of Maxwell (we usually call it Maxwell 1) that came out in February of 2014!
Even though that is a long time to go without refreshing the GTX x50 part of the lineup, NVIDIA was likely hesitant to do so based on the overwhelming success of the GM107 for mainstream gaming. It was low cost, incredibly efficient and didn’t require any external power to run. That led us down the path of upgrading OEM PCs with GTX 750 Ti, an article and video that still gets hundreds of views and dozens of comments a week.
The GTX 950 has some pretty big shoes to fill. I can tell you right now that it uses more power than the GTX 750 Ti, and it requires a 6-pin power connector, but it does so while increasing gaming performance dramatically. The primary competition from AMD is the Radeon R7 370, a Pitcairn GPU that is long in the tooth and missing many of the features that Maxwell provides.
And NVIDIA is taking a secondary angle with the GTX 950 launch –targeting the MOBA players (DOTA 2 in particular) directly and aggressively. With the success of this style of game over the last several years, and the impressive $18M+ purse for the largest DOTA 2 tournament just behind us, there isn’t a better area of PC gaming to be going after today. But are the tweaks and changes to the card and software really going to make a difference for MOBA gamers or is it just marketing fluff?
Let’s dive into everything GeForce GTX 950!
I knew that the move to DirectX 12 was going to be a big shift for the industry. Since the introduction of the AMD Mantle API along with the Hawaii GPU architecture we have been inundated with game developers and hardware vendors talking about the potential benefits of lower level APIs, which give more direct access to GPU hardware and enable more flexible threading for CPUs to game developers and game engines. The results, we were told, would mean that your current hardware would be able to take you further and future games and applications would be able to fundamentally change how they are built to enhance gaming experiences tremendously.
I knew that the reader interest in DX12 was outstripping my expectations when I did a live blog of the official DX12 unveil by Microsoft at GDC. In a format that consisted simply of my text commentary and photos of the slides that were being shown (no video at all), we had more than 25,000 live readers that stayed engaged the whole time. Comments and questions flew into the event – more than me or my staff could possible handle in real time. It turned out that gamers were indeed very much interested in what DirectX 12 might offer them with the release of Windows 10.
Today we are taking a look at the first real world gaming benchmark that utilized DX12. Back in March I was able to do some early testing with an API-specific test that evaluates the overhead implications of DX12, DX11 and even AMD Mantle from Futuremark and 3DMark. This first look at DX12 was interesting and painted an amazing picture about the potential benefits of the new API from Microsoft, but it wasn’t built on a real game engine. In our Ashes of the Singularity benchmark testing today, we finally get an early look at what a real implementation of DX12 looks like.
And as you might expect, not only are the results interesting, but there is a significant amount of created controversy about what those results actually tell us. AMD has one story, NVIDIA another and Stardock and the Nitrous engine developers, yet another. It’s all incredibly intriguing.
It comes after 8, but before 10
As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer friendly CPU and then promise to tell us how it all works later in the month at IDF.
Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. Details on the Core architecture changes are not present, and instead we are given details on how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or anything surrounding the architecture of Skylake or the new Gen9 graphics system but I wanted to share the details we found available. I am sure we’ll learn more this week as IDF progresses so I will update this story where necessary.
What Intel calls Processor Graphics is what we used to call simply integrated graphics for the longest time. The purpose and role of processor graphics has changed drastically over the years and it is now not only responsible for 3D graphics rendering but compute, media and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture utilized in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first product released and announced using Gen9 graphics and is also the first to adopt Intel’s new 3-digit naming scheme.
This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. Containing four traditional x86 CPU cores and 1 “slice” implementation of Gen9 graphics (with three visible sub-slices we’ll describe below), this is not likely to be the highest performing iteration of the latest Intel HD Graphics technology.
Like the Intel processors before it, the Skylake design utilizes a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is considered its own agent while the Gen9 compute architecture is considered one complete agent. The system agent bundles the DRAM memory, the display controller, PCI Express and other I/O interface that communicate with the rest of the PC. Any off-chip memory requests and transactions occur through this bus while on-chip data transfers tend to be handled differently.
It's Basically a Function Call for GPUs
Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, loading graphics card with tasks will take a drastic change in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.
While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.
Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.
The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:
- Both AMD and NVIDIA are only a two-digit percentage of draw call performance apart
- Both AMD and NVIDIA saw an order of magnitude increase in draw calls
Going Beyond the Reference GTX 970
Zotac has been an interesting company to watch for the past few years. It is a company that has made a name for themselves in the small form factor community with some really interesting designs and products. They continue down that path, but they have increasingly focused on high quality graphics cards that address a pretty wide market. They provide unique products from the $40 level up through the latest GTX 980 Ti with hybrid water and air cooling for $770. The company used to focus on reference designs, but some years past they widened their appeal by applying their own design decisions to the latest NVIDIA products.
Catchy looking boxes for people who mostly order online! Still, nice design.
The beginning of this year saw Zotac introduce their latest “Core” brand products that aim to provide high end features to more modestly priced parts. The Core series makes some compromises to hit price points that are more desirable for a larger swath of consumers. The cards often rely on more reference style PCBs with good quality components and advanced cooling solutions. This equation has been used before, but Zotac is treading some new ground by offering very highly clocked cards right out of the box.
Overall Zotac has a very positive reputation in the industry for quality and support.
Plenty of padding in the box to protect your latest investment.
Zotac GTX 970 AMP! Extreme Core Edition
The product we are looking at today is the somewhat long-named AMP! Extreme Core Edition. This is based on the NVIDIA GTX 970 chip which features 56 ROPS, 1.75 MB of L2 cache, and 1664 CUDA Cores. The GTX 970 has of course been scrutinized heavily due to the unique nature of its memory subsystem. While it does physically have a 256 bit bus, the last 512 MB (out of 4GB) is addressed by a significantly slower unit due to shared memory controller capacity. In theory the card reference design supports up to 224 GB/sec of memory bandwidth. There are obviously some very unhappy people out there about this situation, but much of this could have been avoided if NVIDIA had disclosed the exact nature of the GTX 970 configuration.
Bioshock Infinite Results
Our Intel Skylake launch coverage is intense! Make sure you hit up all the stories and videos that are interesting for you!
- The Intel Core i7-6700K Review - Skylake First for Enthusiasts (Video)
- Skylake vs. Sandy Bridge: Discrete GPU Showdown (Video)
- ASUS Z170-A Motherboard Preview
- Intel Skylake / Z170 Rapid Storage Technology Tested - PCIe and SATA RAID
Today marks the release of Intel's newest CPU architecture, code named Skylake. I already posted my full review of the Core i7-6700K processor so, if you are looking for CPU performance and specification details on that part, you should start there. What we are looking at in this story is the answer to a very simple, but also very important question:
Is it time for gamers using Sandy Bridge system to finally bite the bullet and upgrade?
I think you'll find that answer will depend on a few things, including your gaming resolution and aptitude for multi-GPU configuration, but even I was surprised by the differences I saw in testing.
Our testing scenario was quite simple. Compare the gaming performance of an Intel Core i7-6700K processor and Z170 motherboard running both a single GTX 980 and a pair of GTX 980s in SLI against an Intel Core i7-2600K and Z77 motherboard using the same GPUs. I installed both the latest NVIDIA GeForce drivers and the latest Intel system drivers for each platform.
|Skylake System||Sandy Bridge System|
|Processor||Intel Core i7-6700K||Intel Core i7-2600K|
|Motherboard||ASUS Z170-Deluxe||Gigabyte Z68-UD3H B3|
|Memory||16GB DDR4-2133||8GB DDR3-1600|
|Graphics Card||1x GeForce GTX 980
2x GeForce GTX 980 (SLI)
|1x GeForce GTX 980
2x GeForce GTX 980 (SLI)
|OS||Windows 8.1||Windows 8.1|
Our testing methodology follows our Frame Rating system, which uses a capture-based system to measure frame times at the screen (rather than trusting the software's interpretation).
If you aren't familiar with it, you should probably do a little research into our testing methodology as it is quite different than others you may see online. Rather than using FRAPS to measure frame rates or frame times, we are using an secondary PC to capture the output from the tested graphics card directly and then use post processing on the resulting video to determine frame rates, frame times, frame variance and much more.
This amount of data can be pretty confusing if you attempting to read it without proper background, but I strongly believe that the results we present paint a much more thorough picture of performance than other options. So please, read up on the full discussion about our Frame Rating methods before moving forward!!
While there are literally dozens of file created for each “run” of benchmarks, there are several resulting graphs that FCAT produces, as well as several more that we are generating with additional code of our own.
If you need some more background on how we evaluate gaming performance on PCs, just check out my most recent GPU review for a full breakdown.
I only had time to test four different PC titles:
- Bioshock Infinite
- Grand Theft Auto V
- GRID 2
- Metro: Last Light
A few years ago, we took our first look at the inexpensive 27" 1440p monitors which were starting to flood the market via eBay sellers located in Korea. These monitors proved to be immensely popular and largely credited for moving a large number of gamers past 1080p.
However, in the past few months we have seen a new trend from some of these same Korean monitor manufacturers. Just like the Seiki Pro SM40UNP 40" 4K display that we took a look at a few weeks ago, the new trend is large 4K monitors.
Built around a 42-in LG AH-IPS panel, the Wasabi Mango UHD420 is an impressive display. Inclusion of HDMI 2.0 and DisplayPort 1.2 allow you to achieve 4K at a full 60Hz and 4:4:4 color gamut. At a cost of just under $800 on Amazon, this is an incredibly appealing value.
Whether or not the UHD420 is a TV or a monitor is actually quite the tossup. The lack of a tuner
might initially lead you to believe it's not a TV. Inclusion of a DisplayPort connector, and USB 3.0 hub might make you believe it's a monitor, but it's bundled with a remote control (entirely in Korean). In reality, this display could really be used for either use case (unless you use OTA tuning), and really starts to blur the lines between a "dumb" TV and a monitor. You'll also find VESA 400x400mm mounting holes on this display for easy wall mounting.
... But Is the Timing Right?
Windows 10 is about to launch and, with it, DirectX 12. Apart from the massive increase in draw calls, Explicit Multiadapter, both Linked and Unlinked, has been the cause of a few pockets of excitement here and there. I am a bit concerned, though. People seem to find this a new, novel concept that gives game developers the tools that they've never had before. It really isn't. Depending on what you want to do with secondary GPUs, game developers could have used them for years. Years!
Before we talk about the cross-platform examples, we should talk about Mantle. It is the closest analog to DirectX 12 and Vulkan that we have. It served as the base specification for Vulkan that the Khronos Group modified with SPIR-V instead of HLSL and so forth. Some claim that it was also the foundation of DirectX 12, which would not surprise me given what I've seen online and in the SDK. Allow me to show you how the API works.
Mantle is an interface that mixes Graphics, Compute, and DMA (memory access) into queues of commands. This is easily done in parallel, as each thread can create commands on its own, which is great for multi-core processors. Each queue, which are lists leading to the GPU that commands are placed in, can be handled independently, too. An interesting side-effect is that, since each device uses standard data structures, such as IEEE754 decimal numbers, no-one cares where these queues go as long as the work is done quick enough.
Since each queue is independent, an application can choose to manage many of them. None of these lists really need to know what is happening to any other. As such, they can be pointed to multiple, even wildly different graphics devices. Different model GPUs with different capabilities can work together, as long as they support the core of Mantle.
DirectX 12 and Vulkan took this metaphor so their respective developers could use this functionality across vendors. Mantle did not invent the concept, however. What Mantle did is expose this architecture to graphics, which can make use of all the fixed-function hardware that is unique to GPUs. Prior to AMD's usage, this was how GPU compute architectures were designed. Game developers could have spun up an OpenCL workload to process physics, audio, pathfinding, visibility, or even lighting and post-processing effects... on a secondary GPU, even from a completely different vendor.
Vista's multi-GPU bug might get in the way, but it was possible in 7 and, I believe, XP too.
Fiji brings the (non-X) Fury
Last month was a big one for AMD. At E3 the company hosted its own press conference to announce the Radeon R9 300-series of graphics as well as the new family of products based on the Fiji GPU. It started with the Fury X, a flagship $650 graphics card with an integrated water cooler that was well received. It wasn't perfect by any means, but it was a necessary move for AMD to compete with NVIDIA on the high end of the discrete graphics market.
At the event AMD also talked about the Radeon R9 Fury (without the X) as the version of Fiji that would be taken by board partners to add custom coolers and even PCB designs. (They also talked about the R9 Nano and a dual-GPU version of Fiji, but nothing new is available on those products yet.) The Fury, priced $100 lower than the Fury X at $549, is going back to a more classic GPU design. There is no "reference" product though, so cooler and PCB designs are going to vary from card to card. We already have two different cards in our hands that differ dramatically from one another.
The Fury cuts down the Fiji GPU a bit with fewer stream processors and texture units, but keeps most other specs the same. This includes the 4GB of HBM (high bandwidth memory), 64 ROP count and even the TDP / board power. Performance is great and it creates an interesting comparison between itself and the GeForce GTX 980 cards on the market. Let's dive into this review!
SLI and CrossFire
Last week I sat down with a set of three AMD Radeon R9 Fury X cards, our sampled review card as well as two retail cards purchased from Newegg, to see how the reports of the pump whine noise from the cards was shaping up. I'm not going to dive into that debate again here in this story as I think we have covered it pretty well thus far in that story as well as on our various podcasts, but rest assured we are continuing to look into the revisions of the Fury X to see if AMD and Cooler Master were actually able to fix the issue.
What we have to cover today is something very different, and likely much more interesting for a wider range of users. When you have three AMD Fury X cards in your hands, you of course have to do some multi-GPU testing with them. With our set I was able to run both 2-Way and 3-Way CrossFire with the new AMD flagship card and compare them directly to the comparable NVIDIA offering, the GeForce GTX 980 Ti.
There isn't much else I need to do to build up this story, is there? If you are curious how well the new AMD Fury X scales in CrossFire with two and even three GPUs, this is where you'll find your answers.
Introduction and Technical Specifications
In our previous article here, we demonstrated how to mod the EVGA GTX 970 SC ACX 2.0 video card to get higher performance and significantly lower running temps. Now we decided to take two of these custom modded EVGA GTX 970 cards to see how well they perform in an SLI configuration. ASUS was kind enough to supply us with one of their newly introduced ROG Enthusiast SLI Bridges for our experiments.
ASUS ROG Enthusiast SLI Bridge
Courtesy of ASUS
Courtesy of ASUS
For the purposes of running the two EVGA GTX 970 SC ACX 2.0 video cards in SLI, we chose to use the 3-way variant of ASUS' ROG Enthusiast SLI Bridge so that we could run the tests with full 16x bandwidth across both cards (with the cards in PCIe 3.0 x16 slots 1 and 3 in our test board). This customized SLI adapter features a powered red-colored ROG logo embedded in its brushed aluminum upper surface. The adapter supports 2-way and 3-way SLI in a variety of board configurations.
Courtesy of ASUS
ASUS offers their ROG Enthusiast SLI Bridge in 3 sizes for various variations on 2-way, 3-way, and 4-way SLI configurations. All bridges feature the top brushed-aluminum cap with embedded glowing ROG logo.
Courtesy of ASUS
The smallest bridge supports 2-way SLI configurations with either a two or three slot separation. The middle sized bridge supports up to a 3-way SLI configuration with a two slot separation required between each card. The largest bridge support up to a 4-way SLI configuration, also requiring a two slot separation between each card used.
Technical Specifications (taken from the ASUS website)
|Dimensions||2-WAY: 97 x 43 x 21 (L x W x H mm)
3-WAY: 108 x 53 x 21 (L x W x H mm)
4-WAY: 140 x 53 x 21 (L x W x H mm)
|Weight||70 g (2-WAY)
91 g (3-WAY)
|Compatible GPU set-ups||2-WAY: 2-WAY-S & 2-WAY-M
3-WAY: 2-WAY-L & 3-WAY
|Contents||2-WAY: 1 x optional power cable & 2 PCBs included for varying configurations
3-WAY: 1 x optional power cable
4-WAY: 1 x optional power cable
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be more clear in my disclaimer. This is an unconfirmed, relatively easy to fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images, but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K. That suggests that the leaker got the images from somewhere, but didn't notice those details, which implies that the original source was hoaxed by an anonymous source, who only seeded the hoax to a single media outlet, or that it was an actual leak.
Either way, enjoy my analysis but realize that this is a single, unconfirmed source who allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
What type of user will be given Iris Pro? Also, will graphics-free options be available in the sub-Enthusiast class? When buying a processor from Intel, the high-end mainstream processors tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics -- the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step into changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high clock, relatively low IPC design that Netburst was built around, which instantaneously changed the processor landscape from a dominant AMD to an Intel runaway lead.
After 65nm and Core 2 started the cycle, every new architecture alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
Retail cards still suffer from the issue
In our review of AMD's latest flagship graphics card, the Radeon R9 Fury X, I noticed and commented on the unique sound that the card was producing during our testing. A high pitched whine, emanating from the pump of the self-contained water cooler designed by Cooler Master, was obvious from the moment our test system was powered on and remained constant during use. I talked with a couple of other reviewers about the issue before the launch of the card and it seemed that I wasn't alone. Looking around other reviews of the Fury X, most make mention of this squeal specifically.
Noise from graphics cards come in many forms. There is the most obvious and common noise from on-board fans and the air it moves. Less frequently, but distinctly, the sound of inductor coil whine comes up. Fan noise spikes when the GPU gets hot, causing the fans to need to spin faster and move more air across the heatsink, which keeps everything running cool. Coil whine changes pitch based on the frame rate (and the frequency of power delivery on the card) and can be alleviated by using higher quality components on the board itself.
But the sound of our Fury X was unique: it was caused by the pump itself and it was constant. The noise it produced did not change as the load on the GPU varied. It was also 'pitchy' - a whine that seemed to pierce through other sounds in the office. A close analog might be the sound of an older, CRT TV or monitor that is left powered on without input.
In our review process, AMD told us the solution was fixed. In an email sent to the media just prior to the Fury X launch, an AMD rep stated:
In regards to the “pump whine”, AMD received feedback that during open bench testing some cards emit a mild “whining” noise. This is normal for most high speed liquid cooling pumps; Usually the end user cannot hear the noise as the pumps are installed in the chassis, and the radiator fan is louder than the pump. Since the AMD Radeon™ R9 Fury X radiator fan is near silent, this pump noise is more noticeable.
The issue is limited to a very small batch of initial production samples and we have worked with the manufacturer to improve the acoustic profile of the pump. This problem has been resolved and a fix added to production parts and is not an issue.
I would disagree that this is "normal" but even so, taking AMD at its word, I wrote that we heard the noise but also that AMD had claimed to have addressed it. Other reviewers noted the same comment from AMD, saying the result was fixed. But very quickly after launch some users were posting videos on YouTube and on forums with the same (or worse) sounds and noise. We had already started bringing in a pair of additional Fury X retail cards from Newegg in order to do some performance testing, so it seemed like a logical next step for us to test these retail cards in terms of pump noise as well.
First, let's get the bad news out of the way: both of the retail AMD Radeon R9 Fury X cards that arrived in our offices exhibit 'worse' noise, in the form of both whining and buzzing, compared to our review sample. In this write up, I'll attempt to showcase the noise profile of the three Fury X cards in our possession, as well as how they compare to the Radeon R9 295X2 (another water cooled card) and the GeForce GTX 980 Ti reference design - added for comparison.