All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
A few secrets about GTX 970
UPDATE 1/28/15 @ 10:25am ET: NVIDIA has posted in its official GeForce.com forums that they are working on a driver update to help alleviate memory performance issues in the GTX 970 and that they will "help out" those users looking to get a refund or exchange.
Yes, that last 0.5GB of memory on your GeForce GTX 970 does run slower than the first 3.5GB. More interesting than that fact is the reason why it does, and why the result is better than you might have otherwise expected. Last night we got a chance to talk with NVIDIA’s Senior VP of GPU Engineering, Jonah Alben on this specific concern and got a detailed explanation to why gamers are seeing what they are seeing along with new disclosures on the architecture of the GM204 version of Maxwell.
NVIDIA's Jonah Alben, SVP of GPU Engineering
For those looking for a little background, you should read over my story from this weekend that looks at NVIDIA's first response to the claims that the GeForce GTX 970 cards currently selling were only properly utilizing 3.5GB of the 4GB frame buffer. While it definitely helped answer some questions it raised plenty more which is whey we requested a talk with Alben, even on a Sunday.
Let’s start with a new diagram drawn by Alben specifically for this discussion.
GTX 970 Memory System
Believe it or not, every issue discussed in any forum about the GTX 970 memory issue is going to be explained by this diagram. Along the top you will see 13 enabled SMMs, each with 128 CUDA cores for the total of 1664 as expected. (Three grayed out SMMs represent those disabled from a full GM204 / GTX 980.) The most important part here is the memory system though, connected to the SMMs through a crossbar interface. That interface has 8 total ports to connect to collections of L2 cache and memory controllers, all of which are utilized in a GTX 980. With a GTX 970 though, only 7 of those ports are enabled, taking one of the combination L2 cache / ROP units along with it. However, the 32-bit memory controller segment remains.
You should take two things away from that simple description. First, despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. Before people complain about the ROP count difference as a performance bottleneck, keep in mind that the 13 SMMs in the GTX 970 can only output 52 pixels/clock and the seven segments of 8 ROPs each (56 total) can handle 56 pixels/clock. The SMMs are the bottleneck, not the ROPs.
A new GPU, a familiar problem
Editor's Note: Don't forget to join us today for a live streaming event featuring Ryan Shrout and NVIDIA's Tom Petersen to discuss the new GeForce GTX 960. It will be live at 1pm ET / 10am PT and will include ten (10!) GTX 960 prizes for participants! You can find it all at http://www.pcper.com/live
There are no secrets anymore. Calling today's release of the NVIDIA GeForce GTX 960 a surprise would be like calling another Avenger's movie unexpected. If you didn't just assume it was coming chances are the dozens of leaks of slides and performance would get your attention. So here it is, today's the day, NVIDIA finally upgrades the mainstream segment that was being fed by the GTX 760 for more than a year and half. But does the brand new GTX 960 based on Maxwell move the needle?
But as you'll soon see, the GeForce GTX 960 is a bit of an odd duck in terms of new GPU releases. As we have seen several times in the last year or two with a stagnant process technology landscape, the new cards aren't going be wildly better performing than the current cards from either NVIDIA for AMD. In fact, there are some interesting comparisons to make that may surprise fans of both parties.
The good news is that Maxwell and the GM206 GPU will price out starting at $199 including overclocked models at that level. But to understand what makes it different than the GM204 part we first need to dive a bit into the GM206 GPU and how it matches up with NVIDIA's "small" GPU strategy of the past few years.
The GM206 GPU - Generational Complexity
First and foremost, the GTX 960 is based on the exact same Maxwell architecture as the GTX 970 and GTX 980. The power efficiency, the improved memory bus compression and new features all make their way into the smaller version of Maxwell selling for $199 as of today. If you missed the discussion on those new features including MFAA, Dynamic Super Resolution, VXGI you should read that page of our original GTX 980 and GTX 970 story from last September for a bit of context; these are important aspects of Maxwell and the new GM206.
NVIDIA's GM206 is essentially half of the full GM204 GPU that you find on the GTX 980. That includes 1024 CUDA cores, 64 texture units and 32 ROPs for processing, a 128-bit memory bus and 2GB of graphics memory. This results in half of the memory bandwidth at 112 GB/s and half of the peak compute capability at 2.30 TFLOPS.
There are smart people that work at AMD. A quick look at the company's products, including the APU lineup as well as the discrete GPU fields, clearly indicates a lineup of talent in engineering, design, marketing and business. It's not perfect of course, and very few companies can claim to be, but the strengths of AMD are there and easily discernible to those of us on the outside looking in with the correct vision.
Because AMD has smart people working hard to improve the company, they are also aware of its shortcomings. For many years now, the thorn of GPU software has been sticking in AMD's side, tarnishing the name of Radeon and the products it releases. Even though the Catalyst graphics driver has improved substantially year after year, the truth is that NVIDIA's driver team has been keeping ahead of AMD consistently in basically all regards: features, driver installation, driver stability, performance improvements over time.
If knowing is half the battle, acting on that knowledge is at least another 49%. AMD is hoping to address driver concerns now and into the future with the release of the Catalyst Omega driver. This driver sets itself apart from previous releases in several different ways, starting with a host of new features, some incremental performance improvements and a drastically amped up testing and validation process.
AMD considers this a "special edition" driver and is something that they plan to repeat on a yearly basis. That note in itself is an interesting point - is that often enough to really change the experience and perception of the Catalyst driver program going forward? Though AMD does include some specific numbers of tested cases for its validation of the Omega driver (441,000+ automated test runs, 11,000+ manual test runs) we don't have side by side data from NVIDIA to compare it to. If AMD is only doing a roundup of testing like this once a year, but NVIDIA does it more often, then AMD might soon find itself back in the same position it has been.
UPDATE: There has been some confusion based on this story that I want to correct. AMD informed us that it is still planning on releasing other drivers throughout the year that will address performance updates for specific games and bug fixes for applications and titles released between today and the pending update for the next "special edition." AMD is NOT saying that they will only have a driver drop once a year.
But before we worry about what's going to happen in the future, let's look into what AMD has changed and added to the new Catalyst Omega driver released today.
We’ve been tracking NVIDIA’s G-Sync for quite a while now. The comments section on Ryan’s initial article erupted with questions, and many of those were answered in a follow-on interview with NVIDIA’s Tom Petersen. The idea was radical – do away with the traditional fixed refresh rate and only send a new frame to the display when it has just completed rendering by the GPU. There are many benefits here, but the short version is that you get the low-latency benefit of V-SYNC OFF gaming combined with the image quality (lack of tearing) that you would see if V-SYNC was ON. Despite the many benefits, there are some potential disadvantages that come from attempting to drive an LCD panel at varying periods of time, as opposed to the fixed intervals that have been the norm for over a decade.
As the first round of samples came to us for review, the current leader appeared to be the ASUS ROG Swift. A G-Sync 144 Hz display at 1440P was sure to appeal to gamers who wanted faster response than the 4K 60 Hz G-Sync alternative was capable of. Due to what seemed to be large consumer demand, it has taken some time to get these panels into the hands of consumers. As our Storage Editor, I decided it was time to upgrade my home system, placed a pre-order, and waited with anticipation of finally being able to shift from my trusty Dell 3007WFP-HC to a large panel that can handle >2x the FPS.
Fast forward to last week. My pair of ROG Swifts arrived, and some other folks I knew had also received theirs. Before I could set mine up and get some quality gaming time in, my bro FifthDread and his wife both noted a very obvious flicker on their Swifts within the first few minutes of hooking them up. They reported the flicker during game loading screens and mid-game during background content loading occurring in some RTS titles. Prior to hearing from them, the most I had seen were some conflicting and contradictory reports on various forums (not limed to the Swift, though that is the earliest panel and would therefore see the majority of early reports), but now we had something more solid to go on. That night I fired up my own Swift and immediately got to doing what I do best – trying to break things. We have reproduced the issue and intend to demonstrate it in a measurable way, mostly to put some actual data out there to go along with those trying to describe something that is borderline perceptible for mere fractions of a second.
First a bit of misnomer correction / foundation laying:
- The ‘Screen refresh rate’ option you see in Windows Display Properties is actually a carryover from the CRT days. In terms of an LCD, it is the maximum rate at which a frame is output to the display. It is not representative of the frequency at which the LCD panel itself is refreshed by the display logic.
- LCD panel pixels are periodically updated by a scan, typically from top to bottom. Newer / higher quality panels repeat this process at a rate higher than 60 Hz in order to reduce the ‘rolling shutter’ effect seen when panning scenes or windows across the screen.
- In order to engineer faster responding pixels, manufacturers must deal with the side effect of faster pixel decay between refreshes. This is a balanced by increasing the frequency of scanning out to the panel.
- The effect we are going to cover here has nothing to do with motion blur, LightBoost, backlight PWM, LightBoost combined with G-Sync (not currently a thing, even though Blur Busters has theorized on how it could work, their method would not work with how G-Sync is actually implemented today).
With all of that out of the way, let’s tackle what folks out there may be seeing on their own variable refresh rate displays. Based on our testing so far, the flicker only presented at times when a game enters a 'stalled' state. These are periods where you would see a split-second freeze in the action, like during a background level load during game play in some titles. It also appears during some game level load screens, but as those are normally static scenes, they would have gone unnoticed on fixed refresh rate panels. Since we were absolutely able to see that something was happening, we wanted to be able to catch it in the act and measure it, so we rooted around the lab and put together some gear to do so. It’s not a perfect solution by any means, but we only needed to observe differences between the smooth gaming and the ‘stalled state’ where the flicker was readily observable. Once the solder dust settled, we fired up a game that we knew could instantaneously swing from a high FPS (144) to a stalled state (0 FPS) and back again. As it turns out, EVE Online does this exact thing while taking an in-game screen shot, so we used that for our initial testing. Here’s what the brightness of a small segment of the ROG Swift does during this very event:
Measured panel section brightness over time during a 'stall' event. Click to enlarge.
The relatively small ripple to the left and right of center demonstrate the panel output at just under 144 FPS. Panel redraw is in sync with the frames coming from the GPU at this rate. The center section, however, represents what takes place when the input from the GPU suddenly drops to zero. In the above case, the game briefly stalled, then resumed a few frames at 144, then stalled again for a much longer period of time. Completely stopping the panel refresh would result in all TN pixels bleeding towards white, so G-Sync has a built-in failsafe to prevent this by forcing a redraw every ~33 msec. What you are seeing are the pixels intermittently bleeding towards white and periodically being pulled back down to the appropriate brightness by a scan. The low latency panel used in the ROG Swift does this all of the time, but it is less noticeable at 144, as you can see on the left and right edges of the graph. An additional thing that’s happening here is an apparent rise in average brightness during the event. We are still researching the cause of this on our end, but this brightness increase certainly helps to draw attention to the flicker event, making it even more perceptible to those who might have not otherwise noticed it.
Some of you might be wondering why this same effect is not seen when a game drops to 30 FPS (or even lower) during the course of normal game play. While the original G-Sync upgrade kit implementation simply waited until 33 msec had passed until forcing an additional redraw, this introduced judder from 25-30 FPS. Based on our observations and testing, it appears that NVIDIA has corrected this in the retail G-Sync panels with an algorithm that intelligently re-scans at even multiples of the input frame rate in order to keep the redraw rate relatively high, and therefore keeping flicker imperceptible – even at very low continuous frame rates.
A few final points before we go:
- This is not limited to the ROG Swift. All variable refresh panels we have tested (including 4K) see this effect to a more or less degree than reported here. Again, this only occurs when games instantaneously drop to 0 FPS, and not when those games dip into low frame rates in a continuous fashion.
- The effect is less perceptible (both visually and with recorded data) at lower maximum refresh rate settings.
- The effect is not present at fixed refresh rates (G-Sync disabled or with non G-Sync panels).
This post was primarily meant as a status update and to serve as something for G-Sync users to point to when attempting to explain the flicker they are perceiving. We will continue researching, collecting data, and coordinating with NVIDIA on this issue, and will report back once we have more to discuss.
During the research and drafting of this piece, we reached out to and worked with NVIDIA to discuss this issue. Here is their statement:
"All LCD pixel values relax after refreshing. As a result, the brightness value that is set during the LCD’s scanline update slowly relaxes until the next refresh.
This means all LCDs have some slight variation in brightness. In this case, lower frequency refreshes will appear slightly brighter than high frequency refreshes by 1 – 2%.
When games are running normally (i.e., not waiting at a load screen, nor a screen capture) - users will never see this slight variation in brightness value. In the rare cases where frame rates can plummet to very low levels, there is a very slight brightness variation (barely perceptible to the human eye), which disappears when normal operation resumes."
So there you have it. It's basically down to the physics of how an LCD panel works at varying refresh rates. While I agree that it is a rare occurrence, there are some games that present this scenario more frequently (and noticeably) than others. If you've noticed this effect in some games more than others, let us know in the comments section below.
(Editor's Note: We are continuing to work with NVIDIA on this issue and hope to find a way to alleviate the flickering with either a hardware or software change in the future.)
It has been a couple of months since the release of the GeForce GTX 970 and the GM204 GPU that it is based on. After the initial wave of stock on day one, NVIDIA had admittedly struggled to keep these products available. Couple that with rampant concerns over coil whine from some non-reference designs, and you could see why we were a bit hesitant to focus and spend our time on retail GTX 970 reviews.
These issues appear to be settled for the most part. Finding GeForce GTX 970 cards is no longer a problem and users with coil whine are getting RMA replacements from NVIDIA's partners. Because of that, we feel much more comfortable reporting our results with the various retail cards that we have in house, and you'll see quite a few reviews coming from PC Perspective in the coming weeks.
But let's start with the MSI GeForce GTX 970 4GB Gaming card. Based on user reviews, this is one of the most popular retail cards. MSI's Gaming series of cards combines a custom cooler that typically runs quieter and more efficient than reference design, and it comes with a price tag that is within arms reach of the lower cost options as well.
The MSI GeForce GTX 970 4GB Gaming
MSI continues with its Dragon Army branding, and its associated black/red color scheme, which I think is appealing to a wide range of users. I'm sure NVIDIA would like to see a green or neutral color scheme, but hey, there are only so many colors to go around.
MFAA Technology Recap
In mid-September NVIDIA took the wraps off of the GeForce GTX 980 and GTX 970 GPUs, the first products based on the GM204 GPU utilizing the Maxwell architecture. Our review of the chip, those products and the package that NVIDIA had put together was incredibly glowing. Not only was performance impressive but they were able to offer that performance with power efficiency besting anything else on the market.
Of course, along with the new GPU were a set of new product features coming along for the ride. Two of the most impressive were Dynamic Super Resolution (DSR) and Multi-Frame Sampled AA (MFAA) but only one was available at launch: DSR. With it, you could take advantage of the extreme power of the GTX 980/970 with older games, render in a higher resolution than your panel, and have it filtered down to match your screen in post. The results were great. But NVIDIA spent as much time talking about MFAA (not mother-fu**ing AA as it turned out) during the product briefings and I was shocked when I found out the feature wouldn't be ready to test or included along with launch.
That changes today with the release of NVIDIA's 344.75 driver, the first to implement support for the new and potentially important anti-aliasing method.
Before we dive into the results of our testing, both in performance and image quality, let's get a quick recap on what exactly MFAA is and how it works.
Here is what I wrote back in September in our initial review:
While most of the deep, architectural changes in GM204 are based around power and area efficiency, there are still some interesting feature additions NVIDIA has made to these cards that depend on some specific hardware implementations. First up is a new antialiasing method called MFAA, or Multi-Frame Sampled AA. This new method alternates the AA sample pattern, which is now programmable via software, in both temporal and spatial directions.
The goal is to change the AA sample pattern in a way to produce near 4xMSAA quality at the effective cost of 2x MSAA (in terms of performance). NVIDIA showed a couple of demos of this in action during the press meetings but the only gameplay we saw was in a static scene. I do have some questions about how this temporal addition is affected by fast motion on the screen, though NVIDIA asserts that MFAA will very rarely ever fall below the image quality of standard 2x MSAA.
That information is still correct but we do have a little bit more detail on how this works than we did before. For reasons pertaining to patents NVIDIA seems a bit less interested in sharing exact details than I would like to see, but we'll work with what we have.
Mini-ITX Sized Package with a Full Sized GPU
PC components seem to be getting smaller. Micro-ATX used to not be very popular for the mainstream enthusiast, but that has changed as of late. Mini-ITX is now the hot form factor these days with plenty of integrated features on motherboards and interesting case designs to house them in. Enthusiast graphics cards tend to be big, and that is a problem for some of these small cases. Manufacturers are responding to this by squeezing every ounce of cooling performance into smaller cards that more adequately fit in these small chassis.
MSI is currently offering their midrange cards in these mini-ITX liveries. The card we have today is the GTX 760 Mini-ITX Gaming. The GTX 760 is a fairly popular card due to it being fairly quick, but not too expensive. It is still based on the GK104, though fairly heavily cut down from a fully functional die. The GTX 760 features 1152 CUDA Cores divided into 6 SMXs. A fully functional GK104 is 1536 CUDA Cores and 8 SMXs. The stock clock on the GTX 760 is 980 MHz with a boost up to 1033 MHz.
The pricing for the GTX 760 cards is actually fairly high as compared to similarly performing products from AMD. NVIDIA feels that they offer a very solid product at that price and do not need to compete directly with AMD on a performance per dollar basis. Considering that NVIDIA has stayed very steady in terms of marketshare, they probably have a valid point. Overall the GTX 760 performs in the same general area as a R9 270X and R9 280, but again the AMD parts have a significant advantage in terms of price.
The challenges for making a high performing, small form factor card are focused on power delivery and thermal dissipation. Can the smaller PCB still have enough space for all of the VRMs required with such a design? Can the manufacturer develop a cooling solution that will keep the GPU in the designed thermal envelope? MSI has taken a shot at these issues with their GTX 760 Mini-ITX OC edition card.
A Civ for a New Generation
Turn-based strategy games have long been defined by the Civilization series. Civ 5 took up hours and hours of the PC Perspective team's non-working hours (and likely the working ones too) and it looks like the new Civilization: Beyond Earth has the chance to do the same. Early reviews of the game from Gamespot, IGN, and Polygon are quite positive, and that's great news for a PC-only release; they can sometimes get overlooked in the games' media.
For us, the game offers an interesting opportunity to discuss performance. Beyond Earth is definitely going to be more CPU-bound than the other games that we tend to use in our benchmark suite, but the fact that this game is new, shiny, and even has a Mantle implementation (AMD's custom API) makes interesting for at least a look at the current state of performance. Both NVIDIA and AMD sent have released drivers with specific optimization for Beyond Earth as well. This game is likely to be popular and it deserves the attention it gets.
Civilization: Beyond Earth, a turn-based strategy game that can take a very long time to complete, ships with an integrated benchmark mode to help users and the industry test performance under different settings and hardware configurations. To enable it, you simple add "-benchmark results.csv" to the Steam game launch options and then start up the game normally. Rather than taking you to the main menu, you'll be transported into a view of a map that represents a somewhat typical gaming state for a long term session. The game will use the last settings you ran the game at to measure your system's performance, without the modified launch options, so be sure to configure that before you prepare to benchmark.
The output of this is the "result.csv" file, saved to your Steam game install root folder. In there, you'll find a list of numbers, separated by commas, representing the frame times for each frame rendering during the run. You don't get averages, a minimum, or a maximum without doing a little work. Fire up Excel or Google Docs and remember the formula:
1000 / Average (All Frame Times) = Avg FPS
It's a crude measurement that doesn't take into account any errors, spikes, or other interesting statistical data, but at least you'll have something to compare with your friends.
Our testing settings
Just as I have done in recent weeks with Shadow of Mordor and Sniper Elite 3, I ran some graphics cards through the testing process with Civilization: Beyond Earth. These include the GeForce GTX 980 and Radeon R9 290X only, along with SLI and CrossFire configurations. The R9 290X was run in both DX11 and Mantle.
- Core i7-3960X
- ASUS Rampage IV Extreme X79
- 16GB DDR3-1600
- GeForce GTX 980 Reference (344.48)
- ASUS R9 290X DirectCU II (14.9.2 Beta)
Mantle Additions and Improvements
AMD is proud of this release as it introduces a few interesting things alongside the inclusion of the Mantle API.
- Enhanced-quality Anti-Aliasing (EQAA): Improves anti-aliasing quality by doubling the coverage samples (vs. MSAA) at each AA level. This is automatically enabled for AMD users when AA is enabled in the game.
- Multi-threaded command buffering: Utilizing Mantle allows a game developer to queue a much wider flow of information between the graphics card and the CPU. This communication channel is especially good for multi-core CPUs, which have historically gone underutilized in higher-level APIs. You’ll see in your testing that Mantle makes a notable difference in smoothness and performance high-draw-call late game testing.
- Split-frame rendering: Mantle empowers a game developer with total control of multi-GPU systems. That “total control” allows them to design an mGPU renderer that best matches the design of their game. In the case of Civilization: Beyond Earth, Firaxis has selected a split-frame rendering (SFR) subsystem. SFR eliminates the latency penalties typically encountered by AFR configurations.
EQAA is an interesting feature as it improves on the quality of MSAA (somewhat) by doubling the coverage sample count while maintaining the same color sample count as MSAA. So 4xEQAA will have 4 color samples and 8 coverage samples while 4xMSAA would have 4 of each. Interestingly, Firaxis has decided the EQAA will be enabled on Beyond Earth anytime a Radeon card is detected (running in Mantle or DX11) and AA is enabled at all. So even though in the menus you might see 4xMSAA enabled, you are actually running at 4xEQAA. For NVIDIA users, 4xMSAA means 4xMSAA. Performance differences should be negligible though, according to AMD (who would actually be "hurt" by this decision if it brought down FPS).
GeForce GTX 980M Performance Testing
When NVIDIA launched the GeForce GTX 980 and GTX 970 graphics cards last month, part of the discussion at our meetings also centered around the mobile variants of Maxwell. The NDA was a bit later though and Scott wrote up a short story announcing the release of the GTX 980M and the GTX 970M mobility GPUs. Both of these GPUs are based on the same GM204 design as the desktop cards, though as you should have come to expect by now, do so with lower specifications than the similarly-named desktop options. Take a look:
|GTX 980M||GTX 970M||
|Memory||Up to 4GB||Up to 3GB||4GB||4GB||4GB/8GB|
|Memory Rate||2500 MHz||2500 MHz||7.0 (GT/s)||7.0 (GT/s)||2500 MHz|
Just like the desktop models, GTX 980M and GTX 970M are built on the 28nm process technology and are tweaked and built for power efficiency - one of the reasons the mobile release of this product is so interesting.
With a CUDA core count of 1536, the GTX 980M has 33% fewer shader cores than the desktop GTX 980, along with a slightly lower base clock speed. The result is a peak theoretical performance of 3.189 TFLOPs, compared to 4.6 TFLOPs on the GTX 980 desktop. In fact, that is only slightly higher than the GTX 880M based on Kepler, that clocks in with the same CUDA core count (1536) but a TFLOP capability of 2.9. Bear in mind that the GTX 880M is using a different architecture design than the GTX 980M; Maxwell's design advantages go beyond just CUDA core count and clock speed.
The GTX 970M is even smaller, with a CUDA core count of 1280 and peak performance rated at 2.365 TFLOPs. Also notice that the memory bus width has shrunk from 256-bit to 192-bit for this part.
As is typically the case with mobile GPUs, the memory speed of the GTX 980M and GTX 970M is significantly lower than the desktop parts. While the GeForce GTX 980 and 970 that install in your desktop PC will have memory running at 7.0 GHz, the mobile versions will run at 5.0 GHz in order to conserve power.
From a feature set stand point though, the GTX 980M/970M are very much the same as the desktop parts that I looked at in September. You will have support for VXGI, NVIDIA's new custom global illumination technology, Multi-Frame AA and maybe most interestingly, Dynamic Super Resolution (DSR). DSR allows you to render a game at a higher resolution and then use a custom filter to down sample it back to your panel's native resolution. For mobile gamers that are using 1080p screens (as our test sample shipped with) this is a good way to utilize the power of your GPU for less power-hungry games, while getting a surprisingly good image at the same time.
SLI Setup and Testing Configuration
The idea of multi-GPU gaming is pretty simple on the surface. By adding another GPU into your gaming PC, the game and the driver are able to divide the workload of the game engine and send half of the work to one GPU and half to another, then combining that work on to your screen in the form of successive frames. This should make the average frame rate much higher, improve smoothness and just basically make the gaming experience better. However, implementation of multi-GPU technologies like NVIDIA SLI and AMD CrossFire are much more difficult than the simply explanation above. We have traveled many steps in this journey and while things have improved in several key areas, there is still plenty of work to be done in others.
As it turns out, support for GPUs beyond two seems to be one of those areas ready for improvement.
When the new NVIDIA GeForce GTX 980 launched last month my initial review of the product included performance results for GTX 980 cards running in a 2-Way SLI configuration, by far the most common derivative. As it happens though, another set of reference GeForce GTX 980 cards found there way to our office and of course we needed to explore the world of 3-Way and 4-Way SLI support and performance on the new Maxwell GPU.
The dirty secret for 3-Way and 4-Way SLI (and CrossFire for that matter) is that it just doesn't work as well or as smoothly as 2-Way configurations. Much more work is put into standard SLI setups as those are by far the most common and it doesn't help that optimizing for 3-4 GPUs is more complex. Some games will scale well, others will scale poorly; hell some even scale the other direction.
Let's see what the current state of high GPU count SLI is with the GeForce GTX 980 and whether or not you should consider purchasing more than one of these new flagship parts.
Quick Performance Comparison
Earlier this week, we posted a brief story that looked at the performance of Middle-earth: Shadow of Mordor on the latest GPUs from both NVIDIA and AMD. Last week also marked the release of the v1.11 patch for Sniper Elite 3 that introduced an integrated benchmark mode as well as support for AMD Mantle.
I decided that this was worth a quick look with the same line up of graphics cards that we used to test Shadow of Mordor. Let's see how the NVIDIA and AMD battle stacks up here.
For those unfamiliar with the Sniper Elite series, the focuses on the impact of an individual sniper on a particular conflict and Sniper Elite 3 doesn't change up that formula much. If you have ever seen video of a bullet slowly going through a body, allowing you to see the bones/muscle of the particular enemy being killed...you've probably been watching the Sniper Elite games.
Gore and such aside, the game is fun and combines sniper action with stealth and puzzles. It's worth a shot if you are the kind of gamer that likes to use the sniper rifles in other FPS titles.
But let's jump straight to performance. You'll notice that in this story we are not using our Frame Rating capture performance metrics. That is a direct result of wanting to compare Mantle to DX11 rendering paths - since we have no way to create an overlay for Mantle, we have resorted to using FRAPs and the integrated benchmark mode in Sniper Elite 3.
Our standard GPU test bed was used with a Core i7-3960X processor, an X79 motherboard, 16GB of DDR3 memory, and the latest drivers for both parties involved. That means we installed Catalyst 14.9 for AMD and 344.16 for NVIDIA. We'll be comparing the GeForce GTX 980 to the Radeon R9 290X, and the GTX 970 to the R9 290. We will also look at SLI/CrossFire scaling at the high end.
If there is one message that I get from NVIDIA's GeForce GTX 900M-series announcement, it is that laptop gaming is a first-class citizen in their product stack. Before even mentioning the products, the company provided relative performance differences between high-end desktops and laptops. Most of the rest of the slide deck is showing feature-parity with the desktop GTX 900-series, and a discussion about battery life.
First, the parts. Two products have been announced: The GeForce GTX 980M and the GeForce GTX 970M. Both are based on the 28nm Maxwell architecture. In terms of shading performance, the GTX 980M has a theoretical maximum of 3.189 TFLOPs, and the GTX 970M is calculated at 2.365 TFLOPs (at base clock). On the desktop, this is very close to the GeForce GTX 770 and the GeForce GTX 760 Ti, respectively. This metric is most useful when you're compute bandwidth-bound, at high resolution with complex shaders.
The full specifications are:
|GTX 980M||GTX 970M||
|Memory||Up to 4GB||Up to 3GB||4GB||4GB||4GB/8GB|
|Memory Rate||2500 MHz||2500 MHz||7.0 (GT/s)||7.0 (GT/s)||2500 MHz|
As for the features, it should be familiar for those paying attention to both desktop 900-series and the laptop 800M-series product launches. From desktop Maxwell, the 900M-series is getting VXGI, Dynamic Super Resolution, and Multi-Frame Sampled AA (MFAA). From the latest generation of Kepler laptops, the new GPUs are getting an updated BatteryBoost technology. From the rest of the GeForce ecosystem, they will also get GeForce Experience, ShadowPlay, and so forth.
For VXGI, DSR, and MFAA, please see Ryan's discussion for the desktop Maxwell launch. Information about these features is basically identical to what was given in September.
BatteryBoost, on the other hand, is a bit different. NVIDIA claims that the biggest change is just raw performance and efficiency, giving you more headroom to throttle. Perhaps more interesting though, is that GeForce Experience will allow separate one-click optimizations for both plugged-in and battery use cases.
The power efficiency demonstrated with the Maxwell GPU in Ryan's original GeForce GTX 980 and GTX 970 review is even more beneficial for the notebook market where thermal designs are physically constrained. Longer battery life, as well as thinner and lighter gaming notebooks, will see tremendous advantages using a GPU that can run at near peak performance on the maximum power output of an integrated battery. In NVIDIA's presentation, they mention that while notebooks on AC power can use as much as 230 watts of power, batteries tend to peak around 100 watts. Given that a full speed, desktop-class GTX 980 has a TDP of 165 watts, compared to the 250 watts of a Radeon R9 290X, translates into notebook GPU performance that will more closely mirror its desktop brethren.
Of course, you probably will not buy your own laptop GPU; rather, you will be buying devices which integrate these. There are currently five designs across four manufacturers that are revealed (see image above). Three contain the GeForce GTX 980M, one has a GTX 970M, and the other has a pair of GTX 970Ms. Prices and availability are not yet announced.
In what can most definitely be called the best surprise of the fall game release schedule, the open-world action game set in the Lord of the Rings world, Middle-earth: Shadow of Mordor has been receiving impressive reviews from gamers and the media. (GiantBomb.com has a great look at it if you are new to the title.) What also might be a surprise to some is that the PC version of the game can be quite demanding on even the latest PC hardware, pulling in frame rates only in the low-60s at 2560x1440 with its top quality presets.
Late last week I spent a couple of days playing around with Shadow of Mordor as well as the integrated benchmark found inside the Options menu. I wanted to get an idea of the performance characteristics of the game to determine if we might include this in our full-time game testing suite update we are planning later in the fall. To get some sample information I decided to run through a couple of quality presets with the top two cards from NVIDIA and AMD and compare them.
Without a doubt, the visual style of Shadow of Mordor is stunning – with the game settings cranked up high the world, characters and fighting scenes look and feel amazing. To be clear, in the build up to this release we had really not heard anything from the developer or NVIDIA (there is an NVIDIA splash screen at the beginning) about the title which is out of the ordinary. If you are looking for a game that is both fun to play (I am 4+ hours in myself) and can provide a “wow” factor to show off your PC rig then this is definitely worth picking up.
Installation and Overview
While once a very popular way to cool your PC, the art of custom water loops tapered off in the early 2000s as the benefits of better cooling, and overclocking in general, met with diminished returns. In its place grew a host of companies offering closed loop system, individually sealed coolers for processors and even graphics cards that offered some of the benefits of standard water cooling (noise, performance) without the hassle of setting up a water cooling configuration manually.
A bit of a resurgence has occurred in the last year or two though where the art and styling provided by custom water loop cooling is starting to reassert itself into the PC enthusiast mindset. Some companies never left (EVGA being one of them), but it appears that many of the users are returning to it. Consider me part of that crowd.
During a live stream we held with EVGA's Jacob Freeman, the very first prototype of the EVGA Hydro Copper was shown and discussed. Lucky for us, I was able to coerce Jacob into leaving the water block with me for a few days to do some of our testing and see just how much capability we could pull out of the GM204 GPU and a GeForce GTX 980.
Our performance preview today will look at the water block itself, installation, performance and temperature control. Keep in mind that this is a very early prototype, the first one to make its way to US shores. There will definitely be some changes and updates (in both the hardware and the software support for overclocking) before final release in mid to late October. Should you consider this ~$150 Hydro Copper water block for your GTX 980?
The GM204 Architecture
James Clerk Maxwell's equations are the foundation of our society's knowledge about optics and electrical circuits. It is a fitting tribute from NVIDIA to include Maxwell as a code name for a GPU architecture and NVIDIA hopes that features, performance, and efficiency that they have built into the GM204 GPU would be something Maxwell himself would be impressed by. Without giving away the surprise conclusion here in the lead, I can tell you that I have never seen a GPU perform as well as we have seen this week, all while changing the power efficiency discussion in as dramatic a fashion.
To be fair though, this isn't our first experience with the Maxwell architecture. With the release of the GeForce GTX 750 Ti and its GM107 GPU, NVIDIA put the industry on watch and let us all ponder if they could possibly bring such a design to a high end, enthusiast class market. The GTX 750 Ti brought a significantly lower power design to a market that desperately needed it, and we were even able to showcase that with some off-the-shelf PC upgrades, without the need for any kind of external power.
That was GM107 though; today's release is the GM204, indicating that not only are we seeing the larger cousin of the GTX 750 Ti but we also have at least some moderate GPU architecture and feature changes from the first run of Maxwell. The GeForce GTX 980 and GTX 970 are going to be taking on the best of the best products from the GeForce lineup as well as the AMD Radeon family of cards, with aggressive pricing and performance levels to match. And, for those that understand the technology at a fundamental level, you will likely be surprised by how much power it requires to achieve these goals. Toss in support for things like a new AA method, Dynamic Super Resolution, and even improved SLI performance and you can see why doing it all on the same process technology is impressive.
The NVIDIA Maxwell GM204 Architecture
The NVIDIA Maxwell GM204 graphics processor was built from the ground up with an emphasis on power efficiency. As it was stated many times during the technical sessions we attended last week, the architecture team learned quite a bit while developing the Kepler-based Tegra K1 SoC and much of that filtered its way into the larger, much more powerful product you see today. This product is fast and efficient, but it was all done while working on the same TSMC 28nm process technology used on the Kepler GTX 680 and even AMD's Radeon R9 series of products.
The fundamental structure of GM204 is setup like the GM107 product shipped as the GTX 750 Ti. There is an array of GPCs (Graphics Processing Clustsers), each comprised of multiple SMs (Streaming Multiprocessors, also called SMMs for this Maxwell derivative) and external memory controllers. The GM204 chip (the full implementation of which is found on the GTX 980), consists of 4 GPCs, 16 SMMs and four 64-bit memory controllers.
A few days with some magic monitors
Last month friend of the site and technology enthusiast Tom Petersen, who apparently does SOMETHING at NVIDIA, stopped by our offices to talk about G-Sync technology. A variable refresh rate feature added to new monitors with custom NVIDIA hardware, G-Sync is a technology that has been frequently discussed on PC Perspective.
The first monitor to ship with G-Sync is the ASUS ROG Swift PG278Q - a fantastic 2560x1440 27-in monitor with a 144 Hz maximum refresh rate. I wrote a glowing review of the display here recently with the only real negative to it being a high price tag: $799. But when Tom stopped out to talk about the G-Sync retail release, he happened to leave a set of three of these new displays for us to mess with in a G-Sync Surround configuration. Yummy.
So what exactly is the current experience of using a triple G-Sync monitor setup if you were lucky enough to pick up a set? The truth is that the G-Sync portion of the equation works great but that game support for Surround (or Eyefinity for that matter) is still somewhat cumbersome.
In this quick impressions article I'll walk through the setup and configuration of the system and tell you about my time playing seven different PC titles in G-Sync Surround.
Tonga GPU Features
On December 22, 2011, AMD launched the first 28nm GPU based on an architecture called GCN on the code name Tahiti silicon. That was the release of the Radeon HD 7970 and it was the beginning of an incredibly long adventure for PC enthusiasts and gamers. We eventually saw the HD 7970 GHz Edition and the R9 280/280X releases, all based on essentially identical silicon, keeping a spot in the market for nearly 3 years. Today AMD is launching the Tonga GPU and Radeon R9 285, a new piece of silicon that shares many traits of Tahiti but adds support for some additional features.
Replacing the Radeon R9 280 in the current product stack, the R9 285 will step in at $249, essentially the same price. Buyers will be treated to an updated feature set though including options that were only previously available on the R9 290 and R9 290X (and R7 260X). These include TrueAudio, FreeSync, XDMA CrossFire and PowerTune.
Many people have been calling this architecture GCN 1.1 though AMD internally doesn't have a moniker for it. The move from Tahiti, to Hawaii and now to Tonga, reveals a new design philosophy from AMD, one of smaller and more gradual steps forward as opposed to sudden, massive improvements in specifications. Whether this change was self-imposed or a result of the slowing of process technology advancement is really a matter of opinion.
The Waiting Game
NVIDIA G-Sync was announced at a media event held in Montreal way back in October, and promised to revolutionize the way the display and graphics card worked together to present images on the screen. It was designed to remove hitching, stutter, and tearing -- almost completely. Since that fateful day in October of 2013, we have been waiting. Patiently waiting. We were waiting for NVIDIA and its partners to actually release a monitor that utilizes the technology and that can, you know, be purchased.
In December of 2013 we took a look at the ASUS VG248QE monitor, the display for which NVIDIA released a mod kit to allow users that already had this monitor to upgrade to G-Sync compatibility. It worked, and I even came away impressed. I noted in my conclusion that, “there isn't a single doubt that I want a G-Sync monitor on my desk” and, “my short time with the NVIDIA G-Sync prototype display has been truly impressive…”. That was nearly 7 months ago and I don’t think anyone at that time really believed it would be THIS LONG before the real monitors began to show in the hands of gamers around the world.
Since NVIDIA’s October announcement, AMD has been on a marketing path with a technology they call “FreeSync” that claims to be a cheaper, standards-based alternative to NVIDIA G-Sync. They first previewed the idea of FreeSync on a notebook device during CES in January and then showed off a prototype monitor in June during Computex. Even more recently, AMD has posted a public FAQ that gives more details on the FreeSync technology and how it differs from NVIDIA’s creation; it has raised something of a stir with its claims on performance and cost advantages.
That doesn’t change the product that we are reviewing today of course. The ASUS ROG Swift PG278Q 27-in WQHD display with a 144 Hz refresh rate is truly an awesome monitor. What did change is the landscape, from NVIDIA's original announcement until now.
Experience with Silent Design
In the time periods between major GPU releases, companies like ASUS have the ability to really dig down and engineer truly unique products. With the expanded time between major GPU releases, from either NVIDIA or AMD, these products have continued evolving to offer better features and experiences than any graphics card before them. The ASUS Strix GTX 780 is exactly one of those solutions – taking a GTX 780 GPU that was originally released in May of last year and twisting it into a new design that offers better cooling, better power and lower noise levels.
ASUS intended, with the Strix GTX 780, to create a card that is perfect for high end PC gamers, without crossing into the realm of bank-breaking prices. They chose to go with the GeForce GTX 780 GPU from NVIDIA at a significant price drop from the GTX 780 Ti, with only a modest performance drop. They double the reference memory capacity from 3GB to 6GB of GDDR5, to assuage any buyer’s thoughts that 3GB wasn’t enough for multi-screen Surround gaming or 4K gaming. And they change the cooling solution to offer a near silent operation mode when used in “low impact” gaming titles.
The ASUS Strix GTX 780 Graphics Card
The ASUS Strix GTX 780 card is a pretty large beast, both in physical size and in performance. The cooler is a slightly modified version of the very popular DirectCU II thermal design used in many of the custom built ASUS graphics cards. It has a heat dissipation area more than twice that of the reference NVIDIA cooler and uses larger fans that allow them to spin slower (and quieter) at the improved cooling capacity.
Out of the box, the ASUS Strix GTX 780 will run at 889 MHz base clock and 941 MHz Boost clock, a fairly modest increase over the 863/900 MHz rates of the reference card. Obviously with much better cooling and a lot of work being done on the PCB of this custom design, users will have a lot of headroom to overclock on their own, but I continue to implore companies like ASUS and MSI to up the ante out of the box! One area where ASUS does impress is with the memory – the Strix card features a full 6GB of GDDR5 running 6.0 GHz, twice the capacity of the reference GTX 780 (and even GTX 780 Ti) cards. If you had any concerns about Surround or 4K gaming, know that memory capacity will not be a problem. (Though raw compute power may still be.)
When Magma Freezes Over...
Intel confirms that they have approached AMD about access to their Mantle API. The discussion, despite being clearly labeled as "an experiment" by an Intel spokesperson, was initiated by them -- not AMD. According to AMD's Gaming Scientist, Richard Huddy, via PCWorld, AMD's response was, "Give us a month or two" and "we'll go into the 1.0 phase sometime this year" which only has about five months left in it. When the API reaches 1.0, anyone who wants to participate (including hardware vendors) will be granted access.
AMD inside Intel Inside???
I do wonder why Intel would care, though. Intel has the fastest per-thread processors, and their GPUs are not known to be workhorses that are held back by API call bottlenecks, either. Of course, that is not to say that I cannot see any reason, however...