All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
High Bandwidth Memory
UPDATE: I have embedded an excerpt from our PC Perspective Podcast that discusses the HBM technology that you might want to check out in addition to the story below.
The chances are good that if you have been reading PC Perspective or almost any other website that focuses on GPU technologies for the past year, you have read the acronym HBM. You might have even seen its full name: high bandwidth memory. HBM is a new technology that aims to turn the ability for a processor (GPU, CPU, APU, etc.) to access memory upside down, almost literally. AMD has already publicly stated that its next generation flagship Radeon GPU will use HBM as part of its design, but it wasn’t until today that we could talk about what HBM actually offers to a high performance processor like Fiji. At its core HBM drastically changes how the memory interface works, how much power is required for it and what metrics we will use to compare competing memory architectures. AMD and its partners started working on HBM with the industry more than 7 years ago, and with the first retail product nearly ready to ship, it’s time to learn about HBM.
We got some time with AMD’s Joe Macri, Corporate Vice President and Product CTO, to talk about AMD’s move to HBM and how it will shift the direction of AMD products going forward.
The first step in understanding HBM is to understand why it’s needed in the first place. Current GPUs, including the AMD Radeon R9 290X and the NVIDIA GeForce GTX 980, utilize a memory technology known as GDDR5. This architecture has scaled well over the past several GPU generations but we are starting to enter the world of diminishing returns. Balancing memory performance and power consumption is always a tough battle; just ask ARM about it. On the desktop component side we have much larger power envelopes to work inside but the power curve that GDDR5 is on will soon hit a wall, if you plot it far enough into the future. The result will be either drastically higher power consuming graphics cards or stalling performance improvements of the graphics market – something we have not really seen in its history.
While it’s clearly possible that current and maybe even next generation GPU designs could still have depended on GDDR5 as the memory interface, the move to a different solution is needed for the future; AMD is just making the jump earlier than the rest of the industry.
DirectX 12 Has No More Secrets
The DirectX 12 API is finalized and the last of its features are known. Before the BUILD conference, the list consisted of Conservative Rasterization, Rasterizer Ordered Viewed, Typed UAV Load, Volume Tiled Resources, and a new Tiled Resources revision for non-volumetric content. When the GeForce GTX 980 launched, NVIDIA claimed it would be compatible with DirectX 12 features. Enthusiasts were skeptical, because Microsoft did not officially finalize the spec at the time.
Last week, Microsoft announced the last feature of the graphics API: Multiadapter.
We already knew that Multiadapter existed, at least to some extent. It is the part of the specification that allows developers to address multiple graphics adapters to split tasks between them. In DirectX 11 and earlier, secondary GPUs would remain idle unless the graphics driver sprinkled some magic fair dust on it with SLI, CrossFire, or Hybrid CrossFire. The only other way to access this dormant hardware was by spinning up an OpenCL (or similar compute API) context on the side.
Way back in January of this year, while attending CES 2015 in Las Vegas, we wandered into the MSI suite without having any idea what we might see as new and exciting product. Besides the GT80 notebook with a mechanical keyboard on it, the MSI GS30 Shadow was easily the most interesting and exciting technology. Although MSI is not the first company to try this, the Shadow is the most recent attempt to combine the benefits of a thin and light notebook with a discrete, high performance GPU when the former is connected to the latter's docking station.
The idea has always been simple but the implementation has always been complex. Take a thin, light, moderately powered notebook that is usable and high quality in its own right and combine it with the ability to connect a discrete GPU while at home for gaming purposes. In theory, this is the best of both worlds: a notebook PC for mobile productivity and gaming capability courtesy of an external GPU. But as the years have gone on, more companies try and more companies fail; the integration process is just never as perfect a mix as we hope.
Today we see if MSI and the GS30 Shadow can fare any better. Does the combination of a very high performance thin and light notebook and the GamingDock truly create a mobile and gaming system that is worth your investment?
It's more than just a branding issue
As a part of my look at the first wave of AMD FreeSync monitors hitting the market, I wrote an analysis of how the competing technologies of FreeSync and G-Sync differ from one another. It was a complex topic that I tried to state in as succinct a fashion as possible given the time constraints and that the article subject was on FreeSync specifically. I'm going to include a portion of that discussion here, to recap:
First, we need to look inside the VRR window, the zone in which the monitor and AMD claims that variable refresh should be working without tears and without stutter. On the LG 34UM67 for example, that range is 48-75 Hz, so frame rates between 48 FPS and 75 FPS should be smooth. Next we want to look above the window, or at frame rates above the 75 Hz maximum refresh rate of the window. Finally, and maybe most importantly, we need to look below the window, at frame rates under the minimum rated variable refresh target, in this example it would be 48 FPS.
AMD FreeSync offers more flexibility for the gamer than G-Sync around this VRR window. For both above and below the variable refresh area, AMD allows gamers to continue to select a VSync enabled or disabled setting. That setting will be handled as you are used to it today when your game frame rate extends outside the VRR window. So, for our 34UM67 monitor example, if your game is capable of rendering at a frame rate of 85 FPS then you will either see tearing on your screen (if you have VSync disabled) or you will get a static frame rate of 75 FPS, matching the top refresh rate of the panel itself. If your game is rendering at 40 FPS, lower than the minimum VRR window, then you will again see the result of tearing (with VSync off) or the potential for stutter and hitching (with VSync on).
But what happens with this FreeSync monitor and theoretical G-Sync monitor below the window? AMD’s implementation means that you get the option of disabling or enabling VSync. For the 34UM67 as soon as your game frame rate drops under 48 FPS you will either see tearing on your screen or you will begin to see hints of stutter and judder as the typical (and previously mentioned) VSync concerns again crop their head up. At lower frame rates (below the window) these artifacts will actually impact your gaming experience much more dramatically than at higher frame rates (above the window).
G-Sync treats this “below the window” scenario very differently. Rather than reverting to VSync on or off, the module in the G-Sync display is responsible for auto-refreshing the screen if the frame rate dips below the minimum refresh of the panel that would otherwise be affected by flicker. So, in a 30-144 Hz G-Sync monitor, we have measured that when the frame rate actually gets to 29 FPS, the display is actually refreshing at 58 Hz, each frame being “drawn” one extra instance to avoid flicker of the pixels but still maintains a tear free and stutter free animation. If the frame rate dips to 25 FPS, then the screen draws at 50 Hz. If the frame rate drops to something more extreme like 14 FPS, we actually see the module quadruple drawing the frame, taking the refresh rate back to 56 Hz. It’s a clever trick that keeps the VRR goals and prevents a degradation of the gaming experience. But, this method requires a local frame buffer and requires logic on the display controller to work. Hence, the current implementation in a G-Sync module.
As you can see, the topic is complicated. So Allyn and I (and an aging analog oscilloscope) decided to take it upon ourselves to try and understand and teach the implementation differences with the help of some science. The video below is where the heart of this story is focused, though I have some visual aids embedded after it.
Still not clear on what this means for frame rates and refresh rates on current FreeSync and G-Sync monitors? Maybe this will help.
Our first DX12 Performance Results
Late last week, Microsoft approached me to see if I would be interested in working with them and with Futuremark on the release of the new 3DMark API Overhead Feature Test. Of course I jumped at the chance, with DirectX 12 being one of the hottest discussion topics among gamers, PC enthusiasts and developers in recent history. Microsoft set us up with the latest iteration of 3DMark and the latest DX12-ready drivers from AMD, NVIDIA and Intel. From there, off we went.
First we need to discuss exactly what the 3DMark API Overhead Feature Test is (and also what it is not). The feature test will be a part of the next revision of 3DMark, which will likely ship in time with the full Windows 10 release. Futuremark claims that it is the "world's first independent" test that allows you to compare the performance of three different APIs: DX12, DX11 and even Mantle.
It was almost one year ago that Microsoft officially unveiled the plans for DirectX 12: a move to a more efficient API that can better utilize the CPU and platform capabilities of future, and most importantly current, systems. Josh wrote up a solid editorial on what we believe DX12 means for the future of gaming, and in particular for PC gaming, that you should check out if you want more background on the direction DX12 has set.
One of DX12 keys for becoming more efficient is the ability for developers to get closer to the metal, which is a phrase to indicate that game and engine coders can access more power of the system (CPU and GPU) without having to have its hand held by the API itself. The most direct benefit of this, as we saw with AMD's Mantle implementation over the past couple of years, is improved quantity of draw calls that a given hardware system can utilize in a game engine.
With the release of the GeForce GTX 980 back in September of 2014, NVIDIA took the lead in performance with single GPU graphics cards. The GTX 980 and GTX 970 were both impressive options. The GTX 970 offered better performance than the R9 290 as did the GTX 980 compared to the R9 290X; on top of that, both did so while running at lower power consumption and while including new features like DX12 feature level support, HDMI 2.0 and MFAA (multi-frame antialiasing). Because of those factors, the GTX 980 and GTX 970 were fantastic sellers, helping to push NVIDIA’s market share over 75% as of the 4th quarter of 2014.
But in the back of our mind, and in the minds of many NVIDIA fans, we knew that the company had another GPU it was holding on to: the bigger, badder version of Maxwell. The only question was going to be WHEN the company would release it and sell us a new flagship GeForce card. In most instances, this decision is based on the competitive landscape, such as when AMD might be finally updating its Radeon R9 290X Hawaii family of products with the rumored R9 390X. Perhaps NVIDIA is tired of waiting or maybe the strategy is to launch soon before Fiji GPUs make their debut. Either way, NVIDIA officially took the wraps off of the new GeForce GTX TITAN X at the Game Developers Conference two weeks ago.
At the session hosted by Epic Games’ Tim Sweeney, NVIDIA CEO Jen-Hsun Huang arrived when Tim lamented about needing more GPU horsepower for their UE4 content. In his hands he had the first TITAN X GPU and talked about only a couple of specifications: the card would have 12GB of memory and it would be based on a GPU with 8 billion transistors.
Since that day, you have likely seen picture after picture, rumor after rumor, about specifications, pricing and performance. Wait no longer: the GeForce GTX TITAN X is here. With a $999 price tag and a GPU with 3072 CUDA cores, we clearly have a new king of the court.
Finally, a SHIELD Console
NVIDIA is filling out the family of the SHIELD brand today with the announcement of SHIELD, a set-top box powered by the Tegra X1 processor. SHIELD will run Android TV and act as a game playing, multimedia watching, GRID streaming device. Selling for $199 and available in May of this year, there is a lot to discuss.
Odd naming scheme aside, the SHIELD looks to be an impressive little device, sitting on your home theater or desk and bringing a ton of connectivity and performance to your TV. Running Android TV means the SHIELD will have access to the entire library of Google Play media including music, movies and apps. SHIELD supports 4K video playback at 60 Hz thanks to an HDMI 2.0 connection and fully supports H.265/HEVC decode thanks to Tegra X1 processor.
Here is a full breakdown of the device's specifications.
|NVIDIA SHIELD Specifications|
|Processor||NVIDIA® Tegra® X1 processor with 256-core Maxwell™ GPU with 3GB RAM|
|Video Features||4K Ultra-HD Ready with 4K playback and capture up to 60 fps (VP9, H265, H264)|
|Audio||7.1 and 5.1 surround sound pass through over HDMI
High-resolution audio playback up to 24-bit/192kHz over HDMI and USB
High-resolution audio upsample to 24-bit/192hHz over USB
|Wireless||802.11ac 2x2 MIMO 2.4 GHz and 5 GHz Wi-Fi
Two USB 3.0 (Type A)
MicroSD slot (supports 128GB cards)
IR Receiver (compatible with Logitech Harmony)
|Gaming Features||NVIDIA GRID™ streaming service
|SW Updates||SHIELD software upgrades directly from NVIDIA|
|Power||40W power adapter|
|Weight and Size||Weight: 23oz / 654g
Height: 5.1in / 130mm
Width: 8.3in / 210mm
Depth: 1.0in / 25mm
|OS||Android TV™, Google Cast™ Ready|
|In the box||NVIDIA SHIELD
NVIDIA SHIELD controller
HDMI cable (High Speed), USB cable (Micro-USB to USB)
Power adapter (Includes plugs for North America, Europe, UK)
|Requirements||TV with HDMI input, Internet access|
|Options||SHIELD controller, SHIELD remove, SHIELD stand|
Obviously the most important feature is the Tegra X1 SoC, built on an 8-core 64-bit ARM processor and a 256 CUDA Core Maxwell architecture GPU. This gives the SHIELD set-top more performance than basically any other mobile part on the market, and demos showing Doom 3 and Crysis 3 running natively on the hardware drive the point home. With integrated HEVC decode support the console is the first Android TV device to offer the support for 4K video content at 60 FPS.
Even though storage is only coming in at 16GB, the inclusion of an MicroSD card slot enabled expansion to as much as 128GB more for content and local games.
The first choice for networking will be the Gigabit Ethernet port, but the 2x2 dual-band 802.11ac wireless controller means that even those of us that don't have hardwired Internet going to our TV will be able to utilize all the performance and features of SHIELD.
As GDC progresses here in San Francisco, AMD took the wraps off of a new SDK for game developers to use to improve experiences with virtual reality (VR) headsets. Called LiquidVR, the goal is provide a smooth and stutter free VR experience that is universal across all headset hardware and to keep the wearer, be it a gamer or professional user, immersed.
AMD's CTO of Graphics, Raja Koduri spoke with us about the three primary tenets of the LiquidVR initiative. The 'three Cs' as it is being called are Comfort, Compatibility and Compelling Content. Ignoring the fact that we have four C's in that phrase, the premise is straight forward. Comfortable use of VR means there is little to no issues with neusea and that can be fixed with ultra-low latency between motion (of your head) and photons (hitting your eyes). For compatibility, AMD would like to assure that all VR headsets are treated equally and all provide the best experience. Oculus, HTC and others should operate in a simple, plug-and-play style. Finally, the content story is easy to grasp with a focus on solid games and software to utilize VR but AMD also wants to ensure that the rendering is scalable across different hardware and multiple GPUs.
To address these tenets AMD has built four technologies into LiquidVR: late data latching, asynchronous shaders, affinity multi-GPU, and direct-to-display.
The idea behind late data latching is to get the absolute most recent raw data from the VR engine to the users eyes. This means that rather than asking for the head position of a gamer at the beginning of a render job, LiquidVR will allow the game to ask for it at the end of the rendering pipeline, which might seem counter-intuitive. Late latch means the users head movement is tracked until the end of the frame render rather until just the beginning, saving potentially 5-10ms of delay.
Who Should Care? Thankfully, Many People
The Khronos Group has made three announcements today: Vulkan (their competitor to DirectX 12), OpenCL 2.1, and SPIR-V. Because there is actually significant overlap, we will discuss them in a single post rather than splitting them up. Each has a role in the overall goal to access and utilize graphics and compute devices.
Before we get into what everything is and does, let's give you a little tease to keep you reading. First, Khronos designs their technologies to be self-reliant. As such, while there will be some minimum hardware requirements, the OS pretty much just needs to have a driver model. Vulkan will not be limited to Windows 10 and similar operating systems. If a graphics vendor wants to go through the trouble, which is a gigantic if, Vulkan can be shimmed into Windows 8.x, Windows 7, possibly Windows Vista despite its quirks, and maybe even Windows XP. The words “and beyond” came up after Windows XP, but don't hold your breath for Windows ME or anything. Again, the further back in Windows versions you get, the larger the “if” becomes but at least the API will not have any “artificial limitations”.
Outside of Windows, the Khronos Group is the dominant API curator. Expect Vulkan on Linux, Mac, mobile operating systems, embedded operating systems, and probably a few toasters somewhere.
On that topic: there will not be a “Vulkan ES”. Vulkan is Vulkan, and it will run on desktop, mobile, VR, consoles that are open enough, and even cars and robotics. From a hardware side, the API requires a minimum of OpenGL ES 3.1 support. This is fairly high-end for mobile GPUs, but it is the first mobile spec to require compute shaders, which are an essential component of Vulkan. The presenter did not state a minimum hardware requirement for desktop GPUs, but he treated it like a non-issue. Graphics vendors will need to be the ones making the announcements in the end, though.
Quiet, Efficient Gaming
The last few weeks have been dominated by talk about the memory controller of the Maxwell based GTX 970. There are some very strong opinions about that particular issue, and certainly NVIDIA was remiss on actually informing consumers about how it handles the memory functionality of that particular product. While that debate rages, we have somewhat lost track of other products in the Maxwell range. The GTX 960 was released during this particular firestorm and, while it also shared the outstanding power/performance qualities of the Maxwell architecture, it is considered a little overpriced when compared to other cards in its price class in terms of performance.
It is easy to forget that the original Maxwell based product to hit shelves was the GTX 750 series of cards. They were released a year ago to some very interesting reviews. The board is one of the first mainstream cards in recent memory to have a power draw that is under 75 watts, but can still play games with good quality settings at 1080P resolutions. Ryan covered this very well and it turned out to be a perfect gaming card for many pre-built systems that do not have extra power connectors (or a power supply that can support 125+ watt graphics cards). These are relatively inexpensive cards and very easy to install, producing a big jump in performance as compared to the integrated graphics components of modern CPUs and APUs.
The GTX 750 and GTX 750 Ti have proven to be popular cards due to their overall price, performance, and extremely low power consumption. They also tend to produce a relatively low amount of heat, due to solid cooling combined with that low power consumption. The Maxwell architecture has also introduced some new features, but the major changes are to the overall design of the architecture as compared to Kepler. Instead of 192 cores per SMK, there are now 128 cores per SMM. NVIDIA has done a lot of work to improve performance per core as well as lower power in a fairly dramatic way. An interesting side effect is that the CPU hit with Maxwell is a couple of percentage points higher than Kepler. NVIDIA does lean a bit more on the CPU to improve overall GPU power, but most of this performance hit is covered up by some really good realtime compiler work in the driver.
Asus has taken the GTX 750 Ti and applied their STRIX design and branding to it. While there are certainly faster GPUs on the market, there are none that exhibit the power characteristics of the GTX 750 Ti. The combination of this GPU and the STRIX design should result in an extremely efficient, cool, and silent card.
A baker's dozen of GTX 960
Back on the launch day of the GeForce GTX 960, we hosted NVIDIA's Tom Petersen for a live stream. During the event, NVIDIA and its partners provided ten GTX 960 cards for our live viewers to win which we handed out through about an hour and a half. An interesting idea was proposed during the event - what would happen if we tried to overclock all of the product NVIDIA had brought along to see what the distribution of results looked like? After notifying all the winners of their prizes and asking for permission from each, we started the arduous process of testing and overclocking a total of 13 (10 prizes plus our 3 retail units already in the office) different GTX 960 cards.
Hopefully we will be able to provide a solid base of knowledge for buyers of the GTX 960 that we don't normally have the opportunity to offer: what is the range of overclocking you can expect and what is the average or median result. I think you will find the data interesting.
The 13 Contenders
Our collection of thirteen GTX 960 cards includes a handful from ASUS, EVGA and MSI. The ASUS models are all STRIX models, the EVGA cards are of the SSC variety, and the MSI cards include a single Gaming model and three 100ME. (The only difference between the Gaming and 100ME MSI cards is the color of the cooler.)
To be fair to the prize winners, I actually assigned each of them a specific graphics card before opening them up and testing them. I didn't want to be accused of favoritism by giving the best overclockers to the best readers!
Battlefield 4 Results
At the end of my first Frame Rating evaluation of the GTX 970 after the discovery of the memory architecture issue, I proposed the idea that SLI testing would need to be done to come to a more concrete conclusion on the entire debate. It seems that our readers and the community at large agreed with us in this instance, repeatedly asking for those results in the comments of the story. After spending the better part of a full day running and re-running SLI results on a pair of GeForce GTX 970 and GTX 980 cards, we have the answers you're looking for.
Today's story is going to be short on details and long on data, so if you want the full back story on what is going on why we are taking a specific look at the GTX 970 in this capacity, read here:
- Part 1: NVIDIA issues initial statement
- Part 2: Full GTX 970 memory architecture disclosed
- Part 3: Frame Rating: GTX 970 vs GTX 980
- Part 4: Frame Rating: GTX 970 SLI vs GTX 980 SLI (what you are reading now)
Okay, are we good now? Let's dive into the first set of results in Battlefield 4.
Battlefield 4 Results
Just as I did with the first GTX 970 performance testing article, I tested Battlefield 4 at 3840x2160 (4K) and utilized the game's ability to linearly scale resolution to help me increase GPU memory allocation. In the game settings you can change that scaling option by a percentage: I went from 110% to 150% in 10% increments, increasing the load on the GPU with each step.
Memory allocation between the two SLI configurations was similar, but not as perfectly aligned with each other as we saw with our single GPU testing.
In a couple of cases, at 120% and 130% scaling, the GTX 970 cards in SLI are actually each using more memory than the GTX 980 cards. That difference is only ~100MB but that delta was not present at all in the single GPU testing.
It has been an abnormal week for us here at PC Perspective. Our typical review schedule has pretty much flown out the window, and the past seven days have been filled with learning, researching, retesting, and publishing. That might sound like the norm, but in these cases the process was initiated by tips from our readers. Last Saturday (24 Jan), a few things were brewing:
- Ryan was informed by NVIDIA that the memory layout of the GTX 970 was different than expected.
- The huge (now 168 page) overclock.net forum thread about the Samsung 840 EVO slowdown was once again gaining traction.
- Someone got G-Sync working on a laptop integrated display.
We had to do a bit of triage here of course, as we can only research and write so quickly. Ryan worked the GTX 970 piece as it was the hottest item. I began a few days of research and testing on the 840 EVO slow down issue reappearing on some drives, and we kept tabs on that third thing, which at the time seemed really farfetched. With those two first items taken care of, Ryan shifted his efforts to GTX 970 SLI testing while I shifted my focus to finding out of there was any credence to this G-Sync laptop thing.
A few weeks ago, an ASUS Nordic Support rep inadvertently leaked an interim build of the NVIDIA driver. This was a mobile driver build (version 346.87) focused at their G751 line of laptops. One recipient of this driver link posted it to the ROG forum back on the 20th. A fellow by the name Gamenab, owning the same laptop cited in that thread, presumably stumbled across this driver, tried it out, and was more than likely greeted by this popup after the installation completed:
Now I know what you’re thinking, and it’s probably the same thing anyone would think. How on earth is this possible? To cut a long story short, while the link to the 346.87 driver was removed shortly after being posted to that forum, we managed to get our hands on a copy of it, installed it on the ASUS G751 that we had in for review, and wouldn’t you know it we were greeted by the same popup!
Ok, so it’s a popup, could it be a bug? We checked NVIDIA control panel and the options were consistent with that of a G-Sync connected system. We fired up the pendulum demo and watched the screen carefully, passing the machine around the office to be inspected by all. We then fired up some graphics benchmarks that were well suited to show off the technology (Unigine Heaven, Metro: Last Light, etc), and everything looked great – smooth steady pans with no juddering or tearing to be seen. Ken Addison, our Video Editor and jack of all trades, researched the panel type and found that it was likely capable of 100 Hz refresh. We quickly dug created a custom profile, hit apply, and our 75 Hz G-Sync laptop was instantly transformed into a 100 Hz G-Sync laptop!
Ryan's Note: I think it is important here to point out that we didn't just look at demos and benchmarks for this evaluation but actually looked at real-world gameplay situations. Playing through Metro: Last Light showed very smooth pans and rotation, Assassin's Creed played smoothly as well and flying through Unigine Heaven manually was a great experience. Crysis 3, Battlefield 4, etc. This was NOT just a couple of demos that we ran through - the variable refresh portion of this mobile G-Sync enabled panel was working and working very well.
At this point in our tinkering, we had no idea how or why this was working, but there was no doubt that we were getting a similar experience as we have seen with G-Sync panels. As I digested what was going on, I thought surely this can’t be as good as it seems to be… Let’s find out, shall we?
A Summary Thus Far
UPDATE 2/2/15: We have another story up that compares the GTX 980 and GTX 970 in SLI as well.
It has certainly been an interesting week for NVIDIA. It started with the release of the new GeForce GTX 960, a $199 graphics card that brought the latest iteration of Maxwell's architecture to a lower price point, competing with the Radeon R9 280 and R9 285 products. But then the proverbial stuff hit the fan with a memory issue on the GeForce GTX 970, the best selling graphics card of the second half of 2014. NVIDIA responded to the online community on Saturday morning but that was quickly followed up with a more detailed expose on the GTX 970 memory hierarchy, which included a couple of important revisions to the specifications of the GTX 970 as well.
At the heart of all this technical debate is a performance question: does the GTX 970 suffer from lower performance because of of the 3.5GB/0.5GB memory partitioning configuration? Many forum members and PC enthusiasts have been debating this for weeks with many coming away with an emphatic yes.
The newly discovered memory system of the GeForce GTX 970
Yesterday I spent the majority of my day trying to figure out a way to validate or invalidate these types of performance claims. As it turns out, finding specific game scenarios that will consistently hit targeted memory usage levels isn't as easy as it might first sound and simple things like the order of start up can vary that as well (and settings change orders). Using Battlefield 4 and Call of Duty: Advanced Warfare though, I think I have presented a couple of examples that demonstrate the issue at hand.
Performance testing is a complicated story. Lots of users have attempted to measure performance on their own setup, looking for combinations of game settings that sit below the 3.5GB threshold and those that cross above it, into the slower 500MB portion. The issue for many of these tests is that they lack access to both a GTX 970 and a GTX 980 to really compare performance degradation between cards. That's the real comparison to make - the GTX 980 does not separate its 4GB into different memory pools. If it has performance drops in the same way as the GTX 970 then we can wager the memory architecture of the GTX 970 is not to blame. If the two cards perform differently enough, beyond the expected performance delta between two cards running at different clock speeds and with different CUDA core counts, then we have to question the decisions that NVIDIA made.
There has also been concern over the frame rate consistency of the GTX 970. Our readers are already aware of how deceptive an average frame rate alone can be, and why looking at frame times and frame time consistency is so much more important to guaranteeing a good user experience. Our Frame Rating method of GPU testing has been in place since early 2013 and it tests exactly that - looking for consistent frame times that result in a smooth animation and improved gaming experience.
Users at reddit.com have been doing a lot of subjective testing
We will be applying Frame Rating to our testing today of the GTX 970 and its memory issues - does the division of memory pools introduce additional stutter into game play? Let's take a look at a couple of examples.
A few secrets about GTX 970
UPDATE 1/28/15 @ 10:25am ET: NVIDIA has posted in its official GeForce.com forums that they are working on a driver update to help alleviate memory performance issues in the GTX 970 and that they will "help out" those users looking to get a refund or exchange.
Yes, that last 0.5GB of memory on your GeForce GTX 970 does run slower than the first 3.5GB. More interesting than that fact is the reason why it does, and why the result is better than you might have otherwise expected. Last night we got a chance to talk with NVIDIA’s Senior VP of GPU Engineering, Jonah Alben on this specific concern and got a detailed explanation to why gamers are seeing what they are seeing along with new disclosures on the architecture of the GM204 version of Maxwell.
NVIDIA's Jonah Alben, SVP of GPU Engineering
For those looking for a little background, you should read over my story from this weekend that looks at NVIDIA's first response to the claims that the GeForce GTX 970 cards currently selling were only properly utilizing 3.5GB of the 4GB frame buffer. While it definitely helped answer some questions it raised plenty more which is whey we requested a talk with Alben, even on a Sunday.
Let’s start with a new diagram drawn by Alben specifically for this discussion.
GTX 970 Memory System
Believe it or not, every issue discussed in any forum about the GTX 970 memory issue is going to be explained by this diagram. Along the top you will see 13 enabled SMMs, each with 128 CUDA cores for the total of 1664 as expected. (Three grayed out SMMs represent those disabled from a full GM204 / GTX 980.) The most important part here is the memory system though, connected to the SMMs through a crossbar interface. That interface has 8 total ports to connect to collections of L2 cache and memory controllers, all of which are utilized in a GTX 980. With a GTX 970 though, only 7 of those ports are enabled, taking one of the combination L2 cache / ROP units along with it. However, the 32-bit memory controller segment remains.
You should take two things away from that simple description. First, despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. Before people complain about the ROP count difference as a performance bottleneck, keep in mind that the 13 SMMs in the GTX 970 can only output 52 pixels/clock and the seven segments of 8 ROPs each (56 total) can handle 56 pixels/clock. The SMMs are the bottleneck, not the ROPs.
A new GPU, a familiar problem
Editor's Note: Don't forget to join us today for a live streaming event featuring Ryan Shrout and NVIDIA's Tom Petersen to discuss the new GeForce GTX 960. It will be live at 1pm ET / 10am PT and will include ten (10!) GTX 960 prizes for participants! You can find it all at http://www.pcper.com/live
There are no secrets anymore. Calling today's release of the NVIDIA GeForce GTX 960 a surprise would be like calling another Avenger's movie unexpected. If you didn't just assume it was coming chances are the dozens of leaks of slides and performance would get your attention. So here it is, today's the day, NVIDIA finally upgrades the mainstream segment that was being fed by the GTX 760 for more than a year and half. But does the brand new GTX 960 based on Maxwell move the needle?
But as you'll soon see, the GeForce GTX 960 is a bit of an odd duck in terms of new GPU releases. As we have seen several times in the last year or two with a stagnant process technology landscape, the new cards aren't going be wildly better performing than the current cards from either NVIDIA for AMD. In fact, there are some interesting comparisons to make that may surprise fans of both parties.
The good news is that Maxwell and the GM206 GPU will price out starting at $199 including overclocked models at that level. But to understand what makes it different than the GM204 part we first need to dive a bit into the GM206 GPU and how it matches up with NVIDIA's "small" GPU strategy of the past few years.
The GM206 GPU - Generational Complexity
First and foremost, the GTX 960 is based on the exact same Maxwell architecture as the GTX 970 and GTX 980. The power efficiency, the improved memory bus compression and new features all make their way into the smaller version of Maxwell selling for $199 as of today. If you missed the discussion on those new features including MFAA, Dynamic Super Resolution, VXGI you should read that page of our original GTX 980 and GTX 970 story from last September for a bit of context; these are important aspects of Maxwell and the new GM206.
NVIDIA's GM206 is essentially half of the full GM204 GPU that you find on the GTX 980. That includes 1024 CUDA cores, 64 texture units and 32 ROPs for processing, a 128-bit memory bus and 2GB of graphics memory. This results in half of the memory bandwidth at 112 GB/s and half of the peak compute capability at 2.30 TFLOPS.
There are smart people that work at AMD. A quick look at the company's products, including the APU lineup as well as the discrete GPU fields, clearly indicates a lineup of talent in engineering, design, marketing and business. It's not perfect of course, and very few companies can claim to be, but the strengths of AMD are there and easily discernible to those of us on the outside looking in with the correct vision.
Because AMD has smart people working hard to improve the company, they are also aware of its shortcomings. For many years now, the thorn of GPU software has been sticking in AMD's side, tarnishing the name of Radeon and the products it releases. Even though the Catalyst graphics driver has improved substantially year after year, the truth is that NVIDIA's driver team has been keeping ahead of AMD consistently in basically all regards: features, driver installation, driver stability, performance improvements over time.
If knowing is half the battle, acting on that knowledge is at least another 49%. AMD is hoping to address driver concerns now and into the future with the release of the Catalyst Omega driver. This driver sets itself apart from previous releases in several different ways, starting with a host of new features, some incremental performance improvements and a drastically amped up testing and validation process.
AMD considers this a "special edition" driver and is something that they plan to repeat on a yearly basis. That note in itself is an interesting point - is that often enough to really change the experience and perception of the Catalyst driver program going forward? Though AMD does include some specific numbers of tested cases for its validation of the Omega driver (441,000+ automated test runs, 11,000+ manual test runs) we don't have side by side data from NVIDIA to compare it to. If AMD is only doing a roundup of testing like this once a year, but NVIDIA does it more often, then AMD might soon find itself back in the same position it has been.
UPDATE: There has been some confusion based on this story that I want to correct. AMD informed us that it is still planning on releasing other drivers throughout the year that will address performance updates for specific games and bug fixes for applications and titles released between today and the pending update for the next "special edition." AMD is NOT saying that they will only have a driver drop once a year.
But before we worry about what's going to happen in the future, let's look into what AMD has changed and added to the new Catalyst Omega driver released today.
We’ve been tracking NVIDIA’s G-Sync for quite a while now. The comments section on Ryan’s initial article erupted with questions, and many of those were answered in a follow-on interview with NVIDIA’s Tom Petersen. The idea was radical – do away with the traditional fixed refresh rate and only send a new frame to the display when it has just completed rendering by the GPU. There are many benefits here, but the short version is that you get the low-latency benefit of V-SYNC OFF gaming combined with the image quality (lack of tearing) that you would see if V-SYNC was ON. Despite the many benefits, there are some potential disadvantages that come from attempting to drive an LCD panel at varying periods of time, as opposed to the fixed intervals that have been the norm for over a decade.
As the first round of samples came to us for review, the current leader appeared to be the ASUS ROG Swift. A G-Sync 144 Hz display at 1440P was sure to appeal to gamers who wanted faster response than the 4K 60 Hz G-Sync alternative was capable of. Due to what seemed to be large consumer demand, it has taken some time to get these panels into the hands of consumers. As our Storage Editor, I decided it was time to upgrade my home system, placed a pre-order, and waited with anticipation of finally being able to shift from my trusty Dell 3007WFP-HC to a large panel that can handle >2x the FPS.
Fast forward to last week. My pair of ROG Swifts arrived, and some other folks I knew had also received theirs. Before I could set mine up and get some quality gaming time in, my bro FifthDread and his wife both noted a very obvious flicker on their Swifts within the first few minutes of hooking them up. They reported the flicker during game loading screens and mid-game during background content loading occurring in some RTS titles. Prior to hearing from them, the most I had seen were some conflicting and contradictory reports on various forums (not limed to the Swift, though that is the earliest panel and would therefore see the majority of early reports), but now we had something more solid to go on. That night I fired up my own Swift and immediately got to doing what I do best – trying to break things. We have reproduced the issue and intend to demonstrate it in a measurable way, mostly to put some actual data out there to go along with those trying to describe something that is borderline perceptible for mere fractions of a second.
First a bit of misnomer correction / foundation laying:
- The ‘Screen refresh rate’ option you see in Windows Display Properties is actually a carryover from the CRT days. In terms of an LCD, it is the maximum rate at which a frame is output to the display. It is not representative of the frequency at which the LCD panel itself is refreshed by the display logic.
- LCD panel pixels are periodically updated by a scan, typically from top to bottom. Newer / higher quality panels repeat this process at a rate higher than 60 Hz in order to reduce the ‘rolling shutter’ effect seen when panning scenes or windows across the screen.
- In order to engineer faster responding pixels, manufacturers must deal with the side effect of faster pixel decay between refreshes. This is a balanced by increasing the frequency of scanning out to the panel.
- The effect we are going to cover here has nothing to do with motion blur, LightBoost, backlight PWM, LightBoost combined with G-Sync (not currently a thing, even though Blur Busters has theorized on how it could work, their method would not work with how G-Sync is actually implemented today).
With all of that out of the way, let’s tackle what folks out there may be seeing on their own variable refresh rate displays. Based on our testing so far, the flicker only presented at times when a game enters a 'stalled' state. These are periods where you would see a split-second freeze in the action, like during a background level load during game play in some titles. It also appears during some game level load screens, but as those are normally static scenes, they would have gone unnoticed on fixed refresh rate panels. Since we were absolutely able to see that something was happening, we wanted to be able to catch it in the act and measure it, so we rooted around the lab and put together some gear to do so. It’s not a perfect solution by any means, but we only needed to observe differences between the smooth gaming and the ‘stalled state’ where the flicker was readily observable. Once the solder dust settled, we fired up a game that we knew could instantaneously swing from a high FPS (144) to a stalled state (0 FPS) and back again. As it turns out, EVE Online does this exact thing while taking an in-game screen shot, so we used that for our initial testing. Here’s what the brightness of a small segment of the ROG Swift does during this very event:
Measured panel section brightness over time during a 'stall' event. Click to enlarge.
The relatively small ripple to the left and right of center demonstrate the panel output at just under 144 FPS. Panel redraw is in sync with the frames coming from the GPU at this rate. The center section, however, represents what takes place when the input from the GPU suddenly drops to zero. In the above case, the game briefly stalled, then resumed a few frames at 144, then stalled again for a much longer period of time. Completely stopping the panel refresh would result in all TN pixels bleeding towards white, so G-Sync has a built-in failsafe to prevent this by forcing a redraw every ~33 msec. What you are seeing are the pixels intermittently bleeding towards white and periodically being pulled back down to the appropriate brightness by a scan. The low latency panel used in the ROG Swift does this all of the time, but it is less noticeable at 144, as you can see on the left and right edges of the graph. An additional thing that’s happening here is an apparent rise in average brightness during the event. We are still researching the cause of this on our end, but this brightness increase certainly helps to draw attention to the flicker event, making it even more perceptible to those who might have not otherwise noticed it.
Some of you might be wondering why this same effect is not seen when a game drops to 30 FPS (or even lower) during the course of normal game play. While the original G-Sync upgrade kit implementation simply waited until 33 msec had passed until forcing an additional redraw, this introduced judder from 25-30 FPS. Based on our observations and testing, it appears that NVIDIA has corrected this in the retail G-Sync panels with an algorithm that intelligently re-scans at even multiples of the input frame rate in order to keep the redraw rate relatively high, and therefore keeping flicker imperceptible – even at very low continuous frame rates.
A few final points before we go:
- This is not limited to the ROG Swift. All variable refresh panels we have tested (including 4K) see this effect to a more or less degree than reported here. Again, this only occurs when games instantaneously drop to 0 FPS, and not when those games dip into low frame rates in a continuous fashion.
- The effect is less perceptible (both visually and with recorded data) at lower maximum refresh rate settings.
- The effect is not present at fixed refresh rates (G-Sync disabled or with non G-Sync panels).
This post was primarily meant as a status update and to serve as something for G-Sync users to point to when attempting to explain the flicker they are perceiving. We will continue researching, collecting data, and coordinating with NVIDIA on this issue, and will report back once we have more to discuss.
During the research and drafting of this piece, we reached out to and worked with NVIDIA to discuss this issue. Here is their statement:
"All LCD pixel values relax after refreshing. As a result, the brightness value that is set during the LCD’s scanline update slowly relaxes until the next refresh.
This means all LCDs have some slight variation in brightness. In this case, lower frequency refreshes will appear slightly brighter than high frequency refreshes by 1 – 2%.
When games are running normally (i.e., not waiting at a load screen, nor a screen capture) - users will never see this slight variation in brightness value. In the rare cases where frame rates can plummet to very low levels, there is a very slight brightness variation (barely perceptible to the human eye), which disappears when normal operation resumes."
So there you have it. It's basically down to the physics of how an LCD panel works at varying refresh rates. While I agree that it is a rare occurrence, there are some games that present this scenario more frequently (and noticeably) than others. If you've noticed this effect in some games more than others, let us know in the comments section below.
(Editor's Note: We are continuing to work with NVIDIA on this issue and hope to find a way to alleviate the flickering with either a hardware or software change in the future.)
It has been a couple of months since the release of the GeForce GTX 970 and the GM204 GPU that it is based on. After the initial wave of stock on day one, NVIDIA had admittedly struggled to keep these products available. Couple that with rampant concerns over coil whine from some non-reference designs, and you could see why we were a bit hesitant to focus and spend our time on retail GTX 970 reviews.
These issues appear to be settled for the most part. Finding GeForce GTX 970 cards is no longer a problem and users with coil whine are getting RMA replacements from NVIDIA's partners. Because of that, we feel much more comfortable reporting our results with the various retail cards that we have in house, and you'll see quite a few reviews coming from PC Perspective in the coming weeks.
But let's start with the MSI GeForce GTX 970 4GB Gaming card. Based on user reviews, this is one of the most popular retail cards. MSI's Gaming series of cards combines a custom cooler that typically runs quieter and more efficient than reference design, and it comes with a price tag that is within arms reach of the lower cost options as well.
The MSI GeForce GTX 970 4GB Gaming
MSI continues with its Dragon Army branding, and its associated black/red color scheme, which I think is appealing to a wide range of users. I'm sure NVIDIA would like to see a green or neutral color scheme, but hey, there are only so many colors to go around.
MFAA Technology Recap
In mid-September NVIDIA took the wraps off of the GeForce GTX 980 and GTX 970 GPUs, the first products based on the GM204 GPU utilizing the Maxwell architecture. Our review of the chip, those products and the package that NVIDIA had put together was incredibly glowing. Not only was performance impressive but they were able to offer that performance with power efficiency besting anything else on the market.
Of course, along with the new GPU were a set of new product features coming along for the ride. Two of the most impressive were Dynamic Super Resolution (DSR) and Multi-Frame Sampled AA (MFAA) but only one was available at launch: DSR. With it, you could take advantage of the extreme power of the GTX 980/970 with older games, render in a higher resolution than your panel, and have it filtered down to match your screen in post. The results were great. But NVIDIA spent as much time talking about MFAA (not mother-fu**ing AA as it turned out) during the product briefings and I was shocked when I found out the feature wouldn't be ready to test or included along with launch.
That changes today with the release of NVIDIA's 344.75 driver, the first to implement support for the new and potentially important anti-aliasing method.
Before we dive into the results of our testing, both in performance and image quality, let's get a quick recap on what exactly MFAA is and how it works.
Here is what I wrote back in September in our initial review:
While most of the deep, architectural changes in GM204 are based around power and area efficiency, there are still some interesting feature additions NVIDIA has made to these cards that depend on some specific hardware implementations. First up is a new antialiasing method called MFAA, or Multi-Frame Sampled AA. This new method alternates the AA sample pattern, which is now programmable via software, in both temporal and spatial directions.
The goal is to change the AA sample pattern in a way to produce near 4xMSAA quality at the effective cost of 2x MSAA (in terms of performance). NVIDIA showed a couple of demos of this in action during the press meetings but the only gameplay we saw was in a static scene. I do have some questions about how this temporal addition is affected by fast motion on the screen, though NVIDIA asserts that MFAA will very rarely ever fall below the image quality of standard 2x MSAA.
That information is still correct but we do have a little bit more detail on how this works than we did before. For reasons pertaining to patents NVIDIA seems a bit less interested in sharing exact details than I would like to see, but we'll work with what we have.