A few secrets about GTX 970
Yes, that last 0.5GB of memory on your GeForce GTX 970 does run slower than the first 3.5GB. More interesting than that fact is the reason why it does, and why the result is better than you might have otherwise expected. Last night we got a chance to talk with NVIDIA’s Senior VP of GPU Engineering, Jonah Alben, about this specific concern and got a detailed explanation of why gamers are seeing what they are seeing, along with new disclosures on the architecture of the GM204 version of Maxwell.
NVIDIA's Jonah Alben, SVP of GPU Engineering
For those looking for a little background, you should read over my story from this weekend that looks at NVIDIA's first response to the claims that the GeForce GTX 970 cards currently selling were only properly utilizing 3.5GB of the 4GB frame buffer. While it definitely helped answer some questions, it raised plenty more, which is why we requested a talk with Alben, even on a Sunday.
Let’s start with a new diagram drawn by Alben specifically for this discussion.
GTX 970 Memory System
Believe it or not, every issue discussed in any forum about the GTX 970 memory issue is going to be explained by this diagram. Along the top you will see 13 enabled SMMs, each with 128 CUDA cores for the total of 1664 as expected. (Three grayed out SMMs represent those disabled from a full GM204 / GTX 980.) The most important part here is the memory system though, connected to the SMMs through a crossbar interface. That interface has 8 total ports to connect to collections of L2 cache and memory controllers, all of which are utilized in a GTX 980. With a GTX 970 though, only 7 of those ports are enabled, taking one of the combination L2 cache / ROP units along with it. However, the 32-bit memory controller segment remains.
You should take two things away from that simple description. First, despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. Before people complain about the ROP count difference as a performance bottleneck, keep in mind that the 13 SMMs in the GTX 970 can only output 52 pixels/clock and the seven segments of 8 ROPs each (56 total) can handle 56 pixels/clock. The SMMs are the bottleneck, not the ROPs.
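The arithmetic behind those counts can be sketched in a few lines of Python. This is only an illustration: the function and constant names are made up here, the 4 pixels/clock per SMM figure is inferred from the 13 SMM / 52 pixel statement, and the 256 KB of L2 per crossbar port is inferred from the GTX 980's 2048 KB across 8 ports.

```python
# Per-unit figures stated in, or inferred from, the article (names are illustrative).
CORES_PER_SMM = 128      # stated: each SMM has 128 CUDA cores
PIXELS_PER_SMM = 4       # inferred: 13 SMMs output 52 pixels/clock
ROPS_PER_PORT = 8        # stated: seven segments of 8 ROPs each
L2_KB_PER_PORT = 256     # inferred: 2048 KB of L2 across 8 ports on GTX 980


def gm204_config(enabled_smms, enabled_ports):
    """Derive unit counts for a GM204 part from enabled SMMs and crossbar ports."""
    rops = enabled_ports * ROPS_PER_PORT
    smm_pixels = enabled_smms * PIXELS_PER_SMM
    if smm_pixels < rops:
        bottleneck = "SMMs"
    elif smm_pixels > rops:
        bottleneck = "ROPs"
    else:
        bottleneck = "balanced"
    return {
        "cuda_cores": enabled_smms * CORES_PER_SMM,
        "rops": rops,
        "l2_kb": enabled_ports * L2_KB_PER_PORT,
        "smm_pixels_per_clock": smm_pixels,
        "rop_pixels_per_clock": rops,  # one pixel per ROP per clock
        "bottleneck": bottleneck,
    }


gtx_970 = gm204_config(enabled_smms=13, enabled_ports=7)
gtx_980 = gm204_config(enabled_smms=16, enabled_ports=8)
print(gtx_970)
print(gtx_980)
```

With 13 SMMs and 7 ports the sketch reproduces the 1664 cores, 56 ROPs and 1792 KB of L2 quoted above, and it flags the SMMs as the narrower side of the pixel pipeline.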
Subject: Graphics Cards | January 24, 2015 - 11:51 AM | Ryan Shrout
Tagged: nvidia, maxwell, GTX 970, GM204, 3.5gb memory
Looking for information on the GTX 980 and GTX 970 to start out? Our initial launch review of the NVIDIA Maxwell architecture is a good place to start!
UPDATE 1/26/15 @ 1:00pm ET: We have posted a much more detailed analysis and look at the GTX 970 memory system and what is causing the unusual memory divisions. Check it out right here!
UPDATE 1/26/15 @ 12:10am ET: I now have a lot more information on the technical details of the architecture that cause this issue and more information from NVIDIA to explain it. I spoke with SVP of GPU Engineering Jonah Alben on Sunday night to really dive into the questions everyone had. Expect an update here on this page at 10am PT / 1pm ET or so. Bookmark and check back!
UPDATE 1/24/15 @ 11:25pm ET: Apparently there is some concern online that the statement below is not legitimate. I can assure you that the information did come from NVIDIA, though it is not attributable to any specific person - the message was sent through a couple of different PR people and is the result of meetings and multiple NVIDIA employees' input. It is really a message from the company, not any one individual. I have had several 10-20 minute phone calls with NVIDIA about this issue and this statement on Saturday alone, so I know that the information wasn't from a spoofed email, etc. Also, this statement was posted by an employee moderator on the GeForce.com forums about 6 hours ago, further proving that the statement is directly from NVIDIA. I hope this clears up any concerns around the validity of the below information!
Over the past couple of weeks users of GeForce GTX 970 cards have noticed and started researching a problem with memory allocation in memory-heavy gaming. Essentially, gamers noticed that the GTX 970 with its 4GB of graphics memory was only ever accessing 3.5GB of that memory. When it did attempt to access the final 500MB of memory, performance seemed to drop dramatically. What started as simply a forum discussion blew up into news that was being reported at tech and gaming sites across the web.
Image source: Lazygamer.net
NVIDIA has finally responded to the widespread online complaints about GeForce GTX 970 cards only utilizing 3.5GB of their 4GB frame buffer. From the horse's mouth:
The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.
We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.
Here’s an example of some performance data:
|Game and setting|GTX 980|GTX 970|
|Shadow of Mordor|||
|<3.5GB setting = 2688x1512 Very High|72 FPS|60 FPS|
|>3.5GB setting = 3456x1944|55 FPS (-24%)|45 FPS (-25%)|
|Battlefield 4|||
|<3.5GB setting = 3840x2160 2xMSAA|36 FPS|30 FPS|
|>3.5GB setting = 3840x2160 135% res|19 FPS (-47%)|15 FPS (-50%)|
|Call of Duty: Advanced Warfare|||
|<3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off|82 FPS|71 FPS|
|>3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on|48 FPS (-41%)|40 FPS (-44%)|
Shadow of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.
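For anyone who wants to check the percentages in the statement, here is a minimal Python sketch that recomputes them from the FPS figures in the table. The dictionary layout and function names are illustrative, and the Battlefield 4 label for the middle pair of rows is taken from the summary paragraph of the statement.

```python
# FPS figures copied from NVIDIA's table: (below 3.5GB setting, above 3.5GB setting).
results = {
    "Shadow of Mordor": {"GTX 980": (72, 55), "GTX 970": (60, 45)},
    "Battlefield 4": {"GTX 980": (36, 19), "GTX 970": (30, 15)},
    "Call of Duty: Advanced Warfare": {"GTX 980": (82, 48), "GTX 970": (71, 40)},
}


def percent_drop(before, after):
    """Percentage of frame rate lost when moving from `before` to `after`."""
    return (before - after) / before * 100.0


def drop_gaps(data):
    """Return (980 drop, 970 drop, gap in percentage points) for each game."""
    gaps = {}
    for game, cards in data.items():
        drop_980 = percent_drop(*cards["GTX 980"])
        drop_970 = percent_drop(*cards["GTX 970"])
        gaps[game] = (drop_980, drop_970, drop_970 - drop_980)
    return gaps


for game, (drop_980, drop_970, gap) in drop_gaps(results).items():
    print(f"{game}: GTX 980 -{drop_980:.1f}%, GTX 970 -{drop_970:.1f}%, gap {gap:.1f} points")
```

Rounded to whole numbers, the drops match the percentages in the table. The gaps are computed from the unrounded drops, so they can differ slightly from the rounded differences quoted in the statement.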
So it would appear that the severing of a trio of SMMs to make the GTX 970 different than the GTX 980 was the root cause of the issue. I'm not sure if this is something that we have seen before with NVIDIA GPUs that are cut down in the same way, but I have asked for clarification from NVIDIA on that.
The ratios fit: 500MB is 1/8th of the 4GB total memory capacity and 2 SMMs is 1/8th of the total SMM count. (Edit: The ratios in fact do NOT match up...odd.)
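A quick check of why the ratios do not line up, sketched in Python with exact fractions. The variable names are illustrative, the slow segment is treated as 512MB of 4096MB, and the 16 SMM total assumes the full GM204 is the 970's 13 enabled SMMs plus the three that were cut.

```python
from fractions import Fraction

# Slow segment over total memory, in MB (0.5GB of 4GB).
memory_ratio = Fraction(512, 4096)

# Assumption: a full GM204 has 16 SMMs (13 enabled on GTX 970 plus 3 disabled).
total_smms = 16
enabled_smms = 13

two_smm_ratio = Fraction(2, total_smms)                      # the ratio first guessed at
disabled_ratio = Fraction(total_smms - enabled_smms, total_smms)  # what is actually cut

print("slow memory share:", memory_ratio)
print("2 SMMs of 16:", two_smm_ratio)
print("3 SMMs of 16:", disabled_ratio)
print("ratios match:", memory_ratio == disabled_ratio)
```

Two SMMs would indeed be 1/8th of the chip, but three are disabled, so the SMM share and the memory share are different fractions.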
The full GM204 GPU that is the root cause of this memory issue.
Another theory presented itself as well: is this possibly the reason we do not have a GTX 960 Ti yet? If the patterns were followed from previous generations a GTX 960 Ti would be a GM204 GPU with fewer cores enabled and additional SMs disconnected to enable a lower price point. If this memory issue were to be even more substantial, creating larger differentiated "pools" of memory, then it could be an issue for performance or driver development. To be clear, we are just guessing on this one and that could be something that would not occur at all. Again, I've asked NVIDIA for some technical clarification.
Requests for information aside, we may never know for sure if this is a bug with the GM204 ASIC or a predetermined characteristic of the design.
The question remains: does NVIDIA's response appease GTX 970 owners? After all, this memory concern is really just a part of a GPU's story and thus performance testing and analysis already incorporates it essentially. Some users will still likely make a claim of a "bait and switch" but do the benchmarks above, as well as our own results at 4K, make it a less significant issue?
Our own Josh Walrath offers this analysis:
A few days ago when we were presented with evidence of the 970 not fully utilizing all 4 GB of memory, I theorized that it had to do with the reduction of SMM units. It makes sense from an efficiency standpoint to perhaps "hard code" memory addresses for each SMM. The thought behind that would be that 4 GB of memory is a huge amount for a video card, and the potential performance gains of a more flexible system would be pretty minimal.
I believe that the memory controller is working as intended and that this is not a bug. When designing a large GPU, there will invariably be compromises made. From all indications NVIDIA decided to save time, die size, and power by simplifying the memory controller and crossbar setup. These things have a direct impact on time to market and power efficiency. NVIDIA probably figured that a couple of percentage points of lost performance were outweighed by the added complexity, power consumption, and engineering resources that it would have taken to gain those few percentage points back.
Subject: Graphics Cards | January 23, 2015 - 11:09 PM | Sebastian Peak
Tagged: nvidia, gtx 960, graphics drivers, graphics cards, GeForce 347.25, geforce, game ready, dying light
With the release of GTX 960 yesterday NVIDIA also introduced a new version of the GeForce graphics driver, 347.25 - WHQL.
NVIDIA states that the new driver adds "performance optimizations, SLI profiles, expanded Multi-Frame Sampled Anti-Aliasing support, and support for the new GeForce GTX 960".
While support for the newly released GPU goes without saying, the expanded MFAA support will help provide better anti-aliasing performance to many existing games, as “MFAA support is extended to nearly every DX10 and DX11 title”. In the release notes three games are listed that do not benefit from the MFAA support, as “Dead Rising 3, Dragon Age 2, and Max Payne 3 are incompatible with MFAA”.
347.25 also brings additional SLI profiles to add support for five new games, and a DirectX 11 SLI profile for one more:
SLI profiles added
- Black Desert
- Lara Croft and the Temple of Osiris
- Zhu Xian Shi Jie
- The Talos Principle
DirectX 11 SLI profile added
- Final Fantasy XIV: A Realm Reborn
The update is also the Game Ready Driver for Dying Light, a zombie action/survival game set to debut on January 27.
Much more information is available under the release notes on the driver download page, and be sure to check out Ryan’s chat with Tom Petersen from the live stream for a lot more information about this driver and the new GTX 960 graphics card.
Subject: General Tech, Graphics Cards | January 23, 2015 - 07:11 PM | Scott Michaud
Tagged: windows 10, microsoft, dx12, DirectX 12, DirectX
Microsoft has added DirectX 12 with the latest Windows 10 Technical Preview that was released today. Until today, DXDIAG reported DirectX 11 in the Windows 10 Technical Preview. At the moment, there have not been any drivers or software released for it, and the SDK is also nowhere to be found. Really, all this means is that one barrier has been lifted, leaving the burden on hardware and software partners (except to release the SDK, that's still Microsoft's responsibility).
No-one needs to know how old my motherboard is...
Note: I have already experienced some issues with Build 9926. Within a half hour of using it, I suffered an instant power-down. There was not even enough time for a bluescreen. When it came back, my Intel GPU (which worked for a few minutes after the update) refused to be activated, along with the monitor it is attached to. My point? Not for production machines.
The interesting part, to me, is how Microsoft pushed DX12 into this release without, you know, telling anyone. It is not on any changelog that I can see, and it was not mentioned anywhere in the briefing as potentially being in an upcoming preview build. Before the keynote, I had a theory that it would be included but, after the announcement, figured that it might be pushed until GDC or BUILD (but I kept an open mind). The only evidence that it might come this month was an editorial on Forbes that referenced a conversation with Futuremark, who allegedly wanted to release an update to 3DMark (they hoped) when Microsoft released the new build. I could not find anything else, so I didn't report on it -- you would think that there would be a second source for that somewhere. It turns out that he might be right.
The new Windows 10 Technical Preview, containing DirectX 12, is available now from the preview build panel. It looks like Futuremark (and maybe others) will soon release software for it, but no hardware vendor has released a driver... yet.
Subject: General Tech, Graphics Cards | January 22, 2015 - 06:44 PM | Ryan Shrout
Tagged: video, tom petersen, nvidia, maxwell, live, gtx 960, gtx, GM206, geforce
UPDATE 2: If you missed the live stream you missed the prizes! But you can still watch the replay to get all the information and Q&A that went along with it as we discuss the GTX 960 and many more topics from the NVIDIA universe.
UPDATE (1/22): Well, the secret is out. Today's discussion will be about the new GeForce GTX 960, a $199 graphics card that takes power efficiency to a previously unseen level! If you haven't read my review of the card yet, you should do so first, but then be sure you are ready for today's live stream and giveaway - details below! And don't forget: if you have questions, please leave them in the comments!
Get yourself ready, it’s time for another GeForce GTX live stream hosted by PC Perspective’s Ryan Shrout and NVIDIA’s Tom Petersen. Though we can’t dive into the exact details of what topics are going to be covered, intelligent readers that keep an eye on the rumors on our site will likely be able to guess what is happening on January 22nd.
On hand to talk about the products, answer questions about technologies in the GeForce family including GPUs, G-Sync, GameWorks, GeForce Experience and more will be Tom Petersen, well known on the LAN party and events circuit. To spice things up as well Tom has worked with graphics card partners to bring along a sizeable swag pack to give away LIVE during the event, including new GTX graphics cards. LOTS of graphics cards.
NVIDIA GeForce GTX 960 Live Stream and Giveaway
10am PT / 1pm ET - January 22nd
Need a reminder? Join our live mailing list!
Here are some of the prizes we have lined up for those of you that join us for the live stream:
- 3 x MSI GeForce GTX 960 Graphics Cards
- 4 x EVGA GeForce GTX 960 Graphics Cards
- 3 x ASUS GeForce GTX 960 Graphics Cards
Thanks to ASUS, EVGA and MSI for supporting the stream!
The event will take place Thursday, January 22nd at 1pm ET / 10am PT at http://www.pcper.com/live. There you’ll be able to catch the live video stream as well as use our chat room to interact with the audience, asking questions for me and Tom to answer live. To win the prizes you will have to be watching the live stream, with exact details of the methodology for handing out the goods coming at the time of the event.
Tom has a history of being both informative and entertaining and these live streaming events are always full of fun and technical information that you can get literally nowhere else. Previous streams have produced news as well – including statements on support for Adaptive Sync, release dates for displays and first-ever demos of triple display G-Sync functionality. You never know what’s going to happen or what will be said!
If you have questions, please leave them in the comments below and we'll look through them just before the start of the live stream. Of course you'll be able to tweet us questions @pcper and we'll be keeping an eye on the IRC chat as well for more inquiries. What do you want to know and hear from Tom or me?
So join us! Set your calendar for this coming Thursday at 1pm ET / 10am PT and be here at PC Perspective to catch it. If you are a forgetful type of person, sign up for the PC Perspective Live mailing list that we use exclusively to notify users of upcoming live streaming events including these types of specials and our regular live podcast. I promise, no spam will be had!
Subject: Graphics Cards | January 22, 2015 - 01:44 PM | Jeremy Hellstrom
Tagged: video, nvidia, msi gaming 2g, maxwell, gtx 960, GM206, geforce
Did Ryan somehow miss a benchmark that is important to you? Perhaps [H]ard|OCP's coverage of the MSI GeForce GTX 960 GAMING 2G will capture that certain something. MSI runs their 960 at a base of 1216 MHz with the boost clock hitting 1279 MHz, slightly slower than the ASUS STRIX at 1291 MHz and 1317 MHz. At the time this was posted the cards were available on Amazon for $210; that is obviously going to change, so keep an eye out. As [H] states in their conclusions, it is a good value but not the great value which the GTX 970 offered at release. Check out their full review here or one of the many down below.
"NVIDIA is today launching a GPU aimed at the "sweet spot" of the video card market. With an unexpectedly low MSRP, we find out if the new GeForce GTX 960 has what it takes to compete with the competition. The MSI GTX 960 GAMING reviewed here today is a retail card you will be able to purchase. No reference card in this review."
Here are some more Graphics Card articles from around the web:
- Nvidia's GeForce GTX 960 @ The Tech Report
- Zotac GTX 960 AMP!-edition @ Bjorn3d
- NVIDIA GeForce GTX 960: A Great $200 GPU For Linux Gamers @ Phoronix
- Palit GTX 960 Super JetStream 2 GB @ techPowerUp
- Gigabyte GTX 960 G1 Gaming 2GB @ Modders-Inc
- NVIDIA, MSI, EVGA GTX 960 Review @ OCC
- NVIDIA GeForce GTX 960 SLI @ techPowerUp
- EVGA GTX 960 Super Superclocked Video Card Review @ Hardware Asylum
- ASUS STRIX GTX 960 Review @ Neoseeker
- MSI GTX 960 Gaming OC 2 GB @ techPowerUp
- GTX 960 @ HardwareHeaven
- Gigabyte GTX960 G1 Gaming SOC @ Kitguru
- EVGA GTX 960 SSC 2 GB @ techPowerUp
- ASUS GTX 960 STRIX OC 2 GB @ techPowerUp
- Asus GTX960 Strix OC Edition @ Kitguru
- ASUS Strix Edition GeForce GTX 960 Graphics Card Review @ Techgage
- Palit GeForce GTX 960 JetStream @ Legion Hardware
- The NVIDIA GTX 960 Performance Review @ Hardware Canucks
- EVGA GeForce GTX 970 SSC ACX 2.0 @ HardwareOverlock
- NVIDIA GeForce GTX 970/980: Windows vs. Ubuntu Linux Performance @ Phoronix
- 22-Way AMD+NVIDIA Graphics Card Tests With Metro Redux On Steam For Linux @ Phoronix
A new GPU, a familiar problem
Editor's Note: Don't forget to join us today for a live streaming event featuring Ryan Shrout and NVIDIA's Tom Petersen to discuss the new GeForce GTX 960. It will be live at 1pm ET / 10am PT and will include ten (10!) GTX 960 prizes for participants! You can find it all at http://www.pcper.com/live
There are no secrets anymore. Calling today's release of the NVIDIA GeForce GTX 960 a surprise would be like calling another Avengers movie unexpected. If you didn't just assume it was coming, chances are the dozens of leaks of slides and performance would have gotten your attention. So here it is, today's the day: NVIDIA finally upgrades the mainstream segment that was being fed by the GTX 760 for more than a year and a half. But does the brand new GTX 960 based on Maxwell move the needle?
But as you'll soon see, the GeForce GTX 960 is a bit of an odd duck in terms of new GPU releases. As we have seen several times in the last year or two with a stagnant process technology landscape, the new cards aren't going to be wildly better performing than the current cards from either NVIDIA or AMD. In fact, there are some interesting comparisons to make that may surprise fans of both parties.
The good news is that Maxwell and the GM206 GPU will price out starting at $199 including overclocked models at that level. But to understand what makes it different than the GM204 part we first need to dive a bit into the GM206 GPU and how it matches up with NVIDIA's "small" GPU strategy of the past few years.
The GM206 GPU - Generational Complexity
First and foremost, the GTX 960 is based on the exact same Maxwell architecture as the GTX 970 and GTX 980. The power efficiency, the improved memory bus compression and new features all make their way into the smaller version of Maxwell selling for $199 as of today. If you missed the discussion on those new features, including MFAA, Dynamic Super Resolution and VXGI, you should read that page of our original GTX 980 and GTX 970 story from last September for a bit of context; these are important aspects of Maxwell and the new GM206.
NVIDIA's GM206 is essentially half of the full GM204 GPU that you find on the GTX 980. That includes 1024 CUDA cores, 64 texture units and 32 ROPs for processing, a 128-bit memory bus and 2GB of graphics memory. This results in half of the memory bandwidth at 112 GB/s and half of the peak compute capability at 2.30 TFLOPS.
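As a rough sanity check on those numbers, the Python sketch below doubles the GM206 figures to see what the "half of GM204" claim implies, and backs out the clock speed implied by the 2.30 TFLOPS rating. The names are illustrative, the doubled values are only what the claim implies (not a quoted GM204 spec sheet), and counting 2 floating point operations per CUDA core per clock (one fused multiply-add) is the usual convention, assumed here.

```python
# GM206 figures as quoted in the article.
gm206 = {
    "cuda_cores": 1024,
    "texture_units": 64,
    "rops": 32,
    "bus_width_bits": 128,
    "bandwidth_gb_s": 112,
}
PEAK_TFLOPS = 2.30

# Assumption: peak FLOPS counts one fused multiply-add (2 operations) per core per clock.
FLOPS_PER_CORE_PER_CLOCK = 2


def doubled(spec):
    """What a part with exactly twice these resources would look like."""
    return {name: value * 2 for name, value in spec.items()}


def implied_clock_mhz(cores, tflops, flops_per_clock=FLOPS_PER_CORE_PER_CLOCK):
    """Clock speed at which `cores` would deliver `tflops` of peak throughput."""
    return tflops * 1e12 / (cores * flops_per_clock) / 1e6


implied_gm204 = doubled(gm206)
clock_mhz = implied_clock_mhz(gm206["cuda_cores"], PEAK_TFLOPS)

print(implied_gm204)
print(f"Implied clock for {PEAK_TFLOPS} TFLOPS: {clock_mhz:.0f} MHz")
```

The doubled ROP count lands on the 64 ROPs of the GTX 980, and the implied clock sits a little above 1.1 GHz.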
Subject: General Tech, Graphics Cards | January 16, 2015 - 10:37 PM | Scott Michaud
Tagged: Khronos, opengl, OpenGL ES, webgl, OpenGL Next
The Khronos Group probably wants some advice from graphics developers because they ultimately want to market to them, as the future platform's success depends on their applications. If you develop games or other software (web browsers?) then you can give your feedback. If not, then it's probably best to leave responses to its target demographic.
As for the questions themselves, first and foremost they ask if you are (or were) an active software developer. From there, they ask you to score your opinion on OpenGL, OpenGL ES, and WebGL. They then ask whether you value “Open” or “GL” in the title. They then ask you whether you feel like OpenGL, OpenGL ES, and WebGL are related APIs. They ask how you learn about the Khronos APIs. Finally, they directly ask you for name suggestions and any final commentary.
Now it is time to (metaphorically) read tea leaves. The survey seems written primarily to establish whether developers consider OpenGL, OpenGL ES, and WebGL as related libraries, and to gauge their overall interest in each. If you look at the way OpenGL ES has been developing, it has slowly brought mobile graphics into a subset of desktop GPU features. It is basically an on-ramp to full OpenGL.
We expect that, like Mantle and DirectX 12, the next OpenGL initiative will be designed around efficiently loading massively parallel processors, with a little bit of fixed-function hardware for common tasks, like rasterizing triangles into fragments. The name survey might be implying that the Next Generation OpenGL Initiative is intended to be a unified platform, for high-end, mobile, and even web. Again, modern graphics APIs are based on loading massively parallel processors as directly as possible.
If you are a graphics developer, the Khronos Group is asking for your feedback via their survey.
Subject: Graphics Cards | January 14, 2015 - 10:49 AM | Sebastian Peak
Tagged: rumors, NVIDA, leak, gtx 960, gpu, geforce
The GPU news and rumor site VideoCardz.com had yet another post about the GTX 960 yesterday, and this time the site claims they have most of the details about this unreleased GPU with new leaked photos from a forum on the Chinese site PCEVA.
The card is reportedly based on Maxwell GM206, a 1024 CUDA core part recently announced with the introduction of the GTX 965M. Clock speed was not listed but alleged screenshots indicate the sample had a 1228 MHz core and 1291 MHz Boost clock. The site is calling this an overclock, but it's still likely that the core would have a faster clock speed than the GTX 970 and 980.
The card will reportedly feature 2GB of 128-bit GDDR5 memory, though 4GB variants will likely be available after launch from the various vendors (an important option considering the possibility of the new card natively supporting triple DisplayPort monitors). Performance will clearly be a step down from the initial GTX 900-series offerings as NVIDIA has led with their more performant parts, but the 960 should still be a solid choice for 1080p gaming if these screenshots are real.
The specs as listed on the page at VideoCardz.com are as follows (they do not list clock speed):
- 28nm GM206-300 GPU
- 1024 CUDA cores
- 64(?) TMUs
- 32 ROPs
- 1753 MHz memory
- 128-bit memory bus
- 2GB memory size
- 112 GB/s memory bandwidth
- DirectX 11.3/12
- 120W TDP
- 1x 6-pin power connector
- 1x DVI-I, 1x HDMI 2.0, 3x DP
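The listed memory clock, bus width and bandwidth are consistent with each other, which a short Python sketch can confirm. It assumes the 1753 MHz figure is the GDDR5 command clock with four data transfers per pin per clock (the way monitoring tools usually report it); the function name is illustrative.

```python
def gddr5_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=4):
    """Peak memory bandwidth in GB/s.

    Assumes GDDR5 moves `transfers_per_clock` bits per pin per listed clock cycle.
    """
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


bandwidth = gddr5_bandwidth_gb_s(memory_clock_mhz=1753, bus_width_bits=128)
print(f"{bandwidth:.1f} GB/s")
```

Under that assumption the result rounds to the 112 GB/s in the list.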
We await official word on pricing and availability for this unreleased GPU.
Subject: Graphics Cards | January 13, 2015 - 02:28 PM | Sebastian Peak
Tagged: rumors, nvidia, multi monitor, mini-ITX GPU, leak, HDMI 2.0, gtx 960, gpu, geforce, DisplayPort
The crew at VideoCardz.com have been reporting some GTX 960 sightings lately, and today they've added no less than three new cards from KFA2, the "European premium brand" of Galaxy.
The reported reference design GTX 960 (VideoCardz.com)
Such reports are becoming more common, with the site posting photos that appear to be other vendors' versions of the new GPU here, here, and here. Of note with these new alleged photos on what appears to be a reference design board: no less than three DisplayPort outputs, as well as HDMI 2.0 and DVI:
Reported GTX 960 outputs (VideoCardz.com)
This would be big news for multi-monitor users as it would provide potential support for three high-resolution DisplayPort monitors from a single card in a strictly non-gaming environment (unless you happen to enjoy the frame-rates of an oil painting).
The reported mini-ITX GTX 960 (VideoCardz.com)
The other designs shown in the post include a mini-ITX form-factor design still sporting the triple DisplayPorts, HDMI and DVI, and a larger EXOC edition built on a custom PCB.
Reported EXOC GTX 960 (VideoCardz.com)
The EXOC edition apparently drops the multi-DisplayPort option in favor of a second DVI output, leaving just one DisplayPort along with the lone HDMI 2.0 output.
With the GTX 960 leaks coming in daily now it seems likely that we would be hearing something official soon.