Subject: Graphics Cards | February 7, 2015 - 06:03 PM | Scott Michaud
Tagged: amd, R9, r9 300, r9 390x, radeon
According to WCCFTech, AMD commented on Facebook that they are “putting the finishing touches on the 300 series to make sure they live up to expectation”. I tried look through AMD's “Posts to Page” for February 3rd and I did not see it listed, so a grain of salt is necessary (either with WCCF or with my lack of Facebook skills).
Image Credit: WCCFTech
The current rumors claim that Fiji XT will have 4096 graphics cores that are fed by a high-bandwidth, stacked memory architecture, which is supposedly rated at 640 GB/s (versus 224 GB/s of the GeForce GTX 980). When you're dealing with data sets at the scale that GPUs are, bandwidth is a precious resource. That said, they also have cache and other methods to reduce this dependency, but let's just say that, if you offer a graphics vendor a free, order-of-magnitude speed-up in memory bandwidth -- you will have friend, and possibly one for life. Need a couch moved? No problem!
The R9 Series is expected to be launched next quarter, which could be as early as about a month.
A baker's dozen of GTX 960
Back on the launch day of the GeForce GTX 960, we hosted NVIDIA's Tom Petersen for a live stream. During the event, NVIDIA and its partners provided ten GTX 960 cards for our live viewers to win which we handed out through about an hour and a half. An interesting idea was proposed during the event - what would happen if we tried to overclock all of the product NVIDIA had brought along to see what the distribution of results looked like? After notifying all the winners of their prizes and asking for permission from each, we started the arduous process of testing and overclocking a total of 13 (10 prizes plus our 3 retail units already in the office) different GTX 960 cards.
Hopefully we will be able to provide a solid base of knowledge for buyers of the GTX 960 that we don't normally have the opportunity to offer: what is the range of overclocking you can expect and what is the average or median result. I think you will find the data interesting.
The 13 Contenders
Our collection of thirteen GTX 960 cards includes a handful from ASUS, EVGA and MSI. The ASUS models are all STRIX models, the EVGA cards are of the SSC variety, and the MSI cards include a single Gaming model and three 100ME. (The only difference between the Gaming and 100ME MSI cards is the color of the cooler.)
To be fair to the prize winners, I actually assigned each of them a specific graphics card before opening them up and testing them. I didn't want to be accused of favoritism by giving the best overclockers to the best readers!
Subject: Graphics Cards, Shows and Expos | February 4, 2015 - 03:33 AM | Scott Michaud
Tagged: OpenGL Next, opengl, glnext, gdc 2015, GDC
The first next-gen, released graphics API was Mantle, which launched a little while after Battlefield 4, but the SDK is still invite-only. The DirectX 12 API quietly launched with the recent Windows 10 Technical Preview, but no drivers, SDK, or software (that we know about) are available to the public yet. The Khronos Group has announced their project, and that's about it currently.
According to Develop Magazine, the GDC event listing, and participants, the next OpenGL (currently called “glNext initiative”) will be unveiled at GDC 2015. The talk will be presented by Valve, but it will also include Epic Games, who was closely involved in DirectX 12 with Unreal Engine, Oxide Games and EA/DICE, who were early partners with AMD on Mantle, and Unity, who recently announced support for DirectX 12 when it launches with Windows 10. Basically, this GDC talk includes almost every software developer that came out in early support of either DirectX 12 or Mantle, plus Valve. Off the top of my head, I can only think of FutureMark as unlisted. On the other hand, while they will obviously have driver support from at least one graphics vendor, none are listed. Will we see NVIDIA? Intel? AMD? All of the above? We don't know.
When I last discussed the next OpenGL initiative, it was attempting to parse the naming survey to figure out bits of the technology itself. As it turns out, the talk claims to go deep into the API, with demos, examples, and “real-world applications running on glNext drivers and hardware”. If this information makes it out (and some talks remain private unfortunately although this one looks public) then we should know more about it than what we know about any competing API today. Personally, I am hoping that they spent a lot of effort on the GPGPU side of things, sort-of building graphics atop it rather than having them be two separate entities. This would be especially good if it could be sandboxed for web applications.
This could get interesting.
Battlefield 4 Results
At the end of my first Frame Rating evaluation of the GTX 970 after the discovery of the memory architecture issue, I proposed the idea that SLI testing would need to be done to come to a more concrete conclusion on the entire debate. It seems that our readers and the community at large agreed with us in this instance, repeatedly asking for those results in the comments of the story. After spending the better part of a full day running and re-running SLI results on a pair of GeForce GTX 970 and GTX 980 cards, we have the answers you're looking for.
Today's story is going to be short on details and long on data, so if you want the full back story on what is going on why we are taking a specific look at the GTX 970 in this capacity, read here:
- Part 1: NVIDIA issues initial statement
- Part 2: Full GTX 970 memory architecture disclosed
- Part 3: Frame Rating: GTX 970 vs GTX 980
- Part 4: Frame Rating: GTX 970 SLI vs GTX 980 SLI (what you are reading now)
Okay, are we good now? Let's dive into the first set of results in Battlefield 4.
Battlefield 4 Results
Just as I did with the first GTX 970 performance testing article, I tested Battlefield 4 at 3840x2160 (4K) and utilized the game's ability to linearly scale resolution to help me increase GPU memory allocation. In the game settings you can change that scaling option by a percentage: I went from 110% to 150% in 10% increments, increasing the load on the GPU with each step.
Memory allocation between the two SLI configurations was similar, but not as perfectly aligned with each other as we saw with our single GPU testing.
In a couple of cases, at 120% and 130% scaling, the GTX 970 cards in SLI are actually each using more memory than the GTX 980 cards. That difference is only ~100MB but that delta was not present at all in the single GPU testing.
Subject: Graphics Cards | January 30, 2015 - 03:42 PM | Jeremy Hellstrom
Tagged: nvidia, msi gaming 2g, msi, maxwell, gtx 960
Sitting right now at $220 ($210 after MIR) the MSI GTX 960 GAMING 2G is more or less the same price as a standard R9 280 or a 280x/290 on discount. We have seen that the price to performance is quite competitive at 1080p gaming but this year the push seems to be for performance at higher resolutions. [H]ard|OCP explored the performance of this card by testing at 2560 x 1440 against the GTX 770 and 760 as well as the R9 285 and 280x with some interesting results. In some cases the 280x turned out on top while in others the 960 won, especially when Gameworks features like God Rays were enabled. Read through this carefully but from their results the deciding factor between picking up one or the other of these cards could be the price as they are very close in performance.
"We take the new NVIDIA GeForce GTX 960 GPU based MSI GeForce GTX 960 GAMING 2G and push it to its limits at 1440p. We also include a GeForce GTX 770 and AMD Radeon R9 280X into this evaluation for some unexpected results. How does the GeForce GTX 960 really compare to a wide range of cards?"
Here are some more Graphics Card articles from around the web:
- Gigabyte GTX 960 G1 Gaming 2 GB @ techPowerUp
- Asus GeForce GTX 960 DirectCU II OC STRIX @ eTeknix
- ASUS GeForce GTX 960 Strix @ Benchmark Reviews
- Asus GeForce GTX 960 Strix OC @ Silent PC Review
- Gigabyte G1 Gaming GeForce GTX 960 @ eTeknix
- GeForce GTX 960 Article Round Up @HiTech Legion
- Linux Benchmarks Of NVIDIA's Early 2015 GeForce Line-Up @ Phoronix
- Corsair HG-10 gpu bracket @ HardwareOverclock
It has been an abnormal week for us here at PC Perspective. Our typical review schedule has pretty much flown out the window, and the past seven days have been filled with learning, researching, retesting, and publishing. That might sound like the norm, but in these cases the process was initiated by tips from our readers. Last Saturday (24 Jan), a few things were brewing:
- Ryan was informed by NVIDIA that the memory layout of the GTX 970 was different than expected.
- The huge (now 168 page) overclock.net forum thread about the Samsung 840 EVO slowdown was once again gaining traction.
- Someone got G-Sync working on a laptop integrated display.
We had to do a bit of triage here of course, as we can only research and write so quickly. Ryan worked the GTX 970 piece as it was the hottest item. I began a few days of research and testing on the 840 EVO slow down issue reappearing on some drives, and we kept tabs on that third thing, which at the time seemed really farfetched. With those two first items taken care of, Ryan shifted his efforts to GTX 970 SLI testing while I shifted my focus to finding out of there was any credence to this G-Sync laptop thing.
A few weeks ago, an ASUS Nordic Support rep inadvertently leaked an interim build of the NVIDIA driver. This was a mobile driver build (version 346.87) focused at their G751 line of laptops. One recipient of this driver link posted it to the ROG forum back on the 20th. A fellow by the name Gamenab, owning the same laptop cited in that thread, presumably stumbled across this driver, tried it out, and was more than likely greeted by this popup after the installation completed:
Now I know what you’re thinking, and it’s probably the same thing anyone would think. How on earth is this possible? To cut a long story short, while the link to the 346.87 driver was removed shortly after being posted to that forum, we managed to get our hands on a copy of it, installed it on the ASUS G751 that we had in for review, and wouldn’t you know it we were greeted by the same popup!
Ok, so it’s a popup, could it be a bug? We checked NVIDIA control panel and the options were consistent with that of a G-Sync connected system. We fired up the pendulum demo and watched the screen carefully, passing the machine around the office to be inspected by all. We then fired up some graphics benchmarks that were well suited to show off the technology (Unigine Heaven, Metro: Last Light, etc), and everything looked great – smooth steady pans with no juddering or tearing to be seen. Ken Addison, our Video Editor and jack of all trades, researched the panel type and found that it was likely capable of 100 Hz refresh. We quickly dug created a custom profile, hit apply, and our 75 Hz G-Sync laptop was instantly transformed into a 100 Hz G-Sync laptop!
Ryan's Note: I think it is important here to point out that we didn't just look at demos and benchmarks for this evaluation but actually looked at real-world gameplay situations. Playing through Metro: Last Light showed very smooth pans and rotation, Assassin's Creed played smoothly as well and flying through Unigine Heaven manually was a great experience. Crysis 3, Battlefield 4, etc. This was NOT just a couple of demos that we ran through - the variable refresh portion of this mobile G-Sync enabled panel was working and working very well.
At this point in our tinkering, we had no idea how or why this was working, but there was no doubt that we were getting a similar experience as we have seen with G-Sync panels. As I digested what was going on, I thought surely this can’t be as good as it seems to be… Let’s find out, shall we?
A Summary Thus Far
UPDATE 2/2/15: We have another story up that compares the GTX 980 and GTX 970 in SLI as well.
It has certainly been an interesting week for NVIDIA. It started with the release of the new GeForce GTX 960, a $199 graphics card that brought the latest iteration of Maxwell's architecture to a lower price point, competing with the Radeon R9 280 and R9 285 products. But then the proverbial stuff hit the fan with a memory issue on the GeForce GTX 970, the best selling graphics card of the second half of 2014. NVIDIA responded to the online community on Saturday morning but that was quickly followed up with a more detailed expose on the GTX 970 memory hierarchy, which included a couple of important revisions to the specifications of the GTX 970 as well.
At the heart of all this technical debate is a performance question: does the GTX 970 suffer from lower performance because of of the 3.5GB/0.5GB memory partitioning configuration? Many forum members and PC enthusiasts have been debating this for weeks with many coming away with an emphatic yes.
The newly discovered memory system of the GeForce GTX 970
Yesterday I spent the majority of my day trying to figure out a way to validate or invalidate these types of performance claims. As it turns out, finding specific game scenarios that will consistently hit targeted memory usage levels isn't as easy as it might first sound and simple things like the order of start up can vary that as well (and settings change orders). Using Battlefield 4 and Call of Duty: Advanced Warfare though, I think I have presented a couple of examples that demonstrate the issue at hand.
Performance testing is a complicated story. Lots of users have attempted to measure performance on their own setup, looking for combinations of game settings that sit below the 3.5GB threshold and those that cross above it, into the slower 500MB portion. The issue for many of these tests is that they lack access to both a GTX 970 and a GTX 980 to really compare performance degradation between cards. That's the real comparison to make - the GTX 980 does not separate its 4GB into different memory pools. If it has performance drops in the same way as the GTX 970 then we can wager the memory architecture of the GTX 970 is not to blame. If the two cards perform differently enough, beyond the expected performance delta between two cards running at different clock speeds and with different CUDA core counts, then we have to question the decisions that NVIDIA made.
There has also been concern over the frame rate consistency of the GTX 970. Our readers are already aware of how deceptive an average frame rate alone can be, and why looking at frame times and frame time consistency is so much more important to guaranteeing a good user experience. Our Frame Rating method of GPU testing has been in place since early 2013 and it tests exactly that - looking for consistent frame times that result in a smooth animation and improved gaming experience.
Users at reddit.com have been doing a lot of subjective testing
We will be applying Frame Rating to our testing today of the GTX 970 and its memory issues - does the division of memory pools introduce additional stutter into game play? Let's take a look at a couple of examples.
Subject: Graphics Cards | January 28, 2015 - 10:21 AM | Ryan Shrout
Tagged: nvidia, memory issue, maxwell, GTX 970, GM204, geforce
UPDATE 1/29/15: This forum post has since been edited and basically removed, with statements made on Twitter that no driver changes are planned that will specifically target the performance of the GeForce GTX 970.
The story around the GeForce GTX 970 and its confusing and shifting memory architecture continues to update. On a post in the official GeForce.com forums (on page 160 of 184!), moderator and NVIDIA employee PeterS claims that the company is working on a driver to help improve performance concerns and will also be willing to "help out" for users that honestly want to return the product they already purchased. Here is the quote:
First, I want you to know that I'm not just a mod, I work for NVIDIA in Santa Clara.
I totally get why so many people are upset. We messed up some of the stats on the reviewer kit and we didn't properly explain the memory architecture. I realize a lot of you guys rely on product reviews to make purchase decisions and we let you down.
It sucks because we're really proud of this thing. The GTX970 is an amazing card and I genuinely believe it's the best card for the money that you can buy. We're working on a driver update that will tune what's allocated where in memory to further improve performance.
Having said that, I understand that this whole experience might have turned you off to the card. If you don't want the card anymore you should return it and get a refund or exchange. If you have any problems getting that done, let me know and I'll do my best to help.
This makes things a bit more interesting - based on my conversations with NVIDIA about the GTX 970 since this news broke, it was stated that the operating system had a much stronger role in the allocation of memory from a game's request than the driver. Based on the above statement though, NVIDIA seems to think it can at least improve on the current level of performance and tune things to help alleviate any potential bottlenecks that might exist simply in software.
As far as the return goes, PeterS at least offers to help this one forum user but I would assume the gesture would be available for anyone that has the same level of concern for the product. Again, as I stated in my detailed breakdown of the GTX 970 memory issue on Monday, I don't believe that users need to go that route - the GeForce GTX 970 is still a fantastic performing card in nearly all cases except (maybe) a tiny fraction where that last 500MB of frame buffer might come into play. I am working on another short piece going up today that details my experiences with the GTX 970 running up on those boundaries.
NVIDIA is trying to be proactive now, that much we can say. It seems that the company understands its mistake - not in the memory pooling decision but in the lack of clarity it offered to reviewers and consumers upon the product's launch.
A few secrets about GTX 970
UPDATE 1/28/15 @ 10:25am ET: NVIDIA has posted in its official GeForce.com forums that they are working on a driver update to help alleviate memory performance issues in the GTX 970 and that they will "help out" those users looking to get a refund or exchange.
Yes, that last 0.5GB of memory on your GeForce GTX 970 does run slower than the first 3.5GB. More interesting than that fact is the reason why it does, and why the result is better than you might have otherwise expected. Last night we got a chance to talk with NVIDIA’s Senior VP of GPU Engineering, Jonah Alben on this specific concern and got a detailed explanation to why gamers are seeing what they are seeing along with new disclosures on the architecture of the GM204 version of Maxwell.
NVIDIA's Jonah Alben, SVP of GPU Engineering
For those looking for a little background, you should read over my story from this weekend that looks at NVIDIA's first response to the claims that the GeForce GTX 970 cards currently selling were only properly utilizing 3.5GB of the 4GB frame buffer. While it definitely helped answer some questions it raised plenty more which is whey we requested a talk with Alben, even on a Sunday.
Let’s start with a new diagram drawn by Alben specifically for this discussion.
GTX 970 Memory System
Believe it or not, every issue discussed in any forum about the GTX 970 memory issue is going to be explained by this diagram. Along the top you will see 13 enabled SMMs, each with 128 CUDA cores for the total of 1664 as expected. (Three grayed out SMMs represent those disabled from a full GM204 / GTX 980.) The most important part here is the memory system though, connected to the SMMs through a crossbar interface. That interface has 8 total ports to connect to collections of L2 cache and memory controllers, all of which are utilized in a GTX 980. With a GTX 970 though, only 7 of those ports are enabled, taking one of the combination L2 cache / ROP units along with it. However, the 32-bit memory controller segment remains.
You should take two things away from that simple description. First, despite initial reviews and information from NVIDIA, the GTX 970 actually has fewer ROPs and less L2 cache than the GTX 980. NVIDIA says this was an error in the reviewer’s guide and a misunderstanding between the engineering team and the technical PR team on how the architecture itself functioned. That means the GTX 970 has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. Before people complain about the ROP count difference as a performance bottleneck, keep in mind that the 13 SMMs in the GTX 970 can only output 52 pixels/clock and the seven segments of 8 ROPs each (56 total) can handle 56 pixels/clock. The SMMs are the bottleneck, not the ROPs.
Subject: Graphics Cards | January 24, 2015 - 11:51 AM | Ryan Shrout
Tagged: nvidia, maxwell, GTX 970, GM204, 3.5gb memory
UPDATE 1/28/15 @ 10:25am ET: NVIDIA has posted in its official GeForce.com forums that they are working on a driver update to help alleviate memory performance issues in the GTX 970 and that they will "help out" those users looking to get a refund or exchange.
UPDATE 1/26/25 @ 1:00pm ET: We have posted a much more detailed analysis and look at the GTX 970 memory system and what is causing the unusual memory divisions. Check it out right here!
UPDATE 1/26/15 @ 12:10am ET: I now have a lot more information on the technical details of the architecture that cause this issue and more information from NVIDIA to explain it. I spoke with SVP of GPU Engineering Jonah Alben on Sunday night to really dive into the quesitons everyone had. Expect an update here on this page at 10am PT / 1pm ET or so. Bookmark and check back!
UPDATE 1/24/15 @ 11:25pm ET: Apparently there is some concern online that the statement below is not legitimate. I can assure you that the information did come from NVIDIA, though is not attributal to any specific person - the message was sent through a couple of different PR people and is the result of meetings and multiple NVIDIA employee's input. It is really a message from the company, not any one individual. I have had several 10-20 minute phone calls with NVIDIA about this issue and this statement on Saturday alone, so I know that the information wasn't from a spoofed email, etc. Also, this statement was posted by an employee moderator on the GeForce.com forums about 6 hours ago, further proving that the statement is directly from NVIDIA. I hope this clears up any concerns around the validity of the below information!
Over the past couple of weeks users of GeForce GTX 970 cards have noticed and started researching a problem with memory allocation in memory-heavy gaming. Essentially, gamers noticed that the GTX 970 with its 4GB of system memory was only ever accessing 3.5GB of that memory. When it did attempt to access the final 500MB of memory, performance seemed to drop dramatically. What started as simply a forum discussion blew up into news that was being reported at tech and gaming sites across the web.
Image source: Lazygamer.net
NVIDIA has finally responded to the widespread online complaints about GeForce GTX 970 cards only utilizing 3.5GB of their 4GB frame buffer. From the horse's mouth:
The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.
We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.
Here’s an example of some performance data:
|GTX 980||GTX 970|
|Shadow of Mordor|
|<3.5GB setting = 2688x1512 Very High||72 FPS||60 FPS|
|>3.5GB setting = 3456x1944||55 FPS (-24%)||45 FPS (-25%)|
|<3.5GB setting = 3840x2160 2xMSAA||36 FPS||30 FPS|
|>3.5GB setting = 3840x2160 135% res||19 FPS (-47%)||15 FPS (-50%)|
|Call of Duty: Advanced Warfare|
|<3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off||82 FPS||71 FPS|
|>3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on||48 FPS (-41%)||40 FPS (-44%)|
On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.
So it would appear that the severing of a trio of SMMs to make the GTX 970 different than the GTX 980 was the root cause of the issue. I'm not sure if this something that we have seen before with NVIDIA GPUs that are cut down in the same way, but I have asked for clarification from NVIDIA on that.
The ratios fit: 500MB is 1/8th of the 4GB total memory capacity and 2 SMMs is 1/8th of the total SMM count. (Edit: The ratios in fact do NOT match up...odd.)
The full GM204 GPU that is the root cause of this memory issue.
Another theory presented itself as well: is this possibly the reason we do not have a GTX 960 Ti yet? If the patterns were followed from previous generations a GTX 960 Ti would be a GM204 GPU with fewer cores enabled and additional SMs disconnected to enable a lower price point. If this memory issue were to be even more substantial, creating larger differentiated "pools" of memory, then it could be an issue for performance or driver development. To be clear, we are just guessing on this one and that could be something that would not occur at all. Again, I've asked NVIDIA for some technical clarification.
Requests for information aside, we may never know for sure if this is a bug with the GM204 ASIC or predetermined characteristic of design.
The questions remains: does NVIDIA's response appease GTX 970 owners? After all, this memory concern is really just a part of a GPU's story and thus performance testing and analysis already incorporates it essentially. Some users will still likely make a claim of a "bait and switch" but do the benchmarks above, as well as our own results at 4K, make it a less significant issue?
Our own Josh Walrath offers this analysis:
A few days ago when we were presented with evidence of the 970 not fully utilizing all 4 GB of memory, I theorized that it had to do with the reduction of SMM units. It makes sense from an efficiency standpoint to perhaps "hard code" memory addresses for each SMM. The thought behind that would be that 4 GB of memory is a huge amount of a video card, and the potential performance gains of a more flexible system would be pretty minimal.
I believe that the memory controller is working as intended and not a bug. When designing a large GPU, there will invariably be compromises made. From all indications NVIDIA decided to save time, die size, and power by simplifying the memory controller and crossbar setup. These things have a direct impact on time to market and power efficiency. NVIDIA probably figured that a couple percentage of performance lost was outweighed by the added complexity, power consumption, and engineering resources that it would have taken to gain those few percentage points back.