Subject: Graphics Cards | April 11, 2014 - 03:30 PM | Ryan Shrout
Tagged: nvidia, geforce, dx11, driver, 337.50
UPDATE: We have put together a much more comprehensive story based on the NVIDIA 337.50 driver that includes more cards and more games while also disputing the Total War: Rome II results seen here. Be sure to read it!!
When I spoke with NVIDIA after the announcement of DirectX 12 at GDC this past March, a lot of the discussion centered around a pending driver release that promised impressive performance advances with current DX11 hardware and DX11 games.
What NVIDIA did want to focus on with us was the significant improvements that have been made on the efficiency and performance of DirectX 11. When NVIDIA is questioned as to why they didn’t create their own Mantle-like API if Microsoft was dragging its feet, they point to the vast improvements possible and made with existing APIs like DX11 and OpenGL. The idea is that rather than spend resources on creating a completely new API that needs to be integrated in a totally unique engine port (see Frostbite, CryEngine, etc.) NVIDIA has instead improved the performance, scaling, and predictability of DirectX 11.
NVIDIA claims that these fixes are not game specific and will improve performance and efficiency for a lot of GeForce users. Even if that is the case, we will only really see these improvements surface in titles that have addressable CPU limits or very low end hardware, similar to how Mantle works today.
Lofty goals to be sure. This driver was released last week and I immediately wanted to test and verify many of these claims. However, a certain other graphics project kept me occupied most of the week and then a short jaunt to Dallas kept me from the task until yesterday.
To be clear, I am planning to look at several more games and card configurations next week, but I thought it was worth sharing our first set of results. The test bed in use is the same as our standard GPU reviews.
|Test System Setup||
|---|---|
|CPU|Intel Core i7-3960X Sandy Bridge-E|
|Motherboard|ASUS P9X79 Deluxe|
|Memory|Corsair Dominator DDR3-1600 16GB|
|Hard Drive|OCZ Agility 4 256GB SSD|
|Graphics Card|NVIDIA GeForce GTX 780 Ti 3GB, NVIDIA GeForce GTX 770 2GB|
|Graphics Drivers|NVIDIA: 335.23 WHQL, 337.50 Beta|
|Power Supply|Corsair AX1200i|
|Operating System|Windows 8 Pro x64|
The most interesting claims from NVIDIA were spikes as high as 70%+ in Total War: Rome II, so I decided to start there.
First up, let's take a look at the GTX 780 Ti SLI results, the flagship gaming card from NVIDIA.
This title, running at the Extreme preset, jumps from an average frame rate of 59 FPS to 88 FPS, an increase of 48%! Frame rate variance does increase a bit with the faster average frame rate, though it stays within the limits of smoothness, if only barely.
Next up, the GeForce GTX 770 SLI results.
Results here are even more impressive as the pair of GeForce GTX 770 cards running in SLI jump from 29.5 average FPS to 51 FPS, an increase of 72%!! Even better, this occurs without any kind of frame rate variance increase and in fact, the blue line of the 337.50 driver actually performs better in that respect.
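For anyone who wants to run the same comparison on their own numbers, here is a minimal sketch of the percentage-gain arithmetic. The function name `percent_gain` is made up for illustration, and the inputs are the rounded averages quoted above. The quoted percentages were presumably computed from unrounded averages, so the output need not match them to the digit.

```python
def percent_gain(before_fps, after_fps):
    """Relative improvement of after_fps over before_fps, in percent."""
    return (after_fps - before_fps) / before_fps * 100.0


# Rounded average frame rates quoted in the text (335.23 WHQL vs. 337.50 Beta).
results = {
    "GTX 780 Ti SLI": (59.0, 88.0),
    "GTX 770 SLI": (29.5, 51.0),
}

for config, (old_fps, new_fps) in results.items():
    print(f"{config}: {percent_gain(old_fps, new_fps):.1f}% faster")
```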
All of these tests were run with the latest patch on Total War: Rome II and I did specifically ask NVIDIA if there were any differences in the SLI profiles between these two drivers for this game. I was told absolutely not - this just happens to be the poster child example of changes NVIDIA has made with this DX11 efficiency push.
Of course, not all games are going to see performance improvements like this, or even improvements that are measurable at all. Just as we have seen with other driver enhancements over the years, different hardware configurations, image quality settings and even scenes used to test each game will shift the deltas considerably. I can tell you already that based on some results I have (but am holding for my story next week) performance improvements in other games are ranging from <5% up to 35%+. While those aren't reaching the 72% level we saw in Total War: Rome II above, these kinds of experience changes with driver updates are impressive to see.
Even though we are likely looking at the "best case" for NVIDIA's 337.50 driver changes with the Rome II results here, clearly there is merit behind what the company is pushing. We'll have more results next week!
Subject: Graphics Cards | April 8, 2014 - 06:51 PM | Jeremy Hellstrom
Tagged: asetek, amd, r9 295x2
If you wondered where the custom cooler for the impressively powerful AMD Radeon R9 295X2 came from then wonder no more. The cooler was designed specifically for this card by Asetek, a veteran in cooling computer components with water. You should keep that in mind the next time you think about picking up a third party watercooler!
Asetek, the world’s leading supplier of computer liquid cooling solutions, today announced that its liquid cooling technology will be used to cool AMD’s latest flagship graphics card. The new AMD Radeon R9 295X2 is the world’s fastest graphics card. Boasting 8 gigabytes of memory and over 11 teraflops of computing power, the AMD Radeon R9 295X2 graphics card is the undisputed graphics performance champion.
“Today’s high-end graphic cards pack insane amounts of power into a very small area and removing that heat is no small task. Utilizing our liquid cooling for graphics cards unlocks new opportunities for performance and low noise,” said André Sloth Eriksen, Founder and CEO of Asetek. “The fact that AMD has chosen Asetek liquid cooling for their reference cooling design is a testament to the reliability and performance of our technology.”
The AMD Radeon R9 295X2 is the first graphics card reference design ever to ship with an advanced closed-loop water cooling system. The Asetek-developed liquid cooling system on the AMD Radeon R9 295X2 graphics card delivers significant benefits for the performance-hungry enthusiast, hardcore gamer or Bitcoin miner. Users will appreciate the unobtrusive noise, low GPU and component temperatures, and blistering performance - right out of the box.
“As the most powerful graphics card offered to date, we knew we needed an outstanding custom cooling solution for the AMD Radeon R9 295X2 graphics card,” said Matt Skynner, corporate vice president and general manager, Graphics Business Unit, AMD. “Asetek’s liquid cooling embodies the efficient performance, reliability and reputation we were seeking in a partner. As GPUs become more powerful, the benefits of collaborating with Asetek and integrating our world-class technologies are clear.”
The AMD Radeon R9 295X2 graphics card utilizes Asetek’s proven, maintenance free, factory sealed liquid cooling technology to cool the two powerful GPUs. This liquid cooling design ensures continuous stability even under full load. The card is easy to install and fits in most computer cases on the market today. With more than 1.5 million units in the field today, Asetek liquid cooling provides worry free operation to gamers and PC enthusiasts alike.
Subject: General Tech, Graphics Cards | April 8, 2014 - 06:44 PM | Scott Michaud
Tagged: nvidia, geforce, drivers
NVIDIA's GeForce 337.50 Driver was said to address performance when running DirectX 11-based software. Now that it is out, multiple sources are claiming the vendor-supplied benchmarks are exaggerated or simply untrue.
Going alphabetically, Anandtech tested the R337.50 and R331.xx drivers with a GeForce GTX 780 Ti, finding a double-digit increase with BioShock: Infinite and Metro: Last Light and basically zero improvement for GRID 2, Rome II, Crysis: Warhead, Crysis 3, and Company of Heroes 2. Adding a second GTX 780 Ti into the mix helped matters, seeing a 76% increase in Rome II and about 9% in most of the other titles.
BlackHoleTec is next. Testing the mid-range, but overclocked, GeForce GTX 760 between R337.50 and R335.23 drivers, they found slight improvements (1-3 FPS), except for Battlefield 4 and Skyrim (the latter is not DX11, to be fair), which saw a slight reduction in performance (about 1 FPS).
ExtremeTech, finally, published one benchmark but it did not compare between drivers. All it really shows is CPU scaling in AMD GPUs.
Unfortunately, I do not have any benchmarks to present of my own because I am not a GPU reviewer nor do I have a GPU testbed. Ironically, the launch of the Radeon R9 295X2 video card might have lessened the number of benchmarks available for NVIDIA's driver. Who knows?
If it is true, and R337.50 does basically nothing in a setup with one GPU, I am not exactly sure what NVIDIA was hoping to accomplish. Of course someone was going to test it and publish their results. The point of the driver update was apparently to show how having a close relationship with Microsoft can lead you to better PC gaming products now and in the future. That can really only be the story if you have something to show. Now, at least I expect, we will probably see more positive commentary about Mantle - at least when people are not talking about DirectX 12.
If you own a GeForce card, I would still install the new driver though, especially if you have an SLI configuration. Scaling to a second GPU does see measurable improvements with Release 337.50. Even for a single-card configuration, it certainly should not hurt anything.
Subject: General Tech, Graphics Cards, Processors, Shows and Expos | April 8, 2014 - 03:43 PM | Scott Michaud
Tagged: Intel, NAB, NAB 14, iris pro, Adobe, premiere pro, Adobe CC
When Adobe started to GPU-accelerate their applications beyond OpenGL, it started with NVIDIA and its CUDA platform. After some period of time, they started to integrate OpenCL support and bring AMD into the fold. At first, it was limited to a couple of Apple laptops but has since expanded to include several GPUs on both OS X and Windows. Since then, Adobe switched to a subscription-based release system and has published updates on a more rapid schedule. The next update of Adobe Premiere Pro CC will bring OpenCL to Intel Iris Pro iGPUs.
Of course, they specifically mentioned Adobe Premiere Pro CC, which suggests that support for Photoshop CC might be coming later. The press release does suggest that the update will affect both Mac and Windows versions of Adobe Premiere Pro CC, however, so at least platforms will not be divided. Well, that is, if you find a Windows machine with Iris Pro graphics. They do exist...
A release date has not been announced for this software upgrade.
A Powerful Architecture
AMD likes to toot its own horn. Just take a look at the not-so-subtle marketing buildup to the Radeon R9 295X2 dual-Hawaii graphics card, released today. I had photos of me shipped to…me…overnight. My hotel room at GDC was also given a package which included a pair of small Pringles cans (chips) and a bottle of volcanic water. You may have also seen some photos posted of a mysterious briefcase with its side stickered with the silhouette of a Radeon add-in board.
This tooting is not without some validity though. The Radeon R9 295X2 is easily the fastest graphics card we have ever tested and that says a lot based on the last 24 months of hardware releases. It’s big, it comes with an integrated water cooler, and it requires some pretty damn specific power supply specifications. But AMD did not compromise on the R9 295X2 and, for that, I am sure that many enthusiasts will be elated. Get your wallets ready, though, this puppy will run you $1499.
Both AMD and NVIDIA have a history of producing high quality dual-GPU graphics cards late in the product life cycle. The most recent entry from AMD was the Radeon HD 7990, a pair of Tahiti GPUs on a single PCB with a triple fan cooler. While a solid performing card, the product was released in a time when AMD CrossFire technology was well behind the curve and, as a result, real-world performance suffered considerably. By the time the drivers and ecosystem were fixed, the HD 7990 was more or less on the way out. It was also notorious for some intermittent, but severe, overheating issues, documented by Tom’s Hardware in one of the most harshly titled articles I’ve ever read. (Hey, Game of Thrones started again this week!)
The Hawaii GPU, first revealed back in September and selling today under the guise of the R9 290X and R9 290 products, is even more power hungry than Tahiti. Many in the industry doubted that AMD would ever release a dual-GPU product based on Hawaii as the power and thermal requirements would be just too high. AMD has worked around many of these issues with a custom water cooler and placing specific power supply requirements on buyers. Still, all without compromising on performance. This is the real McCoy.
Subject: Graphics Cards | April 7, 2014 - 07:14 PM | Jeremy Hellstrom
Tagged: msi, R9 290X GAMING 4G, amd, hawaii, R9 290X, Twin Frozr IV, factory overclocked
The familiar Twin Frozr IV cooler has been added to the R9 290X GPU on MSI's latest AMD graphics card. The R9 290X GAMING 4G sports 4GB of GDDR5 running at an even 5GHz and a GPU that has three separate top speeds depending on the profile you choose: 1040 MHz with OC Mode, 1030 MHz for Gaming Mode and 1000 MHz in Silent Mode. [H]ard|OCP also tried manually overclocking and ended up with a peak of 1130 MHz GPU and 5.4GHz for the GDDR5, not a bad bump over the factory overclock. Check out the performance of the various speeds in their full review.
"On our test bench today is MSI's newest high-end GAMING series graphics cards in the form of the MSI Radeon R9 290X GAMING 4G video card. We will strap it to our test bench and compare it to the MSI GeForce GTX 780 Ti GAMING 3G card out-of-box and overclocked to determine which card provides the best gameplay experience."
Here are some more Graphics Card articles from around the web:
- HIS Radeon R7 250 iCooler Boost Clock 1GB GDDR5 @ eTeknix
- AMD XFX Radeon R7 250 Core Edition Passive 1GB GDDR5 @ eTeknix
- XFX Radeon R9 290 Double Dissipation Video Card Review @ Legit Reviews
- AMD XFX Radeon R7 240 Core Edition Passive 2GB @ eTeknix
- Sapphire Radeon R7-240 Dual HDMI Review @ Bjorn3d
- AMD Radeon: Windows 8.1 Catalyst vs. Linux Gallium3D vs. Linux Catalyst @ Phoronix
- AMD R9 290X CrossFire Vs Nvidia GTX 780 Ti SLI @ eTeknix
- PowerColor PCS+ AXR9 290X 4GB Video Card Review @ Legit Reviews
- Sapphire Radeon R9 270 2GB Dual-X Edition Video Card Review @HiTech Legion
- Sapphire R9 290X Tri-X Edition Review @ TechwareLab
- NVIDIA's GeForce Driver On Ubuntu 14.04 Runs The Same As Windows 8.1 @ Phoronix
- NVIDIA GeForce 700 Series: Stick To The Binary Linux Drivers @ Phoronix
- ASUS Poseidon GTX 780 Video Card Review @ Legit Reviews
- NVIDIA GeForce GTX 780 Ti Video Card Roundup @ Legit Reviews
- Gigabyte GTX780 Ti Windforce OC @ Kitguru
- MSI R9 270X HAWK & MSI GTX 760 HAWK @ Nitroware
- ASUS ROG POSEIDON GTX 780 @ [H]ard|OCP
Subject: General Tech, Graphics Cards | April 7, 2014 - 09:01 AM | Scott Michaud
Tagged: nvidia, geforce experience, directx 11
We knew that NVIDIA had an impending driver update providing DirectX 11 performance improvements. Launched today, 337.50 still claims significant performance increases over the previous 335.23 version. What was a surprise is GeForce Experience 2.0. This version allows both ShadowPlay and GameStream to operate on notebooks. It also allows ShadowPlay to record, and apparently stream to Twitch, your Windows desktop (but not on notebooks). It also enables Battery Boost, discussed previously.
Personally, I find desktop streaming is the headlining feature, although I rarely use laptops (and much less for gaming). This is especially useful for OpenGL, games which run in windowed mode, and if you want to occasionally screencast without paying for Camtasia or tinkering with CamStudio. If I were to make a critique, and of course I will, I would like the option to select which monitor gets recorded. Its current behavior records the primary monitor as far as I can tell.
I should also mention that, in my testing, "shadow recording" is not supported when not recording a fullscreen game. I'm guessing that NVIDIA believes their users would prefer to not record their desktops until manually started and likewise stopped. It seems like it had to have been a conscious decision. It does limit its usefulness in OpenGL or windowed games, however.
This driver also introduces GameStream for devices outside of your home, as discussed in the SHIELD update.
This slide shows SLI improvements, driver-to-driver, for the GTX 770 and the 780 Ti.
As for the performance boost, NVIDIA claims up to 64% faster performance in configurations with one active GPU and up to 71% faster in SLI. It will obviously vary on a game-by-game and GPU-by-GPU basis. I do not have any benchmarks, besides a few examples provided by NVIDIA, to share. That said, it is a free driver. If you have a GeForce GPU, download it. It does complicate matters if you are deciding between AMD and NVIDIA, however.
BF4 Integrates FCAT Overlay Support
Back in September AMD publicly announced Mantle, a new lower level API meant to offer more performance for gamers and more control for developers fed up with the restrictions of DirectX. Without diving too much into the politics of the release, the fact that Battlefield 4 developer DICE was integrating Mantle into the Frostbite engine for Battlefield was a huge proof point for the technology. Even though the release was a bit later than AMD had promised us, coming at the end of January 2014, one of the biggest PC games on the market today had integrated a proprietary AMD API.
When I did my first performance preview of BF4 with Mantle on February 1st, the results were mixed but we had other issues to deal with. First and foremost, our primary graphics testing methodology, called Frame Rating, wasn't able to be integrated due to the change of API. Instead we were forced to use an in-game frame rate counter built by DICE which worked fine, but didn't give us the fine grain data we really wanted to put the platform to the test. It worked, but we wanted more. Today we are happy to announce we have full support for our Frame Rating and FCAT testing with BF4 running under Mantle.
A History of Frame Rating
In late 2012 and throughout 2013, testing graphics cards became a much more complicated beast. Terms like frame pacing, stutter, jitter and runts were not in the vocabulary of most enthusiasts but became an important part of the story just about one year ago. Though complicated to fully explain, the basics are pretty simple.
Rather than using software on the machine being tested to measure performance, our Frame Rating system uses a combination of local software and external capture hardware. On the local system with the hardware being evaluated we run a small piece of software called an overlay that draws small colored bars on the left hand side of the game screen that change successively with each frame rendered by the game. Using a secondary system, we capture the output from the graphics card directly, intercepting it from the display output, in real-time in an uncompressed form. With that video file captured, we then analyze it frame by frame, measuring the length of each of those colored bars, how long they are on the screen, how consistently they are displayed. This allows us to find the average frame rate but also to find how smoothly the frames are presented, if there are dropped frames and if there are jitter or stutter issues.
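To make the analysis step a bit more concrete, here is a rough sketch of how the colored-bar measurements could be turned into frame times, runt counts and dropped-frame counts. This is not our actual tooling. The capture rate, scanline count, palette size, runt threshold and every function name are assumptions chosen for illustration, and the input is assumed to already be a list of (color index, bar height in scanlines) pairs for each captured refresh.

```python
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the article.
REFRESH_HZ = 60            # capture rate of the secondary system
LINES_PER_REFRESH = 1080   # scanlines in each captured refresh
PALETTE_SIZE = 16          # overlay colors before the sequence repeats
RUNT_LINES = 21            # frames shorter than this count as runts


@dataclass
class RenderedFrame:
    color: int   # index of the overlay color drawn for this frame
    lines: int   # total scanlines the frame occupied on screen


def merge_segments(captured):
    """Join same-colored bars that straddle two captured refreshes."""
    frames = []
    for refresh in captured:            # one list per captured refresh
        for color, lines in refresh:    # bars listed top to bottom
            if frames and frames[-1].color == color:
                frames[-1].lines += lines
            else:
                frames.append(RenderedFrame(color, lines))
    return frames


def analyze(captured):
    """Return frame times, runt count, dropped count and average FPS."""
    frames = merge_segments(captured)
    lines_per_second = REFRESH_HZ * LINES_PER_REFRESH
    frame_times_ms = [f.lines * 1000.0 / lines_per_second for f in frames]
    runts = sum(1 for f in frames if f.lines < RUNT_LINES)
    dropped = 0
    for prev, cur in zip(frames, frames[1:]):
        # The overlay advances one color per rendered frame, so a jump of
        # more than one means frames were rendered but never displayed.
        step = (cur.color - prev.color) % PALETTE_SIZE
        dropped += step - 1
    total_seconds = sum(f.lines for f in frames) / lines_per_second
    avg_fps = len(frames) / total_seconds if total_seconds else 0.0
    return {
        "frame_times_ms": frame_times_ms,
        "runts": runts,
        "dropped": dropped,
        "avg_fps": avg_fps,
    }


# Two captured refreshes: color 1 spans the boundary, color 2 is a runt,
# and color 3 never reaches the screen.
sample = [
    [(0, 540), (1, 540)],
    [(1, 10), (2, 5), (4, 1065)],
]
report = analyze(sample)
print(report)
```

The point of working from on-screen scanlines rather than a software counter is that a runt still counts toward a naive FPS figure even though the viewer barely sees it, which is exactly the kind of case this method is meant to expose.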
Subject: General Tech, Graphics Cards | April 1, 2014 - 04:42 PM | Tim Verry
Tagged: VCA, nvidia, GTC 2014
NVIDIA launched a new visual computing appliance called the Iray VCA at the GPU Technology Conference last week. This new piece of enterprise hardware uses full GK110 graphics cards to accelerate the company’s Iray renderer which is used to create photo realistic models in various design programs.
The Iray VCA is a licensed appliance that combines NVIDIA hardware and software. On the hardware side of things, the Iray VCA is powered by eight graphics cards, dual processors (unspecified but likely Intel Xeons based on usage in last year’s GRID VCA), 256GB of system RAM, and a 2TB SSD. Networking hardware includes two 10GbE NICs, two 1GbE NICs, and one InfiniBand connection. In total, the Iray VCA features 20 CPU cores and 23,040 CUDA cores. The GPUs used are based on the full GK110 die and are paired with 12GB of memory each.
Even better, it is a scalable solution such that companies can add additional Iray VCAs to the network. The appliances reportedly transparently accelerate the Iray renders done on designers’ workstations. NVIDIA reports that an Iray VCA is approximately 60-times faster than a Quadro K5000-powered workstation. Further, according to NVIDIA, 19 Iray VCAs working together amounts to 1 PetaFLOP of compute performance which is enough to render photo realistic simulations using 1 billion rays with up to hundreds of thousands of bounces.
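As a quick sanity check on those figures, the arithmetic can be sketched as follows. The 2,880 CUDA cores per full GK110 die is an assumption brought in from outside this post (it is the commonly cited figure for the fully enabled chip), and the per-appliance throughput is only a back-of-the-envelope division of NVIDIA's "19 VCAs for 1 PetaFLOP" claim, not a measured number.

```python
GPUS_PER_VCA = 8               # from the spec list above
CORES_PER_FULL_GK110 = 2880    # assumption: not stated in this post
VCAS_PER_PETAFLOP = 19         # NVIDIA's claim quoted above

# Total CUDA cores in one appliance.
total_cuda_cores = GPUS_PER_VCA * CORES_PER_FULL_GK110

# Implied throughput of a single appliance, in TFLOPS (1 PFLOP = 1000 TFLOPS).
tflops_per_vca = 1000.0 / VCAS_PER_PETAFLOP

print(f"CUDA cores per Iray VCA: {total_cuda_cores:,}")
print(f"Implied throughput per Iray VCA: {tflops_per_vca:.1f} TFLOPS")
```

The core count lines up with the 23,040 quoted above, which supports the eight fully enabled GK110 GPUs reading of the spec sheet.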
The Iray VCA enables some rather impressive real time renders of 3D models with realistic physical properties and lighting. The models are light simulations that use ray tracing, global illumination and other techniques to show photo realistic models using up to billions of rays of light. NVIDIA is positioning the Iray VCA as an alternative to physical prototyping, allowing designers to put together virtual prototypes that can be iterated and changed at significantly less cost and time.
Iray itself is NVIDIA’s GPU-accelerated photo realistic renderer. The Iray technology is used in a number of design software packages. The Iray VCA is meant to further accelerate that Iray renderer by throwing massive amounts of parallel processing hardware at the resource intensive problem over the network (the Iray VCAs can be installed at a data center or kept on site). Initially the Iray VCA will support 3ds Max, Catia, Bunkspeed, and Maya, but NVIDIA is working on supporting all Iray accelerated software with the VCA hardware.
The virtual prototypes can be sliced and examined and can even be placed in real world environments by importing HDR photos. Jen-Hsun Huang demonstrated this by placing Honda’s vehicle model on the GTC stage (virtually).
In fact, one of NVIDIA’s initial partners with the Iray VCA is Honda. Honda is currently beta testing a cluster of 25 Iray VCAs to refine styling designs for cars and their interiors based on initial artistic work. Honda Research and Development System Engineer Daisuke Ide was quoted by NVIDIA as stating that “Our TOPS tool, which uses NVIDIA Iray on our NVIDIA GPU cluster, enables us to evaluate our original design data as if it were real. This allows us to explore more designs so we can create better designs faster and more affordably.”
The Iray VCA (PDF) will be available this summer for $50,000. The sticker price includes the hardware, Iray license, and the first year of updates and maintenance. This is far from consumer technology, but it is interesting technology that may be used in the design process of your next car or other major purchase.
What do you think about the Iray VCA and NVIDIA's licensed hardware model?
Subject: Editorial, General Tech, Graphics Cards, Processors, Shows and Expos | March 30, 2014 - 01:45 AM | Scott Michaud
Tagged: gdc 14, GDC, GCN, amd
While Mantle and DirectX 12 are designed to reduce overhead and keep GPUs loaded, the conversation shifts when you are limited by shader throughput. Modern graphics processors are dominated by sometimes thousands of compute cores. Video drivers are complex packages of software. One of their many tasks is converting your scripts, known as shaders, into machine code for the target hardware. If this machine code is efficient, it could mean drastically higher frame rates, especially at extreme resolutions and intense quality settings.
Emil Persson of Avalanche Studios, probably known best for the Just Cause franchise, published his slides and speech on optimizing shaders. His talk focuses on AMD's GCN architecture, due to its existence in both console and PC, while bringing up older GPUs for examples. Yes, he has many snippets of GPU assembly code.
AMD's GCN architecture is actually quite interesting, especially dissected as it was in the presentation. It is simpler than its ancestors and much more CPU-like, with resources mapped to memory (and caches of said memory) rather than "slots" (although drivers and APIs often pretend those relics still exist) and with how vectors are mostly treated as collections of scalars, and so forth. Tricks which attempt to combine instructions together into vectors, such as using dot products, can just put irrelevant restrictions on the compiler and optimizer... as it breaks down those vector operations into those very same component-by-component ops that you thought you were avoiding.
Basically, and it makes sense coming from GDC, this talk rarely glosses over points. It goes over execution speed of one individual op compared to another, at various precisions, and which to avoid (protip: integer divide). Also, fused multiply-add is awesome.
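To illustrate the point about vectors being treated as collections of scalars, here is a small sketch of how a three-component dot product might break down on a scalar-oriented architecture: one multiply followed by two multiply-adds. This is a plain Python model, not GPU code and not taken from the slides. The helper names `mad` and `dot3_scalarized` are made up for this example.

```python
def mad(a, b, c):
    """Stand-in for one multiply-add instruction: a * b + c.

    Note: Python rounds the product and the sum separately, whereas a true
    fused multiply-add rounds only once. The op count is what matters here.
    """
    return a * b + c


def dot3_scalarized(u, v):
    """Compute dot(u, v) for 3-vectors as a chain of scalar operations."""
    ops = []
    acc = u[0] * v[0]           # first term: a plain multiply
    ops.append("mul")
    acc = mad(u[1], v[1], acc)  # second term folds into a multiply-add
    ops.append("mad")
    acc = mad(u[2], v[2], acc)  # third term folds into another multiply-add
    ops.append("mad")
    return acc, ops


value, ops = dot3_scalarized((1, 2, 3), (4, 5, 6))
print(value, ops)
```

Seen this way, packing unrelated scalar math into a dot product saves nothing: the hardware still issues the same component-by-component operations, and the compiler has less freedom to reorder them.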
I know I learned.
As a final note, this returns to the discussions we had prior to the launch of the next generation consoles. Developers are learning how to make their shader code much more efficient on GCN and that could easily translate to leading PC titles. Especially with DirectX 12 and Mantle, which lighten the CPU-based bottlenecks, learning how to do more work per FLOP addresses the other side. Everyone was looking at Mantle as AMD's play for success through harnessing console mindshare (and in terms of Intel vs AMD, it might help). But honestly, I believe that it will be trends like this presentation which prove more significant... even if behind-the-scenes. Of course developers were always having these discussions, but now console developers will probably be talking about only one architecture - that is a lot of people talking about very few things.
This is not really reducing overhead; this is teaching people how to do more work with less, especially in situations (high resolutions with complex shaders) where the GPU is most relevant.