Linked Multi-GPU Arrives... for Developers
The Khronos Group has released the Vulkan 1.0.42.0 specification, which includes experimental (more on that in a couple of paragraphs) support for VR enhancements, sharing resources between processes, and linking similar GPUs. This spec was released alongside a LunarG SDK and NVIDIA drivers that fully implement these extensions; both are intended for developers, not gamers.
I would expect that the most interesting feature is experimental support for linking similar GPUs together, which Vulkan calls a “Device Group” and which resembles DirectX 12’s Explicit Linked Multiadapter. The idea is that the physical GPUs hidden behind this layer can do things like share resources, such as rendering a texture on one GPU and consuming it on another, without the host code being involved. I’m guessing that some studios, like maybe Oxide Games, will decide not to use this feature. While it’s not explicitly stated, I cannot see how this (or DirectX 12’s Explicit Linked mode) would work across vendors. Unless I’m mistaken, that would require AMD, NVIDIA, and/or Intel to restructure their drivers to interoperate at this level. Still, the assumptions that can be made with grouped devices are apparently popular with enough developers for both the Khronos Group and Microsoft to bother.
A slide from Microsoft's DirectX 12 reveal, long ago.
As for the “experimental” comment that I made in the introduction... I was expecting to see this news around SIGGRAPH, which occurs in late-July / early-August, alongside a minor version bump (to Vulkan 1.1).
I might still be right, though.
The major new features of Vulkan 1.0.42.0 are implemented as a new classification of extensions: KHX. In the past, vendors, like NVIDIA and AMD, would add new features as vendor-prefixed extensions. Games could query the graphics driver for these abilities, and enable them if available. If these features became popular enough for multiple vendors to have their own implementations of them, a committee would consider an EXT extension. This would behave the same across all implementations (give or take) but not be officially adopted by the Khronos Group. If they did take it under their wing, it would be given a KHR extension (or added as a required feature).
The Khronos Group has added a new layer: KHX. This level of extension sits below KHR, and is not intended for production code. You might see where this is headed. The VR multiview, multi-GPU, and cross-process extensions are not supposed to be used in released video games until they leave KHX status. Unlike a vendor extension, the Khronos Group wants old KHX standards to drop out of existence at some point after they graduate to full KHR status. A KHX extension is not something that NVIDIA owns and will keep around for 20 years after its usable lifespan just so old games behave as expected.
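To make the extension tiers a bit more concrete, here is a minimal sketch of how an application might sort extension names by prefix and refuse to ship with anything still in KHX. The helper names and the hard-coded list of "reported" extensions are hypothetical stand-ins for whatever a real driver query returns; nothing here calls the actual Vulkan API.

```python
# Hypothetical helpers: classify Vulkan extension names by their author prefix.
STATUS_BY_PREFIX = {
    "KHR": "ratified Khronos extension",
    "KHX": "experimental Khronos extension (not for shipping code)",
    "EXT": "multi-vendor extension",
}


def classify_extension(name):
    """Return (prefix, status) for a name such as 'VK_KHX_device_group'."""
    parts = name.split("_")
    if len(parts) < 3 or parts[0] != "VK":
        raise ValueError(f"not a Vulkan extension name: {name!r}")
    prefix = parts[1]
    # Any other prefix (NV, AMD, ...) is treated as a vendor extension.
    return prefix, STATUS_BY_PREFIX.get(prefix, "vendor extension")


def shippable_extensions(available, wanted):
    """Keep wanted extensions that the driver reports and that are not KHX."""
    return [
        name
        for name in wanted
        if name in available and classify_extension(name)[0] != "KHX"
    ]


# Assumed example data standing in for a driver's reported extension list.
reported = [
    "VK_KHR_swapchain",
    "VK_KHX_device_group",
    "VK_KHX_multiview",
    "VK_EXT_debug_report",
    "VK_NV_glsl_shader",
]
wanted = ["VK_KHR_swapchain", "VK_KHX_device_group", "VK_KHR_maintenance1"]

for ext in reported:
    print(ext, "->", classify_extension(ext)[1])
print("enable in a release build:", shippable_extensions(reported, wanted))
```

With this sample data only the swapchain extension survives the filter: the device group extension is reported but still KHX, and the third one is not reported at all. An internal build could simply skip the KHX check.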
How long will that take? No idea. I’ve already mentioned my logical but uneducated guess a few paragraphs ago, but I’m not going to repeat it; I have literally zero facts to base it on, and I don’t want our readers to think that I do. I don’t. It’s just based on what the Khronos Group typically announces at certain trade shows, and the length of time since their first announcement.
The benefit that KHX does bring us is that, whenever these features make it to public release, developers will have already been using them... internally... since around now. When a feature hits KHR, it’s done, and anyone can theoretically be ready for it when that time comes.
VR Performance Evaluation
Even though virtual reality hasn’t taken off with the momentum that many in the industry had expected on the heels of the HTC Vive and Oculus Rift launches last year, it remains one of the fastest growing aspects of PC hardware. More importantly for many, VR is also one of the key inflection points for performance moving forward; it requires more hardware, scalability, and innovation than any other sub-category including 4K gaming. As such, NVIDIA, AMD, and even Intel continue to push the performance benefits of their own hardware and technology.
Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, utilized by countless reviewers and enthusiasts. But Fraps lacked the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we were able to communicate the smoothness of a gaming experience and articulate it well enough to help gamers make purchasing decisions.
VR pipeline when everything is working well.
For VR, though, those same tools just don’t cut it. Fraps is a non-starter as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we have used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to “latch in” to VR-based games. Not only that, but measuring frame drops, time warps, space warps and reprojections would be a significant hurdle without further development.
VR pipeline with a frame miss.
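As a rough illustration of what a "frame miss" means numerically, the sketch below counts how many frames fit inside one headset refresh interval and how many spill over. It is not how FCAT VR works internally; the 90 Hz refresh rate is an assumption (typical of the first consumer headsets), the function name is hypothetical, and the frame times are invented.

```python
import math


def classify_frames(render_times_ms, refresh_hz=90.0):
    """Count frames that fit the refresh interval and frames that miss it.

    refresh_hz=90 is an assumption, not a value taken from the article.
    """
    budget_ms = 1000.0 / refresh_hz
    on_time = 0
    missed = 0
    intervals_lost = 0
    for t in render_times_ms:
        # Number of refresh intervals this frame occupied (at least one).
        intervals = max(1, math.ceil(t / budget_ms))
        if intervals == 1:
            on_time += 1
        else:
            missed += 1
            # Every extra interval is one the runtime must cover some other
            # way, for example by showing a reprojected older frame.
            intervals_lost += intervals - 1
    return {
        "budget_ms": budget_ms,
        "on_time": on_time,
        "missed": missed,
        "intervals_lost": intervals_lost,
    }


# Invented frame times in milliseconds, for illustration only.
sample = [9.8, 10.5, 12.3, 10.9, 23.0, 10.1]
print(classify_frames(sample))
```

At 90 Hz the budget is about 11.1 ms, so in this made-up sample four frames arrive on time, two miss, and three refresh intervals have to be filled by the runtime. An average frame rate alone would hide that.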
NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. While the company is obviously hoping that the tool will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and freely available to the media and the public should not be overlooked.
NVIDIA FCAT VR comprises two applications. The FCAT VR Capture tool runs on the PC being evaluated and looks similar to other performance and timing capture utilities. To generate performance data, it uses Oculus Event Tracing (part of Windows ETW) and SteamVR’s performance API, along with NVIDIA driver stats when run on NVIDIA hardware. It works perfectly well on any GPU vendor’s hardware, though, because the VR-vendor-specific timing results are still available.
Subject: Graphics Cards | February 28, 2017 - 10:59 PM | Ryan Shrout
Tagged: pascal, nvidia, gtx 1080 ti, gp102, geforce
Tonight at a GDC party hosted by CEO Jen-Hsun Huang, NVIDIA announced the GeForce GTX 1080 Ti graphics card, coming next week for $699. Let’s dive right into the specifications!
| | GTX 1080 Ti | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury | R9 Nano |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GP102 | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro | Fiji XT |
| Base Clock | 1480 MHz | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz | up to 1000 MHz |
| Boost Clock | 1600 MHz | 1480 MHz | 1733 MHz | 1076 MHz | 1089 MHz | 1216 MHz | - | - | - |
| Memory Clock | 11000 MHz | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz | 500 MHz |
| Memory Interface | 352-bit | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) |
| Memory Bandwidth | 484 GB/s | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s | 512 GB/s |
| TDP | 250 watts | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts | 175 watts |
| Peak Compute | 10.6 TFLOPS | 10.1 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS | 8.19 TFLOPS |
The GTX 1080 Ti looks a whole lot like the TITAN X launched in August of last year. Based on the 12B transistor GP102 chip, the new GTX 1080 Ti will have 3,584 CUDA cores with a 1.60 GHz Boost clock. That gives it the same processor count as Titan X but with a slightly higher clock speed, which should make the new GTX 1080 Ti faster by at least a few percentage points; at base clocks (1480 MHz versus 1417 MHz) it has an edge of roughly 4.4% in compute capability. It has 28 SMs, 28 geometry units, and 224 texture units.
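For anyone who wants to check the compute figures, theoretical single-precision throughput is conventionally estimated as two floating-point operations (one fused multiply-add) per CUDA core per clock. The sketch below applies that convention to the core counts and clocks quoted in this coverage; the function name is just for illustration, and real-world throughput depends on far more than this product.

```python
def peak_fp32_tflops(cuda_cores, clock_mhz):
    """Theoretical single-precision throughput in TFLOPS."""
    # Assumption: each core retires one fused multiply-add (2 FLOPs) per clock.
    flops_per_core_per_clock = 2
    return flops_per_core_per_clock * cuda_cores * clock_mhz * 1e6 / 1e12


# Core counts and clocks as quoted in the text and the table.
ti_base = peak_fp32_tflops(3584, 1480)        # GTX 1080 Ti at base clock
ti_boost = peak_fp32_tflops(3584, 1600)       # GTX 1080 Ti at boost clock
gtx1080_base = peak_fp32_tflops(2560, 1607)   # GTX 1080 at base clock

print(f"GTX 1080 Ti base:  {ti_base:.2f} TFLOPS")
print(f"GTX 1080 Ti boost: {ti_boost:.2f} TFLOPS")
print(f"GTX 1080 base:     {gtx1080_base:.2f} TFLOPS")
```

The base-clock results land on the table's 10.6 and 8.2 TFLOPS entries, which suggests the "Peak Compute" row is calculated at base rather than boost clocks.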
Interestingly, the memory system on the GTX 1080 Ti gets adjusted: NVIDIA has disabled a single 32-bit memory controller to give the card a 352-bit wide bus and an odd-sounding 11GB memory capacity. The ROP count also drops to 88 units. Speaking of 11, the G5X memory on the GTX 1080 Ti will now run at 11 Gbps, a boost available to NVIDIA thanks to a chip revision from Micron and improvements to equalization and reverse signal distortion.
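The bandwidth numbers in the table follow directly from bus width and per-pin data rate, so they are easy to sanity check. This is a minimal sketch with a hypothetical helper name; the 1GB-per-controller figure is an inference from the 11GB capacity, not something NVIDIA is quoted as stating here.

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: pins times per-pin rate, bits to bytes."""
    return bus_width_bits * data_rate_gbps / 8


# A 384-bit bus is twelve 32-bit controllers; disabling one leaves eleven.
controllers = 384 // 32 - 1
bus_width = controllers * 32
# Inference (assumption): 1GB of memory per remaining controller.
capacity_gb = controllers * 1

print(bus_width, "bit bus,", capacity_gb, "GB")
print("GTX 1080 Ti:", memory_bandwidth_gbs(bus_width, 11), "GB/s")
print("Titan X (Pascal):", memory_bandwidth_gbs(384, 10), "GB/s")
print("GTX 1080:", memory_bandwidth_gbs(256, 10), "GB/s")
```

The narrower bus costs one twelfth of the width, but the jump from 10 to 11 Gbps more than makes up for it, which is how the GTX 1080 Ti ends up at 484 GB/s against the Titan X's 480 GB/s.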
The TDP of the new part is 250 watts, identical to the Titan X and well above the GTX 1080’s 180 watts. That’s interesting considering that the new card runs higher clocks than the Titan product within the same 250 watts. The cooler has been improved compared to the GTX 1080, offering quieter fan speeds and lower temperatures when operating at the same power envelope.
Performance estimates from NVIDIA put the GTX 1080 Ti about 35% faster than the GTX 1080, the largest “kicker performance increase” that we have seen from a flagship Ti launch.
Pricing is going to be set at $699 so don't expect to find this in any budget builds. But for the top performing GeForce card on the market, it's what we expect. It should be on virtual shelves starting next week.
(Side note, with the GTX 1080 getting a $100 price drop tonight, I think we'll find this new lineup very compelling to enthusiasts.)
NVIDIA did finally detail its tiled caching rendering technique. We'll be diving more into that in a separate article with a little more time for research.
One more thing…
In another interesting move, NVIDIA is going to be offering “overclocked” versions of the GTX 1080 and GTX 1060 with +1 Gbps memory speeds. Partners will be offering them with some undisclosed price premium.
I don’t know how much performance this will give us but it’s clear that NVIDIA is preparing its lineup for the upcoming AMD Vega release.
We’ll have more news from NVIDIA and GDC as it comes!
Subject: Graphics Cards | February 28, 2017 - 10:55 PM | Tim Verry
Tagged: pascal, nvidia, GTX 1080, GDC
Update Feb 28 @ 10:03pm It's official, NVIDIA launches $699 GTX 1080 Ti.
NVIDIA is hosting a "Gaming Celebration" live event during GDC 2017 to talk PC gaming and possibly launch new hardware (if rumors are true!). During the event, NVIDIA CEO Jen-Hsun Huang made a major announcement regarding its top-end GTX 1080 graphics card with a price drop to $499 effective immediately.
The NVIDIA GTX 1080 is a Pascal-based graphics card with 2560 CUDA cores paired with 8GB of GDDR5X memory. Graphics cards based on this GP104 GPU are currently selling for around $580 to $700 (most are around $650+/-) with the "Founders Edition" having an MSRP of $699. The $499 price teased at the live stream represents a significant price drop compared to what the graphics cards are going for now. NVIDIA did not specify whether the new $499 MSRP is the new Founders Edition price or an average price that includes partner cards as well. Even if it only applies to the reference cards, the partners would have to adjust their prices downwards accordingly to compete.
I suspect that NVIDIA is making such a bold move to make room in their lineup for a new product (the long-rumored 1080 Ti perhaps?) as well as a pre-emptive strike against AMD and their Radeon RX Vega products. This move may also be good news for GTX 1070 pricing as they may also see price drops to make room for cheaper GTX 1080 partner cards that come in below the $499 price point.
If you have been considering buying a new graphics card, NVIDIA has sweetened the pot a bit especially if you had already been eyeing a GTX 1080. (Note that while the price drop is said to be effective immediately, at the time of writing Amazon was still showing "normal"/typical prices for the cards. Enthusiasts might have to wait a few hours or days for the retailers to catch up and update their sites.)
This makes me a bit more excited to see what AMD will have to offer with Vega as well as the likelihood of a GTX 1080 Ti launch happening sooner rather than later!
Subject: General Tech, Graphics Cards | February 27, 2017 - 03:39 PM | Jeremy Hellstrom
Tagged: MWC, GDC, VRMark, Servermark, OptoFidelity, cyan room, benchmark
Futuremark are showing off new benchmarks at GDC and MWC, the two conferences which are both happening this week. We will have quite a bit of coverage this week as we try to keep up with simultaneous news releases and presentations.
First up is a new benchmark in their recently released DX12 VRMark suite: the new Cyan Room, which sits between the existing two in the suite. The Orange Room tests whether your system is capable of providing an acceptable VR experience or falls somewhat short of the minimum requirements, while the Blue Room shows off what a system that exceeds the recommended specs can manage. The Cyan Room is for those who know that their system can handle most VR and need to test their system's settings. If you don't have the test suite, Humble Bundle has a great deal on it and several other tools, if you act quickly.
Next up is a new suite to test Google Daydream, Google Cardboard, and Samsung Gear VR performance and ability. There is more than just performance to test when you are using your phone to view VR content, such as avoiding setting your eyeholes on fire. The tests will help you determine just how long your device can run VR content before overheating becomes an issue and interferes with performance, as well as helping you determine your battery life.
VR latency testing is next in the list of announcements, and it is very important when it comes to VR, as high or unstable latency is the reason some users need to add a bucket to their list of VR essentials. Futuremark have partnered with OptoFidelity to produce VR Multimeter, a hardware-based HMD testing system. This allows you, and hopefully soon PCPer as well, to test motion-to-photon latency, display persistence, and frame jitter, as well as audio-to-video synchronization and motion-to-audio latency, any of which could lead to a bad time.
Last up is the brand new Servermark to test the performance you can expect out of virtual servers, media servers and other common tasks. The VDI test lets you determine if a virtual machine has been provisioned at a level commensurate to the assigned task, so you can adjust it as required. The Media Transcode portion lets you determine the maximum number of concurrent streams as well as the maximum quality of those streams which your server can handle, very nice for those hosting media for an audience.
Expect to hear more as we see the new benchmarks in action.
Subject: Graphics Cards | February 20, 2017 - 02:54 PM | Jeremy Hellstrom
Tagged: nvidia, gtx 1080 Xtreme Edition, GTX 1080, gigabyte, aorus
Gigabyte created their Aorus line of products to attract enthusiasts away from some of the competition's sub-brands, such as ASUS ROG. The new Aorus GTX 1080 Xtreme Edition is somewhat similar to the Gigabyte GTX 1080 Xtreme Gaming released last year, but there are some differences, such as the large copper heatsink attached to the bottom of the GPU. The stated clock speeds are the same as last year's model, and it also sports the two HDMI connections on the front of the card to connect to Gigabyte's VR Extended Front panel. The Tech Report manually overclocked the card and saw the Aorus reach the highest frequencies they have seen from a GP104 chip, albeit by a small margin. Check out the full review right here.
"Aorus is expanding into graphics cards today with the GeForce GTX 1080 Xtreme Edition 8G, a card that builds on the strong bones of Gigabyte's Editor's Choice-winning GTX 1080 Xtreme Gaming. We dig in to see whether Aorus' take on a GTX 1080 is good enough for a repeat."
Here are some more Graphics Card articles from around the web:
- Gigabyte GTX 1080 Aorus Xtreme Edition 8 GB @ techPowerUp
- NVIDIA’s Fastest Graphics Card Ever: A Look At The Quadro P6000 @ Techgage
- Radeon Windows 10 vs. Linux RadeonSI/RADV Gaming Performance @ Phoronix
- Windows 10 vs. Ubuntu Linux Gaming Performance With NVIDIA GeForce GTX 1060/1080 @ Phoronix
Subject: Graphics Cards | February 17, 2017 - 07:42 AM | Scott Michaud
Tagged: nvidia, graphics drivers
Just a couple of days after publishing 378.66, NVIDIA released GeForce 378.72 Hotfix drivers. This fixes a bug encoding video in Steam’s In-Home Streaming, and it also fixes PhysX not being enabled on the GPU under certain conditions. Normally, hotfix drivers solve large-enough issues that were introduced with the previous release. This time, as far as I can tell, is a little different, though. Instead, these fixes seem to be intended for 378.66 but, for one reason or another, couldn’t be integrated and tested in time for the driver to be available for the game launches.
This is an interesting effect of the Game Ready program. There is value in having a graphics driver available on the same day (or early) as a major game releases, so that people can enjoy the title as soon as it is available. There is also value in having as many fixes as the vendor can provide. These conditions oppose each other to some extent.
From a user standpoint, driver updates are cumulative, so users are able to skip a driver or two if they are not affected by any given issue. AMD has taken up a similar structure, sometimes releasing three or four drivers in a month with only, like, one of them being WHQL certified. For these reasons, I tend to lean on the side of “release ‘em as you got them”. Still, I can see people feeling a little uneasy about a driver being released incomplete to hit a due-date.
But, again, that due-date has value.
It’s interesting. I’m personally glad that AMD and NVIDIA are on a rapid-release schedule, but I can see where complaints could arise. What’s your opinion?
Living Long and Prospering
The open fork of AMD’s Mantle, the Vulkan API, was released exactly a year ago with, as we reported, a hard launch. This meant public, but not main-branch drivers for developers, a few public SDKs, a proof-of-concept patch for The Talos Principle, and, of course, the ratified specification. This sets up the API to find success right out of the gate, and we can now look back over the year since.
Thor's hammer, or a tempest in a teapot?
The elephant in the room is DOOM. This game has successfully integrated the API and it uses many of its more interesting features, like asynchronous compute. Because the API is designed in a sort-of “make a command, drop it on a list” paradigm, the driver is able to select commands based on priority and available resources. AMD’s products got a significant performance boost, relative to OpenGL, catapulting their Fury X GPU up to the enthusiast level that its theoretical performance suggested.
Mobile developers have been picking up the API, too. Google, who is known for banishing OpenCL from their Nexus line and challenging OpenGL ES with their Android Extension Pack (later integrated into OpenGL ES with version 3.2), has strongly backed Vulkan. The API was integrated as a core feature of Android 7.0.
On the engine and middleware side of things, Vulkan is currently “ready for shipping games” as of Unreal Engine 4.14. It is also included in Unity 5.6 Beta, which is expected for full release in March. Frameworks for emulators are also integrating Vulkan, often just to say they did, but sometimes to emulate the quirks of these systems’ offbeat graphics co-processors. Many other engines, from Source 2 to Torque 3D, have also announced or added Vulkan support.
Finally, for the API itself, The Khronos Group announced (pg 22 from SIGGRAPH 2016) areas that they are actively working on. The top feature is “better” multi-GPU support. While Vulkan, like OpenCL, allows developers to enumerate all graphics devices and target them, individually, with work, it doesn’t have certain mechanisms, like being able to directly ingest output from one GPU into another. They haven’t announced a timeline for this.
Subject: Graphics Cards | February 16, 2017 - 03:35 PM | Jeremy Hellstrom
Tagged: msi, AERO ITX, gtx 1070, gtx 1060, gtx 1050, GTX 1050 Ti, SFF, itx
MSI have just released their new series of ITX compatible GPUs, covering NVIDIA's latest series of cards from the GTX 1050 through to the GTX 1070; the GTX 1080 is not available in this form factor. The GTX 1070 and 1060 are available in both factory overclocked and standard versions.
All models share a similar design, with a single TORX fan, 8mm Super Pipes, and the Zero Frozr feature, which stops the fan to give silent operation when temperatures are below 60C. They are all compatible with the Afterburner Overclocking Utility, including recordings via Predator and wireless control from your phone.
The overclocked cards run slightly over reference, from the GTX 1070 at 1721MHz boost, 1531MHz base with the GDDR5 at 8GHz to the GTX 1050 at 1518MHz boost, 1404MHz base and the GDDR5 at 7GHz. The models which do not bear the OC moniker run at NVIDIA's reference clocks even if they are not quite fully grown.
Subject: Graphics Cards | February 14, 2017 - 09:29 PM | Scott Michaud
Tagged: opencl 2.0, opencl, nvidia, graphics drivers
While the headline of the GeForce 378.66 graphics driver release is support for For Honor, Halo Wars 2, and Sniper Elite 4, NVIDIA has snuck something major into the 378 branch: OpenCL 2.0 is now available for evaluation. (I double-checked 378.49 release notes and confirmed that this is new to 378.66.)
OpenCL 2.0 support is not complete yet, but at least NVIDIA is now clearly intending to roll it out to end-users. Among other benefits, OpenCL 2.0 allows kernels (think shaders) to, without the host intervening, enqueue work onto the GPU. This saves one (or more) round-trips to the CPU, especially in workloads where you don’t know which kernel will be required until you see the results of the previous run, like recursive sorting algorithms.
So yeah, that’s good, albeit you usually see big changes at the start of version branches.
Another major addition is Video SDK 8.0. This version allows 10- and 12-bit decoding of VP9 and HEVC video. So... yeah. Applications that want to accelerate video encoding or decoding can now hook up to NVIDIA GPUs for more codecs and features.
NVIDIA’s GeForce 378.66 drivers are available now.