Caught Up to DirectX 12 in a Single Day
I'm not just talking about the specification. Members of the Khronos Group have also released compatible drivers, SDKs and tools to support them, conformance tests, and a proof-of-concept patch for Croteam's The Talos Principle. To reiterate, this is not a soft launch. The API, and its entire ecosystem, is out and ready for the public on Windows (at least 7+ at launch but a surprise Vista or XP announcement is technically possible) and several distributions of Linux. Google will provide an Android SDK in the near future.
I'm going to editorialize for the next two paragraphs. There was a concern that Vulkan would be too late. The thing is, as of today, Vulkan is now just as mature as DirectX 12. Of course, that could change at a moment's notice; we still don't know how the two APIs are being adopted behind the scenes. A few DirectX 12 titles are planned to launch in a few months, but no full, non-experimental, non-early access game currently exists. Each time I say this, someone links the Wikipedia list of DirectX 12 games. If you look at each entry, though, you'll see that all of them are either early access, awaiting an unreleased DirectX 12 patch, or using a third-party engine (like Unreal Engine 4) that only lists DirectX 12 as an experimental preview. No full, released, non-experimental DirectX 12 game exists today. Besides, if the latter counts, then you'll need to accept The Talos Principle's proof-of-concept patch, too.
But again, that could change. While today's launch speaks well to the Khronos Group and the API itself, it still needs to be adopted by third party engines, middleware, and software. These partners could, like the Khronos Group before today, be privately supporting Vulkan with the intent to flood out announcements; we won't know until they do... or don't. With the support of popular engines and frameworks, dependent software really just needs to enable it. This has not happened for DirectX 12 yet, and, now, there doesn't seem to be anything keeping it from happening for Vulkan at any moment. With the Game Developers Conference just a month away, we should soon find out.
But back to the announcement.
Vulkan-compatible drivers are launching today across multiple vendors and platforms, but I do not have a complete list. On Windows, I was told to expect drivers from NVIDIA for Windows 7, 8.x, and 10 on Kepler and Maxwell GPUs. The standard is compatible with Fermi GPUs, but NVIDIA does not plan on supporting the API for those users due to Fermi's low market share. That said, they are paying attention to user feedback and they are not ruling it out, which probably means that they are keeping an open mind in case some piece of software gets popular and depends upon Vulkan. I have not heard from AMD or Intel about Vulkan drivers as of this writing, one way or the other. They could even arrive day one.
On Linux, NVIDIA, Intel, and Imagination Technologies have submitted conformant drivers.
Drivers alone do not make a hard launch, though. SDKs and tools have also arrived, including the LunarG SDK for Windows and Linux. LunarG is a company co-founded by Jens Owen, who had a previous graphics software company that was purchased by VMware. LunarG is backed by Valve, who also backed Vulkan in several other ways. The LunarG SDK helps developers validate their code, inspect what the API is doing, and otherwise debug. Even better, it is also open source, which means that the community can rapidly enhance it, even though it's in a releasable state as it is. RenderDoc, the open-source graphics debugger by Crytek, will also add Vulkan support. ((Update (Feb 16 @ 12:39pm EST): Baldur Karlsson has just emailed me to let me know that it was a personal project at Crytek, not a Crytek project in general, and their GitHub page is much more up-to-date than the linked site.))
The major downside is that Vulkan (like Mantle and DX12) isn't simple.
These APIs are verbose and very different from previous ones, which requires more effort.
Image Credit: NVIDIA
There really isn't much to say about the Vulkan launch beyond this. What graphics APIs really try to accomplish is standardizing signals that enter and leave video cards, such that the GPUs know what to do with them. For the last two decades, we've settled on an arbitrary, single, global object that you attach buffers of data to, in specific formats, and call one of a half-dozen functions to send it.
Compute APIs, like CUDA and OpenCL, decided it was more efficient to handle queues, allowing the application to write commands and send them wherever they need to go. Multiple threads can write commands, and multiple accelerators (GPUs in our case) can be targeted individually. Vulkan, like Mantle and DirectX 12, takes this metaphor and adds graphics-specific instructions to it. Moreover, GPUs can schedule memory, compute, and graphics instructions at the same time, as long as the graphics task has leftover compute and memory resources, and / or the compute task has leftover memory resources.
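As a rough illustration of the queue model described above, here is a toy sketch in plain Python. It is not the Vulkan API; the names `record_commands`, `build_frame`, and `submit_queue` are made up for this example. Several CPU threads each record their own command list, and the application then hands the finished lists to a queue that belongs to one (pretend) GPU.

```python
import queue
import threading


def record_commands(thread_id, object_ids):
    """Record one command list on a worker thread (hypothetical helper)."""
    return [f"draw object {obj} (recorded by thread {thread_id})"
            for obj in object_ids]


def build_frame(num_threads=4, num_objects=12):
    """Record command lists in parallel, then submit them to a single queue."""
    # Each thread writes only to its own slot, so recording needs no locking.
    command_lists = [None] * num_threads

    def worker(tid):
        # Split the objects across threads in a simple round-robin pattern.
        objects = range(tid, num_objects, num_threads)
        command_lists[tid] = record_commands(tid, objects)

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Submission step: the finished lists go to a queue owned by one device.
    submit_queue = queue.Queue()
    for command_list in command_lists:
        submit_queue.put(command_list)
    return submit_queue


if __name__ == "__main__":
    frame_queue = build_frame()
    while not frame_queue.empty():
        for command in frame_queue.get():
            print(command)
```

The point of the sketch is only the shape of the work: recording is spread across threads, while submission is an explicit, separate step that the application controls. A real renderer would create one such queue per device it wants to target.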
This is not necessarily a “better” way to do graphics programming... it's different. That said, it has the potential to be much more efficient when dealing with lots of simple tasks that are sent from multiple CPU threads, especially to multiple GPUs (which currently require the driver to figure out how to convert draw calls into separate workloads -- leading to simplifications like mirrored memory and splitting workload by neighboring frames). That kind of workload aligns well with video games, especially ones with lots of simple objects, like strategy games, shooters with lots of debris, or any game with large crowds of people. As it becomes ubiquitous, we'll see this bottleneck disappear and games will not need to be designed around these limitations. It might even be used for drawing with cross-platform 2D APIs, like Qt or even webpages, although those two examples (especially the Web) each have other, higher-priority bottlenecks. There are also other benefits to Vulkan.
The WebGL comparison is probably not as common knowledge as Khronos Group believes.
Still, Khronos Group was criticized when WebGL launched as "it was too tough for Web developers".
It didn't need to be easy. Frameworks arrived and simplified everything. It's now ubiquitous.
In fact, Adobe Animate CC (the successor to Flash Pro) is now a WebGL editor (experimentally).
Open platforms are required for this to become commonplace. Engines will probably target several APIs from their internal management APIs, but you can't target users who don't fit in any bucket. Vulkan brings this capability to basically any platform, as long as it has a compute-capable GPU and a driver developer who cares.
Thankfully, it arrived before any competitor established market share.
Subject: Graphics Cards | February 10, 2016 - 05:59 PM | Scott Michaud
Tagged: VR, vive vr, Oculus, evga, 980 Ti
You might wonder what makes a graphics card “designed for VR,” but this is actually quite interesting. Rather than plugging your headset into the back of your desktop, EVGA includes a 5.25” bay that provides 2x USB 3.0 ports and 1x HDMI 2.0 connection. The use case is that some users will want to easily connect and disconnect their VR devices, which, knowing a few indie VR developers, seems to be a part of their workflow. The same may be true of gamers, but I'm not sure.
While the bay allows for everything, including the HDMI plug via an on-card port, to be connected internally, you will need a spare USB 3.0 header on your motherboard to hook it up. It would have been interesting to see whether EVGA could have attached a USB 3.0 controller on the add-in board, but that might have been impossible (or impractical) given that the PCIe connector would need to be shared with the GPU (not to mention the complexity of also adding a USB 3.0 controller to the board). Also, I expect motherboards should have at least one. If not, you can find USB 3.0 add-in cards with internal headers.
The card comes in two sub-versions, one with the NVIDIA-style blower cooler, and the other with EVGA's ACX 2.0+ cooler. I tend to prefer exposed fan GPUs because they're easier to blow air into after a few years, but you might have other methods to control dust.
Both are currently available for $699.99 on Newegg.com, while Amazon only lists the ACX2.0+ cooler version, and that's out of stock. It is also $699.99, though, so that should be what to expect.
Early testing for higher end GPUs
UPDATE 2/5/16: Nixxes released a new version of Rise of the Tomb Raider today with some significant changes. I have added another page at the end of this story that looks at results with the new version of the game, a new AMD driver and I've also included some SLI and CrossFire results.
I will fully admit to being jaded by the industry on many occasions. I love my PC games and I love hardware but it takes a lot for me to get genuinely excited about anything. After hearing game reviewers talk up the newest installment of the Tomb Raider franchise, Rise of the Tomb Raider, since its release on the Xbox One last year, I've been waiting for its PC release to give it a shot with real hardware. As you'll see in the screenshots and video in this story, the game doesn't appear to disappoint.
Rise of the Tomb Raider takes the exploration and "tomb raiding" aspects that made the first games in the series successful and applies them to the visual quality and character design brought in with the reboot of the series a couple years back. The result is a PC game that looks stunning at any resolution, but even more so in 4K, that pushes your hardware to its limits. For single GPU performance, even the GTX 980 Ti and Fury X struggle to keep their heads above water.
In this short article we'll look at the performance of Rise of the Tomb Raider with a handful of GPUs, leaning towards the high end of the product stack, and offer up my view on whether each hardware vendor is living up to expectations.
Subject: Graphics Cards | February 4, 2016 - 05:51 PM | Jeremy Hellstrom
Tagged: gainward, GTX 960 Phantom 4GB, gtx 960, nvidia, 4GB
If you don't have a lot of cash on hand for games or hardware, a 4K adaptive sync monitor with two $600 GPUs and a collection of $80 AAA titles simply isn't on your radar. That doesn't mean you have to toss in your love of gaming for occasional free to play gaming sessions; you just have to adapt. Prime examples are those die hard Skyrim fans who have modded the game to oblivion over the past few years, along with many other games and communities that may not be new but are still thriving. Chances are that you are playing at 1080p, so a high powered GPU is not needed; however, mods that upscale textures, and many others, do love huge tracts of RAM.
So for those outside of North America looking for a card they can afford after a bit of penny pinching, check out Legion Hardware's review of the 4GB version of the Gainward GTX 960 Phantom. It won't break any benchmarking records but it will let you play the games you love and even new games as their prices inevitably decrease over time.
"Today we are checking out Gainward’s premier GeForce GTX 960 graphics card, the Phantom 4GB. Equipped with twice the memory buffer of standard cards, it is designed for extreme 1080p gaming. Therefore it will be interesting to see how the Phantom 4GB compares to a 2GB GTX 960..."
Here are some more Graphics Card articles from around the web:
- GIGABYTE GTX 980 Ti G1 Gaming Review @ Hardware Canucks
- Inno3D GeForce GTX 980Ti X3 Ultra DHS @ eTeknix
- Desktop Graphics Card Comparison Guide @ TechARP
- Sapphire Nitro R9 Fury OC 4GB @ Kitguru
Subject: Graphics Cards | February 3, 2016 - 02:37 AM | Tim Verry
Tagged: virtual machines, virtual graphics, mxgpu, gpu virtualization, firepro, amd
AMD made an interesting enterprise announcement today with the introduction of new FirePro S-Series graphics cards that integrate hardware-based virtualization technology. The new FirePro S1750 and S1750 x2 are aimed at virtualized workstations, render farms, and cloud gaming platforms where each virtual machine has direct access to the graphics hardware.
The new graphics cards use a GCN-based Tonga GPU with 2,048 stream processors paired with 8GB of ECC GDDR5 memory on the single slot FirePro S1750. The dual slot FirePro S1750 x2, as the name suggests, is a dual GPU card that features a total of 4,096 shaders (2,048 per GPU) and 16 GB of ECC GDDR5 (8 GB per GPU). The S1750 has a TDP of 150W while the dual-GPU S1750 x2 variant is rated at 265W and either can be passively cooled.
Where the graphics cards get niche is the inclusion of what AMD calls MxGPU (Multi-User GPU) technology, which is derived from the SR-IOV (Single Root Input/Output Virtualization) PCI-Express standard. According to AMD, the new FirePro S-Series allows virtual machines direct access to the full range of GPU hardware (shaders, memory, etc.) and OpenCL 2.0 support on the software side. The S1750 supports up to 16 simultaneous users and the S1750 x2 tops out at 32 users. Each virtual machine is allocated an equal slice of the GPU, and as you add virtual machines the equal slices get smaller. AMD’s solution to that predicament is to add more GPUs to spread out the users and allocate each VM more hardware horsepower. It is worth noting that AMD has elected not to charge companies any per-user licensing fees for all these VMs the hardware supports, which should make these cards more competitive.
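To put numbers on how quickly those equal slices shrink, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that memory and shaders are divided evenly by a simple 1/N rule; AMD has not said exactly how each resource is partitioned or scheduled, and the helper name `per_vm_share` is made up for this example. The 8GB, 2,048 stream processor, and 16 user figures come from the single-GPU card described above.

```python
def per_vm_share(total, num_vms):
    """Return an equal 1/N share of a resource (illustrative assumption)."""
    if num_vms < 1:
        raise ValueError("need at least one virtual machine")
    return total / num_vms


# Figures quoted for the single-GPU card in the article.
MEMORY_GB = 8
STREAM_PROCESSORS = 2048
MAX_USERS = 16

if __name__ == "__main__":
    for vms in (1, 2, 4, 8, MAX_USERS):
        memory = per_vm_share(MEMORY_GB, vms)
        shaders = per_vm_share(STREAM_PROCESSORS, vms)
        print(f"{vms:2d} VMs: {memory:.2f} GB and "
              f"{shaders:.0f} stream processors per VM")
```

Under that assumption, a fully loaded single-GPU card leaves each of its 16 users with half a gigabyte of graphics memory, which is why adding GPUs is the obvious way to give each VM more headroom.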
The graphics cards use ECC memory to correct errors when dealing with very large numbers and calculations, and every VM is reportedly protected and isolated such that one VM cannot access any data of a different VM stored in graphics memory.
I am interested to see how these stack up against NVIDIA’s GRID and VGX graphics cards, which are specialized for GPU virtualization. The difference between software-based and hardware-based virtualization may not matter much, but AMD’s approach may be ever so slightly more efficient because it removes a layer between the virtual machine and the hardware. We’ll have to wait and see, however.
Enterprise users will be able to pick up the new cards installed in systems from server manufacturers sometime in the first half of 2016. Pricing for the cards themselves appears to be $2,399 for the single GPU S1750 and $3,999 for the dual GPU S1750 x2.
Needless to say, this is all a bit more advanced (and expensive!) than the somewhat finicky 3D acceleration option desktop users can turn on in VMware and VirtualBox! Are you experimenting with remote workstations and virtual machines for thin clients that can utilize GPU muscle? Does AMD’s MxGPU approach seem promising?
Subject: General Tech, Graphics Cards, Motherboards, Cases and Cooling | February 2, 2016 - 02:07 PM | Ryan Shrout
Tagged: Z170, PSU, power supply, motherboard, GTX 970, giveaway, ftw, evga, contest
For many of you reading this, the temperature outside has fallen to its deepest levels, making it hard to even bear the thought of going outdoors. What would help out a PC enthusiast and gamer in this situation? Some new hardware, delivered straight to your door, to install and assist in warming up your room, that's what!
PC Perspective has partnered up with EVGA to offer up three amazing prizes for our fans. They include a 750 G2 power supply (obviously with a 750 watt rating), a Z170 FTW motherboard and a GTX 970 SSC Gaming ACX 2.0+ graphics card. The total prize value is over $650 based on MSRPs!
All you have to do to enter is follow the easy steps in the form below.
We want to thank EVGA for its support of PC Perspective in this contest and over the years. Here's to a great 2016 for everyone!
Subject: Graphics Cards | January 25, 2016 - 03:19 PM | Jeremy Hellstrom
Tagged: XFX R9 380 Double Dissipation Black Edition OC 4GB, xfx, gtx 960
In one corner is the XFX R9 380 DD Black Edition OC 4GB, at factory settings and with an overclock of 1170MHz core and 6.4GHz memory and in the other corner is a GTX 960 with a 1178MHz Boost clock and 7GHz memory. These two contenders will compete in a six round 1080p match featuring Fallout 4, Project Cars, Witcher 3, GTAV, Dying Light and BF4 to see which is worthy of your hard earned buckaroos. Your referee for today will be [H]ard|OCP, tune in to see the final results.
"Today we evaluate a custom R9 380 from XFX, the XFX R9 380 DD BLACK EDITION OC 4GB. Sporting a hefty factory overclock and the Ghost Thermal 3.0 custom cooling with Double Dissipation, we compare it to an equally priced reference GeForce GTX 960. Find out which video card provides the better bargain."
Here are some more Graphics Card articles from around the web:
- ASUS STRIX R9 380X DirectCU II OC 1080p @ [H]ard|OCP
- Sapphire R9 390 Nitro 8 GB @ techPowerUp
- The OpenGL Speed & Perf-Per-Watt From The Radeon HD 2000/3000 Series Through The R9 Fury @ Phoronix
- 1080p NVIDIA Linux Comparison From GeForce 8 To GeForce 900 Series @ Phoronix
Subject: Graphics Cards | January 25, 2016 - 11:51 AM | Ryan Shrout
Tagged: fury x2, Fiji, dual fiji, amd
Lo and behold! The dual-Fiji card that we have previously dubbed the AMD Radeon Fury X2 still lives! Based on a tweet from AMD PR dude Antal Tungler, a PC from Falcon Northwest at the VRLA convention was utilizing a dual-GPU Fiji graphics card to power some demos.
— Antal Tungler (@coloredrocks) January 23, 2016
This prototype Falcon Northwest Tiki system was housing the GPU beast but no images were shown of the interior of the system. Still, it's good to see AMD at least recognize that this piece of hardware still exists at all, since it was initially promised to the enthusiast market by "fall of 2015." Even in October we had hints that the card might be coming soon after seeing some shipping manifests leak out to the web.
Better late than never, right? One theory floating around inside the offices here is that AMD is going to release the Fury X2 along with the VR headsets coming out this spring, with hopes of making it THE VR graphics card of choice. The value of using multi-GPU for VR is interesting, with one GPU dedicated to each eye, though the pitfalls that could haunt both AMD and NVIDIA in this regard (latency, frame time consistency) make the technological capability a debate.
Subject: Graphics Cards, Memory | January 22, 2016 - 11:08 AM | Ryan Shrout
Tagged: Polaris, pascal, nvidia, jedec, gddr5x, GDDR5, amd
Though information about the technology has been making rounds over the last several weeks, GDDR5X technology finally gets official with an announcement from JEDEC this morning. The JEDEC Solid State Technology Association is, as Wikipedia tells us, an "independent semiconductor engineering trade organization and standardization body" that is responsible for creating memory standards. Getting the official nod from the org means we are likely to see implementations of GDDR5X in the near future.
The press release is short and sweet. Take a look.
ARLINGTON, Va., USA – JANUARY 21, 2016 – JEDEC Solid State Technology Association, the global leader in the development of standards for the microelectronics industry, today announced the publication of JESD232 Graphics Double Data Rate (GDDR5X) SGRAM. Available for free download from the JEDEC website, the new memory standard is designed to satisfy the increasing need for more memory bandwidth in graphics, gaming, compute, and networking applications.
Derived from the widely adopted GDDR5 SGRAM JEDEC standard, GDDR5X specifies key elements related to the design and operability of memory chips for applications requiring very high memory bandwidth. With the intent to address the needs of high-performance applications demanding ever higher data rates, GDDR5X is targeting data rates of 10 to 14 Gb/s, a 2X increase over GDDR5. In order to allow a smooth transition from GDDR5, GDDR5X utilizes the same, proven pseudo open drain (POD) signaling as GDDR5.
“GDDR5X represents a significant leap forward for high end GPU design,” said Mian Quddus, JEDEC Board of Directors Chairman. “Its performance improvements over the prior standard will help enable the next generation of graphics and other high-performance applications.”
JEDEC claims that, by using the same signaling type as GDDR5, it is able to double the per-pin data rate to 10-14 Gb/s. In fact, based on leaked slides about GDDR5X from October, JEDEC actually calls GDDR5X an extension to GDDR5, not a new standard. How does GDDR5X reach these new speeds? By doubling the prefetch from 32 bytes to 64 bytes. This will require a redesign of the memory controller for any processor that wants to integrate it.
Image source: VR-Zone.com
As for usable bandwidth, though no figure is quoted directly, it would likely see a much smaller increase than the per-pin statements in the press release suggest. Because the memory bus width would remain unchanged, and GDDR5X just grabs twice the chunk sizes in prefetch, we should expect an incremental change. Power efficiency is not mentioned either, and that was one of the driving factors in the development of HBM.
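For reference, the theoretical peak figure that per-pin data rates translate into is simple arithmetic: the per-pin rate times the bus width, divided by eight to convert bits to bytes. The sketch below assumes a 256-bit bus and a 7 Gb/s GDDR5 baseline purely for illustration (neither figure comes from the press release), and it says nothing about the usable bandwidth discussed above, which depends on access patterns and the memory controller.

```python
def peak_bandwidth_gb_per_s(data_rate_gbps_per_pin, bus_width_bits):
    """Theoretical peak bandwidth in GB/s: per-pin rate x bus width / 8."""
    return data_rate_gbps_per_pin * bus_width_bits / 8


# Assumed bus width for illustration only; real cards vary.
BUS_WIDTH_BITS = 256

if __name__ == "__main__":
    # 7 Gb/s is an assumed GDDR5 baseline; 10 and 14 Gb/s are the ends of
    # the GDDR5X range quoted in the press release.
    for label, rate in (("GDDR5 at 7 Gb/s", 7),
                        ("GDDR5X at 10 Gb/s", 10),
                        ("GDDR5X at 14 Gb/s", 14)):
        peak = peak_bandwidth_gb_per_s(rate, BUS_WIDTH_BITS)
        print(f"{label}: {peak:.0f} GB/s peak on a {BUS_WIDTH_BITS}-bit bus")
```

These are ceiling numbers only. The argument above is that real workloads will land well below them when the only change is a larger prefetch.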
Performance efficiency graph from AMD's HBM presentation
I am excited about any improvement in memory technology that will increase GPU performance, but I can tell you that from my conversations with both AMD and NVIDIA, no one appears to be jumping at the chance to integrate GDDR5X into upcoming graphics cards. That doesn't mean it won't happen with some version of Polaris or Pascal, but it seems that there may be concerns other than bandwidth that keep it from taking hold.
Subject: Graphics Cards | January 20, 2016 - 03:26 PM | Scott Michaud
Tagged: nvidia, linux, tesla, fermi, kepler, maxwell
It's nice to see long-term roundups every once in a while. They do not really provide useful information for someone looking to make a purchase, but they show how our industry is changing (or not). In this case, Phoronix tested twenty-seven NVIDIA GeForce cards across four architectures: Tesla, Fermi, Kepler, and Maxwell. In other words, from the GeForce 8 series all the way up to the GTX 980 Ti.
Image Credit: Phoronix
Nine years of advancements in ASIC design, with a doubling time-step of 18 months, should yield a 64-fold improvement. The number of transistors falls short, showing about a 12-fold improvement between the Titan X and the largest first-wave Tesla, although that means nothing for a fabless semiconductor designer. The main reason why I include this figure is to show the actual Moore's Law trend over this time span, but it also highlights the slowdown in process technology.
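The 64-fold figure is just six doublings: nine years is 108 months, and 108 divided by 18 is 6. A short sketch of that arithmetic follows, along with the doubling time that a 12-fold gain over the same span would imply. The function names are made up for this example.

```python
import math


def expected_improvement(years, doubling_months=18):
    """Improvement factor if a quantity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


def implied_doubling_months(years, observed_factor):
    """Doubling time (in months) that would produce `observed_factor`."""
    return years * 12 / math.log2(observed_factor)


if __name__ == "__main__":
    print(f"Expected over 9 years: {expected_improvement(9):.0f}x")
    months = implied_doubling_months(9, 12)
    print(f"A 12x gain over 9 years implies doubling every {months:.1f} months")
```

Read that way, the roughly 12-fold growth in transistor count corresponds to a doubling period of about two and a half years rather than 18 months.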
Performance per watt does depend on NVIDIA though, and the ratio between the GTX 980 Ti and the 8500 GT is about 72:1. While this is slightly better than the target 64:1 ratio, these parts are from very different locations in their respective product stacks. Swap the 8500 GT for the following year's 9800 GTX, which makes it a comparison between top-of-the-line GPUs of their respective times, and the GTX 980 Ti shows only a 6.2x improvement in performance per watt. On the other hand, that part was outstanding for its era.
I should note that each of these tests takes place on Linux. It might not perfectly reflect the landscape on Windows, but again, it's interesting in its own right.