Subject: General Tech | September 14, 2018 - 10:32 PM | Scott Michaud
Tagged: rtx, Unity, ray tracing, directx raytracing, DirectX 12
As Ken wrote up his take in a separate post, NVIDIA has made Turing architecture details public, which will bring real-time ray tracing to PC gaming later this month. When it was announced, NVIDIA had some demos in Unreal Engine 4, and a few partnered games (Battlefield V, Shadow of the Tomb Raider, and Metro Exodus) showed off their implementations.
As we expected, Unity is working on supporting it too.
Not ray tracing, but from the same project at Unity.
The first commit showed up on Unity’s GitHub for their Scriptable Render Pipelines project, dated earlier today. Looking through the changes, it appears to just generate the acceleration structure based on the objects of type renderer in the current scene (as well as define the toggle properties of course). It looks like we are still a long way out.
I’m looking forward to ray tracing implementations, though. I tend to like art styles with anisotropic metal trims and soft shadows, which is difficult to get right with rasterization alone due to the reliance on other objects in the scene. In the case of metal, reflections dominate the look and feel of the material. In the case of soft shadows, you really need to keep track of how much of a light has been blocked between the rendered fragment and the non-point light.
And yes, it will depend on the art style, but mine just happens to be computationally expensive.
O Rayly? Ya Rayly. No Ray!
Microsoft has just announced a raytracing extension to DirectX 12, called DirectX Raytracing (DXR), at the 2018 Game Developer's Conference in San Francisco.
The goal is not to completely replace rasterization… at least not yet. This effect will be mostly implemented for effects that require supplementary datasets, such as reflections, ambient occlusion, and refraction. Rasterization, the typical way that 3D geometry gets drawn on a 2D display, converts triangle coordinates into screen coordinates, and then a point-in-triangle test runs across every sample. This will likely occur once per AA sample (minus pixels that the triangle can’t possibly cover -- such as a pixel outside of the triangle's bounding box -- but that's just optimization).
For rasterization, each triangle is laid on a 2D grid corresponding to the draw surface.
If any sample is in the triangle, the pixel shader is run.
This example shows the rotated grid MSAA case.
A program, called a pixel shader, is then run with some set of data that the GPU could gather on every valid pixel in the triangle. This set of data typically includes things like world coordinate, screen coordinate, texture coordinates, nearby vertices, and so forth. This lacks a lot of information, especially things that are not visible to the camera. The application is free to provide other sources of data for the shader to crawl… but what?
- Cubemaps are useful for reflections, but they don’t necessarily match the scene.
- Voxels are useful for lighting, as seen with NVIDIA’s VXGI and VXAO.
This is where DirectX Raytracing comes in. There’s quite a few components to it, but it’s basically a new pipeline that handles how rays are cast into the environment. After being queued, it starts out with a ray-generation stage, and then, depending on what happens to the ray in the scene, there are close-hit, any-hit, and miss shaders. Ray generation allows the developer to set up how the rays are cast, where they call an HLSL instrinsic instruction, TraceRay (which is a clever way of invoking them, by the way). This function takes an origin and a direction, so you can choose to, for example, cast rays only in the direction of lights if your algorithm was to, for instance, approximate partially occluded soft shadows from a non-point light. (There are better algorithms to do that, but it's just the first example that came off the top of my head.) The close-hit, any-hit, and miss shaders occur at the point where the traced ray ends.
To connect this with current technology, imagine that ray-generation is like a vertex shader in rasterization, where it sets up the triangle to be rasterized, leading to pixel shaders being called.
Even more interesting – the close-hit, any-hit, and miss shaders can call TraceRay themselves, which is used for multi-bounce and other recursive algorithms (see: figure above). The obvious use case might be reflections, which is the headline of the GDC talk, but they want it to be as general as possible, aligning with the evolution of GPUs. Looking at NVIDIA’s VXAO implementation, it also seems like a natural fit for a raytracing algorithm.
Speaking of data structures, Microsoft also detailed what they call the acceleration structure. Each object is composed of two levels. The top level contains per-object metadata, like its transformation and whatever else data that the developer wants to add to it. The bottom level contains the geometry. The briefing states, “essentially vertex and index buffers” so we asked for clarification. DXR requires that triangle geometry be specified as vertex positions in either 32-bit float3 or 16-bit float3 values. There is also a stride property, so developers can tweak data alignment and use their rasterization vertex buffer, as long as it's HLSL float3, either 16-bit or 32-bit.
As for the tools to develop this in…
Microsoft announced PIX back in January 2017. This is a debugging and performance analyzer for 64-bit, DirectX 12 applications. Microsoft will upgrade it to support DXR as soon as the API is released (specifically, “Day 1”). This includes the API calls, the raytracing pipeline resources, the acceleration structure, and so forth. As usual, you can expect Microsoft to support their APIs with quite decent – not perfect, but decent – documentation and tools. They do it well, and they want to make sure it’s available when the API is.
Example of DXR via EA's in-development SEED engine.
In short, raytracing is here, but it’s not taking over rasterization. It doesn’t need to. Microsoft is just giving game developers another, standardized mechanism to gather supplementary data for their games. Several game engines have already announced support for this technology, including the usual suspects of anything top-tier game technology:
- Frostbite (EA/DICE)
- SEED (EA)
- 3DMark (Futuremark)
- Unreal Engine 4 (Epic Games)
- Unity Engine (Unity Technologies)
They also said, “and several others we can’t disclose yet”, so this list is not even complete. But, yeah, if you have Frostbite, Unreal Engine, and Unity, then you have a sizeable market as it is. There is always a question about how much each of these engines will support the technology. Currently, raytracing is not portable outside of DirectX 12, because it’s literally being announced today, and each of these engines intend to support more than just Windows 10 and Xbox.
Still, we finally have a standard for raytracing, which should drive vendors to optimize in a specific direction. From there, it's just a matter of someone taking the risk to actually use the technology for a cool work of art.
If you want to read more, check out Ryan's post about the also-announced RTX, NVIDIA's raytracing technology.
Subject: Graphics Cards | March 28, 2017 - 04:32 PM | Scott Michaud
Tagged: vulkan, DirectX 12, Futuremark, 3dmark
The latest update to 3DMark adds Vulkan support to its API Overhead test, which attempts to render as many simple objects as possible while keeping above 30 FPS. This branch of game performance allows developers to add more objects into a scene, and design these art assets in a more simple, straight-forward way. This is, now, one of the first tests that can directly compare DirectX 12 and Vulkan, which we expect to be roughly equivalent, but we couldn’t tell for sure.
While I wasn’t able to run the tests myself, Luca Rocchi of Ocaholic gave it a shot on their Core i7-5820K and GTX 980. Apparently, Vulkan was just under 10% faster than DirectX 12 in their results, reaching 22.6 million draw calls in Vulkan, but 20.6 million in DirectX 12. Again, this is one test, done by a third-party, for a single system, and a single GPU driver, on a single 3D engine, and one that is designed to stress a specific portion of the API at that; take it with a grain of salt. Still, this suggests that Vulkan can keep pace with the slightly-older DirectX 12 API, and maybe even beat it.
This update also removed Mantle support. I just thought I’d mention that.
Linked Multi-GPU Arrives... for Developers
The Khronos Group has released the Vulkan 126.96.36.199 specification, which includes experimental (more on that in a couple of paragraphs) support for VR enhancements, sharing resources between processes, and linking similar GPUs. This spec was released alongside a LunarG SDK and NVIDIA drivers, which are intended for developers, not gamers, that fully implement these extensions.
I would expect that the most interesting feature is experimental support for linking similar GPUs together, similar to DirectX 12’s Explicit Linked Multiadapter, which Vulkan calls a “Device Group”. The idea is that the physical GPUs hidden behind this layer can do things like share resources, such as rendering a texture on one GPU and consuming it in another, without the host code being involved. I’m guessing that some studios, like maybe Oxide Games, will decide to not use this feature. While it’s not explicitly stated, I cannot see how this (or DirectX 12’s Explicit Linked mode) would be compatible in cross-vendor modes. Unless I’m mistaken, that would require AMD, NVIDIA, and/or Intel restructuring their drivers to inter-operate at this level. Still, the assumptions that could be made with grouped devices are apparently popular with enough developers for both the Khronos Group and Microsoft to bother.
A slide from Microsoft's DirectX 12 reveal, long ago.
As for the “experimental” comment that I made in the introduction... I was expecting to see this news around SIGGRAPH, which occurs in late-July / early-August, alongside a minor version bump (to Vulkan 1.1).
I might still be right, though.
The major new features of Vulkan 188.8.131.52 are implemented as a new classification of extensions: KHX. In the past, vendors, like NVIDIA and AMD, would add new features as vendor-prefixed extensions. Games could query the graphics driver for these abilities, and enable them if available. If these features became popular enough for multiple vendors to have their own implementation of it, a committee would consider an EXT extension. This would behave the same across all implementations (give or take) but not be officially adopted by the Khronos Group. If they did take it under their wing, it would be given a KHR extension (or added as a required feature).
The Khronos Group has added a new layer: KHX. This level of extension sits below KHR, and is not intended for production code. You might see where this is headed. The VR multiview, multi-GPU, and cross-process extensions are not supposed to be used in released video games until they leave KHX status. Unlike a vendor extension, the Khronos Group wants old KHX standards to drop out of existence at some point after they graduate to full KHR status. It’s not something that NVIDIA owns and will keep it around for 20 years after its usable lifespan just so old games can behave expectedly.
How long will that take? No idea. I’ve already mentioned my logical but uneducated guess a few paragraphs ago, but I’m not going to repeat it; I have literally zero facts to base it on, and I don’t want our readers to think that I do. I don’t. It’s just based on what the Khronos Group typically announces at certain trade shows, and the length of time since their first announcement.
The benefit that KHX does bring us is that, whenever these features make it to public release, developers will have already been using it... internally... since around now. When it hits KHR, it’s done, and anyone can theoretically be ready for it when that time comes.
Subject: Graphics Cards | October 5, 2016 - 09:01 PM | Scott Michaud
Tagged: amd, frame pacing, DirectX 12
When I first read this post, it was on the same day that AMD released their Radeon Software Crimson Edition 16.10.1 drivers, although it was apparently posted the day prior. As a result, I thought that their reference to 16.9.1 was a typo, but it apparently wasn't. These changes have been in the driver for a month, at least internally, but it's unclear how much it was enabled until today. (The Scott Wasson video suggests 16.10.1.) It would have been nice to see it on their release notes as a new feature, but at least they made up for it with a blog post and a video.
If you don't recognize him, Scott Wasson used to run The Tech Report, and he shared notes with Ryan while we were developing our Frame Rating testing methodology. He was focused on benchmarking GPUs by frame time, rather than frame rate, because the number of frames that the user sees means less than how smooth the animation they present is. Our sites diverged on implementation, though, as The Tech Report focused on software, while Ryan determined that capturing and analyzing output frames, intercepted between the GPU and the monitor, would tell a more complete story. Regardless, Scott Wasson left his site to work for AMD last year, with the intent to lead User Experience.
We're now seeing AMD announce frame pacing for DirectX 12 Multi-GPU.
This feature particularly interesting, because, depending on the multi-adapter mode, a lot of that control should be in the hands of the game developers. It seems like the three titles they announced, 3D Mark: Time Spy, Rise of the Tomb Raider, and Total War: Warhammer, would be using implicit linked multi-adapter, which basically maps to CrossFire. I'd be interested to see if they can affect this in explicit mode via driver updates as well, but we'll need to wait and see for that (and there isn't many explicit mode titles anyway -- basically just Ashes of the Singularity for now).
If you're interested to see how multi-GPU load-balancing works, we published an animation a little over a month ago that explains three different algorithms, and how explicit APIs differ from OpenGL and DirectX 11. It is also embedded above.
Why Two 4GB GPUs Isn't Necessarily 8GB
We're trying something new here at PC Perspective. Some topics are fairly difficult to explain cleanly without accompanying images. We also like to go fairly deep into specific topics, so we're hoping that we can provide educational cartoons that explain these issues.
This pilot episode is about load-balancing and memory management in multi-GPU configurations. There seems to be a lot of confusion around what was (and was not) possible with DirectX 11 and OpenGL, and even more confusion about what DirectX 12, Mantle, and Vulkan allow developers to do. It highlights three different load-balancing algorithms, and even briefly mentions what LucidLogix was attempting to accomplish almost ten years ago.
If you like it, and want to see more, please share and support us on Patreon. We're putting this out not knowing if it's popular enough to be sustainable. The best way to see more of this is to share!
Subject: Graphics Cards | August 2, 2016 - 07:37 AM | Scott Michaud
Tagged: windows 10, vulkan, microsoft, DirectX 12
Update (August 3rd @ 4:30pm): Turns out Khronos Group announced at SIGGRAPH that Subgroup Instructions have been recently added to SPIR-V (skip video to 21:30), and are a "top priority" for "Vulkan Next". Some (like WaveBallot) are already ARB (multi-vendor) OpenGL extensions, too.
Original post below:
DirectX 12's shading language will receive some new functionality with the new Shader Model 6.0. According to their GDC talks, it is looking like it will be structured similar to SPIR-V in how it's compiled and ingested. Code will be compiled and optimized as an LLVM-style bytecode, which the driver will accept and execute on the GPU. This could make it easy to write DX12-compatible shader code in other languages, like C++, which is a direction that Vulkan is heading, but Microsoft hasn't seemed to announce that yet.
This news shows a bit more of the nitty gritty details. It looks like they added 16-bit signed (short) and unsigned (ushort) integers, which might provide a performance improvement on certain architectures (although I'm not sure that it's new and/or GPUs exist the natively operate upon them) because they operate on half of the data as a standard, 32-bit integer. They have also added more functionality, to both the pixel and compute shaders, to operate in multiple threads, called lanes, similar to OpenCL. This should allow algorithms to work more efficiently in blocks of pixels, rather than needing to use one of a handful of fixed function calls (ex: partial derivates ddx and ddy) to see outside their thread.
When will this land? No idea, but it is conspicuously close to the Anniversary Update. It has been added to Feature Level 12.0, so its GPU support should be pretty good. Also, Vulkan exists, doing its thing. Not sure how these functions overlap with SPIR-V's feature set, but, since SPIR was original for OpenCL, it could be just sitting there for all I know.
Subject: Graphics Cards | April 14, 2016 - 06:17 PM | Scott Michaud
Tagged: microsoft, windows 10, uwp, DirectX 12, dx12
At the PC Gaming Conference from last year's E3 Expo, Microsoft announced that they were looking to bring more first-party titles to Windows. They used to be one of the better PC gaming publishers, back in the Mechwarrior 4 and earlier Flight Simulator days, but they got distracted as Xbox 360 rose and Windows Vista fell.
Again, part of that is because they attempted to push users to Windows Vista and Games for Windows Live, holding back troubled titles like Halo 2: Vista and technologies like DirectX 10 from Windows XP, which drove users to Valve's then-small Steam platform. Epic Games was also a canary in the coalmine at that time, warning users that Microsoft was considering certification for Games for Windows Live, which threatened mod support “because Microsoft's afraid of what you might put into it”.
It's sometimes easy to conform history to fit a specific viewpoint, but it does sound... familiar.
Anyway, we're glad that Microsoft is bringing first-party content to the PC, and they are perfectly within their rights to structure it however they please. We are also within our rights to point out its flaws and ask for them to be corrected. Turns out that Quantum Break, like Gears of War before it, has some severe performance issues. Let's be clear, these will likely be fixed, and I'm glad that Microsoft didn't artificially delay the PC version to give the console an exclusive window. Also, had they delayed the PC version until it was fixed, we wouldn't have known whether it needed the time.
Still, the game apparently has issues with a 50 FPS top-end cap, on top of pacing-based stutters. One concern that I have is, because DigitalFoundry is a European publication, perhaps the 50Hz issue might be caused by their port being based on a PAL version of the game??? Despite suggesting it, I would be shocked if that were the case, but I'm just trying to figure out why anyone would create a ceiling at that specific interval. They are also seeing NVIDIA's graphics drivers frequently crash, which probably means that some areas of their DirectX 12 support are not quite what the game expects. Again, that is solvable by drivers.
It's been a shaky start for both DirectX 12 and the Windows 10 UWP platform. We'll need to keep waiting and see what happens going forward. I hope this doesn't discourage Microsoft too much, but also that they robustly fix the problems we're discussing.
Subject: Graphics Cards | March 9, 2016 - 07:42 PM | Scott Michaud
Tagged: amd, radeon, graphics drivers, vulkan, dx12, DirectX 12
New graphics drivers from AMD have just been published, and it's a fairly big release. First, Catalyst 16.3 adds Vulkan support to main-branch drivers, which they claim is conformant to the 1.0 specification. The Khronos Group website still doesn't list AMD as conforming, but I assume that they will be added shortly (rather than some semantic “conformant” “fully conformant” thing going on). This is great for the platform, as we are still in the launch window of DirectX 12.
Performance has apparently increased as well, significantly. This is especially true in the DirectX 12 title, Gears of War Ultimate Edition. AMD claims that FuryX will see up to a 60% increase in that title, and the R9 380 will gain up to 44%. It's unclear how much that is in real world performance, especially in terms of stutter and jank, which apparently plagues that game.
The driver also has a few other interesting features. One that I don't quite understand is “Power Efficiency Toggle”. This supposedly “allows the user to disable some power efficiency optimizations”. I would assume that means keeping you GPU up-clocked under certain conditions, but I don't believe that was much of an issue for the last few generations. That said, the resolved issues section claims that some games were choppy because of core clock fluctuation, and lists this option as the solution, so maybe it was. It is only available on “select” Radeon 300 GPUs and Fury X. That is, Fury X specifically, not the regular Fury or the Nano. I expect Ryan will be playing around with it in the next little while.
Last of the main features, the driver adds support for XConnect, which is AMD's new external graphics standard. It requires a BIOS that support external GPUs, which AMD lists the Razer Blade Stealth as. Also noteworthy, Eyefinity can now be enabled with just two displays, and Display Scaling can be set per-game. I avoid manually controlling drivers, even my Wacom tablet, to target specific applications, but that's probably great for those who do.
As a final note: the Ashes of the Singularity 2.0 benchmark now supports DirectFlip.
If you have a recent AMD GPU, grab the drivers from AMD's website.
Subject: General Tech | March 9, 2016 - 05:40 PM | Scott Michaud
Tagged: Oxide Games, ashes of the singularity, Star Swarm, dx12, DirectX 12
Ashes of the Singularity, by Oxide Games and Stardock, the RTS that spawned from the Star Swarm demo, will be released on March 31st. Unless something sneaks in before then, this will pretty much be our first look at DirectX 12 in a full, released game, and pretty much the only one to take advantage of its ability to draw many simple objects.
Again, I'm excluding released games based on engines Unreal Engine 4, because you don't have full DirectX 12 support if your engine provider doesn't claim full DirectX 12 support. I'm pretty sure they just enabled Epic's experimental feature, rather than beat them to overhauling their hardware interfaces. I'm also ignoring Gears of War Ultimate Edition because of the state it launched in.
It seems like the only source for this news is PC Gamer. Stardock hasn't officially said what will change in this launch from the previous beta release (Beta 2). The game currently supports mixed multi-GPU on DirectX 12, and a variety of other features, which will be interesting to see in official software. Unless we get a surprise in the official announcement, it looks like Vulkan might be a “patch after launch” thing, though.