Who Should Care? Thankfully, Many People

Khronos has released information about glNext, now called Vulkan.

The Khronos Group has made three announcements today: Vulkan (their competitor to DirectX 12), OpenCL 2.1, and SPIR-V. Because there is actually significant overlap, we will discuss them in a single post rather than splitting them up. Each has a role in the overall goal to access and utilize graphics and compute devices.

Before we get into what everything is and does, let's give you a little tease to keep you reading. First, Khronos designs their technologies to be self-reliant. As such, while there will be some minimum hardware requirements, the OS pretty much just needs to have a driver model. Vulkan will not be limited to Windows 10 and similar operating systems. If a graphics vendor wants to go through the trouble, which is a gigantic if, Vulkan can be shimmed into Windows 8.x, Windows 7, possibly Windows Vista despite its quirks, and maybe even Windows XP. The words “and beyond” came up after Windows XP, but don't hold your breath for Windows ME or anything. Again, the further back in Windows versions you get, the larger the “if” becomes but at least the API will not have any “artificial limitations”.

Outside of Windows, the Khronos Group is the dominant API curator. Expect Vulkan on Linux, Mac, mobile operating systems, embedded operating systems, and probably a few toasters somewhere.

On that topic: there will not be a “Vulkan ES”. Vulkan is Vulkan, and it will run on desktop, mobile, VR, consoles that are open enough, and even cars and robotics. From a hardware side, the API requires a minimum of OpenGL ES 3.1 support. This is fairly high-end for mobile GPUs, but it is the first mobile spec to require compute shaders, which are an essential component of Vulkan. The presenter did not state a minimum hardware requirement for desktop GPUs, but he treated it like a non-issue. Graphics vendors will need to be the ones making the announcements in the end, though.

What Is A Graphics / Compute API?

Applications are bundles of instructions that operate on data, either packaged with it or gathered as the application runs. CPUs follow these series of instructions very quickly, pretty much one at a time, and in order. Each such series is called a thread, and modern CPUs can run several of them at any given time. It is common to find consumer CPUs that can run anywhere from two to eight threads at once.

Sometimes, your application runs across big tasks, like “calculate the color of every pixel” or “move every point (vertex) in a 3D model by some amount”. These tasks add up to a lot of math, but each task is made up of mostly similar instructions. Modern GPUs can have thousands of cores, which is great when you are dealing with screens that have two million pixels (1080p) or more, many of which are calculated multiple times because of overlapping geometry and complicated effects. It is just as useful for 2D and 3D scenes that are made up of thousands or millions of triangles, each with three vertices (albeit many are shared with neighboring triangles).
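
To put a number on “two million pixels”, here is a minimal Python sketch of the idea that one small routine gets applied to every pixel. It is not GPU code: it runs on the CPU, one pixel at a time, and the `shade` and `render` functions and the tiny 4×2 canvas are made up for illustration.

```python
# Toy illustration of data-parallel work: one small function applied to every pixel.
# This runs on the CPU, one pixel after another; a GPU would run many at once.

WIDTH_1080P, HEIGHT_1080P = 1920, 1080
PIXELS_1080P = WIDTH_1080P * HEIGHT_1080P  # 2,073,600 pixels per frame


def shade(x, y, width, height):
    """Hypothetical per-pixel routine: red grows left to right, green top to bottom."""
    red = x / (width - 1)
    green = y / (height - 1)
    return (red, green, 0.0)


def render(width, height):
    """Apply the same instructions to every pixel of a small canvas."""
    return [[shade(x, y, width, height) for x in range(width)] for y in range(height)]


image = render(4, 2)  # 8 pixels here; a 1080p frame needs PIXELS_1080P of them
```

Every call to `shade` is independent of the others, which is what lets a GPU spread the work across its cores.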

To do this, the drawing of 3D objects is split into stages, some of which accept programmable scripts called “shaders”. These shaders are associated with a material, such as water, or stone, or dirt. To achieve any given effect, software engineers think of what series of instructions will produce the effect they are going for, be it based in real-world physics, or even nonsense that creates something fantastical (or something that looks realistic but is just a hacky trick).

Some common shaders are:

  1. Vertex shader
  • A series of instructions for every affected vertex (give or take)
  • It is the first programmable rendering stage for an object
  2. Geometry shader
  • A series of instructions for every affected primitive (triangles, lines, points, etc.)
  • It usually runs after tessellation, which runs after the Vertex shader
  3. Fragment (Pixel) shader
  • A series of instructions for every rasterized pixel
  • It runs after the Geometry shader
  4. Compute shader
  • A series of commands for every whatever-the-programmer-wants
  • It runs on its own, outside the typical rendering process
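
As a rough illustration of the vertex stage, here is a small Python sketch of a quad built from two triangles that share two vertices. This is plain CPU code, not a real shading language, and `vertex_shader` and the offset are hypothetical names and values chosen for the example.

```python
# Toy vertex stage: a quad made of two triangles that share two of their vertices.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]
indices = [(0, 1, 2), (0, 2, 3)]  # each triangle names its three vertices by index


def vertex_shader(position, offset):
    """Hypothetical per-vertex routine: move the vertex by a fixed offset."""
    return tuple(p + o for p, o in zip(position, offset))


# The routine runs once per unique vertex (4 times), not once per triangle corner (6).
moved = [vertex_shader(v, (0.5, 0.0, 0.0)) for v in vertices]

# Reassemble the triangles from the transformed vertices for the later stages.
triangles = [tuple(moved[i] for i in tri) for tri in indices]
```

Because the two triangles share vertices 0 and 2, the per-vertex work is done four times rather than six.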

If you wish to see a Fragment shader in action, a single 185-line script generates an ocean scene complete with a sky, waves, sub-surface scattering, and so forth. It is available at Shadertoy and should run in any WebGL-compliant browser. It runs once per canvas pixel. You can also edit the values on the right and click the play button at the bottom left of the text box to see how your changes affect the scene (or break it).

For some reason, my Firefox locks up after a few seconds of this, but Google Chrome works fine. I am not exactly sure what either Shadertoy or Firefox is doing wrong, but it happens a lot at their site unfortunately.

So What Does Vulkan Do Better and How Does It Differ from Mantle?

A graphics API's primary job is to do all of the tasks required for a developer to submit their objects to be drawn, according to their geometry and materials. Mostly, this means keeping the unified shader cores of the GPU loaded with as much relevant work as they can. This helps increase the number of batches per frame, which is often one batch per object, per material.

This is what Mantle and DirectX 12 flaunt: lots of draw calls for lots of objects in a scene. It also allows reduced CPU usage as each thread has less work to do, and multiple cores can pitch in, all of which leads to less power consumption (especially useful for mobile apps that draw a lot of simple objects). Vulkan does all of this. Creating “Command Buffers” can be done on multiple threads while another thread assembles and manages a queue of commands to push to the GPU.
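
The shape of that arrangement can be sketched in a few lines of Python. To be clear, this is not the Vulkan API (which has not been released): `record_command_buffer`, `build_frame`, and the `("draw", name)` tuples are all invented for illustration, and Python threads show the structure rather than any real speed-up.

```python
import queue
import threading


def record_command_buffer(objects, out_queue):
    """Worker thread: turn a slice of the scene into a list of draw commands."""
    commands = [("draw", name) for name in objects]
    out_queue.put(commands)


def build_frame(scene, worker_count):
    """Record command buffers on several threads, then gather them in one place."""
    out_queue = queue.Queue()
    # Hand each worker every worker_count-th object so the load is roughly even.
    slices = [scene[i::worker_count] for i in range(worker_count)]
    workers = [
        threading.Thread(target=record_command_buffer, args=(part, out_queue))
        for part in slices
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    # A single thread drains the queue, standing in for submission to the GPU.
    submitted = []
    while not out_queue.empty():
        submitted.extend(out_queue.get())
    return submitted


scene = [f"object_{i}" for i in range(10)]
frame = build_frame(scene, worker_count=4)
```

The order in which the buffers arrive depends on thread scheduling, so a real application would decide the submission order explicitly.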

Vulkan even allows the game developer to disable most of the error checking for production code. Rather than having the API look over the game code's shoulder to make sure it's not going to crash itself, like OpenGL and OpenGL ES do, all of that debugging cruft can be unhooked from shipped games. With this, the driver can be simple and the developer does not need to wait for the driver to make sure that the developer isn't doing what the developer already checked to make sure didn't happen before it sent its request to the driver. Wow, that sentence seems like an awful waste of time, doesn't it?
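
Here is one way to picture checks that can be unhooked, sketched in Python. It says nothing about how Vulkan will actually implement this; `raw_draw`, `with_validation`, and `make_draw` are hypothetical names, and the single check shown is just an example.

```python
def raw_draw(vertex_count):
    """Fast path: trusts the caller completely, as a thin driver would."""
    return f"drew {vertex_count} vertices"


def with_validation(draw_fn):
    """Development-only wrapper that checks the argument before forwarding the call."""
    def checked(vertex_count):
        if not isinstance(vertex_count, int) or vertex_count <= 0:
            raise ValueError("vertex_count must be a positive integer")
        return draw_fn(vertex_count)
    return checked


def make_draw(debug):
    """Hook the checks in for development builds, leave them out when shipping."""
    return with_validation(raw_draw) if debug else raw_draw


debug_draw = make_draw(debug=True)     # catches bad calls while developing
release_draw = make_draw(debug=False)  # the same function with no checking at all
```

In the release configuration the caller gets `raw_draw` itself, so no checking cost remains on the hot path.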

But all of that makes the CPU side more efficient.

The Khronos Group has tweaked the GPU side as well. Before now, we had a war between GLSL in OpenGL and HLSL in DirectX. HLSL was quite popular because Windows and Xbox were very popular platforms to target a game for. When AMD made Mantle, one of their selling points was that the Mantle shading language was just HLSL. This meant that game developers could keep using the shader language that they know, and AMD would not need to maintain a whole separate compiler/interpreter chain.

When you are not bound by draw call limitations, another bottleneck might just be how much you can push through the shader cores. A good example of increasing the load on shaders would be increasing resolution (unless you run out of video memory or video memory bandwidth first). When AMD chose HLSL, it meant that they could make shaders for Mantle run just as fast as shaders for DirectX. This means that they would not be sacrificing top-end, GPU-bound performance for low-end, draw call-bound, CPU performance.

Khronos is doing something else entirely.

Rather than adopting HLSL or hoping that driver developers could maintain parity between GLSL and it, they removed shader source compilation from the driver altogether. Pulling in their work in OpenCL 2, shaders in Vulkan will be compiled by the game developers into a bytecode called SPIR-V, the successor to the LLVM-based SPIR. All the driver needs to do is accept this pre-compiled intermediate representation and translate it for their architecture.

This means a few interesting things. First, the game developer can write their source code in pretty much any language they want. HLSL and GLSL are based on a subset of C, while the source for a SPIR-V shader can be a subset of C++, some proprietary scripting language, or even HLSL. Khronos is currently working on a GLSL-to-SPIR-V compiler.
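
The split between “compile ahead of time” and “driver consumes bytecode” can be shown with a toy in Python. This is emphatically not SPIR-V: the five-instruction stack bytecode, `compile_to_bytecode`, and `run_bytecode` are all invented here, and only addition and multiplication are supported.

```python
import ast


def compile_to_bytecode(source):
    """Developer-side step: turn a tiny arithmetic expression into stack bytecode."""
    ops = {ast.Add: "ADD", ast.Mult: "MUL"}
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            code.append(("LOAD", node.id))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)
            emit(node.right)
            code.append((ops[type(node.op)], None))
        else:
            raise ValueError("unsupported construct in this toy language")

    emit(ast.parse(source, mode="eval").body)
    return code


def run_bytecode(code, inputs):
    """Driver-side step: execute the bytecode without ever seeing the source text."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(inputs[arg])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
    return stack.pop()


bytecode = compile_to_bytecode("brightness * 2 + 1")   # done once, ahead of time
result = run_bytecode(bytecode, {"brightness": 3})     # done by the consumer
```

Any front end that emits the same instructions would work with `run_bytecode`, which is the point: the consumer only has to understand the bytecode, not the language it came from.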

Not only does this decrease shader compilation time, because SPIR-V is pre-compiled, but this probably means that a SPIR-V shader will be much simpler than either GLSL or HLSL. These shader languages have constructs for various vector and matrix maths, texture processing, and so forth. To see what I mean, check out pages 9 through 12 of the OpenGL 4.5 Quick Reference Card.

Yes, I said four pages of a quick reference card.

While I have not seen the SPIR-V spec, it sounds a lot like most of this will be relegated to libraries that can be compiled into the program. This lets the developer choose the sort of math algorithms that their Vulkan application will use to perform any arbitrary computation. I asked the presenter if Khronos would provide libraries for developers to use in their Vulkan shader application, and they said that such a library already exists for OpenCL 2.0, and an optimized version ships with OpenCL 2.1.

Rephrasing the above paragraph: rather than baking all the complex math functions into the driver, such as complex matrix operations, the developer will have a much reduced set of instructions that they can do. They are allowed to combine those however they want, or choose an existing package to do what they want from Khronos, a graphics vendor, or even a friend. I'm not sure exactly what the simpler baseline will be, though.

So What About OpenCL 2.1?

The other part of this discussion is OpenCL 2.1. They added a few interesting things to the spec, but frankly Vulkan is the main thing our readers are here for. What I will discuss here, though, is that both OpenCL 2.1 and Vulkan accept the same SPIR-V bytecode. While Khronos did not mention this, it should mean that an efficient SPIR-V bytecode interpreter for OpenCL 2.1 could carry over optimizations to Vulkan, much like how AMD reused DirectX shader optimizations in Mantle (only Vulkan and OpenCL do not need to worry about the compiler half — just the interpreter).

OpenCL 2.1 still accepts OpenCL 1.x kernel code too. This is probably for developers who have old code that they don't want to compile into SPIR-V, and would rather use the legacy route.

But this is why I said the announcement was heavily overlapped. OpenCL and Vulkan both use SPIR-V bytecode as their “send it to the GPU” language. From what I understand, OpenCL is a bit less relaxed than Vulkan in terms of error checking and, because it doesn't care about graphics, can be used on FPGAs, CPUs, and other compute devices. Vulkan is more designed for workloads which mix high-performance graphics with compute.

The Death of OpenGL and OpenGL ES? No. (And Conclusions)

Khronos Group has also stressed that OpenGL and OpenGL ES are not going away. Some people will want to write applications on a platform that performs error checking and so forth. These APIs are still important to them, and will evolve as Khronos continually finds new directions for them to go.

One last thing: earlier, we mentioned that Vulkan would allow shipped products to unhook their error checking code. If it will crash, let it crash. I asked the presenter whether this meant that they would unhook various robustness features. They said no. Vulkan apps will not be able to hang the GPU or break into other applications' memory space to spy on them. I believe the quote was: “Oh yeah. We know how to do robustness.” This is very important for applications like web browsers, which accept arbitrary code from equally arbitrary places on the internet. (Note that Vulkan is not a Web API like WebGL or WebCL, but it could be used by web browsers themselves to massively speed up page rendering, 2D canvas, and so forth.)

In all, this could be interesting. Unlike Microsoft with DirectX, Khronos is allowing Vulkan to evolve outside of OpenGL. This could let them experiment in directions that Microsoft might not be able to. With DirectX 12 being “the next DirectX”, they seem to be admitting that it will need to be suitable for all developers once DirectX 11 gets deprecated. This might leave Microsoft with error-checking overhead that Khronos can chuckle at from the sidelines.

We will see as GDC goes on how this will play out.

OpenCL 2.1 and SPIR-V specs have been released today. Vulkan has not, but they feel the urgency and know that it must be out before the end of 2015. The presenter was also hinting that “before the end of 2015” might be much sooner than it sounds.