Who Should Care? Thankfully, Many People
Khronos has released information about glNext, now called Vulkan.
The Khronos Group has made three announcements today: Vulkan (their competitor to DirectX 12), OpenCL 2.1, and SPIR-V. Because there is actually significant overlap, we will discuss them in a single post rather than splitting them up. Each has a role in the overall goal to access and utilize graphics and compute devices.
Before we get into what everything is and does, let's give you a little tease to keep you reading. First, Khronos designs their technologies to be self-reliant. As such, while there will be some minimum hardware requirements, the OS pretty much just needs to have a driver model. Vulkan will not be limited to Windows 10 and similar operating systems. If a graphics vendor wants to go through the trouble, which is a gigantic if, Vulkan can be shimmed into Windows 8.x, Windows 7, possibly Windows Vista despite its quirks, and maybe even Windows XP. The words “and beyond” came up after Windows XP, but don't hold your breath for Windows ME or anything. Again, the further back in Windows versions you get, the larger the “if” becomes but at least the API will not have any “artificial limitations”.
Outside of Windows, the Khronos Group is the dominant API curator. Expect Vulkan on Linux, Mac, mobile operating systems, embedded operating systems, and probably a few toasters somewhere.
On that topic: there will not be a “Vulkan ES”. Vulkan is Vulkan, and it will run on desktop, mobile, VR, consoles that are open enough, and even cars and robotics. From a hardware side, the API requires a minimum of OpenGL ES 3.1 support. This is fairly high-end for mobile GPUs, but it is the first mobile spec to require compute shaders, which are an essential component of Vulkan. The presenter did not state a minimum hardware requirement for desktop GPUs, but he treated it like a non-issue. Graphics vendors will need to be the ones making the announcements in the end, though.
What Is A Graphics / Compute API?
Applications are bundles of instructions that operate on data, which is either packaged with the application or gathered as it runs. CPUs follow a series of instructions very quickly, pretty much one at a time, and in order. Each such series is called a thread, and modern CPUs can run several of them at any given time. It is common to find consumer CPUs that can run anywhere from two to eight threads at once.
Sometimes, your application runs across big tasks, like “calculate the color of every pixel” or “move every point (vertex) in a 3D model by some amount”. These tasks add up to a lot of math, but each one is made up of mostly similar instructions. Modern GPUs can have thousands of cores, which is great when you are dealing with screens that have two million pixels (1080p) or more, many of which are calculated multiple times because of overlapping geometry and complicated effects. The same goes for 2D and 3D scenes that are made up of thousands or millions of triangles, each with three vertices (although many are shared with neighboring triangles).
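To put rough numbers on that workload, here is a minimal Python sketch. The frame rate, the overdraw factor, and the grid-shaped mesh are illustrative assumptions, not figures from the presentation, and the function names are made up for this example.

```python
def pixels_per_frame(width, height):
    """Number of pixels in one frame at the given resolution."""
    return width * height


def shader_invocations_per_second(width, height, fps, overdraw=1.0):
    """Estimate how often a per-pixel routine runs each second.

    fps and overdraw are assumptions chosen for illustration; overdraw > 1
    models pixels that are shaded more than once because geometry overlaps.
    """
    return int(pixels_per_frame(width, height) * overdraw * fps)


def grid_mesh_counts(columns, rows):
    """Triangle and vertex counts for a flat grid of columns x rows vertices.

    Returns (triangles, vertices_if_unshared, vertices_if_shared).
    """
    quads = (columns - 1) * (rows - 1)
    triangles = 2 * quads  # each quad splits into two triangles
    return triangles, 3 * triangles, columns * rows


if __name__ == "__main__":
    print(pixels_per_frame(1920, 1080))                   # 1080p pixel count
    print(shader_invocations_per_second(1920, 1080, 60))  # at an assumed 60 fps
    print(grid_mesh_counts(3, 3))                         # tiny 3x3 vertex grid
```

Even the tiny 3x3 grid shows why vertex sharing matters: eight triangles would need 24 vertices if nothing were shared, but only nine distinct points exist.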
To do this, the drawing of 3D objects is split into stages, some of which run programmable scripts called “shaders”. These shaders are associated with a material, such as water, or stone, or dirt. To achieve any given effect, software engineers work out what series of instructions will produce the result they are going for, be it based in real-world physics, or even nonsense that creates something fantastical (or something that looks realistic but is just a hacky trick).
Some common shaders are:
- Vertex shader
  - A series of instructions for every affected vertex (give or take)
  - It is the first programmable rendering stage for an object
- Geometry shader
  - A series of instructions for every affected primitive (triangles, lines, points, etc.)
  - It usually runs after tessellation, which runs after the Vertex shader
- Fragment (Pixel) shader
  - A series of instructions for every rasterized pixel
  - It runs after the Geometry shader
- Compute shader
  - A series of commands for every whatever-the-programmer-wants
  - It runs on its own, outside the typical rendering process
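As a rough illustration of "one small routine per element", the following Python sketch mimics a vertex stage and a compute dispatch on the CPU. Real shaders are written in a shading language and run on the GPU; every name here is hypothetical.

```python
def vertex_shader(vertex, offset):
    """Runs once per vertex: move the point by a fixed amount."""
    x, y, z = vertex
    dx, dy, dz = offset
    return (x + dx, y + dy, z + dz)


def run_vertex_stage(vertices, offset):
    """Apply the vertex shader to every vertex of an object.

    A GPU would run these independent calls in parallel; the loop here
    only shows that each call needs nothing from its neighbors.
    """
    return [vertex_shader(v, offset) for v in vertices]


def compute_shader(item):
    """Runs once per whatever-the-programmer-wants; here it squares a number."""
    return item * item


def run_compute(items):
    """A compute dispatch: not tied to vertices or pixels at all."""
    return [compute_shader(i) for i in items]


if __name__ == "__main__":
    triangle = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
    print(run_vertex_stage(triangle, (1, 2, 3)))
    print(run_compute([1, 2, 3]))
```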
If you wish to see a Fragment shader in action, a single 185-line script generates an ocean scene complete with a sky, waves, sub-surface scattering, and so forth. It is available at Shadertoy and should run in any WebGL-compliant browser. It runs once per canvas pixel. You can also edit the values on the right and click the play button at the bottom left of the text box to see how your changes affect the scene (or break it).
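For readers without a WebGL browser handy, here is a minimal Python sketch of the same idea: one function that is called once per pixel and returns a brightness. It is a toy stand-in, not the Shadertoy script; the wave formula, the constants, and the ASCII output are all assumptions made for illustration.

```python
import math


def fragment_shader(u, v, time):
    """Toy 'ocean' fragment routine.

    u and v are the pixel's position in [0, 1] (v = 0 is the top row).
    Returns a brightness in [0, 1].
    """
    horizon = 0.5
    if v < horizon:
        return 1.0 - v  # sky: brightest at the top
    # Sea: a sine wave across the surface, shifted over time.
    wave = 0.5 + 0.5 * math.sin(40.0 * v + 10.0 * u + time)
    return 0.6 * wave


def render_ascii(width, height, time=0.0, ramp=" .:-=+*#"):
    """Call the fragment routine once per 'pixel' (width and height >= 2)."""
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            brightness = fragment_shader(x / (width - 1), y / (height - 1), time)
            index = min(int(brightness * len(ramp)), len(ramp) - 1)
            row += ramp[index]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for line in render_ascii(48, 16):
        print(line)
```

Changing the constants inside `fragment_shader` and rerunning plays the same role as editing values in the Shadertoy text box.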
For some reason, my Firefox locks up after a few seconds of this, but Google Chrome works fine. I am not exactly sure what either Shadertoy or Firefox is doing wrong, but it happens a lot at their site unfortunately.
So What Does Vulkan Do Better and How Does It Differ from Mantle?
A graphics API's primary job is to do all of the tasks required for a developer to submit their objects to be drawn, according to their geometry and materials. Mostly, this means keeping the unified shader cores of the GPU loaded with as much relevant work as they can. This helps increase the number of batches per frame, which is often one batch per object, per material.
This is what Mantle and DirectX 12 flaunt: lots of draw calls for lots of objects in a scene. It also reduces CPU usage because each thread has less to do and multiple cores can pitch in, all of which leads to less power consumption (especially useful for mobile apps that draw a lot of simple objects). Vulkan does all of this. Creating “Command Buffers” can be done on multiple threads while another thread assembles and manages a queue of commands to push to the GPU.
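The threading pattern described above can be sketched in plain Python. This is not the Vulkan API (which was unpublished at the time of writing); the function names and the `("draw", object)` tuples are hypothetical stand-ins for recording command buffers on worker threads and submitting them from a single thread.

```python
import queue
import threading


def record_command_buffer(object_ids):
    """Worker-side job: turn a slice of the scene into a list of commands."""
    return [("draw", obj) for obj in object_ids]


def submit_frame(scene_objects, worker_count=4):
    """Record on several threads, submit from one.

    Returns the commands in the order the submission thread received them.
    """
    submission_queue = queue.Queue()
    executed = []

    def worker(chunk):
        # Each worker builds its own command buffer independently.
        submission_queue.put(record_command_buffer(chunk))

    def submitter(expected_buffers):
        # Only this thread talks to the (pretend) GPU queue.
        for _ in range(expected_buffers):
            executed.extend(submission_queue.get())

    chunks = [scene_objects[i::worker_count] for i in range(worker_count)]
    submit_thread = threading.Thread(target=submitter, args=(len(chunks),))
    workers = [threading.Thread(target=worker, args=(c,)) for c in chunks]

    submit_thread.start()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    submit_thread.join()
    return executed


if __name__ == "__main__":
    commands = submit_frame(list(range(10)), worker_count=3)
    print(len(commands), "commands submitted")
```

The order in which buffers arrive is not deterministic, which is why the sketch reports only the total count.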
Vulkan even allows the game developer to disable most of the error checking for production code. Rather than having the API look over the game code's shoulder to make sure it's not going to crash itself, like OpenGL and OpenGL ES do, all of that debugging cruft can be unhooked from shipped games. With this, the driver can be simple and the developer does not need to wait for the driver to make sure that the developer isn't doing what the developer already checked to make sure didn't happen before it sent its request to the driver. Wow, that sentence seems like an awful waste of time, doesn't it?
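One way to picture "unhookable" error checking is a thin checking wrapper that sits in front of a driver that trusts its caller. The Python sketch below is only an analogy; the class names and the `enable_validation` flag are hypothetical and say nothing about how Vulkan will actually expose this.

```python
class Device:
    """Pretend driver: does what it is told and checks nothing."""

    def __init__(self, buffer_size):
        self.memory = [0] * buffer_size

    def write(self, offset, values):
        for i, value in enumerate(values):
            self.memory[offset + i] = value


class ValidationLayer:
    """Optional wrapper that checks arguments before forwarding them."""

    def __init__(self, device):
        self.device = device

    def write(self, offset, values):
        size = len(self.device.memory)
        if offset < 0 or offset + len(values) > size:
            raise ValueError(
                "write of %d values at offset %d does not fit in %d slots"
                % (len(values), offset, size)
            )
        self.device.write(offset, values)


def create_device(buffer_size, enable_validation):
    """Development builds hook the checks in; shipped builds leave them out."""
    device = Device(buffer_size)
    return ValidationLayer(device) if enable_validation else device


if __name__ == "__main__":
    debug = create_device(4, enable_validation=True)
    try:
        debug.write(3, [1, 2])
    except ValueError as error:
        print("caught during development:", error)

    release = create_device(4, enable_validation=False)
    release.write(-1, [9])  # a bug: silently lands in the last slot
    print(release.memory)
```

The release path is faster because it skips the checks, and the negative offset shows the cost: a mistake that the debug build would have reported goes through silently.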
But all of that makes the CPU side more efficient.
The Khronos Group has tweaked the GPU side as well. Before now, we had a war between GLSL in OpenGL and HLSL in DirectX. HLSL was quite popular because Windows and Xbox were very popular platforms to target with a game. When AMD made Mantle, one of their selling points was that the Mantle shading language was just HLSL. This meant that game developers could keep using the shader language that they knew, and AMD would not need to maintain a whole separate compiler/interpreter chain.
When you are not bound by draw call limitations, another bottleneck might just be how much you can push through the shader cores. A good example of increasing the load on shaders would be increasing resolution (unless you run out of video memory or video memory bandwidth first). When AMD chose HLSL, it meant that they could make shaders for Mantle run just as fast as shaders for DirectX. This means that they would not be sacrificing top-end, GPU-bound performance for low-end, draw call-bound, CPU performance.
Khronos is doing something else entirely.
Rather than adopting HLSL or hoping that driver developers could maintain parity between GLSL and it, they removed shader compilation from the driver altogether. Pulling in their work on OpenCL 2, shaders in Vulkan will be compiled by the game developers into a bytecode called SPIR-V. (Earlier versions of SPIR were built on LLVM IR; SPIR-V is its own format, fully defined by Khronos.) All the driver needs to do is accept this pre-compiled intermediate representation and translate it for its architecture.
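The split between "developer compiles" and "driver consumes" can be shown with a toy example. The Python sketch below compiles an arithmetic expression into a made-up stack bytecode and then runs it without ever looking at the source again. The instruction names and tuple layout are invented for illustration and bear no relation to the real SPIR-V encoding.

```python
import ast

_BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def compile_shader(source):
    """Developer side: turn an arithmetic expression into stack bytecode."""
    program = []

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            emit(node.left)
            emit(node.right)
            program.append((_BINARY_OPS[type(node.op)], None))
        elif isinstance(node, ast.Name):
            program.append(("LOAD", node.id))
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            program.append(("PUSH", float(node.value)))
        else:
            raise SyntaxError("unsupported construct: " + type(node).__name__)

    emit(ast.parse(source, mode="eval").body)
    return program


def driver_execute(program, inputs):
    """Driver side: sees only the bytecode, never the source text."""
    stack = []
    for opcode, operand in program:
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "LOAD":
            stack.append(float(inputs[operand]))
        else:
            right = stack.pop()
            left = stack.pop()
            if opcode == "ADD":
                stack.append(left + right)
            elif opcode == "SUB":
                stack.append(left - right)
            elif opcode == "MUL":
                stack.append(left * right)
            elif opcode == "DIV":
                stack.append(left / right)
            else:
                raise ValueError("unknown opcode: " + opcode)
    return stack.pop()


if __name__ == "__main__":
    bytecode = compile_shader("base * 0.5 + glow")  # done once, ahead of time
    print(bytecode)
    print(driver_execute(bytecode, {"base": 0.8, "glow": 0.1}))
```

Because the parsing happens ahead of time, the run-time side is a short loop over simple instructions, and any front end that emits the same instructions would work just as well.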
This means a few interesting things. First, the game developer can write their source code in pretty much any language they want. HLSL and GLSL are based on a subset of C, while the source for a SPIR-V shader could be a subset of C++, some proprietary scripting language, or even HLSL. Khronos is currently working on a GLSL-to-SPIR-V compiler.
Not only does this decrease shader compilation time, because SPIR-V is pre-compiled, but it probably means that a SPIR-V shader will be much simpler than either GLSL or HLSL. These shader languages have constructs for various vector and matrix maths, texture processing, and so forth. To see what I mean, check out pages 9 through 12 of the OpenGL 4.5 Quick Reference Card.
Yes, I said four pages of a quick reference card.
While I have not seen the SPIR-V spec, it sounds a lot like most of this will be relegated to libraries that can be compiled into the program. This lets the developer choose the sort of math algorithms that their Vulkan application will use to perform any arbitrary computation. I asked the presenter if Khronos would provide libraries for developers to use in their Vulkan shader application, and they said that it already existed for OpenCL 2.0, and an optimized version ships with OpenCL 2.1.
Rephrasing the above paragraph: rather than baking all the complex math functions into the driver, such as complex matrix operations, the developer will have a much reduced set of instructions that they can do. They are allowed to combine those however they want, or choose an existing package to do what they want from Khronos, a graphics vendor, or even a friend. I'm not sure exactly what the simpler baseline will be, though.
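As a hedged illustration of that idea, suppose the baseline offered only a single multiply-add instruction. The richer operations could then live in a library built on top of it. The Python sketch below assumes exactly that; `mul_add` as the only primitive is a guess made for the example, not something Khronos has stated.

```python
def mul_add(a, b, c):
    """The assumed baseline primitive: a * b + c."""
    return a * b + c


def dot(u, v):
    """Library routine: dot product built only from mul_add."""
    total = 0.0
    for a, b in zip(u, v):
        total = mul_add(a, b, total)
    return total


def mat_vec(matrix, vector):
    """Library routine: matrix times vector, one dot product per row."""
    return [dot(row, vector) for row in matrix]


def mat_mul(left, right):
    """Library routine: matrix times matrix, using the columns of `right`."""
    columns = list(zip(*right))
    return [[dot(row, column) for column in columns] for row in left]


if __name__ == "__main__":
    print(mat_vec([[1, 0], [0, 1]], [3, 4]))
    print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A developer who wants a different trade-off (say, a faster but less precise routine) could swap in another library without the driver changing at all.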
So What About OpenCL 2.1?
The other part of this discussion is OpenCL 2.1. They added a few interesting things to the spec, but frankly Vulkan is the main thing our readers are here for. What I will discuss here, though, is that both OpenCL 2.1 and Vulkan accept the same SPIR-V bytecode. While Khronos did not mention this, it should mean that an efficient SPIR-V bytecode interpreter for OpenCL 2.1 could carry over optimizations to Vulkan, much like how AMD reused DirectX shader optimizations in Mantle (only Vulkan and OpenCL do not need to worry about the compiler half — just the interpreter).
OpenCL 2.1 still accepts OpenCL 1.x kernel code too. This is probably for developers who have old code that they don't want to compile into SPIR-V, and would rather use the legacy route.
But this is why I said the announcements overlap heavily. OpenCL and Vulkan both use SPIR-V bytecode as their “send it to the GPU” language. From what I understand, OpenCL is a bit less relaxed than Vulkan in terms of error checking and, because it doesn't care about graphics, it can be used on FPGAs, CPUs, and other compute devices. Vulkan is designed more for workloads that mix high-performance graphics with compute.
The Death of OpenGL and OpenGL ES? No. (And Conclusions)
Khronos Group has also stressed that OpenGL and OpenGL ES are not going away. Some people will want to write applications on a platform that performs error checking and so forth. These APIs are still important to them, and will evolve as Khronos continually finds new directions for them to go.
One last thing: earlier, we mentioned that Vulkan would allow shipped products to unhook its error checking code. If it will crash, let it crash. I asked the presenter whether this meant that they would unhook various robustness features. They said no. Vulkan apps will not be able to hang the GPU or break into other applications' memory space to spy on them. I believe the quote was: “Oh yeah. We know how to do robustness.” This is very important for applications like web browsers, which accept arbitrary code from equally arbitrary places on the internet. (Note that Vulkan is not a Web API like WebGL or WebCL, but it could be used by web browsers themselves to massively speed up page rendering, 2D canvas, and so forth.)
In all, this could be interesting. Unlike DirectX, Khronos is allowing Vulkan to evolve outside of OpenGL. This could let them experiment in directions that Microsoft might not be able to. With DirectX 12 being “the next DirectX”, they seem to be admitting that it will need to be suitable for all developers once DirectX 11 gets deprecated. This might leave Microsoft with error-checking overhead that Khronos can chuckle at from the sidelines.
We will see as GDC goes on how this will play out.
OpenCL 2.1 and SPIR-V specs have been released today. Vulkan has not, but they feel the urgency and know that it must be out before the end of 2015. The presenter was also hinting that “before the end of 2015” might be much sooner than it sounds.
ok, lot of information.
If I understand it right, Vulkan is another specialized API. While the upcoming glNext is the next version of OpenGL and specialized for gaming engine purposes.
No, Vulkan is glNext. It is also the general-purpose, cross-platform spawn of Mantle (a derivative).
Vulkan sounds promising but is really just DX12 catch-up; the author clearly hasn’t used the APIs in anger either, as the commentary is flawed. DirectX from v9 had an intermediate shader bytecode allowing pre-compilation of shaders. By default the D3D runtime doesn’t perform error checking; you have to ask for it to be enabled if you want it. DX11 introduced multi-threaded command buffer generation with a single submission thread.
Finally, on the Windows platform, it isn’t D3D that does the clever kernel driver stuff, it’s the DXGK, the Microsoft Graphics Kernel; even OpenGL goes through this. As such, if DX12 is leveraging a WDDM DXGK 2 feature to gain acceleration, the only way an OpenGL driver can compete is to also use that very same DXGK feature; not doing so would result in lower performance. On an OS that doesn’t support that DXGK feature, neither the GL nor the DX runtime can use it; that’s not an artificial limitation, it’s a very real limitation.
Yeah I've done very little programming with DirectX up to this point. Most of my graphics experience, outside of existing engines, is in GL/CL, but that shouldn't be a dismissal.
Also, "artificial limitation" was a direct quote, and it was not in reference to DirectX or the Windows driver model. It was a statement that Vulkan can be implemented by the graphics drivers in basically any operating system. That doesn't mean it will be implemented in exactly the same way everywhere, but the presenter said that there's nothing stopping implementations from being "shimmed" as far back as XP "and beyond".
Hopefully the Khronos Group, and their Vulkan, will provide the feature set to keep pace with DX and the Mantle influences, and be an OS-neutral software/middleware hardware abstraction layer. I’d like the OSes and the graphics API/driver ecosystem to simply recognize, and be able to use, any GPU, integrated or discrete, as just another available computing resource, not bound to any one OS/graphics API ecosystem. So maybe Vulkan and the HSA Foundation's standards can complement each other. For sure this will benefit the Linux and other ecosystems, and the Khronos Group needs to keep the advancements coming. This switchable either/or, but not both at the same time, graphics (integrated/discrete) must be relegated to the past, maybe not so much for gaming as for other graphics/GPGPU uses.
A DX12 catchup? or a Metal catchup you mean!?
That seascape shader hard-locks my Firefox browser too, when I try to go full screen.
Firefox is buggy in this area, I think.
It works fine for me. I’m using Firefox 35.0.1 on Linux Lite and Catalyst proprietary drivers.
36.0 here. Intel Iris Graphics, which is probably my problem. 😉
I have switchable integrated Intel, with AMD discrete mobile graphics, and usually browse under the integrated Intel graphics. so IE will sure lockup the whole system, maybe I’ll try Firefox, these lockups are becoming frequent, and annoying in IE11.
I just found out what has been causing my browser to lockup for 60, or more, seconds on some news sites, and such. IE 11, is at it again, whatever is going on with webGL, and IE is causing the browser, and whole computer to lock up, when I went to the shadertoy site. The browser/computer locks up, 60+ sec, before the webpage loads, also the demo video/sample is just a black screen, when the demo is running. Probably some webGL extension/other is causing IE, and computer to lockup, standard M$ cockup, nothing new under the sun. This does not happen with some webGL sample sites, so it must be the extensions, or something IE is doing wrong.
Vulkan is going to be nice, especially on systems with lots of cores/processor threads, but what about the ability to dispatch threads to both my integrated GPU and the discrete GPU, and get some HSA-like workload spreading across all my system’s CPU/GPU computing resources? I hope that ability works its way into all graphics/computing APIs and abstraction layers, regardless of who made the integrated and discrete GPUs.
Just a little heads up: it will most likely “never” be ported to XP, the main reason being the same as why Mantle only works in modern game engines. The operating system and application need to be 64-bit, as the GPU memory needs to be directly addressable by the application.
You only need a 32bit address bus to directly access 4 gigs of memory, the CPUs data bus width and standard register size indirectly have a small role in addressing physical memory. You can have 32bit CPUs with 64bit data buses(if you wanted), and usually the address bus is around 36 bits wide(on 64 bit microprocessors), with 64 bit page tables in memory, that are converted to a 36 bit address to access memory directly. It’s the address bus width that determines how much physical memory can be directly addressed.
The CPU’s VM hardware and the OS are responsible for converting/handling the page tables and offsets, and handling page faults, before converting the virtual address into a physical memory address to be sent out over the address bus. A 36-bit address bus can directly address 68,719,476,736 bytes of memory, and that is directly addressable memory, not including the paged virtual memory, which can be much larger and paged/swapped in and out of physical memory. Windows XP Professional has a 64-bit ISO, if your system is 64-bit. Let’s put an end to the idea of the data bus having all the responsibility for directly addressing memory; that is the responsibility of the address bus, the CPU’s VM hardware, and the OS.
CPUs are given their bit size (32, 64, whatever) based on the data bus width, somewhat, but more so on the width of the CPU’s standard general purpose registers, or machine word (32, 64, whatever bits).
The 36-bit physical address support was on 32-bit x86 processors. AMD64 architectures support 52-bit physical address spaces, and (if the wikipedia article on PAE is correct) 48-bit virtual addresses. The 36-bit physical address support on 32-bit processors only allows 64 GB of memory which is insufficient now. You can get 64 GB on a consumer level socket 2011 board these days. Single socket 2011 server boards support 256 to 512 GB of memory.
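The sizes quoted in this thread follow directly from the address width. A minimal Python check, assuming a flat, byte-addressed space (the function names are made up for this example):

```python
def addressable_bytes(address_bits):
    """Bytes reachable with an address of the given width."""
    return 2 ** address_bits


def as_gib(byte_count):
    """Convert a byte count to gibibytes (2**30 bytes each)."""
    return byte_count / 2 ** 30


if __name__ == "__main__":
    for bits in (32, 36, 48, 52):
        size = addressable_bytes(bits)
        print("%d-bit: %d bytes = %.0f GiB" % (bits, size, as_gib(size)))
```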
The bit-ness of a processor has little to do with the width of the address bus or any actual data paths. Most current systems use multiple 64-bit DRAM channels so internal data paths are wider. Graphics cards use many 32-bit channels. The bit-ness generally refers to the size of a virtual memory address. This usually matches the size of a general purpose register since addresses are stored a lot, and the system needs to be able to handle them efficiently. The size of the address actually translated to a physical address does not need to equal the size of the virtual address though. In AMD64 only 48 bits of the 64-bit virtual address are actually translated to a physical address. The other bits may still be used by the OS though. The NX bit (no execute) is one of these bits.
The only reason to support 32-bit architectures would be for phones and such really. Most 32-bit PCs are not going to be supported by modern games. If you are running an old 32-bit os, then the available memory for each process can be only 2 GB. They can be set up to use more, but the default for Windows was 2 GB. Under a 64-bit os, a 32-bit process is still limited to 4 GB.
I am not actually sure how GPUs manage their memory. They have been using more than 4 GB for a while, but this is not managed by the CPU's MMU and is not coherent with system memory yet. It will be in HSA architectures. If someone could write an article about that, it would be good. I have not experimented with OpenCL or other such languages, so I don’t know if these allow direct manipulation of pointer-type values; the “bitness” of the memory system may not be exposed. They could easily use PAE-type mechanisms rather than working with 64-bit addresses. Moving to 64-bit is not without penalties.
Original poster, GPUs traditionally shadow a portion of the main address space, which is why there is a performance improvement when moving from a 32-bit to a 64-bit OS with modern hardware, even if the application is still only 32-bit, as the GPU no longer has to shadow address space that was being used for RAM. All of the memory in the GPU needs to be directly addressable with the new low-abstraction graphics APIs, which is why we will never see 32-bit versions of games using DX12 or Vulkan.
Don’t mention 32-bit PAE; it is a joke compared to just using 64-bit software. There are many performance improvements with going to 64-bit. If anyone says inflated pointers and larger executables are a reason for not using 64-bit, then they have no idea overall.
From that diagram it looks like they didn’t get rid of the single thread bottleneck for submitting commands to the GPU (which can handle commands in parallel). Didn’t Mantle allow parallel submission for that instead of using a single thread with one queue? DX12 supposedly allows that too. If Vulkan won’t allow it, it will be at a disadvantage.
That is what mantle does for a single GPU, maybe the diagram just isn’t showing multiple GPU configurations. Well actually mantle has a separate one for DMA, Compute and Graphics per GPU according to diagrams.
It still doesn’t matter. The GPU can DMA different parts of the command buffer/s from main system memory and dispatch them to different areas of the GPU (graphics area, DMA area, Compute area).
Oops! Sorry. Missed this comment.
That's one thread per command queue. I asked about this, and there will be at least one command queue per GPU. Apparently, the working group is discussing whether they want to allow multiple command queues per device.
“With DirectX 12 being “the next DirectX”, they seem to be admitting that it will need to be suitable for all developers once DirectX 11 gets deprecated. This might leave Microsoft with error-checking overhead that Khronos can chuckle at from the sidelines.”
I’m pretty sure Microsoft have said that the D3D11 and D3D12 codelines will be coexisting and developed in parallel for the foreseeable future, with D3D12 handling ‘low level’ access for game developers who want to put the effort into optimisation, and D3D11 handling everything else at a higher level. D3D11 and D3D12 are already as separate from each other as Vulkan and OGL, so there should be minimal worry about experimenting with reducing the error checking overhead.
Yeah, they have said that. I was just noting that they painted themselves into a marketing corner. That corner might be the one with the door, though, to butcher the analogy.
https://www.youtube.com/watch?v=EUNMrU8uU5M
Another big thank you to AMD for revolutionizing the industry. Although many haters snubbed Mantle, it really was a game changer.