A New Perspective on Multi-GPU Memory Management
Why Two 4GB GPUs Aren't Necessarily 8GB
We're trying something new here at PC Perspective. Some topics are difficult to explain cleanly without accompanying images. We also like to go fairly deep into specific topics, so we're hoping that we can provide educational cartoons that explain these issues.
This pilot episode is about load-balancing and memory management in multi-GPU configurations. There seems to be a lot of confusion around what was (and was not) possible with DirectX 11 and OpenGL, and even more confusion about what DirectX 12, Mantle, and Vulkan allow developers to do. It highlights three different load-balancing algorithms, and even briefly mentions what LucidLogix was attempting to accomplish almost ten years ago.
If you like it, and want to see more, please share and support us on Patreon. We're putting this out not knowing if it's popular enough to be sustainable. The best way to see more of this is to share!
Crossfire and SLI allow games to load-balance across multiple GPUs. It is basically impossible to do this in OpenGL and DirectX 11 otherwise. Vulkan and DirectX 12 provide game developers with the tools to implement it themselves, but they do not address every limitation. Trade-offs always exist.
In the older APIs, OpenGL and DirectX 11, games and other applications attach geometry buffers, textures, materials, and compute tasks to the API's one global interface. Afterward, a draw function is called to submit that request to the primary graphics driver. This means that work can only be split from within the driver, and only across the devices that driver controls, which prevents cross-vendor compatibility. LucidLogix created software and hardware that pretended to be the primary graphics driver, distributing work to GPUs from mismatched vendors behind the scenes. It never took off.
With Vulkan and DirectX 12, rather than binding data and tasks to a global state, applications assemble commands and push them onto lists. Not only does this allow multiple CPU threads to create work independently, but these lists can also point to any GPU. This is how OpenCL and other compute APIs are modeled, but Mantle was the first to extend it to graphics. Developers can load-balance GPUs by managing multiple lists with different destinations. This also means that the developer can control what each GPU stores in its memory, and ignore the data it doesn't need.
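The command-list model above can be illustrated with a minimal sketch. This is plain Python, not a real graphics API; the GPU class, its methods, and the command strings are all stand-ins for the device queues and recorded commands that Vulkan and DirectX 12 actually expose:

```python
class GPU:
    """Stand-in for a device and its queue; names are hypothetical."""
    def __init__(self, name):
        self.name = name
        self.submitted = []

    def submit(self, command_list):
        # Each device executes only the lists pushed to it, so it can
        # also skip storing any data those lists never reference.
        self.submitted.append(list(command_list))

gpu0 = GPU("discrete")
gpu1 = GPU("integrated")

# Two independent command lists, each aimed at a different GPU. In a
# real engine, separate CPU threads could record these simultaneously.
gpu0.submit(["draw terrain", "draw characters"])
gpu1.submit(["draw UI", "post-process"])
```

The point of the sketch is the shape of the control flow: nothing is bound to a global state, so nothing forces all work through one driver or one device.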
That said, even though the game developer now has full control over tasks and memory, that doesn't mean the result will be any more efficient than SLI and Crossfire. To load-balance, the developer must choose an algorithm that splits work between multiple GPUs and successfully combines the results. The Alternate Frame Rendering algorithm, or AFR, separates draw calls by the frame they affect. If you have three GPUs, then you can just draw ahead three frames at a time. It's easy to implement, and performance scales very well when you add a nearly identical card (provided the extra frames add to the experience).
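AFR's scheduling really is that simple. A toy sketch (not a real renderer) of the round-robin assignment:

```python
def alternate_frame_rendering(frame_count, gpu_count):
    """Toy AFR scheduler: frame N goes to GPU N % gpu_count."""
    return [frame % gpu_count for frame in range(frame_count)]

# Three GPUs drawing six frames: each card renders every third frame.
schedule = alternate_frame_rendering(6, 3)
# schedule == [0, 1, 2, 0, 1, 2]
```

Since every frame is a complete render of the scene, the split is even by construction, which is why AFR scales so well with matched cards.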
Memory, on the other hand, does not scale well. Neighboring frames will likely draw the exact same list of objects, just with slightly adjusted data, such as camera and object positions. As a result, each GPU needs its own copy of this data in its individual memory pool. If you have two four-gigabyte cards, they will each store roughly the same four gigabytes of data. This is a characteristic of the algorithm itself, not just of the limited information that Crossfire and SLI had to work with on OpenGL and DirectX 11.
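That duplication can be put into a back-of-the-envelope formula. This is a simplification I'm assuming for illustration: some fraction of the assets is mirrored on every card, and only the remainder is unique per GPU:

```python
def afr_effective_memory_gb(per_gpu_gb, gpu_count, shared_fraction=1.0):
    """Rough model of usable video memory under AFR.

    shared_fraction is the portion of each card's assets that must be
    duplicated on every GPU (for AFR, nearly all of it).
    """
    duplicated = per_gpu_gb * shared_fraction  # stored once per card, counts once
    unique = per_gpu_gb - duplicated           # scales with the number of cards
    return duplicated + unique * gpu_count

# Two 4 GB cards with fully duplicated assets behave like one 4 GB pool.
# afr_effective_memory_gb(4, 2) == 4, not 8.
```

The model also shows the flip side: an algorithm that duplicated only half the data (shared_fraction=0.5) would get six usable gigabytes from the same two cards.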
Other algorithms exist, however. For comparison, imagine a fighting game or a side-scroller. In these titles, objects are often separated into layers by depth, such as the background and the play area. If these layers are rendered by different GPUs into separate images, they could be combined later with transparency or z-sorting. In terms of memory, each GPU would only need to store its fraction of the scene's objects (and a few other things, like the layer it draws). A second benefit is that work does not need to be split evenly between the processors. Non-identical pairings, such as an integrated GPU with a discrete GPU, or an old GPU with a new GPU, could also work together, unlike with AFR. I say could, because the difference in performance would need to be known before the tasks are split. To compensate, the engine could vary each layer's resolution, complexity, quality settings, and even refresh rate, depending on what the user can notice. This would be similar to what RAGE did to maintain 60 FPS, and it would likely be a QA disaster outside of special cases. Who wouldn't want to dedicate a Titan graphics card to drawing Street Fighter characters, though?
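A layer-based split could be sketched as a greedy assignment, where each GPU's share of the work is proportional to its measured speed. The layer names, cost estimates, and weights below are all illustrative, and this is only one of many ways an engine might divide the layers:

```python
def split_layers(layers, gpu_weights):
    """Greedy sketch: hand each layer (heaviest first) to the GPU with
    the most remaining headroom relative to its performance target.

    layers      -- list of (name, estimated_cost) pairs
    gpu_weights -- relative performance of each GPU (must be measured!)
    """
    total_cost = sum(cost for _, cost in layers)
    total_weight = sum(gpu_weights)
    # Each GPU's target share of the work, proportional to its speed.
    targets = [total_cost * w / total_weight for w in gpu_weights]
    assigned = [[] for _ in gpu_weights]
    loads = [0.0] * len(gpu_weights)
    for name, cost in sorted(layers, key=lambda layer: -layer[1]):
        i = max(range(len(loads)), key=lambda g: targets[g] - loads[g])
        assigned[i].append(name)
        loads[i] += cost
    return assigned

# A discrete GPU estimated to be ~3x faster than an integrated one:
plan = split_layers(
    [("background", 2.0), ("play area", 6.0), ("UI", 1.0)],
    gpu_weights=[3.0, 1.0],
)
# plan == [["play area", "UI"], ["background"]]
```

Note the caveat baked into the comment: the whole scheme depends on knowing the performance difference up front, which is exactly the "I say could" problem from the paragraph above.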
Then again, video memory is large these days. It might be better, for quality and/or performance, to waste RAM in exchange for other benefits. AFR is very balanced for multiple, identical GPUs, and it's easy to implement; unfortunately, it can also introduce latency and stutter, and it is inefficient with video memory. Layer-based methods, on the other hand, are complicated to implement, especially for objects that mutually overlap, but they allow more control over how tasks and memory are divided. VR and stereoscopic 3D could benefit from another algorithm, where two similar GPUs render separate eyes. Like AFR, this is inefficient with memory, because both eyes will see roughly the same things, but it will load-balance almost perfectly across two identical GPUs. Unlike AFR, it doesn't introduce latency or stutter, but it is useless beyond those two nearly identical GPUs. Any other GPUs will either idle, or be used for something else in the system, like physics or post-processing.
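The per-eye split can be sketched in the same style (hypothetical names again). Both command lists reference the same objects, just with a different view transform, which is exactly why the memory is duplicated but the work divides almost perfectly in two:

```python
def split_by_eye(draw_calls):
    """Build one command list per eye from the same scene objects.

    Both lists carry every object, so each GPU needs a full copy of
    the scene's assets -- the same memory cost as AFR.
    """
    left = [(call, "left_eye_view") for call in draw_calls]
    right = [(call, "right_eye_view") for call in draw_calls]
    return left, right

left_list, right_list = split_by_eye(["terrain", "characters", "sky"])
# Both lists have the same length: a near-perfect 50/50 work split.
```

Unlike AFR, both lists belong to the same frame, so neither GPU is ever drawing ahead of the other; that is where the latency advantage comes from.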
In any case, the developer knows what their game needs to render. They can now choose the best algorithm for themselves.