HYDRA Engine by Lucid - Multi-GPU Technology with No Strings Attached
SLI and CrossFire could be done
Who is Lucid?
You probably haven't heard of Lucid, also known as LucidLogix. It is a fabless semiconductor company, meaning it designs chips but outsources the manufacturing to other companies like TSMC. NVIDIA and ATI are also fabless, though Lucid is obviously a much smaller organization. The company has backing from several investors, including Intel Capital, and holds more than 50 patents (or patent applications) covering its first technology, which we are looking at here.
While the company itself might not be overly exciting, the HYDRA Engine technology they are showcasing this week certainly is.
What is the HYDRA Engine?
At its most basic level, the HYDRA Engine is an attempt to build a completely GPU-independent graphics scaling technology - imagine having NVIDIA graphics cards from the GeForce 6600 to the GTX 280 working together with little to no software overhead and nearly linear performance scaling. HYDRA uses both software and hardware designed by Lucid to improve gaming performance in a way that is transparent to the application and to the graphics cards themselves, with dedicated hardware logic balancing graphics information between the CPU and GPUs.
Why does Lucid feel the traditional methods that NVIDIA and AMD/ATI have been implementing are not up to the challenge? The two primary multi-GPU rendering modes that both companies use are split frame rendering (SFR) and alternate frame rendering (AFR). Lucid contends that both have significant pitfalls that its HYDRA Engine technology can correct. For split frame rendering, the downside is that every GPU must replicate ALL of the texture and geometry data, so the memory bandwidth and geometry shader limitations of a single GPU remain. For alternate frame rendering, the drawback is the latency introduced by alternating frames between X GPUs, plus the latency required to resolve inter-frame dependencies.
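To make the two traditional modes concrete, here is a minimal Python sketch of how work is assigned in each. It is an illustrative simplification, not vendor code: the function names `afr_assign` and `sfr_assign` are hypothetical, and the equal-height bands are an assumption chosen for clarity.

```python
def afr_assign(frame_index, gpu_count):
    """Alternate frame rendering: each whole frame goes to one GPU, round-robin."""
    return frame_index % gpu_count


def sfr_assign(frame_height, gpu_count):
    """Split frame rendering: every GPU renders one horizontal band of each frame.

    Returns a list of (first_row, end_row) pairs, one per GPU, with end_row
    exclusive. Bands are as equal as possible (an assumption for illustration).
    """
    base, extra = divmod(frame_height, gpu_count)
    bands = []
    start = 0
    for gpu in range(gpu_count):
        rows = base + (1 if gpu < extra else 0)  # spread leftover rows over the first GPUs
        bands.append((start, start + rows))
        start += rows
    return bands
```

In the AFR case each GPU works on a different frame, which is where the inter-frame dependency problem comes from. In the SFR case every GPU works on the same frame and therefore needs the same scene data.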
How it Works
HYDRA is a dedicated piece of silicon with the sole purpose of scaling GPUs. Though there is no graphics processing logic on the HYDRA chip, the chip can redistribute graphics workloads across multiple GPUs in real time.
The HYDRA technology also includes a unique software driver that rests between the DirectX architecture and the GPU vendor driver.
The distribution engine, as it is called, is responsible for reading the information passed from the game or application to DirectX before it gets to the NVIDIA or AMD drivers. There the engine breaks up the various blocks of information into "tasks" - a task is a specific job that HYDRA defines that can be passed to any of the 2-4 GPUs in the system. A task might be something like a specific lighting effect, a post-processing run, a specific model being drawn, etc. The company founders on hand at the meeting were a little vague about the algorithms that decide how the DirectX data is divided and which parts of it become "tasks" - it is obvious that this is part of the magic that gives HYDRA its power; it is with these task definitions that the hardware logic can efficiently distribute the workload across many GPUs.
Once the tasks have been created, they are then sent over the PCI Express bus to the HYDRA chip where they are VERY quickly processed and split between 2 to 4 GPUs. The HYDRA Engine passes off these tasks to the GPU, awaits a result and return of finished data or pixels, and is then responsible for passing that information on to one of the GPUs for final output to a monitor. At the outset, this doesn't sound that much different than what NVIDIA and AMD already do with AFR and SFR rendering modes, but after seeing the HYDRA technology at work it is obviously something very different.
By essentially intercepting the DirectX calls from the game to the graphics cards, the HYDRA Engine is able to intelligently break up the rendering workload rather than just "brute-forcing" alternate frames or split frames as both GPU vendors are doing today in SLI and CrossFire. And according to Lucid all of this is done with virtually no CPU overhead and no latency compared to standard single GPU rendering.
To accompany this ability to intelligently divide up the graphics workload, Lucid is offering up scaling between GPUs of any KIND within a brand (only ATI with ATI, NVIDIA with NVIDIA) and the ability to load balance GPUs based on performance and other criteria. The load balancing is based on a couple of key data points: pre-existing knowledge from the Lucid team about the GPU in question and the "response time" of the GPU when being sent data from the HYDRA Engine chip. The HYDRA driver will actually recognize the GPUs in a system and will estimate how much processing power each holds but will then fine tune that estimate based on real-time performance of the GPU in action. If a GPU is sent a "task" to perform and the return time on it is slower than expected, the HYDRA engine will back off slightly and send more "tasks" to the less-loaded GPUs. All of this is updated on the fly, in real time as the game is running.
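The feedback loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not Lucid's algorithm: the `LoadBalancer` class, the abstract per-task "cost", the speed units, and the exponential smoothing are all hypothetical choices made for the example.

```python
class LoadBalancer:
    """Toy task balancer: start from a prior speed estimate, refine it from response times."""

    def __init__(self, initial_estimates, smoothing=0.5):
        # initial_estimates: assumed prior speed per GPU, in work units per millisecond
        self.speed = dict(initial_estimates)
        self.pending = {gpu: 0.0 for gpu in self.speed}  # work sent but not yet returned
        self.smoothing = smoothing

    def pick_gpu(self, cost):
        """Send the task to the GPU expected to finish it soonest."""
        gpu = min(self.speed, key=lambda g: (self.pending[g] + cost) / self.speed[g])
        self.pending[gpu] += cost
        return gpu

    def report(self, gpu, cost, elapsed_ms):
        """Fold a measured response time back into the speed estimate."""
        if elapsed_ms <= 0:
            raise ValueError("elapsed_ms must be positive")
        self.pending[gpu] = max(0.0, self.pending[gpu] - cost)
        observed = cost / elapsed_ms
        a = self.smoothing
        self.speed[gpu] = (1 - a) * self.speed[gpu] + a * observed


balancer = LoadBalancer({"gpu0": 2.0, "gpu1": 1.0})
first = balancer.pick_gpu(10)       # gpu0 looks faster on paper
balancer.report(first, 10, 20.0)    # but it answered slower than expected
```

A GPU that keeps returning tasks late sees its estimated speed fall, so later tasks drift toward the other GPUs, which is the "back off slightly" behavior described in the text.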
With the ability to divide up the graphics processing into tasks and monitor GPU load, the HYDRA engine offers up some very interesting types of GPU scaling. Yes, it can and will sometimes run standard split frame rendering - this is very efficient for per-pixel processing heavy workloads. However, the HYDRA can also implement some much more interesting divisions of work.
What you are seeing above is two monitors, each displaying the workload of a single GPU in a dual 9800 GT configuration. This of course wouldn't normally be how a game is presented to users - it is simply a way to demonstrate their technology to the press. Before the images are merged again you'll see two very different screens - one has some rendered areas of a level of UT3 with some areas completely in black while the other monitor has the inverse - opposite rendered areas and black areas.
Merged image on the left, half of the rendered image on the right; notice no floor on the right, etc.
This is the power of the task-driven graphics workload division. You can clearly see that some of the "items" in the world are being rendered by one GPU while the background and trees are rendered by another. There is no rigid requirement for divisions of a certain size or shape, and thus many of the problems found in "box rendering" are avoided. If Lucid is to be believed, this is a division into sets of tasks that are about "even" in required rendering power. Lucid pointed out that there are many other split rendering methods that it uses in the background for games, but that demonstrating them to end users in any way is pretty difficult to do - techniques like blending pixels in a frame buffer, for example.
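Lucid did not describe how the partial images are merged, so the following Python sketch is purely an assumption: it uses a generic depth-based composite, in which each GPU returns a color and a depth per pixel and the nearest surface wins. The names `composite`, `BLACK`, and `FAR` are hypothetical.

```python
BLACK = (0, 0, 0)
FAR = float("inf")  # depth assigned to pixels a GPU did not render


def composite(partials):
    """Merge partial frames by keeping, for every pixel, the color with the smallest depth.

    Each partial frame is a list of rows; each pixel is a (color, depth) tuple.
    All partial frames are assumed to have the same dimensions.
    """
    height = len(partials[0])
    width = len(partials[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            color, _depth = min((p[y][x] for p in partials), key=lambda px: px[1])
            row.append(color)
        merged.append(row)
    return merged


# One-row example: GPU A drew an object at pixel 0, GPU B drew the background everywhere.
gpu_a = [[((255, 0, 0), 1.0), (BLACK, FAR)]]
gpu_b = [[((0, 255, 0), 5.0), ((0, 0, 255), 3.0)]]
frame = composite([gpu_a, gpu_b])
```

Pixels a GPU skipped show up as black at infinite depth, so they never override what another GPU actually drew; that matches the complementary black regions visible on the two demo monitors.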
Even more important, though, is that this rendering method is NOT predefined by any driver profile, as it is with NVIDIA's and AMD's SLI and CrossFire technology. Instead, because of the HYDRA Engine's pre-processing work, the rendering method can and will change throughout the game and sometimes even inside of individual frames. The HYDRA chip itself does some of this algorithmic work with help from the driver and task setup process.