NVIDIA's Quad SLI Technology - Performance and Quality
Quad SLI Introduction
Today, NVIDIA is introducing the enthusiast community to Quad SLI once again. Only this time, they want you to go buy it and build a system yourself; no longer is Quad SLI being relegated to the boutique system builder. NVIDIA's reason for not putting Quad SLI in the hands of enthusiasts immediately was the software questions and instability that existed initially while the driver software was being tweaked. As of today, NVIDIA has readied the driver for the public, and we have spent the last couple of weeks playing with it and the technology behind Quad SLI in order to share our experiences with you.
Quad SLI History
Quad SLI technology originally debuted at CES in January of 2006, here on PC Perspective. Since then, the technology has undergone several revisions, including newer GPUs and smaller PCBs, and system integrators like Alienware and Voodoo PC have actually begun selling Quad SLI systems.
Below is some additional technical information from a recent article.
Until today though, much of how quad SLI works has been clouded in an NVIDIA-induced mystery. Now we can share some more information about the technology and the design decisions behind it.
Above you can see a high-level block diagram of how quad SLI logic works. Each large green area represents a single, dual GPU card. In the center of each card you'll notice there is an object labeled 'X48 PCI-E'; that is an NVIDIA-developed 48-lane PCIe connection chip. It accepts a single x16 PCIe connection input and creates two additional x16 links that lead to each of the two G71 GPUs on the card. This chip is responsible for splitting the data between the two GPUs and making sure that the data each GPU passes back to it makes it to the PCIe bus on the motherboard without issue.
Each card features two G71 cores and two separate 512 MB frame buffers, for a total of 2 GB of frame buffer memory in a quad SLI computer. Excessive much? Perhaps, but the technology is simply impressive to think about. Do you remember when NVIDIA first introduced SLI into the market and discussed that they had implemented the SLI connection logic into the GPU from the beginning? Well it turns out that NVIDIA did not simply include a single SLI connection; they actually developed two of them. Both of these logical connections are utilized in a quad SLI system.
For our discussion, referencing the block diagram above, I'll call the left hand side G71 core on each card the primary GPU and the right hand side G71 core the secondary. Each primary GPU is connected to the secondary GPU on the same board through one of the two SLI connections, made with an external connection between the two PCBs on the same card.
Quad SLI Rendering Modes
The rendering modes available on the quad SLI technology are mostly very familiar ones and work just the way you think they would.
First, the quad SLI version of AFR simply uses each GPU core to render a whole frame, so each GPU handles every fourth frame instead of every other frame.
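To make the frame rotation concrete, here is a minimal sketch of a round-robin AFR schedule. It is an illustration only, not NVIDIA's driver logic; the function name and the 0-to-3 GPU numbering are assumptions made for the example.

```python
def afr_gpu_for_frame(frame_index, gpu_count=4):
    """Return the (hypothetical) GPU index that renders a given frame.

    Assumes a plain round-robin rotation: frame 0 goes to GPU 0,
    frame 1 to GPU 1, and so on, wrapping around after gpu_count frames.
    """
    return frame_index % gpu_count


# Which GPU renders each of the first eight frames in quad AFR.
quad_schedule = [afr_gpu_for_frame(i) for i in range(8)]

# The same eight frames in ordinary two-GPU AFR, for comparison.
dual_schedule = [afr_gpu_for_frame(i, gpu_count=2) for i in range(8)]

print(quad_schedule)
print(dual_schedule)
```

In the quad case each GPU index comes up once every four frames, while in the dual case it comes up every other frame.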
Quad SLI is also compatible with a split frame rendering mode that also does what you would expect; instead of breaking the area of the screen into two pieces it divides it into four. Since SFR uses slightly more CPU overhead when it has to divide up the processing, it would only make sense that quad SLI SFR mode is slightly less efficient than the dual SLI SFR mode, though the additional GPU processing power should more than make up for it.
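As a rough picture of the four-way split, the sketch below divides a frame into four horizontal bands of equal height. The equal-height, horizontal split is an assumption made for illustration; how the driver actually sizes and places each region is not described here, and the function name is hypothetical.

```python
def sfr_bands(height, gpu_count=4):
    """Split `height` scanlines into one contiguous band per GPU.

    Returns a list of (gpu_index, first_row, end_row) tuples, where
    end_row is exclusive. Assumes equal-sized horizontal bands; any
    leftover rows are handed to the first few GPUs one row each.
    """
    base, extra = divmod(height, gpu_count)
    bands = []
    top = 0
    for gpu in range(gpu_count):
        rows = base + (1 if gpu < extra else 0)
        bands.append((gpu, top, top + rows))
        top += rows
    return bands


# A 1200-row frame split across four GPUs, then across two.
quad_bands = sfr_bands(1200)
dual_bands = sfr_bands(1200, gpu_count=2)

print(quad_bands)
print(dual_bands)
```

With four GPUs each band covers a quarter of the screen, so there are twice as many regions to set up and stitch back together as in the dual case.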
A new rendering mode being introduced with quad SLI is called AFR of SFR, and it is also pretty intuitive. Basically, each pair of GPUs runs a standard split-frame rendering technique, with the two GPUs in a pair sharing the work of a single frame. An alternate frame rendering method then alternates between the two pairs, so consecutive frames come from different SFR pairs.
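Combining the two ideas, a minimal sketch of AFR of SFR might look like the following. The pairing of GPUs (0 with 1, 2 with 3), the even top/bottom split, and the function name are all assumptions for illustration; which GPUs actually form a pair is not specified here.

```python
def afr_of_sfr(frame_index, height, pairs=((0, 1), (2, 3))):
    """Return the work assignment for one frame under AFR of SFR.

    AFR step: pick which GPU pair owns this frame by alternating pairs.
    SFR step: split the frame in half between the two GPUs of that pair.
    Returns a list of (gpu_index, first_row, end_row), end_row exclusive.
    The pair layout and the even split are illustrative assumptions.
    """
    first_gpu, second_gpu = pairs[frame_index % len(pairs)]
    half = height // 2
    return [(first_gpu, 0, half), (second_gpu, half, height)]


# Assignments for the first three frames of a 1200-row display.
frame0 = afr_of_sfr(0, 1200)
frame1 = afr_of_sfr(1, 1200)
frame2 = afr_of_sfr(2, 1200)

print(frame0)
print(frame1)
print(frame2)
```

Even frames land on one pair and odd frames on the other, while inside each frame the two GPUs of the active pair each take half of the screen.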
Quad SLI also introduces a new SLI AA option dubbed 32X SLIAA. The process by which NVIDIA's drivers generate a 32x anti-aliasing blend is pretty complicated, but a quick diagram might help.
Basically, each GPU is rendering a 4x MSAA blend and then it is combined between all four of the GPUs to get a final output that is technically equivalent to a 32x sample. Each GPU on the same card shares the data to generate a single 8xAA sample and then each 8xAA sample is shared across the individual cards and blended into 16x samples and then finally combined to generate a 32xAA sample that is sent to the screen.