Investigating Bandwidth on the SLI X16 Chipset
Why PCI Express Bandwidth Matters
If you recall, not too long ago, when PCIe video cards and motherboards were first coming to market, most reviewers noted that since the AGP bus was far from saturated in its final 8x implementation, the much higher bandwidth of x16 PCIe connections was mostly going underutilized. This was clearly true: our initial PCIe benchmark numbers showed very little performance difference between the same GPU on an AGP motherboard and a PCI Express motherboard.
Then along came NVIDIA's reinvention of SLI. By adding another GPU into the mix, NVIDIA greatly increased the amount of PCIe bandwidth being used, as it was, and is, essential that both GPUs communicate with one another quickly. NVIDIA's SLI used both the PCIe bus in its dual x8 configuration and an external connection between the cards in the form of a small bridge. The current generation of SLI X16 motherboards continues this but with a full x16 PCIe connection under each card. ATI's CrossFire uses the PCIe bus as well, in both its x8 and new x16 motherboard options, but uses an external dongle that is responsible for transferring the frames from the secondary card to the primary card's rendering engine. For more information on SLI and CrossFire, visit my respective articles on both of these technologies.
In NVIDIA's latest nForce4 X16 chipset for the AMD platform, the two x16 PCIe connections do not reside in the same chip. The primary GPU slot stems from a x16 connection off the CK08 north bridge, and the secondary GPU slot is supplied by an additional x16 connection on the MCP south bridge. In order for these two cards to communicate over the PCI Express bus, the data from the cards must travel to their respective core logic chips, then transfer over the HyperTransport connection that links the north and south bridges. NVIDIA had not previously revealed the speed of this connection but, after recent accusations from ATI, indicated that it is a full x16 HT/PCIe connection.
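To put the multi-hop path described above in numbers, here is a rough sketch of the theoretical limits involved. It uses the published first-generation PCIe signaling rate (2.5 GT/s per lane with 8b/10b encoding, or 250 MB/s per lane in each direction) and treats the end-to-end path as being limited by its slowest link. The function names and the interlink figure in the usage lines are hypothetical and chosen only for illustration; they are not measurements from either vendor.

```python
# PCIe 1.x signals at 2.5 GT/s per lane; 8b/10b encoding leaves 8 payload
# bits out of every 10 bits on the wire.
PCIE1_RAW_MBIT_PER_LANE = 2500


def pcie1_bandwidth_mb_s(lanes: int) -> int:
    """Theoretical one-direction PCIe 1.x bandwidth in MB/s."""
    raw_mbit = PCIE1_RAW_MBIT_PER_LANE * lanes
    payload_mbit = raw_mbit * 8 // 10   # strip 8b/10b encoding overhead
    return payload_mbit // 8            # bits -> bytes


def path_bandwidth_mb_s(link_bandwidths: list[int]) -> int:
    """A multi-hop path can go no faster than its slowest link."""
    return min(link_bandwidths)


x16 = pcie1_bandwidth_mb_s(16)   # 4000 MB/s per direction
x8 = pcie1_bandwidth_mb_s(8)     # 2000 MB/s per direction

# Hypothetical two-chip path: slot -> north bridge -> interlink ->
# south bridge -> slot. The interlink value is an assumed parameter.
assumed_interlink = 4000
print(path_bandwidth_mb_s([x16, assumed_interlink, x16]))
```

These are ceilings only. Real transfers lose additional bandwidth to packet headers, flow control and latency, so measured numbers will always come in lower.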
ATI's new RD580 chipset, which also supports two full x16 PCIe connections, takes a very different approach: a single north bridge chip houses 40 total lanes of PCIe to power both x16 GPU connections. This means that ATI's CrossFire GPUs have one less 'hop' to make when communicating with each other, and ATI even tells us that the x16 slots have a direct communication link inside the north bridge that does not intermix with the core logic, thus avoiding any additional latency in that regard.
Just before the RD580 launch on the first of March, ATI came to the media with some claims about NVIDIA's X16 chipset and how it was much less efficient than ATI's own competing architecture. ATI claimed that the connection between the two GPUs in fact ran well below the full x16 PCIe speed of ATI's chipset, and that this would mean less than optimal performance for the end user. ATI provided some data from its internal testing indicating that the bandwidth of the NF4 SLI X16 inter-GPU connection was only around 1500 MB/s, compared to 2500 MB/s on ATI's chipset.
ATI's Test Results
This data provided by ATI indicated that NVIDIA's chipset provided much slower data transfer between the two X1900 XTX cards they used for testing. When the HT multiplier was lowered to 2x and 1x, the results were even lower, as we would expect. Since there is no north bridge to south bridge connection needed for transfer between the GPUs on the RD580, ATI's results remain the same throughout the graph as it has no multipliers to modify.
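The effect of lowering the HT multiplier can be approximated with some simple arithmetic. The sketch below assumes a 200 MHz HyperTransport base clock and a 16-bit link in each direction; both values are assumptions typical of AMD platforms of this era, not figures confirmed by ATI or NVIDIA for this test. HyperTransport transfers data on both clock edges, so each clock cycle carries two transfers.

```python
# Assumed values, not confirmed by either vendor for this test setup.
HT_BASE_CLOCK_MHZ = 200    # assumption: base clock the multiplier applies to
HT_LINK_WIDTH_BITS = 16    # assumption: link width in each direction


def ht_bandwidth_mb_s(multiplier: int,
                      base_clock_mhz: int = HT_BASE_CLOCK_MHZ,
                      width_bits: int = HT_LINK_WIDTH_BITS) -> int:
    """Theoretical one-direction HyperTransport bandwidth in MB/s."""
    clock_mhz = base_clock_mhz * multiplier
    transfers_per_us = clock_mhz * 2          # double data rate
    return transfers_per_us * width_bits // 8  # bits -> bytes per transfer


for mult in (5, 2, 1):
    print(f"{mult}x multiplier: {ht_bandwidth_mb_s(mult)} MB/s per direction")
```

Under these assumptions the link's ceiling scales linearly with the multiplier, which is consistent with results dropping at 2x and 1x.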
ATI also argued that because the MCP is responsible for networking, USB, storage and the PCI bus as well, NVIDIA's lower bandwidth numbers would turn out to be even more of an issue for users. Since the same connection between the south and north bridges has to carry not only data between the two graphics cards but also storage data, networking traffic and so on, gaming while using these features would be even slower.
When I approached NVIDIA with this data, they quickly took me to their SLI engineers' office and showed me some competing results indicating that the bandwidth of the inter-GPU connection on the SLI X16 chipset was not only more than enough for the necessary transfers but also much higher than what ATI had indicated to us.
NVIDIA's Test Results on A8N32-SLI
These benchmark results from NVIDIA's testing show that in their GPU to GPU frame buffer copy test, the bandwidth between the two cards was between 2500 and 2600 MB/s. These numbers are very similar to what ATI claims for its own chipset.
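For a sense of scale, the figures each vendor reported can be expressed as a fraction of the 4000 MB/s theoretical one-direction limit of a first-generation x16 PCIe link. This is only a back-of-the-envelope comparison; the dictionary labels below are my own shorthand, and the numbers are the approximate values quoted by each company, not independent measurements.

```python
# Theoretical one-direction limit of a PCIe 1.x x16 link:
# 16 lanes * 250 MB/s per lane.
PCIE1_X16_MB_S = 4000

# Approximate figures as reported by each vendor (labels are shorthand).
reported_mb_s = {
    "ATI's figure for NF4 SLI X16": 1500,
    "ATI's figure for RD580": 2500,
    "NVIDIA's figure for NF4 SLI X16 (low)": 2500,
    "NVIDIA's figure for NF4 SLI X16 (high)": 2600,
}


def percent_of_x16(measured_mb_s: float) -> float:
    """Reported bandwidth as a percentage of the x16 theoretical limit."""
    return measured_mb_s * 100 / PCIE1_X16_MB_S


for label, value in reported_mb_s.items():
    print(f"{label}: {value} MB/s = {percent_of_x16(value):.1f}% of x16")
```

Even the best reported results sit well under the theoretical ceiling, which is expected once protocol overhead and the behavior of each copy test are factored in.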
I also asked NVIDIA to run their test with 7800 GTX 512 GPUs on the A8R32-MVP motherboard sporting the ATI RD580 chipset.
NVIDIA's Test Results on A8R32-MVP
You can see that NVIDIA's results for the ATI chipset are pretty much on par with their own results for the nForce4 SLI X16.
Of course, both ATI and NVIDIA questioned the other's testing methods, and for good reason. Both applications are proprietary, and neither company wanted to part with the testing application necessary to run these benchmarks. ATI's was run in DOS without any drivers loaded, while the NVIDIA test was run in a standard Windows install. The tests only varied further from there.