An Introduction to CrossFireX Technology

AMD has been teasing us with the notion of four GPUs in a system for quite a while, and today we were finally able to get our hands on the driver that enables it. Come in and see how much damage a pair of Radeon HD 3870 X2 graphics cards can do!


Introduction

It’s rather funny that until today, the technology behind CrossFireX was supposed to be some kind of secret.  The idea of plugging more than two GPUs into a multi-GPU system is brand new for AMD/ATI in the modern hardware era.  Ever since we first saw ATI graphics boards with pairs of CrossFire connectors on them, we were told to expect the ability to scale to three and four cards in the future.  The only problem was that the future never quite arrived.  Until now, that is.

We have chronicled the recent news and rumors about CrossFireX and the ability to pair up several graphics boards in a single system for some time.  There have been interviews with AMD staff, the news about the upcoming RS780 chipset and hybrid graphics as well as our own revelation that CrossFireX would be working on Intel’s Skulltrail platform while NVIDIA’s 3-Way SLI would not.  Even in our review of the Radeon HD 3870 X2, AMD teased us with the “coming soon” insignia of 2 Teraflops of GPU computing power.

But have no fear, faithful AMD fans: the CrossFireX driver is merely weeks from a public release and we were given the chance to test it out for ourselves.  Read and enjoy.

AMD CrossFireX in Brief

CrossFireX technology is really simple in theory: take two, three or four GPUs and use their combined power to render one game faster than a single GPU could.


Simple as the theory is, the complexities of the drivers, DirectX versions and the games themselves make it quite difficult to implement.  Just ask NVIDIA how well Quad SLI has gone for them in 2007.

AMD is the first GPU company to attempt to make multi-GPU technology as forgiving as possible.  What do I mean by this?  Simply that AMD is trying to get rid of the hard-coded restrictions that NVIDIA’s SLI technology places on gamers and on what hardware they can use together.  On both SLI and past CrossFire platforms, for example, you can only pair graphics cards built on the same GPU.  That is changing; here are some of the most interesting options that AMD is bringing to enthusiasts:
  • Pairing of two Radeon HD 3870 X2 cards (two GPUs on one card) for four GPUs
  • Pairing a single Radeon HD 3870 X2 with either an HD 3870 or HD 3850
  • Combining up to four RV670 cards of any kind or speed: HD 3850 256MB, HD 3850 512MB and HD 3870 512MB for improved performance
This obviously left me with a lot of questions about performance, compatibility and so on.  AMD was kind enough to answer them for me today.
 
Q&A with Catalyst lead Terry Makedon

PCPER: What type of algorithm does CFX use for breaking up GPU work?  Is it AFR/SFR?  Some combo?

AMD: Always AFR.
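For readers unfamiliar with the acronyms: AFR is alternate frame rendering, where each GPU renders a whole frame in turn, while SFR (split frame rendering) divides a single frame between GPUs. The sketch below shows the basic idea of AFR, assuming a plain round-robin assignment. The function name is made up for illustration and this is not AMD's driver logic.

```python
def assign_frames_afr(frame_count, gpu_count):
    """Return a list where entry i is the index of the GPU that renders frame i.

    Assumes simple round-robin scheduling, which is an illustrative
    simplification of alternate frame rendering.
    """
    if gpu_count < 1:
        raise ValueError("need at least one GPU")
    return [frame % gpu_count for frame in range(frame_count)]


# Eight frames spread over four GPUs (e.g. two HD 3870 X2 cards).
schedule = assign_frames_afr(8, 4)
for frame, gpu in enumerate(schedule):
    print(f"frame {frame} -> GPU {gpu}")
```

With four GPUs, each GPU only has to produce every fourth frame, which is where the theoretical scaling comes from.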

PCPER: How flexible can the CFX software be in terms of pairing different GPUs together?

AMD: CrossFireX supports multi-GPU configurations of any combination of RV670- and R680-based products (i.e. any combination of ATI Radeon HD 3850, HD 3870 and HD 3870 X2 cards that add up to four GPUs).
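The rule in that answer can be written down as a quick check. This is only a sketch: the table of GPUs per card follows the article (the HD 3870 X2 carries two GPUs), reading "add up to four GPUs" as "at most four" is our interpretation, and the function names are hypothetical.

```python
# GPUs on each supported card, per the article: the X2 has two GPUs on one board.
GPUS_PER_CARD = {"HD 3850": 1, "HD 3870": 1, "HD 3870 X2": 2}
MAX_GPUS = 4  # assumption: "add up to four" read as an upper limit


def total_gpus(cards):
    """Count the GPUs in a list of card names."""
    unknown = [card for card in cards if card not in GPUS_PER_CARD]
    if unknown:
        raise ValueError(f"not an RV670/R680 card in this sketch: {unknown}")
    return sum(GPUS_PER_CARD[card] for card in cards)


def is_supported(cards):
    """True if the cards form a multi-GPU setup of two to four GPUs."""
    return 2 <= total_gpus(cards) <= MAX_GPUS


print(is_supported(["HD 3870 X2", "HD 3870 X2"]))   # four GPUs
print(is_supported(["HD 3870 X2", "HD 3850"]))      # three GPUs
print(is_supported(["HD 3870 X2", "HD 3870 X2", "HD 3850"]))  # five GPUs
```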

PCPER: How does CFX work with cards with different-size frame buffers, like a 3850 512MB and a 3850 256MB?

AMD: In cases where different memory configurations are being used or different clock speeds, the faster memory has to wait for the slower one.  Clock speeds aren’t reduced through the driver as they were in the past, but the net effect is still the same – you’re always limited by your slowest card. Having said that there is no problem in mixing and matching different frame buffer sizes.

The driver will always go to the lowest common denominator, whether that is memory speed or frame buffer size. So in essence the 512MB card will be seen as a 256MB card. I unfortunately don’t have scaling numbers handy for mixes of 256MB and 512MB cards (I don’t actually think we ever would have benchmarked that scenario).
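The "lowest common denominator" behavior described above amounts to taking the minimum across the cards. Here is a minimal sketch of that idea; the `Card` type is hypothetical and the memory clock figures are placeholders rather than official specifications.

```python
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    frame_buffer_mb: int
    memory_clock_mhz: int  # placeholder values below, not official specs


def effective_config(cards):
    """Return (frame buffer MB, memory clock MHz) the group is limited to.

    Follows the answer above: the group behaves like its smallest
    frame buffer and its slowest memory.
    """
    return (
        min(card.frame_buffer_mb for card in cards),
        min(card.memory_clock_mhz for card in cards),
    )


mixed = [
    Card("HD 3850 512MB", 512, 900),  # illustrative clock
    Card("HD 3850 256MB", 256, 800),  # illustrative clock
]
print(effective_config(mixed))
```

In this mixed pair the 512MB card contributes no more usable frame buffer than the 256MB card, which matches AMD's description.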

PCPER: Why does DX10 have more trouble working with 4 GPUs than DX9 does?  I thought there was a DX9 limitation to frames rendered ahead originally?

AMD: The limitations in DX9 actually make it easier for an application to be AFR friendly.  The biggest issue is DX10 has a lot more opportunities for persistent resources (resources rendered or updated in one frame and then read in subsequent frames).  In DX9 we only had to handle texture render targets, which we have a good handle on in the DX10 driver.  In addition to texture render targets DX10 allows an application to render to IBs and VBs using stream out from the GS or as a traditional render target.  An application can also update any resource with a copy blt operation, but in DX9 copy blt operations were restricted to offscreen planes and render targets.   This additional flexibility makes it harder to maximize performance without impacting quality.

Another area that creates issues is constant buffers, which are new for DX10.  Some applications update dynamic constant buffers every frame while other apps update them less frequently.  So again we have to find the right balance that generally works for quality without impacting performance.

We are also seeing new software bottlenecks in DX10 that we continue to work through.  These software bottlenecks are sometimes caused by interactions with the OS and the Vista driver model that did not exist for DX9, most likely due to the limited feature set.   Software bottlenecks impact our multi-GPU performance more than single GPU and can be a contributing factor to limited scaling.

We’re pushing hard to find the right solution to each issue we come across and boost performance and scalability wherever we can.  As you can see, there are a lot of things that factor in.
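One way to picture why the persistent resources mentioned in this answer are awkward for AFR: if a resource is written in one frame and read in a later one, and those two frames land on different GPUs, the data has to get from one GPU to the other. The sketch below simply counts such cases, assuming round-robin frame assignment. It is our own illustration with a hypothetical function name, not a description of how the Catalyst driver tracks resources.

```python
def cross_gpu_reads(accesses, gpu_count):
    """Count reads that land on a different GPU than the one that wrote the data.

    accesses: list of (written_in_frame, read_in_frame) pairs.
    Assumes frame f is rendered by GPU f % gpu_count (round-robin AFR).
    """
    return sum(
        1
        for written, read in accesses
        if written % gpu_count != read % gpu_count
    )


# A resource updated every frame and read back in the following frame.
next_frame_reads = [(frame, frame + 1) for frame in range(8)]

print(cross_gpu_reads(next_frame_reads, 1))  # single GPU: nothing to move
print(cross_gpu_reads(next_frame_reads, 4))  # four GPUs: every read crosses GPUs
```

The more ways an application has to carry data from frame to frame, the more of these dependencies a driver has to detect and handle.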

PCPER: What are you all doing to address the issue of “lag” in games where the game is “thinking” and rendering up to 3 frames ahead of what the gamer is seeing?  3 frames doesn’t seem like much but in a game that is running at 30 FPS, that might become an issue.  In the cases where the driver has to “throw away” frames because one of the frames took longer than expected to render, is anything adjusted there?

AMD: There are certain theories we will explore in the future, but as of right now that is an unfortunate possible outcome for multi-GPU systems. However we can say we are actively looking for ways to beat this behavior.
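To put a number on the lag concern raised in the question: the delay from frames queued ahead is roughly the number of frames divided by the frame rate. This is a back-of-the-envelope sketch that ignores everything else in the input-to-display chain, and the function name is ours.

```python
def queued_frame_delay_ms(frames_ahead, fps):
    """Approximate delay, in milliseconds, added by frames rendered ahead."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames_ahead * 1000.0 / fps


# The case from the question: three frames ahead at 30 FPS.
print(queued_frame_delay_ms(3, 30))
# The same queue depth at 60 FPS for comparison.
print(queued_frame_delay_ms(3, 60))
```

At 30 FPS, three frames works out to about a tenth of a second, which is why the question singles out lower frame rates.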


So it’s clear from these answers that there isn’t any magic potion that AMD is using to get CrossFireX working as they have: just a lot of work and testing.   And the same problems or issues that were found when mixing cards in previous generations are basically still here: your system will only have the POTENTIAL to work as fast as your SLOWEST graphics board multiplied by the number of GPUs.
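That last rule of thumb is easy to express as arithmetic. The sketch below computes the upper bound only, using made-up per-GPU frame rates; real scaling will be lower because of the driver and software overheads discussed in the Q&A.

```python
def potential_throughput(per_gpu_fps):
    """Upper bound on combined frame rate: slowest GPU times the number of GPUs."""
    if not per_gpu_fps:
        raise ValueError("need at least one GPU")
    return min(per_gpu_fps) * len(per_gpu_fps)


# Hypothetical standalone frame rates for three mixed GPUs.
mixed_setup = [40, 40, 30]
print(potential_throughput(mixed_setup))  # bounded by the 30 FPS card
print(sum(mixed_setup))                   # what naive addition would suggest
```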


