ATI Physics Acceleration Gets a Voice
ATI's Architectural Advantages
Now, ATI didn't share all of this information out of the goodness of their hearts; they had something else in mind. With NVIDIA and Havok's recent partnership announcement, and with ATI's prior involvement in GPGPU projects, ATI was eager to explain why its GPU architecture is well suited to physics simulations.
First, the R580 has a tremendous amount of floating point capability with its 48 pixel shaders. ATI estimates that 375 GFlops on a single card and 750 GFlops on a CrossFire system are available for different processing models. Compared to a blazing fast modern CPU with about 10 GFlops of total floating point capability, the GPU has tremendous headroom. Interestingly, though no one outside AGEIA knows for sure how much power the PhysX chip actually has, ATI feels that even if PhysX delivers 25 GFlops at 100% efficiency, the GPU's far higher raw GFlops more than make up for its slightly lower efficiency.
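To put ATI's argument in numbers, here is a minimal sketch of the break-even arithmetic. It uses only the peak figures quoted above; the 25 GFlops PhysX figure is ATI's hypothetical rather than a published specification, and the function names are ours, chosen for illustration.

```python
# Peak figures quoted in the article, in GFlops.
R580_SINGLE = 375.0
R580_CROSSFIRE = 750.0
CPU = 10.0
# ATI's hypothetical for PhysX, NOT a published AGEIA specification.
PHYSX_HYPOTHETICAL = 25.0


def break_even_efficiency(peak_gflops, rival_gflops, rival_efficiency=1.0):
    """Fraction of its peak a chip must sustain to match a rival's delivered rate."""
    return (rival_gflops * rival_efficiency) / peak_gflops


def sustained(peak_gflops, efficiency):
    """Delivered GFlops when only a fraction of peak is actually achieved."""
    return peak_gflops * efficiency


if __name__ == "__main__":
    print(f"Single R580 vs CPU, peak ratio: {R580_SINGLE / CPU:.1f}x")
    for name, peak in (("single card", R580_SINGLE), ("CrossFire", R580_CROSSFIRE)):
        need = break_even_efficiency(peak, PHYSX_HYPOTHETICAL)
        print(f"{name}: needs {need:.1%} of peak to match 25 GFlops at 100%")
```

On these figures a single card would only need to sustain about 6.7% of its claimed peak to match the hypothetical PhysX number, and a CrossFire pair about 3.3%. None of this says how much of that peak real physics code can actually reach, which is the part ATI's claim leaves open.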
Another architectural feature that is very important to physics processing on ATI's GPUs is their dedicated branching logic. When the R520 and R580 were launched, this feature was central to ATI's argument that graphics shaders were moving to a highly branched coding style. These same branching units can be used for physics calculations as well.
Finally, the highly threaded design of the R520 architecture, in conjunction with the branching logic, allows ATI's GPUs to break physics calculations into smaller, easier to process sections. Just as this feature helped the R520 and R580 in pixel shading performance, physics calculations can take advantage of it as well.
As you can see in the slide above, a finer-grained threading system lets ATI's GPU process a pixel count much closer to the number of pixels that actually need processing. The black areas in each example represent data that is NOT processed, so more black means higher efficiency. As the thread size increases to even 16x16 (256 pixels), the number of physics 'pixels' or events that must pass through the system increases dramatically.
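The granularity effect can be illustrated with a toy model. The sketch below is our own simplification, not ATI's test: it assumes a square grid with a circular region of 'active' cells, and assumes the hardware must process an entire tile whenever any cell inside it is active. The grid size, radius and tile sizes are arbitrary values picked for illustration.

```python
def active_mask(size, radius):
    """Square grid with a filled circle of 'active' cells in the middle."""
    c = (size - 1) / 2.0
    return [[(x - c) ** 2 + (y - c) ** 2 <= radius ** 2 for x in range(size)]
            for y in range(size)]


def cells_processed(mask, tile):
    """Cells touched if a whole tile runs whenever any cell in it is active."""
    size = len(mask)
    total = 0
    for ty in range(0, size, tile):
        for tx in range(0, size, tile):
            rows = mask[ty:ty + tile]
            if any(any(row[tx:tx + tile]) for row in rows):
                # Count every cell in the tile, active or not.
                total += min(tile, size - ty) * min(tile, size - tx)
    return total


def efficiency_by_tile(size=64, radius=20, tiles=(1, 4, 16)):
    """Ratio of cells that needed work to cells actually processed, per tile size."""
    mask = active_mask(size, radius)
    needed = sum(sum(row) for row in mask)
    return {t: needed / cells_processed(mask, t) for t in tiles}


if __name__ == "__main__":
    for tile, eff in efficiency_by_tile().items():
        print(f"{tile}x{tile} tiles: {eff:.0%} of processed cells were needed")
```

With 1x1 tiles every processed cell is one that needed work. As the tile grows, more idle cells along the edge of the region get dragged through the pipeline, which is the waste the slide depicts.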
This performance diagram was provided by ATI, so of course we apply the usual caution when looking at the results. That being said, ATI claimed the code used for this test was written in D3D HLSL using 'out of the books' physics formulas. As you can see, if these numbers are even close to being right, the G70 and G71 architectures have a severe performance deficit.
ATI's Software Approach
What's more, ATI is taking a broader approach to gaming physics by opening up the hardware to developers, just as they promised they would during the R520 launch. They are working on a Data Parallel Processing Architecture Abstraction layer that will allow developers to use the hardware without having to go through the Direct3D or OpenGL APIs. This software is being given away for free to developers and API developers to use as they see fit. And while physics is the focus for now, this architecture abstraction will be open to all other types of GPGPU work too.
Without having to go through Direct3D, ATI should be able to pull more performance out of its architecture than if it left the calculations to D3D. However, the abstraction layer that ATI is writing CAN go through D3D and OpenGL if the developer would like it to.
ATI did admit that a common API for physics coding would simplify the industry and allow competition between NVIDIA, ATI and even AGEIA to exist at a level where we could actually tell you which one is better with some kind of certainty. This might not be good for purely software-based companies like Havok, though. But with information about 'Direct Physics' already publicly viewable on the MSDN website and in other DirectX areas, we can't stop that progression. It usually takes Microsoft to step in and make a standard for hardware vendors to adhere to, and this should happen with physics as well.