AGEIA PhysX Physics Processor Technology Preview
The four components of an advanced physics simulation each place specific demands on the hardware if they are to be processed quickly enough to run in real time. Scale is the most obvious of these, and it is highly task-parallel in nature: the solution must be able to work on large chunks of data in parallel. If one processing pipeline can track one boulder rolling down a hill each time step, then adding 23 more pipelines should let you simulate 24 boulders. A highly parallel system is therefore essential.
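To make the boulder example concrete, here is a minimal Python sketch (our own illustration, not AGEIA code) of why this kind of work parallelizes so well: each body's update reads and writes only its own state. The names `Body`, `step_body` and `step_world` are hypothetical, gravity is the only force, and collisions are deliberately left out, since interaction between bodies is exactly what breaks this clean independence.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2 along the y axis (the only force in this sketch)


@dataclass(frozen=True)
class Body:
    x: float
    y: float
    vx: float
    vy: float


def step_body(body: Body, dt: float) -> Body:
    """Advance one body by dt using semi-implicit Euler (update velocity, then position)."""
    vy = body.vy + GRAVITY * dt
    return Body(body.x + body.vx * dt, body.y + vy * dt, body.vx, vy)


def step_world(bodies: list, dt: float, pipelines: int = 24) -> list:
    """Update every body independently, spreading the work over `pipelines` workers.

    No body reads another body's state, so the workers never need to coordinate.
    """
    with ThreadPoolExecutor(max_workers=pipelines) as pool:
        return list(pool.map(lambda b: step_body(b, dt), bodies))


# 24 boulders, one per "pipeline", advanced by one 60 Hz time step.
boulders = [Body(float(i), 100.0, 1.0, 0.0) for i in range(24)]
next_state = step_world(boulders, dt=1 / 60)
```

CPython threads stand in for hardware pipelines here; they show how the work divides up, not how much faster it would run.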
Memory is also very important when you look at scale, fidelity and interaction. Every object that can be interacted with needs its own memory space, so the more objects you want, the more memory you are going to need. The physics hardware must be able to work on a large amount of data at once, and the system needs plenty of memory bandwidth for the sudden spikes of activity you will see when, say, one wall of a castle explodes.
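A back-of-the-envelope sketch shows how memory grows with object count. The per-body layout below (position, orientation quaternion, linear and angular velocity, inverse mass and a diagonal inverse inertia tensor, all stored as 32-bit floats) is an assumption for illustration, not AGEIA's actual data format, and the number of passes over the data per time step is likewise a made-up parameter.

```python
# Hypothetical rigid-body layout: field name -> number of 32-bit floats.
RIGID_BODY_FIELDS = {
    "position": 3,
    "orientation_quaternion": 4,
    "linear_velocity": 3,
    "angular_velocity": 3,
    "inverse_mass": 1,
    "inverse_inertia_diagonal": 3,
}
BYTES_PER_FLOAT = 4


def bytes_per_body() -> int:
    """Size of one body's state under the assumed layout (17 floats = 68 bytes)."""
    return sum(RIGID_BODY_FIELDS.values()) * BYTES_PER_FLOAT


def state_bytes(num_bodies: int) -> int:
    """Total state memory; grows linearly with the number of interactive objects."""
    return num_bodies * bytes_per_body()


def traffic_bytes_per_second(num_bodies: int, passes_per_step: int, steps_per_second: int) -> int:
    """Memory traffic if every body's state is read or written `passes_per_step` times per step."""
    return state_bytes(num_bodies) * passes_per_step * steps_per_second


intact_wall = state_bytes(1)      # the wall as a single body before the explosion
fragments = state_bytes(5_000)    # the same wall blown into 5,000 pieces
spike = traffic_bytes_per_second(5_000, passes_per_step=10, steps_per_second=60)
```

Under this assumed 68-byte layout, the intact wall needs 68 bytes of state while its 5,000 fragments need 340,000 bytes; touching that state 10 times per step at 60 steps per second works out to 204,000,000 bytes/s. Real engines also store shapes, contacts and solver data, so these figures are a lower bound.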
AGEIA's Architectural Answers
So how does AGEIA's first-generation PPU (physics processing unit) address these requirements while leaving room for even more physics processing performance in the future? First, the chip has massive amounts of internal bandwidth. AGEIA claims nearly two terabits per second (2 Tbits/s, or roughly 250 GB/s) of internal memory bandwidth, which it says is many times more than even the fastest CPUs or GPUs available today. This addresses the first component of a physics processing system, scale, to a T: detecting and resolving collisions among a large number of moving rigid bodies requires this kind of bandwidth to feed the geometric math behind the calculations.
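To see why collision detection is so bandwidth-hungry, consider the simplest possible approach: test every pair of spheres for overlap. This brute-force Python sketch is illustrative only. AGEIA has not disclosed its algorithms, real engines prune candidate pairs with spatial partitioning or sweep-and-prune, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def find_sphere_contacts(centers, radii):
    """Brute-force all-pairs overlap test.

    Returns (i, j, penetration_depth) for every overlapping pair. Each of the
    n*(n-1)/2 tests reads the center and radius of two bodies.
    """
    contacts = []
    for i, j in combinations(range(len(centers)), 2):
        gap = dist(centers[i], centers[j]) - (radii[i] + radii[j])
        if gap < 0.0:
            contacts.append((i, j, -gap))
    return contacts


def pair_tests(n):
    """Number of pair tests the brute-force approach performs for n bodies."""
    return n * (n - 1) // 2


contacts = find_sphere_contacts(
    centers=[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)],
    radii=[1.0, 1.0, 1.0],
)
```

With 1,000 bodies the brute-force approach performs 499,500 pair tests per step, and each one reads the state of two bodies.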
The next architectural feature the PhysX chip integrates is a set of processing cores optimized specifically for physics work. While AGEIA refused to spill the beans on the actual multi-core configuration, it claims that "each of the many processing cores of the AGEIA PhysX processor" are configured for physics work specifically. More concretely, the cores are tailored to the geometric and linear algebra calculations that any college physics student should know about.
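As an example of that geometric and linear algebra work, here is a sketch of the textbook impulse response for two colliding spheres: a normalized contact vector, a dot product for the closing speed, and a scaled vector update for each body. It is a simplified illustration under our own assumptions (frictionless contact, no rotation, a hypothetical `restitution` coefficient), not a description of what the PhysX cores actually execute.

```python
from math import sqrt


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def resolve_sphere_collision(ca, va, ma, cb, vb, mb, restitution=0.5):
    """Frictionless, non-rotating impulse response for two touching spheres.

    ca/cb are centers, va/vb velocities, ma/mb masses. Returns the new (va, vb).
    """
    d = sub(cb, ca)
    n = scale(d, 1.0 / sqrt(dot(d, d)))   # unit contact normal, pointing from a toward b
    closing = dot(sub(vb, va), n)         # relative speed along n; negative means approaching
    if closing >= 0.0:
        return va, vb                     # already separating: no impulse needed
    j = -(1.0 + restitution) * closing / (1.0 / ma + 1.0 / mb)  # impulse magnitude
    return sub(va, scale(n, j / ma)), add(vb, scale(n, j / mb))
```

With equal masses and a restitution of 1.0, a head-on collision simply swaps the two velocities, and total momentum is conserved for any restitution value.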
Another feature AGEIA is touting is its "massive multi-core system" that includes "multiple independent processing elements." Again, I couldn't get a definitive answer, but this could refer either to physical cores or to separate pipelines within the chip. This feature mainly addresses interactivity, providing enough compute bandwidth for the different components of the simulation to communicate and interact effectively.
Finally, the memory architecture on the PhysX PPU offers very high bandwidth and supports "scatter gather" access, which lets the unit access memory quickly even when the accesses are spread randomly across a large set of physical objects. Because all of these objects are constantly moving through the simulation, the memory architecture is even more important to the PPU design.
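The sketch below shows the kind of access pattern scatter-gather support is meant to speed up: each contact names two bodies at arbitrary positions in the state array, so the solver gathers their velocities from scattered addresses and then scatters impulses back, accumulating wherever a body appears in several contacts. The helpers are hypothetical, the bodies are unit-mass and confined to a line to keep it short, and nothing here describes how the PPU implements the hardware side.

```python
def gather(values, indices):
    """Read values[k] for every k in indices (indices may be arbitrary and repeated)."""
    return [values[k] for k in indices]


def scatter_add(target, indices, updates):
    """Add updates[n] into target[indices[n]]; repeated indices accumulate."""
    for k, u in zip(indices, updates):
        target[k] += u


def contact_pass(velocities, contacts, restitution=0.0):
    """One Jacobi-style pass over 1-D contacts between unit-mass bodies.

    contacts holds (left, right) index pairs that can point anywhere in the array.
    """
    left = [a for a, _ in contacts]
    right = [b for _, b in contacts]
    # Gather: fetch both bodies' velocities for every contact.
    closing = [vl - vr for vl, vr in zip(gather(velocities, left), gather(velocities, right))]
    # Equal unit masses: impulse = (1 + e) * closing speed / 2, only when approaching.
    impulses = [(1.0 + restitution) * c / 2.0 if c > 0.0 else 0.0 for c in closing]
    # Scatter: push each impulse back to both bodies, accumulating shared bodies.
    dv = [0.0] * len(velocities)
    scatter_add(dv, left, [-p for p in impulses])
    scatter_add(dv, right, impulses)
    return [v + d for v, d in zip(velocities, dv)]


result = contact_pass([2.0, 0.0, -1.0, 5.0], [(0, 1), (1, 2)], restitution=1.0)
```

Here the pass returns `[0.0, 1.0, 0.0, 5.0]`: body 1 sits in both contacts and receives both impulses, which is why the scatter step adds rather than overwrites.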
It is the combination of these features, and others AGEIA hasn't yet disclosed, that the company feels allows all four of its defined components of an advanced physics engine to work well. Of course, without an open look at the architecture, it's hard for us to learn much more about what is going on inside the chip, as we so often can with GPUs and CPUs. Hopefully AGEIA will be more open with this kind of information in the future.
CPUs and GPUs as Alternatives?
AGEIA did spend a fair bit of time in its whitepaper explaining why the CPU and GPU lack the features and ability to come close to what the PhysX PPU can do. On the CPU side, AGEIA points out that even though a CPU's internal memory system is much faster than a GPU's link to its texture cache, it still falls substantially short of what the PhysX chip can do. In the end, though, it is the general-purpose nature of the CPU that keeps it from completing physics calculations at an effective speed, even in dual-core versions, and AGEIA sees no significant jump in physics processing capability on the visible roadmaps from AMD or Intel.
The GPU debate is much more heated, especially since NVIDIA announced its partnership with Havok on Havok FX just a couple of days ago. AGEIA claims that the requirements for graphics and physics processing differ in fundamental ways that cannot be overlooked or bypassed, no matter what software approach a GPU vendor takes.
First, AGEIA argues that GPUs do not have enough internal memory bandwidth because of their limited link to the texture cache. Because pixel shading has become more dominant in games than texturing (which accesses memory), GPU internal bandwidth has not grown enough to match what AGEIA claims to have, or what it says physics requires. This limits the scale a GPU can address in physics calculations, since it can store and access fewer entities at once.
The lack of a real write-back mechanism on the GPU will also hurt it in physics processing. A pixel shader can read from memory but can write its result only to its own output pixel, so it cannot write back results that change the state of other objects in the "world", a feature a solid physics engine needs on all four counts.
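The write-back limitation can be illustrated by emulating a shader-style pass in which each output slot may read anything but may write only itself. A common workaround is to turn a scatter into a gather: instead of each contact pushing impulses to its two bodies, each body scans the contact list for the contacts it belongs to. This Python sketch, with hypothetical function names and impulses assumed to be precomputed, contrasts the two approaches; it is our illustration of the limitation, not Havok FX code.

```python
def output_only_pass(num_outputs, kernel):
    """Shader-style pass: kernel(i) may read any input but produces only output slot i."""
    return [kernel(i) for i in range(num_outputs)]


def apply_impulses_gather_only(velocities, contacts):
    """contacts holds (a, b, impulse): the impulse is subtracted from a and added to b.

    With no scatter writes, every body scans the whole contact list to find its own
    contacts, so the work grows with bodies * contacts.
    """
    def kernel(i):
        dv = 0.0
        for a, b, impulse in contacts:
            if a == i:
                dv -= impulse
            if b == i:
                dv += impulse
        return velocities[i] + dv

    return output_only_pass(len(velocities), kernel)


def apply_impulses_scatter(velocities, contacts):
    """Same result with scatter writes: each contact is visited once and updates two slots."""
    out = list(velocities)
    for a, b, impulse in contacts:
        out[a] -= impulse
        out[b] += impulse
    return out


velocities = [2.0, 0.0, -1.0, 5.0]
contacts = [(0, 1, 2.0), (1, 2, 1.0)]
gathered = apply_impulses_gather_only(velocities, contacts)
scattered = apply_impulses_scatter(velocities, contacts)
```

Both functions produce the same velocities, but the gather-only version does work proportional to bodies times contacts, while the scatter version touches each contact once.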
AGEIA also claims that because a graphics card is accessed through Direct3D, any software that does physics calculations on the GPU is forced to "map" the physics problem onto the language of pixel processing, which adds overhead. Mapping physics code directly onto dedicated physics pipelines, by contrast, should increase speed and reduce complexity in the software back end.
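To illustrate the kind of mapping AGEIA is describing, a common GPU-computing convention of this era stored per-particle data in a floating-point RGBA texture, one particle per texel. The sketch below, with hypothetical names and an assumed row-major layout, covers only the index bookkeeping; it is not Havok FX's actual scheme, and the Direct3D render passes that would read and write such textures are omitted.

```python
from math import ceil


def texture_shape(num_particles, width):
    """Width and height (in texels) of a texture holding one particle per texel."""
    return width, ceil(num_particles / width)


def particle_to_texel(k, width):
    """Map particle index k to a (u, v) texel coordinate in row-major order."""
    return k % width, k // width


def pack_positions(positions, width):
    """Flatten (x, y, z) positions into RGBA float data.

    Alpha channels and any texels past the last particle are zero padding.
    """
    w, h = texture_shape(len(positions), width)
    data = [0.0] * (w * h * 4)
    for k, (x, y, z) in enumerate(positions):
        u, v = particle_to_texel(k, width)
        base = (v * w + u) * 4
        data[base:base + 3] = [x, y, z]
    return data


def unpack_position(data, k, width):
    """Read particle k's position back out of the RGBA data."""
    u, v = particle_to_texel(k, width)
    base = (v * width + u) * 4
    return tuple(data[base:base + 3])


positions = [(float(k), 2.0 * k, 3.0 * k) for k in range(10)]
data = pack_positions(positions, width=4)
```

With 10 particles and a width of 4 texels the texture is 4 by 3, so two texels and every alpha channel are padding; that bookkeeping is the sort of overhead AGEIA argues a dedicated physics API avoids.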
Because of these limitations, the physics simulations that are practical on a GPU are limited mainly to eye candy and special effects. NVIDIA didn't try to deny this during our briefing on SLI Physics, so the argument makes a lot of sense. However, AGEIA doesn't want to just bring eye candy to games; it wants to change the way games are made and played from the ground up. Of course, AGEIA will be starting with eye candy features too, but who's counting? So while AGEIA admits that the NVIDIA and Havok FX effort will probably be able to produce simple collisions between particles and static geometry in an "acceptable" manner, it says that approach will in no way scale the way PhysX will.
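The "simple collisions between particles and static geometry" that AGEIA concedes to GPUs look roughly like the sketch below: particles fall under gravity and bounce off a fixed floor plane. The names and parameters are hypothetical; the point is that the floor is only ever read and particles never touch one another, so no write-back to other objects is needed.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    y: float   # height above the floor
    vy: float  # vertical velocity


def step_particles(particles, dt, gravity=-9.81, restitution=0.5, floor_y=0.0):
    """Advance particles and bounce them off a static floor plane.

    The floor is never written to and particles never affect each other,
    so each update reads shared static data and writes only its own particle.
    """
    out = []
    for p in particles:
        vy = p.vy + gravity * dt
        y = p.y + vy * dt
        if y < floor_y:
            y = floor_y                # clamp to the floor surface
            vy = -vy * restitution     # reflect and damp the velocity
        out.append(Particle(y, vy))
    return out


falling = step_particles([Particle(0.05, -2.0), Particle(10.0, 0.0)], dt=0.1)
```

Once particles need to push each other, or knock a crate over, each update has to write to other objects' state, which is exactly where AGEIA argues the GPU approach falls short.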
Another interesting issue AGEIA raised is that Havok FX, like any API that runs physics code on a GPU, has to map its code onto Direct3D through a particular Shader Model, so as shader models change, that code will be affected. This means the Havok FX engine could be affected dramatically every time Microsoft changes D3D and every time NVIDIA and ATI change their hardware to follow (à la DX10 for Vista). That could make for an unstable development platform, one that designers may prefer to avoid in favor of a static API like the one AGEIA offers for its PhysX PPU.