NVIDIA Optimus Technology: Performance and Battery Life for your Notebook
How Optimus Works
The NVIDIA Optimus profiles are used to enable the GPU for a particular application if it can "add quality, performance, lower power or add functionality" to it. NVIDIA is apparently spending a lot of time verifying and validating these profiles for each application on a wide range of hardware, including current-generation notebooks, upcoming Arrandale machines and even the lowly Atom-based netbooks. For each configuration, the GPU may or may not benefit the end user, and NVIDIA claims to always be looking out for the consumer when it comes to the profiling system. But what good does all of this profile work do for NVIDIA if they can't get the fruits of that labor out to consumers in a timely and easy manner? Let's be honest, the typical notebook consumer doesn't update their drivers EVER, let alone on a regular enough basis to trust that their Optimus profiles are updated for any new software (or even updated software) they have installed.
Enter the first NVIDIA "profile push" system.
This option in the new Optimus driver control panel should pretty much say it all: NVIDIA is officially launching a new automatic profile updating platform that all Optimus users will be enrolled in by default. As new applications are released or updated, NVIDIA's engineers will create new profiles to properly optimize the Optimus technology for various circumstances and hardware configurations. Those profiles will then be pushed to the customer's machine whenever it is online. Exact details of this integration (how often the notebook will check for updates, how big the downloads are) haven't been revealed yet, but NVIDIA is hosting this service completely on their own, so we are eager to see how well it works out.
Hardcore users who want to can disable this feature and rely on driver updates or manual additions. But to me, that defeats the purpose of this seamless technology to begin with. The diagram above gives us an overview of the technology in its entirety: from the application launch to the final output display, including the automatic profile push system. There are still quite a few questions about how Optimus technology works, and while NVIDIA wasn't willing to share 100% of the details, we were able to get a lot of them.
The Man Behind the Curtain
First, we should note that even had NVIDIA come up with a solution like this before last year, they wouldn't have been able to implement it. Without the support that Windows 7 brought for installing display drivers from different hardware vendors, we could not have both Intel IGP and NVIDIA discrete solutions accessible at the same time. Indeed, that was one of the reasons why the first switchable graphics options required the unique proxy driver that also created so many hassles of its own. Now, with the ability to load both vendors' drivers on the same system, communication between the two can be handled much more easily.
One of the key software ingredients in the new Optimus driver is called the NVIDIA Routing Layer - a piece of technology that was only vaguely described in NVIDIA's own white paper on Optimus technology. From what I can tell, the routing layer is what decides whether any given portion of an application is computed on the integrated graphics or on the NVIDIA discrete graphics, and that decision is based on a lot of algorithmic information they aren't likely to divulge. Essentially, that means that even when the GPU is powered on it isn't handling all of the programs and applications being run, as you might expect. Instead, if you are watching a Flash video while doing email in another window, the Intel IGP will still be handling all the commands and information for the email window while the NVIDIA GPU is working on the Flash video.
Since the integrated graphics will be the default GPU in nearly all cases, the profiling system becomes incredibly important here. When a profile is configured to enable the GPU, it is actually only letting the GPU know that it MIGHT be needed and that it should be online and ready to respond to requests from the system. Without a profile, the system would basically bypass the discrete NVIDIA GPU altogether and work only with the Intel IGP; this is obviously why NVIDIA was determined to get a quick and easy profile updating system in place. The profiles, in essence, hand control from the IGP to the discrete GPU, letting NVIDIA's Routing Layer decide which components it will handle and which it will pass off to the IGP.
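To make that flow a little more concrete, here is a minimal Python sketch of the decision as described above. NVIDIA has not published how the Routing Layer or the profile format actually work, so everything here is an assumption made for illustration: the Profile record, the route function, the executable names and the workload labels are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Device(Enum):
    IGP = "Intel IGP"
    DGPU = "NVIDIA discrete GPU"


@dataclass(frozen=True)
class Profile:
    # Hypothetical profile record; the real profile format is not public.
    executable: str
    gpu_workloads: frozenset  # workload kinds the discrete GPU should take


def route(executable, workload, profiles):
    """Return (device, gpu_powered) for one workload of one application."""
    profile = profiles.get(executable)
    if profile is None:
        # No profile: the discrete GPU is bypassed and can stay powered off.
        return Device.IGP, False
    # A profile only says the GPU MIGHT be needed, so it is brought online;
    # the routing decision is then made per workload.
    if workload in profile.gpu_workloads:
        return Device.DGPU, True
    return Device.IGP, True


# Hypothetical profile table: one application with one GPU-worthy workload.
PROFILES = {
    "browser.exe": Profile("browser.exe", frozenset({"flash_video"})),
}

print(route("browser.exe", "flash_video", PROFILES))  # video goes to the GPU
print(route("browser.exe", "page_layout", PROFILES))  # GPU is on, IGP does the work
print(route("mail.exe", "compose", PROFILES))         # no profile, GPU stays off
```

The point of the sketch is the two-step shape: a profile match powers the GPU on, and a separate decision picks which work it actually receives. The real criteria behind that second step are exactly the part NVIDIA isn't sharing.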
As described earlier, the NVIDIA Optimus-ready GPUs also needed to be able to quickly and efficiently move data from their own local frame buffer (GPU memory) to the main system memory that acts as the IGP's frame buffer. This allows NVIDIA to use the IGP as a display controller: the IGP simply reads from that frame buffer as it normally would and outputs the results to the screen, including the results that NVIDIA placed there. The data transfer for this has to be done only through the PCI Express bus - by using standards for communication, NVIDIA is hoping to prevent any future "lock outs" from Intel's engineering or legal teams. Traditional methods for doing this via DMA (direct memory access) were slower and also added delay in the rendering pipeline by forcing the GPU into a halt state until the data copy was completed, in order to prevent synchronization errors.
The original method forced the 3D Engine to halt while the data copy was made but the NVIDIA Copy Engine does not.
The new NVIDIA Copy Engine, integrated into the GeForce 200M, 300M and upcoming Fermi mobility GPUs, performs this task in a faster, asynchronous manner that prevents any noticeable delay in data transfer. This faster DMA operation allows for simultaneous 3D rendering and data copy with only a 3 ms latency when running at 60 Hz.
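A quick back-of-the-envelope model shows why overlapping the copy with rendering matters. This is only a sketch under stated assumptions: the 3 ms figure is the latency quoted above, treated here as the cost of one frame copy, while the 15 ms render time is a made-up example value, and the two functions are a simplified model rather than anything NVIDIA has described.

```python
REFRESH_HZ = 60
FRAME_BUDGET_MS = 1000 / REFRESH_HZ  # about 16.67 ms per frame at 60 Hz

COPY_MS = 3.0     # latency quoted for the Copy Engine, assumed to be the copy cost
RENDER_MS = 15.0  # assumed render time for one frame (illustrative only)


def blocking_frame_ms(render_ms, copy_ms):
    # Older approach: the 3D engine halts during the copy, so the costs add up.
    return render_ms + copy_ms


def async_frame_ms(render_ms, copy_ms):
    # Copy Engine approach: the copy of one frame overlaps rendering of the
    # next, so throughput is limited by the slower of the two stages.
    return max(render_ms, copy_ms)


blocking = blocking_frame_ms(RENDER_MS, COPY_MS)
overlapped = async_frame_ms(RENDER_MS, COPY_MS)

print(f"frame budget at 60 Hz: {FRAME_BUDGET_MS:.2f} ms")
print(f"blocking copy:  {blocking:.1f} ms, fits budget: {blocking <= FRAME_BUDGET_MS}")
print(f"async copy:     {overlapped:.1f} ms, fits budget: {overlapped <= FRAME_BUDGET_MS}")
print(f"copy latency as share of a frame: {COPY_MS / FRAME_BUDGET_MS:.0%}")
```

With these example numbers the blocking path takes 18 ms per frame and misses the 60 Hz budget, while the overlapped path stays at 15 ms. The copy still adds about 3 ms of latency before a finished frame reaches the IGP's frame buffer, which is roughly 18% of one refresh interval.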
Let's not forget the technological feat of being able to fully power off the GPU and its associated PCI Express lanes, down to a true 0 watt state! When the GPU is needed for processing, it is able to power on and start accepting computing calls within 300 ms.
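To put that wake-up time in perspective, it helps to convert it into display refreshes. The snippet below is plain arithmetic on the 300 ms figure above; the helper name is hypothetical, and 60 Hz is assumed as a typical notebook panel refresh rate.

```python
def refreshes_during(wake_ms, refresh_hz):
    # Number of display refresh intervals that pass while the GPU powers on.
    return wake_ms * refresh_hz / 1000


WAKE_MS = 300    # power-on time quoted for the GPU
REFRESH_HZ = 60  # assumed panel refresh rate

print(refreshes_during(WAKE_MS, REFRESH_HZ))  # 18.0
```

So the GPU is ready in about 18 refresh intervals, a one-time cost paid when a profiled application starts rather than on every frame.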