Ray Tracing and Gaming - One Year Later

Author: Daniel Pohl
Subject: Processors
Manufacturer: Intel

Ray tracing faster than rasterization: Example 1

I have been asked several times when ray tracing will become faster than rasterization (the current GPU approach for rendering). There are cases in which it is faster on a single desktop machine TODAY!

Example 1: High number of triangles

Ray tracing uses "acceleration structures" that sort all the geometry of a virtual world according to their position in space. There are several different acceleration structures, but the most common ones used in ray tracing are Uniform Grids, BSP-trees, kd-trees and Bounding Volume Hierarchies (BVHs).

As described in the introduction, when a ray is shot, we must find the first object that is hit by that ray. The general idea of spatial partitioning / acceleration structures is best described through a small example:

Imagine a building with 4 levels and 4 rooms on each level, with a player located in the top-most, left-most room as shown below:

Artistic representation of a building with 4 levels of 4 rooms each

When ray tracing, we want to figure out whether a piece of geometry is hit or not. In ray tracing speak we talk about a "camera" and when we shoot rays through it to determine what is visible we refer to these as "eye rays" or "primary rays". Naively, we could just shoot eye rays everywhere and see what gets hit and what does not. Clearly, in the above example 15/16 of these rays would be wasted. Thinking about this situation a little, it seems obvious that it is very unlikely that the camera will "see" anything in the right-most room on the lowest-level, so why even bother checking if a ray hits any of the geometry from that area? How can we avoid such redundant work?

With hierarchical spatial acceleration structures we can just look at a convenient representation of an area and determine if there is any interesting detail there worth investigating further. If there is none in a given area or volume, we can skip it entirely and save a lot of computational effort. This way we can focus computational resources on the areas where the geometrical details are located.
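One common "convenient representation of an area" is an axis-aligned bounding box: if a ray misses the box, everything inside it can be skipped. The following is a minimal sketch of the standard slab test, not code from any particular ray tracer. The function name `ray_hits_box` and the layout of the building as unit-square rooms on a 4×4 grid are assumptions made purely for illustration.

```python
import math

def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does the ray origin + t*direction (t >= 0) cross the box?"""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray runs parallel to this pair of box sides:
            # it can only hit if it starts between them.
            if o < lo or o > hi:
                return False
            continue
        # Distances along the ray to the two sides on this axis.
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval in which the ray is inside all slabs so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True

# Assumed layout: rooms are unit squares, the camera sits in the
# top-left room at (0.5, 3.5) and looks to the right.
camera = (0.5, 3.5)
looking_right = (1.0, 0.0)

top_right_room = ((3.0, 3.0), (4.0, 4.0))
bottom_right_room = ((3.0, 0.0), (4.0, 1.0))

print(ray_hits_box(camera, looking_right, *top_right_room))     # hit
print(ray_hits_box(camera, looking_right, *bottom_right_room))  # miss
```

A single cheap test like this rejects the bottom-right room without looking at any of the geometry inside it, which is the saving the paragraph above describes.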

One advantage of such hierarchical data structures is that they change a "linear search" (check everything, every time, to see if there is a match of interest) into a "logarithmic search": check the highest level, which represents the bounds of a large area, and proceed to the next level if and only if there is detail of interest in this area. In the above example, the "top" level might represent the 4×4 grid...

...the next level might split the grid right down the middle top to bottom...

...on the left side that half might be split in the middle from side to side etc.

In the linear case we check every square in the grid: 16 units of work. In the logarithmic case, in step one we eliminate the right half; in step two, the bottom half of the remaining left side; in step three, the right half of the remaining quartet; and in step four we determine which of the last two rooms the interesting detail is in. That is 4 steps instead of 16. If we increased the number of grid cells to 32, the result would be 32 checks in the linear case against 5 checks in the logarithmic case, and so on. One way to think of this is that if we increase the complexity of a scene 10 times, a hierarchical acceleration structure adds only a few extra steps to the cost of finding something: going from 16 cells (4 steps) to 160 cells (8 steps), for example, only doubles the cost. Rasterization, by contrast, is condemned to a linear approach, so if we increase complexity by 10× the cost goes up by 10×.
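The step counts above can be reproduced with a few lines of Python. This is only a counting sketch under the simplifying assumption that every step throws away half of the remaining cells; the function names are made up for this illustration and do not come from any rendering library.

```python
def linear_checks(num_cells):
    # Linear search: look at every cell once.
    return num_cells

def hierarchical_checks(num_cells):
    # Hierarchical search: each step discards half of what is left
    # until a single cell remains.
    steps = 0
    remaining = num_cells
    while remaining > 1:
        remaining = (remaining + 1) // 2  # keep the half that holds the detail
        steps += 1
    return steps

for n in (16, 32, 160, 1_000_000):
    print(n, linear_checks(n), hierarchical_checks(n))
# 16 16 4
# 32 32 5
# 160 160 8
# 1000000 1000000 20
```

Even at a million cells the hierarchical search needs only 20 steps, while the linear one needs a million.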

This behavior is represented in this diagram:


The green curve represents the logarithmic behavior of ray tracing as the number of triangles increases; the red line represents the linear behavior of rasterization. As you can see, when the polygon count is low, ray tracing performance is at a disadvantage compared to rasterization, but the two curves quickly meet, and from that point on, as complexity increases, ray tracing is ALWAYS faster than rasterization. This cross-over point depends on many factors (the performance of the CPU, the performance of the GPU, etc.), but the trend is a mathematical certainty: a linear curve will always overtake a logarithmic curve eventually, and beyond that point the logarithmic curve always wins! Because ray tracing performance scales linearly with the number of CPU cores, doubling the number of CPUs would cut the height of the green curve in half, moving the intersection point (S) closer and closer to 0; i.e., throw enough CPU cores at the problem and ray tracing would always be faster than rasterization using a GPU.
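To make the cross-over argument concrete, here is a toy cost model in Python. Everything numeric in it is an assumption for illustration only: the constants `cost_per_level` and `cost_per_triangle` are invented, real hardware will give very different values, and ray tracing is assumed to speed up perfectly with the number of cores.

```python
import math

def ray_tracing_cost(num_triangles, cores=1, cost_per_level=1000.0):
    # Logarithmic in scene size, divided evenly across cores (assumption).
    return cost_per_level * (1.0 + math.log2(num_triangles)) / cores

def rasterization_cost(num_triangles, cost_per_triangle=1.0):
    # Linear in scene size.
    return cost_per_triangle * num_triangles

def crossover(cores=1, limit=10_000_000):
    # Smallest triangle count at which ray tracing is no more expensive
    # than rasterization, i.e. the point S in the diagram.
    for n in range(1, limit + 1):
        if ray_tracing_cost(n, cores) <= rasterization_cost(n):
            return n
    return None

for cores in (1, 2, 4, 8):
    print(cores, "core(s): cross-over at", crossover(cores), "triangles")
```

With these made-up constants ray tracing starts out a thousand times more expensive per step, yet the cross-over still exists, and it moves toward smaller triangle counts each time the core count doubles.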

One example that is clearly above that point is the Boeing 777 model with 350 million triangles. This extremely detailed model, which includes every screw of the plane, has been provided by Boeing to the research community for experiments on how to render it.

Ray tracing has been proven to be the right solution for that. Even back in 2004, the research group from Saarland University was able to render the model at 3 to 7 frames per second at 640x480 pixel resolution with a dual-core CPU from that year.

The question may arise, why can't these acceleration structures be used in rasterization? Well, they are, to a certain extent, but at a lower precision. Some games have one such coarse-grained structure for rendering graphics, a different one (at a different resolution) for collision-detection and yet another one for AI (sometimes created by different third-party infrastructure suppliers). Besides taking up more memory than necessary, one problem in using these three separately computed data structures is the effort to keep them consistent. See below for an example of what happens when the information from the collision detection structure differs from the rendering structure.

In the game "Oblivion", two different structures are used for graphical rendering and for collision detection. The motion of a slowly closing door, changing its angle from frame to frame, is clearly visible to the player. Four states can therefore be distinguished: open, closing, closed and opening. But the structure used by the collision detection engine does not track dynamic movement at that level of detail, such as the angle the door is currently at. NPCs like "Velwyn" can therefore detect only two states of the door: open or closed.

What happens in the game as a result is that Velwyn approaches the player while the door is obviously closing, but Velwyn only gets the information that the door is either open or closed. So we can end up in the situation depicted below, where the character ends up stuck in the door.

Velwyn stuck in the door in Oblivion (2006)

(Of course ray tracing can also be used for collision detection to avoid those problems but that is another story.)

But even if the acceleration structure for graphical rendering is consistent with the rest, there is another problem: in ray tracing we test each ray against the triangles in a per-pixel exact manner. In rasterization there are no rays, so the relevant area / volume of the structure can only be approximated. This can be done by different methods. Let's have a short look at two of them:

  • Time-consuming pre-calculations resulting in statements like "When I am in room 1, I could potentially see into rooms 2 and 3, but not room 4." For more information, have a look at the Wikipedia entry for "Potentially Visible Set".
  • Manually placed hints from a level designer for the engine, known as visibility portals. Placing them consumes a lot of the artists' time; e.g. Quake 4 has 3,200 of them. So far, automatic algorithms for this have not made it into the practical world of game developers. The evaluation of the portals during the rendering process leads to complicated multi-pass techniques: first the scene is rendered with placeholders for these portals. Once one of the placeholders is detected to be visible, that part of the scene is rendered again in full detail. More information can be found here
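The first approach can be pictured as a precomputed lookup table. The sketch below is hypothetical: the entry for room 1 follows the statement quoted above, while the other entries, the table name `POTENTIALLY_VISIBLE` and the function `rooms_to_render` are invented for illustration and are not taken from any engine.

```python
# Hypothetical precomputed "potentially visible set" for a 4-room level.
# Only the entry for room 1 follows the example in the text; the rest
# is made up so that the table is complete.
POTENTIALLY_VISIBLE = {
    1: {1, 2, 3},
    2: {1, 2},
    3: {1, 3, 4},
    4: {3, 4},
}

def rooms_to_render(camera_room):
    # At run time the engine only looks up the answer; the expensive
    # visibility computation happened offline.
    return sorted(POTENTIALLY_VISIBLE[camera_room])

print(rooms_to_render(1))  # [1, 2, 3]
```

Note how coarse this is: standing in room 1, all of rooms 2 and 3 are submitted for rendering even if only a sliver of them is actually visible, which is the approximation described above.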