Frame Rating Dissected: Full Details on Capture-based Graphics Performance Testing

Vsync and its Effect on Frame Rating – Does it fix CrossFire?

After publishing the Frame Rating Part 3 story, I started to see quite a bit of feedback from readers and other enthusiasts, with many requests for information about Vsync and how it might affect the results we are seeing here.  Vertical Sync is the fix for screen tearing, a common artifact seen in gaming (and other media) when the frame rendering rate doesn't match the display's refresh rate.  Enabling Vsync forces the rendering engine to display and switch frames in the buffer only in step with the vertical refresh rate of the monitor, or a divisor of it.  So a 60 Hz monitor can only display frames at 16 ms (60 FPS), 33 ms (30 FPS), 50 ms (20 FPS), and so on, while a 120 Hz monitor would also be capable of 8 ms (120 FPS), etc.
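
As a quick illustration of that quantization (a Python sketch, not part of our capture tooling), the frame times available with Vsync are just integer multiples of the refresh period:

```python
# Sketch: with Vsync on, a frame must be held for a whole number of refresh
# cycles, so the only possible frame times are multiples of the refresh period.

def vsync_frame_times(refresh_hz: float, max_cycles: int = 4) -> list:
    """Frame times (ms) available with Vsync at a given refresh rate."""
    period_ms = 1000.0 / refresh_hz
    return [round(period_ms * n, 1) for n in range(1, max_cycles + 1)]

print(vsync_frame_times(60))   # [16.7, 33.3, 50.0, 66.7] -> 60, 30, 20, 15 FPS
print(vsync_frame_times(120))  # [8.3, 16.7, 25.0, 33.3]  -> 120, 60, 40, 30 FPS
```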

Many early readers hypothesized that simply enabling Vsync would fix the stutter and runt issues that Frame Rating was bringing to light.  To test this we looked for a game that ran right around the 60 FPS mark in our normal testing with Vsync disabled and then set about re-running the results with it on.  We are using a standard 60 Hz monitor, with the goal of testing some 120 Hz capability soon, after we figure out a final bug or two with our capture configuration.

First up, let’s take a look at the NVIDIA GeForce GTX 680 and GTX 680 SLI and see what shows up.

Because the average frame rate per second graph averages the frame times over one-second windows, the averages won't quite be the straight lines you might have expected.  Looking at the GTX 680 SLI results with Vsync enabled, the only key item is that the frame rate doesn't go above 60 FPS like it does with Vsync disabled.

The single card and SLI configurations with Vsync disabled look just like they did on previous pages, but the graph for GTX 680 SLI with Vsync on is very different.  Frame times only switch back and forth between 16 ms and 33 ms (60 and 30 instantaneous FPS) due to the restrictions of Vsync.  What might not be obvious at first is that the constant shifting back and forth between these two rates (one frame held for two refresh cycles, then one frame held for a single refresh cycle) can actually cause more stuttering and animation inconsistency than would otherwise appear.
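
To see why that pattern feels worse than its average suggests, here is a quick sketch with illustrative numbers (not our captured data): an alternating 16.7/33.3 ms sequence averages out to a respectable-looking FPS while the animation step keeps doubling and halving.

```python
# One second's worth of frames alternating between one and two refresh cycles.
frame_times_ms = [16.7, 33.3] * 30

avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
print(f"average: {avg_fps:.0f} FPS")  # 40 FPS -- looks fine on an averaged graph

# But the animation step changes by a full refresh on every single frame:
swings = [abs(a - b) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
print(f"frame-to-frame swing: {max(swings):.1f} ms on every frame")  # 16.6 ms
```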

Based on our graph here we found that with Vsync enabled, about 87% of our frames ran at 60 FPS (16 ms) and 13% at 30 FPS (33 ms).  You might be curious how the frame rate can sit at 60 FPS so often with Vsync on while very few frames hit 60 FPS with Vsync off; the answer lies in the rate limiting caused by Vsync.  Because of the back pressure on the game engine caused by the longer frame times (30 FPS, 33 ms) from Vsync, there is more time for the GPUs to "catch up" and render the next frame in 16 ms.

Our ISU graph of stutter potential tells the story in a more damning light; starting at the 30th percentile, the Vsync enabled setup of GTX 680s in SLI is already running at much higher frame variance, and it only gets worse as we hit the 60th, 80th, and 90th percentiles.  At the 90th percentile we are seeing frame variances over 12 ms, which is nearly a complete monitor refresh cycle!
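
For readers curious how a percentile curve like this is built, here is a minimal sketch.  The variance metric (absolute frame-to-frame delta) and the synthetic data are assumptions for illustration, not the exact formula behind our charts:

```python
import random

def variance_percentiles(frame_times_ms, percentiles=(30, 60, 80, 90)):
    """Frame-to-frame variance (ms) at the requested percentiles."""
    deltas = sorted(abs(a - b) for a, b in zip(frame_times_ms, frame_times_ms[1:]))
    return {p: round(deltas[int(len(deltas) * p / 100)], 1) for p in percentiles}

# Vsync-on style data: mostly 16.7 ms frames with occasional 33.3 ms frames.
random.seed(1)
times = [random.choice([16.7, 16.7, 16.7, 33.3]) for _ in range(600)]
print(variance_percentiles(times))  # variance near 0 early, ~16.6 ms by the 90th
```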

 

Now let’s see how the AMD Radeon HD 7970 results change.

Something interesting is already happening here – the Vsync enabled results from the HD 7970 CrossFire configuration are running at HIGHER average frame rates per second than with Vsync disabled!  The orange line (Vsync disabled) clearly never hits the 60 FPS mark while the black line (Vsync enabled) does.

Without Vsync we clearly see the runts affecting the plot of frame times here on the HD 7970s in CrossFire, but enabling Vsync does appear to eliminate them!

With Vsync enabled, our observed frame rate for the HD 7970s in CrossFire matches our FRAPS results, indicating no dropped frames or runt frames.  Standard CrossFire mode (Vsync disabled) still shows the horrible results we have come to expect from our analysis today.
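
For reference, this is roughly how an observed frame rate can be derived from a capture: count only frames that put enough scanlines on screen.  A minimal sketch; the frame data and the runt cutoff here are hypothetical stand-ins for the per-frame scanline counts our capture card analysis produces.

```python
RUNT_THRESHOLD_SCANLINES = 21  # hypothetical cutoff for a "runt" frame

def observed_fps(frames, capture_seconds):
    """frames: iterable of (frame_id, scanlines_shown) pairs from a capture."""
    full = [fid for fid, lines in frames if lines >= RUNT_THRESHOLD_SCANLINES]
    return len(full) / capture_seconds

# Ten captured frames over ~0.17 s; every third one is a runt a few lines tall.
sample = [(i, 8) if i % 3 == 0 else (i, 1080) for i in range(10)]
print(f"{observed_fps(sample, 10 * 16.7 / 1000):.0f} FPS")  # counts only full frames
```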

Our Min FPS percentile graph shows us that we are running at 60 FPS (16 ms) 85% of the time and 30 FPS (33 ms) the rest.  Because our data here is based on the observed frame rates and not the FRAPS frame rates, there is no correlation between the two CrossFire runs.

The ISU graph of stutter potential again indicates that the Vsync enabled option is introducing higher frame variances than we would like, and it is doing so more dramatically, and earlier, than the GTX 680s in SLI.

It does appear that enabling Vsync will help alleviate the runt issue seen with AMD Radeon cards in CrossFire, but at the cost of much more frame variance and stuttered animation in games that previously didn't exhibit that problem.

Let's take a look at another CrossFire example with a particular set of circumstances.  I theorized that in a gaming scenario that sat just under 60 FPS with a single GPU, we would still see problematic results when jumping to HD 7970s in CrossFire.  Take our Battlefield 3 2560x1440 testing: with only one HD 7970 we run just under 60 FPS most of the time, which, with Vsync enabled, forces the game to run at 30 FPS with 33 ms frame times.  Ideally we would like to see the move from 33 ms frame times to 16 ms frame times when adding a second HD 7970 in CrossFire, as the extra performance pushes the cards over a steady 60 FPS.

Our FRAPS graph looks the way we would hope and expect real-world performance to look.  While the single HD 7970 ran at a non-standard frame rate when performance was under 60 FPS, towards the end (around the 50 second mark), where it could reach 60 FPS, we see a flat line that is partially hidden behind the pink line.  That pink line represents CrossFire HD 7970s; by doubling the number of GPUs we expected to maximize performance at 60 Hz with Vsync enabled, and we have.

Observed frame rates, calculated by removing runts, show that the Vsync DISABLED results on the HD 7970s in CrossFire mirror what we have seen before, with much lower performance.  However, the Vsync ENABLED results did not change!

The somewhat complicated plot of frame times indicates that with Vsync enabled, at no time did the frame rate of the HD 7970 cards in CrossFire go below 60 FPS or above the 16 ms mark - even though there are thousands of frames under 16 ms (runts) when Vsync is disabled.  Not only that, but performance over the single HD 7970 with Vsync enabled is improved - rather than having jumps between the 16 ms and 33 ms frame times, we are locked in at 16 ms, matching the 60 Hz refresh of our panel.

The minimum FPS percentile graphic shows the same story - the pink line representing the HD 7970s with Vsync turned on looks solid.

Notice as well that with a static 16 ms frame time we see no frame time variance at all in our ISU graph, indicating that the kinds of stutter we are searching for are not showing up at all.

How is this happening?  How is enabling Vsync 'fixing' the runt and frame time issues of CrossFire?  The secret lies in the inherent back pressure of vertical sync, which paces the graphics cards and AMD's CrossFire engine even against their will.  By forcing the GPUs to render at most one frame every 16 ms, Vsync makes the GPUs pace themselves in a way that they otherwise would not.  This doesn't happen in every game though, as we saw in the Crysis results first, and there is a lot more testing that needs to be done with Vsync before we can make a firm decision.
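
A toy simulation makes the mechanism visible.  Under the simplifying assumptions of a fixed render time and a two-deep buffer queue (both invented for illustration), a GPU that could produce frames every 5 ms gets stalled waiting on the display, and its presents settle onto the 16.7 ms refresh cadence instead of bunching up into runts:

```python
REFRESH_MS = 1000.0 / 60
RENDER_MS = 5.0        # hypothetical raw render time per frame
QUEUE_DEPTH = 2        # assumed number of back buffers

now, queue, next_flip, presents = 0.0, 0, REFRESH_MS, []
while now < 100.0:
    if queue < QUEUE_DEPTH:
        now += RENDER_MS           # render a frame into a free buffer
        queue += 1
    else:
        now = next_flip            # back pressure: wait for the display to flip
    if now >= next_flip:
        queue -= 1                 # the display consumed one buffer
        presents.append(round(next_flip, 1))
        next_flip += REFRESH_MS

print(presents[:5])  # [16.7, 33.3, 50.0, 66.7, 83.3] -- one frame per refresh
```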

 

NVIDIA has a couple of different solutions in the NVIDIA Control Panel that might help: Adaptive Vsync and Smooth Vsync.  Adaptive Vsync was released with the first Kepler GPUs last year, and we found it to be very effective at reducing stutter while also eliminating tearing.  Smooth Vsync is a little-known feature that only exists in the driver when SLI is enabled, as it takes advantage of many of the same frame metering features that SLI uses.  It attempts to keep frame rates "settled" at one level until it decides it has enough horsepower to move up to the next frame rate option for an extended period of time.  It is a vague description at best, and NVIDIA didn't go into much detail on how it decides whether there is enough GPU headroom remaining or how long that "period of time" really is.
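
Since NVIDIA hasn't published the algorithm, the following is purely speculative: a sketch of what a Smooth-Vsync-style policy could look like, based only on that public description.  The headroom margin and the hold time are invented placeholders, not NVIDIA's actual values.

```python
HOLD_FRAMES = 60   # invented stand-in for the "extended period of time"
HEADROOM = 0.85    # invented margin: render time must stay under 85% of budget

def smooth_vsync_rate(render_times_ms, low_fps=30, high_fps=60):
    """FPS level a Smooth-Vsync-like policy might settle on for a workload."""
    rate, streak = low_fps, 0
    budget_ms = 1000.0 / high_fps          # frame budget at the higher rate
    for t in render_times_ms:
        if t < budget_ms * HEADROOM:       # GPU showed headroom this frame
            streak += 1
            if streak >= HOLD_FRAMES:
                rate = high_fps            # sustained headroom: step up
        else:
            streak, rate = 0, low_fps      # any long frame drops it back down
    return rate

print(smooth_vsync_rate([12.0] * 120))       # 60: headroom sustained long enough
print(smooth_vsync_rate([12.0, 20.0] * 60))  # 30: headroom never sustained
```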

I decided to run through the same Crysis 3 sequences at 1920x1080 on the GTX 680s in SLI under all four NVIDIA options: Vsync off, Vsync on, Adaptive Vsync, and Smooth Vsync.

Our FRAPS-based results look similar for standard Vsync on and off, but the Adaptive and Smooth Vsync options appear to be fixed at 30 FPS, with an occasional hiccup on Smooth Vsync.

The plot of frame times is somewhat confusing, but the important comparison is standard Vsync On against Adaptive and Smooth.  With the exception of the six or so spikes on the Smooth configuration, the frames are basically fixed at 33 ms, resulting in a perfectly smooth gameplay experience but at the expense of limiting performance.

The observed FPS doesn’t change at all.

Another view here shows the same thing, with a fixed frame rate of 30 FPS for the Adaptive and Smooth Vsync options.

NVIDIA's Adaptive Vsync shows essentially zero variance, and only very minimal variance on the Smooth Vsync option at the 96th percentile.  So even though performance is lower on average, the experience is smoother.

 

NVIDIA's additional Vsync options are definitely a strong point in favor of its technology, though Smooth Vsync only exists on SLI configurations.  I have been told that NVIDIA is considering adding it to single graphics card configurations, and I certainly hope they do, as it adds significant value in the same way Adaptive Vsync and frame rate limiting do.

For both NVIDIA and AMD multi-GPU solutions, enabling standard Vsync definitely changes the story.  NVIDIA's cards pretty much perform as we expected, but for CrossFire we didn't really know what to expect given the various visual concerns.  It does appear that the runt problem is at least mostly solved by enabling Vsync, though to be clear we are only testing a couple of games at this point – much more needs to be done.

However, enabling Vsync creates a whole host of other potential issues that gamers have to deal with.  Even though the goal of removing visual tearing is met with the option turned on, you add latency to the gameplay experience, as much as 60 ms in some cases, from input to display.  Putting back pressure on the GPU pipeline, for both NVIDIA and AMD, means that some frames are going to be running behind schedule or behind the input timing of the game itself.  Many gamers won't want to deal with those kinds of input problems, and that is why many still play games with Vsync disabled.  Turning on Vsync does help AMD's CrossFire performance, but it isn't the final answer just yet.
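
As a back-of-the-envelope sketch of where a latency figure like that can come from (assuming input is sampled when a frame starts rendering and the finished frame then waits in a flip queue; the queue depth is an assumption):

```python
REFRESH_MS = 1000.0 / 60
QUEUE_DEPTH = 3  # assumed three-deep pre-render/flip queue

# A sampled input can wait behind every queued frame plus the scanout of the
# frame currently on screen before its own frame is displayed.
worst_case_ms = (QUEUE_DEPTH + 1) * REFRESH_MS
print(f"worst case input-to-display: {worst_case_ms:.0f} ms")  # ~67 ms
```

With a two-deep queue the same arithmetic gives ~50 ms; either way it brackets the roughly 60 ms worst case mentioned above.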

March 29, 2013 | 11:28 AM - Posted by Steve (not verified)

Ryan: Is it possible to test AMD's CF with older drivers to see if this problem has been around for a long time or if it is a more recent problem with AMD's continuous driver upgrades to improve speeds?

March 28, 2013 | 07:00 AM - Posted by Filip Svensson (not verified)

First, a very interesting article with lots of information but..

there are some pretty big holes in the conclusions in this article.

First, the conclusion is that you observe smoother graphics if you have lots of GPU frames inside each display frame. This is just not true. If you accept this, the following conclusion will also be untrue: that runt and dropped frames always affect the perceived frame rate.
I will give you an example that proves this:

Say that the graphics card is able to produce two frames (letters) for each display frame (numbers). So the output to the display would look like this:
1A,2C,3E,4G etc.
then you would have optimal smoothness (as long as the output from the game engine is constant), even though in this case you drop one GPU frame each display frame. If instead it were like the following:
1A,2D,3E,4H
then you could perhaps notice some unevenness even though you still have the same number of drops. I doubt it, but it could be possible. Someone should do a video and see if this behaviour can be detected by the human eye :)

If you instead have dropped frames where a GPU frame is spread across multiple display frames, then you would potentially have a serious issue with stuttering. But that is not what you are measuring here. Ex. 1A,2A,3C,4C

One conclusion from this is that your Observed FPS is totally wrong, both from what I write above and because you are not limiting this figure to the refresh rate of your monitor. Capping the graphs to 60 frames would make it somewhat better. Alternatively, give it some other headline, for example (and here comes the flame bait):
NVIDIA-sponsored measuring technique to make their technology look good

March 29, 2013 | 10:41 AM - Posted by Ryan Shrout

I don't follow your letters/numbers analogy at all, but I can assure you we are confident in our results and experiences here.

March 28, 2013 | 07:10 AM - Posted by Martin Trautvetter (not verified)

Very insightful article full of interesting data points, thanks for all the work that went into this!

I wonder if you plan to expand this testing down to AMD's APUs and CrossFired APUs, as well as Intel iGPUs, in the future. I know it's not as flashy as the 'big guns', but that's where a huge chunk of the market is going, and I'm curious if there are skeletons to find in that closet, too.

March 29, 2013 | 10:41 AM - Posted by Ryan Shrout

I do plan on using this method whenever possible going forward.  Laptops are a bit more of a pain since we'd have to use external displays, but we are going to experiment.

March 30, 2013 | 06:25 AM - Posted by Martin Trautvetter

Cool, can't wait for your findings!

_
btw: Twitch is SO much better when you guys are in the same room, really enjoyed this week's episode!

March 28, 2013 | 07:58 AM - Posted by steen (not verified)

Nice work Ryan. The key is the capture card. Input metering from e.g. FRAPS to output at the monitor. What you're missing is that games seem to use single frame timing to determine simulation engine steps. No smoothing to account for any overheads - at all.

This whole AFR caper is just a sham, though. NUMA-esque multi-GPU designs are the only way to do it. Simple 3dfx SLI was better at distributing load, but in the days of DX11+, load balancing is tricky. V-Sync with triple buffering is also an option, but input lag is a problem any way you slice it.

I do have concerns over Nvidia's overlay layer & software, though. They do kick an own goal with the GTX 680 being slower than a 7970, but that's been known for a while now. They're banking on SLI & Titan. Your comments also spruik Nvidia, rather than just give facts.

March 29, 2013 | 10:42 AM - Posted by Ryan Shrout

Thanks, appreciate it!

I have another version of the overlay from a third party we are testing out as well.

March 28, 2013 | 08:08 AM - Posted by steen (not verified)

P.S. Did you get a visit from Tom Petersen, too? ;)

March 30, 2013 | 10:22 AM - Posted by John Doe (not verified)

He gets a visit from me everyday.

March 28, 2013 | 08:37 AM - Posted by steen (not verified)

P.P.S. (Sorry) Haven't you fixed the sampling rate of the capture card at 60Hz?

March 28, 2013 | 08:59 AM - Posted by ThorAxe

Thank you very much for testing SLI and crossfire. It confirms my suspicion about my Crossfire and SLI configurations.

To give you some background I have run 8800GTX SLI, 4870x2 + 4870 Trifire, 6870 Crossfire and GTX 680 SLI.

The 4870x2 + 4870 appeared to my eyes to be okay, however 6870 Crossfire never seemed to be quite right while the GTX 680 has always appeared smooth to me. I don't recall any issues with 8800GTX SLI but that was a while ago.

March 28, 2013 | 09:28 AM - Posted by Luciano (not verified)

Error in the article:
"Smooth Vsync", "Adaptive VSync", etc, are not exclusive to nVidia.
They are available for everyone and you can use them through console commands.
The names differ due to manufacturers' marketing.
But they are available since at least 2005 (rFactor game).

Various names: "double vsync", "vsync", "dynamic vsync", "vsync with double or triple buffering", "vsync with 1~5 frame queue", etc.

If the game lacks the option in a menu, you can use console commands or ini files.

RadeonPro is the most famous "ini profile" creator for that use.

March 29, 2013 | 10:43 AM - Posted by Ryan Shrout

These are definitely not the same things...

March 30, 2013 | 10:06 AM - Posted by Luciano (not verified)

They're not the same code nor are available through the same ways.
But they are the same methods and pursue the same results.
SLI and Crossfire are not the same thing...
But...

March 28, 2013 | 09:59 AM - Posted by Luciano (not verified)

You have created a minimum quality level where the basic requirement is "more than X scan lines displayed" because of "its contribution to the animation observed".
"Animation" is measured in full frames in sequence.
Any corruption in alternating frames is animation corruption.
Thus you would have to filter half of the SLI performance too.

SLI is UNDOUBTEDLY superior, as the data shows.
But "animation" is corrupted by ANY tearing or stutter.
Simracers always use a frame cap and Vsync with triple buffering for that matter.

March 28, 2013 | 12:00 PM - Posted by onehourleft (not verified)

How would the Frame Rating results change on Windows 8 vs. Windows 7? Linus appears to have found FPS improvements in some games in Windows 8. http://youtu.be/YHnsfIJtZ2o . I'm wondering if runt or dropped frames are increasing or there are actual improvements in user experience.

March 29, 2013 | 10:44 AM - Posted by Ryan Shrout

We started this process on Windows 7 before moving to Windows 8 and nothing changed.

March 28, 2013 | 01:30 PM - Posted by gamerk2 (not verified)

I've speculated since 2008 that SLI/CF introduced unacceptable latency into the system, based on all the threads titled "Why do I get 90 FPS and my game is laggy?" in various forums. I'm glad someone is FINALLY really looking into this aspect of the actual rendering chain.

March 28, 2013 | 06:25 PM - Posted by Anonymous (not verified)

Hi;

Can you please test other SLI rendering methods such as split frame rendering (SFR)?

I know that SFR is not officially supported by nvidia anymore but you can always force it using Nvidia Inspector as some of us sometimes do.

It would be great if you could try other render methods with AMD as well (such as the scissor or supertile methods; as far as I know they can be forced using the RadeonPro tool).

Best Regards

March 28, 2013 | 07:59 PM - Posted by Foosh (not verified)

I play with RadeonPro's Dynamic Vsync Control, which eliminates stuttering without introducing any noticeable input lag. Vsync off will run your video cards at 100% full time, generating a lot of heat and decreasing their life with minimal benefit. If you're playing for twitch response, you're running at 120 Hz double buffered; your latency will be 16 ms max, which isn't bad considering good human response is 226 ms. If you're playing for maximum visual quality, then screen tearing is unacceptable. Statements like "CrossFire does nothing" just create unnecessary drama.

March 29, 2013 | 10:45 AM - Posted by Ryan Shrout

I consider it entirely necessary to make sure people see what is going on.

Input latency is the next thing we will try to address, though, and it's possible that CrossFire, even with its runt frames, improves that more.

March 29, 2013 | 08:06 PM - Posted by steen (not verified)

I bet that's exactly what you'll find. AMD will have reduced input lag at the expense of these "runt" frames, whereas Nvidia's metering will show huge input lag. AMD were just outmaneuvered by Nvidia subverting your (& others') investigations of frame latency. I can see AMD introducing a latency/metering control for Xfire in future drivers. Will Nvidia do the same, I wonder? As I said, a pox on AFR. SFR is an alternative with Nvidia via hack, but it has its own issues.

March 30, 2013 | 12:50 AM - Posted by bystander (not verified)

Given that AFR has every other frame rendered by a different card, the actual time between moving the mouse and seeing the result displayed on the screen would not improve with CrossFire/SLI over a single card.

However, how often a move initiates a frame does improve; but if those extra updates come at almost the same exact time as the single card's updates, they won't give you any benefit, so spacing will likely help.

March 30, 2013 | 01:57 AM - Posted by bystander (not verified)

Hopefully when you guys test latency, you realize that there is a polling component to consider.

If you have evenly spaced out times when you initiate a frame, your input is more evenly received. While simply taking an input when each GPU is ready may reduce latency, two in a row at almost the same exact time results in redundant frames and input.

However, if those frames are evenly distributed and received, more useful mouse inputs are gathered and utilized. The benefit of this may outweigh pure latency readings.

The difference may be the difference between receiving input at a max of 33 ms intervals, and having up to 66 ms intervals with near-zero intervals at other points.

March 30, 2013 | 05:43 AM - Posted by Ryan Shrout

Interesting, hadn't considered the pros/cons of smoother or erratic input polling.
