Frame Rating Dissected: Full Details on Capture-based Graphics Performance Testing

Crysis 3 – HD 7970 versus GTX 680

Crysis 3 is currently the biggest GPU hog, and both the GTX 680 and HD 7970 handle it equally well at 1920x1080.  Even using our FRAPS metrics, though, GTX 680s in SLI are scaling better than the HD 7970s in CrossFire.

The frame time plot from our Frame Rating system shows yet another instance of CrossFire’s inability to keep consistent animation on the screen.  The single card configurations are pretty consistent with each other, but both also exhibit some tiny bumps in frame times on a repeating pattern, apparently a quirk of the CryEngine.  SLI does have some increases in frame time variance across the board with a few minor “hitches” as well.  CrossFire, though, appears to be alternating between 2ms frame times and up to 50ms frame times, resulting in…

Not only a lower observed frame rate but a frame rate that is LOWER than the single card!  I can tell you from first-hand experience that this definitely was the case in play through as well; it felt slower than the single card experience.

SLI looks fantastic in this graph and is able to take the matching performance of the GTX 680 and HD 7970 up from 30 FPS average for the entire run to 57 FPS.

Our custom ISU rating tells me that the GTX 680 SLI configuration looks GREAT and only differs from the single card configurations at 95% and above percentile.  The HD 7970 in CrossFire though shows huge amounts of variance from the outset and in fact does exhibit a lot of stutter in game as well.

 

At the higher 2560x1440 resolution, the single card HD 7970 has a slight edge over the GTX 680, and this time the scaling of CrossFire appears to be faster than SLI.

A quick glance at our results from the observed frame rate clearly shows that isn't the case though as only in short bursts does the CrossFire experience actually match that of the dual GTX 680s in SLI.

Our frame time plot indicates where the alternating frame times in Crysis 3 occur with CrossFire and how it relates to the performance of SLI.  Other than the single large hitch seen at the 12 second mark or so in SLI, the GTX 680s handle Crysis 3 much better.

The GTX 680s are able to scale from about 19 FPS on average to 36 FPS - a solid 89% scaling factor.  The HD 7970 GHz Edition cards are not so impressive, only going from 21 FPS to 25 FPS, and even that gain quickly falls away to performance that is just even with a single card.
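
The scaling figures quoted above are simply the ratio of the dual-card average to the single-card average. Here is a minimal Python sketch using only the rounded averages from the text; the function name `scaling_percent` is ours, purely for illustration:

```python
def scaling_percent(single_fps: float, multi_fps: float) -> float:
    """Percentage gain of a multi-GPU setup over a single card."""
    if single_fps <= 0:
        raise ValueError("single_fps must be positive")
    return (multi_fps / single_fps - 1.0) * 100.0


# Rounded averages quoted in the text for this resolution.
sli_scaling = scaling_percent(19, 36)  # GTX 680 -> GTX 680 SLI
cf_scaling = scaling_percent(21, 25)   # HD 7970 -> HD 7970 CrossFire

print(f"SLI: {sli_scaling:.0f}%  CrossFire: {cf_scaling:.0f}%")
# prints "SLI: 89%  CrossFire: 19%"
```

Keep in mind the CrossFire number here is the best case, since the 25 FPS figure does not hold for the whole run.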

Ouch, another very poor result here for HD 7970s in CrossFire with Crysis 3 at 2560x1440, with as much as 25 ms of frame variance (about one and a half refresh cycles at 60 Hz).
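
To put a frame variance number in terms of refresh cycles, divide it by the refresh interval of the display. A small sketch, assuming a 60 Hz display (about 16.7 ms per refresh); the refresh rate and the helper name `refresh_cycles` are our assumptions for illustration:

```python
def refresh_cycles(frame_time_ms: float, refresh_hz: float = 60.0) -> float:
    """Express a frame time (or variance) as a number of display refresh intervals."""
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be positive")
    refresh_interval_ms = 1000.0 / refresh_hz  # ~16.67 ms at 60 Hz
    return frame_time_ms / refresh_interval_ms


# 25 ms of variance on an assumed 60 Hz display.
print(f"{refresh_cycles(25):.2f} refresh cycles")
```

At 60 Hz, 25 ms works out to roughly 1.5 refresh intervals; on a faster display the same variance would span more refreshes.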

 

This is one of the few Eyefinity runs for the HD 7970 CrossFire configuration that was able to run until completion and generate the necessary graphs for us.  So we’ll finally get to see some interesting results.  Even at first glance, we can tell that something here isn't quite right.  According to FRAPS, the HD 7970 is pushing out more than 200 FPS to the screen on Crysis 3 at 5760x1080, which is obviously inaccurate. 

Ouch, there are definitely some problems here, not the least of which is the graph's poor setting of range maximums (will fix soon!)  Notice the spots on the plot of the orange line (HD 7970 CF) where there is no data; that indicates a dropped frame and a lot of frame time variance.

Removing those and any runts, we find that the observed FPS is actually right in line with that of the single HD 7970 graphics card.  Also, with the misreported CrossFire results out of the way, the updated scale helps us see the scaling of the GTX 680s in SLI.
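
As a rough illustration of why a FRAPS-style count and the observed frame rate can diverge, the sketch below throws out dropped frames (zero scanlines on screen) and runts (only a handful of scanlines) before counting. The per-frame scanline data, the `RUNT_THRESHOLD` value, and the function names are all hypothetical, chosen only to show the idea; this is not our actual analysis script.

```python
from typing import Sequence

# Assumed cutoff: frames covering fewer scanlines than this count as runts.
RUNT_THRESHOLD = 20  # hypothetical value, for illustration only


def reported_fps(scanlines: Sequence[int], duration_s: float) -> float:
    """Count every frame the game submitted, visible or not."""
    return len(scanlines) / duration_s


def observed_fps(scanlines: Sequence[int], duration_s: float,
                 runt_threshold: int = RUNT_THRESHOLD) -> float:
    """Count only frames that occupy a meaningful portion of the screen."""
    visible = [n for n in scanlines if n >= runt_threshold]
    return len(visible) / duration_s


# Made-up one-second capture: full frames alternating with drops (0) and runts.
capture = [1080, 0, 1075, 5, 1080, 0, 1070, 10] * 15  # 120 frames submitted

print(reported_fps(capture, 1.0))  # 120.0
print(observed_fps(capture, 1.0))  # 60.0
```

In this invented capture half of the submitted frames never contribute to the animation, so the reported rate is double what actually reaches the screen.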

Here again is another one of our RUN files to show you the effects of dropped frames on Eyefinity testing.  FRAPS-based frame rates skyrocket even though the observed frame rate is much lower, in line with a single HD 7970 GHz Edition card.  There are some runts involved in this but the biggest factor is obviously the dropped frames (missing colors in our pattern).

Again, for comparison, here is the RUN graph for the GTX 680s running in SLI at 5760x1080.  Notice that the frame rate is consistent with no drops or runts.

Interesting results here with the CrossFire setup taking a lower position across the board in our percentile minimum frame rate graphs. Also, we see the pair of GTX 680s in SLI start out much faster than anything else tested but at the 94th percentile or so fall below that of the HD 7970.
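
One plausible way to build a percentile view like this is to sort the frame times from fastest to slowest and read off the frame rate that a given share of frames meets or exceeds. The sketch below is an assumption about the general approach, not a description of our actual tooling, and the frame times in it are invented:

```python
import math
from typing import Sequence


def fps_at_percentile(frame_times_ms: Sequence[float], percentile: int) -> float:
    """Frame rate that `percentile` percent of frames meet or exceed."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    # Shortest frame times first, i.e. fastest frames first.
    ordered = sorted(frame_times_ms)
    # Nearest-rank method: 1-based position of the frame at this percentile.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return 1000.0 / ordered[rank - 1]


# Invented sample: mostly 20 ms frames with a slow tail.
frame_times = [20.0] * 90 + [25.0] * 5 + [50.0] * 5

for p in (50, 95, 99):
    print(p, round(fps_at_percentile(frame_times, p), 1))
```

A configuration that starts fast but has a slow tail will look good at the 50th percentile and then drop off sharply at the high percentiles, which is the kind of crossover described above.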

Keeping in mind that we are looking at pretty low frame rates across the board, the HD 7970 has the best overall result here in our ISU graph, with the least amount of frame variance over the course of our 60 second run.  Obviously CrossFire has a big issue once again: we see significant variance starting at the 50th percentile and it only gets worse from there.

March 29, 2013 | 08:28 AM - Posted by Steve (not verified)

Ryan: Is it possible to test AMD's CF with older drivers to see if this problem has been around for a long time or if it is a more recent problem with AMD's continuous driver upgrades to improve speeds?

March 28, 2013 | 04:00 AM - Posted by Filip Svensson (not verified)

First, a very interesting article with lots of information but..

there are some pretty big holes in the conclusions in this article.

First, the conclusion is that you observe smoother graphics if you have lots of GPU frames inside each display frame. This is just not true. If you accept this the following conclusions will also be not true. That runt and dropped frames always affect the perceived frame rate.
I will give you an example that proves this:

Say that the graphics card is able to produce two frames (letters) for each display frame (numbers). So if the output to the display looks like this:
1A,2C,3E,4G etc.
then you would have optimal smoothness (as long as the output from the game engine is constant). This holds even though in this case you drop 1 GPU frame each display frame. If instead it looked like the following:
1A,2D,3E,4H
then you could perhaps notice some unevenness even though you still have the same number of drops. I doubt it but it could be possible. Someone should do a video and see if this behaviour could be detected by the human eye :)

If you instead have drop frames when the gpu frame is spread across a multiple of display frames, then you would potentially have a serious issue with stuttering. But that is not what you are measuring here. Ex. 1A,2A,3C,4C

One conclusion from this is that your Observed FPS is totally wrong, both because of what I wrote above and because you are not limiting this figure to the refresh rate of your monitor. Capping the graphs to 60 frames would make it somewhat better. Alternatively, give it some other headline, for example (and here comes the flame bait):
NVIDIA sponsored measuring technique to make their technology look good

March 29, 2013 | 07:41 AM - Posted by Ryan Shrout

I don't follow your letters/numbers analogy at all, but I can assure you we are confident in our results and experiences here.

March 28, 2013 | 04:10 AM - Posted by Martin Trautvetter (not verified)

Very insightful article full of interesting data points, thanks for all the work that went into this!

I wonder if you plan to expand this testing down to AMD's APU and Crossfired APUs, as well as Intel iGPUs in the future, I know it's not as flashy as the 'big guns', but that's where a huge chunk of the market is going and I'm curious if there are skeletons to find in that closet, too.

March 29, 2013 | 07:41 AM - Posted by Ryan Shrout

I do plan on using this method whenever possible going forward.  Laptops are a bit more of a pain since we'd have to use external displays, but we are going to experiment.

March 30, 2013 | 03:25 AM - Posted by Martin Trautvetter

Cool, can't wait for your findings!

_
btw: Twich is SO much better when you guys are in the same room, really enjoyed this week's episode!

March 28, 2013 | 04:58 AM - Posted by steen (not verified)

Nice work Ryan. The key is the capture card. Input metering from e.g. FRAPS to output at the monitor. What you're missing is that games seem to use single frame timing to determine simulation engine steps. No smoothing to account for any overheads - at all.

This whole AFR caper is just a sham, though. NUMA-esque multi gpu designs are the only way to do it. Simple 3dfx SLI was better at distributing load, but in the days of DX11+, load balancing is tricky. V-Sync with triple buffering is also an option, but input lag is a problem any way you slice it.

I do have concerns over Nvidia's overlay layer & software, though. They do kick an own goal with the GTX 680 being slower than a 7970, but that's been known for a while now. They're banking on SLI & Titan. Your comments also spruik Nvidia, rather than just give facts.

March 29, 2013 | 07:42 AM - Posted by Ryan Shrout

Thanks, appreciate it!

I have another version of the overlay from a third party we are testing out as well.

March 30, 2013 | 07:03 AM - Posted by Luciano (not verified)

They're not the same code nor are available through the same ways.
But they are the same methods and pursue the same results.
SLI and Crossfire are not the same thing...
But...

March 28, 2013 | 05:08 AM - Posted by steen (not verified)

P.S. Did you get a visit from Tom Petersen, too? ;)

March 30, 2013 | 07:22 AM - Posted by John Doe (not verified)

He gets a visit from me everyday.

March 28, 2013 | 05:37 AM - Posted by steen (not verified)

P.P.S. (Sorry) Haven't you fixed the sampling rate of the capture card at 60Hz?

March 28, 2013 | 05:59 AM - Posted by ThorAxe

Thank you very much for testing SLI and crossfire. It confirms my suspicion about my Crossfire and SLI configurations.

To give you some background I have run 8800GTX SLI, 4870x2 + 4870 Trifire, 6870 Crossfire and GTX 680 SLI.

The 4870x2 + 4870 appeared to my eyes to be okay, however 6870 Crossfire never seemed to be quite right while the GTX 680 has always appeared smooth to me. I don't recall any issues with 8800GTX SLI but that was a while ago.

March 28, 2013 | 06:28 AM - Posted by Luciano (not verified)

Error in the article:
"Smooth Vsync", "Adaptive VSync", etc, are not exclusive to nVidia.
They are available for everyone and you can use them through console commands.
The names differ due to manufacturers marketing.
But they are available since at least 2005 (rFactor game).

Various names: "double vsync", "vsync", "dynamic vsync", "vsync with double or triple buffering", "vsync with 1~5 frame queue", etc.

If the game lacks the option in a menu, you can use console commands or ini files.

Radeonpro is the most famous "ini profile" creator for that use.

March 29, 2013 | 07:43 AM - Posted by Ryan Shrout

These are definitely not the same things...

March 30, 2013 | 07:06 AM - Posted by Luciano (not verified)

They're not the same code nor are available through the same ways.
But they are the same methods and pursue the same results.
SLI and Crossfire are not the same thing...
But...

March 28, 2013 | 06:59 AM - Posted by Luciano (not verified)

You have created a minimum quality level where the basic requirement is "more than X scan lines displayed" because of "its contribution for the animation observed".
"Animation" is measured in full frames in sequence.
Any corruption in alternating frames is animation corruption.
Thus you would have to filter out half of the SLI performance too.

SLI is UNDOUBTEDLY superior as the data shows.
But "animation" is corrupted by ANY tearing or stutter.
Simracers always use framecap and vsync with triple buffer for that matter.

March 28, 2013 | 09:00 AM - Posted by onehourleft (not verified)

How would the framerating change on Windows 8 vs. Windows 7? Linus appears to have found FPS improvements in some games in Windows 8. http://youtu.be/YHnsfIJtZ2o . I'm wondering if runt or dropped frames are increasing or there are actual improvements in user experience.

March 29, 2013 | 07:44 AM - Posted by Ryan Shrout

We started this process on Windows 7 before moving to Windows 8 and nothing changed.

March 28, 2013 | 10:30 AM - Posted by gamerk2 (not verified)

I've speculated since 2008 that SLI/CF introduced unacceptable latency into the system, based on all the threads titled "Why do I get 90 FPS and my game is laggy?" in various forums. I'm glad someone is FINALLY really looking into this aspect of the actual rendering chain.

March 28, 2013 | 03:25 PM - Posted by Anonymous (not verified)

Hi;

Can you please test other SLI render methodologies such as split frame rendering (SFR).

I know that SFR is not officially supported by nvidia anymore but you can always force it using Nvidia Inspector as some of us sometimes do.

It would be great if you could try other render methods with AMD as well. (such as scissor or supertile methods as far as I know they can be forced using radeon pro tool)

Best Regards

March 28, 2013 | 04:59 PM - Posted by Foosh (not verified)

I play with Radeon Pro's Dynamic Vsync Control which eliminates stuttering without introducing any noticeable input lag. Vsync off will run your video cards at 100% full time generating a lot of heat and decreasing their life with minimal benefit. If you're playing for twitch response you're running at 120Hz double buffered, your latency will be 16ms max which isn't bad considering good human response is 226ms. If you're playing for maximum visual quality then screen tearing is unacceptable. Statements like "Crossfire does nothing" just creates unnecessary drama.

March 29, 2013 | 07:45 AM - Posted by Ryan Shrout

I consider it entirely necessary to make sure people see what is going on.

Input latency is our next thing to try and address, though, and it's possible that CrossFire, even with its runt frames, is improving that more.

March 29, 2013 | 05:06 PM - Posted by steen (not verified)

I bet that's exactly what you'll find. AMD will have reduced input lag at the expense of these "runt" frames, whereas Nvidia's metering will show huge input lag. AMD were just outmanoeuvred by Nvidia subverting your (& others') investigations on frame latency. I can see AMD introducing a latency/metering control for Xfire in future drivers. Will Nvidia do the same, I wonder? As I said, a pox on AFR. SFR is an alternative with Nvidia via hack, but has its own issues.

March 29, 2013 | 09:50 PM - Posted by bystander (not verified)

Given that AFR has every other frame rendered by a different card, the actual time between moving the mouse and it being displayed on the screen would not improve with crossfire/SLI over a single card.

However, how often a move initiates a frame does improve, but if those extra updates are almost at the same exact time as the single cards updates, it won't give you any benefit, so spacing will likely help.

March 29, 2013 | 10:57 PM - Posted by bystander (not verified)

Hopefully when you guys test latency, you realize that there is a polling component to consider.

If you have evenly spaced out times when you initiate a frame, your input is more evenly received. While simply taking an input when each GPU is ready may reduce latency, two in a row at almost the same exact time results in redundant frames and input.

However, if those frames are evenly distributed and received, more useful mouse inputs are gathered and utilized. The benefit of this may outweigh pure latency readings.

The difference may be the difference between receiving input at a max of 33ms intervals, and having up to 66ms intervals with near 0 intervals at other points.

March 30, 2013 | 02:43 AM - Posted by Ryan Shrout

Interesting, hadn't considered the pros/cons of smoother or erratic input polling.
