How Games Work
Because of the complexity and sheer amount of data we have gathered using our Frame Rating performance methodology, we are breaking it up into several articles that each feature different GPU comparisons. Here is the schedule:
- 3/27: Frame Rating Dissected: Full Details on Capture-based Graphics Performance Testing
- 3/27: Radeon HD 7970 GHz Edition vs GeForce GTX 680 (Single and Dual GPU)
- 3/30: AMD Radeon HD 7990 vs GeForce GTX 690 vs GeForce GTX Titan
- 4/2: Radeon HD 7950 vs GeForce GTX 660 Ti (Single and Dual GPU)
- 4/5: Radeon HD 7870 GHz Edition vs GeForce GTX 660 (Single and Dual GPU)
- 4/16: Frame Rating: Visual Effects of Vsync on Gaming Animation
The process of testing games and graphics has been evolving even longer than I have been a part of the industry: 14+ years at this point. That transformation in benchmarking has been accelerating for the last 12 months. Typical benchmarks test some hardware against some software and look at the average frame rate that can be achieved. While access to frame time has been around for nearly the full life of FRAPS, it took an article from Scott Wasson at the Tech Report to really get the ball moving and investigate how each frame contributes to the actual user experience. I immediately began research into testing actual performance perceived by the user, including the "microstutter" reported by many in PC gaming, and pondered how we might be able to test for these criteria even more accurately.
The result of that research is being fully unveiled today in what we are calling Frame Rating – a completely new way of measuring and validating gaming performance.
The release of this story for me is like the final stop on a journey that has lasted nearly a complete calendar year. I began to release bits and pieces of this methodology starting on January 3rd with a video and short article that described our capture hardware and the benefits that directly capturing the output from a graphics card would bring to GPU evaluation. After returning from CES later in January, I posted another short video and article that showcased some of the captured video and stepped through a recorded file frame by frame to show readers how capture could help us detect and measure stutter and frame time variance.
Finally, during the launch of the NVIDIA GeForce GTX Titan graphics card, I released the first results from our Frame Rating system and discussed how certain card combinations, in this case CrossFire against SLI, could drastically differ in perceived frame rates and performance while giving very similar average frame rates. This article got a lot more attention than the previous entries and that was expected – this method doesn’t attempt to dismiss other testing options but it is going to be pretty disruptive. I think the remainder of this article will prove that.
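To make the gap between average frame rate and perceived smoothness concrete, here is a minimal Python sketch. It is not the Frame Rating capture pipeline; the two frame-time lists are made-up illustrative data and `summarize` is a hypothetical helper. Both runs deliver 60 frames in one second, so they report the same average FPS, yet one of them alternates short and long frames.

```python
from statistics import pstdev

def summarize(frame_times_ms):
    """Return (average FPS, frame-time std dev in ms, worst frame time in ms)."""
    total_seconds = sum(frame_times_ms) / 1000.0
    avg_fps = len(frame_times_ms) / total_seconds
    return avg_fps, pstdev(frame_times_ms), max(frame_times_ms)

# Hypothetical data: each run renders 60 frames in one second.
smooth = [1000 / 60] * 60               # evenly paced frames
stutter = [8.0, 1000 / 30 - 8.0] * 30   # alternating short and long frames

fps_smooth, sd_smooth, worst_smooth = summarize(smooth)
fps_stutter, sd_stutter, worst_stutter = summarize(stutter)

for label, fps, sd, worst in [
    ("smooth", fps_smooth, sd_smooth, worst_smooth),
    ("stutter", fps_stutter, sd_stutter, worst_stutter),
]:
    print(f"{label}: {fps:.1f} FPS average, "
          f"{sd:.2f} ms frame-time deviation, {worst:.2f} ms worst frame")
```

An average-FPS benchmark cannot tell these two runs apart, while the frame-time deviation and worst-case frame time separate them immediately.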
Today we are finally giving you all the details on Frame Rating; how we do it, what we learned and how you should interpret the results that we are providing. I warn you up front though that this is not an easy discussion and while I am doing my best to explain things completely, there are going to be more questions going forward and I want to see them all! There is still much to do regarding graphics performance testing, even after Frame Rating becomes more common. We feel that the continued dialogue with readers, game developers and hardware designers is necessary to get it right.
Below is our full video that features the Frame Rating process, some example results and some discussion on what it all means going forward. I encourage everyone to watch it but you will definitely need the written portion here to fully understand this transition in testing methods. Subscribe to our YouTube channel if you haven't already!
Subject: Graphics Cards | March 26, 2013 - 07:41 PM | Jeremy Hellstrom
Tagged: nvidia, hd 7790, gtx 650 ti boost, gtx 650 Ti, gpu boost, gk106
Why Boost you may ask? If you guessed that NVIDIA added their new Boost Clock feature to the card you should win a prize, as that is exactly what makes the GTX 650 Ti Boost special. With a core GPU speed of 980MHz, boosting to 1033MHz and beyond, this card is actually aimed to compete with AMD's HD7850, not the newly released HD7790; at least the 2GB model is. Along with the boost in clock comes a wider memory pipeline and a corresponding increase in ROPs. The 2GB model should be about $170, right on the cusp between value and mid-range, but is it worth the price of admission? Get a look at the performance at [H]ard|OCP.
"NVIDIA is launching the GeForce GTX 650 Ti Boost today. This video card is priced in the $149-$169 price range, and should give the $150 price segment another shakedown. Does it compare to the Radeon HD 7790, or is it on the level of the more expensive Radeon HD 7850? We will find out in today's latest games, you may be surprised."
Here are some more Graphics Card articles from around the web:
- Nvidia's GeForce GTX 650 Ti Boost @ The Tech Report
- Nvidia GTX 650 Ti Boost 2GB @ LanOC Reviews
- NVIDIA and EVGA GeForce GTX 650 Ti BOOST Video Card Review @ Legit Reviews
- NVIDIA GeForce GTX 650Ti Boost Review @ OCC
- Nvidia GeForce GTX 650 Ti Boost @ Hardware.info
- Nvidia GeForce GTX 650 Ti Boost @ Bjorn3D
- NVIDIA Geforce GTX 650Ti Boost 2GB Edition Review @Hi Tech Legion
- EVGA GTX 650Ti BOOST 2GB Superclocked Review @Hi Tech Legion
- NVIDIA GeForce GTX 650 Ti Boost 2GB @ Tweaktown
- NVIDIA GeForce GTX 650 Ti Boost 2 GB @ techPowerUp
- NVIDIA GeForce GTX 650 Ti BOOST @ Benchmark Reviews
- NVIDIA GTX 650 Ti Boost 2GB Review @ Hardware Canucks
- NVIDIA Chips Comparison Table @ Hardware Secrets
- AMD ATI Chips Comparison Table @ Hardware Secrets
- Workstation Graphics Card Comparison Guide @ TechARP
- PowerColor Radeon HD 7790 Turbo Duo Review @ OCC
- PowerColor HD 7790 Turbo Duo 1 GB @ techPowerUp
- Sapphire HD7950 MAC Edition @ Kitguru
Subject: General Tech, Systems | March 26, 2013 - 06:18 PM | Tim Verry
Tagged: workstation, nvidia, GTC 2013, BOXX, 3dboxx 8950
Boxx Technologies recently launched a new multi-GPU workstation called the 3DBoxx 8950. It is aimed at professionals who need a fast system with beefy GPU accelerator cards so that they can design and render at the same time. The 8950 is intended to be used with applications like Autodesk, Dassault, NVIDIA iray, and V-Ray (et al).
The Boxx 3DBoxx 8950 features two liquid cooled Intel Xeon E5-2600 processors (2GHz, 16 cores, 32 threads), up to 512GB of system memory (16 DIMM slots), and seven PCI-E slots (four of which accept dual slot GPUs, the remaining three are spaced for single slot cards). A 1250W power supply (80 PLUS Gold) powers the workstation. An example configuration would include three Tesla K20 cards and one Quadro K5000. The Tesla cards would handle the computation while the Quadro can power the multi-display output. The chassis has room for eight 3.5" hard drives and a single externally-accessible 5.25" drive. The 8950 workstation can be loaded with either the Windows or Linux operating system.
Rear IO on the 8950 workstation includes:
- 5 x audio jacks
- 1 x optical in/out
- 4 x USB 2.0 ports
- 1 x serial port
- 2 x RJ45 jacks, backed by Intel Gigabit NICs
The system is available now, with pricing available upon request. You can find the full list of specifications and supported hardware configurations in this spec sheet (PDF).
Subject: General Tech | March 26, 2013 - 01:49 PM | Jeremy Hellstrom
Tagged: tegra 4, tegra, shield, nvidia, Tegrazone
Remember Project Shield from CES and before? The Inquirer has managed to get their hands on an actual console at the Game Developers Conference and played a bit of Need For Speed streamed from a PC onto the Shield. Project Shield itself is a Tegra 4 powered controller running Android 4.2 with a 5" 720p display attached and wireless connectivity. The actual game is streamed wirelessly from a PC with a Kepler GPU via the Tegrazone application, so the real performance limit comes from latency, similar to the company once known as Onlive. The Inq was not quite ready to toss their money at Project Shield, but it was close.
"CHIP DESIGNER Nvidia caused something of a stir at CES when it announced the Project Shield handheld games console, and with its launch nearing, the firm is letting people try its first own-brand game console, which we managed to get our hands on at this week's GDC gaming conference in San Francisco."
Here is some more Tech News from around the web:
- Installing GLaDOS in the ceiling of your house @ Hack a Day
- Maybe don't install that groovy pirated Android keyboard @ The Register
- Backing your Apple Mac up with Time Machine @ Tweaktown
- The Best Servers for Linux in 2013 @ Linux.com
- Ninjalane Podcast - GTX Titan, Free 2 Play and Cooler Master Interview
The GTX 650 Ti Gets Boost and More Memory
In mid-October NVIDIA released the GeForce GTX 650 Ti based on GK106, the same GPU that powers the GTX 660 though with fewer enabled CUDA cores and GPC units. At the time we were pretty impressed with the 650 Ti:
The GTX 650 Ti has more in common with the GTX 660 than it does the GTX 650, both being based on the GK106 GPU, but is missing some of the unique features that NVIDIA has touted of the 600-series cards like GPU Boost and SLI.
Today's release of the GeForce GTX 650 Ti BOOST actually addresses both of those missing features by moving even closer to the specification sheet found on the GTX 660 cards.
Our video review of the GTX 650 Ti BOOST and Radeon HD 7790.
Option 1: Two GPCs with Four SMXs
Just like we saw with the original GTX 650 Ti, there are two different configurations of the GTX 650 Ti BOOST; both have the same primary specifications but will differ in which SMX is disabled from the full GK106 ASIC. The newer version will still have 768 CUDA cores but clock speeds will increase from 925 MHz to 980 MHz base and 1033 MHz typical boost clock. Texture unit count remains the same at 64.
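For a sense of scale, the clock figures above can be turned into percentages and a rough theoretical throughput. This is only a back-of-the-envelope sketch: the core count and clocks come from the text, while the factor of 2 FLOPs per CUDA core per clock (one fused multiply-add) is an assumption about peak single-precision math, and `pct_increase` and `peak_gflops` are hypothetical helper names.

```python
CUDA_CORES = 768      # unchanged between GTX 650 Ti and GTX 650 Ti BOOST
OLD_BASE_MHZ = 925    # GTX 650 Ti clock
NEW_BASE_MHZ = 980    # GTX 650 Ti BOOST base clock
BOOST_MHZ = 1033      # GTX 650 Ti BOOST typical boost clock

def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100

def peak_gflops(cores, mhz, flops_per_core_per_clock=2):
    """Theoretical peak GFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * flops_per_core_per_clock * mhz / 1000

base_uplift = pct_increase(OLD_BASE_MHZ, NEW_BASE_MHZ)
boost_uplift = pct_increase(OLD_BASE_MHZ, BOOST_MHZ)

print(f"Base clock uplift:  {base_uplift:.1f}%")
print(f"Boost clock uplift: {boost_uplift:.1f}%")
print(f"Peak at old clock:  {peak_gflops(CUDA_CORES, OLD_BASE_MHZ):.1f} GFLOPS")
print(f"Peak at boost:      {peak_gflops(CUDA_CORES, BOOST_MHZ):.1f} GFLOPS")
```

These are paper numbers only; real game performance also depends on memory bandwidth and ROP count, which is where the BOOST model differs more substantially.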
Subject: General Tech | March 25, 2013 - 01:30 PM | Jeremy Hellstrom
Tagged: bioshock infinite, geforce, GeForce 314.22, nvidia, gaming
BioShock Infinite launches tomorrow and promises to be an exciting third installment to the award-winning franchise.
GeForce gamers today can get ready for a great Day 1 experience with BioShock Infinite by upgrading to our new GeForce 314.22 Game Ready drivers. These drivers are Microsoft WHQL-certified and available for download on GeForce.com.
Our software engineers have been working with Irrational Games over the past two years to optimize BioShock Infinite for GeForce users and, as a result, these drivers offer game-changing performance increases of up to 41 percent.
Also, with a single click in GeForce Experience, gamers can optimize the image quality in BioShock Infinite and have it instantly tuned to the capability of their PC’s hardware.
GeForce 314.22 drivers also offer several other significant performance increases in other current games. For more details, refer to the release highlights on the driver download pages and read the GeForce driver article on GeForce.com.
GeForce 314.22 Highlights
Delivers GeForce Game Ready experience for BioShock Infinite:
- Up to 41% faster performance
- Optimal game settings with GeForce Experience
- Microsoft WHQL-certified
Increases gaming performance in other popular titles:
- Up to 60% faster in Tomb Raider
- Up to 23% faster in Sniper Elite V2
- Up to 13% faster in Sleeping Dogs
- Adds new SLI and 3D Vision profiles for upcoming games.
Subject: Editorial, General Tech, Processors, Shows and Expos | March 20, 2013 - 06:26 PM | Scott Michaud
Tagged: windows rt, nvidia, GTC 2013
NVIDIA develops processors, but without an x86 license they are only able to power ARM-based operating systems. When it comes to Windows, that means Windows Phone or Windows RT. The latter segment of the market has disappointing sales according to multiple OEMs, which Microsoft blames them for, but the jolly green GPU company is not crying doomsday.
NVIDIA just skimming the Surface RT, they hope.
As reported by The Verge, NVIDIA CEO Jen-Hsun Huang was optimistic that Microsoft would eventually let Windows RT blossom. He noted how Microsoft very often "gets it right" at some point when they push an initiative. And it is true, Microsoft has a history of turning around perceived disasters across a variety of devices.
They also have a history of, as they call it, "knifing the baby."
I think there is a very real fear for some that Microsoft could consider Intel's latest offerings as good enough to stop pursuing ARM. Of course, the more they pursue ARM, the more their business model will rely upon the-interface-formerly-known-as-Metro and likely all of its certification politics. As such, I think it is safe to say that I am watching the industry teeter on a fence with a bear on one side and a pack of rabid dogs on the other. On the one hand, Microsoft jumping back to Intel would allow them to perpetuate the desktop and all of the openness it provides. On the other hand, even if they stick with Intel they likely will just kill the desktop anyway, for the sake of user confusion and the security benefits of cert. We might just have fewer processor manufacturers when they do that.
So it could be that NVIDIA is confident that Microsoft will push Windows RT, or it could be that NVIDIA is pushing Microsoft to continue to develop Windows RT. Frankly, I do not know which would be better... or more accurately, worse.
Subject: General Tech, Graphics Cards | March 20, 2013 - 01:47 PM | Tim Verry
Tagged: tesla, tegra 3, supercomputer, pedraforca, nvidia, GTC 2013, GTC, graphics cards, data centers
There is a lot of talk about heterogeneous computing at GTC, in the sense of adding graphics cards to servers. If you have HPC workloads that can benefit from GPU parallelism, adding GPUs gives you computing performance in less physical space, and using less power, than a CPU only cluster (for equivalent TFLOPS).
However, there was a session at GTC that actually took things to the opposite extreme. Instead of a CPU only cluster or a mixed cluster, Alex Ramirez (leader of Heterogeneous Architectures Group at Barcelona Supercomputing Center) is proposing a homogeneous GPU cluster called Pedraforca.
Pedraforca V2 combines NVIDIA Tesla GPUs with low power ARM processors. Each node is comprised of the following components:
- 1 x Mini-ITX carrier board
- 1 x Q7 module (which hosts the ARM SoC and memory)
  - Current config is one Tegra 3 @ 1.3GHz and 2GB DDR2
- 1 x NVIDIA Tesla K20 accelerator card (1170 GFLOPS)
- 1 x InfiniBand 40Gb/s card (via Mellanox ConnectX-3 slot)
- 1 x 2.5" SSD (SATA 3 MLC, 250GB)
The ARM processor is used solely for booting the system and facilitating GPU communication between nodes. It is not intended to be used for computing. According to Dr. Ramirez, in situations where running code on a CPU would be faster, it would be best to have a small number of Intel Xeon powered nodes to do the CPU-favorable computing, and then offload the parallel workloads to the GPU cluster over the InfiniBand connection (though this is less than ideal, Pedraforca would be most-efficient with data-sets that can be processed solely on the Tesla cards).
While Pedraforca is not necessarily locked to NVIDIA's Tegra hardware, it is currently the only SoC that meets their needs. The system requires the ARM chip to have PCI-E support. The Tegra 3 SoC has four PCI-E lanes, so the carrier board is using two PLX chips to allow the Tesla and InfiniBand cards to both be connected.
The researcher stated that he is also looking forward to using NVIDIA's upcoming Logan processor in the Pedraforca cluster. It will reportedly be possible to upgrade existing Pedraforca clusters with the new chips by replacing the existing (Tegra 3) Q7 module with one that has the Logan SoC when it is released.
Pedraforca V2 has an initial cluster size of 64 nodes. While the speaker was reluctant to provide TFLOPS performance numbers, as it would depend on the workload, with 64 Tesla K20 cards, it should provide respectable performance. The intent of the cluster is to save power costs by using a low power CPU. If your server kernel and applications can run on GPUs alone, there are noticeable power savings to be had by switching from a ~100W Intel Xeon chip to a lower-power (approximately 2-3W) Tegra 3 processor. If you have a kernel that needs to run on a CPU, it is recommended to run the OS on an Intel server and transfer just the GPU work to the Pedraforca cluster. Each Pedraforca node is reportedly under 300W, with the Tesla card being the majority of that figure. Despite the limitations, and niche nature of the workloads and software necessary to get the full power-saving benefits, Pedraforca is certainly an interesting take on a homogeneous server cluster!
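The figures quoted above allow a rough paper estimate of the cluster. The Python sketch below only multiplies numbers from the article: it is not a measured result, the peak figure is a theoretical ceiling rather than the workload-dependent number the speaker declined to give, and comparing one ~100W Xeon per node against one Tegra 3 per node is an assumption made for illustration.

```python
NODES = 64              # initial cluster size
K20_GFLOPS = 1170       # per-card figure listed for the Tesla K20
NODE_WATTS_MAX = 300    # "under 300W" per node, treated as an upper bound
XEON_WATTS = 100        # "~100W" Intel Xeon figure
TEGRA3_WATTS = 3        # upper end of the "2-3W" Tegra 3 estimate

# Theoretical peak across all Tesla cards (a ceiling, not a benchmark).
peak_tflops = NODES * K20_GFLOPS / 1000

# Upper bound on total node power draw.
cluster_watts_max = NODES * NODE_WATTS_MAX

# Assumed comparison: one Xeon per node replaced by one Tegra 3 per node.
cpu_watts_saved = NODES * (XEON_WATTS - TEGRA3_WATTS)

print(f"Theoretical peak: {peak_tflops:.2f} TFLOPS")
print(f"Cluster power:    under {cluster_watts_max / 1000:.1f} kW")
print(f"CPU power saved:  about {cpu_watts_saved / 1000:.1f} kW")
```

Under these assumptions, the CPU swap alone is worth several kilowatts across 64 nodes, which is the power argument the session was making.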
In another session relating to the path to exascale computing, power use in data centers was listed as one of the biggest hurdles to getting to Exaflop-levels of performance, and while Pedraforca is not the answer to Exascale, it should at least be a useful learning experience at wringing the most parallelism out of code and pushing GPGPU to the limits. And that research will help other clusters use the GPUs more efficiently as researchers explore the future of computing.
The Pedraforca project built upon research conducted on Tibidabo, a multi-core ARM CPU cluster, and CARMA (CUDA on ARM development kit) which is a Tegra SoC paired with an NVIDIA Quadro card. The two slides below show CARMA benchmarks and a Tibidabo cluster (click on image for larger version).
Stay tuned to PC Perspective for more GTC 2013 coverage!
Subject: General Tech, Graphics Cards | March 19, 2013 - 06:52 PM | Tim Verry
Tagged: GTC 2013, tyan, HPC, servers, tesla, kepler, nvidia
Server platform manufacturer TYAN is showing off several of its latest servers aimed at the high performance computing (HPC) market. The new servers range in size from 2U to 4U chassis and hold up to 8 Kepler-based Tesla accelerator cards. The new product lineup consists of two motherboards and three bare-bones systems. The S7055 and S7056 are the motherboards while the FT77-B7059, TA77-B7061, and FT48-B7055 are the bare-bones systems.
The TA77-B7061 is the smallest system, with support for two Intel Xeon E5-2600 processors and four Kepler-based Tesla accelerator cards. The FT48-B7055 has similar specifications but is housed in a 4U chassis. Finally, the FT77-B7059 is a 4U system with support for two Intel Xeon E5-2600 processors, and up to eight Tesla accelerator cards. The S7055 supports a maximum of 4 GPUs while the S7056 can support two Tesla cards, though these are bare boards so you will have to supply your own cards, processors, and RAM (of course).
According to TYAN, the new Kepler-based HPC systems will be available in Q2 2013, though there is no word on pricing yet.
Stay tuned to PC Perspective for further GTC 2013 Coverage!
Subject: General Tech, Graphics Cards | March 19, 2013 - 02:55 PM | Tim Verry
Tagged: unified virtual memory, ray tracing, nvidia, GTC 2013, grid vca, grid, graphics cards
Today, NVIDIA's CEO Jen-Hsun Huang stepped on stage to present the GTC keynote. In the presentation (which was live streamed on the GTC website and archived here), NVIDIA discussed five major points, looking back over 2013 and into the future of its mobile and professional products. In addition to the product roadmap, NVIDIA discussed the state of computer graphics and GPGPU software. Remote graphics and GPU virtualization were also on tap. Finally, towards the end of the keynote, the company revealed its first appliance with the NVIDIA GRID VCA. The culmination of NVIDIA's GRID and GPU virtualization technology, the VCA is a device that hosts up to 16 virtual machines, each of which can tap into one of 16 Kepler-based graphics processors (8 cards, 2 GPUs per card) to fully hardware accelerate software running on the VCA. Three new mobile Tegra parts and two new desktop graphics processors were also hinted at, with improvements to power efficiency and performance.
On the desktop side of things, NVIDIA's roadmap included two new GPUs. Following Kepler, NVIDIA will introduce Maxwell and Volta. Maxwell will feature a new virtualized memory technology called Unified Virtual Memory. This tech will allow both the CPU and GPU to read from a single (virtual) memory store. Much as with the promise of AMD's Kaveri APU, the Unified Virtual Memory will result in speed improvements in heterogeneous applications because data will not have to be copied to/from the GPU and CPU in order for the data to be processed. Server applications will really benefit from the shared memory tech. NVIDIA did not provide details, but from the sound of it, the CPU and GPU both continue to write to their own physical memory, but there is a layer of virtualized memory on top of that which will allow the two (or more) different processors to read from each other's memory store.
Following Maxwell, Volta will be a physically smaller chip with more transistors (likely a smaller process node). In addition to the power efficiency improvements over Maxwell, it steps up the memory bandwidth significantly. NVIDIA will use TSV (through silicon via) technology to physically mount the graphics DRAM chips over the GPU (attached to the same silicon substrate electrically). According to NVIDIA, this new TSV-mounted memory will achieve up to 1 terabyte/second of memory bandwidth, which is a notable increase over existing GPUs.
NVIDIA continues to pursue the mobile market with its line of Tegra chips that pair an ARM CPU, NVIDIA GPU, and SDR modem. Two new mobile chips called Logan and Parker will follow Tegra 4. Both new chips will support the full CUDA 5 stack and OpenGL 4.3 out of the box. Logan will feature a Kepler-based graphics processor on the chip that can do “everything a modern computer ought to do” according to NVIDIA. Parker will have a yet-to-be-revealed graphics processor (Kepler successor). This mobile chip will utilize 3D FinFET transistors. It will have a greater number of transistors in a smaller package than previous Tegra parts (it will be about the size of a dime), and NVIDIA also plans to ramp up the frequency to wrangle more performance out of the mobile chip. NVIDIA has stated that Logan silicon should be completed towards the end of 2013, with the mobile chips entering production in 2014.
Interestingly, Logan has a sister chip that NVIDIA is calling Kayla. This mobile chip is capable of running ray tracing applications and features OpenGL geometric shaders. It can support GPGPU code and will be compatible with Linux.
NVIDIA has been pushing CUDA for several years now. The company has seen some respectable adoption rates, growing from 1 Tesla supercomputer in 2008 to its graphics cards being used in 50 supercomputers, with 500 million CUDA processors on the market. There are now allegedly 640 universities working with CUDA and 37,000 academic papers on CUDA.
Finally, NVIDIA's hinted-at new product announcement was the NVIDIA VCA, which is a GPU virtualization appliance that hooks into the network and can deliver up to 16 virtual machines running independent applications. These GPU accelerated workspaces can be presented to thin clients over the network by installing the GRID client software on users' workstations. The specifications of the GRID VCA are rather impressive as well.
The GRID VCA features:
- 2 x Intel Xeon processors with 16 threads each (32 total threads)
- 192GB to 384GB of system memory
- 8 Kepler-based graphics cards, with two GPUs each (16 total GPUs)
- 16 x GPU-accelerated virtual machines
The GRID VCA fits into a 4U case. It can deliver remote graphics to workstations, and is allegedly fast enough to deliver GPU-accelerated software that is equivalent to having it run on the local machine (at least over LAN). The GRID Visual Computing Appliance will come in two flavors at different price points. The first will have 8 Kepler GPUs with 4GB of memory each, 16 CPU threads, and 192GB of system memory for $24,900. The other version will cost $34,900 and features 16 Kepler GPUs (4GB memory), 32 CPU threads, and 384GB system memory. On top of the hardware cost, NVIDIA is also charging licensing fees. While both GRID VCA devices can support an unlimited number of devices, the licenses cost $2,400 and $4,800 per year respectively.
Overall, it was an interesting keynote, and the proposed graphics cards look to be offering up some unique and necessary features that should help hasten the day of ubiquitous general purpose GPU computing. The Unified Virtual Memory was something I was not expecting, and it will be interesting to see how AMD responds. AMD is already promising shared memory in its Kaveri APU, but I am interested to see the details of how NVIDIA and AMD will accomplish shared memory with dedicated graphics cards (and whether CrossFire/SLI setups will all have a single shared memory pool).
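Because the license is a recurring fee, the two models are easier to compare over a fixed ownership period. The Python sketch below uses only the prices quoted above; the three-year period is an arbitrary assumption, and `total_cost` is a hypothetical helper rather than anything NVIDIA publishes.

```python
def total_cost(hardware_usd, license_usd_per_year, years):
    """Hardware price plus recurring license fees over the given period."""
    return hardware_usd + license_usd_per_year * years

# Prices and GPU counts as quoted in the article.
configs = {
    "8-GPU": {"hardware": 24_900, "license": 2_400, "gpus": 8},
    "16-GPU": {"hardware": 34_900, "license": 4_800, "gpus": 16},
}

YEARS = 3  # assumed ownership period, chosen only for illustration

for name, cfg in configs.items():
    cost = total_cost(cfg["hardware"], cfg["license"], YEARS)
    per_gpu = cost / cfg["gpus"]
    print(f"{name}: ${cost:,} over {YEARS} years (${per_gpu:,.2f} per GPU)")
```

Over this assumed period the larger model costs more in total but noticeably less per GPU, since the hardware price does not double along with the GPU count.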
Stay tuned to PC Perspective for more GTC 2013 Coverage!