Manufacturer: Various

Background and setup

A couple of weeks back, during the excitement surrounding the announcement of the GeForce GTX 1080 Ti graphics card, NVIDIA announced an update to its performance reporting project, FCAT, to support VR gaming. The updated iteration, now called FCAT VR, gives us the first true ability to capture the performance of VR games and experiences, along with the tools to measure and compare them.

Watch this video walkthrough of FCAT VR with me and NVIDIA's Tom Petersen

I already wrote an extensive preview of the tool and how it works during the announcement. I think it’s likely that many of you overlooked it with the noise from a new GPU, so I’m going to reproduce some of it here, with additions and updates. Everyone that attempts to understand the data we will be presenting in this story and all VR-based tests going forward should have a baseline understanding of the complexity of measuring VR games. Previous tools don’t tell the whole story, and even the part they do tell is often incomplete.

If you already know how FCAT VR works from reading the previous article, you can jump right to the beginning of our results here.

Measuring and validating vendors' VR performance claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don't apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, utilized by countless reviewers and enthusiasts, but it lacks the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we could communicate the smoothness of a gaming experience, better articulating it to help gamers make purchasing decisions.
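
To make the capture-and-analyze idea concrete, here is a minimal, hypothetical sketch of the post-processing half of that workflow: it takes per-frame on-screen timestamps (the kind recovered from captured video) and derives frame times, an average, and a worst case. The numbers and structure are purely illustrative; this is not the actual Frame Rating or FCAT tooling.

```c
/* Hypothetical sketch of the analysis step described above: turn per-frame
 * on-screen timestamps (milliseconds) into frame times, then report the
 * average and the worst case. Illustrative only; this is not the actual
 * Frame Rating / FCAT tooling, and the sample numbers are made up. */
#include <stdio.h>

int main(void)
{
    /* Timestamps at which each captured frame first appeared on screen. */
    double stamps_ms[] = { 0.0, 16.6, 33.4, 50.1, 83.3, 100.0, 116.7 };
    int n = sizeof(stamps_ms) / sizeof(stamps_ms[0]);

    double worst = 0.0;
    for (int i = 1; i < n; i++) {
        double ft = stamps_ms[i] - stamps_ms[i - 1];   /* frame time */
        if (ft > worst)
            worst = ft;
    }

    /* A healthy-looking average hides the 33 ms hitch; the worst frame time shows it. */
    double avg = (stamps_ms[n - 1] - stamps_ms[0]) / (n - 1);
    printf("average frame time: %.1f ms (%.1f FPS)\n", avg, 1000.0 / avg);
    printf("worst frame time:   %.1f ms\n", worst);
    return 0;
}
```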

vrpipe1.png

For VR though, those same tools just don't cut it. Fraps is a non-starter as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we had used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to "latch in" to VR-based games. Not only that, but measuring frame drops, time warps, space warps and reprojections would be a significant hurdle without further development.

vrpipe2.png

vrpipe3.png

NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping that it will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and freely available to the media and the public should not be overlooked.

vlcsnap-2017-02-27-11h31m17s057.png

NVIDIA FCAT VR is composed of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It uses data from Oculus event tracing (part of Windows ETW) and SteamVR's performance API, along with NVIDIA driver stats when run on NVIDIA hardware, to generate performance data. It works perfectly well on any GPU vendor's hardware, though, thanks to its access to the VR vendors' own timing results.

fcatvrcapture.jpg

Continue reading our first look at VR performance testing with FCAT VR!!

Manufacturer: NVIDIA

Flagship Performance Gets Cheaper

UPDATE! If you missed our launch day live stream, you can find the replay below:

It's a very interesting time in the world of PC gaming hardware. We just saw the release of AMD's Ryzen processor platform, which shook up the processor market for the first time in a decade; AMD's Vega architecture has officially been given the "Vega" brand name; and anticipation for the first competitive high-end part from AMD since Hawaii continues to grow. AMD was seemingly able to take advantage of Intel's slow pace of innovation on the processor side, and it is hoping to do the same to NVIDIA on the GPU side. NVIDIA's product line has been dominant in the mid-range and high-end gaming market since the 900-series, with the 10-series products further cementing that lead.

box1.jpg

The most recent high-end graphics card release came in the form of the updated Titan X based on the Pascal architecture. That was WAY back in August of 2016 – a full seven months ago! Since then we have seen very little change at the top end of the product lines, and what little change we did see came from board vendors adding technology and variations to the GTX 10-series.

Today we see the release of the new GeForce GTX 1080 Ti, a card that offers only a handful of noteworthy technological changes but still manages to shake up the market by instigating pricing adjustments: it makes flagship performance more appealing and lowers the price of everything else.

The GTX 1080 Ti GP102 GPU

I already wrote about the specifications of the GPU in the GTX 1080 Ti when it was announced last week, so here’s a simple recap.

  GTX 1080 Ti Titan X (Pascal) GTX 1080 GTX 980 Ti TITAN X GTX 980 R9 Fury X R9 Fury R9 Nano
GPU GP102 GP102 GP104 GM200 GM200 GM204 Fiji XT Fiji Pro Fiji XT
GPU Cores 3584 3584 2560 2816 3072 2048 4096 3584 4096
Base Clock 1480 MHz 1417 MHz 1607 MHz 1000 MHz 1000 MHz 1126 MHz 1050 MHz 1000 MHz up to 1000 MHz
Boost Clock 1600 MHz 1480 MHz 1733 MHz 1076 MHz 1089 MHz 1216 MHz - - -
Texture Units 224 224 160 176 192 128 256 224 256
ROP Units 88 96 64 96 96 64 64 64 64
Memory 11GB 12GB 8GB 6GB 12GB 4GB 4GB 4GB 4GB
Memory Clock 11000 MHz 10000 MHz 10000 MHz 7000 MHz 7000 MHz 7000 MHz 500 MHz 500 MHz 500 MHz
Memory Interface 352-bit 384-bit G5X 256-bit G5X 384-bit 384-bit 256-bit 4096-bit (HBM) 4096-bit (HBM) 4096-bit (HBM)
Memory Bandwidth 484 GB/s 480 GB/s 320 GB/s 336 GB/s 336 GB/s 224 GB/s 512 GB/s 512 GB/s 512 GB/s
TDP 250 watts 250 watts 180 watts 250 watts 250 watts 165 watts 275 watts 275 watts 175 watts
Peak Compute 10.6 TFLOPS 10.1 TFLOPS 8.2 TFLOPS 5.63 TFLOPS 6.14 TFLOPS 4.61 TFLOPS 8.60 TFLOPS 7.20 TFLOPS 8.19 TFLOPS
Transistor Count 12.0B 12.0B 7.2B 8.0B 8.0B 5.2B 8.9B 8.9B 8.9B
Process Tech 16nm 16nm 16nm 28nm 28nm 28nm 28nm 28nm 28nm
MSRP (current) $699 $1,200 $599 $649 $999 $499 $649 $549 $499

The GTX 1080 Ti looks a whole lot like the TITAN X launched in August of last year. Based on the 12B-transistor GP102 chip, the new GTX 1080 Ti has 3,584 CUDA cores with a 1.60 GHz Boost clock. That gives it the same processor count as the Titan X but with a slightly higher clock speed, which should make the new GTX 1080 Ti faster by at least a few percentage points; it holds a 4.7% edge in base-clock compute capability. It has 28 SMs, 28 geometry units and 224 texture units.

GeForce_GTX_1080_Ti_Block_Diagram.png

Interestingly, the memory system on the GTX 1080 Ti gets adjusted – NVIDIA has disabled a single 32-bit memory controller, giving the card a 352-bit wide bus and an odd-sounding 11GB memory capacity. The ROP count also drops to 88 units. Speaking of 11, the memory clock on the G5X implementation on the GTX 1080 Ti will now run at 11 Gbps, a boost available to NVIDIA thanks to a chip revision from Micron and improvements in equalization that help reverse signal distortion.
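
The headline numbers above fall out of simple arithmetic; here is a quick sanity check (assuming, as NVIDIA's 10.6 TFLOPS rating appears to, that the compute figure uses the base clock and counts two FP32 operations per core per clock).

```c
/* Quick sanity check on the GTX 1080 Ti's headline numbers from the table. */
#include <stdio.h>

int main(void)
{
    int    cuda_cores = 3584;
    double base_ghz   = 1.48;    /* base clock, GHz        */
    int    bus_bits   = 352;     /* memory bus width, bits */
    double mem_gbps   = 11.0;    /* per-pin data rate      */

    /* Two FP32 operations per core per clock (fused multiply-add). */
    double tflops = cuda_cores * 2.0 * base_ghz / 1000.0;

    /* Bus width in bytes times the per-pin rate. */
    double bandwidth = bus_bits / 8.0 * mem_gbps;

    printf("FP32 compute:     %.1f TFLOPS\n", tflops);    /* ~10.6 */
    printf("Memory bandwidth: %.0f GB/s\n", bandwidth);   /* 484   */
    return 0;
}
```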

The move from 12GB of memory on the GP102-based Titan X to 11GB on the GTX 1080 Ti is an interesting one, and it evokes memories of the GTX 970 fiasco, where NVIDIA disabled a portion of that card's memory subsystem but left the memory that would have resided on it on the board. The result, a card where 3.5GB of memory behaved at one speed and 500MB at another, was the wrong move to make. But releasing the GTX 970 with "3.5GB" of memory would have seemed odd too. NVIDIA is not making the same mistake here, instead building the GTX 1080 Ti with 11GB out of the gate.

Continue reading our review of the NVIDIA GeForce GTX 1080 Ti 11GB graphics card!

Linked Multi-GPU Arrives... for Developers

The Khronos Group has released the Vulkan 1.0.42.0 specification, which includes experimental (more on that in a couple of paragraphs) support for VR enhancements, sharing resources between processes, and linking similar GPUs. This spec was released alongside a LunarG SDK and NVIDIA developer drivers (intended for developers, not gamers) that fully implement these extensions.

I would expect that the most interesting feature is experimental support for linking similar GPUs together, similar to DirectX 12’s Explicit Linked Multiadapter, which Vulkan calls a “Device Group”. The idea is that the physical GPUs hidden behind this layer can do things like share resources, such as rendering a texture on one GPU and consuming it in another, without the host code being involved. I’m guessing that some studios, like maybe Oxide Games, will decide to not use this feature. While it’s not explicitly stated, I cannot see how this (or DirectX 12’s Explicit Linked mode) would be compatible in cross-vendor modes. Unless I’m mistaken, that would require AMD, NVIDIA, and/or Intel restructuring their drivers to inter-operate at this level. Still, the assumptions that could be made with grouped devices are apparently popular with enough developers for both the Khronos Group and Microsoft to bother.
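
To show what that looks like from the host side, here is a hedged sketch of enumerating a device group and creating one logical device that spans it. The names are shown in the form these features later took in core Vulkan 1.1; in this release they carried KHX suffixes (roughly vkEnumeratePhysicalDeviceGroupsKHX and friends), so treat the identifiers as approximate.

```c
/* Hedged sketch of the "device group" idea: enumerate linked GPUs and create
 * one logical device that spans them. Names are shown in their later Vulkan
 * 1.1 core form; at the time of this release they carried KHX suffixes.
 * Queue setup and error handling are omitted. */
#include <vulkan/vulkan.h>
#include <stdio.h>

void create_linked_device(VkInstance instance)
{
    uint32_t group_count = 0;
    vkEnumeratePhysicalDeviceGroups(instance, &group_count, NULL);
    if (group_count > 8) group_count = 8;

    VkPhysicalDeviceGroupProperties groups[8];
    for (uint32_t i = 0; i < group_count; i++) {
        groups[i].sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES;
        groups[i].pNext = NULL;
    }
    vkEnumeratePhysicalDeviceGroups(instance, &group_count, groups);

    for (uint32_t i = 0; i < group_count; i++) {
        if (groups[i].physicalDeviceCount < 2)
            continue;   /* only interested in actual linked groups */
        printf("group %u links %u GPUs\n", i, groups[i].physicalDeviceCount);

        /* Chaining this struct makes the single VkDevice span every GPU in
         * the group, so resources can be shared without host round-trips. */
        VkDeviceGroupDeviceCreateInfo group_info = {
            .sType = VK_STRUCTURE_TYPE_DEVICE_GROUP_DEVICE_CREATE_INFO,
            .physicalDeviceCount = groups[i].physicalDeviceCount,
            .pPhysicalDevices = groups[i].physicalDevices,
        };
        VkDeviceCreateInfo device_info = {
            .sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
            .pNext = &group_info,
            /* queue create info omitted for brevity */
        };
        VkDevice device = VK_NULL_HANDLE;
        vkCreateDevice(groups[i].physicalDevices[0], &device_info, NULL, &device);
        break;
    }
}
```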

microsoft-dx12-build15-linked.png

A slide from Microsoft's DirectX 12 reveal, long ago.

As for the “experimental” comment that I made in the introduction... I was expecting to see this news around SIGGRAPH, which occurs in late-July / early-August, alongside a minor version bump (to Vulkan 1.1).

I might still be right, though.

The major new features of Vulkan 1.0.42.0 are implemented as a new classification of extensions: KHX. In the past, vendors like NVIDIA and AMD would add new features as vendor-prefixed extensions. Games could query the graphics driver for these abilities and enable them if available. If a feature became popular enough for multiple vendors to have their own implementation of it, a committee would consider an EXT extension. This would behave the same across all implementations (give or take) but not be officially adopted by the Khronos Group. If they did take it under their wing, it would be given a KHR extension (or added as a required feature).

The Khronos Group has added a new layer: KHX. This level of extension sits below KHR, and is not intended for production code. You might see where this is headed. The VR multiview, multi-GPU, and cross-process extensions are not supposed to be used in released video games until they leave KHX status. Unlike a vendor extension, the Khronos Group wants old KHX standards to drop out of existence at some point after they graduate to full KHR status. It's not something that NVIDIA owns and will keep around for 20 years after its usable lifespan just so old games can behave as expected.
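
In practice, opting into one of these provisional features looks like any other extension check. The sketch below scans a device's extension list for KHX entries; the specific extension string used in the usage comment is my assumption of the KHX-era spelling, not something confirmed by the spec text quoted here.

```c
/* Minimal sketch: check a physical device for KHX-class extensions before
 * opting in. The specific extension string in the usage comment is assumed
 * from the KHX-era spec and may differ from what your SDK headers define. */
#include <vulkan/vulkan.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int has_extension(VkPhysicalDevice gpu, const char *wanted)
{
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(gpu, NULL, &count, NULL);

    VkExtensionProperties *props = calloc(count, sizeof(*props));
    vkEnumerateDeviceExtensionProperties(gpu, NULL, &count, props);

    int found = 0;
    for (uint32_t i = 0; i < count; i++) {
        if (strstr(props[i].extensionName, "KHX") != NULL)
            printf("experimental extension available: %s\n", props[i].extensionName);
        if (strcmp(props[i].extensionName, wanted) == 0)
            found = 1;
    }
    free(props);
    return found;
}

/* Usage (assumed extension name): has_extension(gpu, "VK_KHX_device_group"); */
```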

khronos-group-logo.png

How long will that take? No idea. I’ve already mentioned my logical but uneducated guess a few paragraphs ago, but I’m not going to repeat it; I have literally zero facts to base it on, and I don’t want our readers to think that I do. I don’t. It’s just based on what the Khronos Group typically announces at certain trade shows, and the length of time since their first announcement.

The benefit that KHX does bring us is that, whenever these features make it to public release, developers will have already been using it... internally... since around now. When it hits KHR, it’s done, and anyone can theoretically be ready for it when that time comes.

Manufacturer: NVIDIA

VR Performance Evaluation

Even though virtual reality hasn't taken off with the momentum that many in the industry had expected on the heels of the HTC Vive and Oculus Rift launches last year, it remains one of the fastest growing aspects of PC hardware. More importantly for many, VR is also one of the key inflection points for performance moving forward; it demands more hardware power, scalability, and innovation than any other sub-category, including 4K gaming. As such, NVIDIA, AMD, and even Intel continue to push the performance benefits of their own hardware and technology.

Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don't apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, utilized by countless reviewers and enthusiasts. But Fraps lacks the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we were able to communicate the smoothness of a gaming experience, better articulating it to help gamers make purchasing decisions.

pipe1.jpg

VR pipeline when everything is working well.

For VR though, those same tools just don't cut it. Fraps is a non-starter as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we had used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to "latch in" to VR-based games. Not only that, but measuring frame drops, time warps, space warps and reprojections would be a significant hurdle without further development.
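
To illustrate what the new tooling has to account for (and what Fraps simply cannot see), below is a hypothetical sketch that estimates, from application frame-completion times alone, how many 90 Hz refresh intervals were covered by a new frame versus by the runtime stepping in. FCAT VR derives its real classification from runtime and driver events rather than this kind of guesswork; the numbers and threshold here are invented for illustration.

```c
/* Hypothetical sketch: estimate how many 90 Hz refresh intervals received a
 * new application frame versus needing the runtime to cover (reproject or
 * drop). Real FCAT VR data comes from runtime/driver events (Oculus ETW,
 * SteamVR timing API); this simplified model is for illustration only. */
#include <stdio.h>

#define REFRESH_MS 11.11   /* 90 Hz headset refresh interval */

int main(void)
{
    /* App frame-completion timestamps (ms); a gap of more than one interval
     * means the runtime had to cover a refresh without a new frame. */
    double app_done_ms[] = { 5.0, 16.0, 27.0, 49.5, 60.5, 71.5 };
    int frames = sizeof(app_done_ms) / sizeof(app_done_ms[0]);

    int delivered = 0, covered = 0;
    for (int i = 1; i < frames; i++) {
        double gap = app_done_ms[i] - app_done_ms[i - 1];
        int refreshes = (int)(gap / REFRESH_MS + 0.5);  /* refreshes spanned */
        delivered += 1;                   /* one real frame arrived           */
        if (refreshes > 1)
            covered += refreshes - 1;     /* runtime reprojected or dropped   */
    }

    printf("new frames delivered: %d\n", delivered);
    printf("refreshes covered by reprojection/drops: %d\n", covered);
    return 0;
}
```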

pipe2.jpg

VR pipeline with a frame miss.

NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping that it will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and freely available to the media and the public should not be overlooked.

vlcsnap-2017-02-27-11h31m17s057.png

NVIDIA FCAT VR is composed of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It uses data from Oculus event tracing (part of Windows ETW) and SteamVR's performance API, along with NVIDIA driver stats when run on NVIDIA hardware, to generate performance data. It works perfectly well on any GPU vendor's hardware, though, thanks to its access to the VR vendors' own timing results.

fcatvrcapture.jpg

Continue reading our preview of the new FCAT VR tool!

Manufacturer: PC Perspective

Living Long and Prospering

The open fork of AMD's Mantle, the Vulkan API, was released exactly a year ago with, as we reported, a hard launch. This meant public (though not main-branch) drivers for developers, a few public SDKs, a proof-of-concept patch for The Talos Principle, and, of course, the ratified specification. This set the API up to find success right out of the gate, and we can now look back over the year since.

khronos-2017-vulkan-alt-logo.png

Thor's hammer, or a tempest in a teapot?

The elephant in the room is DOOM. This game has successfully integrated the API and it uses many of its more interesting features, like asynchronous compute. Because the API is designed in a sort-of “make a command, drop it on a list” paradigm, the driver is able to select commands based on priority and available resources. AMD’s products got a significant performance boost, relative to OpenGL, catapulting their Fury X GPU up to the enthusiast level that its theoretical performance suggested.

Mobile developers have been picking up the API, too. Google, who is known for banishing OpenCL from their Nexus line and challenging OpenGL ES with their Android Extension Pack (later integrated into OpenGL ES with version 3.2), has strongly backed Vulkan. The API was integrated as a core feature of Android 7.0.

On the engine and middleware side of things, Vulkan is currently "ready for shipping games" as of Unreal Engine 4.14. It is also included in Unity 5.6 Beta, which is expected for full release in March. Frameworks for emulators are also integrating Vulkan, often just to say they did, but sometimes to emulate the quirks of these systems' offbeat graphics co-processors. Many other engines, from Source 2 to Torque 3D, have also announced or added Vulkan support.

Finally, for the API itself, The Khronos Group announced (pg 22 from SIGGRAPH 2016) areas that they are actively working on. The top feature is "better" multi-GPU support. While Vulkan, like OpenCL, allows developers to enumerate all graphics devices and target them individually with work, it doesn't have certain mechanisms, like being able to directly ingest output from one GPU into another. They haven't announced a timeline for this.

Manufacturer: EVGA

The new EVGA GTX 1080 FTW2 with iCX Technology

Back in November of 2016, EVGA had a problem on its hands. The company had a batch of GTX 10-series graphics cards using the new ACX 3.0 cooler leave the warehouse missing thermal pads required to keep the power management hardware on those cards within reasonable temperature margins. To its credit, the company took the oversight seriously and offered consumers a set of solutions to select from: an RMA, a new VBIOS to increase fan speeds, or thermal pads to install on the hardware manually. Still, as is the case with any kind of product quality lapse like that, there were (and are) lingering questions about EVGA's ability to deliver reliable product with features and new options that don't compromise the basics.

Internally, the drive to correct these lapses was…strong. From the very top of the food chain on down, it was hammered home that something like this simply couldn’t occur again, and even more so, EVGA was to develop and showcase a new feature set and product lineup demonstrating its ability to innovate. Thus was born, and accelerated, the EVGA iCX Technology infrastructure. While this was something in the pipeline for some time already, it was moved up to counter any negative bias that might have formed for EVGA’s graphics cards over the last several months. The goal was simple: prove that EVGA was the leader in graphics card design and prove that EVGA has learned from previous mistakes.

EVGA iCX Technology

Previous issues aside, the creation of iCX Technology is built around one simple question: is one GPU temperature sensor enough? For nearly all of today's graphics cards, cooling is based on the temperature of the GPU silicon itself, as measured by the sensor NVIDIA integrates into the GPU (the case for all of EVGA's cards). This is how fan curves are built, how GPU clock speeds are handled with GPU Boost, how noise profiles are created, and more. But as process technology has improved, and GPU design has tilted toward power efficiency, the GPU itself is often no longer the thermally limiting factor.

slides05.jpg

As it turns out, converting 12V (from the power supply) to ~1V (necessary for the GPU) is a process that creates a lot of excess heat. The thermal images above clearly demonstrate that, and EVGA isn't the only card vendor to take notice. EVGA's product issue from last year was related to this very point: the fans were only spinning fast enough to keep the GPU cool and did not take into account the temperature of the memory or the power delivery hardware.

The fix from EVGA is to ratchet up the number of sensors on the card PCB and wrap them with intelligence in the form of MCUs, updated Precision XOC software and user viewable LEDs on the card itself.

slides10.jpg

EVGA graphics cards with iCX Technology will include 9 total thermal sensors on the board, independent of the GPU temperature sensor directly integrated by NVIDIA. There are three sensors for memory, five for power delivery and an additional sensor for the GPU temperature. Some are located on the back of the PCB to avoid any conflicts with trace routing between critical components, including the secondary GPU sensor.
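
The practical payoff of having multiple sensor groups is that fan control can key off whichever zone is hottest rather than the GPU alone. The sketch below illustrates that idea in the simplest possible terms; the zone names, temperatures, and curve points are hypothetical and are not EVGA's actual firmware logic.

```c
/* Hypothetical illustration of multi-sensor fan control: each zone reports a
 * temperature, and the fan duty follows the hottest zone through a simple
 * linear curve. Not EVGA's actual firmware; thresholds are invented. */
#include <stdio.h>

struct zone { const char *name; double temp_c; };

/* Map a temperature to a duty cycle between idle (30%) and full speed. */
static int duty_for_temp(double t)
{
    const double t_low = 40.0, t_high = 85.0;
    if (t <= t_low)  return 30;
    if (t >= t_high) return 100;
    return (int)(30 + (t - t_low) / (t_high - t_low) * 70.0);
}

int main(void)
{
    struct zone zones[] = {
        { "GPU",    62.0 },
        { "Memory", 71.0 },
        { "VRM",    83.0 },   /* power delivery running hot */
    };
    int n = sizeof(zones) / sizeof(zones[0]);

    double hottest = zones[0].temp_c;
    for (int i = 1; i < n; i++)
        if (zones[i].temp_c > hottest)
            hottest = zones[i].temp_c;

    /* A GPU-only curve would settle around 64% duty here; the VRM reading pushes it higher. */
    printf("GPU-only duty:   %d%%\n", duty_for_temp(zones[0].temp_c));
    printf("Multi-zone duty: %d%%\n", duty_for_temp(hottest));
    return 0;
}
```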

Continue reading about EVGA iCX Technology!

Manufacturer: NVIDIA

NVIDIA P100 comes to Quadro

At the start of the SOLIDWORKS World conference this week, NVIDIA took the cover off of a handful of new Quadro cards targeting professional graphics workloads. Though the bulk of NVIDIA’s discussion covered lower cost options like the Quadro P4000, P2000, and below, the most interesting product sits at the high end, the Quadro GP100.

As you might guess from the name alone, the Quadro GP100 is based on the GP100 GPU, the same silicon used in the Tesla P100 announced back in April of 2016. At the time, the GP100 GPU was specifically billed as an HPC accelerator for servers. It had a unique form factor with a passive cooler that required additional chassis fans. Just a couple of months later, a PCIe version was released under the same Tesla P100 brand with the same specifications.

quadro2017-2.jpg

Today that GPU hardware gets a third iteration as the Quadro GP100. Let’s take a look at the Quadro GP100 specifications and how it compares to some recent Quadro offerings.

  Quadro GP100 Quadro P6000 Quadro M6000 Full GP100
GPU GP100 GP102 GM200 GP100 (Pascal)
SMs 56 60 48 60
TPCs 28 30 24 (30?)
FP32 CUDA Cores / SM 64 64 64 64
FP32 CUDA Cores / GPU 3584 3840 3072 3840
FP64 CUDA Cores / SM 32 2 2 32
FP64 CUDA Cores / GPU 1792 120 96 1920
Base Clock 1303 MHz 1417 MHz 1026 MHz TBD
GPU Boost Clock 1442 MHz 1530 MHz 1152 MHz TBD
FP32 TFLOPS (SP) 10.3 12.0 7.0 TBD
FP64 TFLOPS (DP) 5.15 0.375 0.221 TBD
Texture Units 224 240 192 240
ROPs 128? 96 96 128?
Memory Interface 4096-bit HBM2 @ 1.4 Gbps 384-bit GDDR5X @ 9 Gbps 384-bit GDDR5 @ 6.6 Gbps 4096-bit HBM2
Memory Bandwidth 716 GB/s 432 GB/s 316.8 GB/s ?
Memory Size 16GB 24 GB 12GB 16GB
TDP 235 W 250 W 250 W TBD
Transistors 15.3 billion 12 billion 8 billion 15.3 billion
GPU Die Size 610mm2 471 mm2 601 mm2 610mm2
Manufacturing Process 16nm 16nm 28nm 16nm

There are some interesting stats here that may not be obvious at first glance. Most interesting is that, despite the pricing and segmentation, the GP100 is not automatically the fastest Quadro card from NVIDIA; it depends on your workload. With 3584 CUDA cores running at somewhere around 1400 MHz at Boost speeds, the single precision (32-bit) rating for the GP100 is 10.3 TFLOPS, less than the recently released P6000. Based on GP102, the P6000 has 3840 CUDA cores running at around 1500 MHz for a total of 12 TFLOPS.

gp102-blockdiagram.jpg

GP100 (full) Block Diagram

Clearly the placement of the Quadro GP100 is based around its 64-bit, double precision performance and its ability to offer real-time simulations on more complex workloads than other Pascal-based Quadro cards can handle. The Quadro GP100 offers a 1/2-rate DP compute ratio, totaling 5.2 TFLOPS. The P6000, on the other hand, is only capable of 0.375 TFLOPS with the standard, consumer-level 1/32 DP rate. Inclusion of ECC memory support on the GP100 is also something no other recent Quadro card has.

quadro2017-3.jpg

Raw graphics performance and throughput remain open questions until someone does some testing, but it seems likely that the Quadro P6000 will still be the best solution there, by at least a slim margin. With a higher CUDA core count, higher clock speeds and an equivalent architecture, the P6000 should run games, graphics rendering and design applications very well.

There are other important differences offered by the GP100. The memory system is built around a 16GB HBM2 implementation, which means more total memory bandwidth but at a lower capacity than the 24GB Quadro P6000. Offering 66% more memory bandwidth does give the GP100 an advantage in applications that are bound by pixel throughput, as long as the compute capability keeps up on the back end.
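
The comparison reduces to straightforward arithmetic; the quick check below reproduces the compute, bandwidth, and "66% more" figures from the specification table, under the assumption that peak FP32 equals cores times two operations per clock at the boost clock.

```c
/* Sanity-checking the Quadro GP100 vs. P6000 numbers from the table above. */
#include <stdio.h>

int main(void)
{
    /* Quadro GP100: 3584 FP32 cores at ~1.44 GHz boost, 1/2-rate FP64,
     * 4096-bit HBM2 at 1.4 Gbps per pin. */
    double gp100_fp32 = 3584 * 2.0 * 1.442 / 1000.0;     /* ~10.3 TFLOPS */
    double gp100_fp64 = gp100_fp32 / 2.0;                /* ~5.2 TFLOPS  */
    double gp100_bw   = 4096 / 8.0 * 1.4;                /* ~717 GB/s    */

    /* Quadro P6000: 3840 cores at ~1.53 GHz boost, 1/32-rate FP64,
     * 384-bit GDDR5X at 9 Gbps. */
    double p6000_fp32 = 3840 * 2.0 * 1.53 / 1000.0;      /* ~11.8 TFLOPS */
    double p6000_fp64 = p6000_fp32 / 32.0;               /* ~0.37 TFLOPS */
    double p6000_bw   = 384 / 8.0 * 9.0;                 /* 432 GB/s     */

    printf("GP100: %.1f TFLOPS FP32, %.2f TFLOPS FP64, %.0f GB/s\n",
           gp100_fp32, gp100_fp64, gp100_bw);
    printf("P6000: %.1f TFLOPS FP32, %.2f TFLOPS FP64, %.0f GB/s\n",
           p6000_fp32, p6000_fp64, p6000_bw);
    printf("GP100 bandwidth advantage: %.0f%%\n",
           (gp100_bw / p6000_bw - 1.0) * 100.0);          /* about 66% */
    return 0;
}
```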

m.jpg

Continue reading our preview of the new Quadro GP100!

Manufacturer: AMD

Performance and Impressions

This content was sponsored by AMD.

Last week in part 1 of our look at the Radeon RX 460 as a budget gaming GPU, I detailed our progress through component selection. Centered around an XFX 2GB version of the Radeon RX 460, we built a machine using an Intel Core i3-6100, ASUS H110M motherboard, 8GB of DDR4 memory, both an SSD and a HDD, as well as an EVGA power supply and Corsair chassis. Part 1 discussed the reasons for our hardware selections as well as an unboxing and preview of the giveaway to come.

In today's short write-up and video, I will discuss my impressions of the system overall as well as touch on the performance in a handful of games. Despite the low price, and despite the budget moniker attributed to this build, a budding PC gamer or converted console gamer will find plenty of capability in this system.

Check out prices of Radeon RX 460 graphics cards on Amazon!!

Let's quickly recap the components making up our RX 460 budget build.

Our Radeon RX 460 Build

  Budget Radeon RX 460 Build
Processor Intel Core i3-6100 - $109
Cooler CRYORIG M9i - $19
Motherboard ASUS H110M-A/M.2 - $54
Memory 2 x 4GB Crucial Ballistix DDR4-2400 - $51
Graphics Card XFX Radeon RX 460 2GB - $98
Storage 240GB Sandisk SSD Plus - $68
1TB Western Digital Blue - $49
Case Corsair Carbide Series 88R - $49
Power Supply EVGA 500 Watt - $42
Monitor Nixeus VUE24A 1080p 144Hz FreeSync - $251
Total Price $549 on Amazon; $799 with monitor on Amazon

For just $549 I was able to create a shopping list of hardware that provides very impressive performance for the investment.

02.jpg

The completed system is damn nice looking, if I do say so myself. The Corsair Carbide 88R case sports a matte black finish with a large window to peer in at the hardware contained within. Coupled with the Nixeus FreeSync display and some Logitech G mouse and keyboard hardware we love, this is a configuration that any PC gamer would be proud to display.

03.jpg

Continue reading our performance thoughts on the RX 460 budget PC build!

Manufacturer: AMD

Our Radeon RX 460 Build

This content was sponsored by AMD.

Be sure you check out part 2 of our story where we detail the performance our RX 460 build provides as well as our contest page where you can win this PC from AMD and PC Perspective!

Just before CES this month, AMD came to me asking about our views and opinions on its Radeon RX 460 line of graphics cards, how the GPU is perceived in the market, and how I felt they could better position it to the target audience. It was at that point that I had to openly admit to never actually having installed and used an RX 460 GPU before. I know, shame on me.

I like to pride myself and PC Perspective on being one of the top sources of technical information in the world of PCs, gaming or otherwise, and in particular on GPUs. But a pitfall that I fall into, and I imagine many other reviewers and media do as well, is that I overly emphasize the high end of the market and tend to shift what is considered a "budget" product up the scale more than I should. Is a $250 graphics card really a budget product that the mass market is going to purchase? No, and the numbers clearly point to that as fact. More buyers purchase cards in the sub-$150 segment than in any other, upgrading OEM PCs and building low cost boxes for themselves and for family and friends.

So, AMD came to me with a proposal to address this deficiency in my mental database. If we were willing to build a PC based on the RX 460, test and evaluate it honestly, and then give that built system back to the community, they would pay for the hardware and promotion of such an event. So here we are.

To build out the RX 460-based PC, I went to the experts in the world of budget PC builds, the /r/buildapc subreddit. The community here is known for being the best at penny-pinching and maximizing the performance-per-dollar implementations on builds. While not the only types of hardware they debate and discuss in that group, it definitely is the most requested. I started a thread there to ask for input and advice on building a system with the only requirements being inclusion of the Radeon RX 460 and perhaps an AMD FreeSync monitor.

Check out prices of Radeon RX 460 graphics cards on Amazon!!

The results were impressive; a solid collection of readers and contributors gave me suggestions for complete builds based around the RX 460. Processors varied, memory configurations varied, storage options varied, but in the end I had at least a dozen solid options that ranged in price from $400-800. With the advice of the community at hand, I set out to pick the components for our own build, which are highlighted below:

Our Radeon RX 460 Build

  Budget Radeon RX 460 Build
Processor Intel Core i3-6100 - $109
Cooler CRYORIG M9i - $19
Motherboard ASUS H110M-A/M.2 - $54
Memory 2 x 4GB Crucial Ballistix DDR4-2400 - $51
Graphics Card XFX Radeon RX 460 2GB - $98
Storage 240GB Sandisk SSD Plus - $68
1TB Western Digital Blue - $49
Case Corsair Carbide Series 88R - $49
Power Supply EVGA 500 Watt - $42
Monitor Nixeus VUE24A 1080p 144Hz FreeSync - $251
Total Price $549 on Amazon; $799 with monitor on Amazon

I'll go in order of presentation for simplicity's sake. First up is the selection of the Intel Core i3-6100 processor. This CPU was the most popular offering in the /r/buildapc group and has been the darling of budget gaming builds for a while. It is frequently used because of its $109 price tag, along with dual-core, HyperThreaded performance at 3.7 GHz, giving you plenty of headroom for single-threaded applications. Since most games aren't going to utilize more than four threads, PC gaming performance will be excellent as well. One frequent suggestion in our thread was the Intel Pentium G4560, a Kaby Lake based part that will sell for ~$70. That would have been my choice, but it's not shipping yet, and I don't know when it will be.

cpu.jpg

Continue reading our budget build based on the Radeon RX 460!

High Bandwidth Cache

Apart from AMD's other new architecture due out in 2017, its Zen CPU design, no other product has had as much build-up and excitement surrounding it as its Vega GPU architecture. After the world learned that Polaris would be a mainstream-only design, released as the Radeon RX 480, the focus for enthusiasts shifted straight to Vega. It has been on the public-facing roadmaps for years and signifies the company's return to the world of high-end GPUs, something it has been missing since the release of the Fury X in mid-2015.

slides-2.jpg

Let's be clear: today does not mark the release of the Vega GPU or products based on Vega. In reality, we don't even know enough to make highly educated guesses about performance without more details on the specific implementations. That being said, the information released by AMD today is interesting and shows that Vega will be much more than simply an increase in shader count over Polaris. It reminds me a lot of the build-up to the Fiji GPU release, when information and speculation flourished about how HBM would affect power consumption, form factor and performance. What we can hope for, and what AMD's goal needs to be, is a cleaner and more consistent product release than the Fury X turned out to be.

The Design Goals

AMD began its discussion about Vega last month by talking about the changes in the world of GPUs and how data sets and workloads have evolved over the last decade. No longer are GPUs only concerned with games; they must also address professional, enterprise, and scientific workloads. Even more interestingly, just as we have discussed the growing gap between CPU performance and memory bandwidth, AMD posits that the gap between memory capacity and GPU performance is a significant hurdle and a limiter to performance and expansion. Game installs, professional graphics sets, and compute data sets continue to skyrocket. Game installs are now regularly over 50GB, but compute workloads can exceed petabytes. Even as we saw GPU memory capacities increase from megabytes to gigabytes, reaching as high as 12GB in high-end consumer products, AMD thinks there should be more.

slides-8.jpg

Coming from a company that chose to release a high-end product limited to 4GB of memory in 2015, it’s a noteworthy statement.

slides-11.jpg

The High Bandwidth Cache

Bold enough to claim a direct nomenclature change, Vega 10 will feature an HBM2-based high bandwidth cache (HBC) along with a new memory hierarchy to put it to use. This HBC will be a collection of memory on the GPU package, just like we saw on Fiji with the first HBM implementation, and will be measured in gigabytes. Why the move to calling it a cache will be covered below. (But can't we all get behind the removal of the term "frame buffer"?) Interestingly, this HBC doesn't have to be HBM2, and in fact I was told that you could expect to see other memory systems on lower cost products going forward; cards that integrate this new memory topology with GDDR5X or some equivalent seem assured.

slides-13.jpg

Continue reading our preview of the AMD Vega GPU Architecture!

Manufacturer: AMD

AMD Enters Machine Learning Game with Radeon Instinct Products

NVIDIA has been diving into the world of machine learning for quite a while, positioning itself and its GPUs at the forefront of artificial intelligence and neural net development. Though the strategies are still filling out, I have seen products like the DIGITS DevBox place a stake in the ground for neural net training and platforms like Drive PX perform inference tasks on those neural nets in self-driving cars. Until today AMD had remained mostly quiet on its plans to enter and address this growing and complex market, instead depending on the compute prowess of its latest Polaris and Fiji GPUs to make a general statement on their own.

instinct-18.jpg

The new Radeon Instinct brand of accelerators based on current and upcoming GPU architectures will combine with an open-source approach to software and present researchers and implementers with another option for machine learning tasks.

The statistics and requirements that come along with the machine learning evolution in the compute space are mind boggling. More than 2.5 quintillion bytes of data are generated daily and stored on phones, PCs and servers, both on-site and through cloud infrastructure. That includes 500 million tweets, 4 million hours of YouTube video, 6 billion Google searches and 205 billion emails.

instinct-6.jpg

Machine intelligence is going to allow software developers to address some of the most important areas of computing for the next decade. Automated cars depend on deep learning to train; medical fields can utilize this compute capability to more accurately and expeditiously diagnose and find cures for cancer; security systems can use neural nets to locate potential and current risk areas before they affect consumers. There are more uses for this kind of network and capability than we can imagine.

Continue reading our preview of the AMD Radeon Instinct machine learning processors!

Manufacturer: AMD

Third annual release

For the past two years, AMD has made a point of releasing one major software update to Radeon users and gamers annually. In 2014 this started with Catalyst Omega, where a dramatic jump in performance, compatibility testing and new features were the story. We were told that, for the first time in a very long while, AMD was going to focus on building great software with regular and repeated updates (and admittedly, this was the most important aspect to me). In 2015 we got a rebrand along with the release: Radeon Software Crimson Edition. AMD totally revamped the visual and user experience of the driver software, bringing it into the modern world of style and function. New features and added performance were also the hallmarks of that release, with a stronger promise to produce more frequent drivers to address performance gaps and stability concerns and to include new features.

For December 2016 and into the new year, AMD is launching the Radeon Software Crimson ReLive Edition driver. While the name might seem silly, it will make sense as we dive into the new features.

While you may have seen the slides leak out through some other sites over the past 48 hours, I thought it was worth offering my input on the release.

Not a performance focused story

The first thing that should be noted with the ReLive Edition is that AMD isn’t making any claims of substantially improved performance. Instead, the Radeon Technologies Group software team is dedicated to continued and frequent iterations that improve performance gradually over time.

slides40.jpg

As you can see in the slide above, AMD is showing modest 4-8% performance gains on the Radeon RX 480 with the Crimson ReLive driver, and even then, it's being compared to the 16.6.2 launch driver. That is significantly lower than the claims made in previous major driver releases. Talking with AMD about this concern, the company told us that it doesn't foresee any dramatic, single large-step increases in performance going forward. The major design changes that were delivered over the last several years, starting with a reconstruction of the CrossFire system thanks to our testing, have settled in. All we should expect going forward is a steady trickle of moderate improvements.

(Obviously, an exception may occur here or there, like with a new game release.)

Radeon ReLive Capture and Streaming Feature

So, what is new? The namesake feature for this driver is the Radeon ReLive application that is built in. ReLive is a capture and streaming tool that will draw obvious comparisons to what NVIDIA has done with GeForce Experience. The ReLive integration is clean and efficient, well designed and seems easy to use in my quick time with it. There are several key capabilities it offers.

First, you can record your gameplay with the press of a hotkey; this includes the ability to record and capture the desktop as well. AMD has included a bevy of settings for your captures to adjust quality, resolution, bitrate, FPS and more.

rev10-table.jpg

ReLive supports resolutions up to 4K30 with the Radeon R9 series of GPUs and up to 1440p30 with the RX 480/470/460. That includes both AVC H.264 and HEVC H.265.

Along with recording is support for background capture, called Instant Replay. This lets the gamer record continuously in the background, for up to 20 minutes, to be sure of capturing those amazing moments that happen during a gaming session. Hitting a hotkey saves the clip permanently to the system.

rev9.jpg

Continue reading our overview of the new AMD Radeon Crimson ReLive driver!

Manufacturer: NVIDIA

A Holiday Project

A couple of years ago, I performed an experiment around the GeForce GTX 750 Ti graphics card to see if we could upgrade basic OEM, off-the-shelf computers to become competent gaming PCs. The key to this potential upgrade was that the GTX 750 Ti offered a great amount of GPU horsepower (at the time) without the need for an external power connector. Lower power requirements on the GPU meant that even the most basic of OEM power supplies should be able to do the job.

That story was a success, both in terms of the result in gaming performance and the positive feedback it received. Today, I am attempting to do that same thing but with a new class of GPU and a new class of PC games.

The goal for today’s experiment remains pretty much the same: can a low-cost, low-power GeForce GTX 1050 Ti graphics card that also does not require any external power connector offer enough gaming horsepower to upgrade current shipping OEM PCs to "gaming PC" status?

Our target PCs for today come from Dell and ASUS. I went into my local Best Buy just before the Thanksgiving holiday and looked for two machines that varied in price and relative performance.

01.jpg

  Dell Inspiron 3650 ASUS M32CD-B09
Processor Intel Core i3-6100 Intel Core i7-6700
Motherboard Custom Custom
Memory 8GB DDR4 12GB DDR4
Graphics Card Intel HD Graphics 530 Intel HD Graphics 530
Storage 1TB HDD 1TB Hybrid HDD
Case Custom Custom
Power Supply 240 watt 350 watt
OS Windows 10 64-bit Windows 10 64-bit
Total Price $429 (Best Buy) $749 (Best Buy)

The specifications of these two machines are relatively modern for OEM computers. The Dell Inspiron 3650 uses a modest dual-core Core i3-6100 processor with a fixed clock speed of 3.7 GHz. It has a 1TB standard hard drive and a 240 watt power supply. The ASUS M32CD-B09 PC has a quad-core HyperThreaded processor with a 4.0 GHz maximum Turbo clock, a 1TB hybrid hard drive and a 350 watt power supply. Both CPUs share the same Intel integrated graphics, the HD Graphics 530. You'll see in our testing that not only is this integrated GPU unqualified for modern PC gaming, but it also performs quite differently based on the CPU it is paired with.

Continue reading our look at upgrading an OEM machine with the GTX 1050 Ti!!

Manufacturer: MSI

We have a lot of gaming notebooks

Back in April I did a video with MSI that looked at all of the gaming notebook lines it built around the GTX 900-series of GPUs. Today we have stepped it up a notch, and again are giving you an overview of MSI's gaming notebook lines that now feature the ultra-powerful GTX 10-series using NVIDIA's Pascal architecture. That includes the GTX 1060, GTX 1070 and GTX 1080.

What differentiates the various series of notebooks from MSI? The GE series is for entry-level notebook gaming, the GS series offers slim options, while the GT series is the ultimate mobile PC gaming platform.

  GE series | GS series | GT62/72 series | GT73/83 series
MSRP $1549-1749 | $1499-2099 | $1499-2599 | $2199-4999
Screen 15.6" and 17.3", 1080p | 14", 15.6" and 17.3", 1080p and 4K | 15.6" and 17.3", 1080p, G-Sync | 17.3" and 18", 1080p, 4K, G-Sync (varies)
CPU Core i7-6700HQ | Core i7-6700HQ | Core i7-6700HQ | Core i7-6820HK or Core i7-6920HQ
GPU GTX 1060 6GB | GTX 1060 6GB | GTX 1060 6GB or GTX 1070 8GB | GTX 1070 8GB (SLI option) or GTX 1080 8GB (SLI option)
RAM 12-16GB | 16-32GB | 12-32GB | 16-64GB
Storage 128-512GB M.2 SATA + 1TB HDD | 128-512GB M.2 SATA + 1TB HDD | 128-512GB PCIe and SATA + 1TB HDD | Up to 1TB SSD (SATA, NVMe) + 1TB HDD
Optical DVD Super-multi | None | Yes (GT72 only) | Blu-ray burner (GT83 only)
Features Killer E2400 LAN, USB 3.1 Type-C, Steel Series RGB Keyboard | Killer E2400 LAN, Killer 1535 WiFi, Thunderbolt 3 | Killer E2400 LAN, Killer 1535 WiFi, USB 3.1 Type-C, 3x USB 3.0 (GT62), 3x USB 3.0 (GT72) | Killer E2400 LAN, Killer 1535 WiFi, Thunderbolt 3, 5x USB 3.0, Steel Series RGB (GT73), Mechanical Keyboard (GT83)
Weight 5.29-5.35 lbs | 3.75-5.35 lbs | 6.48-8.33 lbs | 8.59-11.59 lbs

Our video below will break down the differences and help point you toward the right notebook for you based on the three key pillars of performance, price and form factor.

Thanks goes out to CUK, Computer Upgrade King, for supplying the 9 different MSI notebooks for our testing and evaluation!

Manufacturer: ASUS

Specifications and Card Breakdown

The flurry of retail cards based on NVIDIA's new Pascal GPUs has been hitting us hard at PC Perspective. So much so, in fact, that coupled with new gaming notebooks, new monitors, new storage and a new church (you should listen to our podcast, really), output has slowed dramatically. How do you write reviews for all of these graphics cards when you don't even know where to start? My answer: blindly pick one and start typing away.

07.jpg

Just after launch day of the GeForce GTX 1060, ASUS sent over the GTX 1060 Turbo 6GB card. Despite the name, the ASUS Turbo line of GTX 10-series cards is the company's most basic, most stock iteration of its graphics card lineup. That isn't necessarily a drawback though - you get reference-level performance at the lowest available price, and you still get ASUS's promises of quality and warranty.

With a target MSRP of just $249, does the ASUS GTX 1060 Turbo make the cut for users looking for that perfect mainstream 1080p gaming graphics card? Let's find out.

Continue reading our review of the ASUS GeForce GTX 1060 Turbo 6GB!

Manufacturer: PC Perspective

Why Two 4GB GPUs Isn't Necessarily 8GB

We're trying something new here at PC Perspective. Some topics are fairly difficult to explain cleanly without accompanying images. We also like to go fairly deep into specific topics, so we're hoping that we can provide educational cartoons that explain these issues.

This pilot episode is about load-balancing and memory management in multi-GPU configurations. There seems to be a lot of confusion around what was (and was not) possible with DirectX 11 and OpenGL, and even more confusion about what DirectX 12, Mantle, and Vulkan allow developers to do. It highlights three different load-balancing algorithms, and even briefly mentions what LucidLogix was attempting to accomplish almost ten years ago.
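
As a small taste of the kind of logic the episode covers, here is a toy sketch of two classic load-balancing schemes: alternate-frame rendering, which hands whole frames to GPUs round-robin, and split-frame rendering, which divides each frame into regions. Whether these map exactly onto the three algorithms in the video is not spelled out here; the sketch is purely illustrative.

```c
/* Toy illustration of two classic multi-GPU load-balancing schemes:
 * alternate-frame rendering (AFR) and split-frame rendering (SFR).
 * Purely illustrative; real drivers/engines add synchronization, pacing,
 * and resource-copy logic that this sketch ignores. */
#include <stdio.h>

#define NUM_GPUS 2
#define HEIGHT   1080

/* AFR: whole frames alternate between GPUs. */
static int afr_gpu_for_frame(int frame_index)
{
    return frame_index % NUM_GPUS;
}

/* SFR: each GPU renders a horizontal slice of every frame. */
static void sfr_slice_for_gpu(int gpu, int *y_start, int *y_end)
{
    int slice = HEIGHT / NUM_GPUS;
    *y_start = gpu * slice;
    *y_end   = (gpu == NUM_GPUS - 1) ? HEIGHT : *y_start + slice;
}

int main(void)
{
    for (int frame = 0; frame < 4; frame++)
        printf("AFR: frame %d -> GPU %d\n", frame, afr_gpu_for_frame(frame));

    for (int gpu = 0; gpu < NUM_GPUS; gpu++) {
        int y0, y1;
        sfr_slice_for_gpu(gpu, &y0, &y1);
        printf("SFR: GPU %d renders scanlines %d-%d of every frame\n", gpu, y0, y1 - 1);
    }
    return 0;
}
```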

pcper-2016-animationlogo-multiGPU.png

If you like it, and want to see more, please share and support us on Patreon. We're putting this out not knowing if it's popular enough to be sustainable. The best way to see more of this is to share!

Open the expanded article to see the transcript, below.

Manufacturer: NVIDIA

Is Enterprise Ascending Outside of Consumer Viability?

So a couple of weeks have gone by since the Quadro P6000 was announced and the new Titan X launched. With them, we received a new chip: GP102. Since Fermi, NVIDIA has labeled their GPU designs with a G, followed by a single letter for the architecture (F, K, M, or P for Fermi, Kepler, Maxwell, and Pascal, respectively), which is then followed by a three digit number. The last digit is the most relevant one, however, as it separates designs by their intended size.

nvidia-2016-Quadro_P6000_7440.jpg

Typically, 0 corresponds to a ~550-600mm2 design, which is about as large a design as fabrication labs can create without error-prone techniques, like multiple exposures (trying to precisely overlap multiple designs to form a larger integrated circuit). 4 corresponds to ~300mm2, although GM204 was pretty large at 398mm2, likely to increase the core count while remaining on a 28nm process. Higher numbers, like 6 or 7, fill out the lower-end SKUs until NVIDIA essentially stops caring for that generation. So when we moved to Pascal, jumping two whole process nodes, NVIDIA looked at their wristwatches and said “about time to make another 300mm2 part, I guess?”

The GTX 1080 and the GTX 1070 (GP104, 314mm2) were born.

nvidia-2016-gtc-pascal-banner.png

NVIDIA already announced a 600mm2 part, though. The GP100 had 3840 CUDA cores, HBM2 memory, and an ideal ratio of 1:2:4 between FP64:FP32:FP16 performance. (A 64-bit chunk of memory can store one 64-bit value, two 32-bit values, or four 16-bit values, unless the register is attached to logic circuits that, while smaller, don't know how to operate on the data.) This increased ratio, even over Kepler's 1:6 FP64:FP32, is great for GPU compute, but wasted die area for today's (and tomorrow's) games. I'm predicting that it takes the wind out of Intel's sales, as Xeon Phi's 1:2 FP64:FP32 performance ratio is one of its major selling points, leading to its inclusion in many supercomputers.
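
For anyone who wants to see that parenthetical in practice, the snippet below overlays the three interpretations on the same 64 bits of storage. Whether the ALUs can actually operate on the packed halves is the separate hardware question that the 1:2:4 ratio describes; C has no native half type, so the FP16 view is shown as raw 16-bit storage.

```c
/* Illustration of the storage point above: the same 64 bits can hold one
 * FP64 value, two FP32 values, or four FP16 values. Whether the hardware can
 * operate on the packed halves is a separate question (the 1:2:4 ratio).
 * FP16 shown as raw uint16_t storage since C has no native half type. */
#include <stdint.h>
#include <stdio.h>

typedef union {
    double   fp64;        /* 1 x 64-bit                     */
    float    fp32[2];     /* 2 x 32-bit                     */
    uint16_t fp16[4];     /* 4 x 16-bit (bit patterns only) */
} reg64_t;

int main(void)
{
    reg64_t r = { .fp32 = { 1.0f, 2.0f } };
    printf("as two floats: %f %f\n", r.fp32[0], r.fp32[1]);
    printf("same 64 bits as raw halves: %04x %04x %04x %04x\n",
           r.fp16[0], r.fp16[1], r.fp16[2], r.fp16[3]);
    printf("sizeof(reg64_t) = %zu bytes\n", sizeof(reg64_t));
    return 0;
}
```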

Despite the HBM2 memory controller supposedly being smaller than a GDDR5(X) controller, NVIDIA could still save die space while providing 3840 CUDA cores (a few of which are disabled on Titan X). The trade-off is that FP64 and FP16 performance had to decrease dramatically, from 1:2 and 2:1 relative to FP32, all the way down to 1:32 and 1:64. This new design comes in at 471mm2, although it's $200 more expensive than what the 600mm2 products, GK110 and GM200, launched at. Smaller dies provide more products per wafer and, better still, the number of defective chips should be relatively constant.
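
To put rough numbers on the dies-per-wafer point: a simple area division over a 300mm wafer (ignoring edge losses) plus a classic Poisson yield model with an assumed, made-up defect density shows why 471mm2 is a friendlier target than 610mm2. Treat the outputs as illustrative only.

```c
/* Back-of-the-envelope dies-per-wafer comparison. Ignores edge losses and
 * assumes an arbitrary defect density for the classic Poisson yield model,
 * so treat the outputs as illustrative, not foundry data. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double PI = 3.14159265358979;
    double wafer_diameter_mm = 300.0;
    double wafer_area = PI * (wafer_diameter_mm / 2.0) * (wafer_diameter_mm / 2.0);
    double defect_density = 0.001;   /* defects per mm^2 -- assumed, not real data */

    double die_areas[] = { 471.0, 610.0 };   /* GP102 vs. a ~600 mm^2 class die */
    for (int i = 0; i < 2; i++) {
        double gross = wafer_area / die_areas[i];             /* crude upper bound */
        double yield = exp(-defect_density * die_areas[i]);   /* Poisson model     */
        printf("%.0f mm^2 die: ~%.0f gross dies/wafer, ~%.0f%% modeled yield\n",
               die_areas[i], gross, yield * 100.0);
    }
    return 0;
}
```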

Anyway, that aside, it puts NVIDIA in an interesting position. Splitting the xx0-class chip into xx0 and xx2 designs allows NVIDIA to lower the cost of their high-end gaming parts, although it cuts out hobbyists who buy a Titan for double-precision compute. More interestingly, it leaves around 150mm2 for AMD to sneak in a design that's FP32-centric, leaving them a potential performance crown.

nvidia-2016-pascal-volta-roadmap-extremetech.png

Image Credit: ExtremeTech

On the other hand, as fabrication node changes are becoming less frequent, it's possible that NVIDIA could be leaving itself room for Volta, too. Last month, it was rumored that NVIDIA would release two architectures at 16nm, in the same way that Maxwell shared 28nm with Kepler. In this case, Volta, on top of whatever other architectural advancements NVIDIA rolls into that design, can also grow a little in size. At that time, TSMC would have better yields, making a 600mm2 design less costly in terms of waste and recovery.

If this is the case, we could see the GPGPU folks receiving a new architecture once every second gaming (and professional graphics) architecture. That is, unless you are a hobbyist. If you are? I would need to be wrong, or NVIDIA would need to somehow bring their enterprise SKU into an affordable price point. The xx0 class seems to have been pushed up and out of viability for consumers.

Or, again, I could just be wrong.

Manufacturer: NVIDIA

Take your Pascal on the go

Easily the strongest growth segment in PC hardware today is the adoption of gaming notebooks. Ask companies like MSI and ASUS, even Gigabyte: they now make more models and sell more units of notebooks with a dedicated GPU than ever before. Both AMD and NVIDIA agree on this point, and it's something AMD was adamant about discussing during the launch of the Polaris architecture.

pascalnb-2.jpg

Both AMD and NVIDIA predict massive annual growth in this market – somewhere on the order of 25-30%. For an overall culture that continues to believe the PC is dying, seeing projected growth this strong in any segment is not only amazing, but welcome to those of us that depend on it. AMD and NVIDIA have different goals here: GeForce products already have 90-95% market share in discrete gaming notebooks. In order for NVIDIA to see growth in sales, the total market needs to grow. For AMD, simply taking back a portion of those users and design wins would help its bottom line.

pascalnb-4.jpg

But despite AMD’s early talk about getting Polaris 10 and 11 in mobile platforms, it’s NVIDIA again striking first. Gaming notebooks with Pascal GPUs in them will be available today, from nearly every system vendor you would consider buying from: ASUS, MSI, Gigabyte, Alienware, Razer, etc. NVIDIA claims to have quicker adoption of this product family in notebooks than in any previous generation. That’s great news for NVIDIA, but might leave AMD looking in from the outside yet again.

Technologically speaking though, this makes sense. Despite the improvement that Polaris made on the GCN architecture, Pascal is still more powerful and more power efficient than anything AMD has been able to produce. Looking solely at performance per watt, which is really the defining trait of mobile designs, Pascal is as dominant over Polaris as Maxwell was over Fiji. And this time around NVIDIA isn't messing with cut-back parts carrying different branding – GeForce is diving directly into gaming notebooks in a way we have only seen with one prior release.

g752-open.jpg

The ASUS G752VS OC Edition with GTX 1070

Do you remember our initial look at the mobile variant of the GeForce GTX 980? Not the GTX 980M mind you, the full GM204 operating in notebooks. That was basically a dry run for what we see today: NVIDIA will be releasing the GeForce GTX 1080, GTX 1070 and GTX 1060 to notebooks.

Continue reading our preview of the new GeForce GTX 1080, 1070 and 1060 mobile Pascal GPUs!!

Manufacturer: NVIDIA

A Beautiful Graphics Card

As a surprise to nearly everyone, on July 21st NVIDIA announced the existence of the new Titan X graphics card, which is based on the brand new GP102 Pascal GPU. Though it shares a name, for some unexplained reason, with the Maxwell-based Titan X graphics card launched in March of 2015, this card is a significant performance upgrade. Using the largest consumer-facing Pascal GPU to date (with only the GP100 used in the Tesla P100 exceeding it), the new Titan X is going to be a very expensive, and very fast, gaming card.

As has been the case since the introduction of the Titan brand, NVIDIA claims that this card is for gamers that want the very best in graphics hardware as well as for developers who need an ultra-powerful GPGPU device. GP102 does not integrate improved FP64 / double precision compute cores, so we are basically looking at an upgraded and improved GP104 Pascal chip. That's nothing to sneeze at, of course, and you can see in the specifications below that we expect (and can now show you) that the Titan X (Pascal) is a gaming monster.

  Titan X (Pascal) GTX 1080 GTX 980 Ti TITAN X GTX 980 R9 Fury X R9 Fury R9 Nano R9 390X
GPU GP102 GP104 GM200 GM200 GM204 Fiji XT Fiji Pro Fiji XT Hawaii XT
GPU Cores 3584 2560 2816 3072 2048 4096 3584 4096 2816
Rated Clock 1417 MHz 1607 MHz 1000 MHz 1000 MHz 1126 MHz 1050 MHz 1000 MHz up to 1000 MHz 1050 MHz
Texture Units 224 160 176 192 128 256 224 256 176
ROP Units 96 64 96 96 64 64 64 64 64
Memory 12GB 8GB 6GB 12GB 4GB 4GB 4GB 4GB 8GB
Memory Clock 10000 MHz 10000 MHz 7000 MHz 7000 MHz 7000 MHz 500 MHz 500 MHz 500 MHz 6000 MHz
Memory Interface 384-bit G5X 256-bit G5X 384-bit 384-bit 256-bit 4096-bit (HBM) 4096-bit (HBM) 4096-bit (HBM) 512-bit
Memory Bandwidth 480 GB/s 320 GB/s 336 GB/s 336 GB/s 224 GB/s 512 GB/s 512 GB/s 512 GB/s 320 GB/s
TDP 250 watts 180 watts 250 watts 250 watts 165 watts 275 watts 275 watts 175 watts 275 watts
Peak Compute 11.0 TFLOPS 8.2 TFLOPS 5.63 TFLOPS 6.14 TFLOPS 4.61 TFLOPS 8.60 TFLOPS 7.20 TFLOPS 8.19 TFLOPS 5.63 TFLOPS
Transistor Count 11.0B 7.2B 8.0B 8.0B 5.2B 8.9B 8.9B 8.9B 6.2B
Process Tech 16nm 16nm 28nm 28nm 28nm 28nm 28nm 28nm 28nm
MSRP (current) $1,200 $599 $649 $999 $499 $649 $549 $499 $329

GP102 features 40% more CUDA cores than the GP104 at slightly lower clock speeds. The rated 11 TFLOPS of single precision compute of the new Titan X is 34% higher than that of the GeForce GTX 1080 and I would expect gaming performance to scale in line with that difference.

Titan X (Pascal) does not utilize the full GP102 GPU; the recently announced Quadro P6000 does, however, which gives it a CUDA core count of 3,840 (256 more than Titan X).

blockdiagram.jpg

A full GP102 GPU

Compared to the complete GPU, the new Titan X effectively gives up about 7% of its compute capability, although that cut is likely to help increase available clock headroom and yield.

The new Titan X will feature 12GB of GDDR5X memory, not HBM as the GP100 chip has, so this is clearly a unique chip with a new memory interface. NVIDIA claims it has 480 GB/s of bandwidth on a 384-bit memory controller interface running at the same 10 Gbps as the GTX 1080.
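
The relative figures quoted in this section check out with quick arithmetic using the values from the table; a small worked example follows.

```c
/* Quick arithmetic behind the Titan X (Pascal) comparisons in the text. */
#include <stdio.h>

int main(void)
{
    /* 40% more CUDA cores than GP104, 34% more rated FP32 throughput. */
    printf("core advantage over GTX 1080:   %.0f%%\n", (3584.0 / 2560.0 - 1) * 100);
    printf("TFLOPS advantage over GTX 1080: %.0f%%\n", (11.0 / 8.2 - 1) * 100);

    /* Disabling 256 of GP102's 3840 cores costs roughly 7% of peak compute. */
    printf("compute given up vs. full GP102: %.1f%%\n", 256.0 / 3840.0 * 100);

    /* 384-bit bus at 10 Gbps per pin. */
    printf("memory bandwidth: %.0f GB/s\n", 384.0 / 8.0 * 10.0);
    return 0;
}
```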

Continue reading our review of the new NVIDIA Titan X (Pascal) Graphics Card!!

Manufacturer: Realworldtech

Realworldtech with Compelling Evidence

Yesterday David Kanter of Realworldtech posted a pretty fascinating article and video that explored the two latest NVIDIA architectures and how they have branched away from traditional immediate mode rasterization.  Through testing, he revealed that with Maxwell and Pascal, NVIDIA has moved to a tiling method for rasterization.  This is a somewhat significant departure for the company, considering it has utilized the same basic immediate mode rasterization model since the 90s.

VideoLogic_Apocalypse_3Dx.jpg

The VideoLogic Apocalypse 3Dx, based on the PowerVR PCX2.

(photo courtesy of Wikipedia)

Tiling is an interesting subject, and we can harken back to the PowerVR days to see where it was first implemented.  There are many advantages to tiling and deferred rendering when it comes to overall efficiency in power and memory bandwidth.  The first TBDRs (Tile Based Deferred Renderers) offered great performance per clock and could utilize slower memory as compared to other offerings of the day (namely Voodoo Graphics).  There were some significant drawbacks to the technology, though.  Essentially, a lot of work had to be done by the CPU and driver in scene setup and geometry sorting.  On fast CPU systems the PowerVR boards could provide very good performance, but they suffered on lower end parts as compared to the competition.  This is a very simple explanation of what was going on, but the long and short of it is that TBDR did not take over the world due to limitations in its initial implementations.  Traditional immediate mode rasterizers would improve in efficiency and performance with aggressive Z checks and other optimizations that borrow from the TBDR playbook.

Tiling is also present in a lot of mobile parts.  Imagination's PowerVR graphics technologies have been implemented by Intel, Apple, MediaTek, and others.  Qualcomm (Adreno) and ARM (Mali) both implement tiler technologies to improve power consumption and performance while increasing bandwidth efficiency.  Perhaps most interestingly, we can remember back to the Gigapixel days with the GP-1 chip, which implemented a tiling method that seemed to work very well without the CPU hit and driver overhead that had plagued the PowerVR chips up to that point.  3dfx bought Gigapixel for some $150 million at the time.  That company then went on to file for bankruptcy a year later, and its IP was acquired by NVIDIA.

kanter_video.jpg

Screenshot of the program used to uncover the tiling behavior of the rasterizer.

It now appears as though NVIDIA has evolved its raster units to embrace tiling.  This is not a full TBDR implementation, but rather an immediate mode tiler that still breaks the scene up into tiles but does not implement deferred rendering.  This change should improve bandwidth efficiency when it comes to rasterization, but it does not force the rest of the graphics pipeline to be deferred (tessellation, geometry setup and shaders, etc. are not impacted).  NVIDIA has not done a deep dive on this change for editors, so we do not know the exact implementation and what advantages we can expect.  We can look at the evidence we have and speculate where those advantages exist.
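
To make "immediate mode tiler" slightly more concrete, below is a highly simplified sketch of the binning idea: triangles are sorted into screen-space tiles, and each tile is processed against a small on-chip buffer before a single write out to memory. This is an illustration of the general technique only; NVIDIA has not disclosed its implementation, and details like the 16x16 tile size and the skipped coverage test are assumptions made for brevity.

```c
/* Highly simplified sketch of the binning idea behind an immediate-mode
 * tiler: sort incoming triangles into screen-space tiles, then process one
 * tile at a time against a small on-chip buffer before flushing to memory.
 * Illustrative only; not NVIDIA's actual (undisclosed) hardware algorithm,
 * and per-pixel coverage testing is omitted. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define TILE     16                     /* assumed 16x16-pixel tiles */
#define SCREEN_W 256
#define SCREEN_H 256
#define TILES_X  (SCREEN_W / TILE)
#define TILES_Y  (SCREEN_H / TILE)
#define MAX_BIN  64

typedef struct { float minx, miny, maxx, maxy; uint32_t color; } tri_t;
typedef struct { int tri[MAX_BIN]; int count; } bin_t;

static uint32_t framebuffer[SCREEN_W * SCREEN_H];   /* off-chip memory      */
static uint32_t tile_buf[TILE * TILE];              /* "on-chip" tile store */
static bin_t    bins[TILES_Y][TILES_X];

/* Bin each triangle into every tile its bounding box touches. */
static void bin_triangle(const tri_t *t, int index)
{
    int tx0 = (int)t->minx / TILE, tx1 = (int)t->maxx / TILE;
    int ty0 = (int)t->miny / TILE, ty1 = (int)t->maxy / TILE;
    for (int ty = ty0; ty <= ty1 && ty < TILES_Y; ty++)
        for (int tx = tx0; tx <= tx1 && tx < TILES_X; tx++)
            if (bins[ty][tx].count < MAX_BIN)
                bins[ty][tx].tri[bins[ty][tx].count++] = index;
}

/* Shade a tile entirely in the on-chip buffer, then write it out once. */
static void flush_tile(int tx, int ty, const tri_t *tris)
{
    memset(tile_buf, 0, sizeof(tile_buf));
    for (int i = 0; i < bins[ty][tx].count; i++)
        for (int p = 0; p < TILE * TILE; p++)       /* coverage test omitted */
            tile_buf[p] = tris[bins[ty][tx].tri[i]].color;

    for (int y = 0; y < TILE; y++)                  /* single burst to memory */
        memcpy(&framebuffer[(ty * TILE + y) * SCREEN_W + tx * TILE],
               &tile_buf[y * TILE], TILE * sizeof(uint32_t));
}

int main(void)
{
    tri_t tris[] = { { 10, 10, 120, 90, 0xff0000ffu } };
    bin_triangle(&tris[0], 0);

    int touched = 0;
    for (int ty = 0; ty < TILES_Y; ty++)
        for (int tx = 0; tx < TILES_X; tx++)
            if (bins[ty][tx].count > 0) {
                flush_tile(tx, ty, tris);
                touched++;
            }

    printf("tiles touched by the triangle's bounding box: %d\n", touched);
    return 0;
}
```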

The video where David Kanter explains his findings

 

Bandwidth and Power

Tilers have typically taken the tiled regions and buffered them on the chip.  This is a big improvement in both performance and power efficiency as the raster data does not have to be cached and written out to the frame buffer and then swapped back.  This makes quite a bit of sense considering the overall lack of big jumps in memory technologies over the past five years.  We have had GDDR-5 since 2007/2008.  The speeds have increased over time, but the basic technology is still much the same.  We have seen HBM introduced with AMD’s Fury series, but large scale production of HBM 2 is still to come.  Samsung has released small amounts of HBM 2 to the market, but not nearly enough to handle the needs of a mass produced card.  GDDR-5X is an extension of GDDR-5 that does offer more bandwidth, but it is still not a next generation memory technology like HBM 2.

By utilizing a tiler NVIDIA is able to lower memory bandwidth needs for the rasterization stage. Considering that both Maxwell and Pascal architectures are based on GDDR-5 and 5x technologies, it makes sense to save as much bandwidth as possible where they can.  This is again probably one, among many, of the reasons that we saw a much larger L2 cache in Maxwell vs. Kepler (2048 KB vs. 256KB respectively).  Every little bit helps when we are looking at hard, real world bandwidth limits for a modern GPU.

The area of power efficiency has also come up in discussion when going to a tiler.  Tilers have traditionally been more power efficient as well, due to how the raster data is tiled and cached, requiring fewer reads and writes to main memory.  The first impulse is to say, “Hey, this is the reason why NVIDIA’s Maxwell was so much more power efficient than Kepler and AMD’s latest parts!”  Sadly, this is not exactly true.  The tiler is more power efficient, but it is only a small part of the power savings on a GPU.

DSC00209.jpg

The second fastest Pascal based card...

A modern GPU is very complex.  There are some 7.2 billion transistors on the latest Pascal GP104 that powers the GTX 1080.  The vast majority of those transistors are implemented in the shader units of the chip.  While the raster units are very important, they are but a fraction of that transistor budget.  The rest is taken up by power regulation, PCI-E controllers, and memory controllers.  In the big scheme of things the raster portion is going to be dwarfed in power consumption by the shader units.  This does not mean that it is unimportant, though.  Going back to the hated car analogy, one does not achieve weight savings by focusing on one aspect alone.  It comes from going over every single part of the car and shaving ounces here and there, in the end achieving significant savings by addressing every single piece of a complex product.

This does appear to be the long and short of it.  This is one piece of a very complex ASIC that improves upon memory bandwidth utilization and power efficiency.  It is not the whole story, but it is an important part.  I find it interesting that NVIDIA did not disclose this change to editors with the introduction of Maxwell and Pascal, but if it is transparent to users and developers alike then there is no need.  There is a lot of “secret sauce” that goes into each architecture, and this is merely one aspect.  The one question that I do have is how much of the technology is based upon the Gigapixel IP that 3dfx bought at such a premium.  I believe that particular tiler was an immediate mode renderer as well, as it did not have as many driver and overhead issues as the PowerVR parts exhibited back in the day.  Obviously it would not be a copy/paste of the technology that was developed back in the 90s, but it would be interesting to see if it was the basis for this current implementation.