Manufacturer: Overclock.net

Yes, We're Writing About a Forum Post

Update - July 19th @ 7:15pm EDT: Well that was fast. Futuremark published their statement today. I haven't read it through yet, but there's no reason to wait to link it until I do.

Update 2 - July 20th @ 6:50pm EDT: We interviewed Jani Joki, Futuremark's Director of Engineering, on our YouTube page. The interview is embedded just below this update.

Original post below

The comments of a previous post notified us of an Overclock.net thread, whose author claims that 3DMark's implementation of asynchronous compute is designed to show NVIDIA in the best possible light. At the end of the linked post, they note that asynchronous compute is a general blanket term, and that we should better understand what is actually going on.

amd-mantle-queues.jpg

So, before we address the controversy, let's actually explain what asynchronous compute is. The main problem is that it actually is a broad term. Asynchronous compute could describe any optimization that allows tasks to execute when it is most convenient, rather than just blindly doing them in a row.

I will use JavaScript as a metaphor. In this language, you can assign tasks to be executed asynchronously by passing functions as parameters. This allows events to execute code when it is convenient. JavaScript, however, is still only single threaded (without Web Workers and newer technologies). It cannot run callbacks from multiple events simultaneously, even if you have an available core on your CPU. What it does, however, is allow the browser to manage its time better. Many events can be delayed until the browser renders the page, it performs other high-priority tasks, or until the asynchronous code has everything it needs, like assets that are loaded from the internet.

mozilla-architecture.jpg

This is asynchronous computing.
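To make the metaphor concrete, here is a minimal sketch. It uses Python's asyncio as a stand-in for JavaScript's event loop, which is an assumption made purely for illustration; the function name `load_asset` and the asset names are hypothetical. Three tasks are started together and each resumes when its simulated download finishes, yet every one of them runs on the same thread.

```python
import asyncio
import threading


async def load_asset(name: str, delay: float, log: list) -> None:
    # Simulate waiting on the network. While this task is suspended,
    # the event loop is free to run other callbacks.
    await asyncio.sleep(delay)
    # Record which thread resumed the task once its data "arrived".
    log.append((name, threading.get_ident()))


async def main() -> list:
    log: list = []
    # Start all three tasks at once. They are interleaved, not parallel.
    await asyncio.gather(
        load_asset("texture", 0.06, log),
        load_asset("script", 0.02, log),
        load_asset("font", 0.04, log),
    )
    return log


log = asyncio.run(main())
order = [name for name, _ in log]
threads = {thread_id for _, thread_id in log}
print("completion order:", order)
print("distinct threads used:", len(threads))
```

The tasks complete in order of their delays rather than the order in which they were started, and only one thread is ever involved. That is the time-management benefit described above, without any simultaneous execution.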

However, if JavaScript were designed differently, it would have been possible to run callbacks on any available thread, not just the main thread when available. Again, JavaScript is not designed in this way, but this is where I pull the analogy back into AMD's Asynchronous Compute Engines. In an ideal situation, a graphics driver will be able to see all the functionality that a task will require, and shove it down an at-work GPU, provided the specific resources that this task requires are not fully utilized by the existing work.

Read on to see how this is being implemented, and what the controversy is.

Manufacturer: NVIDIA

GP106 Specifications

Twelve days ago, NVIDIA announced its competitor to the AMD Radeon RX 480, the GeForce GTX 1060, based on a new Pascal GPU: GP106. Though that story was just a brief preview of the product, and a pictorial of the GTX 1060 Founders Edition card we were initially sent, it set the community ablaze with discussion around which mainstream enthusiast platform was going to be the best for gamers this summer.

Today we are allowed to show you our full review: benchmarks of the new GeForce GTX 1060 against the likes of the Radeon RX 480, the GTX 970 and GTX 980, and more. Starting at $250, the GTX 1060 has the potential to be the best bargain in the market today, though much of that will be decided based on product availability and our results on the following pages.

Does NVIDIA’s third consumer product based on Pascal make enough of an impact to dissuade gamers from buying into AMD Polaris?

01.jpg

All signs point to a bloody battle this July and August and the retail cards based on the GTX 1060 are making their way to our offices sooner than even those based around the RX 480. It is those cards, and not the reference/Founders Edition option, that will be the real competition that AMD has to go up against.

First, however, it’s important to find our baseline: where does the GeForce GTX 1060 find itself in the wide range of GPUs?

Continue reading our review of the GeForce GTX 1060 6GB graphics card!!

Manufacturer: Futuremark

Through the looking glass

Futuremark has been the most consistent and most utilized benchmark company for PCs for quite a long time. While other companies have faltered and faded, Futuremark continues to push forward with new benchmarks and capabilities in an attempt to maintain a modern way to compare performance across platforms with standardized tests.

Back in March of 2015, 3DMark added support for an API Overhead test to help gamers and editors understand the performance advantages of Mantle and DirectX 12 compared to existing APIs. Though the results were purely “peak theoretical” numbers, the data helped showcase to consumers and developers what low-level APIs brought to the table.

3dmark-time-spy-screenshot-2.jpg

Today Futuremark is releasing a new benchmark that focuses on DX12 gaming. No longer just a feature test, Time Spy is a fully baked benchmark with its own rendering engine and scenarios for evaluating the performance of graphics cards and platforms. It requires Windows 10 and a DX12-capable graphics card, and includes two different graphics tests and a CPU test. Oh, and of course, there is a stunningly gorgeous demo mode to go along with it.

I’m not going to spend much time here dissecting the benchmark itself, but it does make sense to have an idea of what kind of technologies are built into the game engine and tests. The engine is based purely on DX12, and integrates technologies like asynchronous compute, explicit multi-adapter and multi-threaded workloads. These are highly topical ideas and will be the focus of my testing today.

Futuremark provides an interesting diagram to demonstrate the advantages DX12 has over DX11. Below you will find a listing of the average number of vertices, triangles, patches and shader calls in 3DMark Fire Strike compared with 3DMark Time Spy.

daigram.png

It’s not even close here – the new Time Spy engine has more than a factor of 10 more processing calls for some of these items. As Futuremark states, however, this kind of capability isn’t free.

With DirectX 12, developers can significantly improve the multi-thread scaling and hardware utilization of their titles. But it requires a considerable amount of graphics expertise and memory-level programming skill. The programming investment is significant and must be considered from the start of a project.

Continue reading our look at 3DMark Time Spy Asynchronous Compute performance!!

Manufacturer: AMD

Radeon Software 16.7.1 Adjustments

Last week we posted a story that looked at a problem found with the new AMD Radeon RX 480 graphics card’s power consumption. The short version of the issue was that AMD’s new Polaris 10-based reference card was drawing more power than its stated 150 watt TDP and that it was drawing more power through the motherboard PCI Express slot than the connection was rated for. And sometimes that added power draw was significant, both at stock settings and overclocked. Seeing current draw over a connection rated at just 5.5A peaking over 7A at stock settings raised an alarm (validly) and our initial report detailed the problem very specifically.
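To put those current figures in terms of watts, here is a rough sketch of the arithmetic. It assumes both numbers refer to the slot's +12V supply and that the rail sits at exactly 12 V, which is an idealization; the function names are hypothetical.

```python
SLOT_12V_CURRENT_LIMIT_A = 5.5  # slot rating quoted above
RAIL_VOLTAGE_V = 12.0           # assumed nominal rail voltage


def rail_power_watts(current_a: float, voltage_v: float = RAIL_VOLTAGE_V) -> float:
    """Power delivered over a DC rail: P = V * I."""
    return voltage_v * current_a


def overdraw_percent(measured_a: float, limit_a: float = SLOT_12V_CURRENT_LIMIT_A) -> float:
    """How far a measured current sits above the rated limit, in percent."""
    return (measured_a / limit_a - 1.0) * 100.0


rated_w = rail_power_watts(SLOT_12V_CURRENT_LIMIT_A)
peak_w = rail_power_watts(7.0)  # the 7A peak mentioned above
excess_pct = overdraw_percent(7.0)

print(f"rated: {rated_w:.0f} W, peak: {peak_w:.0f} W, over by {excess_pct:.1f}%")
```

Under those assumptions, a 7A peak works out to roughly 84 watts against a 66 watt rating, or about 27% over.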

AMD responded initially that “everything was fine here” but the company eventually saw the writing on the wall and started to work on potential solutions. The Radeon RX 480 is a very important product for the future of Radeon graphics and this was a launch that needed to be as perfect as it could be. Though the risk to users’ hardware with the higher than expected current draw is muted somewhat by motherboard-based over-current protection, it’s crazy to think that AMD actually believed that was the ideal scenario. Depending on the “circuit breaker” in any system to save you when standards exist for exactly that purpose is nuts.

powertesting.jpg

Today AMD has released a new driver, version 16.7.1, that actually introduces a pair of fixes for the problem. One of them is hard coded into the software and adjusts power draw from the different +12V sources (PCI Express slot and 6-pin connector) while the other is an optional flag in the software that is disabled by default.

Reconfiguring the power phase controller

The Radeon RX 480 uses a very common power controller (IR3567B) on its PCB to cycle through the 6 power phases providing electricity to the GPU itself. Allyn did some simple multimeter trace work to tell us which phases were connected to which sources and the result is seen below.

rx480-phases.jpg

The power controller is responsible for pacing the power coming in from the PCI Express slot and the 6-pin power connection to the GPU, in phases. Phases 1-3 come in from the power supply via the 6-pin connection, while phases 4-6 source power from the motherboard directly. At launch, the RX 480 drew nearly identical amounts of power from both the PEG slot and the 6-pin connection, essentially giving each of the 6 phases at work equal time.

That might seem okay, but it’s far from the standard of what we have seen in the past. In no other case have we measured a graphics card drawing equal power from the PEG slot as from an external power connector on the card. (Obviously for cards without external power connections, that’s a different discussion.) In general, with other AMD and NVIDIA based graphics cards, the motherboard slot would provide no more than 50-60 watts of power, while any above that would come from the 6/8-pin connections on the card. In many cases I saw that power draw through the PEG slot was as low as 20-30 watts if the external power connections provided a lot of overage for the target TDP of the product.

Continue reading our analysis of the new AMD 16.7.1 driver that fixed RX 480 power concerns!!

Manufacturer: NVIDIA

GP106 Preview

It’s probably not going to come as a surprise to anyone that reads the internet, but NVIDIA is officially taking the covers off its latest GeForce card in the Pascal family today, the GeForce GTX 1060. As the number scheme would suggest, this is a more budget-friendly version of NVIDIA’s latest architecture, lowering performance in line with expectations. The GP106-based GPU will still offer impressive specifications and capabilities and will probably push AMD’s new Radeon RX 480 to its limits.

01.jpg

Let’s take a quick look at the card’s details.

| | GTX 1060 | RX 480 | R9 390 | R9 380 | GTX 980 | GTX 970 | GTX 960 | R9 Nano | GTX 1070 |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GP106 | Polaris 10 | Grenada | Tonga | GM204 | GM204 | GM206 | Fiji XT | GP104 |
| GPU Cores | 1280 | 2304 | 2560 | 1792 | 2048 | 1664 | 1024 | 4096 | 1920 |
| Rated Clock | 1506 MHz | 1120 MHz | 1000 MHz | 970 MHz | 1126 MHz | 1050 MHz | 1126 MHz | up to 1000 MHz | 1506 MHz |
| Texture Units | 80 (?) | 144 | 160 | 112 | 128 | 104 | 64 | 256 | 120 |
| ROP Units | 48 (?) | 32 | 64 | 32 | 64 | 56 | 32 | 64 | 64 |
| Memory | 6GB | 4GB / 8GB | 8GB | 4GB | 4GB | 4GB | 2GB | 4GB | 8GB |
| Memory Clock | 8000 MHz | 7000 MHz / 8000 MHz | 6000 MHz | 5700 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 8000 MHz |
| Memory Interface | 192-bit | 256-bit | 512-bit | 256-bit | 256-bit | 256-bit | 128-bit | 4096-bit (HBM) | 256-bit |
| Memory Bandwidth | 192 GB/s | 224 GB/s / 256 GB/s | 384 GB/s | 182.4 GB/s | 224 GB/s | 196 GB/s | 112 GB/s | 512 GB/s | 256 GB/s |
| TDP | 120 watts | 150 watts | 275 watts | 190 watts | 165 watts | 145 watts | 120 watts | 275 watts | 150 watts |
| Peak Compute | 3.85 TFLOPS | 5.1 TFLOPS | 5.1 TFLOPS | 3.48 TFLOPS | 4.61 TFLOPS | 3.4 TFLOPS | 2.3 TFLOPS | 8.19 TFLOPS | 5.7 TFLOPS |
| Transistor Count | ? | 5.7B | 6.2B | 5.0B | 5.2B | 5.2B | 2.94B | 8.9B | 7.2B |
| Process Tech | 16nm | 14nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 16nm |
| MSRP (current) | $249 | $199 | $299 | $199 | $379 | $329 | $279 | $499 | $379 |

The GeForce GTX 1060 will sport 1280 CUDA cores with a GPU Boost clock speed rated at 1.7 GHz. The card will be available only in 6GB varieties, and the reference / Founders Edition will ship with 6GB of GDDR5 memory running at 8.0 GHz / 8 Gbps. With 1280 CUDA cores, the GP106 GPU is essentially one half of a GP104 in terms of compute capability. NVIDIA decided not to cut the memory interface in half though, instead going with a 192-bit design compared to the GP104 and its 256-bit option.

The rated GPU clock speeds paint an interesting picture for peak performance of the new card. At the rated boost clock speed, the GeForce GTX 1070 produces 6.46 TFLOPS of performance. The GTX 1060 by comparison will hit 4.35 TFLOPS, a 48% difference. The GTX 1080 offers nearly the same delta of performance above the GTX 1070; clearly NVIDIA has settled on a scale for separating its Pascal products.
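These peak numbers come from simple arithmetic, sketched below. The sketch assumes each CUDA core retires one fused multiply-add (two floating point operations) per clock, and it assumes a 1683 MHz boost clock for the GTX 1070, a figure that is not stated in this article. The function name is hypothetical.

```python
def peak_tflops(cores: int, clock_mhz: float, flops_per_core_per_clock: int = 2) -> float:
    """Theoretical single-precision peak: cores * FLOPs per clock * clock rate."""
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


gtx1060_boost = peak_tflops(1280, 1700)  # 1.7 GHz boost clock from the text
gtx1060_base = peak_tflops(1280, 1506)   # base clock from the spec table
gtx1070_boost = peak_tflops(1920, 1683)  # assumed boost clock, see note above
gtx980_base = peak_tflops(2048, 1126)    # base clock from the spec table

gap_percent = (gtx1070_boost / gtx1060_boost - 1.0) * 100.0

print(f"GTX 1060 boost: {gtx1060_boost:.2f} TFLOPS")
print(f"GTX 1070 boost: {gtx1070_boost:.2f} TFLOPS")
print(f"GTX 1070 advantage: {gap_percent:.1f}%")
```

Under those assumptions the results line up with the 4.35 and 6.46 TFLOPS figures quoted above, a gap of roughly 48%, and with the 3.85 and 4.61 TFLOPS base-clock entries in the table.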

NVIDIA wants us to compare the new GeForce GTX 1060 to the GeForce GTX 980 in gaming performance, but the peak theoretical performance results don’t really match up. The GeForce GTX 980 is rated at 4.61 TFLOPS at BASE clock speed, while the GTX 1060 doesn’t hit that number at its Boost clock. Pascal does improve on performance with memory compression advancements, but the 192-bit memory bus is only able to run at 192 GB/s, compared to the 224 GB/s of the GTX 980. We’ll have to wait for performance results from our own testing to be sure, but it seems possible that NVIDIA’s performance claims might depend on technology like Simultaneous Multi-Projection and VR gaming to be validated.
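The bandwidth figures follow directly from bus width and per-pin data rate. Here is a small sketch of that calculation; the function name is hypothetical, and the per-pin rates are taken from the memory clock row of the table (8000 MHz effective is treated as 8 Gbps, 7000 MHz as 7 Gbps).

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: every pin moves data_rate_gbps gigabits per second."""
    return bus_width_bits * data_rate_gbps / 8  # 8 bits per byte


gtx1060_bw = memory_bandwidth_gbs(192, 8.0)  # 192-bit bus at 8 Gbps
gtx980_bw = memory_bandwidth_gbs(256, 7.0)   # 256-bit bus at 7 Gbps

print(f"GTX 1060: {gtx1060_bw:.0f} GB/s, GTX 980: {gtx980_bw:.0f} GB/s")
```

This reproduces the 192 GB/s and 224 GB/s values, which is why a faster memory clock on the GTX 1060 still cannot make up for the narrower bus.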

Continue reading our preview of the new NVIDIA GeForce GTX 1060!!

Manufacturer: AMD

Too much power to the people?

UPDATE (7/1/16): I have added a third page to this story that looks at the power consumption and power draw of the ASUS GeForce GTX 960 Strix card. This card was pointed out by many readers on our site and on reddit as having the same problem as the Radeon RX 480. As it turns out...not so much. Check it out!

UPDATE 2 (7/2/16): We have an official statement from AMD this morning.

As you know, we continuously tune our GPUs in order to maximize their performance within their given power envelopes and the speed of the memory interface, which in this case is an unprecedented 8Gbps for GDDR5. Recently, we identified select scenarios where the tuning of some RX 480 boards was not optimal. Fortunately, we can adjust the GPU's tuning via software in order to resolve this issue. We are already testing a driver that implements a fix, and we will provide an update to the community on our progress on Tuesday (July 5, 2016).

Honestly, that doesn't tell us much. And AMD appears to be deflecting slightly by using words like "some RX 480 boards". I don't believe this is limited to a subset of cards, or review samples only. AMD does indicate that the 8 Gbps memory on the 8GB variant might be partially to blame - which is an interesting correlation to test out later. The company does promise a fix for the problem via a driver update on Tuesday - we'll be sure to give that a test and see what changes are measured in both performance and in power consumption.

The launch of the AMD Radeon RX 480 has generally been considered a success. Our review of the new reference card shows impressive gains in architectural efficiency, improved positioning against NVIDIA’s competing parts in the same price range as well as VR-ready gaming performance starting at $199 for the 4GB model. AMD has every right to be proud of the new product and should have this lone position until the GeForce product line brings a Pascal card down into the same price category.

If you read carefully through my review, there was some interesting data that cropped up around the power consumption and delivery on the new RX 480. Looking at our power consumption numbers, measured directly from the card, not from the wall, it was using slightly more than the 150 watt TDP it was advertised as. This was done at 1920x1080 and tested in both Rise of the Tomb Raider and The Witcher 3.

When overclocked, the results were even higher, approaching the 200 watt mark in Rise of the Tomb Raider!

A portion of the review over at Tom’s Hardware produced similar results but detailed the power consumption from the motherboard PCI Express connection versus the power provided by the 6-pin PCIe power cable. There has been a considerable amount of discussion in the community about the amount of power the RX 480 draws through the motherboard, whether it is out of spec and what kind of impact it might have on the stability or life of the PC the RX 480 is installed in.

As it turns out, we have the ability to measure the exact same kind of data, albeit through a different method than Tom’s, and wanted to see if the result we saw broke down in the same way.

Our Testing Methods

This is a complex topic so it makes sense to detail the methodology of our advanced power testing capability up front.

How do we do it? The approach is simple in theory but surprisingly difficult in practice: we intercept the power being sent through the PCI Express bus as well as the ATX power connectors before it reaches the graphics card, and directly measure power draw with a 10 kHz DAQ (data acquisition) device. A huge thanks goes to Allyn for getting the setup up and running. We built a PCI Express bridge that is tapped to measure both 12V and 3.3V power and built some Corsair power cables that measure the 12V coming through those as well.

The result is data that looks like this.

powertesting.jpg

What you are looking at here is the power measured from the GTX 1080. From time 0 to time 8 seconds or so, the system is idle; from 8 seconds to about 18 seconds Steam is starting up the title. From 18-26 seconds the game is at the menus, we load the game from 26-39 seconds, and then we play through our benchmark run after that.

There are four lines drawn in the graph: the 12V and 3.3V results are from the PCI Express bus interface, while the one labeled PCIE is from the PCIE power connection from the power supply to the card. We have the ability to measure two power inputs there but because the GTX 1080 only uses a single 8-pin connector, there is only one shown here. Finally, the blue line is labeled total and is simply that: a total of the other measurements to get combined power draw and usage by the graphics card in question.

From this we can see a couple of interesting data points. First, the idle power of the GTX 1080 Founders Edition is only about 7.5 watts. Second, under a gaming load of Rise of the Tomb Raider, the card is pulling about 165-170 watts on average, though there are plenty of intermittent spikes. Keep in mind we are sampling the power at 1000/s so this kind of behavior is more or less expected.

Different games and applications impose different loads on the GPU and can cause it to draw drastically different power. Even if a game runs slowly, it may not be drawing maximum power from the card if a certain system on the GPU (memory, shaders, ROPs) is bottlenecking other systems.

One interesting note on our data compared to what Tom’s Hardware presents: we are using a second order low pass filter to smooth out the data to make it more readable and more indicative of how power draw is handled by the components on the PCB. Tom’s story reported “maximum” power draw at 300 watts for the RX 480 and while that is technically accurate, those figures represent instantaneous power draw. That is interesting data in some circumstances, and may actually indicate other potential issues with excessively noisy power circuitry, but to us, it makes more sense to sample data at a high rate (10 kHz) but to filter it and present it in a more readable way that better meshes with the continuous power delivery capabilities of the system.
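For readers curious what that kind of filtering does to a trace, here is a minimal sketch. It is not our actual filter: the design (two cascaded first-order stages, which together form a second-order filter), the 100 Hz cutoff, and the synthetic trace of a steady 150 watts with single-sample 300 watt spikes are all assumptions chosen for illustration.

```python
import math


def first_order_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Single-pole IIR low-pass (discrete RC filter)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered = []
    y = samples[0]  # start at the first sample to avoid a startup transient
    for x in samples:
        y += alpha * (x - y)
        filtered.append(y)
    return filtered


def second_order_lowpass(samples, cutoff_hz, sample_rate_hz):
    """Two identical first-order stages in series give a second-order response."""
    stage_one = first_order_lowpass(samples, cutoff_hz, sample_rate_hz)
    return first_order_lowpass(stage_one, cutoff_hz, sample_rate_hz)


# Synthetic trace sampled at 10 kHz: steady 150 W with a one-sample
# 300 W spike every 100 samples (hypothetical data, not a measurement).
raw = [300.0 if i % 100 == 50 else 150.0 for i in range(2000)]
smooth = second_order_lowpass(raw, cutoff_hz=100.0, sample_rate_hz=10_000.0)

print(f"raw peak: {max(raw):.1f} W, filtered peak: {max(smooth):.1f} W")
print(f"raw mean: {sum(raw) / len(raw):.2f} W, filtered mean: {sum(smooth) / len(smooth):.2f} W")
```

The filtered trace keeps nearly the same average power while the instantaneous spikes shrink to a few watts above the baseline, which is the distinction between peak and continuous draw.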

1300.DSC_0233.jpg

Image source: E2E Texas Instruments

An example of instantaneous voltage spikes on power supply phase changes

Some gamers have expressed concern over that “maximum” power draw of 300 watts on the RX 480 that Tom’s Hardware reported. While that power measurement is technically accurate, it doesn’t represent the continuous power draw of the hardware. Instead, that measure is a result of a high frequency data acquisition system that may take a reading at the exact moment that a power phase on the card switches. Any DC switching power supply that is riding close to a certain power level is going to exceed that on the leading edges of phase switches for some minute amount of time. This is another reason why our low pass filter on power data can help represent real-world power consumption accurately. That doesn’t mean the spikes they measure are not a potential cause for concern; that’s just not what we are focused on with our testing.

Continue reading our analysis of the power consumption concerns surrounding the Radeon RX 480!!

Manufacturer: AMD

Polaris 10 Specifications

It would be hard at this point to NOT know about the Radeon RX 480 graphics card. AMD and the Radeon Technologies Group have been talking publicly about the Polaris architecture since December of 2015 with lofty ambitions. In the precarious position in which the company rests, being well behind in market share and struggling to compete with the dominant player in the market (NVIDIA), the team was willing to sacrifice sales of current generation parts (300-series) in order to excite the user base for the upcoming move to Polaris. It is a risky bet and one that will play out over the next few months in the market.

amdpretty1.jpg

Since then AMD has continued to release bits of information at a time. First there were details on the new display support, then information about the 14nm process technology advantages. We then saw demos of working silicon at CES with targeted form factors, and then at events in Macau AMD showed press the full details and architecture. At Computex they announced rough performance metrics and a price point. Finally, at E3, AMD discussed the RX 460 and RX 470 cousins and the release date of…today. It’s been quite a whirlwind.

Today the rubber meets the road: is the Radeon RX 480 the groundbreaking and stunning graphics card that we have been promised? Or does it struggle again to keep up with the behemoth that is NVIDIA’s GeForce product line? AMD’s marketing team would have you believe that the RX 480 is the start of some kind of graphics revolution – but will the coup be successful?

Join us for our second major graphics architecture release of the summer and learn for yourself if the Radeon RX 480 is your next GPU.

Continue reading our review of the AMD Radeon RX 480 8GB Graphics Card!!

Manufacturer: AMD

AMD gets aggressive

At its Computex 2016 press conference in Taipei today, AMD has announced the branding and pricing, along with basic specifications, for one of its upcoming Polaris GPUs shipping later this June. The Radeon RX 480, based on Polaris 10, will cost just $199 and will offer more than 5 TFLOPS of compute capability. This is an incredibly aggressive move obviously aimed at continuing to gain market share at NVIDIA's expense. Details of the product are listed below.

| | RX 480 | GTX 1070 | GTX 980 | GTX 970 | R9 Fury | R9 Nano | R9 390X | R9 390 |
|---|---|---|---|---|---|---|---|---|
| GPU | Polaris 10 | GP104 | GM204 | GM204 | Fiji Pro | Fiji XT | Hawaii XT | Grenada Pro |
| GPU Cores | 2304 | 1920 | 2048 | 1664 | 3584 | 4096 | 2816 | 2560 |
| Rated Clock | ? | 1506 MHz | 1126 MHz | 1050 MHz | 1000 MHz | up to 1000 MHz | 1050 MHz | 1000 MHz |
| Texture Units | ? | 120 | 128 | 104 | 224 | 256 | 176 | 160 |
| ROP Units | ? | 64 | 64 | 56 | 64 | 64 | 64 | 64 |
| Memory | 4/8GB | 8GB | 4GB | 4GB | 4GB | 4GB | 8GB | 8GB |
| Memory Clock | 8000 MHz | 8000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz | 6000 MHz | 6000 MHz |
| Memory Interface | 256-bit | 256-bit | 256-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) | 512-bit | 512-bit |
| Memory Bandwidth | 256 GB/s | 256 GB/s | 224 GB/s | 196 GB/s | 512 GB/s | 512 GB/s | 384 GB/s | 384 GB/s |
| TDP | 150 watts | 150 watts | 165 watts | 145 watts | 275 watts | 175 watts | 275 watts | 230 watts |
| Peak Compute | > 5.0 TFLOPS | 5.7 TFLOPS | 4.61 TFLOPS | 3.4 TFLOPS | 7.20 TFLOPS | 8.19 TFLOPS | 5.63 TFLOPS | 5.12 TFLOPS |
| Transistor Count | ? | 7.2B | 5.2B | 5.2B | 8.9B | 8.9B | 6.2B | 6.2B |
| Process Tech | 14nm | 16nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | $199 | $379 | $499 | $329 | $549 | $499 | $389 | $329 |

The RX 480 will ship with 36 CUs totaling 2304 stream processors based on the current GCN breakdown of 64 stream processors per CU. AMD didn't list clock speeds and instead is only telling us that the performance offered will exceed 5 TFLOPS of compute; how much is still a mystery and will likely change based on final clocks.
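Even without an official clock speed, the numbers AMD did share let us bound it. The sketch below assumes each stream processor performs one fused multiply-add (two floating point operations) per clock, which is how GCN peak compute is normally rated; the function names are hypothetical.

```python
STREAM_PROCESSORS_PER_CU = 64  # current GCN breakdown cited above


def stream_processors(compute_units: int) -> int:
    """Total stream processors for a given number of GCN compute units."""
    return compute_units * STREAM_PROCESSORS_PER_CU


def min_clock_mhz(target_tflops: float, shaders: int, flops_per_clock: int = 2) -> float:
    """Lowest clock (MHz) at which the given shader count reaches the target."""
    return target_tflops * 1e12 / (shaders * flops_per_clock) / 1e6


shaders = stream_processors(36)
clock_floor_mhz = min_clock_mhz(5.0, shaders)

print(f"{shaders} stream processors need at least {clock_floor_mhz:.0f} MHz for 5 TFLOPS")
```

Under that assumption, anything above roughly 1085 MHz would clear the 5 TFLOPS mark, so the final clock is presumably at or beyond that point.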

9310_ellesmere_cam1_02_0010_4K.jpg

The memory system is powered by a 256-bit GDDR5 memory controller running at 8 Gbps and hitting 256 GB/s of throughput. This is the same resulting memory bandwidth as NVIDIA's new GeForce GTX 1070 graphics card.

AMD also tells us that the TDP of the card is 150 watts, again matching the GTX 1070, though without more accurate performance data it's hard to assume anything about the new architectural efficiency of the Polaris GPUs built on the 14nm Global Foundries process.

Obviously the card will support FreeSync and all of AMD's VR features, in addition to being DP 1.3 and 1.4 ready. 

AMD stated that the RX 480 will launch on June 29th.

9310_ellesmere_cam4_02_0010_4K.jpg

I know that many of you will want us to start guessing at what performance level the new RX 480 will actually fall, and trust me, I've been trying to figure it out. Based on TFLOPS rating and memory bandwidth alone, it seems possible that the RX 480 could compete with the GTX 1070. But if that were the case, I don't think even AMD is crazy enough to set the price this far below where the GTX 1070 launched, $379. 

9310_ellesmere_cam3_02_0010_4K.jpg

I would expect the configuration of the GCN architecture to remain mostly unchanged on Polaris, compared to Hawaii, for the same reasons that we saw NVIDIA leave Pascal's basic compute architecture unchanged compared to Maxwell. Moving to the new process node was the primary goal and adding to that with drastic shifts in compute design might overly complicate product development.

9310_ellesmere_cam2_02_0010_4K.jpg

In the past, we have observed that AMD's GCN architecture tends to operate slightly less efficiently in terms of rated maximum compute capability versus realized gaming performance, at least compared to Maxwell and now Pascal. With that in mind, the >5 TFLOPS offered by the RX 480 likely lies somewhere between the Radeon R9 390 and R9 390X in realized gaming output. If that is the case, the Radeon RX 480 should have performance somewhere between the GeForce GTX 970 and the GeForce GTX 980. 

polaris-15 (1).jpg

AMD claims that the RX 480 at $199 is set to offer a "premium VR experience" that has previously been limited to $500 graphics cards (another reference to the original price of the GTX 980 perhaps...). AMD claims this should have a dramatic impact on increasing the TAM (total addressable market) for VR.

In a notable market survey, price was a leading barrier to adoption of VR. The $199 SEP for select Radeon™ RX Series GPUs is an integral part of AMD’s strategy to dramatically accelerate VR adoption and unleash the VR software ecosystem. AMD expects that its aggressive pricing will jumpstart the growth of the addressable market for PC VR and accelerate the rate at which VR headsets drop in price:

  • More affordable VR-ready desktops and notebooks
  • Making VR accessible to consumers in retail
  • Unleashing VR developers on a larger audience
  • Reducing the cost of entry to VR

AMD calls this approach of starting with the mid-range product its "Water Drop" strategy, which is aimed "at releasing new graphics architectures in high volume segments first to support continued market share growth for Radeon GPUs."

So what do you guys think? Are you impressed with what Polaris looks like it's going to be now?

Manufacturer: NVIDIA

GP104 Strikes Again

It’s only been three weeks since NVIDIA unveiled the GeForce GTX 1080 and GTX 1070 graphics cards at a live streaming event in Austin, TX. But it feels like those two GPUs, one of which hasn't even been reviewed until today, have already drastically shifted the landscape of graphics, VR and PC gaming.

nvidia1.jpg

Half of the “new GPU” stories are told, with AMD due to follow up soon with Polaris, but it was clear to anyone watching the enthusiast segment with a hint of history that a line was drawn in the sand that day. There is THEN, and there is NOW. Today’s detailed review of the GeForce GTX 1070 completes NVIDIA’s first wave of NOW products, following closely behind the GeForce GTX 1080.

Interestingly, and in a move that is very uncharacteristic of NVIDIA, detailed specifications of the GeForce GTX 1070 were released on GeForce.com well before today’s reviews. With information on the CUDA core count, clock speeds, and memory bandwidth it was possible to get a solid sense of where the GTX 1070 would perform; and I imagine that many of you already did the napkin math to figure that out. There is no more guessing though - reviews and testing are all done, and I think you'll find that the GTX 1070 is as exciting as, if not more exciting than, the GTX 1080 due to the performance and pricing combination that it provides.

Let’s dive in.

Continue reading our review of the GeForce GTX 1070 8GB Founders Edition!!

Manufacturer: NVIDIA

First, Some Background

 
TL;DR:
NVIDIA's Rumored GP102
 
Based on two rumors, NVIDIA seems to be planning a new GPU, called GP102, that sits between GP100 and GP104. This changes how their product stack has flowed since Fermi and Kepler. GP102's performance, both single-precision and double-precision, will likely signal NVIDIA's product plans going forward.
  • GP100's ideal 1 : 2 : 4 FP64 : FP32 : FP16 ratio is inefficient for gaming
  • GP102 either extends GP104's gaming lead or bridges GP104 and GP100
  • If GP102 is a bigger GP104, the future is unclear for smaller GPGPU devs
    • This is, unless GP100 can be significantly up-clocked for gaming.
  • If GP102 matches (or outperforms) GP100 in gaming, and has better than 1 : 32 double-precision performance, then GP100 would be the first time that NVIDIA designed an enterprise-only, high-end GPU.
 

 

When GP100 was announced, Josh and I were discussing, internally, how it would make sense in the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a dash of salt, who claimed that NVIDIA was planning a second architecture, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claimed to have been contacted by the developers of AIDA64, seemingly on-the-record, also citing a GP102 design.

I will retell chunks of the rumor, but also add my opinion to it.

nvidia-titan-black-1.jpg

In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive performance for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these was available in Tesla, Quadro, and GeForce cards, especially Titans.

Maxwell was interesting, though. NVIDIA was unable to leave 28nm, which Kepler launched on, so they created a second architecture at that node. To increase performance without having access to more feature density, you need to make your designs bigger, more optimized, or simpler. GM200 was giant and optimized, but, to get the performance levels it achieved, also needed to be simpler. Something needed to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.

Fast-forward to Pascal.

Manufacturer: NVIDIA

A new architecture with GP104


The summer of change for GPUs has begun with today’s review of the GeForce GTX 1080. NVIDIA has endured leaks, speculation and criticism for months now, with enthusiasts calling out NVIDIA for not including HBM technology or for not having asynchronous compute capability. Last week NVIDIA’s CEO Jen-Hsun Huang went on stage and officially announced the GTX 1080 and GTX 1070 graphics cards with a healthy amount of information about their supposed performance and price points. Issues around cost and what exactly a Founders Edition is aside, the event was well received and clearly showed a performance and efficiency improvement that we were not expecting.

DSC00209.jpg

The question is, does the actual product live up to the hype? Can NVIDIA overcome some users’ negative view of the Founders Edition to create a product message that will get the wide range of PC gamers looking for an upgrade path an option they’ll take?

I’ll let you know through the course of this review, but what I can tell you definitively is that the GeForce GTX 1080 clearly sits alone at the top of the GPU world.

Continue reading our review of the GeForce GTX 1080 Founders Edition!!

Manufacturer: NVIDIA

An Overview

 
TL;DR:
NVIDIA's Ansel Technology
 
Ansel is a utility that expands the concept of screenshots along the direction of photography. When fully enabled, it allows the user to capture still images with HDR exposures, gigapixel levels of resolution, 360-degree views for VR, 3D stereo projection, and post-processing filters, all from either the game's view, or from a free-roaming camera (if available). While it must be implemented by the game developer, mostly to prevent the user from either cheating or seeing hidden parts of the world, such as an inventory or minimap rendering room, NVIDIA claims that it is a tiny burden.
  • NVIDIA blog claims "GTX 600-series and up"
  • UI/UX is NVIDIA controlled
    • Allows NVIDIA to provide a consistent UI across all supported games
    • Game developers don't need to spend UX and QA effort on their own
  • Can signal the game to use its highest-quality assets during the shot
  • NVIDIA will provide an API for users to create their own post-process shader
    • Will allow access to Color, Normal, Depth, Geometry, (etc.) buffers
  • When asked about implementing Ansel with ShadowPlay: "Stay tuned."

“In-game photography” is an interesting concept. Not too long ago, it was difficult to just capture the user's direct experience with a title. Print screen could only hold a single screenshot at a time, which allowed Steam and FRAPS to provide a better user experience. FRAPS also made video more accessible to the end-user, but it output huge files and, while it wasn't too expensive, it needed to be purchased online, which was a big issue ten-or-so years ago.

shadowplay-vs.jpg

Seeing that their audience would enjoy video captures, NVIDIA introduced ShadowPlay a couple of years ago. The feature allowed users not only to record video, but also to capture the last few minutes of gameplay. It did this with hardware acceleration, and it did this for free (for compatible GPUs). While I don't use ShadowPlay, preferring the control of OBS, it's a good example of how NVIDIA wants to support their users. They see these features as a value-add, which draws people to their hardware.

Read on to learn more about NVIDIA Ansel

Author:
Manufacturer: AMD

History and Specifications

The Radeon Pro Duo had an interesting history. Originally shown as an unbranded, dual-GPU PCB during E3 2015, which took place last June, AMD touted it as the ultimate graphics card for both gamers and professionals. At that time, the company thought that an October launch was feasible, but that clearly didn’t work out. When pressed for information in the Oct/Nov timeframe, AMD said that they had delayed the product into Q2 2016 to better correlate with the launch of the VR systems from Oculus and HTC/Valve.

During a GDC press event in March, AMD finally unveiled the Radeon Pro Duo brand, but they were also walking back on the idea of the dual-Fiji beast being aimed at the gaming crowd, even partially. Instead, the company talked up the benefits for game developers and content creators, such as its 8192 stream processors for offline rendering, or even to aid game devs in the implementation and improvement of multi-GPU for upcoming games.

05.jpg

Anyone that pays attention to the graphics card market can see why AMD would make the positional shift with the Radeon Pro Duo. The Fiji architecture is on the way out, with Polaris due out in June by AMD’s own proclamation. At $1500, the Radeon Pro Duo will be a stark contrast to the prices of the Polaris GPUs this summer, and it is priced well above any NVIDIA part in the GeForce line. And, though CrossFire has made drastic improvements over the last several years thanks to new testing techniques, the ecosystem for multi-GPU is going through a major shift with both DX12 and VR bearing down on it.

So yes, the Radeon Pro Duo has both RADEON and PRO right there in the name. What’s a respectable PC Perspective graphics reviewer supposed to do with a card like that if it finds its way into your office? Test it of course! I’ll take a look at a handful of recent games as well as a new feature that AMD has integrated with 3DS Max called FireRender to showcase some of the professional chops of the new card.

Continue reading our review of the AMD Radeon Pro Duo!!

Author:
Manufacturer: AMD

The Dual-Fiji Card Finally Arrives

This weekend, leaks of information on both WCCFTech and VideoCardz.com have revealed all the information about the pending release of AMD’s dual-GPU giant, the Radeon Pro Duo. While no one at PC Perspective has been briefed on the product officially, all of the interesting data surrounding the product is clearly outlined in the slides on those websites, minus some independent benchmark testing that we are hoping to get to next week. Based on the report from both sites, the Radeon Pro Duo will be released on April 26th.

AMD actually revealed the product and branding for the Radeon Pro Duo back in March, during its live streamed Capsaicin event surrounding GDC. At that point we were given the following information:

  • Dual Fiji XT GPUs
  • 8GB of total HBM memory
  • 4x DisplayPort (this has since been modified)
  • 16 TFLOPS of compute
  • $1499 price tag

The design of the card follows the same industrial design as the reference designs of the Radeon Fury X, and integrates a dual-pump cooler and external fan/radiator to keep both GPUs running cool.

01-official.jpg

Based on the slides leaked out today, AMD has revised the Radeon Pro Duo design to include a set of three DisplayPort connections and one HDMI port. This was a necessary change as the Oculus Rift requires an HDMI port to work; only the HTC Vive has built in support for a DisplayPort connection and even in that case you would need a full-size to mini-DisplayPort cable.

The 8GB of HBM (high bandwidth memory) on the card is split between the two Fiji XT GPUs on the card, just like other multi-GPU options on the market. The 350 watts power draw mark is exceptionally high, exceeded only by AMD’s previous dual-GPU beast, the Radeon 295X2 that used 500+ watts and the NVIDIA GeForce GTX Titan Z that draws 375 watts!

02-official.jpg

Here is the specification breakdown of the Radeon Pro Duo. The card has 8192 total stream processors and 128 Compute Units, split evenly between the two GPUs. You are getting two full Fiji XT GPUs in this card, an impressive feat made possible in part by the use of High Bandwidth Memory and its smaller physical footprint.

| | Radeon Pro Duo | R9 Nano | R9 Fury | R9 Fury X | GTX 980 Ti | TITAN X | GTX 980 | R9 290X |
|---|---|---|---|---|---|---|---|---|
| GPU | Fiji XT x 2 | Fiji XT | Fiji Pro | Fiji XT | GM200 | GM200 | GM204 | Hawaii XT |
| GPU Cores | 8192 | 4096 | 3584 | 4096 | 2816 | 3072 | 2048 | 2816 |
| Rated Clock | up to 1000 MHz | up to 1000 MHz | 1000 MHz | 1050 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1000 MHz |
| Texture Units | 512 | 256 | 224 | 256 | 176 | 192 | 128 | 176 |
| ROP Units | 128 | 64 | 64 | 64 | 96 | 96 | 64 | 64 |
| Memory | 8GB (4GB x 2) | 4GB | 4GB | 4GB | 6GB | 12GB | 4GB | 4GB |
| Memory Clock | 500 MHz | 500 MHz | 500 MHz | 500 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 5000 MHz |
| Memory Interface | 4096-bit (HBM) x 2 | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 384-bit | 384-bit | 256-bit | 512-bit |
| Memory Bandwidth | 1024 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 320 GB/s |
| TDP | 350 watts | 175 watts | 275 watts | 275 watts | 250 watts | 250 watts | 165 watts | 290 watts |
| Peak Compute | 16.38 TFLOPS | 8.19 TFLOPS | 7.20 TFLOPS | 8.60 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 5.63 TFLOPS |
| Transistor Count | 8.9B x 2 | 8.9B | 8.9B | 8.9B | 8.0B | 8.0B | 5.2B | 6.2B |
| Process Tech | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | $1499 | $499 | $549 | $649 | $649 | $999 | $499 | $329 |

The Radeon Pro Duo has a rated clock speed of up to 1000 MHz. That’s the same clock speed as the R9 Fury and the rated “up to” frequency on the R9 Nano. It’s worth noting that we did see a handful of instances where the R9 Nano’s power limiting capability resulted in some extremely variable clock speeds in practice. AMD recently added a feature to its Crimson driver to disable power metering on the Nano, at the expense of more power draw, and I would assume the same option would work for the Pro Duo.
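The Peak Compute and Memory Bandwidth rows in the table above can be sanity-checked from the other rows. The sketch below is a rough back-of-the-envelope check, not AMD's published method. It assumes each stream processor retires one fused multiply-add (2 FLOPs) per clock and that HBM transfers data twice per memory clock; the function names are hypothetical.

```python
# Back-of-the-envelope check of the spec table (assumptions, not vendor formulas):
#   - 2 FLOPs per core per clock (one fused multiply-add)
#   - HBM moves data twice per memory clock (double data rate)

def peak_tflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical single-precision throughput in TFLOPS."""
    return cores * flops_per_clock * clock_mhz * 1e6 / 1e12


def hbm_bandwidth_gbs(bus_bits: int, mem_clock_mhz: float,
                      transfers_per_clock: int = 2) -> float:
    """Theoretical memory bandwidth in GB/s for one GPU."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mem_clock_mhz * 1e6 * transfers_per_clock / 1e9


# Radeon Pro Duo: 8192 stream processors at "up to" 1000 MHz across both GPUs.
pro_duo_tflops = peak_tflops(8192, 1000)

# One Fiji XT GPU: 4096-bit HBM interface at 500 MHz; the card has two of them.
per_gpu_bandwidth = hbm_bandwidth_gbs(4096, 500)
card_bandwidth = 2 * per_gpu_bandwidth

print(f"Pro Duo peak compute: {pro_duo_tflops:.2f} TFLOPS")
print(f"Bandwidth per GPU: {per_gpu_bandwidth:.0f} GB/s, card total: {card_bandwidth:.0f} GB/s")
```

Under those assumptions the Pro Duo figures line up with the table: about 16.38 TFLOPS, and 512 GB/s per GPU or 1024 GB/s combined. Because the rated clock is an "up to" figure, real-world throughput will sit below this ceiling whenever the card throttles.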

Continue reading our preview of the AMD Radeon Pro Duo!!

Manufacturer: NVIDIA

93% of a GP100 at least...

NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.

nvidia-2016-gtc-pascal-banner.png

NVIDIA provided a comparison table, to which we added what we know about a full GP100:

| | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 | 60 |
| TPCs | 15 | 24 | 28 | (30?) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| FP64 GFLOPS | 1680 | 213 | 5304 | TBD |
| Texture Units | 240 | 192 | 224 | 240 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |

This table is designed for developers that are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm2, up from 601 mm2.
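To put the "about twice the transistors in roughly the same area" point in numbers, here is a small sketch that computes transistor density from the table's figures. The helper name is hypothetical, and density is only a rough proxy for what a process node delivers.

```python
# Transistor density from the comparison table (millions of transistors per mm^2).
# The inputs are the table's figures; the helper itself is illustrative only.

def density_mtr_per_mm2(transistors_billion: float, die_mm2: float) -> float:
    """Millions of transistors per square millimetre."""
    return transistors_billion * 1000 / die_mm2


gm200 = density_mtr_per_mm2(8.0, 601)    # Maxwell GM200, 28 nm
gp100 = density_mtr_per_mm2(15.3, 610)   # Pascal GP100, 16 nm

print(f"GM200: {gm200:.1f} MTr/mm^2")
print(f"GP100: {gp100:.1f} MTr/mm^2")
print(f"Density gain: {gp100 / gm200:.2f}x")
```

The ratio comes out a little under 1.9x, which is consistent with "about twice" once the slightly larger die is accounted for.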

nvidia-2016-gp100_block_diagram-1-624x368.png

A full GP100 processor will have 60 streaming multiprocessors (SMs), compared to GM200's 24, although each Pascal SM holds half as many FP32 shaders (64 versus 128). The GP100 part that is listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though the Tesla P100's 10.6 TeraFLOPs of single-precision throughput is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
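The throughput comparison can be reproduced with simple arithmetic. The sketch below assumes 2 FLOPs per CUDA core per clock (one fused multiply-add) and uses the P100's boost clock from the table; the ~1390 MHz GM200 figure is the overclock mentioned above, not a rated spec, and the function name is hypothetical.

```python
# Theoretical throughput = cores x FLOPs per clock x clock.
# Assumption: 2 FLOPs per core per clock (one fused multiply-add).

def peak_gflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical throughput in GFLOPS."""
    return cores * flops_per_clock * clock_mhz / 1000


p100_fp32 = peak_gflops(3584, 1480)    # Tesla P100 at boost clock
gm200_oc = peak_gflops(3072, 1390)     # GM200 at the ~1390 MHz overclock
p100_fp64 = peak_gflops(1792, 1480)    # FP64 cores from the table

print(f"P100 FP32:      {p100_fp32 / 1000:.1f} TFLOPS")
print(f"GM200 OC FP32:  {gm200_oc / 1000:.1f} TFLOPS")
print(f"P100 advantage: {100 * (p100_fp32 / gm200_oc - 1):.0f}%")
print(f"P100 FP64:      {p100_fp64:.0f} GFLOPS")
```

With these inputs the P100 lands at roughly 10.6 TFLOPS against about 8.5 TFLOPS for an overclocked GM200, an advantage in the low-to-mid 20% range. The same formula applied to the FP64 core count gives the table's 5304 GFLOPS.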

Continue reading our preview of the NVIDIA Pascal architecture!!

Author:
Manufacturer: HTC

Why things are different in VR performance testing

It has been an interesting past several weeks and I find myself in an interesting spot. Clearly, and without a shred of doubt, virtual reality, more than any other gaming platform that has come before it, needs an accurate measure of performance and experience. With traditional PC gaming, if you dropped a couple of frames, or saw a slightly out of sync animation, you might notice and get annoyed. But in VR, with a head-mounted display just inches from your face taking up your entire field of view, a hitch in frame or a stutter in motion can completely ruin the immersive experience that the game developer is aiming to provide. Even worse, it could cause dizziness, nausea and define your VR experience negatively, likely killing the excitement of the platform.

pic-hmd1.jpg

My conundrum, and the one that I think most of our industry rests in, is that we don’t yet have the tools and ability to properly quantify the performance of VR. In a market and a platform that so desperately needs to get this RIGHT, we are at a point where we are just trying to get it AT ALL. I have read and seen some other glances at performance of VR headsets like the Oculus Rift and the HTC Vive released today, but honestly, all are missing the mark at some level. Using tools built for traditional PC gaming environments just doesn’t work, and experiential reviews talk about what the gamer can expect to “feel” but lack the data and analysis to back it up and to help point the industry in the right direction to improve in the long run.

With final hardware from both Oculus and HTC / Valve in my hands for the last three weeks, I have, with the help of Ken and Allyn, been diving into the important question of HOW do we properly test VR? I will be upfront: we don’t have a final answer yet. But we have a direction. And we have some interesting results to show you that should prove we are on the right track. But we’ll need help from the likes of Valve, Oculus, AMD, NVIDIA, Intel and Microsoft to get it right. Based on a lot of discussion I’ve had in just the last 2-3 days, I think we are moving in the correct direction.

Why things are different in VR performance testing

So why don’t our existing tools work for testing performance in VR? Things like Fraps, Frame Rating and FCAT have revolutionized performance evaluation for PCs – so why not VR? The short answer is that the gaming pipeline changes in VR with the introduction of two new SDKs: Oculus and OpenVR.

Though the two differ, the key is that both intercept the path by which the GPU draws to the screen. When you attach an Oculus Rift or an HTC Vive to your PC it does not show up as a display in your system; this is a change from the first developer kits from Oculus years ago. Now they are driven by what’s known as “direct mode.” This mode offers improved user experiences and the ability for the Oculus and OpenVR systems to help with quite a bit of functionality for game developers. It also means there are actions being taken on the rendered frames after we can last monitor them. At least for today.

Continue reading our experience in benchmarking VR games!!

Author:
Manufacturer: Various

A system worthy of VR!

Early this year I started getting request after request for hardware suggestions for upcoming PC builds for VR. The excitement surrounding the Oculus Rift and the HTC Vive has caught fire across all spectrums of technology, from PC enthusiasts to gaming enthusiasts to just those of you interested in a technology that has been "right around the corner" for decades. The requests for build suggestions spanned our normal readership as well as those that had previously only focused on console gaming, and thus the need for a selection of build guides began.

Looking for all of the PC Perspective Spring 2016 VR guides?

I launched build guides for $900 and $1500 price points earlier in the week, but today we look at the flagship option, targeting a budget of $2500. Though this is a pricey system that should not be undertaken lightly, it is far from a "crazy expensive" build with multiple GPUs, multiple CPUs or high dollar items unnecessary for gaming and VR.

system1.jpg

With that in mind, let's jump right into the information you are looking for: the components we recommend.

VR Build Guide: $2500, Spring 2016

| Component | | Amazon.com Link | B&H Photo Link |
|---|---|---|---|
| Processor | Intel Core i7-5930K | $527 | $578 |
| Motherboard | ASUS X99-A USB 3.1 | $264 | $259 |
| Memory | Corsair Dominator Platinum 16GB DDR4-3000 | $169 | |
| Graphics Card | ASUS GeForce GTX 980 Ti STRIX | $659 | $669 |
| Storage | 512GB Samsung 950 Pro | $326 | $322 |
| Storage | Western Digital Red 4TB | $180 | $154 |
| Power Supply | Corsair HX750i Platinum | $144 | $149 |
| CPU Cooler | Corsair H100i v2 | $107 | $107 |
| Case | Corsair Carbide 600C | $149 | $141 |
| Total Price | | Full cart - $2,519 | |

For those of you interested in a bit more detail on the why of the parts selection, rather than just the what, I have some additional information for you.

cpu.jpg

Unlike the previous two builds that used Intel's consumer Skylake processors, our $2500 build moves to the Haswell-E platform, an enthusiast design that comes from the realm of workstation products. The Core i7-5930K is a 6-core processor with HyperThreading, allowing for 12 addressable threads. Though we are targeting this machine for VR gaming, the move to this processor will mean better performance for other tasks as well including video encoding, photo editing and more. It's unlocked too - so if you want to stretch that clock speed up via overclocking, you have the flexibility for that.

Update: Several people have pointed out that the Core i7-5820K is a very similar processor to the 5930K, with a $100-150 price advantage. It's another great option if you are looking to save a bit more money, and you don't expect to want/need the additional PCI Express lanes the 5930K offers (40 lanes versus 28 lanes).

mb.jpg

With the transition to Haswell-E we have an ASUS X99-A USB 3.1 motherboard. This board is the first in our VR builds to support not just 2-Way SLI and CrossFire but 3-Way as well if we find that VR games and engines are able to consistently and properly integrate support for multi-GPU. This recently updated board from ASUS includes USB 3.1 support as you can tell from the name, includes 8 slots for DDR4 memory and offers enough PCIe lanes for expansion in all directions.

Looking to build a PC for the very first time, or need a refresher? You can find our recent step-by-step build videos to help you through the process right here!!

980ti.jpg

For our graphics card we have gone with the ASUS GeForce GTX 980 Ti Strix. The 980 Ti is the fastest single GPU solution on the market today and with 6GB of memory on-board should be able to handle anything that VR can toss at it. In terms of compute performance the 980 Ti is more than 40% faster than the GTX 980, the GPU used in our $1500 solution. The Strix integration uses a custom cooler that performs much better than the stock solution and is quieter. 

Continue reading our recommended build for a VR system with a budget of $2500!!

Author:
Manufacturer: AMD

Some Hints as to What Comes Next

On March 14 at the Capsaicin event at GDC AMD disclosed their roadmap for GPU architectures through 2018.  There were two new names in attendance as well as some hints at what technology will be implemented in these products.  It was only one slide, but some interesting information can be inferred from what we have seen and what was said in the event and afterwards during interviews.

Polaris is the next generation of GCN products from AMD that have been shown off for the past few months. Previously, in December and at CES, we saw the Polaris 11 GPU on display. Very little is known about this product except that it is small and extremely power efficient. Last night we saw the Polaris 10 being run, and we only know that it is competitive with current mainstream performance and is larger than the Polaris 11. These products are purportedly based on Samsung/GLOBALFOUNDRIES 14nm LPP.

roadmap.jpg

The source of near endless speculation online.

In the slide AMD showed it listed Polaris as having 2.5X the performance per watt over the previous 28 nm products in AMD’s lineup.  This is impressive, but not terribly surprising.  AMD and NVIDIA both skipped the 20 nm planar node because it just did not offer up the type of performance and scaling to make sense economically.  Simply put, the expense was not worth the results in terms of die size improvements and more importantly power scaling.  20 nm planar just could not offer the type of performance overall that GPU manufacturers could achieve with 2nd and 3rd generation 28nm processes.

What was missing from the slide is any mention that Polaris will integrate either HBM1 or HBM2. Vega, the architecture after Polaris, does in fact list HBM2 as the memory technology it will be packaged with. It promises another tick up in terms of performance per watt, but that is going to come more from aggressive design optimizations and likely improvements on FinFET process technologies. Vega will be a 2017 product.

Beyond that we see Navi. It again boasts an improvement in perf per watt as well as the inclusion of a new memory technology beyond HBM. Current conjecture is that this could be HMC (hybrid memory cube). I am not entirely certain of that particular conjecture as it does not necessarily improve upon the advantages of current generation HBM and upcoming HBM2 implementations. Navi will not show up until 2018 at the earliest. This *could* be a 10 nm part, but considering the struggle that the industry has had getting to 14/16nm FinFET I am not holding my breath.

AMD provided few details about these products other than what we see here.  From here on out is conjecture based upon industry trends, analysis of known roadmaps, and the limitations of the process and memory technologies that are already well known.

Click here to read the rest about AMD's upcoming roadmap!

Shedding a little light on Monday's announcement

Most of our readers should have some familiarity with GameWorks, which is a series of libraries and utilities that help game developers (and others) create software. While many hardware and platform vendors provide samples and frameworks, taking the brunt of the work required to solve complex problems, this is NVIDIA's branding for their suite of technologies. Their hope is that it pushes the industry forward, which in turn drives GPU sales as users see the benefits of upgrading.

nvidia-2016-gdc-gameworksmission.png

This release, GameWorks SDK 3.1, contains three complete features and two “beta” ones. We will start with the first three, each of which targets a portion of the lighting and shadowing problem. The last two, which we will discuss at the end, are the experimental ones and fall under the blanket of physics and visual effects.

nvidia-2016-gdc-volumetriclighting-fallout.png

The first technology is Volumetric Lighting, which simulates the way light scatters off dust in the atmosphere. Game developers have been approximating this effect for a long time. In fact, I remember a particular section of Resident Evil 4 where you walk down a dim hallway that has light rays spilling in from the windows. Gamecube-era graphics could only do so much, though, and certain camera positions show that the effect was just a translucent, one-sided, decorative plane. It was a cheat that was hand-placed by a clever artist.

nvidia-2016-gdc-volumetriclighting-shaftswireframe.png

GameWorks' Volumetric Lighting goes after the same effect, but with a much different implementation. It looks at the generated shadow maps and, using hardware tessellation, extrudes geometry from the unshadowed portions toward the light. These little bits of geometry sum, depending on how deep the volume is, which translates into the required highlight. Also, since it's hardware tessellated, it probably has a smaller impact on performance because the GPU only needs to store enough information to generate the geometry, not store (and update) the geometry data for all possible light shafts themselves -- and it needs to store those shadow maps anyway.

nvidia-2016-gdc-volumetriclighting-shaftsfinal.png

Even though it seemed like this effect was independent of render method, since it basically just adds geometry to the scene, I asked whether it was locked to deferred rendering methods. NVIDIA said that it should be unrelated, as I suspected, which is good for VR. Forward rendering is easier to anti-alias, which makes the uneven pixel distribution (after lens distortion) appear more smooth.

Read on to see the other four technologies, and a little announcement about source access.

Author:
Manufacturer: GitHub

A start to proper testing

During all the commotion last week surrounding the release of a new Ashes of the Singularity DX12 benchmark, Microsoft's launching of the Gears of War Ultimate Edition on the Windows Store and the company's supposed desire to merge Xbox and PC gaming, a constant source of insight for me was one Andrew Lauritzen. Andrew is a graphics guru at Intel and has extensive knowledge of DirectX, rendering, engines, etc. and has always been willing to teach and educate me on areas that crop up. The entire DirectX 12 and Universal Windows Platform discussion was definitely one such instance.

Yesterday morning Andrew pointed me to a GitHub release for a tool called PresentMon, a small sample of code written by a colleague of Andrew's that might be the beginnings of being able to properly monitor performance of DX12 games and even UWP games.

The idea is simple and its implementation even simpler: PresentMon monitors the Windows event tracing stack for present commands and records data about them to a CSV file. Anyone familiar with the kind of ETW data you can gather will appreciate that PresentMon culls out nearly all of the headache of data gathering by simplifying the results into application name/ID, Present call deltas and a bit more.

gears.jpg

Gears of War Ultimate Edition - the debated UWP version

The "Present" method in Windows is what produces a frame and shows it to the user. PresentMon looks at the Windows events running through the system, takes note of when those present commands are received by the OS for any given application, and records the time between them. Because this tool runs at the OS level, it can capture Present data from all kinds of APIs including DX12, DX11, OpenGL, Vulkan and more. It does have limitations though - it is read only so producing an overlay on the game/application being tested isn't possible today. (Or maybe ever in the case of UWP games.) 
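To make "records the time between them" concrete, here is a minimal sketch of the kind of post-processing you could run on Present timing data. It is not PresentMon's code and does not assume its CSV layout; the timestamps are made-up values and the function names are hypothetical.

```python
# Turn a list of Present timestamps into frame-time statistics.
# Illustrative only: the timestamps are invented, and this does not read
# PresentMon's actual CSV output.
from statistics import mean


def present_deltas_ms(timestamps_s):
    """Time between consecutive Present events, in milliseconds."""
    return [(later - earlier) * 1000.0
            for earlier, later in zip(timestamps_s, timestamps_s[1:])]


def summarize(deltas_ms, hitch_threshold_ms=33.3):
    """Average frame time, average FPS, worst frame, and count of long frames."""
    avg = mean(deltas_ms)
    return {
        "avg_ms": avg,
        "avg_fps": 1000.0 / avg,
        "worst_ms": max(deltas_ms),
        "hitches": sum(1 for d in deltas_ms if d > hitch_threshold_ms),
    }


# Hypothetical capture: mostly ~16-17 ms frames with one 50 ms hitch.
stamps = [0.000, 0.016, 0.033, 0.050, 0.100, 0.116]
deltas = present_deltas_ms(stamps)
stats = summarize(deltas)

print([round(d, 1) for d in deltas])
print({k: round(v, 1) if isinstance(v, float) else v for k, v in stats.items()})
```

The average alone hides the single long frame in this capture, which is why the sketch also reports the worst frame time and a count of frames over a threshold. The 33.3 ms threshold is an arbitrary choice for illustration.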

What PresentMon offers us at this stage is an early look at a Fraps-like performance monitoring tool. In the same way that Fraps was looking for Present commands from Windows and recording them, PresentMon does the same thing, at a very similar point in the rendering pipeline as well. What is important and unique about PresentMon is that it is API independent and useful for all types of games and programs.

presentmonscreen.png

PresentMon at work

The first and obvious question for our readers is how this performance monitoring tool compares with Frame Rating, our FCAT-based capture benchmarking platform we have used on GPUs and CPUs for years now. To be honest, it's not the same and should not be considered an analog to it. Frame Rating and capture-based testing look for smoothness, dropped frames and performance at the display, while Fraps and PresentMon look at performance closer to the OS level, before the graphics driver really gets the final say in things. I am still targeting universal DX12 Frame Rating testing with exclusive full screen capable applications and expect that to be ready sooner rather than later. However, what PresentMon does give us is at least an early universal look at DX12 performance including games that are locked behind the Windows Store rules.

Continue reading our look at the new PresentMon tool!!