NVIDIA Release 368.95 Hotfix Driver for DPC Latency

Subject: Graphics Cards | July 22, 2016 - 05:51 PM |
Tagged: pascal, nvidia, graphics drivers

Turns out that Pascal-based GPUs have been suffering from DPC (Deferred Procedure Call) latency issues, and there's been an ongoing discussion about it for a little over a month. This is not an area that I know a lot about, but it's a Windows mechanism that schedules driver work by priority, which provides regular windows of time for sound and video devices to update. It can be stalled by long-running driver code, though, which can manifest as stutter, audio hitches, and other performance issues. With a 10-series GeForce card installed, users have reported that this latency increases about 10-20x, from ~20us to ~300-400us, and it can climb to 1000us or more under load. (For reference, 8333us is about one whole frame at 120 FPS.)
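To put those figures in perspective, here is the simple frame-budget arithmetic behind that last number (the microsecond values are the ones reported above; this is just a quick sketch, not part of any measurement tool):

```javascript
// Frame budget in microseconds for a given refresh rate: one second is 1,000,000 us.
const frameTimeUs = (fps) => 1e6 / fps;

console.log(frameTimeUs(120)); // ~8333 us per frame at 120 FPS
console.log(frameTimeUs(60));  // ~16667 us per frame at 60 FPS

// Even a ~400 us DPC spike eats a noticeable slice of a 120 FPS frame budget,
// and a 1000 us spike eats over a tenth of it.
console.log((400 / frameTimeUs(120) * 100).toFixed(1) + "%");  // ~4.8%
console.log((1000 / frameTimeUs(120) * 100).toFixed(1) + "%"); // ~12.0%
```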

nvidia-2015-bandaid.png

NVIDIA has acknowledged the issue and, just yesterday, released an optional hotfix. After installing the driver, my system felt a lot more responsive, though that could just be psychosomatic. I ran LatencyMon (DPCLat isn't compatible with Windows 8.x or Windows 10) before and after, and the latency measurement did drop significantly. Before the update, the NVIDIA driver was consistently the largest source of latency, spiking into the thousands of microseconds. After the update, it was hidden behind other drivers for the first night, although today it seems to have a few spikes again. That said, Microsoft's networking driver is also spiking in the ~200-300us range, so a good portion of this might be down to the sad state of my current OS install. I've been meaning to do a good system wipe for a while...

nvidia-2016-hotfix-pascaldpc.png

Measurement taken after the hotfix, while running Spotify.
That said, my computer's a mess right now.

Still, some of the post-hotfix driver spikes are reaching ~570us (mostly when I play music on Spotify through my Blue Yeti Pro). Also, Photoshop CC 2015 started complaining about graphics acceleration issues after I installed the hotfix, so only install it if you're experiencing problems. As for the latency, if it's not just my machine, NVIDIA might still have some work to do.

It does feel a lot better, though.

Source: NVIDIA

NVIDIA Announces GP102-based TITAN X with 3,584 CUDA cores

Subject: Graphics Cards | July 21, 2016 - 10:21 PM |
Tagged: titan x, titan, pascal, nvidia, gp102

Donning the leather jacket he goes very few places without, NVIDIA CEO Jen-Hsun Huang showed up at an AI meet-up at Stanford this evening to show, for the very first time, a graphics card based on a never-before-seen Pascal GP102 GPU.

titanxpascal1.jpg

Source: Twitter (NVIDIA)

Rehashing an old name, NVIDIA will call this new graphics card the Titan X. You know, like the "new iPad," this is the "new Titan X." Here is the data we know thus far:

|  | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury | R9 Nano | R9 390X |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPU | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro | Fiji XT | Hawaii XT |
| GPU Cores | 3584 | 2560 | 2816 | 3072 | 2048 | 4096 | 3584 | 4096 | 2816 |
| Rated Clock | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz | up to 1000 MHz | 1050 MHz |
| Texture Units | 224 (?) | 160 | 176 | 192 | 128 | 256 | 224 | 256 | 176 |
| ROP Units | 96 (?) | 64 | 96 | 96 | 64 | 64 | 64 | 64 | 64 |
| Memory | 12GB | 8GB | 6GB | 12GB | 4GB | 4GB | 4GB | 4GB | 8GB |
| Memory Clock | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz | 500 MHz | 6000 MHz |
| Memory Interface | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 512-bit |
| Memory Bandwidth | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 320 GB/s |
| TDP | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts | 175 watts | 275 watts |
| Peak Compute | 11.0 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS | 8.19 TFLOPS | 5.63 TFLOPS |
| Transistor Count | 11.0B | 7.2B | 8.0B | 8.0B | 5.2B | 8.9B | 8.9B | 8.9B | 6.2B |
| Process Tech | 16nm | 16nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | $1,200 | $599 | $649 | $999 | $499 | $649 | $549 | $499 | $329 |

Note: everything marked with a (?) is an educated guess on our part.

Obviously there is a lot for us to still learn about this new GPU and graphics card, including why in the WORLD it is still being called Titan X, rather than...just about anything else. That aside, GP102 will feature 40% more CUDA cores than the GP104 at slightly lower clock speeds. The rated 11 TFLOPS of single precision compute of the new Titan X is 34% better than that of the GeForce GTX 1080 and I would expect gaming performance to scale in line with that difference.

The new Titan X will feature 12GB of GDDR5X memory, not HBM as the GP100 chip uses, so this is clearly a new chip with a new memory interface. NVIDIA claims it will have 480 GB/s of bandwidth, and I am guessing it is built on a 384-bit memory interface running at the same 10 Gbps as the GTX 1080. It's truly amazing hardware.
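As a quick sanity check on that guess (the 384-bit bus and 10 Gbps data rate are assumptions on our part, not confirmed specifications), the bandwidth and compute comparisons work out like this:

```javascript
// Memory bandwidth: (bus width in bits / 8 bits per byte) * per-pin data rate in Gbps.
const bandwidthGBps = (busBits, gbps) => (busBits / 8) * gbps;

console.log(bandwidthGBps(384, 10)); // 480 GB/s -- matches NVIDIA's claimed figure
console.log(bandwidthGBps(256, 10)); // 320 GB/s -- the GTX 1080, for comparison

// The core-count and peak-compute ratios quoted above:
console.log(3584 / 2560); // 1.40 -> 40% more CUDA cores than GP104
console.log(11.0 / 8.2);  // ~1.34 -> ~34% more peak FP32 compute than the GTX 1080
```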

titanxpascal2.jpg

What will you be asked to pay? $1200, going on sale on August 2nd, and only on NVIDIA.com, at least for now. Considering the prices of GeForce GTX 1080 cards with such limited availability, the $1200 price tag MIGHT NOT seem so insane. That's higher than the $999 starting price of the Titan X based on Maxwell in March of 2015 - the claims that NVIDIA is artificially raising prices of cards in each segment will continue, it seems.

I am curious about the TDP on the new Titan X - will it hit the 250 watt mark of the previous version? Yes, apparently it will hit that 250 watt TDP - specs above updated. Does this also mean we'll see a GeForce GTX 1080 Ti that falls between the GTX 1080 and this new Titan X? Maybe, but we are likely looking at an $899 or higher SEP - so get those wallets ready.

That's it for now; we'll have a briefing where we can get more details soon, and hopefully a review ready for you on August 2nd when the cards go on sale!

Source: NVIDIA

Podcast #409 - GTX 1060 Review, 3DMark Time Spy Controversy, Tiny Nintendo and more!

Subject: General Tech | July 21, 2016 - 12:21 PM |
Tagged: Wraith, Volta, video, time spy, softbank, riotoro, retroarch, podcast, nvidia, new, kaby lake, Intel, gtx 1060, geforce, asynchronous compute, async compute, arm, apollo lake, amd, 3dmark, 10nm, 1070m, 1060m

PC Perspective Podcast #409 - 07/21/2016

Join us this week as we discuss the GTX 1060 review, controversy surrounding the async compute of 3DMark Time Spy and more!!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

This episode of the PC Perspective Podcast is sponsored by Casper!

Hosts:  Ryan Shrout, Allyn Malventano, Jeremy Hellstrom, and Josh Walrath

Program length: 1:34:57
  1. Week in Review:
  2. 0:51:17 This episode of the PC Perspective Podcast is sponsored by Casper!
  3. News items of interest:
  4. 1:26:26 Hardware/Software Picks of the Week
    1. Ryan: Sapphire Nitro Bot
    2. Allyn: klocki - chill puzzle game (also on iOS / Android)
  5. Closing/outro

Report: NVIDIA GeForce GTX 1070M and 1060M Specs Leaked

Subject: Graphics Cards | July 20, 2016 - 12:19 PM |
Tagged: VideoCardz, rumor, report, nvidia, GTX 1070M, GTX 1060M, GeForce GTX 1070, GeForce GTX 1060, 2048 CUDA Cores

Specifications for the upcoming mobile version of NVIDIA's GTX 1070 GPU may have leaked, and according to the report at VideoCardz.com this GTX 1070M will have 2048 CUDA cores, 128 more than the desktop version's 1920.

nvidia-geforce-gtx-1070-mobile-specs.jpg

Image credit: BenchLife via VideoCardz

The report comes via BenchLife, with the screenshot of GPU-Z showing the higher CUDA core count (though VideoCardz mentions the TMU count should be 128). The memory interface remains at 256-bit for the mobile version, with 8GB of GDDR5.

VideoCardz reported another GPU-Z screenshot (via PurePC) of the mobile GTX 1060, which appears to offer the same specs as the desktop version, at a slightly lower clock speed.

nvidia-geforce-gtx-1060-mobile-specs.jpg

Image credit: PurePC via VideoCardz

Finally, this chart was provided for reference:

videocardz_chart.PNG

Image credit: VideoCardz

Note the absence of information about a mobile variant of the GTX 1080, details of which are still unknown (for now).

Source: VideoCardz
Manufacturer: Overclock.net

Yes, We're Writing About a Forum Post

Update - July 19th @ 7:15pm EDT: Well that was fast. Futuremark published their statement today. I haven't read it through yet, but there's no reason to wait to link it until I do.

Update 2 - July 20th @ 6:50pm EDT: We interviewed Jani Joki, Futuremark's Director of Engineering, on our YouTube page. The interview is embedded just below this update.

Original post below

The comments of a previous post notified us of an Overclock.net thread whose author claims that 3DMark's implementation of asynchronous compute is designed to show NVIDIA in the best possible light. At the end of the linked post, they note that asynchronous compute is a blanket term, and that we should better understand what is actually going on.

amd-mantle-queues.jpg

So, before we address the controversy, let's actually explain what asynchronous compute is. The main problem is that it actually is a broad term. Asynchronous compute could describe any optimization that allows tasks to execute when it is most convenient, rather than just blindly doing them in a row.

I will use JavaScript as a metaphor. In this language, you can assign tasks to be executed asynchronously by passing functions as parameters. This allows events to execute code when it is convenient. JavaScript, however, is still only single-threaded (without Web Workers and newer technologies). It cannot run callbacks from multiple events simultaneously, even if you have an available core on your CPU. What it does, however, is allow the browser to manage its time better. Many events can be delayed until after the browser renders the page or performs other high-priority tasks, or until the asynchronous code has everything it needs, like assets that are loaded from the internet.
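As a minimal sketch of that single-threaded model (generic JavaScript, not anything from 3DMark or the drivers being discussed):

```javascript
// The callback is registered now, but it only runs later, on the same single
// main thread, whenever the event loop gets around to it.
console.log("kick off an asynchronous task");

setTimeout(() => {
  console.log("callback executed when the main thread is free");
}, 0);

// Any long-running synchronous work here delays the callback above, because
// there is no second thread for it to run on.
console.log("synchronous work continues in the meantime");
```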

mozilla-architecture.jpg

This is asynchronous computing.

However, if JavaScript were designed differently, it would have been possible to run callbacks on any available thread, not just the main thread when it becomes available. Again, JavaScript is not designed in this way, but this is where I pull the analogy back into AMD's Asynchronous Compute Engines. In an ideal situation, a graphics driver will be able to see everything that a task will require and shove it down an already-working GPU, provided the specific resources that the task requires are not fully utilized by the existing work.
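The closest real-world JavaScript analogue to that hypothetical is a Web Worker, which does run on a separate thread; a rough browser-only sketch is below ("worker.js" is a hypothetical file, and the mapping to AMD's ACEs is only an analogy):

```javascript
// A Web Worker executes its script on another thread, so its work and the main
// thread's callbacks can genuinely run at the same time -- loosely analogous to
// a GPU accepting extra compute work alongside the graphics work it is already doing.
const worker = new Worker("worker.js"); // hypothetical worker script

worker.onmessage = (event) => {
  console.log("result computed on another thread:", event.data);
};

worker.postMessage({ task: "heavy computation" }); // main thread stays responsive meanwhile
```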

Read on to see how this is being implemented, and what the controversy is.

NVIDIA's GTX 1060, the newest in their Hari Seldon lineup of cards

Subject: Graphics Cards | July 19, 2016 - 01:54 PM |
Tagged: pascal, nvidia, gtx 1060, gp106, geforce, founders edition

The GTX 1060 Founders Edition has arrived, and it also happens to be our first look at the 16nm FinFET GP106 silicon; the GTX 1080 and 1070 used GP104.  This card features 10 SMs, 1280 CUDA cores, 48 ROPs, and 80 texture units, making it in many ways half of a GTX 1080. The GPU is clocked at a base of 1506MHz with a boost of 1708MHz, and the 6GB of VRAM runs at 8GHz.  [H]ard|OCP put this card through its paces, contrasting it with the RX 480 and the GTX 980 at 1440p as well as the more common 1080p.  As they do not use the frame rating tools that are the basis of our graphics testing of all cards, including the GTX 1060 of course, they included the new DOOM in their test suite.  Read on to see how they felt the card compared to the competition ... just don't expect to see a follow-up article on SLI performance.

1468921254mrv4f5CHZE_1_14_l.jpg

"NVIDIA's GeForce GTX 1060 video card is launched today in the $249 and $299 price point for the Founders Edition. We will find out how it performs in comparison to AMD Radeon RX 480 in DOOM with the Vulkan API as well as DX12 and DX11 games. We'll also see how a GeForce GTX 980 compares in real world gaming."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP
Author:
Manufacturer: NVIDIA

GP106 Specifications

Twelve days ago, NVIDIA announced its competitor to the AMD Radeon RX 480, the GeForce GTX 1060, based on a new Pascal GPU: GP106. Though that story was just a brief preview of the product and a pictorial of the GTX 1060 Founders Edition card we were initially sent, it set the community ablaze with discussion about which mainstream enthusiast platform was going to be the best for gamers this summer.

Today we are allowed to show you our full review: benchmarks of the new GeForce GTX 1060 against the likes of the Radeon RX 480, the GTX 970 and GTX 980, and more. Starting at $250, the GTX 1060 has the potential to be the best bargain in the market today, though much of that will be decided based on product availability and our results on the following pages.

Does NVIDIA’s third consumer product based on Pascal make enough of an impact to dissuade gamers from buying into AMD Polaris?

01.jpg

All signs point to a bloody battle this July and August, and the retail cards based on the GTX 1060 are making their way to our offices sooner than even those based around the RX 480. It is those cards, and not the reference/Founders Edition option, that will be the real competition that AMD has to go up against.

First, however, it’s important to find our baseline: where does the GeForce GTX 1060 find itself in the wide range of GPUs?

Continue reading our review of the GeForce GTX 1060 6GB graphics card!!

NVIDIA's New #OrderOf10 Origins Contest

Subject: Graphics Cards | July 19, 2016 - 01:07 AM |
Tagged: nvidia

Honestly, when I first received this news, I thought it was a mistaken re-announcement of the contest from a few months ago. The original Order of 10 challenge was made up of a series of puzzles, and the first handful of people to solve them received a GTX 10-Series graphics card. Turns out, NVIDIA is doing it again.

nvidia-2016-orderof10-july.png

For four weeks, starting on July 21st, NVIDIA will add four new challenges and, more importantly, 100 new “chances to win”. They did not announce what those prizes will be, or whether all of them will be distributed to the first 25 complete entries of each challenge. Some high-profile YouTube personalities, such as members of Rooster Teeth, streamed their attempts the last time around, so there might be some of that again this time, too.

Source: NVIDIA

Rumor: 16nm for NVIDIA's Volta Architecture

Subject: Graphics Cards | July 16, 2016 - 06:37 PM |
Tagged: Volta, pascal, nvidia, maxwell, 16nm

For the past few generations, NVIDIA has roughly followed a pattern of releasing a new architecture on a new process node, then a refresh the following year. This ran into a hitch when Maxwell was delayed a year, apart from the GTX 750 Ti, and was pushed back to the same 28nm process that Kepler utilized. Pascal caught up with 16nm, although we know that some hard, physical limitations are right around the corner. The lattice spacing of silicon at room temperature is around ~0.5nm, so we're talking about features only on the order of the low 30s of atoms in width.
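As a rough back-of-the-envelope check on that claim (using the ~0.5nm spacing quoted above; marketing node names don't map exactly onto physical feature sizes, so treat this as an order-of-magnitude estimate):

```javascript
// How many silicon lattice spacings fit across a "16nm" feature, roughly.
const latticeSpacingNm = 0.5; // approximate silicon lattice spacing at room temperature
const featureNm = 16;

console.log(featureNm / latticeSpacingNm); // 32 -- features only a few dozen atoms wide
```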

nvidia-2016-gtc-pascal-fivemiracles.png

This rumor claims that NVIDIA is not trying to go with 10nm for Volta. Instead, it will be built on the same 16nm node that Pascal currently occupies. This is quite interesting, because GPUs scale quite well with added complexity, having many parallel units running at relatively low clock rates, so without a new node the only real ways to increase performance are to make the existing architecture more efficient or to make a larger chip.

That said, GP100 leaves a lot of room on the table for an FP32-optimized, ~600mm2 part to crush its performance at the high end, similar to how GM200 replaced GK110. The rumored GP102, expected in the ~450mm2 range for Titan or GTX 1080 Ti-style parts, has some room to grow. Like GM200, however, it would also be unappealing to GPU compute users who need FP64. If this is what is going on, and we're totally just speculating at the moment, it would signal that enterprise customers should expect a new GPGPU card every second gaming generation.

That is, of course, unless NVIDIA has recognized ways to make the Maxwell-derived architecture significantly more die-space efficient in Volta. Clocks could get higher, or the circuits themselves could get simpler. You would think that, especially in the latter case, they would have integrated those ideas into Maxwell and Pascal already; but, like HBM2 memory, there might have been a reason why they couldn't.

We'll need to wait and see. The entire rumor could be crap, who knows?

Source: Fudzilla

Ansel arrives and NVIDIA holds a celebration in their VR Funhouse

Subject: General Tech | July 14, 2016 - 06:06 PM |
Tagged: nvidia, vr funhouse, ansel, vrworks

A while back Scott wrote about NVIDIA's Ansel, a screenshot application on performance-enhancing drugs.  Today it arrives, paired with their new driver, and adds Mirror's Edge Catalyst to the list of supported games, which includes The Witcher 3: Wild Hunt, Unreal Tournament, Tom Clancy’s The Division, and No Man’s Sky, just to name a few.  The tool lets you take 360-degree screen captures that you can rotate around completely, either on a 2D screen or with VR headsets like the Vive or Rift.  Just trigger the capture while you are in game; the game will pause, and you can roll, zoom, and position your focus to get the screenshot you want.  From there, hit the Super Resolution button and your screenshot will be of significantly greater quality than the game could ever display.  The thumbnail below is available in its original 46080x25920 resolution by visiting NVIDIA's Ansel page; it is a mere 1.7GB in size.

nvidia-geforce-gtx-1080-nvidia-ansel-super-resolution.png

NVIDIA also released their first game today, VR Funhouse, available on Steam for no charge ... apart from the HTC Vive and minimum hardware requirements of a GTX 1060 and i7 4790, or the recommended GTX 1080 and i7 5930, which are enough of an investment as it is.  There are seven games to play; expect skeet shooting, whack-a-mole, and other standard carny games.  At the same time it is a showcase of NVIDIA's VR technology: not just the *Works we are familiar with, but also VR SLI support for those with multiple GPUs and VRWorks Multi-Res Shading, which reduces processing load by rendering the edges of your view at reduced detail while keeping the center at full resolution.  If you have the hardware you should check out the game; it is certainly worth the admission price.

ss_71e15985dcdd6aa6ae3f83dfa661df7a18887085.600x338.jpg

 

Source: NVIDIA