NVIDIA Releases 364.91 Beta Drivers for Developers

Subject: Graphics Cards | April 10, 2016 - 09:04 PM |
Tagged: nvidia, vulkan, graphics drivers

This is not a main-line, WHQL driver. This is not even a mainstream beta driver. The beta GeForce 364.91 drivers (364.16 on Linux) are only available on the NVIDIA developer website, which, yes, is publicly accessible, but they should probably not be installed unless you intend to write software and every day counts. Also, some who have installed them claim that certain Vulkan demos stop working. I'm not sure whether that means the demos are out of date due to a rare conformance ambiguity, the driver has bugs, or the reports themselves are simply unreliable.

khronos-2016-vulkanlogo2.png

That said, if you are a software developer, and you don't mind rolling back if things go awry, you can check out the new version at NVIDIA's website. It updates Vulkan support to 1.0.8, a point release consisting of documentation fixes and conformance tweaks. These things happen over time. In fact, the initial Vulkan release was actually Vulkan 1.0.3, if I remember correctly.

The driver also addresses issues with Vulkan and NVIDIA Optimus technologies, which is interesting. Optimus controls which GPU acts as primary in a laptop, switching between the discrete NVIDIA one and the Intel integrated one, depending on load and power. Vulkan and DirectX 12, however, expose all GPUs to the system. I'm curious how NVIDIA knows whether to sleep one or the other, and what that would look like to software that enumerates all compatible devices. Would it omit listing one of the GPUs? Or would it allow the software to wake the system out of Optimus should it want more performance?
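For context, here is a minimal sketch (ours, not from NVIDIA's release notes) of the enumeration path in question. These are standard Vulkan calls; the open question above is how NVIDIA's Optimus-aware driver populates this list on a laptop, and whether simply touching the discrete device wakes it.

```cpp
// Minimal Vulkan device enumeration sketch. On an Optimus laptop, does this
// report both the Intel and NVIDIA GPUs, and does selecting the NVIDIA one
// wake it from its low-power state? That behavior is driver-defined.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main() {
    VkApplicationInfo app{};
    app.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.apiVersion = VK_API_VERSION_1_0;

    VkInstanceCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    info.pApplicationInfo = &app;

    VkInstance instance;
    if (vkCreateInstance(&info, nullptr, &instance) != VK_SUCCESS)
        return 1;

    // Ask the loader how many physical devices are visible, then fetch them.
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> gpus(count);
    vkEnumeratePhysicalDevices(instance, &count, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(gpu, &props);
        // deviceType distinguishes integrated vs. discrete GPUs.
        std::printf("%s (deviceType=%d)\n", props.deviceName, (int)props.deviceType);
    }

    vkDestroyInstance(instance, nullptr);
    return 0;
}
```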

Anywho, the driver is available now, but you probably should wait for official releases. The interesting thing is this seems to mean that NVIDIA will continue to release non-public Vulkan drivers. Hmm.

Source: NVIDIA
Manufacturer: NVIDIA

93% of a GP100 at least...

NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.

nvidia-2016-gtc-pascal-banner.png

NVIDIA provided a comparison table, to which we have added what we know about a full GP100:

| | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 | 60 |
| TPCs | 15 | 24 | 28 | (30?) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| FP64 GFLOPS | 1680 | 213 | 5304 | TBD |
| Texture Units | 240 | 192 | 224 | 240 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |

This table is designed for developers who are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for nearly twice the number of transistors, 15.3 billion versus GM200's 8 billion, in roughly the same die area (610 mm2 versus 601 mm2).

nvidia-2016-gp100_block_diagram-1-624x368.png

A full GP100 processor will have 60 streaming multiprocessors (SMs), compared to GM200's 24, although each Pascal SM contains half as many FP32 CUDA cores. The GP100 part listed in the table above is actually partially disabled, cutting off four of the sixty total SMs. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 chip will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPS is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
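As a quick sanity check (our arithmetic, not a figure from NVIDIA's slide deck), the 10.6 TFLOPS number falls straight out of the core count and boost clock: 3,584 FP32 cores x 2 FLOPs per core per clock (fused multiply-add) x 1,480 MHz ≈ 10.6 TFLOPS of peak single precision throughput.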

Continue reading our preview of the NVIDIA Pascal architecture!!

EVGA Releases NVIDIA GeForce GTX 950 Low Power Cards

Subject: Graphics Cards | April 5, 2016 - 11:57 AM |
Tagged: PCIe power, nvidia, low-power, GTX950, GTX 950 Low Power, graphics card, gpu, GeForce GTX 950, evga

EVGA has announced new low-power versions of the NVIDIA GeForce GTX 950, some of which do not require any PCIe power connection to work.

02G-P4-0958-KR_no_6_pin.jpg

"The EVGA GeForce GTX 950 is now available in special low power models, but still retains all the performance intact. In fact, several of these models do not even have a 6-Pin power connector!"

With or without the power connector, all of these cards are full-on GTX 950s, with 768 CUDA cores and 2GB of GDDR5 memory. The primary difference is in clock speeds, and EVGA provides a chart to illustrate which models still require PCIe power, as well as how they compare in performance.

evga_chart.png

It looks like the links to the 75W (no PCIe power required) models aren't working just yet on EVGA's site. Doubtless we will soon have active listings for pricing and availability info.

Source: EVGA
Manufacturer: HTC

Why things are different in VR performance testing

It has been a busy several weeks, and I find myself in an interesting spot. Clearly, and without a shred of doubt, virtual reality, more than any other gaming platform that has come before it, needs an accurate measure of performance and experience. With traditional PC gaming, if you dropped a couple of frames, or saw a slightly out-of-sync animation, you might notice and get annoyed. But in VR, with a head-mounted display just inches from your face taking up your entire field of view, a hitched frame or a stutter in motion can completely ruin the immersive experience that the game developer is aiming to provide. Even worse, it could cause dizziness or nausea and define your VR experience negatively, likely killing the excitement of the platform.

pic-hmd1.jpg

My conundrum, and the one that I think most of our industry rests in, is that we don’t yet have the tools and ability to properly quantify the performance of VR. In a market and a platform that so desperately needs to get this RIGHT, we are at a point where we are just trying to get it AT ALL. I have read and seen some other glances at performance of VR headsets like the Oculus Rift and the HTC Vive released today, but honestly, all are missing the mark at some level. Using tools built for traditional PC gaming environments just doesn’t work, and experiential reviews talk about what the gamer can expect to “feel” but lack the data and analysis to back it up and to help point the industry in the right direction to improve in the long run.

With final hardware from both Oculus and HTC / Valve in my hands for the last three weeks, I have, with the help of Ken and Allyn, been diving into the important question of HOW do we properly test VR? I will be upfront: we don’t have a final answer yet. But we have a direction. And we have some interesting results to show you that should prove we are on the right track. But we’ll need help from the likes of Valve, Oculus, AMD, NVIDIA, Intel and Microsoft to get it right. Based on a lot of discussion I’ve had in just the last 2-3 days, I think we are moving in the correct direction.

Why things are different in VR performance testing

So why don’t our existing tools work for testing performance in VR? Things like Fraps, Frame Rating and FCAT have revolutionized performance evaluation for PCs – so why not VR? The short answer is that the gaming pipeline changes in VR with the introduction of two new SDKs: Oculus and OpenVR.

Though the two have their differences, the key is that they intercept the rendered frames between the GPU and the headset's display. When you attach an Oculus Rift or an HTC Vive to your PC, it does not show up as a display in your system; this is a change from the first developer kits from Oculus years ago. Now the headsets are driven by what’s known as “direct mode.” This mode offers an improved user experience and allows the Oculus and OpenVR runtimes to handle quite a bit of functionality for game developers. It also means there are actions being taken on the rendered frames after the last point at which we can monitor them. At least for today.

Continue reading our experience in benchmarking VR games!!

AMD Brings Dual Fiji and HBM Memory To Server Room With FirePro S9300 x2

Subject: Graphics Cards | April 5, 2016 - 02:13 AM |
Tagged: HPC, hbm, gpgpu, firepro s9300x2, firepro, dual fiji, deep learning, big data, amd

AMD recently announced a dual Fiji powerhouse for VR gamers that it is calling the Radeon Pro Duo. Now, AMD is bringing its latest GCN architecture and HBM memory to servers with the dual GPU FirePro S9300 x2.

AMD Firepro S9300x2 Server HPC Card.jpg

The new server-bound professional graphics card packs an impressive amount of computing hardware into a dual-slot card with passive cooling. The FirePro S9300 x2 combines two full Fiji GPUs clocked at 850 MHz for a total of 8,192 cores, 512 texture units, and 128 ROPs. Each GPU is paired with 4GB of non-ECC HBM memory on package, good for 512 GB/s of memory bandwidth per GPU, which AMD combines to advertise this as the first professional graphics card with 1 TB/s of memory bandwidth.

Due to its lower clockspeeds, the S9300 x2 has less peak single precision compute performance than the consumer Radeon Pro Duo: 13.9 TFLOPS versus 16 TFLOPS on the desktop card. Businesses will be able to cram more cards into their rack-mounted servers, though, since they do not need to worry about mounting locations for the sealed-loop water cooling of the Radeon card.
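For those keeping score (our math, not AMD's marketing sheet), the peak figure checks out: 8,192 shaders x 2 FLOPs per clock (fused multiply-add) x 850 MHz works out to roughly 13.9 TFLOPS of peak single precision throughput.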

| | FirePro S9300 x2 | Radeon Pro Duo | R9 Fury X | FirePro S9170 |
|---|---|---|---|---|
| GPU | Dual Fiji | Dual Fiji | Fiji | Hawaii |
| GPU Cores | 8192 (2 x 4096) | 8192 (2 x 4096) | 4096 | 2816 |
| Rated Clock | 850 MHz | 1050 MHz | 1050 MHz | 930 MHz |
| Texture Units | 2 x 256 | 2 x 256 | 256 | 176 |
| ROP Units | 2 x 64 | 2 x 64 | 64 | 64 |
| Memory | 8GB (2 x 4GB) | 8GB (2 x 4GB) | 4GB | 32GB ECC |
| Memory Clock | 500 MHz | 500 MHz | 500 MHz | 5000 MHz |
| Memory Interface | 4096-bit (HBM) per GPU | 4096-bit (HBM) per GPU | 4096-bit (HBM) | 512-bit |
| Memory Bandwidth | 1TB/s (2 x 512GB/s) | 1TB/s (2 x 512GB/s) | 512 GB/s | 320 GB/s |
| TDP | 300 watts | ? | 275 watts | 275 watts |
| Peak Compute | 13.9 TFLOPS | 16 TFLOPS | 8.60 TFLOPS | 5.24 TFLOPS |
| Transistor Count | 17.8B | 17.8B | 8.9B | 8.0B |
| Process Tech | 28nm | 28nm | 28nm | 28nm |
| Cooling | Passive | Liquid | Liquid | Passive |
| MSRP | $6000 | $1499 | $649 | $4000 |

AMD is aiming this card at datacenter and HPC users working on "big data" tasks that do not require the accuracy of double precision floating point calculations. Deep learning tasks, seismic processing, and data analytics are all examples AMD says the dual GPU card will excel at. These are all tasks that can be greatly accelerated by the massively parallel nature of a GPU but do not need to be as precise as the stricter mathematics, modeling, and simulation work that depends on FP64 performance. In that respect, the FirePro S9300 x2 has only 870 GFLOPS of double precision compute performance.
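That figure lines up with Fiji's 1/16-rate double precision hardware (again, our arithmetic rather than an AMD-quoted derivation): 13.9 TFLOPS of peak FP32 divided by 16 is roughly 0.87 TFLOPS, or 870 GFLOPS.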

Further, this card supports a GPGPU-optimized Linux driver stack called GPUOpen, and developers can program for it using either OpenCL (it supports OpenCL 1.2) or C++. AMD PowerTune and the return of FP16 support are also features. AMD claims that its new dual GPU card is twice as fast as the NVIDIA Tesla M40 (1.6x the K80) and 12 times as fast as the latest Intel Xeon E5 in peak single precision floating point performance.

The double slot card is powered by two PCI-E power connectors and is rated at 300 watts. This is a bit more palatable than the triple 8-pin needed for the Radeon Pro Duo!

The FirePro S9300 x2 comes with a 3-year warranty and will be available in the second half of this year for $6000 USD. You are definitely paying a premium for the professional certifications and support. Here's hoping developers come up with some cool uses for the dual 8.9 billion transistor GPUs and their included HBM memory!

Source: AMD

NVIDIA's New Quadro VR Ready Program Targets Enterprise

Subject: Graphics Cards | April 4, 2016 - 09:00 AM |
Tagged: workstation, VR, virtual reality, quadro, NVIDIA Quadro M5500, nvidia, msi, mobile workstation, enterprise

NVIDIA's VR Ready program, which is designed to inform users which GeForce GTX GPUs “deliver an optimal VR experience”, has moved to enterprise with a new program aimed at NVIDIA Quadro GPUs and related systems.

NVIDIA_VR.png

“We’re working with top OEMs such as Dell, HP and Lenovo to offer NVIDIA VR Ready professional workstations. That means models like the HP Z Workstation, Dell Precision T5810, T7810, T7910, R7910, and the Lenovo P500, P710, and P910 all come with NVIDIA-recommended configurations that meet the minimum requirements for the highest performing VR experience.

Quadro professional GPUs power NVIDIA professional VR Ready systems. These systems put our VRWorks software development kit at the fingertips of VR headset and application developers. VRWorks offers exclusive tools and technologies — including Context Priority, Multi-res Shading, Warp & Blend, Synchronization, GPU Affinity and GPU Direct — so pro developers can create great VR experiences.”

Partners include Dell, HP, and Lenovo, with new workstations featuring NVIDIA professional VR Ready certification. 

Pro VR Ready Deck.png

Desktop isn't the only space for workstations, and in this morning's announcement NVIDIA and MSI are introducing the WT72 mobile workstation, the “first NVIDIA VR Ready professional laptop”:

"The MSI WT72 VR Ready laptop is the first to use our new Maxwell architecture-based Quadro M5500 GPU. With 2,048 CUDA cores, the Quadro M5500 is the world’s fastest mobile GPU. It’s also our first mobile GPU for NVIDIA VR Ready professional mobile workstations, optimized for VR performance with ultra-low latency."

Here are the specs for the WT72 6QN:

  • GPU: NVIDIA Quadro M5500 3D (8GB GDDR5)
  • CPU Options:
    • Xeon E3-1505M v5
    • Core i7-6920HQ
    • Core i7-6700HQ
  • Chipset: CM236
  • Memory:
    • 64GB ECC DDR4 2133 MHz (Xeon)
    • 32GB DDR4 2133 MHz (Core i7)
  • Storage: Super RAID 4, 256GB SSD + 1TB SATA 7200 rpm
  • Display:
    • 17.3” UHD 4K (Xeon, i7-6920HQ)
    • 17.3” FHD Anti-Glare IPS (i7-6700HQ)
  • LAN: Killer Gaming Network E2400
  • Optical Drive: BD Burner
  • I/O: Thunderbolt, USB 3.0 x6, SDXC card reader
  • Webcam: FHD type (1080p/30)
  • Speakers: Dynaudio Tech Speakers 3Wx2 + Subwoofer
  • Battery: 9 cell
  • Dimensions: 16.85” x 11.57” x 1.89”
  • Weight: 8.4 lbs
  • Warranty: 3-year limited
  • Pricing:  
    • Xeon E3-1505M v5 model: $6899
    • Core i7-6920HQ model: $6299
    • Core i7-6700HQ model: $5499

MSI_NB_WT72_Skylake_Photo18.jpg

No doubt we will see details of other Quadro VR Ready workstations as GTC unfolds this week.

Source: NVIDIA

Asus Echelon GTX 950 Limited Edition In Arctic Camouflage Available Soon

Subject: Graphics Cards | March 30, 2016 - 02:58 AM |
Tagged: maxwell, gtx 950, GM206, asus

Asus is launching a new midrange gaming graphics card clad in arctic camouflage. The Echelon GTX 950 Limited Edition is a Maxwell-based card that will come factory overclocked and paired with Asus features normally reserved for their higher end cards.

This dual-slot, dual-fan graphics card features “auto-extreme technology,” which is Asus marketing speak for high-end capacitors, chokes, and other components. Further, the card uses a DirectCU II cooler that Asus claims offers 20% better cooling performance while being three times quieter than the NVIDIA reference cooler. Asus tweaked the shroud on this card to resemble a white and gray arctic camouflage design. There is also a reinforced backplate that continues the stealthy camo theme.

Asus Echelon GTX 950 Limited Edition.png

I/O on the Echelon GTX 950 Limited Edition includes:

  • 1 x DVI-D
  • 1 x DVI-I
  • 1 x HDMI 2.0
  • 1 x DisplayPort

The card supports NVIDIA’s G-Sync technology, and the inclusion of an HDMI 2.0 port allows it to be used in an HTPC/gaming PC build for the living room, though case selection would be limited since it’s a larger dual-slot card.

Beneath the stealthy exterior, Asus conceals a GM206-derived GTX 950 GPU with 768 CUDA cores, 48 texture units, and 32 ROPs, as well as 2GB of GDDR5 memory. Out of the box, users have two factory overclocks to choose from, which Asus calls Gaming and Overclock (OC) modes. In Gaming Mode, the Echelon GTX 950 GPU is clocked at 1,140 MHz base and 1,329 MHz boost. Turning the card to OC Mode, clockspeeds are further increased to 1,165 MHz base and 1,355 MHz boost.

For reference, the, well, reference GTX 950 clockspeeds are 1,024 MHz base and 1,186 MHz boost.

Asus Echelon GTX 950 Limited Edition Artic Camo Backplate.png

Asus also ever-so-slightly overclocked the GDDR5 memory to 6,610 MHz, which is unfortunately a mere 10 MHz over reference. The memory sits on a 128-bit bus, and while a factory overclock is nice to see, the transfer speed increase will be minimal at best.
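To put that in perspective (our back-of-the-envelope math using the usual GDDR5 bandwidth formula), 6,610 MHz effective across a 128-bit bus is 6.61 Gbps x 128 bits / 8 ≈ 105.8 GB/s, versus about 105.6 GB/s at the reference 6,600 MHz. That is a rounding error in practice.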

In our review of the GTX 950, which focused on the Asus Strix variant, Ryan found it to be a good option for 1080p gamers wanting a bit more graphical prowess than the 750 Ti for their games.

Maximum PC reports that the camo-clad Echelon GTX 950 will be available at the end of the month. Pricing has not been released by Asus, but I would expect this card to come with an MSRP of around $180 USD.

Check out our review of the NVIDIA GTX 950: Maxwell for MOBAs

 

Source: Asus

Video Perspective: Retail Oculus Rift Day One - Setup, Early Testing

Subject: General Tech, Graphics Cards | March 28, 2016 - 11:24 PM |
Tagged: pcper, hardware, technology, review, Oculus, rift, Kickstarter, nvidia, geforce, GTX 980 Ti

It's Oculus Rift launch day, and the team and I spent the afternoon setting up the Rift, running through a set of game play environments, and getting some good first impressions on performance, experience and more. Oh, and we threw a green screen into the mix today as well.

AMD and NVIDIA release drivers for Oculus Rift launch day!

Subject: Graphics Cards | March 28, 2016 - 10:20 AM |
Tagged: vive, valve, steamvr, rift, Oculus, nvidia, htc, amd

As the first Oculus Rift retail units begin hitting hands in the US and abroad, both AMD and NVIDIA have released new drivers to help gamers ease into the world of VR gaming. 

Up first is AMD, with Radeon Software Crimson Edition 16.3.2. It adds support for Oculus SDK v1.3 and the Radeon Pro Duo...for all none of you that have that product in your hands. AMD claims that this driver will offer "the most stable and compatible driver for developing VR experiences on the Rift to-date." AMD tells us that the latest implementation of LiquidVR features in the software help the SDKs and VR games at release take better advantage of AMD Radeon GPUs. This includes capabilities like asynchronous shaders (which AMD thinks should be capitalized for some reason??) and Quick Response Queue (which I think refers to the ability to process without context change penalties) to help Oculus implement Asynchronous Timewarp.

ocululs.jpg

NVIDIA's release is a bit more substantial, with GeForce Game Ready 364.72 WHQL drivers adding support for the Oculus Rift, HTC Vive and improvements for Dark Souls III, Killer Instinct, Paragon early access and even Quantum Break.

For the optimum experience when using the Oculus Rift, and when playing the thirty games launching alongside the headset, upgrade to today's VR-optimized Game Ready driver. Whether you're playing Chronos, Elite Dangerous, EVE: Valkyrie, or any of the other VR titles, you'll want our latest driver to minimize latency, improve performance, and add support for our newest VRWorks features that further enhance your experience.

Today's Game Ready driver also supports the HTC Vive Virtual Reality headset, which launches next week. As with the Oculus Rift, our new driver optimizes and improves the experience, and adds support for the latest Virtual Reality-enhancing technology.

Good to see both GPU vendors giving us new drivers for the release of the Oculus Rift...let's hope it pans out well and the response from the first buyers is positive!

Video Perspective: HTC Vive Pre First Impressions

Subject: General Tech, Graphics Cards | March 26, 2016 - 12:11 AM |
Tagged: VR, vive pre, vive, virtual reality, video, pre, htc

On Friday I was able to get a pre-release HTC Vive Pre in the office and spend some time with it. Not only was I interested in getting more hands-on time with the hardware without a time limit, but we were also experimenting with how to stream and record VR demos and environments.

Enjoy and mock!