93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, to which we have added what we know about a full GP100:
| | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |
This table is aimed at developers interested in GPU compute, so a few specifications (like ROP count) are still unknown, but it still gives us huge insight into the “big Pascal” architecture. The jump to 16nm allows for nearly twice the number of transistors, 15.3 billion versus 8 billion on GM200, in roughly the same die area: 610 mm² versus 601 mm².
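A quick way to check the “twice the transistors in the same area” observation is to divide each chip's transistor count by its die area. This is a minimal sketch that uses only the transistor and die-size figures from the table; it says nothing about clocks, power, or yields.

```python
# Transistor density for GM200 and GP100, using the table's figures.
chips = {
    "GM200": (8.0e9, 601.0),   # (transistors, die area in mm^2)
    "GP100": (15.3e9, 610.0),
}

# Millions of transistors per mm^2 for each chip
density = {name: count / area / 1e6 for name, (count, area) in chips.items()}

for name, d in density.items():
    print(f"{name}: {d:.1f}M transistors per mm^2")

ratio = density["GP100"] / density["GM200"]
print(f"GP100 packs {ratio:.2f}x the transistors per mm^2 of GM200")
```

The ratio comes out to roughly 1.9x: GP100 nearly doubles the transistor budget while the die grows by only 9 mm².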
A full GP100 processor will have 60 streaming multiprocessors (SMs), compared to GM200's 24, although each Pascal SM holds half as many FP32 shaders (64 versus 128). The GP100 in the Tesla P100 is partially disabled, with four of those sixty SMs turned off. This leaves 3584 single-precision (32-bit) CUDA cores, up from 3072 in GM200. (The full GP100 will have 3840 FP32 CUDA cores, but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell's: 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TFLOPS is amazing, it's only about 24% more than what a full GM200 could pull off with an overclock.
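The 10.6 TFLOPS figure and the comparison with an overclocked GM200 both follow from the usual peak-throughput formula: CUDA cores × 2 FLOPs per clock (one fused multiply-add) × clock speed. This sketch assumes the P100 runs at its 1480 MHz boost clock and that a full 3072-core GM200 (as in the Titan X) holds ~1390 MHz; sustained throughput in real workloads will be lower for both.

```python
def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming each core retires one FMA (2 FLOPs) per clock."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12


p100 = peak_fp32_tflops(3584, 1480)      # Tesla P100 at its rated boost clock
gm200_oc = peak_fp32_tflops(3072, 1390)  # full GM200 overclocked to ~1390 MHz

print(f"Tesla P100:       {p100:.2f} TFLOPS")
print(f"GM200 @ 1390 MHz: {gm200_oc:.2f} TFLOPS")
print(f"P100 advantage:   {(p100 / gm200_oc - 1) * 100:.0f}%")
```

That works out to about 10.61 versus 8.54 TFLOPS, a gap of roughly 24%.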
Subject: Graphics Cards | April 5, 2016 - 11:57 AM | Sebastian Peak
Tagged: PCIe power, nvidia, low-power, GTX950, GTX 950 Low Power, graphics card, gpu, GeForce GTX 950, evga
EVGA has announced new low-power versions of the NVIDIA GeForce GTX 950, some of which do not require a PCIe power connector at all.
"The EVGA GeForce GTX 950 is now available in special low power models, but still retains all the performance intact. In fact, several of these models do not even have a 6-Pin power connector!"
With or without the power connector, all of these cards are full GTX 950s, with 768 CUDA cores and 2GB of GDDR5 memory. The primary difference is clock speed, and EVGA provides a chart illustrating which models still require PCIe power and how they compare in performance.
It looks like the links to the 75W (no PCIe power required) models aren't working just yet on EVGA's site. Doubtless we will soon have active listings for pricing and availability info.
Why things are different in VR performance testing
It has been an interesting past several weeks and I find myself in an interesting spot. Clearly, and without a shred of doubt, virtual reality, more than any other gaming platform that has come before it, needs an accurate measure of performance and experience. With traditional PC gaming, if you dropped a couple of frames or saw a slightly out-of-sync animation, you might notice and get annoyed. But in VR, with a head-mounted display just inches from your face taking up your entire field of view, a hitch in frame delivery or a stutter in motion can completely ruin the immersive experience that the game developer is aiming to provide. Even worse, it could cause dizziness and nausea and define your VR experience negatively, likely killing your excitement for the platform.
My conundrum, and the one that I think most of our industry rests in, is that we don’t yet have the tools and ability to properly quantify the performance of VR. In a market and a platform that so desperately needs to get this RIGHT, we are at a point where we are just trying to get it AT ALL. I have read and seen some other glances at the performance of VR headsets like the Oculus Rift and the HTC Vive released today, but honestly, all of them miss the mark at some level. Using tools built for traditional PC gaming environments just doesn’t work, and experiential reviews talk about what the gamer can expect to “feel” but lack the data and analysis to back it up and to help point the industry in the right direction in the long run.
With final hardware from both Oculus and HTC / Valve in my hands for the last three weeks, I have, with the help of Ken and Allyn, been diving into the important question of HOW do we properly test VR? I will be upfront: we don’t have a final answer yet. But we have a direction. And we have some interesting results to show you that should prove we are on the right track. But we’ll need help from the likes of Valve, Oculus, AMD, NVIDIA, Intel and Microsoft to get it right. Based on a lot of discussion I’ve had in just the last 2-3 days, I think we are moving in the correct direction.
So why don’t our existing tools work for testing performance in VR? Things like Fraps, Frame Rating and FCAT have revolutionized performance evaluation for PCs – so why not VR? The short answer is that the gaming pipeline changes in VR with the introduction of two new SDKs: Oculus and OpenVR.
Though the two differ, the key point is that both intercept the path from the GPU to the screen. When you attach an Oculus Rift or an HTC Vive to your PC, it does not show up as a display in your system; this is a change from the first developer kits from Oculus years ago. Instead, the headsets are driven by what’s known as “direct mode.” This mode offers an improved user experience and lets the Oculus and OpenVR systems handle quite a bit of functionality for game developers. It also means that actions are being taken on the rendered frames after the last point at which we can monitor them, at least for today.
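To make the problem concrete, here is roughly what a traditional frame-time tool computes: per-frame render times compared against the display's refresh interval. This is a minimal sketch, assuming a 90 Hz refresh rate for both the Rift and the Vive; the frame times are invented for illustration and are not measurements. Its blind spot is exactly the one described above: it only sees when the game finished a frame, not what the VR runtime did with that frame afterward.

```python
REFRESH_HZ = 90.0  # assumption: the Rift and the Vive both refresh at 90 Hz


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to render one frame before the next refresh, in milliseconds."""
    return 1000.0 / refresh_hz


def count_missed(frame_times_ms: list[float], refresh_hz: float = REFRESH_HZ) -> int:
    """Count frames whose render time exceeds a single refresh interval."""
    budget = frame_budget_ms(refresh_hz)
    return sum(1 for t in frame_times_ms if t > budget)


# Invented frame times for illustration only; not measured data.
sample = [9.8, 10.5, 11.0, 13.2, 10.9, 22.4]
missed = count_missed(sample)
print(f"Budget: {frame_budget_ms(REFRESH_HZ):.2f} ms per frame")
print(f"Missed: {missed} of {len(sample)} frames")
```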
Subject: Graphics Cards | April 5, 2016 - 02:13 AM | Tim Verry
Tagged: HPC, hbm, gpgpu, firepro s9300x2, firepro, dual fiji, deep learning, big data, amd
Last month AMD launched a dual Fiji powerhouse for VR gamers it is calling the Radeon Pro Duo. Now, AMD is bringing its latest GCN architecture and HBM memory to servers with the dual GPU FirePro S9300 x2.
The new server-bound professional graphics card packs an impressive amount of computing hardware into a passively cooled, dual-slot card. The FirePro S9300 x2 combines two full Fiji GPUs clocked at 850 MHz for a total of 8,192 cores, 512 TUs (texture units), and 128 ROPs. Each GPU is paired with 4GB of on-package, non-ECC HBM memory providing 512GB/s of bandwidth, and AMD adds the two figures together to advertise this as the first professional graphics card with 1TB/s of memory bandwidth.
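The 512GB/s per-GPU figure falls out of the HBM interface width and clock: HBM transfers data on both clock edges, so a 4096-bit bus at 500 MHz moves 4096 × 1,000 million bits per second. This is a minimal sketch using the interface width and memory clock from AMD's specs. Note that the 1TB/s headline is an aggregate: each GPU only sees its own 512GB/s pool.

```python
def memory_bandwidth_gbs(bus_width_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s: bus width x data rate, converted from bits to bytes."""
    return bus_width_bits * clock_mhz * 1e6 * transfers_per_clock / 8 / 1e9


# One Fiji GPU: 4096-bit HBM at 500 MHz, double data rate (2 transfers per clock)
per_gpu = memory_bandwidth_gbs(4096, 500, transfers_per_clock=2)
# The card's headline figure simply adds the two GPUs' separate memory pools
card_total = 2 * per_gpu

print(f"Per GPU:    {per_gpu:.0f} GB/s")
print(f"Card total: {card_total:.0f} GB/s")
```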
Due to its lower clockspeed, the S9300 x2 has less peak single precision compute performance than the consumer Radeon Pro Duo: 13.9 TFLOPS versus 16 TFLOPS on the desktop card. Businesses will be able to cram more cards into their rack-mounted servers, though, since they do not need to worry about mounting locations for the Radeon card's sealed-loop water cooling.
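The 13.9 TFLOPS rating follows from the same peak formula used for GPUs generally: stream processors × 2 FLOPs per clock (one fused multiply-add) × clock speed. This sketch assumes that one-FMA-per-clock model and uses the core counts and rated clocks from the comparison table that follows; the same formula reproduces the Fury X and S9170 figures listed there.

```python
def peak_fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12


cards = {
    "FirePro S9300 x2": (8192, 850),
    "R9 Fury X": (4096, 1050),
    "FirePro S9170": (2816, 930),
}

tflops = {name: peak_fp32_tflops(sps, mhz) for name, (sps, mhz) in cards.items()}
for name, value in tflops.items():
    print(f"{name}: {value:.2f} TFLOPS")
```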
| | FirePro S9300 x2 | Radeon Pro Duo | R9 Fury X | FirePro S9170 |
|---|---|---|---|---|
| GPU | Dual Fiji | Dual Fiji | Fiji | Hawaii |
| GPU Cores | 8192 (2 x 4096) | 8192 (2 x 4096) | 4096 | 2816 |
| Rated Clock | 850 MHz | 1050 MHz | 1050 MHz | 930 MHz |
| Texture Units | 2 x 256 | 2 x 256 | 256 | 176 |
| ROP Units | 2 x 64 | 2 x 64 | 64 | 64 |
| Memory | 8GB (2 x 4GB) | 8GB (2 x 4GB) | 4GB | 32GB ECC |
| Memory Clock | 500 MHz | 500 MHz | 500 MHz | 5000 MHz |
| Memory Interface | 4096-bit (HBM) per GPU | 4096-bit (HBM) per GPU | 4096-bit (HBM) | 512-bit |
| Memory Bandwidth | 1TB/s (2 x 512GB/s) | 1TB/s (2 x 512GB/s) | 512 GB/s | 320 GB/s |
| TDP | 300 watts | ? | 275 watts | 275 watts |
| Peak Compute (FP32) | 13.9 TFLOPS | 16 TFLOPS | 8.60 TFLOPS | 5.24 TFLOPS |
AMD is aiming this card at datacenter and HPC users working on "big data" tasks that do not require the accuracy of double precision floating point calculations. Deep learning, seismic processing, and data analytics are all examples of workloads AMD says the dual GPU card will excel at. These tasks can be greatly accelerated by the massively parallel nature of a GPU but do not need the precision of the mathematics, modeling, and simulation work that depends on FP64 performance. In that respect, the FirePro S9300 x2 has only 870 GFLOPS of double precision compute performance.
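The 870 GFLOPS figure is consistent with Fiji running double precision at 1/16 of its single precision rate. For contrast, Hawaii (used in the FirePro S9170) runs FP64 at 1/2 rate on FirePro cards. Both ratios are stated here from our understanding of the architectures, not from AMD's announcement; the sketch simply applies them to the peak FP32 numbers built from the table's cores and clocks.

```python
def peak_tflops(stream_processors: int, clock_mhz: float, rate: float = 1.0) -> float:
    """Peak TFLOPS at a given fraction of the FP32 rate (one FMA = 2 FLOPs per clock)."""
    return stream_processors * 2 * clock_mhz * 1e6 * rate / 1e12


# Assumed FP64:FP32 ratios: Fiji 1/16, Hawaii (FirePro) 1/2
fp64 = {
    "FirePro S9300 x2": peak_tflops(8192, 850, rate=1 / 16),
    "FirePro S9170": peak_tflops(2816, 930, rate=1 / 2),
}

for name, tf in fp64.items():
    print(f"{name}: {tf * 1000:.0f} GFLOPS FP64")
```

Under these assumptions the older single-GPU S9170 offers roughly three times the double precision throughput of the new dual-GPU card, which is why AMD pitches the S9300 x2 at FP32-heavy work.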
Further, this card supports a GPGPU-optimized Linux driver stack called GPUOpen, and developers can program for it using either OpenCL (it supports OpenCL 1.2) or C++. AMD PowerTune and the return of FP16 support are also features. AMD claims that its new dual GPU card is twice as fast as the NVIDIA Tesla M40 (and 1.6x as fast as the Tesla K80) and 12 times as fast as the latest Intel Xeon E5 in peak single precision floating point performance.
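AMD's GPU-versus-GPU claims can be checked against peak FP32 figures computed the same way for each card. The M40 numbers (3072 cores, 1114 MHz boost) come from NVIDIA's Tesla comparison table in the P100 story above; the K80 configuration (two GPUs with 2496 cores each, 875 MHz maximum boost) is our assumption and does not appear in either announcement. The Xeon comparison is not reproduced here.

```python
def peak_fp32_tflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


s9300_x2 = peak_fp32_tflops(8192, 850)    # AMD FirePro S9300 x2
tesla_m40 = peak_fp32_tflops(3072, 1114)  # Tesla M40 at its boost clock
tesla_k80 = peak_fp32_tflops(4992, 875)   # assumption: 2 x 2496 cores at 875 MHz boost

vs_m40 = s9300_x2 / tesla_m40
vs_k80 = s9300_x2 / tesla_k80
print(f"vs Tesla M40: {vs_m40:.2f}x")
print(f"vs Tesla K80: {vs_k80:.2f}x")
```

On paper the ratios land at about 2.0x and 1.6x, matching AMD's claims; as always, peak numbers say little about sustained performance in real applications.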
The double slot card is powered by two PCI-E power connectors and is rated at 300 watts. This is a bit more palatable than the triple 8-pin needed for the Radeon Pro Duo!
The FirePro S9300 x2 comes with a 3 year warranty and will be available in the second half of this year for $6000 USD. You are definitely paying a premium for the professional certifications and support. Here's hoping developers come up with some cool uses for the two 8.9 billion transistor GPUs and their included HBM memory!
Subject: Graphics Cards | April 4, 2016 - 09:00 AM | Sebastian Peak
Tagged: workstation, VR, virtual reality, quadro, NVIDIA Quadro M5500, nvidia, msi, mobile workstation, enterprise
NVIDIA's VR Ready program, which is designed to inform users which GeForce GTX GPUs “deliver an optimal VR experience”, has moved to enterprise with a new program aimed at NVIDIA Quadro GPUs and related systems.
“We’re working with top OEMs such as Dell, HP and Lenovo to offer NVIDIA VR Ready professional workstations. That means models like the HP Z Workstation, Dell Precision T5810, T7810, T7910, R7910, and the Lenovo P500, P710, and P910 all come with NVIDIA-recommended configurations that meet the minimum requirements for the highest performing VR experience.
Quadro professional GPUs power NVIDIA professional VR Ready systems. These systems put our VRWorks software development kit at the fingertips of VR headset and application developers. VRWorks offers exclusive tools and technologies — including Context Priority, Multi-res Shading, Warp & Blend, Synchronization, GPU Affinity and GPU Direct — so pro developers can create great VR experiences.”
Partners include Dell, HP, and Lenovo, with new workstations featuring NVIDIA professional VR Ready certification.
Desktop isn't the only space for workstations, and in this morning's announcement NVIDIA and MSI are introducing the WT72 mobile workstation, “the first NVIDIA VR Ready professional laptop”:
"The MSI WT72 VR Ready laptop is the first to use our new Maxwell architecture-based Quadro M5500 GPU. With 2,048 CUDA cores, the Quadro M5500 is the world’s fastest mobile GPU. It’s also our first mobile GPU for NVIDIA VR Ready professional mobile workstations, optimized for VR performance with ultra-low latency."
Here are the specs for the WT72 6QN:
- GPU: NVIDIA Quadro M5500 3D (8GB GDDR5)
- CPU Options:
  - Xeon E3-1505M v5
  - Core i7-6920HQ
  - Core i7-6700HQ
- Chipset: CM236
- Memory:
  - 64GB ECC DDR4 2133 MHz (Xeon)
  - 32GB DDR4 2133 MHz (Core i7)
- Storage: Super RAID 4, 256GB SSD + 1TB SATA 7200 rpm
- Display:
  - 17.3” UHD 4K (Xeon, i7-6920HQ)
  - 17.3” FHD Anti-Glare IPS (i7-6700HQ)
- LAN: Killer Gaming Network E2400
- Optical Drive: BD Burner
- I/O: Thunderbolt, USB 3.0 x6, SDXC card reader
- Webcam: FHD type (1080p/30)
- Speakers: Dynaudio Tech Speakers 3Wx2 + Subwoofer
- Battery: 9 cell
- Dimensions: 16.85” x 11.57” x 1.89”
- Weight: 8.4 lbs
- Warranty: 3-year limited
- Pricing:
  - Xeon E3-1505M v5 model: $6899
  - Core i7-6920HQ model: $6299
  - Core i7-6700HQ model: $5499
No doubt we will see details of other Quadro VR Ready workstations as GTC unfolds this week.
Subject: Graphics Cards | March 30, 2016 - 02:58 AM | Tim Verry
Tagged: maxwell, gtx 950, GM206, asus
Asus is launching a new midrange gaming graphics card clad in arctic camouflage. The Echelon GTX 950 Limited Edition is a Maxwell-based card that will come factory overclocked and paired with Asus features normally reserved for their higher end cards.
This dual slot, dual fan graphics card features “auto-extreme technology”, which is Asus marketing speak for high end capacitors, chokes, and other components. Further, the card uses a DirectCU II cooler that Asus claims offers 20% better cooling performance while being three times quieter than the NVIDIA reference cooler. Asus tweaked the shroud on this card to resemble a white and gray arctic camouflage design, and a reinforced backplate continues the stealthy camo theme.
I/O on the Echelon GTX 950 Limited Edition includes:
- 1 x DVI-D
- 1 x DVI-I
- 1 x HDMI 2.0
- 1 x DisplayPort
The card supports NVIDIA’s G-Sync technology, and the inclusion of an HDMI 2.0 port allows it to be used in an HTPC/gaming PC build for the living room, though case selection would be limited since it’s a larger dual slot card.
Beneath the stealthy exterior, Asus conceals a GM206-derived GTX 950 GPU with 768 CUDA cores, 48 Texture Units, and 32 ROPs as well as 2GB of GDDR5 memory. Out of the box, users have two factory overclocks to choose from that Asus calls Gaming and OC modes. In Gaming mode, the Echelon GTX 950 GPU is clocked at 1,140 MHz base and 1,329 MHz boost. Turning the card to OC mode further increases clockspeeds to 1,165 MHz base and 1,355 MHz boost.
For reference, the, well, reference GTX 950 clockspeeds are 1,024 MHz base and 1,186 MHz boost.
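To put the two factory modes in perspective, each clock can be expressed as a percentage over the reference GTX 950. This is a minimal sketch using only the clocks listed above.

```python
REFERENCE_MHZ = {"base": 1024, "boost": 1186}  # reference GTX 950 clocks

ECHELON_MODES = {
    "Gaming": {"base": 1140, "boost": 1329},
    "OC": {"base": 1165, "boost": 1355},
}


def overclock_percent(clock_mhz: int, reference_mhz: int) -> float:
    """How far a clock sits above the reference clock, in percent."""
    return (clock_mhz / reference_mhz - 1) * 100


for mode, clocks in ECHELON_MODES.items():
    for kind, mhz in clocks.items():
        pct = overclock_percent(mhz, REFERENCE_MHZ[kind])
        print(f"{mode} mode {kind}: {mhz} MHz (+{pct:.1f}%)")
```

Gaming mode works out to roughly 11-12% over reference, and OC mode to roughly 14%.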
Asus also ever-so-slightly overclocked the GDDR5 memory to 6,610 MHz (effective), which is unfortunately a mere 10 MHz over reference. The memory sits on a 128-bit bus, and while a factory overclock is nice to see, the gain in memory bandwidth will be minimal at best.
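How minimal? Peak bandwidth is the bus width times the effective data rate, divided by eight to convert bits to bytes. This sketch uses the 128-bit bus and the 6,610 MHz effective rate from the text, with the reference rate taken as 10 MHz lower (6,600 MHz).

```python
def gddr5_bandwidth_gbs(bus_width_bits: int, effective_mhz: float) -> float:
    """Peak bandwidth in GB/s; effective_mhz is the GDDR5 effective data rate."""
    return bus_width_bits * effective_mhz * 1e6 / 8 / 1e9


reference = gddr5_bandwidth_gbs(128, 6600)  # reference GTX 950
echelon = gddr5_bandwidth_gbs(128, 6610)    # Asus Echelon factory overclock
gain = (echelon / reference - 1) * 100
print(f"{reference:.2f} GB/s -> {echelon:.2f} GB/s (+{gain:.2f}%)")
```

The result is about 105.6 GB/s versus 105.76 GB/s, an improvement of roughly 0.15%.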
In our review of the GTX 950, which focused on the Asus Strix variant, Ryan found it to be a good option for 1080p gamers wanting a bit more graphical prowess than the 750 Ti for their games.
Maximum PC reports that camo-clad Echelon GTX 950 will be available at the end of the month. Pricing has not been released by Asus, but I would expect this card to come with an MSRP of around $180 USD.
Subject: General Tech, Graphics Cards | March 28, 2016 - 11:24 PM | Ryan Shrout
Tagged: pcper, hardware, technology, review, Oculus, rift, Kickstarter, nvidia, geforce, GTX 980 Ti
It's Oculus Rift launch day and the team and I spent the afternoon setting up the Rift, running through a set of game play environments and getting some good first impressions on performance, experience and more. Oh, and we entered a green screen into the mix today as well.
Subject: Graphics Cards | March 28, 2016 - 10:20 AM | Ryan Shrout
Tagged: vive, valve, steamvr, rift, Oculus, nvidia, htc, amd
As the first Oculus Rift retail units begin hitting hands in the US and abroad, both AMD and NVIDIA have released new drivers to help gamers ease into the world of VR gaming.
Up first is AMD, with Radeon Software Crimson Edition 16.3.2. It adds support for Oculus SDK v1.3 and the Radeon Pro Duo...for all none of you that have that product in your hands. AMD claims that this driver will offer "the most stable and compatible driver for developing VR experiences on the Rift to-date." AMD tells us that the latest implementation of LiquidVR features in the software helps the SDKs and VR games at release take better advantage of AMD Radeon GPUs. This includes capabilities like asynchronous shaders (which AMD thinks should be capitalized for some reason??) and Quick Response Queue (which I think refers to the ability to process without context change penalties) to help Oculus implement Asynchronous Timewarp.
NVIDIA's release is a bit more substantial, with GeForce Game Ready 364.72 WHQL drivers adding support for the Oculus Rift, HTC Vive and improvements for Dark Souls III, Killer Instinct, Paragon early access and even Quantum Break.
For the optimum experience when using the Oculus Rift, and when playing the thirty games launching alongside the headset, upgrade to today's VR-optimized Game Ready driver. Whether you're playing Chronos, Elite Dangerous, EVE: Valkyrie, or any of the other VR titles, you'll want our latest driver to minimize latency, improve performance, and add support for our newest VRWorks features that further enhance your experience.
Today's Game Ready driver also supports the HTC Vive Virtual Reality headset, which launches next week. As with the Oculus Rift, our new driver optimizes and improves the experience, and adds support for the latest Virtual Reality-enhancing technology.
Good to see both GPU vendors giving us new drivers for the release of the Oculus Rift...let's hope it pans out well and the response from the first buyers is positive!
Subject: General Tech, Graphics Cards | March 26, 2016 - 12:11 AM | Ryan Shrout
Tagged: VR, vive pre, vive, virtual reality, video, pre, htc
On Friday I was able to get a pre-release HTC Vive Pre in the office and spend some time with it. Not only was I interested in getting more hands-on time with the hardware without a time limit, but we were also experimenting with how to stream and record VR demos and environments.
Enjoy and mock!
A system worthy of VR!
Early this year I started getting request after request for hardware suggestions for upcoming PC builds for VR. The excitement surrounding the Oculus Rift and the HTC Vive has caught fire across all spectrums of technology, from PC enthusiasts to gaming enthusiasts to just those of you interested in a technology that has been "right around the corner" for decades. The requests for build suggestions spanned our normal readership as well as those that had previously only focused on console gaming, and thus the need for a selection of build guides began.
I launched build guides for $900 and $1500 price points earlier in the week, but today we look at the flagship option, targeting a budget of $2500. Though this is a pricey system that should not be undertaken lightly, it is far from a "crazy expensive" build with multiple GPUs, multiple CPUs or high dollar items unnecessary for gaming and VR.
With that in mind, let's jump right into the information you are looking for: the components we recommend.
VR Build Guide: $2500, Spring 2016

| Component | Part | Amazon.com | B&H Photo |
|---|---|---|---|
| Processor | Intel Core i7-5930K | $527 | $578 |
| Motherboard | ASUS X99-A USB 3.1 | $264 | $259 |
| Memory | Corsair Dominator Platinum 16GB DDR4-3000 | $169 | |
| Graphics Card | ASUS GeForce GTX 980 Ti STRIX | $659 | $669 |
| Storage | 512GB Samsung 950 Pro, Western Digital Red 4TB | | |
| Power Supply | Corsair HX750i Platinum | $144 | $149 |
| CPU Cooler | Corsair H100i v2 | $107 | $107 |
| Case | Corsair Carbide 600C | $149 | $141 |
| Total Price | Full cart: $2,519 | | |
For those of you interested in a bit more detail on the why of the parts selection, rather than just the what, I have some additional information for you.
Unlike the previous two builds that used Intel's consumer Skylake processors, our $2500 build moves to the Haswell-E platform, an enthusiast design that comes from the realm of workstation products. The Core i7-5930K is a 6-core processor with Hyper-Threading, allowing for 12 addressable threads. Though we are targeting this machine for VR gaming, the move to this processor will mean better performance for other tasks as well, including video encoding, photo editing and more. It's unlocked too, so if you want to stretch that clock speed via overclocking, you have the flexibility to do so.
Update: Several people have pointed out that the Core i7-5820K is a very similar processor to the 5930K, with a $100-150 price advantage. It's another great option if you are looking to save a bit more money, and you don't expect to want/need the additional PCI Express lanes the 5930K offers (40 lanes versus 28 lanes).
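The lane difference matters mostly for multi-GPU setups. As a rough, hypothetical budget, assume two graphics cards at x16 each plus the Samsung 950 Pro, an NVMe drive that uses four PCIe 3.0 lanes, all wired to the CPU. How lanes are actually split depends on the motherboard's slot wiring, so treat this as a back-of-the-envelope check rather than a configuration guide.

```python
CPU_LANES = {"Core i7-5930K": 40, "Core i7-5820K": 28}


def lanes_needed(devices: dict[str, int]) -> int:
    """Total PCIe lanes requested by a set of devices."""
    return sum(devices.values())


# Hypothetical build: two GPUs at full x16 plus an x4 NVMe SSD
build = {"GPU 1": 16, "GPU 2": 16, "NVMe SSD": 4}
need = lanes_needed(build)

for cpu, lanes in CPU_LANES.items():
    verdict = "fits" if need <= lanes else "does not fit"
    print(f"{cpu}: {need} of {lanes} lanes -> {verdict}")
```

On the 28-lane 5820K, the same hypothetical build only fits if the second card drops to x8 (16 + 8 + 4 = 28 lanes), which is the trade-off behind the extra cost of the 5930K.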
With the transition to Haswell-E we have an ASUS X99-A USB 3.1 motherboard. This board is the first in our VR builds to support not just 2-Way SLI and CrossFire but 3-Way as well, should VR games and engines prove able to consistently and properly support multi-GPU. As the name suggests, this recently updated board from ASUS includes USB 3.1 support; it also has 8 slots for DDR4 memory and offers enough PCIe lanes for expansion in all directions.
Looking to build a PC for the very first time, or need a refresher? You can find our recent step-by-step build videos to help you through the process right here!!
For our graphics card we have gone with the ASUS GeForce GTX 980 Ti Strix. The 980 Ti is the fastest single GPU solution on the market today, and with 6GB of memory on-board it should be able to handle anything that VR can toss at it. In terms of compute resources, the 980 Ti has nearly 40% more CUDA cores (2,816 versus 2,048) than the GTX 980, the GPU used in our $1500 solution. The Strix version uses a custom cooler that performs much better and runs quieter than the stock solution.