A Beautiful Graphics Card
As a surprise to nearly everyone, on July 21st NVIDIA announced the existence of the new Titan X graphics cards, which are based on the brand new GP102 Pascal GPU. Though it shares a name, for some unexplained reason, with the Maxwell-based Titan X graphics card launched in March of 2015, this is card is a significant performance upgrade. Using the largest consumer-facing Pascal GPU to date (with only the GP100 used in the Tesla P100 exceeding it), the new Titan X is going to be a very expensive, and very fast gaming card.
As has been the case since the introduction of the Titan brand, NVIDIA claims that this card is for gamers that want the very best in graphics hardware as well as for developers and need an ultra-powerful GPGPU device. GP102 does not integrate improved FP64 / double precision compute cores, so we are basically looking at an upgraded and improved GP104 Pascal chip. That’s nothing to sneeze at, of course, and you can see in the specifications below that we expect (and can now show you) Titan X (Pascal) is a gaming monster.
|Titan X (Pascal)||GTX 1080||GTX 980 Ti||TITAN X||GTX 980||R9 Fury X||R9 Fury||R9 Nano||R9 390X|
|GPU||GP102||GP104||GM200||GM200||GM204||Fiji XT||Fiji Pro||Fiji XT||Hawaii XT|
|Rated Clock||1417 MHz||1607 MHz||1000 MHz||1000 MHz||1126 MHz||1050 MHz||1000 MHz||up to 1000 MHz||1050 MHz|
|Memory Clock||10000 MHz||10000 MHz||7000 MHz||7000 MHz||7000 MHz||500 MHz||500 MHz||500 MHz||6000 MHz|
|Memory Interface||384-bit G5X||256-bit G5X||384-bit||384-bit||256-bit||4096-bit (HBM)||4096-bit (HBM)||4096-bit (HBM)||512-bit|
|Memory Bandwidth||480 GB/s||320 GB/s||336 GB/s||336 GB/s||224 GB/s||512 GB/s||512 GB/s||512 GB/s||320 GB/s|
|TDP||250 watts||180 watts||250 watts||250 watts||165 watts||275 watts||275 watts||175 watts||275 watts|
|Peak Compute||11.0 TFLOPS||8.2 TFLOPS||5.63 TFLOPS||6.14 TFLOPS||4.61 TFLOPS||8.60 TFLOPS||7.20 TFLOPS||8.19 TFLOPS||5.63 TFLOPS|
GP102 features 40% more CUDA cores than the GP104 at slightly lower clock speeds. The rated 11 TFLOPS of single precision compute of the new Titan X is 34% higher than that of the GeForce GTX 1080 and I would expect gaming performance to scale in line with that difference.
Titan X (Pascal) does not utilize the full GP102 GPU; the recently announced Pascal P6000 does, however, which gives it a CUDA core count of 3,840 (256 more than Titan X).
A full GP102 GPU
The complete GPU effectively loses 7% of its compute capability with the new Titan X, although that is likely to help increase available clock headroom and yield.
The new Titan X will feature 12GB of GDDR5X memory, not HBM as the GP100 chip has, so this is clearly a unique chip with a new memory interface. NVIDIA claims it has 480 GB/s of bandwidth on a 384-bit memory controller interface running at the same 10 Gbps as the GTX 1080.
Realworldtech with Compelling Evidence
Yesterday David Kanter of Realworldtech posted a pretty fascinating article and video that explored the two latest NVIDIA architectures and how they have branched away from the traditional immediate mode rasterization units. It has revealed through testing that with Maxwell and Pascal NVIDIA has gone to a tiling method with rasterization. This is a somewhat significant departure for the company considering they have utilized the same basic immediate mode rasterization model since the 90s.
The Videologic Apocolypse 3Dx based on the PowerVR PCX2.
(photo courtesy of Wikipedia)
Tiling is an interesting subject and we can harken back to the PowerVR days to see where it was first implemented. There are many advantages to tiling and deferred rendering when it comes to overall efficiency in power and memory bandwidth. These first TBDR (Tile Based Deferred Renderers) offered great performance per clock and could utilize slower memory as compared to other offerings of the day (namely Voodoo Graphics). There were some significant drawbacks to the technology. Essentially a lot of work had to be done by the CPU and driver in scene setup and geometry sorting. On fast CPU systems the PowerVR boards could provide very good performance, but it suffered on lower end parts as compared to the competition. This is a very simple explanation of what is going on, but the long and short of it is that TBDR did not take over the world due to limitations in its initial implementations. Traditional immediate mode rasters would improve in efficiency and performance with aggressive Z checks and other optimizations that borrow from the TBDR playbook.
Tiling is also present in a lot of mobile parts. Imagination’s PowerVR graphics technologies have been implemented by others such as Intel, Apple, Mediatek, and others. Qualcomm (Adreno) and ARM (Mali) both implement tiler technologies to improve power consumption and performance while increasing bandwidth efficiency. Perhaps most interestingly we can remember back to the Gigapixel days with the GP-1 chip that implemented a tiling method that seemed to work very well without the CPU hit and driver overhead that had plagued the PowerVR chips up to that point. 3dfx bought Gigapixel for some $150 million at the time. That company then went on to file bankruptcy a year later and their IP was acquired by NVIDIA.
Screenshot of the program used to uncover the tiling behavior of the rasterizer.
It now appears as though NVIDIA has evolved their raster units to embrace tiling. This is not a full TBDR implementation, but rather an immediate mode tiler that will still break up the scene in tiles but does not implement deferred rendering. This change should improve bandwidth efficiency when it comes to rasterization, but it does not affect the rest of the graphics pipeline by forcing it to be deferred (tessellation, geometry setup and shaders, etc. are not impacted). NVIDIA has not done a deep dive on this change for editors, so we do not know the exact implementation and what advantages we can expect. We can look at the evidence we have and speculate where those advantages exist.
The video where David Kanter explains his findings
Bandwidth and Power
Tilers have typically taken the tiled regions and buffered them on the chip. This is a big improvement in both performance and power efficiency as the raster data does not have to be cached and written out to the frame buffer and then swapped back. This makes quite a bit of sense considering the overall lack of big jumps in memory technologies over the past five years. We have had GDDR-5 since 2007/2008. The speeds have increased over time, but the basic technology is still much the same. We have seen HBM introduced with AMD’s Fury series, but large scale production of HBM 2 is still to come. Samsung has released small amounts of HBM 2 to the market, but not nearly enough to handle the needs of a mass produced card. GDDR-5X is an extension of GDDR-5 that does offer more bandwidth, but it is still not a next generation memory technology like HBM 2.
By utilizing a tiler NVIDIA is able to lower memory bandwidth needs for the rasterization stage. Considering that both Maxwell and Pascal architectures are based on GDDR-5 and 5x technologies, it makes sense to save as much bandwidth as possible where they can. This is again probably one, among many, of the reasons that we saw a much larger L2 cache in Maxwell vs. Kepler (2048 KB vs. 256KB respectively). Every little bit helps when we are looking at hard, real world bandwidth limits for a modern GPU.
The area of power efficiency has also come up in discussion when going to a tiler. Tilers have traditionally been more power efficient as well due to how the raster data is tiled and cached, requiring fewer reads and writes to main memory. The first impulse is to say, “Hey, this is the reason why NVIDIA’s Maxwell was so much more power efficient than Kepler and AMD’s latest parts!” Sadly, this is not exactly true. The tiler is more power efficient, but it is a small part to the power savings on a GPU.
The second fastest Pascal based card...
A modern GPU is very complex. There are some 7.2 billion transistors on the latest Pascal GP-104 that powers the GTX 1080. The vast majority of those transistors are implemented in the shader units of the chip. While the raster units are very important, they are but a fraction of that transistor budget. The rest is taken up by power regulation, PCI-E controllers, and memory controllers. In the big scheme of things the raster portion is going to be dwarfed in power consumption by the shader units. This does not mean that they are not important though. Going back to the hated car analogy, one does not achieve weight savings by focusing on one aspect alone. It is going over every single part of the car and shaving ounces here and there, and in the end achieving significant savings by addressing every single piece of a complex product.
This does appear to be the long and short of it. This is one piece of a very complex ASIC that improves upon memory bandwidth utilization and power efficiency. It is not the whole story, but it is an important part. I find it interesting that NVIDIA did not disclose this change to editors with the introduction of Maxwell and Pascal, but if it is transparent to users and developers alike then there is no need. There is a lot of “secret sauce” that goes into each architecture, and this is merely one aspect. The one question that I do have is how much of the technology is based upon the Gigapixel IP that 3dfx bought at such a premium? I believe that particular tiler was an immediate mode renderer as well due to it not having as many driver and overhead issues that PowerVR exhibited back in the day. Obviously it would not be a copy/paste of the technology that was developed back in the 90s, it would be interesting to see if it was the basis for this current implementation.
Subject: Graphics Cards | August 1, 2016 - 10:52 PM | Scott Michaud
Tagged: nvidia, Lawsuit, GTX 980, gtx 960
Update @ 9:45pm: I heard that some AMD users were notified about their R9 purchase as well, calling it simply "R9". Since I didn't see concrete proof, I omit it from the post in case it was a hoax (as the story is still developing). I have since been notified of a tweet with an email screenshot.
Original post below:
Apparently, Newegg is informing customers that NVIDIA has settled a class action lawsuit with customers of the GeForce GTX 960 and GTX 980 cards, along with the GTX 970. It's currently unclear whether this is an error, or whether this is one of the sibling class action lawsuits that were apparently bundled together with the GTX 970 one. Users on the NVIDIA Reddit are claiming that it has to do with DirectX 12 feature level support, although that seems like knee-jerk confirmation bias to me.
Regardless, if you purchased a GeForce 900-series graphics card from Newegg, maybe even including the 980 Ti, then you should check your email. You might have a settlement en-route.
That's all we know at this point, though. Thanks to our readers for pointing this out.
Subject: Graphics Cards | August 1, 2016 - 07:39 PM | Sebastian Peak
Tagged: pascal, nvidia, notebooks, mobile gpu, mobile gaming, laptops, GTX 1080M, GTX 1070M, GTX 1060M, discrete gpu
VideoCardz is reporting that an official announcement of the rumored mobile GPUs might be coming at Gamescom later this month.
"Mobile Pascal may arrive at Gamescom in Europe. According to DigiTimes, NVIDIA would allow its notebook partners to unveil mobile Pascal between August 17th to 21st, so just when Gamescom is hosted is hosted in Germany."
We had previously reported on the rumors of a mobile GTX 1070 and 1060, and we can only assume a 1080 will also be available (though VideoCardz is not speculating on the specs of this high-end mobile card just yet).
Rumored NVIDIA Mobile Pascal GPU specs (Image credit: VideoCardz)
Gamescom runs from August 17 - 21 in Germany, so we only have to wait about three weeks to know for sure.
NVIDIA Offers Preliminary Settlement To Geforce GTX 970 Buyers In False Advertising Class Action Lawsuit
Subject: Graphics Cards | July 28, 2016 - 11:07 PM | Tim Verry
Tagged: nvidia, maxwell, GTX 970, GM204, 3.5gb memory
A recent post on Top Class Actions suggests that buyers of NVIDIA GTX 970 graphics cards may soon see a payout from a settlement agreement as part of the series of class action lawsuits facing NVIDIA over claims of false advertising. NVIDIA has reportedly offered up a preliminary settlement of $30 to "all consumers who purchased the GTX 970 graphics card" with no cap on the total payout amount along with a whopping $1.3 million in attorney's fees.
This settlement offer is in response to several class action lawsuits that consumers filed against the graphics giant following the controversy over mis-advertised specifications (particularly the number of ROP units and amount of L2 cache) and the method in which NVIDIA's GM204 GPU addressed the four total gigabytes of graphics memory.
Specifically, the graphics card specifications initially indicated that it had 64 ROPs and 2048 KB of L2 cache, but later was revealed to have only 56 ROPs and 1792 KB of L2. On the memory front, the "3.5 GB memory controvesy" spawned many memes and investigations into how the 3.5 GB and 0.5 GB pools of memory worked and how performance both real world and theoretical were affected by the memory setup.
(My opinions follow)
It was quite the PR disaster and had NVIDIA been upfront with all the correct details on specifications and the new memory implementation the controversy could have been avoided. As is though buyers were not able to make informed decisions about the card and at the end of the day that is what is important and why the lawsuits have merit.
As such, I do expect both sides to reach a settlement rather than see this come to a full trial, but it may not be exactly the $30 per buyer payout as that amount still needs to be approved by the courts to ensure that it is "fair and reasonable."
For more background on the GTX 970 memory issue (it has been awhile since this all came about after all, so you may need a refresher):
- NVIDIA Discloses Full Memory Structure and Limitations of GTX 970
- NVIDIA Responds to GTX 970 3.5GB Memory Issue
- Frame Rating: GTX 970 Memory Issues Tested in SLI
- Frame Rating: Looking at GTX 970 Memory Performance
Subject: Editorial | July 28, 2016 - 05:03 PM | Ryan Shrout
Tagged: XSPC, wings, windows 10, VR, video, titan x, tegra, Silverstone, sapphire, rx 480, Raystorm, RapidSpar, radeon pro ssg, quadro, px1, podcast, p6000, p5000, nvidia, nintendo nx, MX300, gp102, evga, dg-87, crucial, angelbird
PC Perspective Podcast #410 - 07/28/2016
Join us this week as we discuss the new Pascal based Titan X, an AMD graphics card with 1TB of SSD storage on-board, data recovery with RapidSpar and more!!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store (audio only)
- Google Play - Subscribe to our audio podcast directly through Google Play!
- RSS - Subscribe through your regular RSS reader (audio only)
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Allyn Malventano, Sebastian Peak, and Josh Walrath
Subject: Graphics Cards, Systems, Mobile | July 27, 2016 - 11:58 PM | Scott Michaud
Tagged: nvidia, Nintendo, nintendo nx, tegra, Tegra X1, tegra x2, pascal, maxwell
Okay so there's a few rumors going around, mostly from Eurogamer / DigitalFoundry, that claim the Nintendo NX is going to be powered by an NVIDIA Tegra system on a chip (SoC). DigitalFoundry, specifically, cites multiple sources who claim that their Nintendo NX development kits integrate the Tegra X1 design, as seen in the Google Pixel C. That said, the Nintendo NX release date, March 2017, does provide enough time for them to switch to NVIDIA's upcoming Pascal Tegra design, rumored to be called the Tegra X2, which uses NVIDIA's custom-designed Denver CPU cores.
Preamble aside, here's what I think about the whole situation.
First, the Tegra X1 would be quite a small jump in performance over the WiiU. The WiiU's GPU, “Latte”, has 320 shaders clocked at 550 MHz, and it was based on AMD's TeraScale 1 architecture. Because these stream processors have single-cycle multiply-add for floating point values, you can get its FLOP rating by multiplying 320 shaders, 550,000,000 cycles per second, and 2 operations per clock (one multiply and one add). This yields 352 GFLOPs. The Tegra X1 is rated at 512 GFLOPs, which is just 45% more than the previous generation.
This is a very tiny jump, unless they indeed use Pascal-based graphics. If this is the case, you will likely see a launch selection of games ported from WiiU and a few games that use whatever new feature Nintendo has. One rumor is that the console will be kind-of like the WiiU controller, with detachable controllers. If this is true, it's a bit unclear how this will affect games in a revolutionary way, but we might be missing a key bit of info that ties it all together.
As for the choice of ARM over x86... well. First, this obviously allows Nintendo to choose from a wider selection of manufacturers than AMD, Intel, and VIA, and certainly more than IBM with their previous, Power-based chips. That said, it also jives with Nintendo's interest in the mobile market. They joined The Khronos Group and I'm pretty sure they've said they are interested in Vulkan, which is becoming the high-end graphics API for Android, supported by Google and others. That said, I'm not sure how many engineers exist that specialize in ARM optimization, as most mobile platforms try to abstract this as much as possible, but this could be Nintendo's attempt to settle on a standardized instruction set, and they opted for mobile over PC (versus Sony and especially Microsoft, who want consoles to follow high-end gaming on the desktop).
Why? Well that would just be speculating on speculation about speculation. I'll stop here.
Subject: Graphics Cards | July 25, 2016 - 08:48 PM | Scott Michaud
Tagged: siggraph 2016, Siggraph, quadro, nvidia
SIGGRAPH is the big, professional graphics event of the year, bringing together tens of thousands of attendees. They include engineers from Adobe, AMD, Blender, Disney (including ILM, Pixar, etc.), NVIDIA, The Khronos Group, and many, many others. Not only are new products announced, but many technologies are explained in detail, down to the specific algorithms that are used, so colleagues can advance their own research and share in kind.
But new products will indeed be announced.
The NVIDIA Quadro P6000
NVIDIA, having just launched a few Pascal GPUs to other markets, decided to announce updates to their Quadro line at the event. Two cards have been added, the Quadro P5000 and the Quadro P6000, both at the top end of the product stack. Interestingly, both use GDDR5X memory, meaning that neither will be based on the GP100 design, which is built around HBM2 memory.
The NVIDIA Quadro P5000
The lower end one, the Quadro P5000, should look somewhat familiar to our reader. Exact clocks are not specified, but the chip has 2560 CUDA cores. This is identical to the GTX 1080, but with twice the memory: 16GB of GDDR5X.
Above it sits the Quadro P6000. This chip has 3840 CUDA cores, paired with 24GB of GDDR5X. We have not seen a GPU with exactly these specifications before. It has the same number of FP32 shaders as a fully unlocked GP100 die, but it doesn't have HBM2 memory. On the other hand, the new Titan X uses GP102, combining 3584 CUDA cores with GDDR5X memory, although only 12GB of it. This means that the Quadro P6000 has 256 more (single-precision) shader units than the Titan X, but otherwise very similar specifications.
Both graphics cards have four DisplayPort 1.4 connectors, as well as a single DVI output. These five connectors can be used to drive up to four, 4K, 120Hz monitors, or four, 5K, 60Hz ones. It would be nice if all five connections could be used at once, but what can you do.
Pascal has other benefits for professional users, too. For instance, Simultaneous Multi-Projection (SMP) is used in VR applications to essentially double the GPU's geometry processing ability. NVIDIA will be pushing professional VR at SIGGRAPH this year, also launching Iray VR. This uses light fields, rendered on devices like the DGX-1, with its eight GP100 chips connected by NVLink, to provide accurately lit environments. This is particularly useful for architectural visualization.
No price is given for either of these cards, but they will launch in October of this year.
Subject: General Tech | July 25, 2016 - 08:47 PM | Scott Michaud
Tagged: nvidia, mental ray, maya, 3D rendering
NVIDIA purchased Mental Images, the German software developer that makes the mental ray renderer, all the way back in 2007. It has been bundled with every copy of Maya for a very long time now. In fact, my license of Maya 8, which I purchased back in like, 2006, came with mental ray in both plug-in format, and stand-alone.
Interestingly, even though nearly a decade has passed since NVIDIA's acquisition, Autodesk has been the middle-person that end-users dealt with. This will end soon, as NVIDIA announced, at SIGGRAPH, that they will “be serving end users directly” with their mental ray for Maya plug-in. The new plug-in will show results directly in the viewport, starting at low quality and increasing until the view changes. They are obviously not the first company to do this, with Cycles in Blender being a good example, but I would expect that it is a welcome feature for users.
Benchmark results are by NVIDIA
At the same time, they are also announcing GI-Next. This will speed up global illumination in mental ray, and it will also reduce the number of options required to tune the results to just a single quality slider, making it easier for artists to pick up. One of their benchmarks shows a 26-fold increase in performance, although most of that can be attributed to GPU acceleration from a pair of GM200 Quadro cards. CPU-only tests of the same scene show a 4x increase, though, which is still pretty good.
The new version of mental ray for Maya is expected to ship in September, although it has been in an open beta (for existing Maya users) since February. They do say that “pricing and policies will be announced closer to availability” though, so we'll need to see, then, how different the licensing structure will be. Currently, Maya ships with a few licenses of mental ray out of the box, and has for quite some time.
Subject: Graphics Cards | July 22, 2016 - 09:51 PM | Scott Michaud
Tagged: pascal, nvidia, graphics drivers
Turns out the Pascal-based GPUs suffered from DPC latency issues, and there's been an ongoing discussion about it for a little over a month. This is not an area that I know a lot about, but it's a system that schedules workloads by priority, which provides regular windows of time for sound and video devices to update. It can be stalled by long-running driver code, though, which could manifest as stutter, audio hitches, and other performance issues. With a 10-series GeForce device installed, users have reported that this latency increases about 10-20x, from ~20us to ~300-400us. This can increase to 1000us or more under load. (8333us is ~1 whole frame at 120FPS.)
NVIDIA has acknowledged the issue and, just yesterday, released an optional hotfix. Upon installing the driver, while it could just be psychosomatic, the system felt a lot more responsive. I ran LatencyMon (DPCLat isn't compatible with Windows 8.x or Windows 10) before and after, and the latency measurement did drop significantly. It was consistently the largest source of latency, spiking in the thousands of microseconds, before the update. After the update, it was hidden by other drivers for the first night, although today it seems to have a few spikes again. That said, Microsoft's networking driver is also spiking in the ~200-300us range, so a good portion of it might be the sad state of my current OS install. I've been meaning to do a good system wipe for a while...
Measurement taken after the hotfix, while running Spotify.
That said, my computer's a mess right now.
That said, some of the post-hotfix driver spikes are reaching ~570us (mostly when I play music on Spotify through my Blue Yeti Pro). Also, Photoshop CC 2015 started complaining about graphics acceleration issues after installing the hotfix, so only install it if you're experiencing problems. About the latency, if it's not just my machine, NVIDIA might still have some work to do.
It does feel a lot better, though.