Realworldtech with Compelling Evidence
Yesterday David Kanter of Realworldtech posted a fascinating article and video that explored the two latest NVIDIA architectures and how they have branched away from traditional immediate mode rasterization. His testing revealed that with Maxwell and Pascal NVIDIA has moved to a tiled approach to rasterization. This is a significant departure for the company considering they have utilized the same basic immediate mode rasterization model since the 90s.
The Videologic Apocalypse 3Dx based on the PowerVR PCX2.
(photo courtesy of Wikipedia)
Tiling is an interesting subject and we can harken back to the PowerVR days to see where it was first implemented. There are many advantages to tiling and deferred rendering when it comes to overall efficiency in power and memory bandwidth. These first TBDRs (Tile Based Deferred Renderers) offered great performance per clock and could utilize slower memory as compared to other offerings of the day (namely Voodoo Graphics). There were some significant drawbacks to the technology. Essentially a lot of work had to be done by the CPU and driver in scene setup and geometry sorting. On fast CPU systems the PowerVR boards could provide very good performance, but they suffered on lower end CPUs as compared to the competition. This is a very simple explanation of what is going on, but the long and short of it is that TBDR did not take over the world due to limitations in its initial implementations. Traditional immediate mode rasterizers would improve in efficiency and performance with aggressive Z checks and other optimizations that borrow from the TBDR playbook.
Tiling is also present in a lot of mobile parts. Imagination’s PowerVR graphics technologies have been implemented by companies such as Intel, Apple, and Mediatek. Qualcomm (Adreno) and ARM (Mali) both implement tiler technologies to improve power consumption and performance while increasing bandwidth efficiency. Perhaps most interestingly we can remember back to the Gigapixel days with the GP-1 chip that implemented a tiling method that seemed to work very well without the CPU hit and driver overhead that had plagued the PowerVR chips up to that point. 3dfx bought Gigapixel for some $150 million at the time. 3dfx then went on to file for bankruptcy a year later and its IP was acquired by NVIDIA.
Screenshot of the program used to uncover the tiling behavior of the rasterizer.
It now appears as though NVIDIA has evolved their raster units to embrace tiling. This is not a full TBDR implementation, but rather an immediate mode tiler that still breaks the scene up into tiles but does not implement deferred rendering. This change should improve bandwidth efficiency when it comes to rasterization, but it does not force the rest of the graphics pipeline to be deferred (tessellation, geometry setup, shaders, etc. are not impacted). NVIDIA has not done a deep dive on this change for editors, so we do not know the exact implementation or what advantages we can expect. We can look at the evidence we have and speculate where those advantages exist.
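To make the idea of breaking a scene into tiles a little more concrete, here is a minimal sketch of how a tiler might sort triangles into screen tiles. NVIDIA has not disclosed its implementation, so everything here is an assumption for illustration only: the 16-pixel tile size, the `bin_triangles` name, and the conservative bounding-box test are all hypothetical.

```python
from collections import defaultdict


def bin_triangles(triangles, width, height, tile=16):
    """Map each screen tile (tx, ty) to the indices of triangles that may touch it.

    Each triangle is a list of three (x, y) screen-space vertices. The test is
    conservative: a triangle is assigned to every tile its bounding box overlaps.
    """
    bins = defaultdict(list)
    max_tx = (width - 1) // tile   # index of the last tile column
    max_ty = (height - 1) // tile  # index of the last tile row

    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]

        # Skip triangles whose bounding box lies entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue

        # Clamp the bounding box to the range of valid tile indices.
        tx0 = max(0, int(min(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)

        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)

    return dict(bins)
```

A deferred tiler would bin the geometry for the whole frame before shading anything. An immediate mode tiler, as described above, would presumably run something like this over smaller batches of geometry as they arrive, keeping each tile's pixels on chip while that batch is rasterized.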
The video where David Kanter explains his findings
Bandwidth and Power
Tilers have typically taken the tiled regions and buffered them on the chip. This is a big improvement in both performance and power efficiency as the raster data does not have to be cached and written out to the frame buffer and then swapped back. This makes quite a bit of sense considering the overall lack of big jumps in memory technologies over the past five years. We have had GDDR-5 since 2007/2008. The speeds have increased over time, but the basic technology is still much the same. We have seen HBM introduced with AMD’s Fury series, but large scale production of HBM 2 is still to come. Samsung has released small amounts of HBM 2 to the market, but not nearly enough to handle the needs of a mass produced card. GDDR-5X is an extension of GDDR-5 that does offer more bandwidth, but it is still not a next generation memory technology like HBM 2.
By utilizing a tiler NVIDIA is able to lower memory bandwidth needs for the rasterization stage. Considering that the Maxwell and Pascal architectures are based on GDDR-5 and GDDR-5X technologies, it makes sense to save as much bandwidth as possible where they can. This is probably one of the many reasons that we saw a much larger L2 cache in Maxwell vs. Kepler (2048 KB vs. 256 KB respectively). Every little bit helps when we are looking at hard, real world bandwidth limits for a modern GPU.
The area of power efficiency has also come up in discussion when going to a tiler. Tilers have traditionally been more power efficient as well due to how the raster data is tiled and cached, requiring fewer reads and writes to main memory. The first impulse is to say, “Hey, this is the reason why NVIDIA’s Maxwell was so much more power efficient than Kepler and AMD’s latest parts!” Sadly, this is not exactly true. The tiler is more power efficient, but it is a small part of the power savings on a GPU.
The second fastest Pascal based card...
A modern GPU is very complex. There are some 7.2 billion transistors on the latest Pascal GP-104 that powers the GTX 1080. The vast majority of those transistors are implemented in the shader units of the chip. While the raster units are very important, they are but a fraction of that transistor budget. The rest is taken up by power regulation, PCI-E controllers, and memory controllers. In the big scheme of things the raster portion is going to be dwarfed in power consumption by the shader units. This does not mean that they are not important though. Going back to the hated car analogy, one does not achieve weight savings by focusing on one aspect alone. It is going over every single part of the car and shaving ounces here and there, and in the end achieving significant savings by addressing every single piece of a complex product.
This does appear to be the long and short of it. This is one piece of a very complex ASIC that improves upon memory bandwidth utilization and power efficiency. It is not the whole story, but it is an important part. I find it interesting that NVIDIA did not disclose this change to editors with the introduction of Maxwell and Pascal, but if it is transparent to users and developers alike then there is no need. There is a lot of “secret sauce” that goes into each architecture, and this is merely one aspect. The one question that I do have is how much of the technology is based upon the Gigapixel IP that 3dfx bought at such a premium? I believe that particular tiler was an immediate mode renderer as well, as it did not have as many of the driver and overhead issues that PowerVR exhibited back in the day. Obviously it would not be a copy/paste of the technology that was developed back in the 90s, but it would be interesting to see if it was the basis for this current implementation.
Subject: Graphics Cards | August 2, 2016 - 07:37 AM | Scott Michaud
Tagged: windows 10, vulkan, microsoft, DirectX 12
Update (August 3rd @ 4:30pm): Turns out Khronos Group announced at SIGGRAPH that Subgroup Instructions have been recently added to SPIR-V (skip video to 21:30), and are a "top priority" for "Vulkan Next". Some (like WaveBallot) are already ARB (multi-vendor) OpenGL extensions, too.
Original post below:
DirectX 12's shading language will receive some new functionality with the new Shader Model 6.0. According to their GDC talks, it is looking like it will be structured similar to SPIR-V in how it's compiled and ingested. Code will be compiled and optimized as an LLVM-style bytecode, which the driver will accept and execute on the GPU. This could make it easy to write DX12-compatible shader code in other languages, like C++, which is a direction that Vulkan is heading, but Microsoft hasn't seemed to announce that yet.
This news shows a bit more of the nitty gritty details. It looks like they added 16-bit signed (short) and unsigned (ushort) integers, which might provide a performance improvement on certain architectures (although I'm not sure that it's new and/or that GPUs exist that natively operate upon them) because they operate on half as much data as a standard, 32-bit integer. They have also added more functionality, to both the pixel and compute shaders, to operate across multiple threads, called lanes, similar to OpenCL. This should allow algorithms to work more efficiently in blocks of pixels, rather than needing to use one of a handful of fixed function calls (ex: partial derivatives ddx and ddy) to see outside their thread.
When will this land? No idea, but it is conspicuously close to the Anniversary Update. It has been added to Feature Level 12.0, so its GPU support should be pretty good. Also, Vulkan exists, doing its thing. Not sure how these functions overlap with SPIR-V's feature set, but, since SPIR was originally for OpenCL, it could be just sitting there for all I know.
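For a rough feel of what cross-lane operations buy you, here is a small CPU-side simulation in Python. It is not shader code and does not use any real DirectX or Vulkan API; the function names (`wave_sum`, `wave_ballot`, `wave_read_lane`) are hypothetical stand-ins, and a "wave" is simply modeled as a list with one entry per lane.

```python
def wave_sum(lane_values):
    """Every lane receives the sum of the values held by all lanes in the wave."""
    total = sum(lane_values)
    return [total] * len(lane_values)


def wave_ballot(lane_predicates):
    """Pack one boolean per lane into a bitmask (bit N is set if lane N voted True)."""
    mask = 0
    for lane, flag in enumerate(lane_predicates):
        if flag:
            mask |= 1 << lane
    return mask


def wave_read_lane(lane_values, source_lane):
    """Every lane receives the value held by one chosen lane."""
    return [lane_values[source_lane]] * len(lane_values)
```

For example, `wave_sum([1, 2, 3, 4])` hands the value 10 to all four lanes in one step, where a shader without such operations would have to round-trip through shared or global memory to see its neighbors' data.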
Subject: Graphics Cards | August 2, 2016 - 03:50 AM | Tim Verry
Tagged: sapphire, rx 460, polaris 11, nitro, amd
AMD and its board partners will officially launch the first Polaris 11 GPU and the Radeon RX 460 graphics cards based around that processor on August 8th. Fortunately Videocardz.com got a hold of an image that shows off Sapphire's take on the RX 460 in the form of a factory overclocked and custom cooled RX 460 Nitro OC. This gives us a hint at the kinds of cards we can expect and it appears to be good news for budget gamers as it suggests that there will be several options around this firm $100 price point that are a bit more than the bare necessities.
In the case of Sapphire's RX 460 Nitro OC, it uses a custom dual fan cooler with two copper heatpipes, an aluminum fin stack (that is much larger than reference), and two 90mm fans. Display IO includes one DVI, one HDMI, and one DisplayPort. The card itself uses a physical PCI-E x16 connector that is electrically PCI-E 3.0 x8. The x8 connection will be more than enough for this GPU though it also enables partners to cut costs.
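As a back-of-the-envelope check on what that x8 link provides, the sketch below computes theoretical PCI-E 3.0 bandwidth per direction from the standard figures of 8 GT/s per lane and 128b/130b encoding. The function name is hypothetical, and real-world throughput will be lower once protocol overhead is included.

```python
def pcie3_bandwidth_gbs(lanes):
    """Theoretical PCI-E 3.0 bandwidth in GB/s, per direction, for a given lane count."""
    transfers_per_second = 8e9        # 8 GT/s per lane
    encoding_efficiency = 128 / 130   # 128b/130b line encoding
    bits_per_second = transfers_per_second * encoding_efficiency * lanes
    return bits_per_second / 8 / 1e9  # bits -> bytes -> GB


x8 = pcie3_bandwidth_gbs(8)    # roughly 7.88 GB/s
x16 = pcie3_bandwidth_gbs(16)  # roughly 15.75 GB/s
```

An electrical x8 link therefore offers about half the theoretical bandwidth of a full x16 slot.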
Clockspeeds are not yet known, but the Polaris 11 GPU (896 cores, 56 TMUs, 16 ROPs) will be paired with 4GB GDDR5 memory.
It is encouraging to me to see custom cards at this price point out of the gate with the full 4GB of memory (AMD allows 2GB or 4GB versions). Gamers that simply can't justify spending much more than a hundred dollars on a GPU should have ample options to choose from and I am looking forward to seeing what all the partners have to offer.
Are you looking at Polaris 11 and the RX 460 for a super budget gaming build? What do you think about Sapphire's card with the company's custom cooler?
Subject: Graphics Cards | August 1, 2016 - 06:52 PM | Scott Michaud
Tagged: nvidia, Lawsuit, GTX 980, gtx 960
Update @ 9:45pm: I heard that some AMD users were notified about their R9 purchase as well, calling it simply "R9". Since I didn't see concrete proof, I omitted it from the post in case it was a hoax (as the story is still developing). I have since been notified of a tweet with an email screenshot.
Original post below:
Apparently, Newegg is informing customers that NVIDIA has settled a class action lawsuit with customers of the GeForce GTX 960 and GTX 980 cards, along with the GTX 970. It's currently unclear whether this is an error, or whether this is one of the sibling class action lawsuits that were apparently bundled together with the GTX 970 one. Users on the NVIDIA Reddit are claiming that it has to do with DirectX 12 feature level support, although that seems like knee-jerk confirmation bias to me.
Regardless, if you purchased a GeForce 900-series graphics card from Newegg, maybe even including the 980 Ti, then you should check your email. You might have a settlement en-route.
That's all we know at this point, though. Thanks to our readers for pointing this out.
Subject: Graphics Cards | August 1, 2016 - 03:39 PM | Sebastian Peak
Tagged: pascal, nvidia, notebooks, mobile gpu, mobile gaming, laptops, GTX 1080M, GTX 1070M, GTX 1060M, discrete gpu
VideoCardz is reporting that an official announcement of the rumored mobile GPUs might be coming at Gamescom later this month.
"Mobile Pascal may arrive at Gamescom in Europe. According to DigiTimes, NVIDIA would allow its notebook partners to unveil mobile Pascal between August 17th to 21st, so just when Gamescom is hosted in Germany."
We had previously reported on the rumors of a mobile GTX 1070 and 1060, and we can only assume a 1080 will also be available (though VideoCardz is not speculating on the specs of this high-end mobile card just yet).
Rumored NVIDIA Mobile Pascal GPU specs (Image credit: VideoCardz)
Gamescom runs from August 17 - 21 in Germany, so we only have to wait about three weeks to know for sure.
Subject: Graphics Cards | August 1, 2016 - 10:16 AM | Sebastian Peak
Tagged: amd, radeon, radeon software, Crimson Edition 16.7.3, driver, graphics, update, rx480, rise of the tomb raider
AMD has released the Radeon Software Crimson Edition 16.7.3 driver, with improved performance in Rise of the Tomb Raider for Radeon RX 480 owners, as well as various bug fixes.
Radeon Software Crimson Edition is AMD's revolutionary new graphics software that delivers redesigned functionality, supercharged graphics performance, remarkable new features, and innovation that redefines the overall user experience. Every Radeon Software release strives to deliver new features, better performance and stability improvements.
Radeon Software Crimson Edition 16.7.3 Highlights
Rise of the Tomb Raider performance increase up to 10% versus Radeon Software Crimson Edition 16.7.2 on Radeon RX 480 graphics
Subject: Graphics Cards | July 30, 2016 - 11:35 PM | Tim Verry
Tagged: xfx, rx 470, polaris 10, Double Dissipation Edition, amd
AMD's budget (under $200) Polaris-based graphics cards are coming next week, and the leaks are starting to appear online. In the case of the Radeon RX 470, AMD is expecting that most (if not all) of its board partners will be using their own custom coolers. Thanks to Chinese technology site EXPReview, we finally have an idea of what an RX 470 will look like – or at least what an XFX-branded RX 470 will look like!
The website posted several photos of the alleged (but likely legitimate) XFX RX 470 "Black Wolf" graphics card which will probably be branded as the XFX RX 470 Double Dissipation in North America. This is a dual slot card with dual fan cooler that measures 9.45 inches long. Three copper heat pipes pull heat into an aluminum heatsink that is cooled by two 80mm fans that can reportedly be removed by the user for cleaning (and maybe user RMA replacement like Sapphire is planning). The card also features a full backplate and LED-backlit XFX logo along the side of the card. The design is all black with a white XFX logo.
Video outputs include three DisplayPort 1.4, one HDMI 2.0b, and one DL-DVI which seems about right for this price point.
The card is powered by a single 6-pin PCI-E power connector and the card will use AMD's RX 470 GPU and 4GB of GDDR5 memory. The RX 470 features 2048 cores, 128 texture units, and 32 raster operators. This is essentially a RX 480 GPU with four fewer Compute Units, though it maintains the same number of ROPs and the same 256-bit memory bus. We do not know clockspeeds on this custom cooled XFX card yet, but overclockers may well be able to push clocks further than they could on RX 480 (with fewer cores active, the chips may have more clock headroom), though it is hard to say right now. I would expect out of the box clocks to be a bit above the reference RX 470 clocks of 926 MHz base and 1206 MHz boost.
You can check out all of the photos of this card here.
Stay tuned to PC Perspective for more RX 470 and RX 460 news as we near the official launch dates!
- AMD Details the RX 470 and RX 460 Graphics Cards, Coming in August
- The AMD Radeon RX 480 Review - The Polaris Promise
Subject: Graphics Cards | July 29, 2016 - 02:51 AM | Tim Verry
Tagged: water cooling, RGB LED, phanteks, GPU Water Block
Phanteks, a company that produces cases, CPU coolers, and fans, has unveiled its first GPU cooler in the form of a full cover water block for Nvidia's GTX 1080 Founders Edition (and any partner PCBs that use the reference design) graphics card. The PH-GB1080-X is a full cover nickel plated copper block with acrylic top and black (aluminum?) accents on the edges of the block. There are two ports for inlet/outlet on both top and bottom (so users could SLI multiple cards and water cool in series or parallel). Phanteks allegedly uses Dupont Viton for the gaskets which is a "high-performance seal elastomer" for the aerospace industry (and overkill for the temps that will be seen in a PC water loop heh).
In addition to the acrylic top, users can plug in three (1mm) RGB LEDs into the bottom edge of the card to add a glow effect. Oddly, Phanteks shows the LEDs using three individual cables that then go off to a reported proprietary power adapter that can plug into RGB motherboards or Phanteks' cases. Having the LEDs running off of a single cable (or bundled together) coming off the back edge of the card closest to the motherboard would have been helpful to cable management!
Phanteks' new water block is available for pre-order now for $129.99.
Using a water block on the GTX 1080 should allow users to easily achieve above 2000 MHz GPU clocks and have the card clockspeeds be much more stable than on air. Gamer's Nexus tested their GTX 1080 with an EVGA all in one cooler and managed to crank the GPU clockspeeds up to 2164 MHz and the memory clockspeeds up to 5602 MHz. That 2164 MHz clockspeed is quite the overclock and while it was only a bit above what they achieved on air, the clocks were much more stable and actually able to be maintained during long gaming sessions unlike on air. A custom water loop and a water block like the one Phanteks is selling should do just as well as Gamer's Nexus' results if not ever so slightly better.
If you already have a water loop in your system and have been waiting for a block to go with your GTX 1080 you now have another option!
Subject: Graphics Cards | July 29, 2016 - 01:09 AM | Ryan Shrout
Tagged: rx 470, rx 460, radeon, polaris 11, polaris 10, Polaris, amd
We know pretty much all there is to know about AMD's new Polaris architecture thanks to our Radeon RX 480 review, but AMD is taking the covers off of the lower priced, lower performance products based on the same architecture tonight. We previously covered AMD's launch event in Australia where the company officially introduced the Polaris 10 RX 470 and Polaris 11 RX 460 and talked about the broader specifications. Now, we have a bit more information to share on specifics and release dates. Specifically, AMD's RX 470 will launch on Thursday, August 4th and the RX 460 will launch on the following Monday, August 8th.
First up is the Radeon RX 470, based on the same Polaris 10 GPU as the RX 480, but with some CUs disabled to lower performance and increase yields.
This card is aimed at 1080p gaming at top quality settings with AA enabled at 60 FPS. Obviously that is a very vague statement, but it gives you an idea of what price point and target segment the RX 470 is going after.
The only comparison we have from AMD pits the upcoming RX 470 against the R9 270, where Polaris offers a range from 1.5x to 2.4x improvement in a handful of titles, which include DX12 and Vulkan enabled games, of course.
From a specifications standpoint, the RX 470 will include 2048 stream processors running at a base clock of 926 MHz and a rated boost frequency of 1206 MHz. That gives us 4.9 TFLOPS of theoretical peak performance to pair with a 6.6 Gbps memory interface capable of 211 GB/s of peak bandwidth. With a 4GB frame buffer and a 120 watt TDP, the RX 470 should offer some compelling performance in the ~$150 price segment (this price is just a guess on my part... though yields should be better – they can salvage RX 480s – and partners being able to use memory chips that do not have to hit 8 Gbps should help to lower costs).
Going down another step to the Radeon RX 460, AMD is targeting this card at 1080p resolutions at "high" image quality settings. The obvious game categories here are eSports titles like MOBAs, CS: Go, Overwatch, etc.
Again, AMD provides a comparison to other AMD hardware: in this case the R7 260X. You'll find a 1.2x to 1.3x performance improvement in these types of titles. Clearly we want to know where the performance rests against the GeForce line but this comparison seems somewhat modest.
Based on the smaller Polaris 11 GPU, which is a new chip that we have not seen before, the RX 460 features up to 2.2 TFLOPS of computing capability with 896 stream processors (14 CUs enabled out of 16 total in full Polaris 11) running between 1090 MHz and 1200 MHz. The memory system is actually running faster on the RX 460 than the RX 470, though with half the memory bus width at 128-bits. The TDP of this card is sub-75 watts and thus we should find cards that don't require any kind of external power. The RX 460 GPU will be used in desktop cards as well as notebooks (though with lower TDPs and clocks).
The chart below outlines the comparison between the three known Polaris graphics processors.
| | RX 480 | RX 470 | RX 460 |
|---|---|---|---|
| GPU Clock (Base) | 1120 MHz | 926 MHz | 1090 MHz |
| GPU Clock (Boost) | 1266 MHz | 1206 MHz | 1200 MHz |
| Memory | 4 or 8 GB GDDR5 | 4 or 8 GB GDDR5 | 2 or 4 GB GDDR5 |
| Memory Bandwidth | 256 GB/s | 211 GB/s | 112 GB/s |
| GPU | Polaris 10 | Polaris 10 | Polaris 11 |
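The headline numbers quoted for these cards can be reproduced with some quick arithmetic. The sketch below assumes the usual convention of two floating point operations (one fused multiply-add) per stream processor per clock, and that the RX 470 uses the same 256-bit bus as the RX 480. The function names are hypothetical, and the per-pin data rate computed for the RX 460 is inferred from the table rather than quoted by AMD.

```python
def peak_tflops(stream_processors, clock_mhz):
    """Theoretical peak TFLOPS, assuming 2 FLOPs (one FMA) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz / 1e6


def bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s from per-pin data rate and bus width."""
    return data_rate_gbps * bus_width_bits / 8


def data_rate_gbps(bandwidth, bus_width_bits):
    """Per-pin data rate in Gbps implied by a bandwidth figure and a bus width."""
    return bandwidth * 8 / bus_width_bits


rx470_tflops = peak_tflops(2048, 1206)   # about 4.94, quoted as 4.9 TFLOPS
rx460_tflops = peak_tflops(896, 1200)    # about 2.15, quoted as "up to 2.2" TFLOPS
rx470_bw = bandwidth_gbs(6.6, 256)       # about 211 GB/s
rx460_rate = data_rate_gbps(112, 128)    # inferred per-pin rate for the RX 460
rx480_rate = data_rate_gbps(256, 256)    # matches the 8 Gbps mentioned for the RX 480
```

The inferred RX 460 data rate works out to 7 Gbps, which lines up with the note that its memory runs faster than the 6.6 Gbps of the RX 470 despite having half the bus width.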
There is still much to learn about these new products, most importantly, prices. AMD is still shying away from telling us that important data point. The RX 470 will be on sale and will have reviews on August 4th, with the RX 460 following that on August 8th, so we'll have details and costs in our hands very soon.
It is not clear how many or what kinds of cards we can expect to see on the August 4th and August 8th release days, though it would stand to reason that they will be mostly based upon reference designs, especially for the RX 460 (though Gamer's Nexus did spot a dual fan Sapphire card). With that said, we may see custom cooled RX 470 graphics cards because, while AMD does technically have a reference design with a blower style cooler, the company expects most if not all of its partners to go their own direction with this board, including their own single and dual fan coolers.
For gamers looking to buy into the truly budget card segment, stay tuned just a little longer!
NVIDIA Offers Preliminary Settlement To Geforce GTX 970 Buyers In False Advertising Class Action Lawsuit
Subject: Graphics Cards | July 28, 2016 - 07:07 PM | Tim Verry
Tagged: nvidia, maxwell, GTX 970, GM204, 3.5gb memory
A recent post on Top Class Actions suggests that buyers of NVIDIA GTX 970 graphics cards may soon see a payout from a settlement agreement as part of the series of class action lawsuits facing NVIDIA over claims of false advertising. NVIDIA has reportedly offered up a preliminary settlement of $30 to "all consumers who purchased the GTX 970 graphics card" with no cap on the total payout amount along with a whopping $1.3 million in attorney's fees.
This settlement offer is in response to several class action lawsuits that consumers filed against the graphics giant following the controversy over mis-advertised specifications (particularly the number of ROP units and amount of L2 cache) and the method in which NVIDIA's GM204 GPU addressed the four total gigabytes of graphics memory.
Specifically, the graphics card specifications initially indicated that it had 64 ROPs and 2048 KB of L2 cache, but it was later revealed to have only 56 ROPs and 1792 KB of L2. On the memory front, the "3.5 GB memory controversy" spawned many memes and investigations into how the 3.5 GB and 0.5 GB pools of memory worked and how performance, both real world and theoretical, was affected by the memory setup.
(My opinions follow)
It was quite the PR disaster, and had NVIDIA been upfront with all the correct details on specifications and the new memory implementation, the controversy could have been avoided. As it is, though, buyers were not able to make informed decisions about the card, and at the end of the day that is what is important and why the lawsuits have merit.
As such, I do expect both sides to reach a settlement rather than see this come to a full trial, but it may not be exactly the $30 per buyer payout as that amount still needs to be approved by the courts to ensure that it is "fair and reasonable."
For more background on the GTX 970 memory issue (it has been a while since this all came about after all, so you may need a refresher):
- NVIDIA Discloses Full Memory Structure and Limitations of GTX 970
- NVIDIA Responds to GTX 970 3.5GB Memory Issue
- Frame Rating: GTX 970 Memory Issues Tested in SLI
- Frame Rating: Looking at GTX 970 Memory Performance