Subject: Editorial | August 18, 2016 - 02:20 PM | Ryan Shrout
Tagged: video, podcast, pascal, nvidia, msi, mobile, Intel, idf, GTX 1080, gtx 1070, gtx 1060, gigabyte, FMS, Flash Memory Summit, asus, arm, 10nm
PC Perspective Podcast #413 - 08/18/2016
Join us this week as we discuss the new mobile GeForce GTX 10-series gaming notebooks, ARM and Intel partnering on 10nm, Flash Memory Summit and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store (audio only)
- Google Play - Subscribe to our audio podcast directly through Google Play!
- RSS - Subscribe through your regular RSS reader (audio only)
- MP3 - Direct download link to the MP3 file
Hosts: Allyn Malventano, Sebastian Peak, Josh Walrath and Jeremy Hellstrom
Week in Review:
This episode of PC Perspective is brought to you by Casper!! Use code “PCPER”
News items of interest:
0:42:05 Final news from FMS 2016
Hardware/Software Picks of the Week
It always feels a little odd when covering NVIDIA’s quarterly earnings due to how they present their financial calendar. No, we are not reporting from the future. Yes, it can be confusing when comparing results and getting your dates mixed up. Regardless of the date before the earnings, NVIDIA did exceptionally well in a quarter that is typically the second weakest after Q1.
NVIDIA reported revenue of $1.43 billion. This is a jump from an already strong Q1 where they took in $1.30 billion. Compare this to the $1.027 billion of its competitor AMD, which provides CPUs as well as GPUs. NVIDIA sold a lot of GPUs as well as other products. Their primary money makers were consumer GPUs and the professional and compute markets, on which they have a virtual stranglehold at the moment. The company’s GAAP net income is a very respectable $253 million.
The release of the latest Pascal based GPUs was the primary mover for the gains for this latest quarter. AMD has had a hard time competing with NVIDIA for marketshare. The older Maxwell based chips performed well against the entire line of AMD offerings and typically did so with better power and heat characteristics. Even though the GTX 970 was somewhat limited in its memory configuration as compared to the AMD products (3.5 GB + .5 GB vs. a full 4 GB implementation) it was a top seller in its class. The same could be said for the products up and down the stack.
Pascal was released at the end of May, but the company had been shipping chips to its partners as well as creating the “Founder’s Edition” models to its exacting specifications. These were strong sellers from the end of May through the end of the quarter. NVIDIA recently unveiled their latest Pascal based Quadro cards, but we do not know how much of an impact those have had on this quarter. NVIDIA has also been shipping, in very limited quantities, the Tesla P100 based units to select customers and outfits.
Is Enterprise Ascending Outside of Consumer Viability?
So a couple of weeks have gone by since the Quadro P6000 (update: was announced) and the new Titan X launched. With them, we received a new chip: GP102. Since Fermi, NVIDIA has labeled their GPU designs with a G, followed by a single letter for the architecture (F, K, M, or P for Fermi, Kepler, Maxwell, and Pascal, respectively), which is then followed by a three digit number. The last digit is the most relevant one, however, as it separates designs by their intended size.
Typically, 0 corresponds to a ~550-600mm2 design, which is about as large a design as fabrication labs can create without error-prone techniques, like multiple exposures (update for clarity: trying to precisely overlap multiple designs to form a larger integrated circuit). 4 corresponds to ~300mm2, although GM204 was pretty large at 398mm2, which was likely to increase the core count while remaining on a 28nm process. Higher numbers, like 6 or 7, fill back the lower-end SKUs until NVIDIA essentially stops caring for that generation. So when we moved to Pascal, jumping two whole process nodes, NVIDIA looked at their wristwatches and said “about time to make another 300mm2 part, I guess?”
The GTX 1080 and the GTX 1070 (GP104, 314mm2) were born.
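The naming scheme described above can be sketched as a small parser. This is only an illustration of the convention as this article describes it: the function name `describe_chip` and the wording of the size classes are our own, and last digits the article does not discuss are reported as such rather than guessed.

```python
import re

# Architecture letters exactly as listed in the article.
ARCHITECTURES = {"F": "Fermi", "K": "Kepler", "M": "Maxwell", "P": "Pascal"}

# Size classes keyed by the last digit, paraphrased from the article.
SIZE_CLASS = {
    "0": "largest die (~550-600 mm2)",
    "4": "mid-size die (~300 mm2)",
    "6": "lower-end die",
    "7": "lower-end die",
}

# "G", one architecture letter, then a three digit number.
CHIP_NAME = re.compile(r"^G([FKMP])(\d{3})$")

def describe_chip(name):
    """Split a Fermi-or-later chip name into architecture and size class."""
    match = CHIP_NAME.match(name.upper())
    if match is None:
        raise ValueError(f"not a Fermi-or-later chip name: {name!r}")
    letter, number = match.groups()
    return {
        "architecture": ARCHITECTURES[letter],
        "number": number,
        "size_class": SIZE_CLASS.get(number[-1], "not described in the article"),
    }

print(describe_chip("GP104"))
print(describe_chip("GM200"))
```

Older names such as G80 do not fit the pattern and raise a `ValueError`, which matches the article's "since Fermi" qualifier.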
NVIDIA already announced a 600mm2 part, though. The GP100 had 3840 CUDA cores, HBM2 memory, and an ideal ratio of 1:2:4 between FP64:FP32:FP16 performance. (A 64-bit chunk of memory can store one 64-bit value, two 32-bit values, or four 16-bit values, unless the register is attached to logic circuits that, while smaller, don't know how to operate on the data.) This increased ratio, even over Kepler's 1:6 FP64:FP32, is great for GPU compute, but wasted die area for today's (and tomorrow's) games. I'm predicting that it takes the wind out of Intel's sales, as Xeon Phi's 1:2 FP64:FP32 performance ratio is one of its major selling points, leading to its inclusion in many supercomputers.
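The parenthetical about a 64-bit chunk of memory is easy to check on a CPU. The snippet below is only a storage-size illustration using Python's standard `struct` module (the `e` format code is IEEE half precision); it says nothing about how GP100's registers or math units are actually wired.

```python
import struct

# One 64-bit (8-byte) slot can hold one FP64, two FP32, or four FP16 values.
as_fp64 = struct.pack("<d", 1.5)
as_fp32 = struct.pack("<2f", 1.5, 2.5)
as_fp16 = struct.pack("<4e", 1.5, 2.5, 3.5, 4.5)

for label, blob in [("FP64", as_fp64), ("FP32", as_fp32), ("FP16", as_fp16)]:
    print(label, len(blob), "bytes")
```

All three buffers are 8 bytes long, which is where the ideal 1:2:4 throughput ratio comes from when the logic can operate on every packed value.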
Even though the HBM2 memory controller is supposedly smaller than a GDDR5(X) one, NVIDIA could still save die space while providing 3840 CUDA cores (despite disabling a few on Titan X). The trade-off is that FP64 and FP16 performance had to decrease dramatically, from 1:2 and 2:1 relative to FP32, all the way down to 1:32 and 1:64. This new design comes in at 471mm2, although it's $200 more expensive than what the 600mm2 products, GK110 and GM200, launched at. Smaller dies provide more products per wafer, and, better, the number of defective chips should be relatively constant.
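To put rough numbers on the dies-per-wafer point, here is a back-of-the-envelope sketch. It uses a common textbook approximation for candidate dies on a round wafer and a simple Poisson yield model. The 300mm wafer size and the defect density are assumptions chosen for illustration; they are not TSMC or NVIDIA figures, and real yields depend on much more than die area.

```python
import math

WAFER_DIAMETER_MM = 300.0   # assumed wafer size, not from the article
DEFECTS_PER_MM2 = 0.001     # hypothetical defect density, illustration only

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Approximation: wafer area over die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

def poisson_yield(die_area_mm2, defects_per_mm2=DEFECTS_PER_MM2):
    """Fraction of dies expected to have zero defects under a Poisson model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

for name, area in [("600 mm2 class", 600), ("GP102", 471), ("GP104", 314)]:
    candidates = dies_per_wafer(area)
    good = candidates * poisson_yield(area)
    print(f"{name}: {candidates} candidates, ~{good:.0f} good dies per wafer")
```

Whatever defect density you plug in, the smaller die wins twice: more candidates fit on the wafer, and each candidate is less likely to catch a defect.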
Anyway, that aside, it puts NVIDIA in an interesting position. Splitting the xx0-class chip into xx0 and xx2 designs allows NVIDIA to lower the cost of their high-end gaming parts, although it cuts out hobbyists who buy a Titan for double-precision compute. More interestingly, it leaves around 150mm2 for AMD to sneak in a design that's FP32-centric, leaving them a potential performance crown.
Image Credit: ExtremeTech
On the other hand, as fabrication node changes are becoming less frequent, it's possible that NVIDIA could be leaving itself room for Volta, too. Last month, it was rumored that NVIDIA would release two architectures at 16nm, in the same way that Maxwell shared 28nm with Kepler. In this case, Volta, on top of whatever other architectural advancements NVIDIA rolls into that design, can also grow a little in size. At that time, TSMC would have better yields, making a 600mm2 design less costly in terms of waste and recovery.
If this is the case, we could see the GPGPU folks receiving a new architecture once every second gaming (and professional graphics) architecture. That is, unless you are a hobbyist. If you are? I would need to be wrong, or NVIDIA would need to somehow bring their enterprise SKU into an affordable price point. The xx0 class seems to have been pushed up and out of viability for consumers.
Or, again, I could just be wrong.
Subject: General Tech | August 17, 2016 - 12:41 PM | Jeremy Hellstrom
Tagged: nvidia, Intel, HPC, Xeon Phi, maxwell, pascal, dirty pool
There is a spat going on between Intel and NVIDIA over the slide below, as you can read about over at Ars Technica. It seems that Intel have reached into the industry's bag of dirty tricks and polished off an old standby, testing new hardware and software against older products from their competitors. In this case it was high performance computing products which were tested, Intel's new Xeon Phi against NVIDIA's Maxwell, tested on an older version of the Caffe AlexNet benchmark.
NVIDIA points out that not only would they have done better than Intel if an up-to-date version of the benchmarking software was used, but that the comparison should have been against their current architecture, Pascal. This is not quite as bad as putting undocumented flags into compilers to reduce the performance of competitors' chips or predatory discount programs, but it shows that the computer industry continues to have only a passing acquaintance with fair play and honest competition.
"At this juncture I should point out that juicing benchmarks is, rather sadly, par for the course. Whenever a chip maker provides its own performance figures, they are almost always tailored to the strength of a specific chip—or alternatively, structured in such a way as to exacerbate the weakness of a competitor's product."
Here is some more Tech News from around the web:
- USB Implementers Forum introduces branding for safe USB-C charging @ The Inquirer
- Some Windows 10 Anniversary Update: SSD freeze @ The Register
- Intel Project Alloy: all-in-one VR headset takes aim at Google's Project Daydream @ The Inquirer
- Wanna build your own drone? Intel emits Linux-powered x86 brains for DIY flying gizmos @ The Register
- Intel's Optane XPoint DIMMs pushed back – source @ The Register
Take your Pascal on the go
Easily the strongest growth segment in PC hardware today is in the adoption of gaming notebooks. Ask companies like MSI and ASUS, even Gigabyte, as they now make more models and sell more units of notebooks with a dedicated GPU than ever before. Both AMD and NVIDIA agree on this point and it’s something that AMD was adamant in discussing during the launch of the Polaris architecture.
Both AMD and NVIDIA predict massive annual growth in this market – somewhere on the order of 25-30%. For an overall culture that continues to believe the PC is dying, seeing projected growth this strong in any segment is not only amazing, but welcome to those of us that depend on it. AMD and NVIDIA have different goals here: GeForce products already have 90-95% market share in discrete gaming notebooks. In order for NVIDIA to see growth in sales, the total market needs to grow. For AMD, simply taking back a portion of those users and design wins would help its bottom line.
But despite AMD’s early talk about getting Polaris 10 and 11 in mobile platforms, it’s NVIDIA again striking first. Gaming notebooks with Pascal GPUs in them will be available today, from nearly every system vendor you would consider buying from: ASUS, MSI, Gigabyte, Alienware, Razer, etc. NVIDIA claims to have quicker adoption of this product family in notebooks than in any previous generation. That’s great news for NVIDIA, but might leave AMD looking in from the outside yet again.
Technologically speaking though, this makes sense. Despite the improvement that Polaris made on the GCN architecture, Pascal is still more powerful and more power efficient than anything AMD has been able to produce. Looking solely at performance per watt, which is really the defining trait of mobile designs, Pascal is as dominant over Polaris as Maxwell was to Fiji. And this time around NVIDIA isn’t messing with cut back parts that have brand changes – GeForce is diving directly into gaming notebooks in a way we have only seen with one release.
The ASUS G752VS OC Edition with GTX 1070
Do you remember our initial look at the mobile variant of the GeForce GTX 980? Not the GTX 980M mind you, the full GM204 operating in notebooks. That was basically a dry run for what we see today: NVIDIA will be releasing the GeForce GTX 1080, GTX 1070 and GTX 1060 to notebooks.
Subject: Systems, Mobile | August 16, 2016 - 12:00 AM | Sebastian Peak
Tagged: pascal, nvidia, notebook, msi, GTX 1080, gtx 1070, gtx 1060, gaming laptop, gaming
MSI has updated their gaming notebook lineup with the new NVIDIA Pascal mobile GPUs, with the GTX 1080, GTX 1070, and GTX 1060 now available across the board. MSI says the new GPUs will provide up to 40% better performance than the company’s previous GT, GS, and GE models.
“MSI’s GT83/73VR Titan series now showcases an even more commanding design with sports car inspired exhausts and MSI’s Cooler Boost Titan, featuring multiple exhausts and dual whirlwind blade fans to guarantee the best performance even under the most stress. Available in 3 different sizes and 17 unique configurations, including with SLI graphics, 4K panels and Tobii’s eye-tracking technology, MSI’s GT series is the optimum laptop for serious gamers.”
Positioned at the top of the heap is the mighty Titan series, which naturally offers the highest possible specs for those who can afford the price tag.
Notice anything about the top-end GT83 model in the chart above? The GT83VR Titan SLI indeed contains not one, but two NVIDIA GTX 1080 graphics chips, making this $5099 gaming machine a monster of a system - though its 1080p screen real estate means a connected VR headset will be more likely to use all of that available GPU power.
Moving down to the GT72/GT62 series, we see a move to the GTX 1070 GPU across the board:
Next up is the GS73, which offers (in addition to Pascal graphics) MSI's "Cooler Boost Trinity", which is the company's advanced cooling system for thin notebook designs.
“MSI’s redesigned GS73/63 VR Stealth Pro series now comes with MSI’s Cooler Boost Trinity, a temperature control system featuring three ultra-thin whirlwind blade fans, and a 5-pipe thermal design optimized for ultra-slim gaming notebooks. Available in 17-inch, 15-inch, and 14-inch options, MSI’s GS series gives power mobile gaming a new meaning with the performance of larger systems while measuring less than 1-inch thick.”
The more modest GTX 1060 powers the <1 inch thick notebooks in the series, and both the GS73 and GS63 VR Stealth Pro are equipped with 4K resolution IPS screens (with the GS43VR Phantom Pro at 1080p).
Next we have the VR Apache series, with another approach to cooling called "Cooler Boost 4":
“MSI’s GE72/62 VR Apache series now features MSI’s Cooler Boost 4 technology, an enhanced cooling system with multiple exhausts to keep temperatures low even during the most headed battles. Starting at $1,649, the VR-ready GE series comes in two different sizes and is the ideal unit for gaming enthusiast looking for a powerful and reliable unit.”
These lower-cost gaming machines are still equipped with Intel Core i7 processors, and offer GTX 1060 graphics for both models.
As a very interesting addition to the news of these new laptops, MSI has also announced that select machines equipped with NVIDIA GTX 10 Series graphics will feature 120Hz IPS panels with a 5ms response time.
We should have more information on availability soon.
A Beautiful Graphics Card
As a surprise to nearly everyone, on July 21st NVIDIA announced the existence of the new Titan X graphics cards, which are based on the brand new GP102 Pascal GPU. Though it shares a name, for some unexplained reason, with the Maxwell-based Titan X graphics card launched in March of 2015, this card is a significant performance upgrade. Using the largest consumer-facing Pascal GPU to date (with only the GP100 used in the Tesla P100 exceeding it), the new Titan X is going to be a very expensive, and very fast gaming card.
As has been the case since the introduction of the Titan brand, NVIDIA claims that this card is for gamers that want the very best in graphics hardware as well as for developers that need an ultra-powerful GPGPU device. GP102 does not integrate improved FP64 / double precision compute cores, so we are basically looking at an upgraded and improved GP104 Pascal chip. That’s nothing to sneeze at, of course, and you can see in the specifications below that we expect (and can now show you) that Titan X (Pascal) is a gaming monster.
| | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury | R9 Nano | R9 390X |
| GPU | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro | Fiji XT | Hawaii XT |
| Rated Clock | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz | up to 1000 MHz | 1050 MHz |
| Memory Clock | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz | 500 MHz | 6000 MHz |
| Memory Interface | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 512-bit |
| Memory Bandwidth | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 320 GB/s |
| TDP | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts | 175 watts | 275 watts |
| Peak Compute | 11.0 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS | 8.19 TFLOPS | 5.63 TFLOPS |
GP102 features 40% more CUDA cores than the GP104 at slightly lower clock speeds. The rated 11 TFLOPS of single precision compute of the new Titan X is 34% higher than that of the GeForce GTX 1080 and I would expect gaming performance to scale in line with that difference.
Titan X (Pascal) does not utilize the full GP102 GPU; the recently announced Quadro P6000 does, however, which gives it a CUDA core count of 3,840 (256 more than Titan X).
A full GP102 GPU
The new Titan X effectively gives up about 7% of the complete GPU's compute capability, although that is likely to help increase available clock headroom and yield.
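The core-count percentages quoted here are simple to reproduce. In the sketch below, the 3,840 and 256 figures and the rated TFLOPS numbers come from this article; the 2,560-core count for GP104 in the GTX 1080 is not stated above and is filled in from NVIDIA's published GTX 1080 specifications.

```python
FULL_GP102_CORES = 3840       # full chip, as used by the P6000 (from the article)
DISABLED_ON_TITAN_X = 256     # from the article
GP104_CORES = 2560            # GTX 1080 core count; not stated in the article

titan_x_cores = FULL_GP102_CORES - DISABLED_ON_TITAN_X
lost_fraction = DISABLED_ON_TITAN_X / FULL_GP102_CORES
core_advantage = titan_x_cores / GP104_CORES - 1
compute_advantage = 11.0 / 8.2 - 1   # rated TFLOPS from the table

print(f"Titan X cores: {titan_x_cores}")
print(f"Disabled share of GP102: {lost_fraction:.1%}")
print(f"Core advantage over GTX 1080: {core_advantage:.0%}")
print(f"Rated compute advantage: {compute_advantage:.0%}")
```

The compute advantage comes out smaller than the core advantage, which is consistent with the "slightly lower clock speeds" noted above.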
The new Titan X will feature 12GB of GDDR5X memory, not HBM as the GP100 chip has, so this is clearly a unique chip with a new memory interface. NVIDIA claims it has 480 GB/s of bandwidth on a 384-bit memory controller interface running at the same 10 Gbps as the GTX 1080.
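The bandwidth figure follows directly from the bus width and the per-pin data rate. A minimal sketch, using only numbers from the table above (the function name is our own):

```python
def memory_bandwidth_gb_per_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth: pins times per-pin rate, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

print(memory_bandwidth_gb_per_s(384, 10))  # Titan X (Pascal): 480.0
print(memory_bandwidth_gb_per_s(256, 10))  # GTX 1080: 320.0
print(memory_bandwidth_gb_per_s(384, 7))   # GTX 980 Ti / TITAN X: 336.0
```

The extra 50% of bandwidth over the GTX 1080 comes entirely from the wider 384-bit interface, since both cards run their memory at 10 Gbps.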
Realworldtech with Compelling Evidence
Yesterday David Kanter of Realworldtech posted a pretty fascinating article and video that explored the two latest NVIDIA architectures and how they have branched away from the traditional immediate mode rasterization units. It revealed through testing that with Maxwell and Pascal NVIDIA has gone to a tiling method with rasterization. This is a somewhat significant departure for the company considering they have utilized the same basic immediate mode rasterization model since the 90s.
The Videologic Apocalypse 3Dx based on the PowerVR PCX2.
(photo courtesy of Wikipedia)
Tiling is an interesting subject and we can harken back to the PowerVR days to see where it was first implemented. There are many advantages to tiling and deferred rendering when it comes to overall efficiency in power and memory bandwidth. These first TBDR (Tile Based Deferred Renderers) offered great performance per clock and could utilize slower memory as compared to other offerings of the day (namely Voodoo Graphics). There were some significant drawbacks to the technology. Essentially a lot of work had to be done by the CPU and driver in scene setup and geometry sorting. On fast CPU systems the PowerVR boards could provide very good performance, but it suffered on lower end parts as compared to the competition. This is a very simple explanation of what is going on, but the long and short of it is that TBDR did not take over the world due to limitations in its initial implementations. Traditional immediate mode rasters would improve in efficiency and performance with aggressive Z checks and other optimizations that borrow from the TBDR playbook.
Tiling is also present in a lot of mobile parts. Imagination’s PowerVR graphics technologies have been implemented by companies such as Intel, Apple, and Mediatek. Qualcomm (Adreno) and ARM (Mali) both implement tiler technologies to improve power consumption and performance while increasing bandwidth efficiency. Perhaps most interestingly we can remember back to the Gigapixel days with the GP-1 chip that implemented a tiling method that seemed to work very well without the CPU hit and driver overhead that had plagued the PowerVR chips up to that point. 3dfx bought Gigapixel for some $150 million at the time. That company then went on to file bankruptcy a year later and their IP was acquired by NVIDIA.
Screenshot of the program used to uncover the tiling behavior of the rasterizer.
It now appears as though NVIDIA has evolved their raster units to embrace tiling. This is not a full TBDR implementation, but rather an immediate mode tiler that will still break up the scene in tiles but does not implement deferred rendering. This change should improve bandwidth efficiency when it comes to rasterization, but it does not affect the rest of the graphics pipeline by forcing it to be deferred (tessellation, geometry setup and shaders, etc. are not impacted). NVIDIA has not done a deep dive on this change for editors, so we do not know the exact implementation and what advantages we can expect. We can look at the evidence we have and speculate where those advantages exist.
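For readers who have not run into tiling before, the basic idea of breaking a scene up into screen tiles can be shown with a toy example. To be clear, this is not NVIDIA's implementation, which has not been disclosed. The 16-pixel tile size, the function names, and the conservative bounding-box test are all illustrative assumptions.

```python
from collections import defaultdict

TILE_SIZE = 16  # pixels per tile edge; illustrative value, not NVIDIA's

def tiles_touched(triangle, tile_size=TILE_SIZE):
    """Yield (tx, ty) for each tile overlapped by the triangle's bounding box."""
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    x0, x1 = int(min(xs)) // tile_size, int(max(xs)) // tile_size
    y0, y1 = int(min(ys)) // tile_size, int(max(ys)) // tile_size
    for ty in range(y0, y1 + 1):
        for tx in range(x0, x1 + 1):
            yield tx, ty

def bin_triangles(triangles, tile_size=TILE_SIZE):
    """Map each tile to the indices of the triangles that may cover it."""
    bins = defaultdict(list)
    for index, triangle in enumerate(triangles):
        for tile in tiles_touched(triangle, tile_size):
            bins[tile].append(index)
    return dict(bins)

# Two triangles in non-negative screen-space pixel coordinates.
scene = [
    ((2, 2), (10, 3), (5, 12)),       # fits inside tile (0, 0)
    ((10, 10), (40, 12), (20, 30)),   # spans several tiles
]
for tile, members in sorted(bin_triangles(scene).items()):
    print(tile, members)
```

Once triangles are grouped per tile, the pixels for one tile can be worked on in a small on-chip buffer before moving to the next, which is where the bandwidth savings come from.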
The video where David Kanter explains his findings
Bandwidth and Power
Tilers have typically taken the tiled regions and buffered them on the chip. This is a big improvement in both performance and power efficiency as the raster data does not have to be cached and written out to the frame buffer and then swapped back. This makes quite a bit of sense considering the overall lack of big jumps in memory technologies over the past five years. We have had GDDR-5 since 2007/2008. The speeds have increased over time, but the basic technology is still much the same. We have seen HBM introduced with AMD’s Fury series, but large scale production of HBM 2 is still to come. Samsung has released small amounts of HBM 2 to the market, but not nearly enough to handle the needs of a mass produced card. GDDR-5X is an extension of GDDR-5 that does offer more bandwidth, but it is still not a next generation memory technology like HBM 2.
By utilizing a tiler NVIDIA is able to lower memory bandwidth needs for the rasterization stage. Considering that both Maxwell and Pascal architectures are based on GDDR-5 and 5x technologies, it makes sense to save as much bandwidth as possible where they can. This is again probably one, among many, of the reasons that we saw a much larger L2 cache in Maxwell vs. Kepler (2048 KB vs. 256KB respectively). Every little bit helps when we are looking at hard, real world bandwidth limits for a modern GPU.
The area of power efficiency has also come up in discussion when going to a tiler. Tilers have traditionally been more power efficient as well due to how the raster data is tiled and cached, requiring fewer reads and writes to main memory. The first impulse is to say, “Hey, this is the reason why NVIDIA’s Maxwell was so much more power efficient than Kepler and AMD’s latest parts!” Sadly, this is not exactly true. The tiler is more power efficient, but it is a small part of the power savings on a GPU.
The second fastest Pascal based card...
A modern GPU is very complex. There are some 7.2 billion transistors on the latest Pascal GP-104 that powers the GTX 1080. The vast majority of those transistors are implemented in the shader units of the chip. While the raster units are very important, they are but a fraction of that transistor budget. The rest is taken up by power regulation, PCI-E controllers, and memory controllers. In the big scheme of things the raster portion is going to be dwarfed in power consumption by the shader units. This does not mean that they are not important though. Going back to the hated car analogy, one does not achieve weight savings by focusing on one aspect alone. It is going over every single part of the car and shaving ounces here and there, and in the end achieving significant savings by addressing every single piece of a complex product.
This does appear to be the long and short of it. This is one piece of a very complex ASIC that improves upon memory bandwidth utilization and power efficiency. It is not the whole story, but it is an important part. I find it interesting that NVIDIA did not disclose this change to editors with the introduction of Maxwell and Pascal, but if it is transparent to users and developers alike then there is no need. There is a lot of “secret sauce” that goes into each architecture, and this is merely one aspect. The one question that I do have is how much of the technology is based upon the Gigapixel IP that 3dfx bought at such a premium? I believe that particular tiler was an immediate mode renderer as well due to it not having as many driver and overhead issues as PowerVR exhibited back in the day. Obviously it would not be a copy/paste of the technology that was developed back in the 90s, but it would be interesting to see if it was the basis for this current implementation.
Subject: Graphics Cards | August 1, 2016 - 03:39 PM | Sebastian Peak
Tagged: pascal, nvidia, notebooks, mobile gpu, mobile gaming, laptops, GTX 1080M, GTX 1070M, GTX 1060M, discrete gpu
VideoCardz is reporting that an official announcement of the rumored mobile GPUs might be coming at Gamescom later this month.
"Mobile Pascal may arrive at Gamescom in Europe. According to DigiTimes, NVIDIA would allow its notebook partners to unveil mobile Pascal between August 17th to 21st, so just when Gamescom is hosted is hosted in Germany."
We had previously reported on the rumors of a mobile GTX 1070 and 1060, and we can only assume a 1080 will also be available (though VideoCardz is not speculating on the specs of this high-end mobile card just yet).
Rumored NVIDIA Mobile Pascal GPU specs (Image credit: VideoCardz)
Gamescom runs from August 17 - 21 in Germany, so we only have to wait about three weeks to know for sure.
Subject: Graphics Cards, Systems, Mobile | July 27, 2016 - 07:58 PM | Scott Michaud
Tagged: nvidia, Nintendo, nintendo nx, tegra, Tegra X1, tegra x2, pascal, maxwell
Okay so there's a few rumors going around, mostly from Eurogamer / DigitalFoundry, that claim the Nintendo NX is going to be powered by an NVIDIA Tegra system on a chip (SoC). DigitalFoundry, specifically, cites multiple sources who claim that their Nintendo NX development kits integrate the Tegra X1 design, as seen in the Google Pixel C. That said, the Nintendo NX release date, March 2017, does provide enough time for them to switch to NVIDIA's upcoming Pascal Tegra design, rumored to be called the Tegra X2, which uses NVIDIA's custom-designed Denver CPU cores.
Preamble aside, here's what I think about the whole situation.
First, the Tegra X1 would be quite a small jump in performance over the WiiU. The WiiU's GPU, “Latte”, has 320 shaders clocked at 550 MHz, and it was based on AMD's TeraScale 1 architecture. Because these stream processors have single-cycle multiply-add for floating point values, you can get its FLOP rating by multiplying 320 shaders, 550,000,000 cycles per second, and 2 operations per clock (one multiply and one add). This yields 352 GFLOPs. The Tegra X1 is rated at 512 GFLOPs, which is just 45% more than the previous generation.
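The arithmetic in that paragraph can be written out as a quick sketch. The shader count, clock, and the 512 GFLOPs rating are the figures quoted above; the function name is ours.

```python
def peak_gflops(shaders, clock_hz, ops_per_clock=2):
    """Peak rate assuming one multiply-add (2 operations) per shader per clock."""
    return shaders * clock_hz * ops_per_clock / 1e9

wiiu_gflops = peak_gflops(320, 550_000_000)   # 352.0
tegra_x1_gflops = 512.0                       # rated figure quoted above
uplift = tegra_x1_gflops / wiiu_gflops - 1

print(f"WiiU 'Latte': {wiiu_gflops:.0f} GFLOPs")
print(f"Tegra X1 uplift: {uplift:.0%}")
```

Peak FLOP ratings like these ignore architectural efficiency, so they are only a rough way to compare a TeraScale part with a Maxwell one.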
This is a very tiny jump, unless they indeed use Pascal-based graphics. If this is the case, you will likely see a launch selection of games ported from WiiU and a few games that use whatever new feature Nintendo has. One rumor is that the console will be kind-of like the WiiU controller, with detachable controllers. If this is true, it's a bit unclear how this will affect games in a revolutionary way, but we might be missing a key bit of info that ties it all together.
As for the choice of ARM over x86... well. First, this obviously allows Nintendo to choose from a wider selection of manufacturers than AMD, Intel, and VIA, and certainly more than IBM with their previous, Power-based chips. That said, it also jibes with Nintendo's interest in the mobile market. They joined The Khronos Group and I'm pretty sure they've said they are interested in Vulkan, which is becoming the high-end graphics API for Android, supported by Google and others. That said, I'm not sure how many engineers exist that specialize in ARM optimization, as most mobile platforms try to abstract this as much as possible, but this could be Nintendo's attempt to settle on a standardized instruction set, and they opted for mobile over PC (versus Sony and especially Microsoft, who want consoles to follow high-end gaming on the desktop).
Why? Well that would just be speculating on speculation about speculation. I'll stop here.