Manufacturer: PC Perspective

Why Two 4GB GPUs Isn't Necessarily 8GB

We're trying something new here at PC Perspective. Some topics are fairly difficult to explain cleanly without accompanying images. We also like to go fairly deep into specific topics, so we're hoping that we can provide educational cartoons that explain these issues.

This pilot episode is about load-balancing and memory management in multi-GPU configurations. There seems to be a lot of confusion around what was (and was not) possible with DirectX 11 and OpenGL, and even more confusion about what DirectX 12, Mantle, and Vulkan allow developers to do. It highlights three different load-balancing algorithms, and even briefly mentions what LucidLogix was attempting to accomplish almost ten years ago.

pcper-2016-animationlogo-multiGPU.png

If you like it, and want to see more, please share and support us on Patreon. We're putting this out not knowing if it's popular enough to be sustainable. The best way to see more of this is to share!

Open the expanded article to see the transcript, below.

AMD Announces TrueAudio Next

Subject: Graphics Cards | August 18, 2016 - 07:58 PM |
Tagged: amd, TrueAudio, trueaudio next

Using a GPU for audio makes a lot of sense. That said, the original TrueAudio was not really about that, and it didn't really take off. The API was only implemented in a handful of titles, and it required dedicated hardware that they have since removed from their latest architectures. It was not about using the extra horsepower of the GPU to simulate sound, although they did have ideas for “sound shaders” in the original TrueAudio.

amd-2016-true-audio-next.jpg

TrueAudio Next, on the other hand, is an SDK that is part of AMD's LiquidVR package. It is based around OpenCL; specifically, it uses AMD's open-source FireRays library to trace the ways that audio can move from source to receiver, including reflections. For high-frequency audio, this is a good assumption, and that range of frequencies are more useful for positional awareness in VR, anyway.

Basically, TrueAudio Next has very little to do with the original.

Interestingly, AMD is providing an interface for TrueAudio Next to reserve compute units, but optionally (and under NDA). This allows audio processing to be unhooked from the video frame rate, provided that the CPU can keep both fed with actual game data. Since audio is typically a secondary thread, it could be ready to send sound calls at any moment. Various existing portions of asynchronous compute could help with this, but allowing developers to wholly reserve a fraction of the GPU should remove the issue entirely. That said, when I was working on a similar project in WebCL, I was looking to the integrated GPU, because it's there and it's idle, so why not? I would assume that, in actual usage, CU reservation would only be enabled if an AMD GPU is the only device installed.

Anywho, if you're interested, then be sure to check out AMD's other post on it, too.

Source: AMD

NVIDIA Officially Announces GeForce GTX 1060 3GB Edition

Subject: Graphics Cards | August 18, 2016 - 02:28 PM |
Tagged: nvidia, gtx 1060 3gb, gtx 1060, graphics card, gpu, geforce, 1152 CUDA Cores

NVIDIA has officially announced the 3GB version of the GTX 1060 graphics card, and it indeed contains fewer CUDA cores than the 6GB version.

GTX1060.jpg

The GTX 1060 Founders Edition

The product page on NVIDIA.com now reflects the 3GB model, and board partners have begun announcing their versions. The MSRP on this 3GB version is set at $199, and availablity of partner cards is expected in the next couple of weeks. The two versions will be designated only by their memory size, and no other capacities of either card are forthcoming.

  GeForce GTX 1060 3GB GeForce GTX 1060 6GB
Architecture Pascal Pascal
CUDA Cores 1152 1280
Base Clock 1506 MHz 1506 MHz
Boost Clock 1708 MHz 1708 MHz
Memory Speed 8 Gbps 8 Gbps
Memory Configuration 3GB 6GB
Memory Interface 192-bit 192-bit
Power Connector 6-pin 6-pin
TDP 120W 120W

As you can see from the above table, the only specification that has changed is the CUDA core count, with base/boost clocks, memory speed and interface, and TDP identical. As to performance, NVIDIA says the 6GB version holds a 5% performance advantage over this lower-cost version, which at $199 is 20% less expensive than the previous GTX 1060 6GB.

Source: NVIDIA
Manufacturer: NVIDIA

Is Enterprise Ascending Outside of Consumer Viability?

So a couple of weeks have gone by since the Quadro P6000 (update: was announced) and the new Titan X launched. With them, we received a new chip: GP102. Since Fermi, NVIDIA has labeled their GPU designs with a G, followed by a single letter for the architecture (F, K, M, or P for Fermi, Kepler, Maxwell, and Pascal, respectively), which is then followed by a three digit number. The last digit is the most relevant one, however, as it separates designs by their intended size.

nvidia-2016-Quadro_P6000_7440.jpg

Typically, 0 corresponds to a ~550-600mm2 design, which is about as larger of a design that fabrication labs can create without error-prone techniques, like multiple exposures (update for clarity: trying to precisely overlap multiple designs to form a larger integrated circuit). 4 corresponds to ~300mm2, although GM204 was pretty large at 398mm2, which was likely to increase the core count while remaining on a 28nm process. Higher numbers, like 6 or 7, fill back the lower-end SKUs until NVIDIA essentially stops caring for that generation. So when we moved to Pascal, jumping two whole process nodes, NVIDIA looked at their wristwatches and said “about time to make another 300mm2 part, I guess?”

The GTX 1080 and the GTX 1070 (GP104, 314mm2) were born.

nvidia-2016-gtc-pascal-banner.png

NVIDIA already announced a 600mm2 part, though. The GP100 had 3840 CUDA cores, HBM2 memory, and an ideal ratio of 1:2:4 between FP64:FP32:FP16 performance. (A 64-bit chunk of memory can store one 64-bit value, two 32-bit values, or four 16-bit values, unless the register is attached to logic circuits that, while smaller, don't know how to operate on the data.) This increased ratio, even over Kepler's 1:6 FP64:FP32, is great for GPU compute, but wasted die area for today's (and tomorrow's) games. I'm predicting that it takes the wind out of Intel's sales, as Xeon Phi's 1:2 FP64:FP32 performance ratio is one of its major selling points, leading to its inclusion in many supercomputers.

Despite the HBM2 memory controller supposedly being actually smaller than GDDR5(X), NVIDIA could still save die space while still providing 3840 CUDA cores (despite disabling a few on Titan X). The trade-off is that FP64 and FP16 performance had to decrease dramatically, from 1:2 and 2:1 relative to FP32, all the way down to 1:32 and 1:64. This new design comes in at 471mm2, although it's $200 more expensive than what the 600mm2 products, GK110 and GM200, launched at. Smaller dies provide more products per wafer, and, better, the number of defective chips should be relatively constant.

Anyway, that aside, it puts NVIDIA in an interesting position. Splitting the xx0-class chip into xx0 and xx2 designs allows NVIDIA to lower the cost of their high-end gaming parts, although it cuts out hobbyists who buy a Titan for double-precision compute. More interestingly, it leaves around 150mm2 for AMD to sneak in a design that's FP32-centric, leaving them a potential performance crown.

nvidia-2016-pascal-volta-roadmap-extremetech.png

Image Credit: ExtremeTech

On the other hand, as fabrication node changes are becoming less frequent, it's possible that NVIDIA could be leaving itself room for Volta, too. Last month, it was rumored that NVIDIA would release two architectures at 16nm, in the same way that Maxwell shared 28nm with Kepler. In this case, Volta, on top of whatever other architectural advancements NVIDIA rolls into that design, can also grow a little in size. At that time, TSMC would have better yields, making a 600mm2 design less costly in terms of waste and recovery.

If this is the case, we could see the GPGPU folks receiving a new architecture once every second gaming (and professional graphics) architecture. That is, unless you are a hobbyist. If you are? I would need to be wrong, or NVIDIA would need to somehow bring their enterprise SKU into an affordable price point. The xx0 class seems to have been pushed up and out of viability for consumers.

Or, again, I could just be wrong.

Intel Larrabee Post-Mortem by Tom Forsyth

Subject: Graphics Cards, Processors | August 17, 2016 - 01:38 PM |
Tagged: Xeon Phi, larrabee, Intel

Tom Forsyth, who is currently at Oculus, was once on the core Larrabee team at Intel. Just prior to Intel's IDF conference in San Francisco, which Ryan is at and covering as I type this, Tom wrote a blog post that outlined the project and its design goals, including why it didn't hit market as a graphics device. He even goes into the details of the graphics architecture, which was almost entirely in software apart from texture units and video out. For instance, Larrabee was running FreeBSD with a program, called DirectXGfx, that gave it the DirectX 11 feature set -- and it worked on hundreds of titles, too.

Intel_Xeon_Phi_Family.jpg

Also, if you found the discussion interesting, then there is plenty of content from back in the day to browse. A good example is an Intel Developer Zone post from Michael Abrash that discussed software rasterization, doing so with several really interesting stories.

Author:
Manufacturer: NVIDIA

Take your Pascal on the go

Easily the strongest growth segment in PC hardware today is in the adoption of gaming notebooks. Ask companies like MSI and ASUS, even Gigabyte, as they now make more models and sell more units of notebooks with a dedicated GPU than ever before.  Both AMD and NVIDIA agree on this point and it’s something that AMD was adamant in discussing during the launch of the Polaris architecture.

pascalnb-2.jpg

Both AMD and NVIDIA predict massive annual growth in this market – somewhere on the order of 25-30%. For an overall culture that continues to believe the PC is dying, seeing projected growth this strong in any segment is not only amazing, but welcome to those of us that depend on it. AMD and NVIDIA have different goals here: GeForce products already have 90-95% market share in discrete gaming notebooks. In order for NVIDIA to see growth in sales, the total market needs to grow. For AMD, simply taking back a portion of those users and design wins would help its bottom line.

pascalnb-4.jpg

But despite AMD’s early talk about getting Polaris 10 and 11 in mobile platforms, it’s NVIDIA again striking first. Gaming notebooks with Pascal GPUs in them will be available today, from nearly every system vendor you would consider buying from: ASUS, MSI, Gigabyte, Alienware, Razer, etc. NVIDIA claims to have quicker adoption of this product family in notebooks than in any previous generation. That’s great news for NVIDIA, but might leave AMD looking in from the outside yet again.

Technologically speaking though, this makes sense. Despite the improvement that Polaris made on the GCN architecture, Pascal is still more powerful and more power efficient than anything AMD has been able to product. Looking solely at performance per watt, which is really the defining trait of mobile designs, Pascal is as dominant over Polaris as Maxwell was to Fiji. And this time around NVIDIA isn’t messing with cut back parts that have brand changes – GeForce is diving directly into gaming notebooks in a way we have only seen with one release.

g752-open.jpg

The ASUS G752VS OC Edition with GTX 1070

Do you remember our initial look at the mobile variant of the GeForce GTX 980? Not the GTX 980M mind you, the full GM204 operating in notebooks. That was basically a dry run for what we see today: NVIDIA will be releasing the GeForce GTX 1080, GTX 1070 and GTX 1060 to notebooks.

Continue reading our preview of the new GeForce GTX 1080, 1070 and 1060 mobile Pascal GPUs!!

3GB Version of NVIDIA GTX 1060 Has 128 Fewer CUDA Cores

Subject: Graphics Cards | August 12, 2016 - 06:33 PM |
Tagged: report, nvidia, gtx 1060 3gb, gtx 1060, GeForce GTX 1060, geforce, cuda cores

NVIDIA will offer a 3GB version of the GTX 1060, and there's more to the story than the obvious fact that is has half the frame buffer of the 6GB version available now. It appears that this is an entirely different product, with 128 fewer CUDA cores (1152) than the 6GB version's 1280.

NVIDIA-GeForce-GTX-1060-3-GB-Announcement.jpg

Image credit: VideoCardz.com

Boost clocks are the same at 1.7 GHz, and the 3GB version will still operate with a 120W TDP and require a 6-pin power connector. So why not simply name this product differently? It's always possible that this will be an OEM version of the GTX 1060, but in any case expect slightly lower performance than the existing version even if you don't run at high enough resolutions to require the larger 6GB frame buffer.

Source: VideoCardz

Wherein the RX 470 teaches us a valuable lesson about deferred procedure calls

Subject: Graphics Cards | August 12, 2016 - 05:44 PM |
Tagged: rx 470, LatencyMon, dpc, amd

When The Tech Report first conducted their review of the RX 470 they saw benchmark behaviour very different from any other GPU in that family but could not figure out what it was and resolve it before the mob arrived with pitchforks and torches demanding they publish or die. 

As it turns out there was indeed something rotten in benchmark; incredibly high DPC on the test machine.  Investigation determined the culprit to be the beta BIOS on their ASRock Z170 Extreme7+, specifically the BIOS which allowed you to overclock locked Intel CPUs.  They have just released their new findings along with a look at LatencyMon and DPC in general.  Take a look at the new benchmarks and information about DPC, but also absorb the consequences of demanding articles arrive picoseconds after the NDA expires; if there is a delay in publishing there might just be a damn good reason why.

villagers_with_pitchforks.jpg

"We retested our RX 470 to account for this issue, and we also updated our review with DirectX 12 benchmarks for Rise of the Tomb Raider and Hitman, plus full OpenGL and Vulkan benchmarks for Doom."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Corsair Releases Hydro GFX GTX 1080 Liquid-Cooled Graphics Card

Subject: Graphics Cards | August 12, 2016 - 10:59 AM |
Tagged: overclock, nvidia, msi, liquid cooled, hydro H55, hydro gfx, GTX 1080, graphics card, gaming, corsair

Corsair and MSI have teamed up once again to produce a liquid-cooled edition of the latest NVIDIA GPU, with the GTX 1080 receiving the same treatment these two gave to the Hydro GFX version of GTX 980 Ti last year.

HydroGFX_01.jpg

“The CORSAIR Hydro GFX GTX 1080 brings all the benefits of liquid cooling to the GeForce GTX 1080, boasting an integrated CORSAIR Hydro Series H55 cooler that draws heat from the GPU via a micro-fin copper base cold plate and dissipates it efficiently using a 120mm high-surface area radiator. A pre-installed low-noise LED-lit 120mm fan ensures steady, reliable air-flow, keeping GPU temperatures down and clock speeds high.

With a low-profile PCB and pre-fitted, fully-sealed liquid cooler, the Hydro GFX GTX 1080 is simple and easy to install. Just fit the card into a PCI-E 3.0 x16 slot, mount the radiator and enjoy low maintenance liquid cooling for the lifetime of the card.”

Naturally, with an integrated closed-loop liquid cooler this GTX 1080 won't be relegated to stock speeds out of the box, though Corsair leaves this up to the user. The card offers three performance modes which allow users to choose between lower noise and higher performance. Silent Mode leaves the GTX 1080 at stock settings (1733 MHz Boost), Gaming Mode increases the Boost clock to 1822 MHz, and OC Mode increases this slightly to 1847 MHz (while increasing memory speed in this mode as well).

gpu_clocks.png

This liquid-cooled version will provide higher sustained clocks

Here are the full specs from Corsair:

  • GPU: NVIDIA GeForce GTX 1080
  • CUDA Cores: 2,560
  • Interface: PCI Express 3.0 x16
  • Boost / Base Core Clock:
    • 1,847 MHz / 1,708 MHz (OC Mode)
    • 1,822 MHz / 1,683 MHz (Gaming Mode)
    • 1,733 MHz / 1,607 MHz (Silent Mode)
  • Memory Clock:
    • 10,108 MHz (OC Mode)
    • 10,010 MHZ (Gaming Mode)
    • 10,010 MHz (Silent Mode)
  • Memory Size: 8192MB
  • Memory Type: 8GB GDDR5X
  • Memory Bus: 256-bit
  • Outputs:
    • 3x DisplayPort (Version 1.4)
    • 1x HDMI (Version 2.0)
    • 1x DL-DVI-D
  • Power Connector: 8-pin x 1
  • Power Consumption: 180W
  • Dimension / Weight:Card: 270 x 111 x 40 mm / 1249 g
  • Cooler: 151 x 118 x 52 mm/ 1286 g
  • SKU: CB-9060010-WW

HydroGFX_Environmental_10.png

The Corsair Hydro GFX GTX 1080 is available now, exclusively on Corsair's official online store, and priced at $749.99.

Source: Corsair

ASUS Adds Radeon RX 470 and RX 460 to ROG STRIX Gaming Lineup

Subject: Graphics Cards | August 10, 2016 - 08:22 PM |
Tagged: video card, strix rx470, strix rx460, strix, rx 470, rx 460, ROG, Republic of Gamers, graphics, gpu, gaming, asus

Ryan posted details about the Radeon RX 470 and 460 graphics cards at the end of last month, and both are now available. Now the largest of the board partners, ASUS, has added both of these new GPUs to their Republic of Gamers STRIX series.

1470330564155.jpg

The STRIX Gaming RX 470 (Image: ASUS)

ASUS announced the Radeon RX 470 STRIX Gaming cards last week, and today the more affordable RX 460 GPU variant has been announced. The RX 470 is certainly a capable gaming option as it's a slightly cut-down version of the RX 480 GPU, and with the two versions of the STRIX Gaming cards offering varying levels of overclocking, they can come even closer to the performance of a stock RX 480.

1470739275395.jpg

The STRIX Gaming RX 460 (Image: ASUS)

The new STRIX Gaming RX 460 is significantly slower, with just 896 stream processors (to the 2048 of the RX 470) and a 128-bit memory interface (compared to 256-bit). Part of the appeal of the reference RX 460 - aside from low cost - is low power draw, as the <75W power draw allows for slot-powered board designs. This STRIX Gaming version adds a 6-pin power connector, however, which should provide additional overhead for further overclocking.

Specifications:

  STRIX-RX470-O4G-GAMING STRIX-RX470-4G-GAMING STRIX-RX460-O4G-GAMING
GPU AMD Radeon RX 470 AMD Radeon RX 470 AMD Radeon RX 460
Stream Processors 2048 2048 896
Memory 4GB GDDR5 4GB GDDR5 4GB GDDR5
Memory Clock 6600 MHz 6600 MHz 7000 MHz
Memory Interface 256-bit 256-bit 128-bit
Core Clock 1270 MHz (OC Mode)
1250 MHz (Gaming Mode)
1226 MHz (OC Mode)
1206 MHz (Gaming Mode)
1256 MHz (OC Mode)
1236 MHz (Gaming Mode)
Video Output DVI-D x2
HDMI 2.0
DisplayPort
DVI-D x2
HDMI 2.0
DisplayPort
DVI-D
HDMI 2.0
DisplayPort
Power Connection 6-pin 6-pin 6-pin
Dimensions 9.5" x 5.1" x 1.6" 9.5" x 5.1" x 1.6" 7.6" x 4.7" x 1.4"

The STRIX Gaming RX 470 OC 4GB is priced at $199, matching the (theoretical) retail of the 4GB RX 480, and the STRIX Gaming RX 470 is just behind at $189. The considerably lower-end STRIX Gaming RX 460 is $139. A check of Amazon/Newegg shows listings for these cards, but no in-stock units as of early this afternoon.

Source: ASUS