Manufacturer: PC Perspective

Why?

Astute readers of the site might remember the original story we did on Bitcoin mining in 2011, the good ole' days when the concept of the blockchain was new and exciting and mining Bitcoin on a GPU was still plenty viable.

gpu-bitcoin.jpg

However, that didn't last long, as the race for cash led people to develop Application Specific Integrated Circuits (ASICs) dedicated solely to mining Bitcoin quickly while sipping power. These expensive ASICs drove the difficulty of mining Bitcoin through the roof and killed any chance of mere mortals profitably mining cryptocurrency.

Cryptomining saw a resurgence in late 2013 with the popular adoption of alternate cryptocurrencies, specifically Litecoin, which was based on the Scrypt algorithm instead of SHA-256 like Bitcoin. This meant that the ASICs developed for mining Bitcoin were useless. This is also the period of time that many of you may remember as the "Dogecoin" era, my personal favorite cryptocurrency of all time. 

dogecoin-300.png

Defenders of these new "altcoins" claimed that Scrypt was different enough that ASICs would never be developed for it, and GPU mining would remain viable for a larger portion of users. As it turns out, the promise of money always wins out, and we soon saw Scrypt ASICs. Once again, the market for GPU mining crashed.
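To make the mining arms race concrete, here is a toy sketch (illustrative only, not real miner code) of the Bitcoin-style proof of work those ASICs compute: hash a block header with an incrementing nonce until the double SHA-256 digest falls below a difficulty target.

```python
import hashlib

def mine(header: bytes, difficulty_bits: int) -> int:
    """Toy Bitcoin-style proof of work: find a nonce whose double
    SHA-256 hash, read as a 256-bit integer, falls below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        payload = header + nonce.to_bytes(8, "little")
        digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

# A low difficulty so the toy search finishes quickly; real Bitcoin
# difficulty is astronomically higher, which is why ASICs took over.
nonce = mine(b"example block header", 16)
```

The ASIC advantage comes from running exactly this inner loop in fixed-function silicon; Scrypt's heavy memory usage was supposed to make that harder, not impossible.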

That brings us to today, and what I am calling "Third-wave Cryptomining." 

While the mass populace stopped caring about cryptocurrency as a whole, the dedicated group that was left continued to develop altcoins. These different currencies are based on various algorithms and other proofs of work (see technologies like Storj, which uses the blockchain for a decentralized Dropbox-like service!).

As you may have predicted, for various reasons that might be difficult to historically quantify, there is another very popular cryptocurrency from this wave of development, Ethereum.

ETHEREUM-LOGO_LANDSCAPE_Black.png

Ethereum is based on the Dagger-Hashimoto algorithm (since evolved into Ethash) and has a whole host of different quirks that make it different from other cryptocurrencies. We aren't here to get deep in the woods on the methods behind different blockchain implementations, but if you have some time check out the Ethereum White Paper. It's all very fascinating.

Continue reading our look at this third wave of cryptocurrency!

Manufacturer: AMD

We are up to two...

UPDATE (5/31/2017): Crystal Dynamics was able to get back to us with a couple of points on the changes that were made with this patch to affect the performance of AMD Ryzen processors.

  1. Rise of the Tomb Raider splits rendering tasks to run on different threads. By tuning the size of those tasks – breaking some up, allowing multicore CPUs to contribute in more cases, and combining some others, to reduce overheads in the scheduler – the game can more efficiently exploit extra threads on the host CPU.
     
  2. An optimization was identified in texture management that improves the combination of AMD CPU and NVIDIA GPU.  Overhead was reduced by packing texture descriptor uploads into larger chunks.
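We don't have Crystal Dynamics' actual code, but the batching idea behind both points can be sketched in Python: group many tiny tasks into larger chunks so scheduler overhead is paid once per chunk rather than once per task (all names here are illustrative).

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(tasks, size):
    """Group many small tasks into larger chunks so scheduling
    overhead is paid once per chunk instead of once per task."""
    for i in range(0, len(tasks), size):
        yield tasks[i:i + size]

def run_chunked(tasks, workers=8, size=64):
    # Each worker runs a whole chunk of callables; map() preserves order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        groups = pool.map(lambda c: [t() for t in c], chunk(list(tasks), size))
        return [result for group in groups for result in group]

# Toy workload: 256 tiny tasks dispatched as 4 chunks of 64.
results = run_chunked([lambda i=i: i * i for i in range(256)])
```

Tuning `size` is the trade-off the developer describes: chunks too small waste time in the scheduler, chunks too large leave cores idle.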

There you have it, a bit more detail on the software changes made to help adapt the game engine to AMD's Ryzen architecture. Not only that, but it does confirm our information that there was slightly MORE to address in the Ryzen+GeForce combinations.

END UPDATE

Despite a couple of growing pains out of the gate, the Ryzen processor launch appears to have been a success for AMD. Both the Ryzen 7 and the Ryzen 5 releases proved to be very competitive with Intel’s dominant CPUs in the market and took significant leads in areas of massive multi-threading and performance per dollar. An area that AMD has struggled in though has been 1080p gaming – performance in those instances on both Ryzen 7 and 5 processors fell behind comparable Intel parts by (sometimes) significant margins.

Our team continues to watch the story to see how AMD and game developers work through the issue. Most recently I posted a look at the memory latency differences between Ryzen and Intel Core processors. As it turns out, the memory latency differences are a significant part of the initial problem for AMD:

Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.

In that story I detailed our coverage of the Ryzen processor and its gaming performance succinctly:

Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our summary there was that the scheduler was working as expected and that minimal difference was seen when moving between different power modes. We also talked directly with AMD to find out its then current stance on the results, backing up our claims on the scheduler and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update occur courtesy of Ashes of the Singularity: Escalation where an engine update to utilize more threads resulted in as much as 31% average frame increase.

Quick on the heels of the Ryzen 7 release, AMD worked with the developer Oxide on the Ashes of the Singularity: Escalation engine. Through tweaks and optimizations, the game was able to showcase as much as a 30% increase in average frame rate on the integrated benchmark. While this was only a single use case, it does prove that through work with the developers, AMD has the ability to improve the 1080p gaming positioning of Ryzen against Intel.

rotr-screen4-small.jpg

Fast forward to today and I was surprised to find a new patch for Rise of the Tomb Raider, a game that was actually one of the worst case scenarios for AMD with Ryzen. (Patch #12, v1.0.770.1) The patch notes mention the following:

The following changes are included in this patch

- Fix certain DX12 crashes reported by users on the forums.

- Improve DX12 performance across a variety of hardware, in CPU bound situations. Especially performance on AMD Ryzen CPUs can be significantly improved.

While we expect this patch to be an improvement for everyone, if you do have trouble with this patch and prefer to stay on the old version we made a Beta available on Steam, build 767.2, which can be used to switch back to the previous version.

We will keep monitoring for feedback and will release further patches as it seems required. We always welcome your feedback!

Obviously the data point that stood out for me was the improved DX12 performance “in CPU bound situations. Especially on AMD Ryzen CPUs…”

Remember how the situation appeared in April?

rotr.png

The Ryzen 7 1800X was 24% slower than the Intel Core i7-7700K – a dramatic difference for a processor that should only have been ~8-10% slower in single threaded workloads.

How does this new patch to RoTR affect performance? We tested it on the same Ryzen 7 1800X benchmark platform from previous testing, including the ASUS Crosshair VI Hero motherboard, 16GB of DDR4-2400 memory, and a GeForce GTX 1080 Founders Edition using the 378.78 driver. All testing was done under the DX12 code path.

tr-1.png

tr-2.png

The Ryzen 7 1800X score jumps from 107 FPS to 126.44 FPS, an increase of 17%! That is a significant boost in performance at 1080p while still running at the Very High image quality preset, indicating that the developer (and likely AMD) was able to find substantial inefficiencies in the engine. For comparison, the 8-core / 16-thread Intel Core i7-6900K only sees a 2.4% increase from this new game revision. This tells us that the changes to the game were specific to Ryzen processors and their design, but that no performance was sacrificed on the Intel platforms.

Continue reading our look at the new Rise of the Tomb Raider patch for Ryzen!

Manufacturer: The Khronos Group

The Right People to Interview

Last week, we reported that OpenCL’s roadmap would be merging into Vulkan, and OpenCL would, starting at some unspecified time in the future, be based “on an extended version of the Vulkan API”. This was based on quotes from several emails between myself and the Khronos Group.

Since that post, I had the opportunity to have a phone interview with Neil Trevett, president of the Khronos Group and chairman of the OpenCL working group, and Tom Olson, chairman of the Vulkan working group. We spent a little over a half hour going over Neil’s International Workshop on OpenCL (IWOCL) presentation, discussing the decision, and answering a few lingering questions. This post will present the results of that conference call in a clean, readable way.

khronos-officiallogo-Vulkan_500px_Dec16.png

First and foremost, while OpenCL is planning to merge into the Vulkan API, the Khronos Group wants to make it clear that “all of the merging” is coming from the OpenCL working group. The Vulkan API roadmap is not affected by this decision. Of course, the Vulkan working group will be able to take advantage of technologies that are dropping into their lap, but those discussions have not even begun yet.

Neil: Vulkan has its mission and its roadmap, and it’s going ahead on that. OpenCL is doing all of the merging. We’re kind-of coming in to head in the Vulkan direction.

Does that mean, in the future, that there’s a bigger wealth of opportunity to figure out how we can take advantage of all this kind of mutual work? The answer is yes, but we haven’t started those discussions yet. I’m actually excited to have those discussions, and are many people, but that’s a clarity. We haven’t started yet on how Vulkan, itself, is changed (if at all) by this. So that’s kind-of the clarity that I think is important for everyone out there trying to understand what’s going on.

Tom also prepared an opening statement. It’s not as easy to abbreviate, so it’s here unabridged.

Tom: I think that’s fair. From the Vulkan point of view, the way the working group thinks about this is that Vulkan is an abstract machine, or at least there’s an abstract machine underlying it. We have a programming language for it, called SPIR-V, and we have an interface controlling it, called the API. And that machine, in its full glory… it’s a GPU, basically, and it’s got lots of graphics functionality. But you don’t have to use that. And the API and the programming language are very general. And you can build lots of things with them. So it’s great, from our point of view, that the OpenCL group, with their special expertise, can use that and leverage that. That’s terrific, and we’re fully behind it, and we’ll help them all we can. We do have our own constituency to serve, which is the high-performance game developer first and foremost, and we are going to continue to serve them as our main mission.

So we’re not changing our roadmap so much as trying to make sure we’re a good platform for other functionality to be built on.

Neil then went on to mention that the decision to merge OpenCL’s roadmap into the Vulkan API took place only a couple of weeks ago. The purpose of the press release was to reach OpenCL developers and get their feedback. According to him, they did a show of hands at the conference, with a room full of a hundred OpenCL developers, and no-one was against moving to the Vulkan API. This gives them confidence that developers will accept the decision, and that their needs will be served by it.

Next up is the why. Read on for more.

Manufacturer: AMD

Is it time to buy that new GPU?

Testing commissioned by AMD. This means that AMD paid us for our time, but had no say in the results or presentation of them.

Earlier this week Bethesda and Arkane Studios released Prey, a first-person shooter that is a re-imagining of the 2006 game of the same name. Fans of System Shock will find a lot to love about this new title and I have found myself enamored with the game…in the name of science, of course.

Prey-2017-05-06-15-52-04-16.jpg

While doing my due diligence and performing some preliminary testing to see if we would utilize Prey for graphics testing going forward, AMD approached me to discuss this exact title. With the release of the Radeon RX 580 in April, one of the key storylines is that the card offers a reasonably priced upgrade path for users of 2+ year old hardware. With that upgrade you should see some substantial performance improvements and as I will show you here, the new Prey is a perfect example of that.

Targeting the Radeon R9 380, a graphics card that was originally released back in May of 2015, the RX 580 offers substantially better performance at a very similar launch price. The same is true for the GeForce GTX 960: launched in January of 2015, it is slightly longer in the tooth. AMD’s data shows that 80% of the users on Steam are running R9 380X or slower graphics cards and that only 10% of them upgraded in 2016. Considering the great GPUs that were available then (including the RX 480 and the GTX 10-series), it seems more and more likely that we are going to hit an upgrade inflection point in the market.

slides-5.jpg

A simple experiment was set up: does the new Radeon RX 580 offer a worthwhile upgrade path for the many users of R9 380 or GTX 960 class graphics cards (or older)?

|                  | Radeon RX 580           | Radeon R9 380 | GeForce GTX 960 |
| GPU              | Polaris 20              | Tonga Pro     | GM206           |
| GPU Cores        | 2304                    | 1792          | 1024            |
| Rated Clock      | 1340 MHz                | 918 MHz       | 1127 MHz        |
| Memory           | 4GB / 8GB               | 4GB           | 2GB / 4GB       |
| Memory Interface | 256-bit                 | 256-bit       | 128-bit         |
| TDP              | 185 watts               | 190 watts     | 120 watts       |
| MSRP (at launch) | $199 (4GB) / $239 (8GB) | $219          | $199            |

Continue reading our look at the Radeon RX 580 in Prey!

Manufacturer: EVGA

Specifications and Design

When the GeForce GTX 1080 Ti launched last month it became the fastest consumer graphics card on the market, taking over a spot that NVIDIA had already laid claim to since the launch of the GTX 1080, and arguably before that with the GTX 980 Ti. Passing on the notion that the newly released Titan Xp is a graphics card gamers should actually consider for their cash, the 1080 Ti continues to stand alone at the top. That is, until NVIDIA comes up with another new architecture or AMD surprises us all with the release of the Vega chip this summer.

NVIDIA board partners have the flexibility to build custom hardware around the GTX 1080 Ti design, and the EVGA GeForce GTX 1080 Ti SC2 sporting iCX Technology is one of those new models. Today’s story is going to give you my thoughts and impressions on this card in a review – one with fewer benchmarks than you are used to seeing, but one that covers all the primary points of differentiation to consider over the reference/Founders Edition options.

Specifications and Design

The EVGA GTX 1080 Ti SC2 with iCX Technology takes the same GPU and memory technology shown off with the GTX 1080 Ti launch and gussies it up with higher clocks, a custom PCB with thermal sensors in 9 different locations, LEDs for externally monitoring the health of your card and a skeleton-like cooler design that is both effective and aggressive.

|                  | EVGA 1080 Ti SC2 | GTX 1080 Ti | Titan X (Pascal) | GTX 1080    | GTX 980 Ti  | TITAN X     | GTX 980     | R9 Fury X      | R9 Fury        |
| GPU              | GP102            | GP102       | GP102            | GP104       | GM200       | GM200       | GM204       | Fiji XT        | Fiji Pro       |
| GPU Cores        | 3584             | 3584        | 3584             | 2560        | 2816        | 3072        | 2048        | 4096           | 3584           |
| Base Clock       | 1557 MHz         | 1480 MHz    | 1417 MHz         | 1607 MHz    | 1000 MHz    | 1000 MHz    | 1126 MHz    | 1050 MHz       | 1000 MHz       |
| Boost Clock      | 1671 MHz         | 1582 MHz    | 1480 MHz         | 1733 MHz    | 1076 MHz    | 1089 MHz    | 1216 MHz    | -              | -              |
| Texture Units    | 224              | 224         | 224              | 160         | 176         | 192         | 128         | 256            | 224            |
| ROP Units        | 88               | 88          | 96               | 64          | 96          | 96          | 64          | 64             | 64             |
| Memory           | 11GB             | 11GB        | 12GB             | 8GB         | 6GB         | 12GB        | 4GB         | 4GB            | 4GB            |
| Memory Clock     | 11000 MHz        | 11000 MHz   | 10000 MHz        | 10000 MHz   | 7000 MHz    | 7000 MHz    | 7000 MHz    | 500 MHz        | 500 MHz        |
| Memory Interface | 352-bit          | 352-bit     | 384-bit G5X      | 256-bit G5X | 384-bit     | 384-bit     | 256-bit     | 4096-bit (HBM) | 4096-bit (HBM) |
| Memory Bandwidth | 484 GB/s         | 484 GB/s    | 480 GB/s         | 320 GB/s    | 336 GB/s    | 336 GB/s    | 224 GB/s    | 512 GB/s       | 512 GB/s       |
| TDP              | 250 watts        | 250 watts   | 250 watts        | 180 watts   | 250 watts   | 250 watts   | 165 watts   | 275 watts      | 275 watts      |
| Peak Compute     | 11.1 TFLOPS      | 10.6 TFLOPS | 10.1 TFLOPS      | 8.2 TFLOPS  | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS    | 7.20 TFLOPS    |
| Transistor Count | 12.0B            | 12.0B       | 12.0B            | 7.2B        | 8.0B        | 8.0B        | 5.2B        | 8.9B           | 8.9B           |
| Process Tech     | 16nm             | 16nm        | 16nm             | 16nm        | 28nm        | 28nm        | 28nm        | 28nm           | 28nm           |
| MSRP (current)   | $719             | $699        | $1,200           | $599        | $649        | $999        | $499        | $649           | $549           |

Out of the box EVGA has overclocked the GTX 1080 Ti SC2 above reference specs. With a base clock of 1557 MHz and a GPU Boost clock of 1671 MHz, it has a 77 MHz jump on base and an 89 MHz jump on boost. Though moderate by some overclockers’ standards, that’s a healthy increase of 5.3% on the typical boost clock rate. The memory speed remains the same at 11.0 Gbps on 11GB, unchanged from the Founders Edition.

IMG_7629.jpg

I’m not going to walk through the other specifications of the GeForce GTX 1080 Ti GPU in general – I assume if you are looking at this story you are already well aware of its features and capabilities. If you need a refresher on this oddly-designed 352-bit memory bus behemoth, just read over the first page of my GeForce GTX 1080 Ti launch review.

DSC02824.JPG

Continue reading our review of the EVGA GeForce GTX 1080 Ti SC2!!

Manufacturer: AMD

What is old is new again

Trust me on this one – AMD is aware that launching the RX 500-series of graphics cards, including the RX 580 we are reviewing today, is an uphill battle. Besides battling the sounds on the hills that whisper “reeebbrraannndd”, AMD needs to work with its own board partners to offer up total solutions that compete well with NVIDIA’s stronghold on the majority of the market. Just putting out the Radeon RX 580 and RX 570 cards with the same coolers and specs as the RX 400-series would be a recipe for ridicule. AMD is aware of this and is being surprisingly proactive in the story it is telling consumers and the media.

  • If you already own a Radeon RX 400-series card, the RX 500-series is not expected to be an upgrade path for you.
     
  • The Radeon RX 500-series is NOT based on Vega. Polaris here everyone.
     
  • Target users are those with Radeon R9 380 class cards and older – Polaris is still meant as an upgrade for that very large user base.

The story being told is compelling; more so than you might expect. With more than 500 million gamers using graphics cards that are two years old or older, based on Steam survey data, there is a HUGE audience that would benefit from an RX 580 graphics card upgrade. Older cards may lack support for FreeSync, HDR, higher refresh rate HDMI output, and hardware encode/decode support for 4K resolution content. And while the GeForce GTX 1060 family would also meet those criteria, AMD wants to make the case that the Radeon family is the way to go.

slides-10.jpg

The Radeon RX 500-series is based on the same Polaris architecture as the RX 400-series, though AMD would tell us that the technology has been refined since initial launch. More time with the 14nm FinFET process technology has given the fab facility, and AMD, some opportunities to refine. This gives the new GPUs the ability to scale to higher clocks than they could before (though not without the cost of additional power draw). AMD has tweaked multi-monitor efficiency modes, allowing idle power consumption to drop a handful of watts thanks to a tweaked pixel clock.

Maybe the most substantial change with this RX 580 release is the unleashing of any kind of power consumption constraints for the board partners. The Radeon RX 480 launch was marred with issues surrounding the amount of power AMD claimed the boards would use compared to how much they DID use. This time around, all RX 580 graphics cards will ship with AT LEAST an 8-pin power connector, opening overclocked models to use as much as 225 watts. Some cards will have an 8+6-pin configuration to go even higher. Considering the RX 480 launched with a supposed 150 watt TDP (that it never lived up to), that’s quite an increase.
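The 225 watt figure falls out of the standard PCI Express power budgets: 75W from the x16 slot plus 150W from an 8-pin connector (and another 75W for a 6-pin). A quick sketch of that arithmetic:

```python
# Standard PCI Express power budgets, in watts.
SLOT_X16 = 75
SIX_PIN = 75
EIGHT_PIN = 150

def board_power_limit(aux_connectors):
    """Maximum in-spec board power: slot budget plus each aux connector."""
    return SLOT_X16 + sum(aux_connectors)

board_power_limit([EIGHT_PIN])            # single 8-pin RX 580: 225 watts
board_power_limit([EIGHT_PIN, SIX_PIN])   # 8+6-pin models: 300 watts
```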

slides-13.jpg

AMD is hoping to convince gamers that Radeon Chill is a good solution for some specific instances of excessive power draw. Recent drivers have added support for games like League of Legends and DOTA 2, adding to The Witcher 3, Deus Ex: Mankind Divided and more. I will freely admit that while the technology behind Chill sounds impressive, I don’t yet have enough experience with it to confirm or dispute that it delivers its supposed advantages without sacrificing user experience.

Continue reading our review of the Radeon RX 580 graphics card!

Manufacturer: NVIDIA

Overview

Since the launch of NVIDIA's Pascal architecture with the GTX 1070 and 1080 last May, we've taken a look at a lot of Pascal-based products, including the recent launch of the GTX 1080 Ti. By now, it is clear that Pascal has proven itself in a gaming context.

One frequent request we get about GPU coverage is to look at professional use cases for these sorts of devices. While gaming is still far and away the most common use for GPUs, fields like high-quality rendering for architecture, as well as newer industries like deep learning, can see vast benefits from GPU acceleration.

Today, we are taking a look at some of the latest NVIDIA Quadro GPUs on the market, the Quadro P2000, P4000, and P5000. 

DSC02752.JPG

Diving deep into the technical specs of these Pascal-based Quadro products and the AMD competitor we will be testing,  we find a wide range of compute capability, power consumption, and price.

|                     | Quadro P2000     | Quadro P4000     | Quadro P5000     | Radeon Pro Duo     |
| Process             | 16nm             | 16nm             | 16nm             | 28nm               |
| Code Name           | GP106            | GP104            | GP104            | Fiji XT x 2        |
| Shaders             | 1024             | 1792             | 2560             | 8192               |
| Rated Clock Speed   | 1470 MHz (Boost) | 1480 MHz (Boost) | 1730 MHz (Boost) | up to 1000 MHz     |
| Memory Width        | 160-bit          | 256-bit          | 256-bit          | 4096-bit (HBM) x 2 |
| Compute Perf (FP32) | 3.0 TFLOPS       | 5.3 TFLOPS       | 8.9 TFLOPS       | 16.38 TFLOPS       |
| Compute Perf (FP64) | 1/32 FP32        | 1/32 FP32        | 1/32 FP32        | 1/16 FP32          |
| Frame Buffer        | 5GB              | 8GB              | 16GB             | 8GB (4GB x 2)      |
| TDP                 | 75W              | 105W             | 180W             | 350W               |
| Street Price        | $599             | $900             | $2000            | $800               |
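The FP32 numbers above follow directly from shader count and boost clock, at two FLOPs per shader per clock (one fused multiply-add). A quick sanity check against the table:

```python
def fp32_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 rate: shaders x 2 FLOPs per clock (fused multiply-add) x clock."""
    return shaders * 2 * clock_mhz * 1e6 / 1e12

round(fp32_tflops(1024, 1470), 1)  # Quadro P2000 -> 3.0
round(fp32_tflops(1792, 1480), 1)  # Quadro P4000 -> 5.3
round(fp32_tflops(2560, 1730), 1)  # Quadro P5000 -> 8.9
```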

Astute readers will notice similarities to the NVIDIA GeForce line of products as they take a look at these specifications.

Continue reading our roundup of these three Pascal-based Quadro graphics cards!

Manufacturer: Various

Background and setup

A couple of weeks back, during the excitement surrounding the announcement of the GeForce GTX 1080 Ti graphics card, NVIDIA announced an update to its performance reporting project known as FCAT to support VR gaming. The updated iteration, FCAT VR as it is now called, gives us the first true ability to not only capture the performance of VR games and experiences, but the tools with which to measure and compare.

Watch this video walkthrough of FCAT VR with me and NVIDIA's Tom Petersen

I already wrote an extensive preview of the tool and how it works during the announcement. I think it’s likely that many of you overlooked it with the noise from a new GPU, so I’m going to reproduce some of it here, with additions and updates. Everyone that attempts to understand the data we will be presenting in this story and all VR-based tests going forward should have a baseline understanding of the complexity of measuring VR games. Previous tools don’t tell the whole story, and even the part they do tell is often incomplete.

If you already know how FCAT VR works from reading the previous article, you can jump right to the beginning of our results here.

Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, utilized by countless reviewers and enthusiasts, but Fraps lacked the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we could communicate the smoothness of a gaming experience, better articulating it to help gamers make purchasing decisions.
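As a rough illustration of the kind of post-process analysis Frame Rating performs (a simplified sketch, not NVIDIA's actual tooling): convert per-frame timestamps into frame times, then flag outliers as potential stutter.

```python
import statistics

def frame_times_ms(timestamps_s):
    """Per-frame presentation timestamps (seconds) -> frame times (ms)."""
    return [(b - a) * 1000.0 for a, b in zip(timestamps_s, timestamps_s[1:])]

def stutter_frames(times_ms, factor=2.0):
    """Flag frames that took more than `factor` times the median frame time."""
    median = statistics.median(times_ms)
    return [i for i, t in enumerate(times_ms) if t > factor * median]

# Toy trace: four frames, three steady, one taking three times as long.
times = frame_times_ms([0, 1, 2, 5, 6])   # -> [1000.0, 1000.0, 3000.0, 1000.0]
stutter_frames(times)                     # -> [2]
```

Average FPS alone would hide that third frame; per-frame timing is what makes the smoothness of the experience visible.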

vrpipe1.png

For VR though, those same tools just don’t cut it. Fraps is a non-starter as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we had used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to “latch in” to VR-based games. Not only that, but measuring frame drops, time warps, space warps and reprojections would be a significant hurdle without further development.

vrpipe2.png

vrpipe3.png

NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping that it will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and freely available to the media and the public should not be overlooked.

vlcsnap-2017-02-27-11h31m17s057.png

NVIDIA FCAT VR is comprised of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It uses data from Oculus Event Tracing (as part of Windows ETW) and SteamVR’s performance API, along with NVIDIA driver stats when used on NVIDIA hardware, to generate performance data. It works perfectly well on any GPU vendor’s hardware, though, thanks to its access to the VR vendors’ timing results.

fcatvrcapture.jpg

Continue reading our first look at VR performance testing with FCAT VR!!

Manufacturer: NVIDIA

Flagship Performance Gets Cheaper

UPDATE! If you missed our launch day live stream, you can find the replay below:

It’s a very interesting time in the world of PC gaming hardware. We just saw the release of AMD’s Ryzen processor platform that shook up the processor market for the first time in a decade, AMD’s Vega architecture has been given the brand name “Radeon RX Vega”, and the anticipation for the first high-end competitive part from AMD since Hawaii grows as well. AMD was seemingly able to take advantage of Intel’s slow innovation pace on the processor side, and it is hoping to do the same to NVIDIA on the GPU side. NVIDIA’s product line has been dominant in the mid-range and high-end gaming market since the 900-series, with the 10-series products further cementing the lead.

box1.jpg

The most recent high-end graphics card release came in the form of the updated Titan X based on the Pascal architecture. That was WAY back in August of 2016 – a full seven months ago! Since then we have seen very little change at the top end of the product lines, and what little change we did see came from board vendors adding in technology and variation on the GTX 10-series.

Today we see the release of the new GeForce GTX 1080 Ti, a card that offers only a handful of noteworthy technological changes but is instead able to shake up the market by instigating pricing adjustments: making its own performance offering more appealing and lowering the price of everything else.

The GTX 1080 Ti GP102 GPU

I already wrote about the specifications of the GPU in the GTX 1080 Ti when it was announced last week, so here’s a simple recap.

|                  | GTX 1080 Ti | Titan X (Pascal) | GTX 1080    | GTX 980 Ti  | TITAN X     | GTX 980     | R9 Fury X      | R9 Fury        | R9 Nano        |
| GPU              | GP102       | GP102            | GP104       | GM200       | GM200       | GM204       | Fiji XT        | Fiji Pro       | Fiji XT        |
| GPU Cores        | 3584        | 3584             | 2560        | 2816        | 3072        | 2048        | 4096           | 3584           | 4096           |
| Base Clock       | 1480 MHz    | 1417 MHz         | 1607 MHz    | 1000 MHz    | 1000 MHz    | 1126 MHz    | 1050 MHz       | 1000 MHz       | up to 1000 MHz |
| Boost Clock      | 1582 MHz    | 1480 MHz         | 1733 MHz    | 1076 MHz    | 1089 MHz    | 1216 MHz    | -              | -              | -              |
| Texture Units    | 224         | 224              | 160         | 176         | 192         | 128         | 256            | 224            | 256            |
| ROP Units        | 88          | 96               | 64          | 96          | 96          | 64          | 64             | 64             | 64             |
| Memory           | 11GB        | 12GB             | 8GB         | 6GB         | 12GB        | 4GB         | 4GB            | 4GB            | 4GB            |
| Memory Clock     | 11000 MHz   | 10000 MHz        | 10000 MHz   | 7000 MHz    | 7000 MHz    | 7000 MHz    | 500 MHz        | 500 MHz        | 500 MHz        |
| Memory Interface | 352-bit     | 384-bit G5X      | 256-bit G5X | 384-bit     | 384-bit     | 256-bit     | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) |
| Memory Bandwidth | 484 GB/s    | 480 GB/s         | 320 GB/s    | 336 GB/s    | 336 GB/s    | 224 GB/s    | 512 GB/s       | 512 GB/s       | 512 GB/s       |
| TDP              | 250 watts   | 250 watts        | 180 watts   | 250 watts   | 250 watts   | 165 watts   | 275 watts      | 275 watts      | 175 watts      |
| Peak Compute     | 10.6 TFLOPS | 10.1 TFLOPS      | 8.2 TFLOPS  | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS    | 7.20 TFLOPS    | 8.19 TFLOPS    |
| Transistor Count | 12.0B       | 12.0B            | 7.2B        | 8.0B        | 8.0B        | 5.2B        | 8.9B           | 8.9B           | 8.9B           |
| Process Tech     | 16nm        | 16nm             | 16nm        | 28nm        | 28nm        | 28nm        | 28nm           | 28nm           | 28nm           |
| MSRP (current)   | $699        | $1,200           | $599        | $649        | $999        | $499        | $649           | $549           | $499           |

The GTX 1080 Ti looks a whole lot like the TITAN X launched in August of last year. Based on the 12B-transistor GP102 chip, the new GTX 1080 Ti has 3,584 CUDA cores with a 1.60 GHz Boost clock. That gives it the same processor count as the Titan X but with a slightly higher clock speed, which should make the new GTX 1080 Ti faster by at least a few percentage points; it has a 4.7% edge in base clock compute capability. It has 28 SMs, 28 geometry units, and 224 texture units.

GeForce_GTX_1080_Ti_Block_Diagram.png

Interestingly, the memory system on the GTX 1080 Ti gets adjusted – NVIDIA has disabled a single 32-bit memory controller to give the card a 352-bit wide bus and an odd-sounding 11GB memory capacity. The ROP count also drops to 88 units. Speaking of 11, the memory clock on the G5X implementation on the GTX 1080 Ti now runs at 11 Gbps, a boost available to NVIDIA thanks to a chip revision from Micron and improvements to equalization and reverse signal distortion.
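The resulting peak bandwidth is simple arithmetic: bus width in bytes multiplied by the per-pin data rate. A quick check against the table above:

```python
def mem_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps

mem_bandwidth_gb_s(352, 11.0)  # GTX 1080 Ti: 484.0 GB/s
mem_bandwidth_gb_s(384, 10.0)  # Titan X (Pascal): 480.0 GB/s
```

So despite losing a 32-bit controller, the faster 11 Gbps G5X lets the 1080 Ti slightly out-run the Titan X on bandwidth.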

The move from 12GB of memory on the GP102-based Titan X to 11GB on the GTX 1080 Ti is an interesting one, and it evokes memories of the GTX 970 fiasco, where NVIDIA disabled a portion of that memory controller but left the memory that would have resided on it on the board. Shipping what behaved as 3.5GB of memory at one speed and 500MB at another was the wrong move to make, but releasing the GTX 970 with "3.5GB" of memory would have seemed odd too. NVIDIA is not making the same mistake, instead building the GTX 1080 Ti with 11GB out of the gate.

Continue reading our review of the NVIDIA GeForce GTX 1080 Ti 11GB graphics card!

Linked Multi-GPU Arrives... for Developers

The Khronos Group has released the Vulkan 1.0.42.0 specification, which includes experimental (more on that in a couple of paragraphs) support for VR enhancements, sharing resources between processes, and linking similar GPUs. This spec was released alongside a LunarG SDK and NVIDIA drivers that fully implement these extensions; both are intended for developers, not gamers.

I would expect that the most interesting feature is experimental support for linking similar GPUs together, similar to DirectX 12’s Explicit Linked Multiadapter, which Vulkan calls a “Device Group”. The idea is that the physical GPUs hidden behind this layer can do things like share resources, such as rendering a texture on one GPU and consuming it in another, without the host code being involved. I’m guessing that some studios, like maybe Oxide Games, will decide to not use this feature. While it’s not explicitly stated, I cannot see how this (or DirectX 12’s Explicit Linked mode) would be compatible in cross-vendor modes. Unless I’m mistaken, that would require AMD, NVIDIA, and/or Intel restructuring their drivers to inter-operate at this level. Still, the assumptions that could be made with grouped devices are apparently popular with enough developers for both the Khronos Group and Microsoft to bother.

microsoft-dx12-build15-linked.png

A slide from Microsoft's DirectX 12 reveal, long ago.

As for the “experimental” comment that I made in the introduction... I was expecting to see this news around SIGGRAPH, which occurs in late-July / early-August, alongside a minor version bump (to Vulkan 1.1).

I might still be right, though.

The major new features of Vulkan 1.0.42.0 are implemented as a new classification of extensions: KHX. In the past, vendors, like NVIDIA and AMD, would add new features as vendor-prefixed extensions. Games could query the graphics driver for these abilities, and enable them if available. If these features became popular enough for multiple vendors to have their own implementation of it, a committee would consider an EXT extension. This would behave the same across all implementations (give or take) but not be officially adopted by the Khronos Group. If they did take it under their wing, it would be given a KHR extension (or added as a required feature).

The Khronos Group has added a new layer: KHX. This level of extension sits below KHR and is not intended for production code. You might see where this is headed. The VR multiview, multi-GPU, and cross-process extensions are not supposed to be used in released video games until they leave KHX status. Unlike a vendor extension, the Khronos Group wants old KHX standards to drop out of existence at some point after they graduate to full KHR status. It’s not something that NVIDIA owns and will keep around for 20 years after its usable lifespan just so old games behave as expected.

khronos-group-logo.png

How long will that take? No idea. I’ve already mentioned my logical but uneducated guess a few paragraphs ago, but I’m not going to repeat it; I have literally zero facts to base it on, and I don’t want our readers to think that I do. I don’t. It’s just based on what the Khronos Group typically announces at certain trade shows, and the length of time since their first announcement.

The benefit that KHX does bring us is that, whenever these features make it to public release, developers will have already been using it... internally... since around now. When it hits KHR, it’s done, and anyone can theoretically be ready for it when that time comes.

Author:
Manufacturer: NVIDIA

VR Performance Evaluation

Even though virtual reality hasn’t taken off with the momentum that many in the industry had expected on the heels of the HTC Vive and Oculus Rift launches last year, it remains one of the fastest growing aspects of PC hardware. More importantly for many, VR is also one of the key inflection points for performance moving forward; it demands more hardware, scalability, and innovation than any other sub-category, including 4K gaming. As such, NVIDIA, AMD, and even Intel continue to push the performance benefits of their own hardware and technology.

Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times utilized by countless reviewers and enthusiasts. But Fraps lacked the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we were able to communicate the smoothness of a gaming experience, better articulating it to help gamers make purchasing decisions.
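The core idea of Frame Rating, deriving per-frame times and consistency from captured frame timestamps rather than an FPS average, can be sketched like this (a simplified illustration with made-up timestamps, not the actual FCAT tooling):

```python
# Hypothetical per-frame display timestamps in milliseconds.
timestamps = [0.0, 16.7, 33.4, 50.1, 83.5, 100.2, 116.9]

# Frame times are the deltas between consecutive frames.
frame_times = [b - a for a, b in zip(timestamps, timestamps[1:])]

# An average hides the single 33.4 ms hitch; the per-frame view exposes it.
avg_fps = 1000 * len(frame_times) / (timestamps[-1] - timestamps[0])
worst_ms = max(frame_times)
print(f"avg: {avg_fps:.1f} FPS, worst frame: {worst_ms:.1f} ms")
```

This is exactly why a smooth-sounding average frame rate can still correspond to a stuttery experience: one frame delivered twice as late as its neighbors is clearly visible in the frame time list but nearly invisible in the mean.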

pipe1.jpg

VR pipeline when everything is working well.

For VR though, those same tools just don’t cut it. Fraps is a non-starter as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to “latch in” to VR-based games. Not only that, but measuring frame drops, time warps, space warps, and reprojections would be a significant hurdle without further development.
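The constraint those pipeline diagrams illustrate is a hard one: at the 90 Hz refresh rate of the Vive and Rift, the whole pipeline has roughly 11.1 ms per frame, and anything slower forces a drop, reprojection, or warp. A quick illustration of that budget (the sample render times are invented):

```python
refresh_hz = 90  # HTC Vive and Oculus Rift both refresh at 90 Hz
frame_budget_ms = 1000 / refresh_hz

# A frame that exceeds the budget misses the headset's vsync and must be
# handled by the runtime (reprojection / time warp) instead of shown on time.
render_times_ms = [9.8, 10.5, 12.3, 11.0]  # hypothetical samples
missed = [t for t in render_times_ms if t > frame_budget_ms]
print(f"budget: {frame_budget_ms:.1f} ms, missed frames: {len(missed)}")
```

Note that the 11.0 ms frame squeaks in under the ~11.1 ms budget while the 12.3 ms frame does not; a tool that only reports GPU render time would count both as "about 80-90 FPS" and miss the difference entirely.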

pipe2.jpg

VR pipeline with a frame miss.

NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping that the tool will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and freely available to the media and the public should not be overlooked.

vlcsnap-2017-02-27-11h31m17s057.png

NVIDIA FCAT VR consists of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It generates performance data from Oculus event tracing (part of Windows ETW), SteamVR’s performance API, and NVIDIA driver stats when used on NVIDIA hardware. It does work perfectly well on any GPU vendor’s hardware, though, with access to the VR vendor-specific timing results.

fcatvrcapture.jpg

Continue reading our preview of the new FCAT VR tool!

Manufacturer: PC Perspective

Living Long and Prospering

The open fork of AMD’s Mantle, the Vulkan API, was released exactly a year ago with, as we reported, a hard launch. This meant public, but not main-branch, drivers for developers, a few public SDKs, a proof-of-concept patch for The Talos Principle, and, of course, the ratified specification. This set the API up to find success right out of the gate, and we can now look back over the year since.

khronos-2017-vulkan-alt-logo.png

Thor's hammer, or a tempest in a teapot?

The elephant in the room is DOOM. This game has successfully integrated the API and it uses many of its more interesting features, like asynchronous compute. Because the API is designed in a sort-of “make a command, drop it on a list” paradigm, the driver is able to select commands based on priority and available resources. AMD’s products got a significant performance boost, relative to OpenGL, catapulting their Fury X GPU up to the enthusiast level that its theoretical performance suggested.

Mobile developers have been picking up the API, too. Google, who is known for banishing OpenCL from their Nexus line and challenging OpenGL ES with their Android Extension Pack (later integrated into OpenGL ES with version 3.2), has strongly backed Vulkan. The API was integrated as a core feature of Android 7.0.

On the engine and middleware side of things, Vulkan is currently “ready for shipping games” as of Unreal Engine 4.14. It is also included in Unity 5.6 Beta, which is expected for full release in March. Frameworks for emulators are also integrating Vulkan, often just to say they did, but sometimes to emulate the quirks of these systems’ offbeat graphics co-processors. Many other engines, from Source 2 to Torque 3D, have also announced or added Vulkan support.

Finally, for the API itself, The Khronos Group announced (pg 22 from SIGGRAPH 2016) areas that they are actively working on. The top feature is “better” multi-GPU support. While Vulkan, like OpenCL, allows developers to enumerate all graphics devices and target them, individually, with work, it doesn’t have certain mechanisms, like being able to directly ingest output from one GPU into another. They haven’t announced a timeline for this.

Author:
Manufacturer: EVGA

The new EVGA GTX 1080 FTW2 with iCX Technology

Back in November of 2016, EVGA had a problem on its hands. The company had a batch of GTX 10-series graphics cards using the new ACX 3.0 cooler solution leave the warehouse missing thermal pads required to keep the power management hardware on its cards within reasonable temperature margins. To its credit, the company took the oversight seriously and instituted a set of solutions for consumers to select from: RMA, a new VBIOS to increase fan speeds, or installing thermal pads on the hardware manually. Still, as is the case with any product quality lapse like this, there were (and are) lingering questions about EVGA’s ability to deliver a reliable product with features and new options that don’t compromise the basics.

Internally, the drive to correct these lapses was…strong. From the very top of the food chain on down, it was hammered home that something like this simply couldn’t occur again, and even more so, EVGA was to develop and showcase a new feature set and product lineup demonstrating its ability to innovate. Thus was born, and accelerated, the EVGA iCX Technology infrastructure. While this was something in the pipeline for some time already, it was moved up to counter any negative bias that might have formed for EVGA’s graphics cards over the last several months. The goal was simple: prove that EVGA was the leader in graphics card design and prove that EVGA has learned from previous mistakes.

EVGA iCX Technology

Previous issues aside, the creation of iCX Technology is built around one simple question: is one GPU temperature sensor enough? For nearly all of today’s graphics cards, cooling is based on the temperature of the GPU silicon itself, as measured by NVIDIA’s own integrated sensor (for all of EVGA’s cards). This is how fan curves are built, how GPU clock speeds are handled with GPU Boost, how noise profiles are created, and more. But as process technology has improved and GPU designs have shifted toward power efficiency, the GPU itself is often no longer the thermally limiting factor.

slides05.jpg

As it turns out, converting 12V (from the power supply) to ~1V (necessary for the GPU) is a simple process that creates a lot of excess heat. The thermal images above clearly demonstrate that, and EVGA isn’t the only card vendor to take notice. In fact, EVGA’s product issue from last year was related to this – the fans were only spinning fast enough to keep the GPU cool and did not take into account the temperature of the memory or power delivery.

The fix from EVGA is to ratchet up the number of sensors on the card PCB and wrap them with intelligence in the form of MCUs, updated Precision XOC software, and user-viewable LEDs on the card itself.

slides10.jpg

EVGA graphics cards with iCX Technology will include 9 total thermal sensors on the board, independent of the GPU temperature sensor directly integrated by NVIDIA. There are three sensors for memory, five for power delivery and an additional sensor for the GPU temperature. Some are located on the back of the PCB to avoid any conflicts with trace routing between critical components, including the secondary GPU sensor.
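Conceptually, multi-sensor cooling like this drives the fan from whichever zone is demanding the most airflow rather than the GPU diode alone. Here is a hedged sketch of that logic (the curves, sensor names, and temperatures are invented for illustration, not EVGA's actual firmware behavior):

```python
def duty_from_curve(temp_c, curve):
    """Linear interpolation over (temperature C, fan duty %) points,
    clamped at the curve's endpoints."""
    pts = sorted(curve)
    if temp_c <= pts[0][0]:
        return pts[0][1]
    for (t0, d0), (t1, d1) in zip(pts, pts[1:]):
        if temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return pts[-1][1]

# Hypothetical per-zone curves: VRMs tolerate higher temps than the GPU die.
gpu_curve = [(30, 20), (60, 40), (85, 100)]
vrm_curve = [(40, 20), (70, 50), (100, 100)]

# With only the GPU sensor, hot power delivery would be ignored; iCX-style
# logic takes the worst case across all monitored zones.
readings = {"gpu": 55, "vrm": 92}
duty = max(duty_from_curve(readings["gpu"], gpu_curve),
           duty_from_curve(readings["vrm"], vrm_curve))
print(f"fan duty: {duty:.0f}%")
```

In this made-up scenario the GPU alone would only ask for ~37% fan duty while the VRM zone demands ~87%, which is exactly the failure mode of last year's missing thermal pads: a fan curve keyed to a single cool sensor while another zone overheats.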

Continue reading about EVGA iCX Technology!

Author:
Manufacturer: NVIDIA

NVIDIA P100 comes to Quadro

At the start of the SOLIDWORKS World conference this week, NVIDIA took the cover off of a handful of new Quadro cards targeting professional graphics workloads. Though the bulk of NVIDIA’s discussion covered lower cost options like the Quadro P4000, P2000, and below, the most interesting product sits at the high end, the Quadro GP100.

As you might guess from the name alone, the Quadro GP100 is based on the GP100 GPU, the same silicon used on the Tesla P100 announced back in April of 2016. At the time, the GP100 GPU was specifically billed as an HPC accelerator for servers. It had a unique form factor with a passive cooler that required additional chassis fans. Just a couple of months later, a PCIe version was released under the same Tesla P100 brand with the same specifications.

quadro2017-2.jpg

Today that GPU hardware gets a third iteration as the Quadro GP100. Let’s take a look at the Quadro GP100 specifications and how it compares to some recent Quadro offerings.

                      | Quadro GP100           | Quadro P6000          | Quadro M6000           | Full GP100
GPU                   | GP100                  | GP102                 | GM200                  | GP100 (Pascal)
SMs                   | 56                     | 60                    | 48                     | 60
TPCs                  | 28                     | 30                    | 24                     | (30?)
FP32 CUDA Cores / SM  | 64                     | 64                    | 64                     | 64
FP32 CUDA Cores / GPU | 3584                   | 3840                  | 3072                   | 3840
FP64 CUDA Cores / SM  | 32                     | 2                     | 2                      | 32
FP64 CUDA Cores / GPU | 1792                   | 120                   | 96                     | 1920
Base Clock            | 1303 MHz               | 1417 MHz              | 1026 MHz               | TBD
GPU Boost Clock       | 1442 MHz               | 1530 MHz              | 1152 MHz               | TBD
FP32 TFLOPS (SP)      | 10.3                   | 12.0                  | 7.0                    | TBD
FP64 TFLOPS (DP)      | 5.15                   | 0.375                 | 0.221                  | TBD
Texture Units         | 224                    | 240                   | 192                    | 240
ROPs                  | 128?                   | 96                    | 96                     | 128?
Memory Interface      | 1.4 Gbps 4096-bit HBM2 | 9 Gbps 384-bit GDDR5X | 6.6 Gbps 384-bit GDDR5 | 4096-bit HBM2
Memory Bandwidth      | 716 GB/s               | 432 GB/s              | 316.8 GB/s             | ?
Memory Size           | 16GB                   | 24GB                  | 12GB                   | 16GB
TDP                   | 235 W                  | 250 W                 | 250 W                  | TBD
Transistors           | 15.3 billion           | 12 billion            | 8 billion              | 15.3 billion
GPU Die Size          | 610 mm2                | 471 mm2               | 601 mm2                | 610 mm2
Manufacturing Process | 16nm                   | 16nm                  | 28nm                   | 16nm

There are some interesting stats here that may not be obvious at first glance. Most interesting is that despite the pricing and segmentation, the GP100 is not the de facto fastest Quadro card from NVIDIA depending on your workload. With 3584 CUDA cores running at somewhere around 1400 MHz at Boost speeds, the single precision (32-bit) rating for GP100 is 10.3 TFLOPS, less than the recently released P6000 card. Based on GP102, the P6000 has 3840 CUDA cores running at something around 1500 MHz for a total of 12 TFLOPS.
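Those single precision ratings follow directly from core counts and clocks, since each CUDA core can retire one fused multiply-add (two FLOPs) per clock; a quick check using the table's figures:

```python
def fp32_tflops(cuda_cores, boost_mhz):
    # Each CUDA core retires one FMA (2 FLOPs) per clock cycle.
    return cuda_cores * 2 * boost_mhz / 1e6

print(fp32_tflops(3584, 1442))  # Quadro GP100 -> ~10.3 TFLOPS
print(fp32_tflops(3840, 1530))  # Quadro P6000 -> ~11.8 TFLOPS (marketed as 12)
```

The P6000's rating only reaches a round 12 TFLOPS if the card sustains a clock slightly above its rated Boost, which GPU Boost 3.0 routinely allows.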

gp102-blockdiagram.jpg

GP100 (full) Block Diagram

Clearly the placement for the Quadro GP100 is based around its 64-bit, double precision performance and its ability to offer real-time simulations on more complex workloads than other Pascal-based Quadro cards can handle. The Quadro GP100 offers a 1/2-rate DP compute capability, totaling 5.15 TFLOPS. The P6000, on the other hand, is only capable of 0.375 TFLOPS at the standard, consumer-level 1/32 DP rate. The inclusion of ECC memory support on the GP100 is also something no other recent Quadro card has.
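The double precision gap follows directly from those rate ratios; a quick check using the single precision figures above:

```python
gp100_sp = 10.3   # TFLOPS FP32, from the table above
p6000_sp = 12.0

# GP100 executes FP64 at 1/2 the FP32 rate; the consumer-derived GP102
# in the P6000 is limited to 1/32.
gp100_dp = gp100_sp / 2    # 5.15 TFLOPS
p6000_dp = p6000_sp / 32   # 0.375 TFLOPS
print(f"GP100 is {gp100_dp / p6000_dp:.1f}x faster at double precision")
```

So despite losing slightly at single precision, the GP100 delivers roughly 13.7x the P6000's double precision throughput, which is the entire point of the product.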

quadro2017-3.jpg

Raw graphics performance and throughput is going to be questionable until someone does some testing, but it seems likely that the Quadro P6000 will still be the best solution for that by at least a slim margin. With a higher CUDA core count, higher clock speeds and equivalent architecture, the P6000 should run games, graphics rendering and design applications very well.

There are other important differences offered by the GP100. The memory system is built around a 16GB HBM2 implementation, which means more total memory bandwidth but at a lower capacity than the 24GB Quadro P6000. Offering 66% more memory bandwidth means the GP100 gives applications that are bandwidth bound an advantage, as long as the compute capability keeps up on the back end.
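The 66% figure falls directly out of the two memory interfaces; a quick check (bandwidth = bus width in bytes times per-pin data rate):

```python
gp100_bw = 716.0  # GB/s, from the table: 4096-bit HBM2 at 1.4 Gbps
p6000_bw = 432.0  # GB/s, from the table: 384-bit GDDR5X at 9 Gbps

print(f"{(gp100_bw / p6000_bw - 1) * 100:.0f}% more bandwidth")  # ~66%

# Sanity check against the interface specs themselves:
print(4096 / 8 * 1.4)  # 716.8 GB/s for the HBM2 stack
print(384 / 8 * 9)     # 432.0 GB/s for the GDDR5X bus
```

The extreme bus width is how HBM2 gets there: each pin runs far slower than GDDR5X (1.4 vs 9 Gbps), but there are more than ten times as many of them.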

m.jpg

Continue reading our preview of the new Quadro GP100!

Author:
Manufacturer: AMD

Performance and Impressions

This content was sponsored by AMD.

Last week in part 1 of our look at the Radeon RX 460 as a budget gaming GPU, I detailed our progress through component selection. Centered around an XFX 2GB version of the Radeon RX 460, we built a machine using an Intel Core i3-6100, ASUS H110M motherboard, 8GB of DDR4 memory, both an SSD and a HDD, as well as an EVGA power supply and Corsair chassis. Part 1 discussed the reasons for our hardware selections as well as an unboxing and preview of the giveaway to come.

In today's short write-up and video, I will discuss my impressions of the system overall as well as touch on the performance in a handful of games. Despite the low price, and despite the budget moniker attributed to this build, a budding PC gamer or converted console gamer will find plenty of capability in this system.

Check out prices of Radeon RX 460 graphics cards on Amazon!!

Let's quickly recap the components making up our RX 460 budget build.

Our Radeon RX 460 Build

  Budget Radeon RX 460 Build
Processor Intel Core i3-6100 - $109
Cooler CRYORIG M9i - $19
Motherboard ASUS H110M-A/M.2 - $54
Memory 2 x 4GB Crucial Ballistix DDR4-2400 - $51
Graphics Card XFX Radeon RX 460 2GB - $98
Storage 240GB Sandisk SSD Plus - $68
1TB Western Digital Blue - $49
Case Corsair Carbide Series 88R - $49
Power Supply EVGA 500 Watt - $42
Monitor Nixeus VUE24A 1080p 144Hz FreeSync - $251
Total Price $549 on Amazon; $799 with monitor on Amazon

For just $549 I was able to create a shopping list of hardware that provides very impressive performance for the investment.

02.jpg

The completed system is damn nice looking, if I do say so myself. The Corsair Carbide 88R case sports a matte black finish with a large window to peer in at the hardware contained within. Coupled with the Nixeus FreeSync display and some Logitech G mouse and keyboard hardware we love, this is a configuration that any PC gamer would be proud to display.

03.jpg

Continue reading our performance thoughts on the RX 460 budget PC build!

Author:
Manufacturer: AMD

Our Radeon RX 460 Build

This content was sponsored by AMD.

Be sure you check out part 2 of our story where we detail the performance our RX 460 build provides as well as our contest page where you can win this PC from AMD and PC Perspective!

Just before CES this month, AMD came to me asking about our views and opinions on its Radeon RX 460 line of graphics cards, how the GPU is perceived in the market, and how I felt they could better position it to the target audience. It was at that point that I had to openly admit to never actually having installed and used an RX 460 GPU before. I know, shame on me.

I like to pride myself and PC Perspective on being one of the top sources of technical information in the world of PCs, gaming or otherwise, and in particular on GPUs. But a pitfall that I fall into, and I imagine many other reviewers and media do as well, is that I overly emphasize the high end of the market, and I tend to shift what is considered a “budget” product up the scale more than I should. Is a $250 graphics card really a budget product that the mass market is going to purchase? No, and the numbers clearly point to that as fact. More buyers purchase cards in the sub-$150 segment than in any other, upgrading OEM PCs and building low cost boxes for themselves and for family and friends.

So, AMD came to me with a proposal to address this deficiency in my mental database. If we were willing to build a PC based on the RX 460, testing it and evaluating it honestly, and then give that built system back to the community, they would pay for the hardware and promotion of such an event. So here we are.

To build out the RX 460-based PC, I went to the experts in the world of budget PC builds, the /r/buildapc subreddit. The community here is known for being the best at penny-pinching and maximizing the performance-per-dollar implementations on builds. While not the only types of hardware they debate and discuss in that group, it definitely is the most requested. I started a thread there to ask for input and advice on building a system with the only requirements being inclusion of the Radeon RX 460 and perhaps an AMD FreeSync monitor.

Check out prices of Radeon RX 460 graphics cards on Amazon!!

The results were impressive; a solid collection of readers and contributors gave me suggestions for complete builds based around the RX 460. Processors varied, memory configurations varied, storage options varied, but in the end I had at least a dozen solid options that ranged in price from $400-800. With the advice of the community at hand, I set out to pick the components for our own build, which are highlighted below:

Our Radeon RX 460 Build

  Budget Radeon RX 460 Build
Processor Intel Core i3-6100 - $109
Cooler CRYORIG M9i - $19
Motherboard ASUS H110M-A/M.2 - $54
Memory 2 x 4GB Crucial Ballistix DDR4-2400 - $51
Graphics Card XFX Radeon RX 460 2GB - $98
Storage 240GB Sandisk SSD Plus - $68
1TB Western Digital Blue - $49
Case Corsair Carbide Series 88R - $49
Power Supply EVGA 500 Watt - $42
Monitor Nixeus VUE24A 1080p 144Hz FreeSync - $251
Total Price $549 on Amazon; $799 with monitor on Amazon

I’ll go in order of presentation for simplicity’s sake. First up is the selection of the Intel Core i3-6100 processor. This CPU was the most popular offering in the /r/buildapc group and has been the darling of budget gaming builds for a while. It is frequently used because of its $109 price tag, along with dual-core, HyperThreaded performance at 3.7 GHz, giving you plenty of headroom for single threaded applications. Since most games aren’t going to utilize more than four threads, PC gaming performance will be excellent as well. One frequent suggestion in our thread was the Intel Pentium G4560, a Kaby Lake based part that will sell for ~$70. That would have been my choice, but it’s not shipping yet, and I don’t know when it will be.

cpu.jpg

Continue reading our budget build based on the Radeon RX 460!

High Bandwidth Cache

Apart from AMD’s other new architecture due out in 2017, the Zen CPU design, no product has had as much build-up and excitement surrounding it as the Vega GPU architecture. After the world learned that Polaris would be a mainstream-only design, released as the Radeon RX 480, the focus for enthusiasts came straight to Vega. It has been on the public-facing roadmaps for years and signifies the company’s return to the world of high end GPUs, something it has been missing since the release of the Fury X in mid-2015.

slides-2.jpg

Let’s be clear: today does not mark the release of the Vega GPU or products based on Vega. In reality, we don’t even know enough to make highly educated guesses about the performance without more details on the specific implementations. That being said, the information released by AMD today is interesting and shows that Vega will be much more than simply an increase in shader count over Polaris. It reminds me a lot of the build to the Fiji GPU release, when the information and speculation about how HBM would affect power consumption, form factor and performance flourished. What we can hope for, and what AMD’s goal needs to be, is a cleaner and more consistent product release than how the Fury X turned out.

The Design Goals

AMD began its discussion about Vega last month by talking about the changes in the world of GPUs and how data sets and workloads have evolved over the last decade. No longer are GPUs only worried about games; they must also address professional, enterprise, and scientific workloads. Even more interestingly, just as we have discussed the growing gap between CPU performance and memory bandwidth, AMD posits that the gap between memory capacity and GPU performance is a significant hurdle and limiter to performance and expansion. Game installs, professional graphics sets, and compute data sets continue to skyrocket. Game installs are now regularly over 50GB, but compute workloads can exceed petabytes. Even as we saw GPU memory capacities increase from megabytes to gigabytes, reaching as high as 12GB in high end consumer products, AMD thinks there should be more.

slides-8.jpg

Coming from a company that chose to release a high-end product limited to 4GB of memory in 2015, it’s a noteworthy statement.

slides-11.jpg

The High Bandwidth Cache

Bold enough to claim a direct nomenclature change, Vega 10 will feature an HBM2-based high bandwidth cache (HBC) along with a new memory hierarchy to call it into play. This HBC will be a collection of memory on the GPU package, just like we saw on Fiji with the first HBM implementation, and will be measured in gigabytes. Why the move to calling it a cache will be covered below. (But can’t we all get behind the removal of the term “frame buffer”?) Interestingly, this HBC doesn’t have to be HBM2, and in fact I was told that you could expect to see other memory systems on lower cost products going forward; cards that integrate this new memory topology with GDDR5X or some equivalent seem assured.

slides-13.jpg

Continue reading our preview of the AMD Vega GPU Architecture!

Author:
Manufacturer: AMD

AMD Enters Machine Learning Game with Radeon Instinct Products

NVIDIA has been diving into the world of machine learning for quite a while, positioning itself and its GPUs at the forefront of artificial intelligence and neural net development. Though the strategies are still filling out, I have seen products like the DIGITS DevBox place a stake in the ground of neural net training and platforms like Drive PX perform inference tasks on those neural nets in self-driving cars. Until today, AMD had remained mostly quiet on its plans to enter and address this growing and complex market, instead depending on the compute prowess of its latest Polaris and Fiji GPUs to make a general statement on their own.

instinct-18.jpg

The new Radeon Instinct brand of accelerators based on current and upcoming GPU architectures will combine with an open-source approach to software and present researchers and implementers with another option for machine learning tasks.

The statistics and requirements that come along with the machine learning evolution in the compute space are mind boggling. More than 2.5 quintillion bytes of data are generated daily and stored on phones, PCs, and servers, both on-site and through cloud infrastructure. That includes 500 million tweets, 4 million hours of YouTube video, 6 billion Google searches, and 205 billion emails.

instinct-6.jpg

Machine intelligence is going to allow software developers to address some of the most important areas of computing for the next decade. Automated cars depend on deep learning to train, medical fields can utilize this compute capability to more accurately and expeditiously diagnose and find cures to cancer, security systems can use neural nets to locate potential and current risk areas before they affect consumers; there are more uses for this kind of network and capability than we can imagine.

Continue reading our preview of the AMD Radeon Instinct machine learning processors!

Author:
Manufacturer: AMD

Third annual release

For the past two years, AMD has made a point of releasing one major software update to Radeon users and gamers annually. In 2014 this started with Catalyst Omega, where a dramatic jump in performance, compatibility testing, and new features were the story. We were told that, for the first time in a very long while, AMD was going to focus on building great software with regular and repeated updates – admittedly the most important aspect to me. In 2015 we got a rebrand along with the release: Radeon Software Crimson Edition. AMD totally revamped the visual and user experience of the driver software, bringing it into the modern world of style and function. New features and added performance were also the hallmarks of that release, with a stronger promise to produce more frequent drivers to address performance gaps and stability concerns and to include new features.

For December 2016 and into the new year, AMD is launching the Radeon Software Crimson ReLive Edition driver. While the name might seem silly, it will make sense as we dive into the new features.

While you may have seen the slides leak out through some other sites over the past 48 hours, I thought it was worth offering my input on the release.

Not a performance focused story

The first thing that should be noted with the ReLive Edition is that AMD isn’t making any claims of substantially improved performance. Instead, the Radeon Technologies Group software team is dedicated to continued and frequent iterations that improve performance gradually over time.

slides40.jpg

As you can see in the slide above, AMD is showing modest 4-8% performance gains on the Radeon RX 480 with the Crimson ReLive driver, and even then, it’s being compared to the launch driver, 16.6.2. That is significantly lower than the claims made in previous major driver releases. Talking with AMD about this concern, we were told that the company doesn’t foresee any dramatic, single large step increases in performance going forward. The major design changes that were delivered over the last several years, starting with a reconstruction of the CrossFire system thanks to our testing, have settled. All we should expect going forward is a steady trickle of moderate improvements.

(Obviously, an exception may occur here or there, like with a new game release.)

Radeon ReLive Capture and Streaming Feature

So, what is new? The namesake feature for this driver is the Radeon ReLive application that is built in. ReLive is a capture and streaming tool that will draw obvious comparisons to what NVIDIA has done with GeForce Experience. The ReLive integration is clean and efficient, well designed and seems easy to use in my quick time with it. There are several key capabilities it offers.

First, you can record your gameplay with the press of a hotkey; this includes the ability to record and capture the desktop as well. AMD has included a bevy of settings for your captures to adjust quality, resolution, bitrate, FPS and more.

rev10-table.jpg

ReLive supports resolutions up to 4K30 with the Radeon R9 series of GPUs and up to 1440p30 with the RX 480/470/460. Both AVC (H.264) and HEVC (H.265) encoding are supported.

Along with recording is support for background capture, called Instant Replay. This allows the gamer to always record in the background, up to 20 minutes, so you can be sure you capture amazing moments that happen during your latest gaming session. Hitting a hotkey will save the clip permanently to the system.
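Conceptually, Instant Replay works like a rolling ring buffer of encoded frames: new frames push out the oldest ones, and the hotkey flushes whatever is currently buffered to disk. A minimal sketch of that idea (frame IDs stand in for encoded video data; this is not AMD's implementation):

```python
from collections import deque

FPS = 30
BUFFER_SECONDS = 20 * 60  # ReLive's maximum: 20 minutes

# A bounded deque behaves like a rolling capture buffer: once full, old
# frames fall off the front as new ones are appended.
buffer = deque(maxlen=FPS * BUFFER_SECONDS)

for frame_id in range(FPS * BUFFER_SECONDS + 500):  # keep encoding frames
    buffer.append(frame_id)

# Pressing the hotkey would flush the buffer's contents to disk; only the
# most recent 20 minutes of frames remain, the oldest 500 are gone.
print(len(buffer), buffer[0])
```

The appeal of this design is that memory use is fixed regardless of how long the gaming session runs, which is why "always recording" is practical in the first place.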

rev9.jpg

Continue reading our overview of the new AMD Radeon Crimson ReLive driver!

Author:
Manufacturer: NVIDIA

A Holiday Project

A couple of years ago, I performed an experiment around the GeForce GTX 750 Ti graphics card to see if we could upgrade basic OEM, off-the-shelf computers to become competent gaming PCs. The key to this potential upgrade was that the GTX 750 Ti offered a great amount of GPU horsepower (at the time) without the need for an external power connector. Lower power requirements on the GPU meant that even the most basic of OEM power supplies should be able to do the job.

That story was a success, both in terms of the result in gaming performance and the positive feedback it received. Today, I am attempting to do that same thing but with a new class of GPU and a new class of PC games.

The goal for today’s experiment remains pretty much the same: can a low-cost, low-power GeForce GTX 1050 Ti graphics card that also does not require any external power connector offer enough gaming horsepower to upgrade current shipping OEM PCs to "gaming PC" status?

Our target PCs for today come from Dell and ASUS. I went into my local Best Buy just before the Thanksgiving holiday and looked for two machines that varied in price and relative performance.

01.jpg

  Dell Inspiron 3650 ASUS M32CD-B09
Processor Intel Core i3-6100 Intel Core i7-6700
Motherboard Custom Custom
Memory 8GB DDR4 12GB DDR4
Graphics Card Intel HD Graphics 530 Intel HD Graphics 530
Storage 1TB HDD 1TB Hybrid HDD
Case Custom Custom
Power Supply 240 watt 350 watt
OS Windows 10 64-bit Windows 10 64-bit
Total Price $429 (Best Buy) $749 (Best Buy)

The specifications of these two machines are relatively modern for OEM computers. The Dell Inspiron 3650 uses a modest dual-core Core i3-6100 processor with a fixed clock speed of 3.7 GHz. It has a 1TB standard hard drive and a 240 watt power supply. The ASUS M32CD-B09 has a quad-core HyperThreaded processor with a 4.0 GHz maximum Turbo clock, a 1TB hybrid hard drive, and a 350 watt power supply. Both CPUs share the same Intel integrated graphics, the HD Graphics 530. You’ll see in our testing that not only is this integrated GPU unqualified for modern PC gaming, but it also performs quite differently based on the CPU it is paired with.
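The reason a card like the GTX 1050 Ti is even a candidate for these systems is power: with no external power connector, it can draw at most 75 watts from the PCIe slot, leaving even a 240 watt OEM supply with headroom. A rough back-of-the-envelope check (the TDPs are published figures, but the board/drive allowance is a ballpark assumption, not a measurement):

```python
# Rough worst-case draw estimates in watts.
gpu_w = 75   # GTX 1050 Ti: PCIe slot power only, no external connector
rest_w = 40  # ballpark allowance for motherboard, RAM, drives, fans

systems = {
    "Dell Inspiron 3650 (i3-6100, 51 W TDP)":  {"cpu_w": 51, "psu_w": 240},
    "ASUS M32CD-B09 (i7-6700, 65 W TDP)":      {"cpu_w": 65, "psu_w": 350},
}

for name, s in systems.items():
    total = gpu_w + s["cpu_w"] + rest_w
    print(f"{name}: ~{total} W estimated of {s['psu_w']} W PSU")
```

Even with pessimistic estimates, both machines stay comfortably under their supplies' ratings, which is exactly what made the GTX 750 Ti experiment work in 2014 and what makes the 1050 Ti the natural successor for this upgrade path.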

Continue reading our look at upgrading an OEM machine with the GTX 1050 Ti!!