Manufacturer: AMD

Specifications and Design

Just a couple of short weeks ago we looked at the Radeon Vega Frontier Edition 16GB graphics card in its air-cooled variety. The results were interesting – gaming performance proved to fall somewhere between the GTX 1070 and the GTX 1080 from NVIDIA’s current generation of GeForce products. That falls below many of the estimates from players in the market, including media, fans, and enthusiasts. But before we get to the RX Vega product family that is targeted at gamers, AMD has another data point for us to look at with a water-cooled version of Vega Frontier Edition. At a $1500 MSRP, which we shelled out ourselves, we are very interested to see how it changes the face of performance for the Vega GPU and architecture.

Let’s start with a look at the specifications of this version of the Vega Frontier Edition, which will be…familiar.

  Vega Frontier Edition (Liquid) Vega Frontier Edition Titan Xp GTX 1080 Ti Titan X (Pascal) GTX 1080 TITAN X GTX 980 R9 Fury X
GPU Vega Vega GP102 GP102 GP102 GP104 GM200 GM204 Fiji XT
GPU Cores 4096 4096 3840 3584 3584 2560 3072 2048 4096
Base Clock 1382 MHz 1382 MHz 1480 MHz 1480 MHz 1417 MHz 1607 MHz 1000 MHz 1126 MHz 1050 MHz
Boost Clock 1600 MHz 1600 MHz 1582 MHz 1582 MHz 1480 MHz 1733 MHz 1089 MHz 1216 MHz -
Texture Units ? ? 224 224 224 160 192 128 256
ROP Units 64 64 96 88 96 64 96 64 64
Memory 16GB 16GB 12GB 11GB 12GB 8GB 12GB 4GB 4GB
Memory Clock 1890 MHz 1890 MHz 11400 MHz 11000 MHz 10000 MHz 10000 MHz 7000 MHz 7000 MHz 1000 MHz
Memory Interface 2048-bit HBM2 2048-bit HBM2 384-bit G5X 352-bit 384-bit G5X 256-bit G5X 384-bit 256-bit 4096-bit (HBM)
Memory Bandwidth 483 GB/s 483 GB/s 547.7 GB/s 484 GB/s 480 GB/s 320 GB/s 336 GB/s 224 GB/s 512 GB/s
TDP 300 watts (switchable to ~350 watts) 300 watts 250 watts 250 watts 250 watts 180 watts 250 watts 165 watts 275 watts
Peak Compute 13.1 TFLOPS 13.1 TFLOPS 12.0 TFLOPS 10.6 TFLOPS 10.1 TFLOPS 8.2 TFLOPS 6.14 TFLOPS 4.61 TFLOPS 8.60 TFLOPS
Transistor Count ? ? 12.0B 12.0B 12.0B 7.2B 8.0B 5.2B 8.9B
Process Tech 14nm 14nm 16nm 16nm 16nm 16nm 28nm 28nm 28nm
MSRP (current) $1499 $999 $1200 $699 $1,200 $599 $999 $499 $649

The base specs remain unchanged and AMD lists the same memory frequency and even GPU clock rates across both models. In practice though, the liquid cooled version runs at higher sustained clocks and can overclock a bit easier as well (more details later). What does change with the liquid cooled version is a usable BIOS switch on top of the card that allows you to move between two distinct power draw states: 300 watts and 350 watts.

IMG_4728.JPG

First, it’s worth noting this is a change from the “375 watt” TDP that this card was listed at during the launch and announcement. AMD was touting a 300-watt and 375-watt version of Frontier Edition, but it appears the company backed off a bit on that, erring on the side of caution to avoid breaking any of the specifications of PCI Express (board slot or auxiliary connectors). Even more concerning is that AMD chose to have the default state of the switch on the Vega FE Liquid card at 300 watts rather than the more aggressive 350 watts. AMD claims this is to avoid any problems with lower quality power supplies that may struggle to deliver slightly over 150 watts of power (and the resulting current) through the 8-pin power connections. I would argue that any system that is going to install a $1500 graphics card can and should be prepared to provide the necessary power, but for the professional market, AMD leans towards caution. (It’s worth pointing out that the RX 480 power issues that may have prompted this internal decision making were more problematic because they impacted power delivery through the motherboard, while the ratings on the 6- and 8-pin connectors are generally much safer to exceed.)

Even without clock speed changes, the move to water cooling should result in better and more consistent performance by removing the overheating concerns that surrounded our first Radeon Vega Frontier Edition review. But let’s dive into the card itself and see how the design process created a unique liquid cooled solution.

Continue reading our review of the Radeon Vega Frontier Edition Liquid-Cooled card!!

Manufacturer: Sapphire

Overview

There has been a lot of news lately about the release of cryptocurrency-specific graphics cards from both NVIDIA and AMD add-in board partners. While we covered the current cryptomining phenomenon in an earlier article, today we are taking a look at one of these cards geared towards miners.

IMG_4681.JPG

It's worth noting that I purchased this card myself from Newegg, and neither AMD nor Sapphire was involved in this article. I saw this card pop up on Newegg a few days ago, and my curiosity got the best of me.

There has been a lot of speculation, and little official information from vendors, about what these mining cards will actually entail.

From the outward appearance, it is virtually impossible to distinguish this "new" RX 470 from the previous Sapphire Nitro+ RX 470, besides the lack of additional display outputs beyond the DVI connection. Even the branding and labels on the card identify it as a Nitro+ RX 470.

In order to test the hashing rates of this GPU, we are using Claymore's Dual Miner Version 9.6 (mining Ethereum only) against a reference design RX 470, also from Sapphire.

IMG_4684.JPG

On the reference RX 470 out of the box, we hit rates of about 21.8 MH/s while mining Ethereum. 

Once we moved to the Sapphire mining card, we moved up to at least 24 MH/s right from the start.

Continue reading about the Sapphire Radeon RX 470 Mining Edition!

Manufacturer: AKiTiO

A long time coming

External video cards for laptops have long been a dream of many PC enthusiasts, and for good reason. It’s compelling to have a thin-and-light notebook with great battery life for things like meetings or class, with the ability to plug it into a dock at home and enjoy your favorite PC games.

Many times we have been promised that external GPUs for notebooks would be a viable option. Over the years there have been many commercial solutions involving both industry standard protocols like ExpressCard, as well as proprietary connections to allow you to externally connect PCIe devices. Enterprising hackers have also tried their hand at this for many years, cobbling together interesting solutions using mPCIe and M.2 ports on their notebooks which were meant for other devices.

With the introduction of Intel’s Thunderbolt standard in 2011, there was a hope that we would finally achieve external graphics nirvana. A modern, Intel-backed protocol promising PCIe x4 speeds (PCIe 2.0 at that point) sounded like it would be ideal for connecting GPUs to notebooks, and in some ways it was. Once again the external graphics communities managed to get it to work through the use of enclosures meant to connect other non-GPU PCIe devices such as RAID and video capture cards to systems. However, software support was still a limiting factor. You were required to use an external monitor to display your video, and it still felt like you were just riding the line between usability and a total hack. It felt like we were never going to get true universal support for external GPUs on notebooks.
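
For a sense of scale, here is a quick back-of-the-envelope comparison of the nominal PCIe link rates involved. This is a rough sketch using published per-lane rates, ignoring protocol overhead and the fact that Thunderbolt 3 is commonly reported to cap PCIe traffic well below its 40 Gbps link rate; none of these numbers are measurements from the Node.

```c
#include <stdio.h>

int main(void) {
    /* Nominal per-lane PCIe throughput after line encoding:
       PCIe 2.0: 5 GT/s, 8b/10b    -> ~0.5 GB/s per lane
       PCIe 3.0: 8 GT/s, 128b/130b -> ~0.985 GB/s per lane */
    const double pcie2_lane = 0.5;    /* GB/s */
    const double pcie3_lane = 0.985;  /* GB/s */

    printf("Original Thunderbolt (PCIe 2.0 x4): ~%.1f GB/s\n", 4 * pcie2_lane);
    printf("Thunderbolt 3 (PCIe 3.0 x4):        ~%.1f GB/s\n", 4 * pcie3_lane);
    printf("Desktop PCIe 3.0 x16 slot:          ~%.1f GB/s\n", 16 * pcie3_lane);

    /* In practice, Thunderbolt 3 is commonly reported to reserve only about
       22 Gbps (~2.75 GB/s) of its link for PCIe traffic, so an eGPU sees a
       fraction of the bandwidth a desktop slot provides. */
    return 0;
}
```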

Then, seemingly out of nowhere, Intel decided to promote native support for external GPUs as a priority when they introduced Thunderbolt 3. Fast forward, and we've already seen much wider adoption of Thunderbolt 3 on PC notebooks than we ever did with the previous Thunderbolt implementations. Taking all of this into account, we figured it was time to finally dip our toes into the eGPU market.

For our testing, we decided on the AKiTio Node for several reasons. First, at around $300, it's by far the lowest cost enclosure built to support GPUs. Additionally, it seems to be one of the most compatible devices currently on the market according to the very helpful comparison chart over at eGPU.io. The eGPU site is a wonderful resource for everything external GPU, over any interface possible, and I would highly recommend heading over there to do some reading if you are interested in trying out an eGPU for yourself.

The Node unit itself is a very utilitarian design. Essentially you get a folded sheet metal box with a Thunderbolt controller and 400W SFX power supply inside.

DSC03490.JPG

In order to install a GPU into the Node, you must first unscrew the enclosure from the back and slide the outer shell off of the device.

DSC03495.JPG

Once inside, we can see that there is ample room for any graphics card you might want to install in this enclosure. In fact, it seems a little too large for any of the GPUs we installed, including GTX 1080 Ti models. Here, you can see a more reasonable RX 570 installed.

Beyond opening up the enclosure to install a GPU, there is very little configuration required. My unit required a firmware update, but that was easily applied with the tools from the AKiTio site.

From here, I simply connected the Node to a ThinkPad X1, installed the NVIDIA drivers for our GTX 1080 Ti, and everything seemed to work — including using the 1080 Ti with the integrated notebook display and no external monitor!

Now that we've got the Node working, let's take a look at some performance numbers.

Continue reading our look at external graphics with the Thunderbolt 3 AKiTiO Node!

Manufacturer: AMD

Two Vegas...ha ha ha

When the preorders for the Radeon Vega Frontier Edition went up last week, I made the decision to place orders in a few different locations to make sure we got it in as early as possible. Well, as it turned out, we actually had the cards show up very quickly…from two different locations.

dualvega.jpg

So, what is a person to do if TWO of the newest, most coveted GPUs show up on their doorstep? After you do the first, full review of the single GPU iteration, you plug those both into your system and do some multi-GPU CrossFire testing!

There of course needs to be some discussion up front about this testing and our write up. If you read my first review of the Vega Frontier Edition you will clearly note my stance on the idea that “this is not a gaming card” and that “the drivers aren’t ready.” Essentially, I said these potential excuses for performance were a distraction and unwarranted based on the current state of Vega development and the proximity of the consumer iteration, Radeon RX.

IMG_4688.JPG

But for multi-GPU, it’s a different story. Both competitors in the GPU space will tell you that developing drivers for CrossFire and SLI is incredibly difficult. Much more than simply splitting the work across different processors, multi-GPU requires extra attention to specific games, game engines, and effects rendering that are not required in single GPU environments. Add to that the fact that the market size for CrossFire and SLI has been shrinking, from an already small state, and you can see why multi-GPU is going to get less attention from AMD here.

Even more, when CrossFire and SLI support gets a focus from the driver teams, it is often late in the process, nearly last in the list of technologies to address before launch.

With that in mind, we all should understand that the results we are going to show you might be indicative of the CrossFire scaling when Radeon RX Vega launches, but they very well might not be. I would look at the data we are presenting today as a “current state” of CrossFire for Vega.

Continue reading our look at a pair of Vega Frontier Edition cards in CrossFire!

Manufacturer: NVIDIA

Performance not two-die four.

When designing an integrated circuit, you are attempting to fit as much complexity as possible within your budget of space, power, and so forth. One harsh limitation for GPUs is that, while your workloads could theoretically benefit from more and more processing units, the number of usable chips from a batch shrinks as designs grow, and the reticle limit of a fab’s manufacturing node is basically a brick wall.

What’s one way around it? Split your design across multiple dies!

nvidia-2017-multidie.png

NVIDIA published a research paper discussing just that. In their diagram, they show two examples: in the first, the GPU is a single, typical die surrounded by four stacks of HBM, like GP100; in the second, the GPU is broken into five dies, four GPU modules and an I/O controller, with each GPU module attached to a pair of HBM stacks.

NVIDIA ran simulations to determine how this chip would perform, and, in various workloads, they found that it outperformed the largest possible single-chip GPU by about 45.5%. They scaled up the single-chip design until it had the same number of compute units as the multi-die design, even though this wouldn’t work in the real world because no fab could actually lithograph it. Regardless, that hypothetical, impossible design was only ~10% faster than the actually-possible multi-chip one, showing that the overhead of splitting the design is only around that much, according to their simulation. The multi-die design was also faster than the multi-card equivalent by 26.8%.
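
To relate those three figures to one another, here is the arithmetic normalized to the largest buildable single die. The derived ratios are my own back-of-the-envelope reading of the paper's stated percentages, not numbers NVIDIA published directly.

```c
#include <stdio.h>

int main(void) {
    /* Normalize everything to the largest single die a fab could actually build. */
    const double buildable  = 1.0;
    const double multi_die  = buildable * 1.455;   /* +45.5% over the buildable chip   */
    const double monolithic = multi_die * 1.10;    /* hypothetical same-CU single chip */
    const double multi_card = multi_die / 1.268;   /* multi-die is 26.8% faster        */

    printf("Largest buildable single die: %.2fx\n", buildable);
    printf("Multi-card equivalent:        %.2fx\n", multi_card);   /* ~1.15x */
    printf("Multi-die package:            %.2fx\n", multi_die);    /* ~1.46x */
    printf("Impossible monolithic chip:   %.2fx\n", monolithic);   /* ~1.60x */
    return 0;
}
```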

While NVIDIA’s simulations, run on 48 different benchmarks, have accounted for this, I still can’t visualize how this would work in an automated way. I don’t know how the design would automatically account for fetching data that’s associated with other GPU modules, as this would probably be a huge stall. That said, they spent quite a bit of time discussing how much bandwidth is required within the package, and figures of 768 GB/s to 3TB/s were mentioned, so it’s possible that it’s just the same tricks as fetching from global memory. The paper touches on the topic several times, but I didn’t really see anything explicit about what they were doing.

amd-2017-epyc-breakdown.jpg

If you’ve been following the site over the last couple of months, you’ll note that this is basically the same as AMD is doing with Threadripper and EPYC. The main difference is that CPU cores are isolated, so sharing data between them is explicit. In fact, when that product was announced, I thought, “Huh, that would be cool for GPUs. I wonder if it’s possible, or if it would just end up being Crossfire / SLI.”

Apparently not? It should be possible?

I should note that I doubt this will be relevant for consumers. The GPU is the most expensive part of a graphics card. While the thought of four GP102-level chips working together sounds great for 4K (which is 4x1080p in resolution) gaming, quadrupling the expensive part sounds like a giant price-tag. That said, the market of GP100 (and the upcoming GV100) would pay five-plus digits for the absolute fastest compute device for deep-learning, scientific research, and so forth.

The only way I could see this working for gamers is if NVIDIA finds the sweet-spot for performance-to-yield (for a given node and time) and they scale their product stack with multiples of that. In that case, it might be cost-advantageous to hit some level of performance, versus trying to do it with a single, giant chip.

This is just my speculation, however. It’ll be interesting to see where this goes, whenever it does.

Manufacturer: Galax

GTX 1060 keeps on kicking

Despite the market for graphics cards being disrupted by the cryptocurrency mining craze, board partners like Galax continue to build high quality options for gamers...if they can get their hands on them. We recently received a new Galax GTX 1060 EXOC White 6GB card that offers impressive performance and features as well as a visual style to help it stand out from the crowd.

We have worked with GeForce GTX 1060 graphics cards quite a bit at PC Perspective, so there is no need to dive into the history of the GPU itself. If you need a refresher on this GP106 GPU, and where it stands in the pantheon of the current GPU market, check out my launch review of the GTX 1060 from last year. The release of AMD’s Radeon RX 580 did change things a bit in the market landscape, so that review might be worth looking at too.

Our quick review of the Galax GTX 1060 EXOC White will look at performance (briefly), overclocking, and cost. But first, let’s take a look at this thing.

The Galax GTX 1060 EXOC White

As the name implies, the EXOC White card from Galax is both overclocked and uses a white fan shroud to add a little flair to the design. The PCB is a standard black color, but with the fan and back plate both a bright white, the card will be a point of interest for nearly any PC build. Pairing this with a white-accented motherboard, like the recent ASUS Prime series, would be an excellent visual combination.

IMG_4674.JPG

The fans on the EXOC White have clear-ish white blades that are illuminated by the white LEDs that shine through the fan openings on the shroud.

IMG_4675.JPG

IMG_4676.JPG

The cooler that Galax has implemented is substantial, with three heatpipes used to distribute the load from the GPU across the fins. There is a 6-pin power connector (standard for the GTX 1060) but that doesn’t appear to hold back the overclocking capability of the GPU.

IMG_4677.JPG

There is a lot of detail on the heatsink shroud – and either you like it or you don’t.

IMG_4678.JPG

Galax has included a white backplate that doubles as artistic style and heatsink. I do think that with most users’ cases showcasing the rear of the graphics card more than the front, a good quality back plate is a big selling point.

IMG_4680.JPG

The output connectivity includes a pair of DVI ports, a full-size HDMI and a full-size DisplayPort; more than enough for nearly any buyer of this class of GPU.

Continue reading about the Galax GTX 1060 EXOC White 6GB!

Manufacturer: AMD

An interesting night of testing

Last night I did our first ever live benchmarking session using the just-arrived Radeon Vega Frontier Edition air-cooled graphics card. Purchasing the card directly from a reseller, rather than being sampled by AMD, gave us the opportunity to test a new flagship product without an NDA in place to keep us silenced, so I thought it would be fun to let the audience and community go along for the ride of a traditional benchmarking session. Though I didn’t get all of what I wanted done in that 4.5-hour window, it was great to see the interest and excitement for the product and the results that we were able to generate.

But to the point of the day – our review of the Radeon Vega Frontier Edition graphics card. Based on the latest flagship GPU architecture from AMD, the Radeon Vega FE card has a lot riding on its shoulders, despite not being aimed at gamers. It is the FIRST card to be released with Vega at its heart. It is the FIRST instance of HBM2 being utilized in a consumer graphics card. It is the FIRST in a new attempt from AMD to target the group of users between gamers and professional users (like NVIDIA has addressed with Titan previously). And, it is the FIRST to command as much attention and expectation for the future of a company, a product line, and a fan base.

IMG_4621.JPG

Other than the architectural details that AMD gave us previously, we honestly haven’t been briefed on the performance expectations or the advancements in Vega that we should know about. The Vega FE products were released to the market with very little background, only well-spun turns of phrase emphasizing the value of the high performance and compatibility for creators. There has been no typical “tech day” for the media to learn fully about Vega and there were no samples from AMD to media or analysts (that I know of). Unperturbed by that, I purchased one (several actually, seeing which would show up first) and decided to do our testing.

On the following pages, you will see a collection of tests and benchmarks that range from 3DMark to The Witcher 3 to SPECviewperf to LuxMark, attempting to give as wide a viewpoint of the Vega FE product as I can in a rather short time window. The card is sexy (maybe the best looking I have yet seen), but will disappoint many on the gaming front. For professional users that are okay not having certified drivers, performance there is more likely to raise some impressed eyebrows.

Radeon Vega Frontier Edition Specifications

Through leaks and purposeful information dumps over the past couple of months, we already knew a lot about the Radeon Vega Frontier Edition card prior to the official sale date this week. But now with final specifications in hand, we can start to dissect what this card actually is.

  Vega Frontier Edition Titan Xp GTX 1080 Ti Titan X (Pascal) GTX 1080 TITAN X GTX 980 R9 Fury X R9 Fury
GPU Vega GP102 GP102 GP102 GP104 GM200 GM204 Fiji XT Fiji Pro
GPU Cores 4096 3840 3584 3584 2560 3072 2048 4096 3584
Base Clock 1382 MHz 1480 MHz 1480 MHz 1417 MHz 1607 MHz 1000 MHz 1126 MHz 1050 MHz 1000 MHz
Boost Clock 1600 MHz 1582 MHz 1582 MHz 1480 MHz 1733 MHz 1089 MHz 1216 MHz - -
Texture Units ? 224 224 224 160 192 128 256 224
ROP Units 64 96 88 96 64 96 64 64 64
Memory 16GB 12GB 11GB 12GB 8GB 12GB 4GB 4GB 4GB
Memory Clock 1890 MHz 11400 MHz 11000 MHz 10000 MHz 10000 MHz 7000 MHz 7000 MHz 1000 MHz 1000 MHz
Memory Interface 2048-bit HBM2 384-bit G5X 352-bit 384-bit G5X 256-bit G5X 384-bit 256-bit 4096-bit (HBM) 4096-bit (HBM)
Memory Bandwidth 483 GB/s 547.7 GB/s 484 GB/s 480 GB/s 320 GB/s 336 GB/s 224 GB/s 512 GB/s 512 GB/s
TDP 300 watts 250 watts 250 watts 250 watts 180 watts 250 watts 165 watts 275 watts 275 watts
Peak Compute 13.1 TFLOPS 12.0 TFLOPS 10.6 TFLOPS 10.1 TFLOPS 8.2 TFLOPS 6.14 TFLOPS 4.61 TFLOPS 8.60 TFLOPS 7.20 TFLOPS
Transistor Count ? 12.0B 12.0B 12.0B 7.2B 8.0B 5.2B 8.9B 8.9B
Process Tech 14nm 16nm 16nm 16nm 16nm 28nm 28nm 28nm 28nm
MSRP (current) $999 $1200 $699 $1,200 $599 $999 $499 $649 $549

The Vega FE shares enough of a specification listing with the Fury X that it deserves special recognition. Both cards sport 4096 stream processors, 64 ROPs and 256 texture units. The Vega FE is running at much higher clock speeds (35-40% higher) and also upgrades to the next generation of high-bandwidth memory and quadruples capacity. Still, there will be plenty of comparisons between the two products, looking to measure IPC changes from the CUs (compute units) from Fiji to the NCUs built for Vega.
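
As a sanity check on that comparison, peak single-precision throughput is simply shader count times two FLOPs per clock times clock rate, so with identical shader counts the entire gap in rated compute between the two cards comes from clocks. A quick worked example using the table's figures:

```c
#include <stdio.h>

/* Peak FP32 throughput: shaders * 2 FLOPs per clock * clock rate. */
static double peak_tflops(int shaders, double clock_mhz) {
    return shaders * 2.0 * clock_mhz / 1e6;
}

int main(void) {
    printf("R9 Fury X (4096 @ 1050 MHz):         %.1f TFLOPS\n", peak_tflops(4096, 1050));
    printf("Vega FE   (4096 @ 1382 MHz typical): %.1f TFLOPS\n", peak_tflops(4096, 1382));
    printf("Vega FE   (4096 @ 1600 MHz peak):    %.1f TFLOPS\n", peak_tflops(4096, 1600));

    /* With identical shader counts, the rated compute gap is purely clocks:
       1600 / 1050 is roughly 1.52x. The ~1440 MHz the card actually sustained
       in our testing (see below) works out to ~37% over Fury X, in line with
       the 35-40% figure above; anything beyond clock scaling in real
       benchmarks would point to IPC changes in the NCUs. */
    return 0;
}
```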

DSC03536 copy.jpg

The Radeon Vega GPU

The clock speeds also see another shift this time around with the adoption of “typical” clock speeds. This is something that NVIDIA has been using for a few generations with the introduction of GPU Boost, and it tells the consumer how high they should expect clocks to go in a nominal workload. Normally I would say a gaming workload, but since this card is supposedly for professional users and the like, I assume this applies across the board. So even though the GPU is rated at a “peak” clock rate of 1600 MHz, the “typical” clock rate is 1382 MHz. (As an early aside, I did NOT see 1600 MHz at any point in my testing time with our Vega FE, but it did settle in at a ~1440 MHz clock most of the time.)

Continue reading our review of the AMD Radeon Vega Frontier Edition!

Manufacturer: PC Perspective

Why?

Astute readers of the site might remember the original story we did on Bitcoin mining in 2011, the good ole' days where the concept of the blockchain was new and exciting and mining Bitcoin on a GPU was still plenty viable.

gpu-bitcoin.jpg

However, that didn't last long, as the race for cash led people to develop Application Specific Integrated Circuits (ASICs) dedicated solely to mining Bitcoin quickly while sipping power. Use of the expensive ASICs drove the difficulty of mining Bitcoin through the roof and killed any sort of chance of profitability for mere mortals mining cryptocurrency.

Cryptomining saw a resurgence in late 2013 with the popular adoption of alternate cryptocurrencies, specifically Litecoin, which was based on the Scrypt algorithm instead of SHA-256 like Bitcoin. This meant that the ASICs developed for mining Bitcoin were useless. This is also the period of time that many of you may remember as the "Dogecoin" era, my personal favorite cryptocurrency of all time.

dogecoin-300.png

Defenders of these new "altcoins" claimed that Scrypt was different enough that ASICs would never be developed for it, and GPU mining would remain viable for a larger portion of users. As it turns out, the promise of money always wins out, and we soon saw Scrypt ASICs. Once again, the market for GPU mining crashed.

That brings us to today, and what I am calling "Third-wave Cryptomining." 

While the mass populace stopped caring about cryptocurrency as a whole, the dedicated group that was left continued to develop altcoins. These different currencies are based on various algorithms and other proofs of work (see technologies like Storj, which uses the blockchain for a decentralized Dropbox-like service!).

As you may have predicted, for various reasons that might be difficult to historically quantify, there is another very popular cryptocurrency from this wave of development, Ethereum.

ETHEREUM-LOGO_LANDSCAPE_Black.png

Ethereum is based on the Dagger-Hashimoto algorithm and has a whole host of quirks that make it different from other cryptocurrencies. We aren't here to get deep into the weeds on the methods behind different blockchain implementations, but if you have some time, check out the Ethereum White Paper. It's all very fascinating.

Continue reading our look at this third wave of cryptocurrency!

Manufacturer: AMD

We are up to two...

UPDATE (5/31/2017): Crystal Dynamics was able to get back to us with a couple of points on the changes that were made with this patch to affect the performance of AMD Ryzen processors.

  1. Rise of the Tomb Raider splits rendering tasks to run on different threads. By tuning the size of those tasks – breaking some up, allowing multicore CPUs to contribute in more cases, and combining some others, to reduce overheads in the scheduler – the game can more efficiently exploit extra threads on the host CPU. (The sketch after this list illustrates the granularity tradeoff.)
     
  2. An optimization was identified in texture management that improves the combination of AMD CPU and NVIDIA GPU.  Overhead was reduced by packing texture descriptor uploads into larger chunks.
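
Point 1 above is really about task granularity: slice a frame's CPU-side render work too coarsely and cores sit idle, slice it too finely and per-task scheduling overhead eats the gains. The toy model below uses entirely made-up numbers (it is not anything from Crystal Dynamics' engine) just to show why a sweet spot exists in the middle.

```c
#include <stdio.h>

int main(void) {
    const double total_work_ms = 100.0;  /* CPU render work for one frame (made up) */
    const double overhead_ms   = 0.05;   /* scheduler cost per task (made up)       */
    const int    threads       = 16;

    /* Each thread processes ceil(tasks / threads) tasks; wall time is that many
       tasks' worth of work plus the per-task scheduling overhead.
       Too few tasks leave cores idle (4 tasks on 16 threads); too many drown
       in overhead. */
    const int task_counts[] = { 4, 16, 64, 256, 1024, 4096 };
    for (int i = 0; i < 6; ++i) {
        int tasks = task_counts[i];
        int per_thread = (tasks + threads - 1) / threads;
        double wall = per_thread * (total_work_ms / tasks + overhead_ms);
        printf("%5d tasks -> %.2f ms of CPU wall time\n", tasks, wall);
    }
    return 0;
}
```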

There you have it, a bit more detail on the software changes made to help adapt the game engine to AMD's Ryzen architecture. Not only that, but it does confirm our information that there was slightly MORE to address in the Ryzen+GeForce combinations.

END UPDATE

Despite a couple of growing pains out of the gate, the Ryzen processor launch appears to have been a success for AMD. Both the Ryzen 7 and the Ryzen 5 releases proved to be very competitive with Intel’s dominant CPUs in the market and took significant leads in areas of massive multi-threading and performance per dollar. An area that AMD has struggled in though has been 1080p gaming – performance in those instances on both Ryzen 7 and 5 processors fell behind comparable Intel parts by (sometimes) significant margins.

Our team continues to watch the story to see how AMD and game developers work through the issue. Most recently I posted a look at the memory latency differences between Ryzen and Intel Core processors. As it turns out, the memory latency differences are a significant part of the initial problem for AMD:

Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.

In that story I detailed our coverage of the Ryzen processor and its gaming performance succinctly:

Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our summary there was that the scheduler was working as expected and that minimal difference was seen when moving between different power modes. We also talked directly with AMD to find out its then current stance on the results, backing up our claims on the scheduler and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update occur courtesy of Ashes of the Singularity: Escalation where an engine update to utilize more threads resulted in as much as 31% average frame increase.

Quick on the heels of the Ryzen 7 release, AMD worked with the developer Oxide on the Ashes of the Singularity: Escalation engine. Through tweaks and optimizations, the game was able to showcase as much as a 30% increase in average frame rate on the integrated benchmark. While this was only a single use case, it does prove that through work with the developers, AMD has the ability to improve the 1080p gaming positioning of Ryzen against Intel.

rotr-screen4-small.jpg

Fast forward to today and I was surprised to find a new patch for Rise of the Tomb Raider, a game that was actually one of the worst case scenarios for AMD with Ryzen. (Patch #12, v1.0.770.1) The patch notes mention the following:

The following changes are included in this patch

- Fix certain DX12 crashes reported by users on the forums.

- Improve DX12 performance across a variety of hardware, in CPU bound situations. Especially performance on AMD Ryzen CPUs can be significantly improved.

While we expect this patch to be an improvement for everyone, if you do have trouble with this patch and prefer to stay on the old version we made a Beta available on Steam, build 767.2, which can be used to switch back to the previous version.

We will keep monitoring for feedback and will release further patches as it seems required. We always welcome your feedback!

Obviously the data point that stood out for me was the improved DX12 performance “in CPU bound situations. Especially on AMD Ryzen CPUs…”

Remember how the situation appeared in April?

rotr.png

The Ryzen 7 1800X was 24% slower than the Intel Core i7-7700K – a dramatic difference for a processor that should only have been ~8-10% slower in single threaded workloads.

How does this new patch to RoTR affect performance? We tested it on the same Ryzen 7 1800X benchmark platform from previous testing, including the ASUS Crosshair VI Hero motherboard, 16GB of DDR4-2400 memory, and a GeForce GTX 1080 Founders Edition using the 378.78 driver. All testing was done under the DX12 code path.

tr-1.png

tr-2.png

The Ryzen 7 1800X score jumps from 107 FPS to 126.44 FPS, an increase of 17%! That is a significant boost in performance at 1080p while still running at the Very High image quality preset, indicating that the developer (and likely AMD) were able to find substantial inefficiencies in the engine. For comparison, the 8-core / 16-thread Intel Core i7-6900K only sees a 2.4% increase from this new game revision. This tells us that the changes to the game were specific to Ryzen processors and their design, but that no performance was taken away from the Intel platforms.

Continue reading our look at the new Rise of the Tomb Raider patch for Ryzen!

Manufacturer: The Khronos Group

The Right People to Interview

Last week, we reported that OpenCL’s roadmap would be merging into Vulkan, and OpenCL would, starting at some unspecified time in the future, be based “on an extended version of the Vulkan API”. This was based on quotes from several emails between myself and the Khronos Group.

Since that post, I had the opportunity to have a phone interview with Neil Trevett, president of the Khronos Group and chairman of the OpenCL working group, and Tom Olson, chairman of the Vulkan working group. We spent a little over a half hour going over Neil’s International Workshop on OpenCL (IWOCL) presentation, discussing the decision, and answering a few lingering questions. This post will present the results of that conference call in a clean, readable way.

khronos-officiallogo-Vulkan_500px_Dec16.png

First and foremost, while OpenCL is planning to merge into the Vulkan API, the Khronos Group wants to make it clear that “all of the merging” is coming from the OpenCL working group. The Vulkan API roadmap is not affected by this decision. Of course, the Vulkan working group will be able to take advantage of technologies that are dropping into their lap, but those discussions have not even begun yet.

Neil: Vulkan has its mission and its roadmap, and it’s going ahead on that. OpenCL is doing all of the merging. We’re kind-of coming in to head in the Vulkan direction.

Does that mean, in the future, that there’s a bigger wealth of opportunity to figure out how we can take advantage of all this kind of mutual work? The answer is yes, but we haven’t started those discussions yet. I’m actually excited to have those discussions, and are many people, but that’s a clarity. We haven’t started yet on how Vulkan, itself, is changed (if at all) by this. So that’s kind-of the clarity that I think is important for everyone out there trying to understand what’s going on.

Tom also prepared an opening statement. It’s not as easy to abbreviate, so it’s here unabridged.

Tom: I think that’s fair. From the Vulkan point of view, the way the working group thinks about this is that Vulkan is an abstract machine, or at least there’s an abstract machine underlying it. We have a programming language for it, called SPIR-V, and we have an interface controlling it, called the API. And that machine, in its full glory… it’s a GPU, basically, and it’s got lots of graphics functionality. But you don’t have to use that. And the API and the programming language are very general. And you can build lots of things with them. So it’s great, from our point of view, that the OpenCL group, with their special expertise, can use that and leverage that. That’s terrific, and we’re fully behind it, and we’ll help them all we can. We do have our own constituency to serve, which is the high-performance game developer first and foremost, and we are going to continue to serve them as our main mission.

So we’re not changing our roadmap so much as trying to make sure we’re a good platform for other functionality to be built on.

Neil then went on to mention that the decision to merge OpenCL’s roadmap into the Vulkan API took place only a couple of weeks ago. The purpose of the press release was to reach OpenCL developers and get their feedback. According to him, they did a show of hands at the conference, with a room full of a hundred OpenCL developers, and no-one was against moving to the Vulkan API. This gives them confidence that developers will accept the decision, and that their needs will be served by it.

Next up is the why. Read on for more.

Manufacturer: AMD

Is it time to buy that new GPU?

Testing commissioned by AMD. This means that AMD paid us for our time, but had no say in the results or presentation of them.

Earlier this week Bethesda and Arkane Studios released Prey, a first-person shooter that is a re-imagining of the 2006 game of the same name. Fans of System Shock will find a lot to love about this new title and I have found myself enamored with the game…in the name of science of course.

Prey-2017-05-06-15-52-04-16.jpg

While I was doing my due diligence and performing some preliminary testing to see if we would utilize Prey for graphics testing going forward, AMD approached me to discuss this exact title. With the release of the Radeon RX 580 in April, one of the key storylines is that the card offers a reasonably priced upgrade path for users of 2+ year old hardware. With that upgrade you should see some substantial performance improvements, and as I will show you here, the new Prey is a perfect example of that.

Targeting the Radeon R9 380, a graphics card that was originally released back in May of 2015, the RX 580 offers substantially better performance at a very similar launch price. The same is true for the GeForce GTX 960: launched in January of 2015, it is slightly longer in the tooth. AMD’s data shows that 80% of the users on Steam are running R9 380X or slower graphics cards and that only 10% of them upgraded in 2016. Considering the great GPUs that were available then (including the RX 480 and the GTX 10-series), it seems more and more likely that we are going to hit an upgrade inflection point in the market.

slides-5.jpg

A simple experiment was set up: does the new Radeon RX 580 offer a worthwhile upgrade path for the many users of R9 380 or GTX 960 class graphics cards (or older)?

  Radeon RX 580 Radeon R9 380 GeForce GTX 960
GPU Polaris 20 Tonga Pro GM206
GPU Cores 2304 1792 1024
Rated Clock 1340 MHz 918 MHz 1127 MHz
Memory 4GB/8GB 4GB 2GB/4GB
Memory Interface 256-bit 256-bit 128-bit
TDP 185 watts 190 watts 120 watts
MSRP (at launch) $199 (4GB) / $239 (8GB) $219 $199

Continue reading our look at the Radeon RX 580 in Prey!

Manufacturer: EVGA

Specifications and Design

When the GeForce GTX 1080 Ti launched last month it became the fastest consumer graphics card on the market, taking over a spot that NVIDIA had already laid claim to since the launch of the GTX 1080, and arguably before that with the GTX 980 Ti. Passing on the notion that the newly released Titan Xp is a graphics card gamers should actually consider for their cash, the 1080 Ti continues to stand alone at the top. That is, until NVIDIA comes up with another new architecture or AMD surprises us all with the release of the Vega chip this summer.

NVIDIA board partners have the flexibility to build custom hardware around the GTX 1080 Ti design, and the EVGA GeForce GTX 1080 Ti SC2 sporting iCX Technology is one of those new models. Today’s story is going to give you my thoughts and impressions on this card in a review – one with fewer benchmarks than you are used to seeing, but one that covers all the primary differentiation points to consider over the reference/Founders Edition options.

Specifications and Design

The EVGA GTX 1080 Ti SC2 with iCX Technology takes the same GPU and memory technology shown off with the GTX 1080 Ti launch and gussies it up with higher clocks, a custom PCB with thermal sensors in 9 different locations, LEDs for externally monitoring the health of your card and a skeleton-like cooler design that is both effective and aggressive.

  EVGA 1080 Ti SC2 GTX 1080 Ti Titan X (Pascal) GTX 1080 GTX 980 Ti TITAN X GTX 980 R9 Fury X R9 Fury
GPU GP102 GP102 GP102 GP104 GM200 GM200 GM204 Fiji XT Fiji Pro
GPU Cores 3584 3584 3584 2560 2816 3072 2048 4096 3584
Base Clock 1557 MHz 1480 MHz 1417 MHz 1607 MHz 1000 MHz 1000 MHz 1126 MHz 1050 MHz 1000 MHz
Boost Clock 1671 MHz 1582 MHz 1480 MHz 1733 MHz 1076 MHz 1089 MHz 1216 MHz - -
Texture Units 224 224 224 160 176 192 128 256 224
ROP Units 88 88 96 64 96 96 64 64 64
Memory 11GB 11GB 12GB 8GB 6GB 12GB 4GB 4GB 4GB
Memory Clock 11000 MHz 11000 MHz 10000 MHz 10000 MHz 7000 MHz 7000 MHz 7000 MHz 500 MHz 500 MHz
Memory Interface 352-bit 352-bit 384-bit G5X 256-bit G5X 384-bit 384-bit 256-bit 4096-bit (HBM) 4096-bit (HBM)
Memory Bandwidth 484 GB/s 484 GB/s 480 GB/s 320 GB/s 336 GB/s 336 GB/s 224 GB/s 512 GB/s 512 GB/s
TDP 250 watts 250 watts 250 watts 180 watts 250 watts 250 watts 165 watts 275 watts 275 watts
Peak Compute 11.1 TFLOPS 10.6 TFLOPS 10.1 TFLOPS 8.2 TFLOPS 5.63 TFLOPS 6.14 TFLOPS 4.61 TFLOPS 8.60 TFLOPS 7.20 TFLOPS
Transistor Count 12.0B 12.0B 12.0B 7.2B 8.0B 8.0B 5.2B 8.9B 8.9B
Process Tech 16nm 16nm 16nm 16nm 28nm 28nm 28nm 28nm 28nm
MSRP (current) $719 $699 $1,200 $599 $649 $999 $499 $649 $549

Out of the box EVGA has overclocked the GTX 1080 Ti SC2 above reference specs. With a base clock of 1557 MHz and a GPU Boost clock of 1671 MHz, it has a 77 MHz jump on base and an 89 MHz jump on boost. Though moderate by some overclockers’ standards, that’s a healthy increase of 5.3% on the typical boost clock rate. The memory speed remains the same at 11.0 Gbps on 11GB, unchanged from the Founders Edition.

IMG_7629.jpg

I’m not going to walk through the other specifications of the GeForce GTX 1080 Ti GPU in general – I assume if you are looking at this story you are already well aware of its features and capabilities. If you need a refresher on this oddly-designed 352-bit memory bus behemoth, just read over the first page of my GeForce GTX 1080 Ti launch review.

DSC02824.JPG

Continue reading our review of the EVGA GeForce GTX 1080 Ti SC2!!

Manufacturer: AMD

What is old is new again

Trust me on this one – AMD is aware that launching the RX 500-series of graphics cards, including the RX 580 we are reviewing today, is an uphill battle. Besides battling the sounds on the hills that whisper “reeebbrraannndd”, AMD needs to work with its own board partners to offer up total solutions that compete well with NVIDIA’s stronghold on the majority of the market. Just putting out the Radeon RX 580 and RX 570 cards with the same coolers and specs as the RX 400-series would be a recipe for ridicule. AMD is aware of this and is being surprisingly proactive in the story it is telling consumers and the media.

  • If you already own a Radeon RX 400-series card, the RX 500-series is not expected to be an upgrade path for you.
     
  • The Radeon RX 500-series is NOT based on Vega. Polaris here everyone.
     
  • Target users are those with Radeon R9 380 class cards and older – Polaris is still meant as an upgrade for that very large user base.

The story that is being told is compelling, more so than you might expect. With more than 500 million gamers using graphics cards that are two or more years old, based on Steam survey data, there is a HUGE audience that would benefit from an RX 580 graphics card upgrade. Older cards may lack support for FreeSync, HDR, higher refresh rate HDMI output, and hardware encode/decode support for 4K resolution content. And while the GeForce GTX 1060 family would also meet those criteria, AMD wants to make the case that the Radeon family is the way to go.

slides-10.jpg

The Radeon RX 500-series is based on the same Polaris architecture as the RX 400-series, though AMD would tell us that the technology has been refined since initial launch. More time with the 14nm FinFET process technology has given the fab facility, and AMD, some opportunities to refine. This gives the new GPUs the ability to scale to higher clocks than they could before (though not without the cost of additional power draw). AMD has tweaked multi-monitor efficiency modes, allowing idle power consumption to drop a handful of watts thanks to a tweaked pixel clock.

Maybe the most substantial change with this RX 580 release is the loosening of power consumption constraints for the board partners. The Radeon RX 480 launch was marred by issues surrounding the amount of power AMD claimed the boards would use compared to how much they DID use. This time around, all RX 580 graphics cards will ship with AT LEAST an 8-pin power connector, opening overclocked models up to as much as 225 watts. Some cards will have an 8+6-pin configuration to go even higher. Considering the RX 480 launched with a supposed 150 watt TDP (that it never lived up to), that’s quite an increase.
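
Those figures line up with the nominal PCI Express power budgets of 75 W from the x16 slot, 75 W from a 6-pin connector, and 150 W from an 8-pin connector; a quick sum shows where the 225 watt ceiling comes from and how far an 8+6-pin board can nominally go.

```c
#include <stdio.h>

int main(void) {
    /* Nominal PCI Express power budgets */
    const int slot_w      = 75;   /* x16 slot        */
    const int six_pin_w   = 75;   /* 6-pin auxiliary */
    const int eight_pin_w = 150;  /* 8-pin auxiliary */

    printf("Slot + 8-pin:         %d W\n", slot_w + eight_pin_w);              /* 225 W */
    printf("Slot + 8-pin + 6-pin: %d W\n", slot_w + eight_pin_w + six_pin_w);  /* 300 W */
    return 0;
}
```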

slides-13.jpg

AMD is hoping to convince gamers that Radeon Chill is a good solution to help in some specific instances of excessive power draw. Recent drivers have added support for games like League of Legends and DOTA 2, adding to The Witcher 3, Deus Ex: Mankind Divided and more. I will freely admit that while the technology behind Chill sounds impressive, I don’t have enough experience with it yet to claim or counterclaim its supposed advantages…or whether it delivers them without sacrificing the user experience.

Continue reading our review of the Radeon RX 580 graphics card!

Manufacturer: NVIDIA

Overview

Since the launch of NVIDIA's Pascal architecture with the GTX 1070 and 1080 last May, we've taken a look at a lot of Pascal-based products, including the recent launch of the GTX 1080 Ti. By now, it is clear that Pascal has proven itself in a gaming context.

One frequent request we get about GPU coverage is to look at professional use cases for these sorts of devices. While gaming is still far and away the most common use for GPUs, fields like high-quality rendering for industries such as architecture, and newer ones like deep learning, can see vast benefits from acceleration by GPUs.

Today, we are taking a look at some of the latest NVIDIA Quadro GPUs on the market, the Quadro P2000, P4000, and P5000. 

DSC02752.JPG

Diving deep into the technical specs of these Pascal-based Quadro products and the AMD competitor we will be testing,  we find a wide range of compute capability, power consumption, and price.

  Quadro P2000 Quadro P4000 Quadro P5000 Radeon Pro Duo
Process 16nm 16nm 16nm 28nm
Code Name GP106 GP104 GP104 Fiji XT x 2
Shaders 1024 1792 2560 8192
Rated Clock Speed 1470 MHz (Boost) 1480 MHz (Boost) 1730 MHz (Boost) up to 1000 MHz
Memory Width 160-bit 256-bit 256-bit 4096-bit (HBM) x 2
Compute Perf (FP32) 3.0 TFLOPS 5.3 TFLOPS 8.9 TFLOPS 16.38 TFLOPS
Compute Perf (FP64) 1/32 FP32 1/32 FP32 1/32 FP32 1/16 FP32
Frame Buffer 5GB 8GB 16GB 8GB (4GB x 2)
TDP 75W 105W 180W 350W
Street Price $599 $900 $2000 $800

Astute readers will notice similarities to the NVIDIA GeForce line of products as they take a look at these specifications.

Continue reading our roundup of 3 Pascal Quadro graphics cards!

Manufacturer: Various

Background and setup

A couple of weeks back, during the excitement surrounding the announcement of the GeForce GTX 1080 Ti graphics card, NVIDIA announced an update to its performance reporting project known as FCAT to support VR gaming. The updated iteration, FCAT VR as it is now called, gives us the first true ability to not only capture the performance of VR games and experiences, but the tools with which to measure and compare.

Watch this video walkthrough of FCAT VR with me and NVIDIA's Tom Petersen

I already wrote an extensive preview of the tool and how it works during the announcement. I think it’s likely that many of you overlooked it with the noise from a new GPU, so I’m going to reproduce some of it here, with additions and updates. Everyone that attempts to understand the data we will be presenting in this story and all VR-based tests going forward should have a baseline understanding of the complexity of measuring VR games. Previous tools don’t tell the whole story, and even the part they do tell is often incomplete.

If you already know how FCAT VR works from reading the previous article, you can jump right to the beginning of our results here.

Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times utilized by countless reviewers and enthusiasts, but Fraps lacked the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we could communicate the smoothness of a gaming experience, better articulating it to help gamers make purchasing decisions.

vrpipe1.png

For VR though, those same tools just don’t cut it. Fraps is a non-starter as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different than the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable but the tools and capture technology needed to be updated. The hardware capture products we used since 2013 were limited in their maximum bandwidth and the overlay software did not have the ability to “latch in” to VR-based games. Not only that but measuring frame drops, time warps, space warps and reprojections would be a significant hurdle without further development. 
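
To give a concrete sense of what the analysis side of such a tool has to do, here is a heavily simplified sketch; this is my own illustration, not FCAT VR's actual algorithm or data format. Given frame delivery timestamps from a 90 Hz headset, it counts how many refresh intervals each frame occupied and flags everything beyond the first as a candidate dropped or reprojected frame.

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    const double refresh_ms = 1000.0 / 90.0;   /* 90 Hz headset: ~11.11 ms per refresh */

    /* Hypothetical frame delivery timestamps in ms (made-up data; a real tool
       would read these from the runtime's event stream). */
    const double t[] = { 0.0, 11.1, 22.2, 44.4, 55.5, 66.7, 89.0, 100.1 };
    const int n = sizeof(t) / sizeof(t[0]);

    int missed = 0;
    for (int i = 1; i < n; ++i) {
        double delta = t[i] - t[i - 1];
        /* Number of refresh intervals this frame persisted on screen. */
        int intervals = (int)lround(delta / refresh_ms);
        if (intervals > 1) {
            missed += intervals - 1;
            printf("Frame %d took %.1f ms (%d intervals): %d refresh(es) reused or warped\n",
                   i, delta, intervals, intervals - 1);
        }
    }
    printf("Total missed refreshes: %d of %d\n", missed, n - 1);
    return 0;
}
```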

vrpipe2.png

vrpipe3.png

NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping that it will prove its claims of performance benefits for VR gaming, the investment in time and money spent on a project that is to be open sourced and freely available to the media and the public should not be overlooked.

vlcsnap-2017-02-27-11h31m17s057.png

NVIDIA FCAT VR is composed of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It uses data from Oculus Event Tracing as a part of Windows ETW and SteamVR’s performance API, along with NVIDIA driver stats when used on NVIDIA hardware, to generate performance data. It does work perfectly well on any GPU vendor’s hardware, though, with access to the VR vendors’ specific timing results.

fcatvrcapture.jpg

Continue reading our first look at VR performance testing with FCAT VR!!

Manufacturer: NVIDIA

Flagship Performance Gets Cheaper

UPDATE! If you missed our launch day live stream, you can find the reply below:

It’s a very interesting time in the world of PC gaming hardware. We just saw the release of AMD’s Ryzen processor platform that shook up the processor market for the first time in a decade, AMD’s Vega architecture has been given the brand name “Vega”, and the anticipation for the first high-end competitive part from AMD since Hawaii grows as well. AMD was seemingly able to take advantage of Intel’s slow innovation pace on the processor and it was hoping to do the same to NVIDIA on the GPU. NVIDIA’s product line has been dominant in the mid and high-end gaming market since the 900-series with the 10-series products further cementing the lead.

box1.jpg

The most recent high end graphics card release came in the form of the updated Titan X based on the Pascal architecture. That was WAY back in August of 2016 – a full seven months ago! Since then we have seen very little change at the top end of the product lines and what little change we did see came from board vendors adding in technology and variation on the GTX 10-series.

Today we see the release of the new GeForce GTX 1080 Ti, a card that offers only a handful of noteworthy technological changes but is still able to shake up the market by instigating pricing adjustments that make its performance more appealing and lower the price of everything else.

The GTX 1080 Ti GP102 GPU

I already wrote about the specifications of the GPU in the GTX 1080 Ti when it was announced last week, so here’s a simple recap.

  GTX 1080 Ti Titan X (Pascal) GTX 1080 GTX 980 Ti TITAN X GTX 980 R9 Fury X R9 Fury R9 Nano
GPU GP102 GP102 GP104 GM200 GM200 GM204 Fiji XT Fiji Pro Fiji XT
GPU Cores 3584 3584 2560 2816 3072 2048 4096 3584 4096
Base Clock 1480 MHz 1417 MHz 1607 MHz 1000 MHz 1000 MHz 1126 MHz 1050 MHz 1000 MHz up to 1000 MHz
Boost Clock 1582 MHz 1480 MHz 1733 MHz 1076 MHz 1089 MHz 1216 MHz - - -
Texture Units 224 224 160 176 192 128 256 224 256
ROP Units 88 96 64 96 96 64 64 64 64
Memory 11GB 12GB 8GB 6GB 12GB 4GB 4GB 4GB 4GB
Memory Clock 11000 MHz 10000 MHz 10000 MHz 7000 MHz 7000 MHz 7000 MHz 500 MHz 500 MHz 500 MHz
Memory Interface 352-bit 384-bit G5X 256-bit G5X 384-bit 384-bit 256-bit 4096-bit (HBM) 4096-bit (HBM) 4096-bit (HBM)
Memory Bandwidth 484 GB/s 480 GB/s 320 GB/s 336 GB/s 336 GB/s 224 GB/s 512 GB/s 512 GB/s 512 GB/s
TDP 250 watts 250 watts 180 watts 250 watts 250 watts 165 watts 275 watts 275 watts 175 watts
Peak Compute 10.6 TFLOPS 10.1 TFLOPS 8.2 TFLOPS 5.63 TFLOPS 6.14 TFLOPS 4.61 TFLOPS 8.60 TFLOPS 7.20 TFLOPS 8.19 TFLOPS
Transistor Count 12.0B 12.0B 7.2B 8.0B 8.0B 5.2B 8.9B 8.9B 8.9B
Process Tech 16nm 16nm 16nm 28nm 28nm 28nm 28nm 28nm 28nm
MSRP (current) $699 $1,200 $599 $649 $999 $499 $649 $549 $499

The GTX 1080 Ti looks a whole lot like the TITAN X launched in August of last year. Based on the 12B transistor GP102 chip, the new GTX 1080 Ti has 3,584 CUDA cores with a 1.60 GHz Boost clock. That gives it the same processor count as the Titan X but with a slightly higher clock speed, which should make the new GTX 1080 Ti faster by at least a few percentage points; it has a 4.7% edge in base clock compute capability. It has 28 SMs, 28 geometry units, and 224 texture units.

GeForce_GTX_1080_Ti_Block_Diagram.png

Interestingly, the memory system on the GTX 1080 Ti gets adjusted – NVIDIA has disabled a single 32-bit memory controller to give the card a 352-bit wide bus and an odd-sounding 11GB memory capacity. The ROP count also drops to 88 units. Speaking of 11, the memory clock on the G5X implementation on the GTX 1080 Ti will now run at 11 Gbps, a boost available to NVIDIA thanks to a chip revision from Micron and improvements to equalization and reverse signal distortion.
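
The odd-sounding 352-bit and 11GB figures fall straight out of the arithmetic: GP102's full memory system is twelve 32-bit controllers, each with a 1GB G5X chip behind it, so disabling one leaves 352 bits and 11GB, and bandwidth is simply bus width times per-pin data rate. A quick check against the table's numbers:

```c
#include <stdio.h>

int main(void) {
    const int controllers   = 12;   /* GP102: twelve 32-bit memory controllers */
    const int disabled      = 1;
    const int bits_per_ctrl = 32;
    const int gb_per_ctrl   = 1;    /* one 8Gb G5X chip per controller */
    const double gbps       = 11.0; /* per-pin data rate */

    int bus_width    = (controllers - disabled) * bits_per_ctrl;   /* 352 bits */
    int capacity     = (controllers - disabled) * gb_per_ctrl;     /* 11 GB    */
    double bandwidth = bus_width / 8.0 * gbps;                     /* GB/s     */

    printf("Bus width: %d-bit, capacity: %d GB, bandwidth: %.0f GB/s\n",
           bus_width, capacity, bandwidth);
    return 0;
}
```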

The move from 12GB of memory on the GP102-based Titan X to 11GB on the GTX 1080 Ti is an interesting one, and it evokes memories of the GTX 970 fiasco, where NVIDIA disabled a portion of that memory controller but left the memory that would have resided on it ON the board. What resulted, memory that behaved as 3.5GB at one speed and 500 MB at another, was the wrong move to make. But releasing the GTX 970 with "3.5GB" of memory would have seemed odd too. NVIDIA is not making the same mistake, instead building the GTX 1080 Ti with 11GB out of the gate.

Continue reading our review of the NVIDIA GeForce GTX 1080 Ti 11GB graphics card!

Linked Multi-GPU Arrives... for Developers

The Khronos Group has released the Vulkan 1.0.42.0 specification, which includes experimental (more on that in a couple of paragraphs) support for VR enhancements, sharing resources between processes, and linking similar GPUs. This spec was released alongside a LunarG SDK and NVIDIA drivers that fully implement these extensions; both are intended for developers, not gamers.

I would expect that the most interesting feature is experimental support for linking similar GPUs together, similar to DirectX 12’s Explicit Linked Multiadapter, which Vulkan calls a “Device Group”. The idea is that the physical GPUs hidden behind this layer can do things like share resources, such as rendering a texture on one GPU and consuming it in another, without the host code being involved. I’m guessing that some studios, like maybe Oxide Games, will decide to not use this feature. While it’s not explicitly stated, I cannot see how this (or DirectX 12’s Explicit Linked mode) would be compatible in cross-vendor modes. Unless I’m mistaken, that would require AMD, NVIDIA, and/or Intel restructuring their drivers to inter-operate at this level. Still, the assumptions that could be made with grouped devices are apparently popular with enough developers for both the Khronos Group and Microsoft to bother.
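
To give a sense of what the device-group layer looks like from the application side, here is a minimal sketch of enumerating device groups. One hedge: I am using the core Vulkan 1.1 names these extensions were eventually promoted to (vkEnumeratePhysicalDeviceGroups and VkPhysicalDeviceGroupProperties); as I understand it, the 1.0.42-era extensions described here exposed equivalent entry points with a KHX suffix.

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

int main(void) {
    /* An instance targeting Vulkan 1.1, where device groups are core. */
    VkApplicationInfo app = { .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
                              .apiVersion = VK_API_VERSION_1_1 };
    VkInstanceCreateInfo ici = { .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
                                 .pApplicationInfo = &app };
    VkInstance instance;
    if (vkCreateInstance(&ici, NULL, &instance) != VK_SUCCESS) return 1;

    /* Ask the driver how the physical GPUs have been grouped. */
    uint32_t count = 0;
    vkEnumeratePhysicalDeviceGroups(instance, &count, NULL);
    if (count > 8) count = 8;

    VkPhysicalDeviceGroupProperties groups[8];
    for (uint32_t i = 0; i < count; ++i) {
        groups[i].sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES;
        groups[i].pNext = NULL;
    }
    vkEnumeratePhysicalDeviceGroups(instance, &count, groups);

    for (uint32_t i = 0; i < count; ++i)
        printf("Group %u: %u physical GPU(s), subsetAllocation = %u\n",
               i, groups[i].physicalDeviceCount,
               (unsigned)groups[i].subsetAllocation);

    vkDestroyInstance(instance, NULL);
    return 0;
}
```

From there, the idea is that a single VkDevice spanning the whole group is created by chaining a VkDeviceGroupDeviceCreateInfo structure into VkDeviceCreateInfo, and allocations and command submissions can then be broadcast to, or masked onto, the physical GPUs behind it.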

microsoft-dx12-build15-linked.png

A slide from Microsoft's DirectX 12 reveal, long ago.

As for the “experimental” comment that I made in the introduction... I was expecting to see this news around SIGGRAPH, which occurs in late-July / early-August, alongside a minor version bump (to Vulkan 1.1).

I might still be right, though.

The major new features of Vulkan 1.0.42.0 are implemented as a new classification of extensions: KHX. In the past, vendors, like NVIDIA and AMD, would add new features as vendor-prefixed extensions. Games could query the graphics driver for these abilities, and enable them if available. If these features became popular enough for multiple vendors to have their own implementation of it, a committee would consider an EXT extension. This would behave the same across all implementations (give or take) but not be officially adopted by the Khronos Group. If they did take it under their wing, it would be given a KHR extension (or added as a required feature).

The Khronos Group has added a new layer: KHX. This level of extension sits below KHR, and is not intended for production code. You might see where this is headed. The VR multiview, multi-GPU, and cross-process extensions are not supposed to be used in released video games until they leave KHX status. Unlike a vendor extension, the Khronos Group wants old KHX standards to drop out of existence at some point after they graduate to full KHR status. It’s not something that NVIDIA owns and will keep around for 20 years after its usable lifespan just so old games can behave as expected.
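
The query step described above is ordinary Vulkan extension enumeration. Here is a minimal sketch; rather than assuming the exact KHX extension strings from that spec revision, it simply reports anything device-group related that the loader and driver happen to expose.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <vulkan/vulkan.h>

int main(void) {
    /* Ask the loader/driver which instance-level extensions are available;
       no VkInstance is needed for this query. */
    uint32_t count = 0;
    vkEnumerateInstanceExtensionProperties(NULL, &count, NULL);

    VkExtensionProperties *exts = malloc(count * sizeof(*exts));
    vkEnumerateInstanceExtensionProperties(NULL, &count, exts);

    /* A real application would test for the exact extension name it needs
       (vendor, EXT, KHX, or KHR) and add it to VkInstanceCreateInfo; here we
       just report anything device-group related, whatever its suffix. */
    for (uint32_t i = 0; i < count; ++i)
        if (strstr(exts[i].extensionName, "device_group"))
            printf("Found: %s (spec version %u)\n",
                   exts[i].extensionName, exts[i].specVersion);

    free(exts);
    return 0;
}
```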

khronos-group-logo.png

How long will that take? No idea. I’ve already mentioned my logical but uneducated guess a few paragraphs ago, but I’m not going to repeat it; I have literally zero facts to base it on, and I don’t want our readers to think that I do. I don’t. It’s just based on what the Khronos Group typically announces at certain trade shows, and the length of time since their first announcement.

The benefit that KHX does bring us is that, whenever these features make it to public release, developers will have already been using them... internally... since around now. When an extension hits KHR, it's done, and anyone can theoretically be ready for it when that time comes.

Author:
Manufacturer: NVIDIA

VR Performance Evaluation

Even though virtual reality hasn’t taken off with the momentum that many in the industry had expected on the heels of the HTC Vive and Oculus Rift launches last year, it remains one of the fastest growing segments of PC hardware. More importantly for many, VR is also one of the key inflection points for performance moving forward; it demands more hardware horsepower, scalability, and innovation than any other sub-category, including 4K gaming. As such, NVIDIA, AMD, and even Intel continue to push the performance benefits of their own hardware and technology.

Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, utilized by countless reviewers and enthusiasts. But Fraps lacked the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities that reviewers and consumers had access to. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we were able to measure the smoothness of a gaming experience and better articulate it to help gamers make purchasing decisions.

pipe1.jpg

VR pipeline when everything is working well.

For VR though, those same tools just don’t cut it. Fraps is a non-starter, as it measures frame rendering from the GPU's point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we had used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to “latch in” to VR-based games. Not only that, but measuring frame drops, time warps, space warps, and reprojections would be a significant hurdle without further development.

pipe2.jpg

VR pipeline with a frame miss.

NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping the tool will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and made freely available to the media and the public should not be overlooked.

vlcsnap-2017-02-27-11h31m17s057.png

NVIDIA FCAT VR comprises two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It uses data from Oculus event tracing (exposed through Windows ETW) and SteamVR’s performance API, along with NVIDIA driver stats when run on NVIDIA hardware, to generate performance data. It does work perfectly well on any GPU vendor’s hardware, though, thanks to its access to the VR vendors' own timing results.
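As a rough illustration of the kind of analysis this timing data enables (this is not FCAT VR's actual code, and the numbers are made up), the sketch below takes per-frame delivery timestamps and counts how many 90 Hz refresh intervals went by without a new frame — refreshes the runtime would have to cover with a reprojected or warped frame.

```cpp
// Illustrative only -- not FCAT VR's code. Given the timestamps at which the
// application delivered each frame, count the ~11.1 ms refresh intervals
// (90 Hz headset) that passed with no new frame; each one represents a miss
// the VR runtime would fill with reprojection/warping.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double refreshMs = 1000.0 / 90.0;   // ~11.1 ms per displayed refresh
    // Hypothetical capture data: frame delivery times in milliseconds.
    std::vector<double> deliveredMs = {0.0, 11.0, 22.3, 44.6, 55.7};

    int missed = 0;
    for (size_t i = 1; i < deliveredMs.size(); ++i) {
        double gap = deliveredMs[i] - deliveredMs[i - 1];
        int intervals = static_cast<int>(std::round(gap / refreshMs));
        if (intervals > 1)
            missed += intervals - 1;          // refreshes shown without a new frame
    }
    std::printf("frames delivered: %zu, refreshes missed: %d\n",
                deliveredMs.size(), missed);
    return 0;
}
```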

fcatvrcapture.jpg

Continue reading our preview of the new FCAT VR tool!

Manufacturer: PC Perspective

Living Long and Prospering

The open fork of AMD’s Mantle, the Vulkan API, was released exactly a year ago with, as we reported, a hard launch. That meant public (but not main-branch) drivers for developers, a few public SDKs, a proof-of-concept patch for The Talos Principle, and, of course, the ratified specification. This set the API up to find success right out of the gate, and we can now look back over the year since.

khronos-2017-vulkan-alt-logo.png

Thor's hammer, or a tempest in a teapot?

The elephant in the room is DOOM. The game has successfully integrated the API and uses many of its more interesting features, like asynchronous compute. Because the API is designed around a sort of “make a command, drop it on a list” paradigm, the driver is able to schedule commands based on priority and available resources. AMD’s products got a significant performance boost relative to OpenGL, catapulting the Fury X up to the enthusiast level that its theoretical performance suggested.
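As a minimal sketch of that submission model (not DOOM's or any vendor's actual code), the snippet below shows how already-recorded graphics and compute command buffers get dropped onto separate queues, with a semaphore ordering only the stage that genuinely depends on the compute output; a driver with spare resources can overlap the rest.

```cpp
// Minimal sketch: submit compute and graphics work to separate queues so the
// driver can overlap them. Assumes the device, queues, command buffers, and
// semaphore were created elsewhere.
#include <vulkan/vulkan.h>

void submitFrame(VkQueue graphicsQueue, VkQueue computeQueue,
                 VkCommandBuffer graphicsCmd, VkCommandBuffer computeCmd,
                 VkSemaphore computeDone) {
    // Compute work signals a semaphore when it finishes.
    VkSubmitInfo computeSubmit = {};
    computeSubmit.sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    computeSubmit.commandBufferCount   = 1;
    computeSubmit.pCommandBuffers      = &computeCmd;
    computeSubmit.signalSemaphoreCount = 1;
    computeSubmit.pSignalSemaphores    = &computeDone;
    vkQueueSubmit(computeQueue, 1, &computeSubmit, VK_NULL_HANDLE);

    // Graphics work waits on that semaphore only at the fragment stage, so
    // earlier stages are free to run while the compute queue is still busy.
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    VkSubmitInfo graphicsSubmit = {};
    graphicsSubmit.sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    graphicsSubmit.waitSemaphoreCount = 1;
    graphicsSubmit.pWaitSemaphores    = &computeDone;
    graphicsSubmit.pWaitDstStageMask  = &waitStage;
    graphicsSubmit.commandBufferCount = 1;
    graphicsSubmit.pCommandBuffers    = &graphicsCmd;
    vkQueueSubmit(graphicsQueue, 1, &graphicsSubmit, VK_NULL_HANDLE);
}
```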

Mobile developers have been picking up the API, too. Google, which is known for banishing OpenCL from its Nexus line and challenging OpenGL ES with its Android Extension Pack (later integrated into OpenGL ES with version 3.2), has strongly backed Vulkan. The API was integrated as a core feature of Android 7.0.

On the engine and middleware side of things, Vulkan is “ready for shipping games” as of Unreal Engine 4.14. It is also included in the Unity 5.6 beta, which is expected to see full release in March. Frameworks for emulators are also integrating Vulkan, often just to say they did, but sometimes to emulate the quirks of those systems' offbeat graphics co-processors. Many other engines, from Source 2 to Torque 3D, have also announced or added Vulkan support.

Finally, for the API itself, the Khronos Group has announced (page 22 of its SIGGRAPH 2016 slides) the areas it is actively working on. The top item is “better” multi-GPU support. While Vulkan, like OpenCL, allows developers to enumerate all graphics devices and target them individually with work, it lacks certain mechanisms, like the ability to directly feed output from one GPU into another. The group has not announced a timeline for this.
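For reference, here is a minimal sketch of the per-device model that paragraph describes: every physical GPU is enumerated and addressed individually, and anything shared between them has to move through the host. Again, this is a generic illustration rather than code from any shipping title.

```cpp
// Minimal sketch: enumerate every physical GPU so each can be driven with its
// own VkDevice; sharing results between them is left to the application.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

void listGpus(VkInstance instance) {
    uint32_t count = 0;
    vkEnumeratePhysicalDevices(instance, &count, nullptr);
    std::vector<VkPhysicalDevice> gpus(count);
    vkEnumeratePhysicalDevices(instance, &count, gpus.data());

    for (VkPhysicalDevice gpu : gpus) {
        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(gpu, &props);
        std::printf("GPU: %s\n", props.deviceName);  // each gets its own VkDevice later
    }
}
```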

Author:
Manufacturer: EVGA

The new EVGA GTX 1080 FTW2 with iCX Technology

Back in November of 2016, EVGA had a problem on its hands. A batch of GTX 10-series graphics cards using the new ACX 3.0 cooler left the warehouse missing the thermal pads required to keep the power management hardware on the cards within reasonable temperature margins. To its credit, the company took the oversight seriously and offered consumers a set of solutions to select from: an RMA, a new VBIOS that increases fan speeds, or manually installing thermal pads on the hardware. Still, as is the case with any product quality lapse like that, there were (and are) lingering questions about EVGA’s ability to deliver reliable products whose features and new options don’t compromise the basics.

Internally, the drive to correct these lapses was…strong. From the very top of the food chain on down, it was hammered home that something like this simply couldn’t occur again and, more than that, that EVGA was to develop and showcase a new feature set and product lineup demonstrating its ability to innovate. Thus was born, and accelerated, the EVGA iCX Technology infrastructure. While this had been in the pipeline for some time already, it was moved up to counter any negative bias that might have formed against EVGA’s graphics cards over the last several months. The goal was simple: prove that EVGA is the leader in graphics card design and prove that it has learned from previous mistakes.

EVGA iCX Technology

Previous issues aside, iCX Technology is built around one simple question: is one GPU temperature sensor enough? For nearly all of today’s graphics cards, cooling is based around the temperature of the GPU silicon itself, as measured by NVIDIA’s own sensor (in the case of all of EVGA’s cards). This is how fan curves are built, how GPU clock speeds are handled with GPU Boost, how noise profiles are created, and more. But as process technology has improved and GPU designs have skewed toward power efficiency, the GPU itself is often no longer the thermally limiting factor.

slides05.jpg

As it turns out, converting 12V from the power supply down to the ~1V the GPU needs is a conceptually simple process that creates a lot of excess heat. The thermal images above clearly demonstrate that, and EVGA isn’t the only card vendor to have taken notice. EVGA’s product issue from last year was related to exactly this: the fans were only spinning fast enough to keep the GPU cool and did not take into account the temperature of the memory or the power delivery circuitry.
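To put rough, purely illustrative numbers on it (these are assumptions, not EVGA’s measurements): a card pulling around 250 watts through a VRM operating at roughly 90 percent efficiency dissipates on the order of 25 watts as heat in the MOSFETs and inductors alone, and that heat is concentrated in a far smaller area than the GPU die.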

EVGA’s fix is to ratchet up the number of sensors on the card’s PCB and wrap them with intelligence in the form of MCUs, updated Precision XOC software, and user-viewable LEDs on the card itself.

slides10.jpg

EVGA graphics cards with iCX Technology will include 9 thermal sensors on the board, independent of the GPU temperature sensor integrated by NVIDIA: three for memory, five for power delivery, and one additional sensor for GPU temperature. Some of them, including the secondary GPU sensor, are located on the back of the PCB to avoid conflicts with trace routing between critical components.
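To illustrate the basic idea (this is purely a sketch of the concept, not EVGA’s firmware or its actual fan curves), the snippet below drives a fan duty cycle from the worst-case reading across the GPU, memory, and power delivery sensors instead of the GPU sensor alone.

```cpp
// Purely illustrative: take the hottest reading across all sensor zones and
// map it through a simple piecewise-linear fan curve.
#include <algorithm>
#include <initializer_list>
#include <vector>

// Map a temperature to a fan duty cycle (values here are arbitrary examples).
int fanDutyPercent(float tempC) {
    if (tempC <= 40.0f) return 20;    // idle floor
    if (tempC >= 85.0f) return 100;   // full speed
    return static_cast<int>(20.0f + (tempC - 40.0f) * (80.0f / 45.0f));
}

int dutyFromSensors(const std::vector<float>& gpuC,
                    const std::vector<float>& memC,
                    const std::vector<float>& vrmC) {
    float hottest = 0.0f;
    for (const std::vector<float>* zone : {&gpuC, &memC, &vrmC})
        for (float t : *zone)
            hottest = std::max(hottest, t);   // worst case across every sensor
    return fanDutyPercent(hottest);
}
```

Which sensor ends up setting the pace will vary by card and workload; the point is simply that memory or power delivery temperatures can now pull the fans up even when the GPU core itself is comfortable.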

Continue reading about EVGA iCX Technology!