We have to go all the way back to 2015 for NVIDIA's previous graphics card announcement at CES, when the GeForce GTX 960 was revealed during the show four years ago. Today's announcement brings us the latest "mid-range" offering in the tradition of the GeForce x60 (or x060) cards, the RTX 2060. This launch comes as no surprise to those of us following the PC industry, as various rumors and leaks preceded the announcement by weeks and even months, but such is the reality of the modern supply chain (sadly, few things are ever really a surprise anymore).
But there is still plenty of new information available with the official launch of this new GPU, not the least of which is the opportunity to look at independent benchmark results to find out what to expect with this new GPU relative to the market. To this end we had the opportunity to get our hands on the card before the official launch, testing the RTX 2060 in several games as well as a couple of synthetic benchmarks. The story is just beginning, and as time permits a "part two" of the RTX 2060 review will be offered to supplement this initial look, addressing omissions and adding further analysis of the data collected thus far.
Before getting into the design and our initial performance impressions of the card, let's look into the specifications of this new RTX 2060, and see how it relates to the rest of the RTX family from NVIDIA. We are taking a high level look at specs here, so for a deep dive into the RTX series you can check out our previous exploration of the Turing Architecture here.
"Based on a modified version of the Turing TU106 GPU used in the GeForce RTX 2070, the GeForce RTX 2060 brings the GeForce RTX architecture, including DLSS and ray-tracing, to the midrange GPU segment. It delivers excellent gaming performance on all modern games with the graphics settings cranked up. Priced at $349, the GeForce RTX 2060 is designed for 1080p gamers, and delivers an excellent gaming experience at 1440p."
| | RTX 2080 Ti | RTX 2080 | RTX 2070 | RTX 2060 | GTX 1080 | GTX 1070 |
|---|---|---|---|---|---|---|
| Base Clock | 1350 MHz | 1515 MHz | 1410 MHz | 1365 MHz | 1607 MHz | 1506 MHz |
| Boost Clock | 1545 MHz / 1635 MHz (FE) | 1710 MHz / 1800 MHz (FE) | 1620 MHz / 1710 MHz (FE) | 1680 MHz | 1733 MHz | 1683 MHz |
| Ray Tracing Speed | 10 Giga Rays | 8 Giga Rays | 6 Giga Rays | 5 Giga Rays | -- | -- |
| Memory Clock | 14000 MHz | 14000 MHz | 14000 MHz | 14000 MHz | 10000 MHz | 8000 MHz |
| Memory Interface | 352-bit GDDR6 | 256-bit GDDR6 | 256-bit GDDR6 | 192-bit GDDR6 | 256-bit GDDR5X | 256-bit GDDR5 |
| Memory Bandwidth | 616 GB/s | 448 GB/s | 448 GB/s | 336 GB/s | 320 GB/s | 256 GB/s |
| TDP | 250 W / 260 W (FE) | 215 W / 225 W (FE) | 175 W / 185 W (FE) | 160 W | 180 W | 150 W |
| MSRP (current) | $1,200 (FE) / $999 | $799 (FE) / $699 | $599 (FE) / $499 | $349 | $549 | $379 |
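As a quick sanity check on the bandwidth row, memory bandwidth falls straight out of bus width and effective data rate. A minimal sketch, using the values from the table above (the "14000 MHz" memory clock is the effective GDDR6 data rate, i.e. 14 Gbps per pin):

```python
# Memory bandwidth = (bus width / 8 bits-per-byte) * effective data rate.
# Values taken from the specification table above.
def bandwidth_gb_s(bus_width_bits: int, data_rate_mts: int) -> float:
    return bus_width_bits / 8 * data_rate_mts / 1000

print(bandwidth_gb_s(192, 14000))  # RTX 2060:    336.0 GB/s
print(bandwidth_gb_s(352, 14000))  # RTX 2080 Ti: 616.0 GB/s
print(bandwidth_gb_s(256, 10000))  # GTX 1080:    320.0 GB/s
```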
Vega meets Radeon Pro
Professional graphics cards are a segment of the industry that can look strange to gamers and PC enthusiasts. From the outside, it appears that businesses are paying more for almost identical hardware when compared to their gaming counterparts from both NVIDIA and AMD.
However, a lot goes into a professional-level graphics card that makes all the difference to the consumers they are targeting. From the addition of ECC memory to protect against data corruption, all the way to a completely different driver stack with specific optimizations for professional applications, there's a lot of work put into these particular products.
The professional graphics market has gotten particularly interesting in the last few years with the rise of the NVIDIA TITAN-level GPUs and "Frontier Edition" graphics cards from AMD. While lacking ECC memory, these new GPUs have brought over some of the application-level optimizations, while providing a lower price for more hobbyist-level consumers.
However, if you're a professional that depends on a graphics card for mission-critical work, these options are no replacement for the real thing.
Today we're looking at one of AMD's latest Pro graphics offerings, the AMD Radeon Pro WX 8200.
While 2018 has so far contained plenty of talk about graphics cards and new GPU architectures, little of it has revolved around AMD. After launching their long-awaited Vega GPUs in mid-2017, AMD has remained mostly quiet on the graphics front.
As we headed into summer 2018, the talk around graphics turned to NVIDIA's next-generation Turing architecture, the RTX 2070, 2080, and 2080 Ti, and the accompanying upward price creep within each product segment.
However, there has been one segment in particular that has been lacking any excitement in 2018—mid-range GPUs for gamers on a budget.
AMD is aiming to change that today with the release of the RX 590. Join us as we discuss the current state of affordable graphics cards.
| | RX 590 | RX 580 | GTX 1060 6GB | GTX 1060 3GB |
|---|---|---|---|---|
| GPU | Polaris 30 | Polaris 20 | GP106 | GP106 |
| Rated Clock | 1469 MHz Base / 1545 MHz Boost | 1257 MHz Base / 1340 MHz Boost | 1506 MHz Base / 1708 MHz Boost | 1506 MHz Base / 1708 MHz Boost |
| Memory Clock | 8000 MHz | 8000 MHz | 8000 MHz | 8000 MHz |
| Memory Bandwidth | 256 GB/s | 256 GB/s | 192 GB/s | 192 GB/s |
| TDP | 225 watts | 185 watts | 120 watts | 120 watts |
| Peak Compute | 7.1 TFLOPS | 6.17 TFLOPS | 3.85 TFLOPS (Base) | 2.4 TFLOPS (Base) |
| MSRP (of retail cards) | $239 | $219 | $249 | $209 |
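The Peak Compute row works the same way: FP32 throughput is two operations (one fused multiply-add) per shader per clock. A quick sketch, assuming the RX 590's 2304 stream processors and the GTX 1060 6GB's 1280 CUDA cores:

```python
# Peak FP32 = shaders * 2 FLOPs (one FMA) per clock * clock speed.
# Shader counts assumed: 2304 (Polaris 30), 1280 (GP106, 1060 6GB).
def peak_tflops(shaders: int, clock_mhz: int) -> float:
    return shaders * 2 * clock_mhz / 1e6

print(peak_tflops(2304, 1545))  # RX 590 boost:      ~7.12 TFLOPS
print(peak_tflops(1280, 1506))  # GTX 1060 6GB base: ~3.86 TFLOPS
```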
With the launch of the GeForce RTX 2070, NVIDIA seems to have applied some pressure to their partners to get SKUs out that actually hit the advertised "starting at $499" price. Compared to the $599 Founders Edition RTX 2070, these lower-cost options have the potential to bring significantly more value to the consumer, especially taking into account the relative performance of the RTX 2070 versus the GTX 1080 that we observed in our initial review.
Earlier this week, we took a look at the EVGA RTX 2070 Black Edition, but it's not the only card to hit the $499 price range that we've received.
Today, we are taking a look at MSI's low-cost RTX 2070 offering, the MSI RTX 2070 Armor.
| MSI RTX 2070 ARMOR 8G | |
|---|---|
| Base Clock Speed | 1410 MHz |
| Boost Clock Speed | 1620 MHz |
| Memory Clock Speed | 14000 MHz GDDR6 |
| Outputs | DisplayPort 1.4 x 3 / HDMI 2.0b x 1 / USB Type-C x 1 (VirtualLink) |
| Dimensions | 12.1 x 6.1 x 1.9 inches (309 x 155 x 50 mm) |
TU106 joins the party
In general, the launch of RTX 20-series GPUs from NVIDIA in the form of the RTX 2080 and RTX 2080 Ti has been a bit of a mixed bag.
While these new products did give us the fastest gaming GPU available, the RTX 2080 Ti, they are also some of the most expensive video cards ever to launch. With a value proposition partially tied to the adoption of new hardware features in games, the reception of these new RTX cards has been rocky.
To say this puts a bit of pressure on the RTX 2070 launch would be an apt assessment. The community wants to see a reason to get excited for new graphics cards, without having to wait for applications to take advantage of the new hardware features like Tensor and RT cores. Conversely, NVIDIA would surely love to see an RTX launch with a bit more praise from the press and community than their previous release has garnered.
The wait is over: today we are taking a look at the RTX 2070, the last of the RTX-series graphics cards announced by NVIDIA back in August.
| | RTX 2080 Ti | GTX 1080 Ti | RTX 2080 | RTX 2070 | GTX 1080 | GTX 1070 | RX Vega 64 (Air) |
|---|---|---|---|---|---|---|---|
| Base Clock | 1350 MHz | 1408 MHz | 1515 MHz | 1410 MHz | 1607 MHz | 1506 MHz | 1247 MHz |
| Boost Clock | 1545 MHz / 1635 MHz (FE) | 1582 MHz | 1710 MHz / 1800 MHz (FE) | 1620 MHz / 1710 MHz (FE) | 1733 MHz | 1683 MHz | 1546 MHz |
| Ray Tracing Speed | 10 GRays/s | -- | 8 GRays/s | 6 GRays/s | -- | -- | -- |
| Memory Clock | 14000 MHz | 11000 MHz | 14000 MHz | 14000 MHz | 10000 MHz | 8000 MHz | 1890 MHz |
| Memory Interface | 352-bit G6 | 352-bit G5X | 256-bit G6 | 256-bit G6 | 256-bit G5X | 256-bit G5 | 2048-bit HBM2 |
| Memory Bandwidth | 616 GB/s | 484 GB/s | 448 GB/s | 448 GB/s | 320 GB/s | 256 GB/s | 484 GB/s |
| TDP | 250 W / 260 W (FE) | 250 W | 215 W / 225 W (FE) | 175 W / 185 W (FE) | 180 W | 150 W | 292 W |
| Peak Compute (FP32) | 13.4 TFLOPS / 14.2 TFLOPS (FE) | 10.6 TFLOPS | 10 TFLOPS / 10.6 TFLOPS (FE) | 7.5 TFLOPS / 7.9 TFLOPS (FE) | 8.2 TFLOPS | 6.5 TFLOPS | 13.7 TFLOPS |
| Transistor Count | 18.6 B | 12.0 B | 13.6 B | 10.8 B | 7.2 B | 7.2 B | 12.5 B |
| MSRP (current) | $1,200 (FE) / $999 | $699 | $799 (FE) / $699 | $599 (FE) / $499 | $549 | $379 | $499 |
With the release of the NVIDIA GeForce RTX 2080 and 2080 Ti just last week, the graphics card vendors have awakened with a flurry of new products based on the Turing GPUs.
Today, we're taking a look at ASUS's flagship option, the ASUS Republic of Gamers STRIX 2080 Ti.
| ASUS ROG STRIX 2080 Ti | |
|---|---|
| Base Clock Speed | 1350 MHz |
| Boost Clock Speed | 1665 MHz |
| Memory Clock Speed | 14000 MHz GDDR6 |
| Outputs | DisplayPort 1.4 x 2 / HDMI 2.0b x 2 / USB Type-C x 1 (VirtualLink) |
| Dimensions | 12 x 5.13 x 2.13 inches (30.47 x 13.04 x 5.41 cm) |
For those of you familiar with the most recent STRIX video cards, the GTX 1080 Ti and the RX Vega 64, the design of the RTX 2080 Ti version will be immediately recognizable. The same symmetric triple-fan setup is present, in contrast to some of the recent triple-fan designs we've seen from other manufacturers that use different fan sizes.
Just as with the STRIX GTX 1080 Ti, the RTX 2080 Ti version features RGB lighting along the fan shroud of the card.
Our First Look
Over the years, the general trend for new GPU launches, especially those based on a new graphics architecture, has been to launch only with the "reference" graphics card designs developed by AMD or NVIDIA. While the idea of a "reference" design has changed over the years, with the introduction of NVIDIA's Founders Edition cards and special launch designs from AMD like we saw with Vega 56 and Vega 64, custom designs from partners generally aren't available at launch.
With the launch of NVIDIA's Turing architecture in the form of the RTX 2080 and RTX 2080 Ti, however, we've been presented with an embarrassment of riches: plenty of custom cooler and custom PCB designs from add-in board (AIB) manufacturers.
Today, we're taking a look at our first custom RTX 2080 design, the MSI RTX 2080 Gaming X Trio.
| MSI GeForce RTX 2080 Gaming X Trio | |
|---|---|
| Base Clock Speed | 1515 MHz |
| Boost Clock Speed | 1835 MHz |
| Memory Clock Speed | 7000 MHz GDDR6 |
| Outputs | DisplayPort 1.4 x 3 / HDMI 2.0b x 1 / USB Type-C x 1 (VirtualLink) |
| Dimensions | 12.9 x 5.5 x 2.1 inches (327 x 140 x 55.6 mm) |
| Weight | 3.42 lbs (1553 g) |
Introduced with the GTX 1080 Ti, the Gaming X Trio is, as you might expect, a triple-fan design that serves as MSI's highest-performance graphics card offering.
New Generation, New Founders Edition
At this point, it seems that calling NVIDIA's 20-series GPUs highly anticipated would be a bit of an understatement. After months and months of speculation about what these new GPUs would be called, what architecture they would be based on, and what features they would bring, the NVIDIA GeForce RTX 2080 and RTX 2080 Ti were officially unveiled in August, alongside the Turing architecture.
We've already posted our deep dive into the Turing architecture and the TU102 and TU104 GPUs powering these new graphics cards, but here's the short takeaway: Turing provides efficiency improvements in both memory and shader performance, and adds specialized hardware to accelerate deep learning (Tensor cores) and enable real-time ray tracing (RT cores).
| | RTX 2080 Ti | Quadro RTX 6000 | GTX 1080 Ti | RTX 2080 | Quadro RTX 5000 | GTX 1080 | TITAN V | RX Vega 64 (Air) |
|---|---|---|---|---|---|---|---|---|
| Base Clock | 1350 MHz | 1455 MHz | 1408 MHz | 1515 MHz | 1620 MHz | 1607 MHz | 1200 MHz | 1247 MHz |
| Boost Clock | 1545 MHz / 1635 MHz (FE) | 1770 MHz | 1582 MHz | 1710 MHz / 1800 MHz (FE) | 1820 MHz | 1733 MHz | 1455 MHz | 1546 MHz |
| Ray Tracing Speed | 10 GRays/s | 10 GRays/s | -- | 8 GRays/s | 8 GRays/s | -- | -- | -- |
| Memory Clock | 14000 MHz | 14000 MHz | 11000 MHz | 14000 MHz | 14000 MHz | 10000 MHz | 1700 MHz | 1890 MHz |
| Memory Interface | 352-bit G6 | 384-bit G6 | 352-bit G5X | 256-bit G6 | 256-bit G6 | 256-bit G5X | 3072-bit HBM2 | 2048-bit HBM2 |
| Memory Bandwidth | 616 GB/s | 672 GB/s | 484 GB/s | 448 GB/s | 448 GB/s | 320 GB/s | 653 GB/s | 484 GB/s |
| TDP | 250 W / 260 W (FE) | 260 W | 250 W | 215 W / 225 W (FE) | 230 W | 180 W | 250 W | 292 W |
| Peak Compute (FP32) | 13.4 TFLOPS / 14.2 TFLOPS (FE) | 16.3 TFLOPS | 10.6 TFLOPS | 10 TFLOPS / 10.6 TFLOPS (FE) | 11.2 TFLOPS | 8.2 TFLOPS | 14.9 TFLOPS | 13.7 TFLOPS |
| Transistor Count | 18.6 B | 18.6 B | 12.0 B | 13.6 B | 13.6 B | 7.2 B | 21.0 B | 12.5 B |
| MSRP (current) | $1,200 (FE) / $999 | -- | $699 | $799 (FE) / $699 | -- | $549 | $2,999 | $499 |
Unusually for NVIDIA, the company has decided to release both the RTX 2080 and RTX 2080 Ti at the same time as the first products in the Turing family.
The TU102-based RTX 2080 Ti features 4352 CUDA cores, while the TU104-based RTX 2080 features 2944, fewer than the GTX 1080 Ti's 3584. These new RTX GPUs have also moved to GDDR6 from the GDDR5X found on the GTX 10-series.
A Look Back and Forward
Although NVIDIA's new GPU architecture, revealed previously as Turing, has been speculated about for what seems like an eternity at this point, we finally have our first look at exactly what NVIDIA is positioning as the future of gaming.
Unfortunately, we can't talk about this card just yet, but we can talk about what powers it.
First though, let's take a look at the journey to get here over the past 30 months or so.
Unveiled in early 2016 and marked by the launch of the GTX 1070 and GTX 1080, Pascal was NVIDIA's long-awaited 16nm successor to Maxwell. Built on the oft-delayed 16nm process node, Pascal refined the shader unit design originally found in Maxwell while lowering power consumption and increasing performance.
Next, in May 2017 came Volta, the next (and last) GPU architecture outlined in NVIDIA's public roadmaps since 2013. However, instead of the traditional launch with a new GeForce gaming card, Volta saw a different approach.
Retesting the 2990WX
Earlier today, NVIDIA released version 399.24 of their GeForce drivers for Windows, citing Game Ready support for some newly released games, including Shadow of the Tomb Raider, the Call of Duty: Black Ops 4 Blackout beta, and the Assetto Corsa Competizione early access.
While this in and of itself is a normal event, we shortly started to get some tips from readers about an interesting bug fix found in NVIDIA's release notes for this specific driver revision.
Specifically addressing performance differences between 16-core/32-thread processors and 32-core/64-thread processors, this patched issue immediately recalled our experience benchmarking the AMD Ryzen Threadripper 2990WX back in August, where we saw some games running at frame rates around 50% lower than on the 16-core Threadripper 2950X.
This particular patch note led us to update our Ryzen Threadripper 2990WX test platform to the latest NVIDIA driver release and see if there were any noticeable changes in performance.
The full testbed configuration is listed below:
| Test System Setup | |
|---|---|
| CPU | AMD Ryzen Threadripper 2990WX |
| Motherboard | ASUS ROG Zenith Extreme - BIOS 1304 |
| Memory | 16GB Corsair Vengeance DDR4-3200 (operating at DDR4-2933) |
| Storage | Corsair Neutron XTi 480 SSD |
| Graphics Card | NVIDIA GeForce GTX 1080 Ti 11GB |
| Graphics Drivers | NVIDIA 398.26 and 399.24 |
| Power Supply | Corsair RM1000x |
| Operating System | Windows 10 Pro x64 RS4 (17134.165) |
Included at the end of this article are the full results from our entire suite of game benchmarks on our CPU testbed, but first, let's take a look at some of the games that previously gave the 2990WX particular trouble.
The interesting data points for this testing are the 2990WX scores on both driver revisions (398.26, the version we tested across every CPU, and the new 399.24), as well as the results from the 1/4-core compatibility mode and from the Ryzen Threadripper 2950X. From the wording of the patch notes, we would expect gaming performance between the 16-core 2950X and the 32-core 2990WX to be very similar.
Grand Theft Auto V
GTA V was previously one of the worst offenders in our original 2990WX testing, with the frame rate almost halving compared to the 2950X.
However, with the newest GeForce driver update, we see this gap shrinking to around a 20% difference.
Your Mileage May Vary
One of the most interesting things going around the computer hardware communities this past weekend was the revelation from a Reddit user named bryf50 that he had somehow gotten his FreeSync display working with his NVIDIA GeForce GPU.
For those of you that might not be familiar with the particular ins-and-outs of these variable refresh technologies, getting FreeSync displays to work on NVIDIA GPUs is potentially a very big deal.
While NVIDIA GPUs support the NVIDIA G-SYNC variable refresh rate standard, they are not compatible with Adaptive Sync (the technology on which FreeSync is based) displays. Despite Adaptive Sync being an open standard, and an optional extension to the DisplayPort specification, NVIDIA so far has chosen not to support these displays.
However, this presents some major downsides to consumers looking to purchase displays and graphics cards. Due to the lack of interoperability, consumers can get locked into a GPU vendor if they want to continue to use the variable refresh functionality of their display. Plus, Adaptive Sync/FreeSync monitors in general tend to be significantly less expensive for similar specifications.
A long time coming
To say that the ASUS ROG Swift PG27UQ has been a long time coming is a bit of an understatement. In a computer hardware world where we are generally lucky to know about a product six months out, the PG27UQ is a product that has been around in some form or another for at least 18 months.
Originally demonstrated at CES 2017, the ASUS ROG Swift PG27UQ debuted alongside the Acer Predator X27 as the world's first G-SYNC displays supporting HDR. With promised brightness levels of 1000 nits, G-SYNC HDR was a surprising and aggressive announcement considering that HDR was just starting to pick up steam on TVs, and was unheard of for PC monitors. On top of the HDR support, these monitors were the first announced displays sporting a 144Hz refresh rate at 4K, due to their DisplayPort 1.4 connections.
However, delays led to the PG27UQ being displayed yet again at CES this year, with a promised release date of Q1 2018. Further slips in the release schedule lead us to today, when the ASUS PG27UQ is available for pre-order for a staggering $2,000 and set to ship at some point this month.
In some ways, the launch of the PG27UQ very much mirrors the launch of the original G-SYNC display, the ROG Swift PG278Q. Both displays represented the launch of a long-awaited technology in a 27" form factor, and both were seen as extremely expensive at their time of release.
Finally, we have our hands on a production model of the ASUS PG27UQ, the first monitor to support G-SYNC HDR, as well as 144Hz refresh rate at 4K. Can a PC monitor really be worth a $2,000 price tag?
Announced at Intel's Developer Forum in 2012, and launched later that year, the Next Unit of Computing (NUC) project was initially a bit confusing to the enthusiast PC press. In a market that appeared to be discarding traditional desktops in favor of notebooks, it seemed a bit odd to launch a product that still depended on a monitor, mouse, and keyboard, yet didn't provide any more computing power.
Despite this criticism, the NUC lineup has rapidly expanded over the years, seeing success in areas such as digital signage and enterprise environments. However, the enthusiast PC market has mostly eluded the lure of the NUC.
Intel's Skylake-based Skull Canyon NUC was the company's first attempt to cater to the enthusiast market, with a slight stray from the traditional 4-in x 4-in form factor and the adoption of their best-ever integrated graphics solution in the Iris Pro. Additionally, the ability to connect external GPUs via Thunderbolt 3 meant Skull Canyon offered more of a focus on high-end PC graphics.
However, Skull Canyon mostly fell on deaf ears among hardcore PC users, and it seemed that Intel lacked the proper solution to make a "gaming-focused" NUC device—until now.
Announced at CES 2018, the lengthily named 8th Gen Intel® Core™ processors With Radeon™ RX Vega M Graphics (henceforth referred to as the code name, Kaby Lake-G) marks a new direction for Intel. By partnering with one of the leaders in high-end PC graphics, AMD, Intel can now pair their processors with graphics capable of playing modern games at high resolutions and frame rates.
The first product to launch using the new Kaby Lake-G family of processors is Intel's own NUC, the NUC8i7HVK (Hades Canyon). Will the marriage of Intel and AMD finally provide a NUC capable of at least moderate gaming? Let's dig a bit deeper and find out.
O Rayly? Ya Rayly. No Ray!
Microsoft has just announced a raytracing extension to DirectX 12, called DirectX Raytracing (DXR), at the 2018 Game Developer's Conference in San Francisco.
The goal is not to completely replace rasterization… at least not yet. Raytracing will mostly be used for effects that require supplementary datasets, such as reflections, ambient occlusion, and refraction. Rasterization, the typical way that 3D geometry gets drawn on a 2D display, converts triangle coordinates into screen coordinates, and then a point-in-triangle test runs across every sample. This will likely occur once per AA sample (minus pixels that the triangle can't possibly cover, such as a pixel outside of the triangle's bounding box, but that's just optimization).
For rasterization, each triangle is laid on a 2D grid corresponding to the draw surface; if any sample is in the triangle, the pixel shader is run. This example shows the rotated-grid MSAA case.
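To make the point-in-triangle test concrete, here's a minimal toy sketch of the coverage loop. It checks one sample per pixel center using edge functions; real hardware adds MSAA sample patterns and the bounding-box optimization mentioned above:

```python
# Toy rasterizer coverage test: a sample is inside the triangle if it
# sits on the same side of all three edges (edge function sign test).
def edge(ax, ay, bx, by, px, py):
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covered(tri, px, py):
    (x0, y0), (x1, y1), (x2, y2) = tri
    w0 = edge(x1, y1, x2, y2, px, py)
    w1 = edge(x2, y2, x0, y0, px, py)
    w2 = edge(x0, y0, x1, y1, px, py)
    # accept both windings by allowing all-positive or all-negative
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)

tri = [(1.0, 1.0), (9.0, 2.0), (4.0, 8.0)]
hits = 0
for y in range(10):          # walk a 10x10 pixel grid
    for x in range(10):
        if covered(tri, x + 0.5, y + 0.5):  # one sample per pixel center
            hits += 1        # in hardware, the pixel shader runs here
print(hits, "covered samples")
```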
A program, called a pixel shader, is then run with some set of data that the GPU could gather on every valid pixel in the triangle. This set of data typically includes things like world coordinate, screen coordinate, texture coordinates, nearby vertices, and so forth. This lacks a lot of information, especially things that are not visible to the camera. The application is free to provide other sources of data for the shader to crawl… but what?
- Cubemaps are useful for reflections, but they don’t necessarily match the scene.
- Voxels are useful for lighting, as seen with NVIDIA’s VXGI and VXAO.
This is where DirectX Raytracing comes in. There are quite a few components to it, but it's basically a new pipeline that handles how rays are cast into the environment. After being queued, it starts out with a ray-generation stage, and then, depending on what happens to the ray in the scene, there are closest-hit, any-hit, and miss shaders. Ray generation allows the developer to set up how the rays are cast, where they call an HLSL intrinsic instruction, TraceRay (which is a clever way of invoking them, by the way). This function takes an origin and a direction, so you can choose to, for example, cast rays only in the direction of lights if your algorithm was to, for instance, approximate partially occluded soft shadows from a non-point light. (There are better algorithms to do that, but it's just the first example that came off the top of my head.) The closest-hit, any-hit, and miss shaders occur at the point where the traced ray ends.
To connect this with current technology, imagine that ray-generation is like a vertex shader in rasterization, where it sets up the triangle to be rasterized, leading to pixel shaders being called.
Even more interesting: the closest-hit, any-hit, and miss shaders can call TraceRay themselves, which is used for multi-bounce and other recursive algorithms (see the figure above). The obvious use case might be reflections, which is the headline of the GDC talk, but they want it to be as general as possible, aligning with the evolution of GPUs. Looking at NVIDIA's VXAO implementation, it also seems like a natural fit for a raytracing algorithm.
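To picture that control flow, here's a toy sketch (our own illustration in Python, not the actual HLSL API). A ray-generation step casts a ray, a sphere intersection stands in for the acceleration-structure traversal, and the closest-hit "shader" recursively traces one more ray, just continuing straight here as a stand-in for a proper reflection vector:

```python
import math

SPHERE = ((0.0, 0.0, 5.0), 1.0)  # stand-in scene: a single sphere
MAX_DEPTH = 2                    # recursion budget for multi-bounce

def intersect(origin, d):
    # Ray/sphere test standing in for the acceleration-structure walk.
    center, radius = SPHERE
    oc = [origin[i] - center[i] for i in range(3)]
    b = 2 * sum(oc[i] * d[i] for i in range(3))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - 4 * c
    return None if disc < 0 else (-b - math.sqrt(disc)) / 2

def trace_ray(origin, d, depth=0):
    t = intersect(origin, d)
    if t is None or t <= 0:
        return 0.1  # "miss shader": flat background term
    if depth >= MAX_DEPTH:
        return 0.5
    hit = tuple(origin[i] + t * d[i] for i in range(3))
    # "closest-hit shader": shade, then call trace_ray recursively,
    # exactly the multi-bounce pattern described above.
    return 0.5 + 0.5 * trace_ray(hit, d, depth + 1)

print(trace_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # ray-generation stage
```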
Speaking of data structures, Microsoft also detailed what they call the acceleration structure. Each object is composed of two levels. The top level contains per-object metadata, like its transformation and whatever else data that the developer wants to add to it. The bottom level contains the geometry. The briefing states, “essentially vertex and index buffers” so we asked for clarification. DXR requires that triangle geometry be specified as vertex positions in either 32-bit float3 or 16-bit float3 values. There is also a stride property, so developers can tweak data alignment and use their rasterization vertex buffer, as long as it's HLSL float3, either 16-bit or 32-bit.
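Roughly, then, the two-level layout looks something like this sketch (the names are our own illustration, not DXR's actual structures; the point is the split between per-object metadata and reusable float3 geometry):

```python
from dataclasses import dataclass, field

@dataclass
class BottomLevelGeometry:
    vertex_buffer: bytes   # float3 positions, possibly a reused raster buffer
    stride: int            # bytes between consecutive vertices
    vertex_format: str     # "float32x3" or "float16x3", per the DXR requirement

@dataclass
class TopLevelObject:
    geometry: BottomLevelGeometry
    transform: list                               # 3x4 object-to-world matrix
    metadata: dict = field(default_factory=dict)  # whatever the developer adds

blas = BottomLevelGeometry(b"\x00" * 36, stride=12, vertex_format="float32x3")
scene = [TopLevelObject(blas, [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])]
```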
As for the tools to develop this in…
Microsoft announced PIX back in January 2017. This is a debugging and performance analyzer for 64-bit, DirectX 12 applications. Microsoft will upgrade it to support DXR as soon as the API is released (specifically, “Day 1”). This includes the API calls, the raytracing pipeline resources, the acceleration structure, and so forth. As usual, you can expect Microsoft to support their APIs with quite decent – not perfect, but decent – documentation and tools. They do it well, and they want to make sure it’s available when the API is.
Example of DXR via EA's in-development SEED engine.
In short, raytracing is here, but it’s not taking over rasterization. It doesn’t need to. Microsoft is just giving game developers another, standardized mechanism to gather supplementary data for their games. Several game engines have already announced support for this technology, including the usual suspects of anything top-tier game technology:
- Frostbite (EA/DICE)
- SEED (EA)
- 3DMark (Futuremark)
- Unreal Engine 4 (Epic Games)
- Unity Engine (Unity Technologies)
They also said, "and several others we can't disclose yet", so this list is not even complete. But, yeah, if you have Frostbite, Unreal Engine, and Unity, then you have a sizeable market as it is. There is always a question of how much each of these engines will support the technology. Currently, raytracing is not portable outside of DirectX 12, because it's literally being announced today, and each of these engines intends to support more than just Windows 10 and Xbox.
Still, we finally have a standard for raytracing, which should drive vendors to optimize in a specific direction. From there, it's just a matter of someone taking the risk to actually use the technology for a cool work of art.
If you want to read more, check out Ryan's post about the also-announced RTX, NVIDIA's raytracing technology.
It's all fun and games until something something AI.
Microsoft announced the Windows Machine Learning (WinML) API about two weeks ago, but they did so in a sort-of abstract context. This week, alongside the 2018 Game Developers Conference, they are grounding it in a practical application: video games!
Specifically, the API provides the mechanisms for game developers to run inference on the target machine. The trained models it runs against would be in the Open Neural Network Exchange (ONNX) format from Microsoft, Facebook, and Amazon. As the initial announcement suggested, it can be used for any application, not just games, but… you know. If you want to get a technology off the ground, and it requires a high-end GPU, then video game enthusiasts are good lead users. When run in a DirectX application, WinML kernels are queued on the DirectX 12 compute queue.
We’ve discussed the concept before. When you’re rendering a video game, simulating an accurate scenario isn’t your goal – the goal is to look like you are. The direct way of looking like you’re doing something is to do it. The problem is that some effects are too slow (or, sometimes, too complicated) to correctly simulate. In these cases, it might be viable to make a deep-learning AI hallucinate a convincing result, even though no actual simulation took place.
Fluid dynamics, global illumination, and up-scaling are three examples.
Previously mentioned SIGGRAPH demo of fluid simulation without fluid simulation...
... just a trained AI hallucinating a scene based on input parameters.
Another place where AI could be useful is… well… AI. One way of making game AI is to give it some set of data from the game environment, often including information that a player in its position could not know, and have it run against a branching logic tree. Deep learning, on the other hand, can train itself on billions of examples of good and bad play, and produce results based on input parameters. While the two methods do not sound that different, replacing logic that is designed with logic assembled from an abstract good/bad dataset abstracts away some of the potential for assumptions and programmer error. Of course, it abstracts that potential for error into the training dataset, but that's a whole other discussion.
The third area that AI could be useful is when you’re creating the game itself.
There’s a lot of grunt and grind work when developing a video game. Licensing prefab solutions (or commissioning someone to do a one-off asset for you) helps ease this burden, but that gets expensive in terms of both time and money. If some of those assets could be created by giving parameters to a deep-learning AI, then those are assets that you would not need to make, allowing you to focus on other assets and how they all fit together.
These are three of the use cases that Microsoft is aiming WinML at.
Sure, these are smooth curves of large details, but the antialiasing pattern looks almost perfect.
For instance, Microsoft is pointing to an NVIDIA demo where they up-sample a photo of a car, once with bilinear filtering and once with a machine learning algorithm (although not WinML-based). The bilinear algorithm behaves exactly as someone who has used Photoshop would expect. The machine learning algorithm, however, was able to identify the objects that the image intended to represent, and it drew the edges that it thought made sense.
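For reference, bilinear filtering just blends the four nearest source pixels by distance, which is why it can only average existing detail and never invent new edges. A minimal sketch of the baseline being compared against:

```python
# Minimal bilinear up-sampler: every output pixel is a distance-weighted
# blend of the four nearest input pixels. No new information is created.
def bilinear_upscale(img, factor):
    h, w = len(img), len(img[0])
    out = []
    for oy in range(h * factor):
        sy = oy / factor
        y0 = min(int(sy), h - 1)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(w * factor):
            sx = ox / factor
            x0 = min(int(sx), w - 1)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out

print(bilinear_upscale([[0, 10], [10, 20]], 2))  # 2x2 image -> 4x4 image
```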
Like their DirectX Raytracing (DXR) announcement, Microsoft plans to have PIX support WinML “on Day 1”. As for partners? They are currently working with Unity Technologies to provide WinML support in Unity’s ML-Agents plug-in. That’s all the game industry partners they have announced at the moment, though. It’ll be interesting to see who jumps in and who doesn’t over the next couple of years.
It's clear by now that AMD's latest CPU releases, the Ryzen 3 2200G and the Ryzen 5 2400G, are compelling products. We've already taken a look at them in our initial review, as well as investigated how memory speed affects the graphics performance of the integrated GPU, but it seemed there was something missing.
Recently, it's become painfully clear that GPUs excel at more than just graphics rendering. With the rise of cryptocurrency mining, OpenCL and CUDA performance are as important as ever.
Cryptocurrency mining certainly isn't the only application where having a powerful GPU can help system performance. We set out to see how much of an advantage the Radeon Vega 11 graphics in the Ryzen 5 2400G provided over the significantly less powerful UHD 630 graphics in the Intel i5-8400.
| Test System Setup | |
|---|---|
| CPU | AMD Ryzen 5 2400G / Intel Core i5-8400 |
| Motherboard | Gigabyte AB350N-Gaming WiFi / ASUS STRIX Z370-E Gaming |
| Memory | 2 x 8GB G.SKILL FlareX DDR4-3200 (all memory running at 3200 MHz) |
| Storage | Corsair Neutron XTi 480 SSD |
| Graphics Card | AMD Radeon Vega 11 Graphics / Intel UHD 630 Graphics |
| Graphics Drivers | AMD 17.40.3701 |
| Power Supply | Corsair RM1000x |
| Operating System | Windows 10 Pro x64 RS3 |
Before we take a look at some real-world examples of where a powerful GPU can be utilized, let's look at the relative power of the Vega 11 graphics on the Ryzen 5 2400G compared to the UHD 630 graphics on the Intel i5-8400.
SiSoft Sandra is a suite of benchmarks covering a wide array of system hardware and functionality, including an extensive range of GPGPU tests, which we are looking at today.
Comparing the raw shader performance of the Ryzen 5 2400G and the Intel i5-8400 provides a clear snapshot of what we are dealing with. In every precision category, the Vega 11 graphics in the AMD part are significantly more powerful than the Intel UHD 630 graphics. This all combines to provide a 175% increase in aggregate shader performance over Intel for the AMD part.
Now that we've taken a look at the theoretical power of these GPUs, let's see how they perform in real-world applications.
Memory speed is not a factor the average gamer thinks about when building a PC. For the most part, memory performance hasn't had much of an effect on modern processors running high-speed memory such as DDR3 and DDR4.
With the launch of AMD's Ryzen processors last year, however, a platform emerged that is more sensitive to memory speed. By running Ryzen processors with higher-frequency, lower-latency memory, users can see significant performance improvements, especially in 1080p gaming scenarios.
However, the Ryzen processors are not the only ones to exhibit this behavior.
Gaming on integrated GPUs is a perfect example of a memory-starved situation. Take, for instance, the new AMD Ryzen 5 2400G and its Vega-based GPU cores. In a full Vega 56 or Vega 64, these Vega cores are paired with blazingly fast HBM2 memory. However, due to constraints such as die space and cost, this processor does not integrate HBM.
Instead, both the CPU portion and the graphics portion of the APU depend on the same pool of DDR4 system memory. DDR4 is significantly slower than the memory traditionally found on graphics cards, such as GDDR5 or HBM. As a result, APU performance is usually memory limited to some extent.
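The size of that gap is easy to put numbers on. A back-of-the-envelope sketch, assuming the 2400G's two 64-bit DDR4 channels versus the dedicated 256-bit GDDR5 bus of a discrete card like the RX 580:

```python
# Bandwidth = (bus width / 8) * transfer rate. The APU's CPU and GPU
# share the DDR4 figure; a discrete GPU gets its bus all to itself.
def bandwidth_gb_s(bus_width_bits, rate_mts):
    return bus_width_bits / 8 * rate_mts / 1000

print(bandwidth_gb_s(128, 2933))  # dual-channel DDR4-2933: ~46.9 GB/s, shared
print(bandwidth_gb_s(256, 8000))  # RX 580 GDDR5:           256.0 GB/s
```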
In the past, we've done memory speed testing with AMD's older APUs; however, with the launch of the new Ryzen and Vega-based R3 2200G and R5 2400G, we decided to take another look at this topic.
For our testing, we are running the Ryzen 5 2400G at three different memory speeds: 2400 MHz, 2933 MHz, and 3200 MHz. While the maximum supported JEDEC memory standard for the R5 2400G is 2933, the memory provided by AMD for our processor review supports overclocking to 3200 MHz just fine.
Specifications and Design
With all of the activity in both the GPU and CPU markets this year, it's hard to remember some of the launches from the first half of the year, including NVIDIA's GTX 1080 Ti. Holding the rank of fastest gaming GPU for the majority of the year, little has challenged NVIDIA's GP102-based offering, making it the de facto choice for high-end gamers.
Even though we've been giving a lot of attention to NVIDIA's new flagship TITAN V graphics card, its $3,000 price tag puts it out of the range of almost every gamer who doesn't have a day job involving deep learning.
Today, we're taking a look back at the (slightly) more reasonable GP102 and one of the most premium offerings to feature it, the ASUS ROG Strix GTX 1080 Ti.
While the actual specifications of the GP102 GPU onboard the ASUS Strix GTX 1080 Ti haven't changed at all, let's take a moment to refresh ourselves on where it sits relative to the rest of the market.
| | RX Vega 64 Liquid | RX Vega 56 | GTX 1080 Ti | GTX 1080 | GTX 1070 Ti | GTX 1070 |
|---|---|---|---|---|---|---|
| Base Clock | 1406 MHz | 1156 MHz | 1480 MHz | 1607 MHz | 1607 MHz | 1506 MHz |
| Boost Clock | 1677 MHz | 1471 MHz | 1582 MHz | 1733 MHz | 1683 MHz | 1683 MHz |
| Memory Clock | 1890 MHz | 1600 MHz | 11000 MHz | 10000 MHz | 8000 MHz | 8000 MHz |
| Memory Interface | 2048-bit HBM2 | 2048-bit HBM2 | 352-bit G5X | 256-bit G5X | 256-bit G5 | 256-bit G5 |
| Memory Bandwidth | 484 GB/s | 410 GB/s | 484 GB/s | 320 GB/s | 256 GB/s | 256 GB/s |
| TDP | 345 watts | 210 watts | 250 watts | 180 watts | 180 watts | 150 watts |
| Peak Compute | 13.7 TFLOPS | 10.5 TFLOPS | 11.3 TFLOPS | 8.2 TFLOPS | 7.8 TFLOPS | 5.7 TFLOPS |
The GTX 1000 series of products from NVIDIA has marked a consolidation in ASUS's GPU offerings. Instead of having both Strix and Matrix products available, the Strix has supplanted everything to be the most premium option from ASUS for any given GPU, and the Strix GTX 1080 Ti doesn't disappoint.
While it might not be the largest graphics card we've ever seen, the ASUS Strix GTX 1080 Ti is bigger in every dimension than both the NVIDIA Founders Edition card and the EVGA ICX option we took a look at earlier this year. Compared to the Founders Edition, the Strix GTX 1080 Ti is 1.23-in longer, 0.9-in taller, and takes up an extra PCIe slot in width.
How deep is your learning?
Recently, we've had some hands-on time with NVIDIA's new TITAN V graphics card. Equipped with the GV100 GPU, the TITAN V has shown us some impressive results in both gaming and GPGPU compute workloads.
However, one of the most interesting areas that NVIDIA has been touting for GV100 has been deep learning. With a 1.33x increase in single-precision FP32 compute over the Titan Xp, and the addition of specialized Tensor Cores for deep learning, the TITAN V is well positioned for deep learning workflows.
In mathematics, a tensor is a multi-dimensional array of numerical values with respect to a given basis. While we won't go deep into the math behind it, tensors are a crucial data structure for deep learning applications.
NVIDIA's Tensor Cores aim to accelerate tensor-based math by using half-precision FP16 math to process both dimensions of a tensor at the same time. The GV100 GPU contains 640 of these Tensor Cores to accelerate FP16 neural network training.
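Conceptually, each Tensor Core performs a small fused matrix multiply-accumulate, D = A x B + C, with FP16 inputs accumulated at higher precision. A simplified illustration on a 4x4 tile (the exact tile shapes and precision options are more varied in practice):

```python
import numpy as np

# One Tensor Core-style operation: D = A x B + C, FP16 inputs,
# accumulated in FP32. Hardware does this as a single fused operation.
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.zeros((4, 4), dtype=np.float32)

D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D)
```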
It's worth noting that this is not the first hardware built for tensor operations, with others such as Google developing processors for these specific functions.
| PC Perspective Deep Learning Testbed | |
|---|---|
| Processor | AMD Ryzen Threadripper 1920X |
| Motherboard | GIGABYTE X399 AORUS Gaming 7 |
| Memory | 64GB Corsair Vengeance RGB DDR4-3000 |
| Storage | Samsung SSD 960 Pro 2TB |
| Power Supply | Corsair AX1500i 1500 watt |
| OS | Ubuntu 16.04.3 LTS |
| Drivers | AMD: AMD GPU Pro 17.50 |
For our NVIDIA testing, we used the NVIDIA GPU Cloud 17.12 Docker containers for both TensorFlow and Caffe2 inside of our Ubuntu 16.04.3 host operating system.
For all tests, we are using the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) data set.
Looking Towards the Professionals
This is a multi-part story for the NVIDIA Titan V.
Earlier this week we dove into the new NVIDIA Titan V graphics card and looked at its performance from a gaming perspective. Our conclusions were more or less what we expected: the card was on average ~20% faster than the Titan Xp and about ~80% faster than the GeForce GTX 1080. But with that $3000 price tag, the Titan V isn't going to win over any enthusiasts.
What the Titan V is really meant for is the compute space: developers, coders, engineers, and professionals who use GPU hardware for research, for profit, or both. In that case, $2999 for the Titan V is simply an investment that needs to show value in select workloads. And though $3000 is still a lot of money, keep in mind that the NVIDIA Quadro GP100, the most recent part with full-performance double-precision compute from the Pascal chip, is still selling for well over $6000 today.
The Volta GV100 GPU offers 1:2 double precision performance, equating to 2560 FP64 cores. That is a HUGE leap over the GP102 GPU used in the Titan Xp, which has a 1:32 ratio, giving us the equivalent of just 120 FP64 cores.
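Those core counts, and the double-precision numbers in the table below, follow directly from the ratios: divide the FP32 core count by the ratio, then count two FLOPs (one FMA) per core per clock. A quick sketch, assuming GV100's 5120 CUDA cores and GP102's 3840:

```python
# FP64 throughput from the FP64:FP32 ratio. CUDA core counts assumed:
# 5120 for GV100 (Titan V), 3840 for GP102 (Titan Xp).
def fp64_tflops(fp32_cores, ratio, clock_mhz):
    return fp32_cores / ratio * 2 * clock_mhz / 1e6

print(fp64_tflops(5120, 2, 1455))   # Titan V boost:  ~7.45 TFLOPS
print(fp64_tflops(5120, 2, 1200))   # Titan V base:   ~6.14 TFLOPS
print(fp64_tflops(3840, 32, 1582))  # Titan Xp boost: ~0.38 TFLOPS
```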
| | Titan V | Titan Xp | GTX 1080 Ti | GTX 1080 | GTX 1070 Ti | GTX 1070 | RX Vega 64 Liquid | Vega Frontier Edition |
|---|---|---|---|---|---|---|---|---|
| Base Clock | 1200 MHz | 1480 MHz | 1480 MHz | 1607 MHz | 1607 MHz | 1506 MHz | 1406 MHz | 1382 MHz |
| Boost Clock | 1455 MHz | 1582 MHz | 1582 MHz | 1733 MHz | 1683 MHz | 1683 MHz | 1677 MHz | 1600 MHz |
| Memory Clock | 1700 MHz | 11400 MHz | 11000 MHz | 10000 MHz | 8000 MHz | 8000 MHz | 1890 MHz | 1890 MHz |
| Memory Interface | 3072-bit HBM2 | 384-bit G5X | 352-bit G5X | 256-bit G5X | 256-bit G5 | 256-bit G5 | 2048-bit HBM2 | 2048-bit HBM2 |
| Memory Bandwidth | 653 GB/s | 547 GB/s | 484 GB/s | 320 GB/s | 256 GB/s | 256 GB/s | 484 GB/s | 484 GB/s |
| TDP | 250 watts | 250 watts | 250 watts | 180 watts | 180 watts | 150 watts | 345 watts | 300 watts |
| Peak Compute | 12.2 (base) / 14.9 (boost) TFLOPS | 12.1 TFLOPS | 11.3 TFLOPS | 8.2 TFLOPS | 7.8 TFLOPS | 5.7 TFLOPS | 13.7 TFLOPS | 13.1 TFLOPS |
| Peak DP Compute | 6.1 (base) / 7.45 (boost) TFLOPS | 0.37 TFLOPS | 0.35 TFLOPS | 0.25 TFLOPS | 0.24 TFLOPS | 0.17 TFLOPS | 0.85 TFLOPS | 0.81 TFLOPS |
The current AMD Radeon RX Vega 64 and the Vega Frontier Edition both ship with a 1:16 FP64 ratio, giving us the equivalent of 256 DP cores per card.
Test Setup and Benchmarks
Our testing setup remains the same from our gaming tests, but obviously the software stack is quite different.
| PC Perspective GPU Testbed | |
|---|---|
| Processor | Intel Core i7-5960X Haswell-E |
| Motherboard | ASUS Rampage V Extreme X99 |
| Memory | G.Skill Ripjaws 16GB DDR4-3200 |
| Storage | OCZ Agility 4 256GB (OS) / Adata SP610 500GB (games) |
| Power Supply | Corsair AX1500i 1500 watt |
| OS | Windows 10 x64 |
Applications in use include:
- Cinebench R15
- Sisoft Sandra GPU Compute
- SPECviewperf 12.1
Let's not drag this out - I know you are hungry for results! (Thanks to Ken for running most of these tests for us!!)