Microsoft Announces Variable Rate Shading for DirectX 12

Subject: Graphics Cards | March 19, 2019 - 08:29 PM |
Tagged: microsoft, DirectX 12, turing, nvidia

To coincide with the Game Developers Conference (GDC 2019), Microsoft has announced Variable Rate Shading for DirectX 12. This feature increases performance by allowing the GPU to lower its shading resolution for specific parts of the scene (without the developer resorting to explicit, render-texture-based tricks).

An NVIDIA presentation from SIGGRAPH 2018 (last August)

The feature is divided into three parts:

  • Lowering the resolution of specific draw calls (tier 1)
  • Lowering the resolution within a draw call by using an image mask (tier 2)
  • Lowering the resolution within a draw call per primitive (ex: triangle) (tier 2)

The last two points are tagged as “tier 2” because they can reduce the workload within a single draw call, which is an item of work that is sent to the GPU. A typical draw call for a 3D engine is a set of triangles (vertices and indices) paired with a material (a shader program, textures, and properties). While it is sometimes useful to lower the resolution for particularly complex draw calls that take up a lot of screen space but whose output is also relatively low detail, such as water, there are real benefits to being more granular.
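
To make tier 1 concrete, here is a minimal sketch (my own illustration, not code from Microsoft's announcement) of per-draw shading rates using the D3D12 API surface that exposes this feature, ID3D12GraphicsCommandList5::RSSetShadingRate. It assumes a Windows SDK new enough to declare these interfaces and a device that reports at least tier 1 variable rate shading support.

```cpp
#include <d3d12.h>

// Sketch only: records per-draw (tier 1) shading rate changes on a command list.
// Assumes the device reported D3D12_VARIABLE_SHADING_RATE_TIER_1 or better via
// CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS6, ...).
void RecordSceneWithPerDrawShadingRates(ID3D12GraphicsCommandList5* cmdList)
{
    // Full-rate shading for detailed foreground geometry.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, nullptr);
    // ... record the character / prop draw calls here ...

    // Coarse shading: one shading result is reused across each 2x2 pixel block,
    // appropriate for large, low-detail draws such as a water plane.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_2X2, nullptr);
    // ... record the water / terrain draw calls here ...

    // Restore full-rate shading for whatever is drawn next.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, nullptr);
}
```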

The second part, an image mask, allows detail to be reduced for certain areas of the screen. This can be useful in several situations:

  • The edges of a VR field of view
  • Anywhere that will be brutalized by a blur or distortion effect
  • Objects behind some translucent overlays
  • Even negating a tier 1-optimized section to re-add quality where needed

That last example is the one Microsoft focused on in their blog post. Unfortunately, I am struggling to figure out what specifically is going on, because the changes that I see (ex: the coral reef, fish, and dirt) don’t line up with their red/blue visualizer. The claim is that they use an edge detection algorithm to force high-frequency shading where there would be high-frequency detail.
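
For the tier 2 image-mask path, the mask is a small screen-space texture of shading-rate values that gets bound alongside a pair of combiners. Below is a hedged sketch of that binding; producing the mask itself (for example from an edge-detection pass, as Microsoft describes) and transitioning it into the right resource state are assumed to have happened elsewhere.

```cpp
#include <d3d12.h>

// Sketch only: binds a screen-space shading rate image for tier 2 VRS.
// Assumptions: 'shadingRateImage' is an R8_UINT texture whose texels hold
// D3D12_SHADING_RATE values (one texel per screen tile; the tile size is reported
// in D3D12_FEATURE_DATA_D3D12_OPTIONS6::ShadingRateImageTileSize), and it has been
// transitioned to D3D12_RESOURCE_STATE_SHADING_RATE_SOURCE.
void BindShadingRateMask(ID3D12GraphicsCommandList5* cmdList,
                         ID3D12Resource* shadingRateImage)
{
    D3D12_SHADING_RATE_COMBINER combiners[2] = {
        D3D12_SHADING_RATE_COMBINER_PASSTHROUGH, // keep the per-draw base rate
        D3D12_SHADING_RATE_COMBINER_OVERRIDE     // ...then let the image mask win
    };
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_1X1, combiners);
    cmdList->RSSetShadingRateImage(shadingRateImage);
}
```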

microsoft-2019-dx12-vrs-1.png

Right side reduces shading by 75% for terrain and water

microsoft-2019-dx12-vrs-2.png

Right side reclaims some lost fidelity based on edge detection algorithm

microsoft-2019-dx12-vrs-3.png

Visualization of where shading complexity is spent.

(Red is per-pixel. Blue is 1 shade per 2x2 block.)

Images Source: Firaxis via Microsoft

Microsoft claims that this feature will only be available for DirectX 12. That said, when Turing launched, NVIDIA claimed that Variable Rate Shading would be available for DirectX 11, DirectX 12, Vulkan, and OpenGL. I’m not sure what is different about Microsoft’s implementation that separates it from NVIDIA’s vendor extension.

Microsoft will have good tools support, however. They claim that their PIX for Windows performance analysis tool will support this feature on Day 1.

Source: Microsoft

NVIDIA is ready to storm the server room

Subject: General Tech, Graphics Cards, Networking, Shows and Expos | March 19, 2019 - 06:16 PM |
Tagged: nvidia, t4, amazon, microsoft, NGC, Mellanox, CUDA-X, GTC, jen-hsun huang, DRIVE Constellation, ai

As part of their long list of announcements yesterday, NVIDIA revealed they are partnering with Cisco, Dell EMC, Fujitsu, Hewlett Packard Enterprise, Inspur, Lenovo and Sugon to provide servers powered by T4 Tensor Core GPUs, optimized to run their CUDA-X AI acceleration libraries.

t4.PNG

Those T4 GPUs have been on the market for a while, but this marks the first major success for NVIDIA in the server room, with models available for purchase from the aforementioned companies.  Those who prefer other people's servers can also benefit from these new products, with Amazon and Microsoft offering cloud-based solutions.  Setting yourself up to run NVIDIA's NGC software may save a lot of money down the road, as the cards sip a mere 70W of power, which is rather more attractive than the consumption of a gaggle of Tesla V100s.  One might be guilty of suspecting this offers an explanation for their recent acquisition of Mellanox.

purty.PNG

NGC software offers more than just a platform to run optimizations on; it also offers a range of templates to start from, covering classification and object detection through sentiment analysis and most of the other common starting points for training a model.  Customers will also be able to upload their own models to share internally or, if in the mood, externally with other users and companies.  It supports existing frameworks such as TensorFlow and PyTorch, but also offers access to CUDA-X AI, which, as the name suggests, takes advantage of the base design of the T4 GPU to reduce the time spent waiting for results, letting users advance designs more quickly.

memorex.PNG

If you are curious what particular implementations of everyone's favourite buzzword might look like, NVIDIA's DRIVE Constellation is an example after JoshTekk's own heart: literally a way to create open, scalable simulations for large fleets of self-driving cars to train them ... for good, one hopes.  Currently the Toyota Research Institute-Advanced Development utilizes these products in the development of its next self-driving fleet, and NVIDIA obviously hopes others will follow suit.

replaced.PNG

There is not much to see from the perspective of a gamer in the short term, but considering NVIDIA's work at shifting the horsepower from the silicon you own to their own cloud, this will certainly impact the future of gaming from both a hardware and a gameplay perspective.  GPUs as a Service may not be the future many of us want, but this suggests it could be possible, not to mention the dirty tricks enemy AIs will be able to pull with this processing power behind them.

One might even dream that escort missions could become less of a traumatic experience!

Source: NVIDIA

GTC 19: NVIDIA Announces GameWorks RTX; Unreal Engine and Unity Gain DXR Support

Subject: General Tech | March 18, 2019 - 10:00 PM |
Tagged: gameworks, unreal engine, Unity, rtx, ray tracing, nvidia, GTC 19, GTC, dxr, developers

Today at GTC NVIDIA announced GameWorks RTX and the implementation of real-time ray tracing in the upcoming Unreal Engine 4.22 and the latest version of Unity, currently in 2019.03.

NVIDIA Announces GameWorks RTX

While Pascal and non-RTX Turing support for real-time ray tracing is something of a bombshell from NVIDIA, the creation of GameWorks tools for such effects is not surprising.

“NVIDIA GameWorks RTX is a comprehensive set of tools that help developers implement real time ray-traced effects in games. GameWorks RTX is available to the developer community in open source form under the GameWorks license and includes plugins for Unreal Engine 4.22 and Unity’s 2019.03 preview release.”

NVIDIA lists these components of GameWorks RTX:

GW_RTX.PNG

  • RTX Denoiser SDK – a library that enables fast, real-time ray tracing by providing denoising techniques to lower the required ray count and samples per pixel. It includes algorithms for ray traced area light shadows, glossy reflections, ambient occlusion and diffuse global illumination.
  • Nsight for RT – a standalone developer tool that saves developers time by helping to debug and profile graphics applications built with DXR and other supported APIs.

Unreal Engine and Unity Gaining Real-Time Ray Tracing Support

DXR_GAME_ENGINES.png

And while not specific to NVIDIA hardware, news of more game engines offering integrated DXR support was also announced during the keynote:

“Unreal Engine 4.22 is available in preview now, with final release details expected in Epic’s GDC keynote on Wednesday. Starting on April 4, Unity will offer optimized, production-focused, realtime ray tracing support with a custom experimental build available on GitHub to all users with full preview access in the 2019.03 Unity release. Real-time ray tracing support from other first-party AAA game engines includes DICE/EA’s Frostbite Engine, Remedy Entertainment’s Northlight Engine and engines from Crystal Dynamics, Kingsoft, Netease and others.”

RTX may have been off to a slow start, but this will apparently be the year of real-time ray tracing after all - especially with the upcoming NVIDIA driver update adding support to the GTX 10-series and new GTX 16-series GPUs.

Source: NVIDIA

NVIDIA to Add Real-Time Ray Tracing Support to Pascal GPUs via April Driver Update

Subject: Graphics Cards | March 18, 2019 - 09:41 PM |
Tagged: unreal engine, Unity, turing, rtx, ray tracing, pascal, nvidia, geforce, GTC 19, GTC, gaming, developers

Today at GTC NVIDIA announced a few things of particular interest to gamers, including GameWorks RTX and the implementation of real-time ray tracing in upcoming versions of both Unreal Engine and Unity (we already posted the news that CRYENGINE will be supporting real-time ray tracing as well). But there is something else... NVIDIA is bringing ray tracing support to GeForce GTX graphics cards.

DXR_GPUs.png

This surprising turn means that ray tracing support won’t be limited to RTX cards after all, as the install base of NVIDIA ray-tracing GPUs “grows to tens of millions” with a simple driver update next month, adding the feature to both previous-gen Pascal and the new Turing GTX GPUs.

How is this possible? It’s all about the programmable shaders:

“NVIDIA GeForce GTX GPUs powered by Pascal and Turing architectures will be able to take advantage of ray tracing-supported games via a driver expected in April. The new driver will enable tens of millions of GPUs for games that support real-time ray tracing, accelerating the growth of the technology and giving game developers a massive installed base.

With this driver, GeForce GTX GPUs will execute ray traced effects on shader cores. Game performance will vary based on the ray-traced effects and on the number of rays cast in the game, along with GPU model and game resolution. Games that support the Microsoft DXR and Vulkan APIs are all supported.

However, GeForce RTX GPUs, which have dedicated ray tracing cores built directly into the GPU, deliver the ultimate ray tracing experience. They provide up to 2-3x faster ray tracing performance with a more visually immersive gaming environment than GPUs without dedicated ray tracing cores.”

A very important caveat is that “2-3x faster ray tracing performance” figure for GeForce RTX graphics cards mentioned in the last paragraph, so expectations will need to be tempered: RT effects will be less efficient running on shader cores (Pascal and GTX Turing) than they are with dedicated RT cores, as demonstrated by these charts:

BFV_CHART.png

METRO_EXODUS_CHART.png

SOTTR_CHART.png
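
Whichever hardware path ends up tracing the rays, games reach it through the same DXR interface, so once the April driver lands an application should be able to detect support with the standard D3D12 feature query. Here is a minimal sketch of that check (my own, not NVIDIA sample code):

```cpp
#include <windows.h>
#include <d3d12.h>

// Sketch only: asks a D3D12 device whether DXR is exposed at all. With the April
// driver, GTX 10-series and GTX 16-series cards should report at least tier 1.0
// here even though the rays are executed on shader cores rather than RT cores.
bool SupportsDXR(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                           &options5, sizeof(options5))))
    {
        return false;
    }
    return options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_0;
}
```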

It's going to be a busy April.

Source: NVIDIA
Manufacturer: PC Perspective

AMD and NVIDIA GPUs Tested

Tom Clancy’s The Division 2 launched over the weekend and we've been testing it out over the past couple of days with a collection of currently-available graphics cards. Of interest to AMD fans, this game joins the ranks of those well optimized for Radeon graphics, and with a new driver (Radeon Software Adrenalin 2019 Edition 19.3.2) released over the weekend it was a good time to run some benchmarks and see how some AMD and NVIDIA hardware stacks up.

d2-key-art-1920x600.jpg

The Division 2 offers DirectX 11 and 12 support, and uses Ubisoft's Snowdrop engine to provide some impressive visuals, particularly at the highest detail settings. We found the "ultra" preset to be quite attainable with very playable frame rates from most midrange-and-above hardware even at 2560x1440, though bear in mind that this game uses quite a bit of video memory. We hit a performance ceiling at 4GB with the "ultra" preset even at 1080p, so we opted for 6GB+ graphics cards for our final testing. And while most of our testing was done at 1440p we did test a selection of cards at 1080p and 4K, just to provide a look at how the GPUs on test scaled when facing different workloads.

Tom Clancy's The Division 2

d2-screen1-1260x709.jpg

Washington D.C. is on the brink of collapse. Lawlessness and instability threaten our society, and rumors of a coup in the capitol are only amplifying the chaos. All active Division agents are desperately needed to save the city before it's too late.

d2-screen4-1260x709.jpg

Developed by Ubisoft Massive and the same teams that brought you Tom Clancy’s The Division, Tom Clancy’s The Division 2 is an online open world, action shooter RPG experience set in a collapsing and fractured Washington, D.C. This rich new setting combines a wide variety of beautiful, iconic, and realistic environments where the player will experience the series’ trademark for authenticity in world building, rich RPG systems, and fast-paced action like never before.

d2-screen3-1260x709.jpg

Play solo or co-op with a team of up to four players to complete a wide range of activities, from the main campaign and adversarial PvP matches to the Dark Zone – where anything can happen.

Continue reading our preview of GPU performance with The Division 2

Remember Conservative Morphological Anti-Aliasing?

Subject: Graphics Cards | March 18, 2019 - 03:13 PM |
Tagged: fxaa, SMAA, Anti-aliasing, MLAA, taa, amd, nvidia

Apart from the new DLSS available on NVIDIA's RTX cards, it has been a very long time since we looked at anti-aliasing implementations and the effect your choice has on performance and visual quality.  You are likely familiar with the four most common implementations, from AMD's MLAA and NVIDIA's FXAA, which are not often used in newer games, through to TAA/TXAA and SMAA, but when was the last time you refreshed your memory on what they actually do and how they compare?

Not only does Overclockers Club look into those, they also discuss some of the other attempted implementations, as well as the sampling types that lie behind these technologies.  Check out their deep dive here.

anti.PNG

"One setting present in many if not all modern PC games that can dramatically impact performance and quality is anti-aliasing and, to be honest, I never really understood how it works. Sure we have the general idea that super-sampling is in effect running at a higher resolution and then downscaling, but then what is multi-sampling? How do post-processing methods work, like the very common FXAA and often favored SMAA?"

Here are some more Graphics Card articles from around the web:

Graphics Cards

Crytek's Neon Noir is a Platform Agnostic Real-Time Ray Tracing Demo

Subject: General Tech | March 18, 2019 - 09:03 AM |
Tagged: vulkan, RX Vega 56, rtx, ray tracing, radeon, nvidia, Neon Noir, dx12, demo, crytek, CRYENGINE, amd

Crytek has released video of a new demo called Neon Noir, showcasing real-time ray tracing with a new version of CRYENGINE Total Illumination, slated for release in 2019. The big story here is that it is platform agnostic, meaning both AMD and NVIDIA (including non-RTX) graphics cards can produce the real-time lighting effects. The video was rendered in real time using an AMD Radeon RX Vega 56 (!) at 4K and 30 FPS, with Crytek's choice of GPU seeming to assuage fears of any meaningful performance penalty with this feature enabled (video embedded below):

“Neon Noir follows the journey of a police drone investigating a crime scene. As the drone descends into the streets of a futuristic city, illuminated by neon lights, we see its reflection accurately displayed in the windows it passes by, or scattered across the shards of a broken mirror while it emits a red and blue lighting routine that will bounce off the different surfaces utilizing CRYENGINE's advanced Total Illumination feature. Demonstrating further how ray tracing can deliver a lifelike environment, neon lights are reflected in the puddles below them, street lights flicker on wet surfaces, and windows reflect the scene opposite them accurately.”

Crytek is calling the new ray tracing features “experimental” at this time, but the implications of ray tracing tech that is tied to neither proprietary hardware nor a single graphics API (it works with both DirectX 12 and Vulkan) are obviously a very big deal.

crytek_demo.png

“Neon Noir was developed on a bespoke version of CRYENGINE 5.5., and the experimental ray tracing feature based on CRYENGINE’s Total Illumination used to create the demo is both API and hardware agnostic, enabling ray tracing to run on most mainstream, contemporary AMD and NVIDIA GPUs. However, the future integration of this new CRYENGINE technology will be optimized to benefit from performance enhancements delivered by the latest generation of graphics cards and supported APIs like Vulkan and DX12.”

You can read the full announcement from Crytek here.

Source: Crytek

Need a new NVIDIA GPU but don't want to get Ti'd down in debt?

Subject: Graphics Cards | March 14, 2019 - 01:33 PM |
Tagged: video card, turing, rtx, nvidia, gtx 1660 ti, gtx 1660, gtx 1060, graphics card, geforce, GDDR5, gaming, 6Gb

Sebastian has given you a look at the triple-slot EVGA GTX 1660 XC Black as well as the dual-fan, dual-slot MSI GTX 1660 GAMING X, both doing well in benchmarks, especially when overclocked.  The new GTX 1660 does come in other shapes and sizes, like the dual-slot, single-fan GTX 1660 StormX OC 6G from Palit which The Guru of 3D reviewed.  Do not underestimate it because of its diminutive size: the Boost Clock is 1830MHz out of the box, with some tweaking it will sit around 2070MHz, and the GDDR5 can be pushed up to 9800MHz.

Check out even more models below.

img_7957.jpg

"We review a GeForce GTX 1660 that is priced spot on that 219 USD marker, the MSRP of the new non-Ti model, meet the petite Palit GeForce GTX 1660 StormX OC edition. Based on a big single fan and a small form factor you should not be fooled by its looks. It performs well on all fronts, including cooling acoustic levels."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: Guru of 3D
Manufacturer: NVIDIA

Turing at $219

NVIDIA has introduced another midrange GPU with today’s launch of the GTX 1660. It joins the GTX 1660 Ti as the company’s answer to high frame rate 1080p gaming, and hits a more aggressive $219 price point, with the GTX 1660 Ti starting at $279. What has changed, and how close is this 1660 to the “Ti” version launched just last month? We find out here.

GTX_1660_cards.jpg

RTX and Back Again

We are witnessing a shift in branding from NVIDIA, as GTX was supplanted by RTX with the introduction of the 20 series, only to see “RTX” give way to GTX as we moved down the product stack beginning with the GTX 1660 Ti. This has been a potentially confusing change for consumers used to the annual uptick in series number. Most recently we saw the 900 series move logically to 1000 series (aka 10 series) cards, so when the first 2000 series cards were released it seemed as if the 20 series would be a direct successor to the GTX cards of the previous generation.

But RTX ended up being more of a feature-level designation, and not so much a new branding for GeForce cards as we had anticipated. No, GTX is here to stay, it appears, and what then of the RTX cards and their real-time ray tracing capabilities? Here the conversation changes to focus on higher price tags and the viability of early adoption of ray tracing tech, and enter the internet of outspoken individuals who decry ray tracing, and more so DLSS: NVIDIA’s proprietary deep learning secret sauce that has seemingly become as controversial as the Genesis planet in Star Trek III.

|                   | GTX 1660  | GTX 1660 Ti | RTX 2060  | RTX 2070  | GTX 1080   | GTX 1070  | GTX 1060 6GB |
| ----------------- | --------- | ----------- | --------- | --------- | ---------- | --------- | ------------ |
| GPU               | TU116     | TU116       | TU106     | TU106     | GP104      | GP104     | GP106        |
| Architecture      | Turing    | Turing      | Turing    | Turing    | Pascal     | Pascal    | Pascal       |
| SMs               | 22        | 24          | 30        | 36        | 20         | 15        | 10           |
| CUDA Cores        | 1408      | 1536        | 1920      | 2304      | 2560       | 1920      | 1280         |
| Tensor Cores      | N/A       | N/A         | 240       | 288       | N/A        | N/A       | N/A          |
| RT Cores          | N/A       | N/A         | 30        | 36        | N/A        | N/A       | N/A          |
| Base Clock        | 1530 MHz  | 1500 MHz    | 1365 MHz  | 1410 MHz  | 1607 MHz   | 1506 MHz  | 1506 MHz     |
| Boost Clock       | 1785 MHz  | 1770 MHz    | 1680 MHz  | 1620 MHz  | 1733 MHz   | 1683 MHz  | 1708 MHz     |
| Texture Units     | 88        | 96          | 120       | 144       | 160        | 120       | 80           |
| ROPs              | 48        | 48          | 48        | 64        | 64         | 64        | 48           |
| Memory            | 6GB GDDR5 | 6GB GDDR6   | 6GB GDDR6 | 8GB GDDR6 | 8GB GDDR5X | 8GB GDDR5 | 6GB GDDR5    |
| Memory Data Rate  | 8 Gbps    | 12 Gbps     | 14 Gbps   | 14 Gbps   | 10 Gbps    | 8 Gbps    | 8 Gbps       |
| Memory Interface  | 192-bit   | 192-bit     | 192-bit   | 256-bit   | 256-bit    | 256-bit   | 192-bit      |
| Memory Bandwidth  | 192.1 GB/s | 288.1 GB/s | 336.1 GB/s | 448.0 GB/s | 320.3 GB/s | 256.3 GB/s | 192.2 GB/s |
| Transistor Count  | 6.6B      | 6.6B        | 10.8B     | 10.8B     | 7.2B       | 7.2B      | 4.4B         |
| Die Size          | 284 mm²   | 284 mm²     | 445 mm²   | 445 mm²   | 314 mm²    | 314 mm²   | 200 mm²      |
| Process Tech      | 12 nm     | 12 nm       | 12 nm     | 12 nm     | 16 nm      | 16 nm     | 16 nm        |
| TDP               | 120W      | 120W        | 160W      | 175W      | 180W       | 150W      | 120W         |
| Launch Price      | $219      | $279        | $349      | $499      | $599       | $379      | $299         |

So what is a GTX 1660 minus the “Ti”? A hybrid product of sorts, it turns out. The card is based on the same TU116 GPU as the GTX 1660 Ti, and while the Ti features the full version of TU116, this non-Ti version has two of the SMs disabled, bringing the count from 24 to 22. This results in a total of 1408 CUDA cores - down from 1536 with the GTX 1660 Ti. This 128-core drop is not as large as I was expecting from the vanilla 1660, and with the same memory specs the capabilities of this card would not fall far behind - but this card uses the older GDDR5 standard, matching the 8 Gbps speed and 192 GB/s bandwidth of the outgoing GTX 1060, and not the 12 Gbps GDDR6 and 288.1 GB/s bandwidth of the GTX 1660 Ti.
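
As a quick sanity check on those memory figures (my arithmetic, not NVIDIA's), peak bandwidth is simply the per-pin data rate multiplied by the bus width:

  GTX 1660:    8 Gbps × 192 bits ÷ 8 bits per byte ≈ 192 GB/s (GDDR5)
  GTX 1660 Ti: 12 Gbps × 192 bits ÷ 8 bits per byte ≈ 288 GB/s (GDDR6)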

Continue reading our review of the NVIDIA GeForce GTX 1660 graphics card

NVIDIA Acquires Mellanox: Beyond the Numbers

Subject: Editorial | March 12, 2019 - 10:14 PM |
Tagged: nvswitch, nvlink, nvidia, Mellanox, Intel, Infiniband, Ethernet, communications, chiplets, amd

In a bit of a surprise this past weekend, NVIDIA announced that it is purchasing the networking company Mellanox for approximately $6.9 billion US. NVIDIA and Intel were engaged in a bidding war for the Israel-based company. At first glance we do not see the synergies that could potentially come from such an acquisition, but digging deeper, it makes much more sense. This is still a risky move for NVIDIA, as their previous acquisitions have not been very favorable for the company (Ageia, Icera, etc.).

633889_NVLogo_3D_H_DarkType.jpg

Mellanox’s portfolio centers around datacenter connectivity solutions such as high speed Ethernet and InfiniBand products. They are already a successful company with products shipping out the door. If there is a supercomputer somewhere, chances are it is running Mellanox technology for its high speed interconnects. This is where things get interesting for NVIDIA.

While NVIDIA focuses on GPUs, they are spreading into the datacenter at a pretty tremendous rate. Their NVLink implementation allows high speed connectivity between GPUs, and recently they showed off their NVSwitch, which features 18 ports. We do not know how long it took to design the NVSwitch and get it running at a high level, but NVIDIA is aiming for implementations that will exceed that technology. NVIDIA had the choice to continue with in-house designs or to purchase a company already well versed in such work, with access to advanced networking technology.

Intel was also in play for Mellanox, but that particular transaction might not have been approved by anti-trust authorities around the world. If Intel had made an aggressive bid for Mellanox, it would have essentially consolidated the market for these high end networking products. In the end NVIDIA offered $6.9B US for the company and it was accepted. Because NVIDIA has no real networking solutions on the market, the deal will likely be approved without issue. Unlike purchases such as Icera, Mellanox is actively shipping product and will add to NVIDIA's bottom line.

mellanox-logo-square-blue.jpg

The company was able to purchase Mellanox in an all-cash transaction, simply dipping into their cash reserves instead of offering Mellanox shareholders equivalent shares in NVIDIA. This $6.9B is above what AMD paid for ATI back in 2006 ($5.4B). There may be some similarities here in that the price paid for Mellanox could prove to be more than what the company actually brings to the table, and we may see write-downs over the next several years, much as AMD took for the ATI purchase.

The purchase will bring NVIDIA instant expertise with high performance standards like InfiniBand. It will also help to have design teams versed in high speed, large-node networking apply their knowledge to the GPU field and create solutions better suited to the technology. NVIDIA will also continue to sell current Mellanox products.

Another purchase in the past that looks somewhat similar to this is AMD’s acquisition of SeaMicro. That company was selling products based on their Freedom Fabric technology to create ultra-dense servers utilizing dozens of CPUs. This line of products was discontinued by AMD after poor sales, but they expanded upon Freedom Fabric and created the Infinity Fabric that powers their latest Zen CPUs.

I can see a very similar situation occurring at NVIDIA. AMD is using its Infinity Fabric to connect multiple chiplets on a substrate, as well as utilizing that fabric off of the substrate, and has also integrated it into their latest Vega GPUs. This philosophy looks to pay significant dividends for AMD once they introduce their 7nm CPUs in the form of Zen 2 and EPYC 2. AMD is not relying on large, monolithic dies for their consumer and enterprise parts, thereby improving yields and bins on these parts as compared to what Intel does with current Xeon parts.

mellanox-quantum-connectx-6-chips-652x381.jpg

When looking at the Mellanox purchase from this angle, it makes a lot of sense for NVIDIA. With process node advances moving at a much slower pace, the demand for higher performance solutions is only increasing. To meet this demand, NVIDIA will be required to build efficient, multi-chip solutions that may require more performance and features than NVLink can cover. Mellanox could potentially provide the expertise and experience to help NVIDIA achieve that kind of scale.

Source: NVIDIA