GTC 2018: Nvidia and ARM Integrating NVDLA Into Project Trillium For Inferencing at the Edge

Subject: General Tech | March 29, 2018 - 03:10 PM |
Tagged: project trillium, nvidia, machine learning, iot, GTC 2018, GTC, deep learning, arm, ai

During GTC 2018, NVIDIA and ARM announced a partnership that will see ARM integrate NVIDIA's NVDLA deep learning inferencing accelerator into the company's Project Trillium machine learning processors. The NVIDIA Deep Learning Accelerator (NVDLA) is an open source modular architecture specifically optimized for inferencing operations such as object and voice recognition. Bringing that acceleration to the wider ARM ecosystem through Project Trillium will enable a massive number of smartphones, tablets, Internet-of-Things, and embedded devices to do inferencing at the edge, which is to say without the complexity and latency of relying on cloud processing. This means potentially smarter voice assistants (e.g. Alexa, Google Assistant), doorbell cameras, lighting, and security around the home, plus better AR, natural translation, and assistive technologies on your phone while out and about.

NVIDIAandARM_NVDLA.jpg

Karl Freund, lead analyst for deep learning at Moor Insights & Strategy, was quoted in the press release as stating:

“This is a win/win for IoT, mobile and embedded chip companies looking to design accelerated AI inferencing solutions. NVIDIA is the clear leader in ML training and Arm is the leader in IoT end points, so it makes a lot of sense for them to partner on IP.”

ARM's Project Trillium was announced back in February and is a suite of processor IP optimized for parallel, low latency workloads; it includes a Machine Learning processor, an Object Detection processor, and neural network software libraries. NVDLA is the hardware and software platform NVIDIA first built into its Xavier SoC. The hardware is highly modular and configurable and can feature a convolution core, single data processor, planar data processor, channel data processor, and data reshape engines. An NVDLA implementation can include all or only some of those elements, and chip designers can scale each one up or down independently depending on how much processing acceleration their devices need. NVDLA connects to the main system processor over a control interface and through two AXI memory interfaces (one optional) that connect to system memory and, optionally, a dedicated high bandwidth memory (not necessarily HBM; it could be its own SRAM, for example).
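To make that modularity a bit more concrete, here is a purely illustrative sketch of how a chip designer might describe two different NVDLA builds before generating hardware. The field names and values are invented for this example and are not the actual NVDLA spec-file format; the point is simply that each engine can be included or omitted and sized independently.

```python
# Hypothetical NVDLA-style configuration sketch (illustrative only; not the real spec format).

# Each engine in the accelerator can be enabled/disabled and scaled independently.
small_iot_config = {
    "convolution_core": {"enabled": True, "mac_units": 64},      # core conv engine, scaled down
    "single_data_processor": {"enabled": True},                  # activation functions
    "planar_data_processor": {"enabled": True},                  # pooling
    "channel_data_processor": {"enabled": False},                # normalization, omitted to save area
    "data_reshape_engine": {"enabled": False},                   # tensor reshape, omitted
    "secondary_axi_interface": {"enabled": False},               # no dedicated SRAM, system DRAM only
}

large_embedded_config = {
    "convolution_core": {"enabled": True, "mac_units": 2048},
    "single_data_processor": {"enabled": True},
    "planar_data_processor": {"enabled": True},
    "channel_data_processor": {"enabled": True},
    "data_reshape_engine": {"enabled": True},
    "secondary_axi_interface": {"enabled": True},                # optional second AXI port to local SRAM
}

def enabled_engines(config):
    """List which engines a given configuration actually instantiates."""
    return [name for name, block in config.items() if block["enabled"]]

print("Small IoT build:", enabled_engines(small_iot_config))
print("Large embedded build:", enabled_engines(large_embedded_config))
```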

arm project trillium integrates NVDLA.jpg

NVDLA is presented as a free and open source architecture that promotes a standard way to design deep learning inferencing hardware that can accelerate inferring results from trained neural networks (with the training being done on other devices, perhaps a DGX-2). The project, which hosts its code on GitHub and encourages community contributions, goes beyond the Xavier-based hardware and includes things like drivers, libraries, upcoming TensorRT support for accelerating trained networks from frameworks such as Google's TensorFlow, testing suites and SDKs, a deep learning training infrastructure (for the training side of things) that is compatible with the NVDLA software and hardware, and system integration support.

Bringing the "smarts" of smart devices to the local hardware and closer to the users should mean much better performance, and using specialized accelerators will reportedly deliver the performance levels needed without blowing through tight power budgets. Internet-of-Things (IoT) and mobile devices are not going away any time soon, and the partnership between NVIDIA and ARM should make it easier for developers and chip companies to offer smarter (and please tell me more secure!) smart devices.


Source: NVIDIA

Podcast #493 - New XPS 13, Noctua NH-L9a, News from NVIDIA GTC and more!

Subject: General Tech | March 29, 2018 - 02:37 PM |
Tagged: podcast, nvidia, GTC 2018, Volta, quadro gv100, dgx-2, noctua, NH-L9a-AM4

PC Perspective Podcast #493 - 03/29/18

Join us this week for our review of the new XPS 13, Noctua NH-L9a, news from NVIDIA GTC and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts: Allyn Malventano, Jeremy Hellstrom, Josh Walrath

Peanut Gallery: Ken Addison

Program length: 0:59:35

Podcast topics of discussion:

  1. Week in Review:
  2. News items of interest:
  3. Picks of the Week:
    1. Allyn: retro game music remixed - ocremix.org (torrents)

How to see Hope County, Montana in all its glory

Subject: General Tech | March 28, 2018 - 02:08 PM |
Tagged: gaming, amd, nvidia, far cry 5

Looking to get Far Cry 5 running with the highest settings your GPU can handle? The Guru of 3D have done a lot of the heavy lifting for you, testing the performance of thirteen cards each from NVIDIA and AMD at 1080p, 1440p and 4K resolutions. This game needs some juice; even the mighty Titans cannot reach 60fps with Ultra settings at 4K. In the review they also take a look at the effect that CPU core count and frequency have on performance - not much, but enough that you might notice. Check the full details here.

bull_gold_1080p_1495792038.jpg

"Tomorrow Far Cry 5 will become available to the masses, we put it through our testing paces with almost 30 graphics cards, CPU performance and frame times. The looks great, and will offer great game play. Join us in this PC performance analysis."

Here is some more Tech News from around the web:

Tech Talk

 

Source: Guru of 3D

NVIDIA Announces DGX-2 with 16 GV100s & 8 100Gb NICs

Subject: Systems | March 27, 2018 - 08:04 PM |
Tagged: Volta, nvidia, dgx-2, DGX

So… this is probably not for your home.

NVIDIA has just announced their latest pre-built system for enterprise customers: the DGX-2. In it, sixteen Volta-based Tesla V100 graphics devices are connected using NVSwitch. This allows every group of graphics cards to communicate with every other group at 300GB/s, which, to give a sense of scale, is about as much bandwidth as the GTX 1080 has available to communicate with its own VRAM. NVSwitch treats all 512GB as a unified memory space, too, which means that the developer doesn't need redundant copies across multiple boards just so the data can be seen by the target GPU.

nvidia-2018-dgx2-explode.png

Note: 512GB is 16 x 32GB. This is not a typo. 32GB Tesla V100s are now available.

For a little recap, Tesla V100 cards run a Volta-based GV100 GPU, which has 5120 CUDA cores and delivers ~15 TeraFLOPs of 32-bit performance. FP64 and FP16 rates scale at exact ratios of that figure (half and double, respectively), as has been the case since Pascal's high-end offering, leading to ~7.5 TeraFLOPs of 64-bit or ~30 TeraFLOPs of 16-bit computational throughput. Multiply that by sixteen and you get 480 TeraFLOPs of FP16, 240 TeraFLOPs of FP32, or 120 TeraFLOPs of FP64 performance for the whole system. If you count the tensor units, then we're just under 2 PetaFLOPs of tensor instructions. This is powered by a pair of Xeon Platinum CPUs (Skylake) and backed by 1.5TB of system RAM – which is only 3x the amount of RAM that the GPUs have, if you stop and think about it.
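For those who like to check the math, the quoted totals follow directly from the per-card figures above. The per-card tensor number is an assumption here (NVIDIA's published V100 tensor spec is in the 120-125 TeraFLOPs range); ~120 is used below because it matches the article's "just under 2 PetaFLOPs" system total.

```python
# Back-of-the-envelope check of the DGX-2 totals quoted above.
num_gpus = 16

# Per Tesla V100 figures from the article (tensor figure assumed, see note above).
fp32_tflops = 15.0
fp64_tflops = 7.5
fp16_tflops = 30.0
tensor_tflops = 120.0
vram_gb = 32

print(f"FP32 total:   {num_gpus * fp32_tflops:.0f} TFLOPs")              # 240
print(f"FP64 total:   {num_gpus * fp64_tflops:.0f} TFLOPs")              # 120
print(f"FP16 total:   {num_gpus * fp16_tflops:.0f} TFLOPs")              # 480
print(f"Tensor total: {num_gpus * tensor_tflops / 1000:.2f} PFLOPs")     # 1.92, 'just under 2'
print(f"GPU memory:   {num_gpus * vram_gb} GB")                          # 512
print(f"System RAM / GPU RAM: {1536 / (num_gpus * vram_gb):.0f}x")       # 3x
```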

nvidia-2018-dgx-list.png

The device communicates with the outside world through eight EDR InfiniBand NICs. NVIDIA claims that this yields 1,600 gigabits per second of bi-directional bandwidth (eight NICs × 100Gb/s × two directions). Given how much data this device is crunching, it makes sense to keep data flowing in and out as fast as possible, especially for real-time applications. While the Xeons are fast and have many cores, I'm curious to see how much overhead the networking adds to the system when under full load, minus any actual processing.

NVIDIA’s DGX-2 is expected to ship in Q3.

Source: NVIDIA

GTC 2018: NVIDIA Announces Volta-Powered Quadro GV100

Subject: General Tech | March 27, 2018 - 03:30 PM |
Tagged: nvidia, GTC, quadro, gv100, GP100, tesla, titan v, v100, volta

One of the big missing markets for NVIDIA with their slow rollout of the Volta architecture was professional workstations. Today, NVIDIA announced they are bringing Volta to the Quadro family with the Quadro GV100 card.

27-gv100-gpu.jpg

Powered by the same GV100 GPU that was introduced at last year's GTC in the Tesla V100, and again late last year in the TITAN V, the Quadro GV100 represents a leap forward in computing power for workstation-level applications. While these users could currently be using the TITAN V for similar workloads, as we've seen in the past, Quadro drivers generally provide big performance advantages in these sorts of applications. Still, we'd love to see NVIDIA repeat their move of bringing these optimizations to the TITAN lineup, as they did with the TITAN Xp.

As it is a Quadro, we would expect this to be NVIDIA's first Volta-powered product to provide certified, professional driver code paths for applications such as CATIA, Solid Edge, and more.

quadro-gv100.png

NVIDIA also heavily promoted the idea of using two of these GV100 cards in one system, utilizing NVLink. Considering the lack of NVLink support on the TITAN V, this is also the first time we've seen a Volta card with display outputs support NVLink in a more standard workstation.

More importantly, this announcement brings NVIDIA's RTX technology to the professional graphics market. 

With popular rendering applications like V-Ray already announcing and integrating support for NVIDIA's OptiX ray tracing denoiser in their beta branch, it seems only a matter of time before we'll see a broad suite of professional applications supporting RTX technology in real time – for example, raytraced renders of items being designed in CAD and modeling applications.

This sort of speed represents a potentially massive win for professional users, who won't have to waste time waiting for preview renderings to complete before continuing to iterate on their projects.

The NVIDIA Quadro GV100 is available now directly from NVIDIA for a price of $8,999, which puts it squarely in the same price range as the previous highest-end Quadro, the GP100.

Source: NVIDIA

GDC 2018: NVIDIA Adds new Ansel and Highlights features to GeForce Experience

Subject: Graphics Cards | March 21, 2018 - 09:37 PM |
Tagged: GDC, GDC 2018, nvidia, geforce experience, ansel, nvidia highlights, call of duty wwii, fortnite, pubg, tekken 7

Building upon the momentum of being included in the two most popular PC games in the world, PlayerUnknown's Battlegrounds and Fortnite, NVIDIA Highlights (previously known as ShadowPlay Highlights) is expanding to even more titles. Support for Call of Duty: WWII and Tekken 7 is available now, with Dying Light: Bad Blood and Escape from Tarkov coming soon.

For those unfamiliar with NVIDIA Highlights, it's a feature that, when integrated into a game, allows for the triggering of automatic screen recording when specific events happen. For example, think of the kill cam in Call of Duty. When enabled, Highlights will save a recording whenever the kill cam is triggered, allowing you to share exciting gameplay moments without having to think about it.
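Conceptually, the integration looks something like the sketch below. To be clear, this is not the actual Highlights SDK API; the class and function names are invented purely to show the event-driven pattern described above, where the game signals notable moments and the recorder saves a clip automatically.

```python
# Conceptual sketch of event-triggered highlight capture (hypothetical API, not the real Highlights SDK).

import time

class HighlightRecorder:
    """Toy stand-in for an SDK that keeps a rolling capture buffer and saves clips on demand."""

    def __init__(self, clip_seconds=15):
        self.clip_seconds = clip_seconds

    def save_clip(self, tag):
        # A real implementation would flush the last N seconds of captured video to disk.
        print(f"Saved {self.clip_seconds}s highlight tagged '{tag}' at {time.strftime('%H:%M:%S')}")

recorder = HighlightRecorder()

def on_game_event(event_name):
    """Game code calls this when something notable happens (e.g. the kill cam triggers)."""
    if event_name in ("kill_cam", "round_won"):
        recorder.save_clip(event_name)

# The game engine fires events; highlights get saved without the player doing anything.
on_game_event("kill_cam")
```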

Animated GIF support has also been added to NVIDIA Highlights, allowing users to share shorter clips on platforms including Facebook, Google Photos, and Weibo.

In addition to supporting more games and formats, NVIDIA has also released the NVIDIA Highlights SDK, as well as plugins for Unreal Engine and Unity platforms. Previously, NVIDIA was working with developers to integrate Highlights into their games, but now developers will have the ability to add the support themselves.

Hopefully, these changes mean a quicker influx of titles with Highlights support, compared to the 16 currently supported titles.

In addition to enhancements in Highlights, NVIDIA has also launched a new sharing site for screen captures performed with the Ansel in-game photography tool.

The new ShotWithGeforce.com lets users upload and share their captures from any Ansel-supported game.

shotwithgeforce.PNG

Screenshots uploaded to Shot With GeForce are tagged with the specific game the capture is from, making it easy for users to scroll through all of the uploaded captures from a given title.

Source: NVIDIA
Manufacturer: Microsoft

O Rayly? Ya Rayly. No Ray!

Microsoft has just announced a raytracing extension to DirectX 12, called DirectX Raytracing (DXR), at the 2018 Game Developers Conference in San Francisco.

microsoft-2015-directx12-logo.jpg

The goal is not to completely replace rasterization… at least not yet. DXR will mostly be used for effects that require supplementary datasets, such as reflections, ambient occlusion, and refraction. Rasterization, the typical way that 3D geometry gets drawn on a 2D display, converts triangle coordinates into screen coordinates, and then a point-in-triangle test runs across every sample. This will likely occur once per AA sample (minus pixels that the triangle can't possibly cover -- such as a pixel outside of the triangle's bounding box -- but that's just optimization).
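For readers who haven't written a rasterizer, this is roughly what that per-sample point-in-triangle test boils down to. The sketch below uses edge functions on a made-up triangle and an 8x8 grid of pixel-center samples, purely for illustration.

```python
# Minimal sketch of the per-sample point-in-triangle test described above (edge-function form).

def edge(ax, ay, bx, by, px, py):
    """Signed area term: positive if point p is to the left of the edge a->b."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def sample_covered(tri, px, py):
    """True if the sample point lies inside the triangle (counter-clockwise winding assumed)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (edge(x0, y0, x1, y1, px, py) >= 0 and
            edge(x1, y1, x2, y2, px, py) >= 0 and
            edge(x2, y2, x0, y0, px, py) >= 0)

# Rasterize a tiny triangle over an 8x8 grid of pixel-center samples.
triangle = [(1.0, 1.0), (6.0, 2.0), (3.0, 6.0)]
for y in range(8):
    row = ""
    for x in range(8):
        row += "#" if sample_covered(triangle, x + 0.5, y + 0.5) else "."
    print(row)
```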

microsoft-2018-gdc-directx12raytracing-rasterization.png

For rasterization, each triangle is laid on a 2D grid corresponding to the draw surface.
If any sample is in the triangle, the pixel shader is run.
This example shows the rotated grid MSAA case.

A program, called a pixel shader, is then run with some set of data that the GPU could gather on every valid pixel in the triangle. This set of data typically includes things like world coordinate, screen coordinate, texture coordinates, nearby vertices, and so forth. This lacks a lot of information, especially things that are not visible to the camera. The application is free to provide other sources of data for the shader to crawl… but what?

  • Cubemaps are useful for reflections, but they don’t necessarily match the scene.
  • Voxels are useful for lighting, as seen with NVIDIA’s VXGI and VXAO.

This is where DirectX Raytracing comes in. There are quite a few components to it, but it's basically a new pipeline that handles how rays are cast into the environment. After being queued, it starts out with a ray-generation stage, and then, depending on what happens to the ray in the scene, closest-hit, any-hit, and miss shaders are invoked. Ray generation allows the developer to set up how the rays are cast by calling an HLSL intrinsic, TraceRay (which is a clever way of invoking them, by the way). This function takes an origin and a direction, so you could, for example, cast rays only in the direction of lights if your algorithm approximates partially occluded soft shadows from a non-point light. (There are better algorithms to do that, but it's just the first example that came to mind.) The closest-hit, any-hit, and miss shaders run at the point where the traced ray ends, or when it misses everything in the scene.

To connect this with current technology, imagine that ray-generation is like a vertex shader in rasterization, where it sets up the triangle to be rasterized, leading to pixel shaders being called.

microsoft-2018-gdc-directx12raytracing-multibounce.png

Even more interesting – the closest-hit, any-hit, and miss shaders can call TraceRay themselves, which is used for multi-bounce and other recursive algorithms (see the figure above and the sketch below). The obvious use case might be reflections, which is the headline of the GDC talk, but they want it to be as general as possible, aligning with the evolution of GPUs. Looking at NVIDIA's VXAO implementation, it also seems like a natural fit for a raytracing algorithm.
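To make the flow a bit more tangible, here is a deliberately simplified model written in Python rather than HLSL. The scene, shader bodies, and two-bounce reflection logic are all invented for illustration; only the overall shape (a ray-generation step calling a TraceRay-style function, with closest-hit and miss callbacks that may themselves trace again) mirrors what DXR describes.

```python
# Simplified model of the DXR flow: ray generation -> TraceRay -> closest-hit / miss shaders,
# with the closest-hit shader recursively tracing a reflection ray (multi-bounce).

MAX_BOUNCES = 2

def intersect_scene(origin, direction):
    """Toy 'scene': a single horizontal mirror plane at y = 0. Returns a hit point or None."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if dy >= 0:                      # ray travelling away from (or parallel to) the plane
        return None
    t = -oy / dy
    return (ox + t * dx, 0.0, oz + t * dz)

def miss_shader(direction):
    """Runs when a ray hits nothing; returns a simple sky color based on direction."""
    return (0.3, 0.5, 1.0) if direction[1] > 0 else (0.1, 0.1, 0.1)

def closest_hit_shader(hit_point, direction, depth):
    """Runs at the closest intersection; here it bounces a mirror reflection ray."""
    if depth >= MAX_BOUNCES:
        return (0.2, 0.2, 0.2)
    dx, dy, dz = direction
    reflected = (dx, -dy, dz)        # reflect about the y = 0 plane
    return trace_ray(hit_point, reflected, depth + 1)

def trace_ray(origin, direction, depth=0):
    """Stand-in for the TraceRay intrinsic: takes an origin and a direction."""
    hit = intersect_scene(origin, direction)
    if hit is None:
        return miss_shader(direction)
    return closest_hit_shader(hit, direction, depth)

def ray_generation():
    """Stand-in for the ray-generation shader: cast one ray per 'pixel' of a 4-wide strip."""
    for x in range(4):
        color = trace_ray(origin=(x, 1.0, 0.0), direction=(0.0, -1.0, 1.0))
        print(f"pixel {x}: {color}")

ray_generation()
```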

Speaking of data structures, Microsoft also detailed what they call the acceleration structure, which is composed of two levels. The top level contains per-object metadata, like its transformation and whatever other data the developer wants to attach to it. The bottom level contains the geometry. The briefing states, "essentially vertex and index buffers", so we asked for clarification. DXR requires that triangle geometry be specified as vertex positions in either 32-bit float3 or 16-bit float3 values. There is also a stride property, so developers can tweak data alignment and reuse their rasterization vertex buffer, as long as the positions are HLSL float3, either 16-bit or 32-bit.
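Here is a rough sketch of that two-level layout, purely for illustration. The class names and fields are invented, not the DXR API, but they capture the split between per-instance metadata (top level) and the shared vertex/index data (bottom level).

```python
# Illustrative two-level acceleration structure layout (conceptual sketch, not the DXR API).

from dataclasses import dataclass, field

@dataclass
class BottomLevelGeometry:
    """Bottom level: the triangles themselves, 'essentially vertex and index buffers'."""
    vertex_positions: list      # float3 positions (DXR accepts 16-bit or 32-bit floats)
    indices: list               # triangle index buffer
    stride_bytes: int = 12      # stride lets a fatter rasterization vertex layout be reused

@dataclass
class TopLevelInstance:
    """Top level: per-object metadata referencing a bottom-level entry."""
    geometry: BottomLevelGeometry
    transform: list                                  # 3x4 object-to-world matrix
    user_data: dict = field(default_factory=dict)    # whatever else the developer wants to attach

# One quad shared by two instances placed at different positions.
quad = BottomLevelGeometry(
    vertex_positions=[(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)],
    indices=[0, 1, 2, 0, 2, 3],
)
scene = [
    TopLevelInstance(quad, transform=[[1, 0, 0, -2], [0, 1, 0, 0], [0, 0, 1, 5]], user_data={"material": "mirror"}),
    TopLevelInstance(quad, transform=[[1, 0, 0,  2], [0, 1, 0, 0], [0, 0, 1, 5]], user_data={"material": "glass"}),
]
print(f"{len(scene)} instances sharing {len(quad.indices) // 3} triangles of geometry")
```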

As for the tools to develop this in…

microsoft-2018-gdc-PIX.png

Microsoft announced PIX back in January 2017. This is a debugging and performance analyzer for 64-bit, DirectX 12 applications. Microsoft will upgrade it to support DXR as soon as the API is released (specifically, “Day 1”). This includes the API calls, the raytracing pipeline resources, the acceleration structure, and so forth. As usual, you can expect Microsoft to support their APIs with quite decent – not perfect, but decent – documentation and tools. They do it well, and they want to make sure it’s available when the API is.

ea-2018-SEED screenshot (002).png

Example of DXR via EA's in-development SEED engine.

In short, raytracing is here, but it's not taking over rasterization. It doesn't need to. Microsoft is just giving game developers another, standardized mechanism to gather supplementary data for their games. Several game engines have already announced support for this technology, including the usual suspects of top-tier game technology:

  • Frostbite (EA/DICE)
  • SEED (EA)
  • 3DMark (Futuremark)
  • Unreal Engine 4 (Epic Games)
  • Unity Engine (Unity Technologies)

They also said, "and several others we can't disclose yet", so this list is not even complete. But, yeah, if you have Frostbite, Unreal Engine, and Unity, then you have a sizeable market as it is. There is always a question about how deeply each of these engines will support the technology. Currently, raytracing is not portable outside of DirectX 12, because it's literally being announced today, and each of these engines intends to support more than just Windows 10 and Xbox.

Still, we finally have a standard for raytracing, which should drive vendors to optimize in a specific direction. From there, it's just a matter of someone taking the risk to actually use the technology for a cool work of art.

If you want to read more, check out Ryan's post about the also-announced RTX, NVIDIA's raytracing technology.

NVIDIA RTX Technology Accelerates Ray Tracing for Microsoft DirectX Raytracing API

Subject: Graphics Cards | March 19, 2018 - 01:00 PM |
Tagged: rtx, nvidia, dxr

The big news from the Game Developers Conference this week was Microsoft's reveal of its work on a new ray tracing API for DirectX called DirectX Raytracing. As the name would imply, this is a new initiative to bring the image quality improvements of ray tracing to consumer hardware with the push of Microsoft's DX team. Scott already has a great write up on that news and its current and future implications for PC gamers, so I highly encourage you all to read that over before diving more into this NVIDIA-specific news.

Ray tracing has been the holy grail of real-time rendering. It is the gap between movies and games – though ray tracing continues to improve in performance, it still takes the power of offline server farms to render the images for your favorite flicks. Modern game engines continue to use rasterization, an efficient method for rendering graphics but one that depends on tricks and illusions to recreate the intended image. Ray tracing inherently solves the problems that rasterization works around, including shadows, transparency, refraction, and reflection, but it does so at a prohibitive performance cost. That should begin to change with Microsoft's enablement of ray tracing through a common API and with technology like what NVIDIA has built to accelerate it.

04.jpg

Alongside support and verbal commitment to DXR, NVIDIA is announcing RTX Technology. This is a combination of hardware and software advances to improve the performance of ray tracing algorithms on its hardware, and it works hand in hand with DXR. NVIDIA believes this is the culmination of 10 years of development on ray tracing, much of which we have talked about on this site from the world of professional graphics systems. Think Iray, OptiX, and more.

RTX will run on Volta GPUs only today, which does limit its usefulness to gamers. With the $3,000 TITAN V being the only Volta graphics card on the market that comes even close to being a gaming product, RTX is more of a forward-looking technology announcement for the company. We can obviously assume that RTX technology will be integrated into any future consumer gaming graphics cards, be that a revision of Volta or something completely different. (NVIDIA refused to acknowledge plans for any pending Volta consumer GPUs during our meeting.)

The idea I get from NVIDIA is that today’s RTX is meant as a developer enablement platform, getting them used to the idea of adding ray tracing effects into their games and engines and to realize that NVIDIA provides the best hardware to get that done.

I'll be honest with you – NVIDIA was light on the details of what RTX exactly IS and how it accelerates ray tracing. One very interesting example I was given was first seen with the AI-powered ray tracing optimizations for OptiX from last year's GDC. There, NVIDIA demonstrated that, using the Volta Tensor cores, it could run an AI-powered de-noiser on the ray traced image, effectively improving the quality of the resulting image and emulating much higher ray counts than are actually processed.
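Purely to illustrate the principle (NVIDIA's denoiser is a trained neural network running on the Tensor cores, not a simple filter), the toy sketch below fakes a noisy one-sample-per-pixel estimate along a scanline and shows how even a basic smoothing pass lands much closer to the ground truth than the raw noisy result. The scene values and filter are made up; only the idea of recovering a cleaner image from fewer rays is taken from the article.

```python
# Toy illustration of denoising a low-sample-count estimate (stand-in for NVIDIA's learned denoiser).
import random

random.seed(0)
WIDTH = 64

# "Ground truth" brightness along a scanline: a soft gradient with a bright highlight.
truth = [min(1.0, x / WIDTH + (0.5 if 40 <= x <= 44 else 0.0)) for x in range(WIDTH)]

# One-sample-per-pixel estimate: correct on average, but very noisy.
noisy = [max(0.0, min(1.0, t + random.gauss(0, 0.25))) for t in truth]

def denoise(img, radius=2):
    """Simple box filter as the stand-in 'denoiser'."""
    out = []
    for i in range(len(img)):
        window = img[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(f"error of raw 1-spp estimate: {mean_abs_error(noisy, truth):.3f}")
print(f"error after denoising:       {mean_abs_error(denoise(noisy), truth):.3f}")
```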

By using the Tensor cores with RTX for the DXR implementation on the TITAN V, NVIDIA will be able to offer image quality and performance for ray tracing well ahead of even the TITAN Xp or GTX 1080 Ti, as those GPUs do not have Tensor cores on-board. Does this mean that all (or at least flagship) consumer graphics cards from NVIDIA will include Tensor cores to enable RTX performance? Obviously, NVIDIA wouldn't confirm that, but to me it makes sense that we will see that in future generations. The scale of Tensor core integration might change based on price points, but if NVIDIA and Microsoft truly believe in the future of ray tracing to augment and eventually significantly replace rasterization methods, then it will be necessary.

Though that is one example of hardware-specific features being used for RTX on NVIDIA hardware, it's not the only one Volta offers. But NVIDIA wouldn't share more.

The relationship between Microsoft DirectX Raytracing and NVIDIA RTX is a bit confusing, but it’s easier to think of RTX as the underlying brand for the ability to ray trace on NVIDIA GPUs. The DXR API is still the interface between the game developer and the hardware, but RTX is what gives NVIDIA the advantage over AMD and its Radeon graphics cards, at least according to NVIDIA.

DXR will still run on other GPUs from NVIDIA that aren't utilizing the Volta architecture. Microsoft says that any board that can support DX12 Compute will be able to run the new API. But NVIDIA did point out that, in its mind, even with a high-end SKU like the GTX 1080 Ti, ray tracing performance will limit the ability to integrate ray tracing features and enhancements into real-time game engines in the immediate timeframe. That's not to say it is impossible, or that some engine devs won't spend the time to build something unique, but it is interesting to hear NVIDIA imply that only future products will benefit from ray tracing in games.

It’s also likely that we are months if not a year or more from seeing good integration of DXR in games at retail. And it is also possible that NVIDIA is downplaying the importance of DXR performance today if it happens to be slower than the Vega 64 in the upcoming Futuremark benchmark release.

05.jpg

Alongside the RTX announcement comes GameWorks Ray Tracing, a collection of turnkey modules based on DXR. GameWorks has its own reputation, and we aren't going to get into that here, but NVIDIA wants to think of this addition as a way to "turbo charge enablement" of ray tracing effects in games.

NVIDIA believes that developers are incredibly excited about the implementation of ray tracing into game engines, and that the demos being shown at GDC this week will blow us away. I am looking forward to seeing them and to getting the reactions of major game devs on the release of Microsoft's new DXR API. The performance impact of ray tracing will still be a hindrance to larger scale implementations, but with DXR driving the direction with a unified standard, I still expect to see some games with revolutionary image quality by the end of the year.

Source: NVIDIA

The GeForce Partner Program has some Kool-Aid it would like you to try

Subject: General Tech | March 8, 2018 - 03:26 PM |
Tagged: dirty pool, nvidia, gpp, GeForce Partner Program

[H]ard|OCP have posted an article looking at the newly announced GeForce Partner Program, which bears a striking resemblance to a certain Intel initiative ... one which turned out poorly. After investigating the details for several weeks, including attempts to talk with OEMs and AIBs, [H] raises some serious concerns, including what seems to be a membership requirement to sell only NVIDIA GPUs in the product line that is aligned with GPP. As membership in the GPP offers "high-effort engineering engagements -- early tech engagement -- launch partner status -- game bundling -- sales rebate programs -- social media and PR support -- marketing reports -- Marketing Development Funds (MDF)", a company that chose to keep selling competitors' products would be cut out of quite a few things.

At this time NVIDIA has not responded to inquiries, and the OEMs and AIBs which [H] spoke to declined to make any official comments; off the record, there were serious concerns about the legality of the program. Expect to hear more about this from various sites as they seek the transparency which NVIDIA Director John Teeple mentioned in his post.

special.PNG

"While we usually like to focus on all the wonderful and immersive worlds that video cards and their GPUs can open up to us, today we are tackling something a bit different. The GeForce Partner Program, known as GPP in the industry, is a "marketing" program that looks to HardOCP as being an anticompetitive tactic against AMD and Intel."

Here is some more Tech News from around the web:

Tech Talk

 

Source: [H]ard|OCP

Blender Foundation Releases Blender 2.79a

Subject: General Tech | March 4, 2018 - 04:55 PM |
Tagged: Blender, Volta, nvidia

Normally the “a” patch of Blender arrives much closer to the number release – about a month or so.

Five months after 2.79, however, the Blender Foundation has released 2.79a. It seemed likely that it would happen at some point, because it looks like they are aiming for 2.80 to be the next full release, and that will take some time. I haven’t had a chance to use 2.79a yet, but the release notes are mostly bug fixes and performance improvements.

blender-2017-cyclesdenoise.png

Glancing through the release notes, one noteworthy addition is that Blender 2.79a now includes the CUDA 9 SDK in its build process, along with work-arounds for the "performance loss" associated with it. While I haven't heard any complaints from Titan V owners, the lack of CUDA 8 SDK support was a big problem for early owners of GeForce GTX 10X0 cards, so Volta users might have been suffering in silence until now. If you were having issues with the Titan V, you should try 2.79a.

If you’re interested, be sure to check out the latest release. As always, it’s free.