NVIDIA is ready to storm the server room

Subject: General Tech, Graphics Cards, Networking, Shows and Expos | March 19, 2019 - 06:16 PM |
Tagged: nvidia, t4, amazon, microsoft, NGC, Mellanox, CUDA-X, GTC, jen-hsun huang, DRIVE Constellation, ai

As part of their long list of announcements yesterday, NVIDIA revealed they are partnering with Cisco, Dell EMC, Fujitsu, Hewlett Packard Enterprise, Inspur, Lenovo and Sugon to provide servers powered by T4 Tensor Core GPUs, optimized to run their CUDA-X AI acceleration libraries.

t4.PNG

Those T4 GPUs have been on the market for a while, but this marks NVIDIA's first major success in the server room, with models available for purchase from the aforementioned companies. Those who prefer other people's servers can also benefit from these new products, with Amazon and Microsoft offering cloud-based instances. Setting yourself up to run NVIDIA's NGC software may save a lot of money down the road: the cards sip a mere 70W, which is rather more attractive than the consumption of a gaggle of Tesla V100s. One might be guilty of suspecting this offers an explanation for NVIDIA's recent acquisition of Mellanox.
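
To put that 70W figure in perspective, here is a minimal back-of-the-envelope sketch, assuming NVIDIA's published board power ratings (roughly 70W for a T4 and 250-300W for a Tesla V100):

```python
# Back-of-the-envelope power comparison, assuming published board power figures:
# ~70 W for a T4 and 250-300 W for a Tesla V100 (PCIe and SXM2 respectively).
T4_WATTS = 70
V100_WATTS = 300  # SXM2 board power; the PCIe card is rated around 250 W

num_gpus = 20
print(f"{num_gpus} x T4:   {num_gpus * T4_WATTS / 1000:.1f} kW")   # 1.4 kW
print(f"{num_gpus} x V100: {num_gpus * V100_WATTS / 1000:.1f} kW") # 6.0 kW
```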


NGC software offers more than just a platform to run optimizations on; it also offers a range of templates to start from, covering classification, object detection, sentiment analysis, and most of the other common starting points for training a machine. Customers will also be able to upload their own models to share internally or, if in the mood, externally with other users and companies. It supports existing frameworks such as TensorFlow and PyTorch, but also offers access to CUDA-X AI, which, as the name suggests, takes advantage of the T4 GPU's underlying design to cut down the time spent waiting for results and let users iterate on their designs more quickly.
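
Whichever framework you pick, a quick sanity check that the T4 is actually visible never hurts; the snippet below is a minimal sketch using plain PyTorch, not anything NGC-specific:

```python
# Minimal sketch: confirm the framework can actually see the GPU before training.
# Nothing here is NGC-specific; it is plain PyTorch running in any CUDA-enabled environment.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")  # e.g. "Tesla T4"
else:
    print("No CUDA device visible; check the driver and container GPU runtime.")
```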


If you are curious what particular implementations of everyone's favourite buzzword might look like, NVIDIA's DRIVE Constellation is an example after JoshTekk's own heart: literally a way to create open, scalable simulations for training large fleets of self-driving cars ... for good, one hopes. Currently the Toyota Research Institute-Advanced Development uses these products in the development of its next self-driving fleet, and NVIDIA obviously hopes others will follow suit.


There is not much to see from a gamer's perspective in the short term, but considering NVIDIA's work on shifting horsepower from the silicon you own to their own cloud, this will certainly impact the future of gaming from both a hardware and a gameplay perspective. GPUs as a Service may not be the future many of us want, but this suggests it could be possible, not to mention the dirty tricks enemy AIs will be able to pull with this processing power behind them.

One might even dream that escort missions could become less of a traumatic experience!

Source: NVIDIA

GTC 19: NVIDIA Announces GameWorks RTX; Unreal Engine and Unity Gain DXR Support

Subject: General Tech | March 18, 2019 - 10:00 PM |
Tagged: gameworks, unreal engine, Unity, rtx, ray tracing, nvidia, GTC 19, GTC, dxr, developers

Today at GTC NVIDIA announced GameWorks RTX and the implementation of real-time ray tracing in the upcoming Unreal Engine 4.22 and in Unity, currently available as a 2019.03 preview release.

NVIDIA Announces GameWorks RTX

While Pascal and non-RTX Turing support for real-time ray tracing is something of a bombshell from NVIDIA, the creation of GameWorks tools for such effects is not surprising.

“NVIDIA GameWorks RTX is a comprehensive set of tools that help developers implement real time ray-traced effects in games. GameWorks RTX is available to the developer community in open source form under the GameWorks license and includes plugins for Unreal Engine 4.22 and Unity’s 2019.03 preview release.”

NVIDIA lists these components of GameWorks RTX:

GW_RTX.PNG

  • RTX Denoiser SDK – a library that enables fast, real-time ray tracing by providing denoising techniques to lower the required ray count and samples per pixel. It includes algorithms for ray traced area light shadows, glossy reflections, ambient occlusion and diffuse global illumination.
  • Nsight for RT – a standalone developer tool that saves developers time by helping to debug and profile graphics applications built with DXR and other supported APIs.

Unreal Engine and Unity Gaining Real-Time Ray Tracing Support

DXR_GAME_ENGINES.png

And while not specific to NVIDIA hardware, news of more game engines offering integrated DXR support was also announced during the keynote:

“Unreal Engine 4.22 is available in preview now, with final release details expected in Epic’s GDC keynote on Wednesday. Starting on April 4, Unity will offer optimized, production-focused, realtime ray tracing support with a custom experimental build available on GitHub to all users with full preview access in the 2019.03 Unity release. Real-time ray tracing support from other first-party AAA game engines includes DICE/EA’s Frostbite Engine, Remedy Entertainment’s Northlight Engine and engines from Crystal Dynamics, Kingsoft, Netease and others.”

RTX may have been off to a slow start, but this will apparently be the year of real-time ray tracing after all - especially with the upcoming NVIDIA driver update adding support to the GTX 10-series and new GTX 16-series GPUs.

Source: NVIDIA

NVIDIA to Add Real-Time Ray Tracing Support to Pascal GPUs via April Driver Update

Subject: Graphics Cards | March 18, 2019 - 09:41 PM |
Tagged: unreal engine, Unity, turing, rtx, ray tracing, pascal, nvidia, geforce, GTC 19, GTC, gaming, developers

Today at GTC NVIDIA announced a few things of particular interest to gamers, including GameWorks RTX and the implementation of real-time ray tracing in upcoming versions of both Unreal Engine and Unity (we already posted the news that CRYENGINE will be supporting real-time ray tracing as well). But there is something else... NVIDIA is bringing ray tracing support to GeForce GTX graphics cards.

DXR_GPUs.png

This surprising turn means that ray tracing support won't be limited to RTX cards after all, as the install base of NVIDIA ray tracing-capable GPUs "grows to tens of millions" with a simple driver update next month, adding the feature both to previous-gen Pascal and to the new Turing GTX GPUs.

How is this possible? It’s all about the programmable shaders:

“NVIDIA GeForce GTX GPUs powered by Pascal and Turing architectures will be able to take advantage of ray tracing-supported games via a driver expected in April. The new driver will enable tens of millions of GPUs for games that support real-time ray tracing, accelerating the growth of the technology and giving game developers a massive installed base.

With this driver, GeForce GTX GPUs will execute ray traced effects on shader cores. Game performance will vary based on the ray-traced effects and on the number of rays cast in the game, along with GPU model and game resolution. Games that support the Microsoft DXR and Vulkan APIs are all supported.

However, GeForce RTX GPUs, which have dedicated ray tracing cores built directly into the GPU, deliver the ultimate ray tracing experience. They provide up to 2-3x faster ray tracing performance with a more visually immersive gaming environment than GPUs without dedicated ray tracing cores.”

A very important caveat is the "2-3x faster ray tracing performance" figure for GeForce RTX graphics cards mentioned in that last paragraph: expectations will need to be tempered, as ray-traced effects running on shader cores (Pascal and GTX Turing) will be considerably less efficient than on dedicated RT cores, as these charts demonstrate:

BFV_CHART.png

METRO_EXODUS_CHART.png

SOTTR_CHART.png
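
As a rough illustration of what a 2-3x gap means for a frame budget, here is a sketch with hypothetical numbers (none of these figures are taken from the charts above):

```python
# Hypothetical frame-budget arithmetic; the 5 ms RT cost is an assumption,
# not a measured figure from NVIDIA's charts.
frame_budget_ms = 1000 / 60              # ~16.7 ms per frame at 60 fps
rt_cost_on_rt_cores_ms = 5.0             # assumed cost of ray-traced effects on RTX
for slowdown in (2, 3):
    shader_cost = rt_cost_on_rt_cores_ms * slowdown
    print(f"{slowdown}x slower on shader cores: {shader_cost:.0f} ms "
          f"of a {frame_budget_ms:.1f} ms budget")
```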

It's going to be a busy April.

Source: NVIDIA

NVIDIA in the news

Subject: General Tech | May 31, 2018 - 01:41 PM |
Tagged: jen-hsun huang, GTC, HPC, nvswitch, tesla v100

Jen-Hsun Huang has a busy dance card right now, with several interesting tidbits hitting the news recently, including his statement in this DigiTimes post that GPU development is outstripping Moore's law. The GPU Technology Conference Taiwan 2018 kicked off yesterday, with NVIDIA showing off their brand new HGX-2 platform, which promises AI and HPC with a healthy helping of Deep Learning as well. Buzzwords aside, the new accelerator is made up of 16 Tesla V100 GPUs, a mere half terabyte of memory, and NVIDIA's NVSwitch. Specialized products from Lenovo and Supermicro, to name a few, as well as cloud providers, will also be picking up this newest piece of kit from NVIDIA.
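
That "half terabyte" works out neatly if you assume the 32 GB Tesla V100 modules used in HGX-2 (a quick sketch; the per-GPU capacity is the assumption here):

```python
# Quick sanity check on the "half terabyte" figure, assuming the
# 32 GB Tesla V100 variant used in HGX-2.
gpus = 16
hbm2_per_gpu_gb = 32
print(f"Total GPU memory: {gpus * hbm2_per_gpu_gb} GB")  # 512 GB
```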

For those less interested in HPC, there is an interesting tidbit of information about an event at Hot Chips: on August 20th, Stuart Oberman will be talking about NVIDIA’s Next Generation Mainstream GPU, with other sessions dealing with their IoT and fabric connections.


"But demand for that power is "growing, not slowing," thanks to AI, Huang said. "Before this time, software was written by humans and software engineers can only write so much software, but machines don't get tired," he said, adding that every single company in the world that develops software will need an AI supercomputer."


Source: NVIDIA

Eight-GPU SLI in Unreal Engine 4 (Yes There Is a Catch)

Subject: Graphics Cards | March 29, 2018 - 09:52 PM |
Tagged: nvidia, GTC, gp102, quadro p6000

At GTC 2018, Walt Disney Imagineering unveiled a work-in-progress clip of their upcoming Star Wars: Galaxy’s Edge attraction, which is expected to launch next year at Disneyland and Walt Disney World Resort. The cool part about this ride is that it will be using Unreal Engine 4 with eight GP102-based Quadro P6000 graphics cards. NVIDIA also reports that Disney has donated the code back to Epic Games to help them with their multi-GPU scaling in general – a win for us consumers… if in a more limited fashion.

nvidia-2018-GTC-starwars-8-way-sli.jpg

See? SLI doesn’t need to be limited to two cards if you have a market cap of $100 billion USD.

Another interesting angle to this story is how typical PC components are contributing to these large experiences. Sure, Quadro hardware isn’t exactly cheap, but it can be purchased through typical retail channels and it allows the company to focus their engineering time elsewhere.

Ironically, this also comes about two decades after location-based entertainment started to decline… but, you know, it’s Disneyland and Disney World. They’re fine.

Source: NVIDIA

GTC 2018: Nvidia and ARM Integrating NVDLA Into Project Trillium For Inferencing at the Edge

Subject: General Tech | March 29, 2018 - 03:10 PM |
Tagged: project trillium, nvidia, machine learning, iot, GTC 2018, GTC, deep learning, arm, ai

During GTC 2018, NVIDIA and ARM announced a partnership that will see ARM integrate NVIDIA's NVDLA deep learning inferencing accelerator into the company's Project Trillium machine learning processors. The NVIDIA Deep Learning Accelerator (NVDLA) is an open source, modular architecture specifically optimized for inferencing operations such as object and voice recognition. Bringing that acceleration to the wider ARM ecosystem through Project Trillium should enable a massive number of smarter phones, tablets, Internet-of-Things, and embedded devices to do inferencing at the edge, which is to say without the complexity and latency of relying on cloud processing. This means potentially smarter voice assistants (e.g. Alexa, Google Assistant), doorbell cameras, lighting, and security around the home, and, out and about on your phone, better AR, natural translation, and assistive technologies.

NVIDIAandARM_NVDLA.jpg

Karl Freund, lead analyst for deep learning at Moor Insights & Strategy, was quoted in the press release:

“This is a win/win for IoT, mobile and embedded chip companies looking to design accelerated AI inferencing solutions. NVIDIA is the clear leader in ML training and Arm is the leader in IoT end points, so it makes a lot of sense for them to partner on IP.”

ARM's Project Trillium, announced back in February, is a suite of processor IP optimized for parallel, low-latency workloads; it includes a Machine Learning processor, an Object Detection processor, and neural network software libraries. NVDLA is a modular, highly configurable hardware and software platform based upon the Xavier SoC, and can feature a convolution core, a single data processor, a planar data processor, a channel data processor, and data reshape engines. The NVDLA can be configured with all or only some of those elements, and chip designers can scale them independently up or down depending on what processing acceleration their devices need. NVDLA connects to the main system processor over a control interface and through two AXI memory interfaces (one optional) that connect to system memory and, optionally, to dedicated high-bandwidth memory (not necessarily HBM, but its own SRAM, for example).

arm project trillium integrates NVDLA.jpg

NVDLA is presented as a free and open source architecture that promotes a standard way to design deep learning inferencing hardware, accelerating the operations needed to infer results from trained neural networks (with the training itself being done on other devices, perhaps a DGX-2). The project, which hosts its code on GitHub and encourages community contributions, goes beyond the Xavier-based hardware and includes drivers, libraries, upcoming TensorRT support for accelerating Google's TensorFlow, testing suites and SDKs, a deep learning training infrastructure (for the training side of things) compatible with the NVDLA software and hardware, and system integration support.

Bringing the "smarts" of smart devices to the local hardware, closer to the users, should mean much better performance, and using specialized accelerators will reportedly deliver the performance levels needed without blowing past low power budgets. Internet-of-Things (IoT) and mobile devices are not going away any time soon, and the partnership between NVIDIA and ARM should make it easier for developers and chip companies to offer smarter (and, please, more secure!) smart devices.


Source: NVIDIA

GTC 2018: NVIDIA Announces Volta-Powered Quadro GV100

Subject: General Tech | March 27, 2018 - 03:30 PM |
Tagged: nvidia, GTC, quadro, gv100, GP100, tesla, titan v, v100, volta

One of the big missing markets for NVIDIA with their slow rollout of the Volta architecture was professional workstations. Today, NVIDIA announced they are bringing Volta to the Quadro family with the Quadro GV100 card.

27-gv100-gpu.jpg

Powered by the same GV100 GPU that debuted at last year's GTC in the Tesla V100, and again late last year in the TITAN V, the Quadro GV100 represents a leap forward in computing power for workstation-level applications. While these users could already be running similar workloads on a TITAN V, Quadro drivers have historically provided big performance advantages in these sorts of applications. That said, we'd love to see NVIDIA repeat the move of bringing those optimizations to the TITAN lineup, as they did with the TITAN Xp.

As it is a Quadro, we would expect this to be NVIDIA's first Volta-powered product to provide certified, professional driver code paths for applications such as CATIA, Solid Edge, and more.

quadro-gv100.png

NVIDIA also heavily promoted the idea of using two of these GV100 cards in one system, connected via NVLink. Considering that the TITAN V lacks NVLink support, this is also the first Volta card with display outputs to bring NVLink to more standard workstations.

More importantly, this announcement brings NVIDIA's RTX technology to the professional graphics market. 

With popular rendering applications like V-Ray already announcing and integrating support for NVIDIA's OptiX ray tracing denoiser in their beta branches, it seems only a matter of time before we see a broad suite of professional applications supporting RTX technology for real-time, ray-traced renders of items being designed in CAD and modeling applications.

This sort of speed represents a potential massive win for professional users, who won't have to waste time waiting for preview renderings to complete to continue iterating on their projects.

The NVIDIA Quadro GV100 is available now directly from NVIDIA for $8,999, which puts it squarely in the same price range as the previous highest-end Quadro, the GP100.

Source: NVIDIA
Manufacturer: NVIDIA

93% of a GP100 at least...

NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.

nvidia-2016-gtc-pascal-banner.png

NVIDIA provided a comparison table, to which we have added what we know about a full GP100:

                          Tesla K40        Tesla M40        Tesla P100       Full GP100
GPU                       GK110 (Kepler)   GM200 (Maxwell)  GP100 (Pascal)   GP100 (Pascal)
SMs                       15               24               56               60
TPCs                      15               24               28               (30?)
FP32 CUDA Cores / SM      192              128              64               64
FP32 CUDA Cores / GPU     2880             3072             3584             3840
FP64 CUDA Cores / SM      64               4                32               32
FP64 CUDA Cores / GPU     960              96               1792             1920
Base Clock                745 MHz          948 MHz          1328 MHz         TBD
GPU Boost Clock           810/875 MHz      1114 MHz         1480 MHz         TBD
FP64 GFLOPS               1680             213              5304             TBD
Texture Units             240              192              224              240
Memory Interface          384-bit GDDR5    384-bit GDDR5    4096-bit HBM2    4096-bit HBM2
Memory Size               Up to 12 GB      Up to 24 GB      16 GB            TBD
L2 Cache Size             1536 KB          3072 KB          4096 KB          TBD
Register File Size / SM   256 KB           256 KB           256 KB           256 KB
Register File Size / GPU  3840 KB          6144 KB          14336 KB         15360 KB
TDP                       235 W            250 W            300 W            TBD
Transistors               7.1 billion      8 billion        15.3 billion     15.3 billion
GPU Die Size              551 mm2          601 mm2          610 mm2          610 mm2
Manufacturing Process     28 nm            28 nm            16 nm            16 nm

This table is designed for developers that are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm2, up from 601 mm2.

nvidia-2016-gp100_block_diagram-1-624x368.png

A full GP100 processor will have 60 shader modules, compared to GM200's 24, although Pascal has half as many shaders per SM. The GP100 part listed in the table above is actually partially disabled, cutting off four of the sixty SMs. This leads to 3584 single-precision (32-bit) CUDA cores, up from 3072 in GM200. (The full GP100 will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell's, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPs is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
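
For reference, here is where that 10.6 TeraFLOPs figure and the "about twice the transistors" comparison come from, using the table values and assuming two FLOPs per CUDA core per clock via fused multiply-add:

```python
# Peak FP32 throughput from the table values, assuming 2 FLOPs per core per
# clock (fused multiply-add), plus the GP100 vs GM200 transistor comparison.
fp32_cores, boost_clock_hz = 3584, 1480e6
peak_tflops = fp32_cores * boost_clock_hz * 2 / 1e12
print(f"Tesla P100 peak FP32: {peak_tflops:.1f} TFLOPS")   # ~10.6

gp100_transistors, gm200_transistors = 15.3e9, 8.0e9
print(f"GP100 vs GM200 transistors: {gp100_transistors / gm200_transistors:.2f}x")  # ~1.91x
```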

Continue reading our preview of the NVIDIA Pascal architecture!!

GTC 2015: NVIDIA Roadmap Shows Pascal with 3D Memory, NVLink and Mixed Precision Compute

Subject: Graphics Cards | March 17, 2015 - 01:47 PM |
Tagged: pascal, nvidia, gtc 2015, GTC, geforce

At the keynote of the GPU Technology Conference (GTC) today, NVIDIA CEO Jen-Hsun Huang disclosed some more updates on the roadmap for future GPU technologies.

GTC-36.jpg

Most of the detail was around Pascal, due in 2016, which will introduce three new features: mixed compute precision, 3D (stacked) memory, and NVLink. Mixed precision is a method of computing in FP16, allowing calculations to run much faster at lower accuracy when full single or double precision is not necessary. Keeping in mind that Maxwell doesn't have an implementation with full-speed DP compute (today), it would seem that NVIDIA is targeting different compute tasks moving forward. Though details are short, mixed precision would likely indicate processing cores that can handle both data types.
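
A tiny NumPy illustration of the accuracy being traded away (this is just the FP16 format itself, not anything specific to NVIDIA's implementation):

```python
# FP16 carries roughly 3 decimal digits of precision versus ~7 for FP32,
# which is the accuracy being traded for speed in mixed precision compute.
import numpy as np

print(np.finfo(np.float16).eps)   # ~9.77e-04
print(np.finfo(np.float32).eps)   # ~1.19e-07

print(np.float16(1.0) + np.float16(2e-4))   # 1.0     -- the small increment is lost
print(np.float32(1.0) + np.float32(2e-4))   # 1.0002  -- FP32 keeps it
```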

3D memory stacks DRAM in the same package as the GPU to improve overall memory bandwidth. The visual diagram that NVIDIA showed on stage indicated that Pascal would have 750 GB/s of bandwidth, compared to 300-350 GB/s on Maxwell today.
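
Those bandwidth figures fall out of bus width times per-pin data rate; here is a quick sketch assuming the GTX TITAN X's 384-bit, 7 Gbps GDDR5 and a hypothetical 4096-bit stacked-memory interface:

```python
# Bandwidth = (bus width in bits / 8) x effective data rate per pin in Gbps.
def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    return bus_width_bits / 8 * data_rate_gbps

print(bandwidth_gbs(384, 7.0))    # 336.0 GB/s -- roughly the Maxwell figure quoted
print(bandwidth_gbs(4096, 1.5))   # 768.0 GB/s -- a stacked interface in the 750 GB/s ballpark
```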

NVLink is a new way of connecting GPUs, improving bandwidth by more than 5x over current implementations of PCI Express. NVIDIA claims this will allow for connecting as many as 8 GPUs for deep learning performance improvements (up to 10x). What that means for gaming has yet to be discussed.
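
The "more than 5x" figure checks out against PCIe 3.0 x16 if NVLink ends up around 80 GB/s per direction; the per-link numbers in this sketch are an assumption on our part:

```python
# PCIe 3.0 x16: 16 lanes x 8 GT/s x 128b/130b encoding, in GB/s per direction.
pcie3_x16_gbs = 16 * 8 * (128 / 130) / 8
# Assumed NVLink configuration: four 20 GB/s links per GPU.
nvlink_gbs = 4 * 20
print(f"PCIe 3.0 x16: {pcie3_x16_gbs:.2f} GB/s")                          # ~15.75 GB/s
print(f"NVLink: {nvlink_gbs} GB/s (~{nvlink_gbs / pcie3_x16_gbs:.1f}x)")  # ~5.1x
```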

GTC-38.jpg

NVIDIA made some other interesting claims as well. Pascal will be more than twice as efficient as Maxwell in performance per watt, even without the three new features listed above. It will also ship (in a compute-targeted product) with a 32GB memory system, compared to the 12GB of memory announced on the Titan X today. Pascal will also deliver 4x the performance in mixed precision compute.

Watch NVIDIA Reveal the GTX TITAN X at GTC 2015

Subject: Graphics Cards, Shows and Expos | March 17, 2015 - 10:31 AM |
Tagged: nvidia, video, GTC, gtc 2015

NVIDIA is streaming today's keynote from the GPU Technology Conference (GTC) on Ustream, and we have the embed below for you to take part. NVIDIA CEO Jen-Hsun Huang will reveal the details about the new GeForce GTX TITAN X but there are going to be other announcements as well, including one featuring Tesla CEO Elon Musk.

Should be interesting!

Source: NVIDIA