Subject: General Tech, Graphics Cards, Networking, Shows and Expos | March 19, 2019 - 06:16 PM | Jeremy Hellstrom
Tagged: nvidia, t4, amazon, microsoft, NGC, Mellanox, CUDA-X, GTC, jen-hsun huang, DRIVE Constellation, ai
As part of their long list of announcements yesterday, NVIDIA revealed they are partnering with Cisco, Dell EMC, Fujitsu, Hewlett Packard Enterprise, Inspur, Lenovo and Sugon to provide servers powered by T4 Tensor Core GPUs, optimized to run their CUDA-X AI acceleration libraries.
Those T4 GPUs have been on the market for a while, but this marks NVIDIA's first major success in the server room, with models available for purchase from the aforementioned companies. Those who prefer other people's servers can also benefit, with Amazon and Microsoft offering cloud-based solutions. Setting yourself up to run NVIDIA's NGC software may save a lot of money down the road; the cards sip a mere 70 W of power, which is rather more attractive than the consumption of a gaggle of Tesla V100s. One might also suspect this helps explain their recent acquisition of Mellanox.
NGC software offers more than just a platform to run optimizations on; it also provides a range of templates to start from, covering classification, object detection, sentiment analysis and most of the other common starting points for training a machine. Customers will also be able to upload their own models to share internally or, if in the mood, externally with other users and companies. It supports existing frameworks such as TensorFlow and PyTorch, and also offers access to CUDA-X AI, which, as the name suggests, takes advantage of the design of the T4 GPU to cut the time spent waiting for results and lets users iterate on designs quickly.
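To make that "template" idea a little more concrete, here is a minimal sketch of the kind of starting point such catalogs package up: a pretrained classification model pulled from PyTorch, one of the frameworks NGC supports. This is purely an illustration under assumed choices (the model and dummy input are ours), not NGC's own tooling.

```python
# Minimal sketch (an illustration, not NGC itself): start from a stock
# pretrained image classifier in PyTorch, the kind of "classification
# template" an NGC-style catalog hands you as a baseline.
import torch
from torchvision import models

model = models.resnet50(pretrained=True)  # assumed model choice for the example
model.eval()

# Stand-in input: one 224x224 RGB image tensor; a real workflow would load
# and normalize an actual image here.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(dummy)

print(logits.argmax(dim=1))  # predicted ImageNet class index
```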
If you are curious what particular implementations of everyone's favourite buzzword might look like, NVIDIA's DRIVE Constellation is an example after JoshTekk's own heart: an open, scalable way to simulate large fleets of self-driving cars in order to train them ... for good, one hopes. The Toyota Research Institute-Advanced Development currently uses these products in the development of their next self-driving fleet, and NVIDIA obviously hopes others will follow suit.
There is not much for gamers to see in the short term, but considering NVIDIA's work on shifting horsepower from the silicon you own to their own cloud, this will certainly impact the future of gaming from both a hardware and a gameplay perspective. GPUs as a Service may not be the future many of us want, but this suggests it could be possible, not to mention the dirty tricks enemy AIs will be able to pull with this processing power behind them.
One might even dream that escort missions could become less of a traumatic experience!
Subject: General Tech | March 18, 2019 - 10:00 PM | Sebastian Peak
Tagged: gameworks, unreal engine, Unity, rtx, ray tracing, nvidia, GTC 19, GTC, dxr, developers
Today at GTC NVIDIA announced GameWorks RTX and the implementation of real-time ray tracing in the upcoming Unreal Engine 4.22 and the latest version of Unity, currently in 2019.03.
NVIDIA Announces GameWorks RTX
While Pascal and non-RTX Turing support for real-time ray tracing is something of a bombshell from NVIDIA, the creation of GameWorks tools for such effects is not surprising.
“NVIDIA GameWorks RTX is a comprehensive set of tools that help developers implement real time ray-traced effects in games. GameWorks RTX is available to the developer community in open source form under the GameWorks license and includes plugins for Unreal Engine 4.22 and Unity’s 2019.03 preview release.”
NVIDIA lists these components of GameWorks RTX:
- RTX Denoiser SDK – a library that enables fast, real-time ray tracing by providing denoising techniques to lower the required ray count and samples per pixel. It includes algorithms for ray traced area light shadows, glossy reflections, ambient occlusion and diffuse global illumination.
- Nsight for RT – a standalone developer tool that saves developers time by helping to debug and profile graphics applications built with DXR and other supported APIs.
Unreal Engine and Unity Gaining Real-Time Ray Tracing Support
And while not specific to NVIDIA hardware, news of more game engines offering integrated DXR support was also announced during the keynote:
“Unreal Engine 4.22 is available in preview now, with final release details expected in Epic’s GDC keynote on Wednesday. Starting on April 4, Unity will offer optimized, production-focused, realtime ray tracing support with a custom experimental build available on GitHub to all users with full preview access in the 2019.03 Unity release. Real-time ray tracing support from other first-party AAA game engines includes DICE/EA’s Frostbite Engine, Remedy Entertainment’s Northlight Engine and engines from Crystal Dynamics, Kingsoft, Netease and others.”
RTX may have been off to a slow start, but this will apparently be the year of real-time ray tracing after all - especially with the upcoming NVIDIA driver update adding support to the GTX 10-series and new GTX 16-series GPUs.
Subject: Graphics Cards | March 18, 2019 - 09:41 PM | Sebastian Peak
Tagged: unreal engine, Unity, turing, rtx, ray tracing, pascal, nvidia, geforce, GTC 19, GTC, gaming, developers
Today at GTC NVIDIA announced a few things of particular interest to gamers, including GameWorks RTX and the implementation of real-time ray tracing in upcoming versions of both Unreal Engine and Unity (we already posted the news that CRYENGINE will be supporting real-time ray tracing as well). But there is something else... NVIDIA is bringing ray tracing support to GeForce GTX graphics cards.
This surprising turn means that ray tracing support won't be limited to RTX cards after all, as the install base of NVIDIA ray-tracing GPUs “grows to tens of millions” with a simple driver update next month, adding the feature both to previous-gen Pascal and to the new Turing GTX GPUs.
How is this possible? It’s all about the programmable shaders:
“NVIDIA GeForce GTX GPUs powered by Pascal and Turing architectures will be able to take advantage of ray tracing-supported games via a driver expected in April. The new driver will enable tens of millions of GPUs for games that support real-time ray tracing, accelerating the growth of the technology and giving game developers a massive installed base.
With this driver, GeForce GTX GPUs will execute ray traced effects on shader cores. Game performance will vary based on the ray-traced effects and on the number of rays cast in the game, along with GPU model and game resolution. Games that support the Microsoft DXR and Vulkan APIs are all supported.
However, GeForce RTX GPUs, which have dedicated ray tracing cores built directly into the GPU, deliver the ultimate ray tracing experience. They provide up to 2-3x faster ray tracing performance with a more visually immersive gaming environment than GPUs without dedicated ray tracing cores.”
A very important caveat is that “2-3x faster ray tracing performance” figure for GeForce RTX graphics cards mentioned in the last paragraph; expectations will need to be tempered, since RT effects are less efficient running on shader cores (Pascal and GTX Turing) than on dedicated RT cores, as NVIDIA's own comparison charts demonstrate.
It's going to be a busy April.
Subject: General Tech | May 31, 2018 - 01:41 PM | Jeremy Hellstrom
Tagged: jen-hsun huang, GTC, HPC, nvswitch, tesla v100
Jen-Hsun Huang has a busy dance card right now, with several interesting tidbits hitting the news recently, including his statement in this DigiTimes post that GPU development is outstripping Moore's Law. GTC Taiwan 2018 kicked off yesterday, with NVIDIA showing off their brand new HGX-2, which contains both AIs and HPCs, with Deep Learnings a sure bet as well. Buzzwords aside, the new accelerator is made up of 16 Tesla V100 GPUs, a mere half terabyte of memory and NVIDIA's NVSwitch. Specialized products from Lenovo and Supermicro, to name a few, as well as cloud providers, will also be picking up this newest piece of kit from NVIDIA.
For those less interested in HPC, there is an interesting tidbit about an event at Hot Chips: on August 20th, Stuart Oberman will be talking about NVIDIA’s Next Generation Mainstream GPU, with other sessions dealing with their IoT and fabric connections.
"But demand for that power is "growing, not slowing," thanks to AI, Huang said. "Before this time, software was written by humans and software engineers can only write so much software, but machines don't get tired," he said, adding that every single company in the world that develops software will need an AI supercomputer."
Subject: Graphics Cards | March 29, 2018 - 09:52 PM | Scott Michaud
Tagged: nvidia, GTC, gp102, quadro p6000
At GTC 2018, Walt Disney Imagineering unveiled a work-in-progress clip of their upcoming Star Wars: Galaxy’s Edge attraction, which is expected to launch next year at Disneyland and Walt Disney World Resort. The cool part about this ride is that it will be using Unreal Engine 4 with eight GP102-based Quadro P6000 graphics cards. NVIDIA also reports that Disney has donated the code back to Epic Games to help them with their multi-GPU scaling in general – a win for us consumers… in a more limited fashion.
See? SLI doesn’t need to be limited to two cards if you have a market cap of $100 billion USD.
Another interesting angle to this story is how typical PC components are contributing to these large experiences. Sure, Quadro hardware isn’t exactly cheap, but it can be purchased through typical retail channels and it allows the company to focus their engineering time elsewhere.
Ironically, this also comes about two decades after location-based entertainment started to decline… but, you know, it’s Disneyland and Disney World. They’re fine.
Subject: General Tech | March 29, 2018 - 03:10 PM | Tim Verry
Tagged: project trillium, nvidia, machine learning, iot, GTC 2018, GTC, deep learning, arm, ai
During GTC 2018, NVIDIA and ARM announced a partnership that will see ARM integrate NVIDIA's NVDLA deep learning inferencing accelerator into the company's Project Trillium machine learning processors. The NVIDIA Deep Learning Accelerator (NVDLA) is an open source, modular architecture specifically optimized for inferencing operations such as object and voice recognition. Bringing that acceleration to the wider ARM ecosystem through Project Trillium will enable a massive number of smarter phones, tablets, Internet-of-Things, and embedded devices to do inferencing at the edge, which is to say without the complexity and latency of relying on cloud processing. This means potentially smarter voice assistants (e.g. Alexa, Google), doorbell cameras, lighting, and security around the home, and, out and about on your phone, better AR, natural translation, and assistive technologies.
Karl Freund, lead analyst for deep learning at Moor Insights & Strategy, was quoted in the press release as stating:
“This is a win/win for IoT, mobile and embedded chip companies looking to design accelerated AI inferencing solutions. NVIDIA is the clear leader in ML training and Arm is the leader in IoT end points, so it makes a lot of sense for them to partner on IP.”
ARM's Project Trillium was announced back in February and is a suite of processor IP optimized for parallel, low latency workloads; it includes a Machine Learning processor, an Object Detection processor, and neural network software libraries. NVDLA is a hardware and software platform based upon the Xavier SoC; the highly modular and configurable hardware can feature a convolution core, single data processor, planar data processor, channel data processor, and data reshape engines. The NVDLA can be configured with all or only some of those elements, and chip designers can independently scale them up or down depending on what processing acceleration they need for their devices. NVDLA connects to the main system processor over a control interface and through two AXI memory interfaces (one optional) that connect to system memory and (optionally) dedicated high bandwidth memory (not necessarily HBM, but, for example, its own SRAM).
NVDLA is presented as a free and open source architecture that promotes a standard way to design deep learning inferencing hardware, accelerating the operations used to infer results from trained neural networks (with the training itself done on other devices, perhaps a DGX-2). The project, which hosts its code on GitHub and encourages community contributions, goes beyond the Xavier-based hardware and includes drivers, libraries, upcoming TensorRT support for accelerating Google's TensorFlow, testing suites and SDKs, a deep learning training infrastructure (for the training side of things) that is compatible with the NVDLA software and hardware, and system integration support.
Bringing the "smarts" of smart devices to the local hardware and closer to the users should mean much better performance and using specialized accelerators will reportedly offer the performance levels needed without blowing away low power budgets. Internet-of-Things (IoT) and mobile devices are not going away any time soon, and the partnership between NVIDIA and ARM should make it easier for developers and chip companies to offer smarter (and please tell me more secure!) smart devices.
- NVDLA Primer
- Project Trillium: Machine Learning on ARM
- NVIDIA Announces DGX-2 with 16 GV100s & 8 100Gb NICs
- GTC 2018: NVIDIA Announces Volta-Powered Quadro GV100
- NVIDIA Teases Low Power, High Performance Xavier SoC That Will Power Future Autonomous Vehicles
- NVIDIA Launches Jetson TX2 With Pascal GPU For Embedded Devices
- ARM Announces Project Trillium, a New Dedicated AI Processing Family
Subject: General Tech | March 27, 2018 - 03:30 PM | Ken Addison
Tagged: nvidia, GTC, quadro, gv100, GP100, tesla, titan v, v100, volta
One of the big missing markets for NVIDIA with their slow rollout of the Volta architecture was professional workstations. Today, NVIDIA announced they are bringing Volta to the Quadro family with the Quadro GV100 card.
Powered by the same GV100 GPU that was announced at last year's GTC in the Tesla V100, and late last year in the Titan V, the Quadro GV100 represents a leap forward in computing power for workstation-level applications. While these users could currently be using the TITAN V for similar workloads, as we've seen in the past, Quadro drivers generally provide big performance advantages in these sorts of applications. That said, we'd love to see NVIDIA repeat their move of bringing these optimizations to the TITAN lineup, as they did with the TITAN Xp.
As it is a Quadro, we would expect this to be NVIDIA's first Volta-powered product to provide certified, professional driver code paths for applications such as CATIA, Solid Edge, and more.
NVIDIA also heavily promoted the idea of using two of these GV100 cards in one system, connected via NVLink. Considering the lack of NVLink support on the TITAN V, this is also the first time we've seen a Volta card with display outputs support NVLink in more standard workstations.
More importantly, this announcement brings NVIDIA's RTX technology to the professional graphics market.
With popular rendering applications like V-Ray already announcing and integrating support for NVIDIA's OptiX ray tracing denoiser in their beta branch, it seems only a matter of time before we see a broad suite of professional applications supporting RTX technology in real time, for example ray-traced renders of items being designed in CAD and modeling applications.
This sort of speed represents a potentially massive win for professional users, who won't have to waste time waiting for preview renders to complete before continuing to iterate on their projects.
The NVIDIA Quadro GV100 is available now directly from NVIDIA for a price of $8,999, which puts it squarely in the same price range as the previous highest-end Quadro GP100.
93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, to which we have added what we know about a full GP100:
| | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
| --- | --- | --- | --- | --- |
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |
This table is designed for developers who are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, in roughly the same die area: 610 mm2, up from 601 mm2.
A full GP100 processor will have 60 shader modules (SMs), compared to GM200's 24, although Pascal puts half as many FP32 shaders in each SM. The GP100 part listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, up from 3072 in GM200. (The full GP100 will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell's, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPS is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
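For readers wondering where that 10.6 TeraFLOPS figure comes from, here is a quick back-of-the-envelope check using the numbers from the table above; it assumes the standard convention of counting one fused multiply-add (two floating-point operations) per CUDA core per clock.

```python
# Rough peak-FP32 estimate for the Tesla P100, using figures from the table above.
fp32_cores = 3584            # cut-down GP100 in the Tesla P100
boost_clock_hz = 1480e6      # 1480 MHz GPU Boost clock
ops_per_core_per_clock = 2   # one FMA counts as two floating-point operations

peak_flops = fp32_cores * ops_per_core_per_clock * boost_clock_hz
print(f"{peak_flops / 1e12:.1f} TFLOPS")  # ~10.6
```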
Subject: Graphics Cards | March 17, 2015 - 01:47 PM | Ryan Shrout
Tagged: pascal, nvidia, gtc 2015, GTC, geforce
At the keynote of the GPU Technology Conference (GTC) today, NVIDIA CEO Jen-Hsun Huang disclosed some more updates on the roadmap for future GPU technologies.
Most of the detail was around Pascal, due in 2016, which will introduce three new features: mixed compute precision, 3D (stacked) memory, and NVLink. Mixed precision is a method of computing in FP16, allowing calculations to run much faster at lower accuracy than full single or double precision when that accuracy is not necessary. Keeping in mind that Maxwell doesn't have an implementation with full speed DP compute (today), it would seem that NVIDIA is targeting different compute tasks moving forward. Though details are short, mixed precision would likely indicate processing cores that can handle both data types.
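To illustrate the trade-off, here is a minimal CPU-side sketch using NumPy (our own illustration, not Pascal's actual FP16 hardware path): the same matrix multiply done with FP16 storage loses a little accuracy relative to FP32, which is exactly the accuracy-for-speed-and-memory bargain mixed precision makes.

```python
# Toy illustration of the mixed-precision trade-off: FP16 halves storage and
# bandwidth but rounds more aggressively than FP32.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512)).astype(np.float32)
b = rng.standard_normal((512, 512)).astype(np.float32)

c_full = a @ b                                          # FP32 reference result
c_half = a.astype(np.float16) @ b.astype(np.float16)    # FP16 inputs and output

err = np.max(np.abs(c_half.astype(np.float32) - c_full))
print(f"max absolute error from using FP16: {err:.3f}")  # small, but not zero
```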
3D memory is the ability to stack memory on the same package as the GPU to improve overall memory bandwidth. The visual diagram that NVIDIA showed on stage indicated that Pascal would have 750 GB/s of bandwidth, compared to 300-350 GB/s on Maxwell today.
NVLink is a new way of connecting GPUs, improving on bandwidth by more than 5x over current implementations of PCI Express. They claim this will allow for connecting as many as 8 GPUs for deep learning performance improvements (up to 10x). What that means for gaming has yet to be discussed.
NVIDIA made some other interesting claims as well. Pascal will offer more than twice the performance per watt of Maxwell, even without the three new features listed above. It will also ship (in a compute-targeted product) with a 32GB memory system, compared to the 12GB of memory announced on the Titan X today. Pascal will also have 4x the performance in mixed precision compute.
Subject: Graphics Cards, Shows and Expos | March 17, 2015 - 10:31 AM | Ryan Shrout
Tagged: nvidia, video, GTC, gtc 2015
NVIDIA is streaming today's keynote from the GPU Technology Conference (GTC) on Ustream, and we have the embed below for you to take part. NVIDIA CEO Jen-Hsun Huang will reveal the details about the new GeForce GTX TITAN X but there are going to be other announcements as well, including one featuring Tesla CEO Elon Musk.
Should be interesting!