Subject: General Tech | May 31, 2018 - 01:41 PM | Jeremy Hellstrom
Tagged: jen-hsun huang, GTC, HPC, nvswitch, tesla v100
Jen-Hsun Huang has a busy dance card right now, with several interesting tidbits hitting the news recently, including his statement in this DigiTimes post that GPU development is outstripping Moore's law. GTC Taiwan 2018 kicked off yesterday, with NVIDIA showing off their brand new HGX-2 platform, which contains both AIs and HPCs with Deep Learnings a sure bet as well. Buzzwords aside, the new accelerator is made up of 16 Tesla V100 GPUs, a mere half terabyte of memory and NVIDIA's NVSwitch. Specialized products from Lenovo and Supermicro, to name a few, as well as cloud providers, will also be picking up this newest piece of kit from NVIDIA.
For those less interested in HPC, there is an interesting tidbit of information about an event at Hot Chips: on August 20th Stuart Oberman will be talking about NVIDIA’s Next Generation Mainstream GPU, with other sessions dealing with their IoT and fabric connections.
"But demand for that power is "growing, not slowing," thanks to AI, Huang said. "Before this time, software was written by humans and software engineers can only write so much software, but machines don't get tired," he said, adding that every single company in the world that develops software will need an AI supercomputer."
Subject: Graphics Cards | March 29, 2018 - 09:52 PM | Scott Michaud
Tagged: nvidia, GTC, gp102, quadro p6000
At GTC 2018, Walt Disney Imagineering unveiled a work-in-progress clip of their upcoming Star Wars: Galaxy’s Edge attraction, which is expected to launch next year at Disneyland and Walt Disney World Resort. The cool part about this ride is that it will be using Unreal Engine 4 with eight GP102-based Quadro P6000 graphics cards. NVIDIA also reports that Disney has donated the code back to Epic Games to help them with their multi-GPU scaling in general – a win for us consumers… in a more limited fashion.
See? SLI doesn’t need to be limited to two cards if you have a market cap of $100 billion USD.
Another interesting angle to this story is how typical PC components are contributing to these large experiences. Sure, Quadro hardware isn’t exactly cheap, but it can be purchased through typical retail channels and it allows the company to focus their engineering time elsewhere.
Ironically, this also comes about two decades after location-based entertainment started to decline… but, you know, it’s Disneyland and Disney World. They’re fine.
Subject: General Tech | March 29, 2018 - 03:10 PM | Tim Verry
Tagged: project trillium, nvidia, machine learning, iot, GTC 2018, GTC, deep learning, arm, ai
During GTC 2018, NVIDIA and ARM announced a partnership that will see ARM integrate NVIDIA's NVDLA deep learning inferencing accelerator into the company's Project Trillium machine learning processors. The NVIDIA Deep Learning Accelerator (NVDLA) is an open source modular architecture that is specifically optimized for inferencing operations such as object and voice recognition. Bringing that acceleration to the wider ARM ecosystem through Project Trillium will enable a massive number of smarter phones, tablets, Internet-of-Things, and embedded devices that can do inferencing at the edge, which is to say without the complexity and latency of relying on cloud processing. This means potentially smarter voice assistants (e.g. Alexa, Google), doorbell cameras, lighting, and security around the home, as well as better AR, natural translation, and assistive technologies on your phone when you are out and about.
Karl Freund, lead analyst for deep learning at Moor Insights & Strategy, was quoted in the press release as stating:
“This is a win/win for IoT, mobile and embedded chip companies looking to design accelerated AI inferencing solutions. NVIDIA is the clear leader in ML training and Arm is the leader in IoT end points, so it makes a lot of sense for them to partner on IP.”
ARM's Project Trillium was announced back in February and is a suite of IP for processors optimized for parallel low latency workloads. It includes a Machine Learning processor, an Object Detection processor, and neural network software libraries. NVDLA is a highly modular and configurable hardware and software platform based upon the Xavier SoC that can feature a convolution core, single data processor, planar data processor, channel data processor, and data reshape engines. The NVDLA can be configured with all or only some of those elements, and chip designers can independently scale them up or down depending on what processing acceleration they need for their devices. NVDLA connects to the main system processor over a control interface and through two AXI memory interfaces (one optional) that connect to system memory and (optionally) dedicated high bandwidth memory (not necessarily HBM, but just its own SRAM, for example).
NVDLA is presented as a free and open source architecture that promotes a standard way to design deep learning inferencing that can accelerate operations to infer results from trained neural networks (with the training being done on other devices perhaps by the DGX-2). The project, which hosts the code on GitHub and encourages community contributions, goes beyond the Xavier-based hardware and includes things like drivers, libraries, TensorRT support (upcoming) for Google's TensorFlow acceleration, testing suites and SDKs as well as a deep learning training infrastructure (for the training side of things) that is compatible with the NVDLA software and hardware, and system integration support.
Bringing the "smarts" of smart devices to the local hardware and closer to the users should mean much better performance and using specialized accelerators will reportedly offer the performance levels needed without blowing away low power budgets. Internet-of-Things (IoT) and mobile devices are not going away any time soon, and the partnership between NVIDIA and ARM should make it easier for developers and chip companies to offer smarter (and please tell me more secure!) smart devices.
- NVDLA Primer
- Project Trillium: Machine Learning on ARM
- NVIDIA Announces DGX-2 with 16 GV100s & 8 100Gb NICs
- GTC 2018: NVIDIA Announces Volta-Powered Quadro GV100
- NVIDIA Teases Low Power, High Performance Xavier SoC That Will Power Future Autonomous Vehicles
- NVIDIA Launches Jetson TX2 With Pascal GPU For Embedded Devices
- ARM Announces Project Trillium, a New Dedicated AI Processing Family
Subject: General Tech | March 27, 2018 - 03:30 PM | Ken Addison
Tagged: nvidia, GTC, quadro, gv100, GP100, tesla, titan v, v100, votla
One of the big missing markets for NVIDIA with their slow rollout of the Volta architecture was professional workstations. Today, NVIDIA announced they are bringing Volta to the Quadro family with the Quadro GV100 card.
Powered by the same GV100 GPU that was announced at last year's GTC in the Tesla V100, and late last year in the Titan V, the Quadro GV100 represents a leap forward in computing power for workstation-level applications. While these users could currently be using the TITAN V for similar workloads, as we've seen in the past, Quadro drivers generally provide big performance advantages in these sorts of applications. That said, we'd love to see NVIDIA repeat their move of bringing these optimizations to the TITAN lineup as they did with the TITAN Xp.
As it is a Quadro, we would expect this to be NVIDIA's first Volta-powered product which provides certified, professional driver code paths for applications such as CATIA, Solidedge, and more.
NVIDIA also heavily promoted the idea of using two of these GV100 cards in one system, utilizing NVLink. Considering the lack of NVLink support for the TITAN V, this is also the first time we've seen a Volta card with display outputs supporting NVLink in more standard workstations.
More importantly, this announcement brings NVIDIA's RTX technology to the professional graphics market.
With popular rendering applications like V-Ray already announcing and integrating support for NVIDIA's OptiX raytracing denoiser in their beta branch, it seems only a matter of time before we'll see a broad suite of professional applications supporting RTX technology for real-time use, for example raytraced renders of items being designed in CAD and modeling applications.
This sort of speed represents a potentially massive win for professional users, who won't have to waste time waiting for preview renderings to complete before they continue iterating on their projects.
The NVIDIA Quadro GV100 is available now directly from NVIDIA for a price of $8,999, which puts it squarely in the same price range as the previous highest-end Quadro GP100.
93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, to which we added what we know about a full GP100:
| |Tesla K40|Tesla M40|Tesla P100|Full GP100|
|---|---|---|---|---|
|GPU|GK110 (Kepler)|GM200 (Maxwell)|GP100 (Pascal)|GP100 (Pascal)|
|FP32 CUDA Cores / SM|192|128|64|64|
|FP32 CUDA Cores / GPU|2880|3072|3584|3840|
|FP64 CUDA Cores / SM|64|4|32|32|
|FP64 CUDA Cores / GPU|960|96|1792|1920|
|Base Clock|745 MHz|948 MHz|1328 MHz|TBD|
|GPU Boost Clock|810/875 MHz|1114 MHz|1480 MHz|TBD|
|Memory Interface|384-bit GDDR5|384-bit GDDR5|4096-bit HBM2|4096-bit HBM2|
|Memory Size|Up to 12 GB|Up to 24 GB|16 GB|TBD|
|L2 Cache Size|1536 KB|3072 KB|4096 KB|TBD|
|Register File Size / SM|256 KB|256 KB|256 KB|256 KB|
|Register File Size / GPU|3840 KB|6144 KB|14336 KB|15360 KB|
|TDP|235 W|250 W|300 W|TBD|
|Transistors|7.1 billion|8 billion|15.3 billion|15.3 billion|
|GPU Die Size|551 mm²|601 mm²|610 mm²|610 mm²|
|Manufacturing Process|28 nm|28 nm|16 nm|16 nm|
This table is designed for developers who are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16 nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm², up from 601 mm².
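To put a number on "about twice the transistors in roughly the same area," here is a quick sketch of the density arithmetic. The transistor counts and die sizes are copied from the table above; the helper name is purely illustrative, and the result is an average over the whole die, not a statement about any particular block of the chip.

```python
# Figures are taken from the comparison table; the function name is hypothetical.
def density_millions_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Average transistor density in millions of transistors per square mm."""
    return transistors / die_area_mm2 / 1e6


gm200 = density_millions_per_mm2(8.0e9, 601)    # Maxwell, 28 nm
gp100 = density_millions_per_mm2(15.3e9, 610)   # Pascal, 16 nm
density_ratio = gp100 / gm200

print(f"GM200: {gm200:.1f} M transistors per mm^2")
print(f"GP100: {gp100:.1f} M transistors per mm^2")
print(f"GP100 vs GM200: {density_ratio:.2f}x")
```

By this rough measure the density gain works out to a little under 2x, which lines up with the transistor count nearly doubling while the die barely grew.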
A full GP100 processor will have 60 shader modules (SMs), compared to GM200's 24, although each Pascal SM holds half as many shaders. The GP100 part that is listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPs is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
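The core counts and the TeraFLOPs comparison can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch, not NVIDIA's own methodology: it assumes each FP32 CUDA core retires one fused multiply-add (counted as two operations) per clock, that the 10.6 TeraFLOPs figure corresponds to the 1480 MHz boost clock, and that the overclocked GM200 holds ~1390 MHz across all 3072 cores. The function names are illustrative.

```python
def fp32_cores(sm_count: int, cores_per_sm: int) -> int:
    """Total FP32 CUDA cores for a given number of enabled SMs."""
    return sm_count * cores_per_sm


def peak_fp32_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical peak, assuming one FMA (2 floating-point ops) per core per clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12


p100_cores = fp32_cores(60 - 4, 64)     # four of the sixty SMs are disabled
full_gp100_cores = fp32_cores(60, 64)
gm200_cores = fp32_cores(24, 128)

p100_tflops = peak_fp32_tflops(p100_cores, 1480)       # boost clock from the table
gm200_oc_tflops = peak_fp32_tflops(gm200_cores, 1390)  # overclock quoted above
advantage = p100_tflops / gm200_oc_tflops

print(f"Tesla P100: {p100_cores} cores, {p100_tflops:.1f} TFLOPS")
print(f"Full GP100: {full_gp100_cores} cores")
print(f"GM200 at 1390 MHz: {gm200_cores} cores, {gm200_oc_tflops:.1f} TFLOPS")
print(f"P100 advantage: {(advantage - 1) * 100:.0f}%")
```

Under these assumptions the gap over an overclocked GM200 lands in the 20 to 25 percent range, so the exact percentage depends on which clocks you plug in.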
Subject: Graphics Cards | March 17, 2015 - 01:47 PM | Ryan Shrout
Tagged: pascal, nvidia, gtc 2015, GTC, geforce
At the keynote of the GPU Technology Conference (GTC) today, NVIDIA CEO Jen-Hsun Huang disclosed some more updates on the roadmap for future GPU technologies.
Most of the detail was around Pascal, due in 2016, which will introduce three new features: mixed compute precision, 3D (stacked) memory, and NVLink. Mixed precision is a method of computing in FP16, allowing calculations to run much faster at lower accuracy when full single or double precision is not necessary. Keeping in mind that Maxwell doesn't have an implementation with full speed DP compute (today), it would seem that NVIDIA is targeting different compute tasks moving forward. Though details are short, mixed precision would likely indicate processing cores that can handle both data types.
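To get a feel for the accuracy side of that trade-off, the sketch below rounds a value to 16-bit and 32-bit floating point using only Python's standard library. It assumes the FP16 in question is the IEEE 754 half-precision format (which is what the `struct` module's `e` code implements) and says nothing about speed: it only shows how much precision is given up and that each value takes half the storage. The helper names are illustrative.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_fp32(value: float) -> float:
    """Round a Python float to IEEE 754 single precision and convert it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


x = 0.1
err16 = abs(to_fp16(x) - x)
err32 = abs(to_fp32(x) - x)

print(f"FP16 stores 0.1 as {to_fp16(x)!r} (error {err16:.2e})")
print(f"FP32 stores 0.1 as {to_fp32(x)!r} (error {err32:.2e})")
print(f"Bytes per value: FP16={struct.calcsize('<e')}, FP32={struct.calcsize('<f')}")
```

The rounding error in FP16 is several orders of magnitude larger than in FP32, which is acceptable for workloads that tolerate noise (deep learning being the usual example) and a problem for those that do not.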
3D memory is the ability to stack memory on the same package as the GPU to improve overall memory bandwidth. The visual diagram that NVIDIA showed on stage indicated that Pascal would have 750 GB/s of bandwidth, compared to 300-350 GB/s on Maxwell today.
NVLink is a new way of connecting GPUs, improving on bandwidth by more than 5x over current implementations of PCI Express. They claim this will allow for connecting as many as 8 GPUs for deep learning performance improvements (up to 10x). What that means for gaming has yet to be discussed.
NVIDIA made some other interesting claims as well. Pascal will deliver more than 2x the performance per watt of Maxwell, even without the three new features listed above. It will also ship (in a compute targeted product) with a 32GB memory system compared to the 12GB of memory announced on the Titan X today. Pascal will also have 4x the performance in mixed precision compute.
Subject: Graphics Cards, Shows and Expos | March 17, 2015 - 10:31 AM | Ryan Shrout
Tagged: nvidia, video, GTC, gtc 2015
NVIDIA is streaming today's keynote from the GPU Technology Conference (GTC) on Ustream, and we have the embed below for you to take part. NVIDIA CEO Jen-Hsun Huang will reveal the details about the new GeForce GTX TITAN X but there are going to be other announcements as well, including one featuring Tesla CEO Elon Musk.
Should be interesting!
Subject: General Tech | May 1, 2014 - 02:47 PM | Ken Addison
Tagged: nvidia, shield, Portal, GTC, Cake, lie
Sometimes I feel like this job just keeps getting stranger and stranger. Today is no exception.
After receiving just a tracking number and no additional information from NVIDIA earlier this week, the mystery package finally arrived today. Upon initial inspection we had no idea what to expect.
When we opened the box, we were greeted by a polystyrene cooler with the logo of Bake Me a Wish, which only served to confuse us more.
As we opened the cooler, and the subsequent box inside of it, things started to make more sense.
Inside the box, we were greeted by a chocolate cake, accompanied by a card from NVIDIA.
As you may remember at this year's GTC Conference, NVIDIA announced that they had ported Valve's Portal to Android and would be releasing it for SHIELD. Today we were greeted with a reminder of that, and the message that we should be able to try it for ourselves.
A teaser from this year's GTC Keynote
While we can't talk about our experiences with Portal just yet, stay tuned to PC Perspective for more coverage of the NVIDIA SHIELD and Portal very soon!
Subject: General Tech | April 8, 2014 - 05:03 PM | Tim Verry
Tagged: research, nvidia, GTC, gpgpu, global impact award
During the GPU Technology Conference last month, NVIDIA introduced a new annual grant called the Global Impact Award. The grant awards $150,000 to researchers using NVIDIA GPUs to research issues with worldwide impact such as disease research, drug design, medical imaging, genome mapping, urban planning, and other "complex social and scientific problems."
NVIDIA will be presenting the Global Impact Award to the winning researcher or non-profit institution at next year's GPU Technology Conference (GTC 2015). Individual researchers, universities, and non-profit research institutions that are using GPUs as a significant enabling technology in their research are eligible for the grant. Both third party and self-nominations (.doc form) are accepted, with the nominated candidates being evaluated based on several factors including the level of innovation, social impact, and current state of the research and its effectiveness in approaching the problem. Submissions for nominations are due by December 12, 2014, with the finalists being announced by NVIDIA on March 13, 2015. NVIDIA will then reveal the winner of the $150,000 grant at GTC 2015 (April 28, 2015).
The researcher, university, or non-profit firm can be located anywhere in the world, and the grant money can be assigned to a department, initiative, or a single project. The massively parallel nature of modern GPUs makes them ideal for many types of research with scalable projects, and I think the Global Impact Award is a welcome incentive to encourage the use of GPGPU in applicable research projects. I am interested to see what the winner will do with the money and where the research leads.
More information on the Global Impact Award can be found on the NVIDIA website.
Subject: General Tech | March 27, 2014 - 02:42 PM | Ken Addison
Tagged: W9100, video, titan z, poseidon 780, podcast, Oculus, nvidia, GTC, GDC
PC Perspective Podcast #293 - 03/27/2014
Join us this week as we discuss the NVIDIA Titan-Z, ASUS ROG Poseidon 780, News from OculusVR and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath and Allyn Malventano
Week in Review:
0:37:07 This podcast is brought to you by Coolermaster, and the CM Storm Pulse-R Gaming Headset
News items of interest:
Hardware/Software Picks of the Week:
Josh: Certainly not a Skype Connection to the Studio
Allyn: Continuous ink conversions