NVIDIA Teases Low Power, High Performance Xavier SoC That Will Power Future Autonomous Vehicles

Subject: Processors | October 1, 2016 - 06:11 PM |
Tagged: xavier, Volta, tegra, SoC, nvidia, machine learning, gpu, drive px 2, deep neural network, deep learning

Earlier this week at its first GTC Europe event in Amsterdam, NVIDIA CEO Jen-Hsun Huang teased a new SoC code-named Xavier that will be used in self-driving cars and feature the company's newest custom ARM CPU cores and Volta GPU. The new chip will begin sampling at the end of 2017, with products using the future Tegra processor (if NVIDIA keeps that branding) arriving as soon as 2018.

[Image: NVIDIA Xavier SoC]

NVIDIA's Xavier is promised to be the successor to the company's Drive PX 2, a water-cooled platform that pairs two Tegra X2 SoCs with two discrete Pascal MXM GPUs. The claims are even more impressive considering that NVIDIA is not only promising to replace those four processors with a single chip, but to reportedly do it at 20W – less than a tenth of the Drive PX 2's TDP!

The company has not revealed all the nitty-gritty details, but it did tease out a few bits of information. The new processor will feature 7 billion transistors, will be built on a refined 16nm FinFET process, and will consume a mere 20W. It can process two 8K HDR video streams and can hit 20 TOPS (NVIDIA's own rating for INT8 deep learning operations).
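To put that 20 TOPS figure in perspective, here is a quick back-of-envelope calculation. This is my own math, not NVIDIA's: the clock speed is a hypothetical placeholder since none has been disclosed, and the result hints at either very wide INT8 hardware or help from the extra silicon discussed below.

```python
# Back-of-envelope math on NVIDIA's claimed numbers (assumptions are mine).
XAVIER_TOPS = 20e12        # claimed INT8 deep learning ops per second
CUDA_CORES = 512           # claimed Volta CUDA core count
ASSUMED_CLOCK_HZ = 1.5e9   # hypothetical GPU clock; NVIDIA has not disclosed one

# If the GPU alone delivered the full 20 TOPS, each CUDA core would need to
# retire roughly this many INT8 operations every clock cycle:
ops_per_core_per_clock = XAVIER_TOPS / (CUDA_CORES * ASSUMED_CLOCK_HZ)
print(f"Implied INT8 ops per core per clock: {ops_per_core_per_clock:.0f}")  # ~26
```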

Specifically, NVIDIA claims that the Xavier SoC will use eight custom ARMv8 (64-bit) CPU cores (it is unclear whether these will be a refined Denver architecture or something else) and a GPU based on its upcoming Volta architecture with 512 CUDA cores. In an interesting twist, NVIDIA is also including a "Computer Vision Accelerator" (CVA) on the SoC, though the company did not go into much detail. That bit of silicon may explain how the ~300 mm² die with 7 billion transistors is able to match the 7.2 billion transistor Pascal-based Tesla P4 (2560 CUDA cores) graphics card at deep learning (tera-operations per second) tasks, along with the incremental improvements from moving to Volta, a new ARMv8 CPU architecture, and a refined 16nm FF+ process.

| | Drive PX | Drive PX 2 | NVIDIA Xavier | Tesla P4 |
|---|---|---|---|---|
| CPU | 2 x Tegra X1 (8 x A57 total) | 2 x Tegra X2 (8 x A57 + 4 x Denver total) | 1 x Xavier SoC (8 x custom ARM + 1 x CVA) | N/A |
| GPU | 2 x Tegra X1 GPUs (Maxwell, 512 CUDA cores total) | 2 x Tegra X2 GPUs + 2 x Pascal GPUs | 1 x Xavier SoC GPU (Volta, 512 CUDA cores) | 2560 CUDA cores (Pascal) |
| TFLOPS | 2.3 | 8 | ? | 5.5 |
| DL TOPS | ? | 24 | 20 | 22 |
| TDP | ~30W (2 x 15W) | 250W | 20W | up to 75W |
| Process Tech | 20nm | 16nm FinFET | 16nm FinFET+ | 16nm FinFET |
| Transistors | ? | ? | 7 billion | 7.2 billion |

For comparison, the currently available Tesla P4 based on the Pascal architecture has a TDP of up to 75W and is rated at 22 TOPS. This would suggest that Volta is a much more efficient architecture (at least for deep learning and half precision)! I am not sure how NVIDIA is able to match its GP104 with only 512 Volta CUDA cores, though its definition of a "core" could have changed and/or the CVA processor may be responsible for closing that gap. Unfortunately, NVIDIA did not disclose a TFLOPS rating for Xavier, so direct comparison is difficult, and it may not match GP104 at higher-precision workloads; it could be wholly optimized for INT8 operations rather than floating point performance. Beyond that, I will let Scott dive into those particulars once we have more information!
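The efficiency gap is easier to see as performance per watt. A quick sketch using only the figures NVIDIA has disclosed (taking the P4's "up to 75W" as its worst case):

```python
# Performance per watt from the disclosed figures: (DL TOPS, TDP in watts).
chips = {"Drive PX 2": (24, 250), "Xavier": (20, 20), "Tesla P4": (22, 75)}
for name, (tops, tdp_w) in chips.items():
    print(f"{name:>10}: {tops / tdp_w:.2f} TOPS/W")
# Drive PX 2: 0.10 TOPS/W | Xavier: 1.00 TOPS/W | Tesla P4: 0.29 TOPS/W
```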

Xavier is more of a teaser than anything at this point, and the chip could well change dramatically and/or miss its claimed performance targets. Still, it sounds promising, and it is always fun to speculate over roadmaps. It is an intriguing chip, and I am ready for more details, especially on the Volta GPU and on just what that Computer Vision Accelerator is (and whether it will be easy to program for). I am a big fan of the self-driving car, and I hope that it succeeds. Momentum certainly seems to be building as Tesla, VW, BMW, and other automakers push the envelope of what is possible, planning future cars with smart driving assists and even cars that can drive themselves. The more local computing power we can throw at automobiles the better: while massive datacenters can be used to train the neural networks, local hardware to run them and make decisions is necessary (you don't want internet latency contributing to the decision of whether to brake or not!).

I hope that NVIDIA's self-proclaimed "AI Supercomputer" turns out to deliver at least close to the performance the company claims! Stay tuned for more information as the chip gets closer to launch (hopefully more details will emerge at GTC 2017 in the US).

What are your thoughts on Xavier and the whole self-driving car future?


Source: NVIDIA

CES 2016: NVIDIA Launches DRIVE PX 2 With Dual Pascal GPUs Driving A Deep Neural Network

Subject: General Tech | January 5, 2016 - 01:17 AM |
Tagged: tegra, pascal, nvidia, driveworks, drive px 2, deep neural network, deep learning, autonomous car

NVIDIA is using the Consumer Electronics Show to launch the Drive PX 2, its latest hardware aimed at autonomous vehicles. Several NVIDIA products combine to form the company's self-driving "end-to-end solution," including DIGITS, DriveWorks, and the Drive PX 2 hardware, which together train, optimize, and run the neural network software that will allegedly be the brains of future self-driving cars (or so NVIDIA hopes).

[Image: NVIDIA Drive PX 2 self-driving car supercomputer]

The Drive PX 2 hardware is the successor to the Tegra-powered Drive PX released last year, and it represents a major jump in computational power with 12 CPU cores and two discrete "Pascal"-based GPUs! NVIDIA has not revealed the full specifications yet, but certain details are available. There are two Tegra SoCs along with two GPUs, all liquid cooled. The cooling consists of a large metal block with copper tubing winding through it and passing into what look to be external connectors that attach to a complete cooling loop (an exterior radiator, pump, and reservoir).

There are a total of 12 CPU cores: eight ARM Cortex-A57 cores and four "Denver" cores. The discrete graphics are built on the 16nm FinFET process and use the company's upcoming Pascal architecture. The total package will draw a maximum of 250 watts and will offer up to 8 TFLOPS of computational horsepower along with 24 trillion "deep learning operations per second." That last number refers to the rate at which the hardware can execute specialized deep learning instructions, which sounds like an impressive amount of power for making connections and analyzing data to classify it. Drive PX 2 is, according to NVIDIA, 10 times faster than its predecessor at running these specialized instructions and has nearly 4 times the computational horsepower in TFLOPS.
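Those two claims check out against the disclosed numbers. A quick calculation (note that the implied Drive PX deep learning rate is my inference, not a figure NVIDIA has published):

```python
# Sanity-checking NVIDIA's generational claims against its own numbers.
drive_px_tflops = 2.3     # original Drive PX (2 x Tegra X1)
drive_px2_tflops = 8.0    # Drive PX 2 (2 x Tegra X2 + 2 x Pascal GPUs)
drive_px2_dl_tops = 24.0  # trillions of deep learning ops per second

print(f"FP gain: {drive_px2_tflops / drive_px_tflops:.1f}x")  # ~3.5x, "nearly 4x"
# The claimed 10x deep learning speedup implies the original Drive PX managed
# roughly 24 / 10 = 2.4 TOPS on those specialized instructions:
print(f"Implied Drive PX rate: {drive_px2_dl_tops / 10:.1f} TOPS")
```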

Similar to the original Drive PX, the driving AI platform can accept and process the inputs of up to 12 video cameras, and it can also handle LiDAR, RADAR, and ultrasonic sensors. NVIDIA compared the Drive PX 2 to the TITAN X at AlexNet image processing: 2,800 images per second versus the consumer graphics card's 450. While possibly not the best comparison, it does make the new platform look promising.
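For scale, NVIDIA's quoted throughput works out to roughly a 6x advantage:

```python
# NVIDIA's quoted AlexNet inference throughput comparison.
drive_px2_images_per_s = 2800  # Drive PX 2
titan_x_images_per_s = 450     # GeForce GTX TITAN X (Maxwell)
print(f"{drive_px2_images_per_s / titan_x_images_per_s:.1f}x faster")  # ~6.2x
```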

[Image: NVIDIA Drive PX 2 DriveWorks]

Neural networks and machine learning are at the core of what makes autonomous vehicles possible, along with hardware powerful enough to take in a multitude of sensor data and process it fast enough. On the software side, the DriveWorks development kit includes specialized instructions and a neural network that can detect objects based on sensor input(s), identify and classify them, determine the positions of objects relative to the vehicle, and calculate the most efficient path to the destination.
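To make that pipeline concrete, here is a minimal sketch of the sense-perceive-plan loop NVIDIA describes. Every name below is a hypothetical placeholder for illustration, not the actual DriveWorks API.

```python
# Minimal sketch of the sense -> perceive -> plan loop described above.
# All names here are hypothetical placeholders, NOT the real DriveWorks API.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "pedestrian", "vehicle", "traffic_sign"
    confidence: float  # classifier score in [0, 1]
    position: tuple    # (x, y) offset from the vehicle in meters

def drive_step(sensors, detector, planner, destination):
    """One iteration: fuse sensor input, detect objects, re-plan the path."""
    frame = sensors.read()            # cameras + LiDAR/RADAR/ultrasonic data
    detections = detector.run(frame)  # DNN inference on the SoCs/GPUs
    obstacles = [d for d in detections if d.confidence > 0.5]
    return planner.plan(obstacles, destination)  # most efficient safe path
```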

Specifically, in the press release NVIDIA stated:

"This complex work is facilitated by NVIDIA DriveWorks™, a suite of software tools, libraries and modules that accelerates development and testing of autonomous vehicles. DriveWorks enables sensor calibration, acquisition of surround data, synchronization, recording and then processing streams of sensor data through a complex pipeline of algorithms running on all of the DRIVE PX 2's specialized and general-purpose processors. Software modules are included for every aspect of the autonomous driving pipeline, from object detection, classification and segmentation to map localization and path planning."

DIGITS is the platform used to train the neural network that the Drive PX 2 hardware then runs. The software is purportedly improving in both accuracy and training time, with NVIDIA achieving 96% accuracy at identifying traffic signs (using the traffic sign database from Ruhr University Bochum) after a training session lasting only four hours, as opposed to training times of days or even weeks.
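For a sense of what that training task looks like outside of DIGITS (which is essentially a web front end over GPU-accelerated frameworks), here is a hedged, modern-framework sketch of training a small traffic-sign classifier. It assumes PyTorch with a recent torchvision, which ships a loader for the GTSRB benchmark (the Ruhr University Bochum dataset); none of this is NVIDIA's code, just an illustration of the workload.

```python
# Hedged sketch of training a traffic-sign classifier (not NVIDIA's DIGITS code).
# Assumes PyTorch plus torchvision >= 0.12, which includes a GTSRB loader.
import torch
import torch.nn as nn
from torchvision import datasets, transforms

tfm = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])
train = datasets.GTSRB(root="data", split="train", download=True, transform=tfm)
loader = torch.utils.data.DataLoader(train, batch_size=128, shuffle=True)

model = nn.Sequential(  # deliberately tiny CNN for illustration
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 8 * 8, 43),  # GTSRB has 43 sign classes
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # GPU acceleration is what cuts training from days to hours
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
```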

NVIDIA claims that the initial Drive PX has been picked up by over 50 development teams (automakers, universities, software developers, et al.) interested in autonomous vehicles. Early access to Drive PX 2 development hardware is expected towards the middle of the year, with general availability of final hardware in Q4 2016.

The new Drive PX 2 is getting a serious hardware boost with the inclusion of two dedicated graphics processors (the original Drive PX was based around two Tegra X1 SoCs). That should allow automakers to really push what's possible in real time and move the self-driving car a bit closer to reality and final (self-)drivable products. I'm excited to see that vision come to fruition and am looking forward to seeing what this improved hardware will enable in the auto industry!

PC Perspective's CES 2016 coverage is sponsored by Logitech.

Follow all of our coverage of the show at http://pcper.com/ces!

Source: NVIDIA