NVIDIA Jetson TX1 Will Power Autonomous Embedded Devices With Machine Learning

Subject: General Tech | November 12, 2015 - 02:46 AM |
Tagged: Tegra X1, nvidia, maxwell, machine learning, jetson, deep neural network, CUDA, computer vision

Nearly two years ago, NVIDIA unleashed the Jetson TK1, a tiny module for embedded systems built around the company's Tegra K1 "super chip." That chip was the company's first foray into CUDA-powered embedded systems capable of machine learning tasks such as object recognition and 3D scene processing, enabling things like accident avoidance and self-parking cars.

Now, NVIDIA is releasing an even more powerful kit called the Jetson TX1. The new development platform covers two pieces of hardware: the credit-card-sized Jetson TX1 module and a larger Jetson TX1 Development Kit that the module plugs into, providing plenty of I/O options and pin-outs. The dev kit can be used by software developers or for prototyping, while the module alone can be used in finalized embedded products.


NVIDIA foresees the Jetson TX1 being used in drones, autonomous vehicles, security systems, medical devices, and IoT devices, coupled with deep neural networks, machine learning, and computer vision software. Devices would be able to learn from their environment in order to navigate safely, identify and classify objects of interest, and perform 3D mapping and scene modeling. NVIDIA partnered with several companies on proof-of-concept projects, including Kespry and Stereolabs.

Using the TX1, Kespry was able to use drones to classify and track construction equipment moving around a construction site in real time. Because sites and weather conditions vary, the drone could not be explicitly programmed for the exact environment; instead, machine learning and computer vision allowed it to navigate the site, while a deep neural network identified and classified the types of equipment it saw through its cameras. Meanwhile, Stereolabs used high resolution cameras and depth sensors to capture photos of buildings and then used software to reconstruct the 3D scenes virtually for editing and modeling. You can find other proof-of-concept videos, including upgrades that make existing drones more autonomous, posted here.

From the press release:

"Jetson TX1 will enable a new generation of incredibly capable autonomous devices," said Deepu Talla, vice president and general manager of the Tegra business at NVIDIA. "They will navigate on their own, recognize objects and faces, and become increasingly intelligent through machine learning. It will enable developers to create industry-changing products."

But what about the hardware side of things? Well, the TX1 is a respectable leap in compute performance. Rated at 1 TFLOPS of FP16 compute, the TX1 pairs four ARM Cortex-A57 and four ARM Cortex-A53 64-bit CPU cores with a 256-core Maxwell-based GPU. That is definitely respectable for its size and low power consumption, especially considering NVIDIA claims the SoC can best the Intel Skylake Core i7-6700K in certain workloads (thanks to the GPU portion). The module further contains 4 GB of LPDDR4 memory and 16 GB of eMMC flash storage.
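As a sanity check on that 1 TFLOPS figure, peak throughput can be estimated from core count and clock speed. A minimal sketch follows; the ~1 GHz clock and the 2x FP16 rate are assumptions based on publicly reported Maxwell specifications, not NVIDIA-confirmed figures for this module:

```python
# Back-of-the-envelope peak-throughput estimate for the TX1's GPU.
# Assumptions (not official NVIDIA specs for this module):
#   - ~1.0 GHz GPU boost clock
#   - each CUDA core retires one FMA per cycle (2 floating-point ops)
#   - FP16 runs at 2x the FP32 rate (two-wide packed FP16 on this Maxwell part)
CUDA_CORES = 256
CLOCK_GHZ = 1.0          # assumed clock
FLOPS_PER_FMA = 2        # one fused multiply-add counts as 2 ops

fp32_gflops = CUDA_CORES * FLOPS_PER_FMA * CLOCK_GHZ   # ~512 GFLOPS FP32
fp16_gflops = fp32_gflops * 2                          # ~1 TFLOPS FP16
print(fp32_gflops, fp16_gflops)
```

This lines up with AnandTech's ~512 GFLOPS FP32 estimate and NVIDIA's 1 TFLOPS FP16 marketing number.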

In short, while on-module storage has not increased, RAM has been doubled; versus the Jetson TK1's 2 GB of DDR3 and 192-core Kepler GPU, FP16 compute performance has roughly tripled and FP32 performance has jumped by roughly 57% (per AnandTech's estimate). The TX1 also moves to a smaller 20nm process node (versus 28nm), and the chip is said to use "very little power." Networking support includes 802.11ac Wi-Fi and Gigabit Ethernet. The chart below outlines the major differences between the two platforms.

                             Jetson TX1                    Jetson TK1
GPU (Architecture)           256-core (Maxwell)            192-core (Kepler)
CPU                          4x ARM Cortex-A57 + 4x A53    "4+1" ARM Cortex-A15 "r3"
RAM                          4 GB LPDDR4                   2 GB LPDDR3
eMMC                         16 GB                         16 GB
Compute Performance (FP16)   1 TFLOPS                      326 GFLOPS
Compute Performance (FP32)   512 GFLOPS (AnandTech est.)   326 GFLOPS (NVIDIA's number)
Manufacturing Node           20nm                          28nm
Launch Pricing               $299                          $192

The TX1 will run the Linux for Tegra operating system and supports the usual suspects of CUDA 7.0, cuDNN, and VisionWorks development software, as well as the latest graphics APIs (OpenGL 4.5, OpenGL ES 3.1, and Vulkan).

NVIDIA is continuing to push for CUDA Everywhere, and the Jetson TX1 looks to be a more mature product that builds on the TK1. The huge leap in compute performance should enable even more interesting projects and bring more sophisticated automation and machine learning to smaller and more intelligent devices.

For those interested, the Jetson TX1 Development Kit (the full I/O development board with bundled module) will be available for pre-order today at $599, while the TX1 module itself will be available soon at approximately $299 each in orders of 1,000 or more (similar to Intel's tray pricing).

With CUDA 7, the GPU can apparently be used for general-purpose processing as well, which may open up some doors that were not possible before in such a small device. I am interested to see what happens with NVIDIA's embedded device play and what kinds of automated hardware end up powered by the tiny SoC and its beefy graphics.

Source: NVIDIA

November 12, 2015 | 07:49 AM - Posted by Anonymous (not verified)

1 TFLOPS is for 16 bit only. 32-bit is 500 GFlops

November 12, 2015 | 05:32 PM - Posted by Tim Verry

Hi, thanks for pointing that out. I did some looking around online, and while I wasn't able to find NVIDIA's numbers specifically for FP16 and FP32 for both chips, I was able to find some calculations done by AnandTech, which I linked to in the article. It looks like FP32 compute has increased to the ~500 GFLOPS you mentioned thanks to the more efficient Maxwell cores, and has roughly tripled for FP16 work, which seems to be where NVIDIA is getting the 1 TFLOPS number. Less impressive than I at first thought, heh, but still a respectable jump.

I've updated the article to specify FP16 and FP32 numbers.

November 12, 2015 | 11:14 AM - Posted by Anonymous (not verified)

CUDA is why Nvidia can't get into the mobile market that much, so Nvidia better have support for Vulkan on this development board. Even the automotive systems are going to have to have an open-source software solution, for code auditing, even for CUDA API code, after all that VW emissions shenanigans! That proprietary software for transportation systems is going to have to be audited, so it will be much easier just to have a single standardized and open code base, and Vulkan will allow for both graphics and GPU compute through the same API. The Khronos group's open API standards are about code maintainability, standardization, and portability, as well as code certification and auditing, with APIs that are definitely source-code auditable.

November 12, 2015 | 12:10 PM - Posted by Anonymous (not verified)

Yeah Vulkan will be as important to automobiles as OpenGL was to gaming.

November 12, 2015 | 02:41 PM - Posted by Anonymous (not verified)

Vulkan is not just for graphics it's for compute, but you just stick with the Green Goblin Gimper's vendor lock-in. Vulkan is Mantle, so look at how that TRUE hardware based asynchronous compute is working out for even the proprietary DX12(lots of Mantle calls with the functions renamed in DX12), with the red team's graphics cards performing better on the benchmarks! And Vulkan will compete with DX12 on that non gimped NON Green Goblin hardware. Nvidia sure hired AMD's HSA chief engineer in a hurry!

Better fix that in software context switching Green Goblin, those idle GPU FP/other cores are not kept working efficiently, and it's not for lack of GPU/compute processing threads waiting for dispatch! GPU processor thread context switching needs to be in the hardware, and not poorly implemented in software! If the Green Goblin wants in on the VR action then the hardware gimping has to stop. Steam OS is here and Vulkan will be used by Valve, and the entire mobile market, among others!