Author:
Subject: General Tech
Manufacturer: ARM

Addressing New Markets

Machine learning is one of the hottest topics in technology, and certainly one of the fastest growing. Applications such as facial recognition and self-driving cars are driving much of the development in this area. So far we have seen CPUs and GPUs used for ML applications, but in most cases these are not the most efficient way to handle such highly parallel but computationally simple workloads. New chips have been introduced that are far more focused on machine learning, and now it seems that ARM is throwing its hat into the ring.

[Image: ml_01.png]

ARM is introducing three products under the Project Trillium brand: an ML processor, an OD (Object Detection) processor, and an ARM-developed neural network software stack. The project came as a surprise to most of us, but in hindsight it is a logical market for ARM to address, as it will be incredibly important moving forward. Currently, many applications that require machine learning are not processed at the edge, that is, in the consumer's hand or in a device right next to them. Workloads may be requested from the edge, but most of the heavy-duty processing occurs in datacenters located all around the world. That requires communication, and sometimes pretty hefty amounts of bandwidth. If either is unavailable, applications that rely on ML break down.

[Image: ml_02.png]

Click here to read the rest of the article about Project Trillium!

How deep is your learning?

Recently, we've had some hands-on time with NVIDIA's new TITAN V graphics card. Equipped with the GV100 GPU, the TITAN V has shown us some impressive results in both gaming and GPGPU compute workloads.

However, one of the most interesting areas that NVIDIA has been touting for GV100 is deep learning. With a 1.33x increase in single-precision (FP32) compute over the Titan Xp, plus the addition of specialized Tensor Cores, the TITAN V is well positioned for deep learning workflows.

In mathematics, a tensor is a multi-dimensional array of numerical values with respect to a given basis. While we won't go deep into the math behind them, tensors are a crucial data structure for deep learning applications.
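To make that concrete, here is a minimal NumPy sketch (our illustration, not from the article) of what a multi-dimensional array looks like in practice; a batch of RGB images, for instance, is commonly stored as a rank-4 tensor.

```python
import numpy as np

# A scalar is a rank-0 tensor, a vector rank-1, a matrix rank-2;
# deep learning frameworks generalize this to arbitrary rank.
# Here, a rank-3 tensor holding 2 x 3 x 4 = 24 values.
t = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
print(t.ndim)   # 3  (the rank)
print(t.shape)  # (2, 3, 4)
```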

[Image: 07.jpg]

NVIDIA's Tensor Cores aim to accelerate tensor-based math by performing mixed-precision matrix multiply-accumulate operations: each core multiplies two half-precision FP16 4x4 matrices and adds the result to an accumulator in a single fused step. The GV100 GPU contains 640 of these Tensor Cores to accelerate FP16 neural network training.
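As a rough numerical sketch (ours, not NVIDIA's code), the operation each Tensor Core fuses into one step is D = A x B + C, with FP16 inputs and FP32 accumulation:

```python
import numpy as np

# Emulate the arithmetic of a single Tensor Core operation:
# D = A x B + C on 4x4 matrices, FP16 inputs, FP32 accumulation.
# (This mimics only the math; the hardware fuses it into one step.)
A = np.random.rand(4, 4).astype(np.float16)  # FP16 input
B = np.random.rand(4, 4).astype(np.float16)  # FP16 input
C = np.zeros((4, 4), dtype=np.float32)       # FP32 accumulator

# Products of FP16 values are summed at full FP32 precision.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D.dtype)  # float32
```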

It's worth noting that this is not the first hardware dedicated to tensor operations; Google, for example, has developed its Tensor Processing Unit (TPU) for these specific functions.

Test Setup

PC Perspective Deep Learning Testbed

Processor:     AMD Ryzen Threadripper 1920X
Motherboard:   GIGABYTE X399 AORUS Gaming 7
Memory:        64GB Corsair Vengeance RGB DDR4-3000
Storage:       Samsung SSD 960 Pro 2TB
Power Supply:  Corsair AX1500i (1500 watt)
OS:            Ubuntu 16.04.3 LTS
Drivers:       AMD GPU Pro 17.50 (AMD); 387.34 (NVIDIA)

For our NVIDIA testing, we used the NVIDIA GPU Cloud (NGC) 17.12 Docker containers for both TensorFlow and Caffe2, running on our Ubuntu 16.04.3 host operating system.
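As a hedged illustration (not from the article), a quick TF 1.x sanity check like the following can confirm that the containerized TensorFlow build sees the GPU; the shapes and device string here are arbitrary, and an FP16 matmul of this kind is the sort of op eligible for Volta's Tensor Cores:

```python
import tensorflow as tf

# TF 1.x graph mode, matching the TensorFlow generation shipped in the
# 17.12 NGC containers. Build a large FP16 matmul pinned to the GPU.
with tf.device('/gpu:0'):
    a = tf.cast(tf.random_normal([4096, 4096]), tf.float16)
    b = tf.cast(tf.random_normal([4096, 4096]), tf.float16)
    c = tf.matmul(a, b)

# log_device_placement prints where each op actually ran.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(c)
```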

AMD testing was done using the hiptensorflow port from the AMD ROCm GitHub repositories.

For all tests, we are using the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) data set.
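For context, most ImageNet benchmarks feed the network through a standard resize-and-center-crop pipeline; the sketch below (our illustration, with typical default sizes rather than anything specified in the article) shows the usual shape of that preprocessing:

```python
import numpy as np
from PIL import Image

# Common ILSVRC2012 preprocessing: scale the short side to 256 pixels,
# then take a centered 224x224 crop and normalize to [0, 1].
def preprocess(path):
    img = Image.open(path).convert('RGB')
    w, h = img.size
    scale = 256.0 / min(w, h)
    img = img.resize((int(round(w * scale)), int(round(h * scale))))
    w, h = img.size
    left, top = (w - 224) // 2, (h - 224) // 2
    img = img.crop((left, top, left + 224, top + 224))
    return np.asarray(img, dtype=np.float32) / 255.0  # HWC layout
```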

Continue reading our look at deep learning performance with the NVIDIA TITAN V!