AMD Enters Machine Learning Game with Radeon Instinct Products

AMD is launching Polaris-, Fiji- and Vega-based GPU cards for machine learning.

NVIDIA has been diving into the world of machine learning for quite a while, positioning itself and its GPUs at the forefront of artificial intelligence and neural net development. Though its strategy is still filling out, I have seen products like the DIGITS DevBox stake a claim in neural net training, and platforms like Drive PX perform inference with those neural nets in self-driving cars. Until today AMD has remained mostly quiet about its plans to enter and address this growing and complex market, instead letting the compute prowess of its latest Polaris and Fiji GPUs make a general statement on its own.

The new Radeon Instinct brand of accelerators, based on current and upcoming GPU architectures, will combine with an open-source approach to software to give researchers and implementers another option for machine learning tasks.

The statistics and requirements that come along with the machine learning evolution in the compute space are mind-boggling. More than 2.5 quintillion bytes of data are generated daily and stored on phones, PCs and servers, both on-site and in the cloud. That includes 500 million tweets, 4 million hours of YouTube video, 6 billion Google searches and 205 billion emails.

Machine intelligence is going to allow software developers to address some of the most important areas of computing over the next decade. Automated cars depend on deep learning to train their driving models, medical fields can use this compute capability to diagnose cancer more accurately and quickly and to search for cures, and security systems can use neural nets to locate potential and current risk areas before they affect consumers. There are more uses for this kind of network and capability than we can imagine.

The Radeon Instinct initiative from AMD pairs purpose-built hardware accelerators with AMD’s own ROCm software stack, with the goal of machine learning frameworks and applications that are open and easy for customers to use.

AMD describes ROCm this way: "ROCm is an open-source HPC/Hyperscale-class platform for GPU computing that’s also programming-language independent. We are bringing the UNIX philosophy of choice, minimalism and modular software development to GPU computing. The new ROCm foundation lets you choose or even develop tools and a language run time for your application."

Like any reasonable compute initiative for machine learning, ROCm is built to support many GPUs, both inside a single system and across multi-server environments, and it can simplify the stack through RDMA peer-sync. Besides being built for massive scaling, it includes compilers, language runtimes and, interestingly (and importantly), support for CUDA applications. (CUDA is the GPGPU programming language developed by NVIDIA.)
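Why scaling across many GPUs and servers matters can be sketched with data-parallel training, a generic technique: each device computes a gradient on its own shard of the data, and a collective "allreduce" step averages the gradients so every device applies the same update. This is a minimal plain-Python sketch under stated assumptions; the function names are hypothetical, the devices and the collective are simulated, and none of it is ROCm’s API or a description of how RDMA peer-sync works.

```python
# Hypothetical sketch of data-parallel training, simulated in plain Python.
# Each "worker" stands in for one GPU; allreduce_mean stands in for the
# communication step that multi-GPU software stacks accelerate.
# Not ROCm's API.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    n = len(shard)
    return sum(2 * (w * x - y) * x for x, y in shard) / n

def allreduce_mean(values):
    """Simulated collective: every worker ends up holding the mean value."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(w, shards, lr=0.01):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel on each "GPU"
    averaged = allreduce_mean(grads)                # communication between devices
    return w - lr * averaged[0]                     # identical update on every worker

# Eight samples from y = 3x, split into four equal-sized shards.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges to the true slope, 3.0
```

With equal-sized shards, the averaged gradient is exactly the full-batch gradient, so adding workers changes where the arithmetic happens, not the result; the cost is the communication step, which is what fast interconnects and peer-to-peer transfers try to shrink.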

While I am still learning about this industry myself, my understanding is that current configurations generally fall into two categories of workloads: training and inference. Training is done on many high-performance servers and is the most time-consuming part of the process. NVIDIA once claimed that its GPUs delivered a 14x speedup in training over a comparable CPU, and that was back in 2014!

Inference puts the network built up during training to work on new data. This could take the form of cameras and a GPU in an automated car, or drones using DNNs for collision avoidance. Inference is still accelerated by GPUs but doesn’t require as much relative horsepower to get the job done.
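The asymmetry between the two workloads can be seen even in a toy model. In this minimal sketch (a hypothetical example using one-feature logistic regression, not tied to any AMD or NVIDIA framework), training loops over the data thousands of times and updates the weights on every sample, while inference is a single forward pass with the weights held fixed.

```python
# Toy contrast between training and inference using logistic regression
# on one feature. Hypothetical example; real DNNs have millions of weights,
# but the shape of the work is the same.
import math

def predict(w, b, x):
    """Inference: one forward pass with fixed, already-trained weights."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def train(samples, epochs=2000, lr=0.1):
    """Training: many passes over the data, updating the weights each time."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in samples:
            err = predict(w, b, x) - label  # gradient of log loss w.r.t. the logit
            w -= lr * err * x
            b -= lr * err
    return w, b

# Toy data: inputs below 2.0 are labeled 0, inputs above 2.0 are labeled 1.
samples = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]
w, b = train(samples)             # 2000 epochs x 6 samples of forward + backward work
print(predict(w, b, 3.2) > 0.5)   # inference on a new input: a single forward pass
```

Training here performs 12,000 forward-and-update steps; classifying a new input costs one. That ratio is why training favors large multi-GPU servers while inference can run on smaller, lower-power hardware.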

AMD is announcing three Instinct accelerator cards today, two for inference and one for training. The MI6 is a Polaris-based card with 5.7 TFLOPS of peak FP16 (half-precision) compute. It has 16GB of GDDR5 memory and draws about 150 watts. Its 224 GB/s memory bandwidth specification indicates that we are looking at a Polaris 10 GPU like the one found on the RX 480. AMD claims that all the Instinct cards are passively cooled, which is technically accurate, but since the fans simply move from the card to the server chassis, the claim is nebulous at best.
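The Polaris 10 inference can be sanity-checked with back-of-the-envelope arithmetic. This sketch rests on assumptions that are not part of AMD’s announcement: a full Polaris 10 with 2304 stream processors, each completing one fused multiply-add (2 FLOPs) per clock, FP16 running at the same rate as FP32 on Polaris, and an RX 480-style 256-bit GDDR5 bus. Under those assumptions it works out what clock and memory data rate the quoted figures imply.

```python
# Back-of-the-envelope check of the MI6 figures.
# ASSUMPTIONS (not from AMD's announcement): full Polaris 10 with 2304
# stream processors, 2 FLOPs (one FMA) per stream processor per clock,
# FP16 at the same rate as FP32 on Polaris, and a 256-bit GDDR5 bus.

STREAM_PROCESSORS = 2304      # assumed full Polaris 10 configuration
FLOPS_PER_SP_PER_CLOCK = 2    # one fused multiply-add = multiply + add
BUS_WIDTH_BITS = 256          # assumed RX 480-style memory interface

def implied_clock_ghz(peak_tflops):
    """Clock needed to reach a quoted peak, given the assumed shader count."""
    return peak_tflops * 1e12 / (STREAM_PROCESSORS * FLOPS_PER_SP_PER_CLOCK) / 1e9

def implied_data_rate_gbps(bandwidth_gb_s):
    """Per-pin memory data rate needed to reach a quoted bandwidth."""
    return bandwidth_gb_s * 8 / BUS_WIDTH_BITS

print(f"{implied_clock_ghz(5.7):.3f} GHz")        # 1.237 GHz
print(f"{implied_data_rate_gbps(224):.1f} Gbps")  # 7.0 Gbps per pin
```

A clock around 1.24 GHz and 7 Gbps GDDR5 on a 256-bit bus are both plausible for a Polaris 10 board, which is consistent with (though not proof of) the RX 480 comparison.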

The MI8 accelerator uses the same Fiji GPU implementation found on the Radeon R9 Nano, and its small form factor might help it find a place in unique system configurations. It offers 8.2 TFLOPS of FP16 performance and 512GB/s of memory bandwidth, though it is still limited to 4GB of memory by its first-generation HBM.
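The same arithmetic shows where both the MI8’s bandwidth and its capacity ceiling come from. This is a sketch assuming the R9 Nano’s configuration (4096 stream processors at up to 1.0 GHz) and four first-generation HBM stacks, each with a 1024-bit interface at 1 Gbps per pin and 1 GB of capacity; those parameters are assumptions here, not figures from the announcement.

```python
# Consistency check of the MI8 numbers.
# ASSUMPTIONS: R9 Nano configuration (4096 stream processors, up to 1.0 GHz)
# and four first-generation HBM stacks, each 1024 bits wide at 1 Gbps per
# pin and holding 1 GB.

def peak_tflops(stream_processors, clock_ghz, flops_per_clock=2):
    """Theoretical peak: ALUs x clock (GHz) x 2 FLOPs per FMA, in TFLOPS."""
    return stream_processors * clock_ghz * flops_per_clock / 1000

def hbm_totals(stacks, bits_per_stack=1024, gbps_per_pin=1.0, gb_per_stack=1):
    """Return (bandwidth in GB/s, capacity in GB) for a set of HBM stacks."""
    bandwidth = stacks * bits_per_stack * gbps_per_pin / 8
    capacity = stacks * gb_per_stack
    return bandwidth, capacity

print(round(peak_tflops(4096, 1.0), 1))  # 8.2
print(hbm_totals(4))                     # (512.0, 4)
```

The wide interface is what buys the bandwidth, but with 1 GB per stack and four stacks on the package, capacity tops out at 4GB.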

Obviously the card of most interest is the MI25, a much higher-performance accelerator aimed at training. It uses one of the yet-to-be-announced Vega 10 GPUs based on AMD’s upcoming Vega architecture. AMD was very tight-lipped about the specifications and performance of this card (we don’t even know how much memory it will have), though we can estimate peak compute from the full-system performance metrics. Based on the servers that were shown, a Vega 10 GPU will likely offer 12.5 TFLOPS of single-precision (FP32) compute. For comparison, the GP102-based Titan X Pascal is rated at 11.0 TFLOPS, so in theory Vega 10 should exceed it. Keep in mind, though, that at least with prior GCN architectures, AMD has needed more peak TFLOPS than NVIDIA to deliver the same real-world performance. (NVIDIA cards tend to offer better in-game performance at the same theoretical peak compute throughput.)
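Putting the estimated peaks side by side makes the theoretical gap concrete. This sketch uses the two FP32 figures from the text; the FP16 line assumes that Vega’s packed math executes two FP16 operations per FP32 lane per clock. Peak numbers say nothing about delivered performance, which is exactly the caveat above.

```python
# Comparing theoretical peaks quoted in the text.
# ASSUMPTION: Vega's packed math runs two FP16 operations per FP32 lane
# per clock. Peak figures are not delivered performance.

VEGA10_FP32 = 12.5          # TFLOPS, estimated from AMD's system metrics
TITAN_X_PASCAL_FP32 = 11.0  # TFLOPS, rated

def packed_fp16(fp32_tflops, ops_per_lane=2):
    """FP16 peak if each FP32 lane handles ops_per_lane FP16 values."""
    return fp32_tflops * ops_per_lane

advantage = VEGA10_FP32 / TITAN_X_PASCAL_FP32 - 1
print(f"Vega 10 FP32 peak advantage: {advantage:.1%}")           # 13.6%
print(f"Vega 10 FP16 peak: {packed_fp16(VEGA10_FP32)} TFLOPS")   # 25.0
```

On paper that is roughly a 14% FP32 edge; whether it survives contact with real workloads depends on how efficiently Vega turns peak TFLOPS into throughput.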

Instinct accelerators are going to be built and supported by AMD directly, taking a page from what NVIDIA has done with most of its non-consumer graphics lines. This should give AMD more control over the messaging and branding for this line, something that system integrators spending millions of dollars on machine learning can appreciate.

Along with the hardware comes MIOpen, a deep learning library built by AMD to take advantage of the GCN architecture. MIOpen sits at the same level of the stack as libraries like C++ STL and NCCL, bridging between the ROCm platform and its programming languages below and common frameworks like Caffe and TensorFlow above.
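To make "a library for deep learning" less abstract: frameworks like Caffe and TensorFlow hand off core operations such as convolution to a primitives library that runs them on the GPU. The sketch below is a naive plain-Python 2D convolution (strictly a cross-correlation, as deep learning frameworks define it) showing what one such primitive computes. It is an illustration of the operation only; it is not MIOpen’s API or implementation, and the kernel values are arbitrary.

```python
# Naive reference for one deep learning primitive: 2D convolution
# (cross-correlation, as DL frameworks define it), valid padding, stride 1.
# Illustration only; GPU libraries implement this very differently and much
# faster. Not MIOpen's API.

def conv2d(image, kernel):
    """Slide the kernel over the image and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(image[i + a][j + b] * kernel[a][b]
                            for a in range(kh) for b in range(kw))
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
diagonal_diff = [[1, 0],
                 [0, -1]]  # arbitrary 2x2 kernel: top-left minus bottom-right
print(conv2d(image, diagonal_diff))  # [[-4, -4], [-4, -4]]
```

A DNN runs operations like this over large batches and many channels at every layer, which is why a tuned library for a specific GPU architecture matters so much to framework performance.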

AMD did supply one relative performance metric using DeepBench GEMM, a common benchmark for deep learning systems. Compared to the Titan X Pascal, the MI8 (based on the Radeon R9 Nano) comes in just slightly ahead, while the Vega-based MI25 is about 30% faster than NVIDIA’s card. As far as I can tell, this benchmark used FP32 data types rather than FP16, so it doesn’t take advantage of the packed, double-rate FP16 math that Vega NCUs offer. (To be clear, the FP16 performance of the Pascal-based Titan X is awful, with a 1:64 ratio to single-precision math.)
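GEMM (general matrix multiply) results are usually reported as throughput, and the conversion is simple: multiplying an M×K matrix by a K×N matrix takes M·N·K multiply-adds, conventionally counted as 2·M·N·K floating-point operations, divided by the elapsed time. This sketch shows that bookkeeping on a naive pure-Python GEMM; the 64×64 size is an arbitrary choice, and the measured number says nothing about GPU or DeepBench results. It also works out what the 1:64 FP16 ratio means for the Titan X Pascal’s 11.0 TFLOPS FP32 rating.

```python
# How a GEMM benchmark turns a timing into a throughput figure.
# C = A @ B with A (M x K) and B (K x N) costs M*N*K multiply-adds,
# conventionally counted as 2*M*N*K FLOPs. Sizes here are arbitrary.
import time

def gemm(a, b):
    """Naive matrix multiply on lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def gemm_tflops(m, n, k, seconds):
    """Throughput in TFLOPS for an M x N x K GEMM that took `seconds`."""
    return 2 * m * n * k / seconds / 1e12

def fp16_peak(fp32_tflops, ratio):
    """FP16 peak given an FP16:FP32 rate ratio."""
    return fp32_tflops * ratio

m = n = k = 64
a = [[1.0] * k for _ in range(m)]
b = [[1.0] * n for _ in range(k)]
start = time.perf_counter()
c = gemm(a, b)
elapsed = time.perf_counter() - start
print(f"{gemm_tflops(m, n, k, elapsed):.6f} TFLOPS in pure Python")
print(f"Titan X Pascal FP16 peak: {fp16_peak(11.0, 1 / 64):.3f} TFLOPS")  # 0.172
```

At roughly 0.17 TFLOPS of FP16, the Titan X Pascal would be a poor FP16 reference point, which makes an FP32 comparison the more meaningful one for that card.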

Not one to let an opportunity slip by, AMD also used the Instinct announcement to show its upcoming Naples platform, the server implementation of the Zen architecture. No details were given, but the claim that Naples is “optimized” for GPU and accelerator throughput likely just points to an increase in available PCIe bandwidth and connectivity.

AMD’s stance on systems differs from what NVIDIA has shown with GP100 so far: AMD will provide the hardware to system designers and let them create machines custom-tailored for machine learning. The NVIDIA DGX-1 offers a stunning amount of performance, but it costs $129,000, and AMD balked at the idea that compute capability should be priced that high. Partners were on hand and on stage to talk about what working with AMD Instinct should bring, and they showed off a few system integrations as well.

All three of the partner systems shown use Instinct MI25 Vega-based GPUs and will vary in price along with their impressive stated compute (FP16) rates. Researchers able to get 3 PETAFLOPS of compute capability in a standard 42U rack will have plenty of horsepower to develop the next generation of deep neural networks.
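A rough sense of what 3 PETAFLOPS per rack implies can be had with simple division. This sketch assumes the 3 PFLOPS figure is FP16, that each MI25 delivers about 25 TFLOPS of FP16 (the 12.5 TFLOPS FP32 estimate doubled by packed math), and ignores everything in a rack other than GPU compute; none of these numbers are AMD-confirmed GPU counts.

```python
# Rough rack math under stated assumptions:
# - the 3 PFLOPS rack figure is FP16
# - each MI25 delivers ~25 TFLOPS FP16 (12.5 TFLOPS FP32 x 2, packed math)
# - only GPU compute counts toward the rack total
import math

RACK_FP16_PFLOPS = 3.0
RACK_UNITS = 42
MI25_FP16_TFLOPS = 12.5 * 2

gpus_needed = math.ceil(RACK_FP16_PFLOPS * 1000 / MI25_FP16_TFLOPS)
print(gpus_needed)                          # 120
print(round(gpus_needed / RACK_UNITS, 2))   # 2.86 GPUs per rack unit
```

Roughly 120 GPUs, or close to three per rack unit, gives a feel for the density those partner designs would need to hit the claimed figure.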

This is a move that AMD and the Radeon Technologies Group needed to make. Though machine learning is never going to eclipse the consumer or professional markets in unit sales, the profit margins on configurations built for it are incredibly high. Also, the name recognition and halo effect that come from leading the training space will trickle down into inference platforms, as well as into other markets that interact directly with businesses and consumers. Building up the Instinct brand makes business sense, marketing sense and competitive sense, and it seems likely that AMD can make an impact in DNN and machine learning fields with its combination of existing and upcoming GPU hardware and a push for an open, community-driven software ecosystem.