AMD Introduces Radeon Pro SSG: A Professional GPU Paired With Low Latency Flash Storage (Updated)
Subject: Graphics Cards | July 27, 2016 - 01:56 AM | Tim Verry
Tagged: solid state, radeon pro, Polaris, gpgpu, amd
UPDATE (July 27th, 1am ET): More information on the Radeon Pro SSG has surfaced since the original article. According to AnandTech, the prototype graphics card actually uses an AMD Fiji GPU. The Fiji GPU is paired onboard PCI-E based storage using the same PEX8747 bridge chip used in the Radeon Pro Duo. Storage is handled by two PCI-E 3.0 x4 M.2 slots that can accommodate up to 1TB of NAND flash storage. As I mentioned below, having the storage on board the graphics card vastly reduces latency by reducing the number of hops and not having to send requests out to the rest of the system. AMD had more numbers to share following their demo, however.
From the 8K video editing demo, the dual Samsung 950 Pro PCI-E SSDs (in RAID 0) on board the Radeon Pro SSG hit 4GB/s while scrubbing through the video. That same video source stored on a Samsung 950 Pro attached to the motherboard had throughput of only 900MB/s. In theory, reaching out to system RAM still has raw throughput advantages (with DDR4 @ 3200 MHz on a Haswell-E platform theroretically capable of 62 GB/s reads and 47 GB/s writes though that would be bottlenecked by the graphics card having to go over the PCI-E 3.0 x16 link and it's maximum of 15.754 GB/s.). Of course if you can hold it in (much smaller) GDDR5 (300+GB/s depending on clocks and memory bus width) or HBM (1TB/s) and not have to go out to any other storage tier that's ideal but not always feasible especially in the HPC world.
However, having onboard storage on the same board as the GPU only a single "hop" away vastly reduces latency and offers much more total storage space than most systems have in DDR3 or DDR4. In essence, the solid state storage on the graphics card (which developers will need to specifically code for) acts as a massive cache for streaming in assets for data sets and workloads that are highly impacted by latency. This storage is not the fastest, but is the next best thing for holding active data outside of GDDR5/x or HBM. For throughput intensive workloads reaching out to system RAM will be better Finally, reaching out to system attached storage should be the last resort as it will be the slowest and most latent. Several commentors mentioned using a PCI-E based SSD in a second slot on the motherboard accessed much like GPUs in CrossFire communicate now (DMA over the PCI-E bus) which is an interesting idea that I had not considered.
Per my understanding of the situation, I think that the on board SSG storage would still be slightly more beneficial than this setup but it would get you close (I am assuming the GPU would be able to directly interact and request data from the SSD controller and not have to rely on the system CPU to do this work but I may well be mistaken. I will have to look into this further and ask the experts heh). On the prototype Radeon Pro SSG the M.2 slots are actually able to be seen as drives by the system and OS so it is essentially acting as if there was a PCI-E adapter card in a slot on the motherboard holding those drives but that may not be the case should this product actually hit the market. I do question their choice to go with Fiji rather than Polaris, but it sounds like they built the prototype off of the Radeon Pro Duo platform so I suppose it would make sense there.
Hopefully the final versions in 2017 or beyond use at least Vega though :).
Alongside the launch of new Radeon Pro WX (workstation) series graphics cards, AMD teased an interesting new Radeon Pro product: the Radeon Pro SSG. This new professional graphics card pairs a Polaris GPU with up ot a terabyte of on board solid state storage and seeks to solve one of the biggest hurdles in GP GPU performance when dealing with extremely large datasets which is latency.
One of the core focuses of AMD's HSA (heterogeneous system architecture) is unified memory and the ability of various processors (CPU, GPU, specialized co-processors, et al) to work together efficiently by being able to access and manipulate data from the same memory pool without having to copy data bck and forth between CPU-accessible memory and GPU-accessible memory. With the Radeon Pro SSG, this idea is not fully realized (it is more of a sidestep), but it will move performance further. It does not eliminate the need to copy data to the GPU before it can work on it, but once copied the GPU will be able to work on data stored in what AMD describes as a one terabyte frame buffer. This memory will be solid state and very fast, but more importantly it will be able to get at the data with much lower latency than previous methods. AMD claims the solid state storage (likely NAND but they have not said) will link with the GPU over a dedicated PCI-E bus. I suppose that if you can't bring the GPU to the data, you bring the data to the GPU!
Considering AMD's previous memory champ – the Radeon W9100 – maxed out at 32GB of GDDR5, the teased Radeon Pro SSG with its 1TB of purportedly low latency onboard flash storage opens up a slew of new possibilities for researchers and professionals in media, medical, and scientific roles working with massive datasets for imaging, creation, and simulations! I expect that there are many professionals out there eager to get their hands on one of these cards! They will be able to as well thanks to a beta program launching shortly, so long as they have $10,000 for the hardware!
AMD gave a couple of examples in their PR on the potential benefits of its "solid state graphics" including the ability to image a patient's beating heart in real time to allow medical professionals to examine and spot issues as early as possible and using the Radeon Pro SSG to edit and scrub through 8K video in real time at 90 FPS versus 17 with current offerings. On the scientific side of things being able to load up entire models into the new graphics memory (not as low latency as GDDR5 or HBM certainly) will be a boon as will being able to get data sets as close to the GPU as possible into servers using GPU accelerated databases powering websites accessed by millions of users.
It is not exactly the HSA future I have been waiting for ever so impatiently, but it is a nice advancement and an intriguing idea that I am very curious to see how well it pans out and if developers and researchers will truly take advantage of and use to further their projects. I suspect something like this could be great for deep learning tasks as well (such as powering the "clouds" behind self driving cars perhaps).
Stay tuned to PC Perspective for more information as it develops.
This is definitely a product that I will be watching and I hope that it does well. I am curious what Nvidia's and Intel's plans are here as well! What are your thoughts on AMD's "Solid State Graphics" card? All hype or something promising?