Qualcomm Announces the Cloud AI 100: Dedicated Power-Efficient AI Processing for the Cloud

Subject: Processors | April 9, 2019 - 12:30 PM
Tagged: qualcomm, datacenter, cloud, artificial intelligence, ai inference, ai

Last year, several models of Qualcomm’s mobile chipsets gained AI acceleration capabilities. Now, Qualcomm is leveraging its custom hardware and networking expertise to introduce a new solution for dedicated cloud-based AI processing. The Qualcomm Cloud AI 100 is a custom hardware solution for cloud AI inference workloads.


The Cloud AI 100 is built on a 7nm process node and was designed from the ground up for AI processing. Qualcomm states that it has “greater than 50x” the peak AI processing performance of its Snapdragon 820 chipset, which would make it one of the most powerful solutions in its class. It’s also designed for power efficiency, with Qualcomm claiming that it offers “10x performance per watt over the industry’s most advanced AI inference solutions deployed today” and that it can therefore scale easily to meet performance or power requirements.


Combined with a full software stack for developers and partners, Qualcomm is aiming the Cloud AI 100 at the full gamut of cloud-to-edge workloads, where it will compete with GPU, CPU, and FPGA-based solutions from companies like Intel and NVIDIA. Support for existing software stacks will be available, with Qualcomm specifically listing PyTorch, Glow, TensorFlow, Keras, and ONNX.


Qualcomm is also touting direct benefits for end-users of supported devices, with significant performance improvements for features like natural language processing and translations via personal assistants, image recognition and search, and personalized content recommendations.

Keith Kressin, Qualcomm’s SVP of Product Management, issued the following statement alongside the product’s announcement:

“Today, Qualcomm Snapdragon mobile platforms bring leading AI acceleration to over a billion client devices. Our all new Qualcomm Cloud AI 100 accelerator will significantly raise the bar for the AI inference processing relative to any combination of CPUs, GPUs, and/or FPGAs used in today’s data centers. Furthermore, Qualcomm Technologies is now well positioned to support complete cloud-to-edge AI solutions all connected with high-speed and low-latency 5G connectivity.”

Qualcomm plans to begin sampling the Cloud AI 100 to enterprise customers in the second half of 2019.

Source: Qualcomm

Xilinx and AMD Break GoogLeNet AI Inference Record

Subject: General Tech | October 4, 2018 - 09:58 PM
Tagged: Xilinx, FPGA, hardware acceleration, big data, HPC, neural network, ai inference, inference

During the Xilinx Developer Forum in San Jose earlier this week, Xilinx showed off a server built in partnership with AMD that uses FPGA-based hardware acceleration cards to break an inference record on GoogLeNet, reaching up to 30,000 images per second of total AI inference throughput. GoogLeNet is a 22-layer deep convolutional neural network that was started as a project for the ImageNet Large Scale Visual Recognition Challenge in 2014.


Xilinx achieved this throughput while maintaining low latency by using eight of its Alveo U250 acceleration add-in cards, which use FPGAs based on its 16nm UltraScale architecture. The cards are hosted by a dual-socket AMD server motherboard with two Epyc 7551 processors and eight channels of DDR4 memory. Each of the two Zen-architecture processors (180W) has 32 cores (64 threads) clocked at 2 GHz (2.55 GHz all-core turbo and 3 GHz maximum turbo), 64 MB of L3 cache, and memory controllers supporting up to 2TB of DDR4 memory per socket. The two-socket configuration offers 341 GB/s of memory bandwidth and 128 PCI-Express lanes.

The Xilinx Alveo U250 cards offer up to 33.3 INT8 TOPS and feature 54MB of SRAM (38TB/s) and 64GB of off-chip memory (77GB/s). Interfaces include a PCI-E 3.0 x16 connection as well as two QSFP28 (100GbE) connections. The cards are rated at 225W TDPs and cost a whopping $12,995 MSRP each. At that price, the eight FPGA cards alone come to roughly $104,000, pushing the system into the six-figure range before including the Epyc server CPUs, all that system memory, and the other base components.

It is not likely you will see this system in your next Tesla any time soon, but it is a nice proof of concept of what future technology generations may be able to achieve at much more economical price points for AI inference tasks in everyday life (driver assistance, medical imaging, big data analytics driving market research that influences consumer pricing, etc.).
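
For a rough sense of scale, the per-card figures quoted above can be multiplied out across the eight cards. The short Python sketch below does only that arithmetic. The constant and function names are my own, the totals are list prices and theoretical peaks rather than measured values, and the per-card throughput assumes the 30,000 images per second is split evenly across the cards, which Xilinx has not stated.

```python
# Per-unit figures as quoted in the article; every derived value is a
# theoretical total, not a measurement from the demo system.
CARD_COUNT = 8
CARD_MSRP_USD = 12_995
CARD_TDP_WATTS = 225
CARD_PEAK_INT8_TOPS = 33.3
SYSTEM_IMAGES_PER_SECOND = 30_000


def fpga_totals(card_count=CARD_COUNT):
    """Multiply the per-card figures out across all cards in the system."""
    return {
        "cost_usd": card_count * CARD_MSRP_USD,
        "tdp_watts": card_count * CARD_TDP_WATTS,
        "peak_int8_tops": card_count * CARD_PEAK_INT8_TOPS,
        # Assumes an even split of the record throughput across the cards.
        "images_per_second_per_card": SYSTEM_IMAGES_PER_SECOND / card_count,
    }


totals = fpga_totals()
for name, value in totals.items():
    print(f"{name}: {value:,.1f}")
```

Run as written, the FPGA cards come to $103,960 at list price, 1,800W of combined TDP, and 266.4 peak INT8 TOPS, or 3,750 images per second per card on average.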


Interestingly, this system may hold the current record, but it is not likely to last very long even against Xilinx’s own hardware. Specifically, Xilinx’s Versal ACAP cards (set to release in the second half of next year) are slated to hit up to 150W TDPs (in the add-in-card models) while being up to eight times faster than Xilinx’s previous FPGAs. The Versal ACAPs will use TSMC’s 7nm FinFET node and will combine scalar processing engines (ARM CPUs), adaptable hardware engines (FPGAs with a new full software stack and much faster on-the-fly dynamic reconfiguration), and AI engines (DSPs, SIMD vector cores, and dedicated fixed-function units for inference tasks) with a Network on Chip (NoC) and customizable memory hierarchy. Xilinx also has fierce competition on its hands in this huge AI/machine learning/deep neural network market from Intel/Altera and its Stratix FPGAs, AMD and NVIDIA with their GPUs and new AI-focused cores, and other specialty hardware accelerator manufacturers including Google with its TPUs. (There’s also ARM’s Project Trillium for mobile.) I am interested to see where the AI inference performance bar will be set by this time next year!

Source: TechPowerUp