GTC 2018: Nvidia and ARM Integrating NVDLA Into Project Trillium For Inferencing at the Edge

Subject: General Tech | March 29, 2018 - 03:10 PM |
Tagged: project trillium, nvidia, machine learning, iot, GTC 2018, GTC, deep learning, arm, ai

During GTC 2018, NVIDIA and ARM announced a partnership that will see ARM integrate NVIDIA's NVDLA deep learning inferencing accelerator into the company's Project Trillium machine learning processors. The NVIDIA Deep Learning Accelerator (NVDLA) is an open source modular architecture that is specifically optimized for inferencing operations such as object and voice recognition. Bringing that acceleration to the wider ARM ecosystem through Project Trillium will enable a massive number of smarter phones, tablets, Internet-of-Things, and embedded devices to do inferencing at the edge, which is to say without the complexity and latency of relying on cloud processing. This means potentially smarter voice assistants (e.g. Alexa, Google), doorbell cameras, lighting, and security around the home, as well as better AR, natural translation, and assistive technologies on your phone when you are out and about.

NVIDIAandARM_NVDLA.jpg

Karl Freund, lead analyst for deep learning at Moor Insights & Strategy, was quoted in the press release as stating:

“This is a win/win for IoT, mobile and embedded chip companies looking to design accelerated AI inferencing solutions. NVIDIA is the clear leader in ML training and Arm is the leader in IoT end points, so it makes a lot of sense for them to partner on IP.”

ARM's Project Trillium was announced back in February and is a suite of IP for processors optimized for parallel, low latency workloads. It includes a Machine Learning processor, an Object Detection processor, and neural network software libraries. NVDLA is a highly modular and configurable hardware and software platform based upon the Xavier SoC that can feature a convolution core, single data processor, planar data processor, channel data processor, and data reshape engines. The NVDLA can be configured with all or only some of those elements, and chip designers can independently scale them up or down depending on what processing acceleration they need for their devices. NVDLA connects to the main system processor over a control interface and through two AXI memory interfaces (one optional) that connect to system memory and (optionally) dedicated high bandwidth memory (not necessarily HBM; it could simply be its own SRAM, for example).
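To make the "all or only some of those elements" idea concrete, here is a minimal Python sketch of how a chip designer's choice of blocks might be described. The class name NvdlaConfig and its fields are hypothetical and exist only for illustration; they are not part of NVIDIA's NVDLA code or tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NvdlaConfig:
    """Hypothetical description of one NVDLA build (not an NVIDIA API)."""
    convolution_core: bool = True
    single_data_processor: bool = True
    planar_data_processor: bool = True
    channel_data_processor: bool = True
    data_reshape_engines: bool = True
    # The second AXI interface (to dedicated SRAM or similar) is optional.
    secondary_memory_interface: bool = False

    def enabled_blocks(self) -> List[str]:
        """Return the names of the compute blocks included in this build."""
        blocks = [
            "convolution_core",
            "single_data_processor",
            "planar_data_processor",
            "channel_data_processor",
            "data_reshape_engines",
        ]
        return [name for name in blocks if getattr(self, name)]

    def memory_interfaces(self) -> int:
        """Count AXI memory interfaces: one is always present, one is optional."""
        return 2 if self.secondary_memory_interface else 1


# Illustrative only: a trimmed-down build for a small IoT part.
small_iot = NvdlaConfig(
    planar_data_processor=False,
    channel_data_processor=False,
    data_reshape_engines=False,
)
print(small_iot.enabled_blocks(), small_iot.memory_interfaces())
```

A full-featured build would leave every block enabled and set secondary_memory_interface to True to model the optional dedicated memory.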

arm project trillium integrates NVDLA.jpg

NVDLA is presented as a free and open source architecture that promotes a standard way to design deep learning inferencing hardware that accelerates the operations needed to infer results from trained neural networks (with the training being done on other devices, perhaps by the DGX-2). The project, which hosts the code on GitHub and encourages community contributions, goes beyond the Xavier-based hardware. It includes drivers, libraries, upcoming TensorRT support for accelerating Google's TensorFlow, testing suites and SDKs, a deep learning training infrastructure (for the training side of things) that is compatible with the NVDLA software and hardware, and system integration support.

Bringing the "smarts" of smart devices to the local hardware and closer to the users should mean much better performance and using specialized accelerators will reportedly offer the performance levels needed without blowing away low power budgets. Internet-of-Things (IoT) and mobile devices are not going away any time soon, and the partnership between NVIDIA and ARM should make it easier for developers and chip companies to offer smarter (and please tell me more secure!) smart devices.

Source: NVIDIA

ARM Introduces Kigen OS for Cellular IoT

Subject: General Tech | February 21, 2018 - 09:00 AM |
Tagged: modem, Kigen, iSIM, iot, cortex, cellular, arm

Last year ARM went on a bit of a buying spree thanks to the financial help of its holding company, SoftBank. One of the companies that it scooped up was Simulity Labs, for around 12 million pounds. The company was developing IoT security products based on eSIM technology and a robust OS that provides provisioning on a cellular network.

armki_01.jpg

Many believe that the nearly ubiquitous cellular networks that surround us are the key to truly successful IoT products. There are massive cellular deployments around the world. It is a well regulated spectrum. Security through SIM cards is a well known and understood process. It is not impossible to break this security, but it is questionable if it is worth the time and effort to do so.

armki_02.jpg

ARM has gone ahead and provided the means to productize and push this technology with the aim of providing a vast, secure IoT infrastructure that would be relatively easy to roll out with current cellular networks. There are multiple parts to this technology, but ARM is hoping to offer an all-in-one solution that would provide an inexpensive platform for OEMs and Mobile Network Operators (MNOs) to roll out products on.

Click here to read the rest of our coverage of ARM Kigen and iSIM!

Source: ARM

Windows 10 on ARM Details

Subject: General Tech | February 19, 2018 - 01:22 PM |
Tagged: microsoft, windows 10, qualcomm, arm

Paul Thurrott found a developer documentation page, Troubleshooting x86 Desktop Apps, on the Windows Dev Center. The goal of the page is to list a few reasons why the software you develop might not be compatible with Windows 10 on ARM and the WOW translation layer. Yup, they’re reusing that name, which was the translation layer for 32-bit Win32 applications running on 64-bit Windows.

microsoft-2016-uwp-logo.png

Based on this document, we now know that Windows on ARM:

  • Will not translate x86 drivers, just x86 applications and services.
  • Does not support 64-bit applications (Thurrott.com says they’re working on it.)
  • Does not support (hardware-accelerated) OpenGL 1.1+ or DirectX 1-8
    • Vulkan is not mentioned anywhere, but I’m guessing not.
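The rules in the list above can be sketched as a small checker. This is only an illustration of the restrictions as summarized here: the Binary class, its fields, and the woa_issues function are hypothetical names, not anything from Microsoft's documentation or tooling, and the OpenGL rule is simplified to "any OpenGL use".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Binary:
    """Hypothetical description of a piece of software (illustrative fields)."""
    arch: str                              # "x86", "x64", or "arm"
    kind: str                              # "application", "service", or "driver"
    uses_opengl: bool = False              # simplification of "OpenGL 1.1+"
    directx_version: Optional[int] = None  # None means DirectX is not used


def woa_issues(binary: Binary) -> List[str]:
    """List the reasons a binary may not run on Windows 10 on ARM."""
    issues = []
    if binary.arch == "x64":
        issues.append("64-bit x86 applications are not supported")
    if binary.arch == "x86" and binary.kind == "driver":
        issues.append("x86 drivers are not translated")
    if binary.arch == "x86" and binary.uses_opengl:
        issues.append("hardware-accelerated OpenGL is not supported")
    if (binary.arch == "x86" and binary.directx_version is not None
            and binary.directx_version <= 8):
        issues.append("DirectX 1-8 is not supported")
    return issues


print(woa_issues(Binary(arch="x86", kind="application", directx_version=11)))
print(woa_issues(Binary(arch="x86", kind="driver")))
```

Native ARM binaries fall through every check, which matches the point that recompiling for ARM sidesteps the translation layer entirely.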

There are also a few other issues. For example, an application cannot modify Windows components (ex: the 7-zip entry in the Windows file explorer’s right-click menu) unless it is recompiled for ARM. Thurrott.com also says that Hyper-V is not supported in Windows 10 on ARM.

The amount of software that Windows on ARM can run is surprisingly both broader and narrower than I would have expected. The major issue for me is OpenGL – you would think that the graphics driver would dictate this, not so much the OS APIs. I certainly hope that, especially after their other pushes toward openness, Microsoft isn’t pressuring ARM manufacturers to not ship an OpenGL driver, even though the hardware vendors clearly know how to support OpenGL ES at the very least.

And yes, there could very well be a good reason, and they might even be working on OpenGL support as we speak, but it’s an odd omission (at least for now).

Lastly, this has nothing to do with UWP applications. This document is only about standard Win32 applications running on ARM processors. UWP is designed to be cross-architecture. You just need to include the ARM target when you build and package.

Source: Microsoft

Podcast #487 - AMD Desktop APUs, Snapdragon 845, ARM Machine Learning, and more!

Subject: General Tech | February 15, 2018 - 11:32 AM |
Tagged: podcast, Intel, amd, nvidia, raven ridge, r5 2400g, r3 2200g, arm, project trillium, qualcomm, snapdragon 845, x24, LTE, 5G

PC Perspective Podcast #487 - 02/15/18

Join us this week for a recap of news and reviews including new AMD Desktop APUs, Snapdragon 845 Performance Preview, ARM Machine Learning, and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, Allyn Malventano

Peanut Gallery: Alex Lustenberg, Ken Addison

Program length: 1:18:46

Podcast topics of discussion:

  1. Week in Review:
  2. News items of interest:
  3. Picks of the Week:
  4. Closing/outro
 
Author:
Subject: General Tech
Manufacturer: ARM

Addressing New Markets

Machine Learning is one of the hot topics in technology, and certainly one that is growing at a very fast rate. Applications such as facial recognition and self-driving cars are powering much of the development going on in this area. So far we have seen CPUs and GPUs being used in ML applications, but in most cases these are not the most efficient ways of doing these highly parallel but relatively computationally simple workloads. New chips have been introduced that are far more focused on machine learning, and now it seems that ARM is throwing their hat into the ring.

ml_01.png

ARM is introducing three products under the Project Trillium brand: an ML processor, an OD (Object Detection) processor, and an ARM-developed Neural Network software stack. This project came as a surprise for most of us, but in hindsight it is a logical avenue for them to address as it will be incredibly important moving forward. Currently many applications that require machine learning are not processed at the edge, namely in the consumer’s hand or a device right next to them. Workloads may be requested from the edge, but most of the heavy duty processing occurs in datacenters located all around the world. This requires communication, and sometimes pretty hefty levels of bandwidth. If either of those things is missing, applications requiring ML break down.

ml_02.png

Click here to read the rest of the article about Project Trillium!

CES 2018: Lenovo Joins the Windows on ARM Rush with the 12-inch Miix 630 2-in-1 PC

Subject: Mobile | January 8, 2018 - 08:00 PM |
Tagged: WOA, windows on arm, snapdragon 835, snapdragon, qualcomm, Lenovo, laptop, convertible, CES 2017, arm, 2-in-1

Lenovo today unveiled the Miix 630, a 12-inch Windows 10 S device powered by Qualcomm's Snapdragon 835 processor. With the Miix 630, Lenovo joins HP, ASUS, and other manufacturers in the new Windows on ARM product category of ultraportable, always connected PCs and tablets.

lenovo-miix-630-front.jpg

The Miix 630 is powered by the Qualcomm Snapdragon 835 with integrated Adreno 540 graphics. It features a 12.3-inch 1920x1280 touchscreen display which, when paired with the included Lenovo pen, offers up to 1,024 levels of pressure sensitivity for drawing and writing. Other features include a 5MP front facing infrared camera with Windows Hello support, 13MP rear camera, detachable backlit keyboard with touchpad, and integrated LTE for the "always on" feature that distinguishes these devices from those with traditional mobile connectivity options.

lenovo-miix-630-closed.jpg

Despite its "always on" capabilities, the Miix 630 joins other Windows on ARM devices in touting lengthy battery life, with negligible battery draw while in standby mode and actual usage time of 20 hours for tasks such as continuous video playback.

The Miix 630's complete specs:

| Specification | Lenovo Miix 630 |
| --- | --- |
| Processor | Qualcomm Snapdragon 835 |
| Graphics | Adreno 540 |
| Microphones | 2 |
| Speakers | 2 x 1 watt |
| Memory | 4GB / 8GB |
| Storage | 64GB / 128GB / 256GB |
| Battery | 48 Whr |
| Display | 12.3-inch WUXGA+ (1920 x 1280), Corning Glass Screen |
| Ports | 1 x USB Type-C, 1 x 3.5mm Audio In/Out, 1 x SD Card, 1 x Nano SIM Card |
| Connectivity | 2x2 Wi-Fi 802.11ac, Bluetooth 4.1, LTE Cat11 |
| Dimensions | (D) 210mm x (W) 293mm x (H) 15.6mm |
| Weight | 2.93 lbs (1.33 kg) |

lenovo-miix-630-angle.jpg

Complete pricing for the higher-end configurations is not yet available, but Lenovo states that the Miix 630's base configuration will start at $799. It's expected to launch in the second quarter of this year.


Meltdown and Spectre Security Vulnerability Impacts Intel most, but AMD, Arm as well

Subject: Processors | January 3, 2018 - 08:17 PM |
Tagged: Intel, amd, arm, meltdown, spectre, security

The following story was originally posted on ShroutResearch.com.

UPDATE 1 - 8:25pm

Just before the closing bell on Wednesday, Intel released a statement responding to the security issues brought up in this story. While acknowledging that these new security concerns do exist, the company went out of its way to insinuate that AMD, Arm Holdings, and others were at risk. Intel also states that performance impact on patched machines “should not be significant and will be mitigated over time.”

Intel’s statement is at least mostly accurate, though the released report from the Google Project Zero group responsible for finding the security vulnerability goes into much more detail. The security issue concerns a feature called “speculative execution” in which a computer tries to predict work that will be needed beforehand to speed up processing tasks. The paper details three variants of this particular vulnerability, the first of which applies to Intel, AMD, Arm, and nearly every other modern processor architecture. This variant is easily patched and should have near-zero effect on performance.

The second variant is deeply architecture specific, meaning attackers would need a unique code for each different Intel or AMD processor. This example should be exceedingly rare in the wild, and AMD goes as far as to call it a “near-zero” risk for systems.

The third is where things are more complex and where the claim that AMD processors are not susceptible is confirmed. This one is the source of the leaks and information that filtered out and was the target of the information for the story below. In its statement, AMD makes clear that due to architectural design differences on its products, past and modern processors from its family are not at risk.

The final outlook from this story looks very similar to how it did early on Wednesday though with a couple of added wrinkles. The security report released by Project Zero indicates that most modern hardware is at risk though to different degrees based on the design of the chips themselves. Intel is not alone in this instance, but it does have additional vulnerabilities that other processor designs do not incur. To insinuate otherwise in its public statement is incorrect.

As for performance impact, most of the initial testing and speculation is likely exaggerating how it will change the landscape, if at all. Neither Intel nor AMD see a “doomsday” scenario of regressing computing performance because of this security patch.

At the end of 2017, Intel CEO Brian Krzanich said his company would be going through changes in the New Year, becoming more aggressive, and taking the fight to its competitors in new and existing markets. It seems that BK will have his first opportunity to prove out this new corporate strategy with a looming security issue that affects nearly 10 years of processors.

A recently revealed hardware bug in Intel processors is coming to light as operating system vendors like Microsoft and the Linux community scramble to update platforms to avoid potential security concerns. This bug has been rumored for some time, with updates to core Linux software packages indicating that a severe vulnerability was being fixed, but with comments redacted when published. Security flaws are often kept secret to avoid being exploited by attackers until software patches are available to correct them.

This hardware-level vulnerability allows user-mode applications, those run by general consumers or businesses, to potentially gain access to kernel-level memory space, an area that is handled by the operating system exclusively and can contain sensitive information like passwords, biometrics, and more. An attacker could use this flaw to potentially access other user-mode application data, compromising entire systems by bypassing integrated operating system firewalls.

At a time when Intel is being pressured from many different angles and markets, this vulnerability and hardware bug comes at an incredibly inopportune time. AMD spent its 2017 releasing competitive products in the consumer space with Ryzen and the enterprise space with EPYC. The enterprise markets in particular are at risk for Intel. The EPYC processors already offered performance and pricing advantages, and now AMD can showcase security as none of its processors are affected by the same vulnerability that Intel is saddled with. Though the enterprise space works in cycles, and AMD won’t see an immediate uptick in sales, I would be surprised if this did not push more cloud providers and large scale server deployments to look at the AMD offerings.

MW-FE472_intel0_20170125203729_ZH.jpg

At this point, only the Linux community has publicly discussed the fixes taking place, with initial patches going out earlier this week. Much of the enterprise and cloud ecosystem runs on Linux-based platforms and securing these systems against attack is a crucial step. Microsoft has yet to comment publicly on what its software updates will look like, when they will be delivered, and what impact they might have on consumer systems.

While hardware and software vulnerabilities are common in today’s connected world, there are two key points that make this situation more significant. First, this is a hardware bug, meaning that it cannot be fixed or addressed completely without Intel making changes to its hardware design, a process that can take months or years to complete. As far as we can tell, this bug will affect ALL Intel processors released in the last decade or more, including enterprise Xeon processors and consumer Core and Pentium offerings. And as Intel has been the dominant market leader in both the enterprise and consumer spaces, there are potentially hundreds of millions of affected systems in the field.

The second differentiating point for this issue is that the software fix could impact the performance of systems. Initial numbers have been claiming as much as a 30% reduction in performance, but those results are likely worst case scenarios. Some early testing of the updated Linux platforms indicates performance could decrease by 6-20% depending on the application. Other testing of consumer workloads, including gaming, shows almost no performance impact. Linux founder and active developer Linus Torvalds claims performance impact would range from nothing to “double-digit slowdowns.”

Even though the true nature of this vulnerability is still tied up behind non-disclosure agreements, it is unlikely that there will be a double-digit performance reduction on servers at a mass scale when these updates are pushed out. Intel is aware of this vulnerability and has been for some time, and financially it would need to plan for any kind of product replacement or reimbursement campaign it might undertake with partners and customers.

Podcast #478 - Windows on ARM, Intel 10nm rumors, and more!

Subject: General Tech | December 7, 2017 - 01:45 PM |
Tagged: podcast, xfx, Vega, Raspberry Pi, radeon, qualcomm, nicehash, Intel, IME, GTX 1070Ti, gddr6, evga, Elgato, dell, coolermaster, cluster, asus, arm, amd, AM4, Adrenalin Edition, 4k60, 10nm, video

PC Perspective Podcast #478 - 12/07/17

Join us for discussion on Windows on ARM, Intel 10nm rumors, and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts: Ryan Shrout, Josh Walrath, Jeremy Hellstrom, Allyn Malventano, Jim Tanous

Peanut Gallery: Alex Lustenberg

Program length: 1:39:42

Podcast topics of discussion:
  1. Week in Review:
  2. News items of interest:
  3. Closing/outro


Qualcomm Centriq 2400 Arm-based Server Processor Begins Commercial Shipment

Subject: Processors | November 8, 2017 - 02:03 PM |
Tagged: qualcomm, centriq 2400, centriq, arm

At an event in San Jose on Wednesday, Qualcomm and partners officially announced that its Centriq 2400 server processor, based on the Arm architecture, was shipping to commercial clients. This launch is of note as it becomes the highest-profile and most partner-lauded Arm-based server CPU and platform to be released after years of buildup and excitement around several similar products. The Centriq is built specifically for enterprise cloud workloads with an emphasis on high core count and high throughput and will compete against Intel’s Xeon Scalable and AMD’s new EPYC platforms.

qc2.jpg

Paul Jacobs shows Qualcomm Centriq to press and analysts

Built on the same 10nm process technology from Samsung that gave rise to the Snapdragon 835, the Centriq 2400 becomes the first server processor on that particular node. While Qualcomm and Samsung tout that as a significant selling point, on its own it doesn’t hold much value. Where it does come into play and impact the product position is the resulting power efficiency it brings to the table. Qualcomm claims that the Centriq 2400 will “offer exceptional performance-per-watt and performance-per dollar” compared to competing server options.

The raw specifications and capabilities of the Centriq 2400 are impressive.

|  | Centriq 2460 | Centriq 2452 | Centriq 2434 |
| --- | --- | --- | --- |
| Architecture | ARMv8 (64-bit), Core: Falkor | ARMv8 (64-bit), Core: Falkor | ARMv8 (64-bit), Core: Falkor |
| Process Tech | 10nm (Samsung) | 10nm (Samsung) | 10nm (Samsung) |
| Socket | ? | ? | ? |
| Cores/Threads | 48/48 | 46/46 | 40/40 |
| Base Clock | 2.2 GHz | 2.2 GHz | 2.3 GHz |
| Max Clock | 2.6 GHz | 2.6 GHz | 2.5 GHz |
| Memory Tech | DDR4 | DDR4 | DDR4 |
| Memory Speeds | 2667 MHz, 128 GB/s | 2667 MHz, 128 GB/s | 2667 MHz, 128 GB/s |
| Cache | 24MB L2, split; 60MB L3 | 23MB L2, split; 57.5MB L3 | 20MB L2, split; 50MB L3 |
| PCIe | 32 lanes PCIe 3.0 | 32 lanes PCIe 3.0 | 32 lanes PCIe 3.0 |
| Graphics | N/A | N/A | N/A |
| TDP | 120W | 120W | 120W |
| MSRP | $1995 | $1383 | $888 |

Built with 18 billion transistors on a die area of just 398mm², the SoC holds 48 high-performance 64-bit cores running at frequencies as high as 2.6 GHz. (Interestingly, this appears to be about the same peak clock rate as all the Snapdragon processor cores we have seen on consumer products.) The cores are interconnected by a bi-directional ring bus that is reminiscent of the integration Intel used on its Core processor family up until Skylake-SP was brought to market. The bus supports 250 GB/s of aggregate bandwidth and Qualcomm claims that this will alleviate any concern over congestion bottlenecks, even with the CPU cores under full load.

qc1.jpg

The caching system provides 512KB of L2 cache for every pair of CPU cores, essentially organizing them into dual-core blocks. 60MB of L3 cache provides core-to-core communications and the cache is physically divided around the die for on-average faster access. A 6-channel DDR4 memory system, with unknown peak frequency, supports a total of 768GB of capacity.
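As a rough sanity check on the memory figures, the 128 GB/s listed in the spec table is consistent with six DDR4 channels running at 2667 MT/s. The sketch below assumes the table's "2667 MHz" means 2667 mega-transfers per second and that each channel uses the standard 64-bit (8-byte) DDR4 data bus; the function name is made up for this example.

```python
def ddr4_peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                            bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1000 MB).

    Assumes every channel moves `bus_bytes` per transfer, which is 8 bytes
    for a standard 64-bit DDR4 channel.
    """
    megabytes_per_s = channels * mega_transfers_per_s * bus_bytes
    return megabytes_per_s / 1000


# Six channels at 2667 MT/s, as listed in the spec table.
print(round(ddr4_peak_bandwidth_gbs(6, 2667), 1))  # 128.0
```

This is a theoretical peak only; sustained bandwidth in real workloads would be lower.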

Connectivity is supplied with 32 lanes of PCIe 3.0 and up to 6 PCIe devices.

As you should expect, the Centriq 2400 supports the ARM TrustZone secure operating environment and hypervisors for virtualized environments. With this many cores on a single chip, virtualization seems likely to be one of the key use cases for the server CPU.

Maybe most impressive are the power requirements of the Centriq 2400. It can offer this level of performance and connectivity with just 120 watts of power.

With a price of $1995 for the Centriq 2460, Qualcomm claims that it can offer “4X better performance per dollar and up to 45% better performance per watt versus Intel’s highest performance Skylake processor, the Intel Xeon Platinum 8180.” That’s no small claim. The 8180 is a 28-core/56-thread CPU with a peak frequency of 3.8 GHz and a TDP of 205 watts and a cost of $10,000 (not a typo).
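To put those claims in context, the quoted prices and TDPs can be turned into a back-of-the-envelope estimate of the relative performance each claim implies. This is only arithmetic on the numbers above, not a benchmark: it treats list price and TDP as the cost and power figures, takes the "up to 45%" claim at its maximum, and uses a helper name invented for this sketch.

```python
CENTRIQ_2460 = {"price_usd": 1995, "tdp_w": 120}
XEON_8180 = {"price_usd": 10_000, "tdp_w": 205}


def implied_relative_performance(claimed_ratio: float,
                                 cost_a: float, cost_b: float) -> float:
    """Given a claimed (perf_a / cost_a) / (perf_b / cost_b), return perf_a / perf_b."""
    return claimed_ratio * cost_a / cost_b


price_ratio = XEON_8180["price_usd"] / CENTRIQ_2460["price_usd"]
tdp_ratio = XEON_8180["tdp_w"] / CENTRIQ_2460["tdp_w"]

# Relative performance (Centriq / Xeon) implied by each of Qualcomm's claims.
from_dollar_claim = implied_relative_performance(
    4.0, CENTRIQ_2460["price_usd"], XEON_8180["price_usd"])
from_watt_claim = implied_relative_performance(
    1.45, CENTRIQ_2460["tdp_w"], XEON_8180["tdp_w"])

print(f"Xeon costs {price_ratio:.2f}x as much, with {tdp_ratio:.2f}x the TDP")
print(f"Implied Centriq/Xeon performance: {from_dollar_claim:.2f} (per dollar), "
      f"{from_watt_claim:.2f} (per watt)")
```

Under these assumptions both claims point to the Centriq 2460 delivering roughly 80 to 85 percent of the Xeon's throughput on whatever workload Qualcomm measured, with the advantage coming from the roughly 5x lower price and 1.7x lower TDP.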

Qualcomm had performance metrics from industry standard SPECint measurements, in both raw single thread configurations as well as performance per dollar and per watt. I will have more on the performance story of Centriq later this week.

qc2.jpg

perf1.jpg

More important than simply showing hardware, Qualcomm had several partners on hand at the press event, as well as statements from important vendors like Alibaba, HPE, Google, Microsoft, and Samsung. Present to showcase applications running on the Arm-based server platforms was an impressive list of the key cloud services providers: Alibaba, LinkedIn, Cloudflare, American Megatrends Inc., Arm, Cadence Design Systems, Canonical, Chelsio Communications, Excelero, Hewlett Packard Enterprise, Illumina, MariaDB, Mellanox, Microsoft Azure, MongoDB, Netronome, Packet, Red Hat, ScyllaDB, 6WIND, Samsung, Solarflare, Smartcore, SUSE, Uber, and Xilinx.

The Centriq 2400 series of SoC isn’t perfect for all general-purpose workloads and that is something we have understood from the outset of this venture by Arm and its partners to bring this architecture to the enterprise markets. Qualcomm states that its parts are designed for “highly threaded cloud native applications that are developed as micro-services and deployed for scale-out.” The result is a set of workloads that covers a lot of ground:

  • Web front end with HipHop Virtual Machine
  • NoSQL databases including MongoDB, Varnish, ScyllaDB
  • Cloud orchestration and automation including Kubernetes, Docker, metal-as-a-service
  • Data analytics including Apache Spark
  • Deep learning inference
  • Network function virtualization
  • Video and image processing acceleration
  • Multi-core electronic design automation
  • High throughput compute bioinformatics
  • Neural class networks
  • OpenStack Platform
  • Scaleout Server SAN with NVMe
  • Server-based network offload

I will be diving more into the architecture, system designs, and partner announcements later this week as I think the Qualcomm Centriq 2400 family will have a significant impact on the future of the enterprise server markets.

Source: Qualcomm

Podcast #474 - Optane 900P, Cord Cutting, 1070 Ti, and more!

Subject: General Tech | November 2, 2017 - 12:11 PM |
Tagged: Volta, video, podcast, PCI-e 4, nvidia, msi, Microsoft Andromeda, Memristors, Mali-D71, Intel Optane, gtx 1070 ti, cord cutting, arm, aegis 3, 8th generation core

PC Perspective Podcast #474 - 11/02/17

Join us for discussion on Optane 900P, Cord Cutting, 1070 Ti, and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts: Ryan Shrout, Josh Walrath, Jeremy Hellstrom, Allyn Malventano

Peanut Gallery: Ken Addison, Alex Lustenberg

Program length: 1:32:19

Podcast topics of discussion:
  1. Week in Review:
  2. News items of interest:
  3. Hardware/Software Picks of the Week
    1. 1:17:00 Ryan: Intel 900P Optane SSD
    2. 1:26:45 Allyn: Sony RX10 Mk IV. Pricey, but damn good.
  4. Closing/outro
