ARM Unveils the Cortex-A72 Processor Architecture Details
ARM Releases Cortex-A72 for Licensing
On February 3rd, ARM announced a slew of new designs, including the Cortex-A72. Few details were shared with us, but what we learned was that it could potentially redefine power and performance in the ARM ecosystem. Ryan was invited to London to participate in a deep dive into what ARM has done to improve its position against market behemoth Intel in the very competitive mobile space. Intel has a leg up on process technology with its 14nm Tri-Gate process, and it continues to work hard at making its x86-based processors more power efficient while still maintaining good performance. Even so, there are certain drawbacks to using an ISA that is focused on high performance computing rather than one designed from scratch to provide good performance with excellent energy efficiency.
ARM has been on a pretty good roll with its Cortex-A9, A7, A15, A17, A53, and A57 parts over the past several years. These designs have been utilized in a multitude of products and scenarios, with configurations that have scaled up to 16 cores. While each iteration has improved upon the previous one, ARM is facing the specter of Intel’s latest-generation, highly efficient x86 SOCs based on the 2nd gen 14nm Tri-Gate process. Several things have fallen into place to help ARM stay competitive, but we also cannot ignore the experience and design hours that have led to this product.
(Editor's Note: During my time with ARM last week it became very apparent that the company is not standing still and is not satisfied with its current status. With competition from Intel, Qualcomm and others ramping up over the next 12 months in both mobile and server markets, ARM will more than ever be dependent on the evolution of core design and GPU design to maintain advantages in performance and efficiency. As Josh will go into more detail here, the Cortex-A72 appears to be an incredibly impressive design, and all indications and conversations I have had with others outside of ARM suggest that it will be an incredibly successful product.)
Cortex-A72: Highest Performance ARM Cortex
ARM has been ubiquitous in mobile applications since it first started selling licenses for its designs in the 90s. Its chips were seemingly everywhere, but most people wouldn’t recognize the name ARM because these chips were fabricated and sold by licensees under their own names. Companies like TI, Qualcomm, Apple, DEC and others all licensed and adopted ARM technology in one form or another.
ARM’s importance grew dramatically with the introduction of increasingly complex cellphones and smartphones. The company also gained attention through multimedia devices such as the Microsoft Zune. What was once a fairly niche company with low performance, low power offerings became the 800 pound gorilla in the mobile market. Billions of chips based on ARM technology are sold yearly. To stay in that position ARM has worked aggressively to keep providing excellent power characteristics for its parts, but now it is really focusing on overall performance and capabilities to address not only the smartphone market, but also the higher performance computing and server spaces in which it wants a significant presence.
The Cortex-A9 is probably one of the first truly impressive performers for ARM when it comes to low power devices. This was a thoroughly modern processor with the ability to add in the NEON SIMD unit. ARM then iterated upon the design: it added the low power A7, followed with the higher performance A15, and finally delivered the high performance but power optimized A17. These are all 32 bit parts, which is generally fine for most smartphone and tablet applications. Due to usage scenarios, the integration of a 64 bit processor and OS had been considered overkill for smartphones until just recently.
ARM has jumped into the 64 bit world with its A50 series of cores. The A53 is the lower power, lower performance part while the A57 is the higher power, higher performance core. Both have improved upon the performance and power consumption of the previous Cortex parts and are in mass production right now. While these parts look good, Intel is still breathing down ARM’s neck. To counter this, ARM has released its latest core design, the Cortex-A72.
The A72 looks to be a significant milestone for ARM. This could be the design that will catapult their portfolio to the next level and allow them to address market spaces previously closed off to them. The first thing that ARM talks about with this part is how big of a jump in both performance and power efficiency it truly is. When compared to a 28nm Cortex-A15 running at the same power level, the A72 is around 3.5x faster. When the clockspeed is turned down on the A72 and performance is at the same place as the A15, the A72 pulls 75% less power. This is very impressive, because the A15 is still a go-to core design for smartphones and tablets.
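To put those two figures on a common footing, the short Python sketch below converts each claim into a performance-per-watt ratio relative to the 28nm Cortex-A15. It uses only the 3.5x and 75% numbers quoted above; the function name and the normalization of the A15 to 1.0 are assumptions made for illustration, not anything ARM has published.

```python
def perf_per_watt_gain(relative_performance: float, relative_power: float) -> float:
    """Return the perf-per-watt ratio versus a baseline normalized to 1.0 / 1.0.

    Hypothetical helper for illustration; the baseline is the 28nm Cortex-A15.
    """
    return relative_performance / relative_power


# Claim 1: same power as the A15, about 3.5x the performance.
same_power_gain = perf_per_watt_gain(relative_performance=3.5, relative_power=1.0)

# Claim 2: same performance as the A15, 75% less power (0.25x the power).
same_perf_gain = perf_per_watt_gain(relative_performance=1.0, relative_power=1.0 - 0.75)

print(f"Iso-power operating point:       {same_power_gain:.2f}x perf/W")
print(f"Iso-performance operating point: {same_perf_gain:.2f}x perf/W")
```

The two results describe different operating points, so they need not match: running the A72 slower to hit A15-level performance works out to roughly 4x the efficiency, while running it at the A15's power budget works out to roughly 3.5x.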
ARM has accomplished this through a lot of work and some new tricks. The A72 looks to be offered on both TSMC’s 16nm FF and FF+ processes as well as Samsung’s 14nm FinFET LPE and LPP nodes. These process nodes are fairly similar, but early indications point to the Samsung/GLOBALFOUNDRIES 14nm process as being slightly superior in performance and transistor density.
The process node is only a part of the overall strategy that ARM has taken in introducing the A72. The A72 is heavily based on the previous A57, but there is not one functional unit that has been ignored. Each part has been reworked and optimized to increase density, performance, and power efficiency. This is very similar overall to what AMD has done with the HD design libraries on the Excavator core-based Carrizo APU. This fine tuning has led to higher IPC as well as lower power consumption, which also allows the design to stay at the higher burst frequency for a longer time when at full load.
There are also a lot of improvements to many of the functional units from the A57 to the A72. The A72 is of course an ARMv8-A based part that fully supports 64 bit computing. It also improves upon the interconnects between memory, peripherals, and other cores. It features ECC memory support (important for servers), an AMBA 5 CHI implementation (a high performance interconnect), and an improved FP unit with cryptography support.