ARM Refreshes All the Things
This past April ARM invited us to visit Cambridge, England so they could discuss with us their plans for the next year. Quite a bit has changed for the company since our last ARM Tech Day in 2016. They were acquired by SoftBank, but continue to essentially operate as their own company. They now have access to more funds, are less risk averse, and have a greater ability to expand in the ever-growing mobile and IoT marketplaces.
The ARM of today certainly is quite different from what we knew 10 years ago when we saw their technology used in the first iPhone. The company back then had good technology, but a relatively small head count. They kept pace with the industry, but were not nearly as aggressive as other chip companies in some areas. Through the past 10 years they have grown not only in numbers, but also in the technologies that they have constantly expanded on. The company became more PR savvy and communicated more effectively with the press and, in the end, their primary users. Where once ARM would announce new products and not expect to see shipping products for upwards of 3 years, we are now seeing the company be much more aggressive with their designs and get them out to their partners so that production happens in months rather than years.
Several days of meetings and presentations left us a bit overwhelmed by what ARM is bringing to market towards the end of 2017 and most likely beginning of 2018. On the surface it appears that ARM has only done a refresh of the CPU and GPU products, but once we start looking at these products in the greater scheme and how they interact with DynamIQ we see that ARM has changed the mobile computing landscape dramatically. This new computing concept allows greater performance, flexibility, and efficiency in designs. Partners will have far more control over these licensed products to create more value and differentiation as compared to years past.
We covered DynamIQ at PCPer this past March. ARM wanted to seed that concept before they jumped into more discussions on their latest CPUs and GPUs. Previous Cortex products cannot be used with DynamIQ. To leverage that technology we must have new CPU designs. In this article we are covering the Cortex-A55 and Cortex-A75. These two new CPUs on the surface look more like a refresh, but when we dig in we see that some massive changes have been wrought throughout. ARM has taken the concepts of the previous A53 and A73 and expanded upon them fairly dramatically, not only to work with DynamIQ but also to remove significant bottlenecks that have impeded theoretical performance.
A Watershed Moment in Mobile
This past May I was invited to Austin to be briefed on the latest core innovations from ARM and their partners. We were introduced to new CPU and GPU cores, as well as the surrounding technologies that provide the basis of a modern SOC in the ARM family. We also were treated to more information about the process technologies that ARM would embrace with their Artisan and POP programs. ARM is certainly far more aggressive now in their designs and partnerships than they have been in the past, or at least they are more willing to openly talk about them to the press.
The big process news that ARM was able to share at this time was the design of 10nm parts using an upcoming TSMC process node. This was fairly big news as TSMC was still introducing parts on their latest 16nm FF+ line. NVIDIA had not even released their first 16FF+ parts to the world in early May. Apple had dual sourced their 14/16 nm parts from Samsung and TSMC respectively, but these were based on LPE and FF lines (early nodes not yet optimized to LPP/FF+). So the news that TSMC would have a working 10nm process in 2017 was important to many people. 2016 might be a year with some good performance and efficiency jumps, but it seems that 2017 would provide another big leap forward after years of seeming stagnation of pure play foundry technology at 28nm.
Yesterday we received a new announcement from ARM that shows an amazing shift in thought and industry inertia. ARM is partnering with Intel to introduce select products on Intel’s upcoming 10nm foundry process. This news is both surprising and expected. It is surprising in that it happened as quickly as it did. It is expected as Intel is facing a very different world than it had planned for 10 years ago. We could argue that it is much different than they planned for 5 years ago.
Intel is the undisputed leader in process technologies and foundry practices. They are the gold standard of developing new, cutting edge process nodes and implementing them on a vast scale. This has served them well through the years as they could provide product to their customers seemingly on demand. It also allowed them a leg up in technology when their designs may not have fit what the industry wanted or needed (Pentium 4, etc.). It also allowed them to potentially compete in the mobile market with designs that were not entirely suited for ultra-low power. x86 is a modern processor technology with decades of development behind it, but that development focused mainly on performance at higher TDP ranges.
This past year Intel signaled their intent to move out of the sub 5 watt market and cede it to ARM and their partners. Intel’s ultra mobile offerings just did not make an impact in an area that they were expected to. For all of Intel’s advances in process technology, the base ARM architecture is just better suited to these power envelopes. Instead of throwing good money after bad (in the form of development time, wafer starts, rebates) Intel has stepped away from this market.
This leaves Intel with a problem. What to do with extra production capacity? Running a fab is a very expensive endeavor. If these megafabs are not producing chips 24/7, then the company is losing money. This past year Intel has seen their fair share of layoffs and slowing down production/conversion of fabs. The money spent on developing new, cutting edge process technologies cannot stop for the company if they want to keep their dominant position in the CPU industry. Some years back they opened up their process products to select 3rd party companies to help fill in the gaps of production. Right now Intel has far more production line space than they need for the current market demands. Yes, there were delays in their latest Skylake based processors, but those were solved and Intel is full steam ahead. Unfortunately, they do not seem to be keeping their fabs utilized at the level needed or desired. The only real option seems to be opening up some fab space to more potential customers in a market that they are no longer competing directly in.
The Intel Custom Foundry Group is working with ARM to provide access to their 10nm HPM process node. Initial production of these latest generation designs will commence in Q1 2017 with full scale production in Q4 2017. We do not have exact information as to what cores will be used, but we can imagine that they will be Cortex-A73 and A53 parts in big.LITTLE designs. Mali graphics will probably be the first to be offered on this advanced node as well due to the Artisan/POP program. Initial customers have not been disclosed and we likely will not hear about them until early 2017.
This is a big step for Intel. It is also a logical progression for them when we look over the changing market conditions of the past few years. They were unable to adequately compete in the handheld/mobile market with their x86 designs, but they still wanted to profit off of this ever expanding area. The logical way to monetize this market is to make the chips for those that are successfully competing here. This will cut into Intel’s margins, but it should increase their overall revenue base if they are successful here. There is no reason to believe that they won’t be.
The last question we have is if the 10nm HPM node will be identical to what Intel will use for their next generation “Cannonlake” products. My best guess is that the foundry process will be slightly different and will not provide some of the “secret sauce” that Intel will keep for themselves. It will probably be a mobile focused process node that stresses efficiency rather than transistor switching speed. I could be very wrong here, but I don’t believe that Intel will open up their process to everyone that comes to them hat in hand (AMD).
The partnership between ARM and Intel is a very interesting one that will benefit customers around the globe if it is handled correctly from both sides. Intel has a “not invented here” culture that has both benefited it and caused it much grief. Perhaps some flexibility on the foundry side will reap benefits of its own when dealing with very different designs than Intel is used to. This is a titanic move from where Intel probably thought it would be when it first started to pursue the ultra-mobile market, but it is a move that shows the giant can still positively react to industry trends.
New Products for 2017
PC Perspective was invited to Austin, TX on May 11 and 12 to participate in ARM’s yearly tech day. Also invited were a handful of editors and analysts that cover the PC and mobile markets. Those folks were all pretty smart, so it is confusing as to why they invited me. Perhaps word of my unique talent of screenshotting PDFs into near-unreadable JPGs preceded me? Regardless of the reason, I was treated to two full days of in-depth discussion of the latest generation of CPU and GPU cores, 10nm test chips, and information on new licensing options.
Today ARM is announcing their next CPU core with the introduction of the Cortex-A73. They are also unwrapping the latest Mali-G71 graphics technology. Other technologies such as the CCI-550 interconnect are also revealed. It is a busy and important day for ARM, especially in light of Intel seemingly abandoning the sub 5 watt mobile market.
ARM previously announced the Cortex-A72 in February, 2015. Since that time it has been seen in most flagship mobile devices in late 2015 and throughout 2016. The market continues to evolve, and as such the workloads and form factors have pushed ARM to continue to develop and improve their CPU technology.
The Sophia Antipolis, France design group is behind the new A73. The previous several core architectures had been developed by the Cambridge group. As such, the new design differs quite dramatically from the previous A72. I was actually somewhat taken aback by the differences in the design philosophy of the two groups and the changes between the A72 and A73, but the generational jumps we have seen in the past make a bit more sense to me.
The marketplace is constantly changing when it comes to workloads and form factors. More and more complex applications are being ported to mobile devices, including hot technologies like AR and VR. Other technologies include 3D/360 degree video, greater than 20 MP cameras, and 4K/8K displays and their video playback formats. Form factors on the other hand have continued to decrease in size, especially in overall height. We have relatively large screens on most premium devices, but the designers have continued to make these phones thinner and thinner throughout the years. This has put a lot of pressure on ARM and their partners to increase performance while keeping TDPs in check, and even reducing them so they more adequately fit in the TDP envelope of these extremely thin devices.
10nm Sooner Than Expected?
It seems only yesterday that we had the first major GPU released on 16nm FF+ and now we are talking about ARM about to receive their first 10nm FF test chips! Well, in fact it was yesterday that NVIDIA formally released performance figures on the latest GeForce GTX 1080 which is based on TSMC’s 16nm FF+ process technology. Currently TSMC is going full bore on their latest process node and producing the fastest current graphics chip around. It has taken the foundry industry as a whole a lot longer to develop FinFET technology than expected, but now that they have that piece of the puzzle seemingly mastered they are moving to a new process node at an accelerated rate.
TSMC’s 10nm FF is not well understood by press and analysts yet, but we gather that it is more of a marketing term than a true drop to 10 nm features. Intel has yet to get past 14nm and does not expect 10 nm production until well into next year. TSMC is promising their version in the second half of 2016. We cannot assume that TSMC’s version will match what Intel will be doing in terms of geometries and electrical characteristics, but we do know that it is a step past TSMC’s 16nm FF products. Lithography will likely get a boost with triple patterning exposure. My guess is that the back end will also move away from the “20nm metal” stages that we see with 16nm. All in all, it should be an improved product from what we see with 16nm, but time will tell if it can match the performance and density of competing lines that bear the 10nm name from Intel, Samsung, and GLOBALFOUNDRIES.
ARM has a history of porting their architectures to new process nodes, but they are being a bit more aggressive here than we have seen in the past. It used to be that ARM would announce a new core or technology, and it would take up to two years to be introduced into the market. Now we are seeing technology announcements and actual products hitting the scenes about nine months later. With the mobile market continuing to grow we expect to see products quicker to market still.
The company designed a simplified test chip to tape out and send to TSMC for test production on the aforementioned 10nm FF process. The chip was taped out in December, 2015. The design was shipped to TSMC for mask production and wafer starts. ARM is expecting the finished wafers to arrive this month.
Looking Towards 2016
ARM invited us to a short conversation with them on the prospects of 2016. The initial answer as to how they feel the upcoming year will pan out is, “Interesting”. We covered a variety of topics ranging from VR to process technology. ARM is not announcing any new products at this time, but throughout this year they will continue to push their latest Mali graphics products as well as the Cortex A72.
Trends to Watch in 2016
The one overriding trend that we will see is that of “good phones at every price point”. ARM’s IP scales from very low to very high end mobile SOCs and their partners are taking advantage of the length and breadth of these technologies. High end phones based on custom cores (Apple, Qualcomm) will compete against those licensing the Cortex A72 and A57 parts for their phones. Lower end options that are less expensive and pull less power (which then requires less battery) will flesh out the midrange and budget parts. Unlike several years ago, the products from top to bottom are eminently usable and relatively powerful products.
Camera improvements will also take center stage for many products and continue to be a selling point and an area of differentiation for competitors. Improved sensors and software will obviously be the areas that the ARM partners focus on, but ARM is putting some work into this area as well. Post processing requires quite a bit of power to do quickly and effectively. ARM is helping here by leveraging both the Neon SIMD engine and the power of the Mali GPU.
4K video is becoming more and more common on handhelds as well, and ARM is hoping to leverage that capability for shooting static pictures. A single 4K frame is around 8 megapixels in size. So instead of simply recording video, the handheld can offer a “best shot” type of functionality: the phone captures the 4K video and users can then choose the best frame available to them in that period of time. This is a simple idea that will be a nice feature for those with a product that can capture 4K video.
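As a quick sanity check on that 8 megapixel figure, here is a minimal Python sketch. It assumes "4K" means UHD at 3840 x 2160, which the briefing did not spell out. The `best_shot` helper and its per-frame scores are purely hypothetical stand-ins for whatever quality metric a phone vendor might actually use; ARM did not describe one.

```python
# Assumption: "4K" means UHD, 3840 x 2160 pixels (not stated in the article).
UHD_WIDTH = 3840
UHD_HEIGHT = 2160


def megapixels(width: int, height: int) -> float:
    """Return the pixel count of a single frame in millions of pixels."""
    return width * height / 1_000_000


def best_shot(scored_frames):
    """Return the index of the highest-scoring frame.

    scored_frames is a list of (frame_index, score) pairs. The score is a
    hypothetical quality metric (sharpness, for example); computing it is
    outside the scope of this sketch.
    """
    return max(scored_frames, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    print(f"{megapixels(UHD_WIDTH, UHD_HEIGHT):.2f} MP per frame")
    # Made-up scores for a five-frame burst, for illustration only.
    burst = [(0, 0.41), (1, 0.78), (2, 0.93), (3, 0.66), (4, 0.52)]
    print("best frame index:", best_shot(burst))
```

Under that resolution assumption a frame works out to roughly 8.3 MP, in line with the "around 8 megapixels" quoted above.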
Subject: Editorial | May 21, 2015 - 03:34 PM | Ken Addison
Tagged: podcast, video, amd, hbm, Fiji, g-sync, ips, XB270HU, corsair, Oculus, supermicro, asus, gladius, jem davies, arm, mali
PC Perspective Podcast #350 - 05/21/2015
Join us this week as we discuss AMD's plan for HBM, IPS G-SYNC, GameWorks and The Witcher 3, and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano
Program length: 1:24:12
Hardware/Software Picks of the Week:
Sebastian: Aukey Quick Charge 2.0 Portable Charger
Subject: Mobile | May 15, 2015 - 01:56 PM | Ryan Shrout
Tagged: video, mali, jem davies, interview, arm
Have you ever wondered how a mobile GPU is born? Or how the architecture of a mobile GPU like ARM Mali differs from the technology in your discrete PC graphics card? Perhaps you just want to know if ideas like HBM (high bandwidth memory) are going to find their way into the mobile ecosystem any time soon?
Josh and I sat down (virtually) with ARM's VP of Technology and Fellow, Jem Davies, to answer these questions and quite a bit more. The resulting interview will shed light on the design process of a mobile GPU, how you get the most out of an SoC that measures power by the milliwatt, what the world of mobile benchmarking needs to do to clean up its act and quite a bit more.
You'd be hard pressed to find a better way to spend the next hour of your day as you will without a doubt walk away more informed about the world of smartphones, tablets and GPUs.
ARM Releases Cortex-A72 for Licensing
On February 3rd, ARM announced a slew of new designs, including the Cortex A72. Few details were shared with us, but what we learned was that it could potentially redefine power and performance in the ARM ecosystem. Ryan was invited to London to participate in a deep dive of what ARM has done to improve its position against market behemoth Intel in the very competitive mobile space. Intel has a leg up on process technology with their 14nm Tri-Gate process, but they are continuing to work hard in making their x86 based processors more power efficient, while still maintaining good performance. There are certain drawbacks to using an ISA that is focused on high performance computing rather than being designed from scratch to provide good performance with excellent energy efficiency.
ARM has been on a pretty good roll with their Cortex A9, A7, A15, A17, A53, and A57 parts over the past several years. These designs have been utilized in a multitude of products and scenarios, with configurations that have scaled up to 16 cores. While each iteration has improved upon the previous, ARM is facing the specter of Intel’s latest generation, highly efficient x86 SOCs based on the 2nd gen 14nm Tri-Gate process. Several things have fallen into place for ARM to help them stay competitive, but we also cannot ignore the experience and design hours that have led to this product.
(Editor's Note: During my time with ARM last week it became very apparent that it is not standing still, not satisfied with its current status. With competition from Intel, Qualcomm and others ramping up over the next 12 months in both mobile and server markets, ARM will more than ever be dependent on the evolution of core design and GPU design to maintain advantages in performance and efficiency. As Josh will go into more detail here, the Cortex-A72 appears to be an incredibly impressive design, and all indications, along with the conversations I have had with others outside of ARM, suggest that it will be an incredibly successful product.)
Cortex A72: Highest Performance ARM Cortex
ARM has been ubiquitous for mobile applications since it first started selling licenses for their products in the 90s. They were found everywhere it seemed, but most people wouldn’t recognize the name ARM because these chips were fabricated and sold by licensees under their own names. Guys like TI, Qualcomm, Apple, DEC and others all licensed and adopted ARM technology in one form or another.
ARM’s importance grew dramatically with the introduction of increasingly complex cellphones and smartphones. They also gained attention through multimedia devices such as the Microsoft Zune. What was once a fairly niche company with low performance, low power offerings became the 800 pound gorilla in the mobile market. Billions of chips are sold yearly based on ARM technology. To stay in that position ARM has worked aggressively on continually providing excellent power characteristics for their parts, but now they are really focusing on overall performance and capabilities to address not only the smartphone market, but also the higher performance computing and server spaces that they want a significant presence in.
ARM Releases Top Cortex Design to Partners
ARM has an interesting history of releasing products. The company was once in the shadowy background of the CPU world, but with the explosion of mobile devices and its relevance in that market, ARM has had to adjust how it approaches the public with their technologies. For years ARM has announced products and technology, only to see it ship one to two years down the line. It seems that with the increased competition in the marketplace from Apple, Intel, NVIDIA, and Qualcomm ARM is now pushing to license out its new IP in a way that will enable their partners to achieve a faster time to market.
The big news this time is the introduction of the Cortex A72. This is a brand new design that will be based on the ARMv8-A instruction set. This is a 64 bit capable processor that is also backwards compatible with 32 bit applications programmed for ARMv7 based processors. ARM does not go into great detail about the product other than it is significantly faster than the previous Cortex-A15 and Cortex-A57.
The previous Cortex-A15 processors were announced several years back and made their first introduction in late 2013/early 2014. These were still 32 bit processors and while they had good performance for the time, they did not stack up well against the latest A8 SOCs from Apple. The A53 and A57 designs were also announced around two years ago. These are the first 64 bit designs from ARM and were meant to compete with the latest custom designs from Apple and Qualcomm’s upcoming 64 bit part. We are only now just seeing these parts make it into production, and even Qualcomm has licensed the A53 and A57 designs to ensure a faster time to market for this latest batch of next-generation mobile devices.
We can look back over the past five years and see that ARM is moving forward in announcing their parts and then having their partners ship them within a much shorter timespan than we were used to seeing. ARM is hoping to accelerate the introduction of its new parts within the next year.
Subject: Processors, Mobile | October 29, 2014 - 04:30 AM | Scott Michaud
Tagged: arm, mali-T800, mali
While some mobile SoC manufacturers have created their own graphics architectures, others license from ARM (and some even have a mixture of each within their product stack). There does not seem to be a specific push with this generation, rather just increases in the areas that make the most sense. Some comments tout increased energy efficiency, others higher performance, and even API support got a boost to OpenGL ES 3.1, which brings compute shaders to mobile graphics applications (without invoking OpenCL, etc.).
Three models are in the Mali-T800 series: the T820, the T830, and the T860. As you climb in the list, the products go from entry level to high-performance mobile. GPUs are often designed in modularized segments, which ARM calls cores. You see this frequently in desktop, discrete graphics cards where an entire product stack contains a handful of actual designs, but products are made by disabling whole modules. The T820 and T830 can scale from one to four "core" modules, each core containing four actual "shader cores", while the T860 can scale from one to sixteen "core" modules, each core with 16 "shader cores". Again, "core modules" are groups that contain actual shader processors (and L2 cache, etc.). Cores in cores.
This is probably why NVIDIA calls them "Streaming Multiprocessors" that contain "CUDA Cores".
ARM does not (yet) provide an actual GFLOP rating for these processors, and the final number is up to manufacturers to some extent. It is normally a matter of multiplying the clock frequency by the number of ops per cycle and by the number of shader units available. I tried, but the number I was getting did not match known values from previous generations, so I assume my guess at instructions per clock was off. Also, again, ARM considers their performance figures to be conservative. Manufacturers should have no problem exceeding these, effortlessly.
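For anyone who wants to try the same back-of-the-envelope math, the multiplication described above can be sketched in a few lines of Python. The function name and every input value here are my own placeholders, not ARM figures; the clock speed, ops per cycle, and unit count in the example are deliberately made up, since ARM has not published the real ones.

```python
def peak_gflops(clock_mhz: float, flops_per_cycle_per_unit: float,
                shader_units: int) -> float:
    """Theoretical peak throughput in GFLOPS.

    peak = clock frequency x ops per cycle (per unit) x number of shader units
    """
    flops_per_second = clock_mhz * 1e6 * flops_per_cycle_per_unit * shader_units
    return flops_per_second / 1e9


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; these are NOT Mali-T800 specs.
    example = peak_gflops(clock_mhz=650, flops_per_cycle_per_unit=16,
                          shader_units=4)
    print(f"theoretical peak: {example:.1f} GFLOPS")
```

The result is only as good as the ops per cycle guess, which is exactly where my own attempt went sideways. It is also a theoretical ceiling, not something a shipping device will sustain.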
As for a release timeline? Because these architectures are designed for manufacturers to implement, you should start seeing them within devices hitting retail in late 2015, early 2016.