Subject: Mobile | May 15, 2015 - 01:56 PM | Ryan Shrout
Tagged: video, mali, jem davies, interview, arm
Have you ever wondered how a mobile GPU is born? Or how the architecture of a mobile GPU like ARM Mali differs from the technology in your discrete PC graphics card? Perhaps you just want to know if ideas like HBM (high bandwidth memory) are going to find their way into the mobile ecosystem any time soon?
Josh and I sat down (virtually) with ARM's VP of Technology and Fellow, Jem Davies, to answer these questions and quite a bit more. The resulting interview sheds light on the design process of a mobile GPU, how you get the most out of an SoC that measures power by the milliwatt, and what the world of mobile benchmarking needs to do to clean up its act.
You'd be hard pressed to find a better way to spend the next hour of your day as you will without a doubt walk away more informed about the world of smartphones, tablets and GPUs.
Some Fresh Hope for 2016
EDIT 2015-05-07: A day after the AMD analyst meeting we now know that the roadmaps delivered here are not legitimate. While some of the information is likely correct on the roadmaps, they were not leaked by AMD. There is no FM3 socket, rather AMD is going with AM4. AMD will be providing more information throughout this quarter about their roadmaps, but for now take all of this information as "not legit".
SH SOTN has some eagle eyes and spotted the latest leaked roadmap for AMD. These roadmaps cover both mobile and desktop, from 2015 through 2016. There are obviously quite a few interesting tidbits of information here.
On the mobility roadmap we see the upcoming release of Carrizo, which we have been talking about since before CES. This will be the very first HSA 1.0 compliant part to hit the market, and AMD has done some really interesting things with the design in terms of performance, power efficiency, and die size optimizations. Carrizo will span the market from 15 watts to 35 watts TDP. This is a mobile only part, but indications point to it being pretty competent overall. This is a true SOC that will support all traditional I/O functions of older standalone southbridges. Most believe that this part will be manufactured by GLOBALFOUNDRIES on their 28 nm HKMG process that is more tuned to AMD's APU needs.
Carrizo-L will be based on the Puma+ architecture and will go from 10 watts to 15 watts TDP. This will use the same FP4 BGA connection as the big Carrizo APU. This should make these parts more palatable for OEMs as they do not have to differentiate the motherboard infrastructure. Making things easier for OEMs will give more reasons for these folks to offer products based on Carrizo and Carrizo-L APUs. The other big reason will be the GCN graphics compute units. Puma+ is a very solid processor architecture for low power products, but these parts are still limited to the older 28 nm HKMG process from TSMC.
One interesting addition here is that AMD will be introducing their "Amur" APU for the low power and ultra-low power markets. These will be comprised of four Cortex-A57 CPUs combined with AMD's GCN graphics units. This will be the first time we see this combination, and the first time AMD has integrated with ARM since ATI spun off their mobile graphics to Qualcomm under the "Adreno" branding (anagram for "Radeon"). What is most interesting here is that this APU will be a 20 nm part most likely fabricated by TSMC. This is not to say that Samsung or GLOBALFOUNDRIES could not be producing it, but those companies are expending their energy on the 14 nm FinFET process that will be their bread and butter for years to come. This will be a welcome addition to the mobile market (tablets and handhelds) and could be a nice profit center for AMD if they are able to release this in a timely manner.
2016 is when things get very interesting. The Zen x86 design will dominate the upper 2/3 of the roadmap. I had talked about Zen when we had some new diagram leaks yesterday, but now we get to see the first potential products based off of this architecture. In mobile it will span from 5 watts to 35 watts TDP. The performance and mainstream offerings will be the "Bristol Ridge" APU which will feature 4 Zen cores (or one Zen module) combined with the next gen GCN architecture. This will be a 14nm part, and the assumption is that it will be GLOBALFOUNDRIES using 14nm FinFET LPP (Low Power Plus) that will be more tuned for larger APUs. This will also be a full SOC.
The next APU will be codenamed "Basilisk" that will span the 5 watt to 15 watt range. It will be comprised of 2 Zen cores (1/2 of a Zen module) and likely feature 2 to 4 MB of L3 cache, depending on power requirements. This looks to be the first Skybridge set of APUs that will share the same infrastructure as the ARM based Amur SOC. FT4 BGA is the basis for both the 2015 Amur and 2016 Basilisk SOCs.
Finally we have the first iteration of AMD's first ground up implementation of ARM's ARMv8-A ISA. The "Styx" APU features the new K12 CPU cores that AMD has designed from scratch. It too will feature the next generation GCN units as well as share the same FT4 BGA connection. Many are anxiously watching this space to see if AMD can build a better mousetrap when it comes to licensing the ARM ISA (as have Qualcomm, NVIDIA, and others).
2015 shows no difference in the performance desktop space, as it is still serviced by the now venerable Piledriver based FX parts on AM3+. The only change we expect to see here is that there will be a handful of new motherboard offerings from the usual suspects that will include the new USB 3.1 functionality derived from a 3rd party controller.
Mainstream and Performance will utilize the upcoming Godavari APUs. These are power and speed optimized APUs that are still based on the current Kaveri design. These look to be a simple refresh/rebadge with a slight performance tweak. Not exciting, but needs to happen for OEMs.
Low power will continue to be addressed by Beema based APUs. These are regular Puma based cores (not Puma+). AMD likely does not have the numbers to justify a new product in this rather small market.
2016 is when things get interesting again. We see the release of the FM3 socket (final proof that AM3+ is dead) that will house the latest Zen based APUs. At the top end we see "Summit Ridge" which will be composed of 8 Zen cores (or 2 Zen modules). This will have 4 MB of L2 cache and 16 MB of L3 cache if our other leaks are correct. These will be manufactured on 14nm FinFET LPE (the more appropriate process product for larger, more performance oriented parts). These will not be SOCs. We can expect these to be the basis of new Opterons as well, but there is obviously no confirmation of that on these particular slides. This will be the first new product in some years from AMD that has the chance to compete with higher end desktop SKUs from Intel.
From there we have the lower power Bristol Ridge and Basilisk APUs that we already covered in the mobile discussion. These look to be significant upgrades from the current Kaveri (and upcoming Godavari) APUs. New graphics cores, new CPU cores, and new SOC implementations where necessary.
AMD will really be shaking up the game in 2016. At the very least they will have proven that they can still change up their game and release higher end (and hopefully competitive) products. AMD has enough revenue and cash on hand to survive through 2016 and 2017 at the rate they are going now. We can only hope that this widescale change will allow AMD to make some significant inroads with OEMs on all levels. Otherwise Intel is free to do what they want and charge what price they want across multiple markets.
ARM Releases Cortex-A72 for Licensing
On February 3rd, ARM announced a slew of new designs, including the Cortex A72. Few details were shared with us, but what we learned was that it could potentially redefine power and performance in the ARM ecosystem. Ryan was invited to London to participate in a deep dive of what ARM has done to improve its position against market behemoth Intel in the very competitive mobile space. Intel has a leg up on process technology with their 14nm Tri-Gate process, but they are continuing to work hard in making their x86 based processors more power efficient, while still maintaining good performance. There are certain drawbacks to using an ISA that is focused on high performance computing rather than being designed from scratch to provide good performance with excellent energy efficiency.
ARM has been on a pretty good roll with their Cortex A9, A7, A15, A17, A53, and A57 parts over the past several years. These designs have been utilized in a multitude of products and scenarios, with configurations that have scaled up to 16 cores. While each iteration has improved upon the previous, ARM is facing the specter of Intel’s latest generation, highly efficient x86 SOCs based on the 2nd gen 14nm Tri-Gate process. Several things have fallen into place for ARM to help them stay competitive, but we also cannot ignore the experience and design hours that have led to this product.
(Editor's Note: During my time with ARM last week it became very apparent that it is not standing still, not satisfied with its current status. With competition from Intel, Qualcomm and others ramping up over the next 12 months in both mobile and server markets, ARM will more than ever be dependent on the evolution of core design and GPU design to maintain advantages in performance and efficiency. As Josh will go into more detail here, the Cortex-A72 appears to be an incredibly impressive design, and all indications, including conversations I have had with others outside of ARM, are that it will be an incredibly successful product.)
Cortex A72: Highest Performance ARM Cortex
ARM has been ubiquitous for mobile applications since it first started selling licenses for its products in the 90s. ARM designs were found everywhere it seemed, but most people wouldn’t recognize the name ARM because these chips were fabricated and sold by licensees under their own names. Guys like TI, Qualcomm, Apple, DEC and others all licensed and adopted ARM technology in one form or the other.
ARM’s importance grew dramatically with the introduction of increased complexity cellphones and smartphones. They also gained attention through multimedia devices such as the Microsoft Zune. What was once a fairly niche company with low performance, low power offerings became the 800 pound gorilla in the mobile market. Billions of chips are sold yearly based on ARM technology. To stay in that position ARM has worked aggressively on continually providing excellent power characteristics for their parts, but now they are really focusing on overall performance and capabilities to address not only the smartphone market, but also the higher performance computing and server spaces that they want a significant presence in.
Subject: General Tech | April 22, 2015 - 01:29 PM | Jeremy Hellstrom
Tagged: arm, Q1 2015
ARM seems to be completely ignoring the sales downturn that almost every single component manufacturer has seen in this quarter, as well as previous ones, turning in an increase of 14% on revenue and 24% on profit in Q1 of 2015. As The Register points out, that equates to 450 chips selling every second, something even automated stock trading algorithms have to be impressed by. Royalty revenue increased by 31% thanks to Mali, regardless of Apple's decision not to use that chip in their iPhone 6. You can expect to see more news on ARM from us in the near future and you can expect the news to be good for their investors and users.
"The first three months of 2015 have been good to ARM, which saw revenues of $348.2m and pre-tax profits of $120.5m in the first quarter, with 3.8 billion ARM-based chips shipped - or more than 450 chips per second."
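The "450 chips per second" figure is easy to sanity check. The short Python sketch below redoes the arithmetic from the numbers in the quote; the only assumption is that the quarter runs from January through March 2015 (90 days), and the variable names are purely illustrative.

```python
# Sanity check of the "more than 450 chips per second" figure quoted above.
# Assumption: the quarter is January through March 2015 (90 days, no leap day).
CHIPS_SHIPPED = 3.8e9        # ARM-based chips shipped in Q1 2015 (from the quote)
REVENUE_MUSD = 348.2         # revenue in millions of dollars (from the quote)
PRETAX_PROFIT_MUSD = 120.5   # pre-tax profit in millions of dollars (from the quote)

DAYS_IN_QUARTER = 31 + 28 + 31                   # January + February + March 2015
SECONDS_IN_QUARTER = DAYS_IN_QUARTER * 24 * 60 * 60

chips_per_second = CHIPS_SHIPPED / SECONDS_IN_QUARTER
pretax_margin = PRETAX_PROFIT_MUSD / REVENUE_MUSD

print(f"{chips_per_second:.0f} chips per second")
print(f"{pretax_margin:.1%} pre-tax margin")
```

Under that assumption the result is roughly 489 chips per second, comfortably above the quoted 450, with a pre-tax margin of about 34.6%.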
Here is some more Tech News from around the web:
- Google pulls plug on YouTube for older iPads, iPhones, smart TVs @ The Register
- How to Find Your Linux Version or Distro Release, and Why It Matters @ Linux.com
- SSL bug hits 1,000 iPhone, iPad apps including Microsoft and Yahoo titles @ The Inquirer
- 'No iOS Zone' Wi-Fi zero-day bug forces iPhones, iPads to crash and burn @ The Register
- New atomic clock won’t lose a second in 15 billion years @ Extremetech
Subject: General Tech | April 8, 2015 - 12:46 PM | Jeremy Hellstrom
Tagged: Samsung, arm, qualcomm, snapdragon 820, Kryo
Not only has the NVIDIA sueball pitch been judged to be in play and allowed to keep running, but now, according to news The Register has heard, Samsung may be using their own in-house ARM processors for their next products. The rumour is that they have spent four years developing an ARM processor from the ground up, which will make it much less likely that Qualcomm will be able to sell their next generation 64 bit Snapdragon Kryo processor to Samsung, which is after all a modified ARMv8-A chip as opposed to a custom built processor. Qualcomm does have other customers than Samsung, including HTC, Amazon and LG, who might be interested in the new Snapdragon 820, but it does look bleak for their next generation processor. The only leverage Qualcomm has now is that Samsung will likely be the ones fabbing many of the new Snapdragon 820s; perhaps they can strike a deal for some lower cost mobile devices once Kryo matures.
"Samsung will join Apple and other mobile semiconductor rivals in producing chips powered by homegrown, proprietary application cores in 2016, according to a new report."
Here is some more Tech News from around the web:
- Intel outs updated Atom x3 chip destined for IoT devices @ The Inquirer
- Windows XP is still clinging on, one year later @ The Inquirer
- Surface tablet shipments expected to exceed 4 million units in 2015 @ DigiTimes
- Most top corporates still Heartbleeding over the internet @ The Register
- ONOS to SDN world: here's our numbers, show us yours @ The Register
Subject: General Tech | March 31, 2015 - 12:18 PM | Jeremy Hellstrom
Tagged: skybridge, HPC, arm, amd
The details are a little sparse but we now have hints of what AMD's plans are for next year and 2017. In 2016 we should see AMD chips with ARM cores, the Skybridge architecture which Josh described almost a year ago, which will be pin compatible allowing the same motherboard to run with either an ARM processor or an AMD64 depending on your requirements. The GPU portion of their APUs will move forward on a two year cycle so we should not expect any big jumps in the next year but they are talking about an HPC capable part by 2017. The final point that The Register translated covers that HPC part which is supposed to utilize a new memory architecture which will be nine times faster than existing GDDR5.
"Consumer and commercial business lead Junji Hayashi told the PC Cluster Consortium workshop in Osaka that the 2016 release CPU cores (an ARMv8 and an AMD64) will get simultaneous multithreading support, to sit alongside the clustered multithreading of the company's Bulldozer processor families."
Subject: General Tech | March 17, 2015 - 01:18 PM | Jeremy Hellstrom
Tagged: hsa foundation, hsa, amd, arm, Samsung, Imagination Technologies, HSAIL
We have been talking about the HSA foundation since 2013, a cooperative effort by AMD, ARM, Imagination, Samsung, Qualcomm, MediaTek and TI to design a heterogeneous memory architecture to allow GPUs, DSPs and CPUs to all directly access the same physical memory. The release of the official specifications today is a huge step forward for these companies, especially for garnering future mobile market share as physical hardware apart from Carrizo becomes available.
Programmers will be able to use C, C++, Fortran, Java, and Python to write HSA-compliant code which is then compiled into HSAIL (Heterogeneous System Architecture Intermediate Language) and from there to the actual binary executables which will run on your devices. HSA currently supports x86 and x64 and there are Linux kernel patches available for those who develop on that OS. Intel and NVIDIA are not involved in this project at all; they have chosen their own solutions for mobile devices, and while Intel certainly has pockets deep enough to experiment, NVIDIA might not. We shall soon see if Pascal and improvements to Maxwell's performance and efficiency through future generations can compete with the benefits of HSA.
The current problem is of course hardware: Bald Eagle and Carrizo are scheduled to arrive on the market soon but currently they are not available. Sea Islands GPUs and Kaveri have some HSA enhancements, but with limited hardware to work with it will be hard to convince developers to focus on programming HSA optimized applications. The release of the official specs today is a great first step; if you prefer an overview to reading through the official documents, The Register has a good article right here.
"The HSA Foundation today officially published version 1.0 of its Heterogeneous System Architecture specification, which (if we were being flippant) describes how GPUs, DSPs and CPUs can share the same physical memory and pass pointers between each other. (A provisional 1.0 version went live in August 2014.)"
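To make the "share the same physical memory and pass pointers" idea a little more concrete, here is a conceptual analogy in plain Python. It is emphatically not the HSA runtime API or HSAIL: the function names and `cpu_buffer` are hypothetical, and the "kernels" are ordinary loops standing in for work a GPU or DSP would do. The sketch only contrasts a copy-in/copy-out model with one where the worker operates on the caller's memory through a reference.

```python
# Conceptual analogy only: plain Python, not the HSA runtime API.
# "cpu_buffer" stands in for memory owned by the CPU; the two "kernel"
# functions stand in for work done by a GPU or DSP. All names are hypothetical.

def run_kernel_with_copy(host_data: bytearray) -> bytearray:
    """Traditional model: copy the data to the device, compute, hand back a copy."""
    device_data = bytearray(host_data)          # explicit host -> device copy
    for i in range(len(device_data)):
        device_data[i] = (device_data[i] * 2) % 256
    return device_data                          # caller must copy results back


def run_kernel_shared(shared: memoryview) -> None:
    """Shared-memory model: work in place on the caller's buffer via a reference."""
    for i in range(len(shared)):
        shared[i] = (shared[i] * 2) % 256


cpu_buffer = bytearray([1, 2, 3, 4])

# Copy model: the original buffer is untouched and the result lives elsewhere.
copied_result = run_kernel_with_copy(cpu_buffer)
buffer_after_copy_model = bytes(cpu_buffer)

# Shared model: no copies are made, the original buffer is updated directly.
run_kernel_shared(memoryview(cpu_buffer))

print(list(copied_result), list(buffer_after_copy_model), list(cpu_buffer))
```

The point of the shared version is that both the copy and the copy-back disappear; on real hardware that saving is what a specification like HSA is meant to enable.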
Here is some more Tech News from around the web:
- Droidberry dangles: Why the BlackBerry-Samsung alliance is big potatoes @ The Register
- BlackBerry: FREAK SSL bug affects BES, BBM and BlackBerry smartphones @ The Inquirer
- Apple will pay you to ditch your Android or BlackBerry smartphone @ The Inquirer
- Ext4 Filesystem Improvements to Address Scaling Challenges @ Linux.com
- Microsoft gives EMET divine powers to repel God Mode attack @ The Register
- Microsoft RE-BORKS Windows 7 patch after reboot loop horror @ The Register
- Fujitsu Could Help Smartphone Chips Run Cooler @ Slashdot
- Gigabyte announces financial results for 2014 @ DigiTimes
- 3D Audio Standard Released @ Slashdot
- NikKTech And Nanoxia Spring Break EU Giveaway
Subject: Graphics Cards, Mobile | March 3, 2015 - 12:00 PM | Ryan Shrout
Tagged: Unity, lighting, global illumination, geomerics, GDC, arm
Back in 2013 ARM picked up a company called Geomerics, responsible for one of the industry’s most advanced dynamic lighting engines used in games ranging from mobile to console to PC. Called Enlighten, it is the lighting engine in many major games in a variety of markets. Battlefield 3 uses it, as does Need for Speed: The Run, and The Bureau: XCOM Declassified and Quantum Conundrum mark another pair of major games that depend on Geomerics technology.
Great, but what does that have to do with ARM and why would the company be interested in investing in software that works with such a wide array of markets, most of which are not dominated by ARM processors? There are two answers, the first of which is directional: ARM is using the minds and creative talent behind Geomerics to help point the Cortex and Mali teams in the correct direction for CPU and GPU architecture development. By designing hardware to better address the advanced software and lighting systems Geomerics builds, Cortex and Mali will have some semblance of an advantage in specific gaming titles as well as a potential “general purpose” advantage. NVIDIA employs hundreds of gaming and software developers for this exact reason: what better way to make sure you are always at the forefront of the gaming ecosystem than getting high-level gaming programmers to point you to that edge? Qualcomm also recently (back in 2012) started employing game and engine developers in-house with the same goals.
ARM also believes it will be beneficial to bring publishers, developers and middleware partners to the ARM ecosystem through deployment of the Enlighten engine. It would be feasible to think console vendors like Microsoft and Sony would be more willing to integrate ARM SoCs (rather than the x86 used in the PS4 and Xbox One) when shown the technical capabilities brought forward by technologies like Geomerics Enlighten.
It’s best to think of the Geomerics acquisition as a kind of insurance program for ARM, making sure both its hardware and software roadmaps are in line with industry goals and directives.
At GDC 2015 Geomerics is announcing the release of the Enlighten 3 engine, a new version that brings cinematic-quality real-time global illumination to market. Some of the biggest new features include additional accuracy on indirect lighting, color separated directional output (enables individual RGB calculations), better light map baking for higher quality output, and richer material properties to support transparency and occlusion.
All of this technology will be showcased in a new Subway demo that includes real-time global illumination simulation, dynamic transparency and destructible environments.
Geomerics Enlighten 3 Subway Demo
Enlighten 3 will also ship with Forge, a new lighting editor and pipeline tool for content creators looking to streamline the building process. Forge will allow import functionality from Autodesk 3ds Max and Maya applications making inter-operability easier. Forge uses a technology called YEBIS 3 to show estimated final quality without the time consuming final-build processing time.
Finally, maybe the biggest news for ARM and Geomerics is that the Unity 5 game engine will be using Enlighten as its default lighting engine, giving ARM/Mali a potential advantage for gaming experiences in the near term. Of course Enlighten is available as an option for Unreal Engine 3 and 4 for developers using that engine in mobile, console and desktop projects as well as in an SDK form for custom integrations.
Subject: General Tech | February 5, 2015 - 02:05 PM | Ken Addison
Tagged: podcast, video, g-sync, GTX 970, gigabyte, brix s, broadwell-u, Intel, nuc, arm, Cortex-A72, mediatek, amd, Godavari, Raspberry Pi, windows 10
PC Perspective Podcast #335 - 02/05/2015
Join us this week as we discuss Mobile G-Sync, GTX 970 SLI, a Broadwell Brix and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano
Program length: 1:28:13
ARM Releases Top Cortex Design to Partners
ARM has an interesting history of releasing products. The company was once in the shadowy background of the CPU world, but with the explosion of mobile devices and its relevance in that market, ARM has had to adjust how it approaches the public with its technologies. For years ARM has announced products and technology, only to see them ship one to two years down the line. It seems that with the increased competition in the marketplace from Apple, Intel, NVIDIA, and Qualcomm, ARM is now pushing to license out its new IP in a way that will enable its partners to achieve a faster time to market.
The big news this time is the introduction of the Cortex A72. This is a brand new design that will be based on the ARMv8-A instruction set. This is a 64 bit capable processor that is also backwards compatible with 32 bit applications programmed for ARMv7 based processors. ARM does not go into great detail about the product other than to say that it is significantly faster than the previous Cortex-A15 and Cortex-A57.
The previous Cortex-A15 processors were announced several years back and made their first introduction in late 2013/early 2014. These were still 32 bit processors and while they had good performance for the time, they did not stack up well against the latest A8 SOCs from Apple. The A53 and A57 designs were also announced around two years ago. These are the first 64 bit designs from ARM and were meant to compete with the latest custom designs from Apple and Qualcomm’s upcoming 64 bit part. We are only now just seeing these parts make it into production, and even Qualcomm has licensed the A53 and A57 designs to ensure a faster time to market for this latest batch of next-generation mobile devices.
We can look back over the past five years and see that ARM is moving forward in announcing their parts and then having their partners ship them within a much shorter timespan than we were used to seeing. ARM is hoping to accelerate the introduction of its new parts within the next year.