ARM Releases Top Cortex Design to Partners
ARM has an interesting history of releasing products. The company was once in the shadowy background of the CPU world, but with the explosion of mobile devices and its relevance in that market, ARM has had to adjust how it presents its technologies to the public. For years ARM has announced products and technology, only to see them ship one to two years down the line. It seems that with the increased competition in the marketplace from Apple, Intel, NVIDIA, and Qualcomm, ARM is now pushing to license out its new IP in a way that will enable its partners to achieve a faster time to market.
The big news this time is the introduction of the Cortex-A72. This is a brand new design based on the ARMv8-A instruction set. It is a 64-bit capable processor that is also backwards compatible with 32-bit applications written for ARMv7-based processors. ARM does not go into great detail about the product, other than to say that it is significantly faster than the previous Cortex-A15 and Cortex-A57.
The previous Cortex-A15 processors were announced several years back and made their first introduction in late 2013/early 2014. These were still 32-bit processors, and while they had good performance for the time, they did not stack up well against the latest A8 SOCs from Apple. The A53 and A57 designs were also announced around two years ago. These are the first 64-bit designs from ARM and were meant to compete with the latest custom designs from Apple and Qualcomm’s upcoming 64-bit part. We are only now seeing these parts make it into production, and even Qualcomm has licensed the A53 and A57 designs to ensure a faster time to market for this latest batch of next-generation mobile devices.
We can look back over the past five years and see that ARM is moving forward in announcing their parts and then having their partners ship them within a much shorter timespan than we were used to seeing. ARM is hoping to accelerate the introduction of its new parts within the next year.
Subject: Processors, Mobile | October 29, 2014 - 04:30 AM | Scott Michaud
Tagged: arm, mali-T800, mali
While some mobile SoC manufacturers have created their own graphics architectures, others license from ARM (and some even have a mixture of each within their product stack). There does not seem to be a specific push with this generation, rather just increases in the areas that make the most sense. Some comments tout increased energy efficiency, others higher performance, and even API support got a boost to OpenGL ES 3.1, which brings compute shaders to mobile graphics applications (without invoking OpenCL, etc.).
Three models are in the Mali-T800 series: the T820, the T830, and the T860. As you climb the list, the products go from entry level to high-performance mobile. GPUs are often designed in modularized segments, which ARM calls cores. You see this frequently in desktop, discrete graphics cards, where an entire product stack contains a handful of actual designs, but products are made by disabling whole modules. The T820 and T830 can scale from one to four "core" modules, each core containing four actual "shader cores", while the T860 can scale from one to sixteen "core" modules, each core with 16 "shader cores". Again, "core modules" are groups that contain actual shader processors (and L2 cache, etc.). Cores in cores.
This is probably why NVIDIA calls them "Streaming Multiprocessors" that contain "CUDA Cores".
ARM does not (yet) provide an actual GFLOP rating for these processors, and the final figure is up to manufacturers to some extent. It is normally a matter of multiplying the clock frequency by the number of ops per cycle and by the number of shader units available. I tried, but the number I was getting did not match known values from previous generations, so I suspect my assumption for instructions per clock was off. Also, again, ARM considers their performance figures to be conservative. Manufacturers should have no problem exceeding these, effortlessly.
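For readers who want to try the back-of-the-envelope math themselves, here is a minimal Python sketch of that multiplication. The function name and every input value are hypothetical placeholders, not ARM or manufacturer figures; the real ops-per-cycle number for a given Mali core is exactly the part that is hard to pin down.

```python
def peak_gflops(clock_mhz: float, ops_per_cycle_per_unit: int, shader_units: int) -> float:
    """Theoretical peak throughput: clock frequency x ops per cycle x shader units.

    All three inputs are assumptions supplied by the caller; nothing here
    comes from an ARM datasheet.
    """
    ops_per_second = clock_mhz * 1e6 * ops_per_cycle_per_unit * shader_units
    return ops_per_second / 1e9  # convert ops/s to GFLOPS


# Hypothetical configuration, chosen only to illustrate the arithmetic.
estimate = peak_gflops(clock_mhz=600, ops_per_cycle_per_unit=8, shader_units=16)
print(f"Estimated peak: {estimate:.1f} GFLOPS")
```

Because the result scales linearly with each input, a wrong guess for ops per cycle throws the estimate off by the same factor, which is likely why a hand calculation can miss known values so badly.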
As for a release timeline? Because these architectures are designed for manufacturers to implement, you should start seeing them within devices hitting retail in late 2015, early 2016.
Subject: General Tech | October 31, 2013 - 03:48 PM | Ken Addison
Tagged: podcast, video, R9 290X, amd, radeon, 290x crossfire, 280x, r9 280x, gtx 770, gtx 780, arm, mali, Altera
PC Perspective Podcast #275 - 10/31/2013
Join us this week as we discuss the AMD Radeon R9 290X, ARMTechCon 2013, NVIDIA Pricedrops and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano
ARM is Serious About Graphics
Ask most computer users from 10 years ago who ARM is, and very few would give the correct answer. Some well informed people might mention “Intel” and “StrongARM” or “XScale”, but ARM remained a shadowy presence until we saw the rise of the smartphone. Since then, ARM has built up its brand, much to the chagrin of companies like Intel and AMD. Partners such as Samsung, Apple, Qualcomm, MediaTek, Rockchip, and NVIDIA have all worked with ARM to produce chips based on the ARMv7 architecture, with Apple being the first to release ARMv8 (64-bit) SOCs. Chips built on the multitude of ARM architectures are likely the most shipped in the world, ranging from very basic processors to the very latest Apple A7 SOC.
The ARMv7 and ARMv8 architectures are very power efficient, yet provide enough performance to handle the vast majority of tasks utilized on smartphones and tablets (as well as a handful of laptops). With the growth of visual computing, ARM also dedicated itself towards designing competent graphics portions of their chips. The Mali architecture is aimed at being an affordable option for those without access to their own graphics design groups (NVIDIA, Qualcomm), but competitive with others that are willing to license their IP out (Imagination Technologies).
ARM was in fact one of the first to license out the very latest graphics technology to partners in the form of the Mali-T600 series of products. These modules were among the first to support OpenGL ES 3.0 (compatible with 2.0 and 1.1) and DirectX 11. The T600 architecture is very comparable to Imagination Technologies’ Series 6 and the Qualcomm Adreno 300 series of products. Currently NVIDIA does not have a unified mobile architecture in production that supports OpenGL ES 3.0/DX11, but they are adapting the Kepler architecture to mobile and will be licensing it to interested parties. Qualcomm does not license out Adreno after buying that group from AMD (Adreno is an anagram of Radeon).
Subject: General Tech, Graphics Cards, Processors, Mobile | July 23, 2013 - 04:01 AM | Scott Michaud
Tagged: Samsung, mali, exynos
Exynos, the line of System on a Chip (SoC) products from Samsung, was notably lacking ARM Mali GPUs. This, apparently, raised concern over how viable Mali will continue to be and whether ARM will continue to lose designs to competitors such as Imagination Technologies.
Then Samsung announced, Monday evening for us North Americans, that the upcoming Exynos 5 Octa processor will embed six ARM Mali-T628 GPU cores. The T628 GPU cores support the OpenCL 1.1 and OpenGL ES 3.0 standards, which should allow applications to offload heavy batches of tasks, such as computational photography processing, with high efficiency and performance.
The Exynos 5 Octa contains four ARM Cortex-A15 cores at 1.8GHz, supported by four additional Cortex-A7 cores clocked at 1.3GHz. These processors are currently being sampled and should be produced in August.
Cortex-A12 fills a gap
Starting off Computex with an interesting announcement, ARM is talking about a new Cortex-A12 core that will attempt to address a performance gap in the SoC ecosystem between the A9 and A15. In the battle to compete with Krait and Intel's Silvermont architecture due in late 2013, ARM definitely needed to address the separation in performance and efficiency of the A9 and A15.
Source: ARM. Top to bottom: Cortex-A15, A12, A9 die size estimate
Targeted at mid-range devices that tend to be more cost (and thus die-size) limited, the Cortex-A12 will ship in late 2014 for product sampling and you should begin seeing hardware for sale in early 2015.
Architecturally, the changes for the upcoming A12 core revolve around a move to a fully out-of-order, dual-issue design, including the integrated floating point units. The execution units are faster and the memory design has been improved, but ARM wasn't ready to talk about specifics with me yet; expect that later in the year.
ARM claims this results in a 40% performance gain for the Cortex-A12 over the Cortex-A9, tested in SPECint. Because the product won't even start sampling until late in 2014, we have no way to verify this data yet or to evaluate efficiency claims. That time lag between announcement and release will also give competitors like Intel, AMD, and even Qualcomm time to answer back with potentially earlier availability.
ARM is a company that no longer needs much of an introduction. This was not always the case. ARM has certainly made a name for itself among PC, tablet, and handheld consumers. Its primary source of income is licensing CPU designs as well as its ISA. While names like the Cortex A9 and Cortex A15 are fairly well known, not as many people know about the graphics IP that ARM also licenses. Mali is the product name of the graphics IP, and it encompasses an entire range of features and performance levels that can be licensed by third parties.
I was able to get a block of time with Nizar Romdhane, Head of the Mali Ecosystem at ARM. I asked a few questions about Mali, ARM’s plans to address the increasingly important mobile graphics market, and how they will compete with Imagination Technologies, Intel, AMD, NVIDIA, and Qualcomm.
We would like to thank Nizar for his time, as well as Phil Hughes in facilitating this interview. Stay tuned as we are expecting to continue this series of interviews with other ARM employees in the near future.
Subject: Graphics Cards | February 25, 2013 - 08:01 PM | Josh Walrath
Tagged: nvidia, tegra, tegra 4, Tegra 4i, pixel, vertex, PowerVR, mali, adreno, geforce
When Tegra 4 was introduced at CES there was precious little information about the setup of the integrated GPU. We all knew that it would be a much more powerful GPU, but we were not entirely sure how it was set up. Now NVIDIA has finally released a slew of whitepapers that deal with not only the GPU portion of Tegra 4, but also some of the low level features of the Cortex A15 processor. For this little number I am just going over the graphics portion.
This robust looking fellow is the Tegra 4. Note the four pixel "pipelines" that can output 4 pixels per clock.
The graphics units on the Tegra 4 and Tegra 4i are identical in overall architecture, just that the 4i has fewer units and they are arranged slightly differently. Tegra 4 is comprised of 72 units, 48 of which are pixel shaders. These pixel shaders are VLIW based VEC4 units. The other 24 units are vertex shaders. The Tegra 4i is comprised of 60 units, 48 of which are pixel shaders and 12 are vertex shaders. We knew at CES that it was not a unified shader design, but we were still unsure of the overall makeup of the part. There are some very good reasons why NVIDIA went this route, as we will soon explore.
If NVIDIA were to transition to unified shaders, it would increase the overall complexity and power consumption of the part. Each shader unit would have to be able to handle both vertex and pixel workloads, which means more transistors are needed to handle it. Simpler shaders focused on either pixel or vertex operations are more efficient at what they do, both in terms of transistors used and power consumption. This is the same train of thought when using fixed function units vs. fully programmable. Yes, the programmability will give more flexibility, but the fixed function unit is again smaller, faster, and more efficient at its workload.
On the other hand here we have the Tegra 4i, which gives up half the pixel pipelines and vertex shaders, but keeps all 48 pixel shaders.
If there was one surprise here, it would be that the part is not completely OpenGL ES 3.0 compliant. It is lacking one major function that is required for certification: this particular part cannot render at FP32 levels. It has been quite a few years since we have heard of anything not being able to do FP32 in the PC market, but it is quite common to not support it in the power and transistor conscious mobile market. NVIDIA decided to go with an FP20 partial precision setup. They claim that for all intents and purposes, it will not be noticeable to the human eye. Colors will still be rendered properly and artifacts will be few and far between. Remember back in the day when NVIDIA supported FP16 and FP32 while they chastised ATI for choosing FP24 with the Radeon 9700 Pro? Times have changed a bit. Going with FP20 is again a power and transistor saving decision. It still supports DX9.3 and OpenGL ES 2.0, but it is not fully OpenGL ES 3.0 compliant. This is not to say that it does not support any 3.0 features. It in fact does support quite a bit of the functionality required by 3.0, but it is still not fully compliant.
This will be an interesting decision to watch over the next few years. The latest Mali 600 series, PowerVR 6 series, and Adreno 300 series solutions all support OpenGL ES 3.0. Tegra 4 is the odd man out. While most developers have no plans to go to 3.0 anytime in the near future, it will eventually be implemented in software. When that point comes, then the Tegra 4 based devices will be left a bit behind. By then NVIDIA will have a fully compliant solution, but that is little comfort for those buying phones and tablets in the near future that will be saddled with non-compliance once applications hit.
The list of OpenGL ES 3.0 features that are actually present in Tegra 4. The lack of FP32 relegates it to 2.0 compliant status.
The core speed is increased to 672 MHz, well up from the 520 MHz in Tegra 3 (8 pixel and 4 vertex shaders). The GPU can output four pixels per clock, double that of Tegra 3. Once we consider the extra clock speed and pixel pipelines, the Tegra 4 increases pixel fillrate by 2.6x. Pixel and vertex shading will get a huge boost in performance due to the dramatic increase of units and clockspeed. Overall this is a very significant improvement over the previous generation of parts.
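The 2.6x figure can be checked with a quick calculation. The sketch below uses the clock speeds quoted above and takes Tegra 3 at two pixels per clock, which is inferred from the statement that Tegra 4's four pixels per clock is double that of Tegra 3. The function and variable names are illustrative only.

```python
def pixel_fillrate_mpix(clock_mhz: float, pixels_per_clock: int) -> float:
    """Peak pixel fillrate in megapixels per second (clock x pixels per clock)."""
    return clock_mhz * pixels_per_clock


# Tegra 3: 520 MHz, 2 pixels per clock (inferred: half of Tegra 4's four).
tegra3 = pixel_fillrate_mpix(520, 2)
# Tegra 4: 672 MHz, 4 pixels per clock.
tegra4 = pixel_fillrate_mpix(672, 4)

ratio = tegra4 / tegra3
print(f"Tegra 3: {tegra3:.0f} Mpix/s, Tegra 4: {tegra4:.0f} Mpix/s, ratio: {ratio:.2f}x")
```

The clock bump alone contributes roughly 1.29x and the doubled pixel output contributes 2x, which together land just under 2.6x.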
The Tegra 4 can output to a 4K display natively, and that is not the only new feature for this part. Here is a quick list:
- 2x/4x Multisample Antialiasing (MSAA)
- 24-bit Z (versus 20-bit Z in the Tegra 3 processor) and 8-bit Stencil
- 4K x 4K texture size incl. Non-Power of Two textures (versus 2K x 2K in the Tegra 3 processor) – for higher quality textures, and easier porting of full resolution textures from console and PC games to the Tegra 4 processor. Good for high resolution displays.
- 16:1 Depth (Z) Compression and 4:1 Color Compression (versus none in the Tegra 3 processor) – this is lossless compression and is useful for reducing bandwidth to/from the frame buffer, and especially effective in antialiasing when processing multiple samples per pixel
- Percentage Closer Filtering for Shadow Texture Mapping and Soft Shadows
- Texture border color to eliminate coarse MIP-level bleeding
- sRGB for Texture Filtering, Render Surfaces and MSAA down-filter
Note: CSAA is no longer supported in Tegra 4 processors.
This is a big generational jump, and now we only have to see how it performs against the other top end parts from Qualcomm, Samsung, and others utilizing IP from Imagination and ARM.
Subject: General Tech, Mobile | October 22, 2012 - 02:00 PM | Jeremy Hellstrom
Tagged: arm, qualcomm, marketshare, SoC, imagination, Vivante, jon peddie, mali
ARM has made a serious impact on the mobile market with its Mali GPUs, with Jon Peddie Research reporting that the company has doubled its market share over the past year. That number is even more impressive when you pair it with the 91.3% growth in the mobile GPU market. Another player, Vivante, quadrupled its share of the market, and while its products are found primarily in Asia, you may recognize the company as a member of the HSA. Their success comes at a cost to Imagination and Qualcomm, both of whom have seen their market shares drop. NVIDIA currently makes up 2.5% of the GPU market for tablets and smartphones, which is not too bad when you consider that the other four main players all license their processors out while NVIDIA remains the sole provider of its Tegra SoCs. Get more numbers at The Inquirer.
"CHIP DESIGNERS ARM and Vivante have achieved significant market share gains in the system-on-chip (SoC) GPU market while Imagination and Qualcomm have seen their market shares fall."
Here is some more Tech News from around the web:
- AMD Q3 2012 analyst call talks IP strategy @ SemiAccurate
- Skype details Windows 8 app ahead of 26 October release @ The Inquirer
- Nanya Technology, Inotera to receive new financing to move to 30nm process, say sources @ DigiTimes