ARM Announces Mali-T800 Series of Mobile GPUs

Subject: Processors, Mobile | October 29, 2014 - 04:30 AM |
Tagged: arm, mali-T800, mali

While some mobile SoC manufacturers have created their own graphics architectures, others license from ARM (and some even have a mixture of each within their product stack). There does not seem to be a specific push with this generation, rather just increases in the areas that make the most sense. Some comments tout increased energy efficiency, others higher performance, and even API support got a boost to OpenGL ES 3.1, which brings compute shaders to mobile graphics applications (without invoking OpenCL, etc.).

arm-mali-t860-chip-diagram-LG.png

Three models are in the Mali-T800 series: the T820, the T830, and the T860. As you climb the list, the products go from entry level to high-performance mobile. GPUs are often designed in modularized segments, which ARM calls cores. You see this frequently in desktop, discrete graphics cards, where an entire product stack contains a handful of actual designs and individual products are made by disabling whole modules. The T820 and T830 can scale from one to four "core" modules, each core containing four actual "shader cores", while the T860 can scale from one to sixteen "core" modules, each core with 16 "shader cores". Again, "core modules" are groups that contain actual shader processors (and L2 cache, etc.). Cores in cores.

This is probably why NVIDIA calls them "Streaming Multiprocessors" that contain "CUDA Cores".

arm-mali-t830-chip-diagram-LG.png

ARM does not (yet) provide an actual GFLOP rating for these processors, and the final figure is up to manufacturers to some extent. It is normally a matter of multiplying the clock frequency by the number of ops per cycle and by the number of shader units available. I tried, but the number I got did not match known values from previous generations, so my guess at instructions per clock was probably off. Also, again, ARM considers their performance figures to be conservative. Manufacturers should have no problem exceeding them.
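For reference, the multiplication described above can be sketched in a few lines of Python. The function name and every input value here are placeholders chosen for illustration; they are not ARM figures, and the ops-per-cycle value in particular is exactly the number that could not be pinned down.

```python
def peak_gflops(clock_mhz, flops_per_unit_per_cycle, shader_units):
    """Theoretical peak throughput: clock x ops per cycle x shader units.

    clock_mhz is in MHz, so the product is in MFLOPS; divide by 1000
    to express it in GFLOPS.
    """
    return clock_mhz * flops_per_unit_per_cycle * shader_units / 1000.0


# Hypothetical inputs, NOT published ARM specifications:
# a 600 MHz clock, 2 floating-point ops per unit per cycle, 16 shader units.
example = peak_gflops(clock_mhz=600, flops_per_unit_per_cycle=2, shader_units=16)
print(f"{example:.1f} GFLOPS")  # prints "19.2 GFLOPS"
```

Because the three inputs multiply, an error in any one of them (such as the assumed ops per cycle) scales the result by the same factor.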

As for a release timeline? Because these architectures are designed for manufacturers to implement, you should start seeing them within devices hitting retail in late 2015, early 2016.

Source: ARM
Author:
Subject: Processors
Manufacturer: ARM

Cortex-A12 Optimized!

ARM is an interesting little company.  Years ago people would have had no idea who you were talking about, but now there is a much greater appreciation for the company.  Their PR group is really starting to get the hang of getting their name out.  One thing that ARM does that is significantly different from what other companies do is announce products far in advance of when they will actually see the light of day.  Today they are announcing the Cortex-A17 IP that will ship in 2015.
 
arm_01.jpg
 
ARM really does not have much of a choice in how they announce their technology, primarily because they rely on 3rd parties to actually ship products.  ARM licenses their IP to guys like Samsung, Qualcomm, TI, NVIDIA, etc. and then waits for them to actually build and ship product.  I guess part of pre-announcing these bits of IP provides a greater push for their partners to actually license that specific IP due to end users and handset makers showing interest?  Whatever the case, it is interesting to see where ARM is heading with their technology.
 
The Cortex-A17 can be viewed as a more supercharged version of the Cortex-A12, but with features missing from that particular product.  The big advancement over the A12 is that the A17 can be utilized in a big.LITTLE configuration with Cortex-A7 IP.  The A17 is more power optimized as well so it can go into a sleep state faster than the A12, and it also features more memory controller tweaks to improve performance while again lowering power consumption.
 
arm_02.jpg
 
In terms of overall performance it gets a pretty big boost as compared to the very latest Cortex-A9r4 designs (such as the Tegra 4i).  Numbers bandied about by ARM show that the A17 is around 60% faster than the A9, and around 40% faster than the A12.  These numbers may or may not jibe with real-world experience due to differences in handset and tablet designs, but theoretically speaking they look to be in the ballpark.  The A17 should be close in overall performance to A15 based SOCs.  A15s are shipping now, but they are not as power efficient as what ARM is promising with the A17.
 

Podcast #275 - AMD Radeon R9 290X, ARMTechCon 2013, NVIDIA Pricedrops and more!

Subject: General Tech | October 31, 2013 - 03:48 PM |
Tagged: podcast, video, R9 290X, amd, radeon, 290x crossfire, 280x, r9 280x, gtx 770, gtx 780, arm, mali, Altera

PC Perspective Podcast #275 - 10/31/2013

Join us this week as we discuss the AMD Radeon R9 290X, ARMTechCon 2013, NVIDIA Pricedrops and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

 
Program length: 1:22:37
  1. Week in Review:
    1. 0:55:40
  2. 0:59:20 This episode is brought to you by Carbonite.com! Use offer code PC for two free months!
      1. Intel Series 9 Chipset
  3. Hardware/Software Picks of the Week:
  4. podcast@pcper.com
  5. Closing/outro

 

Author:
Manufacturer: ARM

ARM is Serious About Graphics

Ask most computer users from 10 years ago who ARM is, and very few would give the correct answer.  Some well informed people might mention “Intel” and “StrongARM” or “XScale”, but ARM remained a shadowy presence until we saw the rise of the smartphone.  Since then, ARM has built up their brand, much to the chagrin of companies like Intel and AMD.  Partners such as Samsung, Apple, Qualcomm, MediaTek, Rockchip, and NVIDIA have all worked with ARM to produce chips based on the ARMv7 architecture, with Apple being the first to release ARMv8 (64-bit) SOCs.  Chips built on the multitude of ARM architectures are likely the most shipped in the world, ranging from very basic processors to the very latest Apple A7 SOC.

t700_01.jpg

The ARMv7 and ARMv8 architectures are very power efficient, yet provide enough performance to handle the vast majority of tasks utilized on smartphones and tablets (as well as a handful of laptops).  With the growth of visual computing, ARM also dedicated itself towards designing competent graphics portions of their chips.  The Mali architecture is aimed at being an affordable option for those without access to their own graphics design groups (NVIDIA, Qualcomm), but competitive with others that are willing to license their IP out (Imagination Technologies).

ARM was in fact one of the first to license out the very latest graphics technology to partners in the form of the Mali-T600 series of products.  These modules were among the first to support OpenGL ES 3.0 (compatible with 2.0 and 1.1) and DirectX 11.  The T600 architecture is very comparable to Imagination Technologies’ Series 6 and the Qualcomm Adreno 300 series of products.  Currently NVIDIA does not have a unified mobile architecture in production that supports OpenGL ES 3.0/DX11, but they are adapting the Kepler architecture to mobile and will be licensing it to interested parties.  Qualcomm does not license out Adreno after buying that group from AMD (Adreno is an anagram of Radeon).

Click to read the entire article here!

Samsung Exynos 5 Octa Returns to ARM Mali GPUs

Subject: General Tech, Graphics Cards, Processors, Mobile | July 23, 2013 - 04:01 AM |
Tagged: Samsung, mali, exynos

Exynos, the line of System on a Chip (SoC) products from Samsung, had notably gone without ARM Mali GPUs. This, apparently, raised concern over how viable Mali will continue to be and whether ARM will continue to lose designs to competitors such as Imagination Technologies.

ARM-Mali-T628.jpg

Then Samsung announced, Monday evening for us North Americans, that the upcoming Exynos 5 Octa processor will embed six ARM Mali-T628 GPU cores. The T628 GPU cores support the OpenCL 1.1 and OpenGL ES 3.0 standards, which should allow applications to offload heavy batches of tasks, such as computational photography processing, with high efficiency and performance.

The Exynos 5 Octa contains four ARM Cortex-A15 cores at 1.8GHz, supported by four additional Cortex-A7 cores clocked at 1.3GHz. These processors are currently being sampled and should be produced in August.

Read on for the press blast from Samsung PR.

Author:
Subject: Processors, Mobile
Manufacturer: ARM
Tagged: t622, mali, cortex, arm, A9, A15, a12

Cortex-A12 fills a gap

Starting off Computex with an interesting announcement, ARM is talking about a new Cortex-A12 core that will attempt to address a performance gap in the SoC ecosystem between the A9 and A15.  In the battle to compete with Krait and Intel's Silvermont architecture due in late 2013, ARM definitely needed to address the separation in performance and efficiency of the A9 and A15. 

arm1.jpg

Source: ARM.  Top to bottom: Cortex-A15, A12, A9 die size estimate

Targeted at mid-range devices that tend to be more cost (and thus die-size) limited, the Cortex-A12 will ship in late 2014 for product sampling and you should begin seeing hardware for sale in early 2015.

arm3.jpg

Architecturally, the changes for the upcoming A12 core revolve around a move to a fully out-of-order, dual-issue design, including the integrated floating point units.  The execution units are faster and the memory design has been improved, but ARM wasn't ready to talk about specifics with me yet; expect that later in the year. 

arm6.jpg

ARM claims this results in a 40% performance gain for the Cortex-A12 over the Cortex-A9, tested in SPECint.  Because product won't even start sampling until late in 2014 we have no way to verify this data yet or to evaluate efficiency claims.  That time lag between announcement and release will also give competitors like Intel, AMD and even Qualcomm time to answer back with potential earlier availability.

Continue reading our overview of the newly announced ARM Cortex-A12 and Mali-T622!!

Author:
Subject: Mobile
Manufacturer: ARM

 

ARM is a company that no longer needs much of an introduction.  This was not always the case.  ARM has certainly made a name for themselves among PC, tablet, and handheld consumers.  Their primary source of income is licensing CPU designs as well as their ISA.  While names like the Cortex A9 and Cortex A15 are fairly well known, not as many people know about the graphics IP that ARM also licenses.  Mali is the product name of the graphics IP, and it encompasses an entire range of features and performance that can be licensed by other 3rd parties.

I was able to get a block of time with Nizar Romdhane, Head of the Mali Ecosystem at ARM.  I was able to ask a few questions about Mali, ARM’s plans to address the increasingly important mobile graphics market, and how they will compete with Imagination Technologies, Intel, AMD, NVIDIA, and Qualcomm.

 

We would like to thank Nizar for his time, as well as Phil Hughes for facilitating this interview.  Stay tuned as we are expecting to continue this series of interviews with other ARM employees in the near future.

NVIDIA Details Tegra 4 and Tegra 4i Graphics

Subject: Graphics Cards | February 25, 2013 - 08:01 PM |
Tagged: nvidia, tegra, tegra 4, Tegra 4i, pixel, vertex, PowerVR, mali, adreno, geforce

 

When Tegra 4 was introduced at CES there was precious little information about the setup of the integrated GPU.  We all knew that it would be a much more powerful GPU, but we were not entirely sure how it was set up.  Now NVIDIA has finally released a slew of whitepapers that deal with not only the GPU portion of Tegra 4, but also some of the low level features of the Cortex A15 processor.  For this little number I am just going over the graphics portion.

layout.jpg

This robust looking fellow is the Tegra 4.  Note the four pixel "pipelines" that can output 4 pixels per clock.

The graphics units on the Tegra 4 and Tegra 4i are identical in overall architecture, just that the 4i has fewer units and they are arranged slightly differently.  Tegra 4 is comprised of 72 units, 48 of which are pixel shaders.  These pixel shaders are VLIW based VEC4 units.  The other 24 units are vertex shaders.  The Tegra 4i is comprised of 60 units, 48 of which are pixel shaders and 12 are vertex shaders.  We knew at CES that it was not a unified shader design, but we were still unsure of the overall makeup of the part.  There are some very good reasons why NVIDIA went this route, as we will soon explore.

If NVIDIA were to transition to unified shaders, it would increase the overall complexity and power consumption of the part.  Each shader unit would have to be able to handle both vertex and pixel workloads, which means more transistors are needed to handle it.  Simpler shaders focused on either pixel or vertex operations are more efficient at what they do, both in terms of transistors used and power consumption.  This is the same train of thought when using fixed function units vs. fully programmable.  Yes, the programmability will give more flexibility, but the fixed function unit is again smaller, faster, and more efficient at its workload.

layout_4i.jpg

On the other hand here we have the Tegra 4i, which gives up half the pixel pipelines and vertex shaders, but keeps all 48 pixel shaders.

If there was one surprise here, it would be that the part is not completely OpenGL ES 3.0 compliant.  It is lacking one major function that is required for certification.  This particular part cannot render at FP32 levels.  It has been quite a few years since we have heard of anything not being able to do FP32 in the PC market, but it is quite common to not support it in the power and transistor conscious mobile market.  NVIDIA decided to go with an FP20 partial precision setup.  They claim that for all intents and purposes, the difference will not be noticeable to the human eye.  Colors will still be rendered properly and artifacts will be few and far between.  Remember back in the day when NVIDIA supported FP16 and FP32 while they chastised ATI for choosing FP24 with the Radeon 9700 Pro?  Times have changed a bit.  Going with FP20 is again a power and transistor saving decision.  It still supports DX9.3 and OpenGL ES 2.0, but it is not fully OpenGL ES 3.0 compliant.  This is not to say that it does not support any 3.0 features.  It in fact does support quite a bit of the functionality required by 3.0, but it is still not fully compliant.

This will be an interesting decision to watch over the next few years.  The latest Mali 600 series, PowerVR 6 series, and Adreno 300 series solutions all support OpenGL ES 3.0.  Tegra 4 is the odd man out.  While most developers have no plans to go to 3.0 anytime in the near future, it will eventually be implemented in software.  When that point comes, then the Tegra 4 based devices will be left a bit behind.  By then NVIDIA will have a fully compliant solution, but that is little comfort for those buying phones and tablets in the near future that will be saddled with non-compliance once applications hit.

ogles_feat.jpg

The list of OpenGL ES 3.0 features that are actually present in Tegra 4, but the lack of FP32 relegates it to 2.0 compliant status.

The core speed is increased to 672 MHz, well up from the 520 MHz in Tegra 3 (8 pixel and 4 vertex shaders).  The GPU can output four pixels per clock, double that of Tegra 3.  Once we consider the extra clock speed and pixel pipelines, the Tegra 4 increases pixel fillrate by 2.6x.  Pixel and vertex shading will get a huge boost in performance due to the dramatic increase of units and clockspeed.  Overall this is a very significant improvement over the previous generation of parts.
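The 2.6x fillrate figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch using the clock speeds and pixels-per-clock values quoted above (Tegra 3 at two pixels per clock follows from Tegra 4's four being "double"); the function name is made up for illustration, and real-world fillrate also depends on memory bandwidth and other factors this ignores.

```python
def pixel_fill_rate_mpix(clock_mhz, pixels_per_clock):
    """Theoretical peak pixel fillrate in megapixels per second."""
    return clock_mhz * pixels_per_clock


# Figures quoted in the text above.
tegra3_fill = pixel_fill_rate_mpix(clock_mhz=520, pixels_per_clock=2)
tegra4_fill = pixel_fill_rate_mpix(clock_mhz=672, pixels_per_clock=4)

ratio = tegra4_fill / tegra3_fill
print(f"Tegra 3: {tegra3_fill} Mpix/s")
print(f"Tegra 4: {tegra4_fill} Mpix/s")
print(f"Increase: {ratio:.1f}x")  # prints "Increase: 2.6x"
```

The doubling of pixels per clock accounts for 2x of the gain, and the clock increase from 520 MHz to 672 MHz supplies the remaining factor of roughly 1.3.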

The Tegra 4 can output to a 4K display natively, and that is not the only new feature for this part.  Here is a quick list:

2x/4x Multisample Antialiasing (MSAA)

24-bit Z (versus 20-bit Z in the Tegra 3 processor) and 8-bit Stencil

4K x 4K texture size incl. Non-Power of Two textures (versus 2K x 2K in the Tegra 3 processor) – for higher quality textures, and easier to port full resolution textures from console and PC games to Tegra 4 processor.  Good for high resolution displays.

16:1 Depth (Z) Compression and 4:1 Color Compression (versus none in Tegra 3 processor) – this is lossless compression and is useful for reducing bandwidth to/from the frame buffer, and especially effective in antialiasing processing when processing multiple samples per pixel

Depth Textures

Percentage Closer Filtering for Shadow Texture Mapping and Soft Shadows

Texture border color to eliminate coarse MIP-level bleeding

sRGB for Texture Filtering, Render Surfaces and MSAA down-filter

1 - CSAA is no longer supported in Tegra 4 processors

This is a big generational jump, and now we only have to see how it performs against the other top end parts from Qualcomm, Samsung, and others utilizing IP from Imagination and ARM.

Source: NVIDIA

ARM snaps graphics marketshare from the dragon

Subject: General Tech, Mobile | October 22, 2012 - 02:00 PM |
Tagged: arm, qualcomm, marketshare, SoC, imagination, Vivante, jon peddie, mali

ARM has made a serious impact on the mobile market with their Mali SoC GPUs, with Jon Peddie Research reporting that they have doubled their market share over the past year.  That number is even more impressive when you pair it with the 91.3% growth in the mobile GPU market.  Another player, Vivante, quadrupled their share of the market, and while their products are found primarily in Asia you may recognize them as a member of the HSA.  Their success comes at a cost to Imagination and Qualcomm, both of whom have seen their market shares drop.  NVIDIA currently makes up 2.5% of the GPU market for tablets and smartphones, which is not too bad when you consider that the other four main players all license their processors out while NVIDIA remains the sole provider of its Tegra SoCs.  Get more numbers at The Inquirer.
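To see why a doubled share is more impressive in a growing market, here is a rough sketch of how the two numbers compound. It assumes the 91.3% growth and the doubling of share cover the same one-year period and are both measured in units shipped, which the summary above does not state explicitly; the function name is made up for illustration.

```python
def shipment_multiplier(share_multiplier, market_growth_pct):
    """How much a vendor's unit shipments grow when its share of the market
    is multiplied by share_multiplier while the whole market grows by
    market_growth_pct percent."""
    return share_multiplier * (1 + market_growth_pct / 100)


# Share doubled (2x) while the overall market grew 91.3%.
arm_multiplier = shipment_multiplier(share_multiplier=2.0, market_growth_pct=91.3)
print(f"Implied shipment growth: {arm_multiplier:.2f}x")  # prints "Implied shipment growth: 3.83x"
```

Under those assumptions, doubling share in a market that nearly doubled works out to almost four times as many Mali GPUs shipped as the year before.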

ARM-Mali-T658-graphics.jpg

"CHIP DESIGNERS ARM and Vivante have achieved significant market share gains in the system-on-chip (SoC) GPU market while Imagination and Qualcomm have seen their market shares fall."


Source: The Inquirer

AFDS 2012: ARM once again on stage with AMD - partnership incoming?

Subject: Mobile, Shows and Expos | June 11, 2012 - 12:01 AM |
Tagged: mali, arm, amd, AFDS

In a blog post over at arm.com, ARM Fellow Jem Davies has made a point to let us all know that he is going to be attending the AMD Fusion Developer Summit yet again, but this time with something more concrete to discuss.  In a very self-aware statement, Davies writes in his post that "my appearance last year generated a lot of speculation about the nature of the relationship between ARM and AMD." 

Indeed it did.

From Davies' post:

This year, we have a great deal to discuss. ARM is all about low power and many people in the industry now realize that GPUs have a central role to play in providing highly energy-efficient computing. It’s an exciting future that can grow the ecosystem that surrounds computing. ARM’s unique portfolio of CPU, GPU, interconnect and physical IP puts us at the forefront of one of the most important technological changes in a long time. Reflecting on that and some of those changes, I will be making an announcement at the show.

Emphasis above is ours.

Also worth noting is that Jem Davies does not have his own session at AFDS; rather, we can expect to see him come out on stage during another keynote, likely Phil Rogers' or Mark Papermaster's. 

amdtablet.jpg

AMD wants into the tablet market.  ARM could accelerate that process.

Exactly WHAT the ARM/AMD announcement might be obviously isn't known by many yet, but we have speculated many times that an AMD-built, ARM architecture processor, with Radeon-based graphics technology and ARM low-power CPU cores, could help AMD enter the world of ultra-low-power SoCs very quickly.  Markets like the pending onslaught of Windows 8 RT tablets and clamshells have NVIDIA foaming at the mouth, and AMD would be remiss to not attempt to tackle the same markets and one-up Intel at the same time.

It should be an exciting week!  Keep checking pcper.com and our AFDS site tag for all the latest news including keynote live blogs!

Source: ARM