All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
The AMD Kaveri Architecture
Kaveri: AMD’s New Flagship Processor
How big is Kaveri? We already know the die size of it, but what kind of impact will it have on the marketplace? Has AMD chosen the right path by focusing on power consumption and HSA? Starting out an article with three questions in a row is a questionable tactic for any writer, but these are the things that first come to mind when considering a product the likes of Kaveri. I am hoping we can answer a few of these questions by the end of this article, but alas it seems as though the market will have the final say as to how successful this new architecture is.
AMD has been pursuing the “Future is Fusion” line for several years, but it can be argued that Kaveri is truly the first “Fusion” product that completes the overall vision for where AMD wants to go. The previous several generations of APUs were initially not all that integrated in a functional sense, but the complexity and completeness of that integration has been improved upon with each iteration. Kaveri takes this integration to the next step, and one which fulfills the promise of a truly heterogeneous computing solution. While AMD has the hardware available, we have yet to see if the software companies are willing to leverage the compute power afforded by a robust and programmable graphics unit powered by AMD’s GCN architecture.
(Editor's Note: The following two pages were written by our own Josh Walrath, dicsussing the technology and architecture of AMD Kaveri. Testing and performance analysis by Ryan Shrout starts on page 3.)
The first step in understanding Kaveri is taking a look at the process technology that AMD is using for this particular product. Since AMD divested itself of their manufacturing arm, they have had to rely on GLOBALFOUNDRIES to produce nearly all of their current CPUs and APUs. Bulldozer, Piledriver, Llano, Trinity, and Richland based parts were all produced on GF’s 32 nm PD-SOI process. The lower power APUs such as Brazos and Kabini have been produced by TSMC on their 40 nm and 28 nm processes respectively.
Kaveri will take a slightly different approach here. It will be produced by GLOBALFOUNDRIES, but it will forego the SOI and utilize a bulk silicon process. 28 nm HKMG is very common around the industry, but few pure play foundries were willing to tailor their process to the direct needs of AMD and the Kaveri product. GF was able to do such a thing. APUs are a different kind of animal when it comes to fabrication, primarily because the two disparate units require different characteristics to perform at the highest efficiency. As such, compromises had to be made.
More Details from Lisa Su
The executives at AMD like to break their own NDAs. Then again, they are the ones typically setting these NDA dates, so it isn’t a big deal. It is no secret that Kaveri has been in the pipeline for some time. We knew a lot of the basic details of the product, but there were certainly things that were missing. Lisu Su went up onstage and shared a few new details with us.
Kaveri will be made up of 4 “Steamroller” cores, which are enhanced versions of the previous Bulldozer/Trinity/Vishera families of products. Nearly everything in the processor is doubled. It now has dual decode, more cache, larger TLBs, and a host of other smaller features that all add up to greater single thread performance and better multi-threaded handling and performance. Integer performance will be improved, and the FPU/MMX/SSE unit now features 2 x 128 bit FMAC units which can “fuse” and support AVX 256.
However, there was no mention of the fabled 6 core Kaveri. At this time, it is unlikely that particular product will be launched anytime soon.
ARM is Serious About Graphics
Ask most computer users from 10 years ago who ARM is, and very few would give the correct answer. Some well informed people might mention “Intel” and “StrongARM” or “XScale”, but ARM remained a shadowy presence until we saw the rise of the Smartphone. Since then, ARM has built up their brand, much to the chagrin of companies like Intel and AMD. Partners such as Samsung, Apple, Qualcomm, MediaTek, Rockchip, and NVIDIA have all worked with ARM to produce chips based on the ARMv7 architecture, with Apple being the first to release the first ARMv8 (64 bit) SOCs. The multitude of ARM architectures are likely the most shipped chips in the world, going from very basic processors to the very latest Apple A7 SOC.
The ARMv7 and ARMv8 architectures are very power efficient, yet provide enough performance to handle the vast majority of tasks utilized on smartphones and tablets (as well as a handful of laptops). With the growth of visual computing, ARM also dedicated itself towards designing competent graphics portions of their chips. The Mali architecture is aimed at being an affordable option for those without access to their own graphics design groups (NVIDIA, Qualcomm), but competitive with others that are willing to license their IP out (Imagination Technologies).
ARM was in fact one of the first to license out the very latest graphics technology to partners in the form of the Mali-T600 series of products. These modules were among the first to support OpenGL ES 3.0 (compatible with 2.0 and 1.1) and DirectX 11. The T600 architecture is very comparable to Imagination Technologies’ Series 6 and the Qualcomm Adreno 300 series of products. Currently NVIDIA does not have a unified mobile architecture in production that supports OpenGL ES 3.0/DX11, but they are adapting the Kepler architecture to mobile and will be licensing it to interested parties. Qualcomm does not license out Adreno after buying that group from AMD (Adreno is an anagram of Radeon).
Another Next Unit of Computing
Just about a year ago Intel released a new product called the Next Unit of Computing, or NUC for short. The idea was to allow Intel's board and design teams to bring the efficient performance of the ultra low voltage processors to a desktop, and creative, form factor. By taking what is essentially Ultrabook hardware and putting it in a 4-in by 4-in design Intel is attempting to rethink what the "desktop" computer is and how the industry develops for it.
We reviewed the first NUC last year, based on the Intel Ivy Bridge processor and took away a surprising amount of interest in the platform. It was (and is) a bit more expensive than many consumers are going to be willing to spend on such a "small" physical device but the performance and feature set is compelling.
This time around Intel has updated the 4x4 enclosure a bit and upgrade the hardware from Ivy Bridge to Haswell. That alone should result in a modest increase in CPU performance with quite a bit of increase in the integrated GPU performance courtesy of the Intel HD Graphics 5000. Other changes are on the table to; let's take a look.
The Intel D54250WYK NUC is a bare bones system that will run you about $360. You'll need to buy system memory and an mSATA SSD for storage (wireless is optional) to complete the build.
A Whole New Atom Family
This past spring I spent some time with Intel at its offices in Santa Clara to learn about a brand new architecture called Silvermont. Built for and targeted at low power platforms like tablets and smartphones, Silvermont was not simply another refresh of the aging Atom processors that were all based on Pentium cores from years ago; instead Silvermont was built from the ground up for low power consumption and high efficiency to compete against the juggernaut that is ARM and its partners. My initial preview of the Silvermont architecture had plenty of detail about the change to an out-of-order architecture, the dual-core modules that comprise it and the power optimizations included.
Today, during the annual Intel Developer Forum held in San Francisco, we are finally able to reveal the remaining details about the new Atom processors based on Silvermont, code named Bay Trail. Not only do we have new information about the designs, but we were able to get our hands on some reference tablets integrating Bay Trail and the new Atom Z3000 series of SoCs to benchmark and compare to offerings from Qualcomm, NVIDIA and AMD.
A Whole New Atom Family
It should be surprise to anyone that the name “Intel Atom Processor” has had a stigma attached to it almost since its initial release during the netbook craze. It was known for being slow and hastily put together though it was still a very successful product in terms of sales. With each successive release and update, Diamondville to Pineview to Cedarview, Atom was improved but only marginally so. Even with Medfield and Clover Trail the products were based around that legacy infrastructure and it showed. Tablets and systems based on Clover Trail saw only moderate success and lukewarm reviews.
With Silvermont the Atom brand gets a second chance. Some may consider it a fifth or sixth chance, but Intel is sticking with the name. Silvermont as an architecture is incredibly flexible and will find its way into several Intel products like Avoton, Bay Trail and Merrifield and in segments from the micro-server to smartphones to convertible tablets. Not only that, but Intel is aware that Windows isn’t the only game out there anymore and the company will support the architecture across Linux, Android and Windows environments.
Atom has been in tablets for some time now, starting in September of last year with Clover Trail deigns being announced during IDF. In February we saw the initial Android-based options also filter out, again based on Clover Trail. They were okay, but really only stop-gaps to prove that Intel was serious about the space. The real test will be this holiday season with Bay Trail at the helm.
While we always knew these Bay Trail platforms were going be branded as Atom we now have the full details on the numbering scheme and productization of the architecture. The Atom Z3700 series will consist of quad-core SoCs with Intel HD graphics (the same design as the Core processor series though with fewer compute units) that will support Windows and Android operating systems. The Atom Z3600 will be dual-core processors, still with Intel HD graphics, targeted only at the Android market.
Very Minor Changes
November 14th, 2011 - that is the date that Intel introduced the LGA 2011 socket and the Sandy Bridge-E processor. Intel continued their pattern of modifying their mainstream architecture, Sandy Bridge at the time, into a higher performance (and higher priced) enthusiast class. The new socket differentiated these components into their own category for workstation users and others who demand top performance. Today Intel officially unveils the Ivy Bridge-E platform with essentially the same mindset.
The top end offering under the IVB-E name is the Core i7-4960X, a six-core, HyperThreaded processor with Turbo Boost technology and up to 15MB of L3 cache. Sound familiar? It should. There is really very little different about the new 4960X when compared to the Sandy Bridge-E Core i7-3960X released in 2011. In fact, the new processors use the exact same socket and will work on the same X79 motherboards already on the market. (Pending, of course, on whether your manufacturer has updated the UEFI/Firmware accordingly.)
The Ivy Bridge-E Platform
Even though the platform and features are nearly identical between Sandy Bridge-E and Ivy Bridge-E there are some readers that might need a refresher or maybe had never really investigated Socket 2011 products before today. I'll step through the major building blocks of the new Core i7-4960X just in case.
Battle of the IGPs
Our long journey with Frame Rating, a new capture-based analysis tool to measure graphics performance of PCs and GPUs, began almost two years ago as a way to properly evaluate the real-world experiences for gamers. What started as a project attempting to learn about multi-GPU complications has really become a new standard in graphics evaluation and I truly believe it will play a crucial role going forward in GPU and game testing.
Today we use these Frame Rating methods and tools, which are elaborately detailed in our Frame Rating Dissected article, and apply them to a completely new market: notebooks. Even though Frame Rating was meant for high performance discrete desktop GPUs, the theory and science behind the entire process is completely applicable to notebook graphics and even on the integrated graphics solutions on Haswell processors and Richland APUs. It also is able to measure performance of discrete/integrated graphics combos from NVIDIA and AMD in a unique way that has already found some interesting results.
Battle of the IGPs
Even though neither side wants us to call it this, we are testing integrated graphics today. With the release of Intel’s Haswell processor (the Core i7/i5/i3 4000) the company has upgraded the graphics noticeably on several of their mobile and desktop products. In my first review of the Core i7-4770K, a desktop LGA1150 part, the integrated graphics now known as the HD 4600 were only slightly faster than the graphics of the previous generation Ivy Bridge and Sandy Bridge. Even though we had all the technical details of the HD 5000 and Iris / Iris Pro graphics options, no desktop parts actually utilize them so we had to wait for some more hardware to show up.
When Apple held a press conference and announced new MacBook Air machines that used Intel’s Haswell architecture, I knew I could count on Ken to go and pick one up for himself. Of course, before I let him start using it for his own purposes, I made him sit through a few agonizing days of benchmarking and testing in both Windows and Mac OS X environments. Ken has already posted a review of the MacBook Air 11-in model ‘from a Windows perspective’ and in that we teased that we had done quite a bit more evaluation of the graphics performance to be shown later. Now is later.
So the first combatant in our integrated graphics showdown with Frame Rating is the 11-in MacBook Air. A small, but powerful Ultrabook that sports more than 11 hours of battery life (in OS X at least) but also includes the new HD 5000 integrated graphics options. Along with that battery life though is the GT3 variation of the new Intel processor graphics that doubles the number of compute units as compared to the GT2. The GT2 is the architecture behind the HD 4600 graphics that sits with nearly all of the desktop processors, and many of the notebook versions, so I am very curious how this comparison is going to stand.
OpenCL Support in a Meaningful Way
Adobe had OpenCL support since last year. You would never benefit from its inclusion unless you ran one of two AMD mobility chips under Mac OSX Lion, but it was there. Creative Cloud, predictably, furthers this trend with additional GPGPU support for applications like Photoshop and Premiere Pro.
This leads to some interesting points:
- How OpenCL is changing the landscape between Intel and AMD
- What GPU support is curiously absent from Adobe CC for one reason or another
- Which GPUs are supported despite not... existing, officially.
This should be very big news for our readers who do production work whether professional or for a hobby. If not, how about a little information about certain GPUs that are designed to compete with the GeForce 700-series?
JJ pays PC Perspective a visit!
In case you hadn't heard, on June the 4th, (world famous) JJ Guerrero from ASUS stopped by the PC Perspective offices to help host a live stream focused on Z87 platforms and the Intel Haswell processor. Since Intel decided to launch on a Saturday morning, you might have missed the boat: the Core i7-4770K was reviewed right here on PC Perspective and the results are pretty good.
Motherboards, we got motherboards here!!
Along with Haswell though is the release of the new Z87 chipset and with THAT, about 100 different ASUS motherboards. I exaggerate, but only a little. In our live stream that aired for about 4.5 hours, JJ and I discussed about 20 different motherboard ranging from Mini-ITX options to the budget-minded Z87-A and even the ROG Maximus VI Extreme!
Below you will find an on-demand version of the stream, broken up into five segments.
ASUS Z87 Motherboard Segmentation
This first segment details the mindset ASUS had when creating the four different motherboard product lines: Mainstream, Workstation, TUF and ROG. Why do they need all of these options and what features and quality points are common across the entire families?
ASUS Mainstream Z87 Motherboard Lineup
ASUS' new mainstream line of motherboards with the z87 chipset range from the Z87-A to the Z87-Deluxe/Dual. JJ talks about the features that are added as you move up the product stack so that you can find the option that fits your platform needs and budget.
Trinity... but Better!
Richland. We have been hearing this name for a solid nine months. Originally Richland was going to be a low end Trinity model that was budget oriented (or at least that was the context we heard it in). Turns out Richland is something quite different, though the product group does extend all the way from the budget products up to mainstream prices. We have seen both AMD and Intel make speed bin updates throughout the years with their products, but that seems like it is becoming a thing of the past. Instead, AMD is refreshing their Trinity product in a pretty significant matter. It is not simply a matter of binning these chips up a notch.
Trinity was released last Fall and it was a solid product in terms of overall performance and capabilities. It was well worth the price that AMD charged, especially when compared to Intel processors that would often be significantly slower in terms of graphics. The “Piledriver” architecture powers both Trinity and Richland, and it is an improved version of the original “Bulldozer” architecture. Piledriver included some small IPC gains, but the biggest advantage given was in terms of power. It is a much more power efficient architecture that can be clocked higher than the original Bulldozer parts. Trinity turned out to be a power sipping part for both mobile and desktop. In ways, it helped to really keep AMD afloat.
It turns out there were still some surprises in store from Trinity, and they have only been exposed by the latest Richland parts. AMD is hoping to keep in front of Intel in terms of graphics performance and compatibility, even in the face of the latest Haswell parts. While AMD has not ported over GCN to the Trinity/Richland lineup, the VLIW4 unit present in the current parts is still very competitive. What is perhaps more important, the software support for both 3D applications and GPGPU is outstanding.
An new era for computing? Or, just a bit of catching up?
Early Tuesday, at 2am for viewers in eastern North America, Intel performed their Computex 2013 keynote to officially kick off Haswell. Unlike ASUS from the night prior, Intel did not announce a barrage of new products; the purpose is to promote future technologies and the new products of their OEM and ODM partners. In all, there was a pretty wide variety of discussed topics.
Intel carried on with the computational era analogy: the 80's was dominated by mainframes; the 90's were predominantly client-server; and the 2000's brought the internet to the forefront. While true, they did not explicitly mention how each era never actually died but rather just bled through: we still use mainframes, especially with cloud infrastructure; we still use client-server; and just about no-one would argue that the internet has been displaced, despite its struggle against semi-native apps.
Intel believes that we are currently in the two-in-one era, which they probably mean "multiple-in-one" due to devices such as the ASUS Transformer Book Trio. They created a tagline, almost a mantra, illustrating their vision:
"It's a laptop when you need it; it's a tablet when you want it."
But before elaborating, they wanted to discuss their position in the mobile market. They believe they are becoming a major player in the mobile market with key design wins and outperforming some incumbent system on a chips (SoCs). The upcoming Silvermont architecture pines to be fill in the gaps below Haswell, driving smartphones and tablets and stretching upward to include entry-level notebooks and all-in-one PCs. The architecture promises to scale between offering three-fold more performance than its past generation, or a fifth of the power for equivalent performance.
Ryan discussed Silvermont last month, be sure to give his thoughts a browse for more depth.
Cortex-A12 fills a gap
Starting off Computex with an interesting announcement, ARM is talking about a new Cortex-A12 core that will attempt to address a performance gap in the SoC ecosystem between the A9 and A15. In the battle to compete with Krait and Intel's Silvermont architecture due in late 2013, ARM definitely needed to address the separation in performance and efficiency of the A9 and A15.
Source: ARM. Top to bottom: Cortex-A15, A12, A9 die size estimate
Targeted at mid-range devices that tend to be more cost (and thus die-size) limited, the Cortex-A12 will ship in late 2014 for product sampling and you should begin seeing hardware for sale in early 2015.
Architecturally, the changes for the upcoming A12 core revolve around a move to fully out of order dual-issue design including the integrated floating point units. The execution units are faster and the memory design has been improved but ARM wasn't ready to talk about specifics with me yet; expect that later in the year.
ARM claims this results in a 40% performance gain for the Cortex-A12 over the Cortex-A9, tested in SPECint. Because product won't even start sampling until late in 2014 we have no way to verify this data yet or to evaluate efficiency claims. That time lag between announcement and release will also give competitors like Intel, AMD and even Qualcomm time to answer back with potential earlier availability.
Haswell - A New Architecture
Thanks for stopping by our coverage of the Intel Haswell, 4th Generation Core processor and Z87 chipset release! We have a lot of different stories for you to check out and I wanted to be sure you knew about them all.
- PCPer Live! ASUS Z87 Motherboard and Intel Haswell Live Event! - Tuesday, June 4th we will be hosting a live streaming event with JJ from ASUS. Stop by to learn about Z87 and overclocking Haswell and to win some motherboards and graphics cards!
- ASUS ROG Maximus VI Extreme Motherboard Review
- MSI Z87-GD65 Gaming Motherboard Review
- ASUS Gryphon Z87 Micro-ATX Motherboard Review
This spring has been unusually busy for us here at PC Perspective - with everything from new APU releases from AMD, new graphics cards from NVIDIA and now new desktop and mobile processors from Intel. There has never been a better time to be a technology enthusiast though some would argue that the days of the enthusiast PC builder are on the decline. Looking at the revived GPU wars and the launch of Intel's Haswell architecture, 4th Generation Core processors we couldn't disagree more.
Built on the same 22nm process technology that Ivy Bridge brought to the world, Haswell is a new architecture from Intel that really changes focus for the company towards a single homogenous design that has the ability to span wide ranging markets. From tablets to performance workstations, Haswell will soon finds its way into just about every crevasse of your technology life.
Today we focus on the desktop though - the release of the new Intel Core i7-4770K, fully unlocked, LGA1150 processor built for the Z87 chipset and DIY builders everywhere. In this review we'll discuss the architectural changes Haswell brings, the overclocking capabilities and limitations of the new design, application performance, graphics performance and quite a bit more.
Haswell remains a quad-core processor built on 1.4 billion transistors in a die measuring 177 mm2 with integrated processor graphics, shared L3 cache, dual channel DDR3 memory controller. But much has changed - let's dive in.
Kabini is a pretty nifty little chip. So nifty, AMD is actually producing server grade units for the growing micro-server market. As readers may or may not remember, AMD bought up SeaMicro last year to get a better grip on the expanding micro-server market. While there are no official announcements from SeaMicro about offerings utilizing the server-Kabini parts, we can expect there to be sooner as opposed to later.
The Kabini parts (Jaguar + GCN) will be branded Opteron X-series. So far there are two announced products; one utilizes the onboard graphics portion while the other has the GCN based unit disabled. The products have a selectable TDP that ranges from 9 watts to 22 watts. This should allow the vendors to further tailor the chips to their individual solutions.
The X1150 is the GPU-less product with adjustable TDPs ranging from 9 to 17 watts. It is a native quad core product with 2 MB of L2 cache. It can be clocked up to 2 GHz, which we assume is that 17 watts range. The X2150 has an adjustable TDP range from 11 to 22 watts. The four cores can go to a max speed of 1.9 GHz while the GPU can go from 266 MHz up to a max 600 MHz.
The Architectural Deep Dive
AMD officially unveiled their brand new Bobcat architecture to the world at CES 2011. This was a very important release for AMD in the low power market. Even though Netbooks were a dying breed at that time, AMD experienced a good uptick in sales due to the good combination of price, performance, and power consumption for the new Brazos platform. AMD was of the opinion that a single CPU design would not be able to span the power consumption spectrum of CPUs at the time, and so Bobcat was designed to fill that space which existed from 1 watt to 25 watts. Bobcat never was able to get down to that 1 watt point, but the Z-60 was a 4.5 watt part with two cores and the full 80 Radeon cores.
The Bobcat architecture was produced on TSMC’s 40 nm process. AMD eschewed the upcoming 32 nm HKMG/SOI process that was being utilized for the upcoming Llano and Bulldozer parts. In hindsight, this was a good idea. Yields took a while to improve on GLOBALFOUNDRIES new process, while the existing 40 nm product from TSMC was running at full speed. AMD was able to provide the market in fairly short order with good quantities of Bobcat based APUs. The product more than paid for itself, and while not exactly a runaway success that garnered many points of marketshare from Intel, it helped to provide AMD with some stability in the market. Furthermore, it provided a very good foundation for AMD when it comes to low power parts that are feature rich and offer competitive performance.
The original Brazos update did not happen, instead AMD introduced Brazos 2.0 which was a more process improvement oriented product which featured slightly higher speeds but remained in the same TDP range. The uptake of this product was limited, and obviously it was a minor refresh to buoy purchases of the aging product. Competition was coming from low power Ivy Bridge based chips, as well as AMD’s new Trinity products which could reach TDPs of 17 watts. Brazos and Brazos 2.0 did find a home in low powered, but full sized notebooks that were very inexpensive. Even heavily leaning Intel based manufacturers like Toshiba released Brazos based products in the sub-$500 market. The combination of good CPU performance and above average GPU performance made this a strong product in this particular market. It was so power efficient, small batteries were typically needed, thereby further lowering the cost.
All things must pass, and Brazos is no exception. Intel has a slew of 22 nm parts that are encroaching on the sub-15 watt territory, ARM partners have quite a few products that are getting pretty decent in terms of overall performance, and the graphics on all of these parts are seeing some significant upgrades. The 40 nm based Bobcat products are no longer competitive with what the market has to offer. So at this time we are finally seeing the first Jaguar based products. Jaguar is not a revolutionary product, but it improves on nearly every aspect of performance and power usage as compared to Bobcat.
A Reference Platform - But not a great one
Believe it or not, AMD claims that the Brazos platform, along with the "Brazos 2.0" update the following year, were the company's most successful mobile platforms in terms of sales and design wins. When it first took the scene in late 2010, it was going head to head against the likes of Intel's Atom processor and the combination of Atom + NVIDIA ION and winning. It was sold in mini-ITX motherboard form factors as well as small clamshell notebooks (gasp, dare we say...NETBOOKS?) and though it might not have gotten the universal attention it deserved, it was a great part.
With Kabini (and Temash as well), AMD is making another attempt to pull in some marketshare in the low power, low cost mobile markets. I have already gone over the details of the mobile platforms that AMD is calling Elite Mobility (Temash) and Mainstream (Kabini) in a previous article that launched today.
This article will quickly focus on the real-world performance of the Kabini platform as demonstrated by a reference laptop I received while visiting AMD in Toronto a few weeks ago. While this design isn't going to be available in retail (and I am somewhat thankful based on the build quality) the key is to look at the performance and power efficiency of the platform itself, not the specific implementation.
Kabini Architecture Overview
The building blocks of Kabini are four Jaguar x86 cores and 128 Radeon cores colleted in a pair of Compute Units - similar in many ways to the CUs found in the Radeon HD 7000 series discrete GPUs. Josh has written a very good article that focuses on the completely new architecture that is Jaguar and compared it to other processors including AMD's previous core used in Brazos, the Bobcat core.
2013 Elite Mobility APU - Temash
AMD has a lot to say today. At an event up in Toronto this month we got to sit down with AMD’s marketing leadership and key engineers to learn about the company’s plans for 2013 mobility processors. This includes a refreshed high performance APU known as Richland that will replace Trinity as well as two brand new APUs based on Jaguar CPU cores and the GCN architecture for low power platforms.
Josh has put together an article that details the Jaguar + GCN design of Temash and Kabini and I have also posted some initial performance results of the Kabini reference system AMD handed me in May. This article will detail the plans that AMD has for each of these three mobile segments, starting with the newest entry, AMD’s Elite Mobility APU platform – Temash.
The goal of the APU, the combination of traditional x86 processing cores and a discrete style graphics system, was to offer unparalleled performance in smaller and more efficient form factors. AMD believes that their leadership in the graphics front will offer them a good sized advantage in areas including performance tablets, hybrids and small screen clamshells that may or not be touch enabled. They are acknowledging though that getting into the smallest tablets (like the Nexus 7) is not on the table quite yet and that content creation desktop replacements are probably outside the scope of Richland.
2013 Elite Mobility APU – Temash
AMD will have the first x86 quad-core SoC design with Temash and AMD thinks it will make a big splash in a relatively new market known as the “high performance” tablet.
Temash, built around Jaguar CPU cores and the graphics technology of GCN, will be able to offer fully accelerated video playback with transcode support as well with features like image stabilization and Perfect Picture enabled. Temash will also be the only SoC to offer support for DX11 graphics and even though some games might not have the ability to show off added effects there are quite a few performance advantages of DX11 over DX10/9. With more than 100% claimed GPU performance upgrade you’ll be able to drive displays at 2560x1600 for productivity use and even be able to take advantage of wireless display options.
A much needed architecture shift
It has been almost exactly five years since the release of the first Atom branded processors from Intel, starting with the Atom 230 and 330 based on the Diamondville design. Built for netbooks and nettops at the time, the Atom chips were a reaction to a unique market that the company had not planned for. While the early Atoms were great sellers, they were universally criticized by the media for slow performance and sub-par user experiences.
Atom has seen numerous refreshes since 2008, but they were all modifications of the simplistic, in-order architecture that was launched initially. With today's official release of the Silvermont architecture, the Atom processors see their first complete redesign from the ground up. With the focus on tablets and phones rather than netbooks, can Intel finally find a foothold in the growing markets dominated by ARM partners?
I should note that even though we are seeing the architectural reveal today, Intel doesn't plan on having shipping parts until late in 2013 for embedded, server and tablets and not until 2014 for smartphones. Why the early reveal on the design then? I think that pressure from ARM's designs (Krait, Exynos) as well as the upcoming release of AMD's own Kabini is forcing Intel's hand a bit. Certainly they don't want to be perceived as having fallen behind and getting news about the potential benefits of their own x86 option out in the public will help.
Silvermont will be the first Atom processor built on the 22nm process, leaving the 32nm designs of Saltwell behind it. This also marks the beginning of a new change in the Atom design process, to adopt the tick/tock model we have seen on Intel's consumer desktop and notebook parts. At the next node drop of 14nm, we'll see see an annual cadence that first focuses on the node change, then an architecture change at the same node.
By keeping Atom on the same process technology as Core (Ivy Bridge, Haswell, etc), Intel can put more of a focus on the power capabilities of their manufacturing.
The Intel HD Graphics are joined by Iris
Intel gets a bad wrap on the graphics front. Much of it is warranted but a lot of it is really just poor marketing about the technologies and features they implement and improve on. When AMD or NVIDIA update a driver or fix a bug or bring a new gaming feature to the table, they are sure that every single PC hardware based website knows about and thus, that as many PC gamers as possible know about it. The same cannot be said about Intel though - they are much more understated when it comes to trumpeting their own horn. Maybe that's because they are afraid of being called out on some aspects or that they have a little bit of performance envy compared to the discrete options on the market.
Today might be the start of something new from the company though - a bigger focus on the graphics technology in Intel processors. More than a month before the official unveiling of the Haswell processors publicly, Intel is opening up about SOME of the changes coming to the Haswell-based graphics products.
We first learned about the changes to Intel's Haswell graphics architecture way back in September of 2012 at the Intel Developer Forum. It was revealed then that the GT3 design would essentially double theoretical output over the currently existing GT2 design found in Ivy Bridge. GT2 will continue to exist (though slightly updated) on Haswell and only some versions of Haswell will actually see updates to the higher-performing GT3 options.
In 2009 Intel announced a drive to increase graphics performance generation to generation at an exceptional level. Not long after they released the Sandy Bridge CPU and the most significant performance increase in processor graphics ever. Ivy Bridge followed after with a nice increase in graphics capability but not nearly as dramatic as the SNB jump. Now, according to this graphic, the graphics capability of Haswell will be as much as 75x better than the chipset-based graphics from 2006. The real question is what variants of Haswell will have that performance level...
I should note right away that even though we are showing you general performance data on graphics, we still don't have all the details on what SKUs will have what features on the mobile and desktop lineups. Intel appears to be trying to give us as much information as possible without really giving us any information.
heterogeneous Uniform Memory Access
Several years back we first heard AMD’s plans on creating a uniform memory architecture which will allow the CPU to share address spaces with the GPU. The promise here is to create a very efficient architecture that will provide excellent performance in a mixed environment of serial and parallel programming loads. When GPU computing came on the scene it was full of great promise. The idea of a heavily parallel processing unit that will accelerate both integer and floating point workloads could be a potential gold mine in wide variety of applications. Alas, the promise of the technology did not meet expectations when we have viewed the results so far. There are many problems with combining serial and parallel workloads between CPUs and GPUs, and a lot of this has to do with very basic programming and the communication of data between two separate memory pools.
CPUs and GPUs do not share common memory pools. Instead of using pointers in programming to tell each individual unit where data is stored in memory, the current implementation of GPU computing requires the CPU to write the contents of that address to the standalone memory pool of the GPU. This is time consuming and wastes cycles. It also increases programming complexity to be able to adjust to such situations. Typically only very advanced programmers with a lot of expertise in this subject could program effective operations to take these limitations into consideration. The lack of unified memory between CPU and GPU has hindered the adoption of the technology for a lot of applications which could potentially use the massively parallel processing capabilities of a GPU.
The idea for GPU compute has been around for a long time (comparatively). I still remember getting very excited about the idea of using a high end video card along with a card like the old GeForce 6600 GT to be a coprocessor which would handle heavy math operations and PhysX. That particular plan never quite came to fruition, but the idea was planted years before the actual introduction of modern DX9/10/11 hardware. It seems as if this step with hUMA could actually provide a great amount of impetus to implement a wide range of applications which can actively utilize the GPU portion of an APU.