The AMD Kaveri Architecture
Kaveri: AMD’s New Flagship Processor
How big is Kaveri? We already know the die size of it, but what kind of impact will it have on the marketplace? Has AMD chosen the right path by focusing on power consumption and HSA? Starting out an article with three questions in a row is a questionable tactic for any writer, but these are the things that first come to mind when considering a product the likes of Kaveri. I am hoping we can answer a few of these questions by the end of this article, but alas it seems as though the market will have the final say as to how successful this new architecture is.
AMD has been pursuing the “Future is Fusion” line for several years, but it can be argued that Kaveri is truly the first “Fusion” product that completes the overall vision for where AMD wants to go. The previous several generations of APUs were initially not all that integrated in a functional sense, but the complexity and completeness of that integration has been improved upon with each iteration. Kaveri takes this integration to the next step, and one which fulfills the promise of a truly heterogeneous computing solution. While AMD has the hardware available, we have yet to see if the software companies are willing to leverage the compute power afforded by a robust and programmable graphics unit powered by AMD’s GCN architecture.
(Editor's Note: The following two pages were written by our own Josh Walrath, discussing the technology and architecture of AMD Kaveri. Testing and performance analysis by Ryan Shrout starts on page 3.)
The first step in understanding Kaveri is taking a look at the process technology that AMD is using for this particular product. Since AMD divested itself of their manufacturing arm, they have had to rely on GLOBALFOUNDRIES to produce nearly all of their current CPUs and APUs. Bulldozer, Piledriver, Llano, Trinity, and Richland based parts were all produced on GF’s 32 nm PD-SOI process. The lower power APUs such as Brazos and Kabini have been produced by TSMC on their 40 nm and 28 nm processes respectively.
Kaveri will take a slightly different approach here. It will be produced by GLOBALFOUNDRIES, but it will forgo SOI and utilize a bulk silicon process. 28 nm HKMG is very common around the industry, but few pure play foundries were willing to tailor their process to the direct needs of AMD and the Kaveri product. GF was able to do such a thing. APUs are a different kind of animal when it comes to fabrication, primarily because the two disparate units (CPU and GPU) require different characteristics to perform at the highest efficiency. As such, compromises had to be made.
Subject: Processors | January 7, 2014 - 04:52 AM | Josh Walrath
Tagged: amd, CES, 2014, Kaveri, A10 7850K, A10 7700K, APU, firepro, hsa
This year’s AMD CES event was more interesting than I was expecting. The outline of the event was well known, as most Kaveri details have been revealed over the past few months, and I was unsure what Lisa Su and the gang would have left to go over.
This past year has been a big one for AMD. They seem to be doing a lot better than others expected them to, especially with all of the delayed product launches on the CPU side for quite a few years. This year saw the APU take a pretty prominent place in the industry with the launch of the latest generation consoles from Sony and Microsoft. AMD made inroads with mobile form factors with a variety of APUs. Membership in the HSA Foundation has grown, and HSA members ship two out of every three connected, smart devices. Apple also includes FirePro graphics cards with all of their new Mac Pros.
Kaveri is of course the big news here. AMD feels that this is the best APU yet. The combination of Steamroller CPU cores, GCN graphics compute cores, HSA, hUMA, hQ, TrueAudio, Mantle support, PCI-E 3.0 support, and a configurable TDP makes for a pretty compelling product. AMD has shuffled some nomenclature about by saying that Kaveri, at the top end, is made up of 12 compute cores. These include 4 Steamroller cores and 8 GCN compute clusters. Each compute cluster matches the historical definition of a core, but of course it looks quite a bit different than a traditional x86 core.
We have gone over Kaveri pretty extensively in the past. The CPU is clocked at 3.7 GHz with a 4 GHz boost. The graphics portion clocks in at 720 MHz. It supports memory speeds up to DDR3-2400, which is really needed to extract as much performance as possible out of this new APU. Benchmarks provided by AMD show this product to be a big jump from the previous Richland, and in these particular benchmarks it is quite a bit faster than the competing i5 4670K.
Gaming performance is also improved. This APU can run most current applications at 1080P resolutions with low to medium quality settings. Older titles can be run at 1080P with Medium to High/Extreme settings. While this processor is rated at around 867 GFLOPS, which is around 110 GFLOPS greater than the previous top end Richland, it is more efficient at delivering that theoretical performance. It looks to be a significant improvement all around.
Software support is improving with applications from companies like Adobe, The Document Foundation, and Nuance. These cover HSA applications and in Nuance’s case, using the TrueAudio portion to clean up and accelerate voice recognition. TrueAudio is also being supported in five upcoming games. This is not a huge amount, but it is a decent start for this new technology.
Mantle is gaining a lot more momentum with support from 3 engines, 5 developers, and 20+ games in development. They showed off Battlefield 4 running Mantle on a Kaveri APU for the first time publicly. They mentioned that it ran 45% faster than Direct3D at the same quality levels on the same hardware. The display showed frame rates up in the low 50 fps area.
AMD is continuing to move forward on their low power offerings based on Beema and Mullins. Lisa claims that these parts are outperforming the Intel Baytrail offerings in both CPU performance and graphics. Unfortunately, she mentioned nothing about the power consumption associated with these results. They showed off the Discovery tablet as well as a fully functional PC that was the size of a large cellphone.
They closed out the event by talking about the Surround House 2. This demo looks significantly better than the previous iteration we saw last year. It features something like a 34.2 speaker setup in a projected dome. It is much more complex than the House from last year, but the hardware running it all is rather common: a single high end FirePro card running on a single A10 7850K system. The demo is also one of the first showings of a 360 degree gesture recognition setup.
AMD has come a long way since hitting rock bottom a few years back. They continue to claw their way back to relevance, and they hope that Kaveri will help them regain a foothold in the computing market. They are certainly doing well in the graphics market, but the introduction of Kaveri should help them gain more momentum in the CPU/APU market. We have yet to test Kaveri on our own, but initial results look promising. It is a better APU, but we just don’t know how much better so far.
Follow all of our coverage of the show at http://pcper.com/ces!
Subject: General Tech, Graphics Cards, Processors | December 3, 2013 - 04:12 AM | Scott Michaud
Tagged: Kaveri, APU, amd
The launch and subsequent availability of Kaveri is scheduled for the CES time frame. The APU unites Steamroller x86 cores with several Graphics Core Next (GCN) cores. The high-end offering, the A10-7850K, is capable of 856 GFLOPs of compute power (most of which is of course from the GPU).
Image/Leak Credit: Prohardver.hu
We now know about two SKUs: the A10-7850K and the A10-7700K. The two parts are quite similar, except that the higher model gets a 200 MHz CPU bump (3.8 GHz to 4.0 GHz) and 33% more GPU units (8 instead of 6).
But how does this compare? The original source (prohardver.hu) claims that Kaveri will achieve an average 28 FPS in Crysis 3 on low at 1680x1050; this is a 12% increase over Richland. It also achieved an average 53 FPS with Sleeping Dogs on Medium which is 26% more than Richland.
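For context, the quoted percentages can be turned back into approximate Richland frame rates. The short sketch below only rearranges the numbers in the leak; the helper name is made up for illustration, and the Richland values it prints are derived rather than measured.

```python
def baseline_from_gain(new_fps: float, gain_percent: float) -> float:
    """Back out the older result from the newer one and its percentage gain."""
    return new_fps / (1 + gain_percent / 100)

# Figures quoted from the leak; the Richland values are derived, not measured.
crysis3_richland = baseline_from_gain(28, 12)
sleeping_dogs_richland = baseline_from_gain(53, 26)

# GPU unit increase from the A10-7700K (6) to the A10-7850K (8).
gpu_unit_gain = (8 - 6) / 6 * 100

print(f"Crysis 3, Richland: ~{crysis3_richland:.0f} FPS")
print(f"Sleeping Dogs, Richland: ~{sleeping_dogs_richland:.0f} FPS")
print(f"GPU unit increase: {gpu_unit_gain:.0f}%")
```

That works out to roughly 25 FPS and 42 FPS for Richland, so the absolute gains are about 3 and 11 frames per second respectively.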
These are healthy increases over the previous generation but do not even account for HSA advantages. I am really curious what will happen if integrated graphics become accessible enough that game developers decide to target it for general compute applications. The reduction in latency (semi-wasted time bouncing memory between compute devices) might open this architecture to where it can really shine.
We will do our best to keep you up to date on this part especially when it launches at CES.
Subject: General Tech, Systems | November 22, 2013 - 08:02 PM | Ryan Shrout
Tagged: video, teardown, xbox one, APU, amd, xbox, xb1
Last week we brought a teardown of the new Sony PlayStation 4 (PS4) console and this week we do the same for Microsoft's new Xbox One console.
In this video, which is a recording of our live stream that started last night at 12:30am EST, you'll see us unbox the Xbox One, turn it on, play with the new Kinect, take it apart and put it back together. And this time we didn't even break anything - though removing the plastic clips on the Xbox One is considerably more annoying and time consuming than removing the screws on the PS4.
Though they are out of stock, Amazon.com appears to be getting additional Xbox One consoles in stock pretty regularly, so keep an eye out.
The 7 Year Console Refresh
The consoles are coming! The consoles are coming! Ok, that is not necessarily true. One is already here and the second essentially is too. This of course brings up the great debate between PCs and consoles. The past has been interesting when it comes to console gaming, as often the consoles would be around a year ahead of PCs in terms of gaming power and prowess. This is no longer the case with this generation of consoles. Cutting edge is now considered mainstream when it comes to processing and graphics. The real incentive to buy this generation of consoles is a lot harder to pin down as compared to years past.
The PS4 retails for $399 US and the upcoming Xbox One is $499. The PS4’s price includes a single controller, while the Xbox’s package includes not just a controller, but also the next generation Kinect device. These prices would be comparable to some low end PCs which include keyboard, mouse, and a monitor that could be purchased from large brick and mortar stores like Walmart and Best Buy. Happily for most of us, we can build our machines to our own specifications and budgets.
As a directive from on high (the boss), we were given the task of building our own low-end gaming and productivity machines at a price as close to that of the consoles and explaining which solution would be superior at the price points given. The goal was to get as close to $500 as possible and still have a machine that would be able to play most recent games at reasonable resolutions and quality levels.
Subject: General Tech, Systems | November 15, 2013 - 02:42 PM | Ryan Shrout
Tagged: video, teardown, ps4, playstation 4, APU, amd
Last night Ken and I headed over to the local Best Buy to pick up my preorder of the new PlayStation 4. What would any hardware geek immediately do with this hardware? Obviously we take a screwdriver to it and take it apart.
In this video, which is a recording of our live stream that started last night at 12:30am EST, you'll see us unbox the PS4, turn it on, take it apart and put it back together. And I only had to fix one piece with gaffers tape, so there's that.
(We'll have a collection of high-resolution photos later today as well.)
Though they are out of stock, Amazon.com appears to be getting more PS4s in stock pretty regularly, so keep an eye out if you are interested in picking one up still.
Subject: Processors | November 13, 2013 - 05:35 PM | Josh Walrath
Tagged: Puma, Mullins, mobile, Jaguar, GCN, beema, apu13, APU, amd, 2014
AMD’s APU13 is all about APUs and their programming, but the hardware we have seen so far has been dominated by the upcoming Kaveri products for FM2+. It seems that AMD has more up their sleeves for release this next year, and it has somewhat caught me off guard. The Beema and Mullins based products are being announced today, but we do not have exact details on these products. The codenames have been around for some time now, but interest has been minimal since they are evolutionary products based on Kabini and Temash APUs that have been available this year. Little did I know that things would be far more interesting than that.
The basis for Beema and Mullins is the Puma core. This is a highly optimized revision of Jaguar, and in some ways can be considered a new design. All of the basics in terms of execution units, caches, and memory controllers are the same. What AMD has done is go through the design with a fine toothed comb and make it far more efficient per clock than what we have seen previously. This is still a 28 nm part, but the extra attention and love lavished upon it by AMD has resulted in a much more efficient system architecture for the CPU and GPU portions.
The parts will be offered in two and four core configurations. Beema will span from 10W to 25W configurations. Mullins will go all the way down to “2W SDP”. SDP essentially means that while the chip can be theoretically rated higher, it will rarely go above that 2W envelope in the vast majority of situations. These chips are expected to be around 2X more efficient per clock than the previous Jaguar based products. This means that at similar clock speeds, Beema and Mullins will pull far less power than the previous generation. It should also allow some higher clock speeds at the top end 25W area.
These will be some of the first fanless quad cores that AMD will introduce for the tablet market. Previously we have seen tablets utilize the cut down versions of Temash to hit power targets, but with this redesign it is entirely possible to utilize the fully enabled quad core Mullins. AMD has not given us specific speeds for these products, but we can guess that clocks will be close to what we see currently, just with a lower TDP rating.
AMD is introducing their new security platform based on the ARM Trustzone. Essentially a small ARM Cortex A5 is integrated in the design and handles the security aspects of this feature. We were not briefed on how this achieves security, but the slide below gives some of the bullet points of the technology.
Since the pure-play foundries will not have a workable 20 nm process for AMD to jump to in a timely manner, AMD had no other choice but to really optimize the Jaguar core to make it more competitive with products from Intel and the ARM partners. At 28 nm the ARM ecosystem has a power advantage over AMD, while at 22 nm Intel offers similar performance to AMD but with greater power efficiency.
This is a necessary update for AMD as the competition has certainly not slowed down. AMD is more constrained obviously by the lack of a next-generation process node available for 1H 2014, so a redesign of this magnitude was needed. The performance per watt metric is very important here, as it promises longer battery life without giving up the performance people received from the previous Kabini/Temash family of APUs. This design work could be carried over to the next generation of APUs using 20 nm and below, which hopefully will keep AMD competitive with the rest of the market. Beema and Mullins are interesting looking products that will be shown off at CES 2014.
Subject: Graphics Cards, Processors | November 12, 2013 - 06:10 PM | Ryan Shrout
Tagged: amd, Kaveri, APU, video, hsa
Yesterday at the AMD APU13 developer conference, the company showed off the upcoming Kaveri APU running Battlefield 4 completely on the integrated graphics. I was able to push the AMD guys along and get a little more personal demo to share with our readers. The Kaveri APU had some of its details revealed this week:
- Quad-core Steamroller x86
- 512 Stream Processor GPU
- 856 GFLOPS of theoretical performance
- 3.7 GHz CPU clock speed, 720 MHz GPU clock speed
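As a rough sanity check, the 856 GFLOPS figure can be reproduced from the other numbers in this list. The sketch below assumes one fused multiply-add (2 FLOPs) per stream processor per clock and 8 single-precision FLOPs per CPU core per clock; neither per-clock figure was stated in the presentation, and the function name is purely illustrative.

```python
def peak_gflops(units: int, flops_per_clock: int, clock_ghz: float) -> float:
    """Theoretical peak: execution units x FLOPs per clock x clock in GHz."""
    return units * flops_per_clock * clock_ghz

# Assumption: each stream processor retires one FMA (2 FLOPs) per clock.
gpu = peak_gflops(units=512, flops_per_clock=2, clock_ghz=0.720)

# Assumption: each Steamroller core is credited with 8 FLOPs per clock.
cpu = peak_gflops(units=4, flops_per_clock=8, clock_ghz=3.7)

total = gpu + cpu
print(f"GPU {gpu:.1f} + CPU {cpu:.1f} = {total:.1f} GFLOPS")
```

Under those assumptions the GPU contributes about 737 GFLOPS and the CPU about 118 GFLOPS, which rounds to the quoted 856 and shows how heavily the total leans on the graphics side.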
AMD wanted to be sure we pointed out in this video that the estimated clock speeds used for the FLOP figures may not be what the demo system was running at (it was likely a bit lower). Also, the version of Battlefield 4 here is the standard retail version; further improvements from the driver team, as well as the upcoming Mantle API implementation, will likely bring even more performance to the APU.
The game was running at 1920x1080 with MOSTLY medium quality settings (lighting set to low) but the results still looked damn impressive and the frame rates were silky and smooth. Considering this is running on a desktop with integrated processor graphics, the game play experience is simply unmatched.
Memory in the system was running at 2133 MHz.
The second demo looks at the image decoding acceleration that AMD is going to enable with Kaveri APUs upon release with a driver. Essentially, as the demonstration shows in the video, AMD is replacing the integrated Windows JPG decompression algorithm with a new one that utilizes HSA to accelerate decoding on both the x86 and SIMD (GPU) portions of the silicon. The most strenuous demo, which used 22 MP images, saw a 100% increase in performance compared to the Kaveri CPU cores alone.
More Details from Lisa Su
The executives at AMD like to break their own NDAs. Then again, they are the ones typically setting these NDA dates, so it isn’t a big deal. It is no secret that Kaveri has been in the pipeline for some time. We knew a lot of the basic details of the product, but there were certainly things that were missing. Lisa Su went up onstage and shared a few new details with us.
Kaveri will be made up of 4 “Steamroller” cores, which are enhanced versions of the previous Bulldozer/Trinity/Vishera families of products. Nearly everything in the processor is doubled. It now has dual decode, more cache, larger TLBs, and a host of other smaller features that all add up to greater single thread performance and better multi-threaded handling and performance. Integer performance will be improved, and the FPU/MMX/SSE unit now features 2 x 128 bit FMAC units which can “fuse” and support AVX 256.
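To make the FMAC description a little more concrete, the sketch below simply counts single-precision lanes. It assumes 32-bit floats and the usual convention that one fused multiply-add counts as 2 FLOPs; the constant names are illustrative, and this is lane arithmetic rather than a measured throughput.

```python
FMAC_WIDTH_BITS = 128        # width of each FMAC unit, from the text
FMACS_PER_FPU = 2            # "2 x 128 bit FMAC units"
SINGLE_PRECISION_BITS = 32   # assumption: single-precision floats
FLOPS_PER_FMA = 2            # assumption: multiply + add counted separately

# Each 128-bit FMAC handles four 32-bit lanes per clock.
lanes_per_fmac = FMAC_WIDTH_BITS // SINGLE_PRECISION_BITS

# Fusing both units gives the 256-bit width that AVX operations need.
fused_width_bits = FMAC_WIDTH_BITS * FMACS_PER_FPU

# Peak single-precision FLOPs per clock for one FPU with both FMACs busy.
flops_per_clock = FMACS_PER_FPU * lanes_per_fmac * FLOPS_PER_FMA

print(f"{lanes_per_fmac} lanes per FMAC, {fused_width_bits}-bit fused width, "
      f"{flops_per_clock} FLOPs per clock per FPU")
```

Whether the two units work independently on 128-bit operations or together on one 256-bit AVX operation, the peak lane count per clock is the same.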
However, there was no mention of the fabled 6 core Kaveri. At this time, it is unlikely that particular product will be launched anytime soon.
AMD is up to some interesting things. Today at AMD’s tech day, we discovered a veritable cornucopia of information. Some of it was pretty interesting (audio), some was discussed ad nauseam (audio, audio, and more audio), and one thing in particular was quite shocking. Mantle was the final, big subject that AMD was willing to discuss. Many assumed that the R9 290X would be the primary focus of this talk, but in fact it very much was an aside that was not discussed at any length. AMD basically said, “Yes, the card exists, and it has some new features that we are not going to really go over at this time.” Mantle, as a technology, is at the same time a logical step as well as an unforeseen one. So what all does Mantle mean for users?
Looking back through the mists of time, when dinosaurs roamed the earth, the individual 3D chip makers all implemented low level APIs that allowed programmers to get closer to the silicon than what other APIs such as Direct3D and OpenGL would allow. This was a very efficient way of doing things in terms of graphics performance. It was an inefficient way to do things for a developer writing code for multiple APIs. Microsoft and the Khronos Group had solutions with Direct3D and OpenGL that allowed these programmers to develop for these high level APIs very simply (comparatively so). The developers could write code that would run on D3D/OpenGL, and the graphics chip manufacturers would write drivers that would interface with Direct3D/OpenGL, which would then go through a hardware abstraction layer to communicate with the hardware. The onus was then on the graphics people to create solid, high performance drivers that would work well with DirectX or OpenGL, so the game developer would not have to code directly for a multitude of current and older graphics cards.