AMD Spills more Kaveri Beans: AMD APU13

Subject: Processors
Manufacturer: AMD

More Details from Lisa Su

The executives at AMD like to break their own NDAs.  Then again, they are the ones typically setting these NDA dates, so it isn’t a big deal.  It is no secret that Kaveri has been in the pipeline for some time.  We knew a lot of the basic details of the product, but there were certainly things that were missing.  Lisa Su went up onstage and shared a few new details with us.

Kaveri will be made up of 4 “Steamroller” cores, which are enhanced versions of the previous Bulldozer/Trinity/Vishera families of products.  Nearly everything in the processor is doubled.  It now has dual decode, more cache, larger TLBs, and a host of other smaller features that all add up to greater single thread performance and better multi-threaded handling and performance.   Integer performance will be improved, and the FPU/MMX/SSE unit now features 2 x 128 bit FMAC units which can “fuse” and support AVX 256.

However, there was no mention of the fabled 6 core Kaveri.  At this time, it is unlikely that particular product will be launched anytime soon. 

The GCN cores in Kaveri are exactly as advertised, but we were not entirely certain how many were going to be included.  Lisa mentioned that there will be up to 8 GCN compute units, which works out to 512 stream units at 64 per compute unit.  GCN has turned out to be a very flexible and efficient architecture for AMD, and a lot of stream units can be fit per mm-squared.  These GCN units are also DX 11.2 compliant.  Lisa did not give us the clock speeds that the graphics portion will run at, but AMD did show off performance running BF4 at medium quality settings and 1080P.  The Kaveri chip was running between 24 and 30 fps, compared to an Intel i7-4770K paired with the GT 630 graphics card.  The Intel/NVIDIA combination was providing 12 to 14 fps in performance in the same scene with the same quality settings.
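For readers who want to check the stream unit math, here is a minimal sketch.  It relies on the GCN layout of 4 SIMD units of 16 lanes per compute unit; the function names are made up for illustration, and the clock passed to `peak_fp32_gflops` is purely a placeholder, since AMD did not disclose GPU clock speeds at the event.

```python
# GCN organizes each compute unit (CU) as 4 SIMD units of 16 lanes each.
LANES_PER_SIMD = 16
SIMDS_PER_CU = 4
STREAM_UNITS_PER_CU = LANES_PER_SIMD * SIMDS_PER_CU  # 64


def stream_units(compute_units: int) -> int:
    """Total stream units for a given number of GCN compute units."""
    return compute_units * STREAM_UNITS_PER_CU


def peak_fp32_gflops(compute_units: int, clock_ghz: float) -> float:
    """Theoretical single precision peak for the GPU portion only.

    Each lane can retire one fused multiply-add per clock, counted as
    2 floating point operations. clock_ghz is an assumption supplied by
    the caller, not a figure from AMD.
    """
    return stream_units(compute_units) * 2 * clock_ghz


if __name__ == "__main__":
    print(stream_units(8))  # 8 CUs, as mentioned on stage
    # Placeholder clock of 1.0 GHz, only to show how the figure scales.
    print(peak_fp32_gflops(8, 1.0))
```

The peak figure scales linearly with clock speed, so the real number will depend on whatever clocks AMD settles on for shipping parts.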

Kaveri will be the first HSA enabled part to be released (though the specification is not yet finalized).  It will implement the hUMA and hQ features that we have been made aware of over the past few years.  Kaveri will also be the first APU to support shared physical and virtual memory spaces between the CPU and GPU.  In addition, this chip will feature the TrueAudio functionality introduced with the latest AMD standalone GPUs (Hawaii and Bonaire).  This DSP technology will accelerate audio functions when combined with the necessary middleware.  From all indications, adding this functionality entails a very small die size hit.

AMD also talked about Mantle support for Kaveri.  This low-level rendering API will give Kaveri a nice boost in performance with games that support Mantle technology.  While Kaveri will not match a standalone budget or midrange card in every situation, Mantle will be a free performance boost for people who game on their APU.

Kaveri will also be the first PCI-E 3.0 APU from AMD.  While rumor had it that Trinity supported PCI-E 3.0, it was never certified. AMD is pushing PCI-E 3.0 with Kaveri and the latest FM2+ motherboards.

Software support for heterogeneous computing is on the rise.  AMD detailed the progress they have made and what products are coming up that will help programmers and developers integrate heterogeneous features into their products.  Throughout the next year, more tools and features will be released so that implementing parallel processing in appropriate scenarios becomes more and more transparent to programmers.

Kaveri at the top end will feature around 856 GFlops of processing power.  This is well up from the 779 GFlops of the A10-6800K.  Kaveri also throws in support for up to 32 GB of memory that is shared between the CPU and GPU.  There should be a smaller latency hit for parallel loads as compared to the previous memory setup with Trinity/Richland.
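To put the two theoretical figures in perspective, here is a quick sketch of the relative uplift.  The helper name is made up for illustration; the inputs are simply the two GFlops numbers quoted above, and these are peak figures rather than measured results.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Theoretical peak figures quoted by AMD, in GFlops.
A10_6800K_GFLOPS = 779.0
KAVERI_GFLOPS = 856.0

uplift = percent_increase(A10_6800K_GFLOPS, KAVERI_GFLOPS)
print(f"Theoretical uplift: {uplift:.1f}%")
```

On paper that is roughly a 10% gain, so any larger real world improvement would have to come from better efficiency rather than raw throughput.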

Kaveri is going to be a very big product for AMD.  While it likely will not compete in CPU performance with the i5 and i7 4000 series from Intel, it will certainly be a big jump up in terms of graphics performance.  Nobody is sure where CPU performance will land, but it will certainly be a big improvement over the current iterations present in Vishera and Richland based products.  All indications point to it eclipsing the very competent Intel Iris Pro graphics component that is found only in certain Ultrathin notebooks, including Apple's new 15" MacBook Pro.  It is also the first shot across the bow of the industry when it comes to serious heterogeneous computing.  AMD is doing their best to make sure the software ecosystem is there, and the HSA group has gained a lot of momentum with the addition of Qualcomm and Samsung to the group (not to mention ARM and MediaTek being some of the original founders).

Lisa was pretty emphatic that they would be shipping product in 2013.  This may be true, but we really have no idea exactly how much product will be shipped.  They could certainly ship a couple thousand APUs at the very end of December and they would be keeping their word.  Launch looks to be January 14, 2014, but availability is not known at this time.  Press samples will probably be available some 3 weeks before launch. 

This is a very significant moment for AMD, and one very much akin to what the original Athlon 64 meant for the company.  They are treading on new territory here, and their implementation is logical and open to the industry.  Kaveri may not be an Intel killer, but it will certainly ensure the survival of AMD if it performs as expected.

November 11, 2013 | 10:03 PM - Posted by AMDbumlover (not verified)

can't wait for a "balanced" review from PcPer...

November 11, 2013 | 11:45 PM - Posted by AMDbumlover (not verified)

Because, clearly, they cannot report on something without the anti-fanboys complaining.
Here's a cup of water.
Now shut the full cup.

November 12, 2013 | 12:02 AM - Posted by AMDbumlover (not verified)

I am so bipolar, here I am having a discussion with myself!

November 12, 2013 | 12:48 AM - Posted by Josh Walrath

But at least it is entertaining!

November 12, 2013 | 04:07 AM - Posted by JohnGR (not verified)

Who is winning?

November 12, 2013 | 01:43 PM - Posted by Anonymous (not verified)

The consumer.

November 11, 2013 | 10:04 PM - Posted by derz

You did not disappoint Josh. Nice recap article.

November 11, 2013 | 10:09 PM - Posted by jatrias (not verified)

Thank you, Josh!

November 11, 2013 | 10:25 PM - Posted by johnc (not verified)

is this a FX CPU replacement? If this is will we see some SLI boards?

November 11, 2013 | 10:48 PM - Posted by Josh Walrath

I am guessing we will find out more about those soon.  They won't necessarily be FX replacements, especially since the core count does not go above 2 modules.  It will be interesting to see where AMD places these, as well as if they will get a SLI license for the processor from NVIDIA.

November 13, 2013 | 09:17 AM - Posted by nabokovfan87

is it at all possible for you guys to have someone on from AMD to go over this and possibly peg them about fx parts or a 8370 part?

November 13, 2013 | 10:45 AM - Posted by Josh Walrath

I'll mention it to Ryan and see what we can dig up.

November 12, 2013 | 12:03 AM - Posted by snook

nice write up josh. you always break it down so that I can get my head around it. thanks

November 12, 2013 | 12:56 AM - Posted by SteeloYangster

I'm very very excited and interested in the Kaveri chips. I've had a lot of experience with the first and second gen APU's and can't wait to see programs utilizing HSA/hUMA. I hope they catch on!

November 12, 2013 | 02:43 AM - Posted by capawesome9870

Crystal ball time (dramatic sounding) **dun - dun - Duuunnn**
how long before they start pushing out chips similar to the Xbox One and PlayStation 4 to the PC market? 3-4 modules with 16+ GCN Compute Units.

November 12, 2013 | 04:33 AM - Posted by Melvar

How many Jaguar cores will fit in the same die area as a steamroller module?

I had assumed that they needed to go with the much weaker Jaguar cores on the Xbone & PS4 in order to have enough transistors left for that level of graphics.

November 12, 2013 | 10:24 AM - Posted by Josh Walrath

We do not yet know the die size of Kaveri, but it will be interesting to compare/contrast size vs. performance on the XBOne and PS4 units.  My gut feeling here is that MS and Sony were hoping to go really wide on the CPU but run it at lower speeds, so TDP headroom can be afforded to the graphics portion.

November 13, 2013 | 01:55 AM - Posted by PsiAmp

Theoretically it will make sense as soon as 20nm is available. So 1H 2015 is quite possible for a new APU with XBO chip performance.

CPU wise 4 cores @3.7 GHz Kaveri is faster than 8 cores @1.8 GHz Jaguar in PS4/XBO.

November 12, 2013 | 03:11 AM - Posted by Gadgety

"BF4 at medium quality settings and 1080P. The Kaveri chip was running between 24 and 30 fps, compared to an Intel i7-4770K paired with the GT 630 graphics card. The Intel/NVIDIA combination was providing 12 to 14 fps in performance in the same scene with the same quality settings"

Wow! Better than I expected. The most interesting chip launch in a long while to me. Depending on price, it'll hopefully be great for building a value for money entry gaming rig for my kid, and of course for laptops.

November 12, 2013 | 03:27 AM - Posted by capawesome9870

what is the memory bandwidth for these new Kaveri chips?

i am thinking it is dual channel DDR3-2133 which would give 34GB/s (17GB/s per channel).

when are these APUs going to move to dedicated DDR5 soldered to the board to give 70+GB/s? Or even better DDR5 memory sticks made by AMD for their APUs.

fun fact the 7770 (640 stream processors 1.24TFlops) and 7750 (512 stream processors at 819GFlops) have a 72GB/s memory bandwidth.

November 12, 2013 | 10:29 AM - Posted by Josh Walrath

Memory bandwidth is at a premium with integrated graphics.  We do not yet know all the details behind this APU, but it seems like AMD did spend a lot of time on the memory controller to squeeze every ounce of bandwidth out of it.  Also, hopefully low latency as well.

There was talk and some whitepapers at one time about Kaveri and GDDR-5.  They even released a specification for a DIMM design for GDDR-5 that was nearly identical to DDR-4.  It would be a great boon for both CPU and GPU performance on this platform, but getting the rest of the industry behind it was apparently troublesome.

November 12, 2013 | 11:21 AM - Posted by Principle (not verified)

Yeah, there may be an embedded and perhaps mobile variant with GDDR5 where 4GB is the normal amount of RAM and never gets upgraded. I don't think you will see it for desktops at all, and the next iteration will just use DDR4.

The bandwidth was much improved with Kaveri, and with DDR3-1600 it was able to achieve something like 17GB/s compared to a Richland APU with DDR3-2133 only getting about 12GB/s. So I cannot wait to see what DDR3-2133 does for Kaveri. I would also assume it comes standard with an 1866 controller, but haven't seen any benches on engineering samples with 1866.

November 12, 2013 | 01:35 PM - Posted by capawesome9870

here is to hoping that AMD does do DDR5 on the APUs. the faster memory the APUs get the better games play.

November 12, 2013 | 05:46 AM - Posted by ET3D (not verified)

856 GFlops is around 10% more than 779. That's not a huge difference.

November 12, 2013 | 10:31 AM - Posted by Josh Walrath

It isn't a huge difference in theoretical performance, but in real world applications it is going to be a lot more efficient.  I think perf on graphics is going to exceed that 10% difference.

November 12, 2013 | 11:17 AM - Posted by Principle (not verified)

Yes, those are theoretical numbers, the A10-6800K never actually hit that. Kaveri with Steamroller, HUMA and GCN will come closer to the actual theoretical value.

November 12, 2013 | 06:09 AM - Posted by Ploutonas (not verified)

I have a bad feeling about it, I hope it's not a bulldozer 2

I am in the process to upgrade (with 4770k now), but I will give them a chance by waiting 1 month for some reviews... Only if its true, I may buy it.

November 12, 2013 | 10:32 AM - Posted by Josh Walrath

It will perform better than current Richland/Vishera products per clock.  The question is... how much?  I don't think that AMD is going to move past Intel in terms of IPC, but it will be a pretty big improvement over the previous AMD parts.  4770K is still a really strong CPU, especially if you are going to use standalone graphics.

November 12, 2013 | 08:15 AM - Posted by Anonymous (not verified)

How many execution ports per core (integer, floating point, etc.) on the CPU? How deep is the execution pipeline (per core)? How many integer and floating point instructions can each CPU core retire per clock? Does the CPU have 4 fat cores, with no shared execution units, and a fat on-die bus between the CPU and the GPU? If so, then this could be the start of something good. Great about the Mantle support, but what about OpenCL and OpenGL support? What is AMD's commitment to continued support of OpenGL/OpenCL (as many open source software apps use OpenGL, and I hope soon OpenCL)? Will AMD help the open source community integrate Mantle support into open source applications such as Blender/GIMP/etc.? I am looking forward to PCPER's review, and I hope it reads like the best whitepapers from the Hot Chips symposium, and please include any links to any white papers and processor data sheets that you may come across in your research. Please, in the future, do more Blender benchmarks, as well as other open source graphics software benchmarks! I think AMD is on to something great with HSA, and combining CPUs with GPUs for gaming as well as graphics.

November 12, 2013 | 10:42 AM - Posted by Josh Walrath

That's a lot of questions... many of the answers are still unknown, even after fun events like Hotchips where information on Steamroller was presented.  What we do know is this...

It will have fewer shared resources than previous Vishera/Trinity processors.  The dual decode units are the really big deal here, and will more effectively feed the int pipes.  FMACs are again shared, but flow is supposedly a lot better with dual decode and retire.  The memory controller and crossbars are all new and improved, so communication should be a lot better.  AMD expects to see a big boost in IPC and multi-threaded efficiency.  Remember, with previous cores, they could really only handle one thread per clock per module due to single decode and other decisions in the pipeline.  This updated design *should* allow for two threads per clock per module, so again we will see a nice boost in multi-core efficiency.

This is very similar to what AMD did in forging ahead with 64 bit computing with AMD64 (that Intel adopted and called EM64T- crosslicensing it from AMD).  AMD has a good thing going with the HSA Foundation with some really large companies behind it in the Android/Linux world.
