The AMD Kaveri Architecture
Kaveri: AMD’s New Flagship Processor
How big is Kaveri? We already know the die size of it, but what kind of impact will it have on the marketplace? Has AMD chosen the right path by focusing on power consumption and HSA? Starting out an article with three questions in a row is a questionable tactic for any writer, but these are the things that first come to mind when considering a product the likes of Kaveri. I am hoping we can answer a few of these questions by the end of this article, but alas it seems as though the market will have the final say as to how successful this new architecture is.
AMD has been pursuing the “Future is Fusion” line for several years, but it can be argued that Kaveri is truly the first “Fusion” product that completes the overall vision for where AMD wants to go. The previous several generations of APUs were initially not all that integrated in a functional sense, but the complexity and completeness of that integration has been improved upon with each iteration. Kaveri takes this integration to the next step, and one which fulfills the promise of a truly heterogeneous computing solution. While AMD has the hardware available, we have yet to see if the software companies are willing to leverage the compute power afforded by a robust and programmable graphics unit powered by AMD’s GCN architecture.
(Editor's Note: The following two pages were written by our own Josh Walrath, discussing the technology and architecture of AMD Kaveri. Testing and performance analysis by Ryan Shrout starts on page 3.)
The first step in understanding Kaveri is taking a look at the process technology that AMD is using for this particular product. Since AMD divested itself of their manufacturing arm, they have had to rely on GLOBALFOUNDRIES to produce nearly all of their current CPUs and APUs. Bulldozer, Piledriver, Llano, Trinity, and Richland based parts were all produced on GF’s 32 nm PD-SOI process. The lower power APUs such as Brazos and Kabini have been produced by TSMC on their 40 nm and 28 nm processes respectively.
Kaveri will take a slightly different approach here. It will be produced by GLOBALFOUNDRIES, but it will forgo the SOI and utilize a bulk silicon process. 28 nm HKMG is very common around the industry, but few pure play foundries were willing to tailor their process to the direct needs of AMD and the Kaveri product. GF was able to do such a thing. APUs are a different kind of animal when it comes to fabrication, primarily because the two disparate units require different characteristics to perform at the highest efficiency. As such, compromises had to be made.
Subject: General Tech, Processors, Shows and Expos | January 10, 2014 - 08:32 AM | Scott Michaud
Tagged: Transformer Book Duet, Intel, CES 2014, CES, asus
Monday, the opening day of CES, was full of keynotes and announcements from Audi to Valve (Yahoo! was the day after). Okay, so that is probably not the complete alphabetical range, but keep reading regardless. The Intel speech had a few surprises including Gabe Newell re-announcing Steam Machines just a couple of hours after his own keynote.
Possibly the most surprising to me was the "Dual OS platforms" announcement. Frankly, I am fine with using BlueStacks for whatever little Android use that my desktop experiences. I did see a demo of the ASUS Transformer Book Duet, however, which was able to switch between Android and Windows 8.1 with the touch of a button and about 3 seconds of black screen. It seems to be more than emulation and it is pretty clearly not rebooting.
To be clear, the following is speculation (and not even confident at that). I am hypothesizing... not reporting. Unfortunately, Intel (and ASUS) have been very silent on the actual implementation as far as I can tell. Since this is clearly branded as "Android and Windows can be friends", it would not surprise me if this was a baked solution for the two platforms and maybe even special hardware.
One possibility is that hardware or software loads both operating systems into memory or a hibernation state. In this way, when the user signals their desire for a change, the former operating system is slept (or hibernated) and the processor is then pointed to the other's memory space.
Video credit: PCMag
If the above is the case then I hope popular Linux distributions can get their hands on it. Rebooting is far too annoying for me to try out alternative operating systems and virtualization is also too problematic (at least for now). If I can just suspend and switch, especially with native performance on either end, then I will definitely be willing to play around. Honestly, how expensive are RAM and storage these days?
But, if it is user-accessible, then it would be a major consideration for a future upgrade.
The other cute little announcement is Edison, a dual core PC in an SD card form factor. The hope is that this device will power wearable computing and make other devices smarter. It is based on 22nm silicon and even includes WiFi. One use case they presented was a bottle warmer which warms the milk before you even get your child.
Despite the late coverage, it was a very interesting keynote. Ars Technica still has their live blog published if you would like to skim through a play-by-play.
Follow all of our coverage of the show at http://pcper.com/ces!
Subject: Processors | January 7, 2014 - 09:52 AM | Josh Walrath
Tagged: amd, CES, 2014, Kaveri, A10 7850K, A10 7700K, APU, firepro, hsa
This year’s AMD CES event was actually more interesting than I was expecting. Most Kaveri details have been revealed over the past few months, so I was unsure what Lisa Su and the gang would go over.
This past year has been a big one for AMD. They seem to be doing a lot better than others expected them to, especially with all of the delayed product launches on the CPU side for quite a few years. This year saw the APU take a pretty prominent place in the industry with the launch of the latest generation consoles from Sony and Microsoft. AMD made inroads with mobile form factors with a variety of APUs. The HSA Foundation's membership has grown, and HSA members ship two out of every three connected, smart devices. Apple also includes FirePro graphics cards with all of their new Mac Pros.
Kaveri is of course the big news here. AMD feels that this is the best APU yet. The combination of Steamroller CPU cores, GCN graphics compute cores, HSA, hUMA, hQ, TrueAudio, Mantle support, PCI-E 3.0 support, and a configurable TDP makes for a pretty compelling product. AMD has shuffled some nomenclature about by saying that Kaveri, at the top end, comprises 12 compute cores. These include 4 Steamroller cores and 8 GCN compute clusters. Each compute cluster matches the historical definition of a core, but of course it looks quite a bit different than a traditional x86 core.
We have gone over Kaveri pretty extensively in the past. The CPU is clocked at 3.7 GHz with a 4 GHz boost. The graphics portion clocks in at 720 MHz. It can support up to DDR3-2400 memory, which is really needed to extract as much performance as possible out of this new APU. Benchmarks provided by AMD show this product to be a big jump from the previous Richland, and in these particular benchmarks it is quite a bit faster than the competing i5 4670K.
Gaming performance is also improved. This APU can run most current applications at 1080P resolutions with low to medium quality settings. Older titles can be run at 1080P with Medium to High/Extreme settings. While this processor is rated at around 867 GFLOPS, which is around 110 GFLOPS greater than the previous top end Richland, it is more efficient at delivering that theoretical performance. It looks to be a significant improvement all around.
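As a rough sanity check on that GFLOPS rating, the graphics portion's share can be estimated from the figures quoted earlier (8 GCN compute clusters at 720 MHz). The sketch below assumes 64 shader lanes per compute cluster and 2 floating-point operations per lane per clock (one fused multiply-add), which is how GCN peak throughput is usually counted. The function name is purely illustrative, and the contribution of the Steamroller cores is not modeled.

```python
def gcn_peak_gflops(compute_units, clock_mhz,
                    lanes_per_cu=64, flops_per_lane_per_clock=2):
    """Theoretical single-precision peak of a GCN GPU in GFLOPS.

    lanes_per_cu and flops_per_lane_per_clock are assumptions
    (64 lanes per compute unit, one fused multiply-add per clock).
    """
    flops_per_clock = compute_units * lanes_per_cu * flops_per_lane_per_clock
    return flops_per_clock * clock_mhz / 1000.0


if __name__ == "__main__":
    # 8 compute clusters at 720 MHz, as quoted for the top-end Kaveri part.
    print(f"GPU share of peak: {gcn_peak_gflops(8, 720):.2f} GFLOPS")
```

By this estimate, roughly 737 GFLOPS of the rated total would come from the graphics portion, with the remainder attributed to the CPU cores.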
Software support is improving with applications from companies like Adobe, The Document Foundation, and Nuance. These cover HSA applications and in Nuance’s case, using the TrueAudio portion to clean up and accelerate voice recognition. TrueAudio is also being supported in five upcoming games. This is not a huge amount, but it is a decent start for this new technology.
Mantle is gaining a lot more momentum with support from 3 engines, 5 developers, and 20+ games in development. They showed off Battlefield 4 running Mantle on a Kaveri APU for the first time publicly. They mentioned that it ran 45% faster than Direct3D at the same quality levels on the same hardware. The display showed frame rates up in the low 50 fps area.
AMD is continuing to move forward on their low power offerings based on Beema and Mullins. Lisa claims that these parts are outperforming the Intel Bay Trail offerings in both CPU performance and graphics. Unfortunately, she mentioned nothing about the power consumption associated with these results. They showed off the Discovery tablet as well as a fully functional PC that was the size of a large cellphone.
They closed out the event by talking about the Surround House 2. This demo looks significantly better than the previous iteration we saw last year. It features something like a 34.2 speaker setup in a projected dome. It is much more complex than the House from last year, but the hardware running it all is rather common: a single high end FirePro card running on a single A10 7850K. The demo is also one of the first showings of a 360 degree gesture recognition setup.
AMD has come a long way since hitting rock bottom a few years back. They continue to claw their way back to relevance, and they hope that Kaveri will help them regain a foothold in the computing market. They are certainly doing well in the graphics market, but the introduction of Kaveri should help them gain more momentum in the CPU/APU market. We have yet to test Kaveri on our own, but initial results look promising. It is a better APU, but we just don’t know how much better so far.
Subject: Processors, Mobile | January 6, 2014 - 04:43 AM | Ryan Shrout
Tagged: tegra k1, tegra, SoC, nvidia, kepler, CES 2014, CES
Update: Check out our more in-depth analysis of the Tegra K1 processor from NVIDIA.
Today during its CES 2014 press conference, NVIDIA announced the Tegra K1 SoC as the successor to the Tegra 4 processor. This new ARM-based part includes 192 Kepler-based CUDA cores, sharing the same GPU architecture as the current GeForce GTX 700-series discrete graphics cards.
NVIDIA also announced that Epic has Unreal Engine 4 up and running on the Tegra K1, bringing an entirely new class of games to mobile Android devices. We got to see some demonstrations from NVIDIA running on the K1 and I must admit the visuals were stunning. Frame rates did get a bit choppy during the subway demo of UE4 but it's still early.
As an added surprise, NVIDIA is announcing not only a version of Tegra K1 that ships with the same quad-core A15 (4+1) design as the Tegra 4 but also a version that uses two NVIDIA Denver CPU cores! Denver is NVIDIA's custom CPU design based on the ARMv8 architecture, adding 64-bit support to another ARM partner's portfolio.
Tegra K1 is offered in two pin-to-pin compatible versions - a 32-bit quad-core (4-Plus-1 ARM Cortex-A15 CPU) and a custom, NVIDIA-designed 64-bit dual Super Core CPU. This CPU (codenamed “Project Denver”) delivers very high single-thread and multi-thread performance. Both versions deliver stunning graphics and visual computing capabilities powered by the 192-core NVIDIA Kepler GPU.
NVIDIA has only had Denver back from the fab for a few days, but they are already able to showcase it running Android. It's been a long time since the initial announcement of this project and it's great to finally see a result.
Tegra K1 with quad-core A15 processor
We'll have an in-depth story on the Tegra K1 on Monday morning, 6am PST right here on PC Perspective so check back then!!
Subject: General Tech, Graphics Cards, Processors | December 19, 2013 - 09:05 AM | Scott Michaud
Tagged: Intel, haswell
In another review from around the net, Carl Nelson over at Hardcoreware tested the dual-core (4 threads) Intel Core i3-4340 based on the Haswell architecture. This processor slides into the $157 retail price point with a maximum frequency of 3.6GHz and an Intel HD 4600 iGPU clocked at 1150MHz. Obviously this is not intended as top-end performance but, of course, not everyone wants that.
Image Credit: Hardcoreware
One page which I found particularly interesting was the one which benchmarked Battlefield 4 rendering on the iGPU. The AMD A10 6790K (~$130) had slightly lower 99th percentile frame time (characteristic of higher performance) but slightly lower average frames per second (characteristic of lower performance). The graph of frame times shows that AMD is much more consistent than Intel. Perhaps the big blue needs a little Frame Rating? I would be curious to see what is causing the pretty noticeable (in the graph, at least) stutter. AMD's frame pacing seems to be very consistent, albeit this is obviously not a Crossfire scenario.
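To see how one chip can win on 99th percentile frame time yet lose on average frames per second, here is a minimal sketch. The frame-time lists are made-up illustrative numbers, not data from the Hardcoreware review, and the nearest-rank percentile is an assumption about how such a figure might be computed.

```python
import math


def average_fps(frame_times_ms):
    """Average frames per second: frames rendered over total seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percentile_frame_time(frame_times_ms, pct=99):
    """Nearest-rank percentile of the frame times, in milliseconds."""
    ordered = sorted(frame_times_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical traces: one steady, one faster on typical frames but stuttery.
steady = [35.0] * 100
spiky = [30.0] * 95 + [80.0] * 5

if __name__ == "__main__":
    for name, trace in (("steady", steady), ("spiky", spiky)):
        print(f"{name}: {average_fps(trace):.1f} fps average, "
              f"{percentile_frame_time(trace):.0f} ms at the 99th percentile")
```

The steady trace posts the lower average frame rate, but its worst frames are far shorter, which is the pattern described for the AMD part.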
If you are in the low-to-mid $100 price point be sure to check out his review. Also, of course, Kaveri should be coming next month so that is something to look out for.
Subject: General Tech, Processors | December 17, 2013 - 02:17 AM | Scott Michaud
Tagged: Intel, Haswell-EP, Broadwell-EP, Broadwell
Intel has made its way on to our news feed several times over the last few days. The ticking and the tocking seem to be back on schedule. Was Intel held back by the complexity of 14nm? Was it too difficult for them to focus on both high-performance and mobile development? Was it a mix of both?
VR-Zone, who knows how to get a hold of Intel slides, just leaked details about Broadwell-EP. This product line is predicted to replace Haswell-EP at some point in the summer of 2015 (they expect right around Intel Developer Forum). They claim it will be Intel's first 14nm Xeon processor which obviously suggests that it will not be preceded by Broadwell in the lower performance server categories.
Image Credit: VR-Zone
Broadwell-EP will have up to 18 cores per socket (Hyper-Threading allows up to 36 threads). Its top level cache, which we assume is L3, will be up to 45MB large. TDPs will be the same as Haswell-EP which range from 70W to 145W for server parts and from 70W to 160W for workstations. The current parts based on Ivy Bridge, as far as I can tell, peak at 150W and 25MB of cache. Intel will apparently allow Haswell and Broadwell to give off a little more heat than their predecessors. This could be a very good sign for performance.
VR-Zone expects that a dual-socket Broadwell-EP Xeon system could support up to 2TB of DDR4 memory. They expect close to 1 TFLOP per socket of double precision FP performance. This meets or exceeds the performance available from Kaveri including its GPU. Sure, the AMD solution will be available over a year earlier and cost a fraction of the multi-thousand-dollar server processor, but it is somewhat ridiculous to think that a CPU has the theoretical performance available to software render the equivalent of Battlefield 4's medium settings without a GPU (if the software was written with said rendering engine, which it is not... of course).
This is obviously two generations off as we have just received the much anticipated Ivy Bridge-E. Still, it is good to see that Intel is keeping themselves moving ahead and developing new top-end performance parts for enthusiasts and high-end servers.
Subject: General Tech, Processors | December 15, 2013 - 09:27 AM | Scott Michaud
Tagged: Intel, google, arm
Amazon, Facebook, and Google are three members of a fairly exclusive club. These three companies order custom server processors from Intel (and other companies). Jason Waxman of Intel was quoted by Wired, "Sometimes OEMs and end customers ask us to put a feature into the silicon and it sort of depends upon how big a deal it is and whether it has to be invisible or proprietary to a customer. We're always happy to, if we can find a way to get it into the silicon".
Now, it would seem, that Google is interested in developing their own server processors based on architecture licensed from ARM. This could be a big deal for Intel as Bloomberg believes Google accounts for a whole 4.3% of the chip giant's revenue.
Of course this probably does not mean Google will spring up a fabrication lab somewhere. That would just be nutty. It is still unclear whether they will cut in ARM design houses, such as AMD or Qualcomm, or whether they will take ARM's design and run straight to TSMC, GlobalFoundries, or IBM with it.
I am sure there would be many takers for some sizable fraction of 4.3% of Intel's revenue.
Subject: General Tech, Processors, Mobile | December 14, 2013 - 09:07 AM | Scott Michaud
Tagged: Intel, Broadwell
This leak is from China DIY and, thus, machine-translated into English from Chinese. They claim that Broadwell is coming in the second half of 2014 and will be introduced in four series:
- H will be the high performance offerings
- U and Y have very low power consumption
- M will fit mainstream performance
The high performance offerings will have up to four CPU cores, 6MB of L3 cache, support for up to 32GB of memory, and thermal rating of 47W. The leak claims that some will be configurable down to 37W which is pretty clearly its "SDP" rating. The problem, of course, is whether 47W is its actual TDP or, rather, another SDP rating. Who knows.
The H series is said to be available in either one or two chips. Both a separate PCH and CPU version will exist as well as a single-chip solution that integrates the PCH on-die.
There is basically nothing said about the M series beyond acknowledging its existence.
The U and Y series will be up to dual-core with 4MB L3 cache. The U series will have a thermal rating of 15W to 28W. The Y series will be substantially lower at 4.5W configurable down to 3.5W. No clue about which of these numbers are TDPs and which are SDPs. You can compare this to earlier reports that Haswell will reach as low as 4.5W SDP.
Hopefully we will learn more about these soon and, perhaps, get a functional timeline of Intel releases. Seriously, I think I need to sit down and draw a flowchart some day.
Subject: General Tech, Processors | December 14, 2013 - 08:08 AM | Scott Michaud
Tagged: TSMC, process node, 16nm
Taiwan Semiconductor (TSMC) is one of the few chip fabrication companies in the world (especially when you omit the memory producers, etc.). Their customers include AMD, NVIDIA, Qualcomm, and Broadcom, and even a few Intel Atom processors have come out of their lines at one point. They will take money from just about anyone who wants a chip.
According to Bit-Tech, a few customers will even have access to 16nm before the end of the year.
The catch, and of course there is one, is that production runs will be very small. We would love to see a gigantic run of new AMD or NVIDIA GPUs based on 16nm but that will not be the case (and not just because Volcanic Islands and Maxwell are both 2Xnm products). The first customers, while otherwise anonymous, will be interested in mobile systems-on-a-chip (SoCs).
On the plus side, when future 1Xnm designs come out, TSMC's production could be reasonably caught up to make a smooth launch.
Intel, the current leader in the fabrication world, targeted a slightly smaller 14nm process and has already begun producing a few odds and ends at that level. Full production has not even really started yet.
Just so you can get an idea of the complexity we are dealing with: 16nm fabrication creates details that are just ~32 atoms in width.
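The ~32 atom figure is back-of-the-envelope arithmetic, and it can be sketched as follows. This assumes a feature really is 16 nm wide and uses silicon's lattice constant of roughly 0.543 nm as a yardstick. That yardstick is my assumption, since the spacing behind the ~32 figure is not stated, and the function names are only illustrative.

```python
# Assumed yardstick: edge length of silicon's cubic unit cell, in nanometers.
SILICON_LATTICE_CONSTANT_NM = 0.543


def implied_spacing_nm(feature_nm, atom_count):
    """Spacing per atom implied by a 'feature is N atoms wide' claim."""
    return feature_nm / atom_count


def lattice_cells_across(feature_nm, cell_nm=SILICON_LATTICE_CONSTANT_NM):
    """How many silicon unit cells fit across a feature of the given width."""
    return feature_nm / cell_nm


if __name__ == "__main__":
    print(f"Implied spacing: {implied_spacing_nm(16, 32):.2f} nm per atom")
    print(f"Unit cells across 16 nm: {lattice_cells_across(16):.1f}")
```

Either way of counting lands at around thirty, so the figure holds up as an order-of-magnitude illustration.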
Subject: General Tech, Processors | December 14, 2013 - 06:55 AM | Scott Michaud
Tagged: opteron, arm, amd
The ARMv8 architecture extends the hardware platform to 64-bit. This increase is mostly useful to address massive amounts of memory but can also have other benefits for performance. I think many of us remember the excitement prior to x86-64 and the subsequent let-down when we realized that, for most applications, typical vector extensions kept up in performance especially considering the compatibility issues of the day. It needed to happen but it was a hard sell until... it was just ubiquitous.
AMD has not kept it secret that they are developing 64-bit ARM processors for data centers but, until this week, further details were scarce. Under the codename, "Seattle", these processors will be available in four and eight cores. The Opteron branding will expand beyond x86 to include these new processors. The pitch to enterprises is simple: want both ARM and x86? Why bother with two vendors!
Seattle will also support up to 128GB of ECC memory and 10 Gigabit Ethernet for dense, but power efficient, compute clusters. It will be manufactured on the 28nm process.
The majority of AMD's blog post proclaimed its commitment to software support and it is definitely true that they hold a very high status in both the Linux and Apache Foundations. ARMv8 is supported in Linux starting with kernel 3.7.
Seattle is expected to launch in the second half of 2014.