Subject: Editorial | September 18, 2015 - 05:00 PM | Josh Walrath
Tagged: Zen, raja koduri, lisa su, Jim Keller, bulldozer, amd
2012 was a significant year for AMD. Many of the top executives left and there were many new and exciting hires at the company. Lisa Su, who would eventually become President and CEO of AMD, was hired in January of that year. Rory Read seemed to be on a roll with many measures to turn around the company. He also convinced some big name folks to come back to AMD from other lucrative positions. One of these rehires was Jim Keller.
Jim Keller, breakin it down for AMD. Or doing "The Robot". Or both.
Today it was announced that Jim would be leaving AMD effective Sept. 18th. He was back at AMD for three years and in that time headed up the CPU group. He implemented massive changes that would result in the design of the upcoming Zen architecture. There was a full scale ejection of the Bulldozer concept that has powered AMD processors since the FX-8150 introduction in 2011. The current Excavator core design is set to last through 2016, with the final product being "Bristol Ridge," expected next summer. Zen will not ship until late 2016 with the first full quarter of revenue in 2017.
Jim helped to develop the K7 and K8 processors from AMD. He also was extremely influential in the creation of the X86-64 ISA that not only powers AMD’s parts, but also was adopted by Intel after their disastrous EPIC/IA64 ISA failed to go anywhere. His past also includes work at DEC on the Alpha processors and before AMD at Apple working on the A4 and A5 SOCs.
We do not know any of the details about his leaving, and perhaps never will. AMD has released an official statement that “Jim Keller is leaving AMD to pursue other opportunities, effective September 18”. Looking at Jim’s past employment, he seems to move around a bit. Perhaps he enjoys coming into a place, turning things around, implementing some new thinking, but then becomes bored with the daily routine of management, budget, and planning.
In the near future this change will not affect AMD’s roadmaps or product lineups. We still will see Bristol Ridge as the follow-up for Godavari in Summer 2016 and the late 2016 introduction of Zen. What can be said beyond that is hard to quantify. There are a lot of smart and talented people still working at AMD and perhaps this allows someone there to step up and introduce the next generation of architectures and thinking at AMD. Everybody likes the idea of a rockstar designer coming in to shake things up, but time moves on and new people become those rockstars.
We wish Jim well on his new journey and hope that this is not a harbinger of things to come for AMD. Consumers need the competition that AMD brings to the table and we certainly hope we see them continue to release new products and stay on a schedule that will benefit both them and consumers. Perhaps he will join fellow veteran Glenn Henry at VIA/Centaur and produce the next, great X86-64 chip. Perhaps not.
Digging into a specific market
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. This is AMD's sixth-generation APU, counting from Llano's release in June 2011. For this launch, Carrizo is targeting the 15W and 35W power envelopes for $400-$700 USD notebook devices. AMD needed to increase efficiency on the same 28nm process that we have seen in their product stack since Kabini and Temash were released in May of 2013. They tasked their engineers to optimize their APU's design for these constraints, which led to dense architectures and clever features on the same budget of complexity, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was a quite clever design that didn't reach its potential because of external factors. As I said at the start of this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, two cores share a single floating point (decimal) unit. While you need some floating point capacity, upcoming workloads could use the GPU for a massive increase in performance, which is right there on the same die. As such, the complexity that is dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.
Subject: Processors | April 27, 2015 - 10:06 PM | Josh Walrath
Tagged: Zen, Steamroller, Kaveri, k12, Excavator, carrizo, bulldozer, amd
There is some pretty breathless analysis of a single leaked block diagram that is supposedly from AMD. This is one of the first indications of what the Zen architecture looks like from a CPU core standpoint. The block diagram is very simple, but looks to be in the same style as what we have seen from AMD. There are some labels, but this is almost a 50,000 foot view of the architecture rather than a slightly clearer 10,000 foot view.
There are a few things we know for sure about Zen. It is a clean sheet design that moves away from what AMD was pursuing with their Bulldozer family of cores. Zen gives up CMT for SMT support for handling more threads. The design has a cluster of four cores sharing 8 MB of L3 cache, with each core having access to 512 KB of L2 cache. There is a lot of optimism that AMD can kick the trend of falling more and more behind Intel every year with this particular design. Jim Keller is viewed very positively due to his work at AMD in the K7 through K8 days, as well as what he accomplished at Apple with their ARM based offerings.
One of the first sites to pick up this diagram wrote quite a bit about what they saw. There was a lot of talk about, “right off the bat just by looking at the block diagram we can tell that Zen will have substantially higher single threaded performance compared to Excavator and the Bulldozer family.” There was the assumption that because it had two 256-bit FMACs that it could fuse them to create a single 512 bit AVX product.
These assumptions are pretty silly. This is a very simple block diagram that answers very few of the important questions about the architecture. Yes, it shows 6 int pipelines, but we don’t know how many are address generation vs. execution units. We don’t know how wide decode is. We don’t know latency to L2 cache, much less how L3 is connected and shared out. So just because we see more integer pipelines per core does not automatically mean, “Da, more is better, strong like tractor!” We don’t know what improvements or simplifications we will see in the schedulers. There is no mention of the front-end other than Fetch and Decode. How about Branch Prediction? What is the latency for the memory controller when addressing external memory?
Essentially, this looks like a simplified way of expressing to analysts that AMD is attempting to retain their per core integer performance while boosting floating point/AVX at a similar level. Other than that, there is very little that can be gleaned from this simple block diagram.
Other interesting leaks concerning Zen cover the formats in which we will see these products integrated. One leak detailed an HPC-aimed APU that features 16 Zen cores with 32 MB of L3 cache attached to a very large GPU. Another leak detailed a server level chip that will support 32 cores and will be seen in 2P systems. Zen certainly appears to be very flexible, and in some ways it reminds me of a much beefier Jaguar type CPU. My gut feeling is that AMD will get closer to Intel than it has been in years, and perhaps they can catch Intel by surprise with a few extra features. The reality of the situation is that AMD is far behind and only now are we seeing pure-play foundries start to get even close to Intel in terms of process technology. AMD is very much at a disadvantage here.
Still, the company needs to release new, competitive products that will refill the company coffers. The previous quarter’s loss has dug into cash reserves, but AMD is still stable in terms of cash on hand and long term debt. 2015 will see new GPUs, an APU refresh, and the release of the new Carrizo parts. 2016 looks to be the make or break year with Zen and K12.
Edit 2015-04-28: Thanks to SH STON we have a new slide that has been leaked from the same deck as this one. This has some interesting info in that AMD may be going away from exclusive cache designs. Exclusive was a good idea when cache was small and expensive, as data was not replicated through each level of cache (L1 was not replicated in L2 and L2 was not replicated in L3). Intel has been using inclusive cache since forever, where data is replicated and simpler to handle. Now it looks like AMD is moving towards inclusive. This is not necessarily a bad thing as the 512 KB of L2 can easily handle what looks to be 128 KB of L1 and the shared 8 MB of L3 cache can easily handle the 2 MB of L2 data. Here is the link to that slide.
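The capacity argument in that edit can be checked with a quick back-of-the-envelope sketch. This is only an illustration: the cache sizes are the ones that appear on the leaked slides, not confirmed specifications, and the function and variable names are made up for this example. The idea is that an inclusive outer cache spends part of its space mirroring the inner level, while an exclusive arrangement lets the two sizes add up.

```python
# Illustrative only: sizes in KB, taken from the leaked slides discussed above.
CORES_PER_CLUSTER = 4
L2_PER_CORE_KB = 512
L3_SHARED_KB = 8 * 1024


def unique_capacity_kb(inner_kb: int, outer_kb: int, inclusive: bool) -> int:
    """Distinct data that two adjacent cache levels can hold together.

    Inclusive: the outer level duplicates the inner level, so only the
    outer size counts. Exclusive: no duplication, so the sizes add up.
    """
    if inclusive:
        if inner_kb > outer_kb:
            raise ValueError("inclusive outer level must be at least as large as the inner level")
        return outer_kb
    return inner_kb + outer_kb


total_l2_kb = CORES_PER_CLUSTER * L2_PER_CORE_KB  # all four cores' L2 combined

inclusive_kb = unique_capacity_kb(total_l2_kb, L3_SHARED_KB, inclusive=True)
exclusive_kb = unique_capacity_kb(total_l2_kb, L3_SHARED_KB, inclusive=False)

print(f"Total L2 per cluster: {total_l2_kb // 1024} MB")
print(f"L2+L3 unique capacity, inclusive: {inclusive_kb // 1024} MB")
print(f"L2+L3 unique capacity, exclusive: {exclusive_kb // 1024} MB")
print(f"Share of L3 spent mirroring L2: {total_l2_kb / L3_SHARED_KB:.0%}")
```

With these numbers, going inclusive gives up 2 MB of unique capacity out of a possible 10 MB, which is the sense in which the larger caches can "easily handle" the duplication.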
The new slide in question.
Subject: Cases and Cooling | January 2, 2014 - 08:42 PM | Jeremy Hellstrom
Tagged: sharkoon, bulldozer
If subtle just isn't your thing then the Sharkoon Bulldozer might be a good case for you. At 480 x 235 x 460mm (18.9 x 9.2 x 18.1") this is a large case and the numerous LED lights will make this case stand out even more, not to mention the unique paint job on the interior. There are 10 drive bays which can be modified for 3.5" or 2.5" drives with one bay removable to give your GPU some extra space. eTeknix liked the bottom mounted PSU and cable runs which make for a clean looking system but you will really have to like the exterior look if you consider buying this case.
"Sharkoon are not what I would call most people’s first choice when it comes to picking a new chassis, at least not here in the UK. However, we’ve seen a couple of Sharkoon products in the eTeknix office over the last couple of years that really impressed us, not only for being great cases, but also because they offered great value for money. This has left us eager to see more from Sharkoon and today we will be taking a look at their new Bulldozer chassis, a budget friendly ATX chassis that is available in a choice of three colours. Blue, Green or Red LED edition are available, with the blue model coming in a grey chassis, while the green and red LED models come in a black chassis."
Here are some more Cases & Cooling reviews from around the web:
- Cooler Master Elite 130 @ Benchmark Reviews
- Antec P100 Case @ Rbmods
- NZXT Phantom 530 Full Tower Case Review @HiTech Legion
- BitFenix Ronin Midi-Tower Computer Case Review @ Madshrimps
- Noctua NH-U14S 140mm Tower CPU Cooler @ Benchmark Reviews
- Noctua NH-U12S CPU Cooler @ Benchmark Reviews
Subject: Processors | April 30, 2013 - 06:04 PM | Josh Walrath
Tagged: amd, FX, vishera, bulldozer, FX-6350, FX-4350, FX-6300, FX-4300, 32 nm, SOI, Beloved
Today AMD has released two new processors that address the AM3+ market. The FX-6350 and FX-4350 are two new refreshes of the quad and hex core lineup of processors. Currently the FX-8350 is still the fastest of the breed, and there is no update for that particular number yet. This is not necessarily a bad thing, but there are those of us who are still awaiting the arrival of the rumored “Centurion”.
These parts are 125 watt TDP units, which are up from their 95 watt predecessors. The FX-6350 runs at 3.9 GHz with a 4.2 GHz boost clock. This is up 300 MHz stock and 100 MHz boost from the previous 95 watt FX-6300. The FX-4350 runs at 3.9 GHz with a 4.3 GHz boost clock. This is 100 MHz stock and 300 MHz boost above that of the FX-4300. What is of greater interest here is that the L3 cache goes from 4 MB on the 4300 to 8 MB on the 4350. This little fact looks to be the reason why the FX-4350 is now a 125 watt TDP part.
It has been some two years since AMD started shipping 32 nm PD-SOI/HKMG products to the market, and it certainly seems as though spinning off GLOBALFOUNDRIES has essentially stopped the push to implement new features into a process node throughout the years. As many may remember, AMD was somewhat famous for injecting new process technology into current nodes to improve performance, yields, and power characteristics in “baby steps” type fashion instead of leaving the node as is and making a huge jump with the next node. Vishera has been out for some 7 months now and we have not really seen any major improvement in regards to performance and power characteristics. I am sure that yields and bins have improved, but the bottom line is that this is only a minor refresh and AMD raised TDPs to 125 watts for these particular parts.
The FX-6350 is again a three module part containing six cores. Each module features 2 MB of L2 cache for a total of 6 MB L2 and the entire chip features 8 MB of L3 cache. The FX-4350 is a two module chip with four cores. The modules again feature the same 2 MB of L2 cache for a total of 4 MB active on the chip with the above mentioned 8 MB of L3 cache that is double what the FX-4300 featured.
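For anyone who wants to double check the cache math, here is a minimal sketch that derives the totals from the module count. The per-module figures and L3 sizes are the ones quoted in this post; the helper name is hypothetical, and the FX-4300 is assumed to be a two module part like the FX-4350.

```python
# Per-module figures as stated in the post; the helper name is made up.
CORES_PER_MODULE = 2
L2_PER_MODULE_MB = 2


def describe_chip(modules: int, l3_mb: int) -> dict:
    """Derive core count and cache totals from the module count."""
    return {
        "cores": modules * CORES_PER_MODULE,
        "l2_mb": modules * L2_PER_MODULE_MB,
        "l3_mb": l3_mb,
    }


fx_6350 = describe_chip(modules=3, l3_mb=8)
fx_4350 = describe_chip(modules=2, l3_mb=8)
fx_4300 = describe_chip(modules=2, l3_mb=4)  # assumption: two modules, L3 size per the post

for name, chip in (("FX-6350", fx_6350), ("FX-4350", fx_4350), ("FX-4300", fx_4300)):
    print(name, chip)
```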
Perhaps soon we will see updates on FM2 with the Richland series of desktop processors, but for now this refresh is all AMD has. These are nice upgrades to the line. The FX-6350 does cost the same as the FX-6300, but the thinking behind that is that the 6300 is more “energy efficient”. We have seen in the past that AMD (and Intel for that matter) does put a premium on lower wattage parts in a lineup. The FX-4350 is $10 more expensive than the 4300. It looks as though the FX-6350 is in stock at multiple outlets but the 4350 has yet to show up.
These will fit in any modern AM3+ motherboard with the latest BIOS installed. While not an incredibly exciting release from AMD, it at least shows that they continue to address their primary markets. AMD is in a very interesting place, and it looks like Rory Read is busy getting the house in order. Now we just have to see if they can curb their cost structure enough to make the company more financially stable. Indications are good so far, but AMD has a long ways to go. But hey, at least according to AMD the FX series is beloved!
Subject: General Tech | April 30, 2013 - 05:23 PM | Jeremy Hellstrom
Tagged: Steamroller, piledriver, Kaveri, Kabini, hUMA, hsa, GCN, bulldozer, APU, amd
AMD may have united GPU and CPU into the APU but one hurdle had remained until now: the non-uniformity of memory access between the two processors. Today we learned about one of the first successful HSA projects called Heterogeneous Uniform Memory Access, aka hUMA, which will appear in the upcoming Kaveri chip family. The use of this new technology will allow the on-die CPU and GPU to access the same memory pool, both physical and virtual, and any data passed between the two processors will remain coherent. As The Tech Report mentions in their overview, hUMA will not provide as much of a benefit to discrete GPUs; while they will be able to share address space, the widely differing clock speeds between GDDR5 and DDR3 prevent unification to the level of an APU.
Make sure to read Josh's take as well so you can keep up with him on the Podcast.
"At the Fusion Developer Summit last June, AMD CTO Mark Papermaster teased Kaveri, AMD's next-generation APU due later this year. Among other things, Papermaster revealed that Kaveri will be based on the Steamroller architecture and that it will be the first AMD APU with fully shared memory.
Last week, AMD shed some more light on Kaveri's uniform memory architecture, which now has a snazzy marketing name: heterogeneous uniform memory access, or hUMA for short."
Here is some more Tech News from around the web:
- AMD’s new heterogeneous Uniform Memory Access
- hUMA; AMD’s Heterogeneous Unified Memory Architecture @ Hardware Canucks
- Compro TN50W Cloud Network Camera @ Tweaktown
- Wifi Pineapple project uses updated hardware for man-in-the-middle attacks @ Hack a Day
- New OpenWRT Drops Support For Linux 2.4, Low-Mem Devices @ Slashdot
- HP mashes up ProLiant, Integrity, BladeSystem, and Moonshot server @ The Register
- Acer selling tablet using Intel Y series processor @ The Register
- CERN Celebrates 20 Years of an Open Web (and Rebuilds 1st Web Page) @ Slashdot
- BitFenix 5K YouTube Subscriber Giveaway @ eTeknix
heterogeneous Uniform Memory Access
Several years back we first heard of AMD’s plans to create a uniform memory architecture which will allow the CPU to share address spaces with the GPU. The promise here is to create a very efficient architecture that will provide excellent performance in a mixed environment of serial and parallel programming loads. When GPU computing came on the scene it was full of great promise. The idea of a heavily parallel processing unit that will accelerate both integer and floating point workloads could be a potential gold mine in a wide variety of applications. Alas, the results we have seen so far have not lived up to that promise. There are many problems with combining serial and parallel workloads between CPUs and GPUs, and a lot of this has to do with very basic programming and the communication of data between two separate memory pools.
CPUs and GPUs do not share common memory pools. Instead of using pointers in programming to tell each individual unit where data is stored in memory, the current implementation of GPU computing requires the CPU to write the contents of that address to the standalone memory pool of the GPU. This is time consuming and wastes cycles. It also increases programming complexity to be able to adjust to such situations. Typically only very advanced programmers with a lot of expertise in this subject could program effective operations to take these limitations into consideration. The lack of unified memory between CPU and GPU has hindered the adoption of the technology for a lot of applications which could potentially use the massively parallel processing capabilities of a GPU.
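To make the difference concrete, here is a rough analogy in plain Python. It does not touch a real GPU or any AMD or HSA API; ordinary byte buffers stand in for the two memory pools, and all of the function names are hypothetical. The first path stages the data into a separate "device" buffer and copies the result back, while the second simply hands over a reference to the same memory, which is the kind of pointer passing a shared address space is meant to allow.

```python
# Analogy only: plain Python buffers stand in for CPU and GPU memory.
# Nothing here talks to a real GPU or to any AMD/HSA API.

def double_in_place(buf: memoryview) -> None:
    """Stand-in for a 'GPU kernel': doubles every byte of the buffer it is given."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def run_with_separate_pools(cpu_data: bytearray) -> int:
    """Copy-in / copy-out model; returns the number of bytes copied."""
    gpu_pool = bytearray(cpu_data)          # copy host -> device
    double_in_place(memoryview(gpu_pool))   # work on the device copy
    cpu_data[:] = gpu_pool                  # copy device -> host
    return 2 * len(cpu_data)


def run_with_shared_pool(cpu_data: bytearray) -> int:
    """Shared-address-space model: pass a reference, copy nothing."""
    double_in_place(memoryview(cpu_data))   # same memory, no staging copy
    return 0


a = bytearray([1, 2, 3, 4])
b = bytearray([1, 2, 3, 4])
copied_separate = run_with_separate_pools(a)
copied_shared = run_with_shared_pool(b)
print(list(a), "bytes copied:", copied_separate)
print(list(b), "bytes copied:", copied_shared)
```

Both paths end with the same data, but the first one moves every byte twice. On real hardware that traffic also crosses a bus and has to be managed by the programmer, which is the overhead and complexity described above.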
The idea for GPU compute has been around for a long time (comparatively). I still remember getting very excited about the idea of using a high end video card along with a card like the old GeForce 6600 GT to be a coprocessor which would handle heavy math operations and PhysX. That particular plan never quite came to fruition, but the idea was planted years before the actual introduction of modern DX9/10/11 hardware. It seems as if this step with hUMA could actually provide a great amount of impetus to implement a wide range of applications which can actively utilize the GPU portion of an APU.
Subject: Graphics Cards, Processors | January 23, 2013 - 07:42 PM | Ryan Shrout
Tagged: southern islands, sony, ps4, playstation 4, orbis, Kaveri, bulldozer, APU, amd
Earlier today a report from Kotaku.com posted some details about the upcoming PlayStation console, code named Orbis and sometimes just called the PS4. Kotaku author Luke Plunkett got the information from a 90 page PDF that details the development kit so the information is likely pretty accurate if incomplete. It discusses a new controller and a completely new accounts system but I was mostly interested in the hardware details given.
We'll begin with the specs. And before we go any further, know that these are current specs for a PS4 development kit, not the final retail console itself. So while the general gist of the things you see here may be similar to what makes it into the actual commercial hardware, there's every chance some—if not all of it—changes, if only slightly.
This is key to keep in mind because here are the specs listed on the report:
- 8GB of system memory
- 2.2GB of graphics memory
- 4 module (8 core) AMD Bulldozer CPU
- AMD "R10xx" based GPU
- 4x USB 3.0 ports and 2x Ethernet connections
- Blu-ray drive
- 160GB HDD
- HDMI and optical audio output
We are essentially talking about an AMD FX-series processor with a Southern Islands based discrete card and I am nearly 100% sure that this will not match the configuration of the shipping system. Think about it - would a console developer really want to have a processor that can draw more than 100 watts inside its box in addition to a discrete GPU? I doubt it.
Instead, let's go with the idea that this developer kit is simply meant to emulate some final specifications. More than likely we are looking at an APU solution that combines Bulldozer or Steamroller cores along with GCN-based GPU SIMD arrays. The most likely candidate is Kaveri, a 28nm based product that meets both of those requirements. Josh recently discussed the future with Kaveri in a post during CES, worth checking out. AMD has told us several times that Kaveri should be able to hit the 1.0 TFLOPs level of performance and if we compare to the current discrete GPUs would enable graphics performance similar to that of an under-clocked Radeon HD 7770.
There is some room for doubt though - Kaveri isn't supposed to be out until "late Q4", although it's possible that the PS4 will be the first customer. It is also possible that AMD is making a specific discrete GPU for implementation on the PS4 based on the GCN architecture that would be faster than the graphics performance expected on the Kaveri APU.
When speaking with our own Josh Walrath on this rumor, he tended to think that Sony and AMD would not use an APU but would rather combine a separate CPU and GPU on a single substrate, allowing for better yields than a combined APU part. In order to make up for the slower memory controller interface (on substrate is not as fast as on-die) AMD might again utilize backside cache, just like the one used on the Xbox 360 today. With process technology improvements it's not unthinkable to see that jump to 30 or 40MB of cache.
With the debate of a 2013 or 2014 release still up in the air, there is plenty of time for this to change still but we will likely know for sure after our next trip to Taipei.
Subject: Processors | October 23, 2012 - 06:44 PM | Jeremy Hellstrom
Tagged: vishera, Steamroller, piledriver, FX-8350, fx-8150, FX-6300, FX-6200, bulldozer, amd
The FX-8350 Vishera processor from AMD has finally arrived with 8 fully unlocked cores of polished Piledriver processing power. With Piledriver there are no huge changes to the existing Bulldozer architecture; this is more a matter of polishing and optimizing what was already there, and [H]ard|OCP's testing bears that out. While faster than the previous generation FX-8150 it still lags behind Intel's Ivy Bridge processors, disappointing but certainly expected. The unlocked cores do lend themselves somewhat to overclocking, with [H] hitting a stable 4.6GHz with all cores enabled, a 10% jump in frequency. At that speed it does better when competing with Intel's offerings, until you overclock them as well at which point the comparative performance suffers somewhat.
Make sure to catch Josh's review, covering both the 8 core FX-8350 and the $132 FX-6300 which has a disabled module, bringing back memories of older AMD chips whose modules could be brought back to life.
"AMD's new Piledriver core technology should not be a surprise to any enthusiast as much of its "embargoed" information has already been exposed on the Net. Today we take the AMD FX series model 8350 desktop variant, code named Vishera, and look at it in an enthusiast way as we expose its IPC at 4GHz, and a bit of overclocking."
Here are some more Processor articles from around the web:
- AMD's FX-8350 processor @ The Tech Report
- AMD FX-8350 "Vishera" Linux Benchmarks @ Phoronix
- AMD FX-8350 8-Core Black Edition Processor Review @ Legit Reviews
- AMD Vishera FX-8350 Review @ OCC
- The Vishera Review: AMD FX-8350, FX-8320, FX-6300 and FX-4300 Tested @ AnandTech
- AMD FX-8350: Piledriver @ Bjorn3D
- AMD FX-8350 @ Overclockers.com
- AMD FX-8350 vs Intel Core i7-3770K @ 4.8GHz - Multi-GPU Gaming Performance @ VR-Zone
- FX-8350 vs. Core i5-3470 CPU Review @ Hardware Secrets
- AMD FX-8350 (AM3+) Piledriver Processor Review @ eTeknix
- AMD FX-8350 Unlocked "Vishera" Octal Core CPU Review @ Hi Tech Legion
- AMD FX-8350 Vishera Desktop Processor @ Benchmark Reviews
- AMD FX-8350 and FX-6300 @ Legion Hardware
- AMD Piledriver FX Review - FX 8350, 8320, 6300 vs Intel Core i5 and i3 @ hardCOREware
- AMD FX-8350 Processor Review @ HardwareHeaven
- AMD FX-8350 and FX-6300 Piledriver @ TechSpot
- FX-8350 CPU Review; AMD's Vishera Arrives @ Hardware Canucks
- AMD FX8350 BE / Gigabyte HD7970 / ASUS Sabretooth 990FX R2 @ Kitguru
- AMD FX 8350 @ Guru of 3D
- AMD FX-8350 - "Piledriver" for AMD Socket AM3+ @ techPowerUp
Bulldozer to Vishera
Bulldozer is the word. Ok, perhaps it is not “the” word, but it is “a” word. When AMD let that little codename slip some years back, AMD enthusiasts and tech journalists started to salivate about the possibilities. Here was a unique and very new architecture that promised excellent single thread performance and outstanding multi-threaded performance all in a package that was easy to swallow and digest. Probiotics for the PC. Some could argue that the end product for Bulldozer and probiotics are the same, but I am not overly fond of writing articles containing four letter colorful metaphors.
The long and short of Bulldozer is that it was a product that was pushed out too fast, it had specifications that were too aggressive for the time, and it never delivered on the promise of the architecture. Logically there are some very good reasons behind the architecture, but implementing these ideas into a successful product is another story altogether. The chip was never able to reach the GHz range it was supposed to and stay within reasonable TDP limits. To get the chip out in a timely manner, timings had to be loosened internally so the chip could even run. Performance per clock was pretty dismal, and the top end FX-8150 was only marginally faster than the previous top end Phenom II X6 1100T. In some cases, the X6 was still faster and a more competent “all around” processor.
There really was not a whole lot for AMD to do about the situation. It had to have a new product, and it just did not turn out as nicely as they had hoped. The reasons for this are legion, but simply put AMD is competing with a company that is over ten times the size, with the resulting R&D budgets that such a size (and margins) can afford. Engineers looking for work are a dime a dozen, and Intel can hire as many as they need. So, instead of respinning Bulldozer ad nauseam and releasing new speed grades throughout the year by tweaking the process and metal layer design, AMD let the product line sit and stagnate at the top end for a year (though they did release higher TDP models based on the dual module FX-4000 and triple module FX-6000 series). Engineers were pushed into more forward looking projects. One of these is Vishera.