Subject: Processors | April 5, 2016 - 06:30 AM | Josh Walrath
Tagged: mobile, hp, GCN, envy, ddr4, carrizo, Bristol Ridge, APU, amd, AM4
Today AMD is “pre-announcing” their latest 7th generation APU. Codenamed “Bristol Ridge”, this new SOC is based off of the Excavator architecture featured in the previous Carrizo series of products. AMD provided few details about what is new and different in Bristol Ridge as compared to Carrizo, but they have provided a few nice hints.
They were able to provide a die shot of the new Bristol Ridge APU, and at first glance there appear to be some interesting differences between it and the previous Carrizo. Unfortunately, there really are no changes that we can see from this shot. Those new functional units that you are tempted to speculate about? For some reason AMD decided to widen out the shot of this die. Those extra units around the border? They are the adjacent dies on the wafer. I was bamboozled at first, but happily Marc Sauter pointed it out to me. No new functional units for you!
This is the Carrizo shot. It is functionally identical to what we see with Bristol Ridge.
AMD appears to be using the same 28 nm HKMG process from GLOBALFOUNDRIES. This is not going to give AMD much of a jump, but from information in the industry GLOBALFOUNDRIES and others have put an impressive amount of work into several generations of 28 nm products. TSMC is on their third iteration which has improved power and clock capabilities on that node. GLOBALFOUNDRIES has continued to improve their particular process and likely Bristol Ridge is going to be the last APU built on that node.
All of the competing chips are rated at 15 watts TDP. Intel has the compute advantage, but AMD is cleaning up when it comes to graphics.
The company has also continued to improve upon their power gating and clocking technologies to keep TDPs low, yet performance high. AMD recently released the Godavari APUs to the market, which exhibit better clocking and power characteristics than the previous Kaveri. Little was done on the actual design; rather, it was improved process tech as well as better clock control algorithms that achieved these advances. It appears as though AMD has continued this trend with Bristol Ridge.
We likely are not seeing per clock increases, but rather higher and longer sustained clockspeeds providing the performance boost that we are seeing between Carrizo and Bristol Ridge. In these benchmarks AMD is using 15 watt TDP products. These are mobile chips and any power improvements will show off significant gains in overall performance. Bristol Ridge is still a native quad core part with what looks to be an 8 module GCN unit.
Again with all three products at a 15 watt TDP we can see that AMD is squeezing every bit of performance it can with the 28 nm process and their Excavator based design.
The basic core and GPU design look relatively unchanged, but obviously there were a lot of tweaks applied to give the better performance at comparable TDPs.
AMD is announcing this along with the first product that will feature this APU, the HP Envy X360. This convertible tablet offers some very nice features and looks to be one of the better implementations that AMD has seen using its latest APUs. Carrizo had some wins, but taking marketshare back from Intel in the mobile space has been tortuous at best. AMD obviously hopes that Bristol Ridge in the sub-35 watt range will continue to show fight for the company in this important market. Perhaps one of the more interesting features is the option for the PCIe SSD. Hopefully AMD will send out a few samples so we can see what a more “premium” type convertible can do with the AMD silicon.
The HP Envy X360 convertible in all of its glory.
Bristol Ridge will be coming to the AM4 socket infrastructure in what appears to be a Computex timeframe. These parts will of course feature higher TDPs than what we are seeing here with the 15 watt unit that was tested. It seems at that time AMD will announce the full lineup from top to bottom and start seeding the market with AM4 boards that will eventually house the “Zen” CPUs that will show up in late 2016.
Fighting for Relevance
AMD is still kicking. While the results of this past year have been forgettable, they have overcome some significant hurdles and look like they are improving their position in terms of cutting costs while extracting as much revenue as possible. There were plenty of ups and downs for this past quarter, but when compared to the rest of 2015 there were some solid steps forward here.
The company reported revenues of $958 million, which is down from $1.06 billion last quarter. The company also recorded a $103 million loss, but that is down significantly from the $197 million loss the quarter before. Q3 did have a $65 million write-down due to unsold inventory. Though the company made far less in revenues, they also shored up their losses. The company is still bleeding, but they still have plenty of cash on hand for the next several quarters to survive. When we talk about non-GAAP figures, AMD reports a $79 million loss for this past quarter.
For the entire year AMD recorded $3.99 billion in revenue with a net loss of $660 million. This compares to FY 2014 revenues of $5.51 billion and a net loss of $403 million. AMD certainly is trending downwards year over year, but they are hoping to reverse that come 2H 2016.
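To put those figures side by side, here is a minimal sketch that computes the quarter-over-quarter and year-over-year changes from the numbers quoted above. The function and variable names are purely illustrative, and the inputs are the rounded figures from the text (in millions of USD), so the percentages are approximate.

```python
def pct_change(old, new):
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


# Rounded figures in millions of USD, taken from the text above.
# Each tuple is (earlier period, later period).
QUARTERLY = {
    "revenue": (1060, 958),   # previous quarter, this quarter
    "net loss": (197, 103),
}
ANNUAL = {
    "revenue": (5510, 3990),  # FY 2014, FY 2015
    "net loss": (403, 660),
}


def summarize(label, figures):
    """Return one formatted line per metric, with a signed percentage."""
    return [
        f"{label} {name}: {pct_change(old, new):+.1f}%"
        for name, (old, new) in figures.items()
    ]


if __name__ == "__main__":
    for line in summarize("Q/Q", QUARTERLY) + summarize("Y/Y", ANNUAL):
        print(line)
```

A negative change in "net loss" means the loss shrank, which is the case for the quarter, while the positive full-year figure reflects the larger annual loss.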
Graphics continues to be solid for AMD as they increased their sales from last quarter, but are down year on year. Holiday sales were brisk, but with only the high end Fury series being a new card during this season, the impact of that particular part was not as great as compared to the company having a new mid-range series like the newly introduced R9 380X. The second half of 2016 will see the introduction of the Polaris based GPUs for both mobile and desktop applications. Until then, AMD will continue to provide the current 28 nm lineup of GPUs to the market. At this point we are under the assumption that AMD and NVIDIA are looking at the same timeframe for introducing their next generation parts due to process technology advances. AMD already has working samples on Samsung’s/GLOBALFOUNDRIES 14nm LPP (low power plus) that they showed off at CES 2016.
Subject: Processors | January 11, 2016 - 06:26 PM | Sebastian Peak
Tagged: rumor, report, FM2+, carrizo, Athlon X4, amd
According to a report published by CPU World, a pair of unreleased AMD Athlon X4 processors appeared in a supported CPU list on Gigabyte's website (since removed) long enough to give away some information about these new FM2+ models.
Image credit: CPU World
The CPUs in question are the Athlon X4 835 and Athlon X4 845, 65W quad-core parts that are both based on AMD's Excavator core, according to CPU World. The part numbers are AD835XACI43KA and AD845XACI43KA, which the CPU World report interprets:
"The 'I43' letters and digits in the part number signify Socket FM2+, 4 CPU cores, and 1 MB L2 cache per module, or 2MB in total. The last two letters 'KA' confirm that the CPUs are based on Carrizo design."
The report further states that the Athlon X4 835 will operate at 3.1 GHz, with 3.5 GHz for the X4 845. No Turbo Core frequency information is known for these parts.
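For the curious, the part-number reading quoted from CPU World can be sketched as a small decoder. This is only an illustration: the field positions, the lookup tables, and the function name are assumptions built from the two part numbers and the one quoted sentence, not an official AMD decoding scheme. In particular, mapping "I", "4", and "3" one-to-one onto socket, core count, and L2 size is a guess at how the report's reading breaks down, and the three characters after the model digits are left undecoded because the report does not explain them.

```python
import re

# Assumed layout: "AD" + 3 model digits + 3 unexplained letters
# + socket letter + core digit + cache digit + 2-letter design code.
PATTERN = re.compile(
    r"^AD(?P<model>\d{3})(?P<unexplained>[A-Z]{3})"
    r"(?P<socket>[A-Z])(?P<cores>\d)(?P<cache>\d)(?P<design>[A-Z]{2})$"
)

# Lookup tables hold only what the quoted report states (hypothetical mapping).
SOCKETS = {"I": "Socket FM2+"}
CORE_COUNTS = {"4": 4}
L2_PER_MODULE_MB = {"3": 1}
DESIGNS = {"KA": "Carrizo"}


def decode_part_number(part_number):
    """Decode a part number using the interpretation quoted above."""
    match = PATTERN.match(part_number)
    if match is None:
        raise ValueError(f"unrecognized part number: {part_number!r}")
    fields = match.groupdict()
    return {
        "model": fields["model"],
        "socket": SOCKETS.get(fields["socket"], "unknown"),
        "cores": CORE_COUNTS.get(fields["cores"]),
        "l2_per_module_mb": L2_PER_MODULE_MB.get(fields["cache"]),
        "design": DESIGNS.get(fields["design"], "unknown"),
    }


if __name__ == "__main__":
    for pn in ("AD835XACI43KA", "AD845XACI43KA"):
        print(pn, decode_part_number(pn))
```

Any code not covered by the report falls through to "unknown" or None rather than being guessed.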
Subject: General Tech | October 22, 2015 - 02:38 PM | Jeremy Hellstrom
Tagged: Merlin Falcon, Excavator, carrizo, amd
On your latest flight you may have noticed some branding on the displays powering the schedules and in-flight entertainment, or perhaps if you were flying to Vegas you didn't notice it until you were playing the slots. If you were paying attention you would have noticed that the display was powered by AMD, as are many POS, medical and even military displays. A new series of Excavator based processors was announced today, the Merlin Falcon which has four Excavator cores, a Radeon third-gen GCN GPU and support for both DDR3 and DDR4 RAM.
Yes that is right, the first DDR4 chip from AMD is arriving but you won't be running it in your desktop. You should probably be jealous as this processor will have HSA 1.0, hardware based HEVC/H.265 video decode, DirectX 12 support and even the ARM co-processor that provides AMD's new Secure Processor feature. There is more at The Register if you follow the link.
"AMD will today unveil Merlin Falcon, its latest R-series processor aimed at industrial systems, medical devices, gambling machines, digital signs, military hardware, and so on."
Here is some more Tech News from around the web:
- SanDisk, Toshiba to jointly make 3D flash memory @ DigiTimes
- Michael Dell berates Microsoft's Nadella about high price of Surface tablet @ The Inquirer
- Square Enix To Concentrate On Remaking Their Back Catalog @ Slashdot
- Marvell, Longsys partner to make SSDs @ DigiTimes
- IoT's sub-GHz 802.11ah Wi-Fi will be dead on arrival, warn analysts @ The Register
- Amazon Fire TV Review @ Hardware Secrets
Subject: Graphics Cards, Processors | August 30, 2015 - 09:14 PM | Scott Michaud
Tagged: amd, carrizo, Fiji, opencl, opencl 2.0
Apart from manufacturers with a heavy first-party focus, such as Apple and Nintendo, hardware is useless without developer support. In this case, AMD has updated their App SDK to include support for OpenCL 2.0, with code samples. It also updates the SDK for Windows 10, Carrizo, and Fiji, but it is not entirely clear how.
That said, OpenCL is important to those two products. Fiji has a very high compute throughput compared to any other GPU at the moment, and its memory bandwidth is often even more important for GPGPU workloads. It is also useful for Carrizo, because parallel compute and HSA features are what make it a unique product. AMD has been creating first-party software and helping popular third-party developers such as Adobe, but a little support to the world at large could bring a killer application or two, especially from the open-source community.
The SDK has been available in pre-release form for quite some time now, but it has finally graduated out of beta. OpenCL 2.0 allows for work to be generated on the GPU, which is especially useful for tasks that vary upon previous results without contacting the CPU again.
Subject: Graphics Cards, Processors, Mobile | June 4, 2015 - 04:58 PM | Scott Michaud
Tagged: amd, carrizo
My discussion of the Carrizo architecture went up a couple of days ago. The post did not include specific SKUs because we did not have those at the time. Now we do, and there will be products: one A8-branded, one A10-branded, and one FX-branded.
All three will be quad-core parts that can range between 12W and 35W designs, although the A8 processor does not have a 35W mode listed in the AMD Dual Graphics table. The FX-8800P is an APU that has all eight GPU cores while the A-series APUs have six. The A10-8700P and the A8-8600P are separated by a couple hundred megahertz base and boost CPU clocks, and 80 MHz GPU clock.
Also, we have been given a table of AMD Radeon R5 and R7 M-series GPUs that can be paired with Carrizo in an AMD Dual Graphics setup. These GPUs are the R7 M365, R7 M360, R7 M350, R7 M340, R5 M335, and R5 M330. They cannot be paired with every Carrizo APU, and some pairings only work in certain power envelopes. Thankfully, this table should only be relevant to OEMs, because end-users are receiving pre-configured systems.
Pricing and availability will depend on OEMs, of course.
Subject: General Tech | June 3, 2015 - 05:29 PM | Jeremy Hellstrom
Tagged: carrizo, APU, amd, excavator
If you skipped reading Scott's look at the new AMD Carrizo processor you have done yourself a disservice and should read through his look at AMD's recent history and the evolution of Bulldozer and Steamroller into Excavator. It will help you understand The Tech Report's look into the new architecture and the AMD provided benchmarks which you can check out here. A lot of the new architecture is a refinement of previous chips but the Tonga based GPU portion is completely new and looks to be an impressive improvement, especially on these 15W and 30W chips. It will be very interesting to see how they fare against the Iris Pro on Intel's new Broadwell chips in systems without a discrete GPU.
"The Carrizo processor is AMD's follow-on to Kaveri and a direct competitor to Intel's Broadwell CPUs. After a lengthy prelude, AMD is officially taking the wraps off of Carrizo today at the Computex trade show in Taipei. The firm expects laptops based on Carrizo to be available near the end of this month, and now that the chip is official, we know a number of juicy details about it that had previously been murky."
Here is some more Tech News from around the web:
- Typing 'http://:' Into a Skype Message Trashes the Installation Beyond Repair @ Slashdot
- Microsoft suffers worldwide Wi-Fi wardrobe malfunction @ The Register
- Fanbois designing Windows 10 – where's it going to end? @ The Register
- Holy SSH-it! Microsoft promises secure logins for Windows PowerShell @ The Register
- Tech ARP 2015 Mega Giveaway #4 : Mi In-Ear Headphones
Digging into a specific market
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. This is AMD's sixth-generation APU, starting with Llano's release in June 2011. For this launch, Carrizo is targeting the 15W and 35W power envelopes for $400-$700 USD notebook devices. AMD needed to increase efficiency on the same, 28nm process that we have seen in their product stack since Kabini and Temash were released in May of 2013. They tasked their engineers to optimize their APU's design for these constraints, which led to dense architectures and clever features on the same budget of complexity, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was a quite clever design that didn't reach its potential because of external factors. As I said at the start of this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, two cores share a single floating point (decimal) unit. While you need some floating point capacity, upcoming workloads could use the GPU for a massive increase in performance, which is right there on the same die. As such, the complexity that is dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.
Some Fresh Hope for 2016
EDIT 2015-05-07: A day after the AMD analyst meeting we now know that the roadmaps delivered here are not legitimate. While some of the information is likely correct on the roadmaps, they were not leaked by AMD. There is no FM3 socket, rather AMD is going with AM4. AMD will be providing more information throughout this quarter about their roadmaps, but for now take all of this information as "not legit".
SH SOTN has some eagle eyes and spotted the latest leaked roadmap for AMD. These roadmaps cover both mobile and desktop, from 2015 through 2016. There are obviously quite a few interesting tidbits of information here.
On the mobility roadmap we see the upcoming release of Carrizo, which we have been talking about since before CES. This will be the very first HSA 1.0 compliant part to hit the market, and AMD has done some really interesting things with the design in terms of performance, power efficiency, and die size optimizations. Carrizo will span the market from 15 watts to 35 watts TDP. This is a mobile only part, but indications point to it being pretty competent overall. This is a true SOC that will support all traditional I/O functions of older standalone southbridges. Most believe that this part will be manufactured by GLOBALFOUNDRIES on their 28 nm HKMG process that is more tuned to AMD's APU needs.
Carrizo-L will be based on the Puma+ architecture and will go from 10 watts to 15 watts TDP. This will use the same FP4 BGA connection as the big Carrizo APU. This should make these parts more palatable for OEMs as they do not have to differentiate the motherboard infrastructure. Making things easier for OEMs will give more reasons for these folks to offer products based on Carrizo and Carrizo-L APUs. The other big reason will be the GCN graphics compute units. Puma+ is a very solid processor architecture for low power products, but these parts are still limited to the older 28 nm HKMG process from TSMC.
One interesting addition here is that AMD will be introducing their "Amur" APU for the low power and ultra-low power markets. These will be comprised of four Cortex-A57 CPUs combined with AMD's GCN graphics units. This will be the first time we see this combination, and the first time AMD has integrated with ARM since ATI spun off their mobile graphics to Qualcomm under the "Adreno" branding (anagram for "Radeon"). What is most interesting here is that this APU will be a 20 nm part most likely fabricated by TSMC. This is not to say that Samsung or GLOBALFOUNDRIES could not produce it, but those companies are expending their energy on the 14 nm FinFET process that will be their bread and butter for years to come. This will be a welcome addition to the mobile market (tablets and handhelds) and could be a nice profit center for AMD if they are able to release this in a timely manner.
2016 is when things get very interesting. The Zen x86 design will dominate the upper 2/3 of the roadmap. I had talked about Zen when we had some new diagram leaks yesterday, but now we get to see the first potential products based off of this architecture. In mobile it will span from 5 watts to 35 watts TDP. The performance and mainstream offerings will be the "Bristol Ridge" APU which will feature 4 Zen cores (or one Zen module) combined with the next gen GCN architecture. This will be a 14nm part, and the assumption is that it will be GLOBALFOUNDRIES using 14nm FinFET LPP (Low Power Plus) that will be more tuned for larger APUs. This will also be a full SOC.
The next APU will be codenamed "Basilisk" that will span the 5 watt to 15 watt range. It will be comprised of 2 Zen cores (1/2 of a Zen module) and likely feature 2 to 4 MB of L3 cache, depending on power requirements. This looks to be the first Skybridge set of APUs that will share the same infrastructure as the ARM based Amur SOC. FT4 BGA is the basis for both the 2015 Amur and 2016 Basilisk SOCs.
Finally we have the first iteration of AMD's first ground up implementation of ARM's ARMv8-A ISA. The "Styx" APU features the new K12 CPU cores that AMD has designed from scratch. It too will feature the next generation GCN units as well as share the same FT4 BGA connection. Many are anxiously watching this space to see if AMD can build a better mousetrap when it comes to licensing the ARM ISA (as have Qualcomm, NVIDIA, and others).
2015 shows no difference in the performance desktop space, as it is still serviced by the now venerable Piledriver based FX parts on AM3+. The only change we expect to see here is that there will be a handful of new motherboard offerings from the usual suspects that will include the new USB 3.1 functionality derived from a 3rd party controller.
Mainstream and Performance will utilize the upcoming Godavari APUs. These are power and speed optimized APUs that are still based on the current Kaveri design. These look to be a simple refresh/rebadge with a slight performance tweak. Not exciting, but needs to happen for OEMs.
Low power will continue to be addressed by Beema based APUs. These are regular Puma based cores (not Puma+). AMD likely does not have the numbers to justify a new product in this rather small market.
2016 is when things get interesting again. We see the release of the FM3 socket (final proof that AM3+ is dead) that will house the latest Zen based APUs. At the top end we see "Summit Ridge" which will be composed of 8 Zen cores (or 2 Zen modules). This will have 4 MB of L2 cache and 16 MB of L3 cache if our other leaks are correct. These will be manufactured on 14nm FinFET LPE (the more appropriate process product for larger, more performance oriented parts). These will not be SOCs. We can expect these to be the basis of new Opterons as well, but there is obviously no confirmation of that on these particular slides. This will be the first new product in some years from AMD that has the chance to compete with higher end desktop SKUs from Intel.
From there we have the lower power Bristol Ridge and Basilisk APUs that we already covered in the mobile discussion. These look to be significant upgrades from the current Kaveri (and upcoming Godavari) APUs. New graphics cores, new CPU cores, and new SOC implementations where necessary.
AMD will really be shaking up the game in 2016. At the very least they will have proven that they can still change up their game and release higher end (and hopefully competitive) products. AMD has enough revenue and cash on hand to survive through 2016 and 2017 at the rate they are going now. We can only hope that this widescale change will allow AMD to make some significant inroads with OEMs on all levels. Otherwise Intel is free to do what they want, at what price they want, across multiple markets.
Subject: Processors | April 27, 2015 - 06:06 PM | Josh Walrath
Tagged: Zen, Steamroller, Kaveri, k12, Excavator, carrizo, bulldozer, amd
There is some pretty breathless analysis of a single leaked block diagram that is supposedly from AMD. This is one of the first indications of what the Zen architecture looks like from a CPU core standpoint. The block diagram is very simple, but looks in the same style as what we have seen from AMD. There are some labels, but this is almost a 50,000 foot view of the architecture rather than a slightly clearer 10,000 foot view.
There are a few things we know for sure about Zen. It is a clean sheet design that moves away from what AMD was pursuing with their Bulldozer family of cores. Zen gives up CMT for SMT support for handling more threads. The design has a cluster of four cores sharing 8 MB of L3 cache, with each core having access to 512 KB of L2 cache. There is a lot of optimism that AMD can kick the trend of falling more and more behind Intel every year with this particular design. Jim Keller is viewed very positively due to his work at AMD in the K7 through K8 days, as well as what he accomplished at Apple with their ARM based offerings.
One of the first sites to pick up this diagram wrote quite a bit about what they saw. There was a lot of talk about, “right off the bat just by looking at the block diagram we can tell that Zen will have substantially higher single threaded performance compared to Excavator and the Bulldozer family.” There was the assumption that because it had two 256-bit FMACs that it could fuse them to create a single 512 bit AVX product.
These assumptions are pretty silly. This is a very simple block diagram that answers very few of the important questions about the architecture. Yes, it shows 6 int pipelines, but we don’t know how many are address generation vs. execution units. We don’t know how wide decode is. We don’t know latency to L2 cache, much less how L3 is connected and shared out. So just because we see more integer pipelines per core does not automatically mean, “Da, more is better, strong like tractor!” We don’t know what improvements or simplifications we will see in the schedulers. There is no mention of the front-end other than Fetch and Decode. How about Branch Prediction? What is the latency for the memory controller when addressing external memory?
Essentially, this looks like a simplified way of expressing to analysts that AMD is attempting to retain their per core integer performance while boosting floating point/AVX at a similar level. Other than that, there is very little that can be gleaned from this simple block diagram.
Other leaks that are interesting concerning Zen are the formats that we will see these products integrated into. One leak detailed a HPC aimed APU that features 16 Zen cores with 32 MB of L3 cache attached to a very large GPU. Another leak detailed a server level chip that will support 32 cores and will be seen in 2P systems. Zen certainly appears to be very flexible, and in ways it reminds me of a much beefier Jaguar type CPU. My gut feeling is that AMD will get closer to Intel than it has been in years, and perhaps they can catch Intel by surprise with a few extra features. The reality of the situation is that AMD is far behind and only now are we seeing pure-play foundries start to get even close to Intel in terms of process technology. AMD is very much at a disadvantage here.
Still, the company needs to release new, competitive products that will refill the company coffers. The previous quarter’s loss has dug into cash reserves, but AMD is still stable in terms of cash on hand and long term debt. 2015 will see new GPUs, an APU refresh, and the release of the new Carrizo parts. 2016 looks to be the make or break year with Zen and K12.
Edit 2015-04-28: Thanks to SH SOTN we have a new slide that has been leaked from the same deck as this one. This has some interesting info in that AMD may be going away from exclusive cache designs. Exclusive was a good idea when cache was small and expensive, as data was not replicated through each level of cache (L1 was not replicated in L2 and L2 was not replicated in L3). Intel has been using inclusive cache since forever, where data is replicated and simpler to handle. Now it looks like AMD is moving towards inclusive. This is not necessarily a bad thing as the 512 KB of L2 can easily handle what looks to be 128 KB of L1 and the shared 8 MB of L3 cache can easily handle the 2 MB of L2 data. Here is the link to that slide.
The new slide in question.
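The cache arithmetic in the edit above can be checked with a quick sketch. It assumes the leaked figures (128 KB of L1 and 512 KB of L2 per core, 8 MB of L3 shared by a four-core cluster) and a deliberately simplified model in which every level is either fully inclusive or fully exclusive. Real designs are rarely that clean, and none of these names come from AMD.

```python
KB = 1024
MB = 1024 * KB

# Figures from the leaked slides as described above (unconfirmed).
CORES_PER_CLUSTER = 4
L1_PER_CORE = 128 * KB
L2_PER_CORE = 512 * KB
L3_SHARED = 8 * MB


def total_l2():
    """Combined L2 across the four-core cluster."""
    return CORES_PER_CLUSTER * L2_PER_CORE


def inclusive_fits():
    """True if each level can hold a full copy of the level(s) above it."""
    return L1_PER_CORE <= L2_PER_CORE and total_l2() <= L3_SHARED


def unique_capacity(inclusive):
    """Bytes of distinct data a cluster can cache in this simplified model."""
    if inclusive:
        # Everything in L1 and L2 is duplicated in L3, so L3 is the ceiling.
        return L3_SHARED
    # Fully exclusive: no line lives in more than one level.
    return CORES_PER_CLUSTER * (L1_PER_CORE + L2_PER_CORE) + L3_SHARED


if __name__ == "__main__":
    print("Total L2 (MB):", total_l2() / MB)
    print("Inclusive hierarchy fits:", inclusive_fits())
    print("Unique capacity, inclusive (MB):", unique_capacity(True) / MB)
    print("Unique capacity, exclusive (MB):", unique_capacity(False) / MB)
```

Under these assumptions the inclusive layout gives up about 2.5 MB of unique capacity per cluster compared with a fully exclusive one, which is a small price against an 8 MB L3.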