What Makes Ryzen Tick
We have been exposed to details about the Zen architecture for the past several Hot Chips conventions as well as other points of information directly from AMD. Zen was a clean sheet design that borrowed some of the best features from the Bulldozer and Jaguar architectures, as well as integrating many new ideas that had not been executed in AMD processors before. The fusion of ideas from higher performance cores, lower power cores, and experience gained in APU/GPU design have all come together in a very impressive package that is the Ryzen CPU.
It is well known that AMD brought back Jim Keller to head the CPU group after the slow downward spiral that AMD entered in CPU design. While the Athlon 64 was a tremendous part for the time, the subsequent CPUs being offered by the company did not retain that leadership position. The original Phenom had problems right off the bat and could not compete well with Intel’s latest dual and quad cores. The Phenom II shored up their position a bit, but in the end could not keep pace with the products that Intel continued to introduce with their newly minted “tic-toc” cycle. Bulldozer had issues out of the gate and did not have performance numbers that were significantly greater than the previous generation “Thuban” 6 core Phenom II product, much less the latest Intel Sandy Bridge and Ivy Bridge products that it would compete with.
AMD attempted to stop the bleeding by iterating and evolving the Bulldozer architecture with Piledriver, Steamroller, and Excavator. The final products based on this design arc seemed to do fine for the markets they were aimed at, but certainly did not regain any marketshare with AMD’s shrinking desktop numbers. No matter what AMD did, the base architecture just could not overcome some of the basic properties that impeded strong IPC performance.
The primary goal of this new architecture is to increase IPC to a level consistent to what Intel has to offer. AMD aimed to increase IPC per clock by at least 40% over the previous Excavator core. This is a pretty aggressive goal considering where AMD was with the Bulldozer architecture that was focused on good multi-threaded performance and high clock speeds. AMD claims that it has in fact increased IPC by an impressive 54% from the previous Excavator based core. Not only has AMD seemingly hit its performance goals, but it exceeded them. AMD also plans on using the Zen architecture to power products from mobile products to the highest TDP parts offered.
The Zen Core
The basis for Ryzen are the CCX modules. These modules contain four Zen cores along with 8 MB of shared L3 cache. Each core has 64 KB of L1 I-cache and 32 KB of D-cache. There is a total of 512 KB of L2 cache. These caches are inclusive. The L3 cache acts as a victim cache which partially copies what is in L1 and L2 caches. AMD has improved the performance of their caches to a very large degree as compared to previous architectures. The arrangement here allows the individual cores to quickly snoop any changes in the caches of the others for shared workloads. So if a cache line is changed on one core, other cores requiring that data can quickly snoop into the shared L3 and read it. Doing this allows the CPU doing the actual work to not be interrupted by cache read requests from other cores.
Each core can handle two threads, but unlike Bulldozer has a single integer core. Bulldozer modules featured two integer units and a shared FPU/SIMD. Zen gets rid of CMT for good and we have a single integer and FPU units for each core. The core can address two threads by utilizing AMD’s version of SMT (symmetric multi-threading). There is a primary thread that gets higher priority while the second thread has to wait until resources are freed up. This works far better in the real world than in how I explained it as resources are constantly being shuffled about and the primary thread will not monopolize all resources within the core.
Subject: Processors | September 19, 2016 - 10:35 AM | Sebastian Peak
Tagged: Socket AM4, processor, FX, cpu, APU, amd, 1331 pins
Image credit: Bit-Tech via HWSW
AMD's newest socket will merge the APU and FX series CPUs into this new AM4 socket, unlike the previous generation which split the two between AM3+ and FM2+. This is great news for system builders, who now have the option of starting with an inexpensive CPU/APU, and upgrading to a more powerful FX processor later on - with the same motherboard.
The new socket will apparently require a new cooler design, which is contrary to early reports (yes, we got it wrong, too) that the AM4 socket would be compatible with existing AM3 cooler mounts (manufacturers could of course offer hardware kits for existing cooler designs). In any case, AMD's new socket takes more of the delicate copper pins you love to try not to bend!
Clean Sheet and New Focus
It is no secret that AMD has been struggling for some time. The company has had success through the years, but it seems that the last decade has been somewhat bleak in terms of competitive advantages. The company has certainly made an impact in throughout the decades with their 486 products, K6, the original Athlon, and the industry changing Athlon 64. Since that time we have had a couple of bright spots with the Phenom II being far more competitive than expected, and the introduction of very solid graphics performance in their APUs.
Sadly for AMD their investment in the “Bulldozer” architecture was misplaced for where the industry was heading. While we certainly see far more software support for multi-threaded CPUs, IPC is still extremely important for most workloads. The original Bulldozer was somewhat rushed to market and was not fully optimized, while the “Piledriver” based Vishera products fixed many of these issues we have not seen the non-APU products updated to the latest Steamroller and Excavator architectures. The non-APU desktop market has been served for the past four years with 32nm PD-SOI based parts that utilize a rebranded chipset base that has not changed since 2010.
Four years ago AMD decided to change course entirely with their desktop and server CPUs. Instead of evolving the “Bulldozer” style architecture featuring CMT (Core Multi-Threading) they were going to do a clean sheet design that focused on efficiency, IPC, and scalability. While Bulldozer certainly could scale the thread count fairly effectively, the overall performance targets and clockspeeds needed to compete with Intel were just not feasible considering the challenges of process technology. AMD brought back Jim Keller to lead this effort, an industry veteran with a huge amount of experience across multiple architectures. Zen was born.
Hot Chips 28
This year’s Hot Chips is the first deep dive that we have received about the features of the Zen architecture. Mike Clark is taking us through all of the changes and advances that we can expect with the upcoming Zen products.
Zen is a clean sheet design that borrows very little from previous architectures. This is not to say that concepts that worked well in previous architectures were not revisited and optimized, but the overall floorplan has changed dramatically from what we have seen in the past. AMD did not stand still with their Bulldozer products, and the latest Excavator core does improve upon the power consumption and performance of the original. This evolution was simply not enough considering market pressures and Intel’s steady improvement of their core architecture year upon year. Zen was designed to significantly improve IPC and AMD claims that this product has a whopping 40% increase in IPC (instructions per clock) from the latest Excavator core.
AMD also has focused on scaling the Zen architecture from low power envelopes up to server level TDPs. The company looks to have pushed down the top end power envelope of Zen from the 125+ watts of Bulldozer/Vishera into the more acceptable 95 to 100 watt range. This also has allowed them to scale Zen down to the 15 to 25 watt TDP levels without sacrificing performance or overall efficiency. Most architectures have sweet spots where they tend to perform best. Vishera for example could scale nicely from 95 to 220 watts, but the design did not translate well into sub-65 watt envelopes. Excavator based “Carrizo” products on the other hand could scale from 15 watts to 65 watts without real problems, but became terribly inefficient above 65 watts with increased clockspeeds. Zen looks to address these differences by being able to scale from sub-25 watt TDPs up to 95 or 100. In theory this should allow AMD to simplify their product stack by offering a common architecture across multiple platforms.
Subject: Graphics Cards, Processors | June 29, 2016 - 07:27 AM | Sebastian Peak
Tagged: RX 490, radeon, processors, Polaris, graphics card, Bristol Ridge, APU, amd, A12-9800
AMD's current "We're in the Game" promotion offers a glimpse at upcoming product names, including the Radeon RX 490 graphics card, and the new Bristol Ridge APUs.
Visit AMD's gaming promo page and click the link to "check eligibility" to see the following list of products, which includes the new product names:
It seems safe to assume that the new products listed - including the Radeon RX 490 - are close to release, though details on the high-end Polaris GPU are not mentioned. We do have details on the upcoming Bristol Ridge products, with this in-depth preview from Josh published back in April. The A12-9800 and A12-9800E are said to be the flagship products in this new 7th-gen lineup, so there will be new desktop parts with improved graphics soon.
Bristol Ridge Takes on Mobile: E2 Through FX
It is no secret that AMD has faced an uphill battle since the release of the original Core 2 processors from Intel. While stayed mostly competitive through the Phenom II years, they hit some major performance issues when moving to the Bulldozer architecture. While on paper the idea of Chip Multi-Threading sounded fantastic, AMD was never able to get the per thread performance up to expectations. While their CPUs performed well in heavily multi-threaded applications, they just were never seen in as positive of a light as the competing Intel products.
The other part of the performance equation that has hammered AMD is the lack of a new process node that would allow it to more adequately compete with Intel. When AMD was at 32 nm PD-SOI, Intel had introduced its 22nm TriGate/FinFET. AMD then transitioned to a 28nm HKMG planar process that was more size optimized than 32nm, but did not drastically improve upon power and transistor switching performance.
So AMD had a double whammy on their hands with an underperforming architecture and limitted to no access to advanced process nodes that would actually improve their power and speed situation. They could not force their foundry partners to spend billions on a crash course in FinFET technology to bring that to market faster, so they had to iterate and innovate on their designs.
Bristol Ridge is the fruit of that particular labor. It is also the end point to the architecture that was introduced with Bulldozer way back in 2011.
Lower Power, Same Performance
AMD is in a strange position in that there is a lot of excitement about their upcoming Zen architecture, but we are still many months away from that introduction. AMD obviously needs to keep the dollars flowing in, and part of that means that we get refreshes now and then of current products. The “Kaveri” products that have been powering the latest APUs from AMD have received one of those refreshes. AMD has done some redesigning of the chip and tweaked the process technology used to manufacture them. The resulting product is the “Godavari” refresh that offers slightly higher clockspeeds as well as better overall power efficiency as compared to the previous “Kaveri” products.
One of the first refreshes was the A8-7670K that hit the ground in November of 2015. This is a slightly cut down part that features 6 GPU compute units vs. the 8 that a fully enabled Godavari chip has. This continues to be a FM2+ based chip with a 95 watt TDP. The clockspeed of this part goes from 3.6 GHz to 3.9 GHz. The GPU portion runs at the same 757 MHz that the original A10-7850K ran at. It is interesting to note that it is still a 95 watt TDP part with essentially the same clockspeeds as the 7850K, but with two fewer GPU compute units.
The other product being covered here is a bit more interesting. The A10-7860K looks to be a larger improvement from the previous 7850K in terms of power and performance. It shares the same CPU clockspeed range as the 7850K (3.6 GHz to 3.9 GHz), but improves upon the GPU clockspeed by hitting around 800 MHz. At first this seems underwhelming until we realize that AMD has lowered the TDP from 95 watts down to 65 watts. Less power consumed and less heat produced for the same performance from the CPU side and improved performance from the GPU seems like a nice advance.
AMD continues to utilize GLOBALFOUNDRIES 28 nm Bulk/HKMG process for their latest APUs and will continue to do so until Zen is released late this year. This is not the same 28 nm process that we were introduced to over four years ago. Over that time improvements have been made to improve yields and bins, as well as optimize power and clockspeed. GF also can adjust the process on a per batch basis to improve certain aspects of a design (higher speed, more leakage, lower power, etc.). They cannot produce miracles though. Do not expect 22 nm FinFET performance or density with these latest AMD products. Those kinds of improvements will show up with Samsung/GF’s 14nm LPP and TSMC’s 16nm FF+ lines. While AMD will be introducing GPUs on 14nm LPP this summer, the Zen launch in late 2016 will be the first AMD CPU to utilize that advanced process.
Subject: Graphics Cards, Processors | April 19, 2016 - 11:21 AM | Ryan Shrout
Tagged: sony, ps4, Playstation, neo, giant bomb, APU, amd
Based on a new report coming from Giant Bomb, Sony is set to release a new console this year with upgraded processing power and a focus on 4K capabilities, code named NEO. We have been hearing for several weeks that both Microsoft and Sony were planning partial generation upgrades but it appears that details for Sony's update have started leaking out in greater detail, if you believe the reports.
Giant Bomb isn't known for tossing around speculation and tends to only report details it can safely confirm. Austin Walker says "multiple sources have confirmed for us details of the project, which is internally referred to as the NEO."
The current PlayStation 4 APU
Image source: iFixIt.com
There are plenty of interesting details in the story, including Sony's determination to not split the user base with multiple consoles by forcing developers to have a mode for the "base" PS4 and one for NEO. But most interesting to us is the possible hardware upgrade.
The NEO will feature a higher clock speed than the original PS4, an improved GPU, and higher bandwidth on the memory. The documents we've received note that the HDD in the NEO is the same as that in the original PlayStation 4, but it's not clear if that means in terms of capacity or connection speed.
Games running in NEO mode will be able to use the hardware upgrades (and an additional 512 MiB in the memory budget) to offer increased and more stable frame rate and higher visual fidelity, at least when those games run at 1080p on HDTVs. The NEO will also support 4K image output, but games themselves are not required to be 4K native.
Giant Bomb even has details on the architectural changes.
|Shipping PS4||PS4 "NEO"|
|CPU||8 Jaguar Cores @ 1.6 GHz||8 Jaguar Cores @ 2.1 GHz|
|GPU||AMD GCN, 18 CUs @ 800 MHz||AMD GCN+, 36 CUs @ 911 MHz|
|Stream Processors||1152 SPs ~ HD 7870 equiv.||2304 SPs ~ R9 390 equiv.|
|Memory||8GB GDDR5 @ 176 GB/s||8GB GDDR5 @ 218 GB/s|
(We actually did a full video teardown of the PS4 on launch day!)
If the Compute Unit count is right from the GB report, then the PS4 NEO system will have 2,304 stream processors running at 911 MHz, giving it performance nearing that of a consumer Radeon R9 390 graphics card. The R9 390 has 2,560 SPs running at around 1.0 GHz, so while the NEO would be slower, it would be a substantial upgrade over the current PS4 hardware and the Xbox One. Memory bandwidth on NEO is still much lower than a desktop add-in card (218 GB/s vs 384 GB/s).
Could Sony's NEO platform rival the R9 390?
If the NEO hardware is based on Grenada / Hawaii GPU design, there are some interesting questions to ask. With the push into 4K that we expect with the upgraded PlayStation, it would be painful if the GPU didn't natively support HDMI 2.0 (4K @ 60 Hz). With the modularity of current semi-custom APU designs it is likely that AMD could swap out the display controller on NEO with one that can support HDMI 2.0 even though no consumer shipping graphics cards in the 300-series does so.
It is also POSSIBLE that NEO is based on the upcoming AMD Polaris GPU architecture, which supports HDR and HDMI 2.0 natively. That would be a much more impressive feat for both Sony and AMD, as we have yet to see Polaris released in any consumer GPU. Couple that with the variables of 14/16nm FinFET process production and you have a complicated production pipe that would need significant monitoring. It would potentially lower cost on the build side and lower power consumption for the NEO device, but I would be surprised if Sony wanted to take a chance on the first generation of tech from AMD / Samsung / Global Foundries.
However, if you look at recent rumors swirling about the June announcement of the Radeon R9 480 using the Polaris architecture, it is said to have 2,304 stream processors, perfectly matching the NEO specs above.
New features of the AMD Polaris architecture due this summer
There is a lot Sony and game developers could do with roughly twice the GPU compute capability on a console like NEO. This could make the PlayStation VR a much more comparable platform to the Oculus Rift and HTC Vive though the necessity to work with the original PS4 platform might hinder the upgrade path.
The other obvious use is to upgrade the image quality and/or rendering resolution of current games and games in development or just to improve the frame rate, an area that many current generation consoles seem to have been slipping on.
In the documents we’ve received, Sony offers suggestions for reaching 4K/UltraHD resolutions for NEO mode game builds, but they're also giving developers a degree of freedom with how to approach this. 4K TV owners should expect the NEO to upscale games to fit the format, but one place Sony is unwilling to bend is on frame rate. Throughout the documents, Sony repeatedly reminds developers that the frame rate of games in NEO Mode must meet or exceed the frame rate of the game on the original PS4 system.
There is still plenty to read in the Giant Bomb report, and I suggest you head over and do so. If you thought the summer was going to be interesting solely because of new GPU releases from AMD and NVIDIA, it appears that Sony and Microsoft have their own agenda as well.
Subject: General Tech | April 7, 2016 - 02:47 PM | Ken Addison
Tagged: VR, vive, video, tesla p100, steamvr, Spectre 13.3, rift, podcast, perfmon, pascal, Oculus, nvidia, htc, hp, GP100, Bristol Ridge, APU, amd
PC Perspective Podcast #394 - 04/07/2016
Join us this week as we discuss measuring VR Performance, NVIDIA's Pascal GP100, Bristol Ridge APUs and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store (audio only)
- RSS - Subscribe through your regular RSS reader (audio only)
- MP3 - Direct download link to the MP3 file
This episode of the PC Perspective Podcast is sponsored by Lenovo!
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath and Allyn Malventano
Program length: 1:32:19
Week in Review:
0:46:25 This week’s podcast is brought to you by Casper. Use code PCPER at checkout for $50 towards your order!
News items of interest:
Hardware/Software Picks of the Week
Subject: Processors | April 5, 2016 - 06:30 AM | Josh Walrath
Tagged: mobile, hp, GCN, envy, ddr4, carrizo, Bristol Ridge, APU, amd, AM4
Today AMD is “pre-announcing” their latest 7th generation APU. Codenamed “Bristol Ridge”, this new SOC is based off of the Excavator architecture featured in the previous Carrizo series of products. AMD provided very few hints as to what was new and different in Bristol Ridge as compared to Carrizo, but they have provided a few nice hints.
They were able to provide a die shot of the new Bristol Ridge APU and there are some interesting differences between it and the previous Carrizo. Unfortunately, there really are no changes that we can see from this shot. Those new functional units that you are tempted to speculate about? For some reason AMD decided to widen out the shot of this die. Those extra units around the border? They are the adjacent dies on the wafer. I was bamboozled at first, but happily Marc Sauter pointed it out to me. No new functional units for you!
This is the Carrizo shot. It is functionally identical to what we see with Bristol Ridge.
AMD appears to be using the same 28 nm HKMG process from GLOBALFOUNDRIES. This is not going to give AMD much of a jump, but from information in the industry GLOBALFOUNDRIES and others have put an impressive amount of work into several generations of 28 nm products. TSMC is on their third iteration which has improved power and clock capabilities on that node. GLOBALFOUNDRIES has continued to improve their particular process and likely Bristol Ridge is going to be the last APU built on that node.
All of the competing chips are rated at 15 watts TDP. Intel has the compute advantage, but AMD is cleaning up when it comes to graphics.
The company has also continued to improve upon their power gating and clocking technologies to keep TDPs low, yet performance high. AMD recently released the Godavari APUs to the market which exhibit better clocking and power characteristics from the previous Kaveri. Little was done on the actual design, rather it was improved process tech as well as better clock control algorithms that achieved these advances. It appears as though AMD has continued this trend with Bristol Ridge.
We likely are not seeing per clock increases, but rather higher and longer sustained clockspeeds providing the performance boost that we are seeing between Carrizo and Bristol Ridge. In these benchmarks AMD is using 15 watt TDP products. These are mobile chips and any power improvements will show off significant gains in overall performance. Bristol Ridge is still a native quad core part with what looks to be an 8 module GCN unit.
Again with all three products at a 15 watt TDP we can see that AMD is squeezing every bit of performance it can with the 28 nm process and their Excavator based design.
The basic core and GPU design look relatively unchanged, but obviously there were a lot of tweaks applied to give the better performance at comparable TDPs.
AMD is announcing this along with the first product that will feature this APU. The HP Envy X360. This convertible tablet offers some very nice features and looks to be one of the better implementations that AMD has seen using its latest APUs. Carrizo had some wins, but taking marketshare back from Intel in the mobile space has been tortuous at best. AMD obviously hopes that Bristol Ridge in the sub-35 watt range will continue to show fight for the company in this important market. Perhaps one of the more interesting features is the option for the PCIe SSD. Hopefully AMD will send out a few samples so we can see what a more “premium” type convertible can do with the AMD silicon.
The HP Envy X360 convertible in all of its glory.
Bristol Ridge will be coming to the AM4 socket infrastructure in what appears to be a Computex timeframe. These parts will of course feature higher TDPs than what we are seeing here with the 15 watt unit that was tested. It seems at that time AMD will announce the full lineup from top to bottom and start seeding the market with AM4 boards that will eventually house the “Zen” CPUs that will show up in late 2016.
AMD Keeps Q1 Interesting
CES 2016 was not a watershed moment for AMD. They showed off their line of current video cards and, perhaps more importantly, showed off working Polaris silicon, which will be their workhorse for 2016 in the graphics department. They did not show off Zen, a next generation APU, or any AM4 motherboards. The CPU and APU world was not presented in a way that was revolutionary. What they did show off, however, hinted at the things to come to help keep AMD relevant in the desktop space.
It was odd to see an announcement about the stock cooler that AMD was introducing, but when we learned more about it, the more important it was for AMD’s reputation moving forward. The Wraith cooler is a new unit to help control the noise and temperatures of the latest AMD CPUs and select APUs. This is a fairly beefy unit with a large, slow moving fan that produces very little noise. This is a big change from the variable speed fans on previous coolers that could get rather noisy and leave temperatures that were higher in range than are comfortable. There has been some derision aimed at AMD for providing “just a cooler” for their top end products, but it is a push that is making them more user and enthusiast friendly without breaking the bank.
Socket AM3+ is not dead yet. Though we have been commenting on the health of the platform for some time, AMD and its partners work to improve and iterate upon these products to include technologies such as USB 3.1 and M.2 support. While these chipsets are limited to PCI-E 2.0 speeds, the four lanes available to most M.2 controllers allows these boards to provide enough bandwidth to fully utilize the latest NVMe based M.2 drives available. We likely will not see a faster refresh on AM3+, but we will see new SKUs utilizing the Wraith cooler as well as a price break for the processors that exist in this socket.