Subject: Motherboards, Processors | September 7, 2016 - 08:08 PM | Tim Verry
Tagged: Zen, Summit Ridge, Excavator, Bristol Ridge, amd, A12-9800
This week AMD officially took the wraps off of its 7th generation APU lineup that it introduced back in May. Previously known as Bristol Ridge, AMD is launching eight new processors along with a new desktop platform that finally brings next generation I/O to AMD systems.
Bristol Ridge maintains the Excavator CPU cores and GCN GPU cores of Carrizo, but on refreshed silicon with performance and power efficiency gains that will bring the architecture started by Bulldozer to an apex. These will be the last chips of that line, and will be succeeded by AMD's new "Zen" architecture in 2017. For now though, Bristol Ridge delivers as much as 17% higher per thread CPU performance and 27% higher graphics performance while using significantly lower power than its predecessors. Further, thanks to various process tweaks that Josh talked about previously, AMD has been able to hit some impressive clock speeds with these chips, enabling it to better compete with Intel's Core i5 offerings.
At the top end AMD has the (65W) quad core A12-9800 running at 3.8 GHz base and 4.2 GHz boost paired with GCN 3.0-based Radeon R7 graphics (that support VP9 and HEVC acceleration). These new Bristol Ridge chips are able to take advantage of DDR4 clocked up to 2400 MHz. For DIY PC builders planning to use dedicated graphics, AMD has the non-APU Athlon X4 950 which features four CPU cores at 3.5 GHz base and 3.8 GHz boost with a 65W TDP. While it is not clocked quite as high as its APU counterpart, it should still prove to be a popular choice for budget builds. It will replace the venerable Athlon X4 860 and will be paired with an AM4 motherboard that will be ready to accept a new Zen-based "Summit Ridge" CPU next year.
The following table lists the eight new 7th generation "Bristol Ridge" processors and their specifications.
|Model||CPU Cores||CPU Clocks Base / Boost||GPU||GPU CUs||GPU Clocks (Max)||TDP|
|A12-9800||4||3.8 GHz / 4.2 GHz||Radeon R7||8||1,108 MHz||65W|
|A12-9800E||4||3.1 GHz / 3.8 GHz||Radeon R7||8||900 MHz||35W|
|A10-9700||4||3.5 GHz / 3.8 GHz||Radeon R7||6||1,029 MHz||65W|
|A10-9700E||4||3.0 GHz / 3.5 GHz||Radeon R7||6||847 MHz||35W|
|A8-9600||4||3.1 GHz / 3.4 GHz||Radeon R7||6||900 MHz||65W|
|A6-9500||2||3.5 GHz / 3.8 GHz||Radeon
|A6-9500E||2||3.0 GHz / 3.4 GHz||Radeon
|Athlon X4 950||4||3.5 GHz / 3.8 GHz||None||0||N/A||65W|
To expand on the performance increases of Bristol Ridge, AMD compared the A12-9800 to the previous generation A10-8850 as well as Intel's Core i5-6500. According to the company, the Bristol Ridge processor handily beats the older chip and is competitive with the Intel i5. Specifically, AMD found that the A12-9800 scored 3,521.25 in 3DMark 11 while the A10-8850 (95W Godavari) scored 2,880. Further, in Cinebench R11.5 1T the A12-9800 scored 1.21 versus the A10-8850's 1.06. Not bad when you consider that the new processor has a 30W lower TDP!
With that said, the comparison to Intel is perhaps most interesting to the readers. In this case, the A12-9800 is about where you would expect though that is not necessarily a bad thing. It does pull a bit closer to Intel in CPU and continues to offer superior graphics performance.
|Benchmark||AMD A12-9800 (65W)||Intel Core i5-6500 (65W)||AMD A10-8850 (95W)|
|3DMark 11 Performance||3,521.25||1,765.75||2,880|
|PCMark 8 Home Accelerated||3,483.25||3,702||Not run|
|Cinebench R11.5 1T||1.21||Not run||1.06|
Specifically, in 3DMark 11 Performance the A12-9800's score of 3,521.25 is quite a bit better than the Intel i5-6500's 1,765.75 result. However, in the more CPU focused PCMark 8 Home Accelerated benchmark the Intel comes out ahead with a score of 3,702 versus the AMD A12-9800's score of 3,483.25. If the price is right Bristol Ridge does not look too bad on paper, assuming AMD's testing holds true in independent reviews!
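To put the quoted scores in relative terms, here is a small Python sketch that computes the percentage gap between the A12-9800 and each comparison chip. The scores are the AMD-supplied figures quoted above; the dictionary layout and the helper name `percent_delta` are illustrative assumptions, not part of AMD's test methodology.

```python
# AMD-supplied scores quoted in the article; None marks a benchmark that was not run.
scores = {
    "3DMark 11 Performance": {"A12-9800": 3521.25, "i5-6500": 1765.75, "A10-8850": 2880.0},
    "PCMark 8 Home Accelerated": {"A12-9800": 3483.25, "i5-6500": 3702.0, "A10-8850": None},
    "Cinebench R11.5 1T": {"A12-9800": 1.21, "i5-6500": None, "A10-8850": 1.06},
}


def percent_delta(new, baseline):
    """Return how much higher (positive) or lower (negative) `new` is than `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


for benchmark, result in scores.items():
    ours = result["A12-9800"]
    for rival in ("i5-6500", "A10-8850"):
        theirs = result[rival]
        if theirs is None:
            continue  # skip pairs AMD did not test
        print(f"{benchmark}: A12-9800 vs {rival}: {percent_delta(ours, theirs):+.1f}%")
```

On these figures the A12-9800 roughly doubles the i5-6500 in 3DMark 11 (about +99%), trails it by about 6% in PCMark 8, and leads the A10-8850 by about 22% in 3DMark 11 and 14% in single threaded Cinebench.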
The AM4 Platform
Alongside the launch of desktop 7th generation APUs, AMD is launching a new AM4 platform that supports Bristol Ridge and is ready for Zen APUs next year. The new platform finally brings new I/O technologies to AMD systems including PCI-E 3.0, NVMe, SATA Express, DDR4, and USB 3.1 Gen 2.
According to Digital Trends, AMD's AM4 desktop platform will span all the way from low end to enthusiast motherboards and these boards will be powered by one of three new chipsets. The three new chipsets are the B350 for mainstream, A320 for "essential," and X/B/A300 for small form factor motherboards. Notably missing is any mention of an enthusiast chipset, but one is reportedly being worked on and will arrive closer to the launch of Zen-based processors in 2017.
The image below outlines the differences in the chipsets. Worth noting is that the APUs themselves will handle the eight lanes of PCI-E 3.0, dual channel DDR4, four USB 3.1 Gen 1 ports, and two SATA 6Gbps and two NVMe or PCI-E 3.0 storage devices. This leaves PCI-E 2.0, SATA Express, additional SATA 6Gbps, and USB 3.1 Gen 2 connection duties to the chipsets.
As of today, AMD has only announced the availability of AM4 motherboards and 7th generation APUs for OEM systems (with design wins from HP and Lenovo so far). The company will be outlining the channel / DIY PC builder lineup and pricing at a later (to be announced) date.
I am looking forward to Zen and in a way the timing of Bristol Ridge seems strange. On the other hand, for OEMs it should do well and hold them over until then (heh). Enthusiasts and DIY builders, meanwhile, will be able to buy into Bristol Ridge knowing that they can upgrade to Zen next year while getting better than Carrizo performance with less power and possibly better overclocking, which is not a bad option so long as the prices are right!
The full press blast is included below for more information on how they got their benchmark results.
Subject: Processors | September 6, 2016 - 03:05 PM | Ryan Shrout
Tagged: Zen, single thread, geekbench, amd
Over the holiday weekend a leaked Geekbench benchmark result on an engineering sample AMD Zen processor got tech nerds talking. Other than the showcase that AMD presented a couple weeks back using the Blender render engine, the only information we have on performance claims comes from AMD touting a "40% IPC increase" over the latest Bulldozer derivative.
The results from Geekbench show performance from a two physical processor system and a total of 64 cores running at 1.44 GHz. Obviously that clock speed is exceptionally low; AMD demoed Summit Ridge running at 3.0 GHz in the showcase mentioned above. But this does give us an interesting data point with which to do some performance extrapolation. If we assume perfect clock speed scaling, we can guess at performance levels that AMD Zen might see at various clocks.
I needed a quick comparison point and found this Geekbench result from a Xeon E7-8857 v2 running at 3.6 GHz. That is an Ivy Bridge based architecture and though the system has 48 cores, we are only going to look at single threaded results to focus on the IPC story.
Obviously there are a ton of caveats with looking at data like this. It's possible that the AMD Zen platform was running in a very sub-optimal condition. It's possible that the BIOS and motherboard weren't fully cache aware (though I would hope that wouldn't be the case this late in the game). It's possible that the Linux OS was somehow holding back performance of the Zen architecture and needs an update. There are many reasons why you shouldn't consider this data a final decision yet; but that doesn't make it any less interesting to see.
In the two graphs below I divide the collection of single threaded results from Geekbench into two halves and there are three data points for each benchmark. The blue line represents the Xeon Ivy Bridge processor running at 3.6 GHz. The light green line shows the results from the AMD Zen processor running at 1.44 GHz as reported by Geekbench. The dark green line shows an extrapolated AMD Zen performance result with perfect scaling by frequency.
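The "perfect scaling by frequency" extrapolation described above is just a linear rescale of each measured score. A minimal sketch in Python follows; the function name and the placeholder score are hypothetical and only there to show the arithmetic, while the clock speeds are the ones quoted in this post.

```python
def extrapolate_score(measured_score, measured_ghz, target_ghz):
    """Scale a benchmark score linearly with clock speed (perfect frequency scaling)."""
    return measured_score * (target_ghz / measured_ghz)


ZEN_SAMPLE_GHZ = 1.44   # clock reported by Geekbench for the engineering sample
ZEN_DEMO_GHZ = 3.0      # clock AMD used for Summit Ridge in its Blender showcase
XEON_GHZ = 3.6          # clock of the Xeon E7-8857 v2 comparison system

# Hypothetical single threaded sub-score, NOT a real Geekbench result.
placeholder_score = 1000.0

for target in (ZEN_DEMO_GHZ, XEON_GHZ):
    scaled = extrapolate_score(placeholder_score, ZEN_SAMPLE_GHZ, target)
    print(f"{target:.2f} GHz: {scaled:.0f} ({target / ZEN_SAMPLE_GHZ:.2f}x the measured score)")
```

Matching the Xeon's 3.6 GHz means multiplying every measured Zen result by 2.5. That is a best case: it assumes nothing else in the system (memory latency, for example) limits performance as clocks rise.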
Subject: Processors | September 2, 2016 - 01:39 AM | Tim Verry
Tagged: IBM, power9, power 3.0, 14nm, global foundries, hot chips
Earlier this month at the Hot Chips symposium, IBM revealed details on its upcoming Power9 processors and architecture. The new chips are aimed squarely at the data center and will be used for massive number crunching in big data and scientific applications in servers and supercomputer nodes.
Power9 is a big play from Big Blue, and will help the company expand its presence in the Intel-ruled datacenter market. Power9 processors are due out in 2018 and will be fabricated at Global Foundries on a 14nm HP FinFET process. The chips feature eight billion transistors and utilize an “execution slice microarchitecture” that lets IBM combine “slices” of fixed, floating point, and SIMD hardware into cores that support various levels of threading. Specifically, 2 slices make an SMT4 core and 4 slices make an SMT8 core. IBM will have Power9 processors with 24 SMT4 cores or 12 SMT8 cores (more on that later). Further, Power9 is IBM’s first processor to support its Power 3.0 instruction set.
According to IBM, its Power9 processors are between 50% and 125% faster than the previous generation Power8 CPUs depending on the application tested. The performance improvement is thanks to a doubling of the number of cores as well as a number of other smaller improvements including:
- A 5 cycle shorter pipeline versus Power8
- A single instruction random number generator (RNG)
- Hardware assisted garbage collection for interpreted languages (e.g. Java)
- New interrupt architecture
- 128-bit quad precision floating point and decimal math support
  - Important for finance and security markets, massive databases and money math.
  - IEEE 754
- CAPI 2.0 and NVLink support
- Hardware accelerators for encryption and compression
The Power9 processor features 120 MB of direct attached eDRAM that acts as an L3 cache (256 GB/s). The chips offer up 7TB/s of aggregate fabric bandwidth, which certainly sounds impressive, but that is a number with everything added together. With that said, there is a lot going on under the hood. Power9 supports 48 lanes of PCI-E 4.0 (2 GB/s per lane per direction) and 48 proprietary 25Gbps accelerator lanes, which will be used for NVLink 2.0 to connect to NVIDIA GPUs as well as to connect to FPGAs, ASICs, and other accelerators or new memory technologies using CAPI 2.0 (Coherent Accelerator Processor Interface). It also has four 16Gbps SMP links (NUMA) used to combine four quad socket Power9 boards into a single 16 socket “cluster.”
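For a sense of scale, the per-lane figures quoted above can be summed per link type. The Python sketch below does only that arithmetic; it assumes raw signaling rates with 8 bits per byte and no encoding or protocol overhead, so real-world throughput would be lower, and the constant names are illustrative.

```python
PCIE4_LANES = 48
PCIE4_GB_PER_S_PER_LANE = 2.0      # per direction, as quoted above

ACCEL_LANES = 48
ACCEL_GBIT_PER_S_PER_LANE = 25.0   # raw signaling rate, as quoted above


def gbit_to_gbyte(gbit_per_s):
    """Convert gigabits per second to gigabytes per second (assumes 8 bits per byte, no overhead)."""
    return gbit_per_s / 8.0


pcie_total = PCIE4_LANES * PCIE4_GB_PER_S_PER_LANE
accel_total = gbit_to_gbyte(ACCEL_LANES * ACCEL_GBIT_PER_S_PER_LANE)

print(f"PCI-E 4.0: {pcie_total:.0f} GB/s per direction")
print(f"25Gbps accelerator lanes: {accel_total:.0f} GB/s per direction (raw)")
```

Under those assumptions the PCI-E 4.0 lanes total 96 GB/s per direction and the accelerator lanes 150 GB/s per direction.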
These are processors that are built to scale and tackle the big data problems. In fact, not only is Google interested in Power9 to power its services, but the US Department of Energy will be building two supercomputers using IBM’s Power9 CPUs and NVIDIA’s Volta GPUs. Summit and Sierra will offer between 100 and 300 petaflops of compute power and will be installed at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory respectively. There, some of the projects they will tackle include enabling researchers to visualize the internals of a virtual light water reactor, researching methods to improve fuel economy, and delving further into bioinformatics research.
The Power9 processors will be available in four variants that differ in the number of cores and number of threads each core supports. The chips are broken down into Power9 SO (Scale Out) and Power9 SU (Scale Up) and each group has two processors depending on whether you need a greater number of weaker cores or a smaller number of more powerful cores. Power9 SO chips are intended for multi-core systems and will be used in servers with one or two sockets, while Power9 SU chips are for multi-processor systems with up to four sockets per board and up to 16 total sockets per cluster when four four-socket boards are linked together. Power9 SO uses DDR4 memory and supports a theoretical maximum 4TB of memory (1TB with today’s 64GB DIMMs) and 120 GB/s of bandwidth, while Power9 SU uses IBM’s buffered “Centaur” memory scheme that allows the systems to address a theoretical maximum of 8TB of memory (2TB with 64GB DIMMs) at 230 GB/s. In other words, the SU series is Big Blue’s “big guns.”
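The capacity figures above imply a DIMM count, which the short Python sketch below works out. It assumes binary units (1 TB = 1024 GB) and that the "today" and theoretical figures refer to the same number of DIMMs; IBM's actual slot counts are not stated here, so treat the output as back-of-the-envelope arithmetic only.

```python
GB_PER_TB = 1024  # assumption: binary units


def implied_dimm_count(capacity_tb, dimm_gb):
    """Number of DIMMs of size `dimm_gb` needed to reach `capacity_tb`."""
    return capacity_tb * GB_PER_TB // dimm_gb


def dimm_size_for(capacity_tb, dimm_count):
    """DIMM size in GB needed to reach `capacity_tb` with `dimm_count` DIMMs."""
    return capacity_tb * GB_PER_TB // dimm_count


# name: (theoretical maximum in TB, capacity in TB with 64 GB DIMMs), as quoted above
configs = {"Power9 SO": (4, 1), "Power9 SU": (8, 2)}

for name, (max_tb, today_tb) in configs.items():
    count = implied_dimm_count(today_tb, 64)
    print(f"{name}: {count} DIMMs implied, {dimm_size_for(max_tb, count)} GB each to reach {max_tb} TB")
```

Both families work out to a 4x gap between today's 64GB modules and the module size the theoretical maximum would need.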
A photo of the 24 core SMT4 Power9 SO die.
Here is where it gets a bit muddy. The processors are further broken down by SMT4 or SMT8, and both Power9 SO and Power9 SU have both options. There are Power9 CPUs with 24 SMT4 cores and there are CPUs with 12 SMT8 cores. IBM indicated that SMT4 (four threads per core) was suited to systems running Linux and virtualization with emphasis on high core counts. Meanwhile SMT8 (eight threads per core) is a better option for large logical partitions (one big system versus partitioning out the compute cluster into smaller VMs as above) and running IBM’s Hypervisor. In either case (24 SMT4 or 12 SMT8) there is the same number of total threads, but you are able to choose whether you want fewer “stronger” threads on each core or more (albeit weaker) threads per core depending on which your workloads are optimized for.
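The "same number of total threads" point is easy to check with the core counts and slice counts given earlier (2 slices per SMT4 core, 4 slices per SMT8 core). The Python sketch below is only an illustration; the `CoreConfig` class is a hypothetical helper, not anything from IBM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreConfig:
    """One Power9 core layout as described in the article (illustrative helper)."""
    name: str
    cores: int
    threads_per_core: int
    slices_per_core: int

    @property
    def total_threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def total_slices(self) -> int:
        return self.cores * self.slices_per_core


variants = [
    CoreConfig("24-core SMT4", cores=24, threads_per_core=4, slices_per_core=2),
    CoreConfig("12-core SMT8", cores=12, threads_per_core=8, slices_per_core=4),
]

for v in variants:
    print(f"{v.name}: {v.total_threads} threads, {v.total_slices} execution slices")
```

Both layouts come out to 96 hardware threads and 48 execution slices, which fits the idea that the two variants regroup the same slice hardware into a different number of cores.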
Servers supporting Power9 are already under development by Google and Rackspace and blueprints are even available from the OpenPower Foundation. Currently, it appears that Power9 SO will emerge as soon as the second half of next year (2H 2017) with Power9 SU following in 2018 which would line up with the expected date for the Summit and Sierra supercomputer launches.
This is not a chip that will be showing up in your desktop any time soon, but it is an interesting high performance processor! I will be keeping an eye on updates from Oak Ridge lab hehe.
Subject: Processors, Mobile | August 31, 2016 - 07:30 AM | Sebastian Peak
Tagged: SoC, Snapdragon 821, snapdragon, SD821, qualcomm, processor, mobile, adreno
Qualcomm has officially launched the Snapdragon 821 SoC, an upgraded successor to the existing Snapdragon 820 found in such phones as the Samsung Galaxy S7.
"With Snapdragon 820 already powering many of the premier flagship Android smartphones today, Snapdragon 821 is now poised to become the processor of choice for leading smartphones and devices for this year’s holiday season. Qualcomm Technologies’ engineers have improved Snapdragon 821 in three key areas to ensure Snapdragon 821 maintains the level of industry leadership introduced by its predecessor."
Specifications were previously revealed when the Snapdragon 821 was announced in July, with a 10% increase on the CPU clocks (2.4 GHz, up from the previous 2.2 GHz max frequency). The Adreno 530 GPU clock increases 5%, to 650 MHz from 624 MHz. In addition to improved performance from CPU and GPU clock speed increases, the SD821 is said to offer lower power consumption (estimated at 5% compared to the SD820), and offers new functionality including improved auto-focus capability.
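The quoted percentages are rounded. A quick Python check of the clock figures above shows the exact uplift; the helper name `uplift_percent` is just for illustration.

```python
def uplift_percent(old, new):
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100.0


cpu_uplift = uplift_percent(2.2, 2.4)    # Kryo CPU max clock in GHz, SD820 -> SD821
gpu_uplift = uplift_percent(624, 650)    # Adreno 530 clock in MHz, SD820 -> SD821

print(f"CPU clock uplift: {cpu_uplift:.1f}%")
print(f"GPU clock uplift: {gpu_uplift:.1f}%")
```

That works out to roughly 9.1% on the CPU and 4.2% on the GPU, in line with the "up to 10 percent" and "5 percent" wording.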
Enhanced overall user experience:
The Snapdragon 821 has been specifically tuned to support a more responsive user experience when compared with the 820, including:
- Shorter boot times: Snapdragon 821 powered devices can boot up to 10 percent faster.
- Faster application launch times: Snapdragon 821 can reduce app load times by up to 10 percent.
- Smoother, more responsive user interactions: UI optimizations and performance enhancements designed to allow users to enjoy smoother scrolling and more responsive browsing performance.
Improved performance and power consumption:
- CPU speeds increase: As we previously announced, the 821 features Qualcomm Kryo CPU speeds up to 2.4GHz, representing an up to 10 percent improvement in performance over Snapdragon 820.
- GPU speeds increase: The Qualcomm Adreno GPU received a 5 percent speed increase over Snapdragon 820.
- Power savings: The 821 is engineered to deliver an incremental 5 percent power savings when comparing standard use case models. This power savings can extend battery life and support OEMs interested in reducing battery size for slimmer phones.
New features and functionality:
- Snapdragon 821 introduces several new features and capabilities, offering OEMs new options to create more immersive and engaging user experiences, including support for:
- Snapdragon VR SDK (Software Development Kit): Offers developers a superior mobile VR toolset, provides compatibility with the Google Daydream platform, and access to Snapdragon 821’s powerful heterogeneous architecture. Snapdragon VR SDK supports a superior level of visual and audio quality and more immersive virtual reality and gaming experiences in a mobile environment.
- Dual PD (PDAF): Offers significantly faster image autofocus speeds under a wide variety of conditions when compared to single PDAF solutions.
- Extended Laser Auto-Focus Ranging: Extends the visible focusing range, improving laser focal accuracy over Snapdragon 820.
- Android Nougat OS: Snapdragon 821 (as well as the 820) will support the latest Android operating system when available, offering new features, expanded compatibility, and additional security compared to prior Android versions.
Qualcomm says the ASUS ZenFone 3 Deluxe is the first phone to use this new Snapdragon 821 SoC while other OEMs will be working on designs implementing the upgraded SoC.
What's new and what's not
While spending time learning about upcoming products and technologies at the Intel Developer Forum earlier this month, I sat down with the company to learn about the release of Kaby Lake, now known as the 7th Generation Core processor family. We have been seeing and reporting on the details of Kaby Lake for quite some time here on PC Perspective – it became a more important topic when we realized that this would be the product that officially killed off the ‘tick-tock’ design philosophy that Intel had implemented years ago and that was responsible for much of the innovation in the CPU space over the last decade.
Today Intel released new information about the 7th Gen CPU family and Kaby Lake. Let’s dive into this topic with a simple and straightforward look at how it compares to Skylake.
What is the same
Actually, quite a lot. At its core, the microarchitecture of Kaby Lake is identical to that of Skylake. Instructions per clock (IPC) remain the same with the exception of dedicated hardware changes in the media engine, so you should not expect any performance differences with Kaby Lake except with improved clock speeds we’ll discuss in a bit.
Because of this lack of change many people will look down on the Kaby Lake release as Intel’s attempt to repackage an existing product to keep up the annual product cadence that financial markets require. It is a valid but arguable criticism, and Intel is making changes in other areas that should make KBL an improvement in the thin and light ecosystem.
Also worth noting is that Intel is still building Kaby Lake on 14nm process technology, the same used on Skylake. The term “same” will be debated as well, as Intel claims that improvements made in the process technology over the last 24 months have allowed them to expand clock speeds and improve on efficiency.
What is changed
Dubbing this new revision of the process as “14nm+”, Intel tells me that they have improved the fin profile for the 3D transistors as well as channel strain while more tightly integrating the design process with manufacturing. The result is a 12% increase in process performance; that is a sizeable gain in a fairly tight time frame even for Intel.
That process improvement directly results in higher clock speeds for Kaby Lake when compared to Skylake when running at the same target TDPs. In general, we are looking at 300-400 MHz higher peak clock speeds in Turbo Boost situations when compared to similar TDP products in the 6th generation. Sustained clocks will very likely remain voltage / thermally limited but the ability to spike up to higher clocks for even short bursts can improve performance and responsiveness of Kaby Lake when compared to Skylake.
In these two examples, Intel compares the 15 watt Core i7-6500U (a common part in currently shipping notebooks) and the upcoming 15 watt Core i7-7500U, both with dual-core HyperThreaded configurations. In SYSmark 2014 a 12% score improvement is measured while WebXPRT shows a 19% advantage. Double digit performance increases are pretty astounding for a new generational jump that does not include a new microarchitecture or a new process technology (more or less) though we should temper expectations for other applications and workload profiles like content creation.
Clean Sheet and New Focus
It is no secret that AMD has been struggling for some time. The company has had success through the years, but it seems that the last decade has been somewhat bleak in terms of competitive advantages. The company has certainly made an impact throughout the decades with their 486 products, K6, the original Athlon, and the industry changing Athlon 64. Since that time we have had a couple of bright spots with the Phenom II being far more competitive than expected, and the introduction of very solid graphics performance in their APUs.
Sadly for AMD their investment in the “Bulldozer” architecture was misplaced for where the industry was heading. While we certainly see far more software support for multi-threaded CPUs, IPC is still extremely important for most workloads. The original Bulldozer was somewhat rushed to market and was not fully optimized. While the “Piledriver” based Vishera products fixed many of these issues, we have not seen the non-APU products updated to the latest Steamroller and Excavator architectures. The non-APU desktop market has been served for the past four years with 32nm PD-SOI based parts that utilize a rebranded chipset base that has not changed since 2010.
Four years ago AMD decided to change course entirely with their desktop and server CPUs. Instead of evolving the “Bulldozer” style architecture featuring CMT (Clustered Multi-Threading) they were going to do a clean sheet design that focused on efficiency, IPC, and scalability. While Bulldozer certainly could scale the thread count fairly effectively, the overall performance targets and clockspeeds needed to compete with Intel were just not feasible considering the challenges of process technology. AMD brought back Jim Keller to lead this effort, an industry veteran with a huge amount of experience across multiple architectures. Zen was born.
Hot Chips 28
This year’s Hot Chips is the first deep dive that we have received about the features of the Zen architecture. Mike Clark is taking us through all of the changes and advances that we can expect with the upcoming Zen products.
Zen is a clean sheet design that borrows very little from previous architectures. This is not to say that concepts that worked well in previous architectures were not revisited and optimized, but the overall floorplan has changed dramatically from what we have seen in the past. AMD did not stand still with their Bulldozer products, and the latest Excavator core does improve upon the power consumption and performance of the original. This evolution was simply not enough considering market pressures and Intel’s steady improvement of their core architecture year upon year. Zen was designed to significantly improve IPC and AMD claims that this product has a whopping 40% increase in IPC (instructions per clock) from the latest Excavator core.
AMD also has focused on scaling the Zen architecture from low power envelopes up to server level TDPs. The company looks to have pushed down the top end power envelope of Zen from the 125+ watts of Bulldozer/Vishera into the more acceptable 95 to 100 watt range. This also has allowed them to scale Zen down to the 15 to 25 watt TDP levels without sacrificing performance or overall efficiency. Most architectures have sweet spots where they tend to perform best. Vishera for example could scale nicely from 95 to 220 watts, but the design did not translate well into sub-65 watt envelopes. Excavator based “Carrizo” products on the other hand could scale from 15 watts to 65 watts without real problems, but became terribly inefficient above 65 watts with increased clockspeeds. Zen looks to address these differences by being able to scale from sub-25 watt TDPs up to 95 or 100. In theory this should allow AMD to simplify their product stack by offering a common architecture across multiple platforms.
Subject: Processors | August 22, 2016 - 05:37 PM | Jeremy Hellstrom
Tagged: amd, a10-7870K
Leaving aside the questionable naming, let's instead focus on the improved cooler on this ~$130 APU from AMD. Neoseeker fired up the fun sized, 125W rated cooler on top of the A10-7870K and were pleasantly surprised at the lack of noise even under load. Encouraged by the performance they overclocked the chip by 500MHz to 4.4GHz and were rewarded with a stable and still very quiet system. The review focuses more on the improvements the new cooler offers as opposed to the APU itself, which has not changed. Check out the review if you are considering a lower cost system that only speaks when spoken to.
"In order to find out just how much better the 125W thermal solution will perform, I am going to test the A10-7870K APU mounted on a Gigabyte F2A88X-UP4 motherboard provided by AMD with a set of 16 GB (2 x 8) DDR3 RAM modules set at 2133 MHz speed. I will then run thermal and fan speed tests so a comparison of the results will provide a meaningful data set to compare the near-silent 125W cooler to an older model AMD cooling solution."
GlobalFoundries Will Allegedly Skip 10nm and Jump to Developing 7nm Process Technology In House (Updated)
Subject: Processors | August 20, 2016 - 03:06 PM | Tim Verry
Tagged: Semiconductor, lithography, GLOBALFOUNDRIES, global foundries, euv, 7nm, 10nm
UPDATE (August 22nd, 11:11pm ET): I reached out to GlobalFoundries over the weekend for a comment and the company had this to say:
"We would like to confirm that GF is transitioning directly from 14nm to 7nm. We consider 10nm as more a half node in scaling, due to its limited performance adder over 14nm for most applications. For most customers in most of the markets, 7nm appears to be a more favorable financial equation. It offers a much larger economic benefit, as well as performance and power advantages, that in most cases balances the design cost a customer would have to spend to move to the next node.
As you stated in your article, we will be leveraging our presence at SUNY Polytechnic in Albany, the talent and know-how gained from the acquisition of IBM Microelectronics, and the world-class R&D pipeline from the IBM Research Alliance—which last year produced the industry’s first 7nm test chip with working transistors."
An unexpected bit of news popped up today via TPU that alleges GlobalFoundries is not only developing 7nm technology (expected), but that the company will skip production of the 10nm node altogether in favor of jumping straight from the 14nm FinFET technology (which it licensed from Samsung) to 7nm manufacturing based on its own in house design process.
Reportedly, the move to 7nm would offer 60% smaller chips at three times the design cost of 14nm, which is to say that this would be both an expensive and impressive endeavor. Aided by Extreme Ultraviolet (EUV) lithography, GlobalFoundries expects to be able to hit 7nm production sometime in 2020 with prototyping and small usage of EUV in the year or so leading up to it. The in house process tech is likely thanks to the research being done at the APPC (Advanced Patterning and Productivity Center) in Albany, New York along with the expertise of engineers and design patents and technology (e.g. ASML NXE 3300 and 3300B EUV) purchased from IBM when it acquired IBM Microelectronics. The APPC is reportedly working simultaneously on research and development of manufacturing methods (especially EUV, where extremely small wavelengths of ultraviolet light, 14nm and smaller, are used to etch patterns into silicon) and supporting production of chips at GlobalFoundries' "Malta" fab in New York.
Advanced Patterning and Productivity Center in Albany, NY where Global Foundries, SUNY Poly, IBM Engineers, and other partners are forging a path to 7nm and beyond semiconductor manufacturing. Photo by Lori Van Buren for Times Union.
Intel's Custom Foundry Group will start pumping out ARM chips in early 2017 followed by Intel's own 10nm Cannon Lake processors in 2018, and Samsung will be offering up its own 10nm node as soon as next year. Meanwhile, TSMC has reportedly already taped out 10nm wafers and will begin production in late 2016/early 2017 and claims that it will hit 5nm by 2020. With its rivals all expecting production of 10nm chips as soon as Q1 2017, GlobalFoundries will be at a distinct disadvantage for a few years and will have only its 14nm FinFET (from Samsung) and possibly its own 14nm tech to offer until it gets the 7nm production up and running (hopefully!).
Previously, GlobalFoundries has stated that:
“GLOBALFOUNDRIES is committed to an aggressive research roadmap that continually pushes the limits of semiconductor technology. With the recent acquisition of IBM Microelectronics, GLOBALFOUNDRIES has gained direct access to IBM’s continued investment in world-class semiconductor research and has significantly enhanced its ability to develop leading-edge technologies,” said Dr. Gary Patton, CTO and Senior Vice President of R&D at GLOBALFOUNDRIES. “Together with SUNY Poly, the new center will improve our capabilities and position us to advance our process geometries at 7nm and beyond.”
If this news turns out to be correct, this is an interesting move and it is certainly a gamble. However, I think that it is a gamble that GlobalFoundries needs to take to be competitive. I am curious how this will affect AMD though. While I had expected AMD to stick with 14nm for a while, especially for Zen/CPUs, will this mean that AMD will have to go to TSMC for its future GPUs or will contract limitations (if any? I think they have a minimum amount they need to order from GlobalFoundries) mean that GPUs will remain at 14nm until GlobalFoundries can offer its own 7nm? I would guess that Vega will still be 14nm, but Navi in 2018/2019? I guess we will just have to wait and see!
- To 7nm And Beyond (Interview @ Semiconductor Engineering)
- GloFo Looks For 7nm Leadership @ Electronics Weekly
- GlobalFoundries develops 7nm and 10nm technologies in-house @ KitGuru
- SUNY Poly and GLOBALFOUNDRIES Announce New $500M R&D Program in Albany To Accelerate Next Generation Chip Technology @ GlobalFoundries (PR)
- AMD GPU Roadmap: Capsaicin Names Upcoming Architectures @ PC Perspective
- Next Gen Graphics and Process Migration: 20 nm and Beyond @ PC Perspective
Gunning for Broadwell-E
As I walked away from the St. Regis in downtown San Francisco tonight, I found myself wandering through the streets towards my hotel with something unique in tow. It was a smile. I was smiling, thinking about what AMD had just demonstrated and showed at its latest Zen processor reveal. The importance of this product launch literally cannot be overstated for a company struggling to find a foothold to hang on to in a market in which it once had a definitive lead. It’s been many years since I left a conference call, or a meeting, or a press conference feeling genuinely hopeful and enthusiastic about what AMD has shown me. Tonight I had that.
AMD’s CEO Lisa Su, and CTO Mark Papermaster, took the stage down the street from the Intel Developer Forum to roll out a handful of new architectural details about the Zen architecture while also showing the first performance results comparing it to competing parts from Intel. The crowd in attendance, a mix of media and analysts, were impressed. The feeling was palpable in the room.
It’s late as I write this, and while there are some interesting architecture details to discuss, I think it is in everyone’s best interest that we touch on them lightly for now, and instead refocus on the deep-dive once the Hot Chips information comes out early next week. What you really want to know is clear: can Zen make Intel work again? Can Zen make that $1700 price tag on the Broadwell-E 6950X seem even more ludicrous? Yes.
The Zen Architecture
Much of what was discussed about the Zen architecture is a re-release of what has come out in recent months. This is a completely new, from-the-ground-up microarchitecture and not a revamp of the aging Bulldozer design. It integrates SMT (simultaneous multi-threading), a first for an AMD CPU, to take better advantage of a longer pipeline. Intel has had HyperThreading for a long time now and AMD is finally joining the fold. A high bandwidth and low latency caching system is used to “feed the beast” as Papermaster put it, and utilizing 14nm process technology (starting at GlobalFoundries) gives efficiency and scaling a significant bump while enabling AMD to scale from notebooks to desktops to servers with the same architecture.
By far the most impressive claim from AMD thus far was that of a 40% increase in IPC over previous AMD designs. That’s a HUGE claim and is key to the success or failure of Zen. AMD proved to me today that the claims are real and that we will see the immediate impact of that architecture bump from day one.
Press was told of a handful of high level changes to the new architecture as well. Branch prediction gets a complete overhaul. This marks the first AMD processor to have a micro-op cache. Wider execution width and broader instruction schedulers are integrated as well, all of which adds up to much higher instruction level parallelism to improve single threaded performance.
Performance improvements aside, throughput and efficiency go up with Zen as well. AMD has integrated an 8MB L3 cache and improved prefetching for up to 5x the cache bandwidth available per core on the CPU. SMT makes sure the pipeline stays full to prevent “bubbles” that introduce latency and lower efficiency, while region-specific power gating means that we’ll see Zen in notebooks as well as enterprise servers in 2017. It truly is an impressive design from AMD.
Summit Ridge, the enthusiast platform that will be the first product available with Zen, is based on the AM4 platform, and processors will go up to 8 cores and 16 threads. DDR4 memory support is included, along with PCI Express 3.0 and what AMD calls “next-gen” IO – I would expect a quick leap forward for AMD to catch up on things like NVMe and Thunderbolt.
The Real Deal – Zen Performance
As part of today’s reveal, AMD is showing the first true comparison between Zen and Intel processors. Sure, AMD showed a Zen-powered system with a Fury X running the upcoming Deus Ex at 4K, but the really impressive results were shown when comparing Zen to a Broadwell-E platform.
Using Blender to measure the performance of a rendering workload (the rendered model was a Zen CPU mockup, of course), AMD ran an 8-core / 16-thread Zen processor at 3.0 GHz against an 8-core / 16-thread Broadwell-E processor at 3.0 GHz (likely a fixed-clock Core i7-6900K). The point of the demonstration was to showcase the IPC improvements of Zen and it worked: the render completed on the Zen platform a second or two faster than it did on the Intel Broadwell-E system.
Not much to look at, but Zen on the left, Broadwell-E on the right...
Of course there are lots of caveats: we didn’t set up the systems, I don’t know for sure that GPUs weren’t involved, we don’t know the final clocks of the Zen processors releasing in early 2017, etc. But I took two things away from the demonstration that are very important.
- The IPC of Zen is on par with or better than Broadwell.
- Zen will scale higher than 3.0 GHz in 8-core configurations.
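To make the reasoning behind the first takeaway concrete, here is a rough sketch of how matched clocks and core counts turn a render time into an IPC comparison. It assumes both systems execute the same amount of work, so time = instructions / (IPC x clock). The function name and the render times below are hypothetical placeholders, not numbers AMD provided; all we actually saw was Zen finishing "a second or two" sooner.

```python
def relative_ipc(time_a_s, time_b_s, clock_a_ghz=3.0, clock_b_ghz=3.0):
    """Estimate the IPC of chip A relative to chip B on the same workload.

    Assumes identical work and identical core/thread counts, so that
    time = instructions / (IPC * clock) holds for both chips.
    """
    if min(time_a_s, time_b_s, clock_a_ghz, clock_b_ghz) <= 0:
        raise ValueError("times and clocks must be positive")
    # IPC_a / IPC_b = (time_b * clock_b) / (time_a * clock_a)
    return (time_b_s * clock_b_ghz) / (time_a_s * clock_a_ghz)


# Hypothetical render times in seconds; the real figures were not published.
zen_time_s = 48.0
broadwell_e_time_s = 49.0

ratio = relative_ipc(zen_time_s, broadwell_e_time_s)
print(f"Estimated Zen IPC relative to Broadwell-E: {ratio:.3f}x")
```

With both chips held at 3.0 GHz the clock terms cancel, which is why a slightly faster finish reads as "on par or better" IPC rather than as a clock speed advantage. The caveats above still apply to any such estimate.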
AMD obviously didn’t state what specific SKUs were going to launch with the Zen architecture, what clock speeds they would run at, or even what TDPs they were targeting. Instead we were left with a vague but understandable remark of “comparable TDPs to Broadwell-E”.
Pricing? Overclocking? We’ll just have to wait a bit longer for that kind of information.
There is clearly a lot more for AMD to share about Zen, but the announcement and showcase made this week with the early prototype products have solidified for me the capability and promise of this new microarchitecture. We have asked for, and needed, as an industry, a competitor to Intel in the enthusiast CPU space – something we haven’t legitimately had since the Athlon X2 days. Zen is what we have been pining for, what gamers and consumers have needed.
AMD’s processor stars might finally be aligning for a product that combines performance, efficiency and scalability at the right time. I’m ready for it – are you?
Subject: Graphics Cards, Processors | August 17, 2016 - 01:38 PM | Scott Michaud
Tagged: Xeon Phi, larrabee, Intel
Tom Forsyth, who is currently at Oculus, was once on the core Larrabee team at Intel. Just prior to Intel's IDF conference in San Francisco, which Ryan is at and covering as I type this, Tom wrote a blog post that outlined the project and its design goals, including why it didn't hit market as a graphics device. He even goes into the details of the graphics architecture, which was almost entirely in software apart from texture units and video out. For instance, Larrabee was running FreeBSD with a program, called DirectXGfx, that gave it the DirectX 11 feature set -- and it worked on hundreds of titles, too.
Also, if you found the discussion interesting, then there is plenty of content from back in the day to browse. A good example is an Intel Developer Zone post from Michael Abrash that discussed software rasterization, doing so with several really interesting stories.