The end of the world as we know it?
A surprise to most in the industry that such a thing would ever occur, AMD and Intel announced in November a partnership that would bring Radeon graphics to Intel processors in 2018. The details were minimal at the time and only told us the specifics of the business relationship: this was a product purchase and not a license, no IP was changing hands, this was considered a semi-custom design for the AMD group, and Intel was handling all the integration and packaging. Though we knew that the product would use HBM2 memory, the same memory utilized on the RX Vega products released last year, it was possible that the “custom” part was a retrofitted Polaris architecture. The details of the processor side of this technology were also left a mystery.
Today we have our answers and our first hands-on with systems utilizing what was previously known as Kaby Lake-G and what is now officially titled the “8th Generation Intel Core Processors with Radeon RX Vega M Graphics.” I’m serious.
For what I still call Kaby Lake-G, as it is easier to type and understand, it introduces a new product line that we have not seen addressed in a very long time – high performance processors with high performance integrated graphics. Even though the combined part is not a single piece of silicon but instead a multi-chip package, it serves the same purpose in the eyes of the consumer and the OEM. The marriage of Intel’s highest performance mobile processor cores, the 8th Generation H-series, and one of, if not THE fastest mobile graphics cores in a reasonable thermal envelope, the Vega M, is incredibly intriguing for all kinds of reasons. Even the currently announced AMD APUs and those in the public roadmaps don’t offer a combined performance package as impressive as this. Ryzen Mobile is interesting in its own right, but Kaby Lake-G is on a different level.
From a business standpoint, KBL-G is a design meant to attack NVIDIA. The green giant has become one of the most important computing companies on the planet in the last couple of years, leaning into its graphics processor dominance and turning it into cash and mindshare in the world of machine learning and AI. More than any other company, Intel is worried about the growth and capability of NVIDIA. Though not as sexy as “machine learning”, NVIDIA has dominated the mobile graphics markets as well, offering discrete GPU solutions to pair with Intel processor notebooks. In turn, NVIDIA eats up much of the margin and profitability that these mainstream gaming and content creation machines can generate. Productization of things like Max-Q gives the market reason to believe that NVIDIA is the true innovator in the space, regardless of the legitimate answer to that question. Intel sees that as no bueno – it wants to remain the leader across the entire market.
Subject: Processors | January 4, 2018 - 01:15 PM | Jeremy Hellstrom
Tagged: linux, spectre, meltdown, Intel
As the Linux patch for the Intel kernel issue is somewhat more mature than the Windows patch, which was just pushed out, and because the patch may have more impact on hosting solutions than gaming machines, we turn to Phoronix for test results. Their testing overview looks at both Intel and AMD, as the PTI patch can be installed on AMD systems and it is not a bad idea to do so. The results are somewhat encouraging: CPUs with PCID (Process Context ID) support, such as Sandy Bridge and newer, seem to see little effect from the patch, network performance seems unchanged, and Xeons see far less of an effect across the board than desktop machines. That is not to say there is no impact whatsoever; in synthetic benchmarks which make frequent system calls or depend on optimized access to the kernel they did see slowdowns. Thankfully those workloads are not common for enthusiast software. Expect a lot more results from both Windows and Linux over the coming weeks.
"2018 has been off to a busy start with all the testing around the Linux x86 PTI (Page Table Isolation) patches for this "Intel CPU bug" that potentially dates back to the Pentium days but has yet to be fully disclosed. Here is the latest."
Here are some more Processor articles from around the web:
- Testing Windows 10 Performance Before and After the Meltdown Flaw Emergency Patch @ TechSpot
- 2nd-Gen Core i7 vs. 8th-Gen Core i7: RIP Sandy Bridge @ Techspot
- Intel Core i7 8700k @ Modders-Inc
- Ryzen Mobile Finally Arrives: AMD Ryzen 5 2500U @ Techspot
- Intel Core i9-7900X 3.3 GHz @ TechPowerUp
- The Best CPUs: This is what you should get @ Techspot
Subject: Processors | January 3, 2018 - 08:17 PM | Ryan Shrout
Tagged: Intel, amd, arm, meltdown, spectre, security
The following story was originally posted on ShroutResearch.com.
UPDATE 1 - 8:25pm
Just before the closing bell on Wednesday, Intel released a statement responding to the security issues brought up in this story. While acknowledging that these new security concerns do exist, the company went out of its way to insinuate that AMD, Arm Holdings, and others were at risk. Intel also states that performance impact on patched machines “should not be significant and will be mitigated over time.”
Intel’s statement is at least mostly accurate, though the report released by the Google Project Zero group responsible for finding the security vulnerability goes into much more detail. The security issue concerns a feature called “speculative execution,” in which a processor tries to predict work that will be needed ahead of time to speed up processing tasks. The paper details three variants of this particular vulnerability, the first of which applies to Intel, AMD, Arm, and nearly every other modern processor architecture. This variant is easily patched and should have near-zero effect on performance.
The second variant is deeply architecture specific, meaning attackers would need unique exploit code for each different Intel or AMD processor. This example should be exceedingly rare in the wild, and AMD goes as far as to call it a “near-zero” risk for its systems.
The third is where things are more complex and where the claim that AMD processors are not susceptible is confirmed. This variant is the source of the leaks that filtered out ahead of disclosure and the subject of the story below. In its statement, AMD makes clear that due to architectural design differences in its products, past and present processors from its family are not at risk.
The final outlook from this story looks very similar to how it did early on Wednesday though with a couple of added wrinkles. The security report released by Project Zero indicates that most modern hardware is at risk though to different degrees based on the design of the chips themselves. Intel is not alone in this instance, but it does have additional vulnerabilities that other processor designs do not incur. To insinuate otherwise in its public statement is incorrect.
As for performance impact, most of the initial testing and speculation is likely exaggerating how it will change the landscape, if at all. Neither Intel nor AMD see a “doomsday” scenario of regressing computing performance because of this security patch.
At the end of 2017, Intel CEO Brian Krzanich said his company would be going through changes in the New Year, becoming more aggressive, and taking the fight to its competitors in new and existing markets. It seems that BK will have his first opportunity to prove out this new corporate strategy with a looming security issue that affects nearly 10 years of processors.
A recently revealed hardware bug in Intel processors is coming to light as operating system vendors like Microsoft and the Linux community scramble to update platforms to avoid potential security concerns. This bug has been rumored for some time, with updates to core Linux software packages indicating that a severe vulnerability was being fixed, but with comments redacted when published. Security flaws are often kept secret to avoid being exploited by attackers until software patches are available to correct them.
This hardware-level vulnerability allows user-mode applications, those run by general consumers or businesses, to potentially gain access to kernel-level memory space, an area that is handled by the operating system exclusively and can contain sensitive information like passwords, biometrics, and more. An attacker could also use this flaw to potentially access other user-mode applications' data, compromising entire systems by bypassing the operating system's built-in memory protections.
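The mechanics of the leak are easiest to see in a toy model. Below is a deliberately simplified Python sketch of the cache-timing side channel at the heart of this class of attack (the names and structure are my own illustration, not code from any actual proof of concept): the speculative load is squashed architecturally, but the cache line it touched stays warm, and the attacker recovers the secret by checking which probe line is "fast."

```python
# Toy model of the cache side channel behind this class of attack.
# A real exploit measures memory access latency; here a plain set stands
# in for "which cache lines are resident" so the mechanism is visible.

CACHE_LINE = 64  # bytes per cache line

def transient_leak(secret_byte, cache):
    """Model the transiently executed load: the architectural result is
    discarded, but the dependent memory access leaves a cache footprint."""
    cache.add(secret_byte * CACHE_LINE)  # side effect survives the squash

def recover(cache):
    """The attacker probes all 256 candidate lines; the one that is
    already cached (i.e. fast to access) reveals the secret byte."""
    for candidate in range(256):
        if candidate * CACHE_LINE in cache:
            return candidate
    return None

cache = set()
transient_leak(0x42, cache)
print(hex(recover(cache)))  # the "secret" byte, read back via the side channel
```

The patches being rolled out do not remove speculation itself; they unmap kernel memory from the user-mode page tables so the transient load in step one has nothing to read.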
At a time when Intel is being pressured from many different angles and markets, this vulnerability could hardly be less welcome. AMD spent its 2017 releasing competitive products in the consumer space with Ryzen and the enterprise space with EPYC. The enterprise markets in particular are at risk for Intel. The EPYC processors already offered performance and pricing advantages, and now AMD can showcase security, as none of its processors are affected by the same vulnerability that Intel is saddled with. Though the enterprise space works in cycles, and AMD won’t see an immediate uptick in sales, I would be surprised if this did not push more cloud providers and large scale server deployments to look at the AMD offerings.
At this point, only the Linux community has publicly discussed the fixes taking place, with initial patches going out earlier this week. Much of the enterprise and cloud ecosystem runs on Linux-based platforms, and securing these systems against attack is a crucial step. Microsoft has yet to comment publicly on what its software updates will look like, when they will be delivered, and what impact they might have on consumer systems.
While hardware and software vulnerabilities are common in today’s connected world, there are two key points that make this situation more significant. First, this is a hardware bug, meaning that it cannot be fixed or addressed completely without Intel making changes to its hardware design, a process that can take months or years to complete. As far as we can tell, this bug will affect ALL Intel processors released in the last decade or more, including enterprise Xeon processors and consumer Core and Pentium offerings. And as Intel has been the dominant market leader in both the enterprise and consumer spaces, there are potentially hundreds of millions of affected systems in the field.
The second differentiating point for this issue is that the software fix could impact the performance of systems. Initial numbers claimed as much as a 30% reduction in performance, but those results are likely worst case scenarios. Some early testing of updated Linux platforms indicates performance could decrease by 6-20% depending on the application, while other testing of consumer workloads, including gaming, shows almost no performance impact. Linux founder and active developer Linus Torvalds says the performance impact will range from nothing to “double-digit slowdowns.”
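The reason the numbers vary so widely is that the patch makes every user-to-kernel transition more expensive, so syscall-heavy workloads are the ones to watch. A quick, hypothetical micro-benchmark along these lines (the function name and loop are my own sketch, run before and after patching) shows the kind of measurement behind those Linux results:

```python
import os
import time

def syscalls_per_second(duration=0.5):
    """Issue a trivial syscall (stat) in a tight loop and count throughput.

    The PTI patch adds a page-table switch on every kernel entry and exit,
    so a syscall-bound loop like this is close to the worst case for the
    fix; compute-bound code like a game barely touches the kernel and
    should barely move."""
    count = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        os.stat(".")  # one user -> kernel -> user round trip
        count += 1
    return count / duration

print(f"{syscalls_per_second():,.0f} stat() calls/sec")
```

Databases, file servers, and virtualization hosts sit much closer to the syscall-bound end of that spectrum than desktop software does, which is why the enterprise side is watching these patches so closely.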
Even though the true nature of this vulnerability is still locked behind non-disclosure agreements, it is unlikely that there will be a double-digit performance reduction on servers at mass scale when these updates are pushed out. Intel has been aware of this vulnerability for some time, and financially it would need to plan for any kind of product replacement or reimbursement campaign it might undertake with partners and customers.
Subject: General Tech, Processors | December 12, 2017 - 04:52 PM | Tim Verry
Tagged: training, nnp, nervana, Intel, flexpoint, deep learning, asic, artificial intelligence
Intel recently provided a few insights into its upcoming Nervana Neural Network Processor (NNP) on its blog. Built in partnership with deep learning startup Nervana Systems, which Intel acquired last year for over $400 million, the AI-focused chip previously codenamed Lake Crest is built on a new architecture designed from the ground up to accelerate neural network training and AI modeling.
The full details of the Intel NNP are still unknown, but it is a custom ASIC with a Tensor-based architecture placed on a multi-chip module (MCM) along with 32GB of HBM2 memory. The Nervana NNP supports optimized and power efficient Flexpoint math, and interconnectivity is a huge part of this scalable platform. Each AI accelerator features 12 processing clusters (with an as-yet-unannounced number of "cores" or processing elements) paired with 12 proprietary inter-chip links that are 20 times faster than PCI-E, four HBM2 memory controllers, a management-controller CPU, as well as standard SPI, I2C, GPIO, PCI-E x16, and DMA I/O. The processor is designed to be highly configurable and to meet both model and data parallelism goals.
The processing elements are all software controlled and can communicate with each other using high speed bi-directional links at up to a terabit per second. Each processing element has more than 2MB of local memory, and the Nervana NNP has 30MB of local memory in total. Memory accesses and data sharing are managed with QoS software, which controls adjustable bandwidth over multiple virtual channels with multiple priorities per channel. Processing elements can send and receive data between each other and the HBM2 stacks locally, as well as off die to processing elements and HBM2 on other NNP chips. The idea is to allow as much internal sharing as possible and to keep data stored and transformed in local memory as much as possible. This saves precious HBM2 bandwidth (1TB/s) for pre-fetching upcoming tensors, reduces the number of hops and the resulting latency by not having to go out to the HBM2 memory and back to transfer data between cores and/or processors, and saves power. This setup also helps Intel achieve an extremely parallel and scalable platform where multiple Nervana NNP co-processors on the same and remote boards effectively act as a massive singular compute unit!
Intel's Flexpoint format is also at the heart of the Nervana NNP and allegedly allows Intel to achieve results similar to FP32 with twice the memory bandwidth while being more power efficient than FP16. Flexpoint is used for the scalar math required for deep learning and uses fixed point 16-bit multiply and addition operations with a shared 5-bit exponent. Unlike FP16, Flexpoint uses all 16 bits for the mantissa and passes the exponent in the instruction. The NNP architecture also features zero cycle transpose operations and optimizations for matrix multiplication and convolutions to make efficient use of the silicon.
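The shared-exponent idea can be sketched in a few lines. This is a toy NumPy approximation of Flexpoint-style quantization (my own illustration; Intel's actual format, exponent management, and rounding are considerably more sophisticated): every value in a tensor shares one exponent, chosen so the largest magnitude fills a signed 16-bit mantissa, and the hardware then only needs cheap fixed-point multiply/add.

```python
import numpy as np

def flexpoint_quantize(x):
    """Quantize a tensor to 16-bit integer mantissas with one shared exponent.

    The exponent is derived from the tensor's largest magnitude so the
    whole tensor fits in int16; the exponent itself travels out of band
    (in Flexpoint's case, with the instruction)."""
    x = np.asarray(x, dtype=np.float64)
    max_mag = np.abs(x).max()
    if max_mag == 0.0:
        return np.zeros(x.shape, dtype=np.int16), 0
    # scale so max_mag lands near the top of the 15-bit magnitude range
    exp = int(np.floor(np.log2(max_mag))) - 14
    mant = np.clip(np.round(x / 2.0**exp), -32768, 32767).astype(np.int16)
    return mant, exp

def flexpoint_dequantize(mant, exp):
    """Recover approximate floating point values from mantissa + exponent."""
    return mant.astype(np.float64) * 2.0**exp

weights = np.array([1.0, -0.5, 0.25, 0.0078125])
mant, exp = flexpoint_quantize(weights)
restored = flexpoint_dequantize(mant, exp)
```

The trade-off relative to FP16 is clear from the sketch: within a tensor you get a full 16 bits of precision instead of 11, but all values must share a dynamic range, which works well for neural network tensors whose values tend to cluster.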
Software control allows users to dial in the performance for their specific workloads, and since many of the math operations and data movement are known or expected in advance, users can keep data as close to the compute units working on that data as possible while minimizing HBM2 memory accesses and data movements across the die to prevent congestion and optimize power usage.
Intel is currently working with Facebook and hopes to have its deep learning products out early next year. The company may have axed Knights Hill, but it is far from giving up on this extremely lucrative market as it continues to push towards exascale computing and AI. Intel is targeting a 100x increase in neural network training performance by 2020, which is a tall order, but Intel throwing its weight around in this ring should give GPU makers pause: such an achievement could cut heavily into their GPGPU-powered entries into a market that is only just starting to heat up.
You won't be running Crysis or even Minecraft on this thing, but you might be using software on your phone for augmented reality or in your autonomous car that is running inference routines on a neural network that was trained on one of these chips soon enough! It's specialized and niche, but still very interesting.
- Intel Launches Stratix 10 FPGA With ARM CPU and HBM2
- Intel's Nervana chip targets Nvidia on artificial intelligence
- New AI products will Crest Computex
- Intel to Ship FPGA-Accelerated Xeons in Early 2016
- Intel Kills Knights Hill, Will Launch Xeon Phi Architecture for Exascale Computing @ ExtremeTech
- NVIDIA Discusses Multi-Die GPUs
Subject: Processors | December 3, 2017 - 03:16 PM | Scott Michaud
Tagged: Intel, Cannonlake, 10nm
According to Fudzilla’s unnamed, “well-placed” sources, Intel could have already launched a 10nm CPU, but they are waiting until yields get better. This comment can be parsed in multiple ways. If they mean that “yeah, we could have a 10nm part out, but not covering our entire product stack and our yields would be so bad that we’d have shortages for several months” then, well, yeah. That is a bit of a “duh” comment. Intel can technically make a 10nm product if you don’t care about yields, supply, and intended TDP.
If, however, the comment means something along the lines of “we currently have a worst-case yield of 85%, but we’re waiting until we cross 90%” then… I doubt it’s true (or, at least, it’s not the whole truth). Coffee Lake is technically (if you count Broadwell) their fourth named 14nm architecture. I would expect that Intel’s yields would need to be less-than-mediocre to delay 10nm for this long. Their reaction to AMD seems to be a knee-jerk “add cores” with a little “we’re still the best single-threaded tech” on the side. Also, it looks like they have fallen behind the other fabs, which are mostly shipping their 10nm-class nodes in mobile parts.
I doubt Intel would let all that stigma propagate just to get a few extra percent yield at launch.
Of course, I could be wrong. It just seems like the “we’re waiting for better yields” argument is a little more severe than the post is letting on. They would have pushed out a product by now if it was viable-but-suboptimal, right? That would have been the lesser of two evils, right?
Subject: Processors | November 28, 2017 - 03:39 PM | Jeremy Hellstrom
Tagged: Ryzen 5 2500U, Envy x360, amd
HP released a Ryzen powered laptop recently, the Envy x360, which The Tech Report used to test out the performance of the Ryzen 5 2500U. The APU sports four cores with a base clock of 2.0GHz, boosting to 3.6GHz, and eight GPU CUs clocked at 1100 MHz. In order to level the playing field when comparing it to Intel-powered gaming laptops, they installed a Samsung 960 EVO 500GB NVMe SSD, which sadly is not the drive the Envy ships with. The mobile chip's GPU followed a pattern similar to desktop Vega GPUs, offering a bit better performance at lower resolutions but vastly outpacing Intel's integrated GPU at higher resolutions. You will still be better off with a discrete mobile GPU playing The Witcher 3 at 1600x900, but the fact that the Ryzen can hit 24fps with decent frame times is very impressive indeed.
It might even run faster once you remove that certain piece of software recently installed on HP laptops.
"AMD's Ryzen 5 2500U pairs the competitive performance of four Zen CPU cores with eight compute units of Vega graphics power in a notebook-friendly power envelope. We put the Ryzen 5 2500U to the test aboard HP's Envy x360 laptop to see whether the fusion of Zen and Vega results in the best APU yet."
Here are some more Processor articles from around the web:
- Intel's Core i5-8250U @ The Tech Report
- 6-Way Enterprise Focused Linux Distribution Comparison With An Intel Core i9, Dual Xeon Gold Systems @ Phoronix
- 4th-Gen Core i7 vs. 8th-Gen Core i7 @ Techspot
Subject: Processors | November 16, 2017 - 04:38 PM | Jeremy Hellstrom
Tagged: amd, EPYC, 7401P
AMD's new EPYC server chips range in price from around $4000 for the top end 32 core 7601 to around $500 for the 8 core 7251, with the $1000, 24 core EPYC 7401P sitting towards the middle of the family. Phoronix have tested quite a few of these processors, today focusing on the aforementioned 7401P, pitting it against several other EPYC processors, several Xeon E3 and E5 models, and a Xeon Gold and Silver. To say that AMD showed up Intel in multithreaded performance is somewhat of an understatement, as you can see in their benchmarks. Indeed, in many cases you need around $5000 worth of Intel CPU to compete with the 7401P, and even then Intel lags behind in many tests. The only shortcoming of the 7401P is that it can only be run in single socket configurations – not that you necessarily need two of these chips!
"We've been looking at the interesting AMD EPYC server processors recently from the high-end EPYC 7601 to the cheapest EPYC 7251 at under $500 as well as the EPYC 7351P that offers 16 cores / 32 threads for only about $750. The latest EPYC processor for testing at Phoronix has been the EPYC 7401P, a 24 core / 48 thread part that is slated to retail for around $1075 USD."
Here are some more Processor articles from around the web:
- AMD EPYC 7551 @ Phoronix
- Core i5-8400 vs. Overclocked Ryzen 5 1600 @ TechSpot
- In Hindsight: Some of the Worst CPU/GPUs Purchases of 2017 @ TechSpot
- The Latest In Our Massive Linux Benchmarking Setup - November 2017 @ Phoronix
- i7-2600K vs. i7-8700K - Is Upgrading Worthwhile? @ Hardware Canucks
Subject: General Tech, Processors | November 9, 2017 - 02:30 PM | Ken Addison
Tagged: Skull Canyon, nuc, kaby lake-g, Intel, Hades Canyon VR, Hades Canyon, EMIL, amd
Hot on the heels of Intel's announcement of new mobile-focused CPUs integrating AMD Radeon graphics, we have our first glimpse at a real-world design using this new chip.
Posted earlier today on the infamous Chinese tech forum Chiphell, this photo appears to show a small form factor PC design integrating the new Kaby Lake-G CPU and GPU solution.
Looking at the standard size components on the board like the Samsung M.2 SSD and the DDR4 SODIMM memory modules, we can start to get a better idea of the actual size of the Kaby Lake-G module.
Additionally, we get our first look at the type of power delivery infrastructure that devices with Kaby Lake-G are going to require. It's impressive how small the motherboard is, taking into account all of the power phases needed to feed the CPU, GPU, and HBM2 memory.
Looking back at the leaked NUC roadmap from September, the picture starts to become more clear. While the "Hades Canyon" NUCs on this roadmap threw us for a loop when we first saw them months ago, it's now clear that they reference the new Kaby Lake-G line of products. The plethora of IO options from the roadmap, including dual Gigabit Ethernet and two Thunderbolt 3 ports, also seems to match closely with the leaked NUC photo above.
Using this information, we also now have a better idea of the thermal and power requirements for Kaby Lake-G. The base "Hades Canyon" NUC is listed with a 65W processor, while the "Hades Canyon VR" is listed as a 100W part. This suggests that devices retain the same levels of CPU performance as the existing Kaby Lake-H quad core mobile CPUs, which clock in at 35W, plus roughly 30W or 65W of graphics power budget.
These leaked 3DMark scores might give us an idea of the performance of the Hades Canyon VR NUC.
One thing is clear; Hades Canyon will be the highest power NUC Intel has ever produced, surpassing the 45W Skull Canyon. Considering Skull Canyon's already unusually large footprint for a NUC, I'm interested to see the final form of Hades Canyon as well as the performance it brings!
With what looks to be a first half 2018 release date on the roadmap, it seems likely that we could see this NUC or other similar devices being shown off at CES in January. Stay tuned for more continuing coverage of Intel's Kaby Lake-G and upcoming devices featuring it!
Subject: Processors | November 8, 2017 - 02:03 PM | Ryan Shrout
Tagged: qualcomm, centriq 2400, centriq, arm
At an event in San Jose on Wednesday, Qualcomm and partners officially announced that its Centriq 2400 server processor based on the Arm architecture is shipping to commercial clients. This launch is of note as it becomes the highest-profile and most partner-lauded Arm-based server CPU and platform to be released after years of buildup and excitement around several similar products. The Centriq is built specifically for enterprise cloud workloads with an emphasis on high core count and high throughput, and will compete against Intel’s Xeon Scalable and AMD’s new EPYC platforms.
Paul Jacobs shows Qualcomm Centriq to press and analysts
Built on the same 10nm process technology from Samsung that gave rise to the Snapdragon 835, the Centriq 2400 becomes the first server processor in that particular node. While Qualcomm and Samsung tout that as a significant selling point, on its own it doesn’t hold much value. Where it does come into play is the resulting power efficiency it brings to the table. Qualcomm claims that the Centriq 2400 will “offer exceptional performance-per-watt and performance-per-dollar” compared to competing server options.
The raw specifications and capabilities of the Centriq 2400 are impressive.
| | Centriq 2460 | Centriq 2452 | Centriq 2434 |
|---|---|---|---|
| Process Tech | 10nm (Samsung) | 10nm (Samsung) | 10nm (Samsung) |
| Base Clock | 2.2 GHz | 2.2 GHz | 2.3 GHz |
| Max Clock | 2.6 GHz | 2.6 GHz | 2.5 GHz |
| Memory Speed | 2667 MHz | 2667 MHz | 2667 MHz |
| Cache | 24MB L2, split | 23MB L2, split | 20MB L2, split |
| PCIe | 32 lanes PCIe 3.0 | 32 lanes PCIe 3.0 | 32 lanes PCIe 3.0 |
Built from 18 billion transistors in a die area of just 398mm2, the SoC holds 48 high-performance 64-bit cores running at frequencies as high as 2.6 GHz. (Interestingly, this appears to be about the same peak clock rate as all the Snapdragon processor cores we have seen on consumer products.) The cores are interconnected by a bi-directional ring bus that is reminiscent of the design Intel used on its Core processor family up until Skylake-SP was brought to market. The bus supports 250 GB/s of aggregate bandwidth, and Qualcomm claims that this will alleviate any concern over congestion bottlenecks, even with the CPU cores under full load.
The caching system provides 512KB of L2 cache for every pair of CPU cores, essentially organizing them into dual-core blocks. 60MB of L3 cache handles core-to-core communications, and that cache is physically distributed around the die for on-average faster access. A 6-channel DDR4-2667 memory system supports a total of 768GB of capacity.
Connectivity is supplied by 32 lanes of PCIe 3.0, supporting up to 6 PCIe devices.
As you should expect, the Centriq 2400 supports the ARM TrustZone secure operating environment and hypervisors for virtualized environments. With this many cores on a single chip, virtualization seems likely to be one of the key use cases for the server CPU.
Maybe most impressive is the power requirements of the Centriq 2400. It can offer this level of performance and connectivity with just 120 watts of power.
With a price of $1995 for the Centriq 2460, Qualcomm claims that it can offer “4X better performance per dollar and up to 45% better performance per watt versus Intel’s highest performance Skylake processor, the Intel Xeon Platinum 8180.” That’s no small claim. The 8180 is a 28-core/56-thread CPU with a peak frequency of 3.8 GHz, a TDP of 205 watts, and a cost of $10,000 (not a typo).
Qualcomm showed performance metrics from industry standard SPECint measurements, both in raw single thread results and in performance per dollar and per watt. I will have more on the performance story of Centriq later this week.
More important than simply showing hardware, Qualcomm had several partners on hand at the press event, as well as statements of support from important vendors like Alibaba, HPE, Google, Microsoft, and Samsung. Present to showcase applications running on the Arm-based server platforms was an impressive list of key cloud services providers and ecosystem partners: Alibaba, LinkedIn, Cloudflare, American Megatrends Inc., Arm, Cadence Design Systems, Canonical, Chelsio Communications, Excelero, Hewlett Packard Enterprise, Illumina, MariaDB, Mellanox, Microsoft Azure, MongoDB, Netronome, Packet, Red Hat, ScyllaDB, 6WIND, Samsung, Solarflare, Smartcore, SUSE, Uber, and Xilinx.
The Centriq 2400 series of SoC isn’t perfect for all general-purpose workloads and that is something we have understood from the outset of this venture by Arm and its partners to bring this architecture to the enterprise markets. Qualcomm states that its parts are designed for “highly threaded cloud native applications that are developed as micro-services and deployed for scale-out.” The result is a set of workloads that covers a lot of ground:
- Web front end with HipHop Virtual Machine
- NoSQL databases including MongoDB, Varnish, Scylladb
- Cloud orchestration and automation including Kubernetes, Docker, metal-as-a-service
- Data analytics including Apache Spark
- Deep learning inference
- Network function virtualization
- Video and image processing acceleration
- Multi-core electronic design automation
- High throughput compute bioinformatics
- Neural class networks
- OpenStack Platform
- Scaleout Server SAN with NVMe
- Server-based network offload
I will be diving more into the architecture, system designs, and partner announcements later this week as I think the Qualcomm Centriq 2400 family will have a significant impact on the future of the enterprise server markets.
Subject: Processors | November 6, 2017 - 02:00 PM | Josh Walrath
Tagged: radeon, Polaris, mobile, kaby lake, interposer, Intel, HBM2, gaming, EMIB, apple, amd, 8th generation core
In what is probably considered one of the worst kept secrets in the industry, Intel has announced a new CPU line for the mobile market that integrates AMD’s Radeon graphics. For the past year or so rumors of such a partnership were freely flowing, but now we finally get confirmation as to how this will be implemented and marketed.
Intel’s record on designing GPUs has been rather pedestrian. While they have kept up with the competition, a slew of small issues and incompatibilities has plagued each generation. Performance is also an issue when trying to compete with AMD’s APUs as well as discrete mobile graphics offerings from both AMD and NVIDIA. Software and driver support is another area where Intel has been unable to compete, due largely to economics and the competition’s decades of experience in this area.
Many of these significant issues have been solved in one fell swoop. Intel has partnered with AMD’s Semi-Custom Group to develop a modern and competent GPU that can be closely connected to the Intel CPU, all the while utilizing HBM2 memory to improve overall performance. The packaging of this product utilizes Intel’s EMIB (Embedded Multi-die Interconnect Bridge) tech.
EMIB is an interposer-like technology that integrates silicon bridges into the PCB instead of relying upon a large interposer. This allows a bit more flexibility in layout of the chips as well as lowers the Z height of the package as there is not a large interposer sitting between the chips and the PCB. Just as interposer technology allows the use of chips from different process technologies to work seamlessly together, EMIB provides that same flexibility.
The GPU looks to be based on the Polaris architecture, which is a slight step back from AMD’s cutting edge Vega architecture. Polaris does not implement the Infinity Fabric component that Vega does; it is more conventional in terms of data communication. It is still a step beyond what AMD has provided for Sony and Microsoft, who each utilize a semi-custom design for the latest console chips, because AMD is able to integrate the HBM2 controller that is featured in Vega. Using HBM2 provides a tremendous amount of bandwidth along with power savings as compared to traditional GDDR5 memory modules. It also saves dramatically on PCB space, allowing for smaller form factors.
EMIB provides nearly all of the advantages of the interposer while keeping the optimal z-height of the standard PCB substrate.
Intel did have to do quite a bit of extra work on the power side of the equation. AMD utilizes its latest Infinity Fabric for fine grained power control in its upcoming Raven Ridge based Ryzen APUs. Intel had to modify its current hardware to do much the same work with 3rd party silicon. This is no easy task, as the CPU needs to monitor and continually adjust for GPU usage in a variety of scenarios. This type of work takes time and a lot of testing to fine tune, as well as the inevitable hardware revisions to get things working correctly. It then needs to be balanced against the GPU driver stack, which also tends to take control of power usage in mobile scenarios.
This combination of EMIB, an Intel Kaby Lake CPU, HBM2, and a current AMD GPU makes for a very interesting option in the mobile and small form factor markets. EMIB provides very fast interconnect speeds and a smaller footprint thanks to the integration of HBM2 memory. The mature AMD Radeon software stack for both Windows and macOS gives Intel another feature with which to sell its parts in areas where it was previously not considered. The 8th Gen Kaby Lake CPU provides the very latest CPU design on the new 14nm++ process for greater performance and better power efficiency.
This is one of those rare instances where cooperation between intense rivals actually improves the situation for both. AMD gets a financial shot in the arm by signing a large and important customer for its Semi-Custom division, and the income from this partnership should be more consistent than the seasonal revenue the console manufacturers provide. This will have a very material effect on AMD’s bottom line for years to come. Intel gets a solid silicon solution with higher graphics performance than it can offer on its own, as well as the aforementioned mature software stack across multiple operating systems. Throw in the HBM2 memory support for better power efficiency and a smaller form factor, and it is a clear win for all parties involved.
The PCB savings plus faster interconnects will allow these chips to power smaller form factors with better performance and battery life.
One of the unknowns here is what process node the GPU portion will be manufactured on. We do not know which foundry Intel will use, or if it will stay in-house. Currently TSMC manufactures the latest console SoCs while GLOBALFOUNDRIES handles the latest GPUs from AMD. Initially one would expect Intel to build the GPU in house, but the current rumor is that AMD will produce the chips with one of its traditional foundry partners. Once manufactured, the chip is sent to Intel to be integrated into the final product.
Apple is one of the obvious candidates for this particular form factor and combination of parts. Apple has a long history with Intel on the CPU side and AMD on the GPU side. This product provides all of the solutions Apple needs to manufacture high performance products in smaller form factors. Gaming laptops also get a boost from such a combination that will offer relatively high performance with minimal power increases as well as the smaller form factor.
The potential (leaked) performance of the 8th Gen Intel CPU with Radeon Graphics.
The data above could very well be wrong about the potential performance of this combination, but what we see is pretty compelling. The Intel/AMD product performs like a higher-end CPU with discrete GPU combo: it is faster than an NVIDIA GTX 1050 Ti and trails the GTX 1060. It is also significantly faster than a desktop AMD RX 560, and much faster than the flagship 15 watt TDP AMD Ryzen 7 2700U. We do not yet know how it compares to the rumored 65 watt TDP Raven Ridge based APUs from AMD that will likely be released next year. What will be fascinating is how much power the new Intel combination draws as compared to the discrete solutions utilizing NVIDIA graphics.
To reiterate, this is Intel as a customer of AMD’s Semi-Custom group rather than a licensing agreement between the two companies. They are working hand in hand to develop this solution and then both profiting from it. Revenue from every Intel package sold featuring this technology will have a very positive effect on AMD’s earnings, while Intel gets a cutting-edge, competent graphics solution along with the improved software and driver support such a package includes.
Update: We have been informed that AMD is producing the chips and selling them directly to Intel for integration into these new SKUs. There are no royalties or licensing, but the Semi-Custom division should still receive the revenue for these specialized products made only for Intel.