Subject: Processors | September 30, 2015 - 09:55 PM | Josh Walrath
Tagged: TSMC, Samsung, FinFET, apple, A9, 16 nm, 14 nm
So the other day the nice folks over at Chipworks got word that Apple was in fact sourcing their A9 SOC at both TSMC and Samsung. This is really interesting news on multiple fronts. From the information gleaned the two parts are the APL0898 (Samsung fabbed) and the APL1022 (TSMC).
These process technologies have been in the news quite a bit. As we well know, it has been a hard time for any foundry to go under 28 nm in an effective way if your name is not Intel. Even Intel has had some pretty hefty issues with their march to sub 32 nm parts, but they have the resources and financial ability to push through a lot of these hurdles. One of the bigger problems that affected the foundries was the idea that they could push back FinFETs beyond what they were initially planning. The idea was to hit 22/20 nm and use planar transistors and push development back to 16/14 nm for FinFET technology.
The Chipworks graphic that explains the differences between Samsung's and TSMC's A9 products.
There were many reasons why this did not work in an effective way for the majority of products that the foundries were looking to service with a 22/20 nm planar process. Yes, there were many parts that were fabricated using these nodes, but none of them were higher power/higher performance parts that typically garner headlines. No CPUs, no GPUs, and only a handful of lower power SOCs (most notably Apple's A8, which was around 89 mm squared and consumed up to 5 to 10 watts at maximum). The node just did not scale power very effectively. It provided a smaller die size, but it did not increase power efficiency and switching performance significantly as compared to 28 nm high performance nodes.
The information Chipworks has provided also verifies that Samsung's 14 nm FF process is more size optimized than TSMC's 16 nm FF. There was originally some talk about both nodes being very similar in overall transistor size and density, but Samsung has a slightly tighter design. Neither of them are smaller than Intel's latest 14 nm which is going into its second generation form. Intel still has a significant performance and size advantage over everyone else in the field. Going back to size we see the Samsung chip is around 96 mm square while the TSMC chip is 104.5 mm square. This is not huge, but it does show that the Samsung process is a little tighter and can squeeze more transistors per square mm than TSMC.
In terms of actual power consumption and clock scaling we have nothing to go on here. The chips are both represented in the 6S and 6S+. Testing so far has not shown there to be significant differences between the two SOCs so far. In theory one could be performing better than the other, but in reality we have not tested these chips at a low enough level to discern any major performance or power issue. My gut feeling here is that Samsung's process is more mature and running slightly better than TSMC's, but the differences are going to be minimal at best.
The next piece of info that we can glean from this is that there just isn't enough line space for all of the chip companies who want to fabricate their parts with either Samsung or TSMC. From a chip standpoint a lot of work has to be done to port a design to two different process nodes. While 14 and 16 are similar in overall size and the usage of FinFETS, the standard cells and design libraries for both Samsung and TSMC are going to be very different. It is not a simple thing to port over a design. A lot of work has to be done in the design stage to make a chip work with both nodes. I can tell you that there is no way that both chips are identical in layout. It is not going to be a "dumb port" where they just adjust the optics with the same masks and magically make these chips work right off the bat. Different mask sets for each fab, verification of both designs, and troubleshooting the yields by metal layer changes will be different for each manufacturer.
In the end this means that there just simply was not enough space at either TSMC or Samsung to handle the demand that Apple was expecting. Because Apple has deep pockets they contracted out both TSMC and Samsung to produce two very similar, but still different parts. Apple also likely outbid and locked down what availability to process wafers that Samsung and TSMC have, much to the dismay of other major chip firms. I have no idea what is going on in the background with people like NVIDIA and AMD when it comes to line space for manufacturing their next generation parts. At least for AMD it seems that their partnership with GLOBALFOUNDRIES and their version of 14 nm FF is having a hard time taking off. Eventually more space will be made in production and yields and bins will improve. Apple will stop taking up so much space and we can get other products rolling off the line. In the meantime, enjoy that cutting edge iPhone 6S/+ with the latest 14/16 nm FF chips.
Subject: Processors | September 27, 2015 - 07:01 AM | Scott Michaud
Tagged: Skylake, iris pro, Intel, Broadwell
Thanks to the Tech Report for pointing this out, but some recent stock level troubles with Skylake and Broadwell have been overcome. Both Newegg and Amazon have a few Core i7-6700Ks that are available for purchase, and both also have the Broadwell Core i7s and Core i5s with Iris Pro graphics. Moreover, Microcenter has stock of the Skylake processor at some of their physical stores with the cheapest price tag of all, but they do not have the Broadwell chips with Iris Pro (they are not even listed).
You'll notice that Skylake is somewhat cheaper than the Core i7 Broadwell, especially on Newegg. That is somewhat expected, as Broadwell with Iris Pro is a larger die than Skylake with an Intel HD 530. A bigger die means that fewer can be cut from a wafer, and thus each costs more (unless the smaller die has a relatively high amount of waste to compensate of course). Also, if you go with Broadwell, you will miss out on the Z170 chipset, because they still use Haswell's LGA-1150 socket.
On the other hand, despite being based on an older architecture and having much less thermal headroom, you can find some real-world applications that really benefit from the 128 MB of L4 Cache that Iris Pro brings, even if the iGPU itself is unused. The graphics cache can be used by the main processor. In Project Cars, again, according to The Tech Report, the i7-5775C measured a 5% increase in frame rate over the newer i7-6700k -- when using a GeForce GTX 980. Granted, this was before the FCLK tweak on Skylake so there are a few oranges mixed with our apples. PCIe rates might be slightly different now.
Regardless, they're all available now. If you were awaiting stock, have fun.
Subject: Graphics Cards, Processors | September 17, 2015 - 09:33 PM | Scott Michaud
Tagged: Skylake, kaby lake, iris pro, Intel, edram
Update: Sept 17, 2015 @ 10:30 ET -- To clarify: I'm speaking of socketed desktop Skylake. There will definitely be Iris Pro in the BGA options.
Before I begin, the upstream story has a few disputes that I'm not entirely sure on. The Tech Report published a post in September that cited an Intel spokesperson, who said that Skylake would not be getting a socketed processor with eDRAM (unlike Broadwell did just before Skylake launched). This could be a big deal, because the fast, on-processor cache could be used by the CPU as well as the RAM. It is sometimes called “128MB of L4 cache”.
Later, ITWorld and others posted stories that said Intel killed off a Skylake processor with eDRAM, citing The Tech Report. After, Scott Wasson claimed that a story, which may or may not be ITWorld's one, had some “scrambled facts” but wouldn't elaborate. Comparing the two articles doesn't really illuminate any massive, glaring issues, but I might just be missing something.
Update: Sept 18, 2015 @ 9:45pm -- So I apparently misunderstood the ITWorld article. They were claiming that Broadwell-C was discontinued, while The Tech Report was talking about Socketed Skylake with Iris Pro. I thought they both were talking about the latter. Moreover, Anandtech received word from Intel that Broadwell-C is, in fact, not discontinued. This is odd, because ITWorld said they had confirmation from Intel. My guess is that someone gave them incorrect information. Sorry that it took so long to update.
In the same thread, Ian Cutress of Anandtech asked whether The Tech Report benchmarked the processor after Intel tweaked its FCLK capabilities, which Scott did not (but is interested in doing so). Intel addressed a slight frequency boost between the CPU and PCIe lanes after Skylake shipped, which naturally benefits discrete GPUs. Since the original claim was that Broadwell-C is better than Skylake-K for gaming, giving a 25% boost to GPU performance (or removing a 20% loss, depending on how you look at it) could tilt Skylake back above Broadwell. We won't know until it's benchmarked, though.
Iris Pro and eDRAM, while skipping Skylake, might arrive in future architectures though, such as Kaby Lake. It seems to have been demonstrated that, in some situations, and ones relevant to gamers at that, that this boost in eDRAM can help computation -- without even considering the compute potential of a better secondary GPU. One argument is that cutting the extra die room gives Intel more margins, which is almost definitely true, but I wonder how much attention Kaby Lake will get. Especially with AVX-512 and other features being debatably removed, it almost feels like Intel is treating this Tock like a Tick, since they didn't really get one with Broadwell, and Kaby Lake will be the architecture that will lead us to 10nm. On the other hand, each of these architectures are developed by independent teams, so I might be wrong in comparing them serially.
That is a lotta SKUs!
The slow, gradual release of information about Intel's Skylake-based product portfolio continues forward. We have already tested and benchmarked the desktop variant flagship Core i7-6700K processor and also have a better understanding of the microarchitectural changes the new design brings forth. But today Intel's 6th Generation Core processors get a major reveal, with all the mobile and desktop CPU variants from 4.5 watts up to 91 watts, getting detailed specifications. Not only that, but it also marks the first day that vendors can announce and begin selling Skylake-based notebooks and systems!
All indications are that vendors like Dell, Lenovo and ASUS are still some weeks away from having any product available, but expect to see your feeds and favorite tech sites flooded with new product announcements. And of course with a new Apple event coming up soon...there should be Skylake in the new MacBooks this month.
Since I have already talked about the architecture and the performance changes from Haswell/Broadwell to Skylake in our 6700K story, today's release is just a bucket of specifications and information surround 46 different 6th Generation Skylake processors.
Intel's 6th Generation Core Processors
At Intel's Developer Forum in August, the media learned quite a bit about the new 6th Generation Core processor family including Intel's stance on how Skylake changes the mobile landscape.
Skylake is being broken up into 4 different line of Intel processors: S-series for desktop DIY users, H-series for mobile gaming machines, U-series for your everyday Ultrabooks and all-in-ones, Y-series for tablets and 2-in-1 detachables. (Side note: Intel does not reference an "Ultrabook" anymore. Huh.)
As you would expect, Intel has some impressive gains to claim with the new 6th Generation processor. However, it is important to put them in context. All of the claims above, including 2.5x performance, 30x graphics improvement and 3x longer battery life, are comparing Skylake-based products to CPUs from 5 years ago. Specifically, Intel is comparing the new Core i5-6200U (a 15 watt part) against the Core i5-520UM (an 18 watt part) from mid-2010.
Subject: Graphics Cards, Processors | August 30, 2015 - 09:14 PM | Scott Michaud
Tagged: amd, carrizo, Fiji, opencl, opencl 2.0
Apart from manufacturers with a heavy first-party focus, such as Apple and Nintendo, hardware is useless without developer support. In this case, AMD has updated their App SDK to include support for OpenCL 2.0, with code samples. It also updates the SDK for Windows 10, Carrizo, and Fiji, but it is not entirely clear how.
That said, OpenCL is important to those two products. Fiji has a very high compute throughput compared to any other GPU at the moment, and its memory bandwidth is often even more important for GPGPU workloads. It is also useful for Carrizo, because parallel compute and HSA features are what make it a unique product. AMD has been creating first-party software software and helping popular third-party developers such as Adobe, but a little support to the world at large could bring a killer application or two, especially from the open-source community.
The SDK has been available in pre-release form for quite some time now, but it is finally graduated out of beta. OpenCL 2.0 allows for work to be generated on the GPU, which is especially useful for tasks that vary upon previous results without contacting the CPU again.
Subject: Processors | August 26, 2015 - 02:40 PM | Jeremy Hellstrom
Tagged: Skylake, Intel, linux, Godavari
Using the GPU embedded in the vast majority of modern processors is a good way to reduce the price of and entry level system, as indeed is choosing Linux for your OS. Your performance is not going to match that of a system with a discrete GPU but with the newer GPU cores available you will be doing much better than the old days of the IGP. The first portion of Phoronix's review of the Skylake GPU covers the various versions of driver you can choose from while the rest compares Kaveri, Godavari, Haswell and Broadwell to the new HD530 on SkyLake CPUs. Currently the Iris Pro 6200 present on Broadwell is still the best for gaming, though the A10-7870K Godavari performance is also decent. Consider one of those two chips now, or await Iris Pro's possible arrival on a newer socketed processor if you are in no hurry.
"Intel's Core i5 6600K and i7 6700K processors released earlier this month feature HD Graphics 530 as the first Skylake graphics processor. Given that Intel's Open-Source Technology Center has been working on open-source Linux graphics driver support for over a year for Skylake, I've been quite excited to see how the Linux performance compares for Haswell and Broadwell as well as AMD's APUs on Linux."
Here are some more Processor articles from around the web:
- Intel Core i5 6600K Skylake Linux CPU Benchmarks @ Phoronix
- Intel Core i7-5775C Review @ Modders-Inc
- Intel Core i7-6700K Review: Inching Toward Extreme @ Modders-Inc
- Intel’s ‘Skylake’ Core i7-6700K: A Performance Look @ Techgage
- Intel Core i7 6700K "Skylake" Processor Review @HiTech Legion
- Intel Core i7-6700K Review @ Neoseeker
A third primary processor
As the Hot Chips conference begins in Cupertino this week, Qualcomm is set to divulge another set of information about the upcoming Snapdragon 820 processor. Earlier this month the company revealed details about the Adreno 5xx GPU architecture, showcasing improved performance and power efficiency while also adding a new Spectra 14-bit image processor. Today we shift to what Qualcomm calls the “third pillar in the triumvirate of programmable processors” that make up the Snapdragon SoC. The Hexagon DSP (digital signal processor), introduced initially by Qualcomm in 2004, has gone through a massive architecture shift and even programmability shift over the last 10 years.
Qualcomm believes that building a balanced SoC for mobile applications is all about heterogeneous computing with no one processor carrying the entire load. The majority of the work that any modern Snapdragon processor must handle goes through the primary CPU cores, the GPU or the DSP. We learned about upgrades to the Adreno 5xx series for the Snapdragon 820 and we are promised information about Kryo CPU architecture soon as well. But the Hexagon 600-series of DSPs actually deals with some of the most important functionality for smartphones and tablets: audio, voice, imaging and video.
Interestingly, Qualcomm opened up the DSP to programmability just four years ago, giving developers the ability to write custom code and software to take advantages of the specific performance capabilities that the DSP offers. Custom photography, videography and sound applications could benefit greatly in terms of performance and power efficiency if utilizing the QC DSP rather than the primary system CPU or GPU. As of this writing, Qualcomm claims there are “hundreds” of developers actively writing code targeting its family of Hexagon processors.
The Hexagon DSP in Snapdragon 820 consists of three primary partitions. The main compute DSP works in conjunction with the GPU and CPU cores and will do much of the heavy lifting for encompassed workloads. The modem DSP aids the cellular modem in communication throughput. The new guy here is the lower power DSP in the Low Power Island (LPI) that shifts how always-on sensors can communicate with the operating system.
Core and Interconnect
The Skylake architecture is Intel’s first to get a full release on the desktop in more than two years. While that might not seem like a long time in the grand scheme of technology, for our readers and viewers that is a noticeable change and shift from recent history that Intel has created with the tick-tock model of releases. Yes, Broadwell was released last year and was solid product, but Intel focused almost exclusively on the mobile platforms (notebooks and tablets) with it. Skylake will be much more ubiquitous and much more quickly than even Haswell.
Skylake represents Intel’s most scalable architecture to date. I don’t mean only frequency scaling, though that is an important part of this design, but rather in terms of market segment scaling. Thanks to brilliant engineering and design from Intel’s Israeli group Intel will be launching Skylake designs ranging from 4.5 watt TDP Core M solutions all the way up to the 91 watt desktop processors that we have already reviewed in the Core i7-6700K. That’s a range that we really haven’t seen before and in the past Intel has depended on the Atom architecture to make up ground on the lowest power platforms. While I don’t know for sure if Atom is finally trending towards the dodo once Skylake’s reign is fully implemented, it does make me wonder how much life is left there.
Scalability also refers to the package size – something that ensures that the designs the engineers created can actually be built and run in the platform segments they are targeting. Starting with the desktop designs for LGA platforms (DIY market) that fits on a 1400 mm2 design on the 91 watt TDP implementation Intel is scaling all the way down to 330 mm2 in a BGA1515 package for the 4.5 watt TDP designs. Only with a total product size like that can you hope to get Skylake in a form factor like the Compute Stick – which is exactly what Intel is doing. And note that the smaller packages require the inclusion of the platform IO chip as well, something that H- and S-series CPUs can depend on the motherboard to integrate.
Finally, scalability will also include performance scaling. Clearly the 4.5 watt part will not offer the user the same performance with the same goals as the 91 watt Core i7-6700K. The screen resolution, attached accessories and target applications allow Intel to be selective about how much power they require for each series of Skylake CPUs.
The fundamental design theory in Skylake is very similar to what exists today in Broadwell and Haswell with a handful of significant and hundreds of minor change that make Skylake a large step ahead of previous designs.
This slide from Julius Mandelblat, Intel Senior Principle Engineer, shows a higher level overview of the entirety of the consumer integration of Skylake. You can see that Intel’s goals included a bigger and wider core design, higher frequency, improved right architecture and fabric design and more options for eDRAM integration. Readers of PC Perspective will already know that Skylake supports both DDR3L and DDR4 memory technologies but the inclusion of the camera ISP is new information for us.
I knew that the move to DirectX 12 was going to be a big shift for the industry. Since the introduction of the AMD Mantle API along with the Hawaii GPU architecture we have been inundated with game developers and hardware vendors talking about the potential benefits of lower level APIs, which give more direct access to GPU hardware and enable more flexible threading for CPUs to game developers and game engines. The results, we were told, would mean that your current hardware would be able to take you further and future games and applications would be able to fundamentally change how they are built to enhance gaming experiences tremendously.
I knew that the reader interest in DX12 was outstripping my expectations when I did a live blog of the official DX12 unveil by Microsoft at GDC. In a format that consisted simply of my text commentary and photos of the slides that were being shown (no video at all), we had more than 25,000 live readers that stayed engaged the whole time. Comments and questions flew into the event – more than me or my staff could possible handle in real time. It turned out that gamers were indeed very much interested in what DirectX 12 might offer them with the release of Windows 10.
Today we are taking a look at the first real world gaming benchmark that utilized DX12. Back in March I was able to do some early testing with an API-specific test that evaluates the overhead implications of DX12, DX11 and even AMD Mantle from Futuremark and 3DMark. This first look at DX12 was interesting and painted an amazing picture about the potential benefits of the new API from Microsoft, but it wasn’t built on a real game engine. In our Ashes of the Singularity benchmark testing today, we finally get an early look at what a real implementation of DX12 looks like.
And as you might expect, not only are the results interesting, but there is a significant amount of created controversy about what those results actually tell us. AMD has one story, NVIDIA another and Stardock and the Nitrous engine developers, yet another. It’s all incredibly intriguing.
It comes after 8, but before 10
As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer friendly CPU and then promise to tell us how it all works later in the month at IDF.
Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. Details on the Core architecture changes are not present, and instead we are given details on how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or anything surrounding the architecture of Skylake or the new Gen9 graphics system but I wanted to share the details we found available. I am sure we’ll learn more this week as IDF progresses so I will update this story where necessary.
What Intel calls Processor Graphics is what we used to call simply integrated graphics for the longest time. The purpose and role of processor graphics has changed drastically over the years and it is now not only responsible for 3D graphics rendering but compute, media and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture utilized in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first product released and announced using Gen9 graphics and is also the first to adopt Intel’s new 3-digit naming scheme.
This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. Containing four traditional x86 CPU cores and 1 “slice” implementation of Gen9 graphics (with three visible sub-slices we’ll describe below), this is not likely to be the highest performing iteration of the latest Intel HD Graphics technology.
Like the Intel processors before it, the Skylake design utilizes a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is considered its own agent while the Gen9 compute architecture is considered one complete agent. The system agent bundles the DRAM memory, the display controller, PCI Express and other I/O interface that communicate with the rest of the PC. Any off-chip memory requests and transactions occur through this bus while on-chip data transfers tend to be handled differently.