All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
Gunning for Broadwell-E
As I walked away from the St. Regis in downtown San Francisco tonight, I found myself wandering through the streets towards my hotel with something unique in tow. It was a smile. I was smiling, thinking about what AMD had just demonstrated and showed at its latest Zen processor reveal. The importance of this product launch can literally not be overstated for a company struggling to find a foothold to hang on to in a market that it once had a definitive lead. It’s been many years since I left a conference call, or a meeting, or a press conference feeling genuinely hopefully and enthusiastic about what AMD has shown me. Tonight I had that.
AMD’s CEO Lisa Su, and CTO Mark Papermaster, took stage down the street from the Intel Developer Forum to roll out a handful of new architectural details about the Zen architecture while also showing the first performance results comparing it to competing parts from Intel. The crowd in attendance, a mix of media and analysts, were impressed. The feeling was palpable in the room.
It’s late as I write this, and while there are some interesting architecture details to discuss, I think it is in everyone’s best interest that we touch on them lightly for now, and instead refocus on the deep-dive once the Hot Chips information comes out early next week. What you really want to know is clear: can Zen make Intel work again? Can Zen make that $1700 price tag on the Broadwell-E 6950X seem even more ludicrous? Yes.
The Zen Architecture
Much of what was discussed from the Zen architecture is a re-release of what has been out in recent months. This is a completely new, from the ground up, microarchitecture and not a revamp of the aging Bulldozer design. It integrated SMT (simultaneous multi-threading), a first for an AMD CPU, to better take efficient advantage of a longer pipeline. Intel has had HyperThreading for a long time now and AMD is finally joining the fold. A high bandwidth and low latency caching system is used to “feed the beast” as Papermaster put it and utilizing 14nm process technology (starting at Global Foundries) gives efficiency, and scaling a significant bump while enabling AMD to scale from notebooks to desktops to servers with the same architecture.
By far the most impressive claim from AMD thus far was that of a 40% increase in IPC over previous AMD designs. That’s a HUGE claim and is key to the success or failure of Zen. AMD proved to me today that the claims are real and that we will see the immediate impact of that architecture bump from day one.
Press was told of a handful of high level changes to the new architecture as well. Branch prediction gets a complete overhaul. This marks the first AMD processor to have a micro-op cache. Wider execution width with broader instruction schedulers are integrated, all of which adds up to much higher instruction level parallelism to improve single threaded performance.
Performance improvements aside, throughput and efficiency go up with Zen as well. AMD has integrated an 8MB L3 cache and improved prefetching for up 5x the cache bandwidth available per core on the CPU. SMT makes sure the pipeline stays full to prevent “bubbles” that introduce latency and lower efficiency while region-specific power gating means that we’ll see Zen in notebooks as well as enterprise servers in 2017. It truly is an impressive design from AMD.
Summit Ridge, the enthusiast platform that will be the first product available with Zen, is based on the AM4 platform and processors will go up to 8-cores and 16-threads. DDR4 memory support is included, PCI Express 3.0 and what AMD calls “next-gen” IO – I would expect a quick leap forward for AMD to catch up on things like NVMe and Thunderbolt.
The Real Deal – Zen Performance
As part of today’s reveal, AMD is showing the first true comparison between Zen and Intel processors. Sure, AMD showed a Zen-powered system running the upcoming Deus Ex running at 4K with a system powered by the Fury X, but the really impressive results where shown when comparing Zen to a Broadwell-E platform.
Using Blender to measure the performance of a rendering workload (a Zen CPU mockup of course), AMD ran an 8-core / 16-thread Zen processor at 3.0 GHz against an 8-core / 16-thread Broadwell-E processor at 3.0 GHz (likely a fixed clocked Core i7-6900K). The point of the demonstration was to showcase the IPC improvements of Zen and it worked: the render completed on the Zen platform a second or two faster than it did on the Intel Broadwell-E system.
Not much to look at, but Zen on the left, Broadwell-E on the right...
Of course there are lots of caveats: we didn’t setup the systems, I don’t know for sure that GPUs weren’t involved, we don’t know the final clocks of the Zen processors releasing in early 2017, etc. But I took two things away from the demonstration that are very important.
- The IPC of Zen is on-par or better than Broadwell.
- Zen will scale higher than 3.0 GHz in 8-core configurations.
AMD obviously didn’t state what specific SKUs were going to launch with the Zen architecture, what clock speeds they would run at, or even what TDPs they were targeting. Instead we were left with a vague but understandable remark of “comparable TDPs to Broadwell-E”.
Pricing? Overclocking? We’ll just have to wait a bit longer for that kind of information.
There is clearly a lot more for AMD to share about Zen but the announcement and showcase made this week with the early prototype products have solidified for me the capability and promise of this new microarchitecture. We have asked for, and needed, as an industry, a competitor to Intel in the enthusiast CPU space – something we haven’t legitimately had since the Athlon X2 days. Zen is what we have been pining over, what gamers and consumers have needed.
AMD’s processor stars might finally be aligning for a product that combines performance, efficiency and scalability at the right time. I’m ready for it –are you?
Bristol Ridge Takes on Mobile: E2 Through FX
It is no secret that AMD has faced an uphill battle since the release of the original Core 2 processors from Intel. While stayed mostly competitive through the Phenom II years, they hit some major performance issues when moving to the Bulldozer architecture. While on paper the idea of Chip Multi-Threading sounded fantastic, AMD was never able to get the per thread performance up to expectations. While their CPUs performed well in heavily multi-threaded applications, they just were never seen in as positive of a light as the competing Intel products.
The other part of the performance equation that has hammered AMD is the lack of a new process node that would allow it to more adequately compete with Intel. When AMD was at 32 nm PD-SOI, Intel had introduced its 22nm TriGate/FinFET. AMD then transitioned to a 28nm HKMG planar process that was more size optimized than 32nm, but did not drastically improve upon power and transistor switching performance.
So AMD had a double whammy on their hands with an underperforming architecture and limitted to no access to advanced process nodes that would actually improve their power and speed situation. They could not force their foundry partners to spend billions on a crash course in FinFET technology to bring that to market faster, so they had to iterate and innovate on their designs.
Bristol Ridge is the fruit of that particular labor. It is also the end point to the architecture that was introduced with Bulldozer way back in 2011.
It has been nearly two years since the release of the Haswell-E platform, which began with the launch of the Core i7-5960X processor. Back then, the introduction of an 8-core consumer processor was the primary selling point; along with the new X99 chipset and DDR4 memory support. At the time, I heralded the processor as “easily the fastest consumer processor we have ever had in our hands” and “nearly impossible to beat.” So what has changed over the course of 24 months?
Today Intel is launching Broadwell-E, the follow up to Haswell-E, and things look very much the same as they did before. There are definitely a couple of changes worth noting and discussing, including the move to a 10-core processor option as well as Turbo Boost Max Technology 3.0, which is significantly more interesting than its marketing name implies. Intel is sticking with the X99 platform (good for users that might want to upgrade), though the cost of these new processors is more than slightly disappointing based on trends elsewhere in the market.
This review of the new Core i7-6950X 10-core Broadwell-E processor is going to be quick, and to the point: what changes, what is the performance, how does it overclock, and what will it cost you?
10nm Sooner Than Expected?
It seems only yesterday that we had the first major GPU released on 16nm FF+ and now we are talking about ARM about to receive their first 10nm FF test chips! Well, in fact it was yesterday that NVIDIA formally released performance figures on the latest GeForce GTX 1080 which is based on TSMC’s 16nm FF+ process technology. Currently TSMC is going full bore on their latest process node and producing the fastest current graphics chip around. It has taken the foundry industry as a whole a lot longer to develop FinFET technology than expected, but now that they have that piece of the puzzle seemingly mastered they are moving to a new process node at an accelerated rate.
TSMC’s 10nm FF is not well understood by press and analysts yet, but we gather that it is more of a marketing term than a true drop to 10 nm features. Intel has yet to get past 14nm and does not expect 10 nm production until well into next year. TSMC is promising their version in the second half of 2016. We cannot assume that TSMC’s version will match what Intel will be doing in terms of geometries and electrical characteristics, but we do know that it is a step past TSMC’s 16nm FF products. Lithography will likely get a boost with triple patterning exposure. My guess is that the back end will also move away from the “20nm metal” stages that we see with 16nm. All in all, it should be an improved product from what we see with 16nm, but time will tell if it can match the performance and density of competing lines that bear the 10nm name from Intel, Samsung, and GLOBALFOUNDRIES.
ARM has a history of porting their architectures to new process nodes, but they are being a bit more aggressive here than we have seen in the past. It used to be that ARM would announce a new core or technology, and it would take up to two years to be introduced into the market. Now we are seeing technology announcements and actual products hitting the scenes about nine months later. With the mobile market continuing to grow we expect to see products quicker to market still.
The company designed a simplified test chip to tape out and send to TSMC for test production on the aforementioned 10nm FF process. The chip was taped out in December, 2015. The design was shipped to TSMC for mask production and wafer starts. ARM is expecting the finished wafers to arrive this month.
Lower Power, Same Performance
AMD is in a strange position in that there is a lot of excitement about their upcoming Zen architecture, but we are still many months away from that introduction. AMD obviously needs to keep the dollars flowing in, and part of that means that we get refreshes now and then of current products. The “Kaveri” products that have been powering the latest APUs from AMD have received one of those refreshes. AMD has done some redesigning of the chip and tweaked the process technology used to manufacture them. The resulting product is the “Godavari” refresh that offers slightly higher clockspeeds as well as better overall power efficiency as compared to the previous “Kaveri” products.
One of the first refreshes was the A8-7670K that hit the ground in November of 2015. This is a slightly cut down part that features 6 GPU compute units vs. the 8 that a fully enabled Godavari chip has. This continues to be a FM2+ based chip with a 95 watt TDP. The clockspeed of this part goes from 3.6 GHz to 3.9 GHz. The GPU portion runs at the same 757 MHz that the original A10-7850K ran at. It is interesting to note that it is still a 95 watt TDP part with essentially the same clockspeeds as the 7850K, but with two fewer GPU compute units.
The other product being covered here is a bit more interesting. The A10-7860K looks to be a larger improvement from the previous 7850K in terms of power and performance. It shares the same CPU clockspeed range as the 7850K (3.6 GHz to 3.9 GHz), but improves upon the GPU clockspeed by hitting around 800 MHz. At first this seems underwhelming until we realize that AMD has lowered the TDP from 95 watts down to 65 watts. Less power consumed and less heat produced for the same performance from the CPU side and improved performance from the GPU seems like a nice advance.
AMD continues to utilize GLOBALFOUNDRIES 28 nm Bulk/HKMG process for their latest APUs and will continue to do so until Zen is released late this year. This is not the same 28 nm process that we were introduced to over four years ago. Over that time improvements have been made to improve yields and bins, as well as optimize power and clockspeed. GF also can adjust the process on a per batch basis to improve certain aspects of a design (higher speed, more leakage, lower power, etc.). They cannot produce miracles though. Do not expect 22 nm FinFET performance or density with these latest AMD products. Those kinds of improvements will show up with Samsung/GF’s 14nm LPP and TSMC’s 16nm FF+ lines. While AMD will be introducing GPUs on 14nm LPP this summer, the Zen launch in late 2016 will be the first AMD CPU to utilize that advanced process.
Clockspeed Jump and More!
On March 1st AMD announced the availability of two new processors as well as more information on the A10 7860 APU.
The two new units are the A10-7890K and the Athlon X4 880K. These are both Kaveri based parts, but of course the Athlon has the GPU portion disabled. Product refreshes for the past several years have followed a far different schedule than the days of yore. Remember back in time when the Phenom II series and the competing Core 2 series would have clockspeed updates that were expected yearly, if not every half year with a slightly faster top end performer to garner top dollar from consumers?
Things have changed, for better or worse. We have so far seen two clockspeed bumps for the Kaveri /Godavari based APU. Kaveri was first introduced over two years ago with the A10-7850K and the lower end derivatives. The 7850K has a clockspeed that ranges from 3.7 GHz to the max 4 GHz with boost. The GPU portion is clocked at 720 MHz. This is a 95 watt TDP part that is one of the introductory units from GLOBALFOUNDRIES 28 nm HKMG process.
Today the new top end A10-7890K is clocked at 4.1 GHz to 4.3 GHz max. The GPU receives a significant boost in performance with a clockspeed of 866 MHz. The combination of CPU and GPU clockspeed increases push the total performance of the part exceeding 1 TFLOPs. It features the same dual module/quad core Godavari design as well as the 8 GCN Units. The interesting part here is that the APU does not exceed the 95 watt TDP that it shares with the older and slower 7850K. It is also a boost in performance from last year’s refresh of the A10-7870K which is clocked 200 MHz slower on the CPU portion but retains the 866 MHz speed of the GPU. This APU is fully unlocked so a user can easily overclock both the CPU and GPU cores.
The Athlon X4 880K is still based on the Godavari family rather than the Carizzo update that the X4 845 uses. This part is clocked from 4.0 to 4.2 GHz. It again retains the 95 watt TDP rating of the previous Athlon X4 CPUs. Previously the X4 860K was the highest clocked unit at 3.7 GHz to 4.0, but the 880K raises that to 4 to 4.2 GHz. A 300 MHz gain in base clock is pretty significant as well as stretching that ceiling to 4.2 GHz. The Godavari modules retain their full amount of L2 cache so the 880K has 4 MB available to it. These parts are very popular with budget enthusiasts and gaming builds as they are extremely inexpensive and perform at an acceptable level with free overclocking thrown in.
AMD Keeps Q1 Interesting
CES 2016 was not a watershed moment for AMD. They showed off their line of current video cards and, perhaps more importantly, showed off working Polaris silicon, which will be their workhorse for 2016 in the graphics department. They did not show off Zen, a next generation APU, or any AM4 motherboards. The CPU and APU world was not presented in a way that was revolutionary. What they did show off, however, hinted at the things to come to help keep AMD relevant in the desktop space.
It was odd to see an announcement about the stock cooler that AMD was introducing, but when we learned more about it, the more important it was for AMD’s reputation moving forward. The Wraith cooler is a new unit to help control the noise and temperatures of the latest AMD CPUs and select APUs. This is a fairly beefy unit with a large, slow moving fan that produces very little noise. This is a big change from the variable speed fans on previous coolers that could get rather noisy and leave temperatures that were higher in range than are comfortable. There has been some derision aimed at AMD for providing “just a cooler” for their top end products, but it is a push that is making them more user and enthusiast friendly without breaking the bank.
Socket AM3+ is not dead yet. Though we have been commenting on the health of the platform for some time, AMD and its partners work to improve and iterate upon these products to include technologies such as USB 3.1 and M.2 support. While these chipsets are limited to PCI-E 2.0 speeds, the four lanes available to most M.2 controllers allows these boards to provide enough bandwidth to fully utilize the latest NVMe based M.2 drives available. We likely will not see a faster refresh on AM3+, but we will see new SKUs utilizing the Wraith cooler as well as a price break for the processors that exist in this socket.
Are Computers Still Getting Faster?
It looks like CES is starting to wind down, which makes sense because it ended three days ago. Now that we're mostly caught up, I found a new video from The 8-Bit Guy. He doesn't really explain any old technologies in this one. Instead, he poses an open question about computer speed. He was able to have a functional computing experience on a ten-year-old Apple laptop, which made him wonder if the rate of computer advancement is slowing down.
I believe that he (and his guest hosts) made great points, but also missed a few important ones.
One of his main arguments is that software seems to have slowed down relative to hardware. I don't believe that is true, but I believe it's looking in the right area. PCs these days are more than capable of doing just about anything in terms of 2D user interface that we would want to, and do so with a lot of overhead for inefficient platforms and sub-optimal programming (relative to the 80's and 90's at the very least). The areas that require extra horsepower are usually doing large batches of many related tasks. GPUs are key in this area, and they are keeping up as fast as they can, despite some stagnation with fabrication processes and a difficulty (at least before HBM takes hold) in keeping up with memory bandwidth.
For the last five years to ten years or so, CPUs have been evolving toward efficiency as GPUs are being adopted for the tasks that need to scale up. I'm guessing that AMD, when they designed the Bulldozer architecture, hoped that GPUs would have been adopted much more aggressively, but even as graphics devices, they now have a huge effect on Web, UI, and media applications.
These are also tasks that can scale well between devices by lowering resolution (and so forth). The primary thing that a main CPU thread needs to do is figure out the system's state and keep the graphics card fed before the frame-train leaves the station. In my experience, that doesn't scale well (although you can sometimes reduce the amount of tracked objects for games and so forth). Moreover, it is easier to add GPU performance, compared to single-threaded CPU, because increasing frequency and single-threaded IPC should be more complicated than planning out more, duplicated blocks of shaders. These factors combine to give lower-end hardware a similar experience in the most noticeable areas.
So, up to this point, we discussed:
- Software is often scaling in ways that are GPU (and RAM) limited.
- CPUs are scaling down in power more than up in performance.
- GPU-limited tasks can often be approximated with smaller workloads.
- Software gets heavier, but it doesn't need to be "all the way up" (ex: resolution).
- Some latencies are hard to notice anyway.
Back to the Original Question
This is where “Are computers still getting faster?” can be open to interpretation.
Tasks are diverging from one class of processor into two, and both have separate industries, each with their own, multiple goals. As stated, CPUs are mostly progressing in power efficiency, which extends (an assumed to be) sufficient amount of performance downward to multiple types of devices. GPUs are definitely getting faster, but they can't do everything. At the same time, RAM is plentiful but its contribution to performance can be approximated with paging unused chunks to the hard disk or, more recently on Windows, compressing them in-place. Newer computers with extra RAM won't help as long as any single task only uses a manageable amount of it -- unless it's seen from a viewpoint that cares about multi-tasking.
In short, computers are still progressing, but the paths are now forked and winding.
May the Radeon be with You
In celebration of the release of The Force Awakens as well as the new Star Wars Battlefront game from DICE and EA, AMD sent over some hardware for us to use in a system build, targeted at getting users up and running in Battlefront with impressive quality and performance, but still on a reasonable budget. Pairing up an AMD processor, MSI motherboard, Sapphire GPU with a low cost chassis, SSD and more, the combined system includes a FreeSync monitor for around $1,200.
Holiday breaks are MADE for Star Wars Battlefront
Though the holiday is already here and you'd be hard pressed to build this system in time for it, I have a feeling that quite a few of our readers and viewers will find themselves with some cash and gift certificates in hand, just ITCHING for a place to invest in a new gaming PC.
The video above includes a list of components, the build process (in brief) and shows us getting our gaming on with Star Wars Battlefront. Interested in building a system similar the one above on your own? Here's the hardware breakdown.
|AMD Powered Star Wars Battlefront System|
|Processor||AMD FX-8370 - $197
Cooler Master Hyper 212 EVO - $29
|Motherboard||MSI 990FXA Gaming - $137|
|Memory||AMD Radeon Memory DDR3-2400 - $79|
|Graphics Card||Sapphire NITRO Radeon R9 380X - $266|
|Storage||SanDisk Ultra II 240GB SSD - $79|
|Case||Corsair Carbide 300R - $68|
|Power Supply||Seasonic 600 watt 80 Plus - $69|
|Monitor||AOC G2460PF 1920x1080 144Hz FreeSync - $259|
|Total Price||Full System (without monitor) - Amazon.com - $924|
For under $1,000, plus another $250 or so for the AOC FreeSync capable 1080p monitor, you can have a complete gaming rig for your winter break. Let's detail some of the specific components.
AMD sent over the FX-8370 processor for our build, a 4-module / 8-core CPU that runs at 4.0 GHz, more than capable of handling any gaming work load you can toss at it. And if you need to do some transcoding, video work or, heaven forbid, school or productivity work, the FX-8370 has you covered there too.
For the motherboard AMD sent over the MSI 990FXA Gaming board, one of the newer AMD platforms that includes support for USB 3.1 so you'll have a good length of usability for future expansion. The Cooler Master Hyper 212 EVO cooler was our selection to keep the FX-8370 running smoothly and 8GB of AMD Radeon DDR3-2133 memory is enough for the system to keep applications and the Windows 10 operating system happy.
Skylake Architecture Comes Through
When Intel finally revealed the details surrounding it's latest Skylake architecture design back in August at IDF, we learned for the first time about a new technology called Intel Speed Shift. A feature that moves some of the control of CPU clock speed and ramp up away from the operating system and into hardware gives more control to the processor itself, making it less dependent on Windows (and presumably in the future, other operating systems). This allows the clock speed of a Skylake processor to get higher, faster, allowing for better user responsiveness.
It's pretty clear that Intel is targeting this feature addition for tablets and 2-in-1s where the finger/pen to screen interaction is highly reliant on immediate performance to enable improved user experiences. It has long been known that one of the biggest performance deltas between iOS from Apple and Android from Google centers on the ability for the machine to FEEL faster when doing direct interaction, regardless of how fast the background rendering of an application or web browser actually is. Intel has been on a quest to fix this problem for Android for some time, where it has the ability to influence software development, and now they are bringing that emphasis to Windows 10.
With the most recent Windows 10 update, to build v10586, Intel Speed Shift has finally been enabled for Skylake users. And since you cannot disable the feature once it's installed, this is the one and only time we'll be able to measure performance in our test systems. So let's see if Intel's claims of improved user experiences stand up to our scrutiny.
That is a lotta SKUs!
The slow, gradual release of information about Intel's Skylake-based product portfolio continues forward. We have already tested and benchmarked the desktop variant flagship Core i7-6700K processor and also have a better understanding of the microarchitectural changes the new design brings forth. But today Intel's 6th Generation Core processors get a major reveal, with all the mobile and desktop CPU variants from 4.5 watts up to 91 watts, getting detailed specifications. Not only that, but it also marks the first day that vendors can announce and begin selling Skylake-based notebooks and systems!
All indications are that vendors like Dell, Lenovo and ASUS are still some weeks away from having any product available, but expect to see your feeds and favorite tech sites flooded with new product announcements. And of course with a new Apple event coming up soon...there should be Skylake in the new MacBooks this month.
Since I have already talked about the architecture and the performance changes from Haswell/Broadwell to Skylake in our 6700K story, today's release is just a bucket of specifications and information surround 46 different 6th Generation Skylake processors.
Intel's 6th Generation Core Processors
At Intel's Developer Forum in August, the media learned quite a bit about the new 6th Generation Core processor family including Intel's stance on how Skylake changes the mobile landscape.
Skylake is being broken up into 4 different line of Intel processors: S-series for desktop DIY users, H-series for mobile gaming machines, U-series for your everyday Ultrabooks and all-in-ones, Y-series for tablets and 2-in-1 detachables. (Side note: Intel does not reference an "Ultrabook" anymore. Huh.)
As you would expect, Intel has some impressive gains to claim with the new 6th Generation processor. However, it is important to put them in context. All of the claims above, including 2.5x performance, 30x graphics improvement and 3x longer battery life, are comparing Skylake-based products to CPUs from 5 years ago. Specifically, Intel is comparing the new Core i5-6200U (a 15 watt part) against the Core i5-520UM (an 18 watt part) from mid-2010.
A third primary processor
As the Hot Chips conference begins in Cupertino this week, Qualcomm is set to divulge another set of information about the upcoming Snapdragon 820 processor. Earlier this month the company revealed details about the Adreno 5xx GPU architecture, showcasing improved performance and power efficiency while also adding a new Spectra 14-bit image processor. Today we shift to what Qualcomm calls the “third pillar in the triumvirate of programmable processors” that make up the Snapdragon SoC. The Hexagon DSP (digital signal processor), introduced initially by Qualcomm in 2004, has gone through a massive architecture shift and even programmability shift over the last 10 years.
Qualcomm believes that building a balanced SoC for mobile applications is all about heterogeneous computing with no one processor carrying the entire load. The majority of the work that any modern Snapdragon processor must handle goes through the primary CPU cores, the GPU or the DSP. We learned about upgrades to the Adreno 5xx series for the Snapdragon 820 and we are promised information about Kryo CPU architecture soon as well. But the Hexagon 600-series of DSPs actually deals with some of the most important functionality for smartphones and tablets: audio, voice, imaging and video.
Interestingly, Qualcomm opened up the DSP to programmability just four years ago, giving developers the ability to write custom code and software to take advantages of the specific performance capabilities that the DSP offers. Custom photography, videography and sound applications could benefit greatly in terms of performance and power efficiency if utilizing the QC DSP rather than the primary system CPU or GPU. As of this writing, Qualcomm claims there are “hundreds” of developers actively writing code targeting its family of Hexagon processors.
The Hexagon DSP in Snapdragon 820 consists of three primary partitions. The main compute DSP works in conjunction with the GPU and CPU cores and will do much of the heavy lifting for encompassed workloads. The modem DSP aids the cellular modem in communication throughput. The new guy here is the lower power DSP in the Low Power Island (LPI) that shifts how always-on sensors can communicate with the operating system.
Core and Interconnect
The Skylake architecture is Intel’s first to get a full release on the desktop in more than two years. While that might not seem like a long time in the grand scheme of technology, for our readers and viewers that is a noticeable change and shift from recent history that Intel has created with the tick-tock model of releases. Yes, Broadwell was released last year and was solid product, but Intel focused almost exclusively on the mobile platforms (notebooks and tablets) with it. Skylake will be much more ubiquitous and much more quickly than even Haswell.
Skylake represents Intel’s most scalable architecture to date. I don’t mean only frequency scaling, though that is an important part of this design, but rather in terms of market segment scaling. Thanks to brilliant engineering and design from Intel’s Israeli group Intel will be launching Skylake designs ranging from 4.5 watt TDP Core M solutions all the way up to the 91 watt desktop processors that we have already reviewed in the Core i7-6700K. That’s a range that we really haven’t seen before and in the past Intel has depended on the Atom architecture to make up ground on the lowest power platforms. While I don’t know for sure if Atom is finally trending towards the dodo once Skylake’s reign is fully implemented, it does make me wonder how much life is left there.
Scalability also refers to the package size – something that ensures that the designs the engineers created can actually be built and run in the platform segments they are targeting. Starting with the desktop designs for LGA platforms (DIY market) that fits on a 1400 mm2 design on the 91 watt TDP implementation Intel is scaling all the way down to 330 mm2 in a BGA1515 package for the 4.5 watt TDP designs. Only with a total product size like that can you hope to get Skylake in a form factor like the Compute Stick – which is exactly what Intel is doing. And note that the smaller packages require the inclusion of the platform IO chip as well, something that H- and S-series CPUs can depend on the motherboard to integrate.
Finally, scalability will also include performance scaling. Clearly the 4.5 watt part will not offer the user the same performance with the same goals as the 91 watt Core i7-6700K. The screen resolution, attached accessories and target applications allow Intel to be selective about how much power they require for each series of Skylake CPUs.
The fundamental design theory in Skylake is very similar to what exists today in Broadwell and Haswell with a handful of significant and hundreds of minor change that make Skylake a large step ahead of previous designs.
This slide from Julius Mandelblat, Intel Senior Principle Engineer, shows a higher level overview of the entirety of the consumer integration of Skylake. You can see that Intel’s goals included a bigger and wider core design, higher frequency, improved right architecture and fabric design and more options for eDRAM integration. Readers of PC Perspective will already know that Skylake supports both DDR3L and DDR4 memory technologies but the inclusion of the camera ISP is new information for us.
I knew that the move to DirectX 12 was going to be a big shift for the industry. Since the introduction of the AMD Mantle API along with the Hawaii GPU architecture we have been inundated with game developers and hardware vendors talking about the potential benefits of lower level APIs, which give more direct access to GPU hardware and enable more flexible threading for CPUs to game developers and game engines. The results, we were told, would mean that your current hardware would be able to take you further and future games and applications would be able to fundamentally change how they are built to enhance gaming experiences tremendously.
I knew that the reader interest in DX12 was outstripping my expectations when I did a live blog of the official DX12 unveil by Microsoft at GDC. In a format that consisted simply of my text commentary and photos of the slides that were being shown (no video at all), we had more than 25,000 live readers that stayed engaged the whole time. Comments and questions flew into the event – more than me or my staff could possible handle in real time. It turned out that gamers were indeed very much interested in what DirectX 12 might offer them with the release of Windows 10.
Today we are taking a look at the first real world gaming benchmark that utilized DX12. Back in March I was able to do some early testing with an API-specific test that evaluates the overhead implications of DX12, DX11 and even AMD Mantle from Futuremark and 3DMark. This first look at DX12 was interesting and painted an amazing picture about the potential benefits of the new API from Microsoft, but it wasn’t built on a real game engine. In our Ashes of the Singularity benchmark testing today, we finally get an early look at what a real implementation of DX12 looks like.
And as you might expect, not only are the results interesting, but there is a significant amount of created controversy about what those results actually tell us. AMD has one story, NVIDIA another and Stardock and the Nitrous engine developers, yet another. It’s all incredibly intriguing.
It comes after 8, but before 10
As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer friendly CPU and then promise to tell us how it all works later in the month at IDF.
Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. Details on the Core architecture changes are not present, and instead we are given details on how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or anything surrounding the architecture of Skylake or the new Gen9 graphics system but I wanted to share the details we found available. I am sure we’ll learn more this week as IDF progresses so I will update this story where necessary.
What Intel calls Processor Graphics is what we used to call simply integrated graphics for the longest time. The purpose and role of processor graphics has changed drastically over the years and it is now not only responsible for 3D graphics rendering but compute, media and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture utilized in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first product released and announced using Gen9 graphics and is also the first to adopt Intel’s new 3-digit naming scheme.
This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. Containing four traditional x86 CPU cores and 1 “slice” implementation of Gen9 graphics (with three visible sub-slices we’ll describe below), this is not likely to be the highest performing iteration of the latest Intel HD Graphics technology.
Like the Intel processors before it, the Skylake design utilizes a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is considered its own agent while the Gen9 compute architecture is considered one complete agent. The system agent bundles the DRAM memory, the display controller, PCI Express and other I/O interface that communicate with the rest of the PC. Any off-chip memory requests and transactions occur through this bus while on-chip data transfers tend to be handled differently.
Light on architecture details
Our Intel Skylake launch coverage is intense! Make sure you hit up all the stories and videos that are interesting for you!
- The Intel Core i7-6700K Review - Skylake First for Enthusiasts (Video)
- Skylake vs. Sandy Bridge: Discrete GPU Showdown (Video)
- ASUS Z170-A Motherboard Preview
- Intel Skylake / Z170 Rapid Storage Technology Tested - PCIe and SATA RAID
The Intel Skylake architecture has been on our radar for quite a long time as Intel's next big step in CPU design. Through leaks and some official information discussed by Intel over the past few months, we know at least a handful of details: DDR4 memory support, 14nm process technology, modest IPC gains and impressive GPU improvements. But the details have remained a mystery on how the "tock" of Skylake on the 14nm process technology will differ from Broadwell and Haswell.
Interestingly, due to some shifts in how Intel is releasing Skylake, we are going to be doing a review today with very little information on the Skylake architecture and design (at least officially). While we are very used to the company releasing new information at the Intel Developer Forum along with the launch of a new product, Intel has instead decided to time the release of the first Skylake products with Gamescom in Cologne, Germany. Parts will go on sale today (August 5th) and we are reviewing a new Intel processor without the background knowledge and details that will be needed to really explain any of the changes or differences in performance that we see. It's an odd move honestly, but it has some great repercussions for the enthusiasts that read PC Perspective: Skylake will launch first as an enthusiast-class product for gamers and DIY builders.
For many of you this won't change anything. If you are curious about the performance of the new Core i7-6700K, power consumption, clock for clock IPC improvements and anything else that is measurable, then you'll get exactly what you want from today's article. If you are a gear-head that is looking for more granular details on how the inner-workings of Skylake function, you'll have to wait a couple of weeks longer - Intel plans to release that information on August 18th during IDF.
So what does the addition of DDR4 memory, full range base clock manipulation and a 4.0 GHz base clock on a brand new 14nm architecture mean for users of current Intel or AMD platforms? Also, is it FINALLY time for users of the Core i7-2600K or older systems to push that upgrade button? (Let's hope so!)
Bioshock Infinite Results
Our Intel Skylake launch coverage is intense! Make sure you hit up all the stories and videos that are interesting for you!
- The Intel Core i7-6700K Review - Skylake First for Enthusiasts (Video)
- Skylake vs. Sandy Bridge: Discrete GPU Showdown (Video)
- ASUS Z170-A Motherboard Preview
- Intel Skylake / Z170 Rapid Storage Technology Tested - PCIe and SATA RAID
Today marks the release of Intel's newest CPU architecture, code named Skylake. I already posted my full review of the Core i7-6700K processor so, if you are looking for CPU performance and specification details on that part, you should start there. What we are looking at in this story is the answer to a very simple, but also very important question:
Is it time for gamers using Sandy Bridge system to finally bite the bullet and upgrade?
I think you'll find that answer will depend on a few things, including your gaming resolution and aptitude for multi-GPU configuration, but even I was surprised by the differences I saw in testing.
Our testing scenario was quite simple. Compare the gaming performance of an Intel Core i7-6700K processor and Z170 motherboard running both a single GTX 980 and a pair of GTX 980s in SLI against an Intel Core i7-2600K and Z77 motherboard using the same GPUs. I installed both the latest NVIDIA GeForce drivers and the latest Intel system drivers for each platform.
|Skylake System||Sandy Bridge System|
|Processor||Intel Core i7-6700K||Intel Core i7-2600K|
|Motherboard||ASUS Z170-Deluxe||Gigabyte Z68-UD3H B3|
|Memory||16GB DDR4-2133||8GB DDR3-1600|
|Graphics Card||1x GeForce GTX 980
2x GeForce GTX 980 (SLI)
|1x GeForce GTX 980
2x GeForce GTX 980 (SLI)
|OS||Windows 8.1||Windows 8.1|
Our testing methodology follows our Frame Rating system, which uses a capture-based system to measure frame times at the screen (rather than trusting the software's interpretation).
If you aren't familiar with it, you should probably do a little research into our testing methodology as it is quite different than others you may see online. Rather than using FRAPS to measure frame rates or frame times, we are using an secondary PC to capture the output from the tested graphics card directly and then use post processing on the resulting video to determine frame rates, frame times, frame variance and much more.
This amount of data can be pretty confusing if you attempting to read it without proper background, but I strongly believe that the results we present paint a much more thorough picture of performance than other options. So please, read up on the full discussion about our Frame Rating methods before moving forward!!
While there are literally dozens of file created for each “run” of benchmarks, there are several resulting graphs that FCAT produces, as well as several more that we are generating with additional code of our own.
If you need some more background on how we evaluate gaming performance on PCs, just check out my most recent GPU review for a full breakdown.
I only had time to test four different PC titles:
- Bioshock Infinite
- Grand Theft Auto V
- GRID 2
- Metro: Last Light
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be more clear in my disclaimer. This is an unconfirmed, relatively easy to fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images, but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K. That suggests that the leaker got the images from somewhere, but didn't notice those details, which implies that the original source was hoaxed by an anonymous source, who only seeded the hoax to a single media outlet, or that it was an actual leak.
Either way, enjoy my analysis but realize that this is a single, unconfirmed source who allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
What type of user will be given Iris Pro? Also, will graphics-free options be available in the sub-Enthusiast class? When buying a processor from Intel, the high-end mainstream processors tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics -- the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step into changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high clock, relatively low IPC design that Netburst was built around, which instantaneously changed the processor landscape from a dominant AMD to an Intel runaway lead.
After 65nm and Core 2 started the cycle, every new architecture alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
Qualcomm’s GPU History
Despite its market dominance, Qualcomm may be one of the least known contenders in the battle for the mobile space. While players like Apple, Samsung, and even NVIDIA are often cited as the most exciting and most revolutionary, none come close to the sheer sales, breadth of technology, and market share that Qualcomm occupies. Brands like Krait and Snapdragon have helped push the company into the top 3 semiconductor companies in the world, following only Intel and Samsung.
Founded in July 1985, seven industry veterans came together in the den of Dr. Irwin Jacobs’ San Diego home to discuss an idea. They wanted to build “Quality Communications” (thus the name Qualcomm) and outlined a plan that evolved into one of the telecommunications industry’s great start-up success stories.
Though Qualcomm sold its own handset business to Kyocera in 1999, many of today’s most popular mobile devices are powered by Qualcomm’s Snapdragon mobile chipsets with integrated CPU, GPU, DSP, multimedia CODECs, power management, baseband logic and more. In fact the typical “chipset” from Qualcomm encompasses up to 20 different chips of different functions besides just the main application processor. If you are an owner of a Galaxy Note 4, Motorola Droid Turbo, Nexus 6, or Samsung Galaxy S5, then you are most likely a user of one of Qualcomm’s Snapdragon chipsets.
Qualcomm’s GPU History
Before 2006, the mobile GPU as we know it today was largely unnecessary. Feature phones and “dumb” phones were still the large majority of the market with smartphones and mobile tablets still in the early stages of development. At this point all the visual data being presented on the screen, whether on a small monochrome screen or with the color of a PDA, was being drawn through a software renderer running on traditional CPU cores.
But by 2007, the first fixed-function, OpenGL ES 1.0 class of GPUs started shipping in mobile devices. These dedicated graphics processors were originally focused on drawing and updating the user interface on smartphones and personal data devices. Eventually these graphics units were used for what would be considered the most basic gaming tasks.
Digging into a specific market
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. This is AMD's sixth-generation APU, starting with Llano's release in June 2011. For this launch, Carrizo is targeting the 15W and 35W power envelopes for $400-$700 USD notebook devices. AMD needed to increase efficiency on the same, 28nm process that we have seen in their product stack since Kabini and Temash were released in May of 2013. They tasked their engineers to optimize their APU's design for these constraints, which led to dense architectures and clever features on the same budget of complexity, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was a quite clever design that didn't reach its potential because of external factors. As I started this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, two cores share a single floating point (decimal) unit. While you need some floating point capacity, upcoming workloads could use the GPU for a massive increase in performance, which is right there on the same die. As such, the complexity that is dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.