Core and Interconnect
The Skylake architecture is Intel’s first to get a full release on the desktop in more than two years. While that might not seem like a long time in the grand scheme of technology, for our readers and viewers that is a noticeable change and shift from recent history that Intel has created with the tick-tock model of releases. Yes, Broadwell was released last year and was solid product, but Intel focused almost exclusively on the mobile platforms (notebooks and tablets) with it. Skylake will be much more ubiquitous and much more quickly than even Haswell.
Skylake represents Intel’s most scalable architecture to date. I don’t mean only frequency scaling, though that is an important part of this design, but rather in terms of market segment scaling. Thanks to brilliant engineering and design from Intel’s Israeli group Intel will be launching Skylake designs ranging from 4.5 watt TDP Core M solutions all the way up to the 91 watt desktop processors that we have already reviewed in the Core i7-6700K. That’s a range that we really haven’t seen before and in the past Intel has depended on the Atom architecture to make up ground on the lowest power platforms. While I don’t know for sure if Atom is finally trending towards the dodo once Skylake’s reign is fully implemented, it does make me wonder how much life is left there.
Scalability also refers to the package size – something that ensures that the designs the engineers created can actually be built and run in the platform segments they are targeting. Starting with the desktop designs for LGA platforms (DIY market) that fits on a 1400 mm2 design on the 91 watt TDP implementation Intel is scaling all the way down to 330 mm2 in a BGA1515 package for the 4.5 watt TDP designs. Only with a total product size like that can you hope to get Skylake in a form factor like the Compute Stick – which is exactly what Intel is doing. And note that the smaller packages require the inclusion of the platform IO chip as well, something that H- and S-series CPUs can depend on the motherboard to integrate.
Finally, scalability will also include performance scaling. Clearly the 4.5 watt part will not offer the user the same performance with the same goals as the 91 watt Core i7-6700K. The screen resolution, attached accessories and target applications allow Intel to be selective about how much power they require for each series of Skylake CPUs.
The fundamental design theory in Skylake is very similar to what exists today in Broadwell and Haswell with a handful of significant and hundreds of minor change that make Skylake a large step ahead of previous designs.
This slide from Julius Mandelblat, Intel Senior Principle Engineer, shows a higher level overview of the entirety of the consumer integration of Skylake. You can see that Intel’s goals included a bigger and wider core design, higher frequency, improved right architecture and fabric design and more options for eDRAM integration. Readers of PC Perspective will already know that Skylake supports both DDR3L and DDR4 memory technologies but the inclusion of the camera ISP is new information for us.
Subject: Storage | August 18, 2015 - 02:20 PM | Allyn Malventano
Tagged: XPoint, ssd, Optane, Intel, IDF 2015
Just three weeks ago, we reported 3D XPoint Technology. This was a 2-layer stack of non-volatile memory that couples the data retention of NAND flash memory with speeds much closer to that of DRAM.
The big question at that time was less about the tech and more about its practical applications. Ryan is out covering IDF, and he just saw the first publically announced application by Intel:
Intel Optane Technology is Intel’s term for how they are going to incorporate XPoint memory dies into the devices we use today. They intend to start with datacenter storage and work their way down to ultrabooks, which means that XPoint must come in at a cost/GB closer to NAND than to DRAM. For those asking specific performance figures after our earlier announcement, here are a couple of performance comparisons between an SSD DC P3700 and a prototype SSD using XPoint:
At QD=8, the XPoint equipped prototype comes in at 5x the performance of the P3700. The bigger question is how about QD=1 performance, as XPoint is supposed to be far less latent than NAND?
Yes, you read that correctly, that’s 76k IOPS at QD=1. That means only issuing the SSD one command at a time, waiting for a reply, and only then issuing another command. Basically the worst case for SSD performance, as no commands are stacked up in the queue to enable parallelism to kick in and increase overall throughput. For comparison, SATA SSDs have a hard time maintaining that figure at their maximum queue depths of 32.
Exciting to see a follow-on announcement so quickly after the announcement of the technology itself, but remember that Intel did state ‘2016’ for these to start appearing, so don’t put off that SSD 750 purchase just yet.
More to follow as we continue our coverage of IDF 2015!
It comes after 8, but before 10
As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer friendly CPU and then promise to tell us how it all works later in the month at IDF.
Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. Details on the Core architecture changes are not present, and instead we are given details on how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or anything surrounding the architecture of Skylake or the new Gen9 graphics system but I wanted to share the details we found available. I am sure we’ll learn more this week as IDF progresses so I will update this story where necessary.
What Intel calls Processor Graphics is what we used to call simply integrated graphics for the longest time. The purpose and role of processor graphics has changed drastically over the years and it is now not only responsible for 3D graphics rendering but compute, media and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture utilized in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first product released and announced using Gen9 graphics and is also the first to adopt Intel’s new 3-digit naming scheme.
This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. Containing four traditional x86 CPU cores and 1 “slice” implementation of Gen9 graphics (with three visible sub-slices we’ll describe below), this is not likely to be the highest performing iteration of the latest Intel HD Graphics technology.
Like the Intel processors before it, the Skylake design utilizes a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is considered its own agent while the Gen9 compute architecture is considered one complete agent. The system agent bundles the DRAM memory, the display controller, PCI Express and other I/O interface that communicate with the rest of the PC. Any off-chip memory requests and transactions occur through this bus while on-chip data transfers tend to be handled differently.
Subject: General Tech | August 13, 2015 - 01:14 PM | Ken Addison
Tagged: podcast, video, amd, nvidia, GTX 970, Zotac GTX 970 AMP! Extreme Core Edition, dx12, 3dfx, voodoo 3, Intel, SSD 750, NVMe, Samsung, R9 Fury, Fiji, gtx 950
PC Perspective Podcast #362 - 08/13/2015
Join us this week as we discuss Benchmarking a Voodoo 3, Flash Media Summit 2015, Skylake Delidding and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Sebastian Peak
Program length: 1:15:23
Subject: Processors | August 11, 2015 - 06:39 PM | Jeremy Hellstrom
Tagged: skylake-u, Intel
Fanless Tech just posted slides of Skylake-U the ultraportable version of Skylake, all of which have an impressively low TDP of 15W which can be reduced to either 10W or in some cases all the way down to 7.5W. As they have done previously all are BGA socketed which means you will not be able to upgraded nor are you likely to see them in desktops, not necessarily a bad thing for this segment of the mobile market but certainly worth noting.
There will be two i7 models and two i5 along with a single i3 version, the top models of which, the Core i7-6600U and Core i5-6300U sport a slightly increased frequency and support for vPro. Those two models, along with the i7-6500U and i5-6200U will have the Intel HD graphics 520 with frequencies of 300/1050 for the i7's and 300/1000 for the i5 and i3 chips
Along with the Core models will come a single Pentium chip, the 4405U and a pair of Celerons, the 3955U and 3855U. They will have HD510 graphics, clocks of 300/950 or 300/900 for the Celerons and you will see slight reductions in PCIe and storage subsystems on teh 4405U and 3855U. The naming scheme is less confusing that some previous generations, a boon for those with family or friends looking for a new laptop who are perhaps not quite as obsessed with processors as we are.
It's Basically a Function Call for GPUs
Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, loading graphics card with tasks will take a drastic change in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.
While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.
Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.
The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:
- Both AMD and NVIDIA are only a two-digit percentage of draw call performance apart
- Both AMD and NVIDIA saw an order of magnitude increase in draw calls
Subject: Processors | August 8, 2015 - 05:55 PM | Scott Michaud
Tagged: Skylake, Intel, delid, CPU die, cpu, Core i7-6700K
PC Watch, a Japanese computer hardware website, acquired at least one Skylake i7-6700K and removed the heatspreader. With access to the bare die, they took some photos and tested a few thermal compound replacements, which quantifies how good (or bad) Intel's default thermal grease is. As evidenced by the launch of Ivy Bridge and, later, Devil's Canyon, the choice of thermal interface between the die and the lid can make a fairly large difference in temperatures and overclocking.
Image Credit: PC Watch
They chose the vice method for the same reason that Morry chose this method in his i7-4770k delid article last year. This basically uses a slight amount of torque and external pressure or shock to pop the lid off the processor. Despite how it looks, this is considered to be less traumatic than using a razer blade to cut the seal, because human hands are not the most precise instruments and a slight miss could damage the PCB. PC Watch, apparently, needed to use a wrench to get enough torque on the vice, which is transferred to the processor as pressure.
Image Credit: PC Watch
Of course, Intel could always offer enthusiasts with choices in thermal compounds before they put the lid on, which would be safest. How about that, Intel?
Image Credit: PC Watch
With the lid off, PC Watch mentioned that the thermal compound seems to be roughly the same as Devil's Canyon, which is quite good. They also noticed that the PCB is significantly more thin than Haswell, dropping in thickness from about 1.1mm to about 0.8mm. For some benchmarks, they tested it with the stock interface, an aftermarket solution called Prolimatech PK-3, and a liquid metal alloy called Coollaboratory Liquid Pro.
Image Credit: PC Watch
At 4.0 GHz, PK-3 dropped the temperature by about 4 degrees Celsius, while Liquid Metal knocked it down 16 degrees. At 4.6 GHz, PK-3 continued to give a delta of about 4 degrees, while Liquid Metal widened its gap to 20 degrees. It reduced an 88 C temperature to 68 C!
Image Credit: PC Watch
There are obviously limitations to how practical this is. If you were concerned about thermal wear on your die, you probably wouldn't forcibly remove its heatspreader from its PCB to acquire it. That would be like performing surgery on yourself to remove your own appendix, which wasn't inflamed, just in case. Also, from an overclocking standpoint, heat doesn't scale with frequency. Twenty degrees is a huge gap, but even a hundred MHz could eat it up, depending on your die.
It's still interesting for those who try, though.
Subject: General Tech | August 7, 2015 - 01:31 PM | Jeremy Hellstrom
Tagged: fud, security, Intel, amd, x86, SMM
The SSM security hole that Christopher Domas has demonstrated (pdf) is worrying but don't panic, it requires your system to be compromised before you are vulnerable. That said, once you have access to the SMM you can do anything you feel like to the computer up to and including ensuring you can reinfect the machine even after a complete format or UEFI update. The flaw was proven on Intel x86 machines but is likely to apply to AMD processors as well as they were using the same architecture around the turn of the millennium and thankfully the issue has been mitigated in recent processors. Intel will be releasing patches for effected CPUs, although not all the processors can be patched and we have yet to hear from AMD. You can get an over view of the issue by following the link at Slashdot and speculate on if this flaw was a mistake or inserted there on purpose in our comment section.
"Security researcher Christopher Domas has demonstrated a method of installing a rootkit in a PC's firmware that exploits a feature built into every x86 chip manufactured since 1997. The rootkit infects the processor's System Management Mode, and could be used to wipe the UEFI or even to re-infect the OS after a clean install. Protection features like Secure Boot wouldnt help, because they too rely on the SMM to be secure."
Here is some more Tech News from around the web:
- Millions of Android devices pwned in single text attack ... again @ The Inquirer
- Mozilla Issues Fix For Firefox Zero-Day Bug @ Slashdot
- Microsoft plays down playing fast and loose with Windows 10 privacy @ The Inquirer
- Ransacked US OPM wins Pwnie Award for 'Most EPIC Fail' @ The Register
- Hacking Team brewed potent iOS poison for non-jailbroken iThings @ The Register
- Tesla Model S Has Been Hacked @ Slashdot
- Asus EA-AC87 4×4 wireless bridge @ Kitguru
Subject: Storage | August 6, 2015 - 06:37 PM | Allyn Malventano
Tagged: SSD 750, ssd, pcie, NVMe, Intel
A new 800GB SKU of the Intel SSD 750 Series of PCIe SSDs was hinted at with the Skylake launch press materials, and it appears to have been a reality:
They may not be on the shelves yet, but appearing on ARK is a pretty good indicator that these are coming soon. We don't have pricing yet, but I would suspect a cost/GB closer to the 1.2TB model than to the 400GB model, which should come in at around $700. Performance sees a slight hit for the 800GB model, likely since this is an 'uneven' number of dies for the design of the SSD DC P3500 line it was based on.
Which would you prefer - a single 800GB or a pair of 400GB SSD 750's in a RAID (now that it is possible)?
A quick look at storage
** This piece has been updated to reflect changes since first posting. See page two for PCIe RAID results! **
Our Intel Skylake launch coverage is intense! Make sure you hit up all the stories and videos that are interesting for you!
- The Intel Core i7-6700K Review - Skylake First for Enthusiasts (Video)
- Skylake vs. Sandy Bridge: Discrete GPU Showdown (Video)
- ASUS Z170-A Motherboard Preview
- Intel Skylake / Z170 Rapid Storage Technology Tested - PCIe and SATA RAID
When I saw the small amount of press information provided with the launch of Intel Skylake, I was both surprised and impressed. The new Z170 chipset was going to have an upgraded DMI link, nearly doubling throughput. DMI has, for a long time, been suspected as the reason Intel SATA controllers have pegged at ~1.8 GB/sec, which limits the effectiveness of a RAID with more than 3 SSDs. Improved DMI throughput could enable the possibility of a 6-SSD RAID-0 that exceeds 3GB/sec, which would compete with PCIe SSDs.
Speaking of PCIe SSDs, that’s the other big addition to Z170. Intel’s Rapid Storage Technology was going to be expanded to include PCIe (even NVMe) SSDs, with the caveat that they must be physically connected to PCIe lanes falling under the DMI-connected chipset. This is not as big of as issue as you might think, as Skylake does not have 28 or 40 PCIe lanes as seen with X99 solutions. Z170 motherboards only have to route 16 PCIe lanes from the CPU to either two (8x8) or three (8x4x4) PCIe slots, and the remaining slots must all hang off of the chipset. This includes the PCIe portion of M.2 and SATA Express devices.