Subject: Processors | May 28, 2015 - 03:44 PM | Scott Michaud
Tagged: Intel, Skylake, skylake-s, haswell, devil's canyon
For a while, it was unclear whether we would see Broadwell on the desktop. With the recently leaked benchmarks of the Intel Core i7-6700K, it seems all-but-certain that Intel will skip it and go straight to Skylake. Compared to Devil's Canyon, the Haswell-based Core i7-4790K, the Skylake-S Core i7-6700K has the same base clock (4.0 GHz) and same full-processor Turbo clock (4.2 GHz). Pretty much every improvement that you see is pure performance per clock (IPC).
Image Credit: CPU Monkey
In multi-threaded applications, the Core i7-6700K tends to get about a 9% increase while, when a single core is being loaded, it tends to get about a 4% increase. Part of this might be the slightly lower single-core Turbo clock, which is said to be 4.2 GHz instead of 4.4 GHz. There might also be some increased efficiency with HyperThreading or cache access -- I don't know -- but it would be interesting to see.
I should note that we know nothing about the GPU. In fact, CPU Monkey fails to list a GPU at all. Intel has expressed interest in bringing Iris Pro-class graphics to the high-end mainstream desktop processors. For someone who is interested in GPU compute, especially with Explicit Unlinked MultiAdapter in DirectX 12 upcoming, it would be nice to see GPUs be ubiquitous and always enabled. It is expected to have the new GT4e graphics with 72 compute units and either 64 or 128MB of eDRAM. If clocks are equivalent, this could translate well over a teraflop (~1.2 TFLOPs) of compute performance in addition to discrete graphics. In discrete graphics, that would be nearly equivalent to an NVIDIA GTX 560 Ti.
We are expecting to see the Core i7-6700K launch in Q3 of this year. We'll see.
Subject: Processors | May 27, 2015 - 09:45 PM | Scott Michaud
Tagged: xeon, Skylake, Intel, Cannonlake, avx-512
AVX-512 is an instruction set that expands the CPU registers from 256-bit to 512-bit. It comes with a core specification, AVX-512 Foundation, and several extensions that can be added where it makes sense. For instance, AVX-512 Exponential and Reciprocal Instructions (ERI) help solve transcendental problems, which occur in geometry and are useful for GPU-style architectures. As such, it appears in Knights Landing but not anywhere else.
Image Credit: Bits and Chips
Today's rumor is that Skylake, the successor to Broadwell, will not include any AVX-512 support in its consumer parts. According to the lineup, Xeons based on Skylake will support AVX-512 Foundation, Conflict Detection Instructions, Vector Length Extensions, Byte and Word Instructions, and Double and Quadword Instructions. Fused Multiply and Add for 52-bit Integers and Vector Byte Manipulation Instructions will not arrive until Cannonlake shrinks everything down to 10nm.
The main advantage of larger registers is speed. When you can fit 512 bits of data in a memory bank and operate upon it at once, you are able to do several, linked calculations together. AVX-512 has the capability to operate on sixteen 32-bit values at the same time, which is obviously sixteen times the compute performance compared with doing just one at a time... if all sixteen undergo the same operation. This is especially useful for games, media, and other, vector-based workloads (like science).
This also makes me question whether the entire Cannonlake product stack will support AVX-512. While vectorization is a cheap way to get performance for suitable workloads, it does take up a large amount of transistors (wider memory, extra instructions, etc.). Hopefully Intel will be able to afford the cost with the next die shrink.
Subject: Graphics Cards, Processors, Displays, Systems | May 15, 2015 - 03:02 PM | Scott Michaud
Tagged: Oculus, oculus vr, nvidia, amd, geforce, radeon, Intel, core i5
Today, Oculus has published a list of what they believe should drive their VR headset. The Oculus Rift will obviously run on lower hardware. Their minimum specifications, published last month and focused on the Development Kit 2, did not even list a specific CPU or GPU -- just a DVI-D or HDMI output. They then went on to say that you really should use a graphics card that can handle your game at 1080p with at least 75 fps.
The current list is a little different:
- NVIDIA GeForce GTX 970 / AMD Radeon R9 290 (or higher)
- Intel Core i5-4590 (or higher)
- 8GB RAM (or higher)
- A compatible HDMI 1.3 output
- 2x USB 3.0 ports
- Windows 7 SP1 (or newer).
I am guessing that, unlike the previous list, Oculus has a more clear vision for a development target. They were a little unclear about whether this refers to the consumer version or the current needs of developers. In either case, it would likely serve as a guide for what they believe developers should target when the consumer version launches.
This post also coincides with the release of the Oculus PC SDK 0.6.0. This version pushes distortion rendering to the Oculus Server process, rather than the application. It also allows multiple canvases to be sent to the SDK, which means developers can render text and other noticeable content at full resolution, but scale back in places that the user is less likely to notice. They can also be updated at different frequencies, such as sleeping the HUD redraw unless a value changes.
The Oculus PC SDK (0.6.0) is now available at the Oculus Developer Center.
Subject: Processors | May 7, 2015 - 07:36 PM | Scott Michaud
Tagged: Intel, xeon, xeon e7 v3, xeon e7
On May 5th, Intel officially announced their new E7 v3 lineup of Xeon processors. This replaces the Xeon E7 v2 processors, which were based on Ivy Bridge-EX, with the newer Haswell-EX architecture. Interestingly, WCCFTech has Broadwell-EX listed next, even though the desktop is expected to mostly skip Broadwell and jump to Skylake in high-performance roles.
The largest model is the E7-8890 v3, which contains eighteen cores fed by a total of 45MB in L3 cache. Despite the high core count, the E7-8890 v3 has its base frequency set at 2.5 GHz to yield a TDP of 165W. The E7-8891 v3 (165W) and the E7-8893 v3 (140W) drop the core count to ten and four, but raise the base frequency to 2.8 GHz and 3.2 GHz, respectively. The E7-8880L v3 is a low power version, relatively speaking, which will also contains eighteen cores that are clocked at 2.0 GHz. This drops its TDP to 115W while still maintaining 45 MB of L3 cache.
Image Credit: WCCFTech
The product stack trickles down from there, but not much further. Just twelve processors are listed in the Xeon E7 segment, which Intel points out in the WCCFTech slides is a significant reduction in SKUs. This suggests that they believe their previous line was too many options for enterprise customers. When dealing with prices in the range of $1,223 - $7,174 USD for bulk orders, it makes sense to offer a little choice to slightly up-sell potential buyers, but too many choices can defeat that purpose. Also, it was a bit humorous to see such an engineering-focused company highlight a reduction of SKUs with a bubble point like it was a technological feature. Not bad, actually quite good as I mentioned above, just a bit funny.
The Xeon E7 v3 is listed as now available, with SKUs ranging from $1223 - $7174 USD.
Some Fresh Hope for 2016
EDIT 2015-05-07: A day after the AMD analyst meeting we now know that the roadmaps delivered here are not legitimate. While some of the information is likely correct on the roadmaps, they were not leaked by AMD. There is no FM3 socket, rather AMD is going with AM4. AMD will be providing more information throughout this quarter about their roadmaps, but for now take all of this information as "not legit".
SH SOTN has some eagle eyes and spotted the latest leaked roadmap for AMD. These roadmaps cover both mobile and desktop, from 2015 through 2016. There are obviously quite a few interesting tidbits of information here.
On the mobility roadmap we see the upcoming release of Carrizo, which we have been talking about since before CES. This will be the very first HSA 1.0 compliant part to hit the market, and AMD has done some really interesting things with the design in terms of performance, power efficiency, and die size optimizations. Carrizo will span the market from 15 watts to 35 watts TDP. This is a mobile only part, but indications point to it being pretty competent overall. This is a true SOC that will support all traditional I/O functions of older standalone southbridges. Most believe that this part will be manufactured by GLOBALFOUNDIRES on their 28 nm HKMG process that is more tuned to AMD's APU needs.
Carrizo-L will be based on the Puma+ architecture and will go from 10 watts to 15 watts TDP. This will use the same FP4 BGA connection as the big Carrizo APU. This should make these parts more palatable for OEMs as they do not have to differentiate the motherboard infrastructure. Making things easier for OEMs will give more reasons for these folks to offer products based on Carrizo and Carrizo-L APUs. The other big reason will be the GCN graphics compute units. Puma+ is a very solid processor architecture for low power products, but these parts are still limited to the older 28 nm HKMG process from TSMC.
One interesting addition here is that AMD will be introducing their "Amur" APU for the low power and ultra-low power markets. These will be comprised of four Cortex-A57 CPUs combined with AMD's GCN graphics units. This will be the first time we see this combination, and the first time AMD has integrated with ARM since ATI spun off their mobile graphics to Qualcomm under the "Adreno" branding (anagram for "Radeon"). What is most interesting here is that this APU will be a 20 nm part most likely fabricated by TSMC. This is not to say that Samsung or GLOBALFOUNDRIES might be producing it, but those companies are expending their energy on the 14 nm FinFET process that will be their bread and butter for years to come. This will be a welcome addition to the mobile market (tablets and handhelds) and could be a nice profit center for AMD if they are able to release this in a timely manner.
2016 is when things get very interesting. The Zen x86 design will dominate the upper 2/3 of the roadmap. I had talked about Zen when we had some new diagram leaks yesterday, but now we get to see the first potential products based off of this architecture. In mobile it will span from 5 watts to 35 watts TDP. The performance and mainstream offerings will be the "Bristol Ridge" APU which will feature 4 Zen cores (or one Zen module) combined with the next gen GCN architecture. This will be a 14nm part, and the assumption is that it will be GLOBALFOUNDRIES using 14nm FinFET LPP (Low Power Plus) that will be more tuned for larger APUs. This will also be a full SOC.
The next APU will be codenamed "Basilisk" that will span the 5 watt to 15 watt range. It will be comprised of 2 Zen cores (1/2 of a Zen module) and likely feature 2 to 4 MB of L3 cache, depending on power requirements. This looks to be the first Skybridge set of APUs that will share the same infrastructure as the ARM based Amur SOC. FT4 BGA is the basis for both the 2015 Amur and 2016 Basilisk SOCs.
Finally we have the first iteration of AMD's first ground up implementation of ARM's ARMv8-A ISA. The "Styx" APU features the new K12 CPU cores that AMD has designed from scratch. It too will feature the next generation GCN units as well as share the same FT4 BGA connection. Many are anxiously watching this space to see if AMD can build a better mousetrap when it comes to licensing the ARM ISA (as have Qualcomm, NVIDIA, and others).
2015 shows no difference in the performance desktop space, as it is still serviced by the now venerable Piledriver based FX parts on AM3+. The only change we expect to see here is that there will be a handful of new motherboard offerings from the usual suspects that will include the new USB 3.1 functionality derived from a 3rd party controller.
Mainstream and Performance will utilize the upcoming Godavari APUs. These are power and speed optimized APUs that are still based on the current Kaveri design. These look to be a simple refresh/rebadge with a slight performance tweak. Not exciting, but needs to happen for OEMs.
Low power will continue to be addressed by Beema based APUs. These are regular Puma based cores (not Puma+). AMD likely does not have the numbers to justify a new product in this rather small market.
2016 is when things get interesting again. We see the release of the FM3 socket (final proof that AM3+ is dead) that will house the latest Zen based APUs. At the top end we see "Summit Ridge" which will be composed of 8 Zen cores (or 2 Zen modules). This will have 4 MB of L2 cache and 16 MB of L3 cache if our other leaks are correct. These will be manufactured on 14nm FinFET LPE (the more appropriate process product for larger, more performance oriented parts). These will not be SOCs. We can expect these to be the basis of new Opterons as well, but there is obviously no confirmation of that on these particular slides. This will be the first new product in some years from AMD that has the chance to compete with higher end desktop SKUs from Intel.
From there we have the lower power Bristol Ridge and Basilisk APUs that we already covered in the mobile discussion. These look to be significant upgrades from the current Kaveri (and upcoming Godavari) APUs. New graphics cores, new CPU cores, and new SOC implementations where necessary.
AMD will really be shaking up the game in 2016. At the very least they will have proven that they can still change up their game and release higher end (and hopefully competitive) products. AMD has enough revenue and cash on hand to survive through 2016 and 2017 at the rate they are going now. We can only hope that this widescale change will allow AMD to make some significant inroads with OEMs on all levels. Otherwise Intel is free to do what they want and what price they want across multiple markets.
ARM Releases Cortex-A72 for Licensing
On February 3rd, ARM announced a slew of new designs, including the Cortex A72. Few details were shared with us, but what we learned was that it could potentially redefine power and performance in the ARM ecosystem. Ryan was invited to London to participate in a deep dive of what ARM has done to improve its position against market behemoth Intel in the very competitive mobile space. Intel has a leg up on process technology with their 14nm Tri-Gate process, but they are continuing to work hard in making their x86 based processors more power efficient, while still maintaining good performance. There are certain drawbacks to using an ISA that is focused on high performance computing rather than being designed from scratch to provide good performance with excellent energy efficiency.
ARM has been on a pretty good roll with their Cortex A9, A7, A15, A17, A53, and A57 parts over the past several years. These designs have been utilized in a multitude of products and scenarios, with configurations that have scaled up to 16 cores. While each iteration has improved upon the previous, ARM is facing the specter of Intel’s latest generation, highly efficient x86 SOCs based on the 2nd gen 14nm Tri-Gate process. Several things have fallen into place for ARM to help them stay competitive, but we also cannot ignore the experience and design hours that have led to this product.
(Editor's Note: During my time with ARM last week it became very apparent that it is not standing still, not satisfied with its current status. With competition from Intel, Qualcomm and others ramping up over the next 12 months in both mobile and server markets, ARM will more than ever be depedent on the evolution of core design and GPU design to maintain advantages in performance and efficiency. As Josh will go into more detail here, the Cortex-A72 appears to be an incredibly impressive design and all indications and conversations I have had with others, outside of ARM, believe that it will be an incredibly successful product.)
Cortex A72: Highest Performance ARM Cortex
ARM has been ubiquitous for mobile applications since it first started selling licenses for their products in the 90s. They were found everywhere it seemed, but most people wouldn’t recognize the name ARM because these chips were fabricated and sold by licensees under their own names. Guys like Ti, Qualcomm, Apple, DEC and others all licensed and adopted ARM technology in one form or the other.
ARM’s importance grew dramatically with the introduction of increased complexity cellphones and smartphones. They also gained attention through multimedia devices such as the Microsoft Zune. What was once a fairly niche company with low performance, low power offerings became the 800 pound gorilla in the mobile market. Billions of chips are sold yearly based on ARM technology. To stay in that position ARM has worked aggressively on continually providing excellent power characteristics for their parts, but now they are really focusing on overall performance and capabilities to address, not only the smartphone market, but also the higher performance computing and server spaces that they want a significant presence in.
Subject: Processors | April 27, 2015 - 06:06 PM | Josh Walrath
Tagged: Zen, Steamroller, Kaveria, k12, Excavator, carrizo, bulldozer, amd
There are some pretty breathless analysis of a single leaked block diagram that is supposedly from AMD. This is one of the first indications of what the Zen architecture looks like from a CPU core standpoint. The block diagram is very simple, but looks in the same style as what we have seen from AMD. There are some labels, but this is almost a 50,000 foot view of the architecture rather than a slightly clearer 10,000 foot view.
There are a few things we know for sure about Zen. It is a clean sheet design that moves away from what AMD was pursuing with their Bulldozer family of cores. Zen gives up CMT for SMT support for handling more threads. The design has a cluster of four cores sharing 8 MB of L3 cache, with each core having access to 512 KB of L2 cache. There is a lot of optimism that AMD can kick the trend of falling more and more behind Intel every year with this particular design. Jim Keller is viewed very positively due to his work at AMD in the K7 through K8 days, as well as what he accomplished at Apple with their ARM based offerings.
One of the first sites to pick up this diagram wrote quite a bit about what they saw. There was a lot of talk about, “right off the bat just by looking at the block diagram we can tell that Zen will have substantially higher single threaded performance compared to Excavator and the Bulldozer family.” There was the assumption that because it had two 256-bit FMACs that it could fuse them to create a single 512 bit AVX product.
These assumptions are pretty silly. This is a very simple block diagram that answers few very important questions about the architecture. Yes, it shows 6 int pipelines, but we don’t know how many are address generation vs. execution units. We don’t know how wide decode is. We don’t know latency to L2 cache, much less how L3 is connected and shared out. So just because we see more integer pipelines per core does not automatically mean, “Da, more is better, strong like tractor!” We don’t know what improvements or simplifications we will see in the schedulers. There is no mention of the front-end other than Fetch and Decode. How about Branch Prediction? What is the latency for the memory controller when addressing external memory?
Essentially, this looks like a simplified way of expressing to analysts that AMD is attempting to retain their per core integer performance while boosting floating point/AVX at a similar level. Other than that, there is very little that can be gleaned from this simple block diagram.
Other leaks that are interesting concerning Zen are the formats that we will see these products integrated into. One leak detailed a HPC aimed APU that features 16 Zen cores with 32 MB of L3 cache attached to a very large GPU. Another leak detailed a server level chip that will support 32 cores and will be seen in 2P systems. Zen certainly appears to be very flexible, and in ways it reminds me of a much beefier Jaguar type CPU. My gut feeling is that AMD will get closer to Intel than it has been in years, and perhaps they can catch Intel by surprise with a few extra features. The reality of the situation is that AMD is far behind and only now are we seeing pure-play foundries start to get even close to Intel in terms of process technology. AMD is very much at a disadvantage here.
Still, the company needs to release new, competitive products that will refill the company coffers. The previous quarter’s loss has dug into cash reserves, but AMD is still stable in terms of cash on hand and long term debt. 2015 will see new GPUs, an APU refresh, and the release of the new Carrizo parts. 2016 looks to be the make or break year with Zen and K12.
Edit 2015-04-28: Thanks to SH STON we have a new slide that has been leaked from the same deck as this one. This has some interesting info in that AMD may be going away from exclusive cache designs. Exclusive was a good idea when cache was small and expensive, as data was not replicated through each level of cache (L1 was not replicated in L2 and L2 was not replicated in L3). Intel has been using inclusive cache since forever, where data is replicated and simpler to handle. Now it looks like AMD is moving towards inclusive. This is not necessarily a bad thing as the 512 KB of L2 can easily handle what looks to be 128 KB of L1 and the shared 8 MB of L3 cache can easily handle the 2 MB of L2 data. Here is the link to that slide.
The new slide in question.
Subject: General Tech, Graphics Cards, Processors | April 19, 2015 - 02:08 PM | Scott Michaud
Tagged: moores law, Intel
While he was the director of research and development at Fairchild Semiconductor, Gordon E. Moore predicted that the number of components in an integrated circuits would double every year. Later, this time-step would slow to every two years; you can occasionally hear people talk about eighteen months too, but I am not sure who derived that number. In a few years, he would go on to found Intel with Robert Noyce, where they spend tens of billions of dollars annually to keep up with the prophecy.
It works out for the most part, but we have been running into physical issues over the last few years though. One major issue is that, with our process technology dipping into the single- and low double-digit nanometers, we are running out of physical atoms to manipulate. The distance between silicon atoms in a solid at room temperature is about 0.5nm; a 14nm product has features containing about 28 atoms, give or take a few in rounding error.
It has been a good fifty years since the start of Moore's Law. Humanity has been developing plans for how to cope with the eventual end of silicon lithography process shrinks. We will probably transition to smaller atoms and molecules and later consider alternative technologies like photonic crystals, which routes light in the hundreds of terahertz through a series of waveguides that make up an integrated circuit. Another interesting thought: will these technologies fall in line with Moore's Law in some way?
Subject: Processors | April 15, 2015 - 10:04 PM | Ryan Shrout
Tagged: Intel, Skylake, skylake-s, lga1151, 100 series
Some slides have leaked out with information about Intel's forthcoming 6th Generation Core processor, code named Skylake. We have known that Skylake was coming, and coming this year, but there have been a lot of questions about enthusiast parts and what that means for DIY builders. The slides were first seen over at WCCFTech.com and show some interesting new information.
Dubbed Skylake-S, the LGA (socketed) processor will use a new derivative with 1151 pins as well as a new set of chipsets, the Intel 100-series. Skylake is built on the same 14nm process technology used with Broadwell but will feature a new microarchitecture for both the IA cores and the graphics systems. Obviously you can read the slide yourself above, but some of the highlights are worth touching on individually. Skylake will support both DDR3L and DDR4 memory systems with the enthusiast grade parts likely the only ones to attempt to push the newer, faster DDR4 speeds.
Enthusiasts will also be glad to know that there are planned 95 watt quad-core SKUs that will support unlocked features and overclocking capability. Intel lists an "enhanced" BCLK overclocking with the term "full range" which likely means there will no longer be a need for straps to 125 MHz, etc. A 95 watt TDP is higher than the 88 watt limit we saw on Haswell processors so there is a chance we might actually witness usable performance gains if Intel can get the clock speeds up and above where they sit today with current generation parts.
The use of DMI 3.0, the connection between the processor and the chipset, sees the first increase in bandwidth in many generations. Rated at 8 GT/s, twice that of the DMI 2.0 interface used on Haswell, should allow for fewer bottlenecks on storage and external PCIe connections coming from the chipset.
The new Intel 100-series chipsets will come in three variants at launch: the Z170, the H170 and the H110. The one we are most concerned with is the Z170 of course as it will be paired wit the higher end 65 watt and 95 watt enthusiast processors. Based on these specs, Skylake will continue to operate with only 16 lanes of PCI Express 3.0 capable of running at 1 x16, 2 x8 or 1 x8 and 2 x4 connections. With either DDR3L or DDR4 you will have a dual-channel memory system.
For storage, the Z170 still has six SATA 6.0 Gb/s ports, moves to 14 USB ports maximum with 10 of them capable of USB 3.0 speeds and it upgrades Intel RST to support PCIe storage drivers. Of note here is that the Intel chipset does not include USB 3.1 capability so motherboard vendors will continue to need an external controller to integrate it. Without a doubt the 100-series chipsets will be able to support booting and compatibility with the new Intel 750-series PCIe SSDs, the current king of the hill.
As for timing, the roadmap lists the Z170 chipset and the Skylake-S processor as a Q3 2015 release. I would normally expect that to line up with Computex in early June but that doesn't appear to be the case based on other information I am getting.
Subject: Processors | April 7, 2015 - 05:56 PM | Jeremy Hellstrom
Tagged: amd, FX-8320e
Over at Techgage one of the writers recently updated their system, due to budget constraints they needed to stay in the $600-700 range all told which of course indicates an AMD build. They chose the $138 FX-8320E for their processor, along with a pair of GTX 760s, the ASUS M5A99FX Pro R2.0, 8GB of DDR3-1866 and with storage, power, cooling and case they managed to keep within the ir budget. The question remain is if it is powerful enough for reasonable gaming duties such as Borderlands 2. Read on to see if the recommendation is to go with AMD or the i3-4330 and a low end H97 board.
"Released this past fall, AMD’s FX-8320E processor promises to deliver a lot of processing power for those on a budget. It sports eight cores, and as a Black Edition, its overclocking capabilities are unrestricted. But is that enough to make this the best go-to budget processor, especially for gamers?"
Here are some more Processor articles from around the web:
- A10-7800 CPU Review @ Hardware Secrets
- AMD A8-7650k Kaveri @ eTeknix
- A10-6800K vs. Core i3-4150 CPU Review @ Hardware Secrets