Coming in 2014: Intel Core M
The era of Broadwell begins in late 2014 and, based on what Intel has disclosed to us today, the processor architecture appears impressive in nearly every aspect. Coming off the success of 2013's 22nm Haswell design, the Broadwell-Y architecture will not only be the first to market with the new microarchitecture, but will also be the flagship product on Intel's new 14nm tri-gate process technology.
The Intel Core M processor, as Broadwell-Y has been dubbed, includes impressive technological improvements over previous low power Intel processors that result in lower power consumption, thinner form factors, and longer battery life. Broadwell-Y will stretch into even lower TDPs, enabling 9mm or smaller fanless designs that maintain current battery lifespans. A new 2nd generation FIVR with a modified power delivery design allows for even thinner packaging and a wider range of dynamic frequencies than before. And, of course, along with the shift comes an updated converged core design and improved graphics performance.
All of these changes are in service of what Intel claims is a re-invention of the notebook. Compared to 2010, when the company introduced the original Intel Core processor and almost completely redirected its roadmap, Intel Core M and the Broadwell-Y changes will allow for some dramatic platform changes.
Notebook thickness will go from 26mm (~1.02 inches) down to as small as 7mm (~0.28 inches), as Intel has proven with its Llama Mountain reference platform. A 4x reduction in total thermal dissipation while improving core performance by 2x and graphics performance by 7x is something no other company has been able to do over the same time span. And, in the end, one of the most important features for the consumer: double the useful battery life from a smaller (and lighter) battery.
But these kinds of advancements just don’t happen by chance – ask any other semiconductor company that is either trying to keep ahead of or catch up to Intel. It takes countless engineers and endless hours to build a platform like this. Today Intel is sharing some key details on how it was able to make this jump including the move to a 14nm FinFET / tri-gate transistor technology and impressive packaging and core design changes to the Broadwell architecture.
Intel 14nm Technology Advancement
Intel consistently creates and builds the most impressive manufacturing and production processes in the world, which has helped it maintain market leadership over rivals in the CPU space. It is also one of the key tenets that Intel hopes will help it deliver in mobile, including tablets and smartphones. At the 22nm node, Intel was the first to offer 3D transistors, which it calls tri-gate and others refer to as FinFET. By focusing on power consumption rather than top-level performance, Intel was able to build the Haswell design (as well as Silvermont for the Atom line) with impressive performance and power scaling, allowing thinner and less power hungry designs than previous generations. Some enthusiasts might think that Intel has done this at the expense of high performance components, and there is some truth to that. But Intel believes that by committing to this space it builds the best future for the company.
Filling the Product Gaps
In the first several years of my PCPer employment, I typically handled most of the AMD CPU refreshes. These were rather standard affairs that involved small jumps in clockspeed and performance. These happened every 6 to 8 months, with the bigger architectural shifts happening some years apart. We are finally seeing a new refresh of the AMD APU parts after the initial release of Kaveri to the world at the beginning of this year. This update is different. Unlike previous years, there are no faster parts than the already available A10-7850K.
This refresh deals with fleshing out the rest of the Kaveri lineup with products that address different TDPs, markets, and prices. The A10-7850K is still the king of performance on the FM2+ socket (as long as users do not pay attention to the faster CPU performance of the A10-6800K). The initial launch in January also featured a part that never became available until now: the A8-7600 was supposed to ship some months ago, but is only making it to market today. The 7600 was unique in that it had a configurable TDP that went from 65 watts down to 45 watts; the 7850K, on the other hand, was configurable from 95 watts down to 65 watts.
So what are we seeing today? AMD is releasing three parts to address the lower power markets it hopes to expand into. The A8-7600 was detailed back in January but, again, never released until now. The other two parts are brand new. The A10-7800 is a 65 watt TDP part with a cTDP that goes down to 45 watts. The other new chip is the A6-7400K, which is unlocked, has a configurable TDP, and looks to compete directly with Intel's recently released 20th Anniversary Pentium G3258.
Subject: Processors | July 22, 2014 - 04:15 PM | Jeremy Hellstrom
Tagged: linux, Pentium G3258, ubuntu 14.10
Phoronix tested the 20th Anniversary Pentium CPU on Ubuntu 14.10 and was impressed right off the bat, managing a perfectly stable overclock of 4.4GHz on air. Using Linux 3.16 and Mesa 10.2, they had no issues with the onboard GPU, though its performance lagged behind the faster GPUs on the Haswell chips they tested against. When they benchmarked the CPU, the lack of Advanced Vector Extensions and the fact that it is a dual core part showed in the results, but when you consider the difference in price between a G3258 and a 4770K, it fares quite well. Stay tuned for their next set of benchmarks, which will compare the G3258 to AMD's current offerings.
"Up for review today on Phoronix is the Pentium G3258, the new processor Intel put out in celebration of their Pentium brand turning 20 years old. This new Pentium G3258 processor costs under $100 USD and comes unlocked for offering quite a bit overclocking potential while this Pentium CPU can be used by current Intel 8 and 9 Series Chipsets. Here's our first benchmarks of the Intel Pentium G3258 using Ubuntu Linux."
Here are some more Processor articles from around the web:
- Intel Core i7 4790K – Haswell gets a refresh @ Bjorn3D
- Haswell Devils Canyon Performance @ Hardware Asylum
- AMD Athlon 5350 and Gigabyte GA-AM1M-S2H @ Benchmark Reviews
- AMD FX-9590 & FX-9370 Review @ OCC
Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 03:05 AM | Scott Michaud
Tagged: Xeon Phi, xeon, Intel, avx-512, avx
It is difficult to know what is actually new information in this Intel blog post, but it is interesting nonetheless. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and co-processors. Basically, last year Intel announced "Foundation", the minimum support level for AVX-512, as well as Conflict Detection, Exponential and Reciprocal, and Prefetch instructions, which are optional. That earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions would make their way to standard, CPU-like Xeons at around the same time.
This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.
| Instruction Set Extension | Future Xeon | Knights Landing | Future Xeon Phi |
| --- | --- | --- | --- |
| Conflict Detection Instructions | Yes | Yes | Yes |
| Exponential and Reciprocal Instructions | No | Yes | Yes |
| Byte and Word Instructions | Yes | No | No |
| Doubleword and Quadword Instructions | Yes | No | No |
| Vector Length Extensions | Yes | No | No |
Source: Intel AVX-512 Blog Post (and my understanding thereof).
So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and works just fine alongside multiple cores, too). For an example: imagine that you have to multiply two colors together. The direct way to do it is to multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! and, later, Intel's SSE included instructions to multiply two four-component vectors together. This reduces four similar instructions to a single operation between wider registers.
Smart compilers (and programmers, although that is becoming less common as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if it undergoes similar instructions. AVX-512 allows sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four single-precision RGBA values, but you are looping through 2 million pixels, you can process four pixels at a time (16 components).
For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.
This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.
Subject: General Tech, Processors, Mobile | July 16, 2014 - 03:37 AM | Scott Michaud
Tagged: quarterly results, quarterly earnings, quarterly, Intel, earnings
Another fiscal quarter brings another Intel earnings report. Once again, the company is doing well for itself as a whole but struggling to gain a foothold in mobile. In three months, Intel sold 8.7 billion dollars in PC hardware, of which 3.7 billion was profit. Its mobile division, on the other hand, brought in 51 million USD in revenue, losing 1.1 billion dollars for its efforts. In all, the company is profitable -- by about 3.84 billion USD.
One interesting metric that Intel adds to its chart, and I have yet to notice another company listing this information so prominently, is its number of employees, compared between quarters. Last year, Intel employed about 106,000 people, which increased to 106,300 two quarters ago. Between then and this last quarter, that number dropped by 1,400, to 104,900 employees, or about 1.3% of the total workforce. There does not seem to be a stated reason for the decline (aside from Richard Huddy, who we know went to AMD).
Image Credit: Anandtech
As a final note, Anandtech, when reporting on this story, added a few historical trends near the end. One that caught my attention was the process technology vs. quarter graph, showing Intel's smallest transistor size over the last thirteen-and-a-bit years. We are still slowly approaching 0nm, following an exponential curve toward its asymptote. The interval between node transitions, however, is still fairly regular. It looks like it is getting slightly longer, but not drastically (minus the optical illusion caused by the smaller drops).
Subject: General Tech, Processors, Mobile | July 11, 2014 - 04:58 PM | Scott Michaud
Tagged: x86, VIA, isaiah II, Intel, centaur, arm, amd
There might be a third x86-compatible processor manufacturer looking at the mobile market. Intel has been trying to make headway, including directly developing Android for the x86 architecture. The company also has a few design wins, mostly with Windows 8.1-based tablets but also the occasional Android-based model. Google is rumored to be preparing the "Nexus 8" tablet with one of Intel's Moorefield SoCs. AMD, the second-largest x86 processor manufacturer, is aiming its Mullins platform at tablets and two-in-ones, but cannot afford to play snowplow, at least not like Intel.
VIA, through its Centaur Technology division, is expected to announce its own x86-based SoC, too. Called Isaiah II, it is rumored to be a quad core, 64-bit processor with a maximum clock rate of 2.0 GHz. Its GPU is currently unknown. VIA sold its stake in S3 Graphics to HTC back in 2011, making HTC the majority shareholder of the GPU company. That said, HTC and VIA are very close companies: the chairwoman of HTC is the founder of VIA Technologies, and the current President and CEO of VIA, who has held that position since 1992, is her husband. I expect that the GPU architecture will be provided by S3, or will somehow be based on its technology. I could be wrong. Both companies will obviously do what they think is best.
It would make sense, though, especially if it benefits HTC with cheap but effective SoCs for Android and "full" Windows (not Windows RT) devices.
Or this announcement could be larger than it would appear. Three years ago, VIA filed a patent describing a processor that can read both x86 and ARM machine language and translate either into its own internal microinstructions. The Centaur Isaiah II could reasonably be based on that technology. If so, the processor would be able to support either version of Android. Then again, now that Intel has built up the Android x86 code base, maybe VIA shelved that initiative (or just filed the patent for legal reasons).
But what about Intel? Honestly, I see this being a benefit for the behemoth. Extra x86-based vendors will probably grow the overall x86 market share, compared to ARM, by helping with software support. Even if the chip is compatible with both ARM and x86, what Intel needs right now is software, and it can only write so much of it itself. It is possible that VIA, maker of the original netbook processor, could disrupt the PC market with both x86 and ARM compatibility, but I doubt it.
Centaur Technology, the relevant division of VIA, will make their announcement in less than 51 days.
Subject: Processors | July 9, 2014 - 05:42 PM | Josh Walrath
Tagged: nvidia, msi, Luxmark, Lightning, hsa, GTX 580, GCN, APU, amd, A88X, A10-7850K
When I first read many of the initial AMD A10-7850K reviews, my primary question was how the APU would act if a different GPU was installed in the system and did not utilize the CrossFire X functionality that AMD talked about. Typically, when a user installs a standalone graphics card on the AMD FM2/FM2+ platform, they disable the graphics portion of the APU. They also have to uninstall the AMD Catalyst driver suite. This leaves the APU as a CPU only, with all of that graphics silicon left silent and dark.
Who in their right mind would pair a high end graphics card with the A10-7850K? This guy!
Does this need to be the case? Absolutely not! The GCN-based graphics unit on the latest Kaveri APUs is pretty powerful when used in GPGPU/OpenCL applications. The 4 CPU cores (2 modules) and 8 GCN compute units can push out around 856 GFLOPS when fully utilized. We must also consider that the APU is the first fully compliant HSA (Heterogeneous System Architecture) chip, and it handles memory accesses much more efficiently than standalone GPUs. The shared memory space with the CPU gets rid of a lot of the workarounds typically needed for GPGPU-type applications. It makes sense that users would want to leverage the performance potential of a fully functioning APU while upgrading their overall graphics performance with a higher end standalone GPU.
Getting this to work is very simple. Assuming the user has been running the APU as their primary graphics controller:

1. Update to the latest Catalyst drivers. (If the new card is also AMD, it is cleaner to uninstall the Catalyst suite entirely and re-install it only after the new card is in place.)
2. Restart the machine, enter the UEFI, and change the primary video boot device from the integrated unit to PEG (PCI-Express Graphics).
3. Save the setting and shut down the machine.
4. Insert the new video card and attach the monitor cable(s) to it.
5. Boot the machine and either re-install the Catalyst suite (for an AMD card) or install the latest NVIDIA drivers.
Windows 7 and Windows 8 allow users to install multiple graphics drivers from different vendors. In my case I utilized a last generation GTX 580 (the MSI N580GTX Lightning) along with the AMD A10 7850K. These products coexist happily together on the MSI A88X-G45 Gaming motherboard. The monitor is attached to the NVIDIA card and all games are routed through that since it is the primary graphics adapter. Performance seems unaffected with both drivers active.
I find it interesting that the GPU portion of the APU is named "Spectre". Who owns those 3dfx trademarks anymore?
When I load up Luxmark I see three entries: the APU (CPU and GPU portions), the GPU portion of the APU, and then the GTX 580. Luxmark defaults to the GPUs. We see these GPUs listed as “Spectre”, which is the GCN portion of the APU, and the NVIDIA GTX 580. Spectre supports OpenCL 1.2 while the GTX 580 is an OpenCL 1.1 compliant part.
With both GPUs active I can successfully run the Luxmark “Sala” test. The two units perform better together than when they are run separately. Adding in the CPU does increase the score, but not by very much (my guess here is that the APU is going to be very memory bandwidth bound in such a situation). Below we can see the results of the different units separate and together.
These results make me hopeful about the potential of AMD's latest APU. It can run side by side with a standalone card, and applications can leverage its performance. Now all we need is more HSA-aware software. More time and testing are needed for setups such as this, and we need to see whether HSA-enabled software really does get a boost from using the GPU portion of the APU, as compared to pure CPU code or code that runs on the standalone GPU.
Personally, I find the idea of a heterogeneous solution such as this appealing. The standalone graphics card handles the actual graphics work, the CPU handles serial code, and HSA software can fully utilize the graphics portion of the APU in a very efficient manner. Unfortunately, we do not have hard numbers on the handful of HSA-aware applications out there, especially when used in conjunction with standalone graphics. We know in theory that this can work (and should work), but until developers really optimize their code for such a solution, we simply do not know if having an APU will net the user big gains as compared to something like an i7 4770 or 4790 running pure x86 code.
In the meantime, at least we know that these products work together without issue. The mixed mode OpenCL results make a nice case for improving overall performance in such a system. I would imagine with more time and more effort from developers, we could see some really interesting implementations that will fully utilize a system such as this one. Until then, happy experimenting!
Subject: Processors | July 8, 2014 - 07:23 PM | Jeremy Hellstrom
Tagged: intel atom, Pentium G3258, overclocking
Technically it is an Anniversary Edition Pentium processor, but it reminds those of us who have been in the game a long time of the old Celeron Ds, which cost very little and overclocked like mad! The Pentium G3258 is well under $100, but the stock speed of 3.2GHz is only a recommendation as this processor is just begging to be overclocked. The Tech Report coaxed it up to 4.8GHz on air cooling, 100MHz higher than the i7-4790K they tested. A processor that costs about 20% of the price of the 4790K and can almost match its performance in Crysis 3, without resorting to even high end watercooling, should make any gamer on a budget sit up and take notice. Sure, you lose the extra cores and other features of the flagship processor, but if you are primarily a gamer those are not your focus; you simply want the fastest processor you can get for a reasonable amount of money. Stay tuned for more information about the Anniversary Edition Pentium, as there are more benchmarks to be run!
"This new Pentium is an unlocked dual-core CPU based on the latest 22-nm Haswell silicon. I ran out and picked one up as soon as they went on sale last week. The list price is only 72 bucks, but Micro Center had them on sale for $60. In other words, you can get a processor that will quite possibly run at clock speeds north of 4GHz—with all the per-clock throughput of Intel's very latest CPU core—for the price of a new Call of Shooty game.
Also, ours overclocks like a Swiss watchmaker on meth."
Here are some more Processor articles from around the web:
- Intel Pentium G3258 Dual Core Processor Gaming Performance @ Legit Reviews
- Intel Pentium G3258 Processor Review @ Legit Reviews
- Intel Core i7 4790K @ eTeknix
- Devil's Canyon Intel Core i7-4790K @ Legion Hardware
- Overclocking the Core i7-4790K @ The Tech Report
When Magma Freezes Over...
Intel confirms that it has approached AMD about access to the Mantle API. The discussion, despite being clearly labeled "an experiment" by an Intel spokesperson, was initiated by Intel -- not AMD. According to AMD's Gaming Scientist, Richard Huddy, via PCWorld, AMD's response was, "Give us a month or two" and "we'll go into the 1.0 phase sometime this year" -- a year which only has about five months left in it. When the API reaches 1.0, anyone who wants to participate (including hardware vendors) will be granted access.
AMD inside Intel Inside???
I do wonder why Intel would care, though. Intel has the fastest per-thread processors, and its GPUs are not known to be workhorses held back by API call bottlenecks, either. Of course, that is not to say that I cannot see any reason at all...
Subject: General Tech, Graphics Cards, Processors | July 2, 2014 - 03:55 AM | Scott Michaud
Tagged: Intel, Xeon Phi, xeon, silvermont, 14nm
Anandtech has just published a large editorial detailing Intel's Knights Landing. Mostly, it is stuff that we already knew from previous announcements and leaks, such as one by VR-Zone from last November (which we reported on). Officially, few details were given back then, except that it would be available as either a PCIe-based add-in board or as a socketed, bootable, x86-compatible processor based on the Silvermont architecture. Its many cores, threads, and 512-bit registers are each pretty weak compared to Haswell, for instance, but combine to about 3 TFLOPS of double precision performance.
Not enough graphs. Could use another 256...
The best way to imagine it is running a PC with a modern, Silvermont-based Atom processor -- only with up to 288 processors listed in your Task Manager (72 actual cores with quad HyperThreading).
The main limitation of GPUs (and similar coprocessors), however, is memory bandwidth. GDDR5 is often the main bottleneck of compute performance and just about the first thing to be optimized. To compensate, Intel is packaging up to 16GB of stacked DRAM on the chip package itself. This RAM is based on "Hybrid Memory Cube" (HMC), developed by Micron Technology and supported by the Hybrid Memory Cube Consortium (HMCC). While the actual memory used in Knights Landing is derived from HMC, it uses a proprietary interface customized for Knights Landing. Its bandwidth is rated at around 500GB/s. For comparison, the NVIDIA GeForce GTX Titan Black has 336.4GB/s of memory bandwidth.
Intel and Micron have worked together in the past. In 2006, the two companies formed "IM Flash" to produce the NAND flash for Intel and Crucial SSDs. Crucial is Micron's consumer-facing brand.
So the vision for Knights Landing seems to be the bridge between CPU-like architectures and GPU-like ones. For compute tasks, GPUs edge out CPUs by crunching through bundles of similar tasks at the same time, across many (hundreds of, thousands of) computing units. The difference with (at least socketed) Xeon Phi processors is that, unlike most GPUs, Intel does not rely upon APIs, such as OpenCL, and drivers to translate a handful of functions into bundles of GPU-specific machine language. Instead, especially if the Xeon Phi is your system's main processor, it will run standard, x86-based software. The software will just run slowly, unless it is capable of vectorizing itself and splitting across multiple threads. Obviously, OpenCL (and other APIs) would make this parallelization easy, by their host/kernel design, but it is apparently not required.
It is a cool way for Intel to arrive at the same goal, based on its background. Especially when you mix and match Xeons and Xeon Phis in the same computer, it is a push toward heterogeneous computing -- with a lot of specialized threads backing up a handful of strong ones. I just wonder if providing a more direct method of programming will really help developers finally adopt massively parallel coding practices.
I mean, without even considering GPU compute, how efficient is most software at splitting into even two threads? Four threads? Eight threads? Can this help drive heterogeneous development? Or will this product simply try to appeal to those who are already considering it?