Intel to Ship FPGA-Accelerated Xeons in Early 2016

Subject: Processors | November 20, 2015 - 06:21 PM |
Tagged: xeon, Intel, FPGA

UPDATE (Nov 26th, 3:30pm ET): A few readers have mentioned that FPGAs take much less than hours to reprogram. I even received an email last night that claims FPGAs can be reprogrammed in "well under a second." This differs from the sources I read when I was researching their OpenCL capabilities (for potential evolutions of projects) back in ~2013. That said, multiple sources, including one who claims to have personal experience with FPGAs, say that reprogramming does not take hours. Also, I've never used an FPGA myself -- again, I was just researching them to see where some GPU-based projects could go.

Designing integrated circuits, as I've said a few times, is basically a game. You have a blank canvas that you can etch complexity into. The amount of “complexity” depends on your fabrication process, how big your chip is, the intended power, and so forth. Performance depends on how you use the complexity to compute actual tasks. If you know something special about your workload, you can optimize your circuit to do more with less. CPUs are designed to do basically anything, while GPUs assume similar tasks can be run together. If you will only ever run a single program, you can even bake some or all of its source code into hardware called an “application-specific integrated circuit” (ASIC), which is often used for video decoding, rasterizing geometry, and so forth.


This is an old Atom back when Intel was partnered with Altera for custom chips.

FPGAs are circuits that can be baked into a specific application, but can also be reprogrammed later. Changing tasks requires a significant amount of time (sometimes hours) but it is easier than reconfiguring an ASIC, which involves removing it from your system, throwing it in the trash, and printing a new one. FPGAs are not quite as efficient as a dedicated ASIC, but it's about as close as you can get without translating the actual source code directly into a circuit.

Intel, after purchasing FPGA manufacturer Altera, will integrate their technology into Xeons in Q1 2016. This will be useful to offload specific tasks that dominate a server's total workload. According to PC World, they will be integrated as a two-chip package, where both the CPU and FPGA can access the same cache. I'm not sure what form of heterogeneous memory architecture Intel is using, but this would be a great example of a part that could benefit from in-place acceleration. You could imagine a simple function being baked into the FPGA to, I don't know, process large videos in very specific ways without expensive copies.

Again, this is not a consumer product, and may never be. Reprogramming an FPGA can take hours, and I can't think of too many situations where consumers will trade off hours of time to switch tasks with high performance. Then again, it just takes one person to think of a great application for it to take off.

Source: PCWorld

Intel Launches Knights Landing-based Xeon Phi AIBs

Subject: Processors | November 18, 2015 - 07:34 AM |
Tagged: Xeon Phi, knights landing, Intel

The add-in board version of the Xeon Phi has just launched, which Intel aims at supercomputing audiences. They also announced that this product will be available as a socketed processor that is embedded in, as PC World states, “a limited number of workstations” by the first half of next year. The interesting part about these processors is that they combine a GPU-like architecture with the x86 instruction set.

Image Credit: Intel (Developer Zone)

In the case of next year's socketed Knights Landing CPUs, you can even boot your OS with it (and no other processor installed). It will probably be a little like running a 72-core Atom-based netbook.

To make it a little more clear, Knights Landing is a 72-core, 512-bit processor. You might wonder how that can compete against a modern GPU, which has thousands of cores, but those are not really cores in the CPU sense. GPUs crunch massive amounts of calculations by essentially tying several cores together, and doing other tricks to minimize die area per effective instruction. NVIDIA ties 32 instructions together and pushes them down the silicon. As long as they don't diverge, you can get 32 independent computations for very little die area. AMD packs 64 together.

Knights Landing does the same. The 512-bit registers can hold 16 single-precision (32-bit) values and operate on them simultaneously.

16 times 72 is 1152. All of a sudden, we're in shader-count territory. This is one of the reasons why they can achieve such high performance with “only” 72 cores, compared to the “thousands” that are present on GPUs. They're actually on a similar scale, just counted differently.

Update: (November 18th @ 1:51 pm EST) I just realized that, while I kept saying "one of the reasons", I never elaborated on the other points. Knights Landing also has four threads per core. So that "72 core" is actually "288 thread", with 512-bit registers that can perform sixteen 32-bit SIMD instructions simultaneously. While hyperthreading is not known to be 100% efficient, you could consider Knights Landing to be a GPU with 4608 shader units. Again, it's not the best way to count it, but it could sort-of work.
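The counting above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the post; the function names are made up for the example, and the inputs are the figures quoted above (72 cores, 512-bit registers, 32-bit values, four threads per core).

```python
# Illustrative lane counting for the figures quoted in the post.
CORES = 72               # Knights Landing core count (from the post)
REGISTER_BITS = 512      # vector register width (from the post)
VALUE_BITS = 32          # width of a single-precision value
THREADS_PER_CORE = 4     # hardware threads per core (from the update)


def simd_lanes(register_bits: int, value_bits: int) -> int:
    """Number of values of the given width that fit in one vector register."""
    return register_bits // value_bits


def shader_equivalents(cores: int, lanes: int, threads_per_core: int = 1) -> int:
    """GPU-style 'shader count': one unit per SIMD lane, per thread, per core."""
    return cores * lanes * threads_per_core


lanes = simd_lanes(REGISTER_BITS, VALUE_BITS)
print(lanes)                                               # 16
print(shader_equivalents(CORES, lanes))                    # 1152
print(shader_equivalents(CORES, lanes, THREADS_PER_CORE))  # 4608
```

Counting per thread is the generous reading, since the four threads on a core share execution hardware. That is why the 4608 figure should be treated as an upper bound on how you might count, not as a performance claim.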

So in terms of raw performance, Knights Landing can crunch about 8 TeraFLOPs of single-precision performance or around 3 TeraFLOPs of double-precision, 64-bit performance. This is around 30% faster than the Titan X in single precision, and around twice the performance of Titan Black in double precision. NVIDIA basically removed the FP64 compute units from Maxwell / Titan X, so Knights Landing is about 16x faster, but that's not really a fair comparison. NVIDIA recommends Kepler for double-precision workloads.
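As a rough sanity check on those ratios, here is a small Python sketch. The Knights Landing numbers are the ones quoted above. The Titan X numbers (3072 CUDA cores, a 1.0 GHz base clock, and FP64 running at 1/32 the FP32 rate) are not from this post; they are assumptions based on commonly quoted specifications, and `peak_tflops` is a hypothetical helper using the usual "two FLOPs per fused multiply-add" convention.

```python
# Knights Landing figures are from the post; the Titan X figures are assumptions.
KNL_SP_TFLOPS = 8.0
KNL_DP_TFLOPS = 3.0

TITAN_X_CORES = 3072        # assumed CUDA core count
TITAN_X_CLOCK_GHZ = 1.0     # assumed base clock
TITAN_X_FP64_RATE = 1 / 32  # assumed FP64:FP32 throughput ratio


def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Peak TFLOPS = cores x clock (GHz) x FLOPs per core per cycle / 1000."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


titan_x_sp = peak_tflops(TITAN_X_CORES, TITAN_X_CLOCK_GHZ)
titan_x_dp = titan_x_sp * TITAN_X_FP64_RATE

print(f"Titan X SP: {titan_x_sp:.2f} TFLOPS, DP: {titan_x_dp:.3f} TFLOPS")
print(f"KNL vs Titan X, SP: {KNL_SP_TFLOPS / titan_x_sp:.2f}x")
print(f"KNL vs Titan X, DP: {KNL_DP_TFLOPS / titan_x_dp:.1f}x")
```

Under these assumptions the single-precision ratio comes out near 1.3x and the double-precision ratio near 16x, which lines up with the "30% faster" and "about 16x" figures. Boost clocks would shift the numbers somewhat.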

So interestingly, Knights Landing would be a top-tier graphics card (in terms of shading performance) if it were compatible with typical graphics APIs. Of course, it's not, and it will be priced way higher than, for instance, the AMD Radeon Fury X. Knights Landing isn't available on Intel ARK yet, but previous models are in the $2000 - $4000 range.

Source: PC World

New Intel NUC Models Listed with 6th-Gen Skylake Processors

Subject: Processors, Systems | November 17, 2015 - 11:21 AM |
Tagged: Skylake, NUC6i5SYK, NUC6i5SYH, NUC6i3SYK, NUC6i3SYH, nuc, mini-pc, Intel, i5-6260U, i3-6100U


(Image credit: PCMag)

NUC systems sporting the latest Intel 6th-gen Skylake processors are coming, with the NUC6i5SYH, NUC6i5SYK, NUC6i3SYH, and NUC6i3SYK listed with updated Core i5 and i3 CPUs. As this is a processor refresh, the appearance and product nomenclature remain unchanged (unfortunately).


The four new Skylake Intel NUC models listed on Intel's product page

Here's Intel's description of the Skylake Core i5-powered NUC6i5SYH:

"Intel NUC Kit NUC6i5SYH is equipped with Intel’s newest architecture, the 6th generation Intel Core i5-6260U processor. Intel Iris graphics 540 with 4K display capabilities provides brilliant resolution for gaming and home theaters. NUC5i5SYH has room for a 2.5” drive for additional storage and an M.2 SSD so you can transfer your data at lightning speed. Designed for Windows 10, NUC6i5SYH has the performance to stream media, manage spreadsheets, or create presentations."

The NUC6i5SYH and NUC6i5SYK feature the i5-6260U, a dual-core, Hyper-Threaded 15W part with a base speed of 1.9 GHz and up to 2.8 GHz Turbo. It has 4 MB cache and supports up to 32GB 2133 MHz DDR4. The processor also provides Intel Iris graphics 540 (Skylake GT3e), which offers 48 Execution Units and 64 MB of dedicated eDRAM. The lower-end NUC6i3SYH and NUC6i3SYK models offer the i3-6100U, which is also a dual-core, Hyper-Threaded part, but this 15W processor's speed is fixed at 2.3 GHz without Turbo Boost, and it offers the lesser Intel HD Graphics 520.

Availability and pricing are not yet known, but expect to see the new models for sale soon.

Source: Intel

Subject: Processors, Mobile
Manufacturer: Intel

Skylake Architecture Comes Through

When Intel finally revealed the details surrounding its latest Skylake architecture design back in August at IDF, we learned for the first time about a new technology called Intel Speed Shift. The feature moves some control of CPU clock speed and ramp-up away from the operating system and into hardware, giving more control to the processor itself and making it less dependent on Windows (and presumably, in the future, other operating systems). This allows the clock speed of a Skylake processor to get higher, faster, which improves user responsiveness.


It's pretty clear that Intel is targeting this feature addition for tablets and 2-in-1s where the finger/pen to screen interaction is highly reliant on immediate performance to enable improved user experiences. It has long been known that one of the biggest performance deltas between iOS from Apple and Android from Google centers on the ability for the machine to FEEL faster when doing direct interaction, regardless of how fast the background rendering of an application or web browser actually is. Intel has been on a quest to fix this problem for Android for some time, where it has the ability to influence software development, and now they are bringing that emphasis to Windows 10.

With the most recent Windows 10 update, to build v10586, Intel Speed Shift has finally been enabled for Skylake users. And since you cannot disable the feature once it's installed, this is the one and only time we'll be able to measure performance in our test systems. So let's see if Intel's claims of improved user experiences stand up to our scrutiny.

Continue reading our performance evaluation of Intel Speed Shift on the Skylake Architecture!!

Report: Intel Broadwell-E Flagship i7-6950X a 10 Core, 20 Thread CPU

Subject: Processors | November 13, 2015 - 06:40 PM |
Tagged: X99, processor, LGA2011-v3, Intel, i7-6950X, HEDT, Haswell-E, cpu, Broadwell-E

Intel's high-end desktop (HEDT) processor line will reportedly be moving from Haswell-E to Broadwell-E soon, and with the move Intel will offer their highest consumer core count to date, according to a post at XFastest which WCCFtech reported on yesterday.


Image credit: VR-Zone

While it had been thought that Broadwell-E would feature the same core counts as Haswell-E (as seen on the leaked slide above), according to the report the upcoming flagship Core i7-6950X will be a massive 10 core, 20 thread part built using Intel's 14 nm process. Broadwell-E is expected to provide an upgrade to those running on Intel's current enthusiast X99 platform before Skylake-E arrives with an all-new chipset.

WCCFtech offered this chart in their report, outlining the differences between the HEDT generations (and providing a glimpse of the future Skylake-E variant):


Intel HEDT generations compared (Credit: WCCFtech)

It isn't all that surprising that one of Intel's LGA2011-v3 processors would arrive on desktops with 10 cores, as these are closely related to the Xeon server processors, and Haswell-based Xeon CPUs are already available with up to 18 cores, though priced far beyond what even the extreme builder would probably find reasonable (not to mention being far less suited to a desktop build based on motherboard compatibility). The projected $999 price tag for the Extreme Edition part with 10 cores would mark not only the first time an Intel desktop processor reached the core-count milestone, but also the lowest price to attain one of the company's 10-core parts to date (Xeon or otherwise).

Running Intel HD 530 graphics under Linux

Subject: Processors | November 12, 2015 - 01:22 PM |
Tagged: linux, Skylake, Intel, i5-6600K, hd 530, Ubuntu 15.10

A great way to shave money off of a minimalist system is to skip buying a GPU and use the one present on modern processors, as well as to install Linux instead of buying a Windows license. The problem with doing so is that playing demanding games is going to be beyond your computer's ability, at least without turning off most of the features that make the game look good. This article from Phoronix will help you figure out what your machine is capable of. Their tests show that Windows 10 currently has a very large performance lead over Ubuntu on the same hardware, as the Windows OpenGL driver is superior to the open-source Linux driver. This may change sooner rather than later, but you should be aware that for now you will not get the most out of your Skylake's GPU on Linux.


"As it's been a while since my last Windows vs. Linux graphics comparison and haven't yet done such a comparison for Intel's latest-generation Skylake HD Graphics, the past few days I was running Windows 10 Pro x64 versus Ubuntu 15.10 graphics benchmarks with a Core i5 6600K sporting HD Graphics 530."

Source: Phoronix

Samsung Announces Exynos 8 Octa 8890 Application Processor

Subject: Processors, Mobile | November 12, 2015 - 09:30 AM |
Tagged: SoC, smartphone, Samsung Galaxy, Samsung, mobile, Exynos 8890, Exynos 8 Octa, Exynos 7420, Application Processor

Coming just a day after Qualcomm officially launched their Snapdragon 820 SoC, Samsung is today unveiling their latest flagship mobile part, the Exynos 8 Octa 8890.


The Exynos 8 Octa 8890 is built on Samsung’s 14 nm FinFET process like the previous Exynos 7 Octa 7420, and again is based on a big.LITTLE configuration, though the big processing cores are a custom design this time around. The Exynos 7420 consisted of four big ARM Cortex A57 cores and four small Cortex A53 cores, and while the small cores in the 8890 are again ARM Cortex A53, the big cores feature Samsung’s “first custom designed CPU based on 64-bit ARMv8 architecture”.

“With Samsung’s own SCI (Samsung Coherent Interconnect) technology, which provides cache-coherency between big and small cores, the Exynos 8 Octa fully utilizes benefits of big.LITTLE structure for efficient usage of the eight cores. Additionally, Exynos 8 Octa is built on highly praised 14nm FinFET process. These all efforts for Exynos 8 Octa provide 30% more superb performance and 10% more power efficiency.”


Another big advancement for the Exynos 8 Octa is the integrated modem, which provides Category 12/13 LTE with download speeds (with carrier aggregation) of up to 600 Mbps, and uploads up to 150 Mbps. This might sound familiar, as it mirrors the LTE Release 12 specs of the new modem in the Snapdragon 820.

Graphics processing is handled by the Mali-T880 GPU, moving up from the Mali-T760 found in the Exynos 7 Octa. The T880 is “the highest performance and the most energy-efficient mobile GPU in the Mali family”, with up to 1.8x the performance of the T760 while being 40% more energy-efficient.

Samsung will be taking this new SoC into mass production later this year, and the chip is expected to be featured in the company’s upcoming flagship Galaxy phone.

Full PR after the break.

Source: Samsung

GLOBALFOUNDRIES Achieves 14nm FinFET - Coming to New AMD Products

Subject: Processors | November 6, 2015 - 10:09 AM |
Tagged: tape out, processors, GLOBALFOUNDRIES, global foundries, APU, amd, 14 nm FinFET

GlobalFoundries has today officially announced their success with sample 14 nm FinFET production for upcoming AMD products.


(Image credit: KitGuru)

GlobalFoundries licensed 14 nm LPE and LPP technology from Samsung in 2014, and were producing wafers as early as April of this year. At the time a GF company spokesperson was quoted in this report at KitGuru, stating "the early version (14LPE) is qualified in our fab and our lead product is yielding in double digits. Since 2014, we have taped multiple products and testchips and are seeing rapid progress, in yield and maturity, for volume shipments in 2015." Now they have moved past LPE (Low Power Early) to LPP (Low Power Plus), with new products based on the technology slated for 2016:

"AMD has taped out multiple products using GLOBALFOUNDRIES’ 14nm Low Power Plus (14LPP) process technology and is currently conducting validation work on 14LPP production samples.  Today’s announcement represents another significant milestone towards reaching full production readiness of GLOBALFOUNDRIES’ 14LPP process technology, which will reach high-volume production in 2016."

GlobalFoundries was originally the manufacturing arm of AMD, and has continued to produce the company's processors since the spin-off in 2009. AMD's current desktop FX-8350 CPU was manufactured on 32 nm SOI, and more recently APUs such as the A10-7850K have been produced at 28 nm - both at GlobalFoundries. Intel's latest offerings such as the flagship 6700K desktop CPU are produced with Intel's 14nm process, and the success of the 14LPP production at GlobalFoundries has the potential to bring AMD's new processors closer to parity with Intel (at least from a lithography standpoint).

Full PR after the break.

Report: Unreleased AMD Bristol Ridge SoC Listed Online

Subject: Processors | November 5, 2015 - 09:30 PM |
Tagged: SoC, report, processor, mobile apu, leak, FX-9830PP, cpu, Bristol Ridge, APU, amd

A new report points to an entry from the USB Implementers Forum, which shows an unreleased AMD Bristol Ridge SoC.


(AMD via

Bristol Ridge itself is not news, as the report at Computer Base observes (translation):

"A leaked roadmap had previously noted that Bristol Ridge is in the coming year soldered on motherboards for notebooks and desktop computers in special BGA package FP4."


( via Computer Base)

But there is something different about this chip, as the report points out: the model name FX-9830P pictured in the screen grab is consistent with the naming scheme for notebook parts, with the highest current model being the FX-8800P (Carrizo), a 35W 4-thread Excavator part with 512 stream processors from the R7 GPU core.


(BenchLife via Computer Base)

No details are available other than information from a leaked roadmap (above), which points to Bristol Ridge as an FP4 BGA part for mobile, with a desktop variant for socket FM3 that would replace Kaveri/Godavari (and possibly still an Excavator part). New cores are coming in 2016, and we'll have to wait and see for additional details (or until more information inevitably leaks out).

Update, 11/06/15: WCCFtech expounds on the leak:

“Bristol Ridge isn’t just limited to mobility platforms but will also be featured on AM4 desktop platform as Bristol Ridge will be the APU generation available on desktops in 2016 while Zen would be integrated on the performance focused FX processors.”

WCCFtech’s report also included a link to this SiSoftware database entry for an engineering sample of a dual-core Stoney Ridge processor, a low-power mobile part with a 2.7 GHz clock speed. Stoney Ridge will reportedly succeed Carrizo-L for low-power platforms.

The report also provided this chart to reference the new products:



Report: Intel Xeon D SoC to Reach 16 Cores

Subject: Processors | October 23, 2015 - 02:21 PM |
Tagged: Xeon D, SoC, rumor, report, processor, Pentium D, Intel, cpu

Intel's Xeon D SoC lineup will soon expand to include 12-core and 16-core options, after the platform launched earlier this year with the option of 4 or 8 cores for the 14 nm chips.


The report yesterday from CPU World offers new details on the refreshed lineup which includes both Xeon D and Pentium D SoCs:

"According to our sources, Intel have made some changes to the lineup, which is now comprised of 13 Xeon D and Pentium D SKUs. Even more interesting is that Intel managed to double the maximum number of cores, and consequentially combined cache size, of Xeon D design, and the nearing Xeon D launch may include a few 12-core and 16-core models with 18 MB and 24 MB cache."

The move is not unexpected as Intel initially hinted at an expanded offering by the end of the year (emphasis added):

"...the Intel Xeon processor D-1500 product family is the first offering of a line of processors that will address a broad range of low-power, high-density infrastructure needs. Currently available with 4 or 8 cores and 128 GB of addressable memory..."


Current Xeon D Processors

The new flagship Xeon D model will be the D-1577, a 16-core processor with between 18 and 24 MB of L3 cache (exact specifications are not yet known). These SoCs feature an integrated platform controller hub (PCH), I/O, and dual 10 Gigabit Ethernet, and the initial offerings had up to a 45W TDP. It would seem likely that a model with double the core count would either necessitate a higher TDP or simply target a lower clock speed. We should know more before too long.
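The cache figures in the CPU World quote suggest a fixed amount of L3 per core. A minimal Python sketch of that arithmetic follows. It assumes the quoted pairs (12 cores with 18 MB, 16 cores with 24 MB) describe whole-chip L3 totals; `cache_per_core_mb` is a hypothetical helper written for this example.

```python
# (cores, L3 cache in MB) pairs as quoted from the CPU World report.
REPORTED_MODELS = [(12, 18), (16, 24)]


def cache_per_core_mb(cores: int, cache_mb: float) -> float:
    """L3 cache per core, assuming the total is split evenly across cores."""
    return cache_mb / cores


for cores, cache_mb in REPORTED_MODELS:
    per_core = cache_per_core_mb(cores, cache_mb)
    print(f"{cores} cores, {cache_mb} MB L3 -> {per_core} MB per core")

# If the per-core amount holds for the current 8-core maximum, the implied total is:
implied_8_core_mb = 8 * cache_per_core_mb(16, 24)
print(f"Implied 8-core total: {implied_8_core_mb} MB")
```

Both reported configurations work out to 1.5 MB per core, which fits the report's claim that doubling the maximum core count also doubles the combined cache. On that reading, a 16-core flagship would land at 24 MB rather than 18 MB, though the exact specifications are still unconfirmed.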

For further information on Xeon D, please check out our previous coverage:

Source: CPU-World