Subject: Graphics Cards, Processors | December 8, 2015 - 08:07 AM | Scott Michaud
Tagged: hsa, GCC, amd
Phoronix, the Linux-focused hardware website, highlighted patches for the GNU Compiler Collection (GCC) that implement HSA. This will allow newer APUs, such as AMD's Carrizo, to accelerate chunks of code (mostly loops) that have been tagged with a compiler directive as worth running on the GPU. While I have done some GPGPU development, the low-level specifics of HSA aren't something I have much experience with.
The patches have been managed by Martin Jambor of SUSE Labs. You can see a slideshow presentation of their work on the GNU website. Even though features froze about a month ago, they are apparently hoping that this will make it into the official GCC 6 release. If so, many developers around the world will be able to target HSA-compatible hardware in the first half of 2016. Technically, anyone can do so regardless, but they would need to specifically use the unofficial branch on the GCC Subversion repository. This probably means compiling it themselves, and it might even be behind on a few features in other branches that were accepted into GCC 6.
Subject: Processors | December 4, 2015 - 11:35 PM | Sebastian Peak
Tagged: Skylake, Intel, heatsink, damage, cpu cooler, Core i7 6700K, Core i5 6600K, bend, 6th generation, 3rd party
Some Intel 6th-gen "Skylake" processors have been damaged by the heatsink mounts of 3rd-party CPU coolers according to a report that began with pcgameshardware.de and has since made its rounds throughout PC hardware media (including the sourced Ars Technica article).
The highly-referenced pcgameshardware.de image of a bent Skylake CPU
The problem is easy enough to explain, as Skylake has a notably thinner construction compared to earlier generations of Intel CPUs, and if enough pressure is exerted against these new processors the green substrate can bend, causing damage not only to the CPU but the pins in the LGA 1151 socket as well.
The only way to prevent the possibility of a bend is to avoid overtightening the heatsink, but considering most compatible coolers on the market were designed for Haswell and earlier generations of Intel CPU, this leaves users to guess what pressure might be adequate without potentially bending the CPU.
Intel has commented on the issue:
"The design specifications and guidelines for the 6th Gen Intel Core processor using the LGA 1151 socket are unchanged from previous generations and are available for partners and 3rd party manufacturers. Intel can’t comment on 3rd party designs or their adherence to the recommended design specifications. For questions about a specific cooling product we must defer to the manufacturer."
It's worth noting that while Intel states that their "guidelines for the 6th Gen Intel Core processor using the LGA 1151 socket are unchanged from previous generations", it is specifically a change in substrate thickness that has caused the concerns. The problem is not limited to any specific brands, but certainly will be more of an issue for heatsink mounts that can exert a tremendous amount of pressure.
An LGA socket damaged from a bent Skylake CPU (credit: pcgameshardware)
From the Ars report:
"Noctua, EK Water Blocks, Scythe, Arctic, Thermaltake, and Thermalright, commenting to Games Hardware about the issue, suggested that damage from overly high mounting pressure is most likely to occur during shipping or relocation of a system. Some are recommending that the CPU cooler be removed altogether before a system is shipped."
Scythe has been the first vendor to offer a solution to the issue, releasing this statement on their support website:
"Japanese cooling expert Scythe announces a change of the mounting system for Skylake / Socket 1151 on several coolers of its portfolio. All coolers are compatible with Skylake sockets in general, but bear the possibility of damage to CPU and motherboard in some cases where the PC is exposed to strong shocks (e.g. during shipping or relocation). This problem particularly involves only coolers which will mounted with the H.P.M.S. mounting system. To prevent this, the mounting pressure has been reduced by an adjustment of the screw set. Of course, Scythe is going to ship a the new set of screws to every customer completely free of charge! To apply for the free screw set, please send your request via e-mail to email@example.com or use the contact form on our website."
The thickness of Skylake (left) compared to Haswell (right) (credit: pcgameshardware)
As the owner of an Intel Skylake i5-6600K, which I have been testing with an assortment of CPU coolers for upcoming reviews, I can report that my processor appears to be free of any obvious damage. I am particularly careful about pressure when attaching a heatsink, but there have been a couple of mounts (including the above-mentioned Scythe HPMS mounting system) that could easily have been tightened far beyond what was needed for a proper connection.
We will continue to monitor this situation and update as more vendors offer their response to the issue.
Subject: Processors, Mobile | December 1, 2015 - 07:30 AM | Scott Michaud
Tagged: TSMC, SoC, LG, Intel, arm
So this story came out of nowhere. Whether the rumors are true or false, I am stuck on how everyone seems to be talking about it with a casual deadpan. I spent a couple hours Googling whether I missed some big announcement that made Intel potentially fabricating ARM chips a mundane non-story. Pretty much all that I found was Intel allowing Altera to make FPGAs with embedded ARM processors in a supporting role, which is old news.
Image Credit: Internet Memes...
The rumor is that Intel and TSMC were both vying to produce LG's Nuclon 2 SoC. This part is said to house two quad-core ARM modules in a typical big.LITTLE formation. Samples were allegedly produced, with Intel's part (2.4 GHz) being able to clock around 300 MHz faster than TSMC's offering (2.1 GHz). Clock rate is highly dependent upon the “silicon lottery,” so this is an area that production maturity can help with. Intel's sample would also be manufactured at 14nm (versus 16nm from TSMC, although these numbers mean less than they used to). LG was also, again allegedly, interested in Intel's LTE modem. According to the rumors, LG went with TSMC because they felt Intel couldn't keep up with demand.
Now that the rumor has been reported... let's step back a bit.
I talked with Josh a couple of days ago about this post. He's quite skeptical (as I am) about the whole situation. First and foremost, it takes quite a bit of effort to port a design to a different manufacturing process. LG could do it, but it is questionable, especially for what would be only their second chip ever. Moreover, I still believe that Intel doesn't want to manufacture chips that directly compete with its own. x86 in phones is still not a viable business, but Intel hasn't given up, and you would think giving up is a prerequisite for fabricating a competitor's ARM parts.
So this whole thing doesn't seem right.
Subject: Processors | November 20, 2015 - 06:21 PM | Scott Michaud
Tagged: xeon, Intel, FPGA
UPDATE (Nov 26th, 3:30pm ET): A few readers have mentioned that FPGAs take much less than hours to reprogram. I even received an email last night that claims FPGAs can be reprogrammed in "well under a second." This differs from the sources I read when I was looking into their OpenCL capabilities (for potential evolutions of projects) back in ~2013. That said, multiple sources, including one who claims to have personal experience with FPGAs, say that reprogramming does not take hours. Also, I've never used an FPGA myself -- again, I was just researching them to see where some GPU-based projects could go.
Designing integrated circuits, as I've said a few times, is basically a game. You have a blank canvas that you can etch complexity into. The amount of “complexity” depends on your fabrication process, how big your chip is, the intended power, and so forth. Performance depends on how you use the complexity to compute actual tasks. If you know something special about your workload, you can optimize your circuit to do more with less. CPUs are designed to do basically anything, while GPUs assume similar tasks can be run together. If you will only ever run a single program, you can even bake some or all of its source code into hardware called an “application-specific integrated circuit” (ASIC), which is often used for video decoding, rasterizing geometry, and so forth.
This is an old Atom back when Intel was partnered with Altera for custom chips.
FPGAs are circuits that can be baked into a specific application, but can also be reprogrammed later. Changing tasks requires a significant amount of time (sometimes hours) but it is easier than reconfiguring an ASIC, which involves removing it from your system, throwing it in the trash, and printing a new one. FPGAs are not quite as efficient as a dedicated ASIC, but it's about as close as you can get without translating the actual source code directly into a circuit.
Intel, after purchasing FPGA manufacturer Altera, will integrate their technology into Xeons in Q1 2016. This will be useful to offload specific tasks that dominate a server's total workload. According to PC World, they will be integrated as a two-chip package, where both the CPU and FPGA can access the same cache. I'm not sure what form of heterogeneous memory architecture Intel is using, but this would be a great example of a part that could benefit from in-place acceleration. You could imagine a simple function being baked into the FPGA to, I don't know, process large videos in very specific ways without expensive copies.
Again, this is not a consumer product, and may never be. Reprogramming an FPGA can take hours, and I can't think of too many situations where consumers will trade off hours of time to switch tasks with high performance. Then again, it just takes one person to think of a great application for it to take off.
Subject: Processors | November 18, 2015 - 07:34 AM | Scott Michaud
Tagged: Xeon Phi, knights landing, Intel
The add-in board version of the Xeon Phi has just launched, which Intel aims at supercomputing audiences. They also announced that this product will be available as a socketed processor that is embedded in, as PC World states, “a limited number of workstations” by the first half of next year. The interesting part about these processors is that they combine a GPU-like architecture with the x86 instruction set.
Image Credit: Intel (Developer Zone)
In the case of next year's socketed Knights Landing CPUs, you can even boot your OS with it (and no other processor installed). It will probably be a little like running a 72-core Atom-based netbook.
To make it a little more clear, Knights Landing is a 72-core, 512-bit processor. You might wonder how that can compete against a modern GPU, which has thousands of cores, but those are not really cores in the CPU sense. GPUs crunch massive amounts of calculations by essentially tying several cores together, and doing other tricks to minimize die area per effective instruction. NVIDIA ties 32 instructions together and pushes them down the silicon. As long as they don't diverge, you can get 32 independent computations for very little die area. AMD packs 64 together.
Knights Landing does the same. The 512-bit registers can hold 16 single-precision (32-bit) values and operate on them simultaneously.
16 times 72 is 1152. All of a sudden, we're in shader-count territory. This is one of the reasons why they can achieve such high performance with “only” 72 cores, compared to the “thousands” that are present on GPUs. They're actually on a similar scale, just counted differently.
Update: (November 18th @ 1:51 pm EST) I just realized that, while I kept saying "one of the reasons", I never elaborated on the other points. Knights Landing also has four threads per core. So that "72 core" is actually "288 thread", with 512-bit registers that can perform sixteen 32-bit SIMD instructions simultaneously. While hyperthreading is not known to be 100% efficient, you could consider Knights Landing to be a GPU with 4608 shader units. Again, it's not the best way to count it, but it could sort-of work.
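The counting in this post can be sketched in a few lines of Python. The inputs (72 cores, 512-bit registers, 32-bit values, four threads per core) come straight from the text above; the function names are made up for illustration, and this is only a way of counting lanes, not a performance model.

```python
def simd_lanes(register_bits: int, value_bits: int) -> int:
    """How many values of a given width fit in one vector register."""
    return register_bits // value_bits


def shader_equivalents(cores: int, lanes: int, threads_per_core: int = 1) -> int:
    """Rough 'shader count' analogue: cores x threads x SIMD lanes."""
    return cores * threads_per_core * lanes


lanes = simd_lanes(512, 32)                        # single-precision lanes per register
by_core = shader_equivalents(72, lanes)            # counting one thread per core
by_thread = shader_equivalents(72, lanes, 4)       # counting all four threads per core

print(lanes, by_core, by_thread)  # prints: 16 1152 4608
```

Counting by thread assumes every hardware thread keeps its vector unit busy all the time, which, as noted, is the generous way to look at it.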
So in terms of raw performance, Knights Landing can deliver about 8 TeraFLOPs of single-precision performance or around 3 TeraFLOPs of double-precision (64-bit) performance. This is around 30% faster than the Titan X in single precision, and around twice the performance of Titan Black in double precision. NVIDIA basically removed the FP64 compute units from Maxwell / Titan X, so Knights Landing is about 16x faster, but that's not really a fair comparison. NVIDIA recommends Kepler for double-precision workloads.
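To see what those ratios imply, you can back out the numbers for the NVIDIA cards. This is a rough sketch that uses only the figures quoted in this post (8 and 3 TeraFLOPs, "30% faster", "twice", "16x"); the results are implied estimates, not official specifications, and the helper name is hypothetical.

```python
def implied_baseline(knl_tflops: float, speedup: float) -> float:
    """TFLOPs the other card would have if Knights Landing is `speedup` times as fast."""
    return knl_tflops / speedup


titan_x_sp = implied_baseline(8.0, 1.3)      # "around 30% faster" in single precision
titan_black_dp = implied_baseline(3.0, 2.0)  # "around twice" in double precision
titan_x_dp = implied_baseline(3.0, 16.0)     # "about 16x faster" in double precision

# Roughly 6.15, 1.50 and 0.19 TFLOPs respectively
print(f"{titan_x_sp:.2f} {titan_black_dp:.2f} {titan_x_dp:.2f}")
```

Since the inputs are all "about" and "around", treat the outputs as ballpark figures only.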
So interestingly, Knights Landing would be a top-tier graphics card (in terms of shading performance) if it were compatible with typical graphics APIs. Of course, it's not, and it will be priced way higher than, for instance, the AMD Radeon Fury X. Knights Landing isn't available on Intel ARK yet, but previous models are in the $2000 - $4000 range.
Subject: Processors, Systems | November 17, 2015 - 11:21 AM | Sebastian Peak
Tagged: Skylake, NUC6i5SYK, NUC6i5SYH, NUC6i3SYK, NUC6i3SYH, nuc, mini-pc, Intel, i5-6260U, i3-6100U
(Image credit: PCMag)
NUC systems sporting the latest Intel 6th-gen Skylake processors are coming, with the NUC6i5SYH, NUC6i5SYK, NUC6i3SYH, NUC6i3SYK listed with updated Core i5 and i3 CPUs. As this is a processor refresh the appearance and product nomenclature remain unchanged (unfortunately).
The four new Skylake Intel NUC models listed on Intel's product page
Here's Intel's description of the Skylake Core i5-powered NUC6i5SYH:
"Intel NUC Kit NUC6i5SYH is equipped with Intel’s newest architecture, the 6th generation Intel Core i5-6260U processor. Intel Iris graphics 540 with 4K display capabilities provides brilliant resolution for gaming and home theaters. NUC5i5SYH has room for a 2.5” drive for additional storage and an M.2 SSD so you can transfer your data at lightning speed. Designed for Windows 10, NUC6i5SYH has the performance to stream media, manage spreadsheets, or create presentations."
The NUC6i5SYH and NUC6i5SYK feature the i5-6260U, a dual-core, Hyper-Threaded 15W part with a base speed of 1.9 GHz and up to 2.8 GHz Turbo. It has 4 MB cache and supports up to 32GB 2133 MHz DDR4. The processor also provides Intel Iris graphics 540 (Skylake GT3e), which offers 48 Execution Units and 64 MB of dedicated eDRAM. The lower-end NUC6i3SYH and NUC6i3SYK models offer the i3-6100U, which is also a dual-core, Hyper-Threaded part, but this 15W processor's speed is fixed at 2.3 GHz without Turbo Boost, and it offers the lesser Intel HD Graphics 520.
Availability and pricing are not yet known, but expect to see the new models for sale soon.
Skylake Architecture Comes Through
When Intel finally revealed the details surrounding its latest Skylake architecture design back in August at IDF, we learned for the first time about a new technology called Intel Speed Shift. The feature moves some of the control of CPU clock speed and ramp-up away from the operating system and into hardware, giving more control to the processor itself and making it less dependent on Windows (and presumably, in the future, other operating systems). This allows the clock speed of a Skylake processor to get higher, faster, allowing for better user responsiveness.
It's pretty clear that Intel is targeting this feature addition for tablets and 2-in-1s where the finger/pen to screen interaction is highly reliant on immediate performance to enable improved user experiences. It has long been known that one of the biggest performance deltas between iOS from Apple and Android from Google centers on the ability for the machine to FEEL faster when doing direct interaction, regardless of how fast the background rendering of an application or web browser actually is. Intel has been on a quest to fix this problem for Android for some time, where it has the ability to influence software development, and now they are bringing that emphasis to Windows 10.
With the most recent Windows 10 update, to build v10586, Intel Speed Shift has finally been enabled for Skylake users. And since you cannot disable the feature once it's installed, this is the one and only time we'll be able to measure performance in our test systems. So let's see if Intel's claims of improved user experiences stand up to our scrutiny.
Subject: Processors | November 13, 2015 - 06:40 PM | Sebastian Peak
Tagged: X99, processor, LGA2011-v3, Intel, i7-6950X, HEDT, Haswell-E, cpu, Broadwell-E
Intel's high-end desktop (HEDT) processor line will reportedly be moving from Haswell-E to Broadwell-E soon, and with the move Intel will offer their highest consumer core count to date, according to a post at XFastest which WCCFtech reported on yesterday.
Image credit: VR-Zone
While it had been thought that Broadwell-E would feature the same core counts as Haswell-E (as seen on the leaked slide above), according to the report the upcoming flagship Core i7-6950X will be a massive 10 core, 20 thread part built using Intel's 14 nm process. Broadwell-E is expected to provide an upgrade to those running on Intel's current enthusiast X99 platform before Skylake-E arrives with an all-new chipset.
WCCFtech offered this chart in their report, outlining the differences between the HEDT generations (and providing a glimpse of the future Skylake-E variant):
Intel HEDT generations compared (Credit: WCCFtech)
It isn't all that surprising that one of Intel's LGA2011-v3 processors would arrive on desktops with 10 cores as these are closely related to the Xeon server processors, and Haswell based Xeon CPUs are already available with up to 18 cores, though priced far beyond what even the extreme builder would probably find reasonable (not to mention being far less suited to a desktop build based on motherboard compatibility). The projected $999 price tag for the Extreme Edition part with 10 cores would mark not only the first time an Intel desktop processor reached the core-count milestone, but it would also mark the lowest price to attain one of the company's 10-core parts to date (Xeon or otherwise).
Subject: Processors | November 12, 2015 - 01:22 PM | Jeremy Hellstrom
Tagged: linux, Skylake, Intel, i5-6600K, hd 530, Ubuntu 15.10
A great way to shave money off of a minimalist system is to skip buying a GPU and use the one present on modern processors, as well as installing Linux instead of buying a Windows license. The problem with doing so is that playing demanding games is going to be beyond your computer's ability, at least without turning off most of the features that make the game look good. This article from Phoronix will help you figure out what your machine would be capable of. Their tests show that Windows 10 currently has a very large performance lead compared to the same hardware running Ubuntu, as the Windows OpenGL driver is superior to the open-source Linux driver. This may change sooner rather than later, but you should be aware that for now you will not get the most out of your Skylake's GPU on Linux.
"As it's been a while since my last Windows vs. Linux graphics comparison and haven't yet done such a comparison for Intel's latest-generation Skylake HD Graphics, the past few days I was running Windows 10 Pro x64 versus Ubuntu 15.10 graphics benchmarks with a Core i5 6600K sporting HD Graphics 530."
Here are some more Processor articles from around the web:
- Intel Core i5 6500: A Great Skylake CPU For $200, Works Well On Linux @ Phoronix
- CPU Battle - Old and High-End vs. New and Entry-Level @ Hardware Secrets
- Which is the faster CPU: old but high-end or entry-level and new? - Part 2 @ Hardware Secrets
- AMD FX 8320E CPU Review @ Neoseeker
Subject: Processors, Mobile | November 12, 2015 - 09:30 AM | Sebastian Peak
Tagged: SoC, smartphone, Samsung Galaxy, Samsung, mobile, Exynos 8890, Exynos 8 Octa, Exynos 7420, Application Processor
Coming just a day after Qualcomm officially launched their Snapdragon 820 SoC, Samsung is today unveiling their latest flagship mobile part, the Exynos 8 Octa 8890.
The Exynos 8 Octa 8890 is built on Samsung’s 14 nm FinFET process like the previous Exynos 7 Octa 7420, and again is based on a big.LITTLE configuration, though the big processing cores are a custom design this time around. The Exynos 7420 was comprised of four ARM Cortex A57 cores and four small Cortex A53 cores, and while the small cores in the 8890 are again ARM Cortex A53, the big cores feature Samsung’s “first custom designed CPU based on 64-bit ARMv8 architecture”.
“With Samsung’s own SCI (Samsung Coherent Interconnect) technology, which provides cache-coherency between big and small cores, the Exynos 8 Octa fully utilizes benefits of big.LITTLE structure for efficient usage of the eight cores. Additionally, Exynos 8 Octa is built on highly praised 14nm FinFET process. These all efforts for Exynos 8 Octa provide 30% more superb performance and 10% more power efficiency.”
Another big advancement for the Exynos 8 Octa is the integrated modem, which provides Category 12/13 LTE with download speeds (with carrier aggregation) of up to 600 Mbps, and uploads up to 150 Mbps. This might sound familiar, as it mirrors the LTE Release 12 specs of the new modem in the Snapdragon 820.
Graphics processing is handled by the Mali-T880 GPU, moving up from the Mali-T760 found in the Exynos 7 Octa. The T880 is “the highest performance and the most energy-efficient mobile GPU in the Mali family”, with up to 1.8x the performance of the T760 while being 40% more energy-efficient.
Samsung will be taking this new SoC into mass production later this year, and the chip is expected to be featured in the company’s upcoming flagship Galaxy phone.