Haswell-E shows its stuff

Subject: Processors | August 29, 2014 - 02:08 PM |
Tagged: Intel, Haswell-E, haswell, evga, ddr4, corsair, core i7, asus, 5960X

The Tech Report took the new i7-5960X, an Asus X99 Deluxe, 16 GB of Corsair Vengeance LPX DDR4, a Kingston HyperX SH103S3 240GB SSD, and an XFX Radeon HD 7950 DD and set them loose on the test bench.  The results were impressive, to say the least, especially once they moved on from games to productivity software, where the Haswell architecture really shines.  When they attempted to overclock the CPU they hit a hard limit feeding the processor 1.3V and running at 4.4GHz; any faster would cause some applications to BSoD.  On the other hand, that overclock applied to all eight cores, and the difference in performance was striking.

Also make sure to read Ryan's review to get even more information on this long-awaited chip.

ports-socket.jpg

"Haswell-E has arrived. With eight cores, 20MB of cache, and quad channels of DDR4 memory, it looks to be the fastest desktop CPU in history--and not by a little bit. We've tested the heck out of it and have a huge suite of comparisons going to back to the Pentium III 800. Just, you know, for context."

Subject: Processors
Manufacturer: Intel

Revamped Enthusiast Platform

Join us at 12:30pm PT / 3:30pm ET as Intel's Matt Dunford stops by for a live stream event to discuss the release of Haswell-E and the X99 platform!! Find us at http://www.pcper.com/live!!

Sometimes writing these reviews can be pretty anti-climactic. With all of the official and leaked information released about Haswell-E over the last six to nine months, there isn't much more to divulge that can truly be called revolutionary. Yes, we are looking at the new king of the enthusiast market with an 8-core processor that not only brings a 33% increase in core count over the previous-generation Ivy Bridge-E and Sandy Bridge-E platforms, but also adopts the DDR4 memory specification, which allows for high-density, high-speed memory subsystems.

And along with the new processor on a modified socket (though still LGA2011) comes a new chipset with some interesting new features. If you were left wanting for USB 3.0 or Thunderbolt on X79, then you are going to love what you see with X99. Did you think you needed some more SATA ports to really liven up your pool of hard drives? Retail boards are going to have you covered.

Again, just like last time, you will find a set of three processors coming into the market at the same time. These offerings range from the $999 price point down to the much more reasonable cost of $389. But this time there are more interesting decisions to be made based on specification differences within the family. Do the changes that Intel made in the sub-$1000 SKUs make them a better or worse buy for users looking to finally upgrade?

Haswell-E: A New Enthusiast Lineup from Intel

Today's launch of the Intel Core i7-5960X processor continues the company's path of enthusiast-branded parts built from a subset of the workstation and server market. It is no secret that some Xeon-branded processors will work in X79 motherboards, and the same is true of the upcoming Haswell-EP series with the X99 platform launching today. As an enthusiast though, I think we can agree that it doesn't really matter how a processor like this comes about, as long as it continues to occur well into the future.

hswex99-5.jpg

The Core i7-5960X processor is an 8-core, 16-thread design built on essentially the same architecture we saw in the mainstream Haswell parts released in June of 2013. There are some important differences of course, including the lack of integrated graphics and the move from DDR3 to DDR4 for system memory. The underlying microarchitecture remains unchanged, though. Code-named Haswell-E, the Core i7-5960X continues Intel's trend of releasing enthusiast/workstation-grade platforms that are based on an existing mainstream architecture.

Continue reading our review of the new Intel Core i7-5960X Haswell-E processor!!

Haswell-E has sprung a leak

Subject: Processors | August 26, 2014 - 01:32 PM |
Tagged: rumour, leak, Intel, Haswell-E, 5960X, 5930K, 5820K

Take it with a grain of salt, as always with leaks of this kind, but you will be interested to know that videocardz.com has what might be some inside information on Haswell-E pricing and model numbers.

Intel-HaswellE-E-VideoCardz_Com-Press-Deck-4-850x478.png

Intel i7 / X99 Haswell-E pricing:

  • Intel Core i7 5960X 8C/16HT – 40-lane PCI-Express support (x16 + x16 + x8) — $999
  • Intel Core i7 5930K 6C/12HT – 40-lane PCI-Express support (x16 + x16 + x8) — $583
  • Intel Core i7 5820K 6C/12HT – 28-lane PCI-Express support (x16 + x8 + x4) — $389

As you can see, there is a big jump in price between the affordable i7-5820K and the more expensive 5930K.  For those who know they will stick with a single GPU, or two low to mid-range GPUs, the 5820K should be enough; but if you have any thoughts of upgrading or adding a number of PCIe SSDs, then you might want to seriously consider saving up for the 5930K.  Current-generation GPUs and SSDs do not fully utilize PCIe 3.0 x16, but that is not likely to remain true for long, so if you want your system to have some longevity this is certainly something to think long and hard about.  Core counts are up while frequencies are down: the 8-core 5960X has a base clock of 3GHz, a full gigahertz slower than the 4790K, though you can expect the monstrous 20MB cache and quad-channel DDR4-2133 to mitigate that somewhat.  Also take note of the TDP; 140W is no laughing matter and will require some serious cooling.

Follow the link for a long deck of slides that reveal even more!

Intel-HaswellE-E-VideoCardz_Com-Press-Deck-5-850x478.png

Manufacturer: PC Perspective

Introduction

02-cpu-in-vise-block-positioning-profile.jpg

Since the introduction of the Haswell line of CPUs, the Internet has been aflame with reports of how hot the CPUs run. Speculation ran rampant on the cause, with theories abounding about the smaller surface area and inferior thermal interface material (TIM) between the CPU die surface and the underside of the CPU heat spreader. It was later confirmed that Intel had changed the TIM between the CPU die surface and the heat spreader with Haswell, leading to the hotter than expected CPU temperatures. This increase in temperature led to inconsistent core-to-core temperatures as well as vastly inferior overclockability of the Haswell K-series chips compared to previous generations.

A few of the more adventurous enthusiasts took it upon themselves to address the heat concerns surrounding Haswell in an inventive way: by delidding the processor. The delidding procedure involves physically removing the heat spreader from the CPU, exposing the CPU die. Some individuals choose to clean the existing TIM from the core die and the underside of the heat spreader, apply a superior TIM such as a metal- or diamond-infused paste or even Coollaboratory's Liquid Ultra metal material, and fix the heat spreader back in place. Others choose a more radical solution, removing the heat spreader from the equation entirely for direct cooling of the naked CPU die. This type of cooling requires the use of a die support plate, such as the MSI Die Guard included with the MSI Z97 XPower motherboard.

Whichever route you choose, you must first remove the heat spreader from the CPU's PCB. The heat spreader itself is fixed in place with a black RTV-type material that ensures a secure, air-tight seal, protecting the fragile die from outside contaminants and influences. Removal can be done in multiple ways, with two of the most popular being the razor blade method and the vise method. With both methods, you are attempting to separate the CPU PCB from the heat spreader without damaging the CPU die or the components on the top or bottom sides of the CPU PCB.

Continue reading our editorial on delidding your Haswell CPU!!

Intel Haswell-E De-Lidded: Solder Is Its Thermal Interface

Subject: General Tech, Processors | August 24, 2014 - 03:33 AM |
Tagged: Intel, Haswell-E, Ivy Bridge-E, haswell, solder, thermal paste

Sorry for being about a month late to this news. Apparently, someone got their hands on an Intel Core i7-5960X and wanted to see its eight cores. Removing the lid, they found that it was soldered directly onto the die with an epoxy, rather than sitting on a layer of thermal paste. While Haswell-E will still need to contend with the limitations of 22nm, and how difficult it becomes to exceed various clockspeed ceilings, a better ability to dump heat is always welcome.

Intel-5960X-delidded.jpg

Image Credit: OCDrift

While Devil's Canyon (Core i7-4790K) used better thermal paste, the method used with Haswell-E should be even better. I should note that Ivy Bridge-E, released last year, also used a form of solder under its lid, and its overclocking results were still limited. This is not an easy path to ultimate gigahertz. Even so, it is nice that Intel, at least on their enthusiast line, is spending that little bit extra to not introduce artificial barriers.

Source: OCDrift

X99 Manuals Leak: Core i7-5820K Has Reduced PCIe Lanes?

Subject: General Tech, Processors | August 23, 2014 - 01:38 AM |
Tagged: X99, Intel, Haswell-E

Haswell-E, with its X99 chipset, is expected to launch soon. This will bring a new spread of processors and motherboards to the high-end, enthusiast market. These are the processors that fans of Intel should buy if they have money, want all the RAM, and have a bunch of PCIe expansion cards to install.

Intel-logo.png

The Intel enthusiast platform typically has 40 PCIe lanes, while the mainstream platform has 16. For Haswell-E, the Core i7-5820K will be the exception. According to Gigabyte's X99 manual, the four full-sized PCIe slots will have the following possible configurations:
 

                    First Slot   Second Slot   Third Slot   Fourth Slot
                    (PCIe 1)     (PCIe 4)      (PCIe 2)     (PCIe 3)
Core i7-5930K       16x          Unused        16x          8x
(and above)         8x           8x            16x          8x
Core i7-5820K       16x          Unused        8x           4x
                    8x           8x            8x           4x

If you count the PCIe x1 slots, the table would refer to the first, third, fifth, and seventh slots.

To me, this is not too bad. You are able to use three GPUs with eight-lane bandwidth and stick a four-lane PCIe SSD on the last slot. Considering that each lane is PCIe 3.0, it is similar to having three PCIe 2.0 x16 slots. While two-way and three-way SLI is supported on all CPUs, four-way SLI is only allowed with processors that provide forty lanes of PCIe 3.0.

Gigabyte also provides three PCIe 2.0 x1 slots, which are not handled by the CPU and do not count against its available lanes.
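To put rough numbers on that comparison, here is a quick back-of-the-envelope sketch; the per-lane figures are approximations that account for the different link encodings (8b/10b on PCIe 2.0, 128b/130b on PCIe 3.0), not measured values.

```cpp
#include <cstdio>

int main()
{
    // Approximate usable throughput per lane, after encoding overhead (assumed round figures).
    const double gen2PerLane = 0.5;    // GB/s per PCIe 2.0 lane (8b/10b)
    const double gen3PerLane = 0.985;  // GB/s per PCIe 3.0 lane (128b/130b)

    std::printf("PCIe 3.0 x8  ~ %.1f GB/s\n", 8 * gen3PerLane);   // ~7.9 GB/s per x8 GPU slot
    std::printf("PCIe 2.0 x16 ~ %.1f GB/s\n", 16 * gen2PerLane);  // ~8.0 GB/s, roughly the same
    return 0;
}
```

In other words, even the 5820K's 8-8-8-4 split leaves each GPU with roughly the bandwidth of a full PCIe 2.0 x16 slot, which is why three-way setups remain workable on the cheaper chip.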

Since I started to write up this news post, Gigabyte seems to have replaced their manual with a single, blank page. Thankfully, I had it cached long enough to finish my thoughts. Some sites claimed that the manual failed to mention the 8-8-8 configuration and suggested that three-GPU configurations were impossible. That is not true; the manual covers these situations, just not in the clearest of terms.

Haswell-E should launch soon, with most rumors pointing to the end of the month.

VIA's Rumored New "Isaiah II" Based x86 CPU Will Compete With Intel Bay Trail and AMD Kabini Chips

Subject: Processors | August 19, 2014 - 09:06 PM |
Tagged: VIA, isaiah II, centaur technologies, centaur

VIA subsidiary Centaur Technology is rumored to be launching a new x86 processor at the end of August based on the "Isaiah II" architecture. This upcoming chip is a 64-bit SoC aimed at the mobile and low power space. So far, the only known implementation is a quad core version clocked at up to 2.0 GHz with a 2MB L2 cache. Benchmarks of the quad core Isaiah II-based processor recently appeared online, and if the SiSoft Sandra results hold true, VIA has a very competitive chip on its hands that outperforms Intel's Bay Trail Z3770 and holds its own against AMD's Jaguar-based Athlon 5350.

Centaur Technology.jpg

The SiSoft Sandra results below show the alleged Isaiah II quad core handily outpacing Intel's Bay Trail SoC and trading wins with AMD's Athlon 5350. All three SoCs are quad core parts with integrated graphics solutions. The benchmarks were run on slightly different configurations, as the chips do not share a motherboard or chipset in common. In the case of the VIA chip, it was paired with a motherboard using the VIA VX11H chipset.

                               VIA Isaiah II Quad Core   AMD Athlon 5350   Intel Atom Z3770
CPU Arithmetic                 20.00 GOPS                22.66 GOPS        15.10 GOPS
CPU Multimedia                 50.20 Mpix/s              47.56 Mpix/s      25.90 Mpix/s
Multicore Efficiency           3.10 GB/s                 4.00 GB/s         1.70 GB/s
Cryptography (HS)              1.50 GB/s                 1.48 GB/s         0.40 GB/s
PM Efficiency (ALU)            2.90 GIPS                 2.88 GIPS         2.50 GIPS
Financial Analysis (DP FP64)   3.00 kOPT/S               3.64 kOPT/S       1.50 kOPT/S

For comparison, the Atom Z3770 is a quad core clocked at 1.46 GHz (2.39 GHz max turbo) with 2MB L2 cache and Intel HD Graphics clocked at up to 667 MHz, supporting up to 4GB of 1066 MHz memory. Bay Trail is manufactured on a 22nm process and has a 2W SDP (Scenario Design Power). Further, the AMD "Kabini" Athlon 5350 features four Jaguar CPU cores clocked at 2.05 GHz, a 128-core GCN GPU clocked at 600 MHz, 2MB L2 cache, and support for 1600 MHz memory. AMD's Kabini SoC is a 28nm chip with a 25W TDP (Thermal Design Power). VIA's new chip allegedly supports modern instruction sets, including AVX 2.0, putting it on par with the AMD and Intel options.

                 VIA Isaiah II Quad Core   AMD Athlon 5350            Intel Atom Z3770
CPU              4 Cores @ 2.00 GHz        4 Cores @ 2.05 GHz         4 Cores @ 1.46 GHz (up to 2.39 GHz turbo)
GPU              ?                         128 GCN Cores @ 600 MHz    HD Graphics @ (up to) 667 MHz
Memory Support   ?                         1600 MHz                   1066 MHz
L2 Cache         2 MB                      2 MB                       2 MB
TDP / SDP        ?                         25W                        2W
Process Node     ?                         28nm                       22nm
Price            ?                         $55                        $37

The SiSoft Sandra benchmarks spotted by TechPowerUp suggest that the Centaur Technology-designed chip has potential. However, there are still several important unknowns at this point, mainly price and power usage. Also, the GPU VIA is using in the processor is still a mystery, though Scott suspects an S3 GPU is possible through a partnership with HTC.

The chip does seem to be offering up competitive performance, but pricing and power efficiency will play a major role in whether or not VIA gets any design wins with system OEMs. If I had to guess, the VIA chip will sit somewhere between the Intel and AMD offerings, with the inclusion of a motherboard chipset pushing it towards AMD's higher TDP.

If VIA prices it correctly, we could see the company making a slight comeback in the x86 market with consumer-facing devices (particularly Windows 8.1 tablets). VIA has traditionally been known as the low power x86 licensee, and the expanding mobile market is the ideal place for such a chip. Its past endeavors have not been well received (mainly due to timing and volume production/availability issues with the Nano processors), but I hope that Centaur Technology and VIA are able to pull this one off, as I had started to forget the company existed (heh).

Source: TechPowerUp

Intel and Microsoft Show DirectX 12 Demo and Benchmark

Subject: General Tech, Graphics Cards, Processors, Mobile, Shows and Expos | August 13, 2014 - 09:55 PM |
Tagged: siggraph 2014, Siggraph, microsoft, Intel, DirectX 12, directx 11, DirectX

Along with GDC Europe and Gamescom, Siggraph 2014 is going on in Vancouver, BC, and Intel had a DirectX 12 demo at their booth. The scene, containing 50,000 asteroids, each in its own draw call, was developed on both Direct3D 11 and Direct3D 12 code paths, and could apparently be switched between them while the demo is running. Intel claims to have measured both power and frame rate.

intel-dx12-LockedFPS.png

Variable power to hit a desired frame rate, DX11 and DX12.

The test system is a Surface Pro 3 with an Intel HD 4400 GPU. Doing a bit of digging, this would make it the i5-based Surface Pro 3. Removing another shovel-load of mystery, this would be the Intel Core i5-4300U with two cores, four threads, a 1.9 GHz base clock, up to a 2.9 GHz turbo clock, 3MB of cache, and (of course) the Haswell architecture.

While not top-of-the-line, it is also not bottom-of-the-barrel. It is a respectable CPU.

Intel's demo on this processor shows a significant power reduction in the CPU, and even a slight decrease in GPU power, for the same target frame rate. If power was not throttled, Intel's demo goes from 19 FPS all the way up to a playable 33 FPS.

Intel will discuss more during a video interview, tomorrow (Thursday) at 5pm EDT.

intel-dx12-unlockedFPS-1.jpg

Maximum power in DirectX 11 mode.

For my contribution to the story, I would like to address the first comment on the MSDN article. It claims that this is just an "ideal scenario" of a scene that is bottlenecked by draw calls. The thing is: that is the point. Sure, a game developer could optimize the scene to (maybe) instance objects together, and so forth, but that is unnecessary work. Why should programmers, or worse, artists, need to spend so much of their time developing art so that it can be batched together into fewer, bigger commands? Would it not be much easier, and all-around better, if the content could be developed as it most naturally comes together?
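To make the batching point concrete, here is a minimal Direct3D 11-style sketch of the two approaches the comment is arguing about. The Asteroid struct and all buffer/state setup are hypothetical stand-ins; only the two draw paths matter.

```cpp
#include <d3d11.h>
#include <vector>

// Hypothetical per-object record, just for this sketch.
struct Asteroid
{
    UINT indexCount;
    // world transform, material, etc. would live here
};

// Naive path: one draw call per asteroid, mirroring the demo's 50,000 submissions.
// The CPU-side cost of validating and submitting each call is the overhead that
// a thinner runtime like Direct3D 12 (or Mantle) is meant to shrink.
void DrawAsteroidsNaive(ID3D11DeviceContext* ctx, const std::vector<Asteroid>& asteroids)
{
    for (const Asteroid& a : asteroids)
    {
        // per-object constant buffer updates, texture bindings, etc. omitted
        ctx->DrawIndexed(a.indexCount, 0, 0);
    }
}

// Hand-optimized path: pack per-object data into an instance buffer ahead of time
// and submit a single call. This is the extra engineering/art work the article
// argues should not be mandatory just to dodge draw-call overhead.
void DrawAsteroidsInstanced(ID3D11DeviceContext* ctx, UINT indexCountPerInstance, UINT instanceCount)
{
    ctx->DrawIndexedInstanced(indexCountPerInstance, instanceCount, 0, 0, 0);
}
```

The second path is exactly the kind of hand-rolled batching (instance buffers, merged materials, atlased textures) that a lower-overhead API is supposed to make optional rather than mandatory.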

That, of course, depends on how much performance improvement we will see from DirectX 12 compared to theoretical maximum efficiency. If pushing two workloads through a DX12 GPU takes about the same time as pushing one double-sized workload, then developers are free to implement whatever solution is most direct.

intel-dx12-unlockedFPS-2.jpg

Maximum power when switching to DirectX 12 mode.

If, on the other hand, pushing two workloads is 1000x slower than pushing a single, double-sized one, but DirectX 11 was 10,000x slower, then it could be less relevant, because developers will still need to do their tricks in those situations. The closer the two get, the less often strict optimization is necessary.

If there are any DirectX 11 game developers, artists, and producers out there, we would like to hear from you. How much would a (let's say) 90% reduction in draw call latency (which is around what Mantle claims) give you, in terms of fewer required optimizations? Can you afford to solve problems "the naive way" now? Some of the time? Most of the time? Would it still be worth it to do things like object instancing and fewer, larger materials and shaders? How often?

NVIDIA Reveals 64-bit Denver CPU Core Details, Headed to New Tegra K1 Powered Devices Later This Year

Subject: Processors | August 12, 2014 - 01:06 AM |
Tagged: tegra k1, project denver, nvidia, Denver, ARMv8, arm, Android, 64-bit

During GTC 2014 NVIDIA launched the Tegra K1, a new mobile SoC that contains a powerful Kepler-based GPU. Initial processors (and the resultant design wins such as the Acer Chromebook 13 and Xiaomi Mi Pad) utilized four ARM Cortex-A15 cores for the CPU side of things, but later this year NVIDIA is deploying a variant of the Tegra K1 SoC that switches out the four A15 cores for two custom (NVIDIA developed) Denver CPU cores.

Today at the Hot Chips conference, NVIDIA revealed most of the juicy details on those new custom cores, announced in January, which will be used in devices later this year.

The custom 64-bit Denver CPU cores use a 7-way superscalar design and run a custom instruction set. Denver is a wide but in-order architecture that allows up to seven operations per clock cycle. NVIDIA is using a custom ISA and on-the-fly binary translation to convert ARMv8 instructions to microcode before execution. A software layer and a 128MB cache underpin the Dynamic Code Optimization technology by allowing the processor to examine and optimize the ARM code, convert it to the custom instruction set, and cache the converted microcode of frequently used applications (the cache can be bypassed for infrequently executed code). Using the wider execution engine and Dynamic Code Optimization (which is transparent to ARM developers and does not require updated applications), NVIDIA touts the dual Denver core Tegra K1 as being at least as powerful as the quad- and octo-core competition.
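For a rough mental model of what that software layer is doing, here is a purely conceptual sketch of a profile-guided translation cache. This is not NVIDIA's implementation; the threshold, data structures, and trace format are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// A translated, optimized trace of native micro-ops (the encoding here is made up).
struct Translation { std::vector<uint64_t> microOps; };

class TranslationCache
{
public:
    // Returns cached optimized code for a guest (ARMv8) program counter,
    // or nullptr if the region is still "cold" and should just be decoded in hardware.
    const Translation* lookup(uint64_t guestPC)
    {
        auto hit = cache.find(guestPC);
        if (hit != cache.end())
            return &hit->second;                          // hot path: reuse the optimized trace

        if (++executionCount[guestPC] >= kHotThreshold)   // profile until the region proves hot
        {
            cache[guestPC] = translateAndOptimize(guestPC);
            return &cache[guestPC];
        }
        return nullptr;                                   // cold code: skip translation entirely
    }

private:
    static constexpr uint32_t kHotThreshold = 1000;       // assumed tuning value
    Translation translateAndOptimize(uint64_t) { return {}; }  // stand-in for the real optimizer

    std::unordered_map<uint64_t, uint32_t>    executionCount;  // guest PC -> execution count
    std::unordered_map<uint64_t, Translation> cache;           // guest PC -> optimized trace
};
```

The key idea is that translation cost is only paid for code that runs often enough to amortize it, which matches the description of the 128MB optimization cache being bypassed for infrequently executed code.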

Further, NVIDIA has claimed that at peak throughput (and in specific situations where application code and DCO can take full advantage of the 7-way execution engine) the Denver-based mobile SoC handily outpaces Intel’s Bay Trail, Apple’s A7 Cyclone, and Qualcomm’s Krait 400 CPU cores. In the results of a synthetic benchmark test provided to The Tech Report, the Denver cores were even challenging Intel’s Haswell-based Celeron 2955U processor. Keeping in mind that these are NVIDIA-provided numbers and likely the best results one can expect, Denver is still quite a bit more capable than existing cores. (Note that the Haswell chips would likely pull much farther ahead when presented with applications that cannot be easily executed in-order with limited instruction parallelism.)

NVIDIA Denver CPU Core 64bit ARMv8 Tegra K1.png

NVIDIA is ratcheting up mobile CPU performance with its Denver cores, but it is also aiming for an efficient chip and has implemented several power saving tweaks. Beyond the decision to go with an in-order execution engine (with DCO hopefully making up for most of that), the beefy Denver cores reportedly feature low latency power state transitions (e.g. between active and idle states), power gating, and dynamic voltage and clock scaling. The company claims that “Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.” In real terms this should mean that swapping the quad core A15 design in the Tegra K1 for two Denver cores should not result in significantly lower battery life. The two K1 variants are said to be pin compatible, so that OEMs and developers can easily bring upgraded models to market with the faster Denver cores.

NVIDIA Denver CPU cores in Tegra K1.png

For those curious, in the Tegra K1 the two Denver cores (clocked at up to 2.5GHz) share a 16-way L2 cache, and each has 128KB instruction and 64KB data L1 caches to itself. The 128MB Dynamic Code Optimization cache is held in system memory.

Denver is the first (custom) 64-bit ARM processor for Android (with Apple’s A7 being the first 64-bit smartphone chip), and NVIDIA is working on supporting the next generation Android OS known as Android L.

The dual Denver core Tegra K1 is coming later this year and I am excited to see how it performs. The current K1 chip already has a powerful fully CUDA compliant Kepler-based GPU which has enabled awesome projects such as computer vision and even prototype self-driving cars. With the new Kepler GPU and Denver CPU pairing, I’m looking forward to seeing how NVIDIA’s latest chip is put to work and the kinds of devices it enables.

Are you excited for the new Tegra K1 SoC with NVIDIA’s first fully custom cores?

Source: NVIDIA

Kaveri on Linux

Subject: Processors | August 11, 2014 - 03:40 PM |
Tagged: A10-7800, A6-7400K, linux, amd, ubuntu 14.04, Kaveri

Linux support for AMD's GPUs has not been progressing at the pace many users would like, though it is improving over time; the situation with their APUs is somewhat different.  Phoronix just tested the A10-7800 and A6-7400K on Ubuntu 14.04 with kernel 3.13 and the latest Catalyst 14.6 Beta.  This preview covers raw performance only; you can expect to see more published in the near future covering new features such as the configurable TDP these chips offer.  The tests show that the new 7800 can keep pace with the previous 7850K, and while the A6-7400K is certainly slower, it will be able to handle a Linux machine with relatively light duties.  You can see the numbers here.

image.php_.jpg

"At the end of July AMD launched new Kaveri APU models: the A10-7800, A8-7600, and A6-7400K. AMD graciously sent over review samples on their A10-7800 and A6-7400K Kaveri APUs, which we've been benchmarking and have some of the initial Linux performance results to share today."


Source: Phoronix