Subject: General Tech, Processors | August 24, 2014 - 03:33 AM | Scott Michaud
Tagged: Intel, Haswell-E, Ivy Bridge-E, haswell, solder, thermal paste
Sorry for being about a month late to this news. Apparently, someone got their hands on an Intel Core i7-5960X and they wanted to see its eight cores. Removing the lid, they found that it was soldered directly onto the die with an epoxy, rather than coated with a thermal paste. While Haswell-E will still need to contend with the limitations of 22nm, and how difficult it becomes to exceed various clockspeed ceilings, the better ability to dump heat is always welcome.
Image Credit: OCDrift
While Devil's Canyon (Core i7 4970K) used better thermal paste, the method used with Haswell-E will be event better. I should note that Ivy Bridge-E, released last year, also contained a form of solder under its lid and its overclocking results were still limited. This is not an easy path to ultimate gigahertz. Even so, it is nice that Intel, at least on their enthusiast line, is spending that little bit extra to not introduce artificial barriers.
Subject: General Tech, Processors | August 23, 2014 - 01:38 AM | Scott Michaud
Tagged: X99, Intel, Haswell-E
Haswell-E, with its X99 chipset, are expected to launch soon. This will bring a new spread of processors and motherboards to the high-end, enthusiast market. These are the processors that fans of Intel should buy if they have money, want all the RAM, and have a bunch of PCIe expansion cards to install.
If you count the PCIe x1 slots, the table would refer to the first, third, fifth, and seventh slots.
To me, this is not too bad. You are able to use three GPUs with eight-lane bandwidth and stick a four-lane PCIe SSD on the last slot. Considering that each lane is PCIe 3.0, it is similar to having three PCIe 2.0 x16 slots. While two-way and three-way SLI is supported on all CPUs, four-way SLI is only allowed with processors that provide forty lanes of PCIe 3.0.
Gigabyte also provides three PCIe 2.0 x1 slots, which are not handled by the CPU and do not count against its available lanes.
Since I started to write up this news post, Gigabyte seems to have replaced their manual with a single, blank page. Thankfully, I was able to have it cached long enough to finish my thoughts. Some sites claim that the manual failed to mention the 8-8-8 configuration and suggested that configurations of three GPUs were impossible. That is not true; the manual refers to these situations, just not in the most clear of terms.
Haswell-E should launch soon, with most rumors pointing to the end of the month.
Subject: Processors | August 19, 2014 - 09:06 PM | Tim Verry
Tagged: VIA, isaiah II, centaur technologies, centaur
VIA subsidiary Centaur Technology is rumored to be launching a new x86 processor at the end of August based on the "Isaiah II" architecture. This upcoming chip is a 64-bit SoC aimed at the mobile and low power space. So far, the only known implementation is a quad core version clocked at up to 2.0 GHz with a 2MB L2 cache. Benchmarks of the quad core Isaiah II-based processor recently appeared online, and if the SiSoft Sandra results hold true VIA has very competitive chip on its hands that outperforms Intel's Bay Trail Z3770 and holds its own against AMD's Jaguar-based Athlon 5350.
The SiSoft Sandra results below show the alleged Isaiah II quad core handily outmaneuvering Intel's Bay Trail SoC and trading wins with AMD's Athlon 5350. All three SoCs are quad core parts with integrated graphics solutions. The benchmarks were run on slightly different configurations as they do not share a motherboard or chipset in common. In the case of the VIA chip, it was paired with a motherboard using the VIA VX11H chipset).
|Processor||VIA Isaiah II Quad Core||AMD Athlon 5350||Intel Atom Z3770|
|CPU Arithmetic||20.00 GOPS||22.66 GOPS||15.10 GOPS|
|CPU Multimedia||50.20 Mpix/s||47.56 Mpix/s||25.90 Mpix/s|
|Multicore Efficiency||3.10 GB/s||4.00 GB/s||1.70 GB/s|
|Cryptography (HS)||1.50 GB/s||1.48 GB/s||0.40 GB/s|
|PM Efficiency (ALU)||2.90 GIPS||2.88 GIPS||2.50 GIPS|
|Financial Analysis (DP FP64)||3.00 kOPT/S||3.64 kOPT/S||1.50 kOPT/S|
For comparison, The Atom Z3770 is a quad core clocked at 1.46 GHz (2.39 GHz max turbo) with 2MB L2 cache and Intel HD Graphics clocked at up to 667 MHz supporting up to 4GB of 1066 MHz memory. Bay Trail is manufactured on a 22nm process and has a 2W SDP (Scenario Design Power). Further, the AMD "Kabini" Athlon 5350 features four Jaguar CPU cores clocked at 2.05 GHz, a 128-core GCN GPU clocked at 600 MHz, 2MB L2 cache, and support for 1600 MHz memory. AMD's Kabini SoC is a 28nm chip with a 25W TDP (Thermal Design Power). VIA's new chip allegedly supports modern instruction sets, including AVX 2.0, putting it on par with the AMD and Intel options.
|Processor||VIA Isaiah II Quad Core||AMD Athlon 5350||Intel Atom Z3770|
|CPU||4 Cores @ 2.00 GHz||4 Cores @ 2.05 GHz||4 Cores @ 1.46 GHz (up to 2.39 GHz turbo)|
|GPU||?||128 GCN Cores @ 600 MHz||HD Graphics @ (up to) 667 MHz|
|Memory Support||?||1600 MHz||1066 MHz|
|L2 Cache||2 MB||2 MB||2 MB|
|TDP / SDP||?||25W||2W|
The SiSoft Sandra benchmarks spotted by TechPowerUp suggest that the Centaur Technology designed chip has potential. However, there are still several (important) unknowns at this point. Mainly, price and power usage. Also, the GPU VIA is using in the processor is still a mystery though Scott suspects an S3 GPU is possible through a partnership with HTC.
The chip does seem to be offering up competitive performance, but pricing and power efficiency will play a major role in whether or not VIA gets any design wins with system OEMs. If I had to guess, the VIA chip will sit somewhere between the Intel and AMD offerings with the inclusion of motherboard chipset pushing it towards AMD's higher TDP.
If VIA prices it correctly, we could see the company making a slight comeback in the x86 market with consumer facing devices (particularly Windows 8.1 tablets). VIA has traditionally been known as the low power x86 licensee, and the new expanding mobile market is the ideal place for such a chip. Its past endeavors have not been well received (mainly due to timing and volume production/availability issues of the Nano processors), but I hope that Centaur Technology and VIA are able to pull this one off as I had started to forget the company existed (heh).
Subject: General Tech, Graphics Cards, Processors, Mobile, Shows and Expos | August 13, 2014 - 09:55 PM | Scott Michaud
Tagged: siggraph 2014, Siggraph, microsoft, Intel, DirectX 12, directx 11, DirectX
Along with GDC Europe and Gamescom, Siggraph 2014 is going on in Vancouver, BC. At it, Intel had a DirectX 12 demo at their booth. This scene, containing 50,000 asteroids, each in its own draw call, was developed on both Direct3D 11 and Direct3D 12 code paths and could apparently be switched while the demo is running. Intel claims to have measured both power as well as frame rate.
Variable power to hit a desired frame rate, DX11 and DX12.
The test system is a Surface Pro 3 with an Intel HD 4400 GPU. Doing a bit of digging, this would make it the i5-based Surface Pro 3. Removing another shovel-load of mystery, this would be the Intel Core i5-4300U with two cores, four threads, 1.9 GHz base clock, up-to 2.9 GHz turbo clock, 3MB of cache, and (of course) based on the Haswell architecture.
While not top-of-the-line, it is also not bottom-of-the-barrel. It is a respectable CPU.
Intel's demo on this processor shows a significant power reduction in the CPU, and even a slight decrease in GPU power, for the same target frame rate. If power was not throttled, Intel's demo goes from 19 FPS all the way up to a playable 33 FPS.
Intel will discuss more during a video interview, tomorrow (Thursday) at 5pm EDT.
Maximum power in DirectX 11 mode.
For my contribution to the story, I would like to address the first comment on the MSDN article. It claims that this is just an "ideal scenario" of a scene that is bottlenecked by draw calls. The thing is: that is the point. Sure, a game developer could optimize the scene to (maybe) instance objects together, and so forth, but that is unnecessary work. Why should programmers, or worse, artists, need to spend so much of their time developing art so that it could be batch together into fewer, bigger commands? Would it not be much easier, and all-around better, if the content could be developed as it most naturally comes together?
That, of course, depends on how much performance improvement we will see from DirectX 12, compared to theoretical max efficiency. If pushing two workloads through a DX12 GPU takes about the same time as pushing one, double-sized workload, then it allows developers to, literally, perform whatever solution is most direct.
Maximum power when switching to DirectX 12 mode.
If, on the other hand, pushing two workloads is 1000x slower than pushing a single, double-sized one, but DirectX 11 was 10,000x slower, then it could be less relevant because developers will still need to do their tricks in those situations. The closer it gets, the fewer occasions that strict optimization is necessary.
If there are any DirectX 11 game developers, artists, and producers out there, we would like to hear from you. How much would a (let's say) 90% reduction in draw call latency (which is around what Mantle claims) give you, in terms of fewer required optimizations? Can you afford to solve problems "the naive way" now? Some of the time? Most of the time? Would it still be worth it to do things like object instancing and fewer, larger materials and shaders? How often?
NVIDIA Reveals 64-bit Denver CPU Core Details, Headed to New Tegra K1 Powered Devices Later This Year
Subject: Processors | August 12, 2014 - 01:06 AM | Tim Verry
Tagged: tegra k1, project denver, nvidia, Denver, ARMv8, arm, Android, 64-bit
During GTC 2014 NVIDIA launched the Tegra K1, a new mobile SoC that contains a powerful Kepler-based GPU. Initial processors (and the resultant design wins such as the Acer Chromebook 13 and Xiaomi Mi Pad) utilized four ARM Cortex-A15 cores for the CPU side of things, but later this year NVIDIA is deploying a variant of the Tegra K1 SoC that switches out the four A15 cores for two custom (NVIDIA developed) Denver CPU cores.
The custom 64-bit Denver CPU cores use a 7-way superscalar design and run a custom instruction set. Denver is a wide but in-order architecture that allows up to seven operations per clock cycle. NVIDIA is using a custom ISA and on-the-fly binary translation to convert ARMv8 instructions to microcode before execution. A software layer and 128MB cache enhance the Dynamic Code Optimization technology by allowing the processor to examine and optimize the ARM code, convert it to the custom instruction set, and further cache the converted microcode of frequently used applications in a cache (which can be bypassed for infrequently processed code). Using the wider execution engine and Dynamic Code Optimization (which is transparent to ARM developers and does not require updated applications), NVIDIA touts the dual Denver core Tegra K1 as being at least as powerful as the quad and octo-core packing competition.
Further, NVIDIA has claimed at at peak throughput (and in specific situations where application code and DCO can take full advantage of the 7-way execution engine) the Denver-based mobile SoC handily outpaces Intel’s Bay Trail, Apple’s A7 Cyclone, and Qualcomm’s Krait 400 CPU cores. In the results of a synthetic benchmark test provided to The Tech Report, the Denver cores were even challenging Intel’s Haswell-based Celeron 2955U processor. Keeping in mind that these are NVIDIA-provided numbers and likely the best results one can expect, Denver is still quite a bit more capable than existing cores. (Note that the Haswell chips would likely pull much farther ahead when presented with applications that cannot be easily executed in-order with limited instruction parallelism).
NVIDIA is ratcheting up mobile CPU performance with its Denver cores, but it is also aiming for an efficient chip and has implemented several power saving tweaks. Beyond the decision to go with an in-order execution engine (with DCO hopefully mostly making up for that), the beefy Denver cores reportedly feature low latency power state transitions (e.g. between active and idle states), power gating, dynamic voltage, and dynamic clock scaling. The company claims that “Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.” In real terms this should mean that the two Denver cores in place of the quad core A15 design in the Tegra K1 should not result in significantly lower battery life. The two K1 variants are said to be pin compatible such that OEMs and developers can easily bring upgraded models to market with the faster Denver cores.
For those curious, In the Tegra K1, the two Denver cores (clocked at up to 2.5GHz) share a 16-way L2 cache and each have 128KB instruction and 64KB data L1 caches to themselves. The 128MB Dynamic Code Optimization cache is held in system memory.
Denver is the first (custom) 64-bit ARM processor for Android (with Apple’s A7 being the first 64-bit smartphone chip), and NVIDIA is working on supporting the next generation Android OS known as Android L.
The dual Denver core Tegra K1 is coming later this year and I am excited to see how it performs. The current K1 chip already has a powerful fully CUDA compliant Kepler-based GPU which has enabled awesome projects such as computer vision and even prototype self-driving cars. With the new Kepler GPU and Denver CPU pairing, I’m looking forward to seeing how NVIDIA’s latest chip is put to work and the kinds of devices it enables.
Are you excited for the new Tegra K1 SoC with NVIDIA’s first fully custom cores?
Subject: Processors | August 11, 2014 - 03:40 PM | Jeremy Hellstrom
Tagged: A10-7800, A6-7400K, linux, amd, ubuntu 14.04, Kaveri
Linux support for AMD's GPUs has not been progressing at the pace many users would like, though it is improving over time but that is not the same with their APUs. Phoronix just tested the A10-7800 and A6-7400K on Ubuntu 14.04 with kernel 3.13 and the latest Catalyst 14.6 Beta. This preview just covers the raw performance, you can expect to see more published in the near future that will cover new features such as the configurable TDP which exists on these chips. The tests show that the new 7800 can keep pace with the previous 7850K and while the A6-7400K is certainly slower it will be able to handle a Linux machine with relatively light duties. You can see the numbers here.
"At the end of July AMD launched new Kaveri APU models: the A10-7800, A8-7600, and A6-7400K. AMD graciously sent over review samples on their A10-7800 and A6-7400K Kaveri APUs, which we've been benchmarking and have some of the initial Linux performance results to share today."
Here are some more Processor articles from around the web:
- AMD's A10-7800 @ The Tech Report
- AMD A10-7800 APU @ Benchmark Reviews
- AMD A10-7800 @ Kitguru
- AMD Kaveri A8-7600 and A10-7800 APU Review @ Legit Reviews
- AMD A10-7800 “Kaveri” APU @ eTeknix
- AMD A10-7800 Kaveri APU Review @ Hardware Canucks
- Core i7-4790K "Devil's Canyon" overclocking revisited @ The Tech Report
- Intel Core i5 4690K processor @ Hardwareoverclock
Coming in 2014: Intel Core M
The era of Broadwell begins in late 2014 and based on what Intel has disclosed to us today, the processor architecture appears to be impressive in nearly every aspect. Coming off the success of the Haswell design in 2013 built on 22nm, the Broadwell-Y architecture will not only be the first to market with a new microarchitecture, but will be the flagship product on Intel’s new 14nm tri-gate process technology.
The Intel Core M processor, as Broadwell-Y has been dubbed, includes impressive technological improvements over previous low power Intel processors that result in lower power, thinner form factors, and longer battery life designs. Broadwell-Y will stretch into even lower TDPs enabling 9mm or small fanless designs that maintain current battery lifespans. A new 2nd generation FIVR with modified power delivery design allows for even thinner packaging and a wider range of dynamic frequencies than before. And of course, along with the shift comes an updated converged core design and improved graphics performance.
All of these changes are in service to what Intel claims is a re-invention of the notebook. Compared to 2010 when the company introduced the original Intel Core processor, thus redirecting Intel’s direction almost completely, Intel Core M and the Broadwell-Y changes will allow for some dramatic platform changes.
Notebook thickness will go from 26mm (~1.02 inches) down to a small as 7mm (~0.27 inches) as Intel has proven with its Llama Mountain reference platform. Reductions in total thermal dissipation of 4x while improving core performance by 2x and graphics performance by 7x are something no other company has been able to do over the same time span. And in the end, one of the most important features for the consumer, is getting double the useful battery life with a smaller (and lighter) battery required for it.
But these kinds of advancements just don’t happen by chance – ask any other semiconductor company that is either trying to keep ahead of or catch up to Intel. It takes countless engineers and endless hours to build a platform like this. Today Intel is sharing some key details on how it was able to make this jump including the move to a 14nm FinFET / tri-gate transistor technology and impressive packaging and core design changes to the Broadwell architecture.
Intel 14nm Technology Advancement
Intel consistently creates and builds the most impressive manufacturing and production processes in the world and it has helped it maintain a market leadership over rivals in the CPU space. It is also one of the key tenants that Intel hopes will help them deliver on the world of mobile including tablets and smartphones. At the 22nm node Intel was the first offer 3D transistors, what they called tri-gate and others refer to as FinFET. By focusing on power consumption rather than top level performance Intel was able to build the Haswell design (as well as Silvermont for the Atom line) with impressive performance and power scaling, allowing thinner and less power hungry designs than with previous generations. Some enthusiasts might think that Intel has done this at the expense of high performance components, and there is some truth to that. But Intel believes that by committing to this space it builds the best future for the company.
Filling the Product Gaps
In the first several years of my PCPer employment, I typically handled most of the AMD CPU refreshes. These were rather standard affairs that involved small jumps in clockspeed and performance. These happened every 6 to 8 months, with the bigger architectural shifts happening some years apart. We are finally seeing a new refresh of the AMD APU parts after the initial release of Kaveri to the world at the beginning of this year. This update is different. Unlike previous years, there are no faster parts than the already available A10-7850K.
This refresh deals with fleshing out the rest of the Kaveri lineup with products that address different TDPs, markets, and prices. The A10-7850K is still the king when it comes to performance on the FM2+ socket (as long as users do not pay attention to the faster CPU performance of the A10-6800K). The initial launch in January also featured another part that never became available until now; the A8-7600 was supposed to be available some months ago, but is only making it to market now. The 7600 part was unique in that it had a configurable TDP that went from 65 watts down to 45 watts. The 7850K on the other hand was configurable from 95 watts down to 65 watts.
So what are we seeing today? AMD is releasing three parts to address the lower power markets that AMD hopes to expand their reach into. The A8-7600 was again detailed back in January, but never released until recently. The other two parts are brand new. The A10-7800 is a 65 watt TDP part with a cTDP that goes down to 45 watts. The other new chip is the A6-7600K which is unlocked, has a configurable TDP, and looks to compete directly with Intel’s recently released 20 year Anniversary Pentium G3258.
Subject: Processors | July 22, 2014 - 04:15 PM | Jeremy Hellstrom
Tagged: linux, Pentium G3258, ubuntu 14.10
Phoronix tested out the 20th Anniversary Pentium CPU on Ubuntu 14.10 and right off the bat were impressed as they managed a perfectly stable overclock of 4.4GHz on air. Using Linux 3.16 and Mesa 10.2 they had no issues with the performance of the onboard GPU though the performance lagged behind the fast GPU present on the Haswell chips they tested against. When they benchmarked the CPU the lack of Advanced Vector Extensions and the fact that it is a dual core CPU showed in the results but when you consider the difference in price for a G3258's compared to a 4770K it fares quite well. Stay tuned for their next set of benchmarks which will compare the G3258 to AMD's current offerings.
"Up for review today on Phoronix is the Pentium G3258, the new processor Intel put out in celebration of their Pentium brand turning 20 years old. This new Pentium G3258 processor costs under $100 USD and comes unlocked for offering quite a bit overclocking potential while this Pentium CPU can be used by current Intel 8 and 9 Series Chipsets. Here's our first benchmarks of the Intel Pentium G3258 using Ubuntu Linux."
Here are some more Processor articles from around the web:
- Intel Core i7 4790K – Haswell gets a refresh @ Bjorn3D
- Haswell Devils Canyon Performance @ Hardware Asylum
- AMD Athlon 5350 and Gigabyte GA-AM1M-S2H @ Benchmark Reviews
- AMD FX-9590 & FX-9370 Review @ OCC
Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 03:05 AM | Scott Michaud
Tagged: Xeon Phi, xeon, Intel, avx-512, avx
It is difficult to know what is actually new information in this Intel blog post, but it is interesting none-the-less. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and co-processors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, as well as Conflict Detection, Exponential and Reciprocal, and Prefetch, which are optional. This, earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions will make their way to standard, CPU-like Xeons at around the same time.
This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.
|Conflict Detection Instructions||Yes||Yes||Yes|
|Exponential and Reciprocal Instructions||No||Yes||Yes|
|Byte and Word Instructions||Yes||No||No|
|Doubleword and Quadword Instructions||Yes||No||No|
|Vector Length Extensions||Yes||No||No|
Source: Intel AVX-512 Blog Post (and my understanding thereof).
So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and works just fine with having multiple cores, too). For an example: imagine that you have to multiply two colors together. The direct way to do it is multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! and, later, Intel's SSE included instructions to multiply two, four-component vectors together. This reduces four similar instructions into a single operating between wider registers.
Smart compilers (and programmers, although that is becoming less common as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if they undergo similar instructions. AVX-512 allows for sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four, single-precision RGBA data values, but you are looping through 2 million pixels, do four pixels at a time (16 components).
For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.
This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.