First Apple A8 Benchmarks Show... "Modest" Increase

Subject: General Tech, Processors, Mobile | September 12, 2014 - 10:30 AM |
Tagged: apple, apple a8, SoC, iphone 6, iphone 6 plus

So one of the first benchmark results for Apple's A8 SoC has been published to Rightware's database, and it is not very different from its predecessor's. The Apple A7 GPU of last year's iPhone 5S received a score of 20,253.80 in the Basemark X synthetic benchmark. The updated Apple A8 GPU, found in the iPhone 6, saw a 4.7% increase, to 21,204.26, on the same test.

apple-a8-rightware.png

Again, this is a synthetic benchmark and not necessarily representative of real-world performance. It would not surprise me, though, if the GPU is identical and the increase corresponds mostly to the increase in CPU performance. That theory, however, does not explain why we see so little improvement despite Apple's switch to TSMC's 20nm process. Perhaps the new process matters more for power consumption and non-gaming performance? That does not align well with Apple's claims of a 20% faster CPU and 50% faster GPU...

Speaking of gaming performance, iOS 8 introduces the Metal API, which is Apple's response to Mantle, DirectX 12, and Khronos' Next Generation OpenGL initiative. Maybe that boost will give Apple a pass for a generation? Perhaps we will see the two GPUs (A7 and A8) start to diverge under the Metal API? We shall see when more benchmarks and reviews get published.

Source: Rightware

Qualcomm Snapdragon 210 Has LTE for Sub-$100 Devices

Subject: General Tech, Processors, Mobile | September 11, 2014 - 03:27 PM |
Tagged: qualcomm, snapdragon 210, snapdragon, LTE, cheap tablet

The Snapdragon 210 was recently announced by Qualcomm to be an SoC for cheap, sub-$100 tablets and mobile phones. With it, the company aims to bring LTE connectivity to that market segment, including Dual SIM support. It will be manufactured on the 28nm process, with up to four ARM CPU cores and a Qualcomm Adreno 304 GPU.

Qualcomm_Snapdragon_logo.png

According to Qualcomm, the SoC can decode 1080p video. It will also be able to manage cameras with up to 8 megapixels of resolution, including HDR, autofocus, auto white balance, and auto exposure. Let's be honest, you will not really get much more than that for a sub-$100 device.

The Snapdragon 210 has been given Quick Charge 2.0, normally reserved for the 400 line and up, which refills the battery quickly when connected to a Quick Charge 2.0-supporting charger (ex: the Motorola Turbo Charger). Quick Charge 1.0 worked by optimizing how energy was delivered to the battery through a specification. Quick Charge 2.0 does the same, just with up to 60 watts of power (!!). For reference, the USB standard defines 2.5W, which is 5V at 0.5A, although the specification is regularly extended to 5 or 10 watts.
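Since we are throwing watt figures around, the arithmetic is just volts times amps. A quick sketch (the voltage/current pairings below are illustrative; the 20V/3A pair is Quick Charge 2.0's ceiling, not what every device actually draws):

```cpp
#include <cstdio>

int main() {
    // Power (W) = volts x amps. Illustrative pairings, not official ratings.
    struct { const char* name; double volts, amps; } specs[] = {
        {"USB 2.0 port",          5.0, 0.5},  // the 2.5W baseline
        {"USB charger (common)",  5.0, 2.0},  // the frequent 10W extension
        {"Quick Charge 2.0 max", 20.0, 3.0},  // the 60W ceiling quoted above
    };
    for (auto& s : specs)
        std::printf("%-22s %4.1fV x %.1fA = %4.1fW\n",
                    s.name, s.volts, s.amps, s.volts * s.amps);
    return 0;
}
```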

Devices featuring the Snapdragon 210 are expected for the first half of 2015.

Source: Qualcomm

Centaur Technology Extends Their Website Countdown...

Subject: General Tech, Processors, Mobile | September 9, 2014 - 05:38 PM |
Tagged: x86, VIA, centaur technologies

In early July, we reported on VIA's Centaur Technology division getting a new website. At the time, we anticipated that it would coincide with an announcement about Isaiah II, their rumored upcoming x86-based SoC (maybe even one compatible with ARM, too).

Android-x86.png

Fifty-one days later, on August 31st, 2014, we came back at quarter-to-four EDT and let the website run its course, refreshing occasionally. 4 PM hit and... the counter stayed at 0 days, 0 hours, 0 minutes, and 0 seconds. Okay, I said. For about an hour, I refreshed occasionally, because things could have happened on Labour Day weekend. I then came back late in the evening, and again the day after. I next thought about it the week after, at which point the website had been updated... with a timer that expires on September 30th, 2014.

Well... crap.

So, by the end of the month, we may find out what Centaur is trying to announce. I am a little less confident in the breadth of the announcement, given that the company waited for the timer to lapse before correcting its mistake. I would expect that, if a big announcement (like a new SoC) were holding up the launch, the company would have known ahead of time. At the moment, it sounds like a typical website redesign that got delayed.

I will hopefully be pleasantly surprised come the end of the month.

Intel Developer Forum (IDF) 2014 Keynote Live Blog

Subject: Processors, Shows and Expos | September 9, 2014 - 08:02 AM |
Tagged: idf, idf 2014, Intel, keynote, live blog

Today is the beginning of the 2014 Intel Developer Forum in San Francisco!  Join me at 9am PT for the first of our live blogs of the main Intel keynote where we will learn what direction Intel is taking on many fronts!

intelicon.jpg

Intel Networking: XL710 Fortville 40 Gigabit Ethernet and VXLAN Acceleration

Subject: General Tech, Networking, Processors | September 8, 2014 - 09:29 AM |
Tagged: xeon e5-2600 v3, xeon e5, Intel

So, to coincide with their Xeon E5-2600 v3 launch, Intel is discussing virtualized LANs and new, high-speed, PCIe-based networking adapters. Xeons are typically used in servers, and their networking add-in boards will often shame what you see on a consumer machine. One of these boards supports up to two 40GbE connections, configurable as four 10GbE, for all the bandwidth.

intel-40gb-nic-01.png

The Intel XL710 is their new network controller, which I am told is being manufactured at 28nm. It is supposedly more power efficient as well. In their example, a previous dual 10-gigabit controller consumes 5.2W of power, while a single 40-gigabit part consumes 3.3W. For a network adapter, that is a significant reduction, which is very important in a data center due to the number of machines and the required air conditioning.

As for the virtualized networking part of the announcement, Intel is heavily promoting software-defined networking (SDN). Intel mentioned two techniques to help increase usable bandwidth and decrease CPU utilization, which is important at 40 gigabits.

intel-40gb-nic-3.jpg

Receive Side Scaling disabled

The first is "generic segmentation offload" for VXLAN (VXLAN GSO), which allows the host of any given connection to chunk data more efficiently before sending it out over a virtual network.

intel-40gb-nic-2.jpg

Generic Segmentation Offload disabled

The second is TCP L4 Receive Side Scaling (RSS), which splits traffic between multiple receive queues (which can be managed by multiple CPU threads). I am not a network admin, and I will not claim to know how existing platforms manage traffic at this level. Still, Intel seems to claim that this NIC and CPU platform will result in higher effective bandwidth and better multi-core CPU utilization (which I expect will lead to lower power consumption).

intel-40gb-nic-4.jpg

Both enabled
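To make the RSS idea concrete, here is a toy sketch of the mechanism. This is my own illustration, not Intel's implementation: real NICs hash with a Toeplitz function and a programmable key, but the principle is the same. Packets from one TCP flow always hash to the same receive queue, so different flows (and their processing) spread across CPU cores:

```cpp
#include <cstdint>
#include <cstdio>

struct FlowTuple { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

// Simplified mixing function standing in for the real Toeplitz hash.
uint32_t flow_hash(const FlowTuple& f) {
    uint32_t h = f.src_ip ^ (f.dst_ip * 2654435761u);
    h ^= (uint32_t(f.src_port) << 16) | f.dst_port;
    return h * 2246822519u;
}

int main() {
    const unsigned kQueues = 8;  // say, one receive queue per CPU thread
    FlowTuple flows[] = {
        {0x0A000001, 0x0A000002, 40000, 443},  // 10.0.0.1 -> 10.0.0.2
        {0x0A000001, 0x0A000002, 40001, 443},  // same hosts, new flow
        {0x0A000003, 0x0A000002, 40000,  80},  // different client
    };
    for (const FlowTuple& f : flows)
        std::printf("flow %u:%u -> queue %u\n",
                    unsigned(f.src_port), unsigned(f.dst_port),
                    flow_hash(f) % kQueues);
    return 0;
}
```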

If it works as advertised, it could be a win for customers who buy into the Intel ecosystem.

Source: Intel

Intel Graphics Drivers Claim Significant Improvements

Subject: General Tech, Graphics Cards, Processors | September 6, 2014 - 02:25 PM |
Tagged: iris pro, iris, intel hd graphics, Intel

I was originally intending to test this driver with benchmarks but, after a little while, I realized that Ivy Bridge is not supported. This graphics driver starts and ends with Haswell. While I cannot verify their claims, Intel advertises up to 30% more performance in some OpenCL tasks and a 10% increase in games like Batman: Arkham City and Sleeping Dogs. They even claim double the performance in League of Legends at 1366x768.

inteltf2.jpg

Intel is giving gamers a "free lunch".

The driver also tunes Conservative Morphological Anti-Aliasing (CMAA). They claim it looks better than MLAA and FXAA, "without performance impact" (their whitepaper from March showed a ~1-to-1.5 millisecond cost on Intel HD 5000). Intel recommends disabling it after exiting games to prevent it from blurring other applications, and they automatically disable it in Windows, Internet Explorer, Chrome, Firefox, and Windows 8.1 Photo.

Adaptive Rendering Control was also added in this driver. It limits the redrawing of identical frames by comparing each frame it draws against previously drawn ones, and it adjusts the frame rate accordingly. This is most useful for games like Angry Birds, Minesweeper, and Bejeweled LIVE. It is disabled when not on battery power, or when the driver is set to "Maximum Performance".
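Intel has not published the algorithm, but the description suggests something like the following loop: fingerprint each rendered frame and skip the redraw when nothing changed. A minimal sketch of that idea (the FNV-1a hash and the frame counts are my own choices, purely for illustration):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// FNV-1a over the framebuffer: a cheap fingerprint for "did anything change?"
uint64_t frame_hash(const std::vector<uint8_t>& pixels) {
    uint64_t h = 1469598103934665603ull;
    for (uint8_t p : pixels) { h ^= p; h *= 1099511628211ull; }
    return h;
}

int main() {
    std::vector<uint8_t> framebuffer(1366 * 768 * 4, 0);  // RGBA, all black
    uint64_t last_hash = 0;
    int frames_drawn = 0;

    for (int frame = 0; frame < 60; ++frame) {
        if (frame == 30) framebuffer[0] = 255;  // the scene finally changes
        uint64_t h = frame_hash(framebuffer);
        if (h != last_hash) {                   // identical frame? skip redraw
            ++frames_drawn;
            last_hash = h;
        }
    }
    std::printf("drew %d of 60 frames\n", frames_drawn);  // 2 of 60 here
    return 0;
}
```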

The Intel Iris and HD graphics driver is available from Intel, for both 32-bit and 64-bit Windows 7, 8, and 8.1, on many Haswell-based GPUs.

Source: Intel

Intel Sent Us a Containment Chamber with Parts Inside

Subject: Motherboards, Processors, Chipsets, Memory, Storage | September 5, 2014 - 10:21 AM |
Tagged: X99-Deluxe, SSD 730, Intel, Haswell-E, ddr4, asus, 5960X

Okay, I'll be the first to admit that I didn't know what I was getting into. When a couple of packages showed up at our office from Intel with claims that they wanted to showcase the new Haswell-E platform...I was confused. The setup was simple: turn on cameras and watch what happens.

So out of the box comes...a containment chamber. A carefully crafted, wood+paint concoction that includes lights, beeps, motors and platforms. 

Want to see how Intel promotes the Core i7-5960X and X99 platform? Check out this video below.


Intel Announces Core M Processor Lineup Using Broadwell-Y

Subject: Processors | September 5, 2014 - 09:11 AM |
Tagged: Intel, core m, broadwell-y, Broadwell, 14nm

In a somewhat surprising fashion, Intel has decided to announce (again) the Core M processor family that will be shipping this fall and winter using the Broadwell-Y SoC. I was able to visit Portland and talk with the process technology and architecture teams back in early August, so much of the news coming out today, about the improvements of 14nm tri-gate transistors, the smaller package size of Broadwell-Y, and the goals for thinner, fanless designs, is going to be a repeat for frequent PC Perspective readers. (You can see that original story: Intel Core M Processor: Broadwell Architecture and 14nm Process Reveal.)

The new information today is the specifics on clock speeds and SKU offerings.

                        5Y70                    5Y10a                   5Y10
Cores/Threads           2/4                     2/4                     2/4
Base Freq               1.10 GHz                800 MHz                 800 MHz
Max Single Core Turbo   2.6 GHz                 2.0 GHz                 2.0 GHz
Max Dual Core Turbo     2.6 GHz                 2.0 GHz                 2.0 GHz
Max Quad Core Turbo     N/A                     N/A                     N/A
Graphics                Intel HD Graphics 5300  Intel HD Graphics 5300  Intel HD Graphics 5300
Graphics Base/Max Freq  100/850 MHz             100/800 MHz             100/800 MHz
LPDDR3L Memory Speed    1600 MHz                1600 MHz                1600 MHz
L3 Cache                4MB                     4MB                     4MB
TDP                     4.5 watts               4.5 watts               4.5 watts
Intel vPro              Y                       N                       N
Intel TXT               Y                       N                       N
Intel VT-d              Y                       Y                       Y
Intel VT-x              Y                       Y                       Y
AES-NI                  Y                       Y                       Y
1K Pricing              $281                    $281                    $281

Intel has planned three options, all with the same $281 pricing, though, based on volume and other deals with OEMs, these prices are obviously likely to shift. The Core M 5Y70 is the highest-performance part, with a base clock speed of 1.10 GHz that can scale up to 2.6 GHz with one or both cores active. The other two parts launching today both feature 800 MHz base clocks and 2.0 GHz maximum Turbo speeds.

With that scaling information, and the wide range the Intel HD Graphics 5300 can hit (100-850 MHz), Intel is doubling down on the benefits of fast and reliable Turbo Boost technology to give you high frequencies only when you need them most. This conserves power the vast majority of the time and allows Intel's partners to build fanless designs that are incredibly thin.

The 5Y10 and 5Y10a differ only in that the non-A variant has a configurable TDP down to 4.0 watts, should the vendor opt for that.

bwdy1.jpg

Intel is also giving us a more detailed look at the Broadwell-Y PCH, which includes a lot of I/O for such a small platform. Two channels of USB 3.0 can support four total ports, and as many as four SATA 6G storage devices can be integrated as well. These Y-SKUs look like they have 12 lanes of PCIe 2.0 available to them, should a notebook vendor decide to use PCIe storage solutions (like M.2) rather than relying purely on SATA.

bwdy2.jpg

At least one partner has already announced a Core M product: the Lenovo ThinkPad Helix. It appears to be an amazing 11.6-in convertible tablet design. Without a doubt, we'll encounter numerous other designs at the Intel Developer Forum, which starts next Tuesday.

Source: Intel

You've probably noticed Intel launched a new family of chips

Subject: Processors | September 4, 2014 - 12:31 PM |
Tagged: Intel, Haswell-E, haswell, ddr4, core i7, 5960X

[H]ard|OCP reviewed Intel's brand new Extreme processor, the Haswell-E i7-5960X, as well as posting a large amount of Intel's launch slides detailing the new features present in this series of CPUs.  As you can see from the picture, they used the same funky white ASUS motherboard that Ryan used in his review, but chose a Koolance EX2-755 watercooler as opposed to the Corsair H100i, which allowed them to hit 4.5GHz at 1.301V CPU core voltage, slightly lower than Ryan managed.  In the end, while extremely impressed by the CPU, they saw little benefit for gaming and recommend this CPU to those who spend most of their time encoding video or manipulating huge images and, of course, to those who just want the best CPU on the planet.

14092645759hVccBAGr0_2_2.jpg

"There are many members of the "1366 X58 Enthusiast Overclockers Club" that have been waiting with bated breath for Intel's launch of the new X99 Express Chipset and new family of Core i7 Haswell-E processors. All this new hardware comes bundled with brand new DDR4 RAM technology packing huge bandwidth as well."


Source: [H]ard|OCP

Interview with Intel's Matt Dunford about Haswell-E and X99

Subject: Processors, Chipsets | August 29, 2014 - 04:25 PM |
Tagged: video, Intel, X99, Haswell-E, core i7-5960x, 5960X, ddr4

Though my review of the Intel Core i7-5960X Haswell-E processor was posted earlier today, we hosted a live stream later in the afternoon where Allyn and I talked about the launch. We were also able to welcome Matt Dunford, Principal Evangelist at Intel, to talk about his role in the Haswell-E release, the future of the platform, how DDR4 memory fits into it all, and much more.

The video is embedded in the processor review now as well, but I have included it separately below for those of you who want to jump straight in.

My thanks go out to Matt from Intel for joining us on the live stream, and to all the viewers who came by to submit questions and participate!

Haswell-E shows its stuff

Subject: Processors | August 29, 2014 - 11:08 AM |
Tagged: Intel, Haswell-E, haswell, evga, ddr4, corsair, core i7, asus, 5960X

The Tech Report took the new i7-5960X, an Asus X99 Deluxe, 16 GB of Corsair Vengeance LPX DDR4, a Kingston HyperX SH103S3 240GB SSD, and an XFX Radeon HD 7950 DD and set it all loose on the test bench.  The results were impressive, to say the least, especially when they moved on from games to productivity software, where the Haswell architecture really shines.  When they attempted to overclock the CPU, they found a hard limit at 1.3V and 4.4GHz; any faster would cause some applications to BSoD.  On the other hand, that applied to all 8 cores, and the difference in performance was striking.

Also make sure to read Ryan's review to get even more information on this long-awaited chip.

ports-socket.jpg

"Haswell-E has arrived. With eight cores, 20MB of cache, and quad channels of DDR4 memory, it looks to be the fastest desktop CPU in history--and not by a little bit. We've tested the heck out of it and have a huge suite of comparisons going to back to the Pentium III 800. Just, you know, for context."


Haswell-E has sprung a leak

Subject: Processors | August 26, 2014 - 10:32 AM |
Tagged: rumour, leak, Intel, Haswell-E, 5960X, 5930K, 5820K

Take it with a grain of salt, as always with leaks of this kind, but you will be interested to know that videocardz.com has what might be some inside information on Haswell-E pricing and model numbers.

Intel-HaswellE-E-VideoCardz_Com-Press-Deck-4-850x478.png

Intel i7 / X99 Haswell-E pricing:

  • Intel Core i7 5960X 8C/16HT – 40-lane PCI-Express support (x16 + x16 + x8) — $999
  • Intel Core i7 5930K 6C/12HT – 40-lane PCI-Express support (x16 + x16 + x8) — $583
  • Intel Core i7 5820K 6C/12HT – 28-lane PCI-Express support (x16 + x8 + x4) — $389

As you can see, there is a big jump in price between the affordable i7-5820K and the more expensive 5930K.  For those who know they will stick with a single GPU, or two low to mid-range GPUs, the 5820K should be enough; but if you have any thoughts of upgrading, or of adding in a number of PCIe SSDs, then you might want to seriously consider saving up for the 5930K.  Current-generation GPUs and SSDs do not fully utilize PCIe 3.0 x16, but that is not likely to remain true for long, so if you wish for your system to have some longevity, this is certainly something to think long and hard about.  Core counts are up while frequencies are down: the 8-core 5960X has a base clock of 3GHz, a full gigahertz slower than the 4790K, but you can expect the monstrous 20MB cache and quad-channel DDR4-2133 to mitigate that somewhat.  Also note that the 140W TDP is no laughing matter and will require some serious cooling.

Follow the link for a long deck of slides that reveal even more!

Intel-HaswellE-E-VideoCardz_Com-Press-Deck-5-850x478.png

Intel Haswell-E De-Lidded: Solder Is Its Thermal Interface

Subject: General Tech, Processors | August 24, 2014 - 12:33 AM |
Tagged: Intel, Haswell-E, Ivy Bridge-E, haswell, solder, thermal paste

Sorry for being about a month late to this news. Apparently, someone got their hands on an Intel Core i7-5960X and wanted to see its eight cores. Removing the lid, they found that it was soldered directly onto the die, rather than attached with thermal paste. While Haswell-E will still need to contend with the limitations of 22nm, and how difficult it becomes to exceed various clockspeed ceilings, a better ability to dump heat is always welcome.

Intel-5960X-delidded.jpg

Image Credit: OCDrift

While Devil's Canyon (Core i7-4790K) used better thermal paste, the method used with Haswell-E should be even better. I should note that Ivy Bridge-E, released last year, also contained a form of solder under its lid, and its overclocking results were still limited. This is not an easy path to ultimate gigahertz. Even so, it is nice that Intel, at least on their enthusiast line, is spending that little bit extra to not introduce artificial barriers.

Source: OCDrift

X99 Manuals Leak: Core i7-5820K Has Reduced PCIe Lanes?

Subject: General Tech, Processors | August 22, 2014 - 10:38 PM |
Tagged: X99, Intel, Haswell-E

Haswell-E, with its X99 chipset, is expected to launch soon. This will bring a new spread of processors and motherboards to the high-end, enthusiast market. These are the processors that fans of Intel should buy if they have money, want all the RAM, and have a bunch of PCIe expansion cards to install.

Intel-logo.png

The Intel enthusiast platform typically has 40 PCIe lanes, while the mainstream platform has 16. For Haswell-E, the Core i7-5820K will be the exception. According to Gigabyte's X99 manual, the four full-sized PCIe slots will have the following possible configurations:
 

                 First Slot   Second Slot   Third Slot   Fourth Slot
                 (PCIe 1)     (PCIe 4)      (PCIe 2)     (PCIe 3)
Core i7-5930K    16x          Unused        16x          8x
(and above)      8x           8x            16x          8x
Core i7-5820K    16x          Unused        8x           4x
                 8x           8x            8x           4x

If you count the PCIe x1 slots, the table would refer to the first, third, fifth, and seventh slots.

To me, this is not too bad. You are able to use three GPUs with eight-lane bandwidth each and stick a four-lane PCIe SSD in the last slot. Considering that each lane is PCIe 3.0, it is similar to having three PCIe 2.0 x16 slots. While two-way and three-way SLI are supported on all of these CPUs, four-way SLI is only allowed with processors that provide forty lanes of PCIe 3.0.
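The back-of-the-envelope math behind that comparison, using the per-lane rates from the PCIe specs (2.0 runs at 5 GT/s with 8b/10b encoding; 3.0 runs at 8 GT/s with the leaner 128b/130b encoding):

```cpp
#include <cstdio>

int main() {
    // Usable GB/s per lane = transfer rate (GT/s) x encoding efficiency / 8 bits
    double gen2_lane = 5.0 * (8.0 / 10.0) / 8.0;     // = 0.500 GB/s
    double gen3_lane = 8.0 * (128.0 / 130.0) / 8.0;  // ~ 0.985 GB/s

    std::printf("PCIe 2.0 x16: %.1f GB/s\n", gen2_lane * 16);  // ~8.0 GB/s
    std::printf("PCIe 3.0 x8:  %.1f GB/s\n", gen3_lane * 8);   // ~7.9 GB/s
    return 0;
}
```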

Gigabyte also provides three PCIe 2.0 x1 slots, which are not handled by the CPU and do not count against its available lanes.

Since I started to write up this news post, Gigabyte seems to have replaced the manual with a single, blank page. Thankfully, I had it cached long enough to finish my thoughts. Some sites claim that the manual failed to mention the 8-8-8 configuration and suggest that three-GPU configurations are impossible. That is not true; the manual covers these situations, just not in the clearest of terms.

Haswell-E should launch soon, with most rumors pointing to the end of the month.

VIA's Rumored New "Isaiah II" Based x86 CPU Will Compete With Intel Bay Trail and AMD Kabini Chips

Subject: Processors | August 19, 2014 - 06:06 PM |
Tagged: VIA, isaiah II, centaur technologies, centaur

VIA subsidiary Centaur Technology is rumored to be launching a new x86 processor at the end of August based on the "Isaiah II" architecture. This upcoming chip is a 64-bit SoC aimed at the mobile and low-power space. So far, the only known implementation is a quad-core version clocked at up to 2.0 GHz with a 2MB L2 cache. Benchmarks of the quad-core Isaiah II-based processor recently appeared online, and if the SiSoft Sandra results hold true, VIA has a very competitive chip on its hands, one that outperforms Intel's Bay Trail Z3770 and holds its own against AMD's Jaguar-based Athlon 5350.

Centaur Technology.jpg

The SiSoft Sandra results below show the alleged Isaiah II quad core handily outmaneuvering Intel's Bay Trail SoC and trading wins with AMD's Athlon 5350. All three SoCs are quad-core parts with integrated graphics. The benchmarks were run on slightly different configurations, as the chips do not share a motherboard or chipset in common. In the case of the VIA chip, it was paired with a motherboard using the VIA VX11H chipset.

Processor                     VIA Isaiah II Quad Core  AMD Athlon 5350  Intel Atom Z3770
CPU Arithmetic                20.00 GOPS               22.66 GOPS       15.10 GOPS
CPU Multimedia                50.20 Mpix/s             47.56 Mpix/s     25.90 Mpix/s
Multicore Efficiency          3.10 GB/s                4.00 GB/s        1.70 GB/s
Cryptography (HS)             1.50 GB/s                1.48 GB/s        0.40 GB/s
PM Efficiency (ALU)           2.90 GIPS                2.88 GIPS        2.50 GIPS
Financial Analysis (DP FP64)  3.00 kOPT/S              3.64 kOPT/S      1.50 kOPT/S

For comparison, the Atom Z3770 is a quad core clocked at 1.46 GHz (2.39 GHz max turbo) with 2MB of L2 cache and Intel HD Graphics clocked at up to 667 MHz, supporting up to 4GB of 1066 MHz memory. Bay Trail is manufactured on a 22nm process and has a 2W SDP (Scenario Design Power). Further, the AMD "Kabini" Athlon 5350 features four Jaguar CPU cores clocked at 2.05 GHz, a 128-core GCN GPU clocked at 600 MHz, 2MB of L2 cache, and support for 1600 MHz memory. AMD's Kabini SoC is a 28nm chip with a 25W TDP (Thermal Design Power). VIA's new chip allegedly supports modern instruction sets, including AVX 2.0, putting it on par with the AMD and Intel options.

Processor       VIA Isaiah II Quad Core  AMD Athlon 5350          Intel Atom Z3770
CPU             4 Cores @ 2.00 GHz       4 Cores @ 2.05 GHz       4 Cores @ 1.46 GHz (up to 2.39 GHz turbo)
GPU             ?                        128 GCN Cores @ 600 MHz  HD Graphics @ (up to) 667 MHz
Memory Support  ?                        1600 MHz                 1066 MHz
L2 Cache        2 MB                     2 MB                     2 MB
TDP / SDP       ?                        25W                      2W
Process Node    ?                        28nm                     22nm
Price           ?                        $55                      $37

The SiSoft Sandra benchmarks, spotted by TechPowerUp, suggest that the Centaur Technology-designed chip has potential. However, there are still several important unknowns at this point, mainly price and power usage. Also, the GPU VIA is using in the processor is still a mystery, though Scott suspects an S3 GPU is possible through a partnership with HTC.

The chip does seem to offer competitive performance, but pricing and power efficiency will play a major role in whether or not VIA gets any design wins with system OEMs. If I had to guess, the VIA chip will sit somewhere between the Intel and AMD offerings, with the inclusion of a motherboard chipset pushing it towards AMD's higher TDP.

If VIA prices it correctly, we could see the company making a slight comeback in the x86 market with consumer-facing devices (particularly Windows 8.1 tablets). VIA has traditionally been known as the low-power x86 licensee, and the newly expanded mobile market is the ideal place for such a chip. Its past endeavors have not been well received (mainly due to timing and volume production/availability issues with the Nano processors), but I hope that Centaur Technology and VIA are able to pull this one off, as I had started to forget the company existed (heh).

Source: TechPowerUp

Intel and Microsoft Show DirectX 12 Demo and Benchmark

Subject: General Tech, Graphics Cards, Processors, Mobile, Shows and Expos | August 13, 2014 - 06:55 PM |
Tagged: siggraph 2014, Siggraph, microsoft, Intel, DirectX 12, directx 11, DirectX

Along with GDC Europe and Gamescom, Siggraph 2014 is going on in Vancouver, BC. There, Intel showed a DirectX 12 demo at their booth. The scene, containing 50,000 asteroids, each in its own draw call, was developed on both Direct3D 11 and Direct3D 12 code paths, which could apparently be switched while the demo was running. Intel claims to have measured both power and frame rate.
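To see why 50,000 individual draw calls is a CPU-side problem, consider a hypothetical sketch (this is not Intel's demo code; the draw_* functions are stand-ins for a real graphics API such as Direct3D):

```cpp
#include <cstdio>
#include <vector>

struct Asteroid { float x, y, z; };

// Stand-ins for API submissions. Assume each individual call costs CPU time
// in the driver: validation, state setup, and command encoding.
void draw_mesh(const Asteroid&) { /* one draw call */ }
void draw_mesh_instanced(const std::vector<Asteroid>&) { /* one call, N instances */ }

int main() {
    std::vector<Asteroid> field(50000);

    // Naive path: one draw call per asteroid, 50,000 trips through the
    // driver every frame. Under D3D11, this is where the bottleneck lives.
    for (const Asteroid& a : field)
        draw_mesh(a);

    // Batched path: content is restructured so one call covers everything.
    // D3D12's lower per-call overhead aims to make this restructuring
    // less necessary in the first place.
    draw_mesh_instanced(field);

    std::printf("naive: %zu calls vs. instanced: 1 call\n", field.size());
    return 0;
}
```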

intel-dx12-LockedFPS.png

Variable power to hit a desired frame rate, DX11 and DX12.

The test system is a Surface Pro 3 with an Intel HD 4400 GPU. Doing a bit of digging, this would make it the i5-based Surface Pro 3. Removing another shovel-load of mystery, this would be the Intel Core i5-4300U with two cores, four threads, a 1.9 GHz base clock, up to a 2.9 GHz turbo clock, 3MB of cache, and (of course) the Haswell architecture.

While not top-of-the-line, it is also not bottom-of-the-barrel. It is a respectable CPU.

Intel's demo on this processor shows a significant power reduction in the CPU, and even a slight decrease in GPU power, for the same target frame rate. With power unthrottled, Intel's demo goes from 19 FPS under DirectX 11 all the way up to a playable 33 FPS under DirectX 12.

Intel will discuss more during a video interview, tomorrow (Thursday) at 5pm EDT.

intel-dx12-unlockedFPS-1.jpg

Maximum power in DirectX 11 mode.

For my contribution to the story, I would like to address the first comment on the MSDN article. It claims that this is just an "ideal scenario" of a scene that is bottlenecked by draw calls. The thing is: that is the point. Sure, a game developer could optimize the scene to (maybe) instance objects together, and so forth, but that is unnecessary work. Why should programmers, or worse, artists, need to spend so much of their time developing art so that it can be batched together into fewer, bigger commands? Would it not be much easier, and all-around better, if the content could be developed as it most naturally comes together?

That, of course, depends on how much performance improvement we will see from DirectX 12 compared to theoretical maximum efficiency. If pushing two workloads through a DX12 GPU takes about the same time as pushing one double-sized workload, then it allows developers to, literally, perform whatever solution is most direct.

intel-dx12-unlockedFPS-2.jpg

Maximum power when switching to DirectX 12 mode.

If, on the other hand, pushing two workloads is 1000x slower than pushing a single, double-sized one, but DirectX 11 was 10,000x slower, then it matters less, because developers will still need to do their tricks in those situations. The closer the two get, the fewer occasions where strict optimization is necessary.

If there are any DirectX 11 game developers, artists, and producers out there, we would like to hear from you. How much would a (let's say) 90% reduction in draw call latency (which is around what Mantle claims) give you, in terms of fewer required optimizations? Can you afford to solve problems "the naive way" now? Some of the time? Most of the time? Would it still be worth it to do things like object instancing and fewer, larger materials and shaders? How often?

NVIDIA Reveals 64-bit Denver CPU Core Details, Headed to New Tegra K1 Powered Devices Later This Year

Subject: Processors | August 11, 2014 - 10:06 PM |
Tagged: tegra k1, project denver, nvidia, Denver, ARMv8, arm, Android, 64-bit

During GTC 2014, NVIDIA launched the Tegra K1, a new mobile SoC that contains a powerful Kepler-based GPU. Initial processors (and the resultant design wins, such as the Acer Chromebook 13 and Xiaomi Mi Pad) utilized four ARM Cortex-A15 cores for the CPU side of things, but later this year NVIDIA is deploying a variant of the Tegra K1 SoC that swaps the four A15 cores for two custom (NVIDIA-developed) Denver CPU cores.

Today at the Hot Chips conference, NVIDIA revealed most of the juicy details on those new custom cores announced in January which will be used in devices later this year.

The custom 64-bit Denver CPU cores use a 7-way superscalar design and run a custom instruction set. Denver is a wide but in-order architecture that allows up to seven operations per clock cycle. NVIDIA is using a custom ISA and on-the-fly binary translation to convert ARMv8 instructions to microcode before execution. A software layer and a 128MB cache enhance the Dynamic Code Optimization technology by allowing the processor to examine and optimize the ARM code, convert it to the custom instruction set, and cache the converted microcode of frequently used applications (the cache can be bypassed for infrequently processed code). Using the wider execution engine and Dynamic Code Optimization (which is transparent to ARM developers and does not require updated applications), NVIDIA touts the dual Denver core Tegra K1 as being at least as powerful as the quad and octo-core-packing competition.
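As a mental model of how that translation cache behaves, here is a toy sketch of my own (the hot-code threshold and the data structures are invented for illustration, not NVIDIA's design):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

using Microcode = std::string;  // stand-in for a translated, optimized code block

std::unordered_map<uint64_t, Microcode> translation_cache;  // models the 128MB cache
std::unordered_map<uint64_t, int>       exec_count;
constexpr int kHotThreshold = 10;  // invented number, purely for illustration

Microcode translate(uint64_t pc) {
    // Decode the ARMv8 block at pc, reorder and fuse operations to fill the
    // 7-wide in-order pipeline, and emit native microcode (all elided here).
    return "microcode@" + std::to_string(pc);
}

void run_block(uint64_t pc) {
    auto it = translation_cache.find(pc);
    if (it != translation_cache.end()) {
        // Hot path: reuse the cached translation, paying nothing extra.
    } else if (++exec_count[pc] < kHotThreshold) {
        // Cold code: execute directly, bypassing the cache entirely.
    } else {
        // Code proved itself hot: translate once, reuse on every later run.
        translation_cache.emplace(pc, translate(pc));
    }
}

int main() {
    for (int i = 0; i < 100; ++i)
        run_block(0x400000);  // the same block, executed repeatedly
    std::printf("cached blocks: %zu\n", translation_cache.size());  // prints 1
    return 0;
}
```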

Further, NVIDIA has claimed that at peak throughput (in specific situations where application code and the DCO can take full advantage of the 7-way execution engine), the Denver-based mobile SoC handily outpaces Intel’s Bay Trail, Apple’s A7 Cyclone, and Qualcomm’s Krait 400 CPU cores. In the results of a synthetic benchmark test provided to The Tech Report, the Denver cores were even challenging Intel’s Haswell-based Celeron 2955U processor. Keeping in mind that these are NVIDIA-provided numbers and likely the best results one can expect, Denver still appears quite a bit more capable than existing cores. (Note that the Haswell chips would likely pull much farther ahead when presented with applications that cannot be easily executed in order with limited instruction parallelism.)

NVIDIA Denver CPU Core 64bit ARMv8 Tegra K1.png

NVIDIA is ratcheting up mobile CPU performance with its Denver cores, but it is also aiming for an efficient chip and has implemented several power saving tweaks. Beyond the decision to go with an in-order execution engine (with DCO hopefully mostly making up for that), the beefy Denver cores reportedly feature low latency power state transitions (e.g. between active and idle states), power gating, dynamic voltage, and dynamic clock scaling. The company claims that “Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.” In real terms this should mean that the two Denver cores in place of the quad core A15 design in the Tegra K1 should not result in significantly lower battery life. The two K1 variants are said to be pin compatible such that OEMs and developers can easily bring upgraded models to market with the faster Denver cores.

NVIDIA Denver CPU cores in Tegra K1.png

For those curious, in the Tegra K1, the two Denver cores (clocked at up to 2.5GHz) share a 16-way L2 cache, and each has 128KB instruction and 64KB data L1 caches of its own. The 128MB Dynamic Code Optimization cache is held in system memory.

Denver is the first (custom) 64-bit ARM processor for Android (with Apple’s A7 being the first 64-bit smartphone chip), and NVIDIA is working on supporting the next generation Android OS known as Android L.

The dual Denver core Tegra K1 is coming later this year, and I am excited to see how it performs. The current K1 chip already has a powerful, fully CUDA-compliant, Kepler-based GPU, which has enabled awesome projects such as computer vision and even prototype self-driving cars. With the new Kepler GPU and Denver CPU pairing, I’m looking forward to seeing how NVIDIA’s latest chip is put to work and the kinds of devices it enables.

Are you excited for the new Tegra K1 SoC with NVIDIA’s first fully custom cores?

Source: NVIDIA

Kaveri on Linux

Subject: Processors | August 11, 2014 - 12:40 PM |
Tagged: A10-7800, A6-7400K, linux, amd, ubuntu 14.04, Kaveri

Linux support for AMD's GPUs has not been progressing at the pace many users would like, though it is improving over time; their APUs are another story.  Phoronix just tested the A10-7800 and A6-7400K on Ubuntu 14.04 with kernel 3.13 and the latest Catalyst 14.6 Beta.  This preview just covers raw performance; you can expect more to be published in the near future covering new features, such as the configurable TDP that exists on these chips.  The tests show that the new 7800 can keep pace with the previous 7850K, and while the A6-7400K is certainly slower, it will be able to handle a Linux machine with relatively light duties.  You can see the numbers here.

image.php_.jpg

"At the end of July AMD launched new Kaveri APU models: the A10-7800, A8-7600, and A6-7400K. AMD graciously sent over review samples on their A10-7800 and A6-7400K Kaveri APUs, which we've been benchmarking and have some of the initial Linux performance results to share today."


Source: Phoronix

How can you make your Pentium G3258 system cheaper? Run Ubuntu!

Subject: Processors | July 22, 2014 - 01:15 PM |
Tagged: linux, Pentium G3258, ubuntu 14.10

Phoronix tested the 20th Anniversary Pentium CPU on Ubuntu 14.10 and, right off the bat, were impressed: they managed a perfectly stable overclock of 4.4GHz on air.  Using Linux 3.16 and Mesa 10.2, they had no issues with the onboard GPU, though its performance lagged behind the faster GPUs present on the Haswell chips they tested against.  When they benchmarked the CPU, the lack of Advanced Vector Extensions and the fact that it is a dual-core part showed in the results, but when you consider the difference in price between a G3258 and a 4770K, it fares quite well.  Stay tuned for their next set of benchmarks, which will compare the G3258 to AMD's current offerings.

image.php_.jpg

"Up for review today on Phoronix is the Pentium G3258, the new processor Intel put out in celebration of their Pentium brand turning 20 years old. This new Pentium G3258 processor costs under $100 USD and comes unlocked for offering quite a bit overclocking potential while this Pentium CPU can be used by current Intel 8 and 9 Series Chipsets. Here's our first benchmarks of the Intel Pentium G3258 using Ubuntu Linux."


Source: Phoronix

Intel AVX-512 Expanded

Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 12:05 AM |
Tagged: Xeon Phi, xeon, Intel, avx-512, avx

It is difficult to know what is actually new information in this Intel blog post, but it is interesting nonetheless. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and co-processors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, as well as the Conflict Detection, Exponential and Reciprocal, and Prefetch extensions, which are optional. That earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions would make their way to standard, CPU-like Xeons at around the same time.

Intel_Xeon_Phi_Family.jpg

This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.

 
Instruction Set Extension                Xeon         Xeon Phi     Xeon Phi
                                         Processors   Processors   Coprocessors (AIBs)
Foundation Instructions                  Yes          Yes          Yes
Conflict Detection Instructions          Yes          Yes          Yes
Exponential and Reciprocal Instructions  No           Yes          Yes
Prefetch Instructions                    No           Yes          Yes
Byte and Word Instructions               Yes          No           No
Doubleword and Quadword Instructions     Yes          No           No
Vector Length Extensions                 Yes          No           No

Source: Intel AVX-512 Blog Post (and my understanding thereof).

So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and it works just fine alongside multiple cores, too). For an example: imagine that you have to multiply two colors together. The direct way to do it is to multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! and, later, Intel's SSE included instructions to multiply two four-component vectors together. This reduces four similar instructions to a single instruction operating on wider registers.

Smart compilers (and programmers, although that is becoming less common, as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if it undergoes similar instructions. AVX-512 allows sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four single-precision RGBA data values, but you are looping through 2 million pixels, do four pixels at a time (16 components).
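Here is that color-multiply example written with real intrinsics, as a minimal sketch: _mm_mul_ps (SSE) handles one RGBA pixel per instruction, while _mm512_mul_ps (AVX-512F) handles four packed pixels at once. The AVX-512 path is guarded so the snippet still compiles on hardware without the extension:

```cpp
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(64) float a[16], b[16], out[16];
    for (int i = 0; i < 16; ++i) { a[i] = 0.5f; b[i] = 0.25f; }

    // SSE: multiply one pixel's R, G, B, A against another's in one instruction.
    __m128 pixel = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, pixel);

#ifdef __AVX512F__
    // AVX-512: the same operation across four packed RGBA pixels at once.
    __m512 pixels = _mm512_mul_ps(_mm512_loadu_ps(a), _mm512_loadu_ps(b));
    _mm512_storeu_ps(out, pixels);
#endif

    std::printf("first pixel: %.4f %.4f %.4f %.4f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```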

For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.

This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.

Source: Intel