The impact of your CPU on gaming, Intel's 6700K versus 6950X

Subject: Processors | June 27, 2016 - 02:40 PM |
Tagged: dx12, 6700k, Intel, i7-6950X

[H]ard|OCP has been conducting tests using a variety of CPUs to see how well DX12 distributes load between cores as compared to DX11. Their final article, which covers the 6700K and 6950X, was done a little differently and so cannot be directly compared to the previously tested CPUs. That does not lower the value of the testing; scaling is still very obvious, and the new tests were designed to highlight more common usage scenarios for gamers. Read on to see how well, or how poorly, Ashes of the Singularity scales when using DX12.

1466612693P2ZJAEIlTj_5_1.png

"This is our fourth and last installment of looking at the new DX12 API and how it works with a game such as Ashes of the Singularity. We have looked at how DX12 is better at distributing workloads across multiple CPU cores than DX11 in AotS when not GPU bound. This time we compare the latest Intel processors in GPU bound workloads."


Source: [H]ard|OCP

Rumor: Intel Adds New Codecs with Kaby Lake-S iGPU

Subject: Processors | June 24, 2016 - 11:15 PM |
Tagged: Intel, kaby lake, iGPU, h.265, hevc, vp8, vp9, codec, codecs

Fudzilla isn't really talking about their sources, so it's difficult to gauge how confident we should be, but they claim to have information about the video codecs supported by Kaby Lake's iGPU. This update is supposed to include hardware support for HDR video, the Rec.2020 color gamut, and HDCP 2.2, because, if videos are pirated prior to their release date, the solution is clearly to punish your paying customers with restrictive, compatibility-breaking technology. Time-traveling pirates are the worst.

Intel-logo.png

According to their report, Kaby Lake-S will support VP8, VP9, HEVC 8b, and HEVC 10b, both encode and decode. However, they then go on to say that 10-bit VP9 and HEVC 10b do not include hardware encoding. I'm not too knowledgeable about video codecs, but I don't know of any benefits to encoding 8-bit HEVC Main 10. Perhaps someone in our comments can clarify.

Source: Fudzilla

UCDavis Manufactures a 1000-Core CPU

Subject: Processors | June 21, 2016 - 10:00 PM |
Tagged: ucdavis

Update (June 22nd @ 12:36 AM): Errrr. Right. Accidentally referred to the CPU in terms of TFLOPs. That's incorrect -- it's not a floating-point decimal processor. Should be trillions of operations per second (teraops). Whoops! Also, it has a die area of 64sq.mm, compared to 520sq.mm of something like GF110.

So this is an interesting news post. Graduate students at UCDavis have designed and produced a thousand-core CPU at IBM's facilities. The processor is manufactured on their 32nm process, which is quite old -- about half-way between NVIDIA's Fermi and Kepler if viewed from a GPU perspective. Its die area is not listed, but we've reached out to their press contact for more information. The chip can be clocked up to 1.78 GHz, yielding 1.78 teraops of theoretical performance.

These numbers tell us quite a bit.

ucdavis-2016-thousandcorecpu.jpg

The first thing that stands out to me is that the processor is clocked at 1.78 GHz, has 1000 cores, and is rated at 1.78 teraops. This is interesting because modern GPUs (note that this is not a GPU -- more on that later) are rated at twice the clock rate times the number of cores. The factor of two comes in with fused multiply-add (FMA), a*b + c, which can be easily implemented as a single instruction and is widely used in real-world calculations. Two mathematical operations in a single instruction yield a theoretical max of 2 times clock times core count. Since this processor does not count the factor of two, it seems like its instruction set is massively reduced compared to commercial processors. If they even cut out FMA, what else did they remove from the instruction set? This would at least partially explain why the CPU has such a high theoretical throughput per transistor compared to, say, NVIDIA's GF110, which has a slightly lower TFLOP rating with about five times the transistor count -- and that's ignoring all of the complexity-saving tricks that GPUs play, that this chip does not. Update (June 22nd @ 12:36 AM): Again, none of this makes sense, because it's not a floating-point processor.
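Setting the floating-point caveat aside, the arithmetic behind the rating is easy to sketch. The snippet below is only an illustration: the function name and the `ops_per_cycle` parameter are my own, and it assumes every core retires a fixed number of operations on every clock, which is the usual definition of a theoretical peak.

```python
def peak_ops_per_second(clock_hz, cores, ops_per_cycle=1):
    """Theoretical peak, assuming every core retires ops_per_cycle ops each clock."""
    return clock_hz * cores * ops_per_cycle

# Figures quoted in the post: 1000 cores at 1.78 GHz, one operation per cycle.
ucdavis_peak = peak_ops_per_second(1.78e9, 1000)

# Same clock and core count, but counting an FMA as two operations,
# the way GPU vendors rate their parts.
with_fma_counting = peak_ops_per_second(1.78e9, 1000, ops_per_cycle=2)

print(f"{ucdavis_peak / 1e12:.2f} teraops as rated")
print(f"{with_fma_counting / 1e12:.2f} teraops if FMA counted double")
```

The rated 1.78 teraops falls out only with one operation per cycle, which is why the missing factor of two stood out.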

"Big Fermi" uses 3 billion transistors to achieve 1.5 TFLOPs when operating on 32 pieces of data simultaneously (see below). This processor does 1.78 teraops with 0.621 billion transistors.
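As a rough sanity check on that comparison, here is the throughput-per-transistor arithmetic using only the figures above. The helper name is hypothetical, and keep in mind that it divides floating-point operations on one chip against non-floating-point operations on the other, so treat the ratio as a curiosity rather than a benchmark.

```python
def ops_per_transistor(ops_per_second, transistors):
    """Theoretical operations per second delivered per transistor."""
    return ops_per_second / transistors

# "Big Fermi" (GF110): 1.5 TFLOPs from 3 billion transistors.
gf110 = ops_per_transistor(1.5e12, 3.0e9)

# UC Davis chip: 1.78 teraops from 0.621 billion transistors.
ucdavis = ops_per_transistor(1.78e12, 0.621e9)

ratio = ucdavis / gf110
print(f"GF110:   {gf110:.0f} ops/s per transistor")
print(f"UCDavis: {ucdavis:.0f} ops/s per transistor")
print(f"Ratio:   {ratio:.2f}x")
```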

On the other hand, this chip is different from GPUs in that it doesn't use their complexity-saving tricks. GPUs save die space by tying multiple threads together and forcing them to behave in lockstep. On NVIDIA hardware, 32 threads are bound into a “warp”. On AMD, 64 make up a “wavefront”. On Intel's Xeon Phi, AVX-512 packs sixteen 32-bit values together into a vector and operates on them at once. GPUs use this architecture because, if you have a really big workload, chances are that its tasks are closely related; neighbouring pixels on a screen will be operating on the same material with slightly offset geometry, multiple vertexes of the same object will be deformed by the same process, and so forth.

This processor, on the other hand, has a thousand cores that are independent. Again, this is wasteful for tasks that map easily to single-instruction-multiple-data (SIMD) architectures, but the reverse is also true: it is not wasteful for highly parallel tasks that SIMD handles poorly. SIMD makes an assumption about your data and tries to optimize how it maps to the real-world -- it's either a valid assumption, or it's not. If it isn't? A chip like this would have multi-fold performance benefits, FLOP for FLOP.
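To make the lockstep trade-off concrete, here is a toy cost model. It is a sketch under loud assumptions: the branch names and cycle costs are invented, a lockstep group is modelled as running every distinct branch its lanes take one after another (with the other lanes masked off), and independent cores are modelled as each running only their own branch. Real hardware is considerably more nuanced.

```python
def lockstep_cycles(branches, cost):
    """One lockstep group: each distinct branch taken by any lane runs in turn."""
    return sum(cost[b] for b in set(branches))

def independent_cycles(branches, cost):
    """One core per lane: the slowest lane sets the wall-clock time."""
    return max(cost[b] for b in branches)

# Hypothetical per-branch costs in cycles.
cost = {"a": 10, "b": 10, "c": 10, "d": 10}

uniform = ["a"] * 32                  # every lane takes the same path
divergent = ["a", "b", "c", "d"] * 8  # lanes split four ways

print(lockstep_cycles(uniform, cost), independent_cycles(uniform, cost))
print(lockstep_cycles(divergent, cost), independent_cycles(divergent, cost))
```

When every lane agrees, the two models cost the same and the lockstep design gets there with far less control hardware. Once the lanes diverge four ways, the lockstep group in this model takes four times as long.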

Source: UCDavis

Rumor: AMD Plans 32-Core Opteron with 128 PCIe Lanes

Subject: Processors | June 15, 2016 - 11:18 PM |
Tagged: Zen, opteron, amd

We're beginning to see how the Zen architecture will affect AMD's entire product stack. This news refers to their Opteron line of CPUs, which are intended for servers and certain workstations. They tend to allow lots of memory, have lots of cores, and connect to a lot of I/O options and add-in boards at the same time.

amd-2016-e3-zenlogo.png

In this case, Zen-based Opterons will be available in two, four, sixteen, and thirty-two core options, with two threads per core (yielding four, eight, thirty-two, and sixty-four threads, respectively). TDPs will range between 35W and 180W. Intel's Xeon E7 v4 goes up to 165W for 24 cores (on Broadwell-EX) so AMD has a little more headroom to play with for those extra eight cores. That is obviously a lot, and it should be, again, good for cloud applications that can be parallelized.
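The thread counts and the power comparison come down to simple arithmetic, sketched below with the rumored numbers. The variable names are mine, and dividing TDP by core count is a crude assumption: the two vendors do not define TDP identically, and uncore power is ignored entirely.

```python
THREADS_PER_CORE = 2
core_options = [2, 4, 16, 32]
thread_counts = [cores * THREADS_PER_CORE for cores in core_options]

# Top-end parts: rumored Opteron (180W, 32 cores) vs Xeon E7 v4 (165W, 24 cores).
opteron_watts_per_core = 180 / 32
xeon_watts_per_core = 165 / 24

# Extra power budget available for each of the eight additional cores.
extra_watts_per_extra_core = (180 - 165) / (32 - 24)

print(thread_counts)
print(opteron_watts_per_core, xeon_watts_per_core, extra_watts_per_extra_core)
```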

As for the I/O side of things, the rumored chip will have 128 PCIe 3.0 lanes. It's unclear whether that is per socket, or total. Its wording sounds like it is per-CPU, although much earlier rumors have said that it has 64 PCIe lanes per socket with dual-socket boards available. It will also support sixteen 10-Gigabit Ethernet connections, which, again, is great for servers, especially with virtualization.

These are expected to launch in 2017. Fudzilla claims that “very late 2016” is possible, but also that it will launch after the high-end desktop parts, which are expected to be delayed until 2017.

Source: Fudzilla

AMD "Sneak Peek" at RX Series (RX 480, RX 470, RX 460)

Subject: Graphics Cards, Processors | June 13, 2016 - 03:51 PM |
Tagged: amd, Polaris, Zen, Summit Ridge, rx 480, rx 470, rx 460

AMD has just unveiled their entire RX line of graphics cards at E3 2016's PC Gaming Show. It was a fairly short segment, but it had a few interesting points in it. At the end, they also gave another teaser of Summit Ridge, which uses the Zen architecture.

amd-2016-e3-470460.png

First, Polaris. As we know, the RX 480 was going to bring >5 TFLOPs at a $199 price point. They elaborated that this will apply to the 4GB version, which likely means that another version with more VRAM will be available, and that implies 8GB. Beyond the RX 480, AMD has also announced the RX 470 and RX 460. Little is known about the 470, but they mentioned that the 460 will have a <75W TDP. This is interesting because the PCIe bus provides 75W of power. This implies that it will not require any external power, and thus could be a cheap and powerful (in terms of esports titles) addition to an existing desktop. This is an interesting way to use the power savings of the die shrink to 14nm!

amd-2016-e3-backpackpc.png

They also showed off a backpack VR rig. They didn't really elaborate, but it's here.

amd-2016-e3-summitdoom.png

As for Zen? AMD showed the new architecture running DOOM, and added the circle-with-Zen branding to a 3D model of a CPU. Zen will be coming first to the enthusiast category with (up to?) eight cores, two threads per core (16 threads total).

amd-2016-e3-zenlogo.png

The AMD Radeon RX 480 will launch on June 29th for $199 USD (4GB). None of the other products have a specific release date.

Source: AMD

James Reinders Leaving Intel and What It Means

Subject: Processors | June 8, 2016 - 08:17 AM |
Tagged: Xeon Phi, Intel, gpgpu

Intel's recent restructure had a much broader impact than I originally believed. Beyond the large number of employees who will lose their jobs, we're even seeing it affect other areas of the industry. Typically, ASUS releases their ZenFone line with x86 processors, which I assumed was based on big subsidies from Intel to push their instruction set into new product categories. This year, ASUS chose the ARM-based Qualcomm Snapdragon, which seemed to me like Intel decided to stop the bleeding.

reinders148x148.jpg

That brings us to today's news. After over 27 years at Intel, James Reinders accepted the company's early retirement offer, scheduled for his 10001st day with the company, and will step down from his position as Intel's High Performance Computing Director. He worked on the Larrabee and Xeon Phi initiatives, and published several books on parallelism.

According to his letter, it sounds like his retirement offer was part of a company-wide package, and not targeting his division specifically. That would sort-of make sense, because Intel is focusing on cloud and IoT. Xeon Phi is an area where Intel is battling NVIDIA for high-performance servers, and I would expect that it has potential for cloud-based applications. Then again, as I say that, AWS only has a handful of GPU instances, and they are running fairly old hardware at that, so maybe the demand isn't there yet.

Video Perspective: Intel Giving Away 6950X + SSD 750 Systems at PAX Prime

Subject: Processors | June 7, 2016 - 03:29 PM |
Tagged: Intel, video, PAX, pax prime, i7-6950X, taser

Intel is partnering with 12 of their top system builders to build amazing PCs around the Core i7-6950X 10-core Extreme Edition processor and the SSD 750 Series drives. Intel will be raffling off 7 of these systems at PAX Prime in September. You can find out more details on the competition and how you can enter at http://inte.ly/rigchallenge. 

As for us, we got a taser.

Looking for a new CPU? You will be waiting until January at the earliest

Subject: Processors | June 7, 2016 - 02:45 PM |
Tagged: Zen, kaby lake, Intel, delayed, amd

Bad news, upgraders: neither AMD nor Intel will be launching their new CPUs until the beginning of next year. Both AMD's Zen and Intel's Kaby Lake have now been delayed instead of launching in Q4 and Q3 of this year respectively. DigiTimes did not delve into the reasons behind the delay in AMD's 14nm GLOBALFOUNDRIES (and Samsung) sourced Zen, but unfortunately the reasons behind Intel's delay are all too clear. With large stockpiles of Skylake and Haswell processors and systems based around them sitting in the channel, AMD's delay creates an opportunity for Intel and retailers to move that stock. Once Kaby Lake arrives the systems will no longer be attractive to consumers and the prices will plummet.

Here's hoping AMD's delay does not imply anything serious, though the lack of a new product release at a time which traditionally sees sales increase is certainly going to hurt their bottom line for 2016.

bad-news-everyone.jpg

"With the delays, the PC supply chain will not be able to begin mass production for the next-generation products until November or December and PC demand is also unlikely to pick up until the first quarter of 2017."


Source: DigiTimes

Intel Launches Xeon E7 v4 Processors

Subject: Processors | June 7, 2016 - 09:39 AM |
Tagged: xeon e7 v4, xeon e7, xeon, Intel, broadwell-ex, Broadwell

Yesterday, Intel launched eleven SKUs of Xeon processors that are based on Broadwell-EX. While I don't follow this product segment too closely, it's a bit surprising that Intel launched them so close to consumer-level Broadwell-E. Maybe I shouldn't be surprised, though.

intel-logo-cpu.jpg

These processors scale from four cores up to twenty-four of them, with HyperThreading. They are also available in cache sizes from 20MB up to 60MB. With Intel's Xeon naming scheme, the leading number immediately after the E7 in the product name denotes the number of CPUs that can be installed in a multi-socket system. The E7-8XXX line can be run in an eight-socket motherboard, while the E7-4XXX models are limited to four sockets per system. TDPs range between 115W and 165W, which is pretty high, but to be expected for a giant chip that runs at a fairly high frequency.
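The naming rule is simple enough to express in a few lines. This is a minimal sketch: the function name is hypothetical, the model strings are only illustrative inputs, and the pattern assumes the "E7-" prefix is followed by a four-digit model number as described above.

```python
import re

def max_sockets(model: str) -> int:
    """Return the leading digit after 'E7-', read as the maximum socket count."""
    match = re.search(r"E7-(\d)\d{3}", model)
    if match is None:
        raise ValueError(f"not an E7 model name: {model!r}")
    return int(match.group(1))

# Illustrative model strings following the E7-8XXX / E7-4XXX pattern.
print(max_sockets("E7-8890 v4"))
print(max_sockets("E7-4850 v4"))
```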

Intel Xeon E7 v4 launched on June 6th with listed prices between $1223 and $7174 per CPU.

Source: Intel

HSA 1.1 Released

Subject: Graphics Cards, Processors, Mobile | June 6, 2016 - 07:11 AM |
Tagged: hsa 1.1, hsa

The HSA Foundation released version 1.1 of their specification, which focuses on “multi-vendor” compatibility. In this case, multi-vendor doesn't refer to companies that refused to join the HSA Foundation, namely Intel and NVIDIA, but rather multiple types of vendors. Rather than aligning with AMD's focus on CPU-GPU interactions, HSA 1.1 includes digital signal processors (DSPs), field-programmable gate arrays (FPGAs), and other accelerators. I can see this being useful in several places, especially on mobile, where cameras, sound processors, CPU cores, and a GPU regularly share video buffers.

HSA Foundation_Logo.png

That said, the specification also mentions “more efficient interoperation with non-HSA compliant devices”. I'm not quite sure what that specifically refers to, but it could be important to keep an eye on for future details -- whether it is relevant for Intel and NVIDIA hardware (and so forth).

Charlie, down at SemiAccurate, notes that HSA 1.1 will run on all HSA 1.0-compliant hardware. This makes sense, but I can't see where this is explicitly mentioned in their press release. I'm guessing that Charlie was given some time on a conference call (or face-to-face) regarding this, but it's also possible that he may be mistaken. It's also possible that it is explicitly mentioned in the HSA Foundation's press blast and I just fail at reading comprehension.

If so, I'm sure that our comments will highlight my error.