Intel Larrabee Post-Mortem by Tom Forsyth

Subject: Graphics Cards, Processors | August 17, 2016 - 05:38 PM |
Tagged: Xeon Phi, larrabee, Intel

Tom Forsyth, who is currently at Oculus, was once on the core Larrabee team at Intel. Just prior to Intel's IDF conference in San Francisco, which Ryan is attending and covering as I type this, Tom wrote a blog post that outlined the project and its design goals, including why it never hit the market as a graphics device. He even goes into the details of the graphics architecture, which was almost entirely in software apart from the texture units and video out. For instance, Larrabee ran FreeBSD with a program called DirectXGfx that gave it the DirectX 11 feature set -- and it worked on hundreds of titles, too.

Intel_Xeon_Phi_Family.jpg

Also, if you found the discussion interesting, there is plenty of content from back in the day to browse. A good example is an Intel Developer Zone post from Michael Abrash that discusses software rasterization and tells several really interesting stories along the way.

IDF 2016: Intel Project Alloy Promises Untethered VR and AR Experiences

Subject: General Tech, Processors, Displays, Shows and Expos | August 16, 2016 - 05:50 PM |
Tagged: VR, virtual reality, project alloy, Intel, augmented reality, AR

At the opening keynote to this summer’s Intel Developer Forum, CEO Brian Krzanich announced a new initiative to enable a completely untethered VR platform called Project Alloy. Using Intel processors and sensors, the goal of Project Alloy is to move all of the necessary compute into the headset itself, including enough battery to power the device for a typical session, removing the need for a high-powered PC and delivering a truly cordless experience.

01.jpg

This is indeed the obvious end-game for VR and AR, though Intel isn’t the first to demonstrate a working prototype. AMD showed the Sulon Q, an AMD FX-based system that was a wireless VR headset. It had real specs too, including a 2560x1440 90Hz OLED display, 8GB of DDR3 memory, and an AMD FX-8800P APU with embedded R7 graphics. Intel’s Project Alloy is currently using unknown hardware and won’t have a true prototype release until the second half of 2017.

There is one key advantage that Intel has implemented with Alloy: RealSense cameras. The idea is simple but the implications are powerful. Intel demonstrated using your hands and even other real-world items to interact with the virtual world. RealSense cameras use depth sensing to track hands and fingers very accurately, and with a device integrated into the headset and pointed out and down, Project Alloy prototypes will be able to “see” and track your hands, integrating them into the game and VR world in real time.

02.jpg

The demo that Intel put on during the keynote definitely showed the promise, but the implementation was clunky and less than what I expected from the company. Real hands just showed up in the game, rather than being represented by rendered hands that track accurately, and it definitely created a disconnect in the experience. Obviously it’s up to the application developer to determine how your hands would actually be represented, but it would have been better to showcase that capability in the live demo.

03.jpg

Better than just tracking your hands, Project Alloy was able to track a dollar bill (why not a Benjamin Intel??!?) and use it to interact with a spinning lathe in the VR world. It interacted very accurately and with minimal latency – the potential for this kind of AR integration is expansive.

Those same RealSense cameras and their data are used to map the space around you, preventing you from running into things, people, or cats in the room. This enables the first “multi-room” tracking capability, giving VR/AR users a new range of flexibility and usability.

04.jpg

Though I did not get hands-on time with the Alloy prototype itself, the unit on stage looked pretty heavy and bulky. Comfort will obviously be important for any kind of head-mounted display, and Intel has plenty of time to iterate on the design over the next year to get it right. Both AMD and NVIDIA have been talking up the importance of GPU compute for high quality VR experiences, so Intel has an uphill battle to prove that its solution, without the need for external power or additional processing, can truly provide the untethered experience we all desire.

Intel Will Release 14nm Coffee Lake To Succeed Kaby Lake In 2018

Subject: Processors | July 28, 2016 - 06:47 PM |
Tagged: kaby lake, Intel, gt3e, coffee lake, 14nm

Intel will allegedly be releasing another 14nm processor following Kaby Lake (which is itself a 14nm successor to Skylake) in 2018. The new processors are code named "Coffee Lake" and will be released alongside low power runs of 10nm Cannon Lake chips. 

Intel Coffe Lake to Coexist With Cannon Lake.jpg

Not much is known about Coffee Lake outside of leaked slides and rumors, but the first processors, slated to launch in 2018, will be mainstream mobile chips that will come in U and HQ flavors, which are 15W to 28W and 35W to 45W TDP parts respectively. Of course, these processors will be built on a very mature 14nm process with the usual small performance and efficiency gains beyond Skylake and Kaby Lake. The chips should have a better graphics unit, but perhaps more interesting is that the slides suggest Coffee Lake will be the first architecture where Intel brings "hexacore" (6 core) processors into mainstream consumer chips! The HQ-class Coffee Lake processors will reportedly come in two, four, and six core variants with Intel GT3e class GPUs, while the lower-power U-class chips top out at dual cores with GT3e class graphics. This is interesting because Intel has previously held back its six core CPUs for the more expensive and higher-margin HEDT and Xeon platforms.

Of course, 2018 is also the year for Cannon Lake, which would have been the "tick" in Intel's old tick-tock schedule (which is no more) as the chips move to a smaller process node, with Intel then improving on the 10nm process in future architectures. Cannon Lake is supposed to be built on the tiny 10nm node, and it appears that the first chips on this node will be ultra low power versions for laptops and tablets. Occupying the ULV platform's U-class (15W) and Y-class (4.5W), Cannon Lake CPUs will be dual cores with GT2 graphics. These chips should sip power while giving performance comparable to Kaby and Coffee Lake, perhaps even matching the performance of the Coffee Lake U processors!

Stay tuned to PC Perspective for more information!

Report: ARM Holdings Purchased by SoftBank for $32 Billion

Subject: Processors, Mobile | July 18, 2016 - 04:03 AM |
Tagged: softbank, SoC, smartphones, mobile cpu, Cortex-A73, ARM Holdings, arm, acquisition

ARM Holdings is to be acquired by SoftBank for $32 billion USD. The report has been confirmed by the Wall Street Journal, which states that an official announcement of the deal is likely on Monday, as "both companies’ boards have agreed to the deal".

arm-holdings.jpg

(Image credit: director.co.uk)

"Japan’s SoftBank Group Corp. has reached a more than $32 billion deal to buy U.K.-based chip-designer ARM HoldingsPLC, marking a significant push for the Japanese telecommunications giant into the mobile internet, according to a person familiar with the situation." - WSJ

ARM just announced its newest CPU core, the Cortex-A73, at the end of May, promising performance and efficiency improvements over the current Cortex-A72 with the new architecture.

ARM_01.png

(Image credit: AnandTech)

We will have to wait and see if this acquisition will have any bearing on future product development, though it seems the deal targets the significant intellectual property value of ARM, whose designs can be found in most smartphones.

Qualcomm Announces the Snapdragon 821 SoC

Subject: Processors, Mobile | July 11, 2016 - 03:44 PM |
Tagged: SoC, Snapdragon 821, snapdragon, qualcomm, adreno 530

Announced today, the Snapdragon 821 offers a modest CPU frequency increase over the Snapdragon 820, with clock speeds of up to 2.4 GHz compared to 2.2 GHz with the Snapdragon 820. The new SoC still implements Qualcomm's custom quad-core "Kryo" design, which is made up of two dual-core CPU clusters.

Screenshot_20160711-112828~2.png

Quoting Anandtech, who also reported on the Snapdragon 821 today:

"What isn’t in this announcement is that the power cluster will likely be above 2 GHz and GPU clocks look to be around 650 MHz but without knowing whether there are some changes other than clock relative to Adreno 530 we can’t really estimate the performance of this part."

Specifics on the Adreno GPU were not mentioned in the official announcement. The 650 MHz GPU clock reported by Anandtech would offer a modest improvement over the SD820's 624 MHz Adreno 530 GPU. Additionally, the "power cluster" will reportedly move from 1.6 GHz with the SD820 to 2.0 GHz with the SD821.

No telling when this updated SoC will find its way into consumer devices, with the Snapdragon 820 currently available in the Samsung Galaxy S7/S7 Edge, LG G5, OnePlus 3, and a few others.

Source: Qualcomm

AMD Lists Radeon RX 490 Graphics Card, New APUs in Gaming Promo

Subject: Graphics Cards, Processors | June 29, 2016 - 11:27 AM |
Tagged: RX 490, radeon, processors, Polaris, graphics card, Bristol Ridge, APU, amd, A12-9800

AMD's current "We're in the Game" promotion offers a glimpse at upcoming product names, including the Radeon RX 490 graphics card, and the new Bristol Ridge APUs.

amd_promo.png

Visit AMD's gaming promo page and click the link to "check eligibility" to see the following list of products, which includes the new product names:

amd_product_listing.png

It seems safe to assume that the new products listed - including the Radeon RX 490 - are close to release, though details on the high-end Polaris GPU are not mentioned. We do have details on the upcoming Bristol Ridge products, with this in-depth preview from Josh published back in April. The A12-9800 and A12-9800E are said to be the flagship products in this new 7th-gen lineup, so there will be new desktop parts with improved graphics soon.

The impact of your CPU on gaming, Intel's 6700K versus 6950X

Subject: Processors | June 27, 2016 - 06:40 PM |
Tagged: dx12, 6700k, Intel, i7-6950X

[H]ard|OCP has been conducting tests using a variety of CPUs to see how well DX12 distributes load between cores as compared to DX11.  Their final article, which covers the 6700K and 6950X, was done a little differently and so cannot be directly compared to the previously tested CPUs.  That does not lower the value of the testing; scaling is still very obvious, and the new tests were designed to highlight more common usage scenarios for gamers.  Read on to see how well, or how poorly, Ashes of the Singularity scales when using DX12.

1466612693P2ZJAEIlTj_5_1.png

"This is our fourth and last installment of looking at the new DX12 API and how it works with a game such as Ashes of the Singularity. We have looked at how DX12 is better at distributing workloads across multiple CPU cores than DX11 in AotS when not GPU bound. This time we compare the latest Intel processors in GPU bound workloads."

Here are some more Processor articles from around the web:

Processors

Source: [H]ard|OCP

Rumor: Intel Adds New Codecs with Kaby Lake-S iGPU

Subject: Processors | June 25, 2016 - 03:15 AM |
Tagged: Intel, kaby lake, iGPU, h.265, hevc, vp8, vp9, codec, codecs

Fudzilla isn't really talking about their sources, so it's difficult to gauge how confident we should be, but they claim to have information about the video codecs supported by Kaby Lake's iGPU. This update is supposed to include hardware support for HDR video, the Rec.2020 color gamut, and HDCP 2.2, because, if videos are pirated prior to their release date, the solution is clearly to punish your paying customers with restrictive, compatibility-breaking technology. Time-traveling pirates are the worst.

Intel-logo.png

According to their report, Kaby Lake-S will support VP8, VP9, HEVC 8b, and HEVC 10b, both encode and decode. However, they then go on to say that 10-bit VP9 and 10-bit HEVC do not include hardware encoding. I'm not too knowledgeable about video codecs, but I don't know of any benefits to encoding 8-bit HEVC Main 10. Perhaps someone in our comments can clarify.

Source: Fudzilla

UCDavis Manufactures a 1000-Core CPU

Subject: Processors | June 22, 2016 - 02:00 AM |
Tagged: ucdavis

Update (June 22nd @ 12:36 AM): Errrr. Right. Accidentally referred to the CPU in terms of TFLOPs. That's incorrect -- it's not a floating-point processor. Should be trillions of operations per second (teraops). Whoops! Also, it has a die area of 64 sq. mm, compared to the 520 sq. mm of something like GF110.

So this is an interesting news post. Graduate students at UCDavis have designed and produced a thousand-core CPU at IBM's facilities. The processor is manufactured on IBM's 32nm process, which is quite old -- about half-way between NVIDIA's Fermi and Kepler if viewed from a GPU perspective. Its die area was not listed, though we've reached out to their press contact for more information. The chip can be clocked up to 1.78 GHz, yielding 1.78 teraops of theoretical performance.

These numbers tell us quite a bit.

ucdavis-2016-thousandcorecpu.jpg

The first thing that stands out to me is that the processor is clocked at 1.78 GHz, has 1000 cores, and is rated at 1.78 teraops. This is interesting because modern GPUs (note that this is not a GPU -- more on that later) are rated at twice the clock rate times the number of cores. The factor of two comes in with fused multiply-add (FMA), a*b + c, which can be implemented as a single instruction and is widely used in real-world calculations. Two mathematical operations in a single instruction yield a theoretical max of 2 times clock times core count. Since this processor does not count the factor of two, it seems like its instruction set is massively reduced compared to commercial processors. If they even cut out FMA, what else did they remove from the instruction set? This would at least partially explain why the CPU has such a high theoretical throughput per transistor compared to, say, NVIDIA's GF110, which has a slightly lower TFLOP rating with about five times the transistor count -- and that's ignoring all of the complexity-saving tricks that GPUs play that this chip does not. Update (June 22nd @ 12:36 AM): Again, none of this makes sense, because it's not a floating-point processor.

"Big Fermi" uses 3 billion transistors to achieve 1.5 TFLOPs when operating on 32 pieces of data simultaneously (see below). This processor does 1.78 teraops with 0.621 billion transistors.

On the other hand, this chip is different from GPUs in that it doesn't use their complexity-saving tricks. GPUs save die space by tying multiple threads together and forcing them to behave in lockstep. On NVIDIA hardware, 32 threads are bound into a “warp”. On AMD, 64 make up a “wavefront”. On Intel's Xeon Phi, AVX-512 packs sixteen 32-bit values into a vector and operates on them at once. GPUs use this architecture because, if you have a really big workload, chances are you have very related tasks; neighbouring pixels on a screen will be operating on the same material with slightly offset geometry, multiple vertices of the same object will be deformed by the same process, and so forth.

This processor, on the other hand, has a thousand cores that are independent. Again, this is wasteful for tasks that map easily to single-instruction-multiple-data (SIMD) architectures, but the reverse (not wasteful on highly parallel tasks that SIMD handles poorly) is also true. SIMD makes an assumption about your data and tries to optimize how it maps to the real world -- it's either a valid assumption, or it's not. If it isn't? A chip like this would have multi-fold performance benefits, FLOP for FLOP.
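As a loose illustration of that trade-off (a toy model, not a description of any shipping hardware), consider a kernel where each element takes one of several branch paths. A lockstep SIMD unit has to execute every path that at least one of its lanes needs, masking off the rest, while independent cores each run only their own path:

```python
import random

# Toy model of branch divergence: a 32-lane lockstep SIMD unit versus
# 32 independent cores, each handling one element of the same batch.
# Costs are arbitrary "cycles"; the point is the shape of the comparison.

PATH_COSTS = [10] * 8   # eight possible branch paths, each costing 10 cycles

def simd_batch_cycles(lane_paths):
    # Lockstep: the unit serially executes every path that any lane needs.
    return sum(PATH_COSTS[p] for p in set(lane_paths))

def independent_batch_cycles(lane_paths):
    # Independent cores: the batch finishes when the slowest core finishes
    # its own (single) path.
    return max(PATH_COSTS[p] for p in lane_paths)

coherent  = [0] * 32                                  # every lane takes the same path
divergent = [random.randrange(8) for _ in range(32)]  # lanes scatter across all paths

print(simd_batch_cycles(coherent),  independent_batch_cycles(coherent))   # 10 10
print(simd_batch_cycles(divergent), independent_batch_cycles(divergent))  # ~80 10
```

When the data is coherent, the lockstep unit keeps pace while spending far fewer transistors per lane; when every lane wants a different path, it ends up serializing all of them, which is exactly the kind of workload where a thousand independent cores would pull ahead.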

Source: UCDavis

Rumor: AMD Plans 32-Core Opteron with 128 PCIe Lanes

Subject: Processors | June 16, 2016 - 03:18 AM |
Tagged: Zen, opteron, amd

We're beginning to see how the Zen architecture will affect AMD's entire product stack. This news refers to their Opteron line of CPUs, which are intended for servers and certain workstations. They tend to allow lots of memory, have lots of cores, and connect to a lot of I/O options and add-in boards at the same time.

amd-2016-e3-zenlogo.png

In this case, Zen-based Opterons will be available in two, four, sixteen, and thirty-two core options, with two threads per core (yielding four, eight, thirty-two, and sixty-four threads, respectively). TDPs will range between 35W and 180W. Intel's Xeon E7 v4 goes up to 165W for 24 cores (on Broadwell-EX), so AMD has a little more headroom to play with for those extra eight cores. Thirty-two cores is obviously a lot, and it should be, again, good for cloud applications that can be parallelized.

As for the I/O side of things, the rumored chip will have 128 PCIe 3.0 lanes. It's unclear whether that is per socket, or total. Its wording sounds like it is per-CPU, although much earlier rumors have said that it has 64 PCIe lanes per socket with dual-socket boards available. It will also support sixteen 10-Gigabit Ethernet connections, which, again, is great for servers, especially with virtualization.

These are expected to launch in 2017. Fudzilla claims that “very late 2016” is possible, but also that they will launch after the high-end desktop parts, which are expected to be delayed until 2017.

Source: Fudzilla