James Reinders Leaving Intel and What It Means

Subject: Processors | June 8, 2016 - 08:17 AM |
Tagged: Xeon Phi, Intel, gpgpu

Intel's recent restructuring had a much broader impact than I originally believed. Beyond the large number of employees who will lose their jobs, we're even seeing it affect other areas of the industry. Typically, ASUS releases their ZenFone line with x86 processors, which I assumed was based on big subsidies from Intel to push their instruction set into new product categories. This year, ASUS chose the ARM-based Qualcomm Snapdragon, which looked to me like Intel deciding to stop the bleeding.


That brings us to today's news. After over 27 years at Intel, James Reinders accepted the company's early retirement offer, scheduled for his 10,001st day with the company, and stepped down from his position as Intel's High Performance Computing Director. He worked on the Larrabee and Xeon Phi initiatives, and published several books on parallelism.

According to his letter, it sounds like his retirement offer was part of a company-wide package, not one targeting his division specifically. That would sort of make sense, because Intel is focusing on cloud and IoT. Xeon Phi is an area where Intel is battling NVIDIA for high-performance servers, and I would expect that it has potential for cloud-based applications. Then again, as I say that, AWS only has a handful of GPU instances, and they are running fairly old hardware at that, so maybe the demand isn't there yet.

June 8, 2016 | 11:22 AM - Posted by Mark Granger (not verified)

Intel's current business strategy is to focus on their few remaining profitable markets at the high end while making repeated and failing attempts to penetrate the mobile and IoT markets. This has been failing for years and will continue to fail as rapid advancements in ARM processor performance eat away at the few markets where Intel still leads in performance.

There is no safe strategy for Intel going forward. Instead, here are three risky strategies that have at least a chance of paying off big:

1. Intel should drop the Intel instruction set and adopt ARM instead. The Intel instruction architecture is just a drag on the performance and cost of its processors and offers zero advantages. Intel should build server CPUs with 256 ARM cores rather than 48 Intel cores. This way Intel can profit from the yearly advances in ARM processors instead of being killed by them.

2. Intel should double down in the high performance personal computer market. There is still a great demand for high performance computers for graphics, games and software development. Intel should get rid of the Xeon brand and just offer a single flavor of processors with the ability to put more than one on a motherboard. All processors should have unlocked clocks.

3. Intel should get into massively parallel AI processors. AI is clearly going to be "the next big thing" for processors and has very different architectural needs than traditional CPUs. Google is already designing its own AI processors as are other companies. Thus far Intel has made no attempt to create a custom architecture for AI learning.

If Intel does make a big change in focus, we will probably be talking about it for years to come. If not, Intel will simply become irrelevant in a few years.

June 8, 2016 | 11:41 AM - Posted by Anonymous (not verified)

The "big subsidies from Intel" are contra revenue, and Intel is out of the phone SoC market, so ASUS is better served by ARM-based SoCs with Mali or PowerVR graphics. That contra revenue gamble has cost Intel many billions that could have been better spent on, say, a new RISC-ISA-based CPU core, but Intel is/was already too late to the mobile devices party.

Intel is going to have more problems in the Chromebook/tablet market with any remaining Atom-derived SKUs, as custom ARM designs from Apple (the A series) and possibly AMD's K12 (perhaps with SMT capabilities and a likewise wide superscalar design) begin to take more of the mainstream tablet/Chromebook market. These custom micro-architectures running the ARMv8-A ISA, like Apple's A7 (now up to the A9), have more desktop/laptop-SOC levels of execution resources, and a new AMD K12 custom design will be even more desktop/laptop-like if AMD has in fact designed simultaneous multithreading into K12's custom cores. AMD will be able to pair the K12 core with its graphics IP, so I would expect a K12-based tablet/Chromebook line of SKUs with Polaris-derived graphics to be a big hit with mobile device OEMs looking for a beefier custom ARM core that could very well surpass Apple's A series in performance.

Intel is having to back off of its mobile Russian front while fending off an invasion of its main cash-cow continent, the server market. Some traditional big Intel clients, like Google, are getting an OpenPower Power9 license and having Power9 server SKUs custom made to order for Google, by Google! With Power ISA designs like Power8/Power9 being opened up via OpenPower for ARM-style licensing, there will be some serious competition in the server market over the next few years. Add to that AMD's Zen x86 APUs on an interposer module, with a big fat Vega GPU accelerator and HBM2 on the very same module, for the HPC/workstation/server market, and Intel will be having to defend its server market from all sides. There will even be some custom ARM server SKUs from AMD and others to give Intel more server headaches.

The layoffs and early buyouts are just a symptom of more to come: serious competition across all of Intel's markets, and that server market competition will force Intel to become much leaner to survive. Intel's stockholders are not going to be happy about missing out on the mobile market, and now even Intel's domination of the server market is going to be fought over. It's not going to be pretty over the next few years for the big fat lumbering Chipzilla beast, and the fat will have to be trimmed in order to stay cost-competitive across all of its markets.

June 8, 2016 | 11:48 AM - Posted by Anonymous (not verified)

Edit: simulations multi-threading

to: simultaneous multithreading

Really, OpenOffice, why do you say "spell-check complete" sometimes when there are still words that need correction? And what is up with suggesting the prefix "Mufti" instead of "Multi"? Even with the latest OpenOffice update, the spelling dictionary is still not fixed!

June 8, 2016 | 11:49 AM - Posted by BlackDove (not verified)

You're wrong on all three points.

x86 CPUs are not the same as ARM and interchanging them is not beneficial in many cases.

Xeons and desktop CPUs that are on the same die already cost about the same (an E3 Xeon is about the same price as an i7). People building a desktop aren't going to spend $7,000 for the features that an E7 has, which are essential for the servers that use them.

Intel is converging the E5 and E7 Xeon lines with Skylake Purley. There's a huge demand for hyperscale datacenters and supercomputers, and that's what Intel is focusing on with Purley, Knights Landing and Knights Hill, HMC and XPoint.

Intel is buying up companies like Altera to integrate FPGAs onto their CPUs. I don't know what you mean by "AI processors," unless you mean neuromorphic chips?

June 8, 2016 | 02:13 PM - Posted by Anonymous (not verified)

AMD will be placing an FPGA right on the HBM2 stacks, between the bottom HBM2 logic/controller die and the HBM memory dies above. That will provide some in-memory programmable compute to go along with the 16 or 32 Zen cores and a big fat Vega GPU accelerator on the interposer module. Let's see Intel match that APU-on-an-interposer module's CPU-to-GPU bandwidth, in addition to HBM2's raw effective bandwidth at much lower, power-saving clock rates.

There is nothing preventing AMD from adding other types of ASICs to the interposer module as well, and silicon interposer technology will allow for much wider parallel data fabrics etched into the interposer's silicon. It even allows moving some active circuitry, such as coherency logic, onto the silicon interposer, in addition to the passive traces, on future interposers.

Intel will have to compete with HBM2 and the wider parallel connection fabrics etched into the interposer substrate, in addition to the Zen/Vega on-interposer pairing of CPU cores with a fat GPU die! Systems using a limited number of PCIe lanes to external, off-module GPU accelerators will be at a disadvantage compared to a thousands-of-bits-wide, interposer-etched CPU-to-GPU connection fabric on any interposer-based APU/SOC.

In the OpenPower Power8/Power9 market, Intel will have to compete with cores that have at least 8 SMT processor threads per core, with 8 instruction decoders per core feeding into 16 execution pipelines on the Power8 designs, and who knows what extra resources will come with Power9's newer core micro-architecture. And here is the big problem for Intel: there is nothing stopping both AMD and Nvidia from getting a Power8/Power9 license from OpenPower and integrating some GPU IP into the Power8/Power9 third-party licensee marketplace. Nvidia has the lead in providing its NVLink IP and GPU accelerators for the OpenPower market; AMD has yet to do any work for any OpenPower-based systems, but that could change.

A large third-party licensee Power9 market is starting up for Google, and Google's usage of Power9 CPUs will definitely mark a turning point among the cloud services providers that will be using Power9-based server systems over the next few years. Nvidia is in a good place for plenty of GPU accelerator business with the third-party, and IBM, Power8/Power9 systems. The pricing pressures will be great, and moving downward, so Intel will have to compete in the very price/performance-conscious server/HPC market. And the overall power usage metric, which the cloud services providers watch to an obsessive degree, may favor AMD's Zen/Vega/HBM APU-on-an-interposer server/HPC SKUs.

June 8, 2016 | 06:00 PM - Posted by Anonymous (not verified)

Marketing BS won't save AMD from bankruptcy... every single thing in the universe is finite, A-M-Dead too!

June 8, 2016 | 10:01 PM - Posted by Anonymous (not verified)

Wow, your marketing for the chip pimps at Intel shows through, and the layoffs will continue at Intel in larger numbers over the next few years! More "Intel on the outside" is in store for 2017 in many server rooms, with competition from OpenPower, from AMD's Zen-based SKUs, and from some ARM-based server SKUs also, like K12 and others.

Those high-margin Intel days are numbered, and how has that contra revenue worked out in the mobile market? Billions down the drain. Intel has plenty of room in that mothballed chip fab to store the Atom chips that bombed in the phone market even with the contra revenue stream going full blast! The mobile market OEMs are very happy with their licensed ARM-based SOCs and no Intel rings through their noses. "Intel on the outside" is going to be the word for 2017 and beyond, even in the server room, so it's time for the great Intel market milking to come to an end!

June 10, 2016 | 01:17 AM - Posted by BlackDove (not verified)

"Who knows?" Lots of people know what Power9 is going to be like and the information is readily available.

It isn't just Intel competing with AMD. AMD is actually not competing with anyone in the HPC space. Fujitsu has been using HMC, the competitor to HBM, since 2014 in their PrimeHPC FX100 SPARC64 XIfx systems, and they will likely stay ahead of Intel and AMD by several years, as they have been for a while.

I also think that the Fujitsu post-FX100 architecture will be the first exascale computer, which is relevant since all the processors mentioned are going into pre-exascale computers.

EMIB is Intel's competition for silicon interposers, and they can have mixed nodes and mixed signal on EMIB SiPs, so there's plenty of competition for the silicon interposers currently in use.

All this stuff about APUs has always been hype. An APU is just a CPU with an integrated GPU.

Google switching to Power is about the only thing you said that's relevant. Knights Landing is a 72-core CPU with 4-thread SMT per core, for 288 threads. Ok?

June 8, 2016 | 12:45 PM - Posted by lsr (not verified)

Xeon Phi targets a different type of parallel application, one which can't benefit from GPU computing.

June 8, 2016 | 02:25 PM - Posted by Anonymous (not verified)

Not really. Those AMD ACE units are getting more and more of the traditionally-CPU types of asynchronous compute ability with each new GCN generation, and there is plenty of OpenCL-based code that can accelerate plenty of the tasks that at one time required a CPU. And those Xeon Phi Knights Landing parts have to be clocked much higher than any GPU accelerator just to attempt to provide the teraflops of FP compute that a GPU can provide while being clocked much slower, saving on the power usage metric. The latest Pascal and Polaris SKUs already take the lead over any Knights version in FP compute, and AMD's Vega, with its GCN updates, will add to that Xeon Phi FP deficiency, all while doing it with less power per gigaflop!

June 8, 2016 | 07:01 PM - Posted by lsr (not verified)

Yes, really; as long as your problem decomposes into tasks which can be run in lockstep by the same code, you're good to go on a GPU. But many problems don't fall into that category, and that's where you need all the CPU cores you can get.

June 8, 2016 | 07:23 PM - Posted by Scott Michaud

Eh, Knights Landing has the same problem. AVX-512 requires that problems decompose into at least 16 x 32-bit or 8 x 64-bit lockstep lanes for maximum efficiency.

Granted, socketed Knights Landing has the advantage that it can be placed alongside a standard Xeon with unified memory, and potentially even use a driver to semi-automate core affinity. That said, NVIDIA has NVLink, which could compensate for that advantage, too.
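As a rough illustration of the lane math in the comment above, here is a short Python sketch. The 512-bit register width comes from AVX-512 itself; `vector_ops_needed` is a hypothetical helper for counting operations, not any real intrinsic:

```python
# Sketch: lockstep lanes in a 512-bit AVX-512 register, and how many
# vector operations an array of n elements needs at a given lane width.
VECTOR_BITS = 512

def lanes(element_bits):
    """Lockstep lanes per 512-bit register for a given element width."""
    return VECTOR_BITS // element_bits

def vector_ops_needed(n_elements, element_bits):
    """Vector operations needed to cover n_elements; the final one may
    run partially masked when n is not a multiple of the lane count."""
    per_op = lanes(element_bits)
    return (n_elements + per_op - 1) // per_op  # ceiling division

print(lanes(32))                   # 16 x 32-bit lanes
print(lanes(64))                   # 8 x 64-bit lanes
print(vector_ops_needed(100, 32))  # 7 vector ops for 100 floats
```

The point of the sketch is the efficiency cliff: problems that do not fill all 16 (or 8) lanes per operation leave vector hardware idle, which is the same decomposition constraint GPUs impose.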

June 8, 2016 | 07:38 PM - Posted by Anonymous (not verified)

I posted a similar post, but I am pretty slow typing on an iPhone 5, I guess. The Xeon Phi would have an advantage in memory capacity, since HMC will allow for much larger amounts of memory to be connected to the processor. HBM doesn't allow for that much capacity, but it will offer higher bandwidth and will probably be cheaper and lower power than HMC based devices.

June 8, 2016 | 07:33 PM - Posted by Anonymous (not verified)

Not the original poster, but I just wanted to say the Xeon Phi probably can't handle that type of workload that well either. Xeon Phi is a set of simple CPU cores that reach high performance by using wide vector units (256 or 512 bit). These will all be carrying out the same operation on all components in most cases. Current GPUs may actually have greater granularity in some cases.

For problems that require a lot of scalar CPU cores, or high throughput on scalar threads, the best solution will probably be many cores. The new ARM Cortex-A73 is supposed to be less than one square millimeter in size. Even a fat core like Skylake is only about 10 to 12 square millimeters. A much more powerful (than the A73) ARM core will probably fall somewhere in between. They will be able to put a lot of ARM cores on a die, but probably not as many as enthusiasts think: large numbers of cores need large amounts of cache to keep them busy.
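To make the die-area arithmetic above concrete, here is a back-of-the-envelope Python sketch. The per-core areas are the rough figures from the comment plus one hypothetical mid-size ARM core, and the 100 mm² budget is an arbitrary assumption for illustration:

```python
# Back-of-the-envelope core counts for a fixed die-area budget.
# Areas are illustrative estimates from the comment, not datasheet values.
CORE_AREA_MM2 = {
    "cortex_a73":  0.65,  # "less than one square millimeter"
    "skylake_big": 11.0,  # "about 10 to 12 square millimeters"
    "fatter_arm":  3.0,   # hypothetical mid-size custom ARM core
}

def cores_that_fit(core, budget_mm2=100.0):
    """How many cores fit in budget_mm2 of die area, ignoring the cache
    and interconnect that real designs must also budget for."""
    return int(budget_mm2 // CORE_AREA_MM2[core])

print(cores_that_fit("cortex_a73"))   # 153
print(cores_that_fit("skylake_big"))  # 9
```

Even this toy model shows the point of the comment: the raw core-count advantage of small ARM cores shrinks quickly once each core needs its share of cache to stay fed.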

June 8, 2016 | 10:36 PM - Posted by Anonymous (not verified)

Not with GCN's asynchronous compute and GPU cores that utilize thread-level parallelism instead of instruction-level parallelism. On AMD's GCN cores, compute tasks can be preempted and other threads scheduled, and GPU code blocks can run in a long-term, repeated fashion for longer periods of time without hogging resources, since they can be preempted when other time-dependent workloads need to be executed.

Those GCN "4.0/1.3" ACE units will have even more ability to run compute tasks; just wait until the NDAs expire and the Polaris white papers are released for the latest GCN. GCN uses a RISC ISA/SIMT-based GPU micro-architecture in which threads can be context-switched with other threads on the GCN ACE units, so more compute workloads will be able to be done on AMD's GPUs. AMD even has some FirePro SKUs that support full GPU hardware virtualization, splitting the GPU into independent virtualized work spaces to allow for safe and secure execution of multiple different applications on one GPU that is made to look like many virtual GPUs.

Lockstep went away when AMD moved from Terascale to the new GCN GPU micro-architecture, a few years ago now! Even ARM's new Mali Bifrost GPU micro-architecture is going with thread-level parallelism and the ability to preempt GPU threads/clauses, so expect that more asynchronous compute will be done on the GPU for the mobile markets also.

Better read the whitepapers; you are behind by a few years on GPU micro-architectures!

June 9, 2016 | 01:11 AM - Posted by Scott Michaud

Lockstep still exists, but it only affects around 64 threads at a time. AMD usually does it as 16 hardware threads over four cycles. It also exists in Knights Landing (16 x 32-bit and 8 x 64-bit) with AVX-512.
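A minimal Python sketch of what that lockstep costs on a divergent branch (a toy model for illustration, not vendor code): a 64-wide wavefront pays one pass through an if/else when every thread agrees, and two masked passes when threads diverge:

```python
# Toy model: a 64-thread wavefront executes one instruction stream in
# lockstep, so a divergent if/else runs both sides with lanes masked.
WAVEFRONT = 64

def branch_passes(predicates):
    """Instruction-stream passes one if/else costs a wavefront:
    1 if every thread takes the same side, 2 if the branch diverges."""
    taken = sum(predicates)
    return 1 if taken in (0, len(predicates)) else 2

uniform   = [True] * WAVEFRONT                       # all threads agree
divergent = [i % 2 == 0 for i in range(WAVEFRONT)]   # half and half

print(branch_passes(uniform))    # 1 pass
print(branch_passes(divergent))  # 2 passes: both sides run, masked
```

The same cost model applies, with different widths, to AVX-512 masked lanes on Knights Landing, which is why both architectures reward problems that decompose uniformly.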

June 9, 2016 | 03:46 PM - Posted by Anonymous (not verified)

Not for lockstep with threads; threads can be preempted and other work scheduled without degrading system response times for time-dependent tasks. The GCN compute units are actual compute units, so they can manage the thread-level parallelism much better. The Polaris/GCN white-papers will be available at the end of this month, so look into reading up on the improvements to async compute in GCN "4.0/1.3"; there have been many improvements there for compute and graphics workloads in the Polaris GPU ISA. Vega will be even more of an improvement.

That ARM Mali Bifrost GPU micro-architecture looks to be very interesting for its ability to manage threads at a finer-grained level for the mobile-device async-compute-enabled markets also. The Vulkan graphics/compute API is going to be great across all the computing markets and not dependent on any one limited OS ecosystem!
