Intel does not respond well when asked about Larrabee. Though Intel has received a lot of bad press from the gaming community about what it was trying to do, that does not necessarily mean Intel was wrong about how it set up the architecture. The problem with Larrabee was that it was being positioned as a consumer-level product with an eye toward breaking into the HPC/GPGPU market. At the consumer level, Larrabee would have been a disaster: Intel simply would not have been able to compete with AMD and NVIDIA for gamers’ hearts.
The problem with Larrabee in the consumer space was a matter of focus, process decisions, and die size. Larrabee is unique in that it is almost fully programmable and features essentially one fixed-function unit, dedicated to texturing. Everything else relied upon the large array of x86 processors and their attached vector units. This turns out to be very inefficient for rendering games, which is the majority of the work a consumer graphics card does. While no outlet was able to get hold of a Larrabee sample and run benchmarks on it, the general feeling was that Intel would easily be a generation behind in performance. Considering how large the die would have to be just to reach that point, it was simply not economical for Intel to produce these cards.
Xeon Phi is essentially an advanced part based on the original Larrabee architecture.
This is not to say that Larrabee does not have a place in the industry. The actual design lends itself very nicely to HPC applications. With each chip hosting many x86 processors with powerful vector units attached, these products can provide tremendous performance in HPC applications that can leverage those units. Because Intel utilized x86 processors instead of the more homogeneous designs that AMD and NVIDIA use (lots of stream units handling vector and scalar work, but no x86 units and no traditional networking fabric to connect them), Intel has a leg up on the competition when it comes to programming. While GPGPU applications must work through products like OpenCL, C++ AMP, and NVIDIA’s CUDA, Intel is able to rely on the many current programming languages that already target x86. With the addition of wide vector units on each x86 core, it is relatively simple to adjust existing code to utilize these new features, as compared to porting something over to OpenCL.
So this leads us to the Intel Xeon Phi. This is the first commercially available product based on an updated version of the Larrabee technology; its exact code name is Knights Corner. It is a new MIC (many integrated cores) product based on Intel’s latest 22 nm Tri-Gate process technology. Details are scarce on how many cores the product actually contains, but it looks to be more than 50 very basic “Pentium”-style cores: small in die area, in-order, and all connected by a robust networking fabric that allows fast data transfer between the memory interface, the PCI-E interface, and the cores.
Each Xeon Phi promises more than 1 TFLOP of performance (as measured by Linpack). When combined with the new Xeon E5 series of processors, these products can provide a huge amount of computing power. Furthermore, with the addition of the Cray interconnect technology that Intel acquired this year, clusters of these systems could produce some of the fastest supercomputers on the market. While it will take until at least the end of this year to integrate these products into a massive cluster, it will happen, and Intel expects these products to be at the forefront of driving performance from the Petascale to the Exascale.
These are the building blocks that Intel hopes to utilize to corner the HPC market. By pairing powerful CPUs with dozens, if not hundreds, of MIC units per cluster, Intel hopes the resulting computing power will bring us to the Exascale that much sooner.
Time will of course tell whether Intel will be successful with Xeon Phi and Knights Corner. The idea behind this product seems sound, and attaching powerful vector units to simple x86 cores should make the software migration to massively parallel computing just a wee bit easier than what we are seeing now with GPU-based products from AMD and NVIDIA. The areas where those other manufacturers hold advantages over Intel are their many years of work with educational institutions (research), software developers (gaming, GPGPU, and HPC), and industry standards groups (Khronos). Xeon Phi has a ways to go before being fully embraced by these organizations, and its future is certainly not set in stone. We have yet to see 3rd party groups get hold of these products and put them to the test. While Intel CPUs are certainly class leading, we still do not know the full potential of these MIC products as compared to what is currently available on the market.
The one positive thing for Intel’s competitors is that it seems their enthusiasm for massively parallel computing is justified. Intel just entered that ring with a unique architecture that will certainly help push high performance computing more towards true heterogeneous computing.