Intel's Knights Landing (Xeon Phi, 2015) Details

Subject: General Tech, Graphics Cards, Processors | July 2, 2014 - 03:55 AM |
Tagged: Intel, Xeon Phi, xeon, silvermont, 14nm

Anandtech has just published a large editorial detailing Intel's Knights Landing. Most of it is information we already knew from previous announcements and leaks, such as one by VR-Zone from last November (which we reported on). Officially, few details were given back then, except that it would be available either as a PCIe add-in board or as a socketed, bootable, x86-compatible processor based on the Silvermont architecture. Its many cores, threads, and 512-bit registers are each fairly weak compared to, say, Haswell, but together they add up to about 3 TFLOPS of double-precision performance.


Not enough graphs. Could use another 256...

The best way to imagine it is a PC running a modern, Silvermont-based Atom processor -- only with up to 288 logical processors listed in your Task Manager (72 physical cores with four-way Hyper-Threading).

The main limitation of GPUs (and similar coprocessors), however, is memory bandwidth. GDDR5 is often the main bottleneck of compute performance and just about the first thing to be optimized. To compensate, Intel is packaging up to 16GB of stacked DRAM on the chip itself. This RAM is derived from "Hybrid Memory Cube" (HMC) technology, developed by Micron Technology and supported by the Hybrid Memory Cube Consortium (HMCC), although Knights Landing uses a proprietary interface customized for the part. Its bandwidth is rated at around 500GB/s. For comparison, the NVIDIA GeForce GTX Titan Black has 336.4GB/s of memory bandwidth.

Intel and Micron have worked together in the past. In 2006, the two companies formed "IM Flash" to produce the NAND flash for Intel and Crucial SSDs. Crucial is Micron's consumer-facing brand.


So the vision for Knights Landing seems to be a bridge between CPU-like and GPU-like architectures. For compute tasks, GPUs edge out CPUs by crunching through bundles of similar tasks at the same time, across many (hundreds or thousands of) computing units. The difference with (at least socketed) Xeon Phi processors is that, unlike most GPUs, Intel does not rely upon APIs, such as OpenCL, and drivers to translate a handful of functions into bundles of GPU-specific machine language. Instead, especially if the Xeon Phi is your system's main processor, it will run standard x86 software. That software will just run slowly, unless it can be vectorized and split across multiple threads. Obviously, OpenCL (and other APIs) would make this parallelization easy, through their host/kernel design, but it is apparently not required.

It is a cool way for Intel to arrive at the same goal, based on its background. Especially when you mix and match Xeons and Xeon Phis in the same computer, it is a push toward heterogeneous computing -- with a lot of specialized threads backing up a handful of strong ones. I just wonder whether providing a more direct method of programming will really help developers finally adopt massively parallel coding practices.

I mean, without even considering GPU compute, how efficient is most software at splitting into even two threads? Four threads? Eight threads? Can this help drive heterogeneous development? Or will this product simply try to appeal to those who are already considering it?

Source: Intel

July 2, 2014 | 09:27 AM - Posted by Anonymous (not verified)

Nice for ray tracing! But how do these cores compare to GPU cores for raster ops and the other texture/tessellation hardware built into GPUs? Maybe software can be developed to run on the cores to simulate GPU-type functionality, but will it be as fast?

July 2, 2014 | 02:52 PM - Posted by Scott Michaud

They do not have fixed-function rendering hardware, because they are designed for general compute. That said, according to studies by NVIDIA, a million-triangle scene (at 2048x1536) takes only 4.01ms to rasterize on the shader units of a GeForce GTX 480 (compared to 1.11ms when using the 480's fixed-function hardware). That is really only about 1/4 of your rendering budget at 60FPS. In other words, a 33% boost in performance (4/3) would, on paper, compensate for it.

And yes, software can be (and is being) developed to run on arbitrary processors and simulate GPU functionality. NVIDIA's research project is doing that, and I covered another research project doing that late last year. Obviously, this Xeon Phi would not really be cost-effective for that. The future? Who knows.

July 3, 2014 | 10:59 AM - Posted by Anonymous (not verified)

Nvidia is considering putting Denver cores on its discrete GPU products, if the rumors are correct, and AMD has its console APUs, which it could, if it wanted to, turn into a discrete PCIe-based system if the GPU core count were increased more in line with high-end discrete GPUs. I think bringing the CPU onto the die alongside the GPU is the way to go for reducing latency between CPU and GPU, and it can't get much better than CPU cores sharing a fat on-die, GPU-style memory controller and bus, with the whole unit having a large on-die RAM to stage the most often accessed OS/gaming engine code. These discrete systems should host their own gaming-optimized OSs and the gaming engines that run the games as well. Hosting the gaming OS and engine in GDDR5, or faster, memory, with the on-die RAM there to stage the most often needed OS/gaming engine code, would further reduce the latency issues with GDDR5. A gaming-optimized OS, tuned to take advantage of the hardware, with no other task but to assist the gaming engine in running the game, could not be matched by any general-purpose OS running on the narrow-bus, memory-bandwidth-constrained motherboard systems currently in use.

Future gaming rigs would consist of a general-purpose OS running on the motherboard CPU, while the PCIe-based gaming system runs the games, gaming OS, and engine. These systems would become more like a computing/gaming cluster, with the gaming system(s) plugged into the PCIe slots; multiple gaming systems could be hosted on one motherboard and could stream games to individual players' tablets/laptops/other devices, or the gaming systems could be paired up to act as one more powerful gaming system for multiplayer games. Gigabit and faster WiFi is coming, and game streaming and other streaming will let whole households use the gaming rig for all types of uses, with the gaming/computing cluster running as a headless server, controlled by a tablet or other devices logged in as remote desktops.

July 4, 2014 | 04:39 AM - Posted by Scott Michaud

There'd be no point in "making its console APUs into a discrete PCI-based system". AMD licenses both the x86 and ARM instruction sets. Code which runs on an add-in board (AIB) does not need to run a host OS. Rather, it could be targeted by the compiler, instead of the GPU-based cores, for the "branchy bits" of an offloaded chunk. In other words, AMD has many options, none of which were only possible with the Sony and Microsoft console contracts (although those help stabilize the company financially).

Developing an APU for the Xbox One and PS4, to me, seems like a way to subsidize existing research with a little bit of licensing money -- and to get game developers used to their CPU and GPU architectures for optimizing PC games at the same time.

July 4, 2014 | 11:17 AM - Posted by Anonymous (not verified)

Screw M$ and its closed gaming ecosystem! Same for Sony. Build a gaming cluster system based on a Linux distro and run the PCIe card devices with their own lightweight gaming OSs to run gaming engines/games. This is the low-latency way to go! Gaming clusters with the CPU power on the discrete card, and gamers able to add cards with more GPU/CPU power -- power to host a complete gaming system on a PCIe card, more uber-low-latency CPU/GPU gaming power with every extra card -- and to hell with Intel and M$. Nvidia will be doing this with its Denver cores at first, and maybe POWER8 after some time; POWER8 is up for ARM-style licensing, so high-end gaming could use POWER8's performance! You can be darn sure that if Nvidia puts custom ARM ISA-based processors on its discrete GPU products, AMD will have to counter, and AMD already has a gaming APU that fits the bill.

Unless motherboard makers start building boards with 256/512-bit buses and faster memory, the only way to go is a PCIe-based gaming system; you can't get any lower latency than a gaming APU with GPU/CPU on the same die. And what PC gamer would not want a gaming cluster/gaming server, with the ability to add more CPU/GPU power with each PCIe slot adding its own gaming APU/CPU/GPU system, running a gaming-optimized OS? The gaming cluster is the future: a system with a motherboard and maybe 4 or more PCIe x16 slots, or the x16 slots plus a backplane bus (CrossFire/SLI/whatever) for extra bandwidth between the GPU/CPU computing units, all hosted on a motherboard with the motherboard CPU running a VM and any general-purpose OS, without stealing CPU cycles from the complete gaming systems on the PCIe card(s). That is the best setup for a home server powerful enough to run an entire household's computing/gaming/entertainment needs!

The gigabit and faster wireless routers coming to market will allow the whole house to be connected to these gaming/computing cluster systems, hosting virtual desktops on tablets/laptops/other devices throughout the house and giving users enough power to run the most demanding gaming and other workloads that would normally overwhelm any mobile/laptop device. Game streaming, compute streaming, and virtual desktop services via the home server are coming, and households with tablets whose content/games are hosted on a centralized gaming/computing home cluster server will become the norm! With the right VM cluster software/OS, the entire household's computing devices could be turned into one big asymmetrical computing platform, with the distributed computing system able to spread workloads across every device connected to the home network.

Home-based, home-hosted cloud systems are just now beginning to come online, and Nvidia for sure is in a good place, along with Apple and a few others, including AMD. Nvidia is working with IBM on integrating its GPUs with POWER8, and Nvidia could license POWER8 and create some very powerful home systems for gaming and compute, with gaming and other services streamed to its upcoming tablet devices. AMD likewise has its SeaMicro division and could make a home-based cluster gaming/compute system of its own. The old days of being stuck with a single CPU/SoC wedded to a single socket on a slow motherboard are coming to a close, and the days of the home computing/gaming cluster are about to begin, WINTEL be damned. If anyone thinks for one New York nanosecond that Nvidia is not pondering the home cluster/server market (AMD likewise), just wait and see, because the home computing cluster, and gigabit-WiFi-connected distributed multiprocessing, is coming to the home market.

P.S. M$ is sure in a hurry to get Office running on Android and other OSs, and Windows will still be around for those who need to run legacy code -- Windows hosted on a KVM or other VM, like the enterprises are using right now. These enterprise-type systems/computing clusters will be in the home in not too many years.

July 4, 2014 | 12:01 PM - Posted by Anonymous (not verified)

I have a question. Will this machine really be able to boot Windows?

July 5, 2014 | 01:22 AM - Posted by Scott Michaud

According to Intel, it can boot Windows and, in fact, run any Haswell-compatible software (except software using the TSX instructions that Haswell introduced).

But your question was worded "will it?" Of course, just because it can doesn't mean it will. Its customers can say, "Uh huh, Windows, that's nice," while downloading some enterprise Linux distribution. Who knows?

I think it will. I expect single-threaded applications to perform somewhat like they would on a tablet, but that shouldn't stop some niche from ignoring the full Xeon and building a PC with just a Xeon Phi in it.
