Subject: Processors | November 20, 2015 - 06:21 PM | Scott Michaud
Tagged: xeon, Intel, FPGA
UPDATE (Nov 26th, 3:30pm ET): A few readers have mentioned that FPGAs take much less than hours to reprogram. I even received an email last night claiming that FPGAs can be reprogrammed in "well under a second." This differs from the sources I read when I was looking into their OpenCL capabilities (for potential evolutions of projects) back in ~2013. That said, multiple sources, including one who claims to have personal experience with FPGAs, say that the hours-long figure is not accurate. Also, I've never used an FPGA myself -- again, I was just researching them to see where some GPU-based projects could go.
Designing integrated circuits, as I've said a few times, is basically a game. You have a blank canvas that you can etch complexity into. The amount of “complexity” depends on your fabrication process, how big your chip is, the intended power, and so forth. Performance depends on how you use the complexity to compute actual tasks. If you know something special about your workload, you can optimize your circuit to do more with less. CPUs are designed to do basically anything, while GPUs assume similar tasks can be run together. If you will only ever run a single program, you can even bake some or all of its source code into hardware called an “application-specific integrated circuit” (ASIC), which is often used for video decoding, rasterizing geometry, and so forth.
This is an old Atom back when Intel was partnered with Altera for custom chips.
FPGAs are circuits that can be configured for a specific application, but can also be reprogrammed later. Changing tasks requires a significant amount of time (sometimes hours), but it is easier than reconfiguring an ASIC, which involves removing it from your system, throwing it in the trash, and printing a new one. FPGAs are not quite as efficient as a dedicated ASIC, but they are about as close as you can get without translating the actual source code directly into a circuit.
Intel, after purchasing FPGA manufacturer Altera, will integrate its technology into Xeons in Q1 2016. This will be useful to offload specific tasks that dominate a server's total workload. According to PC World, they will be integrated as a two-chip package, where both the CPU and FPGA can access the same cache. I'm not sure what form of heterogeneous memory architecture Intel is using, but this would be a great example of a part that could benefit from in-place acceleration. You could imagine a simple function being baked into the FPGA to, I don't know, process large videos in very specific ways without expensive copies.
Again, this is not a consumer product, and may never be. Reprogramming an FPGA can take hours, and I can't think of too many situations where consumers would accept hours of downtime when switching tasks in exchange for high performance. Then again, it just takes one person to think of a great application for it to take off.
Subject: General Tech | June 19, 2014 - 01:19 PM | Jeremy Hellstrom
Tagged: xeon, Intel, FPGA
Intel has just revealed what The Register is aptly referring to as the FrankenChip, a hybrid Xeon E5 and FPGA chip. This will allow large companies to access the power of a Xeon and be able to offload some work onto an FPGA they can program and optimize themselves. The low-power FPGA is actually on the chip, as opposed to Microsoft's recent implementation, which saw FPGAs added to PCIe slots. Intel's solution does not use up a slot and also offers direct access to the Xeon cache hierarchy and system memory via QPI, which will allow for increased performance. Another low-power shot has been fired at ARM's attempts to grow its share of the server market, but we shall see if the inherent complexity of programming an FPGA to work with an x86 is more or less attractive than switching to ARM.
"Intel has expanded its chip customization business to help it take on the hazy threat posed by some of the world's biggest clouds adopting low-power ARM processors."
Here is some more Tech News from around the web:
- Amazon's new, not-really-3D Fire: Puts Bezos' cash register in YOUR pocket @ The Register
- Amazon Fire Phone will crash and burn @ The Inquirer
- Knitted Circuit Board Lends Flexibility to E-Textiles @ Hack a Day
- 3D Windowing System Developed Using Wayland, Oculus Rift @ Slashdot
- Google Play Store is littered with 'secret keys' @ The Inquirer
- How farsighted is Microsoft's Azure RemoteApp? @ The Register
- Rollei Mini WiFi Camcorder 1 Review @ NikKTech
- The Dell Inspiron 3000 & 5000 Launch Report @ Tech ARP
Subject: General Tech, Graphics Cards | October 16, 2013 - 10:00 PM | Scott Michaud
Tagged: FPGA, Altera
(Update 10/17/2013, 6:13 PM) Apparently I messed up inputting this into the website last night. To compare FPGAs with current hardware, the Altera Stratix 10 is rated at more than 10 TeraFLOPs compared to the Tesla K20X at ~4 TeraFLOPs or the GeForce Titan at ~4.5 TeraFLOPs. All figures are single precision. (end of update)
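To put those rated figures side by side, here is a small Python sketch of the arithmetic. It only uses the three numbers quoted in the update; the dictionary and the `ratio_vs` helper are names made up for this illustration, and because the Stratix 10 figure is "more than 10", the ratios should be read as rough lower bounds rather than measurements.

```python
# Rated single-precision throughput from the update above, in TeraFLOPs.
# The Stratix 10 value is a lower bound ("more than 10"); the GPU values
# are approximate ("~4" and "~4.5").
PEAK_TFLOPS = {
    "Altera Stratix 10": 10.0,
    "Tesla K20X": 4.0,
    "GeForce Titan": 4.5,
}


def ratio_vs(baseline: str, part: str = "Altera Stratix 10") -> float:
    """Return the rated throughput of `part` divided by that of `baseline`."""
    return PEAK_TFLOPS[part] / PEAK_TFLOPS[baseline]


if __name__ == "__main__":
    for gpu in ("Tesla K20X", "GeForce Titan"):
        print(f"Stratix 10 vs {gpu}: at least {ratio_vs(gpu):.1f}x on paper")
```

These are peak ratings, so they say nothing about how a given workload would actually perform on either kind of hardware.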
Field Programmable Gate Arrays (FPGAs) are not general purpose processors; they are not designed to perform any random instruction at any random time. If you have a specific set of instructions that you want performed efficiently, you can spend a couple of hours compiling your function(s) to an FPGA which will then be the hardware embodiment of your code.
This is similar to an Application-Specific Integrated Circuit (ASIC) except that, for an ASIC, it is the factory that bakes your application into the hardware. Many (actually, to my knowledge, almost all) FPGAs can even be reprogrammed if you can spare those few hours to configure them again.
Altera is a manufacturer of FPGAs. They are one of the few companies that were allowed access to Intel's 14nm fabrication facilities. Rahul Garg of AnandTech recently published a story which discussed compiling OpenCL kernels to FPGAs using Altera's compiler.
Now this is pretty interesting.
The design of OpenCL splits work between "host" and "kernel". The host application is written in some arbitrary language and follows typical programming techniques. Occasionally, the application will run across a large batch of instructions. A particle simulation, for instance, will require position information to be computed. Rather than having the host code loop through every particle and perform some complex calculation, what happens to each particle could be "a kernel" which the host adds to the queue of some accelerator hardware. Normally, this is a GPU with its thousands of cores chunked into groups of usually 32 or 64 (vendor-specific).
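The host/kernel split described above can be sketched in plain Python. This is not the OpenCL API: `Accelerator`, `enqueue`, `finish`, `step_kernel`, and `simulate` are hypothetical names invented for this illustration, and the "device" here simply runs each work-item one after another, whereas a real GPU or FPGA would process many of them in parallel.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float  # position
    v: float  # velocity


def step_kernel(i, particles, dt):
    """Kernel: what happens to ONE particle (one work-item)."""
    p = particles[i]
    p.x += p.v * dt


class Accelerator:
    """Hypothetical stand-in for an accelerator's command queue."""

    def __init__(self):
        self.queue = []

    def enqueue(self, kernel, global_size, *args):
        # The host only records the request; nothing runs yet.
        self.queue.append((kernel, global_size, args))

    def finish(self):
        # Drain the queue. Real hardware would run the work-items of each
        # kernel launch in parallel; this sketch runs them sequentially.
        for kernel, global_size, args in self.queue:
            for i in range(global_size):
                kernel(i, *args)
        self.queue.clear()


def simulate(particles, dt, steps):
    """Host code: ordinary program flow that hands batches to the device."""
    device = Accelerator()
    for _ in range(steps):
        device.enqueue(step_kernel, len(particles), particles, dt)
    device.finish()
    return particles


if __name__ == "__main__":
    print(simulate([Particle(0.0, 1.0), Particle(1.0, -2.0)], dt=0.5, steps=2))
```

The point of the split is that the host never loops over particles itself; it only says which kernel to run and over how many items, which leaves the hardware free to decide how to spread that work out.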
An FPGA, on the other hand, can lock itself to the specific set of instructions. Given a few hours of configuration time, it can be laid out with some arbitrary number of compute paths and then just churn through each kernel call until it is finished. The compiler knows exactly which application the FPGA will need to perform while the host code runs on the CPU.
This is obviously designed for enterprise applications, at least as far into the future as we can see. Current models are apparently priced in the thousands of dollars but, as the article points out, have the potential to out-perform a 200W GPU at just a tenth of the power. This could be very interesting for companies, perhaps a film production house, that want to install accelerator cards for sub-d surfaces or ray tracing but would like to develop the software in-house and occasionally update their code after business hours.
Regardless of the potential market, an FPGA-based add-in card simply makes sense for OpenCL and its architecture.