Intel Launches Stratix 10 FPGA With ARM CPU and HBM2

Subject: Processors | October 10, 2016 - 02:25 AM |
Tagged: SoC, Intel, FPGA, Cortex A53, arm, Altera

Intel and its recently acquired Altera division have launched a new FPGA product built on Intel’s 14nm Tri-Gate process that combines an ARM CPU, a 5.5-million-logic-element FPGA, and HBM2 memory in a single package. The Stratix 10 is aimed at data center, networking, and radar/imaging customers.

The Stratix 10 is an Altera-designed FPGA (field programmable gate array) with 5.5 million logic elements and a new HyperFlex architecture that optimizes registers, pipelining, and critical paths (feed-forward designs) to raise core performance and increase logic density to five times that of previous products. Further, the upcoming FPGA SoC reportedly can run at twice the core performance of the Stratix V, or use up to 70% less power than its predecessor at the same performance level.
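
Taken at face value, those two "up to" claims imply different performance-per-watt bounds. Here is a quick back-of-the-envelope sketch; the normalized numbers are mine, not Intel's, and the two claims are treated as independent operating points:

```python
# Rough arithmetic on the HyperFlex claims vs. Stratix V.
# These are marketing "up to" figures, so treat the results as bounds.

stratix_v_perf = 1.0   # normalized core performance
stratix_v_power = 1.0  # normalized power draw

# Claim 1: twice the core performance (assuming comparable power)
perf_per_watt_claim1 = (2.0 * stratix_v_perf) / stratix_v_power

# Claim 2: same performance at up to 70% less power
perf_per_watt_claim2 = stratix_v_perf / (stratix_v_power * (1 - 0.70))

print(perf_per_watt_claim1)  # → 2.0x perf/W
print(round(perf_per_watt_claim2, 2))  # → ~3.33x perf/W
```

So depending on which end of the claim you operate at, the implied efficiency gain over Stratix V is somewhere between roughly 2x and 3.3x.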


The increases in logic density, clockspeed, and power efficiency are a combination of the improved architecture and Intel’s 14nm FinFET (Tri-Gate) manufacturing process.

Intel rates the FPGA at 10 TFLOPS of single precision floating point DSP performance and 80 GFLOPS/watt.
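
Those two headline figures together imply a power number. A quick sanity check (assuming both ratings describe the same operating point, which Intel does not state):

```python
# Implied power draw from Intel's two headline numbers.
peak_sp_gflops = 10_000.0        # 10 TFLOPS single precision
efficiency_gflops_per_w = 80.0   # rated GFLOPS per watt

implied_power_w = peak_sp_gflops / efficiency_gflops_per_w
print(implied_power_w)  # → 125.0 W
```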

Interestingly, Intel is using an ARM processor to feed data to the FPGA rather than one of its own Quark or Atom processors. Specifically, the Stratix 10 uses an ARM CPU with four Cortex-A53 cores along with four stacks of on-package HBM2 memory offering 1 TB/s of bandwidth to feed data to the FPGA. There is also a “secure device manager” to ensure data integrity and security.
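
As a quick check, that aggregate bandwidth divides out to roughly HBM2's per-stack rating (assuming the 1 TB/s figure is split evenly across the four stacks and is a rounded, decimal-unit number):

```python
# Per-stack bandwidth implied by four HBM2 stacks totaling 1 TB/s.
total_bandwidth_gb_s = 1000.0  # 1 TB/s, decimal units
stacks = 4

per_stack_gb_s = total_bandwidth_gb_s / stacks
print(per_stack_gb_s)  # → 250.0 GB/s, near HBM2's 256 GB/s per-stack peak
```

That lines up closely enough with the HBM2 spec's 256 GB/s per-stack maximum to suggest the 1 TB/s figure is simply rounded.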

The Stratix 10 is aimed at data centers and will be used in specialized tasks that demand high throughput and low latency. According to Intel, the processor is a good candidate for co-processor duty, offloading and accelerating encryption/decryption, compression/decompression, or Hadoop tasks. It can also be used to power specialized storage controllers and networking equipment.

Intel has started sampling the new chip to potential customers.


In general, FPGAs are great at highly parallelized workloads and are able to efficiently take huge numbers of inputs and process the data in parallel through custom-programmed logic gates. An FPGA is essentially a program in hardware that can be rewired in the field (though depending on the chip it is not necessarily a “fast” process, and it can take hours or longer to switch things up). These processors are used in medical and imaging devices, high frequency trading hardware, networking equipment, signals intelligence (cell towers, radar, guidance, etc.), bitcoin mining (though ASICs stole the show a few years ago), and even password cracking. They can be almost anything you want, which gives them an advantage over traditional CPUs and graphics cards, though cost and increased coding complexity are prohibitive.

The Stratix 10 stood out as interesting to me because of its claimed 10 TFLOPS of single precision performance, which is reportedly the important metric when it comes to training neural networks. In fact, Microsoft recently began deploying FPGAs across its Azure cloud computing platform and plans to build the “world’s fastest AI supercomputer.” The Redmond-based company’s Project Catapult saw it deploy Stratix V FPGAs to nearly all of its Azure datacenters, using the programmable silicon as part of an “acceleration fabric” in its “configurable cloud” architecture. That fabric will initially accelerate the company’s Bing search and AI research efforts and will later be available to independent customers for their own applications.

It is interesting to see Microsoft going with FPGAs, especially as efforts to use GPUs for GPGPU and neural network training and inferencing duties have increased so dramatically over the years (with NVIDIA pushing the latter). It may well be a good call on Microsoft’s part, as it could enable better performance, and researchers would be able to code their AI accelerator platforms down to the gate level to really optimize things. Using higher level languages and cheaper hardware with GPUs does have a lower barrier to entry, though. I suppose it will depend on just how much Microsoft is going to charge customers to use the FPGA-powered instances.

FPGAs occupy kind of a weird middle ground, and while they are definitely not a new technology, they do continue to get more complex and powerful!

What are your thoughts on Intel's new FPGA SoC?


Source: Intel

Podcast #275 - AMD Radeon R9 290X, ARMTechCon 2013, NVIDIA Pricedrops and more!

Subject: General Tech | October 31, 2013 - 03:48 PM |
Tagged: podcast, video, R9 290X, amd, radeon, 290x crossfire, 280x, r9 280x, gtx 770, gtx 780, arm, mali, Altera

PC Perspective Podcast #275 - 10/31/2013

Join us this week as we discuss the AMD Radeon R9 290X, ARMTechCon 2013, NVIDIA Pricedrops and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

 
Program length: 1:22:37
  1. Week in Review: 0:55:40
  2. 0:59:20 This episode is brought to you by Carbonite.com! Use offer code PC for two free months!
    • Intel Series 9 Chipset
  3. Hardware/Software Picks of the Week
  4. podcast@pcper.com
  5. Closing/outro

 

ARM TechCon 2013: Altera To Produce ARMv8 Chips on Intel 14nm Fabs

Subject: Processors, Mobile | October 29, 2013 - 12:24 PM |
Tagged: techcon, Intel, arm techcon, arm, Altera, 14nm

In February of this year Intel and Altera announced that they would be partnering to build Altera FPGAs using the upcoming Intel 14nm tri-gate process technology.  The deal was important for the industry as it marked one of the first times Intel has shared its process technology with another processor company.  With that technology seen as the company's most valuable asset, the decision to open Intel's fabrication facilities to outside work could have drastic ramifications for Intel's computing divisions and the industry as a whole.  It also seems to back up speculation that Intel is having a hard time keeping its fabs at anywhere near 100% utilization with only in-house designs.

Today, though, news is coming out that Altera's parts will be including ARM-based processing cores, specifically those based on the ARMv8 64-bit architecture.  Starting in 2014, Altera's high-end Stratix 10 FPGA, which uses four ARM Cortex-A53 cores, will be produced in Intel's fabs.

The deal may give Intel pause about its outsourcing strategy. To date the chip giant has experimented with offering its leading-edge fab processes as foundry services to a handful of chip designers, Altera being one of its largest planned customers.

Altera believes that by combining the ARMv8 A53 cores and Intel's 14nm tri-gate transistors it will be able to provide FPGA performance that is "two times the core performance" of current high-end 28nm options.


While this news might upset some people internally at Intel's architecture divisions, the news couldn't be better for ARM.  Intel is universally recognized as the process technology leader, generally a full process node ahead of the competition from TSMC and GlobalFoundries.  I learned just yesterday that many of ARM's partners are skipping the 20nm technology from non-Intel foundries and are instead looking toward the 14/16nm FinFET transitions coming in late 2014.

ARM has been working with essentially every major foundry in the business EXCEPT Intel and many viewed Intel's chances of taking over the mobile/tablet/phone space as dependent on its process technology advantage.  But if Intel continues to open up its facilities to the highest bidders, even if those customers are building ARM-based designs, then it could drastically improve the outlook for ARM's many partners.

UPDATE (7:57pm): After further talks with various parties there are a few clarifications that I wanted to make sure were added to our story.  First, Altera's FPGAs are primarily focused on the markets of communication, industrial, military, etc.  They are not really used as application processors and thus are not going to directly compete with Intel's processors in the phone/tablet space.  It remains to be seen if Intel will open its foundries to a directly competing product, but for now this announcement regarding the upcoming Stratix 10 FPGA on Intel's 14nm tri-gate process is an interesting progression.

Source: EETimes

Altera Does FPGAs with OpenCL

Subject: General Tech, Graphics Cards | October 16, 2013 - 10:00 PM |
Tagged: FPGA, Altera

(Update 10/17/2013, 6:13 PM) Apparently I messed up inputting this into the website last night. To compare FPGAs with current hardware, the Altera Stratix 10 is rated at more than 10 TeraFLOPS, compared to the Tesla K20X at ~4 TeraFLOPS or the GeForce Titan at ~4.5 TeraFLOPS. All figures are single precision. (end of update)

Field Programmable Gate Arrays (FPGAs) are not general purpose processors; they are not designed to perform any random instruction at any random time. If you have a specific set of instructions that you want performed efficiently, you can spend a couple of hours compiling your function(s) to an FPGA which will then be the hardware embodiment of your code.

This is similar to an Application-Specific Integrated Circuit (ASIC) except that, for an ASIC, it is the factory that bakes your application into the hardware. Many (actually, to my knowledge, almost all) FPGAs can even be reprogrammed if you can spare those few hours to configure them again.


Altera is a manufacturer of FPGAs and one of the few companies allowed access to Intel's 14nm fabrication facilities. Rahul Garg of Anandtech recently published a story which discussed compiling OpenCL kernels to FPGAs using Altera's compiler.

Now this is pretty interesting.

The design of OpenCL splits work between "host" and "kernel". The host application is written in some arbitrary language and follows typical programming techniques. Occasionally, the application will run across a large batch of instructions. A particle simulation, for instance, will require position information to be computed. Rather than having the host code loop through every particle and perform some complex calculation, what happens to each particle could be "a kernel" which the host adds to the queue of some accelerator hardware. Normally, this is a GPU with its thousands of cores chunked into groups of usually 32 or 64 (vendor-specific).
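
That division of labor can be sketched in plain Python. This is not OpenCL code — `update_position` and `enqueue_kernel` here are hypothetical stand-ins for a real kernel and a real command queue, just to show which side does what:

```python
# Sketch of the OpenCL host/kernel division of labor, in plain Python.
# In real OpenCL the kernel would be compiled for a GPU or FPGA and the
# host would enqueue it; here the "device" is just a loop.

def update_position(pos, vel, dt):
    """The 'kernel': the per-particle work that would run on each
    work-item in parallel on the accelerator."""
    return pos + vel * dt

def enqueue_kernel(kernel, positions, velocities, dt):
    """The 'host' side: batch the kernel over every particle. A GPU
    runs these in work-groups of 32/64 work-items; an FPGA would lay
    the arithmetic out as a physical pipeline instead."""
    return [kernel(p, v, dt) for p, v in zip(positions, velocities)]

positions = [0.0, 1.0, 2.0]
velocities = [1.0, 0.5, -1.0]
print(enqueue_kernel(update_position, positions, velocities, dt=0.1))
# → approximately [0.1, 1.05, 1.9]
```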


An FPGA, on the other hand, can lock itself to that specific set of instructions. It can, within those few hours of configuration, lay down some arbitrary number of compute paths and then just churn through each kernel call until it is finished. The compiler knows exactly what work the chip will need to perform, while the host code runs on the CPU.

This is obviously designed for enterprise applications, at least as far into the future as we can see. Current models are apparently priced in the thousands of dollars but, as the article points out, have the potential to out-perform a 200W GPU at just a tenth of the power. This could be very interesting for companies, perhaps a film production house, that want to install accelerator cards for sub-d surfaces or ray tracing but would like to develop the software in-house and occasionally update their code after business hours.
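
That power claim is easy to put in performance-per-watt terms. A rough sketch, assuming equal throughput on both parts (the conservative reading of "out-perform"):

```python
# Rough perf/W implication of matching a 200 W GPU at a tenth of the power.
gpu_power_w = 200.0
fpga_power_w = gpu_power_w / 10  # ~20 W per the article's claim

# Assume equal throughput for the comparison; since the claim is
# "out-perform", the real ratio would only be higher.
perf = 1.0
gpu_perf_per_watt = perf / gpu_power_w
fpga_perf_per_watt = perf / fpga_power_w

print(fpga_perf_per_watt / gpu_perf_per_watt)  # → ~10x perf/W advantage
```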

Regardless of the potential market, an FPGA-based add-in card simply makes sense for OpenCL and its architecture.

Source: Anandtech