Intel AVX-512 Expanded

Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 03:05 AM |
Tagged: Xeon Phi, xeon, Intel, avx-512, avx

It is difficult to know what is actually new information in this Intel blog post, but it is interesting nonetheless. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and co-processors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, as well as Conflict Detection, Exponential and Reciprocal, and Prefetch, which are optional. That earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions would make their way to standard, CPU-like Xeons at around the same time.

This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.

 
Instruction group | Xeon Processors | Xeon Phi Processors | Xeon Phi Coprocessors (AIBs)
Foundation Instructions | Yes | Yes | Yes
Conflict Detection Instructions | Yes | Yes | Yes
Exponential and Reciprocal Instructions | No | Yes | Yes
Prefetch Instructions | No | Yes | Yes
Byte and Word Instructions | Yes | No | No
Doubleword and Quadword Instructions | Yes | No | No
Vector Length Extensions | Yes | No | No

Source: Intel AVX-512 Blog Post (and my understanding thereof).

So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and works just fine with having multiple cores, too). For example, imagine that you have to multiply two colors together. The direct way to do it is to multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! and, later, Intel's SSE added packed floating-point instructions for exactly this kind of work; with SSE, two four-component vectors can be multiplied together at once. This reduces four similar instructions to a single operation between wider registers.

Smart compilers (and programmers, although that is becoming less common as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if the same operations are applied to it. AVX-512 allows sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four single-precision RGBA values, but you are looping through 2 million pixels, you can do four pixels at a time (16 components).

For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.
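
To make the packing idea concrete, here is a minimal Python sketch. It only models the bookkeeping: a plain list of 16 floats stands in for one 512-bit register, and the names `pack_pixels`, `vector_mul`, and `multiply_images` are made up for illustration rather than taken from any Intel API. Real code would lean on a vectorizing compiler or intrinsics instead.

```python
LANES = 16  # sixteen 32-bit floats fit in one 512-bit register


def pack_pixels(pixels):
    """Flatten four RGBA pixels into one 16-lane 'register' (a plain list)."""
    assert len(pixels) * 4 == LANES
    return [component for pixel in pixels for component in pixel]


def vector_mul(a, b):
    """Model one SIMD multiply: all 16 lanes are multiplied in a single step."""
    return [x * y for x, y in zip(a, b)]


def multiply_images(src, tint):
    """Multiply src by tint, four pixels per step.

    Assumes both lists have the same length, and that it is a multiple of four.
    """
    out, steps = [], 0
    for i in range(0, len(src), 4):
        product = vector_mul(pack_pixels(src[i:i + 4]), pack_pixels(tint[i:i + 4]))
        steps += 1
        # Unpack the 16 lanes back into four RGBA tuples.
        out.extend(tuple(product[j:j + 4]) for j in range(0, LANES, 4))
    return out, steps


src = [(1.0, 0.5, 0.25, 1.0)] * 8   # eight identical example pixels
tint = [(0.5, 0.5, 0.5, 1.0)] * 8
result, steps = multiply_images(src, tint)
print(result[0], steps)
```

Eight pixels take two sixteen-wide steps here. Scaled up to the 2 million pixel loop, that is 500,000 wide multiplies instead of 8 million scalar ones.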

This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.

Source: Intel

Intel's Knights Landing (Xeon Phi, 2015) Details

Subject: General Tech, Graphics Cards, Processors | July 2, 2014 - 03:55 AM |
Tagged: Intel, Xeon Phi, xeon, silvermont, 14nm

AnandTech has just published a large editorial detailing Intel's Knights Landing. Mostly, it is stuff that we already knew from previous announcements and leaks, such as one by VR-Zone from last November (which we reported on). Officially, few details were given back then, except that it would be available as either a PCIe-based add-in board or as a socketed, bootable, x86-compatible processor based on the Silvermont architecture. Its many cores, threads, and 512-bit registers are each pretty weak compared to Haswell, for instance, but they combine for about 3 TFLOPS of double-precision performance.

itsbeautiful.png

Not enough graphs. Could use another 256...

The best way to imagine it is as a PC running a modern, Silvermont-based Atom processor -- only with up to 288 processors listed in your Task Manager (72 actual cores, each with four-way Hyper-Threading).

The main limitation of GPUs (and similar coprocessors), however, is memory bandwidth. GDDR5 is often the main bottleneck of compute performance and just about the first thing to be optimized. To compensate, Intel is packaging up to 16GB of memory (stacked DRAM) on the chip package itself. This RAM is based on "Hybrid Memory Cube" (HMC), developed by Micron Technology and supported by the Hybrid Memory Cube Consortium (HMCC). While the actual memory used in Knights Landing is derived from HMC, it uses a proprietary interface that is customized for Knights Landing. Its bandwidth is rated at around 500GB/s. For comparison, the NVIDIA GeForce GTX Titan Black has 336.4GB/s of memory bandwidth.
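
As a back-of-the-envelope comparison, the sketch below uses only the figures quoted above and treats the rated bandwidths as if they were fully achievable, which they rarely are. The helper name `sweep_time` is just for illustration.

```python
GB = 1e9  # decimal gigabytes, the usual convention for bandwidth ratings (assumed here)

knl_bandwidth = 500 * GB            # "around 500GB/s" for the on-package memory
titan_black_bandwidth = 336.4 * GB  # GDDR5 figure quoted for the Titan Black
knl_capacity = 16 * GB              # "up to 16GB" of on-package memory


def sweep_time(num_bytes, bytes_per_second):
    """Seconds needed to read num_bytes once at the rated bandwidth."""
    return num_bytes / bytes_per_second


ratio = knl_bandwidth / titan_black_bandwidth
full_sweep_ms = sweep_time(knl_capacity, knl_bandwidth) * 1000

print(f"{ratio:.2f}x the rated bandwidth of the Titan Black")
print(f"{full_sweep_ms:.0f} ms to read all 16GB once")
```

In other words, the on-package memory is rated at roughly one and a half times the Titan Black's bandwidth, and a single pass over the full 16GB would take on the order of 32 ms at the rated speed.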

Intel and Micron have worked together in the past. In 2006, the two companies formed "IM Flash" to produce the NAND flash for Intel and Crucial SSDs. Crucial is Micron's consumer-facing brand.

So the vision for Knights Landing seems to be the bridge between CPU-like architectures and GPU-like ones. For compute tasks, GPUs edge out CPUs by crunching through bundles of similar tasks at the same time, across many (hundreds or even thousands of) computing units. The difference with (at least socketed) Xeon Phi processors is that, unlike most GPUs, Intel does not rely upon APIs, such as OpenCL, and drivers to translate a handful of functions into bundles of GPU-specific machine language. Instead, especially if the Xeon Phi is your system's main processor, it will run standard, x86-based software. The software will just run slowly unless it can be vectorized and split across multiple threads. Obviously, OpenCL (and other APIs) would make this parallelization easy, by their host/kernel design, but it is apparently not required.

It is a cool way that Intel arrives at the same goal, based on their background. Especially when you mix-and-match Xeons and Xeon Phis on the same computer, it is a push toward heterogeneous computing -- with a lot of specialized threads backing up a handful of strong ones. I just wonder if providing a more-direct method of programming will really help developers finally adopt massively parallel coding practices.

I mean, without even considering GPU compute, how efficient is most software at splitting into even two threads? Four threads? Eight threads? Can this help drive heterogeneous development? Or will this product simply try to appeal to those who are already considering it?
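
One common way to put numbers on that question is Amdahl's law. It is not something the article's sources cite, just the usual rule of thumb: if a fraction p of a program's run time can be spread over n threads, the best-case speedup is 1 / ((1 - p) + p / n). The sketch below plugs in a few thread counts, including the 288 hardware threads mentioned earlier; the 90% figure is an arbitrary example, not a measurement.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Best-case speedup when only parallel_fraction of the run time scales with threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical program that is 90% parallelizable.
for threads in (2, 4, 8, 288):
    print(threads, round(amdahl_speedup(0.90, threads), 2))
```

With 10% of the work stuck on one thread, the speedup can never reach 10x no matter how many threads are added, which is why the serial portion matters so much on a chip built from many weak cores.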

Source: Intel

Podcast #306 - Budget PC Shootout, the Coolermaster Elite 110, AMD GameWorks competitor

Subject: General Tech | June 26, 2014 - 02:36 PM |
Tagged: xeon, video, seiki, podcast, nvidia, msi, Intel, HDMI 2.0, gt70 2pe, gt70, gameworks, FX-9590, displayport 1.3, coolermaster, amd, 4k

PC Perspective Podcast #306 - 06/26/2014

Join us this week as we discuss our Budget PC Shootout, the Coolermaster Elite 110, an AMD GameWorks competitor and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Josh Walrath, Jeremy Hellstrom, and Allyn Malventano

Program length: 1:19:12

Subscribe to the PC Perspective YouTube Channel for more videos, reviews and podcasts!!

You got your FPGA in my Xeon!

Subject: General Tech | June 19, 2014 - 01:19 PM |
Tagged: xeon, Intel, FPGA

Intel has just revealed what The Register is aptly referring to as the FrankenChip, a hybrid Xeon E5 and FPGA chip. This will allow large companies to access the power of a Xeon and offload some work onto an FPGA that they can program and optimize themselves. The low-power FPGA is actually in the same package as the Xeon, as opposed to Microsoft's recent implementation, which saw FPGAs added to PCIe slots. Intel's solution does not use up a slot and also offers direct access to the Xeon cache hierarchy and system memory via QPI, which will allow for increased performance. Another low-power shot has been fired at ARM's attempts to grow its share of the server market, but we shall see if the inherent complexity of programming an FPGA to work with an x86 processor is more or less attractive than switching to ARM.

"Intel has expanded its chip customization business to help it take on the hazy threat posed by some of the world's biggest clouds adopting low-power ARM processors."

Here is some more Tech News from around the web:

Tech Talk

Source: The Register

ADATA Moves Quickly on New DDR4 Specification

Subject: General Tech | April 3, 2014 - 03:00 PM |
Tagged: adata, ddr4, xeon

ADATA has been rather busy lately, with the release of the brand new Premier Pro SSD family and now the launch of DDR4 modules for the next generation of Xeon processors. These new DIMMs follow the current trend of energy efficiency in the server room by dropping the required voltage to 1.2V, a saving that can add up to quite a bit in a large server farm. The specified speed of 2133MHz is attractive for a first-generation server RDIMM, though there does not seem to be much information available on the timings.
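
For a rough sense of what that speed grade means, the peak transfer rate of one module can be estimated from the data rate and the bus width. This sketch assumes the usual 64-bit data path per DIMM (ECC bits excluded) and treats "2133MHz" as 2133 million transfers per second, which is how DDR speeds are normally quoted; neither detail is spelled out in the announcement, and `peak_bandwidth_gb_s` is a name made up for this example.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth of one memory channel, in decimal GB/s."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer / 1000


print(peak_bandwidth_gb_s(2133))
```

That works out to roughly 17GB/s per module at best, before timings and real-world overheads take their share.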

Taipei, Taiwan – April 3, 2014 - ADATA Technology, a leading manufacturer of high-performance DRAM modules and NAND Flash application products, has announced the launch of new DDR4 modules. Working in close cooperation with Intel, ADATA has successfully developed and launched DDR4 RDIMM (ECC Registered DIMM) that are fully compatible with the newly announced, next generation platform of Intel Xeon processor E5-2600 v3 product family.

Coming in densities of 4, 8 & 16 gigabytes, the new modules run at 1.2 volts, and at a frequency of 2133MHz. The higher clock frequencies, faster data transfer rates, and low voltage operation of DDR4 memory make it especially suited for use in the growing cloud server, storage and networking application fields.

According to Jacky Yang, Product Manager at ADATA: “We are enthusiastic about the great potential of this new DDR4 specification, and we will move quickly to bring this new technology to our customers. Currently in development are DDR4 versions of ECC SO-DIMM, VLP RDIMM, & LRDIMM, so we look forward to providing the stability and reliability that ADATA is known for in a low voltage and high performance package.”

Source: ADATA

The Xeon E5-2600 gets an Ivy Bridge EP upgrade

Subject: General Tech | September 11, 2013 - 01:30 PM |
Tagged: Ivy Bridge-EP, xeon, xeon E5-2600 v2, idf 2013

A second coming of the Xeon E5-2600 family uses the Ivy Bridge-EP architecture and will sport up to 12 cores, built on 22nm Tri-Gate technology. The three CPUs that will be arriving are each aimed at a separate market segment, with different core counts and TDPs. The lower-power chips will sport either 4 or 6 cores and TDPs of 40W to 80W, with the same 15MB L3 cache as SB-EP. The second has a 25MB L3 cache; 6, 8 or 10 cores; and TDPs ranging from 70W to 130W, and it uses the same interconnects as before. The last is the beast, with 12 cores, TDPs of 115W to 130W, and three rings linking the cores and cache segments with a split memory controller. Check The Register for more info on the high-powered end of IDF.

"Companies with workloads that like to ride on lots of threads and cores are going to be able to get a lot more bang for a two-socket box thanks to the launch of the "Ivy Bridge-EP" Xeon E5-2600 v2 processors by Intel."

Source: The Register

New Atom C2000 processors and 14nm Server CPUs from Intel

Subject: General Tech | July 23, 2013 - 03:44 PM |
Tagged: Intel, atom, 14nm, Avoton, Broadwell, Denverton, xeon, rangeley

Intel has spent the day announcing new products for the server room, from new Atoms to Xeons. The Atoms will bear the names Avoton and Rangeley: Avoton will deal with microservers, where power and heat are a major concern, while Rangeley will appear in network devices and possibly mobile communication devices. Avoton will be replacing a chip that has not yet been released; the 32nm Atom S1200 lineup is due out in the near future and will fill a new niche for Intel that Centerton failed to fill. The Register talks about it in a bit more depth here.

Slightly more powerful will be the new Broadwell Xeons and Denverton Atoms, the first SoC server chips from Intel to be manufactured on the 14nm process. We heard much less about these upcoming chips, which are due in 2014, but you can read what is available at The Inquirer.

"SAN FRANCISCO: CHIPMAKER Intel has revealed more details about its server processor roadmap, including its upcoming Atom chips codenamed Avoton and Rangeley and new 14nm Xeon and Atom parts codenamed Broadwell and Denverton, respectively."

Source: The Inquirer

Engineering Sample of Intel Core i7-4960X, Ivy Bridge-E

Subject: General Tech, Processors | July 18, 2013 - 07:41 PM |
Tagged: xeon, Ivy Bridge-E, Intel

Tom's Hardware acquired, from... somewhere, an early engineering sample of the upcoming Core i7-4960X. Intel was allegedly not involved with this preview and was thus, I would expect, not the supplier of the review unit. The introductory disclaimer alluded to some tensions between Intel and Tom's Hardware but, for us, it means we finally have a general ballpark of Ivy Bridge-E's performance. Sure, tweaks could be made before the end of this year, but this might be all we have to go on until then.

Benchmark charts: iTunes (single-threaded) and Handbrake (multi-threaded). Both images credit: Tom's Hardware.

When browsing through the benchmarks, I noticed three key points:

  • Single-threaded: slightly behind mainstream Haswell, similar to Sandy Bridge-E (SBE).
  • Multi-threaded: eight cores (Update 1: This was a 6-core part) are better than SBE, but marginal given the wait.
  • Power efficiency: Ivy Bridge-E handily wins, about 30% more performance per watt.

These results will likely be disappointing to enthusiasts who seek the highest performance, especially in single-threaded applications. Data centers, on the other hand, will likely be eager for Xeon variants of this architecture. The higher-tier Xeon E5 processors are still based on Socket 2011 Sandy Bridge-E including, for instance, those powering the highest performance Cluster Compute instances at Amazon Web Services.

But, for those who actually are salivating for the fastest at all costs, the wait for Ivy Bridge-E might as well be extended until Haswell-E reaches us. That architecture should provide significant increases in performance, single and multi-threaded, and is rumored to arrive just a year later. I may have just salted the wounds of those who purchased an X79 motherboard while awaiting Ivy Bridge-E, but waiting might just be the way to go for those who did not pre-invest in Ivy Bridge-E's promise.

Again, of course, this is all under the assumption that these benchmarks are still valid upon release. While a complete product re-bin is unlikely, we still do not know what clock rate the final silicon will be capable of supporting, officially or unofficially.

Keep calm, and carry a Haswell?

Podcast #246 - ASUS P8Z77-I Deluxe Mini-ITX motherboard, more Frame Rating, DirectX 12 and more!

Subject: General Tech | April 11, 2013 - 01:26 PM |
Tagged: video, xeon, thunderbolt, roccat, quadro, premiere, podcast, opencl, nerdytec, Ivy Bridge-E, haswell, frame rating, firepro, falcon ridge, DirectX 12, couchmaster, ASUS P8Z77-I Deluxe, amd

PC Perspective Podcast #246 - 04/11/2013

Join us this week as we discuss the ASUS P8Z77-I Deluxe Mini-ITX motherboard, more Frame Rating, DirectX 12 and more!

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

This Podcast is brought to you by MSI!

Program length: 1:01:46

  1. Winner last week? Mike McLaughlin!! Congrats!
  2. Week in Review:
  3. 0:24:00 NerdyTec COUCHMASTER
  4. News items of interest:
  5. 0:47:00 Hardware/Software Picks of the Week:
    1. Allyn: Ultra Brush dust remover
  6. 1-888-38-PCPER or podcast@pcper.com
  7. Closing/outro

IDF: Intel Announces Upcoming Haswell and Ivy Bridge-E Xeon Processors

Subject: General Tech | April 10, 2013 - 04:14 PM |
Tagged: xeon-ex, xeon-ep, xeon, server, Intel, HPC, haswell

Intel officially announced its next-generation Xeon processors at IDF Beijing today. The new lineup includes the Haswell-based Xeon E3 1200 V3 family on the low end, and the Ivy Bridge-EP Xeon E5 and Ivy Bridge-EX Xeon E7, aimed at the mid-range general purpose and high-end HPC markets respectively. Intel did not disclose pricing or details on the new chips (such as core counts, cache, clockspeeds, or number of SKUs). However, the x86 chip giant did state that the new chips are coming later this year, and it teased a few tidbits of information on the new Xeon chips.

The upcoming Xeon E3 processors will be part of the Xeon E3 1200 V3 family. These chips will be based on Haswell and are limited to one socket per board. Thanks to the Haswell architecture, Intel has managed to reduce power consumption by approximately 25% and increase video transcoding performance by about 25%. There will be at least one Xeon E3 1200 V3 series chip with a 13W TDP, for example.

Intel is also releasing a new media software development kit (SDK) for Linux and Windows machines that will provide a common platform for developers. It allows Intel to maximize the use of both the CPU and GPU for HD video transcoding, as well as to increase the number of simultaneous video transcodes over previous generations. The new Xeon E3 1200 V3 (Haswell) chips will be available sometime before the end of 2013.

The next-generation Xeon E5 chips will be based on the 22nm Ivy Bridge-EP architecture. They will be positioned at general purpose computing in data centers (and possibly high-end workstations), and will be limited to 2 sockets per motherboard. The new Xeon E5 processors will incorporate Intel Secure Key and OS Guard technologies. OS Guard is the evolution of the company's existing Intel Execute Disable Bit security technology. Intel is also including AES-NI (AES-New Instructions), to improve the hardware acceleration of AES encrypt/decrypt operations. These mid-range Xeon chips will be available in Q3 2013.

Finally, the top-end Xeon E7 processors will be based on the 22nm Ivy Bridge-EX architecture. The upcoming processors are intended for high performance server and supercomputing applications where scalability and performance are important. The Ivy Bridge-EX chips are compatible with motherboards that will have between 4 and 8 sockets and up to 12TB of RAM per node. Further, Intel has packed these processors with new RAS features, including Resilient System Technology and Resilient Memory Technology. The RAS features help ensure that stability and data integrity are maintained during calculations. Such features are important in scientific, real-time analytics, cloud computing, and banking applications, where performance and up-time are paramount and any errors could cost a company money. Intel has stated that the new Xeon E7 CPUs will be available in the fourth quarter of this year (Q4'13).

While I was hoping for more details as far as core count, clockspeeds, and pricing, the approximate release to market timeframe for the chips is known. Do you think you will be upgrading to the new Xeon chips later this year, or are your current processors fast enough for your server applications?

More information on the upcoming Xeon chips can be found in this Intel fact sheet (PDF).

Source: Intel (PDF)