Subject: General Tech | September 19, 2014 - 02:08 AM | Scott Michaud
Tagged: asm.js, simd, sse, avx, neon, arm, Intel, x86
Over at Microsoft's Modern.IE status page, many features are listed as being developed or considered. These include support for the Mozilla-developed ASM.js and for SIMD instructions, the latter of which is expected to be included in the 7th edition of ECMAScript. SIMD is the one that I wanted to touch on most. SIMD, which is implemented as SSE, AVX, NEON, and other instruction sets, performs many tasks in few actual instructions. For browsers which support it, this could allow for significant speed-ups in vector-based tasks, such as manipulating colors, vertices, and other data structures. Emscripten is in the process of integrating SIMD support, and the technology is designed to work with Web Workers. This allows SIMD-aware C and C++ code to be compiled into SIMD.JS and scale to multiple cores, if available (and they probably are these days).
In short, it will be possible to store and process colors, positions, forces, and other data structures as packed 4-vectors of 32-bit values, rather than arbitrary objects with properties that must be manipulated individually. This increases computation throughput for sufficiently large datasets, which should make game developers happy, in particular.
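To make the "packed 4-vector" idea a bit more concrete, here is a minimal sketch in C of what such a layout could look like. The `Float4` type and the `float4_add` and `float4_scale` helpers are hypothetical names invented for this illustration; they are not part of SIMD.JS, Emscripten, or any other library. The point is only that four 32-bit values sit back to back and every operation touches all four lanes the same way, which is the shape a compiler or runtime can map onto a single SIMD instruction.

```c
#include <stddef.h>

/* Hypothetical packed 4-vector: four 32-bit floats stored back to back. */
typedef struct {
    float lane[4];
} Float4;

/* Lane-wise add: out.lane[i] = a.lane[i] + b.lane[i] for all four lanes. */
Float4 float4_add(Float4 a, Float4 b)
{
    Float4 out;
    for (size_t i = 0; i < 4; ++i)
        out.lane[i] = a.lane[i] + b.lane[i];
    return out;
}

/* Lane-wise scale: every lane is multiplied by the same scalar. */
Float4 float4_scale(Float4 a, float s)
{
    Float4 out;
    for (size_t i = 0; i < 4; ++i)
        out.lane[i] = a.lane[i] * s;
    return out;
}
```

A position update would then read something like `position = float4_add(position, float4_scale(velocity, dt));`, with no per-property bookkeeping. Whether the loops actually become single SIMD instructions depends on the compiler and its flags.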
Apparently, some level of support has been in Firefox Nightly for the last several versions. No about:config manipulation is required; just call the appropriate function on window's SIMD subobject. Internet Explorer is considering it and Chromium is currently reviewing Intel's contribution.
Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 03:05 AM | Scott Michaud
Tagged: Xeon Phi, xeon, Intel, avx-512, avx
It is difficult to know what is actually new information in this Intel blog post, but it is interesting nonetheless. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and co-processors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, as well as Conflict Detection, Exponential and Reciprocal, and Prefetch, which are optional. That earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions will make their way to standard, CPU-like Xeons at around the same time.
This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.
|Conflict Detection Instructions||Yes||Yes||Yes|
|Exponential and Reciprocal Instructions||No||Yes||Yes|
|Byte and Word Instructions||Yes||No||No|
|Doubleword and Quadword Instructions||Yes||No||No|
|Vector Length Extensions||Yes||No||No|
Source: Intel AVX-512 Blog Post (and my understanding thereof).
So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and works just fine with having multiple cores, too). For an example: imagine that you have to multiply two colors together. The direct way to do it is to multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! included instructions to multiply packed pairs of single-precision values and, later, Intel's SSE included instructions to multiply two four-component vectors together. This reduces four similar instructions into a single operation on wider registers.
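As a rough sketch of that color example, the C below multiplies two RGBA colors both ways: four scalar multiplies, and one SSE multiply via the `_mm_mul_ps` intrinsic. The function names `color_mul_scalar` and `color_mul_simd` are made up for this illustration, and the code assumes colors are stored as four single-precision floats. On targets without SSE, the "SIMD" version simply falls back to the scalar one.

```c
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps */
#endif

/* Scalar version: four separate multiplies, one per channel. */
void color_mul_scalar(const float a[4], const float b[4], float out[4])
{
    out[0] = a[0] * b[0]; /* red   */
    out[1] = a[1] * b[1]; /* green */
    out[2] = a[2] * b[2]; /* blue  */
    out[3] = a[3] * b[3]; /* alpha */
}

/* SIMD version: one multiply covers all four channels when SSE is available. */
void color_mul_simd(const float a[4], const float b[4], float out[4])
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);            /* load r, g, b, a of the first color  */
    __m128 vb = _mm_loadu_ps(b);            /* load r, g, b, a of the second color */
    _mm_storeu_ps(out, _mm_mul_ps(va, vb)); /* four products in one instruction    */
#else
    color_mul_scalar(a, b, out);            /* no SSE on this target: scalar path  */
#endif
}
```

Both functions produce the same four products; the difference is only in how many multiply instructions it takes to get there.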
Smart compilers (and programmers, although that is becoming less common as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if it undergoes similar operations. AVX-512 allows for sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four single-precision RGBA data values, but you are looping through 2 million pixels, you can do four pixels at a time (16 components).
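The "four pixels at a time" loop could be sketched like this in portable C. The `tint_pixels` function and the `LANES` constant are hypothetical names for this illustration, and the code assumes pixels are stored as interleaved single-precision RGBA. It contains no AVX-512 intrinsics; it only arranges the work in blocks of sixteen floats so that a vectorizing compiler targeting AVX-512 has the opportunity to use one 512-bit operation per block. Whether it actually does depends on the compiler and flags.

```c
#include <stddef.h>

enum { LANES = 16 }; /* sixteen 32-bit floats fill one 512-bit register */

/* Multiply every RGBA pixel by a tint color, 16 components per main-loop pass. */
void tint_pixels(float *rgba, size_t pixel_count, const float tint[4])
{
    size_t n = pixel_count * 4; /* total number of float components */
    size_t i = 0;
    float tint16[LANES];

    /* Repeat the tint four times so one 16-wide block covers four whole pixels. */
    for (size_t k = 0; k < LANES; ++k)
        tint16[k] = tint[k % 4];

    /* Main loop: four pixels (16 components) per iteration. */
    for (; i + LANES <= n; i += LANES)
        for (size_t k = 0; k < LANES; ++k)
            rgba[i + k] *= tint16[k];

    /* Tail: leftover pixels when the count is not a multiple of four. */
    for (; i < n; ++i)
        rgba[i] *= tint[i % 4];
}
```

The inner loop has a fixed trip count of sixteen and no dependencies between iterations, which is the pattern auto-vectorizers tend to handle well.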
For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.
This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.