JavaScript Is Still Getting Faster...

Subject: General Tech | September 18, 2014 - 11:08 PM |
Tagged: asm.js, simd, sse, avx, neon, arm, Intel, x86

The language that drives the client-side web (and the server side, with Node.js) is continually being improved. Love it or hate it, JavaScript is everywhere and approaching native execution performance. You can write it yourself or compile into it from another, LLVM-compatible language through Emscripten. In fact, initiatives like asm.js actually favor compiled code, because a compiler sticks to a strict, statically-typed subset of the language and never accidentally steps into slow, dynamic functionality.
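To make the compile-to-JavaScript path concrete, here is a trivial C function of the kind Emscripten turns into asm.js; the file name and build command are illustrative assumptions, not from the article.

    /* square.c - a trivial function for Emscripten to compile to asm.js.
     * Built with something like: emcc square.c -O2
     * The generated JavaScript annotates every value with type coercions
     * (such as "x|0" for integers), which is what lets the engine compile
     * it ahead of time without hitting slow, dynamic paths. */
    int square(int x)
    {
        return x * x;
    }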

javascript-logo.png

Over at Microsoft's Modern.IE status page, many features are listed as in development or under consideration. These include support for Mozilla-developed asm.js and, expected to be included in the 7th edition of ECMAScript, SIMD instructions. The latter is the one I wanted to touch on most. SIMD, implemented in hardware as SSE, AVX, NEON, and other instruction sets, performs many similar operations with few actual instructions. For browsers that support it, this could allow significant speed-ups in vector-based tasks, such as manipulating colors, vertices, and other data structures. Emscripten is in the process of integrating SIMD support, and the technology is designed to work with Web Workers, allowing SIMD-aware C and C++ code to be compiled into SIMD.js and scale across multiple cores, if available (and these days, they probably are).

In short, it will be possible to store and process colors, positions, forces, and other data structures as packed 4-vectors of 32-bit values, rather than as arbitrary objects with properties that must be manipulated individually. That significantly increases computation throughput on large datasets. Game developers, in particular, should be happy.
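As a rough sketch of what "SIMD-aware" C looks like, here is a loop over packed four-component force vectors using SSE intrinsics, the sort of code Emscripten's in-progress SIMD support is meant to translate into SIMD.js operations. The function name and data layout are my own illustration, and since that support was still being integrated at the time, treat this as a sketch of the intent rather than a guaranteed mapping.

    #include <xmmintrin.h>  /* SSE intrinsics */

    /* Scale an array of packed (x, y, z, w) force vectors by a factor.
     * "count" is the number of 4-float vectors, and the data is assumed
     * to be contiguous, e.g. backed by a Float32Array on the JS side. */
    void scale_forces(float *forces, int count, float factor)
    {
        __m128 f = _mm_set1_ps(factor);              /* broadcast factor to all 4 lanes */
        for (int i = 0; i < count; i++) {
            __m128 v = _mm_loadu_ps(forces + 4 * i); /* load one 4-vector */
            v = _mm_mul_ps(v, f);                    /* one multiply covers all 4 lanes */
            _mm_storeu_ps(forces + 4 * i, v);        /* store it back */
        }
    }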

Apparently, some level of support has been in Firefox Nightly for the last several versions. No about:config manipulation is required; just call the appropriate functions on the window's SIMD object. Internet Explorer is considering it, and Chromium is currently reviewing Intel's contribution.

Source: Modern.IE

Intel AVX-512 Expanded

Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 12:05 AM |
Tagged: Xeon Phi, xeon, Intel, avx-512, avx

It is difficult to know what is actually new information in this Intel blog post, but it is interesting nonetheless. Its topic is the AVX-512 extension to x86, designed for Xeon processors and Xeon Phi processors and co-processors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, along with the optional Conflict Detection, Exponential and Reciprocal, and Prefetch instruction subsets. That earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions would make their way to standard Xeon CPUs at around the same time.

Intel_Xeon_Phi_Family.jpg

This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.

 
Instruction Subset                        Xeon         Xeon Phi     Xeon Phi
                                          Processors   Processors   Coprocessors (AIBs)
Foundation Instructions                   Yes          Yes          Yes
Conflict Detection Instructions           Yes          Yes          Yes
Exponential and Reciprocal Instructions   No           Yes          Yes
Prefetch Instructions                     No           Yes          Yes
Byte and Word Instructions                Yes          No           No
Doubleword and Quadword Instructions      Yes          No           No
Vector Length Extensions                  Yes          No           No

Source: Intel AVX-512 Blog Post (and my understanding thereof).

So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has benefits similar to those of multiple cores. It is not as flexible as having multiple, independent cores, but it is easier to implement (and works just fine alongside multiple cores, too). For example: imagine that you have to multiply two colors together. The direct way to do it is to multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! and, later, Intel's SSE included instructions to multiply two four-component vectors together. This reduces four similar instructions to a single operation between wider registers.
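In code, the difference looks roughly like this (a minimal sketch with SSE intrinsics; the Color struct and function names are illustrative assumptions): the direct version issues four multiplies, while the SSE version issues one wide multiply.

    #include <xmmintrin.h>  /* SSE intrinsics */

    typedef struct { float r, g, b, a; } Color;  /* illustrative layout */

    /* The direct way: four separate multiplies. */
    Color mul_scalar(Color x, Color y)
    {
        Color out = { x.r * y.r, x.g * y.g, x.b * y.b, x.a * y.a };
        return out;
    }

    /* The SSE way: one multiply across a 128-bit register. */
    Color mul_sse(Color x, Color y)
    {
        Color out;
        __m128 vx = _mm_loadu_ps(&x.r);             /* pack r, g, b, a together */
        __m128 vy = _mm_loadu_ps(&y.r);
        _mm_storeu_ps(&out.r, _mm_mul_ps(vx, vy));  /* single wide multiply */
        return out;
    }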

Smart compilers (and programmers, although hand-vectorizing is becoming less common because compilers are pretty good, especially when developers are not fighting them) are able to pack seemingly unrelated data together, too, if it undergoes similar instructions. AVX-512 allows sixteen 32-bit pieces of data to be worked on at the same time. If each pixel has only four single-precision RGBA values but you are looping through 2 million pixels, do four pixels at a time (16 components), as in the sketch below.
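Here is a hedged sketch of that four-pixels-at-a-time loop using AVX-512 Foundation intrinsics; the function and buffer names are mine, not Intel's.

    #include <immintrin.h>  /* AVX-512 intrinsics (requires AVX-512F) */

    /* Multiply two buffers of RGBA float pixels, four pixels (16 floats)
     * per iteration. "count" is the pixel count and is assumed to be a
     * multiple of four here; a real loop would handle the remainder. */
    void mul_pixels(const float *a, const float *b, float *out, long count)
    {
        for (long i = 0; i < count * 4; i += 16) {
            __m512 va = _mm512_loadu_ps(a + i);  /* 4 pixels from a */
            __m512 vb = _mm512_loadu_ps(b + i);  /* 4 pixels from b */
            _mm512_storeu_ps(out + i, _mm512_mul_ps(va, vb)); /* 16 lanes at once */
        }
    }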

For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.

This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.

Source: Intel