Subject: General Tech | September 19, 2014 - 02:08 AM | Scott Michaud
Tagged: asm.js, simd, sse, avx, neon, arm, Intel, x86
Over at Microsoft's Modern.IE status page, many features are listed as being developed or considered. This includes support for Mozilla-developed ASM.js and SIMD instructions, the latter expected to be included in ECMAScript 7th edition. SIMD is the one that I wanted to touch on most. It is implemented in hardware as SSE, AVX, NEON, and other instruction sets, and it applies one operation to several pieces of data at once, performing many tasks in few actual instructions. For browsers which support this, it could allow for significant speed-ups in vector-based tasks, such as manipulating colors, vertices, and other data structures. Emscripten is in the process of integrating SIMD support, and the technology is designed to work with Web Workers, allowing SIMD-aware C and C++ code to be compiled into SIMD.JS and scale to multiple cores, if available (and they probably are these days).
In short, it will be possible to store and process colors, positions, forces, and other data structures as packed 4-vectors of 32-bit values, rather than arbitrary objects with properties that must be manipulated individually. It increases computation throughput for sufficiently large datasets. This should make game developers happy, in particular.
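To illustrate the packed layout, here is a minimal Python sketch. It is not SIMD.js and it does not issue SIMD instructions; the helper names `pack` and `add4` are made up for this example. It only shows the data model: colors stored as one contiguous run of 32-bit floats and processed four lanes at a time, where each loop step stands in for what a single SIMD add would do.

```python
from array import array

LANES = 4  # one 4-vector = four 32-bit floats (e.g. R, G, B, A)


def pack(vectors):
    """Flatten a list of 4-tuples into one contiguous array of 32-bit floats."""
    flat = array("f")
    for v in vectors:
        if len(v) != LANES:
            raise ValueError("each vector needs exactly 4 components")
        flat.extend(v)
    return flat


def add4(a, b):
    """Lane-wise add of two packed arrays, one 4-vector per loop step.

    Each iteration is a scalar stand-in for a single four-lane SIMD add.
    """
    if len(a) != len(b) or len(a) % LANES:
        raise ValueError("inputs must be equal-length multiples of 4")
    out = array("f")
    for i in range(0, len(a), LANES):
        out.extend(a[i + k] + b[i + k] for k in range(LANES))
    return out


# Two RGBA colors, brightened by the same tint.
colors = pack([(0.5, 0.25, 0.0, 1.0), (1.0, 1.0, 1.0, 1.0)])
tint = pack([(0.25, 0.25, 0.25, 0.0)] * 2)
result = add4(colors, tint)
```

The point of the flat array is that neighbouring components sit next to each other in memory, which is the shape of data that SSE, AVX, and NEON are built to consume.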
Apparently, some level of support has been in Firefox Nightly for the last several versions. No about:config manipulation required, just call the appropriate function on window's SIMD subobject. Internet Explorer is considering it and Chromium is currently reviewing Intel's contribution.
NVIDIA Reveals 64-bit Denver CPU Core Details, Headed to New Tegra K1 Powered Devices Later This Year
Subject: Processors | August 12, 2014 - 01:06 AM | Tim Verry
Tagged: tegra k1, project denver, nvidia, Denver, ARMv8, arm, Android, 64-bit
During GTC 2014 NVIDIA launched the Tegra K1, a new mobile SoC that contains a powerful Kepler-based GPU. Initial processors (and the resultant design wins such as the Acer Chromebook 13 and Xiaomi Mi Pad) utilized four ARM Cortex-A15 cores for the CPU side of things, but later this year NVIDIA is deploying a variant of the Tegra K1 SoC that switches out the four A15 cores for two custom (NVIDIA developed) Denver CPU cores.
The custom 64-bit Denver CPU cores use a 7-way superscalar design and run a custom instruction set. Denver is a wide but in-order architecture that allows up to seven operations per clock cycle. NVIDIA is using a custom ISA and on-the-fly binary translation to convert ARMv8 instructions to microcode before execution. A software layer and 128MB cache enhance the Dynamic Code Optimization technology by allowing the processor to examine and optimize the ARM code, convert it to the custom instruction set, and store the converted microcode of frequently used applications in that cache (which can be bypassed for infrequently processed code). Using the wider execution engine and Dynamic Code Optimization (which is transparent to ARM developers and does not require updated applications), NVIDIA touts the dual Denver core Tegra K1 as being at least as powerful as the quad and octo-core competition.
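To make the translate-and-cache idea a little more concrete, here is a toy Python model. It is not NVIDIA's implementation: the class name, the hot threshold of three runs, and the "native:" rewriting are all assumptions made up for illustration. It only captures the general pattern described above, where code that runs often gets an optimized translation stored for reuse and rarely used code skips the cache.

```python
from collections import Counter


class TranslationCache:
    """Toy model: count how often each code block runs and keep an
    'optimized' translation once a block becomes hot."""

    def __init__(self, hot_threshold=3):
        self.hot_threshold = hot_threshold  # hypothetical value
        self.run_counts = Counter()         # block address -> times seen
        self.optimized = {}                 # block address -> cached translation

    def execute(self, address, arm_block):
        """Return which path handled the block:
        'cached', 'optimized-now', or 'direct'."""
        if address in self.optimized:
            return "cached"
        self.run_counts[address] += 1
        if self.run_counts[address] >= self.hot_threshold:
            self.optimized[address] = self._optimize(arm_block)
            return "optimized-now"
        return "direct"

    @staticmethod
    def _optimize(arm_block):
        # Placeholder for real optimization work: just tag each operation.
        return tuple(f"native:{op}" for op in arm_block)


cache = TranslationCache(hot_threshold=3)
hot_block = ["add", "mul"]
paths = [cache.execute(0x1000, hot_block) for _ in range(5)]
cold_path = cache.execute(0x2000, ["ldr"])
```

In this sketch the hot block runs directly twice, is optimized on its third run, and is served from the cache after that, while the block seen only once never enters the cache.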
Further, NVIDIA has claimed that at peak throughput (and in specific situations where application code and DCO can take full advantage of the 7-way execution engine) the Denver-based mobile SoC handily outpaces Intel’s Bay Trail, Apple’s A7 Cyclone, and Qualcomm’s Krait 400 CPU cores. In the results of a synthetic benchmark test provided to The Tech Report, the Denver cores were even challenging Intel’s Haswell-based Celeron 2955U processor. Keeping in mind that these are NVIDIA-provided numbers and likely the best results one can expect, Denver is still quite a bit more capable than existing cores. (Note that the Haswell chips would likely pull much farther ahead when presented with applications that cannot be easily executed in-order with limited instruction parallelism.)
NVIDIA is ratcheting up mobile CPU performance with its Denver cores, but it is also aiming for an efficient chip and has implemented several power saving tweaks. Beyond the decision to go with an in-order execution engine (with DCO hopefully mostly making up for that), the beefy Denver cores reportedly feature low latency power state transitions (e.g. between active and idle states), power gating, dynamic voltage, and dynamic clock scaling. The company claims that “Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.” In real terms this should mean that swapping the quad core A15 design for two Denver cores in the Tegra K1 will not result in significantly lower battery life. The two K1 variants are said to be pin compatible such that OEMs and developers can easily bring upgraded models to market with the faster Denver cores.
For those curious, in the Tegra K1, the two Denver cores (clocked at up to 2.5GHz) share a 16-way L2 cache and each have 128KB instruction and 64KB data L1 caches to themselves. The 128MB Dynamic Code Optimization cache is held in system memory.
Denver is the first (custom) 64-bit ARM processor for Android (with Apple’s A7 being the first 64-bit smartphone chip), and NVIDIA is working on supporting the next generation Android OS known as Android L.
The dual Denver core Tegra K1 is coming later this year and I am excited to see how it performs. The current K1 chip already has a powerful fully CUDA compliant Kepler-based GPU which has enabled awesome projects such as computer vision and even prototype self-driving cars. With the new Kepler GPU and Denver CPU pairing, I’m looking forward to seeing how NVIDIA’s latest chip is put to work and the kinds of devices it enables.
Are you excited for the new Tegra K1 SoC with NVIDIA’s first fully custom cores?
Subject: General Tech | July 31, 2014 - 01:41 PM | Jeremy Hellstrom
Tagged: amd, seattle, developer, arm, opteron a1100, Cortex A57
AMD has been teasing us with Seattle, their first ARM based CPU which Josh described back in May after AMD's presentation. The AMD Opteron A1100 series will come in 4 and 8 core versions with each core being a Cortex A57 that has up to 4MB of shared L2 and 8MB of shared L3 cache, support for DDR3 or DDR4, 8 lanes of PCIe 3.0, up to 8 SATA3 ports and two 10Gb Ethernet ports. The newly announced Dev Kit will ship with a 4 core version and it can be yours for a mere $3000 if your application is accepted by AMD. It will be very interesting to see how these are integrated into existing server rooms and applications though it is a pity we will have to wait for HSA support. Check out more at The Inquirer.
"AMD HAS RELEASED a developer kit for its AMD Opteron A1100 server processor series that features the first 64-bit ARM-based chips codenamed "Seattle"."
Here is some more Tech News from around the web:
- "BadUSB" Exploit Makes Devices Turn "Evil" @ "Slashdot"
- Bloodthirsty Apple fanbois TEAR OPEN new Macbook, bare its guts to world+dog @ The Register
- Intel to see difficulties shipping 40 million tablet CPUs in 2014 @ DigiTimes
- Apple to lay off 40 percent of Beats staff following $3bn acquisition @ The Inquirer
- Windows Phone 8.1 Update Adds "Live Folders", Expands Cortana Support @ DailyTech
Subject: General Tech, Processors, Mobile | July 11, 2014 - 04:58 PM | Scott Michaud
Tagged: x86, VIA, isaiah II, Intel, centaur, arm, amd
There might be a third, x86-compatible processor manufacturer who is looking at the mobile market. Intel has been trying to make headway, including the direct development of Android for the x86 architecture. The company also has a few design wins, mostly with Windows 8.1-based tablets but also the occasional Android-based models. Google is rumored to be preparing the "Nexus 8" tablet with one of Intel's Moorefield SoCs. AMD, the second-largest x86 processor manufacturer, is aiming their Mullins platform at tablets and two-in-ones, but cannot afford to play snowplow, at least not like Intel.
VIA, through their Centaur Technology division, is expected to announce their own x86-based SoC, too. Called Isaiah II, it is rumored to be a quad core, 64-bit processor with a maximum clock rate of 2.0 GHz. Its GPU is currently unknown. VIA sold their stake in S3 Graphics to HTC back in 2011, who then became majority shareholder over the GPU company. That said, HTC and VIA are very close companies. The chairwoman of HTC is the founder of VIA Technologies. The current President and CEO of VIA, who has been in that position since 1992, is her husband. I expect that the GPU architecture will be provided by S3, or will somehow be based on their technology. I could be wrong. Both companies will obviously do what they think is best.
It would make sense, though, especially if it benefits HTC with cheap but effective SoCs for Android and "full" Windows (not Windows RT) devices.
Or this announcement could be larger than it would appear. Three years ago, VIA filed for a patent which described a processor that can read both x86 and ARM machine language and translate it into its own, internal microinstructions. The Centaur Isaiah II could reasonably be based on that technology. If so, this processor would be able to support either version of Android. Or, after Intel built up the Android x86 code base, maybe they shelved that initiative (or just got that patent for legal reasons).
But what about Intel? Honestly, I see this being a benefit for the behemoth. Extra x86-based vendors will probably grow the overall market share, compared to ARM, by helping with software support. Even if it is compatible with both ARM and x86, what Intel needs right now is software. They can only write so much of it themselves. It is possible that VIA, being the maker of the original netbook processor, could disrupt the PC market with both x86 and ARM compatibility, but I doubt it.
Centaur Technology, the relevant division of VIA, will make their announcement in less than 51 days.
Subject: General Tech | July 10, 2014 - 01:17 PM | Ken Addison
Tagged: podcast, video, Intel, Mantle, amd, nvidia, XSPC, quantum dots, western digital, My Cloud Mirror, A10-7850K, Kaveri, arm, quakecon
PC Perspective Podcast #308 - 07/10/2014
Join us this week as we discuss Intel using Mantle, XSPC Watercooling Kits, Quantum Dots, and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Josh Walrath, Jeremy Hellstrom, Allyn Malventano, and Morry Tietelman
Subject: General Tech | July 8, 2014 - 01:40 PM | Jeremy Hellstrom
Tagged: SoC, Panasonic, Intel, arm
Intel has been fabbing ARM chips for Altera since the end of last year after their unprecedented move of allowing non-Intel designs into their fabs. This decision allowed Intel to increase the percentage of time the fabs were active, as they are no longer able to keep them at full capacity with their own chips and have even mothballed the new Fab 42 in Arizona. Altera is a good customer, as are Tabula, Netronome and Microsemi but together they are still not enough to bring Intel's capacity close to 100%. The Register has reported on a new contract with the ink still wet from signing; Panasonic will now be using Intel's Fabs for their ARM based SoCs. The immense size of Panasonic should keep Intel busy and ensure that they continue to make mountains of money licensing their 14nm-process tri-Gate transistors as well as the Fab time.
"Intel has notched up another customer for its fledgling Foundry business as it tries to make money out of its manufacturing and engineering expertise besides x86 processor sales.
The world's most valuable chip manufacturer said on Monday that Panasonic's audio-visual gear will make future system-on-chips (SoCs) in Intel's factories."
Here is some more Tech News from around the web:
- Fridge hacked. Car hacked. Next up, your LIGHT BULBS @ The Register
- RS Components shows off 3D printer line-up @ The Inquirer
- Red Hat Enterprise Linux 7 reaches general release @ The Inquirer
- Meet Xiki, the Revolutionary Command Shell for Linux and Mac OS X @ Linux.com
- Anime Expo 2014 – Part 3: Next-Level Cosplays @ Legit Reviews
Subject: General Tech | July 3, 2014 - 12:39 PM | Jeremy Hellstrom
Tagged: linux, linaro, juno, google, armv8-a, ARMv8, arm, Android
By now you should have read Ryan's post or listened to Josh talk about Juno on the PCPer Podcast, but if you find yourself hungry for more information you can visit The Tech Report. They discuss how the 64-bit Linaro is already able to take advantage of one of big.LITTLE's power efficiency optimizations, called Global Task Scheduling. As Linaro releases monthly updates you can expect to see more features and better implementations as their take on the Android Open Source Project evolves. Expect to see more of Juno and ARMv8 on review sites as we work out just how to benchmark these devices.
"ARM has created its own custom SoC and platform for 64-bit development. The folks at Linaro have used this Juno dev platform to port an early version of Android L to the ARMv8 instruction set. Here's a first look at the Juno hardware and the 64-bit software it enables."
Here is some more Tech News from around the web:
- Running Cisco's VoIP manager? Four words you don't want to hear: 'Backdoor SSH root key' @ The Register
- Latest Nexus 9 leak outs tablet's 5GB RAM, 2560x1600 screen @ The Inquirer
- Twitter takes on GOOGLE, swallows wannabe YouTube firm @ The Register
- Samsung will halt plasma TV production before the end of the year @ The Inquirer
- Previously male-only Hearthstone competition now open to all genders @ Polygon
Subject: Mobile | July 2, 2014 - 12:00 PM | Ryan Shrout
Tagged: linux, linaro, juno, google, armv8-a, ARMv8, arm, android l
Even though Apple has been shipping a 64-bit capable SoC since the release of the A7 part in September of 2013, the Android market has yet to see its first consumer 64-bit SoC release. That is about to change as we progress through the rest of 2014 and ARM is making sure that major software developers have the tools they need to be ready for the architecture shift. That help will come in the form of the Juno ARM Development Platform (ADP) and 64-bit ready software stack.
Apple's A7 is the first core to implement ARMv8 but companies like Qualcomm, NVIDIA and of course ARM have their own cores based on the 64-bit architecture. Much like we saw with the 64-bit transition in the x86 ecosystem, ARMv8 will improve access to large datasets and will result in gains in performance thanks to increased register sizes, larger virtual address spaces above 4GB and more. ARM also improved performance of NEON (SIMD) and cryptography support while they were in there fixing up the house.
The Juno platform is the first 64-bit development platform to come directly from ARM and combines a host of components to create a reference hardware design for integrators and developers to target moving forward. Featuring a test chip built around Cortex-A57 (dual core), Cortex-A53 (quad core) and Mali-T624 (quad core), Juno allows software to target 64-bit development immediately without waiting for other SoC vendors to have product silicon ready. The hardware configuration implements big.LITTLE, OpenGL ES3.0 support, thermal and power management, Secure OS capability and more. In theory, ARM has built a platform that will be very similar to SoCs built by its partners in the coming months.
ARM isn't quite talking about the specific availability of the Juno platform, but for the target audience ARM should be able to provide the number of development platforms necessary. Juno enables software development for 64-bit kernels, drivers, tools and virtual machine hypervisors but it's not necessarily going to help developers writing generic applications. Think of Juno as the development platform for the low level designers and coders, not those that are migrating Facebook or Flappy Bird to your next smartphone.
The Juno platform helps ARM in a couple of specific ways. From a software perspective, it creates a common foundation for the ARMv8 ecosystem and allows developer access to silicon before ARM's partners have prepared their own platforms. ARM claims that Juno is a fairly "neutral" platform so software developers won't feel like they are being funneled in one direction. I'd be curious what ARM's partners actually think about that, though, with the inclusion of Mali graphics, a product that ARM is definitely trying to promote in a competitive market.
Though the primary focus might be software, hardware partners will be able to benefit from Juno. On this board they will find the entire ARMv8 IP portfolio tested up to modern silicon. This should enable hardware vendors to see A57 and A53 working, in action and with the added benefit of a full big.LITTLE implementation. The hope is that this will dramatically accelerate the time to market for future 64-bit ARM designs.
The diagram above shows the full breakdown of the Juno SoC as well as some of the external connectivity on the board itself. The memory system is built around 8GB of DDR3 running at 12.8 GB/s and is extensible through the PCI Express slots and the FPGA options.
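As a sanity check on that 12.8 GB/s figure, peak memory bandwidth is just transfers per second times bytes per transfer times the number of channels. The sketch below assumes DDR3-1600 (1600 million transfers per second) on a single 64-bit interface. Neither detail is stated above, so treat both as assumptions; they are simply one common configuration that lands on the quoted number. The function name is made up for this example.

```python
def peak_bandwidth_gb_s(transfers_per_second, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfers_per_second * bytes_per_transfer * channels / 1e9


# Assumed configuration: DDR3-1600, one 64-bit channel.
juno_bandwidth = peak_bandwidth_gb_s(1600e6, 64)
```

Under those assumptions the result is 12.8 GB/s. Real-world throughput will be lower, since this is only the theoretical ceiling.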
Of course hardware is only half the story - today Linaro is releasing a 64-bit port of the Android Open Source Project (AOSP) that will run on Juno. That, along with the Linux kernel v3.14 with ARMv8-A support should give developers the tools needed to write the applications, middleware and kernels for future hardware. Also worth noting on June 25th at Google I/O was the announcement of developer access coming for Android L. This build will support ARMv8-A as well.
The switch to 64-bit technology on ARM devices isn't going to happen overnight but ARM and its partners have put together a collective ecosystem that will allow the software and hardware developers to make the transition as quick and, most importantly, as painless as possible. With outside pressure pushing on ARM and its low power processor designs, it is taking more of its fate in its own hands, pushing the 64-bit transition forward at an accelerated pace. This helps ARM in the mobile space, the consumer space as well as the enterprise markets, a key market for SoC growth.
Subject: Editorial, General Tech, Graphics Cards, Processors, Chipsets | June 13, 2014 - 06:45 PM | Scott Michaud
Tagged: x86, restructure, gpu, arm, APU, amd
According to VR-Zone, AMD has reworked their business, last Thursday, sorting each of their projects into two divisions and moving some executives around. The company is now segmented into the "Enterprise, Embedded, and Semi-Custom Business Group", and the "Computing and Graphics Business Group". The company used to be divided between "Computing Solutions", which handled CPUs, APUs, chipsets, and so forth, "Graphics and Visual Solutions", which is best known for GPUs but also contains console royalties, and "All Other", which was... everything else.
Lisa Su, former general manager of global business, has moved up to Chief Operating Officer (COO), along with other changes.
This restructure makes sense for a couple of reasons. First, it pairs some unprofitable ventures with other, highly profitable ones. AMD's graphics division has been steadily adding profitability to the company while its CPU division has been mostly losing money. Secondly, "All Other" is about as nebulous as a name can get. Instead of having three unbalanced divisions, one of which makes no sense to someone glancing at AMD's quarterly earnings reports, they should now have two, roughly equal segments.
At the very least, it should look better to an uninformed investor. Someone who does not know the company might look at the sheet and assume that, if AMD divested from everything except graphics, the company would be profitable. If, you know, they did not know that console contracts came into their graphics division because their compute division had x86 APUs, and so forth. This setup is now more aligned to customers, not products.
Subject: Processors, Mobile | June 4, 2014 - 11:00 AM | Ryan Shrout
Tagged: computex, computex 2014, arm, cavium, thunderx
While much of the news coming from Computex was centered around PC hardware, many of ARM's partners are making waves as well. Take Cavium for example, introducing the ThunderX CN88XX family of processors. With a completely custom ARMv8 architectural core design, the ThunderX processors will range from 24 to 48 cores and are targeted at large volume servers and cloud infrastructure. 48 cores!
The ThunderX family will be the first SoC to scale up to 48 cores and with a clock speed of 2.5 GHz and 16MB of L2 cache, should offer some truly impressive performance levels. Cavium claims to be the first socket-coherent ARM processor as well, using the Cavium Coherent Processor Interconnect. The I/O capacity stretches into the hundreds of Gigabits and quad channel DDR3 and DDR4 memory speeds up to 2.4 GHz keep the processors fed with work.
Here is the breakdown on the ThunderX families.
ThunderX_CP: Up to 48 highly efficient cores along with integrated virtSOC, dual socket coherency, multiple 10/40 GbE and high memory bandwidth. This family is optimized for private and public cloud web servers, content delivery, web caching, search and social media workloads.
ThunderX_ST: Up to 48 highly efficient cores along with integrated virtSOC, multiple SATAv3 controllers, 10/40 GbE & PCIe Gen3 ports, high memory bandwidth, dual socket coherency, and scalable fabric for east-west as well as north-south traffic connectivity. This family includes hardware accelerators for data protection/integrity/security, user to user efficient data movement (RoCE) and compressed storage. This family is optimized for Hadoop, block & object storage, distributed file storage and hot/warm/cold storage type workloads.
ThunderX_SC: Up to 48 highly efficient cores along with integrated virtSOC, 10/40 GbE connectivity, multiple PCIe Gen3 ports, high memory bandwidth, dual socket coherency, and scalable fabric for east-west as well as north-south traffic connectivity. The hardware accelerators include Cavium’s industry leading, 4th generation NITROX and TurboDPI technology with acceleration for IPSec, SSL, Anti-virus, Anti-malware, firewall and DPI. This family is optimized for Secure Web front-end, security appliances and Cloud RAN type workloads.
ThunderX_NT: Up to 48 highly efficient cores along with integrated virtSOC, 10/40/100 GbE connectivity, multiple PCIe Gen3 ports, high memory bandwidth, dual socket coherency, and scalable fabric with feature rich capabilities for bandwidth provisioning, QoS, traffic shaping and tunnel termination. The hardware accelerators include high packet throughput processing, network virtualization and data monitoring. This family is optimized for media servers, scale-out embedded applications and NFV type workloads.
We spoke with ARM earlier this year about its push into the server market and it is partnerships like these that will begin the ramp up to widespread adoption of ARM-based server infrastructure. The ThunderX family will begin sampling in early Q4 2014 and production should be available by early 2015.