Podcast #307 - EVGA Torq X10 Mouse, Samsung 850 Pro, OCZ RevoDrive 350 and more!

Subject: General Tech | July 3, 2014 - 03:17 PM |
Tagged: podcast, video, evga, TORQ X10, Samsung, 850 PRO, ocz, RevoDrive 350, Silverstone, Nightjar, knights landing, Xeon Phi

PC Perspective Podcast #307 - 07/03/2014

Join us this week as we discuss the EVGA Torq X10 Mouse, Samsung 850 Pro, OCZ RevoDrive 350 and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Josh Walrath, Jeremy Hellstrom, and Morry Tietelman

Program length: 1:19:27

Subscribe to the PC Perspective YouTube Channel for more videos, reviews and podcasts!!

 

Intel's Knights Landing (Xeon Phi, 2015) Details

Subject: General Tech, Graphics Cards, Processors | July 2, 2014 - 03:55 AM |
Tagged: Intel, Xeon Phi, xeon, silvermont, 14nm

Anandtech has just published a large editorial detailing Intel's Knights Landing. Mostly, it is stuff that we already knew from previous announcements and leaks, such as one by VR-Zone from last November (which we reported on). Officially, few details were given back then, except that it would be available as either a PCIe-based add-in board or as a socketed, bootable, x86-compatible processor based on the Silvermont architecture. Its many cores, threads, and 512 bit registers are each pretty weak, compared to Haswell, for instance, but combine to about 3 TFLOPs of double precision performance.

itsbeautiful.png

Not enough graphs. Could use another 256...

The best way to imagine it is running a PC with a modern, Silvermont-based Atom processor -- only with up to 288 processors listed in your Task Manager (72 actual cores with quad HyperThreading).

The main limitation of GPUs (and similar coprocessors), however, is memory bandwidth. GDDR5 is often the main bottleneck of compute performance and just about the first thing to be optimized. To compensate, Intel is packaging up to 16GB of memory (stacked DRAM) on the chip itself. This RAM is based on "Hybrid Memory Cube" (HMC), developed by Micron Technology and supported by the Hybrid Memory Cube Consortium (HMCC). While the actual memory used in Knights Landing is derived from HMC, it uses a proprietary interface that is customized for Knights Landing. Its bandwidth is rated at around 500GB/s. For comparison, the NVIDIA GeForce Titan Black has 336.4GB/s of memory bandwidth.

Intel and Micron have worked together in the past. In 2006, the two companies formed "IM Flash" to produce the NAND flash for Intel and Crucial SSDs. Crucial is Micron's consumer-facing brand.

intel-knights-landing.jpg

So the vision for Knights Landing seems to be the bridge between CPU-like architectures and GPU-like ones. For compute tasks, GPUs edge out CPUs by crunching through bundles of similar tasks at the same time, across many (hundreds of, thousands of) computing units. The difference with (at least socketed) Xeon Phi processors is that, unlike most GPUs, Intel does not rely upon APIs, such as OpenCL, and drivers to translate a handful of functions into bundles of GPU-specific machine language. Instead, especially if the Xeon Phi is your system's main processor, it will run standard, x86-based software. The software will just run slowly, unless it is capable of vectorizing itself and splitting across multiple threads. Obviously, OpenCL (and other APIs) would make this parallelization easy, by their host/kernel design, but it is apparently not required.
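To make the "splitting across multiple threads" point concrete, here is a minimal Python sketch that cuts one large sum into slices and hands each slice to a worker thread. The helper names and the worker count of 8 are assumptions for illustration only. CPython threads will not actually speed up this arithmetic, so the sketch shows only the decomposition that software has to perform before many cores can help.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, parts):
    """Split data into `parts` nearly equal contiguous slices."""
    size, extra = divmod(len(data), parts)
    start = 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        yield data[start:end]
        start = end


def parallel_sum(data, workers=8):
    """Sum each slice on its own worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunked(data, workers))
        return sum(partials)


values = list(range(1_000_000))
print(parallel_sum(values) == sum(values))
```

Software that cannot be restructured this way has nothing to hand to the other cores, which is the "runs slowly" case described above.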

It is a cool way that Intel arrives at the same goal, based on their background. Especially when you mix-and-match Xeons and Xeon Phis on the same computer, it is a push toward heterogeneous computing -- with a lot of specialized threads backing up a handful of strong ones. I just wonder if providing a more-direct method of programming will really help developers finally adopt massively parallel coding practices.

I mean, without even considering GPU compute, how efficient is most software at splitting into even two threads? Four threads? Eight threads? Can this help drive heterogeneous development? Or will this product simply try to appeal to those who are already considering it?

Source: Intel

Intel Xeon Phi to get Serious Refresh in 2015?

Subject: General Tech, Graphics Cards, Processors | November 28, 2013 - 03:30 AM |
Tagged: Intel, Xeon Phi, gpgpu

Intel was testing the waters with their Xeon Phi co-processor. Based on the architecture designed for the original Pentium processors, it was released in six products ranging from 57 to 61 cores and 6 to 16GB of RAM. This led to double precision performance of between 1 and 1.2 TFLOPs. It was fabricated using their 22nm tri-gate technology. All of this was under the Knights Corner initiative.

Intel_Xeon_Phi_Family.jpg

In 2015, Intel plans to have Knights Landing ready for consumption. A modified Silvermont architecture will replace the many simple (basically 15 year-old) cores of the previous generation; up to 72 Silvermont-based cores (each with 4 threads) in fact. It will introduce the AVX-512 instruction set. AVX-512 allows applications to vectorize 8 64-bit (double-precision float or long integer) or 16 32-bit (single-precision float or standard integer) values.

In other words, packing a bunch of related problems into a single instruction.
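As a rough illustration of that lane arithmetic, the sketch below divides the 512-bit register width by the element width, then emulates one packed add over eight doubles in plain Python. The function names are hypothetical and nothing here touches real AVX-512 hardware; it only shows the packing idea.

```python
REGISTER_BITS = 512  # AVX-512 register width, per the article


def lanes(element_bits: int) -> int:
    """Number of values of the given width that fit in one 512-bit register."""
    return REGISTER_BITS // element_bits


def vector_add(a: list[float], b: list[float]) -> list[float]:
    """Emulate one packed add: every lane is handled by the 'same instruction'."""
    assert len(a) == len(b) == lanes(64)
    return [x + y for x, y in zip(a, b)]


print(lanes(64))  # 64-bit doubles or long integers per register
print(lanes(32))  # 32-bit floats or standard integers per register
print(vector_add([1.0] * 8, [2.0] * 8))
```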

The most interesting part? Two versions will be offered: Add-In Boards (AIBs) and a standalone CPU. It will not require a host CPU, because of its x86 heritage, if your application is entirely suited for an MIC architecture; unlike a Tesla, it is bootable with existing and common OSes. It can also be paired with standard Xeon processors if you would like a few strong threads with the 288 (72 x 4) the Xeon Phi provides.

And, while I doubt Intel would want to cut anyone else in, VR-Zone notes that this opens the door for AIB partners to make non-reference cards and manage some level of customer support. I'll believe a non-Intel branded AIB only when I see it.

Source: VR-Zone

Intel claims Knight's Landing will slay HUMA and bare all CUDA's flaws

Subject: General Tech | November 20, 2013 - 12:53 PM |
Tagged: Xeon Phi, knights landing, Intel, 14nm

Intel has been talking up the Xeon Phi, first of the Knights Landing chips, which will arrive in the not too distant future. The new architecture is touted as a return to homogeneous systems architecture, performing parallel processing across its many cores (61 is the number currently being tossed around) at a level of performance that will exceed the GPU-accelerated heterogeneous architecture being pushed by AMD and NVIDIA. Whether this is true remains to be seen, but many server builders may prefer the familiar CPU-only architecture, and since at least some of the Phis will be available in rack-mounted form and not just as add-in cards, they may choose Intel out of habit. You can also read about Micron's Automata Processor, which The Register reports can outperform a 48-chip cluster of Intel Xeon 5650s in certain scenarios.

KNOTS01.jpg

"From Intel's point of view, today's hottest trend in high-performance computing – GPU acceleration – is just a phase, one that will be superseded by the advent of many-core CPUs, beginning with Chipzilla's next-generation Xeon Phi, codenamed "Knights Landing"."

Here is some more Tech News from around the web:

Tech Talk

Source: The Register

Dell Unveils New T3610, T5610, and T7610 Workstations

Subject: General Tech, Systems | September 9, 2013 - 09:00 AM |
Tagged: Xeon Phi, workstation, quadro, micron, LSI, k6000, Ivy Bridge-EP, firepro, dell

Along with the release of new mobile workstations, Dell announced three new desktop workstations. Specifically, Dell is launching the T3610, T5610, and T7610 PC workstations under its Precision series. The new systems reside in redesigned cases with improved cable management, removable power supplies (tool-less, removable by sliding out from rear panel), and in the case of the T7610 removable hard drives. All of the new Precision workstations have been outfitted with Intel's latest Ivy Bridge-EP based Xeon processors, ECC memory, workstation-class graphics cards from AMD and NVIDIA, Xeon Phi accelerator card options, LSI hardware RAID controllers, and updated software solutions from Intel and Dell.

Dell Precision T3610 T5610 T7610.jpg

The new Precision workstations side-by-side. From left to right: T3610, T5610, and T7610.

Dell's Precision T3610 is the mid-tower system of the group, powered by single socket Xeon E5-2600 v2 hardware that further supports up to 128GB DDR3 ECC memory, two graphics cards, three 3.5” hard drives, and four 2.5” SSDs.

Dell Precision T3610 Single Xeon Ivy Bridge-EP Workstation.jpg

The Precision T3610, a new single socket, mid-range workstation.

The Precision T5610 ups the ante to a dual socket IVB-EP processor system that can be configured with up to 128GB DDR3 ECC memory, two AMD FirePro or NVIDIA Quadro (e.g. Quadro K5000) graphics cards, a Tesla K20C accelerator card, three 3.5” hard drives, and four 2.5” solid state drives.

Finally, the T7610 workstation supports dual Intel Ivy Bridge-EP Xeon E5-2600 v2 series processors (up to 24 cores per system), up to 512GB DDR3 ECC memory, three graphics cards (including two NVIDIA Quadro K6000 cards), four 3.5” hard drives, and eight 2.5” SSDs.

Dell Precision T5610 Dual Xeon Ivy Bridge-EP Workstation.jpg

Dell's Precision T5610 dual socket workstation.

The new Precision workstations can also be configured with an Intel Xeon Phi 3120A accelerator card in lieu of a Tesla card. The choice will mainly depend on the applications being used and the development resources and expertise available. Both options are designed to accelerate highly parallel workloads in applications that have been compiled to support them. Further, users can add an LSI hardware RAID card with 1GB of onboard memory to the systems. Dell further offers a Micron P320h PCI-E SSD that, while not bootable, offers up 350GB of high performance storage that excels at high sequential reads and writes.

On the software front, Dell is including the Dell Precision Performance Optimizer and the Intel Cache Acceleration Software. The former automatically configures and optimizes the workstation for specific applications based on profiles that are reportedly regularly updated. The other bit of software works to optimize systems that use both hard drives and SSDs with the SSDs as a cache for the mechanical storage. The Intel Cache Acceleration Software configures the caching algorithms to favor caching very large files on the solid state storage. It is a different approach to consumer caching strategies, but one that works well with businesses that use these workstations to process large data sets.

Dell Precision T7610 Dual Xeon Ivy Bridge-EP Workstation.jpg

The Dell Precision T7610 workstation.

The Dell workstations are aimed at businesses doing scientific analysis, professional engineering, and complex 3D modeling. The T7610 in particular is aimed at the oil and gas industry for use in simulations and modeling as companies search for new oil deposits.

All three systems will be available for purchase worldwide beginning September 12th. Some of the options, such as 512GB of ECC memory and the NVIDIA Quadro K6000 on the T7610, will not be available until next month, however. The T3610 has a starting price of $1,099 while the T5610 and T7610 have starting prices of $2,729 and $3,059 respectively.

What are your thoughts on Dell's new mid-tower workstations?

Source: Dell

The Titan's Overthrown. Tianhe-2 Supercomputer New #1

Subject: General Tech, Processors, Systems | June 26, 2013 - 10:27 PM |
Tagged: supercomputing, supercomputer, titan, Xeon Phi

The National Supercomputer Center in Guangzhou, China, will host the world's fastest supercomputer by the end of the year. The Tianhe-2, English: "Milky Way-2", is capable of nearly double the floating-point performance of Titan, albeit with slightly less performance per watt. The Tianhe-2 was developed by China's National University of Defense Technology.

tianhe-2-jack-dongarra-pdf-600x0.jpg

Photo Credit: Top500.org

Comparing the new fastest computer with the former, China's Milky Way-2 is able to achieve 33.8627 PetaFLOPs of calculations from 17.808 MW of electricity. The Titan, on the other hand, is able to crunch 17.590 PetaFLOPs with a draw of just 8.209 MW. As such, the new Milky Way-2 uses 12.7% more power per FLOP than Titan.
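For anyone who wants to check that comparison, here is a small Python sketch using only the four figures quoted above. The function and variable names are made up for illustration.

```python
# Figures quoted in the paragraph above
tianhe2_pflops, tianhe2_mw = 33.8627, 17.808
titan_pflops, titan_mw = 17.590, 8.209


def mw_per_pflop(megawatts: float, pflops: float) -> float:
    """Power cost of one PetaFLOP of sustained performance."""
    return megawatts / pflops


tianhe2_cost = mw_per_pflop(tianhe2_mw, tianhe2_pflops)
titan_cost = mw_per_pflop(titan_mw, titan_pflops)

# How much more power Tianhe-2 spends per FLOP, as a percentage
extra_power_pct = (tianhe2_cost / titan_cost - 1) * 100
# How much faster Tianhe-2 is overall
speedup = tianhe2_pflops / titan_pflops

print(f"{speedup:.2f}x the performance, {extra_power_pct:.1f}% more power per FLOP")
```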

Titan is famously based on the Kepler GPU architecture from NVIDIA, coupled with several 16-core AMD Opteron server processors clocked at 2.2 GHz. This concept of using accelerated hardware carried over into the design of Tianhe-2, which is based around Intel's Xeon Phi coprocessor. If you include the simplified co-processor cores of the Xeon Phi, the new champion is the sum of 3.12 million x86 cores and 1024 terabytes of memory.

... but will it run Crysis?

... if someone gets around to emulating DirectX in software, it very well could.

Source: Top500

Inspur Readies Tianhe-2 Supercomputer With 54 Petaflop Theoretical Peak Performance

Subject: Systems | June 3, 2013 - 09:27 PM |
Tagged: Xeon Phi, tianhe-2, supercomputer, Ivy Bridge, HPC, China

A powerful new supercomputer constructed by Chinese company Inspur is currently in testing at the National University of Defense Technology. Called the Tianhe-2, the new supercomputer has 16,000 compute nodes and approximately 54 Petaflops of peak theoretical compute performance.

Destined for the National Supercomputer Center in Guangzhou, China, the open HPC platform will be used for education and research projects. The Tianhe-2 is composed of 125 racks with 128 compute nodes in each rack.

The compute nodes are broken down into two types: CPM and APU modules. One of each node type makes up a single compute board. The CPM module hosts four Intel Ivy Bridge processors, 128GB system memory, and a single Intel Xeon Phi accelerator card with 8GB of its own memory. Each APU module adds five Xeon Phi cards to every compute board. The compute boards (a CPM module + an APU module) contain two NICs that connect the various compute boards with Inspur's custom THExpress2 high bandwidth interconnects. Finally, the Tianhe-2 supercomputer will have access to 12.4 Petabytes of storage that is shared across all of the compute boards.

In all, the Tianhe-2 is powered by 32,000 Intel Ivy Bridge processors, 1.024 Petabytes of system memory (not counting Phi dedicated memory, which at 8GB per card would make the total 1.408 PB), and 48,000 Intel Xeon Phi MIC (Many Integrated Cores) cards. That is a total of 3,120,000 processor cores (though keep in mind that number is primarily made up of the relatively simple individual Phi cores, as there are 57 cores to each Phi card).
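The core and memory totals can be rebuilt from the per-part figures in this post, as in the Python sketch below. One number is an assumption that the post does not state: 12 cores per Ivy Bridge chip, which is the value that makes the core total work out. The 64GB per node comes from Inspur's simplified node breakdown.

```python
IVY_BRIDGE_CHIPS = 32_000
PHI_CARDS = 48_000
CORES_PER_IVY_BRIDGE = 12  # assumption, not stated in the post
CORES_PER_PHI = 57         # per the post

COMPUTE_NODES = 16_000
SYSTEM_GB_PER_NODE = 64    # Inspur's simplified node breakdown
PHI_GB_PER_CARD = 8

total_cores = (IVY_BRIDGE_CHIPS * CORES_PER_IVY_BRIDGE
               + PHI_CARDS * CORES_PER_PHI)

system_gb = COMPUTE_NODES * SYSTEM_GB_PER_NODE
phi_gb = PHI_CARDS * PHI_GB_PER_CARD

# Decimal petabytes: 1 PB = 1,000,000 GB
system_pb = system_gb / 1e6
combined_pb = (system_gb + phi_gb) / 1e6

print(f"{total_cores:,} cores")
print(f"{system_pb} PB system memory, {combined_pb} PB including Phi memory")
```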

Artist Rendition of Inspur-built Tianhe-2 Chinese Supercomputer.png

Inspur claims up to 3.432 TFlops of peak compute performance per compute node (which, for simplicity they break down as one node is 2 Ivy Bridge chips, 64GB memory, and 3 Xeon Phi cards although the two compute modules that make up a node are not physically laid out that way) for a total theoretical potential compute power of 54,912 TFlops (or 54.912 Petaflops) across the entire supercomputer. In the latest Linpack benchmark run, researchers saw up to 63% efficiency in attaining peak performance -- 30.65 PFlops out of 49.19 PFlops peak/theoretical performance -- when only using 14,336 nodes with 50GB RAM each. Further testing and optimization should improve that number, and when all nodes are brought online the real world performance will naturally be higher than the current benchmarks. With that said, the Tianhe-2 is already besting Cray's TITAN, which is promising (though I hope Cray comes back next year and takes the crown again, heh).
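The theoretical peak follows directly from the rack layout and the per-node figure. This Python sketch multiplies them out and applies the same per-node number to the 14,336-node Linpack subset; the variable names are illustrative only.

```python
RACKS = 125
NODES_PER_RACK = 128
TFLOPS_PER_NODE = 3.432  # Inspur's claimed peak per compute node
TESTED_NODES = 14_336    # nodes used in the Linpack run

total_nodes = RACKS * NODES_PER_RACK

# Convert TFlops to PFlops by dividing by 1000
peak_pflops = total_nodes * TFLOPS_PER_NODE / 1000
tested_peak_pflops = TESTED_NODES * TFLOPS_PER_NODE / 1000

print(f"{total_nodes:,} nodes, {peak_pflops:.3f} PFlops peak")
print(f"{tested_peak_pflops:.2f} PFlops peak for the tested subset")
```

The subset figure lands within rounding distance of the 49.19 PFlops quoted for the benchmark run.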

In order to keep all of this hardware cool, Inspur is planning a custom liquid cooling system using chilled water. The Tianhe-2 will draw up to 17.6 MW of power under load. Once the liquid cooling system is implemented the supercomputer will draw 24MW while under load.

This is an impressive system, and an interesting take on a supercomputer architecture considering the rise in popularity of heterogeneous architectures that pair massive numbers of CPUs with graphics processing units (GPUs).

The Tianhe-2 supercomputer will be reconstructed at its permanent home at the National Supercomputer Center in Guangzhou, China once the testing phase is finished. It will be one of the top supercomputers in the world once it is fully online! HPC Wire has a nice article with slides and further details on the upcoming processing powerhouse that is worth a read if you are into this sort of HPC stuff.

Also read: Cray unveils the TITAN supercomputer.

Source: HPC Wire

AMD's new FirePro S10000 sports two GPUs

Subject: General Tech, Graphics Cards | November 13, 2012 - 01:17 PM |
Tagged: amd, Intel, firepro, firepro s10000, HPC, Xeon Phi, 3120A, 5110P, Knight's Corner

AMD's new Tahiti-based FirePro S10000 sports more than just a GPU upgrade; it sports two, as this is a dual-GPU card.  According to The Register it should run about $3,600 and need 375W to perform, numbers which make it a more efficient card than the S9000 even though it needs significantly more cash and power to run.  It is a two-slot card, a necessity in the server and workstation world, and while it does not support CrossFire it does support Eyefinity with its DVI port and four Mini DisplayPorts.

elreg_amd_firepro_s10000.jpg

The Register also got some news about Xeon Phi, Intel's answer to the HPC cards on offer from AMD and NVIDIA.  Knights Corner is the evolution of Larrabee into an actual product, in this case two 62-core cards, though not all of the cores are active. The passively cooled 5110P has 60 cores running at 1.053GHz, while the 3120A has 57 cores clocked slightly higher at 1.1GHz and sports a fan.  Both cards produce just over a teraflop of double precision floating point math, compared to the 1.48 teraflops offered by AMD's S10000 or the 1.3 offered by the Tesla K20x. Check out more on these coprocessors at The Register.
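The "just over a teraflop" figure can be approximated from the core counts and clocks above. The Python sketch below relies on one assumption that is not in this post: each core retires 16 double precision operations per cycle (eight 64-bit lanes in a 512-bit vector unit, with a fused multiply-add counted as two operations).

```python
DP_FLOPS_PER_CYCLE = 16  # assumption: 8 double lanes x 2 (fused multiply-add)

# (active cores, clock in GHz), per the paragraph above
cards = {"5110P": (60, 1.053), "3120A": (57, 1.1)}


def peak_dp_tflops(cores: int, ghz: float) -> float:
    """Theoretical peak: cores x cycles per second x operations per cycle."""
    return cores * ghz * DP_FLOPS_PER_CYCLE / 1000


for name, (cores, ghz) in cards.items():
    print(name, round(peak_dp_tflops(cores, ghz), 3), "TFLOPs DP")
```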

elreg_intel_xeon_phi_die_shot.jpg

"With the FirePro S10000, not only is the GPU geared down to 825MHz, but the memory is similarly downshifted to 5GHz. The memory interface is 384-bit wide on each GPU, with two blocks of GDDR5 memory yielding a total of 6GB. (This could be a little skinny on the memory for some HPC workloads, given that the S9000 card has 6GB of memory for one Tahiti GPU.) Each GPU can access 240GB/sec of memory bandwidth linking to each 3GB chunk of GDDR5 memory.

Because the card is double-stuffed, it can deliver a very impressive 5.91 teraflops SP and 1.48 teraflops DP in peak floating point oomph."
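The 240GB/sec figure in that quote falls out of the bus width and memory speed. Here is a short Python sketch, reading "5GHz" as an effective 5 gigabits per second per pin (an interpretation, not something the quote spells out); the function name is hypothetical.

```python
def gddr5_bandwidth_gb_s(bus_width_bits: int, effective_gbps_per_pin: float) -> float:
    """Bandwidth in GB/s: bits moved per second across the bus, divided by 8."""
    return bus_width_bits * effective_gbps_per_pin / 8


per_gpu = gddr5_bandwidth_gb_s(384, 5.0)
print(per_gpu, "GB/s per GPU")

# Ratio of the quoted single to double precision peaks
sp_to_dp = 5.91 / 1.48
print(round(sp_to_dp, 2), "x single precision over double precision")
```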

Source: The Register

Fee PHI fo fum; Intel changes the smell of a Pentium

Subject: General Tech | September 5, 2012 - 03:49 PM |
Tagged: Xeon Phi, xeon, larrabee, knights corner, Intel, hot chips

The Register is back with more information from Hot Chips about Intel's Xeon Phi coprocessor, which seems to be much more than just a GPU in drag.  Inside the shell you will find at least 50 cores and at least 8GB of GDDR5 graphics memory, with the cores being very heavily modified 22-nanometer Tri-Gate process Pentium P54C chips clocked somewhere between 1.2-1.6GHz.  There is a brand new Vector Processing Unit which processes 512-bit SIMD instructions and sports an Extended Math Unit to handle calculations in hardware rather than software.  Read on for more details about the high-speed ring interconnects that allow these chips to communicate among themselves and with the Xeon server it will be a part of.

ElReg_intel_xeon_phi_block_diagram.jpg

"Intel has been showing off the performance of the "Knights Corner" x86-based coprocessor for so long that it's easy to forget that it is not yet a product you can actually buy. Back in June, Knights Corner was branded as the "Xeon Phi", making it clear that Phi was a Xeon coprocessor even if it does not bear a lot of resemblance to the Xeon processors at the heart of the vast majority of the world's servers."

Source: The Register

A lot of little Phi coprocessors lightens the load

Subject: General Tech | August 31, 2012 - 02:43 PM |
Tagged: Intel, xeon, Xeon Phi, hot chips, larrabee

The Xeon Phi is not Larrabee but it does give a chance to remind people that Intel did at one time swear we would be seeing huge results from a lot of strung together Pentium chips.  Nor is Many Integrated Cores the same as AMD's Magny-cours, although you can be forgiven if that thought popped into your head.  Instead the Xeon Phi is a co-processor that will have 50 or more 512-bit SIMD architecture based processors, each with 512KB of Level 2 cache.  These cores are comparatively slow on their own but have been designed to spread tasks over dozens of cores for parallel processing to make up for the lack of individual power.  Intel sees Phi as a way to create HPC servers which will be physically smaller than one based solely on traditional Xeon based servers as well as being more efficient.  There is still a lot more we need to learn about these chips; until then you can check out The Inquirer's article on Intel's answer to NVIDIA and AMD's HPC cards.

Xeon_Phi_PCIe_Card.jpg

"CHIPMAKER Intel revealed some architectural details of its upcoming Xeon Phi accelerator at the Hotchips conference, saying that the chip will feature 512-bit SIMD units."

Source: The Inquirer