GPU: Finally getting recognition from Windows.

Subject: General Tech, Graphics Cards, Systems, Mobile | July 27, 2012 - 02:12 PM |
Tagged: windows 8, winRT, gpgpu

Paul Thurrott of the Windows SuperSite reports that Windows 8 is finally taking hardware acceleration seriously and will utilize the GPU across all applications. This acceleration should make Windows 8 perform better and consume less power than the same hardware running Windows 7. With Microsoft finally willing to exploit modern hardware for performance and battery life, I wonder when they will start using the GPU to accelerate tasks like file encryption.

It is painful when you have the right tool for the job but must use the wrong one.

Windows has, in fact, used graphics acceleration for quite some time, albeit in fairly mundane and obvious ways. Windows Vista and Windows 7 brought forth the Aero Glass look and feel, which relied so heavily on Shader Model 2.0 hardware that much of it simply would not run on anything less.


Washington State is not that far away from Oregon.

Microsoft is focusing their hardware acceleration efforts for Windows 8 on what they call mainstream graphics. 2D graphics and animation have traditionally been CPU-bound, with only a handful of applications such as Internet Explorer 9, Firefox, and eventually Chrome allowing the otherwise idle GPU to lend a helping hand. As such, Microsoft is talking up Direct2D and DirectWrite usage throughout Windows 8 on a wide variety of hardware.
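For the curious, GPU-accelerated 2D through Direct2D looks something like the sketch below -- a minimal, hypothetical example for an ordinary Win32 window rather than anything lifted from Windows itself, with error handling trimmed for brevity.

```cpp
#include <windows.h>
#include <d2d1.h>
#pragma comment(lib, "d2d1")

// Draw a filled rectangle with Direct2D; the drawing commands are batched and
// rasterized by the GPU (or the WARP software fallback) when EndDraw() is called.
void DrawAcceleratedRect(HWND hwnd)
{
    ID2D1Factory* factory = nullptr;
    D2D1CreateFactory(D2D1_FACTORY_TYPE_SINGLE_THREADED, &factory);

    RECT rc;
    GetClientRect(hwnd, &rc);

    ID2D1HwndRenderTarget* target = nullptr;
    factory->CreateHwndRenderTarget(
        D2D1::RenderTargetProperties(),
        D2D1::HwndRenderTargetProperties(hwnd,
            D2D1::SizeU(static_cast<UINT32>(rc.right - rc.left),
                        static_cast<UINT32>(rc.bottom - rc.top))),
        &target);

    ID2D1SolidColorBrush* brush = nullptr;
    target->CreateSolidColorBrush(D2D1::ColorF(D2D1::ColorF::CornflowerBlue), &brush);

    target->BeginDraw();
    target->Clear(D2D1::ColorF(D2D1::ColorF::White));
    target->FillRectangle(D2D1::RectF(20.0f, 20.0f, 220.0f, 120.0f), brush);
    target->EndDraw();

    brush->Release();
    target->Release();
    factory->Release();
}
```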

The driving force that neither Microsoft nor Paul Thurrott seems to directly acknowledge is battery life. Until just recently, graphics processors were considered power hogs by almost anyone who assembles a higher-end gaming computer. Despite this, the GPU is actually more efficient than a CPU at certain tasks, and that is especially true of the GPUs which will go into WinRT devices. The GPU will make the experience more responsive and smooth while also consuming less battery power. I guess Microsoft finally believes the time is right to bother using what you already have.

There are many more tasks which can be GPU accelerated than just graphics, be it 3D or the new emphasis on 2D. Hopefully, after Microsoft dips a toe in, they will take the GPU more seriously as an all-around parallel task processor. Maybe now that they are using the GPU for all applications they can consider using it for all purposes -- in all applications.
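As a taste of what treating the GPU as a general parallel task processor looks like today, here is a minimal sketch using Microsoft's own C++ AMP (introduced alongside Visual Studio 2012); it simply squares an array on whatever accelerator is present, and is purely illustrative.

```cpp
#include <amp.h>
#include <vector>

// Square every element of a buffer on the available accelerator
// (a DirectX 11 GPU, or the WARP/CPU fallback if none is present).
void SquareOnAccelerator(std::vector<float>& data)
{
    using namespace concurrency;
    array_view<float, 1> view(static_cast<int>(data.size()), data);

    parallel_for_each(view.extent, [=](index<1> i) restrict(amp)
    {
        view[i] = view[i] * view[i];   // one lightweight GPU thread per element
    });

    view.synchronize();                // copy the results back into the host vector
}
```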

Intel Introduces Xeon Phi: Larrabee Unleashed

Subject: Processors | June 19, 2012 - 11:46 AM |
Tagged: Xeon Phi, xeon e5, nvidia, larrabee, knights corner, Intel, HPC, gpgpu, amd

Intel does not respond well when asked about Larrabee.  Though Intel received a lot of bad press from the gaming community about what they were trying to do, that does not necessarily mean Intel was wrong about how they set up the architecture.  The problem was that Larrabee was positioned as a consumer-level product with an eye toward breaking into the HPC/GPGPU market.  At the consumer level, Larrabee would have been a disaster.  Intel simply would not have been able to compete with AMD and NVIDIA for gamers’ hearts.
 
The problem with Larrabee in the consumer space was a matter of focus, process decisions, and die size.  Larrabee is unique in that it is almost fully programmable and features essentially one fixed-function unit, dedicated to texturing.  Everything else relied upon the large array of x86 cores and their attached vector units.  That turns out to be very inefficient when it comes to rendering games, which is the majority of the work a consumer graphics card does.  While no outlet was able to get hold of a Larrabee sample and run benchmarks on it, the general feeling was that Intel would easily be a generation behind in performance.  Considering how large the die would have to be just to get to that point, it was simply not economical for Intel to produce these cards.
 
 
Xeon Phi is essentially an advanced part based on the original Larrabee architecture.
 
This is not to say that Larrabee does not have a place in the industry.  The design lends itself very nicely to HPC applications.  With each chip hosting many x86 cores with powerful vector units attached, these products can provide tremendous performance in HPC workloads that can leverage those units.  Because Intel utilized x86 cores instead of the designs that AMD and NVIDIA use (lots of stream units handling vector and scalar work, but no x86 units and no traditional networking fabric to connect them), Intel has a leg up on the competition when it comes to programming.  While GPGPU applications rely on platforms like OpenCL, C++ AMP, and NVIDIA’s CUDA, Intel is able to lean on the many existing languages and toolchains that already target x86.  With the addition of wide vector units on each x86 core, it is relatively simple to adjust existing code to use these new features compared to porting something over to OpenCL.
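A rough sketch of the sort of incremental adjustment in question is below: an ordinary C++ loop that picks up a threading directive and a vector-friendly body so it can spread across many x86 cores and their wide vector units, rather than being rewritten in OpenCL. The pragma and the reliance on the compiler's auto-vectorizer are my assumptions for illustration; Intel's actual Phi toolchain is not detailed in this announcement.

```cpp
#include <cstddef>

// a*x + y over large arrays: the OpenMP directive spreads iterations across the
// many x86 cores, and the compiler's auto-vectorizer can target each core's
// wide vector unit for the inner arithmetic. Build with an OpenMP-enabled compiler.
void saxpy(float a, const float* x, const float* y, float* out, std::size_t n)
{
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
        out[i] = a * x[i] + y[i];
}
```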
 
So this leads us to the Intel Xeon Phi, the first commercially available product based on an updated version of the Larrabee technology.  The code name is Knights Corner, and it is a new MIC (Many Integrated Core) product built on Intel’s latest 22 nm Tri-Gate process.  Details are scarce on how many cores the product actually contains, but it looks to be more than 50 very basic “Pentium”-style cores: small, in-order, and all connected by a robust networking fabric that allows fast data transfer between the memory interface, the PCI-E interface, and the cores.
 
 
Each Xeon Phi promises more than 1 TFLOPS of performance (as measured by Linpack).  When combined with the new Xeon E5 series of processors, these products can provide a huge amount of computing power.  Furthermore, with the addition of the Cray interconnect technology that Intel acquired this year, clusters of these systems could yield some of the fastest supercomputers on the market.  While it will take until at least the end of this year to integrate these products into a massive cluster, it will happen, and Intel expects these products to be at the forefront of driving performance from the petascale to the exascale.
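As a rough plausibility check, with figures that are my assumptions rather than Intel disclosures: roughly 60 simple cores at about 1 GHz, each retiring one 512-bit fused multiply-add per cycle (16 double-precision FLOPs), lands right around that mark.

```latex
60~\text{cores} \times 1.0~\text{GHz} \times 16~\tfrac{\text{DP FLOPs}}{\text{cycle}} \approx 0.96~\text{TFLOPS}
```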
 
 
These are the building blocks Intel hopes to use to corner the HPC market.  By providing powerful CPUs and dozens if not hundreds of MIC units per cluster, the potential computing power should bring us to the exascale that much sooner.
 
Time will of course tell if Intel will be successful with Xeon Phi and Knights Corner.  The idea behind this product seems sound, and attaching powerful vector units to simple x86 cores should make the software migration to massively parallel computing just a wee bit easier than what we are seeing now with GPU-based products from AMD and NVIDIA.  Where those other manufacturers hold an advantage over Intel is in their many years of work with educational institutions (research), software developers (gaming, GPGPU, and HPC), and industry standards groups (Khronos).  Xeon Phi has a ways to go before being fully embraced by those organizations, and its future is certainly not set in stone.  We have yet to see third-party groups get hold of these products and put them to the test.  While Intel CPUs are certainly class leading, we still do not know the full potential of these MIC products compared to what is currently available on the market.
 

The one positive thing for Intel’s competitors is that it seems their enthusiasm for massively parallel computing is justified.  Intel just entered that ring with a unique architecture that will certainly help push high performance computing more towards true heterogeneous computing. 

Source: Intel

CS6 OpenCL support -- not quite hardware acceleration for all

Subject: General Tech, Graphics Cards | May 19, 2012 - 03:27 AM |
Tagged: Adobe, CS6, gpgpu

Last month, SemiAccurate reported that Adobe Creative Suite 6 would be programmed around OpenCL, which would allow any GPU to accelerate your work. Adobe now clarifies that, at least for the time being and at least for Premiere Pro, OpenCL only accelerates the Radeon HD 6750M and HD 6770M with 1GB of vRAM on MacBook Pros running OS X Lion.

Does it aggravate you when something takes a while or stutters when you know a part of your PC is just idle?

Adobe has been increasingly moving to take advantage of the graphics processor in your computer to benefit the professional behind the keyboard, mouse, or tablet. CS 5.5 pushed several of their applications onto the CUDA platform. End users claimed that Adobe sold them out to NVIDIA, but that seems unlikely and unlike either company. My suspicion was, and still is, that NVIDIA parachuted some engineers into Adobe and that their help was naturally limited to CUDA.

Creative Suite 6 further suggests I was correct, as Adobe has gone back and re-authored many of those features in OpenCL.


Isn't it somewhat ironic that insanity is a symptom of mercury poisoning?

AMD as a hatter!

That does not mean CS6 will execute on just any old GPU, despite OpenCL being more widely available than NVIDIA's proprietary CUDA. While the CUDA whitelist currently extends to 22 NVIDIA GPUs on Windows and 3 on Mac OS X, OpenCL support is limited to a pair of AMD-based mobile GPUs under OS X Lion: the 6750M and the 6770M.

It would not surprise me if other GPUs could accelerate CS6 if manually added to the whitelist. Adobe is probably very conservative about which components they whitelist in an effort to reduce support costs. That does not mean you will necessarily see a benefit if you trick Adobe into enabling hardware acceleration, though.
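For reference, discovering which OpenCL GPUs a system exposes is only a few calls; the sketch below is generic OpenCL host code (Adobe's actual whitelist logic has never been published) that prints the device names an application would have on hand to compare against such a list.

```cpp
// Build against the OpenCL headers/library (on OS X the include is <OpenCL/cl.h>).
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main()
{
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, nullptr, &numPlatforms);
    std::vector<cl_platform_id> platforms(numPlatforms);
    clGetPlatformIDs(numPlatforms, platforms.data(), nullptr);

    for (std::size_t p = 0; p < platforms.size(); ++p) {
        cl_uint numDevices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, nullptr, &numDevices) != CL_SUCCESS)
            continue;                                    // this platform has no GPU devices
        std::vector<cl_device_id> devices(numDevices);
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, numDevices, devices.data(), nullptr);

        for (std::size_t d = 0; d < devices.size(); ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, nullptr);
            std::printf("OpenCL GPU found: %s\n", name); // the string an app could check against a whitelist
        }
    }
    return 0;
}
```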

It appears as if Adobe is working towards the most open and broad standards -- they are just doing it at their own pace this time. This release was obviously paced for Apple support.

Source: Adobe

NVIDIA urges you to program better now, not CPU -- later.

Subject: Editorial, General Tech, Graphics Cards, Processors | April 4, 2012 - 04:13 AM |
Tagged: nvidia, Intel, Knight's Corner, gpgpu

NVIDIA steals Intel’s lunch… analogy. In the process they claim that optimizing your application for Intel’s upcoming many-core hardware is not free of effort, and that effort is similar to what is required to develop on what NVIDIA already has available.

A few months ago, Intel published an article on their software blog to urge developers to look to the future without relying on the future when they design their applications. The crux of Intel’s argument states that regardless of how efficient Intel makes their processors, there is still responsibility on your part to create efficient code.


There’s always that one, in the back of the class…

NVIDIA, never a company to be afraid to make a statement, used Intel’s analogy to alert developers to optimize for many-core architectures.

The hope that unmodified HPC applications will work well on MIC with just a recompile is not really credible, nor is talking about ease of programming without consideration of performance.

There is no free lunch. Programmers will need to put in some effort to structure their applications for hybrid architectures. But that work will pay off handsomely for today’s, and especially tomorrow’s, HPC systems.

It remains to be seen how Intel MIC will perform when it eventually arrives. But why wait? Better to get ahead of the game by starting down the hybrid multicore path now.

NVIDIA thinks Intel was correct: there is no free lunch for developers, so why not purchase a plate at NVIDIA’s table? Who knows, after the appetizer you might want to stick around.

You cannot simply drop your program onto Many Integrated Core (MIC) hardware and expect it to run well. The goal is not simply to get code running on new hardware -- it is to perform efficiently while taking advantage of everything that is available. It will always be up to the developer to structure their application appropriately.

Your advantage will be to understand the pros and cons of massive parallelism. NVIDIA, AMD, and now Intel have labored to create a variety of architectures to suit this aspiration; software developers must labor in a similar way on their end.
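To ground that a little, here is a small sketch of my own (not something from NVIDIA's post) of what "structuring for a hybrid architecture" tends to mean in practice: express the work per element and make the combining step an explicit reduction. OpenMP stands in for whichever parallel backend you eventually target; the same shape maps cleanly onto a CUDA kernel or a MIC offload.

```cpp
#include <cstddef>

// Straightforward serial accumulation: correct, but it expresses no parallelism
// for a compiler or runtime to exploit.
double sum_of_squares_serial(const double* v, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += v[i] * v[i];
    return sum;
}

// Restructured version: each element's work is independent and the combine step
// is declared as a reduction -- the shape hybrid hardware wants to see.
double sum_of_squares_parallel(const double* v, std::size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
        sum += v[i] * v[i];
    return sum;
}
```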

Source: NVIDIA Blogs

An academic collaboration leads to a GPU/CPU collaboration

Subject: General Tech | February 8, 2012 - 12:13 PM |
Tagged: gpgpu, l3 cache, APU

Over at North Carolina State University, students Yi Yang and Ping Xiang, along with Dr. Huiyang Zhou and Mike Mantor of Advanced Micro Devices, have been working on a way to improve how efficiently the GPU and CPU work together.  The current generation of fused designs, Llano and Sandy Bridge, unite the two processing units on a single chip, but as of yet they cannot efficiently pass work back and forth.  This project leverages the CPU's L3 cache as a high-speed bridge between the two processors, allowing the CPU to hand highly parallel tasks to the GPU for more efficient processing while the CPU deals with the complex operations it was designed for.

Along with that bridge comes a change in how prefetching is handled: rather than leaning on the cores' ordinary L2 prefetchers, a specially designed pre-execution unit, triggered by the GPU and running on the CPU, issues synchronized memory fetch instructions so the data the GPU needs is already waiting in the L3 when it asks for it.  The result has been impressive; in their tests they saw an average performance improvement of 21.4%.


"Researchers from North Carolina State University have developed a new technique that allows graphics processing units (GPUs) and central processing units (CPUs) on a single chip to collaborate – boosting processor performance by an average of more than 20 percent.

"Chip manufacturers are now creating processors that have a 'fused architecture,' meaning that they include CPUs and GPUs on a single chip,” says Dr. Huiyang Zhou, an associate professor of electrical and computer engineering who co-authored a paper on the research. "This approach decreases manufacturing costs and makes computers more energy efficient. However, the CPU cores and GPU cores still work almost exclusively on separate functions. They rarely collaborate to execute any given program, so they aren’t as efficient as they could be. That's the issue we’re trying to resolve."


NVIDIA Updates CUDA: Major Release for Science Research

Subject: General Tech, Graphics Cards | January 29, 2012 - 02:53 AM |
Tagged: nvidia, gpgpu, CUDA

NVIDIA has long been interested in carving out room in the high-performance computing market for scientific research. For many workloads, a fast and highly parallel processor saves time and money compared with letting a traditional computer crunch away or booking time on one of the world’s relatively few supercomputers. Raw GPU performance is not enough on its own, though: adequate development tools are required to turn a simulation or calculation into a functional program that actually runs on the GPU. NVIDIA is widely held to have had a strong lead with their CUDA platform for quite some time, and that lead will likely continue with releases the size of this one.


What does a tuned up GPU purr like? Cuda cuda cuda cuda cuda.

The most recent release, CUDA 4.1, has three main features:

  • A visual profiler that points out common mistakes and optimization opportunities, with step-by-step instructions detailing how to alter your code to improve performance
  • A new compiler based on the LLVM infrastructure, making good on NVIDIA's promise to open the CUDA platform to other architectures -- both software and hardware
  • New image and signal processing functions for the NVIDIA Performance Primitives (NPP) library, relieving developers of the need to write their own versions or license a proprietary library

The three features, as NVIDIA describes them in their press release, are listed below.

New Visual Profiler - Easiest path to performance optimization
The new Visual Profiler makes it easy for developers at all experience levels to optimize their code for maximum performance. Featuring automated performance analysis and an expert guidance system that delivers step-by-step optimization suggestions, the Visual Profiler identifies application performance bottlenecks and recommends actions, with links to the optimization guides. Using the new Visual Profiler, performance bottlenecks are easily identified and actionable.

LLVM Compiler - Instant 10 percent increase in application performance
LLVM is a widely-used open-source compiler infrastructure featuring a modular design that makes it easy to add support for new programming languages and processor architectures. Using the new LLVM-based CUDA compiler, developers can achieve up to 10 percent additional performance gains on existing GPU-accelerated applications with a simple recompile. In addition, LLVM's modular design allows third-party software tool developers to provide a custom LLVM solution for non-NVIDIA processor architectures, enabling CUDA applications to run across NVIDIA GPUs, as well as those from other vendors.

New Image, Signal Processing Library Functions - "Drop-in" Acceleration with NPP Library
NVIDIA has doubled the size of its NPP library, with the addition of hundreds of new image and signal processing functions. This enables virtually any developer using image or signal processing algorithms to easily gain the benefit of GPU acceleration, with the simple addition of library calls into their application. The updated NPP library can be used for a wide variety of image and signal processing algorithms, ranging from basic filtering to advanced workflows.
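As a concrete example of the kind of bottleneck a profiler like this calls out, consider host-to-device transfers; the sketch below (plain CUDA runtime host code of my own, no kernel involved) contrasts the per-element copy pattern a profiler would flag with the single batched copy it would recommend.

```cpp
#include <cuda_runtime.h>
#include <vector>

// Anti-pattern a profiler flags: one tiny cudaMemcpy per element, so the PCIe
// bus spends its time on per-call overhead rather than moving data.
void upload_one_at_a_time(float* d_buf, const std::vector<float>& host)
{
    for (std::size_t i = 0; i < host.size(); ++i)
        cudaMemcpy(d_buf + i, &host[i], sizeof(float), cudaMemcpyHostToDevice);
}

// The usual recommendation: batch everything into a single large transfer.
void upload_batched(float* d_buf, const std::vector<float>& host)
{
    cudaMemcpy(d_buf, host.data(), host.size() * sizeof(float), cudaMemcpyHostToDevice);
}
```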
 

Source: NVIDIA

New Trojan.Badminer Malware Steals Your Spare Processing Cycles To Make Criminals Money At Your Expense

Subject: General Tech | August 17, 2011 - 11:02 PM |
Tagged: trojan, opencl, mining, Malware, gpgpu, bitcoin

A new piece of malware that seeks to profit from your spare computing cycles was recently uncovered by anti-virus provider Symantec. Dubbed Trojan.Badminer, this insidious piece of code is a trojan that (so far) is capable of affecting Windows operating systems from Windows 98 to Windows 7. Once the trojan has been downloaded and executed (usually through an online attack vector such as an unpatched bug in Flash or Java), it proceeds to create a number of files and registry entries.


It's a trojan-infected bitcoin -- oh, the audacity of malware authors!

After it has propagated throughout the system, it runs one of two mining programs. It first searches for a compatible graphics card and, if one is present, runs Phoenix Miner; if not, it falls back to RPC Miner and steals your CPU cycles instead.  The miners then start hashing in search of bitcoin blocks and, if any are found, send the reward to the attacker’s account.

It should be noted that bitcoin mining itself is not inherently bad, and many people run it legitimately. In fact, if you are interested in learning more about bitcoins, we ran an article on them recently. This trojan, on the other hand, is malicious because it infects the user’s computer with unwanted code that steals processing cycles from the GPU and CPU to make the attacker money. All of those cycles come at the cost of reduced system responsiveness and extra electricity, which can add up to a rather large bill depending on where you live and what hardware the trojan gets its hands on.

Right now, Symantec is offering up general tips on keeping users’ computers free from the infection, including enabling a software firewall (or at least being behind a router with its own firewall that blocks unsolicited incoming connections), running the computer as the lowest level user possible with UAC turned on, and not clicking on unsolicited email attachments or links.

If you are a bitcoin miner yourself, you may want to go a step further and secure your bitcoin wallet in case you are ever infected by a trojan that seeks to steal the wallet.dat file (the file that essentially holds all of your bitcoin currency).

Stay vigilant, folks, and keep an eye on your GPU and CPU utilization in addition to practicing safer computing habits to keep nasty malware like this off of your system.  On a more opinionated note, is it just me or have malware authors really hit a new low with this one?
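If you have an NVIDIA card and want something more direct than eyeballing Task Manager, a quick utilization check through NVML (NVIDIA's management library, shipped with their driver tooling) looks roughly like the sketch below; this is illustrative only, and AMD owners would need their vendor's equivalent.

```cpp
#include <nvml.h>
#include <cstdio>

// Print the current GPU and memory-controller utilization of the first NVIDIA GPU.
// A card pinned near 100% while the machine is supposedly idle is worth investigating.
int main()
{
    if (nvmlInit() != NVML_SUCCESS)
        return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS) {
        nvmlUtilization_t util;
        if (nvmlDeviceGetUtilizationRates(device, &util) == NVML_SUCCESS)
            std::printf("GPU: %u%%  memory controller: %u%%\n", util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}
```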

Source: Symantec

Developer Watch: CUVI 0.5 released

Subject: Editorial, General Tech, Graphics Cards | July 26, 2011 - 08:39 PM |
Tagged: gpgpu, Developer Watch, CUVI

Code that can be easily parallelized into many threads has been streaming over to the GPU, with many applications and helper libraries taking advantage of CUDA and OpenCL in particular. For developers who wish to utilize the GPU but are unsure where to start, there are more and more function libraries available to at least partially embrace their video cards. OpenCV is a library of functions for image manipulation which, while GPU support through CUDA is ongoing, still primarily runs on the CPU. CUVIlib, which has just launched its 0.5 release, is a competitor to OpenCV with a strong focus on GPU utilization, performance, and ease of implementation. While OpenCV carries the BSD license, about as permissive a license as can be offered, CUVI does not and ships under a proprietary EULA.

Benchmark KLT - CUVILib from TunaCode on Vimeo

Benchmark KLT - OpenCV from TunaCode on Vimeo.

The little plus signs are the computer tracking motion. CUVI (top; 33fps), OpenCV (bottom; 2.5fps)

(Video from CUVIlib)

Despite CUVI's proprietary, non-free-for-commercial-use nature, they advertise large speedups for certain algorithms. For their Kanade-Lucas-Tomasi (KLT) feature tracker, compared with OpenCV’s implementation, they report a three-fold increase in performance with just a GeForce 9800 GT installed and an 8-13x speedup on a high-end compute card such as the Tesla C2050. Their feature page includes footage of two 720p high-definition videos undergoing the KLT algorithm, with the OpenCV CPU method chugging along at 2.5 fps against CUVI’s GPU-accelerated 33 fps. Whether you prefer to side with OpenCV’s ongoing GPU work or pay CUVIlib to cover the cases where OpenCV falls short of your needs is up to you, but either future will likely involve the GPU.
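For reference, the CPU pipeline being benchmarked looks roughly like the sketch below, written against OpenCV's C++ API of the era; the file name and parameter values are placeholders rather than the benchmark's actual settings.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Find good corners in one frame, then track them into the next frame with
// pyramidal Lucas-Kanade -- the CPU-side KLT pipeline the benchmark measures.
int main()
{
    cv::VideoCapture cap("input_720p.avi");   // placeholder clip name
    cv::Mat prev, next, prevGray, nextGray;
    cap >> prev;
    cap >> next;
    cv::cvtColor(prev, prevGray, CV_BGR2GRAY);
    cv::cvtColor(next, nextGray, CV_BGR2GRAY);

    std::vector<cv::Point2f> prevPts, nextPts;
    cv::goodFeaturesToTrack(prevGray, prevPts, 500, 0.01, 10.0);

    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, nextGray, prevPts, nextPts, status, err);

    // Mark the successfully tracked points, much like the plus signs in the videos above.
    for (std::size_t i = 0; i < nextPts.size(); ++i)
        if (status[i])
            cv::circle(next, cv::Point(cvRound(nextPts[i].x), cvRound(nextPts[i].y)),
                       3, cv::Scalar(0, 255, 0), -1);
    cv::imwrite("tracked.png", next);
    return 0;
}
```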

Source: CUVIlib

Podcast #162 - Adventures in Bitcoin Mining, the Eyefinity experience, Ultrabooks and more!

Subject: General Tech | July 14, 2011 - 04:38 PM |
Tagged: podcast, bitcoin, mining, gpu, gpgpu, amd, nvidia, eyefinity, APU

PC Perspective Podcast #162 - 7/14/2011

This week we talk about our adventures in Bitcoin Mining, the Eyefinity experience, Ultrabooks and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath and Allyn Malventano

This Podcast is brought to you by MSI Computer, and their all new Sandy Bridge Motherboards!

Program length: 1:16:40

Program Schedule:

  1. 0:00:40 Introduction
  2. 1-888-38-PCPER or podcast@pcper.com
  3. http://pcper.com/podcast
  4. http://twitter.com/ryanshrout and http://twitter.com/pcper
  5. 0:02:10 Bitcoin Currency and GPU Mining Performance Comparison
  6. 0:22:48 Bitcoin Mining Update: Power Usage Costs Across the United States
  7. 0:34:15 This Podcast is brought to you by MSI Computer, and their all new Sandy Bridge Motherboards!
  8. 0:34:50 Eyefinity and Me
  9. 0:45:00 Video Perspective: AMD A-series APU Dual Graphics Technology Performance
  10. 0:47:02 As expected NVIDIA's next generation GPU release schedule was a bit optimistic
  11. 0:49:40 A PC Macbook Air: Can Intel has?
  12. 0:53:00 PC: for all your Xbox gaming needs
  13. 0:56:06 Email from Howard
  14. 1:00:28 Email from Ian
  15. 1:03:00 Email from Jan
    1. In case you're interested, here are almost 150mpix of HDR: http://rattkin.info/archives/430
  16. 1:08:55 Quakecon Reminder - http://www.quakecon.org/
  17. 1:09:45 Hardware / Software Pick of the Week
    1. Ryan: Dropped the ball
    2. Jeremy: I NEED FLEET COMMANDER
    3. Josh: Finally getting cheap enough for me to buy
    4. Allyn: http://gplus.to/
  18. 1-888-38-PCPER or podcast@pcper.com
  19. http://pcper.com/podcast   
  20. http://twitter.com/ryanshrout and http://twitter.com/pcper
  21. 1:15:15 Closing

Wish you CUDA had a GPGPU C++ template? Now you can!

Subject: General Tech, Graphics Cards | June 29, 2011 - 08:58 PM |
Tagged: gpgpu, CUDA

If you have seen our various news articles on the many ways a GPU can be useful, and you are a developer yourself, you may be wondering how to get in on the action. Recently Microsoft showed off C++ AMP, their competitor to OpenCL, and AMD showed off some new tools designed to help OpenCL developers. Everything was dead silent on the CUDA front at the AMD Fusion Developer Summit, as expected, but that does not mean no one is helping people who do not mind being tied to NVIDIA. An open-source project has been created to generate template files for programmers who want to do some of their computation in CUDA and would like a helping hand setting up the framework.


You may think the videocard is backwards, but clearly its DVI heads are in front.

The project was started by Pavel Kartashev and is a Java application that accepts form input and generates CUDA code to be imported into your project. It helps you produce the tedious skeleton code for defining variables and using the GPU architecture efficiently, leaving you to program the actual process to be accomplished. The author apparently plans to create a Web-based version, which should be straightforward given the Java-based nature of the application. Personally, I would be more interested in the local application, or a widget, leaving my web browser windows free for reference material. That said, I am sure someone would like this tool in their browser, possibly more people than share my preference.
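To give a sense of the boilerplate such a generator saves you from typing (the tool's exact output is not shown, so treat this as my approximation of the usual skeleton): allocate, copy in, launch, copy out, free, with an error check around every call. The kernel itself is the part left to the programmer, so it appears only as a comment here.

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Error-check wrapper of the sort generated skeletons usually include.
#define CUDA_CHECK(call)                                                      \
    do {                                                                      \
        cudaError_t err = (call);                                             \
        if (err != cudaSuccess) {                                             \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",                  \
                         cudaGetErrorString(err), __FILE__, __LINE__);        \
            std::exit(EXIT_FAILURE);                                          \
        }                                                                     \
    } while (0)

// Host-side plumbing around a user-supplied kernel: the generator writes this,
// and the programmer fills in only the computation itself.
void run_on_gpu(const float* h_in, float* h_out, int n)
{
    float* d_in = nullptr;
    float* d_out = nullptr;
    CUDA_CHECK(cudaMalloc((void**)&d_in,  n * sizeof(float)));
    CUDA_CHECK(cudaMalloc((void**)&d_out, n * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice));

    // myKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);  // user-written kernel in the .cu file
    CUDA_CHECK(cudaDeviceSynchronize());

    CUDA_CHECK(cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
}
```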

 
If you are interested in contributing, either financially or through labor, he asks that you contact him through the email address tied to his PayPal account (likely for spam reasons, so posting it here would be the opposite of helpful). The rest of us can sit back, enjoy our GPU-enabled applications, and bet on how long it will take NVIDIA to reach out to him. My money is on all of next week.