SECO Introduces mITX GPU Devkit for CUDA Programmers

Subject: General Tech | April 12, 2013 - 02:08 AM |
Tagged: SECO, nvidia, mini ITX, kepler, kayla, GTC 13, GTC, CUDA, arm

Last month, NVIDIA revealed its Kayla development platform, which combines a quad-core Tegra System on a Chip (SoC) with an NVIDIA Kepler GPU. Kayla will be out later this year, but that has not stopped other board makers from putting together their own solutions. One such solution, which began shipping earlier this week, is the mITX GPU Devkit from SECO.

The new mITX GPU Devkit is a hardware platform for developers to program CUDA applications for mobile devices, desktops, workstations, and HPC servers. It combines an NVIDIA Tegra 3 processor, 2GB of RAM, and 4GB of internal storage (eMMC) on a Qseven module with a Mini-ITX form factor motherboard. Developers can then plug their own CUDA-capable graphics card into the single PCI-E 2.0 x16 slot (which actually runs at x4 speeds). Additional storage can be added via an internal SATA connection, and cameras can be hooked up using the CIC headers.
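The x4 wiring caps how fast data can move between the Tegra host and the graphics card. As a rough sketch, assuming the standard PCIe signalling rates (2.5 GT/s per lane for PCIe 1.x and 5 GT/s for PCIe 2.0, both with 8b/10b encoding) and ignoring packet overhead, the theoretical one-direction bandwidth works out as follows; the helper name `pcie_bandwidth_gb_s` is hypothetical.

```python
# Per-lane raw rate (GT/s) and 8b/10b encoding efficiency for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),  # PCIe 1.x
    2: (5.0, 8 / 10),  # PCIe 2.0
}


def pcie_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Theoretical one-direction payload bandwidth in GB/s, ignoring packet overhead."""
    gt_per_s, encoding_efficiency = PCIE_GENERATIONS[generation]
    # One bit per transfer per lane; divide by 8 to go from gigabits to gigabytes.
    return gt_per_s * encoding_efficiency * lanes / 8


if __name__ == "__main__":
    print(f"PCIe 2.0 x16: {pcie_bandwidth_gb_s(2, 16):.1f} GB/s")
    print(f"PCIe 2.0 x4:  {pcie_bandwidth_gb_s(2, 4):.1f} GB/s")
```

By this estimate the slot provides about 2 GB/s in each direction instead of the 8 GB/s of a full x16 link, which matters mostly for workloads that shuttle a lot of data between host and card.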

Rear IO on the mITX GPU Devkit includes:

  • 1 x Gigabit Ethernet
  • 3 x USB
  • 1 x OTG port
  • 1 x HDMI
  • 1 x DisplayPort
  • 3 x Analog audio
  • 2 x Serial
  • 1 x SD card slot

The SECO platform is proving to be popular for GPGPU in the server space, especially with systems like Pedraforca. The intention of using these types of platforms in servers is to save power by using a low-power ARM chip for inter-node communication and basic tasks while the real computing is done solely on the graphics cards. With Intel’s upcoming Haswell-based Xeon chips getting down to 13W TDPs, though, systems like this are going to be more difficult to justify. SECO is mostly positioning this platform as a development board, however. One use in that respect is to begin optimizing GPU-accelerated code for mobile devices. With future Tegra chips expected to get CUDA-compatible graphics, new software development and optimization of existing GPGPU code for smartphones and tablets will be increasingly important.

Either way, the SECO mITX GPU Devkit is available now for 349 EUR or approximately $360 (in both cases, before any taxes).

Source: SECO

NVIDIA Launches Tesla K20X Accelerator Card, Powers Titan Supercomputer

Subject: General Tech | November 12, 2012 - 06:29 AM |
Tagged: tesla, supercomputer, nvidia, k20x, HPC, CUDA, computing

Graphics card manufacturer NVIDIA launched a new Tesla K20X accelerator card today that supplants the existing K20 as the top of the line model. The new card cranks up the double and single precision floating point performance, beefs up the memory capacity and bandwidth, and brings some efficiency improvements to the supercomputer space.

While it is not yet clear how many CUDA cores the K20X has, NVIDIA has stated that it is using the GK110 GPU, and is running with 6GB of memory with 250 GB/s of bandwidth – a nice improvement over the K20’s 5GB at 208 GB/s. Both the new K20X and K20 accelerator cards are based on the company’s Kepler architecture, but NVIDIA has managed to wring out more performance from the K20X. The K20 is rated at 1.17 TFlops peak double precision and 3.52 TFlops peak single precision while the K20X is rated at 1.31 TFlops and 3.95 TFlops.

The K20X manages to score 1.22 TFlops in DGEMM, making it almost three times faster than the previous-generation Tesla M2090 accelerator based on the Fermi architecture.
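NVIDIA's figures make it easy to check how close the DGEMM result comes to the card's theoretical peak. This is simple arithmetic on the numbers quoted above, nothing more; the variable and function names are illustrative.

```python
# Figures quoted in the article, in TFLOPS.
K20_PEAK = {"fp64": 1.17, "fp32": 3.52}
K20X_PEAK = {"fp64": 1.31, "fp32": 3.95}
K20X_DGEMM = 1.22  # measured double-precision DGEMM throughput


def fraction_of_peak(achieved: float, peak: float) -> float:
    """Share of the theoretical peak that a real workload sustains."""
    return achieved / peak


dgemm_efficiency = fraction_of_peak(K20X_DGEMM, K20X_PEAK["fp64"])
fp64_uplift = K20X_PEAK["fp64"] / K20_PEAK["fp64"] - 1
fp32_uplift = K20X_PEAK["fp32"] / K20_PEAK["fp32"] - 1

print(f"K20X DGEMM efficiency: {dgemm_efficiency:.1%}")  # 93.1%
print(f"K20X over K20, FP64 peak: +{fp64_uplift:.1%}")   # +12.0%
print(f"K20X over K20, FP32 peak: +{fp32_uplift:.1%}")   # +12.2%
```

In other words, DGEMM sustains roughly 93% of the K20X's rated double-precision peak, and the K20X's rated peaks sit about 12% above the K20's.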

Aside from pure performance, NVIDIA is also touting efficiency gains with the new K20X accelerator card. When two K20X cards are paired with a 2P Sandy Bridge server, NVIDIA claims to achieve 76% efficiency versus 61% efficiency with a 2P Sandy Bridge server equipped with two previous generation M2090 accelerator cards. Additionally, NVIDIA claims to have enabled the Titan supercomputer to reach the #1 spot on the top 500 green supercomputers thanks to its new cards with a rating of 2,120.16 MFLOPS/W (million floating point operations per second per watt).

NVIDIA claims to have already shipped 30 PFLOPS worth of GPU-accelerated computing power. Interestingly, most of that computing power is housed in the recently unveiled Titan supercomputer. This supercomputer contains 18,688 Tesla K20X (Kepler GK110) GPUs and 18,688 16-core AMD Opteron 6274 processors (299,008 CPU cores in total). It will consume 9 megawatts of power and is rated at a peak of 27 Petaflops and 17.59 Petaflops during a sustained Linpack benchmark. Further, when compared to Sandy Bridge processors, the K20 series offers between 8.2 and 18.1 times more performance in several scientific applications.
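The Titan numbers hang together with a little arithmetic. The sketch below assumes one K20X and one 16-core Opteron 6274 per node (the Cray XK7 layout Titan uses) and reuses the K20X's 1.31 TFLOPS double-precision peak quoted earlier; the constant names are illustrative.

```python
# Titan figures quoted in the article; each node pairs one K20X with one Opteron 6274.
NODES = 18_688
CORES_PER_OPTERON = 16
K20X_FP64_PEAK_TF = 1.31  # per-card double-precision peak
SYSTEM_PEAK_PF = 27.0
LINPACK_PF = 17.59

cpu_cores = NODES * CORES_PER_OPTERON
gpu_peak_pf = NODES * K20X_FP64_PEAK_TF / 1000  # TFLOPS -> PFLOPS
linpack_efficiency = LINPACK_PF / SYSTEM_PEAK_PF

print(f"CPU cores: {cpu_cores:,}")                      # 299,008
print(f"GPU share of peak: {gpu_peak_pf:.2f} PF")       # 24.48 PF
print(f"Linpack efficiency: {linpack_efficiency:.1%}")  # 65.1%
```

The GPUs alone account for roughly 24.5 of the 27 peak petaflops, and the sustained Linpack run reaches about 65% of that peak.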

While the Tesla cards undoubtedly use more power than CPUs, you need far fewer accelerator cards than processors to hit the same performance numbers. That is where NVIDIA's power efficiency figures come from.

NVIDIA is aiming the accelerator cards at researchers and businesses doing 3D graphics, visual effects, high performance computing, climate modeling, molecular dynamics, earth science, simulations, fluid dynamics, and other such computationally intensive tasks. Using CUDA and the parallel nature of the GPU, the Tesla cards can achieve much higher performance than a CPU-only system. NVIDIA has also engineered technologies to better parallelize workloads and to keep the GPU accelerators fed with data, which the company calls Dynamic Parallelism and Hyper-Q respectively.

It is interesting to see NVIDIA bring out a new flagship, especially another GK110 card. Systems using the K20 and the new K20X are available now with cards shipping this week and general availability later this month.

You can find the full press release below and a look at the GK110 GPU in our preview.

AnandTech also managed to get a look inside the Titan supercomputer at Oak Ridge National Laboratory, where you can see the Tesla K20X cards in action.

A very specialized but completely open source CUDA-like program for image manipulation

Subject: General Tech | May 30, 2012 - 12:11 PM |
Tagged: CUDA, open source, opengl

Hack a Day linked to a program that could be of great use for anyone who manipulates and processes images, or anyone who wants to make fractals very quickly. Using the OpenGL Shading Language (GLSL), Reuben Carter developed a command line tool that processes images using NVIDIA GPUs. As we have discussed in the past on PC Perspective, GPUs are much better at this sort of parallel processing than a traditional CPU or the CPU portion of modern processors. One obvious use of this program is the quick creation of complex fractals, but it can also process pre-existing images. Edge detection, colour transforms, and perhaps even image recognition tasks can be completed with his software much faster than with CPU-bound image manipulation programs. If you are in that field, or looking to decorate your dorm room, you should grab his software via the GitHub link in the article.
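The reason fractals suit a GPU so well is that every pixel can be computed independently. A minimal sketch of that idea, not Reuben's code: the per-pixel function below is the kind of work a fragment shader evaluates for every pixel at once, written here in plain Python so it runs anywhere. The viewport ([-2, 1] x [-1.5, 1.5]) and iteration limit are arbitrary choices for the example.

```python
def mandelbrot_pixel(x: int, y: int, width: int, height: int, max_iter: int = 100) -> int:
    """Escape-time value for one pixel; depends only on that pixel's coordinates."""
    # Map the pixel onto the region [-2, 1] x [-1.5, 1.5] of the complex plane.
    c = complex(-2.0 + 3.0 * x / width, -1.5 + 3.0 * y / height)
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i  # escaped: the iteration count picks the colour
    return max_iter  # never escaped: treat the point as inside the set


def render(width: int, height: int, max_iter: int = 100) -> list[list[int]]:
    """Evaluate every pixel. A GPU runs these calls in parallel; this loop runs them one by one."""
    return [[mandelbrot_pixel(x, y, width, height, max_iter) for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    # Crude ASCII preview: '#' marks points that never escaped.
    for row in render(60, 24, max_iter=50):
        print("".join("#" if v == 50 else "." for v in row))
```

Because `mandelbrot_pixel` reads nothing but its own coordinates, the work needs no coordination between pixels, which is exactly the property that lets a GPU process them all in parallel.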

"If you ever need to manipulate images really fast, or just want to make some pretty fractals, [Reuben] has just what you need. He developed a neat command line tool to send code to a graphics card and generate images using pixel shaders. Opposed to making these images with a CPU, a GPU processes every pixel in parallel, making image processing much faster."

Source: Hack a Day

NVIDIA Pioneers New Standard for High Performance Computing with Tesla GPUs

Subject: Shows and Expos | May 15, 2012 - 03:43 PM |
Tagged: tesla, nvidia, GTC 2012, kepler, CUDA

SAN JOSE, Calif.—GPU Technology Conference—May 15, 2012—NVIDIA today unveiled a new family of Tesla GPUs based on the revolutionary NVIDIA Kepler GPU computing architecture, which makes GPU-accelerated computing easier and more accessible for a broader range of high performance computing (HPC) scientific and technical applications.

The new NVIDIA Tesla K10 and K20 GPUs are computing accelerators built to handle the most complex HPC problems in the world. Designed with an intense focus on high performance and extreme power efficiency, Kepler is three times as efficient as its predecessor, the NVIDIA Fermi architecture, which itself established a new standard for parallel computing when introduced two years ago.

“Fermi was a major step forward in computing,” said Bill Dally, chief scientist and senior vice president of research at NVIDIA. “It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform. Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency.”

The Tesla K10 and K20 GPUs were introduced at the GPU Technology Conference (GTC), as part of a series of announcements from NVIDIA, all of which can be accessed in the GTC online press room.

NVIDIA developed a set of innovative architectural technologies that make the Kepler GPUs high performing and highly energy efficient, as well as more applicable to a wider set of developers and applications. Among the major innovations are:

  • SMX Streaming Multiprocessor – The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX’s energy efficiency was achieved by increasing its number of CUDA architecture cores by four times, while reducing the clock speed of each core, power-gating parts of the GPU when idle and maximizing the GPU area devoted to parallel-processing cores instead of control logic.
  • Dynamic Parallelism – This capability enables GPU threads to dynamically spawn new threads, allowing the GPU to adapt dynamically to the data. It greatly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.
  • Hyper-Q – This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU. This dramatically increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI.
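Dynamic Parallelism is easiest to picture with adaptive mesh refinement, one of the algorithms named above. The following is a CPU-only Python model of the control flow, not CUDA code: each cell is a task that inspects its own data and decides whether to spawn two child tasks, which is the decision a Kepler thread can make on the GPU without handing control back to the host. The function name, tolerance, and depth limit are illustrative assumptions.

```python
from collections import deque
from typing import Callable


def adaptive_refine(f: Callable[[float], float], a: float, b: float,
                    tol: float, max_depth: int) -> list[tuple[float, float]]:
    """Split [a, b] only where f changes by more than tol across a cell.

    Each cell is a task; a task may enqueue child tasks depending on its data,
    mirroring how a Kepler thread can launch child work. Runs sequentially here.
    """
    work = deque([(a, b, 0)])
    leaves = []
    while work:
        lo, hi, depth = work.popleft()
        if depth < max_depth and abs(f(hi) - f(lo)) > tol:
            mid = (lo + hi) / 2
            # Data-dependent "launch" of two child tasks.
            work.append((lo, mid, depth + 1))
            work.append((mid, hi, depth + 1))
        else:
            leaves.append((lo, hi))
    return sorted(leaves)


if __name__ == "__main__":
    step = lambda x: 0.0 if x < 0.3 else 1.0
    for cell in adaptive_refine(step, 0.0, 1.0, tol=0.5, max_depth=5):
        print(cell)
```

With a step function, the cells shrink only around the jump at 0.3 and stay coarse elsewhere; the amount of work is decided by the data rather than fixed up front, which is the situation Dynamic Parallelism is meant to handle on the GPU.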

“We designed Kepler with an eye towards three things: performance, efficiency and accessibility,” said Jonah Alben, senior vice president of GPU Engineering and principal architect of Kepler at NVIDIA. “It represents an important milestone in GPU-accelerated computing and should foster the next wave of breakthroughs in computational research.”

NVIDIA Tesla K10 and K20 GPUs
The NVIDIA Tesla K10 GPU delivers the world’s highest throughput for signal, image and seismic processing applications. Optimized for customers in oil and gas exploration and the defense industry, a single Tesla K10 accelerator board features two GK104 Kepler GPUs that deliver an aggregate performance of 4.58 teraflops of peak single-precision floating point and 320 GB per second memory bandwidth.

The NVIDIA Tesla K20 GPU is the new flagship of the Tesla GPU product family, designed for the most computationally intensive HPC environments. Expected to be the world’s highest-performance, most energy-efficient GPU, the Tesla K20 is planned to be available in the fourth quarter of 2012.

The Tesla K20 is based on the GK110 Kepler GPU. This GPU delivers three times more double precision performance compared to Fermi architecture-based Tesla products, and it supports the Hyper-Q and dynamic parallelism capabilities. The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

“In the two years since Fermi was launched, hybrid computing has become a widely adopted way to achieve higher performance for a number of critical HPC applications,” said Earl C. Joseph, program vice president of High-Performance Computing at IDC. “Over the next two years, we expect that GPUs will be increasingly used to provide higher performance on many applications.”

Preview of CUDA 5 Parallel Programming Platform
In addition to the Kepler architecture, NVIDIA today released a preview of the CUDA 5 parallel programming platform. Available to more than 20,000 members of NVIDIA’s GPU Computing Registered Developer program, the platform will enable developers to begin exploring ways to take advantage of the new Kepler GPUs, including dynamic parallelism.

The CUDA 5 parallel programming model is planned to be widely available in the third quarter of 2012. Developers can get access to the preview release by signing up for the GPU Computing Registered Developer program on the CUDA website.

Source: NVIDIA

NVIDIA Updates CUDA: Major Release for Science Research

Subject: General Tech, Graphics Cards | January 29, 2012 - 02:53 AM |
Tagged: nvidia, gpgpu, CUDA

NVIDIA has traditionally been very interested in gaining ground in the high-performance computing market for scientific research. For a lot of functions, having a fast and highly parallel processor saves time and money compared to having a traditional computer crunch away or having to book time on one of the world’s relatively few supercomputers. Despite the raw performance of a GPU, adequate development tools are required to turn a simulation or calculation into a functional program that executes on said GPU. NVIDIA is said to have had a strong lead with their CUDA platform for quite some time; that lead will likely continue with releases the size of this one.

What does a tuned up GPU purr like? Cuda cuda cuda cuda cuda.

The most recent release, CUDA 4.1, has three main features:

  • A visual profiler to point out common mistakes and optimizations and to provide instructions which detail how to alter your code to increase your performance
  • A new compiler which is based on the LLVM infrastructure, making good on their promise to open the CUDA platform to other architectures -- both software and hardware
  • New image and signal processing functions for their NVIDIA Performance Primitives (NPP) library, relieving developers of the need to create their own versions or license a proprietary library

The three features, as NVIDIA describes them in their press release, are listed below.

New Visual Profiler - Easiest path to performance optimization
The new Visual Profiler makes it easy for developers at all experience levels to optimize their code for maximum performance. Featuring automated performance analysis and an expert guidance system that delivers step-by-step optimization suggestions, the Visual Profiler identifies application performance bottlenecks and recommends actions, with links to the optimization guides. Using the new Visual Profiler, performance bottlenecks are easily identified and actionable.

LLVM Compiler - Instant 10 percent increase in application performance
LLVM is a widely-used open-source compiler infrastructure featuring a modular design that makes it easy to add support for new programming languages and processor architectures. Using the new LLVM-based CUDA compiler, developers can achieve up to 10 percent additional performance gains on existing GPU-accelerated applications with a simple recompile. In addition, LLVM's modular design allows third-party software tool developers to provide a custom LLVM solution for non-NVIDIA processor architectures, enabling CUDA applications to run across NVIDIA GPUs, as well as those from other vendors.

New Image, Signal Processing Library Functions - "Drop-in" Acceleration with NPP Library
NVIDIA has doubled the size of its NPP library, with the addition of hundreds of new image and signal processing functions. This enables virtually any developer using image or signal processing algorithms to easily gain the benefit of GPU acceleration, with the simple addition of library calls into their application. The updated NPP library can be used for a wide variety of image and signal processing algorithms, ranging from basic filtering to advanced workflows.
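To give a sense of what a "drop-in" filtering routine saves a developer from writing, here is a plain-Python 3x3 box (mean) filter. It is an illustration only, not NPP's API; shrinking the window at image edges is a choice made for this sketch, and NPP's own border handling may differ.

```python
def box_filter_3x3(image: list[list[float]]) -> list[list[float]]:
    """Replace each pixel with the mean of itself and its in-bounds neighbours."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            # Visit the 3x3 window, skipping positions that fall outside the image.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += image[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out


if __name__ == "__main__":
    spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    for row in box_filter_3x3(spike):
        print(row)
```

Every output pixel is computed independently from a small neighbourhood, which is why filters like this map so naturally onto a GPU and why a library version can be swapped in with a single call.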
 

Source: NVIDIA

Give your project good CARMA, get a CUDA on ARM dev kit!

Subject: General Tech, Graphics Cards, Processors | December 20, 2011 - 04:34 AM |
Tagged: nvidia, CUDA, CARMA, capital letters, arm

Okay, so the pun was a little obvious, but NVIDIA has just announced the specifications and name for the development kit for their ARM-based GPU computing platform. The development kit will provide a way to build and test applications on a platform similar to what will be found in the Barcelona Supercomputing Centre’s upcoming GPU supercomputer until you are ready to deploy the finished application with real data on the real machine. Such is the life of a development unit.

Carma: What goes around, comes around... right Intel?

The development kit is quite modest in its specifications:

  • Tegra 3 ARM Cortex-A9 CPU
  • Quadro 1000M GPU (96 CUDA Cores)
  • 2GB system RAM, 2GB GPU RAM
  • 4x PCIe Gen1 CPU to GPU link
  • 1000Base-T networking support
  • SATA, HDMI, DisplayPort, USB.
 
While the specifications are somewhere between a high-end tablet and a modest workstation, the real story is NVIDIA’s continued progress into the High Performance Computing (HPC) market. NVIDIA seems certain that it can (ARM-)wrestle more market share from Intel and other players such as IBM on the high performance front. Many would probably speculate about NVIDIA closing in on the home market from both ends, but I expect that creating a compelling ARM product for a desktop PC will never be the problem for NVIDIA: the problem is a lack of anything compelling for a desktop user to run on it these days.
Source: NVIDIA Blogs

CUDA been done sooner! NVIDIA open sources CUDA platform

Subject: General Tech, Graphics Cards, Processors | December 15, 2011 - 04:03 AM |
Tagged: CUDA

NVIDIA stands as the current front-runner for the “Last Year’s Best Decision, This Year” award. You may remember our coverage last June of the AMD Fusion Developer Summit, where industry members such as ARM, Microsoft, and of course AMD discussed the potential of utilizing specialized processors and developing on open platforms such as OpenCL and Microsoft’s announced C++ AMP. Do you know what would have been an amazing announcement for AFDS to stomp OpenCL and C++ AMP? That NVIDIA would open up CUDA. Know what announcement missed that bus by a whole half a year? NVIDIA will open up CUDA.

Your platform pooh-pooh? Bear a CUDA.

While I just harassed NVIDIA for their timing, it might not be too late. CUDA is still a powerhouse of a GPGPU platform with substantial software support, from mammoth software packages such as Adobe Creative Suite to smaller projects like KGPU. With the open sourcing of the CUDA compiler, NVIDIA is also permitting manufacturers like AMD and even Intel to support CUDA with their GPUs, x86 CPUs, and other processing units. While I am excited at this outcome, I am still somewhat confused about NVIDIA’s timing: they are just a little late to open up and crush the market, and the move seems quite abrupt if they originally intended CUDA to survive as a forever-proprietary computing platform.

Source: NVIDIA

NVDA Cum Laude-ing Stanford a CUDA Center of Excellence

Subject: Editorial, General Tech, Graphics Cards | July 17, 2011 - 01:07 PM |
Tagged: stanford, nvidia, CUDA

NVIDIA has been pushing their CUDA platform for years now as a method to access your GPU for purposes far beyond the scope of flags and frags. We have seen what a good amount of heterogeneous hardware will do to a process with a hefty portion of parallelizable code, from encryption to generating bitcoins and from media processing to blurring the line between real-time and non-real-time 3D rendering. NVIDIA also recognizes the role that academia plays in training future programmers and thus strongly supports institutions that teach how to use GPU hardware effectively, especially when they teach how to use NVIDIA GPU hardware effectively. Recently, NVIDIA knighted Stanford as the latest member of its CUDA Center of Excellence round table.

It will be $150 if you want it framed.

The list of CUDA Centers of Excellence now includes: Georgia Institute of Technology, Harvard School of Engineering, Institute of Process Engineering at Chinese Academy of Sciences, National Taiwan University, Stanford Engineering, TokyoTech, Tsinghua University, University of Cambridge, University of Illinois at Urbana-Champaign, University of Maryland, University of Tennessee, and the University of Utah. If you are interested in learning about programming for GPUs, then NVIDIA has just blessed one more choice. Whether that will affect many prospective students and faculty is yet to be seen, but it makes for many amusing puns nonetheless.

Source: NVIDIA

Wish you CUDA had a GPGPU C++ template? Now you can!

Subject: General Tech, Graphics Cards | June 29, 2011 - 08:58 PM |
Tagged: gpgpu, CUDA

If you have seen our various news articles regarding how a GPU can be useful in many ways, and you are a developer yourself, you may be wondering how to get in on that action. Recently Microsoft showed off their competitor to OpenCL, known as C++ AMP, and AMD showed off some new tools designed to help OpenCL developers. Everything was dead silent on the CUDA front at the AMD Fusion Developer Summit, as expected, but that does not mean that no one is helping people who do not mind being tied to NVIDIA. An open-source project has been created to generate template files for programmers who wish to do some of their computation in CUDA and want a helping hand setting up the framework.

You may think the videocard is backwards, but clearly its DVI heads are in front.

The project was started by Pavel Kartashev and is a Java application that accepts form input and generates CUDA code to be imported into your project. The application will help you generate the tedious skeleton code for defining variables and efficiently using the GPU architecture, leaving you to program the actual process to be accomplished. The author apparently plans to create a Web-based version, which should be quite easy given the Java-based nature of his application. Personally, I would be more interested in the local application or a widget, leaving my web browser windows free for reference material. That said, I am sure that some people would like this tool in their web browser, possibly more people than share my preference.
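A rough sketch of what such a generator does, written in Python rather than Java and not based on Kartashev's code: given a kernel name and a list of array parameters, it emits a CUDA kernel stub plus the host-side `cudaMalloc`/`cudaMemcpy`/launch/`cudaFree` boilerplate. The function name, the 256-threads-per-block choice, and the assumption that every array holds `n` elements are all hypothetical, and error checking is omitted from the generated code.

```python
def generate_cuda_skeleton(kernel_name: str, params: list[tuple[str, str]]) -> str:
    """Emit a CUDA kernel stub and host launch boilerplate.

    params holds (element type, name) pairs, e.g. [("float", "a"), ("float", "out")].
    Every array is assumed to hold n elements; error checking is omitted.
    """
    kernel_args = ", ".join(f"{ctype}* {name}" for ctype, name in params)
    host_args = ", ".join(f"{ctype}* h_{name}" for ctype, name in params)
    device_args = ", ".join(f"d_{name}" for _, name in params)

    lines = [
        f"__global__ void {kernel_name}({kernel_args}, int n)",
        "{",
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;",
        "    if (i < n) {",
        "        /* TODO: per-element computation goes here */",
        "    }",
        "}",
        "",
        f"void launch_{kernel_name}({host_args}, int n)",
        "{",
    ]
    # Allocate each device buffer and copy the host data over.
    for ctype, name in params:
        lines += [
            f"    {ctype}* d_{name};",
            f"    cudaMalloc((void**)&d_{name}, n * sizeof({ctype}));",
            f"    cudaMemcpy(d_{name}, h_{name}, n * sizeof({ctype}), cudaMemcpyHostToDevice);",
        ]
    # One thread per element, rounded up to whole blocks.
    lines += [
        "    int threads = 256;",
        "    int blocks = (n + threads - 1) / threads;",
        f"    {kernel_name}<<<blocks, threads>>>({device_args}, n);",
    ]
    # Copy results back (a blocking copy, so the kernel has finished) and free device memory.
    for ctype, name in params:
        lines += [
            f"    cudaMemcpy(h_{name}, d_{name}, n * sizeof({ctype}), cudaMemcpyDeviceToHost);",
            f"    cudaFree(d_{name});",
        ]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_cuda_skeleton("scale", [("float", "a"), ("float", "out")]))
```

The developer then only fills in the TODO line; the indexing, bounds check, allocation, transfers, and launch configuration are the repetitive parts a template tool can take off their hands.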

 
If you are interested in contributing either financially or through labor, he asks that you contact him through the email tied to his PayPal account (likely for spam reasons, so I assume posting it here would be the opposite of helpful). The rest of us can sit back, enjoy our GPU-enabled applications, and bet on how long it will take NVIDIA to reach out to him. I got all next week.
Source:

Alenka: The SQL, starring CUDA!

Subject: General Tech, Graphics Cards, Storage | May 11, 2011 - 07:58 PM |
Tagged: SQL, developer, CUDA

Programmers are beginning to understand, and become ever more comfortable with, the uses of GPUs in their applications. Late last week we explored the KGPU project. KGPU is designed to allow the Linux kernel to offload massively parallel processes to the GPU, both to relieve the CPU and to directly increase performance. KGPU showed that with an encrypted file system you can see several-fold increases in read and write bandwidth on an SSD. Perhaps this little GPU thing can be useful for more? The Alenka Project thinks so: they are currently working on a CUDA-based SQL-like language for data processing.

CUDA woulda shoulda... and did.

SQL databases are some of the most common ways to store and manipulate larger sets of data. If you have a blog, it is almost definitely storing its information in a SQL database. If you play an MMO, your data is almost definitely stored and accessed on a SQL server. As your data size expands and your number of concurrent accesses increases, you can see why using a GPU could keep your application running much more smoothly.

Alenka in its current release supports large data sets exceeding both GPU and system RAM by streaming the data in chunks, processing each chunk, and moving on. Its supported primitive types are doubles, longs, and varchars. It is open source under the Apache License 2.0. Developers interested in using or assisting with the project can check out its SourceForge page. We should continue to see more and more GPU-based applications appear in the near future as problems such as these are finally lifted from the CPU and given to hardware better suited to bear them.
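Alenka's approach of streaming chunks can be illustrated with a small, hypothetical Python sketch (not Alenka's code or query language): it evaluates the equivalent of `SELECT SUM(x) FROM t WHERE x > threshold` while holding only one chunk in memory at a time. On a GPU, the filter and reduction inside each chunk are the parallel part; here they run sequentially.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunks(values: Iterable[float], chunk_size: int) -> Iterator[list[float]]:
    """Yield successive lists of at most chunk_size values from any iterable."""
    it = iter(values)
    while True:
        block = list(islice(it, chunk_size))
        if not block:
            return
        yield block


def streaming_sum_where(values: Iterable[float], threshold: float,
                        chunk_size: int = 1024) -> float:
    """Equivalent of SELECT SUM(x) FROM t WHERE x > threshold, one chunk in memory at a time."""
    total = 0.0
    for block in chunks(values, chunk_size):
        # Filter + partial reduction for this chunk (the step a GPU would parallelize).
        total += sum(v for v in block if v > threshold)
    return total


if __name__ == "__main__":
    # A generator stands in for a column too large to load at once.
    column = (float(i % 100) for i in range(1_000_000))
    print(streaming_sum_where(column, 90.0))
```

The chunk size trades memory for transfer overhead: larger chunks mean fewer round trips, but each one must still fit in the memory of whichever processor does the filtering.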