NVIDIA Finally Gets Serious with Tegra
Tegra has had an interesting run of things. The original Tegra 1 was utilized only by Microsoft with Zune. Tegra 2 saw better adoption, but did not produce the design wins to propel NVIDIA to a leadership position in cell phones and tablets. Tegra 3 found a spot in Microsoft’s Surface, but that has turned out to be a far more bitter experience than expected. Tegra 4 so far has been integrated into a handful of products and is being featured in NVIDIA’s upcoming Shield product. It also hit some production snags that made it later to market than expected.
I think the primary issue with the first three generations of products is pretty simple. There was a distinct lack of differentiation from the other ARM based products around. Yes, NVIDIA brought their graphics prowess to the market, but never in a form that distanced itself adequately from the competition. Tegra 2 boasted GeForce based graphics, but we did not find out until later that it was comprised of basically four pixel shaders and four vertex shaders that had more in common with the GeForce 7800/7900 series than it did with any of the modern unified architectures of the time. Tegra 3 boasted a big graphical boost, but it was in the form of doubling the pixel shader units and leaving the vertex units alone.
While NVIDIA had very strong developer relations and a leg up on the competition in terms of software support, it was never enough to propel Tegra beyond a handful of devices. NVIDIA is trying to rectify that with Tegra 4 and the 72 shader units that it contains (still divided between pixel and vertex units). Tegra 4 is not perfect in that it is late to market and the GPU is not OpenGL ES 3.0 compliant. ARM, Imagination Technologies, and Qualcomm are offering new graphics processing units that are not only OpenGL ES 3.0 compliant, but also offer OpenCL 1.1 support. Tegra 4 does not support OpenCL. In fact, it does not support NVIDIA’s in-house CUDA. Ouch.
Jumping into a new market is not an easy thing, and invariably mistakes will be made. NVIDIA worked hard to make a solid foundation with their products, and certainly they had to learn to walk before they could run. Unfortunately, running effectively entails having design wins due to outstanding features, performance, and power consumption. NVIDIA was really only average in all of those areas. NVIDIA is hoping to change that. Their first salvo, a product whose features and support are a step above the competition, is what we are talking about today.
Subject: General Tech | April 12, 2013 - 02:08 AM | Tim Verry
Tagged: SECO, nvidia, mini ITX, kepler, kayla, GTC 13, GTC, CUDA, arm
Last month, NVIDIA revealed its Kayla development platform that combines a quad core Tegra System on a Chip (SoC) with an NVIDIA Kepler GPU. Kayla will be out later this year, but that has not stopped other board makers from putting together their own solutions. One such solution that began shipping earlier this week is the mITX GPU Devkit from SECO.
The new mITX GPU Devkit is a hardware platform for developers to program CUDA applications for mobile devices, desktops, workstations, and HPC servers. It combines an NVIDIA Tegra 3 processor, 2GB of RAM, and 4GB of internal storage (eMMC) on a Qseven module with a Mini-ITX form factor motherboard. Developers can then plug their own CUDA-capable graphics card into the single PCI-E 2.0 x16 slot (which actually runs at x4 speeds). Additional storage can be added via an internal SATA connection, and cameras can be hooked up using the CIC headers.
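To put the "x16 slot at x4 speeds" caveat in numbers, here is a rough back-of-the-envelope sketch. It assumes the standard PCI Express 2.0 signalling rate of 5 GT/s per lane with 8b/10b encoding and ignores any further protocol overhead, so treat the results as upper bounds rather than measured throughput. The function and variable names are made up for this illustration.

```python
# Approximate one-direction PCI Express bandwidth.
# Assumption: only line-encoding overhead is counted; real transfers are slower.
GIGATRANSFERS_PER_SEC = {"1.x": 2.5, "2.0": 5.0}  # per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 payload bits per 10 bits on the wire


def pcie_bandwidth_gb_per_s(generation, lanes):
    """Return usable bandwidth in decimal GB/s for one direction."""
    gbits_per_lane = GIGATRANSFERS_PER_SEC[generation] * ENCODING_EFFICIENCY
    return gbits_per_lane * lanes / 8  # bits to bytes


slot_physical = pcie_bandwidth_gb_per_s("2.0", 16)  # what a full x16 slot could carry
slot_actual = pcie_bandwidth_gb_per_s("2.0", 4)     # what an x4 electrical link carries
print(f"x16: {slot_physical:.1f} GB/s, x4: {slot_actual:.1f} GB/s")
```

In other words, a card in this slot sees about a quarter of the host bandwidth it would get in a desktop x16 slot, which matters most for code that shuffles data between the CPU and GPU frequently.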
Rear IO on the mITX GPU Devkit includes:
- 1 x Gigabit Ethernet
- 3 x USB
- 1 x OTG port
- 1 x HDMI
- 1 x Display Port
- 3 x Analog audio
- 2 x Serial
- 1 x SD card slot
The SECO platform is proving to be popular for GPGPU in the server space, especially with systems like Pedraforca. The intention of using these types of platforms in servers is to save power by using a low power ARM chip for inter-node communication and basic tasks while the real computing is done solely on the graphics cards. With Intel’s upcoming Haswell-based Xeon chips getting down to 13W TDPs, though, systems like this are going to be more difficult to justify. SECO is mostly positioning this platform as a development board, however. One use in that respect is to begin optimizing GPU-accelerated code for mobile devices. With future Tegra chips set to get CUDA-compatible graphics, new software development and optimization of existing GPGPU code for smartphones and tablets will be increasingly important.
Either way, the SECO mITX GPU Devkit is available now for 349 EUR or approximately $360 (in both cases, before any taxes).
Subject: General Tech | November 12, 2012 - 06:29 AM | Tim Verry
Tagged: tesla, supercomputer, nvidia, k20x, HPC, CUDA, computing
Graphics card manufacturer NVIDIA launched a new Tesla K20X accelerator card today that supplants the existing K20 as the top of the line model. The new card cranks up the double and single precision floating point performance, beefs up the memory capacity and bandwidth, and brings some efficiency improvements to the supercomputer space.
While it is not yet clear how many CUDA cores the K20X has, NVIDIA has stated that it is using the GK110 GPU, and is running with 6GB of memory with 250 GB/s of bandwidth – a nice improvement over the K20’s 5GB at 208 GB/s. Both the new K20X and K20 accelerator cards are based on the company’s Kepler architecture, but NVIDIA has managed to wring out more performance from the K20X. The K20 is rated at 1.17 TFlops peak double precision and 3.52 TFlops peak single precision while the K20X is rated at 1.31 TFlops and 3.95 TFlops.
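For a quick sense of scale, the sketch below works out the generational gains from the figures quoted above. Only the numbers come from the article; the dictionary layout and helper name are just for illustration.

```python
# Figures quoted in the article, as (K20, K20X) pairs.
specs = {
    "double precision (TFLOPS)": (1.17, 1.31),
    "single precision (TFLOPS)": (3.52, 3.95),
    "memory (GB)": (5, 6),
    "bandwidth (GB/s)": (208, 250),
}


def percent_gain(old, new):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


gains = {name: percent_gain(k20, k20x) for name, (k20, k20x) in specs.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}%")
```

The K20X comes out roughly 12 percent ahead of the K20 in floating point throughput and about 20 percent ahead in memory capacity and bandwidth.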
The K20X manages to score 1.22 TFlops in DGEMM, which puts it at almost three times faster than the previous generation Tesla M2090 accelerator based on the Fermi architecture.
Aside from pure performance, NVIDIA is also touting efficiency gains with the new K20X accelerator card. When two K20X cards are paired with a 2P Sandy Bridge server, NVIDIA claims to achieve 76% efficiency versus 61% efficiency with a 2P Sandy Bridge server equipped with two previous generation M2090 accelerator cards. Additionally, NVIDIA claims to have enabled the Titan supercomputer to reach the #1 spot on the top 500 green supercomputers thanks to its new cards with a rating of 2,120.16 MFLOPS/W (million floating point operations per second per watt).
NVIDIA claims to have already shipped 30 PFLOPS worth of GPU accelerated computing power. Interestingly, most of that computing power is housed in the recently unveiled Titan supercomputer. This supercomputer contains 18,688 Tesla K20X (Kepler GK110) GPUs and 18,688 16-core AMD Opteron 6274 processors (299,008 CPU cores in total). It will consume 9 megawatts of power and is rated at a peak of 27 Petaflops and 17.59 Petaflops during a sustained Linpack benchmark. Further, when compared to Sandy Bridge processors, the K20 series offers up between 8.2 and 18.1 times more performance at several scientific applications.
While the Tesla cards undoubtedly use more power than CPUs, you need far fewer accelerator cards than processors to hit the same performance numbers. That is where NVIDIA is getting its power efficiency numbers from.
NVIDIA is aiming the accelerator cards at researchers and businesses doing 3D graphics, visual effects, high performance computing, climate modeling, molecular dynamics, earth science, simulations, fluid dynamics, and other such computationally intensive tasks. Using CUDA and the parallel nature of the GPU, the Tesla cards can achieve performance much higher than a CPU-only system can. NVIDIA has also engineered technology to better parallelize workloads and keep the GPU accelerators fed with data, which the company calls Dynamic Parallelism and Hyper-Q respectively.
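The Titan figures are easier to digest with a little arithmetic. The sketch below uses only the numbers quoted above; the constant names are mine, and the 9 megawatt value is taken at face value as the power draw during the Linpack run, which is an assumption.

```python
PEAK_PFLOPS = 27.0
LINPACK_PFLOPS = 17.59
POWER_MW = 9.0            # assumption: treated as the power drawn during Linpack
GPU_COUNT = 18_688
CPU_CORE_COUNT = 299_008
CORES_PER_OPTERON = 16    # Opteron 6274

# Fraction of theoretical peak actually sustained in Linpack.
linpack_efficiency = LINPACK_PFLOPS / PEAK_PFLOPS

# Number of Opteron chips implied by the core count.
opteron_chips = CPU_CORE_COUNT // CORES_PER_OPTERON

# 1 PFLOPS = 1e9 MFLOPS and 1 MW = 1e6 W.
mflops_per_watt = LINPACK_PFLOPS * 1e9 / (POWER_MW * 1e6)

print(f"Linpack efficiency: {linpack_efficiency:.1%}")
print(f"Opteron chips: {opteron_chips} (GPUs: {GPU_COUNT})")
print(f"Estimated efficiency: {mflops_per_watt:.0f} MFLOPS/W")
```

The core count works out to one 16-core Opteron per K20X. The estimated MFLOPS/W lands a little below the 2,120.16 figure NVIDIA quotes, most likely because the measured power during the benchmark was lower than the 9 megawatt headline number.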
It is interesting to see NVIDIA bring out a new flagship, especially another GK110 card. Systems using the K20 and the new K20X are available now with cards shipping this week and general availability later this month.
You can find the full press release below and a look at the GK110 GPU in our preview.
AnandTech also managed to get a look inside the Titan supercomputer at Oak Ridge National Laboratory, where you can see the Tesla K20X cards in action.
Subject: General Tech | May 30, 2012 - 12:11 PM | Jeremy Hellstrom
Tagged: CUDA, open source, opengl
Hack a Day linked to a program that could be of great use for anyone who manipulates and processes images, or anyone who wants to be able to make fractals very quickly. Utilizing the OpenGL Shading Language, Reuben Carter developed a command line tool that processes images using NVIDIA GPUs. As we have talked about in the past on PC Perspective, GPUs are much better at this sort of parallel processing than a traditional CPU or the CPU portion of modern processors. Below is one obvious use of this program, the quick creation of complex fractals, but this program can also process pre-existing images. Edge detection, colour transforms and perhaps even image recognition tasks can be completed with his software at a much faster speed than CPU bound image manipulation programs. If you are in that field, or looking to decorate your dorm room, you should grab his software via the GitHub link in the article.
"If you ever need to manipulate images really fast, or just want to make some pretty fractals, [Reuben] has just what you need. He developed a neat command line tool to send code to a graphics card and generate images using pixel shaders. Opposed to making these images with a CPU, a GPU processes every pixel in parallel, making image processing much faster."
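To see why fractals suit a pixel shader so well, consider the minimal Python sketch below. It is not Reuben's tool or its shader code, just an illustration of the shape of the problem: the colour of each pixel is a pure function of its own coordinates, so nothing stops a GPU from evaluating every pixel at once. Here the CPU simply loops over them one at a time. All names and the ASCII output are assumptions made for the example.

```python
MAX_ITER = 50


def mandelbrot_pixel(x, y, width, height, max_iter=MAX_ITER):
    """Escape-time iteration count for one pixel; depends only on (x, y)."""
    # Map pixel coordinates onto the complex plane region [-2, 1] x [-1, 1].
    c = complex(-2.0 + 3.0 * x / (width - 1), -1.0 + 2.0 * y / (height - 1))
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i  # escaped: the point is outside the set
    return max_iter


def render(width=61, height=25):
    """Serial stand-in for what a shader would do for all pixels in parallel."""
    rows = []
    for y in range(height):
        rows.append("".join(
            "#" if mandelbrot_pixel(x, y, width, height) == MAX_ITER else "."
            for x in range(width)
        ))
    return rows


image = render()
print("\n".join(image))
```

Because `mandelbrot_pixel` never reads a neighbouring pixel or any shared state, the two loops in `render` could run in any order, or all at the same time, which is exactly the property a fragment shader exploits.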
Here is some more Tech News from around the web:
- Hard disk drive prices quick to rise, slow to fall @ The Register
- Microsoft's New User Agreement Bans Class Action Lawsuits @ NGOHQ
- AIDA64 v2.50 is released @ FinalWire
Subject: Shows and Expos | May 15, 2012 - 03:43 PM | Jeremy Hellstrom
Tagged: tesla, nvidia, GTC 2012, kepler, CUDA
SAN JOSE, Calif.—GPU Technology Conference—May 15, 2012—NVIDIA today unveiled a new family of Tesla GPUs based on the revolutionary NVIDIA Kepler GPU computing architecture, which makes GPU-accelerated computing easier and more accessible for a broader range of high performance computing (HPC) scientific and technical applications.
The new NVIDIA Tesla K10 and K20 GPUs are computing accelerators built to handle the most complex HPC problems in the world. Designed with an intense focus on high performance and extreme power efficiency, Kepler is three times as efficient as its predecessor, the NVIDIA Fermi architecture, which itself established a new standard for parallel computing when introduced two years ago.
“Fermi was a major step forward in computing,” said Bill Dally, chief scientist and senior vice president of research at NVIDIA. “It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform. Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency.”
The Tesla K10 and K20 GPUs were introduced at the GPU Technology Conference (GTC), as part of a series of announcements from NVIDIA, all of which can be accessed in the GTC online press room.
NVIDIA developed a set of innovative architectural technologies that make the Kepler GPUs high performing and highly energy efficient, as well as more applicable to a wider set of developers and applications. Among the major innovations are:
- SMX Streaming Multiprocessor – The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX’s energy efficiency was achieved by increasing its number of CUDA architecture cores by four times, while reducing the clock speed of each core, power-gating parts of the GPU when idle and maximizing the GPU area devoted to parallel-processing cores instead of control logic.
- Dynamic Parallelism – This capability enables GPU threads to dynamically spawn new threads, allowing the GPU to adapt dynamically to the data. It greatly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.
- Hyper-Q – This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU. This dramatically increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI.
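As a loose, CPU-side analogy for what "adapting dynamically to the data" means, the sketch below integrates a function with adaptive Simpson quadrature. It is plain Python, not CUDA, and it is only meant to show the pattern: each piece of work inspects its own data and decides whether to spawn two finer pieces. With dynamic parallelism, each recursive call here would roughly correspond to a child kernel launched from the GPU itself instead of a round trip to the CPU. The `launches` list is hypothetical bookkeeping added for the illustration.

```python
import math


def simpson(f, a, b):
    """Simpson's rule estimate of the integral of f over [a, b]."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))


def adaptive_simpson(f, a, b, tol, launches):
    """Integrate f on [a, b], refining only where the error estimate is large."""
    launches.append((a, b))  # stands in for one (child) kernel launch
    m = (a + b) / 2
    whole = simpson(f, a, b)
    halves = simpson(f, a, m) + simpson(f, m, b)
    if abs(halves - whole) < 15 * tol:
        # Accurate enough here: return the estimate with a Richardson correction.
        return halves + (halves - whole) / 15
    # The data says this region needs more work: spawn two finer tasks.
    return (adaptive_simpson(f, a, m, tol / 2, launches)
            + adaptive_simpson(f, m, b, tol / 2, launches))


launches = []
result = adaptive_simpson(math.sqrt, 0.0, 1.0, 1e-6, launches)
narrowest = min(launches, key=lambda ab: ab[1] - ab[0])
print(f"integral ~ {result:.6f} using {len(launches)} tasks")
print(f"narrowest interval: {narrowest}")
```

The square root is steep near zero, so the refinement piles up at the left edge of the interval while the smooth right-hand side is finished after a few steps. That uneven, data-driven workload is the kind of thing that is awkward to express when only the CPU can launch GPU work.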
“We designed Kepler with an eye towards three things: performance, efficiency and accessibility,” said Jonah Alben, senior vice president of GPU Engineering and principal architect of Kepler at NVIDIA. “It represents an important milestone in GPU-accelerated computing and should foster the next wave of breakthroughs in computational research.”
NVIDIA Tesla K10 and K20 GPUs
The NVIDIA Tesla K10 GPU delivers the world’s highest throughput for signal, image and seismic processing applications. Optimized for customers in oil and gas exploration and the defense industry, a single Tesla K10 accelerator board features two GK104 Kepler GPUs that deliver an aggregate performance of 4.58 teraflops of peak single-precision floating point and 320 GB per second memory bandwidth.
The NVIDIA Tesla K20 GPU is the new flagship of the Tesla GPU product family, designed for the most computationally intensive HPC environments. Expected to be the world’s highest-performance, most energy-efficient GPU, the Tesla K20 is planned to be available in the fourth quarter of 2012.
The Tesla K20 is based on the GK110 Kepler GPU. This GPU delivers three times more double precision compared to Fermi architecture-based Tesla products and it supports the Hyper-Q and dynamic parallelism capabilities. The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.
“In the two years since Fermi was launched, hybrid computing has become a widely adopted way to achieve higher performance for a number of critical HPC applications,” said Earl C. Joseph, program vice president of High-Performance Computing at IDC. “Over the next two years, we expect that GPUs will be increasingly used to provide higher performance on many applications.”
Preview of CUDA 5 Parallel Programming Platform
In addition to the Kepler architecture, NVIDIA today released a preview of the CUDA 5 parallel programming platform. Available to more than 20,000 members of NVIDIA’s GPU Computing Registered Developer program, the platform will enable developers to begin exploring ways to take advantage of the new Kepler GPUs, including dynamic parallelism.
The CUDA 5 parallel programming model is planned to be widely available in the third quarter of 2012. Developers can get access to the preview release by signing up for the GPU Computing Registered Developer program on the CUDA website.
Subject: General Tech, Graphics Cards | January 29, 2012 - 02:53 AM | Scott Michaud
Tagged: nvidia, gpgpu, CUDA
NVIDIA has traditionally been very interested in gaining ground in the market for high-performance scientific computing. For a lot of functions, having a fast and highly parallel processor saves time and money compared to having a traditional computer crunch away or having to book time with one of the world’s relatively few supercomputers. Despite the raw performance of a GPU, adequate development tools are required to bring the simulation or calculation into a functional program to execute on said GPU. NVIDIA is said to have had a strong lead with their CUDA platform for quite some time; that lead will likely continue with releases the size of this one.
What does a tuned up GPU purr like? Cuda cuda cuda cuda cuda.
The most recent release, CUDA 4.1, has three main features:
- A visual profiler to point out common mistakes and optimizations and to provide instructions which detail how to alter your code to increase your performance
- A new compiler which is based on the LLVM infrastructure, making good on their promise to open the CUDA platform to other architectures -- both software and hardware
- New image and signal processing functions for their NVIDIA Performance Primitives (NPP) library, relieving developers of the need to create their own versions or license a proprietary library
The three features, as NVIDIA describes them in their press release, are listed below.
New Visual Profiler - Easiest path to performance optimization
The new Visual Profiler makes it easy for developers at all experience levels to optimize their code for maximum performance. Featuring automated performance analysis and an expert guidance system that delivers step-by-step optimization suggestions, the Visual Profiler identifies application performance bottlenecks and recommends actions, with links to the optimization guides. Using the new Visual Profiler, performance bottlenecks are easily identified and actionable.
LLVM Compiler - Instant 10 percent increase in application performance
LLVM is a widely-used open-source compiler infrastructure featuring a modular design that makes it easy to add support for new programming languages and processor architectures. Using the new LLVM-based CUDA compiler, developers can achieve up to 10 percent additional performance gains on existing GPU-accelerated applications with a simple recompile. In addition, LLVM's modular design allows third-party software tool developers to provide a custom LLVM solution for non-NVIDIA processor architectures, enabling CUDA applications to run across NVIDIA GPUs, as well as those from other vendors.
New Image, Signal Processing Library Functions - "Drop-in" Acceleration with NPP Library
NVIDIA has doubled the size of its NPP library, with the addition of hundreds of new image and signal processing functions. This enables virtually any developer using image or signal processing algorithms to easily gain the benefit of GPU acceleration, with the simple addition of library calls into their application. The updated NPP library can be used for a wide variety of image and signal processing algorithms, ranging from basic filtering to advanced workflows.
Subject: General Tech, Graphics Cards, Processors | December 20, 2011 - 04:34 AM | Scott Michaud
Tagged: nvidia, CUDA, CARMA, capital letters, arm
Okay so the pun was a little obvious, but NVIDIA has just announced the specifications and name for the development kit used to develop for their ARM-based GPU computing platform. The development kit will provide a method to build and test applications on a platform similar to what will be found in the Barcelona Supercomputing Centre’s upcoming GPU supercomputer until you are ready to deploy the finished application with real data on the real machine. Such is the life of a development unit.
Carma: What goes around, comes around... right Intel?
The development kit is quite modest in its specifications:
- Tegra3 ARM A9 CPU
- Quadro 1000M GPU (96 CUDA Cores)
- 2GB system RAM, 2GB GPU RAM
- 4x PCIe Gen1 CPU to GPU link
- 1000Base-T networking support
- SATA, HDMI, DisplayPort, USB.
Subject: General Tech, Graphics Cards, Processors | December 15, 2011 - 04:03 AM | Scott Michaud
NVIDIA stands as the current front-runner for the “Last Year’s Best Decision, This Year” award. You may remember our coverage last June of the AMD Fusion Developer Summit; industry members such as ARM, Microsoft, and of course AMD discussed the potential of utilizing specialized processors and developing on open platforms such as OpenCL and Microsoft’s announced C++ AMP. Do you know what would have been an amazing announcement for AFDS to stomp OpenCL and C++ AMP? That NVIDIA would open up CUDA. Know what announcement missed that bus by a whole half a year? NVIDIA will open up CUDA.
Your platform pooh-pooh? Bear a CUDA.
While I just harassed NVIDIA for their timing, it might not be too late. CUDA is still a powerhouse of a GPGPU platform with substantial software support from absolute mammoth software packages such as Adobe Creative Suite to smaller projects like KGPU. With the open sourcing of the CUDA compiler, NVIDIA is also permitting manufacturers like AMD and even Intel to support CUDA with their GPUs, x86 CPUs, and other processing units. While I am excited at this outcome, I am still somewhat confused about NVIDIA’s timing: they are just a little late to open up and crush the market, and they seem quite abrupt if they originally intended CUDA to survive as a forever-proprietary computing platform.
Subject: Editorial, General Tech, Graphics Cards | July 17, 2011 - 01:07 PM | Scott Michaud
Tagged: stanford, nvidia, CUDA
NVIDIA has been pushing their CUDA platform for years now as a method to access your GPU for purposes far beyond the scopes of flags and frags. We have seen what a good amount of heterogeneous hardware will do to a process with a hefty portion of parallelizable code, from encryption to generating bitcoins, and from media processing to blurring the line between real-time and non-real-time 3D rendering. NVIDIA also recognizes the role that academia plays in training the future programmers and thus strongly supports institutions that teach how to use GPU hardware effectively, especially when they teach how to use NVIDIA GPU hardware effectively. Recently, NVIDIA knighted Stanford as the latest member of its CUDA Center of Excellence round table.
It will be 150$ if you want it framed.
The list of CUDA Centres of Excellence now includes: Georgia Institute of Technology, Harvard School of Engineering, Institute of Process Engineering at Chinese Academy of Sciences, National Taiwan University, Stanford Engineering, TokyoTech, Tsinghua University, University of Cambridge, University of Illinois at Urbana-Champaign, University of Maryland, University of Tennessee, and the University of Utah. If you are interested in learning about programming for GPUs then NVIDIA has just bestowed its blessing on one further choice. Whether that will affect many prospective students and faculty is yet to be seen, but it makes for many amusing puns nonetheless.
Subject: General Tech, Graphics Cards | June 29, 2011 - 08:58 PM | Scott Michaud
Tagged: gpgpu, CUDA
If you have seen our various news articles regarding how a GPU can be useful in many ways, and you are a developer yourself, you may be wondering how to get in on that action. Recently Microsoft showed off their competitor to OpenCL known as C++ AMP and AMD showed off some new tools designed to help developers of OpenCL. Everything was dead silent on the CUDA front at the AMD Fusion Developer Summit, as expected, but that does not mean that no-one is helping people who do not mind being tied in to NVIDIA. An open-sourced project has been created to generate template files for programmers who wish to do some of their computation in CUDA and want a helping hand setting up the framework.
You may think the videocard is backwards, but clearly its DVI heads are in front.
The project was started by Pavel Kartashev and is a Java application that accepts form input and generates CUDA code to be imported into your project. The application will help you generate the tedious skeleton code for defining variables and efficiently using the GPU architecture, leaving you to program the actual process to be accomplished. The author apparently plans to create a Web-based version, which should be quite easy with the Java-based nature of his application. Personally, I would be more interested in the local application or a widget, leaving my web browser windows free for reference material. That said, I am sure that someone would like this tool in their web browser, possibly more people than are like-minded with me.
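To give a flavour of what such a generator does, here is a tiny sketch of the idea in Python. It is not Pavel's tool and makes no claim about how his application works; the template text, the `generate_skeleton` function and the `launch_` wrapper naming are all invented for this illustration. It only shows the general trick of filling a few form fields into boilerplate CUDA source.

```python
from string import Template

# Hypothetical skeleton: one element-wise kernel plus a host-side launcher.
KERNEL_TEMPLATE = Template('''\
#include <cuda_runtime.h>

__global__ void ${name}(${params}, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // TODO: per-element work goes here
    }
}

void launch_${name}(${params}, int n)
{
    int threads = ${threads};
    int blocks = (n + threads - 1) / threads;
    ${name}<<<blocks, threads>>>(${args}, n);
}
''')


def generate_skeleton(name, buffers, threads_per_block=256):
    """Fill the template; buffers is a list of (c_type, identifier) pairs."""
    params = ", ".join(f"{ctype} *{ident}" for ctype, ident in buffers)
    args = ", ".join(ident for _, ident in buffers)
    return KERNEL_TEMPLATE.substitute(
        name=name, params=params, args=args, threads=threads_per_block)


source = generate_skeleton("scale", [("float", "in"), ("float", "out")])
print(source)
```

The "form input" here is just the kernel name, the list of device buffers and the block size. Everything else (index calculation, bounds check, grid sizing) is the part most people copy and paste from their last project anyway.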