NVIDIA Launches Jetson TK1 Mobile CUDA Development Platform

Subject: General Tech, Mobile | March 25, 2014 - 06:34 PM |
Tagged: GTC 2014, tegra k1, nvidia, CUDA, kepler, jetson tk1, development

NVIDIA recently unified its desktop and mobile GPU lineups by moving to a Kepler-based GPU in its latest Tegra K1 mobile SoC. The move to the Kepler architecture has simplified development and enabled the CUDA programming model to run on mobile devices. One of the main themes of the opening keynote earlier today was ‘CUDA everywhere,’ and NVIDIA has now officially accomplished that goal, with CUDA-compatible hardware spanning servers, desktops, tablets, and embedded devices.

Speaking of embedded devices, NVIDIA showed off a new development board called the Jetson TK1. This tiny new board features an NVIDIA Tegra K1 SoC at its heart along with 2GB of RAM and 16GB of eMMC storage. The Jetson TK1 supports a plethora of I/O options including an internal expansion port (GPIO compatible), SATA, one half-mini PCI-e slot, serial, USB 3.0, micro USB, Gigabit Ethernet, analog audio, and HDMI video output.

NVIDIA Jetson TK1 Mobile CUDA Development Board.jpg

Of course, the Tegra K1 at the heart of the board pairs a quad-core (4+1) ARM CPU with a Kepler-based GPU containing 192 CUDA cores. The SoC is rated at 326 GFLOPS (which works out to the 192 GPU cores each performing two floating point operations per clock at roughly 850 MHz), enough to enable some interesting compute workloads, including machine vision.

Computer Vision On NVIDIA CUDA.jpg

In fact, Audi has been using the Jetson TK1 development board to power its self-driving prototype car (more on that soon). Other intended uses for the new development board include robotics, medical devices, security systems, and perhaps low-power compute clusters (such as an improved Pedraforca system). It can also be used as a simple desktop platform for testing and developing mobile applications for other Tegra K1 powered devices, of course.

NVIDIA VisionWorks GTC 2014.jpg

Beyond the hardware, the Jetson TK1 comes with the CUDA toolkit, an OpenGL 4.4 driver, and the NVIDIA VisionWorks SDK, which includes programming libraries and sample code for getting machine vision applications running on the Tegra K1 SoC.
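To give a sense of what ‘CUDA everywhere’ means in practice, here is a minimal vector-add sketch of the sort the toolkit's introductory samples cover. This is generic CUDA C rather than Jetson-specific code, and the problem size is an arbitrary choice; the point is that the same kernel source runs unchanged on the TK1's 192-core Kepler GPU and on a desktop card.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread adds one pair of elements. The same CUDA C that targets a
    // desktop Kepler GPU compiles and runs on the Tegra K1's GPU as well.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;                         // arbitrary problem size
        const size_t bytes = n * sizeof(float);

        float *hA = new float[n], *hB = new float[n], *hC = new float[n];
        for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

        float *dA, *dB, *dC;
        cudaMalloc((void **)&dA, bytes);
        cudaMalloc((void **)&dB, bytes);
        cudaMalloc((void **)&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        const int threads = 256;
        const int blocks  = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

        printf("c[0] = %f\n", hC[0]);                  // expect 3.0
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        delete[] hA; delete[] hB; delete[] hC;
        return 0;
    }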

The Jetson TK1 is available for pre-order now at $192 and is slated to begin shipping in April. Interested developers can find more information on the NVIDIA developer website.

 

GTC 2014: NVIDIA Shows Off New Dual GK110 GPU GTX TITAN Z Graphics Card

Subject: General Tech | March 25, 2014 - 02:46 PM |
Tagged: gtx titan z, gtx titan, GTC 2014, CUDA

During the opening keynote, NVIDIA showed off several pieces of hardware that will be available soon. On the desktop and workstation side of things, researchers (and consumers chasing the ultra high end) have the new GTX Titan Z to look forward to. This new graphics card is a dual GK110 GPU monster that offers up 8 TeraFLOPS of number crunching performance for an equally impressive $2,999 price tag.

DSC01411.JPG

Specifically, the GTX TITAN Z is a triple-slot graphics card that marries two full GK110 (big Kepler) GPUs for a total of 5,760 CUDA cores, 448 TMUs, and 96 ROPs with 12GB of GDDR5 memory (6GB per GPU, each on its own 384-bit bus). NVIDIA has yet to release clockspeeds, but the two GPUs will run at the same clocks with a dynamic power balancing feature. For the truly adventurous, it appears possible to SLI two GTX Titan Z cards using the single SLI connector. Display outputs include two DVI, one HDMI, and one DisplayPort connector.

NVIDIA is cooling the card using a single fan and two vapor chambers. Air is drawn inwards and exhausted out of the front exhaust vents.

DSC01415.JPG

In short, the GTX Titan Z is NVIDIA's new number crunching king and should find its way into servers and workstations running big data analytics and simulations. Personally, I'm looking forward to seeing someone slap two of them into a gaming PC and watching the screen catch on fire (not really).

What do you think about the newest dual GPU flagship?

Stay tuned to PC Perspective for further GTC 2014 coverage!

Author:
Manufacturer: NVIDIA

NVIDIA Finally Gets Serious with Tegra

Tegra has had an interesting run of things.  The original Tegra was used only by Microsoft, in the Zune HD.  Tegra 2 saw better adoption, but did not produce the design wins needed to propel NVIDIA to a leadership position in cell phones and tablets.  Tegra 3 found a spot in Microsoft’s Surface, but that has turned out to be a far more bitter experience than expected.  Tegra 4 has so far been integrated into a handful of products and is being featured in NVIDIA’s upcoming Shield.  It also hit some production snags that made it later to market than expected.

I think the primary issue with the first three generations of products is pretty simple: there was a distinct lack of differentiation from the other ARM based products around.  Yes, NVIDIA brought their graphics prowess to the market, but never in a form that adequately distanced itself from the competition.  Tegra 2 boasted GeForce based graphics, but we did not find out until later that it was made up of basically four pixel shaders and four vertex shaders that had more in common with the GeForce 7800/7900 series than with any of the modern unified architectures of the time.  Tegra 3 boasted a big graphical boost, but it came in the form of doubling the pixel shader units while leaving the vertex units alone.

kepler_smx.jpg

While NVIDIA had very strong developer relations and a leg up on the competition in terms of software support, it was never enough to propel Tegra beyond a handful of devices.  NVIDIA is trying to rectify that with Tegra 4 and the 72 shader units that it contains (still divided between pixel and vertex units).  Tegra 4 is not perfect in that it is late to market and the GPU is not OpenGL ES 3.0 compliant.  ARM, Imagination Technologies, and Qualcomm are offering new graphics processing units that are not only OpenGL ES 3.0 compliant, but also offer OpenCL 1.1 support.  Tegra 4 does not support OpenCL.  In fact, it does not support NVIDIA’s in-house CUDA.  Ouch.

Jumping into a new market is not an easy thing, and invariably mistakes will be made.  NVIDIA worked hard to build a solid foundation with their products, and certainly they had to learn to walk before they could run.  Unfortunately, running effectively means earning design wins through outstanding features, performance, and power consumption, and NVIDIA was really only average in all of those areas.  NVIDIA is hoping to change that, and the product we are talking about today is their first salvo: one whose features and support are a step above the competition.

Continue reading our article on the NVIDIA Kepler architecture making its way to mobile markets and Tegra!

SECO Introduces mITX GPU Devkit for CUDA Programmers

Subject: General Tech | April 11, 2013 - 11:08 PM |
Tagged: SECO, nvidia, mini ITX, kepler, kayla, GTC 13, GTC, CUDA, arm

Last month, NVIDIA revealed its Kayla development platform, which combines a quad-core Tegra System on a Chip (SoC) with an NVIDIA Kepler GPU. Kayla will be out later this year, but that has not stopped other board makers from putting together their own solutions. One such solution that began shipping earlier this week is the mITX GPU Devkit from SECO.

The new mITX GPU Devkit is a hardware platform for developers to program CUDA applications for mobile devices, desktops, workstations, and HPC servers. It combines an NVIDIA Tegra 3 processor, 2GB of RAM, and 4GB of internal eMMC storage on a Qseven module paired with a Mini-ITX form factor motherboard. Developers can then plug their own CUDA-capable graphics card into the single PCI-E 2.0 x16 slot (which actually runs at x4 speeds). Additional storage can be added via an internal SATA connection, and cameras can be hooked up using the CIC headers.
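Since the whole point of the board is to drive a discrete CUDA card from the ARM host, the first thing most developers will do is confirm that the card in the PCIe slot is visible to the CUDA runtime. A minimal device-query sketch (generic CUDA host code, not anything SECO-specific) could look like this:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            printf("No CUDA-capable device visible from this board.\n");
            return 1;
        }

        for (int dev = 0; dev < count; ++dev) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, dev);
            // On the devkit this should report the card sitting in the x16 (x4 electrical) slot.
            printf("Device %d: %s, compute capability %d.%d, %d multiprocessors\n",
                   dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount);
        }
        return 0;
    }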

SECO mITX GPU DEVKIT.jpg

Rear IO on the mITX GPU Devkit includes:

  • 1 x Gigabit Ethernet
  • 3 x USB
  • 1 x OTG port
  • 1 x HDMI
  • 1 x Display Port
  • 3 x Analog audio
  • 2 x Serial
  • 1 x SD card slot

The SECO platform is proving to be popular for GPGPU in the server space, especially with systems like Pedraforca. The intention of using these types of platforms in servers is to save power by using a low-power ARM chip for inter-node communication and basic tasks while the real computing is done solely on the graphics cards. With Intel’s upcoming Haswell-based Xeon chips getting down to 13W TDPs though, systems like this are going to be more difficult to justify. SECO is mostly positioning this platform as a development board, however. One use in that respect is to begin optimizing GPU-accelerated code for mobile devices. With future Tegra chips set to gain CUDA-compatible GPUs, new software development and the optimization of existing GPGPU code for smartphones and tablets will become increasingly important.

SECO mITX GPU DEVKIT box.jpg

Either way, the SECO mITX GPU Devkit is available now for 349 EUR or approximately $360 (in both cases, before any taxes).

Source: SECO

NVIDIA Launches Tesla K20X Accelerator Card, Powers Titan Supercomputer

Subject: General Tech | November 12, 2012 - 03:29 AM |
Tagged: tesla, supercomputer, nvidia, k20x, HPC, CUDA, computing

Graphics card manufacturer NVIDIA launched a new Tesla K20X accelerator card today that supplants the existing K20 as the top of the line model. The new card cranks up the double and single precision floating point performance, beefs up the memory capacity and bandwidth, and brings some efficiency improvements to the supercomputer space.

NVIDIA_Tesla_K20X_K20_GPU_Accelerator.jpg

While it is not yet clear how many CUDA cores the K20X has, NVIDIA has stated that it uses the GK110 GPU and runs with 6GB of memory at 250 GB/s of bandwidth – a nice improvement over the K20’s 5GB at 208 GB/s. Both the new K20X and K20 accelerator cards are based on the company’s Kepler architecture, but NVIDIA has managed to wring out more performance from the K20X. The K20 is rated at 1.17 TFlops peak double precision and 3.52 TFlops peak single precision, while the K20X is rated at 1.31 TFlops and 3.95 TFlops.

Screenshot (363).png

The K20X manages to score 1.22 TFlops in DGEMM, which makes it almost three times faster than the previous generation Tesla M2090 accelerator based on the Fermi architecture.
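For context, DGEMM is a dense double-precision matrix multiply, and on Tesla cards it is typically reached through the cuBLAS library rather than hand-written kernels. The following is a rough sketch of such a call under arbitrary assumptions (square 4096x4096 matrices, no error checking); it is not NVIDIA's benchmark code.

    #include <vector>
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    int main()
    {
        const int n = 4096;                              // arbitrary matrix size
        const double alpha = 1.0, beta = 0.0;
        const size_t bytes = (size_t)n * n * sizeof(double);

        std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0), hC(n * n, 0.0);

        double *dA, *dB, *dC;
        cudaMalloc((void **)&dA, bytes);
        cudaMalloc((void **)&dB, bytes);
        cudaMalloc((void **)&dC, bytes);
        cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);

        // C = alpha * A * B + beta * C, the operation at the heart of the DGEMM benchmark
        cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

        cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);

        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }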

Screenshot (362).png

Aside from pure performance, NVIDIA is also touting efficiency gains with the new K20X accelerator card. When two K20X cards are paired with a 2P Sandy Bridge server, NVIDIA claims to achieve 76% efficiency versus 61% efficiency for a 2P Sandy Bridge server equipped with two previous generation M2090 accelerator cards. Additionally, NVIDIA claims its new cards enabled the Titan supercomputer to reach the #1 spot among the top 500 green supercomputers, with a rating of 2,120.16 MFLOPS/W (million floating point operations per second per watt).

Screenshot (359).png

NVIDIA claims to have already shipped 30 PFLOPS worth of GPU-accelerated computing power. Interestingly, most of that computing power is housed in the recently unveiled Titan supercomputer. This supercomputer contains 18,688 Tesla K20X (Kepler GK110) GPUs alongside 18,688 16-core AMD Opteron 6274 processors (299,008 CPU cores in total). It will consume 9 megawatts of power and is rated at a peak of 27 Petaflops, with 17.59 Petaflops sustained on the Linpack benchmark. Further, when compared to Sandy Bridge processors, the K20 series offers between 8.2 and 18.1 times more performance on several scientific applications.

Screenshot (360).png

While the Tesla cards undoubtedly use more power than CPUs, you need far fewer accelerator cards than processors to hit the same performance numbers. That is where NVIDIA is getting its power efficiency numbers from.

NVIDIA is aiming the accelerator cards at researchers and businesses doing 3D graphics, visual effects, high performance computing, climate modeling, molecular dynamics, earth science, simulations, fluid dynamics, and other such computationally intensive tasks. Using CUDA and the parallel nature of the GPU, the Tesla cards can achieve performance much higher than a CPU-only system can. NVIDIA has also engineered software features, called Hyper-Q and Dynamic Parallelism, to better parallelize workloads and keep the GPU accelerators fed with data.

It is interesting to see NVIDIA bring out a new flagship, especially another GK110 card. Systems using the K20 and the new K20X are available now with cards shipping this week and general availability later this month.

You can find the full press release below and a look at the GK110 GPU in our preview.

AnandTech also managed to get a look inside the Titan supercomputer at Oak Ridge National Laboratory, where you can see the Tesla K20X cards in action.

A very specialized but completely open source CUDA-like program for image manipulation

Subject: General Tech | May 30, 2012 - 09:11 AM |
Tagged: CUDA, open source, opengl

Hack a Day linked to a program that could be of great use for anyone who manipulates and processes images, or anyone who wants to make fractals very quickly.  Utilizing the OpenGL Shading Language, Reuben Carter developed a command line tool that processes images using NVIDIA GPUs.  As we have talked about in the past on PC Perspective, GPUs are much better at this sort of parallel processing than a traditional CPU or the CPU portion of modern processors.  Below is one obvious use of this program, the quick creation of complex fractals, but it can also process pre-existing images.  Edge detection, colour transforms, and perhaps even image recognition tasks can be completed with his software much faster than with CPU-bound image manipulation programs.  If you are in that field, or just looking to decorate your dorm room, you should grab his software via the GitHub link in the article.
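Carter's tool is built on GLSL pixel shaders, but the same one-thread-per-pixel idea is easy to sketch in CUDA, the platform this roundup keeps coming back to. The following is a rough, unoptimized Mandelbrot kernel under arbitrary assumptions (image size, viewport, and iteration count are all placeholder choices); it is not the code from the linked project.

    #include <cuda_runtime.h>

    // One thread per pixel: each thread iterates z = z*z + c for its own point
    // of the complex plane and records how quickly the orbit escapes.
    __global__ void mandelbrot(unsigned char *out, int width, int height, int maxIter)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        float cr = -2.5f + 3.5f * x / width;       // map the pixel onto the complex plane
        float ci = -1.25f + 2.5f * y / height;
        float zr = 0.0f, zi = 0.0f;
        int iter = 0;
        while (zr * zr + zi * zi < 4.0f && iter < maxIter) {
            float tmp = zr * zr - zi * zi + cr;
            zi = 2.0f * zr * zi + ci;
            zr = tmp;
            ++iter;
        }
        out[y * width + x] = (unsigned char)(255 * iter / maxIter);
    }

    int main()
    {
        const int w = 1024, h = 768, maxIter = 256;    // arbitrary choices
        unsigned char *img;
        cudaMalloc((void **)&img, w * h);

        dim3 block(16, 16);
        dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
        mandelbrot<<<grid, block>>>(img, w, h, maxIter);
        cudaDeviceSynchronize();

        // A real tool would now copy the buffer back to the host and write it to an image file.
        cudaFree(img);
        return 0;
    }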

RJC_Mandelbrot.jpg

"If you ever need to manipulate images really fast, or just want to make some pretty fractals, [Reuben] has just what you need. He developed a neat command line tool to send code to a graphics card and generate images using pixel shaders. Opposed to making these images with a CPU, a GPU processes every pixel in parallel, making image processing much faster."


Source: Hack a Day

NVIDIA Pioneers New Standard for High Performance Computing with Tesla GPUs

Subject: Shows and Expos | May 15, 2012 - 12:43 PM |
Tagged: tesla, nvidia, GTC 2012, kepler, CUDA

SAN JOSE, Calif.—GPU Technology Conference—May 15, 2012—NVIDIA today unveiled a new family of Tesla GPUs based on the revolutionary NVIDIA Kepler GPU computing architecture, which makes GPU-accelerated computing easier and more accessible for a broader range of high performance computing (HPC) scientific and technical applications.

GTC_horizontal_376_large.jpg

The new NVIDIA Tesla K10 and K20 GPUs are computing accelerators built to handle the most complex HPC problems in the world. Designed with an intense focus on high performance and extreme power efficiency, Kepler is three times as efficient as its predecessor, the NVIDIA Fermi architecture, which itself established a new standard for parallel computing when introduced two years ago.

“Fermi was a major step forward in computing,” said Bill Dally, chief scientist and senior vice president of research at NVIDIA. “It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform. Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency.”

servers-workstations-on.png

The Tesla K10 and K20 GPUs were introduced at the GPU Technology Conference (GTC), as part of a series of announcements from NVIDIA, all of which can be accessed in the GTC online press room.

NVIDIA developed a set of innovative architectural technologies that make the Kepler GPUs high performing and highly energy efficient, as well as more applicable to a wider set of developers and applications. Among the major innovations are:

  • SMX Streaming Multiprocessor – The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX’s energy efficiency was achieved by increasing its number of CUDA architecture cores by four times, while reducing the clock speed of each core, power-gating parts of the GPU when idle and maximizing the GPU area devoted to parallel-processing cores instead of control logic.
  • Dynamic Parallelism – This capability enables GPU threads to dynamically spawn new threads, allowing the GPU to adapt dynamically to the data. It greatly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods (a brief sketch of a device-side launch follows this list).
  • Hyper-Q – This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU. This dramatically increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI.
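To make the Dynamic Parallelism item above concrete, here is a hedged sketch of a device-side launch, loosely modeled on the adaptive-refinement use case the release mentions. It is not NVIDIA sample code; the kernel names and launch dimensions are placeholders, and it requires a GK110-class GPU plus compilation with nvcc -arch=sm_35 -rdc=true -lcudadevrt.

    // Child kernel: refines one coarse cell (a stand-in for adaptive mesh refinement work).
    __global__ void refineCell(int cell)
    {
        // ... per-cell refinement work would go here ...
    }

    // Parent kernel: each thread inspects its own cell and, only where the error
    // estimate demands it, launches a child grid directly from the GPU, with no
    // round trip to the CPU needed to spawn the extra work.
    __global__ void processGrid(const float *error, float threshold, int nCells)
    {
        int cell = blockIdx.x * blockDim.x + threadIdx.x;
        if (cell < nCells && error[cell] > threshold) {
            refineCell<<<1, 32>>>(cell);   // device-side launch: dynamic parallelism
        }
    }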

“We designed Kepler with an eye towards three things: performance, efficiency and accessibility,” said Jonah Alben, senior vice president of GPU Engineering and principal architect of Kepler at NVIDIA. “It represents an important milestone in GPU-accelerated computing and should foster the next wave of breakthroughs in computational research.”

NVIDIA Tesla K10 and K20 GPUs
The NVIDIA Tesla K10 GPU delivers the world’s highest throughput for signal, image and seismic processing applications. Optimized for customers in oil and gas exploration and the defense industry, a single Tesla K10 accelerator board features two GK104 Kepler GPUs that deliver an aggregate performance of 4.58 teraflops of peak single-precision floating point and 320 GB per second memory bandwidth.

The NVIDIA Tesla K20 GPU is the new flagship of the Tesla GPU product family, designed for the most computationally intensive HPC environments. Expected to be the world’s highest-performance, most energy-efficient GPU, the Tesla K20 is planned to be available in the fourth quarter of 2012.

The Tesla K20 is based on the GK110 Kepler GPU. This GPU delivers three times more double precision performance than Fermi architecture-based Tesla products, and it supports the Hyper-Q and Dynamic Parallelism capabilities. The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

“In the two years since Fermi was launched, hybrid computing has become a widely adopted way to achieve higher performance for a number of critical HPC applications,” said Earl C. Joseph, program vice president of High-Performance Computing at IDC. “Over the next two years, we expect that GPUs will be increasingly used to provide higher performance on many applications.”

Preview of CUDA 5 Parallel Programming Platform
In addition to the Kepler architecture, NVIDIA today released a preview of the CUDA 5 parallel programming platform. Available to more than 20,000 members of NVIDIA’s GPU Computing Registered Developer program, the platform will enable developers to begin exploring ways to take advantage of the new Kepler GPUs, including dynamic parallelism.

The CUDA 5 parallel programming model is planned to be widely available in the third quarter of 2012. Developers can get access to the preview release by signing up for the GPU Computing Registered Developer program on the CUDA website.

Source: NVIDIA

NVIDIA Updates CUDA: Major Release for Science Research

Subject: General Tech, Graphics Cards | January 28, 2012 - 11:53 PM |
Tagged: nvidia, gpgpu, CUDA

NVIDIA has traditionally been very interested in carving out room in the market for high-performance computing for scientific research. For a lot of workloads, having a fast and highly parallel processor saves time and money compared to having a traditional computer crunch away or having to book time with one of the world’s relatively few supercomputers. Despite the raw performance of a GPU, adequate development tools are required to turn a simulation or calculation into a functional program that executes on said GPU. NVIDIA is said to have had a strong lead with their CUDA platform for quite some time; that lead will likely continue with releases the size of this one.

MOD-9981_CUDAVisualProfiler.jpg

What does a tuned up GPU purr like? Cuda cuda cuda cuda cuda.

The most recent release, CUDA 4.1, has three main features:

  • A visual profiler that points out common mistakes and optimization opportunities and provides instructions detailing how to alter your code to increase performance
  • A new compiler based on the LLVM infrastructure, making good on NVIDIA's promise to open the CUDA platform to other architectures, both software and hardware
  • New image and signal processing functions for the NVIDIA Performance Primitives (NPP) library, relieving developers of the need to create their own versions or license a proprietary library

The three features, as NVIDIA describes them in their press release, are listed below.

New Visual Profiler - Easiest path to performance optimization
The new Visual Profiler makes it easy for developers at all experience levels to optimize their code for maximum performance. Featuring automated performance analysis and an expert guidance system that delivers step-by-step optimization suggestions, the Visual Profiler identifies application performance bottlenecks and recommends actions, with links to the optimization guides. Using the new Visual Profiler, performance bottlenecks are easily identified and actionable.

LLVM Compiler - Instant 10 percent increase in application performance
LLVM is a widely-used open-source compiler infrastructure featuring a modular design that makes it easy to add support for new programming languages and processor architectures. Using the new LLVM-based CUDA compiler, developers can achieve up to 10 percent additional performance gains on existing GPU-accelerated applications with a simple recompile. In addition, LLVM's modular design allows third-party software tool developers to provide a custom LLVM solution for non-NVIDIA processor architectures, enabling CUDA applications to run across NVIDIA GPUs, as well as those from other vendors.

New Image, Signal Processing Library Functions - "Drop-in" Acceleration with NPP Library
NVIDIA has doubled the size of its NPP library, with the addition of hundreds of new image and signal processing functions. This enables virtually any developer using image or signal processing algorithms to easily gain the benefit of GPU acceleration, with the simple addition of library calls into their application. The updated NPP library can be used for a wide variety of image and signal processing algorithms, ranging from basic filtering to advanced workflows.
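As a hedged illustration of the "drop-in" idea, blurring an 8-bit grayscale image that is already resident in GPU memory needs little more than a single library call. The primitive named here, nppiFilterBox_8u_C1R, is one of NPP's box-filter functions, but the exact name, argument order, and border handling should be checked against the documentation for the toolkit version in use.

    #include <npp.h>

    // Apply a 5x5 box blur to an 8-bit, single-channel image in device memory
    // with one NPP call, instead of writing and tuning a custom CUDA kernel.
    void boxBlur(const Npp8u *d_src, Npp8u *d_dst, int width, int height, int pitch)
    {
        NppiSize roi     = { width, height };   // region to process; in practice it should be
                                                // inset so the mask never reads outside the image
        NppiSize mask    = { 5, 5 };            // 5x5 box filter, an arbitrary choice
        NppiPoint anchor = { 2, 2 };            // center the mask on each pixel

        nppiFilterBox_8u_C1R(d_src, pitch, d_dst, pitch, roi, mask, anchor);
    }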
 

Source: NVIDIA

Give your project good CARMA, get a CUDA on ARM dev kit!

Subject: General Tech, Graphics Cards, Processors | December 20, 2011 - 01:34 AM |
Tagged: nvidia, CUDA, CARMA, capital letters, arm

Okay, so the pun was a little obvious, but NVIDIA has just announced the specifications and name of the development kit for its ARM-based GPU computing platform. The development kit will provide a way to build and test applications on a platform similar to what will be found in the Barcelona Supercomputing Centre’s upcoming GPU supercomputer, until you are ready to deploy the finished application with real data on the real machine. Such is the life of a development unit.

CARMA.jpg

Carma: What goes around, comes around... right Intel?

The development kit is quite modest in its specifications:

  • Tegra3 ARM A9 CPU
  • Quadro 1000M GPU (96 CUDA Cores)
  • 2GB system RAM, 2GB GPU RAM
  • 4x PCIe Gen1 CPU to GPU link
  • 1000Base-T networking support
  • SATA, HDMI, DisplayPort, USB.
 
While the specifications are somewhere between a high-end tablet and a modest workstation, the real story is the continued progress by NVIDIA into the High Performance Computing (HPC) market. NVIDIA seems to be certain that they are able to (ARM-)wrestle more market share from Intel and other players such as IBM on the high performance front. Many would probably speculate about NVIDIA pushing in on the home market from both ends, but I expect that creating a compelling ARM product for a desktop PC will never be the problem for NVIDIA: the problem is the lack of anything compelling for a desktop user to run on it these days.
Source: NVIDIA Blogs

CUDA been done sooner! NVIDIA open sources CUDA platform

Subject: General Tech, Graphics Cards, Processors | December 15, 2011 - 01:03 AM |
Tagged: CUDA

NVIDIA stands as the current front-runner for the “Last Year’s Best Decision, This Year” award. You may remember our coverage last June of the AMD Fusion Developer Summit; industry members such as ARM, Microsoft, and of course AMD discussed the potential of utilizing specialized processors and developing on open platforms such as OpenCL and Microsoft’s announced C++ AMP. Do you know what would have been an amazing announcement for AFDS to stomp OpenCL and C++ AMP? That NVIDIA would open up CUDA. Know what announcement missed that bus by a whole half a year? NVIDIA will open up CUDA.

GPGPU-Trail.png

Your platform pooh-pooh? Bear a CUDA.

While I just harassed NVIDIA for their timing, it might not be too late. CUDA is still a powerhouse of a GPGPU platform with substantial software support, ranging from absolute mammoths like Adobe Creative Suite down to smaller projects like KGPU. With the open sourcing of the CUDA compiler, NVIDIA is also permitting manufacturers like AMD and even Intel to support CUDA on their GPUs, x86 CPUs, and other processing units. While I am excited at this outcome, I am still somewhat confused about NVIDIA’s timing: they are just a little late to open up and crush the market, and they seem quite abrupt if they originally intended CUDA to remain a forever-proprietary computing platform.

Source: NVIDIA