93% of a GP100 at least...
NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.
NVIDIA provided a comparison table, which we added what we know about a full GP100 to:
|Tesla K40||Tesla M40||Tesla P100||Full GP100|
|GPU||GK110 (Kepler)||GM200 (Maxwell)||GP100 (Pascal)||GP100 (Pascal)|
|FP32 CUDA Cores / SM||192||128||64||64|
|FP32 CUDA Cores / GPU||2880||3072||3584||3840|
|FP64 CUDA Cores / SM||64||4||32||32|
|FP64 CUDA Cores / GPU||960||96||1792||1920|
|Base Clock||745 MHz||948 MHz||1328 MHz||TBD|
|GPU Boost Clock||810/875 MHz||1114 MHz||1480 MHz||TBD|
|Memory Interface||384-bit GDDR5||384-bit GDDR5||4096-bit HBM2||4096-bit HBM2|
|Memory Size||Up to 12 GB||Up to 24 GB||16 GB||TBD|
|L2 Cache Size||1536 KB||3072 KB||4096 KB||TBD|
|Register File Size / SM||256 KB||256 KB||256 KB||256 KB|
|Register File Size / GPU||3840 KB||6144 KB||14336 KB||15360 KB|
|TDP||235 W||250 W||300 W||TBD|
|Transistors||7.1 billion||8 billion||15.3 billion||15.3 billion|
|GPU Die Size||551 mm2||601 mm2||610 mm2||610mm2|
|Manufacturing Process||28 nm||28 nm||16 nm||16nm|
This table is designed for developers that are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm2, up from 601 mm2.
A full GP100 processor will have 60 shader modules, compared to GM200's 24, although Pascal stores half of the shaders per SM. The GP100 part that is listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPs is amazing, it's only about 20% more than what GM200 could pull off with an overclock.
Subject: Graphics Cards | March 17, 2015 - 05:47 PM | Ryan Shrout
Tagged: pascal, nvidia, gtc 2015, GTC, geforce
At the keynote of the GPU Technology Conference (GTC) today, NVIDIA CEO Jen-Hsun Huang disclosed some more updates on the roadmap for future GPU technologies.
Most of the detail was around Pascal, due in 2016, that will introduce three new features including mixed compute precision, 3D (stacked) memory, and NVLink. Mixed precision is a method of computing in FP16, allowing calculations to run much faster at lower accuracy than full single or double precision when they are not necessary. Keeping in mind that Maxwell doesn't have an implementation with full speed DP compute (today), it would seem that NVIDIA is targeting different compute tasks moving forward. Though details are short, mixed precision would likely indicate processing cores than can handle both data types.
3D memory is the ability to put memory on-die with the GPU directly to improve overall memory banwidth. The visual diagram that NVIDIA showed on stage indicated that Pascal would have 750 GB/s of bandwidth, compared to 300-350 GB/s on Maxwell today.
NVLink is a new way of connecting GPUs, improving on bandwidth by more than 5x over current implementations of PCI Express. They claim this will allow for connecting as many as 8 GPUs for deep learning performance improvements (up to 10x). What that means for gaming has yet to be discussed.
NVIDIA made some other interesting claims as well. Pascal will be more than 2x more performance per watt efficient than Maxwell, even without the three new features listed above. It will also ship (in a compute targeted product) with a 32GB memory system compared to the 12GB of memory announced on the Titan X today. Pascal will also have 4x the performance in mixed precision compute.
Subject: Graphics Cards, Shows and Expos | March 17, 2015 - 02:31 PM | Ryan Shrout
Tagged: nvidia, video, GTC, gtc 2015
NVIDIA is streaming today's keynote from the GPU Technology Conference (GTC) on Ustream, and we have the embed below for you to take part. NVIDIA CEO Jen-Hsun Huang will reveal the details about the new GeForce GTX TITAN X but there are going to be other announcements as well, including one featuring Tesla CEO Elon Musk.
Should be interesting!
Subject: General Tech | May 1, 2014 - 06:47 PM | Ken Addison
Tagged: nvidia, shield, Portal, GTC, Cake, lie
Sometimes I feel like this job just keeps getting stranger and stranger. Today is no expection.
After reciving just a tracking number, and no additional information from NVIDIA earlier this week, the mystery package finally arrived today. Upon initial inspection we had no idea what to expect.
When we opened the box, we were greeted by a polystyrene cooler with the logo of Bake Me a Wish, which only served to confuse us more.
As we opened the cooler, and the subsequent box inside of it, things started to make more sense.
Inside the box, we were greeted by a chocolate cake, accompanied by a card from NVIDIA.
As you may remember at this year's GTC Conference, NVIDIA announced that they had ported Valve's Portal to Android and would be releasing it for SHIELD. Today we were greeted with a reminder of that, and the message that we should be able to try it for ourselves.
A teaser from this year's GTC Keynote
While we can't talk about our experiences with Portal just yet, stay tuned to PC Perspective for more coverage of the NVIDIA SHIELD and Portal very soon!
Subject: General Tech | April 8, 2014 - 09:03 PM | Tim Verry
Tagged: research, nvidia, GTC, gpgpu, global impact award
During the GPU Technology Conference last month, NVIDIA introduced a new annual grant called the Global Impact Award. The grant awards $150,000 to researchers using NVIDIA GPUs to research issues with worldwide impact such as disease research, drug design, medical imaging, genome mapping, urban planning, and other "complex social and scientific problems."
NVIDIA will be presenting the Global Impact Award to the winning researcher or non-profit institution at next year's GPU Technology Conference (GTC 2015). Individual researchers, universities, and non-profit research institutions that are using GPUs as a significant enabling technology in their research are eligible for the grant. Both third party and self-nomiations (.doc form) are accepted with the nominated candidates being evaluated based on several factors including the level of innovation, social impact, and current state of the research and its effectiveness in approaching the problem. Submissions for nominations are due by December 12, 2014 with the finalists being announced by NVIDIA on March 13, 2015. NVIDIA will then reveal the winner of the $150,000 grant at GTC 2015 (April 28, 2015).
The researcher, university, or non-profit firm can be located anywhere in the world, and the grant money can be assigned to a department, initiative, or a single project. The massively parallel nature of modern GPUs makes them ideal for many times of research with scalable projects, and I think the Global Impact Award is a welcome incentive to encourage the use of GPGPU in applicable research projects. I am interested to see what the winner will do with the money and where the research leads.
More information on the Global Impact Award can be found on the NVIDIA website.
Subject: General Tech | March 27, 2014 - 06:42 PM | Ken Addison
Tagged: W9100, video, titan z, poseidon 780, podcast, Oculus, nvidia, GTC, GDC
PC Perspective Podcast #293 - 03/27/2014
Join us this week as we discuss the NVIDIA Titan-Z, ASUS ROG Poseidon 780, News from OculusVR and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath and Allyn Malventano
Week in Review:
0:37:07 This podcast is brought to you by Coolermaster, and the CM Storm Pulse-R Gaming Headset
News items of interest:
Hardware/Software Picks of the Week:
Josh: Certainly not a Skype Connection to the Studio
Allyn: Continuous ink conversions
Subject: General Tech | March 27, 2014 - 05:10 PM | Jeremy Hellstrom
Tagged: pascal, nvlink, nvidia, maxwell, jen-hsun huang, GTC
Before we get to see Volta in action NVIDIA is taking a half step and releasing the Pascal architecture which will use Maxwell-like Streaming Multiprocessors and will introduce stacked or 3D memory which will reside on the same substrate as the GPU. Jen-Hsun claimed this new type of memory will vastly increase the bandwidth available, provide two and a half times the capacity and be four times as energy efficient at the same time. Along with the 3D memory announcement was the revealing of NVLink, an alternative interconnect which he claims will offer 5-12 times the bandwidth of PCIe and will be utilized by HPC systems. From his announcement that NVLink will feature eight 20Gbps lanes per block or as NVIDIA is calling them, bricks, which The Tech Report used to make a quick calculation and came up with an aggregate bandwidth of a brick of around 20GB/s. Read on to see what else was revealed.
"Today during his opening keynote at the Nvidia GPU Technology Conference, CEO Jen-Hsun Huang offered an update to Nvidia's GPU roadmap. The big reveal was about a GPU code-named Pascal, which will be a generation beyond the still-being-introduced Maxwell architecture in the firm's plans."
Here is some more Tech News from around the web:
- Nvidia, VMware join to pipe high-quality 3D graphics from the cloud @ The Register
- Android has 97 Percent of Mobile Malware, But Nearly None in the U.S. @ DailyTech
- Amazon HALVES cloud storage prices after Google's shock slash @ The Register
- Bitcoin mining malware hits Android @ The Inquirer
- Facebook Oculus VR buy causes rift with developers and tech fans @ The Inquirer
- iSAW EXtreme Action Camera @ Kitguru
- Netgear VueZone VZSX2800 Wireless Surveillance Camera Kit @ eTeknix
Subject: General Tech | April 12, 2013 - 06:08 AM | Tim Verry
Tagged: SECO, nvidia, mini ITX, kepler, kayla, GTC 13, GTC, CUDA, arm
Last month, NVIDIA revealed its Kayla development platform that combines a quad core Tegra System on a Chip (SoC) with a NVIDIA Kepler GPU. Kayla will out later this year, but that has not stopped other board makers from putting together their own solutions. One such solution that began shipping earlier this week is the mITX GPU Devkit from SECO.
The new mITX GPU Devkit is a hardware platform for developers to program CUDA applications for mobile devices, desktops, workstations, and HPC servers. It combines a NVIDIA Tegra 3 processor, 2GB of RAM, and 4GB of internal storage (eMMC) on a Qseven module with a Mini-ITX form factor motherboard. Developers can then plug their own CUDA-capable graphics card into the single PCI-E 2.0 x16 slot (which actually runs at x4 speeds). Additional storage can be added via an internal SATA connection, and cameras can be hooked up using the CIC headers.
Rear IO on the mITX GPU Devkit includes:
- 1 x Gigabit Ethernet
- 3 x USB
- 1 x OTG port
- 1 x HDMI
- 1 x Display Port
- 3 x Analog audio
- 2 x Serial
- 1 x SD card slot
The SECO platform is a proving to be popular for GPGPU in the server space, especially with systems like Pedraforca. The intention of using these types of platforms in servers is to save power by using a low power ARM chip for inter-node communication and basic tasks while the real computing is done solely on the graphics cards. With Intel’s upcoming Haswell-based Xeon chips getting down to 13W TPDs though, systems like this are going to be more difficult to justify. SECO is mostly positioning this platform as a development board, however. One use in that respect is to begin optimizing GPU-accelerated code for mobile devices. With future Tegra chips to get CUDA-compatible graphics cards, new software development and optimization of existing GPGPU code for smartphones and tablet will be increasingly important.
Either way, the SECO mITX GPU Devkit is available now for 349 EUR or approximately $360 (in both cases, before any taxes).
Subject: General Tech | April 1, 2013 - 12:43 AM | Tim Verry
Tagged: nvidia, lenovo yoga, GTC 2013, GTC, gesture control, eyesight, ECS
During the Emerging Companies Summit at NVIDIA's GPU Technology Conference, Israeli company EyeSight Mobile Technologies' CEO Gideon Shmuel took the stage to discuss the future of its gesture recognition software. He also provided insight into how EyeSight plans to use graphics cards to improve and accelerate the process of identifying, and responding to, finger and hand movements along with face detection.
EyeSight is a five year old company that has developed gesture recognition software that can be installed on existing machines (though it appears to be aimed more at OEMs than directly to consumers). It can use standard cameras, such as webcams, to get its 2D input data and then gets a relative Z-axis from proprietary algorithms. This gives EyeSight essentially 2.5D of input data, and camera resolution and frame rate permitting, allows the software to identify and track finger and hand movements. EyeSight CEO Gideon Shmuel stated at the ECS presentation that the software is currently capable of "finger-level accuracy" at 5 meters from a TV.
Gestures include the ability to use your fingers as a mouse to point at on-screen objects, waving your hand to turn pages, scrolling, and even give hand signal cues.
The software is not open source, and there are no plans to move in that direction. The company has 15 patents pending on its technology, several of which it managed to file before the US Patent Office changed from First to Invent to First Inventor to File (heh, which is another article...). The software will support up to 20 million hardware devices in 2013, and EyeSight expects the number of compatible camera-packing devices to increase further to as many as 3.5 billion in 2015. Other features include the ability transparently map EyeSight input to Android apps without user's needing to muck with settings, and the ability to detect faces and "emotional signals" even in low light. According to the website, SDKs are available for Windows, Linux, and Android. The software maps the gestures it recognizes to Windows shortcuts, to increase compatibility with many existing applications (so long as they support keyboard shortcuts).
Currently, the EyeSight software is mostly run on the CPU, but the company is heavily investing into incorporating GPU support. Moving the processing to GPUs will allow the software to run faster and more power efficiently, especially on mobile devices (NVIDIA's Tegra platform was specifically mentioned). EyeSight's future road-map includes using GPU acceleration to bolster the number of supported gestures, move image processing to the GPUs, add velocity and vector control inputs, incorporate a better low-light filter (which will run on the GPU), and offload processing from the CPU to optimize power management and save CPU resources for the OS and other applications which is especially important for mobile devices. Gideon Shmuel also stated that he wants to see the technology being used on "anything with a display" from your smartphone to your air conditioner.
A basic version of the EyeSight input technology reportedly comes installed on the Lenovo Yoga convertible tablet. I think this software has potential, and would provide that Minority Report-like interaction that many enthusiasts wish for. Hopefully, EyeSight can deliver on its claimed accuracy figures and OEMs will embrace the technology by integrating it into future devices.
EyeSight has posted additional video demos and information about its touch-free technology on its website.
Do you think this "touch-free" gesture technology has merit, or will this type of input remain limited to awkward-integration in console games?
Subject: General Tech, Graphics Cards | March 20, 2013 - 05:47 PM | Tim Verry
Tagged: tesla, tegra 3, supercomputer, pedraforca, nvidia, GTC 2013, GTC, graphics cards, data centers
There is a lot of talk about heterogeneous computing at GTC, in the sense of adding graphics cards to servers. If you have HPC workloads that can benefit from GPU parallelism, adding GPUs gives you computing performance in less physical space, and using less power, than a CPU only cluster (for equivalent TFLOPS).
However, there was a session at GTC that actually took things to the opposite extreme. Instead of a CPU only cluster or a mixed cluster, Alex Ramirez (leader of Heterogeneous Architectures Group at Barcelona Supercomputing Center) is proposing a homogeneous GPU cluster called Pedraforca.
Pedraforca V2 combines NVIDIA Tesla GPUs with low power ARM processors. Each node is comprised of the following components:
- 1 x Mini-ITX carrier board
1 x Q7 module (which hosts the ARM SoC and memory)
- Current config is one Tegra 3 @ 1.3GHz and 2GB DDR2
- 1 x NVIDIA Tesla K20 accelerator card (1170 GFLOPS)
- 1 x InfiniBand 40Gb/s card (via Mellanox ConnectX-3 slot)
- 1 x 2.5" SSD (SATA 3 MLC, 250GB)
The ARM processor is used solely for booting the system and facilitating GPU communication between nodes. It is not intended to be used for computing. According to Dr. Ramirez, in situations where running code on a CPU would be faster, it would be best to have a small number of Intel Xeon powered nodes to do the CPU-favorable computing, and then offload the parallel workloads to the GPU cluster over the InfiniBand connection (though this is less than ideal, Pedraforca would be most-efficient with data-sets that can be processed solely on the Tesla cards).
While Pedraforca is not necessarily locked to NVIDIA's Tegra hardware, it is currently the only SoC that meets their needs. The system requires the ARM chip to have PCI-E support. The Tegra 3 SoC has four PCI-E lanes, so the carrier board is using two PLX chips to allow the Tesla and InfiniBand cards to both be connected.
The researcher stated that he is also looking forward to using NVIDIA's upcoming Logan processor in the Pedraforca cluster. It will reportedly be possible to upgrade existing Pedraforca clusters with the new chips by replacing the existing (Tegra 3) Q7 module with one that has the Logan SoC when it is released.
Pedraforca V2 has an initial cluster size of 64 nodes. While the speaker was reluctant to provide TFLOPS performance numbers, as it would depend on the workload, with 64 Telsa K20 cards, it should provide respectable performance. The intent of the cluster is to save power costs by using a low power CPU. If your sever kernel and applications can run on GPUs alone, there are noticeable power savings to be had by switching from a ~100W Intel Xeon chip to a lower-power (approximately 2-3W) Tegra 3 processor. If you have a kernel that needs to run on a CPU, it is recommended to run the OS on an Intel server and transfer just the GPU work to the Pedraforca cluster. Each Pedraforca node is reportedly under 300W, with the Tesla card being the majority of that figure. Despite the limitations, and niche nature of the workloads and software necessary to get the full power-saving benefits, Pedraforca is certainly an interesting take on a homogeneous server cluster!
In another session relating to the path to exascale computing, power use in data centers was listed as one of the biggest hurdles to getting to Exaflop-levels of performance, and while Pedraforca is not the answer to Exascale, it should at least be a useful learning experience at wringing the most parallelism out of code and pushing GPGPU to the limits. And that research will help other clusters use the GPUs more efficiently as researchers explore the future of computing.
The Pedraforca project built upon research conducted on Tibidabo, a multi-core ARM CPU cluster, and CARMA (CUDA on ARM development kit) which is a Tegra SoC paired with an NVIDIA Quadro card. The two slides below show CARMA benchmarks and a Tibidabo cluster (click on image for larger version).
Stay tuned to PC Perspective for more GTC 2013 coverage!