NVIDIA Announces Tesla V100 with Volta GPU at GTC 2017

Subject: Graphics Cards | May 10, 2017 - 01:32 PM |
Tagged: v100, tesla, nvidia, gv100, gtc 2017

During the opening keynote to NVIDIA’s GPU Technology Conference, CEO Jen-Hsun Huang formally unveiled the latest GPU architecture and the first product based on it. The Tesla V100 accelerator is based on the Volta GPU architecture and features some amazingly impressive specifications. Let’s take a look.

  Tesla V100 GTX 1080 Ti Titan X (Pascal) GTX 1080 GTX 980 Ti TITAN X GTX 980 R9 Fury X R9 Fury
GPU GV100 GP102 GP102 GP104 GM200 GM200 GM204 Fiji XT Fiji Pro
GPU Cores 5120 3584 3584 2560 2816 3072 2048 4096 3584
Base Clock - 1480 MHz 1417 MHz 1607 MHz 1000 MHz 1000 MHz 1126 MHz 1050 MHz 1000 MHz
Boost Clock 1455 MHz 1582 MHz 1480 MHz 1733 MHz 1076 MHz 1089 MHz 1216 MHz - -
Texture Units 320 224 224 160 176 192 128 256 224
ROP Units 128 (?) 88 96 64 96 96 64 64 64
Memory 16GB 11GB 12GB 8GB 6GB 12GB 4GB 4GB 4GB
Memory Clock 878 MHz (?) 11000 MHz 10000 MHz 10000 MHz 7000 MHz 7000 MHz 7000 MHz 500 MHz 500 MHz
Memory Interface 4096-bit (HBM2) 352-bit 384-bit G5X 256-bit G5X 384-bit 384-bit 256-bit 4096-bit (HBM) 4096-bit (HBM)
Memory Bandwidth 900 GB/s 484 GB/s 480 GB/s 320 GB/s 336 GB/s 336 GB/s 224 GB/s 512 GB/s 512 GB/s
TDP 300 watts 250 watts 250 watts 180 watts 250 watts 250 watts 165 watts 275 watts 275 watts
Peak Compute 15 TFLOPS 10.6 TFLOPS 10.1 TFLOPS 8.2 TFLOPS 5.63 TFLOPS 6.14 TFLOPS 4.61 TFLOPS 8.60 TFLOPS 7.20 TFLOPS
Transistor Count 21.1B 12.0B 12.0B 7.2B 8.0B 8.0B 5.2B 8.9B 8.9B
Process Tech 12nm 16nm 16nm 16nm 28nm 28nm 28nm 28nm 28nm
MSRP (current) lol $699 $1,200 $599 $649 $999 $499 $649 $549

While we are low on details today, it appears that the fundamental compute units of Volta are similar to that of Pascal. The GV100 has 80 SMs with 40 TPCs and 5120 total CUDA cores, a 42% increase over the GP100 GPU used on the Tesla P100 and 42% more than the GP102 GPU used on the GeForce GTX 1080 Ti. The structure of the GPU remains the same GP100 with the CUDA cores organized as 64 single precision (FP32) per SM and 32 double precision (FP64) per SM.

image7.png

Click to Enlarge

Interestingly, NVIDIA has already told us the clock speed of this new product as well, coming in at 1455 MHz Boost, more than 100 MHz lower than the GeForce GTX 1080 Ti and 25 MHz lower than the Tesla P100.

SXM2-VoltaChipDetails.png

Click to Enlarge

Volta adds in support for a brand new compute unit though, known as Tensor Cores. With 640 of these on the GPU die, NVIDIA directly targets the neural network and deep learning fields. If this is your first time hearing about Tensor, you should read up on its influence on the hardware markets, bringing forth an open-source software library for machine learning. Google has invested in a Tensor-specific processor already, and now NVIDIA throws its hat in the ring.

Adding Tensor Cores to Volta allows the GPU to do mass processing for deep learning, on the order of a 12x improvement over Pascal’s capabilities using CUDA cores only.

07.jpg

For users interested in standard usage models, including gaming, the GV100 GPU offers 1.5x improvement in FP32 computing, up to 15 TFLOPS of theoretical performance and 7.5 TFLOPS of FP64. Other relevant specifications include 320 texture units, a 4096-bit HBM2 memory interface and 16GB of memory on-module. NVIDIA claims a memory bandwidth of 900 GB/s which works out to 878 MHz per stack.

Maybe more impressive is the transistor count: 21.1 BILLION! NVIDIA claims that this is the largest chip you can make physically with today’s technology. Considering it is being built on TSMC's 12nm FinFET technology and has an 815 mm2 die size, I see no reason to doubt them.

03.jpg

Shipping is scheduled for Q3 for Tesla V100 – at least that is when NVIDIA is promising the DXG-1 system using the chip is promised to developers.

I know many of you are interested in the gaming implications and timelines – sorry, I don’t have an answer for you yet. I will say that the bump from 10.6 TFLOPS to 15 TFLOPS is an impressive boost! But if the server variant of Volta isn’t due until Q3 of this year, I find it hard to think NVIDIA would bring the consumer version out faster than that. And whether or not NVIDIA offers gamers the chip with non-HBM2 memory is still a question mark for me and could directly impact performance and timing.

More soon!!

Source: NVIDIA

Tencent Purchases 5% of Tesla Motors

Subject: General Tech | March 29, 2017 - 03:04 AM |
Tagged: tesla, tencent

Five percent of Tesla Motors has just been purchased by Tencent Holdings Limited. For our audience, this could be interesting in two ways. First, Tesla Motors is currently home to Jim Keller, who designed several CPUs architectures at AMD and Apple, including AMD’s K8, Apple’s A4 and A5, and AMD’s recent Zen. Second, Tencent has been purchasing minority chunks of several companies, including almost half of Epic Games, five percent of Activision Blizzard, and a few others, but the move into automotive technologies is somewhat new for them.

tesla-2016-logo2.png

From Tesla’s perspective, Tencent could be strong leverage into the Chinese market. In fact, Elon Musk tweeted to Bloomberg Business that they are glad to have Tencent “as an investor and advisor. Clearly, this means that they consider Tencent to be, in some fashion, an adviser for the company.

Personally, I’m curious how Tencent will affect the energy side of the company, including their subsidiary, SolarCity. I don’t really have anything to base this on, since it’s just as “out of left field” for Tencent as automotive technologies, but it’s something I’ll be occasionally glancing at none-the-less.

Source: Ars Technica

Tesla stores your Owner Authentication token in plain text ... which leads to a bad Ashton Kutcher movie

Subject: General Tech | November 25, 2016 - 12:52 PM |
Tagged: Android, Malware, hack, tesla, security

You might expect better from Tesla and Elon Musk but apparently you would be dissappointed as the OAuth token in your cars mobile app is stored in plain text.  The token is used to control your Tesla and is generated when you enter in your username and password.  It is good for 90 days, after which it requires you to log in again so a new token can be created.  Unfortunately, since that token is stored as plain text, someone who gains access to your Android phone can use that token to open your cars doors, start the engine and drive away.  Getting an Android user to install a malicious app which would allow someone to take over their device has proven depressingly easy.  Comments on Slashdot suggest it is unreasonable to blame Tesla for security issues in your devices OS, which is hard to argue; on the other hand it is impossible for Telsa to defend choosing to store your OAuth in plain text.

images.jpg

"By leveraging security flaws in the Tesla Android app, an attacker can steal Tesla cars. The only hard part is tricking Tesla owners into installing an Android app on their phones, which isn't that difficult according to a demo video from Norwegian firm Promon. This malicious app can use many of the freely available Android rooting exploits to take over the user's phone, steal the OAuth token from the Tesla app and the user's login credentials."

Here is some more Tech News from around the web:

Tech Talk

Source: Slashdot

Podcast #417 - Maximus VIII Forumla, MoCA adapters, GFE logins and more!!

Subject: General Tech | September 15, 2016 - 01:58 PM |
Tagged: VR, video, tesla, Silverstone, podcast, nvidia, msi, MoCA, Maximus VIII Formula, MasterLiquid, holodeck, GFE, geforce experience, euclideon, cooler master, asus, actiontec

PC Perspective Podcast #417 - 09/15/16

Join us this week as we discuss the Maximus VIII Forumla, MoCA adapters, GFE logins and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts:  Ryan Shrout, Allyn Malventano, Josh Walrath and Jeremy Hellstrom

Program length: 1:36:39
  1. Week in Review:
  2. This episode is brought to you by Casper! (Use code “pcper”)
  3. News items of interest:
  4. Hardware/Software Picks of the Week
  5. Closing/outro

Eight is enough, looking at how the new Telsa HPC cards from NVIDIA will work

Subject: General Tech | September 14, 2016 - 01:06 PM |
Tagged: pascal, tesla, p40, p4, nvidia, neural net, m40, M4, HPC

The Register have package a nice explanation of the basics of how neural nets work in their quick look at NVIDIA's new Pascal based HPC cards, the P4 and P40.  The tired joke about Zilog or Dick Van Patten stems from the research which has shown that 8-bit precision is most effective when feeding data into a neural net.  Using 16 or 32-bit values slows the processing down significantly while adding little precision to the results produced.  NVIDIA is also perfecting a hybrid mode, where you can opt for a less precise answer produced by your local, presumably limited, hardware or you can upload the data to the cloud for the full treatment.  This is great for those with security concerns or when a quicker answer is more valuable than a more accurate one.

As for the hardware, NVIDIA claims the optimizations on the P40 will make it "40 times more efficient" than an Intel Xeon E5 CPU and it will also provide slightly more throughput than the currently available Titan X.  You can expect to see these arrive in the market sometime over then next two months.

newtesla.PNG

"Nvidia has designed a couple of new Tesla processors for AI applications – the P4 and the P40 – and is talking up their 8-bit math performance. The 16nm FinFET GPUs use Nv's Pascal architecture and follow on from the P100 launched in June. The P4 fits on a half-height, half-length PCIe card for scale-out servers, while the beefier P40 has its eyes set on scale-up boxes."

Here is some more Tech News from around the web:

Tech Talk

Source: The Register

NVIDIA Announces PCIe Versions of Tesla P100

Subject: Graphics Cards | June 20, 2016 - 01:57 PM |
Tagged: tesla, pascal, nvidia, GP100

GP100, the “Big Pascal” chip that was announced at GTC, will be coming to PCIe for enterprise and supercomputer customers in Q4 2016. Previously, it was only announced using NVIDIA's proprietary connection. In fact, they also gave themselves some lead time with their first-party DGX-1 system, which retails for $129,000 USD, although we expect that was more for yield reasons. Josh calculated that each GPU in that system is worth more than the full wafer that its die was manufactured on.

nvidia-2016-gp100tesla.jpg

This brings us to the PCIe versions. Interestingly, they have been down-binned from the NVLink version. The boost clock has been dropped to 1300 MHz, from 1480 MHz, although that is matched with a slightly lower TDP (250W versus the NVLink's 300W). This lowers the FP16 performance to 18.7 TFLOPs, down from 21.2, FP32 performance to 9.3 TFLOPs, down from 10.6, and FP64 performance to 4.7 TFLOPs, down from 5.3. This is where we get to the question: did NVIDIA reduce the clocks to hit a 250W TDP and be compatible with the passive cooling technology that previous Tesla cards utilize, or were the clocks dropped to increase yield?

They are also providing a 12GB version of the PCIe Tesla P100. I didn't realize that GPU vendors could selectively disable HBM2 stacks, but NVIDIA disabled 4GB of memory, which also dropped the bus width to 3072-bit. You would think that the simplicity of the circuit would want to divide work in a power-of-two fashion, but, knowing that they can, it makes me wonder why they did. Again, my first reaction is to question GP100 yield, but you wouldn't think that HBM, being such a small part of the die, is something that they can reclaim a lot of chips by disabling a chunk, right? That is, unless the HBM2 stacks themselves have yield issues -- which would be interesting.

There is also still no word on a 32GB version. Samsung claimed the memory technology, 8GB stacks of HBM2, would be ready for products in Q4 2016 or early 2017. We'll need to wait and see where, when, and why it will appear.

Source: NVIDIA
Manufacturer: NVIDIA

First, Some Background

 
TL;DR:
NVIDIA's Rumored GP102
 
Based on two rumors, NVIDIA seems to be planning a new GPU, called GP102, that sits between GP100 and GP104. This changes how their product stack flowed since Fermi and Kepler. GP102's performance, both single-precision and double-precision, will likely signal NVIDIA's product plans going forward.
  • - GP100's ideal 1 : 2 : 4 FP64 : FP32 : FP16 ratio is inefficient for gaming
  • - GP102 either extends GP104's gaming lead or bridges GP104 and GP100
  • - If GP102 is a bigger GP104, the future is unclear for smaller GPGPU devs
    • This is, unless GP100 can be significantly up-clocked for gaming.
  • - If GP102 matches (or outperforms) GP100 in gaming, and has better than 1 : 32 double-precision performance, then GP100 would be the first time that NVIDIA designed an enterprise-only, high-end GPU.
 

 

When GP100 was announced, Josh and I were discussing, internally, how it would make sense in the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a dash of salt, that claimed NVIDIA was planning a second architecture, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claims to have been contacted by the developers of AIDA64, seemingly on-the-record, also citing a GP102 design.

I will retell chunks of the rumor, but also add my opinion to it.

nvidia-titan-black-1.jpg

In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive performance for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these were available in Tesla, Quadro, and GeForce cards, especially Titans.

Maxwell was interesting, though. NVIDIA was unable to leave 28nm, which Kepler launched on, so they created a second architecture at that node. To increase performance without having access to more feature density, you need to make your designs bigger, more optimized, or more simple. GM200 was giant and optimized, but, to get the performance levels it achieved, also needed to be more simple. Something needed to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.

Fast-forward to Pascal.

Manufacturer: NVIDIA

93% of a GP100 at least...

NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.

nvidia-2016-gtc-pascal-banner.png

NVIDIA provided a comparison table, which we added what we know about a full GP100 to:

  Tesla K40 Tesla M40 Tesla P100 Full GP100
GPU GK110 (Kepler) GM200 (Maxwell) GP100 (Pascal) GP100 (Pascal)
SMs 15 24 56 60
TPCs 15 24 28 (30?)
FP32 CUDA Cores / SM 192 128 64 64
FP32 CUDA Cores / GPU 2880 3072 3584 3840
FP64 CUDA Cores / SM 64 4 32 32
FP64 CUDA Cores / GPU 960 96 1792 1920
Base Clock 745 MHz 948 MHz 1328 MHz TBD
GPU Boost Clock 810/875 MHz 1114 MHz 1480 MHz TBD
FP64 GFLOPS 1680 213 5304 TBD
Texture Units 240 192 224 240
Memory Interface 384-bit GDDR5 384-bit GDDR5 4096-bit HBM2 4096-bit HBM2
Memory Size Up to 12 GB Up to 24 GB 16 GB TBD
L2 Cache Size 1536 KB 3072 KB 4096 KB TBD
Register File Size / SM 256 KB 256 KB 256 KB 256 KB
Register File Size / GPU 3840 KB 6144 KB 14336 KB 15360 KB
TDP 235 W 250 W 300 W TBD
Transistors 7.1 billion 8 billion 15.3 billion 15.3 billion
GPU Die Size 551 mm2 601 mm2 610 mm2 610mm2
Manufacturing Process 28 nm 28 nm 16 nm 16nm

This table is designed for developers that are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a huge insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors, 15.3 billion, up from 8 billion with GM200, with roughly the same die area, 610 mm2, up from 601 mm2.

nvidia-2016-gp100_block_diagram-1-624x368.png

A full GP100 processor will have 60 shader modules, compared to GM200's 24, although Pascal stores half of the shaders per SM. The GP100 part that is listed in the table above is actually partially disabled, cutting off four of the sixty total. This leads to 3584 single-precision (32-bit) CUDA cores, which is up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores -- but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell, 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though 10.6 TeraFLOPs is amazing, it's only about 20% more than what GM200 could pull off with an overclock.

Continue reading our preview of the NVIDIA Pascal architecture!!

Tesla Motors Hires Peter Bannon of Apple

Subject: Graphics Cards, Processors | February 29, 2016 - 06:48 PM |
Tagged: tesla motors, tesla, SoC, Peter Bannon, Jim Keller

When we found out that Jim Keller has joined Tesla, we were a bit confused. He is highly skilled in processor design, and he moved to a company that does not design processors. Kind of weird, right? There are two possibilities that leap to mind: either he wanted to try something new in life, and Elon Musk hired him for his general management skills, or Tesla wants to get more involved in the production of their SoCs, possibly even designing their own.

tesla-2016-logo2.png

Now Peter Bannon, who was a colleague of Jim Keller at Apple, has been hired by Tesla Motors. Chances are, the both of them were not independently interested in an abrupt career change that led them to the same company. That seems highly unlikely, to say the least. So it appears that Tesla Motors wants experienced chip designers in house. What for? We don't know. This is a lot of talent to just look over the shoulders of NVIDIA and other SoC partners, to make sure they have an upper hand in negotiation. Jim Keller is at Tesla as their “Vice-President of Autopilot Hardware Engineering.” We don't know what Peter Bannon's title will be.

And then, if Tesla Motors does get into creating their own hardware, we wonder what they will do with it. The company has a history of open development and releasing patents (etc.) into the public. That said, SoC design is a highly encumbered field, depending on what they're specifically doing, which we have no idea about.

Source: Eletrek

Podcast #385 - Rise of the Tomb Raider Performance, 3x NVMe M.2 RAID-0, AMD Q1 Offerings

Subject: General Tech | February 4, 2016 - 11:53 AM |
Tagged: video, Trion 150, tesla, steam os, Samsung, rise of the tomb raider, podcast, ocz, NVMe, Jim Keller, amd, 950 PRO

PC Perspective Podcast #385 - 02/04/2016

Join us this week as we discuss Rise of the Tomb Raider performance, a triple RAID-0 NVMe array and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

Subscribe to the PC Perspective YouTube Channel for more videos, reviews and podcasts!!