GTC 2013: Jen-Hsun Huang Takes the Stage to Discuss NVIDIA's Future, New Hardware

Subject: General Tech, Graphics Cards | March 19, 2013 - 02:55 PM |
Tagged: unified virtual memory, ray tracing, nvidia, GTC 2013, grid vca, grid, graphics cards

Today, NVIDIA's CEO Jen-Hsun Huang stepped on stage to present the GTC keynote. In the presentation (which was live streamed on the GTC website and archived here), NVIDIA discussed five major points, looking back over 2013 and into the future of its mobile and professional products. In addition to the product roadmap, NVIDIA discussed the state of computer graphics and GPGPU software. Remote graphics and GPU virtualization were also on tap. Finally, towards the end of the keynote, the company revealed its first appliance, the NVIDIA GRID VCA. The culmination of NVIDIA's GRID and GPU virtualization technology, the VCA is a device that hosts up to 16 virtual machines, each of which can tap into one of 16 Kepler-based graphics processors (8 cards, 2 GPUs per card) to fully hardware accelerate software running on the VCA. Three new mobile Tegra parts and two new desktop graphics processors were also hinted at, with improvements to power efficiency and performance.

On the desktop side of things, NVIDIA's roadmap included two new GPUs. Following Kepler, NVIDIA will introduce Maxwell and Volta. Maxwell will feature a new virtualized memory technology called Unified Virtual Memory. This tech will allow both the CPU and GPU to read from a single (virtual) memory store. Much as with the promise of AMD's Kaveri APU, Unified Virtual Memory should result in speed improvements in heterogeneous applications because data will not have to be copied between the CPU and GPU before it can be processed. Server applications, which tend to shuttle large data sets back and forth, stand to benefit the most from the shared memory tech. NVIDIA did not provide details, but from the sound of it, the CPU and GPU will both continue to write to their own physical memory, with a layer of virtualized memory on top that allows the two (or more) different processors to read from each other's memory store.
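
To make the copy argument concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not anything NVIDIA presented: the function name, the bus bandwidth, the data size, and the compute time are all made-up assumptions, and it takes the best case in which shared addressing removes the copy step entirely.

```python
# Toy model of a GPU job with and without explicit CPU<->GPU copies.
# Every number here is an illustrative assumption, not an NVIDIA figure.

def job_time_seconds(data_gb, compute_s, link_gb_per_s, copies=2):
    """Total time when the data set crosses the CPU-GPU link `copies` times."""
    copy_s = copies * data_gb / link_gb_per_s
    return compute_s + copy_s

LINK_GB_PER_S = 8.0  # assumed effective bus bandwidth between CPU and GPU memory

# Copy in, compute, copy out (two trips over the link).
with_copies = job_time_seconds(data_gb=4.0, compute_s=1.0,
                               link_gb_per_s=LINK_GB_PER_S)

# Best case for shared memory: no explicit copies at all.
no_copies = job_time_seconds(data_gb=4.0, compute_s=1.0,
                             link_gb_per_s=LINK_GB_PER_S, copies=0)

print(f"with copies: {with_copies:.2f} s, without copies: {no_copies:.2f} s")
```

In this sketch the copy cost scales with the size of the data set, which is why large server workloads would see more of a difference than small ones.
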
Following Maxwell, Volta will be a physically smaller chip with more transistors (likely a smaller process node). In addition to the power efficiency improvements over Maxwell, it steps up the memory bandwidth significantly. NVIDIA will use TSV (through-silicon via) technology to physically mount the graphics DRAM chips over the GPU (attached to the same silicon substrate electrically). According to NVIDIA, this new TSV-mounted memory will achieve up to 1 terabyte per second of memory bandwidth, which is a notable increase over existing GPUs.
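
For a sense of scale, the short Python sketch below compares the quoted 1 terabyte per second against a baseline. The baseline bandwidth and the memory size are assumptions chosen for illustration (roughly in line with a high-end 2013 desktop card), not figures from the keynote.

```python
# Rough bandwidth arithmetic for the Volta claim.
# Only the 1 TB/s figure comes from the keynote; the rest is assumed.

def sweep_time_ms(memory_gb, bandwidth_gb_per_s):
    """Milliseconds needed to read `memory_gb` of memory once."""
    return memory_gb / bandwidth_gb_per_s * 1000.0

VOLTA_GB_PER_S = 1000.0    # "up to 1 terabyte per second", per NVIDIA
BASELINE_GB_PER_S = 288.0  # assumption: a current high-end desktop GPU
MEMORY_GB = 6.0            # assumption: hypothetical amount of graphics memory

speedup = VOLTA_GB_PER_S / BASELINE_GB_PER_S

print(f"baseline sweep: {sweep_time_ms(MEMORY_GB, BASELINE_GB_PER_S):.1f} ms")
print(f"Volta sweep:    {sweep_time_ms(MEMORY_GB, VOLTA_GB_PER_S):.1f} ms")
print(f"bandwidth ratio: {speedup:.2f}x")
```

Under these assumptions the stacked memory would offer roughly three and a half times the bandwidth of the baseline card.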

NVIDIA continues to pursue the mobile market with its line of Tegra chips that pair an ARM CPU, NVIDIA GPU, and SDR modem. Two new mobile chips called Logan and Parker will follow Tegra 4. Both new chips will support the full CUDA 5 stack and OpenGL 4.3 out of the box. Logan will feature a Kepler-based graphics processor on the chip that can do “everything a modern computer ought to do” according to NVIDIA. Parker will have a yet-to-be-revealed graphics processor (a Kepler successor) and will utilize 3D FinFET transistors. It will have a greater number of transistors in a smaller package than previous Tegra parts (it will be about the size of a dime), and NVIDIA also plans to ramp up the frequency to wrangle more performance out of the mobile chip. NVIDIA has stated that Logan silicon should be completed towards the end of 2013, with the mobile chips entering production in 2014.

Interestingly, Logan has a sister chip that NVIDIA is calling Kayla. This mobile chip is capable of running ray tracing applications and features OpenGL geometry shaders. It can support GPGPU code and will be compatible with Linux.

NVIDIA has been pushing CUDA for several years now. The company has seen respectable adoption, growing from 1 Tesla supercomputer in 2008 to its graphics cards being used in 50 supercomputers, with 500 million CUDA processors on the market. There are now allegedly 640 universities working with CUDA and 37,000 academic papers on CUDA.

Finally, NVIDIA's hinted-at new product announcement was the NVIDIA VCA, which is a GPU virtualization appliance that hooks into the network and can deliver up to 16 virtual machines running independent applications. These GPU-accelerated workspaces can be presented to thin clients over the network by installing the GRID client software on users' workstations. The specifications of the GRID VCA are rather impressive, as well.

The GRID VCA features:

  • 2 x Intel Xeon processors with 16 threads each (32 total threads)
  • 192GB to 384GB of system memory
  • 8 Kepler-based graphics cards, with two GPUs each (16 total GPUs)
  • 16 x GPU-accelerated virtual machines

The GRID VCA fits into a 4U case. It can deliver remote graphics to workstations, and is allegedly fast enough that GPU-accelerated software feels equivalent to running on the local machine (at least over LAN). The GRID Visual Computing Appliance will come in two flavors at different price points. The first will have 8 Kepler GPUs with 4GB of memory each, 16 CPU threads, and 192GB of system memory for $24,900. The other version will cost $34,900 and features 16 Kepler GPUs (4GB of memory each), 32 CPU threads, and 384GB of system memory. On top of the hardware cost, NVIDIA is also charging licensing fees. While both GRID VCA devices can support unlimited devices, the licenses cost $2,400 and $4,800 per year respectively.
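
To compare the two flavors, the quoted prices can be folded into a quick cost-of-ownership calculation. The Python sketch below uses only the hardware and license figures above; the class name, the configuration labels, and the three-year window are assumptions made for illustration.

```python
# Cost comparison of the two GRID VCA configurations.
# Prices and license fees are the ones quoted above; the labels and the
# three-year ownership window are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class VcaConfig:
    name: str
    gpus: int
    hardware_usd: int
    license_usd_per_year: int

    def total_cost(self, years):
        """Hardware price plus license fees over `years` years."""
        return self.hardware_usd + self.license_usd_per_year * years

    def cost_per_gpu(self, years):
        """Total cost divided across the GPUs in the box."""
        return self.total_cost(years) / self.gpus

configs = [
    VcaConfig("8-GPU model", gpus=8, hardware_usd=24_900, license_usd_per_year=2_400),
    VcaConfig("16-GPU model", gpus=16, hardware_usd=34_900, license_usd_per_year=4_800),
]

YEARS = 3  # assumed ownership period
for cfg in configs:
    print(f"{cfg.name}: ${cfg.total_cost(YEARS):,} total, "
          f"${cfg.cost_per_gpu(YEARS):,.2f} per GPU over {YEARS} years")
```

Over the assumed three years, the larger box works out cheaper per GPU even though its license fee is double.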

Overall, it was an interesting keynote, and the proposed graphics cards look to be offering up some unique and necessary features that should help hasten the day of ubiquitous general purpose GPU computing. The Unified Virtual Memory was something I was not expecting, and it will be interesting to see how AMD responds. AMD is already promising shared memory in its Kaveri APU, but I am interested to see the details of how NVIDIA and AMD will accomplish shared memory with dedicated graphics cards (and whether CrossFire/SLI setups will all have a single shared memory pool).

Stay tuned to PC Perspective for more GTC 2013 Coverage!

March 19, 2013 | 04:06 PM - Posted by John Doe (not verified)

I'm sick of this complete and utterly arrogant and pompous Asian asshole and his BS.

Why does he ALWAYS wear leather jackets or tight t-shirts to show off his body like a fag?

He even has a GeForce tattoo on his forearm which pops out when he flexes his biceps... sigh.

March 19, 2013 | 04:54 PM - Posted by Ryan Shrout

Hey John, please leave our site.


March 19, 2013 | 05:02 PM - Posted by Matt (not verified)

Ryan you are awesome.

March 19, 2013 | 05:18 PM - Posted by John Doe (not verified)

You're better off putting an IP block on me.

Then again, I got all the proxies in the World so that can't stop me either.

In every single case, I'm the winner here.

March 19, 2013 | 05:23 PM - Posted by Daniel Masterson (not verified)

Hahaha ya Ryan thanks, this jonh D. guy is hilarious but doesn't really add to the conversation.

March 19, 2013 | 05:27 PM - Posted by John Doe (not verified)

I add more to "the conversation" than anyone else here do...

March 19, 2013 | 05:14 PM - Posted by Sonic4Spuds

Thanks for the article, definitely some interesting tech to think about.

Personally I am really interested in the mobile processor that is capable of accelerating ray tracing.

March 19, 2013 | 05:25 PM - Posted by xbeaTX (not verified)

Ryan, I would like to know if you have any idea how the "unified virtual memory" implemented on Maxwell can be useful for "GeForce" users... thanks! :)

March 19, 2013 | 05:58 PM - Posted by John Doe (not verified)

Ryan doesn't know enough to give a reply to that.

I personally would have, if only I was a GPU expert.

I worked at Topower and all I really expertise at is PSU's unfortunately.

March 19, 2013 | 06:16 PM - Posted by Tim Verry

You mean consumer level? Hmm, well you'd need the software to take advantage of it, but there will be small speed/performance gains because the step of copying data from CPU-accessible memory to GPU-accessible memory is taken out of the equation. With UVM, they can each read each other's memory. NV was extremely scarce on the details of how this specifically works. Consumer level apps will not see nearly the same potential speedups as datacenter workloads since consumer level apps have much smaller datasets to copy back and forth. In the datacenter/HPC environment that copy process can be a big performance detriment as there is a lot of data to shuttle back and forth.


Like I said though, there are not many details on how it works yet, so it's hard to give any sort of estimate of just how much faster it will be. It'll likely make your system faster, and it's the direction the industry is heading in, but gamers will get Maxwell for gaming prowess and not so much for UVM; it will just be a nice feature to have on the consumer side. HPC will be where this kind of tech, on both the NV and AMD side, will have some sway in the buying decisions of what hardware labs and datacenters buy though.

Hope it helps answer your question, at least in part.

March 19, 2013 | 05:48 PM - Posted by MarkT (not verified)

Maxwell = SLI focus?

March 19, 2013 | 05:56 PM - Posted by John Doe (not verified)

That would be great though, I personally would say, SLi depends on a TON of things.

GPU's, drivers, games and the list goes on.

I've seen around %100 scaling on 285's and extremely shitty, laggy performance on a 9800GX2 due to the lack of vRAM (512MB per GPU). So really, there are so many things to consider before expecting such thing.

March 20, 2013 | 12:38 PM - Posted by aparsh335i (not verified)

This guy is a troll.
Don't listen to anything he says.

March 20, 2013 | 01:20 PM - Posted by John Doe (not verified)

You're a fucking moron.

Go to hell.

March 22, 2013 | 06:37 PM - Posted by Daniel Masterson (not verified)

Oh we don't believe me. He is becoming the butt of jokes on this site. I showed my friend how big of a troll this guy is and he couldn't stop laughing. Also he doesn't show his real name which I find hilarious!

March 23, 2013 | 07:04 AM - Posted by John Doe (not verified)

There will come a time you'll realize to differ facts from the fiction.

Appereantly I'm twice your age, and have spent WAY more time on tech sites than you've done, which makes me see things differently.

Who in the hell has the balls to put in his real name AND stand against an entire tech site? Nobody. I bet my fucking ass nobody on this planet would put his real name to pull the things I've been pulling off.

I consistently get attacked by people over the net, tech site owners all the time.

That should tell you something.

Or not, since you and your friend are both half my age.

Oh well.

March 23, 2013 | 10:51 PM - Posted by Daniel Masterson (not verified)

The fact that you are 54 (since you are twice my age apparently and have some weird fetish with age) and you act like this on tech sites is very sad sir. There would be no need to defend anything if you just added something positive to this site but you don't and we all feel sorry for you. Voyeurism is a sign of a serious disorder and you are textbook.

March 24, 2013 | 05:54 AM - Posted by John Doe (not verified)

Really? What the hell do I care about what you think? Daniel? You're an Internet person I'll never see. I'll just tell you to kiss off here. And if I saw you, I'd tell you to kiss off too.

I'm considering to buy a T-bird 62 BTW.

Just FYI. :)

March 24, 2013 | 11:58 AM - Posted by Daniel Masterson (not verified)

You care enough to reply. :)

March 19, 2013 | 08:41 PM - Posted by kukreknecmi (not verified)

Without access to the x86 memory space, the UVM feature mentioned won't work the way it is anticipated here. Nvidia needs some kind of deep x86 integration at the hardware level to achieve what AMD wants to do with a unified x86 virtual address space. It is not that valuable at the moment, since only an APU kind of device can take full advantage of it (and it takes a hit by not being as fast as a discrete GPU).

For AMD GCN, the x86 unified address space means the GPU can map and address the same elements that the CPU can. The good use for a discrete GPU for now is that a CPU thread and a GPU thread can work on a job together, since they can see each other and read/write to permitted addresses. We have never seen a real implementation of this so far, even though AMD keeps pushing and saying things about HSA (and keeps giving GPGPU examples that have nothing to do with HSA).

If it hasn't been used well so far, I'm not sure about the future. Yet the PS4 is a sure thing to achieve this kind of stuff.

For Nvidia to achieve this kind of goal, they need some kind of AMD-like approach, which means they need to diversify their memory structure. They need some kind of bigger management system above the GPCs. If they try to achieve this at the GPC level, it will look like what AMD's CU is. A CU can be assigned to do work for a thread in conjunction with a CPU thread, so a CU on AMD is the lowest GPU-thread assignable unit. By thread I mean a high-level job. If you try to assign a work-group / GPU thread as the term is literally used on GPUs, it is an ultra-inefficient way. A CU-assigned thread plus a CPU thread will do the best job. So if Nvidia implements GPCs in the fashion that CUs are used, it comes up a bit short. For SMXs to be used the same way, they need an upper controller system to manage the addressing scheme. And I'm not sure how they will succeed without making things too complicated while keeping them efficient.

I'm not 100% sure whether they will infringe on some x86 stuff. Of course, mapping a virtual x86 address space and infringing are not 100% the same thing, yet I have my suspicions.

And by UVM, they may not have meant x86 at all. Since they are planning to do this on mobile, it may be related to ARM. Since they have licenses and access, they can do that kind of implementation, and Denver is all about that.

March 19, 2013 | 09:17 PM - Posted by Tim Verry

Thanks for the input. I didn't consider licensing issues, though you are right that it might affect what NVIDIA can ultimately officially implement. Hopefully Intel is willing to work with them.

March 21, 2013 | 12:41 PM - Posted by Anonymous (not verified)

Apple's Tim Cook is so White Bread, that the Queen of England calls him white boy! I watched a video of him discussing graphics and apple, and I could not stay awake! Where do they get these CEOs, Tim Cook is devilishly boring, and Steve Ballmer is, well, devilishly, no! he is the devil!
