NVIDIA Allegedly Launching Quadro K6000 GK110 GPU For Professionals

Subject: Graphics Cards | March 8, 2013 - 09:17 AM |
Tagged: quadro, nvidia, kepler, k6000, gk110

Earlier this week, NVIDIA updated its Quadro line of workstation cards with new models built around GK104 “Kepler” cores. The refresh introduced four new Kepler cards, but a successor to the Quadro 6000 was notably absent from the announcement. If the rumors hold true, professionals may get access to a Quadro K6000 after all, and one powered by GK110 at that.

GK110 Block Diagram.jpg

According to rumors around the Internet, NVIDIA has reserved its top-end Quadro slot for a GK110-based graphics card. Dubbed the K6000 (in line with the existing Kepler Quadro naming), the high-end workstation card will feature 13 SMX units, 2,496 CUDA cores, 192 texture mapping units (TMUs), 40 ROPs, and a 320-bit memory bus. The K6000 will likely have 5GB of GDDR5 memory, like its Tesla K20 counterpart. Interestingly, this Quadro K6000 has one fewer SMX unit than NVIDIA’s Tesla K20X and even NVIDIA’s consumer-grade GTX Titan. A comparison between the rumored K6000, the Quadro K5000 (GK104), and other existing GK110 cards is available in the table below. Also, note that the (rumored) K6000 specs put it more in line with the Tesla K20 than the K20X, but as it is the flagship Quadro card I felt it was still fair to compare it to the flagship Tesla and GeForce cards.

             | Quadro K6000 | Tesla K20X | GTX Titan | GK110 Full (not available yet) | Quadro K5000
SMX Units    | 13           | 14         | 14        | 15                             | 8
CUDA Cores   | 2,496        | 2,688      | 2,688     | 2,880                          | 1,536
TMUs         | 192          | 224        | 224       | 256                            | 128
ROPs         | 40           | 48         | 48        | 48                             | 32
Memory Bus   | 320-bit      | 384-bit    | 384-bit   | 384-bit                        | 256-bit
DP TFLOPS    | ~1.17        | 1.31       | 1.31      | ~1.4                           | 0.09
Core         | GK110        | GK110      | GK110     | GK110                          | GK104

The Quadro cards are in an odd situation when it comes to double precision floating point performance. The Quadro K5000, which uses GK104, delivers an abysmal 90 GFLOPS of double precision. The rumored GK110-powered Quadro K6000 brings double precision performance up to approximately 1.17 TFLOPS, which is quite the jump and shows just how much GK104 was cut down to focus on gaming performance! Further, the card that the K6000 is replacing in name, the Quadro 6000 (no K prefix), is based on NVIDIA’s previous-generation Fermi architecture and offers 515.2 GFLOPS (0.5152 TFLOPS) of double precision. On the plus side, users can expect around 3.5 TFLOPS of single precision horsepower, which is a substantial upgrade over the Quadro 6000's 1.03 TFLOPS of single precision floating point. For comparison, the GK104-based Quadro K5000 offers 2.1 TFLOPS of single precision. Although it is no full GK110, the K6000 looks to be the Quadro card to beat for its intended usage.
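For those curious where figures like these come from, here is a rough back-of-the-envelope sketch in Python. The formula (2 FLOPs per CUDA core per clock, with GK110 running double precision at 1/3 of its single precision rate and GK104 at 1/24) is the usual Kepler rule of thumb; the core clocks below are approximations, and the K6000 clock in particular is only an assumption inferred from the rumored numbers, not a confirmed spec.

```python
# Rough theoretical throughput for Kepler cards: each CUDA core can retire one
# fused multiply-add (2 FLOPs) per clock. GK110 runs double precision at 1/3 of
# its single precision rate, while GK104 is limited to 1/24.
def kepler_tflops(cuda_cores, core_clock_mhz, dp_ratio):
    sp = 2 * cuda_cores * core_clock_mhz * 1e6 / 1e12   # single precision TFLOPS
    return sp, sp * dp_ratio                             # (SP, DP)

# Clock speeds are approximations; the K6000 clock is an assumption, not a spec.
for name, cores, mhz, ratio in [
    ("Quadro K6000 (rumored)", 2496, 700, 1 / 3),   # ~3.5 SP / ~1.2 DP TFLOPS
    ("Tesla K20X",             2688, 732, 1 / 3),   # ~3.9 SP / ~1.3 DP TFLOPS
    ("Quadro K5000",           1536, 706, 1 / 24),  # ~2.2 SP / ~0.09 DP TFLOPS
]:
    sp, dp = kepler_tflops(cores, mhz, ratio)
    print(f"{name}: {sp:.2f} TFLOPS SP, {dp:.2f} TFLOPS DP")
```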

nvidia-quadro-k5000 GPU.jpg

Of course, Quadro is more about stable drivers, beefy memory, and single precision than double precision, but it would be nice to see the expensive Quadro workstation cards be able to pull double duty, as it were. NVIDIA’s Tesla line is where double precision floating point is key; there is just a rather wide gap between the two lineups, and the K6000 fortunately closes it somewhat. I would have really liked to see the K6000 with at least 14 SMX units to match the consumer Titan and the Tesla K20X, but the rumors are not looking positive in that regard. Professionals should expect to pay quite a premium for the K6000 versus the Titan despite the hardware differences; it will likely sell for around $3,000.

No word on availability, but the card will likely be released soon in order to complete the Kepler Quadro lineup update. 

NVIDIA Refreshes Quadro with Kepler

Subject: General Tech, Graphics Cards | March 6, 2013 - 08:02 PM |
Tagged: quadro, nvidia

KeplerQuadroTop.png

Be polite, be efficient, have a plan to Kepler every card that you meet.

The professional graphics market is not designed for gamers, although that should be fairly clear. These GPUs are designed to efficiently handle the complex video, 3D, and high-resolution display environments found in certain specialized workspaces.

This is the class of cards that allows a 3D animator to edit their creations with stereoscopic 3D glasses, for instance.

NVIDIA's branding will remain consistent with the scheme developed for the prior generation. Previously, if you were in the market for a Fermi-based Quadro solution, your choices were the Quadro 600, 2000, 4000, 5000, and 6000. Now that the world revolves around Kepler... heh heh heh... each entry has been prefixed with a K, with the exception of the highest-end 6000 card. These entries are therefore:

  • Quadro K600, 192 CUDA Cores, 1GB, $199 MSRP
  • Quadro K2000, 384 CUDA Cores, 2GB, $599 MSRP
  • Quadro K4000, 768 CUDA Cores, 3GB, $1,269 MSRP
  • Quadro K5000, 1536 CUDA Cores, 4GB + ECC, $2,249 MSRP

This product line is demonstrated graphically by the NVIDIA slide below.

KeplerQuadro.png

Clicking the image while viewing the article will enlarge it.

It should be noted that each of the above products is built on the GK10X series of GPUs and not the more computationally-oriented GK110. As the above slide alludes: while these Quadro cards are designed to handle graphically-intensive applications, they are meant to be paired with GK110-based Tesla K20 cards that take on the GPGPU muscle.

Should you need the extra GPGPU performance, particularly when it comes to double precision math, those cards can be found online for somewhere in the ballpark of $3,300 to $3,500.
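To make that split concrete, here is a minimal sketch of how a workstation application might divide the work, assuming a machine with both a Quadro and a Tesla installed and the pycuda package available; the selection-by-name logic is purely illustrative and not part of any NVIDIA tooling.

```python
import pycuda.driver as cuda

cuda.init()
devices = [cuda.Device(i) for i in range(cuda.Device.count())]

# Illustrative split: keep the Quadro on display/visualization duties and send
# double precision compute kernels to the GK110-based Tesla, chosen here by name.
compute_dev = next((d for d in devices if "Tesla" in d.name()), devices[0])
display_dev = next((d for d in devices if "Quadro" in d.name()), devices[0])

print("compute on:", compute_dev.name(), "| display on:", display_dev.name())
```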

The new Quadro products were available starting yesterday, March 5th, from “leading OEM and Channel Partners.”

Source: NVIDIA

A year of GeForce drivers reviewed

Subject: Graphics Cards | March 5, 2013 - 02:28 PM |
Tagged: nvidia, geforce, graphics drivers

After evaluating the evolution of AMD's drivers over 2012, [H]ard|OCP has now finalized its look at NVIDIA's offerings over the past year.  They chose a half dozen drivers spanning March to December, tested on both the GTX 680 and GTX 670.  As you can see throughout the review, NVIDIA's performance was mostly stable apart from the final driver of 2012, which provided noticeably improved performance in several games.  [H] compared the frame rates from both companies on the same charts, which makes the steady improvement of AMD's drivers over the year even more obvious.  That does imply that AMD's drivers at the start of 2012 needed the work, and it suggests the driver team at AMD has its work cut out for it in 2013 if it wants to deliver a consistently high level of performance across the board, with game-specific improvements offering the only deviation in performance.

H_Geforce.jpg

"We have evaluated AMD and NVIDIA's 2012 video card driver performances separately. Today we will be combining these two evaluations to show each companies full body of work in 2012. We will also be looking at some unique graphs that show how each video cards driver improved or worsened performance in each game throughout the year."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

Podcast #240 - GTX TITAN Benchmarks, Frame Rating, Tegra 4 Details and more!

Subject: General Tech | February 28, 2013 - 03:45 PM |
Tagged: video, titan, sli, R5000, podcast, nvidia, H90, H110, gtx titan, frame rating, firepro, crossfire, amd

PC Perspective Podcast #240 - 02/28/2013

Join us this week as we discuss GTX TITAN Benchmarks, Frame Rating, Tegra 4 Details and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath and Allyn Malventano

This Podcast is brought to you by MSI!

Program length: 1:24:28

Podcast topics of discussion:

  1. 0:01:18 PCPer Podcast BINGO!
  2. Week in Reviews:
    1. 0:03:00 GeForce GTX TITAN Performance Review
    2. 0:21:55 Frame Rating Part 3: First Results from the New GPU Performance Tools
    3. 0:38:00 Corsair Hydro Series H90 and H110 140mm Liquid Cooler Review
  3. 0:40:30 This Podcast is brought to you by MSI!
  4. News items of interest:
    1. 0:41:45 New Offices coming for NVIDIA
    2. 0:45:00 Chromebook Pixel brings high-res to high-price
    3. 0:48:00 GPU graphics market updates from JPR
    4. 0:55:45 Tegra 4 graphics details from Mobile World Congress
    5. 1:01:00 Unreal Engine 4 on PS4 has reduced quality
    6. 1:04:10 Micron SAS SSDs
    7. 1:08:25 AMD FirePro R5000 PCoIP Card
  5. Closing:
    1. 1:13:35 Hardware / Software Pick of the Week
      1. Ryan: NOT this 3 port HDMI switch
      2. Jeremy: Taxidermy + PICAXE, why didn't we think of this before?
      3. Josh: Still among my favorite headphones
      4. Allyn: Cyto
    2. 1-888-38-PCPER or podcast@pcper.com
    3. http://pcper.com/podcast
    4. http://twitter.com/ryanshrout and http://twitter.com/pcper
    5. Closing/outro

Be sure to subscribe to the PC Perspective YouTube channel!!

 

NVIDIA Details Tegra 4 and Tegra 4i Graphics

Subject: Graphics Cards | February 25, 2013 - 08:01 PM |
Tagged: nvidia, tegra, tegra 4, Tegra 4i, pixel, vertex, PowerVR, mali, adreno, geforce

 

When Tegra 4 was introduced at CES there was precious little information about the setup of the integrated GPU.  We all knew that it would be a much more powerful GPU, but we were not entirely sure how it was set up.  Now NVIDIA has finally released a slew of whitepapers that deal with not only the GPU portion of Tegra 4, but also some of the low level features of the Cortex A15 processor.  For this little number I am just going over the graphics portion.

layout.jpg

This robust looking fellow is the Tegra 4.  Note the four pixel "pipelines" that can output 4 pixels per clock.

The graphics units on the Tegra 4 and Tegra 4i are identical in overall architecture; the 4i simply has fewer units, and they are arranged slightly differently.  Tegra 4 comprises 72 units, 48 of which are pixel shaders.  These pixel shaders are VLIW-based VEC4 units.  The other 24 units are vertex shaders.  The Tegra 4i comprises 60 units: 48 pixel shaders and 12 vertex shaders.  We knew at CES that it was not a unified shader design, but we were still unsure of the overall makeup of the part.  There are some very good reasons why NVIDIA went this route, as we will soon explore.

If NVIDIA were to transition to unified shaders, it would increase the overall complexity and power consumption of the part.  Each shader unit would have to be able to handle both vertex and pixel workloads, which requires more transistors.  Simpler shaders focused on either pixel or vertex operations are more efficient at what they do, both in terms of transistors used and power consumed.  This is the same train of thought as choosing fixed-function units over fully programmable ones: yes, programmability gives more flexibility, but the fixed-function unit is again smaller, faster, and more efficient at its workload.

layout_4i.jpg

On the other hand, here we have the Tegra 4i, which gives up half the pixel pipelines and half the vertex shaders but keeps all 48 pixel shaders.

If there was one surprise here, it is that the part is not completely OpenGL ES 3.0 compliant.  It is lacking one major function required for certification: this particular part cannot render at FP32 precision.  It has been quite a few years since we have heard of anything in the PC market not being able to do FP32, but it is quite common not to support it in the power- and transistor-conscious mobile market.  NVIDIA decided to go with an FP20 partial precision setup.  They claim that, for all intents and purposes, it will not be noticeable to the human eye.  Colors will still be rendered properly and artifacts will be few and far between.  Remember back in the day when NVIDIA supported FP16 and FP32 while chastising ATI for choosing FP24 with the Radeon 9700 Pro?  Times have changed a bit.  Going with FP20 is again a power and transistor saving decision.  The part still supports Direct3D feature level 9_3 and OpenGL ES 2.0, but it is not fully OpenGL ES 3.0 compliant.  This is not to say that it does not support any 3.0 features.  It in fact supports quite a bit of the functionality required by 3.0, but it is still not fully compliant.
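To get a feel for what partial precision means in practice, here is an illustrative Python snippet that rounds a float32 value to a narrower mantissa.  The exact bit layout of NVIDIA's FP20 format is not spelled out here, so the 13-bit mantissa below is just a placeholder assumption; the point is simply that a single rounding at roughly this precision is far smaller than one step of an 8-bit color channel, which is why colors still look correct.

```python
import struct

def quantize_mantissa(x, mantissa_bits):
    """Round a float32 value to a reduced mantissa width (illustrative only;
    this is not the actual FP20 encoding NVIDIA uses)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    drop = 23 - mantissa_bits                                # float32 has a 23-bit mantissa
    bits = (bits + (1 << (drop - 1))) & ~((1 << drop) - 1)   # round to nearest
    return struct.unpack(">f", struct.pack(">I", bits))[0]

color = 200 / 255                          # a typical 8-bit color channel value
approx = quantize_mantissa(color, 13)      # 13-bit mantissa is an assumed placeholder
print(color, approx, abs(color - approx))  # error well under 1e-4, vs. a color step of ~0.004
```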

This will be an interesting decision to watch over the next few years.  The latest Mali 600 series, PowerVR 6 series, and Adreno 300 series solutions all support OpenGL ES 3.0.  Tegra 4 is the odd man out.  While most developers have no plans to go to 3.0 anytime in the near future, it will eventually be implemented in software.  When that point comes, then the Tegra 4 based devices will be left a bit behind.  By then NVIDIA will have a fully compliant solution, but that is little comfort for those buying phones and tablets in the near future that will be saddled with non-compliance once applications hit.

ogles_feat.jpg

The list of OpenGL ES 3.0 features that are actually present in Tegra 4; the lack of FP32 support relegates it to ES 2.0 compliance.

The core speed is increased to 672 MHz, well up from the 520 MHz in Tegra 3 (8 pixel and 4 vertex shaders).  The GPU can output four pixels per clock, double that of Tegra 3.  Once we consider the extra clock speed and pixel pipelines, the Tegra 4 increases pixel fillrate by 2.6x.  Pixel and vertex shading will get a huge boost in performance due to the dramatic increase of units and clockspeed.  Overall this is a very significant improvement over the previous generation of parts.
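As a quick sanity check of that 2.6x figure, using only the clock speeds and pixels-per-clock numbers quoted above:

```python
# Pixel fillrate = pixels per clock x core clock.
tegra3_fillrate = 2 * 520e6   # Tegra 3: 2 pixels/clock at 520 MHz
tegra4_fillrate = 4 * 672e6   # Tegra 4: 4 pixels/clock at 672 MHz
print(tegra4_fillrate / tegra3_fillrate)   # ~2.58, i.e. roughly the quoted 2.6x
```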

The Tegra 4 can output to a 4K display natively, and that is not the only new feature for this part.  Here is a quick list:

  • 2x/4x Multisample Antialiasing (MSAA)
  • 24-bit Z (versus 20-bit Z in the Tegra 3 processor) and 8-bit Stencil
  • 4K x 4K texture size, including Non-Power-of-Two textures (versus 2K x 2K in the Tegra 3 processor) – allows higher quality textures and makes it easier to port full-resolution textures from console and PC games to the Tegra 4 processor; good for high resolution displays
  • 16:1 Depth (Z) Compression and 4:1 Color Compression (versus none in the Tegra 3 processor) – this is lossless compression and is useful for reducing bandwidth to/from the frame buffer, and it is especially effective in antialiasing when processing multiple samples per pixel
  • Depth Textures
  • Percentage Closer Filtering for Shadow Texture Mapping and Soft Shadows
  • Texture border color to eliminate coarse MIP-level bleeding
  • sRGB for Texture Filtering, Render Surfaces, and MSAA down-filter

Note: CSAA is no longer supported in Tegra 4 processors.

This is a big generational jump, and now we only have to see how it performs against the other top end parts from Qualcomm, Samsung, and others utilizing IP from Imagination and ARM.

Source: NVIDIA

Triangles beat voxels when you are constructing a building

Subject: General Tech | February 22, 2013 - 12:23 PM |
Tagged: nvidia, jen-hsun huang

NVIDIA will have a new nerve center across the street from their existing headquarters; from what Jen-Hsun told The Register, they are almost at the point where they need bunk-desks in their current HQ.  The triangle pattern shown in the artist's concepts not only embodies a key part of NVIDIA's technology but is also a well-recognized technique in architecture for providing very sturdy construction.  Hao Ko was the architect chosen for the design; his resume includes a terminal at JFK airport as well as a rather tall building in China.  For NVIDIA's overlord to plan such an expensive undertaking shows great confidence in his company's success, even with the shrinking discrete GPU market.

nvidia_new_hq_aerial_view.jpg

"Move over Apple. Nvidia cofounder and CEO Jen-Hsun Huang wants to build his own futuristic space-station campus – and as you might expect, the Nvidia design is black and green and built from triangles, the basic building block of the mathematics around graphics processing. And, as it turns out, the strongest shape in architecture."

Here is some more Tech News from around the web:

Tech Talk

Source: The Register
Author:
Manufacturer: PC Perspective

In case you missed it...

UPDATE: We have now published full details on our Frame Rating capture and analysis system as well as an entire host of benchmark results.  Please check it out!!

In one of the last pages of our recent NVIDIA GeForce GTX TITAN graphics card review we included an update on our Frame Rating graphics performance metric that describes the testing method in more detail and shows results for the first time.  Because it was buried so far into the article, I thought it was worth posting this information here as a separate article to solicit feedback from readers and help guide the discussion forward without getting lost in the TITAN shuffle.  If you already read that page of our TITAN review, nothing new is included below. 

I am still planning a full article based on these results sooner rather than later; for now, please leave me your thoughts, comments, ideas and criticisms in the comments below!


Why are you not testing CrossFire??

If you haven't been following our sequence of stories that investigates a completely new testing methodology we are calling "frame rating", then you are really missing out.  (Part 1 is here, part 2 is here.)  The basic premise of Frame Rating is that the performance metrics that the industry is gathering using FRAPS are inaccurate in many cases and do not properly reflect the real-world gaming experience the user has.

Because of that, we are working on another method that uses high-end dual-link DVI capture equipment to directly record the raw output from the graphics card with an overlay technology that allows us to measure frame rates as they are presented on the screen, not as they are presented to the FRAPS software sub-system.  With these tools we can measure average frame rates, frame times and stutter, all in a way that reflects exactly what the viewer sees from the game.

We aren't ready to show our full sets of results yet (soon!), but the problem is that AMD's CrossFire technology shows severe performance degradation when viewed under the Frame Rating microscope that does not show up nearly as dramatically under FRAPS.  As such, I decided that it was simply irresponsible of me to present data to readers that I would then immediately refute on the final pages of this review (Editor: referencing the GTX TITAN article linked above) - it would be a waste of time for the reader, and people who skip straight to the performance graphs wouldn't know our theory on why the results displayed were invalid.

Many other sites will use FRAPS, will use CrossFire, and there is nothing wrong with that at all.  They are simply presenting data that they believe to be true based on the tools at their disposal.  More data is always better. 

Here are those results and our discussion.  I decided to use the most popular game out today, Battlefield 3, and please keep in mind this is NOT the worst case scenario for AMD CrossFire in any way.  I tested the Radeon HD 7970 GHz Edition in single and CrossFire configurations as well as the GeForce GTX 680 in single-card and SLI configurations.  To gather results I used two processes:

  1. Run FRAPS while running through a repeatable section and record frame rates and frame times for 60 seconds
  2. Run our Frame Rating capture system with a special overlay that allows us to measure frame rates and frame times in post-processing.

Here is an example of what the overlay looks like in Battlefield 3.

fr_sli_1.jpg

Frame Rating capture on GeForce GTX 680s in SLI - Click to Enlarge

The column on the left is the visual portion of an overlay that is applied to each and every frame of the game early in the rendering process.  A solid color is added to the PRESENT call (more details to come later) for each individual frame.  As you know, when you are playing a game, multiple frames can make it onto any single 60 Hz refresh cycle of your monitor, and because of that you get a succession of colors on the left-hand side.

By measuring the pixel height of those colored columns, and knowing the order in which they should appear beforehand, we can gather the same data that FRAPS does but our results are seen AFTER any driver optimizations and DX changes the game might make.
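As a rough illustration of what that post-processing step might look like (a hypothetical sketch, not PC Perspective's actual analysis tool), assume the overlay occupies the left-most pixel column of each captured 60 Hz scanout and that the color sequence is known in advance:

```python
import numpy as np

def overlay_band_heights(captured_frame):
    """captured_frame: H x W x 3 uint8 array of one captured 60 Hz scanout.
    Returns the pixel height of each solid-color band in the left-most column."""
    column = captured_frame[:, 0, :].astype(int)          # left column, top to bottom
    changed = np.any(np.diff(column, axis=0) != 0, axis=1)
    boundaries = np.flatnonzero(changed) + 1              # rows where the color switches
    edges = np.concatenate(([0], boundaries, [column.shape[0]]))
    return np.diff(edges)                                  # band heights in pixels

def band_times_ms(band_heights, screen_height, refresh_hz=60):
    """Convert each band's share of the scanout into milliseconds of display time.
    Comparing the band colors against the known overlay sequence would also reveal
    dropped frames (missing colors) and runt frames (bands only a few pixels tall)."""
    return band_heights / screen_height * (1000.0 / refresh_hz)
```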

fr_cf_1.jpg

Frame Rating capture on Radeon HD 7970 CrossFire - Click to Enlarge

Here you see a very similar screenshot running on CrossFire.  Notice the thin silver band between the maroon and purple?  That is a complete frame according to FRAPS and most reviews.  Not to us - we think that rendered frame is almost useless. 

Continue reading the third part in our Frame Rating series to see our first performance results!!

Join PCPer and NVIDIA for a GeForce GTX TITAN Live Review!

Subject: Graphics Cards | February 21, 2013 - 01:12 PM |
Tagged: video, titan, nvidia, live review, live, kepler, geforce titan, geforce

Missed the live event?  Here is the full replay, featuring me and Tom Petersen!

Hopefully by now you have read our review of the NVIDIA GeForce GTX TITAN 6GB graphics card that was just released.  This is definitely a product release that highlights a generation of GPUs, and I would really encourage you to read the article and offer your feedback.

However, we have another event to promote right now: NVIDIA's Tom Petersen will be joining me on PCPer Live! at 11am PT / 2pm ET to talk about the GeForce GTX TITAN and its performance, features, pricing and more! 

pcperlive2.png

GeForce GTX TITAN Live Review Stream

11am PT / 2pm ET - February 21st

PC Perspective Live! Page

If you have questions for Tom or me, you can leave them in the comments below (no registration required)!

nvidia1.jpg

TITAN up your ... you know

Subject: Graphics Cards | February 21, 2013 - 12:57 PM |
Tagged: titan, nvidia, kepler, gtx titan, gk110, geforce

Before getting into the performance of the $1000 NVIDIA TITAN, it is worth looking at the improvements NVIDIA has added to this GK110 beast.  At 10.5" long it is a half inch longer than a 680 and half an inch shorter than a 690, which allows it to fit in a wider variety of cases, and the vastly improved thermals allow the use of much smaller cases than other high-end GPUs can manage without exotic cooling solutions.  There is also a reduction in noise generated, to the point where SLI'd TITANs run quieter than some single-card solutions, not to mention much faster.  To see just how much faster, check out [H]ard|OCP's results, which you can compare to Ryan's.

H_TITAN.jpg

"NVIDIA is launching a TITAN today, literally, the new GeForce GTX TITAN video card is here, and we have a lot to talk about. We test single-GPU and 2-way SLI today, with more to follow later. We will find out if this TITAN of a video card really is worth it, and just who this video card is designed for. Be prepared to face the fastest single-GPU video card."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP
Manufacturer: NVIDIA

TITAN is back for more!

Our NVIDIA GeForce GTX TITAN Coverage Schedule:

If you are reading this today, chances are you were here on Tuesday when we first launched our NVIDIA GeForce GTX TITAN features and preview story (accessible from the link above) and were hoping to find benchmarks then.  You didn't, but you will now.  I am here to show you that the TITAN is indeed the single fastest GPU on the market and MAY be the best graphics card (single or dual GPU) on the market depending on what usage model you have.  Some will argue, some will disagree, but we have an interesting argument to make about this $999 gaming beast.

A brief history of time...er, TITAN

In our previous article we talked all about TITAN's GK110-based GPU, the form factor, card design, GPU Boost 2.0 features and much more, and I would highly encourage you all to read it before going forward.  If you just want the Cliff's Notes, I am going to copy and paste some of the most important details below.

IMG_9502.JPG

From a pure specifications standpoint the GeForce GTX TITAN based on GK110 is a powerhouse.  While the full GPU sports a total of 15 SMX units, TITAN will have 14 of them enabled for a total of 2688 shaders and 224 texture units.  Clock speeds on TITAN are a bit lower than on GK104 with a base clock rate of 836 MHz and a Boost Clock of 876 MHz.  As we will show you later in this article, though, the GPU Boost technology has been updated and changed quite a bit from what we first saw with the GTX 680.

The bump in memory bus width is also key; feeding that many CUDA cores definitely required a move from 256-bit to 384-bit, a 50% increase.  Even better, the memory is still running at 6.0 GHz effective, resulting in total memory bandwidth of 288.4 GB/s.
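That bandwidth figure checks out with some quick math (a simple sketch; the quoted "6.0 GHz" is the effective per-pin GDDR5 data rate, which works out to roughly 6.008 Gbps on this card):

```python
# 384-bit bus = 48 bytes per transfer; multiply by the effective per-pin data rate.
bus_width_bytes = 384 / 8
effective_gbps = 6.008
print(bus_width_bytes * effective_gbps)   # ~288.4 GB/s
```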

blockdiagram2.jpg

Speaking of memory - this card will ship with 6GB on-board.  Yes, 6 GeeBees!!  That is twice as much as AMD's Radeon HD 7970 and three times as much as NVIDIA's own GeForce GTX 680 card.  This is without a doubt a nod to the super-computing capabilities of the GPU and the GPGPU functionality that NVIDIA is enabling with the double precision aspects of GK110.

Continue reading our full review of the NVIDIA GeForce GTX TITAN graphics card with benchmarks and an update on our Frame Rating process!!