NVIDIA Finally Gets Serious with Tegra
Tegra has had an interesting run of things. The original Tegra 1 was utilized only by Microsoft with Zune. Tegra 2 saw better adoption, but did not produce the design wins to propel NVIDIA to a leadership position in cell phones and tablets. Tegra 3 found a spot in Microsoft’s Surface, but that has turned out to be a far more bitter experience than expected. Tegra 4 so far has been integrated into a handful of products and is being featured in NVIDIA’s upcoming Shield product. It also hit some production snags that made it later to market than expected.
I think the primary issue with the first three generations of products is pretty simple. There was a distinct lack of differentiation from the other ARM based products around. Yes, NVIDIA brought their graphics prowess to the market, but never in a form that distanced itself adequately from the competition. Tegra 2 boasted GeForce based graphics, but we did not find out until later that it was comprised of basically four pixel shaders and four vertex shaders that had more in common with the GeForce 7800/7900 series than it did with any of the modern unified architectures of the time. Tegra 3 boasted a big graphical boost, but it was in the form of doubling the pixel shader units and leaving the vertex units alone.
While NVIDIA had very strong developer relations and a leg up on the competition in terms of software support, it was never enough to propel Tegra beyond a handful of devices. NVIDIA is trying to rectify that with Tegra 4 and the 72 shader units that it contains (still divided between pixel and vertex units). Tegra 4 is not perfect in that it is late to market and the GPU is not OpenGL ES 3.0 compliant. ARM, Imagination Technologies, and Qualcomm are offering new graphics processing units that are not only OpenGL ES 3.0 compliant, but also offer OpenCL 1.1 support. Tegra 4 does not support OpenCL. In fact, it does not support NVIDIA’s in-house CUDA. Ouch.
Jumping into a new market is not an easy thing, and invariably mistakes will be made. NVIDIA worked hard to make a solid foundation with their products, and certainly they had to learn to walk before they could run. Unfortunately, running effectively entails having design wins due to outstanding features, performance, and power consumption. NVIDIA was really only average in all of those areas. NVIDIA is hoping to change that. Their first attempt at a product with features and support a step above the competition is what we are talking about today.
Subject: Graphics Cards | July 23, 2013 - 09:00 AM | Tim Verry
Tagged: workstation, simulation, quadro k6000, quadro, nvidia, k6000, gk110
Today, NVIDIA announced its flagship Quadro graphics card called the K6000. Back in March of this year, NVIDIA launched a new line of Quadro graphics cards for workstations. Those cards replaced the Fermi-based predecessors with new models based on NVIDIA’s GK104 “Kepler” GPUs. Notably missing from that new lineup was the NVIDIA Quadro K6000, which is the successor to the Quadro 6000.
Contrary to previous rumors, the Quadro K6000 will be based on the full GK110 chip. In fact, it will be the fastest single-GPU graphics card that NVIDIA has to offer.
The Quadro K6000 features a full GK110 GPU, 12GB of GDDR5 memory on a 384-bit bus, and a 225W TDP. The full GK110-based GPU has 2,880 CUDA cores, 256 TMUs, and 48 ROPs. Unfortunately, NVIDIA has not yet revealed clockspeeds for the GPU or memory.
Thanks to the GPU not having any SMX units disabled, the NVIDIA Quadro K6000 is rated for approximately 1.4 TFLOPS of peak double precision floating point performance and 5.2 TFLOPS of single precision floating point performance.
The chart below illustrates the differences between the new flagship Quadro K6000 with full GK110 GPU and the highest tier Tesla and consumer graphics cards which have at least one SMX unit disabled.
NVIDIA GK110-Based Graphics Cards
| |Quadro K6000|Tesla K20X|GTX TITAN|
|Memory Bandwidth|288 GB/s|250 GB/s|288 GB/s|
|Single Precision FP|5.2 TFLOPS|3.95 TFLOPS|4.5 TFLOPS|
|Double Precision FP|~1.4 TFLOPS|1.31 TFLOPS|1.31 TFLOPS|
The NVIDIA GTX TITAN gaming graphics card has 2,688 CUDA cores, 224 TMUs, and 48 ROPs and is rated for peak double and single precision of 1.31 TFLOPS and 4.5 TFLOPS respectively. The Tesla K20X compute accelerator card, on the other hand, has the same 2,688 CUDA cores, 224 TMUs, and 48 ROPs but runs its GPU and memory at lower clockspeeds. Because of the lower clockspeeds, the K20X is rated for double and single precision floating point performance of 1.31 TFLOPS and 3.95 TFLOPS and memory bandwidth of 250GB/s versus the 288GB/s bandwidth on the TITAN and K6000.
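Since NVIDIA has not revealed the K6000's clockspeeds, one rough way to read the single precision ratings is to back out the clock each one implies. The sketch below assumes the common rule of thumb that each CUDA core retires one fused multiply-add (2 FLOPs) per clock; that factor and the helper name `implied_clock_mhz` are assumptions for illustration, not figures from NVIDIA's announcement.

```python
# Peak single precision throughput is commonly estimated as
#   FLOPS = 2 * cores * clock
# The factor of 2 (one fused multiply-add per core per clock) is an assumption.
FLOPS_PER_CORE_PER_CLOCK = 2


def implied_clock_mhz(tflops: float, cuda_cores: int) -> float:
    """Clock in MHz that would produce the rated peak TFLOPS for this core count."""
    flops = tflops * 1e12
    clock_hz = flops / (FLOPS_PER_CORE_PER_CLOCK * cuda_cores)
    return clock_hz / 1e6


# (rated single precision TFLOPS, CUDA cores) as quoted in the article
cards = {
    "Quadro K6000": (5.2, 2880),
    "Tesla K20X": (3.95, 2688),
    "GTX TITAN": (4.5, 2688),
}

for name, (tflops, cores) in cards.items():
    print(f"{name}: roughly {implied_clock_mhz(tflops, cores):.0f} MHz implied")
```

The results are only as good as the rounded TFLOPS ratings, so treat them as ballpark numbers rather than confirmed clocks.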
NVIDIA® Quadro® K6000 GPU
In all, the new K6000 is an impressive card for professional users, and the GK110 chip should perform well in the workstation environment where GK104 was the only option before. NVIDIA claims that the GK110-based card delivers up to three times the performance of the Quadro 6000 (non K) predecessor. It is also the first Quadro GPU with 12GB of GDDR5 memory, which should lend itself well to high resolutions and artists working with highly detailed models and simulations.
Specifically, NVIDIA is aiming this graphics card at the visual computing market, which includes 3D designers, visual effects artists, 3D animators, and simulation users. The company provided several examples in the press release, including using the GK110-based card to render nearly complete photorealistic vehicle models in RTT Deltagen that can run in real time during design reviews.
The Quadro K6000 allows for larger and fully populated virtual sets with realistic lighting and scene detail when 3D animators and VFX artists are working with models and movie scenes in real time. Simulation work also takes advantage of the beefy double precision horsepower to support up to three times faster simulation run times in Terraspark's InsightEarth simulation. Users can run simulations covering wider areas in less time than with the previous generation Quadro cards, and the simulation is being used by oil companies to determine the best places to drill.
Pixar's Vice President of Software and R&D Guido Quaroni had the following to say regarding the K6000.
"The Kepler features are key to our next generation of real-time lighting and geometry handling. The added memory and other features allow our artists to see much more of the final scene in a real-time, interactive form, which allows many more artistic iterations."
The K6000 is the final piece to the traditional NVIDIA Quadro lineup and is likely to be well received by workstation users who need the increased double precision performance that GK110 offers over the existing GK104 chips. Specific pricing and availability are still unknown, but the K6000 will be available from workstation providers, system integrators, and authorized distribution partners beginning this fall.
Subject: General Tech | July 22, 2013 - 03:19 PM | Tim Verry
Tagged: nvidia, shield, project shield, tegra 4, gaming
NVIDIA has announced that its Shield gaming portable will begin shipping on July 31. The portable gaming console was originally slated to launch on June 27th for $349 but due to an unspecified mechanical issue with a third party component (discovered at the last minute) the company delayed the launch until the problem was fixed. Now, it appears the issue has been resolved and the NVIDIA Shield will launch on July 31 for $299 or $50 less than the original MSRP.
As a refresher, Project Shield, or just Shield as it is known now, is a portable game console that is made up of a controller, mobile-class hardware internals, and an integrated 5” 720p touchscreen display that hinges clamshell style from the back of the controller. It runs the Android Jelly Bean operating system and can play Android games as well as traditional PC games that are streamed from PCs with a qualifying NVIDIA graphics card. On the inside, Shield has an NVIDIA Tegra 4 SoC (quad core ARM Cortex A15-based CPU with NVIDIA’s proprietary GPU technology added in), 2GB RAM, and 16GB of storage. In all, the Shield measures 158mm (W) x 135mm (D) x 57mm (H) and weighs about 1.2 pounds. The controller is reminiscent of an Xbox 360 game pad.
With the third party mechanical issue out of the way, the Shield is ready to ship on July 31 and is already available for pre-order. Gamers will be able to test out the Shield at Shield Experience Centers located at certain GameStop, Microcenter, and Canada Computers shops in the US and Canada. The hardware will also be available for purchase at the usual online retailers for $299 (MSRP).
Subject: Graphics Cards, Displays | July 18, 2013 - 08:16 PM | Ryan Shrout
Tagged: pq321q, PQ321, nvidia, drivers, asus, 4k
It would appear that NVIDIA was paying attention to our recent live stream where we unboxed and set up our new ASUS PQ321Q 4K 3840x2160 monitor. During our setup on the AMD and NVIDIA based test beds I noticed (and the viewers saw) some less than desirable results during initial configuration. The driver support was pretty clunky, we had issues with boot reliability, and switching between SST and MST (single and multi stream transport) modes caused the card some issues as well.
Today NVIDIA released a new R326 driver, 326.19 beta, that improves performance in a couple of games but more importantly, adds support for "tiled 4K displays." If you don't know what that means, you aren't alone. A tiled display is one that is powered by multiple heads and essentially acts as multiple screens in a single housing. The ASUS PQ321Q monitor that we have in house, and the Sharp PN-K321, are tiled displays that use DisplayPort 1.2 MST technology to run at 3840x2160 @ 60 Hz.
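To make the "multiple screens in a single housing" idea concrete, here is a minimal sketch of how a 3840x2160 panel divides into tiles and how much raw pixel data a 60 Hz refresh implies. The two side-by-side tiles, the 24 bits per pixel, and the omission of blanking overhead are all assumptions for illustration, and both function names are hypothetical.

```python
def tile_resolution(width: int, height: int, tiles_across: int) -> tuple[int, int]:
    """Resolution of each tile when the panel is split into equal vertical strips."""
    if width % tiles_across != 0:
        raise ValueError("panel width must divide evenly across the tiles")
    return width // tiles_across, height


def raw_pixel_rate_gbps(width: int, height: int, refresh_hz: int,
                        bits_per_pixel: int = 24) -> float:
    """Uncompressed pixel data rate in Gbit/s, ignoring blanking intervals."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


# Assumption: the panel is driven as two side-by-side tiles.
tile_w, tile_h = tile_resolution(3840, 2160, tiles_across=2)
print(f"Each tile: {tile_w}x{tile_h}")
print(f"Full panel at 60 Hz: {raw_pixel_rate_gbps(3840, 2160, 60):.2f} Gbit/s raw")
```

Under those assumptions the driver has to present two half-width surfaces as a single desktop, which is the part the new "tiled 4K display" support is meant to handle.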
It is great to see NVIDIA reacting quickly to new technologies and to the issues we ran into just under a week ago. If you have either of these displays, be sure to give the new driver a shot and let me know your results!
Subject: General Tech, Storage | July 18, 2013 - 04:56 PM | Scott Michaud
Tagged: Raspberry Pi, nvidia, HPC, amazon
Adam DeConinck, high performance computing (HPC) systems engineer for NVIDIA, built a personal computer cluster in his spare time. While not exactly high performance, especially when compared to the systems he maintains for Amazon and his employer, its case is made of Lego and seems to be under a third of a cubic foot in volume.
Image source: NVIDIA Blogs
Raspberry Pi is based on a single-core ARM CPU bundled on an SoC with a 24 GFLOP GPU and 256 or 512 MB of memory. While this misses the cutesy point of the story, I am skeptical of the expected 16W power rating. Five Raspberry Pis, with Ethernet, draw a combined maximum of 17.5W on their own, and even that neglects the draw of the networking switch. My personal 8-port unmanaged switch is rated to draw 12W which, added to 17.5W, comes to 29.5W rather than 16W, so something is being neglected or averaged. Then again, it is his device, and its power draw is his concern.
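The power skepticism above is simple arithmetic, sketched here for anyone who wants to plug in their own numbers. The per-board figure is just the quoted 17.5W split five ways, and the 12W switch is the author's own unit standing in for whatever switch the cluster actually uses, so the total is an upper-bound estimate rather than a measurement.

```python
NUM_BOARDS = 5
BOARDS_MAX_W = 17.5       # combined maximum for five boards with Ethernet (from the text)
SWITCH_RATED_W = 12.0     # author's own 8-port switch, used as a stand-in
QUOTED_CLUSTER_W = 16.0   # the power rating quoted for the cluster


def cluster_budget_w(boards_w: float, switch_w: float) -> float:
    """Worst-case draw: all boards at maximum plus the switch at its rating."""
    return boards_w + switch_w


per_board_w = BOARDS_MAX_W / NUM_BOARDS
total_w = cluster_budget_w(BOARDS_MAX_W, SWITCH_RATED_W)
gap_w = total_w - QUOTED_CLUSTER_W

print(f"Per board: {per_board_w} W")
print(f"Worst-case total: {total_w} W")
print(f"Over the quoted rating by: {gap_w} W")
```

Rated maximums tend to overstate typical draw, which is one plausible way a 16W figure could come from averaging.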
Despite constant development and maintenance of interconnected computers, professionally, Adam's will for related hobbies has not been displaced. Even after the initial build, he already plans to graft the Hadoop framework and really rein in the five ARM cores for something useful...
... but, let's be honest, probably not too useful.
Introduction and Design
With the release of Haswell upon us, we’re being treated to an impactful refresh of some already-impressive notebooks. Chief among the benefits are the much-championed battery life improvements, and while better power efficiency is obviously valuable where portability is a primary focus, beefier models can also benefit by way of increased versatility. Sure, gaming notebooks are normally tethered to an AC adapter, but when it’s time to unplug for some more menial tasks, it’s good to know that you won’t be out of juice in a couple of hours.
Of course, an abundance of gaming muscle never hurts, either. As the test platform for one of our recent mobile GPU analyses, MSI’s 15.6” GT60 gaming notebook is, for lack of a better description, one hell of a beast. Following up on Ryan’s extensive GPU testing, we’ll now take a more balanced and comprehensive look at the GT60 itself. Is it worth the daunting $1,999 MSRP? Does the jump to Haswell provide ample and economical benefits? And really, how much of a difference does it make in terms of battery life?
Our GT60 test machine featured the following configuration:
In case it wasn’t already apparent, this device makes no compromises. Sporting a desktop-grade GPU and a quad-core Haswell CPU, it looks poised to be the most powerful notebook we’ve tested to date. Other configurations exist as well, spanning various CPU, GPU, and storage options. However, all available GT60 configurations feature a 1080p anti-glare screen, discrete graphics (starting at the GTX 670M and up), Killer Gigabit LAN, and a case built from metal and heavy-duty plastic. They also come preconfigured with Windows 8, so the only way to get Windows 7 with your GT60 is to purchase it through a reseller that performs customizations.
Subject: Mobile | July 16, 2013 - 04:40 AM | Tim Verry
Tagged: zte, tegra 4, td-scdma, smartphone, nvidia, china mobile
Details have leaked on a new ZTE smartphone called the Geek U988S thanks to China's TENAA certification database. The Geek is powered by NVIDIA’s latest-generation Tegra 4 SoC and is headed for Chinese wireless carrier China Mobile and its TD-SCDMA network.
Along with leaked specifications, the TENAA site has photos of the upcoming smartphone. The pictured model has a pink colored chassis with a large 5-inch touchscreen LCD with a resolution of 1920 x 1080. A 2MP webcam sits above the display and the rear of the phone hosts an 8MP camera. The device measures 144 x 71 x 9mm.
Internal hardware includes a Tegra 4 SoC clocked at 1.8GHz and 2GB of RAM. The phone works on China’s TD-SCDMA network.
There is no word on pricing or availability, but photos and a specs list can be found here.
Overclocked GTX 770 from Galaxy
When NVIDIA launched the GeForce GTX 770 at the very end of May, we started to get in some retail samples from companies like Galaxy. While our initial review looked at the reference models, other add-in card vendors are putting their own unique touch on the latest GK104 offering and Galaxy was kind enough to send us their GeForce GTX 770 2GB GC model that uses a unique, more efficient cooler design and also runs at overclocked frequencies.
If you haven't yet read up on the GTX 770 GPU, you should probably stop by my first review of the GTX 770 to see what information you are missing out on. Essentially, the GTX 770 is a full-spec GK104 Kepler GPU running at higher clocks (both core and memory speeds) compared to the original GTX 680. The new reference clocks for the GTX 770 were 1046 MHz base clock, 1085 MHz Boost clock and a nice increase to 7.0 GHz memory speeds.
Galaxy GeForce GTX 770 2GB GC Specs
The Galaxy GC model is overclocked with a new base clock setting of 1111 MHz and a higher Boost clock of 1163 MHz, roughly 6% and 7% higher than the reference clocks, respectively. Galaxy has left the memory speeds alone, though, keeping them running at an effective 7.0 GHz.
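The size of the factory overclock can be checked directly from the clocks quoted above. This is a small sketch using only those four numbers; the helper name `percent_increase` is just for illustration.

```python
def percent_increase(reference_mhz: float, overclocked_mhz: float) -> float:
    """Percentage by which the overclocked value exceeds the reference value."""
    return (overclocked_mhz - reference_mhz) / reference_mhz * 100.0


# Reference GTX 770 clocks versus the Galaxy GC clocks, both from the article
base_gain = percent_increase(1046, 1111)
boost_gain = percent_increase(1085, 1163)

print(f"Base clock gain:  {base_gain:.1f}%")
print(f"Boost clock gain: {boost_gain:.1f}%")
```

Actual in-game clocks depend on GPU Boost behavior, so the real-world gap between the two cards may not match these percentages exactly.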
Another Wrench – GeForce GTX 760M Results
Just recently, I evaluated some of the current processor-integrated graphics options using our new Frame Rating performance metric. The results were very interesting, proving Intel has done some great work with its new HD 5000 graphics option for Ultrabooks. You might have noticed that the MSI GE40 didn’t just come with the integrated HD 4600 graphics but also included a discrete NVIDIA GeForce GTX 760M on-board. While that previous article was meant to focus on the integrated graphics of Haswell, Trinity, and Richland, I did find some noteworthy results with the GTX 760M that I wanted to investigate and present.
The MSI GE40 is a new Haswell-based notebook that includes the Core i7-4702MQ quad-core processor and Intel HD 4600 graphics. Along with it MSI has included the Kepler architecture GeForce GTX 760M discrete GPU.
This GPU offers 768 CUDA cores running at a 657 MHz base clock but can stretch higher with GPU Boost technology. It is configured with 2GB of GDDR5 memory running at 2.0 GHz.
If you didn’t read the previous integrated graphics article, linked above, you’re going to have some of the data presented there spoiled and so you might want to get a baseline of information by getting through that first. Also, remember that we are using our Frame Rating performance evaluation system for this testing – a key differentiator from most other mobile GPU testing. And in fact it is that difference that allowed us to spot an interesting issue with the configuration we are showing you today.
If you are not familiar with the Frame Rating methodology, and how we had to change some things for mobile GPU testing, I would really encourage you to read this page of the previous mobility Frame Rating article for the scoop. The data presented below depends on that background knowledge!
Okay, you’ve been warned – on to the results.
Subject: Graphics Cards | July 10, 2013 - 01:48 PM | Josh Walrath
Tagged: Overclocked, nvidia, just delivered, gtx 780, gtx 770, gtx 760, GTX 670 Mini, DirectCU II, DCII, asus
Returning home on Monday, I was greeted by several (slightly wet) boxes from Asus. Happily, the rainstorm that made these boxes a bit damp did not last long, and the wetness was only superficial. The contents were perfectly fine. I was pleased by this, but not particularly pleased with FedEx for leaving them in a spot where they got wet. All complaints aside, I was obviously ecstatic to get the boxes.
Quite the lineup. The new packaging is sharp looking and clearly defines the contents.
Inside these boxes are some of the latest and greatest video cards from Asus. Having just finished up a budget roundup, I had the bandwidth available to tackle a much more complex task. Asus sent four cards for our testing procedures, and I intend to go over them with a fine toothed comb.
The smallest of the bunch is the new GTX 670 DC Mini. Asus did some serious custom work to not only get the card as small as it is, but also to redesign the power delivery system so that the chip only requires a single 8 pin PCI-E power connection. Most GTX 670 boards require 2 x 6 pin connectors which would come out to be around 225 watts delivered, but a single 8 pin would give around 175 watts total. This is skirting the edge of the official draw for the GTX 670, but with the GK104 chip being as mature as it is, there is some extra leeway involved. The cooler is quite compact and apparently pretty quiet. This is aimed at the small form factor crowd who do not want or need an overly large card, but still require a lot of performance. While the GTX 700 series is now hitting the streets, there is still a market for this particular card. Oh, and it is also overclocked for good measure!
We see a nice progression from big to little. It is amazing how small the GTX 670 DC Mini is compared to the rest, and it will be quite interesting to see how it compares to the GTX 760 in testing.
The second card is the newly released GTX 760 DCII OC. This is again based on the tried and true GK104 chip, but has several units disabled. It has 1152 CUDA cores, but retains the same number of ROPs as the fully enabled chips. It also features the full 256 bit memory bus running at 6 Gbps. That should be plenty of bandwidth for the card in most circumstances, considering the number of functional units enabled. The cooler is one of the new DirectCU II designs and is a nice upgrade in both functionality and looks from the previous DCII models. It is a smaller card than one would expect, but that comes from the need to simplify the card and not overbuild it like the higher priced 770 and 780 cards. As I have mentioned before, I really like the budget and midrange cards. This should be a really fascinating card to test.
The next card is a bit of an odd bird. The GTX 770 DCII OC is essentially a slightly higher clocked GTX 680 from yesteryear. One of the big changes is that this particular model forgoes the triple slot cooler of the previous generation and implements a dual slot cooler that is quite heavy and has good fin density. It features six pin and eight pin power connections so it has some legs for overclocking. The back plate is there for stability and protection, and it gives the board a very nice, solid feel. Asus added two LEDs by the power connections which show if the card is receiving power or not. This is nice, as the fans on this card are very quiet in most situations. Nobody wants to unplug a video card that is powered up. It retains the previous generation DCII styling, but the cooler performance is certainly nothing to sneeze at. It also is less expensive than the previous GTX 680, but is faster.
All of the cards sport dual DVI, DisplayPort, and HDMI outputs. Both DVI ports are dual-link, but only one is DVI-I which can also output a VGA signal with the proper adapter.
Finally we have the big daddy of the GTX 700 series. The 780 DCII OC is pretty much a monster card that exceeds every other offering out there, except the $1K GTX Titan. It is a slightly cut down chip as compared to the mighty Titan, but it still packs in 2304 CUDA cores. It retains the 384 bit memory bus and runs at a brisk 6 Gbps for a whopping 288.4 GB/sec of bandwidth. The core is overclocked to a base of 889 MHz and boosts up to 941 MHz. The cooler on this is massive. It features a brand new fan design for the front unit which apparently can really move the air and do so quietly. Oddly enough, this fan made its debut appearance on the aforementioned GTX 670 DC Mini. The PCB on the GTX 780 DCII OC is non-reference. It features a new power delivery system that should keep this board humming when overclocked. Asus has done their usual magic in pairing the design with high quality components which should ensure a long lifespan for this pretty expensive board.
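The bandwidth figures for these cards follow from bus width and per-pin data rate. Below is a minimal sketch using the GTX 760 and GTX 780 numbers quoted above; it treats "6 Gbps" as exactly 6.0, which is an assumption, and the slightly higher 288.4 GB/sec figure suggests the actual effective rate sits a touch above that.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def implied_data_rate_gbps(bandwidth_gbs: float, bus_width_bits: int) -> float:
    """Per-pin data rate that a quoted bandwidth figure implies."""
    return bandwidth_gbs / (bus_width_bits / 8)


gtx760_gbs = memory_bandwidth_gbs(256, 6.0)   # 256 bit bus at a nominal 6 Gbps
gtx780_gbs = memory_bandwidth_gbs(384, 6.0)   # 384 bit bus at a nominal 6 Gbps
gtx780_rate = implied_data_rate_gbps(288.4, 384)

print(f"GTX 760: {gtx760_gbs} GB/s")
print(f"GTX 780: {gtx780_gbs} GB/s at exactly 6 Gbps")
print(f"Rate implied by 288.4 GB/s: {gtx780_rate:.3f} Gbps")
```

The same function shows why the wider bus matters: at an identical data rate, the 384 bit card moves half again as much data per second as the 256 bit one.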
I do like the protective plates on the backs of the bigger cards, but the rear portions of the two smaller cards are interesting as well. We will delve more into the "Direct Power" functionality in the full review.
I am already well into testing these units and hope to have the full roundup late next week. These are really neat cards and any consumer looking to buy a new one should certainly check out the review once it is complete.
Asus has gone past the "Superpipe" stage with the GTX 780. That is a 10 mm heatpipe we are seeing. All of the DCII series coolers are robust, and even the DC Mini can dissipate a lot of heat.