Subject: Graphics Cards | July 21, 2016 - 02:04 PM | Jeremy Hellstrom
Tagged: gtx 460, gtx 760, gtx 960, gtx 1060, fermi, kepler, maxwell, pascal
Phoronix took a look at how NVIDIA's mid range cards performance on Linux has changed over the past four generations of GPU, from Fermi, through Kepler, Maxwell, and finally Pascal. CS:GO was run at 4k to push the newer GPUs as was DOTA, much to the dismay of the GTX 460. The scaling is rather interesting, there is a very large delta between Fermi and Kepler which comes close to being replicated when comparing Maxwell to Pascal. From the looks of the vast majority of the tests, the GTX 1060 will be a noticeable upgrade for Linux users no matter which previous mid range card they are currently using. We will likely see a similar article covering AMD in the near future.
"To complement yesterday's launch-day GeForce GTX 1060 Linux review, here are some more benchmark results with the various NVIDIA x60 graphics cards I have available for testing going back to the GeForce GTX 460 Fermi. If you are curious about the raw OpenGL/OpenCL/CUDA performance and performance-per-Watt for these mid-range x60 graphics cards from Fermi, Kepler, Maxwell, and Pascal, here are these benchmarks from Ubuntu 16.04 Linux." Here are some more Graphics Card articles from around the web:
- ASUS ROG STRIX-GTX1070-O8G-GAMING: GTX 1070, Strix Style! @ Bjorn3d
- MSI GeForce GTX 1060 Gaming X Review @HiTech Legion
- EVGA GeForce GTX 1070 SC Gaming ACX 3.0 Review - Affordable Enthusiast Gaming @HiTech Legion
- Radeon RX 480 performance revisited with AMD's 16.7.1 driver @ The Tech Report
- AMD Radeon RX 480 8GB CrossFire @ [H]ard|OCP
Subject: Graphics Cards | July 16, 2016 - 06:37 PM | Scott Michaud
Tagged: Volta, pascal, nvidia, maxwell, 16nm
For the past few generations, NVIDIA has been roughly trying to release a new architecture with a new process node, and release a refresh the following year. This ran into a hitch as Maxwell was delayed a year, apart from the GTX 750 Ti, and then pushed back to the same 28nm process that Kepler utilized. Pascal caught up with 16nm, although we know that some hard, physical limitations are right around the corner. The lattice spacing for silicon at room temperature is around ~0.5nm, so we're talking about features the size of ~the low 30s of atoms in width.
This rumor claims that NVIDIA is not trying to go with 10nm for Volta. Instead, it will take place on the same, 16nm node that Pascal is currently occupying. This is quite interesting, because GPUs scale quite well with complexity changes, as they have many features with a relatively low clock rate, so the only real ways to increase performance are to make the existing architecture more efficient, or make a larger chip.
That said, GP100 leaves a lot of room on the table for an FP32-optimized, ~600mm2 part to crush its performance at the high end, similar to how GM200 replaced GK110. The rumored GP102, expected in the ~450mm2 range for Titan or GTX 1080 Ti-style parts, has some room to grow. Like GM200, however, it would also be unappealing to GPU compute users who need FP64. If this is what is going on, and we're totally just speculating at the moment, it would signal that enterprise customers should expect a new GPGPU card every second gaming generation.
That is, of course, unless NVIDIA recognized ways to make the Maxwell-based architecture significantly more die-space efficient in Volta. Clocks could get higher, or the circuits themselves could get simpler. You would think that, especially in the latter case, they would have integrated those ideas into Maxwell and Pascal, though; but, like HBM2 memory, there might have been a reason why they couldn't.
We'll need to wait and see. The entire rumor could be crap, who knows?
Subject: Graphics Cards | June 21, 2016 - 05:22 PM | Scott Michaud
Tagged: nvidia, fermi, kepler, maxwell, pascal, gf100, gf110, GK104, gk110, GM204, gm200, GP104
Techspot published an article that compared eight GPUs across six, high-end dies in NVIDIA's last four architectures: Fermi to Pascal. Average frame rates were listed across nine games, each measured at three resolutions:1366x768 (~720p HD), 1920x1080 (1080p FHD), and 2560x1600 (~1440p QHD).
The results are interesting. Comparing GP104 to GF100, mainstream Pascal is typically on the order of four times faster than big Fermi. Over that time, we've had three full generational leaps in fabrication technology, leading to over twice the number of transistors packed into a die that is almost half the size. It does, however, show that prices have remained relatively constant, except that the GTX 1080 is sort-of priced in the x80 Ti category despite the die size placing it in the non-Ti class. (They list the 1080 at $600, but you can't really find anything outside the $650-700 USD range).
It would be interesting to see this data set compared against AMD. It's informative for an NVIDIA-only article, though.
NVIDIA's Ansel Technology
“In-game photography” is an interesting concept. Not too long ago, it was difficult to just capture the user's direct experience with a title. Print screen could only hold a single screenshot at a time, which allowed Steam and FRAPS to provide a better user experience. FRAPS also made video more accessible to the end-user, but it output huge files and, while it wasn't too expensive, it needed to be purchased online, which was a big issue ten-or-so years ago.
Seeing that their audience would enjoy video captures, NVIDIA introduced ShadowPlay a couple of years ago. The feature allowed users to, not only record video, but also capture the last few minutes. It did this with hardware acceleration, and it did this for free (for compatible GPUs). While I don't use ShadowPlay, preferring the control of OBS, it's a good example of how NVIDIA wants to support their users. They see these features as a value-add, which draw people to their hardware.
Subject: Graphics Cards | May 10, 2016 - 07:50 PM | Scott Michaud
Tagged: nvidia, maxwell, GTX 980 Ti, GTX 970, GTX 1080, geforce
The GTX 1080 announcement is starting to ripple into retailers, leading to price cuts on the previous generation, Maxwell-based SKUs. If you were interested in the GTX 1080, or an AMD graphics card of course, then you probably want to keep waiting. That said, you can take advantage of the discounts to get a VR-ready GPU or if you already have a Maxwell card that could use a cheap SLI buddy.
This tip comes from a NeoGAF thread. Microcenter has several cards on sale, but EVGA seems to have the biggest price cuts. This 980 Ti has dropped from $750 USD down to $499.99 (or $474.99 if you'll promise yourself to do that mail-in rebate). That's a whole third of its price slashed, and puts it about a hundred dollars under GTX 1080. Granted, it will also be slower than the GTX 1080, with 2GB less video RAM, but $100 might be worth that for you.
Subject: Graphics Cards | March 30, 2016 - 02:58 AM | Tim Verry
Tagged: maxwell, gtx 950, GM206, asus
Asus is launching a new midrange gaming graphics card clad in arctic camouflage. The Echelon GTX 950 Limited Edition is a Maxwell-based card that will come factory overclocked and paired with Asus features normally reserved for their higher end cards.
This dual slot, dual fan graphics card features “auto-extreme technology” which is Asus marketing speak for high end capacitors, chokes, and other components. Further, the card uses a DirectCU II cooler that Asus claims offers 20% better cooling performance while being 3-times quieter than the NVIDIA reference cooler. Asus tweaked the shroud on this card to resemble a white and gray arctic camouflage design. There is also a reinforced backplate that continues the stealthy camo theme.
I/O on the Echelon GTX 950 Limited Edition includes:
- 1 x DVI-D
- 1 x DVI-I
- 1 x HDMI 2.0
- 1 x DisplayPort
The card supports NVIDIA’s G-Sync technology and the inclusion of an HDMI 2.0 port allows it to be used in a HTPC/gaming PC build for the living room though case selection would be limited since it’s a larger dual slot card.
Beneath the stealthy exterior, Asus conceals a GM206-derived GTX 950 GPU with 768 CUDA cores, 48 Texture Units, and 32 ROPs as well as 2GB of GDDR5 memory. Out of the box, users have two factory overclocks to choose from that Asus calls Gaming and Overclock modes. In gaming mode, the Echelon GTX 950 GPU is clocked at 1,140 MHz base and 1,329 MHz boost. Turing the card to OC Mode, clockspeeds are further increased to 1,165 MHz base and 1,355 MHz boost.
For reference, the, well, reference GTX 950 clockspeeds are 1,024 MHz base and 1,186 MHz boost.
Asus also ever-so-slightly overclocked the GDDR5 memory to 6,610 MHz which is unfortunately a mere 10MHz over reference. The memory sits on a 128-bit bus and while a factory overclock is nice to see, transfer speeds increases will be minimal at best.
In our review of the GTX 950 which focused on the Asus Strix variant, Ryan found it be a good option for 1080p gamers wanting a bit more graphical prowess than the 750Ti for their games.
Maximum PC reports that camo-clad Echelon GTX 950 will be available at the end of the month. Pricing has not been released by Asus, but I would expect this card to come with an MSRP of around $180 USD.
A new fighter has entered the ring
When EVGA showed me that it was entering the world of gaming notebooks at CES in January, I must admit, I questioned the move. A company that, at one point, only built and distributed graphics cards based on NVIDIA GeForce GPUs had moved to mice, power supplies, tablets (remember that?) and even cases, was going to get into the cutthroat world of notebooks. But I was promised that EVGA had an angle; it would not be cutting any corners in order to bring a truly competitive and aggressive product to the market.
Just a couple of short months later (seriously, is it the end of March already?) EVGA presented us with a shiny new SC17 Gaming Notebook to review. It’s thinner than you might expect, heavier than I would prefer and packs some impressive compute power, along with unique features and overclocking capability, that will put it on your short list of portable gaming rigs for 2016.
Let’s start with a dive into the spec table and then go from there.
|EVGA SC17 Specifications|
|Processor||Intel Core i7-6820HK|
|Memory||32GB G.Skill DDR4-2666|
|Graphics Card||GeForce GTX 980M 8GB|
|Storage||256GB M.2 NVMe PCIe SSD
1TB 7200 RPM SATA 6G HDD
|Display||Sharp 17.3 inch UDH 4K with matte finish|
|Connectivity||Intel 219-V Gigabit Ethernet
Intel AC-8260 802.11ac
2x USB 3.0 Type-A
1x USB 3.1 Type-C
|Audio||Realtek ALC 255
|Video||1x HDMI 1.4
2x mini DisplayPort (1x G-Sync support)
|Dimensions||16-in x 11.6-in x 1.05-in|
|OS||Windows 10 Home|
With a price tag of $2,699, EVGA owes you a lot – and it delivers! The processor of choice is the Intel Core i7-6820HK, an unlocked, quad-core, HyperThreaded processor that brings desktop class computing capability to a notebook. The base clock speed is 2.7 GHz but the Turbo clock reaches as high as 3.6 GHz out of the box, supplying games, rendering programs and video editors plenty of horsepower for production on the go. And don’t forget that this is one of the first unlocked processors from Intel for mobile computing – multipliers and voltages can all be tweaked in the UEFI or through Precision X Mobile software to push it even further.
Based on EVGA’s relationship with NVIDIA, it should surprise exactly zero people that a mobile GeForce GPU is found inside the SC17. The GTX 980M is based on the Maxwell 2.0 design and falls slightly under the desktop consumer class GeForce GTX 970 card in CUDA core count and clock speed. With 1536 CUDA cores and a 1038 MHz base clock, with boost capability, the discrete graphics will have enough juice for most games at very high image quality settings. EVGA has configured the GPU with 8GB of GDDR5 memory, more than any desktop GTX 970… so there’s that. Obviously, it would have been great to see the full powered GTX 980 in the SC17, but that would have required changes to the thermal design, chassis and power delivery.
Subject: Graphics Cards | January 20, 2016 - 03:26 PM | Scott Michaud
Tagged: nvidia, linux, tesla, fermi, kepler, maxwell
It's nice to see long-term roundups every once in a while. They do not really provide useful information for someone looking to make a purchase, but they show how our industry is changing (or not). In this case, Phoronix tested twenty-seven NVIDIA GeForce cards across four architectures: Tesla, Fermi, Kepler, and Maxwell. In other words, from the GeForce 8 series all the way up to the GTX 980 Ti.
Image Credit: Phoronix
Nine years of advancements in ASIC design, with a doubling time-step of 18 months, should yield a 64-fold improvement. The number of transistors falls short, showing about a 12-fold improvement between the Titan X and the largest first-wave Tesla, although that means nothing for a fabless semiconductor designer. The main reason why I include this figure is to show the actual Moore's Law trend over this time span, but it also highlights the slowdown in process technology.
Performance per watt does depend on NVIDIA though, and the ratio between the GTX 980 Ti and the 8500 GT is about 72:1. While this is slightly better than the target 64:1 ratio, these parts are from very different locations in their respective product stacks. Swapping the 8500 GT for the following year's 9800 GTX, which leads to a comparison between top-of-the-line GPUs of their respective times, and you see a 6.2x improvement in performance per watt versus the GTX 980 Ti. On the other hand, that part was outstanding for its era.
I should note that each of these tests take place on Linux. It might not perfectly reflect the landscape on Windows, but again, it's interesting in its own right.
Subject: General Tech | November 12, 2015 - 02:46 AM | Tim Verry
Tagged: Tegra X1, nvidia, maxwell, machine learning, jetson, deep neural network, CUDA, computer vision
Nearly two years ago, NVIDIA unleashed the Jetson TK1, a tiny module for embedded systems based around the company's Tegra K1 "super chip." That chip was the company's first foray into CUDA-powered embedded systems capable of machine learning including object recognition, 3D scene processing, and enabling things like accident avoidance and self-parking cars.
Now, NVIDIA is releasing even more powerful kit called the Jetson TX1. This new development platform covers two pieces of hardware: the credit card sized Jetson TX1 module and a larger Jetson TX1 Development Kit that the module plugs into and provides plenty of I/O options and pin outs. The dev kit can be used by software developers or for prototyping while the module alone can be used with finalized embedded products.
NVIDIA foresees the Jetson TX1 being used in drones, autonomous vehicles, security systems, medical devices, and IoT devices coupled with deep neural networks, machine learning, and computer vision software. Devices would be able to learn from the environment in order to navigate safely, identify and classify objects of interest, and perform 3D mapping and scene modeling. NVIDIA partnered with several companies for proof-of-concepts including Kespry and Stereolabs.
Using the TX1, Kespry was able to use drones to classify and track in real time construction equipment moving around a construction site (in which the drone was not necessarily programmed for exactly as sites and weather conditions vary, the machine learning/computer vision was used to allow the drone to navigate the construction site and a deep neural network was used to identify and classify the type of equipment it saw using its cameras. Meanwhile Stereolabs used high resolution cameras and depth sensors to capture photos of buildings and then used software to reconstruct the 3D scene virtually for editing and modeling. You can find other proof-of-concept videos, including upgrading existing drones to be more autonomous posted here.
From the press release:
"Jetson TX1 will enable a new generation of incredibly capable autonomous devices," said Deepu Talla, vice president and general manager of the Tegra business at NVIDIA. "They will navigate on their own, recognize objects and faces, and become increasingly intelligent through machine learning. It will enable developers to create industry-changing products."
But what about the hardware side of things? Well, the TX1 is a respectable leap in hardware and compute performance. Sitting at 1 Teraflops of rated (FP16) compute performance, the TX1 pairs four ARM Cortex A57 and four ARM Cortex A53 64-bit CPU cores with a 256-core Maxwell-based GPU. Definitely respectable for its size and low power consumption, especially considering NVIDIA claims the SoC can best the Intel Skylake Core i7-6700K in certain workloads (thanks to the GPU portion). The module further contains 4GB of LPDDR4 memory and 16GB of eMMC flash storage.
In short, while on module storage has not increased, RAM has been doubled and compute performance has tripled for FP16 compute performance and jumped by approximately 40% for FP32 versus the Jetson TK1's 2GB of DDR3 and 192-core Kepler GPU. The TX1 also uses a smaller process node at 20nm (versus 28nm) and the chip is said to use "very little power." Networking support includes 802.11ac and Gigabit Ethernet. The chart below outlines the major differences between the two platforms.
|Jetson TX1||Jetson TK1|
|GPU (Architecture)||256-core (Maxwell)||192-core (Kepler)|
|CPU||4 x ARM Cortex A57 + 4 x A53||"4+1" ARM Cortex A15 "r3"|
|RAM||4 GB LPDDR4||2 GB LPDDR3|
|eMMC||16 GB||16 GB|
|Compute Performance (FP16)||1 TFLOP||326 GFLOPS|
|Compute Performance (FP32) - via AnandTech||512 GFLOPS (AT's estimation)||326 GFLOPS (NVIDIA's number)|
The TX1 will run the Linux For Tegra operating system and supports the usual suspects of CUDA 7.0, cuDNN, and VisionWorks development software as well as the latest OpenGL drivers (OpenGL 4.5, OpenGL ES 3.1, and Vulkan).
NVIDIA is continuing to push for CUDA Everywhere, and the Jetson TX1 looks to be a more mature product that builds on the TK1. The huge leap in compute performance should enable even more interesting projects and bring more sophisticated automation and machine learning to smaller and more intelligent devices.
For those interested, the Jetson TX1 Development Kit (the full I/O development board with bundled module) will be available for pre-order today at $599 while the TX1 module itself will be available soon for approximately $299 each in orders of 1,000 or more (like Intel's tray pricing).
With CUDA 7, it is apparently possible for the GPU to be used for general purpose processing as well which may open up some doors that where not possible before in such a small device. I am interested to see what happens with NVIDIA's embedded device play and what kinds of automated hardware is powered by the tiny SoC and its beefy graphics.
To the Max?
Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but they disable it (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.
AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.
So what is it?
Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
Kollock also notes that compute is becoming more important in the graphics pipeline, and it is possible to completely bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will completely bypass it -- maybe even their engine, several years down the road.
But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
And, of course, game developers profile GPUs from time to time...
According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware and so this amount may increase or decrease on the PC. In an informal chat with a developer at Epic Games, so massive grain of salt is required, his late night ballpark “totally speculative” guesstimate is that, on the Xbox One, the GPU could theoretically accept a maximum ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.
This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.
It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient injestion of shader code. As always, painfully always, time will tell.