Subject: General Tech, Graphics Cards | April 7, 2014 - 09:01 AM | Scott Michaud
Tagged: nvidia, geforce experience, directx 11
We knew that NVIDIA had an impending driver update providing DirectX 11 performance improvements. Launched today, 337.50 still claims significant performance increases over the previous 335.23 version. What was a surprise is GeForce Experience 2.0. This version allows both ShadowPlay and GameStream to operate on notebooks. It also allows ShadowPlay to record, and apparently stream to Twitch, your Windows desktop (but not on notebooks). It also enables Battery Boost, discussed previously.
Personally, I find desktop streaming is the headlining feature, although I rarely use laptops (and much less for gaming). This is especially useful for OpenGL, games which run in windowed mode, and if you want to occasionally screencast without paying for Camtasia or tinkering with CamStudio. If I were to make a critique, and of course I will, I would like the option to select which monitor gets recorded. Its current behavior records the primary monitor as far as I can tell.
I should also mention that, in my testing, "shadow recording" is not supported when not recording a fullscreen game. I'm guessing that NVIDIA believes their users would prefer to not record their desktops until manually started and likewise stopped. It seems like it had to have been a conscious decision. It does limit its usefulness in OpenGL or windowed games, however.
This driver also introduces GameStream for devices out of your home discussed in the SHIELD update.
This slide is SLi improvements, driver-to driver, for the GTX 770 and the 780 Ti.
As for the performance boost, NVIDIA claims up to 64% faster performance in configurations with one active GPU and up to 71% faster in SLI. It will obviously vary on a game-by-game and GPU-by-GPU basis. I do not have any benchmarks, besides a few examples provided by NVIDIA, to share. That said, it is a free driver. If you have a GeForce GPU, download it. It does complicate matters if you are deciding between AMD and NVIDIA, however.
BF4 Integrates FCAT Overlay Support
Back in September AMD publicly announced Mantle, a new lower level API meant to offer more performance for gamers and more control for developers fed up with the restrictions of DirectX. Without diving too much into the politics of the release, the fact that Battlefield 4 developer DICE was integrating Mantle into the Frostbite engine for Battlefield was a huge proof point for the technology. Even though the release was a bit later than AMD had promised us, coming at the end of January 2014, one of the biggest PC games on the market today had integrated a proprietary AMD API.
When I did my first performance preview of BF4 with Mantle on February 1st, the results were mixed but we had other issues to deal with. First and foremost, our primary graphics testing methodology, called Frame Rating, wasn't able to be integrated due to the change of API. Instead we were forced to use an in-game frame rate counter built by DICE which worked fine, but didn't give us the fine grain data we really wanted to put the platform to the test. It worked, but we wanted more. Today we are happy to announce we have full support for our Frame Rating and FCAT testing with BF4 running under Mantle.
A History of Frame Rating
In late 2012 and throughout 2013, testing graphics cards became a much more complicated beast. Terms like frame pacing, stutter, jitter and runts were not in the vocabulary of most enthusiasts but became an important part of the story just about one year ago. Though complicated to fully explain, the basics are pretty simple.
Rather than using software on the machine being tested to measure performance, our Frame Rating system uses a combination of local software and external capture hardware. On the local system with the hardware being evaluated we run a small piece of software called an overlay that draws small colored bars on the left hand side of the game screen that change successively with each frame rendered by the game. Using a secondary system, we capture the output from the graphics card directly, intercepting it from the display output, in real-time in an uncompressed form. With that video file captured, we then analyze it frame by frame, measuring the length of each of those colored bars, how long they are on the screen, how consistently they are displayed. This allows us to find the average frame rate but also to find how smoothly the frames are presented, if there are dropped frames and if there are jitter or stutter issues.
Subject: General Tech, Graphics Cards | April 1, 2014 - 04:42 PM | Tim Verry
Tagged: VCA, nvidia, GTC 2014
NVIDIA launched a new visual computing appliance called the Iray VCA at the GPU Technology Conference last week. This new piece of enterprise hardware uses full GK 110 graphics cards to accelerate the company’s Iray renderer which is used to create photo realistic models in various design programs.
The Iray VCA specifically is a licensed appliance (hardware + software) that combines NVIDIA hardware and software. On the hardware side of things, the Iray VCA is powered by eight graphics cards, dual processors (unspecified but likely Intel Xeons based on usage in last year’s GRID VCA), 256GB of system RAM, and a 2TB SSD. Networking hardware includes two 10GbE NICs, two 1GbE NICs, and one Infiniband connection. In total, the Iray VCA features 20 CPU cores and 23,040 CUDA cores. The GPUs used are based on the full GK110 die and are paired with 12GB of memory each.
Even better, it is a scalable solution such that companies can add additional Iray VCAs to the network. The appliances reportedly transparently accelerate the Iray accelerated renders done on designer’s workstations. NVIDIA reports that an Iray VCA is approximately 60-times faster than a Quadro K5000-powered workstation. Further, according to NVIDIA, 19 Iray VCAs working together amounts to 1 PetaFLOP of compute performance which is enough to render photo realistic simulations using 1 billion rays with up to hundreds of thousands of bounces.
The Iray VCA enables some rather impressive real time renders of 3D models with realistic physical properties and lighting. The models are light simulations that use ray tracing, global illumination and other techniques to show photo realistic models using up to billions of rays of light. NVIDIA is positioning the Iray VCA as an alternative to physical prototyping, allowing designers to put together virtual prototypes that can be iterated and changed at significantly less cost and time.
Iray itself is NVIDIA’s GPU-accelerated photo realistic renderer. The Iray technology is used in a number of design software packages. The Iray VCA is meant to further accelerate that Iray renderer by throwing massive amounts of parallel processing hardware at the resource intensive problem over the network (the Iray VCAs can be installed at a data center or kept on site). Initially the Iray VCA will support 3ds Max, Catia, Bunkspeed, and Maya, but NVIDIA is working on supporting all Iray accelerated software with the VCA hardware.
The virtual prototypes can be sliced and examined and can even be placed in real world environments by importing HDR photos. Jen-Hsun Huang demonstrated this by placing Honda’s vehicle model on the GTC stage (virtually).
In fact, one of NVIDIA’s initial partners with the Iray VCA is Honda. Honda is currently beta testing a cluster of 25 Iray VCAs to refine styling designs for cars and their interiors based on initial artistic work. Honda Research and Development System Engineer Daisuke Ide was quoted by NVIDIA as stating that “Our TOPS tool, which uses NVIDIA Iray on our NVIDIA GPU cluster, enables us to evaluate our original design data as if it were real. This allows us to explore more designs so we can create better designs faster and more affordably.”
The Iray VCA (PDF) will be available this summer for $50,000. The sticker price includes the hardware, Iray license, and the first year of updates and maintenance. This is far from consumer technology, but it is interesting technology that may be used in the design process of your next car or other major purchase.
What do you think about the Iray VCA and NVIDIA's licensed hardware model?
Subject: Editorial, General Tech, Graphics Cards, Processors, Shows and Expos | March 30, 2014 - 01:45 AM | Scott Michaud
Tagged: gdc 14, GDC, GCN, amd
While Mantle and DirectX 12 are designed to reduce overhead and keep GPUs loaded, the conversation shifts when you are limited by shader throughput. Modern graphics processors are dominated by sometimes thousands of compute cores. Video drivers are complex packages of software. One of their many tasks is converting your scripts, known as shaders, into machine code for its hardware. If this machine code is efficient, it could mean drastically higher frame rates, especially at extreme resolutions and intense quality settings.
Emil Persson of Avalanche Studios, probably known best for the Just Cause franchise, published his slides and speech on optimizing shaders. His talk focuses on AMD's GCN architecture, due to its existence in both console and PC, while bringing up older GPUs for examples. Yes, he has many snippets of GPU assembly code.
AMD's GCN architecture is actually quite interesting, especially dissected as it was in the presentation. It is simpler than its ancestors and much more CPU-like, with resources mapped to memory (and caches of said memory) rather than "slots" (although drivers and APIs often pretend those relics still exist) and with how vectors are mostly treated as collections of scalars, and so forth. Tricks which attempt to combine instructions together into vectors, such as using dot products, can just put irrelevant restrictions on the compiler and optimizer... as it breaks down those vector operations into those very same component-by-component ops that you thought you were avoiding.
Basically, and it makes sense coming from GDC, this talk rarely glosses over points. It goes over execution speed of one individual op compared to another, at various precisions, and which to avoid (protip: integer divide). Also, fused multiply-add is awesome.
I know I learned.
As a final note, this returns to the discussions we had prior to the launch of the next generation consoles. Developers are learning how to make their shader code much more efficient on GCN and that could easily translate to leading PC titles. Especially with DirectX 12 and Mantle, which lightens the CPU-based bottlenecks, learning how to do more work per FLOP addresses the other side. Everyone was looking at Mantle as AMD's play for success through harnessing console mindshare (and in terms of Intel vs AMD, it might help). But honestly, I believe that it will be trends like this presentation which prove more significant... even if behind-the-scenes. Of course developers were always having these discussions, but now console developers will probably be talking about only one architecture - that is a lot of people talking about very few things.
This is not really reducing overhead; this is teaching people how to do more work with less, especially in situations (high resolutions with complex shaders) where the GPU is most relevant.
Subject: General Tech, Graphics Cards | March 26, 2014 - 05:43 PM | Scott Michaud
Tagged: amd, firepro, W9100
The AMD FirePro W9100 has been announced, bringing the Hawaii architecture to non-gaming markets. First seen in the Radeon R9 series of graphics cards, it has the capacity for 5 TeraFLOPs of single-precision (32-bit) performance and 2 TeraFLOPs of double-precision (64-bit). The card also has 16GB of GDDR5 memory to support it. From the raw numbers, this is slightly more capacity than either the Titan Black or Quadro K6000 in all categories. It will also support six 4K monitors (or three at 60Hz), per card. AMD supports up to four W9100 cards in a single system.
Professional users can be looking for several things in their graphics cards: compute performance (either directly or through licensed software such as Photoshop, Premiere, Blender, Maya, and so forth), several high-resolution monitors (or digital signage units), and/or a lot of graphics performance. The W9100 is basically the top of the stack which covers all three of these requirements.
AMD also announced a system branding initiative called, "AMD FirePro Ultra Workstation". They currently have five launch partners, Supermicro, Boxx, Tarox, Silverdraft, and Versatile Distribution Services, which will have workstations available under this program. The list of components for a "Recommend" certification is: two eight-core 2.6 GHz CPUs, 32GB of RAM, four PCIe 3.0 x16 slots, a 1500W Platinum PSU, and a case with nine expansion slots (to allow four W9100 GPUs along with one SSD or SDI interface card).
Also, while the company has heavily discussed OpenCL in their slide deck, they have not mentioned specific versions. As such, I will assume that the FirePro W9100 supports OpenCL 1.2, like the R9-series, and not OpenCL 2.0 which was ratified back in November. This is still a higher conformance level than NVIDIA, which is at OpenCL 1.1.
Currently no word about pricing or availability.
Subject: General Tech, Graphics Cards, Mobile | March 25, 2014 - 03:01 PM | Scott Michaud
Tagged: shield, nvidia
The SHIELD from NVIDIA is getting a software update which advances GameStream, TegraZone, and the Android OS, itself, to KitKat. Personally, the GameStream enhancements seem most notable as it now allows users to access their home PC's gaming content outside of the home, as if it were a cloud server (but some other parts were interesting, too). Also, from now until the end of April, NVIDIA has temporarily cut the price down to $199.
Going into more detail: GameStream, now out of Beta, will stream games which are rendered on your gaming PC to your SHIELD. Typically, we have seen this through "cloud" services, such as OnLive and GaiKai, which allow access to a set of games that run on their servers (with varying license models). The fear with these services is the lack of ownership, but the advantage is that the slave device just needs enough power to decode an HD video stream.
In NVIDIA's case, the user owns both server (their standard NVIDIA-powered gaming PC, which can now be a laptop) and target device (the SHIELD). This technology was once limited to your own network (which definitely has its uses, especially for the SHIELD as a home theater device) but now can also be exposed over the internet. For this technology, NVIDIA recommends 5 megabit upload and download speeds - which is still a lot of upload bandwidth, even for 2014. In terms of performance, NVIDIA believes that it should live up to expectations set by their GRID. I do not have any experience with this, but others on the conference call took it as good news.
As for content, NVIDIA has expanded the number of supported titles to over a hundred, including new entries: Assassin's Creed IV, Batman: Arkham Origins, Battlefield 4, Call of Duty: Ghosts, Daylight, Titanfall, and Dark Souls II. They also claim that users can add other apps which are not officially supported, Halo 2: Vista was mentioned as an example, for streaming. FPS and Bitrate can now be set by the user. A bluetooth mouse and keyboard can also be paired to SHIELD for that input type through GameStream.
Yeah, I don't like checkbox comparisons either. It's just a summary.
A new TegraZone was also briefly mentioned. Its main upgrade was apparently its library interface. There has also been a number of PC titles ported to Android recently, such as Mount and Blade: Warband.
The update is available now and the $199 promotion will last until the end of April.
Introduction and Technical Specifications
Courtesy of ASUS
The ASUS ROG Poseidon GTX 780 video card is the latest incarnation of the Republic of Gamer (ROG) Poseidon series. Like the previous Poseidon series products, the Poseidon GTX 780 features a hybrid cooler, capable of air and liquid-based cooling for the GPU and on board components. The AUS ROG Poseidon GTX 780 graphics card comes with an MSRP of $599, a premium price for a premium card .
Courtesy of ASUS
In designing the Poseidon GTX 780 graphics card, ASUS packed in many of premium components you would normally find as add-ons. Additionally, the card features motherboard quality power components, featuring a 10 phase digital power regulation system using ASUS DIGI+ VRM technology coupled with Japanese black metallic capacitors. The Poseidon GTX 780 has the following features integrated into its design: DisplayPort output port, HDMI output port, dual DVI ports (DVI-D and DVI-I type ports), aluminum backplate, integrated G 1/4" threaded liquid ports, dual 90mm cooling fans, 6-pin and 8-pin PCIe-style power connectors, and integrated power connector LEDs and ROG logo LED.
DX11 could rival Mantle
The big story at GDC last week was Microsoft’s reveal of DirectX 12 and the future of the dominant API for PC gaming. There was plenty of build up to the announcement with Microsoft’s DirectX team posting teasers and starting up a Twitter account of the occasion. I hosted a live blog from the event which included pictures of the slides. It was our most successful of these types of events with literally thousands of people joining in the conversation. Along with the debates over the similarities of AMD’s Mantle API and the timeline for DX12 release, there are plenty of stories to be told.
After the initial session, I wanted to setup meetings with both AMD and NVIDIA to discuss what had been shown and get some feedback on the planned direction for the GPU giants’ implementations. NVIDIA presented us with a very interesting set of data that both focused on the future with DX12, but also on the now of DirectX 11.
The reason for the topic is easy to decipher – AMD has built up the image of Mantle as the future of PC gaming and, with a full 18 months before Microsoft’s DirectX 12 being released, how developers and gamers respond will make an important impact on the market. NVIDIA doesn’t like to talk about Mantle directly, but it’s obvious that it feels the need to address the questions in a roundabout fashion. During our time with NVIDIA’s Tony Tamasi at GDC, the discussion centered as much on OpenGL and DirectX 11 as anything else.
What are APIs and why do you care?
For those that might not really understand what DirectX and OpenGL are, a bit of background first. APIs (application programming interface) are responsible for providing an abstraction layer between hardware and software applications. An API can deliver consistent programming models (though the language can vary) and do so across various hardware vendors products and even between hardware generations. They can provide access to feature sets of hardware that have a wide range in complexity, but allow users access to hardware without necessarily knowing great detail about it.
Over the years, APIs have developed and evolved but still retain backwards compatibility. Companies like NVIDIA and AMD can improve DirectX implementations to increase performance or efficiency without adversely (usually at least) affecting other games or applications. And because the games use that same API for programming, changes to how NVIDIA/AMD handle the API integration don’t require game developer intervention.
With the release of AMD Mantle, the idea of a “low level” API has been placed in the minds of gamers and developers. The term “low level” can mean many things, but in general it is associated with an API that is more direct, has a thinner set of abstraction layers, and uses less translation from code to hardware. The goal is to reduce the amount of overhead (performance hit) that APIs naturally impair for these translations. With additional performance available, the CPU cycles can be used by the program (game) or be slept to improve battery life. In certain cases, GPU throughput can increase where the API overhead is impeding the video card's progress.
Passing additional control to the game developers, away from the API or GPU driver developers, gives those coders additional power and improves the ability for some vendors to differentiate. Interestingly, not all developers want this kind of control as it requires more time, more development work, and small teams that depend on that abstraction to make coding easier will only see limited performance advantages.
The reasons for this transition to a lower level API is being driven the by widening gap of performance between CPU and GPUs. NVIDIA provided the images below.
On the left we see performance scaling in terms of GFLOPS and on the right the metric is memory bandwidth. Clearly the performance of NVIDIA's graphics chips has far outpaced (as have AMD’s) what the best Intel desktop processor have been able and that gap means that the industry needs to innovate to find ways to close it.
Subject: General Tech, Graphics Cards | March 22, 2014 - 12:43 AM | Scott Michaud
NVIDIA's Release 340.xx GPU drivers for Windows will be the last to contain "enhancements and optimizations" for users with video cards based on architectures before Fermi. While NVIDIA will provided some extended support for 340.xx (and earlier) drivers until April 1st, 2016, they will not be able to install Release 343.xx (or later) drivers. Release 343 will only support Fermi, Kepler, and Maxwell-based GPUs.
The company has a large table on their CustHelp website filled with product models that are pining for the fjords. In short, if the model is 400-series or higher (except the GeForce 405) then it is still fully supported. If you do have the GeForce 405, or anything 300-series and prior, then GeForce Release 340.xx drivers will be the end of the line for you.
As for speculation, Fermi was the first modern GPU architecture for NVIDIA. It transitioned to standards-based (IEEE 754, etc.) data structures, introduced L1 and L2 cache, and so forth. From our DirectX 12 live blog, we also noticed that the new graphics API will, likewise, begin support at Fermi. It feels to me that NVIDIA, like Microsoft, wants to shed the transition period and work on developing a platform built around that baseline.
Subject: Graphics Cards | March 21, 2014 - 02:33 PM | Jeremy Hellstrom
Tagged: ROG MARS 760, nvidia, gtx 760, GK104, asus
If you can afford to spend $1000 or more on a GPU the ASUS ROG MARS GTX 760 is an interesting choice. The two GTX 760 cores on this card are not modified as we have seen on some other two GPU cards, indeed ASUS even overclocked them to a base of 1006MHz and a boost clock of 1072MHz. Ryan reviewed this card back in December, awarding it with a Gold and [H]ard|OCP is revisiting this card with a new driver and a different lineup of games. They also pass this unique card from ASUS a Gold after it finished stomping on AMD and the GTX 780Ti.
"The ASUS ROG MARS 760 is one of the most unique custom built video cards out on the market today. ASUS has designed a video card sporting dual NVIDIA GTX 760 GPUs on a single video card and given gamers something that didn't exist before in the market place. We will find out how it compares with the fastest video cards out there."
Here are some more Graphics Card articles from around the web:
- EVGA GeForce GTX 750 1GB SC Video Card Review @ Legit Reviews
- Gigabyte GTX 750 Ti Video Card Review @ Hardware Asylum
- NVIDIA GeForce GTX 780 Ti in 2-way SL @ X-bit Labs
- Nvidia GTX Titan Black v Palit GTX780 Ti Jetstream OC @ Kitguru
- PNY GeForce GTX 780 OC 3GB Review @ Hardware Canucks
- PNY GeForce GTX 770 OC2 4GB Review @ Hardware Canucks
- Asus GTX 750 Ti OC 2GB @ Kitguru
- ZOTAC GeForce GTX 780 Ti AMP! Edition @ X-bit Labs
- NVIDIA GeForce GTX 750 Ti @ X-bit Labs
- Intel Iris Graphics 5100: Windows 8.1 vs. Ubuntu 14.04 Linux @ Phoronix
- 4K Ultra HD Graphics Card Testing On Ubuntu 14.04 LTS @ Phoronix
- 30-Way Graphics Card Comparison On Ubuntu Linux 14.04 LTS @ Phoronix
- PowerColor Radeon R7 250X 1GB Video Card Review @ Legit Reviews
- MSI Radeon R9 290X Gaming Review! @ Bjorn3D
- PowerColor R9 290X PCS+ 4GB Review @ Hardware Canucks
- Powercolor R9 290X PCS+ 4GB @ eTeknix
- XFX R9 290 Double Dissipation Review @ Hardware Canucks
- Sapphire Vapor-X R9 280X (Refresh) Video Card Review @HiTech Legion
- Powercolor R9 290X PCS+ 4GB @ eTeknix
- AMD Radeon R9 270X and R9 270 @ [H]ard|OCP