Subject: Graphics Cards | August 10, 2015 - 06:14 PM | Sebastian Peak
Tagged: overclocking, overclock, open source, nvidia, MSI Afterburner, API
An author called "2PKAQWTUQM2Q7DJG" (likely not a real name) has published a fascinating little article today on his/her Wordpress blog entitled, "Overclocking Tools for NVIDIA GPUs Suck. I Made My Own". What it contains is a full account of the process of creating an overclocking tool beyond the constraints of common utilities such as MSI Afterburner.
By probing MSI's OC utility using Ollydbg (an x86 "assembler level analysing debugger"), the author was able to track down how Afterburner works.
“nvapi.dll” definitely gets loaded here using LoadLibrary/GetModuleHandle. We’re on the right track. Now where exactly is that lib used? ... That’s simple, with the program running and the realtime graph disabled (it polls NvAPI constantly adding noise to the mass of API calls). we place a memory breakpoint on the .Text memory segment of the NVapi.dll inside MSI Afterburner’s process... Then we set the sliders in the MSI tool to get some negligible GPU underclock and hit the “apply” button. It breaks inside NvAPI… magic!
After further explaining the process and his/her source code for an overclocking utility, the user goes on to show the finished product in the form of a command line utility.
There is a link to the finished version of this utility at the end of the article, as well as the entire process with all source code. It makes for an interesting read (even for the painfully inept at programming, such as myself), and the provided link to download this mysterious overclocking utility (disguised as a JPG image file, no less) makes it both tempting and a little dubious. Does this really allow overclocking any NVIDIA GPU, including mobile? What could be the harm in trying? In all seriousness, however, since some of what was seemingly uncovered in the article is no doubt proprietary, how long will this information be available?
It would probably be wise to follow the link to the Wordpress page ASAP!
It's Basically a Function Call for GPUs
Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, the way graphics cards are loaded with tasks changes drastically in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.
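That bind-then-draw flow can be sketched as a toy state machine. All of the names below are illustrative, not actual DirectX calls; the point is simply that every bind mutates one global state, and the draw call snapshots whatever happens to be bound at that moment:

```python
# Toy model of the DirectX 10-era approach: one global state object that
# every "bind" call mutates, plus a draw call that snapshots it into the
# driver's queue as a single "draw call".

class GlobalState:
    def __init__(self):
        self.vertex_buffer = None
        self.shader = None
        self.texture = None

class Device:
    def __init__(self):
        self.state = GlobalState()   # the single, global state
        self.command_queue = []      # what the driver will consume

    # "bind" calls just mutate the global state
    def bind_vertex_buffer(self, vb): self.state.vertex_buffer = vb
    def bind_shader(self, sh):        self.state.shader = sh
    def bind_texture(self, tex):      self.state.texture = tex

    # the draw call snapshots whatever is currently bound
    def draw(self):
        self.command_queue.append(
            (self.state.vertex_buffer, self.state.shader, self.state.texture))

device = Device()
device.bind_vertex_buffer("cube_verts")
device.bind_shader("basic_lighting")
device.bind_texture("crate_diffuse")
device.draw()                        # one queued draw call
print(len(device.command_queue))     # 1
```

Because every bind goes through that one shared state, two threads issuing binds would trample each other's configuration before their draws land, which is exactly the single-thread bottleneck discussed next.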
While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation has been known for a while, and it contributed to the meme that consoles can squeeze out all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be built up on secondary threads and then appended to the global state, whole. It was a compromise between letting each thread create its own commands and the legacy decision to have a single, global state for the GPU.
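A rough sketch of that deferred-context idea, with illustrative names rather than the real D3D11 interfaces: secondary threads record commands into private lists, and only the main thread appends them, whole, to the real queue:

```python
# Sketch of DirectX 11-style "deferred contexts": worker threads record
# into private command lists, and the single immediate context appends
# each finished list whole. Names are illustrative, not the D3D11 API.
import threading

class DeferredContext:
    """Records commands privately; nothing reaches the GPU yet."""
    def __init__(self):
        self.command_list = []
    def draw(self, mesh):
        self.command_list.append(("draw", mesh))

class ImmediateContext:
    """The single authority that actually feeds the driver."""
    def __init__(self):
        self.queue = []
    def execute_command_list(self, cl):
        self.queue.extend(cl)   # appended whole, in one go

immediate = ImmediateContext()
contexts = [DeferredContext() for _ in range(4)]

def record(ctx, meshes):
    for m in meshes:
        ctx.draw(m)

threads = [threading.Thread(target=record,
                            args=(c, [f"mesh_{i}_{j}" for j in range(10)]))
           for i, c in enumerate(contexts)]
for t in threads: t.start()
for t in threads: t.join()

# Only the main thread touches the immediate context.
for c in contexts:
    immediate.execute_command_list(c.command_list)

print(len(immediate.queue))  # 40 recorded draws, submitted by one thread
```

The recording threads never contend over shared state, but the final submission is still serialized through one context, which is why the gains were inconsistent in practice.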
Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.
The new graphics APIs allow developers to submit their tasks quicker and smarter, and they allow the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:
- AMD and NVIDIA were only a two-digit percentage apart in draw call performance
- Both AMD and NVIDIA saw an order of magnitude increase in draw calls
Subject: Graphics Cards, Processors, Mobile, Shows and Expos | August 10, 2015 - 09:01 AM | Scott Michaud
Tagged: vulkan, spir, siggraph 2015, Siggraph, opengl sc, OpenGL ES, opengl, opencl, Khronos
When the Khronos Group announced Vulkan at GDC, they mentioned that the API is coming this year, and that this date is intended to under-promise and over-deliver. Recently, fans were hoping that it would be published at SIGGRAPH, which officially began yesterday. Unfortunately, Vulkan was not released, though it does hold a significant chunk of the news. Also, it's not like DirectX 12 is holding a commanding lead at the moment. Its headers have been public for only a few months, and the code samples are less than two weeks old.
The organization made announcements for six products today: OpenGL, OpenGL ES, OpenGL SC, OpenCL, SPIR, and, as mentioned, Vulkan. They wanted to make their commitment to all of their standards clear. Vulkan is urgent, but some developers will still want the framework of OpenGL: bind what you need to the context, then issue a draw and, if you do it wrong, the driver will often clean up the mess for you anyway. The briefing was structured to make it evident that OpenGL is still on their minds, which is likely why they made sure three OpenGL logos greeted me in their slide deck as early as possible. They are also taking, and closely examining, feedback about who wants to use Vulkan or OpenGL, and why.
As for Vulkan, confirmed platforms have been announced. Vendors have committed to drivers on Windows 7, 8, and 10, on Linux (including Steam OS), and on Tizen (OSX and iOS are absent, though). Beyond all of that, Google will accept Vulkan on Android. This is a big deal, as Google, despite its open nature, has been avoiding several Khronos Group standards. For instance, Nexus phones and tablets do not have OpenCL drivers, although Google isn't stopping third parties, like Samsung and NVIDIA, from rolling it into their devices. Direct support of Vulkan should help cross-platform development and, more importantly, target the multi-core processors with relatively slow single-threaded performance found in those devices. This could even be of significant use for web browsers, especially on sites with a lot of simple 2D effects. Google is also contributing support from its drawElements Quality Program (dEQP), a conformance test suite that it bought back in 2014. Google is going to expand it to Vulkan, so that developers will have more consistency between devices -- a big win for Android.
While we're not done with Vulkan, one of the biggest announcements is OpenGL ES 3.2, and it fits here nicely. Around the time that OpenGL ES 3.1 brought Compute Shaders to the embedded platform, Google launched the Android Extension Pack (AEP). This absorbed OpenGL ES 3.1 and added Tessellation, Geometry Shaders, and ASTC texture compression to it. It also created more tension between Google and cross-platform developers, who felt like Google was trying to pull developers away from the Khronos Group. Today, OpenGL ES 3.2 was announced and includes each of the AEP features, plus a few more (like “enhanced” blending). Better yet, Google will support it directly.
Next up are the desktop standards, before we finish with a resurrected embedded standard.
OpenGL has a few new extensions added. One interesting one is the ability to assign locations to multi-samples within a pixel. There is a whole list of sub-pixel layouts, such as rotated grid and Poisson disc. Apparently this extension allows developers to choose it, as certain algorithms work better or worse for certain geometries and structures. There were probably vendor-specific extensions for a while, but now it's a ratified one. Another extension allows “streamlined sparse textures”, which helps manage data where the number of unpopulated entries outweighs the number of populated ones.
OpenCL 2.0 was given a refresh, too. It contains a few bug fixes and clarifications that should help adoption. C++ headers were also released, although I cannot comment much on them; I do not know the state that OpenCL 2.0 was in before now.
And this is when we make our way back to Vulkan.
SPIR-V, the code that runs on the GPU (or other offloading device, including the other cores of a CPU) in OpenCL and Vulkan, is seeing a lot of community support. Projects are under way to allow developers to write GPU code in several interesting languages: Python, .NET (C#), Rust, Haskell, and many more. The slide lists nine that the Khronos Group knows about, but those four are pretty interesting. Again, this means you can write code in the aforementioned languages and have it run directly on a GPU. Curiously missing is HLSL, and the President of the Khronos Group agreed that it would be a useful language. The ability to cross-compile HLSL into SPIR-V would mean that shader code written for DirectX 9, 10, 11, and 12 could be compiled for Vulkan. He expects that it won't take long for such a project to start, and one might already be happening somewhere beyond the reach of his Google searches. Regardless, those who are afraid to program in the C-like GLSL and HLSL shading languages might find C# and Python to be a bit more their speed, and that seems to be happening through SPIR-V.
As mentioned, we'll end on something completely different.
For several years, OpenGL SC has been on hiatus. This group defines standards for graphics (and soon GPU compute) in “safety critical” applications. For the longest time, this meant aircraft. The dozens of planes (which I assume means dozens of models of planes) that adopted this technology were fine with a fixed-function pipeline. It has been about ten years since OpenGL SC 1.0 launched, which was based on OpenGL ES 1.0. SC 2.0 is planned to launch in 2016 and will be based on the much more modern OpenGL ES 2 and ES 3 APIs, which allow pixel and vertex shaders. The Khronos Group is asking for participation to direct SC 2.0, as well as a future graphics and compute API that is potentially based on Vulkan.
The devices that this platform intends to target are: aircraft (again), automobiles, drones, and robots. There are a lot of ways that GPUs can help these devices, but they need a good API to certify against. It needs to withstand more than an Ouya, because crashes could be much more literal.
Going Beyond the Reference GTX 970
Zotac has been an interesting company to watch for the past few years. It has made a name for itself in the small form factor community with some really interesting designs and products. It continues down that path, but it has increasingly focused on high quality graphics cards that address a pretty wide market. Zotac provides unique products from the $40 level up through the latest GTX 980 Ti with hybrid water and air cooling for $770. The company used to focus on reference designs, but some years back it widened its appeal by applying its own design decisions to the latest NVIDIA products.
Catchy looking boxes for people who mostly order online! Still, nice design.
The beginning of this year saw Zotac introduce their latest “Core” brand products that aim to provide high end features to more modestly priced parts. The Core series makes some compromises to hit price points that are more desirable for a larger swath of consumers. The cards often rely on more reference style PCBs with good quality components and advanced cooling solutions. This equation has been used before, but Zotac is treading some new ground by offering very highly clocked cards right out of the box.
Overall Zotac has a very positive reputation in the industry for quality and support.
Plenty of padding in the box to protect your latest investment.
Zotac GTX 970 AMP! Extreme Core Edition
The product we are looking at today is the somewhat long-named AMP! Extreme Core Edition. It is based on the NVIDIA GTX 970 chip, which features 56 ROPs, 1.75 MB of L2 cache, and 1664 CUDA cores. The GTX 970 has of course been scrutinized heavily due to the unique nature of its memory subsystem. While it does physically have a 256-bit bus, the last 512 MB (out of 4GB) is addressed by a significantly slower unit due to shared memory controller capacity. In theory, the reference design supports up to 224 GB/sec of memory bandwidth. There are obviously some very unhappy people out there about this situation, but much of it could have been avoided if NVIDIA had disclosed the exact nature of the GTX 970 configuration.
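That 224 GB/sec figure, and the slower tail of memory, can be sanity-checked with simple arithmetic. The per-segment numbers below are the commonly cited community figures, not official NVIDIA specifications:

```python
# GDDR5 on the GTX 970 runs at 7 Gbps effective per pin.
bus_width_bits = 256
effective_rate_gbps = 7

# Theoretical peak: bus width (in bytes) times per-pin data rate.
full_bandwidth = bus_width_bits * effective_rate_gbps / 8   # GB/s
print(full_bandwidth)   # 224.0

# The 3.5 GB fast segment is served by 7 of the 8 32-bit channels;
# the final 512 MB hangs off the remaining one.
fast_segment = full_bandwidth * 7 / 8    # GB/s for the first 3.5 GB
slow_segment = full_bandwidth * 1 / 8    # GB/s for the last 512 MB
print(fast_segment, slow_segment)        # 196.0 28.0
```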
Subject: Graphics Cards | August 7, 2015 - 10:46 AM | Ryan Shrout
Tagged: sdk, Oculus, nvidia, direct driver mode, amd
In an email sent out by Oculus this morning, the company has revealed some interesting details about the upcoming release of the Oculus SDK 0.7 on August 20th. The most interesting change is the introduction of Direct Driver Mode, developed in tandem with both AMD and NVIDIA.
This new version of the SDK will remove the simplistic "Extended Mode" that many users and developers used as a quick and dirty way of getting the Rift development kits up and running. However, that implementation had the downside of additional latency, something that Oculus is trying to eliminate completely.
Here is what Oculus wrote about the "Direct Driver Mode" in its email to developers:
Direct Driver Mode is the most robust and reliable solution for interfacing with the Rift to date. Rather than inserting VR functionality between the OS and the graphics driver, headset awareness is added directly to the driver. As a result, Direct Driver Mode avoids many of the latency challenges of Extended Mode and also significantly reduces the number of conflicts between the Oculus SDK and third party applications. Note that Direct Driver Mode requires new drivers from NVIDIA and AMD, particularly for Kepler (GTX 645 or better) and GCN (HD 7730 or better) architectures, respectively.
We have heard NVIDIA and AMD talk about the benefits of direct driver implementations for VR headsets for a long time. NVIDIA calls its software implementation GameWorks VR and AMD calls its software support LiquidVR. Both aim to do the same thing: give the developer more direct access to the headset hardware while offering new ways for faster, lower latency rendering in games.
Both companies have unique features to offer as well, including NVIDIA and its multi-res shading technology. Check out our interview with NVIDIA on the topic below:
NVIDIA's Tom Petersen came to our offices to talk about GameWorks VR
Other notes in the email include a tentative November release for the 1.0 version of the Oculus SDK. Until that version ships, Oculus is only guaranteeing that each new runtime will support the current and previous versions of the SDK. So, when SDK 0.8 is released, the runtime will only guarantee support for 0.8 and 0.7. When 0.9 comes out, game developers will need to make sure they are at least on SDK 0.8, or they risk incompatibility. Things will be tough for developers in this short window of time, but Oculus claims it's necessary to "allow them to more rapidly evolve the software architecture and API." After SDK 1.0 hits, future SDK releases will continue to support 1.0.
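The pre-1.0 support policy boils down to "each runtime supports the current and previous SDK." As a sketch of that rule (the version numbers come from the email; the function itself is our illustration, not Oculus code):

```python
def runtime_supports(runtime_minor, sdk_minor):
    """Pre-1.0 policy: a 0.x runtime supports SDK 0.x and 0.(x-1)."""
    return sdk_minor in (runtime_minor, runtime_minor - 1)

# The 0.8 runtime keeps 0.7 titles working...
assert runtime_supports(8, 7)
assert runtime_supports(8, 8)
# ...but once the 0.9 runtime ships, a title still on SDK 0.7
# falls outside the guarantee.
assert not runtime_supports(9, 7)
print("policy check passed")
```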
Bioshock Infinite Results
Our Intel Skylake launch coverage is intense! Make sure you hit up all the stories and videos that are interesting for you!
- The Intel Core i7-6700K Review - Skylake First for Enthusiasts (Video)
- Skylake vs. Sandy Bridge: Discrete GPU Showdown (Video)
- ASUS Z170-A Motherboard Preview
- Intel Skylake / Z170 Rapid Storage Technology Tested - PCIe and SATA RAID
Today marks the release of Intel's newest CPU architecture, code-named Skylake. I already posted my full review of the Core i7-6700K processor, so if you are looking for CPU performance and specification details on that part, you should start there. What we are looking at in this story is the answer to a very simple, but also very important question:
Is it time for gamers using Sandy Bridge systems to finally bite the bullet and upgrade?
I think you'll find that answer will depend on a few things, including your gaming resolution and aptitude for multi-GPU configuration, but even I was surprised by the differences I saw in testing.
Our testing scenario was quite simple. Compare the gaming performance of an Intel Core i7-6700K processor and Z170 motherboard running both a single GTX 980 and a pair of GTX 980s in SLI against an Intel Core i7-2600K and Z77 motherboard using the same GPUs. I installed both the latest NVIDIA GeForce drivers and the latest Intel system drivers for each platform.
| | Skylake System | Sandy Bridge System |
| --- | --- | --- |
| Processor | Intel Core i7-6700K | Intel Core i7-2600K |
| Motherboard | ASUS Z170-Deluxe | Gigabyte Z68-UD3H B3 |
| Memory | 16GB DDR4-2133 | 8GB DDR3-1600 |
| Graphics Card | 1x GeForce GTX 980, 2x GeForce GTX 980 (SLI) | 1x GeForce GTX 980, 2x GeForce GTX 980 (SLI) |
| OS | Windows 8.1 | Windows 8.1 |
Our testing methodology follows our Frame Rating system, which uses a capture-based system to measure frame times at the screen (rather than trusting the software's interpretation).
If you aren't familiar with it, you should probably do a little research into our testing methodology, as it is quite different from others you may see online. Rather than using FRAPS to measure frame rates or frame times, we use a secondary PC to capture the output from the tested graphics card directly, and then use post-processing on the resulting video to determine frame rates, frame times, frame variance, and much more.
This amount of data can be pretty confusing if you are attempting to read it without the proper background, but I strongly believe that the results we present paint a much more thorough picture of performance than other options. So please, read up on the full discussion about our Frame Rating methods before moving forward!
While there are literally dozens of files created for each “run” of benchmarks, there are several resulting graphs that FCAT produces, as well as several more that we generate with additional code of our own.
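As a rough illustration of the kind of numbers that post-processing produces (a simplified sketch of ours, not the actual FCAT or Frame Rating code), per-frame display times can be reduced to an average frame rate and a stutter metric:

```python
# Reduce a list of per-frame display times (in milliseconds) to an
# average frame rate and a simple stutter metric: the mean
# frame-to-frame delta, where big deltas mean visible stutter.
from statistics import mean

def frame_stats(frame_times_ms):
    avg_fps = 1000.0 / mean(frame_times_ms)
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    return avg_fps, mean(deltas)

# A steady 60 FPS run vs. one with the same average but heavy stutter.
steady = [16.7] * 6
stuttery = [8.35, 25.05] * 3
print(frame_stats(steady))    # ~59.9 FPS, zero frame-to-frame delta
print(frame_stats(stuttery))  # same average FPS, but large deltas
```

This is exactly why average FPS alone can mislead: both runs above report the same frame rate, yet only the first would feel smooth on screen.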
If you need some more background on how we evaluate gaming performance on PCs, just check out my most recent GPU review for a full breakdown.
I only had time to test four different PC titles:
- Bioshock Infinite
- Grand Theft Auto V
- GRID 2
- Metro: Last Light
Subject: Graphics Cards | August 4, 2015 - 06:06 PM | Jeremy Hellstrom
Tagged: 980 Ti, asus, msi, gigabyte, evga, GTX 980 Ti G1 GAMING, GTX 980 Ti STRIX OC, GTX 980 Ti gaming 6g
If you've decided that the GTX 980 Ti is the card for you due to price, performance or other less tangible reasons, you will find that there are quite a few to choose from. Each has the same basic design, but the coolers and frequencies vary between manufacturers, as do the prices. That is why it is handy that The Tech Report have put together a round up of four models for a direct comparison. In the article you will see the EVGA GeForce GTX 980 Ti SC+, Gigabyte's GTX 980 Ti G1 Gaming, the MSI GTX 980 Ti Gaming 6G, and the ASUS Strix GTX 980 Ti OC Edition. The cards are not only checked for stock and overclocked performance; there are also noise levels and power consumption to think about, so check out the full review.
"The GeForce GTX 980 Ti is pretty much the fastest GPU you can buy.The aftermarket cards offer higher clocks and better cooling than Nvidia's reference design. But which one is right for you?"
Here are some more Graphics Card articles from around the web:
- MSI GTX 980 Ti Gaming 6G Review @ Hardware Canucks
- Palit GTX 980 Ti Super JetStream 6 GB @ techPowerUp
- MSI GeForce GTX 960 GAMING 4G @ [H]ard|OCP
- Maxwell Hits The Workstation: NVIDIA Quadro M6000 Graphics Card Review @ Techgage
- NVIDIA's Tegra X1 Delivers Stunning Performance On Ubuntu Linux @ Phoronix
- The AMD Radeon R9 Fury Is Currently A Disaster On Linux @ Phoronix
- Sapphire Nitro R9 390 8G D5 Review, Playing With Nitro @ Bjorn3d
Subject: Graphics Cards | August 1, 2015 - 07:31 AM | Scott Michaud
Tagged: nvidia, maxwell, gtx 960, gtx 950 ti, gtx 950
A couple of sites are claiming that NVIDIA intends to replace the first-generation GeForce GTX 750 Ti with more Maxwell, in the form of the GeForce GTX 950 and/or GTX 950 Ti. The general consensus is that it will run on a cut-down GM206 chip, which is currently found in the GTX 960. I will go light on the rumored specifications because this part of the rumor is single-source, from accounts of a HWBattle page that has been deleted. But for a general ballpark of performance, the GTX 960 has a full GM206 chip while the 950(/Ti) is expected to lose about a quarter of its printed shader units.
The particularly interesting part is the power, though. As we reported, Maxwell was branded as a power-efficient version of the Kepler architecture. This led to a high-end graphics card that could be powered by the PCIe bus alone. According to these rumors, the new card will require a single 8-pin power connector on top of the 75W provided by the bus. This has one of two interesting implications that I can think of:
- The 750 Ti did not sell for existing systems as well as anticipated, or
- The GM206 chip just couldn't hit that power target and they didn't want to make another die
Whichever is true, it will be interesting to see how NVIDIA brands this if/when the card launches. Creating a graphics card for systems without available power rails was a novel concept and it seemed to draw attention. That said, the rumors claim they're not doing it this time... for some reason.
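For context on why that connector is notable, the rumored power budget is easy to put in numbers. The 150W figure below is the PCIe specification's limit for a single 8-pin plug, not a confirmed board power for this card:

```python
# PCIe power budget: slot power plus one 8-pin connector.
slot_watts = 75          # PCIe x16 slot limit
eight_pin_watts = 150    # PCIe spec limit for one 8-pin connector

ceiling = slot_watts + eight_pin_watts
print(ceiling)           # 225 W theoretically available to the rumored card

# The GTX 750 Ti it replaces drew everything from the slot alone,
# so its ceiling was just the 75 W slot limit.
```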
A few years ago, we took our first look at the inexpensive 27" 1440p monitors which were starting to flood the market via eBay sellers located in Korea. These monitors proved to be immensely popular and were largely credited with moving a large number of gamers past 1080p.
However, in the past few months we have seen a new trend from some of these same Korean monitor manufacturers. Just like the Seiki Pro SM40UNP 40" 4K display that we took a look at a few weeks ago, the new trend is large 4K monitors.
Built around a 42-in LG AH-IPS panel, the Wasabi Mango UHD420 is an impressive display. Inclusion of HDMI 2.0 and DisplayPort 1.2 allow you to achieve 4K at a full 60Hz and 4:4:4 color gamut. At a cost of just under $800 on Amazon, this is an incredibly appealing value.
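It's easy to sanity-check that those ports can carry 4K at 60Hz with full 4:4:4 color. The quick math below counts only active pixel data and ignores blanking intervals, so real link requirements are somewhat higher:

```python
# Raw pixel data for 3840x2160 at 60 Hz, 8 bits per channel,
# 4:4:4 chroma (3 full channels = 24 bits per pixel).
width, height, refresh, bits_per_pixel = 3840, 2160, 60, 24

gbps = width * height * refresh * bits_per_pixel / 1e9
print(round(gbps, 2))   # 11.94 Gbps of active pixel data

# HDMI 2.0 carries about 14.4 Gbps of data (18 Gbps raw, minus 8b/10b
# encoding overhead); DisplayPort 1.2 about 17.28 Gbps. Both clear the bar,
# which is why HDMI 1.4-only displays were stuck at 4K30.
assert gbps < 14.4
```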
Whether the UHD420 is a TV or a monitor is actually quite the tossup. The lack of a tuner might initially lead you to believe it's not a TV. The inclusion of a DisplayPort connector and a USB 3.0 hub might make you believe it's a monitor, but it's bundled with a remote control (entirely in Korean). In reality, this display could be used for either use case (unless you need OTA tuning), and it really starts to blur the line between a "dumb" TV and a monitor. You'll also find VESA 400x400mm mounting holes on this display for easy wall mounting.
Subject: Graphics Cards | July 27, 2015 - 04:33 PM | Jeremy Hellstrom
Tagged: 4k, amd, R9 FuryX, GTX 980 Ti, gtx titan x
[H]ard|OCP have set up their testbed for a 4K showdown between the similarly priced GTX 980 Ti and Radeon R9 Fury X, with the $1000 TITAN X tossed in there for those with more money than sense. The test uses the new Catalyst 15.7 and GeForce 353.30 drivers to give a more even playing field while benchmarking Witcher 3, GTA V, and other games. When the dust settled the pattern was obvious and the performance differences could be seen. The deltas were not huge, but when you are paying $650 plus tax for a GPU, even a few extra frames or one more usable graphical option really matters. Perhaps the most interesting result was the redemption of the TITAN X: its extra price was reflected in the performance results. Check them out for yourself here.
"We take the new AMD Radeon R9 Fury X and evaluate the 4K gaming experience. We will also compare against the price competitive GeForce GTX 980 Ti as well as a GeForce GTX TITAN X. Which video card provides the best experience and performance when gaming at glorious 4K resolution?"
Here are some more Graphics Card articles from around the web:
- PowerColor PCS+ R9 380 4GB: The Affordable 4GB Solution @ Bjorn3D
- AMD Fury X "Fiji" Voltage Scaling @ techPowerUp
- HIS Radeon R9 390 IceQ X2 OC 8GB Video Card Review @ Madshrimps
- XFX R9 380 Double Dissipation 4GB @ [H]ard|OCP
- The New AMD GPU Open-Source Driver On Linux 4.2 Works, But Still A Lot Of Work Ahead @ Phoronix
- MSI Radeon R7 370 GAMING 4G @ Phoronix
- 15-Way AMD/NVIDIA Graphics Card Comparison For 4K Linux Gaming @ Phoronix
- PNY GTX980 Ti XLR8 OC @ Kitguru
- ASUS GTX 980 Ti STRIX Gaming 6 GB @ techPowerUp
- PNY GTX 960 XLR8 Review @ OCC
- GIGABYTE GeForce GTX 970 WindForce 3X OC 4GB Graphics Card Review @ NikKTech
- Inno3D iChill GTX 980 Ti HerculeZ X3 Air Boss Ultra @ HardwareOverclock