Author:
Manufacturer: Various

Early testing for higher end GPUs

UPDATE 2/5/16: Nixxes released a new version of Rise of the Tomb Raider today with some significant changes. I have added another page at the end of this story that looks at results with the new version of the game, a new AMD driver and I've also included some SLI and CrossFire results.

I will fully admit to being jaded by the industry on many occasions. I love my PC games and I love hardware but it takes a lot for me to get genuinely excited about anything. After hearing game reviewers talk up the newest installment of the Tomb Raider franchise, Rise of the Tomb Raider, since its release on the Xbox One last year, I've been waiting for its PC release to give it a shot with real hardware. As you'll see in the screenshots and video in this story, the game doesn't appear to disappoint.

rotr-screen1.jpg

Rise of the Tomb Raider takes the exploration and "tomb raiding" aspects that made the first games in the series successful and applies them to the visual quality and character design brought in with the reboot of the series a couple years back. The result is a PC game that looks stunning at any resolution, but even more so in 4K, that pushes your hardware to its limits. For single GPU performance, even the GTX 980 Ti and Fury X struggle to keep their heads above water.

In this short article we'll look at the performance of Rise of the Tomb Raider with a handful of GPUs, leaning towards the high end of the product stack, and offer up my view on whether each hardware vendor is living up to expectations.

Continue reading our look at GPU performance in Rise of the Tomb Raider!!

Manufacturer: PC Perspective
Tagged: moores law, gpu, cpu

Are Computers Still Getting Faster?

It looks like CES is starting to wind down, which makes sense because it ended three days ago. Now that we're mostly caught up, I found a new video from The 8-Bit Guy. He doesn't really explain any old technologies in this one. Instead, he poses an open question about computer speed. He was able to have a functional computing experience on a ten-year-old Apple laptop, which made him wonder if the rate of computer advancement is slowing down.

I believe that he (and his guest hosts) made great points, but also missed a few important ones.

One of his main arguments is that software seems to have slowed down relative to hardware. I don't believe that is true, but I believe it's looking in the right area. PCs these days are more than capable of doing just about anything in terms of 2D user interface that we would want to, and do so with a lot of overhead for inefficient platforms and sub-optimal programming (relative to the 80's and 90's at the very least). The areas that require extra horsepower are usually doing large batches of many related tasks. GPUs are key in this area, and they are keeping up as fast as they can, despite some stagnation with fabrication processes and a difficulty (at least before HBM takes hold) in keeping up with memory bandwidth.

For the last five to ten years or so, CPUs have been evolving toward efficiency as GPUs are being adopted for the tasks that need to scale up. I'm guessing that AMD, when they designed the Bulldozer architecture, hoped that GPUs would have been adopted much more aggressively, but even as graphics devices, they now have a huge effect on Web, UI, and media applications.

google-android-opengl-es-extensions.jpg

These are also tasks that can scale well between devices by lowering resolution (and so forth). The primary thing that a main CPU thread needs to do is figure out the system's state and keep the graphics card fed before the frame-train leaves the station. In my experience, that doesn't scale well (although you can sometimes reduce the amount of tracked objects for games and so forth). Moreover, it is easier to add GPU performance, compared to single-threaded CPU, because increasing frequency and single-threaded IPC should be more complicated than planning out more, duplicated blocks of shaders. These factors combine to give lower-end hardware a similar experience in the most noticeable areas.

So, up to this point, we discussed:

  • Software is often scaling in ways that are GPU (and RAM) limited.
  • CPUs are scaling down in power more than up in performance.
  • GPU-limited tasks can often be approximated with smaller workloads.
    • Software gets heavier, but it doesn't need to be "all the way up" (ex: resolution).
    • Some latencies are hard to notice anyway.
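The scaling argument above can be illustrated with a toy model. It assumes, purely for illustration, that a frame takes as long as the slower of the CPU and GPU stages, that GPU time scales with the number of pixels rendered, and that CPU time does not depend on resolution. The function name and the millisecond figures are hypothetical, not measurements.

```python
def frame_time_ms(cpu_ms, gpu_ms_at_full_res, resolution_scale):
    """Toy model: the frame is ready when the slower stage finishes.

    resolution_scale is the fraction of full-resolution pixels rendered
    (1.0 = full resolution). Only the GPU cost shrinks with it.
    """
    gpu_ms = gpu_ms_at_full_res * resolution_scale
    return max(cpu_ms, gpu_ms)


# Hypothetical numbers: a CPU stage of 10 ms and a GPU stage of 40 ms.
full = frame_time_ms(cpu_ms=10.0, gpu_ms_at_full_res=40.0, resolution_scale=1.0)
half = frame_time_ms(cpu_ms=10.0, gpu_ms_at_full_res=40.0, resolution_scale=0.5)
tiny = frame_time_ms(cpu_ms=10.0, gpu_ms_at_full_res=40.0, resolution_scale=0.1)

print(f"full resolution:       {full:.1f} ms per frame")
print(f"half the pixels:       {half:.1f} ms per frame")
print(f"a tenth of the pixels: {tiny:.1f} ms per frame (CPU-bound)")
```

In this sketch, lowering the resolution helps until the CPU stage becomes the limit, which is the point where a slower single thread starts to show.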

Back to the Original Question

This is where “Are computers still getting faster?” can be open to interpretation.

intel-devilscanyon-overview.JPG

Tasks are diverging from one class of processor into two, and both have separate industries, each with its own set of goals. As stated, CPUs are mostly progressing in power efficiency, which extends a sufficient (or at least assumed to be sufficient) amount of performance downward to multiple types of devices. GPUs are definitely getting faster, but they can't do everything. At the same time, RAM is plentiful but its contribution to performance can be approximated by paging unused chunks to the hard disk or, more recently on Windows, compressing them in-place. Newer computers with extra RAM won't help as long as any single task only uses a manageable amount of it -- unless it's seen from a viewpoint that cares about multi-tasking.

In short, computers are still progressing, but the paths are now forked and winding.

Author:
Manufacturer: AMD

AMD Polaris Architecture Coming Mid-2016

In early December, I was able to spend some time with members of the newly formed Radeon Technologies Group (RTG), which is a revitalized and compartmentalized section of AMD that is taking over all graphics work. During those meetings, I was able to learn quite a bit about the plans for RTG going forward, including changes for AMD FreeSync and implementation of HDR display technology, and their plans for the GPUOpen open-sourced game development platform.  Perhaps most intriguing of all: we received some information about the next-generation GPU architecture, targeted for 2016.

Codenamed Polaris, this new architecture will be the 4th generation of GCN (Graphics Core Next), and it will be the first AMD GPU that is built on FinFET process technology. These two changes combined promise to offer the biggest improvement in performance per watt, generation to generation, in AMD’s history.

polaris-5.jpg

Though the amount of information provided about the Polaris architecture is light, RTG does promise some changes to the 4th iteration of its GCN design. Those include primitive discard acceleration, an improved hardware scheduler, better pre-fetch, increased shader efficiency, and stronger memory compression. We have already discussed in a previous story that the new GPUs will include HDMI 2.0a and DisplayPort 1.3 display interfaces, which offer some impressive new features and bandwidth. From a multimedia perspective, Polaris will be the first GPU to include support for h.265 4K decode and encode acceleration.

polaris-15.jpg

This slide shows quite a few changes coming to Polaris, most of which were never discussed in enough detail for us to report. Geometry processing and the memory controller stand out as potentially interesting to me – AMD’s Fiji design continues to lag behind NVIDIA’s Maxwell in terms of tessellation performance and we would love to see that shift. I am also very curious to see how the memory controller is configured on the entire Polaris lineup of GPUs – we saw the introduction of HBM (high bandwidth memory) with the Fury line of cards.

Continue reading our overview of the AMD Polaris announcement!!

Author:
Manufacturer: AMD

May the Radeon be with You

In celebration of the release of The Force Awakens as well as the new Star Wars Battlefront game from DICE and EA, AMD sent over some hardware for us to use in a system build, targeted at getting users up and running in Battlefront with impressive quality and performance, but still on a reasonable budget. Pairing an AMD processor, MSI motherboard and Sapphire GPU with a low cost chassis, SSD and more, the combined system, including a FreeSync monitor, comes in at around $1,200.

swbf.jpg

Holiday breaks are MADE for Star Wars Battlefront

Though the holiday is already here and you'd be hard pressed to build this system in time for it, I have a feeling that quite a few of our readers and viewers will find themselves with some cash and gift certificates in hand, just ITCHING for a place to invest in a new gaming PC.

The video above includes a list of components, the build process (in brief) and shows us getting our gaming on with Star Wars Battlefront. Interested in building a system similar to the one above on your own? Here's the hardware breakdown.

  AMD Powered Star Wars Battlefront System
Component       Part                                        Price
Processor       AMD FX-8370                                 $197
Cooler          Cooler Master Hyper 212 EVO                 $29
Motherboard     MSI 990FXA Gaming                           $137
Memory          AMD Radeon Memory DDR3-2400                 $79
Graphics Card   Sapphire NITRO Radeon R9 380X               $266
Storage         SanDisk Ultra II 240GB SSD                  $79
Case            Corsair Carbide 300R                        $68
Power Supply    Seasonic 600 watt 80 Plus                   $69
Monitor         AOC G2460PF 1920x1080 144Hz FreeSync        $259
Total Price     Full System (without monitor) - Amazon.com  $924

For under $1,000, plus another $250 or so for the AOC FreeSync capable 1080p monitor, you can have a complete gaming rig for your winter break. Let's detail some of the specific components.
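For anyone swapping parts in or out, the totals are easy to re-run. The short script below uses only the prices listed above; the dictionary keys are just labels chosen for this sketch.

```python
# Component prices as listed in the hardware breakdown (US dollars).
parts = {
    "Processor": 197,
    "Cooler": 29,
    "Motherboard": 137,
    "Memory": 79,
    "Graphics Card": 266,
    "Storage": 79,
    "Case": 68,
    "Power Supply": 69,
}
monitor = 259

system_total = sum(parts.values())
with_monitor = system_total + monitor

print(f"System without monitor: ${system_total}")
print(f"System with monitor:    ${with_monitor}")
```

Change any single price and both totals update, which makes it simple to see how far a different GPU or SSD moves the build from the $1,200 target.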

cpu.jpg

AMD sent over the FX-8370 processor for our build, a 4-module / 8-core CPU that runs at 4.0 GHz, more than capable of handling any gaming workload you can toss at it. And if you need to do some transcoding, video work or, heaven forbid, school or productivity work, the FX-8370 has you covered there too.

cooler.jpg

For the motherboard AMD sent over the MSI 990FXA Gaming board, one of the newer AMD platforms that includes support for USB 3.1 so you'll have a good length of usability for future expansion. The Cooler Master Hyper 212 EVO cooler was our selection to keep the FX-8370 running smoothly and 8GB of AMD Radeon DDR3-2133 memory is enough for the system to keep applications and the Windows 10 operating system happy.

Continue reading about our AMD system build for Star Wars Battlefront!!

Author:
Manufacturer: AMD

Open Source your GPU!

As part of AMD’s recent RTG (Radeon Technologies Group) Summit in Sonoma, the company released information about a new initiative to help drive development and evolution in the world of gaming called GPUOpen.  As the name implies, the idea is to apply an open source mentality to drivers, libraries, SDKs and more to improve the relationship between AMD’s hardware and the gaming development ecosystem.

gpuopen-5.jpg

When the current generation of consoles was first announced, AMD was riding a wave of positive PR that it hadn’t felt in many years. Because AMD Radeon hardware was at the root of the PlayStation 4 and the Xbox One, game developers would become much more adept at programming for AMD’s GCN architecture and that would waterfall down to PC gamers. At least, that was the plan. In practice though I think you’d be hard pressed to find any analyst to put their name on a statement claiming that proclamation from AMD actually transpired. It just hasn’t happened – but that does not mean that it still can’t if all the pieces fall into place.

gpuopen-7.jpg

The issue that AMD, NVIDIA, and game developers have to work around is a divided development ecosystem. While on the console side programmers tend to have very close to the metal access on CPU and GPU hardware, that hasn’t been the case with PCs until very recently. AMD was the first to make moves in this area with the Mantle API but now we have DirectX 12, a competing low level API, that will have much wider reach than Mantle or Vulkan (what Mantle has become).

AMD also believes, as do many developers, that a “black box” development environment for tools and effects packages is having a negative effect on the PC gaming ecosystem. The black box mentality means that developers don’t have access to the source code of some packages and thus cannot tweak performance and features to their liking.

Continue reading our overview of the new GPUOpen initiative from the Radeon Technologies Group!!

Author:
Manufacturer: AMD

What RTG has planned for 2016

Last week the Radeon Technologies Group invited a handful of press and analysts to a secluded location in Sonoma, CA to discuss the future of graphics, GPUs and of course Radeon. For those of you that seem a bit confused, the RTG (Radeon Technologies Group) was spun up inside AMD to encompass all of the graphics products and IP inside the company. Though today’s story is not going to focus on the fundamental changes that RTG brings to the future of AMD, I will note, without commentary, that we saw not a single AMD logo in our presentations or in the signage present throughout the week.

Much of what I learned during the RTG Summit in Sonoma is under NDA and will likely be so for some time. We learned about the future architectures, direction and product theories that will find their way into a range of solutions available in 2016 and 2017.

What I can discuss today is a pair of features that are being updated and improved for current generation graphics cards and for Radeon GPUs coming in 2016: FreeSync and HDR displays. The former is one that readers of PC Perspective should be very familiar with while the latter will offer a new window into content coming in late 2016.

High Dynamic Range Displays: Better Pixels

In just the last couple of years we have seen a spike in resolution for mobile, desktop and notebook displays. We now regularly have 4K monitors on sale for around $500 and very good quality 4K panels going for something in the $1000 range. Couple that with the increase in market share of 21:9 panels with 3440x1440 resolutions and clearly there is a demand from consumers for a better visual experience on their PCs.

rtg1-8.jpg

But what if the answer isn’t just more pixels, but better pixels? We already have this discussion weekly when comparing render resolutions in games of 4K at lower image quality settings versus 2560x1440 at maximum IQ settings (for example) but the truth is that panel technology has the ability to make a dramatic change to how we view all content – games, movies, productivity – with the introduction of HDR, high dynamic range.
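To put the "more pixels" side of that trade-off in numbers, here is a quick pixel-count comparison. It assumes "4K" means 3840x2160, the usual PC monitor resolution; the other two resolutions are the ones quoted in this story.

```python
def pixels(width, height):
    """Total pixel count for a given resolution."""
    return width * height


uhd = pixels(3840, 2160)        # "4K", assumed to mean 3840x2160
qhd = pixels(2560, 1440)
ultrawide = pixels(3440, 1440)  # the 21:9 panels mentioned earlier

print(f"3840x2160: {uhd:,} pixels")
print(f"3440x1440: {ultrawide:,} pixels")
print(f"2560x1440: {qhd:,} pixels")
print(f"4K renders {uhd / qhd:.2f}x the pixels of 2560x1440")
```

That ratio is the rendering budget a game gets back by dropping from 4K to 2560x1440, which is why the lower resolution can afford maximum IQ settings.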

rtg1-10.jpg

As the slide above demonstrates there is a wide range of luminance in the real world that our eyes can see. Sunlight crosses the 1.6 billion nits mark while basic fluorescent lighting in our homes and offices exceeds 10,000 nits. Compare to the most modern PC displays that range from 0.1 nits to 250 nits and you can already tell where the discussion is heading. Even the best LCD TVs on the market today have a range of 0.1 to 400 nits.
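The display figures above can be expressed as a contrast ratio and as a number of photographic stops (doublings of luminance). This is a minimal sketch using only the nit ranges quoted in the paragraph; the function name is made up for illustration.

```python
import math


def dynamic_range(min_nits, max_nits):
    """Return (contrast ratio, range in stops) for a luminance span."""
    ratio = max_nits / min_nits
    stops = math.log2(ratio)  # one stop is a doubling of luminance
    return ratio, stops


displays = {
    "PC display (0.1 to 250 nits)": (0.1, 250.0),
    "LCD TV (0.1 to 400 nits)": (0.1, 400.0),
}
for name, (low, high) in displays.items():
    ratio, stops = dynamic_range(low, high)
    print(f"{name}: about {ratio:,.0f}:1, {stops:.1f} stops")
```

Because the scale is logarithmic, going from 250 to 400 nits at the top end adds well under one stop, which is part of why HDR targets much higher peak brightness.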

Continue reading our overview of new FreeSync and HDR features for Radeon in 2016!!

Author:
Manufacturer: AMD

FreeSync and Frame Pacing Get a Boost

Make sure you catch today's live stream we are hosting with AMD to discuss much more about the new Radeon Software Crimson driver. We are giving away four Radeon graphics cards as well!! Find all the information right here.

Earlier this month AMD announced plans to end the life of the Catalyst Control Center application for control of your Radeon GPU, introducing a new brand simply called Radeon Software. The first iteration of this software, Crimson, is being released today and includes some impressive user experience changes that are really worth seeing and, well, experiencing.

Users will no doubt lament the age of the previous Catalyst Control Center; it was slow, clunky and difficult to navigate around. Radeon Software Crimson changes all of this with a new UI, a new backend that allows it to start up almost instantly, as well as a handful of new features that might be a surprise to some of our readers. Here's a quick rundown of what stands out to me:

  • Opens in less than a second in my testing
  • Completely redesigned and modern user interface
  • Faster display initialization
  • New clean install utility (separate download)
  • Per-game Overdrive (overclocking) settings
  • LiquidVR integration
  • FreeSync improvements at low frame rates
  • FreeSync planned for HDMI (though not implemented yet)
  • Frame pacing support in DX9 titles
  • New custom resolution support
  • Desktop-based Virtual Super Resolution
  • Directional scaling for 2K to 4K upscaling (Fiji GPUs only)
  • Shader cache (precompiled) to reduce compiling-induced frame time variance
  • Non-specific DX12 improvements
  • Flip queue size optimizations (frame buffer length) for specific games
  • Wider target range for Frame Rate Target Control

crimson-7.jpg

That's quite a list of new features, some of which will be more popular than others, but it looks like there should be something for everyone to love about the new Crimson software package from AMD.

For this story today I wanted to focus on two of the above features that have long been a sticking point for me, and see how well AMD has fixed them with the first release of Radeon Software.

FreeSync: Low Frame Rate Compensation

I might be slightly biased, but I don't think anyone has done a more thorough job of explaining and diving into the differences between AMD FreeSync and NVIDIA G-Sync than the team at PC Perspective. Since day one of the G-Sync variable refresh release we have been following the changes and capabilities of these competing features and writing about what really separates them from a technological point of view, not just pricing and perceived experiences. 

Continue reading our overview of new features in AMD Radeon Software Crimson!!

Author:
Manufacturer: AMD

Four High Powered Mini ITX Systems

Thanks to Sebastian for helping me out with some of the editorial for this piece and to Ken for doing the installation and testing on the system builds! -Ryan

Update (1/23/16): Now that the AMD Radeon R9 Nano is priced at just $499, it becomes an even better solution for these builds, dropping prices by $150 each.

While some might wonder where the new Radeon R9 Nano fits in a market that offers the AMD Fury X for the same price, the Nano is a product that defines a new category in the PC enthusiast community. It is a full-scale GPU on an impossibly small 6-inch PCB, containing the same core as the larger liquid-cooled Fury X, but requiring 100 watts less power than Fury X and cooled by a single-fan dual-slot air cooler.

The R9 Nano design screams compatibility. It has the ability to fit into virtually any enclosure (including many of the smallest mini-ITX designs), as long as the case supports a dual-slot (full height) GPU. The total board length of 6 inches is shorter than a mini-ITX motherboard, which is 6.7 inches square! Truly, the Nano has the potential to change everything when it comes to selecting a small form-factor (SFF) enclosure.

IMG_3232.jpg

Typically, a gaming-friendly enclosure would need at minimum a ~270 mm GPU clearance, as a standard 10.5-inch reference GPU translates into 266.7 mm in length. Even very small mini-ITX enclosures have had to position components specifically to allow for these longer cards – if they wanted to be marketed as compatible with a full-size GPU solution, of course. Now with the R9 Nano, smaller and more powerful than any previous ITX-specific graphics card to date, one of the first questions we had was a pretty basic one: what enclosure should we put this R9 Nano into?
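Since case makers list GPU clearance in millimeters while card lengths are usually quoted in inches, a small conversion helper is handy. The lengths below are the ones quoted in this article; the dictionary labels are only for this sketch.

```python
MM_PER_INCH = 25.4  # exact, by definition of the inch


def inches_to_mm(inches):
    """Convert a length in inches to millimeters."""
    return inches * MM_PER_INCH


lengths_in = {
    "R9 Nano PCB": 6.0,
    "mini-ITX motherboard edge": 6.7,
    "Fury X": 7.5,
    "reference full-length GPU": 10.5,
}
for name, inches in lengths_in.items():
    print(f"{name}: {inches} in = {inches_to_mm(inches):.1f} mm")
```

Compare the printed values against the GPU clearance figure on a case's spec sheet to see which cards will physically fit.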

With no shortage of enclosures at our disposal to try out a build with this new card, we quickly discovered that many of them shared a design choice: room for a full-length GPU. So, what’s the advantage of the Nano’s incredibly compact size? It must be pointed out that the larger (and faster) Fury X has the same MSRP, and at 7.5 inches the Fury X will fit comfortably in cases that have spacing for the necessary radiator.

Finding a Case for Nano

While even some of the tiniest mini-ITX enclosures (EVGA Hadron, NCASE M1, etc.) offer support for a 10.5-in GPU, there are several compact mini-ITX cases that don’t support a full-length graphics card due to their small footprint. While by no means a complete list, here are some of the options out there (note: there are many more mini-ITX cases that don’t support a full-height or dual-slot expansion card at all, such as slim HTPC enclosures):

Manufacturer    Model               Price
Cooler Master   Elite 110           $47.99, Amazon.com
Cooltek         Coolcube
Lian Li         PC-O5               $377, Amazon.com
Lian Li         PC-Q01              $59.99, Newegg.com
Lian Li         PC-Q03              $74.99, Newegg.com
Lian Li         PC-Q07              $71.98, Amazon.com
Lian Li         PC-Q21
Lian Li         PC-Q26
Lian Li         PC-Q27
Lian Li         PC-Q30              $139.99, Newegg.com
Lian Li         PC-Q33              $134.99, Newegg.com
Raijintek       Metis               $59.99, Newegg.com
Rosewill        Legacy V3 Plus-B    $59.99, Newegg.com

The list is dominated by Lian Li, who offers a number of cube-like mini-ITX enclosures that would ordinarily be out of the question for a gaming rig, unless one of the few ITX-specific cards were chosen for the build. Many other fine enclosure makers (Antec, BitFenix, Corsair, Fractal Design, SilverStone, etc.) offer mini-ITX enclosures that support full-length GPUs, as this has pretty much become a requirement for an enthusiast PC case.

Continue our look at building Mini ITX systems with the AMD Radeon R9 Nano!!

Author:
Manufacturer: MSI

Quick Look

Last month NVIDIA introduced the world to the GTX 980 in a new form factor for gaming notebooks. Using the same Maxwell GPU and the same performance levels, but with slightly tweaked power delivery and TDPs, notebooks powered by the GTX 980 promise to be a noticeable step faster than anything before them.

IMG_3474.JPG

Late last week I got my hands on the updated MSI GT72S Dominator Pro G, the first retail ready gaming notebook to not only integrate the new GTX 980 GPU but also an unlocked Skylake mobile processor. 

This machine is something to behold - though it looks very similar to previous GT72 versions, this machine hides hardware unlike anything we have been able to carry in a backpack before. And the sexy red exterior with the MSI Dragon Army logo emblazoned across the back definitely helps it stand out in a crowd. If you happen to be in a crowd of notebooks.

IMG_3475.JPG

A quick spin around the GT72S reveals a sizeable collection of hardware and connections. On the left you'll find a set of four USB 3.0 ports as well as four audio inputs and outputs and an SD card reader.

IMG_3476.JPG

On the opposite side there are two more USB 3.0 ports (totalling six) and the optical / Blu-ray burner. With that many USB 3.0 ports you should never struggle with accessories availability - headset, mouse, keyboard, hard drive and portable fan? Check.

Continue reading our preview of the new MSI GT72S Dominator Pro G with the NVIDIA GeForce GTX 980!!

Manufacturer: NVIDIA

GPU Enthusiasts Are Throwing a FET

NVIDIA is rumored to launch Pascal in early (~April-ish) 2016, although some are skeptical that it will even appear before the summer. The design was finalized months ago, and unconfirmed shipping information claims that chips are being stockpiled, which is typical when preparing to launch a product. It is expected to compete against AMD's rumored Arctic Islands architecture, which will, according to its also rumored numbers, be very similar to Pascal.

This architecture is a big one for several reasons.

nvidia-2015-pascal-zoomed.jpg

Image Credit: WCCFTech

First, it will jump two full process nodes. Current desktop GPUs are manufactured at 28nm, which was first introduced with the GeForce GTX 680 all the way back in early 2012, but Pascal will be manufactured on TSMC's 16nm FinFET+ technology. Smaller features have several advantages, but a huge one for GPUs is the ability to fit more complex circuitry in the same die area. This means that you can include more copies of elements, such as shader cores, and do more in fixed-function hardware, like video encode and decode.
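As a rough feel for what "more circuitry in the same die area" could mean, the sketch below applies idealized scaling, where area per feature shrinks with the square of the node name. This is an assumption for illustration only: node names are labels rather than literal feature dimensions, so this is a back-of-envelope figure and not a prediction for Pascal.

```python
def ideal_density_gain(old_nm, new_nm):
    """Idealized scaling: density grows with the square of the linear shrink."""
    return (old_nm / new_nm) ** 2


gain = ideal_density_gain(28, 16)
print(f"Idealized density gain from 28nm to 16nm: {gain:.2f}x")
```

Even a fraction of that idealized figure would leave room for extra shader cores or fixed-function blocks in the same die area.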

That said, we got a lot more life out of 28nm than we really should have. Chips like GM200 and Fiji are huge, relatively power-hungry, and complex, which makes them a terrible idea to produce when yields are low. I asked Josh Walrath, who is our go-to for analysis of fab processes, and he believes that FinFET+ is probably even more complicated today than 28nm was in the 2012 timeframe, which was when it launched for GPUs.

It's two full steps forward from where we started, but we've been tiptoeing since then.

NVIDIA-2015-Pascal-GPU-2015.jpg

Image Credit: WCCFTech

Second, Pascal will introduce HBM 2.0 to NVIDIA hardware. HBM 1.0 was introduced with AMD's Radeon Fury X, and it helped in numerous ways -- from smaller card size to a triple-digit percentage increase in memory bandwidth. The 980 Ti can talk to its memory at about 300GB/s, while Pascal is rumored to push that to 1TB/s. Capacity won't be sacrificed, either. The top-end card is expected to contain 16GB of global memory, which is twice what any console has. This means less streaming, higher resolution textures, and probably even left-over scratch space for the GPU to generate content in with compute shaders. Also, according to AMD, HBM is an easier architecture to communicate with than GDDR, which should mean a savings in die space that could be used for other things.

Third, the architecture includes native support for three levels of floating point precision. Maxwell, due to how limited 28nm was, saved on complexity by reducing 64-bit IEEE 754 floating-point performance to 1/32nd of 32-bit performance, because FP64 values are rarely used in video games. This saved transistors, but was a huge, order-of-magnitude step back from the 1/3rd ratio found on the Kepler-based GK110. While it probably won't be back to the 1/2 ratio that was found in Fermi, Pascal should be much better suited for GPU compute.
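The "order-of-magnitude" remark follows directly from the ratios quoted above. A quick check with exact fractions, using only those three figures:

```python
from fractions import Fraction

# FP64-to-FP32 throughput ratios as quoted in the paragraph above.
fp64_ratio = {
    "Fermi": Fraction(1, 2),
    "Kepler GK110": Fraction(1, 3),
    "Maxwell": Fraction(1, 32),
}

# How many times lower Maxwell's relative FP64 rate is than GK110's.
step_back = fp64_ratio["Kepler GK110"] / fp64_ratio["Maxwell"]
print(f"GK110 to Maxwell: relative FP64 rate fell by about {float(step_back):.1f}x")
```

Note that these are ratios relative to each chip's own FP32 rate, so they describe how much of the design is devoted to FP64 rather than absolute throughput.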

NVIDIA-2015-Pascal-GPU_Compute-Performance-635x357.jpg

Image Credit: WCCFTech

Mixed precision could help video games too, though. Remember how I said it supports three levels? The third one is 16-bit, which is half the width of the 32-bit format that is commonly used in video games. Sometimes, that is sufficient. If so, Pascal is said to do these calculations at twice the rate of 32-bit. We'll need to see whether enough games (and other applications) are willing to drop down in precision to justify the die space that these dedicated circuits require, but it should double the performance of anything that does.
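To get a feel for when 16-bit is "sufficient", the sketch below rounds a few values to IEEE 754 half and single precision on the CPU using Python's standard struct module. It says nothing about how Pascal implements FP16; the sample values and helper names are arbitrary choices for illustration.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision (16-bit) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a Python float to IEEE 754 single precision (32-bit) and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Arbitrary sample values: a small fraction, an exact value, and larger numbers.
for value in (0.1, 1.0, 1000.1, 2049.0):
    print(f"{value!r}: fp32 -> {to_single(value)!r}, fp16 -> {to_half(value)!r}")
```

Values such as colors in the 0 to 1 range survive the trip with small error, while larger numbers lose precision quickly, which is why only some shader math is a candidate for FP16.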

So basically, this generation should provide the massive jump in performance that enthusiasts have been waiting for. GPU memory bandwidth and the number of features that can be printed into the die are two major bottlenecks for most modern games and GPU-accelerated software, and Pascal is expected to raise both. We'll need to wait for benchmarks to see how the theoretical maps to practical, but it's a good sign.

Author:
Manufacturer: Lionhead Studios

Benchmark Overview

When approached a couple of weeks ago by Microsoft with the opportunity to take an early look at an upcoming performance benchmark built on a DX12 game pending release later this year, I of course was excited for the opportunity. Our adventure into the world of DirectX 12 and performance evaluation started with the 3DMark API Overhead Feature Test back in March and was followed by the release of the Ashes of the Singularity performance test in mid-August. Both of these tests pinpointed one particular aspect of the DX12 API - the ability to improve CPU throughput and efficiency with higher draw call counts, which enables higher frame rates on existing GPUs.

ScreenShot00004.jpg

This game and benchmark are beautiful...

Today we dive into the world of Fable Legends, an upcoming free-to-play game set in the world of Albion. This title will be released on the Xbox One and for Windows 10 PCs and it will require the use of DX12. Though scheduled for release in Q4 of this year, Microsoft and Lionhead Studios allowed us early access to a specific performance test using the UE4 engine and the world of Fable Legends. UPDATE: It turns out that the game will have a fall-back DX11 mode that will be enabled if the game detects a GPU incapable of running DX12.

This benchmark focuses more on the GPU side of DirectX 12 - on improved rendering techniques and visual quality rather than on the CPU scaling aspects that made Ashes of the Singularity stand out from other graphics tests we have utilized. Fable Legends is more representative of what we expect to see with the release of AAA games using DX12. Let's dive into the test and our results!

Continue reading our look at the new Fable Legends DX12 Performance Test!!

Author:
Manufacturer: NVIDIA

Pack a full GTX 980 on the go!

For many years, the idea of a truly mobile gaming system has been attainable if you were willing to pay the premium for high performance components. But anyone that has done research in this field would tell you that though they were named similarly, the mobile GPUs from both AMD and NVIDIA had a tendency to be noticeably slower than their desktop counterparts. A GeForce GTX 970M, for example, only had a CUDA core count that was slightly higher than the desktop GTX 960, and it was 30% lower than the true desktop GTX 970 product. So even though you were getting fantastic mobile performance, desktop users continued to hold a dominant position over mobile gamers in PC gaming.

This fall, NVIDIA is changing that with the introduction of the GeForce GTX 980 for gaming notebooks. Notice I did not put an 'M' at the end of that name; it's not an accident. NVIDIA has found a way, through binning and component design, to cram the entirety of a GM204-based Maxwell GTX 980 GPU inside portable gaming notebooks.

980-5.jpg

The results are impressive and the implications for PC gamers are dramatic. Systems built with the GTX 980 will include the same 2048 CUDA cores, 4GB of GDDR5 running at 7.0 GHz and will run at the same base and typical GPU Boost clocks as the reference GTX 980 cards you can buy today for $499+. And, while you won't find this GPU in anything called a "thin and light", 17-19" gaming laptops do allow for portability of gaming unlike any SFF PC.

So how did they do it? NVIDIA has found a way to get a desktop GPU with a 165 watt TDP into a form factor that has a physical limit of 150 watts (for the MXM module implementations at least) through binning, component selection and improved cooling. Not only that, but there is enough headroom to allow for some desktop-class overclocking of the GTX 980 as well.

Continue reading our preview of the new GTX 980 for notebooks!!

Author:
Manufacturer: AMD
Tagged: video, radeon, R9, Nano, hbm, Fiji, amd

Specs and Hardware

The AMD Radeon R9 Nano graphics card is unlike any product we have ever tested at PC Perspective. As I wrote and described to the best of my ability (without hardware in my hands) late last month, AMD is targeting a totally unique and different classification of hardware with this release. As a result, there is quite a bit of confusion, criticism, and concern about the Nano, and, to be upfront, not all of it is unwarranted.

IMG_3232.jpg

After spending the past week with an R9 Nano here in the office, I am prepared to say this immediately: for users matching specific criteria, there is no other option that comes close to what AMD is putting on the table today. That specific demographic though is going to be pretty narrow, a fact that won’t necessarily hurt AMD simply due to the obvious production limitations of the Fiji and HBM architectures.

At $650, the R9 Nano comes with a flagship cost but it does so knowing full well that it will not compete in terms of raw performance against the likes of the GTX 980 Ti or AMD’s own Radeon R9 Fury X. However, much like Intel has done with the Ultrabook and ULV platforms, AMD is attempting to carve out a new market that is looking for dense, modest power GPUs in small form factors. Whether or not they have succeeded is what I am looking to determine today. Ride along with me as we journey on the roller coaster of a release that is the AMD Radeon R9 Nano.

Continue reading our review of the AMD Radeon R9 Nano!!

Manufacturer: PC Perspective

To the Max?

Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but its developers disable the feature (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.

epic-2015-ue4-dx12.jpg

AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.

So what is it?

Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
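
As a rough illustration of that idea, here is a toy model in Python. It is only a sketch under heavy assumptions: the task names, the two "units" and the millisecond figures are all made up, tasks are treated as fully independent, and contention for things like memory bandwidth is ignored. It compares running every task back to back with letting tasks that occupy different parts of the GPU overlap.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    unit: str           # hardware block the task mostly occupies (hypothetical labels)
    duration_ms: float  # made-up cost, for illustration only


def serial_time(tasks):
    """Every task waits for the previous one, whatever unit it needs."""
    return sum(t.duration_ms for t in tasks)


def overlapped_time(tasks):
    """Tasks on different units run side by side; tasks on the same unit queue up."""
    busy = {}
    for t in tasks:
        busy[t.unit] = busy.get(t.unit, 0.0) + t.duration_ms
    # The frame is finished when the busiest unit is finished.
    return max(busy.values())


frame = [
    Task("ROP-heavy graphics pass", "rop", 4.0),
    Task("physics", "shader", 3.0),
    Task("post-processing", "shader", 2.0),
]

print(serial_time(frame))      # 9.0
print(overlapped_time(frame))  # 5.0
```

In this made-up frame the shader work hides entirely behind nothing: it becomes the new bottleneck at 5 ms, which is the point made further down about being bound by the limits of your components.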

Kollock also notes that compute is becoming more important in the graphics pipeline, and that it is possible to bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will bypass them completely, maybe even Oxide's own engine, several years down the road.

I wonder who would pursue something so silly, whether for a product or even just research.

But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
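
To make "idle as a percentage of a frame" concrete, here is a minimal Python sketch. It assumes you already have a list of (start, end) timestamps during which the GPU was busy, in the same units as the frame boundaries; where those timestamps come from (a vendor profiler, timestamp queries, and so on) is outside the sketch, and the function name is my own.

```python
def idle_percentage(busy_intervals, frame_start, frame_end):
    """Return the share of the frame (0-100) during which no work was running."""
    # Keep only the parts of each busy interval that fall inside the frame.
    clipped = sorted(
        (max(start, frame_start), min(end, frame_end))
        for start, end in busy_intervals
        if end > frame_start and start < frame_end
    )

    # Merge overlapping intervals so concurrent work is not counted twice.
    busy = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy += cur_end - cur_start

    frame = frame_end - frame_start
    return 100.0 * (frame - busy) / frame


# A hypothetical 16 ms frame with three bursts of work, two of them overlapping.
print(idle_percentage([(0.0, 6.0), (5.0, 10.0), (12.0, 14.0)], 0.0, 16.0))  # 25.0
```

The idle share is an upper bound on what better packing could recover, not a promise: work can only move into a gap if something independent is ready to run there.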

And, of course, game developers profile GPUs from time to time...

According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware, so this amount may increase or decrease on the PC. A developer at Epic Games offered, in an informal chat (so a massive grain of salt is required), a late-night, ballpark, “totally speculative” guesstimate: on the Xbox One, the GPU could theoretically accept a maximum of ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.

AMD-2015-MantleAPI-slide1.png

This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.

It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient ingestion of shader code. As always, painfully always, time will tell.

Author:
Manufacturer: AMD

The Tiniest Fiji

Way back on June 16th, AMD held a live stream event during E3 to announce a host of new products. In that group was the AMD Radeon R9 Fury X, R9 Fury and the R9 Nano. Of the three, the Nano was the most intriguing to most of the online press as it was the one we knew the least about. AMD promised a full Fiji GPU in a package with a 6-in PCB and a 175 watt TDP. Well today, AMD is, uh, re-announcing (??) the AMD Radeon R9 Nano with more details on specifications, performance and availability.

r9nano-2.jpg

First, let’s get this out of the way: AMD is making this announcement today because they publicly promised the R9 Nano for August. And with the final days of summer creeping up on them, rather than answer questions about another delay, AMD is instead going the route of a paper launch, but one with a known end date. We will apparently get our samples of the hardware in early September with reviews and the on-sale date following shortly thereafter. (Update: AMD claims the R9 Nano will be on store shelves on September 10th and should have "critical mass" of availability.)

Now let’s get to the details that you are really here for. And rather than start with the marketing spin on the specifications that AMD presented to the media, let’s dive into the gory details right now.

|  | R9 Nano | R9 Fury | R9 Fury X | GTX 980 Ti | TITAN X | GTX 980 | R9 290X |
|---|---|---|---|---|---|---|---|
| GPU | Fiji XT | Fiji Pro | Fiji XT | GM200 | GM200 | GM204 | Hawaii XT |
| GPU Cores | 4096 | 3584 | 4096 | 2816 | 3072 | 2048 | 2816 |
| Rated Clock | 1000 MHz | 1000 MHz | 1050 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1000 MHz |
| Texture Units | 256 | 224 | 256 | 176 | 192 | 128 | 176 |
| ROP Units | 64 | 64 | 64 | 96 | 96 | 64 | 64 |
| Memory | 4GB | 4GB | 4GB | 6GB | 12GB | 4GB | 4GB |
| Memory Clock | 500 MHz | 500 MHz | 500 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 5000 MHz |
| Memory Interface | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 384-bit | 384-bit | 256-bit | 512-bit |
| Memory Bandwidth | 512 GB/s | 512 GB/s | 512 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 320 GB/s |
| TDP | 175 watts | 275 watts | 275 watts | 250 watts | 250 watts | 165 watts | 290 watts |
| Peak Compute | 8.19 TFLOPS | 7.20 TFLOPS | 8.60 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 5.63 TFLOPS |
| Transistor Count | 8.9B | 8.9B | 8.9B | 8.0B | 8.0B | 5.2B | 6.2B |
| Process Tech | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm | 28nm |
| MSRP (current) | $649 | $549 | $649 | $649 | $999 | $499 | $329 |

AMD wasn’t fooling around: the Radeon R9 Nano graphics card does indeed include a full implementation of the Fiji GPU and HBM, including 4096 stream processors, 256 texture units and 64 ROPs. The GPU core clock is rated “up to” 1.0 GHz, nearly the same as the Fury X (1050 MHz), and the only difference that I can see in the specifications on paper is that the Nano is rated at 8.19 TFLOPS of theoretical compute performance while the Fury X is rated at 8.60 TFLOPS.
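
For anyone who wants to see where those compute and bandwidth figures come from, here is a short Python sketch that reproduces several of them from the core counts, clocks and bus widths in the spec table. It rests on two assumptions that are mine, not AMD's or NVIDIA's slides: each core retires one fused multiply-add (counted as 2 FLOPs) per clock, and the 500 MHz HBM clock transfers data twice per cycle, while the GDDR5 figures in the table are already effective data rates.

```python
def peak_tflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical single-precision throughput, assuming one FMA per core per clock."""
    return cores * flops_per_core_per_clock * clock_mhz / 1e6


def bandwidth_gbs(bus_width_bits, effective_rate_mhz):
    """Theoretical memory bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits * effective_rate_mhz / 8 / 1000


# Peak compute, using the rated clocks from the table.
print(f"R9 Nano:   {peak_tflops(4096, 1000):.2f} TFLOPS")  # 8.19
print(f"R9 Fury X: {peak_tflops(4096, 1050):.2f} TFLOPS")  # 8.60
print(f"GTX 980:   {peak_tflops(2048, 1126):.2f} TFLOPS")  # 4.61

# Memory bandwidth. HBM: 500 MHz clock, assumed double data rate -> 1000 MT/s.
print(f"Fiji (HBM): {bandwidth_gbs(4096, 500 * 2):.0f} GB/s")  # 512
print(f"GTX 980 Ti: {bandwidth_gbs(384, 7000):.0f} GB/s")      # 336
print(f"R9 290X:    {bandwidth_gbs(512, 5000):.0f} GB/s")      # 320
```

The 0.41 TFLOPS gap between the Nano and the Fury X falls straight out of the 50 MHz difference in rated clock, since the core count is identical.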

Continue reading our preview of the AMD Radeon R9 Nano graphics card!!

Author:
Manufacturer: ASUS

Retail Card Design

AMD is in an interesting spot right now. The general consensus is that both the AMD Radeon R9 Fury X and the R9 Fury graphics cards had successful launches into the enthusiast community. We found that the performance of the Fury X was slightly under that of the GTX 980 Ti from NVIDIA, but also that the noise levels and power draw were so improved on Fiji over Hawaii that many users would dive head first into the new flagship from the red team.

The launch of the non-X AMD Fury card was even more interesting – here was a card with a GPU performing better than the competition at a price point where NVIDIA didn’t have an exact answer. The performance gap between the GTX 980 and GTX 980 Ti left room for a $550 graphics card, and AMD scored a victory with it. Add in the third Fiji-based product due out in a few short weeks, the R9 Nano, and you have a robust family of products that don’t exactly dominate the market but do put AMD in a positive position unlike any it has seen in recent years.

asus1.jpg

But there are some problems. First and foremost for AMD is the continuing drop in market share. With the most recent reports from multiple sources claiming that AMD’s Q2 2015 share has dropped to 18%, an all-time low in the last decade or so, AMD needs some growth and they need it now. Here’s the catch: AMD can’t make enough of the Fiji chip to affect that number at all. The Fury X, Fury and Nano are going to be hard to find for the foreseeable future thanks to production limits on the HBM (high bandwidth memory) integration, the same feature that helps make Fiji the compelling product it is. I have been keeping an eye on the stock of the Fury and Fury X products and found that they often can’t be found anywhere in the US for purchase. Maybe even more damning is the fact that the Radeon R9 Fury, the card that is supposed to be the model customizable by AMD board partners, still only has two options available: the Sapphire, which we reviewed when it launched, and the ASUS Strix R9 Fury that we are reviewing today.

AMD’s product and financial issues aside, the fact is that the Radeon R9 Fury 4GB and the ASUS Strix iteration of it are damned good products. ASUS has done its usual job of improving on the design of the reference PCB and cooler, added in some great features and packaged it up at a price that is competitive and well worth the investment for enthusiast gamers. Our review today will only lightly touch on out-of-box performance of the Strix card, mostly because it is so similar to that of the initial Fury review we posted in July. Instead I will look at the changes to the positioning of the AMD Fury product (if any) and how the cooler and design of the Strix product help it stand out. Overclocking, power consumption and noise will all be evaluated as well.

Continue reading our review of the ASUS Strix R9 Fury Graphics Card!!

Author:
Manufacturer: NVIDIA

Another Maxwell Iteration

The mainstream end of the graphics card market is about to get a bit more complicated with today’s introduction of the GeForce GTX 950. Based on a slightly cut down GM206 chip, the same used in the GeForce GTX 960 that was released almost 8 months ago, the new GTX 950 will fill a gap in the product stack for NVIDIA, resting right at $160-170 MSRP. Until today that next-down spot from the GTX 960 was filled by the GeForce GTX 750 Ti, the very first iteration of Maxwell (we usually call it Maxwell 1) that came out in February of 2014!

Even though that is a long time to go without refreshing the GTX x50 part of the lineup, NVIDIA was likely hesitant to do so based on the overwhelming success of the GM107 for mainstream gaming. It was low cost, incredibly efficient and didn’t require any external power to run. That led us down the path of upgrading OEM PCs with GTX 750 Ti, an article and video that still gets hundreds of views and dozens of comments a week.

IMG_3123.JPG

The GTX 950 has some pretty big shoes to fill. I can tell you right now that it uses more power than the GTX 750 Ti, and it requires a 6-pin power connector, but it does so while increasing gaming performance dramatically. The primary competition from AMD is the Radeon R7 370, a Pitcairn GPU that is long in the tooth and missing many of the features that Maxwell provides.

And NVIDIA is taking a secondary angle with the GTX 950 launch: targeting the MOBA players (DOTA 2 in particular) directly and aggressively. With the success of this style of game over the last several years, and the impressive $18M+ purse for the largest DOTA 2 tournament just behind us, there isn’t a better area of PC gaming to be going after today. But are the tweaks and changes to the card and software really going to make a difference for MOBA gamers or is it just marketing fluff?

Let’s dive into everything GeForce GTX 950!

Continue reading our review of the NVIDIA GeForce GTX 950 2GB Graphics Card!!

Author:
Manufacturer: Stardock

Benchmark Overview

I knew that the move to DirectX 12 was going to be a big shift for the industry. Since the introduction of the AMD Mantle API along with the Hawaii GPU architecture we have been inundated with game developers and hardware vendors talking about the potential benefits of lower level APIs, which give more direct access to GPU hardware and enable more flexible threading for CPUs to game developers and game engines. The results, we were told, would mean that your current hardware would be able to take you further and future games and applications would be able to fundamentally change how they are built to enhance gaming experiences tremendously.

I knew that the reader interest in DX12 was outstripping my expectations when I did a live blog of the official DX12 unveil by Microsoft at GDC. In a format that consisted simply of my text commentary and photos of the slides that were being shown (no video at all), we had more than 25,000 live readers that stayed engaged the whole time. Comments and questions flew into the event – more than my staff or I could possibly handle in real time. It turned out that gamers were indeed very much interested in what DirectX 12 might offer them with the release of Windows 10.

game3.jpg

Today we are taking a look at the first real world gaming benchmark that utilizes DX12. Back in March I was able to do some early testing with an API-specific test in Futuremark's 3DMark that evaluates the overhead implications of DX12, DX11 and even AMD Mantle. This first look at DX12 was interesting and painted an amazing picture about the potential benefits of the new API from Microsoft, but it wasn’t built on a real game engine. In our Ashes of the Singularity benchmark testing today, we finally get an early look at what a real implementation of DX12 looks like.

And as you might expect, not only are the results interesting, but there is a significant amount of created controversy about what those results actually tell us. AMD has one story, NVIDIA another and Stardock and the Nitrous engine developers, yet another. It’s all incredibly intriguing.

Continue reading our analysis of the Ashes of the Singularity DX12 benchmark!!

Author:
Manufacturer: Intel

It comes after 8, but before 10

As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer friendly CPU and then promise to tell us how it all works later in the month at IDF.

Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. Details on the Core architecture changes are not present, and instead we are given details on how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or anything surrounding the architecture of Skylake or the new Gen9 graphics system but I wanted to share the details we found available. I am sure we’ll learn more this week as IDF progresses so I will update this story where necessary.

What Intel calls Processor Graphics is what we used to call simply integrated graphics for the longest time. The purpose and role of processor graphics has changed drastically over the years and it is now responsible not only for 3D graphics rendering but also for the compute, media and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture utilized in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first product released and announced using Gen9 graphics and is also the first to adopt Intel’s new 3-digit naming scheme.

skylakegen9-4.jpg

This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. Containing four traditional x86 CPU cores and 1 “slice” implementation of Gen9 graphics (with three visible sub-slices we’ll describe below), this is not likely to be the highest performing iteration of the latest Intel HD Graphics technology.

skylakegen9-4.2.jpg

Like the Intel processors before it, the Skylake design utilizes a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is considered its own agent while the Gen9 compute architecture is considered one complete agent. The system agent bundles the DRAM memory, the display controller, PCI Express and other I/O interfaces that communicate with the rest of the PC. Any off-chip memory requests and transactions occur through this bus while on-chip data transfers tend to be handled differently.

Continue reading our look at the new Gen9 graphics and compute architecture on Skylake!!

Manufacturer: PC Perspective

It's Basically a Function Call for GPUs

Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, the way graphics cards are loaded with tasks changes drastically in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what they are told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.

While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.
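
The difference between the two models can be sketched in a few lines of Python. To be clear, this is a toy and not Direct3D: the class and method names below are hypothetical stand-ins (loosely inspired by the finish/execute command list pattern in DirectX 11), and the "state" is just a dictionary. One object owns the global state and turns draws into queued draw calls, while worker threads record their own command lists that the owning thread later replays whole.

```python
import threading


class ImmediateContext:
    """Toy stand-in for the single, global GPU state owned by one thread."""

    def __init__(self):
        self.state = {}
        self.submitted = []  # stand-in for the driver's queue of draw calls

    def set_state(self, key, value):
        self.state[key] = value

    def draw(self, vertex_count):
        # A draw call captures whatever global state is bound at this moment.
        self.submitted.append((dict(self.state), vertex_count))

    def execute_command_list(self, commands):
        # Replay a pre-recorded list, whole, on the thread that owns the state.
        for command in commands:
            if command[0] == "set":
                self.set_state(command[1], command[2])
            else:
                self.draw(command[1])


class DeferredContext:
    """Toy shadow state: records commands without touching the global state."""

    def __init__(self):
        self._commands = []

    def set_state(self, key, value):
        self._commands.append(("set", key, value))

    def draw(self, vertex_count):
        self._commands.append(("draw", vertex_count))

    def finish(self):
        commands, self._commands = self._commands, []
        return commands


def record(deferred, texture, vertex_count, out, slot):
    # Runs on a secondary thread; only the private command list is written.
    deferred.set_state("texture", texture)
    deferred.draw(vertex_count)
    out[slot] = deferred.finish()


immediate = ImmediateContext()
recorded = [None, None]
workers = [
    threading.Thread(target=record, args=(DeferredContext(), "rock", 300, recorded, 0)),
    threading.Thread(target=record, args=(DeferredContext(), "grass", 600, recorded, 1)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Only the owning thread appends the recorded lists to the global state.
for commands in recorded:
    immediate.execute_command_list(commands)

print(immediate.submitted)
# [({'texture': 'rock'}, 300), ({'texture': 'grass'}, 600)]
```

Recording happens in parallel, but the replay at the end is still funneled through one thread, which is the compromise described above.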

Some developers experienced gains, while others lost a bit. It didn't live up to expectations.

pcper-2015-dx12-290x.png

The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.

pcper-2015-dx12-980.png

The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:

  • AMD and NVIDIA are only a two-digit percentage apart in draw call performance
  • Both AMD and NVIDIA saw an order of magnitude increase in draw calls

Read on to see what this means for games and game development.