All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be more clear in my disclaimer. This is an unconfirmed, relatively easy to fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images, but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K. That suggests that the leaker got the images from somewhere, but didn't notice those details, which implies that the original source was hoaxed by an anonymous source, who only seeded the hoax to a single media outlet, or that it was an actual leak.
Either way, enjoy my analysis but realize that this is a single, unconfirmed source who allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
What type of user will be given Iris Pro? Also, will graphics-free options be available in the sub-Enthusiast class? When buying a processor from Intel, the high-end mainstream processors tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics -- the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step into changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high clock, relatively low IPC design that Netburst was built around, which instantaneously changed the processor landscape from a dominant AMD to an Intel runaway lead.
After 65nm and Core 2 started the cycle, every new architecture alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
Retail cards still suffer from the issue
In our review of AMD's latest flagship graphics card, the Radeon R9 Fury X, I noticed and commented on the unique sound that the card was producing during our testing. A high pitched whine, emanating from the pump of the self-contained water cooler designed by Cooler Master, was obvious from the moment our test system was powered on and remained constant during use. I talked with a couple of other reviewers about the issue before the launch of the card and it seemed that I wasn't alone. Looking around other reviews of the Fury X, most make mention of this squeal specifically.
Noise from graphics cards come in many forms. There is the most obvious and common noise from on-board fans and the air it moves. Less frequently, but distinctly, the sound of inductor coil whine comes up. Fan noise spikes when the GPU gets hot, causing the fans to need to spin faster and move more air across the heatsink, which keeps everything running cool. Coil whine changes pitch based on the frame rate (and the frequency of power delivery on the card) and can be alleviated by using higher quality components on the board itself.
But the sound of our Fury X was unique: it was caused by the pump itself and it was constant. The noise it produced did not change as the load on the GPU varied. It was also 'pitchy' - a whine that seemed to pierce through other sounds in the office. A close analog might be the sound of an older, CRT TV or monitor that is left powered on without input.
In our review process, AMD told us the solution was fixed. In an email sent to the media just prior to the Fury X launch, an AMD rep stated:
In regards to the “pump whine”, AMD received feedback that during open bench testing some cards emit a mild “whining” noise. This is normal for most high speed liquid cooling pumps; Usually the end user cannot hear the noise as the pumps are installed in the chassis, and the radiator fan is louder than the pump. Since the AMD Radeon™ R9 Fury X radiator fan is near silent, this pump noise is more noticeable.
The issue is limited to a very small batch of initial production samples and we have worked with the manufacturer to improve the acoustic profile of the pump. This problem has been resolved and a fix added to production parts and is not an issue.
I would disagree that this is "normal" but even so, taking AMD at its word, I wrote that we heard the noise but also that AMD had claimed to have addressed it. Other reviewers noted the same comment from AMD, saying the result was fixed. But very quickly after launch some users were posting videos on YouTube and on forums with the same (or worse) sounds and noise. We had already started bringing in a pair of additional Fury X retail cards from Newegg in order to do some performance testing, so it seemed like a logical next step for us to test these retail cards in terms of pump noise as well.
First, let's get the bad news out of the way: both of the retail AMD Radeon R9 Fury X cards that arrived in our offices exhibit 'worse' noise, in the form of both whining and buzzing, compared to our review sample. In this write up, I'll attempt to showcase the noise profile of the three Fury X cards in our possession, as well as how they compare to the Radeon R9 295X2 (another water cooled card) and the GeForce GTX 980 Ti reference design - added for comparison.
A fury unlike any other...
Officially unveiled by AMD during E3 last week, we are finally ready to show you our review of the brand new Radeon R9 Fury X graphics card. Very few times has a product launch meant more to a company, and to its industry, than the Fury X does this summer. AMD has been lagging behind in the highest-tiers of the graphics card market for a full generation. They were depending on the 2-year-old Hawaii GPU to hold its own against a continuous barrage of products from NVIDIA. The R9 290X, despite using more power, was able to keep up through the GTX 700-series days, but the release of NVIDIA's Maxwell architecture forced AMD to move the R9 200-series parts into the sub-$350 field. This is well below the selling prices of NVIDIA's top cards.
The AMD Fury X hopes to change that with a price tag of $650 and a host of new features and performance capabilities. It aims to once again put AMD's Radeon line in the same discussion with enthusiasts as the GeForce series.
The Fury X is built on the new AMD Fiji GPU, an evolutionary part based on AMD's GCN (Graphics Core Next) architecture. This design adds a lot of compute horsepower (4,096 stream processors) and it also is the first consumer product to integrate HBM (High Bandwidth Memory) support with a 4096-bit memory bus!
Of course the question is: what does this mean for you, the gamer? Is it time to start making a place in your PC for the Fury X? Let's find out.
The new Radeon R9 300-series
The new AMD Radeon R9 and R7 300-series of graphics cards are coming into the world with a rocky start. We have seen rumors and speculation about what GPUs are going to be included, what changes would be made and what prices these would be shipping at for what seems like months, and in truth it has been months. AMD's Radeon R9 290 and R9 290X based on the new Hawaii GPU launched nearly 2 years ago, while the rest of the 200-series lineup was mostly a transition of existing products in the HD 7000-family. The lone exception was the Radeon R9 285, a card based on a mysterious new GPU called Tonga that showed up late to the game to fill a gap in the performance and pricing window for AMD.
AMD's R9 300-series, and the R7 300-series in particular, follows a very similar path. The R9 390 and R9 390X are still based on the Hawaii architecture. Tahiti is finally retired and put to pasture, though Tonga lives on as the Radeon R9 380. Below that you have the Radeon R7 370 and 360, the former based on the aging GCN 1.0 Curacao GPU and the latter based on Bonaire. On the surface its easy to refer to these cards with the dreaded "R-word"...rebrands. And though that seems to be the case there are some interesting performance changes, at least at the high end of this stack, that warrant discussion.
And of course, AMD partners like Sapphire are using this opportunity of familiarity with the GPU and its properties to release newer product stacks. In this case Sapphire is launching the new Nitro brand for a series of cards that it is aimed at what it considers the most common type of gamer: one that is cost conscious and craves performance over everything else.
The result is a stack of GPUs with prices ranging from about $110 up to ~$400 that target the "gamer" group of GPU buyers without the added price tag that some other lines include. Obviously it seems a little crazy to be talking about a line of graphics cards that is built for gamers (aren't they all??) but the emphasis is to build a fast card that is cool and quiet without the additional cost of overly glamorous coolers, LEDs or dip switches.
Today I am taking a look at the new Sapphire Nitro R9 390 8GB card, but before we dive head first into that card and its performance, let's first go over the changes to the R9-level of AMD's product stack.
Fiji: A Big and Necessary Jump
Fiji has been one of the worst kept secrets in a while. The chip has been talked about, written about, and rumored about seemingly for ages. The chip has promised to take on NVIDIA at the high end by bringing about multiple design decisions that are aimed to give it a tremendous leap in performance and efficiency as compared to previous GCN architectures. NVIDIA released their Maxwell based products last year and added to that this year with the Titan X and the GTX 980 Ti. These are the parts that Fiji is aimed to compete with.
The first product that Fiji will power is the R9 Fury X with integrated water cooling.
AMD has not been standing still, but their R&D budgets have been taking a hit as of late. The workforce has also been pared down to the bare minimum (or so I hope) while still being able to design, market, and sell products to the industry. This has affected their ability to produce as large a quantity of new chips as NVIDIA has in the past year. Cut-backs are likely not the entirety of the story, but they have certainly affected it.
The plan at AMD seems to be to focus on very important products and technologies, and then migrate those technologies to new products and lines when it makes the most sense. Last year we saw the introduction of “Tonga” which was the first major redesign after the release of the GCN 1.1 based Hawaii which powers the R9 290 and R9 390 series. Tonga delivered double the tessellation performance over Hawaii, it improved overall architecture efficiency, and allowed AMD to replace the older Tahiti and Pitcairn chips with an updated unit that featured xDMA and TrueAudio support. Tonga was a necessary building block that allowed AMD to produce a chip like Fiji.
Introduction and Technical Specifications
The measure of a true modder is not in how powerful he can make his system by throwing money at it, but in how well he can innovate to make his components run better with what he or she has on hand. Some make artistic statements with their truly awe-inspiring cases, while others take the dremel and clamps to their beloved video cards in an attempt to eek out that last bit of performance. This article serves the later of the two. Don't get me wrong, the card will look nice once we're done with it, but the point here is to re-use components on hand where possible to minimize the cost while maximizing the performance (and sound) benefits.
EVGA GTX 970 SC Graphics Card
Courtesy of EVGA
We started with an EVGA GTX 970 SC card with 4GB ram and bundled with the new revision of EVGA's ACX cooler, ACX 2.0. This card is well built with a slight factory overclock out of the box. The ACX 2.0 cooler is a redesigned version of the initial version of the cooler included with the card, offering better cooling potential with fan's not activated for active cooling until the GPU block temperature breeches 60C.
Courtesy of EVGA
WATERCOOL HeatKiller GPU-X3 Core GPU Waterblock
Courtesy of WATERCOOL
For water cooling the EVGA GTX 970 SC GPU, we decided to use the WATERCOOL HeatKiller GPU-X3 Core water block. This block features a POM-based body with a copper core for superior heat transfer from the GPU to the liquid medium. The HeatKiller GPU-X3 Core block is a GPU-only cooler, meaning that the memory and integrated VRM circuitry will not be actively cooled by the block. The decision to use a GPU only block rather than a full cover block was two fold - availability and cost. I had a few of these on hand, making of an easy decision cost-wise.
Digging into a specific market
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. This is AMD's sixth-generation APU, starting with Llano's release in June 2011. For this launch, Carrizo is targeting the 15W and 35W power envelopes for $400-$700 USD notebook devices. AMD needed to increase efficiency on the same, 28nm process that we have seen in their product stack since Kabini and Temash were released in May of 2013. They tasked their engineers to optimize their APU's design for these constraints, which led to dense architectures and clever features on the same budget of complexity, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was a quite clever design that didn't reach its potential because of external factors. As I started this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, two cores share a single floating point (decimal) unit. While you need some floating point capacity, upcoming workloads could use the GPU for a massive increase in performance, which is right there on the same die. As such, the complexity that is dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.
When NVIDIA launched the GeForce GTX Titan X card only back in March of this year, I knew immediately that the GTX 980 Ti would be close behind. The Titan X was so different from the GTX 980 when it came to pricing and memory capacity (12GB, really??) that NVIDIA had set up the perfect gap with which to place the newly minted GTX 980 Ti. Today we get to take the wraps off of that new graphics card and I think you'll be impressed with what you find, especially when you compare its value to the Titan X.
Based on the same Maxwell architecture and GM200 GPU, with some minor changes to GPU core count, memory size and boost speeds, the GTX 980 Ti finds itself in a unique spot in the GeForce lineup. Performance-wise it's basically identical in real-world game testing to the GTX Titan X, yet is priced $350 less that that 12GB behemoth. Couple that with a modest $50 price drop in the GTX 980 cards and you have all markers of an enthusiast graphics card that will sell as well as any we have seen in recent generations.
The devil is in all the other details, of course. AMD has its own plans for this summer but the Radeon R9 290X is still sitting there at a measly $320, undercutting the GTX 980 Ti by more than half. NVIDIA seems to be pricing its own GPUs as if it isn't even concerned with what AMD and the Radeon brand are doing. That could be dangerous if it goes on too long, but for today, can the R9 290X put up enough fight with the aging Hawaii XT GPU to make its value case to gamers on the fence?
Will the GeForce GTX 980 Ti be the next high-end GPU to make a splash in the market, or will it make a thud at the bottom of the GPU gene pool? Let's dive into it, shall we?
Big Things, Small Packages
Sapphire isn’t a brand we have covered in a while, so it is nice to see a new and interesting product drop on our door. Sapphire was a relative unknown until around the release of the Radeon 9700 Pro days. This was around the time when ATI decided that they did not want to be so vertically integrated, so allowed other companies to start buying their chips and making their own cards. This was done to provide a bit of stability for ATI pricing, as they didn’t have to worry about a volatile component market that could cause their margins to plummet. By selling just the chips to partners, ATI could more adequately control margins on their own product while allowing their partners to make their own deals and component choices for the finished card.
ATI had very limited graphics card production of their own, so they often would farm out production to second sources. One of these sources ended up turning into Sapphire. When ATI finally allowed other partners to produce and brand their own ATI based products, Sapphire already had a leg up on the competition by being a large producer already of ATI products. They soon controlled a good portion of the marketplace by their contacts, pricing, and close relationship with ATI.
Since this time ATI has been bought up by AMD and they no longer produce any ATI branded cards. Going vertical when it come to producing their own chips and video cards was obviously a bad idea, we can look back at 3dfx and their attempt at vertical integration and how that ended for the company. AMD obviously produces an initial reference version of their cards and coolers, but allows their partners to sell the “sticker” version and then develop their own designs. This has worked very well for both NVIDIA and AMD, and it has allowed their partners to further differentiate their product from the competition.
Sapphire usually does a bang up job on packaging the graphics card. Oh look, a mousepad!
Sapphire is not as big of a player as they used to be, but they are still one of the primary partners of AMD. It would not surprise me in the least if they still produced the reference designs for AMD and then distributed those products to other partners. Sapphire is known for building a very good quality card and their cooling solutions have been well received as well. The company does have some stiff competition from the likes of Asus, MSI, and others for this particular market. Unlike those two particular companies, Sapphire obviously does not make any NVIDIA based boards. This has been a blessing and a curse, depending on what the cycle is looking like between AMD and NVIDIA and who has dominance in any particular marketplace.
High Bandwidth Memory
UPDATE: I have embedded an excerpt from our PC Perspective Podcast that discusses the HBM technology that you might want to check out in addition to the story below.
The chances are good that if you have been reading PC Perspective or almost any other website that focuses on GPU technologies for the past year, you have read the acronym HBM. You might have even seen its full name: high bandwidth memory. HBM is a new technology that aims to turn the ability for a processor (GPU, CPU, APU, etc.) to access memory upside down, almost literally. AMD has already publicly stated that its next generation flagship Radeon GPU will use HBM as part of its design, but it wasn’t until today that we could talk about what HBM actually offers to a high performance processor like Fiji. At its core HBM drastically changes how the memory interface works, how much power is required for it and what metrics we will use to compare competing memory architectures. AMD and its partners started working on HBM with the industry more than 7 years ago, and with the first retail product nearly ready to ship, it’s time to learn about HBM.
We got some time with AMD’s Joe Macri, Corporate Vice President and Product CTO, to talk about AMD’s move to HBM and how it will shift the direction of AMD products going forward.
The first step in understanding HBM is to understand why it’s needed in the first place. Current GPUs, including the AMD Radeon R9 290X and the NVIDIA GeForce GTX 980, utilize a memory technology known as GDDR5. This architecture has scaled well over the past several GPU generations but we are starting to enter the world of diminishing returns. Balancing memory performance and power consumption is always a tough battle; just ask ARM about it. On the desktop component side we have much larger power envelopes to work inside but the power curve that GDDR5 is on will soon hit a wall, if you plot it far enough into the future. The result will be either drastically higher power consuming graphics cards or stalling performance improvements of the graphics market – something we have not really seen in its history.
While it’s clearly possible that current and maybe even next generation GPU designs could still have depended on GDDR5 as the memory interface, the move to a different solution is needed for the future; AMD is just making the jump earlier than the rest of the industry.
DirectX 12 Has No More Secrets
The DirectX 12 API is finalized and the last of its features are known. Before the BUILD conference, the list consisted of Conservative Rasterization, Rasterizer Ordered Viewed, Typed UAV Load, Volume Tiled Resources, and a new Tiled Resources revision for non-volumetric content. When the GeForce GTX 980 launched, NVIDIA claimed it would be compatible with DirectX 12 features. Enthusiasts were skeptical, because Microsoft did not officially finalize the spec at the time.
Last week, Microsoft announced the last feature of the graphics API: Multiadapter.
We already knew that Multiadapter existed, at least to some extent. It is the part of the specification that allows developers to address multiple graphics adapters to split tasks between them. In DirectX 11 and earlier, secondary GPUs would remain idle unless the graphics driver sprinkled some magic fair dust on it with SLI, CrossFire, or Hybrid CrossFire. The only other way to access this dormant hardware was by spinning up an OpenCL (or similar compute API) context on the side.
Way back in January of this year, while attending CES 2015 in Las Vegas, we wandered into the MSI suite without having any idea what we might see as new and exciting product. Besides the GT80 notebook with a mechanical keyboard on it, the MSI GS30 Shadow was easily the most interesting and exciting technology. Although MSI is not the first company to try this, the Shadow is the most recent attempt to combine the benefits of a thin and light notebook with a discrete, high performance GPU when the former is connected to the latter's docking station.
The idea has always been simple but the implementation has always been complex. Take a thin, light, moderately powered notebook that is usable and high quality in its own right and combine it with the ability to connect a discrete GPU while at home for gaming purposes. In theory, this is the best of both worlds: a notebook PC for mobile productivity and gaming capability courtesy of an external GPU. But as the years have gone on, more companies try and more companies fail; the integration process is just never as perfect a mix as we hope.
Today we see if MSI and the GS30 Shadow can fare any better. Does the combination of a very high performance thin and light notebook and the GamingDock truly create a mobile and gaming system that is worth your investment?
It's more than just a branding issue
As a part of my look at the first wave of AMD FreeSync monitors hitting the market, I wrote an analysis of how the competing technologies of FreeSync and G-Sync differ from one another. It was a complex topic that I tried to state in as succinct a fashion as possible given the time constraints and that the article subject was on FreeSync specifically. I'm going to include a portion of that discussion here, to recap:
First, we need to look inside the VRR window, the zone in which the monitor and AMD claims that variable refresh should be working without tears and without stutter. On the LG 34UM67 for example, that range is 48-75 Hz, so frame rates between 48 FPS and 75 FPS should be smooth. Next we want to look above the window, or at frame rates above the 75 Hz maximum refresh rate of the window. Finally, and maybe most importantly, we need to look below the window, at frame rates under the minimum rated variable refresh target, in this example it would be 48 FPS.
AMD FreeSync offers more flexibility for the gamer than G-Sync around this VRR window. For both above and below the variable refresh area, AMD allows gamers to continue to select a VSync enabled or disabled setting. That setting will be handled as you are used to it today when your game frame rate extends outside the VRR window. So, for our 34UM67 monitor example, if your game is capable of rendering at a frame rate of 85 FPS then you will either see tearing on your screen (if you have VSync disabled) or you will get a static frame rate of 75 FPS, matching the top refresh rate of the panel itself. If your game is rendering at 40 FPS, lower than the minimum VRR window, then you will again see the result of tearing (with VSync off) or the potential for stutter and hitching (with VSync on).
But what happens with this FreeSync monitor and theoretical G-Sync monitor below the window? AMD’s implementation means that you get the option of disabling or enabling VSync. For the 34UM67 as soon as your game frame rate drops under 48 FPS you will either see tearing on your screen or you will begin to see hints of stutter and judder as the typical (and previously mentioned) VSync concerns again crop their head up. At lower frame rates (below the window) these artifacts will actually impact your gaming experience much more dramatically than at higher frame rates (above the window).
G-Sync treats this “below the window” scenario very differently. Rather than reverting to VSync on or off, the module in the G-Sync display is responsible for auto-refreshing the screen if the frame rate dips below the minimum refresh of the panel that would otherwise be affected by flicker. So, in a 30-144 Hz G-Sync monitor, we have measured that when the frame rate actually gets to 29 FPS, the display is actually refreshing at 58 Hz, each frame being “drawn” one extra instance to avoid flicker of the pixels but still maintains a tear free and stutter free animation. If the frame rate dips to 25 FPS, then the screen draws at 50 Hz. If the frame rate drops to something more extreme like 14 FPS, we actually see the module quadruple drawing the frame, taking the refresh rate back to 56 Hz. It’s a clever trick that keeps the VRR goals and prevents a degradation of the gaming experience. But, this method requires a local frame buffer and requires logic on the display controller to work. Hence, the current implementation in a G-Sync module.
As you can see, the topic is complicated. So Allyn and I (and an aging analog oscilloscope) decided to take it upon ourselves to try and understand and teach the implementation differences with the help of some science. The video below is where the heart of this story is focused, though I have some visual aids embedded after it.
Still not clear on what this means for frame rates and refresh rates on current FreeSync and G-Sync monitors? Maybe this will help.
Our first DX12 Performance Results
Late last week, Microsoft approached me to see if I would be interested in working with them and with Futuremark on the release of the new 3DMark API Overhead Feature Test. Of course I jumped at the chance, with DirectX 12 being one of the hottest discussion topics among gamers, PC enthusiasts and developers in recent history. Microsoft set us up with the latest iteration of 3DMark and the latest DX12-ready drivers from AMD, NVIDIA and Intel. From there, off we went.
First we need to discuss exactly what the 3DMark API Overhead Feature Test is (and also what it is not). The feature test will be a part of the next revision of 3DMark, which will likely ship in time with the full Windows 10 release. Futuremark claims that it is the "world's first independent" test that allows you to compare the performance of three different APIs: DX12, DX11 and even Mantle.
It was almost one year ago that Microsoft officially unveiled the plans for DirectX 12: a move to a more efficient API that can better utilize the CPU and platform capabilities of future, and most importantly current, systems. Josh wrote up a solid editorial on what we believe DX12 means for the future of gaming, and in particular for PC gaming, that you should check out if you want more background on the direction DX12 has set.
One of DX12 keys for becoming more efficient is the ability for developers to get closer to the metal, which is a phrase to indicate that game and engine coders can access more power of the system (CPU and GPU) without having to have its hand held by the API itself. The most direct benefit of this, as we saw with AMD's Mantle implementation over the past couple of years, is improved quantity of draw calls that a given hardware system can utilize in a game engine.
With the release of the GeForce GTX 980 back in September of 2014, NVIDIA took the lead in performance with single GPU graphics cards. The GTX 980 and GTX 970 were both impressive options. The GTX 970 offered better performance than the R9 290 as did the GTX 980 compared to the R9 290X; on top of that, both did so while running at lower power consumption and while including new features like DX12 feature level support, HDMI 2.0 and MFAA (multi-frame antialiasing). Because of those factors, the GTX 980 and GTX 970 were fantastic sellers, helping to push NVIDIA’s market share over 75% as of the 4th quarter of 2014.
But in the back of our mind, and in the minds of many NVIDIA fans, we knew that the company had another GPU it was holding on to: the bigger, badder version of Maxwell. The only question was going to be WHEN the company would release it and sell us a new flagship GeForce card. In most instances, this decision is based on the competitive landscape, such as when AMD might be finally updating its Radeon R9 290X Hawaii family of products with the rumored R9 390X. Perhaps NVIDIA is tired of waiting or maybe the strategy is to launch soon before Fiji GPUs make their debut. Either way, NVIDIA officially took the wraps off of the new GeForce GTX TITAN X at the Game Developers Conference two weeks ago.
At the session hosted by Epic Games’ Tim Sweeney, NVIDIA CEO Jen-Hsun Huang arrived when Tim lamented about needing more GPU horsepower for their UE4 content. In his hands he had the first TITAN X GPU and talked about only a couple of specifications: the card would have 12GB of memory and it would be based on a GPU with 8 billion transistors.
Since that day, you have likely seen picture after picture, rumor after rumor, about specifications, pricing and performance. Wait no longer: the GeForce GTX TITAN X is here. With a $999 price tag and a GPU with 3072 CUDA cores, we clearly have a new king of the court.
Finally, a SHIELD Console
NVIDIA is filling out the family of the SHIELD brand today with the announcement of SHIELD, a set-top box powered by the Tegra X1 processor. SHIELD will run Android TV and act as a game playing, multimedia watching, GRID streaming device. Selling for $199 and available in May of this year, there is a lot to discuss.
Odd naming scheme aside, the SHIELD looks to be an impressive little device, sitting on your home theater or desk and bringing a ton of connectivity and performance to your TV. Running Android TV means the SHIELD will have access to the entire library of Google Play media including music, movies and apps. SHIELD supports 4K video playback at 60 Hz thanks to an HDMI 2.0 connection and fully supports H.265/HEVC decode thanks to Tegra X1 processor.
Here is a full breakdown of the device's specifications.
|NVIDIA SHIELD Specifications|
|Processor||NVIDIA® Tegra® X1 processor with 256-core Maxwell™ GPU with 3GB RAM|
|Video Features||4K Ultra-HD Ready with 4K playback and capture up to 60 fps (VP9, H265, H264)|
|Audio||7.1 and 5.1 surround sound pass through over HDMI
High-resolution audio playback up to 24-bit/192kHz over HDMI and USB
High-resolution audio upsample to 24-bit/192hHz over USB
|Wireless||802.11ac 2x2 MIMO 2.4 GHz and 5 GHz Wi-Fi
Two USB 3.0 (Type A)
MicroSD slot (supports 128GB cards)
IR Receiver (compatible with Logitech Harmony)
|Gaming Features||NVIDIA GRID™ streaming service
|SW Updates||SHIELD software upgrades directly from NVIDIA|
|Power||40W power adapter|
|Weight and Size||Weight: 23oz / 654g
Height: 5.1in / 130mm
Width: 8.3in / 210mm
Depth: 1.0in / 25mm
|OS||Android TV™, Google Cast™ Ready|
|In the box||NVIDIA SHIELD
NVIDIA SHIELD controller
HDMI cable (High Speed), USB cable (Micro-USB to USB)
Power adapter (Includes plugs for North America, Europe, UK)
|Requirements||TV with HDMI input, Internet access|
|Options||SHIELD controller, SHIELD remove, SHIELD stand|
Obviously the most important feature is the Tegra X1 SoC, built on an 8-core 64-bit ARM processor and a 256 CUDA Core Maxwell architecture GPU. This gives the SHIELD set-top more performance than basically any other mobile part on the market, and demos showing Doom 3 and Crysis 3 running natively on the hardware drive the point home. With integrated HEVC decode support the console is the first Android TV device to offer the support for 4K video content at 60 FPS.
Even though storage is only coming in at 16GB, the inclusion of an MicroSD card slot enabled expansion to as much as 128GB more for content and local games.
The first choice for networking will be the Gigabit Ethernet port, but the 2x2 dual-band 802.11ac wireless controller means that even those of us that don't have hardwired Internet going to our TV will be able to utilize all the performance and features of SHIELD.
As GDC progresses here in San Francisco, AMD took the wraps off of a new SDK for game developers to use to improve experiences with virtual reality (VR) headsets. Called LiquidVR, the goal is provide a smooth and stutter free VR experience that is universal across all headset hardware and to keep the wearer, be it a gamer or professional user, immersed.
AMD's CTO of Graphics, Raja Koduri spoke with us about the three primary tenets of the LiquidVR initiative. The 'three Cs' as it is being called are Comfort, Compatibility and Compelling Content. Ignoring the fact that we have four C's in that phrase, the premise is straight forward. Comfortable use of VR means there is little to no issues with neusea and that can be fixed with ultra-low latency between motion (of your head) and photons (hitting your eyes). For compatibility, AMD would like to assure that all VR headsets are treated equally and all provide the best experience. Oculus, HTC and others should operate in a simple, plug-and-play style. Finally, the content story is easy to grasp with a focus on solid games and software to utilize VR but AMD also wants to ensure that the rendering is scalable across different hardware and multiple GPUs.
To address these tenets AMD has built four technologies into LiquidVR: late data latching, asynchronous shaders, affinity multi-GPU, and direct-to-display.
The idea behind late data latching is to get the absolute most recent raw data from the VR engine to the users eyes. This means that rather than asking for the head position of a gamer at the beginning of a render job, LiquidVR will allow the game to ask for it at the end of the rendering pipeline, which might seem counter-intuitive. Late latch means the users head movement is tracked until the end of the frame render rather until just the beginning, saving potentially 5-10ms of delay.
Who Should Care? Thankfully, Many People
The Khronos Group has made three announcements today: Vulkan (their competitor to DirectX 12), OpenCL 2.1, and SPIR-V. Because there is actually significant overlap, we will discuss them in a single post rather than splitting them up. Each has a role in the overall goal to access and utilize graphics and compute devices.
Before we get into what everything is and does, let's give you a little tease to keep you reading. First, Khronos designs their technologies to be self-reliant. As such, while there will be some minimum hardware requirements, the OS pretty much just needs to have a driver model. Vulkan will not be limited to Windows 10 and similar operating systems. If a graphics vendor wants to go through the trouble, which is a gigantic if, Vulkan can be shimmed into Windows 8.x, Windows 7, possibly Windows Vista despite its quirks, and maybe even Windows XP. The words “and beyond” came up after Windows XP, but don't hold your breath for Windows ME or anything. Again, the further back in Windows versions you get, the larger the “if” becomes but at least the API will not have any “artificial limitations”.
Outside of Windows, the Khronos Group is the dominant API curator. Expect Vulkan on Linux, Mac, mobile operating systems, embedded operating systems, and probably a few toasters somewhere.
On that topic: there will not be a “Vulkan ES”. Vulkan is Vulkan, and it will run on desktop, mobile, VR, consoles that are open enough, and even cars and robotics. From a hardware side, the API requires a minimum of OpenGL ES 3.1 support. This is fairly high-end for mobile GPUs, but it is the first mobile spec to require compute shaders, which are an essential component of Vulkan. The presenter did not state a minimum hardware requirement for desktop GPUs, but he treated it like a non-issue. Graphics vendors will need to be the ones making the announcements in the end, though.
Quiet, Efficient Gaming
The last few weeks have been dominated by talk about the memory controller of the Maxwell based GTX 970. There are some very strong opinions about that particular issue, and certainly NVIDIA was remiss on actually informing consumers about how it handles the memory functionality of that particular product. While that debate rages, we have somewhat lost track of other products in the Maxwell range. The GTX 960 was released during this particular firestorm and, while it also shared the outstanding power/performance qualities of the Maxwell architecture, it is considered a little overpriced when compared to other cards in its price class in terms of performance.
It is easy to forget that the original Maxwell based product to hit shelves was the GTX 750 series of cards. They were released a year ago to some very interesting reviews. The board is one of the first mainstream cards in recent memory to have a power draw that is under 75 watts, but can still play games with good quality settings at 1080P resolutions. Ryan covered this very well and it turned out to be a perfect gaming card for many pre-built systems that do not have extra power connectors (or a power supply that can support 125+ watt graphics cards). These are relatively inexpensive cards and very easy to install, producing a big jump in performance as compared to the integrated graphics components of modern CPUs and APUs.
The GTX 750 and GTX 750 Ti have proven to be popular cards due to their overall price, performance, and extremely low power consumption. They also tend to produce a relatively low amount of heat, due to solid cooling combined with that low power consumption. The Maxwell architecture has also introduced some new features, but the major changes are to the overall design of the architecture as compared to Kepler. Instead of 192 cores per SMK, there are now 128 cores per SMM. NVIDIA has done a lot of work to improve performance per core as well as lower power in a fairly dramatic way. An interesting side effect is that the CPU hit with Maxwell is a couple of percentage points higher than Kepler. NVIDIA does lean a bit more on the CPU to improve overall GPU power, but most of this performance hit is covered up by some really good realtime compiler work in the driver.
Asus has taken the GTX 750 Ti and applied their STRIX design and branding to it. While there are certainly faster GPUs on the market, there are none that exhibit the power characteristics of the GTX 750 Ti. The combination of this GPU and the STRIX design should result in an extremely efficient, cool, and silent card.
A baker's dozen of GTX 960
Back on the launch day of the GeForce GTX 960, we hosted NVIDIA's Tom Petersen for a live stream. During the event, NVIDIA and its partners provided ten GTX 960 cards for our live viewers to win which we handed out through about an hour and a half. An interesting idea was proposed during the event - what would happen if we tried to overclock all of the product NVIDIA had brought along to see what the distribution of results looked like? After notifying all the winners of their prizes and asking for permission from each, we started the arduous process of testing and overclocking a total of 13 (10 prizes plus our 3 retail units already in the office) different GTX 960 cards.
Hopefully we will be able to provide a solid base of knowledge for buyers of the GTX 960 that we don't normally have the opportunity to offer: what is the range of overclocking you can expect and what is the average or median result. I think you will find the data interesting.
The 13 Contenders
Our collection of thirteen GTX 960 cards includes a handful from ASUS, EVGA and MSI. The ASUS models are all STRIX models, the EVGA cards are of the SSC variety, and the MSI cards include a single Gaming model and three 100ME. (The only difference between the Gaming and 100ME MSI cards is the color of the cooler.)
To be fair to the prize winners, I actually assigned each of them a specific graphics card before opening them up and testing them. I didn't want to be accused of favoritism by giving the best overclockers to the best readers!