It comes after 8, but before 10
As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer-friendly CPU and then promise to tell us how it all works later in the month at IDF.
Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. It does not cover the Core architecture changes; instead, it details how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or on anything surrounding the architecture of Skylake or the new Gen9 graphics system, but I wanted to share the details we found. I am sure we’ll learn more as IDF progresses this week, so I will update this story where necessary.
What Intel calls Processor Graphics is what we simply called integrated graphics for the longest time. The purpose and role of processor graphics have changed drastically over the years: it is now responsible not only for 3D graphics rendering but also for the compute, media, and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture used in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first released product to use Gen9 graphics, and it is also the first to adopt Intel’s new 3-digit naming scheme.
This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. With four traditional x86 CPU cores and one “slice” of Gen9 graphics (containing the three visible sub-slices we’ll describe below), this part is not likely to be the highest-performing implementation of the latest Intel HD Graphics technology.
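Since the slice is the unit Intel scales, a quick back-of-the-envelope calculation shows what one slice can deliver. This is a sketch under assumptions that are not stated above: 8 execution units (EUs) per sub-slice, 16 FP32 operations per EU per clock, and a 1150 MHz maximum graphics clock. Only the count of three sub-slices comes from the die shot.

```python
# Back-of-the-envelope peak FP32 throughput for Gen9 graphics.
# Only the sub-slice count comes from the die shot discussed above; the other
# constants are assumptions used as inputs, not figures stated in this article.
EUS_PER_SUBSLICE = 8           # assumed Gen9 execution units (EUs) per sub-slice
SUBSLICES_PER_SLICE = 3        # the three sub-slices visible in the die shot
FP32_FLOPS_PER_EU_CLOCK = 16   # assumed: two SIMD-4 FPUs x 4 lanes x 2 (multiply-add)
MAX_GPU_CLOCK_MHZ = 1150       # assumed maximum graphics clock of the HD Graphics 530


def slice_peak(slices: int = 1, clock_mhz: int = MAX_GPU_CLOCK_MHZ) -> tuple[int, float]:
    """Return (total EU count, peak FP32 GFLOPS) for a given number of slices."""
    eus = slices * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE
    mflops = eus * FP32_FLOPS_PER_EU_CLOCK * clock_mhz  # exact integer math in MFLOPS
    return eus, mflops / 1000


eus, gflops = slice_peak()
print(f"{eus} EUs, about {gflops:.1f} GFLOPS peak FP32")
```

Because everything scales linearly with the slice count, a hypothetical two-slice part at the same clock would double both numbers.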
Like the Intel processors before it, the Skylake design uses a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte-wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is its own agent, while the Gen9 compute architecture is treated as one complete agent. The system agent bundles the DRAM memory controller, the display controller, PCI Express, and the other I/O interfaces that communicate with the rest of the PC. Off-chip memory requests and transactions travel over this bus, while on-chip data transfers tend to be handled differently.
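Because the ring runs in both directions, a transfer can take whichever way around is shorter, and anything larger than the bus width moves in 32-byte pieces. The sketch below models only that idea. The order of agents on the ring is a hypothetical assumption, not Skylake's actual ring-stop layout.

```python
# Minimal model of a bi-directional ring interconnect: each transfer takes
# whichever direction reaches the destination agent in fewer hops.
# The agent order below is hypothetical; the real ring-stop layout may differ.
RING = ["system_agent", "core0", "core1", "core2", "core3", "gen9_graphics"]
BUS_WIDTH_BYTES = 32  # the 32-byte data bus width described above


def hops(src: str, dst: str, ring: list[str] = RING) -> tuple[int, str]:
    """Return (hop count, direction) for the shorter path around the ring."""
    n = len(ring)
    i, j = ring.index(src), ring.index(dst)
    clockwise = (j - i) % n
    counter_clockwise = (i - j) % n
    if clockwise <= counter_clockwise:
        return clockwise, "clockwise"
    return counter_clockwise, "counter-clockwise"


def transfers(num_bytes: int) -> int:
    """Number of 32-byte bus transfers needed to move num_bytes (ceiling division)."""
    return -(-num_bytes // BUS_WIDTH_BYTES)


print(hops("core0", "gen9_graphics"))
print(transfers(64))  # a 64-byte cache line
```

With six agents, no destination is ever more than three hops away, which is the main payoff of making the ring bi-directional.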
It's Basically a Function Call for GPUs
Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, the way tasks are loaded onto graphics cards changes drastically with these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what they are told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.
While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation has been known for a while, and it contributed to the meme that consoles can squeeze out all the performance they have, while PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads and then appended, whole, to the global state. It was a compromise between letting each thread create its own commands and the legacy decision to have a single, global state for the GPU.
Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
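To make the difference concrete, here is a toy model in Python. The class and method names are hypothetical and are not the Direct3D 11 API. The sketch only mirrors the idea described above: a draw call snapshots whatever is bound to one global state, worker threads record into private command lists, and a single thread replays each list, whole, against that global state.

```python
import threading


class ImmediateContext:
    """Toy stand-in for a single, global GPU state (hypothetical, not Direct3D)."""

    def __init__(self) -> None:
        self.state: dict[str, str] = {}
        self.queued_draws: list[dict[str, str]] = []

    def set(self, key: str, value: str) -> None:
        self.state[key] = value  # mutate the one global state

    def draw(self) -> None:
        # A draw call snapshots whatever is currently bound and queues it.
        self.queued_draws.append(dict(self.state))

    def execute_command_list(self, commands: list[tuple[str, str, str]]) -> None:
        # Replay a recorded list, whole, on the thread that owns the state.
        for op, key, value in commands:
            if op == "set":
                self.set(key, value)
            else:
                self.draw()


class DeferredContext:
    """Records commands on a worker thread without touching the global state."""

    def __init__(self) -> None:
        self.commands: list[tuple[str, str, str]] = []

    def set(self, key: str, value: str) -> None:
        self.commands.append(("set", key, value))

    def draw(self) -> None:
        self.commands.append(("draw", "", ""))


def record(ctx: DeferredContext, mesh: str) -> None:
    ctx.set("vertex_buffer", mesh)
    ctx.set("shader", "basic")
    ctx.draw()


immediate = ImmediateContext()
deferred = [DeferredContext() for _ in range(4)]
workers = [threading.Thread(target=record, args=(d, f"mesh{i}")) for i, d in enumerate(deferred)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Recording happened in parallel, but only this thread ever submits.
for d in deferred:
    immediate.execute_command_list(d.commands)
print(len(immediate.queued_draws))
```

The bottleneck is visible in the last loop: no matter how many threads record, submission still funnels through one owner of the global state.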
The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.
The new graphics APIs allow developers to submit their tasks faster and smarter, and they allow the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:
- AMD and NVIDIA are only a two-digit percentage apart in draw call performance
- Both AMD and NVIDIA saw an order of magnitude increase in draw calls
Going Beyond the Reference GTX 970
Zotac has been an interesting company to watch for the past few years. It has made a name for itself in the small form factor community with some really interesting designs and products. It continues down that path, but it has increasingly focused on high-quality graphics cards that address a pretty wide market, with unique products from the $40 level up through the latest GTX 980 Ti with hybrid water and air cooling for $770. The company used to focus on reference designs, but some years ago it widened its appeal by applying its own design decisions to the latest NVIDIA products.
Catchy-looking boxes for people who mostly order online! Still, nice design.
The beginning of this year saw Zotac introduce its latest “Core” brand products, which aim to bring high-end features to more modestly priced parts. The Core series makes some compromises to hit price points that are attractive to a larger swath of consumers. The cards often rely on more reference-style PCBs with good quality components and advanced cooling solutions. This equation has been used before, but Zotac is treading some new ground by offering very highly clocked cards right out of the box.
Overall Zotac has a very positive reputation in the industry for quality and support.
Plenty of padding in the box to protect your latest investment.
Zotac GTX 970 AMP! Extreme Core Edition
The product we are looking at today is the somewhat long-named AMP! Extreme Core Edition. It is based on the NVIDIA GTX 970 chip, which features 56 ROPs, 1.75 MB of L2 cache, and 1664 CUDA cores. The GTX 970 has, of course, been scrutinized heavily due to the unique nature of its memory subsystem. While it does physically have a 256-bit bus, the last 512 MB (out of 4 GB) is accessed through a significantly slower path due to shared memory controller capacity. In theory, the reference design supports up to 224 GB/sec of memory bandwidth. There are obviously some very unhappy people out there about this situation, but much of this could have been avoided if NVIDIA had disclosed the exact nature of the GTX 970 configuration.
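The 224 GB/sec figure follows directly from bus width and memory data rate. The sketch below assumes 7 Gbps GDDR5 and treats the 256-bit bus as eight 32-bit channels. It also assumes the first 3.5 GB is striped across seven channels and the last 512 MB sits behind a single channel. That split is an assumption, not something stated above.

```python
# Bandwidth arithmetic for a GDDR5 card built from 32-bit memory channels.
# Assumptions: 7 Gbps effective data rate per pin, eight 32-bit channels,
# and a 7-channel / 1-channel split between the two memory segments.
DATA_RATE_GBPS = 7
CHANNEL_WIDTH_BITS = 32


def bandwidth_gb_s(channels: int, data_rate_gbps: int = DATA_RATE_GBPS) -> float:
    """Peak bandwidth in GB/s: channels x 32 bits x per-pin rate / 8 bits per byte."""
    return channels * CHANNEL_WIDTH_BITS * data_rate_gbps / 8


full_bus = bandwidth_gb_s(8)      # the full 256-bit bus
fast_segment = bandwidth_gb_s(7)  # assumed 3.5 GB segment on seven channels
slow_segment = bandwidth_gb_s(1)  # assumed 512 MB segment on one channel
print(full_bus, fast_segment, slow_segment)
```

Under these assumptions the slow segment offers one eighth of the headline figure, which is why data landing there matters.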
Bioshock Infinite Results
Our Intel Skylake launch coverage is intense! Make sure you hit up all the stories and videos that are interesting for you!
- The Intel Core i7-6700K Review - Skylake First for Enthusiasts (Video)
- Skylake vs. Sandy Bridge: Discrete GPU Showdown (Video)
- ASUS Z170-A Motherboard Preview
- Intel Skylake / Z170 Rapid Storage Technology Tested - PCIe and SATA RAID
Today marks the release of Intel's newest CPU architecture, code-named Skylake. I already posted my full review of the Core i7-6700K processor, so if you are looking for CPU performance and specification details on that part, you should start there. What we are looking at in this story is the answer to a very simple, but also very important, question:
Is it time for gamers using Sandy Bridge systems to finally bite the bullet and upgrade?
I think you'll find that the answer depends on a few things, including your gaming resolution and your appetite for multi-GPU configurations, but even I was surprised by the differences I saw in testing.
Our testing scenario was quite simple: compare the gaming performance of an Intel Core i7-6700K processor and Z170 motherboard, running both a single GTX 980 and a pair of GTX 980s in SLI, against an Intel Core i7-2600K and Z68 motherboard using the same GPUs. I installed the latest NVIDIA GeForce drivers and the latest Intel system drivers on each platform.
| | Skylake System | Sandy Bridge System |
|---|---|---|
| Processor | Intel Core i7-6700K | Intel Core i7-2600K |
| Motherboard | ASUS Z170-Deluxe | Gigabyte Z68-UD3H B3 |
| Memory | 16GB DDR4-2133 | 8GB DDR3-1600 |
| Graphics Card | 1x GeForce GTX 980 / 2x GeForce GTX 980 (SLI) | 1x GeForce GTX 980 / 2x GeForce GTX 980 (SLI) |
| OS | Windows 8.1 | Windows 8.1 |
Our testing methodology follows our Frame Rating system, which uses a capture-based system to measure frame times at the screen (rather than trusting the software's interpretation).
If you aren't familiar with it, you should probably do a little research into our testing methodology, as it is quite different from others you may see online. Rather than using FRAPS to measure frame rates or frame times, we use a secondary PC to capture the output from the tested graphics card directly and then run post-processing on the resulting video to determine frame rates, frame times, frame variance and much more.
This amount of data can be pretty confusing if you are attempting to read it without proper background, but I strongly believe that the results we present paint a much more thorough picture of performance than other options. So please, read up on the full discussion about our Frame Rating methods before moving forward!
While there are literally dozens of files created for each “run” of benchmarks, FCAT produces several resulting graphs, and we generate several more with additional code of our own.
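FCAT's own scripts are not reproduced here. As a minimal Python sketch, assume you have already extracted one timestamp for each new frame that reached the captured display. The code shows how those timestamps turn into an average frame rate, a 99th-percentile frame time and a simple frame-to-frame delta. The delta is a rough stand-in for our frame variance metric, not its definition. The timestamps are hypothetical.

```python
import math
from statistics import mean


def frame_stats(timestamps_ms: list[float]) -> dict[str, float]:
    """Summarize display-side timestamps (ms) at which each new frame appeared."""
    if len(timestamps_ms) < 3:
        raise ValueError("need at least three timestamps")
    # Frame time = gap between consecutive frames reaching the screen.
    frame_times = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    ordered = sorted(frame_times)
    rank = math.ceil(0.99 * len(ordered))  # nearest-rank 99th percentile
    # Simple frame-to-frame change, used here as a rough smoothness indicator.
    deltas = [abs(b - a) for a, b in zip(frame_times, frame_times[1:])]
    return {
        "avg_fps": 1000 / mean(frame_times),
        "p99_frame_time_ms": ordered[rank - 1],
        "max_frame_to_frame_delta_ms": max(deltas),
    }


# Hypothetical timestamps with one hitch, for illustration only.
print(frame_stats([0.0, 16.0, 33.0, 50.0, 83.0, 100.0]))
```

The example shows why we look past averages: a respectable 50 FPS average still hides one 33 ms frame that a player would notice as a stutter.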
If you need some more background on how we evaluate gaming performance on PCs, just check out my most recent GPU review for a full breakdown.
I only had time to test four different PC titles:
- Bioshock Infinite
- Grand Theft Auto V
- GRID 2
- Metro: Last Light
A few years ago, we took our first look at the inexpensive 27" 1440p monitors that were starting to flood the market via eBay sellers located in Korea. These monitors proved to be immensely popular and are largely credited with moving a large number of gamers past 1080p.
However, in the past few months we have seen a new trend from some of these same Korean monitor manufacturers. Just like the Seiki Pro SM40UNP 40" 4K display that we took a look at a few weeks ago, the new trend is large 4K monitors.
Built around a 42-inch LG AH-IPS panel, the Wasabi Mango UHD420 is an impressive display. The inclusion of HDMI 2.0 and DisplayPort 1.2 allows you to run 4K at a full 60Hz with full 4:4:4 chroma (no chroma subsampling). At a cost of just under $800 on Amazon, this is an incredibly appealing value.
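Why the connectors matter comes down to link bandwidth. The sketch below assumes the standard CTA-861 4K60 timing (4400 x 2250 total pixels including blanking), 8 bits per color component and 8b/10b line coding on each link. Under those assumptions it checks whether uncompressed 4K60 4:4:4 fits.

```python
# Does 4K at 60Hz with full 4:4:4 chroma fit a given display link?
# Assumptions: CTA-861 4K60 timing (4400 x 2250 total pixels including blanking),
# 8 bits per color component, and 8b/10b line coding on every link listed.
H_TOTAL, V_TOTAL, REFRESH_HZ = 4400, 2250, 60
BITS_PER_PIXEL_444 = 24  # full 4:4:4: three 8-bit components per pixel

RAW_LINK_BPS = {
    "HDMI 1.4": 10_200_000_000,
    "HDMI 2.0": 18_000_000_000,
    "DisplayPort 1.2 (4 lanes HBR2)": 21_600_000_000,
}


def payload_bps(raw_bps: int) -> int:
    """Usable bit rate after 8b/10b encoding (8 data bits per 10 line bits)."""
    return raw_bps * 8 // 10


required = H_TOTAL * V_TOTAL * REFRESH_HZ * BITS_PER_PIXEL_444
fits = {name: required <= payload_bps(raw) for name, raw in RAW_LINK_BPS.items()}
print(f"4K60 4:4:4 needs {required / 1e9:.3f} Gbps")
for name, ok in fits.items():
    print(f"  {name}: {'fits' if ok else 'too slow'}")
```

Chroma subsampling such as 4:2:0 averages 12 bits per pixel instead of 24, which halves the requirement. That is how 4K60 gets squeezed through slower links at the cost of color detail.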
Whether the UHD420 is a TV or a monitor is actually quite the toss-up. The lack of a tuner might initially lead you to believe it's not a TV. The inclusion of a DisplayPort connector and a USB 3.0 hub might make you believe it's a monitor, but it's bundled with a remote control (entirely in Korean). In reality, this display could be used for either purpose (unless you rely on OTA tuning), and it really starts to blur the lines between a "dumb" TV and a monitor. You'll also find VESA 400x400mm mounting holes on this display for easy wall mounting.
... But Is the Timing Right?
Windows 10 is about to launch and, with it, DirectX 12. Apart from the massive increase in draw calls, Explicit Multiadapter, both Linked and Unlinked, has been the cause of a few pockets of excitement here and there. I am a bit concerned, though. People seem to regard this as a novel concept that gives game developers tools they've never had before. It really isn't. Depending on what you want to do with secondary GPUs, game developers could have used them for years. Years!
Before we talk about the cross-platform examples, we should talk about Mantle. It is the closest analog to DirectX 12 and Vulkan that we have. It served as the base specification for Vulkan, which the Khronos Group modified to use SPIR-V instead of HLSL, and so forth. Some claim that it was also the foundation of DirectX 12, which would not surprise me given what I've seen online and in the SDK. Allow me to show you how the API works.
Mantle is an interface that mixes Graphics, Compute, and DMA (memory access) commands into queues. This is easily done in parallel, as each thread can create commands on its own, which is great for multi-core processors. Each queue, a list of commands leading to the GPU, can be handled independently, too. An interesting side effect is that, since each device uses standard data structures, such as IEEE 754 floating-point numbers, no one cares where these queues go as long as the work is done quickly enough.
Since each queue is independent, an application can choose to manage many of them. None of these lists really needs to know what is happening to any other. As such, they can be pointed at multiple, even wildly different, graphics devices. GPUs of different models with different capabilities can work together, as long as they support the core of Mantle.
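The queue model can be sketched in a few lines of Python. This is not Mantle, Vulkan, or DirectX 12 code. The device names, the command tuple format and the doubling stand-in for "work" are all hypothetical. It shows only the structural point made above: each queue is drained independently, so different devices can be fed different kinds of work.

```python
import queue
import threading


def device_worker(name: str, commands: queue.Queue, log: list) -> None:
    """Drain one device's queue; it never looks at any other queue."""
    while (cmd := commands.get()) is not None:
        kind, value = cmd
        log.append((name, kind, value * 2.0))  # placeholder for the real work


def record_commands(kind: str, count: int) -> list:
    """Build a command list; plain data like this can be recorded on any thread."""
    return [(kind, float(i)) for i in range(count)]


# Two hypothetical devices, which could come from different vendors.
device_queues = {"gpu_a": queue.Queue(), "gpu_b": queue.Queue()}
logs = {name: [] for name in device_queues}
workers = [
    threading.Thread(target=device_worker, args=(name, q, logs[name]))
    for name, q in device_queues.items()
]
for w in workers:
    w.start()

# Graphics work goes to one GPU; a compute job (physics, say) to the other.
for cmd in record_commands("graphics", 3):
    device_queues["gpu_a"].put(cmd)
for cmd in record_commands("compute", 2):
    device_queues["gpu_b"].put(cmd)
for q in device_queues.values():
    q.put(None)  # sentinel: no more work for this device
for w in workers:
    w.join()

print({name: len(entries) for name, entries in logs.items()})
```

Nothing in `device_worker` depends on which device it serves. That independence is what lets one application point queues at mismatched hardware.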
DirectX 12 and Vulkan adopted this model so that their respective developers could use this functionality across vendors. Mantle did not invent the concept, however. What Mantle did was expose this architecture to graphics, which can make use of all the fixed-function hardware that is unique to GPUs. Prior to AMD's usage, this was how GPU compute architectures were designed. Game developers could have spun up an OpenCL workload to process physics, audio, pathfinding, visibility, or even lighting and post-processing effects... on a secondary GPU, even one from a completely different vendor.
Vista's multi-GPU bug might get in the way, but it was possible in Windows 7 and, I believe, XP too.
Fiji brings the (non-X) Fury
Last month was a big one for AMD. At E3 the company hosted its own press conference to announce the Radeon R9 300-series of graphics cards as well as the new family of products based on the Fiji GPU. It started with the Fury X, a flagship $650 graphics card with an integrated water cooler that was well received. It wasn't perfect by any means, but it was a necessary move for AMD to compete with NVIDIA on the high end of the discrete graphics market.
At the event AMD also talked about the Radeon R9 Fury (without the X) as the version of Fiji that would be taken by board partners to add custom coolers and even PCB designs. (They also talked about the R9 Nano and a dual-GPU version of Fiji, but nothing new is available on those products yet.) The Fury, priced $100 lower than the Fury X at $549, is going back to a more classic GPU design. There is no "reference" product though, so cooler and PCB designs are going to vary from card to card. We already have two different cards in our hands that differ dramatically from one another.
The Fury cuts down the Fiji GPU a bit, with fewer stream processors and texture units, but keeps most other specs the same. This includes the 4GB of HBM (High Bandwidth Memory), the 64 ROPs and even the TDP / board power. Performance is great, and it sets up an interesting comparison with the GeForce GTX 980 cards on the market. Let's dive into this review!
SLI and CrossFire
Last week I sat down with a set of three AMD Radeon R9 Fury X cards (our sampled review card as well as two retail cards purchased from Newegg) to see how the reports of pump whine noise from the cards were shaping up. I'm not going to dive into that debate again here, as I think we have covered it pretty well in that story as well as on our various podcasts, but rest assured we are continuing to look into revisions of the Fury X to see whether AMD and Cooler Master were actually able to fix the issue.
What we have to cover today is something very different, and likely much more interesting for a wider range of users. When you have three AMD Fury X cards in your hands, you of course have to do some multi-GPU testing with them. With our set I was able to run both 2-Way and 3-Way CrossFire with the new AMD flagship card and compare them directly to the comparable NVIDIA offering, the GeForce GTX 980 Ti.
There isn't much else I need to do to build up this story, is there? If you are curious how well the new AMD Fury X scales in CrossFire with two and even three GPUs, this is where you'll find your answers.
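Scaling is easiest to judge as a ratio against a single card. Below is a minimal sketch of the usual calculation: speedup over one GPU, and efficiency relative to perfect linear scaling. The frame rates in the example are hypothetical inputs, not measured results from this review.

```python
def scaling(single_gpu_fps: float, multi_gpu_fps: float, gpus: int) -> tuple[float, float]:
    """Return (speedup over one GPU, efficiency relative to perfect linear scaling)."""
    if single_gpu_fps <= 0 or gpus < 1:
        raise ValueError("need a positive baseline and at least one GPU")
    speedup = multi_gpu_fps / single_gpu_fps
    return speedup, speedup / gpus


# Hypothetical numbers for illustration only, not measured results.
speedup, efficiency = scaling(40.0, 76.0, 2)
print(f"{speedup:.2f}x speedup, {efficiency:.0%} of linear scaling")
```

With Frame Rating data, the same ratio can also be applied to percentile frame times, not just averages. That way poor frame pacing in CrossFire or SLI doesn't get hidden behind a good average.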
Introduction and Technical Specifications
In our previous article here, we demonstrated how to mod the EVGA GTX 970 SC ACX 2.0 video card to get higher performance and significantly lower running temps. Now we decided to take two of these custom modded EVGA GTX 970 cards to see how well they perform in an SLI configuration. ASUS was kind enough to supply us with one of their newly introduced ROG Enthusiast SLI Bridges for our experiments.
ASUS ROG Enthusiast SLI Bridge
Courtesy of ASUS
For the purposes of running the two EVGA GTX 970 SC ACX 2.0 video cards in SLI, we chose the 3-way variant of ASUS' ROG Enthusiast SLI Bridge so that we could run the tests with full x16 bandwidth to both cards (with the cards in PCIe 3.0 x16 slots 1 and 3 of our test board). This customized SLI adapter features a powered, red ROG logo embedded in its brushed aluminum upper surface. The adapter supports 2-way and 3-way SLI in a variety of board configurations.
Courtesy of ASUS
ASUS offers its ROG Enthusiast SLI Bridge in three sizes covering various 2-way, 3-way, and 4-way SLI configurations. All bridges feature the brushed-aluminum top cap with an embedded, glowing ROG logo.
Courtesy of ASUS
The smallest bridge supports 2-way SLI configurations with either a two- or three-slot separation. The middle-sized bridge supports up to a 3-way SLI configuration, with a two-slot separation required between each card. The largest bridge supports up to a 4-way SLI configuration, also requiring a two-slot separation between each card.
Technical Specifications (taken from the ASUS website)
| Specification | 2-WAY | 3-WAY | 4-WAY |
|---|---|---|---|
| Dimensions (L x W x H) | 97 x 43 x 21 mm | 108 x 53 x 21 mm | 140 x 53 x 21 mm |
| Weight | 70 g | 91 g | |
| Compatible GPU set-ups | 2-WAY-S & 2-WAY-M | 2-WAY-L & 3-WAY | |
| Contents | 1 x optional power cable & 2 PCBs included for varying configurations | 1 x optional power cable | 1 x optional power cable |
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be clearer in my disclaimer. This is an unconfirmed, relatively easy-to-fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K parts. That suggests the leaker got the images from somewhere without noticing those details. That, in turn, implies either that the original source was hoaxed by an anonymous source who seeded the hoax to a single media outlet, or that it was an actual leak.
Either way, enjoy my analysis, but realize that this is a single, unconfirmed source that has allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
What type of user will be given Iris Pro? Also, will graphics-free options be available below the Enthusiast class? Among Intel's processors, the high-end mainstream parts tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics; the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step toward changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high-clock, relatively low-IPC design that NetBurst was built around, and it rapidly changed the processor landscape from a dominant AMD to a runaway Intel lead.
After 65nm and Core 2 started the cycle, every new architecture alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
Retail cards still suffer from the issue
In our review of AMD's latest flagship graphics card, the Radeon R9 Fury X, I noticed and commented on the unique sound that the card was producing during our testing. A high-pitched whine, emanating from the pump of the self-contained water cooler designed by Cooler Master, was obvious from the moment our test system was powered on and remained constant during use. I talked with a couple of other reviewers about the issue before the launch of the card, and it seemed that I wasn't alone. Looking around at other reviews of the Fury X, I found that most mention this squeal specifically.
Noise from graphics cards comes in many forms. The most obvious and common is the noise from on-board fans and the air they move. Less frequent, but distinct, is inductor coil whine. Fan noise spikes when the GPU gets hot, causing the fans to spin faster and move more air across the heatsink to keep everything running cool. Coil whine changes pitch based on the frame rate (and the frequency of power delivery on the card) and can be alleviated by using higher-quality components on the board itself.
But the sound of our Fury X was unique: it was caused by the pump itself, and it was constant. The noise did not change as the load on the GPU varied. It was also 'pitchy', a whine that seemed to pierce through other sounds in the office. A close analog might be the sound of an older CRT TV or monitor that is left powered on without input.
During our review process, AMD told us the issue had been fixed. In an email sent to the media just prior to the Fury X launch, an AMD rep stated:
In regards to the “pump whine”, AMD received feedback that during open bench testing some cards emit a mild “whining” noise. This is normal for most high speed liquid cooling pumps; Usually the end user cannot hear the noise as the pumps are installed in the chassis, and the radiator fan is louder than the pump. Since the AMD Radeon™ R9 Fury X radiator fan is near silent, this pump noise is more noticeable.
The issue is limited to a very small batch of initial production samples and we have worked with the manufacturer to improve the acoustic profile of the pump. This problem has been resolved and a fix added to production parts and is not an issue.
I would disagree that this is "normal", but, taking AMD at its word, I wrote that we heard the noise but also that AMD claimed to have addressed it. Other reviewers noted the same comment from AMD saying the issue was fixed. But very quickly after launch, some users were posting videos on YouTube and on forums with the same (or worse) sounds and noise. We had already started bringing in a pair of additional Fury X retail cards from Newegg in order to do some performance testing, so it seemed like a logical next step for us to test these retail cards for pump noise as well.
First, let's get the bad news out of the way: both of the retail AMD Radeon R9 Fury X cards that arrived in our offices exhibit 'worse' noise, in the form of both whining and buzzing, compared to our review sample. In this write up, I'll attempt to showcase the noise profile of the three Fury X cards in our possession, as well as how they compare to the Radeon R9 295X2 (another water cooled card) and the GeForce GTX 980 Ti reference design - added for comparison.
A fury unlike any other...
Officially unveiled by AMD during E3 last week, the brand new Radeon R9 Fury X graphics card is finally here, and so is our review. Very few times has a product launch meant more to a company, and to its industry, than the Fury X does this summer. AMD has been lagging behind in the highest tiers of the graphics card market for a full generation. It was depending on the 2-year-old Hawaii GPU to hold its own against a continuous barrage of products from NVIDIA. The R9 290X, despite using more power, was able to keep up through the GTX 700-series days, but the release of NVIDIA's Maxwell architecture forced AMD to move the R9 200-series parts into the sub-$350 field. This is well below the selling prices of NVIDIA's top cards.
The AMD Fury X hopes to change that with a price tag of $650 and a host of new features and performance capabilities. It aims to once again put AMD's Radeon line in the same discussion with enthusiasts as the GeForce series.
The Fury X is built on the new AMD Fiji GPU, an evolutionary part based on AMD's GCN (Graphics Core Next) architecture. This design adds a lot of compute horsepower (4,096 stream processors) and it also is the first consumer product to integrate HBM (High Bandwidth Memory) support with a 4096-bit memory bus!
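A 4096-bit bus sounds extreme next to GDDR5's 256-bit or 512-bit interfaces, but HBM runs each pin far more slowly. The arithmetic is sketched below. Two data rates are assumptions not stated above: first-generation HBM at 1 Gbps per pin on Fiji, and 5 Gbps GDDR5 on the R9 290X's 512-bit bus.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth: bus width x per-pin data rate / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumptions: first-generation HBM at 1 Gbps per pin on Fiji's 4096-bit bus,
# versus 5 Gbps GDDR5 on the R9 290X's 512-bit bus.
fiji = peak_bandwidth_gb_s(4096, 1.0)
hawaii = peak_bandwidth_gb_s(512, 5.0)
print(fiji, hawaii, fiji / hawaii)
```

The design choice is "wide and slow" rather than "narrow and fast". Under these assumptions Fiji gets 1.6 times the bandwidth of Hawaii while each pin runs at a fifth of the speed.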
Of course the question is: what does this mean for you, the gamer? Is it time to start making a place in your PC for the Fury X? Let's find out.
The new Radeon R9 300-series
The new AMD Radeon R9 and R7 300-series of graphics cards are coming into the world with a rocky start. We have seen rumors and speculation about what GPUs are going to be included, what changes would be made and what prices these would be shipping at for what seems like months, and in truth it has been months. AMD's Radeon R9 290 and R9 290X based on the new Hawaii GPU launched nearly 2 years ago, while the rest of the 200-series lineup was mostly a transition of existing products in the HD 7000-family. The lone exception was the Radeon R9 285, a card based on a mysterious new GPU called Tonga that showed up late to the game to fill a gap in the performance and pricing window for AMD.
AMD's R9 300-series, and the R7 300-series in particular, follows a very similar path. The R9 390 and R9 390X are still based on the Hawaii architecture. Tahiti is finally retired and put out to pasture, though Tonga lives on as the Radeon R9 380. Below that you have the Radeon R7 370 and 360, the former based on the aging GCN 1.0 Curacao GPU and the latter based on Bonaire. On the surface it's easy to refer to these cards with the dreaded "R-word"...rebrands. And though that seems to be the case, there are some interesting performance changes, at least at the high end of this stack, that warrant discussion.
And of course, AMD partners like Sapphire are using this opportunity of familiarity with the GPU and its properties to release new product stacks. In this case, Sapphire is launching the new Nitro brand for a series of cards aimed at what it considers the most common type of gamer: one who is cost-conscious and craves performance over everything else.
The result is a stack of GPUs with prices ranging from about $110 up to ~$400 that target the "gamer" group of GPU buyers without the added price tag that some other lines include. Obviously it seems a little crazy to be talking about a line of graphics cards that is built for gamers (aren't they all??) but the emphasis is to build a fast card that is cool and quiet without the additional cost of overly glamorous coolers, LEDs or dip switches.
Today I am taking a look at the new Sapphire Nitro R9 390 8GB card, but before we dive head first into that card and its performance, let's first go over the changes to the R9-level of AMD's product stack.
Fiji: A Big and Necessary Jump
Fiji has been one of the worst kept secrets in a while. The chip has been talked about, written about, and rumored about seemingly for ages. The chip has promised to take on NVIDIA at the high end by bringing about multiple design decisions that are aimed to give it a tremendous leap in performance and efficiency as compared to previous GCN architectures. NVIDIA released their Maxwell based products last year and added to that this year with the Titan X and the GTX 980 Ti. These are the parts that Fiji is aimed to compete with.
The first product that Fiji will power is the R9 Fury X with integrated water cooling.
AMD has not been standing still, but their R&D budgets have been taking a hit as of late. The workforce has also been pared down to the bare minimum (or so I hope) while still being able to design, market, and sell products to the industry. This has affected their ability to produce as large a quantity of new chips as NVIDIA has in the past year. Cut-backs are likely not the entirety of the story, but they have certainly affected it.
The plan at AMD seems to be to focus on very important products and technologies, and then migrate those technologies to new products and lines when it makes the most sense. Last year we saw the introduction of “Tonga”, the first major redesign after the release of the GCN 1.1-based Hawaii, which powers the R9 290 and R9 390 series. Tonga delivered double the tessellation performance of Hawaii, improved overall architectural efficiency, and allowed AMD to replace the older Tahiti and Pitcairn chips with an updated unit that featured XDMA and TrueAudio support. Tonga was a necessary building block that allowed AMD to produce a chip like Fiji.
Introduction and Technical Specifications
The measure of a true modder is not how powerful they can make their system by throwing money at it, but how well they can innovate to make their components run better with what they have on hand. Some make artistic statements with truly awe-inspiring cases, while others take the Dremel and clamps to their beloved video cards in an attempt to eke out that last bit of performance. This article serves the latter of the two. Don't get me wrong, the card will look nice once we're done with it, but the point here is to reuse components on hand where possible to minimize cost while maximizing the performance (and sound) benefits.
EVGA GTX 970 SC Graphics Card
Courtesy of EVGA
We started with an EVGA GTX 970 SC card with 4GB of RAM, bundled with the new revision of EVGA's ACX cooler, ACX 2.0. This card is well built and ships with a slight factory overclock. The ACX 2.0 cooler is a redesign of the original ACX cooler, offering better cooling potential; its fans stay off until the GPU temperature exceeds 60C.
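The fan-stop behavior can be sketched as a tiny state machine. This is a minimal, hypothetical illustration, not EVGA's firmware: the 60C turn-on point comes from the description above, while the 55C turn-off point is an assumed hysteresis value added so the fans do not rapidly cycle on and off right at the threshold.

```python
from dataclasses import dataclass


@dataclass
class FanStopController:
    """Hypothetical zero-RPM fan logic with hysteresis (not EVGA's firmware)."""
    on_above_c: float = 60.0   # fans spin up once the GPU exceeds this (from the text)
    off_below_c: float = 55.0  # assumed turn-off point so fans don't cycle at 60C
    fans_on: bool = False

    def update(self, gpu_temp_c: float) -> bool:
        """Feed one temperature reading; return whether the fans should spin."""
        if not self.fans_on and gpu_temp_c > self.on_above_c:
            self.fans_on = True
        elif self.fans_on and gpu_temp_c < self.off_below_c:
            self.fans_on = False
        return self.fans_on


ctrl = FanStopController()
for temp in [40, 58, 61, 57, 54, 59]:
    print(temp, ctrl.update(temp))
```

With these assumed values, the fans keep spinning at 57C after having started at 61C, and only stop once the reading drops below 55C.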
Courtesy of EVGA
WATERCOOL HeatKiller GPU-X3 Core GPU Waterblock
Courtesy of WATERCOOL
For water cooling the EVGA GTX 970 SC GPU, we decided to use the WATERCOOL HeatKiller GPU-X3 Core water block. This block features a POM-based body with a copper core for superior heat transfer from the GPU to the liquid medium. The HeatKiller GPU-X3 Core block is a GPU-only cooler, meaning that the memory and integrated VRM circuitry will not be actively cooled by the block. The decision to use a GPU-only block rather than a full-cover block came down to two factors: availability and cost. I had a few of these on hand, which made it an easy decision cost-wise.
Digging into a specific market
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
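The lockstep trade-off can be illustrated with a toy Python simulation. It is a sketch under simplifying assumptions, not how any real GPU schedules work: the group shares one instruction stream, so when lanes disagree on a branch the whole group must step through both paths with the inactive lanes masked off. The function `run_lockstep` and the pixel values are hypothetical.

```python
from collections.abc import Callable


def run_lockstep(values: list[float],
                 cond: Callable[[float], bool],
                 then_fn: Callable[[float], float],
                 else_fn: Callable[[float], float]) -> tuple[list[float], int]:
    """Toy model of one work group whose lanes share a single instruction stream.

    Returns per-lane results and the number of branch passes the group needed:
    1 if every lane takes the same path, 2 if the lanes diverge.
    """
    mask = [cond(v) for v in values]  # which lanes take the 'then' path
    results = [0.0] * len(values)
    passes = 0
    if any(mask):  # group executes the 'then' path; other lanes sit idle
        passes += 1
        for i, v in enumerate(values):
            if mask[i]:
                results[i] = then_fn(v)
    if not all(mask):  # group executes the 'else' path; 'then' lanes sit idle
        passes += 1
        for i, v in enumerate(values):
            if not mask[i]:
                results[i] = else_fn(v)
    return results, passes


def brighten(v: float) -> float:
    return min(1.0, v * 1.5)


def darken(v: float) -> float:
    return v * 0.5


coherent = [0.20, 0.22, 0.21, 0.23]   # neighboring pixels with similar inputs
divergent = [0.20, 0.90, 0.21, 0.95]  # neighbors that disagree on the branch

_, coherent_passes = run_lockstep(coherent, lambda v: v > 0.5, brighten, darken)
divergent_out, divergent_passes = run_lockstep(divergent, lambda v: v > 0.5, brighten, darken)
print(coherent_passes, divergent_passes)
```

Similar neighboring pixels finish in one pass, while a mixed group pays for both branches. That is why chaining cores into lockstep groups only saves complexity when the assumption about nearby work actually holds.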
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. It is AMD's sixth-generation APU, counting from Llano's release in June 2011. For this launch, Carrizo targets the 15W and 35W power envelopes in $400-$700 USD notebooks. AMD needed to increase efficiency on the same 28nm process that has been in its product stack since Kabini and Temash were released in May 2013. It tasked its engineers with optimizing the APU's design for these constraints, which led to dense architectures and clever features within the same complexity budget, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was quite a clever design that didn't reach its potential because of external factors. As I said at the start of this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, each pair of cores shares a single floating-point unit (FPU). While you need some floating-point capacity, upcoming workloads could hand heavy math to the GPU, which sits right there on the same die, for a massive increase in performance. As such, the complexity that would be dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.
When NVIDIA launched the GeForce GTX Titan X back in March of this year, I knew immediately that the GTX 980 Ti would be close behind. The Titan X was so different from the GTX 980 in pricing and memory capacity (12GB, really??) that NVIDIA had set up the perfect gap in which to place the newly minted GTX 980 Ti. Today we get to take the wraps off of that new graphics card and I think you'll be impressed with what you find, especially when you compare its value to the Titan X.
Based on the same Maxwell architecture and GM200 GPU, with some minor changes to GPU core count, memory size and boost clocks, the GTX 980 Ti finds itself in a unique spot in the GeForce lineup. Performance-wise it's basically identical to the GTX Titan X in real-world game testing, yet it is priced $350 less than that 12GB behemoth. Couple that with a modest $50 price drop on the GTX 980 and you have all the markers of an enthusiast graphics card that will sell as well as any we have seen in recent generations.
The devil is in all the other details, of course. AMD has its own plans for this summer but the Radeon R9 290X is still sitting there at a measly $320, undercutting the GTX 980 Ti by more than half. NVIDIA seems to be pricing its own GPUs as if it isn't even concerned with what AMD and the Radeon brand are doing. That could be dangerous if it goes on too long, but for today, can the R9 290X put up enough fight with the aging Hawaii XT GPU to make its value case to gamers on the fence?
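The price comparisons above are simple arithmetic. This sketch assumes NVIDIA's launch prices of $999 for the Titan X and $649 for the GTX 980 Ti (figures not stated in this text) alongside the $320 R9 290X price quoted above.

```python
# Prices in USD. $999 and $649 are assumed NVIDIA launch prices (not stated in
# the text); $320 is the R9 290X price quoted in the article.
prices = {
    "GeForce GTX Titan X": 999,
    "GeForce GTX 980 Ti": 649,
    "Radeon R9 290X": 320,
}

# Difference between the two GM200-based cards.
titan_premium = prices["GeForce GTX Titan X"] - prices["GeForce GTX 980 Ti"]

# Fraction of a 980 Ti's price the R9 290X costs; below 0.5 means it
# undercuts the 980 Ti by more than half.
r9_290x_fraction = prices["Radeon R9 290X"] / prices["GeForce GTX 980 Ti"]

print(f"Titan X premium over GTX 980 Ti: ${titan_premium}")
print(f"R9 290X costs {r9_290x_fraction:.1%} of a GTX 980 Ti")
```

Under those assumed prices, the $350 gap and the "more than half" undercut both check out, though only narrowly in the second case.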
Will the GeForce GTX 980 Ti be the next high-end GPU to make a splash in the market, or will it make a thud at the bottom of the GPU gene pool? Let's dive into it, shall we?
Big Things, Small Packages
Sapphire isn’t a brand we have covered in a while, so it is nice to see a new and interesting product drop on our door. Sapphire was a relative unknown until around the Radeon 9700 Pro days. That was around the time ATI decided it did not want to be so vertically integrated, and so allowed other companies to buy its chips and make their own cards. This was done to provide a bit of stability for ATI pricing, as it no longer had to worry about a volatile component market that could cause its margins to plummet. By selling just the chips to partners, ATI could more predictably control margins on its own product while allowing partners to make their own deals and component choices for the finished card.
ATI had very limited graphics card production of its own, so it would often farm out production to second sources. One of these sources ended up turning into Sapphire. When ATI finally allowed other partners to produce and brand their own ATI-based products, Sapphire had a leg up on the competition because it was already a large producer of ATI products. It soon controlled a good portion of the market through its contacts, pricing, and close relationship with ATI.
Since then ATI has been bought by AMD, and AMD no longer produces any ATI-branded cards. Going vertical when it comes to producing both chips and video cards was obviously a bad idea; we can look back at 3dfx's attempt at vertical integration and how that ended for the company. AMD produces an initial reference version of its cards and coolers, but allows its partners to sell the “sticker” version and then develop their own designs. This model has worked very well for both NVIDIA and AMD, and it has allowed their partners to further differentiate their products from the competition.
Sapphire usually does a bang-up job packaging the graphics card. Oh look, a mousepad!
Sapphire is not as big a player as it used to be, but it is still one of AMD's primary partners. It would not surprise me in the least if it still produced the reference designs for AMD and then distributed those products to other partners. Sapphire is known for building very good quality cards, and its cooling solutions have been well received. The company does have some stiff competition from the likes of Asus, MSI, and others in this particular market. Unlike those two companies, Sapphire does not make any NVIDIA-based boards. This has been both a blessing and a curse, depending on where the AMD/NVIDIA cycle stands and who has dominance in any particular market segment.
High Bandwidth Memory
UPDATE: I have embedded an excerpt from our PC Perspective Podcast that discusses the HBM technology that you might want to check out in addition to the story below.
The chances are good that if you have been reading PC Perspective, or almost any other website that focuses on GPU technologies, for the past year, you have read the acronym HBM. You might have even seen its full name: high bandwidth memory. HBM is a new technology that aims to turn the way a processor (GPU, CPU, APU, etc.) accesses memory upside down, almost literally. AMD has already publicly stated that its next-generation flagship Radeon GPU will use HBM as part of its design, but it wasn’t until today that we could talk about what HBM actually offers to a high-performance processor like Fiji. At its core, HBM drastically changes how the memory interface works, how much power it requires, and what metrics we will use to compare competing memory architectures. AMD and its partners started working on HBM with the industry more than 7 years ago, and with the first retail product nearly ready to ship, it’s time to learn about HBM.
We got some time with AMD’s Joe Macri, Corporate Vice President and Product CTO, to talk about AMD’s move to HBM and how it will shift the direction of AMD products going forward.
The first step in understanding HBM is to understand why it’s needed in the first place. Current GPUs, including the AMD Radeon R9 290X and the NVIDIA GeForce GTX 980, utilize a memory technology known as GDDR5. This architecture has scaled well over the past several GPU generations but we are starting to enter the world of diminishing returns. Balancing memory performance and power consumption is always a tough battle; just ask ARM about it. On the desktop component side we have much larger power envelopes to work inside but the power curve that GDDR5 is on will soon hit a wall, if you plot it far enough into the future. The result will be either drastically higher power consuming graphics cards or stalling performance improvements of the graphics market – something we have not really seen in its history.
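To make the bandwidth side concrete, peak memory bandwidth is roughly bus width times per-pin data rate, divided by eight to convert bits to bytes. The configurations below use published specifications that are not given in this text (a 512-bit, 5 Gbps GDDR5 bus for the R9 290X, a 256-bit, 7 Gbps bus for the GTX 980, and four 1024-bit HBM stacks at 1 Gbps per pin for Fiji), so treat them as assumptions for illustration. The sketch ignores power entirely, which is the other half of the argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    name: str
    bus_width_bits: int
    data_rate_gbps: float  # per-pin transfer rate in Gbit/s (assumed spec values)

    def peak_bandwidth_gbs(self) -> float:
        # Total Gbit/s across the bus, divided by 8 to get GB/s.
        return self.bus_width_bits * self.data_rate_gbps / 8


configs = [
    MemoryConfig("R9 290X (GDDR5)", 512, 5.0),
    MemoryConfig("GTX 980 (GDDR5)", 256, 7.0),
    MemoryConfig("Fiji (HBM, 4 stacks x 1024-bit)", 4 * 1024, 1.0),
]

for c in configs:
    print(f"{c.name}: {c.peak_bandwidth_gbs():.0f} GB/s")
```

The comparison shows the basic design choice: HBM reaches higher total bandwidth with a much lower per-pin rate by going very wide, and that lower signaling rate is where much of the expected power saving comes from.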
While it’s clearly possible that current, and maybe even next-generation, GPU designs could continue to depend on GDDR5 as the memory interface, a move to a different solution will be needed in the future; AMD is just making the jump earlier than the rest of the industry.
DirectX 12 Has No More Secrets
The DirectX 12 API is finalized and the last of its features are known. Before the BUILD conference, the list consisted of Conservative Rasterization, Rasterizer Ordered Views, Typed UAV Load, Volume Tiled Resources, and a new Tiled Resources revision for non-volumetric content. When the GeForce GTX 980 launched, NVIDIA claimed it would be compatible with DirectX 12 features. Enthusiasts were skeptical, because Microsoft had not officially finalized the spec at the time.
Last week, Microsoft announced the last feature of the graphics API: Multiadapter.
We already knew that Multiadapter existed, at least to some extent. It is the part of the specification that allows developers to address multiple graphics adapters and split tasks between them. In DirectX 11 and earlier, secondary GPUs would remain idle unless the graphics driver sprinkled some magic fairy dust on them with SLI, CrossFire, or Hybrid CrossFire. The only other way to access this dormant hardware was to spin up an OpenCL (or similar compute API) context on the side.
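As an illustration of the kind of work-splitting Multiadapter leaves to the developer, here is a hypothetical split-frame partition in Python that divides a frame's rows between adapters in proportion to assumed throughput weights. It is not DirectX 12 code: a real application would enumerate adapters through DXGI, create a D3D12 device for each, and handle cross-adapter synchronization and copies, none of which is shown. The function name and weights are illustrative.

```python
def split_rows(frame_height: int, weights: list[float]) -> list[range]:
    """Hypothetical split-frame partition: give each adapter a contiguous band
    of rows proportional to its assumed relative throughput weight."""
    if frame_height < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need a non-negative height and positive weights")
    total = sum(weights)
    bounds = [0]
    running = 0.0
    for w in weights:
        running += w
        bounds.append(round(frame_height * running / total))
    bounds[-1] = frame_height  # guard against rounding drift on the last band
    return [range(bounds[i], bounds[i + 1]) for i in range(len(weights))]


# Example: a discrete GPU assumed to be about 3x faster than the integrated GPU.
bands = split_rows(1080, [3.0, 1.0])
for name, rows in zip(["discrete", "integrated"], bands):
    print(f"{name}: rows {rows.start}-{rows.stop - 1} ({len(rows)} rows)")
```

A fixed proportional split is the simplest possible policy; an engine targeting mismatched adapters would more likely rebalance the bands each frame based on measured timings.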