AMD Radeon HD 7970 3GB Graphics Card Review - Tahiti at 28nm

Manufacturer: AMD

The First 28nm GPU Architecture

It is going to be an exciting 2012. Both AMD and NVIDIA are going to be bringing gamers entirely new GPU architectures, Intel has Ivy Bridge up its sleeve and the CPU side of AMD is looking forward to the introduction of the Piledriver lineup. Today though we end 2011 with the official introduction of the AMD Southern Islands GPU design, a completely new architecture from the ground up that engineers have been working on for more than three years.

This GPU will be the first on several fronts: the first 28nm part, the first cards with support for PCI Express 3.0 and the first to officially support DirectX 11.1 coming with Windows 8. Southern Islands is broken up into three different families starting with Tahiti at the high-end, Pitcairn for sweet spot gaming and Cape Verde for budget discrete options. The Radeon HD 7970 card that is launching today with availability in early January is going to be the top-end single GPU option, based on Tahiti.

Let's see what 4.31 billion transistors buys you in today's market.  I have embedded a very short video review here as well for your perusal but of course, you should continue down a bit further for the entire, in-depth review of the Radeon HD 7970 GPU.

Southern Islands - Starting with Tahiti

Before we get into benchmark results we need to get a better understanding of this completely new GPU design that was first divulged in June at the AMD Fusion Developer Summit. At that time, our own lovely and talented Josh Walrath wrote up a great preview of the architecture that remains accurate and pertinent for today's release. We will include some of Josh's analysis here and interject with anything new that we have learned from AMD about the Southern Islands architecture.

When NVIDIA introduced the G80, they took a pretty radical approach to GPU design. Instead of going with previous VLIW architectures which would support operations such as Vec4+Scalar, they went with a completely scalar architecture. This allowed a combination of flexibility of operation types, ease of scheduling, and a high utilization of compute units. AMD has taken a somewhat similar, but still unique approach to their new architecture.

Instead of going with a purely scalar setup like NVIDIA, they opted for a vector + scalar solution. The new architecture revolves around the Compute Unit, which contains all of the functional units. The CU can almost be viewed as a fully independent processor. The unit features its own L1 cache, branch and MSG unit, control and decode unit, instruction fetch arbitration functionality, and the scalar and vector units.

The vector units are the primary workers in the CU when it comes to crunching numbers. Each unit contains four cores, and allows for four “wavefronts” to be processed at any one time. Because AMD has stepped away from the VLIW5/4 architectures and gone with a vector+scalar setup, we expect to see higher utilization of each unit than in the old design. We also expect scheduling to be much easier and more efficient, which will again improve performance and efficiency. The scalar unit will actually be responsible for all of the pointer ops as well as branching code. This particular setup harkens back to the Cray supercomputers of the 1980s. The combination of scalar and vector processors was a natural fit for the workloads back then, and that carries over to the workloads of today that AMD looks to address.

The combination of these processors and the overall design of each CU gives it the properties of different types of units. It is MIMD (multiple instruction, multiple data) in that it can address four threads per cycle per vector, from different apps. It acts as SIMD (single instruction, multiple data) much like the previous generation of GPUs. Finally it has SMT (simultaneous multi-threading) in that all four vector cores can be working on different instructions, and there are 40 waves active in each CU at any one time. Furthermore, as mentioned in the slide, it supports multiple asynchronous and independent command streams. Essentially the unit is able to work on all kinds of workloads at once, no matter what the source of the data or instructions is.
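
To put the 40-waves figure in perspective, here is a rough back-of-the-envelope sketch in Python. The wave count and the four vector units come from the description above; the 64 work-items per wavefront is our own assumption and is not stated in this review, so treat the second number as illustrative only.

```python
# Figures quoted in the text above.
WAVES_PER_CU = 40         # wavefronts that can be active in one Compute Unit
VECTOR_UNITS_PER_CU = 4   # vector units ("vector cores") per Compute Unit

# Assumption (not stated in the review): 64 work-items per wavefront.
WORK_ITEMS_PER_WAVE = 64

# Waves each vector unit has to choose from when it schedules work.
waves_per_vector_unit = WAVES_PER_CU // VECTOR_UNITS_PER_CU

# Work-items that can be in flight on one CU under the assumption above.
work_items_per_cu = WAVES_PER_CU * WORK_ITEMS_PER_WAVE

print(f"Waves per vector unit: {waves_per_vector_unit}")
print(f"Work-items in flight per CU: {work_items_per_cu}")
```

Having a pool of waves per vector unit is what lets the hardware hide latency: when one wave stalls on memory, another can be issued in its place.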

A full Tahiti GPU will feature 32 of the Compute Units, each made up of 64 stream processors. That brings the entire GPU to a total of 2048 SPs, a 33% jump over the Cayman architecture that was built around 1536 processors.
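
For anyone who wants to double-check the math, a minimal Python sketch using only the numbers quoted above (the variable names are ours):

```python
# Figures quoted in the paragraph above.
COMPUTE_UNITS = 32   # Compute Units in a full Tahiti GPU
SPS_PER_CU = 64      # stream processors per Compute Unit
CAYMAN_SPS = 1536    # stream processors in Cayman

tahiti_sps = COMPUTE_UNITS * SPS_PER_CU
increase_pct = (tahiti_sps - CAYMAN_SPS) / CAYMAN_SPS * 100

print(f"Tahiti stream processors: {tahiti_sps}")
print(f"Increase over Cayman: {increase_pct:.0f}%")
```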

Memory and Caches

One area that AMD did detail extensively is the set of changes to the internal caches, as well as their push for fully virtualized memory. Each CU has its own L1 cache divided into data, instruction, and load/store. The GPU then has a shared L2 cache which is fully coherent. Each L1 cache has a 64-byte-per-clock interface with the L2, and once this scales in terms of both CU count and GPU clockspeed, we can expect to see multiple terabytes per second of bandwidth between the caches. The L1 caches and texture caches are now read/write, as compared to the read-only units in previous architectures. This is a big nod not only to efficiency and performance, but also to the type of caches needed for some serious compute type workloads.

The next level of memory support is that of full virtualization of memory with the CPU. Previous generations of products were limited to what memory and cache were onboard each video card. This not only limited graphics content, but was also problematic in compute type scenarios. Large data sets proved to be troublesome, and required a memory virtualization system which was separate from the CPU's virtual memory. Adopting x86-64 virtual memory support on the GPU gets rid of a lot of the problems seen in previous cards. The GPU shares the virtual memory space, which improves data handling and locality, and lets it gracefully survive unhappy things like page faults and oversubscriptions. This again is aimed at helping to improve the programming model. With virtual memory, the GPU’s state is not hidden, and it should also allow for fast context switches as well as context switch pre-emption. State changes and context switches can be quite costly, so when working in an environment that features both graphics based and compute workloads, the added features described above should make things go a whole lot smoother and significantly faster, thereby limiting the amount of downtime per CU.

It also opens up some new advantages to traditional graphics. “Megatextures” which will not fit on a card’s frame buffer can be stored in virtual memory. While not as fast as onboard, it is still far faster than loading up the texture from the hard drive. This should allow for more seamless worlds. I’m sure John Carmack is quite excited about this technology.

Obviously the 32 CUs on Tahiti make up the majority of the architecture, but there are several other key pieces to look at. Just as we saw with Cayman, Southern Islands will offer dual geometry engines for improved scalability as well as eight render back-ends with 32 ROP engines.

The memory interface gets a nice boost moving from the 256-bit GDDR5 interface on Cayman to a 384-bit interface capable of 264 GB/sec of bandwidth.  There are six individual 64-bit dual-channel memory controllers that create an unbalanced render back-end ratio of 4:6. 
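
The bandwidth figure follows directly from the bus width and the per-pin data rate. The quick Python sketch below works backwards from the numbers above to the implied data rate; the comparison against a 256-bit bus at that same rate is a hypothetical of ours, meant only to isolate how much of the gain comes from the wider bus.

```python
# Figures quoted in the paragraph above.
BUS_WIDTH_BITS = 384          # total width of the new memory interface
QUOTED_BANDWIDTH_GBS = 264    # GB/sec quoted for that interface
CONTROLLERS = 6               # individual memory controllers
CONTROLLER_WIDTH_BITS = 64    # width of each controller


def bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/sec: one bit per pin per transfer, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Six 64-bit controllers should add up to the full bus width.
total_width = CONTROLLERS * CONTROLLER_WIDTH_BITS

# Per-pin data rate implied by the quoted bandwidth.
implied_gbps_per_pin = QUOTED_BANDWIDTH_GBS * 8 / BUS_WIDTH_BITS

# Hypothetical comparison: a 256-bit bus running at the same per-pin rate.
narrow_bus_gbs = bandwidth_gbs(256, implied_gbps_per_pin)

print(f"Total bus width: {total_width} bits")
print(f"Implied data rate: {implied_gbps_per_pin} Gbps per pin")
print(f"256-bit bus at the same rate: {narrow_bus_gbs} GB/sec")
```

In other words, going from 256 to 384 bits buys 50% more bandwidth on its own, before any change in memory clocks.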

And of course, the architecture rounds out with the controllers for the PCI Express 3.0 interface capable of 8 GigaTransfers/second per lane (which, thanks to more efficient encoding, works out to essentially double the bandwidth of PCIe 2.0), Eyefinity display controllers, UVD engine, CrossFire compositor, etc. All of this adds up to an amazing 4.31 billion transistors on a 28nm process technology inside a 365mm² die.
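
Since 8 GT/s is not literally twice the transfer rate of the previous generation, here is a small Python sketch of where the "essentially double" comes from. The 8 GT/s figure is from the text above; the PCIe 2.0 rate of 5 GT/s and the two encoding schemes (8b/10b and 128b/130b) are taken from the PCI Express specifications rather than from this review, and the x16 slot width is our assumption for a graphics card.

```python
def lane_bandwidth_mbs(gigatransfers_per_s, payload_bits, encoded_bits):
    """Usable bandwidth of one lane, one direction, in MB/sec.

    Each transfer carries one bit on the wire; the encoding scheme decides
    how many of those bits are actual payload.
    """
    return gigatransfers_per_s * 1000 * payload_bits / encoded_bits / 8


LANES = 16  # assumption: a full x16 graphics slot

# PCIe 2.0: 5 GT/s with 8b/10b encoding (per the PCIe specification).
gen2_lane = lane_bandwidth_mbs(5, 8, 10)

# PCIe 3.0: 8 GT/s with 128b/130b encoding (per the PCIe specification).
gen3_lane = lane_bandwidth_mbs(8, 128, 130)

ratio = gen3_lane / gen2_lane

print(f"PCIe 2.0 x16: {gen2_lane * LANES / 1000:.2f} GB/sec each way")
print(f"PCIe 3.0 x16: {gen3_lane * LANES / 1000:.2f} GB/sec each way")
print(f"Gen3 over Gen2: {ratio:.2f}x")
```

The raw transfer rate only rises by 60%, but far less of it is spent on encoding overhead, which is what brings the usable bandwidth to just under double.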

You might be wondering why we don't have the typical die shot of the lovely new 28nm Tahiti GPU. In truth, I have no answer for you, other than that we asked several times and were told in each instance that they didn't have one. While this did arouse some suspicion from us as to the design of Tahiti, AMD assured us there were no tricks up its sleeve; the die was not built with 36 CUs and 4 of them disabled, for example.

February 4, 2012 | 08:28 AM - Posted by SiliconDoc (not verified)

Gee Mark, telling the truth, while calling Nvidia "turtle!" a couple of times is shilling for the eeevil "them" ?
Is this place that much pop culture red amd fanboy that you have to excuse yourself even after calling Nvidia names?
It's a sad biased world when what is patently obvious to a truthful mind must be parsed and dissed just to try to get along and not be attacked.
I'm frankly sick of it.
It's okay to say AMDATI is a copycat and came in late to superior Nvidia styled architecture and they will get beaten by Nvidia in the next release - it's OK to say it.
In fact, we've all probably heard the latter part already in leaked rumor - at which point, for those who keep up, the AMD fanboys went angry wild and screamed it was a group of Nvidia marketing team spammers who leaked $299 and faster... even though Charlie claimed he saw the benchmarks...
Well, in this case time will tell, and thanks for speaking your mind, and no thanks for calling Nvidia turtle twice, and don't apologize just because red fanboys will attack you, that is, if you can help yourself. Is it really that bad, the attacking of anyone who says a single bad word against AMD and has a good prediction for Nvidia that just makes sense based upon endless years of patterns...that you have to caveat some simple truths and cover the bases with some armor ?

December 23, 2011 | 03:04 PM - Posted by Stas (not verified)

From what I see, AMD takes over GTX580 by a ridiculous margin, being absurdly priced. Ryan, raise GTX580 gpu and memory clocks to meet HD 7970 ones, and GTX580 will show same or better performance. In the result, HD 7970 can be buried straight away.

February 4, 2012 | 08:36 AM - Posted by SiliconDoc (not verified)

That's an interesting thought. If that pans out, we shouldn't be hearing about "superior architecture" from those who promote AMD, endlessly, it seems.
However, they will just claim superior tech anyway and blame the "release" drivers and claim future releases will unleash the "true power". The next breath they will claim AMDATI no longer has driver issues and is now equivalent to Nvidia...
---
YES I say, let's see an apples clocks comparison.

December 24, 2011 | 11:13 AM - Posted by gabriel (not verified)

that makes my life even harder...

Now I don't know what to do, buy a 6990 or one 7970 and upgrade later by getting a second one, or even if I should wait for the 7990.

what do you guys think?

December 24, 2011 | 02:20 PM - Posted by Ryan Shrout

Hard to say - as I noted in my review, I almost always pick a single GPU over a dual-GPU solution if performance is close to the same. Less hassle to deal with and fewer potential problems.

December 26, 2011 | 01:30 AM - Posted by Swoosh (not verified)

Since the release of AMD's 7900 series video cards is just around the corner, it's best to wait for the release of the mighty 7990, so your options as far as future-proof gaming will last much longer, and in fact (if you are using a single full HD monitor) you may never have to upgrade your video card again, because that 7990 for sure is one helluva long, bad ass, intimidating, very very fast video card that will give you more gaming performance than you expected. And if you're into movie editing, its performance (as far as OpenCL GPU acceleration support is concerned) will give you faster renderings like never before, including 3D renderings that support OpenCL. :-)

December 24, 2011 | 01:43 PM - Posted by Swede (not verified)

Great review!

Man, is there an easy way to get your hand on gfx cards in general and get them shipped overseas? As of right now the 580's are averaging at $680 over here (in Sweden) and I'd like to upgrade from my 5850.

December 26, 2011 | 01:34 AM - Posted by Swoosh (not verified)

Upgrading from 5850 to 7970?

Well, sounds like a very good upgrade decision for me!
go for it! :-)

December 26, 2011 | 09:05 PM - Posted by nabokovfan87

try amazon .uk/.de or somewhere

They have shipped stuff to me in the US, they might do it to your location as well. especially .uk to Sweden.

December 27, 2011 | 04:14 PM - Posted by Z (not verified)

HOW DO PEOPLE HAVE REVIEWS OUT ? the PROPER drivers for PCI-E Gen3 are not even out !-?

December 29, 2011 | 08:38 AM - Posted by Anonymous (not verified)

Just wondering how much WATTS does my power supply have to be to support it?

January 2, 2012 | 04:02 AM - Posted by crunzaty (not verified)

i think minimum is 450 Not quite sure ^^

January 2, 2012 | 04:01 AM - Posted by crunzaty (not verified)

I think nVIDIA will come with a new card that will beat this and be cheaper :)

January 9, 2012 | 09:02 AM - Posted by soldierguy

Hey Ryan..you mention a future review of Eyefinity/Crossfire...For sure I'd like to see that.
Nice review thanks.

January 18, 2012 | 01:04 PM - Posted by Anonymous (not verified)

This is a great card. I was a little hesitant to fire sale my 2 MSI 6970 Lightnings (which are amazing) and buying a single 7970, but it was unwarranted. The performance, once easily overclocked (by ATI/AMD's stock overdrive), is on par with my previous setup on games that supported crossfire. The games that didn't support it, well this card just blows my previous setup out of the water in that scenario.

Another point that ALMOST made me sit on my hands for a while and wait was going from top of the line custom cards (msi lightnings)to launch day OEM cards. I've only purchased cards with after market coolers for the past 3 generations but I just couldn't wait to try out the 7970. I'm happy to say that while gaming, I don't notice a difference in fan noise. If anything, it runs cooler than my previous setup (obviously crossfire had a lot to do with it) and it may be actually quieter. While the Lightnings had an excellent HSF along with all sorts of other custom parts, the 7970's 28nm process just really nullifies the need for a monster HSF setup.

Plenty of great reviews out there on the interwebz, go read those. Just wanted to 5 star this because of the fool with the 3 star review spreading FUD.

http://www.amazon.com/gp/product/B006O714FI/ref=as_li_ss_tl?ie=UTF8&tag=...

February 24, 2012 | 07:48 PM - Posted by Danny (not verified)

just a question..

i've bought one of these cards... but i have a problem. When i start gaming the graphics card heats up so much, over 70 degrees Celsius.

Any idea what the problem is?

October 8, 2012 | 10:20 AM - Posted by Anonymous (not verified)

but will it play Wolfenstein 3D in ultra settings? Just kidding, this card rocks. Add a second card in xfire: unstoppable

July 12, 2013 | 09:52 PM - Posted by fake name forced me (not verified)

I get 10 or 20 fps of choppy, erratic performance in TF2 on w8000 in linux. A software renderer would probably be faster. The linux drivers are garbage.
