Trinity Finally Comes to the Desktop

We finally have the full review of AMD’s Trinity products.

Trinity.  Where to start?  I find myself asking that question, as the road to this release has been somewhat tortuous.  Trinity, as a product code name, came around in early 2011.  The first working silicon was shown that summer.  The first actual product release was the mobile part in late spring of this year.  Throughout the summer, notebook designs based on Trinity started to trickle out.  Today we cover the release of the desktop versions of this product.

AMD has certainly had its ups and downs when it comes to APU releases.  Their first real APU was Zacate, based on the new Bobcat CPU architecture.  This product was an unmitigated success for AMD.  Llano, on the other hand, had a pretty rocky start.  Production and various supply issues caused it to be far less of a success than hoped.  Oddly enough, these issues were not cleared up until late spring of this year.  By then mobile Trinity was out and people were looking towards the desktop version of the chip.  AMD saw the situation, and the massive supply of Llano chips that it had, and decided to delay the introduction of desktop Trinity until a later date.

To say that expectations for Trinity are high is an understatement.  AMD has been on the ropes for quite a few years in terms of CPU performance.  While the Phenom II series was at least competitive with the Core 2 Duo and Quad chips, it did not match up well against the latest i7/i5/i3 series of parts.  Bulldozer was supposed to erase the processor advantage Intel had, but it came out of the oven as a seemingly half-baked part.  Piledriver was designed to succeed Bulldozer, and is supposed to shore up the architecture to make it more competitive.  Piledriver is the basis of Trinity, and it does sport significant improvements in clockspeed, power consumption, and IPC (instructions per clock).  People are hopeful that Trinity will be able to match the performance of current Ivy Bridge processors from Intel, or at least get close.

So does it match Intel?  In some ways, I suppose.  How much better is it than Bulldozer?  That particular answer is actually a bit surprising.  Is it really that much of a step above Llano?  The answer to that question is somewhat surprising as well.  Make no mistake, Trinity for desktop is a major launch for AMD, and their continued existence as a CPU manufacturer depends heavily on this part.

Last week we showed off the basic gaming performance of Trinity, and there was more than a little bit of controversy surrounding that release.  While the gaming tests showed the A10 5800K to be head and shoulders above everything else in the world (at least in the integrated GPU world), there were those who thought that AMD was trying to dictate reviews to give readers a far too positive impression of Trinity’s performance.  Our opinion was that we wanted to get performance data out there as soon as possible, but we also warned readers that there was more to Trinity than just good graphics performance.

 

The Technology Behind Trinity

We covered Trinity this past spring when the mobile parts were released, but a quick refresher on what makes up the processor is in order.

There are many major upgrades to this latest APU as compared to the previous Llano.  The two biggest and most obvious changes are on the CPU and GPU sides.  The CPU cores use the Piledriver micro-architecture instead of the older generation “Husky” cores that served Llano and basically originated with the Athlon II series.  The graphics portion is also greatly changed, using the newer VLIW 4 architecture rather than the older VLIW 5.

Going to the Piledriver core should allow AMD a larger range of power envelopes than Llano featured.  There is much finer-grained power control throughout the design, and the overall architecture is slightly more power efficient per clock than the previous generation.  AMD also included all of the new bells and whistles when it comes to instruction set extensions, such as AVX, FMA4, FMA3, and other new operations that should improve performance.  Piledriver is a reworked Bulldozer design, and improvements under the hood help not only IPC, but also power characteristics.  Piledriver should be able to clock higher, achieve higher IPC, and still have the same or lower TDP at a given module count and clockspeed.

VLIW 4 was originally introduced with the HD 6900 series of graphics chips and features greater stream unit utilization than the previous VLIW 5 architecture when it comes to DX10 and DX11 workloads.  While there are fewer stream units overall (384 vs. 400 in Llano), the redesigned unit is able to run at a higher clockspeed and is more efficient per clock in most workloads.  The A8 3870K had a GPU clockspeed of 600 MHz, while the A10 5800K is at 800 MHz.
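To make the utilization argument a bit more concrete, here is a rough sketch of how slot packing interacts with unit count and clockspeed.  The stream unit counts and clocks come from the paragraph above.  The function name and the average of 3.4 operations packed per instruction are assumptions made purely for illustration, not measured figures for either chip.

```python
# Rough model of VLIW slot utilization.  Unit counts and clocks are from the
# article; ASSUMED_OPS_PER_BUNDLE is a hypothetical packing figure.

def vliw_effective_ops(stream_units, width, clock_mhz, avg_ops_per_bundle):
    """Return (utilization, billions of ops per second) for a VLIW GPU.

    stream_units        total ALU lanes on the GPU
    width               lanes per VLIW unit (5 for VLIW 5, 4 for VLIW 4)
    clock_mhz           GPU clock in MHz
    avg_ops_per_bundle  assumed independent ops the compiler packs per instruction
    """
    vliw_units = stream_units // width          # 400/5 = 80, 384/4 = 96
    filled = min(avg_ops_per_bundle, width)     # cannot fill more slots than exist
    utilization = filled / width
    giga_ops = vliw_units * filled * clock_mhz / 1000.0
    return utilization, giga_ops

ASSUMED_OPS_PER_BUNDLE = 3.4  # hypothetical average, for illustration only

llano_util, llano_gops = vliw_effective_ops(400, 5, 600, ASSUMED_OPS_PER_BUNDLE)
trinity_util, trinity_gops = vliw_effective_ops(384, 4, 800, ASSUMED_OPS_PER_BUNDLE)

print(f"Llano   (VLIW 5): {llano_util:.0%} of slots filled, {llano_gops:.1f} Gops/s")
print(f"Trinity (VLIW 4): {trinity_util:.0%} of slots filled, {trinity_gops:.1f} Gops/s")
```

Under that assumed packing, the narrower VLIW 4 unit leaves fewer slots idle, and the slightly smaller stream unit count actually divides into more VLIW units (96 vs. 80).  Real workloads will pack differently, so treat the output as a way to see the trend rather than as a benchmark.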

GPGPU applications should also run faster on this particular unit due to all of the internal changes and the move to VLIW 4.  AMD has been much more aggressive as of late in tuning their OpenCL performance, and Trinity can achieve a pretty impressive 763 GFLOPs of performance at the top end.  This is up from the 500 GFLOPs area of Llano.
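For a sense of where numbers like these come from, the theoretical peak of the GPU portion alone can be sketched from the stream unit counts and clocks quoted earlier.  This assumes each stream unit retires one multiply-add (counted as 2 FLOPs) per clock, which is the usual convention for peak figures.  The helper name is made up for this example.

```python
# GPU-only theoretical peak.  Assumes 2 FLOPs (one multiply-add) per stream
# unit per clock; peak_gflops is a hypothetical helper, not an AMD formula.

def peak_gflops(stream_units, clock_mhz, flops_per_unit_per_clock=2):
    """Theoretical single-precision peak in GFLOPs for the GPU portion."""
    return stream_units * flops_per_unit_per_clock * clock_mhz / 1000.0

llano_gpu = peak_gflops(400, 600)     # A8 3870K: 400 stream units at 600 MHz
trinity_gpu = peak_gflops(384, 800)   # A10 5800K: 384 stream units at 800 MHz

print(f"Llano GPU peak:   {llano_gpu:.1f} GFLOPs")
print(f"Trinity GPU peak: {trinity_gpu:.1f} GFLOPs")
print(f"Ratio:            {trinity_gpu / llano_gpu:.2f}x")
```

The GPU-only results land a bit below the totals quoted above, presumably because the quoted figures also count the CPU cores.  That is an inference on my part, and peak numbers of this kind say little about sustained OpenCL performance.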

Trinity also features an updated UVD unit (Universal Video Decoder) which offloads even more work from the CPU by natively supporting operations from MVC, DivX, and MPEG-4.  Outputs also get an upgrade as Trinity can natively support up to 3+1 monitors.  Yes, Trinity can do Eyefinity if a user wants to.

Internally, things are a lot beefier when it comes to interconnects.  There is more on-chip bandwidth than in any previous AMD APU.  This should allow better communication between the CPU cores and the GPU, as well as better utilization of the available bandwidth from main memory.

Trinity will come in two general categories for the desktop.  There will be higher end 100 watt parts and then more mainstream 65 watt units.  There will also be Trinity units branded as Athlon X4 that will not have a working graphics portion.

Turbo Core 3.0 is another upgraded function that allows the different units to speed up and slow down, depending on utilization.  If an application is CPU heavy, then the CPU will be allocated extra power and will clock into Turbo Mode, while the GPU downclocks so that the overall TDP does not go over the limit.  If an application is more GPU heavy, then that part is allowed more power and can be run at full speed while the CPU is downclocked.  This little dance happens in milliseconds on the processor and there is dedicated logic used to monitor the cores and their usage.
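The give and take described above can be pictured as splitting one fixed power budget between two consumers.  The sketch below is only a toy model: the proportional policy, the function name, and the floor wattages are all assumptions for illustration, and AMD’s actual Turbo Core logic is not described in enough detail here to reproduce.

```python
# Toy model of a shared TDP budget.  The policy and the floor values are
# hypothetical; this is not AMD's Turbo Core algorithm.

def allocate_power(tdp_watts, cpu_util, gpu_util, cpu_floor=20.0, gpu_floor=15.0):
    """Split a shared TDP between CPU and GPU in proportion to utilization.

    cpu_util and gpu_util are fractions between 0.0 and 1.0.  Each side always
    keeps its (assumed) floor; the remaining headroom follows the load.
    """
    if not (0.0 <= cpu_util <= 1.0 and 0.0 <= gpu_util <= 1.0):
        raise ValueError("utilization must be between 0.0 and 1.0")
    headroom = tdp_watts - cpu_floor - gpu_floor
    if headroom < 0:
        raise ValueError("floors exceed the TDP budget")
    total = cpu_util + gpu_util
    cpu_share = 0.5 if total == 0 else cpu_util / total
    cpu_watts = cpu_floor + headroom * cpu_share
    gpu_watts = tdp_watts - cpu_watts   # the two always sum to the TDP
    return cpu_watts, gpu_watts

# CPU-heavy load on a 100 watt part: the CPU gets most of the headroom.
cpu_heavy = allocate_power(100.0, cpu_util=0.9, gpu_util=0.1)
# GPU-heavy load: the balance shifts the other way.
gpu_heavy = allocate_power(100.0, cpu_util=0.2, gpu_util=0.8)

print("CPU-heavy:", cpu_heavy)
print("GPU-heavy:", gpu_heavy)
```

The point of the sketch is simply that neither side can boost without the other giving something up, since the total never exceeds the TDP.  The real hardware makes this decision with dedicated monitoring logic and on a millisecond timescale, as noted above.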

Trinity is fabricated by GLOBALFOUNDRIES on their 32 nm PD-SOI/HKMG process.  This is by now a very mature process with good yields and well known characteristics.  This is probably another reason why Trinity should be more efficient overall.
