hUMA has come with a weapon to slay the memory latency dragon

Subject: General Tech | April 30, 2013 - 01:23 PM |
Tagged: Steamroller, piledriver, Kaveri, Kabini, hUMA, hsa, GCN, bulldozer, APU, amd

AMD may have united the GPU and CPU into the APU, but one hurdle had remained until now: the non-uniformity of memory access between the two processors.  Today we learned about one of the first successful HSA projects, called Heterogeneous Uniform Memory Access (hUMA), which will appear in the upcoming Kaveri chip family.  This new technology will allow the on-die CPU and GPU to access the same memory pool, both physical and virtual, and any data passed between the two processors will remain coherent.  As The Tech Report mentions in their overview, hUMA will not provide as much of a benefit to discrete GPUs; while they will be able to share address space, the widely differing clock speeds of GDDR5 and DDR3 prevent unification to the level of an APU.

Make sure to read Josh's take as well so you can keep up with him on the Podcast.


"At the Fusion Developer Summit last June, AMD CTO Mark Papermaster teased Kaveri, AMD's next-generation APU due later this year. Among other things, Papermaster revealed that Kaveri will be based on the Steamroller architecture and that it will be the first AMD APU with fully shared memory.

Last week, AMD shed some more light on Kaveri's uniform memory architecture, which now has a snazzy marketing name: heterogeneous uniform memory access, or hUMA for short."


Author:
Subject: Processors
Manufacturer: AMD

Heterogeneous Uniform Memory Access

 

Several years back we first heard of AMD’s plans to create a uniform memory architecture that would allow the CPU to share address spaces with the GPU.  The promise here is a very efficient architecture that provides excellent performance in a mixed environment of serial and parallel programming loads.  When GPU computing came on the scene it was full of great promise.  The idea of a heavily parallel processing unit that accelerates both integer and floating point workloads could be a potential gold mine in a wide variety of applications.  Alas, the results we have seen so far have not lived up to that promise.  There are many problems with combining serial and parallel workloads between CPUs and GPUs, and a lot of this has to do with very basic programming and the communication of data between two separate memory pools.


CPUs and GPUs do not share common memory pools.  Instead of using pointers to tell each unit where data is stored in memory, the current implementation of GPU computing requires the CPU to write the contents of that address into the GPU's standalone memory pool.  This is time consuming and wastes cycles.  It also makes programming more complex, since code has to account for these copies.  Typically only very advanced programmers with a lot of expertise in this subject could write effective code that takes these limitations into consideration.  The lack of unified memory between CPU and GPU has hindered the adoption of GPU computing in many applications that could potentially use the massively parallel processing capabilities of a GPU.
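To make the copy-versus-pointer distinction concrete, here is a toy Python model, a sketch under stated assumptions rather than anything from AMD's toolchain. The `DiscreteGpuModel` and `HumaApuModel` classes and the `double_bytes` "kernel" are hypothetical, and counting bytes copied is only a stand-in for the time and cycles lost shuttling data between separate pools.

```python
from dataclasses import dataclass, field


def double_bytes(buf: bytearray) -> None:
    """Hypothetical stand-in 'kernel': doubles every byte in place (mod 256)."""
    for i, value in enumerate(buf):
        buf[i] = (value * 2) % 256


@dataclass
class DiscreteGpuModel:
    """Hypothetical model: the GPU owns a memory pool separate from the CPU's."""
    pool: dict = field(default_factory=dict)
    bytes_copied: int = 0

    def run(self, host_buf: bytearray) -> None:
        # 1. The CPU writes the buffer's contents into the GPU's own pool.
        device_buf = bytearray(host_buf)
        self.pool["work"] = device_buf
        self.bytes_copied += len(host_buf)
        # 2. The kernel only ever sees the GPU-side copy.
        double_bytes(device_buf)
        # 3. Results are copied back so the CPU can see them.
        host_buf[:] = device_buf
        self.bytes_copied += len(device_buf)


@dataclass
class HumaApuModel:
    """Hypothetical model: CPU and GPU share one coherent address space."""
    bytes_copied: int = 0  # stays 0: only a reference (a "pointer") is handed over

    def run(self, host_buf: bytearray) -> None:
        # The GPU works on the CPU's buffer directly; no staging copies.
        double_bytes(host_buf)


if __name__ == "__main__":
    for model in (DiscreteGpuModel(), HumaApuModel()):
        data = bytearray(range(256)) * 4  # 1,024 bytes of sample data
        model.run(data)
        print(type(model).__name__, "bytes copied:", model.bytes_copied)
```

In this model the discrete path copies every byte twice (once in, once out) while the shared-pool path copies nothing; real hardware also pays latency for those staging copies, which the toy ignores.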

The idea of GPU compute has been around for a (comparatively) long time.  I still remember getting very excited about using a high end video card alongside a card like the old GeForce 6600 GT as a coprocessor to handle heavy math operations and PhysX.  That particular plan never quite came to fruition, but the idea was planted years before the actual introduction of modern DX9/10/11 hardware.  This step with hUMA could provide real impetus for a wide range of applications to actively utilize the GPU portion of an APU.


Author:
Subject: Motherboards
Manufacturer: ASUS

AM3+ Last Gasp?

 

Over the past several years I have reviewed quite a few Asus products.  The ones that typically grab my attention are the ROG based units.  These are usually the most interesting, over the top, and expensive products in their respective fields.  Ryan has reviewed the ROG graphics cards, and they have rarely disappointed.  I have typically taken a look at the Crosshair series of boards that support AMD CPUs.


Crosshair usually stands for the “best of the best” when it comes to features and power delivery.  My first brush with these boards was the Crosshair IV.  That particular model was only recently taken out of my primary work machine.  It proved itself to be an able performer and lasted for years (even overclocked).  The Crosshair IV Extreme featured the Lucid Hydra chip to allow multi-GPU performance without going to pure SLI or Crossfire.  The Crosshair V dropped Lucid, added official SLI support, and incorporated the Supreme FX II X-Fi audio.  All of these boards have some things in common.  They are fast, they overclock well, and they are among the most expensive motherboards ever made for the AMD platform.

So what is there left to add?  The Crosshair V is a very able platform for Bulldozer and Piledriver based parts.  AMD is not updating the AM3+ chipsets, so we are left with the same 990FX northbridge and the SB950 southie (both of which are essentially the same as the 890FX/SB850).  It should be a simple refresh, right?  Piledriver was released a few months ago, so Asus could implement some power and BIOS tweaks and call it a rebranded board.  Sounds logical, right?  Well, thankfully for us, Asus did not follow that path.

The Asus Crosshair V Formula Z is a fairly radical redesign of the previous generation of products.  The amount of extra features, design changes, and power characteristics make it a far different creature than the original Crosshair V.  While both share many of the same style features, under the skin this is a very different motherboard.  I am rather curious why Asus did not brand this as the “Crosshair VI”.  Let’s explore, shall we?


Welcome Richland, another refined die from AMD

Subject: Processors | March 12, 2013 - 02:52 PM |
Tagged: VLIW4, trinity, Richland, piledriver, notebook, mobile, hd 8000, APU, amd, A10-5750

The differences between Richland and Trinity are not earth shattering, but AMD has certainly implemented some refinements in the A10-5750M.  One very noticeable change is support for DDR3-1866, along with better power management for both the CPU and GPU; new temperature balancing algorithms and measurements improve the chip's ability to balance the load compared to Trinity.  Many AMD users will be more interested in the GPU portion of the die than the CPU, as that is where AMD actually has a lead on Intel.  This particular chip contains the HD 8650G, clocked at 533MHz base and 720MHz boost, an increase of 37MHz and 35MHz respectively over the previous generation.  You can read more about the other three models that will be released over at The Tech Report.

Don't forget Josh either!


"AMD has formally introduced the first members of its Richland APU family. We have the goods on the chips and Richland's new power management tech, which combines temperature-based inputs with bottleneck-aware clock boosting."


Author:
Subject: Processors
Manufacturer: AMD

AMD Exposes Richland

When we first heard about “Richland” last year, there was a bit of excitement.  Not many were sure what to expect other than a faster “Trinity” based CPU with a couple of extra goodies.  Today we finally get to see what Richland is.  While interesting, it is not necessarily exciting, and while it is an improvement, it will not take AMD over the top in the mobile market.  What it actually brings to the table is better competition and a software suite that could help convince buyers to choose AMD instead of a competing Intel part.

From a design standpoint, it is nearly identical to the previous Trinity.  That being said, a modern processor is not exactly simple.  A lot of software optimizations can be applied to these products to increase performance and efficiency.  It seems that AMD has done exactly that.  We had heard rumors that the graphics portion was in fact changed, but it looks like it has stayed the same.  Process improvements have been made, but that is about the extent of actual hardware changes to the design.


The new Richland APUs are branded the A-5000 series of products.  The top end is the A10-5750M with HD 8650G integrated graphics.  This is still the VLIW-4 based graphics unit seen in the previous Trinity products, but enough changes have been made in software that it can enable Dual Graphics with the new Solar System based GPUs (GCN).  The speeds of these products have received a nice boost.  Compared to the previous top end A10-4600M, the 5750M raises the base speed from 2.3 GHz to 2.5 GHz, and boost goes from 3.2 GHz up to 3.5 GHz.  The graphics portion raises the base clock from 496 MHz to 533 MHz, while turbo mode improves over the 4600M from 685 MHz to 720 MHz.  These are not staggering figures, but it all still fits within the 35 watt TDP of the previous product.
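The clock bumps quoted above are easy to put in percentage terms. A quick Python sketch using only the figures in this paragraph (the `uplift` helper and `CLOCKS_MHZ` table are just for illustration):

```python
# Clock speeds as quoted above, in MHz: (A10-4600M Trinity, A10-5750M Richland).
CLOCKS_MHZ = {
    "CPU base": (2300, 2500),
    "CPU boost": (3200, 3500),
    "GPU base": (496, 533),
    "GPU turbo": (685, 720),
}


def uplift(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain, percentage gain) going from old to new."""
    gain = new - old
    return gain, 100.0 * gain / old


if __name__ == "__main__":
    for label, (old, new) in CLOCKS_MHZ.items():
        gain, pct = uplift(old, new)
        print(f"{label:9s}: {old} -> {new} MHz (+{gain} MHz, +{pct:.1f}%)")
```

By these numbers the CPU clocks gain roughly 9 percent, while the GPU gains about 7.5 percent at base and about 5 percent at turbo, all within the same 35 watt TDP.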


One other important improvement is the ability to utilize DDR3-1866 memory.  Throughout the past year we have seen memory densities increase fairly dramatically without impacting power consumption, and the same goes for speed.  While we would expect lower power DIMMs to be used in the thin and light categories, expect to see faster DDR3-1866 in the larger notebooks that will soon be heading our way.
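Why faster memory matters for an APU: the integrated GPU shares system memory with the CPU, so peak bandwidth sets a ceiling on graphics throughput. Here is a back-of-the-envelope Python sketch; it assumes a dual-channel controller with 64-bit channels and compares against DDR3-1600, neither of which is stated in this article, and the `peak_bandwidth_gbs` helper is hypothetical.

```python
# Assumption: one 64-bit DDR3 channel moves 8 bytes per transfer.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes/s).

    Defaults to an assumed dual-channel memory controller; real-world
    throughput is always lower than this ceiling.
    """
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


if __name__ == "__main__":
    for speed in (1600, 1866):
        print(f"DDR3-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak (dual channel)")
```

Under those assumptions DDR3-1866 raises the theoretical peak from 25.6 GB/s to about 29.9 GB/s, roughly a 17 percent increase; measured throughput will be lower.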


Podcast #226 - Dual GTX 690 System from Origin, Intel's new SATA6 controller, Piledriver-based Opterons and more!

Subject: General Tech | November 8, 2012 - 01:33 PM |
Tagged: ssd, sata6, podcast, piledriver, pcper, origin, opeteron, nvidia, Intel, genesis, corsair, amd, 690

PC Perspective Podcast #226 - 11/08/2012

Join us this week as we talk about a Dual GTX 690 System from Origin, Intel's new SATA6 controller, Piledriver-based Opterons and more!

You can subscribe to us through iTunes, or you can access it directly through the RSS page.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

This Podcast is brought to you by MSI!

Program length: 1:21:17

Podcast topics of discussion:

  1. Join us for the MoH Game Stream!
  2. Week in Reviews:
    1. 0:04:30 Corsair Vengeance C70 Case
    2. 0:07:30 ASUS P8Z77 WS Motherboard
    3. 0:12:20 ORIGIN Genesis Dual GTX 690 System
    4. 0:16:40 Silverstone 450 watt SFX Power Supply
  3. 0:19:30 This podcast is brought to you by MSI
  4. News items of interest:
    1. 0:20:25 Intel Crystal Forest Communications Platform
    2. 0:23:30 Google Nexus 10 tablet
    3. 0:27:00 Corsair Hydro H100i and H80i coolers
    4. 0:34:00 New Corsair AXi series power supplies
    5. 0:36:30 Intel DC S3700 Enterprise SSD
    6. 0:46:30 AMD Launches Piledriver based Opteron 6300 chips
    7. 0:51:10 Get Assassin's Creed III for Samsung SSD
    8. 0:52:45 Limited Linux Steam Beta starts
    9. 0:56:15 Zotac AD06 with new AMD APU
    10. 0:58:30 Mouse... DRM!?
  5. Closing:
    1. Hardware / Software Pick of the Week
      1. Ryan: Corsair Vengeance MM200 and MM400 Mouse Mats
      2. Jeremy: Movember and Is this thing on or did it crash? or NewEgg
      3. Josh: Everyone needs a mouse
      4. Allyn: Shure SE315-CL and CBL-M+-K
  1. 1-888-38-PCPER or podcast@pcper.com
  2. http://pcper.com/podcast
  3. http://twitter.com/ryanshrout and http://twitter.com/pcper
  4. Closing/outro

Be sure to subscribe to the PC Perspective YouTube channel!!

 

Finally something new for the server team from AMD

Subject: General Tech, Processors | November 6, 2012 - 01:30 PM |
Tagged: piledriver, opteron 6300, amd, Abu Dhabi


Low power, high density server designs are very important, but it is nice to see updates to the more powerful server processors as well, something quite rare so far in 2012.  AMD has finally released their Opteron 6300 family, with ten members sporting between 4 and 16 cores and all with maximum turbo speeds above 3GHz.  We don't have any reviews to offer, so the only performance benchmarks are from AMD's press releases, but you can expect more than just an increase in frequency, as this is a Piledriver based chip.  The Register has put together a high level overview of the new Opterons, or you can head on over to AMD to check out the information on offer there.  Cray is already shipping servers based on these chips, with Dell and HP releasing a variety of servers in the near future.


"Customers using big ol' fat x86 servers didn't have much to jump for joy about this year. There just isn't a lot going on. But to make things interesting, AMD is now goosing the performance of its top-end parts with the launch of its "Abu Dhabi" Opteron 6300s, which sport the "Piledriver" cores that already debuted in the FX Series of high-end desktop chips."


Source: The Register

AMD Launches Piledriver-based Opteron 6300 Server Chips

Subject: Processors | November 6, 2012 - 01:15 PM |
Tagged: server, piledriver, opteron, datacenter, cpu, amd

AMD announced new server processors on Monday based on the same Piledriver architecture used in the Trinity APUs and Vishera desktop CPUs we recently reviewed. With the release of the Opteron 6300 series, AMD is bringing Piledriver to the server room.

The new chips, like their desktop counterparts, bring several performance improvements over the previous-generation 6200 series Opterons based on the Bulldozer architecture. AMD is positioning the chips as an upgrade path for existing servers and on the merits of performance-per-dollar efficiency. As is AMD's fashion, the new chips are competitively priced and "good enough" performance-wise. With the 6300 series, AMD has stated that the goal is to reduce the TCO, or Total Cost of Ownership, of servers used in data centers, supercomputers, and enterprises, both by remaining compatible with existing AMD server platforms through a BIOS upgrade and by improving efficiency over previous chips.


The Opteron 6300 series CPUs build upon the Vishera desktop parts by adding more cores and more L3 cache. The server parts will have up to 16 cores clocked at 2.8GHz base and 3.2GHz turbo. They will have TDP ratings between 85W and 140W and prices from $500 to $1,400. On the cache front, the chips have a 16KB L1 data cache per core, 64KB L1 instruction cache per module, 1MB L2 cache per core, and a shared 16MB L3 cache per socket. AMD has included a quad channel memory controller that supports DDR3 up to 1866 MHz and up to 1.5TB per server in 4P configurations. AMD has rounded out the chips with four x16 HyperTransport 3.0 links rated at 6.4 GT/s per link. Up to 4 processors per server will be supported, which means a maximum of 64 cores.
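The platform figures above combine into a few per-server and per-socket totals. This Python sketch uses the quoted specs for the top 16-core parts; the assumptions that each DDR3 channel is 64 bits wide and that the 1.5TB maximum (taken as 1,536GB) splits evenly across four sockets are ours, and the `OpteronPlatform` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpteronPlatform:
    """Illustrative container for the Opteron 6300 figures quoted above."""
    cores_per_socket: int = 16
    max_sockets: int = 4
    l2_mb_per_core: int = 1
    l3_mb_per_socket: int = 16
    memory_channels: int = 4
    ddr3_mt_per_s: int = 1866
    max_memory_tb_4p: float = 1.5

    def max_cores(self) -> int:
        return self.cores_per_socket * self.max_sockets

    def cache_mb_per_socket(self) -> int:
        # L2 plus shared L3; the small L1 caches are left out.
        return self.cores_per_socket * self.l2_mb_per_core + self.l3_mb_per_socket

    def peak_memory_gbs_per_socket(self) -> float:
        # Assumes 8 bytes (64 bits) per transfer per DDR3 channel.
        return self.ddr3_mt_per_s * 1e6 * 8 * self.memory_channels / 1e9

    def memory_gb_per_socket_4p(self) -> float:
        # Assumes the 1.5TB maximum (1TB = 1024GB) splits evenly over 4 sockets.
        return self.max_memory_tb_4p * 1024 / self.max_sockets


if __name__ == "__main__":
    p = OpteronPlatform()
    print("max cores per 4P server:", p.max_cores())
    print("L2 + L3 per socket (MB):", p.cache_mb_per_socket())
    print(f"peak memory bandwidth per socket: {p.peak_memory_gbs_per_socket():.1f} GB/s")
    print("memory per socket at 1.5TB (GB):", p.memory_gb_per_socket_4p())
```

That works out to 64 cores per 4P server, 32MB of L2 plus L3 per socket, about 59.7 GB/s of theoretical peak memory bandwidth per socket at DDR3-1866, and 384GB of memory per socket at the 1.5TB maximum.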


With Piledriver, AMD added a number of new instructions, including FMA3, BMI, and F16C. The company has also tweaked the Bulldozer design to improve branch prediction, instructions per clock, and scheduling, and it reduced the power draw at higher clock speeds, allowing the chips to clock higher while staying within the same power envelope as the Bulldozer-based Opteron 6200 series.
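FMA3 adds fused multiply-add instructions, which compute a*b + c with a single rounding step instead of rounding the product first. The Python sketch below emulates that behavior in software with exact `fractions.Fraction` arithmetic; it illustrates the semantics only, does not use the CPU instruction, and ignores overflow, infinities, and NaN. The two helper names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate FMA semantics: compute a*b + c exactly, then round once.

    Fraction arithmetic is exact and float(Fraction) rounds correctly,
    so the result matches a single-rounding fused operation (finite inputs only).
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary code: the product is rounded to a double before the add."""
    return a * b + c


if __name__ == "__main__":
    a, b, c = 0.1, 10.0, -1.0
    print("separate:", separate_multiply_add(a, b, c))
    print("fused:   ", fused_multiply_add(a, b, c))
```

With a = 0.1, b = 10.0, and c = -1.0, the separate multiply rounds the product to exactly 1.0, so the result collapses to 0.0, while the fused version keeps the tiny residual (2^-54, about 5.55e-17) carried by the stored value of 0.1.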

AMD is using the same socket as the 6200 series processors, and the new chips can be deployed as an upgrade to the old servers without needing a new motherboard.


Pitting the new Opteron 6380 against the previous-generation 6278, AMD is claiming a number of performance increases, including 24-percent and 40-percent improvements in SPECjbb2005 and SPECpower_ssj2008, respectively.

Further, the company is claiming performance competitive with Intel in server workloads. AMD offers up benchmarks showing the Opteron 6380 and Xeon E5-2690 trading wins, with the AMD part being slower in the STREAM benchmark but slightly faster in LAMMPS and NAMD. The allure of the Opteron, according to AMD, is that it is almost half the price of the Intel processor, and the company is hoping the lower-priced parts will encourage adoption. AMD argues that the money saved could easily go towards more RAM or more storage (or simply be saved, of course).


The company has announced that its first major design win is the Big Red II supercomputer at Indiana University. Built by Cray, Big Red II will feature 21,000+ Opteron 6300-series CPU cores paired with NVIDIA GPUs. It represents a massive increase in computing power over IU’s previous Big Red supercomputer with 4,100 CPU cores, and it will be used for medical, physics, chemistry, and climate research. Beyond that, AMD has stated that more than 30 hardware vendors are slated to introduce servers based on the new Piledriver-based Opteron processors, including HP, Dell, Cray, SGI, Supermicro, Sugon, and (of course) SeaMicro. On the software side of things, AMD is working with Microsoft, VMware, Xen, Red Hat, and OpenStack. The company also stated that it is leaning on the experience and knowledge gained from the HSA Foundation to improve software support and guide the future direction of Opteron development.


The Opteron 6300 series is an interesting release that brings several improvements to the company’s server chip offerings. At launch, there are 10 processors to choose from, ranging from the quad core 6308 clocked at 3.5GHz for $501 to the top-end 6386 SE with 16 cores (2.8GHz base, 3.5GHz max turbo) and a $1,392 price tag. The 6366HE is an interesting part as well. It is the same price as the 12-core, 115W TDP Opteron 6348, but it has 16 lower-clocked cores and an 85W TDP. With the non-HE 16-core processors starting at $703, the 6366HE at $575 is a decent deal if you need more threads rather than fewer, higher-clocked cores.
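Price per core is a simple way to see why the 6366HE stands out. A Python sketch using only the launch prices quoted above (the entry for the $703 16-core part is a description rather than a model number, since the article does not name it, and `dollars_per_core` is just an illustrative helper):

```python
# (part, cores, launch price in USD), taken from the paragraph above.
PARTS = [
    ("Opteron 6308", 4, 501),
    ("Opteron 6348", 12, 575),
    ("Opteron 6366HE", 16, 575),
    ("cheapest non-HE 16-core part", 16, 703),
    ("Opteron 6386 SE", 16, 1392),
]


def dollars_per_core(cores: int, price_usd: float) -> float:
    """Launch price divided evenly across the cores."""
    return price_usd / cores


if __name__ == "__main__":
    # List the parts from cheapest to most expensive per core.
    for name, cores, price in sorted(PARTS, key=lambda p: dollars_per_core(p[1], p[2])):
        per_core = dollars_per_core(cores, price)
        print(f"{name:30s} {cores:2d} cores  ${price:>5,}  ${per_core:7.2f}/core")
```

By this measure the 6366HE comes in at about $36 per core, versus about $44 for the cheapest non-HE 16-core chip and $87 for the 6386 SE; the quad core 6308 is the most expensive per core at about $125, trading core count for clock speed.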

Another bit that I found intriguing is that in a few years, AMD will (likely, if all goes according to plan) be offering processors for just about every type of server. They will have low cost, low power ARM Cortex-A57 based chips, Accelerated Processing Units (APUs) well suited to mixed workloads including GPU-accelerated tasks, and CPU-only chips with lots of traditional x86-64 cores. It seems that Intel will continue to hold the high end on pure performance, but AMD and its SeaMicro server division have not given up competing in the server room by a long shot.


Further reading:

The Piledriver architecture and Vishera desktop CPU review and The future of AMD: Vishera and Beyond at PC Perspective.

Piledrivers are elegant in comparison to Bulldozers

Subject: Processors | October 23, 2012 - 02:44 PM |
Tagged: vishera, Steamroller, piledriver, FX-8350, fx-8150, FX-6300, FX-6200, bulldozer, amd

The FX-8350 Vishera processor from AMD has finally arrived with 8 fully unlocked cores of polished Piledriver processing power.  Piledriver brings no huge changes to the existing Bulldozer architecture; it is more a polishing and optimization of that architecture, and [H]ard|OCP's testing bears that out.  While faster than the previous-generation FX-8150, it still lags behind Intel's Ivy Bridge processors, which is disappointing but certainly expected.  The unlocked cores do lend themselves somewhat to overclocking, with [H] hitting a stable 4.6GHz with all cores enabled, a 10% jump in frequency.  At that speed it competes better with Intel's offerings, until you overclock those as well, at which point its comparative performance suffers somewhat.

Make sure to catch Josh's review, covering both the 8 core FX-8350 and the $132 FX-6300, which has a disabled module, bringing back memories of older AMD chips whose disabled cores could be brought back to life.


"AMD's new Piledriver core technology should not be a surprise to any enthusiast as much of its "embargoed" information has already been exposed on the Net. Today we take the AMD FX series model 8350 desktop variant, code named Vishera, and look at it in an enthusiast way as we expose its IPC at 4GHz, and a bit of overclocking."


Source: [H]ard|OCP
Author:
Subject: Processors
Manufacturer: AMD

Bulldozer to Vishera

 

Bulldozer is the word.  Ok, perhaps it is not “the” word, but it is “a” word.  When AMD let that little codename slip some years back, AMD enthusiasts and tech journalists started to salivate about the possibilities.  Here was a unique and very new architecture that promised excellent single thread performance and outstanding multi-threaded performance all in a package that was easy to swallow and digest.  Probiotics for the PC.  Some could argue that the end product for Bulldozer and probiotics are the same, but I am not overly fond of writing articles containing four letter colorful metaphors.


The long and short of Bulldozer is that it was a product that was pushed out too fast, it had specifications that were too aggressive for the time, and it never delivered on the promise of the architecture.  Logically there are some very good reasons behind the architecture, but implementing these ideas in a successful product is another story altogether.  The chip was never able to reach its intended clock speeds while staying within reasonable TDP limits.  To get the chip out in a timely manner, timings had to be loosened internally so the chip could even run.  Performance per clock was pretty dismal, and the top end FX-8150 was only marginally faster than the previous top end Phenom II X6 1100T.  In some cases, the X6 was still faster and a more competent “all around” processor.

There really was not a whole lot for AMD to do about the situation.  It had to have a new product, and it just did not turn out as nicely as they had hoped.  The reasons for this are legion, but simply put AMD is competing with a company that is over ten times the size, with the resulting R&D budgets that such a size (and margins) can afford.  Engineers looking for work are a dime a dozen, and Intel can hire as many as they need.  So, instead of respinning Bulldozer ad nauseam and releasing new speed grades throughout the year by tweaking the process and metal layer design, AMD let the product line sit and stagnate at the top end for a year (though they did release higher TDP models based on the dual module FX-4000 and triple module FX-6000 series).  Engineers were pushed into more forward looking projects.  One of these is Vishera.
