AMD Releases 2014 Mobile APU Details: Beema and Mullins Cut TDPs

Subject: Processors | November 13, 2013 - 05:35 PM |
Tagged: Puma, Mullins, mobile, Jaguar, GCN, beema, apu13, APU, amd, 2014

AMD’s APU13 is all about APUs and their programming, but the hardware we have seen so far has been dominated by the upcoming Kaveri products for FM2+.  It seems that AMD has more up their sleeves for release this next year, and it has somewhat caught me off guard.  The Beema and Mullins based products are being announced today, but we do not have exact details on these products.  The codenames have been around for some time now, but interest has been minimal since they are evolutionary products based on Kabini and Temash APUs that have been available this year.  Little did I know that things would be far more interesting than that.

apu13_01.png

The basis for Beema and Mullins is the Puma core.  This is a highly optimized revision of Jaguar, and in some ways can be considered a new design.  All of the basics in terms of execution units, caches, and memory controllers are the same.  What AMD has done is go through the design with a fine-toothed comb and make it far more efficient per clock than what we have seen previously.  This is still a 28 nm part, but the extra attention and love lavished upon it by AMD has resulted in a much more efficient system architecture for the CPU and GPU portions.

The parts will be offered in two and four core configurations.  Beema will span from 10W to 25W configurations.  Mullins will go all the way down to “2W SDP”.  SDP essentially means that while the chip can be theoretically rated higher, it will rarely go above that 2W envelope in the vast majority of situations.  These chips are expected to be around 2X more efficient per clock than the previous Jaguar based products.  This means that at similar clock speeds, Beema and Mullins will pull far less power than that previous gen.  It should also allow some higher clockspeeds at the top end 25W area.

apu13_02.png

These will be some of the first fanless quad cores that AMD will introduce for the tablet market.  Previously we have seen tablets utilize the cut down versions of Temash to hit power targets, but with this redesign it is entirely possible to utilize the fully enabled quad core Mullins.  AMD has not given us specific speeds for these products, but we can guess that clocks will be around what we see currently, just with a lower TDP rating.

AMD is introducing their new security platform based on ARM TrustZone.  Essentially a small ARM Cortex-A5 is integrated in the design and handles the security aspects of this feature.  We were not briefed on how this achieves security, but the slide below gives some of the bullet points of the technology.

apu13_03.png

Since the pure-play foundries will not have a workable 20 nm process for AMD to jump to in a timely manner, AMD had no other choice but to really optimize the Jaguar core to make it more competitive with products from Intel and the ARM partners.  At 28 nm the ARM ecosystem has a power advantage over AMD, while at 22 nm Intel offers similar performance to AMD but with greater power efficiency.

This is a necessary update for AMD as the competition has certainly not slowed down.  AMD is more constrained obviously by the lack of a next-generation process node available for 1H 2014, so a redesign of this magnitude was needed.  The performance per watt metric is very important here, as it promises longer battery life without giving up the performance people received from the previous Kabini/Temash family of APUs.  This design work could be carried over to the next generation of APUs using 20 nm and below, which hopefully will keep AMD competitive with the rest of the market.  Beema and Mullins are interesting looking products that will be shown off at CES 2014.

apu13_04.png

Source: AMD

AMD Kaveri's Fast... But Less Than Expected.

Subject: General Tech, Processors | November 12, 2013 - 06:50 PM |
Tagged: Kaveri, apu13, amd

AMD will deliver its latest round of APUs (Kaveri) on January 14th. These processors, built on a 28nm process, will combine the Steamroller architecture on the CPU with HSA-compliant Graphics Core Next (GCN) cores on the GPU. Together they are expected to bring 856 GFLOPs of computational performance.

AMD-Kaveri.jpg

Thomas Ryan at SemiAccurate, however, remembers that AMD expected over a TeraFLOP.

Of course Kaveri has been a troubled chip for AMD. At this point Kaveri is over a year late and most of that delay is due to a series of internal issues at AMD rather than technical problems. But now with the knowledge that Kaveri missed AMD’s internal performance targets by about 20 percent it’s hard to be very positive about AMD’s next big-core APU.

The problem comes from a reduction in the clock rate AMD expected back in February 2012. Steamroller was expected to reach 4 GHz but that has been slightly reduced to 3.7 GHz; this is obviously a small impact from a compute standpoint (weakened by just under 10 GFLOPs). The GPU, on the other hand, was cut from 900 MHz down to 720 MHz; its performance was reduced by a whole 25% (Update: 20%. Accidentally divided by 720 instead of 900). Using AMD's formula for calculating FLOP performance, Kaveri's 856 GFLOP rating corresponds to an 18% reduction from the original 1050 GFLOP target.
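The arithmetic behind those figures can be reproduced with a short sketch. It assumes the usual peak-throughput formula (units x FLOPs per clock x clock rate), with 2 FLOPs per stream processor per clock and 8 single-precision FLOPs per CPU core per clock. Those per-clock factors, the core count, and the 512 stream processors for the original target are assumptions chosen because they reproduce the quoted totals; the function and variable names are purely illustrative.

```python
def peak_gflops(cpu_cores, cpu_ghz, gpu_sps, gpu_ghz,
                cpu_flops_per_clock=8, gpu_flops_per_clock=2):
    """Return (cpu_gflops, gpu_gflops) as theoretical peak figures.

    The per-clock factors are assumptions, not values stated by AMD here.
    """
    cpu = cpu_cores * cpu_flops_per_clock * cpu_ghz
    gpu = gpu_sps * gpu_flops_per_clock * gpu_ghz
    return cpu, gpu


# Original target: 4 GHz CPU, 900 MHz GPU (clocks from the text above).
target_cpu, target_gpu = peak_gflops(4, 4.0, 512, 0.900)
# Announced part: 3.7 GHz CPU, 720 MHz GPU.
actual_cpu, actual_gpu = peak_gflops(4, 3.7, 512, 0.720)

target_total = target_cpu + target_gpu
actual_total = actual_cpu + actual_gpu

cpu_loss_gflops = target_cpu - actual_cpu
gpu_cut_pct = (1 - actual_gpu / target_gpu) * 100
total_cut_pct = (1 - actual_total / target_total) * 100

print(f"target total: {target_total:.1f} GFLOPs")
print(f"actual total: {actual_total:.1f} GFLOPs")
print(f"CPU loss:     {cpu_loss_gflops:.1f} GFLOPs")
print(f"GPU cut:      {gpu_cut_pct:.1f}%")
print(f"total cut:    {total_cut_pct:.1f}%")
```

Under these assumptions the totals round to 856 and 1050 GFLOPs, the CPU gives up a little under 10 GFLOPs, the GPU drops by 20%, and the overall reduction is roughly 18%.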

But, personally, I am still positive about Kaveri.

The introduction of HSA features into mainstream x86 processors has begun. The ability to share memory between the CPU and the GPU could be a big deal, especially for tasks such as AI and physics. AI especially interests me (although I am by no means an expert) because it is a mixture of branching and parallel instructions. The HSA model could, potentially, operate on the data with whichever architecture makes sense. Currently, synchronizing CPU and GPU memory is very costly; you could easily spend most of your processing time budget waiting for memory transfers.

856 GFLOPs is a definite reduction from 1050 GFLOPs. Still, if Kaveri (and APUs going forward) can effectively nullify the latencies involved with GPGPU work, that is a lot of usable compute: for comparison, an Intel Ivy Bridge-E Core i7 4960X has a floating-point throughput of ~160 GFLOPs.

And before you say it: Yes, I know, Ivy Bridge-E can be paired with fast discrete graphics. This combination is ideal for easily separated tasks such as when the CPU prepares a frame and then a GPU draws it; you get the best of both worlds if both can keep working.

But what if your workload is a horrific mish-mash of back-and-forth serial and parallel? That is where AMD might have an edge.

Source: SemiAccurate

Video: Battlefield 4 Running on AMD A10 Kaveri APU and Image Decoder HSA Acceleration

Subject: Graphics Cards, Processors | November 12, 2013 - 06:10 PM |
Tagged: amd, Kaveri, APU, video, hsa

Yesterday at the AMD APU13 developer conference, the company showed off the upcoming Kaveri APU running Battlefield 4 completely on the integrated graphics.  I was able to push the AMD guys along and get a little more personal demo to share with our readers.  The Kaveri APU had some of its details revealed this week:

  • Quad-core Steamroller x86
  • 512 Stream Processor GPU
  • 856 GFLOPS of theoretical performance
  • 3.7 GHz CPU clock speed, 720 MHz GPU clock speed

AMD wanted to be sure we pointed out in this video that the estimated clock speeds used for the FLOP performance figure may not be what the demo system was run at (likely a bit lower).  Also, the version of Battlefield 4 here is the standard retail version; further improvements from the driver team, along with the upcoming Mantle API implementation, will likely introduce even more performance for the APU.

The game was running at 1920x1080 with MOSTLY medium quality settings (lighting set to low) but the results still looked damn impressive and the frame rates were silky and smooth.  Considering this is running on a desktop with integrated processor graphics, the game play experience is simply unmatched.  

Memory in the system was running at 2133 MHz.

The second demo looks at the image decoding acceleration that AMD is going to enable with Kaveri APUs upon release with a driver.  Essentially, as the demonstration shows in the video, AMD is replacing the integrated Windows JPG decompression algorithm with a new one that utilizes HSA to accelerate on both the x86 and SIMD (GPU) portions of the silicon.  The most strenuous demo, which used 22 MP images, saw a 100% increase in performance compared to the Kaveri CPU cores alone.

Subject: Processors
Manufacturer: AMD

More Details from Lisa Su

The executives at AMD like to break their own NDAs.  Then again, they are the ones typically setting these NDA dates, so it isn’t a big deal.  It is no secret that Kaveri has been in the pipeline for some time.  We knew a lot of the basic details of the product, but there were certainly things that were missing.  Lisa Su went up onstage and shared a few new details with us.

kaveri.jpg

Kaveri will be made up of 4 “Steamroller” cores, which are enhanced versions of the previous Bulldozer/Trinity/Vishera families of products.  Nearly everything in the processor is doubled.  It now has dual decode, more cache, larger TLBs, and a host of other smaller features that all add up to greater single thread performance and better multi-threaded handling and performance.   Integer performance will be improved, and the FPU/MMX/SSE unit now features 2 x 128 bit FMAC units which can “fuse” and support AVX 256.

However, there was no mention of the fabled 6 core Kaveri.  At this time, it is unlikely that particular product will be launched anytime soon. 

Click to read the entire article here!

Manufacturer: AMD

An issue of variance

AMD just sent along an email to the press with a new driver to use for Radeon R9 290X and Radeon R9 290 testing going forward.  Here is the note:

We’ve identified that there’s variability in fan speeds across AMD R9 290 series boards. This variability in fan speed translates into variability of the cooling capacity of the fan-sink.

The flexibility of AMD PowerTune technology enables us to correct this variability in a driver update. This update will normalize the fan RPMs to the correct values.

The correct target RPM values are 2200RPM for the AMD Radeon R9 290X ‘Quiet mode’, and 2650RPM for the R9 290. You can verify these in GPU-Z.

If you’re working on stories relating to R9 290 series products, please use this driver as it will reduce any variability in fan speeds. This driver will be posted publicly tonight.

Great!  This is good news!  Except it also creates some questions. 

When we first tested the R9 290X and the R9 290, we discussed the latest iteration of AMD's PowerTune technology. That feature attempts to keep clocks as high as possible under the constraints of temperature and power.  I took issue with the high variability of clock speeds on our R9 290X sample, citing this graph:

clock-avg.png

I then did some digging into the variance and the claims that AMD was building a "configurable" GPU.  In that article we found that there were significant performance deltas between "hot" and "cold" GPUs; we noticed that doing simple, quick benchmarks would produce certain results that were definitely not real-world in nature.  At the default 40% fan speed, Crysis 3 showed 10% variance with the 290X at 2560x1440:

Crysis3_2560x1440_OFPS.png

Continue reading our coverage of the most recent driver changes and how they affect the R9 290X and R9 290!!

AMD Releases Catalyst 13.11 Beta 9.2 Driver To Correct Performance Variance Issue of R9 290 Series Graphics Cards

Subject: Graphics Cards, Cases and Cooling | November 8, 2013 - 02:41 AM |
Tagged: R9 290X, powertune, hawaii, graphics drivers, gpu, GCN, catalyst 13.11 beta, amd, 290x

AMD recently launched its 290X graphics card, which is the new high-end single GPU solution using a GCN-based Hawaii architecture. The new GPU is rather large and incorporates an updated version of AMD's PowerTune technology to automatically adjust clockspeeds based on temperature and a maximum fan speed of 40%. Unfortunately, it seems that some 290X cards available at retail exhibited performance characteristics that varied from review units.

Retail versus Review Sample Performance Variance Testing.jpg

AMD has looked into the issue and released the following statement in response to the performance variances (which PC Perspective is looking into as well).

Hello, We've identified that there's variability in fan speeds across AMD R9 290 series boards. This variability in fan speed translates into variability of the cooling capacity of the fan-sink. The flexibility of AMD PowerTune technology enables us to correct this variability in a driver update. This update will normalize the fan RPMs to the correct values.

The correct target RPM values are 2200RPM for the AMD Radeon R9 290X "Quiet mode", and 2650RPM for the R9 290. You can verify these in GPU-Z. If you're working on stories relating to R9 290 series products, please use this driver as it will reduce any variability in fan speeds. This driver will be posted publicly tonight.

From the AMD statement, it seems to be an issue with fan speeds from card to card causing the performance variances. With a GPU that is rated to run at up to 95C, a fan limited to 40% maximum, and dynamic clockspeeds, it is only natural that cards could perform differently, especially if case airflow is not up to par. On the other hand, the specific issue pointed out by other technology review sites (per my understanding, it was initially Tom's Hardware that reported on the retail vs review sample variance) is an issue where the 40% maximum on certain cards is not actually the RPM target that AMD intended.

AMD intended for the Radeon R9 290X's fan to run at 2200RPM (40%) in Quiet Mode and the fan on the R9 290 (which has a maximum fan speed percentage of 47%) to spin at 2650 RPM in Quiet Mode. However, some cards' 40% values are not actually hitting those intended RPMs, which is causing performance differences due to cooling and PowerTune adjusting the clockspeeds accordingly.
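As a rough illustration of the kind of check AMD suggests doing in GPU-Z, the sketch below compares a fan reading against the target RPM values quoted in the statement. The readings and the 5% tolerance are made-up placeholders rather than data from any real card, and the function names are hypothetical.

```python
# Target fan speeds in RPM, taken from AMD's statement.
TARGET_RPM = {"R9 290X": 2200, "R9 290": 2650}


def fan_deviation_pct(card, measured_rpm):
    """Signed deviation of a reading from the card's target, in percent."""
    target = TARGET_RPM[card]
    return (measured_rpm - target) / target * 100


def fan_on_target(card, measured_rpm, tolerance_pct=5.0):
    """True if the reading is within tolerance_pct of the target.

    The 5% default is an arbitrary threshold chosen for illustration.
    """
    return abs(fan_deviation_pct(card, measured_rpm)) <= tolerance_pct


# Hypothetical readings, e.g. copied by hand from a monitoring tool.
readings = [("R9 290X", 2010), ("R9 290", 2640)]

for card, rpm in readings:
    status = "on target" if fan_on_target(card, rpm) else "off target"
    print(f"{card}: {rpm} RPM ({fan_deviation_pct(card, rpm):+.1f}%) {status}")
```

A card whose fan spins well below its target has less cooling capacity, so PowerTune would be expected to hold lower clocks on it than on a card that hits the intended RPM.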

Luckily, the issue is being worked on by AMD, and it is reportedly rectified by a driver update. The driver update ensures that the fans are actually spinning at the intended speed when set to the 40% (R9 290X) or 47% (R9 290) values in Catalyst Control Center. The new driver, which includes the fix, is version Catalyst 13.11 Beta 9.2 and is available for download now. 

If you are running a R9 290 or R9 290X in your system, you should consider updating to the latest driver to ensure you are getting the cooling (and as a result gaming) performance you are supposed to be getting.

Catalyst 13.11 Beta 9.2 is available from the AMD website.

Stay tuned to PC Perspective for more information on the Radeon R9 290 series GPU performance variance issue as it develops.

Image credit: Ryan Shrout (PC Perspective).

Source: AMD

PC Perspective Podcast #276 - AMD Radeon R9 290, Gigabyte Z87X-UD5H, SSD Torture tests and more!

Subject: General Tech | November 7, 2013 - 05:12 PM |
Tagged: Z87X-UD5H, video, R9 290X, r9 290, podcast, nvidia, gtx 780, grid, ec2, amd, amazon

PC Perspective Podcast #276 - 11/07/2013

Join us this week as we discuss the AMD Radeon R9 290, Gigabyte Z87X-UD5H, SSD Torture tests and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

 
Due to a recording error, portions of the audio track are missing. Because of this, the audio will skip around in various places. This is actually happening, and you aren't crazy (well maybe, but not because of the audio). Considering these files were almost not recovered, it's a miracle we have this much of the recording.
 
Program length: 0:47:56
  1. Week in Review:
  2. Hardware/Software Picks of the Week:
  3. podcast@pcper.com
  4. Closing/outro

 

For AMD, X does not mark the spot

Subject: General Tech | November 5, 2013 - 01:22 PM |
Tagged: radeon, r9 290, hawaii, crossfire, amd, 290x, powertune

How does all the power of a GTX 780 for a price tag $100 lower sound to you?  Honestly it might sound a little loud, as the reference cooler on the R9 290 gets noisy at 50%, which is the fan speed you need to keep this card running full out.  As long as you don't mind the sound or are willing to wait for custom air or water cooling solutions, there are no negatives about the 290.  Frame pacing makes Crossfire much smoother and it sports the hardware improvements for EyeFinity to improve your experience in 4K and multi-monitor usage.  [H]ard|OCP actually uses the word epic just before giving this card a Gold Award, check out their full review here.

Ryan's review, including Frame Rating can be found by clicking here.

1383560446bHRev9wPcM_1_1.gif

"It is time now to look at AMD's Radeon R9 290. This lower-cost R9 290 series video card packs a punch, not only in performance, but also in price. Watch it compete with the GeForce GTX 780, and win while being priced lower. This is the value you have been waiting for with gaming performance."

Source: [H]ard|OCP
Manufacturer: AMD

More of the same for a lot less cash

The week before Halloween, AMD unleashed a trick on the GPU world under the guise of the Radeon R9 290X and it was the fastest single GPU graphics card we had tested to date.  With a surprising price point of $549, it was able to outperform the GeForce GTX 780 (and GTX TITAN in most cases) while undercutting the competition's price by $100.  Not too bad!

amd1.jpg

Today's release might be more surprising (and somewhat confusing).  The AMD Radeon R9 290 4GB card is based on the same Hawaii GPU with a few fewer compute units (CUs) enabled and an even more aggressive price and performance placement.  Seriously, has AMD lost its mind?

Can a card with a $399 price tag cut into the same performance levels as the JUST DROPPED price of $499 for the GeForce GTX 780??  And, if so, what sacrifices are being made by users that adopt it?  Why do so many of our introduction sentences end in question marks?

The R9 290 GPU - Hawaii loses a small island

If you are new to the Hawaii GPU and you missed our first review of the Radeon R9 290X from last month, you should probably start back there.  The architecture is very similar to that of the HD 7000-series Tahiti GPUs, with some modest changes to improve efficiency; the biggest jump is in raw primitive rate, which rises to 4 per clock from 2 per clock.

diagram1.jpg

The R9 290 is based on Hawaii though it has four fewer compute units (CUs) than the R9 290X.  When I asked AMD if that meant there was one fewer CU per Shader Engine or if they were all removed from a single Engine, they refused to really answer.  Instead, several "I'm not allowed to comment on the specific configuration" lines were given.  This seems pretty odd as NVIDIA has been upfront about the dual options for its derivative GPU models.  Oh well.

Continue reading our review of the AMD Radeon R9 290 4GB Graphics Card Review!!!

Manufacturer: AMD

Clock Variations

When AMD released the Radeon R9 290X last month, I came away from the review very impressed with the performance and price point the new flagship graphics card was presented with.  My review showed that the 290X was clearly faster than the NVIDIA GeForce GTX 780 and (at that time) was considerably less expensive as well - a win-win for AMD without a doubt.

But there were concerns over a couple of aspects of the card's design.  First was the temperature and, specifically, how AMD was okay with this rather large silicon hitting 95C sustained.  Another concern: AMD has also included a switch at the top of the R9 290X to change fan profiles.  This switch essentially creates two reference defaults and makes it impossible for us to set a baseline of performance.  These different modes only changed the maximum fan speed that the card was allowed to reach.  Still, performance changed because of this setting thanks to the newly revised (and updated) AMD PowerTune technology.

We also saw, in our initial review, a large variation in clock speeds both from one game to another as well as over time (after giving the card a chance to heat up).  This led me to create the following graph showing average clock speeds 5-7 minutes into a gaming session with the card set to the default, "quiet" state.  Each test is over a 60 second span.
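A per-game average like the one graphed can be computed from logged clock samples. The sketch below assumes a hypothetical log of (seconds since start, clock in MHz) pairs and averages the samples that fall inside a 60 second window starting five minutes into the session. The sample values are invented purely to show the calculation and are not measurements from our testing.

```python
def average_clock(samples, start_s=300, span_s=60):
    """Average the clock (MHz) of samples with start_s <= t < start_s + span_s.

    samples is an iterable of (seconds_since_start, clock_mhz) pairs.
    """
    window = [mhz for t, mhz in samples if start_s <= t < start_s + span_s]
    if not window:
        raise ValueError("no samples in the requested window")
    return sum(window) / len(window)


# Invented log: one sample every 10 seconds, with the clock stepping
# down as the card heats up. These numbers are placeholders only.
log = [(t, 1000 if t < 120 else 900 if t < 330 else 850)
       for t in range(0, 420, 10)]

print(f"first minute:      {average_clock(log, start_s=0):.1f} MHz")
print(f"5-6 minutes in:    {average_clock(log, start_s=300):.1f} MHz")
```

Comparing the window taken after warm-up against the first minute is what separates a quick "cold" benchmark from the sustained behavior a gamer would actually see.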

clock-avg.png

Clearly there is variance here, which led us to more questions about AMD's stance.  Remember when the Kepler GPUs launched?  AMD was very clear that variance from card to card, silicon to silicon, was bad for the consumer as it created random performance deltas between cards with otherwise identical specifications.

When it comes to the R9 290X, though, AMD claims both the GPU (and card itself) are a customizable graphics solution.  The customization is based around the maximum fan speed which is a setting the user can adjust inside the Catalyst Control Center.  This setting will allow you to lower the fan speed if you are a gamer desiring a quieter gaming configuration while still having great gaming performance.  If you are comfortable with a louder fan, because headphones are magic, then you have the option to simply turn up the maximum fan speed and gain additional performance (a higher average clock rate) without any actual overclocking.

Continue reading our article on the AMD Radeon R9 290X - The Configurable GPU!!!