Data mining Mantle at APU 13

Subject: General Tech | November 26, 2013 - 12:46 PM |
Tagged: amd, Mantle, apu13

The Tech Report learned quite a bit about Mantle at APU 13, going much deeper into what Mantle is and how it will work.  Thinking of it as a replacement for DirectX is a good start, since it is an API, but it also changes how your system interacts with your GPU.  The briefing delves into the technical side, describing the context-based execution model that Mantle uses to let you assign tasks to multiple processors or other resources, along with the completely revamped memory interface.  There are four pages describing Mantle for your reading pleasure here, and with the strong early adoption it would be worth your time to learn more about it.

unpredictable.jpg

"At its APU13 developer conference in San Jose, California, AMD invited journalists and developers to listen to hours worth of keynotes and sessions by Mantle's creators and early adopters. We sat through all of it—and talked to some of those experts one on one—in order to get a sense of what Mantle does, how it will impact performance, and what its future may hold."


AMD Mantle Deep Dive Video from AMD APU13 Event

Subject: Graphics Cards | November 13, 2013 - 09:54 PM |
Tagged: video, Mantle, apu13, amd

While attending the AMD APU13 event, an annual developer conference the company uses to promote heterogeneous computing, I got to sit in on a deep dive into AMD's Mantle, a new low-level graphics API first announced in September.  Rather than attempt to re-explain what was explained quite well, I decided to record the session on video and then intermix the presented slides into a produced video for our readers.

The result is likely the best (and seemingly first) explanation of how Mantle actually works and what it does differently than existing APIs like DirectX and OpenGL.

Also, because we had some requests, I am embedding the live blog we ran during Johan Andersson's keynote from APU13.  Enjoy!

AMD Releases 2014 Mobile APU Details: Beema and Mullins Cut TDPs

Subject: Processors | November 13, 2013 - 05:35 PM |
Tagged: Puma, Mullins, mobile, Jaguar, GCN, beema, apu13, APU, amd, 2014

AMD’s APU13 is all about APUs and their programming, but the hardware we have seen so far has been dominated by the upcoming Kaveri products for FM2+.  It seems that AMD has more up their sleeves for release this next year, and it has somewhat caught me off guard.  The Beema and Mullins based products are being announced today, but we do not have exact details on these products.  The codenames have been around for some time now, but interest has been minimal since they are evolutionary products based on Kabini and Temash APUs that have been available this year.  Little did I know that things would be far more interesting than that.

apu13_01.png

The basis for Beema and Mullins is the Puma core.  This is a highly optimized revision of Jaguar, and in some ways can be considered a new design.  All of the basics in terms of execution units, caches, and memory controllers are the same.  What AMD has done is go through the design with a fine-toothed comb and make it far more power efficient per clock than what we have seen previously.  This is still a 28 nm part, but the extra attention and love lavished upon it by AMD has resulted in a much more efficient system architecture for the CPU and GPU portions.

The parts will be offered in two and four core configurations.  Beema will span from 10W to 25W configurations.  Mullins will go all the way down to “2W SDP”.  SDP essentially means that while the chip can be theoretically rated higher, it will rarely go above that 2W envelope in the vast majority of situations.  These chips are expected to be around 2X more efficient per clock than the previous Jaguar based products.  This means that at similar clock speeds, Beema and Mullins will pull far less power than the previous generation.  It should also allow some higher clock speeds at the top end 25W area.

apu13_02.png

These will be some of the first fanless quad cores that AMD will introduce for the tablet market.  Previously we have seen tablets utilize the cut down versions of Temash to hit power targets, but with this redesign it is entirely possible to utilize the fully enabled quad core Mullins.  AMD has not given us specific speeds for these products, but we can guess that they will be close to what we see currently, just with a lower TDP rating.

AMD is introducing their new security platform based on ARM TrustZone.  Essentially, a small ARM Cortex-A5 is integrated into the design and handles the security aspects of this feature.  We were not briefed on how this achieves security, but the slide below gives some of the bullet points of the technology.

apu13_03.png

Since the pure-play foundries will not have a workable 20 nm process for AMD to jump to in a timely manner, AMD had no other choice but to really optimize the Jaguar core to make it more competitive with products from Intel and the ARM partners.  At 28 nm the ARM ecosystem has a power advantage over AMD, while at 22 nm Intel offers similar performance to AMD but with greater power efficiency.

This is a necessary update for AMD as the competition has certainly not slowed down.  AMD is more constrained obviously by the lack of a next-generation process node available for 1H 2014, so a redesign of this magnitude was needed.  The performance per watt metric is very important here, as it promises longer battery life without giving up the performance people received from the previous Kabini/Temash family of APUs.  This design work could be carried over to the next generation of APUs using 20 nm and below, which hopefully will keep AMD competitive with the rest of the market.  Beema and Mullins are interesting looking products that will be shown off at CES 2014.

apu13_04.png

Source: AMD

AMD Kaveri's Fast... But Less Than Expected.

Subject: General Tech, Processors | November 12, 2013 - 06:50 PM |
Tagged: Kaveri, apu13, amd

AMD will deliver its latest round of APUs (Kaveri) on January 14th. These processors, built on a 28nm process, will combine the Steamroller architecture on the CPU with HSA-compliant Graphics Core Next (GCN) cores on the GPU. Together they are expected to bring 856 GFLOPs of computational performance.

AMD-Kaveri.jpg

Thomas Ryan at SemiAccurate, however, remembers that AMD expected over a TeraFLOP.

Of course Kaveri has been a troubled chip for AMD. At this point Kaveri is over a year late and most of that delay is due to a series of internal issues at AMD rather than technical problems. But now with the knowledge that Kaveri missed AMD’s internal performance targets by about 20 percent it’s hard to be very positive about AMD’s next big-core APU.

The problem comes from a reduction in the clock rates AMD expected back in February 2012. Steamroller was expected to reach 4 GHz but that has been slightly reduced to 3.7 GHz; this is obviously a small impact from a compute standpoint (weakened by just under 10 GFLOPs). The GPU, on the other hand, was cut from 900 MHz down to 720 MHz; its performance was reduced by a whole 20% (Update: originally listed as 25%; I accidentally divided by 720 instead of 900). Using AMD's formula for calculating FLOP performance, Kaveri's 856 GFLOP rating corresponds to an 18% reduction from the original 1050 GFLOP target.
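
For anyone who wants to check the arithmetic, here is a rough sketch of that calculation in Python. The per-clock figures are my assumptions, not something AMD spelled out in this announcement: 4 CPU cores at 8 single-precision FLOPs per cycle each, and 512 GPU shaders at 2 FLOPs per cycle each (one fused multiply-add). They are chosen because they reproduce the quoted 856 and 1050 GFLOP totals from the clock rates above.

```python
# Peak throughput estimate: CPU contribution plus GPU contribution.
# ASSUMPTIONS (not given in the article): core/shader counts and FLOPs per cycle.
CPU_CORES = 4
CPU_FLOPS_PER_CYCLE = 8    # assumed single-precision FLOPs per core per cycle
GPU_SHADERS = 512          # assumed shader count
GPU_FLOPS_PER_CYCLE = 2    # one fused multiply-add counts as two FLOPs


def peak_gflops(cpu_ghz, gpu_ghz):
    """Return (cpu, gpu, total) peak GFLOPs for the given clocks in GHz."""
    cpu = CPU_CORES * CPU_FLOPS_PER_CYCLE * cpu_ghz
    gpu = GPU_SHADERS * GPU_FLOPS_PER_CYCLE * gpu_ghz
    return cpu, gpu, cpu + gpu


# Clock rates quoted in the article: the February 2012 target vs. the shipping part.
target = peak_gflops(4.0, 0.900)
shipping = peak_gflops(3.7, 0.720)

cpu_loss = target[0] - shipping[0]           # GFLOPs lost on the CPU side
gpu_cut = 1 - shipping[1] / target[1]        # fractional GPU reduction
overall_cut = 1 - shipping[2] / target[2]    # fractional total reduction

print(f"target total:   {target[2]:.1f} GFLOPs")
print(f"shipping total: {shipping[2]:.1f} GFLOPs")
print(f"CPU loss: {cpu_loss:.1f} GFLOPs, GPU cut: {gpu_cut:.0%}, overall cut: {overall_cut:.1%}")
```

Under those assumptions the totals land at roughly 1050 and 856 GFLOPs, and nearly all of the shortfall comes from the GPU clock rather than the CPU clock.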

But, personally, I am still positive about Kaveri.

The introduction of HSA features into mainstream x86 processors has begun. The ability to share memory between the CPU and the GPU could be a big deal, especially for tasks such as AI and physics. AI especially interests me (although I am by no means an expert) because it is a mixture of branching and parallel instructions. The HSA model could, potentially, operate on the data with whichever architecture makes sense. Currently, synchronizing CPU and GPU memory is very costly; you could easily spend most of your processing time budget waiting for memory transfers.

856 GFLOPs is a definite reduction from 1050 GFLOPs. Still, keep it in perspective: an Intel Ivy Bridge-E Core i7 4960X has a compute throughput of ~160 GFLOPs. If Kaveri (and APUs going forward) can effectively nullify the latencies involved with GPGPU work, that gap matters.

And before you say it: Yes, I know, Ivy Bridge-E can be paired with fast discrete graphics. This combination is ideal for easily separated tasks such as when the CPU prepares a frame and then a GPU draws it; you get the best of both worlds if both can keep working.

But what if your workload is a horrific mish-mash of back-and-forth serial and parallel? That is where AMD might have an edge.

Source: SemiAccurate
Subject: Processors
Manufacturer: AMD

More Details from Lisa Su

The executives at AMD like to break their own NDAs.  Then again, they are the ones typically setting these NDA dates, so it isn’t a big deal.  It is no secret that Kaveri has been in the pipeline for some time.  We knew a lot of the basic details of the product, but there were certainly things that were missing.  Lisa Su went up onstage and shared a few new details with us.

kaveri.jpg

Kaveri will be made up of 4 “Steamroller” cores, which are enhanced versions of the previous Bulldozer/Trinity/Vishera families of products.  Nearly everything in the processor is doubled.  It now has dual decode, more cache, larger TLBs, and a host of other smaller features that all add up to greater single thread performance and better multi-threaded handling and performance.   Integer performance will be improved, and the FPU/MMX/SSE unit now features 2 x 128 bit FMAC units which can “fuse” and support AVX 256.

However, there was no mention of the fabled 6 core Kaveri.  At this time, it is unlikely that particular product will be launched anytime soon. 


AMD Planning APU13 Developer Summit In San Jose, California

Subject: General Tech | May 1, 2013 - 07:08 AM |
Tagged: hUMA, hsa, apu13, APU, amd, AFDS

AMD announced its third annual Developer Summit last week. Dubbed “APU13,” the upcoming summit is the AMD equivalent to NVIDIA’s GTC and is an annual event that brings together industry analysts, researchers, programmers, academics, and software/hardware companies pursuing heterogeneous computing technologies.

In previous years, the AMD Developer Summit has been the launchpad for C++ AMP and the HSA Foundation. This year’s Summit will continue that trend towards heterogeneous computing, as well as look back over the year and provide updates on how far the various HSA member companies have come in their goal of moving towards standards-based heterogeneous computing.

AMD Logo.png

In addition to keynote speeches from AMD and some of its partners, expect a great many presentations and workshops from researchers and programmers who are working on new programming models and hardware solutions to efficiently use CPU and GPU processors. More information on hUMA is one of the likely topics, for example. Discussion about upcoming hardware, process nodes, and products may also be on the table insofar as it relates to the HSA theme. Considering the summit is called “APU13,” I also expect that AMD will reveal additional details on the company’s Kaveri APU as well as a look into its future product road map.

AMD is currently asking for presentation proposals from researchers in a number of HSA and technology-related fields including heterogeneous computing, cloud computing, web technologies, programming languages, gaming and graphics technologies, and software security. The lineup of presenters for the summit is still being worked out, and proposal papers will be accepted until May 10th with the winners being notified over the summer.

In all, AMD’s APU13 should be an exciting and intellectual event. Last year’s AMD Fusion Developer Summit (AFDS) was an interesting and fun event to cover, and I hope that APU13 will keep up the same momentum and interest in heterogeneous computing that AFDS started.

Source: AMD