Subject: General Tech | April 30, 2013 - 01:23 PM | Jeremy Hellstrom
Tagged: Steamroller, piledriver, Kaveri, Kabini, hUMA, hsa, GCN, bulldozer, APU, amd
AMD may have united GPU and CPU into the APU but one hurdle had remained until now: the non-uniformity of memory access between the two processors. Today we learned about one of the first successful HSA projects called Heterogeneous Uniform Memory Access, aka hUMA, which will appear in the upcoming Kaveri chip family. The use of this new technology will allow the on-die CPU and GPU to access the same memory pool, both physical and virtual, and any data passed between the two processors will remain coherent. As The Tech Report mentions in their overview, hUMA will not provide as much of a benefit to discrete GPUs; while they will be able to share address space, the widely differing clock speeds between GDDR5 and DDR3 prevent unification to the level of an APU.
Make sure to read Josh's take as well so you can keep up with him on the Podcast.
"At the Fusion Developer Summit last June, AMD CTO Mark Papermaster teased Kaveri, AMD's next-generation APU due later this year. Among other things, Papermaster revealed that Kaveri will be based on the Steamroller architecture and that it will be the first AMD APU with fully shared memory.
Last week, AMD shed some more light on Kaveri's uniform memory architecture, which now has a snazzy marketing name: heterogeneous uniform memory access, or hUMA for short."
Here is some more Tech News from around the web:
- AMD’s new heterogeneous Uniform Memory Access
- hUMA; AMD’s Heterogeneous Unified Memory Architecture @ Hardware Canucks
- Compro TN50W Cloud Network Camera @ Tweaktown
- Wifi Pineapple project uses updated hardware for man-in-the-middle attacks @ Hack a Day
- New OpenWRT Drops Support For Linux 2.4, Low-Mem Devices @ Slashdot
- HP mashes up ProLiant, Integrity, BladeSystem, and Moonshot server @ The Register
- Acer selling tablet using Intel Y series processor @ The Register
- CERN Celebrates 20 Years of an Open Web (and Rebuilds 1st Web Page) @ Slashdot
- BitFenix 5K YouTube Subscriber Giveaway @ eTeknix
heterogeneous Uniform Memory Access
Several years back we first heard of AMD’s plans to create a uniform memory architecture which will allow the CPU to share address spaces with the GPU. The promise here is to create a very efficient architecture that will provide excellent performance in a mixed environment of serial and parallel programming loads. When GPU computing came on the scene it was full of great promise. The idea of a heavily parallel processing unit that will accelerate both integer and floating point workloads could be a potential gold mine in a wide variety of applications. Alas, the results we have seen so far have not lived up to that promise. There are many problems with combining serial and parallel workloads between CPUs and GPUs, and a lot of this has to do with very basic programming and the communication of data between two separate memory pools.
CPUs and GPUs do not share common memory pools. Instead of using pointers in programming to tell each individual unit where data is stored in memory, the current implementation of GPU computing requires the CPU to write the contents of that address to the standalone memory pool of the GPU. This is time consuming and wastes cycles. It also increases programming complexity to be able to adjust to such situations. Typically only very advanced programmers with a lot of expertise in this subject could program effective operations to take these limitations into consideration. The lack of unified memory between CPU and GPU has hindered the adoption of the technology for a lot of applications which could potentially use the massively parallel processing capabilities of a GPU.
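To make the difference concrete, here is a minimal sketch in plain Python. It is only a model: ordinary byte buffers stand in for the CPU and GPU memory pools, a simple loop stands in for a GPU kernel, and the function names are made up for illustration. No real GPU API is involved.

```python
from typing import Tuple


def run_with_separate_pools(host_data: bytearray) -> Tuple[bytearray, int]:
    """Model of the copy-in, compute, copy-out pattern used with two memory pools."""
    bytes_copied = 0

    # The CPU must first write the data into the GPU's standalone pool.
    device_data = bytearray(host_data)
    bytes_copied += len(device_data)

    # Stand-in for a GPU kernel: double every byte (wrapping at 256).
    for i in range(len(device_data)):
        device_data[i] = (device_data[i] * 2) % 256

    # The results then have to be copied back into the CPU's pool.
    result = bytearray(device_data)
    bytes_copied += len(result)
    return result, bytes_copied


def run_with_shared_pool(shared_data: bytearray) -> Tuple[bytearray, int]:
    """Model of a shared pool: the 'GPU' is handed a reference, not a copy."""
    view = memoryview(shared_data)  # refers to the same bytes, nothing is copied

    # Same stand-in kernel, operating in place on the shared buffer.
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256

    return shared_data, 0
```

In this toy model the first function moves every byte twice before the CPU can see a result, while the second moves nothing and simply works on the buffer it was pointed at. That is the overhead, and the extra bookkeeping for the programmer, that a shared memory pool is meant to remove.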
The idea for GPU compute has been around for a long time (comparatively). I still remember getting very excited about the idea of using a high end video card along with a card like the old GeForce 6600 GT to be a coprocessor which would handle heavy math operations and PhysX. That particular plan never quite came to fruition, but the idea was planted years before the actual introduction of modern DX9/10/11 hardware. It seems as if this step with hUMA could actually provide a great amount of impetus to implement a wide range of applications which can actively utilize the GPU portion of an APU.
Define an Enthusiast CPU...
FM2 poses an interesting quandary for motherboard manufacturers. AMD provides a very robust and full featured chip for use with their processors (A85X) that would lend itself well to midrange and enthusiast class motherboards. Unfortunately, AMD does not provide a similarly high end CPU as compared to the competition at price ranges that would make sense for a motherboard that would cost between $140 and $250 on the FM2 platform.
So these manufacturers are constrained on price to offer fully featured motherboards that take advantage of all aspects of the A85X FCH (Fusion Controller Hub). Until AMD can deliver a more competitive CPU on the FM2 platform, motherboard manufacturers will be forced to design offerings that can really go no higher than $129 (the current price of the fastest A10 processor from AMD). This is not necessarily a bad thing though, as it has forced these manufacturers to really rethink their designs and to focus their energies on getting the greatest bang-for-the-buck. AMD is selling a decent number of these processors, but the market is constrained as compared to the Intel offerings utilizing the LGA 1155 infrastructure.
Gigabyte has taken this particular bull by the horns and has applied a unique (so far) technology to the board. This is on top of all the other marketing and engineering terms that we are quite familiar with. The company itself is one of the top three manufacturers of motherboards in the world, and they typically trail Asus in terms of shipments but are still ahead of MSI. As with any motherboard manufacturer, the quality of Gigabyte products has seen peaks and valleys through the years. From what I have seen for the past few years though, Gigabyte is doing very well in terms of overall quality and value.
Subject: Graphics Cards, Processors | January 23, 2013 - 02:42 PM | Ryan Shrout
Tagged: southern islands, sony, ps4, playstation 4, orbis, Kaveri, bulldozer, APU, amd
Earlier today a report from Kotaku.com posted some details about the upcoming PlayStation console, code named Orbis and sometimes just called the PS4. Kotaku author Luke Plunkett got the information from a 90 page PDF that details the development kit so the information is likely pretty accurate if incomplete. It discusses a new controller and a completely new accounts system but I was mostly interested in the hardware details given.
We'll begin with the specs. And before we go any further, know that these are current specs for a PS4 development kit, not the final retail console itself. So while the general gist of the things you see here may be similar to what makes it into the actual commercial hardware, there's every chance some—if not all of it—changes, if only slightly.
This is key to keep in mind because here are the specs listed on the report:
- 8GB of system memory
- 2.2GB of graphics memory
- 4 module (8 core) AMD Bulldozer CPU
- AMD "R10xx" based GPU
- 4x USB 3.0 ports and 2x Ethernet connections
- Blu-ray drive
- 160GB HDD
- HDMI and optical audio output
We are essentially talking about an AMD FX-series processor with a Southern Islands based discrete card and I am nearly 100% sure that this will not match the configuration of the shipping system. Think about it - would a console developer really want to have a processor that can draw more than 100 watts inside its box in addition to a discrete GPU? I doubt it.
Instead, let's go with the idea that this developer kit is simply meant to emulate some final specifications. More than likely we are looking at an APU solution that combines Bulldozer or Steamroller cores along with GCN-based GPU SIMD arrays. The most likely candidate is Kaveri, a 28nm based product that meets both of those requirements. Josh recently discussed the future with Kaveri in a post during CES, worth checking out. AMD has told us several times that Kaveri should be able to hit the 1.0 TFLOPS level of performance which, compared to current discrete GPUs, would enable graphics performance similar to that of an under-clocked Radeon HD 7770.
There is some room for doubt though - Kaveri isn't supposed to be out until "late Q4", though it's possible that the PS4 will be the first customer. It is also possible that AMD is making a specific discrete GPU for implementation on the PS4 based on the GCN architecture that would be faster than the graphics performance expected on the Kaveri APU.
When speaking with our own Josh Walrath on this rumor, he tended to think that Sony and AMD would not use an APU but would rather combine a separate CPU and GPU on a single substrate, allowing for better yields than a combined APU part. In order to make up for the slower memory controller interface (on-substrate is not as fast as on-die) AMD might again utilize backside cache, just like the one used on the Xbox 360 today. With process technology improvements it's not unthinkable to see that jump to 30 or 40MB of cache.
With the debate of a 2013 or 2014 release still up in the air, there is plenty of time for this to change still but we will likely know for sure after our next trip to Taipei.
We are Still Among the Living
The day after the official AMD presentation we were able to sit down with Leslie Sobon for a good hour and really dig into the products we are expecting throughout this next year. AMD did not officially announce any products, but they revealed more details about products on their roadmaps.
To say that AMD is in a somewhat precarious situation is an understatement. This does not necessarily mean that they won’t survive for some years. This was never mentioned to us by AMD, but we can assume that it is not in ATIC’s best interest to let AMD flounder too much. AMD is still GLOBALFOUNDRIES’ largest customer, and ATIC believes that they can become a fabrication giant in the next few years. So, while AMD is hitting some hard times, they will be around for some time to come in spite of their issues.
Believe it or not, AMD is still a CPU company with some relevant products. While Intel has the advantage in x86 performance and process technology, AMD has a distinct advantage in the integrated graphics portion. While Trinity was a big step in the right direction in terms of performance and power consumption, it was not enough to boost their flagging marketshare. Throughout 2013 they are working on several products that will help to change their fortunes.
The first product that we will likely see is the Jaguar core based Kabini APUs. These are the next generation, low power APUs which will replace the Brazos 2.0 products that we currently are seeing. These quad core and dual core parts are manufactured by TSMC on their 28 nm process. Kabini will be the first APU to include the new GCN architecture that we currently see in the HD 7700 series and above. AMD will be breaking new ground in offering a true quad core part at price points unseen so far.
Less Risk, Faster Product Development and Introduction
There have been quite a few articles lately about the upcoming Bulldozer refresh from AMD, but a lot of the information that they have posted is not new. I have put together a few things that seem to have escaped a lot of these articles, and shine a light on what I consider the most important aspects of these upcoming releases. The positive thing that most of these articles have achieved is increasing interest in AMD’s upcoming products, and what they might do for that company and the industry in general.
The original FX-8150 hopefully will only be a slightly embarrassing memory for AMD come Q3/Q4 of this year.
The current Bulldozer architecture that powers the AMD FX series of processors is not exactly an optimal solution. It works, and seems to do fine, but it does not surpass the performance of the previous generation Phenom II X6 series of chips in any meaningful way. Let us not mention how it compares to Intel’s Sandy Bridge and Ivy Bridge products. It is not that the design is inherently flawed or bad, but rather that it was a unique avenue of thought that was not completely optimized. The train of thought is that AMD seems to have given up on the high single threaded performance that Intel has excelled at for some time. Instead they are going for good single threaded performance, and outstanding multi-threaded performance. To achieve this they had to rethink how to essentially make the processor as wide as possible, keep the die size and TDP down to reasonable levels, and still achieve a decent amount of performance in single threaded applications.
Bulldozer was meant to address this idea, and its success is debatable. The processor works, it shows up as an eight logical core processor, and it seems to scale well with multi-threading. The problem, as stated before, is that it does not perform like a next generation part. In fact, it is often compared to Intel’s Prescott, which was a larger chip on a smaller process than the previous Northwood processor, but did not outperform the earlier part in any meaningful way (except in heat production). The difference between Intel and AMD in this respect is that Bulldozer is an entirely new architecture, whereas Prescott was part of the same lineage as Northwood. AMD has radically changed the way it designs processors. Taking some lessons from the graphics arm of the company and their successful Radeon brand, AMD is applying that train of thought to processors.
Subject: Graphics Cards, Processors | June 12, 2012 - 12:18 PM | Ryan Shrout
Tagged: Kaveri, APU, amd, AFDS
During the opening keynote at the AMD Fusion Developer Summit 2012, AMD's Dr. Lisa Su revealed a slide with performance of the upcoming 3rd generation Kaveri APU.
While Trinity is currently rated at 726 GFLOPS, the Kaveri APU, due late in 2012 or early 2013, will have at least 1 TFLOPS of total compute performance. That is a boost of at least 37% over the previous generation.
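For anyone who wants to check the math, the boost figure falls straight out of the two ratings quoted above. This is a quick sketch that assumes "at least 1 TFLOPS" means exactly 1000 GFLOPS; the function and variable names are just for illustration.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, as a percentage."""
    return (new - old) / old * 100.0


trinity_gflops = 726.0   # Trinity rating quoted above
kaveri_gflops = 1000.0   # assumption: "at least 1 TFLOPS" taken as exactly 1000 GFLOPS

boost = percent_increase(trinity_gflops, kaveri_gflops)
print(f"{boost:.1f}%")   # prints "37.7%"
```

Since 1 TFLOPS is a floor rather than a final number, the real generational gain could end up higher than this.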
If you want more information, check out our keynote live blog!!
AMD Gives a Glimpse of the Near Future
AMD has released an updated roadmap for these next two years, and the information contained within is quite revealing of where AMD is going and how they are shifting their lineup to be less dependent on a single manufacturer. The Financial Analyst Day has brought a few surprises of where AMD is headed, and how they will get there. Rory Read and Mark Papermaster have brought a new level of energy to the company that seemingly has been either absent or muted. Sometimes a new set of eyes on a problem, or in this case the attitudes and culture of a company, can bring about significant changes for the positive. What we have seen so far from Rory and company is a new energy and direction for AMD. While AMD is still sticking to their roots, they are looking to further expand upon their expertise in some areas, all the while being flexible enough to license products from other companies that are far enough away from AMD's core competence that it pays to license rather than force engineers to re-invent the wheel.
The roadmaps cover graphics, desktop, mobile, and server products through 2013.
This first slide is a snapshot of the current and upcoming APU lineup. Southern Islands is the codename for the recently released HD 7000 series of desktop parts. This will cover products from the 7700 level on up to the top end 7990. Of great interest are the Brazos 2.0 and Hondo chips. AMD had cancelled the "Krishna" series of chips, which would have been based on up to four Bobcat cores on 28 nm. Details are still pending, but it seems Brazos 2.0 will still be 40 nm parts but much more refined so they can be clocked higher and still pull less power. Hondo looks to be the basic Brazos core, but for Ultra Low Power (lower clocks, possibly disabled units, etc.) which would presumably scale to 5 watts and possibly lower.