Subject: Graphics Cards, Processors | December 8, 2015 - 08:07 AM | Scott Michaud
Tagged: hsa, GCC, amd
Phoronix, the Linux-focused hardware website, highlighted patches for the GNU Compiler Collection (GCC) that implement HSA. This will allow newer APUs, such as AMD's Carrizo, to accelerate chunks of code (mostly loops) that have been tagged with a precompiler flag as worth running on the GPU. While I have done some GPGPU development, many of the low-level specifics of HSA aren't areas that I have much experience with.
The patches have been managed by Martin Jambor of SUSE Labs. You can see a slideshow presentation of their work on the GNU website. Even though features froze about a month ago, they are apparently hoping that this will make it into the official GCC 6 release. If so, many developers around the world will be able to target HSA-compatible hardware in the first half of 2016. Technically, anyone can do so regardless, but they would need to specifically use the unofficial branch on the GCC Subversion repository. This probably means compiling it themselves, and it might even be behind on a few features in other branches that were accepted into GCC 6.
Subject: General Tech | November 18, 2015 - 12:35 PM | Jeremy Hellstrom
Tagged: amd, firepro, boltzmann, HPC, hsa
AMD announced the Boltzmann Initiative this week at SC15 to compete against Intel and NVIDIA in the HPC market. It is not a physical product but rather a new way to unite the processing power of HSA-compliant AMD APUs and FirePro GPUs. They have announced several new projects, including the Heterogeneous Compute Compiler (HCC) and the Heterogeneous-compute Interface for Portability (HIP) for CUDA-based apps, which can automatically convert CUDA code into C++. They also announced a headless Linux driver and an HSA runtime infrastructure interface for managing clusters, which utilizes their InfiniBand fabric interconnect to interface system memory directly to GPU memory, as well as adding P2P GPU support and numerous other enhancements. Check out more at DigiTimes.
"The Boltzmann Initiative leverages HSA's ability to harness both central processing units (CPU) and AMD FirePro graphics processing units (GPU) for maximum compute efficiency through software."
Here is some more Tech News from around the web:
- Microsoft Open-Sources Visual Studio Code @ Slashdot
- Microsoft's gamble pays off as half of enterprises pledge Windows 10 in 2016 @ The Inquirer
- Microsoft chief Satya drops an S bomb in Windows 10, cloud talk @ The Register
- Trend Micro warns of Ashley Madison fallout and rise in data breaches @ The Inquirer
- How to Test-Drive OpenStack @ Linux.com
- Adobe releases out-of-band security patches – amazingly not for Flash @ The Register
- Asus RP-AC56 802.11ac wireless extender @ Kitguru
Subject: General Tech | March 17, 2015 - 01:18 PM | Jeremy Hellstrom
Tagged: hsa foundation, hsa, amd, arm, Samsung, Imagination Technologies, HSAIL
We have been talking about the HSA Foundation since 2013, a cooperative effort by AMD, ARM, Imagination, Samsung, Qualcomm, MediaTek and TI to design a heterogeneous memory architecture that allows GPUs, DSPs and CPUs to all directly access the same physical memory. The release of the official specifications today is a huge step forward for these companies, especially for garnering future mobile market share as physical hardware apart from Carrizo becomes available.
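To make the "same physical memory" idea a little more concrete, here is a purely conceptual sketch. It uses plain Python buffers as a stand-in and is not an HSA or OpenCL API; the function names `scale_with_copies` and `scale_in_place` are hypothetical. The first function mimics the traditional discrete-GPU pattern of copying data to the device and back, while the second mimics handing the device a reference to memory the CPU already owns.

```python
# Conceptual stand-in only: plain Python buffers, not an HSA or OpenCL API.
# Function names are hypothetical and exist purely for illustration.

def scale_with_copies(host_data: bytearray, factor: int) -> bytearray:
    """Discrete-GPU style: copy in, compute on the copy, copy the result back."""
    device_buffer = bytearray(host_data)      # copy host -> "device"
    for i in range(len(device_buffer)):
        device_buffer[i] = (device_buffer[i] * factor) % 256
    return bytearray(device_buffer)           # copy "device" -> host


def scale_in_place(shared: memoryview, factor: int) -> None:
    """Shared-memory style: the "device" receives a reference and writes in place."""
    for i in range(len(shared)):
        shared[i] = (shared[i] * factor) % 256


data = bytearray([1, 2, 3, 4])
result = scale_with_copies(data, 3)    # data is untouched; result is a new buffer
scale_in_place(memoryview(data), 3)    # data itself is updated; nothing is copied
```

Both paths produce the same values; the difference is that the second never duplicates the buffer, which is the kind of overhead a shared address space is meant to remove.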
Programmers will be able to use C, C++, Fortran, Java, and Python to write HSA-compliant code, which is then compiled into HSAIL (Heterogeneous System Architecture Intermediate Language) and from there to the actual binary executables which will run on your devices. HSA currently supports x86 and x64, and there are Linux kernel patches available for those who develop on that OS. Intel and NVIDIA are not involved in this project at all; they have chosen their own solutions for mobile devices, and while Intel certainly has pockets deep enough to experiment, NVIDIA might not. We shall soon see if Pascal, and improvements to Maxwell's performance and efficiency through future generations, can compete with the benefits of HSA.
The current problem is, of course, hardware: Bald Eagle and Carrizo are scheduled to arrive on the market soon, but currently they are not available. Sea Islands GPUs and Kaveri have some HSA enhancements, but with limited hardware to work with it will be hard to convince developers to focus on programming HSA-optimized applications. The release of the official specs today is a great first step; if you prefer an overview to reading through the official documents, The Register has a good article right here.
"The HSA Foundation today officially published version 1.0 of its Heterogeneous System Architecture specification, which (if we were being flippant) describes how GPUs, DSPs and CPUs can share the same physical memory and pass pointers between each other. (A provisional 1.0 version went live in August 2014.)"
Here is some more Tech News from around the web:
- Droidberry dangles: Why the BlackBerry-Samsung alliance is big potatoes @ The Register
- BlackBerry: FREAK SSL bug affects BES, BBM and BlackBerry smartphones @ The Inquirer
- Apple will pay you to ditch your Android or BlackBerry smartphone @ The Inquirer
- Ext4 Filesystem Improvements to Address Scaling Challenges @ Linux.com
- Microsoft gives EMET divine powers to repel God Mode attack @ The Register
- Microsoft RE-BORKS Windows 7 patch after reboot loop horror @ The Register
- Fujitsu Could Help Smartphone Chips Run Cooler @ Slashdot
- Gigabyte announces financial results for 2014 @ DigiTimes
- 3D Audio Standard Released @ Slashdot
- NikKTech And Nanoxia Spring Break EU Giveaway
Filling the Product Gaps
In the first several years of my PCPer employment, I typically handled most of the AMD CPU refreshes. These were rather standard affairs that involved small jumps in clockspeed and performance. These happened every 6 to 8 months, with the bigger architectural shifts happening some years apart. We are finally seeing a new refresh of the AMD APU parts after the initial release of Kaveri to the world at the beginning of this year. This update is different. Unlike previous years, there are no faster parts than the already available A10-7850K.
This refresh deals with fleshing out the rest of the Kaveri lineup with products that address different TDPs, markets, and prices. The A10-7850K is still the king when it comes to performance on the FM2+ socket (as long as users do not pay attention to the faster CPU performance of the A10-6800K). The initial launch in January also featured another part that never became available until now; the A8-7600 was supposed to be available some months ago, but is only making it to market now. The 7600 part was unique in that it had a configurable TDP that went from 65 watts down to 45 watts. The 7850K on the other hand was configurable from 95 watts down to 65 watts.
So what are we seeing today? AMD is releasing three parts to address the lower power markets that AMD hopes to expand their reach into. The A8-7600 was again detailed back in January, but never released until recently. The other two parts are brand new. The A10-7800 is a 65 watt TDP part with a cTDP that goes down to 45 watts. The other new chip is the A6-7600K which is unlocked, has a configurable TDP, and looks to compete directly with Intel’s recently released 20 year Anniversary Pentium G3258.
Subject: General Tech | July 11, 2014 - 02:36 PM | Jeremy Hellstrom
Tagged: linux, hsa, amd, open source
Open source HSA has arrived for the Linux kernel with a newly released set of patches which will allow Sea Islands and newer GPUs to share hardware resources. These patches provide both a sample driver for any HSA-compatible hardware and the driver for Radeon GPUs. As the debut of the Linux 3.16 kernel is so close, you shouldn't expect to see these patches included until 3.17, which should be released in the not too distant future. Phoronix and Linux users everywhere give a big shout of thanks to AMD's John Bridgman for his work on this project.
"AMD has just published a massive patch-set for the Linux kernel that finally implements a HSA (Heterogeneous System Architecture) in open-source. The set of 83 patches implement a Linux HSA driver for Radeon family GPUs and serves too as a sample driver for other HSA-compatible devices. This big driver in part is what well known Phoronix contributor John Bridgman has been working on at AMD."
Here is some more Tech News from around the web:
- Things that make you go hmm: GlobalFoundries hires ex-IBM chip fabber @ The Register
- Microsoft uncovers bogus SSL certificates, urges users to beware of cyber attacks @ The Inquirer
- Gameover Zeus malware returns stronger than ever @ The Inquirer
- How to Operate Linux Spycams With Motion @ Linux.com
- PAPAGO! Dashcam P2PRO 1080p Review @ OCC
Subject: Processors | July 9, 2014 - 05:42 PM | Josh Walrath
Tagged: nvidia, msi, Luxmark, Lightning, hsa, GTX 580, GCN, APU, amd, A88X, A10-7850K
When I first read many of the initial AMD A10 7850K reviews, my primary question was how the APU would act if a different GPU was installed in the system and the CrossFire X functionality that AMD talked about was not used. Typically when a user installs a standalone graphics card on the AMD FM2/FM2+ platform, they disable the graphics portion of the APU. They also have to uninstall the AMD Catalyst driver suite. So this then leaves the APU as a CPU only, and all of that graphics silicon is left silent and dark.
Who in their right mind would pair a high end graphics card with the A10-7850K? This guy!
Does this need to be the case? Absolutely not! The GCN based graphics unit on the latest Kaveri APUs is pretty powerful when used in GPGPU/OpenCL applications. The 4 cores/2 modules and 8 GCN cores can push out around 856 GFlops when fully utilized. We also must consider that the APU is the first fully compliant HSA (Heterogeneous System Architecture) chip, and it handles memory accesses much more efficiently than standalone GPUs. The shared memory space with the CPU gets rid of a lot of the workarounds typically needed for GPGPU type applications. It makes sense that users would want to leverage the performance potential of a fully functioning APU while upgrading their overall graphics performance with a higher end standalone GPU.
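For the curious, the roughly 856 GFlops figure can be reproduced with some back-of-the-envelope math. The sketch below is only an approximation under stated assumptions: the 3.7 GHz CPU clock, the 720 MHz GPU clock, the 8 single-precision FLOPs per CPU core per cycle, and the 64 shaders per GCN compute unit are not given in this article and are assumed here from commonly published A10-7850K specifications. The helper name `peak_gflops` is hypothetical.

```python
# Back-of-the-envelope peak single-precision throughput for an APU.
# Clock speeds and FLOPs-per-cycle figures are assumptions, not from the article.

def peak_gflops(units: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Peak GFLOPS = execution units x FLOPs per unit per cycle x clock in GHz."""
    return units * flops_per_cycle * clock_ghz


# CPU side: 4 cores, assumed 8 single-precision FLOPs per core per cycle at 3.7 GHz.
cpu_gflops = peak_gflops(units=4, flops_per_cycle=8, clock_ghz=3.7)

# GPU side: 8 GCN compute units x an assumed 64 shaders each = 512 shaders,
# each doing one fused multiply-add (2 FLOPs) per cycle at an assumed 720 MHz.
gpu_gflops = peak_gflops(units=8 * 64, flops_per_cycle=2, clock_ghz=0.72)

total_gflops = cpu_gflops + gpu_gflops
print(f"CPU {cpu_gflops:.1f} + GPU {gpu_gflops:.1f} = {total_gflops:.1f} GFLOPS")
```

Under those assumptions the total rounds to 856, and the split makes it obvious that the GPU portion supplies the large majority of the theoretical throughput. These are peak numbers only; real applications will land well below them.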
To get this to work is very simple. Assuming that the user has been using the APU as their primary graphics controller, they should update to the latest Catalyst drivers. If the user is going to use an AMD card, then it would behoove them to totally uninstall the Catalyst driver and re-install only after the new card is installed. After this is completed restart the machine, go into the UEFI, and change the primary video boot device to PEG (PCI-Express Graphics) from the integrated unit. Save the setting and shut down the machine. Insert the new video card and attach the monitor cable(s) to it. Boot the machine and either re-install the Catalyst suite if an AMD card is used, or install the latest NVIDIA drivers if that is the graphics choice.
Windows 7 and Windows 8 allow users to install multiple graphics drivers from different vendors. In my case I utilized a last generation GTX 580 (the MSI N580GTX Lightning) along with the AMD A10 7850K. These products coexist happily together on the MSI A88X-G45 Gaming motherboard. The monitor is attached to the NVIDIA card and all games are routed through that since it is the primary graphics adapter. Performance seems unaffected with both drivers active.
I find it interesting that the GPU portion of the APU is named "Spectre". Who owns those 3dfx trademarks anymore?
When I load up Luxmark I see three entries: the APU (CPU and GPU portions), the GPU portion of the APU, and then the GTX 580. Luxmark defaults to the GPUs. We see these GPUs listed as “Spectre”, which is the GCN portion of the APU, and the NVIDIA GTX 580. Spectre supports OpenCL 1.2 while the GTX 580 is an OpenCL 1.1 compliant part.
With both GPUs active I can successfully run the Luxmark “Sala” test. The two units perform better together than when they are run separately. Adding in the CPU does increase the score, but not by very much (my guess here is that the APU is going to be very memory bandwidth bound in such a situation). Below we can see the results of the different units separate and together.
These results make me hopeful about the potential of AMD’s latest APU. It can run side by side with a standalone card, and applications can leverage the performance of this unit. Now all we need is more HSA aware software. More time and more testing are needed for setups such as this, and we need to see if HSA enabled software really does see a boost from using the GPU portion of the APU as compared to a pure CPU piece of software or code that will run on the standalone GPU.
Personally I find the idea of a heterogeneous solution such as this appealing. The standalone graphics card handles the actual graphics portions, the CPU handles that code, and the HSA software can then fully utilize the graphics portion of the APU in a very efficient manner. Unfortunately, we do not have hard numbers on the handful of HSA aware applications out there, especially when used in conjunction with standalone graphics. We know in theory that this can work (and should work), but until developers get out there and really optimize their code for such a solution, we simply do not know if having an APU will really net the user big gains as compared to something like the i7 4770 or 4790 running pure x86 code.
In the meantime, at least we know that these products work together without issue. The mixed mode OpenCL results make a nice case for improving overall performance in such a system. I would imagine with more time and more effort from developers, we could see some really interesting implementations that will fully utilize a system such as this one. Until then, happy experimenting!
Subject: General Tech | May 21, 2014 - 03:06 PM | Jeremy Hellstrom
Tagged: amd, Bald Eagle, embedded, hsa
AMD has just introduced their powerful new embedded chip called Bald Eagle. Depending on the model of processor you purchase you get two or four Steamroller CPU cores, and up to eight GCN GPU cores based on the HD 9000 series. That gives the higher end chips enough juice to power up to four independent 3D, 4K, or HD displays, which you can bump up to nine if you include an embedded Radeon E8860 discrete GPU in your system. The cores are all fully HSA compliant and will support ECC and non-ECC DDR3 at speeds of up to 2133MHz, as well as PCIe Gen3 x16, PCIe Gen2 2x4, USB and SATA. Check out more at The Inquirer.
"Bald Eagle also enables heterogeneous system architecture (HSA), which first appeared in AMD chippery in its desktop Kaveri APUs this January, and which allows the CPU and GPU to share the same system memory, vastly simpifying the programming challenge of getting GPUs to shoulder the parallel-processing chores that they excel at far better then CPUs."
Subject: General Tech | April 17, 2014 - 01:07 PM | Jeremy Hellstrom
Tagged: amd, hsa, berlin, Opteron X-series, Red Hat
Next Wednesday we will get our first look at the HSA enabled Opteron X Series, otherwise known as Berlin. AMD will be unveiling the processor at the Red Hat Summit in San Francisco with an X2100 Opteron running on a Linux environment that is based on the Fedora Project. We have very recently had a chance to see the desktop equivalent, Kaveri, in action but this will be the first example of AMD's heterogeneous computing on a server. Keep your eyes peeled for our coverage; in the meantime you can get a preview at The Register.
"AMD will give the first public demo of its second-generation Opteron X-Series server processor, code-named "Berlin", at the Red Hat Summit in San Francisco on Wednesday."
Here is some more Tech News from around the web:
- Microsoft reissues Windows 8.1 Update for enterprise customers @ The Register
- Learn How to Contribute to the Linux Kernel, Take the Eudyptula Challenge @ Linux.com
- Delays in OLED TV shipments impede growth of OLED material market, says DisplaySearch @ DigiTimes
- Samsung Galaxy S5 fingerprint scanner hacked in just 4 DAYS @ The Register
- Steam vulnerability allows hackers to bypass security and swipe account data @ The Inquirer
Subject: Processors | January 14, 2014 - 02:52 PM | Jeremy Hellstrom
Tagged: a10-6700, a8-6500, a8-7600, amd, APU, hsa, i3-4330, Kaveri
Not only are the first Kaveri reviews arriving today, the A10-7850K is up for sale on both NewEgg and Amazon and the A10-7700K is available on NewEgg. This new part, at 45W, competes favourably with the previous 100W Trinity APU in most tests, and when Ryan boosted it to 65W it gained a little more. The Steamroller cores have been updated but not in a way that has a huge effect on CPU performance; on the other hand, the 384 SIMD units composing the GPU portion of this chip are quite impressive. 1080p gaming of current generation titles is possible on this chip, and we haven't seen its big brother with 512 SIMD units yet. In the Tech Report's review you can see that BF4 is playable on this chip, and this is not the Mantle version optimized for AMD's new architecture. It is also a pity that Thief was unavailable to see just what TrueAudio is capable of. Unfortunately this chip will not find its home in gamers' dream machines; that is simply not where AMD is targeting its CPUs. However, for SFF systems that need to be energy efficient and where a discrete GPU is too big to fit, Kaveri will usher in a new level of performance.
"AMD's next-generation APU packs in a ton of innovation, including updated "Steamroller" CPU cores, GCN graphics, and advanced HSA features. But is it enough to restore AMD's competitiveness in desktop processors?"
Here are some more Processor articles from around the web:
- AMD A10-7850K Kaveri: Windows 8.1 vs. Ubuntu Linux @ Phoronix
- AMD A10-7850K Kaveri: The Linux Introduction @ Phoronix
- AMD Kaveri APU Architecture Overview @ Benchmark Reviews
- AMD Kaveri A10 7850K & A8 7600 Review @ Hardware Canucks
The AMD Kaveri Architecture
Kaveri: AMD’s New Flagship Processor
How big is Kaveri? We already know the die size of it, but what kind of impact will it have on the marketplace? Has AMD chosen the right path by focusing on power consumption and HSA? Starting out an article with three questions in a row is a questionable tactic for any writer, but these are the things that first come to mind when considering a product the likes of Kaveri. I am hoping we can answer a few of these questions by the end of this article, but alas it seems as though the market will have the final say as to how successful this new architecture is.
AMD has been pursuing the “Future is Fusion” line for several years, but it can be argued that Kaveri is truly the first “Fusion” product that completes the overall vision for where AMD wants to go. The previous several generations of APUs were initially not all that integrated in a functional sense, but the complexity and completeness of that integration has been improved upon with each iteration. Kaveri takes this integration to the next step, and one which fulfills the promise of a truly heterogeneous computing solution. While AMD has the hardware available, we have yet to see if the software companies are willing to leverage the compute power afforded by a robust and programmable graphics unit powered by AMD’s GCN architecture.
(Editor's Note: The following two pages were written by our own Josh Walrath, discussing the technology and architecture of AMD Kaveri. Testing and performance analysis by Ryan Shrout starts on page 3.)
The first step in understanding Kaveri is taking a look at the process technology that AMD is using for this particular product. Since AMD divested itself of their manufacturing arm, they have had to rely on GLOBALFOUNDRIES to produce nearly all of their current CPUs and APUs. Bulldozer, Piledriver, Llano, Trinity, and Richland based parts were all produced on GF’s 32 nm PD-SOI process. The lower power APUs such as Brazos and Kabini have been produced by TSMC on their 40 nm and 28 nm processes respectively.
Kaveri will take a slightly different approach here. It will be produced by GLOBALFOUNDRIES, but it will forgo the SOI and utilize a bulk silicon process. 28 nm HKMG is very common around the industry, but few pure play foundries were willing to tailor their process to the direct needs of AMD and the Kaveri product. GF was able to do such a thing. APUs are a different kind of animal when it comes to fabrication, primarily because the two disparate units require different characteristics to perform at the highest efficiency. As such, compromises had to be made.