AMD Releases App SDK 3.0 with OpenCL 2.0

Subject: Graphics Cards, Processors | August 30, 2015 - 09:14 PM |
Tagged: amd, carrizo, Fiji, opencl, opencl 2.0

Apart from manufacturers with a heavy first-party focus, such as Apple and Nintendo, hardware is useless without developer support. In this case, AMD has updated their App SDK to include support for OpenCL 2.0, with code samples. It also updates the SDK for Windows 10, Carrizo, and Fiji, but it is not entirely clear how.


That said, OpenCL is important to those two products. Fiji has a very high compute throughput compared to any other GPU at the moment, and its memory bandwidth is often even more important for GPGPU workloads. It is also useful for Carrizo, because parallel compute and HSA features are what make it a unique product. AMD has been creating first-party software and helping popular third-party developers such as Adobe, but a little support to the world at large could bring a killer application or two, especially from the open-source community.

The SDK has been available in pre-release form for quite some time now, but it has finally graduated out of beta. OpenCL 2.0 allows for work to be generated on the GPU, which is especially useful for tasks that vary based on previous results, because the GPU can continue without contacting the CPU again.
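To make the idea of GPU-generated work a little more concrete, here is a minimal sketch in plain Python. It is not OpenCL and it does not use the App SDK; the function name `run_on_device`, the interval-splitting task, and the tolerance value are all made up for illustration. It only models the pattern: the host submits one task, and every follow-up task is enqueued from inside a running task, based on that task's own result.

```python
from collections import deque

def run_on_device(root_interval, tolerance):
    """Model a device-side work queue in which tasks enqueue their own follow-ups.

    Hypothetical stand-in for GPU-generated work; nothing here is an OpenCL call.
    """
    pending = deque([root_interval])  # the only task submitted by the "host"
    device_enqueues = 0               # tasks created by other tasks
    leaves = []
    while pending:
        lo, hi = pending.popleft()
        # Task body: inspect this task's own result to decide if more work is needed.
        if hi - lo > tolerance:
            mid = (lo + hi) / 2
            # Child work is queued from inside the task, with no host round trip.
            pending.append((lo, mid))
            pending.append((mid, hi))
            device_enqueues += 2
        else:
            leaves.append((lo, hi))
    return leaves, device_enqueues

leaves, device_enqueues = run_on_device((0.0, 1.0), 0.25)
print(len(leaves), "finished intervals,", device_enqueues, "tasks generated on the device")
```

Without this ability, each round of splitting would have to go back to the CPU, which would read the results and launch the next batch itself.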

Source: AMD

Khronos Group at SIGGRAPH 2015

Subject: Graphics Cards, Processors, Mobile, Shows and Expos | August 10, 2015 - 09:01 AM |
Tagged: vulkan, spir, siggraph 2015, Siggraph, opengl sc, OpenGL ES, opengl, opencl, Khronos

When the Khronos Group announced Vulkan at GDC, they mentioned that the API is coming this year, and that this date is intended to under promise and over deliver. Recently, fans were hoping that it would be published at SIGGRAPH, which officially began yesterday. Unfortunately, Vulkan has not been released. It does hold a significant chunk of the news, however. Also, it's not like DirectX 12 is holding a commanding lead at the moment. The headers were public only for a few months, and the code samples are less than two weeks old.


The organization made announcements for six products today: OpenGL, OpenGL ES, OpenGL SC, OpenCL, SPIR, and, as mentioned, Vulkan. They wanted to make their commitment clear, to all of their standards. Vulkan is urgent, but some developers will still want the framework of OpenGL. Bind what you need to the context, then issue a draw and, if you do it wrong, the driver will often clean up the mess for you anyway. The briefing was structured to make it evident that OpenGL is still on their minds, which is likely why they made sure three OpenGL logos greeted me in their slide deck as early as possible. They are also taking and closely examining feedback about who wants to use Vulkan or OpenGL, and why.

As for Vulkan, confirmed platforms have been announced. Vendors have committed to drivers on Windows 7, 8, 10, Linux, including Steam OS, and Tizen (OSX and iOS are absent, though). Beyond all of that, Google will accept Vulkan on Android. This is a big deal, as Google, despite its open nature, has been avoiding several Khronos Group standards. For instance, Nexus phones and tablets do not have OpenCL drivers, although Google isn't stopping third parties from rolling it into their devices, like Samsung and NVIDIA. Direct support of Vulkan should help cross-platform development as well as, and more importantly, target the multi-core, relatively slow threaded processors of those devices. This could even be of significant use for web browsers, especially in sites with a lot of simple 2D effects. Google is also contributing support from their drawElements Quality Program (dEQP), which is a conformance test suite that they bought back in 2014. They are going to expand it to Vulkan, so that developers will have more consistency between devices -- a big win for Android.


While we're not done with Vulkan, one of the biggest announcements is OpenGL ES 3.2 and it fits here nicely. At around the time that OpenGL ES 3.1 brought Compute Shaders to the embedded platform, Google launched the Android Extension Pack (AEP). This absorbed OpenGL ES 3.1 and added Tessellation, Geometry Shaders, and ASTC texture compression to it. It also added tension between Google and cross-platform developers, who felt that Google was trying to pull its developers away from Khronos Group. Today, OpenGL ES 3.2 was announced and includes each of the AEP features, plus a few more (like “enhanced” blending). Better yet, Google will support it directly.

Next up are the desktop standards, before we finish with a resurrected embedded standard.

OpenGL has a few new extensions added. One interesting one is the ability to assign locations to multi-samples within a pixel. There is a whole list of sub-pixel layouts, such as rotated grid and Poisson disc. Apparently this extension allows developers to choose it, as certain algorithms work better or worse for certain geometries and structures. There were probably vendor-specific extensions for a while, but now it's a ratified one. Another extension allows “streamlined sparse textures”, which helps manage data where the number of unpopulated entries outweighs the number of populated ones.
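For a rough idea of what those sub-pixel layouts look like, here is a small Python sketch that generates two of them inside a unit pixel. It does not call OpenGL or the extension itself; the function names, the 4-sample count, and the dart-throwing approach to Poisson disc sampling are assumptions chosen for illustration, and real hardware patterns may differ.

```python
import math
import random

def rotated_grid_4x():
    """Four sample positions in a unit pixel: a 2x2 grid rotated by atan(1/2)."""
    angle = math.atan(0.5)
    scale = math.sqrt(5) / 2  # puts the rotated samples on a 1/8-pixel lattice
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    samples = []
    for gx in (-0.25, 0.25):
        for gy in (-0.25, 0.25):
            # Rotate and scale around the pixel center, then shift into [0, 1].
            x = scale * (gx * cos_a - gy * sin_a) + 0.5
            y = scale * (gx * sin_a + gy * cos_a) + 0.5
            samples.append((round(x, 6), round(y, 6)))
    return samples

def poisson_disc(count, min_dist, seed=0, max_tries=10000):
    """Dart throwing: accept random points that keep min_dist from all earlier ones."""
    rng = random.Random(seed)
    samples = []
    tries = 0
    while len(samples) < count and tries < max_tries:
        tries += 1
        cx, cy = rng.random(), rng.random()
        if all(math.hypot(cx - sx, cy - sy) >= min_dist for sx, sy in samples):
            samples.append((cx, cy))
    return samples  # may hold fewer than count points if max_tries runs out

grid = rotated_grid_4x()
disc = poisson_disc(8, 0.2)
print(grid)
print(disc)
```

In the rotated grid, no two samples share a row or a column, which helps with near-horizontal and near-vertical edges. The Poisson disc layout trades that regularity for an irregular spread with a guaranteed minimum spacing.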

OpenCL 2.0 was given a refresh, too. It contains a few bug fixes and clarifications that will help it be adopted. C++ headers were also released, although I cannot comment much on them. I do not know the state that OpenCL 2.0 was in before now.

And this is when we make our way back to Vulkan.


SPIR-V, the code that runs on the GPU (or other offloading device, including the other cores of a CPU) in OpenCL and Vulkan, is seeing a lot of community support. Projects are under way to allow developers to write GPU code in several interesting languages: Python, .NET (C#), Rust, Haskell, and many more. The slide lists nine that Khronos Group knows about, but those four are pretty interesting. Again, this is saying that you can write code in the aforementioned languages and have it run directly on a GPU. Curiously missing is HLSL, and the President of Khronos Group agreed that it would be a useful language. The ability to cross-compile HLSL into SPIR-V means that shader code written for DirectX 9, 10, 11, and 12 could be compiled for Vulkan. He expects that it won't take long for a project to start, and that one might already be happening somewhere outside his Google abilities. Regardless, those who are afraid to program in the C-like GLSL and HLSL shading languages might find C# and Python to be a bit more their speed, and they seem to be happening through SPIR-V.

As mentioned, we'll end on something completely different.


For several years, the OpenGL SC has been on hiatus. This group defines standards for graphics (and soon GPU compute) in “safety critical” applications. For the longest time, this meant aircraft. The dozens of planes (which I assume meant dozens of models of planes) that adopted this technology were fine with a fixed-function pipeline. It has been about ten years since OpenGL SC 1.0 launched, which was based on OpenGL ES 1.0. SC 2.0 is planned to launch in 2016, which will be based on the much more modern OpenGL ES 2 and ES 3 APIs that allow pixel and vertex shaders. The Khronos Group is asking for participation to direct SC 2.0, as well as a future graphics and compute API that is potentially based on Vulkan.

The devices that this platform intends to target are: aircraft (again), automobiles, drones, and robots. There are a lot of ways that GPUs can help these devices, but they need a good API to certify against. It needs to withstand more than an Ouya, because crashes could be much more literal.

Manufacturer: PC Perspective

... But Is the Timing Right?

Windows 10 is about to launch and, with it, DirectX 12. Apart from the massive increase in draw calls, Explicit Multiadapter, both Linked and Unlinked, has been the cause of a few pockets of excitement here and there. I am a bit concerned, though. People seem to find this a new, novel concept that gives game developers the tools that they've never had before. It really isn't. Depending on what you want to do with secondary GPUs, game developers could have used them for years. Years!

Before we talk about the cross-platform examples, we should talk about Mantle. It is the closest analog to DirectX 12 and Vulkan that we have. It served as the base specification for Vulkan that the Khronos Group modified with SPIR-V instead of HLSL and so forth. Some claim that it was also the foundation of DirectX 12, which would not surprise me given what I've seen online and in the SDK. Allow me to show you how the API works.


Mantle is an interface that mixes Graphics, Compute, and DMA (memory access) into queues of commands. This is easily done in parallel, as each thread can create commands on its own, which is great for multi-core processors. Each queue, which is a list of commands leading to the GPU, can be handled independently, too. An interesting side-effect is that, since each device uses standard data structures, such as IEEE 754 floating-point numbers, no-one cares where these queues go as long as the work is done quickly enough.

Since each queue is independent, an application can choose to manage many of them. None of these lists really need to know what is happening to any other. As such, they can be pointed to multiple, even wildly different graphics devices. Different model GPUs with different capabilities can work together, as long as they support the core of Mantle.
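The queue model described above can be sketched in a few lines of Python. This is not the Mantle API, nor DirectX 12 or Vulkan; the names `record_commands`, `gpu_a`, and `gpu_b` are hypothetical, and the "commands" are just strings. The point is only the shape of the design: threads build command lists without sharing state, and each finished list is handed to whichever independent queue (and therefore device) the application picks.

```python
import queue
import threading

def record_commands(thread_id, count):
    """Each thread builds its own command list without touching shared state."""
    return [f"t{thread_id}:cmd{i}" for i in range(count)]

def worker(thread_id, target_queue):
    commands = record_commands(thread_id, 3)
    target_queue.put(commands)  # submit the finished list as one unit

# Two independent queues, standing in for two different (hypothetical) GPUs.
device_queues = {"gpu_a": queue.Queue(), "gpu_b": queue.Queue()}

threads = []
for thread_id in range(4):
    # Even threads feed one device, odd threads feed the other.
    target = device_queues["gpu_a" if thread_id % 2 == 0 else "gpu_b"]
    t = threading.Thread(target=worker, args=(thread_id, target))
    threads.append(t)
    t.start()
for t in threads:
    t.join()

def drain(q):
    """Collect everything submitted to one queue, as its device would consume it."""
    batches = []
    while not q.empty():
        batches.append(q.get())
    return batches

submitted = {name: drain(q) for name, q in device_queues.items()}
print(submitted)
```

Neither queue knows the other exists, which is why the two could just as easily lead to GPUs of different models or from different vendors.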


DirectX 12 and Vulkan took this metaphor so their respective developers could use this functionality across vendors. Mantle did not invent the concept, however. What Mantle did is expose this architecture to graphics, which can make use of all the fixed-function hardware that is unique to GPUs. Prior to AMD's usage, this was how GPU compute architectures were designed. Game developers could have spun up an OpenCL workload to process physics, audio, pathfinding, visibility, or even lighting and post-processing effects... on a secondary GPU, even from a completely different vendor.

Vista's multi-GPU bug might get in the way, but it was possible in 7 and, I believe, XP too.

Read on to see a couple reasons why we are only getting this now...

Who Should Care? Thankfully, Many People

The Khronos Group has made three announcements today: Vulkan (their competitor to DirectX 12), OpenCL 2.1, and SPIR-V. Because there is actually significant overlap, we will discuss them in a single post rather than splitting them up. Each has a role in the overall goal to access and utilize graphics and compute devices.


Before we get into what everything is and does, let's give you a little tease to keep you reading. First, Khronos designs their technologies to be self-reliant. As such, while there will be some minimum hardware requirements, the OS pretty much just needs to have a driver model. Vulkan will not be limited to Windows 10 and similar operating systems. If a graphics vendor wants to go through the trouble, which is a gigantic if, Vulkan can be shimmed into Windows 8.x, Windows 7, possibly Windows Vista despite its quirks, and maybe even Windows XP. The words “and beyond” came up after Windows XP, but don't hold your breath for Windows ME or anything. Again, the further back in Windows versions you get, the larger the “if” becomes but at least the API will not have any “artificial limitations”.

Outside of Windows, the Khronos Group is the dominant API curator. Expect Vulkan on Linux, Mac, mobile operating systems, embedded operating systems, and probably a few toasters somewhere.

On that topic: there will not be a “Vulkan ES”. Vulkan is Vulkan, and it will run on desktop, mobile, VR, consoles that are open enough, and even cars and robotics. From a hardware side, the API requires a minimum of OpenGL ES 3.1 support. This is fairly high-end for mobile GPUs, but it is the first mobile spec to require compute shaders, which are an essential component of Vulkan. The presenter did not state a minimum hardware requirement for desktop GPUs, but he treated it like a non-issue. Graphics vendors will need to be the ones making the announcements in the end, though.

Before we go further, some background is necessary. Read on for that and lots more!

Using the embedded HD7850 to spot the next generation of gamers in the womb

Subject: General Tech | January 12, 2015 - 01:29 PM |
Tagged: ultrasound, opencl, hd 7850

The new bk3000 Ultrasound System from Analogic will use an embedded HD7850 and OpenCL to triple the quality of the information the ultrasound reveals.  This will allow ultrasounds to reveal anatomical detail and micro-vascularization that was not available with previous ultrasound technology and could even enable Gamegaters to locate their own heads with the use of the E14C4t transducer.  The most familiar usage of ultrasound is for displaying a fetus in utero, but there are far more medical uses for this type of (mostly) non-invasive scan, and the increase in detail and the transformation abilities that OpenCL brings will not only make it more effective but could expand the usefulness of ultrasounds as a diagnostic tool.  As we at PC Perspective continue to age we are very appreciative of advances such as this, especially if we can get a split screen that allows us to do a little light gaming while the doctors poke and prod!


SUNNYVALE, Calif. — Jan. 12, 2015 — AMD (NASDAQ:AMD) today announced that the AMD Embedded Radeon HD 7850 GPU is enabling cutting-edge application performance for the BK Ultrasound, powered by Analogic, bk3000 ultrasound system. Analogic is a leader in developing healthcare and security technology solutions to advance the practice of medicine to save lives.

“The AMD Embedded Radeon HD 7850 GPU with OpenCL provides a powerful and efficient pairing,” said Cameron Swen, segment marketing manager, medical applications, AMD Embedded Solutions. “This product is yet another proof point to AMD’s dedication to the healthcare segment through its technology, which helps facilitate crisp, detailed medical image visualization and other advanced graphics-driven capabilities, helping doctors provide improved care for patients.”

Analogic used OpenCL standard to gain access to the GPU for general-purpose computing, referred to as “GPGPU,” delivering exceptional performance and offering system and development cost reduction through cross-platform portability. As a result of using AMD GPU technology, Analogic achieved a 3x improvement in the amount of information in each ultrasound image and reduced time from capture to presentation. Traditional FPGAs and DSPs create a fixed, inflexible implementation that requires custom software targeted at specific hardware. Going to a software-based solution using OpenCL helps to further lower the development cost and provides improved long term value since the software can be used across product lines and through generation shifts.

“It was a critical design goal for us to implement a platform that delivered exceptional performance,” said Jacques Coumans, chief marketing and scientific officer, Analogic. “After reviewing the options available, we chose the AMD Embedded Radeon HD 7850 GPU for its excellent quality and scalability. The bk3000 ultrasound system, powered by AMD embedded graphics technology, delivers exceptional speed and image fidelity, which allows clinicians to identify anatomy and flow dynamics deeper in challenging patients.”

The AMD Embedded Radeon HD 7850 is based on AMD’s award-winning Graphics Core Next (GCN) architecture to advance the visual growth and parallel processing capabilities of embedded applications. In addition to ultrasound, other applications for GPGPU include some of the most complex parallel applications such as terrain and weather mapping, facial and gesture recognition, and biometric and DNA analysis.

The new Analogic bk3000 ultrasound system is targeted for urology, surgery, general imaging, and procedure guidance applications and is commercially available in key markets worldwide.

Source: AMD

AMD hits the peak of performance in gaming and productivity

Subject: General Tech | August 7, 2014 - 12:45 PM |
Tagged: HPC, amd, firepro, S9150, S9050, opencl

The new cooling on the 290X tends to have it at the top of the gaming charts and, with the impending release of two new FirePro HPC cards, AMD looks to take the productivity title away from the Tesla K40.  The higher end S9150 boasts 16GB of GDDR5 memory on a 512-bit memory interface and 44 GCN compute units with 64 stream processors each, for a total of 2816 stream processors on board.  That equates to 5.07 TFLOPS peak single-precision and 2.53 TFLOPS peak double-precision performance, with theoretical memory bandwidth of 320GB per second.  AMD expects the S9150 to have support for OpenCL 2.0 drivers by the end of the year, which the lower priced and specced S9050 will not, though both will support AMD Stream technology and OpenCL 1.2.  Check them out at The Register.
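For anyone who wants to see how those headline numbers hang together, here is the arithmetic as a short Python sketch. The unit counts, bus width, and quoted totals come from the specs above. The 900 MHz engine clock, the two FLOPs per stream processor per clock, the half-rate double precision, and the 5 Gbit/s per-pin memory rate are not stated here; they are assumptions inferred from the quoted totals.

```python
compute_units = 44
stream_processors_per_cu = 64
stream_processors = compute_units * stream_processors_per_cu  # 2816

# Assumption: one fused multiply-add (2 FLOPs) per stream processor per clock.
flops_per_clock = 2
# Assumption: 900 MHz engine clock, inferred from the quoted 5.07 TFLOPS.
engine_clock_hz = 900e6

peak_sp_tflops = stream_processors * flops_per_clock * engine_clock_hz / 1e12
# Assumption: double precision runs at half the single-precision rate.
peak_dp_tflops = peak_sp_tflops / 2

bus_width_bits = 512
# Assumption: 5 Gbit/s effective per pin, inferred from the quoted 320GB/s.
data_rate_per_pin = 5e9
bandwidth_gb_s = bus_width_bits / 8 * data_rate_per_pin / 1e9

print(stream_processors, round(peak_sp_tflops, 2), round(peak_dp_tflops, 2), bandwidth_gb_s)
```

Under those assumptions the results round to the 5.07 TFLOPS, 2.53 TFLOPS, and 320GB per second figures quoted above.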


"The company's new big gun is the FirePro S9150 card, which maxes out at a blistering 5.07 TFLOPS peak single-precision floating-point performance and 2.53 TFLOPS peak double-precision performance."

Source: The Register

GDC 14: EGL 1.5 Specification Released by Khronos

Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | March 19, 2014 - 09:02 AM |
Tagged: OpenGL ES, opengl, opencl, gdc 14, GDC, EGL

The Khronos Group has also released their ratified specification for EGL 1.5. This API is at the center of data and event management between other Khronos APIs. This version increases security, interoperability between APIs, and support for many operating systems, including Android and 64-bit Linux.


The headline on the list of changes is the move that EGLImage objects make, from the realm of extension into EGL 1.5's core functionality, giving developers a reliable method of transferring textures and renderbuffers between graphics contexts and APIs. Second on the list is the increased security around creating a graphics context, primarily designed for WebGL applications, which any arbitrary website can become. Further down the list is the EGLSync object, which allows further partnership between OpenGL (and OpenGL ES) and OpenCL. The GPU may not need CPU involvement when scheduling between tasks on both APIs.

During the call, the representative also wanted to mention that developers have asked them to bring EGL back to Windows. While it has not happened yet, they have announced that it is a current target.

The EGL 1.5 spec is available at the Khronos website.

Source: Khronos

GDC 14: SYCL 1.2 Provisional Spec Released by Khronos

Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | March 19, 2014 - 09:01 AM |
Tagged: SYCL, opencl, gdc 14, GDC

To gather community feedback, the provisional specification for SYCL 1.2 has been released by The Khronos Group. SYCL builds upon OpenCL with the C++11 standard. This technology is built on another Khronos platform, SPIR, which allows the OpenCL C programming language to be mapped onto LLVM, with its hundreds of compatible languages (and Khronos is careful to note that they intend for anyone to make their own compatible alternative language).


In short, SPIR allows many languages which can compile into LLVM to take advantage of OpenCL. SYCL is the specification for creating C++11 libraries and compilers through SPIR.

As stated earlier, Khronos wants anyone to make their own compatible language:

While SYCL is one possible solution for developers, the OpenCL group encourages innovation in programming models for heterogeneous systems, either by building on top of the SPIR™ low-level intermediate representation, leveraging C++ programming techniques through SYCL, using the open source CLU libraries for prototyping, or by developing their own techniques.

SYCL 1.2 supports OpenCL 1.2 and they intend to develop it alongside OpenCL. Future releases are expected to support the latest OpenCL 2.0 specification and keep up with future developments.

The SYCL 1.2 provisional spec is available at the Khronos website.

Source: Khronos

NitroWare Tests AMD's Photoshop OpenCL Claims

Subject: General Tech, Graphics Cards, Processors | February 5, 2014 - 02:08 AM |
Tagged: photoshop, opencl, Adobe

Adobe has recently enhanced Photoshop CC to accelerate certain filters via OpenCL. AMD contacted NitroWare with this information and claims of 11-fold performance increases with "Smart Sharpen" on Kaveri, specifically. The computer hardware site decided to test these claims on a Radeon HD 7850 using the test metrics that AMD provided them.

Sure enough, the reviewer noticed a 16-fold gain in performance. Without OpenCL, the filter's loading bar was on screen for over ten seconds; with it enabled, there was no bar.

Dominic from NitroWare is careful to note that an HD 7850 is significantly higher performance than an APU (barring some weird scenario involving memory transfers or something). This might mark the beginning of Adobe's road to sensible heterogeneous computing outside of video transcoding. Of course, this will also be exciting for AMD. While they cannot keep up with Intel, thread per thread, they are still a heavyweight in terms of total performance. With Photoshop, people might actually notice it.

Manufacturer: NVIDIA

NVIDIA Finally Gets Serious with Tegra

Tegra has had an interesting run of things.  The original Tegra 1 was utilized only by Microsoft with Zune.  Tegra 2 had a better adoption, but did not produce the design wins to propel NVIDIA to a leadership position in cell phones and tablets.  Tegra 3 found a spot in Microsoft’s Surface, but that has turned out to be a far more bitter experience than expected.  Tegra 4 so far has been integrated into a handful of products and is being featured in NVIDIA’s upcoming Shield product.  It also hit some production snags that made it later to market than expected.

I think the primary issue with the first three generations of products is pretty simple.  There was a distinct lack of differentiation from the other ARM based products around.  Yes, NVIDIA brought their graphics prowess to the market, but never in a form that distanced itself adequately from the competition.  Tegra 2 boasted GeForce based graphics, but we did not find out until later that it was comprised of basically four pixel shaders and four vertex shaders that had more in common with the GeForce 7800/7900 series than it did with any of the modern unified architectures of the time.  Tegra 3 boasted a big graphical boost, but it was in the form of doubling the pixel shader units and leaving the vertex units alone.


While NVIDIA had very strong developer relations and a leg up on the competition in terms of software support, it was never enough to propel Tegra beyond a handful of devices.  NVIDIA is trying to rectify that with Tegra 4 and the 72 shader units that it contains (still divided between pixel and vertex units).  Tegra 4 is not perfect in that it is late to market and the GPU is not OpenGL ES 3.0 compliant.  ARM, Imagination Technologies, and Qualcomm are offering new graphics processing units that are not only OpenGL ES 3.0 compliant, but also offer OpenCL 1.1 support.  Tegra 4 does not support OpenCL.  In fact, it does not support NVIDIA’s in-house CUDA.  Ouch.

Jumping into a new market is not an easy thing, and invariably mistakes will be made.  NVIDIA worked hard to make a solid foundation with their products, and certainly they had to learn to walk before they could run.  Unfortunately, running effectively entails having design wins due to outstanding features, performance, and power consumption.  NVIDIA was really only average in all of those areas.  NVIDIA is hoping to change that.  Their first attempt at a product whose features and support are a step above the competition is what we are talking about today.

Continue reading our article on the NVIDIA Kepler architecture making its way to mobile markets and Tegra!