Subject: Graphics Cards | February 14, 2017 - 09:29 PM | Scott Michaud
Tagged: opencl 2.0, opencl, nvidia, graphics drivers
While the headline of the GeForce 378.66 graphics driver release is support for For Honor, Halo Wars 2, and Sniper Elite 4, NVIDIA has snuck something major into the 378 branch: OpenCL 2.0 is now available for evaluation. (I double-checked 378.49 release notes and confirmed that this is new to 378.66.)
OpenCL 2.0 support is not complete yet, but at least NVIDIA is now clearly intending to roll it out to end-users. Among other benefits, OpenCL 2.0 allows kernels (think shaders) to, without the host intervening, enqueue work onto the GPU. This saves one (or more) round-trips to the CPU, especially in workloads where you don’t know which kernel will be required until you see the results of the previous run, like recursive sorting algorithms.
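To make the round-trip savings concrete, here is a toy model in plain Python. It is not real OpenCL: `partition`, `sort_host_driven`, and `sort_device_enqueue` are hypothetical names invented for this illustration, and the "kernel" is just a Python function. It only counts how often the host would have to step in when each partition step of a recursive sort decides what work comes next.

```python
# Toy model only: plain Python standing in for GPU kernels, not real OpenCL.

def partition(data):
    """One 'kernel' run: split the data around its first element."""
    pivot = data[0]
    rest = data[1:]
    lower = [x for x in rest if x < pivot]
    upper = [x for x in rest if x >= pivot]
    return lower, pivot, upper


def sort_host_driven(data):
    """Pre-2.0 style: the host reads each result, then launches the next kernel."""
    round_trips = 0

    def launch(chunk):
        nonlocal round_trips
        if len(chunk) <= 1:
            return list(chunk)
        round_trips += 1  # the host submits a kernel and waits for its result
        lower, pivot, upper = partition(chunk)
        return launch(lower) + [pivot] + launch(upper)

    return launch(data), round_trips


def sort_device_enqueue(data):
    """OpenCL 2.0 style: the host launches once and kernels enqueue their children."""
    round_trips = 1  # the single submission from the host

    def kernel(chunk):
        if len(chunk) <= 1:
            return list(chunk)
        lower, pivot, upper = partition(chunk)
        # Child work is enqueued from the "device", so the host is not involved.
        return kernel(lower) + [pivot] + kernel(upper)

    return kernel(data), round_trips


values = [5, 3, 8, 1, 9, 2, 7]
host_sorted, host_trips = sort_host_driven(values)
device_sorted, device_trips = sort_device_enqueue(values)
print(host_trips, device_trips)
```

A real host could batch several launches per trip, so the host-driven count here is a worst case. The point is only that the device-side version does not grow with the depth of the recursion.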
So yeah, that’s good, although you usually see big changes at the start of version branches.
Another major addition is Video SDK 8.0. This version allows 10- and 12-bit decoding of VP9 and HEVC video. So... yeah. Applications that want to accelerate video encoding or decoding can now hook up to NVIDIA GPUs for more codecs and features.
NVIDIA’s GeForce 378.66 drivers are available now.
Subject: Graphics Cards, Processors | August 30, 2015 - 09:14 PM | Scott Michaud
Tagged: amd, carrizo, Fiji, opencl, opencl 2.0
Apart from manufacturers with a heavy first-party focus, such as Apple and Nintendo, hardware is useless without developer support. In this case, AMD has updated their App SDK to include support for OpenCL 2.0, with code samples. It also updates the SDK for Windows 10, Carrizo, and Fiji, but it is not entirely clear how.
That said, OpenCL is important to those two products. Fiji has a very high compute throughput compared to any other GPU at the moment, and its memory bandwidth is often even more important for GPGPU workloads. It is also useful for Carrizo, because parallel compute and HSA features are what make it a unique product. AMD has been creating first-party software and helping popular third-party developers such as Adobe, but a little support to the world at large could bring a killer application or two, especially from the open-source community.
The SDK has been available in pre-release form for quite some time now, but it has finally graduated out of beta. OpenCL 2.0 allows for work to be generated on the GPU, which is especially useful for tasks that depend on previous results, because the GPU does not need to contact the CPU again.
Subject: Graphics Cards, Processors, Mobile, Shows and Expos | August 10, 2015 - 09:01 AM | Scott Michaud
Tagged: vulkan, spir, siggraph 2015, Siggraph, opengl sc, OpenGL ES, opengl, opencl, Khronos
When the Khronos Group announced Vulkan at GDC, they mentioned that the API is coming this year, and that this date is intended to under-promise and over-deliver. Recently, fans were hoping that it would be published at SIGGRAPH, which officially began yesterday. Unfortunately, Vulkan has not been released. It does hold a significant chunk of the news, however. Also, it's not like DirectX 12 is holding a commanding lead at the moment. The headers were public only for a few months, and the code samples are less than two weeks old.
The organization made announcements for six products today: OpenGL, OpenGL ES, OpenGL SC, OpenCL, SPIR, and, as mentioned, Vulkan. They wanted to make their commitment clear, to all of their standards. Vulkan is urgent, but some developers will still want the framework of OpenGL. Bind what you need to the context, then issue a draw and, if you do it wrong, the driver will often clean up the mess for you anyway. The briefing was structured to make it evident that OpenGL is still on their minds, which is likely why they made sure three OpenGL logos greeted me in their slide deck as early as possible. They are also taking and closely examining feedback about who wants to use Vulkan or OpenGL, and why.
As for Vulkan, confirmed platforms have been announced. Vendors have committed to drivers on Windows 7, 8, 10, Linux, including Steam OS, and Tizen (OSX and iOS are absent, though). Beyond all of that, Google will accept Vulkan on Android. This is a big deal, as Google, despite its open nature, has been avoiding several Khronos Group standards. For instance, Nexus phones and tablets do not have OpenCL drivers, although Google isn't stopping third parties from rolling it into their devices, like Samsung and NVIDIA. Direct support of Vulkan should help cross-platform development as well as, and more importantly, target the multi-core, relatively slow threaded processors of those devices. This could even be of significant use for web browsers, especially in sites with a lot of simple 2D effects. Google is also contributing support from their drawElements Quality Program (dEQP), which is a conformance test suite that they bought back in 2014. They are going to expand it to Vulkan, so that developers will have more consistency between devices -- a big win for Android.
While we're not done with Vulkan, one of the biggest announcements is OpenGL ES 3.2 and it fits here nicely. At around the time that OpenGL ES 3.1 brought Compute Shaders to the embedded platform, Google launched the Android Extension Pack (AEP). This absorbed OpenGL ES 3.1 and added Tessellation, Geometry Shaders, and ASTC texture compression to it. It also created more tension between Google and cross-platform developers, who felt like Google was trying to pull its developers away from Khronos Group. Today, OpenGL ES 3.2 was announced and includes each of the AEP features, plus a few more (like “enhanced” blending). Better yet, Google will support it directly.
Next up are the desktop standards, before we finish with a resurrected embedded standard.
OpenGL has a few new extensions added. One interesting one is the ability to assign locations to multi-samples within a pixel. There is a whole list of sub-pixel layouts, such as rotated grid and Poisson disc. Apparently this extension allows developers to choose it, as certain algorithms work better or worse for certain geometries and structures. There were probably vendor-specific extensions for a while, but now it's a ratified one. Another extension allows “streamlined sparse textures”, which helps manage data where the number of unpopulated entries outweighs the number of populated ones.
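As a rough illustration of why the layout matters, the sketch below compares a regular 2x2 grid against the same grid rotated about the pixel center. This is plain Python, not the OpenGL extension itself. The function names are made up for this example, and the rotation angle of atan(1/2) is an assumption (a commonly used choice), not something taken from the Khronos briefing.

```python
import math


def regular_grid():
    """Four samples on an ordered 2x2 grid inside a unit pixel."""
    return [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]


def rotated_grid(angle=math.atan(0.5)):
    """The same four samples rotated about the pixel center (assumed angle)."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    rotated = []
    for x, y in regular_grid():
        dx, dy = x - 0.5, y - 0.5
        rotated.append((0.5 + dx * cos_a - dy * sin_a,
                        0.5 + dx * sin_a + dy * cos_a))
    return rotated


def distinct_columns(samples):
    """How many different x positions the samples occupy."""
    return len({round(x, 6) for x, _ in samples})


print(distinct_columns(regular_grid()), distinct_columns(rotated_grid()))
```

With the ordered grid, a nearly vertical edge sweeping across the pixel only passes two distinct sample columns, while the rotated layout gives it four, so coverage changes in finer steps. Other geometry can favor other layouts, which is presumably why the choice is left to the developer.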
OpenCL 2.0 was given a refresh, too. It contains a few bug fixes and clarifications that will help it be adopted. C++ headers were also released, although I cannot comment much on them. I do not know the state that OpenCL 2.0 was in before now.
And this is when we make our way back to Vulkan.
SPIR-V, the code that runs on the GPU (or other offloading device, including the other cores of a CPU) in OpenCL and Vulkan, is seeing a lot of community support. Projects are under way to allow developers to write GPU code in several interesting languages: Python, .NET (C#), Rust, Haskell, and many more. The slide lists nine that Khronos Group knows about, but those four are pretty interesting. Again, this is saying that you can write code in the aforementioned languages and have it run directly on a GPU. Curiously missing is HLSL, and the President of Khronos Group agreed that it would be a useful language. The ability to cross-compile HLSL into SPIR-V means that shader code written for DirectX 9, 10, 11, and 12 could be compiled for Vulkan. He expects that it won't take long for a project to start, and one might already be happening somewhere outside his Google abilities. Regardless, those who are afraid to program in the C-like GLSL and HLSL shading languages might find C# and Python to be a bit more their speed, and they seem to be happening through SPIR-V.
As mentioned, we'll end on something completely different.
For several years, the OpenGL SC working group has been on hiatus. This group defines standards for graphics (and soon GPU compute) in “safety critical” applications. For the longest time, this meant aircraft. The dozens of planes (which I assume meant dozens of models of planes) that adopted this technology were fine with a fixed-function pipeline. It has been about ten years since OpenGL SC 1.0 launched, which was based on OpenGL ES 1.0. SC 2.0 is planned to launch in 2016, which will be based on the much more modern OpenGL ES 2 and ES 3 APIs that allow pixel and vertex shaders. The Khronos Group is asking for participation to direct SC 2.0, as well as a future graphics and compute API that is potentially based on Vulkan.
The devices that this platform intends to target are: aircraft (again), automobiles, drones, and robots. There are a lot of ways that GPUs can help these devices, but they need a good API to certify against. It needs to withstand more than an Ouya, because crashes could be much more literal.
... But Is the Timing Right?
Windows 10 is about to launch and, with it, DirectX 12. Apart from the massive increase in draw calls, Explicit Multiadapter, both Linked and Unlinked, has been the cause of a few pockets of excitement here and there. I am a bit concerned, though. People seem to find this a new, novel concept that gives game developers the tools that they've never had before. It really isn't. Depending on what you want to do with secondary GPUs, game developers could have used them for years. Years!
Before we talk about the cross-platform examples, we should talk about Mantle. It is the closest analog to DirectX 12 and Vulkan that we have. It served as the base specification for Vulkan that the Khronos Group modified with SPIR-V instead of HLSL and so forth. Some claim that it was also the foundation of DirectX 12, which would not surprise me given what I've seen online and in the SDK. Allow me to show you how the API works.
Mantle is an interface that mixes Graphics, Compute, and DMA (memory access) into queues of commands. This is easily done in parallel, as each thread can create commands on its own, which is great for multi-core processors. Each queue, a list of commands leading to the GPU, can be handled independently, too. An interesting side-effect is that, since each device uses standard data structures, such as IEEE 754 floating-point numbers, no-one cares where these queues go as long as the work is done quickly enough.
Since each queue is independent, an application can choose to manage many of them. None of these lists really need to know what is happening to any other. As such, they can be pointed to multiple, even wildly different graphics devices. Different model GPUs with different capabilities can work together, as long as they support the core of Mantle.
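The queue idea can be sketched in a few lines of plain Python. This is not the Mantle API: `CommandQueue`, `record`, `submit`, and the device names are hypothetical stand-ins invented for the example. It only shows the shape of the design, with one thread filling each queue and each queue pointed at a different device.

```python
import threading
from collections import deque


class CommandQueue:
    """Hypothetical stand-in for a command queue that feeds one device."""

    def __init__(self, device_name):
        self.device_name = device_name
        self.commands = deque()

    def record(self, kind, payload):
        """Append a command; only the owning thread touches this queue."""
        self.commands.append((kind, payload))

    def submit(self):
        """Pretend to execute the queue in order and return a log of the work."""
        return [f"{self.device_name}:{kind}:{payload}"
                for kind, payload in self.commands]


def build_queue(queue, kinds):
    """Work done by one CPU thread: record commands into its own queue."""
    for index, kind in enumerate(kinds):
        queue.record(kind, index)


# Two queues aimed at two (possibly very different) devices.
queues = [CommandQueue("gpu0"), CommandQueue("gpu1")]
work = [["graphics", "dma"], ["compute", "compute", "dma"]]

threads = [threading.Thread(target=build_queue, args=(queue, kinds))
           for queue, kinds in zip(queues, work)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

logs = [queue.submit() for queue in queues]
print(logs)
```

Because no queue knows about any other, the threads need no locks here, and swapping "gpu1" for a different device changes nothing on the recording side.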
DirectX 12 and Vulkan took this metaphor so their respective developers could use this functionality across vendors. Mantle did not invent the concept, however. What Mantle did is expose this architecture to graphics, which can make use of all the fixed-function hardware that is unique to GPUs. Prior to AMD's usage, this was how GPU compute architectures were designed. Game developers could have spun up an OpenCL workload to process physics, audio, pathfinding, visibility, or even lighting and post-processing effects... on a secondary GPU, even from a completely different vendor.
Vista's multi-GPU bug might get in the way, but it was possible in 7 and, I believe, XP too.
Who Should Care? Thankfully, Many People
The Khronos Group has made three announcements today: Vulkan (their competitor to DirectX 12), OpenCL 2.1, and SPIR-V. Because there is actually significant overlap, we will discuss them in a single post rather than splitting them up. Each has a role in the overall goal to access and utilize graphics and compute devices.
Before we get into what everything is and does, let's give you a little tease to keep you reading. First, Khronos designs their technologies to be self-reliant. As such, while there will be some minimum hardware requirements, the OS pretty much just needs to have a driver model. Vulkan will not be limited to Windows 10 and similar operating systems. If a graphics vendor wants to go through the trouble, which is a gigantic if, Vulkan can be shimmed into Windows 8.x, Windows 7, possibly Windows Vista despite its quirks, and maybe even Windows XP. The words “and beyond” came up after Windows XP, but don't hold your breath for Windows ME or anything. Again, the further back in Windows versions you get, the larger the “if” becomes but at least the API will not have any “artificial limitations”.
Outside of Windows, the Khronos Group is the dominant API curator. Expect Vulkan on Linux, Mac, mobile operating systems, embedded operating systems, and probably a few toasters somewhere.
On that topic: there will not be a “Vulkan ES”. Vulkan is Vulkan, and it will run on desktop, mobile, VR, consoles that are open enough, and even cars and robotics. From a hardware side, the API requires a minimum of OpenGL ES 3.1 support. This is fairly high-end for mobile GPUs, but it is the first mobile spec to require compute shaders, which are an essential component of Vulkan. The presenter did not state a minimum hardware requirement for desktop GPUs, but he treated it like a non-issue. Graphics vendors will need to be the ones making the announcements in the end, though.
Subject: General Tech | January 12, 2015 - 01:29 PM | Jeremy Hellstrom
Tagged: ultrasound, opencl, hd 7850
The new bk3000 Ultrasound System from Analogic will use an embedded HD7850 and OpenCL to triple the quality of the information the ultrasound reveals. This will allow ultrasounds to reveal anatomical detail and micro-vascularization that was not available with previous ultrasound technology and could even enable Gamegaters to locate their own heads with the use of the E14C4t transducer. The most familiar usage of ultrasound is for displaying a fetus in utero, but there are far more medical uses for this type of (mostly) non-invasive scan. The increase in detail and the transformation abilities that OpenCL brings will not only make it more effective but could expand the usefulness of ultrasounds as a diagnostic tool. As we at PC Perspective continue to age we are very appreciative of advances such as this, especially if we can get a split screen that allows us to do a little light gaming while the doctors poke and prod!
SUNNYVALE, Calif. — Jan. 12, 2015 — AMD (NASDAQ:AMD) today announced that the AMD Embedded Radeon HD 7850 GPU is enabling cutting-edge application performance for the BK Ultrasound, powered by Analogic, bk3000 ultrasound system. Analogic is a leader in developing healthcare and security technology solutions to advance the practice of medicine to save lives.
“The AMD Embedded Radeon HD 7850 GPU with OpenCL provides a powerful and efficient pairing,” said Cameron Swen, segment marketing manager, medical applications, AMD Embedded Solutions. “This product is yet another proof point to AMD’s dedication to the healthcare segment through its technology, which helps facilitate crisp, detailed medical image visualization and other advanced graphics-driven capabilities, helping doctors provide improved care for patients.”
Analogic used OpenCL standard to gain access to the GPU for general-purpose computing, referred to as “GPGPU,” delivering exceptional performance and offering system and development cost reduction through cross-platform portability. As a result of using AMD GPU technology, Analogic achieved a 3x improvement in the amount of information in each ultrasound image and reduced time from capture to presentation. Traditional FPGAs and DSPs create a fixed, inflexible implementation that requires custom software targeted at specific hardware. Going to a software-based solution using OpenCL helps to further lower the development cost and provides improved long term value since the software can be used across product lines and through generation shifts.
“It was a critical design goal for us to implement a platform that delivered exceptional performance,” said Jacques Coumans, chief marketing and scientific officer, Analogic. “After reviewing the options available, we chose the AMD Embedded Radeon HD 7850 GPU for its excellent quality and scalability. The bk3000 ultrasound system, powered by AMD embedded graphics technology, delivers exceptional speed and image fidelity, which allows clinicians to identify anatomy and flow dynamics deeper in challenging patients.”
The AMD Embedded Radeon HD 7850 is based on AMD’s award-winning Graphics Core Next (GCN) architecture to advance the visual growth and parallel processing capabilities of embedded applications. In addition to ultrasound, other applications for GPGPU include some of the most complex parallel applications such as terrain and weather mapping, facial and gesture recognition, and biometric and DNA analysis.
The new Analogic bk3000 ultrasound system is targeted for urology, surgery, general imaging, and procedure guidance applications and is commercially available in key markets worldwide.
Subject: General Tech | August 7, 2014 - 12:45 PM | Jeremy Hellstrom
Tagged: HPC, amd, firepro, S9150, S9050, opencl
The new cooling on the 290X tends to have it at the top of the gaming charts and, with the impending release of two new FirePro HPC cards, AMD looks to take the productivity title away from the Tesla K40. The higher end S9150 boasts 16GB of GDDR5 memory with a 512-bit memory interface and 44 GCN compute units with 64 stream processors each, for a total of 2816 stream processors on board. That equates to 5.07 TFLOPS peak single-precision and 2.53 TFLOPS peak double-precision performance, with theoretical memory bandwidth of 320GB per second. AMD expects the S9150 to have support for OpenCL 2.0 drivers by the end of the year, which the lower priced and specced S9050 will not, though both will support AMD Stream technology and OpenCL 1.2. Check them out at The Register.
"The company's new big gun is the FirePro S9150 card, which maxes out at a blistering 5.07 TFLOPS peak single-precision floating-point performance and 2.53 TFLOPS peak double-precision performance."
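For anyone who wants to sanity-check those S9150 figures, the arithmetic is short. The sketch below uses only the numbers quoted above, plus one assumption that is not in the post: each stream processor retires one fused multiply-add (two floating-point operations) per clock. The clock and per-pin data rate it prints are derived values, not official specifications.

```python
# Figures quoted in the post.
compute_units = 44
stream_processors_per_cu = 64
peak_single_precision_flops = 5.07e12
peak_double_precision_flops = 2.53e12
bus_width_bits = 512
bandwidth_bytes_per_second = 320e9

# Assumption (not from the post): one fused multiply-add per clock = 2 FLOPs.
flops_per_stream_processor_per_clock = 2

stream_processors = compute_units * stream_processors_per_cu

# Clock speed implied by the single-precision peak.
implied_clock_mhz = (
    peak_single_precision_flops
    / (stream_processors * flops_per_stream_processor_per_clock)
    / 1e6
)

# Per-pin data rate implied by the bandwidth and bus width.
implied_gbps_per_pin = bandwidth_bytes_per_second * 8 / bus_width_bits / 1e9

# Ratio of double-precision to single-precision throughput.
double_to_single_ratio = peak_double_precision_flops / peak_single_precision_flops

print(stream_processors)
print(round(implied_clock_mhz), implied_gbps_per_pin, round(double_to_single_ratio, 2))
```

Under that assumption the quoted numbers hang together: 2816 stream processors at roughly 900 MHz, memory at about 5 Gbps per pin, and double precision at half the single-precision rate.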
Here is some more Tech News from around the web:
- How to Choose the Best Linux Desktop for You @ Linux.com
- nCrypted Cloud brings client side integration to Dropbox, Microsoft Onedrive @ The Inquirer
- IBM can't give away its chip business: report @ The Register
- Testing VR Limits with a Raspberry Pi @ Hack a Day
- Google Will Give a Search Edge To Websites That Use Encryption @ Slashdot
- OpenSSL receives nine post-Heartbleed critical bug fixes @ The Inquirer
- Now even Internet Explorer will throw lousy old Java into the abyss @ The Register
- Striker Capsule Task Light @ Benchmark Reviews
- Almost $1K worth of prizes up for grabs in our haiku contest @ The Tech Report
Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | March 19, 2014 - 09:02 AM | Scott Michaud
Tagged: OpenGL ES, opengl, opencl, gdc 14, GDC, EGL
The Khronos Group has also released their ratified specification for EGL 1.5. This API is at the center of data and event management between other Khronos APIs. This version increases security, interoperability between APIs, and support for many operating systems, including Android and 64-bit Linux.
The headline on the list of changes is the move that EGLImage objects make, from the realm of extension into EGL 1.5's core functionality, giving developers a reliable method of transferring textures and renderbuffers between graphics contexts and APIs. Second on the list is the increased security around creating a graphics context, primarily designed for WebGL applications which any arbitrary website can become. Further down the list is the EGLSync object which allows further partnership between OpenGL (and OpenGL ES) and OpenCL. The GPU may not need CPU involvement when scheduling between tasks on both APIs.
During the call, the representative also wanted to mention that developers have asked them to bring EGL back to Windows. While it has not happened yet, they have announced that it is a current target.
The EGL 1.5 spec is available at the Khronos website.
Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | March 19, 2014 - 09:01 AM | Scott Michaud
Tagged: SYCL, opencl, gdc 14, GDC
To gather community feedback, the provisional specification for SYCL 1.2 has been released by The Khronos Group. SYCL builds upon OpenCL with the C++11 standard. This technology is built on another Khronos platform, SPIR, which allows the OpenCL C programming language to be mapped onto LLVM, with its hundreds of compatible languages (and Khronos is careful to note that they intend for anyone to make their own compatible alternative language).
In short, SPIR allows many languages which can compile into LLVM to take advantage of OpenCL. SYCL is the specification for creating C++11 libraries and compilers through SPIR.
As stated earlier, Khronos wants anyone to make their own compatible language:
While SYCL is one possible solution for developers, the OpenCL group encourages innovation in programming models for heterogeneous systems, either by building on top of the SPIR™ low-level intermediate representation, leveraging C++ programming techniques through SYCL, using the open source CLU libraries for prototyping, or by developing their own techniques.
SYCL 1.2 supports OpenCL 1.2 and they intend to develop it alongside OpenCL. Future releases are expected to support the latest OpenCL 2.0 specification and keep up with future developments.
The SYCL 1.2 provisional spec is available at the Khronos website.
Subject: General Tech, Graphics Cards, Processors | February 5, 2014 - 02:08 AM | Scott Michaud
Tagged: photoshop, opencl, Adobe
Adobe has recently enhanced Photoshop CC to accelerate certain filters via OpenCL. AMD contacted NitroWare with this information and claims of 11-fold performance increases with "Smart Sharpen" on Kaveri, specifically. The computer hardware site decided to test these claims on a Radeon HD 7850 using the test metrics that AMD provided them.
Sure enough, the site noticed a 16-fold gain in performance. Without OpenCL, the filter's loading bar was on screen for over ten seconds; with it enabled, there was no bar.
Dominic from NitroWare is careful to note that an HD 7850 is significantly higher performance than an APU (barring some weird scenario involving memory transfers or something). This might mark the beginning of Adobe's road to sensible heterogeneous computing outside of video transcoding. Of course, this will also be exciting for AMD. While they cannot keep up with Intel, thread per thread, they are still a heavyweight in terms of total performance. With Photoshop, people might actually notice it.