... But Is the Timing Right?
Windows 10 is about to launch and, with it, DirectX 12. Apart from the massive increase in draw calls, Explicit Multiadapter, both Linked and Unlinked, has been the cause of a few pockets of excitement here and there. I am a bit concerned, though. People seem to find this a novel concept that gives game developers tools that they've never had before. It really isn't. Depending on what you want to do with secondary GPUs, game developers could have used them for years. Years!
Before we talk about the cross-platform examples, we should talk about Mantle. It is the closest analog to DirectX 12 and Vulkan that we have. It served as the base specification for Vulkan that the Khronos Group modified with SPIR-V instead of HLSL and so forth. Some claim that it was also the foundation of DirectX 12, which would not surprise me given what I've seen online and in the SDK. Allow me to show you how the API works.
Mantle is an interface that mixes Graphics, Compute, and DMA (memory access) into queues of commands. This is easily done in parallel, as each thread can create commands on its own, which is great for multi-core processors. Each queue, which is a list of commands leading to the GPU, can be handled independently, too. An interesting side-effect is that, since each device uses standard data formats, such as IEEE 754 floating-point numbers, no one cares where these queues go as long as the work is done quickly enough.
Since each queue is independent, an application can choose to manage many of them. None of these lists really needs to know what is happening to any other. As such, they can be pointed to multiple, even wildly different, graphics devices. Different models of GPU with different capabilities can work together, as long as they support the core of Mantle.
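To make the queue idea concrete, here is a toy model in Python. It is not Mantle, DirectX 12, or Vulkan code: `CommandQueue`, `record`, `submit`, and the device names `gpu0` and `gpu1` are all made up for illustration. It only shows the shape of the argument, where each thread fills its own list of commands and each list is then handed to whichever device it points at, without knowing about the others.

```python
import threading
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CommandQueue:
    """A list of commands bound for one device (toy stand-in, not a real API type)."""
    device: str
    kind: str  # "graphics", "compute", or "dma"
    commands: list = field(default_factory=list)


def record(queue, count):
    """Each thread appends only to its own queue, so no locking is needed."""
    for i in range(count):
        queue.commands.append(f"{queue.kind}-cmd-{i}")


def submit(queues):
    """Hand every queue to its device; queues never inspect one another."""
    work = defaultdict(list)
    for q in queues:
        work[q.device].extend(q.commands)
    return dict(work)


queues = [
    CommandQueue("gpu0", "graphics"),
    CommandQueue("gpu0", "dma"),
    CommandQueue("gpu1", "compute"),  # hypothetical second, different GPU
]

# One recording thread per queue, mirroring "each thread can create commands on its own".
threads = [threading.Thread(target=record, args=(q, 3)) for q in queues]
for t in threads:
    t.start()
for t in threads:
    t.join()

submitted = submit(queues)
for device, commands in submitted.items():
    print(device, commands)
```

The compute queue ends up on `gpu1` while graphics and DMA stay on `gpu0`, and nothing in `record` had to know which device it was feeding. A real API adds synchronization and memory sharing between devices, which this sketch leaves out entirely.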
DirectX 12 and Vulkan took this metaphor so their respective developers could use this functionality across vendors. Mantle did not invent the concept, however. What Mantle did is expose this architecture to graphics, which can make use of all the fixed-function hardware that is unique to GPUs. Prior to AMD's usage, this was how GPU compute architectures were designed. Game developers could have spun up an OpenCL workload to process physics, audio, pathfinding, visibility, or even lighting and post-processing effects... on a secondary GPU, even from a completely different vendor.
Vista's multi-GPU bug might get in the way, but it was possible in 7 and, I believe, XP too.
Who Should Care? Thankfully, Many People
The Khronos Group has made three announcements today: Vulkan (their competitor to DirectX 12), OpenCL 2.1, and SPIR-V. Because there is actually significant overlap, we will discuss them in a single post rather than splitting them up. Each has a role in the overall goal to access and utilize graphics and compute devices.
Before we get into what everything is and does, let's give you a little tease to keep you reading. First, Khronos designs their technologies to be self-reliant. As such, while there will be some minimum hardware requirements, the OS pretty much just needs to have a driver model. Vulkan will not be limited to Windows 10 and similar operating systems. If a graphics vendor wants to go through the trouble, which is a gigantic if, Vulkan can be shimmed into Windows 8.x, Windows 7, possibly Windows Vista despite its quirks, and maybe even Windows XP. The words “and beyond” came up after Windows XP, but don't hold your breath for Windows ME or anything. Again, the further back in Windows versions you get, the larger the “if” becomes but at least the API will not have any “artificial limitations”.
Outside of Windows, the Khronos Group is the dominant API curator. Expect Vulkan on Linux, Mac, mobile operating systems, embedded operating systems, and probably a few toasters somewhere.
On that topic: there will not be a “Vulkan ES”. Vulkan is Vulkan, and it will run on desktop, mobile, VR, consoles that are open enough, and even cars and robotics. On the hardware side, the API requires a minimum of OpenGL ES 3.1 support. This is fairly high-end for mobile GPUs, but OpenGL ES 3.1 is the first mobile spec to require compute shaders, which are an essential component of Vulkan. The presenter did not state a minimum hardware requirement for desktop GPUs, but he treated it like a non-issue. Graphics vendors will need to be the ones making the announcements in the end, though.
Subject: General Tech | January 12, 2015 - 01:29 PM | Jeremy Hellstrom
Tagged: ultrasound, opencl, hd 7850
The new bk3000 Ultrasound System from Analogic will use an embedded HD7850 and OpenCL to triple the quality of the information the ultrasound reveals. This will allow ultrasounds to reveal anatomical detail and micro-vascularization that was not available with previous ultrasound technology and could even enable Gamegaters to locate their own heads with the use of the E14C4t transducer. The most familiar usage of ultrasound is for displaying a fetus in utero, but there are far more medical uses for this type of (mostly) non-invasive scan. The increase in detail and the transformation abilities that OpenCL brings will not only make it more effective but could expand the usefulness of ultrasounds as a diagnostic tool. As we at PC Perspective continue to age, we are very appreciative of advances such as this, especially if we can get a split screen that allows us to do a little light gaming while the doctors poke and prod!
SUNNYVALE, Calif. — Jan. 12, 2015 — AMD (NASDAQ:AMD) today announced that the AMD Embedded Radeon HD 7850 GPU is enabling cutting-edge application performance for the BK Ultrasound, powered by Analogic, bk3000 ultrasound system. Analogic is a leader in developing healthcare and security technology solutions to advance the practice of medicine to save lives.
“The AMD Embedded Radeon HD 7850 GPU with OpenCL provides a powerful and efficient pairing,” said Cameron Swen, segment marketing manager, medical applications, AMD Embedded Solutions. “This product is yet another proof point to AMD’s dedication to the healthcare segment through its technology, which helps facilitate crisp, detailed medical image visualization and other advanced graphics-driven capabilities, helping doctors provide improved care for patients.”
Analogic used OpenCL standard to gain access to the GPU for general-purpose computing, referred to as “GPGPU,” delivering exceptional performance and offering system and development cost reduction through cross-platform portability. As a result of using AMD GPU technology, Analogic achieved a 3x improvement in the amount of information in each ultrasound image and reduced time from capture to presentation. Traditional FPGAs and DSPs create a fixed, inflexible implementation that requires custom software targeted at specific hardware. Going to a software-based solution using OpenCL helps to further lower the development cost and provides improved long term value since the software can be used across product lines and through generation shifts.
“It was a critical design goal for us to implement a platform that delivered exceptional performance,” said Jacques Coumans, chief marketing and scientific officer, Analogic. “After reviewing the options available, we chose the AMD Embedded Radeon HD 7850 GPU for its excellent quality and scalability. The bk3000 ultrasound system, powered by AMD embedded graphics technology, delivers exceptional speed and image fidelity, which allows clinicians to identify anatomy and flow dynamics deeper in challenging patients.”
The AMD Embedded Radeon HD 7850 is based on AMD’s award-winning Graphics Core Next (GCN) architecture to advance the visual growth and parallel processing capabilities of embedded applications. In addition to ultrasound, other applications for GPGPU include some of the most complex parallel applications such as terrain and weather mapping, facial and gesture recognition, and biometric and DNA analysis.
The new Analogic bk3000 ultrasound system is targeted for urology, surgery, general imaging, and procedure guidance applications and is commercially available in key markets worldwide.
Subject: General Tech | August 7, 2014 - 12:45 PM | Jeremy Hellstrom
Tagged: HPC, amd, firepro, S9150, S9050, opencl
The new cooling on the 290X tends to have it at the top of the gaming charts and, with the impending release of two new FirePro HPC cards, AMD looks to take the productivity title away from the Tesla K40. The higher-end S9150 boasts 16GB of GDDR5 memory on a 512-bit memory interface and 44 GCN compute units with 64 stream processors each, for a total of 2816 stream processors on board. That equates to 5.07 TFLOPS peak single-precision and 2.53 TFLOPS peak double-precision performance, with a theoretical memory bandwidth of 320GB per second. AMD expects the S9150 to have support for OpenCL 2.0 drivers by the end of the year, which the lower priced and specced S9050 will not, though both will support AMD Stream technology and OpenCL 1.2. Check them out at The Register.
"The company's new big gun is the FirePro S9150 card, which maxes out at a blistering 5.07 TFLOPS peak single-precision floating-point performance and 2.53 TFLOPS peak double-precision performance."
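For anyone who wants to sanity-check those S9150 numbers, a short Python sketch follows. The inputs are the figures quoted above; the clock speed and per-pin memory data rate are not stated in the post, so the script derives the values those figures imply rather than asserting them. The one outside assumption, flagged in the code, is that each stream processor retires one fused multiply-add (two floating-point operations) per clock.

```python
# Figures quoted in the post for the FirePro S9150.
compute_units = 44
stream_processors_per_cu = 64
peak_sp_tflops = 5.07
peak_dp_tflops = 2.53
bus_width_bits = 512
bandwidth_gb_per_s = 320

# Total stream processors: compute units times processors per unit.
stream_processors = compute_units * stream_processors_per_cu

# ASSUMPTION (not from the post): one fused multiply-add per stream
# processor per clock, counted as 2 floating-point operations.
FLOPS_PER_CLOCK = 2

# Clock speed implied by the quoted single-precision peak.
implied_clock_mhz = (
    peak_sp_tflops * 1e12 / (stream_processors * FLOPS_PER_CLOCK) / 1e6
)

# Ratio of double- to single-precision throughput.
dp_ratio = peak_dp_tflops / peak_sp_tflops

# Per-pin data rate implied by the quoted bandwidth: bytes/s -> bits/s,
# spread across every pin of the memory interface.
implied_gbps_per_pin = bandwidth_gb_per_s * 8 / bus_width_bits

print(f"stream processors:     {stream_processors}")
print(f"implied core clock:    {implied_clock_mhz:.1f} MHz")
print(f"DP:SP throughput:      {dp_ratio:.2f}")
print(f"implied per-pin rate:  {implied_gbps_per_pin:.1f} Gbps")
```

Under that assumption the quoted peak works out to a core clock of roughly 900 MHz, double precision runs at about half the single-precision rate, and the 512-bit bus needs about 5 Gbps per pin to reach 320GB per second.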
Here is some more Tech News from around the web:
- How to Choose the Best Linux Desktop for You @ Linux.com
- nCrypted Cloud brings client side integration to Dropbox, Microsoft Onedrive @ The Inquirer
- IBM can't give away its chip business: report @ The Register
- Testing VR Limits with a Raspberry Pi @ Hack a Day
- Google Will Give a Search Edge To Websites That Use Encryption @ Slashdot
- OpenSSL receives nine post-Heartbleed critical bug fixes @ The Inquirer
- Now even Internet Explorer will throw lousy old Java into the abyss @ The Register
- Striker Capsule Task Light @ Benchmark Reviews
- Almost $1K worth of prizes up for grabs in our haiku contest @ The Tech Report
Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | March 19, 2014 - 09:02 AM | Scott Michaud
Tagged: OpenGL ES, opengl, opencl, gdc 14, GDC, EGL
The Khronos Group has also released their ratified specification for EGL 1.5. This API is at the center of data and event management between other Khronos APIs. This version increases security, interoperability between APIs, and support for many operating systems, including Android and 64-bit Linux.
The headline on the list of changes is the move of EGLImage objects from the realm of extensions into EGL 1.5's core functionality, giving developers a reliable method of transferring textures and renderbuffers between graphics contexts and APIs. Second on the list is the increased security around creating a graphics context, primarily designed for WebGL applications, which any arbitrary website can become. Further down the list is the EGLSync object, which allows further partnership between OpenGL (and OpenGL ES) and OpenCL. The GPU may not need CPU involvement when scheduling between tasks on both APIs.
During the call, the representative also wanted to mention that developers have asked them to bring EGL back to Windows. While it has not happened yet, they have announced that it is a current target.
The EGL 1.5 spec is available at the Khronos website.
Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | March 19, 2014 - 09:01 AM | Scott Michaud
Tagged: SYCL, opencl, gdc 14, GDC
To gather community feedback, the provisional specification for SYCL 1.2 has been released by The Khronos Group. SYCL builds upon OpenCL with the C++11 standard. This technology is built on another Khronos platform, SPIR, which allows the OpenCL C programming language to be mapped onto LLVM, with its hundreds of compatible languages (and Khronos is careful to note that they intend for anyone to make their own compatible alternative language).
In short, SPIR allows many languages which can compile into LLVM to take advantage of OpenCL. SYCL is the specification for creating C++11 libraries and compilers through SPIR.
As stated earlier, Khronos wants anyone to make their own compatible language:
While SYCL is one possible solution for developers, the OpenCL group encourages innovation in programming models for heterogeneous systems, either by building on top of the SPIR™ low-level intermediate representation, leveraging C++ programming techniques through SYCL, using the open source CLU libraries for prototyping, or by developing their own techniques.
SYCL 1.2 supports OpenCL 1.2 and they intend to develop it alongside OpenCL. Future releases are expected to support the latest OpenCL 2.0 specification and keep up with future developments.
The SYCL 1.2 provisional spec is available at the Khronos website.
Subject: General Tech, Graphics Cards, Processors | February 5, 2014 - 02:08 AM | Scott Michaud
Tagged: photoshop, opencl, Adobe
Adobe has recently enhanced Photoshop CC to accelerate certain filters via OpenCL. AMD contacted NitroWare with this information and claims of 11-fold performance increases with "Smart Sharpen" on Kaveri, specifically. The computer hardware site decided to test these claims on a Radeon HD 7850 using the test metrics that AMD provided them.
Sure enough, the reviewer noticed a 16-fold gain in performance. Without OpenCL, the filter's loading bar was on screen for over ten seconds; with it enabled, there was no bar.
Dominic from NitroWare is careful to note that an HD 7850 is significantly higher performance than an APU (barring some weird scenario involving memory transfers or something). This might mark the beginning of Adobe's road to sensible heterogeneous computing outside of video transcoding. Of course, this will also be exciting for AMD. While they cannot keep up with Intel, thread per thread, they are still a heavyweight in terms of total performance. With Photoshop, people might actually notice it.
NVIDIA Finally Gets Serious with Tegra
Tegra has had an interesting run of things. The original Tegra 1 was utilized only by Microsoft with Zune. Tegra 2 had better adoption, but did not produce the design wins to propel NVIDIA to a leadership position in cell phones and tablets. Tegra 3 found a spot in Microsoft’s Surface, but that has turned out to be a far more bitter experience than expected. Tegra 4 so far has been integrated into a handful of products and is being featured in NVIDIA’s upcoming Shield product. It also hit some production snags that made it later to market than expected.
I think the primary issue with the first three generations of products is pretty simple. There was a distinct lack of differentiation from the other ARM based products around. Yes, NVIDIA brought their graphics prowess to the market, but never in a form that distanced itself adequately from the competition. Tegra 2 boasted GeForce based graphics, but we did not find out until later that it was comprised of basically four pixel shaders and four vertex shaders that had more in common with the GeForce 7800/7900 series than it did with any of the modern unified architectures of the time. Tegra 3 boasted a big graphical boost, but it was in the form of doubling the pixel shader units and leaving the vertex units alone.
While NVIDIA had very strong developer relations and a leg up on the competition in terms of software support, it was never enough to propel Tegra beyond a handful of devices. NVIDIA is trying to rectify that with Tegra 4 and the 72 shader units that it contains (still divided between pixel and vertex units). Tegra 4 is not perfect in that it is late to market and the GPU is not OpenGL ES 3.0 compliant. ARM, Imagination Technologies, and Qualcomm are offering new graphics processing units that are not only OpenGL ES 3.0 compliant, but also offer OpenCL 1.1 support. Tegra 4 does not support OpenCL. In fact, it does not support NVIDIA’s in-house CUDA. Ouch.
Jumping into a new market is not an easy thing, and invariably mistakes will be made. NVIDIA worked hard to make a solid foundation with their products, and certainly they had to learn to walk before they could run. Unfortunately, running effectively entails having design wins due to outstanding features, performance, and power consumption. NVIDIA was really only average in all of those areas. NVIDIA is hoping to change that. Their first salvo at a product with features and support that are a step above the competition is what we are talking about today.
Subject: General Tech | June 20, 2013 - 02:03 PM | Ken Addison
Tagged: video, podcast, 780m, frame rating, nvidia, kepler, xbox one, Adobe, CC, opencl
PC Perspective Podcast #256 - 06/20/2013
Join us this week as we discuss Mobile Frame Rating, NVIDIA licensing Kepler, Xbox One DRM and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Josh Walrath, Allyn Malventano and Morry Teitelman
Program length: 1:33:43
Week in Review:
0:01:55 NVIDIA GeForce GTX 780M Testing
News items of interest:
0:43:30 Ryan's summary of E3
Oculus 1080p, Razer Blade, Monoprice, SHIELD
1:11:40 Steam might allow shared games?
1:22:00 Hardware/Software Picks of the Week:
1-888-38-PCPER or firstname.lastname@example.org
OpenCL Support in a Meaningful Way
Adobe has had OpenCL support since last year. You would never benefit from its inclusion unless you ran one of two AMD mobility chips under Mac OS X Lion, but it was there. Creative Cloud, predictably, furthers this trend with additional GPGPU support for applications like Photoshop and Premiere Pro.
This leads to some interesting points:
- How OpenCL is changing the landscape between Intel and AMD
- What GPU support is curiously absent from Adobe CC for one reason or another
- Which GPUs are supported despite not... existing, officially.
This should be very big news for our readers who do production work whether professional or for a hobby. If not, how about a little information about certain GPUs that are designed to compete with the GeForce 700-series?