The Right People to Interview
Last week, we reported that OpenCL’s roadmap would be merging into Vulkan, and OpenCL would, starting at some unspecified time in the future, be based “on an extended version of the Vulkan API”. This was based on quotes from several emails between myself and the Khronos Group.
Since that post, I had the opportunity to have a phone interview with Neil Trevett, president of the Khronos Group and chairman of the OpenCL working group, and Tom Olson, chairman of the Vulkan working group. We spent a little over a half hour going over Neil’s International Workshop on OpenCL (IWOCL) presentation, discussing the decision, and answering a few lingering questions. This post will present the results of that conference call in a clean, readable way.
First and foremost, while OpenCL is planning to merge into the Vulkan API, the Khronos Group wants to make it clear that “all of the merging” is coming from the OpenCL working group. The Vulkan API roadmap is not affected by this decision. Of course, the Vulkan working group will be able to take advantage of technologies that are dropping into their lap, but those discussions have not even begun yet.
Neil: Vulkan has its mission and its roadmap, and it’s going ahead on that. OpenCL is doing all of the merging. We’re kind-of coming in to head in the Vulkan direction.
Does that mean, in the future, that there’s a bigger wealth of opportunity to figure out how we can take advantage of all this kind of mutual work? The answer is yes, but we haven’t started those discussions yet. I’m actually excited to have those discussions, and so are many people, but that’s a clarity. We haven’t started yet on how Vulkan, itself, is changed (if at all) by this. So that’s kind-of the clarity that I think is important for everyone out there trying to understand what’s going on.
Tom also prepared an opening statement. It’s not as easy to abbreviate, so it’s here unabridged.
Tom: I think that’s fair. From the Vulkan point of view, the way the working group thinks about this is that Vulkan is an abstract machine, or at least there’s an abstract machine underlying it. We have a programming language for it, called SPIR-V, and we have an interface controlling it, called the API. And that machine, in its full glory… it’s a GPU, basically, and it’s got lots of graphics functionality. But you don’t have to use that. And the API and the programming language are very general. And you can build lots of things with them. So it’s great, from our point of view, that the OpenCL group, with their special expertise, can use that and leverage that. That’s terrific, and we’re fully behind it, and we’ll help them all we can. We do have our own constituency to serve, which is the high-performance game developer first and foremost, and we are going to continue to serve them as our main mission.
So we’re not changing our roadmap so much as trying to make sure we’re a good platform for other functionality to be built on.
Neil then went on to mention that the decision to merge OpenCL’s roadmap into the Vulkan API took place only a couple of weeks ago. The purpose of the press release was to reach OpenCL developers and get their feedback. According to him, they did a show of hands at the conference, with a room full of a hundred OpenCL developers, and no-one was against moving to the Vulkan API. This gives them confidence that developers will accept the decision, and that their needs will be served by it.
Next up is the why.
Subject: Graphics Cards | May 23, 2017 - 03:58 PM | Jeremy Hellstrom
Tagged: ek cooling, pascal, nvidia, waterblock, GTX FE
The current series of EK Cooling water blocks for Pascal-based GPUs, up to and including the new Titan X, is being replaced with a new family of coolers. The new GTX FE water blocks will be compatible with the previous generation of backplates, so you can do a partial upgrade or keep an eye out for discounts on the previous generation.
These new coolers will fit any Founders Edition reference card, from the GTX 1060 through to the Titan X; the count currently stands at 106 unique graphics cards, so your card is likely to be compatible. You can choose between four models: a plain design, one with acetal, one with nickel, and one with both acetal and nickel. Whichever one you choose, it will run you 109.95€/$125 USD.
Full PR is below.
EK Water Blocks, the Slovenia-based premium computer liquid cooling gear manufacturer, is releasing several new EK-FC GeForce GTX FE water blocks that are compatible with multiple reference design Founders Edition NVIDIA® GeForce GTX 1060, 1070, 1080, 1080 Ti, Titan X Pascal and Titan Xp based graphics cards. All the water blocks also feature the recently introduced aesthetic terminal cover! FE blocks come as a replacement for the current GeForce GTX 10x0 / TITAN X Series of water blocks.
All current GeForce GTX 10x0 / TITAN X Series of water blocks are going to be discontinued after the stock runs out and FE blocks come as a complete replacement. FE blocks are designed to fit all reference design Founders Edition NVIDIA GeForce GTX 1060, 1070, 1080, 1080 Ti, Titan X Pascal and Titan Xp based graphics cards. The current compatibility list rounds up a total of 106 graphics cards that are on the market, but as always, we recommend that you refer to the EK Cooling Configurator for a precise compatibility match.
The new EK-FC GeForce GTX FE water blocks are also backward compatible with all EK-FC1080 GTX Backplates, EK-FC1080 GTX Ti Backplates, and EK-FC Titan X Pascal Backplates.
Availability and pricing
These water blocks are made in Slovenia, Europe and are available for purchase through EK Webshop and Partner Reseller Network. In the table below you can see manufacturer suggested retail price (MSRP) with VAT included.
Subject: Graphics Cards | May 20, 2017 - 07:01 AM | Scott Michaud
Tagged: graphics drivers, amd
The second graphics driver of the month from AMD, Radeon Software Crimson ReLive 17.5.2, adds optimizations for Bethesda’s new shooter, Prey. AMD claims that it will yield up to a 4.5% performance improvement, as measured on an RX 580 (versus the same card with 17.5.1). This is over and above the up to 4.7% increase that 17.5.1 had over 17.4.4.
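Because the 17.5.2 figure is quoted "over and above" the 17.5.1 gain, the two best-case numbers compound rather than simply add. The sketch below multiplies them out, assuming (AMD does not say so) that both "up to" figures come from the same Prey test scene on the RX 580; under that assumption the ceiling over 17.4.4 is about 9.4% rather than 4.5% + 4.7% = 9.2%.

```python
def stacked_gain(*gains: float) -> float:
    """Combine successive relative speedups multiplicatively (e.g. 1.047 * 1.045 - 1)."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0

# "Up to" figures quoted by AMD, each measured against the previous driver on an RX 580.
GAIN_17_5_1_OVER_17_4_4 = 0.047
GAIN_17_5_2_OVER_17_5_1 = 0.045

combined = stacked_gain(GAIN_17_5_1_OVER_17_4_4, GAIN_17_5_2_OVER_17_5_1)
print(f"Best case, 17.5.2 vs 17.4.4 in Prey: +{combined:.1%}")
```

Real-world gains will usually sit below both ceilings, so treat the result as an upper bound, not a prediction.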
Outside of that game, 17.5.2 also addresses four issues. The first is a crash in NieR: Automata. The second is long load times in Forza Horizon 3. The third is a system hang with the RX 550 when going to sleep. The fourth fixed issue is a bit more complicated: apparently, in a multi-GPU system where monitors are attached to multiple graphics cards, the primary graphics card can appear disabled in Radeon Settings. All four are now fixed, so, if they affect you, pick up the driver.
As always, they are available from AMD’s website.
Subject: Graphics Cards, Mobile | May 17, 2017 - 02:30 PM | Ryan Shrout
Tagged: snapdragon 835, snapdragon, qualcomm, google io 2017, google, daydream
During the Google I/O keynote, Google and Qualcomm announced a partnership to create a reference design for a standalone Daydream VR headset using Snapdragon 835 to enable the ecosystem of partners to have deliverable hardware in consumers’ hands by the end of 2017. The timeline is aggressive, impressively so, thanks in large part to the previous work Qualcomm had done with the Snapdragon-based VR reference design we first saw in September 2016. At the time, the Qualcomm platform was powered by the Snapdragon 820. Since then, Qualcomm has updated the design to integrate the Snapdragon 835 processor and platform, improving performance and efficiency along the way.
Google has now taken the reference platform and made some modifications to integrate Daydream support and will offer it to partners to showcase what a standalone, untethered VR solution can do. Even though Google Daydream has been shipping in the form of slot-in phones with a “dummy” headset, integrating the whole package into a dedicated device offers several advantages.
First, I expect the standalone units to have better performance than the phones used as a slot-in solution. With the ability to tune the device to higher thermal limits, Qualcomm and Google will be able to ramp up the clocks on the GPU and SoC to get optimal performance. And, because there is more room for a larger battery on the headset design, there should be an advantage in battery life along with the increase in performance.
The Qualcomm Snapdragon 835 VR Reference Device
It is also likely that the device will have better thermal properties than those using high-end smartphones today. In other words, with more space, there should be more area for cooling, and thus the unit shouldn’t be as warm on the consumer’s face.
I would assume as well that the standalone units will have improved hardware over the smartphone iterations. That means better gyros, cameras, sensors, etc. that could lead to improved capability for the hardware in this form. Better hardware, tighter and more focused integration and better software support should mean lower latency and better VR gaming across the board, assuming everything is implemented as it should be.
The only major change that Google has made to this reference platform is the move away from Qualcomm’s 6DOF technology (6 degrees of freedom, allowing you to move in real space and have all necessary tracking done on the headset itself) to what Google calls WorldSense. Based on Google’s Project Tango technology, this is the one area I have questions about going forward. I have used three different Tango-enabled devices thus far with long-term personal testing and can say that while the possibilities were astounding, the implementations had been…slow. For VR that 100% cannot be the case. I don’t yet know how different its integration is from what Qualcomm had done previously, but hopefully Google will leverage the work Qualcomm has already done with its platform.
Google is claiming that consumers will have hardware based on this reference design in 2017 but no pricing has been shared with me yet. I wouldn’t expect it to be inexpensive though – we are talking about all the hardware that goes into a flagship smartphone plus a little extra for the VR goodness. We’ll see how aggressive Google wants its partners to be and if it is willing to absorb any of the upfront costs with subsidy.
Let me know if this is the direction you hope to see VR move – away from tethered PC-based solutions and into the world of standalone units.
Subject: Graphics Cards | May 17, 2017 - 01:55 PM | Jeremy Hellstrom
Tagged: nvidia, msi, gt 1030, gigabyte, evga, zotac
The GT 1030 quietly launched from a variety of vendors late yesterday amidst the tsunami of AMD announcements. The low profile card is advertised as offering twice the performance of the iGPU found on Intel Core i5 processors and in many cases is passively cooled. From the pricing of the cards available now, expect to pay around $75 to $85 for this new card.
EVGA announced a giveaway of several GT 1030s at the same time as they released the model names. The card which is currently available retails for $75 and is clocked at 1290MHz base, 1544 MHz boost and has 384 CUDA Cores. The 2GB of GDDR5 is clocked a hair over 6GHz and runs on a 64 bit bus providing a memory bandwidth of 48.06 GB/s. Two of their three models offer HDMI + DVI-D out, the third has a pair of DVI-D connectors.
Zotac's offering has slightly lower clocks, a base of 1227MHz and a boost of 1468MHz; however, the VRAM remains unchanged at 6GHz. It pairs HDMI 2.0b with a DVI port and comes with a low profile bracket if needed for an SFF build.
MSI went all out and released a half dozen models, two of which you can see above. The GT 1030 AERO ITX 2G OC is actively cooled, which allows you to reach a 1265MHz base and 1518MHz boost clock. The passively cooled GT 1030 2GH LP OCV1 runs at the same frequency and fits in a single slot externally; however, you will need to leave space inside the system, as the heatsink takes up an additional slot internally. Both are fully compatible with the Afterburner Overclocking Utility and its features such as the Predator gameplay recording tool.
Last but not least are a pair from Gigabyte, the GT 1030 Low Profile 2G and Silent Low Profile 2G. Both cards offer two modes: in OC Mode the base clock is 1252MHz and boost clock 1506MHz, while in Gaming Mode you will run at 1227MHz base and 1468MHz boost.
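The 48.06 GB/s figure follows directly from the bus width and the effective memory data rate: peak bandwidth is transfers per second per pin, times the number of pins, divided by 8 bits per byte. This short sketch inverts that formula to show what "a hair over 6GHz" means numerically; the function names are illustrative, and the "6GHz" is read as GDDR5's effective data rate in GT/s.

```python
def bandwidth_gb_s(data_rate_gt_s: float, bus_width_bits: int) -> float:
    """Peak bandwidth = transfers per second per pin x pins / 8 bits per byte."""
    return data_rate_gt_s * bus_width_bits / 8

def implied_data_rate_gt_s(bandwidth: float, bus_width_bits: int) -> float:
    """Invert the formula: which effective data rate yields the quoted bandwidth?"""
    return bandwidth * 8 / bus_width_bits

BUS_WIDTH_BITS = 64        # GT 1030 memory bus, per EVGA
QUOTED_BANDWIDTH = 48.06   # GB/s, per EVGA

rate = implied_data_rate_gt_s(QUOTED_BANDWIDTH, BUS_WIDTH_BITS)
print(f"Implied GDDR5 data rate: {rate:.4f} GT/s")
print(f"At exactly 6 GT/s: {bandwidth_gb_s(6.0, BUS_WIDTH_BITS):.2f} GB/s")
```

The implied rate works out to about 6.0075 GT/s, which matches the "a hair over 6GHz" description.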
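With every vendor quoting a different boost clock, theoretical FP32 throughput gives a single number to compare. The sketch below uses the usual peak-rate formula (CUDA cores × 2 FLOPs per fused multiply-add × clock). It assumes all of these cards use the same 384 CUDA cores that EVGA lists; the other vendors' core counts are not stated above.

```python
CUDA_CORES = 384  # stated for the EVGA card; assumed identical for every GT 1030 here

boost_clocks_mhz = {
    "EVGA": 1544,
    "Zotac": 1468,
    "MSI AERO ITX 2G OC": 1518,
    "Gigabyte (OC Mode)": 1506,
    "Gigabyte (Gaming Mode)": 1468,
}

def peak_fp32_gflops(cores: int, clock_mhz: float) -> float:
    """Each core can retire one fused multiply-add (2 FLOPs) per clock."""
    return cores * 2 * clock_mhz / 1000

# Print from fastest to slowest boost clock.
for name, clk in sorted(boost_clocks_mhz.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {peak_fp32_gflops(CUDA_CORES, clk):7.1f} GFLOPS")
```

The spread between the fastest and slowest boost clocks is only about 5%, so in practice cooling and price matter more than the factory clock.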
Is it time to buy that new GPU?
Testing commissioned by AMD. This means that AMD paid us for our time, but had no say in the results or presentation of them.
Earlier this week, Bethesda and Arkane Studios released Prey, a first-person shooter that is a reimagining of the 2006 game of the same name. Fans of System Shock will find a lot to love about this new title and I have found myself enamored with the game…in the name of science of course.
While I was doing my due diligence and performing some preliminary testing to see if we would use Prey for graphics testing going forward, AMD approached me to discuss this exact title. With the release of the Radeon RX 580 in April, one of the key storylines is that the card offers a reasonably priced upgrade path for users of hardware that is two or more years old. With that upgrade you should see some substantial performance improvements and as I will show you here, the new Prey is a perfect example of that.
The RX 580 targets the Radeon R9 380, a graphics card that was originally released back in May of 2015, and offers substantially better performance at a very similar launch price. The same is true for the GeForce GTX 960: launched in January of 2015, it is slightly longer in the tooth. AMD’s data shows that 80% of the users on Steam are running on R9 380X or slower graphics cards and that only 10% of them upgraded in 2016. Considering the great GPUs that were available then (including the RX 480 and the GTX 10-series), it seems more and more likely that we are going to hit an upgrade inflection point in the market.
A simple experiment was set up: does the new Radeon RX 580 offer a worthwhile upgrade path for those many users of R9 380 or GTX 960 classifications of graphics cards (or older)?
| | Radeon RX 580 | Radeon R9 380 | GeForce GTX 960 |
|---|---|---|---|
| GPU | Polaris 20 | Tonga Pro | GM206 |
| Rated Clock | 1340 MHz | 918 MHz | 1127 MHz |
| TDP | 185 watts | 190 watts | 120 watts |
| MSRP (at launch) | $199 (4GB) | | |
Subject: Graphics Cards | May 16, 2017 - 07:39 PM | Sebastian Peak
Tagged: Vega, reference, radeon, graphics card, gpu, Frontier Edition, amd
AMD has revealed their concept of a premium reference GPU for the upcoming Radeon Vega launch, with the "Frontier Edition" of the new graphics cards.
"Today, AMD announced its brand-new Radeon Vega Frontier Edition, the world’s most powerful solution for machine learning and advanced visualization aimed to empower the next generation of data scientists and visualization professionals -- the digital pioneers forging new paths in their fields. Designed to handle the most demanding design, rendering, and machine intelligence workloads, this powerful new graphics card excels in:
- Machine learning. Together with AMD’s ROCm open software platform, Radeon Vega Frontier Edition enables developers to tap into the power of Vega for machine learning algorithm development. Frontier Edition delivers more than 50 percent more performance than today’s most powerful machine learning GPUs.
- Advanced visualization. Radeon Vega Frontier Edition provides the performance required to drive increasingly large and complex models for real-time visualization, physically-based rendering and virtual reality through the design phase as well as rendering phase of product development.
- VR workloads. Radeon Vega Frontier Edition is ideal for VR content creation supporting AMD’s LiquidVR technology to deliver the gripping content, advanced visual comfort and compatibility needed for next-generation VR experiences.
- Revolutionized game design workflows. Radeon Vega Frontier Edition simplifies and accelerates game creation by providing a single GPU optimized for every stage of a game developer’s workflow, from asset production to playtesting and performance optimization."
From the image provided on the official product page it appears that there will be both liquid-cooled (the gold card in the background) and air-cooled variants of these "Frontier Edition" cards, which AMD states will arrive with 16GB of HBM2 and offer 1.5x the FP32 performance and 3x the FP16 performance of the Fury X.
Radeon Vega Frontier Edition
- Compute units: 64
- Single precision compute performance (FP32): ~13 TFLOPS
- Half precision compute performance (FP16): ~25 TFLOPS
- Pixel Fillrate: ~90 Gpixels/sec
- Memory capacity: 16 GB of High Bandwidth Cache
- Memory bandwidth: ~480 GB/s
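The spec list lets us back out a rough peak clock and sanity-check AMD's "1.5x FP32, 3x FP16 versus Fury X" claim. This sketch assumes Vega keeps the GCN convention of 64 stream processors per compute unit, each retiring one fused multiply-add (2 FLOPs) per clock, and takes the Fury X's 8.60 TFLOPS from the comparison table in the Tesla V100 story further down. It also assumes Fiji runs FP16 at its FP32 rate, which is why the FP16 multiple is so much larger.

```python
SP_PER_CU = 64              # stream processors per GCN compute unit (assumed to hold for Vega)
FLOPS_PER_SP_PER_CLOCK = 2  # one fused multiply-add per clock

def implied_clock_mhz(tflops: float, compute_units: int) -> float:
    """Clock needed to reach a given peak rate with the assumed CU layout."""
    flops_per_clock = compute_units * SP_PER_CU * FLOPS_PER_SP_PER_CLOCK
    return tflops * 1e12 / flops_per_clock / 1e6

vega_fe = {"cus": 64, "fp32": 13.0, "fp16": 25.0}   # AMD's approximate figures
FURY_X_FP32_TFLOPS = 8.60   # from the comparison table in the Tesla V100 story
FURY_X_FP16_TFLOPS = 8.60   # assumption: Fiji runs FP16 at the FP32 rate

clock = implied_clock_mhz(vega_fe["fp32"], vega_fe["cus"])
fp32_ratio = vega_fe["fp32"] / FURY_X_FP32_TFLOPS
fp16_ratio = vega_fe["fp16"] / FURY_X_FP16_TFLOPS
print(f"Implied peak clock: ~{clock:.0f} MHz")
print(f"FP32 vs Fury X: {fp32_ratio:.2f}x, FP16 vs Fury X: {fp16_ratio:.2f}x")
```

The result, roughly 1.59 GHz with 1.51x FP32 and 2.91x FP16, lines up with AMD's rounded 1.5x and 3x, but since AMD's TFLOPS figures are themselves approximate, the implied clock is only an estimate.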
The availability of the Radeon Vega Frontier Edition was announced as "late June", so we should not have too long to wait for further details, including pricing.
Subject: Graphics Cards | May 13, 2017 - 11:46 PM | Tim Verry
Tagged: SFF, pascal, nvidia, Inno3D, GP107
Hong Kong based Inno3D recently introduced a single slot graphics card using NVIDIA’s mid-range GTX 1050 Ti GPU. The aptly named Inno3D GeForce GTX 1050 Ti (1-Slot Edition) combines the reference clocked Pascal GPU, 4GB of GDDR5 memory, and a shrouded single fan cooler clad in red and black.
Around back, the card offers three display outputs: an HDMI 2.0, a DisplayPort 1.4, and a DVI-D. The single slot cooler is a bit of an odd design, with a thin axial fan rather than a centrifugal type sitting over a fake plastic fin array. Note that these fins do not actually cool anything; in fact, the PCB of the card does not even extend out to where the fan is. Presumably the fins are there primarily for aesthetics and secondarily to channel a bit of the air the fan pulls down. Air is pulled in and pushed over the actual GPU heatsink (under the shroud) and out the vent holes next to the display connectors. Air is circulated through the case and is not actually exhausted as with traditional dual slot (and some single slot) designs. I am curious how the choice of fan and vents will affect cooling performance.
Overclocking is going to be limited on this card, and it comes out-of-the-box clocked at NVIDIA reference speeds of 1290 MHz base and 1392 MHz boost for the GPU’s 768 cores and 7 GT/s for the 4GB of GDDR5 memory. The card measures 211 mm (~8.3”) long and should fit in just about any case. Since it pulls all of its power from the slot, it might be a good option for those slim towers OEMs like to use these days to get a bit of gaming out of a retail PC.
Inno3D is not yet talking availability or pricing, but looking at their existing lineup I would expect an MSRP around $150.
Subject: Graphics Cards | May 10, 2017 - 01:32 PM | Ryan Shrout
Tagged: v100, tesla, nvidia, gv100, gtc 2017
During the opening keynote to NVIDIA’s GPU Technology Conference, CEO Jen-Hsun Huang formally unveiled the latest GPU architecture and the first product based on it. The Tesla V100 accelerator is based on the Volta GPU architecture and features some amazingly impressive specifications. Let’s take a look.
| | Tesla V100 | GTX 1080 Ti | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GV100 | GP102 | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro |
| Base Clock | - | 1480 MHz | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz |
| Boost Clock | 1455 MHz | 1582 MHz | 1480 MHz | 1733 MHz | 1076 MHz | 1089 MHz | 1216 MHz | - | - |
| ROP Units | 128 (?) | 88 | 96 | 64 | 96 | 96 | 64 | 64 | 64 |
| Memory Clock | 878 MHz (?) | 11000 MHz | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz |
| Memory Interface | 4096-bit (HBM2) | 352-bit | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) |
| Memory Bandwidth | 900 GB/s | 484 GB/s | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s |
| TDP | 300 watts | 250 watts | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts |
| Peak Compute | 15 TFLOPS | 10.6 TFLOPS | 10.1 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS |
While we are low on details today, it appears that the fundamental compute units of Volta are similar to those of Pascal. The GV100 has 80 SMs with 40 TPCs and 5120 total CUDA cores, a 42% increase over the GP100 GPU used on the Tesla P100 and 42% more than the GP102 GPU used on the GeForce GTX 1080 Ti. The structure of the GPU remains the same as GP100, with the CUDA cores organized as 64 single precision (FP32) and 32 double precision (FP64) per SM.
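The SM layout above is enough to reproduce the headline core counts. This sketch multiplies the per-SM figures out and compares against Pascal; the 3584-core figure for both the Tesla P100 and the GTX 1080 Ti is an assumption brought in from NVIDIA's published specs, not something stated in this story.

```python
SMS = 80
TPCS = 40
FP32_CORES_PER_SM = 64
FP64_CORES_PER_SM = 32
PASCAL_CORES = 3584  # assumed: CUDA cores on both the Tesla P100 and the GTX 1080 Ti

fp32_cores = SMS * FP32_CORES_PER_SM   # total single precision CUDA cores
fp64_cores = SMS * FP64_CORES_PER_SM   # total double precision units
sms_per_tpc = SMS // TPCS              # SMs grouped into each TPC
uplift = fp32_cores / PASCAL_CORES - 1 # relative increase over Pascal

print(fp32_cores, fp64_cores, sms_per_tpc, f"{uplift:.1%}")
```

The exact uplift is about 42.9%, which the text rounds down to 42%, and each TPC holds two SMs.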
Interestingly, NVIDIA has already told us the clock speed of this new product as well, coming in at 1455 MHz Boost, more than 100 MHz lower than the GeForce GTX 1080 Ti and 25 MHz lower than the Tesla P100.
Volta adds support for a brand new compute unit, though, known as the Tensor Core. With 640 of these on the GPU die, NVIDIA directly targets the neural network and deep learning fields. If this is your first time hearing about tensors in this context, you should read up on TensorFlow, Google’s open-source software library for machine learning, and its influence on the hardware market. Google has already invested in a TensorFlow-specific processor, and now NVIDIA throws its hat in the ring.
Adding Tensor Cores to Volta allows the GPU to do mass processing for deep learning, on the order of a 12x improvement over Pascal’s capabilities using CUDA cores only.
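NVIDIA's public Volta material describes each Tensor Core as computing a mixed-precision D = A × B + C on 4×4 matrices, taking FP16 inputs and accumulating in FP16 or FP32. The pure-Python sketch below is a functional model of that operation using the standard `struct` module's half- and single-precision formats; it is not how the hardware rounds internally, which this story does not describe. It also multiplies out a peak rate, assuming 64 fused multiply-adds (4×4×4) per Tensor Core per clock at the 1455 MHz boost clock.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def tensor_core_mma(a, b, c):
    """D = A x B + C on 4x4 matrices: FP16 inputs, FP32 accumulation (functional model)."""
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(4):
        row = []
        for j in range(4):
            acc = to_fp32(c[i][j])
            for k in range(4):
                # FP16 x FP16 products fit exactly in FP32; we round after each add.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            row.append(acc)
        d.append(row)
    return d

# Peak rate: 64 FMAs per Tensor Core per clock (assumed), 2 FLOPs per FMA.
TENSOR_CORES = 640
BOOST_CLOCK_HZ = 1455e6
tensor_tflops = TENSOR_CORES * 64 * 2 * BOOST_CLOCK_HZ / 1e12
print(f"Peak tensor throughput: ~{tensor_tflops:.0f} TFLOPS")
```

Under those assumptions the peak lands near 119 TFLOPS, an order of magnitude above the 15 TFLOPS of plain FP32, which is consistent in scale with the deep learning improvement NVIDIA is claiming.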
For users interested in standard usage models, including gaming, the GV100 GPU offers 1.5x improvement in FP32 computing, up to 15 TFLOPS of theoretical performance and 7.5 TFLOPS of FP64. Other relevant specifications include 320 texture units, a 4096-bit HBM2 memory interface and 16GB of memory on-module. NVIDIA claims a memory bandwidth of 900 GB/s which works out to 878 MHz per stack.
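These headline numbers hang together arithmetically. The sketch below derives peak FP32 and FP64 from the core counts and boost clock (2 FLOPs per fused multiply-add), then works back from 900 GB/s over a 4096-bit bus to a memory clock. It assumes HBM2 transfers data twice per clock and that the bus is four stacks of 1024 bits each.

```python
FP32_CORES = 5120
FP64_CORES = 2560          # 32 FP64 units x 80 SMs
BOOST_CLOCK_MHZ = 1455
BUS_WIDTH_BITS = 4096      # assumed: four HBM2 stacks x 1024 bits
BANDWIDTH_GB_S = 900

def peak_tflops(units: int, clock_mhz: float) -> float:
    """2 FLOPs per unit per clock (fused multiply-add)."""
    return units * 2 * clock_mhz * 1e6 / 1e12

fp32 = peak_tflops(FP32_CORES, BOOST_CLOCK_MHZ)
fp64 = peak_tflops(FP64_CORES, BOOST_CLOCK_MHZ)

# HBM2 is double data rate: 2 transfers per pin per memory clock.
per_pin_gbps = BANDWIDTH_GB_S * 8 / BUS_WIDTH_BITS
mem_clock_mhz = per_pin_gbps / 2 * 1000

print(f"FP32 {fp32:.2f} TFLOPS, FP64 {fp64:.2f} TFLOPS, memory clock {mem_clock_mhz:.1f} MHz")
```

That gives about 14.9 TFLOPS FP32, 7.45 TFLOPS FP64 and an 878.9 MHz memory clock, matching the rounded 15 TFLOPS, 7.5 TFLOPS and 878 MHz figures.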
Maybe more impressive is the transistor count: 21.1 BILLION! NVIDIA claims that this is the largest chip you can make physically with today’s technology. Considering it is being built on TSMC's 12nm FinFET technology and has an 815 mm² die size, I see no reason to doubt them.
Shipping is scheduled for Q3 for Tesla V100 – at least, that is when NVIDIA is promising the DGX-1 system using the chip to developers.
I know many of you are interested in the gaming implications and timelines – sorry, I don’t have an answer for you yet. I will say that the bump from 10.6 TFLOPS to 15 TFLOPS is an impressive boost! But if the server variant of Volta isn’t due until Q3 of this year, I find it hard to believe NVIDIA would bring the consumer version out faster than that. And whether or not NVIDIA offers gamers the chip with non-HBM2 memory is still a question mark for me and could directly impact performance and timing.
Subject: Graphics Cards | May 10, 2017 - 07:02 AM | Scott Michaud
Tagged: vrworks, nvidia, audio
GPUs are good at large bundles of related tasks, saving die area by tying several chunks of data together. This is commonly used for graphics, where screens have two to eight million (1080p to 4K) pixels, 3D models have thousands to millions of vertices, and so forth. Each instruction is probably executed hundreds, thousands, or millions of times, so parallelism greatly helps the chip make efficient use of the physical silicon that stores and transforms this data.
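The "two to eight million" range comes straight from the pixel counts of the two resolutions. This trivial sketch checks it, assuming 1080p means 1920×1080 and 4K means consumer 4K UHD at 3840×2160.

```python
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K UHD": (3840, 2160),   # assumed: consumer 4K, not DCI 4096-wide
}

def pixel_count(width: int, height: int) -> int:
    """Number of pixels a per-pixel shader must run for, once per frame."""
    return width * height

for name, (w, h) in RESOLUTIONS.items():
    print(f"{name}: {pixel_count(w, h):,} pixels")
```

4K UHD is exactly four times 1080p (8,294,400 vs 2,073,600 pixels), so a per-pixel operation has four times as many independent work items to spread across the GPU.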
Audio is another area with a lot of parallelism. A second of audio has tens of thousands of sound pressure samples, but another huge advantage is that higher frequency sounds model pretty decently as rays, which can be traced. NVIDIA decided to repurpose their OptiX technology into calculating these rays. Beyond the architecture demo that you often see in global illumination demos, they also integrated it into an Unreal Tournament test map.
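To make "sound as rays" concrete, here is a minimal geometric-acoustics sketch using the classic image-source method, which is a different and much simpler technique than NVIDIA's OptiX-based tracing. A wall reflection is modeled by mirroring the source across the wall, and each straight path then gives an arrival delay (distance / speed of sound) and a 1/r amplitude. The room layout, 48 kHz sample rate, and the 0.8 wall reflection gain are all hypothetical values chosen for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # dry air at roughly 20 °C
SAMPLE_RATE_HZ = 48_000      # a common audio sample rate (assumed)

def mirror_across_wall(point, wall_x):
    """Image source: reflect a point across the vertical wall x = wall_x."""
    x, y = point
    return (2 * wall_x - x, y)

def arrival(source, listener, reflection_gain=1.0):
    """Return (delay in samples, amplitude) for a straight ray path with 1/r spreading."""
    distance = math.dist(source, listener)
    delay = distance / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ
    return delay, reflection_gain / distance

source, listener, wall_x = (2.0, 1.0), (5.0, 1.0), 0.0   # hypothetical 2D layout, meters
WALL_REFLECTION_GAIN = 0.8   # hypothetical: the wall keeps 80% of the amplitude

direct = arrival(source, listener)
echo = arrival(mirror_across_wall(source, wall_x), listener, WALL_REFLECTION_GAIN)
for label, (delay, amp) in (("direct", direct), ("wall echo", echo)):
    print(f"{label:9s} delay {delay:6.1f} samples, amplitude {amp:.3f}")
```

Each path is independent of every other, which is why this kind of workload maps well onto a GPU: a real scene traces thousands of such paths per listener, in parallel.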
And now it’s been released, both as a standalone SDK and as an Unreal Engine 4.15 plug-in. I don’t know what its license specifically entails, because the source code requires logging into NVIDIA’s developer portal, but it looks like the plug-ins will be available to all users of supported engines.