Linked Multi-GPU Arrives... for Developers
The Khronos Group has released the Vulkan 1.0.42 specification, which includes experimental (more on that in a couple of paragraphs) support for VR enhancements, sharing resources between processes, and linking similar GPUs. The spec was released alongside a LunarG SDK and NVIDIA drivers that fully implement these extensions; both are intended for developers, not gamers.
I would expect that the most interesting feature is the experimental support for linking similar GPUs together, similar to DirectX 12’s Explicit Linked Multiadapter, which Vulkan calls a “Device Group”. The idea is that the physical GPUs hidden behind this layer can do things like share resources, such as rendering a texture on one GPU and consuming it on another, without the host code being involved. I’m guessing that some studios, like maybe Oxide Games, will decide not to use this feature. While it’s not explicitly stated, I cannot see how this (or DirectX 12’s Explicit Linked mode) would be compatible with cross-vendor configurations. Unless I’m mistaken, that would require AMD, NVIDIA, and/or Intel restructuring their drivers to inter-operate at this level. Still, the assumptions that can be made with grouped devices are apparently popular with enough developers for both the Khronos Group and Microsoft to bother.
A slide from Microsoft's DirectX 12 reveal, long ago.
As for the “experimental” comment that I made in the introduction... I was expecting to see this news around SIGGRAPH, which occurs in late-July / early-August, alongside a minor version bump (to Vulkan 1.1).
I might still be right, though.
The major new features of Vulkan 1.0.42 are implemented as a new classification of extensions: KHX. In the past, vendors, like NVIDIA and AMD, would add new features as vendor-prefixed extensions. Games could query the graphics driver for these abilities and enable them if available. If a feature became popular enough for multiple vendors to have their own implementation of it, a committee would consider an EXT extension. This would behave the same across all implementations (give or take) but not be officially adopted by the Khronos Group. If the Khronos Group did take it under their wing, it would be given a KHR extension (or added as a required feature).
The Khronos Group has added a new layer: KHX. This level of extension sits below KHR and is not intended for production code. You might see where this is headed. The VR multiview, multi-GPU, and cross-process extensions are not supposed to be used in released video games until they leave KHX status. Unlike a vendor extension, the Khronos Group wants old KHX standards to drop out of existence at some point after they graduate to full KHR status. It’s not something that NVIDIA owns and will keep around for 20 years after its usable lifespan just so old games can behave as expected.
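As a hypothetical illustration of how an application might treat these tiers, here is a small sketch that classifies extension names by prefix and gates production use accordingly. The helper functions and the extension list are invented for illustration; a real application would query the driver for its supported extensions rather than hard-code them.

```python
# Hypothetical sketch: classify Vulkan-style extension names by the
# maturity tier their prefix implies. Not a real driver query.

def extension_tier(name: str) -> str:
    """Return the maturity tier implied by an extension name's prefix."""
    prefix = name.split("_")[1] if name.startswith("VK_") else ""
    return {
        "KHR": "ratified",       # officially adopted by the Khronos Group
        "KHX": "experimental",   # not intended for production code
        "EXT": "multi-vendor",   # shared behavior, not formally ratified
    }.get(prefix, "vendor")      # e.g. NV_ or AMD_ prefixed extensions

def safe_for_release(name: str) -> bool:
    """A shipping game would avoid experimental (KHX) extensions."""
    return extension_tier(name) in ("ratified", "multi-vendor")

supported = ["VK_KHR_swapchain", "VK_KHX_device_group", "VK_NV_glsl_shader"]
for ext in supported:
    print(ext, "->", extension_tier(ext), "| ship with it:", safe_for_release(ext))
```

The point of the sketch is simply that the prefix carries a policy: a KHX extension is something you experiment with today and expect to replace with its KHR successor later.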
How long will that take? No idea. I’ve already mentioned my logical but uneducated guess a few paragraphs ago, but I’m not going to repeat it; I have literally zero facts to base it on, and I don’t want our readers to think that I do. I don’t. It’s just based on what the Khronos Group typically announces at certain trade shows, and the length of time since their first announcement.
The benefit that KHX does bring us is that, whenever these features make it to public release, developers will have already been using it... internally... since around now. When it hits KHR, it’s done, and anyone can theoretically be ready for it when that time comes.
Zen vs. 40 Years of CPU Development
Zen is nearly upon us. AMD is releasing its next generation CPU architecture to the world this week, and we saw CPU demonstrations and upcoming AM4 motherboards at CES in early January. We have been shown tantalizing glimpses of the performance and capabilities of the “Ryzen” products that will presumably fill the desktop markets from $150 to $499. I have yet to be briefed on the product stack that AMD will be offering, but we know enough to start thinking about how positioning and placement will be handled by these new products.
To get a better understanding of how Ryzen will stack up, we should probably take a look back at what AMD has accomplished in the past and how Intel has responded to some of the stronger products. AMD has been in business for 47 years now and has been a major player in semiconductors for most of that time. It has really only been since the 90s, when AMD started to battle Intel head to head, that people have become passionate about the company and its products.
The industry is a complex and ever-shifting one. AMD and Intel have been two stalwarts over the years. Even though AMD has had more than a few challenging years over the past decade, it still moves forward and expects to compete at the highest level with its much larger and better funded competitor. 2017 could very well be a breakout year for the company with a return to solid profitability in both CPU and GPU markets. I am not the only one who thinks this considering that AMD shares that traded around the $2 mark ten months ago are now sitting around $14.
AMD Through 1996
AMD became a force in the CPU industry due to IBM’s requirement to have a second source for its PC business. Intel originally entered into a cross licensing agreement with AMD to allow it to produce x86 chips based on Intel designs. AMD eventually started to produce their own versions of these parts and became a favorite in the PC clone market. Eventually Intel tightened down on this agreement and then cancelled it, but through near-endless litigation AMD ended up with an x86 license deal with Intel.
AMD produced their own Am286 chip that was the first real break from the second sourcing agreement with Intel. Intel balked at sharing their 386 design with AMD and eventually forced the company to develop its own clean room version. The Am386 was released in the early 90s, well after Intel had been producing those chips for years. AMD then developed their own version of the Am486, which then morphed into the Am5x86. The company made some good inroads with these speedy parts and typically clocked them faster than their Intel counterparts (e.g., the Am486 at 40 MHz and 80 MHz vs. the Intel 486 DX33 and DX66). AMD priced these parts lower so users could achieve better performance per dollar using the same chipsets and motherboards.
Intel released their first Pentium chips in 1993. The initial version was hot and featured the infamous FDIV bug. AMD made some inroads against these parts by introducing the faster Am486 and Am5x86 parts that would achieve clockspeeds from 133 MHz to 150 MHz at the very top end. The 150 MHz part was very comparable in overall performance to the Pentium 75 MHz chip and we saw the introduction of the dreaded “P-rating” on processors.
There is no denying that Intel continued their dominance throughout this time by being the gold standard in x86 manufacturing and design. AMD slowly chipped away at its larger rival and continued to profit off of the lucrative x86 market. William Sanders III set the bar high for where he wanted the company to go, and he started on a much more aggressive path than many expected the company to take.
Subject: Processors | February 23, 2017 - 11:07 AM | Jeremy Hellstrom
Tagged: Intel, Skylake, kaby lake, delidding, relidding
[H]ard|OCP have been spending a lot of time removing the integrated heatspreader on recent Intel chips to see what effect it has on temperatures under load. Along the way they picked up tips on 3D printing a delidder, and thankfully there was not much death involved. One of their findings from this testing was that it can be beneficial to reattach the lid after changing out the thermal interface material, and they have published a guide on how to do so. You will need a variety of tools, from Permatex Red RTV to razor blades, by way of isopropyl alcohol and syringes; as well as a steady hand. You may have many of the items on hand already, and none are exceptionally expensive.
"So we have covered a lot about taking your shiny new Intel CPUs apart lately, affectionately known as "delidding." What we have found in our journey is that "relidding" the processor might be an important part of the process as well. But what if you do not have a fancy tool that will help you put Humpty back together again?"
Here are some more Processor articles from around the web:
- Intel Kaby Lake i5-7600K CPU Re-Lid Overclocking @ [H]ard|OCP
- Windows 10 vs. Ubuntu Linux OpenGL Benchmarks With A Core i7 7700K @ Phoronix
- Intel Core i3 2100 Sandy Bridge vs. Core i3 7100 Kabylake Performance @ Phoronix
- Pentium G4500 @ Hardware Secrets
Subject: General Tech, Storage | February 21, 2017 - 07:14 PM | Tim Verry
Tagged: Optane, kaby lake, Intel, 3D XPoint
Intel has announced that its Optane memory will require an Intel Kaby Lake processor to function. While previous demonstrations of the technology used an Intel Skylake processor, it appears this configuration will not be possible on the consumer versions of the technology.
Further, the consumer application accelerator drives will also require a 200-series chipset motherboard, and either a M.2 2280-S1-B-M or M.2 2242-S1-B-M connector with two or four PCI-E lanes. Motherboards will have to support NVMe v1.1 and Intel RST (Rapid Storage Technology) 15.5 or newer.
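Those requirements could be rolled up into a simple compatibility check. The sketch below is purely illustrative: the field names and dictionary-based interface are invented, and real detection would query the platform firmware and drivers rather than take a dict.

```python
# Hypothetical helper encoding the Optane Memory requirements quoted
# above. Field names are invented for illustration only.

REQUIRED_RST = (15, 5)  # Intel RST 15.5 or newer

def optane_memory_supported(platform: dict) -> bool:
    """Check a (hypothetical) platform description against the rules."""
    return (
        platform.get("cpu_family") == "Kaby Lake"
        and platform.get("chipset", "").startswith("200")
        and platform.get("m2_connector") in ("2280-S1-B-M", "2242-S1-B-M")
        and platform.get("pcie_lanes") in (2, 4)
        and platform.get("nvme_version", (0, 0)) >= (1, 1)
        and platform.get("rst_version", (0, 0)) >= REQUIRED_RST
    )

kaby_lake_board = {
    "cpu_family": "Kaby Lake", "chipset": "200-series",
    "m2_connector": "2280-S1-B-M", "pcie_lanes": 2,
    "nvme_version": (1, 1), "rst_version": (15, 5),
}
print(optane_memory_supported(kaby_lake_board))                              # True
print(optane_memory_supported(dict(kaby_lake_board, cpu_family="Skylake")))  # False
```

Note how the Skylake case fails purely on the CPU family check, which is exactly the restriction being reported.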
It is not clear why Intel is locking Optane technology to Kaby Lake, and whether it is due to technical limitations that they were not able to resolve to keep Skylake compatible, or simply a matter of not wanting to support the older platform so they can focus on the new Kaby Lake processors. Either way, Kaby Lake is now required if you want UHD Blu-ray playback and Optane 3D XPoint SSDs.
What are your thoughts on this latest bit of Optane news? Has Intel sweetened the pot enough to encourage upgrade hold outs?
- A Closer Look at Intel's Optane SSD DC P4800X Enterprise SSD Performance
- Intel Quietly Launches Official Optane Memory Site
- The Intel Core i7-7700K Review - Kaby Lake and 14nm+
Subject: Editorial | February 16, 2017 - 01:36 PM | Alex Lustenberg
Tagged: Zen, Z170, webkit, webgpu, podcast, Optane, nvidia, Intel, icx, evga, ECS, crucial, Blender, anidees, amd
PC Perspective Podcast #437 - 02/16/17
Join us for EVGA iCX, Zen Architecture, Intel Optane, new NVIDIA and AMD driver releases, and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the iTunes Store (audio only)
- Google Play - Subscribe to our audio podcast directly through Google Play!
- RSS - Subscribe through your regular RSS reader (audio only)
- MP3 - Direct download link to the MP3 file
Hosts: Allyn Malventano, Ken Addison, Josh Walrath, Jeremy Hellstrom
Program length: 1:32:21
Podcast topics of discussion:
Week in Review:
News items of interest:
Hardware/Software Picks of the Week
Subject: Storage | February 15, 2017 - 08:58 PM | Allyn Malventano
Tagged: XPoint, ssd, Optane, memory, Intel, cache
We now have an actual Optane landing page on the Intel site that discusses the first iteration of 'Intel Optane Memory', which appears to be the 8000p Series that we covered last October and saw as an option on some upcoming Lenovo laptops. The site does not cover the upcoming enterprise parts like the 375GB P4800X, but instead, focuses on the far smaller 16GB and 32GB 'System Accelerator' M.2 modules.
Despite using only two lanes of PCIe 3.0, these modules turn in some impressive performance, but the capacities when using only one or two (16GB each) XPoint dies preclude an OS install. Instead, these will be used, presumably in combination with a newer form of Intel's Rapid Storage Technology driver, as a caching layer meant as an HDD accelerator:
While the random write performance and endurance of these parts blow any NAND-based SSD out of the water, the 2-lane bottleneck holds them back compared to high-end NVMe NAND SSDs, so we will likely see this first consumer iteration of Intel Optane Memory in OEM systems equipped with hard disks as their primary storage. A very quick 32GB caching layer should help speed things up considerably for the majority of typical buyers of these types of mobile and desktop systems, while still keeping the total cost below that for a decent capacity NAND SSD as primary storage. Hey, if you can't get every vendor to switch to pure SSD, at least you can speed up that spinning rust a bit, right?
Subject: General Tech | February 15, 2017 - 01:29 PM | Jeremy Hellstrom
Tagged: vulkan, Intel, Intel Skylake, kaby lake
The open source API, Vulkan, just received a big birthday present from Intel as they added official support on their Skylake and Kaby Lake CPUs under Windows 10. We have seen adoption of this API from a number of game engine designers: Unreal Engine and Unity have both embraced it, the latest DOOM release was updated to support Vulkan, and there is even a Nintendo 64 renderer which runs on it. Ars Technica points out that both AMD and NVIDIA have been supporting this API for a while, and that we can expect to see Android implementations of this close-to-the-metal solution in the near future.
"After months in beta, Intel's latest driver for its integrated GPUs (version 15.45.14.4590) adds support for the low-overhead Vulkan API for recent GPUs running in Windows 10. The driver supports HD and Iris 500- and 600-series GPUs, the ones that ship with 6th- and 7th-generation Skylake and Kaby Lake processors."
Here is some more Tech News from around the web:
- XPoint: Leaked Intel specs reveal 'launch-ready' SSD – report @ The Register
- Patch Tuesday put on hold, SMB zero-day exploit likely to blame @ The Inquirer
- Roses are red, violets are blue, fake-news-detecting AI is fake news, too @ The Register
- Google's Not-so-secret New OS @ Slashdot
- Linksys Velop Mesh Wi-Fi Router System @ Custom PC Review
Subject: General Tech | February 13, 2017 - 12:59 PM | Jeremy Hellstrom
Tagged: acronis, caringo, Cisco, fujitsu, Intel, mimecast
The Register received more than a few tidbits of news from a wide array of storage companies, which they have condensed in this post. Acronis have released new versions of their Backup suite and True Image, with the Backup suite now able to capture Office 365 mailboxes. Cisco have released a product which allows you to have your onsite cloud run like Azure, while Fujitsu announced their mid-range ETERNUS AF650 all-flash array. Intel have updated their implementation of the open source Lustre parallel file system for supercomputers, and several companies released earnings data, though Mimecast wished their news was better.
"Incoming! Boom, boom and boom again – storage news announcements hit the wires in a relentless barrage. Here's a few we've received showing developments in data protection, cloud storage, hyper-converged storage, the dregs of flash memory and more."
Here is some more Tech News from around the web:
- 1 of 5 Intel's 8th-gen 'Coffee Lake' chips will be 14nm, not 10nm @ The Inquirer
- Google has a canary problem: One clocked off and crocked its cloud @ The Register
- Ping Pong Ball Improves the Google Daydream Controller @ Hack a Day
- Munich looks to ditch its Linux infrastructure and bring back Windows @ The Inquirer
- That guy using a Surface you keep seeing around town could be a spook @ The Register
- DeepMind AI learns to act aggressive when it doesn't get its way @ The Inquirer
- Good in a Pinch: The Physics of Crimped Connections @ Hack a Day
- Macs don't get viruses? Hahaha, ha... seriously though, that Word doc could be malware @ The Register
Subject: Editorial | February 9, 2017 - 06:59 PM | Josh Walrath
Tagged: TSMC, Samsung, Results, quadro, Q4, nvidia, Intel, geforce, Drive PX2, amd, 2017, 2016
It is most definitely quarterly reports time for our favorite tech firms. NVIDIA’s is unique with their fiscal vs. calendar year as compared to how AMD and Intel report. This dates back to NVIDIA’s initial public offering, when the company set its fiscal quarters ahead of the actual calendar by quite a few months. So when NVIDIA announces Q4 2017, it is actually reflecting the Q4 period in 2016. Clear as mud?
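The offset can be sketched in a few lines. This is a simplification for illustration (NVIDIA's fiscal year actually ends in late January, so the month boundaries are approximate):

```python
# Rough sketch of NVIDIA's fiscal labeling: fiscal quarters run about
# one year ahead of the calendar. Approximate start months only.

def fiscal_to_calendar(fiscal_year: int, quarter: int) -> str:
    """Return the approximate calendar start of a fiscal quarter."""
    # Fiscal Q1 begins around February of the prior calendar year.
    start_month = {1: "Feb", 2: "May", 3: "Aug", 4: "Nov"}[quarter]
    return f"{start_month} {fiscal_year - 1}"

print(fiscal_to_calendar(2017, 4))  # the quarter beginning Nov 2016
print(fiscal_to_calendar(2017, 1))  # the quarter beginning Feb 2016
```

So "Q4 FY2017" covers roughly November 2016 through January 2017, which is why it lines up with AMD's and Intel's calendar Q4 2016 reports.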
Semantics aside, NVIDIA had a record quarter. Gross revenue was an impressive $2.173 billion US. This is up slightly more than $700 million from the previous Q4. NVIDIA has shown amazing growth during this time, attributed to several factors. Net income (GAAP) is at $655 million. This again is a tremendous amount of profit for a company that came in at just over $2 billion in revenue. We can compare this to AMD’s results two weeks ago, which hit $1.11 billion in revenue and a loss of $51 million for the quarter. Consider that AMD provides CPUs, chipsets, and GPUs to the market and is the #2 x86 manufacturer in the world.
The yearly results were just as impressive. FY 2017 featured record revenue and net income. Revenue was $6.91 billion as compared to FY 2016 at $5 billion. Net income for the year was $1.666 billion, compared to $614 million for FY 2016. The growth for the entire year is astounding, and certainly the company had not seen an expansion like this since the early 2000s.
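Running the quoted figures through some quick arithmetic puts the two companies' quarters in perspective (all values in billions of US dollars, taken from the numbers above):

```python
# Net margin and growth calculations from the figures quoted above
# (billions of US dollars).

nvda_q4 = {"revenue": 2.173, "net_income": 0.655}
amd_q4  = {"revenue": 1.110, "net_income": -0.051}

def net_margin(quarter: dict) -> float:
    """Net income as a percentage of revenue."""
    return quarter["net_income"] / quarter["revenue"] * 100

print(f"NVIDIA Q4 net margin: {net_margin(nvda_q4):.1f}%")  # ~30.1%
print(f"AMD Q4 net margin:    {net_margin(amd_q4):.1f}%")   # ~-4.6%

fy17_rev, fy16_rev = 6.91, 5.00
growth = (fy17_rev / fy16_rev - 1) * 100
print(f"NVIDIA FY2017 revenue growth: {growth:.1f}%")       # ~38.2%
```

A roughly 30% net margin on a $2 billion quarter is the arithmetic behind the "record quarter" framing.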
The core strength of the company continues to be gaming. Gaming GPUs and products provided $1.348 billion in revenue by themselves. Since the manufacturing industry was unable to provide a usable 20 nm planar process for large, complex ASICs, companies such as NVIDIA and AMD were forced to innovate in design to create new products with greater feature sets and performance, all the while still using the same 28 nm process as previous products. Typically, process shrinks accounted for the majority of improvements (more transistors packed into a smaller area with corresponding switching speed increases). Many users kept cards that were several years old because there was not a huge impetus to upgrade. With the arrival of the 14 nm and 16 nm processes from Samsung and TSMC respectively, users suddenly had a very significant reason to upgrade. NVIDIA was able to address the entire market from high to low with their latest GTX 10x0 series of products. AMD, on the other hand, only had new products that hit the midrange and budget markets.
The next biggest area for NVIDIA is the datacenter. This has seen tremendous growth as compared to the other markets (except, of course, gaming) that NVIDIA covers. It has gone from around $97 million in Q4 2016 up to $296 million this last quarter. Tripling revenue in any one area is rare; gaming “only” about doubled during this same time period. Deep learning and AI are two areas that require this type of compute power, and NVIDIA was able to deliver a comprehensive software stack, as well as strategic partnerships that provided turnkey solutions for end users.
After datacenter we still have the visualization market based on the Quadro products. This area has not seen the dramatic growth of other aspects of the company, but it remains a solid foundation and a good money maker for the firm. The Quadro products continue to be improved upon, and software support grows.
One area that promises to really explode in the next three to four years is the automotive sector. The Drive PX2 system is being integrated into a variety of cars and NVIDIA is focused on providing a solid and feature packed solution for manufacturers. Auto-pilot and “co-pilot” modes will become more and more important in upcoming models and should reach wide availability by 2020, if not a little sooner. NVIDIA is working with some of the biggest names in the industry from both automakers and parts suppliers. BMW should release a fully automated driving system later this year with their i8 series. Audi also has higher end cars in the works that will utilize NVIDIA hardware and fully automated operation. If NVIDIA continues to expand here, eventually it could become as significant a source of income as gaming is today.
There was one bit of bad news from the company. Their OEM & IP division has seen several drops over the past several quarters. NVIDIA announced that the IP licensing to Intel would be discontinued this quarter and would not be renewed. We know that AMD has entered into an agreement with Intel to provide graphics IP to the company in future parts and to cover Intel in potential licensing litigation. This was a fair amount of money per quarter for NVIDIA, but their other divisions more than made up for the loss of this particular income.
NVIDIA certainly seems to be hitting on all cylinders and is growing into markets that were unavailable five to ten years ago. They are spreading out their financial base so as to avoid the boom and bust cycles of any one industry. Next quarter NVIDIA expects revenue to be down seasonally, into the $1.9 billion range. Even though that number is down, it would still represent their third-highest quarterly revenue.
Subject: Processors | February 8, 2017 - 09:38 PM | Josh Walrath
Tagged: Zen, Skylake, Samsung, ryzen, kaby lake, ISSCC, Intel, GLOBALFOUNDRIES, amd, AM4, 14 nm FinFET
Yesterday EE Times posted some interesting information that they had gleaned at ISSCC. AMD released a paper describing the design process and the advances they were able to achieve with the Zen architecture, manufactured on Samsung’s/GF’s 14nm FinFET process. AMD went over some of the basic measurements at the transistor scale and how it compares to what Intel currently has on their latest 14nm process.
The first thing that jumps out is that AMD claims that their 4-core/8-thread x86 core complex is about 10% smaller than what Intel has with one of their latest CPUs. We assume it is either Kaby Lake or Skylake. AMD did not go over exactly what they were counting when looking at the cores, because there are some significant differences between the two architectures. We are not sure if that 44 mm sq. figure includes the L3 cache or the L2 caches. My guess is that it probably includes L2 cache but not L3, but I could easily be wrong here.
Going down the table, we see that AMD and Samsung/GF are able to get their SRAM sizes down smaller than what Intel is able to do. AMD has double the amount of L2 cache per core, but it is only about 60% larger than Intel’s 256 KB L2. AMD’s L3 cache also takes up less area than Intel’s: both are 8 MB units, but AMD comes in at 16 mm sq. while Intel is at 19.1 mm sq. There will be differences in how AMD and Intel set up these caches, and until we see L3 performance comparisons we cannot assume too much.
(Image courtesy of ISSCC)
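A quick density calculation from the quoted L3 figures makes the gap concrete (numbers are the 8 MB / 16 mm sq. and 8 MB / 19.1 mm sq. values cited above):

```python
# L3 cache area density from the ISSCC figures quoted above.

caches = {
    "AMD Zen (14nm Samsung/GF)": {"l3_mb": 8, "l3_mm2": 16.0},
    "Intel (14nm)":              {"l3_mb": 8, "l3_mm2": 19.1},
}

for name, cache in caches.items():
    density = cache["l3_mb"] / cache["l3_mm2"]  # MB per square millimeter
    print(f"{name}: {density:.3f} MB/mm^2")

# Same 8 MB, but AMD uses roughly 16% less area (16.0 vs. 19.1 mm^2).
area_savings = (1 - 16.0 / 19.1) * 100
print(f"Area savings: {area_savings:.1f}%")
```

Of course, as noted above, area density says nothing about latency or bandwidth; the cache organizations differ enough that performance comparisons will have to wait for real silicon.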
In some of the basic measurements of the different processes we see that Intel has advantages throughout. This is not surprising as Intel has been well known to push process technology beyond what others are able to do. In theory their products will have denser logic throughout, including the SRAM cells. When looking at this information we wonder how AMD has been able to make their cores and caches smaller. Part of that is due to the likely setup of cache control and access.
One of the most likely culprits of this smaller size is the less advanced FPU/SSE/AVX units that AMD has in Zen. They support AVX-256, but it has to be done in double the cycles. They can do single-cycle AVX-128, but Intel’s throughput is much higher than what AMD can achieve. AVX is not the end-all, be-all, but it is gaining in importance in high performance computing and editing applications. David Kanter, in his article covering the architecture, explicitly said that AMD made this decision to lower the die size and power constraints for this product.
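A back-of-the-envelope model shows why the narrower datapath halves 256-bit throughput. The unit counts and widths below are illustrative assumptions, not a precise description of either microarchitecture:

```python
# Toy throughput model: a 256-bit AVX operation on a 128-bit datapath
# must be cracked into two micro-ops, halving 256-bit throughput.
# Unit counts are illustrative assumptions, not exact core specs.

def avx256_ops_per_cycle(datapath_bits: int, units: int) -> float:
    """Peak 256-bit vector operations retired per cycle."""
    uops_per_op = max(1, 256 // datapath_bits)  # 2 on a 128-bit datapath
    return units / uops_per_op

zen_like   = avx256_ops_per_cycle(datapath_bits=128, units=2)
intel_like = avx256_ops_per_cycle(datapath_bits=256, units=2)
print(zen_like, intel_like)  # 1.0 vs. 2.0 256-bit ops per cycle
```

Halving peak AVX-256 throughput in exchange for a smaller, cooler core is exactly the die-size/power trade-off Kanter describes.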
Ryzen will undoubtedly be a pretty large chip overall once both modules and 16 MB of L3 cache are put together. My guess would be in the 220 mm sq. range once all is said and done (northbridge, southbridge, PCI-E controllers, etc.), but again that is only a guess. What is perhaps most interesting of all is that AMD has a part that on the surface is very close to the Broadwell-E based Intel i7 chips. The i7-6900K runs at 3.2 to 3.7 GHz, features 8 cores and 16 threads, and has around 20 MB of L2/L3 cache. AMD’s top end looks to run at 3.6 GHz, features the same number of cores and threads, and has 20 MB of L2/L3 cache. The Intel part is rated at 140 watts TDP, while the AMD part will have a max of 95 watts TDP.
If Ryzen is truly competitive in this top end space (with a price to undercut Intel, yet not destroy their own margins) then AMD is going to be in a good position for the rest of this year. We will find out exactly what is coming our way next month, but all indications point to Ryzen being competitive in overall performance while being able to undercut Intel in TDPs for comparable cores/threads. We are counting down the days...