Linked Multi-GPU Arrives... for Developers

The Khronos Group has released the Vulkan 1.0.42.0 specification, which includes experimental (more on that in a couple of paragraphs) support for VR enhancements, sharing resources between processes, and linking similar GPUs. The spec was released alongside a LunarG SDK and NVIDIA drivers that fully implement these extensions; both are intended for developers, not gamers.

I would expect that the most interesting feature is the experimental support for linking similar GPUs together, similar to DirectX 12’s Explicit Linked Multiadapter, which Vulkan calls a “Device Group”. The idea is that the physical GPUs hidden behind this layer can share resources, such as rendering a texture on one GPU and consuming it on another, without the host code being involved. I’m guessing that some studios, like maybe Oxide Games, will decide not to use this feature. While it’s not explicitly stated, I cannot see how this (or DirectX 12’s Explicit Linked mode) would be compatible with cross-vendor configurations. Unless I’m mistaken, that would require AMD, NVIDIA, and/or Intel restructuring their drivers to inter-operate at this level. Still, the assumptions that can be made about grouped devices are apparently popular with enough developers for both the Khronos Group and Microsoft to bother.


A slide from Microsoft's DirectX 12 reveal, long ago.

As for the “experimental” comment that I made in the introduction... I was expecting to see this news around SIGGRAPH, which occurs in late July / early August, alongside a minor version bump (to Vulkan 1.1).

I might still be right, though.

The major new features of Vulkan 1.0.42.0 are implemented as a new classification of extensions: KHX. In the past, vendors like NVIDIA and AMD would add new features as vendor-prefixed extensions. Games could query the graphics driver for these abilities and enable them if available. If a feature became popular enough for multiple vendors to ship their own implementations, a committee would consider an EXT extension. This would behave the same across all implementations (give or take) but not be officially adopted by the Khronos Group. If the Khronos Group did take it under their wing, it would be given a KHR extension (or added as a required feature).

The Khronos Group has now added a new layer: KHX. This level of extension sits below KHR and is not intended for production code. You might see where this is headed. The VR multiview, multi-GPU, and cross-process extensions are not supposed to be used in released video games until they leave KHX status. Unlike a vendor extension, the Khronos Group wants old KHX standards to drop out of existence at some point after they graduate to full KHR status. It’s not something that NVIDIA owns and will keep around for 20 years after its usable lifespan just so old games behave as expected.
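To make the tiering concrete, here is a minimal Python sketch of the query-then-enable pattern described above. It is illustrative only: real extension negotiation happens through Vulkan's C API (via calls like vkEnumerateInstanceExtensionProperties), and the supported-extension list below is invented for the example.

```python
# Illustrative sketch of Vulkan's extension tiers and the
# query-then-enable pattern. Not real Vulkan API code.

# The prefix of an extension name indicates who standardized it.
TIERS = {
    "VK_NV_": "vendor (NVIDIA)",
    "VK_AMD_": "vendor (AMD)",
    "VK_EXT_": "multi-vendor, not Khronos-ratified",
    "VK_KHX_": "Khronos experimental -- not for production",
    "VK_KHR_": "Khronos-ratified",
}

def classify(ext_name):
    """Return the standardization tier implied by an extension's prefix."""
    for prefix, tier in TIERS.items():
        if ext_name.startswith(prefix):
            return tier
    return "unknown"

def request_extensions(wanted, supported):
    """Enable only extensions the driver reports as available,
    and skip experimental KHX extensions in a shipping build."""
    enabled = [e for e in wanted
               if e in supported and not e.startswith("VK_KHX_")]
    missing = [e for e in wanted if e not in supported]
    return enabled, missing
```

Under this scheme, a released game that asked for VK_KHX_device_group (the multi-GPU extension) would leave it disabled until a ratified VK_KHR_ version exists, exactly the behavior the Khronos Group is after.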


How long will that take? No idea. I’ve already mentioned my logical but uneducated guess a few paragraphs ago, but I’m not going to repeat it; I have literally zero facts to base it on, and I don’t want our readers to think that I do. I don’t. It’s just based on what the Khronos Group typically announces at certain trade shows, and the length of time since their first announcement.

The benefit that KHX does bring us is that, whenever these features make it to public release, developers will have already been using them... internally... since around now. When a feature hits KHR, it’s done, and anyone can theoretically be ready for it when that time comes.

Subject: Editorial
Manufacturer: AMD

Zen vs. 40 Years of CPU Development

Zen is nearly upon us.  AMD is releasing its next generation CPU architecture to the world this week, and we saw CPU demonstrations and upcoming AM4 motherboards at CES in early January.  We have been shown tantalizing glimpses of the performance and capabilities of the “Ryzen” products that will presumably fill the desktop market from $150 to $499.  I have yet to be briefed on the product stack that AMD will be offering, but we know enough to start thinking about how positioning and placement will be addressed by these new products.


To get a better understanding of how Ryzen will stack up, we should probably take a look back at what AMD has accomplished in the past and how Intel has responded to some of its stronger products.  AMD has been in business for 47 years now and has been a major player in semiconductors for most of that time.  It has really only been since the 90s, when AMD started to battle Intel head to head, that people became passionate about the company and its products.

The industry is a complex and ever-shifting one.  AMD and Intel have been two stalwarts over the years.  Even though AMD has had more than a few challenging years over the past decade, it still moves forward and expects to compete at the highest level with its much larger and better funded competitor.  2017 could very well be a breakout year for the company with a return to solid profitability in both CPU and GPU markets.  I am not the only one who thinks this considering that AMD shares that traded around the $2 mark ten months ago are now sitting around $14.

 

AMD Through 1996

AMD became a force in the CPU industry due to IBM’s requirement to have a second source for its PC business.  Intel originally entered into a cross-licensing agreement with AMD to allow it to produce x86 chips based on Intel designs.  AMD eventually started to produce their own versions of these parts and became a favorite in the PC clone market.  Eventually Intel tightened this agreement and then cancelled it, but through near-endless litigation AMD ended up with an x86 license deal with Intel.

AMD produced their own Am286 chip that was the first real break from the second-sourcing agreement with Intel.  Intel balked at sharing their 386 design with AMD and eventually forced the company to develop its own clean-room version.  The Am386 was released in the early 90s, well after Intel had been producing those chips for years.  AMD then developed their own Am486, which later morphed into the Am5x86.  The company made some good inroads with these speedy parts and typically clocked them faster than their Intel counterparts (e.g. the Am486 at 40 MHz and 80 MHz vs. the Intel 486 DX33 and DX66).  AMD priced these parts lower so users could achieve better performance per dollar using the same chipsets and motherboards.


Intel released their first Pentium chips in 1993.  The initial version ran hot and featured the infamous FDIV bug.  AMD made some inroads against these parts by introducing faster Am486 and Am5x86 parts that achieved clockspeeds from 133 MHz up to 150 MHz at the very top end.  The 150 MHz part was very comparable in overall performance to the 75 MHz Pentium, and we saw the introduction of the dreaded “P-rating” on processors.

There is no denying that Intel continued their dominance throughout this time by being the gold standard in x86 manufacturing and design.  AMD slowly chipped away at its larger rival and continued to profit off of the lucrative x86 market.  William Sanders III set the bar higher for where he wanted the company to go, starting down a much more aggressive path than many expected the company to take.

Click here to read the rest of the AMD processor editorial!

Flipped your lid and want to reattach it?

Subject: Processors | February 23, 2017 - 11:07 AM |
Tagged: Intel, Skylake, kaby lake, delidding, relidding

[H]ard|OCP have been spending a lot of time removing the integrated heatspreader on recent Intel chips to see what effect it has on temperatures under load.  Along the way we picked up tips on 3D printing a delidder, and thankfully there was not much death along the way.  One of their findings from this testing was that it can be beneficial to reattach the lid after changing out the thermal interface material, and they have published a guide on how to do so.  You will need a variety of tools, from Permatex Red RTV to razor blades, by way of isopropyl alcohol and syringes, as well as a steady hand.  You may have many of the items on hand already, and none are exceptionally expensive.


"So we have covered a lot about taking your shiny new Intel CPUs apart lately, affectionately known as "delidding." What we have found in our journey is that "relidding" the processor might be an important part of the process as well. But what if you do not have a fancy tool that will help you put Humpty back together again?"

Here are some more Processor articles from around the web:

Processors

Source: [H]ard|OCP

Intel Details Optane Memory System Requirements

Subject: General Tech, Storage | February 21, 2017 - 07:14 PM |
Tagged: Optane, kaby lake, Intel, 3D XPoint

Intel has announced that its Optane memory will require an Intel Kaby Lake processor to function. While previous demonstrations of the technology used an Intel Skylake processor, it appears this configuration will not be possible on the consumer versions of the technology.


Further, the consumer application accelerator drives will also require a 200-series chipset motherboard and either an M.2 2280-S1-B-M or M.2 2242-S1-B-M connector with two or four PCI-E lanes. Motherboards will have to support NVMe v1.1 and Intel RST (Rapid Storage Technology) 15.5 or newer.
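Taken together, Intel's stated requirements read like a compatibility checklist. The Python sketch below encodes them as one; the field names and the `meets_optane_requirements` helper are invented for illustration and are not part of any real Intel tool.

```python
# Rough checklist of Intel's stated consumer Optane requirements,
# as described above. Field names are hypothetical.

REQUIRED_RST = (15, 5)   # Intel RST 15.5 or newer
REQUIRED_NVME = (1, 1)   # NVMe v1.1 or newer

def meets_optane_requirements(system):
    """Return True if a (hypothetical) system description satisfies
    every requirement Intel has listed for consumer Optane memory."""
    checks = [
        system.get("cpu_family") == "kaby_lake",
        system.get("chipset_series") == 200,
        system.get("m2_connector") in ("M.2 2280-S1-B-M", "M.2 2242-S1-B-M"),
        system.get("m2_pcie_lanes") in (2, 4),
        system.get("nvme_version", (0, 0)) >= REQUIRED_NVME,
        system.get("rst_version", (0, 0)) >= REQUIRED_RST,
    ]
    return all(checks)
```

Note how strict the list is: swap in a Skylake CPU or a 100-series chipset and every other box can be ticked, yet the check still fails, which is exactly the situation described above.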

It is not clear why Intel is locking Optane technology to Kaby Lake: it could be due to technical limitations they were not able to resolve on Skylake, or simply a matter of not wanting to support the older platform in favor of the new Kaby Lake processors. Either way, Kaby Lake is now required if you want UHD Blu-ray playback and Optane 3D XPoint SSDs.

What are your thoughts on this latest bit of Optane news? Has Intel sweetened the pot enough to encourage upgrade hold outs?


Source: Bit-Tech

Podcast #437 - EVGA iCX, Zen Architecture, Optane, and more!

Subject: Editorial | February 16, 2017 - 01:36 PM |
Tagged: Zen, Z170, webkit, webgpu, podcast, Optane, nvidia, Intel, icx, evga, ECS, crucial, Blender, anidees, amd

PC Perspective Podcast #437 - 02/16/17

Join us for EVGA iCX, Zen Architecture, Intel Optane, new NVIDIA and AMD driver releases, and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts: Allyn Malventano, Ken Addison, Josh Walrath, Jeremy Hellstrom

Program length: 1:32:21


Intel Quietly Launches Official Optane Memory Site

Subject: Storage | February 15, 2017 - 08:58 PM |
Tagged: XPoint, ssd, Optane, memory, Intel, cache

We've been hearing a lot about Intel's upcoming Optane memory over the past two years, but the information had all been in the form of press announcements and leaked roadmap slides.


We now have an actual Optane landing page on the Intel site that discusses the first iteration of 'Intel Optane Memory', which appears to be the 8000p Series that we covered last October and saw as an option on some upcoming Lenovo laptops. The site does not cover upcoming enterprise parts like the 375GB P4800X, but instead focuses on the far smaller 16GB and 32GB 'System Accelerator' M.2 modules.


Despite using only two lanes of PCIe 3.0, these modules turn in some impressive performance, but with only one or two 16GB XPoint dies on board, the capacities preclude an OS install. Instead, these will be used, presumably in combination with a newer form of Intel's Rapid Storage Technology driver, as a caching layer meant as an HDD accelerator:

While the random write performance and endurance of these parts blow any NAND-based SSD out of the water, the 2-lane bottleneck holds them back compared to high-end NVMe NAND SSDs, so we will likely see this first consumer iteration of Intel Optane Memory in OEM systems equipped with hard disks as their primary storage. A very quick 32GB caching layer should help speed things up considerably for the majority of typical buyers of these types of mobile and desktop systems, while still keeping the total cost below that for a decent capacity NAND SSD as primary storage. Hey, if you can't get every vendor to switch to pure SSD, at least you can speed up that spinning rust a bit, right?

Source: Intel

Vulkan is not extinct, in fact it might be about to erupt

Subject: General Tech | February 15, 2017 - 01:29 PM |
Tagged: vulkan, Intel, Intel Skylake, kaby lake

The open source API Vulkan just received a big birthday present from Intel, who added official support on their Skylake and Kaby Lake CPUs under Windows 10.  We have seen adoption of this API from a number of game engine designers: Unreal Engine and Unity have both embraced it, the latest DOOM release was updated to support Vulkan, and there is even a Nintendo 64 renderer which runs on it.  Ars Technica points out that both AMD and NVIDIA have been supporting this API for a while and that we can expect to see Android implementations of this close-to-the-metal solution in the near future.


"After months in beta, Intel's latest driver for its integrated GPUs (version 15.45.14.4590) adds support for the low-overhead Vulkan API for recent GPUs running in Windows 10. The driver supports HD and Iris 500- and 600-series GPUs, the ones that ship with 6th- and 7th-generation Skylake and Kaby Lake processors."

Here is some more Tech News from around the web:

Tech Talk

Source: Ars Technica

Clearing storage newsblasts from cache

Subject: General Tech | February 13, 2017 - 12:59 PM |
Tagged: acronis, caringo, Cisco, fujitsu, Intel, mimecast

The Register received more than a few tidbits of news from a wide array of storage companies, which they have condensed in this post.  Acronis have released new versions of their Backup suite and True Image, with the Backup suite now able to capture Office 365 mailboxes.  Cisco have released a product which allows you to have your onsite cloud run like Azure, while Fujitsu announced their mid-range ETERNUS AF650 all-flash array.  Intel have updated their implementation of the open source Lustre parallel file system for supercomputers, and several companies released earnings data, though Mimecast wished their news was better.


"Incoming! Boom, boom and boom again – storage news announcements hit the wires in a relentless barrage. Here's a few we've received showing developments in data protection, cloud storage, hyper-converged storage, the dregs of flash memory and more."

Here is some more Tech News from around the web:

Tech Talk

Source: The Register

NVIDIA Announces Q4 2017 Results

Subject: Editorial | February 9, 2017 - 06:59 PM |
Tagged: TSMC, Samsung, Results, quadro, Q4, nvidia, Intel, geforce, Drive PX2, amd, 2017, 2016

It is most definitely quarterly reports time for our favorite tech firms.  NVIDIA’s reporting is unique with its fiscal vs. calendar year, as compared to how AMD and Intel report.  This goes back to when NVIDIA had their first public offering and set their fiscal quarters ahead quite a few months from the actual calendar.  So when NVIDIA announces Q4 2017, it is actually reflecting the Q4 period in 2016.  Clear as mud?

Semantics aside, NVIDIA had a record quarter.  Gross revenue was an impressive $2.173 billion US.  This is up slightly more than $700 million from the previous Q4.  NVIDIA has shown amazing growth during this time, attributed to several factors.  Net income (GAAP) is at $655 million.  This again is a tremendous amount of profit for a company that came in just over $2 billion in revenue.  We can compare this to AMD’s results two weeks ago, which hit $1.11 billion in revenue and a loss of $51 million for the quarter.  Consider that AMD provides CPUs, chipsets, and GPUs to the market and is the #2 x86 manufacturer in the world.


The yearly results were just as impressive.  FY 2017 featured record revenue and net income.  Revenue was $6.91 billion as compared to $5 billion in FY 2016.  Net income for the year was $1.666 billion, compared to $614 million for FY 2016.  The growth for the entire year is astounding; the company has not seen an expansion like this since the early 2000s.

The core strength of the company continues to be gaming.  Gaming GPUs and products provided $1.348 billion in revenue by themselves.  Since the manufacturing industry was unable to provide a usable 20 nm planar process for large, complex ASICs, companies such as NVIDIA and AMD were forced to innovate in design to create new products with greater feature sets and performance, all while still using the same 28 nm process as previous products.  Typically, process shrinks accounted for the majority of improvements (more transistors packed into a smaller area with corresponding switching speed increases).  Many users kept cards that were several years old because there was not a huge impetus to upgrade.  With the arrival of the 14 nm and 16 nm processes from Samsung and TSMC respectively, users suddenly had a very significant reason to upgrade.  NVIDIA was able to address the entire market from high to low with their latest GTX 10x0 series of products.  AMD, on the other hand, only had new products that hit the midrange and budget markets.


The next biggest area for NVIDIA is the datacenter.  This has seen tremendous growth as compared to the other markets (except, of course, gaming) that NVIDIA covers.  It has gone from around $97 million in Q4 2016 up to $296 million this last quarter.  Tripling revenue in any one area is rare; gaming “only” about doubled during this same time period.  Deep learning and AI are two areas that require this type of compute power, and NVIDIA was able to deliver a comprehensive software stack, as well as strategic partnerships that provided turnkey solutions for end users.

After datacenter we still have the visualization market based on the Quadro products.  This area has not seen the dramatic growth of other aspects of the company, but it remains a solid foundation and a good money maker for the firm.  The Quadro products continue to be improved upon, and software support grows.

One area that promises to really explode in the next three to four years is the automotive sector.  The Drive PX2 system is being integrated into a variety of cars and NVIDIA is focused on providing a solid and feature packed solution for manufacturers.  Auto-pilot and “co-pilot” modes will become more and more important in upcoming models and should reach wide availability by 2020, if not a little sooner.  NVIDIA is working with some of the biggest names in the industry from both automakers and parts suppliers.  BMW should release a fully automated driving system later this year with their i8 series.  Audi also has higher end cars in the works that will utilize NVIDIA hardware and fully automated operation.  If NVIDIA continues to expand here, eventually it could become as significant a source of income as gaming is today.

There was one bit of bad news from the company.  Their OEM & IP division has seen several drops over the past several quarters.  NVIDIA announced that the IP licensing to Intel would be discontinued this quarter and would not be renewed.  We know that AMD has entered into an agreement with Intel to provide graphics IP to the company in future parts and to cover Intel in potential licensing litigation.  This was a fair amount of money per quarter for NVIDIA, but their other divisions more than made up for the loss of this particular income.

NVIDIA certainly seems to be hitting on all cylinders and is growing into markets that were unavailable to it five to ten years ago.  They are spreading out their financial base so as to avoid the boom and bust cycles of any one industry.  Next quarter NVIDIA expects revenue to be down seasonally, into the $1.9 billion range.  Even though that number is down, it would still represent the company's third highest quarterly revenue.

Source: NVIDIA

AMD Details Zen at ISSCC

Subject: Processors | February 8, 2017 - 09:38 PM |
Tagged: Zen, Skylake, Samsung, ryzen, kaby lake, ISSCC, Intel, GLOBALFOUNDRIES, amd, AM4, 14 nm FinFET

Yesterday EE Times posted some interesting information that they had gleaned at ISSCC.  AMD released a paper describing the design process and advances they were able to achieve with the Zen architecture manufactured on Samsung’s/GF’s 14 nm FinFET process.  AMD went over some of the basic measurements at the transistor scale and how they compare to what Intel currently has on their latest 14 nm process.


The first thing that jumps out is that AMD claims that their 4 core/8 thread x86 core is about 10% smaller than what Intel has in one of their latest CPUs; we assume it is either Kaby Lake or Skylake.  AMD did not go over exactly what they were counting when sizing the cores, because there are some significant differences between the two architectures.  We are not sure if that 44 mm sq. figure includes the L3 cache or the L2 caches.  My guess is that it probably includes L2 cache but not L3, though I could easily be wrong here.

Going down the table, we see that AMD and Samsung/GF are able to get their SRAM sizes down smaller than what Intel is able to do.  AMD has double the amount of L2 cache per core, yet it is only about 60% larger than Intel’s 256 KB L2.  AMD’s L3 cache is also smaller than Intel’s: both are 8 MB units, but AMD comes in at 16 mm sq. while Intel is at 19.1 mm sq.  There will be differences in how AMD and Intel set up these caches, and until we see L3 performance comparisons we cannot assume too much.
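If you want to put a number on that L3 density gap, the figures above are enough. A quick sketch (capacities and areas as reported; a real comparison would also need to account for cache organization and control logic):

```python
def mb_per_mm2(capacity_mb, area_mm2):
    """Cache density in MB per square millimetre."""
    return capacity_mb / area_mm2

# 8 MB L3 caches, areas as reported in the ISSCC comparison above.
amd_l3 = mb_per_mm2(8, 16.0)     # 0.50 MB/mm^2
intel_l3 = mb_per_mm2(8, 19.1)   # about 0.42 MB/mm^2

# AMD's L3 array is (19.1 - 16) / 19.1, or roughly 16%, smaller
# in area for the same capacity.
area_saving_pct = (19.1 - 16.0) / 19.1 * 100.0
```

That is a meaningful edge for a supposedly less dense process, which is why the cache setup differences mentioned above matter before drawing conclusions.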


(Image courtesy of ISSCC)

In some of the basic measurements of the different processes, we see that Intel has advantages throughout.  This is not surprising, as Intel is well known for pushing process technology beyond what others are able to do.  In theory their products will have denser logic throughout, including the SRAM cells.  Looking at this information, we wonder how AMD has been able to make their cores and caches smaller.  Part of that is likely due to the setup of cache control and access.

One of the most likely culprits of this smaller size is the less advanced FPU/SSE/AVX units that AMD has in Zen.  They support AVX-256, but it has to be done in double the cycles.  They can do single-cycle AVX-128, but Intel’s throughput is much higher than what AMD can achieve.  AVX is not the end-all, be-all, but it is gaining in importance in high performance computing and editing applications.  David Kanter, in his article covering the architecture, explicitly said that AMD made this decision to lower the die size and power constraints for this product.

Ryzen will undoubtedly be a pretty large chip overall once both modules and 16 MB of L3 cache are put together.  My guess would be in the 220 mm sq. range, but again that is only a guess once all is said and done (northbridge, southbridge, PCI-E controllers, etc.).  What is perhaps most interesting is that AMD has a part that on the surface is very close to the Broadwell-E based Intel i7 chips.  The i7-6900K runs at 3.2 to 3.7 GHz, features 8 cores and 16 threads, and has around 20 MB of L2/L3 cache.  AMD’s top end looks to run at 3.6 GHz, features the same number of cores and threads, and has 20 MB of L2/L3 cache.  The Intel part is rated at 140 watts TDP, while the AMD part will have a max of 95 watts TDP.

If Ryzen is truly competitive in this top end space (with a price to undercut Intel, yet not destroy their own margins) then AMD is going to be in a good position for the rest of this year.  We will find out exactly what is coming our way next month, but all indications point to Ryzen being competitive in overall performance while being able to undercut Intel in TDPs for comparable cores/threads.  We are counting down the days...

Source: AMD