
AMD Vega GPU Architecture Preview: Redesigned Memory Architecture

Primitive Shader, Tile-based Rasterization

A New Geometry Primitive Shader

With the Vega GPU architecture, AMD is aiming to reinvent the geometry pipeline. One of the fundamental problems with modern GPU rasterization is the need to filter through polygons that will never be seen, so that only the pixels necessary to render the output for the display get shaded. AMD gives the example of a 4K scene in the latest Deus Ex that starts with 220M polygons of data, of which only 2M are visible. That roughly 100x reduction is a significant focus on the GPU development side of things, driving work on culling, order independence and other features.
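To make the culling idea concrete, here is a minimal sketch of two of the cheapest per-triangle tests a geometry stage can apply: rejecting back-facing triangles and zero-area (degenerate) ones, both decided by the sign of the triangle's screen-space area. This is a generic textbook technique written in Python for illustration, not AMD's implementation; the function names and the counter-clockwise front-face convention are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def signed_area(tri: Triangle) -> float:
    """Twice the signed area of a screen-space triangle.

    Positive means counter-clockwise winding, negative means clockwise,
    and zero means the triangle is degenerate and covers no pixels.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)


def cull(triangles: List[Triangle], front_ccw: bool = True) -> List[Triangle]:
    """Keep only front-facing, non-degenerate triangles (hypothetical helper)."""
    kept = []
    for tri in triangles:
        area = signed_area(tri)
        if area == 0.0:
            continue  # degenerate: nothing to rasterize
        facing_front = area > 0.0 if front_ccw else area < 0.0
        if facing_front:
            kept.append(tri)
    return kept
```

For example, of a counter-clockwise triangle, the same triangle wound clockwise, and three collinear points, only the first survives `cull`.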


The new programmable geometry pipeline on Vega will offer up to 2x the peak throughput per clock compared to previous generations by utilizing a new “primitive shader.” This new shader combines the functions of the vertex and geometry shaders and, as AMD put it to me, “with the right knowledge” you can discard game-based primitives at an incredible rate. That right knowledge is the crucial component, though: it is something that has to be coded for directly and isn’t something that AMD or Vega will be able to do behind the scenes.
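Because the primitive shader merges the vertex and geometry stages, culling can happen in the same pass that transforms vertices. The sketch below illustrates that idea generically: each vertex is transformed to clip space once, and any triangle whose three vertices all lie outside the same clip plane is dropped before rasterization. It is not AMD's primitive shader interface (which, as noted above, developers must code for directly); the function names and the OpenGL-style clip volume (-w ≤ x, y, z ≤ w) are assumptions.

```python
from typing import List, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]
Mat4 = Sequence[Sequence[float]]  # row-major 4x4 matrix


def transform(m: Mat4, v: Vec4) -> Vec4:
    """Multiply a row-major 4x4 matrix by a column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def outcode(p: Vec4) -> int:
    """Bitmask of the clip planes a clip-space point lies outside of.

    Assumes an OpenGL-style clip volume: -w <= x, y, z <= w.
    """
    x, y, z, w = p
    planes = (x < -w, x > w, y < -w, y > w, z < -w, z > w)
    return sum(1 << i for i, outside in enumerate(planes) if outside)


def fused_transform_and_cull(
    m: Mat4, positions: List[Vec4], indices: List[Tuple[int, int, int]]
) -> List[Tuple[int, int, int]]:
    """Transform each vertex once, then drop triangles entirely outside one plane."""
    codes = [outcode(transform(m, p)) for p in positions]
    survivors = []
    for a, b, c in indices:
        # A shared "outside" bit means the whole triangle is off-screen.
        if codes[a] & codes[b] & codes[c]:
            continue
        survivors.append((a, b, c))
    return survivors
```

With an identity matrix, a triangle near the origin survives while one whose vertices all have x > w is rejected without ever reaching the rasterizer.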


Developers could adopt this primitive shader type simply by wrapping current vertex shader code, which the Vega 10 driver packages would then recognize and speed up (to that 2x rate). Another way this could be utilized is through extensions to current APIs (Vulkan seems like an obvious choice), and the hope is that this kind of shader will be adopted and implemented officially by upcoming API revisions, including the next DirectX. AMD views the primitive shader as the natural progression of the geometry engine and the end of standard vertex and geometry shaders. In the end, that will be the complication with this new feature (as well as others): its benefit to consumers and game developers will depend on integration and adoption rates among developers themselves. We have seen in the past that AMD can struggle to push its own standardized features on the industry (though in some cases, à la FreeSync, it has succeeded).

A Next-Generation Compute Unit (NCU)

Vega will introduce a revision and update to the compute unit that AMD has been evolving over the past several years under the GCN name. We really don’t know much yet, except that single precision operations per clock per compute unit will stay the same at 128. What is new is that Vega will support 16-bit (half precision) and 8-bit operations in a packed math form, essentially doubling 16-bit throughput to 256 operations per clock. While previous architectures supported 16-bit math, Vega takes optimization and throughput to a higher level, making the NCU what AMD calls the “most flexible mixed precision compute unit” in the industry.
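Packed math means two 16-bit values share one 32-bit register lane, so a single instruction operates on both. Assuming the usual GCN accounting of 64 lanes per compute unit with a multiply-add counted as two operations, 64 × 2 = 128 FP32 operations per clock, and packing two FP16 values per lane doubles that to 256. The Python sketch below emulates one packed multiply-add using the standard library's half-precision struct format; it models the data layout only, not Vega's hardware, and the helper names are hypothetical.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two IEEE 754 half-precision values into one 32-bit word.

    The first argument lands in the low 16 bits, the second in the high 16 bits.
    """
    return struct.unpack("<I", struct.pack("<ee", lo, hi))[0]


def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two half-precision lanes."""
    return struct.unpack("<ee", struct.pack("<I", word))


def packed_mad(a: int, b: int, c: int) -> int:
    """Lane-wise a * b + c on two packed FP16 lanes.

    One call stands in for one packed instruction: two multiply-adds
    (four floating-point operations) in a single 32-bit slot.
    Results are rounded back to FP16 when repacked.
    """
    a0, a1 = unpack_half2(a)
    b0, b1 = unpack_half2(b)
    c0, c1 = unpack_half2(c)
    return pack_half2(a0 * b0 + c0, a1 * b1 + c1)
```

For example, packing (1.5, 2.0), (2.0, 0.5) and (0.25, 1.0) and applying `packed_mad` yields lanes (3.25, 2.0), both computed by one emulated instruction.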


Without more information on NCU counts or clock speeds, we cannot make any judgments on estimated GPU performance or on how different the NCU will be from the CU currently used in Polaris.


Next-Generation Pixel Engine – A Tile-based Approach

Before the subheading above gets your knickers in a twist, let me be sure to point out that anything that happens in the new Vega Draw Stream Binning Rasterizer is simply one option that the GPU will have available to it. The goal of the DSBR is to save power and improve performance in some instances. The goal of every rasterizer is to cull pixels that are invisible in the scene, so that only the pixels required at display time get shaded. Vega’s new DSBR will be able to do this in a tile-based manner, a rasterization technique that has traditionally been limited to mobile SoCs like Qualcomm’s Adreno, but one that even NVIDIA’s Maxwell architecture implemented for desktop users.
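At its simplest, a tile-based (binning) rasterizer first sorts primitives into screen-space tiles, then works through one tile at a time so that tile's data can stay on chip. Here is a minimal sketch of the binning step using each triangle's bounding box. AMD has not described the DSBR's actual binning heuristics in this preview, so the bounding-box test, the 32-pixel tile size and the function name are all assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def bin_triangles(
    triangles: List[Triangle], width: int, height: int, tile: int = 32
) -> Dict[Tuple[int, int], List[int]]:
    """Map each screen tile (tx, ty) to the ids of triangles overlapping it.

    Uses the triangle's bounding box, which is conservative: a triangle may
    be listed in a tile it does not actually cover.
    """
    bins: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    for tri_id, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Clamp the bounding box to the screen's tile grid.
        tx0 = max(math.floor(min(xs)) // tile, 0)
        tx1 = min(math.floor(max(xs)) // tile, tiles_x - 1)
        ty0 = max(math.floor(min(ys)) // tile, 0)
        ty1 = min(math.floor(max(ys)) // tile, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(tri_id)
    return dict(bins)
```

On a 64x64 screen, a small triangle near the origin lands only in tile (0, 0), while one spanning pixels 20 to 40 is listed in all four tiles; each tile can then be rasterized with just its own short list.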


The Draw Stream Binning Rasterizer uses cache-aware information to capture batches of primitives in a way that has two positive effects. First, you will very often find multiple hits in the same proximity, and second, this creates a new way to determine which pixels to shade. This reduces accesses to memory and caches, which saves power. It increases effective bandwidth on lower cost parts and can reduce power on even the highest performing GPU implementations.
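One way to read “a new way to determine which pixels to shade” is deferred visibility within a batch: resolve which fragment is nearest for each pixel in the bin first, then shade each covered pixel once. The toy comparison below counts pixel-shader invocations for immediate-mode shading versus that batch-resolved approach. It is a simplified model (real GPUs also use early-Z and hierarchical-Z), and the names are hypothetical rather than AMD's.

```python
from typing import Dict, List, Tuple

# A fragment is (x, y, depth, primitive_id); smaller depth is closer to the viewer.
Fragment = Tuple[int, int, float, int]


def immediate_shading_count(fragments: List[Fragment]) -> int:
    """Shade every fragment that passes the depth test in submission order."""
    depth: Dict[Tuple[int, int], float] = {}
    shaded = 0
    for x, y, z, _ in fragments:
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            shaded += 1  # shaded now, even if a later fragment covers it
    return shaded


def binned_shading_count(fragments: List[Fragment]) -> int:
    """Resolve visibility for the whole batch first, then shade once per pixel."""
    nearest: Dict[Tuple[int, int], Fragment] = {}
    for frag in fragments:
        x, y, z, _ = frag
        if (x, y) not in nearest or z < nearest[(x, y)][2]:
            nearest[(x, y)] = frag
    return len(nearest)
```

Back-to-front submission is the worst case for immediate mode: for fragments `[(0, 0, 0.9, 0), (0, 0, 0.5, 1), (0, 0, 0.1, 2), (1, 0, 0.3, 0)]`, `immediate_shading_count` returns 4 while `binned_shading_count` returns 2.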

Render Back-Ends Gain Access to L2

The final bit of information we know about the upcoming Vega architecture revolves around the render back-ends, or ROPs. In a legacy architecture like Polaris, memory accesses for pixel and texture data were non-coherent. This means you couldn't render to a texture and then read it back without going out to memory again. That pattern is common in current games, and in particular in VR systems like Oculus that output a final image to a texture that is then modified again by the Oculus runtime. Now the ROPs will be able to access the L2 cache, improving performance for that VR implementation as well as for any game engine that uses deferred shading.
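To see why ROP access to L2 matters for render-to-texture, here is a toy model that counts DRAM transactions for a render pass followed by a pass that samples the same data. On the non-coherent path, ROP output goes straight to memory and every later read misses; with ROPs writing into a write-back L2, reads can hit lines that are still resident. The class, the line-granularity accounting and the LRU policy are illustrative assumptions, not a description of Vega's cache.

```python
from collections import OrderedDict


class ToyL2:
    """A tiny write-back LRU cache that counts DRAM transactions (illustrative only)."""

    def __init__(self, capacity_lines: int) -> None:
        self.capacity = capacity_lines
        self.lines: "OrderedDict[int, bool]" = OrderedDict()  # line -> dirty flag
        self.dram_reads = 0
        self.dram_writes = 0

    def _insert(self, line: int, dirty: bool) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.lines[line] = self.lines[line] or dirty
            return
        if len(self.lines) >= self.capacity:
            _, evicted_dirty = self.lines.popitem(last=False)
            if evicted_dirty:
                self.dram_writes += 1  # write back the evicted dirty line
        self.lines[line] = dirty

    def write(self, line: int) -> None:
        self._insert(line, dirty=True)

    def read(self, line: int) -> None:
        if line not in self.lines:
            self.dram_reads += 1  # miss: fetch from memory
        self._insert(line, dirty=False)

    def flush(self) -> None:
        """Write back every dirty line and empty the cache."""
        self.dram_writes += sum(1 for dirty in self.lines.values() if dirty)
        self.lines.clear()


def render_then_sample(lines: int, rops_use_l2: bool, capacity: int = 1024) -> int:
    """Total DRAM transactions for rendering `lines` lines, then sampling them."""
    l2 = ToyL2(capacity)
    for line in range(lines):
        if rops_use_l2:
            l2.write(line)
        else:
            l2.dram_writes += 1  # non-coherent path: ROP output bypasses L2
    for line in range(lines):
        l2.read(line)
    l2.flush()
    return l2.dram_reads + l2.dram_writes
```

With the defaults, `render_then_sample(256, rops_use_l2=False)` returns 512 and `render_then_sample(256, rops_use_l2=True)` returns 256. Shrinking the cache to `capacity=128` brings the coherent case back up to 512, a reminder that the benefit depends on the render target's working set fitting in L2.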


The beginning of the end…

…of the beginning. Or something to that effect. Today marks the start of official data dumps about Vega 10 and its associated products. If Polaris is our guide, I would expect to see subsequent information releases from AMD that maintain a balanced level of excitement, curiosity and braggadocious behavior.


By far the most interesting information released today is the move to include external memory options for the GPU beyond simply the on-board memory, which is now referred to as the High Bandwidth Cache (HBC). The potential for this kind of memory system is substantial, though I would wager the impact on enthusiast gaming will be minimal out of the gate and for a couple of generations. For professional and enterprise use cases, though, having access to a cohesive memory system that includes HBM2, flash, system memory and network storage could create a massive disruption in the development cycle.
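Calling the on-package HBM2 a cache implies a familiar structure: fast local memory holds the pages currently in use, and a controller pulls others in from a larger, slower pool on demand. The sketch below models that with a least-recently-used page cache. AMD has not detailed the HBCC's page size or replacement policy here, so LRU, the page granularity and the function name are assumptions.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def simulate_page_cache(accesses: Iterable[int], capacity_pages: int) -> Tuple[int, int]:
    """Treat fast local memory as an LRU cache of pages backed by a larger pool.

    Returns (hits, misses); each miss stands for a page migrated in from
    system memory, flash or network storage.
    """
    if capacity_pages < 1:
        raise ValueError("capacity_pages must be at least 1")
    resident: "OrderedDict[int, None]" = OrderedDict()
    hits = misses = 0
    for page in accesses:
        if page in resident:
            hits += 1
            resident.move_to_end(page)  # mark as most recently used
        else:
            misses += 1
            if len(resident) >= capacity_pages:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = None
    return hits, misses
```

Cycling through a 4-page working set ten times, `simulate_page_cache([0, 1, 2, 3] * 10, capacity_pages=8)` returns `(36, 4)`: only the first touch of each page misses. With `capacity_pages=3` the same pattern returns `(0, 40)`, because LRU evicts each page just before it is needed again. That is the practical point of the design: what has to fit in fast memory is the working set, not the whole data set.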


It sounds cliché, but it continues to look like it’s going to be true: 2017 is shaping up to be a substantially transformative year for computing and gaming, and AMD is definitely going to have its say on the matter.


January 5, 2017 | 09:13 AM - Posted by KARMAAA (not verified)

First. It's amazing how much more efficient they made this.

January 5, 2017 | 09:15 AM - Posted by Lucas B. (not verified)

Waiting for more info. Thanks, PCPer, for the update!

January 5, 2017 | 09:18 AM - Posted by RamGuy (not verified)

So all we get is buzzwords and no actual details on any products? So for all we know, Vega could be several months away? And we still have no real clue where the performance is going to be. Will we see something that could push NVIDIA and the Titan XP, or will it simply be a competitor to the GTX 1080, hopefully with lower prices? Who knows??

CES sure has been a huge letdown when it comes to juicy hardware.... And nothing on Zen/Ryzen?

January 5, 2017 | 09:22 AM - Posted by Master Chen (not verified)

>I've waited for almost three years..for THIS...

January 5, 2017 | 09:32 AM - Posted by Searching4Sasquatch (not verified)

They launched Polaris at CES, and gamers didn't get cards for sale until JUNE.

Waiting for AMD fanboys to be amazed when Vega doesn't ship each month until then.

January 5, 2017 | 09:36 AM - Posted by mLocke

"The potential for this kind of memory system is substantial though I would wager the impact on enthusiast gaming will be minimal out of the gate and for a couple of generations"

Yes. All the jigglebits in the verse cannot save you from the nearly non-existent memory utilization in games.

January 5, 2017 | 09:54 AM - Posted by PcShed (not verified)

So AMD, you had the chance, the opportunity to make people believers in you again (especially after the New Horizon event with the Ryzen CPU), and you went and fucked it up. What was the point of having a countdown, getting people hyped up for a GPU, then nothing, nada? Six short videos about the products but no actual product.
Another reason for Nvidia to hike their prices up again. Thanks, AMD.

January 12, 2017 | 03:09 PM - Posted by nvidiaperspective (not verified)

You don't have to buy an Nvidia.

January 5, 2017 | 10:14 AM - Posted by Anonymous (not verified)

All the hype and this?
Smells like last minute decision to respin the silicon.

January 5, 2017 | 10:19 AM - Posted by Anonymous (not verified)

AMD should focus on launching products rather than wasting time talking about them. Zen is overdue and Vega won't matter until there is a Zen platform to run it on. On the upside, Intel flopped once again with another lame product release.

January 5, 2017 | 10:23 AM - Posted by Michael Rand (not verified)

Well this is a let down, I was sort of expecting a bit more from AMD to be honest, all we get are some videos that show a whole lot of nothing. Will Nvidia have Volta out before this or will the 1080 Ti (if it exists) be enough?

January 5, 2017 | 10:40 AM - Posted by Anonymous (not verified)

Nvidia refused to sponsor this piece?

January 5, 2017 | 05:26 PM - Posted by Jeremy Hellstrom

Or it's not part of our CES coverage perhaps?

January 5, 2017 | 10:47 AM - Posted by Tsbod (not verified)

AMD always delivering on the HYPE but not much else!

January 5, 2017 | 12:17 PM - Posted by Anonymous (not verified)

Is that why, even now, my Fury X outperforms a GTX 1070 in Battlefield 1? Because Nvidia can't make a proper DirectX 12 card that doesn't take a performance dump when DX12 is used. Hype delivered, IMO.

January 12, 2017 | 03:14 PM - Posted by nvidiaperspective (not verified)

It's a shame they cost so much. I know they cost a ton to make, and that's really the only place the Fury X lost out; if we could just snap our fingers and turn all games from DX11 to Vulkan or DX12, #amd would be seriously clouting nvidiot

January 5, 2017 | 11:00 AM - Posted by Anonymous (not verified)

I wonder if any of these memory advancements will result in better performance; Fiji's HBM1 did nothing to improve performance. Volta will be waiting in the wings for this card. I hope AMD's big bet pays off.

January 5, 2017 | 11:15 AM - Posted by Anonymous (not verified)

TechPowerUp appears to be saying that "High Bandwidth Memory Cache isn't the same as the HBM2 memory stacks." (1)
But looking at the Vega die shots, I only see the Vega die and two HBM2 stacks. Could this High Bandwidth Memory Cache actually be some eDRAM or other memory on the GPU's die that is managed by the High Bandwidth Cache Controller (HBCC)? Could the cache memory actually be etched into the interposer's silicon itself, with the interposer being an active design (with cache memory etched into it) instead of just a passive design with only traces etched into it?

TechPowerUp is stating:

"It begins with a fast cache memory that sits at a level above the traditional L2 cache, one that is sufficiently large and has extremely low latency. This cache is a separate silicon die that sits on the interposer, the silicon substrate that connects the GPU die to the memory stacks. AMD is calling this the High Bandwidth Memory Cache (HBMC). The GPU's conventional memory controllers won't interface with this cache since a dedicated High Bandwidth Cache Controller (HBCC) on the main GPU die handles it. High Bandwidth Memory Cache isn't the same as the HBM2 memory stacks. " (1)


"AMD Radeon Vega GPU Architecture" [See page 2 of the article]

January 5, 2017 | 11:26 AM - Posted by Anonymous (not verified)

I had expected Vega to essentially be a larger design, but otherwise very similar to Polaris. I guess it is going to be a much more massive redesign. Not surprising that it isn't available yet. It is unclear what off-package links are going to be available. It will be interesting to have X-point connected to such a device. The low latency and byte addressability could make it look like you have huge amounts of memory directly attached to the GPU for HPC. I don't really know what the current state of these systems is. I know they were adding virtual memory type systems to GPUs quite a while ago to swap out to system memory, but I don't know how much that is being utilized.

January 5, 2017 | 11:52 AM - Posted by Anonymous (not verified)

Intel's brand of XPoint (Optane) is nowhere near as fast as Intel's marketing claimed, so not much more performance can currently be had relative to SLC NAND with proper latency hiding by a CPU/GPU processor's cache/memory subsystems. I do see the need for a GPU to have some NVM, made up of at least an SSD with 32GB of XPoint and the rest SLC NAND, for that on-GPU/PCIe-card direct-SSD Radeon SKU that AMD is making. Micron will have its own QuantX brand of XPoint, so at least there will be competition to provide for any of AMD's XPoint needs.

Really, I'd like to see JEDEC, AMD/Nvidia and their associated HBM2 memory partners try to get an NVM/XPoint addition into the JEDEC HBM/HBM2 standard, so that an XPoint NVM die is added to the HBM/HBM2 die stack and some NVM/XPoint memory sits right there on the HBM2/newer HBM# stacks. That would be great for graphics and for large textures stored on the HBM# stacks for gaming and other graphics workloads, and even for compute workloads. XPoint durability is going to have to be very high for it to be used on the HBM stacks and last for the life of the device, so XPoint will have to be in use for a while before that question can be answered!

January 5, 2017 | 11:56 AM - Posted by notwalle (not verified)

Hi, will it have full bandwidth accessibility over Thunderbolt 3 using x8 PCIe 3?

January 5, 2017 | 12:05 PM - Posted by notwalle (not verified)

vega would easily handle 4k open world gaming in monitor or VR with fast loading times

January 5, 2017 | 12:09 PM - Posted by notwalle (not verified)

vega would easily handle 4k open world gaming in monitor or VR with fast loading times.
I had really fast loading times in ES: Skyrim with two RX 390s. Thanks.

AMD is getting attacked by Intel and Nvidia, but they have a generalist's hand because they make GPUs, APUs and CPUs, so they can innovate again as they did with AMD64, next-gen consoles and Solid State Computing, that is, direct GPU-CPU communication. Of course, they've got to talk to motherboard manufacturers to make AM4 like that.

January 5, 2017 | 03:39 PM - Posted by Anonymous (not verified)

If, and it's a big if, the rumor of Intel licensing AMD tech for their iGPU comes true, then HSA will get a huge push.

January 5, 2017 | 04:42 PM - Posted by Anonymous (not verified)

That will never happen, Intel will not be getting any bleeding edge GPU IP from AMD! It will more than likely be Intel licensing the same OLD Very Basic GPU IP from AMD that Intel used to get from Nvidia, as AMD and Nvidia control options for some of the same types of basic GPU IP that Intel needs to keep licensing from either Nvidia or AMD to keep from getting sued!

There is a large pool of FRAND types of IP that Nvidia and AMD both have the rights to, which Intel needs to license in order to stay legal with Intel's GPU designs! So Intel can get that from either Nvidia or AMD, but that DOES NOT include any of Nvidia's or AMD's bleeding-edge IP of the last 5 or 10 years!

January 5, 2017 | 12:11 PM - Posted by Anonymous (not verified)

Ryan, are you actually serious with this? This article is full to the brim with horrendous grammatical errors, like this one:
"Why the move to calling it a cache will be covered below"
Or this word salad:
"Fundamental changes had to be made to how the GPU handles data flow, scheduling and directly impacted the architecture of the chip and data paths through it"
This is JUST from the first page. The second page gets worse. I actually found this so cringe-worthy that I could not continue reading it. Where is your editor?? This article needs some extensive corrections. You also have the same issue with your QC SD 835 article. I know that there is a big drive in the media to publish first, but an article in this state should never have been published.

January 5, 2017 | 12:22 PM - Posted by CNote

"With Vega GPU architecture AMD is aiming to reinvent and the geometry pipeline."
I hate to be the grammar police but you added an and.

January 5, 2017 | 02:05 PM - Posted by Master Chen (not verified)

All things eventually have to come to an end.

January 5, 2017 | 03:51 PM - Posted by Anonymous (not verified)

Where is the Nvidia banner? They should be happy to pay all your expenses for you to make AMD news... When you sold your soul to the devil... Don't worry, you will get your free 1080 Ti.... How can you even accept this in the first place? Does the word "independent" mean anything to you?

January 5, 2017 | 05:22 PM - Posted by Jeremy Hellstrom

here you go!

Coverage of CES 2017 is brought to you by NVIDIA!

PC Perspective's CES 2017 coverage is sponsored by NVIDIA.

Follow all of our coverage of the show at!

January 6, 2017 | 05:27 AM - Posted by Master Chen (not verified)

>Red out

January 5, 2017 | 08:43 PM - Posted by Anonymous (not verified)

Nice analysis, thank you =)
Vega is looking pretty hot!

January 6, 2017 | 04:55 AM - Posted by Dark (not verified)

Great coverage. What people don't seem to realize is that it was silly to expect a card launch this early; it would have given Nvidia more time to respond. This IS news. The architectural improvements look great. I'm not a tech expert, but it looks like AMD put a lot of effort into improving their architecture, and that will benefit them in the future as they try to future-proof their cards.

January 6, 2017 | 09:03 AM - Posted by Paul-Sebastian (not verified)

It seems like this is the natural progression from middleware texture streaming technology like GRANITE from Graphine Software, which allows streaming very high quality textures into the GPU from various storage devices, using highly optimised algorithms for those storage backends and for figuring out what is visible on the screen and what the user is mostly looking at. What it primarily builds up to is better VR quality and experience in the future. I would expect the end of 2017 and the start of 2018 to be the coming boom of the VR/AR industry.

January 6, 2017 | 01:37 PM - Posted by YTech

Nice coverage!

All of this reminds me of what nVidia did to their 10 Series GTX cards. Except AMD has added features that nVidia hasn't, yet.

Interesting nonetheless :)

January 10, 2017 | 11:09 PM - Posted by Drbaltazar (not verified)

GDDR5X vs HBM2: one is VRAM, the other is classic desktop RAM but stacked. Temperature on HBM2 is gonna be a bitch; with GDDR5X I suspect there won't be any issue. On paper it should be a bomb, but has anyone ever researched HBM? They should. I'll give you an example: I have a 6-pin and an 8-pin on my GPU, and I had to cut a wire on the 6-pin (as per the standard). It's all well and good to strut, but if your allies don't follow the standard, it's all for nothing. You get overheating issues and you search until you find that the maker didn't follow the standard. So I'll wait before I cheer. AMD has had a ton of issues in the past respecting standards and making its partner manufacturers respect them.

January 14, 2017 | 04:16 PM - Posted by Sasquatch

I'm just wondering with all this memory stuff - how long til developers are coding for it, or will the drivers have to handle all this overhead?
Reminds me of FX chips. Those could have been great if the industry started coding for raw cores. Instead they kept to the known way & AMD was left on the roadside with new tech that no one was fully utilizing.
Sadly, I keep seeing a future where AMD is gone as a company before their tech is being utilized to its fullest.
Or am I wrong on all this?

January 15, 2017 | 07:08 AM - Posted by Anonymous (not verified)

The high bandwidth cache controller should not require developers to explicitly code for it at all. It should be completely transparent.

May 18, 2017 | 09:37 AM - Posted by msroadkill612

Belatedly, a good article, Ryan.

I agree the local SSD RAID 0 as ~unlimited VRAM is exciting and worth dwelling on. Few others have.

If, as you say, 16 PCIe 3 lanes are available for the interconnect with the GPU, and given that even now fairly new single SSDs push the boundaries of 4 lanes with speeds of ~3.5GB/s, then incomprehensibly fast RAID storage is possible as a vast, virtual VRAM memory pool.

It's slower, but it avoids the fetters of the PC bus via its direct link to the GPU.

It's a new deal for coders.
