Next Gen Graphics and Process Migration: 20 nm and Beyond

Author: Josh Walrath
Subject: Editorial

Hitting the Wall Early and Often!

We as consumers have taken process advancements for granted.  Moore’s Law, as it is popularly stated, says that transistor densities will double roughly every 18 to 24 months, and that has held true for a long time.  Companies like NVIDIA set an aggressive pace in terms of new products and refreshes that would often span around 14 to 16 months from start to finish.  So here we are some 22 months from the introduction of the HD 7970 and we see this same part refreshed as the R9 280X.  During that time we saw some clock speed improvements as TSMC’s 28 nm process matured, but the basic performance of the chip is essentially unchanged.  Some 14 months after the release of the first GK104 parts from NVIDIA, they too refreshed those exact same chips with the GTX 700 series.  Again, we saw a small bump in performance due to higher clock speeds, but there is no true next-generation part waiting in the wings.

What exactly has happened that has slowed the pace of advancement for graphics?  There are two major factors seemingly at play: the rise of mobile computing and the chips that are powering this revolution, and the extreme slowdown of process migration as compared to historical trends.

I am still not entirely sure what voodoo ATI used at the time to get the basic R300 design to run on TSMC's 150 nm process as effectively as they did.  The 9800 XT at 415 MHz is just sorta crazy, and it didn't break the bank when it came to TDPs.  The original R300 was a singular moment in GPU history.

Mobile computing is perhaps the most tenuous reason, but it does make some sense on several fronts.  Both AMD and NVIDIA have mobile graphics groups which take away design resources from their larger projects.  While the modern Kepler and GCN architectures are able to scale from fairly low to really high in terms of TDP, they are not entirely effective in the half watt space where smartphones primarily sit.  NVIDIA has a totally different architecture for graphics in Tegra as compared to the desktop.  AMD does not currently have a graphics architecture that fits in that ultra-low TDP range; instead they utilize GCN for products that are 4 watts and up.  NVIDIA is planning on opening up Kepler to those areas, but they are not there yet.

Mobile computing is also a growth area for these companies as compared to desktop and laptop graphics.  R&D resources now have to be spread out to the different groups and they have to have competitive products, otherwise the company will not be able to cash in on those growth numbers that we have been seeing for the past several years.  After mobile chips have been developed, further resources get split off for software and hardware support so these products can be integrated effectively into a 3rd party product.  This is all money shifted away from desktop graphics.  Remember, desktop graphics is actually a shrinking market due to the effective integration of graphics not just in the mobile space, but also with higher powered CPUs/APUs from Intel and AMD.

Finally with mobile computing, we are seeing a lot more pressure on advanced process lines in terms of wafer buys.  These ARM based chips are thriving at the 32 nm and 28 nm nodes.  The vast majority of users are quite pleased by the performance of these products across different workloads, and they have excellent power characteristics.  These are relatively small chips, so quite a few of them can be fit onto a wafer.  The problem here is the economics.  Margins are thin on these chips, and so the companies making the orders are probably much more aggressive in pursuing contracts, and leveraging different pure-play foundries against each other (TSMC, UMC, and GLOBALFOUNDRIES).  Samsung then throws another wrench into the mix by not just fabricating their own parts (Exynos), but also selling fab space to their competition in the form of Apple.  If these companies can in fact effectively negotiate lower priced wafers with promises of filling up the lines with orders, then companies such as TSMC will make less money per wafer as compared to more complex products like GPUs.  Less money is less R&D for advanced process features, and this behavior also maximizes the already spent R&D investment on the current process.  The end result here is less money being allocated towards advanced process development, so these advanced nodes will take longer to develop.
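
To put rough numbers on how many small chips fit onto a wafer, here is a minimal sketch using a common first-order gross-die-per-wafer approximation.  The 300 mm wafer diameter and the two die areas (an 80 mm² mobile SoC and a 350 mm² GPU) are illustrative assumptions, not figures from any foundry or product, and the estimate ignores yield, scribe lines, and edge exclusion.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order estimate: wafer area / die area, minus an edge-loss term.

    This is a textbook approximation, not a foundry calculation.
    """
    radius = wafer_diameter_mm / 2.0
    wafer_area = math.pi * radius ** 2
    # Partial dies along the circumference are lost; larger dies lose more.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


# Hypothetical inputs, chosen only for illustration.
WAFER_MM = 300.0
MOBILE_SOC_MM2 = 80.0
GPU_MM2 = 350.0

mobile_dies = gross_dies_per_wafer(WAFER_MM, MOBILE_SOC_MM2)
gpu_dies = gross_dies_per_wafer(WAFER_MM, GPU_MM2)

print(f"Mobile SoC candidates per wafer: {mobile_dies}")
print(f"GPU candidates per wafer:        {gpu_dies}")
print(f"Ratio: {mobile_dies / gpu_dies:.1f}x")
```

The small die gets several times as many candidates out of each wafer, which is why a customer with thin per-chip margins cares so much about the price of the wafer itself.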

The accountants at the foundries have some very complex equations to maximize manufacturing and minimize expenses.  The risk of falling behind is always there, but these foundries are used to being a process node behind the industry leader (Intel) and still being able to pull good profits.  These foundries also get a significant cost break by adopting technology well after Intel has done the lion’s share of work (think optics, lithography, wafer handling, deposition, etc.) and monetary investment.  Their motivation is to stay close, but not risk the bleeding edge.  This is the opposite of what AMD did when they owned their own fabs, as their primary product competed directly with Intel.  Now that GLOBALFOUNDRIES is on its own, it has slowed down its pace of next generation process technology introduction, much to AMD’s chagrin.

Mobile computing has been a steady stream of income for the foundries as more and more products require advanced chips to power them.  Again, maximizing the investment in a current process line makes the company more money and leverages the expenses much more effectively than trying to jump to the next node as soon as possible.

This leads us into the slowdown of process technology that we are seeing.  While previous process nodes have had their issues (130 nm had void migration, the jump to copper interconnects was not without problems, etc.) it seems like the current 28 nm HKMG node was perhaps the last “easy” jump that the foundry industry will see.  This is not to say that 28 nm HKMG was easy, but the obstacles in the way towards 22/20 nm are pretty tremendous.  Intel was able to get to 22 nm over a year and a half ago with very good results.  This came about because of the billions that Intel invested in their fabrication technology.  They are the first to have implemented Tri-Gate in mass produced parts.  This was not an inexpensive endeavor in terms of money and man-hours.  Now, the reason why Intel went with the Tri-Gate technology was not about beating its chest and proclaiming that they had the most advanced process available; the reason was that they had no real choice in the matter if they were going to produce high performance CPUs that would scale power effectively with clock speed.

Intel spent billions to get 22 nm Tri-Gate up and running.  They are reaping the benefits of this technology each and every quarter that the rest of the industry lags behind.

22/20 nm processes can pack the transistors in.  Such a process utilizing planar transistors will have some issues right off the bat.  This is very general, but essentially the power curve increases very dramatically with clockspeed.  For example, if we were to compare transistor performance from 28 nm HKMG to a 20 nm HKMG product, the 20 nm might in fact be less power efficient per clock per transistor.  So while the designer can certainly pack more transistors into the same area, there could be some very negative effects from implementing that into a design.  For example, if a designer wants to create a chip with the same functionality as the old, but increase the number of die per wafer, then they can do that with the smaller process.  This may not be performance optimized though.  If the designer then specifies that the chips have to run as fast as the older, larger versions, then they run a pretty hefty risk of the chip pulling just as much power (if not more) and producing more heat per mm squared than the previous model.
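
The steep power curve described above can be illustrated with the standard first-order model for CMOS switching power, P ≈ C·V²·f.  The sketch below assumes that supply voltage has to rise linearly with the target clock; the capacitance, base voltage, and voltage slope are made-up illustrative values, not measurements of any real 28 nm or 20 nm process, and leakage is ignored entirely.

```python
def dynamic_power_watts(
    freq_ghz: float,
    cap_nf: float = 1.0,      # assumed switched capacitance (hypothetical)
    v_base: float = 0.9,      # assumed voltage needed at base_ghz (hypothetical)
    base_ghz: float = 1.0,    # clock at which v_base is sufficient (hypothetical)
    v_per_ghz: float = 0.15,  # assumed extra volts per extra GHz (hypothetical)
) -> float:
    """Switching power from P = C * V^2 * f, with V rising with clock speed."""
    volts = v_base + v_per_ghz * max(0.0, freq_ghz - base_ghz)
    return (cap_nf * 1e-9) * volts ** 2 * (freq_ghz * 1e9)


p_1ghz = dynamic_power_watts(1.0)
p_2ghz = dynamic_power_watts(2.0)

for ghz in (1.0, 1.5, 2.0):
    print(f"{ghz:.1f} GHz -> {dynamic_power_watts(ghz):.2f} W")

# Doubling the clock more than doubles power because voltage rises too.
print(f"2x clock costs {p_2ghz / p_1ghz:.2f}x power")
```

Because voltage enters squared, any process that needs extra voltage to hit the old clock speeds pays for it disproportionately in power and heat, which is the risk the designer is weighing.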

Intel got around this particular issue by utilizing Tri-Gates.  This technology allowed the scaling of performance and power that we are accustomed to with process shrinks.  This technology has worked out very well for Intel, but it is not perfect.  As we have seen with Ivy Bridge and Haswell, these products do not scale in speed as well as the older, larger 32 nm Sandy Bridge processors.  Both of the 22 nm architectures start pulling in more power than the previous generation when clockspeeds go past 4.0 GHz.  Having said that, the Intel 22 nm Tri-Gate process is exceptionally power efficient at lower clockspeeds.  The slower the transistors switch, the more efficient they are.  These characteristics are very favorable to Intel when approaching the mobile sector.  This is certainly an area that Intel hopes to clean up in.  This is the area that is finally scaring all the other 3rd party SOC designers (Qualcomm, Samsung, NVIDIA, etc.) and potentially putting more pressure on the pure-play foundries to get it together.

October 22, 2013 | 06:12 PM - Posted by snook

Thanks Josh. Let's hope there is that one guy who says "how about trying this?", and he changes everything.

October 22, 2013 | 09:50 PM - Posted by Josh Walrath

Make no mistake, there is a lot of research in a LOT of different areas to overcome the issues that the industry is running into.  The challenges have always been there (breaking the 1 micron barrier was seemingly huge), but now the challenges are just bigger, more complex, and more expensive.

October 25, 2013 | 11:56 AM - Posted by Anonymous (not verified)

Or maybe one of the foundries will have a happy accident Bob Ross style.

They'll come in one morning to find their equipment had slipped around a new nm during the night and everything is a little out of whack. They're about to toss out the batch when someone grabs a wafer for the fun of it and runs a test and BAM! breakthrough!

Guy can dream, can't he.

October 22, 2013 | 06:26 PM - Posted by Evo01

Thanks Josh. Fantastic article.

October 22, 2013 | 07:46 PM - Posted by Fishbait

Very awesome and informative article Josh. What implications could this have with Moore’s law? Does this effectively stop it before the theoretical quantum limit in 2036? These will be an interesting few years for pure-play foundries and their clients indeed.

October 22, 2013 | 09:52 PM - Posted by Josh Walrath

Well, things will be necessarily slowing down.  There simply are hurdles that need lots of time and lots of money to solve.  10 nm shouldn't be that bad, 7 nm is hitting some interesting limits, and sub 7 nm is going to be really rough.  Litho, materials, and electrical characteristics at that size will be sorta crazy.

October 22, 2013 | 08:26 PM - Posted by Anonymous (not verified)

Just to add a few points to this excellent article:

- The 14nm/16nm nodes for GloFo and TSMC, respectively, are going to be utilizing a 20nm back-end-of-line. This means that while density won't increase, they'll improve power characteristics (these are the two FinFET nodes).

- The time-to-market for the above two nodes from both foundries should be more painless than if they were to attempt a shrink + FinFETs. As a result, if I were to guess I'd say we see the 14nm/16nm nodes a bit earlier than some had anticipated. Early tape-outs for 14nm and 20nm have been close together so that definitely adds some credence to that line of thought. Though not certainty ;P

- These node names (e.g., 14nm) don't actually accurately describe the half-pitch. Unless I'm recalling incorrectly, the current tools would only allow something like 18nm(?). Intel's current 22nm FinFETs have been described in papers as 26nm. Whether that's true or not, I have no idea, but the point is that the half-pitch is only a single detail in a long list of attributes that defines a new "node." The takeaway is that you shouldn't get too caught up in the XXnm numbers and remember that it's the power, leakage, density, and performance of the node that actually matter.

October 22, 2013 | 08:43 PM - Posted by Josh Walrath

Thanks for the comments.

About the node names... Intel's 22 nm describes the smallest feature, but you are correct in that a certain other feature (I think it has to do with SRAM) is 26 nm.  There was some thought that AMD with GF's 28 nm would be able to get fairly close to the transistor density of Intel's 22 nm in certain aspects due to this size variance.

October 22, 2013 | 09:29 PM - Posted by Krewell (not verified)

Nice summary Josh. As the person above noted, the node numbers are not strictly related to feature size (e.g. TSMC 16nm is FinFET transistors on 20nm backend).

Nvidia likes to talk about how GPUs have better than Moore's Law scaling, but with die sizes already at 550mm2 (GK110), that will not be true going forward - die sizes are already close to the limits of fab reticles (~600mm2).

I just had this same conversation with AMD's Raja Koduri. Raja's response is that it will take new architectures to improve performance, not just process shrinks and die area growth. It's going to take improvements in architecture efficiency and effectiveness. It also means that the GPU designers need to work closer with game engine developers to find efficiency improvements - Mantle is one example.

October 22, 2013 | 09:45 PM - Posted by Josh Walrath

If you have some spare minutes, you should read that old article I linked.  Some interesting stuff there (considering it was written in 2004 and issues at 130 nm were just being solved).

Thanks for reading!  The next few years are going to be very interesting considering the challenges ahead!

October 23, 2013 | 01:32 AM - Posted by Fiberton (not verified)

The reason I found it interesting is that last year AMD replaced the CPU architects who were the creators of the Athlon. We all really need AMD to do well to drive pricing down for everyone and to also push technology forward.

October 23, 2013 | 02:03 AM - Posted by Fiberton (not verified)

I wish I could edit :) They replaced the Bulldozer architect. They have hired people who worked on the Athlon projects. I am sorry :).

October 23, 2013 | 01:09 PM - Posted by Josh Walrath

Yeah, some of the old guys came back.  Jim Keller is the big name.  Raja Koduri on the GPU side is back.  There is a lot of uplift in what they are trying to do, and I think overall they are heading in the right direction.  I like Dirk Meyer, but while he was a great CPU architect, he almost missed the major mobile transition that his product stack would not be able to address.

October 22, 2013 | 09:32 PM - Posted by Whayne (not verified)

Wonderful and informative post Josh. As a theoretical physicist with some background in solid state physics I've been aware of a few of the issues facing the industry especially the lithography. I cannot even begin to imagine how hard it's going to be to get 7-5nm process nodes operational. I expect quantum effects to come in earlier maybe even 10nm will be very tough. Quantum tunnelling will no doubt be a huge issue when line traces are so small.

Interesting times ahead. It looks like either an R9 290x or a GTX 780 Ti will be my friend until well into 2015, but that's okay, as they are still going to be pretty darn good cards.

October 22, 2013 | 09:42 PM - Posted by Josh Walrath

I find it interesting that pretty much the entire industry is heavily invested in EUV... and from what I understand, the risk is still very high that it will not work out at all.

October 24, 2013 | 02:30 PM - Posted by Frenchy2k1 (not verified)

The industry has been working on EUV for over 10 years already and still seems quite far from its target (which has been moving during that time too).

Those are interesting times indeed at the process level.

The question you have not broached is about the economics of it. We have been seeing a lot of consolidation in the semiconductor industry for the past ten years and it is accelerating. Each process node costs exponentially more than the last one and THIS is the reason for pure play foundries: few companies can afford their own fabs anymore. Intel is of course the exception, but even they are starting to open their fabs to other companies (which nobody would have expected just a few years ago). That means that even Intel has too much capacity and cannot fill its fabs anymore.

Semiconductor manufacturing is so far the pinnacle of human ingenuity, taking so much effort from so many people to keep on track and follow Moore's Law. All those people are hard at work on EUV and backup plans (multiple patterning: they are doing dual, but 3 or 4 are definitely possible; 3D transistors are also coming, first in Flash at Samsung). We have not yet seen the end of semiconductor growth.

October 22, 2013 | 10:12 PM - Posted by Anonymous (not verified)

Looks like someone got influenced by their trip to Montreal.

October 22, 2013 | 10:17 PM - Posted by Josh Walrath

Heh, I didn't go to Montreal with Ryan and Ken.  Oddly enough, I started researching and writing this before that event.  I was sorta cranky when Carmack started talking about this subject... day late and a dollar short for me (or rather many millions of dollars short).

October 23, 2013 | 01:38 AM - Posted by Fiberton (not verified)

Your article really reminds me of the past, seeing the names of all the cards and players in the market. There was so much more excitement back then. Thanks for the read 0/

October 22, 2013 | 11:18 PM - Posted by Jerald Tapalla (not verified)

This is the first time I have just sat and read a long article with my focus only on it. Very nice article, very informative, especially for me as someone who is new to this kind of stuff. Thanks for this.

October 22, 2013 | 11:38 PM - Posted by Josh Walrath

Thanks for reading the entire thing!  Ryan will thank you as well!

October 22, 2013 | 11:57 PM - Posted by Onehourleft

Great writing Josh. Also it was nice to visit your archives for the first time. I enjoyed both articles.

October 23, 2013 | 01:29 AM - Posted by MeezyATL (not verified)

Some great articles from the PC Per staff this week. Keep up the good work.

October 23, 2013 | 05:06 AM - Posted by thinkbiggar (not verified)

Can you run that thing in SLI? Also did it cause you to lose all your hair?
I bet the prices stayed the same but not with inflation. Don't tell marketing people about inflation. Once they learn about it we are all screwed.

October 23, 2013 | 01:11 PM - Posted by Josh Walrath

I lost my hair because I got married and had kids.

October 23, 2013 | 07:01 AM - Posted by WantT100 (not verified)

This is a stunning article, this is why I visit Pcper every day. Ryan give the man a bonus!

This article is up there with Scott's "The Windows You Love is Gone" stunner a year ago.

October 23, 2013 | 08:13 AM - Posted by Anonymous (not verified)

These kinds of delays are to be expected. The smaller you go, the individual effects become that much more pronounced. Instead of treating the design as a whole or in smaller but relatively large units, more research needs to be done in examining each and every change occurring within the system. Very time consuming and expensive. I will not be surprised if there are further delays. The break-neck speed of development had to come to an end some time.

I am not disappointed about the delays; they were very much expected. Can't keep throwing money at the problem and expect it to pay dividends immediately. My 2 cents.

October 23, 2013 | 12:46 PM - Posted by Josh Walrath

Yup, you are likely correct.  What we often don't hear about is how closely the fab engineers work with the designers.  The amount of back and forth work and information they do is pretty staggering, especially with these next generation nodes.  This simply isn't a "we are finished with the design, send it to the Fab guys and they can figure out the rest!" situation anymore.

We are also seeing the pure-play guys working to amortize their investments in current process nodes... because the next gen stuff is so expensive.  Gotta pay those bills.  They only hope that Intel will slow down, cause those guys don't clear $3 billion a quarter like Intel does.

October 23, 2013 | 10:25 AM - Posted by blitzio

Amazing article Josh, thank you for an informative read. I feel thoroughly educated.

October 23, 2013 | 10:29 AM - Posted by Max KreFey

Great article Josh! Thank you from cold mother Russia! :D
