Next Gen Graphics and Process Migration: 20 nm and Beyond

The Really Good Times are Over

We really do not realize how good we had it.  Sure, we could apply that to budget surpluses and the time before the rise of global terrorism, but in this case I am talking about the predictable advancement of graphics due to both design expertise and improvements in process technology.  Moore's law has been exceptionally kind to graphics.  When we look back and plot the course of these graphics companies, they have actually outstripped Moore's law in terms of transistor density from generation to generation.  Most of this is due to better tools and the expertise gained in what is still a fairly new endeavor compared to CPUs (the first true 3D accelerators were released in the 1993/94 timeframe).

The complexity of a modern 3D chip is truly mind-boggling.  To get a good idea of where we came from, we must look back at the first generations of products that we could actually purchase.  The original 3Dfx Voodoo Graphics consisted of a raster chip and a texture chip, each containing approximately 1 million transistors (give or take) and fabricated on a then-available .5 micron process (we shall call it 500 nm from here on out to give a sense of perspective with modern process technology).  The chips were clocked between 47 and 50 MHz (though they could often be pushed to 57 MHz by going into the init file and putting in “SET SST_GRXCLK=57”… by the way, SST stood for Sellers/Smith/Tarolli, the founders of 3Dfx).  This revolutionary card could push out 47 to 50 megapixels per second, carried 4 MB of VRAM, and was released at the beginning of 1996.

My first 3D graphics card was the Orchid Righteous 3D.  Voodoo Graphics was really the first successful consumer 3D graphics card.  Yes, there were others before it, but Voodoo Graphics had the largest impact of them all.

In 1998 3Dfx released the Voodoo 2, and it was a significant jump in complexity from the original.  These chips were fabricated on a 350 nm process.  There were three chips on each card: one raster chip and two texture chips.  At the top end of the product stack were the 12 MB cards.  The raster chip had 4 MB of VRAM available to it, while each texture chip had 4 MB of VRAM for texture storage.  Not only did this product double the performance of the Voodoo Graphics, it was able to run in single card configurations at 800x600 (as compared to the max 640x480 of the Voodoo Graphics).  This was around the same time that NVIDIA started to become a very aggressive competitor with the Riva TNT and ATI was about to ship the Rage 128.

Process technology at this time improved in leaps and bounds.  Intel was always at or near the lead, with others like IBM and Motorola keeping pace.  TSMC was the first pure-play foundry, selling line space to third parties, and others such as Chartered and UMC were competitive across all of their lines.  TSMC has traditionally been the go-to foundry for the graphics industry, but around this time UMC was a close second.  Within a year and a half of the introduction of the Voodoo 2 and TNT class of graphics adapters, TSMC was offering 250 nm lines for willing customers.  NVIDIA was one of the first with the TNT2 products, followed closely by 3dfx and the Voodoo 3.  ATI was a little bit behind with the Rage 128 Pro, but they were making progress in keeping up.

Right after this we were introduced to the half-step for process nodes.  TSMC released their 220 nm process for production and NVIDIA jumped on board with the original GeForce 256.  We did not see the big jump in power and die size benefits that a full process node can give, but it did provide a quick transition for designers going to the next advanced node.  Moving along, we see the introduction of the 180 nm node and the GeForce 2 class of products.  The GeForce 2 GTS was a 25 million transistor chip running at 200 MHz.  Go back to the 2 million transistor Voodoo Graphics and we see that the GeForce 2 GTS is 12.5x more complex and runs at four times the speed, with only four years separating the two designs.
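
Those multiples are easy to sanity-check.  A quick back-of-the-envelope sketch (assuming only the approximate figures quoted above) shows that the implied transistor doubling time over those four years handily beats the 18 to 24 months usually credited to Moore's law:

```python
import math

# Approximate figures quoted above; the ~2 million transistors for
# Voodoo Graphics combines its raster and texture chips.
voodoo_transistors, voodoo_mhz, voodoo_year = 2e6, 50, 1996
gts_transistors, gts_mhz, gts_year = 25e6, 200, 2000

complexity = gts_transistors / voodoo_transistors            # 12.5x
speed = gts_mhz / voodoo_mhz                                 # 4x

# Years per doubling of transistor count over the span
doubling = (gts_year - voodoo_year) / math.log2(complexity)

print(f"{complexity:.1f}x the transistors at {speed:.0f}x the clock")
print(f"implied doubling time: {doubling:.2f} years")        # ~1.1 years
```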

The NVIDIA Riva TNT was the first serious competitor for 3Dfx's lineup of cards, including the then-new Voodoo 2.

The pace did not slow down there.  Next up was the 150 nm half node from TSMC and the GeForce 3 series.  This chip was a monster for its time, one of the first consumer-level products with a transistor count of around 57 million.  The GeForce 4, which was released a year after the GeForce 3 and still used the 150 nm process, bumped that count up to around 67 million.  Then came the monster from ATI.  The R300, which powered the Radeon 9700 Pro, was an astonishing 107 million transistors on the same 150 nm process.  In the two years between 2000 and 2002 we see another quadrupling of transistor counts across two process nodes (and a half node at that) and another 100 to 150 MHz of speed for a complex GPU.

Around 2004 things started to slow down a bit, but that is a relative term compared to the first eight years of 3D graphics.  I had written an article at my old site that covered what I expected to be a problem in the years that followed.  “Slowing Down the Process Migration” discussed the inevitable slowing of process node transitions due to issues in materials, design strategies, and plain old physics.  Little did I know that some of the major issues that plagued the 130 nm jump (migrating voids, design rule changes midstream, etc.) would be solved, and we again returned to a very regular cadence of process improvements.  130 nm led to 110, 90, 80, 65, 55, 45, 40, 32, and now 28 nm.  Graphics products did not inhabit every node, but they hit all of the major ones (45 and 32 nm were absent from most graphics platforms).

So where are we now?  In 2003 the top end product was the Radeon 9800 XT, running at 412 MHz and comprising 117 million transistors on TSMC’s highly optimized 150 nm process.  Today we are looking at the GTX TITAN based on the NVIDIA GK110 processor, which weighs in at 7 billion transistors and around 850 MHz.  That represents twice the raw clockspeed and a roughly 60x jump in transistor count in the span of ten years.  It is absolutely no wonder that we are spoiled by the constant stream of new products that advance the state of the art on a yearly basis, with a major process node improvement every 18 months or so.
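
For perspective, the same doubling-time arithmetic can be run across every milestone quoted in this piece (again, a rough sketch using only the approximate transistor counts given above).  Even the "slower" last decade works out to a doubling roughly every 1.7 years, right in line with that 18 month cadence:

```python
import math

# (name, year, transistor count) milestones as quoted in this editorial
milestones = [
    ("Voodoo Graphics",   1996, 2e6),
    ("GeForce 2 GTS",     2000, 25e6),
    ("Radeon 9800 XT",    2003, 117e6),
    ("GTX TITAN (GK110)", 2013, 7e9),
]

# Implied transistor doubling time between each pair of milestones
for (n0, y0, t0), (n1, y1, t1) in zip(milestones, milestones[1:]):
    growth = t1 / t0
    doubling = (y1 - y0) / math.log2(growth)
    print(f"{n0} -> {n1}: {growth:.1f}x in {y1 - y0} years "
          f"(doubling every {doubling:.1f} years)")
```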

With this highly aggressive pace from year to year, why are we in name-only refresh-land for graphics right now?  I am starting to see a lot of commenters discussing their displeasure at both NVIDIA and AMD for their lack of a true, next-generation GPU.  The GK104 that originally powered the GTX 680 has morphed into a variety of products, including the GTX 770 and GTX 760.  The GTX TITAN based on GK110 was released earlier this year and has been repurposed for the GTX 780.  AMD refreshed their lineups with last year’s Tahiti and Pitcairn chips, and the top-end Hawaii chip (R9 290X) only reaches the complexity of last year’s GK110.  These parts are all based on TSMC’s 28 nm process.  Where exactly are the new chips, and why aren’t we at 20 nm yet?

October 23, 2013 | 11:12 AM - Posted by Josh Walrath

Thanks!  I woke up this morning to -3C weather!  Happy cold days to you as well!

October 23, 2013 | 11:31 AM - Posted by jgsieve (not verified)

Wow Josh, great article, I'm with you on this.  You really have a passion for this.

October 23, 2013 | 01:14 PM - Posted by BigMack70 (not verified)

Great article!

I do think you got a bit speculative on the impact of mobile chips and on the supposed decline of the desktop graphics market... there has been some research recently showing that the desktop graphics segment is actually healthy and growing.

I think this needs substantiation and cannot be assumed:
"Remember, desktop graphics is actually a shrinking market due to the effective integration of graphics not just in the mobile space, but also with higher powered CPUs/APUs from Intel and AMD."

October 23, 2013 | 01:33 PM - Posted by Josh Walrath

Desktop graphics are not growing, they are shrinking.  But they are not shrinking that much.  Intel and AMD have such good integrated graphics these days that a large portion of the people who would previously have been bundled with a low end card are now just using integrated.

The sky is not falling on discrete graphics though, it just is not growing anymore.  Mobile IS growing, and that is where a lot of the R&D is going.

October 23, 2013 | 04:07 PM - Posted by BigMack70 (not verified)

I agree that a lot of R&D is going into mobile. However, things like this:
http://www.techpowerup.com/188572/global-pc-gaming-hardware-sales-shrug-...

suggest that there is growth occurring in the discrete graphics segment.

That's why I said that there needs to be some substantiation of the idea that mobile + integrated GPUs are detrimental to discrete GPU growth.

October 23, 2013 | 04:10 PM - Posted by Josh Walrath

Discrete isn't growing though.  Take a look at some of the J Peddie numbers over the past few years.  Sure, gaming systems are not being affected by the PC slowdown, but there are fewer shipments now than there were 3 years ago for discrete graphics.  It isn't plummeting, and it is a healthy market, but it just isn't growing.  All of the growth is mobile right now.

October 23, 2013 | 03:21 PM - Posted by Roger Garcia (not verified)

HAHA I still have my Voodoo 2s and SLI cable!

October 23, 2013 | 06:27 PM - Posted by derz

The man is a repository of knowledge. Ryan is truly lucky.

October 23, 2013 | 06:29 PM - Posted by Josh Walrath

And I bathe regularly!

October 24, 2013 | 07:03 AM - Posted by JackRocky (not verified)

Wow cool article.

As for the innovations in GPUs at 20 nm: well, there is HyperCube or stacked GDDR5 memory.  You will get a marginally better GPU die, but because the bandwidth is going to skyrocket it will seem like Christmas again...

Also, AMD had a stacked DRAM prototype spotted in the wild in 2011.  Why didn't it hit the market earlier?  Maybe it needs time to be introduced to the market, or maybe they saved this architectural revelation for the tough times that 20 nm without FDSOI is going to bring...

Also, come to think of it, should a GPU have 1 TB/s of bandwidth to main memory with improved latency at the same time, a lot of the on-die caches could be removed, paving the way for more computational resources in the same die area.  Of course the engineers will have to do their jobs and make the right choices, but this is a possible outcome of new architectural breakthroughs that are orthogonal to the silicon production process.

So my point is that the 290X/Titan replacement may in fact be a massive performance leap forward irrespective of the 20 nm problems.

October 24, 2013 | 08:08 AM - Posted by IanD (not verified)

All the "next-generation 14nm" nodes are very similar: they're basically "20nm" metal (64nm pitch, double-patterned) with faster transistors.  This applies to Intel "14nm" TriGate, TSMC "16nm" FinFET, GF "14nm" FinFET, ST "14nm" FDSOI, and Samsung "14nm" FinFET.  There probably isn't a single feature on any of the chips which is 14nm, but they had to call them something which was better than 20nm.

TSMC wouldn't call theirs 14nm because "fourteen" sounds like "go towards death" in Chinese -- and ST's 14nm FDSOI used to be called 20nm (which was at least honest) until their marketing realised that everyone else was calling their similar processes 14nm, so they renamed it...

They're all a big advance on standard "20nm" planar (with the same metal stack) because lower leakage and lower operating voltage means lower power.

The issues are the risk, production difficulties, and cost that come with new transistor structures, especially FinFET, where Intel certainly had (and still has?) issues with process variability -- in spite of the fact that they can sell both fast/leaky chips and slow/low-power ones for more money than typical ones.

For all these processes (and 20nm bulk planar) the cost per gate is similar to or even higher than 28nm HKMG, which removes one of the big drivers for going to the next process node for many products.  The industry was expecting EUV to come along and save its bacon; this not only hasn't happened yet but will certainly miss the next node after these ("10nm"), which will need triple patterning -- and good luck with that, both for design and cost.

So the lower power and higher density will mean that more functionality can be crammed onto one chip, but also that this will cost more -- which is an alien concept to an industry that for the last 40 years has assumed that the next process node will deliver more bang for the same buck.  Consumers may be in for a nasty shock when they find that their next super iGadget is even more expensive...

October 24, 2013 | 09:45 AM - Posted by Josh Walrath

Thank goodness for marketing and superstition to drive process naming!  Thanks for the info.  So strange to see these "advanced" nodes with the 20 nm back end.  Gonna be an interesting next few years of process tech.  Now we wait and see if all that money the industry invested in EUV will ever come to fruition.

October 25, 2013 | 01:31 PM - Posted by snc (not verified)

Came to this site for the first time; very impressive article, great read, thanks for that!!  Will stop by more often :)

October 25, 2013 | 05:15 PM - Posted by Alex Antonio (not verified)

" It looks to compete with the GTX TITAN, but it will not leapfrog that part. It will probably end up faster, but by a couple of percentage points. It will not be the big jump we have seen in the past such as going from a GTX 580 or HD 6970 to a GTX 680 or HD 7970."

That's not really a fair comparison... you are comparing generational leaps to competing products.

The generational leap for the R9 290X is from the 7970.  Similarly, the GTX 780 is the generational leap from the GTX 680.

As for how the R9 290X compares to the 7970... it is about 59% faster, give or take the application.  That's the biggest leap generation to generation for as long as I can remember.

October 26, 2013 | 01:26 PM - Posted by Josh Walrath

Well, those really aren't generational leaps.  They are bigger products based on the same GCN and Kepler architectures that were introduced with the HD 7970 and GTX 680 respectively.  Titan has been out around a year now, and only now does AMD have an answer for that.  All of them are based on 28 nm.  So, those big chips are nice jumps in performance, but they are not the big architectural leaps that we have seen from the GTX 580 to GTX 680 or the HD 6970 to the HD 7970.

October 26, 2013 | 07:50 PM - Posted by kukreknecmi (not verified)

http://www.cadence.com/Community/blogs/ii/archive/2013/04/14/tsmc-2013-s...

Is this mostly PR, or do people just start from the assumption that a "30% lower power consumption" result on an SRAM array will also hold at the same level for a 400mm2 GPU?  Or both?

October 28, 2013 | 10:26 AM - Posted by Josh Walrath

Some of it is a bit of marketing hype, but the basics of timelines and products seem to be in line with what is expected.  Yes, there will be smaller chips, and there will be more power efficient chips, but I think we will see some power/clock scaling issues with 20 nm planar.  It will be a better overall process than 28 nm HKMG, but do not expect miracles at the high end with large chips.  I could be out in left field, but it seems awfully positive and shiny in that blog.

October 27, 2013 | 03:48 PM - Posted by Watushio (not verified)

Great article Josh

Thnx

October 29, 2013 | 09:47 AM - Posted by Josh Walrath

Thanks!

November 4, 2013 | 12:17 AM - Posted by MdX MaxX (not verified)

Wonderful article!

I wish everyone could read this so we would stop hearing all the "wahhh Intel/AMD/Nvidia doesn't care about enthusiasts anymore" nonsense. Transistors don't just get smaller on their own.

November 7, 2013 | 10:42 AM - Posted by Adele Hars (not verified)

You rock, Josh -- great piece.  A few clarifications.  IBM is still using PD-SOI at 22nm in Power8 (see http://bit.ly/15saFUm).  They've got SOI-FinFET lined up for 14nm.  The FD-SOI crowd is skipping directly from 28nm to 14nm, which they say will be ready next year before 14nm (bulk) FinFET (see http://bit.ly/1cGjZgi).  (Though 28nm FDSOI is already pretty awesome in terms of power & perf -- it's what got 3GHz & an extra day of smartphone battery life -- see http://bit.ly/1hPLvri.)  And ST's capacity in France is much more than you've indicated -- and now they're in the process of doubling it (thank you, Europe!), so they'll be at 4500 wafer starts/week by the end of 2014 (see http://bit.ly/1bdvMfr).  Leti will have models available for 10nm FDSOI in a couple of months, and PDKs in Q3 2014 (see http://bit.ly/1bdwadP).

November 8, 2013 | 08:56 AM - Posted by Josh Walrath

Really good info here!  Thanks for joining in!

January 10, 2014 | 03:18 AM - Posted by laurent (not verified)

Thank you for this comprehensive and complete article on the technological limitations of the semiconductor industry vs. graphics maturity.  I work in the ST fab that is developing 28 and then 14 nm FDSOI right now, and this kind of article makes the effort (not to say the hard work!) we put into this technology worth it.

January 30, 2014 | 01:52 PM - Posted by Josh Walrath

14 nm FDSOI looks very, very interesting.  Can't wait to see how it progresses!
