
John Carmack Interview: GPU Race, Intel Graphics, Ray Tracing, Voxels and more!

Interview Transcript

Major props go out to our own Tim Verry for getting this done so quickly for us. Thank him for the speedy transcription!!

Ryan Shrout: I'm here with John Carmack of id Software. We're going to ask him some questions (user-submitted questions as well as some of our own). We'll focus on hardware, technology, and those types of things. Thanks for joining us.

John Carmack: Welcome.

Ryan Shrout: One of the interesting ideas that has come up is that game engines have evolved into much more complex things, which obviously you are aware of. How are mathematics involved now, more so than in what we previously did with managing hardware in computers? How important is mathematics in these engines?

John Carmack: It's interesting in that I have had my math background overstated. For a while there, our company press information said something about me reading math textbooks all the time. I'm really not that much of a math geek. We've had a few people come through id that are much more versed in advanced mathematics. I was able to do all the things that I needed to do just by having a very strong, applicable knowledge of (basically) high school topics, going through algebra to trigonometry and a little bit of calculus. But knowing how to apply them to situations that weren't in the textbook, and actually [knowing] how to use them for a problem that comes to you rather than something that's sitting down on a test, is what's really important. Now we use some more sophisticated mathematical solvers, [such as] numeric and iterative methods for some of the physics, collision response, and things like that. As you get to simulations, it gets a lot more involved, where people start looking at Navier-Stokes equations and things like that to do turbulent fluid flow simulations. You can throw arbitrary amounts of compute horsepower and analytical knowledge at things like that, but I do kind of keep coming back to: if you time-slice things [finely] enough, everything is linear and you can apply linear techniques to it. When we do get people applying for jobs, an advanced math background does say a lot about a person, to be able to get through working with high levels of abstraction. I've certainly had people working for me that I know are far beyond me in what they can do analytically with math, but I do always come back to: well, you can chop everything up again and do it all with discrete approximation methods. Many times you can spend hideous amounts of blackboard space doing analytic solutions for problems that can be solved much more easily with Monte Carlo or iterative methods.
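As a rough illustration of the discrete-approximation point (our own sketch, not something from the interview), here is what trading an analytic solution for Monte Carlo sampling can look like in practice. The integrand is an arbitrary stand-in:

```cpp
// Minimal sketch: estimate the integral of sin(x) over [0, pi] by Monte Carlo
// sampling instead of solving it analytically (the exact answer is 2).
#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double pi = 3.14159265358979323846;
    const double a = 0.0, b = pi;
    const int samples = 1000000;

    std::mt19937 rng(42);
    std::uniform_real_distribution<double> dist(a, b);

    double sum = 0.0;
    for (int i = 0; i < samples; ++i) {
        sum += std::sin(dist(rng));   // evaluate the integrand at a random point
    }
    // The average value of the function, times the interval width, approximates the integral.
    const double estimate = (b - a) * sum / samples;
    std::printf("Monte Carlo estimate: %f (analytic: 2.0)\n", estimate);
    return 0;
}
```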

Ryan Shrout: Focusing back on the hardware side of things, at previous years' QuakeCons we've had debates about which GPU was better for certain game engines and certain titles, and what features AMD and NVIDIA do better. You've said previously that with CPUs now, you don't worry about what features they have, as they do what you want them to do. Are we at that point with GPUs? Is the hardware race over (or almost over)?

John Carmack: I don't worry about the GPU hardware at all. I worry about the drivers a lot, because there is a huge difference between what the hardware can do and what we can actually get out of it if we have to control it at a fine-grained level. That's really been driven home by this past project, working at a very low level of the hardware on consoles and comparing that to these PCs that are truly orders of magnitude more powerful than the PS3 or something, but struggle in many cases to keep up the same minimum latency. They have tons of bandwidth, they can render at many more multi-samples, multiple megapixels per screen, but to be able to go through the cycle and get feedback... “fence here, update this here, and draw them there...” it struggles to get that done in 16ms, and that is frustrating.
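To make the “fence, update, draw” round trip concrete, here is a minimal OpenGL sketch of the pattern (our own illustration, not id code); it assumes a current GL 3.2+ context with loaded entry points, and the update and draw calls themselves are left as placeholders:

```cpp
// Sketch: CPU/GPU synchronization around a resource update using a fence.
// Assumes a current OpenGL 3.2+ context and loaded entry points (e.g. via GLEW).
#include <GL/glew.h>

void updateThenDraw() {
    // ... issue the resource update here (e.g. a buffer or texture upload) ...

    // Insert a fence after the update commands.
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

    // Wait until the GPU has actually consumed the update (1 ms timeout slices).
    // Every round trip like this eats into the 16 ms frame budget.
    while (glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000) == GL_TIMEOUT_EXPIRED) {
        // still waiting; a real engine would do useful CPU work here
    }
    glDeleteSync(fence);

    // ... now it is safe to issue draw calls that depend on the update ...
}
```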

Ryan Shrout: That's an API issue, API software overhead. Have you seen any improvements in that with DX 11 and multi-threaded drivers? Are those improving that or is it still not keeping up?

John Carmack: So we don't work directly with DX11, but from the people I talk with who are working with it, they [say] it might have some improvements, but it is still quite a thick layer of stuff between you and the hardware. NVIDIA has done some direct hardware addressing implementations where you can bypass most of the OpenGL overhead, and other ways to bypass some of the hidden state of OpenGL. Those things are good and useful, but what I most want to see is direct surfacing of the memory. It’s all memory there at some point, and the worst thing that kills Rage on the PC is texture updates. Where on the consoles we just say “we are going to update this one pixel here,” and we just store it there through a pointer, on the PC it has to go through the massive texture update routine, and it takes tens of thousands of times [longer] if you just want to update one little piece. You start to amortize that overhead when you update larger blocks of textures, and AMD actually went and implemented a multi-texture update specifically for id Tech 5, so you can batch up and eliminate some of the overhead by saying “I need to update these 50 small things here,” but it’s still very inefficient. So I’m hoping that as we look forward, especially with Intel integrated graphics [where] it is the main memory, there is no reason we shouldn't be looking at that. With AMD and NVIDIA there are still issues of different memory banking arrangements and complicated things that they hide in their drivers, but we are moving towards integrated memory on a lot of things. I hope we wind up being able to say “give me a pointer, give me a pitch, give me a swizzle format,” and let me manage it with fences myself, and we'll be able to do a better job.
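For reference, the PC-side path he is describing looks roughly like the first function below (a generic illustration, not id Tech 5 code): even a tiny update has to go through the driver's texture-update routine. The second function sketches the kind of direct, pointer-plus-pitch access he is asking for; it is hypothetical, not a real PC API:

```cpp
// Sketch: updating a small region of a texture on the PC goes through the
// driver's full texture-update path, even for a handful of pixels.
// Assumes a current OpenGL context; 'tex' is an existing RGBA8 texture.
#include <GL/glew.h>
#include <cstdint>

void updateTinyRegion(GLuint tex, int x, int y) {
    const uint32_t pixels[2] = { 0xffffffffu, 0xff0000ffu };  // a 2x1 patch

    glBindTexture(GL_TEXTURE_2D, tex);
    // The driver may copy, convert, and re-tile this behind the scenes.
    glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, 2, 1,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);
}

// What console-style (or hypothetical directly surfaced) access looks like:
// given a mapped base pointer, a pitch, and a known layout, the write is
// just a store. This illustrates the idea only.
void updateTinyRegionDirect(uint32_t* base, int pitchInPixels, int x, int y) {
    base[y * pitchInPixels + x]     = 0xffffffffu;
    base[y * pitchInPixels + x + 1] = 0xff0000ffu;
}
```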

Ryan Shrout: It seems like AMD has been doing a lot of talking about the future of the APU, the CPU and GPU combination.

John Carmack: It is important to separate things a little bit. The current generation Fusion parts are really a separate CPU and separate GPU connected on the die by better or worse interconnects, but their vision is integrating them much more tightly, such that they share cache hierarchies, address space, and page tables. I think it's almost a foregone conclusion that it's going to be the dominant architecture in the marketplace, because there are these strong forces about how we are getting more shrinks on the dies [and] we can stick more things on there. It's going to just pay to integrate that, and you aren't going to be able to put as many transistors towards it if it's a dedicated chip. You'll pick up a lot from this tight integration, with cost benefits. Intel is also doing a lot better with their integrated graphics parts, once the butt of jokes; they've taken a couple of steps now and these are fully competent parts. The drivers still aren't very well tuned, the bandwidth is not great yet, and [neither is] total raw performance, but they are pretty much at feature parity, barring some quality issues. It's close and they get better each time. This will be one of those things that will sneak up on people, where the stuff that they got free with their CPU is all of a sudden good enough to run the games they want to play. I have high hopes that, because it is all integrated memory, Intel will be able to lead the way with surfacing and direct access. That will give them the opportunity to sometimes take console developers who are used to this lower-level access and maybe have something (shock of shocks) run better on Intel's integrated graphics part than the much more expensive NVIDIA or AMD card that has all the layers of driver overhead.

Ryan Shrout: Does that lead us to a single platform across consoles and PCs? Where future consoles, maybe the next iteration in 2013, have one processor architecture across all types of devices, or at least where you get the same type of access across all devices?

John Carmack: It's interesting; if you had asked us a couple of years ago (when we surveyed the field) there was this thought of “would Intel's Larrabee be the platform to rule them all?” Of course, it turned out not to meet performance expectations. It was an interesting architecture, but a lot of us were cautioning from the very beginning not to underestimate the fixed functionality the GPU provides. If people want to draw polygons made out of vertexes containing fragments, then hardware that's built around that is going to outperform hardware of a more general-purpose nature. I would be very surprised if all the major platforms came out with a common processor-GPU architecture. I think it's obvious that there will be at least strong contenders with ARM cores mixed in with conventional GPUs. I hope that they don't try to push Cell architectures again, because there are a lot of reasons why peak performance is great... and there are cases in Rage where, on a PS3, if you are in an instance level where you have a lot of main memory (so a lot of things are cached in RAM), the extra horsepower of the Cell will let it transcode into graphics memory much faster than the 360, and faster than all but the most extreme PCs, but it's a pain in the ass to take advantage of it that way. For the cases when we didn't have all that buffered in memory, it winds up being more problematic.

Ryan Shrout: You mentioned ARM. Do you think that architecture is capable of the kind of performance necessary for consoles and PCs?

John Carmack: NVIDIA is basically reimplementing their very own custom ARM architecture. There is a big distinction between an instruction set or architecture specification and the physical implementations of it. Heck, x86 spans a huge range of capability. There is no doubt that ARM can handle a lot of that. What I would worry about a lot is that they are just beginning to push through their 64-bit transition, and any next-gen consoles need to be 64-bit platforms. Of course, the PS3 is 64-bit in many ways, but it cuts off the upper half of the address space in a very strange decision-making process. We don't know how ARM is going to fare with 64-bit processors, but lots of companies have been through all of this. Apple has a ton of experience migrating across all of that, and I'm sure that they are working closely. How closely Apple, ARM, and NVIDIA are working together is another matter, and an open question. ARM is more likely to make a dent in it just because [of] the x86 business model, where desktop CPUs cost so much more... if you look at PowerPC, MIPS, or especially ARM, it seems likely ARM has the better ecosystem, and there would be sensible reasons to go with that.

Ryan Shrout: A couple of software questions now. A couple of years ago, when we last talked, we discussed ray tracing. At that time you didn't think it was going to overtake anything in terms of rasterization or actual engine use. You were more interested in ray tracing for data structure access. Has anything changed?

John Carmack: I spent quite a bit of time on ray tracing in the last year and a half or so. I spent a bunch of time with OpenCL to figure out what I could do to make a real ray tracing engine, and I made some really cool stuff. There are a few experiments I wanted to get results on. One was trying to say “well, what if you had a hybrid engine and only ray traced the specular reflections, how would that look?” I did that, and it turns out ray tracing off of a bump-mapped surface looked god-awful with one ray. You really need to throw at least 50 rays at it to make that look good. It was neat to have this overlaid mirrored surface on a lot of things, but it clearly wasn't a near-term [win] for this. Then, using it to replace rasterization, I got a somewhat mixed result, where it was a little bit closer than I thought it might have been for tracing into a static scene. One of the things that I wanted to do, and that I've been trying to do for years, was... with rasterization we aim for 60fps on Rage, but that means that we need to keep a cushion. We dynamically adjust our resolution, but we never get 100% utilization because we have to leave a little margin in case we mis-estimate.
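The margin he mentions looks roughly like this in practice. This is a generic sketch of dynamic resolution scaling against a frame budget, not id's actual controller, and the constants are purely illustrative:

```cpp
// Sketch: dynamic resolution scaling against a 60 fps budget.
// The render scale is adjusted so the estimated GPU time stays under the
// budget minus a safety margin, since the estimate for the next frame can be wrong.
#include <algorithm>

struct ResolutionScaler {
    double scale = 1.0;                          // fraction of full render resolution
    static constexpr double kBudgetMs = 16.6;    // 60 fps frame budget
    static constexpr double kMarginMs = 1.5;     // cushion in case we mis-estimate

    void update(double lastFrameGpuMs) {
        const double target = kBudgetMs - kMarginMs;
        // A real controller would account for area scaling (~scale^2);
        // here a simple proportional adjustment stands in for it.
        const double correction = target / std::max(lastFrameGpuMs, 0.1);
        scale = std::clamp(scale * correction, 0.5, 1.0);
    }
};
```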

I've always wanted to have a technology where you can just keep throwing rays, rasterizing, or something, and when you hit 60fps you just use what you've got... so you can hard-cap the stuff. I did a ray tracing engine where, on the system I was testing it on, a Fermi NVIDIA card, I could run a 720p ray trace at (in most cases) around 60fps, but in certain areas it would start falling off. I had it set up so that it could stop early, and have it be a lower resolution or have it include frames from somewhat further back so it turns into motion blur. I would do per-pixel jittering in both the spatial domain for anti-aliasing (AA) and the temporal domain for motion blur, and cap it in different ways. That worked really well, but the interesting thing is that when you see people make demos about certain things, it is not appreciated what a large gulf there is between a tech demo and what is going to be shown in a game. For example, I have this 60fps 720p ray tracing engine, but it's dealing with static models. It doesn't have character animation or anything. It could spit out a depth buffer and [feed] a hybrid engine to do that. Also, the fact that it runs 60fps at 720p for a ray tracer [seems great], but if you were just drawing that same thing with no fragment shaders using a traditional engine it would be running at 1,000fps. We do all this other stuff; once you throw in your particles, post processing, character animation... all that, that core part that has great detail ends up running 20fps. There is a huge gulf between a tech demo and what can be part of a triple-A engine.
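A rough sketch of the “keep throwing rays until the budget runs out” idea he describes is below. This is our own illustration with a stubbed-out trace function, not code from his experiment: per-pixel jitter is applied spatially (for AA) and temporally (for motion blur), and accumulation simply stops when the hard cap is hit:

```cpp
// Sketch: accumulate jittered ray samples until a hard frame-time cap is hit.
// traceScene() is a hypothetical stand-in for a real ray caster.
#include <chrono>
#include <random>
#include <vector>

struct Color { float r = 0, g = 0, b = 0; };

// Hypothetical placeholder: trace one ray through pixel (x+jx, y+jy) at time t.
Color traceScene(float x, float y, float t) { return Color{}; }

void renderFrame(std::vector<Color>& frame, int width, int height,
                 float shutterOpen, float shutterClose, double capMs) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    std::mt19937 rng(1234);
    std::uniform_real_distribution<float> jitter(0.0f, 1.0f);

    int samplesDone = 0;
    for (int pass = 0; pass < 64; ++pass) {            // keep throwing rays...
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                // Spatial jitter for anti-aliasing, temporal jitter for motion blur.
                float jx = jitter(rng), jy = jitter(rng);
                float t  = shutterOpen + jitter(rng) * (shutterClose - shutterOpen);
                Color c  = traceScene(x + jx, y + jy, t);
                Color& p = frame[y * width + x];
                p.r += c.r; p.g += c.g; p.b += c.b;     // running accumulation
            }
        }
        ++samplesDone;
        double elapsedMs =
            std::chrono::duration<double, std::milli>(clock::now() - start).count();
        if (elapsedMs > capMs) break;                   // ...and stop at the cap
    }
    // Normalize by however many samples actually completed.
    for (Color& p : frame) { p.r /= samplesDone; p.g /= samplesDone; p.b /= samplesDone; }
}
```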

On the other hand, we have converted all of our offline processing stuff to ray tracing. For years, the back-end MegaTexture generation for Rage was done with... we had a GPGPU cluster with NVIDIA cards, and it was such a huge pain to keep running. It was an amazing pain, where one system would be having heat problems and behaving weird even though we thought they had identical drivers. Something would always be wrong with render farm 12, and whenever we wanted to put in new features it was like “okay, write new fragment programs to go into this.” Now, granted, I did this just when CUDA was in its infancy. If I re-implemented it with OpenCL or CUDA we wouldn't have some of those problems, but when I converted all of this over to ray tracing, a number of things got a lot better. Things that we deal with, [for example] shadows and reflections that have to be approximated, and that we're so used to doing with rasterization... we sometimes forget how big these hacks are. To be able to say “I really just want that ray, tell me what it hit,” and not do a projection with feathered shadow edges and whatever the heck else we're doing there, so much of the code got so much easier. If it's a choice of... now that we have these awesome multi-core x86 CPUs, where we can get 24 threads in commodity boxes... it's true that one GPU card can do more ray tracing than one 24-thread x86 box, but it's not multiples more, and if it's just a matter of buying more $2,000 boxes, it makes the development, maintenance, and upkeep much better. While everyone in high performance computing is all “rah-rah” GPUs right now, I've come full circle back around to saying that the fact that we can get massive amounts of x86 cores and threads... it won't win on FLOPS/watt or FLOPS/volume, but in terms of results per developer hour it is much, much better.
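The “just give me that ray and tell me what it hit” point can be illustrated with a toy occlusion query. This is a generic ray-sphere shadow test of our own, standing in for whatever id's offline tools actually do, rather than their code:

```cpp
// Sketch: an offline shadow query as a single occlusion ray, instead of a
// rasterized shadow-map projection with filtered/feathered edges.
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Sphere { Vec3 center; float radius; };

// Does the segment from 'point' to 'lightPos' hit the occluder?
bool inShadow(Vec3 point, Vec3 lightPos, const Sphere& occluder) {
    Vec3 dir = sub(lightPos, point);
    float maxT = std::sqrt(dot(dir, dir));          // distance to the light
    dir = {dir.x / maxT, dir.y / maxT, dir.z / maxT};

    // Standard ray-sphere intersection.
    Vec3 oc = sub(point, occluder.center);
    float b = dot(oc, dir);
    float c = dot(oc, oc) - occluder.radius * occluder.radius;
    float disc = b * b - c;
    if (disc < 0.0f) return false;                  // ray misses the sphere
    float t = -b - std::sqrt(disc);
    return t > 1e-4f && t < maxT;                   // hit between point and light
}
```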

Ryan Shrout: There are different types of efficiency then?

John Carmack: Yeah, and when you are developing it is perfectly fine for us to spend hundreds of thousands of dollars in our back room to go ahead and make things better for that. It is a different question where, if every consumer has some graphic processor of some kind, we are going to do whatever it takes to max that out because you can’t tell everybody to go out and buy lots of x86 boxes. It’s interesting, the difference between the design decisions you make for a consumer target versus a development target.

August 12, 2011 | 09:07 AM - Posted by Anonymous (not verified)

I like the summary but, will there be a written transcript of the interview?

I can read faster than watch and I can read the articles during sanity breaks at work.

It's rare that I make time for an internet video, but I will try to for this one.

August 12, 2011 | 11:59 AM - Posted by Ryan Shrout

We might try to do that but we hadn't planned on it. The whole idea of video (and then the written summary) was to NOT do a complete and direct transcription.

August 12, 2011 | 12:58 PM - Posted by Anonymous (not verified)

I think the subject matter is technical enough that you can assume the interested readers are not illiterate :) And yeah, I don't have time to watch a talking head for 32 minutes when I could read the transcript in 10.

August 13, 2011 | 01:17 PM - Posted by Anonymous (not verified)

And yet you have time to come on here and post 2x about it, when instead you could have watched the interview with the great illustrative video samples of games and graphic techniques.

August 12, 2011 | 12:44 PM - Posted by Wilhelm (not verified)

On a (crappy) mobile here, please make a transcript!

Also, making a transcript of anything Carmack says just makes sense :)

August 12, 2011 | 12:52 PM - Posted by Ryan Shrout

We get enough requests and we'll probably do it. :)

August 12, 2011 | 03:56 PM - Posted by Anonymous (not verified)

+1 for a transcript.

August 12, 2011 | 04:39 PM - Posted by Ryan Shrout

We are currently working on it!

August 12, 2011 | 09:01 PM - Posted by brickviking (not verified)

Another +1 for a transcript. Pleeeease?

Thanks, Dr Smokey.

August 12, 2011 | 09:09 PM - Posted by Ryan Shrout

We'll have it sometime tomorrow, early afternoon, promise.

August 12, 2011 | 01:30 PM - Posted by Darren (not verified)

It's rare that I leap to Sony's defence, but I feel I should do so here. The reason the PS3 doesn't use full 64-bit addresses is simply because it doesn't need to. John seems to find it strange that they don't have a >4GiB address space, but given that there's only 512MiB of memory (plus some extra devices etc.), having 8-byte pointers would just be a waste of space that could be better used for other stuff.

The first few versions of the PS3 SDK were truly 64-bit, but enough developers complained about the waste of space from using long pointers that didn't need to be long that Sony saw sense and fixed it. The 360 similarly has a 64-bit processor, but only bothers with a 32-bit address space, simply because it's enough.

August 12, 2011 | 01:50 PM - Posted by DJ Fitz (not verified)

Another vote for the transcript. Besides being faster to read than view, a transcript would allow so much more. A transcript leads to indexing, which leads to searching, which leads to traffic from search engines, and ultimately more traffic on your site as a whole. This provides a much higher value for everyone far beyond the enjoyment and edification of just watching a John Carmack interview.

But anyways, thanks again for the interview. Always great to hear what John has been up to and the state of game development.

August 12, 2011 | 02:11 PM - Posted by Anonymous (not verified)

Transcript please!

August 12, 2011 | 02:42 PM - Posted by Ryan Shrout

We do have an editor working on it now. We'll add it as a third page on this review today or tomorrow.

Thanks for reading!

August 12, 2011 | 07:28 PM - Posted by Tim Verry

Did someone say my name?!

August 12, 2011 | 09:37 PM - Posted by Ryan Shrout

Indeed, Tim is that lucky man.

August 13, 2011 | 06:19 AM - Posted by Ryan Shrout

Ask and you shall receive (sometimes), the interview transcript:

http://www.pcper.com/reviews/Editorial/John-Carmack-Interview-GPU-Race-I...

August 13, 2011 | 07:28 AM - Posted by Anonymous (not verified)

Thanks for the transcript, Tim!

August 13, 2011 | 04:07 PM - Posted by Tim Verry

You're welcome!

August 12, 2011 | 02:13 PM - Posted by Anonymous (not verified)

(Great summary though, thanks for that!)

August 12, 2011 | 04:16 PM - Posted by Paul Fjeld (not verified)

I for one am blown away by the clarity of the questions and Carmack's clear replies. You don't get a proper appreciation for how great an extemporaneous speaker Carmack is in a transcript, although the data density is certainly high enough to make a transcript useful. But I think it is worth the time to just listen to two smart people going at a complex subject.

Well done!

August 12, 2011 | 04:30 PM - Posted by Ryan Shrout

Thanks Paul!

August 12, 2011 | 07:21 PM - Posted by Anonymous (not verified)

1 TB is not that much anymore.

My PC is busy so I will cut this short ;) :)

August 12, 2011 | 09:38 PM - Posted by Ryan Shrout

It sure would be a lot to DOWNLOAD though, right?

August 13, 2011 | 04:08 PM - Posted by Tim Verry

Yeah, their bandwidth costs would be astronomical! I suppose they could go the BitTorrent route and just let their users host it for them, but I doubt they'd do that :( lol

August 13, 2011 | 01:04 AM - Posted by Anonymous (not verified)

I love how this interview guy pretends like he knows what Carmack is saying. Dude, I'm about to get a PhD in Neuroscience and I've been watching Carmack talk for 10 years, and I barely can follow what he says even with my failed CS degree. I would never be like "OK..sure..cool..right...OK..yeah..Cool..OK...yeah" when talking with the pioneers of my field. And then read some pre-made questions to ask him. OMG dude! Have you even played DOOM? Romero would make you his bitch.

August 13, 2011 | 01:49 AM - Posted by Anonymous (not verified)

PhD in Neuroscience? That's cool and all but you know, he isn't talking about neuroscience. If you managed to fail a CS degree, it's not surprising that you can't follow Carmack. I could follow most of the stuff he was talking about, and who knows what kind of background the interviewer has. He at least had good questions on the subject.

August 13, 2011 | 05:55 AM - Posted by Ryan Shrout

Exactly. There was very little stuff I didn't understand. After working in the graphics field for 11 years now, I have a pretty good grasp. Could I compete with Carmack in SAYING all of that? Nope. Could I understand most of it? Yup.

August 13, 2011 | 02:10 AM - Posted by Anonymous (not verified)

Great interview - though I don't think Ryan knows what he's talking about most of the time - check out the nervous nods in slightly the wrong places to simulate understanding.

Remember though, John is supposed to be doing the talking, not Ryan, and it succeeds in that aim.

Oh - the previous poster said the same thing. Don't be hard on Ryan: it's all geeky techy stuff; not understanding is more a function of John's extreme geekiness than Ryan's lack of intelligence. Low-level programming knowledge does not equate to god-like wisdom. It's the guy's job & passion, after all.

From what I've seen of Rage so far, it looks just like another boring shooter. Totally unexciting. Pioneer or no pioneer back in the day, I'm not that interested in their games.
