John Carmack Interview: GPU Race, Intel Graphics, Ray Tracing, Voxels and more!
Major props go out to our own Tim Verry for getting this done so quickly for us. Thank him for the speedy transcription!
Ryan Shrout: I'm here with John Carmack of id Software. We're going to ask him some questions (user submitted questions as well as some of our own). We'll focus on hardware, technology and those types of things. Thanks for joining us.
John Carmack: Welcome.
Ryan Shrout: One of the interesting ideas that has come up is that game engines have evolved into much more complex things, which obviously you are aware of. How is mathematics involved in this, more so than in what we previously did with management of hardware in computers? How important is mathematics in these engines?
John Carmack: It's interesting in that I have had my math background overstated. For a while there on our company press information it said something about me reading math textbooks all the time. I'm really not that much of a math geek. We've had a few people come through id that are much more versed in advanced mathematics. I was able to do all the things that I needed to do just by having a very strong, applicable knowledge of (basically) high school topics going through Algebra to Trigonometry, and a little bit of calculus. But knowing how to apply them to situations that weren't in the textbook and actually [knowing] how to use them for a problem that comes to you rather than something that's sitting down on a test, is what's really important. Now we use some more sophisticated mathematical solvers, [such as] numeric and iterative methods for some of the physics, collision response, and things like that. As you get to simulations, it gets a lot more [involved], where people start looking at Navier-Stokes equations and things like that to do turbulent fluid flow simulations. You can throw arbitrary amounts of compute horsepower and analytical knowledge at things like that, but I do kind of keep coming back to [the idea that if you] time slice things [finely] enough, everything is linear and you can apply linear technologies to it. When we do get people applying for jobs, an advanced math background does say a lot about a person, to be able to get through working with high levels of abstraction. I've certainly had people working for me that I know are far beyond me in levels of what they can do analytically with math but I do always come back to, well, you can chop everything up again and do it all with discrete approximation methods. Many times you can spend a hideous amount of blackboard space doing analytic solutions for lines that can be solved much more easily with Monte Carlo iterative methods.
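[Editor's note: as a rough illustration of the trade-off between analytic solutions and Monte Carlo methods described above, here is a minimal Python sketch. It is not from the interview; the function name `monte_carlo_integral` and the choice of integrand are assumptions made for illustration. It estimates an integral by averaging random samples and compares the estimate with a closed form that happens to be known.]

```python
import math
import random

def monte_carlo_integral(f, a, b, samples, seed=0):
    """Estimate the integral of f over [a, b] by averaging random samples."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(samples):
        total += f(rng.uniform(a, b))  # evaluate f at a random point in [a, b]
    return (b - a) * total / samples   # interval length times the mean value

# The integral of sin(x) over [0, pi] is exactly 2, so the error can be checked.
estimate = monte_carlo_integral(math.sin, 0.0, math.pi, 200_000)
error = abs(estimate - 2.0)
print(f"estimate={estimate:.4f} error={error:.4f}")
```

[The same loop works unchanged for integrands with no convenient closed form, which is the appeal; the cost is that the error shrinks only with the square root of the sample count.]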
Ryan Shrout: Focusing back on the hardware side of things, in previous years’ QuakeCons we've had debates about which GPU was better for certain game engines, certain titles and what features AMD and NVIDIA do better. You've said previously that with CPUs now, you don't worry about what features they have, as they do what you want them to do. Are we at that point with GPUs? Is the hardware race over (or almost over)?
John Carmack: I don't worry about the GPU hardware at all. I worry about the drivers a lot because there is a huge difference between what the hardware can do and what we can actually get out of it if we have to control it at a fine grain level. That's really been driven home by this past project by working at a very low level of the hardware on consoles and comparing that to these PCs that are true orders of magnitude more powerful than the PS3 or something, but struggle in many cases to keep up the same minimum latency. They have tons of bandwidth, they can render at many more multi-samples, multiple megapixels per screen, but to be able to go through the cycle and get feedback... “fence here, update this here, and draw them there...” it struggles to get that done in 16ms, and that is frustrating.
Ryan Shrout: That's an API issue, API software overhead. Have you seen any improvements in that with DX 11 and multi-threaded drivers? Are those improving that or is it still not keeping up?
John Carmack: So we don't work directly with DX 11 but from the people that I talk with that are working with that, they [say] it might [have] some improvements, but it is still quite a thick layer of stuff between you and the hardware. NVIDIA has done some direct hardware address implementations where you can bypass most of the OpenGL overhead, and other ways to bypass some of the hidden state of OpenGL. Those things are good and useful, but what I most want to see is direct surfacing of the memory. It’s all memory there at some point, and the worst thing that kills Rage on the PC is texture updates. Where on the consoles we just say “we are going to update this one pixel here,” we just store it there as a pointer. On the PC it has to go through the massive texture update routine, and it takes tens of thousands of times [longer] if you just want to update one little piece. You start to amortize that overhead when you start to update larger blocks of textures, and AMD actually went and implemented a multi-texture update specifically for id Tech 5 so you can batch up and eliminate some of the overhead by saying “I need to update these 50 small things here,” but still it’s very inefficient. So I’m hoping that as we look forward, especially with Intel integrated graphics [where] it is the main memory, there is no reason we shouldn't be looking at that. With AMD and NVIDIA there are still issues of different memory banking arrangements and complicated things that they hide in their drivers, but we are moving towards integrated memory on a lot of things. I hope we wind up being able to say “give me a pointer, give me a pitch, give me a swizzle format,” and let me do things managing it with fences myself and we'll be able to do a better job.
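[Editor's note: to make "a pointer, a pitch, a swizzle format" concrete, here is a minimal Python sketch, not taken from the interview. A `bytearray` stands in for directly mapped texture memory, the surface dimensions are made-up numbers, and Morton (Z-order) bit interleaving is used as one illustrative swizzle layout. Real GPU swizzle formats are vendor specific and are not described in the interview.]

```python
def linear_offset(x, y, pitch, bytes_per_pixel):
    """Byte offset of pixel (x, y) in a row-major surface; pitch is bytes per row."""
    return y * pitch + x * bytes_per_pixel

def morton_index(x, y, bits=16):
    """Interleave the bits of x and y (Z-order), one possible swizzle layout."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)       # x bits land on even positions
        index |= ((y >> i) & 1) << (2 * i + 1)   # y bits land on odd positions
    return index

def write_pixel(surface, x, y, pitch, rgba):
    """Store one 4-byte pixel straight into the surface memory."""
    offset = linear_offset(x, y, pitch, 4)
    surface[offset:offset + 4] = bytes(rgba)

# Hypothetical 64x64 RGBA surface whose rows are padded to 512 bytes.
WIDTH, HEIGHT, PITCH = 64, 64, 512
surface = bytearray(PITCH * HEIGHT)              # stand-in for mapped texture memory
write_pixel(surface, 3, 2, PITCH, (255, 128, 0, 255))
```

[With this kind of access, updating a single pixel is one small store at a computed address, which is the console-style case contrasted above with going through a full texture update call.]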
Ryan Shrout: It seems like AMD has been doing a lot of talking about the future of the APU, the CPU and GPU combination.
John Carmack: It is important to separate a little bit. The current generation Fusion parts are really a separate CPU and separate GPU connected on the die by better or worse interconnects, but their vision is integrating them much more tightly such that they share cache hierarchies, address space, and page tables. I think it's almost a foregone conclusion that it's going to be the dominant architecture in the marketplace because there are these strong forces about how we are getting more shrinks on the dies, [and] we can stick more things on there. It’s going to just pay to integrate that, and you aren't going to be able to put as many transistors towards that if it's a dedicated chip. You'll pick up a lot from this tight integration with cost benefits. Intel is also doing a lot better with their integrated graphics parts, once the butt of jokes, but they've taken a couple of steps now which are fully competent parts. The drivers still aren't very well tuned, the bandwidth is not great yet, and [neither is] total raw performance, but they are pretty much at feature parity barring some quality issues on there. It's close and they get better each time. This will be one of those things that will sneak up on people where the stuff that they got free with their CPU is all of a sudden good enough to run the games they want to play. I have high hopes that because it is all integrated memory, Intel will be able to lead the way with surfacing and direct access. That will give them the opportunity to sometimes take console developers who are used to this lower level access and maybe have something (shock of shocks) run better on Intel’s Integrated Graphics part than the much more expensive NVIDIA or AMD card that has all the layers of driver overhead.
Ryan Shrout: Does that lead us to a single platform across consoles and PCs? Where future consoles may be this next iteration in 2013 where there is one processor across all types of devices, or at least where you get the same type of access across all devices?
John Carmack: It's interesting, if you asked us a couple of years ago (when we surveyed the field) there was this thought that “would Intel's Larrabee be the platform to rule them all?” Of course, it turned out to not meet performance expectations. It was an interesting architecture, but a lot of us were cautioning from the very beginning to not underestimate the fixed functionality the GPU provides them. If people want to draw polygons made out of vertexes containing fragments, then hardware that's kind of built around that is going to outperform hardware of a more general purpose nature. I would be very surprised if all the major platforms came out with a common processor-GPU architecture. I think it's obvious that there will be at least strong contenders with ARM cores mixed in with conventional GPUs. I hope that they don’t try to push Cell architectures again because there are a lot of reasons why peak performance is great... and there are cases in Rage where a PS3, if you are in an instance level where you have a lot of main memory (so a lot of things are cached in RAM), the extra horsepower of the Cells will let them transcode it into graphics memory much faster than the 360 and faster than all but the most extreme PCs, but it's a pain in the ass to take advantage of it that way. For the cases when we didn't have all that buffered in memory, it winds up being more problematic.
Ryan Shrout: You mentioned ARM. Do you think that architecture is capable of the kind of performance necessary for consoles and PCs?
John Carmack: NVIDIA is basically reimplementing their very own custom ARM architecture. There is a big distinction between an instruction set or architecture specification and the physical implementations of it. Heck, x86 spans a huge range of capability. There is no doubt that ARM can handle a lot of that. What I would worry about a lot is that they are just beginning to push through their 64 bit transitions and any next gen consoles need to [be] 64 bit platforms. Of course, the PS3 is 64 bit in many ways but it cuts off the upper half of the address space in a very strange decision making process. We don’t know how ARM is going to fare as 64 bit processors, but lots of companies have been through all of this. Apple has a ton of experience migrating across all of that, and I’m sure that they are working closely. How closely Apple, ARM, and NVIDIA are working together is a different thing, and is an open question. ARM is more likely to make a dent in it just because [of] the x86 business model where desktop CPUs cost so much more... if you have PowerPC, MIPS, or especially ARM, it seems likely ARM has the better ecosystem and there would be sensible reasons to go with that.
Ryan Shrout: A couple of software questions now. A couple of years ago when we last talked we discussed ray tracing. At that time you didn’t think it was going to overtake anything in terms of rasterization or actual engine use. You were more interested in ray tracing for data structure accessing. Has anything changed?
John Carmack: I spent quite a bit of time on ray tracing in the last year and a half or so. I spent a bunch of time with OpenCL to figure out what I could do with that to make a real ray tracing engine. I made some really cool stuff. There are a few experiments I wanted to get results on. One was trying to say “well, what if you had a hybrid engine and only ray traced the specular reflections on there, how would that look?” I did that, and it turns out ray tracing off of a bump map surface looked god awful bad with one ray. You really need to throw at least 50 rays to make that look good. It was neat to have this overlaid mirrored surface in a lot of things but it clearly wasn’t a near-term function for this. Then, using it to replace rasterization, I got a somewhat mixed result where it was a little bit closer than I thought it might have been for tracing into a static scene. One of the things that I wanted to do that I’ve been trying to do for years was... with rasterization we aim for 60fps on Rage, but that means that we need to keep a cushion. We dynamically adjust our resolution, but we never get 100% utilization because we have to leave a little margin in case we mis-estimate.
I’ve always wanted to have a technology where you can just keep throwing rays, rasterizing, or something and when you are at 60fps you just use what you’ve got... so you can hard cap the stuff. I did a ray tracing engine where, on the system I was testing it on, a Fermi NVIDIA card, I could run a 720p ray trace at (in most of the cases) around 60fps, but in certain areas it would start falling off. I had it set up so that it could stop early, and have it be a lower resolution or have it include frames from somewhat further back so it turns into motion blur. I would do per-pixel jittering in both the spatial domain for anti-aliasing (AA) and the temporal domain for motion blur, and capping it in different ways. That worked really well, but the interesting thing is that when you see people make demos about certain things, it is not appreciated what a large gulf there is between a tech demo and what is going to be shown in a game. For example, I have this 60fps 720p ray tracing engine but it’s dealing with static models. It doesn’t have character animation or anything. It could spit out a depth buffer and spit out a hybrid engine to do that. Also, the fact that it runs 60fps at 720p for a ray tracer (seems great) but if you were just drawing that same thing with no fragment shaders using a traditional engine it would be running at 1,000fps. We do all this other stuff; once you throw your particles, post processing, and character animation... all that. That core part that has great detail and runs 20fps is a huge gulf between what can be part of a triple A engine.
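[Editor's note: the "keep throwing rays and use what you've got when time is up" idea can be sketched roughly as below. This is a simplified Python illustration, not Carmack's engine: the names `render_with_budget` and `edge` and the injectable `clock` parameter are assumptions for the example, and the clock is checked only between full passes, so the cap here is coarser than a true mid-frame stop. Each sample is jittered within the pixel (spatial) and within the frame interval (temporal).]

```python
import random
import time

def render_with_budget(width, height, shade, budget_s,
                       clock=time.perf_counter, seed=0):
    """Accumulate jittered passes until the time budget is spent, then average.

    shade(sx, sy, t) returns a brightness for sub-pixel position (sx, sy)
    at time t in [0, 1) within the frame interval.
    """
    rng = random.Random(seed)
    sums = [0.0] * (width * height)
    passes = 0
    deadline = clock() + budget_s
    while passes == 0 or clock() < deadline:   # always complete at least one pass
        for y in range(height):
            for x in range(width):
                sx = x + rng.random()          # spatial jitter for anti-aliasing
                sy = y + rng.random()
                t = rng.random()               # temporal jitter for motion blur
                sums[y * width + x] += shade(sx, sy, t)
        passes += 1
    return [s / passes for s in sums], passes

def edge(sx, sy, t):
    """Hypothetical scene: a vertical edge sliding right during the frame."""
    return 1.0 if sx < 3.5 + t else 0.0

image, passes = render_with_budget(8, 8, edge, budget_s=0.005)
print(f"{passes} passes accumulated within the budget")
```

[On a fast machine the loop fits many passes into the budget and the edge pixels converge to smooth values; on a slow one it returns a noisier image after fewer passes, but still returns on time.]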
On the other hand, we have converted all of our offline processing stuff to ray tracing. For years, the back-end MegaTexture generation for Rage was done with... we had a GPGPU cluster with NVIDIA cards and it was such a huge pain to keep. It was an amazing pain where one system would be having heat problems and would be behaving weird even though we thought they had identical drivers. Something would always be wrong with render farm 12, and whenever we wanted to put in new features it was like “Okay, writing new fragment programs to go into this.” Now, granted, I did this just when CUDA was in its infancy. If I did re-implement it with OpenCL or CUDA we wouldn’t have some of these problems, but when I converted all these over to ray tracing there were a number of things that got a lot better. Things that we deal with, [for example] shadows and reflections that have to be approximated, and [that] we're so used to doing with rasterization... we sometimes forget how big of hacks these are. To be able to say I really just want that ray, and tell me what it hit; not do a projection with feathering shadowed edges and whatever the heck else we’re doing there, so much of the code got so much easier. If it’s a choice of... now that we have these awesome multi-core x86 CPUs where we can get 24 threads in commodity boxes... it’s true that one GPU card can do more ray tracing than one 24 thread x86 box, but it’s not multiples more and if it’s just a matter of buying more $2000 boxes, it makes the development, maintenance, and upkeep much better. While everyone in high performance computing is all “rah-rah” GPUs right now, I’ve come full circle back around to saying the fact that we can get massive amounts of x86 cores and threads... it won't win on FLOPS/watt or FLOPS/volume, but in terms of results per developer hour it is much, much better.
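[Editor's note: the "I really just want that ray, and tell me what it hit" style of shadow query can be sketched in a few lines. This Python example is an illustration only, not id's offline tool: the scene is a hypothetical list of spheres, and the names `ray_hits_sphere` and `in_shadow` are made up for the sketch. It assumes the shaded point and the light are not at the same position.]

```python
import math

def ray_hits_sphere(origin, direction, center, radius, max_t):
    """True if origin + t*direction hits the sphere for some t in (eps, max_t).

    direction must be unit length.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * v for d, v in zip(direction, oc))      # direction . (origin - center)
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c                                   # discriminant of t^2 + 2bt + c = 0
    if disc < 0.0:
        return False                                   # ray misses the sphere entirely
    root = math.sqrt(disc)
    eps = 1e-6                                         # avoid re-hitting the start surface
    return any(eps < t < max_t for t in (-b - root, -b + root))

def in_shadow(point, light, spheres):
    """Cast one ray from point toward light; any blocker before the light shadows it."""
    to_light = [l - p for l, p in zip(light, point)]
    dist = math.sqrt(sum(v * v for v in to_light))
    direction = [v / dist for v in to_light]
    return any(ray_hits_sphere(point, direction, c, r, dist) for c, r in spheres)

light = (0.0, 10.0, 0.0)
blockers = [((0.0, 5.0, 0.0), 1.0)]                    # one sphere between origin and light
print(in_shadow((0.0, 0.0, 0.0), light, blockers))     # occluded point
print(in_shadow((5.0, 0.0, 0.0), light, blockers))     # point with a clear path
```

[There is no projection, shadow map resolution, or edge feathering to tune here; the query is exact for the geometry it is given, which is the simplification described above.]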
Ryan Shrout: There are different types of efficiency then?
John Carmack: Yeah, and when you are developing it is perfectly fine for us to spend hundreds of thousands of dollars in our back room to go ahead and make things better for that. It is a different question where, if every consumer has some graphics processor of some kind, we are going to do whatever it takes to max that out because you can’t tell everybody to go out and buy lots of x86 boxes. It’s interesting, the difference between the design decisions you make for a consumer target versus a development target.