Digging in a Little Deeper into the DiRT
Over the past few weeks I have had the chance to play the early access "DiRT Rally" title from Codemasters. This is a much more simulation based title that is currently PC only, which is a big switch for Codemasters and how they usually release their premier racing offerings. I was able to get a hold of Paul Coleman from Codemasters and set up a written interview with him. Paul's answers will be in italics.
Who are you, what do you do at Codemasters, and what do you do in your spare time away from the virtual wheel?
Hi my name is Paul Coleman and I am the Chief Games Designer on DiRT Rally. I’m responsible for making sure that the game is the most authentic representation of the sport it can be, I’m essentially representing the player in the studio. In my spare time I enjoy going on road trips with my family in our 1M Coupe. I’ve been co-driving in real world rally events for the last three years and I’ve used that experience to write and voice the co-driver calls in game.
If there is one area that DiRT has really excelled at is keeping frame rate consistent throughout multiple environments. Many games, especially those using cutting edge rendering techniques, often have dramatic frame rate drops at times. How do you get around this while still creating a very impressive looking game?
The engine that DiRT Rally has been built on has been constantly iterated on over the years and we have always been looking at ways of improving the look of the game while maintaining decent performance. That together with the fact that we work closely with GPU manufacturers on each project ensures that we stay current. We also have very strict performance monitoring systems that have come from optimising games for console. These systems have proved very useful when building DiRT Rally even though the game is exclusively on PC.
How do you balance out different controller use cases? While many hard core racers use a wheel, I have seen very competitive racing from people using handheld controllers as well as keyboards. Do you handicap/help those particular implementations so as not to make it overly frustrating to those users? I ask due to the difference in degrees of precision that a gamepad has vs. a wheel that can rotate 900 degrees.
Again this comes back to the fact that we have traditionally developed for console where the primary input device is a handheld controller. This is an area that other sims don’t usually have to worry about but for us it was second nature. There are systems that we have that add a layer between the handheld controller or keyboard and the game which help those guys but the wheel is without a doubt the best way to experience DiRT Rally as it is a direct input.
NCSU Researchers Tweak Core Prefetching And Bandwidth Allocation to Boost Mult-Core Performance By 40%
Subject: Processors | May 27, 2011 - 03:26 PM | Tim Verry
Tagged: processor, multi-core, efficiency, bandwidth, algorithm
With the clock speed arms race now behind us, the world has turned to increases in the number of processor cores to boost performance. As more applications are becoming multi-threaded, CPU core increases have become even more important. In the consumer space, quad and hexa-core chips are rather popular in the enthusiast segment. On the server side, eight core chips provide extreme levels of performance.
The way that most multi-core processors operate involves the various CPU cores having access to their own cache (Intel’s current gen chips actually have three levels of cache, with the third level being shared between all cores. This specific caching system; however, is beyond the scope of this article). This cache is extremely fast and keeps the processing core(s) fed with data which the processor then feeds through its assembly line-esque instruction pipeline(s). The cache is populated with data through a method called “prefetching.” This prefetching pulls data of running applications from RAM using mathematical algorithms to determine what the processor is likely going to need to process next. Unfortunately, while these predictive algorithms are usually correct, they sometimes make mistakes and the processor is not fed with data from the cache and thus must look for it elsewhere. These instances, called stalls, can severely degrade core performance as the processor must reach out past the cache and into the system memory (RAM), or worse, the even slower hard drive to find the data it needs. When the processor must reach beyond its on-die cache, it is required to use the system bus to query the RAM for data. This processor to RAM bus, while faster than reading from a disk drive, is much slower than the cache. Further, processors are restricted in the amount of available bandwidth between the CPU and the RAM. As the number of included cores increases, the amount of shared bandwidth each core has access to is greatly reduced.
The layout of a current Sandy Bridge Intel processor. Note the Cache and Memory I/O.
A team of researchers at North Carolina State University have been studying the above mentioned issues, which are inherent in multi-core processors. Specifically, the research team was part of North Carolina State University’s Department of Electrical and Computer Engineering, and includes Fang Liu and Yan Solihin who were funded in part by the National Science foundation. In a paper concluding their research that will be presented June 9th, 2011 at the international Conference on Measurement and Modeling of Computer Systems, they detail two methods for improving upon the current bandwidth allocation and cache prefetching implementations.
Dr. Yan Solihin, associate professor and co-author of the paper in question stated that certain processor cores require more bandwidth than others; therefore, by dynamically monitoring the type and amount of data being requested by each core, the amount of bandwidth available can be prioritized by a per-core basis. Solihin further stated that “by better distributing the bandwidth to the appropriate cores, the criteria are able to maximize system performance.”
Further, they have analyzed the data of the processors hardware counters and constructed a set of criteria that seek to improve efficiency by dynamically turning prefetching on and off on a per-core basis. By turning prefetching on and off on a per core basis, this further provides bandwidth to the cores that need it. By implementing both methods, the research team was able to improve multi-core performance by as much as 40 percent versus chips that do not prefetch data, and by 10 percent versus multi-core processors with cores that do prefetch data.
The researchers plan to detail their findings in a paper titled “Studying the Impact of Hardware Prefetching and Bandwidth Partitioning In Chip-Multiprocessors,” which will be publicly available on June 9th. The exact algorithms and criteria that they have determined will decrease the number of processor stalls and increase bandwidth efficiency will be extremely interesting to analyze. Further, it will be interesting to see if any of these improvements will be implemented by Intel or AMD in their future chips.