It is common knowledge that computing power steadily improves over time as dies shrink to smaller processes, clock rates increase, and processors do more and more in parallel. One thing people might not consider: how fast is the architecture itself? Think of computing as a factory. You can speed up the conveyor belt and you can add more assembly lines, but just how fast are the individual workers? There are many ways to increase the efficiency of a CPU: from tweaking the most common instructions, or adding new instruction sets, so the task itself can be simplified; to tuning the pipeline depth for the proper balance between keeping the CPU constantly fed with upcoming instructions and having to dump and reload the pipe when it goes the wrong way down an IF/ELSE statement. Tom’s Hardware wondered exactly this and tested a variety of processors released since 2005, with settings modified so that each could use only one core, clocked at 3 GHz. Can you guess which architecture failed the most miserably?
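For the curious, here is a minimal sketch in C of the idea behind that kind of test: hold everything else fixed, then count how many clock cycles a fixed workload takes. The loop body, the iteration count, and the use of __rdtsc() are illustrative assumptions on my part, not what Tom’s Hardware actually ran; the sketch assumes an x86 CPU and GCC or Clang, and note that on newer chips the timestamp counter ticks at a constant rate rather than tracking the actual core clock.

```c
/* Crude sketch of "work per clock": run a fixed workload and count
   elapsed timestamp-counter cycles. Hypothetical example, assuming
   x86 with GCC/Clang (for __rdtsc from x86intrin.h). */
#include <stdio.h>
#include <x86intrin.h>

#define ITERS 100000000UL

int main(void)
{
    /* volatile keeps the compiler from optimizing the loop away */
    volatile unsigned long acc = 0;

    unsigned long long start = __rdtsc();
    for (unsigned long i = 0; i < ITERS; i++)
        acc += i;
    unsigned long long cycles = __rdtsc() - start;

    /* Cycles per iteration is a rough per-clock efficiency figure:
       a better architecture retires the same work in fewer cycles. */
    printf("%.2f cycles per iteration\n", (double)cycles / ITERS);
    return 0;
}
```

Pinning the run to a single core (for example, `taskset -c 0` on Linux) and disabling frequency scaling gets you closer to the one-core, fixed-clock conditions of the benchmark.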

Pfft, who says you ONLY need a calculator?

(Image from Intel)

The Netburst architecture was designed to reach very high clock rates at the expense of heat, and, as it turned out, performance. At the time, the race between Intel and its competitors was all about clock rate: the higher the clock, the better for the marketers, despite a 1.3 GHz Athlon wrecking a 3.2 GHz Celeron in actual performance. If you are in the mood for a little chuckle, this marketing strategy fell apart when AMD decided to name its processors “Athlon XP 3200+” and so forth rather than by their actual clock rate. One of the major reasons Netburst was so terrible was branch prediction. Pipelining is a method of loading multiple upcoming instructions into a processor to keep it constantly working. Branch prediction is a strategy for keeping that pipeline full: when the code reaches a conditional jump, such as “if this is true do that, otherwise do this,” the processor does not know for sure what will come next, so it says “I think I’ll go down this branch” and loads the pipeline assuming that is true; if the guess is wrong, it has to dump the pipeline and correct the mistake. One way the Netburst-based Pentium 4 kept clock rates high was a ridiculously deep pipeline, two to four times the depth of the first-generation Core 2 parts that replaced it; unfortunately, the Pentium 4’s branch prediction was terrible, leaving the processor perpetually stuck flushing its pipeline.
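To see how badly an unpredictable branch can hurt, here is a minimal sketch in C (a toy example of my own, not anything from the article): the same conditional sum runs over random data, where the branch flips unpredictably, and over sorted data, where the predictor guesses almost perfectly. Compile at a low optimization level such as -O1, since at higher levels the compiler may turn the branch into a branchless conditional move and hide the effect.

```c
/* Toy demonstration of branch misprediction cost: the same loop over
   random data (unpredictable branch) vs. sorted data (predictable branch).
   Timings will vary by CPU; this is a sketch, not a rigorous benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)
#define PASSES 100

/* Sum every element >= 128; the "if" is the branch the predictor must guess. */
static long long conditional_sum(const unsigned char *data, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= 128)   /* taken ~50% of the time on random data */
            sum += data[i];
    }
    return sum;
}

static int cmp(const void *a, const void *b)
{
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

int main(void)
{
    unsigned char *data = malloc(N);
    for (size_t i = 0; i < N; i++)
        data[i] = rand() & 0xFF;

    /* Random data: the branch outcome is essentially a coin flip, so
       mispredictions (and the pipeline flushes they cause) dominate. */
    long long sum = 0;
    clock_t t0 = clock();
    for (int p = 0; p < PASSES; p++)
        sum += conditional_sum(data, N);
    double random_secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Sorted data: the branch goes "not taken, not taken, ... taken, taken",
       a pattern any reasonable predictor learns almost perfectly. */
    qsort(data, N, 1, cmp);
    t0 = clock();
    for (int p = 0; p < PASSES; p++)
        sum += conditional_sum(data, N);
    double sorted_secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("random: %.2fs  sorted: %.2fs  (checksum %lld)\n",
           random_secs, sorted_secs, sum);
    free(data);
    return 0;
}
```

The deeper the pipeline, the more speculative work gets thrown away on every wrong guess, which is exactly why Netburst’s combination of a very long pipeline and weak prediction was so painful.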

The sum of all tests… at least time-based ones.

(Image from Tom’s Hardware)

Now that we have excavated Intel’s skeletons and aired them out, it is time to bury them again and look at the more recent results. On the AMD side of things, there does not appear to have been much innovation in per-clock efficiency; AMD is only now getting within range of the architectural efficiency Intel had back in 2007 with the first introduction of Core 2. Obviously, efficiency per core per clock means little in the real world, since it tells you neither about the raw performance of a part nor about its power efficiency. Still, it is interesting to see how big a leap Intel made when it walked away from its turkey of an architecture, Netburst, and modeled the future on the Pentium III and Pentium M architectures. Lastly, it is worth noting just how much work went into Sandy Bridge: despite an already large lead, and a focus reaching beyond the x86 mindset, Intel still tightened up its x86 architecture by a very visible margin. It might not be as dramatic as the abandonment of the Pentium 4, but it is laudable in its own right.