Server and Workstation Upgrades

Intel is launching the Xeon E5-2600 v3 with up to 18 Haswell cores.

Today, on the eve of the Intel Developer Forum, the company is taking the wraps off its new server and workstation class high performance processors, the Xeon E5-2600 v3. Previously known by the code name Haswell-EP, this release brings Intel's latest microarchitecture to multi-socket infrastructure. Though we don't yet have hardware in house for benchmarks, the details Intel shared with me last month in Oregon are simply stunning.

Starting with the E5-2600 v3 processor overview, there are more changes in this product transition than we saw in the move from Sandy Bridge-EP to Ivy Bridge-EP. First and foremost, the v3 Xeons will be available in core counts as high as 18, with Hyper-Threading allowing for 36 accessible threads in a single CPU socket. A new socket, LGA2011-v3 (also called Socket R3), allows the Xeon platforms to run a quad-channel DDR4 memory system, very similar to the upgrade we saw with the Haswell-E Core i7-5960X processor we reviewed just last week.

The move to a Haswell-based microarchitecture also means that the Xeon line of processors is getting AVX 2.0, also known as the Haswell New Instructions, allowing for 2x the FLOPS per clock per core. It also introduces some interesting changes to Turbo Mode and power delivery that we'll discuss in a bit.
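
To put that "2x the FLOPS per clock" claim in concrete terms, here is a minimal sketch of the kind of code that benefits: a fused multiply-add over eight single-precision floats per instruction, using the FMA instructions that arrive alongside AVX2 on Haswell. The function and variable names are illustrative, not from Intel's materials.

```c
#include <immintrin.h>  /* AVX2 / FMA intrinsics */

/* Illustrative kernel: y[i] += a * x[i], eight floats per iteration.
 * On Haswell, _mm256_fmadd_ps retires a multiply and an add as one
 * fused operation, which is where the doubled FLOPS-per-clock comes from.
 * Build with: gcc -O2 -mavx2 -mfma axpy.c
 */
void axpy8(float a, const float *x, float *y, int n)
{
    __m256 va = _mm256_set1_ps(a);
    for (int i = 0; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);   /* vy = va * vx + vy */
        _mm256_storeu_ps(y + i, vy);
    }
    for (int i = n - (n % 8); i < n; i++)   /* scalar tail */
        y[i] += a * x[i];
}
```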

Maybe the most interesting architectural change in the Haswell-EP design is per-core P-states, allowing each of the up to 18 cores on a single Xeon processor to run at independent voltages and clocks. This is something the consumer variants of Haswell do not currently support; every core is tied to the same P-state. It turns out that when you have up to 18 cores on a single die, this ability is crucial to sustaining maximum performance across a wide array of compute workloads while maintaining power efficiency. This is also the first processor to allow independent uncore frequency scaling, giving Intel the ability to improve performance with available headroom even if the CPU cores aren't the bottleneck.
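
As a rough illustration of what per-core P-states look like from the software side, the sketch below polls each core's current frequency through Linux's cpufreq sysfs interface; on a Haswell-EP box with independent P-states you would expect busy and idle cores to report different values. This assumes a Linux host that exposes /sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq.

```c
#include <stdio.h>
#include <unistd.h>

/* Print the current operating frequency of every logical CPU.
 * With per-core P-states (Haswell-EP), loaded cores can report a
 * different frequency than idle ones; on client Haswell all cores
 * share a single P-state and the values move together.
 */
int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    for (long cpu = 0; cpu < ncpus; cpu++) {
        char path[128];
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%ld/cpufreq/scaling_cur_freq", cpu);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                 /* cpufreq not exposed for this core */
        long khz = 0;
        if (fscanf(f, "%ld", &khz) == 1)
            printf("cpu%-3ld %ld MHz\n", cpu, khz / 1000);
        fclose(f);
    }
    return 0;
}
```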

QPI speeds get a slight upgrade on the platform as well, increasing available bandwidth between sockets in multi-processor systems. TDPs are raised too, but remain within 10-15 watts of the previous generation, so it's likely that not much redevelopment will be required by vendors to support the new Xeon family.

I won't spend too much time here; there are 22 different SKUs of the Xeon E5-2600 v3 being launched today, ranging from a quad-core 3.0 GHz part to the 18-core E5-2699 v3 with a clock speed of 2.3 GHz. For low power environments there is a 55 watt option with 8 cores running at 1.8 GHz. Expect pricing to vary dramatically throughout the line as well.

Intel has built three chips for Haswell-EP with varying core count options. All are fabricated on Intel's 22nm tri-gate transistor technology: one die addresses 4-8 core processors, another addresses 6-12 core configurations, and a third covers 14-18 core configurations. Intel has left itself some overlap on the 6 and 8 core processors to bin and sort accordingly. These chips are BIG:

  • High Core Count
    • 5.56 billion transistors
    • 661 mm2 die size
  • Medium Core Count
    • 3.83 billion transistors
    • 483 mm2 die size
  • Low Core Count
    • 2.60 billion transistors
    • 354 mm2 die size

This table offers a high-level overview of all the major changes found in the v3 revision of the Xeon E5-2600 and the real-world benefits of the technologies. For example, the on-die bus has been updated to include two fully buffered rings, a necessary addition to support the extreme core counts launching today. The QPI interface frequency increase improves multi-socket coherence performance, and Last Level Cache (LLC) changes reduce latency and increase bandwidth.

A comparison of the Xeon E5-2600 v2 and v3 internal architectures demonstrates the necessity of the buffered switches on the two ring buses. Ivy Bridge-EP stretched to 12 cores, but the move to 18 cores requires some updated communication protocols. It is also interesting to note that many of the products will feature "unbalanced" dies, where there are more cores on one ring bus than on the other. Intel assured us that these differences are very, very minimal and should in no way affect per-thread performance.

Crypto algorithms see a sizeable performance gain with the jump to AVX 2.0, even compared to SNB and IVB.
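
Software that wants those gains typically checks for AVX2 at runtime and dispatches to an optimized code path, since the same binary may land on pre-Haswell Xeons. Below is a minimal sketch using GCC/Clang's builtin CPU detection; the digest functions are hypothetical placeholders, not a real crypto library API.

```c
#include <stdio.h>

/* Hypothetical stand-ins for a scalar and an AVX2-optimized digest.
 * In a real library these would be separately compiled code paths.
 */
static unsigned int digest_generic(const unsigned char *b, unsigned long n)
{
    unsigned int h = 2166136261u;             /* simple FNV-1a placeholder */
    while (n--) h = (h ^ *b++) * 16777619u;
    return h;
}
static unsigned int digest_avx2(const unsigned char *b, unsigned long n)
{
    return digest_generic(b, n);              /* stub: imagine an AVX2 kernel here */
}

typedef unsigned int (*digest_fn)(const unsigned char *, unsigned long);

/* Pick a code path once at startup based on what the CPU supports. */
static digest_fn select_digest(void)
{
    __builtin_cpu_init();                     /* GCC/Clang CPU feature probe */
    return __builtin_cpu_supports("avx2") ? digest_avx2 : digest_generic;
}

int main(void)
{
    const unsigned char msg[] = "Haswell-EP";
    digest_fn digest = select_digest();
    printf("digest = %08x (AVX2 path: %s)\n",
           digest(msg, sizeof(msg) - 1),
           __builtin_cpu_supports("avx2") ? "yes" : "no");
    return 0;
}
```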

But the AVX performance does come at a cost: because of the increased power draw when the cores are heavily utilized by AVX instructions, clock speeds are going to be lower. These processors will now have a rated base and Turbo speed as well as an AVX base frequency and an AVX Turbo frequency.

Resulting frequencies will depend on the utilization levels of the AVX code. In the example slide with the 18-core E5-2699 v3, the AVX base clock of 1.9 GHz will extend up to 2.6 GHz for "most" AVX workloads. If you are running an application with heavy AVX code, you might be limited to 2.2 GHz or lower. Obviously the efficiency improvements you get with AVX code will more than make up for the clock speed differences.
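
To see why the lower AVX clock still comes out ahead, here is a back-of-the-envelope comparison using the figures above; the per-cycle FLOPS numbers (16 double-precision FLOPS per cycle with two 256-bit FMA units versus roughly 4 per cycle for older 128-bit SSE code without FMA) are my assumptions about peak throughput, not figures from Intel's slides.

```c
#include <stdio.h>

/* Rough peak-throughput comparison for the 18-core E5-2699 v3.
 * Assumptions (mine, not Intel's): an AVX2/FMA kernel retires up to
 * 16 double-precision FLOPS per core per cycle, while 128-bit SSE
 * code without FMA manages about 4.
 */
int main(void)
{
    const double cores = 18.0;
    const double avx_heavy_ghz = 2.2;   /* heavy-AVX clock cited above      */
    const double base_ghz      = 2.3;   /* rated non-AVX base clock         */

    double avx_gflops = cores * avx_heavy_ghz * 16.0;  /* ~634 GFLOPS peak */
    double sse_gflops = cores * base_ghz      *  4.0;  /* ~166 GFLOPS peak */

    printf("AVX2/FMA at %.1f GHz: %.0f GFLOPS peak\n", avx_heavy_ghz, avx_gflops);
    printf("SSE at %.1f GHz:      %.0f GFLOPS peak\n", base_ghz, sse_gflops);
    return 0;
}
```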

Along with the new series of processors comes some new platform technology as well. The C612 chipset shares nearly identical specs with the X99 chipset launched alongside the consumer Haswell-E platform this month: 10 SATA 6G ports, 6 USB 3.0 ports, 8 USB 2.0 ports, and up to 8 lanes of PCIe 2.0.

This chipset has support for two-socket systems but still connects to the primary processor through DMI, which is a bit of a bandwidth limiting factor.

For a workstation or server builder in the 2S market, Haswell-EP offers an unmatched combination of performance and features. With 40 lanes of PCI Express 3.0 from EACH processor, there is plenty of room for accelerator cards (GPUs, Xeon Phi) to be included, and of course you can pair the platform with Intel's latest Fortville network controllers for 40 GbE connectivity.

For small-scale server or workstation buyers looking for optimal performance in tasks like video editing or rendering, the combination of a high core count Xeon E5-2600 v3 processor and the C612 chipset should be a screamer. Internally here at PC Perspective, building a system with 36 processing cores and 72 processing threads (dual E5-2699 v3 CPUs) is dream-worthy, likely cutting work times for some tasks by several times. The only real hiccup is that current Windows operating systems schedule threads in processor groups of at most 64 logical processors, meaning 8 of the 72 threads would go underutilized in that build unless the software is group-aware.
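
For the curious, Windows exposes that 64-logical-processor limit through processor groups; a dual E5-2699 v3 box shows up as two groups, and a thread only runs inside one group unless the application explicitly sets a group affinity. Here is a quick sketch of how a group-aware application would enumerate them (Windows 7 / Server 2008 R2 or later):

```c
#include <windows.h>
#include <stdio.h>

/* Enumerate processor groups on a many-core Windows system.
 * A process starts in a single group, so a legacy application on a
 * 72-thread dual E5-2699 v3 box would only use one group's worth of
 * logical processors unless it calls SetThreadGroupAffinity.
 */
int main(void)
{
    WORD groups = GetActiveProcessorGroupCount();
    DWORD total = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);

    printf("%u processor group(s), %lu logical processors total\n",
           groups, (unsigned long)total);
    for (WORD g = 0; g < groups; g++)
        printf("  group %u: %lu logical processors\n",
               g, (unsigned long)GetActiveProcessorCount(g));
    return 0;
}
```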

I am hoping to get my hands on some of this hardware after IDF this week to really put it to the test. I realize that much of the target audience for processors like the Xeon E5-2600 v3 is beyond the scope of what we usually cover (HPC, comms servers, etc.), but the performance metrics to be gathered would be impressive. It's hard to even remember when it began, but Intel's dominance in the high performance server market continues for yet another generation.