IBM Prepares Power9 CPUs to Power Servers and Supercomputers In 2018

Subject: Processors | September 2, 2016 - 01:39 AM |
Tagged: IBM, power9, power 3.0, 14nm, global foundries, hot chips

Earlier this month at the Hot Chips symposium, IBM revealed details on its upcoming Power9 processors and architecture. The new chips are aimed squarely at the data center and will be used for massive number crunching in big data and scientific applications in servers and supercomputer nodes.

Power9 is a big play from Big Blue, and will help the company expand its presence in the Intel-ruled datacenter market. Power9 processors are due out in 2018 and will be fabricated at Global Foundries on a 14nm HP FinFET process. The chips feature eight billion transistors and utilize an “execution slice microarchitecture” that lets IBM combine “slices” of fixed point, floating point, and SIMD hardware into cores that support various levels of threading. Specifically, 2 slices make an SMT4 core and 4 slices make an SMT8 core. IBM will have Power9 processors with 24 SMT4 cores or 12 SMT8 cores (more on that later). Further, Power9 is IBM’s first processor to support its Power ISA 3.0 instruction set.

According to IBM, its Power9 processors are between 50% and 125% faster than the previous generation Power8 CPUs, depending on the application tested. The performance improvement comes from a doubling of the number of cores as well as a number of smaller improvements, including:

  • A 5 cycle shorter pipeline versus Power8
  • A single instruction random number generator (RNG)
  • Hardware assisted garbage collection for interpreted languages (e.g. Java)
  • New interrupt architecture
  • 128-bit quad precision floating point and decimal math support
    • Important for finance and security markets, massive databases and money math.
    • IEEE 754
  • CAPI 2.0 and NVLink support
  • Hardware accelerators for encryption and compression
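
To see why decimal math support matters for "money math," here is a minimal Python sketch. It uses the standard library's software `decimal` module, not Power9 hardware, and the amounts are made up for illustration. Binary floating point cannot represent 0.10 exactly, so repeated additions drift, while decimal arithmetic stays exact.

```python
from decimal import Decimal

# Hypothetical example: credit ten cents to an account ten times.
binary_total = sum([0.10] * 10)              # IEEE 754 binary64 floats
decimal_total = sum([Decimal("0.10")] * 10)  # decimal arithmetic (done in software here)

# The binary total misses 1.0 by a rounding error; the decimal total is exact.
print(binary_total == 1.0)
print(decimal_total == Decimal("1.00"))
```

Hardware decimal support aims to give the exactness of the second line without the speed penalty of doing it in software.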

The Power9 processor features 120 MB of direct attached eDRAM that acts as an L3 cache (256 GB/s). The chips offer up to 7TB/s of aggregate fabric bandwidth, which certainly sounds impressive, but that number is everything added together. That said, there is a lot going on under the hood. Power9 supports 48 lanes of PCI-E 4.0 (2 GB/s per lane per direction) and 48 proprietary 25Gbps accelerator lanes, which will be used for NVLink 2.0 to connect to NVIDIA GPUs as well as to connect to FPGAs, ASICs, and other accelerators or new memory technologies using CAPI 2.0 (Coherent Accelerator Processor Interface). It also has four 16Gbps SMP links (NUMA) used to combine four quad socket Power9 boards into a single 16 socket “cluster.”
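
For a rough sense of scale, the per-lane figures above can be multiplied out. This is a back-of-the-envelope sketch only: the variable names are illustrative, and it assumes 8 bits per byte with no encoding or protocol overhead, which real links will not achieve.

```python
# Figures quoted in the article; names are illustrative, not IBM's.
PCIE4_LANES = 48
PCIE4_GBYTES_PER_LANE = 2    # GB/s per lane, per direction
ACCEL_LANES = 48
ACCEL_GBITS_PER_LANE = 25    # Gbps per lane

# Total PCI-E 4.0 bandwidth in one direction.
pcie_gbytes = PCIE4_LANES * PCIE4_GBYTES_PER_LANE

# Raw accelerator-lane bandwidth, converted from gigabits to gigabytes
# (assumption: 8 bits per byte, overhead ignored).
accel_gbytes = ACCEL_LANES * ACCEL_GBITS_PER_LANE / 8

print(f"PCI-E 4.0: {pcie_gbytes} GB/s per direction")
print(f"Accelerator lanes: {accel_gbytes:.0f} GB/s raw")
```

Both totals are a small fraction of the 7TB/s aggregate figure, which fits the point that the headline number adds up every fabric on the chip.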

These are processors that are built to scale and tackle big data problems. In fact, not only is Google interested in Power9 to power its services, but the US Department of Energy will be building two supercomputers using IBM’s Power9 CPUs and NVIDIA’s Volta GPUs. Summit and Sierra will offer between 100 and 300 Petaflops of compute power and will be installed at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory respectively. There, the projects they will tackle include letting researchers visualize the internals of a virtual light water reactor, researching methods to improve fuel economy, and delving further into bioinformatics research.

The Power9 processors will be available in four variants that differ in the number of cores and the number of threads each core supports. The chips are broken down into Power9 SO (Scale Out) and Power9 SU (Scale Up), and each group has two processors depending on whether you need a greater number of weaker cores or a smaller number of more powerful cores. Power9 SO chips are intended for multi-core systems and will be used in servers with one or two sockets, while Power9 SU chips are for multi-processor systems with up to four sockets per board and up to 16 total sockets per cluster when four four-socket boards are linked together. Power9 SO uses DDR4 memory and supports a theoretical maximum of 4TB of memory (1TB with today’s 64GB DIMMs) and 120 GB/s of bandwidth, while Power9 SU uses IBM’s buffered “Centaur” memory scheme that allows the systems to address a theoretical maximum of 8TB of memory (2TB with 64GB DIMMs) at 230 GB/s. In other words, the SU series is Big Blue’s “big guns.”
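
The memory figures above imply a DIMM count and a future DIMM size, which the sketch below works out. Both are inferences from the quoted capacities rather than anything IBM stated, and the code assumes 1TB = 1024GB and that the theoretical maximum uses the same number of DIMMs as today's configuration.

```python
GB_PER_TB = 1024  # assumption: binary terabytes

def implied_memory_config(max_tb, today_tb, today_dimm_gb=64):
    """Infer DIMM count and the DIMM size needed to reach the theoretical maximum."""
    dimm_count = today_tb * GB_PER_TB // today_dimm_gb
    max_dimm_gb = max_tb * GB_PER_TB // dimm_count
    return dimm_count, max_dimm_gb

# Capacities quoted in the article for each family.
so_dimms, so_dimm_gb = implied_memory_config(max_tb=4, today_tb=1)
su_dimms, su_dimm_gb = implied_memory_config(max_tb=8, today_tb=2)

print(f"Power9 SO: {so_dimms} DIMMs, {so_dimm_gb}GB each at the maximum")
print(f"Power9 SU: {su_dimms} DIMMs, {su_dimm_gb}GB each at the maximum")
```

Under these assumptions both families need the same larger DIMM to hit their maximum, and the SU parts simply address twice as many of them.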

A photo of the 24 core SMT4 Power9 SO die.

Here is where it gets a bit muddy. The processors are further broken down by SMT4 or SMT8, and both Power9 SO and Power9 SU come in both options. There are Power9 CPUs with 24 SMT4 cores and there are CPUs with 12 SMT8 cores. IBM indicated that SMT4 (four threads per core) is suited to systems running Linux and virtualization, with an emphasis on high core counts. Meanwhile, SMT8 (eight threads per core) is a better option for large logical partitions (one big system versus partitioning out the compute cluster into smaller VMs as above) and running IBM’s Hypervisor. In either case (24 SMT4 or 12 SMT8) there is the same number of total threads, but you are able to choose whether you want fewer “stronger” threads on each core or more (albeit weaker) threads per core, depending on which your workloads are optimized for.
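
The core, thread, and slice arithmetic can be checked with a few lines of Python. This is only a sketch built from the numbers quoted in this article (2 slices per SMT4 core, 4 per SMT8 core); the dictionary and function names are made up for illustration.

```python
# Per-core figures quoted in the article.
SLICES_PER_CORE = {"SMT4": 2, "SMT8": 4}
THREADS_PER_CORE = {"SMT4": 4, "SMT8": 8}
CORES_PER_CHIP = {"SMT4": 24, "SMT8": 12}

def chip_totals(kind):
    """Return total cores, hardware threads, and execution slices for one chip."""
    cores = CORES_PER_CHIP[kind]
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE[kind],
        "slices": cores * SLICES_PER_CORE[kind],
    }

smt4 = chip_totals("SMT4")
smt8 = chip_totals("SMT8")
print(smt4)
print(smt8)
```

Both variants come out to the same thread and slice totals, so the two configurations regroup the same pool of execution hardware into more small cores or fewer large ones.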

Servers supporting Power9 are already under development by Google and Rackspace and blueprints are even available from the OpenPower Foundation. Currently, it appears that Power9 SO will emerge as soon as the second half of next year (2H 2017) with Power9 SU following in 2018 which would line up with the expected date for the Summit and Sierra supercomputer launches.

This is not a chip that will be showing up in your desktop any time soon, but it is an interesting high performance processor! I will be keeping an eye on updates from Oak Ridge lab hehe.

September 2, 2016 | 02:13 AM - Posted by Anonymous (not verified)

I was about to say that I want one, but I guess plugging a server with one of these into an internet connection with only 1Gb/s throughput would be considered abuse.

September 2, 2016 | 02:17 AM - Posted by quest4glory

I have quite a few customers who are still running POWER7+ and may not get into POWER8 until POWER9 ships. At least they're still going strong.

September 2, 2016 | 05:33 AM - Posted by Anonymous (not verified)

AMD just paid GloFo money so they could use another foundry. Polaris has shown its issues. Maybe that's why they are aiming for 2018, a year after Zen comes out, so Zen can work out the kinks. Typical AMD, foot the bill for everyone and get none of the rewards.

September 2, 2016 | 07:29 AM - Posted by jabbadap (not verified)

Zen will be built on GF's 14nm LPP FinFET process (licensed from Samsung).

IBM Power9 will be built on GF's 14nm FF HP process (RFSOI), which is IBM's own process sold to GF ~a year ago. There might be a clause in the contract which prevents GF from offering the 14nm HP process to anyone other than IBM. But in reality there are no products manufactured on that process yet. So we will see...

September 2, 2016 | 10:52 AM - Posted by Anonymous (not verified)

IBM, Samsung, and GF have been in a chip fab technology sharing/foundation for some years now but IBM's best chip fab IP is not given away it is licensed. GF got some IP to go along with the chip fabs and cash that IBM gave GF to take over IBM's Chip fabs, but a lot of the R&D/IP for IBM's in house use Power8s/Power9s is still under IBM's control, including some research fab capacity for R&D. IBM is one of the largest holders of patents/IP licensing on the planet, and IBM licenses its power8/power9 designs to third parties via the OpenPower foundation so expect there to be some affordable no IBM supplied power8/power9 server systems. Go read the AnandTech article on the Power8s and Power9's and see just what the power designs can do relative to Intel's x86 based SKUs.

Those SMT4 power9's with 24 cores are there to compete directly with Intel's Xeon SKUs. SMT4 has just about the best multi-threaded improvements overall for certain workloads, with diminishing returns for improved performance above SMT4, for 5 up to 8 processor threads per core, so for certain workloads that need to be more competitive with some of Intel’s Xeon SKUs those 24 core/SMT4 power9s will compete better with some of Intel’s Xeon offerings, overall the Power8’s are very competitive with Xeon SKUs, in fact Power8 outperforms Xeon on a lot of server benchmarks, so expect the power9 SKUs to take some market share, especially the third party licensed Power9’s from any licensees!

September 2, 2016 | 10:54 AM - Posted by Anonymous (not verified)

edit: no IBM
to: non IBM

September 2, 2016 | 05:47 AM - Posted by Anonymous (not verified)

"The Power9 processor features 120 MB of direct attached eDRAM that acts as an L3 cache (256 GB/s)."

More specifically, it seems to be 10 MB of cache for each 2 cores. It isn't a single large 120 MB cache and it is not on a separate chip. It is interesting that they went with eDRAM at level 3. It does have a large L2 though; 512k per core. I wouldn't think that eDRAM would be anywhere near as fast as SRAM, but that large of a cache as SRAM would have consumed a huge amount of die area and would also take a huge amount of power.

September 2, 2016 | 05:53 AM - Posted by Anonymous (not verified)

That Next Platform site has a terrible layout. On an iPad, the text only fills about a third of the screen along the right, and they seem to have disabled zooming. I really hate sites that do that. The text is often too small to read. I need an add-on to blacklist sites so I never go back to them by accident. I wouldn't mind having such a thing for sites that pop too much advertising also. If I get more than one thing that pops over the content and has to be closed, I am done with that website.

September 3, 2016 | 08:15 PM - Posted by Anonymous (not verified)

Confusing language.
Garbage collection is orthogonal to interpretation.
And this capability seems potentially useful for any memory allocator that zeros out newly allocated memory blocks or stack frames, not just garbage collected ones.

September 4, 2016 | 03:05 AM - Posted by Anonymous (not verified)

It would be nice if there is some consumer software to justify buying at least one of these.

September 4, 2016 | 11:52 AM - Posted by Anonymous (not verified)

You can buy a power8 server from one of the OpenPower licensees like Tyan, and they will probably be offering the SMT4 versions of the power9 for a more affordable price than IBM will offer for their versions of the power9. Hell I hope that someone gets/designs an 8 core SMT4 variant of the power9 and offers it up for the home server market. Nvidia could start offering home gaming servers with some SMT4 power9 variants and their GPUs for systems that run Linux/Vulkan. An 8 core SMT4 power9 variant would have 32 processor threads. As far as consumer software there are some good Linux based variants but it will be up to M$ to provide any windows version that could be run on the power9's ISA.

September 4, 2016 | 06:14 PM - Posted by quest4glory

They can run Linux, it's just a matter of porting and compiling or finding binary releases for POWER.

September 5, 2016 | 11:34 AM - Posted by Anonymous (not verified)

Watson would be nice bundle.

September 6, 2016 | 02:14 PM - Posted by Anonymous (not verified)

RH, and therefore Fedora, supports POWER through development efforts and offers packages through their internal build service (koji).
RH and IBM have a good working relationship and I know that RH has a ton of IBM big iron in their Westford lab.

September 6, 2016 | 02:22 AM - Posted by Anonymous (not verified)

Maybe a stupid question...

I know IBM has made Power CPUs with SMT for a long time, they're maybe the first to do SMT for mass production, and now they have SMT4 and SMT8 also, which mean 4 & 8 threads per core.. which is much more than 2 threads per core for x86...

I know developing SMT is very tricky and hard, and the more threads per core are required the more engineering is needed, and maybe our current desktop usage is not as multi-thread intensive as servers and supercomputers, so it's not worth the hassle for desktop usage...

I'm just asking about the feasibility of having more than two threads on x86, maybe Xeon and Opteron as these are meant for workstations and servers so they will use it... unless the workloads of these two will not benefit much...

November 3, 2016 | 01:38 AM - Posted by shaunheatheridge

Can be tricky sometimes. I go crazy at the start of every configuration. Recently, I bought a refurbished unit and had to research its proper firmware, and luckily all IBM updates and firmware are perfectly compatible.
