Microsoft Xbox Project Scorpio Whitepaper Leaked

Subject: Systems | January 24, 2017 - 10:30 PM |
Tagged: xbox one, xbox, Project Scorpio, microsoft

Digital Foundry received an Xbox Project Scorpio whitepaper from an anonymous source, but they were able to validate its authenticity. Basically, they sent it to their own, off-the-record sources who would have access to the same info, and those individuals confirmed it’s an official document that they’ve seen before. Of course, the trust bottlenecks through Digital Foundry, but they’re about as reputable as you can get in this industry, so that works.

Anywho, disclaimer aside, the whitepaper unveils a few interesting details about how Project Scorpio is expecting to provide higher performance. The most interesting change is what’s missing: the small, on-chip RAM (ESRAM). Microsoft claims that the higher global memory bandwidth removes the need to have it on Project Scorpio.

Digital Foundry is still a bit concerned that, while the 320 GB/s of bandwidth might be enough, the latency might be a concern for compatibility. Personally, I’m not too concerned. Modern GPUs use a lot of latency-hiding tricks, such as parking whole shaders at global memory accesses and running other tasks while the GPU fetches the memory the original shader needs, then swapping the shader back in to finish when the data arrives. Also, the increased GPU performance means that games have more room to be wasteful of GPU resources, since they only need to perform at least as well as they do on a regular Xbox One. I expect there wouldn’t be enough round-trips to ESRAM for it to be a major slowdown when running on Project Scorpio (and its not-ESRAM).


Seriously, Wall-E with a Freddie Mercury 'stache.

Microsoft does suggest that developers make use of ESRAM on Xbox One and Xbox One S, though. Yes, don’t deliberately throw away performance on the slower machines just because that accelerator isn’t available on higher-end devices, like Project Scorpio or a gaming PC (heh heh heh).

Another point that Digital Foundry highlighted was that the actual number of rendered fragments (pixels that may or may not make it to screen) didn’t scale up by a factor of four (going from 1080p to 4K) in all cases. A first-party developer noticed a case where it was only a 3.5x scaling between the two resolutions. (This metric was actually rendered pixels, not overall GPU load, which would include resolution-independent tasks, like physics simulations.) I’m not exactly sure how the number of fragments decreased, but it could be due to some rendering tricks, like when Halo renders the background at a lower resolution. (Yes, I’m using Khronos verbiage; it’s less ambiguous.)
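The baseline arithmetic, plus one made-up split showing how a lower-resolution background pass could land on 3.5x. The 1/6 share below is reverse-engineered purely for illustration; the whitepaper doesn’t say where the savings come from.

```python
# 1080p -> 4K is exactly 4x the fragments if everything scales naively.
full_1080p = 1920 * 1080   # 2,073,600 pixels
full_4k    = 3840 * 2160   # 8,294,400 pixels
print(full_4k / full_1080p)   # 4.0

# Hypothetical: render some share of fragments (e.g. a background pass)
# at half resolution per axis, i.e. 1/4 the fragments. A share of 1/6
# happens to land on the reported 3.5x -- purely illustrative.
background_share = 1 / 6
scaled = full_4k * (1 - background_share) + (full_4k / 4) * background_share
print(scaled / full_1080p)    # ~3.5
```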

They also assume that Project Scorpio will use pre-Zen AMD CPU cores. I agree. It seems like Zen wouldn’t be around early enough to make production, especially when you consider the pre-release units that are circulating around Microsoft, and probably third-party developers, too.

Project Scorpio launches this holiday season (2017).


January 24, 2017 | 10:52 PM - Posted by Anonymous (not verified)

Damn,Son.. It says 4K right on the die.

January 24, 2017 | 11:55 PM - Posted by Anonymous (not verified)

You have to love those "analysis". Scott, if you were any good or knew what you were talking about, you would actually be working for a proper tech company. The Borat's stache reference was the only highlight of this "article".

January 25, 2017 | 01:01 AM - Posted by Scott Michaud

Ah... the good 'ole "those who can, do; those who can't, teach" insult.

That said, do you have any specific complaints about my reporting? If so, then please, voice them. I genuinely welcome any and all feedback.

January 25, 2017 | 06:21 AM - Posted by Jann5s

Don't let the haters get you Scott, I always enjoy reading your work!

January 25, 2017 | 10:41 AM - Posted by Mike S. (not verified)

Seconded.

January 25, 2017 | 10:23 AM - Posted by flippityfloppit...

Yeah, don't feed them trolls.

January 25, 2017 | 10:58 AM - Posted by Anonymous (not verified)

Guarantee this was Josh W.

January 25, 2017 | 02:06 PM - Posted by Jeremy Hellstrom

So what do we get for your guarantee being completely wrong?

January 26, 2017 | 05:54 AM - Posted by Wardialer Turden (not verified)

Hmmm. I read semiengineering, the next platform, hpcwire and insidehpc mostly, but pcper is one of the few consumer oriented technology sites I read.

Genuinely curious what your level of expertise in the field of computers is.

Otherwise youre just another shithead who thinks they're anonymous but you're probably leaking their ip via WebRTC from a badly configured VPN.

January 25, 2017 | 12:01 AM - Posted by Anonymous (not verified)

It sounds a lot like a PS4 PRO on steroids. Roughly 1.5 X the RAM and GPU.

January 27, 2017 | 03:22 AM - Posted by Anonymous (not verified)

I believe the PS4 PRO had an extra 1GB of slower memory added for non-gaming tasks to free up more memory for 4K.

January 25, 2017 | 02:19 AM - Posted by Anonymous (not verified)

The chip for the Xbox would probably be on its own schedule, so I don't know if the desktop Zen schedule is relevant. There is the possibility of a lot of shared design across all of AMD's products, so I would expect a lot of re-use, but they could tape out at significantly different times.

It is certainly going to be on 14 nm, and shrinking an old Excavator core sounds like a lot of work compared to using a Zen Lite core that would be targeted at 14 nm from the start. Shrinking an Excavator core just doesn't seem like it makes much sense.

A single Zen module would probably be overkill, especially if it is a full 4 core/8 thread. Perhaps a 2 core/4 thread part if they are full Zen cores, or a 4 core/8 thread part with very stripped down, minimal cores. I would expect AMD to make a smaller, lower power version of Zen anyway. They would probably need to strip out a lot of the speculative execution features that waste a lot of power. It wouldn't be comparable to a desktop Zen part.

While it is possible that it is an Excavator based design, I would be surprised. I expect a Zen based design with Polaris graphics. I don't think it will have full featured, desktop Zen cores though. It would be interesting if some Vega features made it into the part, but that is probably not possible.

January 25, 2017 | 03:40 AM - Posted by Anonymous (not verified)

I think, the easiest path would be to strip some cache from ZEN.

January 25, 2017 | 12:17 PM - Posted by Anonymous (not verified)

It would not be a Zen core with speculative execution features stripped out. Speculative execution saves more energy than it costs in transistor count, by keeping the execution pipelines from being constantly flushed at the cost of many CPU/pipeline clock cycles that would otherwise be doing productive work.

Zen’s SMT feature also makes for more efficient use of execution pipeline slots, with the ability to context switch between a CPU core’s processor threads. With SMT, if a core has two available threads to manage, it can very quickly switch execution from a thread that has stalled over to the other, work on that until the stalled thread’s dependency is resolved, and then switch the core’s execution resources back to the formerly stalled thread’s work.

A CPU’s execution pipelines have slots that have to execute on every clock, so if there is no productive work to be done on a given cycle, the pipelines have to execute a NOP (no operation) cycle, a pipeline bubble, that does no productive work. What SMT and speculative execution do is keep the CPU’s available pipeline slots full of useful work, rather than wasting available cycles on NOPs.
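A toy version of that bubble-counting argument (the burst and stall lengths here are arbitrary, not real CPU figures): count the NOP cycles a single thread leaves during its stalls, then let a second SMT thread fill them.

```python
def run(threads, compute=6, stall=4, rounds=10):
    # Each thread issues `rounds` bursts of `compute` cycles, each burst
    # followed by `stall` cycles waiting on memory. When no thread can
    # issue, the pipeline executes NOPs (bubbles).
    ready_at = [0] * threads
    left = [rounds] * threads
    cycle = bubbles = 0
    while any(left):
        runnable = [t for t in range(threads) if left[t] and ready_at[t] <= cycle]
        if runnable:
            t = runnable[0]
            cycle += compute              # useful work fills these slots
            left[t] -= 1
            ready_at[t] = cycle + stall   # thread stalls on memory
        else:
            nxt = min(ready_at[t] for t in range(threads) if left[t])
            bubbles += nxt - cycle        # nothing to issue: NOP cycles
            cycle = nxt
    return cycle, bubbles                 # (elapsed cycles, bubble cycles)

print(run(1))  # one thread: every stall becomes bubbles
print(run(2))  # two SMT threads: each stall is hidden by the other's burst
```

With these numbers, one thread finishes 60 cycles of work in 96 cycles with 36 bubbles; two threads finish 120 cycles of work in 120 cycles with none, which is the efficiency argument in a nutshell.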

Some CPU features pay for themselves in efficiencies gained, more than justifying the increase in transistor count it takes to implement them (SMT and speculative execution among them). SMT and speculative execution would never have been adopted if the savings numbers did not add up for the CPU designers in the first place.

January 26, 2017 | 02:27 AM - Posted by Anonymous (not verified)

Speculative execution does not save power. You are executing instructions whose results may need to be thrown away. A NOP doesn't update any processor state, so it doesn't switch many transistors or require transmitting any data. Transmitting data unnecessarily burns a lot of power in interconnect.

Speculative execution also isn't free. You need a lot of transistors to detect whether the processor guessed correctly and, if not, to flush the pipeline or roll back the state. That can require transmitting data long distances on the chip, which wastes power in interconnect. Out-of-order execution is similar: it needs a lot of extra transistors to detect data hazards and forward results back early, and it probably requires a lot of long interconnect, which is a massive waste of power.

There are a lot of different speculative execution features. Some aren't implemented because they might consume something like 10% more power for a 2% gain, or they may only provide benefit for a limited number of applications. At larger process nodes, many low power processors were still in-order designs. We may be at a point where the less extreme speculative execution features are cheap enough that it isn't a huge issue, but processors have become more limited by interconnect, and many of these features could require long interconnect. In today's multi-threaded environments, it is probably better to use threading to execute non-speculative instructions rather than waste resources on guessing.

Simultaneous multi-threading can be a bit of a mixed bag also. It depends a lot on the applications. Running multiple threads can cause cache thrashing: you are cutting the cache size in half, or you are just hitting the cache a lot harder. Large-cache Xeon processors are 145 watt TDP without an integrated GPU; SRAM cache consumes a huge amount of power. The increased demand on the caches can cause a lot of spilling out to higher levels: L1 into L2, L2 into L3, etc. Pushing 64-byte cache lines out to a higher-level cache unnecessarily wastes a lot of power in transfer and in the SRAM itself. For single-threaded operation, the L1 would be exclusively devoted to a process for whatever time-slice it gets from the OS. With simultaneous multi-threading, you can have another process competing. It can reduce performance for some applications.

Since the Xbox One uses 8 low power cores, I suspect they will want to stay with an 8 thread part. It may have less cache than desktop parts. With high-speed graphics memory, there could be less need for L3 cache. If it isn't an Excavator variant, I would guess that it could be a 4-core/8-thread Zen module with little or no L3 cache and maybe some other power-hungry features removed.

January 25, 2017 | 07:37 AM - Posted by Anonymous (not verified)

Prior to DF's analysis, I assumed Scorpio would have HBM to both emulate ESRAM functions of previous consoles and to expand on its capabilities. It would be really disappointing if Scorpio were held back from greatness by the slow Jaguar. Likewise, unless Microsoft demands greater adherence to solid frame rates (60Hz preferably) for all games, I don't really see the value of this over PS4. Fewer exclusives than PS4 and with all the fancy upscaling tricks, very few will actually notice the difference between the two consoles in practice.

January 25, 2017 | 10:49 AM - Posted by Anonymous (not verified)

"It would be really disappointing if Scorpio were held back from greatness by the slow Jaguar"

For PC gamers it would be great. No need to upgrade CPU for the next 5 years.

January 26, 2017 | 07:36 AM - Posted by kal` (not verified)

Well thanks to Intel you haven't had to update in the last 5 years. People out here still deciding whether they should upgrade the i5 2500K.

Hopefully AMD will take things up a notch, if not, then yeh, your cpu will be good for the next 5+ years.

January 27, 2017 | 03:27 AM - Posted by Anonymous (not verified)

INTEL had nothing to do with lack of need to upgrade. You aren't fully using those cores now are you?

No. We were all waiting for DX12 and Vulkan games that are well threaded.

Even if Intel could have somehow got another 15% single thread performance over what it is now (which is pretty hard to do at this point if you understand CPU architecture) that's only gaining you 15% gain in FPS in CPU-bottlenecked situations (Which ALSO is not that common).

What's neat though is that my overclocked i7-3770K has held on pretty nicely and will do so for several more years with DX12/Vulkan making better use of it.

January 27, 2017 | 03:34 AM - Posted by Anonymous (not verified)

Also, there's a good Gamers Nexus video that shows how far behind an i5-2500K is in some games.
https://www.youtube.com/watch?v=4chk3fWb6xI

The i7-7700K is beating the i5-2500K by 70% in some games (i.e. 114FPS vs 68FPS).
http://www.gamersnexus.net/guides/2773-intel-i5-2500k-revisit-benchmark-...

(I know it's apples to oranges but YOU mentioned upgrading the i5-2500K so thought I'd link and show what's possible).
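As a quick sanity check on that 70% figure, using the frame rates quoted above:

```python
# Quick check of the "70%" claim from the cited example frame rates.
i7_7700k, i5_2500k = 114, 68            # FPS figures quoted above
print((i7_7700k / i5_2500k - 1) * 100)  # just under 68% faster, so "70%" is close
```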

Seriously though, don't expect much more PER CORE. Running a thread of code will reach the point of diminishing returns. Intel will gain some efficiency by dropping steps if they redesign (and lose some backwards compatibility) but even then I don't know how much can be gained.

The future is all about multi-threading.

January 25, 2017 | 07:59 AM - Posted by Anonymous (not verified)

Dropping ESRAM is a Big Fucking Deal. Its absence greatly complicates backward- and forward-compatibility for games that need to run on both the XB1 and Scorpio. It means maintaining and optimising two different memory access architectures side by side.

Even if a monolithic memory store can match the ESRAM in bandwidth or latency, changes in access behaviour will be inevitable. With the low-level memory access behaviours console games use out of necessity to achieve acceptable performance, any change will either dramatically affect performance by screwing timings, or completely break functionality altogether.

For all the issues with developers making half-arsed PS4Pro 'enhancements' that raise framebuffer size but lose framerate consistency (and that's down to devs, not Sony), the one thing that has been universally praised by both developers and end-users is that you put an existing game in a PS4Pro and it works. As far as I can tell, there has not been one single report of a game that works on a PS4 failing to work on a PS4Pro.

If Scorpio lacks such effortless backward compatibility, something is going to give when developers are resource-constrained.

January 25, 2017 | 08:01 AM - Posted by Anonymous (not verified)

I meant to add: while this may well be a genuine document, there's no guarantee it's an document. ESRAM could still have been re-added if developers reacted en masse to this document with these complaints.

January 25, 2017 | 08:03 AM - Posted by Anonymous (not verified)

LOL, looks like PCPer's comment system eats anything in italic tags! "an document" should be "an up-to-date" document.

January 25, 2017 | 10:37 AM - Posted by Mike S. (not verified)

I am confident Microsoft will get backwards compatibility right with this. But I think the real problem is that optimizing a Scorpio game for high performance is now substantially different than optimizing a non-Scorpio Xbox One.

How many companies will bother?

January 26, 2017 | 02:31 AM - Posted by Anonymous (not verified)

The memory access architecture for Scorpio should be very similar to PC though. I would expect the Xbox Scorpio version to be very close to the PC version. They will still need to make specially optimized versions for Xbox One, but that isn't any different from the current state.

January 27, 2017 | 03:44 AM - Posted by Anonymous (not verified)

I wouldn't say that memory access is "very close" to the PC version. Remember we have shared memory with the GPU and CPU priorities being quite different than on PC.

A lot of things change such as buffering memory in system memory (i.e. DDR4) and swapping with VRAM. Now it's the same memory so the CPU and GPU need to be able to assume control as needed. It's really quite different.

Having said that, as time goes on the programs to spit out code for PC and console will become better and reduce some of the effort of more manually tweaking things.

January 26, 2017 | 07:39 AM - Posted by kal` (not verified)

maybe, maybe not. Personally I don't think it'll be a big deal. providing the bandwidth on the Scorpio is fast enough to overcome it -- just like the PC has been doing.

That being said the xbox 360 functions completely different from the xbox one yet backwards emulation works fine. so who knows. it'll probably be fine.

January 26, 2017 | 01:34 PM - Posted by Anonymous (not verified)

Bandwidth != access latency. If a bit of game logic is dependent on things occurring at the correct time (and you'd be surprised how many things can be optimised down to instruction-level timing on consoles), then changing access latency, even with identical (or more) bandwidth available, can cause things to break.

As for XB1 backward compatibility: that's a good example of the problem. Each 'compatible' game has to be coded for individually, AND go through QC to make sure it works in all cases (and some bugs still slip through). You can't just slap in any 360 game and have it work without issue.

January 27, 2017 | 03:39 AM - Posted by Anonymous (not verified)

Optimizing for eSRAM is a big deal, but I doubt dropping it will create many headaches.

January 27, 2017 | 04:25 AM - Posted by Anonymous (not verified)

XBOX ONE Scorpio will be my very first modern console, and I'm in my later 40's.

My first game console was PONG. Yep. Next was the Commodore 64. Awesome machine at the time with a great little color screen.

My roommate in the army had an NES so I played a Mario Bros game. really loved that game!

Bought a PC for word processing only ($3000CDN) in 1996. Played Warcraft 2 but no time for anything else. Military, traveling etc.

Most significant upgrade was an HD3870 graphics card. Yaay!

When I got the HD3870 I was looking also at GAME CONSOLES but for the games I wanted to play, and the visual quality I wasn't impressed.

Kept WAITING for consoles to do what I wanted, and a solid 60FPS was important to me for most games. (Racing games at 30FPS? No thanks).

I thought the big jump to the XBOX ONE (and it IS a big jump relative to the XBOX 360) would focus on 60FPS as a no-brainer. NOPE.. sigh.

Now I'm too invested with STEAM to completely replace but a second machine (no kids) for EXCLUSIVES is what I want, and to justify partly with a 4K BluRay machine.

*I'm wealthy, but having a GTX1080 rig, plus XBOX ONE Scorpio, PS4 Pro, and maybe Nintendo Switch just to play a few exclusives on each of the consoles seems weird to me.

Anybody else WANT to play exclusives but find it hard to JUSTIFY a purchase just for that?

January 28, 2017 | 06:35 PM - Posted by Anonymous (not verified)

When the Xbox 360 came out, the top-end GPU was probably less than 125 watts, so it wasn't so far behind PC graphics. With the Xbox One, high-end GPUs were much more expensive, and pushing 250 or even close to 300 watts in some cases. There was no way that the Xbox One was going to be competitive performance-wise with the power and price budgets that PCs had reached.

The other part of the equation is that process tech stalled in about 2006 or so. While the size scaling continued until now, the power did not get reduced with each generation as it had before. Because of how large the dies are on GPUs, they were stuck at 28 nm for about 5 years. If they had tried to use 20 nm planar, the power consumption and heat density would have been out of control. They really needed an upgrade to the 360 though, but it just wasn't going to be that spectacular on 28 nm. They ended up releasing it around the end of 28 nm GPUs. This means that they can make a significantly more powerful device just a few years later with the switch to 14 nm FinFET. Since the original Xbox One wasn't really up to par for 1080p, an upgrade is needed. If expectations had not been set by much more expensive and higher power PC graphics, then it would have been much more acceptable. They would have just lowered the quality to render at a good frame rate.

January 27, 2017 | 05:02 AM - Posted by Anonymous (not verified)

CPU not ZEN... looking more likely.

As per the linked article, the 30FPS (CPU) with 60FPS (GPU) does tend to reaffirm this, but what does so even MORE is the fact they've apparently confirmed it's an 8-core CPU.

You would NOT put an 8-core RYZEN CPU in the XBOX ONE. It does not make sense in terms of die-size and the financial cost.

A RYZEN CPU running 2.4GHz for example would achieve roughly 2x the performance per core.

In addition, Hyperthreading (SMT) can add (I'm guessing) at least 30% per core.

That makes the MAXIMUM RYZEN performance for the CPU at 2.5X that of the previous.
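Running those rough numbers (the IPC gain, clocks, and SMT uplift below are this comment's guesses, not confirmed specs):

```python
# Back-of-envelope version of the estimate above; every figure is a
# rough assumption, not a confirmed spec.
jaguar_clock = 1.75   # GHz, original Xbox One CPU clock
ryzen_clock  = 2.4    # GHz, hypothetical console Ryzen clock
ipc_gain     = 1.4    # ~40% IPC over Jaguar (guess)
smt_gain     = 1.3    # ~30% extra throughput from SMT (guess)

per_core = ipc_gain * (ryzen_clock / jaguar_clock)
print(round(per_core, 2))              # ~1.92, i.e. roughly 2x per core
print(round(per_core * smt_gain, 2))   # ~2.5x with SMT on top
```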

*Also, I know it won't happen but RYZEN at 2.4GHz would allow games to jump from 30FPS to 60FPS since it's 2x perf per core; so if you ONLY changed the cap from 30FPS to 60FPS and did not mess with graphics or anything else your FPS would simply DOUBLE. Nice!

Oh well.

**So if the CPU is nearly IDENTICAL to the PS4 PRO, how much better can the Scorpio actually look vs the PS4 PRO?

Keep in mind you still ask more of the CPU when raising graphics quality. I believe it's tricky to avoid CPU bottlenecks on the PS4 PRO.

4K is already rapidly diminishing returns visually due to the DISTANCE one sits from an HDTV.

Are we really going to end up arguing about what a SMALL PERCENT of people can perceive between these two machines?

Do I have to WAIT several more years for a console that can do 60FPS in almost every game?

Will Windows 10 + Steam Big Picture (or Steam Link) end up my "console" of choice?

*Here's a COOOOLL idea:
Create an external box that attaches to your PC in a couple years. It would be the guts of an XBOX ONE Scorpio. No BD-drive or hard drive. You could make it about the size of a GTX1080 (lower frequencies, less GPU but more CPU, more efficient cooling outside of big case).

The GAME would simply move over to the eBOX, and frankly coding would not be that hard. We already switch GPU's in laptops.

(and we could use the SAME box for PC gaming too despite the difference in memory architecture. relatively simple really).

Maybe the coolest thing of all is that the unit would be well balanced (CPU, GPU, memory), or you could get a more powerful version and attach it. Unlike the external GPU units for laptops, which still need a good CPU and have other issues.

And think of the RESELL value?
Just plug the unit into AC power. NO PC. Push a rear button and have it run an extensive diagnostic. Works? Great. Sell and buy a more powerful one without needing a completely new PC or having balancing issues. (heck, with DX12 and VULKAN maybe you can plug in another one for Split Frame Rendering of games).

January 28, 2017 | 06:20 PM - Posted by Anonymous (not verified)

It is looking less likely to be Zen based, but if it is, they probably wouldn't need to use a full 8 core design. The current Xbox One chip is two 4 core jaguar modules. Since Zen is threaded and significantly higher performance, a single 4-core/8 thread module would probably outperform two jaguar 4-core modules.

The external graphics/CPU box just isn't a thing, and probably will not be. All of the other components are the cheap components; the actual processing elements are the expensive bits. Add in the need for a separate, and quite large, power supply, and it gets too expensive. It is just better to have two separate systems.

January 27, 2017 | 05:10 AM - Posted by Anonymous (not verified)

(above should be 2x per core PRIOR to HT/SMT or a MAXIMUM single thread performance of 2x... it's easy to mix up core and thread performance

Core performance is based on a ROUGH estimate of 40% IPC x 40% frequency assuming 1.75 to 2.4GHz.. I'm not sure of the official XBOX ONE original, non-S spec under load

And HT/SMT is in addition to that. As per the 2x SINGLE THREAD requirement for 60FPS gaming... I'm not sure how well threaded the MAIN GAME THREAD actually is. I know on PC it's been an issue, but since launch is that still a common source of bottleneck that HYPERTHREADING can not help with, or not? IPC of course yes, but I don't know if TOTAL performance is all you need or if that CORE/THREAD main code bottleneck is still common since the launch of the XBOX ONE and PS4.)