PCPer Mailbag #19 - Thanksgiving Edition!

Subject: Editorial | November 22, 2017 - 05:00 PM |
Tagged: video, Ryan Shrout, pcper mailbag, pcper

It's a special Thanksgiving edition of our weekly Q&A Mailbag! Take a break from the turkey and the in-laws and check out today's topics:

00:30 - Smartphone-like high efficiency cores for future laptops?
02:48 - Why is supersampling so demanding?
04:08 - Will GPUs ever replace CPUs?
05:18 - NVIDIA CPUs?
06:42 - Planned obsolescence for Android devices?
08:35 - HDR performance hit in games?
09:26 - M.2 GPUs?
10:53 - Raspberry Pi for holiday lights?
12:31 - PCPer origami?
13:21 - Memorable alcohol?
15:06 - Turkey vs. Ham

Be sure to subscribe to our YouTube Channel to make sure you never miss our weekly reviews and podcasts, and please consider supporting PC Perspective via Patreon to help us keep videos like our weekly mailbag coming!

Source: YouTube

November 22, 2017 | 05:40 PM - Posted by WiderOrderSuperscalarForSure (not verified)

Smartphone-like high efficiency cores for future laptops?

Starting with Apple's A7 Cyclone processors, Apple's custom cores running the ARMv8-A ISA have been very powerful and able to execute more micro-ops per cycle than any of the Arm Holdings reference design cores.

The Apple A7 Cyclone cores are powerful. Take a look and compare some of the execution resources with Intel's Haswell Core i-series cores, using this info on the Apple A7 Cyclone cores from Anand Lal Shimpi's AnandTech article on the Apple A7:

CPU Codename----------------Cyclone,
ARM ISA---------------------ARMv8-A(32/64),
Issue Width-----------------6 micro-ops,
Reorder Buffer Size---------192 micro-ops,
Branch Mispredict Penalty---16 cycles (14 – 19),
Integer ALUs----------------4,
Load/Store Units------------2,
Load Latency----------------4 Cycles,
Branch Units----------------2,
Indirect Branch Units-------1,
FP/NEON ALUs----------------3,
L1 Cache--------------------64KB I$ + 64KB D$,
L2 Cache--------------------1MB,
L3 Cache--------------------4MB,

The A7's issue width, 6 micro-ops, is twice that of the Arm Holdings reference design cores, and that reorder buffer size is right up there with the desktop x86 designs that were around at the time of the A7 Cyclone's release (Haswell). That Apple A7 looks more like a desktop CPU than a smartphone CPU. Apple's in-house graphics need a closer look, but it's all black-box with Apple because Apple does not act like a technology company; Apple acts like a retailer. Real computer technology companies present at the Hot Chips Symposium, but Apple has not been there in years.

It's too bad that we have not had any news from AMD on the custom ARM core K12 project that Jim Keller worked on while he was working on the Zen x86 design. AMD's K12 was supposed to be much like the Zen design under the hood, except that K12 was engineered to execute the ARMv8-A ISA instead of the x86 ISA. AMD's K12 may have even had SMT capabilities, and that's an ability that all the current ARM custom/reference designs lack, even Apple's custom designs!

November 22, 2017 | 06:31 PM - Posted by Anony mouse (not verified)

@ 6:38

That's one huge baby. How long was the recovery for that thing?

November 22, 2017 | 10:33 PM - Posted by djotter

@ 8:38 The Inebriati, pronounced, I assume, as a mashup of inebriated and Illuminati.

November 22, 2017 | 10:55 PM - Posted by djotter

Question: What do you think the resolution of next generation VR headsets will be? 1600p, 2160p? Up from 1200p, that is 1.78x or 3.24x the number of pixels. Will a 1080 Ti be the new minimum to use a 2nd gen VR headset?
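
The multipliers in the question can be checked with a quick calculation. This is a minimal sketch that assumes the aspect ratio stays the same, so the total pixel count scales with the square of the line count; the function name `pixel_ratio` is made up for illustration.

```python
def pixel_ratio(new_lines: int, old_lines: int) -> float:
    """Ratio of total pixel counts, assuming both axes scale with the line count."""
    return (new_lines / old_lines) ** 2


base = 1200  # current-generation figure taken from the question

# Assumption: same aspect ratio for every headset, so only the line count matters.
ratios = {target: round(pixel_ratio(target, base), 2) for target in (1600, 2160)}

for target, ratio in ratios.items():
    print(f"{target}p vs {base}p: {ratio}x the pixels")
```

So 1600p works out to roughly 1.78x and 2160p to 3.24x the pixels of 1200p, before any supersampling is applied on top.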

November 23, 2017 | 03:26 AM - Posted by JohnGR

"NVIDIA CPUs"

Nvidia did want to make x86 CPUs, and the question was probably about x86 CPUs. But Intel refused to give them an x86 license.

That "low margin" thing is the favorite excuse Nvidia uses any time it gets thrown out of a business. When they lost consoles they said the same, only to get involved in the Switch a few years later, when they saw how important the console market is today for the GPU architecture in PCs. Without consoles using AMD's GCN architecture, everyone would be using GameWorks and PhysX today for game development, and maybe DX12 and Vulkan would be different, favoring Nvidia's architectures the most.

A question.
Can we expect future laptops with Nvidia's SOCs in them, running Windows 10? Or is it a partnership exclusive between Qualcomm and Microsoft?

November 23, 2017 | 10:30 AM - Posted by DenverIsInCartmansProbe (not verified)

Nvidia's Denver (gen 1) cores are a little bit wider superscalar than Apple's A7 Cyclone cores, at 7+ ops per cycle, and Nvidia's Parker/Denver2 cores have a bit more performance. Denver cores are also a little different in that there is an underlying binary translation layer in software, running at a lower level than the OS, that stores already optimized code sequences, and "According to Charlie Demerjian, the Project Denver CPU may internally translate the ARM instructions to an internal instruction set, using firmware in the CPU". (1)
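
To illustrate the general idea of caching already translated code sequences, here is a toy sketch. It is not Nvidia's design: the class `TranslationCache`, the function `toy_translate`, and the fake "uop:" output are all invented for illustration, and a real translator would also optimize the code and handle eviction and invalidation.

```python
from typing import Callable, Dict, Tuple

Block = Tuple[str, ...]


class TranslationCache:
    """Toy model: translate a guest code block once, then reuse the result."""

    def __init__(self, translate: Callable[[Block], Block]) -> None:
        self._translate = translate
        self._cache: Dict[int, Block] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, address: int, guest_block: Block) -> Block:
        # Hit: the block at this address was translated before, so reuse it.
        if address in self._cache:
            self.hits += 1
            return self._cache[address]
        # Miss: pay the translation cost once and remember the result.
        self.misses += 1
        native = self._translate(guest_block)
        self._cache[address] = native
        return native


def toy_translate(block: Block) -> Block:
    # Stand-in for real translation: just tag each guest instruction.
    return tuple(f"uop:{insn}" for insn in block)


cache = TranslationCache(toy_translate)
loop_body: Block = ("add", "ldr", "cmp", "bne")

# A hot loop hits the same address repeatedly; only the first pass translates.
for _ in range(3):
    native = cache.fetch(0x1000, loop_body)

print(native, cache.hits, cache.misses)
```

The point of the sketch is only that hot code pays the translation cost once and then runs from the stored copy, which is why this approach favors loops and frequently executed paths.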

I do think that Nvidia could also make use of an OpenPower license and maybe use the Power9 (SMT4) core design, but that all depends on OpenPower and its licensees getting enough market share to attract Microsoft's attention enough to port the Windows OS over to the Power ISA. If Nvidia wanted to have some Linux gaming oriented PC products in addition to its Tegra based Nintendo Switch, Nvidia already has its Shield SKUs, so maybe a Shield branded type of laptop could be developed running a Linux distro and using Nvidia's Denver 2/newer cores.

If Intel can purchase a semi-custom GPU Die from AMD then maybe Nvidia could purchase a semi-custom x86 CPU die from AMD and create an MCM based SKU similar to Intel's MCM based design and Nvidia already uses Interposers so that's an option.

Really what Microsoft needs to do is create a full port of its Windows OS onto the ARMv8-A ISA sans the legacy software translation layers, but that's an expensive and time consuming process that is just now starting with Microsoft and Qualcomm, and those translation layers will be around for a long while.

So according to Wikipedia:

"Project Denver is the codename of a microarchitecture designed by Nvidia that implements the ARMv8-A 64/32-bit instruction sets using a combination of simple hardware decoder and software-based binary translation (dynamic recompilation) where "Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128 MB cache stored in main memory".[1] Denver is a very wide in-order superscalar pipeline. Its design makes it suitable for integration with other SIPs cores (e.g. GPU, display controller, DSP, image processor, etc.) into one die constituting a system on a chip (SoC).

Project Denver is targeted at mobile computers, personal computers, servers, as well as supercomputers.[2]" (1)

"History

The existence of Project Denver was revealed at the 2011 Consumer Electronics Show.[8] In a March 4, 2011 Q&A article CEO Jen-Hsun Huang revealed that Project Denver is a five-year 64-bit ARMv8-A architecture CPU development on which hundreds of engineers had already worked for three and half years and which also has 32-bit ARM instruction set (ARMv7) backward compatibility.[9] Project Denver was started in Stexar Company (Colorado) as an x86-compatible processor using binary translation, similar to projects by Transmeta. Stexar was acquired by Nvidia in 2006.[10][11][12]

According to Tom's Hardware, there are engineers from Intel, AMD, HP, Sun and Transmeta on the Denver team, and they have extensive experience designing superscalar CPUs with out-of-order execution, very long instruction words (VLIW) and simultaneous multithreading (SMT).[13]

According to Charlie Demerjian, the Project Denver CPU may internally translate the ARM instructions to an internal instruction set, using firmware in the CPU.[14] Also according to Demerjian, Project Denver was originally intended to support both ARM and x86 code using code morphing technology from Transmeta, but was changed to the ARMv8-A 64-bit instruction set because Nvidia could not obtain a license to Intel's patents.[14]

The first consumer device shipping with Denver CPU cores, Google's Nexus 9, was announced on October 15, 2014. The tablet is manufactured by HTC and features the dual-core Tegra K1 SoC. The Nexus 9 is also the first 64-bit Android device available to consumers.[15]
" (1)

(1)

"Project Denver"

https://en.wikipedia.org/wiki/Project_Denver

November 23, 2017 | 10:34 PM - Posted by Streetguru

Why does no one know what bottlenecking is for gaming? Would you agree with the simplification that:
CPUs bottleneck refresh rate
GPUs bottleneck Resolution/Quality Settings
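
That simplification can be sketched as a toy model where the delivered frame rate is the lower of what the CPU and the GPU can each sustain. The numbers below are hypothetical, and the model assumes the CPU limit does not change with resolution while GPU throughput scales inversely with pixel count, which real games only roughly follow.

```python
def gpu_fps_at(base_gpu_fps: float, base_pixels: int, pixels: int) -> float:
    """Assumption: GPU frame rate scales inversely with the number of pixels."""
    return base_gpu_fps * base_pixels / pixels


def delivered_fps(cpu_fps: float, gpu_fps: float) -> float:
    """The slower component sets the frame rate."""
    return min(cpu_fps, gpu_fps)


def limiter(cpu_fps: float, gpu_fps: float) -> str:
    return "CPU" if cpu_fps <= gpu_fps else "GPU"


# Hypothetical system: CPU can prepare 120 fps, GPU renders 200 fps at 1080p.
CPU_FPS = 120.0
GPU_FPS_1080P = 200.0
RESOLUTIONS = {"1080p": 1920 * 1080, "1440p": 2560 * 1440, "2160p": 3840 * 2160}

results = {}
for name, pixels in RESOLUTIONS.items():
    gpu = gpu_fps_at(GPU_FPS_1080P, RESOLUTIONS["1080p"], pixels)
    results[name] = (delivered_fps(CPU_FPS, gpu), limiter(CPU_FPS, gpu))
    print(name, results[name])
```

In this made-up example the CPU caps the frame rate at 1080p, and the GPU becomes the limit once the resolution goes up, which matches the rule of thumb in the question.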

November 24, 2017 | 06:35 AM - Posted by JohnGR

I think this was answered in one of the first mailbags.

November 24, 2017 | 02:00 PM - Posted by Dbsseven

Were you surprised Intel announced an “APU” with HBM before AMD?

November 24, 2017 | 03:18 PM - Posted by ItDoesNotQuackLikeAnAPU (not verified)

It's not an APU; that Intel EMIB/MCM package arrangement is more of a nano-motherboard. And the Intel SOC in that package still has its Intel DogFOOD integrated graphics, and that Intel SOC interfaces over the EMIB package to the discrete Radeon GPU die via PCIe traces.

So let's be perfectly clear: that Intel EMIB embedded interposer part is only there to interface the semi-custom discrete Radeon Polaris die with its HBM2 VRAM, and that Radeon discrete die is interfaced to the Intel SOC die the same PCIe way as if it were a laptop BGA/socketed CPU connected to a BGA GPU on the motherboard. It's just a nano-motherboard-like EMIB/MCM attached to a larger main board in that NUC photograph that's been making its way around the interwebs.

AMD's APUs integrate the CPU cores with the Radeon GPU CUs/Vega nCUs more tightly than some over-PCIe interface does, and the Raven Ridge APUs use the Infinity Fabric, which is a level above in its ability to link CPU cores to CPU cores or CPU cores to GPU cores.

It's not an APU, it's a nano-motherboard, most likely for Apple's use and for other mini-desktop/NUC like devices, to keep Apple from totally dropping Intel and going full on with AMD's Raven Ridge APUs. Intel is only getting a Radeon Polaris SEMI-CUSTOM die and not a Vega die in this semi-custom deal, so look for AMD to be coming out with some better Vega based APUs in the latter parts of 2018.

The Raven Ridge mobile APU SKUs and the first desktop Raven Ridge APUs will probably be the same die designs, with the desktop variants configured for desktop wattage and faster DDR4 memory. But towards the end of 2018 watch out, as that HBM2 may actually be available on a true AMD APU variant that Intel will not be able to match with that EMIB nano-motherboard design. That whole Intel/AMD EMIB/MCM nano-motherboard design has Apple's name written all over it, as Apple has the money and the influence to make both Intel and AMD join hands and tango 'till the rooster calls!

Apple has more cash on hand than the total market CAPs of Intel and AMD combined and Apple is almost a trillion dollar market cap company!

November 28, 2017 | 02:43 PM - Posted by Dbsseven

I get all the Apple business, my point was about the HBM integration and why AMD assisted Intel in going first. I'm sure AMD could look at Intel's semi-custom request, versus their internal roadmap, and figure out Intel would launch first.

AMD has had HBM for years, and must have had Polaris BEFORE offering it for semi-custom. The HBM integration they did for Intel they could have done for themselves. And the total die size of Polaris plus Ryzen is smaller than the Fiji chips. Given all this, a Ryzen APU with HBM via interposer is certainly possible for AMD. No need to help the competition.

And if (or eventually when) AMD gives the CPU cores access to the HBM, there might be some serious CPU advantages (e.g. Crystalwell). Though I haven't done the math to figure out if Infinity Fabric is fast enough for this purpose, or if a unified memory controller is needed.

I wonder if interposers aren't economical for APUs(?), while EMIB is better?
