PCPer Mailbag #31 - 2/16/2018

Subject: Editorial | February 16, 2018 - 09:00 AM |
Tagged: video, Ryan Shrout, pcper mailbag

It's time for the PCPer Mailbag, our weekly show where Ryan and the team answer your questions about the tech industry, the latest and greatest GPUs, the process of running a tech review website, and more!

On today's show, Ryan is back in town to tackle your questions:

00:34 - AMD EPYC marketshare in datacenters?
04:22 - Intel's x86 licensing to AMD?
06:10 - Third-party pin-compatible processors?
08:27 - HSA support in new Ryzen APUs?
12:18 - Do NVIDIA and AMD care that gamers can't compete with miners for GPUs?
15:53 - Will Intel switch back to soldered heat spreaders for HEDT?
17:35 - Companies not releasing Spectre/Meltdown fixes for older hardware?
19:23 - Ryan's oldest working PC?
20:55 - Games making better use of CPU?
22:26 - Why is the mailbag video shot from the chest up?

Want to have your question answered on a future Mailbag? Leave a comment on this post or in the YouTube comments for the latest video. Check out new Mailbag videos each Friday!

Be sure to subscribe to our YouTube Channel to make sure you never miss our weekly reviews and podcasts, and please consider supporting PC Perspective via Patreon to help us keep videos like our weekly mailbag coming!

Source: YouTube

February 16, 2018 | 11:17 AM - Posted by ReadMoreFromServerRelatedAndResearchRelatedWebsitesOnHSA (not verified)

Epyc is also a good dual-socket value, and if you look at the number of 2P systems in use in the server market, then Epyc's value is in more than just 1P.

On a 2P Epyc system you get twice the memory channels: 16, at 8 channels per socket. So for any systems that need a lot of memory bandwidth and not as much processing power, there are options for using two lower-core-count 2P Epyc processors, the lower-cost 8- or 16-core SKUs, and still getting 16 or 32 cores across 2 sockets but with twice the memory bandwidth via 16 memory channels. Some workloads love memory bandwidth more than anything, and Epyc 2P has the most available bandwidth.
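To put rough numbers on the channel-count argument above, here is a minimal sketch of the peak-bandwidth arithmetic. It assumes DDR4-2666 on 64-bit channels; the comment does not name a memory speed, so that figure and the function names are assumptions for illustration only. These are theoretical peaks, not measured throughput.

```python
# Peak theoretical DRAM bandwidth = channels * transfers/s * bytes per transfer.
# ASSUMPTION: DDR4-2666 on a 64-bit (8-byte) channel; the comment above
# gives only the channel counts, not a memory speed.
BYTES_PER_TRANSFER = 8        # one 64-bit DDR4 channel
TRANSFERS_PER_SEC = 2666e6    # assumed DDR4-2666

def peak_bandwidth_gbs(sockets, channels_per_socket=8):
    """Peak bandwidth in GB/s for a system with the given socket count."""
    channels = sockets * channels_per_socket
    return channels * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9

def bandwidth_per_core_gbs(sockets, cores_per_socket):
    """Peak bandwidth divided evenly across every core in the system."""
    return peak_bandwidth_gbs(sockets) / (sockets * cores_per_socket)

one_p = peak_bandwidth_gbs(1)
two_p = peak_bandwidth_gbs(2)
print(f"1P: {one_p:.1f} GB/s, 2P: {two_p:.1f} GB/s")

# Same 16 cores in total, arranged as one 16-core part or two 8-core parts.
print(f"1P x 16 cores: {bandwidth_per_core_gbs(1, 16):.1f} GB/s per core")
print(f"2P x  8 cores: {bandwidth_per_core_gbs(2, 8):.1f} GB/s per core")
```

Under these assumptions, the 2P layout doubles the peak bandwidth available to the same total core count, which is the trade the comment describes.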

I do not know what the server OEMs (Dell and others) are charging for their Epyc-based systems, but a cost/benefit analysis must look at more than the total system's purchase cost; it must also consider the TCO (total cost of ownership).
But if you are building your own Epyc system, then the savings are much greater, since you are mostly looking at the cost of the hardware, with no labor costs involved (if you do the assembly) for that part of the price. It all depends on whether you are able to make use of a Linux OS and open source solutions for whatever your software needs are.

On that HSA question:

HSA is still alive, as the HSA Foundation is still around, and AMD's RR APUs use the Infinity Fabric, so IF can be used for cache coherency between CPU cores and GPU cores, and between the caches of other IF-capable processors as well. Go and read the SemiAccurate article (1).

[ And to stress what is stated below, the author explicitly states, from that meeting held between AMD's IF engineers and the press:

"If the level of granularity is as fine as was intoned, it allows a CPU core to pass info to a shader ‘directly’ regardless of the two being on the same silicon or across a system."] (1)

So HSA can work across CPU and GPU cores on the same die, and also across CPUs and GPUs on different dies, for AMD. Read the entire article, as it provides a good description of where AMD is now with Zen/Vega, the IF that both Zen and Vega support, and where AMD will be going forward with regard to HSA types of processing via the IF. AMD is not the only processor maker to have HSA-like features, as all the modern microprocessors that can make use of OpenCL, and even the DX12/Vulkan APIs, will have more HSA-like functionality for all processor makers to utilize.

AMD is not the only maker, but AMD is much farther along in its compliance with the HSA Foundation's HSA standards. And although the HSA Foundation has its own version of HSA, the Khronos Group's SPIR-V is similar in reach to the HSA Foundation's HSAIL.

"So that is the key to the new Infinity Fabric, the granularity, especially in mesh topologies it should allow bandwidth to scale with nodes. Topology is not protocol defined or restricted and the coherent links will work across sockets, CPUs, GPUs, and more. If the level of granularity is as fine as was intoned, it allows a CPU core to pass info to a shader ‘directly’ regardless of the two being on the same silicon or across a system. The separate control and data fabrics bring AMD up to modern SoC structures too, and in some ways beyond. Infinity Fabric is a really big deal, and at the risk of sounding like a broken record, it is going to be really interesting to see the details when " (1)

(1)

"AMD Infinity Fabric underpins everything they will make"

https://semiaccurate.com/2017/01/19/amd-infinity-fabric-underpins-everyt...

February 16, 2018 | 12:52 PM - Posted by Dbsseven

Since Apple designs the silicon and the OS for iOS devices, why do they license/use the ARM architecture? It seems like owning the whole stack would allow them to do some truly unique things in silicon and support them in the OS.

February 16, 2018 | 04:32 PM - Posted by BillionsSavedOnUsingAMatureISAandItsSoftwareOSEcosystem (not verified)

Apple licensed the ARMv8A ISA to get at the decades-long OS/software ecosystem that has built up around all of ARM Holdings' ISAs over the years. That probably saved Apple more money than it cost to engineer its CPU hardware. The underlying hardware in the A7 and newer Apple A-series processors that executes the ARMv8A ISA is a totally custom Apple design that Apple's P.A. Semi (acquired by Apple) engineers designed from the ground up.

Apple could go with RISC-V (an open ISA) or an in-house ISA, but that's an entirely new software stack, and a better ecosystem would first have to be built up around RISC-V or any Apple in-house ISA, which would also have to be designed with equivalent instructions that do the same things as the ARMv8A ISA.
RISC-V's instruction set is also not as mature relative to established ISAs like ARMv8A and x86, with their extensions for VMs and FP/SIMD: AVX (Intel and AMD), NEON (ARM), and others.

Apple has a long history (1) with the Acorn RISC Machine (now called Advanced RISC Machine/ARM Holdings). So that's billions in software/OS ecosystem savings over the years for Apple, and there is probably not much limiting Apple from adding its own custom extensions to the ARMv8A ISA if Apple wanted. OSs and the supporting compiler, software, and firmware ecosystems cost billions to develop over the decades, so changing an ISA is an expensive process, and adopting an entirely new ISA that has never had any software ecosystem built up around it is even more expensive. Apple did not have any problems switching from PowerPC to x86 because the x86 ISA had also been around for decades, so its software ecosystem was already mature and widespread.

"Collaboration: ARM6[edit]

In the late 1980s Apple Computer and VLSI Technology started working with Acorn on newer versions of the ARM core. In 1990, Acorn spun off the design team into a new company named Advanced RISC Machines Ltd.,[24][25][26] which became ARM Ltd when its parent company, ARM Holdings plc, floated on the London Stock Exchange and NASDAQ in 1998.[27] The new Apple-ARM work would eventually evolve into the ARM6, first released in early 1992. Apple used the ARM6-based ARM610 as the basis for their Apple Newton PDA." (1)

(1)

"ARM architecture"

https://en.wikipedia.org/wiki/ARM_architecture

February 16, 2018 | 09:40 PM - Posted by Dbsseven

Good point about the history. But isn't the history/ecosystem only helpful if you're using pre-compiled code? Otherwise, so long as the compiler works properly it will generate machine code specific and efficient for the new architecture. (Sony thought this was worthwhile with the PS3, though that architecture was apparently very hard to optimize for.)

More generally my point was about why Apple uses "Vanilla" ARM. If you make the chips AND the compiler (as Apple does with LLVM) that offers unique advantages, such as customized silicon and ISAs for power/speed/etc. I am wondering if there aren't advantages that could be had by breaking the standard ARM ISA. (Asking another way: does Apple allow/use precompiled generic ARM blobs which would make specialization impossible?)

February 17, 2018 | 01:32 PM - Posted by AppleCouldDoWellGoingFullOnStackMachineInHardware (not verified)

It's much more than pre-compiled code. It's all the SDKs and compilers that have been optimized for ARM ISA and x86 ISA based systems over the years, the optimization manuals, and the millions to billions of engineer (hardware, software, firmware) hours that have gone into the ARM ISA based systems, the x86 based systems, and even the Power8/9, earlier IBM, and MIPS ISA based systems.

Even Microsoft has been working for years, not very successfully, on getting ARM support, and most of that for Win32-based applications on ARM is a translation layer, so Windows on ARM is not yet running those applications natively on the ARMv8A ISA. Even Apple had Rosetta to run PowerPC applications on x86 via Rosetta's translation layers.

Apple is now moving more of its SoC functionality onto specialized ASIC processors, as well as on-CPU-die functional blocks, so maybe in the future, if enough of Apple's specialized processing needs are done via its in-house ASIC processing dies and those sorts of related IP, then Apple can think about maybe creating its own in-house CPU ISA. But Apple did not get all its billions by spending too much unnecessarily, when Apple, which helped found Arm Holdings (now part of SoftBank), can use the software tools (compilers and plugins, SDKs, middleware) that Arm Holdings creates, and Arm Holdings funds that via the royalty/licensing income stream from the entire ARM-based industry.

Apple gets its top-tier ARM architectural license for the ARMv8A, and other CPU (TrustZone/PSP) licensing, as part of the deal, and really Apple is using the ARMv8A ISA for the very same economy-of-scale reasons that Apple is fabless and makes use of third-party fab businesses to fab its processor chips. Apple saves billions by not having to pay for expensive fab facility upkeep and process node development costs, as that's spread across the entire fab industry. So ditto for ARMv8A/x86 ISA based processors, where Apple can use all the development and optimization middleware for the ARM and x86 ISAs, which had most of its development costs subsidized by the entire industries that make use of ARM ISA and x86 ISA based processors and related firmware and support hardware, MB chipsets, etc.

Apple is nearly a trillion-dollar market cap company because it knows what it needs to develop in house and what it can save money on by using the third-party fab industry's economy of scale and the ISA market's economy of scale, in the ARM ISA market more so than the lesser amount from the x86 ISA based market. Software/OSs cost more to develop and maintain than any hardware, and going to an in-house Apple CPU ISA will cost more in software/OS ecosystem related costs than the cost to develop any new ISA. Everybody is making use of LLVM or other abstraction layers to have that portable code base.

If Apple really wanted to be revolutionary, they would go back and look at some older CPU architectures from the past. Transistor designs may go out of date relatively quickly, but CPU architectures do not age as fast. That old Burroughs stack machine architecture ran code that was almost high level (very lightly parsed down) directly on the hardware.

So that stack machine architecture was ready-made for running procedural as well as object-oriented (C++, other) high-level language code more directly on the processor. Every bit of code that ran in hardware on the Burroughs stack machine ran on a stack, a code stack and a data stack, and the OS (Burroughs MCP) and the hardware were directly in control of the stack pointers, including the top-of-stack and bottom-of-stack pointers. Any stack overflow/underflow generated a hardware-based interrupt, ditto for in-memory descriptor tags. The Burroughs machines' hardware (2) implementation was very influential to those that created Java and other similar software VMs like LLVM and others.
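As a rough illustration of the stack discipline described above, here is a toy stack machine with checked push and pop. It is only a sketch: it models a single data stack rather than the separate code and data stacks, and the `StackMachine` and `StackFault` names, the opcodes, and the stack limit are all hypothetical stand-ins, not the Burroughs instruction set.

```python
class StackFault(Exception):
    """Stands in for the hardware interrupt raised on overflow or underflow."""

class StackMachine:
    def __init__(self, limit=8):
        self.limit = limit   # distance between bottom-of-stack and the top limit
        self.stack = []      # data stack; no programmer-visible registers

    def push(self, value):
        if len(self.stack) >= self.limit:
            raise StackFault("stack overflow")
        self.stack.append(value)

    def pop(self):
        if not self.stack:
            raise StackFault("stack underflow")
        return self.stack.pop()

    def run(self, program):
        """Execute a list of operands and opcodes, returning the final result."""
        for op in program:
            if op == "ADD":
                b, a = self.pop(), self.pop()
                self.push(a + b)
            elif op == "MUL":
                b, a = self.pop(), self.pop()
                self.push(a * b)
            else:
                self.push(op)   # anything else is treated as an operand
        return self.pop()

# (2 + 3) * 4 written in postfix order, the natural form for a stack machine.
result = StackMachine().run([2, 3, "ADD", 4, "MUL"])
print(result)
```

A program that pops more than it pushed, such as `["ADD"]` on an empty stack, raises `StackFault` immediately instead of reading whatever happens to sit below the stack.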

From Wikipedia:

"The first member of the first series, the B5000,[3] was designed beginning in 1961 by a team under the leadership of Robert (Bob) Barton. It was a unique machine, well ahead of its time. It has been listed by the influential computing scientist John Mashey as one of the architectures that he admires the most. "I always thought it was one of the most innovative examples of combined hardware/software design I've seen, and far ahead of its time."[4] The B5000 was succeeded by the B5500[5] (which used disks rather than drum storage) and the B5700 (which allowed multiple CPUs to be clustered around shared disk). While there was no successor to the B5700, the B5000 line heavily influenced the design of the B6500, and Burroughs ported the Master Control Program (MCP) to that machine.

Unique features
All code automatically reentrant (fig 4.5 from the ACM Monograph shows in a nutshell why): programmers don't have to do anything more to have any code in any language spread across processors than to use just the two shown simple primitives. This is perhaps the canonical but no means the only benefit of these major distinguishing features of this architecture: Partially data-driven tagged and descriptor-based design
Hardware was designed to support software requirements
Hardware designed to exclusively support high-level programming languages
No Assembly language or assembler; all system software written in an extended variety of ALGOL 60. However, ESPOL had statements for each of the syllables in the architecture.
Few programmer accessible registers
Simplified instruction set
Stack architecture (to support high-level algorithmic languages)
Support for high-level operating system (MCP, Master Control Program)

Support for asymmetric (master/slave) multiprocessing
Support for other languages such as COBOL
Powerful string manipulation
An attempt at a secure architecture prohibiting unauthorized access of data or disruptions to operations[NB 2]
Early error-detection supporting development and testing of software
First commercial implementation of virtual memory[NB 3]
Successors still exist in the Unisys ClearPath/MCP machines
Influenced many of today's computing techniques" (1)

The Burroughs stack computer in hardware is similar to how Java, LLVM, and similar environments are implemented in software, but the Burroughs hardware implementation is more secure than can ever be achieved by emulating a stack machine in software to be run on ARM/x86/Power CPUs that are based loosely on the modified Harvard CPU architecture. So reference this webpage that highlights the Burroughs 5000 and newer series stack machines (2).

"This page pays tribute to those who designed the Burroughs B5000 from 1960 to 1963. This was an innovative machine way ahead of its time, with the first commercial implementation of virtual memory, using descriptors, segments, and presence bits (p-bits) which caused an interrupt when the referenced segment was not in memory. WebObjects and OS X programmers will be familiar with this mechanism as ‘faults’.

Java programmers will relate to the B5000 in that its architecture was very high level, like a Java Virtual Machine (JVM), but implemented in hardware for efficiency. The B5000’s “virtual” machine was ALGOL oriented, but also included instructions for powerful string manipulation – a most important application in today’s environment. Better than JVM implementations though was the B5000’s stack dumps – if a program or the system crashed, the dumps included not just the called routines as with Java, but all the local stack frames and variables as well, so that a programmer could see what had happened.

This is not the only influence that the B5000 has on Apple and beyond, since Alan Kay, the inventor of the GUI window, and Xerox and Apple luminary was influenced by the B5000’s designer, Bob Barton, in his thinking about object-oriented systems.

Indeed the B5000 had many object-oriented features in its architecture with tags to distinguish word types (although strictly speaking tags weren’t fully implemented until the B5500), which helped keep the instruction set simple, since only one instruction was needed for a function, rather than the myriad of variants needed on conventional architectures. For example, there was only one ADD instruction, the variant coming from the data word itself as to whether it should perform a single or a double addition. This helped make the B5000 a true RISC computer, as in Regular Instruction Set Computer.

The system of tags also preserved system integrity and security. For example, odd tags represented data that could not be overwritten by user-level processes. Thus code (being tag 3 words) could not be overwritten. This means that many of today’s viruses would not be possible on these systems. (The original B5000 only had a single bit meaning read/write or read-only. Later machines extended the tag to three and four bits.)

The other security mechanism was the descriptor, which aside from the aforementioned p-bit for hardware-defined virtual memory defined the starting address and length for memory blocks. Any attempt to access (read or write) outside of those bounds resulted in the process being terminated. Thus a very common source of programming errors were detected early on in a running program, rather than later after potential data corruption had caused more serious things to go wrong. Alas most of the world has gone the C route where strings are terminated with a null byte, which is the worst possible choice, making metadata part of the data, rather than keeping it separate as with the B5000 descriptor. For those familiar with Eiffel’s test-driven development (TDD) technique of preconditions and postconditions monitoring the correct execution of a program, the B5000 had this built in.

The other big breakthrough of the B5000 was that is was entirely programmed in a high-level language, ALGOL (or variants thereof). The B5000’s MCP (Master Control Program) was the first operating system written in a high-level language, and as a systems programming language, ALGOL in concert with the B5000 hardware, proved to be a much better choice than C for systems development.

There was no assembler at all, and if there were, it would look rather like FORTH, being a stack machine - but why program in a machine-oriented language when you can program in a problem-oriented language.

The other architectural feature of the B5000 was that it had no programmer-addressable registers. All local variables and temporary results went on the stack. The trick was that the hardware designers were free to design the processor with as many or few registers as they liked without having to change or even recompile software. The stack would be mapped into these barrel registers. Registers are a pain in today’s multi programming environment – on a process change, the OS must copy all the registers out to main memory. In the B5000, the MVST machine-instruction achieves all this in one swoop (although MVST might not have been in the original B5000).

Because of this high-level architecture, today’s Unisys B5000 machines are implemented either as specially-designed chips, or can be emulated on standard Intel (or other) processors.

The successors of the B5000 and MCP are still going today in the Unisys ClearPath Libra MCP series. While other companies regularly ditch their operating systems or completely rewrite large swaths of them, the MCP has evolved over 40 years. Let’s see the other OS to do this (although only for eight years so far, but Unix for over 30) is Apple’s OS X.

Later Burroughs and Unisys systems also supported the concept of system library, which is how DLLs and frameworks should be done correctly. They provided a way to coordinate many asynchronous processes access to shared resources, thus implementing semaphores and monitors in a most elegant way.

These days, ALGOL may seem like an ancient computer language, but the following quote made by C. A. R. Hoare in Hints on Programming Language Design in 1973 still seems very applicable against today’s common languages:

The more I ponder the principles of language design, and the techniques that put them into practice, the more is my amazement at and admiration of ALGOL 60. Here is a language so far ahead of its time that it was not only an improvement on its predecessors but also on nearly all its successors." (2)

(1)

"Burroughs large systems"

https://en.wikipedia.org/wiki/Burroughs_large_systems

(2)

"Burroughs"

http://homepages.ihug.com.au/~ijoyner/Ian_Joyner/Burroughs.html
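The descriptor mechanism described in the quoted page (2), a base address plus a length with every access checked against those bounds, can be sketched in a few lines. This is an illustrative model only: the `Descriptor` class, the fault names, and the read-only flag (a stand-in for the tag bits) are assumptions, not the actual B5000 word formats.

```python
class BoundsFault(Exception):
    """Stands in for the hardware terminating the offending process."""

class ProtectionFault(Exception):
    """Stands in for the trap on writing a protected (e.g. code) block."""

class Descriptor:
    """Base + length metadata kept separate from the data it describes."""

    def __init__(self, memory, base, length, read_only=False):
        self.memory = memory
        self.base = base
        self.length = length
        self.read_only = read_only   # simplified stand-in for tag bits

    def _address(self, index):
        # Every access is checked against the block bounds before it happens.
        if not 0 <= index < self.length:
            raise BoundsFault(f"index {index} outside block of length {self.length}")
        return self.base + index

    def read(self, index):
        return self.memory[self._address(index)]

    def write(self, index, value):
        if self.read_only:
            raise ProtectionFault("block is read-only")
        self.memory[self._address(index)] = value

memory = [0] * 16
data = Descriptor(memory, base=4, length=4)
code = Descriptor(memory, base=8, length=4, read_only=True)

data.write(3, 42)          # last valid slot of the data block: memory[7]
try:
    data.write(4, 99)      # one past the end; would have landed in the code block
    overrun_caught = False
except BoundsFault:
    overrun_caught = True
print(overrun_caught, memory[8])
```

Contrast this with a null-terminated C string, where the length is encoded inside the data itself and an overrun is only noticed (if at all) after neighbouring memory has been overwritten.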

February 16, 2018 | 03:14 PM - Posted by djotter

Since the new disclosure statement, which has been very well received, I was wondering about the advertiser statement "... purchased advertising at PC Perspective during the past twelve months." There are a few spots on the lower right of the webpage that can range from B&H Photo, Dell, and companies from my country, to singles in my area (which is not the best when reading the news at work). How do these fit into the advertising disclosure? Or is the advertiser a degree of separation that avoids the conflict of interest?

February 16, 2018 | 03:27 PM - Posted by Dbsseven

I noticed a while ago that Bristol Ridge APUs lost Hybrid Graphics support, and the new Raven Ridge don't list it at all. Are hybrid graphics dead?

February 16, 2018 | 04:55 PM - Posted by Anonymously Anonymous (not verified)

Have you noticed the lack of support for more than 2 gfx cards?
How many people do you know that run more than one gfx card in the same machine? (sli or crossfire)

Development costs don't really make sense for a subset of people that is a very small percentage of those who might actually buy a game. I can't remember the % value, but it is really small for people who run two or more gfx cards. And since Hybrid Graphics support is 2 gfx cards, well, there you go.

And besides, from what I remember reading and watching about the experience of running a hybrid graphics solution, well it was sub-par to pretty bad, so much so that it would be better to just not get that discrete card for the hybrid and save to get a better discrete gpu to run by itself.

February 16, 2018 | 09:25 PM - Posted by Dbsseven

All true and fair. I just hate wasting available processing power. Though, it seems like the newer Vulkan/DX12 might allow for writing code in a more generic way to take advantage of "all available" GPU(s). Maybe this sort of branding isn't really necessary, as it's more within the game engine now?

February 19, 2018 | 01:57 AM - Posted by CrossFireAndSLIBothAreDeadJIM (not verified)

Hybrid Graphics, CF, and SLI are all dead, while both the DX12 and Vulkan graphics APIs have explicit GPU multi-adapter as a feature.

So it's up to the games/gaming engine developers to manage multi-GPU, not the drivers. The DX12 and Vulkan drivers are intentionally lightweight drivers that only expose the bare metal, with multi-GPU load balancing left mostly up to the gaming engine makers (via DX12/Vulkan), as the games programmers are using the development kits from the respective gaming engine companies to make their games.
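To make "load balancing is up to the engine" a little more concrete, here is a minimal sketch of one decision an engine might make: dividing a frame's rows between GPUs in proportion to their relative speed (split-frame rendering). The `split_rows` helper and the weights are hypothetical, and nothing here is DX12 or Vulkan API code; it only shows the kind of bookkeeping that moves from the driver into the engine.

```python
def split_rows(height, weights):
    """Split `height` scanlines into contiguous (start, end) slices,
    one per GPU, sized in proportion to each GPU's relative weight."""
    total = sum(weights)
    slices = []
    start = 0
    cumulative = 0
    for weight in weights:
        cumulative += weight
        # Integer boundaries guarantee the slices tile the frame exactly.
        end = height * cumulative // total
        slices.append((start, end))
        start = end
    return slices

# Hypothetical pairing: one GPU roughly twice as fast as the other.
slices = split_rows(1080, [2, 1])
print(slices)
```

A real engine would also have to copy the finished slices between adapters and re-tune the weights as frame times change, which is part of why few games bothered.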

So once one gaming engine offers an easy hand-holding middle-ware path to automate DX12's/Vulkan's Multi-GPU for the poor little games companies' mostly script kiddies then all the games will start to make use of all the GPUs available on PCs/Laptops.

Now it's up to the gamers to start demanding that games come on the market with support for DX12's/Vulkan's explicit GPU multi-adapter. With all the single GPU solutions that have the performance to run demanding games going to the miners instead, the gamers should be shouting from their rooftops for DX12 and Vulkan explicit GPU multi-adapter so games can make use of multiple low-power/lower-cost GPUs for gaming.

Once all of AMD's GPUs sold out for mining, the miners went to Nvidia, which has 4 times the GPU supply/supply channels that AMD had, and the miners have likewise bought up all of Nvidia's GPUs.

Gamers, the only option you have if the mining demand does not dry up is the ability to gang up more of the less costly GPU SKUs in an attempt to get better performance! And if you do not scream for DX12's/Vulkan's explicit GPU multi-adapter from the games makers, then you will not be able to run some of the most demanding games at all.

Gamers, you had better also hope that AMD can get some Vega replacements for Polaris/mainstream, because Vega has FP16 and other features like HBCC/HBC (HBM2), explicit primitive shaders, etc. So maybe AMD can offer Vega GPUs with 2GB or 4GB of HBM2 and can get more GPUs made using lower amounts of HBM2, so AMD is not VRAM constrained by having to supply GPUs with 8GB of HBM2. Vega's HBCC/HBC (HBM2) IP will allow desktop GPUs with 4GB of HBM2 to perform like a GPU with 8GB of HBM2, simply because Vega's HBCC treats HBM2 like a last-level GPU cache. And with that HBCC/HBC (HBM2) IP, the remainder of VRAM can be made virtual and paged to and from system DIMM-based DRAM in the background, managed more efficiently by Vega's HBCC instead of in software. Vega's HBCC/HBC (HBM2) IP should allow for more GPUs to be made if the GPU only needs 4GB of HBM2 instead of 8GB of HBM2.
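The idea of treating local HBM2 as a cache in front of a larger virtual VRAM space can be sketched as a simple page cache. This is a toy model under stated assumptions: the `PagedVram` class, the page granularity, and the least-recently-used eviction policy are all illustrative choices, not a description of how Vega's HBCC actually decides what to keep resident.

```python
from collections import OrderedDict

class PagedVram:
    """Toy model: a small pool of local pages backed by system DRAM."""

    def __init__(self, local_pages):
        self.local_pages = local_pages   # how many pages fit in local HBM2
        self.resident = OrderedDict()    # page id -> True, kept in LRU order
        self.hits = 0
        self.misses = 0

    def touch(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)    # mark as most recently used
            self.hits += 1
            return
        self.misses += 1                       # page must come from system DRAM
        if len(self.resident) >= self.local_pages:
            self.resident.popitem(last=False)  # evict the least recently used page
        self.resident[page] = True

# A working set of 4 pages, reused three times, with 4 local pages available.
vram = PagedVram(local_pages=4)
for page in [0, 1, 2, 3] * 3:
    vram.touch(page)
print(vram.hits, vram.misses)
```

In this model, the smaller local pool only behaves like a larger one while the pages actually being touched fit inside it. Once the working set exceeds the local pool, every eviction turns into traffic over the much slower link to system DRAM.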

February 17, 2018 | 05:24 PM - Posted by CNote

1060 3GB should start showing up more now that mining needs more than 3GB of VRAM.
How long until 6/8!?

February 18, 2018 | 03:24 AM - Posted by CNote

When should we expect higher core/thread counts plus maybe higher Vega numbers? Do you think they would do a Threadripper APU with like 20+ CUs?

February 21, 2018 | 11:50 PM - Posted by KingKookaluke (not verified)

Mailbag question for the next one:

I've been looking for a good cloning software package that I can use to back up and restore systems that I build for my coworkers and their children. I used to use Norton Ghost, but haven't in a while. Anything new and good out there now?
