PCPer Mailbag #38 - 4/6/2018

Subject: Editorial | April 6, 2018 - 09:00 AM |
Tagged: video, pcper mailbag, Josh Walrath

It's time for the PCPer Mailbag, our weekly show where Ryan and the team answer your questions about the tech industry, the latest and greatest GPUs, the process of running a tech review website, and more!

On today's show, Josh survived a home invasion so that he could answer your questions:

01:38 - Battery charging strategy?
04:03 - SSD endurance for video editing?
06:38 - Monitor with both G-SYNC and FreeSync?
08:53 - CPU cache size limitation?
13:39 - Silicon brick wall?
17:30 - Feeding the racing addiction?
20:03 - Breaking and entering?

Want to have your question answered on a future Mailbag? Leave a comment on this post or in the YouTube comments for the latest video. Check out new Mailbag videos each Friday!

Be sure to subscribe to our YouTube Channel to make sure you never miss our weekly reviews and podcasts, and please consider supporting PC Perspective via Patreon to help us keep videos like our weekly mailbag coming!

Source: YouTube

Video News

April 6, 2018 | 10:44 AM - Posted by Andrew (not verified)

Hi guys. Storage question.

I'm wanting to purchase an nvme m.2 ssd. The Samsung 960 parts have been out for 18 months now. Any word on when Samsung intend to ship an upgraded part?

April 6, 2018 | 05:29 PM - Posted by Sean (not verified)

A few weeks ago on the podcast you mentioned using HBM on a CPU as an L4 cache. That reminded me of Hybrid Memory Cube (HMC) technology. I remember hearing about them both initially at the same time about 5 years ago. HBM has now moved into its second generation, but I haven't heard anything about HMC for a couple of years. Will it make it into a mass-produced product? If I'm remembering right, HMC was supposed to be cheaper to produce since it uses high-speed serial links and doesn't need expensive interposers like HBM does. So, I would at least think that there would be at least a manufacturing-cost advantage over HBM. Any ideas as to why it hasn't taken off?

April 7, 2018 | 08:24 AM - Posted by OnesFerGraphicsTheOthersForServerSortsOfWorkloads (not verified)

This article says it all and the author states:

"Strictly from a point of view of trying to understand the family relationships between these, I’ve sketched something of a family tree below based on my interwebs ferreting. Specifically, the HBM side of things is targeted largely at improving graphics processing, while HMC is intended more for servers in data centers, to oversimplify." (1) [See article Graphic]

Read the full article, as there are different use cases for HBM2 and HMC. The author also does not go into the overall power usage of these respective memory-stacking technologies, so that's maybe something to look into as well, since JEDEC HBM2 has that 1024-bit wide interface (subdivided into 8 independent 128-bit channels) per HBM2 stack. Remember, the wider the parallel interface, the lower the clocks need to be to achieve the same effective bandwidth, so lower clocks translate into lower power usage and less need for error correction compared to higher clock rates on narrower interfaces.
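The width-versus-clock trade-off is straightforward arithmetic; here is a quick sketch (the per-pin data rates below are illustrative round numbers, not vendor specifications):

```python
def bandwidth_gb_s(bus_width_bits, per_pin_gbit_s):
    """Effective bandwidth in GB/s: total pins times per-pin rate, over 8 bits/byte."""
    return bus_width_bits * per_pin_gbit_s / 8

# One HBM2 stack: a 1024-bit interface needs only a modest 2 Gb/s per pin...
hbm2_stack = bandwidth_gb_s(1024, 2.0)
# ...to match a 256-bit GDDR5 bus that must run each pin 4x as fast.
gddr5_bus = bandwidth_gb_s(256, 8.0)

print(hbm2_stack, gddr5_bus)  # both 256.0 GB/s
```

Same effective bandwidth either way, but the wide, slow interface gets there at far lower per-pin clock rates, which is where the power savings come from.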

HMC is still around and has its own use case and the author also states:

"Another difference reflects the backers and suppliers: HMC has pretty much only Micron as a supplier these days (apparently, HP and Microsoft were originally part of the deal, but they’ve backed out). HBM is a Hynix/AMD/Nvidia thing, with primary suppliers SK Hynix and Samsung." (1)


(1) "Comparing Cubes" by Bryon Moyer, January 2, 2017


April 7, 2018 | 08:59 AM - Posted by OnesFerGraphicsTheOthersForServerSortsOfWorkloads (not verified)

"A few weeks ago on the podcast you mentioned using HBM on a CPU as an L4 cache"

The HBCC (High Bandwidth Cache Controller) IP on Vega can make use of HBM2 as HBC (High Bandwidth Cache), so for any Vega SKU that has access to HBM2, that HBM2 becomes the HBC/last-level cache. GPUs usually only go as far as having L2 caches below their L1 data/instruction caches, while CPUs have L3 caches in addition to the cache levels above.

So HBM2 (used as HBC on Vega) employed as cache on a GPU without any dedicated L3 cache would be more of an L3 cache. So I always use the phrase last-level cache in case there are GPUs that actually make use of an actual L3 cache in their design that sits above HBM2. CPU designs tend to have more cache levels while GPUs mostly stop at L2, but the L2 cache sizes on GPUs are rather large compared to CPUs in most cases.

So any Desktop/Mobile Vega GPUs that make use of HBM2 can actually have the HBM2 act as HBC/Last level cache with the excess textures/code that can not fit into 4GB of HBM2 paged out to regular system DRAM, or even SSD/H-drive, and Vega's HBCC will manage the page swaps in the background to and from regular system DRAM/SSD(Virtual VRAM swap space).

Desktop Vega GPUs usually come with 8GB of HBM2 (Vega 56/64) or 16GB+ of HBM2 (Vega FE and Radeon Pro WX/Radeon Instinct), so for gaming workloads 8GB of HBM2 is plenty. Now for other graphics-rendering usage, even 32GB of VRAM may NOT be sufficient, like for large, high-resolution 3D animation scenes, but Vega's HBCC can address up to 512TB of virtual VRAM, so that's plenty large for any usage.

April 7, 2018 | 10:16 PM - Posted by James

True virtualized memory support, like it is done on CPUs, has been long in coming to GPUs. You can set some stuff up to try and have the GPU manage the memory on the last few generations of GPUs, but as far as I know, it requires explicit set-up and has some limitations. It isn’t transparent and always available. On Vega, it seems to be a full equal to what CPUs do. It may actually be roughly the exact same or similar hardware that is in a Ryzen cpu. It would need some wider data paths, but the cache coherent memory system and infinity fabric may be roughly the same.

Not needing to micro-manage memory takes a big load off programmers. It is obviously useful for GPU compute where the data may be significantly larger than what a game uses. It will allow games to use memory much more efficiently since they don’t need to grab giant blocks of memory that they may not need. With virtual memory system, you can allocate all of the memory you want, it just will not have any real pages mapped to it until you actually use it. The actual in use memory (resident set size) is generally much smaller than what is normally allocated.
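The allocate-big, use-little behavior described above can be sketched with a toy demand-paging model (the class and method names here are hypothetical, purely for illustration, and not any real HBCC or OS API):

```python
PAGE_SIZE = 4096

class LazyAddressSpace:
    """Toy demand-paged address space: reserving virtual memory costs nothing;
    a physical frame is mapped only when a page is first touched."""
    def __init__(self):
        self.reserved = 0       # bytes the program has asked for
        self.page_table = {}    # virtual page number -> backing frame

    def reserve(self, nbytes):
        base = self.reserved
        self.reserved += nbytes
        return base             # a virtual address; nothing is mapped yet

    def touch(self, vaddr):
        vpage = vaddr // PAGE_SIZE
        if vpage not in self.page_table:               # "page fault"
            self.page_table[vpage] = len(self.page_table)  # map a frame
        return self.page_table[vpage]

    def resident_bytes(self):
        return len(self.page_table) * PAGE_SIZE

mem = LazyAddressSpace()
buf = mem.reserve(1 << 30)                 # "allocate" 1 GB up front
for off in range(0, 64 * PAGE_SIZE, PAGE_SIZE):
    mem.touch(buf + off)                   # actually use only 64 pages
print(mem.reserved, mem.resident_bytes())  # 1073741824 vs 262144
```

The gap between the reserved size and the resident set is exactly why a small fast memory (like HBM used as a cache) can back a much larger virtual allocation.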

Besides the reduction in required memory size, since memory is used more efficiently, there is also the application to multi-GPU. Given the process constraints below 14 nm and what has been said about Navi, I would expect future AMD GPUs to actually be made using a relatively small GPU on an interposer with one or two stacks of HBM. The required bandwidth will scale with the amount of compute available on the GPU. Larger GPUs would then be made up of the HBM+GPU building blocks mounted on a package similar to Epyc. It would be a massive waste of memory to try to keep everything for the entire scene in each GPU’s separate HBM cache. It would also be nearly impossible for the programmer to manage such distributed memory efficiently. If you have 4 HBM+GPU packages, then the 4 to 8 GB possible with a single HBM stack would probably be sufficient. It is unclear how much bandwidth they will be able to do over the on-package infinity fabric links. If PCI-Express 4.0 signaling is used then the bandwidth could be quite high, but not anywhere near local HBM cache level. They will be able to clock it higher than standard PCI-Express due to the short trace length. They may do a different topology or other things to optimize for bandwidth over latency. Latency is much more important in the cpu configuration. If they just connect them in a square rather than fully connected, then the trace length could literally be a few mm. That would allow low power operation at high speed.

I have wondered if AMD HPC APU will actually use the Epyc socket. You could have two Zeppelin die on one side and a GPU on the other. There would be 4 links available from the two Zeppelin die to connect to the Vega package without using the two inter-socket links and the 2 x16 IO links. It would be even more interesting if the Vega package actually had 4 DDR4 memory channels to make use of all 8 memory channels to the package. That is part of the reason why I am wondering if Navi will actually use nearly the exact same infinity fabric design as Zen, just with a wider path to local memory.

For the distributed memory setup that is speculated for Navi, HBM as a cache with fully virtualized memory is probably a necessity. It would be impossible to manage in the way GPU memory is managed currently, regardless of whether the application is games or HPC.

April 8, 2018 | 12:27 PM - Posted by OneMustReadOrOneCanNotLearn (not verified)

"I have wondered if AMD HPC APU will actually use the Epyc socket"

Maybe AMD could, but I think that using a PCB/organic-substrate MCM precludes the silicon interposer's advantage of being of the active-interposer design. Here is an interesting research paper from the University of California, Santa Barbara and AMD Research, Advanced Micro Devices, Inc.

Note the use of active and passive interposers and dies/chiplets. And AMD is looking at active interposers where the connection fabric and the fabric's control logic are all etched in the interposer's silicon substrate. That paper talks about CPUs, but this is also applicable to GPUs/other processor dies and to modular/scalable processor systems made up of many easy-to-fab dies.

There is some interesting information about Network-on-Chip (NoC) that relates to Active Interposer technology. It's a good read.

Also in addition to that paper see this research paper(2) to get an idea of what's on AMD's/AMD's researchers minds for HPC and GPUs on exascale systems.

You should note that if any of these papers are TL;DR for you, then you will not be able to see where AMD is going with their ideas. AMD does not spend research money for nothing, and these research papers are partially funded by AMD and via government exascale-initiative funding! Remember that lots of government-funded research IP finds its way rather quickly into the consumer markets as well, no matter the original intent of the R&D funding for computing and the government's needs.


(1) "Cost-Effective Design of Scalable High-Performance Systems Using Active and Passive Interposers"

(2) "Design and Analysis of an APU for Exascale Computing"


April 8, 2018 | 02:10 PM - Posted by James

Putting a TL;DR at the top to summarize your post isn’t a bad idea. If it sounds interesting, the reader may go back and read the whole thing. Most of your long post will not be read by anyone.

You didn’t understand what I was talking about. I was talking about placing a GPU and HBM on a silicon interposer, which could contain active elements, up to a complete infinity fabric implementation. They would then place multiple interposers on an Epyc-type PCB package. While we probably aren’t going to see ridiculously large GPUs on these smaller processes, they still aren’t going to be tiny. Current Vega64 is around 500 square mm; add on two HBM packages, and you are up to around 700 square mm or more. Four such devices would be on the order of 2800 square mm. Placing many GPUs plus HBM on one interposer would require a prohibitively expensive, gigantic interposer. Each HBM2 die takes around 100 square mm. They are significantly larger than HBM1 die. For smaller than 14 nm, they may make the building block slightly smaller than current Vega, but they would want a single block to be useful on its own, so I expect the GPU building block to still be relatively large, so combined with HBM2, a single GPU interposer is still going to be large. Doing each one separately increases yields. If something went wrong in a 4 GPU giant interposer, the whole thing could be rendered useless. It also would make HBM as a cache not that useful. On an interposer, remote memory could be accessed at near full speed. I wouldn’t be surprised if the giant multi-GPU interposer would cost tens of thousands of dollars.

April 8, 2018 | 05:53 PM - Posted by ReadOrLoseFaceWithoutTheProperKnoledgeThatComesFromReading (not verified)

"If something went wrong in a 4 GPU giant interposer, the whole thing could be rendered useless. It also would make HBM as a cache not that useful. On an interposer, remote memory could be accessed at near full speed. I wouldn’t be surprised if the giant multi-GPU interposer would cost tens of thousands of dollars."

Not if the GPU dies were of a smaller size and the interface and its control logic were in the interposer and not on the smaller GPU dies. It takes a lot of BEOL (Back End Of Line) metal layers on any monolithic GPU die to host all those inter-unit traces etched into the die's silicon substrate. So with an active interposer, that's all moved to the interposer's silicon-substrate traces and fabric logic, with much simpler modular, scalable GPU dies remaining and more room for shader/other units on those smaller modular GPU, or other processor, dies. The research paper discusses binning to a great degree for CPU dies, but that's also applicable to smaller modular GPU dies/chiplets, or any other processor dies for that matter.

Look at "Fig. 6. Scale comparison of 40µm pitch microbump arrays to 256-bit and 512-bit flit width routers in 16nm and 65nm technologies."(1) in that first research paper and see what space is saved; even though it's researched at 65nm, that's still going to scale to lower nodes with a net space savings.

The paper even discusses placing redundant logic on the interposer for yield-preservation and defect-tolerance reasons. There is a lot of extra redundant circuitry on modern processors so some simple faults do not waste an entire expensive processor, MCM, or interposer package, and ditto even for monolithic processors. Do you know what the industry term "chicken bits" means? Redundancy is a big thing even on monolithic-die processors, especially where any interface fabrics are concerned.

You are not reading any of the papers, and that research is a very important indicator of what AMD is thinking. AMD and others do not fund research on ideas that do not show promise. What you are doing is refusing to learn while still commenting without doing any of the necessary homework, and reading those research papers is required. You are obviously more concerned about saving face, but if you continue to turn your nose up at reading and knowledge then you will lose face even more.

All this active interposer stuff has implications for AMD's future for both CPUs and GPUs (Navi and beyond), where the interposer becomes the NoC (Network on a Chip) and hosts the entire Infinity Fabric data/control fabrics and any other fabrics deemed necessary.

Reading Is Fundamental, doubly so for high tech, and if you TL;DR too many times you lose face. I sure read the papers, even the parts I do not completely understand (the maths), and I saved the PDFs on my computer for future reference. AMD is sure funding plenty of research, and the research papers sponsored by AMD are as telling as anything about what is on AMD's mind as far as what will be coming IP/technology-wise from AMD. Patent filings are also a great source of information.

(1) [see fig. 6 on page 5 of this paper to see the footprint savings on an active interposer that becomes the network-on-interposer chip design with the processor CPU dies, or GPU dies for that matter, 3D stacked above]

"Cost-Effective Design of Scalable High-Performance Systems Using Active and Passive Interposers"


April 7, 2018 | 01:21 AM - Posted by Anony mouse (not verified)

Has PCPerspective tried to do any investigation or follow-ups on Nvidia's GPP program? Talked to OEMs or AIBs?


April 8, 2018 | 02:07 PM - Posted by OneMustReadOrOneCanNotLearn (not verified)

The entire CPU/GPU laptop/PC OEM market has become dependent on its parts supply chains for support, namely on 2 of the OEM market's rather large monopoly CPU/GPU interests. The smaller PC/laptop OEMs have allowed themselves to become so dependent on their processor parts suppliers that they rely on those CPU/GPU suppliers for the very software/hardware technical know-how needed to even afford to engineer their OEM PC/laptop SKUs.

Now companies like Dell and HP can afford to maintain their own in-house engineering personnel, but even there in the large OEM laptop/PC market there is still a dependency on those 2 large monopoly-interest CPU/GPU parts suppliers. And because of the lack of competition for so many years, even the large PC/laptop OEMs are paying a good portion of their PCs'/laptops' BOM to those 2 very large CPU and GPU monopoly interests. Even the large laptop and PC OEMs have had to resort to selling bloatware space on their respective OEM PC/laptop offerings because of that lack of competition.

Now there have been some great product offerings from the smaller CPU/GPU competition over the years, but the big CPU/GPU monopoly interests have taken to blatantly unfair market tactics even in the face of fines, as those fines are not large enough to have any lasting effect on these 2 large CPU and GPU monopoly interests. You can NOT expect this behavior to cease unless the fines are so large that the violators cannot simply write them off as the cost of doing “business”, or the government takes action to simply break the two big monopoly CPU and GPU interests up into smaller competing companies.

You and everyone else will have to become politically active with regard to antitrust issues and the laws already on the books, or you will simply be ignored like some background noise. If you cannot make antitrust a campaign issue like was done in the late 19th and early 20th centuries, and also get that Citizens United Supreme Court ruling overturned by Congress passing specific laws to limit corporate influence in the nation's political processes, then you the consumer are SOL with respect to that and with respect to democracy.

You are absent the necessary political force; as a consumer you are only able to vote with your wallet, and that boycott method only works when consumers finally get fed up enough to stop purchasing from the violators. This will be rather painful for any gaming addicts, as they need their fix, but it’s very easy to just go the home-builder route and never buy OEM/prebuilt, where you do not have a direct choice in which parts suppliers are used.

AMD most definitely needs a NUC (Intel’s mini-desktop line) competitive product that makes use of the lower-end AMD Raven Ridge offerings. I’d say for gamers to also become AMD stockholders and vote as a group to force AMD’s management to start creating some more consumer-focused mini-desktop systems of the bare-bones (NUC) and mini-motherboard/mini-PC-case designs.

AMD is a CPU company first and foremost, and they do not have a large enough share of the gaming market to warrant any Nvidia-style funding approaches for the consumer market, where the profit margins are currently too small. AMD would rather focus on the professional markets where the mark-up AND margins will keep Lisa Su and her management team employed longer term. AMD itself would rather play second fiddle to Intel at this time until AMD can get the Epyc CPU revenue streams flowing to give AMD the excess revenues with which to try to buy its way back (for lack of a fair-market way) into the consumer OEM PC/laptop market, and now the GPU AIB/gaming markets of OEM gaming-branded parts. AMD is going to have to take the second-tier gaming branding from now until who knows when, and the only choice that the consumer has is to exercise that wallet-closing functionality that’s built into all consumers by default.

April 7, 2018 | 08:35 PM - Posted by James

There are limits on cache size that are not directly related to die size. The basic idea is that bigger is slower. Caches generally have some degree of associativity. To look up something in the cache, the cpu first has to determine which associative set the address could be in. Next, it has to actually compare part of the address to the tags. Since there is associativity, the address could map to multiple locations. The multiple possible tags must be compared to the tag portion of the address to determine if the cache set actually has the address cached. There are also all kinds of complicated stuff about whether you are dealing with a virtual or physical address, so the TLB, which caches virtual to physical mappings, can come into play.

When we are talking about cpu clock ticks, even the time to drive the signals across the chip can be significant. The time to look up the address and determine the location in the cache and compare to tags to determine if it is the actual correct address is significant in cpu clock ticks. You can not have giant L1 or L2 caches even if you were willing to spend massive amounts of die area and power. The time to get a response from the cache would be way too large. Cache design is probably one of the most complicated parts of modern CPUs. It is difficult to determine a good balance for 3 levels of caches. The actual cpu core, at least for integer instructions, is tiny compared to the caches. You are mostly buying a memory chip in some ways. A lot of design work goes into minimizing the latency and maximizing hit rate of the caches for a broad set of applications. It may be more important than the actual core design at this point. Modern processors can often execute 6 to 8 instructions per clock, but in reality, they are doing well if they can even reach an IPC of 1 for most code. Instructions are cheap, memory accesses are expensive.
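The index-then-tag-compare mechanics described above can be made concrete with a toy set-associative lookup (the sizes are arbitrary examples; real hardware does all the tag comparisons in parallel, and larger set or way counts mean longer wires and more comparators, which is exactly the latency wall being discussed):

```python
# Toy set-associative cache: split an address into line offset, set index,
# and tag, then compare the tag against every way in the selected set.
LINE = 64    # bytes per cache line
SETS = 64    # number of sets
WAYS = 8     # associativity (tags to compare per lookup)

cache = [[None] * WAYS for _ in range(SETS)]   # stored tags, per set

def split(addr):
    set_idx = (addr // LINE) % SETS    # which set the address maps to
    tag = addr // (LINE * SETS)        # remaining high bits identify the line
    return set_idx, tag

def lookup(addr):
    set_idx, tag = split(addr)
    return tag in cache[set_idx]       # up to WAYS tag comparisons

def fill(addr):
    set_idx, tag = split(addr)
    ways = cache[set_idx]
    if tag not in ways:
        ways.pop(0)                    # evict (FIFO stand-in for real LRU)
        ways.append(tag)

fill(0x12345)
print(lookup(0x12345), lookup(0x99999))   # True False
```

A bigger cache means more sets and/or more ways, so both the index decode and the tag-compare stage get slower, which is why L1 stays small no matter how much die area is available.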

We actually may kind of go slightly backwards in design. With a cpu made to go on a silicon interposer, I could imagine making an L3 cache with a memory controller on a separate die. The cache takes a lot of area, so making it as a separate chip, possible with process optimizations specific to cache design, may be a way to increase yields. You could just have a large cache that caches physical memory addresses mapped to the memory controller it is attached to. This could be useful for large applications. IBM Power processors, which use large MCMs already do this to get better latency when connecting terabytes of memory (higher latency than simple desktop systems). This wouldn’t be particularly useful for desktop systems though. Consumer level applications just do not have a large enough memory footprint to make use of these huge caches. The current caching architectures already cache them quite well. Increasing the cache size might have nearly zero effect for many consumer applications.

April 8, 2018 | 12:57 PM - Posted by OneMustReadOrOneCanNotLearn (not verified)

Cache access times are measured in nanoseconds, and the cache associativity affects the access times for better or worse. There is also the type of cache, exclusive or inclusive, and other factors. A modern CPU's cache subsystem can include dedicated controllers that do the page/cache-table walking and allow the CPU cores to issue queued cache requests rather than tying up the CPU's execution pipelines with unnecessary workloads. Modern CPUs also make use of SMT in order to hide cache access delays (latency) and keep the processor's execution pipelines operating at as close to 100% capacity as possible.

There is a marginal-return metric on just how large a cache can be relative to the actual efficiencies gained, and for the L1 and L2 caches on CPUs that's directly related to the CPU's total execution-pipeline resources. GPUs can and do make use of larger caches, but that's more to do with the parallel nature of the GPU's shader cores and the larger numbers of threads that can be in flight on a GPU's many shader cores, which are grouped for parallel execution (thread warps or thread wavefronts) at any one time by a GPU's parallel schedulers, as opposed to a CPU's rather scalar cores and their scheduler logic.
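The point about many in-flight warps hiding memory latency reduces to a rough occupancy calculation (a deliberately simplified model that assumes free warp switching, not any specific GPU's scheduler):

```python
def utilization(n_warps, mem_latency_cycles, issue_interval=1):
    """Fraction of cycles the execution units stay busy if each warp stalls
    for mem_latency_cycles on a load and warps can be switched for free."""
    # With enough ready warps, another one can always issue while others wait.
    return min(1.0, n_warps * issue_interval / mem_latency_cycles)

print(utilization(4, 400))     # a handful of threads cannot cover DRAM latency
print(utilization(1024, 400))  # thousands of warps cover it completely (1.0)
```

This is why a GPU tolerates memory accesses costing hundreds of cycles while a CPU, with only a few hardware threads per core, must lean on deep cache hierarchies instead.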

April 8, 2018 | 02:16 PM - Posted by James

I don’t think you added anything useful to my post and you completely misunderstood my other post. Hopefully this is clear and concise enough to be understood by you.

April 8, 2018 | 06:01 PM - Posted by ReadOrLoseFaceWithoutTheProperKnoledgeThatComesFromReading (not verified)

You must always be in face-saving mode as the result of your hatred of reading and learning.

Your TL;DR is not a defence, and reading those research papers and other materials and related articles from trusted sources is absolutely necessary.

Read those research papers, and there are plenty more to be read, and the learning never stops until you give up the ghost or return to the cycle of elements, whatever you believe!

April 9, 2018 | 06:41 AM - Posted by WhyMe (not verified)

(s)he never does, (s)he just waffles on for ages.

Most people have probably learned to just ignore him/her, as any time you question his/her gish gallop you get nothing but insults back.

April 9, 2018 | 11:42 AM - Posted by RabidRaccoonWithDistemperToo (not verified)

And WhyMe (Cletus Delroy Spuckler's less intelligent cousin) appears to be another one of those folks that hates, oh so hates, reading and learning!

"Most people have probably learn't to just ignore him/her"

And you have not apparently learned to ignore the walls-o-text with those links to some interesting AMD-sponsored research, with some government exascale-computing grant money thrown in to speed the technological advancement process along (that IP finds its way into consumer products PDQ these days).

Just look at all the goobs over at r/AMD and r/Nvidia asking for all the reasons why and not being able to understand even a simple answer.

High-technology questions in the hands of folks that are at best only well-trained technicians are not going to help with any really technically complex issues surrounding PCs/laptops and other computing devices. There are no simple answers regarding computing technology as it has come to exist in its current iteration! So folks who do not read up can only ever expect to remain in the dark concerning how things in the real computing-technology world actually work.

[Side Diatribe Begins Here]

AMD is sure not going to be too concerned about consumer gaming of the flagship-GPU kind, what with the total revenues/margins (almost break-even) on flagship GPUs not being worth the effort. It's the APU graphics (Raven Ridge) and the mainstream GPU market that AMD is more interested in; see the rumors on that RX 580X, and see that mainstream is for AMD a more logical business choice, and only the professional GPU compute/AI/pro-graphics markets will be getting Vega 20.

But flagship gamers should not worry too much about Vega 20 only going for the professional compute/AI markets, as there are always those non-performant Vega 20s that may get the chance to become binned down into some consumer flagship variant if the cheapskate gamers are willing to pay AMD more than just enough for AMD to more than break even. AMD's total sales margins need to be above 45% and higher to attract investment in order to give AMD the market cap to afford flagship GPUs that are specifically designed for gaming only; see Nvidia's market cap to see where Nvidia gets the funds to do those 5 different base-die tapeouts (GP100, GP102, GP104, GP106, GP108).

It looks like AMD, with its current lack of cash, will have to rely more on Intel to buy AMD's Vega graphics into the laptop market. And AMD appears to be just fine with that and its other console semi-custom gaming APU business.

Where are those ZOTAC ZBOX MA551 SKUs with desktop Raven Ridge inside? Why are folks not doing the Simpsons kids-in-the-back-seat thing and asking all the enthusiast websites:

Are those ZOTAC ZBOX MA551 SKUs with desktop Raven Ridge here yet!
Are those ZOTAC ZBOX MA551 SKUs with desktop Raven Ridge here yet!
Are those ZOTAC ZBOX MA551 SKUs with desktop Raven Ridge here yet!

Oh look and see Compulab has gone over to the Intel Dogfood graphics side with Intel's integrated graphics instead of maybe some Raven Ridge desktop or even mobile offerings in those Compulab offerings that used to come with AMD APUs.

Where are the Mini-Desktop bare bones offerings with Raven Ridge APUs inside as all I'm seeing is that one Zotac announcement.

Gamers had better hope that Epyc gets AMD back up to around that 21% share of the server/workstation/HPC CPU market, because that was the market share that AMD had with its Opteron line of CPUs, and Epyc is way more performant than Opteron ever was. AMD's Opteron sales margins enabled AMD's share price to remain around $93 for a good while before things went south for AMD, and this was on CPU sales alone, as AMD had yet to acquire ATI.

Here is a little reminder to gamers: AMD could pull out of the retail consumer gaming market entirely, except for consoles and semi-custom, and still profit handsomely from Epyc, Radeon Pro WX, and Radeon Instinct compute/AI GPU sales. AMD has such a small share currently of the discrete desktop/mobile GPU market. AMD is much better off working on some mainstream Vega/Navi SKUs for the future, its Raven Ridge lineup, and those semi-custom gaming-console deals that now even include that Intel semi-custom Vega die deal that may or may not actually be Vega-graphics based.

The Radeon RX 580X/RX whatever-X versions are rumored to be incoming, maybe with faster GDDR/whatever VRAM and higher clocks.

[Foams at the mouth and hisses like a Rabid Raccoon, and the Side Diatribe Ends Here]

April 9, 2018 | 01:28 PM - Posted by WhyMe (not verified)

IDK what you said but case in point.

April 9, 2018 | 02:24 PM - Posted by RabidRaccoonWithDistemperToo (not verified)

Here is an example of why it's good to call out folks that do not do their homework and make uninformed statements!

See this r/AMD exchange; it's necessary to call out the folks that are making these uneducated statements without doing their proper research-paper reading before posting!

{Example Starts Here}

[–]HorumOmnium • 3 points 2 hours ago

GPUs have caches, but they are more used for bandwidth amplification than for latency reduction, since GPUs are very good at hiding latency to begin with.


[–]ET3D • -1 points an hour ago

On the contrary, GPUs are terrible at hiding latency. Typical latency for memory access on a GPU is in hundreds of cycles. If it wasn't for caches (and local memory) latency would have been a lot more of an issue.


[–]HorumOmnium • 2 points an hour ago

Please google “gpu hiding latency” and you’ll find plenty of papers from UC Berkeley and CalTech that can help you stop embarrassing yourself and start understanding how GPUs are the ultimate latency hiding architecture thanks to having a massive amount of threads.

The fact that they can hide latency so well is the reason why it’s acceptable for the external memory accesses to be in the hundreds of cycles, unlike CPUs.

It has a significant impact on the peak achievable DRAM BW: instead of scheduling for low latency, the row sorter can schedule for BW (which has a negative latency impact.)

Caches are always useful: increased local BW, reduced power, and yes, to a certain extent even more latency reduction.


[–]jorgp2 [score hidden] 23 minutes ago

Doesnt that depend on the workload though?



[–]HorumOmnium [score hidden] 20 minutes ago

If you’re going to run one thread on the whole GPU, you obviously won’t be able to hide the latency.

But a typical GPU workload runs tens of thousands of threads at the same time.

A CPU can’t do that, so it relies on caches and memory locality to avoid the long DRAM latencies.


[–]ET3D • -1 points an hour ago

Dude, I never embarrass myself. I may be wrong at times, but that's nothing to be ashamed of. While your references are certainly worth pursuing, it's a pity that you had to be a jerk about them.


{Example Ends Here}(1)

The ones who read the research papers are the ones who will get their information correct most of the time (99.99%), while the not-so-smart ones hate to read and really are not interested in technology to begin with! You GO, HorumOmnium, and call those daft fools out!

Latency hiding is not the same as latency (good or bad). Latency hiding can in fact be helpful no matter the memory/cache latency (in nanoseconds). Folks are not doing the needed research, and r/Amd is replete with examples such as this!
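The latency-hiding point can be illustrated with a toy model (this is not a real GPU scheduler; the cycle counts are invented round numbers): with enough threads in flight, a memory latency of hundreds of cycles gets overlapped with other threads' work, so the effective cost per request collapses to roughly one issue cycle.

```python
# Toy model of GPU latency hiding: each memory request takes LATENCY
# cycles, but the scheduler can issue a request from a different ready
# thread every cycle, so with many threads the pipeline stays full.

LATENCY = 400          # cycles per memory access (hundreds, as on real GPUs)
REQUESTS_PER_THREAD = 10

def cycles_to_finish(num_threads: int) -> int:
    """Cycles to complete all requests, assuming one new request can be
    issued per cycle and requests from different threads overlap."""
    total_requests = num_threads * REQUESTS_PER_THREAD
    if num_threads == 1:
        # A single thread waits out each access serially: latency exposed.
        return REQUESTS_PER_THREAD * LATENCY
    # Many threads: issue-limited, plus one final latency to drain.
    return total_requests + LATENCY

one_thread = cycles_to_finish(1)        # 4000 cycles for 10 requests
many_threads = cycles_to_finish(1000)   # 10400 cycles for 10000 requests
print(one_thread, many_threads)
```

With one thread the cost is 400 cycles per request; with 1000 threads it is about 1.04 cycles per request, which is why hundreds of cycles of DRAM latency are acceptable on a GPU.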


"MetaCPU's have about three levels of cache to boots memory latency, do GPU's have caches and could they use them? self.Amd

submitted 3 hours ago by Arowx"


April 10, 2018 | 02:42 AM - Posted by WhyMe (not verified)

WTH are you rabbiting on about reddit for, when the question being asked is why you feel there's a need to witter on for ages and insult anyone who questions you?

April 10, 2018 | 10:55 AM - Posted by RabidRaccoonWithDistemperToo (not verified)

In your case it's attacks on the length of the posts and no salient and cogent counter-posts on an item-by-item basis, with links included to prove otherwise.

And it's folks like you, who will not read and learn in order to prove your statements correct, who are the problem with respect to high-technology subjects, which are not, and I repeat not, some sports-like contest!

You get insults when you put insults out there, and you get more afterwards because you have a need to complain about a post's length rather than what was said.

Lazy folks cannot even attempt to fully grasp the full complexities involved in CPU/GPU/other processor science, and that includes the driver/software, OS/API, and middleware layers that can and do make up the full software/firmware stack that allows the processors' hardware to perform its intended use.

Reading those research papers/whitepapers is a requirement, as is reading the professional trade journals.

r/Amd is in need of some of the very same sorts of Linux folks telling it like it is, just as in the above example, but that's beyond their ability without the proper knowledge.
r/Amd does more damage to AMD's reputation than even CTS Labs and Viceroy Research could ever hope to achieve!

[Out on a Wild Tangent, MAYBE, Starts Here]

Do you know what both Intel and Nvidia fear most about AMD's technology and IP prowess? It's not so much that technology and IP in and of itself. Both Intel and Nvidia fear AMD's technology and IP prowess in the hands of an AMD with the money to buy its way back into the CPU and GPU markets the same way that both Intel and Nvidia currently do, and have done for quite a few years now! The fear is there now with that Nvidia GPP tactic, just as the fear has been there with Intel and its nefarious market tactics even longer than with Nvidia's nefarious market tactics.

[Out on a Wild Tangent, MAYBE, Ends Here, well this instance anyway]

April 8, 2018 | 09:15 PM - Posted by eagle63

Hey guys, general question regarding graphics driver updates as it relates to gaming. (This question has probably been asked and answered many times before, I just haven't run across it yet)

It seems that Nvidia (and probably AMD as well) will often come out with a new driver shortly after the release of a new AAA/popular game with the claim that it's "optimized" for that particular game. This seems crazy to me, because I assume that by optimizing the driver for a specific game, that also means it's probably now de-optimized for other games or applications right? I mean, it's not like they just suddenly figured out some engineering breakthrough or anything; they're just pushing the deck chairs around a bit (figuratively speaking) to put more weight into the specific things that might matter most for a specific game.

So can anyone explain what exactly these driver updates are really doing, and are they really a good thing? (for anyone other than those playing that specific game for which the driver is optimized) I'm also assuming this is something unique to PC gaming... consoles don't do this right?

Thanks for any info!!

April 8, 2018 | 11:39 PM - Posted by NotReallyIfYouHaveAnIdeaOfHowModernOSsReallyWork (not verified)

No, that just means there is a separate code path in the graphics driver for the games that get individual optimizations. There are driver extensions tuned for a specific title that can be called by that game, or even code paths in the driver that are automatically enabled when that optimized title is detected running, with the driver detecting the game and the game detecting the new DLL symbols for any calls that target the optimized game's driver functions.

Both the game's developers and AMD are working together, so it's more than likely AMD and the game's maker working out some tweaked/customized driver extensions that the game can target once the new drivers are installed.

That driver stuff is all done through APIs/ABIs anyway, and the APIs/ABIs are extensible frameworks, with AMD able to create custom driver extensions and the game's maker having that functionality available to target once the game detects the updated drivers.

A lot of those extensions and extensible features between the graphics APIs and the drivers are already built into the DX12 and Vulkan APIs, as well as the earlier graphics APIs. The DX12/Vulkan graphics APIs have been designed as extensible graphics APIs where AMD, Nvidia, and others can all register hardware-specific extensions with the graphics APIs (DX12/Vulkan and older APIs) and then make use of those extensions in games for a specific GPU maker's hardware features that the other makers' GPU products lack.

Look at Windows and the Windows Display Driver Model (WDDM), which in the Windows 10 Fall Creators Update (version 1709) is already up to WDDM 2.3. That's a complicated framework/API that provides the basic 2D and 3D functionality under Windows: virtualized video memory, scheduling with a runtime that handles concurrent graphics contexts, cross-process sharing of Direct3D surfaces, and enhanced fault tolerance.

There are whole frameworks in the respective OSs dedicated to registering and querying hardware configurations. That's all done with some sort of plug-and-play OS/API functionality that's part of the OS, and it's there to enable every part of the OS/hardware and software/driver systems to be customized on the fly for any hardware-specific features and any new hardware that's plugged into any port, PCIe slot, or whatever.

You may not realize it, but the entire PC/laptop and devices ecosystem takes a multi-trillion-dollar industrial investment to create and maintain, with OSs/APIs and software and firmware that run into many millions of lines of code. It's a huge industrial undertaking that's taken decades, and trillions more spent, to reach the point where it's as easy as plug and play with some minimal automated driver/software/firmware updates.

There are more ARM-ISA/other-ISA-based CPUs running embedded OSs on PCs/laptops than most folks realize, doing all sorts of controller services (SSDs, hard drives, Wi-Fi, Ethernet, and others), and even the motherboard chipsets have embedded CPUs/controllers that provide services and need embedded OS/firmware maintenance.
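The per-title detection described above can be sketched roughly like this. This is a hypothetical illustration of the idea, not actual driver code; every executable name and tuning flag here is invented:

```python
# Hypothetical sketch of per-game driver code-path selection: a
# "Game Ready" style driver ships tuned profiles for specific titles,
# and every other game falls through to the generic path unchanged.
# All names and flags below are invented for illustration.

DEFAULT_PROFILE = {"shader_cache": "generic", "async_compute": False}

# Per-title profiles added by a driver update for newly released games.
GAME_PROFILES = {
    "farcry5.exe": {"shader_cache": "farcry5_tuned", "async_compute": True},
    "doom.exe":    {"shader_cache": "doom_tuned",    "async_compute": True},
}

def select_profile(executable_name: str) -> dict:
    """Pick a tuned code path if the running title is recognized,
    otherwise fall back to the generic path (other games unaffected)."""
    return GAME_PROFILES.get(executable_name.lower(), DEFAULT_PROFILE)

print(select_profile("FarCry5.exe"))  # tuned path for this title
print(select_profile("oldgame.exe"))  # generic path, unchanged
```

The key point for the original question: because unrecognized titles hit the default path, optimizing for one game does not de-optimize the rest.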

April 9, 2018 | 01:10 PM - Posted by RabidRaccoonWithDistemperToo (not verified)

Intel's response to Tom's Hardware about its semi-custom "Vega" graphics:

"This is a custom Radeon graphics solution built for Intel. It is similar to the desktop Radeon RX Vega solution with a high bandwidth memory cache controller and enhanced compute units with additional ROPs."(1)

And my interest in this Intel statement in that Tom's Hardware article has less to do with the Vega/Polaris GPU DNA inside that Intel G-series graphics and more to do with AMD actually being able to bake more ROPs into an existing GPU design. So with Vega 56 having the exact same numbers of shader cores and TMUs as Nvidia's GTX 1080 Ti, why can AMD not do a Vega 56 respin on a new Vega die tapeout that has enough ROPs (Vega 56 only has 64 ROPs) to match the GTX 1080 Ti's 88 ROPs?

AMD has the ability to shift its ROP-to-shader ratio in favor of more ROPs to match Nvidia's more ROP-heavy GP102-based gaming GPUs, so where is a Vega 56 with 88 ROPs and a more gaming-focused design to match Nvidia's GTX 1080 Ti in total ROP resources?

OK AMD, make with the ROPs on any Vega respins and do not use the Vega 10 base die tapeout that maxes out at 64 ROPs.
ROPs are why Nvidia is winning the FPS metrics contest.
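The ROP argument comes down to peak pixel fill rate, which scales as ROPs times core clock. A rough back-of-the-envelope comparison (the clock figures below are approximate reference boost clocks, used here only for illustration):

```python
# Rough peak pixel fill-rate comparison: fill rate = ROPs * clock.
# Clock numbers are approximate reference boost clocks, for
# illustration only; real sustained clocks and throughput vary.

def fill_rate_gpix(rops: int, clock_mhz: int) -> float:
    """Peak pixel fill rate in Gpixels/s."""
    return rops * clock_mhz / 1000.0

vega56    = fill_rate_gpix(rops=64, clock_mhz=1471)  # ~94 Gpix/s
gtx1080ti = fill_rate_gpix(rops=88, clock_mhz=1582)  # ~139 Gpix/s

print(round(vega56, 1), round(gtx1080ti, 1))
```

On these rough numbers the 88-ROP part has roughly 48% more peak fill rate, which is the gap the comment above wants a Vega respin to close.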


"Kaby Lake-G's Vega Credentials Questioned: Rapid Packed Math Not Working"


April 10, 2018 | 05:04 PM - Posted by Idontgetit (not verified)

I don't understand the fascination with Intel NUCs. Almost every site reviews them every time Intel releases a new one, like they are contractually obligated. OEMs like HP (Desktop Mini) and Dell (Micro PC) offer just-as-tiny form factors at, when all is considered, about 30-50% of the cost of a fully configured NUC. Can you explain the novelty of the NUC and why people should consider it over a much less expensive and probably just as reliable OEM box with a similar form factor?

April 11, 2018 | 02:47 AM - Posted by BetterGraphicsDriverSupportOnNUCs (not verified)

NUCs are a barebones option, and Intel is in charge of the graphics drivers and other driver updates rather than HP/Dell, and we all know that HP/Dell customize their graphics drivers somewhat and never properly support graphics driver updates compared to what Intel, Nvidia, and AMD do for their generic graphics drivers.

NUCs get their graphics driver support directly from the processor's maker, and AMD is only a subcontractor here that works with Intel (G-series SKUs with a semi-custom Vega discrete die on the EMIB/MCM) to get the drivers tweaked for gaming. Just you try to get that level of service from Dell/HP for mini-desktops/laptops, and see that the NUC is the better option!

AMD really needs an NUC sort of competing product also!
