RTX RDX; Tracing the troubles with the Ti

Subject: General Tech | November 9, 2018 - 12:53 PM |
Tagged: nvidia, RTX 2080 Ti

More evidence of issues with the Founders Edition of the RTX 2080 Ti appeared on a screen over at [H]ard|OCP while Kyle was relaxing with a little Hunt: Showdown.  There were earlier hints of trouble, including some initial BSODs and a lacklustre overclocking experiment when trying to push the card beyond its factory overclock.  A new driver dropped yesterday and Kyle is going to keep testing, as there are always numerous variables in these sorts of things, but it is worth keeping up with.

On the plus side the crash unlocks a new colourful version of Centipede!


"This case is not in any way running "hot" with a single RTX 2080 Ti. Even this evening I was running its two 280mm fans at high to make sure I was giving it the airflow it needed. This case has been home to dual Titan X cards, as well as Radeon 290X Crossfire, and never had an issue."


Source: [H]ard|OCP

November 9, 2018 | 01:44 PM - Posted by Rocky1234 (not verified)

Thanks for the info and update on this. Oh and thanks now I want to play that very old game...:)

November 9, 2018 | 02:36 PM - Posted by Jeremy Hellstrom

The link goes to a Flash version :)

November 10, 2018 | 11:07 PM - Posted by collie

It's not the same without a trackball

November 10, 2018 | 02:26 PM - Posted by James

I am generally surprised at how high of clocks they get out of these things. They may have pushed it a bit too far. All kinds of things can cause issues, like the power delivery circuitry not being quite up to it. I generally have a policy of never buying anything that is at pcb revision 1.0. If you value stability (and your money) it is best to wait until they work out the bugs a bit. The number of super high end cards that they make like the 2080 Ti is actually very small compared to more mainstream cards. The more mainstream level cards will get tested much more just by more people running them. If there are issues with the more mainstream cards, it will be obvious very quickly.

Giant gpus at 12 nm and below probably should not exist anyway. I am somewhat surprised that AMD can make a large gpu on 7 nm (331 mm² is probably quite large for the process). The base cost of a 7 nm wafer is going to be significantly higher than that of a 14 nm wafer. Add bad yields to that and these chips are probably going to be too expensive for the consumer market. The prices of nvidia’s large die chips even at 12 nm have gone up compared to the previous generation. I don’t know if we will see any of these large die products come to the mainstream consumer market. The enterprise market can pay thousands of dollars for these things though.
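The yield concern above can be made concrete with a rough sketch. The Poisson yield model used here is a common first-order approximation, not something either vendor has published for these chips. The defect density `D0` is a hypothetical value picked only for illustration; only the 331 mm² die area comes from the comment.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical defect density (assumption, not a published figure).
D0 = 0.5  # defects per cm^2

full_die = poisson_yield(331.0, D0)      # die area quoted in the comment
half_die = poisson_yield(331.0 / 2, D0)  # same silicon split into two dies

print(f"331.0 mm^2 die: {full_die:.1%} defect-free")
print(f"165.5 mm^2 die: {half_die:.1%} defect-free")
```

Under this model the half-size die's yield is the square root of the full die's yield, so a single defect scraps half as much silicon. Real yields also depend on redundancy and on how partly defective dies are binned, which this sketch ignores.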

We might not get mainstream, but high end, consumer parts at 7 nm until we get a “chiplet” version. I suspect that AMD’s Navi parts will be a relatively small gpu die paired with a single HBM chip on an interposer. They can then place several of the interposers on a pcb and connect them together with infinity fabric. The HBM as a cache that AMD has developed with full cpu style virtual memory support is one component of that. It is unclear whether these would be like Zen 1 or like Zen 2 with a switch chip. I guess they could also go the monolithic but on multiple chips route. They could try to make them look like a single gpu even though it is composed of multiple chips. It should be interesting.

November 11, 2018 | 11:25 AM - Posted by DualGPUsOnePCIeCardViaNVLinkOrInfinityFabric (not verified)

Well both Nvidia's NVLink and AMD's Infinity Fabric can allow GPU/Cards to be made up of at minimum 2 separate DIEs on one PCIe card.

So intrinsic to both the NVLink protocol and the Infinity Fabric protocol is the extra GPU to GPU, CPU to GPU, and CPU to CPU (AMD) processor cache coherency traffic that can allow for Multi-Processor-DIEs to be wired together and appear to software as if the Multi-GPU or Processor DIEs act like one larger logical GPU/Processor.

So why should Nvidia or AMD wait for the more complex Multi-Small-GPU die/chiplet router and topology IP to be fully developed and vetted/certified when at least 2 GPU dies can be wired up directly via NVLink and Infinity Fabric? And that wiring up of only 2 DIEs can be a starting point for both AMD and Nvidia to be able to cut each DIE's size in half for better yields and still create Flagship Gaming GPU offerings.

The multi Small-Die designs that make use of many much smaller GPU/DIE chiplets have some routing issues to be engineered in order to not have bottlenecks somewhere in their mesh topologies, but if the design is only there to link 2 larger GPU dies then both AMD and Nvidia could do that currently because 2 GPU DIEs wired together represent a much simpler point to point connection. And even 4 GPU DIEs can be wired up to each other in a rather simple 6 connection topology that still represents a single hop from each die to every other die.
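As a quick check on the link counts involved, here is a small sketch that counts point-to-point links and worst-case hops for two ways of wiring up `n` dies. The function names are made up for illustration, and each link is assumed to be one bidirectional connection between a pair of dies. Counted that way, a fully connected set of four dies needs six links.

```python
from collections import deque
from itertools import combinations

def full_mesh(n):
    """Every die linked directly to every other die."""
    return list(combinations(range(n), 2))

def ring(n):
    """Each die linked only to its two neighbours (intended for n >= 3)."""
    return [(i, (i + 1) % n) for i in range(n)]

def max_hops(n, links):
    """Worst-case number of links crossed between any two dies (BFS)."""
    adj = {i: set() for i in range(n)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    worst = 0
    for start in range(n):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst

for name, build in (("full mesh", full_mesh), ("ring", ring)):
    links = build(4)
    print(f"4 dies, {name}: {len(links)} links, max {max_hops(4, links)} hop(s)")
# prints:
# 4 dies, full mesh: 6 links, max 1 hop(s)
# 4 dies, ring: 4 links, max 2 hop(s)
```

The full mesh grows as n(n-1)/2 links, which is why it stays simple at two or four dies but gets expensive for designs built from many small chiplets.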

But at minimum 2 smaller GPU DIEs on one PCIe card wired up via NVLink or Infinity Fabric is doable right now. And NVLink/Infinity Fabric will wire up 2 or more GPU cards in such a manner that all the GPUs can share their VRAM in a single address space, and the GPUs can actually snoop each other's cache levels via the cache coherency ability that's intrinsic to both NVLink's and Infinity Fabric's protocols.

I would not be surprised to see AMD creating Dual Vega 20 SKUs at 7nm at some point in time to compete better with Nvidia's Tesla V100 SKUs' Tensor Cores on AI workloads. AMD needs to be in the development stages of getting their own Tensor Core IP ASAP. But Vega 20 has an Infinity Fabric Bridge IP so 4 Vega 20 GPUs can be wired up and work as one larger GPU, ditto for Nvidia. And that's not some CF/SLI limited IP as was used in the past, because both AMD (Infinity Fabric) and Nvidia (NVLink) are currently way beyond that CF/SLI (now deprecated) IP.

November 12, 2018 | 05:11 PM - Posted by James

As far as I know, the instinct cards still connect to the cpu(s) with pci-e. It is unclear if you can place multiple interposers on a card without losing bandwidth to the cpus. Four cards would have 4 independent connections to the cpus. You would lose a lot if you limited that to one pci-e link for 2 or 4 gpus. The link speed isn’t that important for games, but it can be for HPC.

Also, latency is much less important for gpus. Gpus have to work on large chunks of data to be efficient. There is probably little reason to use a fully connected topology. Going with a smaller number of much wider links for the increased bandwidth seems to be the way they are going. The 4 gpus are just connected in a ring (square topology) for the new instinct cards. They are not fully connected. For games, the latency would be more important. Dual gpu setups often still stutter more than single gpu setups. The 4 gpus connected together with infinity fabric on the new instinct cards would still appear to the system as separate gpus, which are difficult to make use of in games. They may be working on a chiplet version of a gpu to look more like a monolithic device.

They could build compute units separately with a control chip handling memory access and everything else. I don’t know which way they are going to go. Nvidia seems to be deliberately de-emphasizing multi-gpu, even though it should probably work much better under DX12/Vulkan. We definitely need to go multi-chip somehow due to die size / yield limitations. It is suspicious that Nvidia seems to be trying to kill multi-gpu right when it is becoming necessary. Hopefully AMD is working on a chiplet style gpu that is still monolithic from a software perspective. HPC doesn’t require monolithic gpus normally. I have seen cases where 2 gpus with half the bandwidth and half the compute perform almost exactly the same as the single larger gpu. While gaming and HPC seem close at first glance, they can require quite different things.

November 12, 2018 | 05:10 PM - Posted by elites2012

I was not moved by the new RTX idea anyway. You could have made it just advanced texture and lighting. Stop trying to impress the shareholders with BS tech.

November 13, 2018 | 12:56 AM - Posted by Anonymously Anonymous (not verified)

Ray Tracing in current games requires DX12. Maybe in the future we'll see it implemented in Vulkan or something else, but for now all we have is DX12.
And there lies the problem: DX12 is shit in several games, with severe hitching and stuttering.
https://www.youtube.com/watch?v=G57pfGTMpak&t=0s

When RT is finally patched into Windows and BFV, RTX owners are going to be severely PO'd.
That video above shows a great example of DX12 on an RTX card in BFV, and it looks like shit.
