Programming with DirectX 12 (and Vulkan, and Mantle) is a much different process than most developers are used to. The biggest change is how work is submitted to the driver. Previously, engines would bind attributes to a graphics API and issue one of a handful of “draw” commands, which turned the current state of the API into a message. Drivers would queue and manipulate those messages behind the scenes to optimize how the orders were sent to the graphics device, but the game developer had no control over that.
Now, the new graphics APIs are built around command lists. Instead of bind, call, bind, call, and so forth, applications request queues to dump work into and assemble the messages themselves. The APIs even allow these messages to be bundled together and sent as a whole. This gives direct control over memory and the ability to distribute a lot of the command generation across multiple CPU cores. An application is only as fast as its slowest (relevant) thread, so the ability to spread work out increases actual performance.
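To make that concrete, here is a minimal sketch of the new submission model, assuming an already-initialized ID3D12Device (the variable names and the omitted error handling are ours, not from NVIDIA’s guide): commands are recorded into a command list, often on a worker thread, and only reach the GPU when the whole list is handed to a queue.

#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch only: one queue, one allocator, one command list.
ComPtr<ID3D12CommandQueue>        queue;
ComPtr<ID3D12CommandAllocator>    allocator;
ComPtr<ID3D12GraphicsCommandList> cmdList;

void CreateSubmissionObjects(ID3D12Device* device)
{
    // The application owns the queue it will feed work into.
    D3D12_COMMAND_QUEUE_DESC queueDesc = {};
    queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&queue));

    // Each recording thread would normally get its own allocator and list.
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                   IID_PPV_ARGS(&allocator));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                              allocator.Get(), nullptr,
                              IID_PPV_ARGS(&cmdList));
}

void RecordAndSubmit()
{
    // Record state changes and draws into the list; nothing executes yet.
    // cmdList->SetPipelineState(...);
    // cmdList->DrawInstanced(...);
    cmdList->Close();

    // The bundled-up work is sent to the queue as a whole.
    ID3D12CommandList* lists[] = { cmdList.Get() };
    queue->ExecuteCommandLists(1, lists);
}

The same pattern is what enables the multi-core point above: each CPU core can fill its own command list in parallel, and the submitting thread simply executes them all on the queue.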
NVIDIA has created a large list of things that developers should do, and others that they should not, to increase performance. Pretty much all of them apply equally, regardless of graphics vendor, but there are a few NVIDIA-specific comments, particularly the ones about NvAPI at the end and a few labeled notes in the “Root Signatures” category.
The tips are fairly diverse, covering everything from how to efficiently use things like command lists, to how to properly handle multiple GPUs, and even how to architect your engine itself. Even if you're not a developer, it might be interesting to look over for clues about what makes the API tick.
I thought this was funny. 970, anyone?
Resources
Don’ts
•Don’t rely on being able to allocate all GPU memory in one go
◦Depending on the underlying GPU architecture the memory may or may not be segmented
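In practice that bullet boils down to something like the following sketch (the device pointer, sizes, and fallback policy are our own illustrative assumptions, not part of NVIDIA’s guide): request the budget as several smaller heaps instead of one giant one, so a failure partway through can be handled gracefully. On a card with segmented memory, several smaller heaps presumably also give the driver room to place each one in whichever segment fits.

#include <windows.h>
#include <d3d12.h>
#include <vector>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch: allocate GPU memory in chunks instead of one giant heap.
std::vector<ComPtr<ID3D12Heap>> AllocateInChunks(ID3D12Device* device,
                                                 UINT64 totalBytes,
                                                 UINT64 chunkBytes)
{
    std::vector<ComPtr<ID3D12Heap>> heaps;

    D3D12_HEAP_DESC desc = {};
    desc.SizeInBytes                     = chunkBytes;
    desc.Properties.Type                 = D3D12_HEAP_TYPE_DEFAULT;
    desc.Properties.CPUPageProperty      = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
    desc.Properties.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
    desc.Flags                           = D3D12_HEAP_FLAG_ALLOW_ONLY_BUFFERS;

    for (UINT64 allocated = 0; allocated < totalBytes; allocated += chunkBytes)
    {
        ComPtr<ID3D12Heap> heap;
        if (FAILED(device->CreateHeap(&desc, IID_PPV_ARGS(&heap))))
            break;  // Stop (or fall back) instead of assuming one big block.
        heaps.push_back(heap);
    }
    return heaps;
}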
Truth in Advertising
https://www.truthinadvertising.org/nvidia-geforce-gtx-970/
They have a total of 6 class-action lawsuits pending. The newest one was filed in July.
This might be used against them since it’s the only known DX12-supporting card with segmented memory.
That’s pretty funny and sad at the same time.
It may be the 970. Or it may not be.
GPUs should have more segments of memory: for DMA, for compute shader access, etc. Those may or may not be interchangeable.
P.S. The 970 is a 3.0 GB GPU. If VRAM use exceeds 3.5 GB, then the last TWO 0.5 GB segments share the same bus! Both will be slow. So it’s either 3.5 fast GBs, or 3 GB + 1 GB of slow memory.
Slow and Slower?
Reminds me of a movie…
’Cause with DX12, the baton is now in the devs’ hands. What they can do with that power will politically impact the CPU/GPU race.
So basically, AMD/Nvidia lose some control to the devs. Devs can implement things their way more freely now, and this could be bad for AMD/Nvidia because a single game’s performance can swing wildly on their cards. Seeing this, as the topic says, Nvidia anticipates it by giving out “tips”.
Worse, performance can vary even within the same architecture. Just look at Mantle performance with the R9 285: in BF4 and Hardline the 285 runs slower under Mantle than under DX11, when Mantle was supposed to bring better results. To be honest I wonder, since the driver is now controlled directly by the devs, does that mean devs have to optimize for every card out there, even cards using the same architecture? Because that is what Nvidia and AMD did in the past; hence we always see “x% performance increase” in specific games for specific cards only in their driver release notes. Can PCPer shed some light on this?
Nvidia might be nervous and posted this so developers will not do a true XB1 port because all that work put into GCN architecture will benefit all GCN cards on the PC.
This is probably their way of trying to reach out to gain favorable programming.
It would have been helpful if Ryan had asked Microsoft or Lionhead how different, if at all, the Fable Legends benchmark is between the XB1 and the PC. Going by the benchmark, a 7850 can get you 60 fps on a console but only half the frames on the PC.
Something is wrong there.
Maybe the overhead and drivers are really that bad on PC? I know that I am continually surprised by the level of graphics fidelity the developers get out of the Xbox One given its specs.
I held off on buying one for over a year because the specs looked like crap and the launch titles didn’t look great either. But now I’m playing Forza Horizon 2 on it (granted, at 30fps) and it looks extraordinary. I used to be extremely picky about graphics, but the last year or so of games have changed my mind. I just don’t feel like my dual 970s and higher-resolution screen provide their price’s worth of a jump in graphical fidelity outside of resolution, FPS, and AA, like there was with previous PC-to-console comparisons. With that said, I still buy all first-person shooters on PC because of the framerate and my mouse and keyboard.
Somewhat back on topic: I’m no longer a fan of Nvidia. The way they conduct business looks more and more like that of a monopoly. I look forward to buying whatever AMD launches next so long as it has more vram than the FuryX.
Is the console version actually running at the same settings? I don’t think most Xbox titles run at 1080p with ultra settings, right?
I’m pretty sure they will be running at slightly different settings. Part of the point of my previous rambling comment was that the visual differences between the settings that the Xbox one runs at and the PC maxed out seem to be much smaller than between prior generation consoles and PCs of those eras. Sometimes I wonder if the “ultra” settings on PC are really adding much to the visual experience or just lowering FPS.
lol, Nvidia wants to regulate developers like they did with Oxide, to cover up flaws in their graphics cards.
Now if they have half a brain at AMD, and if that half is at least partially functional, they will take that list and then publish everything they think is wrong in it. Or at least create their own.
I think most programmers are familiar with AMD’s list if they have developed a game for the XB1.
This list is more of a PR stunt to do damage control of sorts for those developers who won’t take the time to un-program GCN features in their ports.
Just look where they published it. GAMEWORKS.
Well, now that gaming will make good use of all that AMD hardware asynchronous compute, in addition to other compute software, those who have been criticizing hardware asynchronous compute as not beneficial or necessary for gaming will have to eat a big crow pie. There is more to gaming calculations than just graphics calculations, and there always has been, so being able to accelerate more of the non-graphics gaming code on the GPU, as well as the graphics workloads, will allow for better gaming with less latency, especially for discrete GPUs where there is some distance between the CPU and GPU with a lot of latency-inducing protocol encoding/decoding steps along the way. This is why the VR gaming people are all in with hardware asynchronous compute on the GPU, and with more things done on the GPU at a lower latency.
This CPU-to-GPU communication over PCIe and distance is going to become a thing of the past, especially with AMD introducing APUs on an interposer, as an interposer with the separate CPU die wired up to the separate GPU die over thousands of wide parallel traces is going to be a game changer. No longer will there be a bottleneck between the CPU and a powerful GPU, as the interposer will allow for the same wide parallel connections etched into the silicon interposer’s substrate, with the CPU able to communicate with the GPU as if they were on the same monolithic die using the same style of on-die interconnect fabric. AMD’s Arctic Islands micro-architecture, with its new ISA, will have even more hardware asynchronous compute ability engineered into its ACE units to allow for even more GPU thread-processing efficiency, running graphics and non-graphics workloads with even less latency. These interposer-based gaming APUs are going to be used for PC and console gaming and will allow for even more powerful gaming in a much smaller form factor, even smaller than the mini systems currently available.
Nvidia had better get with more of that hardware asynchronous compute on its GPU systems, or it will find that the gaming software ecosystem leaves it behind. DX12, Vulkan, and even Apple’s Metal graphics APIs are going to make great use of the GPU’s hardware asynchronous compute ability, and not just AMD’s, as the mobile GPU makers are also all in with the HSA Foundation and more hardware asynchronous compute on the GPU for mobile devices as well.
Rule Number One:
Never ever ever Use Async compute!
Rule number ONE: Nvidia had better get its own IN-HARDWARE asynchronous compute, and stop all that Green Goblin gimping of GPU compute resources on its consumer SKUs. The game makers will be using more of that GPU hardware-based asynchronous compute, and Nvidia will lose market share. The other GPU makers that build GPUs for the mobile devices market will be getting even more on board with hardware asynchronous compute on their integrated mobile GPUs!
Asynchronous compute on the GPU is here to stay, and the VR gaming industry will make use of even more in-hardware asynchronous compute for its products.
Those never-ever-use-asynchronous-compute (in the GPU’s hardware) folks will never remain in business once the entire gaming/game-engine and graphics API software stack makes use of that asynchronous compute (in the GPU’s hardware) from now on! AMD’s Arctic Islands will have even more ACE/asynchronous compute (in the GPU’s hardware)!
Until Pascal, so we can get Maxwell customers to upgrade.
FYI, Unreal Engine supports async on Xbox One but it is disabled on the PC. Muhahaha.
Rule Number One:
Never ever ever Use Async compute!
…till Nvidia implements that in the GPU (read: the one after Pascal, a.k.a. the Maxwell refresh on FinFET).
And what’s AMD’s rule number one? “Never use DX11, because our driver coders are trained circus monkeys?”
Or perhaps the people who would have spent all of their time optimizing for specific DX11 titles spent their time coming up with a completely new graphics API instead: Mantle, which is essentially now Vulkan. I assume Mantle also heavily influenced DX12, since it is needed for the AMD APU in the Xbox. These new APIs are a much cleaner solution than having to do so many game/engine-specific optimizations. Most fanboys seem to forget that AMD essentially pushed the industry towards a much better solution, while Nvidia was content making money on the status quo.
Also, most game developers will not have to deal that much with raw DX12, Vulkan, or Metal. That will be abstracted by the game engine. Engine developers have to completely rewrite their game engines though.
The asynchronous compute controversy seems like it may be more important going forward, especially if we move more towards multi-GPU solutions. With silicon interposers, it may make a lot of sense to use multiple smaller GPUs rather than one giant die. This is especially the case if yields at 14 or 16 nm are not that good: yields of four 150-square-mm dies would be much better than one 600-square-mm die. Code that leans heavily on asynchronous compute seems like it would make very good use of multiple GPUs, since I would assume the compute jobs can be scheduled on any available asynchronous compute engine, whether it is in the IGP or in one of several dedicated cards. It may also alleviate some of the lack of asynchronous compute scheduling resources on Nvidia cards if those jobs can be submitted to an integrated GPU.
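For what it’s worth, the yield intuition there checks out under a simple Poisson defect model (the defect density below is an assumed, illustrative number, not vendor data), where per-die yield falls exponentially with die area:

Y = e^{-A \cdot D_0}

With an assumed D_0 of 0.2 defects per square centimeter, a 600 mm² (6 cm²) die yields e^{-1.2}, roughly 30%, while a 150 mm² die yields e^{-0.3}, roughly 74%, so four small dies waste far less wafer area than one big one.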
This is not enough.
Intel publishes its whole ISA, plus docs and optimization guides, for its CPUs/GPUs.
AMD does too for its CPUs, while its GPUs have their ISA published.
Nvidia’s tips are thus far behind the industry standard.
It was always important, but with DX12 the green team can no longer patch things up with their driver (and even in the past, not all games got driver releases, did they?).
So big thanks for those guidelines. We are waiting for more!
so let me get this straight
They actually know by now that they can’t solve the whole async compute problem with the driver alone without bottlenecking the CPU, and thus they release that utterly stupid paper to try and regulate some devs in order to minimize the impact?
It seems like devs will have no chance now to just code the game to vendor specs.
It’s funny that MS pushes hard for DX12 to be a standard while Nvidia is actually pushing devs to be more in their favor…
I bet that if they release another Batman-like game with GameWorks in it, it’s going to be their funeral…
Batman doesn’t have anything to do with this. That’s WB to blame for being cheap.
Not really, but OK… when a game doesn’t work you can’t just blame one side of the dev team, you know…
You win together, you lose together.
He’s forgetting the big speech Tom Petersen came on here to give about how they send Nvidia programmers to help from day 1 and move along the progress of a title when it’s Nvidia sponsored.
Guess they fell asleep on the job.
“Don’ts
Don’t toggle between compute and graphics on the same command queue more than absolutely necessary
This is still a heavyweight switch to make”
Tsk tsk tsk
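For context, the quoted guideline is about queue types: instead of flip-flopping one direct queue between graphics and compute, DX12 lets you create a queue dedicated to compute work. A minimal, illustrative sketch (the device pointer and names are our assumptions, not from the guide):

#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandQueue>        computeQueue;
ComPtr<ID3D12CommandAllocator>    computeAlloc;
ComPtr<ID3D12GraphicsCommandList> computeList;

void CreateComputePath(ID3D12Device* device)
{
    // A compute-only queue avoids heavyweight graphics<->compute switches
    // on the direct queue.
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE,
                                   IID_PPV_ARGS(&computeAlloc));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE,
                              computeAlloc.Get(), nullptr,
                              IID_PPV_ARGS(&computeList));
    // Record Dispatch() calls on computeList and execute them on
    // computeQueue, synchronizing with the graphics queue via fences.
}

Whether a separate compute queue actually helps on a given GPU is exactly the caveat NVIDIA raises further down its list.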
[url]https://developer.nvidia.com/dx12-dos-and-donts#dx12[/url]
[quote]Do’s
[b]•Use hardware conservative raster for full-speed conservative rasterization[/b]
◦No need to use a GS to implement a ‘slow’ software base conservative rasterization
◦See [url]https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization[/url]
•Make use of NvAPI (when available) to access other Maxwell features
◦Advanced Rasterization features
◾Bounding box rasterization mode for quad based geometry
◾New MSAA features like post depth coverage mask and overriding the coverage mask for routing of data to sub-samples
◾Programmable MSAA sample locations
◦Fast Geometry Shader features
◾Render to cube maps in one geometry pass without geometry amplifications
◾Render to multiple viewports without geometry amplifications
◾Use the fast pass-through geometry shader for techniques that need per-triangle data in the pixel shader
◦New interlocked operations
◦Enhanced blending ops
◦New texture filtering ops
Don’ts
•Don’t use Raster Order View (ROV) techniques pervasively
◦Guaranteeing order doesn’t come for free
◦Always compare with alternative approaches like advanced blending ops and atomics
[/quote]
AND
[quote]Check carefully if the use of a separate compute command queues really is advantageous
Even for compute tasks that can in theory run in parallel with graphics tasks, the actual scheduling details of the parallel work on the GPU may not generate the results you hope for
Be conscious of which asynchronous compute and graphics workloads can be scheduled together [/quote]
So yes, that’s “DO use what we can do well, and DON’T use the other stuff our competition can use well.”
Also, don’t use ROVs? Well, I’m sure Intel will be real happy about that.
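As an aside, both of the quoted features can be checked at runtime before anyone commits to them either way; a hedged sketch (assuming an initialized ID3D12Device, with our own function name) of querying conservative rasterization and ROV support:

#include <windows.h>
#include <d3d12.h>

// Sketch: query optional-feature support before choosing a code path.
bool QueryOptionalFeatures(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;

    // Hardware conservative rasterization: any tier above NOT_SUPPORTED.
    const bool hasConservativeRaster =
        options.ConservativeRasterizationTier !=
        D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED;

    // Rasterizer Ordered Views.
    const bool hasROVs = options.ROVsSupported != 0;

    return hasConservativeRaster && hasROVs;
}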
The NvAPI part even says “when available” and to use that to access Maxwell features.
The ROV part doesn’t say don’t use it. It says don’t use it pervasively with comments as to why…
For the compute part it’s common sense that applies. If you’re going to write a DX12 title that takes the most advantage of all platforms, you need to know when you can use certain features…
All GPU architectures have their strong and weak sides.
Optimization should thus account for some of that.
You will find some good optimization guides from Intel too, where they do not shy away from discouraging the use of features that are not performant on Intel hardware!