NVIDIA's newest Turing-based HPC card, the Quadro RTX 4000, has arrived with 2304 CUDA cores, 288 Tensor Cores, 36 RT Cores, and 8GB of GDDR6 on-board GPU memory. NVIDIA hasn't released any benchmarks yet, but it does state that the new memory will offer a 40% increase in bandwidth compared to the previous P4000 and that the card can produce up to 57 TFLOPs of performance; one assumes this refers to INT8 performance.
They are showing the card off at Autodesk. If you visit, they have set up a demo that uses the Enscape3D plugin to let you put on a VR headset, step inside a full-scale Autodesk Revit model, and make changes in real time, which would be an interesting way to work. The card will sell for ~$900, which puts it in reach of quite a few possible users and might encourage AMD to sell its Instinct MI60 and MI50 cards for a price in that ballpark.
Check it out on NVIDIA's page here.
Looks like AMD is starting to offer higher clocked Epyc/Naples SKUs as well (1), so it is moving more towards the workstation end with the newer, higher clocked Epyc/Naples variants. I’ll bet that will pair nicely with this RTX Quadro SKU and save some money on the CPU side, because the RTX Quadros are not low cost, but those ray tracing features will likely be put to use in professional graphics workloads.
“AMD has announced its new high-frequency EPYC 7371 processor designed for applications that benefit from high clocks. The CPU has 16 cores and is aimed at tasks like electronic design automation, high-frequency trading, and other. The EPYC 7371 can work in dual-socket configuration, thus offering up to 32 cores and 64 threads per box.
The AMD EPYC 7371 processor features 16 cores with SMT (spread across four eight-core Zen dies), 64 MB of L3 cache, an eight-channel DDR4 memory subsystem, and 128 PCIe lanes. The CPU features a 3.1 GHz default frequency, yet can run all cores at 3.6 GHz, or just eight cores at 3.8 GHz.” (1)
(1)
“AMD Launches High-Frequency EPYC 7371 Processor
by Anton Shilov on November 13, 2018 10:12 AM EST”
https://www.anandtech.com/show/13594/amd-launches-highfrequency-epyc-7371-processor
Why are they touting it as a CPU for high frequency trading? HFT people will go nowhere near AMD CPUs, as they have the latency of a dead cow (relatively speaking, in this field).
Anton appears to be channeling AMD’s untrained marketing on that usage model, but who knows about the Zen-2 based Epyc/Rome CPUs. That could change if the I/O die is sporting loads of L4 cache and can keep the trading code/data mostly sitting in the L4 and L3 cache levels.
But high frequency trading is surely a real-time task, so the OS had better be a custom spin of some low latency trading build based on the Linux kernel.
I’m thinking more that this new higher clocked Epyc spin will be better for live 8K encoding duties, along with any maker’s brand of GPU or other ASIC accelerator product if needed.
There is one line of Epyc SKUs that is geared more towards real-time workloads, and that’s the embedded Epyc SKUs, but who knows what AMD could do with Zen-2 and the trading businesses’ HFT needs.
Power9 offerings from some HFT providers are out there, so OpenPower and IBM surely have their feet in the HFT market too, in addition to Intel and possibly AMD.
Epyc/Naples with proper NUMA/UMA workload balancing can help reduce latency, and higher clocks get more done per second or millisecond. Intel’s new mesh topology also has latency disadvantages relative to Intel’s ring bus based lower core count SKUs.
I’d like to see whether, for Epyc/Rome, AMD has gone with a ring bus for its 8 core chiplets, with a 9th position on that ring bus being an Infinity Fabric (CAKE) interface and an Infinity Fabric hop to the I/O die. But the actual topology of the Zen-2 die/chiplet on Epyc/Rome is currently a closely guarded secret. Dual ring buses running in bidirectional mode offer fairly low latency for 8 core designs, but latency may be lower still with a star/ring topology in which each CPU core is directly linked to the 7 other cores and maybe to 1 more node that’s an Infinity Fabric portal.
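To put rough numbers on the topology comparison above, here is a minimal Python sketch that counts hops on a bidirectional ring with 9 stops (8 cores plus one fabric stop, as speculated above). The function names are made up for illustration, and hop count is only a crude stand-in for real latency; nothing here reflects AMD's actual, undisclosed design.

```python
def ring_hops(n, a, b):
    """Shortest hop count between stops a and b on a bidirectional ring of n stops."""
    d = abs(a - b) % n
    # Traffic can travel either way around the ring, so take the shorter direction.
    return min(d, n - d)


def ring_stats(n):
    """Return (average, worst-case) hop count from one stop to every other stop."""
    hops = [ring_hops(n, 0, b) for b in range(1, n)]
    return sum(hops) / len(hops), max(hops)


# Assumption: 8 cores plus 1 Infinity Fabric stop, i.e. 9 stops on the ring.
avg, worst = ring_stats(9)
print(f"bidirectional ring, 9 stops: average {avg} hops, worst case {worst} hops")
# A fully connected layout would be 1 hop between any two stops by construction.
```

Under these assumptions the ring averages 2.5 hops with a worst case of 4, against a single hop for direct links, though the directly linked layout needs far more wiring per core.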
AMD really needs a line of graphics workstation oriented Epyc/Naples products that can run at higher clocks and maybe have more cache available. But Epyc/Rome and Zen-2 will probably see AMD actually begin to brand some more graphics/video/sound encoding/decoding oriented SKUs for the professional graphics workstation market, and hopefully that will also include direct Infinity Fabric based Epyc/Rome to Radeon Pro GPU interfacing instead of PCIe-only options. IBM’s Power9 speaks NVLink as well, for direct Power9 to Nvidia GPU accelerator interfacing.
OpenPower Power9s can also be considered for HFT, to compete with both AMD’s and Intel’s x86 based SKUs.
P.S. Anything not branded Epyc, Xeon, or Power is not really workstation/server grade, no matter what anyone says, as those are the parts with the actual vetted/certified ECC and ECC memory support. Epyc/Rome also has improved ECC in all the cache levels, in addition to ECC in the transmission circuitry.
This Quadro is not for compute/AI as much as it is for professional graphics and for competing with AMD’s Radeon Pro WX branded professional graphics/graphics workstation products.
The MI50 and MI60 are meant to compete with the Nvidia Tesla SKUs for compute/AI inferencing workloads, not with this Quadro SKU.
This Quadro has Tensor Cores, yes, but they are for running trained AIs for graphics image processing and not for AI training as a primary use. Nvidia does all of its AI training on hundreds of Tesla V100s, and once the AI is properly trained it can be loaded onto this Quadro SKU’s Tensor Cores to do its denoising or other graphics AI effects and filtering tasks.
Usually the low end Radeon Pro WX (formerly FirePro) and lower end Quadro SKUs have less DP compute than the compute/AI variants like the Vega 20 based MI50/MI60 and Tesla V100 branded SKUs. So this Quadro is mainly for pro graphics, with Tensor Cores that are mostly for running already trained AIs for Adobe and other pro graphics usage (AI based filter effects, etc.).
Nvidia’s Teslas cost thousands and so will the Radeon Instinct MI50 and MI60, so that’s not the same market as this RTX Quadro 4000, which looks to be TU104 based.
It’s a TU106 based part, according to some other websites, so that’s even less graphics horsepower at that ~$900. So really, it’s going to take ages for any AI training workloads.