Podcast #493 - New XPS 13, Noctua NH-L9a, News from NVIDIA GTC and more!

Subject: General Tech | March 29, 2018 - 02:37 PM |
Tagged: podcast, nvidia, GTC 2018, Volta, quadro gv100, dgx-2, noctua, NH-L9a-AM4

PC Perspective Podcast #493 - 03/29/18

Join us this week for our review of the new XPS 13, the Noctua NH-L9a, news from NVIDIA GTC and more!

You can subscribe to us through iTunes, and you can still access the podcast directly through the RSS page.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

Hosts: Allyn Malventano, Jeremy Hellstrom, Josh Walrath

Peanut Gallery: Ken Addison

Program length: 0:59:35

Podcast topics of discussion:

  1. Week in Review:
  2. News items of interest:
  3. Picks of the Week:
    1. Allyn: retro game music remixed - ocremix.org (torrents)

NVIDIA Announces DGX-2 with 16 GV100s & 8 100Gb NICs

Subject: Systems | March 27, 2018 - 08:04 PM |
Tagged: Volta, nvidia, dgx-2, DGX

So… this is probably not for your home.

NVIDIA has just announced their latest pre-built system for enterprise customers: the DGX-2. In it, sixteen Volta-based Tesla V100 graphics devices are connected using NVSwitch. This allows groups of graphics cards to communicate with every other group at 300GB/s, which, to give a sense of scale, is about as much bandwidth as the GTX 1080 has available to communicate with its own VRAM. NVSwitch treats all 512GB as a unified memory space, too, which means that the developer doesn’t need to keep redundant copies of data across multiple boards just so the target GPU can see it.

nvidia-2018-dgx2-explode.png

Note: 512GB is 16 x 32GB. This is not a typo. 32GB Tesla V100s are now available.

For a little recap, Tesla V100 cards run a Volta-based GV100 GPU, which has 5120 CUDA cores and runs them at ~15 TeraFLOPs of 32-bit performance. These cores also scale exactly to FP64 and FP16, as has been the case since Pascal’s high-end offering, leading to ~7.5 TeraFLOPs of 64-bit or ~30 TeraFLOPs of 16-bit computational throughput. Multiply that by sixteen and you get 480 TeraFLOPs of FP16, 240 TeraFLOPs of FP32, or 120 TeraFLOPs of FP64 performance for the whole system. If you count the tensor units, then we’re just under 2 PetaFLOPs of tensor instructions. This is powered by a pair of Xeon Platinum CPUs (Skylake) and backed by 1.5TB of system RAM, which is only 3x the amount of memory that the GPUs have between them, if you stop and think about it.
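If you want to check the multiplication yourself, here is a quick sketch in Python. It only uses the approximate per-GPU figures quoted above; the variable names are mine, and treating 1.5TB as 1536GB is an assumption made so the ratio comes out in the same units as the GPU memory.

```python
# Rough sanity check of the DGX-2 totals, using the approximate
# per-GPU figures quoted in the article (not measured values).
GPU_COUNT = 16
FP32_TFLOPS_PER_GPU = 15.0  # ~15 TeraFLOPs of FP32 per Tesla V100

# Per the article, FP64 runs at half the FP32 rate and FP16 at double.
per_gpu_tflops = {
    "FP16": FP32_TFLOPS_PER_GPU * 2,
    "FP32": FP32_TFLOPS_PER_GPU,
    "FP64": FP32_TFLOPS_PER_GPU / 2,
}

# Scale each precision up to the whole sixteen-GPU system.
system_tflops = {
    precision: rate * GPU_COUNT for precision, rate in per_gpu_tflops.items()
}

# Memory: sixteen 32GB cards versus 1.5TB of system RAM.
GPU_MEMORY_GB = 32
SYSTEM_RAM_GB = 1.5 * 1024  # assumption: 1TB counted as 1024GB
total_gpu_memory_gb = GPU_COUNT * GPU_MEMORY_GB
ram_to_gpu_memory_ratio = SYSTEM_RAM_GB / total_gpu_memory_gb

for precision, total in system_tflops.items():
    print(f"{precision}: {total:.0f} TeraFLOPs")
print(f"GPU memory: {total_gpu_memory_gb}GB")
print(f"System RAM is {ram_to_gpu_memory_ratio:.0f}x the GPU memory")
```

The tensor figure is left out on purpose, since the text above does not give a per-GPU tensor rate to multiply.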

nvidia-2018-dgx-list.png

The device communicates with the outside world through eight EDR InfiniBand NICs. NVIDIA claims that this yields 1600 gigabits per second of bi-directional bandwidth. Given how much data this device is crunching, it makes sense to keep data flowing in and out as fast as possible, especially for real-time applications. While the Xeons are fast and have many cores, I’m curious to see how much overhead the networking adds to the system when under full load, minus any actual processing.
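The 1600 gigabit figure lines up with the headline's eight 100Gb NICs if you count both directions. A minimal sketch of that arithmetic, assuming each NIC moves 100 gigabits per second each way and that "bi-directional" simply means the two directions added together (my reading of the claim, not something NVIDIA spells out here):

```python
# Back-of-the-envelope check of the quoted network bandwidth.
NIC_COUNT = 8
GBITS_PER_NIC_PER_DIRECTION = 100  # from the "8 100Gb NICs" headline

# Total in one direction across all NICs.
one_way_gbits = NIC_COUNT * GBITS_PER_NIC_PER_DIRECTION

# Assumption: the bi-directional figure is send plus receive.
bidirectional_gbits = one_way_gbits * 2

# Convert gigabits to gigabytes (8 bits per byte) for comparison
# with the GB/s figures used elsewhere in the article.
bidirectional_gbytes = bidirectional_gbits / 8

print(f"One way: {one_way_gbits} Gb/s")
print(f"Both ways: {bidirectional_gbits} Gb/s ({bidirectional_gbytes:.0f} GB/s)")
```

Under those assumptions the whole box talks to the network at roughly 200GB/s combined, which is still less than the 300GB/s that NVSwitch provides between groups of GPUs inside it.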

NVIDIA’s DGX-2 is expected to ship in Q3.

Source: NVIDIA