Subject: General Tech | October 4, 2017 - 08:59 PM | Scott Michaud
Tagged: 3D rendering, otoy, Unity, deep learning
When raytracing images, sample count has a massive impact on both quality and rendering performance. This corresponds to the number of rays within a pixel that were cast, which, when averaged out over many, many rays, eventually matches what the pixel should be. Think of it this way: if your first ray bounces directly into a bright light, and the second ray bounces into the vacuum of space, should the color be white? Black? Half-grey? Who knows! However, if you send 1000 rays with some randomized pattern, then the average is probably a lot closer to what it should be (which depends on how big the light is, what it bounces off of, etc.).
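To make the averaging argument concrete, here is a minimal sketch of a single pixel where each random ray either hits a light (value 1.0) or misses it (value 0.0). The function name, the `light_fraction` parameter, and the 30% figure are assumptions made up for illustration; a real renderer traces actual scene geometry and uses far smarter sampling patterns.

```python
import random


def estimate_pixel(sample_count, light_fraction=0.3, seed=0):
    """Average `sample_count` random rays for one pixel.

    Each ray returns 1.0 if it hits the light and 0.0 if it misses.
    `light_fraction` is the (hypothetical) chance that a ray hits the light,
    so it is also the value the pixel "should" converge to.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    total = 0.0
    for _ in range(sample_count):
        total += 1.0 if rng.random() < light_fraction else 0.0
    return total / sample_count


if __name__ == "__main__":
    # Few samples give a noisy answer; many samples settle near 0.3.
    for n in (2, 16, 1000, 100000):
        print(n, round(estimate_pixel(n), 4))
```

With only a couple of rays the estimate can only land on a handful of values (0, 0.5, or 1 for two rays), which is exactly the "white, black, or half-grey" problem; the error shrinks roughly with the square root of the sample count.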
At Unite Austin, which started today, OTOY showed off an “AI temporal denoiser” algorithm for raytraced footage. Typically, an artist chooses a sample rate that looks good enough to the end viewer. In this case, the artist only needs to choose enough samples that an AI can create a good-enough video for the end user. While I’m curious how much performance is required in the inferencing stage, I do know how much a drop in sample rate can affect render times, and it’s a lot.
Check out OTOY’s video, embedded above.
Subject: General Tech | June 28, 2017 - 11:17 PM | Scott Michaud
Tagged: Unity, machine learning, deep learning
Unity, who makes the popular 3D game engine of the same name, has announced a research fellowship for integrating machine learning into game development. Two students, who must have been enrolled in a Masters or a PhD program on June 26th, will be selected and provided with $30,000 for a 6-month fellowship. The deadline is midnight (PDT) on September 9th.
We’re beginning to see a lot of machine-learning applications being discussed for gaming. There are some cases, like global illumination and fluid simulations, where it could be faster for a deep-learning algorithm to hallucinate a convincing result than for a physical solver to produce a correct one. In this case, it makes sense to post-process each frame, so, naturally, game engine developers are paying attention.
If eligible, you can apply on their website.
Subject: General Tech | May 29, 2017 - 08:46 PM | Scott Michaud
Tagged: machine learning, fluid, deep neural network, deep learning
SIGGRAPH 2017 is still a few months away, but we’re already starting to see demos get published as groups try to get them accepted to various parts of the trade show. In this case, Physics Forests published a two-minute video where they perform fluid simulations without actually simulating fluid dynamics. Instead, they used a deep-learning AI to hallucinate a convincing fluid dynamics result given their inputs.
We’re seeing a lot of research into deep-learning AIs for complex graphics effects lately. The goal of most of these simulations, whether they are for movies or video games, is to create an effect that convinces the viewer that what they see is realistic. The goal is not to create an actually realistic effect. The question then becomes, “Is it easier to actually solve the problem? Or is it easier to have an AI, trained on a pile of data sorted into successes and failures, come up with an answer that looks correct to the viewer?”
In a lot of cases, like global illumination and even possibly anti-aliasing, it might be faster to have an AI trick you. Fluid dynamics is just one example.
Subject: General Tech | November 4, 2016 - 02:55 PM | Scott Michaud
Tagged: blizzard, google, ai, deep learning, Starcraft II
Blizzard and DeepMind, which was acquired by Google in 2014 and is now a subsidiary of Alphabet Inc., have just announced opening up StarCraft II for AI research. DeepMind was the company that made AlphaGo, which beat Lee Sedol, a grandmaster of Go, in a best-of-five showmatch with a score of four to one. They hinted at possibly having a BlizzCon champion, some year, do a showmatch as well, which would be entertaining.
StarCraft II is different from Go in three important ways. First, a player only knows what they have scouted, a limit that the AI will apparently be constrained to honor as well. Second, there are three possible match-ups for any choice of race, except random, which has nine. Third, it's real-time, which can be good for AI, because they're not constrained by human input limitations, but also difficult from a performance standpoint.
From Blizzard's perspective, better AI can be useful, because humans need to be challenged to learn. Novices won't be embarrassed to lose to a computer over and over, so they can have a human-like opponent to experiment with. Likewise, grandmasters will want to have someone better than them to keep advancing, especially if it allows them to keep new strategies hidden. From DeepMind's perspective, this is another step in AI research, which could be applied to science, medicine, and so forth in the coming years and decades.
Unfortunately, this is an early announcement. We don't know any more details, although they will have a BlizzCon panel on Saturday at 1pm EDT (10am PDT).
Subject: Processors | October 1, 2016 - 06:11 PM | Tim Verry
Tagged: xavier, Volta, tegra, SoC, nvidia, machine learning, gpu, drive px 2, deep neural network, deep learning
Earlier this week at its first GTC Europe event in Amsterdam, NVIDIA CEO Jen-Hsun Huang teased a new SoC code-named Xavier that will be used in self-driving cars and feature the company's newest custom ARM CPU cores and Volta GPU. The new chip will begin sampling at the end of 2017 with product releases using the future Tegra (if they keep that name) processor as soon as 2018.
NVIDIA's Xavier is promised to be the successor to the company's Drive PX 2 system which uses two Tegra X2 SoCs and two discrete Pascal MXM GPUs on a single water cooled platform. These claims are even more impressive when considering that NVIDIA is not only promising to replace the four processors but it will reportedly do that at 20W – less than a tenth of the TDP!
The company has not revealed all the nitty-gritty details, but they did tease out a few bits of information. The new processor will feature 7 billion transistors and will be based on a refined 16nm FinFET process while consuming a mere 20W. It can process two 8K HDR video streams and can hit 20 TOPS (NVIDIA's own rating for deep learning INT8 operations).
Specifically, NVIDIA claims that the Xavier SoC will use eight custom ARMv8 (64-bit) CPU cores (it is unclear whether these cores will be a refined Denver architecture or something else) and a GPU based on its upcoming Volta architecture with 512 CUDA cores. Also, in an interesting twist, NVIDIA is including a "Computer Vision Accelerator" on the SoC as well, though the company did not go into many details. This bit of silicon may explain how the ~300 mm² die with 7 billion transistors is able to match the 7.2 billion transistor Pascal-based Tesla P4 (2560 CUDA cores) graphics card at deep learning (tera-operations per second) tasks. That is, of course, in addition to the incremental improvements from moving to Volta and a new ARMv8 CPU architecture on a refined 16nm FF+ process.
|Specification||Drive PX||Drive PX 2||NVIDIA Xavier||Tesla P4|
|CPU||2 x Tegra X1 (8 x A57 total)||2 x Tegra X2 (8 x A57 + 4 x Denver total)||1 x Xavier SoC (8 x Custom ARM + 1 x CVA)||N/A|
|GPU||2 x Tegra X1 (Maxwell) (512 CUDA cores total)||2 x Tegra X2 GPUs + 2 x Pascal GPUs||1 x Xavier SoC GPU (Volta) (512 CUDA Cores)||2560 CUDA Cores (Pascal)|
|TFLOPS||2.3 TFLOPS||8 TFLOPS||?||5.5 TFLOPS|
|DL TOPS||?||24 TOPS||20 TOPS||22 TOPS|
|TDP||~30W (2 x 15W)||250W||20W||up to 75W|
|Process Tech||20nm||16nm FinFET||16nm FinFET+||16nm FinFET|
|Transistors||?||?||7 billion||7.2 billion|
For comparison, the currently available Tesla P4 based on its Pascal architecture has a TDP of up to 75W and is rated at 22 TOPS. This would suggest that Volta is a much more efficient architecture (at least for deep learning and half precision)! I am not sure how NVIDIA is able to match its GP104 with only 512 Volta CUDA cores, though their definition of a "core" could have changed and/or the CVA processor may be responsible for closing that gap. Unfortunately, NVIDIA did not disclose what it rates the Xavier at in TFLOPS, so it is difficult to compare, and it may not match GP104 at higher precision workloads. It could be wholly optimized for INT8 operations rather than floating point performance. Beyond that I will let Scott dive into those particulars once we have more information!
Xavier is more of a teaser than anything and the chip could very well change dramatically and/or not hit the claimed performance targets. Still, it sounds promising and it is always nice to speculate over road maps. It is an intriguing chip and I am ready for more details, especially on the Volta GPU and just what exactly that Computer Vision Accelerator is (and will it be easy to program for?). I am a big fan of the "self-driving car" and I hope that it succeeds. It certainly looks to continue as Tesla, VW, BMW, and other automakers continue to push the envelope of what is possible and plan future cars that will include smart driving assists and even cars that can drive themselves. The more local computing power we can throw at automobiles the better, and while massive datacenters can be used to train the neural networks, local hardware to run them and make decisions is necessary (you don't want internet latency contributing to the decision of whether to brake or not!).
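To put the efficiency comparison in rough numbers, here is a small sketch that divides the deep learning TOPS ratings by the TDP figures quoted in the table above. The function and dictionary names are made up for illustration, the Tesla P4 is taken at its maximum 75W TDP, and these are vendor peak ratings rather than measurements, so treat the result as a ballpark only.

```python
# (DL TOPS, TDP in watts) as listed in the comparison table.
CHIPS = {
    "Drive PX 2": (24.0, 250.0),
    "Xavier": (20.0, 20.0),
    "Tesla P4": (22.0, 75.0),  # assumes the "up to 75W" configuration
}


def tops_per_watt(tops, watts):
    """Rated deep learning operations per second (in TOPS) per watt of TDP."""
    return tops / watts


EFFICIENCY = {name: tops_per_watt(*spec) for name, spec in CHIPS.items()}

if __name__ == "__main__":
    for name, value in EFFICIENCY.items():
        print(f"{name}: {value:.3f} TOPS/W")
    ratio = EFFICIENCY["Xavier"] / EFFICIENCY["Tesla P4"]
    print(f"Xavier vs. Tesla P4: {ratio:.1f}x")
```

On paper that works out to about 1 TOPS per watt for Xavier against roughly 0.29 for the Tesla P4 and 0.1 for the Drive PX 2, assuming the claimed targets hold.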
I hope that NVIDIA's self-proclaimed "AI Supercomputer" turns out to be at least close to the performance they claim! Stay tuned for more information as it gets closer to launch (hopefully more details will emerge at GTC 2017 in the US).
What are your thoughts on Xavier and the whole self-driving car future?
- NVIDIA Teases Xavier, a High-Performance ARM SoC for Drive PX & AI @ AnandTech
- Tegra Related News @ PC Perspective
- Tesla P4 Specifications @ NVIDIA
- CES 2016: NVIDIA Launches DRIVE PX 2 With Dual Pascal GPUs Driving A Deep Neural Network @ PC Perspective
Subject: Graphics Cards | April 5, 2016 - 02:13 AM | Tim Verry
Tagged: HPC, hbm, gpgpu, firepro s9300x2, firepro, dual fiji, deep learning, big data, amd
Earlier this month AMD launched a dual Fiji powerhouse for VR gamers it is calling the Radeon Pro Duo. Now, AMD is bringing its latest GCN architecture and HBM memory to servers with the dual GPU FirePro S9300 x2.
The new server-bound professional graphics card packs an impressive amount of computing hardware into a dual-slot card with passive cooling. The FirePro S9300 x2 combines two full Fiji GPUs clocked at 850 MHz for a total of 8,192 cores, 512 TUs, and 128 ROPs. Each GPU is paired with 4GB of non-ECC HBM memory on package with 512GB/s of memory bandwidth which AMD combines to advertise this as the first professional graphics card with 1TB/s of memory bandwidth.
Due to lower clock speeds, the S9300 x2 has lower peak single precision compute performance than the consumer Radeon Pro Duo: 13.9 TFLOPS versus 16 TFLOPS on the desktop card. Businesses will be able to cram more cards into their rack mounted servers, though, since they do not need to worry about mounting locations for the sealed loop water cooling of the Radeon card.
|Specification||FirePro S9300 x2||Radeon Pro Duo||R9 Fury X||FirePro S9170|
|GPU||Dual Fiji||Dual Fiji||Fiji||Hawaii|
|GPU Cores||8192 (2 x 4096)||8192 (2 x 4096)||4096||2816|
|Rated Clock||850 MHz||1050 MHz||1050 MHz||930 MHz|
|Texture Units||2 x 256||2 x 256||256||176|
|ROP Units||2 x 64||2 x 64||64||64|
|Memory||8GB (2 x 4GB)||8GB (2 x 4GB)||4GB||32GB ECC|
|Memory Clock||500 MHz||500 MHz||500 MHz||5000 MHz|
|Memory Interface||4096-bit (HBM) per GPU||4096-bit (HBM) per GPU||4096-bit (HBM)||512-bit|
|Memory Bandwidth||1TB/s (2 x 512GB/s)||1TB/s (2 x 512GB/s)||512 GB/s||320 GB/s|
|TDP||300 watts||?||275 watts||275 watts|
|Peak Compute||13.9 TFLOPS||16 TFLOPS||8.60 TFLOPS||5.24 TFLOPS|
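The peak compute and bandwidth rows in the table can be reproduced with simple arithmetic. The sketch below assumes the usual convention of counting two floating point operations per core per clock (one fused multiply-add) and treats HBM at 500 MHz as 1 Gbps per pin (double data rate) and the S9170's 5000 MHz as an effective 5 Gbps per pin; the function names are hypothetical, and these are theoretical peaks rather than benchmark results.

```python
def peak_fp32_tflops(shader_cores, clock_mhz, flops_per_core_per_clock=2):
    """Theoretical single precision peak in TFLOPS.

    Assumes each core retires one fused multiply-add (2 FLOPs) per clock.
    """
    return shader_cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12


def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps_per_pin):
    """Theoretical memory bandwidth in GB/s (8 bits per byte)."""
    return bus_width_bits * data_rate_gbps_per_pin / 8


if __name__ == "__main__":
    # FirePro S9300 x2: 8192 cores at 850 MHz
    print(round(peak_fp32_tflops(8192, 850), 1), "TFLOPS")
    # R9 Fury X: 4096 cores at 1050 MHz
    print(round(peak_fp32_tflops(4096, 1050), 2), "TFLOPS")
    # One Fiji GPU: 4096-bit HBM at 1 Gbps per pin
    print(memory_bandwidth_gbs(4096, 1.0), "GB/s per GPU")
    # FirePro S9170: 512-bit GDDR5 at 5 Gbps per pin
    print(memory_bandwidth_gbs(512, 5.0), "GB/s")
```

The same formula gives about 5.24 TFLOPS for the FirePro S9170 (2816 cores at 930 MHz), matching the table.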
AMD is aiming this card at datacenter and HPC users working on "big data" tasks that do not require the accuracy of double precision floating point calculations. Deep learning tasks, seismic processing, and data analytics are all examples AMD says the dual GPU card will excel at. These are all tasks that can be greatly accelerated by the massively parallel nature of a GPU but do not need to be as precise as stricter mathematics, modeling, and simulation work that depend on FP64 performance. In that respect, the FirePro S9300 x2 has only 870 GFLOPS of double precision compute performance.
Further, this card supports a GPGPU optimized Linux driver stack called GPUOpen, and developers can program for it using either OpenCL (it supports OpenCL 1.2) or C++. AMD PowerTune and the return of FP16 support are also features. AMD claims that its new dual GPU card is twice as fast as the NVIDIA Tesla M40 (1.6x the K80) and 12 times as fast as the latest Intel Xeon E5 in peak single precision floating point performance.
The double slot card is powered by two PCI-E power connectors and is rated at 300 watts. This is a bit more palatable than the triple 8-pin needed for the Radeon Pro Duo!
The FirePro S9300 x2 comes with a 3 year warranty and will be available in the second half of this year for $6000 USD. You are definitely paying a premium for the professional certifications and support. Here's hoping developers come up with some cool uses for the dual 8.9 billion transistor GPUs and their included HBM memory!
Subject: General Tech | February 4, 2016 - 01:18 PM | Tim Verry
Tagged: open source, microsoft, machine learning, deep neural network, deep learning, cntk, azure
Microsoft has been using deep neural networks for a while now to power the speech recognition technologies bundled into Windows and Skype, where they identify and follow commands and translate speech, respectively. This technology is part of Microsoft's Computational Network Toolkit. Last April, the company made this toolkit available to academic researchers on Codeplex, and it is now opening it up even more by moving the project to GitHub and placing it under an open source license.
Led by chief speech and computer scientist Xuedong Huang, a team of Microsoft researchers built the Computational Network Toolkit (CNTK) to power all of their speech related projects. CNTK is a deep neural network toolkit for machine learning that is built to be fast and scalable across multiple systems and, more importantly, multiple GPUs, which excel at these kinds of parallel processing workloads and algorithms. Microsoft focused heavily on scalability with CNTK. According to the company's own benchmarks (which is to say, to be taken with a healthy dose of salt), the major competing neural network toolkits offer similar performance when running on a single GPU, but when adding more than one graphics card CNTK is vastly more efficient, with almost four times the performance of Google's TensorFlow and a bit more than 1.5 times that of Torch 7 and Caffe.
Where CNTK gets a bit deep learning crazy is its ability to scale beyond a single system and easily tap into Microsoft's Azure GPU Lab to get access to numerous GPUs from their remote datacenters. It's not free, but you don't need to purchase, store, and power the hardware locally, and you can ramp the number up and down based on how much GPU muscle you need. The example Microsoft provided showed two similarly spec'd Linux systems with four GPUs each running on Azure cloud hosting getting close to twice the performance of the 4 GPU system (a 75% increase). Microsoft claims that "CNTK can easily scale beyond 8 GPUs across multiple machines with superior distributed system performance."
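As a quick sanity check on that multi-machine example, scaling efficiency can be expressed as the measured speedup divided by the increase in hardware. The sketch below assumes the 75% increase means a 1.75x speedup when going from four GPUs to eight; the function name is made up for illustration, and the figures are Microsoft's own claims rather than independent measurements.

```python
def scaling_efficiency(speedup, resource_ratio):
    """Fraction of ideal (linear) scaling achieved.

    speedup: how much faster the larger setup runs (1.75 means 75% faster).
    resource_ratio: how much more hardware it uses (8 GPUs / 4 GPUs = 2).
    """
    return speedup / resource_ratio


if __name__ == "__main__":
    efficiency = scaling_efficiency(speedup=1.75, resource_ratio=8 / 4)
    print(f"{efficiency:.1%} of linear scaling")
```

Under that reading, doubling the GPU count across two machines keeps 87.5% of perfect linear scaling.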
Using GPU-based Azure machines, Microsoft was able to increase the performance of Cortana's speech recognition by 10-times compared to the local systems they were previously using.
It is always cool to see GPU compute in practice and now that CNTK is available to everyone, I expect to see a lot of new uses for the toolkit beyond speech recognition. Moving to an open source license is certainly good PR, but I think it was actually done more for Microsoft's own benefit rather than users which isn't necessarily a bad thing since both get to benefit from it. I am really interested to see what researchers are able to do with a deep neural network that reportedly offers so much performance thanks to GPUs. I'm curious what new kinds of machine learning opportunities the extra speed will enable.
If you are interested, you can check out CNTK on GitHub!
Subject: General Tech | January 5, 2016 - 01:17 AM | Tim Verry
Tagged: tegra, pascal, nvidia, driveworks, drive px 2, deep neural network, deep learning, autonomous car
NVIDIA is using the Consumer Electronics Show to launch the Drive PX 2 which is the latest bit of hardware aimed at autonomous vehicles. Several NVIDIA products combine to create the company's self-driving "end to end solution" including DIGITS, DriveWorks, and the Drive PX 2 hardware to train, optimize, and run the neural network software that will allegedly be the brains of future self-driving cars (or so NVIDIA hopes).
The Drive PX 2 hardware is the successor to the Tegra-powered Drive PX released last year. The Drive PX 2 represents a major computational power jump with 12 CPU cores and two discrete "Pascal"-based GPUs! NVIDIA has not revealed the full specifications yet, but they have made certain details available. There are two Tegra SoCs along with two GPUs that are liquid cooled. The liquid cooling consists of a large metal block with copper tubing winding through it and then passing into what looks to be external connectors that attach to a completed cooling loop (an exterior radiator, pump, and reservoir).
There are a total of 12 CPU cores including eight ARM Cortex A57 cores and four "Denver" cores. The discrete graphics are based on the 16nm FinFET process and will use the company's upcoming Pascal architecture. The total package will draw a maximum of 250 watts and will offer up to 8 TFLOPS of computational horsepower and 24 trillion "deep learning operations per second." That last number relates to the number of special deep learning instructions the hardware can process per second which, if anything, sounds like an impressive amount of power when it comes to making connections and analyzing data to try to classify it. Drive PX 2 is, according to NVIDIA, 10 times faster than its predecessor at running these specialized instructions and has nearly 4 times the computational horsepower when it comes to TFLOPS.
Similar to the original Drive PX, the driving AI platform can accept and process the inputs of up to 12 video cameras. It can also handle LiDAR, RADAR, and ultrasonic sensors. NVIDIA compared the Drive PX 2 to the TITAN X, citing its ability to process 2,800 AlexNet images per second versus the consumer graphics card's 450, which, while possibly not the best comparison, does make it look promising.
Neural networks and machine learning are at the core of what makes autonomous vehicles possible along with hardware powerful enough to take in a multitude of sensor data and process it fast enough. The software side of things includes the DriveWorks development kit which includes specialized instructions and a neural network that can detect objects based on sensor input(s), identify and classify them, determine the positions of objects relative to the vehicle, and calculate the most efficient path to the destination.
Specifically, in the press release NVIDIA stated:
"This complex work is facilitated by NVIDIA DriveWorks™, a suite of software tools, libraries and modules that accelerates development and testing of autonomous vehicles. DriveWorks enables sensor calibration, acquisition of surround data, synchronization, recording and then processing streams of sensor data through a complex pipeline of algorithms running on all of the DRIVE PX 2's specialized and general-purpose processors. Software modules are included for every aspect of the autonomous driving pipeline, from object detection, classification and segmentation to map localization and path planning."
DIGITS is the platform used to train the neural network that is then used by the Drive PX 2 hardware. The software is purportedly improving in both accuracy and training time with NVIDIA achieving a 96% accuracy rating at identifying traffic signs based on the traffic sign database from Ruhr University Bochum after a training session lasting only 4 hours as opposed to training times of days or even weeks.
NVIDIA claims that the initial Drive PX has been picked up by over 50 development teams (automakers, universities, software developers, et al) interested in autonomous vehicles. Early access to development hardware is expected to be towards the middle of the year with general availability of final hardware in Q4 2016.
The new Drive PX 2 is getting a serious hardware boost with the inclusion of two dedicated graphics processors (the Drive PX was based around two Tegra X1 SoCs), and that should allow automakers to really push what's possible in real time and push the self-driving car a bit closer to reality and final (self) drive-able products. I'm excited to see that vision come to fruition and am looking forward to seeing what this improved hardware will enable in the auto industry!
Follow all of our coverage of the show at http://pcper.com/ces!