Manufacturer: Microsoft

It's all fun and games until something something AI.

Microsoft announced the Windows Machine Learning (WinML) API about two weeks ago, but they did so in a sort-of abstract context. This week, alongside the 2018 Game Developers Conference, they are grounding it in a practical application: video games!


Specifically, the API provides the mechanisms for game developers to run inference on the target machine. The training data that it runs against would be in the Open Neural Network Exchange (ONNX) format from Microsoft, Facebook, and Amazon. Like the initial announcement suggests, it can be used for any application, not just games, but… you know. If you want to get a technology off the ground, and it requires a high-end GPU, then video game enthusiasts are good lead users. When run in a DirectX application, WinML kernels are queued on the DirectX 12 compute queue.

We’ve discussed the concept before. When you’re rendering a video game, simulating an accurate scenario isn’t your goal – the goal is to look like you are. The direct way of looking like you’re doing something is to do it. The problem is that some effects are too slow (or, sometimes, too complicated) to correctly simulate. In these cases, it might be viable to make a deep-learning AI hallucinate a convincing result, even though no actual simulation took place.

Fluid dynamics, global illumination, and up-scaling are three examples.

Previously mentioned SIGGRAPH demo of fluid simulation without fluid simulation...
... just a trained AI hallucinating a scene based on input parameters.

Another place where AI could be useful is… well… AI. One way of making AI is to give it some set of data from the game environment, often including information that a player in its position would not be able to know, and having it run against a branching logic tree. Deep learning, on the other hand, can train itself on billions of examples of good and bad play, and make results based on input parameters. While the two methods do not sound that different, the difference between logic being designed (vs logic being assembled from an abstract good/bad dataset) someone abstracts the potential for assumptions and programmer error. Of course, it abstracts that potential for error into the training dataset, but that’s a whole other discussion.

The third area that AI could be useful is when you’re creating the game itself.

There’s a lot of grunt and grind work when developing a video game. Licensing prefab solutions (or commissioning someone to do a one-off asset for you) helps ease this burden, but that gets expensive in terms of both time and money. If some of those assets could be created by giving parameters to a deep-learning AI, then those are assets that you would not need to make, allowing you to focus on other assets and how they all fit together.

These are three of the use cases that Microsoft is aiming WinML at.


Sure, these are smooth curves of large details, but the antialiasing pattern looks almost perfect.

For instance, Microsoft is pointing to an NVIDIA demo where they up-sample a photo of a car, once with bilinear filtering and once with a machine learning algorithm (although not WinML-based). The bilinear algorithm behaves exactly as someone who has used Photoshop would expect. The machine learning algorithm, however, was able to identify the objects that the image intended to represent, and it drew the edges that it thought made sense.


Like their DirectX Raytracing (DXR) announcement, Microsoft plans to have PIX support WinML “on Day 1”. As for partners? They are currently working with Unity Technologies to provide WinML support in Unity’s ML-Agents plug-in. That’s all the game industry partners they have announced at the moment, though. It’ll be interesting to see who jumps in and who doesn’t over the next couple of years.

How deep is your learning?

Recently, we've had some hands-on time with NVIDIA's new TITAN V graphics card. Equipped with the GV100 GPU, the TITAN V has shown us some impressive results in both gaming and GPGPU compute workloads.

However, one of the most interesting areas that NVIDIA has been touting for GV100 has been deep learning. With a 1.33x increase in single-precision FP32 compute over the Titan Xp, and the addition of specialized Tensor Cores for deep learning, the TITAN V is well positioned for deep learning workflows.

In mathematics, a tensor is a multi-dimensional array of numerical values with respect to a given basis. While we won't go deep into the math behind it, Tensors are a crucial data structure for deep learning applications.


NVIDIA's Tensor Cores aim to accelerate Tensor-based math by utilizing half-precision FP16 math in order to process both dimensions of a Tensor at the same time. The GV100 GPU contains 640 of these Tensor Cores to accelerate FP16 neural network training.

It's worth noting that these are not the first Tensor operation-specific hardware, with others such as Google developing hardware for these specific functions.

Test Setup

  PC Perspective Deep Learning Testbed
Processor AMD Ryzen Threadripper 1920X
Motherboard GIGABYTE X399 AORUS Gaming 7
Memory 64GB Corsair Vengeance RGB DDR4-3000 
Storage Samsung SSD 960 Pro 2TB
Power Supply Corsair AX1500i 1500 watt
OS Ubuntu 16.04.3 LTS
Drivers AMD: AMD GPU Pro 17.50
NVIDIA: 387.34

For our NVIDIA testing, we used the NVIDIA GPU Cloud 17.12 Docker containers for both TensorFlow and Caffe2 inside of our Ubuntu 16.04.3 host operating system.

AMD testing was done using the hiptensorflow port from the AMD ROCm GitHub repositories.

For all tests, we are using the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) data set.

Continue reading our look at deep learning performance with the NVIDIA Titan V!!

The Khronos Group Releases OpenVX 1.2 Conformance Tests

Subject: General Tech | November 26, 2017 - 07:39 PM |
Tagged: Khronos, openvx, deep neural network, computer vision

OpenVX is an API that enables computer vision in a range of applications, from gesture tracking to surveillance to robotics. This version includes the neural network nodes (convolution, deconvolution, etc.) as well as import and export for compiling graphs offline. This is a part of the updated Adopters Program for OpenCX, which is, as usual for the Khronos Group, cross-platform and royalty-free.


Obviously, this only affects a subset of our readers directly, but cross-platform, royalty-free APIs for advanced computing functions will eventually lead to interesting technology. At the same time, for the developers in our audience, the tools are now available to test your code and hardware. The Khronos Group expects that early implementations will ship in 2018.

You can read the full press release at their website.

Imagination Technologies Announces PowerVR 2NX NNA

Subject: Mobile | September 25, 2017 - 10:45 PM |
Tagged: Imagination Technologies, deep neural network

Imagination Technologies is known to develop interesting, somewhat offbeat hardware, such as GPUs with built-in ray tracers. In this case, the company is jumping into the neural network market with a Power VR-branded accelerator. The PowerVR Series2NX Neural Network Accelerator works on massively parallel, but low-precision tasks. AnandTech says that the chip can even work in multiple bit-depths on different layers in a single network, from 16-bit, down to 12-, 10-, 8-, 7-, 6-, 5-, and 4-bit.


Image Credit: Imagination Technologies via Anandtech

Imagination seems to say that this is variable “to maintain accuracy”. I’m guessing it doesn’t give an actual speed-up to tweak your network in that way, but I honestly don’t know.

As for Imagination Technologies, they intend to have this in mobile devices for, as they suggest, photography and predictive text. They also state the usual suspects: VR/AR, automotive, surveillance, and so forth. They are suggesting that this GPU technology will target Tensorflow Lite.

The PowerVR 2NX Neural Network Accelerator is available for licensing.

Fluid Simulations via Machine Learning Demo for SIGGRAPH

Subject: General Tech | May 29, 2017 - 08:46 PM |
Tagged: machine learning, fluid, deep neural network, deep learning

SIGGRAPH 2017 is still a few months away, but we’re already starting to see demos get published as groups try to get them accepted to various parts of the trade show. In this case, Physics Forests published a two-minute video where they perform fluid simulations without actually simulating fluid dynamics. Instead, they used a deep-learning AI to hallucinate a convincing fluid dynamics result given their inputs.

We’re seeing a lot of research into deep-learning AIs for complex graphics effects lately. The goal of most of these simulations, whether they are for movies or video games, is to create an effect that convinces the viewer that what they see is realistic. The goal is not to create an actually realistic effect. The question then becomes, “Is it easier to actually solve the problem? Or is it easier having an AI learn, based on a pile of data sorted into successes and failures, come up with an answer that looks correct to the viewer?”

In a lot of cases, like global illumination and even possibly anti-aliasing, it might be faster to have an AI trick you. Fluid dynamics is just one example.

AMD has the Instinct; if not the license, to kill

Subject: Graphics Cards | December 12, 2016 - 04:05 PM |
Tagged: vega 10, Vega, training, radeon, Polaris, machine learning, instinct, inference, Fiji, deep neural network, amd

Ryan was not the only one at AMD's Radeon Instinct briefing, covering their shot across NVIDIA's HPC products.  The Tech Report just released their coverage of the event and the tidbits which AMD provided about the MI25, MI8 and MI6; no relation to a certain British governmental department.   They focus a bit more on the technologies incorporated into GEMM and point out that AMD's top is not matched by an NVIDIA product, the GP100 GPU does not come as an add-in card.  Pop by to see what else they had to say.


"Thus far, Nvidia has enjoyed a dominant position in the burgeoning world of machine learning with its Tesla accelerators and CUDA-powered software platforms. AMD thinks it can fight back with its open-source ROCm HPC platform, the MIOpen software libraries, and Radeon Instinct accelerators. We examine how these new pieces of AMD's machine-learning puzzle fit together."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Manufacturer: AMD

AMD Enters Machine Learning Game with Radeon Instinct Products

NVIDIA has been diving in to the world of machine learning for quite a while, positioning themselves and their GPUs at the forefront on artificial intelligence and neural net development. Though the strategies are still filling out, I have seen products like the DIGITS DevBox place a stake in the ground of neural net training and platforms like Drive PX to perform inference tasks on those neural nets in self-driving cars. Until today AMD has remained mostly quiet on its plans to enter and address this growing and complex market, instead depending on the compute prowess of its latest Polaris and Fiji GPUs to make a general statement on their own.


The new Radeon Instinct brand of accelerators based on current and upcoming GPU architectures will combine with an open-source approach to software and present researchers and implementers with another option for machine learning tasks.

The statistics and requirements that come along with the machine learning evolution in the compute space are mind boggling. More than 2.5 quintillion bytes of data are generated daily and stored on phones, PCs and servers, both on-site and through a cloud infrastructure. That includes 500 million tweets, 4 million hours of YouTube video, 6 billion google searches and 205 billion emails.


Machine intelligence is going to allow software developers to address some of the most important areas of computing for the next decade. Automated cars depend on deep learning to train, medical fields can utilize this compute capability to more accurately and expeditiously diagnose and find cures to cancer, security systems can use neural nets to locate potential and current risk areas before they affect consumers; there are more uses for this kind of network and capability than we can imagine.

Continue reading our preview of the AMD Radeon Instinct machine learning processors!

Microsoft Focusing Efforts, Forming AI and Research Group

Subject: General Tech | October 6, 2016 - 11:37 PM |
Tagged: supercomputer, microsoft, deep neural network, azure, artificial intelligence, ai

Microsoft recently announced it would be restructuring 5,000 employees as it focuses its efforts on artificial intelligence with a new AI and Research Group. The Redmond giant is pulling computer scientists and engineers from Microsoft Research, the Information Platfrom, Bing, and Cortana groups, and the Ambient Computing and Robotics teams. Led by 20 year Microsoft veteran Harry Shum (who has worked in both research and engineering roles at Microsoft), the new AI team promises to "democratize AI" and be a leader in the field with intelligent products and services. 

AI Cortana.jpg

It seems that "democratizing AI" is less about free artificial intelligence and more about making the technology accessible to everyone. The AI and Research Group plans to develop artificial intelligence to the point where it will change how humans interact with their computers (read: Cortana 2.0) with services and commands being conversational rather than strict commands, new applications baked with AI such as office and photo editors that are able to proof read and suggest optimal edits respectively, and new vision, speech, and machine analytics APIs that other developers will be able to harness for their own applications. (Wow that's quite the long sentence - sorry!)

Further, Microsoft wants to build the world's fastest AI supercomputer using its Azure cloud computing service. The Azure-powered AI will be available to everyone for their applications and research needs (for a price, of course!). Microsoft certainly has the money, brain power, and computing power to throw at the problem, and this may be one of the major areas where looking to "the cloud" for a company's computing needs is a smart move as the up front capital needed for hardware, engineers, and support staff to do something like this in-house would be extremely prohibative. It remains to be seen whether Microsoft will win out in the wake of competitors at being the first, but it is certainly staking its claim and does not want to be left out completely.

“Microsoft has been working in artificial intelligence since the beginning of Microsoft Research, and yet we’ve only begun to scratch the surface of what’s possible,” said Shum, executive vice president of the Microsoft AI and Research Group. “Today’s move signifies Microsoft’s commitment to deploying intelligent technology and democratizing AI in a way that changes our lives and the world around us for the better. We will significantly expand our efforts to empower people and organizations to achieve more with our tools, our software and services, and our powerful, global-scale cloud computing capabilities.”

Interestingly, this announcement comes shortly after a previous announcement that industry giants Amazon, Facebook, Google-backed DeepMind, IBM, and Microsoft founded the not-for-profit Partnership On AI organization that will collaborate and research best practices on AI development and exploitation (and hopefully how to teach them not to turn on us heh).

I am looking forward to the future of AI and the technologies it will enable!

Source: Microsoft

NVIDIA Teases Low Power, High Performance Xavier SoC That Will Power Future Autonomous Vehicles

Subject: Processors | October 1, 2016 - 06:11 PM |
Tagged: xavier, Volta, tegra, SoC, nvidia, machine learning, gpu, drive px 2, deep neural network, deep learning

Earlier this week at its first GTC Europe event in Amsterdam, NVIDIA CEO Jen-Hsun Huang teased a new SoC code-named Xavier that will be used in self-driving cars and feature the company's newest custom ARM CPU cores and Volta GPU. The new chip will begin sampling at the end of 2017 with product releases using the future Tegra (if they keep that name) processor as soon as 2018.


NVIDIA's Xavier is promised to be the successor to the company's Drive PX 2 system which uses two Tegra X2 SoCs and two discrete Pascal MXM GPUs on a single water cooled platform. These claims are even more impressive when considering that NVIDIA is not only promising to replace the four processors but it will reportedly do that at 20W – less than a tenth of the TDP!

The company has not revealed all the nitty-gritty details, but they did tease out a few bits of information. The new processor will feature 7 billion transistors and will be based on a refined 16nm FinFET process while consuming a mere 20W. It can process two 8k HDR video streams and can hit 20 TOPS (NVIDIA's own rating for deep learning int(8) operations).

Specifically, NVIDIA claims that the Xavier SoC will use eight custom ARMv8 (64-bit) CPU cores (it is unclear whether these cores will be a refined Denver architecture or something else) and a GPU based on its upcoming Volta architecture with 512 CUDA cores. Also, in an interesting twist, NVIDIA is including a "Computer Vision Accelerator" on the SoC as well though the company did not go into many details. This bit of silicon may explain how the ~300mm2 die with 7 billion transistors is able to match the 7.2 billion transistor Pascal-based Telsa P4 (2560 CUDA cores) graphics card at deep learning (tera-operations per second) tasks. Of course in addition to the incremental improvements by moving to Volta and a new ARMv8 CPU architectures on a refined 16nm FF+ process.

  Drive PX Drive PX 2 NVIDIA Xavier Tesla P4
CPU 2 x Tegra X1 (8 x A57 total) 2 x Tegra X2 (8 x A57 + 4 x Denver total) 1 x Xavier SoC (8 x Custom ARM + 1 x CVA) N/A
GPU 2 x Tegra X1 (Maxwell) (512 CUDA cores total 2 x Tegra X2 GPUs + 2 x Pascal GPUs 1 x Xavier SoC GPU (Volta) (512 CUDA Cores) 2560 CUDA Cores (Pascal)
TDP ~30W (2 x 15W) 250W 20W up to 75W
Process Tech 20nm 16nm FinFET 16nm FinFET+ 16nm FinFET
Transistors ? ? 7 billion 7.2 billion

For comparison, the currently available Tesla P4 based on its Pascal architecture has a TDP of up to 75W and is rated at 22 TOPs. This would suggest that Volta is a much more efficient architecture (at least for deep learning and half precision)! I am not sure how NVIDIA is able to match its GP104 with only 512 Volta CUDA cores though their definition of a "core" could have changed and/or the CVA processor may be responsible for closing that gap. Unfortunately, NVIDIA did not disclose what it rates the Xavier at in TFLOPS so it is difficult to compare and it may not match GP104 at higher precision workloads. It could be wholly optimized for int(8) operations rather than floating point performance. Beyond that I will let Scott dive into those particulars once we have more information!

Xavier is more of a teaser than anything and the chip could very well change dramatically and/or not hit the claimed performance targets. Still, it sounds promising and it is always nice to speculate over road maps. It is an intriguing chip and I am ready for more details, especially on the Volta GPU and just what exactly that Computer Vision Accelerator is (and will it be easy to program for?). I am a big fan of the "self-driving car" and I hope that it succeeds. It certainly looks to continue as Tesla, VW, BMW, and other automakers continue to push the envelope of what is possible and plan future cars that will include smart driving assists and even cars that can drive themselves. The more local computing power we can throw at automobiles the better and while massive datacenters can be used to train the neural networks, local hardware to run and make decisions are necessary (you don't want internet latency contributing to the decision of whether to brake or not!).

I hope that NVIDIA's self-proclaimed "AI Supercomputer" turns out to be at least close to the performance they claim! Stay tuned for more information as it gets closer to launch (hopefully more details will emerge at GTC 2017 in the US).

What are your thoughts on Xavier and the whole self-driving car future?

Also read:

Source: NVIDIA

Microsoft Lets Anyone "Git" Their Deep Learning On With Open Source CNTK

Subject: General Tech | February 4, 2016 - 01:18 PM |
Tagged: open source, microsoft, machine learning, deep neural network, deep learning, cntk, azure

Microsoft has been using deep neural networks for awhile now to power its speech recognition technologies bundled into Windows and Skype to identify and follow commands and to translate speech respectively. This technology is part of Microsoft's Computational Network Toolkit. Last April, the company made this toolkit available to academic researchers on Codeplex, and it is now opening it up even more by moving the project to GitHub and placing it under an open source license.

Lead by chief speech and computer scientist Xuedong Huang, a team of Microsoft researchers built the Computational Network Toolkit (CNTK) to power all their speech related projects. The CNTK is a deep neural network for machine learning that is built to be fast and scalable across multiple systems, and more importantly, multiple GPUs which excel at these kinds of parallel processing workloads and algorithms. Microsoft heavily focused on scalability with CNTK and according to the company's own benchmarks (which is to say to be taken with a healthy dose of salt) while the major competing neural network tool kits offer similar performance running on a single GPU, when adding more than one graphics card CNTK is vastly more efficient with almost four times the performance of Google's TensorFlow and a bit more than 1.5-times Torch 7 and Caffe. Where CNTK gets a bit deep learning crazy is its ability to scale beyond a single system and easily tap into Microsoft's Azure GPU Lab to get access to numerous GPUs from their remote datacenters -- though its not free you don't need to purchase, store, and power the hardware locally and can ramp the number up and down based on how much GPU muscle you need. The example Microsoft provided showed two similarly spec'd Linux systems with four GPUs each running on Azure cloud hosting getting close to twice the performance of the 4 GPU system (75% increase). Microsoft claims that "CNTK can easily scale beyond 8 GPUs across multiple machines with superior distributed system performance."


Using GPU-based Azure machines, Microsoft was able to increase the performance of Cortana's speech recognition by 10-times compared to the local systems they were previously using.

It is always cool to see GPU compute in practice and now that CNTK is available to everyone, I expect to see a lot of new uses for the toolkit beyond speech recognition. Moving to an open source license is certainly good PR, but I think it was actually done more for Microsoft's own benefit rather than users which isn't necessarily a bad thing since both get to benefit from it. I am really interested to see what researchers are able to do with a deep neural network that reportedly offers so much performance thanks to GPUs. I'm curious what new kinds of machine learning opportunities the extra speed will enable.

If you are interested, you can check out CNTK on GitHub!

Source: Microsoft