Get some deep TensorFlow lovin' instead of the cupidity which is the hallmark of today

Subject: General Tech | February 14, 2019 - 12:24 PM |
Tagged: tensorflow, google, Downtiration, deep learning, ai

Downtiration is not a word, but then again what you are about to hear isn't exactly a song either, though it is closer to one than many of the insipid, honey-drenched hits you are likely to hear today.  A company by the name of Search Laboratory fed Google's TensorFlow software 999 snippets of love songs and let it assemble the new benchmark for sentimental songbirds.  It is also a great example of the current limitations of AI and deep learning, regardless of what the PR flacks would have you believe.

You can thank The Register for the next two irrecoverable minutes of your life.

"The song, entitled 'Downtiration Tender love' was created by media agency Search Laboratory and its "character-based Recurrent Neural Network," that uses Google's open-source machine learning software, TensorFlow loaded up with 999 snippets from the world's greatest love songs."

Here is some more Tech News from around the web:

Tech Talk

Source: The Inquirer

Mozilla & Ubisoft "Clever-Commit" Deep-Learning Code Review

Subject: General Tech | February 12, 2019 - 05:52 PM |
Tagged: Rust, mozilla, deep learning, c++

The basic premise of “deep learning” is that you process large pools of labelled data to learn what “good” and “bad” patterns look like. Once you have a trained model, you can score new data against it to accomplish some goal.
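Purely as a hedged illustration of that premise (not Clever-Commit's actual model or data), a toy version of "learn good/bad patterns from labelled commits, then score new ones" could look something like this:

```python
# Toy commit-risk classifier sketch; the tiny inline dataset and labels are
# placeholders, and the real Clever-Commit system is far more sophisticated.
import tensorflow as tf

# Commit diffs labelled 1 if they were later implicated in a bug fix, else 0.
diffs = [
    "- if (ptr != NULL) free(ptr);\n+ free(ptr);",
    "+ add unit test coverage for the parser edge cases",
]
labels = [1.0, 0.0]

vectorizer = tf.keras.layers.TextVectorization(max_tokens=20000, output_mode="multi_hot")
vectorizer.adapt(diffs)

model = tf.keras.Sequential([
    vectorizer,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # probability the commit is risky
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(tf.constant(diffs), tf.constant(labels), epochs=5, verbose=0)

# Score a new commit; anything above some threshold would be flagged for extra review.
risk = model.predict(tf.constant(["+ while (i <= len) buf[i] = 0;"]))
print(float(risk[0][0]))
```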

In this case, Mozilla is using it to scan commits to the Firefox codebase as a form of automated code review. The system was originally developed by Ubisoft as Commit Assistant, which they have been using as a form of code analysis. Mozilla has since partnered with them, and will contribute to its ability to scan C++, JavaScript, and Mozilla’s own Rust language.

Other vendors, such as Microsoft and their IntelliCode system, have been using deep learning to assist in software development. It’s an interesting premise that, along with unit tests, static code analysis, and so forth, should increase the quality of code.

Personally, I’m one of those people who regularly uses static code analysis (if the platform has a good and affordable solution available). It’s good to follow strong design patterns, but it’s hard to recover from the “broken window” effect once you accumulate a few hundred static analysis warnings… or a few hundred compiler warnings. Apathy sets in and I end up ignoring everything from that feedback level down. So, when I can control a project from scratch, I keep it clean of warnings and code analysis issues.

All that is to say – it’ll be interesting to see how Clever-Commit is adopted. Since it apparently works on a per-commit basis, it shouldn’t be bogged down by past mistakes. I wonder if we could somehow apply that principle to other forms of code analysis. I’m curious what sort of data we could gather by scanning from commit to commit, and what that would bring in terms of a holistic view of code quality for various projects.

And then… what will happen when deep learning starts generating code? Hmm.

Source: Ars Technica

NVIDIA Introduces AI Interactive Graphics Research: 3D from Real-World Video

Subject: General Tech | December 3, 2018 - 08:00 AM |
Tagged: ue4, nvidia, NeurIPS, deep learning, ai, 3D rendering

NVIDIA has introduced new research at the NeurIPS AI conference in Montreal that allows rendering of 3D environments from models trained on real-world videos. It's a complex topic with potential beyond scientific research, including possible applications for game developers, though it has not reached the "product" stage just yet. A video accompanying today's press release shows how the researchers have implemented this technology so far:


"Company researchers used a neural network to apply visual elements from existing videos to new 3D environments. Currently, every object in a virtual world needs to be modeled. The NVIDIA research uses models trained from video to render buildings, trees, vehicles and objects."

The AI-generated city in a simple driving game demo shown at the NeurIPS AI conference gives us an early look at the sort of 3D environment the neural network can render: "the generative neural network learned to model the appearance of the world, including lighting, materials and their dynamics" from video footage, and the result was presented as a game environment running in Unreal Engine 4.

"The technology offers the potential to quickly create virtual worlds for gaming, automotive, architecture, robotics or virtual reality. The network can, for example, generate interactive scenes based on real-world locations or show consumers dancing like their favorite pop stars."

Beyond video-to-video, this research can also be applied to still images, with models providing the basis for eventually rendered movement (the video embedded above includes a demonstration of this aspect of the research - and yes, dancing is involved). All of this might be a year or two away from appearing in a new game release, but the possibilities are fascinating to contemplate, to say the least.
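For a sense of what "rendering appearance from a learned model" means in code terms, here is a heavily simplified, hedged sketch of an image-to-image generator that maps a semantic layout frame (the kind a game engine can emit) to an RGB frame. NVIDIA's actual vid2vid work adds adversarial training, temporal consistency, and much larger networks; the shapes below are assumptions.

```python
# Toy encoder-decoder generator sketch (assumed shapes; not NVIDIA's vid2vid model).
import tensorflow as tf

def tiny_generator():
    # Input: a 256x256 semantic layout frame (e.g. per-pixel class colours from UE4).
    inp = tf.keras.Input(shape=(256, 256, 3))
    x = tf.keras.layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(x)
    out = tf.keras.layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh")(x)
    return tf.keras.Model(inp, out)

generator = tiny_generator()
fake_frame = generator(tf.random.uniform([1, 256, 256, 3]))   # would be trained on real video
print(fake_frame.shape)                                        # (1, 256, 256, 3)
```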

Source: NVIDIA

Intel wants you to stick it to deep learning

Subject: General Tech | November 14, 2018 - 01:46 PM |
Tagged: neural compute stick 2, Intel, deep learning

If you are interested in delving into the world of computer vision and image recognition then the new Neural Compute Stick 2 from Intel is something you should be aware of.  For a mere $100 you get a system on a stick with a Movidius Myriad X VPU, which can be used with Intel's OpenVINO toolkit to develop machine vision applications.  You don't need any special hardware; just plug it into a USB 3.0 port and get programming.  If this seems up your alley you can follow the links from The Inquirer for more details.


"Intel claims this is the first stick to feature a neural compute engine for dedicated hardware neural network inference (essentially the technique of putting AI smarts into action) accelerator."

Here is some more Tech News from around the web:

Tech Talk

 

Source: The Inquirer

Intro and NNEF 1.0 Finalization

SIGGRAPH 2018 is a huge computer graphics expo that occurs in a seemingly random host city around North America. (Asia has a sister event, called SIGGRAPH Asia, which likewise shuffles around.) In the last twenty years, the North American SIGGRAPH seems to like Los Angeles, which hosted the event nine times over that period, but Vancouver won out this year. As you would expect, the maintainers of OpenGL and Vulkan are there, and they have a lot to talk about.

In summary:

  • NNEF 1.0 has been finalized and released!
  • The first public demo of OpenXR is available and on the show floor.
  • glTF Texture Transmission Extension is being discussed.
  • OpenCL Ecosystem Roadmap is being discussed.
  • Khronos Educators Program has launched.

I will go through each of these points. Feel free to skip around between the sections that interest you!

Read on to see NNEF or see page 2 for the rest!

GTC 2018: Nvidia and ARM Integrating NVDLA Into Project Trillium For Inferencing at the Edge

Subject: General Tech | March 29, 2018 - 03:10 PM |
Tagged: project trillium, nvidia, machine learning, iot, GTC 2018, GTC, deep learning, arm, ai

During GTC 2018, NVIDIA and ARM announced a partnership that will see ARM integrate NVIDIA's NVDLA deep learning inferencing accelerator into the company's Project Trillium machine learning processors. The NVIDIA Deep Learning Accelerator (NVDLA) is an open source, modular architecture specifically optimized for inferencing operations such as object and voice recognition. Bringing that acceleration to the wider ARM ecosystem through Project Trillium will enable a massive number of smarter phones, tablets, Internet-of-Things, and embedded devices that can do inferencing at the edge, which is to say without the complexity and latency of relying on cloud processing. This means potentially smarter voice assistants (e.g. Alexa, Google Assistant), doorbell cameras, lighting, and security around the home, and better AR, natural translation, and assistive technologies out-and-about on your phone.


Karl Freund, lead analyst for deep learning at Moor Insights & Strategy, was quoted in the press release as stating:

“This is a win/win for IoT, mobile and embedded chip companies looking to design accelerated AI inferencing solutions. NVIDIA is the clear leader in ML training and Arm is the leader in IoT end points, so it makes a lot of sense for them to partner on IP.”

ARM's Project Trillium was announced back in February and is a suite of processor IP optimized for parallel, low latency workloads; it includes a Machine Learning processor, an Object Detection processor, and neural network software libraries. NVDLA is a hardware and software platform, based on the deep learning accelerator in the Xavier SoC, that is highly modular and configurable: it can feature a convolution core, a single data processor, a planar data processor, a channel data processor, and data reshape engines. Chip designers can configure the NVDLA with all or only some of those elements and can independently scale them up or down depending on what processing acceleration their devices need. NVDLA connects to the main system processor over a control interface and through two AXI memory interfaces (one optional) that connect to system memory and, optionally, dedicated high bandwidth memory (not necessarily HBM, but its own SRAM, for example).


NVDLA is presented as a free and open source architecture that promotes a standard way to design deep learning inferencing hardware that can accelerate inference on trained neural networks (with the training itself done on other devices, perhaps a DGX-2). The project, which hosts its code on GitHub and encourages community contributions, goes beyond the Xavier-based hardware and includes drivers, libraries, upcoming TensorRT support for accelerating Google's TensorFlow, testing suites and SDKs, a deep learning training infrastructure (for the training side of things) that is compatible with the NVDLA software and hardware, and system integration support.

Bringing the "smarts" of smart devices to the local hardware and closer to the users should mean much better performance and using specialized accelerators will reportedly offer the performance levels needed without blowing away low power budgets. Internet-of-Things (IoT) and mobile devices are not going away any time soon, and the partnership between NVIDIA and ARM should make it easier for developers and chip companies to offer smarter (and please tell me more secure!) smart devices.

Also read:

Source: NVIDIA
Manufacturer: Microsoft

It's all fun and games until something something AI.

Microsoft announced the Windows Machine Learning (WinML) API about two weeks ago, but they did so in a sort-of abstract context. This week, alongside the 2018 Game Developers Conference, they are grounding it in a practical application: video games!


Specifically, the API provides the mechanisms for game developers to run inference on the target machine. The trained models that it runs against would be in the Open Neural Network Exchange (ONNX) format from Microsoft, Facebook, and Amazon. Like the initial announcement suggests, it can be used for any application, not just games, but… you know. If you want to get a technology off the ground, and it requires a high-end GPU, then video game enthusiasts are good lead users. When run in a DirectX application, WinML kernels are queued on the DirectX 12 compute queue.
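WinML itself is surfaced as a WinRT API consumed from C++ or C#; purely as a Python-flavoured analogue of the same flow (load an ONNX model, bind an input tensor, run inference locally) — and explicitly not WinML — a hedged sketch using the onnxruntime package might look like this, with the model file and input shape as placeholders:

```python
# Hedged ONNX inference sketch using onnxruntime (an analogue of the WinML flow, not WinML).
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("upscaler.onnx")          # placeholder ONNX model file
input_name = session.get_inputs()[0].name                # model-defined input tensor name

# Placeholder input; the real shape and dtype come from the model's metadata.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: x})             # None = return all outputs
print([o.shape for o in outputs])
```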

We’ve discussed the concept before. When you’re rendering a video game, simulating an accurate scenario isn’t your goal – the goal is to look like you are. The direct way of looking like you’re doing something is to do it. The problem is that some effects are too slow (or, sometimes, too complicated) to correctly simulate. In these cases, it might be viable to make a deep-learning AI hallucinate a convincing result, even though no actual simulation took place.

Fluid dynamics, global illumination, and up-scaling are three examples.

Previously mentioned SIGGRAPH demo of fluid simulation without fluid simulation...
... just a trained AI hallucinating a scene based on input parameters.

Another place where AI could be useful is… well… AI. One way of making AI is to give it some set of data from the game environment, often including information that a player in its position would not be able to know, and having it run against a branching logic tree. Deep learning, on the other hand, can train itself on billions of examples of good and bad play, and produce results based on input parameters. While the two methods do not sound that different, moving from logic that is designed by hand to logic assembled from an abstract good/bad dataset somewhat abstracts away the potential for assumptions and programmer error. Of course, it abstracts that potential for error into the training dataset, but that’s a whole other discussion.

The third area that AI could be useful is when you’re creating the game itself.

There’s a lot of grunt and grind work when developing a video game. Licensing prefab solutions (or commissioning someone to do a one-off asset for you) helps ease this burden, but that gets expensive in terms of both time and money. If some of those assets could be created by giving parameters to a deep-learning AI, then those are assets that you would not need to make, allowing you to focus on other assets and how they all fit together.

These are three of the use cases that Microsoft is aiming WinML at.


Sure, these are smooth curves of large details, but the antialiasing pattern looks almost perfect.

For instance, Microsoft is pointing to an NVIDIA demo where they up-sample a photo of a car, once with bilinear filtering and once with a machine learning algorithm (although not WinML-based). The bilinear algorithm behaves exactly as someone who has used Photoshop would expect. The machine learning algorithm, however, was able to identify the objects that the image intended to represent, and it drew the edges that it thought made sense.
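To make that comparison concrete, here is a hedged sketch of the two up-scaling paths in code: straight bilinear interpolation versus a toy, untrained SRCNN-style convolutional network that would have to learn edge reconstruction from image pairs. The demo's actual network is NVIDIA's and is not public here; sizes and shapes below are assumptions.

```python
# Bilinear up-scaling versus a toy learned super-resolution model (untrained sketch).
import tensorflow as tf

low_res = tf.random.uniform([1, 128, 128, 3])            # stand-in for the car photo

# Path 1: classic bilinear interpolation, as a photo editor would do it.
bilinear = tf.image.resize(low_res, [512, 512], method="bilinear")

# Path 2: SRCNN-style learned up-scaling: upsample, then let convolutions redraw
# edges based on what the network has seen in training (weights here are random).
sr_model = tf.keras.Sequential([
    tf.keras.layers.UpSampling2D(size=4, interpolation="bilinear"),
    tf.keras.layers.Conv2D(64, 9, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 1, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(3, 5, padding="same"),
])
sharpened = sr_model(low_res)                             # (1, 512, 512, 3)
```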


Like their DirectX Raytracing (DXR) announcement, Microsoft plans to have PIX support WinML “on Day 1”. As for partners? They are currently working with Unity Technologies to provide WinML support in Unity’s ML-Agents plug-in. That’s all the game industry partners they have announced at the moment, though. It’ll be interesting to see who jumps in and who doesn’t over the next couple of years.

AMD, a little too far ahead of the curve again?

Subject: General Tech | December 27, 2017 - 11:42 AM |
Tagged: nvidia, Intel, HBM2, deep learning

AMD has never been afraid to try new things, from hitting 1GHz first, to creating a true multicore processor, to most recently adopting HBM and HBM2 in its graphics cards.  That move contributed to some of their recent difficulties with the current generation of GPUs; HBM is more expensive to produce and more of a challenge to implement.  While AMD was the first to implement HBM, it is NVIDIA and Intel which are benefiting from AMD's experimental nature.  Their new generation of HPC solutions, the Tesla P100, Quadro GP100 and Lake Crest, all use HBM2 and benefit from the experience Hynix, Samsung and TSMC gained fabbing the first generation.  Vega products offer slightly less memory bandwidth and lag behind in overall performance, a drawback of being first.

On a positive note, AMD has now had more experience designing chips which make use of HBM, and this could offer a new hope for the next generation of cards, in both gaming and HPC flavours.  DigiTimes briefly covers the two processes manufacturers use in the production of HBM here.


"However, Intel's release of its deep-learning chip, Lake Crest, which came following its acquisition of Nervana, has come with HMB2. This indicates that HBM-based architecture will be the main development direction of memory solutions for HPC solutions by GPU vendors."

Here is some more Tech News from around the web:

Tech Talk

 

Source: DigiTimes

How deep is your learning?

Recently, we've had some hands-on time with NVIDIA's new TITAN V graphics card. Equipped with the GV100 GPU, the TITAN V has shown us some impressive results in both gaming and GPGPU compute workloads.

However, one of the most interesting areas that NVIDIA has been touting for GV100 has been deep learning. With a 1.33x increase in single-precision FP32 compute over the Titan Xp, and the addition of specialized Tensor Cores for deep learning, the TITAN V is well positioned for deep learning workflows.

In mathematics, a tensor is a multi-dimensional array of numerical values with respect to a given basis. While we won't go deep into the math behind it, tensors are a crucial data structure for deep learning applications.


NVIDIA's Tensor Cores aim to accelerate tensor-based math by performing half-precision (FP16) matrix multiply-accumulate operations on small matrix tiles in a single operation. The GV100 GPU contains 640 of these Tensor Cores to accelerate FP16 neural network training.

It's worth noting that this is not the first hardware dedicated to tensor operations; others, such as Google with its Tensor Processing Unit, have developed hardware for these specific functions.
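In framework terms, what Tensor Cores accelerate is half-precision matrix math. Here is a hedged sketch of the difference at the TensorFlow level; the matrix sizes are arbitrary, and the actual routing to Tensor Cores is decided by cuBLAS/cuDNN rather than by this Python code.

```python
# FP32 versus FP16 matrix multiply; on Volta, the FP16 path is Tensor Core eligible.
import tensorflow as tf

a = tf.random.normal([4096, 4096], dtype=tf.float32)
b = tf.random.normal([4096, 4096], dtype=tf.float32)

c_fp32 = tf.matmul(a, b)                                   # runs on regular CUDA cores

# Half-precision inputs are what Tensor Cores consume (accumulation is FP32 inside
# the hardware); whether they are actually used depends on the GPU and library version.
c_fp16 = tf.matmul(tf.cast(a, tf.float16), tf.cast(b, tf.float16))
print(c_fp32.dtype, c_fp16.dtype)
```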

Test Setup

PC Perspective Deep Learning Testbed

  • Processor: AMD Ryzen Threadripper 1920X
  • Motherboard: GIGABYTE X399 AORUS Gaming 7
  • Memory: 64GB Corsair Vengeance RGB DDR4-3000
  • Storage: Samsung SSD 960 Pro 2TB
  • Power Supply: Corsair AX1500i (1500 watt)
  • OS: Ubuntu 16.04.3 LTS
  • Drivers: AMD GPU Pro 17.50 (AMD), 387.34 (NVIDIA)

For our NVIDIA testing, we used the NVIDIA GPU Cloud 17.12 Docker containers for both TensorFlow and Caffe2 inside of our Ubuntu 16.04.3 host operating system.

AMD testing was done using the hiptensorflow port from the AMD ROCm GitHub repositories.

For all tests, we are using the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) data set.
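For context, a stripped-down, hedged stand-in for this sort of throughput test (not the NGC/Caffe2 harness actually used for our numbers) would simply time ResNet-50 training steps over ImageNet-sized batches; the batch size and step count below are assumptions.

```python
# Toy images/sec measurement with ResNet-50 on synthetic ImageNet-sized data.
import time
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None, classes=1000)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")

batch = 64                                                    # assumed batch size
images = tf.random.uniform([batch, 224, 224, 3])
labels = tf.random.uniform([batch], maxval=1000, dtype=tf.int32)

model.train_on_batch(images, labels)                          # warm-up step
start, steps = time.time(), 20
for _ in range(steps):
    model.train_on_batch(images, labels)
print(f"{steps * batch / (time.time() - start):.1f} images/sec")
```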

Continue reading our look at deep learning performance with the NVIDIA Titan V!!

Intel Sheds More Light On Benefits of Nervana Neural Network Processor

Subject: General Tech, Processors | December 12, 2017 - 04:52 PM |
Tagged: training, nnp, nervana, Intel, flexpoint, deep learning, asic, artificial intelligence

Intel recently provided a few insights into its upcoming Nervana Neural Network Processor (NNP) on its blog. Built in partnership with deep learning startup Nervana Systems, which Intel acquired last year for over $400 million, the AI-focused chip previously codenamed Lake Crest uses a new architecture designed from the ground up to accelerate neural network training and AI modeling.


The full details of the Intel NNP are still unknown, but it is a custom ASIC with a tensor-based architecture placed on a multi-chip module (MCM) along with 32GB of HBM2 memory. The Nervana NNP supports optimized and power efficient Flexpoint math, and interconnectivity is a major focus of this scalable platform. Each AI accelerator features 12 processing clusters (with an as-yet-unannounced number of "cores" or processing elements) paired with 12 proprietary inter-chip links that are 20 times faster than PCI-E, four HBM2 memory controllers, a management-controller CPU, as well as standard SPI, I2C, GPIO, PCI-E x16, and DMA I/O. The processor is designed to be highly configurable and to meet both model and data parallelism goals.

The processing elements are all software controlled and can communicate with each other over high speed bi-directional links at up to a terabit per second. Each processing element has more than 2MB of local memory, and the Nervana NNP has 30MB of local memory in total. Memory accesses and data sharing are managed by QoS software, which controls adjustable bandwidth over multiple virtual channels with multiple priorities per channel. Processing elements can send and receive data between each other and the HBM2 stacks locally, as well as off die to processing elements and HBM2 on other NNP chips. The idea is to allow as much internal sharing as possible and to keep data stored and transformed in local memory as much as possible. That saves precious HBM2 bandwidth (1TB/s) for pre-fetching upcoming tensors, reduces the number of hops and the resulting latency by not having to go out to the HBM2 memory and back to transfer data between cores and/or processors, and saves power. This setup also helps Intel achieve an extremely parallel and scalable platform where multiple Nervana NNP co-processors on the same and remote boards effectively act as a single massive compute unit!


Intel's Flexpoint is also at the heart of the Nervana NNP and allegedly allows Intel to achieve results similar to FP32 with twice the memory bandwidth while being more power efficient than FP16. Flexpoint is used for the scalar math required for deep learning and uses fixed point 16-bit multiply and addition operations with a shared 5-bit exponent. Unlike FP16, Flexpoint uses all 16 bits for the mantissa and passes the exponent in the instruction. The NNP architecture also features zero cycle transpose operations and optimizations for matrix multiplication and convolutions to optimize silicon usage.
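As a purely numerical illustration of the shared-exponent idea (Intel's actual Flexpoint encoding and hardware behaviour are not public here), a tensor can be stored as 16-bit integer mantissas plus one exponent shared by the whole tensor:

```python
# Hedged sketch of shared-exponent ("Flexpoint-like") tensor quantization.
import numpy as np

def to_shared_exponent(x, mantissa_bits=16):
    # Pick one exponent for the whole tensor so the largest value still fits.
    exp = int(np.ceil(np.log2(np.max(np.abs(x)) + 1e-30))) - (mantissa_bits - 1)
    limit = 2 ** (mantissa_bits - 1) - 1
    mant = np.clip(np.round(x / 2.0 ** exp), -limit, limit).astype(np.int16)
    return mant, exp                      # 16-bit mantissas + one shared exponent

def from_shared_exponent(mant, exp):
    return mant.astype(np.float32) * 2.0 ** exp

x = np.random.randn(4, 4).astype(np.float32)
mant, exp = to_shared_exponent(x)
print(np.max(np.abs(x - from_shared_exponent(mant, exp))))   # small quantization error
```

The appeal is that the hardware can do cheap integer multiply-adds on the mantissas while software manages the exponent per tensor, which is roughly why Intel claims FP32-like results at FP16-like storage cost.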

Software control allows users to dial in the performance for their specific workloads, and since many of the math operations and data movement are known or expected in advance, users can keep data as close to the compute units working on that data as possible while minimizing HBM2 memory accesses and data movements across the die to prevent congestion and optimize power usage.

Intel is currently working with Facebook and hopes to have its deep learning products out early next year. The company may have axed Knights Hill, but it is far from giving up on this extremely lucrative market as it continues to push towards exascale computing and AI. Intel is pushing for a 100x increase in neural network performance by 2020, which is a tall order, but Intel throwing its weight around in this ring should give GPU makers pause: such an achievement could cut heavily into their GPGPU-powered entries into a market that is only just starting to heat up.

You won't be running Crysis or even Minecraft on this thing, but you might be using software on your phone for augmented reality or in your autonomous car that is running inference routines on a neural network that was trained on one of these chips soon enough! It's specialized and niche, but still very interesting.

Also read:

Source: Intel