Subject: Mobile | September 25, 2017 - 10:45 PM | Scott Michaud
Tagged: Imagination Technologies, deep neural network
Imagination Technologies is known to develop interesting, somewhat offbeat hardware, such as GPUs with built-in ray tracers. In this case, the company is jumping into the neural network market with a Power VR-branded accelerator. The PowerVR Series2NX Neural Network Accelerator works on massively parallel, but low-precision tasks. AnandTech says that the chip can even work in multiple bit-depths on different layers in a single network, from 16-bit, down to 12-, 10-, 8-, 7-, 6-, 5-, and 4-bit.
Image Credit: Imagination Technologies via Anandtech
Imagination seems to say that this is variable “to maintain accuracy”. I’m guessing it doesn’t give an actual speed-up to tweak your network in that way, but I honestly don’t know.
As for Imagination Technologies, they intend to have this in mobile devices for, as they suggest, photography and predictive text. They also state the usual suspects: VR/AR, automotive, surveillance, and so forth. They are suggesting that this GPU technology will target Tensorflow Lite.
The PowerVR 2NX Neural Network Accelerator is available for licensing.
Subject: General Tech | May 29, 2017 - 08:46 PM | Scott Michaud
Tagged: machine learning, fluid, deep neural network, deep learning
SIGGRAPH 2017 is still a few months away, but we’re already starting to see demos get published as groups try to get them accepted to various parts of the trade show. In this case, Physics Forests published a two-minute video where they perform fluid simulations without actually simulating fluid dynamics. Instead, they used a deep-learning AI to hallucinate a convincing fluid dynamics result given their inputs.
We’re seeing a lot of research into deep-learning AIs for complex graphics effects lately. The goal of most of these simulations, whether they are for movies or video games, is to create an effect that convinces the viewer that what they see is realistic. The goal is not to create an actually realistic effect. The question then becomes, “Is it easier to actually solve the problem? Or is it easier having an AI learn, based on a pile of data sorted into successes and failures, come up with an answer that looks correct to the viewer?”
In a lot of cases, like global illumination and even possibly anti-aliasing, it might be faster to have an AI trick you. Fluid dynamics is just one example.
Subject: Graphics Cards | December 12, 2016 - 04:05 PM | Jeremy Hellstrom
Tagged: vega 10, Vega, training, radeon, Polaris, machine learning, instinct, inference, Fiji, deep neural network, amd
Ryan was not the only one at AMD's Radeon Instinct briefing, covering their shot across NVIDIA's HPC products. The Tech Report just released their coverage of the event and the tidbits which AMD provided about the MI25, MI8 and MI6; no relation to a certain British governmental department. They focus a bit more on the technologies incorporated into GEMM and point out that AMD's top is not matched by an NVIDIA product, the GP100 GPU does not come as an add-in card. Pop by to see what else they had to say.
"Thus far, Nvidia has enjoyed a dominant position in the burgeoning world of machine learning with its Tesla accelerators and CUDA-powered software platforms. AMD thinks it can fight back with its open-source ROCm HPC platform, the MIOpen software libraries, and Radeon Instinct accelerators. We examine how these new pieces of AMD's machine-learning puzzle fit together."
Here are some more Graphics Card articles from around the web:
- The Complete AMD Radeon Instinct Tech Briefing @ Tech ARP
- Chill With Radeon Software Crimson ReLive Edition @ Techgage
- Radeon Software Crimson ReLive Edition—an overview @ The Tech Report
- AMD Radeon Crimson ReLive Drivers @ techPowerUp
- AMD talk to KitGuru about Crimson ReLive
- We retest Radeon Chill 2 The Tech Report
- MSI RX 480 Gaming X 8G Review @ OCC
- NVIDIA GeForce GTX 1080 PCI-Express Scaling @ techPowerUp
AMD Enters Machine Learning Game with Radeon Instinct Products
NVIDIA has been diving in to the world of machine learning for quite a while, positioning themselves and their GPUs at the forefront on artificial intelligence and neural net development. Though the strategies are still filling out, I have seen products like the DIGITS DevBox place a stake in the ground of neural net training and platforms like Drive PX to perform inference tasks on those neural nets in self-driving cars. Until today AMD has remained mostly quiet on its plans to enter and address this growing and complex market, instead depending on the compute prowess of its latest Polaris and Fiji GPUs to make a general statement on their own.
The new Radeon Instinct brand of accelerators based on current and upcoming GPU architectures will combine with an open-source approach to software and present researchers and implementers with another option for machine learning tasks.
The statistics and requirements that come along with the machine learning evolution in the compute space are mind boggling. More than 2.5 quintillion bytes of data are generated daily and stored on phones, PCs and servers, both on-site and through a cloud infrastructure. That includes 500 million tweets, 4 million hours of YouTube video, 6 billion google searches and 205 billion emails.
Machine intelligence is going to allow software developers to address some of the most important areas of computing for the next decade. Automated cars depend on deep learning to train, medical fields can utilize this compute capability to more accurately and expeditiously diagnose and find cures to cancer, security systems can use neural nets to locate potential and current risk areas before they affect consumers; there are more uses for this kind of network and capability than we can imagine.
Subject: General Tech | October 6, 2016 - 11:37 PM | Tim Verry
Tagged: supercomputer, microsoft, deep neural network, azure, artificial intelligence, ai
Microsoft recently announced it would be restructuring 5,000 employees as it focuses its efforts on artificial intelligence with a new AI and Research Group. The Redmond giant is pulling computer scientists and engineers from Microsoft Research, the Information Platfrom, Bing, and Cortana groups, and the Ambient Computing and Robotics teams. Led by 20 year Microsoft veteran Harry Shum (who has worked in both research and engineering roles at Microsoft), the new AI team promises to "democratize AI" and be a leader in the field with intelligent products and services.
It seems that "democratizing AI" is less about free artificial intelligence and more about making the technology accessible to everyone. The AI and Research Group plans to develop artificial intelligence to the point where it will change how humans interact with their computers (read: Cortana 2.0) with services and commands being conversational rather than strict commands, new applications baked with AI such as office and photo editors that are able to proof read and suggest optimal edits respectively, and new vision, speech, and machine analytics APIs that other developers will be able to harness for their own applications. (Wow that's quite the long sentence - sorry!)
Further, Microsoft wants to build the world's fastest AI supercomputer using its Azure cloud computing service. The Azure-powered AI will be available to everyone for their applications and research needs (for a price, of course!). Microsoft certainly has the money, brain power, and computing power to throw at the problem, and this may be one of the major areas where looking to "the cloud" for a company's computing needs is a smart move as the up front capital needed for hardware, engineers, and support staff to do something like this in-house would be extremely prohibative. It remains to be seen whether Microsoft will win out in the wake of competitors at being the first, but it is certainly staking its claim and does not want to be left out completely.
“Microsoft has been working in artificial intelligence since the beginning of Microsoft Research, and yet we’ve only begun to scratch the surface of what’s possible,” said Shum, executive vice president of the Microsoft AI and Research Group. “Today’s move signifies Microsoft’s commitment to deploying intelligent technology and democratizing AI in a way that changes our lives and the world around us for the better. We will significantly expand our efforts to empower people and organizations to achieve more with our tools, our software and services, and our powerful, global-scale cloud computing capabilities.”
Interestingly, this announcement comes shortly after a previous announcement that industry giants Amazon, Facebook, Google-backed DeepMind, IBM, and Microsoft founded the not-for-profit Partnership On AI organization that will collaborate and research best practices on AI development and exploitation (and hopefully how to teach them not to turn on us heh).
I am looking forward to the future of AI and the technologies it will enable!
Subject: Processors | October 1, 2016 - 06:11 PM | Tim Verry
Tagged: xavier, Volta, tegra, SoC, nvidia, machine learning, gpu, drive px 2, deep neural network, deep learning
Earlier this week at its first GTC Europe event in Amsterdam, NVIDIA CEO Jen-Hsun Huang teased a new SoC code-named Xavier that will be used in self-driving cars and feature the company's newest custom ARM CPU cores and Volta GPU. The new chip will begin sampling at the end of 2017 with product releases using the future Tegra (if they keep that name) processor as soon as 2018.
NVIDIA's Xavier is promised to be the successor to the company's Drive PX 2 system which uses two Tegra X2 SoCs and two discrete Pascal MXM GPUs on a single water cooled platform. These claims are even more impressive when considering that NVIDIA is not only promising to replace the four processors but it will reportedly do that at 20W – less than a tenth of the TDP!
The company has not revealed all the nitty-gritty details, but they did tease out a few bits of information. The new processor will feature 7 billion transistors and will be based on a refined 16nm FinFET process while consuming a mere 20W. It can process two 8k HDR video streams and can hit 20 TOPS (NVIDIA's own rating for deep learning int(8) operations).
Specifically, NVIDIA claims that the Xavier SoC will use eight custom ARMv8 (64-bit) CPU cores (it is unclear whether these cores will be a refined Denver architecture or something else) and a GPU based on its upcoming Volta architecture with 512 CUDA cores. Also, in an interesting twist, NVIDIA is including a "Computer Vision Accelerator" on the SoC as well though the company did not go into many details. This bit of silicon may explain how the ~300mm2 die with 7 billion transistors is able to match the 7.2 billion transistor Pascal-based Telsa P4 (2560 CUDA cores) graphics card at deep learning (tera-operations per second) tasks. Of course in addition to the incremental improvements by moving to Volta and a new ARMv8 CPU architectures on a refined 16nm FF+ process.
|Drive PX||Drive PX 2||NVIDIA Xavier||Tesla P4|
|CPU||2 x Tegra X1 (8 x A57 total)||2 x Tegra X2 (8 x A57 + 4 x Denver total)||1 x Xavier SoC (8 x Custom ARM + 1 x CVA)||N/A|
|GPU||2 x Tegra X1 (Maxwell) (512 CUDA cores total||2 x Tegra X2 GPUs + 2 x Pascal GPUs||1 x Xavier SoC GPU (Volta) (512 CUDA Cores)||2560 CUDA Cores (Pascal)|
|TFLOPS||2.3 TFLOPS||8 TFLOPS||?||5.5 TFLOPS|
|DL TOPS||?||24 TOPS||20 TOPS||22 TOPS|
|TDP||~30W (2 x 15W)||250W||20W||up to 75W|
|Process Tech||20nm||16nm FinFET||16nm FinFET+||16nm FinFET|
|Transistors||?||?||7 billion||7.2 billion|
For comparison, the currently available Tesla P4 based on its Pascal architecture has a TDP of up to 75W and is rated at 22 TOPs. This would suggest that Volta is a much more efficient architecture (at least for deep learning and half precision)! I am not sure how NVIDIA is able to match its GP104 with only 512 Volta CUDA cores though their definition of a "core" could have changed and/or the CVA processor may be responsible for closing that gap. Unfortunately, NVIDIA did not disclose what it rates the Xavier at in TFLOPS so it is difficult to compare and it may not match GP104 at higher precision workloads. It could be wholly optimized for int(8) operations rather than floating point performance. Beyond that I will let Scott dive into those particulars once we have more information!
Xavier is more of a teaser than anything and the chip could very well change dramatically and/or not hit the claimed performance targets. Still, it sounds promising and it is always nice to speculate over road maps. It is an intriguing chip and I am ready for more details, especially on the Volta GPU and just what exactly that Computer Vision Accelerator is (and will it be easy to program for?). I am a big fan of the "self-driving car" and I hope that it succeeds. It certainly looks to continue as Tesla, VW, BMW, and other automakers continue to push the envelope of what is possible and plan future cars that will include smart driving assists and even cars that can drive themselves. The more local computing power we can throw at automobiles the better and while massive datacenters can be used to train the neural networks, local hardware to run and make decisions are necessary (you don't want internet latency contributing to the decision of whether to brake or not!).
I hope that NVIDIA's self-proclaimed "AI Supercomputer" turns out to be at least close to the performance they claim! Stay tuned for more information as it gets closer to launch (hopefully more details will emerge at GTC 2017 in the US).
What are your thoughts on Xavier and the whole self-driving car future?
- NVIDIA Teases Xavier, a High-Performance ARM SoC for Drive PX & AI @ AnandTech
- Tegra Related News @ PC Perspective
- Tesla P4 Specifications @ NVIDIA
- CES 2016: NVIDIA Launches DRIVE PX 2 With Dual Pascal GPUs Driving A Deep Neural Network @ PC Perspective
Subject: General Tech | February 4, 2016 - 01:18 PM | Tim Verry
Tagged: open source, microsoft, machine learning, deep neural network, deep learning, cntk, azure
Microsoft has been using deep neural networks for awhile now to power its speech recognition technologies bundled into Windows and Skype to identify and follow commands and to translate speech respectively. This technology is part of Microsoft's Computational Network Toolkit. Last April, the company made this toolkit available to academic researchers on Codeplex, and it is now opening it up even more by moving the project to GitHub and placing it under an open source license.
Lead by chief speech and computer scientist Xuedong Huang, a team of Microsoft researchers built the Computational Network Toolkit (CNTK) to power all their speech related projects. The CNTK is a deep neural network for machine learning that is built to be fast and scalable across multiple systems, and more importantly, multiple GPUs which excel at these kinds of parallel processing workloads and algorithms. Microsoft heavily focused on scalability with CNTK and according to the company's own benchmarks (which is to say to be taken with a healthy dose of salt) while the major competing neural network tool kits offer similar performance running on a single GPU, when adding more than one graphics card CNTK is vastly more efficient with almost four times the performance of Google's TensorFlow and a bit more than 1.5-times Torch 7 and Caffe. Where CNTK gets a bit deep learning crazy is its ability to scale beyond a single system and easily tap into Microsoft's Azure GPU Lab to get access to numerous GPUs from their remote datacenters -- though its not free you don't need to purchase, store, and power the hardware locally and can ramp the number up and down based on how much GPU muscle you need. The example Microsoft provided showed two similarly spec'd Linux systems with four GPUs each running on Azure cloud hosting getting close to twice the performance of the 4 GPU system (75% increase). Microsoft claims that "CNTK can easily scale beyond 8 GPUs across multiple machines with superior distributed system performance."
Using GPU-based Azure machines, Microsoft was able to increase the performance of Cortana's speech recognition by 10-times compared to the local systems they were previously using.
It is always cool to see GPU compute in practice and now that CNTK is available to everyone, I expect to see a lot of new uses for the toolkit beyond speech recognition. Moving to an open source license is certainly good PR, but I think it was actually done more for Microsoft's own benefit rather than users which isn't necessarily a bad thing since both get to benefit from it. I am really interested to see what researchers are able to do with a deep neural network that reportedly offers so much performance thanks to GPUs. I'm curious what new kinds of machine learning opportunities the extra speed will enable.
If you are interested, you can check out CNTK on GitHub!
Subject: General Tech | January 5, 2016 - 01:17 AM | Tim Verry
Tagged: tegra, pascal, nvidia, driveworks, drive px 2, deep neural network, deep learning, autonomous car
NVIDIA is using the Consumer Electronics Show to launch the Drive PX 2 which is the latest bit of hardware aimed at autonomous vehicles. Several NVIDIA products combine to create the company's self-driving "end to end solution" including DIGITS, DriveWorks, and the Drive PX 2 hardware to train, optimize, and run the neural network software that will allegedly be the brains of future self-driving cars (or so NVIDIA hopes).
The Drive PX 2 hardware is the successor to the Tegra-powered Drive PX released last year. The Drive PX 2 represents a major computational power jump with 12 CPU cores and two discrete "Pascal"-based GPUs! NVIDIA has not revealed the full specifications yet, but they have made certain details available. There are two Tegra SoCs along with two GPUs that are liquid cooled. The liquid cooling consists of a large metal block with copper tubing winding through it and then passing into what looks to be external connectors that attach to a completed cooling loop (an exterior radiator, pump, and reservoir).
There are a total of 12 CPU cores including eight ARM Cortex A57 cores and four "Denver" cores. The discrete graphics are based on the 16nm FinFET process and will use the company's upcoming Pascal architecture. The total package will draw a maximum of 250 watts and will offer up to 8 TFLOPS of computational horsepower and 24 trillion "deep learning operations per second." That last number relates to the number of special deep learning instructions the hardware can process per second which, if anything, sounds like an impressive amount of power when it comes to making connections and analyzing data to try to classify it. Drive PX 2 is, according to NVIDIA, 10 times faster than it's predecessor at running these specialized instructions and has nearly 4 times the computational horsepower when it comes to TLOPS.
Similar to the original Drive PX, the driving AI platform can accept and process the inputs of up to 12 video cameras. It can also handle LiDAR, RADAR, and ultrasonic sensors. NVIDIA compared the Drive PX 2 to the TITAN X in its ability to process 2,800 images per second versus the consumer graphics card's 450 AlexNet images which while possibly not the best comparison does make it look promising.
Neural networks and machine learning are at the core of what makes autonomous vehicles possible along with hardware powerful enough to take in a multitude of sensor data and process it fast enough. The software side of things includes the DriveWorks development kit which includes specialized instructions and a neural network that can detect objects based on sensor input(s), identify and classify them, determine the positions of objects relative to the vehicle, and calculate the most efficient path to the destination.
Specifically, in the press release NVIDIA stated:
"This complex work is facilitated by NVIDIA DriveWorks™, a suite of software tools, libraries and modules that accelerates development and testing of autonomous vehicles. DriveWorks enables sensor calibration, acquisition of surround data, synchronization, recording and then processing streams of sensor data through a complex pipeline of algorithms running on all of the DRIVE PX 2's specialized and general-purpose processors. Software modules are included for every aspect of the autonomous driving pipeline, from object detection, classification and segmentation to map localization and path planning."
DIGITS is the platform used to train the neural network that is then used by the Drive PX 2 hardware. The software is purportedly improving in both accuracy and training time with NVIDIA achieving a 96% accuracy rating at identifying traffic signs based on the traffic sign database from Ruhr University Bochum after a training session lasting only 4 hours as opposed to training times of days or even weeks.
NVIDIA claims that the initial Drive PX has been picked up by over 50 development teams (automakers, universities, software developers, et al) interested in autonomous vehicles. Early access to development hardware is expected to be towards the middle of the year with general availability of final hardware in Q4 2016.
The new Drive PX 2 is getting a serious hardware boost with the inclusion of two dedicated graphics processors (the Drive PX was based around two Tegra X1 SoCs), and that should allow automakers to really push what's possible in real time and push the self-driving car a bit closer to reality and final (self) drive-able products. I'm excited to see that vision come to fruition and am looking forward to seeing what this improved hardware will enable in the auto industry!
Follow all of our coverage of the show at http://pcper.com/ces!
Subject: General Tech | November 12, 2015 - 02:46 AM | Tim Verry
Tagged: Tegra X1, nvidia, maxwell, machine learning, jetson, deep neural network, CUDA, computer vision
Nearly two years ago, NVIDIA unleashed the Jetson TK1, a tiny module for embedded systems based around the company's Tegra K1 "super chip." That chip was the company's first foray into CUDA-powered embedded systems capable of machine learning including object recognition, 3D scene processing, and enabling things like accident avoidance and self-parking cars.
Now, NVIDIA is releasing even more powerful kit called the Jetson TX1. This new development platform covers two pieces of hardware: the credit card sized Jetson TX1 module and a larger Jetson TX1 Development Kit that the module plugs into and provides plenty of I/O options and pin outs. The dev kit can be used by software developers or for prototyping while the module alone can be used with finalized embedded products.
NVIDIA foresees the Jetson TX1 being used in drones, autonomous vehicles, security systems, medical devices, and IoT devices coupled with deep neural networks, machine learning, and computer vision software. Devices would be able to learn from the environment in order to navigate safely, identify and classify objects of interest, and perform 3D mapping and scene modeling. NVIDIA partnered with several companies for proof-of-concepts including Kespry and Stereolabs.
Using the TX1, Kespry was able to use drones to classify and track in real time construction equipment moving around a construction site (in which the drone was not necessarily programmed for exactly as sites and weather conditions vary, the machine learning/computer vision was used to allow the drone to navigate the construction site and a deep neural network was used to identify and classify the type of equipment it saw using its cameras. Meanwhile Stereolabs used high resolution cameras and depth sensors to capture photos of buildings and then used software to reconstruct the 3D scene virtually for editing and modeling. You can find other proof-of-concept videos, including upgrading existing drones to be more autonomous posted here.
From the press release:
"Jetson TX1 will enable a new generation of incredibly capable autonomous devices," said Deepu Talla, vice president and general manager of the Tegra business at NVIDIA. "They will navigate on their own, recognize objects and faces, and become increasingly intelligent through machine learning. It will enable developers to create industry-changing products."
But what about the hardware side of things? Well, the TX1 is a respectable leap in hardware and compute performance. Sitting at 1 Teraflops of rated (FP16) compute performance, the TX1 pairs four ARM Cortex A57 and four ARM Cortex A53 64-bit CPU cores with a 256-core Maxwell-based GPU. Definitely respectable for its size and low power consumption, especially considering NVIDIA claims the SoC can best the Intel Skylake Core i7-6700K in certain workloads (thanks to the GPU portion). The module further contains 4GB of LPDDR4 memory and 16GB of eMMC flash storage.
In short, while on module storage has not increased, RAM has been doubled and compute performance has tripled for FP16 compute performance and jumped by approximately 40% for FP32 versus the Jetson TK1's 2GB of DDR3 and 192-core Kepler GPU. The TX1 also uses a smaller process node at 20nm (versus 28nm) and the chip is said to use "very little power." Networking support includes 802.11ac and Gigabit Ethernet. The chart below outlines the major differences between the two platforms.
|Jetson TX1||Jetson TK1|
|GPU (Architecture)||256-core (Maxwell)||192-core (Kepler)|
|CPU||4 x ARM Cortex A57 + 4 x A53||"4+1" ARM Cortex A15 "r3"|
|RAM||4 GB LPDDR4||2 GB LPDDR3|
|eMMC||16 GB||16 GB|
|Compute Performance (FP16)||1 TFLOP||326 GFLOPS|
|Compute Performance (FP32) - via AnandTech||512 GFLOPS (AT's estimation)||326 GFLOPS (NVIDIA's number)|
The TX1 will run the Linux For Tegra operating system and supports the usual suspects of CUDA 7.0, cuDNN, and VisionWorks development software as well as the latest OpenGL drivers (OpenGL 4.5, OpenGL ES 3.1, and Vulkan).
NVIDIA is continuing to push for CUDA Everywhere, and the Jetson TX1 looks to be a more mature product that builds on the TK1. The huge leap in compute performance should enable even more interesting projects and bring more sophisticated automation and machine learning to smaller and more intelligent devices.
For those interested, the Jetson TX1 Development Kit (the full I/O development board with bundled module) will be available for pre-order today at $599 while the TX1 module itself will be available soon for approximately $299 each in orders of 1,000 or more (like Intel's tray pricing).
With CUDA 7, it is apparently possible for the GPU to be used for general purpose processing as well which may open up some doors that where not possible before in such a small device. I am interested to see what happens with NVIDIA's embedded device play and what kinds of automated hardware is powered by the tiny SoC and its beefy graphics.