Addressing New Markets
Machine Learning is one of the hot topics in technology, and certainly one that is growing at a very fast rate. Applications such as facial recognition and self-driving cars are powering much of the development going on in this area. So far we have seen CPUs and GPUs being used in ML applications, but in most cases these are not the most efficient ways of doing these highly parallel but relatively computationally simple workloads. New chips have been introduced that are far more focused on machine learning, and now it seems that ARM is throwing their hat into the ring.
ARM is introducing three products under the Project Trillium brand. The lineup features an ML processor, an OD (Object Detection) processor, and an ARM-developed neural network software stack. This project came as a surprise for most of us, but in hindsight it is a logical avenue for them to address as it will be incredibly important moving forward. Currently many applications that require machine learning are not processed at the edge, namely in the consumer’s hand or on a device right next to them. Workloads may be requested from the edge, but most of the heavy duty processing occurs in datacenters located all around the world. This requires communication, and sometimes pretty hefty levels of bandwidth. If either of those is missing, applications requiring ML break down.
Subject: General Tech | November 7, 2017 - 01:35 PM | Jeremy Hellstrom
Tagged: machine learning, ai
Not to be outdone by the research conducted by Japan's Kyushu University, which led to the "frog is not truck" portion of last week's podcast, MIT researchers have also been tormenting image recognition software. Their findings were a little more worrisome, as a 3D printed turtle was identified as a rifle, which could lead to some very bad situations in airports or other secure locations. In this case, instead of adding a few pixels to the image, they introduced different angles and lighting conditions which created enough noise to completely fool Google's image recognition AI, Inception. The printed turtle was misidentified because of the texture which they chose, showing that this issue extends beyond photos to include physical objects. Pop by The Register for more details as well as an ingredient you never want to see on your toast.
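To get a feel for why varying angles and lighting matters, here is a toy sketch of the general idea: nudge the input against a gradient averaged over several viewing conditions, so the wrong label sticks under all of them. Everything in it is an assumption made for illustration: the four-value "image", the hand-written linear scorer, the weights, the brightness factors, and the step size are hypothetical, and none of this is Inception or the MIT team's actual method.

```python
# Toy illustration only: a hand-written linear "classifier", not Inception.
WEIGHTS = [1.0, -2.0, 0.5, 1.5]   # hypothetical model weights
BIAS = 0.2                        # hypothetical bias term
LIGHTING = [0.5, 1.0, 1.5]        # hypothetical brightness factors


def score(pixels, brightness):
    """Positive score means 'turtle', negative means 'rifle'."""
    return sum(w * brightness * p for w, p in zip(WEIGHTS, pixels)) + BIAS


def label(pixels, brightness):
    return "turtle" if score(pixels, brightness) > 0 else "rifle"


def mean_gradient(lighting):
    """Gradient of the score per pixel, averaged over the lighting conditions."""
    mean_brightness = sum(lighting) / len(lighting)
    return [mean_brightness * w for w in WEIGHTS]


def sign(value):
    return (value > 0) - (value < 0)


def perturb(pixels, step):
    """Move every pixel one signed step against the averaged gradient."""
    grad = mean_gradient(LIGHTING)
    return [p - step * sign(g) for p, g in zip(pixels, grad)]


original = [0.6, 0.1, 0.4, 0.5]
adversarial = perturb(original, step=0.4)

labels_before = [label(original, b) for b in LIGHTING]
labels_after = [label(adversarial, b) for b in LIGHTING]
print(labels_before)
print(labels_after)
```

The step size here is far larger than anything you would get away with on a real photo; the only point is that a single perturbation is chosen against all the lighting conditions at once rather than against one fixed image.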
"Students at MIT in the US claim they have developed an algorithm for creating 3D objects and pictures that trick image-recognition systems into severely misidentifying them. Think toy turtles labeled rifles, and baseballs as cups of coffee."
Here is some more Tech News from around the web:
- No, Samsung, you really do owe Apple $120m for patent infringement @ The Register
- Almost Everything on Computers Is Perceptually Slower Than It Was in 1983 @ [H]ard|OCP
- Get Watch Dogs FREE From Ubisoft This Week! @ TechARP
- Fat-fingered Level 3 techie reduces internet to level zero: Glitch knocks out connections @ The Register
- Kaspersky warns of increased DDoS attacks against gaming companies @ The Inquirer
- Android security update fixes KRACK, slaps Band-Aid on Pixel 2 XL screen @ Ars Technica
- Seldom used 'i' mangled by baffling autocorrect bug in Apple's iOS 11 @ The Register
- Microsoft releases strict standards for 'highly secure' Windows 10 devices @ The Inquirer
- MINIX: Intel's Hidden In-chip Operating System @ Slashdot
Subject: General Tech | June 28, 2017 - 11:17 PM | Scott Michaud
Tagged: Unity, machine learning, deep learning
Unity, who makes the popular 3D game engine of the same name, has announced a research fellowship for integrating machine learning into game development. Two students, who must have been enrolled in a Masters or a PhD program on June 26th, will be selected and provided with $30,000 for a 6-month fellowship. The deadline is midnight (PDT) on September 9th.
We’re beginning to see a lot of machine-learning applications being discussed for gaming. There are some cases, like global illumination and fluid simulations, where it could be faster for a deep-learning algorithm to hallucinate a convincing result than for a physical solver to produce a correct one. In this case, it makes sense to post-process each frame, so, naturally, game engine developers are paying attention.
If eligible, you can apply on their website.
Subject: General Tech | May 29, 2017 - 08:46 PM | Scott Michaud
Tagged: machine learning, fluid, deep neural network, deep learning
SIGGRAPH 2017 is still a few months away, but we’re already starting to see demos get published as groups try to get them accepted to various parts of the trade show. In this case, Physics Forests published a two-minute video where they perform fluid simulations without actually simulating fluid dynamics. Instead, they used a deep-learning AI to hallucinate a convincing fluid dynamics result given their inputs.
We’re seeing a lot of research into deep-learning AIs for complex graphics effects lately. The goal of most of these simulations, whether they are for movies or video games, is to create an effect that convinces the viewer that what they see is realistic. The goal is not to create an actually realistic effect. The question then becomes, “Is it easier to actually solve the problem? Or is it easier to have an AI, trained on a pile of data sorted into successes and failures, come up with an answer that looks correct to the viewer?”
In a lot of cases, like global illumination and even possibly anti-aliasing, it might be faster to have an AI trick you. Fluid dynamics is just one example.
Subject: General Tech, Processors | March 12, 2017 - 05:11 PM | Tim Verry
Tagged: pascal, nvidia, machine learning, iot, Denver, Cortex A57, ai
Measuring 50mm x 87mm, the Jetson TX2 packs quite a bit of processing power and I/O including an SoC with two 64-bit Denver 2 cores with 2MB L2, four ARM Cortex A57 cores with 2MB L2, and a 256-core GPU based on NVIDIA’s Pascal architecture. The TX2 compute module also hosts 8 GB of LPDDR4 (58.3 GB/s) and 32 GB of eMMC storage (SDIO and SATA are also supported). As far as I/O, the Jetson TX2 uses a 400-pin connector to connect the compute module to the development board or final product and the final I/O available to users will depend on the product it is used in. The compute module supports up to the following though:
- 2 x DSI
- 2 x DP 1.2 / HDMI 2.0 / eDP 1.4
- USB 3.0
- USB 2.0
- 12 x CSI lanes for up to 6 cameras (2.5 Gb/second/lane)
- PCI-E 2.0: one x4 + one x1, or two x1 + one x2
- Gigabit Ethernet
The Jetson TX2 runs the “Linux for Tegra” operating system. According to NVIDIA, the Jetson TX2 can deliver up to twice the performance of the TX1, or match the TX1's performance at up to twice the efficiency while drawing 7.5 watts.
The extra horsepower afforded by the faster CPU, updated GPU, and increased memory and memory bandwidth will reportedly enable smart end user devices with faster facial recognition, more accurate speech recognition, and smarter AI and machine learning tasks (e.g. personal assistant, smart street cameras, smarter home automation, et al). Bringing more power locally to these types of internet of things devices is a good thing as less reliance on the cloud potentially means more privacy (unfortunately there is not as much incentive for companies to make this type of product for the mass market but you could use the TX2 to build your own).
Cisco will reportedly use the Jetson TX2 to add facial and speech recognition to its Cisco Spark devices. In addition to the hardware, NVIDIA offers SDKs and tools as part of JetPack 3.0. The JetPack 3.0 toolkit includes TensorRT, cuDNN 5.1, VisionWorks 1.6, CUDA 8, and support and drivers for OpenGL 4.5, OpenGL ES 3.2, EGL 1.4, and Vulkan 1.0.
The TX2 will enable better, stronger, and faster (well I don't know about stronger heh) industrial control systems, robotics, home automation, embedded computers and kiosks, smart signage, security systems, and other connected IoT devices (that, for the love of all processing, are hardened and secured so they aren't used as part of a botnet!).
Interested developers and makers can pre-order the Jetson TX2 Development Kit for $599 with a ship date for US and Europe of March 14 and other regions “in the coming weeks.” If you just want the compute module sans development board, it will be available later this quarter for $399 (in quantities of 1,000 or more). The previous generation Jetson TX1 Development Kit has also received a slight price cut to $499.
Subject: Graphics Cards | December 12, 2016 - 04:05 PM | Jeremy Hellstrom
Tagged: vega 10, Vega, training, radeon, Polaris, machine learning, instinct, inference, Fiji, deep neural network, amd
Ryan was not the only one at AMD's Radeon Instinct briefing, covering their shot across NVIDIA's HPC products. The Tech Report just released their coverage of the event and the tidbits which AMD provided about the MI25, MI8 and MI6; no relation to a certain British governmental department. They focus a bit more on the technologies incorporated into GEMM and point out that AMD's top card is not matched by an NVIDIA product, as the GP100 GPU does not come as an add-in card. Pop by to see what else they had to say.
"Thus far, Nvidia has enjoyed a dominant position in the burgeoning world of machine learning with its Tesla accelerators and CUDA-powered software platforms. AMD thinks it can fight back with its open-source ROCm HPC platform, the MIOpen software libraries, and Radeon Instinct accelerators. We examine how these new pieces of AMD's machine-learning puzzle fit together."
Here are some more Graphics Card articles from around the web:
- The Complete AMD Radeon Instinct Tech Briefing @ Tech ARP
- Chill With Radeon Software Crimson ReLive Edition @ Techgage
- Radeon Software Crimson ReLive Edition—an overview @ The Tech Report
- AMD Radeon Crimson ReLive Drivers @ techPowerUp
- AMD talk to KitGuru about Crimson ReLive
- We retest Radeon Chill @ The Tech Report
- MSI RX 480 Gaming X 8G Review @ OCC
- NVIDIA GeForce GTX 1080 PCI-Express Scaling @ techPowerUp
AMD Enters Machine Learning Game with Radeon Instinct Products
NVIDIA has been diving into the world of machine learning for quite a while, positioning themselves and their GPUs at the forefront of artificial intelligence and neural net development. Though the strategies are still filling out, I have seen products like the DIGITS DevBox place a stake in the ground of neural net training and platforms like Drive PX perform inference tasks on those neural nets in self-driving cars. Until today AMD has remained mostly quiet on its plans to enter and address this growing and complex market, instead depending on the compute prowess of its latest Polaris and Fiji GPUs to make a general statement on their own.
The new Radeon Instinct brand of accelerators based on current and upcoming GPU architectures will combine with an open-source approach to software and present researchers and implementers with another option for machine learning tasks.
The statistics and requirements that come along with the machine learning evolution in the compute space are mind boggling. More than 2.5 quintillion bytes of data are generated daily and stored on phones, PCs and servers, both on-site and through a cloud infrastructure. That includes 500 million tweets, 4 million hours of YouTube video, 6 billion Google searches and 205 billion emails.
Machine intelligence is going to allow software developers to address some of the most important areas of computing for the next decade. Automated cars depend on deep learning to train, medical fields can utilize this compute capability to more accurately and expeditiously diagnose and find cures for cancer, and security systems can use neural nets to locate potential and current risk areas before they affect consumers; there are more uses for this kind of network and capability than we can imagine.
Subject: Processors | October 1, 2016 - 06:11 PM | Tim Verry
Tagged: xavier, Volta, tegra, SoC, nvidia, machine learning, gpu, drive px 2, deep neural network, deep learning
Earlier this week at its first GTC Europe event in Amsterdam, NVIDIA CEO Jen-Hsun Huang teased a new SoC code-named Xavier that will be used in self-driving cars and feature the company's newest custom ARM CPU cores and Volta GPU. The new chip will begin sampling at the end of 2017 with product releases using the future Tegra (if they keep that name) processor as soon as 2018.
NVIDIA's Xavier is promised to be the successor to the company's Drive PX 2 system which uses two Tegra X2 SoCs and two discrete Pascal MXM GPUs on a single water cooled platform. These claims are even more impressive when considering that NVIDIA is not only promising to replace the four processors but it will reportedly do that at 20W – less than a tenth of the TDP!
The company has not revealed all the nitty-gritty details, but they did tease out a few bits of information. The new processor will feature 7 billion transistors and will be based on a refined 16nm FinFET process while consuming a mere 20W. It can process two 8k HDR video streams and can hit 20 TOPS (NVIDIA's own rating for deep learning int(8) operations).
Specifically, NVIDIA claims that the Xavier SoC will use eight custom ARMv8 (64-bit) CPU cores (it is unclear whether these cores will be a refined Denver architecture or something else) and a GPU based on its upcoming Volta architecture with 512 CUDA cores. Also, in an interesting twist, NVIDIA is including a "Computer Vision Accelerator" on the SoC as well, though the company did not go into many details. This bit of silicon may explain how the ~300 mm² die with 7 billion transistors is able to match the 7.2 billion transistor Pascal-based Tesla P4 (2560 CUDA cores) graphics card at deep learning (tera-operations per second) tasks. That is, of course, in addition to the incremental improvements from moving to Volta and a new ARMv8 CPU architecture on a refined 16nm FF+ process.
| | Drive PX | Drive PX 2 | NVIDIA Xavier | Tesla P4 |
|---|---|---|---|---|
| CPU | 2 x Tegra X1 (8 x A57 total) | 2 x Tegra X2 (8 x A57 + 4 x Denver total) | 1 x Xavier SoC (8 x Custom ARM + 1 x CVA) | N/A |
| GPU | 2 x Tegra X1 (Maxwell) (512 CUDA cores total) | 2 x Tegra X2 GPUs + 2 x Pascal GPUs | 1 x Xavier SoC GPU (Volta) (512 CUDA cores) | 2560 CUDA cores (Pascal) |
| TFLOPS | 2.3 TFLOPS | 8 TFLOPS | ? | 5.5 TFLOPS |
| DL TOPS | ? | 24 TOPS | 20 TOPS | 22 TOPS |
| TDP | ~30W (2 x 15W) | 250W | 20W | up to 75W |
| Process Tech | 20nm | 16nm FinFET | 16nm FinFET+ | 16nm FinFET |
| Transistors | ? | ? | 7 billion | 7.2 billion |
For comparison, the currently available Tesla P4 based on its Pascal architecture has a TDP of up to 75W and is rated at 22 TOPS. This would suggest that Volta is a much more efficient architecture (at least for deep learning and half precision)! I am not sure how NVIDIA is able to match its GP104 with only 512 Volta CUDA cores, though their definition of a "core" could have changed and/or the CVA processor may be responsible for closing that gap. Unfortunately, NVIDIA did not disclose what it rates the Xavier at in TFLOPS, so it is difficult to compare and it may not match GP104 at higher precision workloads. It could be wholly optimized for int(8) operations rather than floating point performance. Beyond that I will let Scott dive into those particulars once we have more information!
Xavier is more of a teaser than anything and the chip could very well change dramatically and/or not hit the claimed performance targets. Still, it sounds promising and it is always nice to speculate over road maps. It is an intriguing chip and I am ready for more details, especially on the Volta GPU and just what exactly that Computer Vision Accelerator is (and will it be easy to program for?). I am a big fan of the "self-driving car" and I hope that it succeeds. It certainly looks to continue as Tesla, VW, BMW, and other automakers continue to push the envelope of what is possible and plan future cars that will include smart driving assists and even cars that can drive themselves. The more local computing power we can throw at automobiles the better, and while massive datacenters can be used to train the neural networks, local hardware to run them and make decisions is necessary (you don't want internet latency contributing to the decision of whether to brake or not!).
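For anyone who wants to put numbers on that efficiency gap, here is a quick back-of-the-envelope sketch using only the DL TOPS and TDP figures from the comparison table above. Keep in mind these are NVIDIA's claimed ratings (and the Tesla P4's TDP is an "up to" figure), so treat the result as a rough ratio rather than a measurement. The dictionary layout and function name are just my own choices.

```python
# Figures copied from the comparison table; rows with "?" entries are omitted.
chips = {
    "Drive PX 2": {"tops": 24, "tdp_watts": 250},
    "NVIDIA Xavier": {"tops": 20, "tdp_watts": 20},
    "Tesla P4": {"tops": 22, "tdp_watts": 75},  # TDP is "up to 75W"
}


def tops_per_watt(spec):
    """Claimed deep learning throughput divided by rated TDP."""
    return spec["tops"] / spec["tdp_watts"]


efficiency = {name: tops_per_watt(spec) for name, spec in chips.items()}

# Print from most to least efficient.
for name, value in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f} TOPS/W")

xavier_vs_p4 = efficiency["NVIDIA Xavier"] / efficiency["Tesla P4"]
tdp_fraction = chips["NVIDIA Xavier"]["tdp_watts"] / chips["Drive PX 2"]["tdp_watts"]
print(f"Xavier vs Tesla P4: {xavier_vs_p4:.2f}x the TOPS per watt")
print(f"Xavier TDP as a fraction of Drive PX 2: {tdp_fraction:.0%}")
```

On paper that puts Xavier at 1 TOPS per watt, a bit over three times the Tesla P4's rating, while drawing 8% of the Drive PX 2's TDP.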
I hope that NVIDIA's self-proclaimed "AI Supercomputer" turns out to be at least close to the performance they claim! Stay tuned for more information as it gets closer to launch (hopefully more details will emerge at GTC 2017 in the US).
What are your thoughts on Xavier and the whole self-driving car future?
- NVIDIA Teases Xavier, a High-Performance ARM SoC for Drive PX & AI @ AnandTech
- Tegra Related News @ PC Perspective
- Tesla P4 Specifications @ NVIDIA
- CES 2016: NVIDIA Launches DRIVE PX 2 With Dual Pascal GPUs Driving A Deep Neural Network @ PC Perspective
Subject: General Tech | February 4, 2016 - 01:18 PM | Tim Verry
Tagged: open source, microsoft, machine learning, deep neural network, deep learning, cntk, azure
Microsoft has been using deep neural networks for a while now to power its speech recognition technologies bundled into Windows and Skype to identify and follow commands and to translate speech respectively. This technology is part of Microsoft's Computational Network Toolkit. Last April, the company made this toolkit available to academic researchers on Codeplex, and it is now opening it up even more by moving the project to GitHub and placing it under an open source license.
Led by chief speech and computer scientist Xuedong Huang, a team of Microsoft researchers built the Computational Network Toolkit (CNTK) to power all their speech related projects. CNTK is a deep neural network toolkit for machine learning that is built to be fast and scalable across multiple systems and, more importantly, multiple GPUs, which excel at these kinds of parallel processing workloads and algorithms. Microsoft heavily focused on scalability with CNTK. According to the company's own benchmarks (which is to say, to be taken with a healthy dose of salt), the major competing neural network toolkits offer similar performance running on a single GPU, but when adding more than one graphics card CNTK is vastly more efficient, with almost four times the performance of Google's TensorFlow and a bit more than 1.5 times that of Torch 7 and Caffe. Where CNTK gets a bit deep learning crazy is its ability to scale beyond a single system and easily tap into Microsoft's Azure GPU Lab to get access to numerous GPUs from their remote datacenters. Though it's not free, you don't need to purchase, store, and power the hardware locally, and you can ramp the number up and down based on how much GPU muscle you need. The example Microsoft provided showed two similarly spec'd Linux systems with four GPUs each running on Azure cloud hosting getting close to twice the performance of the 4 GPU system (75% increase). Microsoft claims that "CNTK can easily scale beyond 8 GPUs across multiple machines with superior distributed system performance."
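To put that "75% increase" in perspective, a common way to judge multi-GPU scaling is to divide the measured speedup by the increase in GPU count. The little sketch below does just that with the two numbers quoted above; the function name is my own, and it assumes the 75% figure is measured against the single four-GPU system.

```python
def scaling_efficiency(speedup, gpu_ratio):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / gpu_ratio


baseline_gpus = 4   # one system with four GPUs
scaled_gpus = 8     # two such systems on Azure
speedup = 1.75      # the "75% increase" from Microsoft's example

efficiency = scaling_efficiency(speedup, scaled_gpus / baseline_gpus)
print(f"{efficiency:.1%} of ideal linear scaling")
```

That works out to 0.875, or 87.5% of perfect linear scaling when doubling the GPU count across machines, going by Microsoft's own numbers.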
Using GPU-based Azure machines, Microsoft was able to increase the performance of Cortana's speech recognition by 10-times compared to the local systems they were previously using.
It is always cool to see GPU compute in practice and now that CNTK is available to everyone, I expect to see a lot of new uses for the toolkit beyond speech recognition. Moving to an open source license is certainly good PR, but I think it was actually done more for Microsoft's own benefit rather than users which isn't necessarily a bad thing since both get to benefit from it. I am really interested to see what researchers are able to do with a deep neural network that reportedly offers so much performance thanks to GPUs. I'm curious what new kinds of machine learning opportunities the extra speed will enable.
If you are interested, you can check out CNTK on GitHub!
Subject: General Tech | November 12, 2015 - 02:46 AM | Tim Verry
Tagged: Tegra X1, nvidia, maxwell, machine learning, jetson, deep neural network, CUDA, computer vision
Nearly two years ago, NVIDIA unleashed the Jetson TK1, a tiny module for embedded systems based around the company's Tegra K1 "super chip." That chip was the company's first foray into CUDA-powered embedded systems capable of machine learning including object recognition, 3D scene processing, and enabling things like accident avoidance and self-parking cars.
Now, NVIDIA is releasing an even more powerful kit called the Jetson TX1. This new development platform covers two pieces of hardware: the credit card sized Jetson TX1 module and a larger Jetson TX1 Development Kit that the module plugs into and that provides plenty of I/O options and pin outs. The dev kit can be used by software developers or for prototyping while the module alone can be used with finalized embedded products.
NVIDIA foresees the Jetson TX1 being used in drones, autonomous vehicles, security systems, medical devices, and IoT devices coupled with deep neural networks, machine learning, and computer vision software. Devices would be able to learn from the environment in order to navigate safely, identify and classify objects of interest, and perform 3D mapping and scene modeling. NVIDIA partnered with several companies for proof-of-concepts including Kespry and Stereolabs.
Using the TX1, Kespry was able to use drones to classify and track construction equipment moving around a construction site in real time. Because sites and weather conditions vary, the drone was not necessarily programmed for each exact scenario; instead, machine learning and computer vision allowed the drone to navigate the construction site, and a deep neural network was used to identify and classify the type of equipment it saw using its cameras. Meanwhile, Stereolabs used high resolution cameras and depth sensors to capture photos of buildings and then used software to reconstruct the 3D scene virtually for editing and modeling. You can find other proof-of-concept videos, including upgrading existing drones to be more autonomous, posted here.
From the press release:
"Jetson TX1 will enable a new generation of incredibly capable autonomous devices," said Deepu Talla, vice president and general manager of the Tegra business at NVIDIA. "They will navigate on their own, recognize objects and faces, and become increasingly intelligent through machine learning. It will enable developers to create industry-changing products."
But what about the hardware side of things? Well, the TX1 is a respectable leap in hardware and compute performance. Sitting at 1 Teraflops of rated (FP16) compute performance, the TX1 pairs four ARM Cortex A57 and four ARM Cortex A53 64-bit CPU cores with a 256-core Maxwell-based GPU. Definitely respectable for its size and low power consumption, especially considering NVIDIA claims the SoC can best the Intel Skylake Core i7-6700K in certain workloads (thanks to the GPU portion). The module further contains 4GB of LPDDR4 memory and 16GB of eMMC flash storage.
In short, while on-module storage has not increased, RAM has been doubled, and compute performance has roughly tripled for FP16 and jumped by approximately 57% for FP32 (512 versus 326 GFLOPS, going by AnandTech's estimate) versus the Jetson TK1's 2GB of DDR3 and 192-core Kepler GPU. The TX1 also uses a smaller process node at 20nm (versus 28nm) and the chip is said to use "very little power." Networking support includes 802.11ac and Gigabit Ethernet. The chart below outlines the major differences between the two platforms.
| | Jetson TX1 | Jetson TK1 |
|---|---|---|
| GPU (Architecture) | 256-core (Maxwell) | 192-core (Kepler) |
| CPU | 4 x ARM Cortex A57 + 4 x A53 | "4+1" ARM Cortex A15 "r3" |
| RAM | 4 GB LPDDR4 | 2 GB LPDDR3 |
| eMMC | 16 GB | 16 GB |
| Compute Performance (FP16) | 1 TFLOP | 326 GFLOPS |
| Compute Performance (FP32) - via AnandTech | 512 GFLOPS (AT's estimation) | 326 GFLOPS (NVIDIA's number) |
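If you would rather see the generational jump as ratios, the short sketch below divides the TX1 figures from the chart by the TK1 figures. It takes the chart at face value, treats 1 TFLOP as 1000 GFLOPS, and leans on AnandTech's estimate for the TX1's FP32 number, so the FP32 ratio in particular is only as good as that estimate. The dictionary keys are my own labels.

```python
# Values taken from the chart; 1 TFLOP is treated as 1000 GFLOPS.
tx1 = {"fp16_gflops": 1000, "fp32_gflops": 512, "ram_gb": 4, "gpu_cores": 256}
tk1 = {"fp16_gflops": 326, "fp32_gflops": 326, "ram_gb": 2, "gpu_cores": 192}

# Ratio of TX1 to TK1 for every metric in the chart.
ratios = {key: tx1[key] / tk1[key] for key in tx1}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.2f}x")
```

The FP16 ratio comes out just over 3x and the FP32 ratio at roughly 1.57x, with RAM doubling and the GPU core count rising by a third.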
The TX1 will run the Linux For Tegra operating system and supports the usual suspects of CUDA 7.0, cuDNN, and VisionWorks development software as well as the latest OpenGL drivers (OpenGL 4.5, OpenGL ES 3.1, and Vulkan).
NVIDIA is continuing to push for CUDA Everywhere, and the Jetson TX1 looks to be a more mature product that builds on the TK1. The huge leap in compute performance should enable even more interesting projects and bring more sophisticated automation and machine learning to smaller and more intelligent devices.
For those interested, the Jetson TX1 Development Kit (the full I/O development board with bundled module) will be available for pre-order today at $599 while the TX1 module itself will be available soon for approximately $299 each in orders of 1,000 or more (like Intel's tray pricing).
With CUDA 7, it is apparently possible for the GPU to be used for general purpose processing as well, which may open up some doors that were not possible before in such a small device. I am interested to see what happens with NVIDIA's embedded device play and what kinds of automated hardware are powered by the tiny SoC and its beefy graphics.