Subject: Graphics Cards | June 5, 2018 - 11:58 PM | Tim Verry
Tagged: Vega, machine learning, instinct, HBM2, gpu, computex 2018, computex, amd, 7nm
AMD showed off its first 7nm GPU in the form of the expected AMD Radeon Instinct RX Vega graphics product and RX Vega GPU with 32GB of HBM2 memory. The new GPU uses the Vega architecture along with the open source ecosystem built by AMD to enable both graphics and GPGPU workloads. AMD demonstrated using the 7nm RX Vega GPU for ray tracing in a cool demo that showed realistic reflections and shadows being rendered on a per-pixel basis in a model. Granted, we are still a long way away from seeing that kind of detail in real-time gaming, but it is still cool to see glimpses of that ray-traced future.
According to AMD, the 32GB of HBM2 memory will greatly benefit creators and enterprise clients that need to work with large datasets and be able to quickly make changes and updates to models before doing a final render. The larger memory buffer will also help in HPC applications, where more of a large dataset can be kept close to the GPU for processing over the wide HBM2 memory bus. Further, HBM2 has physical size and energy efficiency benefits which will pique the interest of datacenters focused on minimizing TCO.
Dr. Lisa Su came on stage towards the end of the 7nm Vega demonstration to show off the GPU in person, and you can see that it is rather tiny for the compute power it provides! It is shorter than the two stacks of HBM2 dies on either side, for example.
Of course AMD did not disclose all the nitty-gritty specifications of the new machine learning graphics card that enthusiasts want to know. We will have to wait a bit longer for that information unfortunately!
As for other 7nm offerings? As Ryan talked about during CES in January, 2018 will primarily be the year for the machine learning-focused Radeon Instinct RX Vega 7nm GPU, with other consumer-focused GPUs using the smaller process node likely coming out in 2019. Whether those 7nm GPUs in 2019 will be a refreshed Vega or the new Navi is still up for debate; however, AMD's graphics roadmap certainly doesn't rule out Navi as a possibility. In any case, AMD did state during the livestream that it intends to release a new GPU every year, with the GPUs alternating between a new architecture and a new process node.
What are your thoughts on AMD's graphics roadmap and its first 7nm Vega GPU?
Subject: General Tech | June 4, 2018 - 04:14 PM | Tim Verry
Tagged: nvidia, ai, robotics, machine learning, machine vision, jetson, xavier
NVIDIA launched a new platform for programming and training AI-powered robots called NVIDIA Isaac. The platform is based around the company’s Xavier SoC and supported with Isaac Robotics Software, which includes an Isaac SDK with accelerated libraries, NVIDIA-developed Isaac IMX algorithms, and a virtualized training and testing environment called Isaac SIM.
According to NVIDIA, Isaac will enable a new wave of machines and robotics powered by artificial intelligence aimed at manufacturing, logistics, agriculture, construction, and other industrial and infrastructural industries. Using the Jetson Xavier hardware platform for processing along with a suite of sensors and cameras, Isaac-powered robots will be capable of accurately analyzing their environment and their spatial positioning within it so that they can adapt to obstacles and work safely in hazardous areas and/or alongside human workers.
NVIDIA notes that its new Jetson Xavier platform is 10-times more energy efficient while offering 20-times more compute performance than the Jetson TX2. It seems that NVIDIA has been able to juice up the chip since it was last teased at GTC Europe with it now being rated at up to 30 TOPS and featuring 9 billion transistors. The 30W module (it can also operate in 10W and 15W modes) combines a 512-core Volta GPU with Tensor cores, two NVDLA deep learning accelerators, an 8-core 64-bit ARM CPU (8MB L2 + 4MB L3), and accelerators for image, vision, and video inputs. The Jetson Xavier can handle up to 16 camera inputs along with supporting sensor inputs through GPIO and other specialized interfaces. It supports three 4K60 display outputs, PCI-E 4.0, 10 Gbps USB 3.1, USB 2.0, Gigabit Ethernet, UFS, UART, SD, I2S, I2C, SPI, and CAN for I/O.
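As a quick sanity check on those figures, the short Python sketch below works out what the quoted numbers imply. It uses only the values in the paragraph above (30 TOPS, 30W, 20-times the compute and 10-times the efficiency of the TX2). The derived TX2-equivalent figures follow from NVIDIA's ratios and are not measured or official specifications, and the variable names are purely illustrative.

```python
# Figures quoted by NVIDIA for the Jetson Xavier module.
xavier_tops = 30.0        # peak throughput, trillions of operations per second
xavier_watts = 30.0       # highest power mode of the module
compute_ratio = 20.0      # claimed compute advantage over the Jetson TX2
efficiency_ratio = 10.0   # claimed energy-efficiency advantage over the TX2

# Efficiency is throughput divided by power.
xavier_tops_per_watt = xavier_tops / xavier_watts

# If compute is 20x and efficiency is 10x, power draw must be 20 / 10 = 2x.
implied_power_ratio = compute_ratio / efficiency_ratio

# TX2-equivalent numbers implied by the ratios (assumption: not official specs).
implied_tx2_tops = xavier_tops / compute_ratio
implied_tx2_watts = xavier_watts / implied_power_ratio

print(f"Xavier efficiency:        {xavier_tops_per_watt:.2f} TOPS/W")
print(f"Implied power ratio:      {implied_power_ratio:.1f}x the TX2")
print(f"Implied TX2 throughput:   {implied_tx2_tops:.2f} TOPS-equivalent")
print(f"Implied TX2 power budget: {implied_tx2_watts:.1f} W")
```

In other words, the two marketing ratios only fit together if the Xavier module is compared at roughly twice the power draw of the TX2 configuration NVIDIA used as its baseline.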
The virtual world simulation with Jetson Xavier in-the-loop testing sounds interesting if it works as described. It would help accelerate development of the software needed to run these promised smarter production lines, more efficient building of homes and other infrastructure like bridges, and easier and more cost-effective home package delivery using adaptable and smarter robotics.
The Isaac development platform will be priced at $1,299 and will be available starting in August to early access partners.
What are your thoughts on NVIDIA Isaac?
Subject: General Tech | March 29, 2018 - 03:10 PM | Tim Verry
Tagged: project trillium, nvidia, machine learning, iot, GTC 2018, GTC, deep learning, arm, ai
During GTC 2018, NVIDIA and ARM announced a partnership that will see ARM integrate NVIDIA's NVDLA deep learning inferencing accelerator into the company's Project Trillium machine learning processors. The NVIDIA Deep Learning Accelerator (NVDLA) is an open source modular architecture that is specifically optimized for inferencing operations such as object and voice recognition. Bringing that acceleration to the wider ARM ecosystem through Project Trillium will enable a massive number of smarter phones, tablets, Internet-of-Things, and embedded devices that will be able to do inferencing at the edge, which is to say without the complexity and latency of having to rely on cloud processing. This means potentially smarter voice assistants (e.g. Alexa, Google), doorbell cameras, lighting, and security around the home, as well as better AR, natural translation, and assistive technologies on your phone when you are out and about.
Karl Freund, lead analyst for deep learning at Moor Insights & Strategy, was quoted in the press release as stating:
“This is a win/win for IoT, mobile and embedded chip companies looking to design accelerated AI inferencing solutions. NVIDIA is the clear leader in ML training and Arm is the leader in IoT end points, so it makes a lot of sense for them to partner on IP.”
ARM's Project Trillium was announced back in February and is a suite of IP for processors optimized for parallel low latency workloads and includes a Machine Learning processor, Object Detection processor, and neural network software libraries. NVDLA is a hardware and software platform based upon the Xavier SoC; the hardware is highly modular and configurable and can feature a convolution core, single data processor, planar data processor, channel data processor, and data reshape engines. The NVDLA can be configured with all or only some of those elements, and chip designers can independently scale them up or down depending on what processing acceleration they need for their devices. NVDLA connects to the main system processor over a control interface and through two AXI memory interfaces (one optional) that connect to system memory and (optionally) dedicated high bandwidth memory (not necessarily HBM but just its own SRAM for example).
NVDLA is presented as a free and open source architecture that promotes a standard way to design deep learning inferencing that can accelerate operations to infer results from trained neural networks (with the training being done on other devices perhaps by the DGX-2). The project, which hosts the code on GitHub and encourages community contributions, goes beyond the Xavier-based hardware and includes things like drivers, libraries, TensorRT support (upcoming) for Google's TensorFlow acceleration, testing suites and SDKs as well as a deep learning training infrastructure (for the training side of things) that is compatible with the NVDLA software and hardware, and system integration support.
Bringing the "smarts" of smart devices to the local hardware and closer to the users should mean much better performance and using specialized accelerators will reportedly offer the performance levels needed without blowing away low power budgets. Internet-of-Things (IoT) and mobile devices are not going away any time soon, and the partnership between NVIDIA and ARM should make it easier for developers and chip companies to offer smarter (and please tell me more secure!) smart devices.
- NVDLA Primer
- Project Trillium: Machine Learning on ARM
- NVIDIA Announces DGX-2 with 16 GV100s & 8 100Gb NICs
- GTC 2018: NVIDIA Announces Volta-Powered Quadro GV100
- NVIDIA Teases Low Power, High Performance Xavier SoC That Will Power Future Autonomous Vehicles
- NVIDIA Launches Jetson TX2 With Pascal GPU For Embedded Devices
- ARM Announces Project Trillium, a New Dedicated AI Processing Family
Addressing New Markets
Machine Learning is one of the hot topics in technology, and certainly one that is growing at a very fast rate. Applications such as facial recognition and self-driving cars are powering much of the development going on in this area. So far we have seen CPUs and GPUs being used in ML applications, but in most cases these are not the most efficient ways of doing these highly parallel but relatively computationally simple workloads. New chips have been introduced that are far more focused on machine learning, and now it seems that ARM is throwing their hat into the ring.
ARM is introducing three products under the Project Trillium brand. It features an ML processor, an OD (Object Detection) processor, and an ARM-developed Neural Network software stack. This project came as a surprise for most of us, but in hindsight it is a logical avenue for them to address as it will be incredibly important moving forward. Currently many applications that require machine learning are not processed at the edge, namely in the consumer’s hand or on a device right next to them. Workloads may be requested from the edge, but most of the heavy duty processing occurs in datacenters located all around the world. This requires communication, and sometimes pretty hefty levels of bandwidth. If either of those things is missing, applications requiring ML break down.
Subject: General Tech | November 7, 2017 - 01:35 PM | Jeremy Hellstrom
Tagged: machine learning, ai
Not to be outdone by the research conducted by Japan's Kyushu University which led to the "frog is not truck" portion of last week's podcast, MIT researchers have also been tormenting image recognition software. Their findings were a little more worrisome, as a 3D printed turtle was identified as a rifle, which could lead to some very bad situations in airports or other secure locations. In this case, instead of adding a few pixels to the image, they introduced different angles and lighting conditions which created enough noise to completely fool Google's image recognition AI, Inception. The printed turtle was misidentified because of the texture which they chose, showing that this issue extends beyond photos to include physical objects. Pop by The Register for more details as well as an ingredient you never want to see on your toast.
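The coverage does not spell out the researchers' algorithm, so what follows is only a toy illustration of the general idea of tuning a perturbation across many viewing conditions rather than for one fixed image. It is a minimal Python sketch, assuming a hypothetical two-class linear "classifier" with random weights and a brightness gain as the only transformation. None of the names (`rifle_margin`, `perturb_over_lighting`) or numbers come from the MIT work or from Inception.

```python
import random


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def rifle_margin(x, w_turtle, w_rifle, gain=1.0, offset=0.0):
    """Rifle score minus turtle score after a lighting change (gain * x + offset)."""
    return sum((r - t) * (gain * p + offset)
               for p, t, r in zip(x, w_turtle, w_rifle))


def perturb_over_lighting(x, w_turtle, w_rifle, steps=20, step_size=0.01,
                          samples=32, seed=0):
    """Nudge x toward 'rifle' using a gradient averaged over random lighting."""
    rng = random.Random(seed)
    direction = [r - t for t, r in zip(w_turtle, w_rifle)]
    x_adv = list(x)
    for _ in range(steps):
        gains = [rng.uniform(0.7, 1.3) for _ in range(samples)]
        mean_gain = sum(gains) / samples
        # For a linear model the margin's gradient under one lighting
        # condition is gain * direction; average it over the sampled gains.
        grad = [mean_gain * d for d in direction]
        # Take a small signed step and keep every "pixel" inside [0, 1].
        x_adv = [min(1.0, max(0.0, p + step_size * sign(g)))
                 for p, g in zip(x_adv, grad)]
    return x_adv


rng = random.Random(42)
w_turtle = [rng.gauss(0.0, 1.0) for _ in range(64)]   # hypothetical weights
w_rifle = [rng.gauss(0.0, 1.0) for _ in range(64)]    # hypothetical weights
texture = [rng.random() for _ in range(64)]           # stand-in for a texture

adv_texture = perturb_over_lighting(texture, w_turtle, w_rifle)

for gain in (0.7, 1.0, 1.3):
    before = rifle_margin(texture, w_turtle, w_rifle, gain=gain)
    after = rifle_margin(adv_texture, w_turtle, w_rifle, gain=gain)
    print(f"gain {gain}: rifle margin {before:+.3f} -> {after:+.3f}")
```

With a linear model the averaged gradient collapses to a scaled copy of the weight difference, so the "rifle" margin rises under every gain at once. An attack on a real deep network would presumably need to differentiate through rotation, lighting and rendering to get a comparable effect.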
"Students at MIT in the US claim they have developed an algorithm for creating 3D objects and pictures that trick image-recognition systems into severely misidentifying them. Think toy turtles labeled rifles, and baseballs as cups of coffee."
Here is some more Tech News from around the web:
- No, Samsung, you really do owe Apple $120m for patent infringement @ The Register
- Almost Everything on Computers Is Perceptually Slower Than It Was in 1983 @ [H]ard|OCP
- Get Watch Dogs FREE From Ubisoft This Week! @ TechARP
- Fat-fingered Level 3 techie reduces internet to level zero: Glitch knocks out connections @ The Register
- Kaspersky warns of increased DDoS attacks against gaming companies @ The Inquirer
- Android security update fixes KRACK, slaps Band-Aid on Pixel 2 XL screen @ Ars Technica
- Seldom used 'i' mangled by baffling autocorrect bug in Apple's iOS 11 @ The Register
- Microsoft releases strict standards for 'highly secure' Windows 10 devices @ The Inquirer
- MINIX: Intel's Hidden In-chip Operating System @ Slashdot
Subject: General Tech | June 28, 2017 - 11:17 PM | Scott Michaud
Tagged: Unity, machine learning, deep learning
Unity, who makes the popular 3D game engine of the same name, has announced a research fellowship for integrating machine learning into game development. Two students, who must have been enrolled in a Masters or a PhD program on June 26th, will be selected and provided with $30,000 for a 6-month fellowship. The deadline is midnight (PDT) on September 9th.
We’re beginning to see a lot of machine-learning applications being discussed for gaming. There are some cases, like global illumination and fluid simulations, where it could be faster for a deep-learning algorithm to hallucinate a convincing result than for a physical solver to produce a correct one. In this case, it makes sense to post-process each frame, so, naturally, game engine developers are paying attention.
If eligible, you can apply on their website.
Subject: General Tech | May 29, 2017 - 08:46 PM | Scott Michaud
Tagged: machine learning, fluid, deep neural network, deep learning
SIGGRAPH 2017 is still a few months away, but we’re already starting to see demos get published as groups try to get them accepted to various parts of the trade show. In this case, Physics Forests published a two-minute video where they perform fluid simulations without actually simulating fluid dynamics. Instead, they used a deep-learning AI to hallucinate a convincing fluid dynamics result given their inputs.
We’re seeing a lot of research into deep-learning AIs for complex graphics effects lately. The goal of most of these simulations, whether they are for movies or video games, is to create an effect that convinces the viewer that what they see is realistic. The goal is not to create an actually realistic effect. The question then becomes, “Is it easier to actually solve the problem? Or is it easier to have an AI, trained on a pile of data sorted into successes and failures, come up with an answer that looks correct to the viewer?”
In a lot of cases, like global illumination and even possibly anti-aliasing, it might be faster to have an AI trick you. Fluid dynamics is just one example.
Subject: General Tech, Processors | March 12, 2017 - 05:11 PM | Tim Verry
Tagged: pascal, nvidia, machine learning, iot, Denver, Cortex A57, ai
Measuring 50mm x 87mm, the Jetson TX2 packs quite a bit of processing power and I/O including an SoC with two 64-bit Denver 2 cores with 2MB L2, four ARM Cortex A57 cores with 2MB L2, and a 256-core GPU based on NVIDIA’s Pascal architecture. The TX2 compute module also hosts 8 GB of LPDDR4 (58.3 GB/s) and 32 GB of eMMC storage (SDIO and SATA are also supported). As far as I/O, the Jetson TX2 uses a 400-pin connector to connect the compute module to the development board or final product and the final I/O available to users will depend on the product it is used in. The compute module supports up to the following though:
- 2 x DSI
- 2 x DP 1.2 / HDMI 2.0 / eDP 1.4
- USB 3.0
- USB 2.0
- 12 x CSI lanes for up to 6 cameras (2.5 Gbps per lane)
- PCI-E 2.0: one x4 + one x1, or two x1 + one x2
- Gigabit Ethernet
The Jetson TX2 runs the “Linux for Tegra” operating system. According to NVIDIA, the Jetson TX2 can deliver up to twice the performance of the TX1, or the same performance at up to twice the efficiency when running at 7.5 watts.
The extra horsepower afforded by the faster CPU, updated GPU, and increased memory and memory bandwidth will reportedly enable smart end user devices with faster facial recognition, more accurate speech recognition, and smarter AI and machine learning tasks (e.g. personal assistant, smart street cameras, smarter home automation, et al). Bringing more power locally to these types of internet of things devices is a good thing as less reliance on the cloud potentially means more privacy (unfortunately there is not as much incentive for companies to make this type of product for the mass market but you could use the TX2 to build your own).
Cisco will reportedly use the Jetson TX2 to add facial and speech recognition to its Cisco Spark devices. In addition to the hardware, NVIDIA offers SDKs and tools as part of JetPack 3.0. The JetPack 3.0 toolkit includes TensorRT, cuDNN 5.1, VisionWorks 1.6, CUDA 8, and support and drivers for OpenGL 4.5, OpenGL ES 3.2, EGL 1.4, and Vulkan 1.0.
The TX2 will enable better, stronger, and faster (well I don't know about stronger heh) industrial control systems, robotics, home automation, embedded computers and kiosks, smart signage, security systems, and other connected IoT devices (that are, for the love of all processing, hardened and secured so they aren't used as part of a botnet!).
Interested developers and makers can pre-order the Jetson TX2 Development Kit for $599 with a ship date for US and Europe of March 14 and other regions “in the coming weeks.” If you just want the compute module sans development board, it will be available later this quarter for $399 (in quantities of 1,000 or more). The previous generation Jetson TX1 Development Kit has also received a slight price cut to $499.
Subject: Graphics Cards | December 12, 2016 - 04:05 PM | Jeremy Hellstrom
Tagged: vega 10, Vega, training, radeon, Polaris, machine learning, instinct, inference, Fiji, deep neural network, amd
Ryan was not the only one at AMD's Radeon Instinct briefing, covering their shot across the bow of NVIDIA's HPC products. The Tech Report just released their coverage of the event and the tidbits which AMD provided about the MI25, MI8 and MI6; no relation to a certain British governmental department. They focus a bit more on the technologies incorporated into GEMM and point out that AMD's top card is not matched by an NVIDIA product, as the GP100 GPU does not come as an add-in card. Pop by to see what else they had to say.
"Thus far, Nvidia has enjoyed a dominant position in the burgeoning world of machine learning with its Tesla accelerators and CUDA-powered software platforms. AMD thinks it can fight back with its open-source ROCm HPC platform, the MIOpen software libraries, and Radeon Instinct accelerators. We examine how these new pieces of AMD's machine-learning puzzle fit together."
Here are some more Graphics Card articles from around the web:
- The Complete AMD Radeon Instinct Tech Briefing @ Tech ARP
- Chill With Radeon Software Crimson ReLive Edition @ Techgage
- Radeon Software Crimson ReLive Edition—an overview @ The Tech Report
- AMD Radeon Crimson ReLive Drivers @ techPowerUp
- AMD talk to KitGuru about Crimson ReLive
- We retest Radeon Chill @ The Tech Report
- MSI RX 480 Gaming X 8G Review @ OCC
- NVIDIA GeForce GTX 1080 PCI-Express Scaling @ techPowerUp
AMD Enters Machine Learning Game with Radeon Instinct Products
NVIDIA has been diving into the world of machine learning for quite a while, positioning themselves and their GPUs at the forefront of artificial intelligence and neural net development. Though the strategies are still filling out, I have seen products like the DIGITS DevBox place a stake in the ground of neural net training and platforms like Drive PX to perform inference tasks on those neural nets in self-driving cars. Until today AMD has remained mostly quiet on its plans to enter and address this growing and complex market, instead depending on the compute prowess of its latest Polaris and Fiji GPUs to make a general statement on their own.
The new Radeon Instinct brand of accelerators based on current and upcoming GPU architectures will combine with an open-source approach to software and present researchers and implementers with another option for machine learning tasks.
The statistics and requirements that come along with the machine learning evolution in the compute space are mind-boggling. More than 2.5 quintillion bytes of data are generated daily and stored on phones, PCs and servers, both on-site and through a cloud infrastructure. That includes 500 million tweets, 4 million hours of YouTube video, 6 billion Google searches and 205 billion emails.
Machine intelligence is going to allow software developers to address some of the most important areas of computing for the next decade. Automated cars depend on deep learning to train, medical fields can utilize this compute capability to more accurately and expeditiously diagnose and find cures for cancer, security systems can use neural nets to locate potential and current risk areas before they affect consumers; there are more uses for this kind of network and capability than we can imagine.
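To put those daily totals into more familiar units, here is a small Python sketch that converts them into per-second rates. It uses only the figures quoted above and assumes, as the sentence implies, that the tweet count is also a per-day number; the variable names are illustrative.

```python
BYTES_PER_DAY = 2.5e18            # 2.5 quintillion bytes generated daily
TWEETS_PER_DAY = 500e6            # assumption: the 500 million tweets are per day
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds

# Convert the daily data volume into decimal (SI) units.
exabytes_per_day = BYTES_PER_DAY / 1e18
bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
terabytes_per_second = bytes_per_second / 1e12

# Same conversion for the tweet count.
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
print(f"{tweets_per_second:.0f} tweets per second")
```

That works out to roughly 29 TB of new data every second, which gives a sense of why dedicated accelerators are being pitched at this market.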