Addressing New Markets
Machine Learning is one of the hot topics in technology, and certainly one that is growing at a very fast rate. Applications such as facial recognition and self-driving cars are powering much of the development going on in this area. So far we have seen CPUs and GPUs being used in ML applications, but in most cases these are not the most efficient ways of doing these highly parallel but relatively computationally simple workloads. New chips have been introduced that are far more focused on machine learning, and now it seems that ARM is throwing their hat into the ring.
ARM is introducing three products under the Project Trillium brand: an ML processor, an OD (Object Detection) processor, and an ARM-developed Neural Network software stack. This project came as a surprise for most of us, but in hindsight it is a logical avenue for them to address as it will be incredibly important moving forward. Currently many applications that require machine learning are not processed at the edge, namely in the consumer’s hand or on a device right next to them. Workloads may be requested from the edge, but most of the heavy duty processing occurs in datacenters located all around the world. This requires communication, and sometimes pretty hefty levels of bandwidth. If either of those things is missing, applications requiring ML break down.
Subject: Processors | February 7, 2018 - 09:01 AM | Tim Verry
Tagged: Xeon D, xeon, servers, networking, micro server, Intel, edge computing, augmented reality, ai
Intel announced a major refresh of its Xeon D System on a Chip processors aimed at high density servers that bring the power of the datacenter as close to end user devices and sensors as possible to reduce TCO and application latency. The new Xeon D 2100-series SoCs are built on Intel’s 14nm process technology and feature the company’s new mesh architecture (gone are the days of the ring bus). According to Intel the new chips are squarely aimed at “edge computing” and offer up 2.9-times the network performance, 2.8-times the storage performance, and 1.6-times the compute performance of the previous generation Xeon D-1500 series.
Intel has managed to pack in up to 18 Skylake-based processing cores, Quick Assist Technology co-processing (for things like hardware accelerated encryption/decryption), four DDR4 memory channels addressing up to 512 GB of DDR4 2666 MHz ECC RDIMMs, four Intel 10 Gigabit Ethernet controllers, 32 lanes of PCI-E 3.0, and 20 lanes of flexible high speed I/O that includes up to 14 lanes of SATA 3.0, four USB 3.0 ports, or 20 lanes of PCI-E. Of course, the SoCs also support Intel’s Management Engine, hardware virtualization, HyperThreading, Turbo Boost 2.0, and AVX-512 instructions with 1 FMA (fused multiply-add) unit.
Suffice it to say, there is a lot going on here with these new chips, which represent a big step up in capabilities (and TDPs), further bridging the gap between the Xeon E3 v5 family and Xeon E5 family and the new Xeon Scalable Processors. Xeon D is aimed at datacenters where power and space are limited, and while the soldered SoCs are single socket (1P) setups, high density is achieved by filling racks with as many single processor Mini ITX boards as possible. Xeon D does not quite match the per-core clockspeeds of the “proper” Xeons but has significantly more cores than Xeon E3 and much lower TDPs and cost than Xeon E5. Its many lower clocked and lower power cores excel at burstable tasks such as serving up websites, where many threads may be generated and maintained for long periods of time without needing a lot of processing power, and when new page requests do come in the cores are able to turbo boost to meet demand. For example, Facebook is using Xeon D processors to serve up its front end websites in its Yosemite OpenRack servers where each server rack holds 192 Xeon D 1540 SoCs (four Xeon D boards per 1U sled) for 1,536 Broadwell cores. Other applications include edge routers, network security appliances, self-driving vehicles, and augmented reality processing clusters. The autonomous vehicles use case is perhaps the best example of just what the heck edge computing is. Rather than fighting the laws of physics to transfer sensor data back to a datacenter for processing and then back to the car in time for it to safely act on the processed information, the idea of edge computing is to bring most of the processing, networking, and storage power as close as possible to both the input sensors and the device (and human) that relies on accurate and timely data to make decisions.
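The Yosemite rack figures quoted above can be checked with a little arithmetic. This is only a minimal sketch: the eight cores per Xeon D 1540 is an assumption taken from that part's published core count (it is not stated above), and the function and constant names are purely illustrative.

```python
# Figures from the article: 192 SoCs per rack, four SoC boards per 1U sled.
SOCS_PER_RACK = 192
SOCS_PER_SLED = 4
# Assumption (not stated in the article): the Xeon D 1540 is an 8-core part.
CORES_PER_SOC = 8


def rack_totals(socs=SOCS_PER_RACK, cores_per_soc=CORES_PER_SOC,
                socs_per_sled=SOCS_PER_SLED):
    """Return total cores and number of sleds for one rack."""
    return {
        "cores": socs * cores_per_soc,
        "sleds": socs // socs_per_sled,
    }


if __name__ == "__main__":
    totals = rack_totals()
    print(f"{totals['cores']} cores across {totals['sleds']} sleds")
```

With those inputs the core count matches the 1,536 Broadwell cores quoted for the rack.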
As far as specifications go, Intel’s new Xeon D lineup includes 14 processor models broken up into three main categories. The Edge Server and Cloud SKUs include eight, twelve, and eighteen core options with TDPs ranging from 65W to 90W. Interestingly, the 18 core Xeon D does not feature the integrated 10 GbE networking the lower end models have, though it supports higher DDR4 memory frequencies. The two remaining classes of Xeon D SoCs are “Network Edge and Storage” and “Integrated Intel Quick Assist Technology” SKUs. These are roughly similar with two eight core, one 12 core, and one 16 core processor (the former also has a quad core that isn’t present in the latter category), though there is a big differentiator in clockspeeds. It seems customers will have to choose between core clockspeeds and Quick Assist acceleration (up to 100 Gbps), as the chips that do have QAT are clocked much lower than the chips without the co-processor hardware. That makes sense: they have similar TDPs, so clocks needed to be sacrificed to maintain the same core count. Thanks to the updated architecture, Intel is encroaching a bit on the per-core clockspeeds of the Xeon E3 and Xeon E5s, though when turbo boost comes into play the Xeon Ds can’t compete.
The flagship Xeon D 2191 offers up two more cores (four additional threads) versus the previous Broadwell-based flagship Xeon D 1577 as well as higher clockspeeds at 1.6 GHz base versus 1.3 GHz and 2.2 GHz turbo versus 2.1 GHz turbo. The Xeon D 2191 does lack the integrated networking though. Looking at the two 16 core refreshed Xeon Ds compared to the 16 core Xeon D 1577, Intel has managed to increase clocks significantly (up to 2.2 GHz base and 3.0 GHz boost versus 1.3 GHz base and 2.10 GHz boost), double the number of memory channels and network controllers, and increase the maximum amount of memory from 128 GB to 512 GB. All those increases did come at the cost of TDP though which went from 45W to 100W.
Xeon D has always been an interesting platform both for enthusiasts running VM labs and home servers and for big data enterprise clients building and serving up the 'next big thing' built on the astonishing amounts of data people create and consume on a daily basis. (Intel estimates a single self driving car would generate as much as 4TB of data per day while the average person in 2020 will generate 1.5 GB of data per day and VR recordings such as NFL True View will generate up to 3TB a minute!) With Intel ramping up the core count, per-core performance, and I/O, the platform is starting to not only bridge the gap between single socket Xeon E3 and dual socket Xeon E5 but to claim a place of its own in the fast-growing server market.
I am looking forward to seeing how Intel's partners and the enthusiast community take advantage of the new chips and what new projects they will enable. It is also going to be interesting to see the responses from AMD (e.g. Snowy Owl and, to a lesser extent, Great Horned Owl at the low and niche ends as it has fewer CPU cores but a built in GPU) and the various ARM partners (Qualcomm Centriq, X-Gene, Ampere, etc.*) as they vie for this growth market space with higher powered SoC options in 2018 and beyond.
- New Intel Xeon D Broadwell Processors Aimed at Low Power, High Density Servers
- Intel Xeon Scalable Processor Launch - New Architecture, New Platform for Data Center
- Qualcomm Centriq 2400 Arm-based Server Processor Begins Commercial Shipment
- Today's bonus AMD rumour: Starship, Naples, Zeppelin and a flock of Owls
*Note that X-Gene and Ampere are both backed by the Carlyle Group now with MACOM having sold X-Gene to Project Denver Holdings and the ex-Intel employee led Ampere being backed by the Carlyle Group.
Subject: Storage | January 31, 2018 - 08:39 PM | Tim Verry
Tagged: z-ssd, Z-NAND, Samsung, HPC, enterprise, ai
Samsung will be introducing a new high performance solid state drive using new Z-NAND flash at ISSCC next month. The new Samsung SZ985 Z-SSD is aimed squarely at the high-performance computing (HPC) market for big data number crunching, supercomputing, AI research, and IoT application development. The new drive will come in two capacities at 800GB and 240GB and combines low latency Z-NAND flash with 1.5GB LPDDR4 DRAM cache and an unspecified "high performance" Samsung controller.
The Z-NAND drive is interesting because it represents an extremely fast storage solution that offers up to 10-times the cell read performance and 5-times less write latency than 3-bit V-NAND based drives such as Samsung's own PM963 NVMe SSD. The Z-NAND technology represents a middle ground (though closer to Optane than not) between NAND flash and 3D XPoint memory without the expense and complexity of 3D XPoint (at least, in theory). The single port 4-lane drive (PCI-E x4) reportedly is able to hit random read performance of 750,000 IOPS and random write performance of 170,000 IOPS. The drive is able to do this with very little write latency at around 16µs (microseconds). To put that in perspective, a traditional NVMe SSD can exhibit write latencies of around 90+ microseconds while Optane sits at around half the latency of Z-NAND (~8-10µs). You can find a comparison chart of latency percentiles of various storage technologies here. While the press release did not go into transfer speeds or read latencies, Samsung talked about those late last year when it revealed the drive's existence. The SZ985 Z-SSD maxes out its x4 interface at 3.2 GB/s for both sequential reads and sequential writes. Further, read latencies are rated at between 12µs and 20µs. At the time Allyn noted that the 30 drive writes per day (DWPD) matched that of Intel's P4800X and stated that it was an impressive feat considering Samsung is essentially running its V-NAND flash in a different mode with Z-NAND. Looking at the specs, the Samsung SZ985 Z-SSD has the same 2 million hours MTBF but is actually rated higher for endurance at 42 Petabytes over five years (versus 41 PB). Both drives appear to offer the same 5-year warranty though we may have to wait for the ISSCC announcement for confirmation on that.
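The endurance figures above can be sanity-checked from the drive-writes-per-day rating. The sketch below is a rough back-of-the-envelope calculation, not a vendor formula: it assumes decimal units, 365-day years, the 800GB capacity for the SZ985, and a 750 GB capacity for the P4800X (that capacity is an assumption and is not stated above). Vendors may round or derate their published numbers, which would explain why the nominal result for the SZ985 lands a little above the rated 42 PB.

```python
def nominal_endurance_pb(capacity_gb, dwpd, years, days_per_year=365):
    """Nominal petabytes written over the warranty period.

    capacity_gb * dwpd is the data written per day; decimal units are
    assumed throughout (1 PB = 1,000,000 GB).
    """
    total_gb = capacity_gb * dwpd * years * days_per_year
    return total_gb / 1_000_000


if __name__ == "__main__":
    # 800 GB SZ985 at 30 DWPD over five years (figures from the article).
    print(f"SZ985:  {nominal_endurance_pb(800, 30, 5):.1f} PB")
    # Assumed 750 GB P4800X at the same 30 DWPD over five years.
    print(f"P4800X: {nominal_endurance_pb(750, 30, 5):.1f} PB")
```

The nominal numbers come out at roughly 43.8 PB and 41.1 PB, in line with the ordering (and roughly the magnitudes) of the rated 42 PB and 41 PB.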
It appears that the SZ985 offers a bit more capacity, higher random read IOPS, and better sequential performance but with slightly more latency and lower random write IOPS than the 3D XPoint based Intel Optane P4800X drive.
In all Samsung has an interesting drive and if they can price it right I can see them selling a ton of these drives to the enterprise market for big data analytics tasks as well as a high-speed drive for researchers. I am looking forward to more information being released about the Z-SSD and its Z-NAND flash technology at the ISSCC (International Solid-State Circuits Conference) in mid-February.
Subject: General Tech | November 7, 2017 - 01:35 PM | Jeremy Hellstrom
Tagged: machine learning, ai
Not to be outdone by the research conducted by Japan's Kyushu University which led to the "frog is not truck" portion of last week's podcast, MIT researchers have also been tormenting image recognition software. Their findings were a little more worrisome, as a 3D printed turtle was identified as a rifle, which could lead to some very bad situations in airports or other secure locations. In this case, instead of adding a few pixels to the image, they introduced different angles and lighting conditions which created enough noise to completely fool Google's image recognition AI, Inception. The printed turtle was misidentified because of the texture which they chose, showing that this issue extends beyond photos to include physical objects. Pop by The Register for more details as well as an ingredient you never want to see on your toast.
"Students at MIT in the US claim they have developed an algorithm for creating 3D objects and pictures that trick image-recognition systems into severely misidentifying them. Think toy turtles labeled rifles, and baseballs as cups of coffee."
Here is some more Tech News from around the web:
- No, Samsung, you really do owe Apple $120m for patent infringement @ The Register
- Almost Everything on Computers Is Perceptually Slower Than It Was in 1983 @ [H]ard|OCP
- Get Watch Dogs FREE From Ubisoft This Week! @ TechARP
- Fat-fingered Level 3 techie reduces internet to level zero: Glitch knocks out connections @ The Register
- Kaspersky warns of increased DDoS attacks against gaming companies @ The Inquirer
- Android security update fixes KRACK, slaps Band-Aid on Pixel 2 XL screen @ Ars Technica
- Seldom used 'i' mangled by baffling autocorrect bug in Apple's iOS 11 @ The Register
- Microsoft releases strict standards for 'highly secure' Windows 10 devices @ The Inquirer
- MINIX: Intel's Hidden In-chip Operating System @ Slashdot
Subject: General Tech, Motherboards | September 14, 2017 - 02:13 AM | Tim Verry
Tagged: password cracking, mining, gpgpu, cryptocurrency, colorful, ai
Colorful recently unveiled an interesting bare-bones motherboard focused on cryptocurrency miners and other GPU heavy workloads with its main feature being eight double spaced PCI-E 3.0 x16 slots. The non-standard form factor Colorful C.B250A-BTC PLUS V20 motherboard measures 485mm x 195mm (approx. 19.1 x 7.7 inches) and offers a no-frills setup that is ready for miners to attach to open racks. The motherboard is based on Intel’s LGA 1151 socket and B250 chipset.
The majority of the board is taken up by eight PCI-E 3.0 x16 slots where the top slot is wired directly to the CPU and is electrically x16 while the rest are wired to the B250 chipset and are electrically x1. There are 16(!) PCI-E power connectors (eight 6-pin and eight 8-pin) for providing power to the GPUs and two 4-pin ATX power connectors for powering the CPU and single SO-DIMM slot through what looks to be six power phases. Notably, there is no 24-pin power connector on this board, to make it easier to use multiple power supplies and share motherboards between power supplies (though it’s not clear how Colorful plans to control turning all these power supplies on/off at the same time). Beyond the PCI-E slots there is not much to this motherboard. Internal I/O includes the LGA 1151 socket for Skylake and Kaby Lake CPUs, a single DDR4 SO-DIMM slot, one SATA port, one M.2 slot, and six fan headers. Around back are two USB ports, one HDMI video output, and a single gigabit ethernet port.
The board is a no-frills design that should be quite appealing for miners but also as an easy way to jump into GPGPU projects (AI research, rendering, machine learning, password cracking, etc.). The 2-slot spacing allows air cooled (hopefully blower style) cards to be installed without needing to find and test quality PCI-E riser cables. There is no word on pricing yet. Based on the features and hardware it’s packing it should be on the cheaper side, but as it’s a custom design aimed at mining it may actually come out at a hefty premium for the convenience it offers miners. On the bright side, it might have decent resale value to factor into the ROI calculations thanks to the other non-mining applications I mentioned (a mean password cracking rig!). A neat board in any case, and as I mentioned previously it is interesting to see the new designs and configurations the mining craze has enticed manufacturers into exploring.
- Asus Launches B250 Expert Mining Motherboard With 19 PCI-E Slots
- Let's Talk About Mining - Cryptocurrency Revisited
- Donate to the PC Perspective Mining Pool! A NiceHash How-to
- A Quick Look at the SAPPHIRE Radeon RX 470 Mining Edition
- NVIDIA Partners Launching Mining Focused P106-100 and P104-100 Graphics Cards
- Mining specific cards are real - ASUS and Sapphire GP106 and RX 470 show up
Kal Simpson recently had the chance to sit down and have an extensive interview with SILVIA's Chief Product Officer - Cognitive Code, Alex Mayberry. SILVIA is a company that specializes in conversational AI that can be adapted to a variety of platforms and applications. Kal's comments are in bold while Alex's are in italics.
Always good to speak with you Alex. Whether it's the latest Triple-A video game release or the progress being made in changing the way we play, virtual reality for instance – your views and developments within the gaming space as a whole remains impressive. Before we begin, I’d like to give the audience a brief flashback of your career history. Prominent within the video game industry you’ve been involved with many, many titles – primarily within the PC gaming space. Quake 2: The Reckoning, America’s Army, a plethora of World of Warcraft titles.
Those more familiar with your work know you as the lead game producer for Diablo 3 / Reaper of Souls, as well as the executive producer for Star Citizen. The former of which we spoke on during the release of the game for PC, PlayStation 4 and the Xbox One, back in 2014.
So I ask, given your huge involvement with some of the most popular titles, what sparked your interest within the development of intelligent computing platforms? No-doubt the technology can be adapted to applications within gaming, but what’s the initial factor that drove you to Cognitive Code – the SILVIA technology?
AM: Conversational intelligence was something that I had never even thought about in terms of game development. My experience arguing with my Xbox and trying to get it to change my television channel left me pretty sceptical about the technology. But after leaving Star Citizen, my paths crossed with Leslie Spring, the CEO and Founder of Cognitive Code, and the creator of the SILVIA platform. Initially, Leslie was helping me out with some engineering work on VR projects I was spinning up. After collaborating for a bit, he introduced me to his AI, and I became intrigued by it. Although I was still very focused on VR at the time, my mind kept drifting to SILVIA.
I kept pestering Leslie with questions about the technology, and he continued to share some of the things that it could do. It was when I saw one of his game engine demos showing off a sci-fi world with freely conversant robots that the light went on in my head, and I suddenly got way more interested in artificial intelligence. At the same time, I was discovering challenges in VR that needed solutions. Not having a keyboard in VR creates an obstacle for capturing user input, and floating text in your field of view is really detrimental to the immersion of the experience. Also, when you have life-size characters in VR, you naturally want to speak to them. This is when I got interested in using SILVIA to introduce an entirely new mechanic to gaming and interactive entertainment. No more do we have to rely on conversation trees and scripted responses.
No more do we have to read a wall of text from a quest giver. With this technology, we can have a realistic and free-form conversation with our game characters, and speak to them as if they are alive. This is such a powerful tool for interactive storytelling, and it will allow us to breathe life into virtual characters in a way that’s never before been possible. Seeing the opportunity in front of me, I joined up with Cognitive Code and have spent the last 18 months exploring how to design conversationally intelligent avatars. And I’ve been having a blast doing it.
Subject: General Tech, Graphics Cards | May 27, 2017 - 12:18 AM | Tim Verry
Tagged: vision fund, softbank, nvidia, iot, HPC, ai
SoftBank, the Tokyo-based Japanese telecom and internet technology company, has reportedly quietly amassed a 4.9% stake in graphics chip giant NVIDIA. Bloomberg reports that SoftBank has carefully invested $4 billion into NVIDIA, avoiding the need to get regulatory approval in the US by keeping its investment under 5% of the company. SoftBank has promised the current administration that it will invest $50 billion into US tech companies and it seems that NVIDIA is the first major part of that plan.
NVIDIA's Tesla V100 GPU.
Led by Chairman and CEO Masayoshi Son, SoftBank is not afraid to invest in technology companies it believes in with major past acquisitions and investments in companies like ARM Holdings, Sprint, Alibaba, and game company Supercell.
The $4 billion investment makes SoftBank the fourth largest shareholder in NVIDIA, whose stock has rallied on SoftBank’s purchases and vote of confidence. The $100 billion (currently $93 billion) Vision Fund may also follow SoftBank’s lead in acquiring a stake in NVIDIA, which is involved in graphics, HPC, AI, deep learning, and gaming.
Overall, this is good news for NVIDIA and its shareholders. I am curious what other plays SoftBank will make for US tech companies.
What are your thoughts on SoftBank investing heavily in NVIDIA?
Subject: General Tech, Processors | March 12, 2017 - 05:11 PM | Tim Verry
Tagged: pascal, nvidia, machine learning, iot, Denver, Cortex A57, ai
Measuring 50mm x 87mm, the Jetson TX2 packs quite a bit of processing power and I/O including an SoC with two 64-bit Denver 2 cores with 2MB L2, four ARM Cortex A57 cores with 2MB L2, and a 256-core GPU based on NVIDIA’s Pascal architecture. The TX2 compute module also hosts 8 GB of LPDDR4 (58.3 GB/s) and 32 GB of eMMC storage (SDIO and SATA are also supported). As far as I/O, the Jetson TX2 uses a 400-pin connector to connect the compute module to the development board or final product and the final I/O available to users will depend on the product it is used in. The compute module supports up to the following though:
- 2 x DSI
- 2 x DP 1.2 / HDMI 2.0 / eDP 1.4
- USB 3.0
- USB 2.0
- 12 x CSI lanes for up to 6 cameras (2.5 Gbps per lane)
- PCI-E 2.0: one x4 + one x1, or two x1 + one x2
- Gigabit Ethernet
The Jetson TX2 runs the “Linux for Tegra” operating system. According to NVIDIA the Jetson TX2 can deliver up to twice the performance of the TX1 or up to twice the efficiency at 7.5 watts at the same performance.
The extra horsepower afforded by the faster CPU, updated GPU, and increased memory and memory bandwidth will reportedly enable smart end user devices with faster facial recognition, more accurate speech recognition, and smarter AI and machine learning tasks (e.g. personal assistant, smart street cameras, smarter home automation, et al). Bringing more power locally to these types of internet of things devices is a good thing as less reliance on the cloud potentially means more privacy (unfortunately there is not as much incentive for companies to make this type of product for the mass market but you could use the TX2 to build your own).
Cisco will reportedly use the Jetson TX2 to add facial and speech recognition to its Cisco Spark devices. In addition to the hardware, NVIDIA offers SDKs and tools as part of JetPack 3.0. The JetPack 3.0 toolkit includes TensorRT, cuDNN 5.1, VisionWorks 1.6, CUDA 8, and support and drivers for OpenGL 4.5, OpenGL ES 3.2, EGL 1.4, and Vulkan 1.0.
The TX2 will enable better, stronger, and faster (well I don't know about stronger heh) industrial control systems, robotics, home automation, embedded computers and kiosks, smart signage, security systems, and other connected IoT devices (that, for the love of all processing, are hardened and secured so they aren't used as part of a botnet!).
Interested developers and makers can pre-order the Jetson TX2 Development Kit for $599 with a ship date for US and Europe of March 14 and other regions “in the coming weeks.” If you just want the compute module sans development board, it will be available later this quarter for $399 (in quantities of 1,000 or more). The previous generation Jetson TX1 Development Kit has also received a slight price cut to $499.
Subject: General Tech | February 6, 2017 - 01:36 PM | Jeremy Hellstrom
Tagged: darpa, ai, security, Usenix Enigma 2017
DARPA hosted the first Cyber Grand Challenge last summer, in which the software from seven machine learning projects competed to find and patch vulnerabilities in a network, and to attack each other. While the specific vulnerabilities discovered have not been made public you can read a bit about what was revealed about the contest at Usenix Enigma 2017 over at The Register. For instance, one of the programs managed to find a flaw in the OS all the machines were running on and then hack into another to steal data. A different machine noticed this occurring and patched itself on the fly, making sure that it was protected from that particular attack. Also worth noting is that the entire contest was over in 20 minutes.
"The exact nature of these new bug types remains under wraps, although we hear that at least one involves exploitable vulnerabilities in data queues."
Here is some more Tech News from around the web:
- New SMB bug: How to crash Windows system with a 'link of death' @ The Register
- Windows Cloud: Microsoft's Chrome OS rival revealed in leaked screenshots @ The Inquirer
- Olimex Announces Their Open Source Laptop @ Hack a Day
- Google will restrict Gmail in Windows XP and Vista this year @ The Inquirer
- Denuvo: Our cracked RE7 protection is still better than nothing @ Ars Technica
- FYI: Ticking time-bomb fault will brick Cisco gear after 18 months @ The Register
Subject: General Tech | November 4, 2016 - 02:55 PM | Scott Michaud
Tagged: blizzard, google, ai, deep learning, Starcraft II
Blizzard and DeepMind, which was acquired by Google in 2014 and is now a subsidiary of Alphabet Inc., have just announced opening up StarCraft II for AI research. DeepMind was the company that made AlphaGo, which beat Lee Sedol, a grandmaster of Go, in a best-of-five showmatch with a score of four to one. They hinted at possibly having a BlizzCon champion, some year, do a showmatch as well, which would be entertaining.
StarCraft II is different from Go in three important ways. First, any given player knows what they scout, which they apparently will constrain these AI to honor. Second, there are three possible match-ups for any choice of race, except random, which has nine. Third, it's real-time, which can be good for AI, because they're not constrained by human input limitations, but also difficult from a performance standpoint.
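The match-up counts in the second point can be enumerated directly. This is a small illustrative sketch, assuming the three playable races and treating "Random" as resolving to one of them before the game starts; the function name and the ordered-pair representation are hypothetical choices made for the example.

```python
from itertools import product

# The three playable StarCraft II races.
RACES = ("Terran", "Zerg", "Protoss")


def matchups_for(choice):
    """List the (own race, opponent race) pairs a player can end up in.

    A fixed race faces one of three opponent races; "Random" can resolve
    to any of the three races, each facing any of the three opponents.
    """
    if choice == "Random":
        return list(product(RACES, RACES))
    return [(choice, opponent) for opponent in RACES]


if __name__ == "__main__":
    print(len(matchups_for("Zerg")), "match-ups for a fixed race")
    print(len(matchups_for("Random")), "match-ups for random")
```

An AI trained for a single race therefore has three match-ups to learn, while one that plays random has to cover all nine.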
From Blizzard's perspective, better AI can be useful, because humans need to be challenged to learn. Novices won't be embarrassed to lose to a computer over and over, so they can have a human-like opponent to experiment with. Likewise, grandmasters will want to have someone better than them to keep advancing, especially if it allows them to keep new strategies hidden. From DeepMind's perspective, this is another step in AI research, which could be applied to science, medicine, and so forth in the coming years and decades.
Unfortunately, this is an early announcement. We don't know any more details, although they will have a BlizzCon panel on Saturday at 1pm EDT (10am PDT).