Subject: Storage | March 29, 2018 - 10:43 PM | Tim Verry
Tagged: z-ssd, Z-NAND, workstation, Samsung, NVMe, M.2, HPC, enterprise
Samsung is expanding its Z-NAND based "Z-SSD" products with a new M.2 solid state drive for workstations and high-performance compute servers. Previously only available in half-height AIC (add-in-card) form factors, the SZ983 M.2 sports an M.2 22110 form factor and an NVMe-compatible PCI-E 3.0 x4 interface. The new drive was shown off at Samsung's booth during the Open Compute Project Summit in San Jose and was spotted by Anandtech, which managed to snap a couple of photos of it.
Image credit: Anandtech spotted Samsung's M.2 Z-SSD at OCP Summit 2018.
The new M.2 Z-SSD will come in 240GB and 480GB capacities and sports an 8 channel Phoenix controller. The drive on display at OCP Summit 2018 had a part number of MZ1JB240HMGG-000FB-001. Comparing it to the SZ985 PCI-E SSD, this new M.2 drive appears to also have a DRAM cache as well as capacitors to protect data in the event of power loss (data writes would be able to completely write from the cache to the drive before safe shutdown) though we don't know if this drive has the same 1.5GB of LPDDR4 cache or not. Note that the sticker of the M.2 drive reads SZ983 while Samsung elsewhere had the M.2 labeled as the SZ985 (M.2) so it's unclear which name will stick when this actually launches though hopefully it's the former just to avoid confusion. The Phoenix (formerly Polaris v2) controller is allegedly going to also be used on some of the higher end V-NAND drives though we'll have to wait and see if that happens or not.
Anyway, back to performance numbers, Samsung rates the M.2 Z-SSD at 3200 MB/s sequential reads and 2800 MB/s sequential writes (so a bit slower than the SZ985 at writes). Samsung did not talk random IOPS numbers. The drive is rated at the same 30 DWPD (drive writes per day) endurance rating as the SZ985 and will have the same 5-year warranty. I am curious if the M.2 NVMe drive is able to hit the same (or close to) random IOPS numbers as the PCI-E card which is rated at up to 750,000 read and 170,000 write IOPS.
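To get a feel for what a 30 DWPD rating means over a 5-year warranty, here is a minimal sketch that converts drive writes per day into nominal total data written. It assumes decimal gigabytes and 365-day years, and `endurance_pb` is a hypothetical helper name, not anything from Samsung; vendor-rated totals can differ from this nominal figure.

```python
def endurance_pb(capacity_gb: float, dwpd: float, years: float) -> float:
    """Nominal total data written, in petabytes (decimal units)."""
    days = 365 * years                      # assumption: leap days ignored
    total_gb = capacity_gb * dwpd * days    # one "drive write" = full capacity
    return total_gb / 1_000_000             # 1 PB = 1,000,000 GB


for capacity in (240, 480):                 # the two M.2 Z-SSD capacities
    print(f"{capacity} GB at 30 DWPD for 5 years: "
          f"{endurance_pb(capacity, 30, 5):.2f} PB")
```

Under those assumptions the 240GB model works out to roughly 13 PB of writes and the 480GB model to roughly 26 PB.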
Z-NAND is interesting as it represents a middle ground between V-NAND (and other 3D NAND flash) and 3D XPoint memory in terms of both cost and latency, with Z-NAND being closer in latency to XPoint than to V-NAND. Where it gets interesting is that Z-NAND is essentially V-NAND run in a different mode, and yet Samsung is able to reduce write latency by 5-times (to around 16 microseconds) and cell read latency by up to 10-times (to 12-to-20 microseconds). While Samsung is already working on second generation Z-NAND, these drives are using first generation Z-NAND, which is the higher performance (lowest latency) type but costs quite a bit more than the second generation, which is only a bit slower (more read latency). Judging by the 110mm form factor, this M.2 drive is aimed squarely at datacenter and workstation usage and is not likely to lead to a consumer Optane 800P (et al) competitor, but if it does well enough we may see some prosumer and consumer Z-NAND based options in the future as newer generations of Z-NAND strike the right balance of cost and latency for the desktop gaming and enthusiast market.
- Samsung Introducing Z-NAND Based 800GB Z-SSD For Enterprise HPC
- FMS 2017: Samsung Announces QLC V-NAND, 16TB NGSFF SSD, Z-SSD V2, Key Value
- Samsung SZ985 Z-NAND SSD - Upcoming Competition for Intel's P4800X?
- Intel Optane SSD 800P 58GB, 118GB, and RAID Review - 3D XPoint Goes Mainstream
Subject: Storage | January 31, 2018 - 08:39 PM | Tim Verry
Tagged: z-ssd, Z-NAND, Samsung, HPC, enterprise, ai
Samsung will be introducing a new high performance solid state drive using new Z-NAND flash at ISSCC next month. The new Samsung SZ985 Z-SSD is aimed squarely at the high-performance computing (HPC) market for big data number crunching, supercomputing, AI research, and IoT application development. The new drive will come in two capacities at 800GB and 240GB and combines low latency Z-NAND flash with 1.5GB LPDDR4 DRAM cache and an unspecified "high performance" Samsung controller.
The Z-NAND drive is interesting because it represents an extremely fast storage solution that offers up to 10-times the cell read performance and 5-times less write latency than 3-bit V-NAND based drives such as Samsung's own PM963 NVMe SSD. The Z-NAND technology represents a middle ground (though closer to Optane than not) between NAND flash and 3D XPoint memory without the expense and complexity of 3D XPoint (at least, in theory). The single port 4-lane drive (PCI-E x4) reportedly is able to hit random read performance of 750,000 IOPS and random write performance of 170,000 IOPS. The drive is able to do this with very little write latency at around 16µs (microseconds). To put that in perspective, a traditional NVMe SSD can exhibit write latencies of around 90+ microseconds while Optane sits at around half the latency of Z-NAND (~8-10µs). You can find a comparison chart of latency percentiles of various storage technologies here. While the press release did not go into transfer speeds or read latencies, Samsung talked about that late last year when it revealed the drive's existence. The SZ985 Z-SSD maxes out its x4 interface at 3.2 GB/s for both sequential reads and sequential writes. Further, read latencies are rated at between 12µs and 20µs. At the time, Allyn noted that the 30 drive writes per day (DWPD) matched that of Intel's P4800X and stated that it was an impressive feat considering Samsung is essentially running its V-NAND flash in a different mode with Z-NAND. Looking at the specs, the Samsung SZ985 Z-SSD has the same 2 million hours MTBF but is actually rated higher for endurance at 42 Petabytes over five years (versus 41 PB). Both drives appear to offer the same 5-year warranty, though we may have to wait for the ISSCC announcement for confirmation on that.
It appears that the SZ985 offers a bit more capacity, higher random read IOPS, and better sequential performance but with slightly more latency and lower random write IOPS than the 3D XPoint based Intel Optane P4800X drive.
In all Samsung has an interesting drive and if they can price it right I can see them selling a ton of these drives to the enterprise market for big data analytics tasks as well as a high-speed drive for researchers. I am looking forward to more information being released about the Z-SSD and its Z-NAND flash technology at the ISSCC (International Solid-State Circuits Conference) in mid-February.
Subject: Memory | January 12, 2018 - 05:46 PM | Tim Verry
Tagged: supercomputing, Samsung, HPC, HBM2, graphics cards, aquabolt
Samsung recently announced that it has begun mass production of its second generation HBM2 memory which it is calling “Aquabolt”. Samsung has refined the design of its 8GB HBM2 packages allowing them to achieve an impressive 2.4 Gbps per pin data transfer rates without needing more power than its first generation 1.2V HBM2.
Reportedly Samsung is using new TSV (through-silicon-via) design techniques and adding additional thermal bumps between dies to improve clocks and thermal control. Each 8GB HBM2 “Aquabolt” package is comprised of eight 8Gb dies each of which is vertically interconnected using 5,000 TSVs which is a huge number especially considering how small and tightly packed these dies are. Further, Samsung has added a new protective layer at the bottom of the stack to reinforce the package’s physical strength. While the press release did not go into detail, it does mention that Samsung had to overcome challenges relating to “collateral clock skewing” as a result of the sheer number of TSVs.
On the performance front, Samsung claims that Aquabolt offers up a 50% increase in per package performance versus its first generation “Flarebolt” memory which ran at 1.6Gbps per pin and 1.2V. Interestingly, Aquabolt is also faster than Samsung’s 2.0Gbps per pin HBM2 product (which needed 1.35V) without needing additional power. Samsung also compares Aquabolt to GDDR5 stating that it offers 9.6-times the bandwidth with a single package of HBM2 at 307 GB/s and a GDDR5 chip at 32 GB/s. Thanks to the 2.4 Gbps per pin speed, Aquabolt offers 307 GB/s of bandwidth per package and with four packages products such as graphics cards can take advantage of 1.2 TB/s of bandwidth.
This second generation HBM2 memory is a decent step up in performance (with HBM hitting 128GB/s and first generation HBM2 hitting 256 GB/s per package and 512 GB/s and 1 TB/s with four packages respectively), but the interesting bit is that it is faster without needing more power. The increased bandwidth and data transfer speeds will be a boon to the HPC and supercomputing market and useful for working with massive databases, simulations, neural networks and AI training, and other “big data” tasks.
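The per-package bandwidth figures above follow directly from the per-pin data rate. The sketch below reproduces them, assuming the standard 1024-bit wide interface per HBM/HBM2 package and a 1.0 Gbps pin rate for first generation HBM (neither number is stated in the post); the function name is purely illustrative.

```python
HBM_BUS_WIDTH_BITS = 1024  # assumption: standard HBM/HBM2 interface width per package


def package_bandwidth_gbs(pin_rate_gbps: float,
                          bus_width_bits: int = HBM_BUS_WIDTH_BITS) -> float:
    """Peak bandwidth of one package in GB/s: per-pin rate x pins / 8 bits per byte."""
    return pin_rate_gbps * bus_width_bits / 8


generations = {
    "HBM (1.0 Gbps, assumed)": 1.0,
    "HBM2 Flarebolt (1.6 Gbps)": 1.6,
    "HBM2 (2.0 Gbps)": 2.0,
    "HBM2 Aquabolt (2.4 Gbps)": 2.4,
}

for name, rate in generations.items():
    per_package = package_bandwidth_gbs(rate)
    print(f"{name}: {per_package:.1f} GB/s per package, "
          f"{4 * per_package / 1000:.2f} TB/s with four packages")

# Samsung's GDDR5 comparison: one Aquabolt package versus one 32 GB/s chip.
print(f"Aquabolt vs GDDR5 chip: {package_bandwidth_gbs(2.4) / 32:.1f}x")
```

At 2.4 Gbps this gives 307.2 GB/s per package and about 1.23 TB/s for four packages, which matches the figures Samsung quotes, and 2.4 Gbps over Flarebolt's 1.6 Gbps is exactly the claimed 50% increase.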
Aquabolt looks particularly promising for the mobile market, though, with future products succeeding the current mobile Vega GPU in Kaby Lake-G processors, Ryzen Mobile APUs, and eventually discrete Vega mobile graphics cards getting a nice performance boost (it’s likely too late for AMD to go with this new HBM2 on these specific products, but future refreshes or generations may be able to take advantage of it). I’m sure it will also see usage in the SoCs used in Intel’s and NVIDIA’s driverless car projects.
Subject: General Tech | November 30, 2017 - 12:48 AM | Tim Verry
Tagged: HPC, supercomputer, Raspberry Pi 3, cluster, research, LANL
The Raspberry Pi has been used to build cheap servers and small clusters before, but BitScope is taking the idea to the extreme with a professional enterprise solution. On display at SC17, the BitScope Raspberry Pi Cluster Module is a 6U rackable drawer that holds 144 Raspberry Pi 3 single board computers along with all of the power, networking, and air cooling needed to keep things running smoothly.
Each cluster module holds two and a half BitScope Blades with each BitScope Blade holding up to 60 Raspberry Pi PCs (or other SBCs like the ODROID C2). Enthusiasts can already purchase their own Quattro Pi boards as well as the cluster plate to assemble their own small clusters though the 6U Cluster Module drawer doesn’t appear to be for sale yet (heh). Specifically each Cluster Module has room for 144 active nodes, six spare nodes, and one cluster manager node.
For reference, the Raspberry Pi 3 features the Broadcom BCM2837 SoC with 4 ARM Cortex A53 cores at 1.2 GHz and a VideoCore IV GPU that is paired with 1 GB of LPDDR2 memory at 900 MHz, 100 Mbps Ethernet, 802.11n Wi-Fi and Bluetooth. The ODROID C2 has 4 Amlogic cores at 1.5 GHz, a Mali 450 GPU, 2 GB of DDR3 SDRAM, and Gigabit Ethernet. Interestingly, BitScope claims the Cluster Module uses a 10 Gigabit Ethernet SFP+ backbone, which will help when communicating between Cluster Modules, but speeds between individual nodes will be limited to one gigabit at best (less in the real world, and in the case of the Pi it is much less than the 100 Mbps port rating due to how it is wired to the SoC).
BitScope is currently building a platform for Los Alamos National Laboratory that will feature five Cluster Modules for a whopping 2,880 64-bit ARM cores, 720GB of RAM, and a 10GbE SFP+ fabric backbone. Fully expanded, a 42U server cabinet holds 7 modules (1,008 active nodes / 4,032 active cores) and would consume up to 6 kW of power. LANL expects its 5 module setup to use around 3,000 W on average though.
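The node, core, and RAM totals are simple multiples of the per-module numbers, and a quick sketch makes it easy to check them or try other cabinet sizes. The class and function names here are hypothetical, and the per-node values assume Raspberry Pi 3 boards (4 cores, 1 GB of RAM) in every active slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterModule:
    active_nodes: int = 144      # active nodes per 6U Cluster Module
    cores_per_node: int = 4      # assumption: Raspberry Pi 3 in every slot
    ram_gb_per_node: int = 1     # assumption: 1 GB of LPDDR2 per Pi 3


def totals(modules: int, module: ClusterModule = ClusterModule()) -> dict:
    """Aggregate active nodes, cores, and RAM for a number of modules."""
    nodes = modules * module.active_nodes
    return {
        "nodes": nodes,
        "cores": nodes * module.cores_per_node,
        "ram_gb": nodes * module.ram_gb_per_node,
    }


print("LANL pilot (5 modules):", totals(5))
print("Full 42U cabinet (7 modules):", totals(7))
```

Five modules come out to 720 nodes, 2,880 cores, and 720 GB of RAM, while a full cabinet of seven reaches 1,008 nodes and 4,032 cores. Spare and manager nodes are deliberately left out of the count.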
What is the New Mexico Consortium and LANL planning to do with all these cores? Well, playing Crysis would prove tough even if they could SLI all those GPUs so instead they plan to use the Raspberry Pi-powered system to model much larger and prohibitively expensive supercomputers for R&D and software development. Building out a relatively low cost and low power system enables it to be powered on and accessed by more people including students, researchers, and programmers where they can learn and design software that runs as efficiently as possible on massive multiple core and multiple node systems. Getting software to scale out to hundreds and thousands of different nodes is tricky, especially if you want all the nodes working on the same problem(s) at once. Keeping each node fed with data, communicating amongst themselves, and returning accurate results while keeping latency low and utilization high is a huge undertaking. LANL is hoping that the Raspberry Pi based system will be the perfect testing ground for software and techniques they can then use on the big gun supercomputers like Trinity, Titan, Summit (ORNL, slated for 2018), and other smaller HPC clusters.
It is cool to see how far the Raspberry Pi has come, and while I wish the GPU was more open so that the researchers could more easily work with heterogeneous HPC coding rather than just working with the thousands of ARM cores, it is still impressive to see what is essentially a small supercomputer with a 1,008 node cluster for under $25,000!
I am interested to see how the researchers at Los Alamos put it to work and the eventual improvements to HPC and supercomputing software that come from this budget cluster project!
- Intel Hopes For Exaflop Capable Supercomputers Within 10 Years
- The Next Most Powerful Supercomputer in the U.S. Is Almost Complete
- NVIDIA Launches Tesla K20X Accelerator Card, Powers Titan Supercomputer
- GTC 2013: Pedraforca Is A Power Efficient ARM + GPU Cluster For Homogeneous (GPU) Workloads
Subject: Cases and Cooling | November 20, 2017 - 10:09 PM | Tim Verry
Tagged: Supercomputing Conference, supercomputing, liquid cooling, immersion cooling, HPC, allied control, 3M
PC Gamer Hardware (formerly Maximum PC) spotted a cool immersion cooling system being shown off at the SuperComputing conference in Denver, Colorado earlier this month. Allied Control, which was recently acquired by BitFury (popular for its Bitcoin mining ASICs), was at the show with a two phase immersion cooling system that takes advantage of 3M's Novec fluid and a water cooled condenser coil to submerge and cool high end and densely packed hardware with no moving parts and no pesky oil residue.
Nick Knupffer (@Nick_Knupffer) posted a video (embedded below) of the cooling system in action cooling a high end processor and five graphics cards. The components are submerged in a non-flammable, non-conductive fluid that has a very low boiling point of 41°C. Interestingly, the heatsinks and fans are removed, allowing for direct contact between the fluid and the chips (in this case there is a copper baseplate on the CPU but bare ASICs can also be cooled). When the hardware is in use, heat is transferred to the liquid, which begins to boil off from a liquid to a vapor / gaseous state. The vapor rises to the surface and hits a condenser coil (which can be water cooled) that cools the gas until it turns back into a liquid and falls back into the tank. The company has previously shown off an overclocked 20 GPU (250W) plus dual Xeon system that was able to run flat out (the GPUs at 120% TDP) running deep learning as well as mining Z-Cash when not working on HPC projects, while keeping all the hardware well under thermal limits and not throttling. Cnet also spotted a 10 GPU system being shown off at Computex (warning: autoplay video ad!).
According to 3M, two phase immersion cooling is extremely efficient (many times more than air or even water) and can cut cooling energy costs by up to 95% versus conventional air cooling. Further, hardware can be packed much more tightly, with up to 100kW/sq. m versus 10kW/sq. m with air, meaning immersion cooled hardware can take up to 10 times less floor space, and the heat produced can be reclaimed for datacenter building heating or other processes.
— Nick Knupffer (@Nick_Knupffer) November 14, 2017
Neat stuff for sure even if it is still out of the range of home gaming PCs and mining rigs for now! Speaking of mining, BitFury plans to cool a massive 40+ MW ASIC mining farm in the Republic of Georgia using an Allied Control designed immersion cooling system (see links below)!
- Two-Phase Immersion Cooling A revolution in data center efficiency @ 3M [PDF]
- 3M, Orange Silicon Valley, Allied Control and U.S. Naval Research Laboratory Demonstrate High-Density Supercomputing at SC'17 @ 3M
- Revolutionary project built by BitFury and Allied Control to cool 40+ MW of ASIC clusters [PDF]
- Oil cooling: Deep fried, or deep energy savings? @ ExtremeTech
Subject: General Tech | August 9, 2017 - 12:43 PM | Jeremy Hellstrom
Tagged: nvidia, autonomous vehicles, HPC
NVIDIA has previously shown their interest in providing the brains for autonomous vehicles; their Xavier chip is scheduled for release some time towards the end of the year. They are continuing their efforts to break into this market by investing in startups through a program called GPU Ventures. Today DigiTimes reports that NVIDIA purchased a stake in a Chinese company called Tusimple which is developing autonomous trucks. The transportation of goods may not be as interesting to the average consumer as self driving cars but the market could be more lucrative; there are a lot of trucks on the roads of the world and they are unlikely to be replaced any time soon.
"Tusimple, a Beijing-based startup focused on developing autonomous trucks, has disclosed that Nvidia will make a strategic investment to take a 3% stake in the company. Nvidia's investment is part of a Series B financing round, Tusimple indicated."
Here is some more Tech News from around the web:
- Microsoft launches Outlook.com beta because it's not Gmail or Yahoo @ The Inquirer
- Intel will unveil 8th-gen 'Coffee Lake' processors on 21 August @ The Inquirer
- Microsoft Dumps Notorious Chinese Secure Certificate Vendor @ Slashdot
- It's 2017 and Hyper-V can be pwned by a guest app, Windows by a search query, Office by... @ The Register
- Core-blimey! Intel's Core i9 18-core monster – the numbers @ The Register
Subject: Graphics Cards | June 27, 2017 - 06:51 PM | Jeremy Hellstrom
Tagged: Vega FE, Vega, HPC, amd
AMD have released their new HPC card, the Radeon Vega Frontier Edition, which Jim told you about earlier this week. The air cooled version is available now with an MSRP of $999 USD, followed by a water-cooled edition arriving in Q3 with a price tag of $1499.
The specs they list for the cards are impressive and compare favourably to NVIDIA's P100 which is the card AMD tested against, offering higher TFLOPS for both FP32 and FP16 operations though the memory bandwidth lags a little behind.
|Specification||Radeon Vega Frontier Edition||NVIDIA comparison card|
|Peak/Boost Clock||1600 MHz||1442 MHz|
|FP32 TFLOPS (SP)||13.1||10.3|
|FP64 TFLOPS (DP)|| || |
|Memory Interface||1.89 Gb/s|| |
|Memory Bandwidth||483 GB/s||716 GB/s|
|Memory Size||16GB HBC*||16GB|
|TDP||300 W air, 375 W water||235 W|
The memory size for the Vega is interesting; HBC is AMD's High Bandwidth Cache, and its controller (HBCC) not only uses the memory cache more effectively but is able to reach out to other high performance system memory for help. AMD states that the Radeon Vega Frontier Edition has the capability of expanding traditional GPU memory to 256TB; perhaps allowing new texture mods for Skyrim or Fallout! Expect to see more detail on this feature once we can get our hands on a card to abuse, nicely of course.
AMD used the DeepBench benchmark to provide comparative results. The AMD Vega FE system was a dual socket machine with Xeon E5 2640v4s @ 2.4GHz 10C/20T and 32GB DDR4 per socket, running Ubuntu 16.04 LTS with ROCm 1.5 and OpenCL 1.2; the NVIDIA Tesla P100 system used the same hardware with cuDNN 5.1, driver 375.39, and CUDA version 8.0.61. Those tests showed the AMD system completing the benchmark in 88.7 ms while the Tesla P100 completed it in 133.1 ms, quite an impressive lead for AMD at roughly 1.5 times faster. Again, there will be much more information on performance once the Vega FE can be tested.
Read on to hear about the new card in AMD's own words, with links to their sites.
Subject: General Tech, Graphics Cards | May 27, 2017 - 12:18 AM | Tim Verry
Tagged: vision fund, softbank, nvidia, iot, HPC, ai
SoftBank, the Tokyo-based Japanese telecom and internet technology company, has reportedly quietly amassed a 4.9% stake in graphics chip giant NVIDIA. Bloomberg reports that SoftBank has carefully invested $4 billion into NVIDIA, avoiding the need to get regulatory approval in the US by keeping its investment under 5% of the company. SoftBank has promised the current administration that it will invest $50 billion into US tech companies and it seems that NVIDIA is the first major part of that plan.
NVIDIA's Tesla V100 GPU.
Led by Chairman and CEO Masayoshi Son, SoftBank is not afraid to invest in technology companies it believes in with major past acquisitions and investments in companies like ARM Holdings, Sprint, Alibaba, and game company Supercell.
The $4 billion investment makes SoftBank the fourth largest shareholder in NVIDIA, whose stock has rallied on SoftBank’s purchases and vote of confidence. The $100 billion Vision Fund (currently at $93 billion) may also follow SoftBank’s lead in acquiring a stake in NVIDIA, which is involved in graphics, HPC, AI, deep learning, and gaming.
Overall, this is good news for NVIDIA and its shareholders. I am curious what other plays SoftBank will make for US tech companies.
What are your thoughts on SoftBank investing heavily in NVIDIA?
Subject: General Tech, Memory, Storage | May 26, 2017 - 10:14 PM | Tim Verry
Tagged: XPoint, Intel, HPC, DIMM, 3D XPoint
Intel recently teased a bit of new information on its 3D XPoint DIMMs and launched its first public demonstration of the technology at the SAP Sapphire conference where SAP’s HANA in-memory data analytics software was shown working with the new “Intel persistent memory.” Slated to arrive in 2018, the new Intel DIMMs based on the 3D XPoint technology developed by Intel and Micron will work in systems alongside traditional DRAM to provide a pool of fast, low latency, and high density nonvolatile storage that is a middle ground between expensive DDR4 and cheaper NVMe SSDs and hard drives. When looking at the storage stack, the storage density increases along with latency as it gets further away from the CPU. The opposite is also true: as storage and memory get closer to the processor, bandwidth increases, latency decreases, and costs increase per unit of storage. Intel is hoping to bridge the gap between system DRAM and PCI-E and SATA storage.
According to Intel, system RAM offers up to 10 GB/s per channel and approximately 100 nanoseconds of latency. 3D XPoint DIMMs will offer 6 GB/s per channel and about 250 nanoseconds of latency. Below that are the 3D XPoint-based NVMe SSDs (e.g. Optane) on a PCI-E x4 bus, where they max out the bandwidth of the bus at ~3.2 GB/s with 10 microseconds of latency. Intel claims that non XPoint NVMe NAND solid state drives have around 100 microseconds of latency, and of course, it gets worse from there when you go to NAND-based SSDs or even hard drives hanging off the SATA bus.
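To put those tiers side by side, the following sketch normalizes each quoted latency to DRAM. The dictionary simply restates Intel's approximate figures from the paragraph above in nanoseconds; the tier labels and the `relative_latency` helper are illustrative names, not Intel terminology.

```python
# Approximate latencies quoted by Intel, converted to nanoseconds.
LATENCY_NS = {
    "DDR4 DRAM": 100,
    "3D XPoint DIMM": 250,
    "3D XPoint NVMe SSD (PCI-E x4)": 10_000,    # 10 microseconds
    "NAND NVMe SSD": 100_000,                   # 100 microseconds
}


def relative_latency(tiers: dict, baseline: str = "DDR4 DRAM") -> dict:
    """Express every tier's latency as a multiple of the baseline tier."""
    base = tiers[baseline]
    return {name: ns / base for name, ns in tiers.items()}


for name, factor in relative_latency(LATENCY_NS).items():
    print(f"{name}: {factor:g}x DRAM latency")
```

By these numbers the XPoint DIMMs sit at 2.5x DRAM latency, while even an XPoint-based NVMe SSD is 100x and a NAND NVMe SSD about 1,000x, which is the gap Intel is trying to fill.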
Intel’s new XPoint DIMMs have persistent storage and will offer more capacity than will be possible and/or cost effective with DDR4 DRAM. In giving up some bandwidth and latency, enterprise users will be able to have a large pool of very fast storage for storing their databases and other latency and bandwidth sensitive workloads. Intel does note that there are security concerns with the XPoint DIMMs being nonvolatile in that an attacker with physical access could easily pull the DIMM and walk away with the data (it is at least theoretically possible to grab some data from RAM as well, but it will be much easier to grab the data from the XPoint sticks). Encryption and other security measures will need to be implemented to secure the data, both in use and at rest.
Interestingly, Intel is not positioning the XPoint DIMMs as a replacement for RAM, but instead as a supplement. RAM and XPoint DIMMs will be installed in different slots of the same system and the DDR4 RAM will be used for the OS and system critical applications while the XPoint pool of storage will be used for storing data that applications will work on much like a traditional RAM disk but without needing to load and save the data to a different medium for persistent storage and offering a lot more GBs for the money.
While XPoint is set to arrive next year along with Cascade Lake Xeons, it will likely be a couple of years before the technology takes off. Supporting it is going to require hardware and software support for the workstations and servers as well as developers willing to take advantage of it when writing their specialized applications. Fortunately, Intel started shipping the memory modules to its partners for testing earlier this year. It is an interesting technology and the DIMM solution and direct CPU interface will really let the 3D XPoint memory shine and reach its full potential. It will primarily be useful for the enterprise, scientific, and financial industries where there is a huge need for faster and lower latency storage that can accommodate massive (multiple terabyte+) data sets that continue to get larger and more complex. It is a technology that likely will not trickle down to consumers for a long time, but I will be ready when it does. In the meantime, I am eager to see what kinds of things it will enable the big data companies and researchers to do! Intel claims it will not only be useful at supporting massive in-memory databases and accelerating HPC workloads but for things like virtualization, private clouds, and software defined storage.
What are your thoughts on this new memory tier and the future of XPoint?
- Intel Has Started Shipping Optane Memory Modules
- Intel Optane Memory 32GB Review - Faster Than Lightning
- A Closer Look at Intel's Optane SSD DC P4800X Enterprise SSD Performance
Subject: Processors | March 7, 2017 - 09:02 AM | Tim Verry
Tagged: SoC, server, ryzen, opteron, Naples, HPC, amd
Over the summer, AMD introduced its Naples platform which is the server-focused implementation of the Zen microarchitecture in a SoC (System On a Chip) package. The company showed off a prototype dual socket Naples system and bits of information leaked onto the Internet, but for the most part news has been quiet on this front (whereas there were quite a few leaks of Ryzen which is AMD's desktop implementation of Zen).
The wait seems to be finally over, and AMD appears ready to talk more about Naples which will reportedly launch in the second quarter of this year (Q2'17) with full availability of processors and motherboards from OEMs and channel partners (e.g. system integrators) happening in the second half of 2017. Per AMD, "Naples" processors are SoCs with 32 cores and 64 threads that support 8 memory channels and a (theoretical) maximum of 2TB DDR4-2667. (Using the 16GB DIMMs available today, Naples supports 256GB of DDR4 per socket.) Further, the Naples SoC features 64 PCI-E 3.0 lanes. Rumors also indicated that the SoC included support for sixteen 10GbE interfaces, but AMD has yet to confirm this or the number of SATA/SAS ports offered. AMD did say that Naples has an optimized cache structure for HPC compute and "dedicated security hardware" though it did not go into specifics. (The security hardware may be similar to the ARM TrustZone technology it has used in the past.)
Naples will be offered in single and dual socket designs with dual socket systems offering up 64 cores, 128 threads, 32 DDR4 DIMMs (512 GB using 16 GB modules) on 16 total memory channels with 21.3 GB/s per channel bandwidth (170.7 GB/s per SoC), 128 PCI-E 3.0 lanes, and an AMD Infinity Fabric interconnect between the two processor sockets.
AMD claims that its Naples platform offers up to 45% more cores, 122% more memory bandwidth, and 60% more I/O than its competition. For its internal comparison, AMD chose the Intel Xeon E5-2699A V4, which is the processor with the highest core count that is intended for dual socket systems (there are E7s with more cores but those are in 4P systems). The Intel Xeon E5-2699A V4 is a 14nm, 22 core (44 thread) processor clocked at 2.4 GHz base to 3.6 GHz turbo with 55MB cache. It supports four channels of DDR4-2400 for a maximum bandwidth of 76.8 GB/s (19.2 GB/s per channel) as well as 40 PCI-E 3.0 lanes. A dual socket system with two of those Xeons features 44 cores, 88 threads, and a theoretical maximum of 1.54 TB of ECC RAM.
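AMD's percentages can be reproduced from the per-socket specs quoted above. This is a rough sketch that assumes a 64-bit (8 byte) DDR4 channel and treats DDR4-2667 as 2,667 MT/s; the helper names are made up for illustration.

```python
BYTES_PER_TRANSFER = 8  # assumption: each DDR4 channel is 64 bits wide


def channel_bandwidth_gbs(mega_transfers_per_s: float) -> float:
    """Peak bandwidth of one DDR4 channel in GB/s."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


def percent_more(ours: float, theirs: float) -> float:
    """How much larger `ours` is than `theirs`, as a percentage."""
    return (ours - theirs) / theirs * 100


# Per-socket figures taken from the article text.
naples = {"cores": 32, "pcie_lanes": 64,
          "bandwidth_gbs": 8 * channel_bandwidth_gbs(2667)}
xeon = {"cores": 22, "pcie_lanes": 40,
        "bandwidth_gbs": 4 * channel_bandwidth_gbs(2400)}

print(f"Naples memory bandwidth: {naples['bandwidth_gbs']:.1f} GB/s per socket")
print(f"Xeon memory bandwidth:   {xeon['bandwidth_gbs']:.1f} GB/s per socket")
for key in ("cores", "bandwidth_gbs", "pcie_lanes"):
    print(f"{key}: {percent_more(naples[key], xeon[key]):.0f}% more on Naples")
```

The results line up with AMD's claims: about 45% more cores, 122% more memory bandwidth (170.7 GB/s versus 76.8 GB/s), and 60% more PCI-E lanes per socket.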
AMD's reference platform with two 32 core Naples SoCs and 512 GB DDR4 2400 MHz was purportedly 2.5x faster at the seismic analysis workload than the dual Xeon E5-2699A V4 OEM system with 1866 MHz DDR4. Curiously, when AMD compared a Naples reference platform with 44 cores enabled and running 1866 MHz memory to a similarly configured Intel system the Naples platform was twice as fast. It seems that the increased number of memory channels and memory bandwidth are really helping the Naples platform pull ahead in this workload.
AMD further claims that its Naples platform is more balanced and better suited to cloud computing and scientific and HPC workloads than the competition. Specifically, Forrest Norrod, the Senior Vice President and General Manager of AMD's Enterprise, Embedded, and Semi-Custom Business Unit, stated:
“’Naples’ represents a completely new approach to supporting the massive processing requirements of the modern datacenter. This groundbreaking system-on-chip delivers the unique high-performance features required to address highly virtualized environments, massive data sets and new, emerging workloads.”
There is no word on pricing yet, but it should be competitive with Intel's offerings (the E5-2699A V4 is $4,938). AMD will reportedly be talking data center strategy and its upcoming products during the Open Compute Summit later this week, so hopefully there will be more information released at those presentations.
(My opinions follow)
This is one area where AMD needs to come out strong with support from motherboard manufacturers, system integrators, OEM partners, and OS and software validation to succeed. Intel is not likely to take AMD encroaching on its lucrative server market share lightly, and AMD is going to have a long road ahead of it to regain the market share it once had in this area, but it does have a decent architecture on its hands to build off of with Zen and if it can secure partner support Intel is certainly going to have competition here that it has not had to face in a long time. Intel and AMD competing over the data center market is a good thing, and as both companies bring new technology to market it will trickle down into the consumer level hardware. Naples' success in the data center could mean a profitable AMD with R&D money to push Zen as far as it can – so hopefully they can pull it off.
What are your thoughts on the Naples SoC and AMD's push into the server market?
- Zen and the Art of CPU Design
- AMD Zen Architecture Overview: Focus on Ryzen
- Dissecting AMD Zen Architecture - Interview with David Kanter