I’ve seen a bit of flawed logic floating around in discussions about 3D XPoint technology. Some are directly comparing its cost per die to that of NAND flash, which doesn't work: 3D XPoint likely requires fewer fab steps than NAND, especially when compared with 3D NAND. Others are repeating a bunch of terminology and element names without taking the time to actually explain how it works, and far too many folks out there can't even pronounce it correctly (it's spoken 'cross-point'). My plan is to address as much of the confusion as I can with this article, and I hope you walk away understanding how XPoint and its underlying technologies (most likely) work. While we do not have absolute confirmation of the precise material compositions, there is a significant amount of evidence pointing to one particular set of technologies. With Optane Memory now out in the wild and purchasable by folks wielding electron microscopes and mass spectrometers, I have now seen enough additional information to conclude that XPoint is, in fact, PCM-based.
XPoint memory. Note the shape of the cell/selector structure. This will be significant later.
While we were initially told at the XPoint announcement event Q&A that the technology was not phase change based, there is overwhelming evidence to the contrary, and it is likely that Intel did not want to let the cat out of the bag too early. The funny thing is that both Intel and Micron had briefed on PCM-based memory developments five years earlier, and nearly everything about those briefings lines up perfectly with what appears to have ended up in the XPoint we have today.
Some die-level performance characteristics of various memory types (source).
The above figures were sourced from a 2011 paper and may be a bit dated, but they do a good job of putting actual numbers to the die-level performance of the various solid state memory technologies. We can also see where the ~1000x speed and ~1000x endurance comparisons between XPoint and NAND flash came from. Of course, those die-level characteristics do not translate directly to the performance of a complete SSD containing those dies. Controller overhead and management take their respective cuts, as shown by the performance of the first generation XPoint SSD we saw come out of Intel:
The ‘bridging the gap’ Latency Percentile graph from our Intel SSD DC P4800X review.
(The P4800X comes in at 10 µs above.)
There have been a few very vocal folks out there chanting 'not good enough', without the basic understanding that the first publicly available iteration of a new technology never represents its ultimate performance capabilities. It took NAND flash decades to make it into usable SSDs, and another decade before climbing to the performance levels we enjoy today. Time will tell if this holds true for XPoint, but given Micron's demos and our own observed performance of Intel's P4800X and Optane Memory SSDs, I'd argue that it is most certainly off to a good start!
A 3D XPoint die, submitted for your viewing pleasure.
Subject: Storage | February 15, 2017 - 08:58 PM | Allyn Malventano
Tagged: XPoint, ssd, Optane, memory, Intel, cache
We now have an actual Optane landing page on the Intel site that discusses the first iteration of 'Intel Optane Memory', which appears to be the 8000p Series that we covered last October and saw as an option on some upcoming Lenovo laptops. The site does not cover the upcoming enterprise parts like the 375GB P4800X, but instead focuses on the far smaller 16GB and 32GB 'System Accelerator' M.2 modules.
Despite using only two lanes of PCIe 3.0, these modules turn in some impressive performance, but with only one or two XPoint dies (16GB each), their capacity is too small for an OS install. Instead, these will be used, presumably in combination with a newer version of Intel's Rapid Storage Technology driver, as a caching layer meant to accelerate an HDD.
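Intel hasn't detailed the caching policy here, so the following is only a generic sketch of what a caching layer in front of a hard disk does: recently read blocks are kept on the small, fast device, and the least recently used block is evicted when it fills up. The LRU policy, the class name `LRUBlockCache`, and the `backing_read` callable are hypothetical, chosen purely for illustration; Rapid Storage Technology may work quite differently.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Hypothetical read cache: a small fast tier (think XPoint) in front of a slow tier (think HDD).

    LRU eviction is an assumption for illustration, not Intel's documented policy.
    """

    def __init__(self, capacity_blocks, backing_read):
        self.capacity = capacity_blocks
        self.backing_read = backing_read  # callable: block number -> data (the slow hard disk)
        self.blocks = OrderedDict()       # block number -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, lba):
        if lba in self.blocks:
            self.blocks.move_to_end(lba)  # mark as most recently used
            self.hits += 1
            return self.blocks[lba]
        self.misses += 1
        data = self.backing_read(lba)     # slow path: fetch from the hard disk
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


cache = LRUBlockCache(capacity_blocks=2, backing_read=lambda lba: f"block-{lba}")
for lba in [1, 2, 1, 3, 1, 2]:
    cache.read(lba)
print(cache.hits, cache.misses)  # 2 4
```

Even this toy shows why a cache helps typical desktop use: repeatedly touched blocks (block 1 here) keep getting served from the fast tier, while the hard disk only sees the misses.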
While the random write performance and endurance of these parts blow any NAND-based SSD out of the water, the 2-lane bottleneck holds them back compared to high-end NVMe NAND SSDs, so we will likely see this first consumer iteration of Intel Optane Memory in OEM systems equipped with hard disks as their primary storage. A very quick 32GB caching layer should help speed things up considerably for the majority of typical buyers of these types of mobile and desktop systems, while still keeping the total cost below that of a decent-capacity NAND SSD as primary storage. Hey, if you can't get every vendor to switch to pure SSD, at least you can speed up that spinning rust a bit, right?
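To put the 2-lane bottleneck in numbers: PCIe 3.0 signals at 8 GT/s per lane with 128b/130b line coding, so the raw link ceiling scales directly with lane count. This sketch computes only that raw one-direction ceiling; real throughput is lower once packet and protocol overhead are included.

```python
def pcie3_raw_mb_s(lanes):
    """Raw one-direction PCIe 3.0 bandwidth in MB/s (10**6 bytes per second).

    8 GT/s per lane with 128b/130b line coding; packet/protocol overhead is ignored.
    """
    megatransfers_per_lane = 8000
    payload_megabits = megatransfers_per_lane * 128 / 130
    return lanes * payload_megabits / 8


x2 = pcie3_raw_mb_s(2)
x4 = pcie3_raw_mb_s(4)
print(f"PCIe 3.0 x2: {x2:.0f} MB/s, x4: {x4:.0f} MB/s")  # PCIe 3.0 x2: 1969 MB/s, x4: 3938 MB/s
```

A x2 module therefore tops out at roughly half the link bandwidth available to the x4 NVMe NAND SSDs it is being compared against, no matter how fast the XPoint dies themselves are.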
Subject: General Tech | October 13, 2016 - 03:19 PM | Jeremy Hellstrom
Tagged: terahertz, research, memory
You have probably heard of terahertz radiation being used to scan physical objects, be it the T-rays at airports or the researchers at MIT who are reading books through their covers. There is more recent news of researchers using the spectrum between 0.3THz and 3THz, this time pertaining to RAM and the possibility of increasing the speed at which it can flip a bit between 0 and 1. In theory, a terahertz electric field could flip bits 1000 times faster than the electromagnetic process currently used in flash memory. This could also apply to the new prototype RAM technologies we have seen, such as MRAM, PRAM or STT-RAM. This is still a long way off, but it is a rather interesting read, especially if you can follow the links from The Inquirer to the Nature submission.
"'Using the prototypical antiferromagnet thulium orthoferrite (TmFeO3), we demonstrate that resonant terahertz pumping of electronic orbital transitions modifies the magnetic anisotropy for ordered Fe3+ spins and triggers large-amplitude coherent spin oscillations,' the researchers helpfully explained."
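For a sense of scale, a quick Python calculation converts the 0.3 to 3 THz band quoted above into oscillation periods and free-space wavelengths. This is plain physics (period = 1/f, wavelength = c/f), not a model of the switching mechanism described in the Nature paper.

```python
SPEED_OF_LIGHT_M_S = 299_792_458


def terahertz_cycle(freq_thz):
    """Return (period in picoseconds, free-space wavelength in millimetres) for a frequency in THz."""
    freq_hz = freq_thz * 1e12
    period_ps = 1e12 / freq_hz
    wavelength_mm = SPEED_OF_LIGHT_M_S / freq_hz * 1e3
    return period_ps, wavelength_mm


for f in (0.3, 3.0):
    period, wavelength = terahertz_cycle(f)
    print(f"{f} THz: period {period:.2f} ps, wavelength {wavelength:.2f} mm")
```

One cycle across that band lasts somewhere between about 3.3 and 0.33 picoseconds, which is why terahertz fields are attractive as a potential trigger for very fast bit flips.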
Here is some more Tech News from around the web:
- Smart Linux Home Hubs Mix IoT with AI @ Linux.com
- Apple tipped to launch new MacBooks on 27 October @ The Inquirer
- Shadow Warrior 2 Dev Says DRM Makes A Game Worse @ [H]ard|OCP
- Adobe on patch parade to march out 83 bugs @ The Register
- First look at Windows Server 2016: 'Cloud for the masses'? We'll be the judge of that @ The Register
- Become Very Unpopular Very Fast With This DIY EMP Generator @ Hack a Day
Subject: Cases and Cooling, Memory | August 3, 2015 - 08:10 PM | Scott Michaud
Tagged: corsair, dd4, ddr3l, memory, PSU, hydro, h100, H100i GTX, H110, H110i GTX
Skylake is coming up, with rumors pointing to a release at Gamescom in Germany, which runs August 5th through August 9th. Beyond seeing the retail packaging, we are beginning to see companies open up about how their products relate to the new architecture and chipset.
Corsair put up a blog post a few days ago to explain how their memory, water coolers, and power supplies interact with Skylake and Z170. On the PSU side, nothing has changed since Haswell. In terms of memory, DDR3L is supported by Skylake on certain motherboards, but users should look to DDR4.
None of the above should be new information.
What might be new information, though, is that Skylake supports existing LGA-1150 cooler mounts. This means that the Corsair Hydro series of sealed CPU liquid coolers will support Skylake without modification. This is where Corsair's blog stops, but knowing Intel's typical release structure, the story likely will not change for Kaby Lake or Cannonlake either. All three architectures are expected to use the same socket, which should mean the cooler mounts stay the same too.
So your aftermarket cooler should have quite a bit of legs, even with the stock mounts.
Subject: Storage | July 28, 2015 - 12:41 PM | Allyn Malventano
Tagged: XPoint, non-volatile RAM, micron, memory, Intel
Everyone that reads SSD reviews knows that NAND Flash memory comes with advantages and disadvantages. The cost is relatively good as compared to RAM, and the data remains even with power removed (non-volatile), but there are penalties in the relatively slow programming (write) speeds. To help solve this, today Intel and Micron jointly launched a new type of memory technology.
XPoint (spoken 'cross point') is a new class of memory technology with some amazing characteristics: 10x the density of DRAM, plus 1000x the speed and, most importantly, 1000x the endurance of current NAND Flash technology.
128Gb XPoint memory dies, currently being made by Intel / Micron, are of a similar capacity to current generation NAND dies. This is impressive for a first generation part, especially since each die is physically smaller than a current gen NAND die of the same capacity.
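Note that "Gb" here means gigabits, so dividing by eight gives the capacity in gigabytes. The tiny sketch below does that conversion for raw capacity only (it ignores any spare area or overhead); one or two 128Gb dies work out to the 16GB-per-die figure cited for the Optane Memory modules covered above.

```python
GBIT_PER_DIE = 128  # first-generation XPoint die capacity quoted above, in gigabits
BITS_PER_BYTE = 8


def raw_capacity_gb(dies, gbit_per_die=GBIT_PER_DIE):
    """Raw capacity in gigabytes for a given number of dies (no spare area or overhead)."""
    return dies * gbit_per_die / BITS_PER_BYTE


print(raw_capacity_gb(1), raw_capacity_gb(2))  # 16.0 32.0
```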
Intel stated that the method used to store the bits is vastly different from what is being used in NAND flash memory today: the 'whole cell' properties change as a bit is being programmed, the fundamental physics involved is different, and the memory is writable in small amounts (NAND flash must be erased in large blocks). While they did not specifically state it, it looks to be phase change memory (*edit* at the Q&A Intel stated this is not Phase Change). The cost of this technology should end up falling somewhere between the cost of DRAM and NAND Flash.
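To make the "writable in small amounts" versus "erased in large blocks" distinction concrete, here is a toy Python model. Both classes, their names, and the tiny page and block sizes are hypothetical; the one-byte write granularity of the write-in-place medium is an assumption for illustration, not a confirmed XPoint specification. The NAND model enforces the two rules that matter: a page can only be programmed once per erase, and an erase always wipes the whole block.

```python
class ToyNandBlock:
    """Toy NAND erase block with hypothetical, tiny dimensions.

    A page can be programmed once per erase cycle, and erase always wipes
    the whole block, so changing one byte in place means rewriting every page.
    """

    def __init__(self, pages=4, page_size=8):
        self.page_size = page_size
        self.pages = [None] * pages  # None marks an erased (programmable) page
        self.erases = 0

    def program(self, page, data):
        if len(data) != self.page_size:
            raise ValueError("NAND programs whole pages")
        if self.pages[page] is not None:
            raise ValueError("page already programmed; erase the block first")
        self.pages[page] = bytes(data)

    def erase(self):
        self.pages = [None] * len(self.pages)
        self.erases += 1

    def naive_update_byte(self, page, offset, value):
        """Read-modify-erase-rewrite; returns how many bytes were reprogrammed."""
        snapshot = list(self.pages)
        if snapshot[page] is None:
            raise ValueError("cannot update an erased page")
        modified = bytearray(snapshot[page])
        modified[offset] = value
        snapshot[page] = bytes(modified)
        self.erase()
        rewritten = 0
        for index, data in enumerate(snapshot):
            if data is not None:
                self.program(index, data)
                rewritten += len(data)
        return rewritten


class ToyWriteInPlaceMedium:
    """Toy medium that overwrites data directly, with no separate erase step.
    The one-byte write granularity is an illustrative assumption."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def update_byte(self, offset, value):
        self.cells[offset] = value
        return 1  # only the target byte is written


nand = ToyNandBlock(pages=4, page_size=8)
for page in range(4):
    nand.program(page, bytes(8))
nand_bytes = nand.naive_update_byte(page=2, offset=3, value=0x42)

in_place = ToyWriteInPlaceMedium(size=32)
in_place_bytes = in_place.update_byte(offset=19, value=0x42)

print(nand_bytes, in_place_bytes, nand.erases)  # 32 1 1
```

Real SSD controllers avoid this naive read-erase-rewrite by redirecting the write to a fresh page and remapping it, but the block erase still has to happen later during garbage collection. A medium that can be rewritten in place skips that step entirely.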
3D XPoint memory is already being produced at the Intel / Micron Flash Technology plant at Lehi, Utah. We toured this facility a few years ago.
Intel and Micron stated that this technology is coming very soon. 2016 was stated as a launch year, and there was a wafer shown to us on stage:
You know I'm a sucker for good wafer / die photos. As soon as this session breaks I'll get a better shot!
There will be more analysis to follow on this exciting new technology, but for now I need to run to a Q&A meeting with the engineers who worked on it. Feel free to throw some questions in the comments and I'll answer what I can!
*edit* - here's a die shot:
Added note - this wafer was manufactured on a 20nm process, and consists of a 2-layer matrix. Future versions should scale with additional layers to achieve higher capacities.
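The 'cross-point' name describes the layout: each cell sits where a word line crosses a bit line, and the 2-layer matrix mentioned above stacks two such grids. Below is a hypothetical Python model of that addressing idea. The class names, dimensions, and flat-index mapping are assumptions for illustration, not Intel/Micron's actual decoder, and the model ignores the per-cell selector entirely. It shows how adding a layer doubles capacity without shrinking the features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellAddress:
    layer: int
    word_line: int
    bit_line: int


class ToyCrossPointArray:
    """Toy stacked cross-point array. Every cell sits where one word line crosses
    one bit line, and each layer (deck) repeats that grid. Dimensions are hypothetical."""

    def __init__(self, layers, word_lines, bit_lines):
        self.layers = layers
        self.word_lines = word_lines
        self.bit_lines = bit_lines
        self.cells = [[[0] * bit_lines for _ in range(word_lines)] for _ in range(layers)]

    @property
    def capacity_bits(self):
        return self.layers * self.word_lines * self.bit_lines

    def decode(self, index):
        """Split a flat bit index into (layer, word line, bit line)."""
        if not 0 <= index < self.capacity_bits:
            raise IndexError(index)
        layer, within_layer = divmod(index, self.word_lines * self.bit_lines)
        word_line, bit_line = divmod(within_layer, self.bit_lines)
        return CellAddress(layer, word_line, bit_line)

    def write(self, index, bit):
        a = self.decode(index)
        # Selecting exactly one word line and one bit line in one layer addresses a single cell.
        self.cells[a.layer][a.word_line][a.bit_line] = bit

    def read(self, index):
        a = self.decode(index)
        return self.cells[a.layer][a.word_line][a.bit_line]


one_deck = ToyCrossPointArray(layers=1, word_lines=4, bit_lines=8)
two_deck = ToyCrossPointArray(layers=2, word_lines=4, bit_lines=8)
print(one_deck.capacity_bits, two_deck.capacity_bits)  # 32 64
print(two_deck.decode(45))  # CellAddress(layer=1, word_line=1, bit_line=5)
```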
Subject: Graphics Cards | May 19, 2015 - 03:51 PM | Jeremy Hellstrom
Tagged: memory, high bandwidth memory, hbm, Fiji, amd
Ryan and the rest of the crew here at PC Perspective are excited about AMD's new memory architecture and the fact that they will be first to market with it. However, as any intelligent reader knows, a second opinion on the topic is worth seeking out. Look no further than The Tech Report, who have also been briefed on AMD's new memory architecture. Read on to see what they learned from Joe Macri, along with their thoughts on HBM as the successor to GDDR5 and on HBM2, which is already in the works.
"HBM is the next generation of memory for high-bandwidth applications like graphics, and AMD has helped usher it to market. Read on to find out more about HBM and what we've learned about the memory subsystem in AMD's next high-end GPU, code-named Fiji."
Here are some more Graphics Card articles from around the web:
- AMD HBM High Bandwidth Memory Technology Unveiled @ [H]ard|OCP
- Diamond Wireless Video Stream HD 1080P HDMI @ eTeknix
- KFA2 GeForce GTX 980 ‘8Pack Edition’ 4096MB @ Kitguru
- Gigabyte GTX 960 OC 2 GB @ techPowerUp
- GeForce GTX TITAN X Video Card Review @ Hardware Secrets
High Bandwidth Memory
UPDATE: I have embedded an excerpt from our PC Perspective Podcast that discusses the HBM technology that you might want to check out in addition to the story below.
The chances are good that if you have been reading PC Perspective or almost any other website that focuses on GPU technologies for the past year, you have read the acronym HBM. You might have even seen its full name: high bandwidth memory. HBM is a new technology that aims to turn the way a processor (GPU, CPU, APU, etc.) accesses memory upside down, almost literally. AMD has already publicly stated that its next generation flagship Radeon GPU will use HBM as part of its design, but it wasn’t until today that we could talk about what HBM actually offers to a high performance processor like Fiji. At its core, HBM drastically changes how the memory interface works, how much power it requires, and what metrics we will use to compare competing memory architectures. AMD and its partners started working on HBM with the industry more than 7 years ago, and with the first retail product nearly ready to ship, it’s time to learn about HBM.
We got some time with AMD’s Joe Macri, Corporate Vice President and Product CTO, to talk about AMD’s move to HBM and how it will shift the direction of AMD products going forward.
The first step in understanding HBM is to understand why it’s needed in the first place. Current GPUs, including the AMD Radeon R9 290X and the NVIDIA GeForce GTX 980, utilize a memory technology known as GDDR5. This architecture has scaled well over the past several GPU generations but we are starting to enter the world of diminishing returns. Balancing memory performance and power consumption is always a tough battle; just ask ARM about it. On the desktop component side we have much larger power envelopes to work inside but the power curve that GDDR5 is on will soon hit a wall, if you plot it far enough into the future. The result will be either drastically higher power consuming graphics cards or stalling performance improvements of the graphics market – something we have not really seen in its history.
While it’s clearly possible that current and maybe even next generation GPU designs could still have depended on GDDR5 as the memory interface, the move to a different solution is needed for the future; AMD is just making the jump earlier than the rest of the industry.
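Peak memory bandwidth is simply bus width multiplied by the per-pin data rate. The sketch below applies that formula to the two GDDR5 cards named above and to a single first-generation HBM stack, using their publicly listed configurations (512-bit at 5 Gb/s for the R9 290X, 256-bit at 7 Gb/s for the GTX 980, and a 1024-bit interface at 1 Gb/s per pin for one HBM1 stack). These are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Peak theoretical bandwidth in GB/s: bus width times per-pin data rate, divided by 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


configs = {
    "R9 290X (GDDR5, 512-bit @ 5 Gb/s)": (512, 5.0),
    "GTX 980 (GDDR5, 256-bit @ 7 Gb/s)": (256, 7.0),
    "One first-gen HBM stack (1024-bit @ 1 Gb/s)": (1024, 1.0),
}
for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")
```

The comparison shows the trade HBM makes: it gets its bandwidth from a very wide bus running at a low per-pin rate, rather than from pushing GDDR5 per-pin rates (and the power that goes with them) ever higher.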
Subject: Storage, Shows and Expos | September 10, 2014 - 03:34 PM | Allyn Malventano
Tagged: TSV, Through Silicon Via, memory, idf 2014, idf
If you're a general computer user, you might have never heard the term "Through Silicon Via". If you geek out on photos of chip dies and wafers, and how chips are assembled and packaged, you might have heard about it. Regardless of your current knowledge of TSV, it's about to be a thing that impacts all of you in the near future.
Let's go into a bit of background first. We're going to talk about how chips are packaged. Micron has an excellent video on the process here:
The part we are going to focus on appears at 1:31 in the above video:
This is how chip dies are currently connected to the outside world. The dies are stacked (four high in the above pic) and a machine has to individually wire them to a substrate, which in turn communicates with the rest of the system. As you might imagine, things get more complex with this process as you stack more and more dies on top of each other:
16 layer die stack, pic courtesy NovaChips
...so we have these microchips with extremely small features, but to connect them we are limited to a relatively bulky process (called package-on-package). Stacking these flat planes of storage is a tricky thing to do, and you would naturally want to limit how many of those wires need to be connected. The catch is that those wires also determine the available throughput of the device (i.e. one wire per bit of a data bus). So, just how can we improve this method and increase data bus widths, throughput, etc.?
Before I answer that, let me lead up to it by showing how flash memory has just taken a leap in performance. Samsung has recently made the jump to VNAND:
By stacking flash memory cells vertically within a die, Samsung was able to make many advances in flash memory, simply because they had more room within each die. Because of the complexity of the process, they also had to revert to an older (larger) feature size. That compromise meant that each die's capacity is similar to current 2D NAND, but the new process brings advantages in speed, longevity, and power consumption.
I showed you the VNAND example because it bears a striking resemblance to what is now happening in the area of die stacking and packaging. Imagine if you could stack dies by punching holes straight through them and making the connections directly through the bottom of each die. As it turns out, that's actually a thing:
Subject: General Tech, Motherboards, Memory | July 6, 2014 - 03:53 AM | Scott Michaud
Tagged: overclocking, memory, gigabyte
About a week ago, HWBOT posted a video of a new DDR3 memory clock record, which was apparently beaten the very next day after the video was published. Tom's Hardware reported on the first of the two, allegedly performed by Gigabyte on their Z97X-SOC Force LN2 motherboard. The Tom's Hardware article also erroneously lists the 2nd place overclock (then 1st place) at 4.56 GHz, when the actual clock was half that (2.28 GHz): DDR (double data rate) memory transfers data on both the rising and falling edges of the clock, so its effective data rate is twice its real clock speed. This team posted their video with a recording of the overclock being measured by an oscilloscope, which serves as evidence that the result they submitted to HWBOT was genuine.
The now first place team, which managed 2.31 GHz on the same motherboard, did not go to the same level of proof, as far as I can tell.
This is the 2nd fastest overclock...
... but the fastest recorded with an oscilloscope, as far as I can tell
Before the machine crashes to a blue screen, the oscilloscope actually reports 2.29 GHz. I am not sure why they took 10 MHz off, but I expect it is because the system crashed before HWBOT was able to record that higher frequency. Either way, 2.28 GHz was a new world record, verified by video, whether or not it was immediately beaten.
Tom's Hardware also claims that liquid nitrogen was used to cool the system, which explains why they would use an LN2 board. It could have been chosen just for its overclocking features, but that would have been a weird tradeoff: the LN2 board doesn't have mounting points for a CPU air or water cooler, so those extra features would have been offset by the need to build a custom CPU cooler that did not use liquid nitrogen. It is also unclear how the memory was cooled, whether it was somehow cooled with liquid nitrogen too or simply exposed to the air.
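The 4.56 GHz versus 2.28 GHz mix-up comes down to a factor of two: DDR moves data on both clock edges, so the effective data rate (in megatransfers per second) is double the actual I/O clock. A minimal Python sketch of the conversion in both directions:

```python
def ddr_clock_mhz(data_rate_mt_s):
    """Convert an effective DDR data rate (MT/s) to the actual I/O clock (MHz).
    DDR transfers data on both the rising and falling clock edges: 2 transfers per cycle."""
    return data_rate_mt_s / 2


def ddr_data_rate_mt_s(clock_mhz):
    """Inverse conversion: actual I/O clock (MHz) to effective data rate (MT/s)."""
    return clock_mhz * 2


print(ddr_clock_mhz(4560))       # 2280.0, i.e. the 2.28 GHz record clock
print(ddr_data_rate_mt_s(2310))  # 4620, the effective rate of the 2.31 GHz result
```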
Subject: Memory, Storage | June 4, 2014 - 11:15 AM | Sebastian Peak
Tagged: ssd, solid state drive, pcie, pci-e ssd, memory, M.2, ddr4, computex 2014, computex, adata, 2tb ssd
ADATA has been showing off some upcoming products at Computex, and it's all about memory.
We'll begin with an upcoming line of PCIe Enterprise/Server SSDs powered by the SandForce SF3700-series controller. We've been waiting for products with the SF3700 controller since January, when ADATA showed a prototype board at CES, and ADATA is now showcasing the controller in the "SR1020" series drives.
The first is a 2TB 2.5" drive, but the interface was not announced (and the sample on the floor appeared to be an empty shell). The listed specs are performance up to 1800MB/s and 150K IOPS, with the drive powered by the SF-3739 controller. Support for both AHCI and NVMe is also listed, along with the usual TRIM, NCQ, and SMART support.
Another 2TB SSD was shown with exactly the same specs as the 2.5" version, but this one is built on the M.2 spec. The drive will connect via 4 lanes of Gen 2 PCI Express. Both drives in ADATA's SR1020 PCIe SSD lineup will be available in capacities from 240GB - 2TB, and retail pricing and availability is forthcoming.
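As a sanity check on the quoted 1800MB/s figure, a PCIe 2.0 lane signals at 5 GT/s with 8b/10b line coding, which leaves 500 MB/s of raw payload bandwidth per lane in each direction. This sketch computes the raw ceiling of the x4 Gen 2 link only; packet and protocol overhead, which it ignores, eat into the remaining headroom.

```python
def pcie2_raw_mb_s(lanes):
    """Raw one-direction PCIe 2.0 bandwidth in MB/s: 5 GT/s per lane, 8b/10b line coding."""
    megatransfers_per_lane = 5000
    payload_megabits = megatransfers_per_lane * 8 / 10
    return lanes * payload_megabits / 8


link = pcie2_raw_mb_s(4)
rated = 1800  # SR1020 sequential figure quoted above
print(f"{link:.0f} MB/s raw link, rated speed uses {rated / link:.0%} of it")
```

So the rated speed sits at about 90% of what a x4 Gen 2 link can carry before overhead, which suggests the interface rather than the controller will be the practical limit for sequential transfers.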
Continuing the memory theme, ADATA also showed new DDR4 modules in commodity and enthusiast flavors. Both registered DIMMs on display (an ultra-low profile DIMM was also shown) had standard DDR4 specs of 2133MHz at 1.2V, but ADATA also showed some performance DDR4 at their booth.
A pair of XPG Z1 DDR4 modules in action
No pricing or availability just yet on these products.
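The "2133MHz" figure on the standard DDR4 modules is really the DDR4-2133 data rate of 2133 MT/s, with the I/O clock running at half that because DDR transfers on both clock edges. Peak bandwidth per DIMM follows from that rate and the 64-bit data bus, as this short sketch shows (ECC bits on registered/ECC DIMMs carry no user data, so they are excluded).

```python
def dimm_peak_mb_s(data_rate_mt_s, data_bus_bits=64):
    """Peak DIMM bandwidth in MB/s: transfers per second times bytes per transfer.
    Uses the 64-bit data bus; ECC bits on registered/ECC DIMMs carry no user data."""
    return data_rate_mt_s * data_bus_bits / 8


print(dimm_peak_mb_s(2133))  # 17064.0
```

That roughly 17,000 MB/s figure is where the PC4-17000 module designation for DDR4-2133 comes from.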