Breaking: Intel and Micron announce 3D XPoint Technology - 1000x Faster Than NAND

Subject: Storage | July 28, 2015 - 12:41 PM |
Tagged: XPoint, non-volatile RAM, micron, memory, Intel

Everyone who reads SSD reviews knows that NAND flash memory comes with advantages and disadvantages. Its cost is relatively good compared to RAM, and data remains even with power removed (non-volatile), but it pays a penalty in relatively slow programming (write) speeds. To help solve this, Intel and Micron today jointly launched a new type of memory technology.


XPoint (pronounced 'cross point') is a new class of memory technology with some amazing characteristics: 10x the density of DRAM and, compared to current NAND flash technology, 1000x the speed and, most importantly, 1000x the endurance.


The 128Gb XPoint memory dies currently being made by Intel / Micron are of similar capacity to current-generation NAND dies. This is impressive for a first-generation part, especially since an XPoint die is physically smaller than a current-gen NAND die of the same capacity.

Intel stated that the method used to store bits is vastly different from what is used in NAND flash memory today: the properties of the 'whole cell' change as a bit is programmed, the fundamental physics involved is different, and the memory is writable in small amounts (NAND flash must be erased in large blocks). While they did not specifically state it, it looked to be phase change memory (*edit* at the Q&A Intel stated this is not Phase Change). The cost of this technology should end up falling somewhere between that of DRAM and NAND Flash.


3D XPoint memory is already being produced at the Intel / Micron Flash Technology plant in Lehi, Utah. We toured this facility a few years ago.

Intel and Micron stated that this technology is coming very soon: 2016 was given as the launch year, and a wafer was shown to us on stage:


You know I'm a sucker for good wafer / die photos. As soon as this session breaks I'll get a better shot!

There will be more analysis to follow on this exciting new technology, but for now I need to run to a Q&A meeting with the engineers who worked on it. Feel free to throw some questions in the comments and I'll answer what I can!

*edit* - here's a die shot:


Added note - this wafer was manufactured on a 20nm process, and consists of a 2-layer matrix. Future versions should scale with additional layers to achieve higher capacities.

Press blast after the break.

SANTA CLARA, Calif. & BOISE, Idaho--(BUSINESS WIRE)--Intel Corporation and Micron Technology, Inc. today unveiled 3D XPoint™ technology, a non-volatile memory that has the potential to revolutionize any device, application or service that benefits from fast access to large sets of data. Now in production, 3D XPoint technology is a major breakthrough in memory process technology and the first new memory category since the introduction of NAND flash in 1989.

The explosion of connected devices and digital services is generating massive amounts of new data. To make this data useful, it must be stored and analyzed very quickly, creating challenges for service providers and system builders who must balance cost, power and performance trade-offs when they design memory and storage solutions. 3D XPoint technology combines the performance, density, power, non-volatility and cost advantages of all available memory technologies on the market today. The technology is up to 1,000 times faster and has up to 1,000 times greater endurance than NAND, and is 10 times denser than conventional memory.

“For decades, the industry has searched for ways to reduce the lag time between the processor and data to allow much faster analysis,” said Rob Crooke, senior vice president and general manager of Intel’s Non-Volatile Memory Solutions Group. “This new class of non-volatile memory achieves this goal and brings game-changing performance to memory and storage solutions.”

“One of the most significant hurdles in modern computing is the time it takes the processor to reach data on long-term storage,” said Mark Adams, president of Micron. “This new class of non-volatile memory is a revolutionary technology that allows for quick access to enormous data sets and enables entirely new applications.”

As the digital world quickly grows – from 4.4 zettabytes of digital data created in 2013 to an expected 44 zettabytes by 20204 – 3D XPoint technology can turn this immense amount of data into valuable information in nanoseconds. For example, retailers may use 3D XPoint technology to more quickly identify fraud detection patterns in financial transactions; healthcare researchers could process and analyze larger data sets in real time, accelerating complex tasks such as genetic analysis and disease tracking.

The performance benefits of 3D XPoint technology could also enhance the PC experience, allowing consumers to enjoy faster interactive social media and collaboration as well as more immersive gaming experiences. The non-volatile nature of the technology also makes it a great choice for a variety of low-latency storage applications since data is not erased when the device is powered off.

New Recipe, Architecture for Breakthrough Memory Technology

Following more than a decade of research and development, 3D XPoint technology was built from the ground up to address the need for non-volatile, high-performance, high-endurance and high-capacity storage and memory at an affordable cost. It ushers in a new class of non-volatile memory that significantly reduces latencies, allowing much more data to be stored close to the processor and accessed at speeds previously impossible for non-volatile storage.

The innovative, transistor-less cross point architecture creates a three-dimensional checkerboard where memory cells sit at the intersection of word lines and bit lines, allowing the cells to be addressed individually. As a result, data can be written and read in small sizes, leading to faster and more efficient read/write processes.

More details about 3D XPoint technology include:

  • Cross Point Array Structure – Perpendicular conductors connect 128 billion densely packed memory cells. Each memory cell stores a single bit of data. This compact structure results in high performance and high-density bits.
  • Stackable – In addition to the tight cross point array structure, memory cells are stacked in multiple layers. The initial technology stores 128Gb per die across two memory layers. Future generations of this technology can increase the number of memory layers, in addition to traditional lithographic pitch scaling, further improving system capacities.
  • Selector – Memory cells are accessed and written or read by varying the amount of voltage sent to each selector. This eliminates the need for transistors, increasing capacity while reducing cost.
  • Fast Switching Cell – With a small cell size, fast switching selector, low-latency cross point array and fast write algorithm, the cell is able to switch states faster than any existing non-volatile memory technology today.
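Conceptually, the cell-addressing scheme described above can be sketched in a few lines. This is a toy model only (the class, method names, and array sizes are invented for illustration); the real selector physics and voltage scheme are not public:

```python
# Toy model of one cross point array layer: each cell sits at the
# intersection of a word line and a bit line, so one (word_line, bit_line)
# pair selects exactly one cell -- no per-cell transistor required.

class CrossPointLayer:
    def __init__(self, word_lines, bit_lines):
        # A 2D grid of single-bit cells.
        self.cells = [[0] * bit_lines for _ in range(word_lines)]

    def write_bit(self, word_line, bit_line, value):
        # Cells are written individually; unlike NAND, no block erase
        # is needed before programming.
        self.cells[word_line][bit_line] = value & 1

    def read_bit(self, word_line, bit_line):
        return self.cells[word_line][bit_line]

layer = CrossPointLayer(4, 4)
layer.write_bit(2, 3, 1)
print(layer.read_bit(2, 3))  # -> 1
```

Stacking more such layers per die, plus lithographic pitch scaling, is how future generations would grow capacity beyond the initial two-layer, 128Gb design.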

3D XPoint technology will sample later this year with select customers, and Intel and Micron are developing individual products based on the technology.

Source: Intel

July 28, 2015 | 12:50 PM - Posted by Jack Pearson (not verified)

What is the read speed? What is the write speed? Numbers.

July 28, 2015 | 01:00 PM - Posted by Allyn Malventano

Well if you compare with typical NAND numbers, that works out to ~20 GB/sec write speed and 200GB/sec read speed (per die).
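The arithmetic behind that estimate works out like this; note the per-die NAND throughput baselines are rough assumptions, not published figures:

```python
# Back-of-envelope check of the '1000x' claim against typical per-die NAND
# throughput. The NAND baseline figures are rough assumptions, not specs.
nand_write_mb_s = 20    # assumed ~20 MB/s program throughput per NAND die
nand_read_mb_s = 200    # assumed ~200 MB/s read throughput per NAND die
speedup = 1000

xpoint_write_gb_s = nand_write_mb_s * speedup / 1000  # MB/s -> GB/s
xpoint_read_gb_s = nand_read_mb_s * speedup / 1000

print(xpoint_write_gb_s, xpoint_read_gb_s)  # -> 20.0 200.0
```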

July 28, 2015 | 01:05 PM - Posted by aaguilar

I'm guessing that is theoretical speeds, right?


They did not mention anything about pricing?

July 28, 2015 | 02:43 PM - Posted by Allyn Malventano

'Between NAND and DRAM'

July 28, 2015 | 03:27 PM - Posted by aaguilar

wow I didn't saw that haha, thanks

July 28, 2015 | 02:44 PM - Posted by Simon Zerafa (not verified)

Hi Allyn,

So which PC interface would a storage device using this type of medium connect to, to get the best performance? :-)

Kind Regards

Simon Zerafa
--

July 28, 2015 | 02:51 PM - Posted by Allyn Malventano

PCIe or higher, but it could replace the DRAM / cache on a SATA SSD. 

July 28, 2015 | 12:56 PM - Posted by aaguilar

I think there is an error in the year "As the digital world quickly grows – from 4.4 zettabytes of digital data created in 2013 to an expected 44 zettabytes by 20204 "

July 28, 2015 | 01:07 PM - Posted by Jeremy Hellstrom

https://www.youtube.com/watch?v=RejrrkrRShM

July 28, 2015 | 12:59 PM - Posted by unacomn (not verified)

"expected 44 zettabytes by 20204 "

Then why the rush, we've got plenty of time to make new storage tech. Bring back IDE drives!

July 28, 2015 | 04:50 PM - Posted by Anonymous (not verified)

That's about when I expect phase change memory to come to market.

July 28, 2015 | 01:01 PM - Posted by Ashphalt (not verified)

I wonder if the new Intel 750 series ssd's had some of this new technology built into it?

July 28, 2015 | 02:03 PM - Posted by Ashphalt (not verified)

I guess it does in a way. The Intel 750 Series SSD uses Intel 20 nm 128 Gbit MLC NAND.
Multi Level Cell

July 28, 2015 | 02:45 PM - Posted by Allyn Malventano

This is radically different than what is in a 750. 

July 28, 2015 | 02:52 PM - Posted by Ashphalt (not verified)

What's in a 750 allyn?
I got that from guru3d http://www.guru3d.com/articles-pages/intel-750-nvme-1-2-tb-pcie-ssd-revi...

July 28, 2015 | 03:28 PM - Posted by Allyn Malventano

The 750 is planar 20nm NAND. 

July 28, 2015 | 01:03 PM - Posted by Anonymous (not verified)

It's about time! PCM was hinted at years ago!

The cost will be somewhere "between DRAM and NAND". So anywhere between $0.30/GB and $10.00/GB... haha

July 28, 2015 | 01:12 PM - Posted by Allyn Malventano

From the sounds of it, this will remain an in-between and is not meant to replace the recently announced IMFT 3D NAND.

July 28, 2015 | 01:13 PM - Posted by Socket69

With 10x the density, this sounds like something that could be placed on the cpu package the way HBM is on the Fury/FuryX?

July 28, 2015 | 01:43 PM - Posted by BillDStrong

You wouldn't want to use it like that, as it still has max write limits, and RAM is written in ways similar to what a defragmenter does to an SSD.

Now, that approach is what HP's The Machine is going for, but you notice they are talking up the need for a whole new OS to deal with that type of memory architecture.

July 28, 2015 | 05:22 PM - Posted by Anonymous (not verified)

If it has durability such that it would last more than 10 years under the use conditions, then it may be acceptable to integrate it in package or on board. Intel doesn't seem to be going for silicon interposer technology though. The HMC interconnect is meant to go through standard PCBs, so this would be a candidate for NVDIMM, although I would expect them to use the HMC interface. You probably still wouldn't want to treat it like DRAM though. I am not exactly sure what they are going to be using to access things like this and NVDIMM type things in general. Current systems can handle this by treating it as if it is a disk, but accessing it through the NVMe software stack for speed. Most systems are optimized to avoid disk read/write anyway, since it is so slow. That isn't the most efficient way of using this tech though. It would treat it like swap, which would limit writes only to pages which must be swapped out, but it would not make that good of use of the read speed. For applications with large uncacheable memory footprints (large hash tables, databases, etc) you would probably want direct read access without caching in DRAM.

There are a lot of applications where having a large, high-speed, but limited write memory system could produce significant speed-ups. Maybe we will get real-time ray-tracing and other things which could not be done on current memory architectures. If you have that high of speed random access then you may be able to write all of the data needed for a scene into this new memory and have fast enough random access to allow ray-tracing. Ray-tracing has been difficult since it requires random access to the entire scene data. There are a lot of other workstation/server applications which could be significantly accelerated by having such a large pool of random access memory. Anything using large hash tables would see significant benefits. Hash tables are often the only option to allow random access at a reasonable speed in large data structures. The problem with this is that hash tables are inherently uncacheable.

July 29, 2015 | 09:06 AM - Posted by BlackDove (not verified)

Intel is using EMIB instead of interposers for on package HMC.

July 29, 2015 | 02:10 PM - Posted by Anonymous (not verified)

Interesting. I had not heard of Embedded Multi-die Interconnect Bridge before. You don't need TSV for the die-to-die connections, but that doesn't actually seem to be an issue with silicon interposers. The interposer is made on an older process, so the cost of the interposer doesn't seem like it would be an issue. Not all of the micro-bumps on an interposer are used for interconnect. Intel obviously needs something to compete with silicon interposer technology. Using HMC stacks connected by standard board level interconnect is not going to compete with HBM on a silicon interposer. It will take significantly more power, more die area for the interfaces, and it would be more expensive. It could allow significantly more memory on the PCB, but that may only be a short term advantage. HMC interconnect was designed for running through a PCB though. With interconnect through these embedded interconnect bridge chips, the complicated serialized interface used by HMC stacks will be unnecessary.

This makes EMIB look like a stop-gap solution brought about in response to HBM on silicon interposers. They may need to cut out the serialized interfaces since they should be unnecessary with an embedded bridge die. This would make the resulting memory stacks almost the same as HBM. Intel always has to go their own way (not invented here mentality). This was looking like the situation with AMD K8 again. AMD had the more advanced system architecture for a while with on-die memory controllers and point-to-point serial interconnects; AMD64 helped also. Intel was forced to follow, but of course they had to push their own tech which was almost the same rather than just adopting existing standards. AFAIK, HMC isn't really meant to be used via a removable socket or slot. If it can be used for memory modules then it would be good for expandability; perhaps for connecting Intel's new non-volatile memory.

July 30, 2015 | 08:33 PM - Posted by BlackDove (not verified)

Intel could make their own interposers, probably a lot easier than anyone else, since they have plenty of fabs.

EMIB offers a second type of cost advantage: reconfigurability. Altera(now part of Intel) uses EMIB for their FPGA SiPs because you can quickly reconfigure a mixed process node and even mixed signal SiP with EMIB, rather than designing and fabbing a new interposer. Thats where you get HUGE cost savings.

And actually HMC has been in use in actual computers since 2014. Fujitsu Primehpc FX100 to be precise.

August 3, 2015 | 07:22 PM - Posted by Anonymous (not verified)

EMIB looks like an interesting technology, but it seems like you are exaggerating the differences. The current silicon interposers are passive devices built on an old process tech. They have a comparatively small number of layers since they do not have any transistors. They have TSVs and a few layers of metal interconnect. These interconnects are also probably very large compared to the process tech; that is, they probably could have produced the same device on an even larger process node. This doesn't seem like that much of a design challenge to me. Masks are probably significantly cheaper to make compared to masks for a cutting-edge process tech.

This EMIB tech has some flexibility, but if you want to change the interconnect, you still need to fab a new embedded interconnect chip, which means new masks and everything else associated with spinning a new piece of silicon, just like with a silicon interposer. Also, the "where needed" for the micro-bumps on EMIB doesn't seem to be particularly relevant. AFAIK, the micro-bumps are created over the entire wafer for the silicon interposer tech. The number of micro-bumps that could be used is very large; I have heard possibly in the hundreds of thousands. Most current products do not actually use that many of them to carry signals, but there is no reason not to create all of them. The bumps not used for signals still offer mechanical support and thermal conductivity. Using a large number of them would decrease yields, but right now, AMD seems to be using a relatively small number of them. The large number possible allows for redundancy; a defective bump can be routed around to some extent.

I am also suspicious that they would use HMC type memory with EMIB. HMC was designed to be routed through a board. HMC uses a high speed, narrow (8 or 16-bit), serialized interface. These interfaces will take a lot of die area and power compared to what is possible with silicon interposers and EMIB also. There would be little reason to run an HMC interface over an EMIB link. This means that we will probably get another interface that is almost the same as HBM for use with EMIB. It would be great if Intel could just adopt HBM, but unfortunately, they didn't invent it. I have seen your posts on HMC before, so I know that you are a proponent for some reason. Realistically, HBM and HMC are not direct competitors. I don't see HMC as a useful interface for EMIB type of connectivity unless EMIB is significantly worse than a silicon interposer. HMC is still a good replacement for board connected devices like lots of DRAM or NVRAM. If you have a lower-end system with an APU on an interposer or an EMIB package with a large amount of memory then there isn't going to be much reason to have fast external memory. A flash device on PCIe with NVMe would probably be sufficient. This may limit HMC to higher end systems that need a large amount of fast memory like servers and HPC systems.

July 30, 2015 | 04:17 PM - Posted by Jabbadap (not verified)

Hmm, where does Intel use HMC (are there server-side Xeon motherboards that use it)? Knights Landing has an HMC-like memory, a specialized nonstandard custom memory especially for the Xeon Phi platform.

July 30, 2015 | 08:35 PM - Posted by BlackDove (not verified)

Intel only currently has it in Knights Landing. Purley Xeons have some mystery memory in addition to DDR4 but i think thats going to be XPoint not HMC.

July 31, 2015 | 06:47 AM - Posted by Jabbadap (not verified)

Well, Knights Landing does not use pure HMC (link removed due to spam filter...):

Is this high-performance, on-package memory the same as HMC?
While leveraging the same fundamental technology benefits of HMC, this high-performance on-package memory has been optimized for integration into Knights Landing platforms.

Intel does have HMC implementations too, but I don't know whether any commercial products have been released in the market (HPC maybe?). After all, Micron and Intel collaborate quite closely on these techs.

July 31, 2015 | 12:30 PM - Posted by BlackDove (not verified)

I know its not plain HMC. Intel calls it MCDRAM in Knights Landing and uses the EMIB package. The underlying tech is HMC though and its not radically different than say Fujitsu's off package HMC.

Its on the package with the CPU and the CPU can address it either as really fast RAM or a massive cache depending on what the programmer needs.

Today, there is only Knights Landing, which is probably only in the hands of a few systems integrators for supercomputers at the moment, but the chips currently exist.

August 3, 2015 | 07:51 PM - Posted by Anonymous (not verified)

I would hope that the interface is radically different; the HMC interface makes no sense over such interconnect.

July 28, 2015 | 01:38 PM - Posted by Daniel Masterson (not verified)

That facility is just a short 20 minute drive from my house. I have even been there a couple of times. Really cool.

July 28, 2015 | 01:47 PM - Posted by Mike S. (not verified)

Sounds like this means the end of memory as we know it in a PC. Is DDR4 going to be the last generation of RAM?

July 28, 2015 | 02:47 PM - Posted by Allyn Malventano

1000x faster than NAND is still a good deal slower than DRAM. It's an excellent in-between solution, though. Think a hybrid SSD with a large amount of NAND and a small amount of XPoint.

July 28, 2015 | 04:32 PM - Posted by Mike S. (not verified)

I dunno. They're saying the access times are in nanoseconds, which is way faster than the milliseconds in latency that you see in DRAM.

July 28, 2015 | 05:42 PM - Posted by serpico (not verified)

DRAM latencies are measured in nanoseconds, not milliseconds.

July 28, 2015 | 05:50 PM - Posted by Anonymous (not verified)

DRAM access latencies are in nanoseconds. Looking at the Wikipedia article on memory latency, it indicates that the first word latency is on the order of 10 ns or so. An SSD might be around 0.1 ms random access latency. The difference between a millisecond (1 thousandth of a second) and a nanosecond (1 billionth of a second) is a factor of 1,000,000. Given these very rough calculations, we still have a factor of at least 10,000 between DRAM and flash memory. If you make a new type that is 1000x faster than flash, then it will still be at least 10 to 100x slower than DRAM. Since we are placing DRAM much closer to the processor with things like HBM and HMC, the need for system RAM is actually decreased significantly. If you have an APU with 16 or 32 GB of HBM or HMC DRAM, then would you still need to connect some DDR4 to such a chip? Probably not for many applications. If this makes it to market and is "as advertised" then it could change the memory hierarchy significantly. You may see DRAM integrated in/on package with the next level out being this new non-volatile storage, and then flash and/or disk. You wouldn't use memory modules as we currently do.
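The order-of-magnitude reasoning in that comment can be checked directly; the latency figures below are the same rough assumptions, not measured values:

```python
# Order-of-magnitude latency comparison using rough assumed figures.
dram_ns = 10          # ~10 ns first-word DRAM latency (assumed)
ssd_ns = 100_000      # ~0.1 ms NAND SSD random access = 100,000 ns (assumed)

print(ssd_ns / dram_ns)      # -> 10000.0 (NAND roughly 10,000x slower)

xpoint_ns = ssd_ns / 1000    # the claimed '1000x faster than NAND'
print(xpoint_ns / dram_ns)   # -> 10.0 (still ~10x slower than DRAM)
```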

July 29, 2015 | 09:52 AM - Posted by BlackDove (not verified)

I doubt that youll see it on APUs first. Youll see it replace burst buffers in supercomputers, memristors in The Machine, and added to the Purley Skylake E7 Xeons and probably Knights Hill as well.

Maybe by 2018-2020 youll have some of this on a PCI-E(or similar interface) card as a burst buffer for a PC but expect it to be very expensive.

Possibly, you could see hybrid DIMMs(already demonstrated with other flash) with this if the controllers on the CPU are there.

July 29, 2015 | 02:36 PM - Posted by Anonymous (not verified)

I was using APUs to illustrate the change in system architecture. Once we have APUs (or whatever you want to call CPUs with integrated GPUs) with HBM or HMC, then most systems will not really need DRAM modules as we currently use them. Next generation HBM will probably allow 16 GB or more to be integrated on package. A PCIe connected flash device accessed via NVMe would probably be fast enough for this change in system architecture, but flash would suffer from durability issues if used as swap in this manner.

If this new memory is in between DRAM and flash in cost, then I don't see why it would not be used to replace system memory modules. DRAM modules would be a waste to use with a processor that has large amounts of on package memory. The swap space would not need to be anywhere near as fast as current DRAM modules. Ideal solution would probably be both. You would have a processor with on package memory, and then a pool of both DRAM and non-volatile storage connected through a similar, if not the same, interface. With stacked memory chips, it would be easy to build a very large DRAM module. Perhaps we could have systems with a 64 GB DRAM module and a 64 GB NVRAM module, or just a combined write caching module (only write to NVRAM on power loss or system suspend). If this tech finally makes it to market, then it will allow a lot of changes to system architecture. It will take a bit for form factors and standards to settle down though, so buying on the bleeding edge may be risky.

July 30, 2015 | 10:39 AM - Posted by BlackDove (not verified)

The form factors and standards will probably vary greatly and like you said, take a long time to settle down.

For desktops, it might take a while for large amounts of DRAM to be on package, since most desktops dont need to be nearly as densely packaged as servers and supercomputers do.

The Skylake Xeon Purley platform will probably be the first time people see this stuff used and i dont know how they intend to package it.

Since each E7 Xeon will likely be able to address more than a terabyte of DRAM i would imagine that you could see SEVERAL TB of this per CPU in a server. Same thing with Xeon Phi. Knights Hill might have 16GB HMC on package, 384GB DDR4 DIMMs and a TB of this, if it all can be addressed efficiently.

For a desktop, id imagine that DIMMs like the UniDIMM will be around for a while as well, since not everyone needs the same amount of RAM. Having this on a hybrid non-ECC desktop DIMM might be too risky financially, so i think a PCI-E interface is likely, but it might saturate a lot of PCI-E lanes depending on the kind of bandwidth it can achieve.

In the next few years we will probably see a lot more work done on making CPU to RAM and NVRAM and CPU to GPU communication much higher bandwidth and lower latency, since thats really whats keeping exascale from being achievable sooner.

August 3, 2015 | 07:48 PM - Posted by Anonymous (not verified)

You seem to be an Intel person; I have seen you bring up HMC a lot, even where it is not really relevant to the topic. Why? Do you have some connection to HMC design? I could see HMC being relevant here if it can be used to connect xpoint memory. I don't know if we want xpoint soldered to the board yet, so a removable module would be best, but I don't know if the HMC interface is made to work through a socket or slot.

Regardless of what Intel is doing, if rumors are true, we may see a silicon interposer with many AMD Zen cores, a powerful GPU, and a lot of HBM in the next year or two. This could still be called an APU but it isn't going to be low-end. It is an HPC device. As I stated in a post above, I don't see HMC interfaces being used with EMIB. I post as anonymous since it seems to prevent flame wars to some extent. A personal attack on anonymous just looks stupid. Anyway, HMC interfaces would be a massive waste of power and die area if used via EMIB. EMIB seems like it would allow almost the exact same type of connections as a silicon interposer, which will favor wide, lower clocked interfaces. This is the exact opposite of the interface used for HMC. I suspect Intel will invent essentially an HBM clone for use over EMIB. It would still be stacked memory, but it would not really be HMC.

Also, I wouldn't ignore the NVidia and IBM partnership. A pascal gpu with HBM2 connected with a high speed interface to a Power architecture CPU will make a powerful HPC machine in a tiny space. With all of the things going on in the semiconductor industry, it seems like a lot of these companies are banding together to prevent Intel from becoming a total monopoly. With the cost of process tech, this was inevitable. There is a lot of disruptive technology coming out soon so there could be some big shake-ups.

August 3, 2015 | 08:26 PM - Posted by BlackDove (not verified)

Im totally buying GP100 for my desktop. That doesnt mean it will be better than Knights Landing in pre exascale supercomputers though.

As for EMIB, i was under the impression that the HMC based MCDRAM on Knights Landing is on EMIB. Is it not?

And i agree that both KNL and GP100 will make excellent supercomputers. The hundred million dollar contracts for Sierra, Summit and Aurora means the government must agree that BOTH architectures are good.

That being said, im pretty sure that Fujitsus SPARC XIfx and their Primehpc FX100 is CURRENTLY the best architecture for pre exascale.

I also think that the successor to NECs SX-ACE will be as well.

Theyre great in HPCG and the Japanese have a history of making the BEST supercomputers, if not always at the top of the increasingly irrelevant Top500.

Graph500 performance compared to Rpeak is pretty important and currently the ancient K and Earth Simulator 3(SX-ACE) are among the most efficient there.

Now im rambling, but i think that the Japanese are quietly so far ahead, that around 2020 theyll hit the USEFUL exaflop first.

Tadashi Watanabe will probably have a hand in making that happen and i predict that Fujitsu will be the integrator. I think Fujitsu is ahead of everyone else including NEC.

IF Fujitsu can get their TOFU to use even more optical links(its already partially) with silicon photonics then theyll beat Intel, Nvidia and AMD to exascale.

July 28, 2015 | 02:42 PM - Posted by Randal_46

How fast is this new prototype vs. present day DRAM?

July 28, 2015 | 06:08 PM - Posted by Anonymous (not verified)

Not as fast as DRAM, but damn close according to the Register article, and most likely first used for enterprise, HPC, and OEM phones and such devices, before the consumer maybe sees any affordable SSD-type products in the consumer market other than in OEM phone/tablet devices.

At least it may mean that those expensive enterprise SSDs made with only SLC NAND will become more affordable, having been upstaged in the professional markets by 3D XPoint memory.

I'd love to see consumer motherboards offered with this 3D XPoint in a motherboard module (replaceable), maybe 60GB of 3D XPoint to host the OS and the paging files. Maybe even integrated into DRAM modules alongside the DRAM dies, with maybe a little extra 3D XPoint as an extra storage buffer for allowing some faster staging of data that is about to be needed, and as a write-through backup for DRAM should there be a power interruption. Even 3D XPoint used as a faster cache level on SSDs paired with normal NAND (no TLC allowed) for improved boot-up and faster OS responsiveness.

It will be interesting to see how something like 3D XPoint and HBM can be utilized on systems to make everything less latency-bound, more responsive, and less affected by the occasional loss of power.

July 29, 2015 | 09:02 AM - Posted by BlackDove (not verified)

This is going to replace burst buffers in supercomputers.

July 29, 2015 | 05:43 PM - Posted by Anonymous (not verified)

Burst buffers in supercomputers, and maybe high end tablets and phones will get some 3D XPoint to supplement their RAM; it wouldn't hurt to have some 4GB DRAM based tablets with maybe 8 or more GB of XPoint to go along with 64GB and higher amounts of slower NAND. I'd love a tablet that could maybe have at least 4GB of DRAM, and 16GB of XPoint for the OS and paging files, with regular MLC NAND making up the rest of the storage pool, no TLC NAND please! For tablet devices most certainly XPoint cache mixed with regular NAND for at least the OS and paging files could support preemptive multitasking more efficiently, and XPoint is denser than DRAM but not as fast, not an unworkable tradeoff in any case for mobile devices benefiting from XPoint. As far as lasting longer than NAND flash, XPoint is word addressable and lacks the extra write-amplification wear and tear caused by needing the usual NAND flash block writes. So XPoint is happy with DRAM types of addressing, without reading whole blocks just to write back to that very same block a smaller amount of data than a full block size. The overall latency has to be much better also, with XPoint drivers/controllers not having to worry about the management of blocks, and the other hoops that NAND based controllers have to jump through just to store/retrieve data.

August 3, 2015 | 07:53 PM - Posted by Anonymous (not verified)

If it will be priced in between DRAM and flash, then I don't see why it wouldn't gain wider use rather quickly.

July 28, 2015 | 02:43 PM - Posted by Anonymous (not verified)

Damn, like magnetic core memory without the donuts. So long, TLC NAND, smell you later! I'm just wondering what states of matter they are using for storage: magnetic, spintronic, etc.

July 28, 2015 | 02:49 PM - Posted by Allyn Malventano

We don't have details, but it is a resistance changing material and will only be implemented in 1 bit per 'cell' in NAND terms - SLC only. 

July 29, 2015 | 03:06 PM - Posted by Anonymous (not verified)

A new article at The Register (7/29) goes into greater detail about just what Intel and Micron may be using and where they obtained some of the technology; some licensing source was mentioned. A lot of new technology and licensing comes from academic research licensing, and there may be some of that, and other related technology, in 3D XPoint.

July 29, 2015 | 03:15 PM - Posted by Anonymous (not verified)

"Peering closer at 3D XPoint memory: What are Intel, Micron up to?"(1)

(1)This is the name of the Register article, PCPER's filter is blocking the web link!

August 1, 2015 | 01:44 PM - Posted by Bill In Alabama

I also thought of core memory when I heard the description. Not losing the state of bits when there's no power can cause other problems: one of the first programs a programming student wrote in those days was to clear core. The computers I worked with back then had either 4K or 8K words of core. The facility where I worked had a supplier nearby that manufactured core memory; we called it the core house. It was a sad day for them when semiconductor memory took over.

July 28, 2015 | 03:00 PM - Posted by Jabbadap (not verified)

So basically they announced a flash memory, which they can put into NVDIMMs.

But they say "it's not electron based but material based", so phonons maybe?

July 28, 2015 | 03:29 PM - Posted by Allyn Malventano

Be careful with your use of 'flash' - that's not what this new technology is. It appears closer to phase change than it is to flash. Not photonics either. It's a resistance change. 

July 28, 2015 | 04:51 PM - Posted by Jabbadap (not verified)

Yeah, I should have said memory chips, not flash memory, which this XPoint memory isn't.

I don't think you would need to mess with photonics when you are dealing with phonons (uhh, it's been ages since I studied quantum mechanics, so I could very well be wrong). I kind of remember that phonons had something to do with materials and conductivity.

July 28, 2015 | 03:55 PM - Posted by BBMan (not verified)

Sounds to me like they are going to need a new bus.

July 28, 2015 | 04:48 PM - Posted by Anonymous (not verified)

Considering it is from Intel/Micron, I would expect it to use Hybrid Memory Cube style interconnect.

July 28, 2015 | 04:59 PM - Posted by MRFS (not verified)

Thanks for the timely coverage, Allyn.
I have no doubts that a very fast Non-Volatile RAM
will enable several revolutions in system design.
Think INSTANT-ON PCs: "switch ON Desktop appears"
(about as long as it took you to read those 4 words).

MRFS

July 28, 2015 | 06:38 PM - Posted by Frans (not verified)

Sounds like this is screaming for a new memory architecture where DRAM is demoted to cache and this new 'XRAM' becomes the main memory? Bye bye, SSD/flash...

July 29, 2015 | 02:50 PM - Posted by Anonymous (not verified)

If you already have a lot of DRAM on package, then there isn't much need for extra modules. This stuff would still have some durability limitations, though, so we may still see system memory modules with DRAM. Flash will probably still have a place, although it may not be directly in the memory hierarchy. That is, it may not be used as swap; it may end up as just mass storage. It was questionable to use it as swap anyway, considering the durability. They can scale the capacity of flash significantly by using 3D flash cells and chip stacking, so it may actually take over from hard drives eventually. They would need to optimize it for long-term storage in that case.

July 28, 2015 | 06:48 PM - Posted by John H (not verified)

The wording is curious

'10X denser than conventional memory', yet the 1000x claims specifically state 'NAND'.

This sounds like the density is 10x that of regular DRAM, not NAND flash.

Could it actually be less dense than NAND?

July 28, 2015 | 08:36 PM - Posted by Pixy Misa (not verified)

They've announced a 128Gbit die, so it's in the same density ballpark as NAND and well ahead of DRAM.

July 28, 2015 | 10:13 PM - Posted by Allyn Malventano

It does appear in the same ballpark, but these dies looked slightly larger than IMFT 20nm planar dies of the same capacity. Planar can't scale vertically like this can though.

July 30, 2015 | 01:38 AM - Posted by Master Chen (not verified)

NAND's faster than DRAM.
3DXP (at least in its current form; it may evolve even further with upcoming 10nm and 7nm shrinks) takes the same footprint as DRAM, but 3DXP's density is way higher than DRAM's: 1GB of DRAM = 9GB of 3DXP in the same footprint. Not only is this going to lower production costs significantly, but it also potentially paves the way for such monstrosities as 36GB video cards (professional solutions or not) and 512GB memory kits where just one memory module stick has a whopping 128GB on its board, WHILE also having quite low timings and operating at very low voltages.

July 30, 2015 | 01:51 AM - Posted by Master Chen (not verified)

Oops, it seems that I've 'dun goofed there a bit. Obviously NAND's not faster than DRAM. Sorry.

July 28, 2015 | 07:48 PM - Posted by pollcats38

so I think the warranty should be about forty years?

July 28, 2015 | 09:28 PM - Posted by Anonymous (not verified)

Intel/Micron did not answer the question regarding PCM. Here is their response:

"Relative to phase change, which has been in the market place before and which Micron itself has some experience with in the past again this is a very different architecture in terms of the place it fills in the memory hierarchy because it has these dramatic improvements in speed and volatility and performance"

July 28, 2015 | 10:16 PM - Posted by Allyn Malventano

The bit storage itself may be very similar to PCM (not confirmed), but Intel and Micron had to solve a lot of other problems inherent in the stuff we've been reading about in previous years. Those problems are what kept the previous methods from being viable. 

July 29, 2015 | 01:23 AM - Posted by Hakuren

Ahhh, the crucial word is 'announce'. Sadly, by the time it gets mainstream, you, Allyn, and I will probably be 65. :D

Seriously though, I can only hope for a speedy release when you look at the specs, but I remember when HAMR drives were hyped for release around 2015, and here in 2015 it's still only laboratory testing. Still, it looks exceptionally promising, as we really need to send classic SSDs and HDDs into the dustbin of history where they belong. And if the price and capacity are right, it may be the final nail in the HDD coffin, and HAMR technology will become more or less irrelevant.

I hope they start with normal capacities right off the bat, not like it was with SSDs at 16, 32, 64, and then more GB.

BTW: Out of curiosity, who comes up with all these funky names? V-NAND, 3D XPoint. There must be a whole department of eggheads focused only on catchy names. LOL

July 31, 2015 | 12:28 PM - Posted by Allyn Malventano

Given IMFT's track record, it's a safe bet we will see this memory appearing in products within a year or two. 

July 29, 2015 | 08:59 AM - Posted by BlackDove (not verified)

This must have been the new non volatile memory they were talking about introducing with the Skylake Xeon Purley platform and probably Knights Hill as well.

July 29, 2015 | 06:18 PM - Posted by MRFS (not verified)

Let's try to utilize a current product comparison,
to keep our feet on the ground e.g.:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820231801&Tpk=N82E...

However, if you know where to look, G.Skill are making
DDR4 that clocks at 3666 MHz x 8 ~= 30,000 MB/second
(3666 x 8 = 29,328):

http://www.tomshardware.com/news/gskill-fastest-ddr4-memory-3666mhz,2908...
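That arithmetic can be sanity-checked in a couple of lines (using the same transfer rate and bus width cited above):

```python
# Peak DDR4 bandwidth per channel: transfer rate (MT/s) x bus width (bytes).
transfers_per_sec = 3666  # DDR4-3666, in mega-transfers per second
bus_width_bytes = 8       # one 64-bit channel = 8 bytes per transfer
mb_per_sec = transfers_per_sec * bus_width_bytes
print(mb_per_sec)  # 29328, i.e. roughly 30,000 MB/second per channel
```

A 4-channel controller multiplies that single-channel figure accordingly, which is why the baseline is a conservative floor rather than a ceiling.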

Because this "bleeding edge" will eventually become
mainstream, as other DRAM vendors play catch-up,
we should be using a number like 30,000 MB/second
as a baseline for comparison purposes.

That raw bandwidth is certainly possible with a
modern Intel 4-channel integrated memory controller.

Now, how do Intel/Micron's 3D XPoint specs compare to that baseline?

MRFS

July 30, 2015 | 01:24 AM - Posted by Master Chen (not verified)

This announcement truly blew my friggin' brain off.
Most of the "modern" (C w0t I did THAR?) DRAM chips have 1GB of memory per-chip on their board, which in itself essentially consists of four 256MB modules (thus 1024MB).
According to 3DXP's info sheet, it's able to stack NINE 1GB modules on the same chip die as the one the "modern" DRAM utilizes, thus 9GB instead of 1GB on exactly same space. 9GB. On a footprint that current 1GB of DRAM takes. I cannot friggin' wait for first video cards and SSDs with this baby on board. Also...maybe it's just me, but...doesn't this essentially mean that ECC/non-ECC debacle is dead? Like, seriously. Absolutely. Completely. Totally. There is bound to be non-volatile options available for EVERYONE from this point on, and thus, as far as I can see it, ECC memory is basically killed off, paving the way for this technology, and since ECC is going to be dead, this can only mean that everyone is going to use this non-volatile solution in the near future. Because WHY NOT? 3DXP basically made ECC irrelevant. ECC is dead, in my opinion.

July 30, 2015 | 10:49 AM - Posted by BlackDove (not verified)

How does this technology mean that you no longer need ECC? Everything basically has some option for some type of ECC, from CPU cache to RAM to SSDs and HDDs.

Why wouldn't this itself be designed with ECC in mind, just like HMC and HBM? ECC is becoming MORE important than ever as densities increase. All DDR4 has some intrinsic ECC (bus CRC).

I'm really very confused by your statement, since ECC of some type is starting to get built into everything.

July 30, 2015 | 08:43 PM - Posted by Master Chen (not verified)

ECC existed as a viable solution up to this point ONLY because it provided error correction for server solutions. With 3DXP stepping into the playground, ECC is basically obsolete, dead. Because 3DXP NEVER fails. It simply does not. 100% non-volatile. And it's way, WAAAAY cheaper to produce, AND it's faster when it comes down to sheer performance. So yeah, in my opinion, it looks like 3DXP actually killed ECC.

July 31, 2015 | 12:20 AM - Posted by BlackDove (not verified)

Well, you're misinformed as to why ECC has been important and is getting MORE important, not less.

Nothing "never fails", and there are MANY types of ECC and many types of errors caused by many different factors, from SEUs caused by alpha decay of the package itself to cosmic ray air showers, electromagnetic interference and other types of radiation, or physical failures or damage to the silicon itself.

XPoint will likely have plenty of ECC capability, like SRAM, DRAM, HMC, HBM, flash, HDDs, optical discs, and networks of ALL types. Pretty much ANY part of a computer or network can have an ECC option, and as densities increase, whether with DRAM, XPoint, or optical interconnects, it all needs MORE ECC, since there's a lot more damage done by a single neutron or alpha particle.

Look into bit rot too, but that's another few paragraphs.

July 31, 2015 | 06:02 AM - Posted by Master Chen (not verified)

>Nothing "never fails"
Did you even read what I said previously? How many times do I have to repeat it? 3DXP is 100% non-volatile. It will retain all of its data at all times. This factor automatically kills ECC memory and also UPS devices. The only reason server operators still use ECC right now is automatic error correction, and LITERALLY the ONLY reason people buy that UPS crap is that they're afraid of losing crucial bits (pun intended) of data if the power suddenly goes down. Paranoiacs, both of these types of users. 3DXP completely eliminates that paranoia as a factor. With 3DXP you simply CAN'T screw up the data with a sudden electricity outage or power surge. All of the data will be in a 100% correct state the next time power returns to the device. Don't you see it?

Maybe there's a misunderstanding between us about what ECC stands for in those posts of mine. If so, I'll clarify: when I say "ECC" I solely mean "ECC memory", or rather ECC "memory kits". Basically, when I said "ECC" previously, all I meant was server RAM modules which have ECC on board. After all, that is what ECC is being used for these days, and that is exactly the type of memory product that's being killed off by 3DXP, in my opinion. Mainstream (non-server) RAM solutions, 99.99% of them, do NOT have ECC on board, if for some reason you don't know. It's funny to read how you say "any part of a computer can have ECC support", but when the memory your computer configuration uses simply doesn't have ECC on its board, it can't use ECC, so that "any part of a computer" really makes me smile.

P.S. I know perfectly well what "bit rot" is, as well as what "inevitable cell discharge" is. Don't act like you're the definite mister know-it-all here, because it's pretty clear to me that you're not.

July 31, 2015 | 12:18 PM - Posted by BlackDove (not verified)

I'm not acting like Mr. Know-It-All at all. What you're saying is just extremely misguided.

ECC has nothing to do with losing data stored in RAM because of power failures. Similarly, UPSs aren't ONLY there to prevent data loss from volatile memory losing power.

ECC is about detecting and correcting errors from several sources; subatomic particles flipping bits is one of the main ones. Electromagnetic radiation causes significant amounts of errors even at sea level.

Even the materials that RAM is packaged in undergo radioactive decay, releasing subatomic particles which flip bits in RAM. If that means a change in financial data, a single event upset (soft error) can cost millions of dollars. For an average PC user it might mean a BSOD or a game crashing, which is why most home users don't get ECC RAM.

ECC is actually used in storage media of all types, even in computers without ECC DRAM. Like I said, network protocols use it in different forms as well. You don't need ECC RAM to have error correction somewhere in your PC.

And saying that XPoint memory doesn't need ECC because it's nonvolatile indicates that you don't understand why ECC is used at all, or why error correction is currently used in nonvolatile flash.

It's because power loss is NOT why you use ECC, or TMR with voting, or SECDED, or rad-hard CPUs or FPGAs, or error correction in any of its forms. SEUs are the main reason.

July 31, 2015 | 11:18 PM - Posted by MRFS (not verified)

I'm going to weigh in here, just enough to be dangerous.

And, I much appreciate the superior comments made above
by people with much more hardware savvy than I will
ever have.

The first half of "error correction" is error detection.

Whether correction is possible depends entirely on the
mechanism chosen to do detection.

For example:
a simple parity check will tell you if a bit has
switched from 0 to 1, or from 1 to 0, but it won't
help you identify which bit switched.

Google what is a parity check?
and find:
"A parity bit, or check bit is a bit added to the end of a string of binary code that indicates whether the number of bits in the string with the value one is even or odd."

The objective of error "correction" is to identify
which bit or bits changed, and to reverse their value
so that the intended binary string is correct.

There are numerous ways in which "correction" can occur.

For example, with very small packets, like the
8b/10b legacy frame, that frame can simply be
re-transmitted with the correct 10 bits.

And, of course, the trailing "stop bit"
can operate as a parity bit (see above).

Here, do some homework to identify how PCIe 3.0
transmits a 128b/130b "jumbo frame" reliably
(1 start bit + 16 data bytes + 1 stop bit).

And, do the same to identify how USB 3.1
transmits a 128b/132b "jumbo frame" reliably.

What "error correction" logics are built into
PCIe 3.0 and USB 3.1?
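One piece of that homework is just arithmetic: the efficiency of each line encoding is payload bits over transmitted bits per frame.

```python
# Line-encoding efficiency: payload bits / transmitted bits per frame.
encodings = {
    "8b/10b (legacy frame)": (8, 10),
    "128b/130b (PCIe 3.0)": (128, 130),
    "128b/132b (USB 3.1)": (128, 132),
}
for name, (payload, total) in encodings.items():
    print(f"{name}: {payload / total:.1%} efficient")
# 8b/10b: 80.0%; 128b/130b: ~98.5%; 128b/132b: ~97.0%
```

The error-handling logic layered on top of those frames is a separate question from the encoding overhead itself.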

With longer binary strings, the mechanisms
required to "correct" 1 bit errors will
necessarily become more complex, and those
same mechanisms may only "detect" 2 bit errors
but be unable to correct both errors:

if 0...1 is not correct,
is 1...1 correct?
is 1...0 correct? or
is 0...0 correct?
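For contrast, a code with enough redundancy can locate and fix a single flipped bit rather than merely detect it. A minimal Hamming(7,4) sketch (illustrative; real memory controllers use wider codes of this family):

```python
# Hamming(7,4): 4 data bits -> 7-bit codeword; corrects any single-bit error.
def encode(d):
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                 # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4                 # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4                 # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    # Recompute each parity; the syndrome is the 1-based position
    # of the flipped bit (0 means no error detected).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1          # reverse the flipped bit
    return [c[2], c[4], c[5], c[6]]   # extract the data bits

data = [1, 0, 1, 1]
cw = encode(data)
cw[4] ^= 1                            # corrupt one bit in transit
assert decode(cw) == data             # the single-bit error is corrected
```

With one more parity bit (SECDED), the same idea also detects, but cannot correct, double-bit errors, which is the ambiguity the example above illustrates.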

I suspect it is premature, and unrealistic,
to credit this new technology with 100%
perfect reliability, particularly at this
very early stage when very few of us have
any 3D XPoint memory to work with.

p.s. If my 40+ years of experience have told me
anything, it is to avoid the mistake of assuming
marketing claims equal empirical performance
one hundred percent of the time.

See the film "Fail Safe" for a crude but
relevant analogy:
http://www.imdb.com/title/tt0058083/

MRFS

August 1, 2015 | 12:28 AM - Posted by BlackDove (not verified)

There are MANY different methods for correcting or detecting errors. The specific method depends on what the errors are being detected and corrected in (RAM, optical discs, networks, flash, etc.).

The specific method is unimportant in the context of this discussion, though.

The REASON error correction is used in the first place is the important thing. Data loss from losing power is NOT one of them, as that guy seems to think.

Neutrons and other subatomic particles are the main reason ECC is necessary in RAM. They have many sources: cosmic ray air showers, solar events, terrestrial radiation sources, and even radioactive decay of the materials in the RAM's packaging itself.

Like I said, a single flipped bit could cost millions or make a satellite fail. Space-based processors and RAM have even more elaborate methods for correcting errors caused by radiation.

A UPS similarly has more functions than preventing data loss from the fact that DRAM is volatile. Different tiers of datacenters and computing require higher levels of reliability and uptime.

Physical damage and stopping mission-critical operations in places where ECC and UPSs are used are unacceptable because of the downstream effects of even one minute of downtime.

Don't believe me? Read up on how mainframes and supercomputers (very different from each other) work, where entire CPUs, memory modules, or nodes can fail and operation continues seamlessly.

Some DATACENTERS are designed with TOTAL REDUNDANCY: two physical datacenters running as mirror images hundreds or thousands of miles apart to prevent ANY downtime.

Yeah, XPoint is non-volatile, but it certainly has built-in ECC capabilities, and it definitely doesn't mean the end of ECC, TMR, or any other error correcting method.

July 30, 2015 | 01:42 AM - Posted by Master Chen (not verified)

Installing a 20GB video game SOLELY on my video card, completely bypassing the HDD/SSD and everything? DO WANT.
I really hope this becomes a thing by 2018 or 2022. My Reggie is finally going to be body by that time, that's for sure.

August 3, 2015 | 08:06 PM - Posted by Anonymous (not verified)

That would be an unlikely use. If you have a 20 GB video game, that 20 GB is probably in a compressed format that must be expanded before it can be used, so keeping the compressed format in memory probably wouldn't be that useful. Anyway, in a few years I doubt there will be any separate video cards. We will have the entire system on a single small board: CPUs, GPUs, memory, and possibly a lot of other dies on the same package.

July 30, 2015 | 04:25 PM - Posted by BBMan (not verified)

USB 4.0 by 2020. Coincidence?