It has become increasingly apparent that flash memory die shrinks have hit a bit of a brick wall in recent years. The issues faced by the standard 2D Planar NAND process were apparent very early on. This was no real secret - here's a slide seen at the 2009 Flash Memory Summit:
Despite this, most flash manufacturers pushed the envelope as far as they could within the limits of 2D process technology, balancing shrinks with reliability and performance. One of the largest flash manufacturers was Intel, having joined forces with Micron in a joint venture dubbed IMFT (Intel Micron Flash Technologies). Intel remained in lock-step with Micron all the way up to 20nm, but chose to hold back at the 16nm step, presumably in order to shift full focus towards alternative flash technologies. This was essentially confirmed late last week, with Intel's announcement of a shift to 3D NAND production.
Intel's press briefing seemed to focus more on cost efficiency than performance, but after reviewing the very few specs they released about this new flash, I believe we can do some theorizing as to its potential performance. From the above illustration, you can see that Intel has chosen to go with the same sort of 3D technology used by Samsung - a 32-layer vertical stack of flash cells. This requires the use of an older / larger process technology, as it is too difficult to etch these holes at a 2x nm size. What keeps the die size reasonable is the 32x increase in bit density. Going off of a rough approximation from the above photo, imagine that 50nm die (8 Gbit), but with 32 vertical NAND layers. That would yield a 256 Gbit (32 GB) die within roughly the same footprint.
Representation of Samsung's 3D VNAND in 128Gbit and 86 Gbit variants.
20nm planar (2D) = yellow square, 16nm planar (2D) = blue square.
Image republished with permission from Schiltron Corporation.
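To make that density arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python; the planar die capacity and layer count are the article's rough approximations, not Intel/Micron specifications:

```python
# Rough density check: take an ~8 Gbit planar die (the ~50nm-class part in the
# photo) and stack 32 cell layers. Values are the article's approximations.
planar_die_gbit = 8
layers = 32

die_gbit = planar_die_gbit * layers
print(f"{die_gbit} Gbit per die = {die_gbit // 8} GB per die")  # 256 Gbit = 32 GB
```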
It's likely a safe bet that IMFT flash will be going for a cost/GB far cheaper than the competing Samsung VNAND, and going with a relatively large 256 Gbit (vs. VNAND's 86 Gbit) per-die capacity is a smart move there, but let's not forget that there is a catch - write speed. Most NAND is very fast on reads, but limited on writes. Shifting from 2D to 3D NAND netted Samsung a 2x speed boost per die, and another effective 1.5x speed boost due to their choice to reduce per-die capacity from 128 Gbit to 86 Gbit. This effective speed boost came from the fact that a given VNAND SSD has 50% more dies to reach the same capacity as an SSD using 128 Gbit dies.
Now let's examine how Intel's choice of a 256 Gbit die impacts performance:
- Intel SSD 730 240GB = 16x128 Gbit 20nm dies
- 270 MB/sec writes and ~17 MB/sec/die
- Crucial MX100 128GB = 8x128Gbit 16nm dies
- 150 MB/sec writes and ~19 MB/sec/die
- Samsung 850 Pro 128GB = 12x86Gbit VNAND dies
- 470MB/sec writes and ~40 MB/sec/die
If we do some extrapolation based on the assumption that IMFT's move to 3D will net the same ~2x write speed improvement seen by Samsung, combined with their die capacity choice of 256Gbit, we get this:
- Future IMFT 128GB SSD = 4x256Gbit 3D dies
- 40 MB/sec/die x 4 dies = 160MB/sec
Even rounding up to 40 MB/sec/die, we can see that the doubled die capacity effectively negates the performance improvement. While the IMFT flash equipped SSD will very likely be a lower cost product, it will (theoretically) see the same write speed limits seen in today's SSDs equipped with IMFT planar NAND. Now let's go one layer deeper on theoretical products and assume that Intel took the 18-channel NVMe controller from their P3700 Series and adapted it to a consumer PCIe SSD using this new 3D NAND. The larger die size limits the minimum capacity you can attain and still fully utilize their 18-channel controller, so with one die per channel, you end up with this product:
- Theoretical 18 channel IMFT PCIE 3D NAND SSD = 18x256Gbit 3D dies
- 40 MB/sec/die x 18 dies = 720 MB/sec
- 18x32GB (die capacity) = 576GB total capacity
Overprovisioning decisions aside, the above would be the lowest capacity product that could fully utilize the Intel PCIe controller. While the write performance is on the low side by PCIe SSD standards, the cost of such a product could easily be in the $0.50/GB range, or even less.
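For those who want to play with the assumptions, here is a minimal Python sketch of the extrapolation above; the ~40 MB/sec/die figure and the 256 Gbit die capacity are the article's estimates, not measured numbers:

```python
# Extrapolated write speed and capacity for hypothetical IMFT 3D NAND drives,
# using the article's assumed ~40 MB/s per die and 256 Gbit per die.
PER_DIE_WRITE_MBS = 40      # assumed per-die write speed after the 3D transition
DIE_CAPACITY_GBIT = 256     # Intel's announced per-die capacity

def drive_estimate(dies):
    capacity_gb = dies * DIE_CAPACITY_GBIT // 8
    write_mbs = dies * PER_DIE_WRITE_MBS
    return capacity_gb, write_mbs

for dies in (4, 18):        # 4-die entry drive vs. 18-channel PCIe drive
    cap, wr = drive_estimate(dies)
    print(f"{dies:2d} dies: ~{cap} GB, ~{wr} MB/sec writes")
```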
In summary, while we don't have any solid performance data, it appears that Intel's new 3D NAND is not likely to lead to a breakthrough in SSD speeds. Their choice of a more cost-effective per-die capacity, however, is likely to give them significant margins and the wiggle room to offer SSDs at a far lower cost/GB than we've seen in recent years. This may be the step that was needed to push SSD costs into a range that can truly compete with HDD technology.
Investigating the issue
** Edit ** (24 Sep)
We have updated this story with temperature effects on the read speed of old data. Additional info on page 3.
** End edit **
** Edit 2 ** (26 Sep)
New quote from Samsung:
"We acknowledge the recent issue associated with the Samsung 840 EVO SSDs and are qualifying a firmware update to address the issue. While this issue only affects a small subset of all 840 EVO users, we regret any inconvenience experienced by our customers. A firmware update that resolves the issue will be available on the Samsung SSD website soon. We appreciate our customer’s support and patience as we work diligently to resolve this issue."
** End edit 2 **
** Edit 3 **
The firmware update and performance restoration tool has been tested. Results are found here.
** End edit 3 **
Over the past week or two, there have been growing rumblings from owners of Samsung 840 and 840 EVO SSDs. A few reports scattered across internet forums gradually snowballed into lengthy threads as more and more people took a closer look at their own TLC-based Samsung SSD's performance. I've spent the past week following these threads, and the past few days evaluating this issue on the 840 and 840 EVO samples we have here at PC Perspective. This post is meant to inform you of our current 'best guess' as to just what is happening with these drives, and what you should do about it.
The issue at hand is an apparent slow down in the reading of 'stale' data on TLC-based Samsung SSDs. Allow me to demonstrate:
You might have seen what looks like similar issues before, but after much research and testing, I can say with some confidence that this is a completely different and unique issue. The old X25-M bug was the result of random writes to the drive over time, but the above result is from a drive that only ever saw a single large file written to a clean drive. The above drive was the very same 500GB 840 EVO sample used in our prior review. It did just fine in that review, and afterwards I needed a quick temporary place to put an HDD image file and just happened to grab that EVO. The file was written to the drive in December of 2013, and if it wasn't already apparent from the above HDTach pass, it was 442GB in size. This brings on some questions:
- If random writes (i.e. flash fragmentation) are not causing the slow down, then what is?
- How long does it take for this slow down to manifest after a file is written?
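For those who want to check their own drives while we dig into those questions, here is a rough sketch of the idea behind the HDTach pass above: read an old file front to back in large chunks and watch for regions that fall well below the drive's normal read speed. The file path and chunk size are placeholders, and this simple version does not bypass the OS cache, so it is only meaningful on files much larger than system RAM:

```python
# Read an old file sequentially and report per-region throughput, to spot
# "stale data" slowdowns. PATH and CHUNK are placeholder values.
import os
import time

PATH = "old_large_file.img"     # hypothetical file written months ago
CHUNK = 64 * 1024 * 1024        # sample throughput every 64 MB

with open(PATH, "rb", buffering=0) as f:
    size = os.fstat(f.fileno()).st_size
    offset = 0
    while offset < size:
        start = time.perf_counter()
        data = f.read(CHUNK)
        if not data:
            break
        elapsed = time.perf_counter() - start
        mb_per_s = len(data) / (1024 * 1024) / elapsed
        print(f"{offset / 1e9:7.2f} GB: {mb_per_s:6.1f} MB/s")
        offset += len(data)
```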
Since the introduction of the Haswell line of CPUs, the Internet has been aflame with how hot the CPUs run. Speculation ran rampant on the cause, with theories abounding about the smaller die surface area and inferior thermal interface material (TIM) between the CPU die surface and the underside of the CPU heat spreader. It was later confirmed that Intel had changed the TIM interfacing the CPU die surface to the heat spreader with Haswell, leading to the hotter-than-expected CPU temperatures. This increase in temperature led to inconsistent core-to-core temperatures as well as vastly inferior overclockability of the Haswell K-series chips compared to previous generations.
A few of the more adventurous enthusiasts took it upon themselves to use inventive ways to address the heat concerns surrounding the Haswell by delidding the processor. The delidding procedure involves physically removing the heat spreader from the CPU, exposing the CPU die. Some individuals choose to clean the existing TIM from the core die and heat spreader underside, applying superior TIM such as metal or diamond-infused paste or even the Coollaboratory Liquid Ultra metal material and fixing the heat spreader back in place. Others choose a more radical solution, removing the heat spreader from the equation entirely for direct cooling of the naked CPU die. This type of cooling method requires use of a die support plate, such as the MSI Die Guard included with the MSI Z97 XPower motherboard.
Whichever outcome you choose, you must first remove the heat spreader from the CPU's PCB. The heat spreader itself is fixed in place with black RTV-type material ensuring a secure and air-tight seal, protecting the fragile die from outside contaminants and influences. Removal can be done in multiple ways with two of the most popular being the razor blade method and the vise method. With both methods, you are attempting to separate the CPU PCB from the heat spreader without damaging the CPU die or components on the top or bottom sides of the CPU PCB.
When Magma Freezes Over...
Intel has confirmed that they approached AMD about access to their Mantle API. The discussion, despite being clearly labeled as "an experiment" by an Intel spokesperson, was initiated by Intel -- not AMD. According to AMD's Gaming Scientist, Richard Huddy, via PCWorld, AMD's response was, "Give us a month or two" and "we'll go into the 1.0 phase sometime this year," which only has about five months left in it. When the API reaches 1.0, anyone who wants to participate (including hardware vendors) will be granted access.
AMD inside Intel Inside???
I do wonder why Intel would care, though. Intel has the fastest per-thread processors, and their GPUs are not known to be workhorses that are held back by API call bottlenecks, either. That is not to say that I cannot see any reason at all, however...
The AMD Argument
Earlier this week, a story was posted on a Forbes.com blog that dove into the idea of NVIDIA GameWorks and how it was doing a disservice not just to the latest Ubisoft title, Watch_Dogs, but to PC gamers in general. Using quotes from AMD directly, the author claims that NVIDIA is actively engaging in methods to prevent game developers from optimizing games for AMD graphics hardware. This is an incredibly bold statement and one that I hope AMD is not making lightly. Here is a quote from the story:
Gameworks represents a clear and present threat to gamers by deliberately crippling performance on AMD products (40% of the market) to widen the margin in favor of NVIDIA products. . . . Participation in the Gameworks program often precludes the developer from accepting AMD suggestions that would improve performance directly in the game code—the most desirable form of optimization.
The example cited in the Forbes story is the recently released Watch_Dogs title, which appears to show favoritism towards NVIDIA GPUs, with the performance of the GTX 770 ($369) coming close to the performance of a Radeon R9 290X ($549).
It's evident that Watch Dogs is optimized for Nvidia hardware but it's staggering just how un-optimized it is on AMD hardware.
Watch_Dogs is the latest GameWorks title released this week.
I decided to get in touch with AMD directly to see exactly what stance the company was attempting to take with these kinds of claims. No surprise, AMD was just as forward with me as they appeared to be in the Forbes story originally.
The AMD Stance
Central to AMD’s latest annoyance with the competition is the NVIDIA GameWorks program. First unveiled last October during a press event in Montreal, GameWorks combines several NVIDIA-built engine functions into libraries that can be utilized and accessed by game developers to build advanced features into games. NVIDIA’s website claims that GameWorks is “easy to integrate into games” while also including tutorials and tools to help quickly generate content with the software set. Included in the GameWorks suite are tools like VisualFX, which offers rendering solutions like HBAO+, TXAA, Depth of Field, FaceWorks, HairWorks and more. Physics tools include the obvious, like PhysX, while also adding clothing, destruction, particles and more.
Taking it all the way to 12!
Microsoft has been developing DirectX for around 20 years now. Back in the 90s, the hardware and software scene for gaming was chaotic, at best. We had wonderful things like “SoundBlaster compatibility” and third-party graphics APIs such as Glide, S3G, PowerSGL, RRedline, and ATICIF. OpenGL was aimed more towards professional applications, and it took John Carmack and id, through GLQuake in 1996, to start the ball moving in that particular direction. There was a distinct need to provide standards across audio and 3D graphics that would de-fragment the industry and developers. DirectX was introduced with Windows 95, but the popularity of Direct3D did not really take off until DirectX 3.0, which was released in late 1996.
DirectX has had some notable successes, and some notable letdowns, over the years. DX6 provided a much needed boost in 3D graphics, while DX8 introduced the world to programmable shading. DX9 was the longest-lived version, thanks to it being the basis for the Xbox 360 console with its extended lifespan. DX11 added in a bunch of features and made programming much simpler, all the while improving performance over DX10. The low points? DX10 was pretty dismal due to the performance penalty on hardware that supported some of the advanced rendering techniques. DirectX 7 was around for a little more than a year before giving way to DX8. DX1 and DX2? Yeah, those were very unpopular and problematic, due to the myriad changes in a modern operating system (Win95) as compared to the DOS-based world that game devs were used to.
Some four years ago, going by what NVIDIA has said, initial talks began on the development of DirectX 12. DX11 was released in 2009 and has been an excellent foundation for PC games. It is not perfect, though. There is still a significant impact on potential performance due to a variety of factors, including a fairly inefficient hardware abstraction layer that relies more upon fast single-threaded performance from a CPU than leveraging the power of a modern multi-core/multi-thread unit. This has the result of limiting how many objects can be represented on screen, as well as different operations that would bottleneck even the fastest CPU threads.
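To put a rough number on that single-threaded overhead argument, here is a toy Python sketch; the per-call costs are made-up round figures for illustration, not measurements of DX11 or DX12:

```python
# If each draw call burns a fixed slice of CPU time on one thread, the number
# of distinct objects you can submit per frame is capped by the API overhead,
# no matter how fast the GPU is. Costs below are hypothetical.
FRAME_BUDGET_MS = 16.7                 # ~60 FPS frame budget
API_CALL_COST_MS = {
    "thick abstraction layer": 0.05,   # hypothetical high per-call CPU cost
    "thin abstraction layer": 0.005,   # hypothetical reduced per-call cost
}

for label, cost in API_CALL_COST_MS.items():
    max_calls = int(FRAME_BUDGET_MS / cost)
    print(f"{label}: ~{max_calls} draw calls per 60 FPS frame")
```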
Introduction and Background
Back in 2010, Intel threw a bit of a press thing for a short list of analysts and reviewers out at their IMFT flash memory plant at Lehi, Utah. The theme and message of that event was to announce 25nm flash entering mass production. A few years have passed, and 25nm flash is fairly ubiquitous, with 20nm rapidly gaining as IMFT scales production even higher with the smaller process. Last week, Intel threw a similar event, but instead of showing off a die shrink or even announcing a new enthusiast SSD, they chose to take a step back and brief us on the various design, engineering, and validation testing of their flash storage product lines.
At the Lehi event, I did my best to make off with a 25nm wafer.
Many topics were covered at this new event at the Intel campus at Folsom, CA, and over the coming weeks we will be filling you in on many of them as we take the necessary time to digest the fire hose of intel (pun intended) that we received. Today I'm going to lay out one of the more impressive things I saw at the briefings, and that is the process Intel goes through to ensure their products are among the most solid and reliable in the industry.
It wouldn’t be February if we didn’t hear the Q4 FY14 earnings from NVIDIA! NVIDIA does have a slightly odd way of expressing their quarters, but in the end it is all semantics. They are not in fact living in the future, but I bet their product managers wish they could peer into the actual Q4 2014. No, the whole FY14 thing relates back to when they made their IPO and how they started reporting. To us mere mortals, Q4 FY14 actually represents Q4 2013. Clear as mud? Lord love the Securities and Exchange Commission and their rules.
The past quarter was a pretty good one for NVIDIA. They came away with $1.144 billion in gross revenue and had a GAAP net income of $147 million. This beat the Street’s estimate by a pretty large margin. In response, NVIDIA’s stock rose in after-hours trading. This has certainly been a trying year for NVIDIA and the PC market in general, but they seem to have come out on top.
NVIDIA beat estimates primarily on the strength of the PC graphics division. Many were focusing on the apparent decline of the PC market and assumed that NVIDIA would be dragged down by lower shipments. On the contrary, it seems as though the gaming market and add-in sales on the PC helped to solidify NVIDIA’s quarter. We can look at a number of factors that likely contributed to this uptick for NVIDIA.
Why is there a bourbon review on a PC-centric website?
We can’t live, eat, and breathe PC technology all the time. All of us have outside interests that may not intersect with the PC and mobile market. I think we would be pretty boring people if that were the case. Yes, our professional careers are centered in this area, but our personal lives do diverge from the PC world. You certainly can’t drink a GPU, though I’m sure somebody out there has tried.
The bottle is unique to Wyoming Whiskey. The bourbon has a warm, amber glow about it as well. Picture courtesy of Wyoming Whiskey
Many years ago I became a beer enthusiast. I loved to sample different concoctions, I would brew my own, and I settled on some personal favorites throughout the years. Living in Wyoming is not necessarily conducive to sampling many different styles and types of beers, and so I was in a bit of a rut. A few years back a friend of mine bought me a bottle of Tomatin 12 year single malt scotch, and I figured this would be an interesting avenue to move down since I had tapped out my selection of new and interesting beers (Wyoming has terrible beer distribution).
The Really Good Times are Over
We really do not realize how good we had it. Sure, we could apply that to budget surpluses and the time before the rise of global terrorism, but in this case I am talking about the predictable advancement of graphics due to both design expertise and improvements in process technology. Moore’s law has been exceptionally kind to graphics. When we plot the course of these graphics companies, they have actually outstripped Moore in terms of transistor density from generation to generation. Most of this is due to better tools and the expertise gained in what is still a fairly new endeavor as compared to CPUs (the first true 3D accelerators were released in the 1993/94 timeframe).
The complexity of a modern 3D chip is truly mind-boggling. To get a good idea of where we came from, we must look back at the first generations of products that we could actually purchase. The original 3Dfx Voodoo Graphics was composed of a raster chip and a texture chip, each containing approximately 1 million transistors (give or take), made on the then-available .5 micron process (we shall call it 500 nm from here on out to give a sense of perspective with modern process technology). The chips were clocked between 47 and 50 MHz (though often could be clocked up to 57 MHz by going into the init file and putting in “SET SST_GRXCLK=57”… btw, SST stood for Sellers/Smith/Tarolli, the founders of 3Dfx). This revolutionary graphics card could push out 47 to 50 megapixels per second, had 4 MB of VRAM, and was released at the beginning of 1996.
My first 3D graphics card was the Orchid Righteous 3D. Voodoo Graphics was really the first successful consumer 3D graphics card. Yes, there were others before it, but Voodoo Graphics had the largest impact of them all.
In 1998 3Dfx released the Voodoo 2, and it was a significant jump in complexity from the original. These chips were fabricated on a 350 nm process. There were three chips to each card, one of which was the raster chip and the other two were texture chips. At the top end of the product stack were the 12 MB cards. The raster chip had 4 MB of VRAM available to it, while each texture chip had 4 MB of VRAM for texture storage. Not only did this product double performance from the Voodoo Graphics, it was able to run in single-card configurations at 800x600 (as compared to the max 640x480 of the Voodoo Graphics). This was around the same time that NVIDIA started to become a very aggressive competitor with the Riva TnT and ATI was about to ship the Rage 128.
A new generation of Software Rendering Engines.
We have been busy with side projects here at PC Perspective over the last year. Ryan has nearly broken his back rating the frames. Ken, along with running the video equipment and "getting an education", developed a hardware switching device for Wirecast and XSplit.
My project, "Perpetual Motion Engine", has been researching and developing a GPU-accelerated software rendering engine. Now, to be clear, this is in very early development at the moment. The point is not to draw beautiful scenes. Not yet. The point is to show what OpenGL and DirectX do and what limits are removed when you do the math directly.
Errata: BioShock uses a modified Unreal Engine 2.5, not 3.
In the above video:
- I show the problems with graphics APIs such as DirectX and OpenGL.
- I talk about what those APIs attempt to solve, finding color values for your monitor.
- I discuss the advantages of boiling graphics problems down to general mathematics.
- Finally, I prove the advantages of boiling graphics problems down to general mathematics.
I would recommend watching the video, first, before moving forward with the rest of the editorial. A few parts need to be seen for better understanding.
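As a very rough illustration of what "doing the math directly" means, here is a minimal Python/NumPy sketch that computes per-pixel color values with plain array math and writes them straight to an image file, with no graphics API involved. This is only an illustration of the concept, not the Perpetual Motion Engine itself:

```python
# Compute a framebuffer directly: shade a disc by distance from its centre and
# dump the result as a PPM image. No OpenGL or DirectX in the loop.
import numpy as np

W, H = 320, 240
y, x = np.mgrid[0:H, 0:W]
cx, cy, r = W / 2, H / 2, 80.0

dist = np.sqrt((x - cx) ** 2 + (y - cy) ** 2)         # distance from disc centre
shade = np.clip(1.0 - dist / r, 0.0, 1.0)             # 1.0 at centre, 0.0 outside

framebuffer = np.zeros((H, W, 3), dtype=np.uint8)
framebuffer[..., 0] = (shade * 255).astype(np.uint8)  # write the red channel

with open("disc.ppm", "wb") as f:
    f.write(b"P6\n%d %d\n255\n" % (W, H))
    f.write(framebuffer.tobytes())
```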
If Microsoft was left to their own devices...
Microsoft's Financial Analyst Meeting 2013 set the stage, literally, for Steve Ballmer's last annual keynote to investors. The speech promoted Microsoft, its potential, and its unique position in the industry. He proclaimed, firmly, their desire to be a devices and services company.
The explanation, however, does not befit either industry.
Ballmer noted, early in the keynote, how Bing is the only notable competitor to Google Search. He wanted to make it clear, to investors, that Microsoft needs to remain in the search business to challenge Google. The implication is that Microsoft can fill the cracks where Google does not, or even cannot, and establish a business from that foothold. I agree. Proprietary products (which are not inherently bad by the way), as Google Search is, require one or more rivals to fill the overlooked or under-served niches. A legitimate business can be established from that basis.
It is the following, similar, statement which troubles me.
Ballmer later mentioned, along the same vein, how Microsoft is among the few making fundamental operating system investments. Like search, the implication is that operating systems are proprietary products which must compete against one another. This, albeit subtly, does not match their vision as a devices and services company. The point of a proprietary platform is to own the ecosystem, from end to end, and to derive your value from that control. The product is not a device; the product is not a service; the product is a platform. This makes sense to them because, from birth, they were a company which sold platforms.
A platform as a product is not a device nor is it service.
Retiring the Workhorses
There is an inevitable shift coming. Honestly, this has been quite obvious for some time, but it has just taken AMD a bit longer to get here than many have expected. Some years back we saw AMD release their new motto, “The Future is Fusion”. While many thought it somewhat interesting and trite, it actually foreshadowed the massive shift from monolithic CPU cores to their APUs. Right now AMD’s APUs are doing “ok” in desktops and are gaining traction in mobile applications. What most people do not realize is that AMD will be going all APU all the time in the very near future.
We can look over the past few years and see that AMD has been headed in this direction for some time, but they simply have not had all the materials in place to make this dramatic shift. To get a better understanding of where AMD is heading, how they plan to address multiple markets, and what kind of pressures they are under, we have to look at the two major non-APU markets that AMD is currently hanging onto by a thread. In some ways, timing has been against AMD, not to mention available process technologies.
The Densest 2.5 Hours Imaginable
What do they want Origin to be?
GamesIndustry International conducted an interview with EA's Executive Vice President, Andrew Wilson, during this year's Electronic Entertainment Expo (E3 2013). Wilson was on the team which originally designed Origin, before marketing decided to write off all the DOS-era nostalgia they once held with PC gamers by recycling an old web address.
The service, itself, has also changed since the original project.
"Over the years ... there've been some permutations of that vision that have manifested as part of Origin," Wilson said. "I think what we've done is taken a step back and said 'Wow, we've actually done some really cool things with Origin.' It is by no means perfect, but we've done some pretty cool things. As you say, the plumbing is there. What can we do now to really think about Origin in the next generation?"
Fans of Sim City, who faithfully pre-ordered, will likely argue that Origin does not have enough sewage treatment at the end of their plumbing and the out-flow defecated all over their experience. A good service can be built atop the foundations of Origin; but, I have little confidence in their ability to realize that potential.
Wilson, on the other hand, believes they now "get it".
One assertion deals with customers who purchase more than one game. He argues that multiple update and online services are required and that this is a barrier for users who desire a second, third, or hundredth purchase thereafter. The belief is that Origin can create a single experience for users and remove that barrier to further purchases. In practice, Origin ends up being a bigger hurdle than a single game's service. It washes a bad faith over their entire library and fails to justify itself: games, such as Sim City, update on their own, and old titles still have their online services taken offline.
What it comes down to is a lack of focus. Wilson believes development of Origin was too focused on the transaction, and that led to bad faith, presumably because customers would smell the disingenuous salesman. Good Old Games (GOG), on the other hand, successfully focused on the transaction. The difference? GOG gets out of your way immediately after the transaction, leaving you with just the game plus the bonus pack-ins you ordered, not DRM and a redundant social network.
Steam is heavily focused as a service and that is where EA desires Origin to be. The problem? Valve has set a high bar for EA to contend with. Steam has built customer faith consistently, albeit not perfectly, over its life with its user-centric vision. Not only would EA need to be substantially better than Steam, it is fighting with a severe handicap from their history of shutting down gaming servers and threatening to delicense merchandise if their customers upset them.
A successful Origin will need to carefully consider what it wants to be and strive to win at that goal. While possible, they are still content to handicap themselves and, then, not own the results of their decisions.
OpenCL Support in a Meaningful Way
Adobe has had OpenCL support since last year. You would never benefit from its inclusion unless you ran one of two AMD mobility chips under Mac OS X Lion, but it was there. Creative Cloud, predictably, furthers this trend with additional GPGPU support for applications like Photoshop and Premiere Pro.
This leads to some interesting points:
- How OpenCL is changing the landscape between Intel and AMD
- What GPU support is curiously absent from Adobe CC for one reason or another
- Which GPUs are supported despite not... existing, officially.
This should be very big news for our readers who do production work whether professional or for a hobby. If not, how about a little information about certain GPUs that are designed to compete with the GeForce 700-series?
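If you are curious which OpenCL devices an application could even see on your system, a few lines of Python with the pyopencl package will enumerate them; note that whether Adobe CC actually uses a given device is governed by Adobe's own support list, not by what shows up here:

```python
# List every OpenCL platform and device the installed runtimes expose.
# Requires the pyopencl package and at least one OpenCL driver.
import pyopencl as cl

for platform in cl.get_platforms():
    print(f"Platform: {platform.name} ({platform.version})")
    for device in platform.get_devices():
        mem_mb = device.global_mem_size // (1024 ** 2)
        print(f"  Device: {device.name}, {mem_mb} MB global memory")
```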
A new era for computing? Or just a bit of catching up?
Early Tuesday, at 2am for viewers in eastern North America, Intel performed their Computex 2013 keynote to officially kick off Haswell. Unlike ASUS the night prior, Intel did not announce a barrage of new products; the purpose was to promote future technologies and the new products of their OEM and ODM partners. In all, a pretty wide variety of topics was discussed.
Intel carried on with the computational era analogy: the '80s were dominated by mainframes; the '90s were predominantly client-server; and the 2000s brought the internet to the forefront. While true, they did not explicitly mention how each era never actually died but rather just bled through: we still use mainframes, especially with cloud infrastructure; we still use client-server; and just about no one would argue that the internet has been displaced, despite its struggle against semi-native apps.
Intel believes that we are currently in the two-in-one era, by which they probably mean "multiple-in-one" given devices such as the ASUS Transformer Book Trio. They created a tagline, almost a mantra, illustrating their vision:
"It's a laptop when you need it; it's a tablet when you want it."
But before elaborating, they wanted to discuss their position in the mobile market. They believe they are becoming a major player with key design wins and by outperforming some incumbent systems-on-a-chip (SoCs). The upcoming Silvermont architecture aims to fill in the gaps below Haswell, driving smartphones and tablets and stretching upward to include entry-level notebooks and all-in-one PCs. The architecture promises to scale from roughly three times the performance of the previous generation to a fifth of the power at equivalent performance.
Ryan discussed Silvermont last month; be sure to give his thoughts a browse for more depth.
Good effort goes a long way
The wait has been long and anxious for Heart of the Swarm, the expansion to 2010's StarCraft 2: Wings of Liberty. Blizzard originally hinted at a very rapid release schedule which did not exactly come to fruition. The nearly three years of development time for Heart of the Swarm is longer than a single studio spends on a full Call of Duty title; although, one could make a very credible argument that a Blizzard expansion requires more effort to create than said complete Call of Duty title.
But as Duke Nukem Forever demonstrated, a long time in development does not guarantee a fully baked product coming out the other end.
Blizzard games have always been highly entertaining albeit without deep artistic substance; their games are not first on the list for a university literature syllabus. But, there is a lot of room in life for engaging entertainment. In terms of the PC, Blizzard has always been one of the leading developers for the platform; they know how to deliver an exceptional PC experience if they choose to.
Taking a Fresh Look at GLOBALFOUNDRIES
It has been a while since we last talked about GLOBALFOUNDRIES, and it is high time to do so. So why the long wait between updates? Well, I think the long and short of it is a lack of execution on their stated roadmaps from around 2009 on. When GF first came on the scene, they had a very aggressive roadmap about where their process technology would be and how it would be implemented. I believe that GF first mentioned a working 28 nm process in an early 2011 timeframe. There was a lot of excitement in some corners, as people expected next generation GPUs to be available around then using that process node.
Fab 1 is the facility where all 32 nm SOI and most 28 nm HKMG are produced.
Obviously GF did not get that particular process up and running as expected. In fact, they had some real issues getting 32 nm SOI running in a timely manner. Llano was the first product GF produced on that particular node, along with plenty of test wafers of Bulldozer parts. Both were delayed from when they were initially expected to hit, and both had fabrication issues. Time and money can fix most things when it comes to process technology, and eventually GF was able to solve what issues they had on their end. 32 nm SOI/HKMG is now producing like gangbusters. AMD has also improved their designs on their end to make things a bit easier for GF.
While shoring up the 32 nm process was of extreme importance to GF, it seemingly took resources away from further developing the 28 nm and below processes. While work was still being done on these processes, the roadmap was far too aggressive for what they were able to accomplish. The hits just kept coming, though. AMD cut back on 32nm orders, which had a financial impact on both companies. It was cheaper for AMD to renegotiate the contract and take a penalty rather than order chips that it simply could not sell. GF then had lots of line space open on 32 nm SOI (Dresden) that could not be filled. AMD then voided another contract, in which they suffered a larger penalty, by opting to potentially utilize a second source for 28 nm HKMG production of their CPUs and APUs. AMD was obviously very uncomfortable about where GF was with their 28 nm process.
During all of this time, GF was working to get their Luther Forest FAB 8 up and running. Building a new FAB is no small task. This is a multi-billion dollar endeavor, and any new FAB design will have complications. Happily for GF, the development of this FAB has seemingly gone according to plan, achieving every major milestone in construction and deployment. Still, the risks involved with a FAB that could reach $8 billion or more are immense.
2012 was not exactly the year that GF expected, or hoped for. It was tough on them and their partners. They also had more expenses such as acquiring Chartered back in 2009 and then acquiring the rather significant stake that AMD had in the company in the first place. During this time ATIC has been pumping money into GF to keep it afloat as well as its aspirations at being a major player in the fabrication industry.
Taking an Accurate Look at SSD Write Endurance
Last year, I posted a rebuttal to a paper describing the future of flash memory as ‘bleak’. The paper went to great (and convoluted) lengths to paint a tragic picture of flash memory endurance moving forward. Yesterday a newer paper hit Slashdot – this one doing just the opposite, and going as far as to assume production flash memory handling up to 1 million erase cycles. You’d think that since I’m constantly pushing flash memory as a viable, reliable, and super-fast successor to Hard Disks (aka 'Spinning Rust'), I’d just sit back on this one and let it fly. After all, it helps make my argument! Well, I can’t, because if there are errors published on a topic so important to me, it’s in the interest of journalistic integrity that I must now post an equal and opposite rebuttal to this one – even if it works against my case.
First I’m going to invite you to read through the paper in question. After doing so, I’m now going to pick it apart. Unfortunately I’m crunched for time today, so I’m going to reduce my dissertation into the form of some simple bulleted points:
- Max data write speed did not take into account 8b/10b encoding, meaning 6Gb/sec = 600MB/sec, not 750MB/sec (see the quick check after this list).
- The flash *page* size (8KB) and block sizes (2MB) chosen more closely resemble that of MLC parts (not SLC – see below for why this is important).
- The paper makes no reference to Write Amplification.
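Here is the quick check on that first point, as a couple of lines of Python:

```python
# SATA's 6 Gb/s line rate uses 8b/10b encoding, so only 8 of every 10 bits on
# the wire are payload. That caps the interface at 600 MB/s, not 750 MB/s.
line_rate_bps = 6e9
payload_bytes_per_sec = line_rate_bps * (8 / 10) / 8
print(f"Max payload: {payload_bytes_per_sec / 1e6:.0f} MB/s")
```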
Perhaps the most glaring and significant issue is that all of the formulas, while correct, fail to consider the most important factor when dealing with flash memory writes – Write Amplification.
Before getting into it, I'll reference the excellent graphic that Anand put in his SSD Relapse piece:
SSD controllers combine smaller writes into larger ones in an attempt to speed up the effective write speed. This falls flat once all flash blocks have been written to at least once. From that point forward, the SSD must play musical chairs with the data on each and every small write. In a bad case, a single 4KB write turns into a 2MB write. For that example, Write Amplification would be a factor of 500, meaning the flash memory is cycled at 500x the rate calculated in the paper. Sure that’s an extreme example, but the point is that without referencing amplification at all, it is assumed to be a factor of 1, which would only be the case if you were only writing 2MB blocks of data to the SSD. This is almost never the case, regardless of Operating System.
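Here is that worst-case example worked through in Python, along with what it does to a cycle budget; the P/E rating used below is an illustrative MLC-class figure, not a number from the paper:

```python
# Worst-case write amplification: a 4KB host write that forces a full 2MB
# flash block to be rewritten, and its effect on the endurance budget.
HOST_WRITE_KB = 4
BLOCK_KB = 2 * 1024                    # 2MB flash block

write_amplification = BLOCK_KB / HOST_WRITE_KB
print(f"Write amplification: {write_amplification:.0f}x")   # 512x (~500 as rounded above)

# Any endurance formula that assumes 1:1 host-to-flash writes understates wear
# by exactly this factor.
rated_pe_cycles = 3000                 # illustrative MLC rating (assumption)
effective_cycles = rated_pe_cycles / write_amplification
print(f"Worst-case host-visible cycles: ~{effective_cycles:.1f}")
```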
After posters on Slashdot called out the author on his assumptions of rated P/E cycles, he went back and added two links to justify his figures. The problem is that the first link points to a 2005 data sheet for 90nm SLC flash. Samsung’s 90nm flash was 1Gb per die (128MB). The packages were available with up to 4 dies each, and scaling up to a typical 16-chip SSD, that only gives you an 8GB SSD. Not very practical. That’s not to say 100k is an inaccurate figure for SLC endurance. It’s just a really bad reference to use, is all. Here's a better one from the Flash Memory Summit a couple of years back:
The second link was a 2008 PR blast from Micron, based on their proposed pushing of the 34nm process to its limits. “One Million Write Cycles” was nothing more than a tag line for an achievement accomplished in a lab under ideal conditions. That figure was never reached in anything you could actually buy in a SATA SSD. A better reference would be from that same presentation at the Summit:
This shows larger process nodes hitting even beyond 1 million cycles (given sufficient additional error bits used for error correction), but remember it has to be something that is available and in a usable capacity to be practical for real world use, and that’s just not the case for the flash in the above chart.
At the end of the day, manufacturers must balance cost, capacity, and longevity. This forces a push towards smaller processes (for more capacity per cost), with the limit being how much endurance they are willing to give up in the process. In the end they choose based on what the customer needs. Enterprise use leans towards SLC or eMLC, as those customers are willing to spend more for the gain in endurance. Typical PC users get standard MLC and now even TLC, which are *good enough* for that application. It's worth noting that most SSD failures are not due to burning out all of the available flash P/E cycles. The vast majority are due to infant mortality failures of the controller or to buggy firmware. I've never written enough to any single consumer SSD (in normal operation) to wear out all of the flash. The closest I've come to a flash-related failure was when an ioDrive failed during testing because excessive heat caused a solder pad to lift on one of the flash chips.
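To see how those trade-offs play out, here is a small endurance estimate in the spirit of the corrected assumptions above; every figure is a placeholder assumption rather than a vendor specification:

```python
# Rough drive-lifetime estimate that accounts for write amplification instead
# of assuming every host write maps 1:1 to flash writes. All values assumed.
capacity_gb = 256
rated_pe_cycles = 3000            # assumed MLC rating
write_amplification = 3.0         # a plausible steady-state WA, not the worst case
host_writes_gb_per_day = 50       # heavy desktop workload assumption

total_host_writes_gb = capacity_gb * rated_pe_cycles / write_amplification
lifetime_years = total_host_writes_gb / host_writes_gb_per_day / 365
print(f"~{total_host_writes_gb / 1024:.0f} TB of host writes, "
      f"roughly {lifetime_years:.0f} years at {host_writes_gb_per_day} GB/day")
```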
All of this said, I’d love to see a revisit to the author’s well-structured paper – only based on the corrected assumptions I’ve outlined above. *That* is the type of paper I would reference when attempting to make *accurate* arguments for SSD endurance.