All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
Zen vs. 40 Years of CPU Development
Zen is nearly upon us. AMD is releasing its next generation CPU architecture to the world this week and we saw CPU demonstrations and upcoming AM4 motherboards at CES in early January. We have been shown tantalizing glimpses of the performance and capabilities of the “Ryzen” products that will presumably fill the desktop markets from $150 to $499. I have yet to be briefed on the product stack that AMD will be offering, but we know enough to start to think how positioning and placement will be addressed by these new products.
To get a better understanding of how Ryzen will stack up, we should probably take a look back at what AMD has accomplished in the past and how Intel has responded to some of the stronger products. AMD has been in business for 47 years now and has been a major player in semiconductors for most of that time. It really has only been since the 90s where AMD started to battle Intel head to head that people have become passionate about the company and their products.
The industry is a complex and ever-shifting one. AMD and Intel have been two stalwarts over the years. Even though AMD has had more than a few challenging years over the past decade, it still moves forward and expects to compete at the highest level with its much larger and better funded competitor. 2017 could very well be a breakout year for the company with a return to solid profitability in both CPU and GPU markets. I am not the only one who thinks this considering that AMD shares that traded around the $2 mark ten months ago are now sitting around $14.
AMD Through 1996
AMD became a force in the CPU industry due to IBM’s requirement to have a second source for its PC business. Intel originally entered into a cross licensing agreement with AMD to allow it to produce x86 chips based on Intel designs. AMD eventually started to produce their own versions of these parts and became a favorite in the PC clone market. Eventually Intel tightened down on this agreement and then cancelled it, but through near endless litigation AMD ended up with a x86 license deal with Intel.
AMD produced their own Am286 chip that was the first real break from the second sourcing agreement with Intel. Intel balked at sharing their 386 design with AMD and eventually forced the company to develop its own clean room version. The Am386 was released in the early 90s, well after Intel had been producing those chips for years. AMD then developed their own version of the Am486 which then morphed into the Am5x86. The company made some good inroads with these speedy parts and typically clocked them faster than their Intel counterparts (eg. Am486 40 MHz and 80 MHz vs. the Intel 486 DX33 and DX66). AMD priced these points lower so users could achieve better performance per dollar using the same chipsets and motherboards.
Intel released their first Pentium chips in 1993. The initial version was hot and featured the infamous FDIV bug. AMD made some inroads against these parts by introducing the faster Am486 and Am5x86 parts that would achieve clockspeeds from 133 MHz to 150 MHz at the very top end. The 150 MHz part was very comparable in overall performance to the Pentium 75 MHz chip and we saw the introduction of the dreaded “P-rating” on processors.
There is no denying that Intel continued their dominance throughout this time by being the gold standard in x86 manufacturing and design. AMD slowly chipped away at its larger rival and continued to profit off of the lucrative x86 market. William Sanders III set the bar higher about where he wanted the company to go and he started on a much more aggressive path than many expected the company to take.
It always feels a little odd when covering NVIDIA’s quarterly earnings due to how they present their financial calendar. No, we are not reporting from the future. Yes, it can be confusing when comparing results and getting your dates mixed up. Regardless of the date before the earnings, NVIDIA did exceptionally well in a quarter that is typically the second weakest after Q1.
NVIDIA reported revenue of $1.43 billion. This is a jump from an already strong Q1 where they took in $1.30 billion. Compare this to the $1.027 billion of its competitor AMD who also provides CPUs as well as GPUs. NVIDIA sold a lot of GPUs as well as other products. Their primary money makers were the consumer space GPUs and the professional and compute markets where they have a virtual stranglehold on at the moment. The company’s GAAP net income is a very respectable $253 million.
The release of the latest Pascal based GPUs were the primary mover for the gains for this latest quarter. AMD has had a hard time competing with NVIDIA for marketshare. The older Maxwell based chips performed well against the entire line of AMD offerings and typically did so with better power and heat characteristics. Even though the GTX 970 was somewhat limited in its memory configuration as compared to the AMD products (3.5 GB + .5 GB vs. a full 4 GB implementation) it was a top seller in its class. The same could be said for the products up and down the stack.
Pascal was released at the end of May, but the company had been shipping chips to its partners as well as creating the “Founder’s Edition” models to its exacting specifications. These were strong sellers throughout the end of May until the end of the quarter. NVIDIA recently unveiled their latest Pascal based Quadro cards, but we do not know how much of an impact those have had on this quarter. NVIDIA has also been shipping, in very limited quantities, the Tesla P100 based units to select customers and outfits.
A Watershed Moment in Mobile
This previous May I was invited to Austin to be briefed on the latest core innovations from ARM and their partners. We were introduced to new CPU and GPU cores, as well as the surrounding technologies that provide the basis of a modern SOC in the ARM family. We also were treated to more information about the process technologies that ARM would embrace with their Artisan and POP programs. ARM is certainly far more aggressive now in their designs and partnerships than they have been in the past, or at least they are more willing to openly talk about them to the press.
The big process news that ARM was able to share at this time was the design of 10nm parts using an upcoming TSMC process node. This was fairly big news as TSMC was still introducing parts on their latest 16nm FF+ line. NVIDIA had not even released their first 16FF+ parts to the world in early May. Apple had dual sourced their 14/16 nm parts from Samsung and TSMC respectively, but these were based on LPE and FF lines (early nodes not yet optimized to LPP/FF+). So the news that TSMC would have a working 10nm process in 2017 was important to many people. 2016 might be a year with some good performance and efficiency jumps, but it seems that 2017 would provide another big leap forward after years of seeming stagnation of pure play foundry technology at 28nm.
Yesterday we received a new announcement from ARM that shows an amazing shift in thought and industry inertia. ARM is partnering with Intel to introduce select products on Intel’s upcoming 10nm foundry process. This news is both surprising and expected. It is surprising in that it happened as quickly as it did. It is expected as Intel is facing a very different world than it had planned for 10 years ago. We could argue that it is much different than they planned for 5 years ago.
Intel is the undisputed leader in process technologies and foundry practices. They are the gold standard of developing new, cutting edge process nodes and implementing them on a vast scale. This has served them well through the years as they could provide product to their customers seemingly on demand. It also allowed them a leg up in technology when their designs may not have fit what the industry wanted or needed (Pentium 4, etc.). It also allowed them to potentially compete in the mobile market with designs that were not entirely suited for ultra-low power. x86 is a modern processor technology with decades of development behind it, but that development focused mainly on performance at higher TDP ranges.
This past year Intel signaled their intent to move out of the sub 5 watt market and cede it to ARM and their partners. Intel’s ultra mobile offerings just did not make an impact in an area that they were expected to. For all of Intel’s advances in process technology, the base ARM architecture is just better suited to these power envelopes. Instead of throwing good money after bad (in the form of development time, wafer starts, rebates) Intel has stepped away from this market.
This leaves Intel with a problem. What to do with extra production capacity? Running a fab is a very expensive endeavor. If these megafabs are not producing chips 24/7, then the company is losing money. This past year Intel has seen their fair share of layoffs and slowing down production/conversion of fabs. The money spent on developing new, cutting edge process technologies cannot stop for the company if they want to keep their dominant position in the CPU industry. Some years back they opened up their process products to select 3rd party companies to help fill in the gaps of production. Right now Intel has far more production line space than they need for the current market demands. Yes, there were delays in their latest Skylake based processors, but those were solved and Intel is full steam ahead. Unfortunately, they do not seem to be keeping their fabs utilized at the level needed or desired. The only real option seems to be opening up some fab space to more potential customers in a market that they are no longer competing directly in.
The Intel Custom Foundry Group is working with ARM to provide access to their 10nm HPM process node. Initial production of these latest generation designs will commence in Q1 2017 with full scale production in Q4 2017. We do not have exact information as to what cores will be used, but we can imagine that they will be Cortex-A73 and A53 parts in big.LITTLE designs. Mali graphics will probably be the first to be offered on this advanced node as well due to the Artisan/POP program. Initial customers have not been disclosed and we likely will not hear about them until early 2017.
This is a big step for Intel. It is also a logical progression for them when we look over the changing market conditions of the past few years. They were unable to adequately compete in the handheld/mobile market with their x86 designs, but they still wanted to profit off of this ever expanding area. The logical way to monetize this market is to make the chips for those that are successfully competing here. This will cut into Intel’s margins, but it should increase their overall revenue base if they are successful here. There is no reason to believe that they won’t be.
The last question we have is if the 10nm HPM node will be identical to what Intel will use for their next generation “Cannonlake” products. My best guess is that the foundry process will be slightly different and will not provide some of the “secret sauce” that Intel will keep for themselves. It will probably be a mobile focused process node that stresses efficiency rather than transistor switching speed. I could be very wrong here, but I don’t believe that Intel will open up their process to everyone that comes to them hat in hand (AMD).
The partnership between ARM and Intel is a very interesting one that will benefit customers around the globe if it is handled correctly from both sides. Intel has a “not invented here” culture that has both benefited it and caused it much grief. Perhaps some flexibility on the foundry side will reap benefits of its own when dealing with very different designs than Intel is used to. This is a titanic move from where Intel probably thought it would be when it first started to pursue the ultra-mobile market, but it is a move that shows the giant can still positively react to industry trends.
First, Some Background
NVIDIA's Rumored GP102
When GP100 was announced, Josh and I were discussing, internally, how it would make sense in the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a dash of salt, that claimed NVIDIA was planning a second architecture, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claims to have been contacted by the developers of AIDA64, seemingly on-the-record, also citing a GP102 design.
I will retell chunks of the rumor, but also add my opinion to it.
In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive performance for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these were available in Tesla, Quadro, and GeForce cards, especially Titans.
Maxwell was interesting, though. NVIDIA was unable to leave 28nm, which Kepler launched on, so they created a second architecture at that node. To increase performance without having access to more feature density, you need to make your designs bigger, more optimized, or more simple. GM200 was giant and optimized, but, to get the performance levels it achieved, also needed to be more simple. Something needed to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.
Seeing Ryan transition from being a long-time Android user over to iOS late last year has had me thinking. While I've had hands on with flagship phones from many manufacturers since then, I haven't actually carried an Android device with me since the Nexus S (eventually, with the 4.0 Ice Cream Sandwich upgrade). Maybe it was time to go back in order to gain a more informed perspective of the mobile device market as it stands today.
So that's exactly what I did. When we received our Samsung Galaxy S7 review unit (full review coming soon, I promise!), I decided to go ahead and put a real effort forth into using Android for an extended period of time.
Full disclosure, I am still carrying my iPhone with me since we received a T-Mobile locked unit, and my personal number is on Verizon. However, I have been using the S7 for everything but phone calls, and the occasional text message to people who only has my iPhone number.
Now one of the questions you might be asking yourself right now is why did I choose the Galaxy S7 of all devices to make this transition with. Most Android aficionados would probably insist that I chose a Nexus device to get the best experience and one that Google intends to provide when developing Android. While these people aren't wrong, I decided that I wanted to go with a more popular device as opposed to the more niche Nexus line.
Whether you Samsung's approach or not, the fact is that they sell more Android devices than anyone else and the Galaxy S7 will be their flagship offering for the next year or so.
28HPCU: Cost Effective and Power Efficient
Have you ever been approached about something and upon first hearing about it, the opportunity just did not seem very exciting? Then upon digging into things, it became much more interesting? This happened to me with this announcement. At first blush, who really cares that ARM is partnering with UMC at 28 nm? Well, once I was able to chat with the people at ARM, it is much more interesting than initially expected.
The new hotness in fabrication is the latest 14 nm and 16 nm processes from Samsung/GF and TSMC respectively. It has been a good 4+ years since we last had a new process node that actually performed as expected. The planar 22/20 nm products just were not entirely suitable for mass production. Apple was one of the few to actually develop a part for TSMC’s 20 nm process that actually sold in the millions. The main problem was a lack of power and speed scaling as compared to 28 nm processes. Planar was a bad choice, but the development of FinFET technologies hadn’t been implemented in time for it to show up at this time by 3rd party manufacturers.
There is a problem with the latest process generations, though. They are new, expensive, and are production constrained. Also, they may not be entirely appropriate for the applications that are being developed. There are several strengths with 28 nm as compared. These are mature processes with an excess of line space. The major fabs are offering very competitive pricing structures for 28 nm as they see space being cleared up on the lines with higher end SOCs, GPUs, and assorted ASICs migrating to the new process nodes.
TSMC has typically been on the forefront of R&D with advanced nodes. UMC is not as aggressive with their development, but they tend to let others do some of the heavy lifting and then integrate the new nodes when it fits their pricing and business models. TSMC is on their third generation of 28 nm. UMC is on their second, but that generation encompasses many of the advanced features of TSMC’s 3rd generation so it is actually quite competitive.
Fighting for Relevance
AMD is still kicking. While the results of this past year have been forgettable, they have overcome some significant hurdles and look like they are improving their position in terms of cutting costs while extracting as much revenue as possible. There were plenty of ups and downs for this past quarter, but when compared to the rest of 2015 there were some solid steps forward here.
The company reported revenues of $958 million, which is down from $1.06 billion last quarter. The company also recorded a $103 million loss, but that is down significantly from the $197 million loss the quarter before. Q3 did have a $65 million write-down due to unsold inventory. Though the company made far less in revenues, they also shored up their losses. The company is still bleeding, but they still have plenty of cash on hand for the next several quarters to survive. When we talk about non-GAAP figures, AMD reports a $79 million loss for this past quarter.
For the entire year AMD recorded $3.99 billion in revenue with a net loss of $660 million. This is down from FY 2014 revenues of $5.51 billion and a net loss of $403 million. AMD certainly is trending downwards year over year, but they are hoping to reverse that come 2H 2016.
Graphics continues to be solid for AMD as they increased their sales from last quarter, but are down year on year. Holiday sales were brisk, but with only the high end Fury series being a new card during this season, the impact of that particular part was not as great as compared to the company having a new mid-range series like the newly introduced R9 380X. The second half of 2016 will see the introduction of the Polaris based GPUs for both mobile and desktop applications. Until then, AMD will continue to provide the current 28 nm lineup of GPUs to the market. At this point we are under the assumption that AMD and NVIDIA are looking at the same timeframe for introducing their next generation parts due to process technology advances. AMD already has working samples on Samsung’s/GLOBALFOUNDRIES 14nm LPP (low power plus) that they showed off at CES 2016.
Thank you for all you do!
Much of what I am going to say here is repeated from the description on our brand new Patreon support page, but I think a direct line to our readers is in order.
First, I think you may need a little back story. Ask anyone that has been doing online media in this field for any length of time and they will tell you that getting advertisers to sign on and support the production of "free" content has been getting more and more difficult. You'll see this proven out in the transition of several key personalities of our industry away from media into the companies they used to cover. And you'll see it in the absorption of some of our favorite media outlets, being purchased by larger entities with the promise of being able to continue doing what they have been doing. Or maybe you've seen it show as more interstitial ads, road blocks, sponsored site sections, etc.
At PC Perspective we've seen the struggle first hand but I have done my best to keep as much of that influence away from my team. We are not immune - several years ago we started doing site skins, something we didn't plan for initially. I do think I have done a better than average job keeping the lights on here though, so to speak. We have good sell through on our ad inventory and some of the best companies in our industry support the work we do.
Some of the PC Perspective team at CES 2016
Let me be clear though - we aren't on the verge of going out of business. I am not asking for Patreon supporters to keep from firing anyone. We just wanted to maintain and grow our content library and capability and it seemed like the audience that benefits and enjoys that content might be the best place to start.
Some of you are likely asking yourself if supporting PC Perspective is really necessary? After all, you can chug out a 400 word blog in no time! The truth is that high quality, technical content takes a lot of man hours and those hours are expensive. Our problem is that to advertisers, a page view is a page view, they don't really care how much time and effort went into creating the content on that page. If we spend 20 hours developing a way to evaluate variable refresh rate monitors with an oscilloscope, but put the results on a single page at pcper.com, we get the same amount of traffic as someone that just posts an hour's worth of gameplay experiences. Both are valuable to the community, but one costs a lot more to produce.
Frame Rating testing methodology helped move the industry forward
The easy way out is to create click bait style content (have you seen the new Marvel trailer??!?) and hope for enough extra page views to make up for the difference. But many people find the allure of the cheap/easy posts too easy and quickly devolve into press releases and marketing vomit. No one at PC Perspective wants to see that happen here.
Not only do we want to avoid a slide into that fate but we want to improve on what we are doing, going further down the path of technical analysis with high quality writing and video content. Very few people are working on this kind of writing and analysis yet it is vitally important to those of you that want the information to make critical purchasing decisions. And then you, in turn, pass those decisions on to others with less technical interest (brothers, mothers, friends).
We have ideas for new regular shows including a PC Perspective Mailbag, a gaming / Virtual LAN Party show and even an old hardware post-mortem production. All of these take extra time beyond what each person has dedicated today and the additional funding provided by a successful Patreon campaign will help us towards those goals.
I don't want anyone to feel that they are somehow less of a fan of PC Perspective if you can't help - that's not what we are about and not what I stand for. Just being here, reading and commenting on our work means a lot to us. You can still help by spreading the word about stories you find interesting or even doing your regular Amazon.com shopping through our link on the right side bar.
But for those of you that can afford a monthly contribution, consider a "value for value" amount. How much do you think the content we have produced and will produce is worth to you? If that's $3/month, thank you! If that's $20/month, thank you as well!
Support PC Perspective through Patreon
The team and I spent a lot of our time in the last several weeks talking through this Patreon campaign and we are proud to offer ourselves up to our community. PC Perspective is going to be here for a long time, and support from readers like you will help us be sure we can continue to improve and innovate on the information and content we provide.
Again, thank you so much for support over the last 16 years!
Looking Towards 2016
ARM invited us to a short conversation with them on the prospects of 2016. The initial answer as to how they feel the upcoming year will pan out is, “Interesting”. We covered a variety of topics ranging from VR to process technology. ARM is not announcing any new products at this time, but throughout this year they will continue to push their latest Mali graphics products as well as the Cortex A72.
Trends to Watch in 2016
The one overriding trend that we will see is that of “good phones at every price point”. ARM’s IP scales from very low to very high end mobile SOCs and their partners are taking advantage of the length and breadth of these technologies. High end phones based on custom cores (Apple, Qualcomm) will compete against those licensing the Cortex A72 and A57 parts for their phones. Lower end options that are less expensive and pull less power (which then requires less battery) will flesh out the midrange and budget parts. Unlike several years ago, the products from top to bottom are eminently usable and relatively powerful products.
Camera improvements will also take center stage for many products and continue to be a selling point and an area of differentiation for competitors. Improved sensors and software will obviously be the areas where the ARM partners will focus on, but ARM is putting some work into this area as well. Post processing requires quite a bit of power to do quickly and effectively. ARM is helping here to leverage the Neon SIMD engine and leveraging the power of the Mali GPU.
4K video is becoming more and more common as well with handhelds, and ARM is hoping to leverage that capability in shooting static pictures. A single 4K frame is around 8 megapixels in size. So instead of capturing video, the handheld can achieve a “best shot” type functionality. So the phone captures the 4K video and then users can choose the best shot available to them in that period of time. This is a simple idea that will be a nice feature for those with a product that can capture 4K video.
What you never knew you didn't know
While researching a few upcoming SD / microSD product reviews here at PC Perspective, I quickly found myself swimming in a sea of ratings and specifications. This write up was initially meant to explain and clarify these items, but it quickly grew into a reference too large to include in every SD card article, so I have spun it off here as a standalone reference. We hope it is as useful to you as it will be to our upcoming SD card reviews.
SD card speed ratings are a bit of a mess, so I'm going to do my best to clear things up here. I'll start with classes and grades. These are specs that define the *minimum* speed a given SD card should meet when reading or writing (both directions are used for the test). As with all flash devices, the write speed tends to be the more limiting factor. Without getting into gory detail, the tests used assume mostly sequential large writes and random reads occurring at no smaller than the minimum memory unit of the card (typically 512KB). The tests match the typical use case of an SD card, which is typically writing larger files (or sequential video streams), with minimal small writes (file table updates, etc).
In the above chart, we see speed 'Class' 2, 4, 6, and 10. The SD card spec calls out very specific requirements for these specs, but the gist of it is that an unfragmented SD card will be able to write at a minimum MB/s corresponding to its rated class (e.g. Class 6 = 6 MB/s minimum transfer speed). The workload specified is meant to represent a typical media device writing to an SD card, with buffering to account for slower FAT table updates (small writes). With higher bus speed modes (more on that later), we also get higher classes. Older cards that are not rated under this spec are referred to as 'Class 0'.
As we move higher than Class 10, we get to U1 and U3, which are referred to as UHS Speed Grades (contrary to the above table which states 'Class') in the SD card specification. The changeover from Class to Grade has something to do with speed modes, which also relates with the standard capacity of the card being used:
U1 and U3 correspond to 10 and 30 MB/s minimums, but the test conditions are slightly different for these specs (so Class 10 is not *exactly* the same as a U1 rating, even though they both equate to 10 MB/sec). Cards not performing to U1 are classified as 'Speed Grade 0'. One final note here is that a U rating also implies a UHS speed mode (see the next section).
New Components, New Approach
After 20 or so enclosure reviews over the past year and a half and some pretty inconsistent test hardware along the way, I decided to adopt a standardized test bench for all reviews going forward. Makes sense, right? Turns out choosing the best components for a cases and cooling test system was a lot more difficult than I expected going in, as special consideration had to be made for everything from form-factor to noise and heat levels.
Along with the new components I will also be changing the approach to future reviews by expanding the scope of CPU cooler testing. After some debate as to the type of CPU cooler to employ I decided that a better test of an enclosure would be to use both closed-loop liquid and air cooling for every review, and provide thermal and noise results for each. For CPU cooler reviews themselves I'll be adding a "real-world" load result to the charts to offer a more realistic scenario, running a standard desktop application (in this case a video encoder) in addition to the torture-test result using Prime95.
But what about this new build? It isn't completely done but here's a quick look at the components I ended up with so far along with the rationale for each selection.
CPU – Intel Core i5-6600K ($249, Amazon.com)
The introduction of Intel’s 6th generation Skylake processors provided the
excuse opportunity for an upgrade after using an AMD FX-6300 system for the last couple of enclosure reviews, and after toying with the idea of the new i7-6700K, and immediately realizing this was likely overkill and (more importantly) completely unavailable for purchase at the time, I went with the more "reasonable" option with the i5. There has long been a debate as to the need for hyper-threading for gaming (though this may be changing with the introduction of DX12) but in any case this is still a very powerful processor and when stressed should produce a challenging enough thermal load to adequately test both CPU coolers and enclosures going forward.
GPU – XFX Double Dissipation Radeon R9 290X ($347, Amazon.com)
This was by far the most difficult selection. I don’t think of my own use when choosing a card for a test system like this, as it must meet a set of criteria to be a good fit for enclosure benchmarks. If I choose a card that runs very cool and with minimal noise, GPU benchmarks will be far less significant as the card won’t adequately challenge the design and thermal characteristics of the enclosure. There are certainly options that run at greater temperatures and higher noise (a reference R9 290X for example), but I didn’t want a blower-style cooler with the GPU. Why? More and more GPUs are released with some sort of large multi-fan design rather than a blower, and for enclosure testing I want to know how the case handles the extra warm air.
Noise was an important consideration, as levels from an enclosure of course vary based on the installed components. With noise measurements a GPU cooler that has very low output at idle (or zero, as some recent cooler designs permit) will allow system idle levels to fall more on case fans and airflow than a GPU that might drown them out. (This would also allow a better benchmark of CPU cooler noise - particularly with self-contained liquid coolers and audible pump noise.) And while I wanted very quiet performance at idle, at load there must be sufficient noise to measure the performance of the enclosure in this regard, though of course nothing will truly tax a design quite like a loud blower. I hope I've found a good balance here.
To the Max?
Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but they disable it (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.
AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.
So what is it?
Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
Kollock also notes that compute is becoming more important in the graphics pipeline, and it is possible to completely bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will completely bypass it -- maybe even their engine, several years down the road.
But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
And, of course, game developers profile GPUs from time to time...
According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware and so this amount may increase or decrease on the PC. In an informal chat with a developer at Epic Games, so massive grain of salt is required, his late night ballpark “totally speculative” guesstimate is that, on the Xbox One, the GPU could theoretically accept a maximum ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.
This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.
It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient injestion of shader code. As always, painfully always, time will tell.
It's Basically a Function Call for GPUs
Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, loading graphics card with tasks will take a drastic change in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.
While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.
Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.
The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:
- Both AMD and NVIDIA are only a two-digit percentage of draw call performance apart
- Both AMD and NVIDIA saw an order of magnitude increase in draw calls
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be more clear in my disclaimer. This is an unconfirmed, relatively easy to fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images, but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K. That suggests that the leaker got the images from somewhere, but didn't notice those details, which implies that the original source was hoaxed by an anonymous source, who only seeded the hoax to a single media outlet, or that it was an actual leak.
Either way, enjoy my analysis but realize that this is a single, unconfirmed source who allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
What type of user will be given Iris Pro? Also, will graphics-free options be available in the sub-Enthusiast class? When buying a processor from Intel, the high-end mainstream processors tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics -- the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step into changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high clock, relatively low IPC design that Netburst was built around, which instantaneously changed the processor landscape from a dominant AMD to an Intel runaway lead.
After 65nm and Core 2 started the cycle, every new architecture alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
Digging in a Little Deeper into the DiRT
Over the past few weeks I have had the chance to play the early access "DiRT Rally" title from Codemasters. This is a much more simulation based title that is currently PC only, which is a big switch for Codemasters and how they usually release their premier racing offerings. I was able to get a hold of Paul Coleman from Codemasters and set up a written interview with him. Paul's answers will be in italics.
Who are you, what do you do at Codemasters, and what do you do in your spare time away from the virtual wheel?
Hi my name is Paul Coleman and I am the Chief Games Designer on DiRT Rally. I’m responsible for making sure that the game is the most authentic representation of the sport it can be, I’m essentially representing the player in the studio. In my spare time I enjoy going on road trips with my family in our 1M Coupe. I’ve been co-driving in real world rally events for the last three years and I’ve used that experience to write and voice the co-driver calls in game.
If there is one area that DiRT has really excelled at is keeping frame rate consistent throughout multiple environments. Many games, especially those using cutting edge rendering techniques, often have dramatic frame rate drops at times. How do you get around this while still creating a very impressive looking game?
The engine that DiRT Rally has been built on has been constantly iterated on over the years and we have always been looking at ways of improving the look of the game while maintaining decent performance. That together with the fact that we work closely with GPU manufacturers on each project ensures that we stay current. We also have very strict performance monitoring systems that have come from optimising games for console. These systems have proved very useful when building DiRT Rally even though the game is exclusively on PC.
How do you balance out different controller use cases? While many hard core racers use a wheel, I have seen very competitive racing from people using handheld controllers as well as keyboards. Do you handicap/help those particular implementations so as not to make it overly frustrating to those users? I ask due to the difference in degrees of precision that a gamepad has vs. a wheel that can rotate 900 degrees.
Again this comes back to the fact that we have traditionally developed for console where the primary input device is a handheld controller. This is an area that other sims don’t usually have to worry about but for us it was second nature. There are systems that we have that add a layer between the handheld controller or keyboard and the game which help those guys but the wheel is without a doubt the best way to experience DiRT Rally as it is a direct input.
Process Technology Overview
We have been very spoiled throughout the years. We likely did not realize exactly how spoiled we were until it became very obvious that the rate of process technology advances hit a virtual brick wall. Every 18 to 24 months we were treated to a new, faster, more efficient process node that was opened up to fabless semiconductor firms and we were treated to a new generation of products that would blow our hair back. Now we have been in a virtual standstill when it comes to new process nodes from the pure-play foundries.
Few expected the 28 nm node to live nearly as long as it has. Some of the first cracks in the façade actually came from Intel. Their 22 nm Tri-Gate (FinFET) process took a little bit longer to get off the ground than expected. We also noticed some interesting electrical features from the products developed on that process. Intel skewed away from higher clockspeeds and focused on efficiency and architectural improvements rather than staying at generally acceptable TDPs and leapfrogging the competition by clockspeed alone. Overclockers noticed that the newer parts did not reach the same clockspeed heights as previous products such as the 32 nm based Sandy Bridge processors. Whether this decision was intentional from Intel or not is debatable, but my gut feeling here is that they responded to the technical limitations of their 22 nm process. Yields and bins likely dictated the max clockspeeds attained on these new products. So instead of vaulting over AMD’s products, they just slowly started walking away from them.
Samsung is one of the first pure-play foundries to offer a working sub-20 nm FinFET product line. (Photo courtesy of ExtremeTech)
When 28 nm was released the plans on the books were to transition to 20 nm products based on planar transistors, thereby bypassing the added expense of developing FinFETs. It was widely expected that FinFETs were not necessarily required to address the needs of the market. Sadly, that did not turn out to be the case. There are many other factors as to why 20 nm planar parts are not common, but the limitations of that particular process node has made it a relatively niche process node that is appropriate for smaller, low power ASICs (like the latest Apple SOCs). The Apple A8 is rumored to be around 90 mm square, which is a far cry from the traditional midrange GPU that goes from 250 mm sq. to 400+ mm sq.
The essential difficulty of the 20 nm planar node appears to be a lack of power scaling to match the increased transistor density. TSMC and others have successfully packed in more transistors into every square mm as compared to 28 nm, but the electrical characteristics did not scale proportionally well. Yes, there are improvements there per transistor, but when designers pack in all those transistors into a large design, TDP and voltage issues start to arise. As TDP increases, it takes more power to drive the processor, which then leads to more heat. The GPU guys probably looked at this and figured out that while they can achieve a higher transistor density and a wider design, they will have to downclock the entire GPU to hit reasonable TDP levels. When adding these concerns to yields and bins for the new process, the advantages of going to 20 nm would be slim to none at the end of the day.
Project Lead: Joris-Jan van ‘t Land
Thanks to Ian Comings, guest writer from the PC Perspective Forums who conducted the interview of Bohemia Interactive's Joris-Jan van ‘t Land. If you are interested in learning more about ArmA 3 and hanging out with some PC gamers to play it, check out the PC Perspective Gaming Forum!
I recently got the chance to send some questions to Bohemia Interactive, a computer game development company based out of Prague, Czech Republic, and a member of IDEA Games. Bohemia Interactive was founded in 1999 by CEO Marek Španěl, and it is best known for PC gaming gems like Operation Flashpoint: Cold War Crisis, The ArmA series, Take On Helicopters, and DayZ. The questions are answered by ArmA 3's Project Lead: Joris-Jan van ‘t Land.
PC Perspective: How long have you been at Bohemia Interactive?
VAN ‘T LAND: All in all, about 14 years now.
PC Perspective: What inspired you to become a Project Lead at Bohemia Interactive?
VAN ‘T LAND: During high school, it was pretty clear to me that I wanted to work in game development, and just before graduation, a friend and I saw a first preview for Operation Flashpoint: Cold War Crisis in a magazine. It immediately looked amazing to us; we were drawn to the freedom and diversity it promised and the military theme. After helping run a fan website (Operation Flashpoint Network) for a while, I started to assist with part-time external design work on the game (scripting and scenario editing). From that point, I basically grew naturally into this role at Bohemia Interactive.
PC Perspective: What part of working at Bohemia Interactive do you find most satisfying? What do you find most challenging?
VAN ‘T LAND: The amount of freedom and autonomy is very satisfying. If you can demonstrate skills in some area, you're welcome to come up with random ideas and roll with them. Some of those ideas can result in official releases, such as Arma 3 Zeus. Another rewarding aspect is the near real-time connection to those people who are playing the game. Our daily Dev-Branch release means the work I do on Monday is live on Tuesday. Our own ambitions, on the other hand, can sometimes result in some challenges. We want to do a lot and incorporate every aspect of combat in Arma, but we're still a relatively small team. This can mean we bite off more than we can deliver at an acceptable level of quality.
PC Perspective: What are some of the problems that have plagued your team, and how have they been overcome?
VAN ‘T LAND: One key problem for us was that we had no real experience with developing a game in more than one physical location. For Arma 3, our team was split over two main offices, which caused quite a few headaches in terms of communication and data synchronization. We've since had more key team members travel between the offices more frequently and improved our various virtual communication methods. A lot of work has been done to try to ensure that both offices have the latest version of the game at any given time. That is not always easy when your bandwidth is limited and games are getting bigger and bigger.
We’ve been tracking NVIDIA’s G-Sync for quite a while now. The comments section on Ryan’s initial article erupted with questions, and many of those were answered in a follow-on interview with NVIDIA’s Tom Petersen. The idea was radical – do away with the traditional fixed refresh rate and only send a new frame to the display when it has just completed rendering by the GPU. There are many benefits here, but the short version is that you get the low-latency benefit of V-SYNC OFF gaming combined with the image quality (lack of tearing) that you would see if V-SYNC was ON. Despite the many benefits, there are some potential disadvantages that come from attempting to drive an LCD panel at varying periods of time, as opposed to the fixed intervals that have been the norm for over a decade.
As the first round of samples came to us for review, the current leader appeared to be the ASUS ROG Swift. A G-Sync 144 Hz display at 1440P was sure to appeal to gamers who wanted faster response than the 4K 60 Hz G-Sync alternative was capable of. Due to what seemed to be large consumer demand, it has taken some time to get these panels into the hands of consumers. As our Storage Editor, I decided it was time to upgrade my home system, placed a pre-order, and waited with anticipation of finally being able to shift from my trusty Dell 3007WFP-HC to a large panel that can handle >2x the FPS.
Fast forward to last week. My pair of ROG Swifts arrived, and some other folks I knew had also received theirs. Before I could set mine up and get some quality gaming time in, my bro FifthDread and his wife both noted a very obvious flicker on their Swifts within the first few minutes of hooking them up. They reported the flicker during game loading screens and mid-game during background content loading occurring in some RTS titles. Prior to hearing from them, the most I had seen were some conflicting and contradictory reports on various forums (not limed to the Swift, though that is the earliest panel and would therefore see the majority of early reports), but now we had something more solid to go on. That night I fired up my own Swift and immediately got to doing what I do best – trying to break things. We have reproduced the issue and intend to demonstrate it in a measurable way, mostly to put some actual data out there to go along with those trying to describe something that is borderline perceptible for mere fractions of a second.
First a bit of misnomer correction / foundation laying:
- The ‘Screen refresh rate’ option you see in Windows Display Properties is actually a carryover from the CRT days. In terms of an LCD, it is the maximum rate at which a frame is output to the display. It is not representative of the frequency at which the LCD panel itself is refreshed by the display logic.
- LCD panel pixels are periodically updated by a scan, typically from top to bottom. Newer / higher quality panels repeat this process at a rate higher than 60 Hz in order to reduce the ‘rolling shutter’ effect seen when panning scenes or windows across the screen.
- In order to engineer faster responding pixels, manufacturers must deal with the side effect of faster pixel decay between refreshes. This is a balanced by increasing the frequency of scanning out to the panel.
- The effect we are going to cover here has nothing to do with motion blur, LightBoost, backlight PWM, LightBoost combined with G-Sync (not currently a thing, even though Blur Busters has theorized on how it could work, their method would not work with how G-Sync is actually implemented today).
With all of that out of the way, let’s tackle what folks out there may be seeing on their own variable refresh rate displays. Based on our testing so far, the flicker only presented at times when a game enters a 'stalled' state. These are periods where you would see a split-second freeze in the action, like during a background level load during game play in some titles. It also appears during some game level load screens, but as those are normally static scenes, they would have gone unnoticed on fixed refresh rate panels. Since we were absolutely able to see that something was happening, we wanted to be able to catch it in the act and measure it, so we rooted around the lab and put together some gear to do so. It’s not a perfect solution by any means, but we only needed to observe differences between the smooth gaming and the ‘stalled state’ where the flicker was readily observable. Once the solder dust settled, we fired up a game that we knew could instantaneously swing from a high FPS (144) to a stalled state (0 FPS) and back again. As it turns out, EVE Online does this exact thing while taking an in-game screen shot, so we used that for our initial testing. Here’s what the brightness of a small segment of the ROG Swift does during this very event:
Measured panel section brightness over time during a 'stall' event. Click to enlarge.
The relatively small ripple to the left and right of center demonstrate the panel output at just under 144 FPS. Panel redraw is in sync with the frames coming from the GPU at this rate. The center section, however, represents what takes place when the input from the GPU suddenly drops to zero. In the above case, the game briefly stalled, then resumed a few frames at 144, then stalled again for a much longer period of time. Completely stopping the panel refresh would result in all TN pixels bleeding towards white, so G-Sync has a built-in failsafe to prevent this by forcing a redraw every ~33 msec. What you are seeing are the pixels intermittently bleeding towards white and periodically being pulled back down to the appropriate brightness by a scan. The low latency panel used in the ROG Swift does this all of the time, but it is less noticeable at 144, as you can see on the left and right edges of the graph. An additional thing that’s happening here is an apparent rise in average brightness during the event. We are still researching the cause of this on our end, but this brightness increase certainly helps to draw attention to the flicker event, making it even more perceptible to those who might have not otherwise noticed it.
Some of you might be wondering why this same effect is not seen when a game drops to 30 FPS (or even lower) during the course of normal game play. While the original G-Sync upgrade kit implementation simply waited until 33 msec had passed until forcing an additional redraw, this introduced judder from 25-30 FPS. Based on our observations and testing, it appears that NVIDIA has corrected this in the retail G-Sync panels with an algorithm that intelligently re-scans at even multiples of the input frame rate in order to keep the redraw rate relatively high, and therefore keeping flicker imperceptible – even at very low continuous frame rates.
A few final points before we go:
- This is not limited to the ROG Swift. All variable refresh panels we have tested (including 4K) see this effect to a more or less degree than reported here. Again, this only occurs when games instantaneously drop to 0 FPS, and not when those games dip into low frame rates in a continuous fashion.
- The effect is less perceptible (both visually and with recorded data) at lower maximum refresh rate settings.
- The effect is not present at fixed refresh rates (G-Sync disabled or with non G-Sync panels).
This post was primarily meant as a status update and to serve as something for G-Sync users to point to when attempting to explain the flicker they are perceiving. We will continue researching, collecting data, and coordinating with NVIDIA on this issue, and will report back once we have more to discuss.
During the research and drafting of this piece, we reached out to and worked with NVIDIA to discuss this issue. Here is their statement:
"All LCD pixel values relax after refreshing. As a result, the brightness value that is set during the LCD’s scanline update slowly relaxes until the next refresh.
This means all LCDs have some slight variation in brightness. In this case, lower frequency refreshes will appear slightly brighter than high frequency refreshes by 1 – 2%.
When games are running normally (i.e., not waiting at a load screen, nor a screen capture) - users will never see this slight variation in brightness value. In the rare cases where frame rates can plummet to very low levels, there is a very slight brightness variation (barely perceptible to the human eye), which disappears when normal operation resumes."
So there you have it. It's basically down to the physics of how an LCD panel works at varying refresh rates. While I agree that it is a rare occurrence, there are some games that present this scenario more frequently (and noticeably) than others. If you've noticed this effect in some games more than others, let us know in the comments section below.
(Editor's Note: We are continuing to work with NVIDIA on this issue and hope to find a way to alleviate the flickering with either a hardware or software change in the future.)
It has become increasingly apparent that flash memory die shrinks have hit a bit of a brick wall in recent years. The issues faced by the standard 2D Planar NAND process were apparent very early on. This was no real secret - here's a slide seen at the 2009 Flash Memory Summit:
Despite this, most flash manufacturers pushed the envelope as far as they could within the limits of 2D process technology, balancing shrinks with reliability and performance. One of the largest flash manufacturers was Intel, having joined forces with Micron in a joint venture dubbed IMFT (Intel Micron Flash Technologies). Intel remained in lock-step with Micron all the way up to 20nm, but chose to hold back at the 16nm step, presumably in order to shift full focus towards alternative flash technologies. This was essentially confirmed late last week, with Intel's announcement of a shift to 3D NAND production.
Intel's press briefing seemed to focus more on cost efficiency than performance, and after reviewing the very few specs they released about this new flash, I believe we can do some theorizing as to the potential performance of this new flash memory. From the above illustration, you can see that Intel has chosen to go with the same sort of 3D technology used by Samsung - a 32 layer vertical stack of flash cells. This requires the use of an older / larger process technology, as it is too difficult to etch these holes at a 2x nm size. What keeps the die size reasonable is the fact that you get a 32x increase in bit density. Going off of a rough approximation from the above photo, imagine that 50nm die (8 Gbit), but with 32 vertical NAND layers. That would yield a 256 Gbit (32 GB) die within roughly the same footprint.
Representation of Samsung's 3D VNAND in 128Gbit and 86 Gbit variants.
20nm planar (2D) = yellow square, 16nm planar (2D) = blue square.
Image republished with permission from Schiltron Corporation.
It's likely a safe bet that IMFT flash will be going for a cost/GB far cheaper than the competing Samsung VNAND, and going with a relatively large 256 Gbit (vs. VNAND's 86 Gbit) per-die capacity is a smart move there, but let's not forget that there is a catch - write speed. Most NAND is very fast on reads, but limited on writes. Shifting from 2D to 3D NAND netted Samsung a 2x speed boost per die, and another effective 1.5x speed boost due to their choice to reduce per-die capacity from 128 Gbit to 86 Gbit. This effective speed boost came from the fact that a given VNAND SSD has 50% more dies to reach the same capacity as an SSD using 128 Gbit dies.
Now let's examine how Intel's choice of a 256 Gbit die impacts performance:
- Intel SSD 730 240GB = 16x128 Gbit 20nm dies
- 270 MB/sec writes and ~17 MB/sec/die
- Crucial MX100 128GB = 8x128Gbit 16nm dies
- 150 MB/sec writes and ~19 MB/sec/die
- Samsung 850 Pro 128GB = 12x86Gbit VNAND dies
- 470MB/sec writes and ~40 MB/sec/die
If we do some extrapolation based on the assumption that IMFT's move to 3D will net the same ~2x write speed improvement seen by Samsung, combined with their die capacity choice of 256Gbit, we get this:
- Future IMFT 128GB SSD = 4x256Gbit 3D dies
- 40 MB/sec/die x 4 dies = 160MB/sec
Even rounding up to 40 MB/sec/die, we can see that also doubling the die capacity effectively negates the performance improvement. While the IMFT flash equipped SSD will very likely be a lower cost product, it will (theoretically) see the same write speed limits seen in today's SSDs equipped with IMFT planar NAND. Now let's go one layer deeper on theoretical products and assume that Intel took the 18-channel NVMe controller from their P3700 Series and adopted it to a consumer PCIe SSD using this new 3D NAND. The larger die size limits the minimum capacity you can attain and still fully utilize their 18 channel controller, so with one die per channel, you end up with this product:
- Theoretical 18 channel IMFT PCIE 3D NAND SSD = 18x256Gbit 3D dies
- 40 MB/sec/die x 18 dies = 720 MB/sec
- 18x32GB (die capacity) = 576GB total capacity
Overprovisioning decisions aside, the above would be the lowest capacity product that could fully utilize the Intel PCIe controller. While the write performance is on the low side by PCIe SSD standards, the cost of such a product could easily be in the $0.50/GB range, or even less.
In summary, while we don't have any solid performance data, it appears that Intel's new 3D NAND is not likely to lead to a performance breakthrough in SSD speeds, but their choice on a more cost-effective per-die capacity for their new 3D NAND is likely to give them significant margins and the wiggle room to offer SSDs at a far lower cost/GB than we've seen in recent years. This may be the step that was needed to push SSD costs into a range that can truly compete with HDD technology.
Investigating the issue
** Edit ** (24 Sep)
We have updated this story with temperature effects on the read speed of old data. Additional info on page 3.
** End edit **
** Edit 2 ** (26 Sep)
New quote from Samsung:
"We acknowledge the recent issue associated with the Samsung 840 EVO SSDs and are qualifying a firmware update to address the issue. While this issue only affects a small subset of all 840 EVO users, we regret any inconvenience experienced by our customers. A firmware update that resolves the issue will be available on the Samsung SSD website soon. We appreciate our customer’s support and patience as we work diligently to resolve this issue."
** End edit 2 **
** Edit 3 **
The firmware update and performance restoration tool has been tested. Results are found here.
** End edit 3 **
Over the past week or two, there have been growing rumblings from owners of Samsung 840 and 840 EVO SSDs. A few reports scattered across internet forums gradually snowballed into lengthy threads as more and more people took a longer look at their own TLC-based Samsung SSD's performance. I've spent the past week following these threads, and the past few days evaluating this issue on the 840 and 840 EVO samples we have here at PC Perspective. This post is meant to inform you of our current 'best guess' as to just what is happening with these drives, and just what you should do about it.
The issue at hand is an apparent slow down in the reading of 'stale' data on TLC-based Samsung SSDs. Allow me to demonstrate:
You might have seen what looks like similar issues before, but after much research and testing, I can say with some confidence that this is a completely different and unique issue. The old X25-M bug was the result of random writes to the drive over time, but the above result is from a drive that only ever saw a single large file write to a clean drive. The above drive was the very same 500GB 840 EVO sample used in our prior review. It did just fine in that review, and at afterwards I needed a quick temporary place to put a HDD image file and just happened to grab that EVO. The file was written to the drive in December of 2013, and if it wasn't already apparent from the above HDTach pass, it was 442GB in size. This brings on some questions:
- If random writes (i.e. flash fragmentation) are not causing the slow down, then what is?
- How long does it take for this slow down to manifest after a file is written?