All | Editorial | General Tech | Graphics Cards | Networking | Motherboards | Cases and Cooling | Processors | Chipsets | Memory | Displays | Systems | Storage | Mobile | Shows and Expos
First, Some Background
NVIDIA's Rumored GP102
When GP100 was announced, Josh and I were discussing, internally, how it would make sense in the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a dash of salt, that claimed NVIDIA was planning a second architecture, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claims to have been contacted by the developers of AIDA64, seemingly on-the-record, also citing a GP102 design.
I will retell chunks of the rumor, but also add my opinion to it.
In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive performance for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these were available in Tesla, Quadro, and GeForce cards, especially Titans.
Maxwell was interesting, though. NVIDIA was unable to leave 28nm, which Kepler launched on, so they created a second architecture at that node. To increase performance without having access to more feature density, you need to make your designs bigger, more optimized, or more simple. GM200 was giant and optimized, but, to get the performance levels it achieved, also needed to be more simple. Something needed to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.
Seeing Ryan transition from being a long-time Android user over to iOS late last year has had me thinking. While I've had hands on with flagship phones from many manufacturers since then, I haven't actually carried an Android device with me since the Nexus S (eventually, with the 4.0 Ice Cream Sandwich upgrade). Maybe it was time to go back in order to gain a more informed perspective of the mobile device market as it stands today.
So that's exactly what I did. When we received our Samsung Galaxy S7 review unit (full review coming soon, I promise!), I decided to go ahead and put a real effort forth into using Android for an extended period of time.
Full disclosure, I am still carrying my iPhone with me since we received a T-Mobile locked unit, and my personal number is on Verizon. However, I have been using the S7 for everything but phone calls, and the occasional text message to people who only has my iPhone number.
Now one of the questions you might be asking yourself right now is why did I choose the Galaxy S7 of all devices to make this transition with. Most Android aficionados would probably insist that I chose a Nexus device to get the best experience and one that Google intends to provide when developing Android. While these people aren't wrong, I decided that I wanted to go with a more popular device as opposed to the more niche Nexus line.
Whether you Samsung's approach or not, the fact is that they sell more Android devices than anyone else and the Galaxy S7 will be their flagship offering for the next year or so.
28HPCU: Cost Effective and Power Efficient
Have you ever been approached about something and upon first hearing about it, the opportunity just did not seem very exciting? Then upon digging into things, it became much more interesting? This happened to me with this announcement. At first blush, who really cares that ARM is partnering with UMC at 28 nm? Well, once I was able to chat with the people at ARM, it is much more interesting than initially expected.
The new hotness in fabrication is the latest 14 nm and 16 nm processes from Samsung/GF and TSMC respectively. It has been a good 4+ years since we last had a new process node that actually performed as expected. The planar 22/20 nm products just were not entirely suitable for mass production. Apple was one of the few to actually develop a part for TSMC’s 20 nm process that actually sold in the millions. The main problem was a lack of power and speed scaling as compared to 28 nm processes. Planar was a bad choice, but the development of FinFET technologies hadn’t been implemented in time for it to show up at this time by 3rd party manufacturers.
There is a problem with the latest process generations, though. They are new, expensive, and are production constrained. Also, they may not be entirely appropriate for the applications that are being developed. There are several strengths with 28 nm as compared. These are mature processes with an excess of line space. The major fabs are offering very competitive pricing structures for 28 nm as they see space being cleared up on the lines with higher end SOCs, GPUs, and assorted ASICs migrating to the new process nodes.
TSMC has typically been on the forefront of R&D with advanced nodes. UMC is not as aggressive with their development, but they tend to let others do some of the heavy lifting and then integrate the new nodes when it fits their pricing and business models. TSMC is on their third generation of 28 nm. UMC is on their second, but that generation encompasses many of the advanced features of TSMC’s 3rd generation so it is actually quite competitive.
Fighting for Relevance
AMD is still kicking. While the results of this past year have been forgettable, they have overcome some significant hurdles and look like they are improving their position in terms of cutting costs while extracting as much revenue as possible. There were plenty of ups and downs for this past quarter, but when compared to the rest of 2015 there were some solid steps forward here.
The company reported revenues of $958 million, which is down from $1.06 billion last quarter. The company also recorded a $103 million loss, but that is down significantly from the $197 million loss the quarter before. Q3 did have a $65 million write-down due to unsold inventory. Though the company made far less in revenues, they also shored up their losses. The company is still bleeding, but they still have plenty of cash on hand for the next several quarters to survive. When we talk about non-GAAP figures, AMD reports a $79 million loss for this past quarter.
For the entire year AMD recorded $3.99 billion in revenue with a net loss of $660 million. This is down from FY 2014 revenues of $5.51 billion and a net loss of $403 million. AMD certainly is trending downwards year over year, but they are hoping to reverse that come 2H 2016.
Graphics continues to be solid for AMD as they increased their sales from last quarter, but are down year on year. Holiday sales were brisk, but with only the high end Fury series being a new card during this season, the impact of that particular part was not as great as compared to the company having a new mid-range series like the newly introduced R9 380X. The second half of 2016 will see the introduction of the Polaris based GPUs for both mobile and desktop applications. Until then, AMD will continue to provide the current 28 nm lineup of GPUs to the market. At this point we are under the assumption that AMD and NVIDIA are looking at the same timeframe for introducing their next generation parts due to process technology advances. AMD already has working samples on Samsung’s/GLOBALFOUNDRIES 14nm LPP (low power plus) that they showed off at CES 2016.
Thank you for all you do!
Much of what I am going to say here is repeated from the description on our brand new Patreon support page, but I think a direct line to our readers is in order.
First, I think you may need a little back story. Ask anyone that has been doing online media in this field for any length of time and they will tell you that getting advertisers to sign on and support the production of "free" content has been getting more and more difficult. You'll see this proven out in the transition of several key personalities of our industry away from media into the companies they used to cover. And you'll see it in the absorption of some of our favorite media outlets, being purchased by larger entities with the promise of being able to continue doing what they have been doing. Or maybe you've seen it show as more interstitial ads, road blocks, sponsored site sections, etc.
At PC Perspective we've seen the struggle first hand but I have done my best to keep as much of that influence away from my team. We are not immune - several years ago we started doing site skins, something we didn't plan for initially. I do think I have done a better than average job keeping the lights on here though, so to speak. We have good sell through on our ad inventory and some of the best companies in our industry support the work we do.
Some of the PC Perspective team at CES 2016
Let me be clear though - we aren't on the verge of going out of business. I am not asking for Patreon supporters to keep from firing anyone. We just wanted to maintain and grow our content library and capability and it seemed like the audience that benefits and enjoys that content might be the best place to start.
Some of you are likely asking yourself if supporting PC Perspective is really necessary? After all, you can chug out a 400 word blog in no time! The truth is that high quality, technical content takes a lot of man hours and those hours are expensive. Our problem is that to advertisers, a page view is a page view, they don't really care how much time and effort went into creating the content on that page. If we spend 20 hours developing a way to evaluate variable refresh rate monitors with an oscilloscope, but put the results on a single page at pcper.com, we get the same amount of traffic as someone that just posts an hour's worth of gameplay experiences. Both are valuable to the community, but one costs a lot more to produce.
Frame Rating testing methodology helped move the industry forward
The easy way out is to create click bait style content (have you seen the new Marvel trailer??!?) and hope for enough extra page views to make up for the difference. But many people find the allure of the cheap/easy posts too easy and quickly devolve into press releases and marketing vomit. No one at PC Perspective wants to see that happen here.
Not only do we want to avoid a slide into that fate but we want to improve on what we are doing, going further down the path of technical analysis with high quality writing and video content. Very few people are working on this kind of writing and analysis yet it is vitally important to those of you that want the information to make critical purchasing decisions. And then you, in turn, pass those decisions on to others with less technical interest (brothers, mothers, friends).
We have ideas for new regular shows including a PC Perspective Mailbag, a gaming / Virtual LAN Party show and even an old hardware post-mortem production. All of these take extra time beyond what each person has dedicated today and the additional funding provided by a successful Patreon campaign will help us towards those goals.
I don't want anyone to feel that they are somehow less of a fan of PC Perspective if you can't help - that's not what we are about and not what I stand for. Just being here, reading and commenting on our work means a lot to us. You can still help by spreading the word about stories you find interesting or even doing your regular Amazon.com shopping through our link on the right side bar.
But for those of you that can afford a monthly contribution, consider a "value for value" amount. How much do you think the content we have produced and will produce is worth to you? If that's $3/month, thank you! If that's $20/month, thank you as well!
Support PC Perspective through Patreon
The team and I spent a lot of our time in the last several weeks talking through this Patreon campaign and we are proud to offer ourselves up to our community. PC Perspective is going to be here for a long time, and support from readers like you will help us be sure we can continue to improve and innovate on the information and content we provide.
Again, thank you so much for support over the last 16 years!
Looking Towards 2016
ARM invited us to a short conversation with them on the prospects of 2016. The initial answer as to how they feel the upcoming year will pan out is, “Interesting”. We covered a variety of topics ranging from VR to process technology. ARM is not announcing any new products at this time, but throughout this year they will continue to push their latest Mali graphics products as well as the Cortex A72.
Trends to Watch in 2016
The one overriding trend that we will see is that of “good phones at every price point”. ARM’s IP scales from very low to very high end mobile SOCs and their partners are taking advantage of the length and breadth of these technologies. High end phones based on custom cores (Apple, Qualcomm) will compete against those licensing the Cortex A72 and A57 parts for their phones. Lower end options that are less expensive and pull less power (which then requires less battery) will flesh out the midrange and budget parts. Unlike several years ago, the products from top to bottom are eminently usable and relatively powerful products.
Camera improvements will also take center stage for many products and continue to be a selling point and an area of differentiation for competitors. Improved sensors and software will obviously be the areas where the ARM partners will focus on, but ARM is putting some work into this area as well. Post processing requires quite a bit of power to do quickly and effectively. ARM is helping here to leverage the Neon SIMD engine and leveraging the power of the Mali GPU.
4K video is becoming more and more common as well with handhelds, and ARM is hoping to leverage that capability in shooting static pictures. A single 4K frame is around 8 megapixels in size. So instead of capturing video, the handheld can achieve a “best shot” type functionality. So the phone captures the 4K video and then users can choose the best shot available to them in that period of time. This is a simple idea that will be a nice feature for those with a product that can capture 4K video.
What you never knew you didn't know
While researching a few upcoming SD / microSD product reviews here at PC Perspective, I quickly found myself swimming in a sea of ratings and specifications. This write up was initially meant to explain and clarify these items, but it quickly grew into a reference too large to include in every SD card article, so I have spun it off here as a standalone reference. We hope it is as useful to you as it will be to our upcoming SD card reviews.
SD card speed ratings are a bit of a mess, so I'm going to do my best to clear things up here. I'll start with classes and grades. These are specs that define the *minimum* speed a given SD card should meet when reading or writing (both directions are used for the test). As with all flash devices, the write speed tends to be the more limiting factor. Without getting into gory detail, the tests used assume mostly sequential large writes and random reads occurring at no smaller than the minimum memory unit of the card (typically 512KB). The tests match the typical use case of an SD card, which is typically writing larger files (or sequential video streams), with minimal small writes (file table updates, etc).
In the above chart, we see speed 'Class' 2, 4, 6, and 10. The SD card spec calls out very specific requirements for these specs, but the gist of it is that an unfragmented SD card will be able to write at a minimum MB/s corresponding to its rated class (e.g. Class 6 = 6 MB/s minimum transfer speed). The workload specified is meant to represent a typical media device writing to an SD card, with buffering to account for slower FAT table updates (small writes). With higher bus speed modes (more on that later), we also get higher classes. Older cards that are not rated under this spec are referred to as 'Class 0'.
As we move higher than Class 10, we get to U1 and U3, which are referred to as UHS Speed Grades (contrary to the above table which states 'Class') in the SD card specification. The changeover from Class to Grade has something to do with speed modes, which also relates with the standard capacity of the card being used:
U1 and U3 correspond to 10 and 30 MB/s minimums, but the test conditions are slightly different for these specs (so Class 10 is not *exactly* the same as a U1 rating, even though they both equate to 10 MB/sec). Cards not performing to U1 are classified as 'Speed Grade 0'. One final note here is that a U rating also implies a UHS speed mode (see the next section).
New Components, New Approach
After 20 or so enclosure reviews over the past year and a half and some pretty inconsistent test hardware along the way, I decided to adopt a standardized test bench for all reviews going forward. Makes sense, right? Turns out choosing the best components for a cases and cooling test system was a lot more difficult than I expected going in, as special consideration had to be made for everything from form-factor to noise and heat levels.
Along with the new components I will also be changing the approach to future reviews by expanding the scope of CPU cooler testing. After some debate as to the type of CPU cooler to employ I decided that a better test of an enclosure would be to use both closed-loop liquid and air cooling for every review, and provide thermal and noise results for each. For CPU cooler reviews themselves I'll be adding a "real-world" load result to the charts to offer a more realistic scenario, running a standard desktop application (in this case a video encoder) in addition to the torture-test result using Prime95.
But what about this new build? It isn't completely done but here's a quick look at the components I ended up with so far along with the rationale for each selection.
CPU – Intel Core i5-6600K ($249, Amazon.com)
The introduction of Intel’s 6th generation Skylake processors provided the
excuse opportunity for an upgrade after using an AMD FX-6300 system for the last couple of enclosure reviews, and after toying with the idea of the new i7-6700K, and immediately realizing this was likely overkill and (more importantly) completely unavailable for purchase at the time, I went with the more "reasonable" option with the i5. There has long been a debate as to the need for hyper-threading for gaming (though this may be changing with the introduction of DX12) but in any case this is still a very powerful processor and when stressed should produce a challenging enough thermal load to adequately test both CPU coolers and enclosures going forward.
GPU – XFX Double Dissipation Radeon R9 290X ($347, Amazon.com)
This was by far the most difficult selection. I don’t think of my own use when choosing a card for a test system like this, as it must meet a set of criteria to be a good fit for enclosure benchmarks. If I choose a card that runs very cool and with minimal noise, GPU benchmarks will be far less significant as the card won’t adequately challenge the design and thermal characteristics of the enclosure. There are certainly options that run at greater temperatures and higher noise (a reference R9 290X for example), but I didn’t want a blower-style cooler with the GPU. Why? More and more GPUs are released with some sort of large multi-fan design rather than a blower, and for enclosure testing I want to know how the case handles the extra warm air.
Noise was an important consideration, as levels from an enclosure of course vary based on the installed components. With noise measurements a GPU cooler that has very low output at idle (or zero, as some recent cooler designs permit) will allow system idle levels to fall more on case fans and airflow than a GPU that might drown them out. (This would also allow a better benchmark of CPU cooler noise - particularly with self-contained liquid coolers and audible pump noise.) And while I wanted very quiet performance at idle, at load there must be sufficient noise to measure the performance of the enclosure in this regard, though of course nothing will truly tax a design quite like a loud blower. I hope I've found a good balance here.
To the Max?
Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but they disable it (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.
AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.
So what is it?
Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
Kollock also notes that compute is becoming more important in the graphics pipeline, and it is possible to completely bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will completely bypass it -- maybe even their engine, several years down the road.
But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
And, of course, game developers profile GPUs from time to time...
According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware and so this amount may increase or decrease on the PC. In an informal chat with a developer at Epic Games, so massive grain of salt is required, his late night ballpark “totally speculative” guesstimate is that, on the Xbox One, the GPU could theoretically accept a maximum ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.
This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.
It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient injestion of shader code. As always, painfully always, time will tell.
It's Basically a Function Call for GPUs
Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, loading graphics card with tasks will take a drastic change in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.
While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.
Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.
The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:
- Both AMD and NVIDIA are only a two-digit percentage of draw call performance apart
- Both AMD and NVIDIA saw an order of magnitude increase in draw calls
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be more clear in my disclaimer. This is an unconfirmed, relatively easy to fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images, but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K. That suggests that the leaker got the images from somewhere, but didn't notice those details, which implies that the original source was hoaxed by an anonymous source, who only seeded the hoax to a single media outlet, or that it was an actual leak.
Either way, enjoy my analysis but realize that this is a single, unconfirmed source who allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
What type of user will be given Iris Pro? Also, will graphics-free options be available in the sub-Enthusiast class? When buying a processor from Intel, the high-end mainstream processors tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics -- the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step into changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high clock, relatively low IPC design that Netburst was built around, which instantaneously changed the processor landscape from a dominant AMD to an Intel runaway lead.
After 65nm and Core 2 started the cycle, every new architecture alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
Digging in a Little Deeper into the DiRT
Over the past few weeks I have had the chance to play the early access "DiRT Rally" title from Codemasters. This is a much more simulation based title that is currently PC only, which is a big switch for Codemasters and how they usually release their premier racing offerings. I was able to get a hold of Paul Coleman from Codemasters and set up a written interview with him. Paul's answers will be in italics.
Who are you, what do you do at Codemasters, and what do you do in your spare time away from the virtual wheel?
Hi my name is Paul Coleman and I am the Chief Games Designer on DiRT Rally. I’m responsible for making sure that the game is the most authentic representation of the sport it can be, I’m essentially representing the player in the studio. In my spare time I enjoy going on road trips with my family in our 1M Coupe. I’ve been co-driving in real world rally events for the last three years and I’ve used that experience to write and voice the co-driver calls in game.
If there is one area that DiRT has really excelled at is keeping frame rate consistent throughout multiple environments. Many games, especially those using cutting edge rendering techniques, often have dramatic frame rate drops at times. How do you get around this while still creating a very impressive looking game?
The engine that DiRT Rally has been built on has been constantly iterated on over the years and we have always been looking at ways of improving the look of the game while maintaining decent performance. That together with the fact that we work closely with GPU manufacturers on each project ensures that we stay current. We also have very strict performance monitoring systems that have come from optimising games for console. These systems have proved very useful when building DiRT Rally even though the game is exclusively on PC.
How do you balance out different controller use cases? While many hard core racers use a wheel, I have seen very competitive racing from people using handheld controllers as well as keyboards. Do you handicap/help those particular implementations so as not to make it overly frustrating to those users? I ask due to the difference in degrees of precision that a gamepad has vs. a wheel that can rotate 900 degrees.
Again this comes back to the fact that we have traditionally developed for console where the primary input device is a handheld controller. This is an area that other sims don’t usually have to worry about but for us it was second nature. There are systems that we have that add a layer between the handheld controller or keyboard and the game which help those guys but the wheel is without a doubt the best way to experience DiRT Rally as it is a direct input.
Process Technology Overview
We have been very spoiled throughout the years. We likely did not realize exactly how spoiled we were until it became very obvious that the rate of process technology advances hit a virtual brick wall. Every 18 to 24 months we were treated to a new, faster, more efficient process node that was opened up to fabless semiconductor firms and we were treated to a new generation of products that would blow our hair back. Now we have been in a virtual standstill when it comes to new process nodes from the pure-play foundries.
Few expected the 28 nm node to live nearly as long as it has. Some of the first cracks in the façade actually came from Intel. Their 22 nm Tri-Gate (FinFET) process took a little bit longer to get off the ground than expected. We also noticed some interesting electrical features from the products developed on that process. Intel skewed away from higher clockspeeds and focused on efficiency and architectural improvements rather than staying at generally acceptable TDPs and leapfrogging the competition by clockspeed alone. Overclockers noticed that the newer parts did not reach the same clockspeed heights as previous products such as the 32 nm based Sandy Bridge processors. Whether this decision was intentional from Intel or not is debatable, but my gut feeling here is that they responded to the technical limitations of their 22 nm process. Yields and bins likely dictated the max clockspeeds attained on these new products. So instead of vaulting over AMD’s products, they just slowly started walking away from them.
Samsung is one of the first pure-play foundries to offer a working sub-20 nm FinFET product line. (Photo courtesy of ExtremeTech)
When 28 nm was released the plans on the books were to transition to 20 nm products based on planar transistors, thereby bypassing the added expense of developing FinFETs. It was widely expected that FinFETs were not necessarily required to address the needs of the market. Sadly, that did not turn out to be the case. There are many other factors as to why 20 nm planar parts are not common, but the limitations of that particular process node has made it a relatively niche process node that is appropriate for smaller, low power ASICs (like the latest Apple SOCs). The Apple A8 is rumored to be around 90 mm square, which is a far cry from the traditional midrange GPU that goes from 250 mm sq. to 400+ mm sq.
The essential difficulty of the 20 nm planar node appears to be a lack of power scaling to match the increased transistor density. TSMC and others have successfully packed in more transistors into every square mm as compared to 28 nm, but the electrical characteristics did not scale proportionally well. Yes, there are improvements there per transistor, but when designers pack in all those transistors into a large design, TDP and voltage issues start to arise. As TDP increases, it takes more power to drive the processor, which then leads to more heat. The GPU guys probably looked at this and figured out that while they can achieve a higher transistor density and a wider design, they will have to downclock the entire GPU to hit reasonable TDP levels. When adding these concerns to yields and bins for the new process, the advantages of going to 20 nm would be slim to none at the end of the day.
Project Lead: Joris-Jan van ‘t Land
Thanks to Ian Comings, guest writer from the PC Perspective Forums who conducted the interview of Bohemia Interactive's Joris-Jan van ‘t Land. If you are interested in learning more about ArmA 3 and hanging out with some PC gamers to play it, check out the PC Perspective Gaming Forum!
I recently got the chance to send some questions to Bohemia Interactive, a computer game development company based out of Prague, Czech Republic, and a member of IDEA Games. Bohemia Interactive was founded in 1999 by CEO Marek Španěl, and it is best known for PC gaming gems like Operation Flashpoint: Cold War Crisis, The ArmA series, Take On Helicopters, and DayZ. The questions are answered by ArmA 3's Project Lead: Joris-Jan van ‘t Land.
PC Perspective: How long have you been at Bohemia Interactive?
VAN ‘T LAND: All in all, about 14 years now.
PC Perspective: What inspired you to become a Project Lead at Bohemia Interactive?
VAN ‘T LAND: During high school, it was pretty clear to me that I wanted to work in game development, and just before graduation, a friend and I saw a first preview for Operation Flashpoint: Cold War Crisis in a magazine. It immediately looked amazing to us; we were drawn to the freedom and diversity it promised and the military theme. After helping run a fan website (Operation Flashpoint Network) for a while, I started to assist with part-time external design work on the game (scripting and scenario editing). From that point, I basically grew naturally into this role at Bohemia Interactive.
PC Perspective: What part of working at Bohemia Interactive do you find most satisfying? What do you find most challenging?
VAN ‘T LAND: The amount of freedom and autonomy is very satisfying. If you can demonstrate skills in some area, you're welcome to come up with random ideas and roll with them. Some of those ideas can result in official releases, such as Arma 3 Zeus. Another rewarding aspect is the near real-time connection to those people who are playing the game. Our daily Dev-Branch release means the work I do on Monday is live on Tuesday. Our own ambitions, on the other hand, can sometimes result in some challenges. We want to do a lot and incorporate every aspect of combat in Arma, but we're still a relatively small team. This can mean we bite off more than we can deliver at an acceptable level of quality.
PC Perspective: What are some of the problems that have plagued your team, and how have they been overcome?
VAN ‘T LAND: One key problem for us was that we had no real experience with developing a game in more than one physical location. For Arma 3, our team was split over two main offices, which caused quite a few headaches in terms of communication and data synchronization. We've since had more key team members travel between the offices more frequently and improved our various virtual communication methods. A lot of work has been done to try to ensure that both offices have the latest version of the game at any given time. That is not always easy when your bandwidth is limited and games are getting bigger and bigger.
We’ve been tracking NVIDIA’s G-Sync for quite a while now. The comments section on Ryan’s initial article erupted with questions, and many of those were answered in a follow-on interview with NVIDIA’s Tom Petersen. The idea was radical – do away with the traditional fixed refresh rate and only send a new frame to the display when it has just completed rendering by the GPU. There are many benefits here, but the short version is that you get the low-latency benefit of V-SYNC OFF gaming combined with the image quality (lack of tearing) that you would see if V-SYNC was ON. Despite the many benefits, there are some potential disadvantages that come from attempting to drive an LCD panel at varying periods of time, as opposed to the fixed intervals that have been the norm for over a decade.
As the first round of samples came to us for review, the current leader appeared to be the ASUS ROG Swift. A G-Sync 144 Hz display at 1440P was sure to appeal to gamers who wanted faster response than the 4K 60 Hz G-Sync alternative was capable of. Due to what seemed to be large consumer demand, it has taken some time to get these panels into the hands of consumers. As our Storage Editor, I decided it was time to upgrade my home system, placed a pre-order, and waited with anticipation of finally being able to shift from my trusty Dell 3007WFP-HC to a large panel that can handle >2x the FPS.
Fast forward to last week. My pair of ROG Swifts arrived, and some other folks I knew had also received theirs. Before I could set mine up and get some quality gaming time in, my bro FifthDread and his wife both noted a very obvious flicker on their Swifts within the first few minutes of hooking them up. They reported the flicker during game loading screens and mid-game during background content loading occurring in some RTS titles. Prior to hearing from them, the most I had seen were some conflicting and contradictory reports on various forums (not limed to the Swift, though that is the earliest panel and would therefore see the majority of early reports), but now we had something more solid to go on. That night I fired up my own Swift and immediately got to doing what I do best – trying to break things. We have reproduced the issue and intend to demonstrate it in a measurable way, mostly to put some actual data out there to go along with those trying to describe something that is borderline perceptible for mere fractions of a second.
First a bit of misnomer correction / foundation laying:
- The ‘Screen refresh rate’ option you see in Windows Display Properties is actually a carryover from the CRT days. In terms of an LCD, it is the maximum rate at which a frame is output to the display. It is not representative of the frequency at which the LCD panel itself is refreshed by the display logic.
- LCD panel pixels are periodically updated by a scan, typically from top to bottom. Newer / higher quality panels repeat this process at a rate higher than 60 Hz in order to reduce the ‘rolling shutter’ effect seen when panning scenes or windows across the screen.
- In order to engineer faster responding pixels, manufacturers must deal with the side effect of faster pixel decay between refreshes. This is a balanced by increasing the frequency of scanning out to the panel.
- The effect we are going to cover here has nothing to do with motion blur, LightBoost, backlight PWM, LightBoost combined with G-Sync (not currently a thing, even though Blur Busters has theorized on how it could work, their method would not work with how G-Sync is actually implemented today).
With all of that out of the way, let’s tackle what folks out there may be seeing on their own variable refresh rate displays. Based on our testing so far, the flicker only presented at times when a game enters a 'stalled' state. These are periods where you would see a split-second freeze in the action, like during a background level load during game play in some titles. It also appears during some game level load screens, but as those are normally static scenes, they would have gone unnoticed on fixed refresh rate panels. Since we were absolutely able to see that something was happening, we wanted to be able to catch it in the act and measure it, so we rooted around the lab and put together some gear to do so. It’s not a perfect solution by any means, but we only needed to observe differences between the smooth gaming and the ‘stalled state’ where the flicker was readily observable. Once the solder dust settled, we fired up a game that we knew could instantaneously swing from a high FPS (144) to a stalled state (0 FPS) and back again. As it turns out, EVE Online does this exact thing while taking an in-game screen shot, so we used that for our initial testing. Here’s what the brightness of a small segment of the ROG Swift does during this very event:
Measured panel section brightness over time during a 'stall' event. Click to enlarge.
The relatively small ripple to the left and right of center demonstrate the panel output at just under 144 FPS. Panel redraw is in sync with the frames coming from the GPU at this rate. The center section, however, represents what takes place when the input from the GPU suddenly drops to zero. In the above case, the game briefly stalled, then resumed a few frames at 144, then stalled again for a much longer period of time. Completely stopping the panel refresh would result in all TN pixels bleeding towards white, so G-Sync has a built-in failsafe to prevent this by forcing a redraw every ~33 msec. What you are seeing are the pixels intermittently bleeding towards white and periodically being pulled back down to the appropriate brightness by a scan. The low latency panel used in the ROG Swift does this all of the time, but it is less noticeable at 144, as you can see on the left and right edges of the graph. An additional thing that’s happening here is an apparent rise in average brightness during the event. We are still researching the cause of this on our end, but this brightness increase certainly helps to draw attention to the flicker event, making it even more perceptible to those who might have not otherwise noticed it.
Some of you might be wondering why this same effect is not seen when a game drops to 30 FPS (or even lower) during the course of normal game play. While the original G-Sync upgrade kit implementation simply waited until 33 msec had passed until forcing an additional redraw, this introduced judder from 25-30 FPS. Based on our observations and testing, it appears that NVIDIA has corrected this in the retail G-Sync panels with an algorithm that intelligently re-scans at even multiples of the input frame rate in order to keep the redraw rate relatively high, and therefore keeping flicker imperceptible – even at very low continuous frame rates.
A few final points before we go:
- This is not limited to the ROG Swift. All variable refresh panels we have tested (including 4K) see this effect to a more or less degree than reported here. Again, this only occurs when games instantaneously drop to 0 FPS, and not when those games dip into low frame rates in a continuous fashion.
- The effect is less perceptible (both visually and with recorded data) at lower maximum refresh rate settings.
- The effect is not present at fixed refresh rates (G-Sync disabled or with non G-Sync panels).
This post was primarily meant as a status update and to serve as something for G-Sync users to point to when attempting to explain the flicker they are perceiving. We will continue researching, collecting data, and coordinating with NVIDIA on this issue, and will report back once we have more to discuss.
During the research and drafting of this piece, we reached out to and worked with NVIDIA to discuss this issue. Here is their statement:
"All LCD pixel values relax after refreshing. As a result, the brightness value that is set during the LCD’s scanline update slowly relaxes until the next refresh.
This means all LCDs have some slight variation in brightness. In this case, lower frequency refreshes will appear slightly brighter than high frequency refreshes by 1 – 2%.
When games are running normally (i.e., not waiting at a load screen, nor a screen capture) - users will never see this slight variation in brightness value. In the rare cases where frame rates can plummet to very low levels, there is a very slight brightness variation (barely perceptible to the human eye), which disappears when normal operation resumes."
So there you have it. It's basically down to the physics of how an LCD panel works at varying refresh rates. While I agree that it is a rare occurrence, there are some games that present this scenario more frequently (and noticeably) than others. If you've noticed this effect in some games more than others, let us know in the comments section below.
(Editor's Note: We are continuing to work with NVIDIA on this issue and hope to find a way to alleviate the flickering with either a hardware or software change in the future.)
It has become increasingly apparent that flash memory die shrinks have hit a bit of a brick wall in recent years. The issues faced by the standard 2D Planar NAND process were apparent very early on. This was no real secret - here's a slide seen at the 2009 Flash Memory Summit:
Despite this, most flash manufacturers pushed the envelope as far as they could within the limits of 2D process technology, balancing shrinks with reliability and performance. One of the largest flash manufacturers was Intel, having joined forces with Micron in a joint venture dubbed IMFT (Intel Micron Flash Technologies). Intel remained in lock-step with Micron all the way up to 20nm, but chose to hold back at the 16nm step, presumably in order to shift full focus towards alternative flash technologies. This was essentially confirmed late last week, with Intel's announcement of a shift to 3D NAND production.
Intel's press briefing seemed to focus more on cost efficiency than performance, and after reviewing the very few specs they released about this new flash, I believe we can do some theorizing as to the potential performance of this new flash memory. From the above illustration, you can see that Intel has chosen to go with the same sort of 3D technology used by Samsung - a 32 layer vertical stack of flash cells. This requires the use of an older / larger process technology, as it is too difficult to etch these holes at a 2x nm size. What keeps the die size reasonable is the fact that you get a 32x increase in bit density. Going off of a rough approximation from the above photo, imagine that 50nm die (8 Gbit), but with 32 vertical NAND layers. That would yield a 256 Gbit (32 GB) die within roughly the same footprint.
Representation of Samsung's 3D VNAND in 128Gbit and 86 Gbit variants.
20nm planar (2D) = yellow square, 16nm planar (2D) = blue square.
Image republished with permission from Schiltron Corporation.
It's likely a safe bet that IMFT flash will be going for a cost/GB far cheaper than the competing Samsung VNAND, and going with a relatively large 256 Gbit (vs. VNAND's 86 Gbit) per-die capacity is a smart move there, but let's not forget that there is a catch - write speed. Most NAND is very fast on reads, but limited on writes. Shifting from 2D to 3D NAND netted Samsung a 2x speed boost per die, and another effective 1.5x speed boost due to their choice to reduce per-die capacity from 128 Gbit to 86 Gbit. This effective speed boost came from the fact that a given VNAND SSD has 50% more dies to reach the same capacity as an SSD using 128 Gbit dies.
Now let's examine how Intel's choice of a 256 Gbit die impacts performance:
- Intel SSD 730 240GB = 16x128 Gbit 20nm dies
- 270 MB/sec writes and ~17 MB/sec/die
- Crucial MX100 128GB = 8x128Gbit 16nm dies
- 150 MB/sec writes and ~19 MB/sec/die
- Samsung 850 Pro 128GB = 12x86Gbit VNAND dies
- 470MB/sec writes and ~40 MB/sec/die
If we do some extrapolation based on the assumption that IMFT's move to 3D will net the same ~2x write speed improvement seen by Samsung, combined with their die capacity choice of 256Gbit, we get this:
- Future IMFT 128GB SSD = 4x256Gbit 3D dies
- 40 MB/sec/die x 4 dies = 160MB/sec
Even rounding up to 40 MB/sec/die, we can see that also doubling the die capacity effectively negates the performance improvement. While the IMFT flash equipped SSD will very likely be a lower cost product, it will (theoretically) see the same write speed limits seen in today's SSDs equipped with IMFT planar NAND. Now let's go one layer deeper on theoretical products and assume that Intel took the 18-channel NVMe controller from their P3700 Series and adopted it to a consumer PCIe SSD using this new 3D NAND. The larger die size limits the minimum capacity you can attain and still fully utilize their 18 channel controller, so with one die per channel, you end up with this product:
- Theoretical 18 channel IMFT PCIE 3D NAND SSD = 18x256Gbit 3D dies
- 40 MB/sec/die x 18 dies = 720 MB/sec
- 18x32GB (die capacity) = 576GB total capacity
Overprovisioning decisions aside, the above would be the lowest capacity product that could fully utilize the Intel PCIe controller. While the write performance is on the low side by PCIe SSD standards, the cost of such a product could easily be in the $0.50/GB range, or even less.
In summary, while we don't have any solid performance data, it appears that Intel's new 3D NAND is not likely to lead to a performance breakthrough in SSD speeds, but their choice on a more cost-effective per-die capacity for their new 3D NAND is likely to give them significant margins and the wiggle room to offer SSDs at a far lower cost/GB than we've seen in recent years. This may be the step that was needed to push SSD costs into a range that can truly compete with HDD technology.
Investigating the issue
** Edit ** (24 Sep)
We have updated this story with temperature effects on the read speed of old data. Additional info on page 3.
** End edit **
** Edit 2 ** (26 Sep)
New quote from Samsung:
"We acknowledge the recent issue associated with the Samsung 840 EVO SSDs and are qualifying a firmware update to address the issue. While this issue only affects a small subset of all 840 EVO users, we regret any inconvenience experienced by our customers. A firmware update that resolves the issue will be available on the Samsung SSD website soon. We appreciate our customer’s support and patience as we work diligently to resolve this issue."
** End edit 2 **
** Edit 3 **
The firmware update and performance restoration tool has been tested. Results are found here.
** End edit 3 **
Over the past week or two, there have been growing rumblings from owners of Samsung 840 and 840 EVO SSDs. A few reports scattered across internet forums gradually snowballed into lengthy threads as more and more people took a longer look at their own TLC-based Samsung SSD's performance. I've spent the past week following these threads, and the past few days evaluating this issue on the 840 and 840 EVO samples we have here at PC Perspective. This post is meant to inform you of our current 'best guess' as to just what is happening with these drives, and just what you should do about it.
The issue at hand is an apparent slow down in the reading of 'stale' data on TLC-based Samsung SSDs. Allow me to demonstrate:
You might have seen what looks like similar issues before, but after much research and testing, I can say with some confidence that this is a completely different and unique issue. The old X25-M bug was the result of random writes to the drive over time, but the above result is from a drive that only ever saw a single large file write to a clean drive. The above drive was the very same 500GB 840 EVO sample used in our prior review. It did just fine in that review, and at afterwards I needed a quick temporary place to put a HDD image file and just happened to grab that EVO. The file was written to the drive in December of 2013, and if it wasn't already apparent from the above HDTach pass, it was 442GB in size. This brings on some questions:
- If random writes (i.e. flash fragmentation) are not causing the slow down, then what is?
- How long does it take for this slow down to manifest after a file is written?
Since the introduction of the Haswell line of CPUs, the Internet has been aflame with how hot the CPUs run. Speculation ran rampant on the cause with theories abounding about the lesser surface area and inferior thermal interface material (TIM) in between the CPU die surface and the underside of the CPU heat spreader. It was later confirmed that Intel had changed the TIM interfacing the CPU die surface to the heat spreader with Haswell, leading to the hotter than expected CPU temperatures. This increase in temperature led to inconsistent core-to-core temperatures as well as vastly inferior overclockability of the Haswell K-series chips over previous generations.
A few of the more adventurous enthusiasts took it upon themselves to use inventive ways to address the heat concerns surrounding the Haswell by delidding the processor. The delidding procedure involves physically removing the heat spreader from the CPU, exposing the CPU die. Some individuals choose to clean the existing TIM from the core die and heat spreader underside, applying superior TIM such as metal or diamond-infused paste or even the Coollaboratory Liquid Ultra metal material and fixing the heat spreader back in place. Others choose a more radical solution, removing the heat spreader from the equation entirely for direct cooling of the naked CPU die. This type of cooling method requires use of a die support plate, such as the MSI Die Guard included with the MSI Z97 XPower motherboard.
Whichever outcome you choose, you must first remove the heat spreader from the CPU's PCB. The heat spreader itself is fixed in place with black RTV-type material ensuring a secure and air-tight seal, protecting the fragile die from outside contaminants and influences. Removal can be done in multiple ways with two of the most popular being the razor blade method and the vise method. With both methods, you are attempting to separate the CPU PCB from the heat spreader without damaging the CPU die or components on the top or bottom sides of the CPU PCB.
When Magma Freezes Over...
Intel confirms that they have approached AMD about access to their Mantle API. The discussion, despite being clearly labeled as "an experiment" by an Intel spokesperson, was initiated by them -- not AMD. According to AMD's Gaming Scientist, Richard Huddy, via PCWorld, AMD's response was, "Give us a month or two" and "we'll go into the 1.0 phase sometime this year" which only has about five months left in it. When the API reaches 1.0, anyone who wants to participate (including hardware vendors) will be granted access.
AMD inside Intel Inside???
I do wonder why Intel would care, though. Intel has the fastest per-thread processors, and their GPUs are not known to be workhorses that are held back by API call bottlenecks, either. Of course, that is not to say that I cannot see any reason, however...
The AMD Argument
Earlier this week, a story was posted in a Forbes.com blog that dove into the idea of NVIDIA GameWorks and how it was doing a disservice not just on the latest Ubisoft title Watch_Dogs but on PC gamers in general. Using quotes from AMD directly, the author claims that NVIDIA is actively engaging in methods to prevent game developers from optimizing games for AMD graphics hardware. This is an incredibly bold statement and one that I hope AMD is not making lightly. Here is a quote from the story:
Gameworks represents a clear and present threat to gamers by deliberately crippling performance on AMD products (40% of the market) to widen the margin in favor of NVIDIA products. . . . Participation in the Gameworks program often precludes the developer from accepting AMD suggestions that would improve performance directly in the game code—the most desirable form of optimization.
The example cited on the Forbes story is the recently released Watch_Dogs title, which appears to show favoritism towards NVIDIA GPUs with performance of the GTX 770 ($369) coming close the performance of a Radeon R9 290X ($549).
It's evident that Watch Dogs is optimized for Nvidia hardware but it's staggering just how un-optimized it is on AMD hardware.
Watch_Dogs is the latest GameWorks title released this week.
I decided to get in touch with AMD directly to see exactly what stance the company was attempting to take with these kinds of claims. No surprise, AMD was just as forward with me as they appeared to be in the Forbes story originally.
The AMD Stance
Central to AMD’s latest annoyance with the competition is the NVIDIA GameWorks program. First unveiled last October during a press event in Montreal, GameWorks combines several NVIDIA built engine functions into libraries that can be utilized and accessed by game developers to build advanced features into games. NVIDIA’s website claims that GameWorks is “easy to integrate into games” while also including tutorials and tools to help quickly generate content with the software set. Included in the GameWorks suite are tools like VisualFX which offers rendering solutions like HBAO+, TXAA, Depth of Field, FaceWorks, HairWorks and more. Physics tools include the obvious like PhysX while also adding clothing, destruction, particles and more.