Ultimate Cord Cutting Guide - Part 2: Installation & Configuration
We're back with Part 2 of our cord cutting series, documenting our experience with dumping traditional cable and satellite providers in exchange for cheaper and more flexible online and over-the-air content. In Part 1 we looked at the devices that could serve as our cord-cutting hub, the types of subscription content that would be available, and the options for free OTA and online media.
In the end, we selected the NVIDIA SHIELD as our central media device due to its power, capabilities, and flexibility. Now in Part 2 we'll walk through setting up the SHIELD, adding our channels and services, configuring Plex, and more!
The Expected Unexpected
Last night we first received word that Raja had resigned from AMD (during a sabbatical) after they had launched Vega. The initial statement was that Raja would come back to resume his position at AMD in a December/January timeframe. During this time there was some doubt as to whether Raja would in fact come back to AMD, as “sabbaticals” in the tech world often lead the individual to take stock of their situation and move on to what they consider to be greener pastures.
Raja has dropped by the PCPer offices in the past.
Initially it was thought that Raja would take the time off and then eventually jump to another company and tackle the issues there. This behavior is quite common in Silicon Valley and Raja is no stranger to this. Raja cut his teeth on 3D graphics at S3, but in 2001 he moved to ATI. While there he worked on a variety of programs including the original Radeon, the industry changing Radeon 9700 series, and finishing up with the strong HD 4000 series of parts. During this time ATI was acquired by AMD and he became one of the top graphics gurus at that company. In 2009 he quit AMD and moved on to Apple. He was Director of Graphics Architecture at Apple, but little is known about what he actually did. During that time Apple utilized AMD GPUs and licensed Imagination Technologies graphics technology. Apple could have been working on developing their own architecture at this point, which has recently shown up in the latest iPhone products.
In 2013 Raja rejoined AMD and became a corporate VP of Visual Computing, but in 2015 he was promoted to leading the Radeon Technology Group after Lisa Su became CEO of the company. While there Raja worked to get AMD back on an even footing under pretty strained conditions. AMD had not had the greatest of years and had seen their primary moneymakers start taking on water. AMD had competitive graphics for the most part, and the Radeon technology integrated into AMD’s APUs truly was class leading. On the discrete side AMD was able to compare favorably to NVIDIA with the HD 7000 and later R9 200 series of cards. After NVIDIA released their Maxwell based chips, AMD had a hard time keeping up. The general consensus here is that RTG saw its headcount decreased by the company-wide cuts as well as a decrease in R&D funds.
Providers and Devices
"Cutting the Cord," the process of ditching traditional cable and satellite content providers for cheaper online-based services, is nothing new. For years, consumers have cancelled their cable subscriptions (or declined to even subscribe in the first place), opting instead to get their entertainment from companies like Netflix, Hulu, and YouTube.
But the recent introduction of online streaming TV services like Sling TV, new technologies like HDR, and the slow online adoption of live local channels has made the idea of cord cutting more complicated. While cord cutters who are happy with just Netflix and YouTube need not worry, what are the solutions for those who don't like the idea of high cost cable subscriptions but also want to preserve access to things like local channels and the latest 4K HDR content?
This article is the first in a three-part series that will look at this "high-end" cord cutting scenario. We'll be taking a look at the options for online streaming TV, access to local "OTA" (over the air) channels, and the devices that can handle it all, including DVR support, 4K output, and HDR compliance.
There are two approaches that you can take when considering the cord cutting process. The first is to focus on capabilities: Do you want 4K? HDR? Lossless surround sound audio? Voice search? Gaming?
The second approach is to focus on content: Do you want live TV or à la carte downloads? Can you live without ESPN or must it and your other favorite networks still be available? Are you heavily invested in iTunes content? Perhaps most importantly for those concerned with the "Spousal Acceptance Factor" (SAF), do you want the majority of your content contained in a single app, which can prevent you and your family members from having to jump between apps or devices to find what you want?
While most people on the cord cutting path will consider both approaches to a certain degree, it's easier to focus on the one that's most important to you, as that will make other choices involving devices and content easier. Of course, there are those of us out there that are open to purchasing and using multiple devices and content sources at once, giving us everything at the expense of increased complexity. But most cord cutters, especially those with families, will want to pursue a setup based around a single device that accommodates most, if not all, of their needs. And that's exactly what we set out to find.
Introduction, How PCM Works, Reading, Writing, and Tweaks
I’ve seen a bit of flawed logic floating around related to discussions about 3D XPoint technology. Some are directly comparing the cost per die to NAND flash (you can’t - 3D XPoint likely has fewer fab steps than NAND - especially when compared with 3D NAND). Others are repeating a bunch of terminology and element names without taking the time to actually explain how it works, and far too many folks out there can't even pronounce it correctly (it's spoken 'cross-point'). My plan is to address as much of the confusion as I can with this article, and I hope you walk away understanding how XPoint and its underlying technologies (most likely) work. While we do not have absolute confirmation of the precise material compositions, there is a significant amount of evidence pointing to one particular set of technologies. With Optane Memory now out in the wild and purchasable by folks wielding electron microscopes and mass spectrometers, I have seen enough additional information come across to assume XPoint is, in fact, PCM based.
XPoint memory. Note the shape of the cell/selector structure. This will be significant later.
While we were initially told at the XPoint announcement event Q&A that the technology was not phase change based, there is overwhelming evidence to the contrary, and it is likely that Intel did not want to let the cat out of the bag too early. The funny thing about that is that both Intel and Micron were briefing on PCM-based memory developments five years earlier, and nearly everything about those briefings lines up perfectly with what appears to have ended up in the XPoint that we have today.
Some die-level performance characteristics of various memory types. source
The above figures were sourced from a 2011 paper and may be a bit dated, but they do a good job putting some actual numbers with the die-level performance of the various solid state memory technologies. We can also see where the ~1000x speed and ~1000x endurance comparisons with XPoint to NAND Flash came from. Now, of course, those performance characteristics do not directly translate to the performance of a complete SSD package containing those dies. Controller overhead and management must take their respective cuts, as is shown with the performance of the first generation XPoint SSD we saw come out of Intel:
The ‘bridging the gap’ Latency Percentile graph from our Intel SSD DC P4800X review.
(The P4800X comes in at 10us above).
There have been a few very vocal folks out there chanting 'not good enough', without the basic understanding that the first publicly available iteration of a new technology never represents its ultimate performance capabilities. It took NAND flash decades to make it into usable SSDs, and another decade before climbing to the performance levels we enjoy today. Time will tell if this holds true for XPoint, but given Micron's demos and our own observed performance of Intel's P4800X and Optane Memory SSDs, I'd argue that it is most certainly off to a good start!
A 3D XPoint die, submitted for your viewing pleasure (click for larger version).
Zen vs. 40 Years of CPU Development
Zen is nearly upon us. AMD is releasing its next generation CPU architecture to the world this week and we saw CPU demonstrations and upcoming AM4 motherboards at CES in early January. We have been shown tantalizing glimpses of the performance and capabilities of the “Ryzen” products that will presumably fill the desktop markets from $150 to $499. I have yet to be briefed on the product stack that AMD will be offering, but we know enough to start to think how positioning and placement will be addressed by these new products.
To get a better understanding of how Ryzen will stack up, we should probably take a look back at what AMD has accomplished in the past and how Intel has responded to some of the stronger products. AMD has been in business for 47 years now and has been a major player in semiconductors for most of that time. It really has only been since the 90s where AMD started to battle Intel head to head that people have become passionate about the company and their products.
The industry is a complex and ever-shifting one. AMD and Intel have been two stalwarts over the years. Even though AMD has had more than a few challenging years over the past decade, it still moves forward and expects to compete at the highest level with its much larger and better funded competitor. 2017 could very well be a breakout year for the company with a return to solid profitability in both CPU and GPU markets. I am not the only one who thinks this considering that AMD shares that traded around the $2 mark ten months ago are now sitting around $14.
AMD Through 1996
AMD became a force in the CPU industry due to IBM’s requirement to have a second source for its PC business. Intel originally entered into a cross licensing agreement with AMD to allow it to produce x86 chips based on Intel designs. AMD eventually started to produce their own versions of these parts and became a favorite in the PC clone market. Eventually Intel tightened down on this agreement and then cancelled it, but through near endless litigation AMD ended up with a x86 license deal with Intel.
AMD produced their own Am286 chip that was the first real break from the second sourcing agreement with Intel. Intel balked at sharing their 386 design with AMD and eventually forced the company to develop its own clean room version. The Am386 was released in the early 90s, well after Intel had been producing those chips for years. AMD then developed their own version of the Am486 which then morphed into the Am5x86. The company made some good inroads with these speedy parts and typically clocked them faster than their Intel counterparts (e.g. Am486 40 MHz and 80 MHz vs. the Intel 486 DX33 and DX66). AMD priced these parts lower so users could achieve better performance per dollar using the same chipsets and motherboards.
Intel released their first Pentium chips in 1993. The initial version was hot and featured the infamous FDIV bug. AMD made some inroads against these parts by introducing the faster Am486 and Am5x86 parts that would achieve clockspeeds from 133 MHz to 150 MHz at the very top end. The 150 MHz part was very comparable in overall performance to the Pentium 75 MHz chip and we saw the introduction of the dreaded “P-rating” on processors.
There is no denying that Intel continued their dominance throughout this time by being the gold standard in x86 manufacturing and design. AMD slowly chipped away at its larger rival and continued to profit off of the lucrative x86 market. Jerry Sanders III set the bar higher for where he wanted the company to go, and he started on a much more aggressive path than many expected the company to take.
It always feels a little odd when covering NVIDIA’s quarterly earnings due to how they present their financial calendar. No, we are not reporting from the future. Yes, it can be confusing when comparing results and getting your dates mixed up. Regardless of the date before the earnings, NVIDIA did exceptionally well in a quarter that is typically the second weakest after Q1.
NVIDIA reported revenue of $1.43 billion. This is a jump from an already strong Q1 where they took in $1.30 billion. Compare this to the $1.027 billion of its competitor AMD, who provides CPUs as well as GPUs. NVIDIA sold a lot of GPUs as well as other products. Their primary money makers were the consumer space GPUs and the professional and compute markets, where they have a virtual stranglehold at the moment. The company’s GAAP net income is a very respectable $253 million.
The release of the latest Pascal based GPUs was the primary mover for the gains for this latest quarter. AMD has had a hard time competing with NVIDIA for marketshare. The older Maxwell based chips performed well against the entire line of AMD offerings and typically did so with better power and heat characteristics. Even though the GTX 970 was somewhat limited in its memory configuration as compared to the AMD products (3.5 GB + .5 GB vs. a full 4 GB implementation) it was a top seller in its class. The same could be said for the products up and down the stack.
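To put those figures in proportion, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted above; the helper name `pct_change` and the constant names are ours, chosen purely for illustration, and the margin is simply GAAP net income divided by revenue.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change going from `old` to `new`."""
    return (new - old) / old * 100.0


# Figures quoted in the text, in millions of US dollars.
NVIDIA_PREVIOUS_QUARTER_REVENUE = 1300.0
NVIDIA_THIS_QUARTER_REVENUE = 1430.0
NVIDIA_THIS_QUARTER_NET_INCOME = 253.0
AMD_REVENUE = 1027.0

# Sequential (quarter-over-quarter) revenue growth.
qoq_growth = pct_change(NVIDIA_PREVIOUS_QUARTER_REVENUE, NVIDIA_THIS_QUARTER_REVENUE)

# GAAP net income as a share of revenue.
net_margin = NVIDIA_THIS_QUARTER_NET_INCOME / NVIDIA_THIS_QUARTER_REVENUE * 100.0

# How much larger NVIDIA's revenue is than AMD's quoted figure.
revenue_ratio = NVIDIA_THIS_QUARTER_REVENUE / AMD_REVENUE

print(f"Quarter-over-quarter revenue growth: {qoq_growth:.1f}%")
print(f"GAAP net margin: {net_margin:.1f}%")
print(f"NVIDIA revenue relative to AMD: {revenue_ratio:.2f}x")
```

In other words, revenue grew by roughly a tenth sequentially, and NVIDIA out-earned AMD on the top line by a bit under 40% despite AMD selling CPUs as well.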
Pascal was released at the end of May, but the company had been shipping chips to its partners as well as creating the “Founder’s Edition” models to its exacting specifications. These were strong sellers throughout the end of May until the end of the quarter. NVIDIA recently unveiled their latest Pascal based Quadro cards, but we do not know how much of an impact those have had on this quarter. NVIDIA has also been shipping, in very limited quantities, the Tesla P100 based units to select customers and outfits.
A Watershed Moment in Mobile
This previous May I was invited to Austin to be briefed on the latest core innovations from ARM and their partners. We were introduced to new CPU and GPU cores, as well as the surrounding technologies that provide the basis of a modern SOC in the ARM family. We also were treated to more information about the process technologies that ARM would embrace with their Artisan and POP programs. ARM is certainly far more aggressive now in their designs and partnerships than they have been in the past, or at least they are more willing to openly talk about them to the press.
The big process news that ARM was able to share at this time was the design of 10nm parts using an upcoming TSMC process node. This was fairly big news as TSMC was still introducing parts on their latest 16nm FF+ line. NVIDIA had not even released their first 16FF+ parts to the world in early May. Apple had dual sourced their 14/16 nm parts from Samsung and TSMC respectively, but these were based on LPE and FF lines (early nodes not yet optimized to LPP/FF+). So the news that TSMC would have a working 10nm process in 2017 was important to many people. 2016 might be a year with some good performance and efficiency jumps, but it seems that 2017 would provide another big leap forward after years of seeming stagnation of pure play foundry technology at 28nm.
Yesterday we received a new announcement from ARM that shows an amazing shift in thought and industry inertia. ARM is partnering with Intel to introduce select products on Intel’s upcoming 10nm foundry process. This news is both surprising and expected. It is surprising in that it happened as quickly as it did. It is expected as Intel is facing a very different world than it had planned for 10 years ago. We could argue that it is much different than they planned for 5 years ago.
Intel is the undisputed leader in process technologies and foundry practices. They are the gold standard of developing new, cutting edge process nodes and implementing them on a vast scale. This has served them well through the years as they could provide product to their customers seemingly on demand. It also allowed them a leg up in technology when their designs may not have fit what the industry wanted or needed (Pentium 4, etc.). It also allowed them to potentially compete in the mobile market with designs that were not entirely suited for ultra-low power. x86 is a modern processor technology with decades of development behind it, but that development focused mainly on performance at higher TDP ranges.
This past year Intel signaled their intent to move out of the sub 5 watt market and cede it to ARM and their partners. Intel’s ultra mobile offerings just did not make an impact in an area that they were expected to. For all of Intel’s advances in process technology, the base ARM architecture is just better suited to these power envelopes. Instead of throwing good money after bad (in the form of development time, wafer starts, rebates) Intel has stepped away from this market.
This leaves Intel with a problem. What to do with extra production capacity? Running a fab is a very expensive endeavor. If these megafabs are not producing chips 24/7, then the company is losing money. This past year Intel has seen their fair share of layoffs and slowing down production/conversion of fabs. The money spent on developing new, cutting edge process technologies cannot stop for the company if they want to keep their dominant position in the CPU industry. Some years back they opened up their process products to select 3rd party companies to help fill in the gaps of production. Right now Intel has far more production line space than they need for the current market demands. Yes, there were delays in their latest Skylake based processors, but those were solved and Intel is full steam ahead. Unfortunately, they do not seem to be keeping their fabs utilized at the level needed or desired. The only real option seems to be opening up some fab space to more potential customers in a market that they are no longer competing directly in.
The Intel Custom Foundry Group is working with ARM to provide access to their 10nm HPM process node. Initial production of these latest generation designs will commence in Q1 2017 with full scale production in Q4 2017. We do not have exact information as to what cores will be used, but we can imagine that they will be Cortex-A73 and A53 parts in big.LITTLE designs. Mali graphics will probably be the first to be offered on this advanced node as well due to the Artisan/POP program. Initial customers have not been disclosed and we likely will not hear about them until early 2017.
This is a big step for Intel. It is also a logical progression for them when we look over the changing market conditions of the past few years. They were unable to adequately compete in the handheld/mobile market with their x86 designs, but they still wanted to profit off of this ever expanding area. The logical way to monetize this market is to make the chips for those that are successfully competing here. This will cut into Intel’s margins, but it should increase their overall revenue base if they are successful here. There is no reason to believe that they won’t be.
The last question we have is if the 10nm HPM node will be identical to what Intel will use for their next generation “Cannonlake” products. My best guess is that the foundry process will be slightly different and will not provide some of the “secret sauce” that Intel will keep for themselves. It will probably be a mobile focused process node that stresses efficiency rather than transistor switching speed. I could be very wrong here, but I don’t believe that Intel will open up their process to everyone that comes to them hat in hand (AMD).
The partnership between ARM and Intel is a very interesting one that will benefit customers around the globe if it is handled correctly from both sides. Intel has a “not invented here” culture that has both benefited it and caused it much grief. Perhaps some flexibility on the foundry side will reap benefits of its own when dealing with very different designs than Intel is used to. This is a titanic move from where Intel probably thought it would be when it first started to pursue the ultra-mobile market, but it is a move that shows the giant can still positively react to industry trends.
First, Some Background
NVIDIA's Rumored GP102
When GP100 was announced, Josh and I were discussing, internally, how it would make sense in the gaming industry. Recently, an article on WCCFTech cited anonymous sources, which should always be taken with a dash of salt, that claimed NVIDIA was planning a second architecture, GP102, between GP104 and GP100. As I was writing this editorial about it, relating it to our own speculation about the physics of Pascal, VideoCardz claims to have been contacted by the developers of AIDA64, seemingly on-the-record, also citing a GP102 design.
I will retell chunks of the rumor, but also add my opinion to it.
In the last few generations, each architecture had a flagship chip that was released in both gaming and professional SKUs. Neither audience had access to a chip that was larger than the other's largest of that generation. Clock rates and disabled portions varied by specific product, with gaming usually getting the more aggressive performance for slightly better benchmarks. Fermi had GF100/GF110, Kepler had GK110/GK210, and Maxwell had GM200. Each of these was available in Tesla, Quadro, and GeForce cards, especially Titans.
Maxwell was interesting, though. NVIDIA was unable to leave 28nm, which Kepler launched on, so they created a second architecture at that node. To increase performance without having access to more feature density, you need to make your designs bigger, more optimized, or simpler. GM200 was giant and optimized, but, to get the performance levels it achieved, also needed to be simpler. Something needed to go, and double-precision (FP64) performance was the big omission. NVIDIA was upfront about it at the Titan X launch, and told their GPU compute customers to keep purchasing Kepler if they valued FP64.
Seeing Ryan transition from being a long-time Android user over to iOS late last year has had me thinking. While I've had hands on with flagship phones from many manufacturers since then, I haven't actually carried an Android device with me since the Nexus S (eventually, with the 4.0 Ice Cream Sandwich upgrade). Maybe it was time to go back in order to gain a more informed perspective of the mobile device market as it stands today.
So that's exactly what I did. When we received our Samsung Galaxy S7 review unit (full review coming soon, I promise!), I decided to go ahead and put a real effort forth into using Android for an extended period of time.
Full disclosure, I am still carrying my iPhone with me since we received a T-Mobile locked unit, and my personal number is on Verizon. However, I have been using the S7 for everything but phone calls, and the occasional text message to people who only have my iPhone number.
Now one of the questions you might be asking yourself right now is why I chose the Galaxy S7 of all devices to make this transition with. Most Android aficionados would probably insist that I choose a Nexus device to get the best experience, the one that Google intends to provide when developing Android. While these people aren't wrong, I decided that I wanted to go with a more popular device as opposed to the more niche Nexus line.
Whether you like Samsung's approach or not, the fact is that they sell more Android devices than anyone else and the Galaxy S7 will be their flagship offering for the next year or so.
28HPCU: Cost Effective and Power Efficient
Have you ever been approached about something and upon first hearing about it, the opportunity just did not seem very exciting? Then upon digging into things, it became much more interesting? This happened to me with this announcement. At first blush, who really cares that ARM is partnering with UMC at 28 nm? Well, once I was able to chat with the people at ARM, it is much more interesting than initially expected.
The new hotness in fabrication is the latest 14 nm and 16 nm processes from Samsung/GF and TSMC respectively. It has been a good 4+ years since we last had a new process node that actually performed as expected. The planar 22/20 nm products just were not entirely suitable for mass production. Apple was one of the few to actually develop a part for TSMC’s 20 nm process that sold in the millions. The main problem was a lack of power and speed scaling as compared to 28 nm processes. Planar was a bad choice, but FinFET technology had not been developed far enough for 3rd party manufacturers to implement it at that node.
There is a problem with the latest process generations, though. They are new, expensive, and are production constrained. Also, they may not be entirely appropriate for the applications that are being developed. There are several strengths with 28 nm as compared. These are mature processes with an excess of line space. The major fabs are offering very competitive pricing structures for 28 nm as they see space being cleared up on the lines with higher end SOCs, GPUs, and assorted ASICs migrating to the new process nodes.
TSMC has typically been on the forefront of R&D with advanced nodes. UMC is not as aggressive with their development, but they tend to let others do some of the heavy lifting and then integrate the new nodes when it fits their pricing and business models. TSMC is on their third generation of 28 nm. UMC is on their second, but that generation encompasses many of the advanced features of TSMC’s 3rd generation so it is actually quite competitive.
Fighting for Relevance
AMD is still kicking. While the results of this past year have been forgettable, they have overcome some significant hurdles and look like they are improving their position in terms of cutting costs while extracting as much revenue as possible. There were plenty of ups and downs for this past quarter, but when compared to the rest of 2015 there were some solid steps forward here.
The company reported revenues of $958 million, which is down from $1.06 billion last quarter. The company also recorded a $103 million loss, but that is down significantly from the $197 million loss the quarter before. Q3 did have a $65 million write-down due to unsold inventory. Though the company made far less in revenues, they also shored up their losses. The company is still bleeding, but they still have plenty of cash on hand for the next several quarters to survive. When we talk about non-GAAP figures, AMD reports a $79 million loss for this past quarter.
For the entire year AMD recorded $3.99 billion in revenue with a net loss of $660 million. This is down from FY 2014 revenues of $5.51 billion and a net loss of $403 million. AMD certainly is trending downwards year over year, but they are hoping to reverse that come 2H 2016.
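For readers who want the relative changes rather than the raw dollar amounts, the comparisons above can be worked out with a short sketch like the one below. It uses only the figures quoted in the text (in millions of US dollars, with losses entered as positive magnitudes); the table layout and the names `COMPARISONS` and `relative_changes` are ours, for illustration only.

```python
# (label, earlier period, later period), all in millions of US dollars.
# Losses are stored as positive magnitudes, so a negative change means
# the loss shrank.
COMPARISONS = [
    ("Quarterly revenue (Q3 to Q4 2015)", 1060.0, 958.0),
    ("Quarterly net loss (Q3 to Q4 2015)", 197.0, 103.0),
    ("Annual revenue (FY 2014 to FY 2015)", 5510.0, 3990.0),
    ("Annual net loss (FY 2014 to FY 2015)", 403.0, 660.0),
]


def relative_changes(rows):
    """Return {label: percentage change from the earlier to the later value}."""
    return {
        label: (later - earlier) / earlier * 100.0
        for label, earlier, later in rows
    }


changes = relative_changes(COMPARISONS)
for label, change in changes.items():
    print(f"{label}: {change:+.1f}%")
```

The pattern this makes visible is the one described above: the quarterly loss shrank much faster than quarterly revenue did, while the full-year picture shows revenue falling by more than a quarter and the net loss growing.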
Graphics continues to be solid for AMD as they increased their sales from last quarter, but are down year on year. Holiday sales were brisk, but with only the high end Fury series being a new card during this season, the impact of that particular part was not as great as compared to the company having a new mid-range series like the newly introduced R9 380X. The second half of 2016 will see the introduction of the Polaris based GPUs for both mobile and desktop applications. Until then, AMD will continue to provide the current 28 nm lineup of GPUs to the market. At this point we are under the assumption that AMD and NVIDIA are looking at the same timeframe for introducing their next generation parts due to process technology advances. AMD already has working samples on Samsung’s/GLOBALFOUNDRIES 14nm LPP (low power plus) that they showed off at CES 2016.
Thank you for all you do!
Much of what I am going to say here is repeated from the description on our brand new Patreon support page, but I think a direct line to our readers is in order.
First, I think you may need a little back story. Ask anyone that has been doing online media in this field for any length of time and they will tell you that getting advertisers to sign on and support the production of "free" content has been getting more and more difficult. You'll see this proven out in the transition of several key personalities of our industry away from media into the companies they used to cover. And you'll see it in the absorption of some of our favorite media outlets, being purchased by larger entities with the promise of being able to continue doing what they have been doing. Or maybe you've seen it show as more interstitial ads, road blocks, sponsored site sections, etc.
At PC Perspective we've seen the struggle first hand but I have done my best to keep as much of that influence away from my team. We are not immune - several years ago we started doing site skins, something we didn't plan for initially. I do think I have done a better than average job keeping the lights on here though, so to speak. We have good sell through on our ad inventory and some of the best companies in our industry support the work we do.
Some of the PC Perspective team at CES 2016
Let me be clear though - we aren't on the verge of going out of business. I am not asking for Patreon supporters to keep from firing anyone. We just wanted to maintain and grow our content library and capability and it seemed like the audience that benefits and enjoys that content might be the best place to start.
Some of you are likely asking yourself if supporting PC Perspective is really necessary? After all, you can chug out a 400 word blog in no time! The truth is that high quality, technical content takes a lot of man hours and those hours are expensive. Our problem is that to advertisers, a page view is a page view, they don't really care how much time and effort went into creating the content on that page. If we spend 20 hours developing a way to evaluate variable refresh rate monitors with an oscilloscope, but put the results on a single page at pcper.com, we get the same amount of traffic as someone that just posts an hour's worth of gameplay experiences. Both are valuable to the community, but one costs a lot more to produce.
Frame Rating testing methodology helped move the industry forward
The easy way out is to create click bait style content (have you seen the new Marvel trailer??!?) and hope for enough extra page views to make up for the difference. But many people find the allure of the cheap/easy posts too strong and quickly devolve into press releases and marketing vomit. No one at PC Perspective wants to see that happen here.
Not only do we want to avoid a slide into that fate but we want to improve on what we are doing, going further down the path of technical analysis with high quality writing and video content. Very few people are working on this kind of writing and analysis yet it is vitally important to those of you that want the information to make critical purchasing decisions. And then you, in turn, pass those decisions on to others with less technical interest (brothers, mothers, friends).
We have ideas for new regular shows including a PC Perspective Mailbag, a gaming / Virtual LAN Party show and even an old hardware post-mortem production. All of these take extra time beyond what each person has dedicated today and the additional funding provided by a successful Patreon campaign will help us towards those goals.
I don't want anyone to feel that they are somehow less of a fan of PC Perspective if you can't help - that's not what we are about and not what I stand for. Just being here, reading and commenting on our work means a lot to us. You can still help by spreading the word about stories you find interesting or even doing your regular Amazon.com shopping through our link on the right side bar.
But for those of you that can afford a monthly contribution, consider a "value for value" amount. How much do you think the content we have produced and will produce is worth to you? If that's $3/month, thank you! If that's $20/month, thank you as well!
Support PC Perspective through Patreon
The team and I spent a lot of our time in the last several weeks talking through this Patreon campaign and we are proud to offer ourselves up to our community. PC Perspective is going to be here for a long time, and support from readers like you will help us be sure we can continue to improve and innovate on the information and content we provide.
Again, thank you so much for your support over the last 16 years!
Looking Towards 2016
ARM invited us to a short conversation with them on the prospects of 2016. The initial answer as to how they feel the upcoming year will pan out is, “Interesting”. We covered a variety of topics ranging from VR to process technology. ARM is not announcing any new products at this time, but throughout this year they will continue to push their latest Mali graphics products as well as the Cortex A72.
Trends to Watch in 2016
The one overriding trend that we will see is that of “good phones at every price point”. ARM’s IP scales from very low to very high end mobile SOCs and their partners are taking advantage of the length and breadth of these technologies. High end phones based on custom cores (Apple, Qualcomm) will compete against those licensing the Cortex A72 and A57 parts for their phones. Lower end options that are less expensive and pull less power (which then requires less battery) will flesh out the midrange and budget parts. Unlike several years ago, products from top to bottom are eminently usable and relatively powerful.
Camera improvements will also take center stage for many products and continue to be a selling point and an area of differentiation for competitors. Improved sensors and software will obviously be the areas that the ARM partners focus on, but ARM is putting some work into this area as well. Post processing requires quite a bit of power to do quickly and effectively. ARM is helping here by leveraging both the Neon SIMD engine and the power of the Mali GPU.
4K video is also becoming more and more common on handhelds, and ARM is hoping to leverage that capability for shooting static pictures. A single 4K frame is around 8 megapixels in size. So beyond simply recording video, the handheld can offer a “best shot” type of functionality: the phone captures 4K video and users then choose the best frame available to them in that period of time. This is a simple idea that will be a nice feature for those with a product that can capture 4K video.
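As a quick check of the 8 megapixel figure, here is a minimal sketch. It assumes "4K" means consumer UHD at 3840 x 2160 (the conversation with ARM did not specify a resolution), and the 30 fps capture rate is a hypothetical value used only to show how many stills a short clip would offer.

```python
# Assumption: "4K" means consumer UHD, 3840 x 2160 pixels.
UHD_WIDTH = 3840
UHD_HEIGHT = 2160


def megapixels(width: int, height: int) -> float:
    """Return the pixel count of a single frame in millions of pixels."""
    return width * height / 1_000_000


def candidate_frames(seconds: float, fps: int = 30) -> int:
    """Number of stills to choose from in a clip; 30 fps is an assumed rate."""
    return int(seconds * fps)


print(f"{megapixels(UHD_WIDTH, UHD_HEIGHT):.2f} MP per frame")
print(candidate_frames(2.0), "candidate frames in a 2 second clip")
```

Under those assumptions a two second clip already gives the user dozens of roughly 8 megapixel stills to pick from.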
What you never knew you didn't know
While researching a few upcoming SD / microSD product reviews here at PC Perspective, I quickly found myself swimming in a sea of ratings and specifications. This write up was initially meant to explain and clarify these items, but it quickly grew into a reference too large to include in every SD card article, so I have spun it off here as a standalone reference. We hope it is as useful to you as it will be to our upcoming SD card reviews.
SD card speed ratings are a bit of a mess, so I'm going to do my best to clear things up here. I'll start with classes and grades. These are specs that define the *minimum* speed a given SD card should meet when reading or writing (both directions are used for the test). As with all flash devices, the write speed tends to be the more limiting factor. Without getting into gory detail, the tests used assume mostly sequential large writes and random reads occurring at no smaller than the minimum memory unit of the card (typically 512KB). The tests match the typical use case of an SD card, which is writing larger files (or sequential video streams), with minimal small writes (file table updates, etc).
In the above chart, we see speed 'Class' 2, 4, 6, and 10. The SD card spec calls out very specific requirements for these specs, but the gist of it is that an unfragmented SD card will be able to write at a minimum MB/s corresponding to its rated class (e.g. Class 6 = 6 MB/s minimum transfer speed). The workload specified is meant to represent a typical media device writing to an SD card, with buffering to account for slower FAT table updates (small writes). With higher bus speed modes (more on that later), we also get higher classes. Older cards that are not rated under this spec are referred to as 'Class 0'.
As we move higher than Class 10, we get to U1 and U3, which are referred to as UHS Speed Grades (contrary to the above table which states 'Class') in the SD card specification. The changeover from Class to Grade has something to do with speed modes, which also relates to the standard capacity of the card being used:
U1 and U3 correspond to 10 and 30 MB/s minimums, but the test conditions are slightly different for these specs (so Class 10 is not *exactly* the same as a U1 rating, even though they both equate to 10 MB/sec). Cards not performing to U1 are classified as 'Speed Grade 0'. One final note here is that a U rating also implies a UHS speed mode (see the next section).
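The class and grade minimums above can be collected into a small lookup. This is only a sketch: the function names and the 100 Mbit/s video bitrate are hypothetical, and it ignores the differing test conditions between Class 10 and U1 noted above.

```python
# Minimum sustained speeds in MB/s, taken from the ratings described above.
# 'Class 0' and 'Speed Grade 0' cards carry no guaranteed minimum.
MIN_MB_PER_S = {
    "Class 0": 0,
    "Class 2": 2,
    "Class 4": 4,
    "Class 6": 6,
    "Class 10": 10,
    "Speed Grade 0": 0,
    "U1": 10,   # same number as Class 10, but measured under different conditions
    "U3": 30,
}


def min_write_speed(rating: str) -> int:
    """Return the guaranteed minimum speed in MB/s for a rating label."""
    return MIN_MB_PER_S[rating]


def can_sustain(rating: str, bitrate_mbit_per_s: float) -> bool:
    """Check a rating against a video bitrate given in megabits per second."""
    required_mb_per_s = bitrate_mbit_per_s / 8  # 8 bits per byte
    return min_write_speed(rating) >= required_mb_per_s


# Hypothetical example: a camera recording at 100 Mbit/s needs 12.5 MB/s.
for rating in ("Class 10", "U1", "U3"):
    print(rating, can_sustain(rating, 100))
```

Note that these are guaranteed floors only; a given card may well write faster than its rating.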
New Components, New Approach
After 20 or so enclosure reviews over the past year and a half and some pretty inconsistent test hardware along the way, I decided to adopt a standardized test bench for all reviews going forward. Makes sense, right? Turns out choosing the best components for a cases and cooling test system was a lot more difficult than I expected going in, as special consideration had to be made for everything from form-factor to noise and heat levels.
Along with the new components I will also be changing the approach to future reviews by expanding the scope of CPU cooler testing. After some debate as to the type of CPU cooler to employ I decided that a better test of an enclosure would be to use both closed-loop liquid and air cooling for every review, and provide thermal and noise results for each. For CPU cooler reviews themselves I'll be adding a "real-world" load result to the charts to offer a more realistic scenario, running a standard desktop application (in this case a video encoder) in addition to the torture-test result using Prime95.
But what about this new build? It isn't completely done but here's a quick look at the components I ended up with so far along with the rationale for each selection.
CPU – Intel Core i5-6600K ($249, Amazon.com)
The introduction of Intel’s 6th generation Skylake processors provided the opportunity for an upgrade after using an AMD FX-6300 system for the last couple of enclosure reviews. After toying with the idea of the new i7-6700K, and immediately realizing this was likely overkill and (more importantly) completely unavailable for purchase at the time, I went with the more "reasonable" option with the i5. There has long been a debate as to the need for hyper-threading for gaming (though this may be changing with the introduction of DX12) but in any case this is still a very powerful processor and when stressed should produce a challenging enough thermal load to adequately test both CPU coolers and enclosures going forward.
GPU – XFX Double Dissipation Radeon R9 290X ($347, Amazon.com)
This was by far the most difficult selection. I don’t think of my own use when choosing a card for a test system like this, as it must meet a set of criteria to be a good fit for enclosure benchmarks. If I choose a card that runs very cool and with minimal noise, GPU benchmarks will be far less significant as the card won’t adequately challenge the design and thermal characteristics of the enclosure. There are certainly options that run at greater temperatures and higher noise (a reference R9 290X for example), but I didn’t want a blower-style cooler with the GPU. Why? More and more GPUs are released with some sort of large multi-fan design rather than a blower, and for enclosure testing I want to know how the case handles the extra warm air.
Noise was an important consideration, as levels from an enclosure of course vary based on the installed components. With noise measurements a GPU cooler that has very low output at idle (or zero, as some recent cooler designs permit) will allow system idle levels to fall more on case fans and airflow than a GPU that might drown them out. (This would also allow a better benchmark of CPU cooler noise - particularly with self-contained liquid coolers and audible pump noise.) And while I wanted very quiet performance at idle, at load there must be sufficient noise to measure the performance of the enclosure in this regard, though of course nothing will truly tax a design quite like a loud blower. I hope I've found a good balance here.
To the Max?
Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but they disable it (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.
AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.
So what is it?
Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
Kollock also notes that compute is becoming more important in the graphics pipeline, and it is possible to completely bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will completely bypass it -- maybe even their engine, several years down the road.
But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
And, of course, game developers profile GPUs from time to time...
According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware and so this amount may increase or decrease on the PC. In an informal chat with a developer at Epic Games (so a massive grain of salt is required), his late night, “totally speculative” ballpark guesstimate was that, on the Xbox One, the GPU could theoretically accept a maximum of ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.
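To make the "idle time as a percentage of a frame" idea concrete, here is a minimal sketch. The busy intervals and the 20 ms frame are made-up numbers, not profiler output, and the function names are hypothetical. It also treats the GPU as a single unit that is either busy or idle, which is a simplification.

```python
def busy_time_ms(busy_intervals):
    """Total time covered by (start_ms, end_ms) intervals, merging overlaps."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(busy_intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close out the previous merged run.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current run, so extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def idle_percent(busy_intervals, frame_ms):
    """Share of the frame during which no work was running."""
    return 100.0 * (frame_ms - busy_time_ms(busy_intervals)) / frame_ms


def max_extra_work_percent(idle_pct):
    """Upper bound on extra work, relative to the work already being done."""
    return 100.0 * idle_pct / (100.0 - idle_pct)


# Hypothetical 20 ms frame with three bursts of GPU work.
intervals = [(0.0, 6.0), (5.0, 10.0), (12.0, 18.0)]
idle = idle_percent(intervals, frame_ms=20.0)
print(f"idle: {idle:.1f}% of the frame")
print(f"theoretical extra work: {max_extra_work_percent(idle):.1f}%")
```

The second function is the optimistic ceiling: it assumes every idle millisecond could be filled perfectly, which limits such as the memory bandwidth mentioned above would prevent in practice.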
This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.
It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient ingestion of shader code. As always, painfully always, time will tell.
It's Basically a Function Call for GPUs
Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, the way graphics cards are loaded with tasks changes drastically in these new APIs. With DirectX 10 and earlier, an application would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.
While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.
Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.
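As a toy illustration of that last point, the sketch below lets several threads each record their own list of commands (draws, compute work, memory copies) and hand it over in one piece. Everything here is hypothetical: `CommandList`, `record_work`, and the command names are stand-ins for the idea and are not the Mantle, Vulkan, or DirectX 12 API.

```python
import queue
import threading


class CommandList:
    """Hypothetical stand-in for a command buffer owned by one thread."""

    def __init__(self, owner):
        self.owner = owner
        self.commands = []

    def record(self, kind, payload):
        # Recording only touches this thread's list, so no global state is shared.
        self.commands.append((kind, payload))


def record_work(owner, kind, count, submit_queue):
    """Build a command list on the calling thread, then submit it whole."""
    command_list = CommandList(owner)
    for index in range(count):
        command_list.record(kind, index)
    submit_queue.put(command_list)


submit_queue = queue.Queue()
workers = [
    threading.Thread(target=record_work, args=("render", "draw", 3, submit_queue)),
    threading.Thread(target=record_work, args=("physics", "compute", 2, submit_queue)),
    threading.Thread(target=record_work, args=("streaming", "copy", 1, submit_queue)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

# Drain the queue; arrival order depends on thread scheduling.
submitted = []
while not submit_queue.empty():
    submitted.append(submit_queue.get())

for command_list in submitted:
    print(command_list.owner, [kind for kind, _ in command_list.commands])
```

The contrast with the older model is that no thread has to be "the authority" over one global state; each thread's work is self-contained until the moment it is submitted.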
The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:
- AMD and NVIDIA are only a two-digit percentage apart in draw call performance
- Both AMD and NVIDIA saw an order of magnitude increase in draw calls
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be more clear in my disclaimer. This is an unconfirmed, relatively easy to fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images, but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K. That suggests that the leaker got the images from somewhere, but didn't notice those details, which implies that the original source was hoaxed by an anonymous source, who only seeded the hoax to a single media outlet, or that it was an actual leak.
Either way, enjoy my analysis but realize that this is a single, unconfirmed source who allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
- How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
- What type of user will be given Iris Pro? Also, will graphics-free options be available in the sub-Enthusiast class? When buying a processor from Intel, the high-end mainstream processors tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics -- the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step into changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high clock, relatively low IPC design that Netburst was built around, which instantaneously changed the processor landscape from a dominant AMD to an Intel runaway lead.
After 65nm and Core 2 started the cycle, each new generation alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
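The cadence is easy to lay out as data. In the sketch below, the code names between Core 2 and Skylake are filled in from Intel's public release history rather than from the leak, and the final entry is the unconfirmed Kaby Lake rumor. The helper simply flags any step that repeats the previous step's type.

```python
# (name, process node in nm, step type). Kaby Lake is the unconfirmed rumor.
CADENCE = [
    ("Pentium 4 shrink", 65, "tick"),
    ("Core 2 (Conroe/Merom)", 65, "tock"),
    ("Penryn", 45, "tick"),
    ("Nehalem", 45, "tock"),
    ("Westmere", 32, "tick"),
    ("Sandy Bridge", 32, "tock"),
    ("Ivy Bridge", 22, "tick"),
    ("Haswell", 22, "tock"),
    ("Broadwell", 14, "tick"),
    ("Skylake", 14, "tock"),
    ("Kaby Lake (rumored)", 14, "tock"),
]


def cadence_breaks(cadence):
    """Return the names of steps that repeat the previous step's type."""
    breaks = []
    for previous, current in zip(cadence, cadence[1:]):
        if previous[2] == current[2]:
            breaks.append(current[0])
    return breaks


def designs_per_node(cadence):
    """Count how many new designs (tocks) landed on each process node."""
    counts = {}
    for _, node, step in cadence:
        if step == "tock":
            counts[node] = counts.get(node, 0) + 1
    return counts


print(cadence_breaks(CADENCE))
print(designs_per_node(CADENCE))
```

If the leak is accurate, 14nm would be the first node in this list to host two new designs instead of one.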
Digging in a Little Deeper into the DiRT
Over the past few weeks I have had the chance to play the early access "DiRT Rally" title from Codemasters. This is a much more simulation based title that is currently PC only, which is a big switch for Codemasters and how they usually release their premier racing offerings. I was able to get a hold of Paul Coleman from Codemasters and set up a written interview with him. Paul's answers will be in italics.
Who are you, what do you do at Codemasters, and what do you do in your spare time away from the virtual wheel?
Hi my name is Paul Coleman and I am the Chief Games Designer on DiRT Rally. I’m responsible for making sure that the game is the most authentic representation of the sport it can be, I’m essentially representing the player in the studio. In my spare time I enjoy going on road trips with my family in our 1M Coupe. I’ve been co-driving in real world rally events for the last three years and I’ve used that experience to write and voice the co-driver calls in game.
If there is one area where DiRT has really excelled, it is keeping frame rate consistent throughout multiple environments. Many games, especially those using cutting edge rendering techniques, often have dramatic frame rate drops at times. How do you get around this while still creating a very impressive looking game?
The engine that DiRT Rally has been built on has been constantly iterated on over the years and we have always been looking at ways of improving the look of the game while maintaining decent performance. That together with the fact that we work closely with GPU manufacturers on each project ensures that we stay current. We also have very strict performance monitoring systems that have come from optimising games for console. These systems have proved very useful when building DiRT Rally even though the game is exclusively on PC.
How do you balance out different controller use cases? While many hard core racers use a wheel, I have seen very competitive racing from people using handheld controllers as well as keyboards. Do you handicap/help those particular implementations so as not to make it overly frustrating to those users? I ask due to the difference in degrees of precision that a gamepad has vs. a wheel that can rotate 900 degrees.
Again this comes back to the fact that we have traditionally developed for console where the primary input device is a handheld controller. This is an area that other sims don’t usually have to worry about but for us it was second nature. There are systems that we have that add a layer between the handheld controller or keyboard and the game which help those guys but the wheel is without a doubt the best way to experience DiRT Rally as it is a direct input.
Process Technology Overview
We have been very spoiled throughout the years. We likely did not realize exactly how spoiled we were until it became very obvious that the rate of process technology advances hit a virtual brick wall. Every 18 to 24 months a new, faster, more efficient process node was opened up to fabless semiconductor firms and we were treated to a new generation of products that would blow our hair back. Now we have been at a virtual standstill when it comes to new process nodes from the pure-play foundries.
Few expected the 28 nm node to live nearly as long as it has. Some of the first cracks in the façade actually came from Intel. Their 22 nm Tri-Gate (FinFET) process took a little bit longer to get off the ground than expected. We also noticed some interesting electrical features from the products developed on that process. Intel skewed away from higher clockspeeds and focused on efficiency and architectural improvements rather than staying at generally acceptable TDPs and leapfrogging the competition by clockspeed alone. Overclockers noticed that the newer parts did not reach the same clockspeed heights as previous products such as the 32 nm based Sandy Bridge processors. Whether this decision was intentional from Intel or not is debatable, but my gut feeling here is that they responded to the technical limitations of their 22 nm process. Yields and bins likely dictated the max clockspeeds attained on these new products. So instead of vaulting over AMD’s products, they just slowly started walking away from them.
Samsung is one of the first pure-play foundries to offer a working sub-20 nm FinFET product line. (Photo courtesy of ExtremeTech)
When 28 nm was released the plans on the books were to transition to 20 nm products based on planar transistors, thereby bypassing the added expense of developing FinFETs. It was widely expected that FinFETs were not necessarily required to address the needs of the market. Sadly, that did not turn out to be the case. There are many other factors as to why 20 nm planar parts are not common, but the limitations of that particular process node have made it a relatively niche node that is appropriate for smaller, low power ASICs (like the latest Apple SOCs). The Apple A8 is rumored to be around 90 mm sq., which is a far cry from the traditional midrange GPU that goes from 250 mm sq. to 400+ mm sq.
The essential difficulty of the 20 nm planar node appears to be a lack of power scaling to match the increased transistor density. TSMC and others have successfully packed more transistors into every square mm as compared to 28 nm, but the electrical characteristics did not scale proportionally. Yes, there are improvements there per transistor, but when designers pack all those transistors into a large design, TDP and voltage issues start to arise. As TDP increases, it takes more power to drive the processor, which then leads to more heat. The GPU guys probably looked at this and figured out that while they can achieve a higher transistor density and a wider design, they will have to downclock the entire GPU to hit reasonable TDP levels. When adding these concerns to yields and bins for the new process, the advantages of going to 20 nm would be slim to none at the end of the day.
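The downclocking argument follows from the usual first-order rule that dynamic power scales with switched capacitance, voltage squared, and clock frequency. The sketch below applies that rule with purely hypothetical scaling factors. None of the numbers are measured 20 nm or 28 nm data, and leakage power is ignored.

```python
def relative_power(transistor_scale, capacitance_scale, voltage_scale, clock_scale):
    """First-order dynamic power relative to the old design: N * C * V^2 * f."""
    return transistor_scale * capacitance_scale * voltage_scale ** 2 * clock_scale


def clock_for_power_budget(transistor_scale, capacitance_scale, voltage_scale):
    """Clock scale that keeps relative power at 1.0 (the same TDP as before)."""
    return 1.0 / relative_power(transistor_scale, capacitance_scale, voltage_scale, 1.0)


# Hypothetical shrink: 1.9x the transistors, each switching 0.8x the
# capacitance, at 0.95x the voltage. These are illustrative values only.
power = relative_power(1.9, 0.8, 0.95, 1.0)
clock = clock_for_power_budget(1.9, 0.8, 0.95)
print(f"power at the same clock: {power:.2f}x")
print(f"clock needed to hold TDP: {clock:.2f}x")
```

With factors like these, a wider chip at the old clock blows through its power budget, and the clock has to drop substantially to get back under it. That is the trade the GPU designers would have been weighing.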