Gunning for Broadwell-E
As I walked away from the St. Regis in downtown San Francisco tonight, I found myself wandering through the streets toward my hotel with something unique in tow: a smile. I was smiling, thinking about what AMD had just demonstrated at its latest Zen processor reveal. The importance of this product launch cannot be overstated for a company struggling to find a foothold in a market it once definitively led. It’s been many years since I left a conference call, or a meeting, or a press conference feeling genuinely hopeful and enthusiastic about what AMD had shown me. Tonight I did.
AMD CEO Lisa Su and CTO Mark Papermaster took the stage down the street from the Intel Developer Forum to roll out a handful of new architectural details about Zen while also showing the first performance results comparing it to competing parts from Intel. The crowd in attendance, a mix of media and analysts, was impressed. The feeling in the room was palpable.
It’s late as I write this, and while there are some interesting architecture details to discuss, I think it is in everyone’s best interest that we touch on them lightly for now and save the deep dive for when the Hot Chips information comes out early next week. What you really want to know is clear: can Zen make Intel work for it again? Can Zen make that $1700 price tag on the Broadwell-E 6950X seem even more ludicrous? Yes.
The Zen Architecture
Much of what was discussed about the Zen architecture is a re-release of what has come out in recent months. This is a completely new, ground-up microarchitecture, not a revamp of the aging Bulldozer design. It integrates SMT (simultaneous multi-threading), a first for an AMD CPU, to take more efficient advantage of a longer pipeline; Intel has had HyperThreading for a long time now, and AMD is finally joining the fold. A high-bandwidth, low-latency caching system is used to “feed the beast,” as Papermaster put it, and the 14nm process technology (starting at GlobalFoundries) gives efficiency and scaling a significant bump while enabling AMD to span notebooks, desktops, and servers with the same architecture.
By far the most impressive claim from AMD thus far was that of a 40% increase in IPC over previous AMD designs. That’s a HUGE claim and is key to the success or failure of Zen. AMD proved to me today that the claims are real and that we will see the immediate impact of that architecture bump from day one.
The press was told of a handful of high-level changes to the new architecture as well. Branch prediction gets a complete overhaul, and this marks the first AMD processor to have a micro-op cache. Wider execution width and broader instruction schedulers are integrated, all of which adds up to much higher instruction-level parallelism to improve single-threaded performance.
Performance improvements aside, throughput and efficiency go up with Zen as well. AMD has integrated an 8MB L3 cache and improved prefetching for up to 5x the cache bandwidth available per core. SMT keeps the pipeline full to prevent the “bubbles” that introduce latency and lower efficiency, while region-specific power gating means we’ll see Zen in notebooks as well as enterprise servers in 2017. It truly is an impressive design from AMD.
Summit Ridge, the enthusiast platform that will be the first product available with Zen, is based on the AM4 platform and processors will go up to 8-cores and 16-threads. DDR4 memory support is included, PCI Express 3.0 and what AMD calls “next-gen” IO – I would expect a quick leap forward for AMD to catch up on things like NVMe and Thunderbolt.
The Real Deal – Zen Performance
As part of today’s reveal, AMD is showing the first true comparison between Zen and Intel processors. Sure, AMD showed a Zen-powered system running the upcoming Deus Ex at 4K, paired with a Fury X, but the really impressive results were shown when comparing Zen to a Broadwell-E platform.
Using Blender to measure the performance of a rendering workload (the scene was a Zen CPU mockup, of course), AMD ran an 8-core / 16-thread Zen processor at 3.0 GHz against an 8-core / 16-thread Broadwell-E processor at 3.0 GHz (likely a fixed-clock Core i7-6900K). The point of the demonstration was to showcase the IPC improvements of Zen, and it worked: the render completed on the Zen platform a second or two faster than it did on the Intel Broadwell-E system.
Not much to look at, but Zen on the left, Broadwell-E on the right...
Of course there are lots of caveats: we didn’t set up the systems, I don’t know for sure that GPUs weren’t involved, and we don’t know the final clocks of the Zen processors releasing in early 2017. But I took two things away from the demonstration that are very important.
- The IPC of Zen is on-par or better than Broadwell.
- Zen will scale higher than 3.0 GHz in 8-core configurations.
AMD obviously didn’t state what specific SKUs were going to launch with the Zen architecture, what clock speeds they would run at, or even what TDPs they were targeting. Instead we were left with a vague but understandable remark of “comparable TDPs to Broadwell-E”.
Pricing? Overclocking? We’ll just have to wait a bit longer for that kind of information.
There is clearly a lot more for AMD to share about Zen, but the announcement and showcase made this week with the early prototype products have solidified for me the capability and promise of this new microarchitecture. As an industry, we have asked for, and needed, a competitor to Intel in the enthusiast CPU space, something we haven’t legitimately had since the Athlon X2 days. Zen is what we have been pining for, what gamers and consumers have needed.
AMD’s processor stars might finally be aligning for a product that combines performance, efficiency, and scalability at the right time. I’m ready for it. Are you?
It always feels a little odd covering NVIDIA’s quarterly earnings due to how they present their financial calendar. No, we are not reporting from the future; yes, it can be confusing when comparing results and getting your dates mixed up. Dates aside, NVIDIA did exceptionally well in a quarter that is typically the second weakest after Q1.
NVIDIA reported revenue of $1.43 billion, a jump from an already strong Q1 in which they took in $1.30 billion. Compare this to the $1.027 billion of competitor AMD, which sells CPUs as well as GPUs. NVIDIA sold a lot of GPUs as well as other products, with their primary money makers being consumer GPUs and the professional and compute markets, on which they have a virtual stranglehold at the moment. The company’s GAAP net income is a very respectable $253 million.
The release of the latest Pascal-based GPUs was the primary mover of the gains this latest quarter. AMD has had a hard time competing with NVIDIA for market share. The older Maxwell-based chips performed well against the entire line of AMD offerings, and typically did so with better power and heat characteristics. Even though the GTX 970 had a somewhat unusual memory configuration compared to the AMD products (3.5 GB + 0.5 GB vs. a full 4 GB implementation), it was a top seller in its class. The same could be said for the products up and down the stack.
Pascal was released at the end of May, but the company had been shipping chips to its partners as well as building the “Founders Edition” models to its exacting specifications. These were strong sellers from the end of May until the end of the quarter. NVIDIA recently unveiled their latest Pascal-based Quadro cards, but we do not know how much of an impact those had on this quarter. NVIDIA has also been shipping, in very limited quantities, Tesla P100 based units to select customers and outfits.
Is Enterprise Ascending Outside of Consumer Viability?
So a couple of weeks have gone by since the Quadro P6000 was announced and the new Titan X launched. With them, we received a new chip: GP102. Since Fermi, NVIDIA has labeled their GPU designs with a G, followed by a single letter for the architecture (F, K, M, or P for Fermi, Kepler, Maxwell, and Pascal, respectively), which is then followed by a three-digit number. The last digit is the most relevant one, as it separates designs by their intended size.
Typically, 0 corresponds to a ~550-600mm2 design, which is about as large a design as fabrication plants can create without error-prone techniques like multiple exposures (update for clarity: trying to precisely overlap multiple designs to form a larger integrated circuit). 4 corresponds to ~300mm2, although GM204 was pretty large at 398mm2, likely to increase the core count while remaining on the 28nm process. Higher numbers, like 6 or 7, fill out the lower-end SKUs until NVIDIA essentially stops caring for that generation. So when we moved to Pascal, jumping two whole process nodes, NVIDIA looked at their wristwatches and said “about time to make another 300mm2 part, I guess?”
The GTX 1080 and the GTX 1070 (GP104, 314mm2) were born.
NVIDIA already announced a 600mm2 part, though. The GP100 had 3840 CUDA cores, HBM2 memory, and an ideal ratio of 1:2:4 between FP64:FP32:FP16 performance. (A 64-bit chunk of memory can store one 64-bit value, two 32-bit values, or four 16-bit values, unless the register is attached to logic circuits that, while smaller, don't know how to operate on the data.) This increased ratio, even over Kepler's 1:6 FP64:FP32, is great for GPU compute, but wasted die area for today's (and tomorrow's) games. I'm predicting that it takes the wind out of Intel's sales, as Xeon Phi's 1:2 FP64:FP32 performance ratio is one of its major selling points, leading to its inclusion in many supercomputers.
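That 1:2:4 ratio falls directly out of how a fixed-width register divides: the same 64 bits can be viewed as one double, two singles, or four halves. A quick illustrative sketch in Python (the standard `struct` module supports half-precision via the `'e'` format code):

```python
import struct

# Eight bytes can be viewed as one FP64 value, two FP32 values,
# or four FP16 values -- the same trade-off behind the 1:2:4 ratio.
raw = struct.pack('<d', 1.0)          # one 64-bit double
as_fp32 = struct.unpack('<2f', raw)   # the same bytes as two 32-bit floats
as_fp16 = struct.unpack('<4e', raw)   # ...or as four 16-bit halves
print(len(raw), as_fp32, as_fp16)
```

Whether the hardware can actually *operate* on the narrower lanes depends on the logic attached to the register file, which is exactly the die-area cost discussed above.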
Despite the HBM2 memory controller reportedly being smaller than a GDDR5(X) controller, NVIDIA could still save die space while providing 3840 CUDA cores (a few of which are disabled on Titan X). The trade-off is that FP64 and FP16 performance had to decrease dramatically, from 1:2 and 2:1 relative to FP32 all the way down to 1:32 and 1:64. This new design comes in at 471mm2, although it launched $200 more expensive than the 600mm2 products, GK110 and GM200, did. Smaller dies provide more products per wafer and, better still, since the number of defects per wafer is roughly constant, a larger share of those dies will be good.
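The wafer-economics argument can be made concrete with a toy model. This is only a rough sketch under assumed numbers (a 300mm wafer, a made-up defect density of 0.1 defects/cm², a simple Poisson yield model, and no accounting for edge loss or scribe lines), not NVIDIA's actual economics:

```python
import math

def dies_per_wafer(die_mm2, wafer_diameter_mm=300):
    # Rough upper bound: usable wafer area divided by die area.
    return int(math.pi * (wafer_diameter_mm / 2) ** 2 / die_mm2)

def yield_fraction(die_mm2, defects_per_cm2=0.1):
    # Poisson yield model: probability a die catches zero defects.
    return math.exp(-defects_per_cm2 * die_mm2 / 100)

for die in (471, 600):
    good = dies_per_wafer(die) * yield_fraction(die)
    print(f"{die} mm^2: ~{dies_per_wafer(die)} dies/wafer, ~{good:.0f} good")
```

Even in this crude model the smaller die wins twice: more candidates per wafer, and a larger fraction of them survive the constant defect rate.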
Anyway, that aside, it puts NVIDIA in an interesting position. Splitting the xx0-class chip into xx0 and xx2 designs allows NVIDIA to lower the cost of their high-end gaming parts, although it cuts out hobbyists who buy a Titan for double-precision compute. More interestingly, it leaves around 150mm2 for AMD to sneak in a design that's FP32-centric, leaving them a potential performance crown.
Image Credit: ExtremeTech
On the other hand, as fabrication node changes are becoming less frequent, it's possible that NVIDIA could be leaving itself room for Volta, too. Last month, it was rumored that NVIDIA would release two architectures at 16nm, in the same way that Maxwell shared 28nm with Kepler. In this case, Volta, on top of whatever other architectural advancements NVIDIA rolls into that design, can also grow a little in size. At that time, TSMC would have better yields, making a 600mm2 design less costly in terms of waste and recovery.
If this is the case, we could see the GPGPU folks receiving a new architecture once every second gaming (and professional graphics) architecture. That is, unless you are a hobbyist. If you are? I would need to be wrong, or NVIDIA would need to somehow bring their enterprise SKU into an affordable price point. The xx0 class seems to have been pushed up and out of viability for consumers.
Or, again, I could just be wrong.
A Watershed Moment in Mobile
This previous May I was invited to Austin to be briefed on the latest core innovations from ARM and their partners. We were introduced to new CPU and GPU cores, as well as the surrounding technologies that provide the basis of a modern SOC in the ARM family. We also were treated to more information about the process technologies that ARM would embrace with their Artisan and POP programs. ARM is certainly far more aggressive now in their designs and partnerships than they have been in the past, or at least they are more willing to openly talk about them to the press.
The big process news that ARM was able to share at this time was the design of 10nm parts using an upcoming TSMC process node. This was fairly big news as TSMC was still introducing parts on their latest 16nm FF+ line. NVIDIA had not even released their first 16FF+ parts to the world in early May. Apple had dual sourced their 14/16 nm parts from Samsung and TSMC respectively, but these were based on LPE and FF lines (early nodes not yet optimized to LPP/FF+). So the news that TSMC would have a working 10nm process in 2017 was important to many people. 2016 might be a year with some good performance and efficiency jumps, but it seems that 2017 would provide another big leap forward after years of seeming stagnation of pure play foundry technology at 28nm.
Yesterday we received a new announcement from ARM that shows an amazing shift in thought and industry inertia. ARM is partnering with Intel to introduce select products on Intel’s upcoming 10nm foundry process. This news is both surprising and expected. It is surprising in that it happened as quickly as it did. It is expected as Intel is facing a very different world than it had planned for 10 years ago. We could argue that it is much different than they planned for 5 years ago.
Intel is the undisputed leader in process technologies and foundry practices. They are the gold standard of developing new, cutting edge process nodes and implementing them on a vast scale. This has served them well through the years as they could provide product to their customers seemingly on demand. It also allowed them a leg up in technology when their designs may not have fit what the industry wanted or needed (Pentium 4, etc.). It also allowed them to potentially compete in the mobile market with designs that were not entirely suited for ultra-low power. x86 is a modern processor technology with decades of development behind it, but that development focused mainly on performance at higher TDP ranges.
This past year Intel signaled their intent to move out of the sub-5-watt market and cede it to ARM and their partners. Intel’s ultra-mobile offerings just did not make an impact in an area where they were expected to. For all of Intel’s advances in process technology, the base ARM architecture is simply better suited to these power envelopes. Instead of throwing good money after bad (in the form of development time, wafer starts, and rebates), Intel has stepped away from this market.
This leaves Intel with a problem: what to do with extra production capacity? Running a fab is a very expensive endeavor, and if these megafabs are not producing chips 24/7, the company is losing money. This past year Intel has seen its fair share of layoffs and has slowed the production and conversion of fabs. The money spent on developing new, cutting-edge process technologies cannot stop if the company wants to keep its dominant position in the CPU industry. Some years back Intel opened up its process products to select third-party companies to help fill the gaps in production, but right now it has far more production line space than current market demand requires. Yes, there were delays in the latest Skylake-based processors, but those were solved and Intel is full steam ahead. Unfortunately, the fabs do not seem to be utilized at the level needed or desired. The only real option seems to be opening up some fab space to more potential customers in a market that Intel is no longer competing in directly.
The Intel Custom Foundry Group is working with ARM to provide access to their 10nm HPM process node. Initial production of these latest generation designs will commence in Q1 2017 with full scale production in Q4 2017. We do not have exact information as to what cores will be used, but we can imagine that they will be Cortex-A73 and A53 parts in big.LITTLE designs. Mali graphics will probably be the first to be offered on this advanced node as well due to the Artisan/POP program. Initial customers have not been disclosed and we likely will not hear about them until early 2017.
This is a big step for Intel. It is also a logical progression for them when we look over the changing market conditions of the past few years. They were unable to adequately compete in the handheld/mobile market with their x86 designs, but they still wanted to profit off of this ever expanding area. The logical way to monetize this market is to make the chips for those that are successfully competing here. This will cut into Intel’s margins, but it should increase their overall revenue base if they are successful here. There is no reason to believe that they won’t be.
The last question we have is if the 10nm HPM node will be identical to what Intel will use for their next generation “Cannonlake” products. My best guess is that the foundry process will be slightly different and will not provide some of the “secret sauce” that Intel will keep for themselves. It will probably be a mobile focused process node that stresses efficiency rather than transistor switching speed. I could be very wrong here, but I don’t believe that Intel will open up their process to everyone that comes to them hat in hand (AMD).
The partnership between ARM and Intel is a very interesting one that will benefit customers around the globe if it is handled correctly from both sides. Intel has a “not invented here” culture that has both benefited it and caused it much grief. Perhaps some flexibility on the foundry side will reap benefits of its own when dealing with very different designs than Intel is used to. This is a titanic move from where Intel probably thought it would be when it first started to pursue the ultra-mobile market, but it is a move that shows the giant can still positively react to industry trends.
Take your Pascal on the go
Easily the strongest growth segment in PC hardware today is in the adoption of gaming notebooks. Ask companies like MSI and ASUS, even Gigabyte, as they now make more models and sell more units of notebooks with a dedicated GPU than ever before. Both AMD and NVIDIA agree on this point and it’s something that AMD was adamant in discussing during the launch of the Polaris architecture.
Both AMD and NVIDIA predict massive annual growth in this market – somewhere on the order of 25-30%. For an overall culture that continues to believe the PC is dying, seeing projected growth this strong in any segment is not only amazing, but welcome to those of us that depend on it. AMD and NVIDIA have different goals here: GeForce products already have 90-95% market share in discrete gaming notebooks. In order for NVIDIA to see growth in sales, the total market needs to grow. For AMD, simply taking back a portion of those users and design wins would help its bottom line.
But despite AMD’s early talk about getting Polaris 10 and 11 in mobile platforms, it’s NVIDIA again striking first. Gaming notebooks with Pascal GPUs in them will be available today, from nearly every system vendor you would consider buying from: ASUS, MSI, Gigabyte, Alienware, Razer, etc. NVIDIA claims to have quicker adoption of this product family in notebooks than in any previous generation. That’s great news for NVIDIA, but might leave AMD looking in from the outside yet again.
Technologically speaking, though, this makes sense. Despite the improvements Polaris made on the GCN architecture, Pascal is still more powerful and more power efficient than anything AMD has been able to produce. Looking solely at performance per watt, which is really the defining trait of mobile designs, Pascal is as dominant over Polaris as Maxwell was over Fiji. And this time around NVIDIA isn’t offering cut-down parts under different branding: GeForce is diving directly into gaming notebooks in a way we have only seen with one prior release.
The ASUS G752VS OC Edition with GTX 1070
Do you remember our initial look at the mobile variant of the GeForce GTX 980? Not the GTX 980M, mind you, but the full GM204 operating in notebooks. That was basically a dry run for what we see today: NVIDIA will be releasing the GeForce GTX 1080, GTX 1070, and GTX 1060 to notebooks.
Even before the term "Internet of Things" was coined, Steve Gibson proposed home networking topology changes designed to deal with this new looming security threat. Unfortunately, little or no thought is given to the security aspects of the devices in this rapidly growing market.
One of Steve's proposed network topology adjustments involved daisy-chaining two routers together: the WAN port of an IoT-purposed router would be attached to a LAN port of the border/root router.
In this arrangement, only IoT/smart devices are connected to the internal (IoT-purposed) router. The idea was to isolate insecure or poorly implemented devices from the more valuable personal data devices, such as a NAS holding important files and backups. Unfortunately, this clever arrangement leaves any device directly connected to the "border" router open to attack by infected devices running on the internal/IoT router: such a device could perform a simple traceroute and identify that an intermediate network exists between it and the public Internet, and any device running under the border router with known (or worse, unknown!) vulnerabilities could then be exploited.
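The traceroute observation is easy to demonstrate. As an illustrative sketch only (the addresses below are hypothetical, and a real device would run an actual traceroute rather than use a canned hop list), a device simply has to count RFC 1918 addresses among its hops:

```python
import ipaddress

def private_hops(hop_addresses):
    # An infected IoT device could traceroute toward a public address;
    # more than one RFC 1918 hop reveals an intermediate (double-NAT)
    # network like the border/IoT arrangement described above.
    return [h for h in hop_addresses
            if ipaddress.ip_address(h).is_private]

# Hypothetical traceroute result from a device on the inner router:
hops = ["192.168.2.1", "192.168.1.1", "8.8.8.8"]
if len(private_hops(hops)) > 1:
    print("double NAT detected: a border network sits upstream")
```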
Gibson's alternative formula reversed the positions of the IoT and border routers. Unfortunately, this solution also came with a nasty side effect: the border router (now used as the "secure" or internal router) became subject to all manner of man-in-the-middle attacks. Since a local Ethernet network basically trusts all traffic within its domain, an infected device on the IoT router (now sitting between the internal router and the public Internet) can manipulate or eavesdrop on any traffic emerging from the internal router. The potential consequences of this flaw are obvious.
The third time really is the charm for Steve! On February 2nd of this year (Episode #545 of Security Now!) Gibson presented us with his third (and hopefully final) foray into the magical land of theory-crafting as it related to securing our home networks against the Internet of Things.
Introduction, Features and Specifications
Cooler Master has a long-standing and well-respected reputation for delivering cases, power supplies, cooling products, and peripherals to the PC enthusiast market. They recently added the MasterBox 5 Series to their formidable case lineup, which includes three models: black with a side window, white with a side window, and black without a side window. The front bezel is also available with or without support for up to two 5.25” external drive bays. The MasterBox 5 fits into Cooler Master’s mid-tower case lineup, which includes nine other product lines and over seventy mid-tower cases in various sizes and colors!
Cooler Master MasterBox 5 Mid-Tower Case
The MasterBox 5 Series incorporates a straightforward design with numerous cutouts in the motherboard tray to allow for easy cable routing and flexible drive mounting. The case can accommodate larger, high-end components like tall CPU coolers, extended-length graphics cards, and/or liquid cooling systems. The MasterBox 5 can mount four different sizes of motherboards, ranging from Mini-ITX to Extended-ATX, and comes with two very quiet 120mm cooling fans preinstalled. Our review sample included a basic configuration with two internal 3.5” HDD bays and one SSD bracket, but no 5.25” external drive bays.
Cooler Master MasterBox 5 Mid-Tower Case Key Features:
• Mid-Tower ATX enclosure (LxWxH, 500x220x475mm, 19.7x8.7x18.7”)
• Flexible mounting options for SSDs and HDDs
• Supports E-ATX, ATX, Micro-ATX and Mini-ITX motherboards
• Easily removed dust filters on front and bottom panels
• Two included case fans (120mm intake and 120mm exhaust)
• Large acrylic side window
• Included shroud covers PSU and cabling for a clean look
• (2) USB 3.0, mic and headphone jacks on the top I/O panel
• Two internal 3.5” HDD / 2.5” SSD trays and one SSD bracket
• Up to 410mm (16.1”) of clearance for long graphics cards
• Up to 167mm (6.5”) of space for tall CPU coolers
• Price: $79.99 USD
Introduction and First Impressions
The Le Grand Macho RT is a massive air CPU cooler from Thermalright that pairs a very large heatsink (with 7 heat pipes) with a quiet 140 mm fan. It certainly looks impressive, but you'll want to read on to find out how it performed on our test bench!
"With the Le Grand Macho RT we offer an actively cooled version of our famous semi-passive flagship. Thanks to the silent-running TY 147 B with fluid dynamic bearing, the Le Grand Macho RT can cool up to 280 watt.
The design of the heat sink has not been changed and is still asymmetrical. This offers the highest possible compatibility to the most recent motherboards. Thus it is guaranteed that the Le Grand Macho RT neither blocks the RAM spaces, nor the top-most PCIe slot on current ATX-boards."
While the Le Grand Macho RT is one of the largest coolers I've tested, it is still a little smaller than Thermalright's famous Silver Arrow dual-tower cooler. In fact, its 159 mm height means it will fit in a large number of enclosures (with 165 mm being a common limit).
The single-fan design of the Macho makes it look like a good candidate for low-noise air cooling, and it's physically larger than the Scythe Ninja 4 cooler I reviewed back in January - which was, incidentally, the quietest cooler I've tested to date.
Why install this giant on a mini-ITX board? Why not!
Introduction and Technical Specifications
Courtesy of ASUS
The Maximus VIII Extreme is the flagship of the ROG (Republic of Gamers) board line supporting the Intel Z170 chipset. The board features the standard black and red ROG aesthetics in an E-ATX form factor to best accommodate the slew of features and supported options offered with this board. ASUS integrated black chrome heat sinks with integrated LEDs for a unique and customizable look. The Z170 chipset brings support for the latest Intel LGA1151 Skylake processor line as well as dual-channel DDR4 memory. The Maximus VIII Extreme may not be very approachable for most users at its $499 MSRP, but the price remains justified given the sheer number of features integrated into the board and the included accessories.
Courtesy of ASUS
Courtesy of ASUS
Courtesy of ASUS
ASUS integrated the following features into the Maximus VIII Extreme board: four SATA 3 ports; two SATA-Express ports; one U.2 32Gbps port; one M.2 PCIe x4 capable port; an Intel I219-V Gigabit NIC; a 3x3 802.11ac Wi-Fi adapter; four PCI-Express x16 slots; two PCI-Express x1 slots; on-board power, reset, MemOK!, Safe Boot, ReTry, DirectKey, Multi-GPU, BIOS Switch, Clear CMOS, and USB BIOS Flashback buttons; Slow Mode and PCIe Lane switches; an LN2 Mode jumper; ProbeIt voltage measurement points; a 2-digit Q-Code LED diagnostic display; the ROG SupremeFX 2015 8-channel audio subsystem; integrated DisplayPort and HDMI video ports; and USB 3.0 and 3.1 Type-A and Type-C port support. ASUS also included their Fan Extension controller card, OC Panel II device, and accessory pack with the Maximus VIII Extreme board.
Courtesy of ASUS
ASUS included their OC Panel II device with the Maximus VIII Extreme motherboard. The device can be used as an external panel or case-mounted using the included 5.25" device bay. The OC Panel II gives users access to a variety of OC settings as well as board bus speed, temperature, and fan speed monitoring. Additionally, the panel provides extra fan and temperature headers, along with headers used for advanced overclocking endeavors.
Introduction and Specifications
Barracuda is a name we have not heard from Seagate in a good while. Last seen on their 3TB desktop drive, it appears Seagate thought it was time for a comeback. The company is revamping their product lines while launching a full round of 10TB helium-filled offerings that cover just about anything you might need:
Starting from the center, IronWolf is their NAS drive, optimized for arrays as large as 8 disks. To the right is their surveillance drive offering, the SkyHawk. These are essentially NAS units with custom firmware optimized for multiple stream recording. Not mentioned above is the FireCuda, which is a rebrand of their Desktop SSHD. Those are not He-filled (yet) as their max capacity is not high enough to warrant it. We will be looking at those first two models in future pieces, but the subject of today’s review is the BarraCuda line. The base 3.5” BarraCuda line only goes to 4TB, but the BarraCuda Pro expands upon those capacities, including 6TB, 8TB, and 10TB models. The subject of today’s review is the 10TB BarraCuda Pro.
A Beautiful Graphics Card
As a surprise to nearly everyone, on July 21st NVIDIA announced the new Titan X graphics card, based on the brand new GP102 Pascal GPU. Though it shares a name, for some unexplained reason, with the Maxwell-based Titan X launched in March of 2015, this card is a significant performance upgrade. Using the largest consumer-facing Pascal GPU to date (with only the GP100 used in the Tesla P100 exceeding it), the new Titan X is going to be a very expensive, and very fast, gaming card.
As has been the case since the introduction of the Titan brand, NVIDIA claims that this card is for gamers who want the very best in graphics hardware as well as for developers who need an ultra-powerful GPGPU device. GP102 does not integrate improved FP64 / double-precision compute cores, so we are basically looking at an upgraded and improved GP104 Pascal chip. That’s nothing to sneeze at, of course, and you can see in the specifications below that we expect (and can now show) that the Titan X (Pascal) is a gaming monster.
| | Titan X (Pascal) | GTX 1080 | GTX 980 Ti | TITAN X | GTX 980 | R9 Fury X | R9 Fury | R9 Nano | R9 390X |
|---|---|---|---|---|---|---|---|---|---|
| GPU | GP102 | GP104 | GM200 | GM200 | GM204 | Fiji XT | Fiji Pro | Fiji XT | Hawaii XT |
| Rated Clock | 1417 MHz | 1607 MHz | 1000 MHz | 1000 MHz | 1126 MHz | 1050 MHz | 1000 MHz | up to 1000 MHz | 1050 MHz |
| Memory Clock | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 7000 MHz | 500 MHz | 500 MHz | 500 MHz | 6000 MHz |
| Memory Interface | 384-bit G5X | 256-bit G5X | 384-bit | 384-bit | 256-bit | 4096-bit (HBM) | 4096-bit (HBM) | 4096-bit (HBM) | 512-bit |
| Memory Bandwidth | 480 GB/s | 320 GB/s | 336 GB/s | 336 GB/s | 224 GB/s | 512 GB/s | 512 GB/s | 512 GB/s | 320 GB/s |
| TDP | 250 watts | 180 watts | 250 watts | 250 watts | 165 watts | 275 watts | 275 watts | 175 watts | 275 watts |
| Peak Compute | 11.0 TFLOPS | 8.2 TFLOPS | 5.63 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS | 7.20 TFLOPS | 8.19 TFLOPS | 5.63 TFLOPS |
GP102 features 40% more CUDA cores than the GP104 at slightly lower clock speeds. The rated 11 TFLOPS of single precision compute of the new Titan X is 34% higher than that of the GeForce GTX 1080 and I would expect gaming performance to scale in line with that difference.
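Those TFLOPS figures are straightforward to sanity-check: peak single-precision throughput is CUDA cores × 2 FLOPs (one fused multiply-add) per clock × clock speed. A quick sketch, assuming the widely reported 3,584-core count for the new Titan X:

```python
def peak_tflops(cuda_cores, clock_mhz):
    # Each CUDA core retires one FMA (2 FLOPs) per clock at peak.
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12

# Titan X (Pascal): the rated 1417 MHz clock gives ~10.2 TFLOPS, so
# NVIDIA's 11 TFLOPS figure implies a boost clock near 1531 MHz.
print(peak_tflops(3584, 1417))   # ~10.2
print(peak_tflops(3584, 1531))   # ~11.0
# GTX 1080: 2560 cores at 1607 MHz -> ~8.2 TFLOPS, matching the table.
print(peak_tflops(2560, 1607))
```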
Titan X (Pascal) does not utilize the full GP102 GPU; the recently announced Quadro P6000 does, however, which gives it a CUDA core count of 3,840 (256 more than Titan X).
A full GP102 GPU
Relative to the complete GPU, the new Titan X effectively gives up 7% of its compute capability, although that trim likely helps increase available clock headroom and yield.
The new Titan X will feature 12GB of GDDR5X memory, not the HBM2 found on the GP100 chip, so this is clearly a unique chip with a new memory interface. NVIDIA claims 480 GB/s of bandwidth on a 384-bit memory controller interface running at the same 10 Gbps as the GTX 1080.
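That bandwidth figure follows directly from the bus width and per-pin data rate; a one-line sanity check:

```python
# Memory bandwidth = (bus width in bits / 8 bits-per-byte) * per-pin data rate.
# A 384-bit bus at 10 Gbps per pin:
bus_bits, gbps_per_pin = 384, 10
bandwidth_gb_s = bus_bits / 8 * gbps_per_pin
print(bandwidth_gb_s)  # 480.0 GB/s, matching NVIDIA's claim
```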
Realworldtech with Compelling Evidence
Yesterday David Kanter of Realworldtech posted a fascinating article and video exploring the two latest NVIDIA architectures and how they have branched away from traditional immediate mode rasterization. His testing revealed that with Maxwell and Pascal, NVIDIA has moved to a tiled method of rasterization. This is a significant departure for the company, considering it has utilized the same basic immediate mode rasterization model since the 90s.
The Videologic Apocalypse 3Dx, based on the PowerVR PCX2.
(photo courtesy of Wikipedia)
Tiling is an interesting subject, and we can harken back to the PowerVR days to see where it was first implemented. There are many advantages to tiling and deferred rendering when it comes to overall efficiency in power and memory bandwidth. These first TBDRs (Tile Based Deferred Renderers) offered great performance per clock and could utilize slower memory compared to other offerings of the day (namely Voodoo Graphics). But there were some significant drawbacks to the technology: a lot of work had to be done by the CPU and driver in scene setup and geometry sorting. On systems with fast CPUs the PowerVR boards could provide very good performance, but they suffered on lower-end parts compared to the competition. This is a very simple explanation of what was going on, but the long and short of it is that TBDR did not take over the world due to limitations in its initial implementations. Traditional immediate mode rasterizers would improve in efficiency and performance with aggressive Z checks and other optimizations that borrow from the TBDR playbook.
Tiling is also present in a lot of mobile parts. Imagination's PowerVR graphics technologies have been licensed by Intel, Apple, MediaTek, and others, while Qualcomm (Adreno) and ARM (Mali) both implement tiler technologies to improve power consumption and performance while increasing bandwidth efficiency. Perhaps most interesting, we can look back to the Gigapixel days and the GP-1 chip, which implemented a tiling method that seemed to work very well without the CPU hit and driver overhead that had plagued the PowerVR chips up to that point. 3dfx bought Gigapixel for some $150 million at the time; 3dfx then went on to file for bankruptcy a year later, and its IP was acquired by NVIDIA.
Screenshot of the program used to uncover the tiling behavior of the rasterizer.
It now appears as though NVIDIA has evolved its raster units to embrace tiling. This is not a full TBDR implementation, but rather an immediate mode tiler that still breaks the scene up into tiles but does not implement deferred rendering. This change should improve bandwidth efficiency when it comes to rasterization, but it does not force the rest of the graphics pipeline to be deferred (tessellation, geometry setup, shaders, etc. are not impacted). NVIDIA has not done a deep dive on this change for editors, so we do not know the exact implementation or what advantages we can expect, but we can look at the evidence we have and speculate where those advantages exist.
The video where David Kanter explains his findings
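To make the distinction concrete, here is a toy sketch of the binning pass an immediate-mode tiler performs before rasterizing each tile out of fast on-chip storage. The tile size, triangle format, and data structures here are illustrative assumptions only; NVIDIA has not disclosed its actual implementation.

```python
# Toy model of an immediate-mode tiler's binning pass: triangles are
# sorted into fixed-size screen tiles by bounding box, then each tile's
# bin can be rasterized with intermediate results kept on chip.
# Illustrative only -- NVIDIA's real hardware design is undisclosed.

TILE = 16  # tile edge in pixels (an assumption; real tile sizes vary)

def bounding_tiles(tri, width, height):
    """Yield (tx, ty) coords of tiles overlapped by a triangle's bounding box."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    x0, x1 = max(0, min(xs)) // TILE, min(width - 1, max(xs)) // TILE
    y0, y1 = max(0, min(ys)) // TILE, min(height - 1, max(ys)) // TILE
    for ty in range(y0, y1 + 1):
        for tx in range(x0, x1 + 1):
            yield (tx, ty)

def bin_triangles(tris, width, height):
    """Build per-tile triangle lists -- the 'binning' pass."""
    bins = {}
    for i, tri in enumerate(tris):
        for tile in bounding_tiles(tri, width, height):
            bins.setdefault(tile, []).append(i)
    return bins

tris = [((2, 2), (30, 4), (5, 28)),      # overlaps tiles (0,0) through (1,1)
        ((40, 40), (44, 46), (47, 41))]  # fits entirely in tile (2,2)
bins = bin_triangles(tris, 64, 64)
```

A full TBDR would additionally defer shading until visibility is resolved within each tile; the immediate-mode variant described above skips that step, which is why the rest of the pipeline is left untouched.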
Bandwidth and Power
Tilers have typically taken the tiled regions and buffered them on the chip. This is a big improvement in both performance and power efficiency as the raster data does not have to be cached and written out to the frame buffer and then swapped back. This makes quite a bit of sense considering the overall lack of big jumps in memory technologies over the past five years. We have had GDDR-5 since 2007/2008. The speeds have increased over time, but the basic technology is still much the same. We have seen HBM introduced with AMD’s Fury series, but large scale production of HBM 2 is still to come. Samsung has released small amounts of HBM 2 to the market, but not nearly enough to handle the needs of a mass produced card. GDDR-5X is an extension of GDDR-5 that does offer more bandwidth, but it is still not a next generation memory technology like HBM 2.
By utilizing a tiler, NVIDIA is able to lower memory bandwidth needs for the rasterization stage. Considering that both the Maxwell and Pascal architectures are based on GDDR-5 and GDDR-5X technologies, it makes sense to save as much bandwidth as possible wherever they can. This is probably another of the reasons, among many, that we saw a much larger L2 cache in Maxwell vs. Kepler (2048 KB vs. 256 KB, respectively). Every little bit helps when we are looking at hard, real-world bandwidth limits for a modern GPU.
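A back-of-envelope sketch shows the kind of traffic at stake. All of the numbers below (resolution, overdraw factor, frame rate) are illustrative assumptions, not measurements:

```python
# Rough estimate of color-buffer traffic a rasterizer generates, and how
# much an on-chip tile buffer could save by keeping intermediate
# read-modify-write traffic off the external memory bus.
# All inputs are illustrative assumptions, not measured data.

width, height = 2560, 1440   # render resolution (assumption)
bytes_per_pixel = 4          # 32-bit RGBA color
overdraw = 3.0               # average shaded samples per pixel (assumption)
fps = 100

# Immediate mode: every overdrawn sample may both read and write external memory.
immediate_gb_s = width * height * bytes_per_pixel * overdraw * 2 * fps / 1e9

# Tiled: intermediate blends stay on chip; only the finished tile is written out.
tiled_gb_s = width * height * bytes_per_pixel * fps / 1e9

print(f"immediate ~{immediate_gb_s:.1f} GB/s, tiled ~{tiled_gb_s:.1f} GB/s")
```

Even under these generous assumptions the raster traffic is a modest slice of a 480 GB/s budget, which fits the "every little bit helps" framing: the tiler is one saving among many, not the whole story.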
The area of power efficiency has also come up in discussion when going to a tiler. Tilers have traditionally been more power efficient as well due to how the raster data is tiled and cached, requiring fewer reads and writes to main memory. The first impulse is to say, “Hey, this is the reason why NVIDIA’s Maxwell was so much more power efficient than Kepler and AMD’s latest parts!” Sadly, this is not exactly true. The tiler is more power efficient, but it is only a small part of the power savings on a GPU.
The second fastest Pascal based card...
A modern GPU is very complex. There are some 7.2 billion transistors on the latest Pascal GP104 that powers the GTX 1080. The vast majority of those transistors are implemented in the shader units of the chip; while the raster units are very important, they are but a fraction of that transistor budget, with the rest taken up by power regulation, PCI-E controllers, and memory controllers. In the big scheme of things the raster portion is going to be dwarfed in power consumption by the shader units. That does not mean it is unimportant, though. Going back to the hated car analogy, one does not achieve weight savings by focusing on one aspect alone; it is a matter of going over every single part of the car and shaving ounces here and there, achieving significant savings in the end by addressing every single piece of a complex product.
This does appear to be the long and short of it. This is one piece of a very complex ASIC that improves upon memory bandwidth utilization and power efficiency. It is not the whole story, but it is an important part. I find it interesting that NVIDIA did not disclose this change to editors with the introduction of Maxwell and Pascal, but if it is transparent to users and developers alike then there is no need. There is a lot of “secret sauce” that goes into each architecture, and this is merely one aspect. The one question that I do have is how much of the technology is based upon the Gigapixel IP that 3dfx bought at such a premium. I believe that particular tiler was an immediate mode renderer as well, given that it did not have as many of the driver and overhead issues that PowerVR exhibited back in the day. Obviously it would not be a copy/paste of technology developed back in the 90s, but it would be interesting to see if it was the basis for this current implementation.
Introduction and Specifications
Dell's premium XPS notebook family includes both 15-inch and 13-inch variants, which ship with the latest 6th-generation Intel Skylake processors and all of the latest hardware. But the screens are what will grab your immediate attention: bright, rich, and with the narrowest bezels on any notebook, courtesy of Dell's InfinityEdge displays.
Since Ryan’s review of the XPS 13, which is now his daily driver, Dell has added the XPS 15, which is the smallest 15-inch notebook design you will find anywhere. The XPS 13 is already "the smallest 13-inch laptop on the planet", according to Dell, giving their XPS series a significant advantage in the ultrabook market. The secret is in the bezel, or lack thereof, which allows Dell to squeeze these notebooks into much smaller physical dimensions than you might expect given their display sizes.
But you get more than just a compact size with these XPS notebooks, as the overall quality of the machines rivals that of anything else you will find; they may just be the best Windows notebooks you can buy right now. Is this simply bluster? Notebooks, like smartphones, are a personal thing. They need to conform to the user to provide a great experience, and there are obviously many different kinds of users to satisfy. Ultimately, however, Dell has produced what could easily be described as class leaders with these machines.
Introduction, Packaging, and Internals
Being a bit of a storage nut, I have run into my share of failed and/or corrupted hard drives over the years, and I have therefore used many different data recovery tools to try to get that data back when needed. Thankfully, I now employ a backup strategy that should minimize the need for such a tool, but there will always be instances of fresh data on a drive that went down before a recent backup took place, or a neighbor or friend who did not have a backup at all.
I’ve got a few data recovery pieces in the cooker, but this one will be focusing on ‘physical data recovery’ from drives with physically damaged or degraded sectors and/or heads. I’m not talking about so-called ‘logical data recovery’, where the drive is physically fine but has suffered some corruption that makes the data inaccessible by normal means (undelete programs also fall into this category). There are plenty of ‘hard drive recovery’ apps out there, and most if not all of them claim seemingly miraculous results on your physically failing hard drive. While there are absolutely success stories out there (most plastered all over testimonial pages at those respective sites), one must take those with an appropriate grain of salt. Someone who just got their data back with a <$100 program is going to be very vocal about it, while those who had their drive permanently fail during the process are likely to go cry quietly in a corner while saving up for a clean-room capable service to repair their drive and attempt to get their stuff back. I'll focus more on the exact issues with using software tools for hardware problems later in this article, but for now, surely there has to be some way to attempt these first few steps of data recovery without resorting to software tools that can potentially cause more damage?
Well now there is. Enter the RapidSpar, made by DeepSpar, who hope this little box can bridge the gap between dedicated data recovery operations and home users risking software-based hardware recoveries. DeepSpar is best known for making advanced tools used by big data recovery operations, so they know a thing or two about this stuff. I could go on and on here, but I’m going to save that for after the intro page. For now let’s get into what comes in the box.
Note: In this video, I read the MFT prior to performing RapidNebula Analysis. It's optimal to reverse those steps. More on that later in this article.
Cool Your Jets: Can the Angelbird Wings PX1 Heatsink-Equipped PCIe Adapter Tame M.2 SSD Temps?
Introduction to the Angelbird Wings PX1
PCIe-based M.2 storage has been one of the more exciting topics in the PC hardware market during the past year. With tremendous performance packed into a small design no larger than a stick of chewing gum, PCIe M.2 SSDs open up new levels of storage performance and flexibility for both mobile and desktop computing. But these tiny, powerful drives can heat up significantly under load, to the point where thermal performance throttling was a critical concern when the drives first began to hit the market.
While thermal throttling is less of a concern for the latest generation of NVMe M.2 SSDs, Austrian SSD and accessories firm Angelbird wants to squash any possibility of performance-killing heat with its Wings line of PCIe SSD adapters. The company's first Wings-branded product is the PX1, an x4 PCIe adapter that can house an M.2 SSD in a custom-designed heatsink.
Angelbird claims that its aluminum-coated copper-core heatsink design can lower the operating temperature of hot M.2 SSDs like the Samsung 950 Pro, thereby preventing thermal throttling. But at a list price of $75, this potential protection doesn't come cheap. We set out to test the PX1's design to see if Angelbird's claims about reduced temperatures and increased performance hold true.
PX1 Design & Installation
PC Perspective's Allyn Malventano was impressed with the build quality of Angelbird's products when he reviewed its "wrk" series of SSDs in late 2014. Our initial impression of the PX1 revealed that Angelbird hasn't lost a step in that regard during the intervening years.
The PX1 features an attractive black design and removable heatsink, which is affixed to the PCB via six hex screws. A single M-key M.2 port resides in the center of the adapter, with mounting holes to accommodate 2230, 2242, 2260, 2280, and 22110-length drives.
Introduction and Technical Specifications
Courtesy of XSPC
XSPC is a well established name in the enthusiast cooling market, offering a wide range of custom cooling components and kits. Their newest CPU waterblock, the Raystorm Pro, offers a new look and optimized design compared to their last-generation Raystorm CPU waterblock. The block features an all-copper design with a dual metal / acrylic hold-down plate for illumination around the outside edge of the block. The Raystorm Pro is compatible with all current CPU sockets with the correct mounting kit.
Make Sure You Understand Before the Deadline
I'm fairly sure that any of our readers who want Windows 10 have already gone through the process to get it, and the rest have made it their mission to block it at all costs (or they don't use Windows).
Regardless, there has been quite a bit of misunderstanding over the last couple of years, so it's better to explain it now than a week from now. Upgrading to Windows 10 will not destroy your original Windows 7 or Windows 8.x license. What you are doing is using that license to register your machine with Windows 10, which Microsoft will create a digital entitlement for. That digital entitlement will be good “for the supported lifetime of the Windows 10-enabled device”.
There are three misconceptions that keep recurring from the above paragraph.
First, “the supported lifetime of the Windows 10-enabled device” doesn't mean that Microsoft will deactivate Windows 10 on you. Instead, it apparently means that Microsoft will continue to update Windows 10, and require that users will keep the OS somewhat up to date (especially the Home edition). If an old or weird piece of hardware or software in your device becomes incompatible with that update, even if it is critical for the device to function, then Microsoft is allowing itself to shrug and say “that sucks”. There's plenty of room for legitimate complaints about this, and Microsoft's recent pattern of weakened QA and support, but the specific complaint that Microsoft is just trying to charge you down the line? False.
Second, even though I already stated it earlier in this post, I want to be clear: you can still go back to Windows 7 or Windows 8.x. Microsoft is granting the Windows 10 license for the Windows 7 or Windows 8.x device in addition to the original Windows 7 or Windows 8.x license granted to it. The upgrade process even leaves the old OS on your drive for a month, allowing the user to roll back through a recovery process. I've heard people say that, occasionally, this process can screw a few things up. It's a good idea to manage your own backup before upgrading, and/or plan on re-installing Windows 7 or 8.x the old fashioned way.
This brings us to the third misconception: you can re-install Windows 10 later!
If you upgrade to Windows 10, decide that you're better with Windows 7 or 8.x for a while, but decide to upgrade again in a few years, then your machine (assuming the hardware didn't change enough to look like a new device) will still use that Windows 10 entitlement that was granted to you on your first, free upgrade. You will need to download the current Windows 10 image from Microsoft's website, but, when you install it, you should be able to just input an empty license key (if they still ask for it by that point) and Windows 10 will pull down validation from your old activation.
If you have decided to avoid Windows 10, but based that decision on the above three, incorrect points? You now have the tools to make an informed decision before time runs out. Upgrading to Windows 10 (Update: and waiting until it verifies that it successfully activated!) and then rolling back is annoying, and it could be a hassle if it doesn't go cleanly (unless you go super-safe and back up ahead of time), but it might save you some money in the future.
On the other hand, if you don't want Windows 10, and never want Windows 10, then Microsoft will apparently stop asking Windows 7 and Windows 8.x users starting on the 29th, give or take.
Introduction and Features
SFX form factor cases and power supplies continue to grow in popularity and market share. As one of the original manufacturers of SFX power supplies, SilverStone Technology Co. is meeting demand with new products, continuing to raise the bar in the SFX power supply arena with the introduction of their new SX700-LPT unit.
(SX=SFX Form Factor, 700=700W, L=Lengthened, PT=Platinum certified)
SilverStone has a long-standing reputation for providing a full line of high quality enclosures, power supplies, cooling components, and accessories for PC enthusiasts. With a continued focus on smaller physical size and support for small form-factor enthusiasts, SilverStone added the new SX700-LPT to their SFX form factor series. There are now seven power supplies in the SFX Series, ranging in output capacity from 300W to 700W. The SX700-LPT is the second SFX unit to feature a lengthened chassis: its enclosure is 30mm (1.2”) longer than a standard SFX chassis, which allows the use of a quieter 120mm cooling fan rather than the typical 80mm fan found in most SFX power supplies.
The new SX700-LPT power supply was designed for small form factor cases but it can also be used in place of a standard ATX power supply (in small cases) with an optional mounting bracket. In addition to its small size, the SX700-LPT features high efficiency (80 Plus Platinum certified), all modular flat ribbon-style cables, and provides up to 700W of continuous DC output (750W peak). The SX700-LPT also operates in semi-fanless mode and incorporates a very quiet 120mm cooling fan.
SilverStone SX700-LPT PSU Key Features:
• Small Form Factor (SFX-L) design
• 700W continuous power output rated for 24/7 operation
• Very quiet with semi-fanless operation
• 120mm cooling fan optimized for low noise
• 80 Plus Platinum certified for high efficiency
• Powerful single +12V rail with 58.4A capacity
• All-modular, flat ribbon-style cables
• High quality construction with all Japanese capacitors
• Strict ±3% voltage regulation and low AC ripple and noise
• Support for high-end GPUs with four PCI-E 8/6-pin connectors
• Safety Protections: OCP, OVP, UVP, SCP, OTP, and OPP
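The rated figures above are internally consistent: the single +12V rail alone covers the unit's full output. A quick check (the efficiency value below is an assumption based on the 80 Plus Platinum floor at full load on 115V input, not a figure from SilverStone's spec sheet):

```python
# Sanity check on the SX700-LPT's rated figures.
rail_volts, rail_amps = 12.0, 58.4
rail_watts = rail_volts * rail_amps
print(f"+12V rail capacity: {rail_watts:.1f} W")  # 700.8 W, covering the 700 W rating

# Approximate wall draw at full load, assuming ~89% efficiency (the
# 80 Plus Platinum minimum at 100% load on 115V -- an assumption, not measured).
efficiency = 0.89
wall_watts = 700 / efficiency
print(f"approx. wall draw at 700 W output: {wall_watts:.0f} W")
```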
Introduction: Rethinking the Stock Cooler
AMD's Wraith cooler was introduced at CES this January, and has been available with select processors from AMD for a few months. We've now had a chance to put one of these impressive-looking CPU coolers through its paces on the test bench to see how much it improves on the previous model, and see if aftermarket cooling is necessary with AMD's flagship parts anymore.
While a switch in the bundled stock cooler might not seem very compelling, the fact that AMD has put effort into improving this aspect of their retail CPU offering is notable. AMD processors already present a great value relative to Intel's offerings for gaming and desktop productivity, but the stock coolers have to this point warranted a replacement.
Intel went the other direction with the current generation of enthusiast processors, as CPUs such as my Core i5-6600k no longer ship with a cooler of any kind. If AMD has upgraded the stock CPU cooler to the point that it now cools efficiently without significant noise, this will save buyers a little more cash when planning an upgrade, which is always a good thing.
The previous AMD stock cooler (left) and the AMD Wraith cooler (right)
A quick search for "Wraith" on Amazon yields retail-box products like the A10-7890K APU, and the FX-8370 CPU; options which have generally required an aftermarket cooler for the highest performance. In this review we’ll take a close look at the results with the previous cooler and the Wraith, and throw in results from the most popular aftermarket cooler of them all; the Cooler Master Hyper 212 EVO.
Yes, We're Writing About a Forum Post
Update - July 19th @ 7:15pm EDT: Well that was fast. Futuremark published their statement today. I haven't read it through yet, but there's no reason to wait to link it until I do.
Update 2 - July 20th @ 6:50pm EDT: We interviewed Jani Joki, Futuremark's Director of Engineering, on our YouTube page. The interview is embed just below this update.
Original post below
The comments of a previous post notified us of an Overclock.net thread whose author claims that 3DMark's implementation of asynchronous compute is designed to show NVIDIA in the best possible light. At the end of the linked post, they note that asynchronous compute is a general blanket term, and that we should better understand what is actually going on.
So, before we address the controversy, let's explain what asynchronous compute actually is. The main problem is that it really is a broad term: asynchronous compute could describe any optimization that allows tasks to execute when it is most convenient, rather than just blindly doing them in a row.
This is asynchronous computing.