Since the launch of NVIDIA's Pascal architecture with the GTX 1070 and 1080 last May, we've taken a look at a lot of Pascal-based products, including the recent launch of the GTX 1080 Ti. By now, it is clear that Pascal has proven itself in a gaming context.
One frequent request we get about GPU coverage is to look at professional use cases for these sorts of devices. While gaming is still far and away the most common use for GPUs, workloads like high-quality rendering in industries such as architecture, and emerging fields like deep learning, can see vast benefits from GPU acceleration.
Today, we are taking a look at some of the latest NVIDIA Quadro GPUs on the market, the Quadro P2000, P4000, and P5000.
Diving deep into the technical specs of these Pascal-based Quadro products and the AMD competitor we will be testing, we find a wide range of compute capability, power consumption, and price.
| |Quadro P2000|Quadro P4000|Quadro P5000|Radeon Pro Duo|
|Code Name|GP106|GP104|GP104|Fiji XT x 2|
|Rated Clock Speed|1470 MHz (Boost)|1480 MHz (Boost)|1730 MHz (Boost)|up to 1000 MHz|
|Memory Width|160-bit|256-bit|256-bit|4096-bit (HBM) x 2|
|Compute Perf (FP32)|3.0 TFLOPS|5.3 TFLOPS|8.9 TFLOPS|16.38 TFLOPS|
|Compute Perf (FP64)|1/32 FP32|1/32 FP32|1/32 FP32|1/16 FP32|
|Frame Buffer|5GB|8GB|16GB|8GB (4GB x 2)|
Astute readers will notice similarities to NVIDIA's GeForce line of products as they look over these specifications.
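As a sanity check on the FP32 column, peak single-precision throughput for a GPU is 2 FLOPs (one fused multiply-add) per shader per clock. A quick sketch, using the commonly cited CUDA core counts for these boards (the core counts are not in the table above, so treat them as assumed):

```python
# Peak FP32 TFLOPS = 2 FLOPs (FMA) x CUDA cores x boost clock.
# Core counts below are the commonly cited figures for these Quadro boards.
cards = {
    "Quadro P2000": (1024, 1.470e9),
    "Quadro P4000": (1792, 1.480e9),
    "Quadro P5000": (2560, 1.730e9),
}
for name, (cores, clock_hz) in cards.items():
    tflops = 2 * cores * clock_hz / 1e12
    print(f"{name}: {tflops:.1f} TFLOPS")
```

The results line up with the rated 3.0, 5.3, and 8.9 TFLOPS figures in the table.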
Introduction and Technical Specifications
Courtesy of ASUS
The Strix Z270E Gaming motherboard is among the Z270-based offerings in the ASUS ROG Strix product line. The board's Intel Z270 chipset supports the latest Intel LGA1151 Kaby Lake processor line as well as dual-channel DDR4 memory. With an MSRP of $199, the Strix Z270E Gaming comes at a premium, but one more than justified by its feature set.
Tweaks for days
It seems like it’s been months since AMD launched Ryzen, its first new processor architecture in about a decade, when in fact we are only four weeks removed. One of the few concerns about the Ryzen processors centered on its results in some gaming tests, particularly at common resolutions like 1080p. While I was far from the only person to notice these concerns, our gaming tests clearly showed a gap between the Ryzen 7 1800X and the Intel Core i7-7700K and i7-6900K processors in Civilization VI, Hitman, and Rise of the Tomb Raider.
A graph from our Ryzen launch coverage...
We had been working with AMD for a couple of weeks on the Ryzen launch and fed back our results, with questions, in the week before launch. On March 2nd, AMD’s CVP of Marketing John Taylor gave us a prepared statement that acknowledged the issue but promised changes would come in the form of game engine updates. These software updates would need to be implemented by the game developers themselves in order to take advantage of the unique and more complex core design of the Zen architecture. We had quotes from the developers of Ashes of the Singularity as well as the Total War series to back it up.
And while statements promising change are nice, it really takes some proof to get the often skeptical tech media and tech enthusiasts to believe that change can actually happen. Today AMD is showing its first result.
The result of 400 developer hours of work, the Nitrous Engine powering Ashes of the Singularity received an update today to version 26118 that integrates updates to threading to better balance the performance across Ryzen 7’s 8 cores and 16 threads. I was able to do some early testing on the new revision, as well as with the previous retail shipping version (25624) to see what kind of improvements the patch brings with it.
Stardock / Oxide CEO Brad Wardell had this to say in a press release:
“I’ve always been vocal about taking advantage of every ounce of performance the PC has to offer. That’s why I’m a strong proponent of DirectX 12 and Vulkan® because of the way these APIs allow us to access multiple CPU cores, and that’s why the AMD Ryzen processor has so much potential,” said Stardock and Oxide CEO Brad Wardell. “As good as AMD Ryzen is right now – and it’s remarkably fast – we’ve already seen that we can tweak games like Ashes of the Singularity to take even more advantage of its impressive core count and processing power. AMD Ryzen brings resources to the table that will change what people will come to expect from a PC gaming experience.”
Our testing setup is in line with our previous CPU performance stories.
|Test System Setup||
|CPU|AMD Ryzen 7 1800X / Intel Core i7-6900K|
|Motherboard|ASUS Crosshair VI Hero (Ryzen) / ASUS X99-Deluxe II (Broadwell-E)|
|Storage|Corsair Force GS 240 SSD|
|Graphics Card|NVIDIA GeForce GTX 1080 8GB|
|Graphics Drivers|NVIDIA 378.49|
|Power Supply|Corsair HX1000|
|Operating System|Windows 10 Pro x64|
I was using the latest BIOS for our ASUS Crosshair VI Hero motherboard (1002) and upgraded to some GeIL RGB (!!) memory capable of running at 3200 MHz on this board with a single BIOS setting adjustment. All of my tests were done at 1080p in order to return to the pain point that AMD was dealing with on launch day.
Let’s see the results.
These are substantial performance improvements with the new engine code! At both 2400 MHz and 3200 MHz memory speeds, and at both the High and Extreme presets in the game (all running in DX12, for what that’s worth), gaming performance on this GPU-centric title is improved. At the High preset (the setting AMD used in its performance data for the press release), we see a 31% jump in performance when running at the higher memory speed and a 22% improvement with the lower speed memory. Even in the more GPU-bottlenecked state of the Extreme preset, the performance improvement for the Ryzen processors with the latest Ashes patch is 17-20%!
It’s also important to note that Intel performance is unaffected, for better or worse. Whatever work Oxide did to improve the engine for AMD’s Ryzen processors had NO impact on the Core processors, which is interesting to say the least. The cynic in me finds it hard to believe that architecture-agnostic code changes wouldn’t have raised Intel’s multi-core performance at least a little bit.
So what exactly is happening to the engine with v26118? I haven’t had a chance to have an in-depth conversation with anyone at AMD or Oxide yet on the subject, but at a high level, I was told that this is what happens when instructions and sequences are analyzed for an architecture specifically. “For basically 5 years”, I was told, Oxide and other developers have dedicated their time to “instruction traces and analysis to maximize Intel performance” which helps to eliminate poor instruction setup. After spending some time with Ryzen and the necessary debug tools (and some AMD engineers), they were able to improve performance on Ryzen without adversely affecting Intel parts.
Core to core latency testing on Ryzen 7 1800X
I am hoping to get more specific detail in the coming days, but it seems very likely that Oxide was able to properly handle the more complex core-to-core communication system of Ryzen and its CCX implementation. We demonstrated earlier this month how thread-to-thread communication across core complexes incurs substantial latency penalties, and that a developer who intelligently manages threads with dependencies relative to the core complexes can improve overall performance. I would expect this is at least part of the solution Oxide integrated (and it would also explain why Intel parts are unaffected).
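The cross-CCX penalty we measured is typically probed with a thread-to-thread "ping-pong" test: two threads pinned to different cores bounce a token back and forth, and the round-trip time reflects how expensive their communication path is. Here is a minimal Python sketch of the technique (Linux-only, and interpreter overhead swamps the hardware latency, so treat it as illustrating the method rather than producing hardware-accurate numbers):

```python
# Thread-to-thread "ping-pong" latency sketch (Linux-only via sched_setaffinity).
# Python's Event/GIL overhead dominates the absolute numbers; only the relative
# difference between core pairs (same CCX vs. cross-CCX) would be meaningful.
import os
import threading
import time

def ping_pong_ns(core_a, core_b, rounds=5_000):
    ev = [threading.Event(), threading.Event()]
    def worker(me, core):
        os.sched_setaffinity(0, {core})   # pin this thread to one core
        for _ in range(rounds):
            ev[me].wait()                 # wait for the token...
            ev[me].clear()
            ev[1 - me].set()              # ...then pass it to the other thread
    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate((core_a, core_b))]
    for t in threads:
        t.start()
    start = time.perf_counter_ns()
    ev[0].set()                           # kick off the first round trip
    for t in threads:
        t.join()
    return (time.perf_counter_ns() - start) / rounds   # ns per round trip

if __name__ == "__main__":
    cores = sorted(os.sched_getaffinity(0))
    print(f"cores {cores[0]}<->{cores[-1]}: "
          f"{ping_pong_ns(cores[0], cores[-1]):.0f} ns per round trip")
```

On a Ryzen 7 the interesting comparison would be a pair of cores inside one CCX versus a pair that straddles the two complexes.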
- Ryzen 7 1800X - $499 - Amazon.com
- Ryzen 7 1700X - $399 - Amazon.com
- Ryzen 7 1700 - $329 - Amazon.com
What is important now is that AMD takes this momentum with Ashes of the Singularity and actually does something with it. Many of you will recognize Ashes as the flagship title for Mantle when AMD made that move to change the programming habits and models of developers, and though Mantle would eventually become Vulkan and drive DX12 development, it did not bring about the overall shift AMD hoped for. Can AMD and its developer relations team continue to make the case that spending time and money (which is what 400 developer hours equates to) on specific performance enhancements for Ryzen processors is in the best interest of everyone? We’ll soon find out.
Introduction and Packaging
Data Robotics shipped their first product 10 years ago. Dubbed the Drobo (short for Data Robot), it was a 4-bay, hot-swappable, USB 2.0-connected external storage device. At a time when RAID was still a term mostly unknown to typical PC users, the Drobo was already pushing the concept of data redundancy past what those familiar with RAID were used to. BeyondRAID offered a form of redundant data storage that decoupled rigid RAID structures from fixed-capacity disk packs. While most RAID volumes were 'dumb', BeyondRAID was aware of what was stored within its partitions, distributing that data in block format across the available disks. This not only significantly sped up rebuilds (only the used portions of the disks need be recopied), it allowed for other cool tricks like the ability to mix drive capacities within the same array. Switching between parity levels could also be done on the fly, with significantly less effort than traditional RAID migrations.
While all of the above was great, the original Drobo saw performance hits from its block level management, which was limited by the processing overhead combined with the available processing power for such a device at the time. The first Drobo model was lucky to break 15 MB/s, which could not even fully saturate a USB 2.0 link. After the launch, requests for network attached capability led to the launch of the DroboShare, which could act as a USB to ethernet bridge. It worked but was still limited by the link speed of the connected Drobo. A Drobo FS launched a few years later, but it was not much quicker. Three years after that we got the 5N, which was finally a worthy contender in the space.
10 years and nearly a dozen models later, we now have the Drobo 5N2, which will replace the aging 5N. The newer model retains the same 5-bay form factor and mSATA bay for optional SSD cache but adds a second bondable Gigabit Ethernet port and upgrades most of the internals. Faster hardware specs and newer more capable firmware enables increased throughput and volume sizes up to 64TB. Since BeyondRAID is thin provisioned, you always make the volume as large as it can be and simply add disk capacity as the amount of stored content grows over time.
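That thin-provisioned, mixed-capacity behavior can be approximated with a simple rule of thumb (my sketch, not Drobo's actual algorithm): protected capacity is roughly the sum of all drives minus the largest one, or minus the two largest under dual-disk redundancy.

```python
# Rough approximation of BeyondRAID protected capacity with mixed drive sizes.
# Reserve space equal to the largest N drives, where N is the redundancy level.
def usable_tb(drive_sizes_tb, redundancy=1):
    reserved = sum(sorted(drive_sizes_tb)[-redundancy:])
    return sum(drive_sizes_tb) - reserved

print(usable_tb([8, 4, 4, 2, 1]))      # mixed 5-bay array, single redundancy
print(usable_tb([8, 4, 4, 2, 1], 2))   # same array with dual redundancy
```

Swapping a small drive for a larger one simply grows the usable pool, which is why the 64TB volume can be created up front and filled in over time.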
Today Samsung released an update to their EVO+ microSD card line. The new model is the 'EVO Plus'. Yes, I know, it's confusing to me as well, especially when trying to research the new vs. old iterations for this mini-review. Here are a few quick visual comparisons between both models:
On the left, we have the 'older' version of the Plus (I mean the '+'), while on the right we have the new plus, designated as a '2017 model' on the Samsung site. Note the rating differences between the two. The '+' on the left is rated at UHS-I U1 (10 MB/s minimum write speed), while the newer 'Plus' version is rated at UHS-I U3 (30 MB/s minimum write speed). I also ran across what looked like the older version packaging.
The packaging on the right is what we had in hand for this review. The image on the left was found at the Samsung website, and confuses things even further, as the 'Plus' on the package does not match the markings on the card itself ('+'). It looks as if Samsung may have silently updated the specs of the 256GB '+' model at some point in the recent past, as that model claims significantly faster write speeds (90 MB/s) than the older/other '+' models previously claimed (~20 MB/s). With that confusion out of the way, let's dig into the specs of this newest EVO Plus:
For clarification on the Speed Class and Grade, I direct you to our previous article covering those aspects in detail. For here I'll briefly state that the interface can handle 104 MB/s while the media itself is required to sustain a minimum of 30 MB/s of typical streaming recorded content. The specs go on to claim 100MB/s reads and 90 MB/s writes (60 MB/s for the 64GB model). Doing some quick checks, here's what I saw with some simple file copies to and from a 128GB EVO Plus:
Our figures didn't exceed the specified performance, but they came close enough to comfortably satisfy the 'up to' claim, with over 80 MB/s writes and 93 MB/s reads in simple file copies. I was able to separately confirm 85-89 MB/s writes and 99 MB/s reads with Iometer running 128KB sequential transfers.
- 32GB: $29.99
- 64GB: $49.99
- 128GB: $99.99
- 256GB: coming soon (but there is already a 256GB EVO+ of similar specs???)
Pricing seems to be running a bit high on these, at close to double that of the previous version of this very same part (the EVO+ 128GB can be found for $50 at the time of this writing). Sure, you are getting a U3-rated card with over four times the achievable write speed, but the reads are very similar, and if your camera only requires U1 speeds, the price premium does not seem worthwhile. It is also worth noting that even faster UHS-II cards that transfer at 150 MB/s can be had, and even come with a reader, at a lower cost.
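Putting that "close to double" claim in per-gigabyte terms, using the figures quoted above:

```python
# Price per gigabyte for the 128GB cards mentioned above
# (street price for the older EVO+, MSRP for the new EVO Plus).
cards = {"EVO+ 128GB (street)": 50.00, "EVO Plus 128GB (MSRP)": 99.99}
for name, price in cards.items():
    print(f"{name}: ${price / 128:.3f}/GB")
```

Roughly $0.39/GB for the old card versus $0.78/GB for the new one.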
In summary, the Samsung EVO Plus microSD cards look to be decent performers, but the pricing needs to come down some to be truly competitive in this space. I'd also like to see the product labeling and marketing a bit more clear between the '+' and the 'Plus' models, as they can easily confuse those not so familiar with SD card classes and grades. It also makes searching for them rather difficult, as most search engines parse 'Plus' interchangeably with '+', adding to the potential confusion.
The Need for Speed
Around here storage is Allyn’s territory, but I decided to share my experience with a new $20 flash drive I picked up that promised some impressive speeds via USB 3.0. The drive is the Lexar JumpDrive P20, and I bought the 32GB version, the lowest capacity of the three drives in the series. 64GB and 128GB versions of the JumpDrive P20 are also available, with advertised read speeds of up to 400 MB/s from all three, and up to 270 MB/s writes - if you buy the largest capacity.
My humble 32GB model still boasts up to 140 MB/s writes, which would be faster than any USB drive I’ve ever owned (my SanDisk Extreme USB 3.0 16GB drive is limited to 60 MB/s writes, and can hit about 190 MB/s reads), and the speeds of the P20 even approach that of some lower capacity SATA 3 SSDs - if it lives up to the claims. The price was right, so I took the plunge. (My hard-earned $20 at stake!)
Size comparison with other USB flash drives on hand (P20 on far right)
First we'll look at the features from Lexar:
- Among the fastest USB flash drives available, with speeds up to 400MB/s read and 270MB/s write
- Sleek design with metal alloy base and high-gloss mirror finish top
- Securely protects files using EncryptStick Lite software, an advanced security solution with 256-bit AES encryption
- Reliably stores and transfers files, photos, videos, and more
- High-capacity options to store more files on the go
- Compatible with PC and Mac systems
- Backwards compatible with USB 2.0 devices
- Limited lifetime warranty
Introduction and Features
SilverStone continues to push the envelope of power density with the release of their new SX800-LTI small form factor power supply. Following close on the heels of the SX700-LPT, the new unit packs 800 watts into a small chassis. SFX form factor cases and power supplies continue to grow in popularity and market share, and as one of the original manufacturers of SFX power supplies, SilverStone Technology Co. is striving to meet customer demand.
(SX=SFX Form Factor, 800=800W, L=Lengthened, TI=Titanium certified)
SilverStone has a long-standing reputation for providing a full line of high quality enclosures, power supplies, cooling components, and accessories for PC enthusiasts. With a continued focus on smaller physical size and support for small form-factor enthusiasts, SilverStone added the new SX800-LTI to their SFX form factor series. There are now eight power supplies in the SFX Series, ranging in output capacity from 300W to 800W. The SX800-LTI is the third SilverStone unit to feature a lengthened SFX chassis. The SX800-LTI enclosure is 30mm (1.2”) longer than a standard SFX power supply case, which allows using a quieter 120mm cooling fan rather than the typical 80mm fan used in most SFX power supplies.
In addition to its small size, the SX800-LTI features very high efficiency (80 Plus Titanium certified), all modular flat ribbon-style cables, and provides up to 800W of continuous DC output (850W peak). The SX800-LTI also operates in semi-fanless mode and incorporates a very quiet 120mm cooling fan.
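For a sense of what the Titanium rating means in practice: input power at the wall is DC output divided by efficiency, and the difference is shed as heat. The 80 Plus Titanium thresholds at 115V input are 90/92/94/90% efficiency at 10/20/50/100% load, so a quick sketch for an 800W unit looks like this:

```python
# 80 Plus Titanium minimum efficiency at 115V input, by fraction of rated load.
TITANIUM_115V = {0.10: 0.90, 0.20: 0.92, 0.50: 0.94, 1.00: 0.90}
RATED_W = 800

for load, eff in TITANIUM_115V.items():
    out_w = RATED_W * load
    in_w = out_w / eff          # power drawn at the wall
    print(f"{out_w:4.0f}W out -> {in_w:5.1f}W in ({in_w - out_w:4.1f}W heat)")
```

Even at full load the supply dissipates under 90W internally, which is what makes a quiet 120mm fan and semi-fanless operation workable in such a small enclosure.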
SilverStone SX800-LTI PSU Key Features:
• Small Form Factor (SFX-L) design
• 800W continuous power output rated for 24/7 operation
• 80 Plus Titanium certified for very high efficiency
• Quiet operation with semi-fanless operation
• 120mm cooling fan optimized for low noise
• Powerful single +12V rail with 66A capacity
• All-modular, flat ribbon-style cables
• High quality construction with all Japanese capacitors
• Strict ±3% voltage regulation and low AC ripple and noise
• Support for high-end GPUs with four PCI-E 8/6-pin connectors
• Safety Protections: OCP, OPP, OVP, UVP, SCP, and OTP
Here is what SilverStone has to say about their new SX800-LTI power supply:
“Since its launch in 2015, the SFX-L form factor has garnered popular recognition and support among enthusiasts with its larger 120mm fan able to achieve better balance of power and quietness in small form factor PCs than what was possible with standard SFX. And as a leader in power supply miniaturization, SilverStone has continued its efforts in advancing the SFX-L forward to reach ever higher limit.
The SX800-LTI not only has unprecedented 800 watts of power output but also has the highest level of 80 PLUS efficiency with a Titanium rating. It includes all features available from top of the line SilverStone PSUs such as flexible flat cables, all Japanese capacitors and advanced semi-fanless capability. For those looking to build the most efficient small form factor systems possible with great quality and power, the SX800-LTI is definitely the top choice.”
Introduction: A Hybrid Approach
The Hex 2.0 from Phononic is not your typical CPU cooler. It functions as both a thermoelectric cooler (TEC) - which you may also know as a Peltier cooler - and as a standard heatsink/fan, depending on CPU load. It offers a small footprint for placement in all but the lowest-profile systems, yet it boasts cooling potential beyond other coolers of its size. Yes, it is expensive, but this is a far more complex device than a standard air or even all-in-one liquid cooler - and obviously much smaller than even the most compact AiO liquid coolers.
“The HEX 2.0 combines a proprietary state-of-the-art high performance thermoelectric module with an innovative heat exchanger. The small form factor CPU cooler pioneers a new category of cooling technology. The compact design comfortably fits in small chassis, including mini-ITX cases, while delivering cooling capacity beyond that of much larger coolers.”
Even though it does not always need to function as such, the Hex 2.0 is a thermoelectric cooling device, and that alone makes it interesting from a PC hardware enthusiast's point of view (mine, at least). The 'active-passive' approach taken by Phononic with the Hex 2.0 allows for greater performance potential than would otherwise be possible from a smaller TEC device, though our testing will of course reveal how effective it is in actual use.
HEX 2.0 features an Active-Passive design (Credit: Phononic)
The goal for the HEX 2.0 CPU cooler was to provide similar cooling performance to all-in-one (AIO) liquid coolers or the very largest fan-heat sinks in a package that could fit into the smallest PC form factors (like miniITX). The active-passive design is what makes this possible. By splitting the CPU heat into two paths, as shown in Figure 1 (Ed. the above image), the thermoelectric device can be sized at an optimal point where it can provide the most benefit for lowering CPU temperature without having to be large enough to pump the entire CPU thermal load. We also designed electronic controls to turn off the thermoelectric heat pump at times of low CPU load, making for an energy efficient cooler that provides adequate cooling with zero power draw at low CPU loads. However, when the CPU is stressed and the CPU heat load increases, the electronic controls energize the thermoelectric heat pump, lowering the temperature of the passive base plate and the CPU itself. The active-passive design has one further benefit – when used in conjunction with the electronic controls, this design virtually eliminates the risk of condensation for the HEX 2.0.
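The "turn the heat pump off at low load" behavior described above amounts to a hysteresis controller: energize above one temperature, release below a lower one so the TEC doesn't rapidly cycle. A rough illustration (the thresholds here are my own invented values, not Phononic's):

```python
# Hysteresis sketch of the kind of TEC control described above.
# Thresholds are illustrative, not Phononic's actual tuning.
class TecController:
    def __init__(self, on_c=60.0, off_c=50.0):
        self.on_c, self.off_c = on_c, off_c
        self.tec_on = False

    def update(self, cpu_temp_c):
        if not self.tec_on and cpu_temp_c >= self.on_c:
            self.tec_on = True          # energize the heat pump under load
        elif self.tec_on and cpu_temp_c <= self.off_c:
            self.tec_on = False         # back to passive, zero-draw cooling
        return self.tec_on
```

The gap between the on and off thresholds is what prevents the pump from chattering on and off as the CPU temperature hovers near a single trip point.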
A new start
Qualcomm is finally ready to show the world how the Snapdragon 835 Mobile Platform performs. After months of teases and previews, including the reveal that it was the first processor built on Samsung’s 10nm process technology and a mostly in-depth look at the architectural changes to the CPU and GPU portions of the SoC, the company let a handful of media get some hands-on time with a development reference platform and run some numbers.
To frame the discussion as best I can, I am going to include some sections from my technology overview. This should give some idea of what to expect from Snapdragon 835 and which areas Qualcomm sees providing the widest variation from the previous SD 820/821 products.
Qualcomm frames the story around the Snapdragon 835 processor with what they call the “five pillars” – five different aspects of mobile processor design that they have addressed with updates and technologies. Qualcomm lists them as battery life (efficiency), immersion (performance), capture, connectivity, and security.
Starting where they start, on battery life and efficiency, the SD 835 has a unique focus that might surprise many. Rather than talking up the performance improvements of the new processor cores, or the power of the new Adreno GPU, Qualcomm is firmly planted on looking at Snapdragon through the lens of battery life. Qualcomm claims Snapdragon 835 uses half the power of Snapdragon 801.
Since we already knew that the Snapdragon 835 was going to be built on the 10nm process from Samsung, the first such high performance part to do so, I was surprised to learn that Qualcomm doesn’t attribute much of the power efficiency improvements to the move from 14nm to 10nm. It makes sense – most in the industry see this transition as modest in comparison to what we’ll see at 7nm. Unlike the move from 28nm to 14/16nm for discrete GPUs, where the process technology was a huge reason for the dramatic power drop we saw, the Snapdragon 835 changes come from a combination of advancements in the power management system and offloading of work from the primary CPU cores to other processors like the GPU and DSP. The more a workload takes advantage of heterogeneous computing systems, the more it benefits from Qualcomm technology as opposed to process technology.
If you look at the current 2-in-1 notebook market, it is clear that the single greatest influence is the Lenovo Yoga. Despite initial efforts to differentiate convertible notebook-tablet designs, newly released machines such as the HP Spectre x360 series and the Dell XPS 13 2-in-1 make it clear that the 360-degree "Yoga-style" hinge is the preferred approach.

Today, we are looking at a unique application of the 360-degree hinge, the Lenovo Yoga Book. Will this new take on the 2-in-1 concept be as influential?
The Lenovo Yoga Book is a 10.1" tablet that aims to find a unique way to implement a stylus on a modern touch device. The device itself is a super-thin clamshell design, featuring an LCD on one side and a large touch-sensitive area on the other.

This large touch area serves two purposes. Primarily, it acts as a surface for the included stylus, which Lenovo calls the Real Pen. Using the Real Pen, users can do things such as sketch in Adobe Photoshop and Illustrator or take notes in an application such as Microsoft OneNote.

The Real Pen has more tricks up its sleeve than a normal stylus. It can be converted from a stylus-tipped pen into a full ballpoint pen. When paired with the "Create Pad" included with the Yoga Book, you can write on top of a piece of actual paper using the ballpoint pen, and the device will still pick up what you are drawing.
There are cars that get you from point A to point B, and then there are luxurious grand touring cars which will get you there with power, comfort, and style - for a price. Based on the cost alone ($269.99 MSRP!) it seems like a safe bet to say that the REALFORCE RGB keyboard will be a similarly premium experience. Let’s take a look!
There is as much personal taste at issue when considering a keyboard (or a dream car!) as almost any other factor, and regardless of build quality or performance, a keyboard probably isn’t going to work out for you if it doesn’t feel right. Mechanical keyboards are obviously quite popular, and more companies than ever offer their own models, many using Cherry MX key switches (or generic ‘equivalents’, which vary in quality). Topre switches are different: they are capacitive switches with a rubber dome and metal spring, and have a very smooth, fast feel to them - not clicky at all.
“Topre capacitive key switches are a patented hybrid between a mechanical spring based switch, a rubber dome switch, and a capacitive sensor which, combined, provide tactility, comfort, and excellent durability. The unique electrostatic design of Topre switches requires no physical mechanical coupling and therefore key switch bounce/chatter is eliminated.”
Here Comes the Midrange!
Today AMD is announcing the upcoming Ryzen 5 CPUs. A little was known about them several weeks ago when AMD talked about its upcoming 6-core processors, but official specifications were lacking. Today we get to see what Ryzen 5 is all about.

There are four initial SKUs that AMD is talking about this evening, encompassing quad-core and six-core products. There are two “enthusiast” level SKUs with the X designation, while the other two are aimed at a less edgy crowd.
The two six-core CPUs are the 1600 and 1600X. The X version features the higher extended frequency range (XFR) when combined with performance cooling. That unit is clocked at a 3.6 GHz base and achieves a 4.0 GHz boost. This compares well to the top-end R7 1800X, but it is short two cores and four threads. The price of the R5 1600X is a very reasonable $249. The 1600 does not feature the extended range, but it comes in at a 3.2 GHz base and 3.6 GHz boost. The R5 1600 has an MSRP of $219.
When we get to the four core, eight thread units we see much the same stratification. The top end 1500X comes in at $189 and features a base clock of 3.5 GHz and a boost of 3.7 GHz. What is interesting about this model is that the XFR is raised by 100 MHz vs. other XFR CPUs. So instead of an extra 100 MHz boost when high end cooling is present we can expect to see 200 MHz. In theory this could run at 3.9 GHz in the extended state. The lowest priced R5 is the 1400 which comes in at a very modest $169. This features a 3.2 GHz base clock and a 3.4 GHz boost.
The 1400, 1500X, and 1600 come with Wraith cooling solutions. The 1600X comes bare, as it is assumed those users will want something a bit more robust. The R5 1400 gets the lower-end Wraith Stealth cooler, while the R5 1500X and R5 1600 come with the bigger Wraith Spire. The bottom three SKUs are all rated at a 65 watt TDP; the 1600X comes in at a higher 95 watt rating. Each of the CPUs is unlocked for overclocking.
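Pulling the announced specs together, with the in-theory XFR ceilings computed (AMD described a 100 MHz XFR bump on the 1600X and an unusual 200 MHz bump on the 1500X; the non-X parts are shown without one here):

```python
# Ryzen 5 launch stack as announced: (base GHz, boost GHz, XFR bump GHz, price USD).
# XFR bumps for the non-X parts were not stated, so they are left at zero.
sku = {
    "R5 1600X": (3.6, 4.0, 0.1, 249),
    "R5 1600":  (3.2, 3.6, 0.0, 219),
    "R5 1500X": (3.5, 3.7, 0.2, 189),
    "R5 1400":  (3.2, 3.4, 0.0, 169),
}
for name, (base, boost, xfr, price) in sku.items():
    ceiling = boost + xfr       # in-theory clock with performance cooling
    print(f"{name}: {base}/{boost} GHz (XFR up to {ceiling:.1f} GHz), ${price}")
```

That doubled XFR headroom is what puts the 1500X at a theoretical 3.9 GHz with high-end cooling.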
These chips flesh out the pricing structure for the Ryzen processors and give users and enthusiasts lower cost options for investing in AMD again. They all run on the new AM4 platform, which is pretty strong in terms of features and I/O performance.
AMD is not shipping these parts today, but rather announcing them. Review samples are not in hand yet and AMD expects world-wide availability by April 11. This is likely a very necessary step for AMD as current AM4 motherboard availability is not at the level we were expecting to see. We also are seeing some pretty quick firmware updates from motherboard partners to address issues with these first AM4 boards. By April 11 I would expect to see most of the issues solved and a healthy supply of motherboards on the shelves to handle the influx of consumers waiting to buy these more midrange priced CPUs from AMD.
What they did not cover or answer was how the four-core products would be configured. Would each be a single CCX with only 8 MB of L3 cache, or would AMD disable two cores in each CCX and present 16 MB of L3? We currently do not have the answer to this. Considering the latency of communicating between CCX units, we can surely hope they keep only one CCX active.
Ryzen has certainly been a success for AMD and I have no doubt that their quarter will be pretty healthy with the estimated sales of around 1 million Ryzen CPUs since launch. Announcing these new chips will give the mainstream and budget enthusiasts something to look forward to and plan their purchases around. AMD is not announcing the Ryzen 3 products at this time.
Update: AMD got back to me this morning about a question I asked them about the makeup of cores, CCX units, and L3 cache. Here is their response.
- 1600X: 3+3 cores with 16MB L3 cache
- 1600: 3+3 cores with 16MB L3 cache
- 1500X: 2+2 cores with 16MB L3 cache
- 1400: 2+2 cores with 8MB L3 cache

As with Ryzen 7, each core still has 512KB of local L2 cache.
Background and setup
A couple of weeks back, during the excitement surrounding the announcement of the GeForce GTX 1080 Ti graphics card, NVIDIA announced an update to its performance measurement project, FCAT, to support VR gaming. The updated iteration, now called FCAT VR, gives us the first true ability to capture the performance of VR games and experiences, along with the tools to measure and compare it.
Watch this video walkthrough of FCAT VR with me and NVIDIA's Tom Petersen
I already wrote an extensive preview of the tool and how it works during the announcement. I think it’s likely that many of you overlooked it with the noise from a new GPU, so I’m going to reproduce some of it here, with additions and updates. Everyone that attempts to understand the data we will be presenting in this story and all VR-based tests going forward should have a baseline understanding of the complexity of measuring VR games. Previous tools don’t tell the whole story, and even the part they do tell is often incomplete.
If you already know how FCAT VR works from reading the previous article, you can jump right to the beginning of our results here.
Measuring and validating those claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, utilized by countless reviewers and enthusiasts, but it lacks the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT, and we introduced Frame Rating, back in 2013 to expand the capabilities available to reviewers and consumers. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software overlay applied to each frame being rendered, and post-process analysis of that data, we could communicate the smoothness of a gaming experience, better articulating it to help gamers make purchasing decisions.
For VR, though, those same tools just don’t cut it. Fraps is a non-starter, as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in the current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we have used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to “latch in” to VR-based games. On top of that, measuring frame drops, time warps, space warps, and reprojections would be a significant hurdle without further development.
NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping the tool will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and made freely available to the media and the public should not be overlooked.
NVIDIA FCAT VR consists of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It generates performance data from Oculus event tracing (part of Windows ETW), SteamVR’s performance API, and NVIDIA driver stats when running on NVIDIA hardware. It does work perfectly well on any GPU vendor’s hardware, though, thanks to its access to the VR vendors’ timing results.
With the introduction of the Intel Kaby Lake processors and Intel Z270 chipset, unprecedented overclocking became the norm. The new processors easily hit a core speed of 5.0GHz with little more than CPU core voltage tweaking. That overclocking headroom came at a price, however: the Kaby Lake processors run significantly hotter than previous generations, a seeming reversal of the temperature trends of recent Intel CPUs. At stock settings, the individual cores in our test CPU hit up to 65C - and that's with a high performance water loop cooling the processor. Per reports from various enthusiast sites, Intel used inferior TIM (thermal interface material) between the CPU die and the underside of the heat spreader, leading to increased temperatures compared with previous CPU generations (in particular Skylake). This temperature increase did not affect overclocking much, since the CPU hits 5.0GHz easily, but it does impact the means necessary to reach those performance levels.
As with the previous generation Haswell CPUs, a few of the more adventurous enthusiasts have used known methods to address the heat concerns of the Kaby Lake processor by delidding it. Unlike in the initial days of the Haswell processor, the delidding process is now much more streamlined thanks to the availability of delidding kits from several vendors. The process still involves physically removing the heat spreader from the CPU and exposing the die. However, instead of cooling the die directly, the "safer" approach is to clean the die and the underside of the heat spreader, apply new TIM (thermal interface material), and re-affix the heat spreader to the CPU. Going this route instead of direct-die cooling is considered safer because no additional or exotic support mechanisms are needed to keep the CPU cooler from crushing your precious die. Calling it safe is still a bit of an overstatement, though: you are physically separating the heat spreader from the CPU surface and voiding your CPU warranty at the same time. Although if that was a concern, you probably wouldn't be reading this article in the first place.
** UPDATE 3/13 5 PM **
AMD has posted a follow-up statement that officially clears up much of the conjecture this article was attempting to clarify. Relevant points from their post that relate to this article as well as many of the requests for additional testing we have seen since its posting (emphasis mine):
"We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."
"Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes."
So there you have it, straight from the horse's mouth. AMD does not believe the problem lies within the Windows thread scheduler. SMT performance in gaming workloads was also addressed:
"Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready."
We are still digging into the observed differences of toggling SMT compared with disabling the second CCX, but it is good to see AMD issue a clarifying statement here for all of those out there observing and reporting on SMT-related performance deltas.
** END UPDATE **
Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posted that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the logical vs. physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout
Initial reviews of AMD’s Ryzen CPU revealed a few inefficiencies in some situations, particularly in gaming workloads running at the more common resolutions like 1080p, where the CPU becomes more of a bottleneck when paired with modern GPUs. Lots of folks have theorized about what could possibly be causing these issues, and the most recent attention has been directed at the Windows 10 scheduler and its supposed inability to properly place threads on the Ryzen cores for the most efficient processing.
I typically have Task Manager open while running storage tests (they are boring to watch otherwise), and I naturally had it open during Ryzen platform storage testing. I’m accustomed to how the IO workers are distributed across reported threads, and in the case of SMT capable CPUs, distributed across cores. There is a clear difference when viewing our custom storage workloads with SMT on vs. off, and it was dead obvious to me that core loading was working as expected while I was testing Ryzen. I went back and pulled the actual thread/core loading data from my testing results to confirm:
The Windows scheduler has a habit of bouncing processes across available processor threads. This happens naturally as other processes share time with a particular core, with the heavier process not necessarily switching back to the same core. As you can see above, the single IO handler thread was spread across the first four cores during its run, but the Windows scheduler was only ever loading one of the two available SMT threads on any single core at a time.
My testing for Ryan’s Ryzen review consisted of only single threaded workloads, but we can make things a bit clearer by loading down half of the CPU while toggling SMT. We start with SMT disabled in the motherboard BIOS, which leaves 8 available threads on the Ryzen processor, and set the worker count to 4 - half of those threads.
SMT OFF, 8 cores, 4 workers
With SMT off, the scheduler is clearly not giving priority to any particular core and the work is spread throughout the physical cores in a fairly even fashion.
Now let’s try with SMT turned back on and doubling the number of IO workers to 8 to keep the CPU half loaded:
SMT ON, 16 (logical) cores, 8 workers
With SMT on, we see a very different result. The scheduler is clearly loading only one thread per core. This could only be possible if Windows was aware of the 2-way SMT (two threads per core) configuration of the Ryzen processor. Do note that sometimes the workload will toggle around every few seconds, but the total loading on each physical core still remains at ~50%. I chose a workload that saturated its thread just enough for Windows not to shift it around as it ran, making the above result even clearer.
Synthetic Testing Procedure
While the storage testing methods above provide a real-world example of the Windows 10 scheduler working as expected, we do have another workload that can help demonstrate core balancing with Intel Core and AMD Ryzen processors. A quick and simple custom-built C++ application can be used to generate generic worker threads and monitor for core collisions and resolutions.
This test app has a very straightforward workflow. Every few seconds it spawns a new thread, capping at N/2 threads total, where N is the reported number of logical cores. If the OS scheduler is working as expected, it should load those 8 threads across the 8 physical cores, though which specific logical core is chosen within each physical core will depend on the minute conditions of whatever else is going on in the OS background.
By monitoring the APIC_ID through the CPUID instruction, the first application thread monitors all threads and detects and reports on collisions - when a thread from our app is running on the same core as another thread from our app. That thread also reports when those collisions have been cleared. In an ideal and expected environment where Windows 10 knows the boundaries of physical and logical cores, you should never see more than one thread of a core loaded at the same time.
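To make the core idea concrete, here is a minimal C++ sketch of how such a collision check can work. This is not the source of our actual test app; the function names (`current_apic_id`, `count_collisions`) and the assumption that logical cores 2n and 2n+1 share physical core n are ours for illustration.

```cpp
#include <cpuid.h>   // GCC/Clang x86 intrinsic header
#include <cstdint>
#include <map>
#include <vector>

// Report the initial APIC ID of the logical core the calling thread is
// currently running on (CPUID leaf 1, EBX bits 31:24).
static inline uint32_t current_apic_id() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
    return ebx >> 24;
}

// Given one sample of the logical-core IDs our worker threads are running
// on, count collisions: extra threads sharing a physical core. Assumes a
// 2-way SMT layout where logical cores 2n and 2n+1 map to physical core n.
int count_collisions(const std::vector<uint32_t>& logical_ids) {
    std::map<uint32_t, int> threads_per_core;
    for (uint32_t id : logical_ids)
        ++threads_per_core[id / 2];          // physical core index
    int collisions = 0;
    for (const auto& entry : threads_per_core)
        if (entry.second > 1)
            collisions += entry.second - 1;  // every extra thread is a collision
    return collisions;
}
```

With a well-behaved scheduler, sampling four busy workers should yield IDs like {0, 2, 4, 6} (zero collisions); a scheduler blind to SMT could produce {0, 1, 2, 3}, which this check reports as two collisions.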
This screenshot shows our app working on the left and the Windows Task Manager on the right with logical cores labeled. While it may look like all logical cores are being utilized at the same time, they are not. At any given point, only LCore 0 or LCore 1 is actively processing a thread. Need proof? Check out the modified view of the Task Manager, where I copied the graph of LCore 1/5/9/13 over the graph of LCore 0/4/8/12 with inverted colors to aid visibility.
If you look closely, by overlapping the graphs in this way, you can see that the threads migrate from LCore 0 to LCore 1, LCore 4 to LCore 5, and so on. The graphs intersect and fill in to consume ~100% of the physical core. This pattern is repeated for the other 8 logical cores on the right two columns as well.
Running the same application on a Core i7-5960X Haswell-E 8-core processor shows a very similar behavior.
Each pair of logical cores shares a single thread, and when thread transitions occur away from LCore N, they migrate perfectly to LCore N+1. The Intel system does appear to show a more stable thread distribution than the Ryzen system in this scenario. While that may in fact confer some performance advantage on the 5960X configuration, the penalty for intra-core thread migration is expected to be very minute.
The fact that Windows 10 is balancing the 8 thread load specifically between matching logical core pairs indicates that the operating system is perfectly aware of the processor topology and is selecting distinct cores first to complete the work.
Information from this custom application, along with the storage performance tool example above, clearly show that Windows 10 is attempting to balance work on Ryzen between cores in the same manner that we have experienced with Intel and its HyperThreaded processors for many years.
Flagship Performance Gets Cheaper
UPDATE! If you missed our launch day live stream, you can find the replay below:
It’s a very interesting time in the world of PC gaming hardware. We just saw the release of AMD’s Ryzen processor platform, which shook up the processor market for the first time in a decade; AMD’s Vega architecture has officially been given the brand name “Vega”; and anticipation for the first high-end competitive part from AMD since Hawaii continues to grow. AMD was seemingly able to take advantage of Intel’s slow pace of innovation on the processor side, and it is hoping to do the same to NVIDIA on the GPU side. NVIDIA’s product line has been dominant in the mid and high-end gaming market since the 900-series, with the 10-series products further cementing the lead.
The most recent high end graphics card release came in the form of the updated Titan X based on the Pascal architecture. That was WAY back in August of 2016 – a full seven months ago! Since then we have seen very little change at the top end of the product lines, and what little change we did see came from board vendors adding technology and variations to the GTX 10-series.
Today we see the release of the new GeForce GTX 1080 Ti, a card that offers only a handful of noteworthy technological changes but still manages to shake up the market: it instigates pricing adjustments that make its performance more appealing while lowering the price of everything else.
The GTX 1080 Ti GP102 GPU
I already wrote about the specifications of the GPU in the GTX 1080 Ti when it was announced last week, so here’s a simple recap.
|GTX 1080 Ti||Titan X (Pascal)||GTX 1080||GTX 980 Ti||TITAN X||GTX 980||R9 Fury X||R9 Fury||R9 Nano|
|GPU||GP102||GP102||GP104||GM200||GM200||GM204||Fiji XT||Fiji Pro||Fiji XT|
|Base Clock||1480 MHz||1417 MHz||1607 MHz||1000 MHz||1000 MHz||1126 MHz||1050 MHz||1000 MHz||up to 1000 MHz|
|Boost Clock||1582 MHz||1480 MHz||1733 MHz||1076 MHz||1089 MHz||1216 MHz||-||-||-|
|Memory Clock||11000 MHz||10000 MHz||10000 MHz||7000 MHz||7000 MHz||7000 MHz||500 MHz||500 MHz||500 MHz|
|Memory Interface||352-bit||384-bit G5X||256-bit G5X||384-bit||384-bit||256-bit||4096-bit (HBM)||4096-bit (HBM)||4096-bit (HBM)|
|Memory Bandwidth||484 GB/s||480 GB/s||320 GB/s||336 GB/s||336 GB/s||224 GB/s||512 GB/s||512 GB/s||512 GB/s|
|TDP||250 watts||250 watts||180 watts||250 watts||250 watts||165 watts||275 watts||275 watts||175 watts|
|Peak Compute||10.6 TFLOPS||10.1 TFLOPS||8.2 TFLOPS||5.63 TFLOPS||6.14 TFLOPS||4.61 TFLOPS||8.60 TFLOPS||7.20 TFLOPS||8.19 TFLOPS|
The GTX 1080 Ti looks a whole lot like the TITAN X launched in August of last year. Based on the 12B transistor GP102 chip, the new GTX 1080 Ti has 3,584 CUDA cores with a 1.60 GHz Boost clock. That gives it the same processor count as the Titan X but with a slightly higher clock speed, which should make the new GTX 1080 Ti faster by at least a few percentage points; it holds a 4.7% edge in base clock compute capability. It has 28 SMs, 28 geometry units, and 224 texture units.
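The peak compute figures in the table above appear to fall straight out of a simple formula at base clocks: CUDA cores × 2 FLOPs per clock (one fused multiply-add) × clock speed. A quick sketch of that math, with a function name of our own choosing:

```cpp
#include <cassert>

// Peak single-precision throughput in TFLOPS:
// cores * 2 FLOPs per clock (fused multiply-add) * clock in GHz / 1000.
constexpr double peak_tflops(int cuda_cores, double clock_ghz) {
    return cuda_cores * 2.0 * clock_ghz / 1000.0;
}

// GTX 1080 Ti at its 1480 MHz base clock: 3584 * 2 * 1.480 ~= 10.6 TFLOPS
// GTX 1080 at its 1607 MHz base clock:    2560 * 2 * 1.607 ~= 8.2 TFLOPS
```

Both results line up with the spec table, which suggests NVIDIA's quoted numbers are computed at base rather than boost clocks.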
Interestingly, the memory system on the GTX 1080 Ti gets adjusted – NVIDIA has disabled a single 32-bit memory controller, giving the card a 352-bit wide bus and an odd-sounding 11GB memory capacity. The ROP count also drops to 88 units. Speaking of 11, the G5X memory on the GTX 1080 Ti will now run at 11 Gbps, a boost available to NVIDIA thanks to a chip revision from Micron and improvements to equalization that cut down on signal distortion.
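The 484 GB/s bandwidth figure in the table is the straightforward product of bus width and per-pin data rate. A minimal check (the function name is ours):

```cpp
#include <cassert>

// Memory bandwidth in GB/s = bus width in bits / 8 (bits -> bytes)
// * per-pin data rate in Gbps.
constexpr double mem_bandwidth_gbs(int bus_width_bits, double gbps_per_pin) {
    return bus_width_bits / 8.0 * gbps_per_pin;
}

// GTX 1080 Ti: 352-bit at 11 Gbps -> 484 GB/s
// Titan X (Pascal): 384-bit at 10 Gbps -> 480 GB/s
```

So even with a narrower bus than the Titan X, the faster 11 Gbps G5X leaves the 1080 Ti with slightly more bandwidth.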
The move from 12GB of memory on the GP102-based Titan X to 11GB on the GTX 1080 Ti is an interesting one, and it evokes memories of the GTX 970 fiasco, where NVIDIA disabled a portion of that card's memory controller but left the memory that would have resided on it on the board. Shipping what behaved as 3.5GB of memory at one speed and 500MB at another was the wrong move to make, but releasing the GTX 970 with "3.5GB" of memory would have seemed odd too. NVIDIA is not making the same mistake, instead building the GTX 1080 Ti with 11GB out of the gate.
The right angle
While many in the media and enthusiast communities are still trying to fully grasp the importance and impact of the recent AMD Ryzen 7 processor release, I have been trying to complete my review of the 1700X and 1700 processors, in between testing the upcoming GeForce GTX 1080 Ti and preparing for more hardware to show up at the offices very soon. There is still much to learn and understand about the first new architecture from AMD in nearly a decade, including analysis of the memory hierarchy, power consumption, overclocking, gaming performance, etc.
During my Ryzen 7 1700 testing, I went through some overclocking evaluation and thought the results might be worth sharing sooner rather than later. This quick article is just a preview of what we are working on, so don’t expect to find the answers to Ryzen power management here - only a recounting of how I was able to get stellar performance from the lowest priced Ryzen part on the market today.
The system specifications for this overclocking test were identical to our original Ryzen 7 processor review.
|Test System Setup|
|CPU||AMD Ryzen 7 1800X
AMD Ryzen 7 1700X
AMD Ryzen 7 1700
Intel Core i7-7700K
Intel Core i5-7600K
Intel Core i7-6700K
Intel Core i7-6950X
Intel Core i7-6900K
Intel Core i7-6800K
|Motherboard||ASUS Crosshair VI Hero (Ryzen)
ASUS Prime Z270-A (Kaby Lake, Skylake)
ASUS X99-Deluxe II (Broadwell-E)
|Storage||Corsair Force GS 240 SSD|
|Graphics Card||NVIDIA GeForce GTX 1080 8GB|
|Graphics Drivers||NVIDIA 378.49|
|Power Supply||Corsair HX1000|
|Operating System||Windows 10 Pro x64|
Of note is that I am still utilizing the Noctua U12S cooler that AMD provided for our initial testing – all of the overclocking and temperature reporting in this story is air cooled.
Let’s start with the motherboard. All of this testing was done on the ASUS Crosshair VI Hero with the latest 5704 BIOS installed. As I explored the different overclocking capabilities (BCLK adjustment, multipliers, voltage), I came across one of the ASUS presets. These presets offer pre-defined collections of settings that ASUS feels will deliver simple overclocking. An option for a higher BCLK existed, but the one that caught my eye was straightforward – 4.0 GHz.
With the Ryzen 1700 installed, I thought I would give it a shot. Keep in mind that this processor has a base clock of 3.0 GHz, a rated maximum boost clock of 3.7 GHz, and is the only 65-watt TDP variant of the three Ryzen 7 processors released last week. Because of that, I didn’t expect its overclocking capability to match what the 1700X and 1800X could offer. Based on previous processor experience, when a chip is binned at a lower power draw than the rest of its family, it will often have properties that make it disadvantageous for running at HIGHER power. Based on my results here, that doesn’t seem to be the case.
By simply enabling that option in the ASUS UEFI and rebooting, our Ryzen 1700 processor was running at 4.0 GHz on all cores! For this piece, I won’t be going into the drudge and debate on what settings ASUS changed to get to this setting or if the voltages are overly aggressive – the point is that it just works out of the box.
Introduction and Specifications
The G533 Wireless headset is the latest offering from Logitech, combining the company’s premium Pro-G drivers, 15-hour battery life, and a new, more functional style. Obvious comparisons can be made to last year’s G933 Artemis Spectrum, since both are wireless headsets using Logitech’s Pro-G drivers; but this new model comes in at a lower price while offering much of the same functionality (while dropping the lighting effects). So does the new headset sound any different? What about the construction? Read on to find out!
The G533 exists alongside the G933 Artemis Spectrum in Logitech’s current lineup; it takes most of the features from that high-end wireless model and pares them down to create a lean, mean option for gamers who don’t need (or want) RGB lighting effects. The 40 mm Pro-G drivers are still here, and the new G533 offers longer battery life (15 hours) than the G933 could manage even with its lighting effects disabled (12 hours). 7.1-channel surround effects and full EQ and soundfield customization remain, though only DTS processing is present (no Dolby this time).
What do these changes translate to? First of all, the G533 headset is being introduced at a $149 MSRP, which is $50 lower than the G933 Artemis Spectrum at $199. I think many of our readers would trade RGB effects for a lower cost, making this a welcome change (especially considering lighting effects don’t really mean much when you are wearing the headphones). Another difference is the overall weight of the headset at 12.5 oz, 0.5 oz lighter than the G933 at 13 oz.
Introduction and Features
Riotoro is a new player in the already crowded PC power supply market. Formed in 2014 and based in California, Riotoro originally started their PC hardware business with a focus on cases, mice, and LED fans targeted towards the gaming community. Now they are expanding their product offerings to include two new power supply lines, the Enigma and Onyx Series, along with two liquid CPU coolers and several RGB gaming keyboards. We will be taking a detailed look at Riotoro’s new Enigma 850W power supply in this review.
Riotoro announced the introduction of three power supplies at Computex 2016: the Enigma 850W, Onyx 750W, and Onyx 650W. All three were developed in partnership with Great Wall and are based on new platforms designed to hit the sweet spot of practical real-world performance, reliability, and price. The Onyx line will initially be available in 650W and 750W models, while the more upscale Enigma line kicks off with the 850W model.
The Riotoro Enigma 850W power supply is certified to comply with the 80 Plus Gold criteria for high efficiency, comes with semi-modular cables, and uses a quiet 140mm variable speed fan for cooling.
Riotoro Enigma 850W PSU Key Features:
• 850W Continuous DC output at up to 40°C
• 80 PLUS Gold certified for high efficiency
• Semi-modular cables
• Quiet 140mm cooling fan
• Japanese-made bulk (electrolytic) capacitors
• Compatible with Intel and AMD processors and motherboards
• Active Power Factor correction with Universal AC input (100 to 240 VAC)
• Safety protections: OVP, UVP, OCP, OPP, and SCP
• 5-Year warranty
• MSRP: $119.99 USD
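To put the 80 Plus Gold rating in practical terms: at 115 V, Gold requires roughly 87/90/87% efficiency at 20/50/100% load, and wall draw is simply DC output divided by efficiency. A quick sketch (the function name and example loads are ours):

```cpp
#include <cassert>

// AC wall draw in watts for a given DC load at a given efficiency.
constexpr double wall_draw_watts(double dc_load_w, double efficiency) {
    return dc_load_w / efficiency;
}

// Enigma 850W at half load (425 W DC) and Gold's 90% midpoint:
// roughly 472 W pulled from the wall, with the other ~47 W lost as heat.
```

At full load the Gold floor drops to 87%, so the same math puts worst-case wall draw for the full 850 W near 977 W.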