Background and setup
A couple of weeks back, during the excitement surrounding the announcement of the GeForce GTX 1080 Ti graphics card, NVIDIA announced an update to its performance reporting project known as FCAT to support VR gaming. The updated iteration, now called FCAT VR, gives us for the first time not only the ability to capture the performance of VR games and experiences, but also the tools with which to measure and compare them.
Watch this video walkthrough of FCAT VR with me and NVIDIA's Tom Petersen
I already wrote an extensive preview of the tool and how it works during the announcement. I think it’s likely that many of you overlooked it amid the noise of a new GPU launch, so I’m going to reproduce some of it here, with additions and updates. Anyone who attempts to understand the data we present in this story, and in all VR-based tests going forward, should have a baseline understanding of the complexity of measuring VR games. Previous tools don’t tell the whole story, and even the part they do tell is often incomplete.
If you already know how FCAT VR works from reading the previous article, you can jump right to the beginning of our results here.
Measuring and validating VR performance claims has proven to be a difficult task. Tools that we used in the era of standard PC gaming just don’t apply. Fraps is a well-known and well-understood tool for measuring frame rates and frame times, used by countless reviewers and enthusiasts, but it lacked the ability to tell the complete story of gaming performance and experience. NVIDIA introduced FCAT and we introduced Frame Rating back in 2013 to expand the capabilities available to reviewers and consumers. Using a more sophisticated technique that includes direct capture of the graphics card output in uncompressed form, a software-based overlay applied to each frame being rendered, and post-process analysis of that data, we could communicate the smoothness of a gaming experience and better articulate it to help gamers make purchasing decisions.
For VR, though, those same tools just don’t cut it. Fraps is a non-starter, as it measures frame rendering from the GPU point of view and completely misses the interaction between the graphics system and the VR runtime environment (OpenVR for Steam/Vive and OVR for Oculus). Because the rendering pipeline is drastically changed in current VR integrations, what Fraps measures is completely different from the experience the user actually gets in the headset. Previous FCAT and Frame Rating methods were still viable, but the tools and capture technology needed to be updated. The hardware capture products we had used since 2013 were limited in their maximum bandwidth, and the overlay software did not have the ability to “latch in” to VR-based games. On top of that, measuring frame drops, time warps, space warps and reprojections would be a significant hurdle without further development.
NVIDIA decided to undertake the task of rebuilding FCAT to work with VR. And while the company is obviously hoping that the tool will prove its claims of performance benefits for VR gaming, the investment of time and money in a project that is to be open sourced and freely available to the media and the public should not be overlooked.
NVIDIA FCAT VR consists of two different applications. The FCAT VR Capture tool runs on the PC being evaluated and has a similar appearance to other performance and timing capture utilities. It uses data from Oculus Event Tracing (part of Windows ETW) and SteamVR’s performance API, along with NVIDIA driver stats when used on NVIDIA hardware, to generate performance data. It does, however, work perfectly well on any GPU vendor’s hardware, thanks to its access to the VR vendor-specific timing results.
Subject: General Tech | March 14, 2017 - 05:53 PM | Jeremy Hellstrom
Tagged: Truly Ergonomic, kailh brown, mechanical keyboard
Ergonomic keyboards come in a wide variety of shapes and sizes and tend to be either loathed or loved. Truly Ergonomic released the tenkeyless board you can see below, which is a bit of a change from the usual ergonomic design that separates the keys at a much larger angle, and which incorporates Kailh Brown switches. At 234 x 327.6 x 38.1mm (9.2 x 12.9 x 1.5") it is of similar size to most TKL boards and much smaller than other ergonomic designs, especially if you choose to remove the wrist rest. TechPowerUp tried it out and ran into some strange issues: trouble with USB 3.x connectivity, and a keyboard that could be bricked, requiring disassembly to return it to working condition. On the other hand, their wrists were happy with the layout; read the full review here.
"The Truly Ergonomic Keyboard in its current revisions 227 and 229 aims to get past the issues that plagued the predecessors to re-establish a loyal customer base. It features all new switches, updated firmware, support for niche keyboard layouts, full programmability and more in a form factor smaller than most keyboards."
Here is some more Tech News from around the web:
- AZIO MK Retro Mechanical Keyboard @ Benchmark Reviews
- Corsair K70 LUX RGB Keyboard @ Benchmark Reviews
- Fnatic Gear Clutch G1 Mouse @ Kitguru
Subject: Processors | March 14, 2017 - 03:17 PM | Jeremy Hellstrom
Tagged: nvidia, JetsonTX1, Denver, Cortex A57, pascal, SoC
Amid the furor of the Ryzen launch, NVIDIA's new Jetson TX2 SoC was quietly sent out to reviewers, and today the NDA expired so we can see how it performs. There are more Ryzen reviews below the fold, including Phoronix's Linux testing if you want to skip ahead. In addition to the specifications in the quote, you will find 8GB of 128-bit LPDDR4 offering memory bandwidth of 58.4 GB/s and 32GB of eMMC for local storage. This Jetson runs L4T from JetPack 3.0, based on the Linux 4.4.15 kernel. Phoronix tested its performance; see for yourself.
"Last week we got to tell you all about the new NVIDIA Jetson TX2 with its custom-designed 64-bit Denver 2 CPUs, four Cortex-A57 cores, and Pascal graphics with 256 CUDA cores. Today the Jetson TX2 is shipping and the embargo has expired for sharing performance metrics on the JTX2."
Here are some more Processor articles from around the web:
- Hands-On Nvidia Jetson TX2: Fast Processing for Embedded Devices @ Hack a Day
- AMD Ryzen 7 1700X Review; Testing SMT @ Hardware Canucks
- AMD Ryzen 7 1700 Linux Benchmarks: Great Multi-Core Performance For $329 @ Phoronix
Subject: General Tech | March 14, 2017 - 11:54 AM | Jeremy Hellstrom
Tagged: nm, storage
As we are not going to see scanning tunnelling microscopes in our home computers anytime soon, this experiment is simply a proof of concept that data can be stored on a single atom. That does not make it any less interesting for those fascinated by atomic storage techniques. A single atom of holmium can be made to spin either up or down, signifying a 0 or a 1, and that spin state can be 'read' by measuring the vibration of a single iron atom located close by. The holmium atoms used for storage can be separated by a mere nanometer without interfering with the spin of their neighbours. The spin state only lasts a few hours, but the experiment shows that this could someday be a viable storage technology. You can read more at nanotechweb, who also have links to the Nature article.
"Information has been stored in a single atom for the first time. The nascent binary memory was created by Andreas Heinrich at the Institute of Basic Science in South Korea and an international team."
Here is some more Tech News from around the web:
- Synology's RT-2600ac wireless router @ The Tech Report
- Nintendo Switch Ships With Unpatched 6-Month-Old WebKit Vulnerabilities @ Slashdot
- Samsung commits to monthly security updates for unlocked US smartphones @ Ars Technica
- Windows Vista is four weeks from hasta la vista @ The Inquirer
- Three hack: 76,000 more customers hit by November breach @ The Inquirer
Subject: Processors | March 13, 2017 - 08:48 PM | Sebastian Peak
Tagged: Windows 7, windows 10, thread scheduling, SMT, ryzen, Robert Hallock, processor, cpu, amd
AMD's Robert Hallock (previously the Head of Global Technical Marketing for AMD and now working full time on the CPU side of things) has posted a comprehensive Ryzen update, covering AMD's official stance on Windows 10 thread scheduling, the performance implications of SMT, Windows power management settings, and more. The post in its entirety is reproduced below, and also available from AMD by following this link.
It’s been about two weeks since we launched the new AMD Ryzen™ processor, and I’m just thrilled to see all the excitement and chatter surrounding our new chip. Seems like not a day goes by when I’m not being tweeted by someone doing a new build, often for the first time in many years. Reports from media and users have also been good:
- “This CPU gives you something that we needed for a long time, which is a CPU that gives you a well-rounded experience.” –JayzTwoCents
- Competitive performance at 1080p, with Tech Spot saying the “affordable Ryzen 7 1700” is an “awesome option” and a “safer bet long term.”
- ExtremeTech showed strong performance for high-end GPUs like the GeForce GTX 1080 Ti, especially for gamers that understand how much value AMD Ryzen™ brings to the table
- Many users are noting that the 8-core design of AMD Ryzen™ 7 processors enables “noticeably SMOOTHER” performance compared to their old platforms.
While these findings have been great to read, we are just getting started! The AMD Ryzen™ processor and AM4 Platform both have room to grow, and we wanted to take a few minutes to address some of the questions and comments being discussed across the web.
We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture.
As an extension of this investigation, we have also reviewed topology logs generated by the Sysinternals Coreinfo utility. We have determined that an outdated version of the application was responsible for originating the incorrect topology data that has been widely reported in the media. Coreinfo v3.31 (or later) will produce the correct results.
Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.
Going forward, our analysis highlights that there are many applications that already make good use of the cores and threads in Ryzen, and there are other applications that can better utilize the topology and capabilities of our new CPU with some targeted optimizations. These opportunities are already being actively worked via the AMD Ryzen™ dev kit program that has sampled 300+ systems worldwide.
Above all, we would like to thank the community for their efforts to understand the Ryzen processor and reporting their findings. The software/hardware relationship is a complex one, with additional layers of nuance when preexisting software is exposed to an all-new architecture. We are already finding many small changes that can improve the Ryzen performance in certain applications, and we are optimistic that these will result in beneficial optimizations for current and future applications.
The primary temperature reporting sensor of the AMD Ryzen™ processor is a sensor called “T Control,” or tCTL for short. The tCTL sensor is derived from the junction (Tj) temperature—the interface point between the die and heatspreader—but it may be offset on certain CPU models so that all models on the AM4 Platform have the same maximum tCTL value. This approach ensures that all AMD Ryzen™ processors have a consistent fan policy.
Specifically, the AMD Ryzen™ 7 1700X and 1800X carry a +20°C offset between the tCTL° (reported) temperature and the actual Tj° temperature. In the short term, users of the AMD Ryzen™ 1700X and 1800X can simply subtract 20°C to determine the true junction temperature of their processor. No arithmetic is required for the Ryzen 7 1700. Long term, we expect temperature monitoring software to better understand our tCTL offsets to report the junction temperature automatically.
The table below serves as an example of how the tCTL sensor can be interpreted in a hypothetical scenario where a Ryzen processor is operating at 38°C.
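AMD's table is not reproduced here, but the interpretation it illustrates reduces to a fixed per-model offset. As a minimal sketch, the snippet below applies only the offsets stated in the post (+20°C for the 1700X and 1800X, none for the 1700) to the hypothetical 38°C junction temperature; the dictionary and function names are illustrative, and other Ryzen models are deliberately left out because the post does not give their offsets.

```python
# Offsets stated in AMD's post: tCTL reads 20 °C above Tj on the 1700X and 1800X
# and equals Tj on the 1700. Other models are omitted because the post does not list them.
TCTL_OFFSET_C = {
    "Ryzen 7 1700": 0,
    "Ryzen 7 1700X": 20,
    "Ryzen 7 1800X": 20,
}


def reported_tctl(model: str, junction_c: float) -> float:
    """tCTL value monitoring software would show for a given junction temperature."""
    return junction_c + TCTL_OFFSET_C[model]


def junction_from_tctl(model: str, tctl_c: float) -> float:
    """True junction temperature (Tj) recovered from a reported tCTL reading."""
    return tctl_c - TCTL_OFFSET_C[model]


if __name__ == "__main__":
    # AMD's hypothetical scenario: each processor actually running at 38 °C.
    for model in TCTL_OFFSET_C:
        tctl = reported_tctl(model, 38)
        print(f"{model}: tCTL {tctl} °C, Tj {junction_from_tctl(model, tctl)} °C")
```

For the 38°C scenario this gives a tCTL reading of 58°C on the 1700X and 1800X and 38°C on the 1700, with all three mapping back to the same 38°C junction temperature.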
Users may have heard that AMD recommends the High Performance power plan within Windows® 10 for the best performance on Ryzen, and indeed we do. We recommend this plan for two key reasons:
- Core Parking OFF: Idle CPU cores are instantaneously available for thread scheduling. In contrast, the Balanced plan aggressively places idle CPU cores into low power states. This can cause additional latency when un-parking cores to accommodate varying loads.
- Fast frequency change: The AMD Ryzen™ processor can alter its voltage and frequency states in the 1ms intervals natively supported by the “Zen” architecture. In contrast, the Balanced plan may take longer for voltage and frequency (V/f) changes due to software participation in power state changes.
In the near term, we recommend that games and other high-performance applications are complemented by the High Performance plan. By the first week of April, AMD intends to provide an update for AMD Ryzen™ processors that optimizes the power policy parameters of the Balanced plan to favor performance more consistent with the typical usage models of a desktop PC.
Simultaneous Multi-threading (SMT)
Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready.
Overall, we are thrilled with the outpouring of support we’ve seen from AMD fans new and old. We love seeing your new builds, your benchmarks, your excitement, and your deep dives into the nuts and bolts of Ryzen. You are helping us make Ryzen™ even better by the day. You should expect to hear from us regularly through this blog to answer new questions and give you updates on new improvements in the Ryzen ecosystem.
Topics such as Windows 7 vs. Windows 10 performance, SMT impact, and thread scheduling will no doubt continue to be debated, and AMD has correctly pointed out that optimization for this brand new architecture will only improve Ryzen performance going forward. Our own findings regarding Ryzen and the Windows 10 thread scheduler appear to be validated, as AMD officially rules out a scheduler-related performance impact, though our initial gaming performance findings still point to room for improvement in other areas. As mentioned in the post, AMD will have an update for Windows power plan optimization by the first week of April, and the company has "already identified some simple changes that can improve a game’s understanding of the 'Zen' core/cache topology, and we intend to provide a status update to the community when they are ready", as well.
It is refreshing to see a company publicly acknowledge the topics that have generated so much discussion over the past couple of weeks, and its transparency is commendable, with every issue (that this author is aware of) touched on in the post.
Subject: General Tech | March 13, 2017 - 08:02 PM | Scott Michaud
Tagged: webassembly, ue4, mozilla, epic games
HTML5 has been a compile target for Unreal Engine since Unreal Engine 3, but it was supposed to be a bigger push for Unreal Engine 4 than it has been. At the time, Mozilla was pushing for web browsers to be the main source of games. Thanks to Flash, users are already accustomed to that use case; it’s just a matter of getting performance and functionality close enough to competing platforms, and supporting content that will show it off.
That brings us to Zen Garden. This demo was originally designed to show off the Metal API for iOS, but Epic has re-purposed it for the recently released web browser features, WebAssembly and WebGL 2.0. Personally, I find it slightly less impressive than the Firefox demo of Unreal Tournament 3 that I played at Mozilla Summit 2013, but it’s a promising example that big-name engines are taking Web standards seriously again. You don’t get much bigger than Unreal Engine 4.
So yeah... if you have Firefox 52, then play around with it. It’s free.
Subject: Cases and Cooling | March 13, 2017 - 03:47 PM | Jeremy Hellstrom
Tagged: DUOFan, modular psu, enermax, Revolution Duo, 700W
Enermax is bringing back a classic of PSU design: dual fans in a push-pull configuration, which was the standard not even a decade ago. Their marketing team chose Revolution to describe the new PSU, perhaps missing a chance to call it Heritage, Pedigree or one of the other designations you see on a lot of lifestyle products lately. The two fans, one 100mm and one 80mm, are housed in a casing just shy of 6", which may be attractive to those with limited room inside their cases. A dial on the back of the PSU lets you manually adjust the speed of the fans, allowing for quiet operation if you so choose. Even at the highest setting this PSU is much quieter than the PSUs of old cooled by Delta screamers; modern fans are a definite plus in this design. Check out how this 700W PSU stacks up to the competition in [H]ard|OCP's full review.
"Two is always better than one, right? Two fans in your new PSU is a win-win? That is the theory behind the Enermax Revolution Duo series of Power Supply Units. Enermax has a long history of producing PSUs ranging from good to excellent. Where will the new Duo fall in line today?"
Here are some more Cases & Cooling reviews from around the web:
- Kolink Continuum KL-C1050PL 1050W Platinum @ Kitguru
- Super Flower Leadex II 1000W 80 Plus Gold @ Kitguru
- BitFenix Whisper M Series 850W @ Kitguru
- Cougar LX Series 600 W @ Kitguru
- FSP Twins 500 W Redundant PSU @ techPowerUp
- FSP Twins 500W Redundant Power Supply @ Kitguru
Subject: General Tech | March 13, 2017 - 02:35 PM | Jeremy Hellstrom
Tagged: Intel, mobileye, self driving car, billions
BMW's self-driving car division asked Intel and Mobileye to partner on the design of the iNext, a spin-off of BMW's electric car division. Mobileye specializes in sensors and software for autonomous or assisted driving; Tesla used its products in the Model S. Its success has not gone unnoticed, and today it is Intel's latest acquisition in the IoT market, purchased for a total of roughly US$15.3 billion. Expect to see more Intel Inside stickers on cars, as Intel has recently purchased another IoT firm specializing in chip security, as well as one focused on computer vision. Pop by The Inquirer for links to those other purchases.
"On Monday, Intel announced that it has purchased the company for £12.5bn, marking the biggest-ever acquisition of an Israeli tech company. It's also the biggest purchase of a company solely focused on the autonomous driving sector."
Here is some more Tech News from around the web:
- 6 of the most useful Google things no one uses @ The Inquirer
- Malware infecting Androids somewhere in the supply chain @ The Register
- IBM pushes blockchain system for e-transaction @ DigiTimes
- Q Has Nothing on Naomi Wu @ Hack a Day
- User lubed PC with butter, because pressing a button didn't work @ The Register
- Tim Berners-Lee says privacy needs fixing – and calls for 'algorithmic transparency' @ The Register
- Arozzi Arena Gaming Desk Review @ NikKTech
With the introduction of the Intel Kaby Lake processors and the Intel Z270 chipset, unprecedented overclocking became the norm. The new processors easily hit a core speed of 5.0GHz with little more than CPU core voltage tweaking. This overclocking performance increase came with a price tag: the Kaby Lake processor runs significantly hotter than previous-generation processors, a seeming reversal of the temperature trend of earlier Intel CPUs. At stock settings, individual CPU cores were recorded hitting up to 65C in testing, and that's with a high-performance water loop cooling the processor. Per reports from various enthusiast sites, Intel used inferior TIM (thermal interface material) between the CPU die and the underside of the CPU heat spreader, leading to increased temperatures compared with previous CPU generations (in particular Skylake). This temperature increase did not affect overclocking much, since the CPU will hit 5.0GHz easily, but it does affect the means necessary to reach those performance levels.
As with the previous-generation Haswell CPUs, a few of the more adventurous enthusiasts used known methods to address the heat concerns of the Kaby Lake processor by delidding it. Unlike in the early days of the Haswell processor, the delidding process is much more streamlined thanks to delidding kits from several vendors. The process still involves physically removing the heat spreader from the CPU and exposing the CPU die. However, instead of cooling the die directly, the "safer" approach is to clean the die and the underside of the heat spreader, apply new TIM, and re-affix the heat spreader to the CPU. This route is considered safer than direct-die cooling because no additional or exotic support mechanisms are needed to keep the CPU cooler from crushing your precious die. Calling it safe is still a bit of an overstatement, though: you are physically separating the heat spreader from the CPU and voiding your CPU warranty at the same time. Then again, if that were a concern, you probably wouldn't be reading this article in the first place.
Subject: General Tech, Processors | March 12, 2017 - 05:11 PM | Tim Verry
Tagged: pascal, nvidia, machine learning, iot, Denver, Cortex A57, ai
Measuring 50mm x 87mm, the Jetson TX2 packs quite a bit of processing power and I/O, including an SoC with two 64-bit Denver 2 cores with 2MB L2, four ARM Cortex A57 cores with 2MB L2, and a 256-core GPU based on NVIDIA’s Pascal architecture. The TX2 compute module also hosts 8 GB of LPDDR4 (58.3 GB/s) and 32 GB of eMMC storage (SDIO and SATA are also supported). As for I/O, the Jetson TX2 uses a 400-pin connector to connect the compute module to the development board or final product, and the I/O ultimately available to users will depend on the product it is used in. The compute module supports up to the following, though:
- 2 x DSI
- 2 x DP 1.2 / HDMI 2.0 / eDP 1.4
- USB 3.0
- USB 2.0
- 12 x CSI lanes for up to 6 cameras (2.5 Gb/s per lane)
- PCI-E 2.0: one x4 + one x1, or two x1 + one x2
- Gigabit Ethernet
The Jetson TX2 runs the “Linux for Tegra” operating system. According to NVIDIA, the Jetson TX2 can deliver up to twice the performance of the TX1, or up to twice the efficiency (at 7.5 watts) at the same performance.
The extra horsepower afforded by the faster CPU, updated GPU, and increased memory and memory bandwidth will reportedly enable smart end-user devices with faster facial recognition, more accurate speech recognition, and smarter AI and machine learning tasks (e.g. personal assistants, smart street cameras, smarter home automation, et al). Bringing more power locally to these types of internet of things devices is a good thing, as less reliance on the cloud potentially means more privacy (unfortunately, there is not as much incentive for companies to make this type of product for the mass market, but you could use the TX2 to build your own).
Cisco will reportedly use the Jetson TX2 to add facial and speech recognition to its Cisco Spark devices. In addition to the hardware, NVIDIA offers SDKs and tools as part of JetPack 3.0. The JetPack 3.0 toolkit includes TensorRT, cuDNN 5.1, VisionWorks 1.6, CUDA 8, and support and drivers for OpenGL 4.5, OpenGL ES 3.2, EGL 1.4, and Vulkan 1.0.
The TX2 will enable better, stronger, and faster (well, I don't know about stronger, heh) industrial control systems, robotics, home automation, embedded computers and kiosks, smart signage, security systems, and other connected IoT devices (which, for the love of all processing, should be hardened and secured so they aren't used as part of a botnet!).
Interested developers and makers can pre-order the Jetson TX2 Development Kit for $599 with a ship date for US and Europe of March 14 and other regions “in the coming weeks.” If you just want the compute module sans development board, it will be available later this quarter for $399 (in quantities of 1,000 or more). The previous generation Jetson TX1 Development Kit has also received a slight price cut to $499.
Subject: General Tech, Processors | March 11, 2017 - 10:02 PM | Tim Verry
Tagged: softbank, investments, business, arm
Japanese telecom powerhouse SoftBank, which recently purchased ARM Holdings for $32 billion USD, is reportedly in talks to sell a 25% stake in its new subsidiary to a new investment fund. Specifically, the New York Times cites a source inside SoftBank familiar with the matter who revealed that SoftBank is in talks with the Vision Fund, which would purchase a stake in ARM Holdings worth approximately $8 billion USD.
The $100 billion Vision Fund is an investment fund started by SoftBank founder Masayoshi Son with the goal of investing in high-growth technology start-ups and major technology IP holders. The fund currently comprises investments from SoftBank worth $25 billion, $45 billion from Saudi Arabia (via the Saudi Arabia Public Investment Fund), and minor investments from Apple and Oracle co-founder Lawrence Ellison. The fund is approximately 75% of the way to its $100 billion funding goal, with the state-owned Mubadala Development investment company in Abu Dhabi and the Qatari government allegedly interested in joining. The Vision Fund is based in the UK and led by SoftBank's Head of Strategic Finance Rajeev Misra (investment bankers Nizar al-Bassam and Dalin Ariburnu, formerly of Deutsche Bank and Goldman Sachs respectively, are also involved).
It is interesting that SoftBank plans to sell off such a large stake in ARM Holdings so soon after purchasing the company (the sale was finalized only six months ago), but it may be a move to entice investors into the fund, which SoftBank is a part of, and to further diversify its assets. The more interesting question is the political and regulatory reaction to this news, and what it will mean for ARM and its IP to have even more countries controlling it and its direction(s). I do not have the geopolitical acumen to speculate on whether this is a good or bad thing (heh). It does continue the trend of countries outside of the US increasing their investments in established technology companies with lots of IP (whether US-based or not) as well as in new start-ups. New money entering this sector is likely good overall, though, at least for the companies involved, heh.
I guess we will just have to wait and see if the sale completes and where ARM goes from there! What are your thoughts on the SoftBank sale of a quarter stake in ARM?
** UPDATE 3/13 5 PM **
AMD has posted a follow-up statement that officially clears up much of the conjecture this article was attempting to clarify. Relevant points from their post that relate to this article as well as many of the requests for additional testing we have seen since its posting (emphasis mine):
"We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."
"Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes."
So there you have it, straight from the horse's mouth. AMD does not believe the problem lies within the Windows thread scheduler. SMT performance in gaming workloads was also addressed:
"Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready."
We are still digging into the observed differences between toggling SMT and disabling the second CCX, but it is good to see AMD issue a clarifying statement here for all of those out there observing and reporting on SMT-related performance deltas.
** END UPDATE **
Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posted that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the logical vs. physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout
Initial reviews of AMD’s Ryzen CPU revealed a few inefficiencies in some situations, particularly in gaming workloads running at more common resolutions like 1080p, where the CPU becomes more of a bottleneck when coupled with modern GPUs. Lots of folks have theorized about what could be causing these issues, and most recent attention appears to have been directed at the Windows 10 scheduler and its supposed inability to properly place threads on the Ryzen cores for the most efficient processing.
I typically have Task Manager open while running storage tests (they are boring to watch otherwise), and I naturally had it open during Ryzen platform storage testing. I’m accustomed to how the IO workers are distributed across reported threads, and in the case of SMT capable CPUs, distributed across cores. There is a clear difference when viewing our custom storage workloads with SMT on vs. off, and it was dead obvious to me that core loading was working as expected while I was testing Ryzen. I went back and pulled the actual thread/core loading data from my testing results to confirm:
The Windows scheduler has a habit of bouncing processes across available processor threads. This naturally happens as other processes share time with a particular core, with the heavier process not necessarily switching back to the same core. As you can see above, the single IO handler thread was spread across the first four cores during its run, but the Windows scheduler was always hitting just one of the two available SMT threads on any single core at one time.
My testing for Ryan’s Ryzen review consisted only of single-threaded workloads, but we can make things a bit clearer by loading down half of the CPU with SMT toggled off. We do this by increasing the worker count to 4, half of the available threads on the Ryzen processor, which has 8 with SMT disabled in the motherboard BIOS.
SMT OFF, 8 cores, 4 workers
With SMT off, the scheduler is clearly not giving priority to any particular core and the work is spread throughout the physical cores in a fairly even fashion.
Now let’s try with SMT turned back on and doubling the number of IO workers to 8 to keep the CPU half loaded:
SMT ON, 16 (logical) cores, 8 workers
With SMT on, we see a very different result. The scheduler is clearly loading only one thread per core. This would only be possible if Windows were aware of the 2-way SMT (two threads per core) configuration of the Ryzen processor. Do note that the workload will sometimes shift around every few seconds, but the total loading on each physical core still remains at ~50%. I chose a workload that saturated its thread just enough for Windows not to shift it around as it ran, making the above result even clearer.
Synthetic Testing Procedure
While the storage testing methods above provide a real-world example of the Windows 10 scheduler working as expected, we do have another workload that can help demonstrate core balancing with Intel Core and AMD Ryzen processors. A quick and simple custom-built C++ application can be used to generate generic worker threads and monitor for core collisions and resolutions.
This test app has a very straightforward workflow. Every few seconds it generates a new thread, capping at N/2 threads total, where N is the reported number of logical cores. If the OS scheduler is working as expected, with N = 16 it should load 8 threads across 8 physical cores, though which logical core within each physical core gets used will depend on very minute conditions in the OS background.
By monitoring the APIC_ID through the CPUID instruction, the first application thread watches all of the app's threads, detecting and reporting collisions: cases where a thread from our app is running on the same physical core as another thread from our app. That thread also reports when those collisions have been cleared. In an ideal and expected environment where Windows 10 knows the boundaries of physical and logical cores, you should never see more than one thread of a core loaded at the same time.
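The app's spawn cap can be modeled in a few lines of Python. This is a sketch of the schedule only, not PC Perspective's C++ tool: `thread_cap` and `spawn_schedule` are hypothetical names, and it assumes exactly one thread is added per interval. The real app also keeps each thread busy, which this model does not attempt (CPython's GIL would keep pure-Python busy loops from loading multiple cores anyway).

```python
import os

def thread_cap(logical_cores: int) -> int:
    """N/2: the most worker threads the app will run at once."""
    return logical_cores // 2

def spawn_schedule(logical_cores: int, ticks: int) -> list[int]:
    """Number of live worker threads after each spawn interval.

    Assumption: exactly one thread is spawned per interval until the N/2 cap.
    """
    cap = thread_cap(logical_cores)
    return [min(tick, cap) for tick in range(1, ticks + 1)]

if __name__ == "__main__":
    n = os.cpu_count() or 1  # logical processors reported by the OS
    print(f"{n} logical cores -> cap {thread_cap(n)}: {spawn_schedule(n, 10)}")
```

On a 16-thread part such as the Ryzen 7 or the Core i7-5960X, the cap works out to 8 threads, which matches the physical core count when SMT is 2-way.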
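The collision bookkeeping can be sketched as follows, assuming each worker's current APIC ID has already been sampled. In the real C++ app that value comes from CPUID; Python has no portable CPUID call, so the samples are plain input here. The sketch also assumes 2-way SMT with the sibling selected by the lowest APIC ID bit, so the physical core is `apic_id >> 1`; real code should derive that shift from the processor's topology enumeration (for example CPUID leaf 0x0B on Intel parts) instead of hard-coding it. `find_collisions` and `diff_collisions` are hypothetical names.

```python
from collections import defaultdict

# Assumption: 2-way SMT where APIC ID bit 0 selects the SMT sibling.
# Real code should derive this width from the CPU's topology enumeration.
SMT_SHIFT = 1

def physical_core(apic_id: int, smt_shift: int = SMT_SHIFT) -> int:
    """Drop the SMT bits of an APIC ID, leaving a physical-core key."""
    return apic_id >> smt_shift

def find_collisions(samples: dict[str, int]) -> dict[int, set[str]]:
    """Given {thread_name: sampled APIC ID}, return {physical_core: threads}
    for every core currently hosting more than one of our threads."""
    by_core: dict[int, set[str]] = defaultdict(set)
    for thread, apic_id in samples.items():
        by_core[physical_core(apic_id)].add(thread)
    return {core: names for core, names in by_core.items() if len(names) > 1}

def diff_collisions(before: dict[int, set[str]],
                    after: dict[int, set[str]]) -> tuple[list[int], list[int]]:
    """Compare two collision maps by core: (newly collided cores, cleared cores)."""
    new = sorted(set(after) - set(before))
    cleared = sorted(set(before) - set(after))
    return new, cleared

if __name__ == "__main__":
    first = find_collisions({"w0": 0, "w1": 1, "w2": 4})   # w0 and w1 share core 0
    second = find_collisions({"w0": 0, "w1": 2, "w2": 4})  # w1 moved to core 1
    print(first, diff_collisions(first, second))
```

Polling once per interval and printing the two lists from `diff_collisions` gives the "collision detected" and "collision cleared" reports described above.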
This screenshot shows our app running on the left and the Windows Task Manager on the right with logical cores labeled. While it may look like all logical cores are being utilized at the same time, in fact they are not. At any given point, only LCore 0 or LCore 1 is actively processing a thread. Need proof? Check out the modified view of the Task Manager where I copied the graph of LCore 1/5/9/13 over the graph of LCore 0/4/8/12 with inverted colors to aid readability.
If you look closely, by overlapping the graphs in this way, you can see that the threads migrate from LCore 0 to LCore 1, LCore 4 to LCore 5, and so on. The graphs intersect and fill in to consume ~100% of the physical core. This pattern is repeated for the other 8 logical cores on the right two columns as well.
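The overlay trick amounts to adding each sibling pair's utilization together. A short Python sketch, assuming SMT siblings are numbered consecutively (as with LCore 0/1 above) and that all utilization samples cover the same interval; `physical_core_load` is a hypothetical helper name:

```python
def physical_core_load(logical_load: list[float], smt_ways: int = 2) -> list[float]:
    """Sum per-logical-core utilization (percent) into per-physical-core totals.

    Assumes SMT siblings are numbered consecutively (LCore 0/1, 2/3, ...) and
    that all samples were taken over the same interval.
    """
    if len(logical_load) % smt_ways:
        raise ValueError("logical core count must be a multiple of smt_ways")
    return [
        sum(logical_load[i:i + smt_ways])
        for i in range(0, len(logical_load), smt_ways)
    ]

# A thread that spent 60% of an interval on LCore 0 and 40% on LCore 1
# still keeps physical core 0 fully busy.
print(physical_core_load([60.0, 40.0, 25.0, 75.0]))  # → [100.0, 100.0]
```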
Running the same application on a Core i7-5960X Haswell-E 8-core processor shows a very similar behavior.
Each pair of logical cores shares a single thread, and when thread transitions occur away from LCore N, they migrate perfectly to LCore N+1. It does appear that in this scenario the Intel system shows a more stable thread distribution than the Ryzen system. While that may in fact confer some performance advantage on the 5960X configuration, the penalty for intra-core thread migration is expected to be very minute.
The fact that Windows 10 is balancing the 8 thread load specifically between matching logical core pairs indicates that the operating system is perfectly aware of the processor topology and is selecting distinct cores first to complete the work.
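One way to express that check in code is to classify each move in a thread's logical-core trace as either a hop to its SMT sibling or a hop to a different physical core; a topology-aware scheduler running one worker per physical core should produce sibling hops only. This Python sketch assumes the trace has already been collected and that siblings are numbered consecutively; `classify_migrations` is a hypothetical name.

```python
def classify_migrations(trace: list[int], smt_ways: int = 2) -> tuple[int, int]:
    """Count (sibling_hops, cross_core_hops) in one thread's logical-core trace.

    `trace` is the logical core ID observed for the thread at each sample.
    Assumes SMT siblings are numbered consecutively, so LCore n sits on
    physical core n // smt_ways.
    """
    sibling_hops = cross_core_hops = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev == cur:
            continue  # no migration between these two samples
        if prev // smt_ways == cur // smt_ways:
            sibling_hops += 1  # moved to the other logical core of the same physical core
        else:
            cross_core_hops += 1  # moved to a different physical core
    return sibling_hops, cross_core_hops

print(classify_migrations([4, 5, 5, 4, 5]))  # → (3, 0): only LCore 4 <-> 5 hops
```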
Information from this custom application, along with the storage performance tool example above, clearly shows that Windows 10 is attempting to balance work on Ryzen between cores in the same manner that we have experienced with Intel and its HyperThreaded processors for many years.
Subject: Systems | March 10, 2017 - 02:57 PM | Jeremy Hellstrom
Tagged: msi, Trident 3, kaby lake, i7-7700, gtx 1060
MSI's Trident 3 is much smaller than an Ohio-class submarine, measuring roughly the same size as a PS4 Pro, albeit with a very different look. Inside you will find a Kaby Lake i7-7700, 16GB of DDR4-2400 and a GTX 1060, with storage consisting of a 256GB Kingston M.2 SATA SSD and a 1TB Hitachi Travelstar. It does lack USB 3.1 and Thunderbolt, as the USB Type-C port you see is USB 3.0; it is also worth mentioning that the front HDMI port will not function without the included HDMI passthrough connected on the back. The Tech Report tested it against a similar machine, the Zotac Magnus EN1070, which features a much more powerful mobile GTX 1070. On the other hand, the $1300 Trident 3 comes ready to play, whereas the Zotac lacks a Windows license, storage and memory, so even though it sells for $100 less than the MSI system, it may cost you more in the long run.
"MSI's Trident 3 compact PC houses a desktop Core i7-7700 CPU and a GeForce GTX 1060 6GB graphics card in a case no larger than many of today's consoles. We put that tantalizing combo to the test to see whether MSI has achieved small-form-factor gaming nirvana."
Subject: Motherboards | March 10, 2017 - 02:22 PM | Jeremy Hellstrom
Tagged: Z270 GAMING M6 AC, z270, ryzen, msi, amd
The new MSI Z270 GAMING M6 AC has a huge selection of features, up to and including a free Phanteks RGB LED strip for those who suffer from chronic RGBitis.
The add-in card you see on the side is an Intel Wi-Fi/Bluetooth card which supports MU-MIMO. The onboard audio is powered by Nahimic, which MSI refers to as Audio Boost 4, and it is isolated from the other components on the motherboard to prevent noise. You also get a U.2 slot and two M.2 slots with a removable heatsink MSI calls M.2 Shield. The memory circuit design is fully isolated and, as you can see below, The Witcher 3 seems to like the DDR4 Boost design.
Check out the PR below for a closer look at the features, including the special USB port for your VR headset and the One-Click to VR option.
MSI, world leading in gaming hardware innovation, is proud to announce a brand new Enthusiast GAMING motherboard, the Z270 GAMING M6 AC, with its incredibly versatile and complete foundation for a high-end gaming system. Inspired by a futuristic armored spaceship, the Z270 GAMING M6 AC design with multilayer plating, wings and armaments emphasizes an ultramodern style. Erupting from the core, the entire color spectrum flows through illuminated lines. The complete motherboard and heatsink design offers a strong look and feel and uses heavy quality components to deliver the best performance and stability as the base of any gaming rig. Added features such as Audio Boost 4 with Nahimic 2, Twin Turbo M.2 with M.2 Shield, VR Boost, Killer LAN & Intel WIFI AC, and the option to fully customize the RGB LEDs to any color using Mystic Light make the Z270 GAMING M6 AC one of the most high-end and desirable Z270 motherboards to build a gaming rig with.
Through fully isolating the memory circuit design, DDR4 Boost ensures maximum performance and stability. The technical enhancements of DDR4 Boost allow for more stability at higher memory speeds compared to other brands. Enjoy the additional boost in gaming performance or when working with large video and photo files. Enable Intel® Extreme Memory Profile with ease using a single option in the BIOS to gain performance and create a perfectly stable system.
DDR4 Boost benchmark based on The Witcher 3
Twin Turbo M.2 with M.2 shield & U.2
Enjoy a blazing fast system boot up and insanely quick loading of applications and games with MSI motherboards. Twin Turbo M.2 delivers PCI-E Gen3 x4 performance with transfer speeds up to 64 Gb/s for the latest SSDs. It also supports the all-new Intel® Optane™ technology. M.2 Shield (patent pending) is a thermal solution which keeps the M.2 or Optane™ device safe and cool to prevent damage and thermal throttling. M.2 GENIE makes setting up RAID easy by taking fewer steps, using any M.2 or PCI-E SSD (even when used in a mixed configuration). The Z270 GAMING M6 AC supports the latest storage interface, U.2, as well.
Audio Boost 4 with Nahimic 2
With Audio Boost, powered by Nahimic, MSI motherboards deliver the highest sound quality through the use of premium quality audio components and an isolated audio PCB. An added audio cover and golden audio connectors ensure the purest audio signal.
VR Ready & VR Boost
VR Boost is a smart chip that ensures a clean and strong signal to a VR optimized USB port located on the back, to reduce motion sickness caused by a bad signal. The One-Click to VR option in the MSI Gaming App gets your PC primed for VR use in just a single click by setting your components to max. performance and preventing other applications from impacting your VR experience negatively.
Intel Wi-Fi AC with Antennas
Optimize your gaming rig to deliver game networking traffic over LAN for the best possible online gaming experience, while using WiFi for other online applications. This next-generation Intel® Wi-Fi / Bluetooth solution uses smart MU-MIMO technology, delivering AC speeds up to 867Mbps. Perfect for streaming and gaming at the same time.
Includes free Phanteks RGB LED strip
This RGB LED strip helps to transform and synchronize colors in your case to your liking. Simply connect the plug & play strip to the Mystic Light Extension pin header located on MSI motherboards, without the need for external power, and set a color and choose an LED effect to match your motherboard and other peripherals' RGB LEDs. Use the included double-sided 3M tape to place the strip firmly wherever you want inside (or even outside) your chassis.
Subject: General Tech | March 10, 2017 - 01:02 PM | Jeremy Hellstrom
Tagged: ryzen, delidding
There are now two fewer working Ryzen 1700 processors on the planet, sacrificed in an experiment to delid the new AMD chips. The third survived and was tested by der8auer, the mad experimenter, to see what benefit cooling the die directly provides. The answer is a 2°C drop. That does not seem worth the high risk, an opinion that Guru 3D shares. You can of course proceed if you wish, but you might want to buy a half dozen processors to save yourself some time.
"We mentioned in our reviews that you should not delid AMD Ryzen processors for the sheer fact that even the heatspreader has sensors and that it is soldered. Next to that AMD did the cooling part rather well so the benefits of a lower temperatures versus the risk of bricking that processor might not be worth it."
Here is some more Tech News from around the web:
- Plex launches Cloud-based media servers for your mp3 and video collection @ The Inquirer
- Oculus CTO Carmack is suing Zenimax for $22.5m over sales of Doom studio @ The Inquirer
- Security Oops! 185,000-plus Wi-Fi cameras on the web with insecure admin panels @ The Register
- MAC randomization: A massive failure that leaves iPhones, Android mobes open to tracking @ The Register
- Microsoft to close its social network on a week's notice – and SIX people complain @ The Register
- The AMD Ryzen 7 Gaming Performance Examined @ TechARP
Subject: General Tech, Graphics Cards | March 10, 2017 - 11:15 AM | Ryan Shrout
Tagged: video, tom petersen, pascal, nvidia, live, gtx 1080 ti, gtx, gp102, geforce
Our review of the GeForce GTX 1080 Ti 11GB graphics card is live and ready for consumption! Make sure you check it out before this afternoon's live stream!
Did you miss our GTX 1080 Ti Live Stream? Catch the replay below!
Ready your mind and body, it’s time for another GeForce GTX live stream hosted by PC Perspective’s Ryan Shrout and NVIDIA’s Tom Petersen. The general details about the GeForce GTX 1080 Ti graphics card are already official and based on the hype train and the response on social media, there is more than a little excitement.
On hand to talk about the new graphics card will be Tom Petersen, well known in our community. While the GTX 1080 Ti will be the flagship part of our live stream, we will also be diving into the world of VR performance evaluation and how the new FCAT VR tool will help reviewers and everyday enthusiasts see where their systems stand in producing smooth, effective virtual reality gaming. We have done quite a few awesome live streams with Tom in the past; check them out if you haven't already.
NVIDIA GeForce GTX 1080 Ti and FCAT VR Live Stream
1pm PT / 4pm ET - March 9th
Need a reminder? Join our live mailing list!
The event will take place Thursday, March 9th at 4pm ET / 1pm PT at http://www.pcper.com/live. There you’ll be able to catch the live video stream as well as use our chat room to interact with the audience, asking questions for me and Tom to answer live.
Tom has a history of being both informative and entertaining and these live streaming events are always full of fun and technical information that you can get literally nowhere else. Previous streams have produced news as well – including statements on support for Adaptive Sync, release dates for displays and first-ever demos of triple display G-Sync functionality. You never know what’s going to happen or what will be said!
This just in fellow gamers: Tom is going to be providing a GeForce GTX 1080 Ti graphics card to give away during the live stream! We won't be able to ship it until the end of next week, but one lucky viewer of the live stream will be able to get their paws on the fastest graphics card we have ever tested!! Make sure you are scheduled to be here on March 9th at 1pm PT / 4pm ET!!
Win this beauty.
If you have questions, please leave them in the comments below and we'll look through them just before the start of the live stream. Of course you'll be able to tweet us questions @pcper and we'll be keeping an eye on the IRC chat as well for more inquiries. What do you want to know and hear from Tom or me?
So join us! Set your calendar for this coming Thursday at 4pm ET / 1pm PT and be here at PC Perspective to catch it. If you are a forgetful type of person, sign up for the PC Perspective Live mailing list that we use exclusively to notify users of upcoming live streaming events including these types of specials and our regular live podcast. I promise, no spam will be had!
Subject: General Tech | March 10, 2017 - 07:01 AM | Scott Michaud
Tagged: zenimax, Lawsuit, john carmack
According to Dallas News, John Carmack is suing ZeniMax for monies owed after he sold his company, id Software, to them. He claims that the company promised $45.1 million USD, half of which was used to buy stock in ZeniMax. The lawsuit also states that “sour grapes is not an affirmative defense to breach of contract,” which... not so loosely implies that ZeniMax is just mad about the whole situation. ZeniMax, on the other hand, said that this claim was already rejected by a court in a previous filing.
As our readers probably know, this comes on the heels of ZeniMax suing Oculus VR, including John Carmack, over ownership of virtual reality technologies. While the jury in that case awarded ZeniMax $500 million in damages, none of those damages were attributed to John Carmack.
Subject: Graphics Cards | March 10, 2017 - 02:49 AM | Scott Michaud
Tagged: nvidia, graphics drivers
Alongside the launch of the GeForce GTX 1080 Ti, NVIDIA has released a new graphics driver that, one, obviously supports the new card and, two, also rolls in a bunch of optimizations for DirectX 12 titles. The graphics vendor already announced the initiative at last week’s GDC, but it is now released and available for public use. 378.78 is also “Game Ready” for Ghost Recon Wildlands, although that’s mostly for Ansel support; most of the optimizations for Wildlands were pushed into the previous driver.
The advertised gains vary from title to title, but they claim that Rise of the Tomb Raider at 4K will jump from 20 FPS to 27 FPS. This can be viewed either as a frame rate gain of 35%, or as an average frame time savings of about 13ms (from 50ms to roughly 37ms) on each and every frame. If that’s what actual end-users will see, that’s a lot!
They also note improvements in Vulkan support, but without any hard numbers.
If you have a GeForce GTX 1050 Ti notebook, this driver is also said to fix a potential bluescreen bug that you may have been facing. You can pick it up from GeForce Experience or the NVIDIA website.
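The two views of NVIDIA's Rise of the Tomb Raider claim (20 FPS to 27 FPS) are related by simple arithmetic, sketched here in Python. It only converts the advertised figures and says nothing about measured results; `frame_time_ms` and `compare` are hypothetical helper names, and treating average frame time as 1000 / average FPS is an assumption that holds for averages over a run, not for individual frames.

```python
def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds at a given average frame rate."""
    return 1000.0 / fps

def compare(old_fps: float, new_fps: float) -> tuple[float, float]:
    """Return (percent frame-rate gain, average milliseconds saved per frame)."""
    gain_pct = (new_fps / old_fps - 1.0) * 100.0
    saved_ms = frame_time_ms(old_fps) - frame_time_ms(new_fps)
    return gain_pct, saved_ms

gain, saved = compare(20, 27)
print(f"{gain:.0f}% faster, {saved:.1f} ms saved per frame")  # → 35% faster, 13.0 ms saved per frame
```

The same 7 FPS gain would save far less time per frame at higher frame rates (60 to 67 FPS saves under 2ms), which is why frame time is often the more telling way to read such claims.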
Subject: Graphics Cards | March 9, 2017 - 01:53 PM | Jeremy Hellstrom
Tagged: 1080 ti, geforce, gp102, gtx 1080 ti, nvidia, pascal
As you have probably noticed from our front page, today is the day we can see how the GTX 1080 Ti performs in reviewers' systems. The unfortunate news is that you can't buy one yet, nor do we know when you will be able to spend the $699 it will cost to order one. We can share the performance with you: once again NVIDIA's Ti model takes the top spot, outperforming even the $1200 TITAN X. As for overclocking the reference model (we have not yet had a chance to test any cards with third-party coolers), [H]ard|OCP was able to increase the GPU frequency by over 200MHz to 1967-1987MHz in game and push the memory to 12GHz, somewhat better than what Ryan was able to achieve. Check out their full review here, with many more just below.
"NVIDIA is launching the fastest video card it offers for gaming today in the new $699 GeForce GTX 1080 Ti. We will take this video card and test it against the GeForce GTX 1080 and GeForce GTX TITAN X at 1440p and 4K resolutions to find out how it compares. Is it really faster than a $1200 GeForce GTX TITAN X Pascal?"
Here are some more Graphics Card articles from around the web:
- Nvidia's GeForce GTX 1080 Ti @ The Tech Report
- Nvidia GTX 1080 Ti review: The fastest graphics card, again @ Ars Technica
- NVIDIA GeForce GTX 1080 Ti Founders Edition 11 GB @ techPowerUp
- The NVIDIA GTX 1080 Ti 11GB Review @ Hardware Canucks
- Nvidia GTX 1080 Ti Founders Edition 11GB @ Kitguru
- The GTX 1080 Ti Performance Review vs. the TITAN XP & the GTX 1080 @ BabelTechReviews
- Nvidia GTX 1080 Ti CPU Showdown: i7 7700k Vs Ryzen R7 1800x Vs i7 5820k @ eTeknix
- Nvidia GeForce GTX 1080 Ti 11GB @ eTeknix
- NVIDIA GeForce GTX 1080 Ti Review: A Look At 4K & Ultrawide Gaming @ Techgage
- MSI GeForce GTX 1060 Armor OC 6 GB @ techPowerUp
Subject: General Tech | March 9, 2017 - 12:58 PM | Jeremy Hellstrom
Tagged: Kaspersky, antivirus, security, Threat de Toilette
If you are not aware of the story of John McAfee, who created the popular antivirus software before leaving to live a far more interesting life, you should read up on it. Those who work in online and information security will have some sympathy for his decision, as the job is rather thankless and not exactly something you can effectively use as a topic of conversation at a party. Kaspersky Labs may now be showing signs of distress after launching their new perfume line, Threat de Toilette. Yes, perfume.
There is a method to their madness if you read past the first few paragraphs on The Register. The perfume line is being advertised by fashion bloggers, who have reason to want their online information to be secure as it is the source of their livelihood and who have an audience which is not particularly knowledgeable about keeping themselves safe online. It is an intriguing way to try to spread the word about online security; here's hoping it helps at least a few people.
"The thing is, while Kaspersky is possibly talking crap about the perfume, it does manage to squeeze in a lot of good advice about security and the personal protection of it. Why it would send this to us is another mystery."
Here is some more Tech News from around the web:
- IBM Researchers Prove It Is Possible To Store Data In a Single Atom @ Slashdot
- Microsoft: Can't wait for ARM to power MOST of our cloud data centers! Take that, Intel! Ha! Ha! @ The Register
- Apache Struts 2 needs patching, without delay. It's under attack now @ The Register
- Microsoft is adding 'adverts' for OneDrive in Windows 10's File Explorer @ The Inquirer
- Video intercom firm Doorbird wants $80 for device password resets @ The Register
- The 32-Core AMD Naples CPU Tech Report @ TechARP
- Uber Admits Its Ghost Driver 'Greyball' Tool Was Used To Thwart Regulators, Vows To Stop @ Slashdot