Subject: Graphics Cards | July 30, 2017 - 10:07 PM | Josh Walrath
Tagged: Vega, Siggraph, Nano
This doesn't look like it was really meant to happen, but it is in the wild now! Twitter user Drew has posted a picture of Chris Hook holding up a Vega Nano card outside the show. It draws its design from the previous Vega products that we have seen with the shroud and the red cube in the top right corner. No specifications were included with this post, but we can see that the card is significantly shorter than the RX Vega FE that Ryan had reviewed.
TDPs should be in the sub-200 watt range for such a design. The original Nano was a 150 watt TDP part that performed quite well at the time. Pricing is again not included, but we will be able to guess once the rest of the Vega lineup is announced later.
Subject: Graphics Cards | July 27, 2017 - 05:45 PM | Scott Michaud
Alongside the big Radeon Software Crimson ReLive 17.7.2 release, AMD pushed out a new developer tool to profile performance on AMD GPUs. First and foremost, it’s only designed to work with the newer graphics APIs, DirectX 12 and Vulkan, although it supports many operating systems: Windows 7, Windows 10, and Linux (Ubuntu 16.04). It doesn’t (yet) support Vega, so you will need to have a 400-, 500-, or Fury series GPU. I expect that will change in the near future, though.
So what does it do? These new graphics APIs are low-level, and there’s a lot going on within a single frame. Other tools exist to debug things like “which draw call is painting a white blotch over part of my frame”, with AMD recommending RenderDoc. Radeon GPU Profiler is more for questions like “did I feed my GPU enough tasks to mask global memory access latency?” or “which draw call took the longest to process?” Now that a lot of this is in the hands of game developers, AMD wants them to have the tools to efficiently load their GPUs.
While the software is freely available, it’s not open source. (You will see a “Source code” link in the release section of GitHub, but it’s just a Readme.)
The software team at AMD and the Radeon Technologies Group is releasing Radeon Crimson ReLive Edition 17.7.2 this evening, and it includes a host of new features, improved performance capabilities, and stability improvements to boot. This isn’t the major reboot of the software that we have come to expect on an annual basis, but rather an attempt to get the software team’s work out in front of media and gamers before the onslaught of RX Vega and Threadripper steals the attention.
AMD’s software team is big on its user satisfaction ratings, which it should be after many years of falling behind NVIDIA in this department. With 16 individual driver releases in 2017 (so far) and 20 new games optimized and supported with day-one releases, the 90% rating seems about right. Much of the work to improve multi-GPU support and address other critical problems is now more than a calendar year behind us, so it seems reasonable that Radeon gamers would be in a good place in terms of software support.
One big change for Crimson ReLive today is that all of those lingering settings that remained in the old Catalyst Control Panel will now reside in the proper Radeon Settings. This means a matching UI and a more streamlined interface.
The ReLive capture and streaming capability sees a handful of upgrades today, including a bump from 50 Mbps to 100 Mbps maximum bit rate, transparency support for webcams, optimizations that lower memory usage (and thus the overhead of running ReLive), notifications for replays and record timers, and audio controls for microphone volume and push-to-talk.
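To put the new bitrate cap in perspective, here is a rough sketch of what it means for recording file sizes. This is illustrative arithmetic only; real file sizes will vary with codec overhead and variable-bitrate encoding.

```python
# Rough file-size math for ReLive's new 100 Mbps bitrate cap
# (illustrative only; actual sizes vary with codec overhead).

def recording_size_gb(bitrate_mbps: float, minutes: float) -> float:
    """Approximate recording size in gigabytes at a constant bitrate."""
    bits = bitrate_mbps * 1_000_000 * minutes * 60
    return bits / 8 / 1_000_000_000

# At the old 50 Mbps cap, ten minutes of footage is roughly 3.75 GB;
# at the new 100 Mbps cap it doubles to roughly 7.5 GB.
print(recording_size_gb(50, 10))   # 3.75
print(recording_size_gb(100, 10))  # 7.5
```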
Subject: Graphics Cards | July 25, 2017 - 06:36 PM | Jeremy Hellstrom
Tagged: evga, Kingpin, 1080 ti, nvidia
A fancy new card with a fancy way of spelling K|NGP|N has just been announced by EVGA. It is a rather attractive card, eschewing RGBitis for a copper heatsink peeking through the hexagonal grill and three fans. The only glowing parts indicate the temperature of the GPU, memory and PWM controller; a far more functional use.
As you would expect, the card arrives with default clocks of 1582MHz base and 1695MHz boost; however, the card is guaranteed to hit 2025MHz and higher when you overclock it. The base model ships with a dual-slot profile, but EVGA chose to move the DVI port down, leaving the top of the card empty except for cooling vents. This also means you could purchase a Hydro Copper Waterblock and reduce the card's height to a single slot.
The card currently holds several single GPU World Records:
- 3DMark Time Spy World Record – 14,219
- 3DMark Fire Strike Extreme World Record – 19,361
- 3DMark Fire Strike World Record – 31,770
- UNIGINE Superposition – 8,642
July 25th, 2017 - The GeForce® GTX™ 1080 Ti was designed to be the most powerful desktop GPU ever created, and indeed it was. EVGA built upon its legacy of innovative cooling solutions and powerful overclocking with its GTX 1080 Ti SC2 and FTW3 graphics cards. Despite the overclocking headroom provided by the frigid cooling of EVGA's patented iCX Technology, the potential of the GTX 1080 Ti still leaves room for one more card at the top...and man is it good to be the K|NG.
Specifications and Design
Just a couple of short weeks ago we looked at the Radeon Vega Frontier Edition 16GB graphics card in its air-cooled variety. The results were interesting – gaming performance proved to fall somewhere between the GTX 1070 and the GTX 1080 from NVIDIA’s current generation of GeForce products. That falls below many of the estimates from players in the market, including media, fans, and enthusiasts. But before we get to the RX Vega product family that is targeted at gamers, AMD has another data point for us to look at with a water-cooled version of Vega Frontier Edition. At a $1500 MSRP, which we shelled out ourselves, we are very interested to see how it changes the face of performance for the Vega GPU and architecture.
Let’s start with a look at the specifications of this version of the Vega Frontier Edition, which will be…familiar.
| | Vega Frontier Edition (Liquid) | Vega Frontier Edition | Titan Xp | GTX 1080 Ti | Titan X (Pascal) | GTX 1080 | TITAN X | GTX 980 | R9 Fury X |
|---|---|---|---|---|---|---|---|---|---|
| Base Clock | 1382 MHz | 1382 MHz | 1480 MHz | 1480 MHz | 1417 MHz | 1607 MHz | 1000 MHz | 1126 MHz | 1050 MHz |
| Boost Clock | 1600 MHz | 1600 MHz | 1582 MHz | 1582 MHz | 1480 MHz | 1733 MHz | 1089 MHz | 1216 MHz | - |
| Memory Clock | 1890 MHz | 1890 MHz | 11400 MHz | 11000 MHz | 10000 MHz | 10000 MHz | 7000 MHz | 7000 MHz | 1000 MHz |
| Memory Interface | 2048-bit HBM2 | 2048-bit HBM2 | 384-bit G5X | 352-bit | 384-bit G5X | 256-bit G5X | 384-bit | 256-bit | 4096-bit (HBM) |
| Memory Bandwidth | 483 GB/s | 483 GB/s | 547.7 GB/s | 484 GB/s | 480 GB/s | 320 GB/s | 336 GB/s | 224 GB/s | 512 GB/s |
| TDP | 300 watts | 300 watts | 250 watts | 250 watts | 250 watts | 180 watts | 250 watts | 165 watts | 275 watts |
| Peak Compute | 13.1 TFLOPS | 13.1 TFLOPS | 12.0 TFLOPS | 10.6 TFLOPS | 10.1 TFLOPS | 8.2 TFLOPS | 6.14 TFLOPS | 4.61 TFLOPS | 8.60 TFLOPS |
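The memory bandwidth figures in the table follow directly from the effective memory transfer rate and bus width. A quick sketch reproducing two of them:

```python
def bandwidth_gb_s(effective_clock_mhz: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: effective transfer rate times bus width in bytes."""
    return effective_clock_mhz * 1e6 * (bus_width_bits / 8) / 1e9

# Vega FE: HBM2 at 1890 MHz effective on a 2048-bit interface
print(round(bandwidth_gb_s(1890, 2048)))  # 484 (quoted in the table as 483 GB/s)
# GTX 1080 Ti: 11000 MHz effective GDDR5X on a 352-bit interface
print(round(bandwidth_gb_s(11000, 352)))  # 484
```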
The base specs remain unchanged and AMD lists the same memory frequency and even GPU clock rates across both models. In practice though, the liquid cooled version runs at higher sustained clocks and can overclock a bit easier as well (more details later). What does change with the liquid cooled version is a usable BIOS switch on top of the card that allows you to move between two distinct power draw states: 300 watts and 350 watts.
First, it’s worth noting this is a change from the “375 watt” TDP that this card was listed at during the launch and announcement. AMD was touting a 300-watt and 375-watt version of Frontier Edition, but it appears the company backed off a bit on that, erring on the side of caution to avoid breaking any of the specifications of PCI Express (board slot or auxiliary connectors). Even more conservative is that AMD chose to have the default state of the switch on the Vega FE Liquid card at 300 watts rather than the more aggressive 350 watts. AMD claims this is to avoid any problems with lower quality power supplies that may struggle with slightly over 150 watts of power draw (and resulting current) from the 8-pin power connections. I would argue that any system that is going to host a $1500 graphics card can and should be prepared to provide the necessary power, but for the professional market, AMD leans towards caution. (It’s worth pointing out that the RX 480 power issues that may have prompted this internal decision making were more problematic because they impacted the power delivery through the motherboard, while the 6- and 8-pin connectors are generally much safer to exceed the ratings.)
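The budget AMD is working within is easy to sketch. Assuming the card's dual 8-pin configuration, the PCI Express spec allows 75 watts from the slot plus 150 watts per 8-pin connector, which is why 375 watts sits exactly at the edge of spec with zero headroom:

```python
# PCI Express power budget sanity check for the Vega FE Liquid
# (assumes a dual 8-pin configuration; spec limits are 75 W from
# the slot and 150 W per 8-pin auxiliary connector).

SLOT_LIMIT_W = 75
EIGHT_PIN_LIMIT_W = 150

def within_spec(board_power_w: float, eight_pin_count: int = 2) -> bool:
    """True if the board power fits the slot-plus-connectors budget."""
    budget = SLOT_LIMIT_W + eight_pin_count * EIGHT_PIN_LIMIT_W
    return board_power_w <= budget

print(within_spec(300))  # True -- default switch position
print(within_spec(350))  # True -- aggressive switch position
print(within_spec(375))  # True, but with zero headroom (75 + 2 * 150)
```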
Even without clock speed changes, the move to water cooling should result in better and more consistent performance by removing the overheating concerns that surrounded our first Radeon Vega Frontier Edition review. But let’s dive into the card itself and see how the design process created a unique liquid cooled solution.
Subject: Graphics Cards | July 13, 2017 - 01:19 PM | Jeremy Hellstrom
Tagged: ROG Poseidon GTX 1080 Ti Platinum, gtx 1080 ti, asus, water cooling, factory overclocked
We have seen the test results that ASUS' Poseidon GTX 1080 Ti can manage on air cooling, and now it is time to revisit the card when it is water cooled. [H]ard|OCP attached the card to a Koolance Exos Liquid Cooling System Model EX2-755 and fired up the system to benchmark it. The difference is immediately noticeable: the minimum clock under water cooling almost matches the highest clock seen on air, with an average observed frequency of 2003MHz, rising to 2076MHz once they manually overclocked. This translated into better gameplay and significantly lower operating temperatures, which you can see in detail here.
"It’s time to let the liquid flow and put the ASUS ROG Poseidon GTX 1080 Ti Platinum Edition to the ultimate test. We will connect a Koolance Liquid Cooling System and test GPU frequency, gaming performance, and push the video card as hard as possible for its best overclock. Let’s find out what a little liquid can do for a GTX 1080 Ti."
Here are some more Graphics Card articles from around the web:
- MSI GTX 1080 Ti Lightning Z 11 GB @ techPowerUp
- MSI GTX 1080 Ti Lightning Z 11GB @ Kitguru
- GeForce GTX 1080 Ti @ Hardware Secrets
There has been a lot of news lately about the release of cryptocurrency-specific graphics cards from both NVIDIA and AMD add-in board partners. While we covered the current cryptomining phenomenon in an earlier article, today we are taking a look at one of these cards geared towards miners.
It's worth noting that I purchased this card myself from Newegg, and neither AMD nor Sapphire is involved in this article. I saw this card pop up on Newegg a few days ago, and my curiosity got the best of me.
There has been a lot of speculation, and little official information from vendors about what these mining cards will actually entail.
From the outward appearance, it is virtually impossible to distinguish this "new" RX 470 from the previous Sapphire Nitro+ RX 470, besides the lack of additional display outputs beyond the DVI connection. Even the branding and labels on the card identify it as a Nitro+ RX 470.
In order to test the hashing rates of this GPU, we are using Claymore's Dual Miner Version 9.6 (mining Ethereum only) against a reference design RX 470, also from Sapphire.
On the reference RX 470 out of the box, we hit rates of about 21.8 MH/s while mining Ethereum.
Once we moved to the Sapphire mining card, we saw at least 24 MH/s from the start.
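In relative terms, that is about a ten percent improvement out of the box, using the two rates measured above:

```python
# Relative Ethereum hashing improvement of the Sapphire mining card
# over the reference RX 470, using the rates measured above.

reference_mh_s = 21.8
mining_card_mh_s = 24.0

gain = (mining_card_mh_s - reference_mh_s) / reference_mh_s
print(f"{gain:.1%}")  # 10.1%
```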
A long time coming
External video cards for laptops have long been a dream of many PC enthusiasts, and for good reason. It’s compelling to have a thin-and-light notebook with great battery life for things like meetings or class, with the ability to plug it into a dock at home and enjoy your favorite PC games.
Many times we have been promised that external GPUs for notebooks would be a viable option. Over the years there have been many commercial solutions involving both industry-standard protocols like ExpressCard, as well as proprietary connections that let you externally attach PCIe devices. Enterprising hackers have also tried their hand at this for many years, cobbling together interesting solutions using mPCIe and M.2 ports on their notebooks which were meant for other devices.
With the introduction of Intel’s Thunderbolt standard in 2011, there was a hope that we would finally achieve external graphics nirvana. A modern, Intel-backed protocol promising PCIe x4 speeds (PCIe 2.0 at that point) sounded like it would be ideal for connecting GPUs to notebooks, and in some ways it was. Once again the external graphics communities managed to get it to work through the use of enclosures meant to connect other non-GPU PCIe devices such as RAID and video capture cards to systems. However, software support was still a limiting factor. You were required to use an external monitor to display your video, and it still felt like you were just riding the line between usability and a total hack. It felt like we were never going to get true universal support for external GPUs on notebooks.
Then, seemingly out of nowhere, Intel decided to promote native support for external GPUs as a priority when it introduced Thunderbolt 3. Fast forward, and we've already seen a much larger adoption of Thunderbolt 3 on PC notebooks than we ever did with the previous Thunderbolt implementations. Taking all of this into account, we figured it was time to finally dip our toes into the eGPU market.
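A back-of-the-envelope comparison shows why the link still matters: even Thunderbolt 3's PCIe 3.0 x4 tunnel carries only about a quarter of the data rate of the x16 slot a desktop GPU gets. The figures below are usable data rates after the published PCIe encoding overheads (8b/10b for PCIe 2.0, 128b/130b for PCIe 3.0):

```python
# Rough usable-bandwidth comparison: the PCIe links Thunderbolt exposes
# versus the full x16 slot a desktop GPU gets (data rates after encoding).

def pcie_gbps(lanes: int, gt_per_s: float, encoding_efficiency: float) -> float:
    """Usable PCIe data rate in Gbps for a given lane count and generation."""
    return lanes * gt_per_s * encoding_efficiency

pcie2_x4 = pcie_gbps(4, 5.0, 8 / 10)       # what original Thunderbolt carried
pcie3_x4 = pcie_gbps(4, 8.0, 128 / 130)    # what Thunderbolt 3 carries
pcie3_x16 = pcie_gbps(16, 8.0, 128 / 130)  # a desktop GPU slot

print(round(pcie2_x4, 1))   # 16.0 Gbps
print(round(pcie3_x4, 1))   # 31.5 Gbps
print(round(pcie3_x16, 1))  # 126.0 Gbps
```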
For our testing, we decided on the AKiTio Node for several reasons. First, at around $300, it's by far the lowest cost enclosure built to support GPUs. Additionally, it seems to be one of the most compatible devices currently on the market according to the very helpful comparison chart over at eGPU.io. The eGPU site is a wonderful resource for everything external GPU, over any interface possible, and I would highly recommend heading over there to do some reading if you are interested in trying out an eGPU for yourself.
The Node unit itself is a very utilitarian design. Essentially you get a folded sheet metal box with a Thunderbolt controller and 400W SFX power supply inside.
In order to install a GPU into the Node, you must first unscrew the enclosure from the back and slide the outer shell off of the device.
Once inside, we can see that there is ample room for any graphics card you might want to install in this enclosure. In fact, it seems a little too large for any of the GPUs we installed, including GTX 1080 Ti models. Here, you can see a more reasonable RX 570 installed.
Beyond opening up the enclosure to install a GPU, there is very little configuration required. My unit required a firmware update, but that was easily applied with the tools from the AKiTio site.
From here, I simply connected the Node to a ThinkPad X1, installed the NVIDIA drivers for our GTX 1080 Ti, and everything seemed to work — including using the 1080 Ti with the integrated notebook display and no external monitor!
Now that we've got the Node working, let's take a look at some performance numbers.
Two Vegas...ha ha ha
When the preorders for the Radeon Vega Frontier Edition went up last week, I made the decision to place orders in a few different locations to make sure we got it in as early as possible. Well, as it turned out, we actually had the cards show up very quickly…from two different locations.
So, what is a person to do if TWO of the newest, most coveted GPUs show up on their doorstep? After you do the first, full review of the single GPU iteration, you plug those both into your system and do some multi-GPU CrossFire testing!
There of course needs to be some discussion up front about this testing and our write-up. If you read my first review of the Vega Frontier Edition you will clearly note my stance on the idea that “this is not a gaming card” and that “the drivers aren’t ready.” Essentially, I said these potential excuses for performance were distractions and unwarranted based on the current state of Vega development and the proximity of the consumer iteration, Radeon RX.
But for multi-GPU, it’s a different story. Both competitors in the GPU space will tell you that developing drivers for CrossFire and SLI is incredibly difficult. Much more than simply splitting the work across different processors, multi-GPU requires extra attention to specific games, game engines, and effects rendering that are not required in single GPU environments. Add to that the fact that the market size for CrossFire and SLI has been shrinking, from an already small state, and you can see why multi-GPU is going to get less attention from AMD here.
Even more, when CrossFire and SLI support gets a focus from the driver teams, it is often late in the process, nearly last in the list of technologies to address before launch.
With that in mind, we all should understand the results we are going to show you might be indicative of the CrossFire scaling when Radeon RX Vega launches, but it very well could not. I would look at the data we are presenting today as a “current state” of CrossFire for Vega.
Performance not two-die four.
When designing an integrated circuit, you are attempting to fit as much complexity as possible within your budget of space, power, and so forth. One harsh limitation for GPUs is that, while your workloads could theoretically benefit from more and more processing units, the number of usable chips from a batch shrinks as designs grow, and the reticle limit of a fab’s manufacturing node is basically a brick wall.
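The yield pressure described above is often approximated with a simple Poisson defect-density model. This sketch uses an assumed defect density (illustrative, not a real fab figure) to show how quickly the fraction of good dies falls as die area grows:

```python
import math

def poisson_yield(area_mm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of defect-free dies under a simple Poisson model: Y = exp(-D * A)."""
    return math.exp(-defect_density_per_cm2 * area_mm2 / 100)

D = 0.2  # assumed defects per cm^2 -- illustrative, not a real fab number

# One large 600 mm^2 monolithic die versus one of four 150 mm^2 modules:
print(round(poisson_yield(600, D), 2))  # ~0.3 of the big dies are good
print(round(poisson_yield(150, D), 2))  # ~0.74 of the small module dies are good
```

Under this model, splitting one big design into four small modules more than doubles the fraction of usable silicon per wafer, which is the economic pull behind multi-die GPUs.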
What’s one way around it? Split your design across multiple dies!
NVIDIA published a research paper discussing just that. In their diagram, they show two examples. In the first diagram, the GPU is a single, typical die that’s surrounded by four stacks of HBM, like GP100; the second configuration breaks the GPU into five dies, four GPU modules and an I/O controller, with each GPU module attached to a pair of HBM stacks.
NVIDIA ran simulations to determine how this chip would perform and, in various workloads, found that it out-performed the largest possible single-chip GPU by about 45.5%. They scaled up the single-chip design until it had the same number of compute units as the multi-die design, even though this wouldn’t work in the real world because no fab could actually lithograph it. Regardless, that hypothetical, impossible design was only ~10% faster than the actually-possible multi-chip one, showing that the overhead of splitting the design is only around that much, according to their simulation. The multi-chip design was also faster than the multi-card equivalent by 26.8%.
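The percentages above are measured against different baselines, so normalizing everything to the largest buildable monolithic GPU makes them easier to compare:

```python
# Normalizing the speedups NVIDIA reports
# (largest buildable monolithic GPU = 1.0).

buildable_mono = 1.0
mcm = buildable_mono * 1.455     # multi-die is 45.5% faster than the buildable chip
hypothetical_mono = mcm * 1.10   # the unbuildable same-size chip is ~10% faster still
multi_card = mcm / 1.268         # multi-die beats the multi-card setup by 26.8%

# Relative performance: buildable 1.0 < multi-card ~1.15 < MCM ~1.46 < hypothetical ~1.60
print(round(multi_card, 2), round(mcm, 2), round(hypothetical_mono, 2))
```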
While NVIDIA’s simulations, run on 48 different benchmarks, have accounted for this, I still can’t visualize how this would work in an automated way. I don’t know how the design would automatically account for fetching data that’s associated with other GPU modules, as this would probably be a huge stall. That said, they spent quite a bit of time discussing how much bandwidth is required within the package, and figures of 768 GB/s to 3TB/s were mentioned, so it’s possible that it’s just the same tricks as fetching from global memory. The paper touches on the topic several times, but I didn’t really see anything explicit about what they were doing.
If you’ve been following the site over the last couple of months, you’ll note that this is basically the same as AMD is doing with Threadripper and EPYC. The main difference is that CPU cores are isolated, so sharing data between them is explicit. In fact, when that product was announced, I thought, “Huh, that would be cool for GPUs. I wonder if it’s possible, or if it would just end up being Crossfire / SLI.”
Apparently not? It should be possible?
I should note that I doubt this will be relevant for consumers. The GPU is the most expensive part of a graphics card. While the thought of four GP102-level chips working together sounds great for 4K (which is 4x1080p in resolution) gaming, quadrupling the expensive part sounds like a giant price-tag. That said, the market of GP100 (and the upcoming GV100) would pay five-plus digits for the absolute fastest compute device for deep-learning, scientific research, and so forth.
The only way I could see this working for gamers is if NVIDIA finds the sweet-spot for performance-to-yield (for a given node and time) and they scale their product stack with multiples of that. In that case, it might be cost-advantageous to hit some level of performance, versus trying to do it with a single, giant chip.
This is just my speculation, however. It’ll be interesting to see where this goes, whenever it does.