Will you still need me when I'm sixty-four? Four generations of mid-range GTXes on Linux

Subject: Graphics Cards | July 21, 2016 - 02:04 PM |
Tagged: gtx 460, gtx 760, gtx 960, gtx 1060, fermi, kepler, maxwell, pascal

Phoronix took a look at how the performance of NVIDIA's mid-range cards on Linux has changed over the past four generations of GPUs, from Fermi through Kepler and Maxwell to Pascal.  CS:GO was run at 4K to push the newer GPUs, as was DOTA, much to the dismay of the GTX 460.  The scaling is rather interesting: there is a very large delta between Fermi and Kepler, which comes close to being replicated when comparing Maxwell to Pascal.  Judging from the vast majority of the tests, the GTX 1060 will be a noticeable upgrade for Linux users no matter which previous mid-range card they are currently using.  We will likely see a similar article covering AMD in the near future.

image.php_.jpg

"To complement yesterday's launch-day GeForce GTX 1060 Linux review, here are some more benchmark results with the various NVIDIA x60 graphics cards I have available for testing going back to the GeForce GTX 460 Fermi. If you are curious about the raw OpenGL/OpenCL/CUDA performance and performance-per-Watt for these mid-range x60 graphics cards from Fermi, Kepler, Maxwell, and Pascal, here are these benchmarks from Ubuntu 16.04 Linux."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: Phoronix

Report: NVIDIA GeForce GTX 1070M and 1060M Specs Leaked

Subject: Graphics Cards | July 20, 2016 - 12:19 PM |
Tagged: VideoCardz, rumor, report, nvidia, GTX 1070M, GTX 1060M, GeForce GTX 1070, GeForce GTX 1060, 2048 CUDA Cores

Specifications for the upcoming mobile version of NVIDIA's GTX 1070 GPU may have leaked, and according to the report at VideoCardz.com this GTX 1070M will have 2048 CUDA cores, 128 more than the desktop version's 1920.

nvidia-geforce-gtx-1070-mobile-specs.jpg

Image credit: BenchLife via VideoCardz

The report comes via BenchLife, with the screenshot of GPU-Z showing the higher CUDA core count (though VideoCardz mentions the TMU count should be 128). The memory interface remains at 256-bit for the mobile version, with 8GB of GDDR5.

VideoCardz also reported a GPU-Z screenshot (via PurePC) of the mobile GTX 1060, which appears to offer the same specs as the desktop version at a slightly lower clock speed.

nvidia-geforce-gtx-1060-mobile-specs.jpg

Image credit: PurePC via VideoCardz

Finally, this chart was provided for reference:

videocardz_chart.PNG

Image credit: VideoCardz

Note the absence of information about a mobile variant of the GTX 1080, details of which are still unknown (for now).

Source: VideoCardz
Manufacturer: Overclock.net

Yes, We're Writing About a Forum Post

Update - July 19th @ 7:15pm EDT: Well that was fast. Futuremark published their statement today. I haven't read it through yet, but there's no reason to wait to link it until I do.

Update 2 - July 20th @ 6:50pm EDT: We interviewed Jani Joki, Futuremark's Director of Engineering, on our YouTube page. The interview is embedded just below this update.

Original post below

The comments of a previous post notified us of an Overclock.net thread, whose author claims that 3DMark's implementation of asynchronous compute is designed to show NVIDIA in the best possible light. At the end of the linked post, they note that "asynchronous compute" is a general blanket term, and that we should better understand what is actually going on.

amd-mantle-queues.jpg

So, before we address the controversy, let's actually explain what asynchronous compute is. The main problem is that it actually is a broad term. Asynchronous compute could describe any optimization that allows tasks to execute when it is most convenient, rather than just blindly doing them in a row.

I will use JavaScript as a metaphor. In this language, you can assign tasks to be executed asynchronously by passing functions as parameters. This allows events to execute code when it is convenient. JavaScript, however, is still single-threaded (without Web Workers and newer technologies). It cannot run callbacks from multiple events simultaneously, even if you have an available core on your CPU. What it does allow is for the browser to manage its time better. Many events can be deferred until the browser has rendered the page, has finished other high-priority tasks, or until the asynchronous code has everything it needs, such as assets loaded from the internet.
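As a quick sketch of this behavior (the task names are made up for illustration), even a callback scheduled with zero delay has to wait until the current synchronous work is finished:

```javascript
// Single-threaded asynchronous execution: callbacks run when the
// event loop is free, not the instant they are scheduled.
const order = [];

// Queue a callback with zero delay...
setTimeout(() => order.push("first callback"), 0);

// ...but synchronous work still runs to completion first.
order.push("sync work");
order.push("more sync work");

setTimeout(() => {
  order.push("second callback");
  console.log(order.join(" -> "));
  // prints "sync work -> more sync work -> first callback -> second callback"
}, 0);
```

The engine never interrupts the running code; it simply picks up the queued callbacks once the main thread has nothing left to do.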

mozilla-architecture.jpg

This is asynchronous computing.

However, if JavaScript were designed differently, it would be possible to run callbacks on any available thread, not just the main thread when it is free. Again, JavaScript is not designed this way, but this is where I pull the analogy back to AMD's Asynchronous Compute Engines. In an ideal situation, a graphics driver will be able to see all of the functionality that a task requires and shove it down an already-working GPU, provided the specific resources that the task needs are not fully utilized by the existing work.

Read on to see how this is being implemented, and what the controversy is.

NVIDIA's GTX 1060, the newest in their Hari Seldon lineup of cards

Subject: Graphics Cards | July 19, 2016 - 01:54 PM |
Tagged: pascal, nvidia, gtx 1060, gp106, geforce, founders edition

The GTX 1060 Founders Edition has arrived, and it also happens to be our first look at the 16nm FinFET GP106 silicon; the GTX 1080 and 1070 used GP104.  This card features 10 SMs, 1280 CUDA cores, 48 ROPs, and 80 texture units; in many ways it is half of a GTX 1080. The GPU is clocked at a base of 1506MHz with a boost of 1708MHz, and the 6GB of VRAM runs at 8GHz.  [H]ard|OCP put this card through its paces, contrasting it with the RX 480 and the GTX 980 at 1440p as well as the more common 1080p.  As they do not use the Frame Rating tools which are the basis of our graphics testing of all cards, including the GTX 1060 of course, they included the new DOOM in their test suite.  Read on to see how they felt the card compared to the competition ... just don't expect to see a follow-up article on SLI performance.

1468921254mrv4f5CHZE_1_14_l.jpg

"NVIDIA's GeForce GTX 1060 video card is launched today in the $249 and $299 price point for the Founders Edition. We will find out how it performs in comparison to AMD Radeon RX 480 in DOOM with the Vulkan API as well as DX12 and DX11 games. We'll also see how a GeForce GTX 980 compares in real world gaming."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP
Author:
Manufacturer: NVIDIA

GP106 Specifications

Twelve days ago, NVIDIA announced its competitor to the AMD Radeon RX 480, the GeForce GTX 1060, based on a new Pascal GPU: GP106. Though that story was just a brief preview of the product and a pictorial of the GTX 1060 Founders Edition card we were initially sent, it set the community ablaze with discussion around which mainstream enthusiast platform was going to be the best for gamers this summer.

Today we are allowed to show you our full review: benchmarks of the new GeForce GTX 1060 against the likes of the Radeon RX 480, the GTX 970 and GTX 980, and more. Starting at $250, the GTX 1060 has the potential to be the best bargain in the market today, though much of that will be decided based on product availability and our results on the following pages.

Does NVIDIA’s third consumer product based on Pascal make enough of an impact to dissuade gamers from buying into AMD Polaris?

01.jpg

All signs point to a bloody battle this July and August and the retail cards based on the GTX 1060 are making their way to our offices sooner than even those based around the RX 480. It is those cards, and not the reference/Founders Edition option, that will be the real competition that AMD has to go up against.

First, however, it’s important to find our baseline: where does the GeForce GTX 1060 find itself in the wide range of GPUs?

Continue reading our review of the GeForce GTX 1060 6GB graphics card!!

NVIDIA's New #OrderOf10 Origins Contest

Subject: Graphics Cards | July 19, 2016 - 01:07 AM |
Tagged: nvidia

Honestly, when I first received this news, I thought it was a mistaken re-announcement of the contest from a few months ago. The original Order of 10 challenge was made up of a series of puzzles, and the first handful of people to solve it received a GTX 10-Series graphics card. It turns out NVIDIA is doing it again.

nvidia-2016-orderof10-july.png

For four weeks, starting on July 21st, NVIDIA will add four new challenges and, more importantly, 100 new “chances to win”. They did not announce what those prizes will be or whether all of them will be distributed to the first 25 complete entries of each challenge, though. Some high-profile YouTube personalities, such as some of the members of Rooster Teeth, were streaming their attempts the last time around, so there might be some of that again this time, too.

Source: NVIDIA

Asus ROG STRIX RX 480 Graphics Card Coming Next Month

Subject: Graphics Cards | July 16, 2016 - 11:03 PM |
Tagged: rx 480, ROG, Radeon RX 480, polaris 10 xt, polaris 10, DirectCU III, asus

Following its previous announcement, Asus has released more information on the Republic of Gamers STRIX RX 480 graphics card. Pricing is still a mystery but the factory overclocked card will be available in the middle of next month!

In my previous coverage, I detailed that the STRIX RX 480 would be using a custom PCB along with Asus' DirectCU III cooler and Aura RGB back lighting. Yesterday, Asus revealed that the card also has a custom VRM solution that, in an interesting twist, draws all of the graphics card's power from the two PCI-E power connectors and nothing from the PCI-E slot. This would explain the inclusion of both a 6-pin and 8-pin power connector on the card! I do think that it is a bit of an over-reaction to not draw anything from the slot, but it is an interesting take on powering a graphics card and I'm interested to see how it all works out once the reviews hit and overclockers get a hold of it!

Asus ROG Strix RX 480.jpg

The custom graphics card is assembled using Asus' "Auto Extreme" automated assembly process and uses "Super Alloy Power II" components (which is to say that Asus claims high-quality hardware and build quality). The DirectCU III cooler is similar to the one used on the STRIX GTX 1080 and features direct-contact heatpipes, an aluminum fin stack, and three Wing Blade fans that can spin down to zero RPM when the card is being used on the desktop or during "casual gaming." The fan shroud and backplate are both made of metal, which is a nice touch. Asus claims that the cooler runs 30% cooler and is three times quieter than the RX 480 reference cooler.

Last but certainly not least, Asus revealed boost clock speeds! The STRIX RX 480 will clock up to 1,330 MHz in OC Mode and up to 1,310 MHz in Gaming Mode. Further, Asus has not touched the GDDR5 memory frequency, which stays at the reference 8 GHz. Asus did not reveal base (average) GPU clocks. I was somewhat surprised by the factory overclock as I did not expect much out of the box, but 1,330 MHz is fairly respectable. This card should have a lot more headroom beyond that, though, and fortunately Asus provides software that will automatically overclock the card even further with one click (GPU Tweak II also lets advanced users manually overclock the card). Users should be able to hit at least 1,450 MHz assuming they do decently in the silicon lottery.

For reference, stock RX 480s are clocked at 1,120 MHz base and up to 1,266 MHz boost. Asus claims their factory overclock results in a 15% higher score in 3DMark Fire Strike and 19% more performance in DOOM and Hitman.

Other features of the STRIX RX 480 include FanConnect, two 4-pin fan headers that let users hook up two case fans and have them controlled by the GPU. Aura RGB LEDs on the shroud and backplate allow users to match their build aesthetics. Asus also includes XSplit GameCaster for game streaming with the card.

No word on pricing yet, but you will be able to get your hands on the card in the middle of next month (specifically "worldwide from mid-August")! 

This card is definitely one of the most interesting RX 480 designs so far and I am anxiously awaiting the full reviews!

How far do you think the triple fan cooler can push AMD's Polaris 10 XT GPU?

Source: Asus

Rumor: 16nm for NVIDIA's Volta Architecture

Subject: Graphics Cards | July 16, 2016 - 06:37 PM |
Tagged: Volta, pascal, nvidia, maxwell, 16nm

For the past few generations, NVIDIA has roughly followed a pattern of releasing a new architecture on a new process node, then releasing a refresh the following year. This ran into a hitch when Maxwell was delayed a year, apart from the GTX 750 Ti, and then pushed back to the same 28nm process that Kepler utilized. Pascal caught up with 16nm, although we know that some hard, physical limitations are right around the corner. The lattice spacing for silicon at room temperature is around 0.54nm, so we're talking about features in the low 30s of atoms in width.

nvidia-2016-gtc-pascal-fivemiracles.png

This rumor claims that NVIDIA is not trying to go to 10nm for Volta. Instead, it will remain on the same 16nm node that Pascal currently occupies. This is quite interesting, because GPUs scale quite well with added complexity: they have many parallel units running at relatively low clock rates, so the only real ways to increase performance are to make the existing architecture more efficient or to make a larger chip.

That said, GP100 leaves a lot of room on the table for an FP32-optimized, ~600mm2 part to crush its performance at the high end, similar to how GM200 replaced GK110. The rumored GP102, expected in the ~450mm2 range for Titan or GTX 1080 Ti-style parts, has some room to grow. Like GM200, however, it would also be unappealing to GPU compute users who need FP64. If this is what is going on, and we're totally just speculating at the moment, it would signal that enterprise customers should expect a new GPGPU card every second gaming generation.

That is, of course, unless NVIDIA has found ways to make the Maxwell-based architecture significantly more die-space efficient in Volta. Clocks could get higher, or the circuits themselves could get simpler. You would think that, especially in the latter case, they would have integrated those ideas into Maxwell and Pascal already; but, like HBM2 memory, there might have been a reason why they couldn't.

We'll need to wait and see. The entire rumor could be crap, who knows?

Source: Fudzilla

AMD Reveals Radeon RX 460 and RX 470 Specifications

Subject: Graphics Cards | July 16, 2016 - 01:10 AM |
Tagged: rx 470, rx 460, polaris 11, polaris 10, gcn4, esports, amd

At a launch event in Australia earlier this week, AMD talked about its Polaris architecture, launched the RX 480, and revealed the specifications for the Polaris 10-based RX 470 and Polaris 11-derived RX 460 GPUs. The new budget GPUs are aimed at 1080p or lower gaming and will allegedly be available for purchase sometime in August.

AMD Polaris 10 and Polaris 11.png

First up is the AMD Radeon RX 470. This GPU is based on Polaris 10 (like the RX 480) but has some hardware disabled (mainly the number of stream processors). Built on the same 14nm process, the GPU has 2,048 cores running at as-yet-unknown clocks. Thankfully, AMD has left the memory interface intact, and the RX 470 uses the same 256-bit memory bus, pairing the GPU with 4GB of GDDR5 memory on the reference design and up to 8GB of GDDR5 on partner cards.

Speaking of the reference design, the reference RX 470 will utilize a blower-style cooler that AIBs can use, but AMD expects that partners will opt for their own custom dual- and triple-fan coolers (as would I). The card is powered by a single 6-pin power connector, though, again, AIBs are allowed to design a card with more.

This card is reportedly aimed at 1080p gaming at "ultra and max settings". Video outputs will include DisplayPort 1.3/1.4 with HDR support.

AMD Radeon RX 480 RX 470 and RX 460.png

Breaking away from Polaris 10 is the RX 460, which is the first GPU AMD has talked about using Polaris 11. This GCNv4 architecture is similar to its larger Polaris sibling but is further cut down and engineered for low-power and mobile environments. While the "full" Polaris 11 appears to have 16 CUs (Compute Units), the RX 460 will feature 14 of them (this should open up opportunities for lots of salvaged dies, and once yields are good enough we might see an RX 465 or similar with all of its stream processors enabled). With 14 CUs, the RX 460 has 896 stream processors (again, clock speeds were not discussed) and a 128-bit memory bus. AMD's reference design will pair this card with 2GB of GDDR5, but I would not be surprised to see 4GB versions, possibly in a gaming laptop SKU, if only because it looks better (heh). There is no external PCI-E power connector on this card, so it will draw all of its power from the PCI-E slot on the motherboard.

The reference graphics card is a tiny affair with a single-fan HSF and support for DP 1.3/1.4 HDR. AMD further mentions 4K H.264/HEVC encode and decode support. AMD is positioning this card at HTPCs and budget "eSports" gamers.

One other tidbit from the announcement: AMD reiterated its new "RX" naming scheme, saying that RX will be reserved for gaming and we will no longer see R9, R7, and R5 branding, though AMD did not rule out future non-RX products aimed at other, non-gaming workloads. I would expect that this will eventually apply to APU GPUs as well.

Naturally, AMD is not talking exact shipping dates or pricing, but expect them to be well under the $239 of the RX 480! I would guess that the RX 470 will be around the $150 mark while the RX 460 will be a sub-$100 part (if only barely).

What do you think about the RX 470 and RX 460? If you are interested in watching the whole event, there is a two part video of it available on YouTube. Part 1 and Part 2 are embedded below the break.

Source: VideoCardz
Author:
Manufacturer: Futuremark
Tagged:

Through the looking glass

Futuremark has been the most consistent and most utilized benchmark company for PCs for quite a long time. While other companies have faltered and faded, Futuremark continues to push forward with new benchmarks and capabilities in an attempt to maintain a modern way to compare performance across platforms with standardized tests.

Back in March of 2015, 3DMark added an API Overhead test to help gamers and editors understand the performance advantages of Mantle and DirectX 12 compared to existing APIs. Though the results were purely "peak theoretical" numbers, the data helped showcase to consumers and developers what low-level APIs brought to the table.

3dmark-time-spy-screenshot-2.jpg

Today Futuremark is releasing a new benchmark that focuses on DX12 gaming. No longer just a feature test, Time Spy is a fully baked benchmark with its own rendering engine and scenarios for evaluating the performance of graphics cards and platforms. It requires Windows 10 and a DX12-capable graphics card, and includes two different graphics tests and a CPU test. Oh, and of course, there is a stunningly gorgeous demo mode to go along with it.

I’m not going to spend much time here dissecting the benchmark itself, but it does make sense to have an idea of what kind of technologies are built into the game engine and tests. The engine is based purely on DX12, and integrates technologies like asynchronous compute, explicit multi-adapter and multi-threaded workloads. These are highly topical ideas and will be the focus of my testing today.

Futuremark provides an interesting diagram to demonstrate the advantages DX12 has over DX11. Below you will find a listing of the average number of vertices, triangles, patches and shader calls in 3DMark Fire Strike compared with 3DMark Time Spy.

daigram.png

It’s not even close here: the new Time Spy engine makes more than ten times the processing calls for some of these items. As Futuremark states, however, this kind of capability isn’t free.

With DirectX 12, developers can significantly improve the multi-thread scaling and hardware utilization of their titles. But it requires a considerable amount of graphics expertise and memory-level programming skill. The programming investment is significant and must be considered from the start of a project.

Continue reading our look at 3DMark Time Spy Asynchronous Compute performance!!