An odd Q2 for tablets and PCs

Subject: General Tech, Graphics Cards | August 19, 2014 - 12:30 PM |
Tagged: jon peddie, gpu market share, q2 2014

Jon Peddie Research's latest Market Watch adds even more ironic humour to the media's continuing proclamations of the PC industry's impending doom.  This quarter saw tablet sales decline while overall PC sales rose, and that was without any major releases to drive buyers to adopt new technology.  While JPR does touch on the overall industry, this report focuses on sales of GPUs and APUs, and it happens to contain some great news for AMD: their shipments grew 11% over last quarter, lifting their share of the entire market by just over a percentage point.  Intel saw a small rise in share and still holds the majority of the market, as PCs with no discrete GPU are more likely to contain Intel's chips than AMD's.  That leaves NVIDIA, which still banks solely on discrete GPUs and saw shipments fall more than 8% from last quarter, a drop of almost two percentage points of the total market.  Check out the other graphs in JPR's overview right here.

unnamed.jpg

"The big drop in graphics shipments in Q1 has been partially offset by a small rise this quarter. Shipments were up 3.2% quarter-to-quarter, and down 4.5% compared to the same quarter last year."

Here is some more Tech News from around the web:

Tech Talk

Khronos Announces "Next" OpenGL & Releases OpenGL 4.5

Subject: General Tech, Graphics Cards, Shows and Expos | August 15, 2014 - 08:33 PM |
Tagged: siggraph 2014, Siggraph, OpenGL Next, opengl 4.5, opengl, nvidia, Mantle, Khronos, Intel, DirectX 12, amd

Let's be clear: there are two stories here. The first is the release of OpenGL 4.5 and the second is the announcement of the "Next Generation OpenGL Initiative". Both appear in the same press release, but they are two different announcements.

OpenGL 4.5 Released

OpenGL 4.5 expands the core specification with a few extensions. Compatible hardware, with OpenGL 4.5 drivers, will be guaranteed to support these. This includes features like ARB_direct_state_access, which allows objects to be modified without first binding them to the context, and support for OpenGL ES 3.1 features that are traditionally missing from OpenGL 4, which makes porting OpenGL ES 3.1 applications to desktop OpenGL easier.

opengl_logo.jpg

It also adds a few new extensions as an option:

ARB_pipeline_statistics_query lets a developer ask the GPU what it has been doing. This could be useful for "profiling" an application (list completed work to identify optimization points).

ARB_sparse_buffer allows developers to perform calculations on pieces of large, generic buffers without committing memory for the whole thing. This is similar to ARB_sparse_texture... except that one is for textures. Buffers are useful for things like vertex data (and so forth).

ARB_transform_feedback_overflow_query is apparently designed to let developers decide whether or not to draw objects based on whether a transform feedback buffer overflowed. I might be wrong, but it seems like this would be useful for deciding whether or not to draw objects generated by geometry shaders.

KHR_blend_equation_advanced allows new blending equations between objects. If you use Photoshop, this would be "multiply", "screen", "darken", "lighten", "difference", and so forth. On NVIDIA's side, this will be directly supported on Maxwell and Tegra K1 (and later). Fermi and Kepler will support the functionality, but the driver will perform the calculations with shaders. AMD has yet to comment, as far as I can tell.

nvidia-opengl-debugger.jpg

Image from NVIDIA GTC Presentation

NVIDIA has launched 340.65 (340.23.01 for Linux) beta drivers for developers. If you are not looking to create OpenGL 4.5 applications, do not get this driver. You really should not have any use for it, at all.

Next Generation OpenGL Initiative Announced

The Khronos Group has also announced "a call for participation" to outline a new specification for graphics and compute. They want it to allow developers explicit control over CPU and GPU tasks, be multithreaded, have minimal overhead, have a common shader language, and "rigorous conformance testing". This sounds a lot like the design goals of Mantle (and what we know of DirectX 12).

amd-mantle-queues.jpg

And really, from what I hear and understand, that is what OpenGL needs at this point. Graphics cards look nothing like they did a decade ago (or over two decades ago). They each have very similar interfaces and data structures, even if their fundamental architectures vary greatly. If we can draw a line in the sand, legacy APIs can be supported but not optimized heavily by the drivers. After a short time, available performance for legacy applications would be so high that it wouldn't matter, as long as they continue to run.

Add to that, next-generation drivers should be significantly easier to develop, considering the reduced error checking (and other responsibilities). As I said in Intel's DirectX 12 story, it is still unclear whether it will lead to enough of a performance increase to make most optimizations, such as those which increase workload or developer effort in exchange for queuing fewer GPU commands, unnecessary. We will need to wait for game developers to use it for a bit before we know.

AMD Catalyst 14.7 Release Candidate 3

Subject: Graphics Cards | August 14, 2014 - 07:20 PM |
Tagged: catalyst 14.7 RC3, beta, amd

A new Catalyst Release Candidate has arrived, and as with the previous driver it no longer supports Windows 8.0 or WDDM 1.2, so upgrade to Windows 7 or Windows 8.1 before installing, please.  AMD will eventually release a driver which supports WDDM 1.1 under Windows 8.0 for those who do not upgrade.

AMD-Catalyst-12-11-Beta-11-7900-Modded-Driver-Crafted-for-Performance.jpg

Feature Highlights of the AMD Catalyst 14.7 RC3 Driver for Windows

  • Includes all improvements found in the AMD Catalyst 14.7 RC driver

  • Display interface enhancements to improve 4K monitor performance and reduce flickering.
  • Improvements apply to the following products:
    • AMD Radeon R9 290 Series
    • AMD Radeon R9 270 Series
    • AMD Radeon HD 7800 Series
  • Even with these improvements, cable quality and other system variables can affect 4K performance. AMD recommends using DisplayPort 1.2 HBR2 certified cables with a length of 2m (~6 ft) or less when driving 4K monitors.
  • Wildstar: AMD CrossFire profile support
  • Lichdom: Single GPU and Multi-GPU performance enhancements
  • Watch Dogs: Smoother gameplay on single GPU and Multi-GPU configurations

Feature Highlights of the AMD Catalyst 14.7 RC Driver for Windows

  • Includes all improvements found in the AMD Catalyst 14.6 RC driver
    • AMD CrossFire and AMD Radeon Dual Graphics profile update for Plants vs. Zombies
    • Assassin's Creed IV - improved CrossFire scaling (3840x2160, High settings) up to 93%
    • Collaboration with AOC has identified non-standard display timings as the root cause of 60Hz SST flickering exhibited by the AOC U2868PQU panel on certain AMD Radeon graphics cards.
    • A software workaround has been implemented in the AMD Catalyst 14.7 RC driver to resolve the display timing issues with this display. Users are further encouraged to obtain newer display firmware from AOC that will resolve flickering at its origin.
    • Users are additionally advised to utilize DisplayPort-certified cables to ensure the integrity of the DisplayPort data connection.

Feature Highlights of the AMD Catalyst 14.6 RC Driver for Windows

  • Plants vs. Zombies (Direct3D performance improvements):
    • AMD Radeon R9 290X - 1920x1080 Ultra – improves up to 11%
    • AMD Radeon R9 290X - 2560x1600 Ultra – improves up to 15%
    • AMD Radeon R9 290X CrossFire configuration (3840x2160 Ultra) - 92% scaling
  • 3DMark Sky Diver improvements:
    • AMD A4-6300 – improves up to 4%
    • Enables AMD Dual Graphics/AMD CrossFire support
  • Grid Auto Sport: AMD CrossFire profile
  • Wildstar: Power Xpress profile
    • Performance improvements to improve smoothness of application
    • Performance improves up to 24% at 2560x1600 on the AMD Radeon R9 and R7 Series of products for both single GPU and multi-GPU configurations.
  • Watch Dogs: AMD CrossFire – Frame pacing improvements
  • Battlefield Hardline Beta: AMD CrossFire profile

Known Issues

  • Running Watch Dogs with a R9 280X CrossFire configuration may result in the application running in CrossFire software compositing mode
  • Enabling Temporal SMAA in a CrossFire configuration when playing Watch Dogs will result in flickering
  • AMD CrossFire configurations with AMD Eyefinity enabled will see instability with Battlefield 4 or Thief when running Mantle
  • Catalyst Install Manager text is covered by Express/Custom radio button text
  • Express Uninstall does not remove C:\Program Files\(AMD or ATI) folder
Source: AMD

Intel and Microsoft Show DirectX 12 Demo and Benchmark

Subject: General Tech, Graphics Cards, Processors, Mobile, Shows and Expos | August 13, 2014 - 09:55 PM |
Tagged: siggraph 2014, Siggraph, microsoft, Intel, DirectX 12, directx 11, DirectX

Along with GDC Europe and Gamescom, Siggraph 2014 is going on in Vancouver, BC. There, Intel had a DirectX 12 demo at their booth. This scene, containing 50,000 asteroids, each in its own draw call, was developed on both Direct3D 11 and Direct3D 12 code paths and could apparently be switched while the demo is running. Intel claims to have measured both power and frame rate.

intel-dx12-LockedFPS.png

Variable power to hit a desired frame rate, DX11 and DX12.

The test system is a Surface Pro 3 with an Intel HD 4400 GPU. Doing a bit of digging, this would make it the i5-based Surface Pro 3. Removing another shovel-load of mystery, this would be the Intel Core i5-4300U with two cores, four threads, 1.9 GHz base clock, up to 2.9 GHz turbo clock, 3MB of cache, and (of course) based on the Haswell architecture.

While not top-of-the-line, it is also not bottom-of-the-barrel. It is a respectable CPU.

Intel's demo on this processor shows a significant power reduction in the CPU, and even a slight decrease in GPU power, for the same target frame rate. If power was not throttled, Intel's demo goes from 19 FPS all the way up to a playable 33 FPS.

Intel will discuss more during a video interview, tomorrow (Thursday) at 5pm EDT.

intel-dx12-unlockedFPS-1.jpg

Maximum power in DirectX 11 mode.

For my contribution to the story, I would like to address the first comment on the MSDN article. It claims that this is just an "ideal scenario" of a scene that is bottlenecked by draw calls. The thing is: that is the point. Sure, a game developer could optimize the scene to (maybe) instance objects together, and so forth, but that is unnecessary work. Why should programmers, or worse, artists, need to spend so much of their time developing art so that it can be batched together into fewer, bigger commands? Would it not be much easier, and all-around better, if the content could be developed as it most naturally comes together?

That, of course, depends on how much performance improvement we will see from DirectX 12, compared to theoretical max efficiency. If pushing two workloads through a DX12 GPU takes about the same time as pushing one, double-sized workload, then it allows developers to, literally, perform whatever solution is most direct.

intel-dx12-unlockedFPS-2.jpg

Maximum power when switching to DirectX 12 mode.

If, on the other hand, pushing two workloads is 1000x slower than pushing a single, double-sized one, but DirectX 11 was 10,000x slower, then it could be less relevant because developers will still need to do their tricks in those situations. The closer it gets, the fewer occasions that strict optimization is necessary.
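The trade-off being described can be put in rough numbers with a toy cost model (the per-call overhead figures below are invented for illustration, not measured API costs):

```python
def frame_cost_ms(draw_calls, per_call_overhead_ms, gpu_work_ms):
    """CPU submission cost plus GPU work for one frame (toy model)."""
    return draw_calls * per_call_overhead_ms + gpu_work_ms

# 50,000 draw calls, as in the asteroid demo, with hypothetical
# per-call overheads: a DX11-like path vs a 90%-cheaper DX12-like path.
dx11_ms = frame_cost_ms(50_000, 0.001, 10.0)    # ~60 ms per frame
dx12_ms = frame_cost_ms(50_000, 0.0001, 10.0)   # ~15 ms per frame
```

Under these made-up numbers, cutting per-call overhead by 90% takes the same scene from roughly 17 FPS to roughly 67 FPS without touching the art.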

If there are any DirectX 11 game developers, artists, and producers out there, we would like to hear from you. How much would a (let's say) 90% reduction in draw call latency (which is around what Mantle claims) give you, in terms of fewer required optimizations? Can you afford to solve problems "the naive way" now? Some of the time? Most of the time? Would it still be worth it to do things like object instancing and fewer, larger materials and shaders? How often?

To boldly go where no 290X has gone before?

Subject: Graphics Cards | August 13, 2014 - 06:11 PM |
Tagged: factory overclocked, sapphire, R9 290X, Vapor-X R9 290X TRI-X OC

As far as factory overclocks go, the 1080MHz core and 5.64GHz RAM on the new Sapphire Vapor-X 290X is impressive, taking the prize for the highest factory overclock [H]ard|OCP has seen on this card yet.  That didn't stop them from pushing it to 1180MHz and 5.9GHz after a little work, which is even more impressive.  At both the factory and manual overclocks the card handily beat the reference model, and the manually overclocked benchmarks could meet or beat the overclocked MSI GTX 780 Ti GAMING 3G OC card.  Speed is not the only good feature: Intelligent Fan Control keeps two of the three fans from spinning when the GPU is under 60C, which vastly reduces the noise produced by this card.  It is currently selling for $646, below the $710 that the GeForce currently commands.

1406869221rJVdvhdB2o_1_6_l.jpg

"We take a look at the SAPPHIRE Vapor-X R9 290X TRI-X OC video card which has the highest factory overclock we've ever encountered on any AMD R9 290X video card. This video card is feature rich and very fast. We'll overclock it to the highest GPU clocks we've seen yet on R9 290X and compare it to the competition."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

Time to update your Gallium3D

Subject: General Tech, Graphics Cards | August 6, 2014 - 01:34 PM |
Tagged: radeon, Gallium3D, catalyst 14.6 Beta, linux, ubuntu 14.04

The new Gallium3D code is up against the proprietary Catalyst 14.6 Beta, running under Ubuntu 14.04 on both the 3.14 and 3.16 Linux kernels, giving Phoronix quite a bit of testing to do.  They have numerous cards in their test, ranging from an HD 6770 to an R9 290, though unfortunately there are no Gallium3D results for the R9 290 as it will not function until the release of the Linux 3.17 kernel.  Overall, the 14.6 Beta remains the best performer, but the open source alternative is quickly closing the gap.

image.php_.jpg

"After last week running new Nouveau vs. NVIDIA proprietary Linux graphics benchmarks, here's the results when putting AMD's hardware on the test bench and running both their latest open and closed-source drivers. Up today are the results of using the latest Radeon Gallium3D graphics code and Linux kernel against the latest beta of the binary-only Catalyst driver."

Here is some more Tech News from around the web:

Tech Talk

Source: Phoronix

Rumor: NVIDIA GeForce GTX 880 Is Actually September?

Subject: General Tech, Graphics Cards | August 3, 2014 - 04:59 PM |
Tagged: nvidia, maxwell, gtx 880

Just recently, we posted a story claiming NVIDIA was preparing to launch high-end Maxwell in the October/November time frame. Apparently, that was generous. The graphics company is now said to be announcing its GeForce GTX 880 in mid-September, with availability coming later in the month. It is expected to be based on the GM204 chip (which previous rumors claim is 28nm).

nvidia-geforce.png

It is expected that the GeForce GTX 880 will be available with 4GB of video memory, with an 8GB version possible at some point. As someone who runs multiple (five) monitors, I can tell you that 2GB is not enough for my use case. Windows 7 says the same: it kicks me out of applications to tell me that it does not have enough video memory. This would be enough reason for me to get more GPU memory.

We still do not know how many CUDA cores will be present in the GM204 chip, or if the GeForce GTX 880 will have all of them enabled (but I would be surprised if it didn't). Without any way to derive its theoretical performance, we cannot compare it against the GTX 780 or 780 Ti. It could be significantly faster, marginally faster, or somewhere in between.

But we will probably find out within two months.

Source: Videocardz

AMD Releases FreeSync Information as a FAQ

Subject: General Tech, Graphics Cards, Displays | July 29, 2014 - 09:02 PM |
Tagged: vesa, nvidia, g-sync, freesync, DisplayPort, amd

Dynamic refresh rates have two main purposes: save power by only forcing the monitor to refresh when a new frame is available, and increase animation smoothness by synchronizing to draw rates (rather than "catching the next bus" at 16.67ms, on the 16.67ms, for 60 Hz monitors). Mobile devices prefer the former, while PC gamers are interested in the latter.
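That "catching the next bus" behaviour can be sketched in a few lines (a simplified model with made-up frame times, ignoring real-world details like minimum refresh intervals):

```python
import math

REFRESH_MS = 1000.0 / 60.0  # one "bus" every ~16.67 ms at 60 Hz

def vsync_display_ms(render_done_ms):
    """Fixed refresh: the frame waits for the next refresh tick."""
    return math.ceil(render_done_ms / REFRESH_MS) * REFRESH_MS

def adaptive_display_ms(render_done_ms):
    """Dynamic refresh: the monitor refreshes when the frame is ready."""
    return render_done_ms
```

A frame finished at 20 ms just misses the 16.67 ms tick and is held until ~33.3 ms on a fixed 60 Hz panel, while a dynamic-refresh display can show it immediately.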

Obviously, the video camera nullifies the effect.

NVIDIA was first to make this public with G-Sync. AMD responded with FreeSync, starting with a proposal that was later ratified by VESA as DisplayPort Adaptive-Sync. AMD, then, took up "Project FreeSync" as an AMD "hardware/software solution" to make use of DisplayPort Adaptive-Sync in a way that benefits PC gamers.

Today's news is that AMD has just released an FAQ which explains the standard much more thoroughly than they have in the past. For instance, it clarifies the distinction between DisplayPort Adaptive-Sync and Project FreeSync. Prior to the FAQ, I thought that FreeSync became DisplayPort Adaptive-Sync, and that was that. Now, it is sounding a bit more proprietary, just built upon an open, VESA standard.

If interested, check out the FAQ at AMD's website.

Source: AMD

NVIDIA 340.52 Drivers Are Now Available

Subject: General Tech, Graphics Cards | July 29, 2014 - 08:27 PM |
Tagged: nvidia, geforce, graphics drivers, shield tablet, shield

Alongside the NVIDIA SHIELD Tablet launch, the company has released their GeForce 340.52 drivers. This version allows compatible devices to use GameStream and is also optimized for Metro: Redux and Final Fantasy XIV (China).

nvidia-geforce.png

The driver supports GeForce 8-series graphics cards and later. As a reminder, for GPUs that are not based on the Fermi architecture (or later), 340.xx will be your last driver version. NVIDIA does intend to provide extended support for 340.xx (and earlier) drivers until April 1st, 2016. But when Fermi, Kepler, and Maxwell move on to 343.xx, Tesla and earlier will not. That said, most of the content of this driver is aimed at Kepler and later. Either way, the driver itself is available for those pre-Fermi cards.

I should also mention that a user on Anandtech's forums noted the removal of Miracast from NVIDIA documentation. NVIDIA has yet to comment, though it is still very short notice at this point.

Source: NVIDIA

This high end multi-GPU 4k showdown includes overclocking

Subject: Graphics Cards | July 29, 2014 - 02:27 PM |
Tagged: asus, gtx 780, R9 290X DC2 OC, sli, crossfire, STRIX GTX 780 OC 6GB, R9 290X

We have seen [H]ard|OCP test ASUS' STRIX GTX 780 OC 6GB and R9 290X DirectCU II before, but this time they have been overclocked and paired up for a 4K showdown.  For a change, Newegg gives the price advantage to AMD: $589 versus $599 at the time of writing (with odd blips in prices on Amazon).   The GTX 780 has been set to 1.2GHz and 6.6GHz while the 290X runs 1.1GHz and 5.6GHz; keep in mind dual GPU setups may not reach the same frequencies as single cards.  Read on for their conclusions and decide if you prefer to brag about a higher overclock or have better overall performance.

14060235239aDa7rbLPT_1_1_l.jpg

"We take the ASUS STRIX GTX 780 OC 6GB video card and run two in SLI and overclock both of these at 4K resolutions to find the ultimate gameplay performance with 6GB of VRAM. We will also compare these to two overclocked ASUS Radeon R9 290X DirectCU II CrossFire video cards for the ultimate VRAM performance showdown."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

Raptr Update Available (for Both NVIDIA and AMD GPUs)

Subject: General Tech, Graphics Cards | July 28, 2014 - 09:00 AM |
Tagged: raptr, pc game streaming

Raptr seems to be gaining in popularity. Total playtime recorded by the online service was up 15% month-over-month, from May to June. The software is made up of a few features that are designed to make the lives of PC gamers easier and better, ranging from optimizing game settings to recording gameplay. If you have used a recent version of GeForce Experience, then you probably have a good idea of what Raptr does.

raptr_game_settings_example_balanced.jpg

Today, Raptr has announced a major new update. The version's headlining feature is hardware-accelerated video recording, and streaming, for both AMD and NVIDIA GPUs. Raptr claims that their method leads to basically no performance loss, regardless of which GPU vendor is used. Up to 20 minutes of previous gameplay can be recorded after the fact, and video of unlimited length can be streamed on demand.

Raptr_WOW-Quality-Video.jpg

Notice the recording overlay in the top left.

The other, major feature of this version is enhanced sharing of said videos. They can be uploaded to Raptr.com and shared to Facebook and Twitter, complete with hashtags (#BecauseYolo?)

If interested, check out Raptr at their website.

Source: Raptr

Rumor: NVIDIA GeForce 800-Series Is 28nm in Oct/Nov.

Subject: General Tech, Graphics Cards | July 24, 2014 - 07:32 PM |
Tagged: nvidia, gtx 880

Many of our readers were hoping to drop one (or more) Maxwell-based GPUs in their system for use with their 4K monitors, 3D, or whatever else they need performance for. That has not happened, nor do we even know, for sure, when it will. The latest rumors claim that the NVIDIA GeForce GTX 870 and 880 desktop GPUs will arrive in October or November. More interestingly, they are expected to be based on GM204 at the current, 28nm process.

nvidia-pascal-roadmap.jpg

The recent GPU roadmap, as of GTC 2014

NVIDIA has not commented on the delay, at least that I know of, but we can tell something is up from their significantly different roadmap. We can also make a fairly confident guess by paying attention to the industry as a whole. TSMC has been struggling to keep up with 28nm production, having increased wait times by six extra weeks in May, according to Digitimes, and whatever 20nm capacity they had was reportedly gobbled up by Apple until just recently. At around the same time, NVIDIA inserted Pascal between Maxwell and Volta, with 3D memory, NVLink, and some unified memory architecture (which I don't believe they have elaborated on yet).

nvidia-previous-roadmap.jpg

The previous roadmap. (Source: Anandtech)

And, if this rumor is true, Maxwell was pushed from 20nm to a wholly 28nm architecture. It was originally supposed to be the host of unified virtual memory, not Pascal. If I had to make a safe guess, I would assume that NVIDIA needed to redesign their chip for 28nm and, especially with the extra delays at TSMC, cannot get the volume they need until autumn.

Lastly, going by the launch of the 750 Ti, Maxwell will basically be a cleaned-up Kepler architecture. Its compute units were shifted into power-of-two partitions, reducing die area for scheduling logic (and so forth). NVIDIA has been known to stash a few features into each generation, sometimes revealing them well after retail availability, so that is not to say that Maxwell will be merely "a more efficient Kepler".

I expect its fundamental architecture should be pretty close, though.

Source: KitGuru

NVIDIA Preparing GeForce 800M (Laptop) Maxwell GPUs?

Subject: General Tech, Graphics Cards, Mobile | July 19, 2014 - 03:29 AM |
Tagged: nvidia, geforce, maxwell, mobile gpu, mobile graphics

Apparently, some hardware sites got their hands on an NVIDIA driver listing with several new product codes. They claim thirteen N16(P/E) chips are listed (although I count twelve (??)). While I do not have much knowledge of NVIDIA's internal product structure, the GeForce GTX 880M, based on Kepler, is apparently listed as N15E.

nvidiamaxwellroadmap.jpg

Things have changed a lot since this presentation.

These new parts will allegedly be based on the second-generation Maxwell architecture. Also, the source believes that these new GPUs will be in the GeForce GTX 800-series, possibly with the MX suffix that was last seen in October 2012 with the GeForce GTX 680MX. Of course, being a long-time PC gamer, the MX suffix does not exactly ring positive in my memory. It used to be the Ti line that you wanted, and the MX line that you could afford. But who am I kidding? None of that is relevant these days. Get off my lawn.

Source: Videocardz

Intel AVX-512 Expanded

Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 03:05 AM |
Tagged: Xeon Phi, xeon, Intel, avx-512, avx

It is difficult to know what is actually new information in this Intel blog post, but it is interesting nonetheless. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and coprocessors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, as well as Conflict Detection, Exponential and Reciprocal, and Prefetch, which are optional. That earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions will make their way to standard, CPU-like Xeons at around the same time.

Intel_Xeon_Phi_Family.jpg

This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.

 
  Instruction group                         Xeon Processors   Xeon Phi Processors   Xeon Phi Coprocessors (AIBs)
  Foundation Instructions                   Yes               Yes                   Yes
  Conflict Detection Instructions           Yes               Yes                   Yes
  Exponential and Reciprocal Instructions   No                Yes                   Yes
  Prefetch Instructions                     No                Yes                   Yes
  Byte and Word Instructions                Yes               No                    No
  Doubleword and Quadword Instructions      Yes               No                    No
  Vector Length Extensions                  Yes               No                    No
Source: Intel AVX-512 Blog Post (and my understanding thereof).

So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and works just fine alongside multiple cores, too). For an example: imagine that you have to multiply two colors together. The direct way to do it is to multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! and, later, Intel's SSE included instructions to multiply two four-component vectors together. This reduces four similar instructions into a single operation between wider registers.

Smart compilers (and programmers, although that is becoming less common as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if it undergoes similar instructions. AVX-512 allows sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four single-precision RGBA values, but you are looping through 2 million pixels, do four pixels at a time (16 components).

For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.
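NumPy makes a handy stand-in for this idea, since its elementwise array operations are exactly the kind of work SIMD units (and auto-vectorizing compilers) accelerate. This is a conceptual sketch with made-up pixel values, not AVX-512 intrinsics:

```python
import numpy as np

# Eight RGBA pixels as single-precision floats (32 components total).
pixels = (np.arange(32, dtype=np.float32) / 32.0).reshape(8, 4)
tint = np.full((8, 4), 0.5, dtype=np.float32)

# One elementwise multiply covers every component; with 512-bit
# registers the hardware can process 16 floats -- four whole RGBA
# pixels -- per instruction instead of one float at a time.
blended = pixels * tint
```

The loop the hardware would otherwise run, one multiply per component, simply disappears into wide register operations.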

This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.

Source: Intel

Google I/O 2014: Android Extension Pack Announced

Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | July 7, 2014 - 04:06 AM |
Tagged: tegra k1, OpenGL ES, opengl, Khronos, google io, google, android extension pack, Android

Sure, this is a little late. Honestly, when I first heard the announcement, I did not see much news in it. The slide from the keynote (below) showed four points: Tesselation, Geometry Shaders, Computer [sic] Shaders, and ASTC Texture Compression. I thought tessellation and geometry shaders were part of the OpenGL ES 3.1 spec, like compute shaders. This led to my immediate reaction: "Oh cool. They implemented OpenGL ES 3.1. Nice. Not worth a news post."

google-android-opengl-es-extensions.jpg

Image Credit: Blogogist

Apparently, they were not part of the ES 3.1 spec (although compute shaders are). My mistake. It turns out that Google is cooking their own vendor-specific extensions. This is quite interesting, as it adds functionality to the API without the developer needing to target a specific GPU vendor (INTEL, NV, ATI, AMD), waiting for approval from the Architecture Review Board (ARB), or using multi-vendor extensions (EXT). In other words, it sounds like developers can target Google's vendor without knowing the actual hardware.

Hiding the GPU vendor from the developer is not the only reason for Google to host their own vendor extension. The added features are mostly from full OpenGL. This makes sense, because it was announced with NVIDIA and their Tegra K1, Kepler-based SoC. Full OpenGL compatibility was NVIDIA's selling point for the K1, due to its heritage as a desktop GPU. But, instead of requiring apps to be programmed with full OpenGL in mind, Google's extension pushes it to OpenGL ES 3.1. If the developer wants to dip their toe into OpenGL, then they could add a few Android Extension Pack features to their existing ES engine.

Epic Games' Unreal Engine 4 "Rivalry" Demo from Google I/O 2014.

The last feature, ASTC Texture Compression, was an interesting one. Apparently the Khronos Group, owners of OpenGL, were looking for a new generation of texture compression technologies. NVIDIA suggested their ZIL technology. ARM and AMD also proposed "Adaptive Scalable Texture Compression". ARM and AMD won, although the Khronos Group stated that the collaboration between ARM and NVIDIA made both proposals better than either in isolation.

Android Extension Pack is set to launch with "Android L". The next release of Android is not currently associated with a snack food. If I was their marketer, I would block out the next three versions as 5.x, and name them (L)emon, then (M)eringue, and finally (P)ie.

Would I do anything with the two skipped letters before pie? (N)(O).

ASUS STRIX GTX 780 OC 6GB in SLI, better than a Titan and less expensive to boot!

Subject: Graphics Cards | July 4, 2014 - 01:40 PM |
Tagged: STRIX GTX 780 OC 6GB, sli, crossfire, asus, 4k

Multiple monitor and 4K testing of the ASUS STRIX GTX 780 OC cards in SLI is not about the 52MHz out-of-box overclock but about the 12GB of VRAM that your system will have.  Apart from an issue with BF4, [H]ard|OCP tested the STRIX against a pair of reference GTX 780s and R9 290X cards at resolutions of 5760x1200 and 3840x2160.   The extra RAM made the STRIX shine in comparison to the reference card: not only was the performance better, [H] could also raise many of the graphical settings, though that was not enough to push its performance past the 290X cards in CrossFire.  One other takeaway from this review is that even 6GB of VRAM is not enough to run Watch_Dogs with Ultra textures at these resolutions.

1402436254j0CnhAb2Z5_1_20_l.jpg

"You’ve seen the new ASUS STRIX GTX 780 OC Edition 6GB DirectCU II video card, now let’s look at two of these in an SLI configuration! We will explore 4K and NV Surround performance with two ASUS STRIX video cards for the ultimate high-resolution experience and see if the extra memory helps this GPU make better strides at high resolutions."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

Intel's Knights Landing (Xeon Phi, 2015) Details

Subject: General Tech, Graphics Cards, Processors | July 2, 2014 - 03:55 AM |
Tagged: Intel, Xeon Phi, xeon, silvermont, 14nm

AnandTech has just published a large editorial detailing Intel's Knights Landing. Mostly, it is stuff that we already knew from previous announcements and leaks, such as one by VR-Zone from last November (which we reported on). Officially, few details were given back then, except that it would be available as either a PCIe-based add-in board or as a socketed, bootable, x86-compatible processor based on the Silvermont architecture. Its many cores, threads, and 512-bit registers are each pretty weak compared to Haswell, for instance, but combine to about 3 TFLOPs of double precision performance.

itsbeautiful.png

Not enough graphs. Could use another 256...

The best way to imagine it is running a PC with a modern, Silvermont-based Atom processor -- only with up to 288 logical processors listed in your Task Manager (72 physical cores, each with four-way Hyper-Threading).

The main limitation of GPUs (and similar coprocessors), however, is memory bandwidth. GDDR5 is often the main bottleneck of compute performance and just about the first thing to be optimized. To compensate, Intel is packaging up to 16GB of stacked DRAM on the package itself. This RAM is based on "Hybrid Memory Cube" (HMC), developed by Micron Technology and supported by the Hybrid Memory Cube Consortium (HMCC). While the actual memory used in Knights Landing is derived from HMC, it uses a proprietary interface that is customized for Knights Landing. Its bandwidth is rated at around 500GB/s. For comparison, the NVIDIA GeForce Titan Black has 336.4GB/s of memory bandwidth.
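Those two figures also hint at why on-package memory matters so much. A rough, back-of-the-envelope arithmetic-intensity check (using only the numbers quoted above, so treat it as illustrative) shows how many double-precision operations a kernel would need per double fetched from memory before compute, rather than bandwidth, becomes the limit:

```python
# Illustrative roofline-style estimate from the figures quoted above.
peak_flops = 3e12          # ~3 TFLOPs double precision
bandwidth_bytes = 500e9    # ~500 GB/s on-package memory
doubles_per_sec = bandwidth_bytes / 8  # 8 bytes per double

# FLOPs a kernel must do per double read before bandwidth stops mattering.
flops_per_double = peak_flops / doubles_per_sec
print(flops_per_double)  # 48.0
```

In other words, a kernel that does fewer than roughly 48 operations per value it streams in is bandwidth-bound even with 500GB/s of memory, which is why GDDR5-class bandwidth is "the first thing to be optimized."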

Intel and Micron have worked together in the past. In 2006, the two companies formed "IM Flash" to produce the NAND flash for Intel and Crucial SSDs. Crucial is Micron's consumer-facing brand.

intel-knights-landing.jpg

So the vision for Knights Landing seems to be the bridge between CPU-like architectures and GPU-like ones. For compute tasks, GPUs edge out CPUs by crunching through bundles of similar tasks at the same time, across many (hundreds of, thousands of) computing units. The difference with (at least socketed) Xeon Phi processors is that, unlike most GPUs, Intel does not rely upon APIs, such as OpenCL, and drivers to translate a handful of functions into bundles of GPU-specific machine language. Instead, especially if the Xeon Phi is your system's main processor, it will run standard, x86-based software. The software will just run slowly, unless it is capable of vectorizing itself and splitting across multiple threads. Obviously, OpenCL (and other APIs) would make this parallelization easy, by their host/kernel design, but it is apparently not required.
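To make the "vectorize and split across threads" point concrete, here is a minimal, hypothetical Python sketch (Python is a stand-in for illustration only; real Xeon Phi code would be C, C++, or Fortran leaning on the 512-bit vector units) of decomposing one loop's work across worker threads:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_dot(args):
    """Dot product over one contiguous slice of the input vectors."""
    xs, ys, lo, hi = args
    return sum(x * y for x, y in zip(xs[lo:hi], ys[lo:hi]))

def parallel_dot(xs, ys, workers=4):
    """Split a dot product into chunks and farm them out to threads."""
    n = len(xs)
    step = (n + workers - 1) // workers  # ceiling division
    chunks = [(xs, ys, i, min(i + step, n)) for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_dot, chunks))

xs = list(range(1000))
ys = list(range(1000))
assert parallel_dot(xs, ys) == sum(x * y for x, y in zip(xs, ys))
```

The per-chunk inner loop is exactly the part a compiler would additionally vectorize across wide registers; software that cannot be decomposed this way simply runs on one weak core.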

It is a cool way for Intel to arrive at the same goal, coming from its own background. Especially when you mix and match Xeons and Xeon Phis in the same computer, it is a push toward heterogeneous computing -- with a lot of specialized threads backing up a handful of strong ones. I just wonder if providing a more direct method of programming will really help developers finally adopt massively parallel coding practices.

I mean, without even considering GPU compute, how efficient is most software at splitting into even two threads? Four threads? Eight threads? Can this help drive heterogeneous development? Or will this product simply try to appeal to those who are already considering it?

Source: Intel

AMD Catalyst 14.6 RC is now available

Subject: Graphics Cards | June 24, 2014 - 07:00 PM |
Tagged: amd, beta, Catalyst 14.6 RC

Starting with AMD Catalyst 14.6 Beta, AMD will no longer support Windows 8.0 (and the WDDM 1.2 driver), so Windows 8.0 users should upgrade to Windows 8.1; AMD Catalyst 14.4 will continue to work on Windows 8.0.

The WDDM 1.1 Windows 7 driver currently works on Windows 7 and, in a future release, will be used to install updated drivers under Windows 8.0.

Features of the latest Catalyst include:

  • Plants vs. Zombies (Direct3D performance improvements):
    • AMD Radeon R9 290X - 1920x1080 Ultra – improves up to 11%
    • AMD Radeon R9 290X - 2560x1600 Ultra – improves up to 15%
    • AMD Radeon R9 290X CrossFire configuration (3840x2160 Ultra) - 92% scaling
  • 3DMark Sky Diver improvements:
    • AMD A4 6300 – improves up to 4%
    • Enables AMD Dual Graphics / AMD CrossFire support
  • Grid Auto Sport: AMD CrossFire profile
  • Wildstar:
    • Power Xpress profile
    • Performance improvements for smoother gameplay
  • Watch Dogs: AMD CrossFire – Frame pacing improvements
  • Battlefield Hardline Beta: AMD CrossFire profile

Get the driver and more information right here.

images.jpg

Source: AMD

AMD Planning Open Source GameWorks Competitor, Mantle for Linux

Subject: Graphics Cards | June 19, 2014 - 10:35 AM |
Tagged: video, richard huddy, radeon, openworks, Mantle, freesync, amd

On Tuesday, AMD's newly minted Gaming Scientist, Richard Huddy, stopped by the PC Perspective office to talk about the current state of the company's graphics division. The entire video of the interview is embedded below but several of the points that are made are quite interesting and newsworthy. During the discussion we hear about Mantle on Linux, a timeline for Mantle being opened publicly as well as a surprising new idea for a competitor to NVIDIA's GameWorks program.

Richard is new to the company but not new to the industry: he started with 3DLabs many years ago and has since held jobs at NVIDIA, ATI, and Intel before now returning to AMD. The role of Gaming Scientist is to interface directly with game developers and make sure that the GPU hardware designers are working hand in hand with future, high-end graphics technology. In essence, Huddy's job is to make sure AMD continues to innovate on the hardware side to facilitate innovation on the software side.

AMD Planning an "OpenWorks" Program

(33:00) After the volume of discussion surrounding the NVIDIA GameWorks program and its potential to harm the gaming ecosystem by not providing source code in an open manner, Huddy believes that the answer to the problem is simply for NVIDIA to release the SDK with source code publicly. Whether or not NVIDIA takes that advice remains to be seen, but if they don't, it appears that AMD is going down the road of creating its own competing solution that is open and flexible.

The idea of OpenFX, or OpenWorks as Huddy refers to it, is to create an open source repository for gaming code and effects examples that can be updated, modified, and improved upon by anyone in the industry. AMD would be willing to start the initiative by donating its entire SDK to the platform and then invite other software developers, as well as other hardware developers, to add to or change the collection. The idea is to create a competitor to what GameWorks accomplishes, but in a license-free and open way.

gameworks.jpg

NVIDIA GameWorks has been successful; can AMD OpenWorks derail it?

Essentially, the "OpenWorks" repository would work much like a Linux-style open source project, where the public has access to the code and can submit changes that anyone else can then adopt. Someone could easily improve performance for specific hardware, and if that change degraded performance on other hardware, it could just as easily be amended. Huddy believes this is how you move the industry forward and how you ensure that the gamer is getting the best overall experience regardless of the specific platform they are using.

"OpenWorks" is still in the planning stages and AMD is only officially "talking about it" internally. However, bringing Huddy back to AMD wasn't done without some direction already in mind and it would not surprise me at all if this was essentially a done deal. Huddy believes that other hardware companies like Qualcomm and Intel would participate in such an open system but the real question is whether or not NVIDIA, as the discrete GPU market share leader, would be in any way willing to do as well.

Still, this initiative continues to show the differences between the NVIDIA and AMD style of doing things. NVIDIA prefers a more closed system that it has full control over to perfect the experience, to hit aggressive timelines and to improve the ecosystem as they see it. AMD wants to provide an open system that everyone can participate in and benefit from but often is held back by the inconsistent speed of the community and partners. 

Mantle to be Opened by end of 2014, Potentially Coming to Linux

(7:40) The AMD Mantle API has been an industry-changing product; I don't think anyone can deny that. Even if you don't own AMD hardware or don't play any of the games currently shipping with Mantle support, the refocusing on a higher-efficiency API has impacted NVIDIA's direction with DX11, Microsoft's plans for DX12, and perhaps even Apple's direction with Metal. But for a company that pushes the idea of open standards so heavily, AMD has yet to offer up Mantle source code in a similar fashion to its standard SDK. As it stands right now, Mantle is only given to a group of software developers in the beta program and is specifically tuned for AMD's GCN graphics hardware.

mantlepic.jpg

Huddy reiterated that AMD has made a commitment to release a public SDK for Mantle by the end of 2014, which would allow any other hardware vendor to create a driver that could run Mantle game titles. If AMD lives up to its word and releases the full source code, then in theory NVIDIA could offer support for Mantle games on GeForce hardware, and Intel could offer support for those same games on Intel HD Graphics. There would be no license fees and no restrictions at all.

The obvious question is whether or not any other IHV would choose to do so, both for competitive reasons and because of the proximity of DX12's release in late 2015. Huddy agrees with me that the pride of these other hardware vendors may prevent them from considering Mantle adoption, though the argument can be made that the work required to implement it properly might not be worth the effort with DX12 (and its very similar feature set) around the corner.

(51:45) When asked about AMD's input on SteamOS and its commitment to the gamers who see that as the future, Huddy mentioned that AMD was considering, but not promising, bringing the Mantle API to Linux. If the opportunity exists, says Huddy, to give the gamer a better experience on that platform with the help of Mantle, and developers ask AMD for the support, then AMD will at the very least "listen to that." It would be incredibly interesting to see a competitor API in the landscape of Linux, where OpenGL is essentially the only game in town.

AMD FreeSync / Adaptive Sync Benefits

(59:15) Huddy discussed the differences, as he sees them, between NVIDIA's G-Sync technology and the AMD option originally called FreeSync but now officially called Adaptive Sync as part of the DisplayPort 1.2a standard. Besides the obvious difference in added hardware and licensing costs, Adaptive Sync is apparently going to be easier to implement because the maximum and minimum refresh rates are negotiated by the display and the graphics card when the monitor is plugged in. G-Sync requires a whitelist in the NVIDIA driver to work today, and as long as NVIDIA keeps that list updated, the impact on gamers buying panels should be minimal. But with DP 1.2a and properly implemented Adaptive Sync monitors, once a driver supports the negotiation it doesn't require knowledge about the specific model beforehand.

freesync1.jpg

AMD demos FreeSync at Computex 2014

According to Huddy, the new Adaptive Sync specification will go as high as 240 Hz and as low as 9 Hz; these are specifics that weren't known before today. Of course, not every panel (and maybe no panel) will support that extreme a range of variable refresh rates, but it leaves a lot of potential for improved panel development in the years to come. More likely you'll see Adaptive Sync-ready displays listing ranges closer to 30-60 Hz or 30-80 Hz initially.
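As a rough, hypothetical sketch of what that negotiated range buys you (the function and its logic are illustrative, not AMD's actual driver behavior): once the driver knows the panel's minimum and maximum, each frame's refresh just has to land inside that window.

```python
def effective_refresh(frame_rate_hz, panel_min_hz, panel_max_hz):
    """Clamp the game's frame rate into the panel's negotiated VRR window."""
    return max(panel_min_hz, min(frame_rate_hz, panel_max_hz))

# On a hypothetical early 30-80 Hz Adaptive Sync panel:
print(effective_refresh(45, 30, 80))   # 45: matched exactly, refresh follows the game
print(effective_refresh(100, 30, 80))  # 80: capped at the panel's ceiling
print(effective_refresh(24, 30, 80))   # 30: held at the panel's floor
```

Real drivers can be smarter below the floor (by repeating frames, for instance), but the point stands: a 9-240 Hz panel would make the clamp almost never engage, while a 30-60 Hz panel clamps often.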

Prototypes of FreeSync monitors will be going out to some media in the September or October time frame, while public availability will likely occur in the January or February window. 

How does AMD pick game titles for the Never Settle program?

(1:14:00) Huddy describes the fashion in which games are vetted for inclusion in the AMD Never Settle program. The company looks for games that have a good history of course, but also ones that exemplify the use of AMD hardware. Games that benchmark well and have reproducible results that can be reported by AMD and the media are also preferred. Inclusion of an integrated benchmark mode in the game is also a plus as it more likely gets review media interested in including that game in their test suite and also allows the public to run their own tests to compare results. 

Another interesting note: the games included in bundles are often picked based on content restrictions in certain countries. Germany, for example, has very strict guidelines for violence in games, and thus add-in card partners would much prefer a well known racing game over an ultra-bloody first person shooter.

Closing Thoughts

First and foremost, a huge thanks to Richard Huddy for making time to stop by the offices and talk with us. And especially for allowing us to live stream it to our fans and readers. I have had the privilege to have access to some of the most interesting minds in the industry, but they are very rarely open to having our talks broadcast to the world without editing and without a precompiled list of questions. For allowing it, both AMD and Mr. Huddy have gained some respect! 

There is plenty more discussed in the interview including AMD's push to a non-PC based revenue split, whether DX12 will undermine the use of the Mantle API, and how code like TressFX compares to NVIDIA GameWorks. If you haven't watched it yet I think you'll find the full 90 minutes to be quite informative and worth your time.

UPDATE: I know that some of our readers, and some contacts at NVIDIA, took note of Huddy's comments about TressFX from our interview. Essentially, NVIDIA denied that TressFX was actually made available before the release of Tomb Raider. When I asked AMD for clarification, Richard Huddy provided me with the following statement.

I would like to take the opportunity to correct a false impression that I inadvertently created during the interview.

Contrary to what I said, it turns out that TressFX was first published in AMD's SDK _after_ the release of Tomb Raider.

Nonetheless the full source code to TressFX was available to the developer throughout, and we also know that the game was available to NVIDIA several weeks ahead of the actual release for NVIDIA to address the bugs in their driver and to optimize for TressFX.

Again, I apologize for the mistake.

That definitely paints a somewhat different picture around the release of TressFX with the rebooted Tomb Raider title. NVIDIA's complaint that "AMD was doing the same thing" holds a bit more weight. Since Richard Huddy was not with AMD at the time of this arrangement, I can see how he would mix up the specifics, even after being briefed by other staff members.

END UPDATE

If you want to be sure you don't miss any more of our live streaming events, be sure to keep an eye on the schedule on the right hand side of our page or sign up for our PC Perspective Live mailing list right here.

PCPer Live! Interview with AMD's Richard Huddy June 17th, 4pm ET / 1pm PT

Subject: General Tech, Graphics Cards | June 18, 2014 - 05:08 PM |
Tagged: video, richard huddy, live, amd

UPDATE: Did you miss the live event? Well, there's good news and bad news. First, the bad: you can't win any of those prizes we discussed. The good: you can watch the replay posted below!

AMD recently brought back Richard Huddy in the role of Gaming Scientist, acting as the information conduit between hardware development, the software and driver teams and the game developers that make our industry exciting. 

Richard stopped by the offices of PC Perspective to talk about several subjects including his history in the industry (including NVIDIA and Intel), Mantle and other low-level APIs, the NVIDIA GameWorks debate, G-Sync versus FreeSync and a whole lot more.

This is an interview that you won't want to miss! 

On June 3rd it was announced that Richard Huddy, an industry stalwart and veteran of ATI, NVIDIA, and Intel, would be rejoining AMD as Chief Gaming Scientist.

Interesting news is crossing the ocean today as we learn that Richard Huddy, who has previously had stints at NVIDIA, ATI, AMD and, most recently, Intel, is teaming up with AMD once again. Richard brings with him years of experience and innovation in the world of developer relations and graphics technology. He is often called "the Godfather of DirectX," and AMD wants to prove to the community it is taking PC gaming seriously.

richardhuddy.jpg

Richard Huddy will be stopping by the PC Perspective offices on June 17th for a live, on-camera interview that you can watch unfold on PC Perspective's Live page. Though we plan to talk about anything and everything centered on gaming and PC hardware, we have a few topics that have been hot-button issues lately that we know we want to ask about. Those include the AMD versus NVIDIA dispute over GameWorks, AMD's developer relations and the Gaming Evolved program, how AMD feels about the current status of Adaptive Sync (G-Sync-like features), and much more.

We want to take your questions as well, which is one of the reasons for this post. Richard has agreed to answer as many inquiries as possible in our allotted time and to help make this easier, we are asking our readers to give us their questions and input in the comments section of this news post. We will still take live questions in the chat room during the event, but if your question is here then you have a much better chance of that being seen and addressed.

If the intensity of these topics wasn't enough to entice you to watch the live stream, then how about this? We have a massive prize pool provided by AMD that is unmatched in our live stream history! Here's the list:

  • 1x AMD Radeon R9 295X2 8GB Graphics Card plus a power supply!
  • 1x MSI Radeon R9 280X
  • 1x Sapphire Radeon R9 280
  • 1x MSI Radeon R9 270
  • 1x HIS Radeon R9 270
  • 1x Sapphire R7 260X
  • 15x Never Settle Forever codes

Yup, that's all correct; no typos there. All you have to do is be on the PC Perspective Live! page during the stream on June 17th! We will be giving all of this hardware away to those watching the interview.

pcperlive.png

AMD's Richard Huddy Interview and Q&A

4pm ET / 1pm PT - June 17th

PC Perspective Live! Page

How can you be sure you are here at the right time? If you want some additional security besides just setting your own alarm, you can sign up for our PC Perspective Live mailing list, a simple email list that is used ONLY for these types of live events. Just head over to this page, give us your name and email address, and we'll let you know before we start the event!

I am very excited to talk with Richard again and I think that anyone interested in PC gaming is going to want to take part in this discussion!


Source: PCPer Live!