Raptr Update Available (for Both NVIDIA and AMD GPUs)

Subject: General Tech, Graphics Cards | July 28, 2014 - 09:00 AM |
Tagged: raptr, pc game streaming

Raptr seems to be gaining in popularity. Total playtime recorded by the online service was up 15% month-over-month, from May to June. The software is made up of a few features that are designed to make the lives of PC gamers easier and better, ranging from optimizing game settings to recording gameplay. If you have used a recent version of GeForce Experience, then you probably have a good idea of what Raptr does.


Today, Raptr has announced a new, major update. The version's headlining feature is hardware accelerated video recording, and streaming, for both AMD and NVIDIA GPUs. Raptr claims that their method leads to basically no performance lost, regardless of which GPU vendor is used. Up to 20 minutes of previous gameplay can be recorded after it happened and video of unlimited length can be streamed on demand.


Notice the recording overlay in the top left.

The other, major feature of this version is enhanced sharing of said videos. They can be uploaded to Raptr.com and shared to Facebook and Twitter, complete with hashtags (#BecauseYolo?)

If interested, check out Raptr at their website.

Source: Raptr

Rumor: NVIDIA GeForce 800-Series Is 28nm in Oct/Nov.

Subject: General Tech, Graphics Cards | July 24, 2014 - 07:32 PM |
Tagged: nvidia, gtx 880

Many of our readers were hoping to drop one (or more) Maxwell-based GPUs in their system for use with their 4K monitors, 3D, or whatever else they need performance for. That has not happened, nor do we even know, for sure, when it will. The latest rumors claim that the NVIDIA GeForce GTX 870 and 880 desktop GPUs will arrive in October or November. More interesting, it is expected to be based on GM204 at the current, 28nm process.


The recent GPU roadmap, as of GTC 2014

NVIDIA has not commented on the delay, at least that I know of, but we can tell something is up from their significantly different roadmap. We can also make a fairly confident guess, by paying attention to the industry as a whole. TSMC has been struggling to keep up with 28nm production, having increased wait times by six extra weeks in May, according to Digitimes, and whatever 20nm capacity they had was reportedly gobbled up by Apple until just recently. At around the same time, NVIDIA inserted Pascal between Maxwell and Volta with 3D memory, NVLink, and some unified memory architecture (which I don't believe they yet elaborated on).


The previous roadmap. (Source: Anandtech)

And, if this rumor is true, Maxwell was pushed from 20nm to a wholly 28nm architecture. It was originally supposed to be host of unified virtual memory, not Pascal. If I had to make a safe guess, I would assume that NVIDIA needed to redesign their chip to 28nm and, especially with the extra delays at TSMC, cannot get the volume they need until Autumn.

Lastly, going by the launch of the 750ti, Maxwell will basically be a cleaned-up Kepler architecture. Its compute units were shifted into power-of-two partitions, reducing die area for scheduling logic (and so forth). NVIDIA has been known to stash a few features into each generation, sometimes revealing them well after retail availability, so that is not to say that Maxwell will be "a more efficient Kepler".

I expect its fundamental architecture should be pretty close, though.

Source: KitGuru

NVIDIA Preparing GeForce 800M (Laptop) Maxwell GPUs?

Subject: General Tech, Graphics Cards, Mobile | July 19, 2014 - 03:29 AM |
Tagged: nvidia, geforce, maxwell, mobile gpu, mobile graphics

Apparently, some hardware sites got their hands on an NVIDIA driver listing with several new product codes. They claim thirteen N16(P/E) chips are listed (although I count twelve (??)). While I do not have much knowledge of NVIDIA's internal product structure, the GeForce GTX 880M, based on Kepler, is apparently listed as N15E.


Things have changed a lot since this presentation.

These new parts will allegedly be based on the second-generation Maxwell architecture. Also, the source believes that these new GPUs will in the GeForce GTX 800-series, possibly with the MX suffix that was last seen in October 2012 with the GeForce GTX 680MX. Of course, being a long-time PC gamer, the MX suffix does not exactly ring positive with my memory. It used to be the Ti-line that you wanted, and the MX-line that you could afford. But who am I kidding? None of that is relevant these days. Get off my lawn.

Source: Videocardz

Intel AVX-512 Expanded

Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 03:05 AM |
Tagged: Xeon Phi, xeon, Intel, avx-512, avx

It is difficult to know what is actually new information in this Intel blog post, but it is interesting none-the-less. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and co-processors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, as well as Conflict Detection, Exponential and Reciprocal, and Prefetch, which are optional. This, earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions will make their way to standard, CPU-like Xeons at around the same time.


This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.

Xeon Phi
Xeon Phi
Coprocessors (AIBs)
Foundation Instructions Yes Yes Yes
Conflict Detection Instructions Yes Yes Yes
Exponential and Reciprocal Instructions No Yes Yes
Prefetch Instructions No Yes Yes
Byte and Word Instructions Yes No No
Doubleword and Quadword Instructions Yes No No
Vector Length Extensions Yes No No

Source: Intel AVX-512 Blog Post (and my understanding thereof).

So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and works just fine with having multiple cores, too). For an example: imagine that you have to multiply two colors together. The direct way to do it is multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! and, later, Intel's SSE included instructions to multiply two, four-component vectors together. This reduces four similar instructions into a single operating between wider registers.

Smart compilers (and programmers, although that is becoming less common as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if they undergo similar instructions. AVX-512 allows for sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four, single-precision RGBA data values, but you are looping through 2 million pixels, do four pixels at a time (16 components).

For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.

This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.

Source: Intel

Google I/O 2014: Android Extension Pack Announced

Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | July 7, 2014 - 04:06 AM |
Tagged: tegra k1, OpenGL ES, opengl, Khronos, google io, google, android extension pack, Android

Sure, this is a little late. Honestly, when I first heard the announcement, I did not see much news in it. The slide from the keynote (below) showed four points: Tesselation, Geometry Shaders, Computer [sic] Shaders, and ASTC Texture Compression. Honestly, I thought tesselation and geometry shaders were part of the OpenGL ES 3.1 spec, like compute shaders. This led to my immediate reaction: "Oh cool. They implemented OpenGL ES 3.1. Nice. Not worth a news post."


Image Credit: Blogogist

Apparently, they were not part of the ES 3.1 spec (although compute shaders are). My mistake. It turns out that Google is cooking their their own vendor-specific extensions. This is quite interesting, as it adds functionality to the API without the developer needing to target a specific GPU vendor (INTEL, NV, ATI, AMD), waiting for approval from the Architecture Review Board (ARB), or using multi-vendor extensions (EXT). In other words, it sounds like developers can target Google's vendor without knowing the actual hardware.

Hiding the GPU vendor from the developer is not the only reason for Google to host their own vendor extension. The added features are mostly from full OpenGL. This makes sense, because it was announced with NVIDIA and their Tegra K1, Kepler-based SoC. Full OpenGL compatibility was NVIDIA's selling point for the K1, due to its heritage as a desktop GPU. But, instead of requiring apps to be programmed with full OpenGL in mind, Google's extension pushes it to OpenGL ES 3.1. If the developer wants to dip their toe into OpenGL, then they could add a few Android Extension Pack features to their existing ES engine.

Epic Games' Unreal Engine 4 "Rivalry" Demo from Google I/O 2014.

The last feature, ASTC Texture Compression, was an interesting one. Apparently the Khronos Group, owners of OpenGL, were looking for a new generation of texture compression technologies. NVIDIA suggested their ZIL technology. ARM and AMD also proposed "Adaptive Scalable Texture Compression". ARM and AMD won, although the Khronos Group stated that the collaboration between ARM and NVIDIA made both proposals better than either in isolation.

Android Extension Pack is set to launch with "Android L". The next release of Android is not currently associated with a snack food. If I was their marketer, I would block out the next three versions as 5.x, and name them (L)emon, then (M)eringue, and finally (P)ie.

Would I do anything with the two skipped letters before pie? (N)(O).

ASUS STRIX GTX 780 OC 6GB in SLI, better than a Titan and less expensive to boot!

Subject: Graphics Cards | July 4, 2014 - 01:40 PM |
Tagged: STRIX GTX 780 OC 6GB, sli, crossfire, asus, 4k

Multiple monitor and 4k testing of the ASUS STRIX GTX 780 OC cards in SLI is not about the 52MHz out of box overclock but about the 12GB of VRAM that your system will have.  Apart from an issue with BF4, [H]ard|OCP tested the STRIX against a pair of reference GTX 780s and HD 290X cards at resolutions of 5760x1200 and 3840x2160.   The extra RAM made the STRIX shine in comparison to the reference card as not only was the performance better but [H] could raise many of the graphical settings but was not enough to push its performance past the 290X cards in Crossfire.  One other takeaway from this review is that even 6GB of VRAM is not enough to run Watch_Dogs with Ultra textures at these resolutions.


"You’ve seen the new ASUS STRIX GTX 780 OC Edition 6GB DirectCU II video card, now let’s look at two of these in an SLI configuration! We will explore 4K and NV Surround performance with two ASUS STRIX video cards for the ultimate high-resolution experience and see if the extra memory helps this GPU make better strides at high resolutions."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP
Manufacturer: Intel

When Magma Freezes Over...

Intel confirms that they have approached AMD about access to their Mantle API. The discussion, despite being clearly labeled as "an experiment" by an Intel spokesperson, was initiated by them -- not AMD. According to AMD's Gaming Scientist, Richard Huddy, via PCWorld, AMD's response was, "Give us a month or two" and "we'll go into the 1.0 phase sometime this year" which only has about five months left in it. When the API reaches 1.0, anyone who wants to participate (including hardware vendors) will be granted access.


AMD inside Intel Inside???

I do wonder why Intel would care, though. Intel has the fastest per-thread processors, and their GPUs are not known to be workhorses that are held back by API call bottlenecks, either. Of course, that is not to say that I cannot see any reason, however...

Read on to see why, I think, Intel might be interested and what this means for the industry.

Manufacturer: MSI

The Radeon R9 280

Though not really new, the AMD Radeon R9 280 GPU is a part that we really haven't spent time with at PC Perspective. Based on the same Tahiti GPU found in the R9 280X, the HD 7970, the HD 7950 and others, the R9 280 fits at a price point and performance level that I think many gamers will see as enticing. MSI sent along a model that includes some overclocked settings and an updated cooler, allowing the GPU to run at its top speed without much noise.

With a starting price of just $229 or so, the MSI Radeon R9 280 Gaming graphics cards has some interesting competition as well. From the AMD side it butts heads with the R9 280X and the R9 270X. The R9 280X costs $60-70 more though and as you'll see in our benchmarks, the R9 280 will likely cannibalize some of those sales. From NVIDIA, the GeForce GTX 760 is priced right at $229 as well, but does it really have the horsepower to keep with Tahiti?


Continue reading our review of the MSI Radeon R9 280 3GB Gaming Graphics Card!!

Intel's Knights Landing (Xeon Phi, 2015) Details

Subject: General Tech, Graphics Cards, Processors | July 2, 2014 - 03:55 AM |
Tagged: Intel, Xeon Phi, xeon, silvermont, 14nm

Anandtech has just published a large editorial detailing Intel's Knights Landing. Mostly, it is stuff that we already knew from previous announcements and leaks, such as one by VR-Zone from last November (which we reported on). Officially, few details were given back then, except that it would be available as either a PCIe-based add-in board or as a socketed, bootable, x86-compatible processor based on the Silvermont architecture. Its many cores, threads, and 512 bit registers are each pretty weak, compared to Haswell, for instance, but combine to about 3 TFLOPs of double precision performance.


Not enough graphs. Could use another 256...

The best way to imagine it is running a PC with a modern, Silvermont-based Atom processor -- only with up to 288 processors listed in your Task Manager (72 actual cores with quad HyperThreading).

The main limitation of GPUs (and similar coprocessors), however, is memory bandwidth. GDDR5 is often the main bottleneck of compute performance and just about the first thing to be optimized. To compensate, Intel is packaging up-to 16GB of memory (stacked DRAM) on the chip, itself. This RAM is based on "Hybrid Memory Cube" (HMC), developed by Micron Technology, and supported by the Hybrid Memory Cube Consortium (HMCC). While the actual memory used in Knights Landing is derived from HMC, it uses a proprietary interface that is customized for Knights Landing. Its bandwidth is rated at around 500GB/s. For comparison, the NVIDIA GeForce Titan Black has 336.4GB/s of memory bandwidth.

Intel and Micron have worked together in the past. In 2006, the two companies formed "IM Flash" to produce the NAND flash for Intel and Crucial SSDs. Crucial is Micron's consumer-facing brand.


So the vision for Knights Landing seems to be the bridge between CPU-like architectures and GPU-like ones. For compute tasks, GPUs edge out CPUs by crunching through bundles of similar tasks at the same time, across many (hundreds of, thousands of) computing units. The difference with (at least socketed) Xeon Phi processors is that, unlike most GPUs, Intel does not rely upon APIs, such as OpenCL, and drivers to translate a handful of functions into bundles of GPU-specific machine language. Instead, especially if the Xeon Phi is your system's main processor, it will run standard, x86-based software. The software will just run slowly, unless it is capable of vectorizing itself and splitting across multiple threads. Obviously, OpenCL (and other APIs) would make this parallelization easy, by their host/kernel design, but it is apparently not required.

It is a cool way that Intel arrives at the same goal, based on their background. Especially when you mix-and-match Xeons and Xeon Phis on the same computer, it is a push toward heterogeneous computing -- with a lot of specialized threads backing up a handful of strong ones. I just wonder if providing a more-direct method of programming will really help developers finally adopt massively parallel coding practices.

I mean, without even considering GPU compute, how efficient is most software at splitting into even two threads? Four threads? Eight threads? Can this help drive heterogeneous development? Or will this product simply try to appeal to those who are already considering it?

Source: Intel

AMD Catalyst 14.6 RC is now available

Subject: Graphics Cards | June 24, 2014 - 07:00 PM |
Tagged: amd, beta, Catalyst 14.6 RC

Starting with AMD Catalyst 14.6 Beta, AMD will no longer support Windows 8.0 (and the WDDM 1.2 driver) so Windows 8.0 users should upgrade to Windows 8.1, AMD Catalyst 14.4 will continue to work on Windows 8.0.

The WDDM 1.1 Windows 7 driver currently works on Win 7 and in a future release will be used to install updated drivers under Windows 8.0.

Features of the lastest Catalyst include:

  • Plants vs. Zombies (Direct3D performance improvements):
    • AMD Radeon R9 290X - 1920x1080 Ultra – improves up to 11%
    • AMD Radeon R9290X - 2560x1600 Ultra – improves up to 15%
    • AMD Radeon R9290X CrossFire configuration (3840x2160 Ultra) - 92% scaling
  • 3DMark Sky Diver improvements:
    • AMD A4 6300 – improves up to 4%
    • Enables AMD Dual Graphics / AMD CrossFire support
  • Grid Auto Sport: AMD CrossFire profile
  • Wildstar:
    • Power Xpress profile
    • Performance improvements to improve smoothness of application
  • Watch Dogs: AMD CrossFire – Frame pacing improvements
  • Battlefield Hardline Beta: AMD CrossFire profile

Get the driver and more information right here.


Source: AMD