Rumor: NVIDIA GeForce 800-Series Is 28nm in Oct/Nov.

Subject: General Tech, Graphics Cards | July 24, 2014 - 07:32 PM |
Tagged: nvidia, gtx 880

Many of our readers were hoping to drop one (or more) Maxwell-based GPUs in their system for use with their 4K monitors, 3D, or whatever else they need performance for. That has not happened, nor do we even know, for sure, when it will. The latest rumors claim that the NVIDIA GeForce GTX 870 and 880 desktop GPUs will arrive in October or November. More interesting, they are expected to be based on GM204 at the current, 28nm process.

nvidia-pascal-roadmap.jpg

The recent GPU roadmap, as of GTC 2014

NVIDIA has not commented on the delay, at least that I know of, but we can tell something is up from their significantly different roadmap. We can also make a fairly confident guess, by paying attention to the industry as a whole. TSMC has been struggling to keep up with 28nm production, having increased wait times by six extra weeks in May, according to Digitimes, and whatever 20nm capacity they had was reportedly gobbled up by Apple until just recently. At around the same time, NVIDIA inserted Pascal between Maxwell and Volta with 3D memory, NVLink, and some unified memory architecture (which I don't believe they yet elaborated on).

nvidia-previous-roadmap.jpg

The previous roadmap. (Source: Anandtech)

And, if this rumor is true, Maxwell was pushed from 20nm to a wholly 28nm architecture. It was originally supposed to be the host of unified virtual memory, not Pascal. If I had to make a safe guess, I would assume that NVIDIA needed to redesign their chip for 28nm and, especially with the extra delays at TSMC, cannot get the volume they need until autumn.

Lastly, going by the launch of the GTX 750 Ti, Maxwell will basically be a cleaned-up Kepler architecture. Its compute units were shifted into power-of-two partitions, reducing die area for scheduling logic (and so forth). NVIDIA has been known to stash a few features into each generation, sometimes revealing them well after retail availability, so that is not to say that Maxwell will be nothing more than "a more efficient Kepler".

I expect its fundamental architecture should be pretty close, though.

Source: KitGuru

NVIDIA Preparing GeForce 800M (Laptop) Maxwell GPUs?

Subject: General Tech, Graphics Cards, Mobile | July 19, 2014 - 03:29 AM |
Tagged: nvidia, geforce, maxwell, mobile gpu, mobile graphics

Apparently, some hardware sites got their hands on an NVIDIA driver listing with several new product codes. They claim thirteen N16(P/E) chips are listed (although I count twelve (??)). While I do not have much knowledge of NVIDIA's internal product structure, the GeForce GTX 880M, based on Kepler, is apparently listed as N15E.

nvidiamaxwellroadmap.jpg

Things have changed a lot since this presentation.

These new parts will allegedly be based on the second-generation Maxwell architecture. Also, the source believes that these new GPUs will be in the GeForce GTX 800-series, possibly with the MX suffix that was last seen in October 2012 with the GeForce GTX 680MX. Of course, being a long-time PC gamer, the MX suffix does not exactly ring positive with my memory. It used to be the Ti-line that you wanted, and the MX-line that you could afford. But who am I kidding? None of that is relevant these days. Get off my lawn.

Source: Videocardz

Intel AVX-512 Expanded

Subject: General Tech, Graphics Cards, Processors | July 19, 2014 - 03:05 AM |
Tagged: Xeon Phi, xeon, Intel, avx-512, avx

It is difficult to know what is actually new information in this Intel blog post, but it is interesting nonetheless. Its topic is the AVX-512 extension to x86, designed for Xeon and Xeon Phi processors and co-processors. Basically, last year, Intel announced "Foundation", the minimum support level for AVX-512, as well as Conflict Detection, Exponential and Reciprocal, and Prefetch, which are optional. That earlier blog post was very much focused on Xeon Phi, but it acknowledged that the instructions will make their way to standard, CPU-like Xeons at around the same time.

Intel_Xeon_Phi_Family.jpg

This year's blog post brings in a bit more information, especially for common Xeons. While all AVX-512-supporting processors (and co-processors) will support "AVX-512 Foundation", the instruction set extensions are a bit more scattered.

 
Instruction Group | Xeon Processors | Xeon Phi Processors | Xeon Phi Coprocessors (AIBs)
Foundation Instructions | Yes | Yes | Yes
Conflict Detection Instructions | Yes | Yes | Yes
Exponential and Reciprocal Instructions | No | Yes | Yes
Prefetch Instructions | No | Yes | Yes
Byte and Word Instructions | Yes | No | No
Doubleword and Quadword Instructions | Yes | No | No
Vector Length Extensions | Yes | No | No

Source: Intel AVX-512 Blog Post (and my understanding thereof).

So why do we care? Simply put: speed. Vectorization, the purpose of AVX-512, has similar benefits to multiple cores. It is not as flexible as having multiple, unique, independent cores, but it is easier to implement (and works just fine with having multiple cores, too). For an example: imagine that you have to multiply two colors together. The direct way to do it is multiply red with red, green with green, blue with blue, and alpha with alpha. AMD's 3DNow! (two values at a time) and, later, Intel's SSE (four values at a time) included instructions to multiply packed single-precision values together. With SSE, this reduces four similar instructions into a single operation between wider registers.

Smart compilers (and programmers, although that is becoming less common as compilers are pretty good, especially when they are not fighting developers) are able to pack seemingly unrelated data together, too, if they undergo similar instructions. AVX-512 allows for sixteen 32-bit pieces of data to be worked on at the same time. If your pixel only has four, single-precision RGBA data values, but you are looping through 2 million pixels, do four pixels at a time (16 components).

For the record, I basically just described "SIMD" (single instruction, multiple data) as a whole.
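As a rough illustration of the packing described above, the sketch below multiplies two RGBA images component by component, first the "direct way" and then in batches of sixteen values (four pixels), which is how many 32-bit values fit in a 512-bit register. Plain Python does not emit vector instructions; the `LANES` constant and the function names are made up for this example and only model the data layout.

```python
# Conceptual model of SIMD packing; Python itself does not issue AVX-512 here.
LANES = 16  # sixteen 32-bit values fit in one 512-bit register


def multiply_scalar(a, b):
    """One multiply per component: the 'direct way'."""
    return [x * y for x, y in zip(a, b)]


def multiply_packed(a, b, lanes=LANES):
    """Process `lanes` components per step, as one wide instruction would."""
    out = []
    steps = 0
    for i in range(0, len(a), lanes):
        # On real hardware this whole slice would be a single vector multiply.
        out.extend(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return out, steps


# Eight RGBA pixels per image, flattened to 32 components each.
image_a = [0.5, 0.25, 1.0, 1.0] * 8
image_b = [0.5, 2.0, 0.5, 1.0] * 8

scalar = multiply_scalar(image_a, image_b)
packed, steps = multiply_packed(image_a, image_b)
print(steps, "wide steps instead of", len(scalar), "scalar multiplies")
```

Both paths produce the same values; the packed version simply touches the data in sixteen-wide slices, which is the shape a vectorizing compiler tries to find in a loop.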

This theory is part of how GPUs became so powerful at certain tasks. They are capable of pushing a lot of data because they can exploit similarities. If your task is full of similar problems, they can just churn through tonnes of data. CPUs have been doing these tricks, too, just without compromising what they do well.

Source: Intel

Google I/O 2014: Android Extension Pack Announced

Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | July 7, 2014 - 04:06 AM |
Tagged: tegra k1, OpenGL ES, opengl, Khronos, google io, google, android extension pack, Android

Sure, this is a little late. Honestly, when I first heard the announcement, I did not see much news in it. The slide from the keynote (below) showed four points: Tessellation, Geometry Shaders, Computer [sic] Shaders, and ASTC Texture Compression. Honestly, I thought tessellation and geometry shaders were part of the OpenGL ES 3.1 spec, like compute shaders. This led to my immediate reaction: "Oh cool. They implemented OpenGL ES 3.1. Nice. Not worth a news post."

google-android-opengl-es-extensions.jpg

Image Credit: Blogogist

Apparently, they were not part of the ES 3.1 spec (although compute shaders are). My mistake. It turns out that Google is cooking their own vendor-specific extensions. This is quite interesting, as it adds functionality to the API without the developer needing to target a specific GPU vendor (INTEL, NV, ATI, AMD), waiting for approval from the Architecture Review Board (ARB), or using multi-vendor extensions (EXT). In other words, it sounds like developers can target Google's vendor without knowing the actual hardware.
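To make the prefix distinction concrete, here is a minimal sketch that sorts extension names by the tag after `GL_` and checks whether the Android Extension Pack is advertised. To my knowledge the pack is published as `GL_ANDROID_extension_pack_es31a`; the helper names and the sample extension string are made up for illustration, and a real application would read the string from its GL context rather than hard-coding it.

```python
# Hypothetical helpers: classify OpenGL ES extension names by their prefix.
VENDOR_PREFIXES = {"NV", "AMD", "ATI", "INTEL"}
EXTENSION_PACK = "GL_ANDROID_extension_pack_es31a"


def classify(extension):
    """Return 'vendor', 'multi-vendor', 'review-board', or 'other'."""
    parts = extension.split("_")
    # Extension names look like GL_<PREFIX>_<name>.
    prefix = parts[1] if len(parts) > 2 and parts[0] == "GL" else ""
    if prefix in VENDOR_PREFIXES:
        return "vendor"
    if prefix == "EXT":
        return "multi-vendor"
    if prefix == "ARB":
        return "review-board"
    return "other"


def has_extension_pack(extension_string):
    """True if the space-separated extension list advertises the pack."""
    return EXTENSION_PACK in extension_string.split()


# Illustrative extension string, not taken from any real device.
sample = "GL_EXT_texture_buffer GL_NV_draw_buffers " + EXTENSION_PACK

for name in sample.split():
    print(name, "->", classify(name))
print("Extension pack present:", has_extension_pack(sample))
```

The point is that an engine only needs to test for one platform-level name instead of branching on each GPU vendor's prefix.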

Hiding the GPU vendor from the developer is not the only reason for Google to host their own vendor extension. The added features are mostly from full OpenGL. This makes sense, because it was announced with NVIDIA and their Tegra K1, Kepler-based SoC. Full OpenGL compatibility was NVIDIA's selling point for the K1, due to its heritage as a desktop GPU. But, instead of requiring apps to be programmed with full OpenGL in mind, Google's extension pushes it to OpenGL ES 3.1. If the developer wants to dip their toe into OpenGL, then they could add a few Android Extension Pack features to their existing ES engine.

Epic Games' Unreal Engine 4 "Rivalry" Demo from Google I/O 2014.

The last feature, ASTC Texture Compression, was an interesting one. Apparently the Khronos Group, owners of OpenGL, were looking for a new generation of texture compression technologies. NVIDIA suggested their ZIL technology. ARM and AMD also proposed "Adaptive Scalable Texture Compression". ARM and AMD won, although the Khronos Group stated that the collaboration between ARM and NVIDIA made both proposals better than either in isolation.

Android Extension Pack is set to launch with "Android L". The next release of Android is not currently associated with a snack food. If I were their marketer, I would block out the next three versions as 5.x, and name them (L)emon, then (M)eringue, and finally (P)ie.

Would I do anything with the two skipped letters before pie? (N)(O).

ASUS STRIX GTX 780 OC 6GB in SLI, better than a Titan and less expensive to boot!

Subject: Graphics Cards | July 4, 2014 - 01:40 PM |
Tagged: STRIX GTX 780 OC 6GB, sli, crossfire, asus, 4k

Multiple-monitor and 4K testing of the ASUS STRIX GTX 780 OC cards in SLI is not about the 52MHz out-of-box overclock but about the 6GB of VRAM on each card (12GB in total, although SLI mirrors the data, so 6GB is usable). Apart from an issue with BF4, [H]ard|OCP tested the STRIX against a pair of reference GTX 780s and R9 290X cards at resolutions of 5760x1200 and 3840x2160. The extra RAM made the STRIX shine in comparison to the reference cards: not only was the performance better, but [H] could raise many of the graphical settings. It was not enough, however, to push performance past the 290X cards in CrossFire. One other takeaway from this review is that even 6GB of VRAM is not enough to run Watch_Dogs with Ultra textures at these resolutions.

1402436254j0CnhAb2Z5_1_20_l.jpg

"You’ve seen the new ASUS STRIX GTX 780 OC Edition 6GB DirectCU II video card, now let’s look at two of these in an SLI configuration! We will explore 4K and NV Surround performance with two ASUS STRIX video cards for the ultimate high-resolution experience and see if the extra memory helps this GPU make better strides at high resolutions."


Source: [H]ard|OCP
Manufacturer: Intel

When Magma Freezes Over...

Intel confirms that they have approached AMD about access to their Mantle API. The discussion, despite being clearly labeled as "an experiment" by an Intel spokesperson, was initiated by them -- not AMD. According to AMD's Gaming Scientist, Richard Huddy, via PCWorld, AMD's response was, "Give us a month or two" and "we'll go into the 1.0 phase sometime this year" which only has about five months left in it. When the API reaches 1.0, anyone who wants to participate (including hardware vendors) will be granted access.

AMD_Mantle_Logo.png

AMD inside Intel Inside???

I do wonder why Intel would care, though. Intel has the fastest per-thread processors, and their GPUs are not known to be workhorses that are held back by API call bottlenecks, either. Of course, that is not to say that I cannot see any reason, however...

Read on to see why, I think, Intel might be interested and what this means for the industry.

Manufacturer: MSI

The Radeon R9 280

Though not really new, the AMD Radeon R9 280 GPU is a part that we really haven't spent time with at PC Perspective. Based on the same Tahiti GPU found in the R9 280X, the HD 7970, the HD 7950 and others, the R9 280 fits at a price point and performance level that I think many gamers will see as enticing. MSI sent along a model that includes some overclocked settings and an updated cooler, allowing the GPU to run at its top speed without much noise.

With a starting price of just $229 or so, the MSI Radeon R9 280 Gaming graphics card has some interesting competition as well. From the AMD side it butts heads with the R9 280X and the R9 270X. The R9 280X costs $60-70 more, though, and as you'll see in our benchmarks, the R9 280 will likely cannibalize some of those sales. From NVIDIA, the GeForce GTX 760 is priced right at $229 as well, but does it really have the horsepower to keep up with Tahiti?

IMG_0277.JPG

Continue reading our review of the MSI Radeon R9 280 3GB Gaming Graphics Card!!

Intel's Knights Landing (Xeon Phi, 2015) Details

Subject: General Tech, Graphics Cards, Processors | July 2, 2014 - 03:55 AM |
Tagged: Intel, Xeon Phi, xeon, silvermont, 14nm

Anandtech has just published a large editorial detailing Intel's Knights Landing. Mostly, it is stuff that we already knew from previous announcements and leaks, such as one by VR-Zone from last November (which we reported on). Officially, few details were given back then, except that it would be available as either a PCIe-based add-in board or as a socketed, bootable, x86-compatible processor based on the Silvermont architecture. Its many cores, threads, and 512-bit registers are each pretty weak, compared to Haswell, for instance, but combine to about 3 TFLOPS of double-precision performance.

itsbeautiful.png

Not enough graphs. Could use another 256...

The best way to imagine it is running a PC with a modern, Silvermont-based Atom processor -- only with up to 288 processors listed in your Task Manager (72 actual cores with quad HyperThreading).

The main limitation of GPUs (and similar coprocessors), however, is memory bandwidth. GDDR5 is often the main bottleneck of compute performance and just about the first thing to be optimized. To compensate, Intel is packaging up to 16GB of memory (stacked DRAM) on the chip itself. This RAM is based on "Hybrid Memory Cube" (HMC), developed by Micron Technology, and supported by the Hybrid Memory Cube Consortium (HMCC). While the actual memory used in Knights Landing is derived from HMC, it uses a proprietary interface that is customized for Knights Landing. Its bandwidth is rated at around 500GB/s. For comparison, the NVIDIA GeForce GTX Titan Black has 336.4GB/s of memory bandwidth.
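For a sense of scale, the figures quoted above can be worked through directly. This is only arithmetic on the numbers in the text (72 cores, four threads per core, roughly 500GB/s versus 336.4GB/s); the variable names are mine.

```python
# Figures quoted in the article; nothing here is measured.
cores = 72
threads_per_core = 4
logical_processors = cores * threads_per_core  # what Task Manager would list

knights_landing_gb_per_s = 500.0  # "around 500GB/s" for the on-package memory
titan_black_gb_per_s = 336.4      # GDDR5 bandwidth quoted for the Titan Black

bandwidth_ratio = knights_landing_gb_per_s / titan_black_gb_per_s

print("Logical processors:", logical_processors)
print("Bandwidth advantage: about", round(bandwidth_ratio, 2), "x")
```

In other words, the stacked memory would offer roughly one and a half times the bandwidth of the GDDR5 on that card, assuming the rated figure holds.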

Intel and Micron have worked together in the past. In 2006, the two companies formed "IM Flash" to produce the NAND flash for Intel and Crucial SSDs. Crucial is Micron's consumer-facing brand.

intel-knights-landing.jpg

So the vision for Knights Landing seems to be the bridge between CPU-like architectures and GPU-like ones. For compute tasks, GPUs edge out CPUs by crunching through bundles of similar tasks at the same time, across many (hundreds of, thousands of) computing units. The difference with (at least socketed) Xeon Phi processors is that, unlike most GPUs, Intel does not rely upon APIs, such as OpenCL, and drivers to translate a handful of functions into bundles of GPU-specific machine language. Instead, especially if the Xeon Phi is your system's main processor, it will run standard, x86-based software. The software will just run slowly, unless it is capable of vectorizing itself and splitting across multiple threads. Obviously, OpenCL (and other APIs) would make this parallelization easy, by their host/kernel design, but it is apparently not required.
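The "splitting across multiple threads" half of that requirement can be sketched without any GPU API at all. The example below cuts a list into contiguous chunks and hands each chunk to a worker using Python's standard `concurrent.futures`; the workload (a sum of squares) and the function names are invented for illustration. Note that CPython threads do not execute Python bytecode in parallel, so this shows the shape of the decomposition rather than a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, parts):
    """Split data into `parts` nearly equal contiguous slices."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The per-thread work: independent of every other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Map the work over the chunks, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunked(data, workers)))


data = list(range(1000))
total = parallel_sum_of_squares(data, workers=8)
print(total)
```

The hard part in real software is rarely the pool itself; it is finding work whose chunks are independent enough to be split this way.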

It is a cool way that Intel arrives at the same goal, based on their background. Especially when you mix-and-match Xeons and Xeon Phis on the same computer, it is a push toward heterogeneous computing -- with a lot of specialized threads backing up a handful of strong ones. I just wonder if providing a more-direct method of programming will really help developers finally adopt massively parallel coding practices.

I mean, without even considering GPU compute, how efficient is most software at splitting into even two threads? Four threads? Eight threads? Can this help drive heterogeneous development? Or will this product simply try to appeal to those who are already considering it?

Source: Intel

AMD Catalyst 14.6 RC is now available

Subject: Graphics Cards | June 24, 2014 - 07:00 PM |
Tagged: amd, beta, Catalyst 14.6 RC

Starting with AMD Catalyst 14.6 Beta, AMD will no longer support Windows 8.0 (and the WDDM 1.2 driver), so Windows 8.0 users should upgrade to Windows 8.1. AMD Catalyst 14.4 will continue to work on Windows 8.0.

The WDDM 1.1 Windows 7 driver currently works on Win 7 and in a future release will be used to install updated drivers under Windows 8.0.

Features of the latest Catalyst include:

  • Plants vs. Zombies (Direct3D performance improvements):
    • AMD Radeon R9 290X - 1920x1080 Ultra – improves up to 11%
    • AMD Radeon R9 290X - 2560x1600 Ultra – improves up to 15%
    • AMD Radeon R9 290X CrossFire configuration (3840x2160 Ultra) - 92% scaling
  • 3DMark Sky Diver improvements:
    • AMD A4 6300 – improves up to 4%
    • Enables AMD Dual Graphics / AMD CrossFire support
  • Grid Auto Sport: AMD CrossFire profile
  • Wildstar:
    • Power Xpress profile
    • Performance improvements to improve smoothness of application
  • Watch Dogs: AMD CrossFire – Frame pacing improvements
  • Battlefield Hardline Beta: AMD CrossFire profile
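The "92% scaling" figure in the list above is easier to read with a worked example. The sketch below assumes the usual convention that scaling is the extra frame rate a second GPU adds relative to a single card; AMD's notes do not spell out the definition, and the single-card frame rate used here is a made-up placeholder, not a benchmark result.

```python
def scaling_percent(single_fps, dual_fps):
    """Extra performance from the second GPU, as a percentage."""
    return (dual_fps / single_fps - 1.0) * 100.0


def dual_fps_from_scaling(single_fps, percent):
    """Frame rate of the two-card setup for a given scaling figure."""
    return single_fps * (1.0 + percent / 100.0)


single_card_fps = 25.0  # hypothetical 3840x2160 result, for illustration only
dual_card_fps = dual_fps_from_scaling(single_card_fps, 92.0)

print("Two cards at 92% scaling: about", round(dual_card_fps, 1), "fps")
```

Under that assumption, perfect (100%) scaling would double the frame rate, so 92% means the second card is contributing nearly a full card's worth of performance.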

Get the driver and more information right here.

images.jpg

Source: AMD

AMD Planning Open Source GameWorks Competitor, Mantle for Linux

Subject: Graphics Cards | June 19, 2014 - 10:35 AM |
Tagged: video, richard huddy, radeon, openworks, Mantle, freesync, amd

On Tuesday, AMD's newly minted Gaming Scientist, Richard Huddy, stopped by the PC Perspective office to talk about the current state of the company's graphics division. The entire video of the interview is embedded below but several of the points that are made are quite interesting and newsworthy. During the discussion we hear about Mantle on Linux, a timeline for Mantle being opened publicly as well as a surprising new idea for a competitor to NVIDIA's GameWorks program.

Richard is new to the company but not new to the industry, starting with 3DLabs many years ago and taking jobs at NVIDIA, ATI, Intel and now returning to AMD. The role of Gaming Scientist is to directly interface with the software developers for gaming and make sure that the GPU hardware designers are working hand in hand with future, high end graphics technology. In essence, Huddy's job is to make sure AMD continues to innovate on the hardware side to facilitate innovation on the software side.

AMD Planning an "OpenWorks" Program

(33:00) After the volume of discussion surrounding the NVIDIA GameWorks program and its potential to harm the gaming ecosystem by not providing source code in an open manner, Huddy believes that the answer to the problem is to simply have NVIDIA release the SDK with source code publicly. Whether or not NVIDIA takes that advice has yet to be seen, but if they don't, it appears that AMD is going down the road of creating its own competing solution that is open and flexible.

The idea of OpenFX, or OpenWorks as Huddy refers to it, is to create an open source repository for gaming code and effects examples that can be updated, modified and improved upon by anyone in the industry. AMD would be willing to start the initiative by donating its entire SDK to the platform and then invite other software developers, as well as other hardware developers, to add to or change the collection. The idea is to create a competitor to what GameWorks accomplishes but in a license-free and open way.

gameworks.jpg

NVIDIA GameWorks has been successful; can AMD OpenWorks derail it?

Essentially the "OpenWorks" repository would work in a similar way to a Linux group where the public has access to the code to submit changes that can be implemented by anyone else. Someone would be able to improve the performance for specific hardware easily but if performance was degraded on any other hardware then it could be easily changed and updated. Huddy believes this is how you move the industry forward and how you ensure that the gamer is getting the best overall experience regardless of the specific platform they are using.

"OpenWorks" is still in the planning stages and AMD is only officially "talking about it" internally. However, bringing Huddy back to AMD wasn't done without some direction already in mind and it would not surprise me at all if this was essentially a done deal. Huddy believes that other hardware companies like Qualcomm and Intel would participate in such an open system, but the real question is whether or not NVIDIA, as the discrete GPU market share leader, would be in any way willing to do so as well.

Still, this initiative continues to show the differences between the NVIDIA and AMD style of doing things. NVIDIA prefers a more closed system that it has full control over to perfect the experience, to hit aggressive timelines and to improve the ecosystem as they see it. AMD wants to provide an open system that everyone can participate in and benefit from but often is held back by the inconsistent speed of the community and partners. 

Mantle to be Opened by end of 2014, Potentially Coming to Linux

(7:40) The AMD Mantle API has been an industry-changing product; I don't think anyone can deny that. Even if you don't own AMD hardware or don't play any of the games currently shipping with Mantle support, the re-focusing on a higher efficiency API has impacted NVIDIA's direction with DX11, Microsoft's plans for DX12 and perhaps even Apple's direction with Metal. But for a company that pushes the idea of open standards so heavily, AMD has yet to offer up Mantle source code in a similar fashion to its standard SDK. As it stands right now, Mantle is only given to a group of software developers in the beta program and is specifically tuned for AMD's GCN graphics hardware.

mantlepic.jpg

Huddy reiterated that AMD has made a commitment to release a public SDK for Mantle by the end of 2014, which would allow any other hardware vendor to create a driver that could run Mantle game titles. If AMD lives up to its word and releases the full source code for it, then in theory, NVIDIA could offer support for Mantle games on GeForce hardware, and Intel could offer support for those same games on Intel HD graphics. There will be no license fees and no restrictions at all.

The obvious question is whether or not any other IHV would choose to do so, both for competitive reasons and because of the proximity of DX12's release in late 2015. Huddy agrees with me that the pride of these other hardware vendors may prevent them from considering Mantle adoption, though the argument can also be made that the work required to implement it properly might not be worth the effort with DX12 (and its very similar feature set) around the corner.

(51:45) When asked about AMD's input on SteamOS and its commitment to the gamers that see that as the future, Huddy mentioned that AMD was considering, but not promising, bringing the Mantle API to Linux. If the opportunity exists, says Huddy, to give the gamer a better experience on that platform with the help of Mantle, and developers ask AMD for the support, then AMD will at the very least "listen to that." It would be incredibly interesting to see a competitor API in the landscape of Linux, where OpenGL is essentially the only game in town.

AMD FreeSync / Adaptive Sync Benefits

(59:15) Huddy discussed the differences, as he sees them, between NVIDIA's G-Sync technology and the AMD option called FreeSync, now officially called Adaptive Sync as part of the DisplayPort 1.2a standard. Besides the obvious difference of added hardware and licensing costs, Adaptive Sync is apparently going to be easier to implement, as the maximum and minimum frequencies are actually negotiated by the display and the graphics card when the monitor is plugged in. G-Sync requires a white list in the NVIDIA driver to work today and, as long as NVIDIA keeps that list updated, the impact on gamers buying panels should be minimal. But with DP 1.2a and properly implemented Adaptive Sync monitors, once a driver supports the negotiation it doesn't require knowledge about the specific model beforehand.

freesync1.jpg

AMD demos FreeSync at Computex 2014

According to Huddy, the new Adaptive Sync specification will go up to as high as 240 Hz and as low as 9 Hz; these are specifics that before today weren't known. Of course, not every panel (and maybe no panel) will support that extreme of a range for variable frame rate technology, but this leaves a lot of potential for improved panel development in the years to come. More likely, you'll see Adaptive Sync-ready displays listing a range closer to 30-60 Hz or 30-80 Hz initially.
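The refresh ranges above translate into frame-time windows, and the idea of the display and graphics card agreeing on a range can be modeled as a simple overlap. The sketch below is only an illustration under those assumptions: it is not the actual DisplayPort negotiation, the function names are invented, and clamping an out-of-range frame rate to the nearest limit is a simplification of whatever a real driver does.

```python
def frame_time_ms(hz):
    """How long one refresh lasts at a given rate."""
    return 1000.0 / hz


def negotiate(display_range, gpu_range):
    """Overlap of two (min_hz, max_hz) ranges, or None if they do not meet."""
    low = max(display_range[0], gpu_range[0])
    high = min(display_range[1], gpu_range[1])
    return (low, high) if low <= high else None


def clamp_refresh(fps, active_range):
    """Refresh rate used for a given frame rate (simplified)."""
    low, high = active_range
    return max(low, min(high, fps))


SPEC_RANGE = (9, 240)  # limits of the specification, per Huddy
panel = (30, 80)       # one of the early ranges mentioned in the text

active = negotiate(panel, SPEC_RANGE)
print("Active range:", active)
print("Spec frame times (ms):",
      round(frame_time_ms(240), 2), "to", round(frame_time_ms(9), 2))
for fps in (25, 55, 144):
    print(fps, "fps ->", clamp_refresh(fps, active), "Hz")
```

Because the range comes from the panel itself, nothing in this model needs a per-monitor list, which is the contrast Huddy draws with a driver-side white list.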

Prototypes of FreeSync monitors will be going out to some media in the September or October time frame, while public availability will likely occur in the January or February window. 

How does AMD pick game titles for the Never Settle program?

(1:14:00) Huddy describes the fashion in which games are vetted for inclusion in the AMD Never Settle program. The company looks for games that have a good history of course, but also ones that exemplify the use of AMD hardware. Games that benchmark well and have reproducible results that can be reported by AMD and the media are also preferred. Inclusion of an integrated benchmark mode in the game is also a plus as it more likely gets review media interested in including that game in their test suite and also allows the public to run their own tests to compare results. 

Another interesting note was that the games included in bundles are often picked based on restrictions in certain countries. Germany, for example, has very strict guidelines for violence in games, and thus add-in card partners would much prefer a well-known racing game to an ultra-bloody first-person shooter.

Closing Thoughts

First and foremost, a huge thanks to Richard Huddy for making time to stop by the offices and talk with us. And especially for allowing us to live stream it to our fans and readers. I have had the privilege to have access to some of the most interesting minds in the industry, but they are very rarely open to having our talks broadcast to the world without editing and without a precompiled list of questions. For allowing it, both AMD and Mr. Huddy have gained some respect! 

There is plenty more discussed in the interview including AMD's push to a non-PC based revenue split, whether DX12 will undermine the use of the Mantle API, and how code like TressFX compares to NVIDIA GameWorks. If you haven't watched it yet I think you'll find the full 90 minutes to be quite informative and worth your time.

UPDATE: I know that some of our readers, and some contacts at NVIDIA, took note of Huddy's comments about TressFX from our interview. Essentially, NVIDIA denied that TressFX was actually made available before the release of Tomb Raider. When I asked AMD for clarification, Richard Huddy provided me with the following statement.

I would like to take the opportunity to correct a false impression that I inadvertently created during the interview.

Contrary to what I said, it turns out that TressFX was first published in AMD's SDK _after_ the release of Tomb Raider.

Nonetheless the full source code to TressFX was available to the developer throughout, and we also know that the game was available to NVIDIA several weeks ahead of the actual release for NVIDIA to address the bugs in their driver and to optimize for TressFX.

Again, I apologize for the mistake.

That definitely paints a little bit of a different picture around the release of TressFX with the rebooted Tomb Raider title. NVIDIA's complaint that "AMD was doing the same thing" holds a bit more weight. Since Richard Huddy was not with AMD at the time of this arrangement, I can see how he would mix up the specifics, even after getting briefed by other staff members.

END UPDATE

If you want to be sure you don't miss any more of our live streaming events, be sure to keep an eye on the schedule on the right hand side of our page or sign up for our PC Perspective Live mailing list right here.