Subject: General Tech, Graphics Cards, Processors | December 3, 2013 - 04:12 AM | Scott Michaud
Tagged: Kaveri, APU, amd
The launch and subsequent availability of Kaveri is scheduled for the CES time frame. The APU unites Steamroller x86 cores with several Graphics Core Next (GCN) cores. The high-end offering, the A10-7850K, is capable of 856 GFLOPs of compute power (most of which is of course from the GPU).
Image/Leak Credit: Prohardver.hu
We now know about two SKUs: the A10-7850K and the A10-7700K. The two parts are quite similar, except that the higher model is given a 200 MHz CPU bump (3.8 GHz to 4.0 GHz) and 33% more GPU units (6 to 8).
But how does this compare? The original source (prohardver.hu) claims that Kaveri will achieve an average 28 FPS in Crysis 3 on low at 1680x1050; this is a 12% increase over Richland. It also achieved an average 53 FPS with Sleeping Dogs on Medium which is 26% more than Richland.
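As a rough sanity check, the quoted percentages can be inverted to estimate the Richland frame rates they imply. This is only a back-of-the-envelope sketch: the function name is ours, and the results are estimates derived from the leaked figures, not measured numbers.

```python
def implied_baseline(new_fps: float, percent_gain: float) -> float:
    """Return the older frame rate implied by a new frame rate and a percent gain."""
    return new_fps / (1.0 + percent_gain / 100.0)


# Figures quoted from the leak: (Kaveri average FPS, percent gain over Richland).
leaked = {
    "Crysis 3 (low, 1680x1050)": (28.0, 12.0),
    "Sleeping Dogs (medium)": (53.0, 26.0),
}

for game, (fps, gain) in leaked.items():
    estimate = implied_baseline(fps, gain)
    print(f"{game}: Richland would be roughly {estimate:.1f} FPS")
```

In other words, the gains work out to a few frames per second in Crysis 3 and closer to ten in Sleeping Dogs, assuming the leaked averages are accurate.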
These are healthy increases over the previous generation, and they do not even account for HSA advantages. I am really curious what will happen if integrated graphics become accessible enough that game developers decide to target them for general compute applications. The reduction in latency (semi-wasted time bouncing memory between compute devices) might be where this architecture can really shine.
We will do our best to keep you up to date on this part especially when it launches at CES.
Subject: General Tech, Graphics Cards | December 2, 2013 - 03:16 PM | Scott Michaud
Tagged: nvidia, ShadowPlay
They grow up so fast these days...
GeForce Experience is NVIDIA's software package, often bundled with their driver updates, to optimize the experience of their customers. This could be adding interesting features, such as GPU-accelerated game video capture, or just recommending graphics settings for popular games.
Version 1.8 adds many desired features that were lacking from the previous version. I always found it weird that GeForce Experience would recommend one good set of baseline settings for a game, and apply them for you, but then force you to go into the game and tweak from there. It would be nice to see multiple presets but that is not what we get; instead, we are able to tweak the settings from within GeForce Experience. The baseline tries to provide a solid 40 FPS at the most computationally difficult moments. You can then tune the familiar performance and quality slider from there.
You are also able to set resolutions up to 3840x2160 and select whether you would like to play in windowed (including "borderless") mode.
With ShadowPlay, Windows 7 users will now be able to "shadow" the last 20 minutes like their Windows 8 neighbors. You will also be able to combine your microphone audio with the in-game audio, should you select it. I can see the latter feature being very useful for shoutcasters. Apparently it allows capturing VoIP communication and not just your microphone itself.
Still no streaming to Twitch.tv, yet. It is still coming.
For now, you can download GeForce Experience from NVIDIA's GeForce website. If you want to read a little more detail about it first, you can check out their (much longer) blog post.
Subject: General Tech, Graphics Cards, Processors | November 28, 2013 - 03:30 AM | Scott Michaud
Tagged: Intel, Xeon Phi, gpgpu
Intel was testing the waters with their Xeon Phi co-processor. Based on the architecture designed for the original Pentium processors, it was released in six products ranging from 57 to 61 cores and 6 to 16GB of RAM. This led to double precision performance of between 1 and 1.2 TFLOPs. It was fabricated using their 22nm tri-gate technology. All of this was under the Knights Corner initiative.
In 2015, Intel plans to have Knights Landing ready for consumption. A modified Silvermont architecture will replace the many simple (basically 15-year-old) cores of the previous generation; up to 72 Silvermont-based cores (each with 4 threads), in fact. It will introduce the AVX-512 instruction set. AVX-512 allows applications to vectorize eight 64-bit (double-precision float or long integer) or sixteen 32-bit (single-precision float or standard integer) values.
In other words, packing a bunch of related problems into a single instruction.
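To make the lane counts concrete, here is a small Python sketch of the arithmetic. It only models how many values fit in a 512-bit register and how many vector instructions a simple pass over some data would need. The helper names are ours, and it ignores everything else that affects real performance (memory bandwidth, masking, instruction mix).

```python
import math

REGISTER_BITS = 512  # width of one AVX-512 vector register


def lanes(element_bits: int) -> int:
    """Number of values of the given width that fit in one register."""
    return REGISTER_BITS // element_bits


def vector_ops_needed(n_values: int, element_bits: int) -> int:
    """Vector instructions needed to touch n_values once (idealized)."""
    return math.ceil(n_values / lanes(element_bits))


print("64-bit lanes:", lanes(64))
print("32-bit lanes:", lanes(32))
print("ops for 1000 doubles:", vector_ops_needed(1000, 64))
print("ops for 1000 floats:", vector_ops_needed(1000, 32))
```

The partial final chunk is why the count rounds up: 1000 single-precision values do not divide evenly into groups of sixteen.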
The most interesting part? Two versions will be offered: Add-In Boards (AIBs) and a standalone CPU. It will not require a host CPU, because of its x86 heritage, if your application is entirely suited for an MIC architecture; unlike a Tesla, it is bootable with existing and common OSes. It can also be paired with standard Xeon processors if you would like a few strong threads with the 288 (72 x 4) the Xeon Phi provides.
And, while I doubt Intel would want to cut anyone else in, VR-Zone notes that this opens the door for AIB partners to make non-reference cards and manage some level of customer support. I'll believe a non-Intel branded AIB only when I see it.
Subject: Graphics Cards | November 27, 2013 - 04:44 PM | Jeremy Hellstrom
Tagged: sapphire, radeon, R9 290X, hawaii, amd, 290x
Ryan is not the only one who felt it necessary to investigate the reports of differing performance between retail R9 290X cards and the ones sent out for review. Legit Reviews also ordered a retail card made by Sapphire and tested it against the card sent to them by AMD. As with our results, ambient temperature had more effect on the frequency of the retail card than it did on the press sample with a 14% difference being common. Legit had another idea after they noticed that while the BIOS version was the same on both cards the part numbers differed. Find out what happened when they flashed the retail card to exactly match the press sample.
"The AMD Radeon R9 290X and R9 290 have been getting a ton of attention lately due to a number of reports that the retail cards are performing differently than the press cards that the media sites received. We have been following these stories for the past few weeks and finally decided to look into the situation ourselves."
Here are some more Graphics Card articles from around the web:
- HIS R9 270X IceQ X² Turbo Boost 2 GB @ techPowerUp
- Sapphire Toxic Edition R9 280X Video Card Review @HiTech Legion
- ASUS R9 270 Direct CU II OC 2 GB @ techPowerUp
- Powercolor Radeon R9-270X Devil @ Bjorn3D
- AMD Radeon R9 290 Review On Linux @ Phoronix
- PowerColor Devil R9 270X 2GB @ Custom PC Review
- 2560×1600: GeForce GTX 780 Ti vs Radeon R9 290X @ Benchmark Reviews
- ASUS GTX 760 MARS @ Kitguru
- Gigabyte GeForce GTX 760 4GB Video Card Review – 2GB or 4GB of VRAM @ Legit Reviews
- NVIDIA GeForce GTX 780 Ti Steams Ahead On Linux @ Phoronix
- Palit GTX 780 Ti JetStream OC @ Kitguru
- EVGA GTX 780 Ti SC ACX Review @ Hardware Canucks
- NVIDIA GeForce GTX TITAN: Windows 8.1 vs. Ubuntu 13.10 @ Phoronix
Another retail card reveals the results
Since the release of the new AMD Radeon R9 290X and R9 290 graphics cards, we have been very curious about the latest implementation of AMD's PowerTune technology and its scaling of clock frequency as a result of the thermal levels of each graphics card. In the first article covering this topic, I addressed the questions from AMD's point of view - is this really a "configurable" GPU as AMD claims or are there issues that need to be addressed by the company?
The biggest problems I found were in the highly variable clock speeds from game to game and from a "cold" GPU to a "hot" GPU. This affects the way many people in the industry test and benchmark graphics cards, as running a game for just a couple of minutes could result in average and reported frame rates that are much higher than what you see 10-20 minutes into gameplay. This was rarely something that had to be dealt with before (especially on AMD graphics cards), so it caught many off-guard.
Because of the new PowerTune technology, as I have discussed several times before, clock speeds are starting off quite high on the R9 290X (at or near the 1000 MHz quoted speed) and then slowly drifting down over time.
Another wrinkle occurred when Tom's Hardware reported that retail graphics cards they had seen were showing markedly lower performance than the reference samples sent to reviewers. As a result, AMD quickly released a new driver that attempted to address the problem by normalizing to fan speeds (RPM) rather than fan voltage (percentage). The result was consistent fan speeds on different cards and thus much closer performance.
However, with all that being said, I was still testing retail AMD Radeon R9 290X and R9 290 cards that were PURCHASED rather than sampled, to keep tabs on the situation.
Subject: General Tech, Graphics Cards | November 26, 2013 - 03:18 AM | Scott Michaud
Tagged: R9 290X, r9 290, amd
Multiple sites are reporting that some of AMD's Radeon R9 290 cards could be software-unlocked into 290Xs with a simple BIOS update. While the difference in performance is minor, free extra shader processors might be tempting for some existing owners.
"Binning" is when a manufacturer increases yield by splitting one product into several based on how they test after production. Semiconductor fabrication, specifically, is prone to constant errors and defects. Maybe some of your chips are not stable at 4 GHz but can attain 3.5 or 3.7 GHz. Why throw those out when they can be sold as 3.5 GHz parts?
This is especially relevant to multi-core CPUs and GPUs. Hawaii XT has 2816 Stream processors; a compelling product could be made even with a few of those shut down. The R9 290, for instance, permits 2560 of these cores. The remaining have been laser cut or, at least, should have been.
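A toy sketch of the idea in Python may help. The bin speeds (4.0, 3.7 and 3.5 GHz) and the shader counts come from the text above; the function name and the sample test results are made up for illustration and do not reflect how AMD or anyone else actually sorts parts.

```python
def bin_chip(max_stable_ghz, bins=(4.0, 3.7, 3.5)):
    """Return the fastest bin a chip qualifies for, or None if it fails them all."""
    for speed in sorted(bins, reverse=True):
        if max_stable_ghz >= speed:
            return speed
    return None


# Hypothetical post-production test results (highest stable clock per chip).
for result in (4.1, 3.8, 3.55, 3.2):
    print(result, "->", bin_chip(result))

# The same logic applies to core counts rather than clocks.
HAWAII_XT_SHADERS = 2816   # full chip (R9 290X)
R9_290_SHADERS = 2560      # what the R9 290 exposes
disabled = HAWAII_XT_SHADERS - R9_290_SHADERS
print(f"{disabled} shaders disabled ({disabled / HAWAII_XT_SHADERS:.1%} of the chip)")
```

The small share of disabled shaders is consistent with the minor performance difference between the two cards.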
Apparently certain batches of Radeon R9 290s were developed with fully functional Hawaii XT chips that were software locked to 290 specifications. There have been reports that several users of cards from multiple OEMs were able to flash a new BIOS to unlock these extra cores. However, other batches seem to be properly locked.
This could be interesting for lucky and brave users but I wonder why this happened. I can think of two potential causes:
- Someone (OEMs or AMD) had too many 290X chips, or
- The 290 launch was just that unprepared.
Either way, newer shipments should be properly locked even from affected OEMs. Again, not that it really matters given the performance differences we are talking about.
Subject: General Tech, Graphics Cards | November 22, 2013 - 06:26 PM | Scott Michaud
Tagged: nvidia, jpr, amd
Jon Peddie Research (JPR) reports an 8% rise in quarter-to-quarter shipments of graphics add-in boards (AIBs) for NVIDIA and a decrease of 3% for AMD. This reverses the story from last quarter where NVIDIA lost 8% and AMD gained. In all, NVIDIA holds over half the market (64.5%).
JPR attributed AMD's gains seen last quarter to consumers who added a discrete graphics solution to systems which already contain an integrated product. SLi and Crossfire were noted but pale in comparison. I expect Never Settle to have contributed heavily. This quarter, the free games initiative was reduced with the new GPU lineup. For a decent amount of time, nothing was offered.
At the same time, NVIDIA launched the GTX 780 Ti and their own game bundle. While I do not believe this promotion was as popular as AMD's Never Settle, it probably helped. That said, it is still probably too early to tell whether the Battlefield 4 promotion (or Thief's addition to Silver Tier) will help them regain some ground.
The other vendors, Matrox and S3, were "flat to declining". Their story is the same as last quarter: they shipped less than (maybe much less than) 7000 units. On the whole, add-in board shipments are rising from last quarter; that quarter, however, was a 5.4% drop from the one before.
Subject: General Tech, Graphics Cards, Systems | November 21, 2013 - 09:47 PM | Scott Michaud
Tagged: nvidia, tesla, supercomputing
GPUs are very efficient in terms of operations per watt. Their architecture is best suited for a gigantic bundle of similar calculations (such as a set of operations for each entry of a large blob of data). These are the tasks which also take up the most computation time especially for, not surprisingly, 3D graphics (where you need to do something to every pixel, fragment, vertex, etc.). It is also very relevant for scientific calculations, financial and other "big data" services, weather prediction, and so forth.
Tokyo Tech KFC achieves over 4 GigaFLOPs per watt of power draw from 160 Tesla K20X GPUs in its cluster. That is about 25% more calculations per watt than the current leader of the Green500 (CINECA Eurora System in Italy, with 3.208 GFLOPs/W).
One interesting trait: this supercomputer will be cooled by oil immersion. NVIDIA offers passively cooled Tesla cards which, according to my understanding of how this works, are very well suited to this fluid system. I am fairly certain that they remove all of the fans before dunking the servers (I figured they would be left on).
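For anyone who wants to check the "about 25%" figure, the arithmetic is short. This Python sketch treats "over 4 GigaFLOPs per watt" as exactly 4.0, which is our simplification, so the real margin would be somewhat larger.

```python
def percent_more(new: float, old: float) -> float:
    """How much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


KFC_GFLOPS_PER_WATT = 4.0       # "over 4", so this is a lower bound
EURORA_GFLOPS_PER_WATT = 3.208  # Green500 leader quoted above

gain = percent_more(KFC_GFLOPS_PER_WATT, EURORA_GFLOPS_PER_WATT)
print(f"At least {gain:.1f}% more calculations per watt")
```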
By the way, was it intentional to name computers dunked in giant vats of heat-conducting oil, "KFC"?
Intel has done a similar test, which we reported on last September, submerging numerous servers for over a year. Another benefit of being green is that you are not nearly as concerned about air conditioning.
NVIDIA is actually taking it to the practical market with another nice supercomputer win.
Other NVIDIA Supercomputing News:
- IBM and NVIDIA collaborate on GPU-accelerating IBM's enterprise software.
- Piz Daint, powered by Tesla K20X GPUs, greenest PFLOP-scale supercomputer.
Subject: Graphics Cards | November 20, 2013 - 01:52 PM | Jeremy Hellstrom
Tagged: mars, asus, ROG MARS 760, gtx 760, dual gpu
Fremont, CA (November 19, 2013) - ASUS Republic of Gamers (ROG) today announced the MARS 760 graphics card featuring two GeForce GTX 760 graphics-processing units (GPUs) capable of delivering incredible gaming performance and ensuring ultra-smooth high-resolution gameplay. The MARS 760 even outpaces the GeForce GTX TITAN — with game performance that’s up to 39% faster overall. The MARS 760 is a two-slot card packed with exclusive ASUS technologies including DirectCU II for 20%-cooler and vastly quieter operation, DIGI+ voltage-regulator module (VRM) for ultra-stable power delivery and GPU Tweak, an easy-to-use utility that lets users safely overclock the two GTX 760 GPUs.
Exclusive ASUS features provide cool, quiet, durable and stable performance
ASUS exclusive DirectCU II technology puts 8 highly-conductive cooling copper heatpipes in direct contact with both GPUs. These heatpipes provide extremely efficient cooling, allowing the MARS 760 to run 20% cooler and vastly quieter than reference GeForce GTX 690 cards. Dual 90mm dust-proof fans help to provide six times (6X) greater airflow than reference design. And with 4GB of GDDR5 video memory, the ASUS ROG MARS 760 is capable of delivering visuals with incredibly high frame rates and no stutter, ensuring extremely smooth gameplay, even at WQHD resolutions. An attention-grabbing LED even illuminates as the MARS 760 is operating under load.
The MARS 760 is equipped with ROG’s acclaimed DIGI+ voltage-regulation module (VRM), featuring a 12-phase power design that reduces power noise by 30% and enhances efficiency by 15%. Custom sourced black metallic capacitors offer 20%-better temperature endurance for a lifespan that’s up to five times (5X) longer. The new card is built with extremely hardwearing polymerized organic-semiconductor capacitors (POSCAPs) and has an aluminum back-plate, further lowering power noise while increasing both durability and stability to unlock overclocking potential.
The exclusive GPU Tweak tuning tool allows quick, simple and safe control over clock speeds, voltages, cooling-fan speeds and power-consumption thresholds; GPU Tweak lets users push the two GTX 760 GPUs even further. The ROG edition of GPU Tweak included with the MARS 760 also enables detailed GPU load-line calibration and VRM-frequency tuning, allowing for the most extensive control and tweaking parameters in order to maximize overclocking potential — all adjusted via an attractive and easy-to-use graphical interface.
The GPU Tweak Streaming feature, the newest addition to the GPU Tweak tool, lets users share on-screen action over the internet in real time so others can watch live as games are played. It’s even possible to add a title to the streaming window along with scrolling text, pictures and webcam images.
- NVIDIA GeForce GTX 760 SLI
- PCI Express 3.0
- 4096MB GDDR5 memory (2GB per GPU)
- 1008MHz (1072MHz boosted) core speed
- 6004 MHz (1501 MHz GDDR5) memory clock
- 512-bit memory interface
- 2560 x 1600 maximum DVI resolution
- 2 x dual-link DVI-I output
- 1 x dual-link DVI-D output
- 1 x Mini DisplayPort output
- HDMI output (via dongle)
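The two memory clock figures in the list above are the same number seen two ways: GDDR5 moves four transfers per clock, so 1501 MHz becomes an effective 6004 MHz. The Python sketch below works that out and estimates per-GPU bandwidth. It assumes the quoted 512-bit interface is really two 256-bit buses, one per GPU, which the press release does not spell out; the function names are ours.

```python
def effective_rate_mhz(memory_clock_mhz: float, transfers_per_clock: int = 4) -> float:
    """Effective GDDR5 data rate from the quoted memory clock."""
    return memory_clock_mhz * transfers_per_clock


def bandwidth_gb_s(effective_mhz: float, bus_bits: int) -> float:
    """Peak bandwidth in GB/s for a bus of the given width."""
    bytes_per_transfer = bus_bits / 8
    return effective_mhz * 1e6 * bytes_per_transfer / 1e9


effective = effective_rate_mhz(1501)
# Assumption: 512 bits total means 256 bits for each of the two GPUs.
per_gpu = bandwidth_gb_s(effective, bus_bits=256)
print(f"Effective rate: {effective:.0f} MHz")
print(f"Estimated bandwidth per GPU: {per_gpu:.1f} GB/s")
```

As with any SLI-on-a-card design, each GPU works out of its own 2GB pool, so the per-GPU figure is the more meaningful one.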
Subject: General Tech, Graphics Cards | November 18, 2013 - 03:33 PM | Scott Michaud
Tagged: tesla, nvidia, K40, GK110b
The Tesla K20X has ruled NVIDIA's headless GPU portfolio for quite some time now. The part is based on the GK110 chip with 192 shader cores disabled, like the GeForce Titan, and achieved 3.9 TeraFLOPs of compute performance (1.31 TeraFLOPs in double precision). Also, like the Titan, the K20X offers 6GB of memory.
The Tesla K40X
So the layout was basically the following: GK104 ruled the gamer market except for the, in hindsight, oddly-positioned GeForce Titan which was basically a Tesla K20X without a few features like error correction (ECC). The Quadro K6000 was the only card to utilize all 2880 CUDA cores.
Then, at the recent G-Sync event, NVIDIA CEO Jen-Hsun Huang announced the GeForce GTX 780 Ti. This card uses the GK110b processor and incorporates all 2880 CUDA cores, albeit with reduced double-precision performance (for the 780 Ti, not for GK110b in general). So now we have Quadro and GeForce with the full-power Kepler. Your move, Tesla.
And they did: the Tesla K40 launched this morning and it brought more than just cores.
A brief overview
The GeForce launch was famous for its inclusion of GPU Boost, a feature absent in the Tesla line. It turns out that NVIDIA was paying attention to the feature but wanted to include it in a way that suited data centers. GeForce cards boost based on the status of the card, its temperature or its power draw. This is apparently unsuitable for data centers because they would like every unit operating at a very similar performance. The Tesla K40 has a base clock of 745 MHz but gives the data center two boost clocks that they can manually set: 810 MHz and 875 MHz.
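To put those two boost steps in perspective, here is the uplift each one represents over the base clock. This Python sketch only compares clock speeds; actual application speedups depend on the workload and are not something we can infer from these numbers.

```python
BASE_MHZ = 745
BOOST_STEPS_MHZ = (810, 875)  # the two manually selectable boost clocks


def uplift_percent(boost_mhz: float, base_mhz: float = BASE_MHZ) -> float:
    """Clock increase of a boost step over the base clock, in percent."""
    return (boost_mhz / base_mhz - 1.0) * 100.0


for step in BOOST_STEPS_MHZ:
    print(f"{step} MHz is {uplift_percent(step):.1f}% above the {BASE_MHZ} MHz base")
```

Because the setting is chosen by the operator rather than by each card's temperature or power draw, every unit in a rack can be held to the same step.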
Relative performance benchmarks
The Tesla K40 also doubles the amount of RAM to 12GB. Of course this allows for the GPU to work on larger data sets without streaming in the computation from system memory or worse.
There is currently no public information on pricing for the Tesla K40 but it is available starting today. What we do know are the launch OEM partners: ASUS, Bull, Cray, Dell, Eurotech, HP, IBM, Inspur, SGI, Sugon, Supermicro, and Tyan.
If you are interested in testing out a K40, NVIDIA has remotely hosted clusters that your company can sign up for at the GPU Test Drive website.