NVIDIA Tesla K40: GK110b Gets a Career (and more vRAM)

Subject: General Tech, Graphics Cards | November 18, 2013 - 03:33 PM |
Tagged: tesla, nvidia, K40, GK110b

The Tesla K20X has ruled NVIDIA's headless GPU portfolio for quite some time now. The part is based on the GK110 chip with 192 shader cores disabled, like the GeForce Titan, and achieves 3.9 TeraFLOPs of compute performance (1.31 TeraFLOPs in double precision). Also like the Titan, the K20X offers 6GB of memory.

nvidia-k40-hero.jpg

The Tesla K40

So the layout was basically the following: GK104 ruled the gamer market, except for the (in hindsight) oddly positioned GeForce Titan, which was basically a Tesla K20X without a few features like error correction (ECC). The Quadro K6000 was the only card to utilize all 2880 CUDA cores.

Then, at the recent G-Sync event, NVIDIA CEO Jen-Hsun Huang announced the GeForce GTX 780 Ti. This card uses the GK110b processor and incorporates all 2880 CUDA cores, albeit with reduced double-precision performance (for the 780 Ti, not for GK110b in general). So now we have Quadro and GeForce with the full-power Kepler. Your move, Tesla.

And move they did: the Tesla K40 launched this morning, and it brought more than just cores.

nvidia-tesla-k40.png

A brief overview

The GeForce launch was famous for its inclusion of GPU Boost, a feature absent in the Tesla line. It turns out that NVIDIA was paying attention to the feature but wanted to include it in a way that suited data centers. GeForce cards boost based on the status of the card, its temperature or its power draw. This is apparently unsuitable for data centers because they would like every unit operating at a very similar performance. The Tesla K40 has a base clock of 745 MHz but gives the data center two boost clocks that they can manually set: 810 MHz and 875 MHz.
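
To put the three selectable clocks in perspective, here is a rough sketch of the theoretical single-precision throughput at each setting. It assumes the K40 exposes all 2880 CUDA cores (implied above, but not spelled out) and that each core retires one fused multiply-add (2 FLOPs) per clock; the names are made up for illustration and real workloads will land below these peaks.

```python
# Assumptions (not stated outright in the article): 2880 active CUDA cores
# and 2 single-precision FLOPs per core per clock (one fused multiply-add).
CUDA_CORES = 2880
FLOPS_PER_CORE_PER_CLOCK = 2


def peak_sp_tflops(clock_mhz, cores=CUDA_CORES):
    """Theoretical peak single-precision throughput in TFLOPS."""
    return cores * FLOPS_PER_CORE_PER_CLOCK * clock_mhz * 1e6 / 1e12


# Base clock and the two manually selectable boost clocks from the article.
for label, mhz in [("base", 745), ("boost 1", 810), ("boost 2", 875)]:
    print(f"{label:8s} {mhz} MHz -> {peak_sp_tflops(mhz):.2f} TFLOPS")
```

Under those assumptions, the top boost setting is worth roughly 17% more peak throughput than the base clock, which is why a data center would want every card pinned to the same value.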

nvidia-telsa-k40-2.png

Relative performance benchmarks

The Tesla K40 also doubles the amount of RAM to 12GB. This allows the GPU to work on larger data sets without streaming data in from system memory, or worse.

There is currently no public information on pricing for the Tesla K40 but it is available starting today. What we do know are the launch OEM partners: ASUS, Bull, Cray, Dell, Eurotech, HP, IBM, Inspur, SGI, Sugon, Supermicro, and Tyan.

If you are interested in testing out a K40, NVIDIA has remotely hosted clusters that your company can sign up for at the GPU Test Drive website.

Press blast after the break!

Source: NVIDIA

AMD's Holiday Game + GPU Bundles

Subject: General Tech, Graphics Cards | November 14, 2013 - 07:54 PM |
Tagged: never settle forever, never settle, battlefield 4, amd

UPDATE (11/14/2013): After many complaints from the community about the lack of availability of graphics cards that actually HAD the Battlefield 4 bundle included with them, AMD is attempting to clarify the situation.  In a statement sent through email, AMD says that the previous information sent to press "was not clear and has led to some confusion" which is definitely the case.  While it was implied that all customers that bought R9 series graphics cards would get a free copy of BF4, when purchased on or after November 13th, the truth is that "add-in-board partners ultimately decide which select AMD Radeon R9 SKUs will include a copy of BF4."

So, how are you to know what SKUs and cards are actually going to include BF4?  AMD is trying hard to set up a landing page at http://amd.com/battlefield4 that will give gamers clear, and absolute, listings of which R9 series cards include the free copy of the game.  When I pushed AMD for a timeline on exactly when these would be posted, the best I could get was "in the next day or two."

As for users that bought an R9 280X, R9 270X, R9 270, R9 290X or R9 290 after the announcement of the bundle program changes but DID NOT get a copy of BF4, AMD is going to try and help them out by offering up 1,000 Battlefield 4 keys over AMD's social channels.  The cynic in me thinks this is another ploy to get more Facebook likes and Twitter followers, but in truth the logistics of verifying purchases at this point would be a nightmare for AMD.  Though I don't have details on HOW they are going to distribute these keys, I certainly hope they are going to find a way to target those users that were screwed over in this mess.   Follow www.facebook.com/amdgaming or www.twitter.com/amdradeon for more information on this upcoming promotion.

AMD did send over a couple of links to cards that are currently selling with Battlefield 4 included, as an example of what to look for:

As far as I know, the board partners will also decide which online outlets to offer the bundle through, so even if you see the same SKU on Amazon.com, it may not come with Battlefield 4.  It appears that in this case, and going forward, extreme caution is in order when looking for the right card for you.

END UPDATE (11/14/2013)

AMD announced the first Never Settle on October 22nd, 2012 with Sleeping Dogs, Far Cry 3, Hitman: Absolution, and 20% off of Medal of Honor: Warfighter. The deal was valued at around $170. It has exploded since then to become a choose-your-own-bundle across a variety of tiers.

This bundle is mostly different.

AMD-holiday-bundle.png

Basically, apart from the R7 260X (I will get to that later), all applicable cards will receive Battlefield 4. This is a one-game promotion unlike Never Settle. Still, it is one very good game that will soon be accelerated with Mantle in an upcoming patch. It should be a good example of games based on Frostbite 3 for at least the next few years.

The qualifying cards are: R9 270, R9 270X, R9 280, R9 280X, R9 290, and R9 290X. They must be purchased from a participating retailer beginning November 13th.

The R7 260X is slightly different because it is more similar to Never Settle. It will not have access to a free copy of Battlefield 4. Instead, the R7 260X will have access to two of six Never Settle Forever Silver Tier games: Hitman: Absolution, Sleeping Dogs, Sniper Elite (V2), Far Cry 3: Blood Dragon, DiRT 3, and (for the first time) THIEF. It is possible that other silver-tier Never Settle Forever owners, who have yet to redeem their voucher, might qualify as well. I am not sure about that. Regardless, THIEF was chosen because the developer worked closely with AMD to support both Mantle and TrueAudio.

Since this deal half-updates Never Settle and half-doesn't... I am unsure what this means for the future of the bundle. They seem to be simultaneously supporting and disavowing it. My personal expectation is that AMD wants to continue with Never Settle but they just cut their margins too thin with this launch. This will be a good question to revisit later in the GPU lifecycle when margins become more comfortable.

What do you think? Does AMD's hyper-aggressive hardware pricing warrant a temporary suspension of Never Settle? I mean, until today, these cards were being purchased without any bundle whatsoever.

Qualifying R9-Series Cards (purchased after Nov 13 from participating retailers) can check out AMD's Battlefield 4 portal.

Qualifying R7 260X owners, on the other hand, can check out the Never Settle Forever portal.

Source: AMD

AMD Mantle Deep Dive Video from AMD APU13 Event

Subject: Graphics Cards | November 13, 2013 - 09:54 PM |
Tagged: video, Mantle, apu13, amd

While attending the AMD APU13 event, an annual developer conference the company uses to promote heterogeneous computing, I got to sit in during a deep dive on AMD Mantle, a new hardware-level API first announced in September.  Rather than attempt to re-explain what was explained quite well, I decided to record the session on video and then intermix the slides presented in a produced video for our readers.

The result is likely the best (and seemingly first) explanation of how Mantle actually works and what it does differently than existing APIs like DirectX and OpenGL.

Also, because we had some requests, I am embedding the live blog we ran during Johan Andersson's keynote from APU13.  Enjoy!

Video: Battlefield 4 Running on AMD A10 Kaveri APU and Image Decoder HSA Acceleration

Subject: Graphics Cards, Processors | November 12, 2013 - 06:10 PM |
Tagged: amd, Kaveri, APU, video, hsa

Yesterday at the AMD APU13 developer conference, the company showed off the upcoming Kaveri APU running Battlefield 4 completely on the integrated graphics.  I was able to push the AMD guys along and get a little more personal demo to share with our readers.  The Kaveri APU had some of its details revealed this week:

  • Quad-core Steamroller x86
  • 512 Stream Processor GPU
  • 856 GFLOPS of theoretical performance
  • 3.7 GHz CPU clock speed, 720 MHz GPU clock speed

AMD wanted to be sure we pointed out in this video that the estimated clock speeds used for the FLOP performance figure may not be what the demo system was running at (likely a bit lower).  Also, the version of Battlefield 4 here is the standard retail version; further improvements from the driver team, as well as the upcoming Mantle API implementation, will likely bring even more performance to the APU.
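
For the curious, the 856 GFLOPS figure can be roughly reconstructed from the listed specifications. The sketch below is my own back-of-the-envelope math, not AMD's published breakdown: it assumes 2 FLOPs per stream processor per clock (one fused multiply-add) and 8 single-precision FLOPs per CPU core per clock, neither of which was stated in the presentation.

```python
# Figures from the spec list above.
GPU_STREAM_PROCESSORS = 512
GPU_CLOCK_GHZ = 0.720
CPU_CORES = 4
CPU_CLOCK_GHZ = 3.7

# Assumptions (not from the presentation): per-clock throughput of each unit.
GPU_FLOPS_PER_SP_PER_CLOCK = 2    # one fused multiply-add
CPU_FLOPS_PER_CORE_PER_CLOCK = 8  # assumed single-precision rate per core

gpu_gflops = GPU_STREAM_PROCESSORS * GPU_FLOPS_PER_SP_PER_CLOCK * GPU_CLOCK_GHZ
cpu_gflops = CPU_CORES * CPU_FLOPS_PER_CORE_PER_CLOCK * CPU_CLOCK_GHZ
total_gflops = gpu_gflops + cpu_gflops

print(f"GPU:   {gpu_gflops:.2f} GFLOPS")
print(f"CPU:   {cpu_gflops:.2f} GFLOPS")
print(f"Total: {total_gflops:.2f} GFLOPS")
```

If those per-clock assumptions hold, the GPU accounts for the large majority of the headline number, so a demo system running at lower clocks than listed would fall short of it accordingly.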

The game was running at 1920x1080 with MOSTLY medium quality settings (lighting set to low) but the results still looked damn impressive and the frame rates were silky and smooth.  Considering this is running on a desktop with integrated processor graphics, the game play experience is simply unmatched.  

Memory in the system was running at 2133 MHz.

The second demo looks at the image decoding acceleration that AMD is going to enable with Kaveri APUs upon release with a driver.  Essentially, as the demonstration shows in the video, AMD is replacing the integrated Windows JPG decompression algorithm with a new one that utilizes HSA to accelerate on both the x86 and SIMD (GPU) portions of the silicon.  The most strenuous demo, which used 22 MP images, saw a 100% increase in performance compared to the Kaveri CPU cores alone.

Manufacturer: EVGA

EVGA Brings Custom GTX 780 Ti Early

Reference cards for new graphics card releases are very important for a number of reasons.  Most importantly, these are the cards presented to the media and reviewers that judge the value and performance of these cards out of the gate.  These various articles are generally used by readers and enthusiasts to make purchasing decisions, and if first impressions are not good, it can spell trouble.  Also, reference cards tend to be the first cards sold in the market (see the recent Radeon R9 290/290X launch) and early adopters get the same technology in their hands; again the impressions reference cards leave will live in forums for eternity.

All that being said, retail cards are where partners can differentiate and keep the various GPUs relevant for some time to come.  EVGA is probably the most well known NVIDIA partner and is clearly their biggest outlet for sales.  The ACX cooler is one we saw popularized with the first GTX 700-series cards and the company has quickly adapted it to the GTX 780 Ti, released by NVIDIA just last week.

evga780tiacx.jpg

I would normally have a full review for you as soon as possible, but thanks to a couple of upcoming trips that will keep me away from the GPU test bed, that will take a little while longer.  However, I thought a quick preview was in order to show off the specifications and performance of the EVGA GTX 780 Ti ACX.

gpuz.png

As expected, the EVGA ACX design of the GTX 780 Ti is overclocked.  While the reference card runs at a base clock of 875 MHz and a typical boost clock of 928 MHz, this retail model has a base clock of 1006 MHz and a boost clock of 1072 MHz.  This means that all 2,880 CUDA cores are going to run somewhere around 15% faster on the EVGA ACX model than the reference GTX 780 Ti SKUs. 
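
The "somewhere around 15%" figure follows directly from the clocks quoted above. A quick check (the helper name is just for illustration):

```python
def percent_increase(new, old):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


# Clocks in MHz, taken from the paragraph above.
base_gain = percent_increase(1006, 875)   # EVGA ACX base vs reference base
boost_gain = percent_increase(1072, 928)  # EVGA ACX boost vs reference boost

print(f"Base clock gain:  {base_gain:.1f}%")
print(f"Boost clock gain: {boost_gain:.1f}%")
```

Keep in mind this only compares rated clocks; actual in-game clocks under GPU Boost depend on temperature and power, so the real-world gap may differ.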

We should note that though the cooler is custom built by EVGA, the PCB design of this GTX 780 Ti card remains the same as the reference models. 

Continue reading our preview of the EVGA GeForce GTX 780 Ti ACX custom-cooled graphics card!!

NVIDIA strikes back!

Subject: Graphics Cards | November 8, 2013 - 04:41 PM |
Tagged: nvidia, kepler, gtx 780 ti, gk110, geforce

Here is a roundup of the reviews of what is now the fastest single-GPU card on the planet, the GTX 780 Ti, which is a fully active GK110 chip.  The 7GHz GDDR5 is faster than AMD's memory but uses a 384-bit memory bus, which is narrower than the R9 290X's; that leads to some interesting questions about the performance of this card at high resolutions.  Are you willing to pay quite a bit more for better performance and a quieter card? Check out the performance deltas at [H]ard|OCP and see if that changes your mind at all.
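
To see how the faster memory and the narrower bus trade off, here is a rough peak-bandwidth comparison. The GTX 780 Ti numbers come from the paragraph above; the R9 290X numbers (a 512-bit bus at an effective 5 Gbps per pin) are not stated in this post and are filled in from that card's published specifications, so treat them as an assumption.

```python
def peak_bandwidth_gbs(data_rate_gbps, bus_width_bits):
    """Theoretical peak memory bandwidth in GB/s.

    data_rate_gbps: effective data rate per pin (the "7GHz" GDDR5 figure).
    bus_width_bits: memory bus width in bits.
    """
    return data_rate_gbps * bus_width_bits / 8


gtx_780_ti = peak_bandwidth_gbs(7.0, 384)
# Assumption: R9 290X at 5 Gbps effective on a 512-bit bus (not from this post).
r9_290x = peak_bandwidth_gbs(5.0, 512)

print(f"GTX 780 Ti: {gtx_780_ti:.0f} GB/s")
print(f"R9 290X:    {r9_290x:.0f} GB/s")
```

On paper the two end up close, which is part of why the high-resolution results are worth a look in the reviews.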

You can see how it measures up in ISUs in Ryan's review as well.

1383802230J422mbwkoS_1_10_l.jpg

"NVIDIA's fastest single-GPU video card is being launched today. With the full potential of the Kepler architecture and GK110 GPU fully unlocked, how will it perform compared to the new R9 290X with new drivers? Will the price versus performance make sense? Will it out perform a TITAN? We find out all this and more."

Source: [H]ard|OCP
Manufacturer: AMD

An issue of variance

AMD just sent along an email to the press with a new driver to use for Radeon R9 290X and Radeon R9 290 testing going forward.  Here is the note:

We’ve identified that there’s variability in fan speeds across AMD R9 290 series boards. This variability in fan speed translates into variability of the cooling capacity of the fan-sink.

The flexibility of AMD PowerTune technology enables us to correct this variability in a driver update. This update will normalize the fan RPMs to the correct values.

The correct target RPM values are 2200RPM for the AMD Radeon R9 290X ‘Quiet mode’, and 2650RPM for the R9 290. You can verify these in GPU-Z.

If you’re working on stories relating to R9 290 series products, please use this driver as it will reduce any variability in fan speeds. This driver will be posted publicly tonight.

Great!  This is good news!  Except it also creates some questions. 

When we first tested the R9 290X and the R9 290, we discussed the latest iteration of AMD's PowerTune technology. That feature attempts to keep clocks as high as possible under the constraints of temperature and power.  I took issue with the high variability of clock speeds on our R9 290X sample, citing this graph:

clock-avg.png

I then did some digging into the variance and the claims that AMD was building a "configurable" GPU.  In that article we found that there were significant performance deltas between "hot" and "cold" GPUs; we noticed that doing simple, quick benchmarks would produce certain results that were definitely not real-world in nature.  At the default 40% fan speed, Crysis 3 showed 10% variance with the 290X at 2560x1440:

Crysis3_2560x1440_OFPS.png

Continue reading our coverage of the most recent driver changes and how they affect the R9 290X and R9 290!!

AMD Releases Catalyst 13.11 Beta 9.2 Driver To Correct Performance Variance Issue of R9 290 Series Graphics Cards

Subject: Graphics Cards, Cases and Cooling | November 8, 2013 - 02:41 AM |
Tagged: R9 290X, powertune, hawaii, graphics drivers, gpu, GCN, catalyst 13.11 beta, amd, 290x

AMD recently launched its 290X graphics card, which is the new high-end single GPU solution using a GCN-based Hawaii architecture. The new GPU is rather large and incorporates an updated version of AMD's PowerTune technology to automatically adjust clockspeeds based on temperature and a maximum fan speed of 40%. Unfortunately, it seems that some 290X cards available at retail exhibited performance characteristics that varied from review units.

Retail versus Review Sample Performance Variance Testing.jpg

AMD has looked into the issue and released the following statement in response to the performance variances (which PC Perspective is looking into as well).

Hello, We've identified that there's variability in fan speeds across AMD R9 290 series boards. This variability in fan speed translates into variability of the cooling capacity of the fan-sink. The flexibility of AMD PowerTune technology enables us to correct this variability in a driver update. This update will normalize the fan RPMs to the correct values.

The correct target RPM values are 2200RPM for the AMD Radeon R9 290X "Quiet mode", and 2650RPM for the R9 290. You can verify these in GPU-Z. If you're working on stories relating to R9 290 series products, please use this driver as it will reduce any variability in fan speeds. This driver will be posted publicly tonight.

From the AMD statement, it seems to be an issue with fan speeds from card to card causing the performance variances. With a GPU that is rated to run at up to 95C, a fan limited to 40% maximum, and dynamic clockspeeds, it is only natural that cards could perform differently, especially if case airflow is not up to par. On the other hand, the specific issue pointed out by other technology review sites (per my understanding, it was initially Tom's Hardware that reported on the retail vs review sample variance) is that the 40% maximum on certain cards does not actually correspond to the RPM target that AMD intended.

AMD intended for the Radeon R9 290X's fan to run at 2200RPM (40%) in Quiet Mode and the fan on the R9 290 (which has a maximum fan speed percentage of 47%) to spin at 2650 RPM in Quiet Mode. However, some cards' 40% values are not actually hitting those intended RPMs, which is causing performance differences due to cooling and PowerTune adjusting the clockspeeds accordingly.

Luckily, the issue is being worked on by AMD, and it is reportedly rectified by a driver update. The driver update ensures that the fans are actually spinning at the intended speed when set to the 40% (R9 290X) or 47% (R9 290) values in Catalyst Control Center. The new driver, which includes the fix, is version Catalyst 13.11 Beta 9.2 and is available for download now. 
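
To illustrate why a percentage setting can miss an RPM target, here is a small sketch. It is not how AMD's driver works internally (that has not been disclosed); it simply assumes fan speed scales linearly with the percentage setting, and the 2000 RPM reading is a hypothetical retail card.

```python
# Target RPMs and default fan percentages from AMD's statement.
TARGET_RPM = {"R9 290X": 2200, "R9 290": 2650}
DEFAULT_FAN_PERCENT = {"R9 290X": 40, "R9 290": 47}


def corrected_fan_percent(card, measured_rpm_at_default):
    """Fan percentage needed to hit the target RPM.

    Assumes (purely for illustration) that RPM scales linearly with the
    fan percentage on a given card.
    """
    default = DEFAULT_FAN_PERCENT[card]
    return default * TARGET_RPM[card] / measured_rpm_at_default


# Hypothetical retail R9 290X whose fan only reaches 2000 RPM at 40%.
needed = corrected_fan_percent("R9 290X", 2000.0)
print(f"Fan setting needed for 2200 RPM: {needed:.1f}%")
```

In this made-up case, the card would need to run at 44% rather than 40% to cool as intended, which is the kind of gap a fix that targets RPM rather than percentage would close.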

If you are running an R9 290 or R9 290X in your system, you should consider updating to the latest driver to ensure you are getting the cooling (and as a result gaming) performance you are supposed to be getting.

Catalyst 13.11 Beta 9.2 is available from the AMD website.

Stay tuned to PC Perspective for more information on the Radeon R9 290 series GPU performance variance issue as it develops.

Image credit: Ryan Shrout (PC Perspective).

Source: AMD
Manufacturer: NVIDIA

GK110 in all its glory

I bet you didn't realize that October and November were going to become the onslaught of graphics cards it has been.  I know I did not and I tend to have a better background on these things than most of our readers.  Starting with the release of the AMD Radeon R9 280X, 270X and R7 260X in the first week of October, it has pretty much been a non-stop battle between NVIDIA and AMD for the hearts, minds, and wallets of PC gamers. 

Shortly after the Tahiti refresh came NVIDIA's move into display technology with G-Sync, a variable refresh rate feature that will work with upcoming monitors from ASUS and others as long as you have a GeForce Kepler GPU.  The technology was damned impressive, but I am still waiting for NVIDIA to send over some panels for extended testing. 

Later in October we were hit with the R9 290X, the Hawaii GPU that brought AMD back into the world of ultra-class single GPU card performance.  It has produced stellar benchmarks and undercut the prices (then at least) of the GTX 780 and GTX TITAN.  We tested it in both single and multi-GPU configurations and found that AMD had made some impressive progress in fixing its frame pacing issues, even with Eyefinity and 4K tiled displays.

NVIDIA dropped a driver release with ShadowPlay that allows gamers to record gameplay locally without a hit on performance.  I posted a roundup of R9 280X cards which showed alternative coolers and performance ranges.  We investigated the R9 290X Hawaii GPU and the claims that performance is variable and configurable based on fan speeds.  Finally, the R9 290 (non-X model) was released this week to more fanfare than the 290X thanks to its nearly identical performance and $399 price tag.

IMG_1862.JPG

And today, yet another release.  NVIDIA's GeForce GTX 780 Ti takes the performance of the GK110 and fully unlocks it.  The GTX TITAN uses one fewer SMX and the GTX 780 has three fewer SMX units so you can expect the GTX 780 Ti to, at the very least, become the fastest NVIDIA GPU available.  But can it hold its lead over the R9 290X and validate its $699 price tag?
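
The SMX counts translate into CUDA core counts as sketched below. The 192-cores-per-SMX figure and the 15-SMX total for a full GK110 come from NVIDIA's Kepler architecture rather than from this paragraph, so they are labeled as assumptions here.

```python
# Assumptions from the Kepler architecture (not stated in the paragraph above).
CORES_PER_SMX = 192
FULL_GK110_SMX = 15


def cuda_cores(disabled_smx):
    """CUDA core count for a GK110 part with some SMX units disabled."""
    return (FULL_GK110_SMX - disabled_smx) * CORES_PER_SMX


for name, disabled in [("GTX 780 Ti", 0), ("GTX TITAN", 1), ("GTX 780", 3)]:
    print(f"{name:10s} {cuda_cores(disabled)} CUDA cores")
```

That works out to 2880, 2688, and 2304 cores respectively, so before clock speeds even enter the picture the GTX 780 Ti has about 7% more shader hardware than the TITAN and 25% more than the GTX 780.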

Continue reading our review of the NVIDIA GeForce GTX 780 Ti 3GB GK110 Graphics Card!!

NVIDIA Grid GPUs Available for Amazon EC2

Subject: General Tech, Graphics Cards, Systems | November 5, 2013 - 09:33 PM |
Tagged: nvidia, grid, AWS, amazon

Amazon Web Services allows customers (individuals, organizations, or companies) to rent servers of certain qualities to match their needs. Many websites are hosted at their data centers, mostly because you can purchase different (or multiple) servers if you have big variations in traffic.

I, personally, sometimes use it as a game server for scheduled multiplayer events. The traditional method is spending $50-80 USD per month on a... decent... server running all-day every-day and using it a couple of hours per week. With Amazon EC2, we hosted a 200 player event (100 vs 100) by purchasing a dual-Xeon (ironically the fastest single-threaded instance) server connected to Amazon's internet backbone by 10 Gigabit Ethernet. This server cost just under $5 per hour all expenses considered. It was not much of a discount but it ran like butter.
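
As a rough illustration of the trade-off, the sketch below works out how many hours per month of the roughly $5 per hour instance you could buy before the traditional $50-80 monthly server becomes cheaper. The usage estimate (two hours per week) is my own reading of "a couple of hours per week" and is only an assumption.

```python
HOURLY_RATE_USD = 5.0            # "just under $5 per hour", rounded up
MONTHLY_SERVER_USD = (50.0, 80.0)  # traditional always-on server price range
WEEKS_PER_MONTH = 52 / 12


def break_even_hours(monthly_cost, hourly_rate=HOURLY_RATE_USD):
    """Hours of on-demand use per month that cost the same as a flat fee."""
    return monthly_cost / hourly_rate


low, high = (break_even_hours(cost) for cost in MONTHLY_SERVER_USD)
print(f"Break-even: {low:.0f} to {high:.0f} hours per month")

# Assumption: "a couple of hours per week" means 2 hours per week.
monthly_on_demand = 2 * WEEKS_PER_MONTH * HOURLY_RATE_USD
print(f"On-demand at 2 h/week: ${monthly_on_demand:.2f} per month")
```

At that usage level the on-demand bill lands just under the cheap end of the traditional range, which matches the "not much of a discount" experience; the win is in the hardware you get for those hours.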

nvidia-grid-bracket.png

This leads me to today's story: NVIDIA GRID GPUs are now available at Amazon Web Services. Both companies hope their customers will use (or create services based on) these instances. Applications they expect to see are streamed games, CAD and media creation, and other server-side graphics processing. These Kepler-based instances, named "g2.2xlarge", will be available alongside the older Fermi-based Cluster Compute Instances ("cg1.4xlarge").

It is also noteworthy that the older Fermi-based Tesla servers are about 4x as expensive. GRID GPUs are based on GK104 (or GK107, but those are not available on Amazon EC2) and not the more compute-intensive GK110. It would probably be a step backwards for customers intending to perform GPGPU workloads for computational science or "big data" analysis. The newer GRID systems do not have 10 Gigabit Ethernet, either.

So what does it have? Well, I created an AWS instance to find out.

aws-grid-cpu.png

Its CPU is advertised as an Intel E5-2670 with 8 threads and 26 Compute Units (CUs). This is particularly odd as that particular CPU is eight-core with 16 threads; it is also usually rated by Amazon at 22 CUs per 8 threads. This made me wonder whether the CPU is split between two clients or if Amazon disabled Hyper-Threading to push the clock rates higher (and ultimately led me to just log in to an instance and see). As it turns out, HT is still enabled and the processor registers as having 4 physical cores.

The GPU was slightly more... complicated.

aws-grid-gpu.png

NVIDIA control panel apparently does not work over remote desktop and the GPU registers as a "Standard VGA Graphics Adapter". Actually, two are available in Device Manager although one has the yellow exclamation mark of driver woe (random integrated graphics that wasn't disabled in BIOS?). GPU-Z was not able to pick much up from it but it was of some help.

Keep in mind: I did this without contacting either Amazon or NVIDIA. It is entirely possible that the OS I used (Windows Server 2008 R2) was a poor choice. OTOY, as a part of this announcement, offers Amazon Machine Images (AMIs) for Linux and Windows installations integrated with their ORBX middleware.

I spot three key pieces of information: The base clock is 797 MHz, the memory size is 2990 MB, and the default drivers are Forceware 276.52 (??). The core and default clock rate, GK104 and 797 MHz respectively, are characteristic of the GRID K520 GPU with its 2 GK104 GPUs clocked at 800 MHz. However, since the K520 gives each GPU 4GB and this instance only has 3GB of vRAM, I can tell that the product is slightly different.

I was unable to query the device's shader count. The K520 (similar to a GeForce 680) has 1536 per GPU which sounds about right (but, again, pure speculation).

I also tested the server with TCPing to measure its networking performance versus the cluster compute instances. I did not do anything like Speedtest or Netalyzr. With a normal cluster instance I achieve about 20-25ms pings; with this instance I was more in the 45-50ms range. Of course, your mileage may vary and this should not be used as any official benchmark. If you are considering using the instance for your product, launch an instance and run your own tests. It is not expensive. Still, it seems to be less responsive than Cluster Compute instances which is odd considering its intended gaming usage.
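
If you want to run a similar check yourself without a dedicated tool, the sketch below approximates what a TCP ping does: it times how long a TCP connection takes to open. This is not the TCPing utility itself, just a minimal stand-in; the function names are made up, and the host and port you test against are up to you.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time a single TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection opened; close it immediately
    return (time.perf_counter() - start) * 1000.0


def tcp_ping(host, port, count=5):
    """Collect several connection-time samples."""
    return [tcp_connect_ms(host, port) for _ in range(count)]


def summarize(samples):
    """Return (min, mean, max) of the samples in milliseconds."""
    return min(samples), statistics.mean(samples), max(samples)


# Example with a hypothetical instance address and an open TCP port:
#   print(summarize(tcp_ping("my-instance.example.com", 3389)))
```

As with my numbers above, results depend heavily on where you are testing from, so compare instances from the same location and at similar times of day.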

Regardless, now that Amazon has picked up GRID, we might see more services (be it consumer or enterprise) that utilize this technology. The new GPU instances start at $0.65/hr for Linux and $0.767/hr for Windows (excluding extra charges like network bandwidth) on demand. As always with EC2, if you will use these instances a lot, you can get reduced rates if you pay a fee upfront.

Official press blast after the break.

Source: NVIDIA