GTC 2013: Pedraforca Is A Power Efficient ARM + GPU Cluster For Homogeneous (GPU) Workloads

Subject: General Tech, Graphics Cards | March 20, 2013 - 01:47 PM |
Tagged: tesla, tegra 3, supercomputer, pedraforca, nvidia, GTC 2013, GTC, graphics cards, data centers

There is a lot of talk about heterogeneous computing at GTC, in the sense of adding graphics cards to servers. If you have HPC workloads that can benefit from GPU parallelism, adding GPUs delivers equivalent TFLOPS in less physical space, and with less power, than a CPU-only cluster.

However, one session at GTC took things to the opposite extreme. Instead of a CPU-only cluster or a mixed cluster, Alex Ramirez (leader of the Heterogeneous Architectures Group at the Barcelona Supercomputing Center) is proposing a homogeneous GPU cluster called Pedraforca.

Pedraforca V2 combines NVIDIA Tesla GPUs with low-power ARM processors. Each node is comprised of the following components:

  • 1 x Mini-ITX carrier board
  • 1 x Q7 module (which hosts the ARM SoC and memory)
    • Current config is one Tegra 3 @ 1.3GHz and 2GB DDR2
  • 1 x NVIDIA Tesla K20 accelerator card (1170 GFLOPS)
  • 1 x InfiniBand 40Gb/s card (Mellanox ConnectX-3)
  • 1 x 2.5" SSD (SATA 3 MLC, 250GB)

The ARM processor is used solely for booting the system and facilitating GPU communication between nodes; it is not intended for computing. According to Dr. Ramirez, in situations where running code on a CPU would be faster, it would be best to have a small number of Intel Xeon powered nodes do the CPU-favorable computing and offload the parallel workloads to the GPU cluster over the InfiniBand connection. Even so, that arrangement is less than ideal; Pedraforca is most efficient with datasets that can be processed solely on the Tesla cards.

DSCF2421.JPG

While Pedraforca is not necessarily locked to NVIDIA's Tegra hardware, Tegra 3 is currently the only SoC that meets the project's needs, chief among them PCI-E support. The Tegra 3 SoC has four PCI-E lanes, so the carrier board uses two PLX chips to allow both the Tesla and InfiniBand cards to be connected.

The researcher stated that he is also looking forward to using NVIDIA's upcoming Logan processor in the Pedraforca cluster. Existing Pedraforca clusters could reportedly be upgraded by swapping the existing (Tegra 3) Q7 module for one carrying the Logan SoC once it is released.

Pedraforca V2 has an initial cluster size of 64 nodes. While the speaker was reluctant to provide TFLOPS performance numbers, as they would depend on the workload, 64 Tesla K20 cards should provide respectable performance. The intent of the cluster is to save power by using a low-power CPU. If your kernels and applications can run on GPUs alone, there are noticeable power savings to be had by switching from a ~100W Intel Xeon chip to a lower-power (approximately 2-3W) Tegra 3 processor. If you have a kernel that needs to run on a CPU, it is recommended to run it on an Intel server and transfer just the GPU work to the Pedraforca cluster. Each Pedraforca node reportedly draws under 300W, with the Tesla card accounting for the majority of that figure. Despite the limitations, and the niche nature of the workloads and software necessary to get the full power-saving benefits, Pedraforca is certainly an interesting take on a homogeneous server cluster!
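To put the session's figures in perspective, here is a quick back-of-the-envelope sketch in Python. The wattages are the approximations quoted above, and real sustained throughput depends heavily on the workload, so treat this as illustrative arithmetic only:

```python
# Rough cluster math from the figures above (illustrative only)
NODES = 64
K20_GFLOPS = 1170    # peak per Tesla K20
XEON_WATTS = 100     # approximate server Xeon power
TEGRA3_WATTS = 3     # approximate Tegra 3 SoC power

peak_tflops = NODES * K20_GFLOPS / 1000            # aggregate GPU peak
cpu_power_saved_w = NODES * (XEON_WATTS - TEGRA3_WATTS)

print(f"Peak GPU throughput: ~{peak_tflops:.1f} TFLOPS")
print(f"CPU power saved across the cluster: ~{cpu_power_saved_w / 1000:.1f} kW")
```

That works out to roughly 75 TFLOPS of peak GPU throughput and about 6 kW of CPU power shaved off the cluster, which is the whole pitch in two numbers.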

DSCF2413.JPG

In another session, on the path to exascale computing, power use in data centers was listed as one of the biggest hurdles to reaching exaflop levels of performance. While Pedraforca is not the answer to exascale, it should at least be a useful learning experience in wringing the most parallelism out of code and pushing GPGPU to its limits, and that research will help other clusters use GPUs more efficiently as researchers explore the future of computing.

The Pedraforca project built upon research conducted on Tibidabo, a multi-core ARM CPU cluster, and CARMA (CUDA on ARM development kit), a Tegra SoC paired with an NVIDIA Quadro card. The two slides below show CARMA benchmarks and a Tibidabo cluster (click on an image for a larger version).

Stay tuned to PC Perspective for more GTC 2013 coverage!

 

GTC 2013: TYAN Launches New HPC Servers Powered by Kepler-based Tesla Cards

Subject: General Tech, Graphics Cards | March 19, 2013 - 06:52 PM |
Tagged: GTC 2013, tyan, HPC, servers, tesla, kepler, nvidia

Server platform manufacturer TYAN is showing off several of its latest servers aimed at the high performance computing (HPC) market. The new servers range in size from 2U to 4U chassis and hold up to eight Kepler-based Tesla accelerator cards. The new product lineup consists of two motherboards, the S7055 and S7056, and three bare-bones systems: the FT77-B7059, TA77-B7061, and FT48-B7055.

FT48_B7055_3D_2_Rev2_S.jpg

The TA77-B7061 is the smallest system, with support for two Intel Xeon E5-2600 processors and four Kepler-based Tesla accelerator cards. The FT48-B7055 has similar specifications but is housed in a 4U chassis. Finally, the FT77-B7059 is a 4U system with support for two Intel Xeon E5-2600 processors and up to eight Tesla accelerator cards. Of the bare motherboards, the S7055 supports a maximum of four GPUs while the S7056 can support two Tesla cards, though you will have to supply your own cards, processors, and RAM (of course).

FT77A-B7059_3D_S.jpg

According to TYAN, the new Kepler-based HPC systems will be available in Q2 2013, though there is no word on pricing yet.

Stay tuned to PC Perspective for further GTC 2013 Coverage!

GTC 2013: Jen-Hsun Huang Takes the Stage to Discuss NVIDIA's Future, New Hardware

Subject: General Tech, Graphics Cards | March 19, 2013 - 02:55 PM |
Tagged: unified virtual memory, ray tracing, nvidia, GTC 2013, grid vca, grid, graphics cards

Today, NVIDIA's CEO Jen-Hsun Huang stepped on stage to present the GTC keynote. In the presentation (which was live streamed on the GTC website and archived there), NVIDIA discussed five major points, looking back over the past year and ahead to the future of its mobile and professional products. In addition to the product roadmap, NVIDIA discussed the state of computer graphics and GPGPU software. Remote graphics and GPU virtualization were also on tap. Finally, towards the end of the keynote, the company revealed its first appliance, the NVIDIA GRID VCA. The culmination of NVIDIA's GRID and GPU virtualization technology, the VCA is a device that hosts up to 16 virtual machines, each of which can tap into one of 16 Kepler-based graphics processors (8 cards, 2 GPUs per card) to fully hardware accelerate software running on the VCA. Three new mobile Tegra parts and two new desktop graphics processors were also hinted at, with improvements to power efficiency and performance.

DSCF2303.JPG

On the desktop side of things, NVIDIA's roadmap included two new GPUs: following Kepler, NVIDIA will introduce Maxwell and then Volta. Maxwell will feature a new virtualized memory technology called Unified Virtual Memory. This tech will allow both the CPU and GPU to read from a single (virtual) memory store. Much as with the promise of AMD's Kaveri APU, Unified Virtual Memory should result in speed improvements in heterogeneous applications because data will not have to be copied between the GPU and CPU before it can be processed. Server applications should particularly benefit from the shared memory tech. NVIDIA did not provide details, but from the sound of it, the CPU and GPU each continue to write to their own physical memory, with a layer of virtualized memory on top that allows the two (or more) processors to read from each other's memory stores.
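As a rough illustration of why removing those copies matters, here is a toy Python model (not NVIDIA code or any real API) that simply counts the bytes an explicit copy-in/copy-out pipeline shuffles versus a shared-store pipeline doing the same work:

```python
# Toy model of the copy overhead Unified Virtual Memory aims to remove.
# "explicit_pipeline" counts every byte staged between CPU and GPU memory;
# "unified_pipeline" works in one logical store, so nothing is staged.
# Purely illustrative Python, not NVIDIA's API.

class CopyCounter:
    bytes_copied = 0

def explicit_pipeline(data):
    # classic model: copy in, compute on the "device", copy result back
    device = list(data)                       # host -> device copy
    CopyCounter.bytes_copied += 8 * len(data)
    result = [x * x for x in device]          # the "kernel"
    host = list(result)                       # device -> host copy
    CopyCounter.bytes_copied += 8 * len(result)
    return host

def unified_pipeline(data):
    # unified model: both processors see one logical store,
    # so the kernel works in place and no staging copies are counted
    for i, x in enumerate(data):
        data[i] = x * x
    return data

CopyCounter.bytes_copied = 0
explicit_pipeline(list(range(1000)))
explicit_cost = CopyCounter.bytes_copied   # bytes shuffled both ways

CopyCounter.bytes_copied = 0
unified_pipeline(list(range(1000)))
unified_cost = CopyCounter.bytes_copied    # no staging at all
```

The compute work is identical in both cases; all the explicit pipeline buys you is traffic, which is exactly the cost a shared virtual memory store promises to eliminate.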
Following Maxwell, Volta will be a physically smaller chip with more transistors (likely on a smaller process node). In addition to power efficiency improvements over Maxwell, it steps up memory bandwidth significantly. NVIDIA will use TSV (through-silicon via) technology to physically mount the graphics DRAM chips over the GPU (attached to the same silicon substrate electrically). According to NVIDIA, this new TSV-mounted memory will achieve up to 1 terabyte per second of memory bandwidth, a notable increase over existing GPUs.

DSCF2354.JPG

NVIDIA continues to pursue the mobile market with its line of Tegra chips, which pair an ARM CPU, an NVIDIA GPU, and an SDR (software-defined radio) modem. Two new mobile chips, Logan and Parker, will follow Tegra 4. Both will support the full CUDA 5 stack and OpenGL 4.3 out of the box. Logan will feature an on-chip Kepler-based graphics processor that can do “everything a modern computer ought to do,” according to NVIDIA. Parker will have a yet-to-be-revealed graphics processor (a Kepler successor). This mobile chip will utilize 3D FinFET transistors and pack a greater number of transistors into a smaller package than previous Tegra parts (it will be about the size of a dime), and NVIDIA also plans to ramp up the frequency to wrangle more performance out of the chip. NVIDIA has stated that Logan silicon should be completed towards the end of 2013, with the mobile chips entering production in 2014.

DSCF2371.JPG

Interestingly, Logan has a sister chip that NVIDIA is calling Kayla. This mobile chip is capable of running ray tracing applications, features OpenGL geometry shaders, supports GPGPU code, and will be compatible with Linux.

NVIDIA has been pushing CUDA for several years now, and the company has seen respectable adoption: from one Tesla supercomputer in 2008, its graphics cards are now used in 50 supercomputers, with 500 million CUDA processors on the market. There are now reportedly 640 universities working with CUDA and 37,000 academic papers on CUDA.

DSCF2331.JPG

Finally, NVIDIA's hinted-at new product announcement was the NVIDIA GRID VCA, a GPU virtualization appliance that hooks into the network and can deliver up to 16 virtual machines running independent applications. These GPU-accelerated workspaces can be presented to thin clients over the network by installing the GRID client software on users' workstations. The specifications of the GRID VCA are rather impressive, as well.

The GRID VCA features:

  • 2 x Intel Xeon processors with 16 threads each (32 total threads)
  • 192GB to 384GB of system memory
  • 8 Kepler-based graphics cards, with two GPUs each (16 total GPUs)
  • 16 x GPU-accelerated virtual machines

The GRID VCA fits into a 4U case. It can deliver remote graphics to workstations, and is allegedly fast enough to deliver GPU-accelerated software that performs as if it were running on the local machine (at least over a LAN). The GRID Visual Computing Appliance will come in two flavors at different price points. The first will have 8 Kepler GPUs with 4GB of memory each, 16 CPU threads, and 192GB of system memory for $24,900. The other version will cost $34,900 and features 16 Kepler GPUs (4GB of memory each), 32 CPU threads, and 384GB of system memory. On top of the hardware cost, NVIDIA is also charging licensing fees: while both GRID VCA devices can support unlimited devices, the licenses cost $2,400 and $4,800 per year, respectively.

DSCF2410.JPG

Overall, it was an interesting keynote, and the proposed graphics cards look to be offering up some unique and necessary features that should help hasten the day of ubiquitous general purpose GPU computing. Unified Virtual Memory was something I was not expecting, and it will be interesting to see how AMD responds. AMD is already promising shared memory in its Kaveri APU, but I am interested to see the details of how NVIDIA and AMD will accomplish shared memory with dedicated graphics cards (and whether CrossFire/SLI setups will all share a single memory pool).

Stay tuned to PC Perspective for more GTC 2013 Coverage!

GTC 2013: Prepare for Graphics Overload

Subject: General Tech, Graphics Cards, Mobile, Shows and Expos | March 18, 2013 - 09:10 PM |
Tagged: GTC 2013, nvidia

We just received word from Tim Verry, our GTC correspondent and news troll, about his first kick at the conference. This... is his story.

Graphics card manufacturer NVIDIA is hosting its annual GPU Technology Conference (GTC 2013) in San Jose, California this week. PC Perspective will be roaming the exhibit floor and covering sessions as NVIDIA and its partners discuss upcoming graphics technologies, GPGPU, programming, and a number of other low-level computing topics.

gtc2013-intro.png

The future... is tomorrow!

A number of tech companies will be on site and delivering presentations to show off their latest Kepler-based systems. NVIDIA will deliver its keynote presentation tomorrow for the press, financial and industry analysts, and business partners to provide a glimpse at the green team's roadmap throughout 2013 - and maybe beyond.

We cannot say for certain what NVIDIA will reveal during its keynote, but since we have not been briefed ahead of time, we are completely free to speculate! One near-certainty, I think, is the official launch of the Kepler-based K6000 workstation card. While I do not expect to see Maxwell, we could possibly see a planned refresh of the Kepler-based components with some incremental improvements; I predict power efficiency over performance. Perhaps we will get a cheaper Titan-like consumer card towards the end of 2013? Wishful thinking on my part? A refresh of the GK104 architecture would be nice to see as well, even if actual hardware will not show up until next year. I expect NVIDIA will react to whatever plans AMD reveals and decide whether it is in its interest to match them.

I do expect to see more information on GRID and Project SHIELD, however. NVIDIA has reportedly broadened the scope of this year's conference to include mobile sessions: expect Tegra programming and mobile GPGPU goodness to be on tap.

It should be an interesting week of GPU news. Stay tuned to PC Perspective for more coverage as the conference gets underway.

What are you hoping to see from NVIDIA at GTC 2013?

ASUS HD 7970 DirectCU II versus a dual linked Dell 3007WFP

Subject: Graphics Cards | March 18, 2013 - 03:17 PM |
Tagged: 2560x1600, amd, hd7970 direct cu 2, asus, dell, 3007WFP

[H]ard|OCP has wanted to publish their review of the ASUS HD 7970 DirectCU II for a while, but they ran into a compatibility issue during testing, a perfect example of what sometimes happens to review sites and enthusiasts on the bleeding edge.  [H] uses a Dell 3007WFP at a resolution of 2560x1600, which necessitates a dual-link DVI connection and caused the issue you can see below.  No other setup seemed to reproduce the problem; even the same monitor on single-link DVI at 1920x1080, or at the higher resolution over DisplayPort, would not display the issue.  So what began as a review of an HD 7970 with some nice extra features from ASUS became a long troubleshooting session.  Take a read through the review, as these cards should be back in stock over the next few months, very likely with a solution to this problem already incorporated.

Hoops.jpg

"Today we have the ASUS HD 7970 DirectCU II strapped to our test bench for your reading pleasure. We will compare it to the AMD Radeon HD 7970 GHz Edition and to the NVIDIA GeForce GTX 680 to determine whether the custom VRMs and DirectCU II cooling solution are the droids you are looking for in your next graphics card purchase."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

NVIDIA Allegedly Launching Quadro K6000 GK110 GPU For Professionals

Subject: Graphics Cards | March 8, 2013 - 09:17 AM |
Tagged: quadro, nvidia, kepler, k6000, gk110

Earlier this week, NVIDIA updated its Quadro line of workstation cards with new GPUs with GK104 “Kepler” cores. The updated line introduced four new Kepler cards, but the Quadro 6000 successor was notably absent from the NVIDIA announcement. If rumors hold true, professionals may get access to a K6000 Quadro card after all, and one that is powered by GK110 as well.

GK110 Block Diagram.jpg

According to rumors around the Internet, NVIDIA has reserved its top-end Quadro slot for a GK110-based graphics card. Dubbed the K6000 (in line with the existing Kepler Quadro cards), the high-end workstation card will feature 13 SMX units, 2,496 CUDA cores, 192 texture mapping units (TMUs), 40 raster operation pipelines (ROPs), and a 320-bit memory bus. The K6000 card will likely have 5GB of GDDR5 memory, like its Tesla K20 counterpart. Interestingly, this Quadro K6000 graphics card has one fewer SMX unit than NVIDIA’s Tesla K20X and even NVIDIA’s consumer-grade GTX Titan. A comparison between the rumored K6000, the Quadro K5000 (GK104), and the existing GK110 cards is available in the table below. Also, note that the (rumored) K6000 specs put it more in line with the Tesla K20 than the K20X, but as it is the flagship Quadro card I felt it was still fair to compare it to the flagship Tesla and GeForce cards.

              Quadro K6000   Tesla K20X   GTX Titan   Full GK110*   Quadro K5000
  SMX Units   13             14           14          15            8
  CUDA Cores  2,496          2,688        2,688       2,880         1,536
  TMUs        192            224          224         256           128
  ROPs        40             48           48          48            32
  Memory Bus  320-bit        384-bit      384-bit     384-bit       256-bit
  DP TFLOPS   ~1.17          1.31         1.31        ~1.4          0.09
  Core        GK110          GK110        GK110       GK110         GK104

* A full 15-SMX GK110 is not available in any product yet.

The Quadro cards are in an odd situation when it comes to double precision floating point performance. The Quadro K5000, which uses GK104, brings an abysmal 90 GFLOPS of double precision. The rumored GK110-powered Quadro K6000 brings double precision performance up to approximately 1.17 TFLOPS, which is quite the jump and shows just how much GK104 was cut down to focus on gaming performance! Further, the card that the K6000 replaces in name, the Quadro 6000 (no K prefix), is based on NVIDIA’s previous-generation Fermi architecture and offers 0.5152 TFLOPS (515.2 GFLOPS) of double precision. On the plus side, users can expect around 3.5 TFLOPS of single precision horsepower, a substantial upgrade over the Quadro 6000's 1.03 TFLOPS of single precision floating point. For comparison, the GK104-based Quadro K5000 offers 2.1 TFLOPS of single precision. Although it is no full GK110, the K6000 looks to be the Quadro card to beat for its intended usage.
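The gap is easier to see as a ratio of double precision to single precision throughput. A quick sanity check on the figures above (these use the rumored table numbers, so treat the result as illustrative):

```python
# DP-to-SP throughput ratios implied by the figures above
# (rumored specs for the K6000, so illustrative arithmetic only)
k5000_ratio = 0.090 / 2.1    # GK104-based Quadro K5000
k6000_ratio = 1.17 / 3.5     # rumored GK110-based Quadro K6000

print(f"Quadro K5000 DP/SP: roughly 1/{1 / k5000_ratio:.0f}")
print(f"Quadro K6000 DP/SP: roughly 1/{1 / k6000_ratio:.0f}")
```

GK104 runs double precision at roughly 1/24 of its single precision rate, while the GK110 figures work out to about 1/3, which is why the K6000 closes so much of the gap to the Tesla line.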

nvidia-quadro-k5000 GPU.jpg

Of course, Quadro is more about stable drivers, beefy memory, and single precision than double precision, but it would be nice to see the expensive Quadro workstation cards pull double duty, as it were; NVIDIA’s Tesla line is where DP floating point is key. Fortunately, the K6000 somewhat closes the rather wide gap between the two lineups. I would have really liked to see the K6000 with at least 14 SMX units, to match the consumer Titan and the Tesla K20X, but the rumors are not looking positive in that regard. Professionals should expect quite the premium for the K6000 versus the Titan, despite the hardware differences: it will likely sell for around $3,000.

No word on availability, but the card will likely be released soon in order to complete the Kepler Quadro lineup update. 

NVIDIA Refreshes Quadro with Kepler

Subject: General Tech, Graphics Cards | March 6, 2013 - 08:02 PM |
Tagged: quadro, nvidia

KeplerQuadroTop.png

Be polite, be efficient, have a plan to Kepler every card that you meet.

The professional graphics market is not designed for gamers, although that should be fairly clear. These GPUs are designed to effectively handle the complex video, 3D, and high-resolution display environments found in certain specialized workspaces.

This is the class of cards which allow a 3D animator to edit their creations with stereoscopic 3D glasses, for instance.

NVIDIA's branding will remain consistent with the scheme developed for the prior generation. Previously, if you were in the market for a Fermi-based Quadro solution, you had your choice of the Quadro 600, 2000, 4000, 5000, and 6000. Now that the world revolves around Kepler... heh heh heh... each entry has been prefixed with a K, with the exception of the highest-end 6000 card. These entries are therefore:

  • Quadro K600, 192 CUDA Cores, 1GB, $199 MSRP
  • Quadro K2000, 384 CUDA Cores, 2GB, $599 MSRP
  • Quadro K4000, 768 CUDA Cores, 3GB, $1,269 MSRP
  • Quadro K5000, 1536 CUDA Cores, 4GB + ECC, $2,249 MSRP

This product line is demonstrated graphically by the NVIDIA slide below.

KeplerQuadro.png

Clicking the image while viewing the article will enlarge it.

It should be noted that each of the above products is built on the GK10X series of architectures, not the more computationally-focused GK110. As the above slide alludes: while these Quadro cards are designed to handle graphically-intensive applications, they are meant to be paired with GK110-based Tesla K20 cards, to which the GPGPU muscle can be offloaded.

Should you need the extra GPGPU performance, particularly when it comes to double precision mathematics, those cards can be found online for somewhere in the ballpark of $3,300 to $3,500.

The new Quadro products were available starting yesterday, March 5th, from “leading OEM and Channel Partners.”

Source: NVIDIA

A year of GeForce drivers reviewed

Subject: Graphics Cards | March 5, 2013 - 02:28 PM |
Tagged: nvidia, geforce, graphics drivers

After evaluating the evolution of AMD's drivers over 2012, [H]ard|OCP has now finalized their look at NVIDIA's offerings over the past year.  They chose a half dozen drivers spanning March to December, tested on both the GTX 680 and GTX 670.  As you can see throughout the review, NVIDIA's performance was mostly stable, apart from the final driver of 2012, which provided noticeably improved performance in several games.  [H] compared the frame rates from both companies on the same charts, which makes the steady improvement of AMD's drivers over the year even more obvious.  That also implies that AMD's initial drivers needed improvement, and that the driver team at AMD has its work cut out for it in 2013 if it wants to deliver a high level of performance across the board from day one, with game-specific improvements offering the only deviation in performance.

H_Geforce.jpg

"We have evaluated AMD and NVIDIA's 2012 video card driver performances separately. Today we will be combining these two evaluations to show each companies full body of work in 2012. We will also be looking at some unique graphs that show how each video cards driver improved or worsened performance in each game throughout the year."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

PCPer Live! Tomb Raider Game Stream - Win Games and Graphics Cards from AMD!

Subject: Graphics Cards | March 4, 2013 - 04:31 PM |
Tagged: video, tomb raider, tahiti, radeon, never settle reloaded, live, crysis, amd

UPDATE: Missed the live stream?  Relive the incredible experience right here!  

On March 5th on the PC Perspective Live! page we will be streaming some game action of the new Tomb Raider.  In what might be one of the most impressive game series reboots in history, this iteration of the action-adventure gameplay is definitely the most impressive looking to date.  And don't forget all the hair animation we are likely to see...

We will be teaming up with AMD once again to provide a fun and exciting PCPer Game Stream that includes game demonstrations and of course, prizes and game keys for those that watch the event LIVE! 

UPDATE: We are excited to announce that Crystal Dynamics' Brian Horton, Senior Art Director for Tomb Raider, will be joining us on the PC Perspective Game Stream to answer questions and give us more detail on the visual effects at work in the PC version of the game!

tr1.jpg

Tomb Raider Game Stream

5pm PT / 8pm ET - March 5th

PC Perspective Live! Page

Warning: this one will DEFINITELY have mature language and content!!

The stream will be sponsored by AMD and its Never Settle Reloaded game bundles which we previously told you about.  Depending on the AMD Radeon HD 7000 series GPU that you buy, you could get some amazing free games including:

  • Radeon HD 7900 Series
    • FREE Crysis 3
    • FREE Bioshock Infinite
  • Radeon HD 7800 Series
    • FREE Bioshock Infinite
    • FREE Tomb Raider
  • Radeon HD 7900 CrossFire Set
    • FREE Crysis 3
    • FREE Bioshock Infinite
    • FREE Tomb Raider
    • FREE Far Cry 3
    • FREE Hitman: Absolution
    • FREE Sleeping Dogs

nsr_matrix.jpg

AMD's Antal Tungler (@ColoredRocks on twitter) will be joining us via Skype to talk about the game's technology, performance considerations as well as helping me with some co-op gaming!

Of course, just to sweeten the deal a bit we have some prizes lined up for those of you that participate in our Tomb Raider Game Stream:

  • 2 x Gigabyte Radeon HD 7870 OC 2GB graphics cards (plus Tomb Raider & Bioshock Infinite)
  • 1 x HIS 7850 iPower IceQ Turbo 4GB graphics card (plus Tomb Raider & Bioshock Infinite)
  • 3 x Combo codes for both Tomb Raider AND Bioshock Infinite

gboc.png

Pretty nice, huh?  All you have to do to win is be present on the PC Perspective Live! Page during the event as we will announce both the content/sweepstakes method AND the winners!

Stop in on March 5th for some PC gaming fun!!

tr2.jpg

AMD Releases the FIREPRO R5000: PCoIP and Teradici

Subject: Graphics Cards | February 27, 2013 - 09:42 PM |
Tagged: workstations, virtualization, Teradici, remote management, R5000, pitcairn, PCoIP, firepro, amd

 

A few days back, AMD released one of its latest FIREPRO workstation graphics cards.  For most users out there, this will be received with a bit of a shrug.  This release is a bit different, though, and it reflects a change in direction for the PC market.  The original PC freed users from mainframes and made computing affordable for most people.  Today we are seemingly heading back to the mainframe/thin-client setup of yore, but with hardware and connectivity that obviously were not present in the late 70s.  The FIREPRO R5000 is hoping to redefine remote graphics.

r5000.jpg

Today’s corporate environment is chaotic when it comes to IT systems.  The amount of malware, poor user decisions, and variability in software and hardware configurations is a constant headache for IT workers.  A big push is to make computing more centralized in the company, with easy oversight from IT.  Servers hosting multiple remote users can be updated and upgraded far more easily than individual PCs scattered around the offices.  This is good for a lot of basic users, but it does not address the performance needs of power users who typically run traditional workstations.

remote.jpg

AMD hopes to change that thinking with the R5000.  This is a Pitcairn-based product (7800 series on the desktop) built to workstation standards.  It also features a secret weapon: the Teradici TERA2240 host processor.  Teradici is a leader in PCoIP ("PC over IP") technology.  Instead of a traditional remote session, which limits performance and desktop space, PCoIP was developed to more adequately send large amounts of pixel data over a network.  The user is essentially able to leverage the power of a modern GPU rather than rely on the more software-based rendering of typical remote sessions.  The user picks a thin client from a variety of OEMs and connects directly over IP.

tera_pcoip.jpg

The advantage here is that the GPU is again used to its full potential, which is key for those doing heavy video editing, 3D visualization, and CADD-type workloads.  The R5000 can support resolutions up to 2560x1600 on up to two displays, or 1920x1200 on four displays, and it supports upwards of 60 fps in applications.  The TERA2240 essentially encodes the GPU's output and streams it over IP; the thin client decodes the stream and displays the results.  This promises very low latency over small networks, and very manageable latency over large or wide area networks.
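A quick bit of arithmetic shows why raw pixels cannot simply be pushed over the wire, and why the TERA2240's encoding matters.  This is illustrative math only; actual PCoIP bandwidth depends on the encoder and scene content:

```python
# Uncompressed bandwidth for the R5000's top single-display mode
# (illustrative arithmetic; PCoIP encodes rather than sending raw pixels)
width, height, fps = 2560, 1600, 60
bytes_per_pixel = 3                      # 24-bit color

bytes_per_sec = width * height * bytes_per_pixel * fps
gbps = bytes_per_sec * 8 / 1e9
print(f"Uncompressed: ~{gbps:.1f} Gbit/s per display")
```

Nearly 6 Gbit/s for a single uncompressed display would saturate most office networks on its own, which is why sending an encoded stream is the only practical approach.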

res_vs_bw.jpg

The downside here is that only one client at a time can connect to the card.  The card cannot be virtualized in the sense of multiple users sharing the GPU's resources.  The card CAN run in a virtualized environment, but it is still limited to one client per card; multiple cards can be placed in each server, with each card assigned to its own VM.  While this makes hardware management a bit easier, it is still an expensive solution on a per-user basis.  Efficiency may be regained in environments with shift work, or in a setting such as a university, where these cards are housed in high-powered servers away from classrooms so that cooling and noise do not impede learning.

tera_specs.jpg

Source: AMD

AMD and Crystal Dynamics Use TressFX to bring GPU Compute to Hair Simulation

Subject: Graphics Cards | February 26, 2013 - 10:04 AM |
Tagged: amd, tressfx, lara croft, tomb raider, crystal dynamics

Last week we got an email from AMD teasing an upcoming technology called TressFX that had something to do with hair and something to do with graphics.  It should come as no surprise today that AMD has announced that TressFX is a hair modeling technology that utilizes DirectCompute for simulation.  The proper rendering of hair has been a thorn in the side of game developers for decades, and it seems that with every generation of GPU released by either NVIDIA or AMD/ATI we see a tech demo about how hair modeling "has been changed forever."

This time, though, we are seeing the technology in a AAA gaming title.

lara1.jpg

TressFX Hair revolutionizes Lara Croft’s locks by using DirectCompute to unlock the massively-parallel processing capabilities of the Graphics Core Next architecture, enabling image quality previously restricted to pre-rendered images. Building on AMD’s previous work on Order Independent Transparency (OIT), this method makes use of Per-Pixel Linked-List (PPLL) data structures to manage rendering complexity and memory usage.

lara2.jpg

DirectCompute is additionally utilized to perform the real-time physics simulation for TressFX Hair. This physics system treats each strand of hair as a chain with dozens of links, permitting forces like gravity, wind, and movement of the head to move and curl Lara’s hair in a realistic fashion. Further, collision detection is performed to ensure that strands do not pass through one another or through solid surfaces such as Lara’s head, clothing, and body. Finally, hair styles are simulated by gradually pulling the strands back towards their original shape after they have moved in response to an external force.
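The chain-of-links idea is straightforward to sketch. Below is a toy Verlet-integration strand in Python, purely illustrative and not AMD's TressFX code: gravity moves each point, then distance constraints pull neighboring links back toward their rest length (a real implementation would run this per strand, in parallel, on the GPU):

```python
# Toy strand-of-links simulation in the spirit described above:
# Verlet integration applies gravity, then constraint relaxation keeps
# neighboring points a fixed link length apart. Not AMD's TressFX code.

REST = 1.0      # rest length of each link
GRAVITY = -9.8  # acceleration along y
DT = 0.016      # ~60 fps timestep

def simulate(num_links=10, steps=200, relax_iters=10):
    # strand starts stretched out horizontally, root pinned at the origin
    pos = [(float(i), 0.0) for i in range(num_links + 1)]
    prev = list(pos)
    for _ in range(steps):
        # Verlet integration: x' = 2x - x_prev + a * dt^2 (root is pinned)
        for i in range(1, len(pos)):
            x, y = pos[i]
            px, py = prev[i]
            prev[i] = pos[i]
            pos[i] = (2 * x - px, 2 * y - py + GRAVITY * DT * DT)
        # constraint relaxation: restore each link's rest length
        for _ in range(relax_iters):
            for i in range(len(pos) - 1):
                (x0, y0), (x1, y1) = pos[i], pos[i + 1]
                dx, dy = x1 - x0, y1 - y0
                dist = (dx * dx + dy * dy) ** 0.5 or REST
                corr = (dist - REST) / dist
                if i == 0:  # pinned root: move only the child point
                    pos[i + 1] = (x1 - dx * corr, y1 - dy * corr)
                else:       # otherwise split the correction evenly
                    pos[i] = (x0 + dx * corr * 0.5, y0 + dy * corr * 0.5)
                    pos[i + 1] = (x1 - dx * corr * 0.5, y1 - dy * corr * 0.5)
    return pos

strand = simulate()
```

Wind and head movement would just be extra accelerations in the integration step, and collision detection would add more constraints, but the core loop is this simple integrate-then-relax cycle.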

It's a lot of technology for a little bit of rendering - but realistic hair presents a unique problem, and I am very interested to see this in action when Tomb Raider releases on March 5th.

I asked AMD a couple of questions including if this was going to be a technology that NVIDIA users would be missing out on.  Their response?  "We don't create features that lock out other vendors."  That doesn't mean GTX 600-series card users will have access to this accelerated hair technology or that it will perform similarly if they do, but I'll take a look when I get my hands on the game.

lara4.jpg

We are hoping to get some video to go along with our screenshots, as I think that will have a stronger impact.  You can find more details on AMD's TressFX landing page.

Source: AMD

NVIDIA Details Tegra 4 and Tegra 4i Graphics

Subject: Graphics Cards | February 25, 2013 - 08:01 PM |
Tagged: nvidia, tegra, tegra 4, Tegra 4i, pixel, vertex, PowerVR, mali, adreno, geforce

 

When Tegra 4 was introduced at CES there was precious little information about the setup of the integrated GPU.  We all knew that it would be a much more powerful GPU, but we were not entirely sure how it was set up.  Now NVIDIA has finally released a slew of whitepapers that deal with not only the GPU portion of Tegra 4, but also some of the low level features of the Cortex A15 processor.  For this little number I am just going over the graphics portion.

layout.jpg

This robust looking fellow is the Tegra 4.  Note the four pixel "pipelines" that can output 4 pixels per clock.

The graphics units on the Tegra 4 and Tegra 4i share the same overall architecture; the 4i simply has fewer units, arranged slightly differently.  Tegra 4 is comprised of 72 units, 48 of which are pixel shaders.  These pixel shaders are VLIW-based VEC4 units.  The other 24 units are vertex shaders.  The Tegra 4i is comprised of 60 units: 48 pixel shaders and 12 vertex shaders.  We knew at CES that it was not a unified shader design, but we were still unsure of the overall makeup of the part.  There are some very good reasons why NVIDIA went this route, as we will soon explore.

If NVIDIA were to transition to unified shaders, it would increase the overall complexity and power consumption of the part.  Each shader unit would have to be able to handle both vertex and pixel workloads, which requires more transistors.  Simpler shaders focused on either pixel or vertex operations are more efficient at what they do, both in terms of transistors used and power consumption.  It is the same train of thought as choosing fixed-function units over fully programmable ones: programmability gives more flexibility, but the fixed-function unit is smaller, faster, and more efficient at its workload.

layout_4i.jpg

On the other hand, here we have the Tegra 4i, which gives up half the pixel pipelines and half the vertex shaders but keeps all 48 pixel shaders.

If there was one surprise here, it would be that the part is not completely OpenGL ES 3.0 compliant.  It is lacking one major function required for certification: this particular part cannot render at FP32 precision.  It has been quite a few years since we have heard of anything not being able to do FP32 in the PC market, but it is quite common in the power and transistor conscious mobile market.  NVIDIA decided to go with an FP20 partial precision setup.  They claim that for all intents and purposes it will not be noticeable to the human eye; colors will still be rendered properly and artifacts will be few and far between.  Remember back in the day when NVIDIA supported FP16 and FP32 while they chastised ATI for choosing FP24 with the Radeon 9700 Pro?  Times have changed a bit.  Going with FP20 is again a power and transistor saving decision.  The part still supports DX9.3 and OpenGL ES 2.0, and it does in fact support quite a bit of the functionality required by OpenGL ES 3.0, but it is not fully compliant.
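
To see why dropping from FP32 to FP20 is mostly invisible on today's displays, consider what happens when you quantize the mantissa but still output 8 bits per channel. The exact FP20 bit layout is not public, so the 13 fraction bits below are an assumption for illustration:

```python
# Why FP20 pixel shading can look fine on an 8-bit display: a float with
# a 13-bit fraction (our assumption for FP20's layout) has far finer
# granularity than the 256 levels the display can show per channel.
import math

def quantize_mantissa(x, frac_bits=13):
    """Round x to a float carrying only frac_bits fraction bits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** frac_bits
    return math.ldexp(round(m * scale) / scale, e)

def to_8bit(x):
    """Map a [0,1] shader result to an 8-bit display level."""
    return round(max(0.0, min(1.0, x)) * 255)

# Every 8-bit color level survives a round trip through 13-bit precision:
# the quantization error (at most ~6e-5) is far below one display step.
ok = all(to_8bit(quantize_mantissa(i / 255)) == i for i in range(256))
```

The caveat, as with any reduced-precision format, is accumulated error over long shader chains rather than single values, which is where rare artifacts could creep in.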

This will be an interesting decision to watch over the next few years.  The latest Mali T600 series, PowerVR Series6, and Adreno 300 series solutions all support OpenGL ES 3.0; Tegra 4 is the odd man out.  While most developers have no plans to move to 3.0 in the near future, it will eventually be adopted in software.  When that point comes, Tegra 4 based devices will be left a bit behind.  By then NVIDIA will have a fully compliant solution, but that is little comfort for those buying phones and tablets in the near future that will be saddled with non-compliance once those applications hit.

ogles_feat.jpg

A list of OpenGL ES 3.0 features that are actually present in Tegra 4; the lack of FP32, however, relegates it to 2.0-compliant status.

The core speed is increased to 672 MHz, well up from the 520 MHz of Tegra 3 (which had 8 pixel and 4 vertex shaders).  The GPU can output four pixels per clock, double that of Tegra 3.  Once we consider the extra clock speed and pixel pipelines, Tegra 4 increases pixel fillrate by about 2.6x.  Pixel and vertex shading will get a huge boost in performance due to the dramatic increase in units and clockspeed.  Overall this is a very significant improvement over the previous generation of parts.
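
That 2.6x figure falls straight out of the clocks and pixels-per-clock numbers:

```python
# Sanity check on the claimed ~2.6x pixel fillrate gain:
# fillrate = core clock * pixels output per clock.
tegra3_fill = 520e6 * 2   # 520 MHz, 2 pixels/clock -> 1.04 Gpix/s
tegra4_fill = 672e6 * 4   # 672 MHz, 4 pixels/clock -> 2.688 Gpix/s
gain = tegra4_fill / tegra3_fill   # about 2.58, rounded up to "2.6x"
```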

The Tegra 4 can output to a 4K display natively, and that is not the only new feature for this part.  Here is a quick list:

  • 2x/4x Multisample Antialiasing (MSAA)

  • 24-bit Z (versus 20-bit Z in the Tegra 3 processor) and 8-bit stencil

  • 4K x 4K texture size, including non-power-of-two textures (versus 2K x 2K in the Tegra 3 processor) – allows higher quality textures and makes it easier to port full resolution textures from console and PC games; good for high resolution displays

  • 16:1 depth (Z) compression and 4:1 color compression (versus none in the Tegra 3 processor) – this is lossless compression, useful for reducing bandwidth to/from the frame buffer and especially effective in antialiasing when processing multiple samples per pixel

  • Depth textures

  • Percentage Closer Filtering for shadow texture mapping and soft shadows

  • Texture border color, which eliminates coarse MIP-level bleeding

  • sRGB for texture filtering, render surfaces and MSAA down-filter

Note that CSAA is no longer supported in Tegra 4 processors.
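
To get a feel for what that compression buys, here is the best-case per-pixel framebuffer traffic at 4x MSAA. This is illustrative arithmetic only; real-world savings depend on how compressible the scene actually is:

```python
# Best-case framebuffer traffic per pixel at 4x MSAA, with and without
# Tegra 4's lossless compression. Illustrative numbers only: actual
# savings vary with scene content, since the compression is lossless.
SAMPLES = 4            # 4x MSAA: four samples per pixel
Z_BYTES = 4            # 24-bit Z plus 8-bit stencil
COLOR_BYTES = 4        # 32-bit RGBA color

uncompressed = SAMPLES * (Z_BYTES + COLOR_BYTES)          # 32 bytes/pixel
compressed = SAMPLES * (Z_BYTES / 16 + COLOR_BYTES / 4)   # 5 bytes/pixel
savings = 1 - compressed / uncompressed                   # ~84% best case
```

Multiplying samples per pixel is exactly why the feature matters most for MSAA: without compression, 4x MSAA quadruples the Z and color traffic.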

This is a big generational jump, and now we only have to see how it performs against the other top end parts from Qualcomm, Samsung, and others utilizing IP from Imagination and ARM.

Source: NVIDIA

A graphical description of market woes from Jon Peddie

Subject: General Tech, Graphics Cards | February 25, 2013 - 01:32 PM |
Tagged: jon peddie, graphics, market share

If last week's report from Jon Peddie Research on sales of all add-in and integrated graphics had you worried, the news this week is not going to help boost your confidence.  This week the report focuses solely on add-in boards and the drop is dramatic; Q4 2012 sales plummeted just short of 20% compared to Q3 2012.  When you look at the entire year, sales dropped 10% overall as AMD's APUs, and Intel's, make serious inroads into the mobile market, with many notebooks being sold without a discrete GPU.  The losses come from the mainstream market; enthusiast-level GPUs actually saw a slight increase in sales, but that small volume is utterly drowned by the mainstream market.  You can check out the full press release here.

PR_108.jpg

"JPR found that AIB shipments during Q4 2012 behaved according to past years with regard to seasonality, but the drop was considerably more dramatic. AIB shipments decreased 17.3% from the last quarter (the 10 year average is just -0.68%). On a year-to-year comparison, shipments were down 10%."

Here is some more Tech News from around the web:

Tech Talk

AMD wants to wash your hair, with graphics. What??

Subject: Graphics Cards | February 22, 2013 - 05:29 PM |
Tagged: tressfx, amd

I got an odd email just now that I thought I would share with you.  From AMD's Gaming Evolved account I got this:

You're at the top of your game. Why isn't your hair? TressFX is specially formulated with dynamic compounds like PPLL to re-energize your tired locks with vitality and luster.

WAT?

tressfx.jpg

An odd campaign for sure, but it appears that on Tuesday AMD is going to discuss a technology that will bring realistic hair to gaming.  Finally some use for all that GPGPU horsepower on the Southern Islands graphics cards?

You can see the landing page for yourself right here.

Source: AMD

Join PCPer and NVIDIA for a GeForce GTX TITAN Live Review!

Subject: Graphics Cards | February 21, 2013 - 01:12 PM |
Tagged: video, titan, nvidia, live review, live, kepler, geforce titan, geforce

Missed the live event?  Here is the full replay featuring me and Tom Petersen!

Hopefully by now you have read our review of the NVIDIA GeForce GTX TITAN 6GB graphics card that was just released.  This is definitely a product release that highlights a generation of GPUs and I would really encourage you to read the article and offer your feedback.

However, we have another event to promote right now: NVIDIA's Tom Petersen will be joining me on PCPer Live! at 11am PT / 2pm ET to talk about the GeForce GTX TITAN and its performance, features, pricing and more! 

pcperlive2.png

GeForce GTX TITAN Live Review Stream

11am PT / 2pm ET - February 21st

PC Perspective Live! Page

If you have questions for Tom or me, you can leave them in the comments below (no registration required)!

nvidia1.jpg

TITAN up your ... you know

Subject: Graphics Cards | February 21, 2013 - 12:57 PM |
Tagged: titan, nvidia, kepler, gtx titan, gk110, geforce

Before getting into the performance of the $1000 NVIDIA TITAN, it is worth looking at the improvements NVIDIA has made to this GK110 beast.  At 10.5" long it is a half inch longer than a 680 and a full 1.5" shorter than a 690, which allows it to fit in a wider variety of cases, and the vastly improved thermals allow the use of much smaller cases than other high end GPUs can manage without exotic cooling solutions.  There is also a reduction in noise; SLI'd TITANs run quieter than some single card solutions, not to mention much faster.  To see just how much faster, check out [H]ard|OCP's results, which you can compare to Ryan's.

H_TITAN.jpg

"NVIDIA is launching a TITAN today, literally, the new GeForce GTX TITAN video card is here, and we have a lot to talk about. We test single-GPU and 2-way SLI today, with more to follow later. We will find out if this TITAN of a video card really is worth it, and just who this video card is designed for. Be prepared to face the fastest single-GPU video card."

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

LucidLogix Virtu MVP 2.0 Software Suite Now Available

Subject: General Tech, Graphics Cards | February 20, 2013 - 12:49 PM |
Tagged: lucid, virtu MVP, virtu, hyperformance

As promised at CES, Lucidlogix has released Virtu MVP 2.0 for purchase to anyone who wants to buy it.  Their GPU virtualization software for Sandy Bridge and Ivy Bridge based systems with a discrete card allows you to jump back and forth between the GPU embedded in your processor and the graphics card without needing to move monitor cables or reboot.  That lets you save your laptop's battery life when the discrete GPU is not needed but instantly enable it the second you fire up a compatible game, the list of which has grown since the release of the original Virtu MVP.  They have also improved the Virtual Vsync and Hyperformance features which we reviewed last summer on an Origin laptop.

The move to selling the product directly to consumers is beneficial, as previously you could only get the software and updates from the manufacturer of your motherboard or laptop.  As anyone who has dealt with the infrequency of graphics driver updates from manufacturers is well aware, those updates are few and far between.  It is much better to be able to acquire the software from the vendor who creates it in the first place.  Head over to Lucidlogix to read more and perhaps buy one of the three versions available.

lucidlogix.png

"The optimal system specifications Virtu MVP 2.0 include an Intel® Core™ i5 (Sandy Bridge) on an Intel Sandy Bridge or Ivy Bridge motherboard with an NVIDIA® Geforce 460GTX or similar or better AIB and 2GB or more memory running Windows® 7 or Windows 8 in either 32-bit or 64-bit modes.

With special launch prices, Virtu MVP 2.0 is now available in three models: Basic with GPU virtualization for $34.99 (USD), Standard with Virtual Vsync for $44.99 and Pro with Hyperformance and Virtual Vsync for $54.99."

Here is some more Tech News from around the web:

Tech Talk

Source: Lucidlogix

PCPer Live! Crysis 3 Game Stream - Win Games and Graphics Cards from AMD!

Subject: Graphics Cards | February 19, 2013 - 08:00 PM |
Tagged: video, tahiti, radeon, never settle reloaded, live, Crysis 3, crysis, amd

UPDATE: If you missed the live stream you can still catch the YouTube replay right here!!

On February 19th on the PC Perspective Live! page we will be streaming some single player game action of the new Crysis 3.  If there has ever been a game that defined the world of PC gaming graphics and technology, it is the Crysis series. 

"Sure, but can it play Crysis?"

There is probably no more famous line of dialogue for pigeonholing new hardware releases. 

With the release of the latest version of Crysis 3 on February 19th, we will be teaming up with AMD once again to provide a fun and exciting PCPer Game Stream that includes game demonstrations and of course, prizes and game keys for those that watch the event LIVE! 

crysis1.jpg

Crysis 3 Game Stream

5pm PT / 8pm ET - February 19th

PC Perspective Live! Page

Warning: this one will DEFINITELY have mature language and content!!

The stream will be sponsored by AMD and its Never Settle Reloaded game bundles which we previously told you about.  Depending on the AMD Radeon HD 7000 series GPU that you buy, you could get some amazing free games including:

  • Radeon HD 7900 Series
    • FREE Crysis 3
    • FREE Bioshock Infinite
  • Radeon HD 7800 Series
    • FREE Bioshock Infinite
    • FREE Tomb Raider
  • Radeon HD 7900 CrossFire Set
    • FREE Crysis 3
    • FREE Bioshock Infinite
    • FREE Tomb Raider
    • FREE Far Cry 3
    • FREE Hitman: Absolution
    • FREE Sleeping Dogs

nsr_matrix.jpg

AMD's Robert Hallock (@Thracks on twitter) will be joining us via Skype to talk about the game's technology, performance considerations as well as helping me with some co-op gaming!

Of course, just to sweeten the deal a bit we have some prizes lined up for those of you that participate in our Crysis 3 Game Stream:

  • 2 x Radeon HD 7970 3GB graphics cards
  • 4 x Combo codes for both Crysis 3 AND Bioshock Infinite

Pretty nice, huh?  All you have to do to win is be present on the PC Perspective Live! Page during the event as we will announce both the content/sweepstakes method AND the winners!

Stop in on February 19th for some PC gaming fun!!

crysis2.jpg

We interrupt your Titan previews for a look at comparative Catalyst version performance

Subject: Graphics Cards | February 19, 2013 - 06:09 PM |
Tagged: amd, catalyst, 2012

Today might be Titan Preview Day, as you can see from the links below as well as Ryan's article here, but [H]ard|OCP would like to offer you solid performance numbers instead.  They took a look back at the Catalyst 12.x series of drivers that AMD GPU owners have been using over the past year.  With the HD 7970 and HD 7950 they tested seven of AMD's past drivers for performance in four popular games.  The findings are fairly clear: after a poor start to the year, AMD's drivers showed improved performance as the year went on, with leaps after games were released and the drivers could be optimized for speed.  The HD 7970 did improve over the year, but it was the HD 7950 that received the biggest gains.

H7950.gif

"We continuing our look at driver performance improvements over time by evaluating AMD’s 2012 driver performances on both the AMD Radeon HD 7970 and HD 7950 video cards. We will see how drivers from the beginning of the year to the end of year have impacted real world gameplay performance . "

Here are some more Graphics Card articles from around the web:

Graphics Cards

Source: [H]ard|OCP

GeForce 314.07 WHQL Drivers: Optimized For Crysis 3, Assassin's Creed 3 & Far Cry 3

Subject: Graphics Cards | February 19, 2013 - 01:50 PM |
Tagged: nvidia, graphics drivers, geforce, 314.07

Just in time for the arrival of the Titan previews comes the new WHQL 314.07 GeForce driver from NVIDIA.  Instead of offering a list of blanket improvements and average frame rate increases, NVIDIA has assembled charts showing the performance differences between this driver and the previous one for their four top GPUs in both SLI and single card setups.  As well, they attempt to answer the question "Will it play Crysis 3?" with the chart below, showing the performance you can expect with Very High settings at 1080p resolution and 4x AA.  They also provide a link to their GeForce Experience tool, which will optimize your Crysis 3 settings for whatever NVIDIA card(s) you happen to be using.  Upgrade now, as the new driver seems to offer improvements across the board.

nvidia-geforce-314-07-whql-drivers-crysis-3-performance-chart-650.png

 

The new GeForce 314.07 WHQL driver is now available to download. An essential update for gamers jumping into Crysis 3 this week, 314.07 WHQL improves single-GPU and multi-GPU performance in Crytek’s sci-fi shooter by up to 65%.

Other highlights include sizeable SLI and single-GPU performance gains of up to 27% in Assassin’s Creed III, 19% in Civilization V, 14% in Call of Duty: Black Ops 2, 14% in DiRT 3, 11% in Just Cause 2, 10% in Deus Ex: Human Revolution, 10% in F1 2012, and 10% in Far Cry 3.

Rounding out the release is an ‘Excellent’ 3D Vision profile for Crysis 3, an SLI profile for Ninja Theory’s DmC: Devil May Cry, and an updated SLI profile for the free-to-play, third-person co-op shooter, Warframe.

You can download the GeForce 314.07 WHQL drivers with one click from the GeForce.com homepage; Windows XP, Windows 7 and Windows 8 packages are available for desktop systems, and for notebooks there are Windows 7 and Windows 8 downloads that cover all non-legacy products.

Source: NVIDIA