Subject: General Tech, Graphics Cards | March 19, 2013 - 02:55 PM | Tim Verry
Tagged: unified virtual memory, ray tracing, nvidia, GTC 2013, grid vca, grid, graphics cards
Today, NVIDIA's CEO Jen-Hsun Huang stepped on stage to present the GTC keynote. In the presentation (which was live streamed on the GTC website and archived here), NVIDIA discussed five major points, looking back over 2013 and into the future of its mobile and professional products. In addition to the product roadmap, NVIDIA discussed the state of computer graphics and GPGPU software. Remote graphics and GPU virtualization were also on tap. Finally, towards the end of the keynote, the company revealed its first appliance, the NVIDIA GRID VCA. The culmination of NVIDIA's GRID and GPU virtualization technology, the VCA is a device that hosts up to 16 virtual machines, each of which can tap into one of 16 Kepler-based graphics processors (8 cards, 2 GPUs per card) to fully hardware accelerate software running on the VCA. Three new mobile Tegra parts and two new desktop graphics processors were also hinted at, with improvements to power efficiency and performance.
On the desktop side of things, NVIDIA's roadmap included two new GPUs. Following Kepler, NVIDIA will introduce Maxwell and Volta. Maxwell will feature a new virtualized memory technology called Unified Virtual Memory. This tech will allow both the CPU and GPU to read from a single (virtual) memory store. Much as with the promise of AMD's Kaveri APU, Unified Virtual Memory should result in speed improvements in heterogeneous applications because data will not have to be copied between the CPU and GPU in order to be processed. Server applications in particular stand to benefit from the shared memory tech. NVIDIA did not provide details, but from the sound of it, the CPU and GPU both continue to write to their own physical memory, but there is a layer of virtualized memory on top of that which will allow the two (or more) different processors to read from each other's memory store.
Following Maxwell, Volta will be a physically smaller chip with more transistors (likely built on a smaller process node). In addition to power efficiency improvements over Maxwell, it steps up the memory bandwidth significantly. NVIDIA will use TSV (through-silicon via) technology to physically mount the graphics DRAM chips over the GPU (attached to the same silicon substrate electrically). According to NVIDIA, this new TSV-mounted memory will achieve up to 1 terabyte per second of memory bandwidth, which is a notable increase over existing GPUs.
NVIDIA continues to pursue the mobile market with its line of Tegra chips that pair an ARM CPU, NVIDIA GPU, and SDR modem. Two new mobile chips called Logan and Parker will follow Tegra 4. Both new chips will support the full CUDA 5 stack and OpenGL 4.3 out of the box. Logan will feature a Kepler-based graphics processor on the chip that can do “everything a modern computer ought to do” according to NVIDIA. Parker will have a yet-to-be-revealed graphics processor (a Kepler successor). This mobile chip will utilize 3D FinFET transistors. It will have a greater number of transistors in a smaller package than previous Tegra parts (it will be about the size of a dime), and NVIDIA also plans to ramp up the frequency to wrangle more performance out of the mobile chip. NVIDIA has stated that Logan silicon should be completed towards the end of 2013, with the mobile chips entering production in 2014.
Interestingly, Logan has a sister chip that NVIDIA is calling Kayla. This mobile chip is capable of running ray tracing applications and features OpenGL geometry shaders. It can support GPGPU code and will be compatible with Linux.
NVIDIA has been pushing CUDA for several years now. The company has seen some respectable adoption rates, growing from 1 Tesla supercomputer in 2008 to its graphics cards being used in 50 supercomputers, with 500 million CUDA processors on the market. There are now allegedly 640 universities working with CUDA and 37,000 academic papers on CUDA.
Finally, NVIDIA's hinted-at new product announcement was the NVIDIA VCA, which is a GPU virtualization appliance that hooks into the network and can deliver up to 16 virtual machines running independent applications. These GPU-accelerated workspaces can be presented to thin clients over the network by installing the GRID client software on users' workstations. The specifications of the GRID VCA are rather impressive, as well.
The GRID VCA features:
- 2 x Intel Xeon processors with 16 threads each (32 total threads)
- 192GB to 384GB of system memory
- 8 Kepler-based graphics cards, with two GPUs each (16 total GPUs)
- 16 x GPU-accelerated virtual machines
The GRID VCA fits into a 4U case. It can deliver remote graphics to workstations, and is allegedly fast enough to deliver GPU-accelerated software at a level equivalent to running it on the local machine (at least over LAN). The GRID Visual Computing Appliance will come in two flavors at different price points. The first will have 8 Kepler GPUs with 4GB of memory each, 16 CPU threads, and 192GB of system memory for $24,900. The other version will cost $34,900 and features 16 Kepler GPUs (4GB of memory each), 32 CPU threads, and 384GB of system memory. On top of the hardware cost, NVIDIA is also charging licensing fees. While both GRID VCA devices can support an unlimited number of devices, the licenses cost $2,400 and $4,800 per year respectively.
Overall, it was an interesting keynote, and the proposed graphics cards look to be offering up some unique and necessary features that should help hasten the day of ubiquitous general purpose GPU computing. The Unified Virtual Memory was something I was not expecting, and it will be interesting to see how AMD responds. AMD is already promising shared memory in its Kaveri APU, but I am interested to see the details of how NVIDIA and AMD will accomplish shared memory with dedicated graphics cards (and whether CrossFire/SLI setups will all have a single shared memory pool).
Stay tuned to PC Perspective for more GTC 2013 Coverage!
Subject: Graphics Cards, Shows and Expos | January 12, 2013 - 11:38 AM | Ryan Shrout
Tagged: CES, ces 2013, caustic, imagination, ray tracing, series2
We have talked with Caustic on several occasions over the past couple of years about their desire to build a ray tracing accelerator. Back in April of 2009 we first met with Caustic, learning who they were and what the goals of the company were; we saw early models of the CausticOne and CausticTwo and a demonstration of the capabilities of the hardware and software model.
While at CES this year we found the group at a new place - the Imagination Technologies booth - having been acquired since we last talked. Now named the Caustic Series2 OpenRL accelerator boards, we are looking at fully integrated ASICs rather than demonstration FPGAs.
This is the Caustic 2500 and it will retail for $1495 and includes a pair of the RT2 chips and 16GB of memory. One of the benefits of the Caustic technology is that while you need a lot of memory, you do not need expensive, fast memory like GDDR5 used in today's graphics cards. By utilizing DDR2 memory Imagination is able to put a whopping 16GB on the 2500 model.
A key benefit of the Caustic ray tracing accelerators comes with the simple software integration. You can see above that Autodesk Maya 2013 is utilizing the Caustic Visualizer as a simple viewport into the project, just as you would with any other RT or preview rendering technique. The viewport software is also available for 3ds Max.
There is a lower cost version of the hardware, the Caustic 2100, that uses a single chip and has half the memory for a $795 price tag. They are shipping this month and we are interested to see how quickly and how eagerly developers adopt this technology.
PC Perspective's CES 2013 coverage is sponsored by AMD.
Follow all of our coverage of the show at http://pcper.com/ces!
Subject: General Tech, Graphics Cards, Processors, Mobile | March 8, 2012 - 04:02 AM | Scott Michaud
Tagged: ray tracing, tablet, tablets, knight's ferry, Intel
Intel looks to bring ray-tracing from their Many Integrated Core (Intel MIC) architecture to your tablet… by remotely streaming from a server loaded with one or more Knight’s Ferry cards.
The anticipation of ray-tracing has engulfed almost the entirety of 3D video gaming history. Reasonable support for ray-tracing is very seductive for games as it enables easier access to effects such as global illumination, reflections, and so forth. Ray-tracing is well deserving of its status as a buzzword.
Render yourself in what Knight’s Ferry delivered… with linear scaling and ray-traced Wolfenstein
Screenshot from Intel Blogs.
Obviously Intel would love to make headway into the graphics market. In the past Intel has struggled to put forth an acceptable offering for graphics. It is my personal belief that Intel did not take graphics seriously when they were content selling cheap GPUs to be packed in with PCs. While the short term easy money flowed in, the industry slipped far enough ahead of them that they could not just easily pounce back into contention with a single huge R&D check.
Intel obviously cares about graphics now, and has been relentless at their research into the field. Their CPUs are far ahead of any competition in terms of serial performance -- and power consumption is getting plenty of attention itself.
Intel long ago acknowledged the importance of massively parallel computing but was never quite able to field products like Larrabee that could stand up to what the companies it once ignored could retaliate with. This brings us back to ray-tracing: what is the ultimate advantage of ray-tracing?
Ray-tracing is a dead simple algorithm.
A ray-trace renderer is programmed very simply and elegantly. Effects are often added directly and without much approximation necessary. No hacking around is required in the numerous caveats within graphics APIs in order to get a functional render on screen. If you can keep throwing enough coal on the fire, it will burn without much effort -- so to speak. Intel just needs to put a fast enough processor behind it, and away they go.
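To give a sense of how compact the core algorithm can be, here is a minimal sketch of a ray caster in Python. This is not Intel's engine; the scene (a single sphere), the light direction, and every function name are assumptions made purely for illustration. It fires one ray per pixel, tests it against the sphere, and applies simple diffuse shading.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-length ray to the nearest hit, or None on a miss."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c          # quadratic discriminant (a == 1)
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None

def shade(origin, direction, center, radius, light_dir):
    """Diffuse brightness in [0, 1] for one ray; 0.0 means background."""
    t = hit_sphere(origin, direction, center, radius)
    if t is None:
        return 0.0
    point = tuple(origin[i] + t * direction[i] for i in range(3))
    normal = normalize(sub(point, center))
    return max(0.0, dot(normal, light_dir))

def render(width, height):
    """Hypothetical scene: unit sphere at z = -3, camera at the origin."""
    center, radius = (0.0, 0.0, -3.0), 1.0
    light_dir = normalize((1.0, 1.0, 1.0))
    eye = (0.0, 0.0, 0.0)
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the pixel centre onto a [-1, 1] image plane at z = -1.
            u = (x + 0.5) / width * 2.0 - 1.0
            v = 1.0 - (y + 0.5) / height * 2.0
            direction = normalize((u, v, -1.0))
            row.append(shade(eye, direction, center, radius, light_dir))
        image.append(row)
    return image
```

Every pixel is independent of every other pixel, which is why the workload spreads so naturally across many cores. Reflections or shadows would be added by firing further rays from the hit point rather than by restructuring the renderer.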
Throughout the article, Daniel Pohl discusses numerous enhancements that his team has made to their ray-tracing engine to improve performance. One of the most interesting improvements is their approach to antialiasing. If the rays from two neighboring pixels strike different meshes, or strike the same mesh where the surface direction (denoted by color) changes sharply between pixels, then those pixels are flagged for supersampling. The combination of that shortcut with MLAA will also be explored by Intel at some point.
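The flagging rule can be sketched roughly as follows. This is a hedged illustration rather than Intel's code: the per-pixel hit record (a mesh ID plus a unit surface normal), the `normal_threshold` value, and the function names are all assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def needs_supersampling(hit_a, hit_b, normal_threshold=0.9):
    """Compare two neighboring pixels.

    Each hit is (mesh_id, unit_normal); mesh_id is None when the ray missed.
    """
    mesh_a, normal_a = hit_a
    mesh_b, normal_b = hit_b
    if mesh_a != mesh_b:
        return True                 # the rays struck different meshes
    if mesh_a is None:
        return False                # both rays hit the background
    # Same mesh: flag only if the surface direction changes sharply.
    return dot(normal_a, normal_b) < normal_threshold

def flag_pixels(hits, normal_threshold=0.9):
    """hits is a 2D list of hit records; returns a 2D list of booleans."""
    height, width = len(hits), len(hits[0])
    flags = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Checking the right and lower neighbors covers every adjacent pair once.
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height and needs_supersampling(
                        hits[y][x], hits[ny][nx], normal_threshold):
                    flags[y][x] = True
                    flags[ny][nx] = True
    return flags
```

Only the flagged pixels would then receive extra rays, so flat interior regions keep their one-ray-per-pixel cost.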
A little behind-the-scenes trickery...
Screenshot from Intel Blogs.
Intel claims that they were able to achieve 20-30 FPS at a resolution of 1024x600 when streaming from a server with a single Knight’s Ferry card installed to an Intel Atom-based tablet. They were able to scale to within a couple percent of the theoretical 8x performance with 8 Knight’s Ferry cards installed.
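For readers who want to put a number on "within a couple percent of theoretical 8x", parallel efficiency is simply the measured speedup divided by the number of cards. The sketch below uses made-up figures (25 FPS on one card, 98% efficiency); Intel did not publish these exact values, so treat them as placeholders.

```python
def parallel_efficiency(fps_single, fps_multi, num_cards):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    speedup = fps_multi / fps_single
    return speedup / num_cards

def projected_fps(fps_single, num_cards, efficiency):
    """Frame rate expected from num_cards at a given scaling efficiency."""
    return fps_single * num_cards * efficiency

# Hypothetical inputs: 25 FPS on one card, 98% scaling efficiency on eight.
eight_card_fps = projected_fps(25.0, 8, 0.98)
```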
I very much dislike trusting my content to online streaming services as I am an art nut. I value the preservation of content which just is not possible if you are only able to access it through some remote third party -- can you guess my stance on DRM? That aside, I understand that Intel and others will regularly find ways to push content to where there just should not be enough computational horsepower to accept it.
Ray-tracing might be Intel’s attempt to circumvent all of the years of research that they ignored with conventional real-time rendering technologies. Either way, gaming engines are going the way of simpler rendering algorithms as GPUs become more generalized and less reliant on fixed-function hardware assigned to some arbitrary DirectX or OpenGL specification.
Intel just hopes that they can have a compelling product at that destination whenever the rest of the industry arrives.
Subject: Graphics Cards, Processors, Shows and Expos | September 15, 2011 - 06:17 PM | Ryan Shrout
Tagged: ray tracing, knights ferry, idf 2011, idf
Very few things impress like a collection of 256 processor cores in a box. But that is exactly what we saw on our last visit to the floor at the Intel Developer Forum this year when I stopped by to visit friend-of-the-site Daniel Pohl to discuss updates to the ray tracing research he has been doing for many years now. This is what he showed us:
What you see there is a dual-Xeon server running a set of 8 (!!) Knights Ferry many-core processor discrete cards. Each card holds a chip with 32 Intel Architecture cores running at 1.2 GHz, and each core can handle 4 threads for a total of 1024 threads in flight at any given time! Keep in mind these are all modified x86 cores with support for 16-wide vector processing so they are pumping through a LOT of FLOPS. Pohl did note that only 31-32 of the cores are actually doing ray tracing at any given time though as they reserve a couple for scheduling tasks, operating system interaction, etc.
Each of the eight cards in the system is using a pair of 6-pin PCIe power connectors and they are jammed in there pretty tight. Pohl noted this was the only case they could find that would fit 8 dual-slot add-in cards into it so I'll take a note of that for when I build my own system around them. Of course there are no display outputs on the Knights Ferry cards as they were never really turned into GPUs in the traditional sense. They are essentially development and research platforms for exascale computing and HPC workloads for servers, though the plan is to bring the power to consumers eventually.
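The thread count is easy to verify, and the same figures give a rough feel for the FLOPS involved. In the sketch below the card, core, thread, and clock numbers come from the demo system described above; the 16 single-precision lanes per core and the two floating point operations per lane per cycle (a fused multiply-add) are my assumptions, not something Pohl stated.

```python
def system_totals(cards=8, cores_per_card=32, threads_per_core=4):
    """Return (total cores, total hardware threads) for the whole server."""
    cores = cards * cores_per_card
    return cores, cores * threads_per_core

def peak_gflops_per_card(cores=32, clock_ghz=1.2, lanes=16,
                         flops_per_lane_per_cycle=2):
    """Theoretical single-precision peak for one card.

    lanes and flops_per_lane_per_cycle are assumed values.
    """
    return cores * clock_ghz * lanes * flops_per_lane_per_cycle

total_cores, total_threads = system_totals()
```

Under those assumptions a single card peaks at a little over 1.2 TFLOPS in theory, though real ray tracing workloads will land well below any theoretical peak.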
To run the demo, the Knights Ferry ray tracing server communicated over a Gigabit Ethernet connection with a workstation that handled game processing, interaction processing and more, and passed data about the movements of the camera and objects in the ray traced world to the server. The eight Knights Ferry cards then render the frame, the Xeon CPUs compress the image (8:1 using a standard Direct3D format) and send the data across the network. All of this happens in real time with basically no latency issues when compared to direct PC gaming.
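A quick back-of-the-envelope calculation shows why 8:1 compression makes Gigabit Ethernet workable for this. The resolution and frame rate below are hypothetical, since neither was specified for this demo, and 4 bytes per uncompressed pixel is likewise an assumption.

```python
def stream_bandwidth_mbps(width, height, fps,
                          bytes_per_pixel=4, compression_ratio=8):
    """Network bandwidth in megabits per second for a compressed frame stream."""
    raw_bytes = width * height * bytes_per_pixel
    compressed_bytes = raw_bytes / compression_ratio
    return compressed_bytes * 8 * fps / 1_000_000

# Hypothetical stream: 1280x720 at 30 frames per second.
mbps = stream_bandwidth_mbps(1280, 720, 30)
link_fraction = mbps / 1000.0   # share of a 1000 Mbit/s link
```

With those placeholder numbers the stream needs roughly 111 Mbit/s, or about 11% of the link, whereas the same frames uncompressed would take close to 885 Mbit/s and leave almost no headroom.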
While the ray tracing game engine projects might seem a little less exciting since the demise of Larrabee, Pohl and his team have been spending a lot of time on learning how to take advantage of the x86 cores available. The Wolfenstein demo we have seen in past events has been improved to add things like HDR lighting, anti-aliasing and more.
Though these features have obviously been around in rasterization based solutions for quite a long time, the demo was meant to showcase the fact that ray tracing doesn't inherently have difficulty performing those kinds of tasks as long as the processing power is there and allotted to it.
I am glad to see the ray tracing research continuing at Intel as I think that in the long-term future, that is the route that gaming and other graphics-based applications will take for rendering. And I am not alone - id Software founder and Doom/Quake creator John Carmack agreed in a recent interview we held with him.