Subject: Graphics Cards | November 13, 2012 - 04:15 PM | Tim Verry
Tagged: tahiti, HPC, gpgpu, firepro s10000, firepro
On Monday, AMD launched its latest graphics card aimed at the server and workstation market. Called the AMD FirePro S10000 (for clarity, that’s FirePro S10,000), it is a dual GPU Tahiti graphics card that offers up some impressive performance numbers.
No, unfortunately, this is not the (at this point) mythical dual-7970 AMD HD 7990 graphics card. Rather, the FirePro S10,000 is essentially two Radeon 7950 GPUs on a single PCB along with 6 GB of GDDR5 memory. Specifications on the card include 3,584 stream processors, a GPU clock speed of 825 MHz, and 6 GB of GDDR5 with a total of 480 GB/s of memory bandwidth. That is 1,792 stream processors and 3 GB of memory per GPU. Interestingly, this is a dual slot card with an active cooler. At 375W, a passive cooler is simply not possible in a form factor small enough to fit into a server rack. Therefore, AMD has equipped the FirePro S10,000 GPGPU card with a triple fan cooler reminiscent of the setup PowerColor uses on its custom (2x7970) Devil 13, but not as large. The FirePro card has three red fans (shrouded by a black cover) over a heatpipe and aluminum fin heatsink. The card also includes display outputs for workstation use: one DVI and four mini DisplayPort ports.
AMD is claiming 1.48 TFLOPS in double precision work and 5.91 TFLOPS in single precision workloads. Those are impressive numbers, and the card even beats NVIDIA's new Tesla K20X (built on the big Kepler GK110) and the company's dual GPU GK104 Tesla K10 by notable margins. Additionally, the new FirePro S10000 beats its FirePro S9000 predecessor handily. The S9000, in comparison, is rated at 0.806 TFLOPS for double precision calculations and 3.23 TFLOPS for single precision work. The S9000 is a single GPU card equivalent to the consumer Radeon 7950, with 1,792 shader cores. AMD has essentially taken two S9000 cards and put them on a single PCB, and managed to get almost twice the potential performance without needing twice the power.
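AMD's ratings line up with the usual peak-throughput formula. Here is a back-of-the-envelope check, assuming (our assumption, not AMD's published method) that each stream processor completes one fused multiply-add, or two floating point operations, per clock, and that Tahiti runs double precision at one quarter of its single precision rate.

```python
# Peak throughput sketch for Tahiti-based FirePro cards.
# Assumptions: 2 FLOPs (one FMA) per stream processor per clock,
# and double precision at 1/4 of the single precision rate.

def peak_tflops(stream_processors: int, clock_mhz: float, dp_ratio: float = 0.25):
    """Return (single_precision, double_precision) peak in TFLOPS."""
    sp = stream_processors * 2 * clock_mhz * 1e6 / 1e12
    return sp, sp * dp_ratio

cards = {
    "FirePro S10000": (3584, 825),  # two Tahiti GPUs, 1,792 SPs each
    "FirePro S9000": (1792, 900),   # one Tahiti GPU
}

for name, (sps, mhz) in cards.items():
    sp, dp = peak_tflops(sps, mhz)
    print(f"{name}: {sp:.2f} TFLOPS SP, {dp:.3f} TFLOPS DP")
```

The results round to the 5.91/1.48 and 3.23/0.806 TFLOPS figures quoted above, which also shows why the S10000 falls a little short of a clean doubling: its GPUs run at 825 MHz instead of 900 MHz.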
AMD did not dive too deeply into efficiency and calculations per watt, but the company did share that the new FirePro S10000 achieves 3.94 GFLOPS/W. AMD compares this to NVIDIA's (Fermi-based) Tesla M2090 at 2.96 GFLOPS/W. Unfortunately, NVIDIA has not shared a GFLOPS/W rating for its new single GPU K20X cards.
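Both efficiency figures appear to be peak double precision throughput divided by board power. A quick sketch under that assumption: the 375W divisor comes from the article, while the M2090's 665 GFLOPS and 225W are our own inputs rather than numbers AMD quoted here.

```python
# Performance-per-watt sketch: peak DP GFLOPS divided by board power.
# Assumption: the M2090 is taken as 665 GFLOPS DP at 225 W.

def gflops_per_watt(peak_gflops: float, board_watts: float) -> float:
    return peak_gflops / board_watts

s10000 = gflops_per_watt(3584 * 2 * 0.825 / 4, 375)  # 1,478.4 GFLOPS DP at 375 W
m2090 = gflops_per_watt(665, 225)

print(f"S10000: {s10000:.2f} GFLOPS/W")
print(f"M2090:  {m2090:.2f} GFLOPS/W")
```

Under these assumptions the sketch reproduces both the 3.94 and 2.96 GFLOPS/W figures.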
| | AMD S10000 | AMD S9000 | NVIDIA K20X | NVIDIA K10 |
|---|---|---|---|---|
| Double Precision | 1.48 TF | 0.806 TF | 1.31 TF | 0.19 TF |
| Single Precision | 5.91 TF | 3.23 TF | 3.95 TF | 4.58 TF |
| Architecture | Tahiti (x2) | Tahiti (x1) | GK110 | GK104 (x2) |
| Memory Bandwidth | 480 GB/s | 264 GB/s | 250 GB/s | 320 GB/s |
| Memory Capacity | 6 GB | 6 GB | 6 GB | 8 GB |
| Core clock speed | 825 MHz | 900 MHz | 732 MHz | 745 MHz |
Other features of the AMD FirePro S10000 include support for OpenCL, Microsoft RemoteFX, direct GPU pass-through, and (shared) virtualized graphics. AMD envisions businesses using these FirePro cards to provide GPU hardware acceleration for virtualized desktops and thin clients. With XenServer, multiple users can tap into the hardware acceleration offered by the FirePro S10000 to speed up the desktop and any programs that support it.
Operating systems in particular have begun tapping into GPU acceleration to speed up the user interface and run things like the Aero desktop in Windows 7. High-end workstation software also has a high rate of GPU acceleration adoption, so there are benefits to be had, and AMD is continuing to offer it with its latest FirePro card.
AMD is offering up a card that can be used for a mix of compute and graphics output, making it an interesting choice for workstations. The FirePro S10000's major fault lies with its 375W TDP: while the peak performance is respectable, it is going to use more power while providing that compute muscle.
The cards are available now with an MSRP of $3,599. It is neat to finally see AMD come out with a dual GPU card with Tahiti chips, and it will be interesting to see what kind of design wins the company is able to get for its beastly FirePro S10000.
Subject: General Tech, Graphics Cards | November 13, 2012 - 01:17 PM | Jeremy Hellstrom
Tagged: amd, Intel, firepro, firepro s10000, HPC, Xeon Phi, 3120A, 5110P, Knight's Corner
AMD's new Tahiti-based FirePro S10000 offers more than just a GPU upgrade; it offers two, as this is a dual GPU card. According to The Register it should run about $3,600 and need 375W to perform, numbers which still make it a more efficient card than the S9000 even though it needs significantly more cash and power to run. It is a 2 slot card, a necessity in the server and workstation world, and while it does not support CrossFire, it does support Eyefinity with its DVI port and four Mini DisplayPorts.
The Register also got some news about Xeon Phi, Intel's answer to the HPC cards on offer from AMD and NVIDIA. Knights Corner is the evolution of Larrabee into an actual product, in this case two cards based on a 62 core chip, though not all of the cores are active. The passively cooled 5110P has 60 cores running at 1.053GHz, while the 3120A has 57 cores clocked slightly higher at 1.1GHz and sports a fan. Both cards produce just over a teraflop of double precision floating point math, compared to the 1.48 teraflops offered by AMD's S10000 or the 1.31 offered by the Tesla K20X. Check out more on these coprocessors at The Register.
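The "just over a teraflop" figure follows from active cores times clock times per-core throughput. This sketch assumes (not stated in the article) that each Knights Corner core has a 512-bit vector unit, eight double precision lanes, issuing one fused multiply-add per clock, for 16 DP FLOPs per core per cycle.

```python
# Knights Corner peak DP sketch.
# Assumption: 8 DP lanes x 2 FLOPs (FMA) = 16 DP FLOPs per core per cycle.

DP_FLOPS_PER_CORE_PER_CYCLE = 8 * 2

def knc_peak_tflops(active_cores: int, clock_ghz: float) -> float:
    # cores * GHz * FLOPs/cycle gives GFLOPS; divide by 1000 for TFLOPS
    return active_cores * clock_ghz * DP_FLOPS_PER_CORE_PER_CYCLE / 1000

for model, cores, ghz in [("5110P", 60, 1.053), ("3120A", 57, 1.1)]:
    print(f"Xeon Phi {model}: {knc_peak_tflops(cores, ghz):.3f} TFLOPS DP")
```

Both come out at roughly 1.01 and 1.00 TFLOPS, which is why the 3120A's higher clock does not make up for its three missing cores.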
"With the FirePro S10000, not only is the GPU geared down to 825MHz, but the memory is similarly downshifted to 5GHz. The memory interface is 384-bit wide on each GPU, with two blocks of GDDR5 memory yielding a total of 6GB. (This could be a little skinny on the memory for some HPC workloads, given that the S9000 card has 6GB of memory for one Tahiti GPU.) Each GPU can access 240GB/sec of memory bandwidth linking to each 3GB chunk of GDDR5 memory.
Because the card is double-stuffed, it can deliver a very impressive 5.91 teraflops SP and 1.48 teraflops DP in peak floating point oomph."
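The bandwidth numbers in that quote come from bus width times per-pin data rate. A small sketch, assuming the quoted "5GHz" is the effective GDDR5 data rate (5 Gbit/s per pin) and, as our own input, 5.5 Gbit/s per pin for the S9000.

```python
# Peak memory bandwidth = bus width (bits) * data rate per pin (Gbit/s) / 8 bits per byte.
# Assumptions: 5.0 Gbit/s per pin on the S10000, 5.5 Gbit/s per pin on the S9000.

def bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    return bus_width_bits * gbps_per_pin / 8

per_gpu = bandwidth_gb_s(384, 5.0)  # one Tahiti GPU on the S10000
print(f"S10000 per GPU: {per_gpu:.0f} GB/s, card total: {2 * per_gpu:.0f} GB/s")
print(f"S9000: {bandwidth_gb_s(384, 5.5):.0f} GB/s")
```

That gives 240 GB/s per GPU (480 GB/s for the card) and 264 GB/s for the S9000, matching the table.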
Here is some more Tech News from around the web:
- The TR Podcast 123: Incremental improvements
- Microsoft Makes Direct X 11.1 a Windows 8 Exclusive @ Slashdot
- Random Linux Commands to Make Google Talk, Fix Wifi, Find Duplicate Files, and More @ Linux.com
- Microsoft Surface RT may only achieve 60% of forecasted sales @ DigiTimes
- Windows chief Steven Sinofsky leaves Microsoft @ The Inquirer
- Fedora 'Spherical Cow' delayed by bugs, Secure Boot @ The Register
- Microsoft rolls out always-on Skype for Windows Phone 8 @ The Register
- Gaming in Windows 8 vs Windows 7: what's the difference in performance? @ Hardware.info
- Windows 7 vs Windows 8 – The Definitive Performance Guide @ hardCOREware
- How to Change the Start Screen Background in Windows 8 @ TechSpot
- TP-Link TL-WDR3600 and WDR4300 review: two shades of black @ Hardware.info
- Win 1 El'Druin ARPG Gaming Mouse, 2 Hellion Gaming Mice and 1 Aegis Gaming Pad @ NikKTech
Subject: General Tech | November 12, 2012 - 06:29 AM | Tim Verry
Tagged: tesla, supercomputer, nvidia, k20x, HPC, CUDA, computing
Graphics card manufacturer NVIDIA launched a new Tesla K20X accelerator card today that supplants the existing K20 as the top of the line model. The new card cranks up the double and single precision floating point performance, beefs up the memory capacity and bandwidth, and brings some efficiency improvements to the supercomputer space.
While it is not yet clear how many CUDA cores the K20X has, NVIDIA has stated that it is using the GK110 GPU, and is running with 6GB of memory with 250 GB/s of bandwidth – a nice improvement over the K20’s 5GB at 208 GB/s. Both the new K20X and K20 accelerator cards are based on the company’s Kepler architecture, but NVIDIA has managed to wring out more performance from the K20X. The K20 is rated at 1.17 TFlops peak double precision and 3.52 TFlops peak single precision while the K20X is rated at 1.31 TFlops and 3.95 TFlops.
The K20X manages to score 1.22 TFlops in DGEMM, which makes it almost three times as fast as the previous generation Fermi-based Tesla M2090 accelerator.
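Put in relative terms, the K20X's gains over the K20 and its DGEMM result work out as follows. The sketch only uses the figures quoted above; treating DGEMM efficiency as sustained DGEMM divided by theoretical DP peak is our framing, not NVIDIA's.

```python
# Generation-over-generation gains (K20 -> K20X) and DGEMM efficiency,
# using only the figures quoted in the article.

k20 = {"dp_tflops": 1.17, "sp_tflops": 3.52, "mem_gb_s": 208}
k20x = {"dp_tflops": 1.31, "sp_tflops": 3.95, "mem_gb_s": 250}

for key in k20:
    gain = k20x[key] / k20[key] - 1
    print(f"{key}: +{gain:.0%}")

dgemm_efficiency = 1.22 / k20x["dp_tflops"]  # sustained DGEMM / theoretical DP peak
print(f"DGEMM efficiency: {dgemm_efficiency:.0%}")
```

Roughly 12% more compute, 20% more bandwidth, and about 93% of peak sustained in DGEMM.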
Aside from pure performance, NVIDIA is also touting efficiency gains with the new K20X accelerator card. When two K20X cards are paired with a 2P Sandy Bridge server, NVIDIA claims to achieve 76% efficiency versus 61% efficiency with a 2P Sandy Bridge server equipped with two previous generation M2090 accelerator cards. Additionally, NVIDIA claims that its new cards enabled the Titan supercomputer to reach the #1 spot on the Green500 list of the most energy-efficient supercomputers, with a rating of 2,120.16 MFLOPS/W (million floating point operations per second per watt).
NVIDIA claims to have already shipped 30 PFLOPS worth of GPU accelerated computing power. Interestingly, most of that computing power is housed in the recently unveiled Titan supercomputer. This supercomputer contains 18,688 Tesla K20X (Kepler GK110) GPUs and 18,688 16-core AMD Opteron 6274 processors, for a total of 299,008 CPU cores. It will consume 9 megawatts of power and is rated at a peak of 27 petaflops and 17.59 petaflops during a sustained Linpack benchmark. Further, when compared to Sandy Bridge processors, the K20 series offers between 8.2 and 18.1 times more performance in several scientific applications.
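Titan's 27 petaflop peak can be roughly reconstructed from its parts. This is a sketch under assumptions not stated in the article: one K20X and one Opteron 6274 per node, each K20X at its 1.31 TFLOPS DP rating, and each Opteron core at its 2.2 GHz base clock retiring 4 DP FLOPs per cycle.

```python
# Rough reconstruction of Titan's peak and Linpack efficiency.
# Assumptions: one K20X + one 16-core Opteron 6274 per node,
# 1.31 TFLOPS DP per K20X, 2.2 GHz x 4 DP FLOPs/cycle per CPU core.

NODES = 18_688
cpu_cores = NODES * 16                  # 299,008 cores
gpu_pf = NODES * 1.31 / 1000            # TFLOPS -> PFLOPS
cpu_pf = cpu_cores * 2.2 * 4 / 1e6      # GFLOPS -> PFLOPS
peak_pf = gpu_pf + cpu_pf
linpack_pf = 17.59                      # sustained Linpack figure from the article

print(f"CPU cores: {cpu_cores:,}")
print(f"GPU peak {gpu_pf:.2f} PF + CPU peak {cpu_pf:.2f} PF = {peak_pf:.1f} PF")
print(f"Linpack efficiency: {linpack_pf / peak_pf:.0%}")
```

Under these assumptions the GPUs supply about 24.5 of the roughly 27.1 petaflops, and the sustained Linpack run reaches about 65% of peak.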
While the Tesla cards undoubtedly use more power than CPUs, you need far fewer accelerator cards than processors to hit the same performance numbers. That is where NVIDIA is getting its power efficiency numbers from.
NVIDIA is aiming the accelerator cards at researchers and businesses doing 3D graphics, visual effects, high performance computing, climate modeling, molecular dynamics, earth science, simulations, fluid dynamics, and other such computationally intensive tasks. Using CUDA and the parallel nature of the GPU, the Tesla cards can achieve performance much higher than a CPU-only system can. NVIDIA has also engineered features, which the company calls Dynamic Parallelism and Hyper-Q, to better parallelize workloads and keep the GPU accelerators fed with data, respectively.
It is interesting to see NVIDIA bring out a new flagship, especially another GK110 card. Systems using the K20 and the new K20X are available now with cards shipping this week and general availability later this month.
You can find the full press release below and a look at the GK110 GPU in our preview.
Anandtech also managed to get a look inside the Titan supercomputer at Oak Ridge National Laboratory, where you can see the Tesla K20X cards in action.
Subject: General Tech | June 19, 2012 - 03:04 PM | Jeremy Hellstrom
Tagged: nvidia, tesla, K10, GK104, HPC
One of NVIDIA's line of Tesla HPC cards, the Tesla K10, has actually been seen in the wild. The new Tesla series is split between the GK104-based K10, specifically designed for single-precision tasks, and the GK110-based Tesla K20, which is optimized for double-precision tasks. The K10 is capable of 4.58 teraflops of single-precision performance thanks to a pair of GK104s with 8GB of GDDR5, whereas the K20 should, in theory, double Intel's Xeon Phi with 2 teraflops of double-precision performance, but that has yet to be demonstrated. The K10 that was demonstrated also showed off another benefit of NVIDIA's new architecture: even with two GPUs the card remains within a 225W thermal envelope, something that is incredibly important if you are building a cluster. The Register has gathered together some of the benchmarks and slides from NVIDIA's release, which you can see here.
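The K10's 4.58 teraflop rating also falls out of a simple cores-times-clock calculation. This sketch assumes, beyond the article, that each GK104 has 1,536 CUDA cores that each retire one fused multiply-add per clock, and that GK104 runs double precision at 1/24 of its single precision rate; the 745 MHz clock and 225W envelope come from the K10 figures above.

```python
# GK104 peak sketch: CUDA cores * 2 FLOPs (FMA) * clock.
# Assumptions: 1,536 CUDA cores per GK104, DP at 1/24 of the SP rate.

def peak_sp_tflops(cuda_cores: int, clock_mhz: float) -> float:
    return cuda_cores * 2 * clock_mhz / 1e6  # MHz-based FLOPs -> TFLOPS

k10_sp = peak_sp_tflops(2 * 1536, 745)  # two GK104 GPUs at 745 MHz
k10_dp = k10_sp / 24
print(f"K10: {k10_sp:.2f} TFLOPS SP, {k10_dp:.2f} TFLOPS DP")
print(f"SP efficiency at 225 W: {k10_sp * 1000 / 225:.1f} GFLOPS/W")
```

This lands on 4.58 TFLOPS single precision and 0.19 TFLOPS double precision, which illustrates why the K10 is pitched at single-precision work.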
"The Top 500 supercomputer ranking is based on the performance of machines running the Linpack Fortran matrix math benchmark using double-precision floating point math, but a lot of applications will do just fine with single-precision math. And it is for these workloads, graphics chip maker and supercomputing upstart Nvidia says, that it designed the new Tesla K10 server coprocessors."
Here is some more Tech News from around the web:
- Samsung reportedly raising 4GB DDR3 module prices @ DigiTimes
- Can't watch Flash vids in Firefox? It's not just you @ The Register
- TSMC reportedly seeing orders slow down @ DigiTimes
- How To Unlock Huawei 3G Modems @ TechARP
- How to Clean and Fine-Tune a Tape Deck @ Hardware Secrets
- Interview with MEDION's Sandro Fabris @ HardwareHeaven
- HuntKey Joint Contest @ NikKTech
Subject: Processors | June 19, 2012 - 11:46 AM | Josh Walrath
Tagged: Xeon Phi, xeon e5, nvidia, larrabee, knights corner, Intel, HPC, gpgpu, amd
The one positive thing for Intel’s competitors is that it seems their enthusiasm for massively parallel computing is justified. Intel just entered that ring with a unique architecture that will certainly help push high performance computing more towards true heterogeneous computing.
Subject: General Tech | March 28, 2012 - 01:21 PM | Jeremy Hellstrom
Tagged: amd, seamicro, interconnect, purchase, HPC, 3d torus, freedom
In the beginning of March it was announced that AMD would be spending $334 million to purchase SeaMicro, a company that holds the patents on the 3D torus interconnect for High Performance Computing and servers. This interconnect utilizes PCIe lanes to connect large numbers of processors together to create what was commonly referred to as a supercomputer and is now more likely to be labelled an HPC machine. SeaMicro's current SM10000 chassis can hold 64 processor cards, each of which has a processor socket, chipset and memory slots, which makes the entire design beautifully modular.
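To make "3D torus" concrete, here is a generic illustration (not SeaMicro's actual Freedom routing, which the article does not describe): nodes sit on an X by Y by Z grid whose edges wrap around, so every node has six direct neighbours and the worst-case path is shorter than in an open mesh. The 4x4x4, 64-node layout is a hypothetical choice for illustration only.

```python
# Generic 3D torus sketch; the 4x4x4 layout is hypothetical.
# Assumes every dimension is at least 3 so the six neighbours are distinct.
from itertools import product

def neighbours(node, dims):
    """Six wrap-around neighbours of node (x, y, z) in a torus of size dims."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            result.append(tuple(coord))
    return result

def hops(a, b, dims):
    """Minimal hop count, taking the shorter way round each ring."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))

dims = (4, 4, 4)  # hypothetical 64-node layout
diameter = max(hops((0, 0, 0), n, dims) for n in product(*(range(d) for d in dims)))
print(neighbours((0, 0, 0), dims))
print("worst-case hops:", diameter)
```

In this layout the farthest node is 6 hops away, versus 9 hops for a 4x4x4 mesh without the wrap-around links.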
One of the more interesting features of the Freedom system design is that it can currently utilize either Atom or Xeon chips on those processor cards. With AMD now in the mix, you can expect to see compatibility with Opteron chips in the very near future. That will give AMD a chance to grab market share from Intel in the HPC market segment. The Opteron series may not be as powerful as the current Xeons, but Opterons do cost noticeably less, which makes them very attractive for customers who cannot afford 64 Xeons but need more power than an Atom can provide.
The competition is not just about price, however; with Intel's recent purchase of QLogic's InfiniBand interconnect business, AMD needs to ensure it can also provide a backbone that is comparable in speed. The current Freedom interconnect has 1.28Tb/sec of aggregate bandwidth on a 3D torus and supports up to sixteen 10-Gigabit Ethernet links or 64 Gigabit Ethernet links, which is in the same ballpark as a 64 channel InfiniBand based system. The true speed will depend on which processors AMD plans to put into these systems, but as Michael Detwiler told The Register, that will depend on what customers actually want and not on what AMD thinks will be best.
"As last week was winding down, Advanced Micro Devices took control of upstart server maker SeaMicro, and guess what? AMD is still not getting into the box building business, even if it does support SeaMicro's customers for the foreseeable future out of necessity.
Further: Even if AMD doesn't have aspirations to build boxes, the company may be poised to shake up the server racket as a component supplier. Perhaps not as dramatically as it did with the launch of the Opteron chips nearly a decade ago, but then again, maybe as much or more - depending on how AMD plays it and Intel and other server processor makers react."
Here is some more Tech News from around the web:
- AMD collaborates with Green Hills to port Integrity real-time OS @ The Inquirer
- Death of a data haven: cypherpunks, WikiLeaks, and the world's smallest nation @ Ars Technica
- Rockyou security blunder exposed data on 32 million gamers @ The Inquirer
- Plastic that SELF-REPAIRS using light unleashed by prof @ The Register
- ARM adds Mali support to the new DS5 suite @ SemiAccurate
- ASUS EA-N66U Wireless-N450 Ethernet Adapter @ Benchmark Reviews
- Canon PowerShot SX260 HS Review @ TechReviewSource
- The new Comcast Xbox Xfinity app is the first nail in net neutrality’s coffin @ ExtremeTech
Subject: General Tech | March 1, 2012 - 02:09 PM | Jeremy Hellstrom
Tagged: amd, seamicro, interconnect, purchase, HPC
There is more movement in the low power server market as AMD purchased SeaMicro for $334 million, an investment that may help them keep their share of the server market. You might have thought that a company that arrived on the scene with a server based on 512 single core Atoms would either stick with Intel or even consider ARM, but instead it was AMD that grabbed them. It is an important move for AMD to retain competitiveness against Intel, considering Intel's purchase of QLogic's InfiniBand interconnect business, which could lead to an entirely new server architecture. Using SeaMicro's experience of connecting a large number of individually weak processors into a powerful server, AMD will be able to develop the SoC business that they have been pursuing for quite a while now. Check out the full story at The Inquirer.
"AMD's new CEO Rory Read was fired up about executing better in the server racket at the company's analyst day earlier this month and has wasted little time in stirring things up with the acquisition of low-power server start-up SeaMicro for $334m."
Here is some more Tech News from around the web:
- Neat nanoparticles could bring 10TB disks @ The Register
- Adata launches 1.35V DDR3 modules for overclockers @ The Inquirer
- Microsoft shows off their transparent 3D desktop prototype @ Hack a Day
- Intel isn’t ready to announce Centerton, yet @ SemiAccurate
- Mozilla Collusion lets you see who is tracking you @ The Inquirer
- Hands On With Windows 8 Consumer Preview @ TechReviewSource
- Configuring a Windows 8 Virtual Machine @ Techspot
- Wicked Lasers Spyder III Krypton 1 Watt Green Laser @ Tweaktown
- Canon Pixma MG3120 Review @ TechReviewSource
- TechwareLabs Powerbag Give Away Contest
Subject: General Tech | January 24, 2012 - 01:24 PM | Jeremy Hellstrom
Tagged: Intel, QLogic, purchase, Infiniband, HPC
Intel blew a tiny $125 million piece of its record-breaking quarterly income to purchase QLogic's InfiniBand business, which gives it access to a networking technology significantly faster than Ethernet. InfiniBand is what is referred to as a switched fabric technology, which allows multiple switches to connect to multiple hosts or data stores, as opposed to the more point-to-point, single-broadcast approach that current Ethernet-based networks use.
That may look familiar to some, but not as a network technology; it matches the communications architecture behind PCIe and SATA. As we have seen, the speed difference between parallel and serial connections is quite impressive, and InfiniBand's fastest implementation is currently capable of transferring 25 Gbit/s per lane. That is significantly faster than the roughly 1 GB/s (about 8 Gbit/s) per lane that PCIe 3.0 can provide, which is why some current implementations of InfiniBand are used in High Performance Computing (HPC) applications. InfiniBand also offers incredibly low latency of between 100 and 200 nanoseconds, depending on the implementation.
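The per-lane comparison is easy to check. This sketch assumes, beyond the article, that PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, and that the 25 Gbit/s InfiniBand figure is already a post-encoding data rate.

```python
# Per-lane data rate comparison.
# Assumptions: PCIe 3.0 = 8 GT/s with 128b/130b encoding;
# 25 Gbit/s InfiniBand is treated as a usable data rate.

pcie3_gbit = 8 * 128 / 130       # usable Gbit/s per lane after encoding overhead
pcie3_gbyte = pcie3_gbit / 8     # GB/s per lane, per direction
infiniband_gbit = 25.0

print(f"PCIe 3.0: {pcie3_gbit:.2f} Gbit/s ({pcie3_gbyte:.3f} GB/s) per lane")
print(f"InfiniBand lane is {infiniband_gbit / pcie3_gbit:.1f}x a PCIe 3.0 lane")
```

A PCIe 3.0 lane carries about 7.88 Gbit/s, or just under 1 GB/s, so a 25 Gbit/s InfiniBand lane is a bit more than three times faster.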
Getting a hold of this interconnect technology gives Intel a huge boost in its ability to create high performance networking technologies. Intel has been looking for a way to grow in that area and push Application Specific Integrated Circuit (ASIC) manufacturers out of the market, replacing those chips with low power Xeons or future Intel chips. This would open up an entirely new market for Intel, which could see its already impressive growth increase significantly. Intel could become even more attractive to customers by taking advantage of its ownership of McAfee to place virus/malware protection directly onto its switches. We have already seen evidence of one project along these lines at IDF 2011, when Intel announced the DeepSAFE project, software that operates below the OS level and provides what the company refers to as "hardware-assisted" security. With that OS-agnostic approach it would be possible to run the security software on a network switch or on an HPC interconnect. That could give Intel not only the fastest interconnect technology but also the most secure.
When discussing this with The Inquirer, Intel's representative Kirk Skaugen stated that this purchase will help Intel design and produce an exaflop-level supercomputer by 2018. It is unlikely that this is Intel's only goal; with last summer's purchase of Fulcrum Microsystems, a company that designs ASICs for Ethernet switches and routers running at 10Gbit and 40Gbit, Intel is well on its way to designing network switches for HPC applications. The Register ponders what this could mean for companies that have used InfiniBand technology in their products. Will they be snatched up by a networking company like Cisco, could AMD pick them up and provide competition in this industry, or will they consider offering themselves to Intel the best alternative? We will be keeping an eye on this, as it will not only develop into the next generation of networking technology but could also drive the successor to PCIe.
"The high-performance networking market just got a whole lot more interesting, with Intel shelling out $125m to acquire the InfiniBand switch and adapter product lines from upstart QLogic.
Intel has made no secret that it wants to bolster its Data Center and Connected Systems business by getting network equipment providers to use Xeon processors inside of their networking gear – that Intel division posted $10.1bn in revenues in 2011, and the company wants to break $20bn in the next five years."
Here is some more Tech News from around the web:
- And the Nvidia Kepler/GK104 price is…….. @ SemiAccurate
- Intel calls the successor to Romley…… @ SemiAccurate
- Western Digital reveals Thai floods cost $199m to clean up @ The Inquirer
- Pure graphene conducts heat exceptionally well @ Nanotechweb
- Laser used to cool semiconductor @ The Register
- Pirate Bay To Offer Physical Item Downloads @ Slashdot
- Is Lion Server suitable for home use? Ars investigates
- Ubuntu 11.04, 11.10, 12.04 On The NVIDIA Tegra 2 @ Phoronix
- Intel's Open-Source Driver Can Beat Mac OS X @ Phoronix
- Apple rewrites the history books @ The Tech Report
- I/ITSEC: Cutting Edge Simulation and Training Trade Show Coverage @ Tweaktown
- CES Live Coverage Part 5 @ Hi Tech Legion
- 2012 CES: OCZ Technology @ OCIA
Subject: Systems | May 24, 2011 - 09:07 PM | Tim Verry
Tagged: tesla, supercomputer, petaflop, HPC, bulldozer
Cray has been a huge name in the supercomputer market for years, and with the new XK6 they are promising to deliver a supercomputer capable of 50 thousand trillion operations per second (50 petaflops). Powered by AMD Opteron CPUs and NVIDIA GPUs, each XK6 blade consists of 2 Gemini interconnects pairing four AMD Opteron CPUs with four NVIDIA Tesla X2090 embedded graphics cards. The graphics cards in each blade have access to 6GB of GDDR5 memory and are connected via PCI-E 2.0 links to the Opteron processors. The CPUs have access to four DDR3 memory slots “running at 1.6GHz for every G34 socket,” according to The Register. This amounts to 32GB per two-socket node when using 4GB sticks.
Cray plans to wait until AMD releases the 16 core 32nm Opteron CPUs, dubbed the Opteron 6200s, in Q3. The Register quotes AMD’s CEO Thomas Seifert as promising that the processors, which are based on the new Bulldozer cores (and would be compatible with the current G34 sockets), “would ship by summer.”
Further, they claim that Cray’s goal with the XK6 was to keep the new blades within the same thermal boundaries as its predecessor, despite the inclusion of GPUs in the mix. Cray has indicated that, due to its success in remaining within the thermal envelope, customers will be able to use XE6 and XK6 blades interchangeably, which will allow them to customize their supercomputer load-out to meet the demands of their specific computing workloads.
Each cabinet is capable of holding up to 24 blades and can deliver up to 50 kilowatts of power. Each of the Tesla X2090 GPUs is capable of 665 gigaflops in double-precision floating point operations, something that GPUs excel at. As each XK6 blade contains 4 GPUs and each cabinet can hold 24 blades, customers are looking at 63.8 teraflops of computing power solely from the graphics cards. On the CPU side of things, Cray is not able to release specifications on the processors as AMD has yet to deliver the chips in question. The Register estimates that each XK6 blade will provide 3.5 teraflops of floating point computing power, which amounts to approximately 84 teraflops per cabinet.
With a claimed capability to utilize up to 300 cabinets full of XK6 blades, customers are looking at approximately 44 petaflops of computing horsepower, with GPUs delivering 19.14 petaflops, and the CPUs estimated to provide 25.2 petaflops of floating point computational power.
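The XK6 totals can be re-derived from the per-GPU and per-cabinet figures. The 665 GFLOPS per X2090, 4 GPUs per blade, 24 blades per cabinet and 300 cabinets come from the article; the 84 teraflops per cabinet is The Register's estimate, which this sketch simply treats as the CPU contribution, exactly as the article's totals do.

```python
# Re-deriving the XK6 totals quoted above.
# Inputs are the article's figures; CPU_TFLOPS_PER_CABINET is The Register's estimate.

GPU_GFLOPS, GPUS_PER_BLADE, BLADES_PER_CABINET, CABINETS = 665, 4, 24, 300
CPU_TFLOPS_PER_CABINET = 84

gpu_tf_per_cabinet = GPU_GFLOPS * GPUS_PER_BLADE * BLADES_PER_CABINET / 1000
gpu_pf_total = gpu_tf_per_cabinet * CABINETS / 1000
cpu_pf_total = CPU_TFLOPS_PER_CABINET * CABINETS / 1000

print(f"GPU: {gpu_tf_per_cabinet:.2f} TF per cabinet, {gpu_pf_total:.2f} PF total")
print(f"CPU: {cpu_pf_total:.1f} PF, combined: {gpu_pf_total + cpu_pf_total:.1f} PF")
```

The GPU total comes out at 19.15 petaflops (19.14 if the per-cabinet figure is first rounded to 63.8 teraflops), and the combined figure lands at about 44 petaflops.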
The first customer of this system will be the Swiss National Supercomputing Centre. According to the Seattle Times, the center’s director Professor Thomas Schulthess stated that they chose the Cray XK6 based supercomputer not for its raw performance, but because “the Cray XK6 promises to be the first general-purpose supercomputer based on GPU technology, and we are very much looking forward to exploring its performance and productivity on real applications relevant to our scientists.”