That old chestnut again? Intel compares their current gen hardware against older NVIDIA kit

Subject: General Tech | August 17, 2016 - 12:41 PM |
Tagged: nvidia, Intel, HPC, Xeon Phi, maxwell, pascal, dirty pool

There is a spat going on between Intel and NVIDIA over the slide below, as you can read about over at Ars Technica. It seems that Intel have reached into the industry's bag of dirty tricks and pulled out an old standby: testing new hardware and software against older products from their competitors. In this case the products tested were high performance computing parts, Intel's new Xeon Phi against NVIDIA's Maxwell, run on an older version of the Caffe AlexNet benchmark.

NVIDIA points out not only that they would have done better than Intel if an up-to-date version of the benchmarking software had been used, but also that the comparison should have been against their current architecture, Pascal. This is not quite as bad as putting undocumented flags into compilers to reduce the performance of competitors' chips, or running predatory discount programs, but it shows that the computer industry continues to have only a passing acquaintance with fair play and honest competition.

[Image: Intel's Xeon Phi performance claim slide]

"At this juncture I should point out that juicing benchmarks is, rather sadly, par for the course. Whenever a chip maker provides its own performance figures, they are almost always tailored to the strength of a specific chip—or alternatively, structured in such a way as to exacerbate the weakness of a competitor's product."

Here is some more Tech News from around the web:

Tech Talk

Source: Ars Technica
Manufacturer: NVIDIA

93% of a GP100 at least...

NVIDIA has announced the Tesla P100, the company's newest (and most powerful) accelerator for HPC. Based on the Pascal GP100 GPU, the Tesla P100 is built on 16nm FinFET and uses HBM2.

NVIDIA provided a comparison table, to which we added what we know about a full GP100:

| | Tesla K40 | Tesla M40 | Tesla P100 | Full GP100 |
|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) |
| SMs | 15 | 24 | 56 | 60 |
| TPCs | 15 | 24 | 28 | 30 (?) |
| FP32 CUDA Cores / SM | 192 | 128 | 64 | 64 |
| FP32 CUDA Cores / GPU | 2880 | 3072 | 3584 | 3840 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 |
| FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1920 |
| Base Clock | 745 MHz | 948 MHz | 1328 MHz | TBD |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | TBD |
| FP64 GFLOPS | 1680 | 213 | 5304 | TBD |
| Texture Units | 240 | 192 | 224 | 240 |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | TBD |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | TBD |
| Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB |
| Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 15360 KB |
| TDP | 235 W | 250 W | 300 W | TBD |
| Transistors | 7.1 billion | 8 billion | 15.3 billion | 15.3 billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² |
| Manufacturing Process | 28 nm | 28 nm | 16 nm | 16 nm |

This table is designed for developers who are interested in GPU compute, so a few variables (like ROPs) are still unknown, but it still gives us a lot of insight into the “big Pascal” architecture. The jump to 16nm allows for about twice the number of transistors (15.3 billion, up from 8 billion with GM200) in roughly the same die area (610 mm², up from 601 mm²).
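As a sanity check on NVIDIA's table, the FP64 GFLOPS row can be reproduced from the core counts and boost clocks, and the per-GPU register file from the per-SM figure. This short Python sketch assumes each FP64 core retires one fused multiply-add (2 FLOPs) per clock at the boost clock, the usual convention for peak figures rather than something the table states. For the K40 it uses the higher 875 MHz boost state, which is the one that reproduces 1680 GFLOPS.

```python
# Peak-throughput arithmetic behind the Tesla comparison table.
# Assumption: each FP64 CUDA core retires one FMA (2 FLOPs) per clock
# at the GPU Boost clock.

def peak_gflops(cores: int, boost_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS: cores x FLOPs per clock x clock (GHz)."""
    return cores * flops_per_clock * boost_mhz / 1000.0

# (FP64 cores per GPU, boost clock in MHz, SM count), taken from the table.
gpus = {
    "Tesla K40": (960, 875, 15),    # 875 MHz is the K40's higher boost state
    "Tesla M40": (96, 1114, 24),
    "Tesla P100": (1792, 1480, 56),
}

REGISTER_FILE_KB_PER_SM = 256  # identical on all three parts

for name, (fp64_cores, boost, sms) in gpus.items():
    print(f"{name}: {peak_gflops(fp64_cores, boost):.1f} FP64 GFLOPS, "
          f"{sms * REGISTER_FILE_KB_PER_SM} KB register file")

# Transistor density from the paragraph above, in millions per mm^2.
gm200_density = 8_000 / 601
gp100_density = 15_300 / 610
print(f"GP100 density is {gp100_density / gm200_density:.2f}x GM200")
```

The M40 works out to 213.9 GFLOPS, which the table truncates to 213, and the density ratio of roughly 1.88x is what "about twice the number of transistors in roughly the same die area" amounts to.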

[Figure: GP100 block diagram]

A full GP100 processor will have 60 streaming multiprocessors (SMs), compared to GM200's 24, although each Pascal SM holds half as many shaders (64 FP32 CUDA cores versus 128). The GP100 part listed in the table above is actually partially disabled, with four of the sixty SMs cut off. This leaves 3584 single-precision (32-bit) CUDA cores, up from 3072 in GM200. (The full GP100 architecture will have 3840 of these FP32 CUDA cores, but we don't know when or where we'll see that.) The base clock is also significantly higher than Maxwell's: 1328 MHz versus ~1000 MHz for the Titan X and 980 Ti, although Ryan has overclocked those GPUs to ~1390 MHz with relative ease. This is interesting, because even though the P100's 10.6 TeraFLOPS of single-precision performance at its 1480 MHz boost clock is amazing, it's only about 24% more than the ~8.5 TeraFLOPS GM200 could pull off with that overclock.
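The 10.6 TeraFLOPS figure and the comparison with an overclocked GM200 both come from the same peak-throughput arithmetic: FP32 cores × 2 FLOPs per clock (one FMA) × clock. This sketch assumes the P100 figure is quoted at its 1480 MHz boost clock and that Ryan's ~1390 MHz overclock applies to a full 3072-core GM200.

```python
def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak in TFLOPS, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_mhz / 1e6

p100 = peak_tflops(3584, 1480)      # Tesla P100 at its 1480 MHz boost clock
gm200_oc = peak_tflops(3072, 1390)  # GM200 at Ryan's ~1390 MHz overclock

print(f"P100: {p100:.2f} TFLOPS")
print(f"GM200 @ 1390 MHz: {gm200_oc:.2f} TFLOPS")
print(f"P100 advantage: {p100 / gm200_oc - 1:.0%}")
```

This prints roughly 10.61 TFLOPS for the P100, 8.54 TFLOPS for the overclocked GM200, and an advantage of about 24%.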

AMD Brings Dual Fiji and HBM Memory To Server Room With FirePro S9300 x2

Subject: Graphics Cards | April 5, 2016 - 02:13 AM |
Tagged: HPC, hbm, gpgpu, firepro s9300x2, firepro, dual fiji, deep learning, big data, amd

Earlier this month AMD launched a dual-Fiji powerhouse for VR gamers that it is calling the Radeon Pro Duo. Now AMD is bringing its latest GCN architecture and HBM memory to servers with the dual-GPU FirePro S9300 x2.

The new server-bound professional graphics card packs an impressive amount of computing hardware into a dual-slot card with passive cooling. The FirePro S9300 x2 combines two full Fiji GPUs clocked at 850 MHz for a total of 8,192 cores, 512 TUs, and 128 ROPs. Each GPU is paired with 4GB of non-ECC HBM memory on package, providing 512GB/s of memory bandwidth per GPU, and AMD adds the two figures together to advertise this as the first professional graphics card with 1TB/s of memory bandwidth.
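AMD's 1TB/s headline is simply the two per-GPU HBM figures added together. A minimal sketch of the bandwidth arithmetic, assuming first-generation HBM transfers data on both clock edges (so the 500 MHz memory clock listed in the table that follows corresponds to 1 Gb/s per pin), and treating the S9170's listed 5000 MHz as an effective per-pin data rate:

```python
def bandwidth_gbs(bus_bits: int, mem_clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte.

    Assumption: transfers_per_clock=2 models a double-data-rate interface such as
    first-generation HBM; pass 1 when the clock figure is already an effective rate.
    """
    gbps_per_pin = mem_clock_mhz * transfers_per_clock / 1000
    return bus_bits * gbps_per_pin / 8

per_gpu = bandwidth_gbs(4096, 500)                               # one Fiji with HBM
print(per_gpu, 2 * per_gpu)                                      # 512.0 1024.0
print(bandwidth_gbs(512, 5000, transfers_per_clock=1))           # S9170 GDDR5: 320.0
```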

Because of its lower clock speed, the S9300 x2 has less peak single-precision compute performance than the consumer Radeon Pro Duo: 13.9 TFLOPS versus 16 TFLOPS on the desktop card. Businesses will be able to cram more cards into their rack-mounted servers, though, since they do not need to find mounting locations for the Radeon card's sealed-loop water cooling.

| | FirePro S9300 x2 | Radeon Pro Duo | R9 Fury X | FirePro S9170 |
|---|---|---|---|---|
| GPU | Dual Fiji | Dual Fiji | Fiji | Hawaii |
| GPU Cores | 8192 (2 x 4096) | 8192 (2 x 4096) | 4096 | 2816 |
| Rated Clock | 850 MHz | 1000 MHz | 1050 MHz | 930 MHz |
| Texture Units | 2 x 256 | 2 x 256 | 256 | 176 |
| ROP Units | 2 x 64 | 2 x 64 | 64 | 64 |
| Memory | 8GB (2 x 4GB) | 8GB (2 x 4GB) | 4GB | 32GB ECC |
| Memory Clock | 500 MHz | 500 MHz | 500 MHz | 5000 MHz |
| Memory Interface | 4096-bit (HBM) per GPU | 4096-bit (HBM) per GPU | 4096-bit (HBM) | 512-bit |
| Memory Bandwidth | 1TB/s (2 x 512GB/s) | 1TB/s (2 x 512GB/s) | 512 GB/s | 320 GB/s |
| TDP | 300 watts | ? | 275 watts | 275 watts |
| Peak Compute (FP32) | 13.9 TFLOPS | 16 TFLOPS | 8.60 TFLOPS | 5.24 TFLOPS |
| Transistor Count | 17.8B | 17.8B | 8.9B | 6.2B |
| Process Tech | 28nm | 28nm | 28nm | 28nm |
| Cooling | Passive | Liquid | Liquid | Passive |
| MSRP | $6000 | $1499 | $649 | $4000 |
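The Peak Compute row can be reproduced from the core counts and rated clocks, assuming each stream processor retires one FMA (2 FLOPs) per clock. This is a sketch of that arithmetic under that assumption, not AMD's published methodology.

```python
def peak_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak: one FMA (2 FLOPs) per stream processor per clock."""
    return stream_processors * 2 * clock_mhz / 1e6

# (stream processors, rated clock in MHz) from the table above.
cards = {
    "FirePro S9300 x2": (8192, 850),
    "R9 Fury X": (4096, 1050),
    "FirePro S9170": (2816, 930),
}

for name, (sps, mhz) in cards.items():
    print(f"{name}: {peak_tflops(sps, mhz):.2f} TFLOPS")
```

This prints 13.93, 8.60 and 5.24 TFLOPS respectively, matching the table once the S9300 x2 figure is rounded to one decimal place.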

AMD is aiming this card at datacenter and HPC users working on "big data" tasks that do not require the accuracy of double-precision floating-point calculations. Deep learning, seismic processing, and data analytics are all examples of tasks AMD says the dual-GPU card will excel at. These can all be greatly accelerated by the massively parallel nature of a GPU but do not need the precision of the stricter mathematics, modeling, and simulation work that depends on FP64 performance. In that respect, the FirePro S9300 x2 has only 870 GFLOPS of double-precision compute performance.
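The 870 GFLOPS figure follows from Fiji's reduced double-precision rate. This sketch assumes Fiji runs FP64 at 1/16 of its FP32 rate and that the Hawaii-based S9170 runs FP64 at half rate (the ratio the S9150's 2.53 and 5.07 TFLOPS figures elsewhere on this page imply), and computes both from core counts and rated clocks:

```python
FIJI_FP64_RATE = 1 / 16   # assumption: Fiji executes FP64 at 1/16 of its FP32 rate
HAWAII_FP64_RATE = 1 / 2  # assumption: Hawaii FirePro cards execute FP64 at half rate

def peak_gflops(stream_processors: int, clock_mhz: float, rate: float = 1.0) -> float:
    """Peak GFLOPS at a given fraction of the FP32 rate (2 FLOPs per SP per clock)."""
    return stream_processors * 2 * clock_mhz / 1000 * rate

s9300_fp64 = peak_gflops(8192, 850, FIJI_FP64_RATE)    # dual Fiji at 850 MHz
s9170_fp64 = peak_gflops(2816, 930, HAWAII_FP64_RATE)  # Hawaii at 930 MHz

print(f"S9300 x2: {s9300_fp64:.0f} GFLOPS FP64")
print(f"S9170:    {s9170_fp64:.0f} GFLOPS FP64")
```

Under these assumptions the dual-GPU card lands at about 870 GFLOPS, roughly a third of the older single-GPU S9170's ~2619 GFLOPS, which is why AMD pitches it at single-precision work.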

Further, this card supports a GPGPU-optimized Linux driver stack called GPUOpen, and developers can program for it using either OpenCL (it supports OpenCL 1.2) or C++. AMD PowerTune and the return of FP16 support are also among its features. AMD claims that its new dual-GPU card is twice as fast as the NVIDIA Tesla M40 (1.6x the K80) and 12 times as fast as the latest Intel Xeon E5 in peak single-precision floating-point performance.
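AMD's "twice as fast as the Tesla M40" claim can be checked against the M40 specifications in NVIDIA's Tesla table earlier on this page (3072 FP32 CUDA cores, 1114 MHz boost clock). This sketch assumes both peaks are computed the same way, with 2 FLOPs per core per clock; the K80 and Xeon E5 comparisons are not reproduced because their specifications are not given here.

```python
def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_mhz / 1e6

s9300_x2 = peak_tflops(8192, 850)    # FirePro S9300 x2 at its 850 MHz rated clock
tesla_m40 = peak_tflops(3072, 1114)  # Tesla M40: 3072 FP32 cores at 1114 MHz boost

print(f"M40 peak: {tesla_m40:.2f} TFLOPS")
print(f"S9300 x2 / M40: {s9300_x2 / tesla_m40:.2f}x")
```

The M40 comes out at about 6.84 TFLOPS and the ratio at about 2.03x, so the claim holds on paper peak figures, which say nothing about sustained performance in real workloads.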

The double-slot card is powered by two PCI-E power connectors and is rated at 300 watts. This is a bit more palatable than the triple 8-pin connectors needed for the Radeon Pro Duo!

The FirePro S9300 x2 comes with a 3-year warranty and will be available in the second half of this year for $6000 USD. You are definitely paying a premium for the professional certifications and support. Here's hoping developers come up with some cool uses for the dual 8.9-billion-transistor GPUs and their HBM memory!

Source: AMD

Meet the Boltzmann Initiative, AMD's answer to HPC

Subject: General Tech | November 18, 2015 - 12:35 PM |
Tagged: amd, firepro, boltzmann, HPC, hsa

AMD announced the Boltzmann Initiative at SC15 this week as its bid to compete against Intel and NVIDIA in the HPC market. It is not a physical product but rather a new way to unite the processing power of HSA-compliant AMD APUs and FirePro GPUs. AMD announced several new projects, including the Heterogeneous Compute Compiler (HCC) and the Heterogeneous-compute Interface for Portability (HIP) for CUDA-based apps, which can automatically convert CUDA code into C++. AMD also announced a headless Linux driver and an HSA runtime infrastructure interface for managing clusters, which uses InfiniBand interconnects to connect system memory directly to GPU memory, along with P2P GPU support and numerous other enhancements. Check out more at DigiTimes.

"The Boltzmann Initiative leverages HSA's ability to harness both central processing units (CPU) and AMD FirePro graphics processing units (GPU) for maximum compute efficiency through software."

Source: DigiTimes

Seagate joins the HPC super team

Subject: General Tech | July 13, 2015 - 01:31 PM |
Tagged: Seagate, IBM, HPC, hp

IBM will be making its Spectrum Scale software available on Seagate's ClusterStor HPC products, which are due out towards the end of the year. This marks a turning point for Seagate's HPC business: previously its products were only useful to the small group of companies that used the Lustre file system, and supporting IBM's product grows the pool of potential customers significantly. HP will be adding its Apollo software suite to the deal, making it even more attractive to potential clients. As The Inquirer points out, this is part of a shift by international companies to move their data outside US borders, which is good news for ISPs and data providers in the rest of the world but not such good news for those looking for employment in the industry within the USA.

"SEAGATE HAS JOINED FORCES with HP and IBM in a bid to boost its position in the high-performance computing (HPC) market."

Source: The Inquirer

AMD is making SeaMicro walk the plank

Subject: General Tech | April 20, 2015 - 01:17 PM |
Tagged: amd, seamicro, HPC

Just over three years ago AMD purchased SeaMicro for $334 million to give itself a way to compete in HPC applications against Intel, which had recently bought QLogic's InfiniBand interconnect business. The purchase of SeaMicro included its Freedom Fabric technology, which at the time could build servers using Atom or Xeon chips in the same infrastructure. AMD added compatibility with its existing Opteron chips, and it was thought that this would be the perfect platform on which to launch Seattle, its 64-bit ARM chips. Unfortunately, AMD's poor revenue means that the SeaMicro server division is being cut so the company can focus on its other products. Lisa Su obviously has more information than we do on AMD's performance, and it seems counter-intuitive to shut down the only business segment to make positive income, but as The Register points out, the $45m it made is down almost 50% from this time last year. AMD will keep the fabric patents, but as of now we do not know whether it is looking to sell its server business, license the patents or follow some other business plan.

"Tattered AMD says it's done with its SeaMicro server division, following a grim quarter that saw the ailing chipmaker weather losses beyond the expectations of even the gloomiest of Wall Street analysts."

Source: The Register

Hints of things to come from AMD

Subject: General Tech | March 31, 2015 - 12:18 PM |
Tagged: skybridge, HPC, arm, amd

The details are a little sparse, but we now have hints of AMD's plans for next year and 2017. In 2016 we should see AMD chips with ARM cores based on the Skybridge architecture, which Josh described almost a year ago; they will be pin-compatible, allowing the same motherboard to run either an ARM processor or an AMD64 one depending on your requirements. The GPU portion of AMD's APUs will move forward on a two-year cycle, so we should not expect any big jumps in the next year, but AMD is talking about an HPC-capable part by 2017. The final point The Register translated covers that HPC part, which is supposed to use a new memory architecture that will be nine times faster than existing GDDR5.

"Consumer and commercial business lead Junji Hayashi told the PC Cluster Consortium workshop in Osaka that the 2016 release CPU cores (an ARMv8 and an AMD64) will get simultaneous multithreading support, to sit alongside the clustered multithreading of the company's Bulldozer processor families."

Source: The Register

AMD hits the peak of performance in gaming and productivity

Subject: General Tech | August 7, 2014 - 12:45 PM |
Tagged: HPC, amd, firepro, S9150, S9050, opencl

The new cooling on the 290X tends to put it at the top of the gaming charts, and with the impending release of two new FirePro HPC cards AMD looks to take the productivity title away from the Tesla K40. The higher-end S9150 boasts 16GB of GDDR5 memory on a 512-bit memory interface and 44 GCN compute units with 64 stream processors each, for a total of 2816 stream processors on board. That equates to 5.07 TFLOPS of peak single-precision and 2.53 TFLOPS of peak double-precision performance, with theoretical memory bandwidth of 320GB per second. AMD expects the S9150 to have OpenCL 2.0 driver support by the end of the year; the lower-priced and lower-specced S9050 will not, though both will support AMD Stream technology and OpenCL 1.2. Check them out at The Register.
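The S9150 figures are internally consistent, and working backwards from them shows what they imply. This sketch derives the clock speed implied by 5.07 TFLOPS (assuming 2 FLOPs per stream processor per clock), the double-to-single precision ratio, and the per-pin data rate implied by 320GB/s over a 512-bit bus. The derived values are inferences from the quoted numbers, not specifications stated in the post.

```python
# Figures quoted in the paragraph above.
compute_units = 44
sps_per_cu = 64
sp_tflops = 5.07
dp_tflops = 2.53
bandwidth_gbs = 320
bus_width_bits = 512

stream_processors = compute_units * sps_per_cu                 # 2816
# Assumption: one FMA (2 FLOPs) per stream processor per clock.
implied_clock_mhz = sp_tflops * 1e6 / (stream_processors * 2)
dp_to_sp_ratio = dp_tflops / sp_tflops                          # FP64 vs FP32 rate
gbps_per_pin = bandwidth_gbs * 8 / bus_width_bits               # GDDR5 data rate per pin

print(stream_processors, round(implied_clock_mhz), round(dp_to_sp_ratio, 2), gbps_per_pin)
```

This prints 2816 900 0.5 5.0: an implied clock of about 900 MHz, FP64 at half the FP32 rate, and 5 Gb/s per pin of GDDR5.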

"The company's new big gun is the FirePro S9150 card, which maxes out at a blistering 5.07 TFLOPS peak single-precision floating-point performance and 2.53 TFLOPS peak double-precision performance."

Source: The Register

How about a little High Powered Computing?

Subject: General Tech | July 2, 2014 - 02:58 PM |
Tagged: HPC, ISS

The Register visited this year's ISC and snapped some pictures of the hardware on display. There were a lot of storage solutions being demonstrated, like the Silent Brick Library from Fast LTA, which offers an alternative to tape archives and can hold up to 60TB of uncompressed data with 12 bricks in a rack-mounted device. Samsung gave a brief presentation on 3D V-NAND but did not reveal anything new about the technology. AMD showed off its new FirePro W9100, and quite a few vendors, Intel included, are increasing their use of watercooling in racks. Click over to see the latest expensive HPC gear.

"The International Supercomputer Show in Leipzig, Germany, was full of fascinating things at the high-end grunt front of the computing business. Here's what attracted this roving hack's eye."

Source: The Register

Fibre at 32 gigabits per second comes a little closer

Subject: General Tech | December 9, 2013 - 12:47 PM |
Tagged: interconnect, fibre optics, 32 Gbps, HPC

With new emphasis on building modular HPC machines from multitudes of low-powered processors working in parallel, interconnect technology needs to provide immense amounts of bandwidth. This is getting much closer to reality as the 32 Gbps Fibre Channel standard moves through standardization, and it will likely be accepted and certified quickly. Products using this standard are still a year or more from market but will likely be adopted quickly by companies that depend on large arrays of VMs. According to the roadmap on The Register, development of 64 Gbps has already started, with 2016 as a possible target for beginning its standardization process.

"The Association has let it be known that the “INCITS T11 standards committee has recently completed the Fibre Channel Physical Interface - sixth generation (FC-PI-6) industry standard for specifying 32 Gigabit per second (Gbps) Fibre Channel and will forward it to the American National Standards Institute (ANSI) for publication in the first quarter of 2014."

Source: The Register