Samsung and SK Hynix Discuss The Future of High Bandwidth Memory (HBM) At Hot Chips 28

Subject: Memory | August 25, 2016 - 02:39 AM |
Tagged: TSV, SK Hynix, Samsung, hot chips, hbm3, hbm

Samsung and SK Hynix were in attendance at the Hot Chips Symposium in Cupertino, California to (among other things) talk about the future of High Bandwidth Memory (HBM). In fact, the companies are working on two new HBM products: HBM3 and an as-yet-unbranded "low cost HBM." HBM3 will replace HBM2 at the high end and is aimed at the HPC and "prosumer" markets while the low cost HBM technology lowers the barrier to entry and is intended to be used in mainstream consumer products.

As currently planned, HBM3 (Samsung refers to its implementation as Extreme HBM) features double the density per layer and at least double the bandwidth of the current HBM2 (which so far is only used in NVIDIA's planned Tesla P100). Specifically, the new memory technology offers up to 16Gb (~2GB) per layer, and eight (or more) layers can be stacked together using TSVs into a single stack. So far we have seen GPUs use four HBM stacks on a single package, and if that holds true with HBM3 and interposer size limits, we may well see future graphics cards with 64GB of memory! Considering the HBM2-based Tesla will have 16GB and AMD's HBM-based Fury X cards had 4GB, HBM3 is a sizable jump!

Capacity is not the only benefit though. HBM3 doubles the bandwidth versus HBM2 with 512GB/s (or more) of peak bandwidth per stack! In the theoretical example of a graphics card with 64GB of HBM3 (four stacks), that would be in the range of 2 TB/s of theoretical peak bandwidth! Real-world throughput may be lower, but that is still roughly two terabytes per second of bandwidth, which is exciting because it opens a lot of possibilities for gaming, especially as developers push graphics further towards photorealism and resolutions keep increasing. HBM3 should be plenty for a while as far as keeping the GPU fed with data on the consumer and gaming side of things, though I'm sure the HPC market will still crave more bandwidth.
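For readers who want to check the math, here is a minimal sketch of the capacity and bandwidth arithmetic behind the 64GB and 2 TB/s figures. The per-layer density, layer count and per-stack bandwidth come from the numbers quoted above; the four-stack package is the same assumption the post makes, and the function and constant names are purely illustrative.

```python
# Figures quoted in the post; names are illustrative, not from any vendor spec.
GBIT_PER_LAYER = 16        # HBM3 density per layer, in gigabits
LAYERS_PER_STACK = 8       # eight layers per stack (the post notes more may be possible)
STACKS_PER_PACKAGE = 4     # assumption: four stacks per GPU package, as seen so far
PEAK_GBS_PER_STACK = 512   # HBM3 peak bandwidth per stack, in GB/s


def stack_capacity_gb(gbit_per_layer, layers):
    """Capacity of one stack in gigabytes (8 bits per byte)."""
    return gbit_per_layer * layers / 8


def package_capacity_gb(gbit_per_layer, layers, stacks):
    """Total capacity across every stack on the package, in gigabytes."""
    return stack_capacity_gb(gbit_per_layer, layers) * stacks


def package_bandwidth_gbs(gbs_per_stack, stacks):
    """Theoretical peak bandwidth across every stack, in GB/s."""
    return gbs_per_stack * stacks


capacity = package_capacity_gb(GBIT_PER_LAYER, LAYERS_PER_STACK, STACKS_PER_PACKAGE)
bandwidth = package_bandwidth_gbs(PEAK_GBS_PER_STACK, STACKS_PER_PACKAGE)
print(f"{capacity:.0f} GB total, {bandwidth} GB/s peak")
```

Each stack works out to 16GB, so four of them give the 64GB figure, and four stacks at 512GB/s land just over 2 TB/s.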

Samsung further claims that HBM3 will operate at similar (~500MHz) clocks to HBM2, but will use "much less" core voltage (HBM2 is 1.2V).

HBM Four Stacked.jpg

Stacked HBM memory on an interposer surrounding a processor. Upcoming HBM technologies will allow memory stacks with double the number of layers.

HBM3 is perhaps the most interesting technologically; however, the "low cost HBM" is exciting in that it will enable HBM to be used in the systems and graphics cards most people purchase. There were fewer details available on this new lower cost variant, but Samsung did share a few specifics. The low cost HBM will offer up to 200GB/s per stack of peak bandwidth while being much cheaper to produce than current HBM2. In order to reduce the cost of production, there is no buffer die or ECC support and the number of Through Silicon Via (TSV) connections has been reduced. In order to compensate for the lower number of TSVs, the pin speed has been increased to 3Gbps (versus 2Gbps on HBM2). Interestingly, Samsung would like low cost HBM to support traditional silicon interposers as well as potentially cheaper organic interposers. According to NVIDIA, TSV formation is the most expensive part of interposer fabrication, so making reductions there (and somewhat making up for it in increased per-connection speeds) makes sense when it comes to a cost-conscious product. It is unclear whether organic interposers will win out here, but it is nice to see them get a mention, and they are an alternative worth looking into.
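The trade-off between fewer connections and faster pins can be sketched with some quick arithmetic. The 200GB/s, 3Gbps and 2Gbps figures are from the post; the 1024-bit HBM2 interface width is a figure from outside the post, and the implied width for low cost HBM is only a back-of-the-envelope estimate, not something Samsung stated.

```python
# Rough arithmetic only; function names are illustrative.
HBM2_WIDTH_BITS = 1024    # assumption from outside the post: HBM2 data width per stack
HBM2_PIN_GBPS = 2.0       # per-pin speed quoted for HBM2
LOW_COST_PIN_GBPS = 3.0   # per-pin speed quoted for low cost HBM
LOW_COST_PEAK_GBS = 200   # peak bandwidth per stack quoted for low cost HBM


def peak_bandwidth_gbs(width_bits, pin_gbps):
    """Peak bandwidth in GB/s for a given data width and per-pin rate."""
    return width_bits * pin_gbps / 8


def implied_width_bits(bandwidth_gbs, pin_gbps):
    """Data width (in bits) needed to reach a bandwidth at a given per-pin rate."""
    return bandwidth_gbs * 8 / pin_gbps


hbm2_peak = peak_bandwidth_gbs(HBM2_WIDTH_BITS, HBM2_PIN_GBPS)
full_width_at_3gbps = peak_bandwidth_gbs(HBM2_WIDTH_BITS, LOW_COST_PIN_GBPS)
low_cost_width = implied_width_bits(LOW_COST_PEAK_GBS, LOW_COST_PIN_GBPS)

print(f"HBM2 at 2Gbps: {hbm2_peak:.0f} GB/s")
print(f"Same width at 3Gbps: {full_width_at_3gbps:.0f} GB/s")
print(f"Width implied by 200GB/s at 3Gbps: about {low_cost_width:.0f} bits")
```

Under those assumptions, 200GB/s at 3Gbps implies roughly half the data connections of an HBM2 stack, which fits the description of trading TSV count for per-pin speed.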

Both the high bandwidth and low cost memory technologies are still years away and the designs are subject to change, but so far both plans are looking rather promising. I am intrigued by the possibilities and hope to see new products take advantage of the increased performance (and in the latter case lower cost). On the graphics front, HBM3 is way too far out to make it into a Vega release, but it may come just in time for AMD to incorporate it into its high end Navi GPUs, and by 2020 the battle between GDDR and HBM in the mainstream should be heating up.

What are your thoughts on the proposed HBM technologies?

Source: Ars Technica

What dwells in the heart of HoloLens? Now we all know!

Subject: General Tech | August 23, 2016 - 12:40 PM |
Tagged: hololens, microsoft, Tensilica, Cherry Trail, hot chips

Microsoft revealed information about the internals of the new holographic processor used in their HoloLens at Hot Chips, the first peek we have had.  The new headset is another win for Tensilica as they provide the DSP and instruction extensions; previously we have seen them work with VIA to develop an SSD controller and with AMD for TrueAudio solutions.  Each of the 24 cores is hardwired for a different task, offering more efficient processing than software running on flexible hardware.

The processing power for your interface comes from a 14nm Cherry Trail processor with 1GB of DDR and yes, your apps will run on Windows 10.  For now the details are still sparse and there is a lot left to be revealed about Microsoft's answer to VR.  Drop by The Register for more slides and info.

hololens_large.jpg

"The secretive HPU is a custom-designed TSMC-fabricated 28nm coprocessor that has 24 Tensilica DSP cores. It has about 65 million logic gates, 8MB of SRAM, and a layer of 1GB of low-power DDR3 RAM on top, all in a 12mm-by-12mm BGA package. We understand it can perform a trillion calculations a second."

Here is some more Tech News from around the web:

Tech Talk

Source: The Register

A hint of what to come from Hot Chips

Subject: General Tech | August 25, 2015 - 02:57 PM |
Tagged: amd, hot chips, SK Hynix

Thanks to DigiTimes we are getting some information out of Hot Chips about what is coming up from AMD.  As Sebastian just posted we now have a bit more about the R9 Nano and you can bet we will see more in the near future.  They also describe the new HBM developed in partnership with SK Hynix: 4GB of high-bandwidth memory over a 4096-bit interface will offer an impressive 512GB/s of memory bandwidth.  We also know a bit more about the new A-series APUs which will range up to 12 compute cores: four Excavator based CPU cores and eight GCN based GPU cores.  They will also be introducing a new power saving feature called Adaptive Voltage and Frequency Scaling (AVFS) and will support the new H.265 compression standard.  Click on through to DigiTimes or wait for more pictures and documentation to be released from Hot Chips.
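As a sanity check on the units, here is a minimal sketch of how a 4096-bit interface relates to the 512 figure. The 1Gbps per-pin rate is an assumption (the first-generation HBM rate, which the post does not state), and the variable names are illustrative.

```python
# Unit check for the 4096-bit HBM interface; names are illustrative.
INTERFACE_BITS = 4096   # interface width quoted in the post
PIN_GBPS = 1.0          # assumption: first-generation HBM per-pin data rate

total_gbit_per_s = INTERFACE_BITS * PIN_GBPS   # aggregate rate in gigabits per second
total_gbyte_per_s = total_gbit_per_s / 8       # same rate in gigabytes per second

# For comparison: what 512 gigabits per second would be in gigabytes per second.
literal_gbit_reading = 512 / 8

print(f"{total_gbit_per_s:.0f} Gb/s = {total_gbyte_per_s:.0f} GB/s")
print(f"512 Gb/s would be only {literal_gbit_reading:.0f} GB/s")
```

In other words, the 512 figure only lines up with a 4096-bit interface when it is read as gigabytes per second.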

index.jpg

"AMD is showcasing its new high-performance accelerated processing unit (APU), codenamed Carrizo, and the new AMD Radeon R9 Fury family of GPUs, codenamed Fiji, at the annual Hot Chips symposium."

Source: DigiTimes

Report: Leaked Slide From AMD Gives Glimpse of R9 Nano Performance

Subject: Graphics Cards | August 24, 2015 - 02:37 PM |
Tagged: rumor, report, Radeon R9 Nano, R9 290X, leak, hot chips, hbm, amd

A report from German-language tech site Golem contains what appears to be a slide leaked from AMD's GPU presentation at Hot Chips in Cupertino, and the results paint a very efficient picture of the upcoming Radeon R9 Nano GPU.

nano_chart.png

The spelling of "performance" doesn't mean this is fake, does it?

While only managing 3 FPS better than the Radeon R9 290X in this particular benchmark, this result was achieved with 1.9x the performance per watt of the baseline 290X in the test. The article speculates on the possible clock speed of the R9 Nano based on the relative performance, and estimates 850 MHz (which is of course up for debate as no official specs are known).

The most compelling part of the result has to be the ability of the Nano to match or exceed the R9 290X in performance, while only requiring a single 8-pin PCIe connector and needing an average of only 175 watts. With a mini-ITX friendly 15 cm board (5.9 inches) this could be one of the more compelling options for a mini gaming rig going forward.

We have a lot of questions that have yet to be answered of course, including the actual speed of both core and HBM, and just how quiet this air-cooled card might be under load. We shouldn't have to wait much longer!

Source: Golem.de

It's not just Broadwell today, we also have Seattle news

Subject: General Tech | August 11, 2014 - 01:47 PM |
Tagged: amd, seattle, hot chips

AMD has been showing off a reference Seattle-based server at Hot Chips and The Tech Report had an opportunity to see it.  Eight 64-bit Cortex-A57 cores are set up in pairs, each pair sharing 1MB of L2 cache, while the 8MB of L3 cache is accessible by all eight cores as well as the coprocessors, memory controller, and I/O subsystems.  The system can address up to 128GB of DDR3 or DDR4, and you get support for 8 SATA 6Gbps ports and 8 lanes of PCIe 3.0 to apportion between the slots.  There is a secure System Control Processor, a partitioned Cortex-A5 core with its own ROM, RAM, and I/O that handles power, boot and configuration control with support for TrustZone, as well as a Cryptographic Coprocessor which accelerates encryption, as you might well expect.  Read on for more information about AMD's unique new take on server technology.

seattle.png

"For some time now, the features of AMD's Seattle server processor have been painted in broad brush strokes. This morning, at the Hot Chips symposium, AMD is filling in most of the missing details. We were treated to an advance briefing last week, where AMD provided previously confidential information about Seattle's cache network, memory controller, I/O features, and coprocessors."

Come on AMD, spill the beans on Steamroller already

Subject: General Tech | September 6, 2012 - 02:58 PM |
Tagged: vishera, trinity, Steamroller, piledriver, hot chips, bulldozer, amd, Abu Dhabi

You've seen the slides everywhere and read through what Josh could observe and predict from those slides, but at the end of Hot Chips we still know little more about the core everyone is waiting for.  The slides show a core little changed from Bulldozer, which is exactly what we've been expecting as AMD has always described Steamroller as a refined Bulldozer design, improving the existing architecture as opposed to a complete redesign.  SemiAccurate did pull out one little gem from the high density libraries slide which might mean good news for both AMD and consumers.  The 30% decrease in size and power consumption seems to have been implemented by simply using the high density libraries that AMD uses for GPUs.  As this library already exists, AMD didn't need to spend money to develop it; they essentially managed this 30% improvement with a button press, as SemiAccurate put it.  This could well mean that Steamroller will either come out at a comparatively low price or will give AMD higher profit margins ... or a mix of both.

sr_sl05.jpg

"With that in mind, the HDL slide was rather interesting. AMD is claiming that if you rebuild Bulldozer with an HDL library, the resulting chip has a 30% decrease in size and power use. To AMD at least, this is worth a full shrink, but we only buy that claim if it is 30% smaller and 30% less power hungry, not 30% in aggregate. That said, it is a massive gain with just a button press.

AMD should be applauded, or it would have been, but during the keynote, the one thing that kept going through my mind was, “Why didn’t they do this 5 years ago?”. If you can get 30% from changing out a library to the ones you build your GPUs with, didn’t someone test this out before you decided on layout tools?"

Source: SemiAccurate

Fee PHI fo fum; Intel changes the smell of a Pentium

Subject: General Tech | September 5, 2012 - 03:49 PM |
Tagged: Xeon Phi, xeon, larrabee, knights corner, Intel, hot chips

The Register is back with more information from Hot Chips about Intel's Xeon Phi coprocessor, which seems to be much more than just a GPU in drag.  Inside the shell you will find at least 50 cores and at least 8GB of GDDR5 memory, with the cores being very heavily modified Pentium P54C designs built on the 22-nanometer Tri-Gate process and clocked somewhere between 1.2 and 1.6GHz.  There is a brand new Vector Processing Unit which processes 512-bit SIMD instructions and sports an Extended Math Unit to handle calculations in hardware rather than software.  Read on for more details about the high-speed ring interconnects that allow these chips to communicate among themselves and with the Xeon server they will be a part of.

ElReg_intel_xeon_phi_block_diagram.jpg

"Intel has been showing off the performance of the "Knights Corner" x86-based coprocessor for so long that it's easy to forget that it is not yet a product you can actually buy. Back in June, Knights Corner was branded as the "Xeon Phi", making it clear that Phi was a Xeon coprocessor even if it does not bear a lot of resemblance to the Xeon processors at the heart of the vast majority of the world's servers."

Source: The Register

A lot of little Phi coprocessors lightens the load

Subject: General Tech | August 31, 2012 - 02:43 PM |
Tagged: Intel, xeon, Xeon Phi, hot chips, larrabee

The Xeon Phi is not Larrabee but it does give a chance to remind people that Intel did at one time swear we would be seeing huge results from a lot of strung together Pentium chips.  Nor is Many Integrated Cores the same as AMD's Magny-cours, although you can be forgiven if that thought popped into your head.  Instead the Xeon Phi is a co-processor that will have 50 or more cores built around 512-bit SIMD units, each with 512KB of Level 2 cache.  These cores are comparatively slow on their own but have been designed to spread tasks over dozens of cores for parallel processing to make up for the lack of individual power.  Intel sees Phi as a way to create HPC servers which will be physically smaller and more efficient than ones based solely on traditional Xeon processors.  There is still a lot more we need to learn about these chips; until then you can check out The Inquirer's article on Intel's answer to NVIDIA and AMD's HPC cards.
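To put the 512-bit SIMD and cache figures in perspective, here is a minimal sketch of what they add up to across the chip. The core count uses the post's lower bound of 50; the 32-bit and 64-bit element sizes are the usual single and double precision float widths, and the names are illustrative.

```python
# Aggregate figures for a 50-core part; names are illustrative.
CORES = 50             # lower bound quoted in the post ("50 or more")
SIMD_BITS = 512        # SIMD width per core
L2_KB_PER_CORE = 512   # Level 2 cache per core, in KB


def simd_lanes(simd_bits, element_bits):
    """Number of elements one SIMD register holds at a given element size."""
    return simd_bits // element_bits


single_lanes = simd_lanes(SIMD_BITS, 32)   # 32-bit (single precision) elements
double_lanes = simd_lanes(SIMD_BITS, 64)   # 64-bit (double precision) elements
total_single_lanes = single_lanes * CORES
total_l2_mb = CORES * L2_KB_PER_CORE / 1024

print(f"{single_lanes} single or {double_lanes} double precision lanes per core")
print(f"{total_single_lanes} single precision lanes and {total_l2_mb:.0f} MB of L2 across {CORES} cores")
```

Each core on its own handles only 16 single precision values per vector instruction, which is why the design leans on spreading work across dozens of cores.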

Xeon_Phi_PCIe_Card.jpg

"CHIPMAKER Intel revealed some architectural details of its upcoming Xeon Phi accelerator at the Hotchips conference, saying that the chip will feature 512-bit SIMD units."

Source: The Inquirer
Subject: Processors
Manufacturer: AMD

HotChips 2012

 

Ah, the end of August.  School is about to start.  American college football is about to get underway.  Hot Chips is now in full swing.  I guess the end of August caters to all sorts of people.  For the people who are most interested in Hot Chips, the amount of information on next generation CPU architectures is something to really look forward to.  AMD is taking this opportunity to give us a few tantalizing bits of information about their next generation Steamroller core which will be introduced with the codenamed “Kaveri” APU due out in 2013.

sr_sl_intro.jpg

AMD is seemingly on the brink of releasing the latest architectural update with Vishera.  This is a Piledriver+ based CPU that will find its way into AM3+ sockets.  On the server side it is expected that the Abu Dhabi processors will also be released in a late September timeframe.  Trinity was the first example of a Piledriver based product, and it showed markedly improved thermals as compared to previous Bulldozer based products, and featured a nice little bump in IPC in both single and multi-threaded applications.  Vishera and Abu Dhabi look to be Piledriver+, which essentially means that there are a few more tweaks in the design that *should* allow it to go faster per clock than Trinity.  There have been a few performance leaks so far, but nothing that has been concrete (or has shown final production-ready silicon).

Until Vishera and its ilk are released, AMD is teasing us with some Steamroller information.  This presentation is featured at Hot Chips today (August 28).  It is a very general overview of improvements, with very few details given about how AMD is achieving increased performance with this next gen architecture.  So with that, I will dive into what information we have.

Click to read the entire article here.

Hot Chips is coming and IBM has already spilled its beans

Subject: General Tech | August 21, 2012 - 03:27 PM |
Tagged: IBM, power7+, Intel, amd, hot chips

While it doesn't get the news coverage that Intel and AMD's chips do, IBM's Power series has been with us for a while and they seem really excited about the new Power7+ chip that they are about to drop.  They are so excited that they didn't even wait for the Hot Chips conference where many manufacturers will be revealing their new silicon.  The new chip will carry 32MB of L3 cache along with AES and SHA-2 acceleration, with models ranging from a modest 4 cores at 3GHz to a 4GHz 8 core model and a possible 4 core model topping 5GHz, if The Register got their maths right.  Check it all out here, with more likely to come at Hot Chips next week.

elreg_ibm_power_roadmap_circa_2011.jpg

"The Hot Chips 24 conference hosted by Stanford University is next week, and IBM, Oracle, Advanced Micro Devices, Fujitsu, and Intel are expected to talk tech relating to just-announced or impending processors. But Big Blue seems unable to contain its enthusiasm for the Power7+ chip that it will talk about alongside its next-generation zNext processors for its System z mainframes."

Source: The Register