Subject: Graphics Cards, Memory | December 17, 2018 - 04:33 PM | Sebastian Peak
Tagged: Vega, radeon, JESD235, jedec, high bandwidth memory, hbm, DRAM, amd
In a press release today JEDEC has announced an update to the HBM standard, with potential implications for graphics cards utilizing the technology (such as an AMD Radeon Vega 64 successor, perhaps?).
"This update extends the per pin bandwidth to 2.4 Gbps, adds a new footprint option to accommodate the 16 Gb-layer and 12-high configurations for higher density components, and updates the MISR polynomial options for these new configurations."
Original HBM graphic via AMD
The revised spec brings the JEDEC standard up to the level we saw with Samsung's "Aquabolt" HBM2 and its 307.2 GB/s per-stack bandwidth, but with 12-high TSV stacks (up from 8) which raises memory capacity from 8GB to a whopping 24GB per stack.
The full press release from JEDEC follows:
ARLINGTON, Va., USA – DECEMBER 17, 2018 – JEDEC Solid State Technology Association, the global leader in the development of standards for the microelectronics industry, today announced the publication of an update to JESD235 High Bandwidth Memory (HBM) DRAM standard. HBM DRAM is used in Graphics, High Performance Computing, Server, Networking and Client applications where peak bandwidth, bandwidth per watt, and capacity per area are valued metrics to a solution’s success in the market. The standard was developed and updated with support from leading GPU and CPU developers to extend the system bandwidth growth curve beyond levels supported by traditional discrete packaged memory. JESD235B is available for download from the JEDEC website.
JEDEC standard JESD235B for HBM leverages Wide I/O and TSV technologies to support densities up to 24 GB per device at speeds up to 307 GB/s. This bandwidth is delivered across a 1024-bit wide device interface that is divided into 8 independent channels on each DRAM stack. The standard can support 2-high, 4-high, 8-high, and 12-high TSV stacks of DRAM at full bandwidth to allow systems flexibility on capacity requirements from 1 GB – 24 GB per stack.
This update extends the per pin bandwidth to 2.4 Gbps, adds a new footprint option to accommodate the 16 Gb-layer and 12-high configurations for higher density components, and updates the MISR polynomial options for these new configurations. Additional clarifications are provided throughout the document to address test features and compatibility across generations of HBM components.
Subject: General Tech, Graphics Cards | December 4, 2017 - 05:47 PM | Tim Verry
Tagged: navi, HBM2, hbm, gddr6, amd
WCCFTech reports that AMD is working on a GDDR6 memory controller for its upcoming graphics cards. Starting with an AMD Technical Engineer listing GDDR6 on his portfolio, the site claims to have verified through sources familiar with the matter that AMD is, in fact, supporting the new graphics memory standard and will be using their own controller to support it (rather than licensing one).
AMD is not abandoning HBM2 memory though. The company is sticking to its previously released roadmaps and Navi will still utilize HBM2 memory – at least on the high-end SKUs. While AMD has so far only released RX Vega 64 and RX Vega 56 graphics cards, the company may well release lower-end Vega-based cards with GDDR5 at some point although for now the Polaris architecture is handling the lower end. AMD supporting GDDR6 is a good thing and should enable cheaper mid-range cards that are not limited by supply shortages of the more expensive (albeit much higher bandwidth) High Bandwidth Memory that have seemingly plagues both NVIDIA and AMD at various points in time. GDDR6 further offers several advantages over GDDR5 with almost twice the speed (9 Gbps versus 16 Gbps) at lower power (1.5V versus 1.35V) and more density and underlying technology optimizations than even GDDR5X. While the G5X memory is capable of hitting the same 16 Gbps launch speeds of GDDR6, the newer memory technology offers up to 32Gb dies* versus 16Gb and a two channel design (which ends up being a bit more efficient and easier to produce / for GPU manufacturers to wire up). GDDR6 will represent a nice speed bump for mid-range cards (very low end may well stick with GDDR5 save for mobile parts which could benefit from the lower power GDDR6) while letting AMD have a bit better profit margins on these lower end margin SKUs and being able to produce more cards to satisfy demand. HBM2 is nice to have but it is more well suited for the compute-oriented cards for workstation and data center usage rather than gaming right now and GDDR6 can offer more price-to-performance for the consumer gaming cards.
As for the question of why AMD would want to design their own GDDR6 memory controller rather than license one, I think that comes down to AMD thinking long-term. It will be more expensive up front to design their own controller, but AMD will be able to more fully integrate it and tune it to work with their graphics cards such that it can be more power efficient. Also, having their own GDDR6 memory controller means they can use it in other areas such as their APUs and SoCs offered through their Semi Custom Business Unit (e.g. the SoCs used in gaming consoles). Being able to offer that controller to other companies in their semi-custom SoCs free of third party licensing fees is a good thing for AMD.
With GDDR6 becoming readily available early next year, there is a good chance AMD will be ready to use the new memory technology as soon as Navi but likely not until closer to the end of 2018 or early 2019 when AMD launches new lower and mid-range gaming cards (consumer-level) based on Navi and/or Vega.
*At launch it appears that GDDR6 from the big three (Micron, Samsung, and SK Hynix) will use 16Gb dies, but the standard allows for up to 32Gb dies. The G5X standard allows for up to 16Gb dies.
- (Leak) AMD Vega 10 and Vega 20 Information Leaked
- Micron Pushes GDDR5X To 16Gbps, Expects To Launch GDDR6 In Early 2018
- Micron Planning To Launch GDDR6 Graphics Memory In 2017
- Podcast #436 - ECS Mini-STX, NVIDIA Quadro, AMD Zen Arch, Optane, GDDR6 and more!
- AMD Q3 2017 Earnings: A Pleasant Surprise
Subject: Memory | August 25, 2016 - 02:39 AM | Tim Verry
Tagged: TSV, SK Hynix, Samsung, hot chips, hbm3, hbm
Samsung and SK Hynix were in attendance at the Hot Chips Symposium in Cupertino, California to (among other things) talk about the future of High Bandwidth Memory (HBM). In fact, the companies are working on two new HBM products: HBM3 and an as-yet-unbranded "low cost HBM." HBM3 will replace HBM2 at the high end and is aimed at the HPC and "prosumer" markets while the low cost HBM technology lowers the barrier to entry and is intended to be used in mainstream consumer products.
As currently planned, HBM3 (Samsung refers to its implementation as Extreme HBM) features double the density per layer and at least double the bandwidth of the current HBM2 (which so far is only used in NVIDIA's planned Tesla P100). Specifically, the new memory technology offers up 16Gb (~2GB) per layer and as many as eight (or more) layers can be stacked together using TSVs into a single chip. So far we have seen GPUs use four HBM chips on a single package, and if that holds true with HBM3 and interposer size limits, we may well see future graphics cards with 64GB of memory! Considering the HBM2-based Tesla will have 16 and AMD's HBM-based Fury X cards had 4GB, HBM3 is a sizable jump!
Capacity is not the only benefit though. HBM3 doubles the bandwidth versus HBM2 with 512GB/s (or more) of peak bandwidth per stack! In the theoretical example of a graphics card with 64GB of HBM3 (four stacks), that would be in the range of 2 TB/s of theoretical maximum peak bandwidth! Real world may be less, but still that is many terabytes per second of bandwidth which is exciting because it opens a lot of possibilities for gaming especially as developers push graphics further towards photo realism and resolutions keep increasing. HBM3 should be plenty for awhile as far as keeping the GPU fed with data on the consumer and gaming side of things though I'm sure the HPC market will still crave more bandwidth.
Samsung further claims that HBM3 will operate at similar (~500MHz) clocks to HBM2, but will use "much less" core voltage (HBM2 is 1.2V).
Stacked HBM memory on an interposer surrounding a processor. Upcoming HBM technologies will allow memory stacks with double the number of layers.
HBM3 is perhaps the most interesting technologically; however, the "low cost HBM" is exciting in that it will enable HBM to be used in the systems and graphics cards most people purchase. There were less details available on this new lower cost variant, but Samsung did share a few specifics. The low cost HBM will offer up to 200GB/s per stack of peak bandwidth while being much cheaper to produce than current HBM2. In order to reduce the cost of production, their is no buffer die or ECC support and the number of Through Silicon Vias (TSV) connections have been reduced. In order to compensate for the lower number of TSVs, the pin speed has been increased to 3Gbps (versus 2Gbps on HBM2). Interestingly, Samsung would like for low cost HBM to support traditional silicon as well as potentially cheaper organic interposers. According to NVIDIA, TSV formation is the most expensive part of interposer fabrication, so making reductions there (and somewhat making up for it in increased per-connection speeds) makes sense when it comes to a cost-conscious product. It is unclear whether organic interposers will win out here, but it is nice to seem them get a mention and is an alternative worth looking into.
Both high bandwidth and low latency memory technologies are still years away and the designs are subject to change, but so far they are both plans are looking rather promising. I am intrigued by the possibilities and hope to see new products take advantage of the increased performance (and in the latter case lower cost). On the graphics front, HBM3 is way too far out to see a Vega release, but it may come just in time for AMD to incorporate it into its high end Navi GPUs, and by 2020 the battle between GDDR and HBM in the mainstream should be heating up.
What are your thoughts on the proposed HBM technologies?
Subject: Graphics Cards | April 5, 2016 - 02:13 AM | Tim Verry
Tagged: HPC, hbm, gpgpu, firepro s9300x2, firepro, dual fiji, deep learning, big data, amd
Earlier this month AMD launched a dual Fiji powerhouse for VR gamers it is calling the Radeon Pro Duo. Now, AMD is bringing its latest GCN architecture and HBM memory to servers with the dual GPU FirePro S9300 x2.
The new server-bound professional graphics card packs an impressive amount of computing hardware into a dual-slot card with passive cooling. The FirePro S9300 x2 combines two full Fiji GPUs clocked at 850 MHz for a total of 8,192 cores, 512 TUs, and 128 ROPs. Each GPU is paired with 4GB of non-ECC HBM memory on package with 512GB/s of memory bandwidth which AMD combines to advertise this as the first professional graphics card with 1TB/s of memory bandwidth.
Due to lower clockspeeds the S9300 x2 has less peak single precision compute performance versus the consumer Radeon Pro Duo at 13.9 TFLOPS versus 16 TFLOPs on the desktop card. Businesses will be able to cram more cards into their rack mounted servers though since they do not need to worry about mounting locations for the sealed loop water cooling of the Radeon card.
|FirePro S9300 x2||Radeon Pro Duo||R9 Fury X||FirePro S9170|
|GPU||Dual Fiji||Dual Fiji||Fiji||Hawaii|
|GPU Cores||8192 (2 x 4096)||8192 (2 x 4096)||4096||2816|
|Rated Clock||850 MHz||1050 MHz||1050 MHz||930 MHz|
|Texture Units||2 x 256||2 x 256||256||176|
|ROP Units||2 x 64||2 x 64||64||64|
|Memory||8GB (2 x 4GB)||8GB (2 x 4GB)||4GB||32GB ECC|
|Memory Clock||500 MHz||500 MHz||500 MHz||5000 MHz|
|Memory Interface||4096-bit (HBM) per GPU||4096-bit (HBM) per GPU||4096-bit (HBM)||512-bit|
|Memory Bandwidth||1TB/s (2 x 512GB/s)||1TB/s (2 x 512GB/s)||512 GB/s||320 GB/s|
|TDP||300 watts||?||275 watts||275 watts|
|Peak Compute||13.9 TFLOPS||16 TFLOPS||8.60 TFLOPS||5.24 TFLOPS|
AMD is aiming this card at datacenter and HPC users working on "big data" tasks that do not require the accuracy of double precision floating point calculations. Deep learning tasks, seismic processing, and data analytics are all examples AMD says the dual GPU card will excel at. These are all tasks that can be greatly accelerated by the massive parallel nature of a GPU but do not need to be as precise as stricter mathematics, modeling, and simulation work that depend on FP64 performance. In that respect, the FirePro S9300 x2 has only 870 GLFOPS of double precision compute performance.
Further, this card supports a GPGPU optimized Linux driver stack called GPUOpen and developers can program for it using either OpenCL (it supports OpenCL 1.2) or C++. AMD PowerTune, and the return of FP16 support are also features. AMD claims that its new dual GPU card is twice as fast as the NVIDIA Tesla M40 (1.6x the K80) and 12 times as fast as the latest Intel Xeon E5 in peak single precision floating point performance.
The double slot card is powered by two PCI-E power connectors and is rated at 300 watts. This is a bit more palatable than the triple 8-pin needed for the Radeon Pro Duo!
The FirePro S9300 x2 comes with a 3 year warranty and will be available in the second half of this year for $6000 USD. You are definitely paying a premium for the professional certifications and support. Here's hoping developers come up with some cool uses for the dual 8.9 Billion transistor GPUs and their included HBM memory!
Some Hints as to What Comes Next
On March 14 at the Capsaicin event at GDC AMD disclosed their roadmap for GPU architectures through 2018. There were two new names in attendance as well as some hints at what technology will be implemented in these products. It was only one slide, but some interesting information can be inferred from what we have seen and what was said in the event and afterwards during interviews.
Polaris the the next generation of GCN products from AMD that have been shown off for the past few months. Previously in December and at CES we saw the Polaris 11 GPU on display. Very little is known about this product except that it is small and extremely power efficient. Last night we saw the Polaris 10 being run and we only know that it is competitive with current mainstream performance and is larger than the Polaris 11. These products are purportedly based on Samsung/GLOBALFOUNDRIES 14nm LPP.
The source of near endless speculation online.
In the slide AMD showed it listed Polaris as having 2.5X the performance per watt over the previous 28 nm products in AMD’s lineup. This is impressive, but not terribly surprising. AMD and NVIDIA both skipped the 20 nm planar node because it just did not offer up the type of performance and scaling to make sense economically. Simply put, the expense was not worth the results in terms of die size improvements and more importantly power scaling. 20 nm planar just could not offer the type of performance overall that GPU manufacturers could achieve with 2nd and 3rd generation 28nm processes.
What was missing from the slide is mention that Polaris will integrate either HMB1 or HBM2. Vega, the architecture after Polaris, does in fact list HBM2 as the memory technology it will be packaged with. It promises another tick up in terms of performance per watt, but that is going to come more from aggressive design optimizations and likely improvements on FinFET process technologies. Vega will be a 2017 product.
Beyond that we see Navi. It again boasts an improvement in perf per watt as well as the inclusion of a new memory technology behind HBM. Current conjecture is that this could be HMC (hybrid memory cube). I am not entirely certain of that particular conjecture as it does not necessarily improve upon the advantages of current generation HBM and upcoming HBM2 implementations. Navi will not show up until 2018 at the earliest. This *could* be a 10 nm part, but considering the struggle that the industry has had getting to 14/16nm FinFET I am not holding my breath.
AMD provided few details about these products other than what we see here. From here on out is conjecture based upon industry trends, analysis of known roadmaps, and the limitations of the process and memory technologies that are already well known.
Subject: Graphics Cards | March 15, 2016 - 02:02 AM | Ryan Shrout
Tagged: vulkan, raja koduri, Polaris, HBM2, hbm, dx12, crossfire, amd
After hosting the AMD Capsaicin event at GDC tonight, the SVP and Chief Architect of the Radeon Technologies Group Raja Koduri sat down with me to talk about the event and offered up some additional details on the Radeon Pro Duo, upcoming Polaris GPUs and more. The video below has the full interview but there are several highlights that stand out as noteworthy.
- Raja claimed that one of the reasons to launch the dual-Fiji card as the Radeon Pro Duo for developers rather than pure Radeon, aimed at gamers, was to “get past CrossFire.” He believes we are at an inflection point with APIs. Where previously you would abstract two GPUs to appear as a single to the game engine, with DX12 and Vulkan the problem is more complex than that as we have seen in testing with early titles like Ashes of the Singularity.
But with the dual-Fiji product mostly developed and prepared, AMD was able to find a market between the enthusiast and the creator to target, and thus the Radeon Pro branding was born.
Raja further expands on it, telling me that in order to make multi-GPU useful and productive for the next generation of APIs, getting multi-GPU hardware solutions in the hands of developers is crucial. He admitted that CrossFire in the past has had performance scaling concerns and compatibility issues, and that getting multi-GPU correct from the ground floor here is crucial.
- With changes in Moore’s Law and the realities of process technology and processor construction, multi-GPU is going to be more important for the entire product stack, not just the extreme enthusiast crowd. Why? Because realities are dictating that GPU vendors build smaller, more power efficient GPUs, and to scale performance overall, multi-GPU solutions need to be efficient and plentiful. The “economics of the smaller die” are much better for AMD (and we assume NVIDIA) and by 2017-2019, this is the reality and will be how graphics performance will scale.
Getting the software ecosystem going now is going to be crucial to ease into that standard.
- The naming scheme of Polaris (10, 11…) has no equation, it’s just “a sequence of numbers” and we should only expect it to increase going forward. The next Polaris chip will be bigger than 11, that’s the secret he gave us.
There have been concerns that AMD was only going to go for the mainstream gaming market with Polaris but Raja promised me and our readers that we “would be really really pleased.” We expect to see Polaris-based GPUs across the entire performance stack.
- AMD’s primary goal here is to get many millions of gamers VR-ready, though getting the enthusiasts “that last millisecond” is still a goal and it will happen from Radeon.
- No solid date on Polaris parts at all – I tried! (Other than the launches start in June.) Though Raja did promise that after tonight, he will only have his next alcoholic beverage until the launch of Polaris. Serious commitment!
- Curious about the HBM2 inclusion in Vega on the roadmap and what that means for Polaris? Though he didn’t say it outright, it appears that Polaris will be using HBM1, leaving me to wonder about the memory capacity limitations inherent in that. Has AMD found a way to get past the 4GB barrier? We are trying to figure that out for sure.
Why is Polaris going to use HBM1? Raja pointed towards the extreme cost and expense of building the HBM ecosystem prepping the pipeline for the new memory technology as the culprit and AMD obviously wants to recoup some of that cost with another generation of GPU usage.
Speaking with Raja is always interesting and the confidence and knowledge he showcases is still what gives me assurance that the Radeon Technologies Group is headed in the correct direction. This is going to be a very interesting year for graphics, PC gaming and for GPU technologies, as showcased throughout the Capsaicin event, and I think everyone should be looking forward do it.
Fighting for Relevance
AMD is still kicking. While the results of this past year have been forgettable, they have overcome some significant hurdles and look like they are improving their position in terms of cutting costs while extracting as much revenue as possible. There were plenty of ups and downs for this past quarter, but when compared to the rest of 2015 there were some solid steps forward here.
The company reported revenues of $958 million, which is down from $1.06 billion last quarter. The company also recorded a $103 million loss, but that is down significantly from the $197 million loss the quarter before. Q3 did have a $65 million write-down due to unsold inventory. Though the company made far less in revenues, they also shored up their losses. The company is still bleeding, but they still have plenty of cash on hand for the next several quarters to survive. When we talk about non-GAAP figures, AMD reports a $79 million loss for this past quarter.
For the entire year AMD recorded $3.99 billion in revenue with a net loss of $660 million. This is down from FY 2014 revenues of $5.51 billion and a net loss of $403 million. AMD certainly is trending downwards year over year, but they are hoping to reverse that come 2H 2016.
Graphics continues to be solid for AMD as they increased their sales from last quarter, but are down year on year. Holiday sales were brisk, but with only the high end Fury series being a new card during this season, the impact of that particular part was not as great as compared to the company having a new mid-range series like the newly introduced R9 380X. The second half of 2016 will see the introduction of the Polaris based GPUs for both mobile and desktop applications. Until then, AMD will continue to provide the current 28 nm lineup of GPUs to the market. At this point we are under the assumption that AMD and NVIDIA are looking at the same timeframe for introducing their next generation parts due to process technology advances. AMD already has working samples on Samsung’s/GLOBALFOUNDRIES 14nm LPP (low power plus) that they showed off at CES 2016.
Subject: Graphics Cards, Memory | January 19, 2016 - 11:01 PM | Scott Michaud
Tagged: Samsung, HBM2, hbm
Samsung has just announced that they have begun mass production of 4GB HBM2 memory modules. When used on GPUs, four packages can provide 16GB of Video RAM with very high performance. They do this with a very wide data bus, which trade off frequency for transferring huge chunks. Samsung's offering is rated at 256 GB/s per package, which is twice what the Fury X could do with HBM1.
They also expect to mass produce 8GB HBM2 packages within this calendar year. I'm guessing that this means we'll see 32GB GPUs in the late-2016 or early-2017 time frame unless "within this year" means very, very soon (versus Q3/Q4). They will likely be for workstation or professional cards, but, in NVIDIA's case, those are usually based on architectures that are marketed to high-end gaming enthusiasts through some Titan offering. There's a lot of ways this could go, but a 32GB Titan seems like a bit much; I wouldn't expect that this affects the enthusiast gamer segment. It might mean that professionals looking to upgrade from the Kepler-based Tesla K-series might be waiting a little longer, maybe even GTC 2017. Alternatively, they might get new cards, just with a 16GB maximum until a refresh next year. There's not enough information to know one way or the other, but it's something to think about when more of it starts rolling in.
Samsung's HBM2 are compatible with ECC, although I believe that was also true for at least some HBM1 modules from SK Hynix.
Subject: Graphics Cards | January 11, 2016 - 06:05 PM | Sebastian Peak
Tagged: rumor, report, pascal, nvidia, HBM2, hbm, GP104
A delivery of GPUs and related test equipment from Taiwan to Banglore has led to speculation about NVIDIA's upcoming GP104 Pascal GPU.
Image via Zauba.com
How much information can be gleaned from an import shipping manifest (linked here)? The data indicates a chip with a 37.5 x 37.5 mm package and 2152 pins, which is being attributed to the GP104 based on knowledge of “earlier, similar deliveries” (or possible inside information). This has prompted members of the 3dcenter.org forums (German language) to speculate on the use of GDDR5 or GDDR5X memory based on the likelihood of HBM being implemented on a die of this size.
Of course, NVIDIA has stated that Pascal will implement 3D memory, and the upcoming GP100 will reportedly be on a 55 x 55 mm package using HBM2. Could this be a new, lower-cost part using the existing GDDR5 standard or the faster GDDR5X instead? VideoCardz and WCCFtech have posted stories based on the 3DCenter report, and to quote directly from the VideoCardz post on the subject:
"3DCenter has a theory that GP104 could actually not use HBM, but GDDR5(X) instead. This would rather be a very strange decision, but could NVIDIA possibly make smaller GPU (than GM204) and still accommodate 4 HBM modules? This theory is not taken from the thin air. The GP100 aka the Big Pascal, would supposedly come in 55x55mm BGA package. That’s 10mm more than GM200, which were probably required for additional HBM modules. Of course those numbers are for the whole package (with interposer), not just the GPU."
All of this is a lot to take from a shipping record that might not even be related to an NVIDIA product, but the report has made the rounds at this point so now we’ll just have to wait for new information.
Subject: Graphics Cards | November 18, 2015 - 01:22 PM | Sebastian Peak
Tagged: Sapphire TriXX, R9 Fury X, overclocking, hbm, amd
The new version (5.2.1) of Sapphire's TriXX overclocking utility has been released, and it finally unlocks voltage and HBM overclocking for AMD's R9 Fury X.
(Image credit: Sapphire)
Previously the voltage of the R9 Fury X core was not adjustable, leaving what would seem to be quite a bit of untapped headroom for the cards which shipped with a powerful liquid-cooling solution rated for 500 watts of thermal dissipation. This should allow for much better results than what Ryan was able to achieve when he attempted overclocking for our review of the R9 Fury X in June (without the benefit of voltage adjustments):
"My net result: a clock speed of 1155 MHz rather than 1050 MHz, an increase of 10%. That's a decent overclock for a first attempt with a brand new card and new architecture, but from the way that AMD had built up the "500 watt cooler" and the "375 watts available power" from the dual 8-pin power connectors, I was honestly expecting quite a bit more. Hopefully we'll see some community adjustments, like voltage modifications, that we can mess around with later..."
(Image credit: Sapphire)
- You can download the new TriXX v5.2.1 software from the product page at Sapphire here.
Will TriXX v5.2.1 unleash the full potential of the Fury X? We will have to wait for some overclocked benchmark numbers, but having the ability can only be a good thing for enthusiasts.