UPDATE (August 22nd, 11:11pm ET): I reached out to GlobalFoundries over the weekend for a comment and the company had this to say:
"We would like to confirm that GF is transitioning directly from 14nm to 7nm. We consider 10nm as more a half node in scaling, due to its limited performance adder over 14nm for most applications. For most customers in most of the markets, 7nm appears to be a more favorable financial equation. It offers a much larger economic benefit, as well as performance and power advantages, that in most cases balances the design cost a customer would have to spend to move to the next node.
As you stated in your article, we will be leveraging our presence at SUNY Polytechnic in Albany, the talent and know-how gained from the acquisition of IBM Microelectronics, and the world-class R&D pipeline from the IBM Research Alliance—which last year produced the industry’s first 7nm test chip with working transistors."
An unexpected bit of news popped up today via TPU alleging that GlobalFoundries is not only developing 7nm technology (expected), but that the company will skip production of the 10nm node altogether in favor of jumping straight from the 14nm FinFET technology (which it licensed from Samsung) to 7nm manufacturing based on its own in-house process.
Reportedly, the move to 7nm would offer 60% smaller chips at three times the design cost of 14nm, which is to say that this would be both an expensive and impressive endeavor. Aided by Extreme Ultraviolet (EUV) lithography, GlobalFoundries expects to be able to hit 7nm production sometime in 2020, with prototyping and limited use of EUV in the year or so leading up to it. The in-house process tech is likely thanks to the research being done at the APPC (Advanced Patterning and Productivity Center) in Albany, New York, along with the expertise of engineers and the design patents and technology (e.g. ASML NXE 3300 and 3300B EUV) purchased from IBM when it acquired IBM Microelectronics. The APPC is reportedly working simultaneously on research and development of manufacturing methods (especially EUV, where extremely short wavelengths of ultraviolet light (around 13.5nm) are used to etch patterns into silicon) and supporting production of chips at GlobalFoundries' "Malta" fab in New York.
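As a rough back-of-the-envelope reading of those reported figures: a chip that is 60% smaller takes up 0.4x the area, so the same amount of silicon holds about 2.5x as many of them, while the one-time design cost triples. The sketch below only does that arithmetic. It ignores yield, wafer edge loss, and any difference in wafer cost (all of which matter in practice), and the function names are made up for illustration.

```python
def dies_per_area_gain(shrink_fraction: float) -> float:
    """Factor by which the die count per unit of silicon area grows.

    shrink_fraction is how much smaller the die gets (0.60 = 60% smaller).
    Ignores wafer edge loss, scribe lines, and yield.
    """
    if not 0.0 <= shrink_fraction < 1.0:
        raise ValueError("shrink_fraction must be in [0, 1)")
    return 1.0 / (1.0 - shrink_fraction)


def design_cost_per_chip_ratio(design_cost_multiplier: float,
                               volume_multiplier: float) -> float:
    """Design cost per chip relative to the old node.

    Design cost is a fixed cost spread over every chip sold, so selling
    more chips dilutes it.
    """
    if volume_multiplier <= 0:
        raise ValueError("volume_multiplier must be positive")
    return design_cost_multiplier / volume_multiplier


if __name__ == "__main__":
    gain = dies_per_area_gain(0.60)
    print(f"dies per unit area: {gain:.2f}x")
    # Same sales volume as before: each chip carries 3x the design cost.
    print(f"design cost per chip: {design_cost_per_chip_ratio(3.0, 1.0):.2f}x")
```

In other words, under these simplifications a customer would need roughly three times the unit volume before the design cost per chip falls back to the 14nm level.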
Advanced Patterning and Productivity Center in Albany, NY, where GlobalFoundries, SUNY Poly, IBM engineers, and other partners are forging a path to semiconductor manufacturing at 7nm and beyond. Photo by Lori Van Buren for Times Union.
Intel's Custom Foundry Group will start pumping out ARM chips in early 2017, followed by Intel's own 10nm Cannon Lake processors in 2018, and Samsung will be offering up its own 10nm node as soon as next year. Meanwhile, TSMC has reportedly already taped out 10nm designs and will begin production in late 2016/early 2017, and claims that it will hit 5nm by 2020. With its rivals all expecting production of 10nm chips as soon as Q1 2017, GlobalFoundries will be at a distinct disadvantage for a few years and will have only its 14nm FinFET (from Samsung) and possibly its own 14nm tech to offer until it gets 7nm production up and running (hopefully!).
Previously, GlobalFoundries has stated that:
“GLOBALFOUNDRIES is committed to an aggressive research roadmap that continually pushes the limits of semiconductor technology. With the recent acquisition of IBM Microelectronics, GLOBALFOUNDRIES has gained direct access to IBM’s continued investment in world-class semiconductor research and has significantly enhanced its ability to develop leading-edge technologies,” said Dr. Gary Patton, CTO and Senior Vice President of R&D at GLOBALFOUNDRIES. “Together with SUNY Poly, the new center will improve our capabilities and position us to advance our process geometries at 7nm and beyond.”
If this news turns out to be correct, this is an interesting move and it is certainly a gamble. However, I think that it is a gamble that GlobalFoundries needs to take to be competitive. I am curious how this will affect AMD though. While I had expected AMD to stick with 14nm for a while, especially for Zen/CPUs, will this mean that AMD will have to go to TSMC for its future GPUs, or will contract limitations (if any? I think they have a minimum amount they need to order from GlobalFoundries) mean that GPUs will remain at 14nm until GlobalFoundries can offer its own 7nm? I would guess that Vega will still be 14nm, but Navi in 2018/2019? I guess we will just have to wait and see!
Also read:
- To 7nm And Beyond (Interview @ Semiconductor Engineering)
- GloFo Looks For 7nm Leadership @ Electronics Weekly
- GlobalFoundries develops 7nm and 10nm technologies in-house @ KitGuru
- SUNY Poly and GLOBALFOUNDRIES Announce New $500M R&D Program in Albany To Accelerate Next Generation Chip Technology @ GlobalFoundries (PR)
- AMD GPU Roadmap: Capsaicin Names Upcoming Architectures @ PC Perspective
- Next Gen Graphics and Process Migration: 20 nm and Beyond @ PC Perspective
Are GF fab sizes the same as Intel’s though, or do they exaggerate? I think the latter.
There is still debate as to whether 5nm is feasible.
http://spectrum.ieee.org/semiconductors/devices/the-status-of-moores-law-its-complicated
A lot of node naming is total BS. That article explains why.
You can’t really go by a single number when it comes to ANY computer performance either. FLOPS is not enough, and when it comes to supercomputers this is especially true. HPL is not enough. HPCG and HPL, and the ratio of HPCG performance to HPL performance, give you a decent idea of a system’s performance.
When assessing a node, however many nanometers they pretty much arbitrarily decide to name it is not a real indicator of anything at all though.
Some guys on forums thought that AMD being 14nm meant it would be “better” than NVidia at 16nm. I tried to tell them that there is no correlation, and that the time spent optimizing the process is now the indicator of quality (read: you know how it performs once it is running).
That’s a difference in architecture NOT process. The Apple A9 chip in 14nm Samsung form is a smaller die and has slightly lower power than the same chip made on TSMC 16nm.
If you calculate the gate pitch (the distance between gates) and other gate geometry features, one 14nm process node can be different from another 14nm process node. But what gives a process its 14nm or 16nm name is its actual gate size, and it is the total gate geometry in three dimensions that lets a circuit use less power and perform better. So a 14nm gate on a process with a larger pitch is still the same 14nm gate as on a process with a smaller gate pitch; the only difference is that the one with the smaller pitch is going to be more densely packed, with more circuits per unit area.
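To put a number on the pitch argument, one crude density proxy is the inverse of gate pitch times minimum metal pitch: the smaller that product, the more transistors fit in a given area. This is only a sketch. The pitch values below are illustrative placeholders, not vendor specifications, and real density also depends on cell height, design rules, and the libraries used.

```python
def relative_density(gate_pitch_nm: float, metal_pitch_nm: float) -> float:
    """Crude density proxy: inverse of the gate pitch x metal pitch product.

    The absolute value is meaningless; only ratios between two processes
    are useful.
    """
    if gate_pitch_nm <= 0 or metal_pitch_nm <= 0:
        raise ValueError("pitches must be positive")
    return 1.0 / (gate_pitch_nm * metal_pitch_nm)


# Two hypothetical processes with similar node names but different pitches.
# These numbers are placeholders chosen for illustration.
loose_pitch = relative_density(gate_pitch_nm=90.0, metal_pitch_nm=64.0)
tight_pitch = relative_density(gate_pitch_nm=70.0, metal_pitch_nm=52.0)

density_ratio = tight_pitch / loose_pitch

if __name__ == "__main__":
    print(f"tighter-pitch process packs about {density_ratio:.2f}x the circuits per area")
```

With those placeholder pitches the tighter process comes out at roughly 1.6x the density, even though both could be sold under nearly the same nanometer label.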
Apple’s 14nm GF/Samsung process gate size is smaller than TSMC’s 16nm gate size, so the power usage on the 14nm process is going to be less, and Intel’s 14nm process is in its second-generation FinFET stage, with taller fins and a better overall gate contact surface area for better circuit features than the others’ 14nm processes. The circuit doping recipes are different too, so the proportions of elements used may also differ slightly from process to process.
The actual (mostly automated) design libraries for any processor can also be different, with some processors like GPUs made using high density design libraries, with the metal traces packed closer together on more layers, allowing the circuits to be more densely packed. But for GPUs this comes at the cost of only being able to run at lower clock rates relative to CPUs, which are laid out using low density design libraries that allow for less circuit density but higher clock speeds.
AMD did some creative engineering with its initial Carrizo CPU cores, which were only going to be used in mobile parts and clocked lower anyways. AMD did this by using the GPU style high density design libraries in the layout of Carrizo’s cores at 28nm, and this allowed for a 30% space savings for each Carrizo core while still using that very mature 28nm process node. So the Carrizo APU had more space available for more GPU/other resources, and AMD got more circuits on that 28nm process without having to have a process node shrink.
The good thing about this for AMD is that maybe for a Zen Mobile SKU that will not be clocked as high as a desktop/server Zen part, AMD could still utilize the GPU style high density design libraries on a 14nm Zen mobile/lite core and get 30% space savings on top of the 14nm process node shrink space savings, and get a low power/mobile Zen based APU with even more GPU/ACE units while having the whole part more densely packed with circuits. Mobile parts are going to be clocked lower anyways, so why waste space using low density design libraries if the clocks are going to be lower to begin with?
“Some guys on forums thought that AMD being 14nm meant it would be “better” than NVidia at 16nm”
AMD’s 14nm GF/Samsung process is not as mature as TSMC’s 16nm process, and besides that AMD’s GPUs at 14nm have more compute so that is going to be the reason for the higher power usage metrics, and not so much related to the 14nm process node being able to use less power than 16nm. 14nm gates use less power to be switched, but if the processor has many more transistors in active use for compute as AMD’s GCN GPUs do then it’s AMD’s overall GPU processor design and not the 14nm GF/Samsung process that must be taken into account for the power usage metric. I like that AMD designs their consumer GPU SKUs to have more compute relative to Nvidia, as Nvidia’s consumer GPU SKUs are not as good for some compute workloads, and Nvidia users will pay through the nose to get that extra compute by being forced to purchase Nvidia’s very very costly pro GPUs to get more compute.
P.S. Nvidia has to overclock their GPUs to get the same numbers of FLOPS relative to AMD’s lower clocked Polaris parts, so AMD’s Polaris SKUs have more transistors dedicated to compute so that is where the power usage comes from. You can get 2 RX 480 for around $500 and their compute will match a $1200 Titan X(Pascal) SKU’s total flops provided the Titan X is not using boost mode. Also if the 2 RX 480s are overclocked then they will match the Titan X(Pascal) boost mode even though the 2 RX480s are clocked lower relative to any Nvidia Pascal SKU!
AMD gives more compute circuits for the dollar than Nvidia does, and AMD does this on their consumer SKUs, so AMD’s customers who use GPUs for other compute uses get a much better deal. GPUs are not all about gaming, but even for gaming that extra GCN compute is going to be used with DX12/Vulkan optimized games, and VR games!
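For anyone who wants to check the FLOPS comparison, peak single-precision throughput is usually estimated as 2 x shader count x clock, since each shader can issue one fused multiply-add (two floating point operations) per cycle. The sketch below uses reference spec-sheet shader counts and clocks quoted from memory, so treat them as assumptions, and remember that this is theoretical peak only: it says nothing about how well two cards scale or how much of that peak a real workload reaches.

```python
def peak_fp32_tflops(shader_count: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 TFLOPS, counting one FMA (2 FLOPs) per shader per cycle."""
    return 2 * shader_count * clock_mhz * 1e6 / 1e12


# Reference specs as assumed here (shader count, base MHz, boost MHz).
RX_480 = {"shaders": 2304, "base_mhz": 1120, "boost_mhz": 1266}
TITAN_X_PASCAL = {"shaders": 3584, "base_mhz": 1417, "boost_mhz": 1531}

if __name__ == "__main__":
    for clock in ("base_mhz", "boost_mhz"):
        dual_rx480 = 2 * peak_fp32_tflops(RX_480["shaders"], RX_480[clock])
        titan = peak_fp32_tflops(TITAN_X_PASCAL["shaders"], TITAN_X_PASCAL[clock])
        print(f"{clock}: 2x RX 480 = {dual_rx480:.2f} TFLOPS, "
              f"Titan X (Pascal) = {titan:.2f} TFLOPS")
```

Under those assumed clocks the two cards land in the same ballpark on paper, which is the comparison being made here; whether that paper number shows up in a given application is a separate question.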
Is that peak or max FLOPS? FLOPS has long been a useful performance indicator, but it really isn’t anymore. This is especially true for gaming and supercomputers.
As systems get larger, having tons of FLOPS, even max FLOPS and not just peak theoretical, means less and less.
Memory and interconnects are the main bottlenecks in large-scale supercomputers. Take the K computer for example. It is “only” 10 PFLOPS or so, but it still beats the 100 PFLOPS TaihuLight system and is #1 on Graph500.
http://www.graph500.org/
Why? Its architecture is far better, despite being 5 years old. It has 93% computational efficiency, and one of the highest HPCG-to-HPL ratios.
http://www.nextplatform.com/2016/06/21/measuring-top-supercomputer-performance-real-world/ see the graph and its explanation.
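The two figures of merit mentioned here are simple ratios, sketched below. Computational efficiency is measured HPL performance (Rmax) divided by theoretical peak (Rpeak), and the HPCG-to-HPL ratio is measured HPCG performance divided by Rmax. The K computer Rmax and Rpeak values are the Top500 figures as I recall them, so treat them as assumptions; the HPCG number in the example is a made-up placeholder just to show the calculation.

```python
def computational_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Fraction of theoretical peak actually achieved on HPL (Rmax / Rpeak)."""
    if rpeak_pflops <= 0:
        raise ValueError("rpeak_pflops must be positive")
    return rmax_pflops / rpeak_pflops


def hpcg_to_hpl_ratio(hpcg_pflops: float, rmax_pflops: float) -> float:
    """How much of the HPL result survives on the memory-bound HPCG benchmark."""
    if rmax_pflops <= 0:
        raise ValueError("rmax_pflops must be positive")
    return hpcg_pflops / rmax_pflops


if __name__ == "__main__":
    # K computer HPL figures, quoted from memory (assumption).
    k_efficiency = computational_efficiency(rmax_pflops=10.51, rpeak_pflops=11.28)
    print(f"K computer HPL efficiency: {k_efficiency:.1%}")

    # Hypothetical system: 0.5 PFLOPS on HPCG against 10 PFLOPS on HPL.
    print(f"example HPCG/HPL ratio: {hpcg_to_hpl_ratio(0.5, 10.0):.1%}")
```

A system with a huge Rpeak but a low efficiency or a tiny HPCG-to-HPL ratio is spending most of its time waiting on memory and interconnects, which is the point being made about FLOPS as a single number.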
When it comes to gaming, color compression and other techniques that effectively allow GPUs to get away with lower memory bandwidth or FLOPS make that single number relatively useless. Nvidia makes three different GPUs that have about 10 TFLOPS SP, but for very different workloads (GP102 and GP104 are more similar than either is to GP100).
I haven’t looked at the specs for some of the AMD GPUs you’re talking about, but DP FLOPS usually aren’t cheap, even from AMD.
DP flops are mostly used on both AMD’s and Nvidia’s Pro/HPC GPU variants, but for consumer SKUs those AMD RX 480 SKUs offer up plenty of 32 bit FP flops for around $250 or lower per GPU. So that is why the coin miners always liked AMD more until the coin mining ASICs became available. So if you want to get Titan X (Pascal, at $1200) levels of FP32 performance at lower clocks, the RX 480 in a dual configuration is the way to go for $500, as that will get roughly the same number of 32 bit FP (SP) flops for 32 bit workloads.
Render farms with 2 or 4 RX 480s per board are a bit more affordable than Nvidia’s higher cost solutions, and wait for Vega, and maybe some more Polaris variants. Maybe AMD or an AIB partner could really make a dual RX 480 single PCIe card variant and make some sales for single board options before Vega gets here in volume, and I’ll bet they could get some home render farm business from a dual RX 480 single card solution. Non gaming graphics rendering at low cost in Blender (Cycles/other rendering on the GPU, not the CPU) would be one reason to go with the RX 480, so 4 RX 480s at around $1000 would have 32GB of V-memory and twice the 32 bit FP compute of a Titan X (Pascal) with which to accelerate the Ray interaction calculations for some very high resolution animation projects.
I’m waiting for Zen to arrive and AM4, or I’ll even be looking at some single socket Opteron options if that means getting 16 Zen cores at a more affordable price than Xeon. But what I am really hoping that AMD will do at some future point is to begin to offer in shaders(hardware optimized for ray tracing) that can run full Ray Tracing algorithms without the need for any CPU assistance at all, so GPU users can be totally free of any CPU dependency for any and all Ray Tracing workloads for graphics uses. OpenCL Ray Tracing acceleration is good but not as efficient as doing Ray Tracing with a Ray Tracing ASIC block on the GPU or even on a dedicated Ray Tracing ASIC card.
Color compression is fine for gaming, but not for non gaming rendering unless it is totally lossless compression. FPS does not matter for animation rendering; it is more or less not a concern for non gaming graphics rendering, as more Rays Traced means better animation frame quality with realistic shadows, AA/AO, and reflection effects. Animations are going to be run at a constant rate after the frames are generated, and it’s mostly single frames per minute for animation rendering rather than the FPS of gaming at gaming’s lower quality and lower resolutions. So the more 32 bit FLOPS to accelerate the Ray Tracing calculations the better for animation rendering, and the RX 480 has plenty of 32 bit FLOPS for around $250.
Imagination Technologies’ PowerVR (Wizard) GPUs have specialized Ray Tracing hardware on the GPU, so that IP looks very interesting. I wish that some of that PowerVR Wizard Ray Tracing IP would be adopted by Apple for a real Pro Graphics tablet SKU, and Apple sure needs to be focusing on some niche graphics oriented Tablet SKUs now that Tablet sales, like PC sales, are not growing fast anymore; PC and Laptop sales are actually shrinking for the OEM market. Apple appears to now only think of its Tablet/Phone hardware and its laptop hardware more as portals for its online services and not really as functioning Tablet/laptop computers. Once AMD finally gets out of its slump with Polaris, Zen, and Vega, maybe they can build a Reference Tablet with an optimized 4 core Zen/Polaris low wattage APU, with the Zen cores taped out using high density design libraries for more Zen cores in less space.
I’m also very interested in the AMD K12 custom ARMv8A ISA running cores, as AMD could maybe go with a 6+ instruction issue custom AMD designed ARMv8A ISA running core, like Apple’s A7, and maybe add SMT capabilities to K12 and get even better overall CPU core execution resource utilization from K12 than even Apple, and really make an AMD custom ARM ISA based tablet APU that would have plenty of Polaris ACE units. AMD appears to have, under Jim Keller, designed both Zen and K12 in a similar fashion, with both Zen’s x86 based cores and K12’s ARMv8A ISA cores having similar caching systems and other features, and maybe even both having SMT capabilities. SMT for an ARMv8A ISA custom core would be a first, if that is in fact what Jim Keller has done while overseeing both Zen’s and K12’s design teams.
With an HSA APU, it will be much lower latency to share work between the CPU and GPU components, so more specialized hardware may be unnecessary.
HSA only does something beneficial if the CPU and GPU have a bottleneck between them, and much like all the hype around asynchronous compute, its benefits are wildly overblown.
You need HMC/HBM and enough nonvolatile memory that is fast and close by for exascale, in order to support the fast checkpointing that exascale requires.
A lot of specialized hardware like FPGAs and ASICs exist because they are much better than CPUs or GPUs for certain tasks, which has nothing to do with CPU to GPU communication latency.
In the case of mining, it is not because Radeon has better peak compute performance, but simply that the mining programs work better with how AMD’s architecture is built. This has been discussed a lot in the past. High FLOPS numbers are useless if they can’t really be used efficiently.
Comparing compute performance (FLOPS) between AMD and Nvidia is useless. If the numbers represented real world performance, why is the Fury X not significantly faster than the 980 Ti? People say Polaris is better than Pascal because a low clocked Polaris can match a high clocked Pascal, but they forget that in the case of RX 480 vs GTX 1060 the latter also has far fewer shader cores than the RX 480. And look no further than the Top500 numbers if you want to know how efficient AMD’s designs actually are. Look at SANAM (one of the supercomputers powered by AMD accelerators): it has an impressive “theoretical” peak performance, but when running real applications it only gets 50% of that peak.
Also, when it comes to VR, it seems AMD has no definite advantage like many said it would. Actually, even Maxwell based cards perform just fine in VR.
Thanks much for this incredibly illuminating and instructive post. I assume the P.S. and the other post that followed it by Anonymous are also yours. They add a bunch more very useful insights and information. Thanks again.
In the past, when Apple was manufacturing the A9 (I think?) at both Samsung’s 14nm and TSMC’s 16nm, many sites were trying to see which was faster and more efficient. Some sites were showing Samsung’s as better, others TSMC’s.
Heh yeah, I remember reading those articles. I think we did one on battery life between the two, though I can't recall which one won.
Yeah part of the problem, as I understand it, is that there is no standard for naming among all the various fabs as far as process nodes and what part of the chips they measure to determine what nanometer they say their node is at. It is pretty confusing comparing across fabs and generations… and then marketing murks it up further (e.g. flash memory is "Xnm-class" heh). Aren't Samsung's 14nm FinFET and TSMC's 16nm closer in size than the Xnm numbers would suggest? I've been on vacation and can't remember off hand sorry :).
I will give that article a read later, I could use a refresher esp. since JoshTekk is going on vacation soon now hehe.
I’m not sure about those specific nodes, but what it really comes down to is who has the better process and architecture overall.
The process is determined by the EDA that goes into it, the testing and refinement, and most of that depends on machines that NONE of the fabs themselves make, but are rather created by what is likely the most technologically advanced company on Earth: Applied Materials.
I’m pretty sure that all the fabs are using machines made by Applied Materials. I might be wrong about that, but I’m fairly sure that’s the case.
A lot of the node names are marketing, but it would be hard to sell chips, even to hyperscalers, without some easy reference naming scheme. Talking about bending light with resolution enhancement technologies and creative mask design doesn’t typically resonate with consumers.
I haven’t read many details about the latest processes, but it SOUNDS like AMD has something good with Zen and the node they’re using with that.
On a side note, I think that focusing too much on one thing, like the nanometer name of a node, is detrimental to the market overall. It allows marketing to drive sales, rather than actual good product designs.
The problem in terms of computer performance right now, from a desktop to a pre-exascale 100 PFLOPS supercomputer, is not really the size of the transistors. It’s the fact that there are so many bottlenecks between CPUs and other parts of the system. In the case of desktops, that’s CPU to RAM, GPU, or storage. In the case of a supercomputer, it’s CPUs to RAM and other compute nodes or accelerators.
Fortunately some people have realized this and we are seeing the adoption of HMC, HBM, XPoint, NVMe, NVMf, etc.
Wait for AMD’s HPC/Server/Workstation APU’s on an interposer SKUs that will be the $/flops, Power/Flops, leaders that will have the widest parallel connection fabrics etched out on the interposer’s silicon substrate. What AMD will be doing for the exascale computing market will be putting the connection fabric for CPU/GPU/FPGAs/DSPs on an active interposer and wiring up all these processors and HBM2 via the interposer and thousands of traces. Just Look at what AMD’s Fury X SKUs are doing with HBM to GPU connection traces, and all that will be needed is to add some Zen cores with their own wide parallel traces from the Zen cores die/s to the Greenland/Vega GPU die.
Navi will be made up of modular GPU dies and HBM2 on an interposer. AMD will even be putting an FPGA compute die in the HBM2 memory stacks (see AMD’s patent filings), so there can be dedicated distributed compute on some of AMD’s HPC/Server/exascale SKUs, and all the wide, power saving, low clocked, high effective bandwidth interconnects for the GPU, CPU, DSP, HBM, and other dies will be etched on the interposer’s silicon substrate. Interposers will be etched with active circuits instead of just passive traces, so expect that interposers of the future will have completely functioning, fully coherent connection fabrics like multiple ring buses or complex butterfly/doughnut connection topologies etched into the interposer’s silicon to wire up some very powerful, power saving APU designs for complete systems on an interposer for the exascale market.
The power savings from having an interposer etched with thousands of traces are great, and allow for much higher effective bandwidth to be sent at lower clocks over very wide connection fabrics, so for exascale computing’s strict power usage regimens the interposer based APU/SOC will be a very attractive proposition. U-Sam is offering billions in grants to get its exascale computing initiative going and get some exascale systems up and running by 2020-2024, and the power usage of any exascale system is measured in the megawatts, so there will be a big intrinsic advantage for APU/SOC systems integrated on an interposer, or on MCM modules, with their own CPU/GPU/other compute in a more localized area and plenty of HBM2/NVM right on the HBM2, or on the MCM module, to keep the power usage from excessive data transfers down to a minimum. So some FPGA compute right in the HBM2/other memory, and GPU accelerators/other accelerators right on the interposer or MCM module, will be what is used to make exascale computing practicable and possible.
Expect all those government grants for exascale research to lead to the same technologies being used in the consumer market, what with the government grants paying for most of the R&D for the exascale systems that the government needs. It would be just like the Apollo program, where government sponsored R&D quickly found its way into the consumer market.
This isn’t the same economic or political climate that put humans on the Moon. I think it’s probably a good thing that the government is attempting to keep the US competitive in the race to exascale, but I really doubt that the USA, Europe or China will be first.
Much like the Earth Simulator and K showed the world the most advanced architectures and highest real performance, I think the Japanese will be the first to make a useful exascale computer.
For one thing, they have had the most advanced pre exascale architecture available and in use since late 2014.
http://www.fujitsu.com/global/products/computing/servers/supercomputer/primehpc-fx100/
That is the successor to the K and FX10, which scaled to 10s of PFLOPS. FX100 scales to over 100 PFLOPS. It has been using HMC on its CPUs since 2014, while Intel, AMD and Nvidia were hyping up their still basically nonexistent HBM and HMC offerings.
Here are a few other things the Japanese have in their favor. They just bought ARM. And not long after, they announced the switch from SPARC to ARM for the Post K computer, which will be 10x the performance of FX100 and 100x the performance of K, or one exaflop.
Intel or AMD may be able to catch up, but I doubt they will. It’s funny that most people don’t think of Fujitsu as making the most advanced supercomputers, but they currently do. Not even the 100 PFLOPS TaihuLight can beat the ancient K in the Graph500 rankings, and Fujitsu’s current architecture is a homogeneous, CPU-only design with 32+2 (for the OS) cores, 1 TFLOPS DP and 2 TFLOPS SP, 480GB/s HMC, and an interconnect that allows for scaling to thousands of nodes with >90% computational efficiency.
First does not matter. It’s about getting APUs/SOCs on interposers with HBM2 available for the consumer market at affordable prices because U-Sam/other Sams need those exaflop computers for whatever! With U-Sam footing the bill for a lot of the R&D on interposers, HBM2/next gen HBM, FPGA compute on the HBM2/whatever stacks, and other technology, that technology gets used in the consumer market quicker. Hell, it could be Chinese/French/Russian government grants for all I care, as long as AMD/others get the grants and that technology gets into the consumer SKUs sooner rather than later. I do not care why/what they need the exaflops for, as long as AMD/others are getting the grants to fund the R&D that will improve the consumer SKUs the quickest.
I want my laptop to have 8 Zen cores and a big fat Vega die and 32 GBs of HBM2, and some FPGA(On the HBM2 stacks) programmable compute for any new Vulkan API features not yet implemented in the Vega GPU’s ASIC at the time of its release.
You do realize that all of those technologies currently exist and are in use right? HBM2 on interposers exist in GP100, Altera and Intel make FPGA parts that use EMIB which is better than interposers already.
Government grants are also not really necessary for the R&D. It’s more for building a massive exascale system that you need a billion dollars, a lot of which is from private investments.
You forget that consumers don’t drive the market for computing either, with the exception of mobile chips.
Then you have not been reading about the government exascale initiative, and current interposer usage is limited to GPUs and HBM using passive interposer designs! The real government research at the university/private enterprise levels of the exascale research will go into developing active interposer technology and complex on-interposer interconnect topologies, to keep as much work as possible happening on the interposer module and its various CPU/GPU/HBM/FPGA and other specialized processor components, including some form of NVM to keep the large data sets local to the interposer/module compute as much as possible to save on power usage. Interposer technology is expensive because its current limited use is not good for amortizing the interposer R&D, so that is why interposers cost too much currently: to pay for the R&D costs. More government grants for university and private R&D will be spent on the development of better and larger interposers and more efficient/power saving interconnect topologies that will allow for the power savings that the exascale power usage regimens require.
AMD has already submitted an Exascale APU proposal using 32 Zen cores and a large Greenland/Vega GPU on an interposer, including HBM2 and some FPGA compute on the HBM2 stacks. Silicon interposer technology is still passive in nature, so the R&D will mostly be for getting the costs of interposer production down and developing the active types of interposers, with fully functioning interconnect networks on the interposer that have all the circuitry to manage the interconnect etched into the interposer’s silicon. AMD is not the only one getting the grants, as Nvidia, Intel, IBM, and others will all have their proposals and get some R&D grants. As far as the power saving needs of the exascale program go, a whole lot of that government funded R&D will find its way into consumer PC/Laptop products down to Tablets and phones, because using interposer technology and HBM/other memory technology as well as NVM in the DRAM technology will save plenty of power, as computing needs are going to put a stress on the whole world’s power production capacity.
All the funding for government exascale R&D is there to apply for and will help with getting the power usage down and the Gflops/watt metric as efficient as possible. Just like IBM, Burroughs, Univac, and others benefited from Federal/Government mainframe R&D grants/contracts in the past for technologies that were used in the business side of computing, the current companies that make processors will also benefit from the exascale R&D grants, with that technology being used in the business/consumer markets.
Modular and scalable GPUs are going to be made using interposers to wire up smaller more affordably fabricated GPU dies with higher yield efficiencies into larger and more powerful GPU systems that will appear to any software as a single GPU! Ditto for big fat APUs/SOC on an interposer made from separately fabricated CPU dies, and separately fabricated GPU/FPGA/DSP other processor dies and HMM/NVM(XPoint/other) dies using Active Interposers with whole coherent connection fabrics/circuits etched into the interposer’s silicon substrate.
edit: HMM/
to: HBM/HMC/
Actually I do read about it, and I also post about exascale topics on http://www.nextplatform.com a lot.
I’ve read about AMD’s exascale architecture. I don’t think it’s as promising as Fujitsu’s.
I don’t see EMIB as being superior.
Oh and I forgot to mention that K and its AICS, run by Riken, cost $1.3 billion, and they have set aside over $1 billion for Post K. That’s just for the computer and facility upgrades. Their overall spending on HPC and AI is huge. Japan also has HPCI, something much more coherent and cooperative than anyone else from what I can tell.
I’m honestly surprised something like “Extreme Ultra Violet Lithography” isn’t more of a hyped thing. I get that science never gets hype, but the fact we’re carving things with lasers that are so minute in scale we’re almost at single digit nanometer wavelengths? :X definitely mind blowing!
Because it’s not in use yet?
It’s been hyped since the late 1990s 🙂
Global Foundries don’t really need 10nm. They have 22nm FD-SOI (22FDX), which should be as cheap as 28nm for design/production, about as low power as 10nm FF, and more than fast enough for mobile/IoT.
Sure, ‘flagship’ phone/tablet SoCs will probably go with 10nm. But GF will probably hoover up the rest, as ultra low power *and* cheap is going to be very popular.
AMD will do what Nvidia is already doing: they will go to Samsung. They are already testing Samsung’s manufacturing, but they are behind Nvidia, who are already getting ready to produce at least one GPU at Samsung (1080 Ti?).
Look at PCIe 4.0: OCcLink external PCIe cables, and a 300 watt minimum for power at the PCIe connector.
“When we asked the PCI-SIG, we received the news that for the first time, PCIe will get a massive power increase at the connector. Solomon couldn’t recall the exact ceiling because member companies have proposed several options. Solomon stated that the minimum would be 300W, but the ceiling ‘may be 400 or 500W.’” (1)
(1)
“PCI Express 4.0 Brings 16 GT/s And At Least 300 Watts At The Slot”
http://www.tomshardware.com/news/pcie-4.0-power-speed-express,32525.html
edit: OCcLink
to : OCuLink
That’s OCuLink at 32 Gb/s in each direction for external boxes, etc., over a 4 lane connection!
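For anyone who wants to check the 32 Gb/s figure, here is a back-of-the-envelope sketch. It assumes OCuLink runs at PCIe 3.0 signalling (8 GT/s per lane), which is not stated in the comment above, and it compares that with the 16 GT/s quoted for PCIe 4.0. The names `RATES_GT_S` and `link_gbps` are made up for this illustration.

```python
# Per-lane raw signalling rates in GT/s.
# Assumption: OCuLink uses PCIe 3.0 signalling; 16 GT/s is the PCIe 4.0 rate quoted above.
RATES_GT_S = {"PCIe 3.0": 8, "PCIe 4.0": 16}

# Both generations use 128b/130b line encoding, so 128 of every 130 bits are payload.
ENCODING = 128 / 130


def link_gbps(gen, lanes):
    """Return (raw, usable) Gb/s in each direction for a link of the given width."""
    raw = RATES_GT_S[gen] * lanes
    return raw, raw * ENCODING


for gen in RATES_GT_S:
    raw, usable = link_gbps(gen, 4)
    print(f"{gen} x4: {raw} Gb/s raw, {usable:.1f} Gb/s after encoding")
```

Under that assumption the quoted 32 Gb/s is the raw x4 rate, and the usable figure is slightly lower once encoding overhead is taken out. Protocol overhead (packet headers, flow control) is not modelled here.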
Hot Chips starts today 8/21 and runs through 8/23! 2016 and beyond is going to be fun at lower costs for some x86 with SMT cores from AMD! Man dig them crazy Threads, and they easy on the wallet!
On 8/23 5:45 PM
A New, High Performance x86 Core Design from AMD, Michael Clark, AMD
More juicy details for that “New, High Performance x86 Core Design from AMD”!
http://www.hotchips.org/program/
GF is, as always, late with the next process iteration, so they announce future plans to divert shareholders’ attention.
GF is licensing and using Samsung’s 14nm process, and IBM, Samsung, and GF have been in a chip fabrication technology IP sharing partnership for some years now! You do see that IBM is using GF for its fabrication needs. IBM licenses/transfers (in limited amounts) a lot of chip fab IP to its technology sharing partners. It costs Intel billions to maintain its chip fabs, so even Intel is doing a limited third-party chip fabrication business. GF/Samsung/IBM and others have plenty of billions to spend on R&D to catch up to Intel and get down to 7nm even faster than Intel, and it’s more about the costs of doing so than any special ability on Intel’s part. There are about 6 companies worldwide that supply the majority of the chip fabrication equipment and other IP to all the chip makers. Intel did not invent FinFET, and a lot of Intel’s 14nm process is licensed from universities and other IP sources.
“The term FinFET (Fin Field Effect Transistor) was coined by University of California, Berkeley researchers (Profs. Chenming Hu, Tsu-Jae King-Liu and Jeffrey Bokor) to describe a nonplanar, double-gate transistor built on an SOI substrate, based on the earlier DELTA (single-gate) transistor design. The distinguishing characteristic of the FinFET is that the conducting channel is wrapped by a thin silicon “fin”, which forms the body of the device. The thickness of the fin (measured in the direction from source to drain) determines the effective channel length of the device. The Wrap-around gate structure provides a better electrical control over the channel and thus helps in reducing the leakage current and overcoming other short-channel effects.” (1)
(1)
https://en.wikipedia.org/wiki/Multigate_device
GF worked on their own trying to come up with something better than their 28nm and failed miserably. Lucky for them, Samsung came up with 14nm and was willing to license the process. Seems like GF is again hoping for a miracle to save them.
IBM had something to do with GF getting a license to use Samsung’s 14nm process, what with IBM just about gifting its foundry business to GF. Both GF and Samsung will be in line to get some Power9 business from IBM (after GF’s agreement with IBM expires), and the third-party OpenPower Power9 licensees (Google, others) will be free to choose GF or Samsung. IBM is ONLY interested in keeping the Power ISA and Power IP going and growing, thus assuring IBM of more software/services penetration in the Power8/Power9 marketplace and a supply of Power8+/Power9 parts for IBM’s needs. IBM retains all of the Power IP that can be licensed to any OpenPower licensee. IBM also retains all of its world-class research facilities, including chip fabrication research facilities that IBM and its partners (GF, Samsung, others) will use to create a ready supply of advanced fab capacity for the OpenPower hardware ecosystem.
P.S. The Power9 processor will be revealed at Hot Chips this year along with AMD’s New x86 High Performance CPU core design, and other new CPU/GPU IP like ARM Holdings’ (SoftBank) new Bifrost GPU micro-architecture and A73 cores, etc!
ARM is adding variable-length SVE, from 128 up to 2,048 bits, to the ARMv8-A ISA! Man I hope that AMD’s K12 gets some of that.
“Hot Chips ARM is bolting an extra data-crunching engine onto its 64-bit processor architecture to get it ready for Fujitsu’s Post-K exascale supercomputer.
Specifically, ARM is adding a Scalable Vector Extension (SVE) to its ARMv8-A core architecture. SVE can handle vectors from 128 to 2,048 bits in length. This technology is not an extension to NEON, which is ARM’s stock SIMD vector processing unit; we’re told SVE will be separate.”(1)
(1)
“Little ARMs pump 2,048-bit muscles in training for Fujitsu’s Post-K exascale mega-brain”
http://www.theregister.co.uk/2016/08/22/armv8_scalable_vectors/
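To give a feel for what a variable vector length means for code, here is a rough Python sketch (not ARM code, and not from the article) of a vector-length-agnostic loop: the same function runs unchanged whether the "hardware" vector is 128 or 2,048 bits wide, and a short final chunk stands in for SVE's predicated tail handling. The function name `vla_add` and the 32-bit element size are assumptions made for the illustration.

```python
def vla_add(a, b, vector_bits, elem_bits=32):
    """Add two equal-length lists one 'vector' at a time.

    vector_bits plays the role of the hardware vector length; the loop body
    never hard-codes it, which is the point of a vector-length-agnostic loop.
    """
    # Range taken from the quoted article: vectors of 128 to 2,048 bits.
    if not 128 <= vector_bits <= 2048 or vector_bits % elem_bits:
        raise ValueError("vector_bits must be 128..2048 and a multiple of elem_bits")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")

    lanes = vector_bits // elem_bits  # elements processed per step
    out = []
    for i in range(0, len(a), lanes):
        # The last slice may be shorter than `lanes`; real SVE would mask the
        # unused lanes with a predicate instead of needing a scalar tail loop.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
    return out


print(vla_add(list(range(10)), [1] * 10, vector_bits=128))
print(vla_add(list(range(10)), [1] * 10, vector_bits=2048))
```

Both calls return the same result; only the number of loop steps changes with the vector width.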
I can see such vector extensions being good for HPC, but not necessarily for consumer systems. They need to skew the front end towards high bandwidth at the cost of latency to support such units. For consumer systems, it is generally better to execute such code on the GPU (at higher latency) than to optimize the system for keeping such units fed on the CPU. HSA systems with a single pool of memory will significantly reduce the latency of going out to the GPU. This is similar to when the FPU was a separate, optional chip with high latency. Now the FPU is tightly integrated into the CPU.
It’s for Fujitsu’s Post K exascale computer.
“Noch mehr Offizielles zu Zen: Architekturdetails und Benchmark gegen Core i7-6900K [Update]” (in English: “More official details on Zen: architecture details and benchmark against the Core i7-6900K [Update]”)
More Zen slides!
http://www.pcgameshardware.de/AMD-Zen-Codename-261795/Specials/Architekturdetails-Benchmark-IPC-1205041/galerie/2625556/
Do we really expect 10 nm to be that much better than 14 nm? EUV has been talked about for so long that it is unclear whether it is even commercially feasible. I don’t know how much of a gamble it really is, though, since what are the other options? A highly tweaked 14 nm process might be very close to a 10 nm process. The defect rate on 10 nm without EUV may be really high, so not going to EUV could also be considered a gamble. Scaling after 10 nm is not guaranteed. There could be another long wait for process improvements in the next few years.