Subject: General Tech | June 7, 2017 - 09:31 PM | Josh Walrath
Tagged: silicon nanosheet, Samsung, IBM, GLOBALFOUNDRIES, FinFET, 5nm
It seems only yesterday that we saw Intel introduce their 22nm FinFET technology, and now we are going all the way down to 5nm. This is obviously an exaggeration. The march of process technology has been more than a little challenging for the past 5+ years for everyone in the industry. Intel has made it look a little easier by being able to finance these advances better than the pure-play foundries. That does not mean that they have not experienced challenges of their own.
We have seen some breakthroughs these past years with everyone jumping onto FinFETs and TSMC, Samsung, and GLOBALFOUNDRIES introducing their own processes. GLOBALFOUNDRIES initially had set out on their own, but that particular endeavor did not pan out. They ended up licensing Samsung’s 14nm processes (LPE and LPP) to start producing chips of their own, primarily for AMD in their graphics and this latest generation of Ryzen CPUs.
These advances have not been easy. While FinFETs are needed at these lower nodes to continue to provide the performance and power efficiency while supporting these transistor densities, the technology will not last forever. 10nm and 7nm lines will continue to use them, but many believe that while we will see the densities improve, the power characteristics will start to lag behind. The theory is that past the 7nm node traditional FinFETs will no longer work as desired. This is very reminiscent of the sub-28nm processes that attempted to use planar structures on bulk silicon. In that case the chips could be made, but power issues plagued the designs and eventually support for those process lines was dropped.
IBM and their research associates Samsung and GLOBALFOUNDRIES, working at SUNY Polytechnic Institute Colleges of Nanoscale Science and Engineering’s NanoTech Complex in Albany, NY, have announced a breakthrough in a new “Gate-All-Around” architecture made on a 5nm process. A FinFET is essentially a rectangular channel surrounded on three sides by the gate, giving it the “fin” physical characteristics. This new technology now covers the fourth side and embeds these channels in nanosheets of silicon.
The problem with FinFETs is that they will eventually be unable to scale with power as transistors get closer and closer. While density scales, power and performance will get worse as compared to previous nodes. The 5nm silicon nanosheet technology gives a significant boost to power and efficiency, thereby doing to FinFETs what they did with planar structures at the 20/22nm nodes.
One of the working EUV litho machines at SUNY Albany.
IBM asserts that the average chip the size of a fingernail can contain up to 30 billion transistors and continue to see the density, power, and efficiency improvements that we would expect with a normal process shrink. The company expects these process nodes to start rolling out in a 2019 time frame if all goes as planned.
There are few details on how IBM was able to achieve this result. We do know a couple of things about it. EUV lithography was used extensively to avoid the multi-patterning nightmare that this would otherwise entail. For the past two years ASML has been installing 100 watt EUV litho machines throughout the world for select clients. One of these is located on the SUNY Albany campus where this research was done. We also know that deposition was done layer by layer with silicon and the other materials.
What we don’t know is how long it takes to create a complete wafer. Usually these test wafers are packed full of SRAM and very little logic. It is a useful test and creates a baseline for many structures that will eventually be applied to this process. Considering how the layers look to be deposited, it likely takes a long, long time with current tools and machinery. Cutting edge wafers in production can take upwards of 16 weeks to complete. I hesitate to even guess how long each test wafer takes. Because of the very 3D nature of the design, I am curious as to how the litho stages work and how many passes are still needed to complete the design.
This looks to be a very significant advancement in process technology that should be mass produced in the timeline suggested by IBM. It is a significant jump, but it seems to borrow a lot of previous FinFET structures. It does not encompass anything exotic like “quantum wells”, but is able to go lower than the currently specified 7nm processes that TSMC, Samsung, and Intel have hinted at (and yes, process node names should be taken with a grain of salt from all parties at this time). IBM does appear to be comparing this to what Samsung calls its 7nm process in terms of dimensions and transistor density.
Cross section of a 5nm transistor showing the embedded channels and silicon nanosheets.
While Moore’s Law has been stretched thin as of late, we are still seeing these scientists and engineers pushing against the laws of physics to achieve better performance and scaling at incredibly small dimensions. The silicon nanosheet technology looks to be an effective and relatively affordable path towards smaller sizes without requiring exotic materials to achieve. IBM and its partners look to have produced a process node that will continue the march towards smaller, more efficient, and more powerful devices. It is not exactly around the corner, but 2019 is close enough to start planning designs that could potentially utilize this node.
What Makes Ryzen Tick
We have been exposed to details about the Zen architecture for the past several Hot Chips conventions as well as other points of information directly from AMD. Zen was a clean sheet design that borrowed some of the best features from the Bulldozer and Jaguar architectures, as well as integrating many new ideas that had not been executed in AMD processors before. The fusion of ideas from higher performance cores, lower power cores, and experience gained in APU/GPU design have all come together in a very impressive package that is the Ryzen CPU.
It is well known that AMD brought back Jim Keller to head the CPU group after the slow downward spiral that AMD entered in CPU design. While the Athlon 64 was a tremendous part for the time, the subsequent CPUs being offered by the company did not retain that leadership position. The original Phenom had problems right off the bat and could not compete well with Intel’s latest dual and quad cores. The Phenom II shored up their position a bit, but in the end could not keep pace with the products that Intel continued to introduce with their newly minted “tick-tock” cycle. Bulldozer had issues out of the gate and did not have performance numbers that were significantly greater than the previous generation “Thuban” 6-core Phenom II product, much less the latest Intel Sandy Bridge and Ivy Bridge products that it would compete with.
AMD attempted to stop the bleeding by iterating and evolving the Bulldozer architecture with Piledriver, Steamroller, and Excavator. The final products based on this design arc seemed to do fine for the markets they were aimed at, but certainly did not regain any marketshare with AMD’s shrinking desktop numbers. No matter what AMD did, the base architecture just could not overcome some of the basic properties that impeded strong IPC performance.
The primary goal of this new architecture is to increase IPC to a level consistent with what Intel has to offer. AMD aimed to increase IPC (instructions per clock) by at least 40% over the previous Excavator core. This is a pretty aggressive goal considering where AMD was with the Bulldozer architecture, which was focused on good multi-threaded performance and high clock speeds. AMD claims that it has in fact increased IPC by an impressive 52% over the previous Excavator based core. Not only has AMD seemingly hit its performance goals, but it exceeded them. AMD also plans on using the Zen architecture to power everything from mobile products to the highest TDP parts offered.
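To put an IPC target in context, here is a minimal sketch of the usual rule of thumb that single-thread throughput scales with IPC multiplied by clock speed. The function name and the 10% clock deficit in the second example are made up for illustration and are not AMD figures.

```python
def relative_performance(ipc_gain: float, clock_ratio: float = 1.0) -> float:
    """Single-thread throughput relative to a baseline core.

    Throughput is modeled as IPC x clock, so the ratio to the baseline is
    the product of the IPC ratio and the clock ratio.
    ipc_gain is fractional: 0.40 means +40% IPC over the baseline.
    clock_ratio is new clock / baseline clock (1.0 means equal clocks).
    """
    return (1.0 + ipc_gain) * clock_ratio


# AMD's stated goal: at least +40% IPC over Excavator, at equal clocks.
goal_equal_clocks = relative_performance(0.40)

# Hypothetical case: the same IPC gain, but the new core clocks 10% lower.
goal_lower_clocks = relative_performance(0.40, clock_ratio=0.90)

print(f"+40% IPC, equal clocks:     {goal_equal_clocks:.2f}x")
print(f"+40% IPC, 10% lower clocks: {goal_lower_clocks:.2f}x")
```

The point of the second case is that an IPC gain only translates directly into performance if clock speeds hold up, which is why the two numbers are always quoted together.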
The Zen Core
The basis for Ryzen is the CCX module. Each module contains four Zen cores along with 8 MB of shared L3 cache. Each core has 64 KB of L1 I-cache, 32 KB of L1 D-cache, and 512 KB of L2 cache. These caches are inclusive. The L3 cache acts as a victim cache which partially copies what is in L1 and L2 caches. AMD has improved the performance of their caches to a very large degree as compared to previous architectures. The arrangement here allows the individual cores to quickly snoop any changes in the caches of the others for shared workloads. So if a cache line is changed on one core, other cores requiring that data can quickly snoop into the shared L3 and read it. This allows the core doing the actual work to avoid being interrupted by cache read requests from other cores.
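As a rough way to add up the cache figures above, the following sketch models one CCX as a small data structure. The class and field names are invented for illustration, and it assumes the 512 KB of L2 is per core rather than per module.

```python
from dataclasses import dataclass, field

KB = 1024
MB = 1024 * KB


@dataclass(frozen=True)
class ZenCore:
    l1_icache: int = 64 * KB
    l1_dcache: int = 32 * KB
    l2: int = 512 * KB  # assumption: 512 KB of L2 per core


@dataclass(frozen=True)
class CCX:
    cores: int = 4
    l3_shared: int = 8 * MB
    core: ZenCore = field(default_factory=ZenCore)

    def private_cache_bytes(self) -> int:
        """L1 + L2 capacity summed over every core in the module."""
        per_core = self.core.l1_icache + self.core.l1_dcache + self.core.l2
        return self.cores * per_core

    def total_cache_bytes(self) -> int:
        """Private caches plus the L3 shared by all cores in the module."""
        return self.private_cache_bytes() + self.l3_shared


ccx = CCX()
print(ccx.private_cache_bytes() // KB, "KB private (L1 + L2)")
print(ccx.total_cache_bytes() / MB, "MB total per CCX")
```

Under those assumptions each core carries 608 KB of private cache, so a four-core CCX has 2432 KB of L1 and L2 in addition to the 8 MB of shared L3.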
Each core can handle two threads, but unlike Bulldozer has a single integer core. Bulldozer modules featured two integer units and a shared FPU/SIMD. Zen gets rid of CMT for good, and we have a single integer unit and a single FPU for each core. The core can address two threads by utilizing AMD’s version of SMT (simultaneous multi-threading). There is a primary thread that gets higher priority while the second thread has to wait until resources are freed up. This works far better in the real world than my explanation suggests, as resources are constantly being shuffled about and the primary thread will not monopolize all resources within the core.
With the new year comes a new push for performance, efficiency and feature leadership from Qualcomm and its Snapdragon line of mobile SoCs. The Snapdragon 835 was officially announced in November of last year when the partnership with Samsung on 10nm process technology was announced, but we now have the freedom to share more of the details on this new part and how it changes Qualcomm’s position in the ultra-device market. Devices with the new 835 part won’t be on the market for several more months, though announcements are likely coming at CES this year.
Qualcomm frames the story around the Snapdragon 835 processor with what they call the “five pillars”: five different aspects of mobile processor design that they have addressed with updates and technologies. Qualcomm lists them as battery life (efficiency), immersion (performance), capture, connectivity, and security.
Starting where they start, on battery life and efficiency, the SD 835 has a unique focus that might surprise many. Rather than talking up the improvements in performance of the new processor cores, or the power of the new Adreno GPU, Qualcomm is firmly planted on looking at Snapdragon through the lens of battery life. Snapdragon 835 uses half of the power of Snapdragon 801.
The company touts usage claims of 1+ day of talk time, 5+ days of music playback, 11 hours of 4K video playback, 3 hours of 4K video capture and 2+ hours of sustained VR gaming. These sound impressive, but as always in this market, we must wait for consumer devices from Qualcomm partners to really measure how well this platform will do. Going through a typical power user comparison of a device built on the Snapdragon 835 to one using the 820, Qualcomm thinks it could result in 2 or more hours of additional battery life at the end of the day.
We have already discussed the new Quick Charge 4 technology, which can offer 5 hours of use with just 5 minutes of charge time.
Subject: Processors | December 8, 2016 - 09:00 AM | Josh Walrath
Tagged: Xilinx, TSMC, standard cells, layout, FinFET, EDA, custom cell, arm, 7nm
Today ARM is announcing their partnership with Xilinx to deliver design solutions for their products on TSMC’s upcoming 7nm process node. ARM has previously partnered with Xilinx on other nodes including 28, 20, and 16nm. Their partnership extends into design considerations to improve the time to market of complex parts and to rapidly synthesize new designs for cutting edge process nodes.
Xilinx is licensing the latest ARM Artisan Physical IP platform for TSMC’s 7nm. Artisan Physical IP is a set of tools that helps roll out complex designs more rapidly than was possible with previous generations of products. ARM has specialized libraries and tools to help implement these designs on a variety of processes and get good results even with the shortest possible design times.
Design relies on two basic methodologies. There is custom cell and then standard cell designs. Custom cell design allows for a tremendous amount of flexibility in layout and electrical characteristics, but it requires a lot of man-hours to complete even the simplest logic. Custom cell designs typically draw less power and provide higher clockspeeds than standard cell design. Standard cells are like Legos in that the cells can be quickly laid out to create complex logic. Software called EDA (Electronic Design Automation) can quickly place and route these cells. GPUs lean heavily on standard cells and EDA software to get highly complex products out to market quickly.
These two basic methods have netted good results over the years, but during that time we have seen implementations of standard cells become more custom in how they behave. While not achieving full custom performance, we have seen semi-custom type endeavors achieve appreciable gains without requiring the man hours to achieve fully custom.
In this particular case ARM is achieving solid results in power and speed through automated design that improves upon standard cells, but without the downsides of a fully custom part. This provides positive power and speed benefits without the extra power draw of a traditional standard cell. ARM further improves upon this with the ARM Artisan Power Grid Architect (PGA), which simplifies the development of a complex power grid that services a large and complex chip.
We have seen these types of advancements in the GPU world that NVIDIA and AMD enjoy talking about. A better power grid allows the ASIC to perform at lower power envelopes due to less impedance. The GPU guys have also utilized High Density Libraries to pack in the transistors as tightly as possible to utilize less space and increase spatial efficiency. A smaller chip which requires less power is always a positive development over a larger chip of the same capabilities that requires more power. ARM looks to be doing their own version of these technologies and is applying them to TSMC’s upcoming 7nm FinFET process.
TSMC is not releasing this process to mass production until at least 2018. In 1H 2017 we will see some initial test and early production runs for a handful of partners. Full blown production of 7nm will be in 2018. Early runs and production are increasingly being used for companies working with low power devices. We can look back at the 20/16/14 nm processes and see that they were initially used by designs that do not require a lot of power and run at moderate clockspeeds. We have seen a shift in who uses these new processes with the introduction of sub-28nm process nodes. The complexity of the design, process steps, materials, and libraries has pushed the higher performance and power hungry parts to a secondary position as the foundries attempt to get these next generation nodes up to speed. It isn’t until many months after these low power parts are pushed through that we see adjustments and improvements in these next generation nodes to handle the higher power and clockspeed needs of products like desktop CPUs and GPUs.
ARM is certainly being much more aggressive in addressing next generation nodes and pushing their cutting edge products on them to allow for far more powerful mobile products that also exhibit improved battery life. This step with 7nm and Xilinx will provide a lot of data to ARM and its partners downstream when the time comes to implement new designs. Artisan will continue to evolve to allow partners to quickly and efficiently introduce new products on new nodes to the market at an accelerated rate as compared to years past.
Subject: Processors, Mobile | November 17, 2016 - 07:30 AM | Ryan Shrout
Tagged: snapdragon, Samsung, qualcomm, FinFET, 835, 10nm
Though we are still months away from shipping devices, Qualcomm has announced that it will be building its upcoming flagship Snapdragon 835 mobile SoC on Samsung’s 10nm 2nd generation FinFET process technology. Qualcomm tells us that integrating the 10nm node in 2017 will keep it “the technology leader in mobile platforms” and this makes the 835 the world's first 10nm production processor.
“Using the new 10nm process node is expected to allow our premium tier Snapdragon 835 processor to deliver greater power efficiency and increase performance while also allowing us to add a number of new capabilities that can improve the user experience of tomorrow’s mobile devices.”
Samsung announced its 10nm FinFET process technology in October of this year and it sports some impressive specifications and benefits to the Snapdragon 835 platform. Per Samsung, it offers “up to a 30% increase in area efficiency with 27% higher performance or up to 40% lower power consumption.” For Qualcomm and its partners, that means a smaller silicon footprint for innovative device designs, including thinner chassis or larger batteries (yes, please).
Other details on the Snapdragon 835 are still pending a future reveal, but Qualcomm says that 835 is in production now and will be shipping in commercial devices in the first half of 2017. We did hear that the new 10nm chip is built on "more than 3 billion transistors" - making it an incredibly complex design!
Keith Kressin SVP, Product Management, Qualcomm Technologies Inc and Ben Suh, SVP, Foundry Marketing, Samsung, show off first 10nm mobile processor, Snapdragon 835, in New York at Qualcomm's Snapdragon Technology Summit.
I am very curious to see how the market reacts to the release of the Snapdragon 835. We are still seeing new devices being released using the 820/821 SoCs, including Google’s own flagship Pixel phones this fall. Qualcomm wants to maintain leadership in the SoC market by innovating on both silicon and software but consumers are becoming more savvy to the actual usable benefits that new devices offer. Qualcomm promises features, performance and power benefits on SD 835 to make the case for your next upgrade.
Subject: Graphics Cards | June 30, 2016 - 07:54 PM | Scott Michaud
Tagged: amd, nvidia, FinFET, Polaris, polaris 10, pascal
If you're trying to purchase a Pascal or Polaris-based GPU, then you are probably well aware that patience is a required virtue. The problem is that, as a hardware website, we don't really know whether the issue is high demand or low supply. Both are manufactured on a new process node, which could mean that yield is a problem. On the other hand, it's been about four years since the last fabrication node, which means that chips got much smaller for the same performance.
Over time, manufacturing processes will mature, and yield will increase. But what about right now? AMD made a very small chip that produces ~GTX 970-level performance. NVIDIA is sticking with their typical 3XX mm² chip, which ended up producing higher than Titan X levels of performance.
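To illustrate why die size matters so much on an immature node, here is a back-of-the-envelope sketch using two textbook approximations: a common dies-per-wafer estimate and the simple Poisson yield model. Neither vendor has published numbers like these. The defect density and both die areas below are hypothetical values chosen only to show the trend.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: fraction of dies that have zero defects."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)


def good_dies_per_wafer(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected number of defect-free dies on one wafer."""
    return gross_dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)


# Hypothetical inputs, not vendor data.
DEFECTS_PER_CM2 = 0.2
for label, area in (("smaller die", 230.0), ("larger die", 320.0)):
    print(
        f"{label}: {gross_dies_per_wafer(area)} gross dies, "
        f"{poisson_yield(area, DEFECTS_PER_CM2):.0%} yield, "
        f"{good_dies_per_wafer(area, DEFECTS_PER_CM2):.0f} good dies per wafer"
    )
```

The smaller die wins twice: more candidates fit on each wafer, and each one is less likely to land on a defect. As the process matures and defect density falls, that gap narrows.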
It turns out that, according to online retailer Overclockers UK (via Fudzilla), both the RX 480 and GTX 1080 have sold over a thousand units at that location alone. That's quite a bit, especially when you consider that this counts only one (large) online retailer from Europe. It's difficult to say how much stock other stores (and regions) received compared to them, but it's still a thousand units in a day.
It's sounding like, for both vendors, pent-up demand might be the dominant factor.
10nm Sooner Than Expected?
It seems only yesterday that we had the first major GPU released on 16nm FF+ and now we are talking about ARM about to receive their first 10nm FF test chips! Well, in fact it was yesterday that NVIDIA formally released performance figures on the latest GeForce GTX 1080 which is based on TSMC’s 16nm FF+ process technology. Currently TSMC is going full bore on their latest process node and producing the fastest current graphics chip around. It has taken the foundry industry as a whole a lot longer to develop FinFET technology than expected, but now that they have that piece of the puzzle seemingly mastered they are moving to a new process node at an accelerated rate.
TSMC’s 10nm FF is not well understood by press and analysts yet, but we gather that it is more of a marketing term than a true drop to 10 nm features. Intel has yet to get past 14nm and does not expect 10 nm production until well into next year. TSMC is promising their version in the second half of 2016. We cannot assume that TSMC’s version will match what Intel will be doing in terms of geometries and electrical characteristics, but we do know that it is a step past TSMC’s 16nm FF products. Lithography will likely get a boost with triple patterning exposure. My guess is that the back end will also move away from the “20nm metal” stages that we see with 16nm. All in all, it should be an improved product from what we see with 16nm, but time will tell if it can match the performance and density of competing lines that bear the 10nm name from Intel, Samsung, and GLOBALFOUNDRIES.
ARM has a history of porting their architectures to new process nodes, but they are being a bit more aggressive here than we have seen in the past. It used to be that ARM would announce a new core or technology, and it would take up to two years to be introduced into the market. Now we are seeing technology announcements and actual products hitting the scenes about nine months later. With the mobile market continuing to grow we expect to see products quicker to market still.
The company designed a simplified test chip to tape out and send to TSMC for test production on the aforementioned 10nm FF process. The chip was taped out in December, 2015. The design was shipped to TSMC for mask production and wafer starts. ARM is expecting the finished wafers to arrive this month.
Subject: Processors | March 15, 2016 - 12:52 PM | Sebastian Peak
Tagged: TSMC, SoC, servers, process technology, low power, FinFET, datacenter, cpu, arm, 7nm, 7 nm FinFET
ARM and TSMC have announced their collaboration on 7 nm FinFET process technology for future SoCs. Under a multi-year agreement between the companies, products produced on this 7 nm FinFET process are intended to expand ARM’s reach “beyond mobile and into next-generation networks and data centers”.
TSMC Headquarters (Image credit: AndroidHeadlines)
So when can we expect to see 7nm SoCs on the market? The report from The Inquirer offers this quote from TSMC:
“A TSMC spokesperson told the INQUIRER in a statement: ‘Our 7nm technology development progress is on schedule. TSMC's 7nm technology development leverages our 10nm development very effectively. At the same time, 7nm offers a substantial density improvement, performance improvement and power reduction from 10nm’.”
Subject: Mobile | February 12, 2016 - 04:26 PM | Sebastian Peak
Tagged: X16 modem, qualcomm, mu-mimo, modem, LTE, Gigabit LTE, FinFET, Carrier Aggregation, 14nm
Qualcomm’s new X16 LTE Modem is the industry's first Gigabit LTE chipset to be announced, achieving speeds of up to 1 Gbps using 4x Carrier Aggregation. The X16 succeeds the recently announced X12 modem, improving on the X12's 3x Carrier Aggregation and moving from LTE CAT 12 to CAT 16 on the downlink, while retaining CAT 13 on the uplink.
"In order to make a Gigabit Class LTE modem a reality, Qualcomm added a suite of enhancements – built on a foundation of commercially-proven Carrier Aggregation technology. The Snapdragon X16 LTE modem employs sophisticated digital signal processing to pack more bits per transmission with 256-QAM, receives data on four antennas through 4x4 MIMO, and supports for up to 4x Carrier Aggregation — all of which come together to achieve unprecedented download speeds."
Gigabit speeds are only possible if multiple data streams are connected to the device simultaneously, and with the new X16 modem such aggregation is performed using LTE-U and LAA.
(Image via EE Times)
What does all of this mean? Aggregation is a term you'll see a lot as we progress into the next generation of cellular data technology, and with the X16 Qualcomm is emphasizing carrier over link aggregation. Essentially Carrier Aggregation works by combining the carrier LTE data signal (licensed, high transmit power) with a shorter-range, shared spectrum (unlicensed, low transmit power) LTE signal. When the signals are combined at the device (i.e. your smartphone), significantly better throughput is possible with this larger (aggregated) data ‘pipe’.
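The 'bigger pipe' idea can be sketched as simple addition: each aggregated carrier contributes its spatial streams, and the peak rate is the sum. The carrier mix below is one illustrative combination that lands on 1 Gbps, not a configuration stated above, and the figure of roughly 100 Mbps per stream is a rounded assumption for a 20 MHz carrier using 256-QAM.

```python
from dataclasses import dataclass
from typing import Iterable


def bits_per_symbol(qam_order: int) -> int:
    """Bits carried by one symbol of a square QAM constellation (256 -> 8)."""
    return qam_order.bit_length() - 1


@dataclass(frozen=True)
class Carrier:
    name: str
    mimo_streams: int       # spatial streams received on this carrier
    mbps_per_stream: float  # assumed peak rate of a single stream


def aggregate_mbps(carriers: Iterable[Carrier]) -> float:
    """Peak downlink rate when all carriers are combined at the device."""
    return sum(c.mimo_streams * c.mbps_per_stream for c in carriers)


# Assumption: about 100 Mbps per stream on a 20 MHz carrier with 256-QAM.
PER_STREAM_MBPS = 100.0

# Hypothetical mix of licensed and unlicensed carriers.
carriers = [
    Carrier("licensed A (4x4 MIMO)", 4, PER_STREAM_MBPS),
    Carrier("licensed B (4x4 MIMO)", 4, PER_STREAM_MBPS),
    Carrier("unlicensed (2x2 MIMO)", 2, PER_STREAM_MBPS),
]

print(f"peak: {aggregate_mbps(carriers):.0f} Mbps")

# Moving from 64-QAM to 256-QAM raises bits per symbol from 6 to 8.
gain = bits_per_symbol(256) / bits_per_symbol(64)
print(f"256-QAM carries {gain:.2f}x the bits per symbol of 64-QAM")
```

Real-world throughput will sit well below any such peak, since it depends on signal quality, cell load, and how much unlicensed spectrum is actually free at that moment.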
Qualcomm lists the four main options for unlicensed LTE deployment as follows:
- LTE-U: Based on 3GPP Rel. 12, LTE-U targets early mobile operators deployments in USA, Korea and India, with coexistence tests defined by LTE-U forum
- LAA: Defined in 3GPP Rel. 13, LAA (Licensed Assisted Access) targets deployments in Europe, Japan, & beyond.
- LWA: Defined in 3GPP Rel. 13, LWA (LTE - Wi-Fi link aggregation) targets deployments where the operators already has carrier Wi-Fi deployments.
- MulteFire: Broadens the LTE ecosystem to new deployment opportunities by operating solely in unlicensed spectrum without a licensed anchor channel
The X16 is also Qualcomm’s first modem to be built on a 14nm FinFET process, which Qualcomm says is highly scalable and will enable the company to evolve the modem product line “to address an even wider range of product, all the way down to power-efficient connectivity for IoT devices.”
Qualcomm has already begun sampling the X16, and expects the first commercial products in the second half of 2016.
Looking Towards 2016
ARM invited us to a short conversation with them on the prospects of 2016. The initial answer as to how they feel the upcoming year will pan out is, “Interesting”. We covered a variety of topics ranging from VR to process technology. ARM is not announcing any new products at this time, but throughout this year they will continue to push their latest Mali graphics products as well as the Cortex A72.
Trends to Watch in 2016
The one overriding trend that we will see is that of “good phones at every price point”. ARM’s IP scales from very low to very high end mobile SoCs and their partners are taking advantage of the length and breadth of these technologies. High end phones based on custom cores (Apple, Qualcomm) will compete against those licensing the Cortex A72 and A57 parts for their phones. Lower end options that are less expensive and pull less power (which then requires less battery) will flesh out the midrange and budget parts. Unlike several years ago, the products from top to bottom are eminently usable and relatively powerful.
Camera improvements will also take center stage for many products and continue to be a selling point and an area of differentiation for competitors. Improved sensors and software will obviously be the areas where the ARM partners focus, but ARM is putting some work into this area as well. Post processing requires quite a bit of power to do quickly and effectively. ARM is helping here by leveraging both the Neon SIMD engine and the power of the Mali GPU.
4K video is becoming more and more common as well with handhelds, and ARM is hoping to leverage that capability in shooting static pictures. A single 4K frame is around 8 megapixels in size. So instead of capturing video, the handheld can achieve a “best shot” type functionality. So the phone captures the 4K video and then users can choose the best shot available to them in that period of time. This is a simple idea that will be a nice feature for those with a product that can capture 4K video.
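The 8 megapixel figure is easy to check, and the “best shot” idea can be sketched in a few lines. This assumes 4K means the consumer UHD resolution of 3840x2160. The sharpness metric is a deliberately naive stand-in invented for this example and is not how ARM or any phone vendor actually scores frames.

```python
from typing import Callable, List, Sequence

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities


def megapixels(width: int, height: int) -> float:
    """Pixel count of a frame, in millions of pixels."""
    return width * height / 1_000_000


def sharpness(frame: Frame) -> int:
    """Naive, hypothetical metric: total horizontal contrast between neighbors."""
    return sum(
        abs(row[i + 1] - row[i])
        for row in frame
        for i in range(len(row) - 1)
    )


def best_shot(frames: Sequence[Frame], score: Callable[[Frame], float]) -> Frame:
    """Pick the highest-scoring frame from a burst of captured video frames."""
    return max(frames, key=score)


print(megapixels(3840, 2160))  # 8.2944

# Tiny stand-in frames: a flat (blurry) one and a high-contrast (sharp) one.
blurry = [[10, 10, 10], [10, 10, 10]]
sharp = [[0, 255, 0], [255, 0, 255]]
assert best_shot([blurry, sharp], sharpness) is sharp
```

In practice the user may still pick the frame by hand. A score like this would only be useful for suggesting candidates out of the dozens of frames captured every second.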