Subject: Processors | March 15, 2016 - 12:52 PM | Sebastian Peak
Tagged: TSMC, SoC, servers, process technology, low power, FinFET, datacenter, cpu, arm, 7nm, 7 nm FinFET
ARM and TSMC have announced their collaboration on 7 nm FinFET process technology for future SoCs. A multi-year agreement between the companies, products produces on this 7 nm FinFET process are intended to expand ARM’s reach “beyond mobile and into next-generation networks and data centers”.
TSMC Headquarters (Image credit: AndroidHeadlines)
So when can we expect to see 7nm SoCs on the market? The report from The Inquirer offers this quote from TSMC:
“A TSMC spokesperson told the INQUIRER in a statement: ‘Our 7nm technology development progress is on schedule. TSMC's 7nm technology development leverages our 10nm development very effectively. At the same time, 7nm offers a substantial density improvement, performance improvement and power reduction from 10nm’.”
Full press release after the break.
Subject: Mobile | February 12, 2016 - 04:26 PM | Sebastian Peak
Tagged: X16 modem, qualcomm, mu-mimo, modem, LTE, Gigabit LTE, FinFET, Carrier Aggregation, 14nm
Qualcomm’s new X16 LTE Modem is the industry's first Gigabit LTE chipset to be announced, achieving speeds of up to 1 Gbps using 4x Carrier Aggregation. The X16 succeeds the recently announced X12 modem, improving on the X12's 3x Carrier Aggregation and moving from LTE CAT 12 to CAT 16 on the downlink, while retaining CAT 13 on the uplink.
"In order to make a Gigabit Class LTE modem a reality, Qualcomm added a suite of enhancements – built on a foundation of commercially-proven Carrier Aggregation technology. The Snapdragon X16 LTE modem employs sophisticated digital signal processing to pack more bits per transmission with 256-QAM, receives data on four antennas through 4x4 MIMO, and supports for up to 4x Carrier Aggregation — all of which come together to achieve unprecedented download speeds."
Gigabit speeds are only possible if multiple data streams are connected to the device simultaneously, and with the new X16 modem such aggregation is performed using LTE-U and LAA.
(Image via EE Times)
What does all of this mean? Aggregation is a term you'll see a lot as we progress into the next generation of cellular data technology, and with the X16 Qualcomm is emphasizing carrier over link aggregation. Essentially Carrier Aggregation works by combining the carrier LTE data signal (licensed, high transmit power) with a shorter-range, shared spectrum (unlicensed, low transmit power) LTE signal. When the signals are combined at the device (i.e. your smartphone), significantly better throughput is possible with this larger (aggregated) data ‘pipe’.
Qualcomm lists the four main options for unlicensed LTE deployment as follows:
- LTE-U: Based on 3GPP Rel. 12, LTE-U targets early mobile operators deployments in USA, Korea and India, with coexistence tests defined by LTE-U forum
- LAA: Defined in 3GPP Rel. 13, LAA (Licensed Assisted Access) targets deployments in Europe, Japan, & beyond.
- LWA: Defined in 3GPP Rel. 13, LWA (LTE - Wi-Fi link aggregation) targets deployments where the operators already has carrier Wi-Fi deployments.
- MulteFire: Broadens the LTE ecosystem to new deployment opportunities by operating solely in unlicensed spectrum without a licensed anchor channel
The X16 is also Qualcomm’s first modem to be built on 14nm FinFet process, which Qualcomm says is highly scalable and will enable the company to evolve the modem product line “to address an even wider range of product, all the way down to power-efficient connectivity for IoT devices.”
Qualcomm has already begun sampling the X16, and expects the first commercial products in the second half of 2016.
Looking Towards 2016
ARM invited us to a short conversation with them on the prospects of 2016. The initial answer as to how they feel the upcoming year will pan out is, “Interesting”. We covered a variety of topics ranging from VR to process technology. ARM is not announcing any new products at this time, but throughout this year they will continue to push their latest Mali graphics products as well as the Cortex A72.
Trends to Watch in 2016
The one overriding trend that we will see is that of “good phones at every price point”. ARM’s IP scales from very low to very high end mobile SOCs and their partners are taking advantage of the length and breadth of these technologies. High end phones based on custom cores (Apple, Qualcomm) will compete against those licensing the Cortex A72 and A57 parts for their phones. Lower end options that are less expensive and pull less power (which then requires less battery) will flesh out the midrange and budget parts. Unlike several years ago, the products from top to bottom are eminently usable and relatively powerful products.
Camera improvements will also take center stage for many products and continue to be a selling point and an area of differentiation for competitors. Improved sensors and software will obviously be the areas where the ARM partners will focus on, but ARM is putting some work into this area as well. Post processing requires quite a bit of power to do quickly and effectively. ARM is helping here to leverage the Neon SIMD engine and leveraging the power of the Mali GPU.
4K video is becoming more and more common as well with handhelds, and ARM is hoping to leverage that capability in shooting static pictures. A single 4K frame is around 8 megapixels in size. So instead of capturing video, the handheld can achieve a “best shot” type functionality. So the phone captures the 4K video and then users can choose the best shot available to them in that period of time. This is a simple idea that will be a nice feature for those with a product that can capture 4K video.
Subject: General Tech | January 5, 2016 - 04:40 AM | Ken Addison
Tagged: podcast, video, CES, CES 2016, Lenovo, Thinkpad, x1 carbon, x1 yoga, nvidia, pascal, amd, Polaris, FinFET, 14nm
CES 2016 Podcast Day 1 - 01/05/16
CES is just beginning. Join us for announcements from Lenovo, NVIDIA Press Conference, new AMD GPUs and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the iTunes Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Josh Walrath, Allyn Malventano, Ken Addison and Sebastian Peak
Program length: 1:11:05
AMD Polaris Architecture Coming Mid-2016
In early December, I was able to spend some time with members of the newly formed Radeon Technologies Group (RTG), which is a revitalized and compartmentalized section of AMD that is taking over all graphics work. During those meetings, I was able to learn quite a bit about the plans for RTG going forward, including changes for AMD FreeSync and implementation of HDR display technology, and their plans for the GPUOpen open-sourced game development platform. Perhaps most intriguing of all: we received some information about the next-generation GPU architecture, targeted for 2016.
Codenamed Polaris, this new architecture will be the 4th generation of GCN (Graphics Core Next), and it will be the first AMD GPU that is built on FinFET process technology. These two changes combined promise to offer the biggest improvement in performance per watt, generation to generation, in AMD’s history.
Though the amount of information provided about the Polaris architecture is light, RTG does promise some changes to the 4th iteration of its GCN design. Those include primitive discard acceleration, an improved hardware scheduler, better pre-fetch, increased shader efficiency, and stronger memory compression. We have already discussed in a previous story that the new GPUs will include HDMI 2.0a and DisplayPort 1.3 display interfaces, which offer some impressive new features and bandwidth. From a multimedia perspective, Polaris will be the first GPU to include support for h.265 4K decode and encode acceleration.
This slide shows us quite a few changes, most of which were never discussed specifically that we can report, coming to Polaris. Geometry processing and the memory controller stand out as potentially interesting to me – AMD’s Fiji design continues to lag behind NVIDIA’s Maxwell in terms of tessellation performance and we would love to see that shift. I am also very curious to see how the memory controller is configured on the entire Polaris lineup of GPUs – we saw the introduction of HBM (high bandwidth memory) with the Fury line of cards.
Subject: General Tech | January 4, 2016 - 10:48 AM | Jeremy Hellstrom
Tagged: amd, Polaris, FinFET
Ryan's coverage of the new Polaris architecture will be up momentarily but in the meantime you can take a peek at The Tech Report's coverage here. The new architecture will utilize FinFETs of an unspecified process node and is designed to power the new UHD displays and VR headsets due for release over this coming year. Raja Koduri discusses the two major goals of the new architecture, fast pixels and deep pixels. Fast pixels refers to the awe inspiring amount of bandwidth required to draw on UHD displays, twin 4K displays would require addressing 1.8 gigapixels per refresh which would certainly need some fast pixels. Deep pixels refers to improved support for variable refresh rates and likely encompasses support for the new HDR technology we will see appear on the market in the near future. If you can't hold off your curiosity for our coverage you can pop over here.
"AMD will release new Radeons built on its next-gen Polaris architecture in mid-2016. We got an early look at this new architecture and AMD's plans for building these chips with FinFETs last month at the company's Radeon Technologies Group tech summit."
Here is some more Tech News from around the web:
- The AMD Polaris GPU Architecture Preview @ Hardware Canucks
- Steam says a config error and DoS attack caused sensitive data leak @ The Inquirer
- Happy 2016, and here's the year's first ransomware story @ The Register
- Twitter To Revive Politwoops, Archive of Politicians' Deleted Tweets @ Slashdot
- Best Hardware of 2015 - The KitGuru Editorial Awards
- Windows 10 is now running on 10 percent of all PCs and laptops @ The Inquirer
- Netgear ProSAFE XS728T 24-Port 10GbE Ethernet Switch @ techPowerUp
Subject: General Tech | November 12, 2015 - 02:47 PM | Ken Addison
Tagged: podcast, video, qualcomm, snapdragon 820, Lenovo, yoga 900, be quiet!, amd, r9 380x, GLOBALFOUNDRIES, 14nm, FinFET, nvidia, asus, Maximus VIII Extreme, Thrustmaster, T300
PC Perspective Podcast #375 - 11/12/2015
Join us this week as we discuss the Snapdragon 820, Lenovo Yoga 900, R9 380X and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the Store
- RSS - Subscribe through your regular RSS reader
- MP3 - Direct download link to the MP3 file
Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Sebastian Peak
Program length: 1:22:11
Week in Review:
0:29:30 This week’s podcast is brought to you by Casper. Use code PCPER at checkout for $50 towards your order!
News item of interest:
Hardware/Software Picks of the Week:
GPU Enthusiasts Are Throwing a FET
NVIDIA is rumored to launch Pascal in early (~April-ish) 2016, although some are skeptical that it will even appear before the summer. The design was finalized months ago, and unconfirmed shipping information claims that chips are being stockpiled, which is typical when preparing to launch a product. It is expected to compete against AMD's rumored Arctic Islands architecture, which will, according to its also rumored numbers, be very similar to Pascal.
This architecture is a big one for several reasons.
Image Credit: WCCFTech
First, it will jump two full process nodes. Current desktop GPUs are manufactured at 28nm, which was first introduced with the GeForce GTX 680 all the way back in early 2012, but Pascal will be manufactured on TSMC's 16nm FinFET+ technology. Smaller features have several advantages, but a huge one for GPUs is the ability to fit more complex circuitry in the same die area. This means that you can include more copies of elements, such as shader cores, and do more in fixed-function hardware, like video encode and decode.
That said, we got a lot more life out of 28nm than we really should have. Chips like GM200 and Fiji are huge, relatively power-hungry, and complex, which is a terrible idea to produce when yields are low. I asked Josh Walrath, who is our go-to for analysis of fab processes, and he believes that FinFET+ is probably even more complicated today than 28nm was in the 2012 timeframe, which was when it launched for GPUs.
It's two full steps forward from where we started, but we've been tiptoeing since then.
Image Credit: WCCFTech
Second, Pascal will introduce HBM 2.0 to NVIDIA hardware. HBM 1.0 was introduced with AMD's Radeon Fury X, and it helped in numerous ways -- from smaller card size to a triple-digit percentage increase in memory bandwidth. The 980 Ti can talk to its memory at about 300GB/s, while Pascal is rumored to push that to 1TB/s. Capacity won't be sacrificed, either. The top-end card is expected to contain 16GB of global memory, which is twice what any console has. This means less streaming, higher resolution textures, and probably even left-over scratch space for the GPU to generate content in with compute shaders. Also, according to AMD, HBM is an easier architecture to communicate with than GDDR, which should mean a savings in die space that could be used for other things.
Third, the architecture includes native support for three levels of floating point precision. Maxwell, due to how limited 28nm was, saved on complexity by reducing 64-bit IEEE 754 decimal number performance to 1/32nd of 32-bit numbers, because FP64 values are rarely used in video games. This saved transistors, but was a huge, order-of-magnitude step back from the 1/3rd ratio found on the Kepler-based GK110. While it probably won't be back to the 1/2 ratio that was found in Fermi, Pascal should be much better suited for GPU compute.
Image Credit: WCCFTech
Mixed precision could help video games too, though. Remember how I said it supports three levels? The third one is 16-bit, which is half of the format that is commonly used in video games. Sometimes, that is sufficient. If so, Pascal is said to do these calculations at twice the rate of 32-bit. We'll need to see whether enough games (and other applications) are willing to drop down in precision to justify the die space that these dedicated circuits require, but it should double the performance of anything that does.
So basically, this generation should provide a massive jump in performance that enthusiasts have been waiting for. Increases in GPU memory bandwidth and the amount of features that can be printed into the die are two major bottlenecks for most modern games and GPU-accelerated software. We'll need to wait for benchmarks to see how the theoretical maps to practical, but it's a good sign.
Subject: Processors | September 30, 2015 - 09:55 PM | Josh Walrath
Tagged: TSMC, Samsung, FinFET, apple, A9, 16 nm, 14 nm
So the other day the nice folks over at Chipworks got word that Apple was in fact sourcing their A9 SOC at both TSMC and Samsung. This is really interesting news on multiple fronts. From the information gleaned the two parts are the APL0898 (Samsung fabbed) and the APL1022 (TSMC).
These process technologies have been in the news quite a bit. As we well know, it has been a hard time for any foundry to go under 28 nm in an effective way if your name is not Intel. Even Intel has had some pretty hefty issues with their march to sub 32 nm parts, but they have the resources and financial ability to push through a lot of these hurdles. One of the bigger problems that affected the foundries was the idea that they could push back FinFETs beyond what they were initially planning. The idea was to hit 22/20 nm and use planar transistors and push development back to 16/14 nm for FinFET technology.
The Chipworks graphic that explains the differences between Samsung's and TSMC's A9 products.
There were many reasons why this did not work in an effective way for the majority of products that the foundries were looking to service with a 22/20 nm planar process. Yes, there were many parts that were fabricated using these nodes, but none of them were higher power/higher performance parts that typically garner headlines. No CPUs, no GPUs, and only a handful of lower power SOCs (most notably Apple's A8, which was around 89 mm squared and consumed up to 5 to 10 watts at maximum). The node just did not scale power very effectively. It provided a smaller die size, but it did not increase power efficiency and switching performance significantly as compared to 28 nm high performance nodes.
The information Chipworks has provided also verifies that Samsung's 14 nm FF process is more size optimized than TSMC's 16 nm FF. There was originally some talk about both nodes being very similar in overall transistor size and density, but Samsung has a slightly tighter design. Neither of them are smaller than Intel's latest 14 nm which is going into its second generation form. Intel still has a significant performance and size advantage over everyone else in the field. Going back to size we see the Samsung chip is around 96 mm square while the TSMC chip is 104.5 mm square. This is not huge, but it does show that the Samsung process is a little tighter and can squeeze more transistors per square mm than TSMC.
In terms of actual power consumption and clock scaling we have nothing to go on here. The chips are both represented in the 6S and 6S+. Testing so far has not shown there to be significant differences between the two SOCs so far. In theory one could be performing better than the other, but in reality we have not tested these chips at a low enough level to discern any major performance or power issue. My gut feeling here is that Samsung's process is more mature and running slightly better than TSMC's, but the differences are going to be minimal at best.
The next piece of info that we can glean from this is that there just isn't enough line space for all of the chip companies who want to fabricate their parts with either Samsung or TSMC. From a chip standpoint a lot of work has to be done to port a design to two different process nodes. While 14 and 16 are similar in overall size and the usage of FinFETS, the standard cells and design libraries for both Samsung and TSMC are going to be very different. It is not a simple thing to port over a design. A lot of work has to be done in the design stage to make a chip work with both nodes. I can tell you that there is no way that both chips are identical in layout. It is not going to be a "dumb port" where they just adjust the optics with the same masks and magically make these chips work right off the bat. Different mask sets for each fab, verification of both designs, and troubleshooting the yields by metal layer changes will be different for each manufacturer.
In the end this means that there just simply was not enough space at either TSMC or Samsung to handle the demand that Apple was expecting. Because Apple has deep pockets they contracted out both TSMC and Samsung to produce two very similar, but still different parts. Apple also likely outbid and locked down what availability to process wafers that Samsung and TSMC have, much to the dismay of other major chip firms. I have no idea what is going on in the background with people like NVIDIA and AMD when it comes to line space for manufacturing their next generation parts. At least for AMD it seems that their partnership with GLOBALFOUNDRIES and their version of 14 nm FF is having a hard time taking off. Eventually more space will be made in production and yields and bins will improve. Apple will stop taking up so much space and we can get other products rolling off the line. In the meantime, enjoy that cutting edge iPhone 6S/+ with the latest 14/16 nm FF chips.
Process Technology Overview
We have been very spoiled throughout the years. We likely did not realize exactly how spoiled we were until it became very obvious that the rate of process technology advances hit a virtual brick wall. Every 18 to 24 months we were treated to a new, faster, more efficient process node that was opened up to fabless semiconductor firms and we were treated to a new generation of products that would blow our hair back. Now we have been in a virtual standstill when it comes to new process nodes from the pure-play foundries.
Few expected the 28 nm node to live nearly as long as it has. Some of the first cracks in the façade actually came from Intel. Their 22 nm Tri-Gate (FinFET) process took a little bit longer to get off the ground than expected. We also noticed some interesting electrical features from the products developed on that process. Intel skewed away from higher clockspeeds and focused on efficiency and architectural improvements rather than staying at generally acceptable TDPs and leapfrogging the competition by clockspeed alone. Overclockers noticed that the newer parts did not reach the same clockspeed heights as previous products such as the 32 nm based Sandy Bridge processors. Whether this decision was intentional from Intel or not is debatable, but my gut feeling here is that they responded to the technical limitations of their 22 nm process. Yields and bins likely dictated the max clockspeeds attained on these new products. So instead of vaulting over AMD’s products, they just slowly started walking away from them.
Samsung is one of the first pure-play foundries to offer a working sub-20 nm FinFET product line. (Photo courtesy of ExtremeTech)
When 28 nm was released the plans on the books were to transition to 20 nm products based on planar transistors, thereby bypassing the added expense of developing FinFETs. It was widely expected that FinFETs were not necessarily required to address the needs of the market. Sadly, that did not turn out to be the case. There are many other factors as to why 20 nm planar parts are not common, but the limitations of that particular process node has made it a relatively niche process node that is appropriate for smaller, low power ASICs (like the latest Apple SOCs). The Apple A8 is rumored to be around 90 mm square, which is a far cry from the traditional midrange GPU that goes from 250 mm sq. to 400+ mm sq.
The essential difficulty of the 20 nm planar node appears to be a lack of power scaling to match the increased transistor density. TSMC and others have successfully packed in more transistors into every square mm as compared to 28 nm, but the electrical characteristics did not scale proportionally well. Yes, there are improvements there per transistor, but when designers pack in all those transistors into a large design, TDP and voltage issues start to arise. As TDP increases, it takes more power to drive the processor, which then leads to more heat. The GPU guys probably looked at this and figured out that while they can achieve a higher transistor density and a wider design, they will have to downclock the entire GPU to hit reasonable TDP levels. When adding these concerns to yields and bins for the new process, the advantages of going to 20 nm would be slim to none at the end of the day.