Subject: Editorial | February 9, 2017 - 06:59 PM | Josh Walrath
Tagged: TSMC, Samsung, Results, quadro, Q4, nvidia, Intel, geforce, Drive PX2, amd, 2017, 2016
It is most definitely quarterly reports time for our favorite tech firms. NVIDIA’s reporting is unique in its fiscal vs. calendar year as compared to how AMD and Intel report. This dates back to when NVIDIA had its initial public offering and set its fiscal quarters ahead of the calendar by quite a few months. So when NVIDIA announces Q4 FY2017, it is actually reflecting the Q4 period of calendar 2016. Clear as mud?
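The offset can be sketched in code. This is a rough illustration only: it assumes the fiscal year ends at the end of January (NVIDIA's actual fiscal calendar ends on the last Sunday of January, so dates within a few days of a boundary may land in the wrong quarter):

```python
from datetime import date

def nvidia_fiscal_quarter(d: date) -> tuple[int, int]:
    """Approximate NVIDIA fiscal (year, quarter) for a calendar date.

    Assumes fiscal year N covers roughly Feb (N-1) through Jan (N).
    """
    # Shift the calendar back two months so Feb..Jan maps onto
    # month indices 0..11, then the fiscal year label is one ahead.
    shifted_month = d.month - 2
    shifted_year = d.year
    if shifted_month < 0:
        shifted_month += 12
        shifted_year -= 1
    fiscal_year = shifted_year + 1
    fiscal_quarter = shifted_month // 3 + 1
    return fiscal_year, fiscal_quarter
```

So a date in November 2016 lands in Q4 of fiscal 2017, which is exactly the labeling quirk described above.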
Semantics aside, NVIDIA had a record quarter. Revenue was an impressive $2.173 billion US, up slightly more than $700 million from the previous Q4. NVIDIA has shown amazing growth during this time, attributed to several factors. Net income (GAAP) came in at $655 million, a tremendous amount of profit for a company that took in just over $2 billion in revenue. We can compare this to AMD’s results from two weeks ago: $1.11 billion in revenue and a loss of $51 million for the quarter. Consider that AMD provides CPUs, chipsets, and GPUs to the market and is the #2 x86 manufacturer in the world.
The yearly results were just as impressive. FY 2017 featured record revenue and net income. Revenue was $6.91 billion as compared to $5 billion for FY 2016. Net income for the year was $1.666 billion, up from $614 million in FY 2016. The growth for the entire year is astounding; the company has not seen an expansion like this since the early 2000s.
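For those keeping score, the growth rates implied by those figures work out like this (a quick back-of-the-envelope calculation using only the numbers reported above):

```python
# Figures as reported (GAAP, billions of USD)
rev_fy2017, rev_fy2016 = 6.91, 5.00
ni_fy2017, ni_fy2016 = 1.666, 0.614

# Year-over-year growth, as percentages
rev_growth_pct = (rev_fy2017 / rev_fy2016 - 1) * 100   # roughly 38%
ni_growth_pct = (ni_fy2017 / ni_fy2016 - 1) * 100      # roughly 171%
```

Net income growing more than four times as fast as revenue is what makes the year stand out.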
The core strength of the company continues to be gaming. Gaming GPUs and products provided $1.348 billion in revenue by themselves. Since the manufacturing industry was unable to provide a usable 20 nm planar process for large, complex ASICs, companies such as NVIDIA and AMD were forced to innovate in design to create new products with greater feature sets and performance, all while still using the same 28 nm process as previous products. Typically, process shrinks accounted for the majority of improvements (more transistors packed into a smaller area with corresponding switching speed increases). Many users kept cards that were several years old because there was no huge impetus to upgrade. With the arrival of the 14 nm and 16 nm processes from Samsung and TSMC respectively, users suddenly had a very significant reason to upgrade. NVIDIA was able to address the entire market from high to low with their latest GTX 10x0 series of products. AMD, on the other hand, only had new products that hit the midrange and budget markets.
The next biggest area for NVIDIA is the datacenter. This has seen tremendous growth as compared to the other markets (except of course gaming) that NVIDIA covers. It has gone from around $97 million in Q4 FY2016 up to $296 million this last quarter. Tripling revenue in any one area is rare; gaming “only” about doubled during this same time period. Deep learning and AI are two areas that require this type of compute power, and NVIDIA was able to deliver a comprehensive software stack, as well as strategic partnerships that provided turnkey solutions for end users.
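The “tripling” claim checks out directly from the reported figures:

```python
# Datacenter revenue, millions of USD, as reported
dc_prev, dc_now = 97, 296

dc_factor = dc_now / dc_prev   # just over 3x, i.e. roughly tripled
```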
After datacenter we still have the visualization market based on the Quadro products. This area has not seen the dramatic growth of other parts of the company, but it remains a solid foundation and a good money maker for the firm. The Quadro products continue to be improved upon, and software support continues to grow.
One area that promises to really explode in the next three to four years is the automotive sector. The Drive PX2 system is being integrated into a variety of cars, and NVIDIA is focused on providing a solid and feature packed solution for manufacturers. Auto-pilot and “co-pilot” modes will become more and more important in upcoming models and should reach wide availability by 2020, if not a little sooner. NVIDIA is working with some of the biggest names in the industry among both automakers and parts suppliers. BMW should release a fully automated driving system later this year with their i8 series. Audi also has higher end cars in the works that will utilize NVIDIA hardware and fully automated operation. If NVIDIA continues to expand here, eventually this could become as significant a source of income as gaming is today.
There was one bit of bad news from the company. Revenue from their OEM & IP division has dropped over the past several quarters. NVIDIA announced that the IP licensing agreement with Intel would end this quarter and would not be renewed. We know that AMD has entered into an agreement with Intel to provide graphics IP to the company in future parts and to cover Intel in potential licensing litigation. This was a fair amount of money per quarter for NVIDIA, but their other divisions more than made up for the loss of this particular income.
NVIDIA certainly seems to be hitting on all cylinders and is growing into markets that were unavailable to it five to ten years ago. The company is spreading out its financial base so as to avoid the boom and bust cycles of any one industry. Next quarter NVIDIA expects revenue to be down seasonally into the $1.9 billion range. Even though that number is down, it would still represent the third-highest quarterly revenue in the company’s history.
Subject: Processors | December 8, 2016 - 09:00 AM | Josh Walrath
Tagged: Xilinx, TSMC, standard cells, layout, FinFET, EDA, custom cell, arm, 7nm
Today ARM is announcing their partnership with Xilinx to deliver design solutions for their products on TSMC’s upcoming 7nm process node. ARM has previously partnered with Xilinx on other nodes including 28, 20, and 16nm. Their partnership extends into design considerations to improve the time to market of complex parts and to rapidly synthesize new designs for cutting edge process nodes.
Xilinx is licensing the latest ARM Artisan Physical IP platform for TSMC’s 7nm. Artisan Physical IP is a set of tools to help rapidly roll out complex designs, compared to the longer timelines previous generations of products faced. ARM has specialized libraries and tools to help implement these designs on a variety of processes and achieve good results even on the shortest possible design schedules.
Design relies on two basic methodologies: custom cell and standard cell. Custom cell design allows for a tremendous amount of flexibility in layout and electrical characteristics, but it requires a lot of man-hours to complete even the simplest logic. Custom cell designs typically draw less power and provide higher clockspeeds than standard cell designs. Standard cells are like Legos in that the cells can be quickly laid out to create complex logic, and EDA (Electronic Design Automation) software can quickly place and route them. GPUs lean heavily on standard cells and EDA software to get highly complex products out to market quickly.
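As a toy illustration of the standard cell idea: a design becomes a bag of pre-characterized cells from a library, and first-order area and leakage estimates fall out of simple summation, which is exactly what makes the approach so fast to automate. The cell names and numbers below are invented for illustration and do not come from any real cell library:

```python
# Hypothetical standard-cell library: name -> (area in um^2, leakage in nW).
# Values are made up for illustration only.
CELL_LIBRARY = {
    "NAND2": (0.5, 1.2),
    "NOR2":  (0.6, 1.4),
    "INV":   (0.3, 0.7),
    "DFF":   (2.1, 4.5),
}

def estimate(design: dict[str, int]) -> tuple[float, float]:
    """Sum total area and leakage for a netlist given as {cell: count}."""
    area = sum(CELL_LIBRARY[c][0] * n for c, n in design.items())
    leakage = sum(CELL_LIBRARY[c][1] * n for c, n in design.items())
    return area, leakage

# A toy "design": a few gates plus some flip-flops.
area, leakage = estimate({"NAND2": 100, "INV": 40, "DFF": 16})
```

A fully custom block, by contrast, has no such library to lean on; every polygon is hand-crafted, which is where the man-hours go.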
These two basic methods have netted good results over the years, but during that time implementations of standard cells have become more custom in how they behave. While not achieving full custom performance, these semi-custom endeavors have achieved appreciable gains without requiring the man-hours of a fully custom design.
In this particular case ARM is achieving solid power and speed results through automated design that improves upon standard cells, but without the downsides of a fully custom part: better performance without the extra power draw of a traditional standard cell implementation. ARM further improves upon this with the ARM Artisan Power Grid Architect (PGA), which simplifies the development of the complex power grid that services a large chip.
We have seen these types of advancements in the GPU world that NVIDIA and AMD enjoy talking about. A better power grid allows the ASIC to perform at lower power envelopes due to less impedance. The GPU guys have also utilized high density libraries to pack in the transistors as tightly as possible, using less space and increasing spatial efficiency. A smaller chip that requires less power is always a positive development over a larger chip of the same capabilities that requires more. ARM looks to be doing their own version of these technologies and applying them to TSMC’s upcoming 7nm FinFET process.
TSMC is not releasing this process to mass production until at least 2018. In 1H 2017 we will see some initial test and early production runs for a handful of partners, with full blown 7nm production following in 2018. Early runs on a new node are increasingly dominated by low power devices. Looking back at the 20/16/14 nm processes, they were initially used for designs that do not require a lot of power and run at moderate clockspeeds. Since the introduction of sub-28nm process nodes, the complexity of the design, process steps, materials, and libraries has pushed the higher performance, power hungry parts to a secondary position while the foundries get these next generation nodes up to speed. It isn’t until many months of these low power parts have been pushed through that we see the adjustments and improvements needed for the higher power and clockspeed requirements of products like desktop CPUs and GPUs.
ARM is certainly being much more aggressive in addressing next generation nodes and pushing their cutting edge products on them to allow for far more powerful mobile products that also exhibit improved battery life. This step with 7nm and Xilinx will provide a lot of data to ARM and its partners downstream when the time comes to implement new designs. Artisan will continue to evolve to allow partners to quickly and efficiently introduce new products on new nodes to the market at an accelerated rate as compared to years past.
It always feels a little odd covering NVIDIA’s quarterly earnings due to how they present their financial calendar. No, we are not reporting from the future. Yes, it can be confusing when comparing results and getting your dates mixed up. Calendar quirks aside, NVIDIA did exceptionally well in a quarter that is typically the second weakest after Q1.
NVIDIA reported revenue of $1.43 billion. This is a jump from an already strong Q1 where they took in $1.30 billion. Compare this to the $1.027 billion of competitor AMD, which provides CPUs as well as GPUs. NVIDIA sold a lot of GPUs as well as other products. Their primary money makers were the consumer GPUs and the professional and compute markets, on which they have a virtual stranglehold at the moment. The company’s GAAP net income is a very respectable $253 million.
The release of the latest Pascal based GPUs was the primary mover for the gains this latest quarter. AMD has had a hard time competing with NVIDIA for marketshare. The older Maxwell based chips performed well against the entire line of AMD offerings and typically did so with better power and heat characteristics. Even though the GTX 970 was somewhat limited in its memory configuration as compared to the AMD products (3.5 GB + 0.5 GB vs. a full 4 GB implementation), it was a top seller in its class. The same could be said for the products up and down the stack.
Pascal was released at the end of May, but the company had already been shipping chips to its partners as well as building the “Founder’s Edition” models to its exacting specifications. These were strong sellers from the end of May through the end of the quarter. NVIDIA recently unveiled their latest Pascal based Quadro cards, but we do not know how much of an impact those have had on this quarter. NVIDIA has also been shipping, in very limited quantities, the Tesla P100 based units to select customers and outfits.
Subject: General Tech | August 8, 2016 - 11:06 PM | Tim Verry
Tagged: xbox one s, xbox one, TSMC, microsoft, console, 16nm
Microsoft recently unleashed a smaller version of its gaming console in the form of the Xbox One S. The new "S" variant packs an internal power supply, 4K Blu-ray optical drive, and a smaller (die shrunk) AMD SoC into a 40% smaller package. The new console is clad in all white with black accents and a circular vent on the left half of the top. A USB port and pairing button have been added to the front, and the power and eject buttons are now physical rather than capacitive (touch sensitive).
Rear I/O remains similar to the original console and includes a power input, two HDMI ports (one input, one output), two USB 3.0 ports, one Ethernet, one S/PDIF audio out, and one IR out port. There is no need for the power brick anymore though as the power supply is now internal. Along with being 40% smaller, it can now be mounted vertically using an included stand. While there is no longer a dedicated Kinect port, it is still possible to add a Kinect to your console using an adapter.
The internal specifications of the Xbox One S remain consistent with the original Xbox One console except that it will now be available in a 2TB model. The gaming console is powered by a nearly identical processor that is now 35% smaller thanks to being manufactured on a smaller 16nm FinFET process node at TSMC. While the chip is more power efficient, it still features the same eight Jaguar CPU cores clocked at 1.75 GHz and a 12 CU graphics portion (768 stream processors). Microsoft and AMD now support HDR and 4K resolutions and upscaling with the new chip. The graphics portion is where the new Xbox One S gets a bit interesting, because it appears that Microsoft has given the GPU a bit of an overclock to 914 MHz. Compared to the original Xbox One's 853 MHz, this is a 7.1% increase in clockspeed. The increased GPU clock also results in increased bandwidth for the ESRAM (204 GB/s on the original Xbox One versus 219 GB/s on the Xbox One S).
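Those two figures line up with simple scaling math, since the ESRAM bandwidth moves linearly with the GPU clock:

```python
# GPU clocks in MHz: original Xbox One vs. Xbox One S
base_mhz, s_mhz = 853, 914

clock_gain_pct = (s_mhz / base_mhz - 1) * 100   # just over 7%

# ESRAM bandwidth scales linearly with the GPU clock
esram_bw_s = 204 * s_mhz / base_mhz             # about 219 GB/s
```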
According to Microsoft, the increased GPU clockspeeds were necessary to render non-HDR versions of games for Game DVR, Game Streaming, and real time screenshots. A nice side benefit is that the extra performance can result in improved gameplay in certain titles. In Digital Foundry's testing, Richard Leadbetter found this to be especially true in games with unlocked frame rates or in games that are locked to 30 FPS but where the original console could not hit 30 FPS consistently. The increased clocks can be felt in slightly smoother gameplay and less screen tearing. For example, they found that the Xbox One S got up to 11% higher frame rates in Project Cars (47 FPS versus 44) and between 6% and 8% in Hitman. Further, they found that the higher clocks help performance when playing Xbox 360 games such as Alan Wake's American Nightmare on the Xbox One in backwards compatibility mode.
The 2TB Xbox One S is available now for $400 while the 1TB ($350) and 500GB ($300) versions will be available on the 23rd. For comparison, the 500GB Xbox One (original) is currently $250. The Xbox One 1TB game console varies in price depending on game bundle.
What are your thoughts on the smaller console? While the ever so slight performance boost is a nice bonus, I definitely don't think that it is worth specifically upgrading for if you already have an Xbox One. If you have been holding off, now is the time to get a discounted original or the smaller S version though! If you are hoping for more performance, definitely wait for Microsoft's Scorpio project or its competitor the PlayStation 4 Neo (or even better a gaming PC right!? hehe).
I do know that Ryan has gotten his hands on the slimmer Xbox One S, so hopefully we will see some testing of our own as well as a teardown (hint, hint!).
- Xbox One Teardown - Microsoft still hates you
- PC vs. PS4 vs. Xbox One Hardware Comparison: Building a Competing Gaming PC
- Sony PS4 and Microsoft Xbox One Already Hitting a Performance Wall
- Tech Interview: Inside Xbox One S @ Eurogamer
New Products for 2017
PC Perspective was invited to Austin, TX on May 11 and 12 to participate in ARM’s yearly tech day. Also invited were a handful of editors and analysts that cover the PC and mobile markets. Those folks were all pretty smart, so it is confusing as to why they invited me. Perhaps word of my unique talent of screenshotting PDFs into near-unreadable JPGs preceded me? Regardless of the reason, I was treated to two full days of in-depth discussion of the latest generation of CPU and GPU cores, 10nm test chips, and information on new licensing options.
Today ARM is announcing their next CPU core with the introduction of the Cortex-A73. They are also unwrapping the latest Mali-G71 graphics technology. Other technologies such as the CCI-550 interconnect are also being revealed. It is a busy and important day for ARM, especially in light of Intel seemingly abandoning the low-power mobile market.
ARM previously announced the Cortex-A72 in February, 2015. Since that time it has been seen in most flagship mobile devices in late 2015 and throughout 2016. The market continues to evolve, and as such the workloads and form factors have pushed ARM to continue to develop and improve their CPU technology.
The Sofia Antipolis, France design group is behind the new A73; the previous several core architectures had been developed by the Cambridge group. As such, the new design differs quite dramatically from the previous A72. I was actually somewhat taken aback by the differences in design philosophy between the two groups and the changes from the A72 to the A73, but the generational jumps we have seen in the past now make a bit more sense to me.
The marketplace is constantly changing when it comes to workloads and form factors. More and more complex applications are being ported to mobile devices, including hot technologies like AR and VR. Other technologies include 3D/360 degree video, greater than 20 MP cameras, and 4K/8K displays and their video playback formats. Form factors on the other hand have continued to decrease in size, especially in overall height. We have relatively large screens on most premium devices, but the designers have continued to make these phones thinner and thinner throughout the years. This has put a lot of pressure on ARM and their partners to increase performance while keeping TDPs in check, and even reducing them so they more adequately fit in the TDP envelope of these extremely thin devices.
Subject: Graphics Cards | May 18, 2016 - 12:49 PM | Josh Walrath
Tagged: nvidia, pascal, gtx 1070, 1070, gtx, GTX 1080, 16nm FF+, TSMC, Founder's Edition
Several weeks ago when NVIDIA announced the new GTX 1000 series of products, we were given a quick glimpse of the GTX 1070. This upper-midrange card is to carry a $379 price tag in retail form while the "Founder's Edition" will hit the $449 mark. Today NVIDIA released the full specifications of this card on their website.
Interest in the GTX 1070 is incredibly high because of the potential performance of this card vs. the previous generation. Price is also a big consideration here, as it is far easier to raise $379 than it is to make the jump to the GTX 1080 and shell out $599 once non-Founder's Edition cards are released. The GTX 1070 has all of the same features as the GTX 1080, but it takes a hit when it comes to clockspeed and shader units.
The GTX 1070 is a Pascal based part fabricated on TSMC's 16nm FF+ node. It shares the same overall transistor count as the GTX 1080, but it is partially disabled. The GTX 1070 contains 1920 CUDA cores as compared to the 2560 cores of the 1080; essentially one full GPC is disabled to reach that number. The clockspeeds take a hit as well compared to the full GTX 1080, but the base clock for the 1070 is still an impressive 1506 MHz and boost reaches 1683 MHz. This combination of shader count and clockspeed should make it a little bit faster than the older GTX 980 Ti. The rated TDP for the card is 150 watts with a single 8 pin PCI-E power connector, which means there should be some decent headroom when it comes to overclocking this card. Due to binning and yields, we may not see 2+ GHz overclocks with these cards, especially if NVIDIA cut down the power delivery system as compared to the GTX 1080. Time will tell on that one.
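The core count math works out cleanly if we assume the full GP104's 2560 cores are split evenly across its four GPCs:

```python
# Full GP104 as used in the GTX 1080
gp104_cores, gpcs = 2560, 4

cores_per_gpc = gp104_cores // gpcs             # 640 CUDA cores per GPC
gtx1070_cores = gp104_cores - cores_per_gpc     # 1920 with one GPC disabled
```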
The memory technology that NVIDIA is using for this card is not the cutting edge GDDR5X or HBM, but rather the tried and true GDDR5. 8 GB of this memory sits on a 256 bit bus, but it is running at a very, very fast 8 Gbps. This gives overall bandwidth in the 256 GB/sec region. When we combine this figure with the memory compression techniques implemented in the Pascal architecture, we can see that the GTX 1070 will not be bandwidth starved. We have no information on whether this generation of products will mirror what we saw with the previous generation GTX 970 in terms of disabled memory controllers and the 3.5 GB/500 MB memory split due to that unique memory subsystem.
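The bandwidth figure follows from the usual formula: bus width (converted from bits to bytes) times per-pin data rate. A quick sketch:

```python
bus_width_bits = 256
data_rate_gbps = 8   # per-pin transfer rate of the GDDR5

# Divide by 8 to convert bits to bytes: 256 * 8 / 8 = 256 GB/s
bandwidth_gb_s = bus_width_bits * data_rate_gbps / 8
```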
Beyond those things, the GTX 1070 is identical to the GTX 1080 in terms of DirectX features, display specifications, decoding support, double bandwidth SLI, etc. There is an obvious amount of excitement for this card considering its potential performance and price point. These should be available in the Founder's Edition release on June 10 for the $449 MSRP. I know many people are considering using these cards in SLI to deliver performance for half the price of last year's GTX 980 Ti. From all indications, these cards will be a significant upgrade for anyone using GTX 970s in SLI. With greater access to monitors that hit 4K, as well as Surround Gaming, this could be a solid purchase for anyone looking to step up their game in these scenarios.
10nm Sooner Than Expected?
It seems only yesterday that we had the first major GPU released on 16nm FF+, and now we are talking about ARM being about to receive their first 10nm FF test chips! Well, in fact it was yesterday that NVIDIA formally released performance figures on the latest GeForce GTX 1080, which is based on TSMC’s 16nm FF+ process technology. Currently TSMC is going full bore on their latest process node and producing the fastest current graphics chip around. It has taken the foundry industry as a whole a lot longer to develop FinFET technology than expected, but now that they have that piece of the puzzle seemingly mastered, they are moving to new process nodes at an accelerated rate.
TSMC’s 10nm FF is not well understood by press and analysts yet, but we gather that it is more of a marketing term than a true drop to 10 nm features. Intel has yet to get past 14nm and does not expect 10 nm production until well into next year. TSMC is promising their version in the second half of 2016. We cannot assume that TSMC’s version will match what Intel will be doing in terms of geometries and electrical characteristics, but we do know that it is a step past TSMC’s 16nm FF products. Lithography will likely get a boost with triple patterning exposure. My guess is that the back end will also move away from the “20nm metal” stages that we see with 16nm. All in all, it should be an improved product from what we see with 16nm, but time will tell if it can match the performance and density of competing lines that bear the 10nm name from Intel, Samsung, and GLOBALFOUNDRIES.
ARM has a history of porting their architectures to new process nodes, but they are being a bit more aggressive here than we have seen in the past. It used to be that ARM would announce a new core or technology, and it would take up to two years for it to be introduced into the market. Now we are seeing technology announcements and actual products hitting the scene about nine months later. With the mobile market continuing to grow, we expect products to be brought to market even more quickly.
The company designed a simplified test chip to tape out and send to TSMC for test production on the aforementioned 10nm FF process. The chip was taped out in December, 2015. The design was shipped to TSMC for mask production and wafer starts. ARM is expecting the finished wafers to arrive this month.
Subject: Processors, Mobile | May 9, 2016 - 01:42 PM | Scott Michaud
Tagged: apple, a11, 10nm, TSMC
Before I begin, the report comes from DigiTimes and they cite anonymous sources for this story. As always, a grain of salt is required when dealing with this level of alleged leak.
That out of the way, rumor has it that Apple's A11 SoC has been taped out on TSMC's 10nm process node. This is still a little ways away from production, however. From here, TSMC should be providing samples of the now finalized chip in Q1 2017, starting production a few months later, and landing in iOS devices somewhere in Q3/Q4. Knowing Apple, that will probably align with their usual release schedule -- around September.
DigiTimes also reports that Apple will likely make their split-production idea a recurring habit. Currently, the A9 processor is fabricated at TSMC and Samsung on two different process nodes (16nm for TSMC and 14nm for Samsung). They claim that two-thirds of A11 chips will come from TSMC.
Subject: Processors | March 15, 2016 - 12:52 PM | Sebastian Peak
Tagged: TSMC, SoC, servers, process technology, low power, FinFET, datacenter, cpu, arm, 7nm, 7 nm FinFET
ARM and TSMC have announced their collaboration on 7 nm FinFET process technology for future SoCs. Under a multi-year agreement between the companies, products produced on this 7 nm FinFET process are intended to expand ARM’s reach “beyond mobile and into next-generation networks and data centers”.
TSMC Headquarters (Image credit: AndroidHeadlines)
So when can we expect to see 7nm SoCs on the market? The report from The Inquirer offers this quote from TSMC:
“A TSMC spokesperson told the INQUIRER in a statement: ‘Our 7nm technology development progress is on schedule. TSMC's 7nm technology development leverages our 10nm development very effectively. At the same time, 7nm offers a substantial density improvement, performance improvement and power reduction from 10nm’.”
Full press release after the break.
Subject: Processors, Mobile | February 22, 2016 - 11:11 AM | Sebastian Peak
Tagged: TSMC, SoC, octa-core, MWC 2016, MWC, mediatek, Mali-T880, LPDDR4X, Cortex-A53, big.little, arm
MediaTek might not be well-known in the United States, but the company has been working to expand from China, where it had a 40% market share as of June 2015, into the global market. While 2015 saw the introduction of the 8-core Helio P10 and the 10-core Helio X20 SoCs, the company continues to expand their lineup, today announcing the Helio P20 SoC.
There are a number of differences between the recent SoCs from MediaTek, beginning with the CPU core configuration. This new Helio P20 is a “True Octa-Core” design, but rather than a big.LITTLE configuration it’s using 8 identically-clocked ARM Cortex-A53 cores at 2.3 GHz. The previous Helio P10 used a similar CPU configuration, though clocks were limited to 2.0 GHz with that SoC. Conversely, the 10-core Helio X20 uses a tri-cluster configuration, with 2x ARM Cortex-A72 cores running at 2.5 GHz, along with a typical big.LITTLE arrangement (4x Cortex-A53 cores at 2.0 GHz and 4x Cortex-A53 cores at 1.4 GHz).
Another change affecting MediaTek’s new SoC and the industry at large is the move to smaller process nodes. The Helio P10 was built on 28 nm HPM, and this new P20 moves to 16 nm FinFET. Just as with the Helio P10 and Helio X20 (a 20 nm part), this SoC is produced at TSMC, in this case using their 16FF+ (FinFET Plus) technology. This should provide up to “40% higher speed and 60% power saving” compared to the 20 nm process found in the Helio X20, though of course real-world results will have to wait until handsets are available to test.
The Helio P20 also takes advantage of LPDDR4X, and is “the world’s first SoC to support low power double data rate random access memory” according to MediaTek. The company says this new memory provides “70 percent more bandwidth than the LPDDR3 and 50 percent power savings by lowering supply voltage to 0.6v”. Graphics are powered by ARM’s high-end Mali T880 GPU, clocked at an impressive 900 MHz. And all-important modem connectivity includes CAT6 LTE with 2x carrier aggregation for speeds of up to 300 Mbps down, 50 Mbps up. The Helio P20 also supports up to 4K/30 video decode with H.264/265 support, and the 12-bit dual camera ISP supports up to 24 MP sensors.
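As a rough first-order sanity check on the power claim: dynamic switching power scales with the square of the supply voltage (the CV²f model), so halving the I/O rail from an assumed 1.2 V (a typical LPDDR3 figure) to 0.6 V would cut I/O switching power to roughly a quarter. MediaTek's quoted 50% saving covers the whole memory subsystem, of which the I/O interface is only one part, so the numbers are not in conflict:

```python
# I/O supply voltages in volts; the LPDDR3 value is an assumption here
v_lpddr3, v_lpddr4x = 1.2, 0.6

# First-order model: dynamic power is proportional to V^2
io_power_ratio = (v_lpddr4x / v_lpddr3) ** 2   # 0.25, i.e. ~75% I/O saving
```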
Specs from MediaTek:
- Process: 16nm
- Apps CPU: 8x Cortex-A53, up to 2.3GHz
- Memory: Up to 2 x LPDDR4X 1600MHz (up to 6GB) + 1x LPDDR3 933MHz (up to 4GB) + eMMC 5.1
- Camera: Up to 24MP at 24FPS w/ZSD, 12bit Dual ISP, 3A HW engine, Bayer & Mono sensor support
- Video Decode: Up to 4Kx2K 30fps H.264/265
- Video Encode: Up to 4Kx2K 30fps H.264
- Graphics: Mali T-880 MP2 900MHz
- Display: FHD 1920x1080 60fps. 2x DSI for dual display
- Modem: LTE FDD TDD R.11 Cat.6 with 2x20 CA. C2K SRLTE. L+W DSDS support
- Connectivity: WiFiac/abgn (with MT6630). GPS/Glonass/Beidou/BT/FM.
- Audio: 110db SNR & -95db THD
It’s interesting to see SoC makers experiment with less complex CPU designs after a generation of multi-cluster (big.LITTLE) SoCs, as even the current flagship Qualcomm SoC, the Snapdragon 820, has reverted to a straight quad-core design. The P20 is expected to be in shipping devices by the second half of 2016, and we will see how this configuration performs once some devices using this new P20 SoC are in the wild.
Full press release after the break: