Subject: Processors | February 21, 2017 - 10:54 AM | Sebastian Peak
Tagged: ryzen, rumor, report, R7, processor, leak, IPC, cpu, Cinebench, benchmark, amd, 1700X
The Ryzen 7 1700X is reportedly an 8-core/16-thread processor with a base clock speed of 3.40 GHz, and while overall performance from the leaked benchmarks looks very impressive, it is the single-threaded score from the Cinebench R15 run pictured which really makes this CPU look like major competition for Intel with IPC.
An overall score of 1537 is outstanding, placing the CPU almost even with the i7-6900K at 1547 based on results from AnandTech:
Image credit AnandTech
And the single-threaded performance score of the reported Ryzen 7 1700X is 154, which places it above the i7-6900K's score of 153. (It is worth noting that Cinebench R15 shows a clock speed of 3.40 GHz for this CPU, which is the base, while CPU-Z is displaying 3.50 GHz - likely indicating a boost clock, which can reportedly surpass 3.80 GHz with this CPU.)
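To put the quoted Cinebench numbers side by side, here is a quick sketch that works out the percentage gaps and the ratio of multi-threaded to single-threaded score for each chip. It uses only the four scores quoted above; the helper names are made up for illustration, and since the Ryzen figures come from an unverified leak the output is only as good as those inputs.

```python
# Cinebench R15 scores quoted in the post
# (leaked Ryzen 7 1700X run vs. AnandTech's i7-6900K result).
scores = {
    "Ryzen 7 1700X (leak)": {"multi": 1537, "single": 154},
    "Core i7-6900K": {"multi": 1547, "single": 153},
}


def percent_diff(a, b):
    """Return how much larger a is than b, in percent (negative if smaller)."""
    return (a - b) / b * 100.0


def scaling(entry):
    """Multi-threaded score divided by single-threaded score."""
    return entry["multi"] / entry["single"]


ryzen = scores["Ryzen 7 1700X (leak)"]
intel = scores["Core i7-6900K"]

multi_gap = percent_diff(ryzen["multi"], intel["multi"])
single_gap = percent_diff(ryzen["single"], intel["single"])

print(f"Multi-threaded gap:  {multi_gap:+.2f}%")
print(f"Single-threaded gap: {single_gap:+.2f}%")
for name, entry in scores.items():
    print(f"{name}: {scaling(entry):.2f}x from one thread to all threads")
```

Both gaps come out well under one percent, which is within the run-to-run variation you would expect from a single benchmark pass, so "almost even" is about as strong a statement as these numbers support.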
Other results from the reported leak include 3DMark Fire Strike, with a physics score of 17,916 with Ryzen 7 1700X clocking in at ~3.90 GHz:
We will know soon enough where this and other Ryzen processors stand relative to Intel's current offerings, and if Intel will respond to the (rumored) price/performance double whammy of Ryzen. An i7-6900K retails for $1099 and currently sells for $1049 on Newegg.com, and the rumored pricing (taken from Wccftech), if correct, gives AMD a big win here. Competition is very, very good!
Chart credit Wccftech.com
Subject: Processors | February 8, 2017 - 09:38 PM | Josh Walrath
Tagged: Zen, Skylake, Samsung, ryzen, kaby lake, ISSCC, Intel, GLOBALFOUNDRIES, amd, AM4, 14 nm FinFET
Yesterday EE Times posted some interesting information that they had gleaned at ISSCC. AMD released a paper describing the design process and advances they were able to achieve with the Zen architecture manufactured on Samsung’s/GF’s 14nm FinFET process. AMD went over some of the basic measurements at the transistor scale and how they compare to what Intel currently has on their latest 14nm process.
The first thing that jumps out is that AMD claims that their 4 core/8 thread x86 core is about 10% smaller than what Intel has with one of their latest CPUs. We assume it is either Kaby Lake or Skylake. AMD did not go over exactly what they were counting when looking at the cores, which matters because there are some significant differences between the two architectures. We are not sure if that 44mm sq. figure includes the L3 cache or the L2 caches. My guess is that it probably includes L2 cache but not L3. I could easily be wrong here.
Going down the table we see that AMD and Samsung/GF are able to get their SRAM sizes down smaller than what Intel is able to do. AMD has double the amount of L2 cache per core, but it is only about 60% larger than Intel’s 256 KB L2. AMD's L3 cache is also smaller in area than Intel's. Both are 8 MB units but AMD comes in at 16 mm sq. while Intel is at 19.1 mm sq. There will be differences in how AMD and Intel set up these caches, and until we see L3 performance comparisons we cannot assume too much.
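The cache figures are easier to compare as area per unit of capacity. The sketch below uses only the numbers quoted above (8 MB L3 at 16 vs. 19.1 mm sq., and an L2 with twice the capacity in roughly 1.6 times the area). The 1.6 factor is my reading of "about 60% larger", not a measured value, and the variable names are just for illustration.

```python
# L3 figures quoted above from the ISSCC comparison.
L3_CAPACITY_MB = 8
amd_l3_mm2 = 16.0
intel_l3_mm2 = 19.1

amd_l3_mm2_per_mb = amd_l3_mm2 / L3_CAPACITY_MB
intel_l3_mm2_per_mb = intel_l3_mm2 / L3_CAPACITY_MB
l3_area_saving_pct = (1 - amd_l3_mm2 / intel_l3_mm2) * 100

# L2: AMD has 2x the capacity (512 KB vs. 256 KB) in roughly 1.6x the area.
# The 1.6 is an assumption taken from the "about 60% larger" wording.
l2_capacity_ratio = 512 / 256
l2_area_ratio = 1.6
l2_area_per_kb_ratio = l2_area_ratio / l2_capacity_ratio

print(f"L3 area per MB: AMD {amd_l3_mm2_per_mb:.2f} mm^2, "
      f"Intel {intel_l3_mm2_per_mb:.2f} mm^2")
print(f"AMD L3 is {l3_area_saving_pct:.1f}% smaller for the same 8 MB")
print(f"AMD L2 uses about {l2_area_per_kb_ratio:.2f}x Intel's area per KB")
```

On these inputs AMD's L3 is roughly 16% smaller and its L2 needs about 80% of Intel's area per KB, though none of this says anything about latency or bandwidth.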
(Image courtesy of ISSCC)
In some of the basic measurements of the different processes we see that Intel has advantages throughout. This is not surprising as Intel has been well known to push process technology beyond what others are able to do. In theory their products will have denser logic throughout, including the SRAM cells. When looking at this information we wonder how AMD has been able to make their cores and caches smaller. Part of that is due to the likely setup of cache control and access.
One of the most likely reasons for this smaller size is the less advanced FPU/SSE/AVX units that AMD has in Zen. They support AVX-256, but it has to be done in double the cycles. They can do single cycle AVX-128, but Intel’s throughput is much higher than what AMD can achieve. AVX is not the end-all, be-all but it is gaining in importance in high performance computing and editing applications. David Kanter in his article covering the architecture explicitly said that AMD made this decision to lower the die size and power constraints for this product.
Ryzen will undoubtedly be a pretty large chip overall once both modules and 16 MB of L3 cache are put together. My guess would be in the 220 mm sq. range, but again that is only a guess once all is said and done (northbridge, southbridge, PCI-E controllers, etc.). What is perhaps most interesting of it all is that AMD has a part that on the surface is very close to the Broadwell-E based Intel i7 chips. The i7-6900K runs at 3.2 to 3.7 GHz, features 8 cores and 16 threads, and around 20 MB of L2/L3 cache. AMD’s top end looks to run at 3.6 GHz, features the same number of cores and threads, and has 20 MB of L2/L3 cache. The Intel part is rated at 140 watts TDP while the AMD part will have a max of 95 watts TDP.
If Ryzen is truly competitive in this top end space (with a price to undercut Intel, yet not destroy their own margins) then AMD is going to be in a good position for the rest of this year. We will find out exactly what is coming our way next month, but all indications point to Ryzen being competitive in overall performance while being able to undercut Intel in TDPs for comparable cores/threads. We are counting down the days...
Subject: Processors | February 8, 2017 - 01:16 PM | Jeremy Hellstrom
Tagged: kaby lake, i5-7600K, Intel
[H]ard|OCP followed up their series on replacing the TIM underneath the heatspreader on Kaby Lake processors with another series depicting the i5-7600K in the buff. They removed the heatspreader completely and tried watercooling the die directly. As you can see in the video, this requires more work than you might immediately assume. It was not simply shimming that was involved; some of the socket on the motherboard needed to be trimmed with a knife in order to get the waterblock to sit directly on the core. In the end the results were somewhat depressing: the risks involved are high and the benefits almost non-existent. If you are willing to risk it, replacing the TIM and reattaching the heatspreader is a far better choice.
"After our recent experiments with delidding and relidding our 7700K and 7600K to see if we could get better operating temperatures, we decided it was time to go topless! Popping the top on your CPU is one thing, and getting it to work in the current processor socket is another. Get out your pocket knife, we are going to have to make some cuts."
Subject: Processors | February 3, 2017 - 08:22 PM | Sebastian Peak
Tagged: titan x, ryzen, report, processor, nvidia, leak, cpu, benchmark, ashes of the singularity, amd
AMD's upcoming 8-core Ryzen CPU has appeared online in an apparent leak showing performance from an Ashes of the Singularity benchmark run. The benchmark results, available here on imgur and reported by TechPowerUp (among others today), show the result of a run featuring the unreleased CPU paired with an NVIDIA Titan X graphics card.
It is interesting to consider that this rather unusual system configuration was also used by AMD during their New Horizon fan event in December, with an NVIDIA Titan X and Ryzen 8-core processor powering the 4K game demos of Battlefield 1 that were pitted against an Intel Core i7-6900K/Titan X combo.
It is also interesting to note that the processor listed in the screenshot above is (apparently) not an engineering sample, as TechPowerUp points out in their post:
"Unlike some previous benchmark leaks of Ryzen processors, which carried the prefix ES (Engineering Sample), this one carried the ZD Prefix, and the last characters on its string name are the most interesting to us: F4 stands for the silicon revision, while the 40_36 stands for the processor's Turbo and stock speeds respectively (4.0 GHz and 3.6 GHz)."
March is fast approaching, and we won't have to wait long to see just how powerful this new processor will be for 4K gaming (and other, less important stuff). For now, I want to find results from an AotS benchmark with a Titan X and i7-6900K to see how these numbers compare!
Subject: Processors | January 30, 2017 - 02:29 PM | Jeremy Hellstrom
Tagged: kaby lake, core i7 7700k, overclocking, delidding, risky business
Recently [H]ard|OCP popped the lid off of an i7-7700k to see if the rumours were true that Intel once again did not use high quality thermal interface material underneath the heatspreader. The experiment was a success in one way: temperatures dropped by roughly 25%, from 91C to 68C. However the performance did not change much, as they still could not reach a stable 5GHz overclock. They did not let that initial failure discourage them and spent some more time with their enhanced Kaby Lake processor to find scenarios in which they could reach or pass the 5GHz mark. They met with success when they reduced the RAM frequency to 2666MHz. By disabling Hyperthreading they could reach 5GHz with 3600MHz RAM, but only when they increased the VCore did they manage to break 5GHz.
Of course you must exercise caution when tweaking to this level; a higher VCore will certainly reduce the lifespan of your chip and delidding can have a disastrous outcome even if done carefully. If you are interested in trying this, The Tech Report has a link to a 3D printed tool to help you in your endeavours.
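A quick check of the delidding numbers, using only the 91C and 68C readings quoted above. The Kelvin line is my own addition: a percentage of a Celsius reading depends on where the scale's zero sits, so the drop in degrees is the more meaningful figure.

```python
# Load temperatures quoted above, before and after replacing the TIM.
before_c = 91.0
after_c = 68.0

drop_c = before_c - after_c
drop_pct_celsius = drop_c / before_c * 100

# Same drop expressed against absolute temperature. This is shown only to
# illustrate that the percentage changes with the temperature scale used.
KELVIN_OFFSET = 273.15
drop_pct_kelvin = drop_c / (before_c + KELVIN_OFFSET) * 100

print(f"Drop: {drop_c:.0f} degrees C")
print(f"As a share of the Celsius reading: {drop_pct_celsius:.2f}%")
print(f"As a share of absolute temperature: {drop_pct_kelvin:.2f}%")
```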
"Last week we shared our overclocking results with our retail purchased Core i7-7700K Kaby Lake processor. We then took the Integrated Heat Spreader off, replaced the Thermal Interface Material and tried again for 5GHz with 3600MHz memory and failed. This time, less RAM MHz and more core voltage!"
Subject: Processors | January 16, 2017 - 04:11 PM | Jeremy Hellstrom
Tagged: kaby lake, sandy bridge
Not too long ago the release of a new processor family meant a noticeable improvement from the previous generation and the only question was how to upgrade, not if you should upgrade. Like many other things, that has passed on into the proverbial good old days and now we need reviews like this one published by [H]ard|OCP. Is there any noticeable performance difference between the two chips outside of synthetic benchmarks?
The test systems are slightly different as the memory has changed, the 7700K has 2666MHz DDR4 while the 2600K has 2133MHz DDR3; both CPUs are clocked at 4.5GHz however. Their results show actual performance deltas in productivity software such as HandBrake and Blender, justifying the upgrade for those who focus on content creation. As for gaming, if you have no GPU then you will indeed see performance increases; but nothing compared to buying a GPU.
"There are many HardOCP readers that are still running Sandy Bridge CPUs and have been waiting with anticipation of one day upgrading to a new system. One of the biggest things asked in the last month is just how the 2600K stacks up against the new 7700K processor. So we got hold of one of our readers 2600K systems and put it to the test."
Subject: Processors | January 3, 2017 - 03:54 PM | Jeremy Hellstrom
Tagged: z270, overclocking, kaby lake, Intel, i7-7700k, core i7-7700k, 7th generation core, 7700k, 14nm
Having already familiarized yourself with Intel's new Kaby Lake architecture and the i7-7700k processor in Ryan's review, you may now be wondering how well the new CPU overclocks for others. [H]ard|OCP received three i7-7700k's and three different Z270 motherboards for testing and they set about overclocking these in combination to see what frequency they could reach. Only one of the chips was ever stable at 5GHz, and it is reassuring that it managed that on all three motherboards. The remaining two would only hit 4.8GHz, which is still not a bad result. Drop by to see their settings in full detail.
"After having a few weeks to play around with Intel's new Kaby Lake architecture Core i7-7700K processors, we finally have some results that we want to discuss when it comes to overclocking and the magic 5GHz many of us are looking for, and what we think your chances are of getting there yourself."
Here are some more Processor articles from around the web:
- Intel's Core i7-7700K 'Kaby Lake' CPU @ The Tech Report
- Intel Kaby Lake i7-7700K & i5-7600K Review @ Hardware Canucks
- Intel Core i7-7700K vs 6700K: 22 Games, RX 480 & GTX 1080 @ techPowerUp
- Intel Kaby Lake Core i7-7700K Performance & Z270 Chipset Overview @ Techgage
- Intel 7th Generation Core i7 7700K Processor Review @ OCC
- Intel Kaby Lake Core i7-7700K IPC @ [H]ard|OCP
- Core i5-6400 @ Hardware Secrets
- FX-4300 @ Hardware Secrets
- AMD's New Ryzen CPU - SMT and IPC @ [H]ard|OCP
Subject: Processors | January 2, 2017 - 05:33 PM | Scott Michaud
Tagged: sandy bridge, Intel
OC3D is claiming that Intel is working on a significantly new architecture, targeting somewhere around the 2019 or 2020 time frame. AMD’s Bulldozer is a useful comparison: several architectures followed the initial release, but they were all based around the same set of basic assumptions, with tweaks for better IPC, reduced bottlenecks, and so forth. Intel has likewise been using the same fundamentals since Sandy Bridge, albeit theirs aligned much better with how x86 applications were being developed.
According to the report, Intel’s new architecture is expected to remove some old instructions, which will make it less compatible with applications that use these commands. This is actually very similar to what AMD was attempting to do with Bulldozer... to a point. AMD projected that applications would scale well to multiple cores, and use GPUs for floating-point operations; as such, they designed cores in pairs, and decided to eliminate redundant parts, such as half of the floating-point units. Hindsight being 20/20, we now know that developers didn’t change their habits (and earlier Bulldozer parts were allegedly overzealous with cutting out elements in a few areas, too).
In Intel’s case, from what we hear about at the moment, their cuts should be less broad than AMD’s. Rather than projecting a radical shift in programming, they’re just going to cut the fat of their existing instruction set, unless there’s bigger changes planned for the next couple years of development. As for the unlucky applications that use these instructions, OC3D speculates that either Intel or the host operating systems will provide some emulation method, likely in software.
If the things they cut haven’t been used in several years, then you can probably get acceptable performance in the applications that require them via emulation. On the other hand, a bad decision could choke the processor in the same way that Bulldozer, especially the early variants, did for AMD. On the other-other hand, Intel has something that AMD didn’t: the market-share to push (desktop) developers in a given direction. On the fourth hand, which I’ll return to its rightful owner, I promise, we don’t know how much the “(desktop)” clause will translate to overall software in two years.
Right now, it seems like x86 is successfully holding off ARM in performance-critical, consumer applications. If that continues, then Intel might be able to push x86 software development, even if they get a little aggressive like AMD did five-plus-development-time years ago.
Subject: General Tech, Processors | December 15, 2016 - 12:29 PM | Jeremy Hellstrom
Tagged: leak, kaby lake, intel 200
Tech ARP have an interesting story posted today; it would seem they pried loose the specs of the upcoming Kaby Lake processors and accompanying Intel 200 chipset. The top chip, the $349 Core i7-7700K, will have 4 cores and 8 threads running at 4.2 GHz, with an 8 MB L3 cache and a TDP of 95W, while the non-K version will have its core clock dropped to 3.6GHz, TDP dropped to 65W and price lowered to $309. The chipsets will encompass series similar to the previous generations from Intel, including the LGA 1151 Z270, H270, Q270, B250 and Q250 series. There is no information on the socket the server level C422 and high end X299 boards will use in this leak, but we are sure you can extrapolate from existing rumours and innuendo. Follow that link for the entire lineup.
"As AMD gears up to launch the AMD Ryzen desktop processor in early Q1 2017, Intel has finalised the launch plans for their desktop Kaby Lake processors, and the accompanying 200 Series chipsets.
Although Intel has been extremely secretive, we managed to obtain the specifications and launch details of the desktop Kaby Lake processors, and the 200 Series chipsets. Check it out!"
Here is some more Tech News from around the web:
- Ashley Madison is getting off lightly just like its clients @ The Inquirer
- Microsoft quietly emits patch to undo its earlier patch that broke Windows 10 networking @ The Register
- PC vendors trying out Qualcomm/Windows products @ DigiTimes
- Uh-oh! Microsoft has another chatbot – but racism is a no-go for Zo @ The Register
- Delete your account: Yahoo admits to another hack affecting one billion customers @ The Inquirer
- Docker opens up crucial container plumbing code cunningly disguised as 'boring infrastructure' @ The Register
- Malvertising Campaign Infects Your Router Instead of Your Browser @ Slashdot
- Top 7 Videos from ApacheCon and Apache Big Data 2016 @ Linux.com
Subject: Processors | December 8, 2016 - 09:00 AM | Josh Walrath
Tagged: Xilinx, TSMC, standard cells, layout, FinFET, EDA, custom cell, arm, 7nm
Today ARM is announcing their partnership with Xilinx to deliver design solutions for their products on TSMC’s upcoming 7nm process node. ARM has previously partnered with Xilinx on other nodes including 28, 20, and 16nm. Their partnership extends into design considerations to improve the time to market of complex parts and to rapidly synthesize new designs for cutting edge process nodes.
Xilinx is licensing out the latest ARM Artisan Physical IP platform for TSMC’s 7nm. Artisan Physical IP is a set of tools to help rapidly roll out complex designs as compared to what previous generations of products faced. ARM has specialized libraries and tools to help implement these designs on a variety of processes and receive good results even on the shortest possible design times.
Design relies on two basic methodologies: custom cell design and standard cell design. Custom cell design allows for a tremendous amount of flexibility in layout and electrical characteristics, but it requires a lot of man-hours to complete even the simplest logic. Custom cell designs typically draw less power and provide higher clockspeeds than standard cell design. Standard cells are like Legos in that the cells can be quickly laid out to create complex logic. Software called EDA (Electronic Design Automation) can quickly place and route these cells. GPUs lean heavily on standard cells and EDA software to get highly complex products out to market quickly.
These two basic methods have netted good results over the years, but during that time we have seen implementations of standard cells become more custom in how they behave. While not achieving full custom performance, we have seen semi-custom type endeavors achieve appreciable gains without requiring the man hours to achieve fully custom.
In this particular case ARM is achieving a solid performance in power and speed through automated design that improves upon standard cells, but without the downsides of a fully custom part. This provides positive power and speed benefits without the extra power draw of a traditional standard cell. ARM further improves upon this with the ARM Artisan Power Grid Architect (PGA) which simplifies the development of a complex power grid that services a large and complex chip.
We have seen these types of advancements in the GPU world that NVIDIA and AMD enjoy talking about. A better power grid allows the ASIC to perform at lower power envelopes due to less impedance. The GPU guys have also utilized High Density Libraries to pack in the transistors as tight as possible to utilize less space and increase spatial efficiency. A smaller chip that requires less power is always a positive development over a larger chip of the same capabilities that requires more power. ARM looks to be doing their own version of these technologies and is applying them to TSMC’s upcoming 7nm FinFET process.
TSMC is not releasing this process to mass production until at least 2018. In 1H 2017 we will see some initial test and early production runs for a handful of partners. Full blown production of 7nm will be in 2018. Early runs and production are increasingly being used for companies working with low power devices. We can look back at 20/16/14 nm processes and see that they were initially used by designs that do not require a lot of power and will run at moderate clockspeeds. We have seen a shift in who uses these new processes with the introduction of sub-28nm process nodes. The complexity of the design, process steps, materials, and libraries has pushed the higher performance and power hungry parts to a secondary position as the foundries attempt to get these next generation nodes up to speed. It isn’t until many months after these low power parts are pushed through that we see adjustments and improvements in these next generation nodes to handle the higher power and clockspeed needs of products like desktop CPUs and GPUs.
ARM is certainly being much more aggressive in addressing next generation nodes and pushing their cutting edge products on them to allow for far more powerful mobile products that also exhibit improved battery life. This step with 7nm and Xilinx will provide a lot of data to ARM and its partners downstream when the time comes to implement new designs. Artisan will continue to evolve to allow partners to quickly and efficiently introduce new products on new nodes to the market at an accelerated rate as compared to years past.
Subject: Processors | November 30, 2016 - 06:52 PM | Scott Michaud
Tagged: kaby lake, Intel, core i7 7700k
Someone, who wasn’t Intel, seeded Tom’s Hardware an Intel Core i7-7700k, which is expected for release in the new year. This is the top end of the mainstream SKUs, bringing four cores (eight threads) to 4.2 GHz base, 4.5 GHz boost. Using a motherboard built around the Z170 chipset, they were able to clock the CPU up to 4.8 GHz, which is a little over 4% higher than the Skylake-based Core i7-6700k maximum overclock on the same board.
Image Credit: Tom's Hardware
Lucky number i7-77.
Before we continue, these results are based on a single sample. (Update: @7:01pm -- Also, the motherboard they used has some known overclock and stability issues. They mentioned it a bit in the post, like why their BCLK is 99.65MHz, but I forgot to highlight it here. Thankfully, Allyn caught it in the first ten minutes.) This sample has retail branding, but Intel would not confirm that it performs like they expect a retail SKU would. Normally, pre-release products are labeled as such, but there’s no way to tell if this one part is some exception. Beyond concerns that it might be slightly different from what consumers will eventually receive, there is also a huge variation in overclocking performance due to binning. With a sample size of one, we cannot tell whether this chip has an abnormally high, or an abnormally low, defect count, which affects both power and maximum frequency.
That aside, if this chip is representative of Kaby Lake performance, users should expect an increase in headroom for clock rates, but it will come at the cost of increased power consumption. In fact, Tom’s Hardware states that the chip “acts like an overclocked i7-6700K”. Based on this, it seems like, unless they want an extra 4 PCIe lanes on Z270, Kaby Lake’s performance might already be achievable for users with a lucky Skylake.
I should note that Tom’s Hardware didn’t benchmark the iGPU. I don’t really see it used for much more than video encoding anyway, but it would be nice to see if Intel improved in that area, seeing as how they incremented the model number. Then again, even users who are concerned about that will probably be better off just adding a second, discrete GPU anyway.
Subject: Processors | November 28, 2016 - 09:26 PM | Scott Michaud
Tagged: amd, Zen, Summit Ridge
Guru3D got hold of a product list, which includes entries for AMD’s upcoming Zen architecture.
Four SKUs are thus rumored to exist:
- Zen SR3: (65W, quad-core, eight threads, ~$150 USD)
- Zen SR5: (95W, hexa-core, twelve threads, ~$250 USD)
- Zen SR7: (95W, octo-core, sixteen threads, ~$350 USD)
- Special Zen SR7: (95W, octo-core, sixteen threads, ~$500 USD)
The sheet also states that none of these are supposed to contain integrated graphics, like we see on the current FX line. There is some merit to using integrated GPUs for specific tasks, like processing video while the main GPU is busy or doing a rapid, massively parallel calculation without the latency of memory copies, but AMD is probably right to not waste resources, such as TDP, fighting our current lack of compatible software and viable use cases for these SKUs.
Image Credit: Guru3D
The sheet also contains benchmarks for Cinebench R15. While pre-rendered video is a task that really should be done on GPUs at this point, especially with permissive, strong, open-source projects like Cycles, they do provide a good example of multi-core performance that scales. In this one test, the Summit Ridge 7 CPU ($350) roughly matches the Intel Core i7-6850K ($600), again, according to this one unconfirmed benchmark. It doesn’t list clock rates, but other rumors claim that the top-end chip will be around 3.2 GHz base, 3.5 GHz boost at stock, with manual overclocks exceeding 4 GHz.
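As a rough sketch of what that comparison means for value, the snippet below works out the price gap and the implied performance per dollar. It assumes the two chips score exactly the same in Cinebench R15, which is my simplification of "roughly matches", and both prices are the approximate figures quoted above, so treat the result as a ballpark only.

```python
# Approximate prices quoted above (USD); the Zen price is a rumor.
zen_sr7_price_usd = 350
i7_6850k_price_usd = 600

# Assumption: identical Cinebench R15 scores ("roughly matches" in the text).
relative_score = 1.0

price_saving_pct = (1 - zen_sr7_price_usd / i7_6850k_price_usd) * 100
perf_per_dollar_ratio = (
    (relative_score / zen_sr7_price_usd) / (1.0 / i7_6850k_price_usd)
)

print(f"Rumored Zen SR7 is {price_saving_pct:.1f}% cheaper")
print(f"Implied performance per dollar: {perf_per_dollar_ratio:.2f}x the i7-6850K")
```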
These performance figures suggest that Zen will not beat Skylake on single-threaded performance, but it might be close. That might not matter, however. CPUs, these days, are kind-of converging around a certain level of per-thread performance, and are differentiating with core count, price, and features. Unfortunately, there don’t seem to have been many leaks regarding enthusiast-level chipsets for Zen, so we don’t know yet if there will be compelling use cases.
Zen is expected early in 2017.
Subject: Processors, Mobile | November 17, 2016 - 07:30 AM | Ryan Shrout
Tagged: snapdragon, Samsung, qualcomm, FinFET, 835, 10nm
Though we are still months away from shipping devices, Qualcomm has announced that it will be building its upcoming flagship Snapdragon 835 mobile SoC on Samsung’s 10nm 2nd generation FinFET process technology. Qualcomm tells us that integrating the 10nm node in 2017 will keep it “the technology leader in mobile platforms” and this makes the 835 the world's first 10nm production processor.
“Using the new 10nm process node is expected to allow our premium tier Snapdragon 835 processor to deliver greater power efficiency and increase performance while also allowing us to add a number of new capabilities that can improve the user experience of tomorrow’s mobile devices.”
Samsung announced its 10nm FinFET process technology in October of this year and it sports some impressive specifications and benefits to the Snapdragon 835 platform. Per Samsung, it offers “up to a 30% increase in area efficiency with 27% higher performance or up to 40% lower power consumption.” For Qualcomm and its partners, that means a smaller silicon footprint for innovative device designs, including thinner chassis or larger batteries (yes, please).
Other details on the Snapdragon 835 are still pending a future reveal, but Qualcomm says that 835 is in production now and will be shipping in commercial devices in the first half of 2017. We did hear that the new 10nm chip is built on "more than 3 billion transistors" - making it an incredibly complex design!
Keith Kressin SVP, Product Management, Qualcomm Technologies Inc and Ben Suh, SVP, Foundry Marketing, Samsung, show off first 10nm mobile processor, Snapdragon 835, in New York at Qualcomm's Snapdragon Technology Summit.
I am very curious to see how the market reacts to the release of the Snapdragon 835. We are still seeing new devices being released using the 820/821 SoCs, including Google’s own flagship Pixel phones this fall. Qualcomm wants to maintain leadership in the SoC market by innovating on both silicon and software but consumers are becoming more savvy to the actual usable benefits that new devices offer. Qualcomm promises features, performance and power benefits on SD 835 to make the case for your next upgrade.
Subject: Processors, Mobile | October 20, 2016 - 11:40 AM | Ryan Shrout
Tagged: Nintendo, switch, nvidia, tegra
It's been a hell of a 24 hours for NVIDIA and the Tegra processor. A platform that many considered dead in the water after it failed to find its way into smartphones or into an appreciable number of consumer tablets has had two major design wins revealed. First, it was revealed that NVIDIA is powering the new fully autonomous driving system in the Autopilot 2.0 hardware implementation in Tesla's current Model S, X and upcoming Model 3 cars.
Now, we know that Nintendo's long rumored portable and dockable gaming system called Switch is also powered by a custom NVIDIA Tegra SoC.
We don't know much about the hardware that gives the Switch life, but NVIDIA did post a short blog with some basic information worth looking at. Based on it, we know that the Tegra processor powering this Nintendo system is completely custom and likely uses Pascal architecture GPU CUDA cores, though we don't know how many or how powerful they will be. It will likely exceed the performance of the Nintendo Wii U, which managed only 0.35 TFLOPS from 320 AMD-based stream processors. How much faster, we just don't know yet.
On the CPU side we assume that this is built using an ARM-based processor, most likely off-the-shelf core designs to keep things simple. Basing it on custom designs like Denver might not be necessary for this type of platform.
Nintendo has traditionally used custom operating systems for its consoles and that seems to be what is happening with the Switch as well. NVIDIA mentions a couple of times how much work the technology vendor put into custom APIs, custom physics engines, new libraries, etc.
The Nintendo Switch’s gaming experience is also supported by fully custom software, including a revamped physics engine, new libraries, advanced game tools and libraries. NVIDIA additionally created new gaming APIs to fully harness this performance. The newest API, NVN, was built specifically to bring lightweight, fast gaming to the masses.
We’ve optimized the full suite of hardware and software for gaming and mobile use cases. This includes custom operating system integration with the GPU to increase both performance and efficiency.
The system itself looks pretty damn interesting, with the ability to switch (get it?) between a docked-to-your-TV configuration and a mobile one with attached or wireless controllers. Check out the video below for a preview.
I've asked both NVIDIA and Nintendo for more information on the hardware side but these guys tend to be tight lipped on the custom silicon going into console hardware. Hopefully one or the other is excited to tell us about the technology so we can have some interesting specifications to discuss and debate!
UPDATE: A story on The Verge claims that Nintendo "took the chip from the Shield" and put it in the Switch. This is more than likely completely false; the Shield is a significantly dated product and that kind of statement could undersell the power and capability of the Switch and NVIDIA's custom SoC quite dramatically.
Subject: Processors, Mobile | October 18, 2016 - 11:32 AM | Sebastian Peak
Tagged: SoC, Snapdragon 653, Snapdragon 626, Snapdragon 427, snapdragon, smartphone, qualcomm, mobile
Qualcomm has announced new 400 and 600-series Snapdragon parts, and these new SoCs (Snapdragon 653, 626, and 427) inherit technology found previously on the 800-series parts, including fast LTE connectivity and dual-camera support.
The integrated LTE modem has been significantly improved for each of these SoCs, and Qualcomm lists these features for each of the new products:
- X9 LTE with CAT 7 modem (300Mbps DL; 150Mbps UL) designed to provide users with a 50 percent increase in maximum uplink speeds over the X8 LTE modem.
- LTE Advanced Carrier Aggregation with up to 2x20 MHz in the downlink and uplink
- Support for 64-QAM in the uplink
- Superior call clarity and higher call reliability with the Enhanced Voice Services (EVS) codec on VoLTE calls.
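The modem numbers in that list can be unpacked with a little arithmetic. This is only a back-of-the-envelope sketch in Python: the X8 uplink figure is derived from the quoted 50 percent increase rather than taken from a Qualcomm spec sheet, and the even split across the two aggregated 20 MHz carriers is an assumption made for illustration.

```python
# Figures quoted in the feature list above (X9 LTE, CAT 7).
x9_downlink_mbps = 300.0
x9_uplink_mbps = 150.0
uplink_gain_over_x8 = 0.50  # "50 percent increase in maximum uplink speeds"

# Derived, not quoted: the X8 uplink speed implied by the 50 percent claim.
implied_x8_uplink_mbps = x9_uplink_mbps / (1.0 + uplink_gain_over_x8)

# Assumption: peak throughput splits evenly across the 2x20 MHz carriers.
carriers = 2
downlink_per_carrier_mbps = x9_downlink_mbps / carriers
uplink_per_carrier_mbps = x9_uplink_mbps / carriers

print(f"Implied X8 uplink: {implied_x8_uplink_mbps:.0f} Mbps")
print(f"Per-carrier downlink: {downlink_per_carrier_mbps:.0f} Mbps")
print(f"Per-carrier uplink: {uplink_per_carrier_mbps:.0f} Mbps")
```

In other words, the 50 percent claim lines up with a 100 Mbps uplink on the older modem, assuming the quoted percentages are measured against peak rates.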
In addition to the new X9 modem, all three SoCs offer faster CPU and GPU performance, with the Snapdragon 653 (which replaces the 652) now supporting up to 8GB of memory - up from a max of 4GB previously. Each of the new SoCs also features Qualcomm's Quick Charge 3.0 for fast charging.
Full specifications for these new products can be found on the updated Snapdragon product page.
Availability of the new 600-series Snapdragon processors is set for the end of this year, so we could start seeing handsets with the faster parts soon, while the Snapdragon 427 is expected to ship in devices early in 2017.
Subject: Processors | October 10, 2016 - 02:25 AM | Tim Verry
Tagged: SoC, Intel, FPGA, Cortex A53, arm, Altera
Intel and recently acquired Altera have launched a new FPGA product based on Intel’s 14nm Tri-Gate process featuring an ARM CPU, 5.5 million logic element FPGA, and HBM2 memory in a single package. The Stratix 10 is aimed at data center, networking, and radar/imaging customers.
The Stratix 10 is an Altera-designed FPGA (field programmable gate array) with 5.5 million logic elements and a new HyperFlex architecture that optimizes registers, pipeline, and critical pathing (feed-forward designs) to increase core performance and increase the logic density by five times that of previous products. Further, the upcoming FPGA SoC reportedly can run at twice the core performance of Stratix V or use up to 70% less power than its predecessor at the same performance level.
The increases in logic density, clockspeed, and power efficiency are a combination of the improved architecture and Intel’s 14nm FinFET (Tri-Gate) manufacturing process.
Intel rates the FPGA at 10 TFLOPS of single precision floating point DSP performance and 80 GFLOPS/watt.
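Taken at face value, those two ratings imply a power budget. Here is a minimal Python sketch of the arithmetic, assuming both numbers describe the same operating point (the figures quoted here do not say so); the function name is made up for illustration.

```python
def implied_power_watts(tflops: float, gflops_per_watt: float) -> float:
    """Power implied by a throughput rating and an efficiency rating."""
    gflops = tflops * 1000.0  # 1 TFLOPS = 1000 GFLOPS
    return gflops / gflops_per_watt

# Quoted Stratix 10 ratings: 10 TFLOPS single precision, 80 GFLOPS/watt.
stratix10_watts = implied_power_watts(10.0, 80.0)
print(f"Implied power at peak throughput: {stratix10_watts:.0f} W")
```

That works out to 125W if peak throughput and peak efficiency coincide, which may not be the case on real silicon.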
Interestingly, Intel is using an ARM processor to feed data to the FPGA chip rather than its own Quark or Atom processors. Specifically, the Stratix 10 uses an ARM CPU with four Cortex A53 cores as well as four stacks of on-package HBM2 memory with 1TB/s of bandwidth to feed data to the FPGA. There is also a “secure device manager” to ensure data integrity and security.
The Stratix 10 is aimed at data centers and will be used in specialized tasks that demand high throughput and low latency. According to Intel, the processor is a good candidate for co-processors to offload and accelerate encryption/decryption, compression/de-compression, or Hadoop tasks. It can also be used to power specialized storage controllers and networking equipment.
Intel has started sampling the new chip to potential customers.
In general, FPGAs are great at highly parallelized workloads and are able to efficiently take huge amounts of inputs and process the data in parallel through custom programmed logic gates. An FPGA is essentially a program in hardware that can be rewired in the field (though depending on the chip it is not necessarily a “fast” process and it can take hours or longer to switch things up heh). These processors are used in medical and imaging devices, high frequency trading hardware, networking equipment, signal intelligence (cell towers, radar, guidance, etc.), bitcoin mining (though ASICs stole the show a few years ago), and even password cracking. They can be almost anything you want, which gives them an advantage over traditional CPUs and graphics cards, though cost and increased coding complexity are prohibitive.
The Stratix 10 stood out as interesting to me because of its claimed 10 TFLOPS of single precision performance, which is reportedly the important metric when it comes to training neural networks. In fact, Microsoft recently began deploying FPGAs across its Azure cloud computing platform and plans to build the “world’s fastest AI supercomputer.” The Redmond-based company’s Project Catapult saw the company deploy Stratix V FPGAs to nearly all of its Azure datacenters and is using the programmable silicon as part of an “acceleration fabric” in its “configurable cloud” architecture that will be used initially to accelerate the company’s Bing search and AI research efforts and later by independent customers for their own applications.
It is interesting to see Microsoft going with FPGAs especially as efforts to use GPUs for GPGPU and neural network training and inferencing duties have increased so dramatically over the years (with NVIDIA being the one pushing the latter). It may well be a good call on Microsoft’s part as it could enable better performance and researchers would be able to code their AI accelerator platforms down to the gate level to really optimize things. Using higher level languages and cheaper hardware with GPUs does have a lower barrier to entry though. I suppose it will depend on just how much Microsoft is going to charge customers to use the FPGA-powered instances.
FPGAs are in kind of a weird middle ground and while they are definitely not a new technology, they do continue to get more complex and powerful!
What are your thoughts on Intel's new FPGA SoC?
- Microsoft Goes All in for FPGAs to Build Out AI Cloud
- Microsoft Focusing Efforts, Forming AI and Research Group
- Stratix 10 Architecture Video
- Are FPGAs the future of password cracking and supercomputing?
Subject: Processors | October 1, 2016 - 06:11 PM | Tim Verry
Tagged: xavier, Volta, tegra, SoC, nvidia, machine learning, gpu, drive px 2, deep neural network, deep learning
Earlier this week at its first GTC Europe event in Amsterdam, NVIDIA CEO Jen-Hsun Huang teased a new SoC code-named Xavier that will be used in self-driving cars and feature the company's newest custom ARM CPU cores and Volta GPU. The new chip will begin sampling at the end of 2017 with product releases using the future Tegra (if they keep that name) processor as soon as 2018.
NVIDIA's Xavier is promised to be the successor to the company's Drive PX 2 system which uses two Tegra X2 SoCs and two discrete Pascal MXM GPUs on a single water cooled platform. These claims are even more impressive when considering that NVIDIA is not only promising to replace the four processors but it will reportedly do that at 20W – less than a tenth of the TDP!
The company has not revealed all the nitty-gritty details, but they did tease out a few bits of information. The new processor will feature 7 billion transistors and will be based on a refined 16nm FinFET process while consuming a mere 20W. It can process two 8k HDR video streams and can hit 20 TOPS (NVIDIA's own rating for deep learning int(8) operations).
Specifically, NVIDIA claims that the Xavier SoC will use eight custom ARMv8 (64-bit) CPU cores (it is unclear whether these cores will be a refined Denver architecture or something else) and a GPU based on its upcoming Volta architecture with 512 CUDA cores. Also, in an interesting twist, NVIDIA is including a "Computer Vision Accelerator" on the SoC as well, though the company did not go into many details. This bit of silicon may explain how the ~300mm² die with 7 billion transistors is able to match the 7.2 billion transistor Pascal-based Tesla P4 (2560 CUDA cores) graphics card at deep learning (tera-operations per second) tasks. That is, of course, in addition to the incremental improvements from moving to Volta and a new ARMv8 CPU architecture on a refined 16nm FF+ process.
| | Drive PX | Drive PX 2 | NVIDIA Xavier | Tesla P4 |
|---|---|---|---|---|
| CPU | 2 x Tegra X1 (8 x A57 total) | 2 x Tegra X2 (8 x A57 + 4 x Denver total) | 1 x Xavier SoC (8 x Custom ARM + 1 x CVA) | N/A |
| GPU | 2 x Tegra X1 (Maxwell) (512 CUDA cores total) | 2 x Tegra X2 GPUs + 2 x Pascal GPUs | 1 x Xavier SoC GPU (Volta) (512 CUDA cores) | 2560 CUDA cores (Pascal) |
| TFLOPS | 2.3 TFLOPS | 8 TFLOPS | ? | 5.5 TFLOPS |
| DL TOPS | ? | 24 TOPS | 20 TOPS | 22 TOPS |
| TDP | ~30W (2 x 15W) | 250W | 20W | up to 75W |
| Process Tech | 20nm | 16nm FinFET | 16nm FinFET+ | 16nm FinFET |
| Transistors | ? | ? | 7 billion | 7.2 billion |
For comparison, the currently available Tesla P4 based on its Pascal architecture has a TDP of up to 75W and is rated at 22 TOPS. This would suggest that Volta is a much more efficient architecture (at least for deep learning and half precision)! I am not sure how NVIDIA is able to match its GP104 with only 512 Volta CUDA cores, though their definition of a "core" could have changed and/or the CVA processor may be responsible for closing that gap. Unfortunately, NVIDIA did not disclose what it rates the Xavier at in TFLOPS so it is difficult to compare and it may not match GP104 at higher precision workloads. It could be wholly optimized for int(8) operations rather than floating point performance. Beyond that I will let Scott dive into those particulars once we have more information!
Xavier is more of a teaser than anything and the chip could very well change dramatically and/or not hit the claimed performance targets. Still, it sounds promising and it is always nice to speculate over road maps. It is an intriguing chip and I am ready for more details, especially on the Volta GPU and just what exactly that Computer Vision Accelerator is (and will it be easy to program for?). I am a big fan of the "self-driving car" and I hope that it succeeds. It certainly looks to continue as Tesla, VW, BMW, and other automakers continue to push the envelope of what is possible and plan future cars that will include smart driving assists and even cars that can drive themselves. The more local computing power we can throw at automobiles the better and while massive datacenters can be used to train the neural networks, local hardware to run them and make decisions is necessary (you don't want internet latency contributing to the decision of whether to brake or not!).
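To put the efficiency claim in rough numbers, here is a quick Python sketch that divides the rated deep learning TOPS by the TDP for each platform in the table. Treat it as illustrative only: the Xavier figures are NVIDIA's pre-release claims, the Tesla P4 TDP is an "up to" value, and TDP is only a stand-in for actual power draw.

```python
# Rated deep learning throughput (TOPS) and TDP (watts) from the table above.
platforms = {
    "Drive PX 2": (24.0, 250.0),
    "NVIDIA Xavier": (20.0, 20.0),
    "Tesla P4": (22.0, 75.0),  # TDP is listed as "up to 75W"
}

def tops_per_watt(tops: float, tdp_watts: float) -> float:
    """Simple efficiency figure: rated TOPS divided by rated TDP."""
    return tops / tdp_watts

efficiency = {name: tops_per_watt(*spec) for name, spec in platforms.items()}

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} TOPS/W")

# How far ahead the claimed Xavier figure is of the other two platforms.
xavier_vs_p4 = efficiency["NVIDIA Xavier"] / efficiency["Tesla P4"]
xavier_vs_px2 = efficiency["NVIDIA Xavier"] / efficiency["Drive PX 2"]
print(f"Xavier vs Tesla P4: {xavier_vs_p4:.1f}x")
print(f"Xavier vs Drive PX 2: {xavier_vs_px2:.1f}x")
```

On these ratings Xavier lands at 1 TOPS per watt, against roughly 0.29 for the Tesla P4 and roughly 0.10 for Drive PX 2, which is where the "much more efficient" impression comes from.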
I hope that NVIDIA's self-proclaimed "AI Supercomputer" turns out to be at least close to the performance they claim! Stay tuned for more information as it gets closer to launch (hopefully more details will emerge at GTC 2017 in the US).
What are your thoughts on Xavier and the whole self-driving car future?
- NVIDIA Teases Xavier, a High-Performance ARM SoC for Drive PX & AI @ AnandTech
- Tegra Related News @ PC Perspective
- Tesla P4 Specifications @ NVIDIA
- CES 2016: NVIDIA Launches DRIVE PX 2 With Dual Pascal GPUs Driving A Deep Neural Network @ PC Perspective
Subject: Processors | September 27, 2016 - 07:01 AM | Scott Michaud
Tagged: overclock, Bristol Ridge, amd
Update 9/27 @ 5:10pm: Added a link to AnandTech's discussion of Bristol Ridge. It was mentioned in the post, but I forgot to add the link itself when I transferred it to the site. The text is the same, though.
While Zen is nearing release, AMD has launched the AM4 platform with updated APUs. They will be based on an updated Excavator architecture, which we discussed during the Carrizo launch in mid-2015. Carrizo came about when AMD decided to focus heavily on the 15W and 35W power targets, giving the best possible experience for that huge market of laptops, in the tasks that those devices usually encounter, such as light gaming and media consumption.
Image Credit: NAMEGT via HWBot
Bristol Ridge, instead, focuses on the 35W and 65W thermal points. This will be targeted more at OEMs who want to release higher-performance products in the holiday time-frame, although consumers can purchase it directly, according to Anandtech, later in the year. I'm guessing it won't be pushed too heavily to DIY users, though, because they know that those users know Zen is coming.
It turns out that overclockers already have their hands on it, though, and it seems to take a fairly high frequency. NAMEGT, from South Korea, uploaded a CPU-Z screenshot to HWBot that shows the 28nm, quad-core part clocked at 4.8 GHz. The included images claim that this was achieved on air, using AMD's new stock “Wraith” cooler.
Subject: Processors | September 19, 2016 - 10:35 AM | Sebastian Peak
Tagged: Socket AM4, processor, FX, cpu, APU, amd, 1331 pins
Image credit: Bit-Tech via HWSW
AMD's newest socket will merge the APU and FX series CPUs into this new AM4 socket, unlike the previous generation which split the two between AM3+ and FM2+. This is great news for system builders, who now have the option of starting with an inexpensive CPU/APU, and upgrading to a more powerful FX processor later on - with the same motherboard.
The new socket will apparently require a new cooler design, which is contrary to early reports (yes, we got it wrong, too) that the AM4 socket would be compatible with existing AM3 cooler mounts (manufacturers could of course offer hardware kits for existing cooler designs). In any case, AMD's new socket takes more of the delicate copper pins you love to try not to bend!
Subject: Processors | September 13, 2016 - 06:51 PM | Tim Verry
Tagged: GLOBALFOUNDRIES, FD-SOI, 12FDX, process technology
In addition to the company’s efforts to get its own next generation FinFET process technology up and running, GlobalFoundries announced that it will continue to pursue FD-SOI process technology with the addition of a 12nm FD-SOI (FDX in GlobalFoundries parlance) node to its roadmap with a slated release of 2019 at the earliest.
FD-SOI stands for Fully Depleted Silicon On Insulator and is a planar process technology that uses a thin insulator on top of the base silicon which is then covered by a very thin layer of silicon that is used as the transistor channel. The promise of FD-SOI is that it offers the performance of a FinFET node with lower power consumption and cost than other bulk processes. While the substrate is more expensive with FD-SOI, it uses 50% of the lithography layers and companies can take advantage of reportedly easy-to-implement body biasing to design a single chip that can fulfill multiple products and roles. For example, in the case of 22FDX – which should start rolling out towards the end of this year – GlobalFoundries claims that it offers the performance of 14nm FinFET at 28nm bulk pricing. 22FDX is actually a 14nm front end of line (FEOL) and 28nm back end of line (BEOL) combined. Notably, it purportedly uses 70% lower power than 28nm HKMG.
A GloFo 22nm FD-SOI "22FDX" transistor.
The FD-SOI design offers lower static leakage and allows chip makers to use body biasing (where the substrate is polarized) to balance performance and leakage. Forward Body Biasing allows the transistor to switch faster and/or operate at much lower voltages. On the other hand, Reverse Body Biasing further reduces leakage and frequency to improve energy efficiency. Dynamic Body Biasing (video link) allows for things like turbo modes whereby increasing voltage to the back gate can increase transistor switching speed or reducing voltage can reduce switching speeds and leakage. For a process technology that is aimed at battery powered wearables, mobile devices, and various Internet of Things products, energy efficiency and being able to balance performance and power depending on what is needed is important.
22FDX offers body biasing.
While the process node numbers are not as interesting as the news that FD-SOI will continue itself (thanks to marketing mucking up things heh), GlobalFoundries did share that 12FDX (12nm FD-SOI) will be a true full node shrink that will offer the performance of 10nm FinFET (presumably its own future FinFET tech though they do not specify) with better power characteristics and lower cost than 16nm FinFET. I am not sure if GlobalFoundries is using theoretical numbers or compared it to TSMC’s process here since they do not have their own 16nm FinFET process. Further, 12FDX will feature 15% higher performance and up to 50% lower power consumption than today’s FinFET technologies. The future process is aimed at the “cost sensitive mobile market” that includes IoT, automotive (entertainment and AI), mobile, and networking. FD-SOI is reportedly well suited for processors that combine both digital and analog (RF) elements as well.
Following the roll out of 22FDX GlobalFoundries will be preparing its Fab 1 facility in Dresden, Germany for the 12nm FD-SOI (12FDX) process. The new process is slated to begin tapping out products in early 2019 which should mean products using chips will hit the market in 2020.
The news is interesting because it indicates that there is still interest in FD-SOI, with research and development continuing, and GlobalFoundries is the first company to talk about next generation process plans. Samsung and STMicroelectronics also support FD-SOI but have not announced their future plans yet.
If I had to guess, Samsung will be the next company to talk about future FD-SOI as the company continues to offer both FinFET and FD-SOI to its customers though they certainly do not talk as much about the latter. What are your thoughts on FD-SOI and its place in the market?
Also read: FD-SOI Expands, But Is It Disruptive? @ EETimes