Subject: Processors | January 2, 2017 - 10:33 PM | Scott Michaud
Tagged: sandy bridge, Intel
OC3D is claiming that Intel is working on a significantly new architecture, targeting somewhere around the 2019 or 2020 time frame. AMD released several architectures after the initial Bulldozer, but they were all based around the same basic assumptions, with tweaks for better IPC, fewer bottlenecks, and so forth. Intel has likewise been using the same fundamentals since Sandy Bridge, albeit theirs aligned much better with how x86 applications were being developed.
According to the report, Intel’s new architecture is expected to remove some old instructions, which will make it less compatible with applications that use these commands. This is actually very similar to what AMD was attempting to do with Bulldozer... to a point. AMD projected that applications would scale well to multiple cores, and use GPUs for floating-point operations; as such, they designed cores in pairs, and decided to eliminate redundant parts, such as half of the floating-point units. Hindsight being 20/20, we now know that developers didn’t change their habits (and earlier Bulldozer parts were allegedly overzealous with cutting out elements in a few areas, too).
In Intel’s case, from what we hear at the moment, their cuts should be less broad than AMD’s. Rather than projecting a radical shift in programming, they’re just going to cut the fat of their existing instruction set, unless there are bigger changes planned for the next couple years of development. As for the unlucky applications that use these instructions, OC3D speculates that either Intel or the host operating systems will provide some emulation method, likely in software.
If the things they cut haven’t been used in several years, then you can probably get acceptable performance in the applications that require them via emulation. On the other hand, a bad decision could choke the processor in the same way that Bulldozer, especially the early variants, did for AMD. On the other-other hand, Intel has something that AMD didn’t: the market-share to push (desktop) developers in a given direction. On the fourth hand, which I’ll return to its rightful owner, I promise, we don’t know how much the “(desktop)” clause will translate to overall software in two years.
Right now, it seems like x86 is successfully holding off ARM in performance-critical, consumer applications. If that continues, then Intel might be able to push x86 software development, even if they get a little aggressive like AMD did five-plus-development-time years ago.
Subject: General Tech, Processors | December 15, 2016 - 05:29 PM | Jeremy Hellstrom
Tagged: leak, kaby lake, intel 200
Tech ARP have an interesting story posted today; it would seem they pried loose the specs of the upcoming Kaby Lake processors and the accompanying Intel 200 chipset. The top chip, the $349 Core i7-7700K, will have 4 cores and 8 threads running at 4.2 GHz, with an 8 MB L3 cache and a TDP of 95W, while the non-K version will have its core clock dropped to 3.6 GHz, TDP dropped to 65W and price lowered to $309. The chipsets will encompass series similar to the previous generations from Intel, including the LGA 1151 Z270, H270, Q270, B250 and Q250 series. There is no information in this leak on the socket that the server-level C422 and high-end X299 boards will use, but we are sure you can extrapolate from existing rumours and innuendo. Follow that link for the entire lineup.
"As AMD gears up to launch the AMD Ryzen desktop processor in early Q1 2017, Intel has finalised the launch plans for their desktop Kaby Lake processors, and the accompanying 200 Series chipsets.
Although Intel has been extremely secretive, we managed to obtain the specifications and launch details of the desktop Kaby Lake processors, and the 200 Series chipsets. Check it out!"
Here is some more Tech News from around the web:
- Ashley Madison is getting off lightly just like its clients @ The Inquirer
- Microsoft quietly emits patch to undo its earlier patch that broke Windows 10 networking @ The Register
- PC vendors trying out Qualcomm/Windows products @ DigiTimes
- Uh-oh! Microsoft has another chatbot – but racism is a no-go for Zo @ The Register
- Delete your account: Yahoo admits to another hack affecting one billion customers @ The Inquirer
- Docker opens up crucial container plumbing code cunningly disguised as 'boring infrastructure' @ The Register
- Malvertising Campaign Infects Your Router Instead of Your Browser @ Slashdot
- Top 7 Videos from ApacheCon and Apache Big Data 2016 @ Linux.com
Ryzen coming in 2017
As much as we might want it to be, today is not the day that AMD launches its new Zen processors to the world. We’ve been teased with it for years now, with trickles of information at event after event…but we are going to have to wait a little bit longer, with at least one more tease. Today, AMD is announcing the official branding of the consumer processors based on Zen, previously code named Summit Ridge, along with a clock speed data point and a preview of five technologies that will help it be competitive with the Intel Core lineup.
The future consumer desktop processor from AMD will now officially be known as Ryzen. That’s pronounced “RISE-IN” not “RIS-IN”, just so we are all on the same page. CEO Lisa Su was on stage during the reveal at a media event last week and claimed that while media, fans and AMD fell in love with the Zen name, it needed a differentiation from the architecture itself. The name is solid – not earth shattering though I foresee a long life of mispronunciation ahead of it.
Now that we have the official branding behind us, let’s get to the rest of the disclosed information we can reveal today.
We already knew that Summit Ridge would ship with an 8 core, 16 thread version (with lower core counts at lower prices very likely) but now we know a frequency and a cache size. AMD tells us that there will be a processor (the flagship) that will have a base clock of 3.4 GHz with boost clocks above that. How much above that is still a mystery – AMD is likely still tweaking its implementation of boost to get as much performance as possible for launch. This should help put those clock speed rumors to rest for now.
The 20MB of cache matches the Core i7-6900K, though obviously with some dramatic architecture differences between Broadwell and Zen, the effect and utilization of that cache will be interesting to measure next year.
We already knew that Ryzen will be utilizing the AM4 platform, but it’s nice to see AMD reiterate a modern feature set and expandability. DDR4 memory, PCI Express Gen3, native USB 3.1 and NVMe support – these are all necessary building blocks for a modern consumer and enthusiast PC. We still need to see how many of these ports the chipset offers and how aggressive motherboard companies like ASUS, MSI and Gigabyte are in their designs. I am hoping there are as many options as we would see for an X99/Z170 platform, including budget boards in the $100 space as well as “anything and everything” options for those types of buyers that want to adopt AMD’s new CPU.
Subject: Processors | December 8, 2016 - 02:00 PM | Josh Walrath
Tagged: Xilinx, TSMC, standard cells, layout, FinFET, EDA, custom cell, arm, 7nm
Today ARM is announcing their partnership with Xilinx to deliver design solutions for their products on TSMC’s upcoming 7nm process node. ARM has previously partnered with Xilinx on other nodes including 28, 20, and 16nm. Their partnership extends into design considerations to improve the time to market of complex parts and to rapidly synthesize new designs for cutting edge process nodes.
Xilinx is licensing out the latest ARM Artisan Physical IP platform for TSMC’s 7nm. Artisan Physical IP is a set of tools to help rapidly roll out complex designs as compared to what previous generations of products faced. ARM has specialized libraries and tools to help implement these designs on a variety of processes and receive good results even on the shortest possible design times.
Design relies on two basic methodologies: custom cell and standard cell. Custom cell design allows for a tremendous amount of flexibility in layout and electrical characteristics, but it requires a lot of man-hours to complete even the simplest logic. Custom cell designs typically draw less power and provide higher clockspeeds than standard cell designs. Standard cells are like Legos in that the cells can be quickly laid out to create complex logic. Software called EDA (Electronic Design Automation) can quickly place and route these cells. GPUs lean heavily on standard cells and EDA software to get highly complex products out to market quickly.
These two basic methods have netted good results over the years, but during that time we have seen implementations of standard cells become more custom in how they behave. While not achieving full custom performance, we have seen semi-custom type endeavors achieve appreciable gains without requiring the man hours to achieve fully custom.
In this particular case ARM is achieving a solid performance in power and speed through automated design that improves upon standard cells, but without the downsides of a fully custom part. This provides positive power and speed benefits without the extra power draw of a traditional standard cell. ARM further improves upon this with the ARM Artisan Power Grid Architect (PGA) which simplifies the development of a complex power grid that services a large and complex chip.
We have seen these types of advancements in the GPU world that NVIDIA and AMD enjoy talking about. A better power grid allows the ASIC to perform at lower power envelopes due to less impedance. The GPU guys have also utilized High Density Libraries to pack in the transistors as tightly as possible to utilize less space and increase spatial efficiency. A smaller chip that requires less power is always a positive development over a larger chip of the same capabilities that requires more power. ARM looks to be doing their own version of these technologies and is applying them to TSMC’s upcoming 7nm FinFET process.
TSMC is not releasing this process to mass production until at least 2018. In 1H 2017 we will see some initial test and early production runs for a handful of partners. Full blown production of 7nm will be in 2018. Early runs and production are increasingly being used for companies working with low power devices. We can look back at 20/16/14 nm processes and see that they were initially used by designs that do not require a lot of power and will run at moderate clockspeeds. We have seen a shift in who uses these new processes with the introduction of sub-28nm process nodes. The complexity of the design, process steps, materials, and libraries has pushed the higher performance and power hungry parts to a secondary position as the foundries attempt to get these next generation nodes up to speed. It isn’t until many months of these low power parts have been pushed through that we see adjustments and improvements in these next generation nodes to handle the higher power and clockspeed needs of products like desktop CPUs and GPUs.
ARM is certainly being much more aggressive in addressing next generation nodes and pushing their cutting edge products on them to allow for far more powerful mobile products that also exhibit improved battery life. This step with 7nm and Xilinx will provide a lot of data to ARM and its partners downstream when the time comes to implement new designs. Artisan will continue to evolve to allow partners to quickly and efficiently introduce new products on new nodes to the market at an accelerated rate as compared to years past.
Subject: Processors | November 30, 2016 - 11:52 PM | Scott Michaud
Tagged: kaby lake, Intel, core i7 7700k
Someone, who wasn’t Intel, seeded Tom’s Hardware an Intel Core i7-7700k, which is expected for release in the new year. This is the top end of the mainstream SKUs, bringing four cores (eight threads) to 4.2 GHz base, 4.5 GHz boost. Using a motherboard built around the Z170 chipset, they were able to clock the CPU up to 4.8 GHz, which is a little over 4% higher than the Skylake-based Core i7-6700k maximum overclock on the same board.
Image Credit: Tom's Hardware
Lucky number i7-77.
Before we continue, these results are based on a single sample. (Update: @7:01pm -- Also, the motherboard they used has some known overclock and stability issues. They mentioned it a bit in the post, like why their BCLK is 99.65MHz, but I forgot to highlight it here. Thankfully, Allyn caught it in the first ten minutes.) This sample has retail branding, but Intel would not confirm that it performs like they expect a retail SKU would. Normally, pre-release products are labeled as such, but there’s no way to tell if this one part is some exception. Beyond concerns that it might be slightly different from what consumers will eventually receive, there is also a huge variation in overclocking performance due to binning. With a sample size of one, we cannot tell whether this chip has an abnormally high, or an abnormally low, defect count, which affects both power and maximum frequency.
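For a sense of what that slightly low BCLK means, here is a rough sketch of the usual relationship between core clock, multiplier and base clock. The 48x multiplier is my assumption for a nominal 4.8 GHz overclock; Tom's Hardware's exact settings are not reproduced here.

```python
def effective_clock_mhz(multiplier: int, bclk_mhz: float) -> float:
    """Core clock is the CPU multiplier times the base clock (BCLK)."""
    return multiplier * bclk_mhz


# Assumption: a 48x multiplier, i.e. a nominal 4.8 GHz overclock.
ASSUMED_MULTIPLIER = 48

nominal = effective_clock_mhz(ASSUMED_MULTIPLIER, 100.0)  # textbook 100 MHz BCLK
actual = effective_clock_mhz(ASSUMED_MULTIPLIER, 99.65)   # BCLK reported in the post
shortfall = nominal - actual

print(f"nominal {nominal:.1f} MHz, actual {actual:.1f} MHz, "
      f"shortfall {shortfall:.1f} MHz")
```

Under that assumption the chip would really be running at roughly 4783 MHz rather than a flat 4.8 GHz, which is a small gap but worth keeping in mind when comparing single-sample overclocks.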
That aside, if this chip is representative of Kaby Lake performance, users should expect an increase in headroom for clock rates, but it will come at the cost of increased power consumption. In fact, Tom’s Hardware states that the chip “acts like an overclocked i7-6700K”. Based on this, it seems like, unless they want an extra 4 PCIe lanes on Z270, Kaby Lake’s performance might already be achievable for users with a lucky Skylake.
I should note that Tom’s Hardware didn’t benchmark the iGPU. I don’t really see it used for much more than video encoding anyway, but it would be nice to see if Intel improved in that area, seeing as how they incremented the model number. Then again, even users who are concerned about that will probably be better off just adding a second, discrete GPU anyway.
Subject: Processors | November 29, 2016 - 02:26 AM | Scott Michaud
Tagged: amd, Zen, Summit Ridge
Guru3D got hold of a product list, which includes entries for AMD’s upcoming Zen architecture.
Four SKUs are thus rumored to exist:
- Zen SR3: (65W, quad-core, eight threads, ~$150 USD)
- Zen SR5: (95W, hexa-core, twelve threads, ~$250 USD)
- Zen SR7: (95W, octo-core, sixteen threads, ~$350 USD)
- Special Zen SR7: (95W, octo-core, sixteen threads, ~$500 USD)
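To put those rumored prices in perspective, a quick sketch of dollars per thread is below. It uses only the approximate figures from the list above, which are themselves unconfirmed, so treat the output as back-of-the-envelope.

```python
# Rumored SKUs from the leaked sheet: name -> (approx. price in USD, threads).
RUMORED_SKUS = {
    "Zen SR3": (150, 8),
    "Zen SR5": (250, 12),
    "Zen SR7": (350, 16),
    "Special Zen SR7": (500, 16),
}


def price_per_thread(price_usd: float, threads: int) -> float:
    """Approximate cost of each hardware thread in USD."""
    return price_usd / threads


for name, (price, threads) in RUMORED_SKUS.items():
    print(f"{name}: ~${price_per_thread(price, threads):.2f} per thread")
```

If the leak holds, the per-thread cost stays fairly flat from the SR3 to the regular SR7, and only the special SR7 carries a noticeable premium.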
The sheet also states that none of these are supposed to contain integrated graphics, like we see on the current FX line. There is some merit to using integrated GPUs for specific tasks, like processing video while the main GPU is busy or doing a rapid, massively parallel calculation without the latency of memory copies, but AMD is probably right to not waste resources, such as TDP, fighting our current lack of compatible software and viable use cases for these SKUs.
Image Credit: Guru3D
The sheet also contains benchmarks for Cinebench R15. While pre-rendered video is a task that really should be done on GPUs at this point, especially with permissive, strong, open-source projects like Cycles, they do provide a good example of multi-core performance that scales. In this one test, the Summit Ridge 7 CPU ($350) roughly matches the Intel Core i7-6850K ($600), again, according to this one unconfirmed benchmark. It doesn’t list clock rates, but other rumors claim that the top-end chip will be around 3.2 GHz base, 3.5 GHz boost at stock, with manual overclocks exceeding 4 GHz.
These performance figures suggest that Zen will not beat Skylake on single-threaded performance, but it might be close. That might not matter, however. CPUs, these days, are kind-of converging around a certain level of per-thread performance, and are differentiating with core count, price, and features. Unfortunately, there don’t seem to have been many leaks regarding enthusiast-level chipsets for Zen, so we don’t know if there will be compelling use cases yet.
Zen is expected early in 2017.
A Holiday Project
A couple of years ago, I performed an experiment around the GeForce GTX 750 Ti graphics card to see if we could upgrade basic OEM, off-the-shelf computers to become competent gaming PCs. The key to this potential upgrade was that the GTX 750 Ti offered a great amount of GPU horsepower (at the time) without the need for an external power connector. Lower power requirements on the GPU meant that even the most basic of OEM power supplies should be able to do the job.
That story was a success, both in terms of the result in gaming performance and the positive feedback it received. Today, I am attempting to do that same thing but with a new class of GPU and a new class of PC games.
The goal for today’s experiment remains pretty much the same: can a low-cost, low-power GeForce GTX 1050 Ti graphics card that also does not require any external power connector offer enough gaming horsepower to upgrade current shipping OEM PCs to "gaming PC" status?
Our target PCs for today come from Dell and ASUS. I went into my local Best Buy just before the Thanksgiving holiday and looked for two machines that varied in price and relative performance.
| |Dell Inspiron 3650|ASUS M32CD-B09|
|---|---|---|
|Processor|Intel Core i3-6100|Intel Core i7-6700|
|Memory|8GB DDR4|12GB DDR4|
|Graphics Card|Intel HD Graphics 530|Intel HD Graphics 530|
|Storage|1TB HDD|1TB Hybrid HDD|
|Power Supply|240 watt|350 watt|
|OS|Windows 10 64-bit|Windows 10 64-bit|
|Total Price|$429 (Best Buy)|$749 (Best Buy)|
The specifications of these two machines are relatively modern for OEM computers. The Dell Inspiron 3650 uses a modest dual-core Core i3-6100 processor with a fixed clock speed of 3.7 GHz. It has a 1TB standard hard drive and a 240 watt power supply. The ASUS M32CD-B09 PC has a quad-core HyperThreaded processor with a 4.0 GHz maximum Turbo clock, a 1TB hybrid hard drive and a 350 watt power supply. Both of the CPUs share the same Intel brand of integrated graphics, the HD Graphics 530. You’ll see in our testing that not only is this integrated GPU unqualified for modern PC gaming, but it also performs quite differently based on the CPU it is paired with.
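As a rough sanity check on whether those power supplies can take the upgrade, here is a minimal power budget sketch. Several inputs are assumptions that do not come from this article: the 75 W ceiling is the PCI Express slot power limit that a card without an external connector has to live within, the CPU TDPs (51 W and 65 W) are Intel's published ratings, and the 50 W allowance for the rest of the system is a hypothetical placeholder.

```python
# Assumption: a graphics card with no external power connector can draw
# at most 75 W, the PCI Express x16 slot limit.
PCIE_SLOT_LIMIT_W = 75

# Assumption: hypothetical allowance for motherboard, memory, storage and fans.
OTHER_COMPONENTS_W = 50


def psu_headroom_w(psu_w: int, cpu_tdp_w: int,
                   gpu_w: int = PCIE_SLOT_LIMIT_W,
                   other_w: int = OTHER_COMPONENTS_W) -> int:
    """Watts left over after CPU, GPU and the rest of the system."""
    return psu_w - (cpu_tdp_w + gpu_w + other_w)


# CPU TDPs are Intel's published ratings, not figures from this article.
dell_headroom = psu_headroom_w(psu_w=240, cpu_tdp_w=51)   # Core i3-6100
asus_headroom = psu_headroom_w(psu_w=350, cpu_tdp_w=65)   # Core i7-6700

print(f"Dell Inspiron 3650: {dell_headroom} W spare")
print(f"ASUS M32CD-B09: {asus_headroom} W spare")
```

Even with worst-case GPU draw, both machines come out with positive headroom under these assumptions, though an OEM supply's label rating is not a guarantee of what it can sustain.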
In August at the company’s annual developer forum, Intel officially took the lid off its 7th generation of Core processor series, codenamed Kaby Lake. The build up to this release has been an interesting one as we saw the retirement of the “tick tock” cadence of processor releases and instead are moving into a market where Intel can spend more development time on a single architecture design to refine and tweak it as the engineers see fit. With that knowledge in tow, I believed, as I think many still do today, that Kaby Lake would be something along the lines of a simple rebrand of current shipping product. After all, since we know of no major architectural changes from Skylake other than improvements in the video and media side of the GPU, what is left for us to look forward to?
As it turns out, the advantages of the 7th Generation Core processor family and Kaby Lake are more substantial than I expected. I was able to get a hold of two different notebooks from the HP Spectre lineup, as near to identical as I could manage, with the primary difference being the move from the 6th Generation Skylake design to the 7th Generation Kaby Lake. After running both machines through a gamut of tests ranging from productivity to content creation and of course battery life, I can say with authority that Intel’s 7th Gen product deserves more accolades than it is getting.
Before we get into the systems and to our results, I think it’s worth taking some time to quickly go over some of what we know about Kaby Lake from the processor perspective. Most of this content was published back in August just after the Intel Developer Forum, so if you are sure you are caught up, you can jump right along to a pictorial look at the two notebooks being tested today.
At its core, the microarchitecture of Kaby Lake is identical to that of Skylake. Instructions per clock (IPC) remain the same with the exception of dedicated hardware changes in the media engine, so you should not expect any performance differences with Kaby Lake except with improved clock speeds.
Also worth noting is that Intel is still building Kaby Lake on 14nm process technology, the same used on Skylake. The term “same” will be debated, though, as Intel claims that improvements made in the process technology over the last 24 months have allowed them to expand clock speeds and improve on efficiency.
Dubbing this new revision of the process as “14nm+”, Intel tells me that they have improved the fin profile for the 3D transistors as well as channel strain while more tightly integrating the design process with manufacturing. The result is a 12% increase in process performance; that is a sizeable gain in a fairly tight time frame even for Intel.
That process improvement directly results in higher clock speeds for Kaby Lake when compared to Skylake when running at the same target TDPs. In general, we are looking at 300-400 MHz higher peak clock speeds in Turbo Boost situations when compared to similar TDP products in the 6th generation. Sustained clocks will very likely remain voltage / thermally limited, but the ability to spike up to higher clocks for even short bursts can improve performance and responsiveness of Kaby Lake when compared to Skylake.
Along with higher fixed clock speeds for Kaby Lake processors, tweaks to Speed Shift will allow these processors to get to peak clock speeds more quickly than previous designs. I extensively tested Speed Shift when the feature was first enabled in Windows 10 and found that the improvement in user experience was striking. Though the move from Skylake to Kaby Lake won’t be as big of a change, Intel was able to improve the behavior.
The graphics architecture and EU (execution unit) layout remain the same as Skylake, but Intel was able to integrate a new video decode unit to improve power efficiency. That new engine can work in parallel with the EUs to improve performance throughput as well, but obviously at the expense of some power efficiency.
Specific additions to the codec lineup include decode support for 10-bit HEVC and 8/10-bit VP9 as well as encode support for 10-bit HEVC and 8-bit VP9. The video engine adds HDR support with tone mapping though it does require EU utilization. Wide Color Gamut (Rec. 2020) is prepped and ready to go according to Intel for when that standard starts rolling out to displays.
Performance levels for these new HEVC encode/decode blocks are set to allow for 4K at 120 Mbps in real time on both the Y-series (4.5 watt) and U-series (15 watt) processors.
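To give that bitrate some scale, the sketch below converts a stream rate into storage per hour. It assumes the figure means 120 megabits per second and uses decimal units (1 GB = 1000 MB).

```python
def stream_size_gb(bitrate_mbps: float, seconds: float) -> float:
    """Size of a constant-bitrate stream in decimal gigabytes."""
    megabits = bitrate_mbps * seconds
    megabytes = megabits / 8          # 8 bits per byte
    return megabytes / 1000           # decimal GB


def megabytes_per_second(bitrate_mbps: float) -> float:
    """Throughput the decoder or encoder has to sustain, in MB/s."""
    return bitrate_mbps / 8


rate = megabytes_per_second(120)
one_hour = stream_size_gb(120, 60 * 60)
print(f"{rate:.0f} MB/s sustained, about {one_hour:.0f} GB per hour")
```

That works out to 15 MB/s, or around 54 GB for an hour of footage, handled in real time on a 4.5 watt part.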
It’s obvious that the changes to Kaby Lake from Skylake are subtle and even I found myself overlooking the benefits that it might offer. While the capabilities it has will be tested on the desktop side at a later date in 2017, for thin and light notebooks, convertibles and even some tablets, the 7th Generation Core processors do in fact take advantage of the process improvements and higher clock speeds to offer an improved user experience.
Subject: Processors, Mobile | November 17, 2016 - 12:30 PM | Ryan Shrout
Tagged: snapdragon, Samsung, qualcomm, FinFET, 835, 10nm
Though we are still months away from shipping devices, Qualcomm has announced that it will be building its upcoming flagship Snapdragon 835 mobile SoC on Samsung’s 10nm 2nd generation FinFET process technology. Qualcomm tells us that integrating the 10nm node in 2017 will keep it “the technology leader in mobile platforms” and this makes the 835 the world's first 10nm production processor.
“Using the new 10nm process node is expected to allow our premium tier Snapdragon 835 processor to deliver greater power efficiency and increase performance while also allowing us to add a number of new capabilities that can improve the user experience of tomorrow’s mobile devices.”
Samsung announced its 10nm FinFET process technology in October of this year and it sports some impressive specifications and benefits to the Snapdragon 835 platform. Per Samsung, it offers “up to a 30% increase in area efficiency with 27% higher performance or up to 40% lower power consumption.” For Qualcomm and its partners, that means a smaller silicon footprint for innovative device designs, including thinner chassis or larger batteries (yes, please).
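For a feel of what those percentages could mean, here is a small sketch applied to a made-up baseline chip. The 100 mm² and 5 W starting points are purely hypothetical, and I am assuming that "30% increase in area efficiency" means 30% more logic in the same area; Samsung's exact definition is not given here. Samsung also frames the performance and power gains as either/or, not both at once.

```python
def scaled_area_mm2(area_mm2: float, area_efficiency_gain: float = 0.30) -> float:
    """Area of the same design if each mm^2 holds (1 + gain) times as much logic."""
    return area_mm2 / (1 + area_efficiency_gain)


def scaled_performance(perf: float, perf_gain: float = 0.27) -> float:
    """Performance at the same power, taking the 'up to 27% higher' option."""
    return perf * (1 + perf_gain)


def scaled_power_w(power_w: float, power_reduction: float = 0.40) -> float:
    """Power at the same performance, taking the 'up to 40% lower' option."""
    return power_w * (1 - power_reduction)


# Hypothetical baseline chip, not a real product.
print(f"area: 100.0 mm^2 -> {scaled_area_mm2(100.0):.1f} mm^2")
print(f"perf: 1.00x -> {scaled_performance(1.0):.2f}x (same power)")
print(f"power: 5.0 W -> {scaled_power_w(5.0):.1f} W (same performance)")
```

These are "up to" figures, so real designs will land somewhere short of the best case.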
Other details on the Snapdragon 835 are still pending a future reveal, but Qualcomm says that 835 is in production now and will be shipping in commercial devices in the first half of 2017. We did hear that the new 10nm chip is built on "more than 3 billion transistors" - making it an incredibly complex design!
Keith Kressin SVP, Product Management, Qualcomm Technologies Inc and Ben Suh, SVP, Foundry Marketing, Samsung, show off first 10nm mobile processor, Snapdragon 835, in New York at Qualcomm's Snapdragon Technology Summit.
I am very curious to see how the market reacts to the release of the Snapdragon 835. We are still seeing new devices being released using the 820/821 SoCs, including Google’s own flagship Pixel phones this fall. Qualcomm wants to maintain leadership in the SoC market by innovating on both silicon and software but consumers are becoming more savvy to the actual usable benefits that new devices offer. Qualcomm promises features, performance and power benefits on SD 835 to make the case for your next upgrade.
Subject: Processors, Mobile | October 20, 2016 - 03:40 PM | Ryan Shrout
Tagged: Nintendo, switch, nvidia, tegra
It's been a hell of a 24 hours for NVIDIA and the Tegra processor. A platform that many considered dead in the water after it failed to find its way into smartphones or an appreciable number of consumer tablets had two major design wins revealed. First, it was revealed that NVIDIA is powering the new fully autonomous driving system in the Autopilot 2.0 hardware implementation in Tesla's current Model S, X and upcoming Model 3 cars.
Now, we know that Nintendo's long rumored portable and dockable gaming system called Switch is also powered by a custom NVIDIA Tegra SoC.
We don't know much about the hardware that gives the Switch life, but NVIDIA did post a short blog with some basic information worth looking at. Based on it, we know that the Tegra processor powering this Nintendo system is completely custom and likely uses Pascal architecture GPU CUDA cores, though we don't know how many or how powerful they will be. It will likely exceed the performance of the Nintendo Wii U, which managed only 0.35 TFLOPS and consisted of 320 AMD-based stream processors. How much faster, we just don't know yet.
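For reference, peak throughput figures like that Wii U number usually come from a simple formula: shader count times operations per clock times clock speed. The sketch below works backwards from the numbers above, assuming each stream processor retires one fused multiply-add (2 FLOPs) per clock. That per-clock figure is my assumption, and the resulting clock is implied by the arithmetic, not a quoted spec.

```python
def peak_tflops(stream_processors: int, clock_mhz: float,
                flops_per_clock: int = 2) -> float:
    """Theoretical peak: shaders x FLOPs per clock x clock rate."""
    return stream_processors * flops_per_clock * clock_mhz * 1e6 / 1e12


def implied_clock_mhz(tflops: float, stream_processors: int,
                      flops_per_clock: int = 2) -> float:
    """Clock speed that a quoted TFLOPS figure implies."""
    return tflops * 1e12 / (stream_processors * flops_per_clock) / 1e6


# Wii U figures from the text: 0.35 TFLOPS across 320 stream processors.
clock = implied_clock_mhz(0.35, 320)
print(f"implied GPU clock: about {clock:.0f} MHz")
print(f"round trip: {peak_tflops(320, clock):.2f} TFLOPS")
```

The same formula will let us sanity check any Switch numbers once core counts and clocks are actually disclosed.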
On the CPU side we assume that this is built using an ARM-based processor, most likely off-the-shelf core designs to keep things simple. Basing it on custom designs like Denver might not be necessary for this type of platform.
Nintendo has traditionally used custom operating systems for its consoles and that seems to be what is happening with the Switch as well. NVIDIA mentions a couple of times how much work the technology vendor put into custom APIs, custom physics engines, new libraries, etc.
The Nintendo Switch’s gaming experience is also supported by fully custom software, including a revamped physics engine, new libraries, advanced game tools and libraries. NVIDIA additionally created new gaming APIs to fully harness this performance. The newest API, NVN, was built specifically to bring lightweight, fast gaming to the masses.
We’ve optimized the full suite of hardware and software for gaming and mobile use cases. This includes custom operating system integration with the GPU to increase both performance and efficiency.
The system itself looks pretty damn interesting, with the ability to switch (get it?) between a docked-to-your-TV configuration and a mobile one with attached or wireless controllers. Check out the video below for a preview.
I've asked both NVIDIA and Nintendo for more information on the hardware side but these guys tend to be tight lipped on the custom silicon going into console hardware. Hopefully one or the other is excited to tell us about the technology so we can have some interesting specifications to discuss and debate!
UPDATE: A story on The Verge claims that Nintendo "took the chip from the Shield" and put it in the Switch. This is more than likely completely false; the Shield is a significantly dated product and that kind of statement could undersell the power and capability of the Switch and NVIDIA's custom SoC quite dramatically.