High Bandwidth Cache
Apart from AMD’s other new architecture due out in 2017, its Zen CPU design, no other product has had as much build-up and excitement surrounding it as the Vega GPU architecture. After the world learned that Polaris would be a mainstream-only design, released as the Radeon RX 480, the focus for enthusiasts shifted straight to Vega. It’s been on the public-facing roadmaps for years and signifies the company’s return to the world of high-end GPUs, something it has been missing since the release of the Fury X in mid-2015.
Let’s be clear: today does not mark the release of the Vega GPU or products based on Vega. In reality, we don’t even know enough to make highly educated guesses about performance without more details on the specific implementations. That being said, the information released by AMD today is interesting and shows that Vega will be much more than simply an increase in shader count over Polaris. It reminds me a lot of the build-up to the Fiji GPU release, when information and speculation about how HBM would affect power consumption, form factor and performance flourished. What we can hope for, and what AMD’s goal needs to be, is a cleaner and more consistent product release than the Fury X turned out to be.
The Design Goals
AMD began its discussion about Vega last month by talking about the changes in the world of GPUs and how data sets and workloads have evolved over the last decade. GPUs are no longer concerned only with games; they must also address professional, enterprise, and scientific workloads. Even more interestingly, just as we have discussed the growing gap between CPU performance and CPU memory bandwidth, AMD posits that the gap between memory capacity and GPU performance is a significant hurdle and limiter to performance and expansion. Game installs, professional graphics sets, and compute data sets continue to skyrocket in size. Game installs now regularly exceed 50GB, and compute workloads can exceed petabytes. Even as we saw GPU memory capacities increase from megabytes to gigabytes, reaching as high as 12GB in high-end consumer products, AMD thinks there should be more.
Coming from a company that chose to release a high-end product limited to 4GB of memory in 2015, it’s a noteworthy statement.
The High Bandwidth Cache
Bold enough to claim a direct nomenclature change, Vega 10 will feature an HBM2-based high bandwidth cache (HBC) along with a new memory hierarchy to bring it into play. This HBC will be a collection of memory on the GPU package, just like we saw on Fiji with the first HBM implementation, and will be measured in gigabytes. Why AMD has moved to calling it a cache will be covered below. (But can’t we all get behind the removal of the term “frame buffer”?) Interestingly, this HBC doesn’t have to be HBM2, and in fact I was told that you could expect to see other memory systems on lower cost products going forward; cards that integrate this new memory topology with GDDR5X or some equivalent seem assured.
With the new year comes a new push for performance, efficiency and feature leadership from Qualcomm and its Snapdragon line of mobile SoCs. The Snapdragon 835 was officially announced in November of last year, when the partnership with Samsung on 10nm process technology was announced, but we now have the freedom to share more of the details on this new part and how it changes Qualcomm’s position in the ultra-device market. Devices with the new 835 part won’t be on the market for several more months, though announcements are likely coming at CES this year.
Qualcomm frames the story around the Snapdragon 835 processor with what they call the “five pillars”: five different aspects of mobile processor design that they have addressed with updates and technologies. Qualcomm lists them as battery life (efficiency), immersion (performance), capture, connectivity, and security.
Starting where they start, on battery life and efficiency, the SD 835 has a unique focus that might surprise many. Rather than talking up the improvements in performance of the new processor cores, or the power of the new Adreno GPU, Qualcomm is firmly planted on looking at Snapdragon through the lens of battery life. Snapdragon 835 uses half of the power of Snapdragon 801.
The company touts usage claims of 1+ day of talk time, 5+ days of music playback, 11 hours of 4K video playback, 3 hours of 4K video capture and 2+ hours of sustained VR gaming. These sound impressive, but as we must always do in this market, we will have to wait for consumer devices from Qualcomm partners to really measure how well this platform will do. Going through a typical power user comparison of a device built on the Snapdragon 835 to one using the 820, Qualcomm thinks it could result in 2 or more hours of additional battery life at the end of the day.
We have already discussed the new Quick Charge 4 technology, which can offer 5 hours of use with just 5 minutes of charge time.
Subject: Processors | January 3, 2017 - 03:54 PM | Jeremy Hellstrom
Tagged: z270, overclocking, kaby lake, Intel, i7-7700k, core i7-7700k, 7th generation core, 7700k, 14nm
Having already familiarized yourself with Intel's new Kaby Lake architecture and the i7-7700k processor in Ryan's review, you may now be wondering how well the new CPU overclocks for others. [H]ard|OCP received three i7-7700k's and three different Z270 motherboards for testing, and they set about overclocking these in combination to see what frequency they could reach. Only one of the chips was ever stable at 5GHz, though it is reassuring that it managed that on all three motherboards. The remaining two would only hit 4.8GHz, which is still not a bad result. Drop by to see their settings in full detail.
"After having a few weeks to play around with Intel's new Kaby Lake architecture Core i7-7700K processors, we finally have some results that we want to discuss when it comes to overclocking and the magic 5GHz many of us are looking for, and what we think your chances are of getting there yourself."
Here are some more Processor articles from around the web:
- Intel's Core i7-7700K 'Kaby Lake' CPU @ The Tech Report
- Intel Kaby Lake i7-7700K & i5-7600K Review @ Hardware Canucks
- Intel Core i7-7700K vs 6700K: 22 Games, RX 480 & GTX 1080 @ techPowerUp
- Intel Kaby Lake Core i7-7700K Performance & Z270 Chipset Overview @ Techgage
- Intel 7th Generation Core i7 7700K Processor Review @ OCC
- Intel Kaby Lake Core i7-7700K IPC @ [H]ard|OCP
- Core i5-6400 @ Hardware Secrets
- FX-4300 @ Hardware Secrets
- AMD's New Ryzen CPU - SMT and IPC @ [H]ard|OCP
It probably doesn't surprise any of our readers that there has been a tepid response to the leaks and reviews that have come out about the new Core i7-7700K CPU ahead of the scheduled launch of Kaby Lake-S from Intel. Replacing the Skylake-based 6700K part as the new "flagship" consumer enthusiast CPU, the 7700K has quite a bit stacked against it. We know that Kaby Lake is the first in the new sequence of tick-tock-optimize, and thus there are few architectural changes to any portion of the chip. However, that does not mean that the 7700K and Kaby Lake in general don't offer new capabilities (HEVC) or performance (clock speed).
The Core i7-7700K is in an interesting spot as well with regard to motherboards and platforms. Nearly all motherboards that run the Z170 chipset will be able to run the new Kaby Lake parts without requiring an upgrade to the newly released Z270 chipset. However, the likelihood that any user on a Z170 platform today using a Skylake processor will feel the NEED to upgrade to Kaby Lake is minimal, to say the least. The Z270 chipset only offers a couple of new features compared to last generation, so the upgrade path is again somewhat limited in excitement.
Let's start by taking a look at the Core i7-7700K and how it compares to the previous top-end parts from the consumer processor line and then touch on the changes that Kaby Lake brings to the table.
With the beginning of CES just days away (as I write this), Intel is taking the wrapping paper off of its first gift of 2017 to the industry. As you can see from the slide above, more than just the Kaby Lake-S consumer socketed processors are launching today, but other components including Iris Plus graphics implementations and quad-core notebook implementations will need to wait for another day.
For DIY builders and OEMs, Kaby Lake-S, now known as the 7th Generation Core Processor family, offers some changes and additions. First, we will get a dual-core HyperThreaded processor with an unlocked designation in the Core i3-7350K. Beyond that and the aforementioned Z270 chipset, Kaby Lake will be the first platform compatible with Intel Optane memory. (To be extra clear, I was told that previous processors will NOT be able to utilize Optane in its M.2 form factor.)
Though we have already witnessed Lenovo announcing products using Optane, this is the first official Intel discussion about it. Optane memory will be available in M.2 modules that can be installed on Z270 motherboards, improving snappiness and responsiveness. It seems this will be launched later in the quarter as we don't have any performance numbers or benchmarks to point to demonstrating the advantages that Intel touts. I know both Allyn and I are very excited to see how this differs from previous Intel caching technologies.
| | Core i7-7700K | Core i7-6700K | Core i7-5775C | Core i7-4790K | Core i7-4770K | Core i7-3770K |
|---|---|---|---|---|---|---|
| Architecture | Kaby Lake | Skylake | Broadwell | Haswell | Haswell | Ivy Bridge |
| Socket | LGA 1151 | LGA 1151 | LGA 1150 | LGA 1150 | LGA 1150 | LGA 1155 |
| Base Clock | 4.2 GHz | 4.0 GHz | 3.3 GHz | 4.0 GHz | 3.5 GHz | 3.5 GHz |
| Max Turbo Clock | 4.5 GHz | 4.2 GHz | 3.7 GHz | 4.4 GHz | 3.9 GHz | 3.9 GHz |
| Memory Speeds | Up to 2400 MHz | Up to 2133 MHz | Up to 1600 MHz | Up to 1600 MHz | Up to 1600 MHz | Up to 1600 MHz |
| Cache (L4 Cache) | 8MB | 8MB | 6MB (128MB) | 8MB | 8MB | 8MB |
| System Bus | DMI3 - 8.0 GT/s | DMI3 - 8.0 GT/s | DMI2 - 6.4 GT/s | DMI2 - 5.0 GT/s | DMI2 - 5.0 GT/s | DMI2 - 5.0 GT/s |
| Graphics | HD Graphics 630 | HD Graphics 530 | Iris Pro 6200 | HD Graphics 4600 | HD Graphics 4600 | HD Graphics 4000 |
| Max Graphics Clock | 1.15 GHz | 1.15 GHz | 1.15 GHz | 1.25 GHz | 1.25 GHz | 1.15 GHz |
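To put the clock differences in the table into perspective, here is a minimal Python sketch that computes the generational uplift as a percentage. The base and turbo figures are copied from the table; the function names (`uplift_percent`, `compare`) and the dictionary layout are purely illustrative, and clock speed alone says nothing about IPC or real-world performance.

```python
# Clock figures (GHz) copied from the comparison table above.
SPECS = {
    "Core i7-7700K": {"base": 4.2, "turbo": 4.5},
    "Core i7-6700K": {"base": 4.0, "turbo": 4.2},
    "Core i7-4790K": {"base": 4.0, "turbo": 4.4},
}


def uplift_percent(new_ghz: float, old_ghz: float) -> float:
    """Relative increase of new_ghz over old_ghz, in percent."""
    return (new_ghz - old_ghz) / old_ghz * 100.0


def compare(newer: str, older: str) -> dict:
    """Base and turbo clock uplift of `newer` over `older`, rounded to 0.1%."""
    return {
        kind: round(uplift_percent(SPECS[newer][kind], SPECS[older][kind]), 1)
        for kind in ("base", "turbo")
    }


if __name__ == "__main__":
    for older in ("Core i7-6700K", "Core i7-4790K"):
        print(f"7700K vs {older}: {compare('Core i7-7700K', older)}")
```

On these numbers the 7700K's base clock is 5% above the 6700K's and its turbo clock roughly 7% above, which is the scale of improvement to keep in mind when reading the benchmarks.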
Subject: Processors | January 2, 2017 - 05:33 PM | Scott Michaud
Tagged: sandy bridge, Intel
OC3D is claiming that Intel is working on a significantly new architecture, targeting somewhere around the 2019 or 2020 time frame. Like AMD’s Bulldozer, while there were several architectures after the initial release, they were all based around a set of the same basic assumptions with tweaks for better IPC, reducing bottlenecks, and so forth. Intel has also been using the same fundamentals since Sandy Bridge, albeit theirs aligned much better with how x86 applications were being developed.
According to the report, Intel’s new architecture is expected to remove some old instructions, which will make it less compatible with applications that use these commands. This is actually very similar to what AMD was attempting to do with Bulldozer... to a point. AMD projected that applications would scale well to multiple cores, and use GPUs for floating-point operations; as such, they designed cores in pairs, and decided to eliminate redundant parts, such as half of the floating-point units. Hindsight being 20/20, we now know that developers didn’t change their habits (and earlier Bulldozer parts were allegedly overzealous with cutting out elements in a few areas, too).
In Intel’s case, from what we hear at the moment, their cuts should be less broad than AMD’s. Rather than projecting a radical shift in programming, they’re just going to cut the fat from their existing instruction set, unless there are bigger changes planned for the next couple of years of development. As for the unlucky applications that use these instructions, OC3D speculates that either Intel or the host operating systems will provide some emulation method, likely in software.
If the things they cut haven’t been used in several years, then you can probably get acceptable performance in the applications that require them via emulation. On the other hand, a bad decision could choke the processor in the same way that Bulldozer, especially the early variants, did for AMD. On the other-other hand, Intel has something that AMD didn’t: the market-share to push (desktop) developers in a given direction. On the fourth hand, which I’ll return to its rightful owner, I promise, we don’t know how much the “(desktop)” clause will translate to overall software in two years.
Right now, it seems like x86 is successfully holding off ARM in performance-critical, consumer applications. If that continues, then Intel might be able to push x86 software development, even if they get a little aggressive like AMD did five-plus-development-time years ago.
Subject: General Tech, Processors | December 15, 2016 - 12:29 PM | Jeremy Hellstrom
Tagged: leak, kaby lake, intel 200
Tech ARP have an interesting story posted today; it would seem they pried loose the specs of the upcoming Kaby Lake processors and the accompanying Intel 200 chipset. The top chip, the $349 Core i7-7700K, will have 4 cores and 8 threads running at 4.2 GHz, with an 8 MB L3 cache and a TDP of 95W, while the non-K version will have its core clock dropped to 3.6GHz, TDP dropped to 65W and price lowered to $309. The chipsets will encompass series similar to the previous generations from Intel, including the LGA 1151 Z270, H270, Q270, B250 and Q250 series. There is no information in this leak on the socket the server-level C422 and high-end X299 boards will use, but we are sure you can extrapolate from existing rumours and innuendo. Follow that link for the entire lineup.
"As AMD gears up to launch the AMD Ryzen desktop processor in early Q1 2017, Intel has finalised the launch plans for their desktop Kaby Lake processors, and the accompanying 200 Series chipsets.
Although Intel has been extremely secretive, we managed to obtain the specifications and launch details of the desktop Kaby Lake processors, and the 200 Series chipsets. Check it out!"
Here is some more Tech News from around the web:
- Ashley Madison is getting off lightly just like its clients @ The Inquirer
- Microsoft quietly emits patch to undo its earlier patch that broke Windows 10 networking @ The Register
- PC vendors trying out Qualcomm/Windows products @ DigiTimes
- Uh-oh! Microsoft has another chatbot – but racism is a no-go for Zo @ The Register
- Delete your account: Yahoo admits to another hack affecting one billion customers @ The Inquirer
- Docker opens up crucial container plumbing code cunningly disguised as 'boring infrastructure' @ The Register
- Malvertising Campaign Infects Your Router Instead of Your Browser @ Slashdot
- Top 7 Videos from ApacheCon and Apache Big Data 2016 @ Linux.com
Ryzen coming in 2017
As much as we might want it to be, today is not the day that AMD launches its new Zen processors to the world. We’ve been teased with it for years now, with trickles of information at event after event…but we are going to have to wait a little bit longer, with at least one more tease. Today AMD is announcing the official branding of the consumer processors based on Zen, previously code named Summit Ridge, along with a clock speed data point and a preview of five technologies that will help it be competitive with the Intel Core lineup.
The future consumer desktop processor from AMD will now officially be known as Ryzen. That’s pronounced “RISE-IN” not “RIS-IN”, just so we are all on the same page. CEO Lisa Su was on stage during the reveal at a media event last week and claimed that while media, fans and AMD fell in love with the Zen name, it needed a differentiation from the architecture itself. The name is solid – not earth shattering though I foresee a long life of mispronunciation ahead of it.
Now that we have the official branding behind us, let’s get to the rest of the disclosed information we can reveal today.
We already knew that Summit Ridge would ship with an 8 core, 16 thread version (with lower core counts at lower prices very likely) but now we know a frequency and a cache size. AMD tells us that there will be a processor (the flagship) that will have a base clock of 3.4 GHz with boost clocks above that. How much above that is still a mystery – AMD is likely still tweaking its implementation of boost to get as much performance as possible for launch. This should help put those clock speed rumors to rest for now.
The 20MB of cache matches the Core i7-6900K, though obviously with some dramatic architecture differences between Broadwell and Zen, the effect and utilization of that cache will be interesting to measure next year.
We already knew that Ryzen will be utilizing the AM4 platform, but it’s nice to see its modern feature set and expandability reiterated. DDR4 memory, PCI Express Gen3, native USB 3.1 and NVMe support: these are all necessary building blocks for a modern consumer and enthusiast PC. We still need to see how many of these ports the chipset offers and how aggressive motherboard companies like ASUS, MSI and Gigabyte are in their designs. I am hoping there are as many options as we would see for an X99/Z170 platform, including budget boards in the $100 space as well as “anything and everything” options for those types of buyers that want to adopt AMD’s new CPU.
Subject: Processors | December 8, 2016 - 09:00 AM | Josh Walrath
Tagged: Xilinx, TSMC, standard cells, layout, FinFET, EDA, custom cell, arm, 7nm
Today ARM is announcing their partnership with Xilinx to deliver design solutions for their products on TSMC’s upcoming 7nm process node. ARM has previously partnered with Xilinx on other nodes including 28, 20, and 16nm. Their partnership extends into design considerations to improve the time to market of complex parts and to rapidly synthesize new designs for cutting edge process nodes.
Xilinx is licensing out the latest ARM Artisan Physical IP platform for TSMC’s 7nm. Artisan Physical IP is a set of tools to help rapidly roll out complex designs as compared to what previous generations of products faced. ARM has specialized libraries and tools to help implement these designs on a variety of processes and receive good results even on the shortest possible design times.
Design relies on two basic methodologies: custom cell design and standard cell design. Custom cell design allows for a tremendous amount of flexibility in layout and electrical characteristics, but it requires a lot of man-hours to complete even the simplest logic. Custom cell designs typically draw less power and provide higher clockspeeds than standard cell designs. Standard cells are like Legos in that the cells can be quickly laid out to create complex logic. Software called EDA (Electronic Design Automation) can quickly place and route these cells. GPUs lean heavily on standard cells and EDA software to get highly complex products out to market quickly.
These two basic methods have netted good results over the years, but during that time we have seen implementations of standard cells become more custom in how they behave. While not achieving full custom performance, we have seen semi-custom type endeavors achieve appreciable gains without requiring the man hours to achieve fully custom.
In this particular case ARM is achieving a solid performance in power and speed through automated design that improves upon standard cells, but without the downsides of a fully custom part. This provides positive power and speed benefits without the extra power draw of a traditional standard cell. ARM further improves upon this with the ARM Artisan Power Grid Architect (PGA) which simplifies the development of a complex power grid that services a large and complex chip.
We have seen these types of advancements in the GPU world that NVIDIA and AMD enjoy talking about. A better power grid allows the ASIC to perform at lower power envelopes due to less impedance. The GPU guys have also utilized High Density Libraries to pack in the transistors as tightly as possible to utilize less space and increase spatial efficiency. A smaller chip that requires less power is always a positive development over a larger chip of the same capabilities that requires more power. ARM looks to be doing their own version of these technologies and is applying them to TSMC’s upcoming 7nm FinFET process.
TSMC is not releasing this process to mass production until at least 2018. In 1H 2017 we will see some initial test and early production runs for a handful of partners. Full-blown production of 7nm will be in 2018. Early runs and production are increasingly being used by companies working with low power devices. We can look back at the 20/16/14 nm processes and see that they were initially used by designs that do not require a lot of power and run at moderate clockspeeds. We have seen a shift in who uses these new processes with the introduction of sub-28nm process nodes. The complexity of the design, process steps, materials, and libraries has pushed the higher performance and power hungry parts to a secondary position as the foundries attempt to get these next generation nodes up to speed. It isn’t until after many months of these low power parts being pushed through that we see adjustments and improvements in these next generation nodes to handle the higher power and clockspeed needs of products like desktop CPUs and GPUs.
ARM is certainly being much more aggressive in addressing next generation nodes and pushing their cutting edge products on them to allow for far more powerful mobile products that also exhibit improved battery life. This step with 7nm and Xilinx will provide a lot of data to ARM and its partners downstream when the time comes to implement new designs. Artisan will continue to evolve to allow partners to quickly and efficiently introduce new products on new nodes to the market at an accelerated rate as compared to years past.
Subject: Processors | November 30, 2016 - 06:52 PM | Scott Michaud
Tagged: kaby lake, Intel, core i7 7700k
Someone, who wasn’t Intel, seeded Tom’s Hardware an Intel Core i7-7700k, which is expected for release in the new year. This is the top end of the mainstream SKUs, bringing four cores (eight threads) to 4.2 GHz base, 4.5 GHz boost. Using a motherboard built around the Z170 chipset, they were able to clock the CPU up to 4.8 GHz, which is a little over 4% higher than the Skylake-based Core i7-6700k maximum overclock on the same board.
Image Credit: Tom's Hardware
Lucky number i7-77.
Before we continue, these results are based on a single sample. (Update: @7:01pm -- Also, the motherboard they used has some known overclock and stability issues. They mentioned it a bit in the post, like why their BCLK is 99.65MHz, but I forgot to highlight it here. Thankfully, Allyn caught it in the first ten minutes.) This sample has retail branding, but Intel would not confirm that it performs like they expect a retail SKU would. Normally, pre-release products are labeled as such, but there’s no way to tell if this one part is some exception. Beyond concerns that it might be slightly different from what consumers will eventually receive, there is also a huge variation in overclocking performance due to binning. With a sample size of one, we cannot tell whether this chip has an abnormally high, or an abnormally low, defect count, which affects both power and maximum frequency.
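The BCLK detail matters because the core clock is the product of the base clock and the CPU multiplier, so a BCLK slightly under 100 MHz means the chip runs a little below its nominal frequency. A minimal sketch of that arithmetic follows; the 99.65 MHz figure comes from the post, while the 48x multiplier is an assumption chosen to match the roughly 4.8 GHz overclock (the source does not state the multiplier), and the function name is illustrative.

```python
def core_clock_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Core frequency in MHz as base clock times multiplier."""
    return bclk_mhz * multiplier


REPORTED_BCLK_MHZ = 99.65   # BCLK mentioned in the post
NOMINAL_BCLK_MHZ = 100.0    # the usual reference value
ASSUMED_MULTIPLIER = 48     # hypothetical: picked to land near 4.8 GHz

nominal = core_clock_mhz(NOMINAL_BCLK_MHZ, ASSUMED_MULTIPLIER)
actual = core_clock_mhz(REPORTED_BCLK_MHZ, ASSUMED_MULTIPLIER)
shortfall = nominal - actual

if __name__ == "__main__":
    print(f"nominal:   {nominal:.1f} MHz")
    print(f"actual:    {actual:.1f} MHz")
    print(f"shortfall: {shortfall:.1f} MHz ({shortfall / nominal * 100:.2f}%)")
```

Under that assumption the difference is under 20 MHz, so it is small next to the chip-to-chip variation from binning discussed above.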
That aside, if this chip is representative of Kaby Lake performance, users should expect an increase in headroom for clock rates, but it will come at the cost of increased power consumption. In fact, Tom’s Hardware states that the chip “acts like an overclocked i7-6700K”. Based on this, it seems like, unless they want an extra 4 PCIe lanes on Z270, Kaby Lake’s performance might already be achievable for users with a lucky Skylake.
I should note that Tom’s Hardware didn’t benchmark the iGPU. I don’t really see it used for much more than video encoding anyway, but it would be nice to see if Intel improved in that area, seeing as how they incremented the model number. Then again, even users who are concerned about that will probably be better off just adding a second, discrete GPU anyway.
Subject: Processors | November 28, 2016 - 09:26 PM | Scott Michaud
Tagged: amd, Zen, Summit Ridge
Guru3D got hold of a product list, which includes entries for AMD’s upcoming Zen architecture.
Four SKUs are thus rumored to exist:
- Zen SR3: (65W, quad-core, eight threads, ~$150 USD)
- Zen SR5: (95W, hexa-core, twelve threads, ~$250 USD)
- Zen SR7: (95W, octo-core, sixteen threads, ~$350 USD)
- Special Zen SR7: (95W, octo-core, sixteen threads, ~$500 USD)
The sheet also states that none of these are supposed to contain integrated graphics, just as with the current FX line. There is some merit to using integrated GPUs for specific tasks, like processing video while the main GPU is busy or doing a rapid, massively parallel calculation without the latency of memory copies, but AMD is probably right to not waste resources, such as TDP, fighting our current lack of compatible software and viable use cases for these SKUs.
Image Credit: Guru3D
The sheet also contains benchmarks for Cinebench R15. While pre-rendered video is a task that really should be done on GPUs at this point, especially with permissive, strong, open-source projects like Cycles, they do provide a good example of multi-core performance that scales. In this one test, the Summit Ridge 7 CPU ($350) roughly matches the Intel Core i7-6850K ($600), again, according to this one unconfirmed benchmark. It doesn’t list clock rates, but other rumors claim that the top-end chip will be around 3.2 GHz base, 3.5 GHz boost at stock, with manual overclocks exceeding 4 GHz.
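As a rough illustration of what that comparison would mean for value, here is a small Python sketch using only the two prices quoted above. It assumes, as the leaked and unconfirmed benchmark suggests, that the two chips perform about the same in this one test; the function and variable names are illustrative.

```python
def savings_percent(cheaper_usd: float, pricier_usd: float) -> float:
    """How much less the cheaper part costs, as a percentage of the pricier one."""
    return (pricier_usd - cheaper_usd) / pricier_usd * 100.0


# Prices as quoted in the text; the Zen figure is a rumor, not a confirmed price.
SR7_PRICE_USD = 350.0
I7_6850K_PRICE_USD = 600.0

saving = savings_percent(SR7_PRICE_USD, I7_6850K_PRICE_USD)
price_ratio = I7_6850K_PRICE_USD / SR7_PRICE_USD

if __name__ == "__main__":
    print(f"Rumored SR7 is {saving:.1f}% cheaper; "
          f"the 6850K costs {price_ratio:.2f}x as much.")
```

If the scores really are equal, performance per dollar in this test scales with the same 1.71x ratio, but a single leaked Cinebench result is too thin to generalize from.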
These performance figures suggest that Zen will not beat Skylake on single-threaded performance, but it might be close. That might not matter, however. CPUs, these days, are kind-of converging around a certain level of per-thread performance, and are differentiating with core count, price, and features. Unfortunately, there don’t seem to have been many leaks regarding enthusiast-level chipsets for Zen, so we don’t know if there will be compelling use cases yet.
Zen is expected early in 2017.