ARM Releases Cortex-A72 for Licensing
On February 3rd, ARM announced a slew of new designs, including the Cortex-A72. Few details were shared with us at the time, but what we did learn suggested that it could redefine power and performance in the ARM ecosystem. Ryan was invited to London to participate in a deep dive into what ARM has done to improve its position against market behemoth Intel in the very competitive mobile space. Intel has a leg up on process technology with its 14nm Tri-Gate process, and it continues to work hard at making its x86-based processors more power efficient while still maintaining good performance. Still, there are certain drawbacks to using an ISA that is focused on high performance computing rather than one designed from scratch to provide good performance with excellent energy efficiency.
ARM has been on a pretty good roll with its Cortex-A9, A7, A15, A17, A53, and A57 parts over the past several years. These designs have been used in a multitude of products and scenarios, with configurations that have scaled up to 16 cores. While each iteration has improved upon the previous one, ARM is facing the specter of Intel’s latest generation of highly efficient x86 SoCs based on the 2nd-gen 14nm Tri-Gate process. Several things have fallen into place to help ARM stay competitive, but we also cannot ignore the experience and design hours that have led to this product.
(Editor's Note: During my time with ARM last week it became very apparent that the company is not standing still or satisfied with its current status. With competition from Intel, Qualcomm and others ramping up over the next 12 months in both the mobile and server markets, ARM will be more dependent than ever on the evolution of its core and GPU designs to maintain advantages in performance and efficiency. As Josh will detail here, the Cortex-A72 appears to be an incredibly impressive design, and every indication, including conversations I have had with people outside of ARM, suggests that it will be an incredibly successful product.)
Cortex A72: Highest Performance ARM Cortex
ARM has been ubiquitous in mobile applications since it first started selling licenses for its designs in the 90s. Its chips seemed to be everywhere, but most people wouldn’t recognize the name ARM because those chips were fabricated and sold by licensees under their own names. Guys like TI, Qualcomm, Apple, DEC and others all licensed and adopted ARM technology in one form or another.
ARM’s importance grew dramatically with the arrival of increasingly complex cellphones and smartphones. It also gained attention through multimedia devices such as the Microsoft Zune. What was once a fairly niche company with low-performance, low-power offerings became the 800 pound gorilla of the mobile market. Billions of chips based on ARM technology are sold every year. To stay in that position ARM has worked aggressively to keep providing excellent power characteristics for its parts, but now it is really focusing on overall performance and capabilities to address not only the smartphone market, but also the higher performance computing and server spaces where it wants a significant presence.
Subject: Processors | April 27, 2015 - 06:06 PM | Josh Walrath
Tagged: Zen, Steamroller, Kaveria, k12, Excavator, carrizo, bulldozer, amd
There has been some pretty breathless analysis of a single leaked block diagram that is supposedly from AMD. This is one of the first indications of what the Zen architecture looks like from a CPU core standpoint. The block diagram is very simple, but it is drawn in the same style as what we have seen from AMD before. There are some labels, but this is more of a 50,000 foot view of the architecture than a slightly clearer 10,000 foot view.
There are a few things we know for sure about Zen. It is a clean sheet design that moves away from what AMD was pursuing with its Bulldozer family of cores. Zen gives up CMT (clustered multithreading) in favor of SMT (simultaneous multithreading) to handle more threads. The design has a cluster of four cores sharing 8 MB of L3 cache, with each core having access to 512 KB of L2 cache. There is a lot of optimism that with this design AMD can buck the trend of falling further behind Intel every year. Jim Keller is viewed very positively due to his work at AMD in the K7 through K8 days, as well as what he accomplished at Apple with its ARM-based offerings.
One of the first sites to pick up this diagram wrote quite a bit about what they saw, including claims such as, “right off the bat just by looking at the block diagram we can tell that Zen will have substantially higher single threaded performance compared to Excavator and the Bulldozer family.” There was also the assumption that because it had two 256-bit FMACs, it could fuse them to create a single 512-bit AVX unit.
These assumptions are pretty silly. This is a very simple block diagram that answers very few of the important questions about the architecture. Yes, it shows six integer pipelines, but we don’t know how many are address generation units versus execution units. We don’t know how wide decode is. We don’t know the latency to L2 cache, much less how L3 is connected and shared out. So just because we see more integer pipelines per core does not automatically mean, “Da, more is better, strong like tractor!” We don’t know what improvements or simplifications we will see in the schedulers. There is no mention of the front-end other than Fetch and Decode. How about branch prediction? What is the latency for the memory controller when addressing external memory?
Essentially, this looks like a simplified way of expressing to analysts that AMD is attempting to retain their per core integer performance while boosting floating point/AVX at a similar level. Other than that, there is very little that can be gleaned from this simple block diagram.
Other interesting leaks concern the formats these products will be integrated into. One leak detailed an HPC-aimed APU that features 16 Zen cores with 32 MB of L3 cache attached to a very large GPU. Another detailed a server-level chip that will support 32 cores and will be seen in 2P systems. Zen certainly appears to be very flexible, and in some ways it reminds me of a much beefier Jaguar-type CPU. My gut feeling is that AMD will get closer to Intel than it has been in years, and perhaps it can catch Intel by surprise with a few extra features. The reality of the situation, though, is that AMD is far behind, and only now are we seeing pure-play foundries start to get even close to Intel in terms of process technology. AMD is very much at a disadvantage here.
Still, the company needs to release new, competitive products that will refill its coffers. The previous quarter’s loss has dug into cash reserves, but AMD is still stable in terms of cash on hand and long term debt. 2015 will see new GPUs, an APU refresh, and the release of the new Carrizo parts. 2016 looks to be the make-or-break year with Zen and K12.
Edit 2015-04-28: Thanks to SH STON we have a new slide that has been leaked from the same deck as this one. It contains some interesting info suggesting that AMD may be moving away from exclusive cache designs. Exclusive caches were a good idea when cache was small and expensive, as data was not replicated across levels (L1 contents were not duplicated in L2, and L2 contents were not duplicated in L3). Intel has been using inclusive caches since forever, where data is replicated and simpler to handle. Now it looks like AMD is moving towards inclusive as well. This is not necessarily a bad thing, as the 512 KB of L2 can easily accommodate what looks to be 128 KB of L1, and the shared 8 MB of L3 cache can easily accommodate the 2 MB of combined L2 data. Here is the link to that slide.
The new slide in question.
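To make the inclusive-versus-exclusive trade-off concrete, here is a rough capacity sketch for one four-core Zen cluster. It assumes the figures read off the leaked slides (128 KB of L1 and 512 KB of L2 per core, 8 MB of shared L3) and an idealized hierarchy that is either strictly inclusive or strictly exclusive; real designs can mix policies per level, and none of these numbers are confirmed by AMD.

```python
# Hypothetical model: figures come from the leaked slides, not from AMD documentation.
CORES = 4
L1_PER_CORE_KB = 128      # "what looks to be 128 KB of L1" per core
L2_PER_CORE_KB = 512
L3_SHARED_KB = 8 * 1024   # 8 MB shared by the four cores


def unique_capacity_kb(inclusive: bool) -> int:
    """Distinct data (in KB) the cluster's caches can hold at the same time."""
    if inclusive:
        # Every line in L1/L2 also lives in L3, so L3 bounds the unique footprint.
        return L3_SHARED_KB
    # Exclusive: no level duplicates another, so the capacities simply add up.
    private_kb = CORES * (L1_PER_CORE_KB + L2_PER_CORE_KB)
    return private_kb + L3_SHARED_KB


def l3_share_spent_on_l2_copies() -> float:
    """Fraction of L3 that mirrors the L2 contents under an inclusive policy."""
    return CORES * L2_PER_CORE_KB / L3_SHARED_KB


print("inclusive:", unique_capacity_kb(True), "KB")
print("exclusive:", unique_capacity_kb(False), "KB")
print(f"L3 mirroring L2 when inclusive: {l3_share_spent_on_l2_copies():.0%}")
```

Under these assumptions an inclusive design gives up about 2.5 MB of unique capacity (8,192 KB versus 10,752 KB), with a quarter of the L3 mirroring the L2s. In exchange, a miss in a strictly inclusive L3 means the line is in no core's private caches, which simplifies coherence.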
Subject: General Tech, Graphics Cards, Processors | April 19, 2015 - 02:08 PM | Scott Michaud
Tagged: moores law, Intel
While he was the director of research and development at Fairchild Semiconductor, Gordon E. Moore predicted that the number of components in an integrated circuit would double every year. Later, this time-step would slow to every two years; you can occasionally hear people cite eighteen months too, but I am not sure who derived that number. A few years after his prediction, he went on to found Intel with Robert Noyce, a company that now spends tens of billions of dollars annually to keep up with the prophecy.
It works out for the most part, but we have been running into physical issues over the last few years. One major issue is that, with our process technology dipping into the single- and low double-digit nanometers, we are running out of atoms to manipulate. Crystalline silicon repeats roughly every 0.5nm (its lattice constant is about 0.54nm), so a 14nm feature spans only about 28 of those spacings, give or take a few in rounding error.
It has been a good fifty years since the start of Moore's Law, and humanity has been developing plans for how to cope with the eventual end of silicon lithography process shrinks. We will probably transition to smaller atoms and molecules, and later consider alternative technologies like photonic crystals, which route light in the hundreds of terahertz through a series of waveguides that make up an integrated circuit. Another interesting thought: will these technologies fall in line with Moore's Law in some way?
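The choice of doubling period matters more than it might seem, because growth compounds as 2^(years / period). A small sketch comparing the three periods mentioned above over a single decade (the ten-year window is just an illustrative choice):

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplier on component count after `years` if it doubles every period."""
    return 2 ** (years / doubling_period_years)


# Compare the doubling periods mentioned above over one decade.
for label, period in [("every year", 1.0), ("every 18 months", 1.5), ("every two years", 2.0)]:
    print(f"{label:>16}: x{growth_factor(10, period):,.0f} in ten years")
```

Annual doubling gives 1,024x in ten years, eighteen months gives roughly 102x, and two years gives 32x.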
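The atom-counting arithmetic is simply feature width divided by atomic spacing. The sketch below uses the rounded 0.5nm spacing from the text and, as a simplification, treats each node name as a literal feature width; in practice modern node names no longer map directly onto a single physical dimension.

```python
# Rounded spacing used in the text; silicon's lattice constant is about 0.543 nm.
SPACING_NM = 0.5


def spacings_across(feature_nm: float, spacing_nm: float = SPACING_NM) -> float:
    """How many atomic spacings fit across a feature of the given width."""
    return feature_nm / spacing_nm


# Simplification: node names treated as literal feature widths.
for node_nm in (22, 14, 10, 7):
    print(f"{node_nm:>2} nm -> about {spacings_across(node_nm):.0f} spacings")
```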
Subject: Processors | April 15, 2015 - 10:04 PM | Ryan Shrout
Tagged: Intel, Skylake, skylake-s, lga1151, 100 series
Some slides have leaked out with information about Intel's forthcoming 6th Generation Core processor, code named Skylake. We have known that Skylake was coming, and coming this year, but there have been a lot of questions about enthusiast parts and what that means for DIY builders. The slides were first seen over at WCCFTech.com and show some interesting new information.
Dubbed Skylake-S, the LGA (socketed) processor will use a new socket derivative with 1151 pins as well as a new set of chipsets, the Intel 100-series. Skylake is built on the same 14nm process technology used with Broadwell but will feature a new microarchitecture for both the IA cores and the graphics systems. Obviously you can read the slide yourself above, but some of the highlights are worth touching on individually. Skylake will support both DDR3L and DDR4 memory, with the enthusiast-grade parts likely the only ones to push the newer, faster DDR4 speeds.
Enthusiasts will also be glad to know that there are planned 95 watt quad-core SKUs that will be unlocked for overclocking. Intel lists "enhanced" BCLK overclocking with the term "full range", which likely means there will no longer be a need for straps to 125 MHz, etc. A 95 watt TDP is higher than the 88 watt limit we saw on Haswell processors, so there is a chance we might actually see usable performance gains if Intel can push clock speeds above where they sit today with current generation parts.
The use of DMI 3.0, the connection between the processor and the chipset, brings the first increase in bandwidth in many generations. Rated at 8 GT/s per lane, up from 5 GT/s for the DMI 2.0 interface used on Haswell, and moving to more efficient 128b/130b encoding, it roughly doubles usable bandwidth, which should mean fewer bottlenecks for storage and PCIe devices connected through the chipset.
The new Intel 100-series chipsets will come in three variants at launch: the Z170, the H170 and the H110. The one we are most concerned with is of course the Z170, as it will be paired with the higher-end 65 watt and 95 watt enthusiast processors. Based on these specs, Skylake will continue to offer only 16 lanes of PCI Express 3.0 from the processor, configurable as one x16, two x8, or one x8 plus two x4 connections. With either DDR3L or DDR4 you will have a dual-channel memory system.
For storage, the Z170 still has six SATA 6.0 Gb/s ports. It moves to a maximum of 14 USB ports, 10 of them capable of USB 3.0 speeds, and it upgrades Intel RST to support PCIe storage drives. Of note here is that the Intel chipset does not include USB 3.1 capability, so motherboard vendors will continue to need an external controller to integrate it. Without a doubt the 100-series chipsets will be able to support booting from, and compatibility with, the new Intel 750-series PCIe SSDs, the current king of the hill.
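For readers newer to overclocking, core frequency is the base clock (BCLK) multiplied by the CPU multiplier. The sketch below illustrates why finer BCLK control matters: with BCLK pinned at 100 MHz, multiplier changes move the clock in 100 MHz steps, while nudging BCLK allows finer tuning without jumping to a strap such as 125 MHz. The specific BCLK and multiplier values are illustrative assumptions, not leaked Skylake specifications.

```python
def core_clock_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Core frequency = base clock x CPU multiplier."""
    return bclk_mhz * multiplier


# Illustrative settings only; not Skylake specifications.
settings = {
    "multiplier only": (100.0, 45),
    "125 MHz strap": (125.0, 36),
    "fine BCLK tweak": (103.0, 45),
}
for name, (bclk, mult) in settings.items():
    print(f"{name:>16}: {core_clock_mhz(bclk, mult):.0f} MHz")
```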
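The roughly doubled DMI bandwidth comes from two changes: the per-lane rate rises from 5 GT/s to 8 GT/s, and the line encoding moves from 8b/10b to the 128b/130b scheme used by PCI Express 3.0. This sketch works through the per-direction arithmetic, assuming the usual x4 DMI link width and ignoring protocol overhead beyond line encoding.

```python
def link_bandwidth_gb_s(gt_per_s: float, lanes: int, payload_bits: int, total_bits: int) -> float:
    """Usable bandwidth per direction in GB/s after line-encoding overhead."""
    gbit_per_s = gt_per_s * lanes * payload_bits / total_bits
    return gbit_per_s / 8  # bits -> bytes


dmi2 = link_bandwidth_gb_s(5.0, 4, 8, 10)     # DMI 2.0: 8b/10b encoding
dmi3 = link_bandwidth_gb_s(8.0, 4, 128, 130)  # DMI 3.0: 128b/130b encoding
print(f"DMI 2.0: {dmi2:.2f} GB/s, DMI 3.0: {dmi3:.2f} GB/s, ratio {dmi3 / dmi2:.2f}x")
```

That works out to 2.00 GB/s versus about 3.94 GB/s per direction, a ratio of roughly 1.97x.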
As for timing, the roadmap lists the Z170 chipset and the Skylake-S processor as a Q3 2015 release. I would normally expect that to line up with Computex in early June but that doesn't appear to be the case based on other information I am getting.
Subject: Processors | April 7, 2015 - 05:56 PM | Jeremy Hellstrom
Tagged: amd, FX-8320e
Over at Techgage, one of the writers recently updated their system. Due to budget constraints they needed to stay in the $600-700 range all told, which of course points to an AMD build. They chose the $138 FX-8320E for their processor, along with a pair of GTX 760s, the ASUS M5A99FX Pro R2.0 and 8GB of DDR3-1866, and with storage, power, cooling and case they still managed to stay within their budget. The question that remains is whether it is powerful enough for reasonable gaming duties such as Borderlands 2. Read on to see whether the recommendation is to go with AMD or with the i3-4330 and a low-end H97 board.
"Released this past fall, AMD’s FX-8320E processor promises to deliver a lot of processing power for those on a budget. It sports eight cores, and as a Black Edition, its overclocking capabilities are unrestricted. But is that enough to make this the best go-to budget processor, especially for gamers?"
Here are some more Processor articles from around the web:
- A10-7800 CPU Review @ Hardware Secrets
- AMD A8-7650k Kaveri @ eTeknix
- A10-6800K vs. Core i3-4150 CPU Review @ Hardware Secrets
Subject: Processors, Mobile | March 25, 2015 - 09:51 PM | Scott Michaud
Tagged: Intel, core m, atom, surface, Surface 2, Windows 8.1, windows 10
The stack of Microsoft tablet devices had high-end Intel Core processors hovering over ARM SoCs, the two separated by a simple “Pro” label (and Windows 8.x versus Windows RT). While the Pro line has been kept reasonably up to date, the lower tier has been stagnant for a while. That is apparently going to change. WinBeta believes that a new, non-Pro Surface will be announced soon, at or before BUILD 2015. Unlike previous Surface models, it will be powered by an x86 processor from Intel, either an Atom or a Core M.
This also means it will run Windows 8.1.
The article claims, somewhat tongue-in-cheek, that Windows RT is dead. No. But still, the device should be eligible for a Windows 10 upgrade when it launches, unlike the RT-based Surfaces. Whether that is a surprise depends on the direction you view it from. I would find it silly for Microsoft to release a new Surface device, months before an OS update, but design it to be incompatible with it. On the other hand, it would be the first non-Pro Surface to do so. Either way, it was reported.
The “Surface 3”, whatever it will be called, is expected to be a fanless design. VR-Zone expects that it will be similar to the 10.6-inch, 1080p form factor of the Surface 2, but that seems to be their speculation. That is about all that we know thus far.
Subject: Processors | March 17, 2015 - 03:20 PM | Jeremy Hellstrom
Tagged: Ivy Bridge-E, Intel, i7-4970K, i7-4960X, i7-4770k, Haswell-E
TechPowerUp has put together a quick overview of the differences between Intel's current offerings for your reference when purchasing a new machine or considering an upgrade. The older i7-4770K would run you $310, compared to $338 for the i7-4790K or $385 for an i7-5820K, while the i7-4960X would set you back $1025. Is it worth upgrading your machine if you have an older Haswell, or going whole hog and picking up the $1000 flagship model? The results are presented in a handy format, and while this is perhaps not an in-depth review, the results are quite striking, especially the gaming performance.
"We review the Haswell-E lineup by pitting all its processors against each other and the Ivy Bridge-E Intel Core i7-4960X, Haswell Refresh Intel Core i7-4970K, and Haswell Intel Core i7-4770K. If you are looking to build a high-end gaming PC, or are looking to upgrade, then look no further: This review will tell you which CPU you will want to get to cover your needs."
Here are some more Processor articles from around the web:
- A6-6400K vs. Pentium G3220 CPU Review @ Hardware Secrets
- Core i7-5960X CPU Review @ Hardware Secrets
- Intel Core i5 4690K - the 5GHz project @ HardwareOverclock
Subject: Editorial, Processors | March 12, 2015 - 08:29 PM | Tim Verry
Tagged: Xeon D, xeon, servers, opinion, microserver, Intel
Intel dealt a blow to AMD and ARM this week with the introduction of the Xeon Processor D Product Family of low power server SoCs. The new Xeon D chips use Intel’s latest 14nm process and top out at 45W. The chips are aimed at low power high density servers for general web hosting, storage clusters, web caches, and networking hardware.
Currently, Intel has announced two Xeon D chips, the Xeon D-1540 and Xeon D-1520. Both chips consist of two dies inside a single package. The main die uses a 14nm process and holds the CPU cores, L3 cache, DDR3 and DDR4 memory controllers, networking controller, PCI-E 3.0, and USB 3.0, while a secondary die using a larger (but easier to implement) manufacturing process hosts the higher latency I/O that would traditionally sit on the southbridge, including SATA, PCI-E 2.0, and USB 2.0.
In all, a fairly typical SoC setup from Intel. The specifics are where things get interesting, however. At the top end, Xeon D offers eight Broadwell-based CPU cores (with Hyper-Threading for 16 total threads) clocked at 2.0 GHz base and 2.5 GHz max all-core Turbo (2.6 GHz on a single core). The cores are slightly more efficient than Haswell, especially in this low power setup. The eight cores can tap into 12MB of L3 cache as well as up to 128GB of registered ECC memory (or 64GB unbuffered and/or SODIMMs) in DDR3 1600 MHz or DDR4 2133 MHz flavors. Xeon D also features 24 PCI-E 3.0 lanes (which can be split into as many as six x4 links or into an x16 + x8 configuration, among others), eight PCI-E 2.0 lanes, two 10GbE connections, six SATA III 6.0 Gbps channels, four USB 3.0 ports, and four USB 2.0 ports.
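As a rough illustration of how the 24 PCI-E 3.0 lanes can be carved up, the sketch below checks whether a requested set of links fits. It is a simplified, hypothetical model: it assumes any mix of x4, x8 and x16 links is allowed as long as the total stays within 24 lanes, whereas real Xeon D boards expose only the specific bifurcation options Intel and the motherboard vendor support.

```python
from typing import Sequence

PCIE3_LANES = 24
ALLOWED_WIDTHS = {4, 8, 16}  # assumption: the link widths mentioned in the text


def fits(link_widths: Sequence[int], total_lanes: int = PCIE3_LANES) -> bool:
    """True if every link has an allowed width and the lanes add up to at most total_lanes."""
    return all(w in ALLOWED_WIDTHS for w in link_widths) and sum(link_widths) <= total_lanes


print(fits([16, 8]))   # True: the x16 + x8 split from the text
print(fits([4] * 6))   # True: six x4 links
print(fits([16, 16]))  # False: needs 32 lanes
```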
All of this hardware is rolled into a part with a 45W TDP. Needless to say, this is a new level of efficiency for Xeons! Intel chose to compare the new chips to its Atom C2000 “Avoton” (Silvermont-based) SoCs, which were also aimed at low power servers and related devices. According to the company, Xeon D offers up to 3.4 times the performance and 1.7 times the performance-per-watt of the top-end Atom C2750 processor. Keeping in mind that Xeon D uses approximately twice the power of Atom C2000, it is still looking good for Intel, since you are getting more than twice the performance from a more power efficient part.
Further, while the TDPs are much higher, Intel has packed Xeon D with a slew of power management technology, including Integrated Voltage Regulation (IVR), an energy efficient turbo mode that analyzes whether increased frequencies actually help get work done faster (and if not, reduces turbo to allow extra power to be used elsewhere on the chip or simply to reduce wasted energy), and optional “hardware power management” that allows the processor itself to determine the appropriate power and sleep states independently from the OS.
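Intel's two headline ratios also pin down the implied power draw, since performance-per-watt is performance divided by power: if performance rises 3.4x and performance-per-watt rises 1.7x, power must rise 3.4 / 1.7 = 2x. A quick check, using only Intel's claimed figures versus the Atom C2750 rather than independent measurements:

```python
def implied_power_ratio(perf_ratio: float, perf_per_watt_ratio: float) -> float:
    """P_new / P_old = (perf_new / perf_old) / ((perf/W)_new / (perf/W)_old)."""
    return perf_ratio / perf_per_watt_ratio


ratio = implied_power_ratio(3.4, 1.7)
print(f"Xeon D draws about {ratio:.1f}x the power of the Atom C2750 in Intel's comparison")
```

The result, about 2.0x, matches the "approximately twice the power" reading of Intel's numbers.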
Being server parts, Xeon D supports ECC, PCI-E Non-Transparent Bridging, memory and PCI-E Checksums, and corrected (errata-free) TSX instructions.
Ars Technica notes that Xeon D is strictly single socket and that Intel has reserved multi-socket servers for its higher end and more expensive Xeons (Haswell-EP). Where does the “high density” I mentioned come from, then? Well, from cramming as many Xeon D SoCs as possible, on small motherboards with their own RAM and I/O, into rack mounted cases, of course! It is hard to say just how many Xeon Ds will fit in a 1U, 2U, or even 4U rack mounted system without seeing the associated motherboards and networking hardware. Xeon D should fare better than Avoton here, since we are looking at higher bandwidth networking links and more PCI-E lanes. However, AMD, with SeaMicro’s Freedom Fabric and its head start on low power x86 and ARM-based Opteron chip research, as well as other ARM-based companies like AppliedMicro (X-Gene), will have a slight density advantage (though the Intel chips will be faster per chip).
Which brings me to my final point. Xeon D truly looks like a shot across the bows of both ARM and AMD. It seems that Intel is not content with its dominant position in the overall server market and is putting its weight behind a move to take over the low power server market as well, a niche that ARM and AMD in particular have been actively pursuing. Intel is not quite down to the low power levels of AMD and the other ARM-based companies, but by bringing Xeon down to 45W (with Atom-based solutions moving upward performance-wise), the Intel juggernaut is closing in, and I’m interested to see how it all plays out.
Right now, ARM still has the TDP and customization advantage (where customers can create custom chips and cores to suit their exact needs), and AMD will be able to leverage its GPU expertise by including processor graphics for a leg up on highly multi-threaded GPGPU workloads. On the other hand, Intel has the better manufacturing process and the bigger engineering budget. Xeon D seems to be the first step towards going after a market that Intel has not really focused on in the past.
With Intel pushing its weight around, where will that leave the little guys that I have been rooting for in this low power high density server space?
Subject: Processors | March 10, 2015 - 10:20 AM | Sebastian Peak
Tagged: uefi, motherboards, lga 1150, Intel, Broadwell, bios, asus
ASUS has announced that all current Intel 9 Series motherboards will support the upcoming 5th-Generation Intel Broadwell LGA 1150 CPUs with a UEFI update.
We reported last week that Intel’s 5th-generation Broadwell CPU had been demonstrated at GDC using Intel’s Iris Pro graphics, though official details about the new LGA versions of Broadwell are not yet public. The desktop variants will no doubt use the same 14nm process technology of the current BGA parts, and it has been rumored that the new CPUs will initially launch in both Core i5 and i7 versions, with the potential for Core i3 and Pentium branded parts to follow (though any potential product information is mere speculation at this point).
It will be interesting to see whether the upcoming LGA 5th-Generation CPUs will be able to offer any higher performance for desktop users compared to existing Haswell parts (such as the i7-4790K), or if there will even be unlocked processors. Considering Broadwell is a mobile-focused part designed for efficiency and lower power consumption, the chips could offer a compelling solution for small form-factor computers such as HTPCs, as they will presumably provide lower heat and higher IPC than existing parts.
The UEFI updates will go live later today (some updates have already been released) and include all ASUS motherboard models with Z97 and H97 chipsets.
Subject: Processors | March 4, 2015 - 09:07 PM | Ryan Shrout
Tagged: GDC, gdc 15, Intel, Broadwell, iris pro, LGA1150, core i7
Consumers have been asking for it since Intel first announced it, and Iris Pro graphics is finally finding its way to the desktop, socketed market. This processor, shown powering one of Dell's new 5K displays and shipping in "mid-2015", will be configured with a 65 watt TDP and will be unlocked for overclockers to tweak. Intel first disclosed these plans way back in May of 2014, so we are approaching the 12-month mark for availability.
It doesn't look special, but this system has the first desktop Iris Pro processor
In a new disclosure at GDC, Intel showed the first 5th Generation Core LGA-socketed CPU with Intel® Iris™ Pro graphics. This 65 watt unlocked desktop processor, available mid-2015, will bring new levels of performance and power efficiency to Mini PCs and desktop All-In-Ones. Since 2006 the 3D performance of Intel Graphics has increased nearly 100 fold (Intel 3DMark06 measurements) and powerful form factors from Acer, Medion and Intel’s own NUCs are becoming available with 5th Generation Intel Core processors with Intel Iris Graphics.
Under that little heatsink...
Details of this new CPU offering, including clock speed and graphics performance, are still unknown, but Intel claims we will have this part in our hands in the near future. It isn't meant to replace mid-range discrete graphics systems; instead, it will give users interested in an SFF or low power system both home theater features and improved gaming capability. Our testing with Iris Pro graphics in notebooks has shown that the gaming performance gains can be substantial, but battery life demands have often limited OEM implementations. With a desktop part, we might actually be able to see the full capability of an integrated GPU with embedded memory.