A third primary processor
As the Hot Chips conference begins in Cupertino this week, Qualcomm is set to divulge another set of information about the upcoming Snapdragon 820 processor. Earlier this month the company revealed details about the Adreno 5xx GPU architecture, showcasing improved performance and power efficiency while also adding a new Spectra 14-bit image processor. Today we shift to what Qualcomm calls the “third pillar in the triumvirate of programmable processors” that make up the Snapdragon SoC. The Hexagon DSP (digital signal processor), introduced by Qualcomm in 2004, has gone through massive shifts in both architecture and programmability over the last 10 years.
Qualcomm believes that building a balanced SoC for mobile applications is all about heterogeneous computing with no one processor carrying the entire load. The majority of the work that any modern Snapdragon processor must handle goes through the primary CPU cores, the GPU or the DSP. We learned about upgrades to the Adreno 5xx series for the Snapdragon 820 and we are promised information about Kryo CPU architecture soon as well. But the Hexagon 600-series of DSPs actually deals with some of the most important functionality for smartphones and tablets: audio, voice, imaging and video.
Interestingly, Qualcomm opened up the DSP to programmability just four years ago, giving developers the ability to write custom code and software that takes advantage of the specific performance capabilities the DSP offers. Custom photography, videography and sound applications could benefit greatly in terms of performance and power efficiency by using the Qualcomm DSP rather than the primary system CPU or GPU. As of this writing, Qualcomm claims there are “hundreds” of developers actively writing code targeting its family of Hexagon processors.
The Hexagon DSP in Snapdragon 820 consists of three primary partitions. The main compute DSP works in conjunction with the GPU and CPU cores and will do much of the heavy lifting for the workloads it handles. The modem DSP aids the cellular modem in communication throughput. The new guy here is the low power DSP in the Low Power Island (LPI), which changes how always-on sensors communicate with the operating system.
Core and Interconnect
The Skylake architecture is Intel’s first to get a full release on the desktop in more than two years. While that might not seem like a long time in the grand scheme of technology, for our readers and viewers it is a noticeable departure from the cadence Intel established with its tick-tock model of releases. Yes, Broadwell was released last year and was a solid product, but Intel focused almost exclusively on mobile platforms (notebooks and tablets) with it. Skylake will become much more ubiquitous, and much more quickly, than even Haswell did.
Skylake represents Intel’s most scalable architecture to date. I don’t mean only frequency scaling, though that is an important part of this design, but rather scaling across market segments. Thanks to brilliant engineering and design from Intel’s Israeli group, Intel will be launching Skylake designs ranging from 4.5 watt TDP Core M solutions all the way up to the 91 watt desktop processors that we have already reviewed in the Core i7-6700K. That’s a range we really haven’t seen before; in the past, Intel has depended on the Atom architecture to make up ground on the lowest power platforms. While I don’t know for sure if Atom is finally going the way of the dodo once Skylake’s reign is fully established, it does make me wonder how much life is left there.
Scalability also refers to package size, which ensures that the designs the engineers created can actually be built and run in the platform segments they are targeting. Starting with the desktop designs for LGA platforms (the DIY market), which fit on a 1400 mm² package for the 91 watt TDP implementation, Intel is scaling all the way down to 330 mm² in a BGA1515 package for the 4.5 watt TDP designs. Only with a total product size like that can you hope to get Skylake into a form factor like the Compute Stick, which is exactly what Intel is doing. Note, too, that the smaller packages must also include the platform I/O chip, something H- and S-series CPUs can rely on the motherboard to integrate.
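To put the scaling figures above in perspective, the short Python snippet below simply divides the quoted package areas and TDPs. It uses only the numbers in this paragraph; the variable and function names are my own, chosen for illustration.

```python
# Ratios implied by the Skylake package and TDP figures quoted above.
DESKTOP_PACKAGE_MM2 = 1400   # LGA desktop package, 91 W TDP part
CORE_M_PACKAGE_MM2 = 330     # BGA1515 package, 4.5 W TDP part
DESKTOP_TDP_W = 91.0
CORE_M_TDP_W = 4.5


def ratio(big: float, small: float) -> float:
    """How many times larger `big` is than `small`."""
    return big / small


area_ratio = ratio(DESKTOP_PACKAGE_MM2, CORE_M_PACKAGE_MM2)
tdp_ratio = ratio(DESKTOP_TDP_W, CORE_M_TDP_W)

print(f"package area: {area_ratio:.1f}x")  # package area: 4.2x
print(f"TDP:          {tdp_ratio:.1f}x")   # TDP:          20.2x
```

So the package area shrinks by roughly 4x from top to bottom of the stack, while the power budget shrinks by roughly 20x.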
Finally, scalability also includes performance scaling. Clearly the 4.5 watt part will not offer the same performance, or target the same goals, as the 91 watt Core i7-6700K. The screen resolution, attached accessories and target applications allow Intel to be selective about how much power each series of Skylake CPUs requires.
The fundamental design theory in Skylake is very similar to what exists today in Broadwell and Haswell, with a handful of significant changes and hundreds of minor ones that make Skylake a large step ahead of previous designs.
This slide from Julius Mandelblat, Intel Senior Principal Engineer, shows a high-level overview of Skylake's entire consumer integration. You can see that Intel’s goals included a bigger and wider core design, higher frequency, improved ring architecture and fabric design, and more options for eDRAM integration. Readers of PC Perspective will already know that Skylake supports both DDR3L and DDR4 memory, but the inclusion of the camera ISP is new information for us.
I knew that the move to DirectX 12 was going to be a big shift for the industry. Since the introduction of the AMD Mantle API along with the Hawaii GPU architecture, we have been inundated with game developers and hardware vendors talking about the potential benefits of lower-level APIs, which give game developers and game engines more direct access to GPU hardware and more flexible threading across CPU cores. The results, we were told, would mean that your current hardware would take you further, and that future games and applications could fundamentally change how they are built to enhance gaming experiences tremendously.
I knew that reader interest in DX12 was outstripping my expectations when I did a live blog of the official DX12 unveil by Microsoft at GDC. In a format that consisted simply of my text commentary and photos of the slides that were being shown (no video at all), we had more than 25,000 live readers that stayed engaged the whole time. Comments and questions flew into the event, more than my staff or I could possibly handle in real time. It turned out that gamers were indeed very much interested in what DirectX 12 might offer them with the release of Windows 10.
Today we are taking a look at the first real-world gaming benchmark that utilizes DX12. Back in March I was able to do some early testing with an API-specific test from Futuremark's 3DMark that evaluates the overhead implications of DX12, DX11 and even AMD Mantle. This first look at DX12 was interesting and painted an amazing picture of the potential benefits of the new API from Microsoft, but it wasn’t built on a real game engine. In our Ashes of the Singularity benchmark testing today, we finally get an early look at what a real implementation of DX12 looks like.
And as you might expect, not only are the results interesting, but a significant amount of controversy has already been generated about what those results actually tell us. AMD has one story, NVIDIA another, and Stardock and the Nitrous engine developers yet another. It’s all incredibly intriguing.
It comes after 8, but before 10
As the week of Intel’s Developer Forum (IDF) begins, you can expect to see a lot of information about Intel’s 6th Generation Core architecture, codenamed Skylake, finally revealed. When I posted my review of the Core i7-6700K, the first product based on that architecture to be released in any capacity, I was surprised that Intel was willing to ship product without the normal amount of background information for media and developers. Rather than give us the details and then ship product, which has happened for essentially every consumer product release I have been a part of, Intel did the reverse: ship a consumer friendly CPU and then promise to tell us how it all works later in the month at IDF.
Today I came across a document posted on Intel’s website that dives into very specific detail on the new Gen9 graphics and compute architecture of Skylake. Details on the Core architecture changes are not present, and instead we are given details on how the traditional GPU portion of the SoC has changed. To be clear: I haven’t had any formal briefing from Intel on this topic or anything surrounding the architecture of Skylake or the new Gen9 graphics system but I wanted to share the details we found available. I am sure we’ll learn more this week as IDF progresses so I will update this story where necessary.
What Intel calls Processor Graphics is what we used to call simply integrated graphics for the longest time. The purpose and role of processor graphics has changed drastically over the years: it is now responsible not only for 3D graphics rendering but also for the compute, media and display capabilities of the Intel Skylake SoC (when discrete add-in graphics is not used). The architecture document used to source this story focuses on Gen9 graphics, the compute architecture utilized in the latest Skylake CPUs. The Intel HD Graphics 530 on the Core i7-6700K / Core i5-6600K is the first product released and announced using Gen9 graphics and is also the first to adopt Intel’s new 3-digit naming scheme.
This die shot of the Core i7-6700K shows the increased size and prominence of the Gen9 graphics in the overall SoC design. Containing four traditional x86 CPU cores and 1 “slice” implementation of Gen9 graphics (with three visible sub-slices we’ll describe below), this is not likely to be the highest performing iteration of the latest Intel HD Graphics technology.
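For a rough sense of what one slice means, the sketch below estimates peak FP32 throughput. It relies on assumptions not spelled out in this article: three sub-slices of eight EUs each per slice (24 EUs total), two SIMD-4 FPUs per EU that can each issue one fused multiply-add per clock, and a hypothetical 1.15 GHz graphics clock. Treat the result as a back-of-the-envelope ceiling, not a measured number.

```python
# Hypothetical peak-throughput estimate for Gen9 "slices".
# ASSUMPTIONS (not stated in the text above): 3 sub-slices per slice,
# 8 EUs per sub-slice, 2 SIMD-4 FP32 FPUs per EU, and one fused
# multiply-add (counted as 2 FLOPs) per lane per clock.
SUBSLICES_PER_SLICE = 3
EUS_PER_SUBSLICE = 8
FPUS_PER_EU = 2
LANES_PER_FPU = 4
FLOPS_PER_LANE_PER_CLOCK = 2  # one FMA = multiply + add


def peak_fp32_gflops(slices: int, clock_ghz: float) -> float:
    """Theoretical FP32 GFLOPS for `slices` slices at `clock_ghz` (no stalls)."""
    eus = slices * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE
    flops_per_clock = eus * FPUS_PER_EU * LANES_PER_FPU * FLOPS_PER_LANE_PER_CLOCK
    return flops_per_clock * clock_ghz


eus = 1 * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE
print(f"EUs per slice: {eus}")                               # EUs per slice: 24
print(f"peak FP32: {peak_fp32_gflops(1, 1.15):.1f} GFLOPS")  # peak FP32: 441.6 GFLOPS
```

Under these assumptions, adding slices scales the ceiling linearly, which is why a single-slice part like this one is unlikely to be the fastest Gen9 configuration.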
Like the Intel processors before it, the Skylake design utilizes a ring bus architecture to connect the different components of the SoC. This bi-directional interconnect has a 32-byte wide data bus and connects to multiple “agents” on the CPU. Each individual CPU core is considered its own agent, while the Gen9 compute architecture is considered one complete agent. The system agent bundles the DRAM memory controller, the display controller, PCI Express and the other I/O interfaces that communicate with the rest of the PC. Any off-chip memory requests and transactions occur through this bus, while on-chip data transfers tend to be handled differently.
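To illustrate why a bi-directional ring helps, here is a toy model in Python. The ordering of ring stops is hypothetical (the real Skylake floorplan isn't described here), and the model ignores arbitration and contention entirely; it only counts the shortest number of hops between two agents and the number of 32-byte transfers needed to move a payload such as a 64-byte cache line.

```python
# Toy model of a bi-directional ring interconnect.
# The stop ordering is HYPOTHETICAL; real arbitration/contention is ignored.
RING_STOPS = ["system_agent", "core0", "core1", "core2", "core3", "gen9_graphics"]
BUS_WIDTH_BYTES = 32  # per the text: 32-byte wide data bus


def hops(src: str, dst: str, stops: list[str] = RING_STOPS) -> int:
    """Shortest hop count when traffic may travel in either ring direction."""
    n = len(stops)
    d = abs(stops.index(src) - stops.index(dst))
    return min(d, n - d)


def transfers_needed(payload_bytes: int, width: int = BUS_WIDTH_BYTES) -> int:
    """Bus transfers needed to move a payload over a link `width` bytes wide."""
    return -(-payload_bytes // width)  # integer ceiling division


print(hops("core0", "gen9_graphics"))  # 2 (wraps around through system_agent)
print(transfers_needed(64))            # 2 transfers for a 64-byte cache line
```

With a one-way ring, core0 would need four hops to reach the graphics agent in this layout; being able to go the other way cuts it to two.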
Light on architecture details
Our Intel Skylake launch coverage is intense! Make sure you hit up all the stories and videos that are interesting for you!
- The Intel Core i7-6700K Review - Skylake First for Enthusiasts (Video)
- Skylake vs. Sandy Bridge: Discrete GPU Showdown (Video)
- ASUS Z170-A Motherboard Preview
- Intel Skylake / Z170 Rapid Storage Technology Tested - PCIe and SATA RAID
The Intel Skylake architecture has been on our radar for quite a long time as Intel's next big step in CPU design. Through leaks and some official information discussed by Intel over the past few months, we know at least a handful of details: DDR4 memory support, 14nm process technology, modest IPC gains and impressive GPU improvements. But exactly how the "tock" of Skylake on the 14nm process technology will differ from Broadwell and Haswell has remained a mystery.
Interestingly, due to some shifts in how Intel is releasing Skylake, we are going to be doing a review today with very little information on the Skylake architecture and design (at least officially). While we are very used to the company releasing new information at the Intel Developer Forum along with the launch of a new product, Intel has instead decided to time the release of the first Skylake products with Gamescom in Cologne, Germany. Parts will go on sale today (August 5th), and we are reviewing a new Intel processor without the background knowledge and details needed to really explain any of the changes or differences in performance that we see. It's an odd move, honestly, but it has a great upside for the enthusiasts that read PC Perspective: Skylake will launch first as an enthusiast-class product for gamers and DIY builders.
For many of you this won't change anything. If you are curious about the performance of the new Core i7-6700K, power consumption, clock for clock IPC improvements and anything else that is measurable, then you'll get exactly what you want from today's article. If you are a gear-head that is looking for more granular details on how the inner-workings of Skylake function, you'll have to wait a couple of weeks longer - Intel plans to release that information on August 18th during IDF.
So what does the addition of DDR4 memory, full range base clock manipulation and a 4.0 GHz base clock on a brand new 14nm architecture mean for users of current Intel or AMD platforms? Also, is it FINALLY time for users of the Core i7-2600K or older systems to push that upgrade button? (Let's hope so!)
Bioshock Infinite Results
Today marks the release of Intel's newest CPU architecture, code named Skylake. I already posted my full review of the Core i7-6700K processor so, if you are looking for CPU performance and specification details on that part, you should start there. What we are looking at in this story is the answer to a very simple, but also very important question:
Is it time for gamers using Sandy Bridge systems to finally bite the bullet and upgrade?
I think you'll find that the answer will depend on a few things, including your gaming resolution and appetite for multi-GPU configurations, but even I was surprised by the differences I saw in testing.
Our testing scenario was quite simple. Compare the gaming performance of an Intel Core i7-6700K processor and Z170 motherboard running both a single GTX 980 and a pair of GTX 980s in SLI against an Intel Core i7-2600K and Z77 motherboard using the same GPUs. I installed both the latest NVIDIA GeForce drivers and the latest Intel system drivers for each platform.
| | Skylake System | Sandy Bridge System |
|---|---|---|
| Processor | Intel Core i7-6700K | Intel Core i7-2600K |
| Motherboard | ASUS Z170-Deluxe | Gigabyte Z68-UD3H B3 |
| Memory | 16GB DDR4-2133 | 8GB DDR3-1600 |
| Graphics Card | 1x GeForce GTX 980 / 2x GeForce GTX 980 (SLI) | 1x GeForce GTX 980 / 2x GeForce GTX 980 (SLI) |
| OS | Windows 8.1 | Windows 8.1 |
Our testing methodology follows our Frame Rating system, which uses a capture-based system to measure frame times at the screen (rather than trusting the software's interpretation).
If you aren't familiar with it, you should probably do a little research into our testing methodology, as it is quite different from others you may see online. Rather than using FRAPS to measure frame rates or frame times, we use a secondary PC to capture the output from the tested graphics card directly and then post-process the resulting video to determine frame rates, frame times, frame variance and much more.
This amount of data can be pretty confusing if you attempt to read it without proper background, but I strongly believe that the results we present paint a much more thorough picture of performance than other options. So please, read up on the full discussion about our Frame Rating methods before moving forward!
While there are literally dozens of files created for each “run” of benchmarks, there are several resulting graphs that FCAT produces, as well as several more that we generate with additional code of our own.
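As a simplified illustration of the kind of post-processing involved, the Python sketch below turns a list of per-frame display times into an average frame rate, a 99th percentile frame time and the largest frame-to-frame jump. These metric definitions are simplified stand-ins chosen for illustration, not the exact calculations FCAT or our own scripts perform.

```python
# Hypothetical summary of per-frame display times (milliseconds), such as
# those extracted from a capture-based run. Metric definitions are
# illustrative only, not FCAT's exact formulas.


def summarize_frame_times(frame_times_ms: list[float]) -> dict[str, float]:
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    n = len(frame_times_ms)
    # Average frame rate over the whole run: frames / total seconds.
    avg_fps = 1000.0 * n / sum(frame_times_ms)
    # Nearest-rank 99th percentile frame time (integer ceiling avoids float error).
    ordered = sorted(frame_times_ms)
    rank = -(-99 * n // 100)
    p99_ms = ordered[rank - 1]
    # Largest change between consecutive frames: a simple proxy for stutter.
    deltas = [abs(b - a) for a, b in zip(frame_times_ms, frame_times_ms[1:])]
    max_delta_ms = max(deltas, default=0.0)
    return {"avg_fps": avg_fps, "p99_ms": p99_ms, "max_delta_ms": max_delta_ms}


run = [16.0] * 99 + [50.0]  # 99 smooth frames followed by one hitch
print(summarize_frame_times(run))
```

With one 50 ms hitch among 99 smooth 16 ms frames, the average stays at about 61.2 FPS and the nearest-rank 99th percentile is still 16 ms; only the 34 ms frame-to-frame jump exposes the stutter, which is why frame times tell you more than an average frame rate alone.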
If you need some more background on how we evaluate gaming performance on PCs, just check out my most recent GPU review for a full breakdown.
I only had time to test four different PC titles:
- Bioshock Infinite
- Grand Theft Auto V
- GRID 2
- Metro: Last Light
Tick Tock Tick Tock Tick Tock Tock
A few websites have been re-reporting on a leak from BenchLife.info about Kaby Lake, which is supposedly a second 14nm redesign (“Tock”) to be injected between Skylake and Cannonlake.
UPDATE (July 2nd, 3:20pm ET): It has been pointed out that many hoaxes have come out of the same source, and that I should be clearer in my disclaimer. This is an unconfirmed, relatively easy-to-fake leak that does not have a second, independent source. I reported on it because (apart from being interesting enough) some details were listed on the images but not highlighted in the leak, such as "GT0" and a lack of Iris Pro on -K. That suggests the leaker got the images from somewhere without noticing those details, which implies one of two things: either an anonymous party created the hoax and seeded it to a single media outlet, or it was an actual leak.
Either way, enjoy my analysis but realize that this is a single, unconfirmed source who allegedly published hoaxes in the past.
Image Credit: BenchLife.info
If true, this would be a major shift in both Intel's current roadmap as well as how they justify their research strategies. It also includes a rough stack of product categories, from 4.5W up to 91W TDPs, including their planned integrated graphics configurations. This leads to a pair of interesting stories:
How Kaby Lake could affect Intel's processors going forward. Since 2006, Intel has only budgeted a single CPU architecture redesign for any given fabrication process node. Taking two attempts on the 14nm process buys time for 10nm to become viable, but it could also give them more time to build up a better library of circuit elements, allowing them to assemble better processors in the future.
What type of user will be given Iris Pro? Also, will graphics-free options be available in the sub-Enthusiast class? When buying a processor from Intel, the high-end mainstream processors tend to have GT2-class graphics, such as the Intel HD 4600. Enthusiast architectures, such as Haswell-E, cannot be used without discrete graphics -- the extra space is used for more cores, I/O lanes, or other features. As we will discuss later, Broadwell took a step into changing the availability of Iris Pro in the high-end mainstream, but it doesn't seem like Kaby Lake will make any more progress. Also, if I am interpreting the table correctly, Kaby Lake might bring iGPU-less CPUs to LGA 1151.
Keeping Your Core Regular
To the first point, Intel has been on a steady tick-tock cycle since the Pentium 4 architecture reached the 65nm process node, which was a “tick”. The “tock” came from the Conroe/Merom architecture that was branded “Core 2”. This new architecture was a severe departure from the high clock, relatively low IPC design that NetBurst was built around, and it rapidly changed the processor landscape from a dominant AMD to an Intel runaway lead.
After 65nm and Core 2 started the cycle, every new architecture alternated between shrinking the existing architecture to smaller transistors (tick) and creating a new design on the same fabrication process (tock). Even though Intel has been steadily increasing their R&D budget over time, which is now in the range of $10 to $12 billion USD each year, creating smaller, more intricate designs with new process nodes has been getting harder. For comparison, AMD's total revenue (not just profits) for 2014 was $5.51 billion USD.
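The cadence described above can be written down as a small Python table. The list reflects the well-known tick and tock parts from 65nm through 14nm; Kaby Lake is included only as the rumored, unconfirmed second 14nm tock from the leak, and the helper functions are illustrative.

```python
from collections import Counter

# "tick" = existing design shrunk to a new node; "tock" = new architecture
# on that same node. Kaby Lake is only the RUMORED second 14nm tock.
CADENCE = [
    ("65nm", "tick", "Cedar Mill (Pentium 4)"),
    ("65nm", "tock", "Conroe/Merom (Core 2)"),
    ("45nm", "tick", "Penryn"),
    ("45nm", "tock", "Nehalem"),
    ("32nm", "tick", "Westmere"),
    ("32nm", "tock", "Sandy Bridge"),
    ("22nm", "tick", "Ivy Bridge"),
    ("22nm", "tock", "Haswell"),
    ("14nm", "tick", "Broadwell"),
    ("14nm", "tock", "Skylake"),
    ("14nm", "tock", "Kaby Lake (rumored)"),
]


def tocks_per_node(cadence=CADENCE) -> Counter:
    """Count new-architecture steps ("tocks") on each process node."""
    return Counter(node for node, step, _ in cadence if step == "tock")


def breaks_in_alternation(cadence=CADENCE) -> list[str]:
    """Parts that repeat the previous step instead of alternating tick/tock."""
    return [name for (_, prev, _), (_, step, name) in zip(cadence, cadence[1:])
            if step == prev]


print(tocks_per_node())         # 14nm is the only node with two tocks
print(breaks_in_alternation())  # ['Kaby Lake (rumored)']
```

If the leak is accurate, Kaby Lake is the first entry to break the strict alternation, which is exactly the "Tock Tock" in this story's title.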
Qualcomm’s GPU History
Despite its market dominance, Qualcomm may be one of the least known contenders in the battle for the mobile space. While players like Apple, Samsung, and even NVIDIA are often cited as the most exciting and most revolutionary, none come close to the sheer sales, breadth of technology, and market share that Qualcomm occupies. Brands like Krait and Snapdragon have helped push the company into the top 3 semiconductor companies in the world, following only Intel and Samsung.
Qualcomm was founded in July 1985, when seven industry veterans came together in the den of Dr. Irwin Jacobs’ San Diego home to discuss an idea. They wanted to build “Quality Communications” (thus the name Qualcomm) and outlined a plan that evolved into one of the telecommunications industry’s great start-up success stories.
Though Qualcomm sold its own handset business to Kyocera in 1999, many of today’s most popular mobile devices are powered by Qualcomm’s Snapdragon mobile chipsets with integrated CPU, GPU, DSP, multimedia CODECs, power management, baseband logic and more. In fact the typical “chipset” from Qualcomm encompasses up to 20 different chips of different functions besides just the main application processor. If you are an owner of a Galaxy Note 4, Motorola Droid Turbo, Nexus 6, or Samsung Galaxy S5, then you are most likely a user of one of Qualcomm’s Snapdragon chipsets.
Before 2006, the mobile GPU as we know it today was largely unnecessary. Feature phones and “dumb” phones were still the large majority of the market with smartphones and mobile tablets still in the early stages of development. At this point all the visual data being presented on the screen, whether on a small monochrome screen or with the color of a PDA, was being drawn through a software renderer running on traditional CPU cores.
But by 2007, the first fixed-function, OpenGL ES 1.0 class of GPUs started shipping in mobile devices. These dedicated graphics processors were originally focused on drawing and updating the user interface on smartphones and personal data devices. Eventually these graphics units were used for what would be considered the most basic gaming tasks.
Digging into a specific market
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
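A toy Python model makes the GPU trade-off concrete. It assumes lanes in a work group execute in lockstep, so when neighboring lanes disagree on a branch the whole group must run both paths; the cycle counts are made-up inputs, not figures for any real GPU.

```python
# Toy cost model for lanes executing in lockstep (a SIMD-style work group).
# ASSUMPTION: each branch path costs a fixed number of cycles, and the group
# must execute every path that at least one of its lanes takes.


def group_cycles(takes_branch: list[bool], then_cycles: int, else_cycles: int) -> int:
    """Cycles the whole group spends on an if/else when lanes run in lockstep."""
    cycles = 0
    if any(takes_branch):
        cycles += then_cycles
    if not all(takes_branch):
        cycles += else_cycles
    return cycles


def lane_utilization(takes_branch: list[bool], then_cycles: int, else_cycles: int) -> float:
    """Fraction of lane-cycles doing useful work (1.0 means no divergence waste)."""
    useful = sum(then_cycles if t else else_cycles for t in takes_branch)
    total = len(takes_branch) * group_cycles(takes_branch, then_cycles, else_cycles)
    return useful / total


uniform = [True] * 32                   # neighboring pixels take the same path
divergent = [True] * 16 + [False] * 16  # half the lanes disagree
print(group_cycles(uniform, 10, 6), lane_utilization(uniform, 10, 6))      # 10 1.0
print(group_cycles(divergent, 10, 6), lane_utilization(divergent, 10, 6))  # 16 0.5
```

When all 32 lanes agree, the group spends 10 cycles with every lane busy; split them evenly and it spends 16 cycles with half the lane-cycles wasted. That is the bet GPU designers make: similar neighboring work keeps utilization high, so the cheaper lockstep circuitry pays off.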
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. This is AMD's sixth-generation APU, counting from Llano's release in June 2011. For this launch, Carrizo is targeting the 15W and 35W power envelopes for $400-$700 USD notebook devices. AMD needed to increase efficiency on the same 28nm process that we have seen in their product stack since Kabini and Temash were released in May of 2013. They tasked their engineers with optimizing their APU's design for these constraints, which led to dense architectures and clever features within the same complexity budget, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was quite a clever design that didn't reach its potential because of external factors. As I said at the start of this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, two cores share a single floating point unit (FPU). While you need some floating point capacity, upcoming workloads could use the GPU, which is right there on the same die, for a massive increase in performance. As such, the complexity that would be dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.
Some Fresh Hope for 2016
EDIT 2015-05-07: A day after the AMD analyst meeting we now know that the roadmaps delivered here are not legitimate. While some of the information is likely correct on the roadmaps, they were not leaked by AMD. There is no FM3 socket, rather AMD is going with AM4. AMD will be providing more information throughout this quarter about their roadmaps, but for now take all of this information as "not legit".
SH SOTN has some eagle eyes and spotted the latest leaked roadmap for AMD. These roadmaps cover both mobile and desktop, from 2015 through 2016. There are obviously quite a few interesting tidbits of information here.
On the mobility roadmap we see the upcoming release of Carrizo, which we have been talking about since before CES. This will be the very first HSA 1.0 compliant part to hit the market, and AMD has done some really interesting things with the design in terms of performance, power efficiency, and die size optimizations. Carrizo will span the market from 15 watts to 35 watts TDP. This is a mobile-only part, but indications point to it being pretty competent overall. This is a true SOC that will support all the traditional I/O functions of older standalone southbridges. Most believe that this part will be manufactured by GLOBALFOUNDRIES on their 28 nm HKMG process, which is more tuned to AMD's APU needs.
Carrizo-L will be based on the Puma+ architecture and will go from 10 watts to 15 watts TDP. This will use the same FP4 BGA connection as the big Carrizo APU. This should make these parts more palatable for OEMs as they do not have to differentiate the motherboard infrastructure. Making things easier for OEMs will give more reasons for these folks to offer products based on Carrizo and Carrizo-L APUs. The other big reason will be the GCN graphics compute units. Puma+ is a very solid processor architecture for low power products, but these parts are still limited to the older 28 nm HKMG process from TSMC.
One interesting addition here is that AMD will be introducing their "Amur" APU for the low power and ultra-low power markets. It will be composed of four Cortex-A57 CPU cores combined with AMD's GCN graphics units. This will be the first time we see this combination, and the first time AMD has worked with ARM since it sold ATI's mobile graphics business to Qualcomm, where it became "Adreno" (an anagram of "Radeon"). What is most interesting here is that this APU will be a 20 nm part, most likely fabricated by TSMC. That is not to say Samsung or GLOBALFOUNDRIES couldn't produce it, but those companies are expending their energy on the 14 nm FinFET process that will be their bread and butter for years to come. This will be a welcome addition to the mobile market (tablets and handhelds) and could be a nice profit center for AMD if they are able to release it in a timely manner.
2016 is when things get very interesting. The Zen x86 design will dominate the upper 2/3 of the roadmap. I had talked about Zen when we had some new diagram leaks yesterday, but now we get to see the first potential products based off of this architecture. In mobile it will span from 5 watts to 35 watts TDP. The performance and mainstream offerings will be the "Bristol Ridge" APU which will feature 4 Zen cores (or one Zen module) combined with the next gen GCN architecture. This will be a 14nm part, and the assumption is that it will be GLOBALFOUNDRIES using 14nm FinFET LPP (Low Power Plus) that will be more tuned for larger APUs. This will also be a full SOC.
The next APU, codenamed "Basilisk", will span the 5 watt to 15 watt range. It will be composed of 2 Zen cores (1/2 of a Zen module) and likely feature 2 to 4 MB of L3 cache, depending on power requirements. This looks to be the first Skybridge set of APUs, sharing the same infrastructure as the ARM-based Amur SOC. FT4 BGA is the basis for both the 2015 Amur and 2016 Basilisk SOCs.
Finally, we have AMD's first ground-up implementation of ARM's ARMv8-A ISA. The "Styx" APU features the new K12 CPU cores that AMD has designed from scratch. It will also feature the next generation GCN units and share the same FT4 BGA connection. Many are anxiously watching this space to see if AMD can build a better mousetrap with its own design for the licensed ARM ISA (as Qualcomm, NVIDIA, and others have done).
2015 shows no change in the performance desktop space, as it is still serviced by the now venerable Piledriver-based FX parts on AM3+. The only change we expect to see here is a handful of new motherboard offerings from the usual suspects that add USB 3.1 functionality via a third-party controller.
Mainstream and Performance will utilize the upcoming Godavari APUs. These are power and speed optimized APUs that are still based on the current Kaveri design. They look to be a simple refresh/rebadge with a slight performance tweak. Not exciting, but it needs to happen for OEMs.
Low power will continue to be addressed by Beema based APUs. These are regular Puma based cores (not Puma+). AMD likely does not have the numbers to justify a new product in this rather small market.
2016 is when things get interesting again. We see the release of the FM3 socket (final proof that AM3+ is dead) that will house the latest Zen based APUs. At the top end we see "Summit Ridge" which will be composed of 8 Zen cores (or 2 Zen modules). This will have 4 MB of L2 cache and 16 MB of L3 cache if our other leaks are correct. These will be manufactured on 14nm FinFET LPE (the more appropriate process product for larger, more performance oriented parts). These will not be SOCs. We can expect these to be the basis of new Opterons as well, but there is obviously no confirmation of that on these particular slides. This will be the first new product in some years from AMD that has the chance to compete with higher end desktop SKUs from Intel.
From there we have the lower power Bristol Ridge and Basilisk APUs that we already covered in the mobile discussion. These look to be significant upgrades from the current Kaveri (and upcoming Godavari) APUs. New graphics cores, new CPU cores, and new SOC implementations where necessary.
AMD will really be shaking up the game in 2016. At the very least they will have proven that they can still change up their game and release higher end (and hopefully competitive) products. AMD has enough revenue and cash on hand to survive through 2016 and 2017 at the rate they are going now. We can only hope that this widescale change will allow AMD to make some significant inroads with OEMs at all levels. Otherwise Intel is free to do what it wants, and charge what it wants, across multiple markets.
ARM Releases Cortex-A72 for Licensing
On February 3rd, ARM announced a slew of new designs, including the Cortex A72. Few details were shared with us, but what we learned was that it could potentially redefine power and performance in the ARM ecosystem. Ryan was invited to London to participate in a deep dive of what ARM has done to improve its position against market behemoth Intel in the very competitive mobile space. Intel has a leg up on process technology with their 14nm Tri-Gate process, and they are continuing to work hard at making their x86 based processors more power efficient while still maintaining good performance. There are certain drawbacks to using an ISA that is focused on high performance computing rather than one designed from scratch to provide good performance with excellent energy efficiency.
ARM has been on a pretty good roll with their Cortex A9, A7, A15, A17, A53, and A57 parts over the past several years. These designs have been utilized in a multitude of products and scenarios, with configurations that have scaled up to 16 cores. While each iteration has improved upon the previous, ARM is facing the specter of Intel’s latest generation, highly efficient x86 SOCs based on the 2nd gen 14nm Tri-Gate process. Several things have fallen into place for ARM to help them stay competitive, but we also cannot ignore the experience and design hours that have led to this product.
(Editor's Note: During my time with ARM last week it became very apparent that the company is not standing still or satisfied with its current status. With competition from Intel, Qualcomm and others ramping up over the next 12 months in both the mobile and server markets, ARM will be more dependent than ever on the evolution of its core and GPU designs to maintain advantages in performance and efficiency. As Josh will detail here, the Cortex-A72 appears to be an incredibly impressive design, and every indication, including the conversations I have had with others outside of ARM, suggests that it will be an incredibly successful product.)
Cortex A72: Highest Performance ARM Cortex
ARM has been ubiquitous in mobile applications since it first started selling licenses for its products in the 90s. Its designs were seemingly everywhere, but most people wouldn’t recognize the name ARM because the chips were fabricated and sold by licensees under their own names. Companies like TI, Qualcomm, Apple, DEC and others all licensed and adopted ARM technology in one form or another.
ARM’s importance grew dramatically with the introduction of increasingly complex cellphones and smartphones. It also gained attention through multimedia devices such as the Microsoft Zune. What was once a fairly niche company with low performance, low power offerings became the 800 pound gorilla of the mobile market. Billions of chips based on ARM technology are sold every year. To stay in that position ARM has worked aggressively to keep providing excellent power characteristics for its parts, but now it is really focusing on overall performance and capabilities to address not only the smartphone market, but also the higher performance computing and server spaces where it wants a significant presence.
SoFIA, Cherry Trail Make Debuts
Mobile World Congress is traditionally dominated by Samsung, Qualcomm, HTC, and others, yet Intel continues to make inroads into the mobile market. Though the company has admittedly lost a lot of money during this growing process, Intel pushes forward with today's announcement of a trio of new processor lines that keep the Atom brand. The Atom x3, the Atom x5, and the Atom x7 will be the company's answer in 2015 for a wide range of products, from the sub-$75 phone market up to ~$400 tablets and all-in-ones.
There are some significant differences in these Atom processors, more than the naming scheme might indicate.
Intel Atom x3 SoFIA Processor
For years now we have questioned Intel's capability to develop a processor that could fit inside the thermal envelope required for a smartphone while also offering performance comparable to Qualcomm, MediaTek, and others. It seemed that the x86 architecture was a weight around Intel's ankles rather than a float lifting it up. Intel's answer was the development of SoFIA: (S)mart (o)r (F)eature phone with (I)ntel (A)rchitecture. The project started about two years ago, and product announcements are finally reaching us today. SoFIA parts are "designed for budget smartphones; SoFIA is set to give Qualcomm and MediaTek a run for their money in this rapidly growing part of the market."
The SoFIA processors are based on the same Silvermont architecture as the current generation of Atom processors, but they are tuned more heavily for power efficiency. Although SoFIA was originally planned as a dual-core only option, Intel has actually built both dual-core and quad-core variants that will pair with varying modem options to create combinations that best fit target price points and markets. Intel has partnered with RockChip on these designs, even though the architecture is completely IA/x86 based. Production will be done on a 28nm process technology at an unnamed foundry, though you can expect that to mean TSMC. The partnership gives RockChip access to the designs so that it can help accelerate development and release them into the key markets that Intel is targeting.
AMD Details Carrizo Further
Some months back AMD introduced us to their “Carrizo” product. Details were slim, but we learned that this would be another 28 nm part with improved power efficiency over its predecessor. It would be based on the new “Excavator” core, the final implementation of the Bulldozer architecture, and its graphics would be based on the latest iteration of the GCN architecture. Carrizo would be a true SOC in that it integrates the southbridge controller. The final piece of information we received was that it would be interchangeable with the Carrizo-L SOC, an extremely low power APU based on the Puma+ cores.
A few months later, AMD invited us to their CES meeting rooms to see early Carrizo samples in action. These systems were running a variety of applications very smoothly, but we were not told clock speeds or actual power draw. All that we knew was that Carrizo was working and able to run fairly significant workloads like high quality 4K video playback. Details were again very scarce beyond the expected release timeline, the TDP ratings of these future parts, and the promise of a significant jump in energy efficiency over the previous Kaveri based APUs.
AMD is presenting more information on Carrizo at the ISSCC 2015 conference. This information dives a little deeper into how AMD has made the APU smaller, more power efficient, and faster overall than the previous 15 watt to 35 watt APUs based on Kaveri. AMD claims that Carrizo improves power efficiency by a margin never before seen from the company. This is particularly important considering that Carrizo is still a 28 nm product.
Intel Pushes Broadwell to the Next Unit of Computing
Intel continues to invest a significant amount of money into this small form factor product dubbed the Next Unit of Computing, or NUC. When it was initially released in December of 2012, the NUC was built as an evolutionary step for the desktop PC, part of a move for Intel to find new and unique form factors that its processors can exist in. With a 4" x 4" motherboard design the NUC is certainly a differentiated design, and several of Intel's partners have adopted it for products of their own, Gigabyte's BRIX line being the most relevant.
But Intel's development team continues to push the NUC platform forward and today we are evaluating the most recent iteration. The Intel NUC5i5RYK is based on the latest 14nm Broadwell processor and offers improved CPU performance, a higher speed GPU and lower power consumption. All of this is packed into a smaller package than any previous NUC on the market and the result is both impressive and totally expected.
A Walk Around the NUC
To most people the latest Intel NUC will look very similar to the previous models based on Ivy Bridge and Haswell, and they would be right, of course: the fundamental design is unchanged. Intel continues to push forward in small ways, nipping and tucking here and there, but the NUC is still just a box. An incredibly small one with a lot of hardware crammed into it, but a box nonetheless.
While I can appreciate details like the black and silver colors and rounded edges, I think that Intel needs to find a way to add some more excitement to the NUC product line going forward. Admittedly, it is hard to innovate in that direction with such a focus on size and compactness.
New Features and Specifications
It is increasingly obvious that in the high end smartphone and tablet market, much like we saw occur over the last several years in the PC space, consumers are becoming more concerned with features and experiences than just raw specifications. There is still plenty to drool over when looking at and talking about 4K screens in the palm of your hand, octa-core processors and mobile SoC GPUs measuring performance in hundreds of GFLOPS, but at the end of the day the vast majority of consumers want a device that can “wow” them.
As a result, device manufacturers and SoC vendors are shifting priorities for performance and features, and for how those are presented both to the public and to the media. Take this week’s Qualcomm event in San Diego, where a team of VPs, PR personnel and engineers walked me through the new Snapdragon 810 processor. Rather than showing slide after slide of comparative performance numbers against the competition, they showed me room after room of demos: Wi-Fi, LTE, 4K capture and playback, gaming capability, thermals, antenna modifications, etc. The goal is to showcase the experience of the entire platform, something that Qualcomm has been providing for longer than just about anyone in this business, while also educating consumers on the need for balance.
As a 15-year veteran of the hardware space my first reaction here couldn’t have been scripted any more precisely: a company that doesn’t show performance numbers has something to hide. But I was given time with a reference platform featuring the Snapdragon 810 processor in a tablet form-factor, and the results show impressive increases over the 801 and 805 processors from the previous family. Rumors of the chip's heat issues seem overblown, but that will be hard to prove for sure until we get retail hardware in our hands.
Today’s story will outline the primary feature changes of the Snapdragon 810 SoC, though there was so much detail presented at the event with such a short window of time for writing that I definitely won’t be able to get to it all. I will follow up the gory specification details with performance results compared to a wide array of other tablets and smartphones to provide some context to where 810 stands in the market.
SFF PCs get an upgrade
Ultra compact computers, otherwise known as small form factor PCs, are a rapidly growing market as consumers realize that, for nearly all purposes other than gaming and video editing, Ultrabook-class hardware is "fast enough". I know that some of our readers will debate that claim, and we welcome the discussion, but as CPU architectures continue to improve in both performance and efficiency, you will be able to pack more performance into smaller spaces. The Gigabyte BRIX platform is exactly the result you would expect to see from that combination.
Previously, we have seen several other Gigabyte BRIX devices, including our first desktop experience with Iris Pro graphics, the BRIX Pro. Unfortunately though, that unit was plagued by noise issues: the small fan spun pretty fast to cool a 65 watt processor. For a small computer that would likely sit on top of your desk, that's a significant drawback.
Intel Ivy Bridge NUC, Gigabyte BRIX S Broadwell, Gigabyte BRIX Pro Haswell
This time around, Gigabyte is using the new Broadwell-U architecture in the Core i7-5500U and its significantly lower, 15 watt TDP. That does come with some specification concessions though, including a dual-core CPU instead of a quad-core CPU and a peak Turbo clock rate that is 900 MHz lower. Comparing the Broadwell BRIX S to the more relevant previous generation based on Haswell, we get essentially the same clock speed, a similar TDP, but also an improved core architecture.
Today we are going to look at the new Gigabyte BRIX S featuring the Core i7-5500U and an NFC chip for some interesting interactions. The "S" designates that this model can support a full size 2.5-in hard drive in addition to the mSATA port.
ARM Releases Top Cortex Design to Partners
ARM has an interesting history of releasing products. The company was once in the shadowy background of the CPU world, but with the explosion of mobile devices and its relevance in that market, ARM has had to adjust how it approaches the public with its technologies. For years ARM has announced products and technology, only to see them ship one to two years down the line. It seems that with the increased competition in the marketplace from Apple, Intel, NVIDIA, and Qualcomm, ARM is now pushing to license out its new IP in a way that will enable its partners to achieve a faster time to market.
The big news this time is the introduction of the Cortex A72. This is a brand new design based on the ARMv8-A instruction set. It is a 64 bit capable processor that is also backwards compatible with 32 bit applications programmed for ARMv7 based processors. ARM does not go into great detail about the product other than to say that it is significantly faster than the previous Cortex-A15 and Cortex-A57.
The previous Cortex-A15 processors were announced several years back and first appeared in late 2013/early 2014. These were still 32 bit processors, and while they had good performance for the time, they did not stack up well against the latest A8 SOCs from Apple. The A53 and A57 designs were also announced around two years ago. These are the first 64 bit designs from ARM and were meant to compete with the latest custom designs from Apple and Qualcomm’s upcoming 64 bit part. We are only now seeing these parts make it into production, and even Qualcomm has licensed the A53 and A57 designs to ensure a faster time to market for this latest batch of next-generation mobile devices.
Looking back over the past five years, we can see ARM shortening the time between announcing its parts and having its partners ship them. ARM is hoping to accelerate the introduction of its new parts within the next year.
NVIDIA's Tegra X1
NVIDIA seems to like being on a one year cycle with their latest Tegra products. Many years ago we were introduced to the Tegra 2, the year after that the Tegra 3, and the year after that the Tegra 4. Well, NVIDIA did spice up their naming scheme to get away from the numbers (not to mention the potential stigma of how few of those products actually made an impact in the industry). Last year's entry was the Tegra K1 based on the Kepler graphics technology. These products were interesting due to the use of the very latest, cutting edge graphics technology in a mobile/low power format. The 64 bit variant of the Tegra K1 used two “Denver” cores that were actually designed by NVIDIA.
While technically interesting, the Tegra K1 series has made about the same impact as the previous versions. The Nexus 9 was the biggest win for NVIDIA with these parts, and we have heard of a smattering of automotive companies using Tegra K1 in those applications. NVIDIA uses the Tegra K1 in their latest Shield tablet, but they do not typically release data regarding the number of products sold. The Tegra K1 looks to be the most successful product since the original Tegra 2, but the question of how well it actually sold looms over the entire brand.
So why the history lesson? Well, we have to see where NVIDIA has been to get a good idea of where they are heading next. Today, NVIDIA is introducing the latest Tegra product, and it is going in a slightly different direction than what many had expected.
The reference board with 4 GB of LPDDR4.
Core M 5Y70 Specifications
Back in August of this year, Intel invited me out to Portland, Oregon to talk about the future of processors and process technology. Broadwell is the first microarchitecture to ship on Intel's newest 14nm process technology and the performance and power implications of it are as impressive as they are complex. We finally have the first retail product based on Broadwell-Y in our hands and I am eager to see how this combination of technology is going to be implemented.
If you have not read through my article that dives into the intricacies of the 14nm process and the architectural changes coming with Broadwell, then I would highly recommend that you do so before diving any further into this review. Our Intel Core M Processor: Broadwell Architecture and 14nm Process Reveal story clearly explains the "how" and "why" for many of the decisions that determined the direction the Core M 5Y70 heads in.
As I stated at the time:
"The information provided by Intel about Broadwell-Y today shows me the company is clearly innovating and iterating on its plans set in place years ago with the focus on power efficiency. Broadwell and the 14nm process technology will likely be another substantial leap between Intel and AMD in the x86 tablet space and should make an impact on other tablet markets (like Android) as long as pricing can remain competitive. That 14nm process gives Intel an advantage that no one else in the industry can claim and unless Intel begins fabricating processors for the competition (not completely out of the question), that will remain a house advantage."
With a background on Intel's goals with Broadwell-Y, let's look at the first true implementation.
Core M 5Y70 Early Testing
During a press session today with Intel, I was able to get some early performance results on Broadwell-Y in the form of the upcoming Core M 5Y70 processor.
Testing was done on a reference design platform code named Llama Mountain, and at the heart of the system is the dual-core Broadwell-Y CPU, the Core M 5Y70, which is due out later this year. Power consumption of this system is low enough that Intel has built it with a fanless design. As we posted last week, this processor has a base frequency of just 1.10 GHz but it can boost as high as 2.6 GHz for extra performance when it's needed.
Before we dive into the actual results, you should keep in mind a couple of things. First, we did not get a chance to inspect the systems to check driver revisions, etc., so we are taking Intel's word that they are set up as you would expect to see them in the real world. Next, because of the disjointed nature of the tests we were able to run, the comparisons in our graphs aren't as complete as I would like. Still, the results for the Core M 5Y70 are here should you want to compare them to any other scores you like.
First, let's take a look at old faithful: CineBench 11.5.
UPDATE: A previous version of this graph showed the TDP of the Intel Core M 5Y70 as 15 watts, not the 4.5 watts listed here now. The reasons are complicated. Even though Intel's ARK website lists a 4.5 watt TDP for the Core M 5Y70, Intel has publicly stated that the processor will make very short "spikes" to 15 watts when in its highest Turbo Boost modes. It comes down to semantics, really. The cooling capability of the tablet is only targeted at 4.5-6.0 watts, and those very short 15 watt spikes can be dissipated without the need for extra heatsink surface...because they are so short. SDP anyone? END UPDATE
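The UPDATE above turns on the difference between instantaneous power and sustained heat. As a rough sketch, assuming the heatsink behaves like a simple sliding-window average (real thermal behavior depends on thermal resistance and capacitance, which this ignores), the Python below shows how one-second 15 watt bursts can still average out below the upper end of the tablet's 4.5-6.0 watt cooling target. The burst length, the 3.5 watt baseline and the 10-second window are hypothetical values chosen for illustration, not Intel figures or measurements.

```python
# Hypothetical numbers for illustration only; these are NOT measured Core M data.
SPIKE_W = 15.0          # short Turbo Boost burst (the figure Intel cites)
BASELINE_W = 3.5        # assumed power draw between bursts
PERIOD_S = 10           # assumed: one 1-second burst every 10 seconds
COOLING_LIMIT_W = 6.0   # upper end of the 4.5-6.0 W cooling target in the text


def power_trace(cycles: int) -> list[float]:
    """One sample per second: a 1 s spike, then (PERIOD_S - 1) s at the baseline."""
    one_cycle = [SPIKE_W] + [BASELINE_W] * (PERIOD_S - 1)
    return one_cycle * cycles


def windowed_average(trace_w: list[float], window: int) -> list[float]:
    """Average power over every full sliding window of `window` samples.

    A crude, first-order stand-in for a heatsink's thermal mass, which
    smooths short spikes instead of reacting to each instantaneous sample.
    """
    return [
        sum(trace_w[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(trace_w))
    ]


trace = power_trace(cycles=6)
averages = windowed_average(trace, window=PERIOD_S)

print(f"peak instantaneous power: {max(trace):.2f} W")
print(f"worst {PERIOD_S} s average:       {max(averages):.2f} W")
print("within cooling limit:", max(averages) <= COOLING_LIMIT_W)
```

With these made-up inputs every 10-second window holds one spike and nine baseline seconds, so the peak is 15 W while the worst-case average is 4.65 W. Longer or more frequent bursts would push that average up, which is why Intel can only sustain the 15 watt level for very short periods.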
With a score of 2.77, the Core M 5Y70 processor puts up an impressive fight against CPUs with much higher TDP settings. For example, Intel's own Pentium G3258 gets a score of 2.71 in CB11, and did so within a considerably higher thermal envelope. The Core i3-4330 scores 38% higher than the Core M 5Y70, but it requires a TDP 3.6 times the 15 watt peak figure (12 times the 4.5 watt rating) to do so. Both of AMD's APUs in the 45 watt envelope fail to keep up with Core M.
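TDP-normalized comparisons like this one are easier to read as benchmark points per watt of rated TDP. The Python sketch below only re-derives the paragraph's own numbers (2.77 points, "38% higher", "3.6-times larger"). It treats TDP as a stand-in for power draw, which it is not, since TDP is a cooling target rather than a measurement, so the results are rough. Because the 3.6x figure appears to have been worked out against the 15 watt value mentioned in the UPDATE note, the sketch also restates it against the 4.5 watt rating.

```python
# Figures quoted in the paragraph above (Cinebench 11.5).
CORE_M_SCORE = 2.77
I3_SCORE_RATIO = 1.38  # Core i3-4330 "scores 38% higher"
I3_TDP_RATIO = 3.6     # "a TDP 3.6-times larger", apparently relative to 15 W


def core_m_efficiency_advantage(score_ratio: float, tdp_ratio: float) -> float:
    """How many times more benchmark points per TDP watt Core M delivers.

    (i3 points / i3 watts) / (Core M points / Core M watts) = score_ratio / tdp_ratio,
    so Core M's advantage is the inverse: tdp_ratio / score_ratio.
    """
    return tdp_ratio / score_ratio


i3_score = CORE_M_SCORE * I3_SCORE_RATIO
# Same i3 TDP, re-expressed against the 4.5 W rating instead of the 15 W figure.
tdp_ratio_at_4_5w = I3_TDP_RATIO * 15.0 / 4.5

print(f"implied Core i3-4330 score: {i3_score:.2f}")
print(f"Core M advantage, 15 W basis:  "
      f"{core_m_efficiency_advantage(I3_SCORE_RATIO, I3_TDP_RATIO):.2f}x")
print(f"Core M advantage, 4.5 W basis: "
      f"{core_m_efficiency_advantage(I3_SCORE_RATIO, tdp_ratio_at_4_5w):.2f}x")
```

On the 15 watt basis Core M delivers roughly 2.6 times the Cinebench points per TDP watt of the Core i3-4330. On the 4.5 watt basis that rises to roughly 8.7 times, which shows how much the TDP semantics in the UPDATE note change the story.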