Qualcomm’s GPU History
Despite its market dominance, Qualcomm may be one of the least known contenders in the battle for the mobile space. While players like Apple, Samsung, and even NVIDIA are often cited as the most exciting and most revolutionary, none come close to Qualcomm’s sheer sales, breadth of technology, and market share. Brands like Krait and Snapdragon have helped push the company into the top 3 semiconductor companies in the world, following only Intel and Samsung.
Qualcomm was founded in July 1985, when seven industry veterans came together in the den of Dr. Irwin Jacobs’ San Diego home to discuss an idea. They wanted to build “Quality Communications” (thus the name Qualcomm) and outlined a plan that evolved into one of the telecommunications industry’s great start-up success stories.
Though Qualcomm sold its own handset business to Kyocera in 1999, many of today’s most popular mobile devices are powered by Qualcomm’s Snapdragon mobile chipsets with integrated CPU, GPU, DSP, multimedia CODECs, power management, baseband logic and more. In fact the typical “chipset” from Qualcomm encompasses up to 20 different chips of different functions besides just the main application processor. If you are an owner of a Galaxy Note 4, Motorola Droid Turbo, Nexus 6, or Samsung Galaxy S5, then you are most likely a user of one of Qualcomm’s Snapdragon chipsets.
Qualcomm’s GPU History
Before 2006, the mobile GPU as we know it today was largely unnecessary. Feature phones and “dumb” phones were still the large majority of the market with smartphones and mobile tablets still in the early stages of development. At this point all the visual data being presented on the screen, whether on a small monochrome screen or with the color of a PDA, was being drawn through a software renderer running on traditional CPU cores.
But by 2007, the first fixed-function, OpenGL ES 1.0 class of GPUs started shipping in mobile devices. These dedicated graphics processors were originally focused on drawing and updating the user interface on smartphones and personal data devices. Eventually these graphics units were used for what would be considered the most basic gaming tasks.
Subject: Graphics Cards, Processors, Mobile | June 4, 2015 - 04:58 PM | Scott Michaud
Tagged: amd, carrizo
My discussion of the Carrizo architecture went up a couple of days ago. The post did not include specific SKUs because we did not have those at the time. Now we do, and there will be products: one A8-branded, one A10-branded, and one FX-branded.
All three will be quad-core parts that can range between 12W and 35W designs, although the A8 processor does not have a 35W mode listed in the AMD Dual Graphics table. The FX-8800P is an APU that has all eight GPU cores while the A-series APUs have six. The A10-8700P and the A8-8600P are separated by a couple hundred megahertz base and boost CPU clocks, and 80 MHz GPU clock.
Also, we have been given a table of AMD Radeon R5 and R7 M-series GPUs that can be paired with Carrizo in an AMD Dual Graphics setup. These GPUs are the R7 M365, R7 M360, R7 M350, R7 M340, R5 M335, and R5 M330. They cannot be paired with every Carrizo APU, and some pairings only work in certain power envelopes. Thankfully, this table should only be relevant to OEMs, because end-users are receiving pre-configured systems.
Pricing and availability will depend on OEMs, of course.
Digging into a specific market
A little while ago, I decided to think about processor design as a game. You are given a budget of complexity, which is determined by your process node, power, heat, die size, and so forth, and the objective is to lay out features in the way that suits your goal and workload best. While not the topic of today's post, GPUs are a great example of what I mean. They make the assumption that in a batch of work, nearby tasks are very similar, such as the math behind two neighboring pixels on the screen. This assumption allows GPU manufacturers to save complexity by chaining dozens of cores together into not-quite-independent work groups. The circuit fits the work better, and thus it lets more get done in the same complexity budget.
Carrizo is aiming at a 63 million unit per year market segment.
This article is about Carrizo, though. This is AMD's sixth-generation APU, counting from Llano's release in June 2011. For this launch, Carrizo is targeting the 15W and 35W power envelopes for $400-$700 USD notebook devices. AMD needed to increase efficiency on the same 28nm process that we have seen in their product stack since Kabini and Temash were released in May of 2013. They tasked their engineers with optimizing the APU's design for these constraints, which led to dense architectures and clever features on the same budget of complexity, rather than smaller transistors or a bigger die.
15W was their primary target, and they claim to have exceeded their own expectations.
Backing up for a second. Beep. Beep. Beep. Beep.
When I met with AMD last month, I brought up the Bulldozer architecture with many individuals. I suspected that it was quite a clever design that didn't reach its potential because of external factors. As I said at the start of this editorial, processor design is a game and, if you can save complexity by knowing your workload, you can do more with less.
Bulldozer looked like it wanted to take a shortcut by cutting elements that its designers believed would be redundant going forward. First and foremost, two cores share a single floating point (decimal) unit. While you need some floating point capacity, upcoming workloads could use the GPU for a massive increase in performance, which is right there on the same die. As such, the complexity that is dedicated to every second FPU can be cut and used for something else. You can see this trend throughout various elements of the architecture.
Subject: Processors, Shows and Expos | June 2, 2015 - 11:10 AM | Ryan Shrout
Tagged: Intel, computex 2015, computex, Broadwell
Earlier this morning you saw us post a story about MSI updating its line of 20 notebooks with new Broadwell processors. Though dual-core Broadwell has been available for Ultrabooks and 2-in-1s for some time already, today marks the release of the quad-core variations we have been waiting on for some time. Available for mobile designs, as well as marking the very first Iris Pro graphics implementation for desktop users, Broadwell quad-core parts look to be pretty impressive.
Today Intel gives to the world a total of 10 new processors for content creators and enthusiasts. Two of these parts are 65 watt SKUs in LGA packaging for use by enthusiasts and DIY builders. The rest are BGA designs for all-in-one PCs and high performance notebooks and include both 65 watt and 47 watt variants. And most are using the new Iris Pro Graphics 6200 implementation.
For desktop users, we get the Core i7-5775C and the Core i5-5675C. The Core i7 model is a quad-core, HyperThreaded CPU with a base clock of 3.3 GHz and a max Turbo clock of 3.7 GHz. It's unlocked so that overclockers can mess around with it in the same way they do with Haswell. The Iris Pro Graphics 6200 can scale up to 1150 MHz and rated DDR3L memory speeds are up to 1600 MHz. 6MB of L3 cache, a 65 watt TDP and a tray price of $366 round out the information we have.
The Core i5-5675C does not include HyperThreading, has clock speed ranges of 3.1 GHz to 3.6 GHz and only sees the Iris Pro scale to 1100 MHz. Also, it drops from 6MB of L3 cache to 4MB. Pricing on this model will start at $276.
These two processors mark the first time we have seen Iris Pro graphics in a socketed form factor, something we have been asking Intel to offer for at least a couple of generations. They focused on 65 watt TDPs rather than anything higher mostly because of the target audience for these chips: if you are interested in the performance of integrated graphics then you likely are pushing a small form factor design or HTPC of some kind. If you have a Haswell-capable motherboard then you SHOULD be able to utilize one of these new processors though you'll want a Z97 board if you are going to try to overclock it.
From a performance standpoint, the Core i7-5775C will offer 2x the gaming performance, 35% faster video transcoding and 20% higher compute performance when compared to the previous top-end 65 watt Haswell part, the Core i7-4790S. That 4th generation part uses Intel HD Graphics 4600 that does not include the massive eDRAM that makes Iris Pro implementations so unique.
For mobile and AIO buyers, Intel has a whole host of new processors to offer. You'll likely find most of the 65 watt parts in all-in-one designs but you may see some mobile designs that go crazy and opt for them too. For the rest of the gaming notebook designs there are CPUs like the Core i7-5950HQ, a quad-core HyperThreaded part with a base clock of 2.9 GHz and max Turbo clock of 3.8 GHz inside a TDP of 47 watts. The Iris Pro Graphics 6200 will scale from 300 to 1150 MHz so GPU performance should basically be on par with the desktop 65-watt equivalent. Pricing is pretty steep though: starting at $623.
These new processors, especially the new 5950HQ, offer impressive compute and gaming performance.
Compared to the Core i7-5600U, already available and used in some SFF and mobile platforms, the Core i7-5950HQ is 2.5x faster in SPECint and nearly 2x faster in a video conversion benchmark. Clearly these machines are going to be potent desktop replacement options.
For mainstream gamers, the Iris Pro Graphics 6200 on 1920x1080 displays will see some impressive numbers. Players of League of Legends, Heroes of the Storm and WoW will see over 60 FPS at the settings listed in the slide above.
We are still waiting for our hardware to show up but we have both the LGA CPUs and notebooks using the BGA option en route. Expect testing from PC Perspective very soon!
Subject: Processors | June 2, 2015 - 08:40 AM | Sebastian Peak
Tagged: rumor, nuc, leak, Intel Skylake, core i5, core i3
A report from FanlessTech shows what appears to be a leaked slide indicating an upcoming Intel 6th-generation Skylake NUC.
The site claims that these new Intel NUCs will be coming out in Q3 for a 6th-generation Core i3 model, and in Q4 for a 6th-gen Core i5 model. The new NUC will feature 15W TDP Skylake-U processors and 1866 MHz DDR4 memory, along with fast M.2 storage and an SDXC card reader.
True to their name, FanlessTech speculates about the possibility of a passively-cooled version of the NUC: “Out of the box, the Skylake NUC is actively cooled. But fanless cases from Akasa, HDPLEX, Streacom and cirrus7 are to be expected.”
Here are the reported specs of this NUC:
- Intel 6th Generation Core i3 / i5-6xxxU (15W TDP)
- Dual-channel DDR4 SODIMMs 1.2V, 1866 MHz (32GB max)
- Intel HD Graphics 6xxx
- 1 x mini HDMI 1.4a
- 1 x mini DisplayPort 1.2
- 2 x USB 3.0 ports on the back panel
- 2 x USB 3.0 ports on the front panel (1 x charging capable)
- 2 x Internal USB 2.0 via header
- Internal support for M.2 SSD card (22x42 or 22x80)
- Internal SATA3 support for 2.5" HDD/SSD (up to 9.5mm thickness)
- SDXC slot with UHS-I support on the side
- Intel 10/100/1000Mbps Network Connection
- Intel Wireless-AC xxxx M.2 soldered-down, wireless antennas
- IEEE 802.11ac, Bluetooth 4, Intel® Wireless Display
- Up to 7.1 surround audio via Mini HDMI and Mini DisplayPort
- Headphone/Microphone jack on the front panel
- Consumer Infrared sensor on the front panel
- 19V, 65W wall-mount AC-DC power adapter
No further information has been revealed about this alleged upcoming NUC, but we will probably know more soon.
Subject: Processors | May 28, 2015 - 03:44 PM | Scott Michaud
Tagged: Intel, Skylake, skylake-s, haswell, devil's canyon
For a while, it was unclear whether we would see Broadwell on the desktop. With the recently leaked benchmarks of the Intel Core i7-6700K, it seems all-but-certain that Intel will skip it and go straight to Skylake. Compared to Devil's Canyon, the Haswell-based Core i7-4790K, the Skylake-S Core i7-6700K has the same base clock (4.0 GHz) and same full-processor Turbo clock (4.2 GHz). Pretty much every improvement that you see is pure performance per clock (IPC).
Image Credit: CPU Monkey
In multi-threaded applications, the Core i7-6700K tends to get about a 9% increase while, when a single core is being loaded, it tends to get about a 4% increase. Part of this might be the slightly lower single-core Turbo clock, which is said to be 4.2 GHz instead of 4.4 GHz. There might also be some increased efficiency with HyperThreading or cache access -- I don't know -- but it would be interesting to see.
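The clock-speed argument above can be checked with a little arithmetic. The sketch below is only an illustration using the figures quoted in this post (the roughly 9% and 4% gains and the reported Turbo clocks); the function name `per_clock_gain` is made up for this example, and the leaked numbers themselves are unconfirmed.

```python
def per_clock_gain(perf_ratio, new_clock_ghz, old_clock_ghz):
    """Return the performance-per-clock ratio implied by a benchmark ratio.

    perf_ratio is new score / old score; dividing out the clock ratio
    leaves the part of the gain that is not explained by frequency.
    """
    return perf_ratio * old_clock_ghz / new_clock_ghz


# Single-core: about 4% faster, at a reported 4.2 GHz Turbo versus 4.4 GHz.
single = per_clock_gain(1.04, new_clock_ghz=4.2, old_clock_ghz=4.4)

# Multi-threaded: about 9% faster, with both chips at a 4.2 GHz full-processor Turbo.
multi = per_clock_gain(1.09, new_clock_ghz=4.2, old_clock_ghz=4.2)

print(f"single-core per-clock gain: {(single - 1) * 100:.1f}%")
print(f"multi-thread per-clock gain: {(multi - 1) * 100:.1f}%")
```

If the reported clocks are right, the single-core result works out to roughly the same per-clock gain as the multi-threaded one, which would suggest the lower Turbo clock accounts for most of the gap.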
I should note that we know nothing about the GPU. In fact, CPU Monkey fails to list a GPU at all. Intel has expressed interest in bringing Iris Pro-class graphics to the high-end mainstream desktop processors. For someone who is interested in GPU compute, especially with Explicit Unlinked MultiAdapter in DirectX 12 upcoming, it would be nice to see GPUs be ubiquitous and always enabled. Skylake is expected to have the new GT4e graphics with 72 compute units and either 64 or 128MB of eDRAM. If clocks are equivalent, this could translate to well over a teraflop (~1.2 TFLOPs) of compute performance in addition to discrete graphics. In discrete graphics, that would be nearly equivalent to an NVIDIA GTX 560 Ti.
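For a rough sense of where a figure like ~1.2 TFLOPs could come from, here is a back-of-the-envelope sketch. The 72-unit count is from the rumor; the 16 single-precision FLOPs per unit per clock (two 4-wide pipes, each doing a fused multiply-add) and both clock speeds are assumptions for illustration, not confirmed Skylake specifications.

```python
def peak_gflops(units, flops_per_unit_per_clock, clock_ghz):
    """Theoretical peak throughput in GFLOPS: units x FLOPs/clock x GHz."""
    return units * flops_per_unit_per_clock * clock_ghz


UNITS = 72                      # rumored GT4e unit count
FLOPS_PER_UNIT_PER_CLOCK = 16   # assumption: 2 pipes x 4 lanes x 2 (multiply + add)

# Assumed clocks: a conservative 1.05 GHz and the 1.15 GHz that
# Iris Pro Graphics 6200 reaches today.
results = {}
for clock_ghz in (1.05, 1.15):
    results[clock_ghz] = peak_gflops(UNITS, FLOPS_PER_UNIT_PER_CLOCK, clock_ghz)
    print(f"{clock_ghz:.2f} GHz -> {results[clock_ghz] / 1000:.2f} TFLOPS")
```

Under these assumptions the estimate lands between about 1.2 and 1.3 TFLOPs, in line with the figure quoted above. Peak numbers like this say nothing about sustained performance.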
We are expecting to see the Core i7-6700K launch in Q3 of this year. We'll see.
Subject: Processors | May 27, 2015 - 09:45 PM | Scott Michaud
Tagged: xeon, Skylake, Intel, Cannonlake, avx-512
AVX-512 is an instruction set that expands the CPU's vector registers from 256-bit to 512-bit. It comes with a core specification, AVX-512 Foundation, and several extensions that can be added where it makes sense. For instance, AVX-512 Exponential and Reciprocal Instructions (ERI) help compute transcendental functions, which occur in geometry and are useful for GPU-style architectures. As such, it appears in Knights Landing but not anywhere else.
Image Credit: Bits and Chips
Today's rumor is that Skylake, the successor to Broadwell, will not include any AVX-512 support in its consumer parts. According to the lineup, Xeons based on Skylake will support AVX-512 Foundation, Conflict Detection Instructions, Vector Length Extensions, Byte and Word Instructions, and Double and Quadword Instructions. Fused Multiply and Add for 52-bit Integers and Vector Byte Manipulation Instructions will not arrive until Cannonlake shrinks everything down to 10nm.
The main advantage of larger registers is speed. When you can fit 512 bits of data in a register and operate upon it at once, you are able to do several linked calculations together. AVX-512 has the capability to operate on sixteen 32-bit values at the same time, which is obviously sixteen times the compute performance compared with doing just one at a time... if all sixteen undergo the same operation. This is especially useful for games, media, and other vector-based workloads (like science).
This also makes me question whether the entire Cannonlake product stack will support AVX-512. While vectorization is a cheap way to get performance for suitable workloads, it does take up a large number of transistors (wider memory, extra instructions, etc.). Hopefully Intel will be able to afford the cost with the next die shrink.
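The "sixteen at a time" figure falls straight out of the register width. The snippet below is a plain-Python sketch of that arithmetic, not real SIMD code; `lanes` and `vector_steps` are names invented for this example.

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    if register_bits % element_bits != 0:
        raise ValueError("element width must divide register width")
    return register_bits // element_bits


def vector_steps(element_count, lane_count):
    """Vector instructions needed to touch every element once (ceiling division)."""
    return -(-element_count // lane_count)


# Compare one-at-a-time math with 256-bit and 512-bit registers on 32-bit values.
for name, bits in (("scalar", 32), ("256-bit AVX", 256), ("512-bit AVX-512", 512)):
    n = lanes(bits, 32)
    print(f"{name}: {n} lanes, {vector_steps(1000, n)} steps for 1000 values")
```

The step count only shrinks like this when every lane performs the same operation, which is the caveat noted above.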
Subject: Graphics Cards, Processors, Displays, Systems | May 15, 2015 - 03:02 PM | Scott Michaud
Tagged: Oculus, oculus vr, nvidia, amd, geforce, radeon, Intel, core i5
Today, Oculus has published a list of what they believe should drive their VR headset. The Oculus Rift will obviously run on lower hardware. Their minimum specifications, published last month and focused on the Development Kit 2, did not even list a specific CPU or GPU -- just a DVI-D or HDMI output. They then went on to say that you really should use a graphics card that can handle your game at 1080p with at least 75 fps.
The current list is a little different:
- NVIDIA GeForce GTX 970 / AMD Radeon R9 290 (or higher)
- Intel Core i5-4590 (or higher)
- 8GB RAM (or higher)
- A compatible HDMI 1.3 output
- 2x USB 3.0 ports
- Windows 7 SP1 (or newer).
I am guessing that, unlike with the previous list, Oculus has a clearer vision for a development target. They were a little unclear about whether this refers to the consumer version or the current needs of developers. In either case, it would likely serve as a guide for what they believe developers should target when the consumer version launches.
This post also coincides with the release of the Oculus PC SDK 0.6.0. This version pushes distortion rendering to the Oculus Server process, rather than the application. It also allows multiple canvases to be sent to the SDK, which means developers can render text and other noticeable content at full resolution, but scale back in places that the user is less likely to notice. They can also be updated at different frequencies, such as sleeping the HUD redraw unless a value changes.
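To illustrate the idea of updating canvases at different frequencies, here is a generic sketch of a "redraw only when something changed" layer. This is not the Oculus SDK API; the `Layer` class and every name in it are hypothetical and exist only to show the pattern.

```python
class Layer:
    """A hypothetical render layer that caches its last output."""

    def __init__(self, render):
        self._render = render   # callable that produces the layer's content
        self._dirty = True      # force a first draw
        self.content = None
        self.redraws = 0

    def mark_dirty(self):
        """Request a redraw on the next update."""
        self._dirty = True

    def update(self, every_frame=False):
        """Redraw if forced or dirty; otherwise reuse the cached content."""
        if every_frame or self._dirty:
            self.content = self._render()
            self._dirty = False
            self.redraws += 1
        return self.content


state = {"frame": 0, "health": 100}
world = Layer(lambda: f"scene at frame {state['frame']}")
hud = Layer(lambda: f"health: {state['health']}")

# Simulate five frames; the HUD value changes only once.
for frame, health in enumerate([100, 100, 100, 90, 90]):
    state["frame"] = frame
    if health != state["health"]:
        state["health"] = health
        hud.mark_dirty()
    world.update(every_frame=True)   # the 3D scene is redrawn every frame
    hud.update()                     # the HUD sleeps unless its value changed

print(world.redraws, hud.redraws)
```

In this toy run the scene layer is drawn on all five frames, while the HUD is drawn only for its first appearance and for the one change in value.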
The Oculus PC SDK (0.6.0) is now available at the Oculus Developer Center.
Subject: Processors | May 7, 2015 - 07:36 PM | Scott Michaud
Tagged: Intel, xeon, xeon e7 v3, xeon e7
On May 5th, Intel officially announced their new E7 v3 lineup of Xeon processors. This replaces the Xeon E7 v2 processors, which were based on Ivy Bridge-EX, with the newer Haswell-EX architecture. Interestingly, WCCFTech has Broadwell-EX listed next, even though the desktop is expected to mostly skip Broadwell and jump to Skylake in high-performance roles.
The largest model is the E7-8890 v3, which contains eighteen cores fed by a total of 45MB in L3 cache. Despite the high core count, the E7-8890 v3 has its base frequency set at 2.5 GHz to yield a TDP of 165W. The E7-8891 v3 (165W) and the E7-8893 v3 (140W) drop the core count to ten and four, but raise the base frequency to 2.8 GHz and 3.2 GHz, respectively. The E7-8880L v3 is a low power version, relatively speaking, which also contains eighteen cores, clocked at 2.0 GHz. This drops its TDP to 115W while still maintaining 45MB of L3 cache.
Image Credit: WCCFTech
The product stack trickles down from there, but not much further. Just twelve processors are listed in the Xeon E7 segment, which Intel points out in the WCCFTech slides is a significant reduction in SKUs. This suggests that they believe their previous line had too many options for enterprise customers. When dealing with prices in the range of $1,223 - $7,174 USD for bulk orders, it makes sense to offer a little choice to slightly up-sell potential buyers, but too many choices can defeat that purpose. Also, it was a bit humorous to see such an engineering-focused company highlight a reduction of SKUs with a bubble point like it was a technological feature. Not bad, actually quite good as I mentioned above, just a bit funny.
The Xeon E7 v3 is listed as now available, with SKUs ranging from $1,223 - $7,174 USD.
Some Fresh Hope for 2016
EDIT 2015-05-07: A day after the AMD analyst meeting we now know that the roadmaps delivered here are not legitimate. While some of the information is likely correct on the roadmaps, they were not leaked by AMD. There is no FM3 socket, rather AMD is going with AM4. AMD will be providing more information throughout this quarter about their roadmaps, but for now take all of this information as "not legit".
SH SOTN has some eagle eyes and spotted the latest leaked roadmap for AMD. These roadmaps cover both mobile and desktop, from 2015 through 2016. There are obviously quite a few interesting tidbits of information here.
On the mobility roadmap we see the upcoming release of Carrizo, which we have been talking about since before CES. This will be the very first HSA 1.0 compliant part to hit the market, and AMD has done some really interesting things with the design in terms of performance, power efficiency, and die size optimizations. Carrizo will span the market from 15 watts to 35 watts TDP. This is a mobile only part, but indications point to it being pretty competent overall. This is a true SOC that will support all traditional I/O functions of older standalone southbridges. Most believe that this part will be manufactured by GLOBALFOUNDRIES on their 28 nm HKMG process that is more tuned to AMD's APU needs.
Carrizo-L will be based on the Puma+ architecture and will go from 10 watts to 15 watts TDP. This will use the same FP4 BGA connection as the big Carrizo APU. This should make these parts more palatable for OEMs as they do not have to differentiate the motherboard infrastructure. Making things easier for OEMs will give more reasons for these folks to offer products based on Carrizo and Carrizo-L APUs. The other big reason will be the GCN graphics compute units. Puma+ is a very solid processor architecture for low power products, but these parts are still limited to the older 28 nm HKMG process from TSMC.
One interesting addition here is that AMD will be introducing their "Amur" APU for the low power and ultra-low power markets. These will be composed of four Cortex-A57 CPUs combined with AMD's GCN graphics units. This will be the first time we see this combination, and the first time AMD has integrated with ARM since ATI spun off their mobile graphics to Qualcomm under the "Adreno" branding (anagram for "Radeon"). What is most interesting here is that this APU will be a 20 nm part most likely fabricated by TSMC. This is not to say that Samsung or GLOBALFOUNDRIES could not produce it, but those companies are expending their energy on the 14 nm FinFET process that will be their bread and butter for years to come. This will be a welcome addition to the mobile market (tablets and handhelds) and could be a nice profit center for AMD if they are able to release this in a timely manner.
2016 is when things get very interesting. The Zen x86 design will dominate the upper 2/3 of the roadmap. I had talked about Zen when we had some new diagram leaks yesterday, but now we get to see the first potential products based off of this architecture. In mobile it will span from 5 watts to 35 watts TDP. The performance and mainstream offerings will be the "Bristol Ridge" APU which will feature 4 Zen cores (or one Zen module) combined with the next gen GCN architecture. This will be a 14nm part, and the assumption is that it will be GLOBALFOUNDRIES using 14nm FinFET LPP (Low Power Plus) that will be more tuned for larger APUs. This will also be a full SOC.
The next APU, codenamed "Basilisk", will span the 5 watt to 15 watt range. It will be composed of 2 Zen cores (1/2 of a Zen module) and likely feature 2 to 4 MB of L3 cache, depending on power requirements. This looks to be the first Skybridge set of APUs that will share the same infrastructure as the ARM based Amur SOC. FT4 BGA is the basis for both the 2015 Amur and 2016 Basilisk SOCs.
Finally we have the first iteration of AMD's first ground up implementation of ARM's ARMv8-A ISA. The "Styx" APU features the new K12 CPU cores that AMD has designed from scratch. It too will feature the next generation GCN units as well as share the same FT4 BGA connection. Many are anxiously watching this space to see if AMD can build a better mousetrap when it comes to licensing the ARM ISA (as have Qualcomm, NVIDIA, and others).
2015 shows no difference in the performance desktop space, as it is still serviced by the now venerable Piledriver based FX parts on AM3+. The only change we expect to see here is that there will be a handful of new motherboard offerings from the usual suspects that will include the new USB 3.1 functionality derived from a 3rd party controller.
Mainstream and Performance will utilize the upcoming Godavari APUs. These are power and speed optimized APUs that are still based on the current Kaveri design. These look to be a simple refresh/rebadge with a slight performance tweak. Not exciting, but needs to happen for OEMs.
Low power will continue to be addressed by Beema based APUs. These are regular Puma based cores (not Puma+). AMD likely does not have the numbers to justify a new product in this rather small market.
2016 is when things get interesting again. We see the release of the FM3 socket (final proof that AM3+ is dead) that will house the latest Zen based APUs. At the top end we see "Summit Ridge" which will be composed of 8 Zen cores (or 2 Zen modules). This will have 4 MB of L2 cache and 16 MB of L3 cache if our other leaks are correct. These will be manufactured on 14nm FinFET LPE (the more appropriate process product for larger, more performance oriented parts). These will not be SOCs. We can expect these to be the basis of new Opterons as well, but there is obviously no confirmation of that on these particular slides. This will be the first new product in some years from AMD that has the chance to compete with higher end desktop SKUs from Intel.
From there we have the lower power Bristol Ridge and Basilisk APUs that we already covered in the mobile discussion. These look to be significant upgrades from the current Kaveri (and upcoming Godavari) APUs. New graphics cores, new CPU cores, and new SOC implementations where necessary.
AMD will really be shaking up the game in 2016. At the very least they will have proven that they can still change up their game and release higher end (and hopefully competitive) products. AMD has enough revenue and cash on hand to survive through 2016 and 2017 at the rate they are going now. We can only hope that this widescale change will allow AMD to make some significant inroads with OEMs on all levels. Otherwise Intel is free to do what they want and charge what they want across multiple markets.