Subject: Editorial, General Tech, Graphics Cards, Processors | May 8, 2013 - 09:32 PM | Scott Michaud
Tagged: Volcanic Islands, radeon, ps4, amd
So the Southern Islands might not be entirely stable throughout 2013 as we originally reported; seismic activity being analyzed suggests the eruption of a new GPU micro-architecture as early as Q4. These Volcanic Islands, as they have been codenamed, should explode onto the scene opposing NVIDIA's GeForce GTX 700-series products.
It is times like these where GPGPU-based seismic computation becomes useful.
The rumor is based upon a source which leaked a fragment of a slide outlining the processor in block diagram form and specifications of its alleged flagship chip, "Hawaii". Of primary note, Volcanic Islands is rumored to be organized with both Serial Processing Modules (SPMs) and a Parallel Compute Module (PCM).
So apparently a discrete GPU can have serial processing units embedded on it now.
Heterogeneous System Architecture (HSA) is a set of initiatives to bridge the gap between massively parallel workloads and branching logic tasks. We usually make reference to this in terms of APUs and bringing parallel-optimized hardware to the CPU. In this case, we are discussing it in terms of bringing serial processing to the discrete GPU. According to the diagram, the chip would contain 8 processor modules, each with two processing cores and an FPU, for a total of 16 cores. There is no definite indication of whether these cores would be based upon AMD's license to produce x86 processors or its other license to produce ARM processors. Unlike an APU, this design is heavily skewed towards parallel computation rather than a relatively even balance between CPU, GPU, and chipset features.
Now of course, why would they do that? Graphics processors can do branching logic but it tends to sharply cut performance. With an architecture such as this, a programmer might be able to more efficiently switch between parallel and branching logic tasks without doing an expensive switch across the motherboard and PCIe bus between devices. Josh Walrath suggested a server containing these as essentially add-in card computers. For gamers, this might help out with workloads such as AI which is awkwardly split between branching logic and massively parallel visibility and path-finding tasks. Josh seems skeptical about this until HSA becomes further adopted, however.
Still, there is a reason why they are implementing this now. I wonder, if the SPMs are based upon simple x86 cores, how the PS4 will influence PC gaming. Technically, a Volcanic Islands GPU would be an oversized PS4 within an add-in card. This could give AMD an edge, particularly in games ported to the PC from the PlayStation.
This chip, Hawaii, is rumored to have the following specifications:
- 4096 stream processors
- 16 serial processor cores on 8 modules
- 4 geometry engines
- 256 TMUs
- 64 ROPs
- 512-bit GDDR5 memory interface, much like the PS4.
- 20 nm Gate-Last silicon fab process
  - Unclear if TSMC or "Common Platform" (IBM/Samsung/GLOBALFOUNDRIES)
Softpedia is also reporting on this leak. Their addition claims that the GPU will be designed on a 20nm Gate-Last fabrication process. While gate-last is considered to be not worth the extra effort in production, Fully Depleted Silicon On Insulator (FD-SOI) is apparently "amazing" on gate-last at 28nm and smaller fabrication. This could mean that AMD is eyeing that technology and making this design with the intent of switching to an FD-SOI process later, without the large redesign that an initially easier gate-first production would require.
Well that is a lot to process... so I will leave you with an open question for our viewers: what do you think AMD has planned with this architecture, and what do you like and/or dislike about what your speculation would mean?
Subject: General Tech, Processors | May 6, 2013 - 02:34 PM | Jeremy Hellstrom
Tagged: silvermont, merrifield, Intel, Bay Trail, atom
The news today is all about shrinking the Atom, both in process size and power consumption. Indeed, The Tech Report heard talk of milliwatts and SoCs, which shows how Intel's strategy for Atom is shifting from small-footprint HTPCs to POS and other ultra-low-power applications. Hyper-Threading has been dropped and out-of-order execution has been brought in, which makes far more sense for the new niche Atom is destined for.
"Since their debut five years ago, Intel's Atom microprocessors have relied on the same basic CPU core. Next-gen Atoms will be based on the all-new Silvermont core, and we've taken a closer look at its underlying architecture."
Here is some more Tech News from around the web:
- AMD says HSA will cut latency bottleneck in GPU processing @ The Inquirer
- Redmond probes new IE 8 vulnerability @ The Register
- Not Like a Fine Wine: Windows Activation Still a Piece of Junk After All These Years @ Techgage
- Acer unveils new ultrabooks, notebooks and tablet @ DigiTimes
- Angering hippies and financing evil @ The Tech Report
- BlackBerry 10 passes US defence department tests @ The Register
- The TR Podcast 133: Iris graphics and the Radeon HD 7990
A much needed architecture shift
It has been almost exactly five years since the release of the first Atom branded processors from Intel, starting with the Atom 230 and 330 based on the Diamondville design. Built for netbooks and nettops at the time, the Atom chips were a reaction to a unique market that the company had not planned for. While the early Atoms were great sellers, they were universally criticized by the media for slow performance and sub-par user experiences.
Atom has seen numerous refreshes since 2008, but they were all modifications of the simplistic, in-order architecture that was launched initially. With today's official release of the Silvermont architecture, the Atom processors see their first complete redesign from the ground up. With the focus on tablets and phones rather than netbooks, can Intel finally find a foothold in the growing markets dominated by ARM partners?
I should note that even though we are seeing the architectural reveal today, Intel doesn't plan on having shipping parts until late in 2013 for embedded, server and tablets and not until 2014 for smartphones. Why the early reveal on the design then? I think that pressure from ARM's designs (Krait, Exynos) as well as the upcoming release of AMD's own Kabini is forcing Intel's hand a bit. Certainly they don't want to be perceived as having fallen behind and getting news about the potential benefits of their own x86 option out in the public will help.
Silvermont will be the first Atom processor built on the 22nm process, leaving the 32nm designs of Saltwell behind it. This also marks a change in the Atom design process: the adoption of the tick/tock model we have seen on Intel's consumer desktop and notebook parts. At the next node drop to 14nm, we'll see an annual cadence that first focuses on the node change, then an architecture change at the same node.
By keeping Atom on the same process technology as Core (Ivy Bridge, Haswell, etc), Intel can put more of a focus on the power capabilities of their manufacturing.
Subject: Processors | May 3, 2013 - 06:45 AM | Tim Verry
Tagged: z87, overclocking, Intel, haswell, core i7 4770k, 7ghz
OCaholic has spotted an interesting entry in the CPU-Z database. According to the site, an overclocker by the handle of “rtiueuiurei” has allegedly managed to push an engineering sample of Intel’s upcoming Haswell Core i7-4770K processor past 7GHz.
If the CPU-Z entry is accurate, the overclocker used a BCLK speed of 91.01 and a multiplier of 77 to achieve a CPU clockspeed of 7012.65MHz. The chip was overclocked on a Z87 motherboard along with a single 2GB G.Skill DDR3 RAM module. Even more surprising than the 7GHz clockspeed is the voltage that the overclocker used to get there: an astounding 2.56V according to CPU-Z.
From the information Intel provided at IDF Beijing, the new 22nm Haswell processors feature an integrated voltage regulator (IVR), and the CPU portion of the chip’s voltage is controlled by the Vccin value. Intel recommends a range of 1.8V to 2.3V for this value, with a maximum of 3V and a default of 1.8V. Therefore, the CPU-Z-reported number may actually be correct. On the other hand, it may also just be a bug in the software due to the unreleased nature of the Haswell chip.
Voltage questions aside, the frequency alone makes for an impressive overclock, and it seems that the upcoming chips will have decent overclocking potential!
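The figures quoted from the CPU-Z entry can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the voltage limits are simply the Vccin values attributed to Intel above. Multiplying the quoted BCLK by the quoted multiplier gives roughly 7008MHz, a few MHz short of the reported 7012.65MHz, which may just reflect rounding in the displayed BCLK.

```python
# Sanity check of the CPU-Z figures quoted in the post.
# All names here are hypothetical helpers, not part of any real tool.

VCCIN_RECOMMENDED = (1.8, 2.3)  # volts, recommended range quoted from Intel
VCCIN_MAX = 3.0                 # volts, quoted maximum


def core_clock_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Core clock is the base clock (BCLK) times the multiplier."""
    return bclk_mhz * multiplier


def classify_vccin(volts: float) -> str:
    """Place a Vccin reading relative to the quoted limits."""
    low, high = VCCIN_RECOMMENDED
    if volts > VCCIN_MAX:
        return "above maximum"
    if volts > high:
        return "above recommended, within maximum"
    if volts >= low:
        return "within recommended range"
    return "below default"


if __name__ == "__main__":
    print(f"{core_clock_mhz(91.01, 77):.2f} MHz")
    print(classify_vccin(2.56))
```

Read this way, 2.56V sits above the recommended range but under the quoted 3V ceiling, which is consistent with the reading being plausible rather than obviously bogus.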
The Intel HD Graphics are joined by Iris
Intel gets a bad rap on the graphics front. Much of it is warranted but a lot of it is really just poor marketing about the technologies and features they implement and improve on. When AMD or NVIDIA update a driver or fix a bug or bring a new gaming feature to the table, they make sure that every single PC hardware website knows about it and thus that as many PC gamers as possible know about it. The same cannot be said about Intel though - they are much more understated when it comes to tooting their own horn. Maybe that's because they are afraid of being called out on some aspects or that they have a little bit of performance envy compared to the discrete options on the market.
Today might be the start of something new from the company though - a bigger focus on the graphics technology in Intel processors. More than a month before the official public unveiling of the Haswell processors, Intel is opening up about SOME of the changes coming to the Haswell-based graphics products.
We first learned about the changes to Intel's Haswell graphics architecture way back in September of 2012 at the Intel Developer Forum. It was revealed then that the GT3 design would essentially double theoretical output over the currently existing GT2 design found in Ivy Bridge. GT2 will continue to exist (though slightly updated) on Haswell and only some versions of Haswell will actually see updates to the higher-performing GT3 options.
In 2009 Intel announced a drive to increase graphics performance generation to generation at an exceptional level. Not long after they released the Sandy Bridge CPU and the most significant performance increase in processor graphics ever. Ivy Bridge followed after with a nice increase in graphics capability but not nearly as dramatic as the SNB jump. Now, according to this graphic, the graphics capability of Haswell will be as much as 75x better than the chipset-based graphics from 2006. The real question is what variants of Haswell will have that performance level...
I should note right away that even though we are showing you general performance data on graphics, we still don't have all the details on what SKUs will have what features on the mobile and desktop lineups. Intel appears to be trying to give us as much information as possible without really giving us any information.
Subject: Cases and Cooling, Processors | May 1, 2013 - 03:07 PM | Ryan Shrout
Tagged: power supply, Intel, idle, haswell, c7, c6
I came across an interesting news story posted by The Tech Report this morning that dives into the possibility of problems with Intel's upcoming Haswell processors and currently available power supplies. Apparently, the new C6 and C7 idle power states that give the Haswell architecture its benefits in low power scenarios require the power supply to remain stable with a load of only 0.05 amps on the 12V2 rail. (That's just 50 milliamps!) Without that capability, the system can exhibit unstable behavior, and a quick look at the power supply selector on Intel's own website lists only a couple dozen units that support the feature.
This table from VR-Zone, the source of the information initially, shows the difference between the requirements for 3rd (Ivy Bridge) and 4th generation (Haswell) processors. The shift is an order of magnitude and is quite a dramatic change for PSU vendors. Users of Corsair power supplies will be glad to know that most of the units listed with support on the Intel website linked above are Corsair models!
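To put the 12V2 requirement in perspective, here is a rough sketch of what those currents mean in watts. The 0.05 amp Haswell figure comes from the story above; the 0.5 amp Ivy Bridge figure is only inferred from the "order of magnitude" remark, and the `psu_min_load_amps` parameter is a hypothetical spec-sheet value, not something taken from a real PSU.

```python
# Rough illustration of the 12V2 minimum-load requirement.
RAIL_VOLTAGE = 12.0  # nominal voltage of the 12V2 rail

# Minimum sustained load the CPU may present on 12V2, in amps.
# Haswell: quoted in the post. Ivy Bridge: inferred (ten times higher).
CPU_MIN_LOAD_AMPS = {"Ivy Bridge": 0.5, "Haswell": 0.05}


def min_load_watts(amps: float, volts: float = RAIL_VOLTAGE) -> float:
    """Power drawn at the minimum load current."""
    return amps * volts


def psu_supports(psu_min_load_amps: float, cpu_min_load_amps: float) -> bool:
    """A PSU copes if it stays in regulation at or below the CPU's lowest draw."""
    return psu_min_load_amps <= cpu_min_load_amps


if __name__ == "__main__":
    for name, amps in CPU_MIN_LOAD_AMPS.items():
        print(f"{name}: {min_load_watts(amps):.1f} W minimum on 12V2")
    # A hypothetical PSU that needs at least 0.1 A to regulate properly:
    print(psu_supports(0.1, CPU_MIN_LOAD_AMPS["Haswell"]))
```

Under these assumptions the rail has to stay in regulation while delivering well under a watt, which is the scenario Lee describes below as difficult to combine with high efficiency.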
A potential side effect of this problem might be that motherboard vendors simply disable those sleep states by default. I don't imagine that will be a problem for PC builders anyway since most desktop users aren't really worried about the extremely small differences in power consumption they offer. For mobile users and upcoming Haswell notebook designs the increase in battery life is crucial though and Intel has surely been monitoring those power supplies closely.
I asked our in-house power supply guru, Lee Garbutt, who is responsible for all of the awesome power supply reviews on pcper.com, what he thought about this issue. He thinks the reason more power supplies don't support it already is for power efficiency concerns:
Most all PSUs have traditionally required "some load" on the various outputs to attain good voltage regulation and/or not shut down. Not very many PSUs are designed yet to operate with no load, especially on the critical +12V output. One of the reasons for this is efficiency. It's harder to design a PSU to operate correctly with a very low load AND to deliver high efficiency. It would be easy just to add some bleed resistance across the DC outputs to always have a minimal load to keep voltage regulation under control but then that lowers efficiency.
Subject: Processors | April 30, 2013 - 02:04 PM | Josh Walrath
Tagged: amd, FX, vishera, bulldozer, FX-6350, FX-4350, FX-6300, FX-4300, 32 nm, SOI, Beloved
Today AMD has released two new processors that address the AM3+ market. The FX-6350 and FX-4350 are two new refreshes of the quad and hex core lineup of processors. Currently the FX-8350 is still the fastest of the breed, and there is no update for that particular number yet. This is not necessarily a bad thing, but there are those of us who are still awaiting the arrival of the rumored “Centurion”.
These parts are 125 watt TDP units, which are up from their 95 watt predecessors. The FX-6350 runs at 3.9 GHz with a 4.2 GHz boost clock. This is up 300 MHz stock and 100 MHz boost from the previous 95 watt FX-6300. The FX-4350 runs at 3.9 GHz with a 4.3 GHz boost clock. This is 100 MHz stock and 300 MHz boost above that of the FX-4300. What is of greater interest here is that the L3 cache goes from 4 MB on the 4300 to 8 MB on the 4350. This little fact looks to be the reason why the FX-4350 is now a 125 watt TDP part.
It has been some two years since AMD started shipping 32 nm PD-SOI/HKMG products to the market, and it certainly seems as though spinning off GLOBALFOUNDRIES has essentially stopped the push to implement new features into a process node throughout the years. As many may remember, AMD was somewhat famous for injecting new process technology into current nodes to improve performance, yields, and power characteristics in “baby steps” type fashion instead of leaving the node as is and making a huge jump with the next node. Vishera has been out for some 7 months now and we have not really seen any major improvement in regards to performance and power characteristics. I am sure that yields and bins have improved, but the bottom line is that this is only a minor refresh and AMD raised TDPs to 125 watts for these particular parts.
The FX-6350 is again a three module part containing six cores. Each module features 2 MB of L2 cache for a total of 6 MB L2 and the entire chip features 8 MB of L3 cache. The FX-4350 is a two module chip with four cores. The modules again feature the same 2 MB of L2 cache for a total of 4 MB active on the chip with the above mentioned 8 MB of L3 cache that is double what the FX-4300 featured.
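The module arithmetic above is easy to tabulate. This is a small sketch using only the module counts and cache sizes mentioned in the post; the class and field names are invented for the example.

```python
from dataclasses import dataclass

CORES_PER_MODULE = 2   # each module holds two integer cores
L2_MB_PER_MODULE = 2   # each module carries 2 MB of L2


@dataclass(frozen=True)
class FxPart:
    """Hypothetical record for one FX processor."""
    name: str
    modules: int
    l3_mb: int

    @property
    def cores(self) -> int:
        return self.modules * CORES_PER_MODULE

    @property
    def l2_mb(self) -> int:
        return self.modules * L2_MB_PER_MODULE


PARTS = [
    FxPart("FX-6350", modules=3, l3_mb=8),
    FxPart("FX-4350", modules=2, l3_mb=8),
    FxPart("FX-4300", modules=2, l3_mb=4),
]

if __name__ == "__main__":
    for part in PARTS:
        print(f"{part.name}: {part.cores} cores, "
              f"{part.l2_mb} MB L2, {part.l3_mb} MB L3")
```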
Perhaps soon we will see updates on FM2 with the Richland series of desktop processors, but for now this refresh is all AMD has. These are nice upgrades to the line. The FX-6350 does cost the same as the FX-6300, but the thinking behind that is that the 6300 is more “energy efficient”. We have seen in the past that AMD (and Intel for that matter) does put a premium on lower wattage parts in a lineup. The FX-4350 is $10 more expensive than the 4300. It looks as though the FX-6350 is in stock at multiple outlets but the 4350 has yet to show up.
These will fit in any modern AM3+ motherboard with the latest BIOS installed. While not an incredibly exciting release from AMD, it at least shows that they continue to address their primary markets. AMD is in a very interesting place, and it looks like Rory Read is busy getting the house in order. Now we just have to see if they can curb their cost structure enough to make the company more financially stable. Indications are good so far, but AMD has a long ways to go. But hey, at least according to AMD the FX series is beloved!
heterogeneous Uniform Memory Access
Several years back we first heard AMD’s plans for creating a uniform memory architecture which will allow the CPU to share address spaces with the GPU. The promise here is to create a very efficient architecture that will provide excellent performance in a mixed environment of serial and parallel programming loads. When GPU computing came on the scene it was full of great promise. The idea of a heavily parallel processing unit that will accelerate both integer and floating point workloads could be a potential gold mine in a wide variety of applications. Alas, the results we have seen so far have not lived up to that promise. There are many problems with combining serial and parallel workloads between CPUs and GPUs, and a lot of this has to do with very basic programming and the communication of data between two separate memory pools.
CPUs and GPUs do not share common memory pools. Instead of using pointers in programming to tell each individual unit where data is stored in memory, the current implementation of GPU computing requires the CPU to write the contents of that address to the standalone memory pool of the GPU. This is time consuming and wastes cycles. It also increases programming complexity to be able to adjust to such situations. Typically only very advanced programmers with a lot of expertise in this subject could program effective operations to take these limitations into consideration. The lack of unified memory between CPU and GPU has hindered the adoption of the technology for a lot of applications which could potentially use the massively parallel processing capabilities of a GPU.
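The difference between the two models can be sketched with a loose analogy in plain Python. This is not an HSA or GPU API; the function names are invented, and a `bytearray` merely stands in for a memory pool. The first function stages data into a separate buffer and copies the result back, as the current model requires; the second hands over a reference to the same memory, which is the idea behind a shared address space.

```python
# Analogy only: "device" work is simulated on the CPU, and the return
# value counts how many bytes had to be moved between pools.

def run_with_copy(host_data: bytearray) -> int:
    """Separate pools: copy in, compute, copy back."""
    device_pool = bytearray(host_data)        # host -> device copy
    for i in range(len(device_pool)):
        device_pool[i] = (device_pool[i] * 2) % 256
    host_data[:] = device_pool                # device -> host copy
    return 2 * len(host_data)


def run_shared(host_data: bytearray) -> int:
    """Shared address space: operate through a view, no copies."""
    with memoryview(host_data) as view:       # a reference, not a copy
        for i in range(len(view)):
            view[i] = (view[i] * 2) % 256
    return 0


if __name__ == "__main__":
    data = bytearray([1, 2, 3])
    print(run_with_copy(data), list(data))
    print(run_shared(data), list(data))
```

Both calls produce the same result in `data`; only the amount of data shuffled around differs, and that shuffling is the overhead described above.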
The idea for GPU compute has been around for a long time (comparatively). I still remember getting very excited about the idea of using a high end video card along with a card like the old GeForce 6600 GT to be a coprocessor which would handle heavy math operations and PhysX. That particular plan never quite came to fruition, but the idea was planted years before the actual introduction of modern DX9/10/11 hardware. It seems as if this step with hUMA could actually provide a great amount of impetus to implement a wide range of applications which can actively utilize the GPU portion of an APU.
Jaguar Hits the Embedded Space
It has long been known that AMD has simply not had a lot of luck going head to head against Intel in the processor market. Some years back they worked on differentiating themselves, and in so doing have been able to stay afloat through hard times. The acquisitions that AMD has made in the past decade are starting to make a difference in the company, especially now that the PC market that they have relied upon for revenue and growth opportunities is suddenly contracting. This of course puts a cramp in AMD’s style, but with better than expected results in their previous quarter, things are not nearly as dim as some would expect.
Q1 was still pretty harsh for AMD, but they maintained their marketshare in both processors and graphics chips. One area that looks to get a boost is that of embedded processors. AMD has offered embedded processors for some time, but with the way the market is heading they look to really ramp up their offerings to fit in a variety of applications and SKUs. The last generation of G-series processors was based upon the Bobcat/Brazos platform. This two chip design (APU and media hub) came in a variety of wattages with good performance from both the CPU and GPU portion. While the setup looked pretty good on paper, it was not widely implemented because of the added complexity of a two chip design plus thermal concerns vs. performance.
AMD looks to address these problems with one of their first true SoC designs. The latest G-series SoCs are based upon the brand new Jaguar core from AMD. Jaguar is the successor to the successful Bobcat core, which is a low power, dual core processor with integrated DX11/VLIW5 based graphics. Jaguar improves performance vs. Bobcat in CPU operations by 6% to 13% when clocked identically, but because it is manufactured on a smaller process node it is able to do so without using as much power. Jaguar can come in both dual core and quad core packages. The graphics portion is based on the latest GCN architecture.
Subject: Processors | April 17, 2013 - 09:48 PM | Tim Verry
Tagged: overclocking, intel ivr, intel hd graphics, Intel, haswell, cpu
During the Intel Developer Forum in Beijing, China the x86 chip giant revealed details about how overclocking will work on its upcoming Haswell processors. Enthusiasts will be pleased to know that the new chips do not appear to be any more restrictive than the existing Ivy Bridge processors as far as overclocking. Intel has even opened up the overclocking capabilities slightly by allowing additional BCLK tiers without putting aspects such as the PCI-E bus out of spec.
The new Haswell chips have an integrated voltage regulator, which allows programmable voltage to the CPU, memory, and GPU portions of the chip. As far as overclocking the CPU itself, Intel has opened up the Turbo Boost and is allowing enthusiasts to set an overclocked Turbo Boost clockspeed. Additionally, Intel is specifying available BCLK values of 100, 125, and 167MHz without putting other systems out of spec (they use different ratios to counterbalance the increased BCLK, which is important for keeping the PCI-E bus within ~100MHz). The chips will also feature unlocked core ratios all the way up to 80 in 100MHz increments. That would allow enthusiasts with a cherry-picked chip and outrageous cooling to clock the chip up to 8GHz without overclocking the BCLK value (though no chip is likely to reach that clockspeed, especially for everyday usage!).
Remember that the CPU clockspeed is determined by the BCLK value times a pre-set multiplier. Unlocked processors will allow enthusiasts to adjust the multiplier up or down as they please, while non-K edition chips will likely only permit lower multipliers with higher-than-default multipliers locked out. Further, Intel will allow the adventurous to overclock the BCLK value above the pre-defined 100, 125, and 167MHz options, but the chip maker expects most chips will max out at anywhere between five and seven percent higher than normal. PC Perspective’s Morry Teitelman speculates that slightly higher BCLK overclocks may be possible if you have a good chip and adequate cooling, however.
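As a rough illustration of how these numbers combine, the sketch below works through the BCLK tiers, the ratio cap of 80, and the five to seven percent BCLK headroom. The helper names are hypothetical, and the PCI-E divider is only inferred from the statement that different ratios keep the bus near 100MHz; Intel's actual strap values are not given here.

```python
# Illustrative arithmetic for Haswell CPU overclocking limits.
BCLK_TIERS_MHZ = (100, 125, 167)  # pre-defined BCLK options
MAX_CORE_RATIO = 80               # highest unlocked core multiplier
PCIE_TARGET_MHZ = 100.0           # bus clock the divider should restore


def core_clock_mhz(bclk_mhz: float, ratio: int) -> float:
    """CPU clock = BCLK x multiplier, with the multiplier capped at 80."""
    if not 1 <= ratio <= MAX_CORE_RATIO:
        raise ValueError(f"ratio must be between 1 and {MAX_CORE_RATIO}")
    return bclk_mhz * ratio


def bclk_with_headroom(tier_mhz: float, percent: float) -> float:
    """BCLK after pushing a tier up by the given percentage."""
    return tier_mhz * (1 + percent / 100)


def pcie_divider(bclk_mhz: float) -> float:
    """Inferred divider that brings a BCLK tier back to ~100 MHz."""
    return bclk_mhz / PCIE_TARGET_MHZ


if __name__ == "__main__":
    print(core_clock_mhz(100, MAX_CORE_RATIO))  # the 8GHz ceiling
    for tier in BCLK_TIERS_MHZ:
        low = bclk_with_headroom(tier, 5)
        high = bclk_with_headroom(tier, 7)
        print(f"{tier} MHz tier: {low:.1f} to {high:.1f} MHz, "
              f"divider {pcie_divider(tier):.2f}")
```

The same core clock can be reached several ways, for example a higher BCLK tier with a lower multiplier, which is where the extra tiers become useful for non-trivial tuning.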
Similar to current-generation Ivy Bridge (and Sandy Bridge before that) processors, Intel will pack Haswell processors with its own HD Graphics pGPU. The new HD Graphics will be unlocked and the graphics ratio will be able to scale up to a maximum of 60 in 50MHz steps for a potential maximum of 3GHz. The new processor graphics will also benefit from Intel’s IVR (programmable voltage) circuitry. The HD Graphics and CPU are fed voltage from the integrated voltage regulator (IVR), which is controlled by adjusting the Vccin value. The default is 1.8V, but it supports a recommended range of 1.8V to 2.3V with a maximum of 3V.
Finally, Intel is opening up the memory controller to further overclocking. Intel will allow enthusiasts to overclock the memory in either 200MHz or 266MHz increments, which allows for a maximum of either 2,000MHz or 2,666MHz respectively. The default voltage will depend on the particular RAM DIMMs you use, but can be controlled via the Vddq IVR setting.
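The graphics and memory ceilings follow the same pattern of step size times ratio. The short sketch below, with made-up helper names, reproduces the 3GHz graphics figure and shows that both memory maximums imply about ten steps. The 266MHz step is presumably a rounded 266.67MHz, which would explain why ten steps land on 2,666MHz rather than 2,660MHz.

```python
# Step-size arithmetic for the graphics and memory limits.
GPU_STEP_MHZ = 50    # graphics clock moves in 50 MHz steps
GPU_MAX_RATIO = 60   # highest graphics ratio


def gpu_clock_mhz(ratio: int) -> int:
    """Graphics clock = ratio x 50 MHz, capped at a ratio of 60."""
    if not 1 <= ratio <= GPU_MAX_RATIO:
        raise ValueError(f"ratio must be between 1 and {GPU_MAX_RATIO}")
    return ratio * GPU_STEP_MHZ


def implied_steps(step_mhz: float, max_mhz: float) -> int:
    """Number of steps implied by a step size and a stated maximum."""
    return round(max_mhz / step_mhz)


if __name__ == "__main__":
    print(gpu_clock_mhz(GPU_MAX_RATIO))
    print(implied_steps(200, 2000), implied_steps(266, 2666))
```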
It remains to be seen how Intel will lock down the various processor SKUs, especially the non-K edition chips, but at least now we have an idea of how a fully-unlocked Haswell processor will overclock. On a positive note, it is similar to what we have become used to with Ivy Bridge, so similar overclocking strategies for getting the most out of processors should still apply with a bit of tweaking. I’m interested to see how the integration of the voltage regulation hardware will affect overclocking though. Hopefully it will live up to the promises of increased efficiency!
Are you gearing up for a Haswell overhaul of your system, and do you plan to overclock?