Subject: Processors | September 2, 2016 - 01:39 AM | Tim Verry
Tagged: IBM, power9, power 3.0, 14nm, global foundries, hot chips
Earlier this month at the Hot Chips symposium, IBM revealed details on its upcoming Power9 processors and architecture. The new chips are aimed squarely at the data center and will be used for massive number crunching in big data and scientific applications in servers and supercomputer nodes.
Power9 is a big play from Big Blue, and will help the company expand its presence in the Intel-ruled data center market. Power9 processors are due out in 2018 and will be fabricated at Global Foundries on a 14nm HP FinFET process. The chips feature eight billion transistors and use an “execution slice microarchitecture” that lets IBM combine “slices” of fixed-point, floating-point, and SIMD hardware into cores that support various levels of threading. Specifically, two slices make an SMT4 core and four slices make an SMT8 core. IBM will have Power9 processors with 24 SMT4 cores or 12 SMT8 cores (more on that later). Further, Power9 is IBM’s first processor to support its Power ISA 3.0 instruction set.
According to IBM, its Power9 processors are between 50% and 125% faster than the previous generation Power8 CPUs, depending on the application tested. The performance improvement is thanks to a doubling of the number of cores as well as a number of other smaller improvements, including:
- A 5 cycle shorter pipeline versus Power8
- A single instruction random number generator (RNG)
- Hardware assisted garbage collection for interpreted languages (e.g. Java)
- New interrupt architecture
- 128-bit quad precision floating point and decimal math support
- Important for finance and security markets, massive databases and money math.
- IEEE 754
- CAPI 2.0 and NVLink support
- Hardware accelerators for encryption and compression
The Power9 processor features 120 MB of direct-attached eDRAM that acts as an L3 cache (256 GB/s). The chips offer up to 7TB/s of aggregate fabric bandwidth, which certainly sounds impressive, but that number adds everything together. That said, there is a lot going on under the hood. Power9 supports 48 lanes of PCI-E 4.0 (2 GB/s per lane per direction); 48 proprietary 25Gbps accelerator lanes, which will be used for NVLink 2.0 to connect to NVIDIA GPUs as well as to connect to FPGAs, ASICs, and other accelerators or new memory technologies using CAPI 2.0 (Coherent Accelerator Processor Interface); and four 16Gbps SMP links (NUMA) used to combine four quad-socket Power9 boards into a single 16-socket “cluster.”
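To put the off-chip I/O figures in perspective, here is a back-of-the-envelope sketch that simply multiplies out the per-lane rates quoted above. It assumes raw signalling rates with no encoding or protocol overhead, and it treats the 25Gbps figure as per lane per direction; both are our assumptions, since the exact accounting behind IBM's numbers isn't given here.

```python
# Per-lane figures as quoted for Power9; raw rates, no encoding/protocol overhead.
PCIE4_LANES = 48
PCIE4_GBS_PER_LANE_PER_DIR = 2.0  # GB/s per lane per direction (PCIe 4.0)
ACCEL_LANES = 48
ACCEL_GBITS_PER_LANE = 25         # Gb/s per lane (assumed per direction)


def pcie_bandwidth_gbs(lanes: int, per_lane_gbs: float, directions: int = 2) -> float:
    """Aggregate PCIe bandwidth in GB/s, summed over both directions."""
    return lanes * per_lane_gbs * directions


def serial_link_gbs(lanes: int, gbits_per_lane: float) -> float:
    """One-direction bandwidth in GB/s for lanes rated in Gb/s (8 bits per byte)."""
    return lanes * gbits_per_lane / 8


pcie_total = pcie_bandwidth_gbs(PCIE4_LANES, PCIE4_GBS_PER_LANE_PER_DIR)
accel_per_dir = serial_link_gbs(ACCEL_LANES, ACCEL_GBITS_PER_LANE)

print(f"PCIe 4.0: {pcie_total:.0f} GB/s total (both directions)")
print(f"25G accelerator lanes: {accel_per_dir:.0f} GB/s per direction")
```

Even counting the accelerator lanes in both directions, the off-chip links add up to well under 1TB/s, so most of the 7TB/s aggregate figure has to come from elsewhere in the chip's fabric.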
These are processors built to scale and tackle big data problems. In fact, not only is Google interested in Power9 to power its services, but the US Department of Energy will be building two supercomputers using IBM’s Power9 CPUs and NVIDIA’s Volta GPUs. Summit and Sierra will offer between 100 and 300 petaflops of compute power and will be installed at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, respectively. There, projects will include enabling researchers to visualize the internals of a virtual light water reactor, researching methods to improve fuel economy, and delving further into bioinformatics.
The Power9 processors will be available in four variants that differ in the number of cores and the number of threads each core supports. The chips are broken down into Power9 SO (Scale Out) and Power9 SU (Scale Up), and each group has two processors depending on whether you need a greater number of weaker cores or a smaller number of more powerful cores. Power9 SO chips are intended for multi-core systems and will be used in servers with one or two sockets, while Power9 SU chips are for multi-processor systems with up to four sockets per board and up to 16 total sockets per cluster when four four-socket boards are linked together. Power9 SO uses DDR4 memory and supports a theoretical maximum of 4TB of memory (1TB with today’s 64GB DIMMs) and 120 GB/s of bandwidth, while Power9 SU uses IBM’s buffered “Centaur” memory scheme that allows the systems to address a theoretical maximum of 8TB of memory (2TB with 64GB DIMMs) at 230 GB/s. In other words, the SU series is Big Blue’s “big guns.”
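The parenthetical capacities hint at the slot counts involved. The sketch below works backwards from them; the slot counts and the module size it derives are our own inferences from the article's numbers, not IBM specifications, and the article doesn't say whether these figures are per socket or per system. The helper names are hypothetical.

```python
GB_PER_TB = 1024


def implied_slots(capacity_tb: float, dimm_gb: int) -> int:
    """Slots needed to reach capacity_tb using dimm_gb modules (hypothetical helper)."""
    return int(capacity_tb * GB_PER_TB // dimm_gb)


def implied_dimm_size_gb(max_tb: float, slots: int) -> float:
    """Module size needed to reach max_tb with a given slot count (hypothetical helper)."""
    return max_tb * GB_PER_TB / slots


# (name, capacity with 64GB DIMMs in TB, theoretical maximum in TB), as quoted above.
for name, today_tb, max_tb in [("Power9 SO", 1, 4), ("Power9 SU", 2, 8)]:
    slots = implied_slots(today_tb, 64)
    size = implied_dimm_size_gb(max_tb, slots)
    print(f"{name}: {slots} slots; {size:.0f}GB modules needed for the {max_tb}TB maximum")
```

By this arithmetic, both theoretical maximums correspond to the same per-module size; only the slot count differs between SO and SU.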
A photo of the 24 core SMT4 Power9 SO die.
Here is where it gets a bit muddy. The processors are further broken down into SMT4 and SMT8 variants, and both Power9 SO and Power9 SU offer both options. There are Power9 CPUs with 24 SMT4 cores and there are CPUs with 12 SMT8 cores. IBM indicated that SMT4 (four threads per core) is suited to systems running Linux and virtualization with an emphasis on high core counts. Meanwhile, SMT8 (eight threads per core) is a better option for large logical partitions (one big system rather than partitioning the compute cluster into smaller VMs as above) and for running IBM’s hypervisor. In either case (24 SMT4 or 12 SMT8) there is the same total number of threads, but you are able to choose whether you want fewer “stronger” threads on each core or more (albeit weaker) threads per core, depending on what your workloads are optimized for.
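Because IBM builds cores out of execution slices (two per SMT4 core, four per SMT8 core, as described earlier), the two layouts can be compared directly. This minimal sketch uses a hypothetical `Power9Config` type of our own to show that both configurations come out to the same thread count and, by the same arithmetic, the same number of slices:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Power9Config:
    """Hypothetical description of a Power9 core layout (illustration only)."""
    cores: int
    threads_per_core: int
    slices_per_core: int

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def slices(self) -> int:
        return self.cores * self.slices_per_core


SMT4 = Power9Config(cores=24, threads_per_core=4, slices_per_core=2)
SMT8 = Power9Config(cores=12, threads_per_core=8, slices_per_core=4)

for label, cfg in [("24x SMT4", SMT4), ("12x SMT8", SMT8)]:
    print(f"{label}: {cfg.threads} threads from {cfg.slices} execution slices")
```

Both print 96 threads from 48 slices, which is why IBM can offer the two variants from the same underlying building blocks.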
Servers supporting Power9 are already under development by Google and Rackspace, and blueprints are even available from the OpenPOWER Foundation. Currently, it appears that Power9 SO will emerge as soon as the second half of next year (2H 2017), with Power9 SU following in 2018, which would line up with the expected date for the Summit and Sierra supercomputer launches.
This is not a chip that will be showing up in your desktop any time soon, but it is an interesting high performance processor! I will be keeping an eye on updates from Oak Ridge lab hehe.
Subject: Processors, Mobile | August 31, 2016 - 07:30 AM | Sebastian Peak
Tagged: SoC, Snapdragon 821, snapdragon, SD821, qualcomm, processor, mobile, adreno
Qualcomm has officially launched the Snapdragon 821 SoC, an upgraded successor to the existing Snapdragon 820 found in such phones as the Samsung Galaxy S7.
"With Snapdragon 820 already powering many of the premier flagship Android smartphones today, Snapdragon 821 is now poised to become the processor of choice for leading smartphones and devices for this year’s holiday season. Qualcomm Technologies’ engineers have improved Snapdragon 821 in three key areas to ensure Snapdragon 821 maintains the level of industry leadership introduced by its predecessor."
Specifications were previously revealed when the Snapdragon 821 was announced in July, with the peak CPU clock raised to 2.4 GHz from the previous 2.2 GHz (Qualcomm quotes up to 10% more performance). The Adreno 530 GPU clock rises to 650 MHz from 624 MHz, which Qualcomm describes as a 5% increase. In addition to improved performance from the CPU and GPU clock speed increases, the SD821 is said to offer lower power consumption (an estimated 5% savings compared to the SD820), along with new functionality including improved auto-focus capability.
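Clock-speed percentages like these are easy to check. This short sketch computes the relative increase from the quoted clocks; the results (about 9% for the CPU and about 4% for the GPU) sit just under Qualcomm's rounded "up to 10 percent" and "5 percent" marketing figures.

```python
def pct_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


cpu = pct_increase(2.2, 2.4)  # peak Kryo CPU clock, GHz (SD820 -> SD821)
gpu = pct_increase(624, 650)  # Adreno 530 GPU clock, MHz (SD820 -> SD821)

print(f"CPU clock: +{cpu:.1f}%  GPU clock: +{gpu:.1f}%")
```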
Enhanced overall user experience:
The Snapdragon 821 has been specifically tuned to support a more responsive user experience when compared with the 820, including:
- Shorter boot times: Snapdragon 821 powered devices can boot up to 10 percent faster.
- Faster application launch times: Snapdragon 821 can reduce app load times by up to 10 percent.
- Smoother, more responsive user interactions: UI optimizations and performance enhancements designed to allow users to enjoy smoother scrolling and more responsive browsing performance.
Improved performance and power consumption:
- CPU speeds increase: As we previously announced, the 821 features Qualcomm Kryo CPU speeds up to 2.4GHz, representing an up to 10 percent improvement in performance over Snapdragon 820.
- GPU speeds increase: The Qualcomm Adreno GPU received a 5 percent speed increase over Snapdragon 820.
- Power savings: The 821 is engineered to deliver an incremental 5 percent power savings when comparing standard use case models. This power savings can extend battery life and support OEMs interested in reducing battery size for slimmer phones.
New features and functionality:
- Snapdragon 821 introduces several new features and capabilities, offering OEMs new options to create more immersive and engaging user experiences, including support for:
- Snapdragon VR SDK (Software Development Kit): Offers developers a superior mobile VR toolset, provides compatibility with the Google Daydream platform, and access to Snapdragon 821’s powerful heterogeneous architecture. Snapdragon VR SDK supports a superior level of visual and audio quality and more immersive virtual reality and gaming experiences in a mobile environment.
- Dual PD (PDAF): Offers significantly faster image autofocus speeds under a wide variety of conditions when compared to single PDAF solutions.
- Extended Laser Auto-Focus Ranging: Extends the visible focusing range, improving laser focal accuracy over Snapdragon 820.
- Android Nougat OS: Snapdragon 821 (as well as the 820) will support the latest Android operating system when available, offering new features, expanded compatibility, and additional security compared to prior Android versions.
Qualcomm says the ASUS ZenFone 3 Deluxe is the first phone to use this new Snapdragon 821 SoC while other OEMs will be working on designs implementing the upgraded SoC.
Subject: Processors | August 22, 2016 - 05:37 PM | Jeremy Hellstrom
Tagged: amd, a10-7870K
Leaving aside the questionable naming, let's focus on the improved cooler on this ~$130 APU from AMD. Neoseeker fired up the fun-sized, 125W-rated cooler on top of the A10-7870K and were pleasantly surprised at the lack of noise even under load. Encouraged by the performance, they overclocked the chip by 500MHz to 4.4GHz and were rewarded with a stable and still very quiet system. The review focuses more on the improvements the new cooler offers than on the APU itself, which has not changed. Check out the review if you are considering a lower-cost system that only speaks when spoken to.
"In order to find out just how much better the 125W thermal solution will perform, I am going to test the A10-7870K APU mounted on a Gigabyte F2A88X-UP4 motherboard provided by AMD with a set of 16 GB (2 x 8) DDR3 RAM modules set at 2133 MHz speed. I will then run thermal and fan speed tests so a comparison of the results will provide a meaningful data set to compare the near-silent 125W cooler to an older model AMD cooling solution."
GlobalFoundries Will Allegedly Skip 10nm and Jump to Developing 7nm Process Technology In House (Updated)
Subject: Processors | August 20, 2016 - 03:06 PM | Tim Verry
Tagged: Semiconductor, lithography, GLOBALFOUNDRIES, global foundries, euv, 7nm, 10nm
UPDATE (August 22nd, 11:11pm ET): I reached out to GlobalFoundries over the weekend for a comment and the company had this to say:
"We would like to confirm that GF is transitioning directly from 14nm to 7nm. We consider 10nm as more a half node in scaling, due to its limited performance adder over 14nm for most applications. For most customers in most of the markets, 7nm appears to be a more favorable financial equation. It offers a much larger economic benefit, as well as performance and power advantages, that in most cases balances the design cost a customer would have to spend to move to the next node.
As you stated in your article, we will be leveraging our presence at SUNY Polytechnic in Albany, the talent and know-how gained from the acquisition of IBM Microelectronics, and the world-class R&D pipeline from the IBM Research Alliance—which last year produced the industry’s first 7nm test chip with working transistors."
An unexpected bit of news popped up today via TPU that alleges GlobalFoundries is not only developing 7nm technology (expected), but that the company will skip production of the 10nm node altogether in favor of jumping straight from the 14nm FinFET technology (which it licensed from Samsung) to 7nm manufacturing based on its own in-house design process.
Reportedly, the move to 7nm would offer 60% smaller chips at three times the design cost of 14nm, which is to say that this would be both an expensive and impressive endeavor. Aided by Extreme Ultraviolet (EUV) lithography, GlobalFoundries expects to be able to hit 7nm production sometime in 2020, with prototyping and limited use of EUV in the year or so leading up to it. The in-house process tech is likely thanks to the research being done at the APPC (Advanced Patterning and Productivity Center) in Albany, New York, along with the expertise of engineers and the design patents and technology (e.g. ASML NXE 3300 and 3300B EUV) that came over from IBM when GlobalFoundries acquired IBM Microelectronics. The APPC is reportedly working simultaneously on research and development of manufacturing methods (especially EUV, where extremely short-wavelength ultraviolet light, 13.5nm, is used to print patterns onto silicon) and on supporting production of chips at GlobalFoundries' "Malta" fab in New York.
Advanced Patterning and Productivity Center in Albany, NY where Global Foundries, SUNY Poly, IBM Engineers, and other partners are forging a path to 7nm and beyond semiconductor manufacturing. Photo by Lori Van Buren for Times Union.
Intel's Custom Foundry Group will start pumping out ARM chips in early 2017, followed by Intel's own 10nm Cannon Lake processors in 2018, and Samsung will be offering up its own 10nm node as soon as next year. Meanwhile, TSMC has reportedly already taped out 10nm chips, will begin production in late 2016/early 2017, and claims that it will hit 5nm by 2020. With its rivals all expecting production of 10nm chips as soon as Q1 2017, GlobalFoundries will be at a distinct disadvantage for a few years and will have only its 14nm FinFET (from Samsung) and possibly its own 14nm tech to offer until it gets 7nm production up and running (hopefully!).
Previously, GlobalFoundries has stated that:
“GLOBALFOUNDRIES is committed to an aggressive research roadmap that continually pushes the limits of semiconductor technology. With the recent acquisition of IBM Microelectronics, GLOBALFOUNDRIES has gained direct access to IBM’s continued investment in world-class semiconductor research and has significantly enhanced its ability to develop leading-edge technologies,” said Dr. Gary Patton, CTO and Senior Vice President of R&D at GLOBALFOUNDRIES. “Together with SUNY Poly, the new center will improve our capabilities and position us to advance our process geometries at 7nm and beyond.”
If this news turns out to be correct, this is an interesting move and it is certainly a gamble. However, I think that it is a gamble that GlobalFoundries needs to take to be competitive. I am curious how this will affect AMD, though. While I had expected AMD to stick with 14nm for a while, especially for Zen/CPUs, will this mean that AMD will have to go to TSMC for its future GPUs, or will contract limitations (if any? I think they have a minimum amount they need to order from GlobalFoundries) mean that GPUs will remain at 14nm until GlobalFoundries can offer its own 7nm? I would guess that Vega will still be 14nm, but Navi in 2018/2019? I guess we will just have to wait and see!
- To 7nm And Beyond (Interview @ Semiconductor Engineering)
- GloFo Looks For 7nm Leadership @ Electronics Weekly
- GlobalFoundries develops 7nm and 10nm technologies in-house @ KitGuru
- SUNY Poly and GLOBALFOUNDRIES Announce New $500M R&D Program in Albany To Accelerate Next Generation Chip Technology @ GlobalFoundries (PR)
- AMD GPU Roadmap: Capsaicin Names Upcoming Architectures @ PC Perspective
- Next Gen Graphics and Process Migration: 20 nm and Beyond @ PC Perspective
Subject: Graphics Cards, Processors | August 17, 2016 - 01:38 PM | Scott Michaud
Tagged: Xeon Phi, larrabee, Intel
Tom Forsyth, who is currently at Oculus, was once on the core Larrabee team at Intel. Just prior to Intel's IDF conference in San Francisco, which Ryan is at and covering as I type this, Tom wrote a blog post that outlined the project and its design goals, including why it didn't hit market as a graphics device. He even goes into the details of the graphics architecture, which was almost entirely in software apart from texture units and video out. For instance, Larrabee was running FreeBSD with a program, called DirectXGfx, that gave it the DirectX 11 feature set -- and it worked on hundreds of titles, too.
Also, if you found the discussion interesting, then there is plenty of content from back in the day to browse. A good example is an Intel Developer Zone post from Michael Abrash that discussed software rasterization, doing so with several really interesting stories.
Subject: General Tech, Processors, Displays, Shows and Expos | August 16, 2016 - 01:50 PM | Ryan Shrout
Tagged: VR, virtual reality, project alloy, Intel, augmented reality, AR
At the opening keynote of this summer’s Intel Developer Forum, CEO Brian Krzanich announced a new initiative to enable a completely untethered VR platform called Project Alloy. Using Intel processors and sensors, the goal of Project Alloy is to move all of the necessary compute into the headset itself, including enough battery to power the device for a typical session, removing the need for a high-powered PC and enabling a truly cordless experience.
This is indeed the obvious end-game for VR and AR, though Intel isn’t the first to demonstrate a working prototype. AMD showed the Sulon Q, an AMD FX-based system that was a wireless VR headset. It had real specs too, including a 2560x1440 OLED 90Hz display, 8GB of DDR3 memory, an AMD FX-8800P APU with R7 graphics embedded. Intel’s Project Alloy is currently using unknown hardware and won’t have a true prototype release until the second half of 2017.
There is one key advantage that Intel has implemented with Alloy: RealSense cameras. The idea is simple but the implications are powerful. Intel demonstrated using your hands and even other real-world items to interact with the virtual world. RealSense cameras use depth sensing to track hands and fingers very accurately, and with a device integrated into the headset and pointed out and down, Project Alloy prototypes will be able to “see” and track your hands, integrating them into the game and VR world in real time.
The demo that Intel put on during the keynote definitely showed the promise, but the implementation was clunky and less than what I expected from the company. Real hands just showed up in the game, rather than being represented by rendered hands that track accurately, and it definitely put a schism in the experience. Obviously it’s up to the application developer to determine how your hands would actually be represented, but it would have been better to showcase that capability in the live demo.
Better than just tracking your hands, Project Alloy was able to track a dollar bill (why not a Benjamin Intel??!?) and use it to interact with a spinning lathe in the VR world. It interacted very accurately and with minimal latency – the potential for this kind of AR integration is expansive.
Those same RealSense cameras and their data are used to map the space around you, preventing you from running into things or people or cats in the room. This enables the first “multi-room” tracking capability, giving VR/AR users a new range of flexibility and usability.
Though I did not get hands on with the Alloy prototype itself, the unit on-stage looked pretty heavy, pretty bulky. Comfort will obviously be important for any kind of head mounted display, and Intel has plenty of time to iterate on the design for the next year to get it right. Both AMD and NVIDIA have been talking up the importance of GPU compute to provide high quality VR experiences, so Intel has an uphill battle to prove that its solution, without the need for external power or additional processing, can truly provide the untethered experience we all desire.
Subject: Processors | July 28, 2016 - 02:47 PM | Tim Verry
Tagged: kaby lake, Intel, gt3e, coffee lake, 14nm
Intel will allegedly be releasing another 14nm processor following Kaby Lake (which is itself a 14nm successor to Skylake) in 2018. The new processors are code named "Coffee Lake" and will be released alongside low power runs of 10nm Cannon Lake chips.
Not much information is known about Coffee Lake outside of leaked slides and rumors, but the first processors slated to launch in 2018 will be mainstream mobile chips that will come in U and HQ mobile flavors, which are 15W to 28W and 35W to 45W TDP chips, respectively. Of course, these processors will be built on a very mature 14nm process with the usual small performance and efficiency gains beyond Skylake and Kaby Lake. The chips should have a better graphics unit, but perhaps more interesting is that the slides suggest that Coffee Lake will be the first architecture where Intel will bring "hexacore" (6 core) processors into mainstream consumer chips! The HQ-class Coffee Lake processors will reportedly come in two, four, and six core variants with Intel GT3e class GPUs. Meanwhile, the lower power U-class chips top out at dual cores with GT3e class graphics. This is interesting because Intel has previously held back six core CPUs for its more expensive and higher margin HEDT and Xeon platforms.
Of course, 2018 is also the year for Cannon Lake, which would have been the "tick" in Intel's old tick-tock schedule (which is no more), as the chips will move to a smaller process node and Intel would then improve on the 10nm process in future architectures. Cannon Lake is supposed to be built on the tiny 10nm node, and it appears that the first chips on this node will be ultra low power versions for laptops and tablets. Occupying the ULV platform's U-class (15W) and Y-class (4.5W), Cannon Lake CPUs will be dual cores with GT2 graphics. These chips should sip power while giving comparable performance to Kaby and Coffee Lake, perhaps even matching the performance of the Coffee Lake U processors!
Stay tuned to PC Perspective for more information!
Subject: Processors, Mobile | July 18, 2016 - 12:03 AM | Sebastian Peak
Tagged: softbank, SoC, smartphones, mobile cpu, Cortex-A73, ARM Holdings, arm, acquisition
ARM Holdings is to be acquired by SoftBank for $32 billion USD. This report has been confirmed by the Wall Street Journal, which states that an official announcement of the deal is likely on Monday as "both companies’ boards have agreed to the deal".
"Japan’s SoftBank Group Corp. has reached a more than $32 billion deal to buy U.K.-based chip-designer ARM HoldingsPLC, marking a significant push for the Japanese telecommunications giant into the mobile internet, according to a person familiar with the situation." - WSJ
ARM just announced their newest CPU core, the Cortex-A73, at the end of May, with performance and efficiency improvements over the current Cortex-A72 promised with the new architecture.
We will have to wait and see if this acquisition will have any bearing on future product development, though it seems the acquisition targets the significant intellectual property value of ARM, whose designs can be found in most smartphones.
Subject: Processors, Mobile | July 11, 2016 - 11:44 AM | Sebastian Peak
Tagged: SoC, Snapdragon 821, snapdragon, qualcomm, adreno 530
Announced today, the Snapdragon 821 offers a modest CPU frequency increase over the Snapdragon 820, with clock speeds of up to 2.4 GHz compared to 2.2 GHz with the Snapdragon 820. The new SoC still uses Qualcomm's custom quad-core "Kryo" design, which is made up of two dual-core CPU clusters.
"What isn’t in this announcement is that the power cluster will likely be above 2 GHz and GPU clocks look to be around 650 MHz but without knowing whether there are some changes other than clock relative to Adreno 530 we can’t really estimate the performance of this part."
Specifics on the Adreno GPU were not mentioned in the official announcement. The 650 MHz GPU clock reported by AnandTech would offer a modest improvement over the SD820's 624 MHz Adreno 530 GPU. Additionally, the "power cluster" will reportedly move from 1.6 GHz with the SD820 to 2.0 GHz with the SD821.
No telling when this updated SoC will find its way into consumer devices, with the Snapdragon 820 currently available in the Samsung Galaxy S7/S7 Edge, LG G5, OnePlus 3, and a few others.
Subject: Graphics Cards, Processors | June 29, 2016 - 07:27 AM | Sebastian Peak
Tagged: RX 490, radeon, processors, Polaris, graphics card, Bristol Ridge, APU, amd, A12-9800
AMD's current "We're in the Game" promotion offers a glimpse at upcoming product names, including the Radeon RX 490 graphics card, and the new Bristol Ridge APUs.
Visit AMD's gaming promo page and click the link to "check eligibility" to see a list of eligible products, which includes the new product names.
It seems safe to assume that the new products listed - including the Radeon RX 490 - are close to release, though details on the high-end Polaris GPU are not mentioned. We do have details on the upcoming Bristol Ridge products, with this in-depth preview from Josh published back in April. The A12-9800 and A12-9800E are said to be the flagship products in this new 7th-gen lineup, so there will be new desktop parts with improved graphics soon.
Subject: Processors | June 27, 2016 - 02:40 PM | Jeremy Hellstrom
Tagged: dx12, 6700k, Intel, i7-6950X
[H]ard|OCP has been conducting tests using a variety of CPUs to see how well DX12 distributes load between cores as compared to DX11. Their final article, which covers the 6700K and 6950X, was done a little differently and so cannot be directly compared to the previously tested CPUs. That does not lower the value of the testing; scaling is still very obvious, and the new tests were designed to highlight more common usage scenarios for gamers. Read on to see how well, or how poorly, Ashes of the Singularity scales when using DX12.
"This is our fourth and last installment of looking at the new DX12 API and how it works with a game such as Ashes of the Singularity. We have looked at how DX12 is better at distributing workloads across multiple CPU cores than DX11 in AotS when not GPU bound. This time we compare the latest Intel processors in GPU bound workloads."
Here are some more Processor articles from around the web:
- Intel Skylake Graphics: Windows 10 vs. Ubuntu 16.04 + Latest Open-Source Driver Code @ Phoronix
- AMD Wraith Cooler Performance on FX-6350 Black Edition CPU @ Neoseeker
- Athlon X4 880K @ Hardware Secrets
- AMD Athlon X4 845 CPU Review @ OCC
Subject: Processors | June 24, 2016 - 11:15 PM | Scott Michaud
Tagged: Intel, kaby lake, iGPU, h.265, hevc, vp8, vp9, codec, codecs
Fudzilla isn't really talking about their sources, so it's difficult to gauge how confident we should be, but they claim to have information about the video codecs supported by Kaby Lake's iGPU. This update is supposed to include hardware support for HDR video, the Rec.2020 color gamut, and HDCP 2.2, because, if videos are pirated prior to their release date, the solution is clearly to punish your paying customers with restrictive, compatibility-breaking technology. Time-traveling pirates are the worst.
According to their report, Kaby Lake-S will support VP8, VP9, HEVC 8b, and HEVC 10b, with both encode and decode. However, they then go on to say that 10-bit VP9 and HEVC 10b do not include hardware encoding. I'm not too knowledgeable about video codecs, but I don't know of any benefits to encoding 8-bit HEVC Main 10. Perhaps someone in our comments can clarify.
Subject: Processors | June 21, 2016 - 10:00 PM | Scott Michaud
Update (June 22nd @ 12:36 AM): Errrr. Right. Accidentally referred to the CPU in terms of TFLOPs. That's incorrect -- it's not a floating-point processor. Should be trillions of operations per second (teraops). Whoops! Also, it has a die area of 64 mm², compared to 520 mm² for something like GF110.
So this is an interesting news post. Graduate students at UC Davis have designed and produced a thousand-core CPU at IBM's facilities. The processor is manufactured on IBM's 32nm process, which is quite old -- about half-way between NVIDIA's Fermi (40nm) and Kepler (28nm) if viewed from a GPU perspective. Its die area is not listed, but we've reached out to their press contact for more information. The chip can be clocked up to 1.78 GHz, yielding 1.78 teraops of theoretical performance.
These numbers tell us quite a bit.
The first thing that stands out to me is that the processor is clocked at 1.78 GHz, has 1000 cores, and is rated at 1.78 teraops. This is interesting because modern GPUs (note that this is not a GPU -- more on that later) are rated at twice the clock rate times the number of cores. The factor of two comes from fused multiply-add (FMA), a*b + c, which can easily be implemented as a single instruction and is widely used in real-world calculations. Two mathematical operations in a single instruction yield a theoretical maximum of 2 × clock × core count. Since this processor's rating does not include the factor of two, it seems like its instruction set is massively reduced compared to commercial processors.
If they even cut out FMA, what else did they remove from the instruction set? This would at least partially explain why the CPU has such a high theoretical throughput per transistor compared to, say, NVIDIA's GF110, which has a slightly lower TFLOP rating with about five times the transistor count -- and that's ignoring all of the complexity-saving tricks that GPUs play, that this chip does not. Update (June 22nd @ 12:36 AM): Again, none of this makes sense, because it's not a floating-point processor.
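The peak-rate arithmetic above boils down to one formula: cores × clock × operations per core per cycle. This sketch applies it to both chips. The GF110 core count and clock are the GeForce GTX 580's published specifications, which is our assumption since the post doesn't name a specific card, and, per the update, the comparison mixes integer teraops with single-precision TFLOPs.

```python
def peak_tera_ops(cores: int, clock_ghz: float, ops_per_cycle: int = 1) -> float:
    """Theoretical peak in trillions of operations per second.

    peak = cores * clock (GHz) * operations per core per cycle, / 1000 for tera.
    """
    return cores * clock_ghz * ops_per_cycle / 1000.0


# UC Davis chip as described: 1000 cores at 1.78 GHz, one op per core per cycle.
ucdavis_chip = peak_tera_ops(1000, 1.78)

# Assumption: GF110 as configured in the GeForce GTX 580 (512 cores at a
# 1.544 GHz shader clock), with each FMA counted as two operations.
gf110 = peak_tera_ops(512, 1.544, ops_per_cycle=2)

# Peak throughput per billion transistors, using the transistor counts quoted in this post.
for name, peak, transistors_bn in [("UC Davis chip", ucdavis_chip, 0.621),
                                   ("GF110", gf110, 3.0)]:
    print(f"{name}: {peak:.2f} peak tera-ops, "
          f"{peak / transistors_bn:.2f} per billion transistors")
```

Dropping the FMA doubling (`ops_per_cycle=1`) is exactly what makes the 1000-core chip's rating equal to its core count times its clock.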
"Big Fermi" uses 3 billion transistors to achieve 1.5 TFLOPs when operating on 32 pieces of data simultaneously (see below). This processor does 1.78 teraops with 0.621 billion transistors.
On the other hand, this chip is different from GPUs in that it doesn't use their complexity-saving tricks. GPUs save die space by tying multiple threads together and forcing them to execute in lockstep. On NVIDIA hardware, 32 threads are bound into a “warp”. On AMD, 64 make up a “wavefront”. On Intel's Xeon Phi, AVX-512 packs sixteen 32-bit values into a vector and operates on them at once. GPUs use this architecture because, if you have a really big workload, chances are you have many closely related tasks; neighbouring pixels on a screen will be operating on the same material with slightly offset geometry, multiple vertices of the same object will be deformed by the same process, and so forth.
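The cost of lockstep execution shows up when lanes in a group disagree about which branch to take. The following toy model is our own simplification, not how any specific GPU actually schedules work: it assumes every branch path costs one equally long pass, during which lanes on other paths are masked off, and counts how many of the issued lane-slots do useful work.

```python
from typing import Sequence


def lockstep_passes(paths: Sequence[int], width: int) -> int:
    """Issue passes needed by a lockstep (SIMD) machine.

    `paths[i]` is the branch path taken by work item i. Items are grouped
    `width` at a time; within a group, each distinct path is issued once
    while lanes on other paths sit idle.
    """
    passes = 0
    for start in range(0, len(paths), width):
        group = paths[start:start + width]
        passes += len(set(group))
    return passes


def lane_utilization(paths: Sequence[int], width: int) -> float:
    """Useful lane-slots divided by lane-slots issued (a partial last group still costs full width)."""
    issued = lockstep_passes(paths, width) * width
    return len(paths) / issued


coherent = [0] * 64                     # every item takes the same branch
divergent = [i % 4 for i in range(64)]  # four different branches in every group

print(lane_utilization(coherent, 32))   # 1.0
print(lane_utilization(divergent, 32))  # 0.25
```

With 32-wide groups, the coherent case keeps every lane busy, while four-way divergence leaves three quarters of the issued lane-slots idle. Fully independent cores would not pay that masking cost, but they also give up the die-area savings that lockstep execution buys.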
This processor, on the other hand, has a thousand cores that are independent. Again, this is wasteful for tasks that map easily to single-instruction-multiple-data (SIMD) architectures, but the reverse (not wasteful in highly parallel tasks that SIMD is wasteful on) is also true. SIMD makes an assumption about your data and tries to optimize how it maps to the real-world -- it's either a valid assumption, or it's not. If it isn't? A chip like this would have multi-fold performance benefits, FLOP for
Subject: Processors | June 15, 2016 - 11:18 PM | Scott Michaud
Tagged: Zen, opteron, amd
We're beginning to see how the Zen architecture will affect AMD's entire product stack. This news refers to their Opteron line of CPUs, which are intended for servers and certain workstations. They tend to allow lots of memory, have lots of cores, and connect to a lot of I/O options and add-in boards at the same time.
In this case, Zen-based Opterons will be available in two, four, sixteen, and thirty-two core options, with two threads per core (yielding four, eight, thirty-two, and sixty-four threads, respectively). TDPs will range between 35W and 180W. Intel's Xeon E7 v4 goes up to 165W for 24 cores (on Broadwell-EX), so AMD has a little more headroom to play with for those extra eight cores. That is obviously a lot of cores, which should be good for cloud applications that can be parallelized.
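TDP per core is a quick way to look at the headroom comparison. This sketch simply divides the quoted TDP ceilings by core count; keep in mind that TDP is a thermal design target rather than measured power, and the Zen figures are rumored.

```python
def watts_per_core(tdp_w: float, cores: int) -> float:
    """TDP budget per core, in watts (a crude comparison, not measured power)."""
    return tdp_w / cores


zen_opteron = watts_per_core(180, 32)  # rumored top Zen-based Opteron
xeon_e7_v4 = watts_per_core(165, 24)   # top Broadwell-EX Xeon E7 v4

print(f"Zen Opteron (rumored): {zen_opteron:.2f} W/core")
print(f"Xeon E7 v4: {xeon_e7_v4:.2f} W/core")
```

By this crude measure, the rumored 32-core part would have roughly 5.6W per core versus roughly 6.9W for the 24-core Xeon.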
As for the I/O side of things, the rumored chip will have 128 PCIe 3.0 lanes. It's unclear whether that is per socket or total. The wording suggests it is per CPU, although much earlier rumors said that it has 64 PCIe lanes per socket with dual-socket boards available, which would also add up to 128 in total. It will also support sixteen 10-Gigabit Ethernet connections, which is great for servers, especially with virtualization.
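To put those I/O numbers in perspective, here is a rough bandwidth calculation. It uses the PCIe 3.0 signalling rate of 8 GT/s per lane with 128b/130b encoding and ignores packet and protocol overhead, so real-world throughput will be lower. The helper names are hypothetical, and the lane counts are the rumored figures from the article.

```python
# Rough, one-direction bandwidth estimates for the rumored I/O configuration.
# Ignores PCIe packet/protocol overhead and Ethernet framing overhead.

PCIE3_GT_PER_S = 8.0    # giga-transfers per second per lane (PCIe 3.0)
PCIE3_ENCODING = 128 / 130  # 128b/130b line code efficiency


def pcie3_gbytes_per_s(lanes):
    """Approximate payload bandwidth in GB/s, one direction."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING / 8  # 8 bits per byte


def system_lanes(lanes_per_socket, sockets):
    """Total lanes across a multi-socket board."""
    return lanes_per_socket * sockets


def ethernet_gbytes_per_s(ports, gbit_per_port):
    """Aggregate line rate in GB/s."""
    return ports * gbit_per_port / 8


if __name__ == "__main__":
    print("Per lane          :", round(pcie3_gbytes_per_s(1), 3), "GB/s")
    print("128 lanes         :", round(pcie3_gbytes_per_s(128), 1), "GB/s")
    print("64 lanes x 2 skts :", system_lanes(64, 2), "lanes")
    print("16 x 10GbE        :", ethernet_gbytes_per_s(16, 10), "GB/s")
```

Either reading of the rumor lands on 128 lanes, roughly 126 GB/s in each direction before overhead, while sixteen 10GbE ports add up to about 20 GB/s of line rate.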
These are expected to launch in 2017. Fudzilla claims that “very late 2016” is possible, but also that they will launch after the high-end desktop parts, which are expected to be delayed until 2017.
Subject: Graphics Cards, Processors | June 13, 2016 - 03:51 PM | Scott Michaud
Tagged: amd, Polaris, Zen, Summit Ridge, rx 480, rx 470, rx 460
AMD has just unveiled their entire RX line of graphics cards at E3 2016's PC Gaming Show. It was a fairly short segment, but it had a few interesting points in it. At the end, they also gave another teaser of Summit Ridge, which uses the Zen architecture.
First, Polaris. As we know, the RX 480 was going to bring >5 TFLOPs at a $199 price point. They elaborated that this price applies to the 4GB version, which implies that a version with more VRAM, likely 8GB, will also be available. Beyond the RX 480, AMD also announced the RX 470 and RX 460. Little is known about the 470, but they mentioned that the 460 will have a <75W TDP. This is notable because a PCIe slot can supply 75W of power, which implies that the card will not require any external power and thus could be a cheap and powerful (in terms of esports titles) addition to an existing desktop. It's an interesting way to use the power savings of the die shrink to 14nm!
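The power reasoning above boils down to a simple comparison, and the price claim to a simple ratio. The sketch below takes the 75W slot figure from the article; the helper names and the board-power values used in the checks are hypothetical, and real cards also leave some margin below the slot limit.

```python
# Minimal sketch of the reasoning above; helper names are hypothetical.

PCIE_SLOT_W = 75  # power a PCIe graphics slot can supply, per the article


def needs_aux_power(board_power_w, slot_w=PCIE_SLOT_W):
    """True if the card draws more than the slot alone can provide."""
    return board_power_w > slot_w


def gflops_per_dollar(tflops, price_usd):
    """Compute throughput per dollar of list price."""
    return tflops * 1000 / price_usd


if __name__ == "__main__":
    print(needs_aux_power(74))   # a hypothetical sub-75W card: slot power is enough
    print(needs_aux_power(120))  # a hypothetical 120W card: needs a connector
    print(round(gflops_per_dollar(5, 199), 1), "GFLOPS per dollar at 5 TFLOPs / $199")
```

Since the RX 480 is quoted at more than 5 TFLOPs, the 4GB card works out to at least roughly 25 GFLOPS per dollar.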
They also showed off a backpack VR rig, but didn't really elaborate on it.
As for Zen? AMD showed the new architecture running DOOM, and added the circle-with-Zen branding to a 3D model of a CPU. Zen will be coming first to the enthusiast category with (up to?) eight cores, two threads per core (16 threads total).
The AMD Radeon RX 480 will launch on June 29th for $199 USD (4GB). None of the other products have a specific release date.
Subject: Processors | June 8, 2016 - 08:17 AM | Scott Michaud
Tagged: Xeon Phi, Intel, gpgpu
Intel's recent restructure had a much broader impact than I originally believed. Beyond the large number of employees who will lose their jobs, we're even seeing it affect other areas of the industry. Typically, ASUS releases their ZenFone line with x86 processors, which I assumed was based on big subsidies from Intel to push their instruction set into new product categories. This year, ASUS chose the ARM-based Qualcomm Snapdragon, which suggested to me that Intel had decided to stop the bleeding.
That brings us to today's news. After over 27 years at Intel, James Reinders accepted the company's early retirement offer, scheduled for his 10001st day with the company, and is stepping down from his position as Intel's High Performance Computing Director. He worked on the Larrabee and Xeon Phi initiatives, and published several books on parallelism.
According to his letter, it sounds like his retirement offer was part of a company-wide package, and not targeting his division specifically. That would sort of make sense, because Intel is focusing on cloud and IoT. Xeon Phi is where Intel battles NVIDIA for high-performance servers, and I would expect that it has potential for cloud-based applications. Then again, as I say that, AWS only has a handful of GPU instances, and they are running fairly old hardware at that, so maybe the demand isn't there yet.
Subject: Processors | June 7, 2016 - 03:29 PM | Ryan Shrout
Tagged: Intel, video, PAX, pax prime, i7-6950X, taser
Intel is partnering with 12 of their top system builders to build amazing PCs around the Core i7-6950X 10-core Extreme Edition processor and the SSD 750 Series drives. Intel will be raffling off 7 of these systems at PAX Prime in September. You can find out more details on the competition and how you can enter at http://inte.ly/rigchallenge.
As for us, we got a taser.
Subject: Processors | June 7, 2016 - 02:45 PM | Jeremy Hellstrom
Tagged: Zen, kaby lake, Intel, delayed, amd
Bad news, upgraders: neither AMD nor Intel will be launching their new CPUs until the beginning of next year. AMD's Zen and Intel's Kaby Lake, previously expected in Q4 and Q3 of this year respectively, have both been delayed. DigiTimes did not delve into the reasons behind the delay of AMD's 14nm Zen, sourced from GLOBALFOUNDRIES (and Samsung), but unfortunately the reasons behind Intel's delay are all too clear. With large stockpiles of Skylake and Haswell processors and systems based around them sitting in the channel, AMD's delay creates an opportunity for Intel and retailers to move that stock. Once Kaby Lake arrives, those systems will no longer be attractive to consumers and their prices will plummet.
Here's hoping AMD's delay does not imply anything serious, though the lack of a new product release at a time that traditionally sees sales increase is certainly going to hurt their bottom line for 2016.
"With the delays, the PC supply chain will not be able to begin mass production for the next-generation products until November or December and PC demand is also unlikely to pick up until the first quarter of 2017."
Here are some more Processor articles from around the web:
- Intel Core I7 6950X Extreme Edition Broadwell-E Overclocking Review @ OCC
- Intel Core i7 Extreme Edition @ Nitroware
Subject: Processors | June 7, 2016 - 09:39 AM | Scott Michaud
Tagged: xeon e7 v4, xeon e7, xeon, Intel, broadwell-ex, Broadwell
Yesterday, Intel launched eleven SKUs of Xeon processors that are based on Broadwell-EX. While I don't follow this product segment too closely, it's a bit surprising that Intel launched them so close to consumer-level Broadwell-E. Maybe I shouldn't be surprised, though.
These processors scale from four to twenty-four cores, with HyperThreading, and are available with 20MB to 60MB of cache. In Intel's Xeon naming scheme, the first digit after “E7-” in the product name denotes the maximum number of CPUs that can be installed in a multi-socket system: the E7-8XXX line can run in an eight-socket motherboard, while the E7-4XXX models are limited to four sockets per system. TDPs range between 115W and 165W, which is pretty high, but to be expected for a giant chip that runs at a fairly high frequency.
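The naming rule above is mechanical enough to express as a tiny parser. This is a hypothetical helper, not an Intel tool, and it only encodes the rule as described here (first digit after “E7-” equals the maximum socket count); it does not decode cores, cache, or clocks.

```python
import re

# Hypothetical helper: pull the socket-scalability digit out of an E7 model string
# such as "E7-8890 v4", following the naming rule described above.
_E7_MODEL = re.compile(r"E7-(\d)\d{3}(?: v\d+)?")


def max_sockets(model: str) -> int:
    """Return the maximum socket count encoded in an E7 model number."""
    m = _E7_MODEL.fullmatch(model.strip())
    if m is None:
        raise ValueError(f"not an E7 model number: {model!r}")
    return int(m.group(1))


if __name__ == "__main__":
    for name in ("E7-8890 v4", "E7-4809 v4"):
        print(name, "->", max_sockets(name), "sockets")
```

Raising an error on anything that does not look like an E7 part keeps the helper from silently misreading other Xeon families, whose digits follow their own conventions.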
Intel Xeon E7 v4 launched on June 6th with listed prices between $1223 and $7174 per CPU.
Subject: Graphics Cards, Processors, Mobile | June 6, 2016 - 07:11 AM | Scott Michaud
Tagged: hsa 1.1, hsa
The HSA Foundation released version 1.1 of their specification, which focuses on “multi-vendor” compatibility. In this case, multi-vendor doesn't refer to companies that refused to join the HSA Foundation, namely Intel and NVIDIA, but rather to multiple types of vendors. Rather than aligning with AMD's focus on CPU-GPU interactions, HSA 1.1 includes digital signal processors (DSPs), field-programmable gate arrays (FPGAs), and other accelerators. I can see this being useful in several places, especially on mobile, where cameras, sound processors, CPU cores, and a GPU regularly share video buffers.
That said, the specification also mentions “more efficient interoperation with non-HSA compliant devices”. I'm not quite sure what that specifically refers to, but it is worth keeping an eye on for future details, such as whether it is relevant for Intel and NVIDIA hardware (and so forth).
Charlie, down at SemiAccurate, notes that HSA 1.1 will run on all HSA 1.0-compliant hardware. This makes sense, but I can't see where this is explicitly mentioned in their press release. I'm guessing that Charlie was given some time on a conference call (or face-to-face) regarding this, but it's also possible that he may be mistaken. It's also possible that it is explicitly mentioned in the HSA Foundation's press blast and I just fail at reading comprehension.
If so, I'm sure that our comments will highlight my error.