Subject: General Tech, Processors | January 11, 2012 - 06:21 PM | Scott Michaud
Tagged: windows, processor, microsoft, cpu, bulldozer, amd
Let us take a little break from the CES news with a load of bull -- a download of Bulldozer. If you have an eerie sense of having been in this situation before, your memory serves you well: it did in fact happen, and only about a month ago. Microsoft released an update in mid-December to optimize their operating systems for AMD's Bulldozer architecture; that patch then disappeared without any listed reason. As of today, we have access to both the patch and most of the reason for the delay in the first place.
You know: part of me wants to see a Bulldozer go 100MPH, and another part of me fears greatly.
The first order of business is to explain, to those who own an AMD FX-series, Opteron 4200 series, or Opteron 6200 series processor, how to unlock their potential performance: KB 2646060 and KB 2645594 each contain a patch, and with both applied, Windows is optimized for the Bulldozer architecture for most users.
It turns out that Microsoft pulled the Bulldozer update last month after discussions with AMD revealed that the patch alone would not provide the promised performance increases for most users. The problem centers on the Core Parking feature of Windows 7 and Windows Server 2008 R2: even with December's hotfix applied, Core Parking would still interfere with Bulldozer's design, putting seemingly idle cores to sleep to save power without understanding that Bulldozer cores are not typical cores. With Core Parking disabled for Bulldozer-based CPUs, either through this second hotfix or by switching your power plan from the usually default "Balanced" to "High Performance," Bulldozer is allowed to run as it was designed to run. Judging by how these bulletins are worded, anyone who was already on the "High Performance" plan back in December, before the hotfix was pulled, would have experienced what only becomes generally available today.
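For those who would rather flip the power plan than wait on hotfixes, Windows 7 exposes this setting on the command line. The commands below are a configuration sketch using Windows' built-in powercfg utility; the GUID is the stock alias Windows assigns to its High Performance scheme.

```shell
:: Show the currently active power scheme
powercfg /getactivescheme

:: Switch to the built-in High Performance scheme, which keeps
:: Core Parking from putting Bulldozer's cores to sleep
powercfg /setactive 8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c

:: Verify the change took effect
powercfg /getactivescheme
```

The same change can be made from the Power Options control panel; the command-line route is simply easier to script across a fleet of Opteron servers.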
These performance increases are not for every application, however. AMD has stated that applications which are more sparsely multithreaded would benefit most from the update.
Workloads that are moderately threaded have the potential for uplift. This could include virtualization, database, or transactional environments that are “bursty” – a mixture of light and heavy transactions, or legacy applications that are by nature not very threaded. The more heavily threaded the application, the less likely the uplift.
My intuition, given this and the Core Parking issue, is that once Windows finally wakes a Bulldozer core, performance with the December patch is as good as it gets; applications that only briefly become multithreaded either fail to wake the proper portions of the processor or do not wake them in time to be of maximum benefit.
It appears the hotfix was removed last month simply because AMD believed that, while the patch was effective, it would not be applied correctly for the vast majority of customers without a second hotfix, and would thus appear to deliver little to no real benefit.
Subject: Processors | December 16, 2011 - 12:41 PM | Scott Michaud
Tagged: amd, bulldozer, cpu, processor, windows, microsoft
Intel was far from demolished when AMD's Bulldozer came to town. Users still clung to the hope that Bulldozer underperformed because Microsoft's Windows 7 was not optimized to take advantage of Bulldozer's multi-core design. Vindication came sweetly with a knowledge base article and a patch from Microsoft confirming the issue and offering a solution. While they can still feel comfortable knowing they were right, the solution has been pulled from Microsoft's website without any announced reason. Who should we feel sorry for: those who didn't download it yet, or those who did?
To be entirely fair, Microsoft's knowledge base article was quite clear in its instruction to users regarding this hotfix.
A supported hotfix is available from Microsoft. However, this hotfix is intended to correct only the problem that is described in this article. Apply this hotfix only to systems that are experiencing the problem described in this article. This hotfix might receive additional testing. Therefore, if you are not severely affected by this problem, we recommend that you wait for the next software update that contains this hotfix.
Still, AMD users have another reason to be upset, as if they needed one. The hotfix will come, and in completely stable form; it just looks like today is not that day. If you already received this update and have experienced technical difficulties, the comment form awaits.
Subject: Processors | December 16, 2011 - 01:56 AM | Tim Verry
Tagged: amd, bulldozer, cpu, processor, windows, microsoft
When AMD’s Bulldozer processors arrived, they were unable to best Intel’s fastest at most tasks. A number of users held out hope for Bulldozer, however, as it was discovered that Microsoft’s Windows 7 operating system was not optimized to take advantage of its multi-threaded execution scheduling. While Microsoft has implemented this optimization in the Windows 8 kernel, the current stable release was without a fix until recently. The fix in question is available for Windows 7 and Windows Server 2008 R2 and can be downloaded here. Note that Service Pack 1 is a prerequisite for this hotfix.
Conservatively, previous indications suggested such a fix would add a 5% to 10% performance boost in multi-threaded applications. That number is based on estimates from around the web by people comparing benchmarks between Windows 7 and the Windows 8 Developer Preview. If you are running a Bulldozer processor in your machine, be sure to apply this update and let us know how performance improves.
Subject: Processors | July 16, 2011 - 12:54 AM | Tim Verry
Tagged: sandy bridge-e, processor, Intel, cpu
It seems as though Intel is running into a slew of snags as they attempt to push out their Sandy Bridge-E processors and the accompanying X79 chipset motherboards. While it was previously thought that the Sandy Bridge-E processors would not be available until at least January 2012, VR-Zone is reporting that the CPUs may actually be out in time for Christmas this year; however, they will ship with a reduced feature set, as will the X79 chipset that powers them. While Intel may reintroduce the removed features in later iterations of the silicon, the first-run components will have PCI Express 3.0 and four SATA/SAS 6Gbps ports removed. Further, Intel is waiting an extra CPU revision, shipping the C-1 stepping instead of the C-0, before it sends processors out to board partners for their testing.
In the case of PCI-E 3.0 support, Intel has had trouble testing their engineering silicon with PCI-E 3.0 cards and is not confident enough to integrate it into their production chips at this time. Due to the lack of widely available PCI-E 3.0 add-in cards, support for the standard is not that large of a loss in the short term, but it will certainly affect the components' future-proofing value. The removal of the SATA ports is due to issues with storage that have yet to be detailed.
While new technology is always welcome, one can't help but feel that delaying the new processors and motherboards until the silicon is ready (and contains the planned features) might be better for consumers. The board and investors likely do not agree, however. In any case, Sandy Bridge-E and X79 are coming; it is just a question of what form they arrive in.
NCSU Researchers Tweak Core Prefetching And Bandwidth Allocation to Boost Multi-Core Performance By 40%
Subject: Processors | May 27, 2011 - 11:26 AM | Tim Verry
Tagged: processor, multi-core, efficiency, bandwidth, algorithm
With the clock speed arms race now behind us, the world has turned to increases in the number of processor cores to boost performance. As more applications are becoming multi-threaded, CPU core increases have become even more important. In the consumer space, quad and hexa-core chips are rather popular in the enthusiast segment. On the server side, eight core chips provide extreme levels of performance.
In most multi-core processors, each CPU core has access to its own cache. (Intel's current-generation chips actually have three levels of cache, with the third level shared between all cores; that specific caching scheme, however, is beyond the scope of this article.) This cache is extremely fast and keeps the processing core(s) fed with data, which the processor then pushes through its assembly-line-esque instruction pipeline(s). The cache is populated through a method called "prefetching": data for running applications is pulled from RAM using algorithms that predict what the processor is likely to need next.

Unfortunately, while these predictive algorithms are usually correct, they sometimes make mistakes; the processor is then not fed from the cache and must look for its data elsewhere. These instances, called stalls, can severely degrade core performance, as the processor must reach out past the cache and into the system memory (RAM), or worse, the even slower hard drive. When the processor must reach beyond its on-die cache, it has to use the system bus to query the RAM. This processor-to-RAM bus, while faster than reading from a disk drive, is much slower than the cache. Further, processors have a limited amount of bandwidth between the CPU and the RAM, and as the number of cores increases, the share of that bandwidth each core can use is greatly reduced.
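To see why access patterns make or break a prefetcher, consider a toy model (not any real CPU's design) of a simple "next-line" prefetcher: on a miss it loads the needed cache line and speculatively fetches the one after it. A sequential scan is predicted almost perfectly, while scattered accesses defeat the prediction and turn nearly every access into a stall out to RAM.

```python
import random

LINE_SIZE = 64  # bytes per cache line, typical for x86

def hit_rate(accesses, line_size=LINE_SIZE):
    """Toy next-line prefetcher: on a miss, load the line and
    prefetch the one after it. Returns the fraction of accesses
    served from 'cache' (an unbounded set, for simplicity)."""
    cache = set()
    hits = 0
    for addr in accesses:
        line = addr // line_size
        if line in cache:
            hits += 1
        else:
            cache.add(line)      # demand fetch -- this is a stall
        cache.add(line + 1)      # speculative prefetch of the next line
    return hits / len(accesses)

# A sequential scan (reading an array 8 bytes at a time) is trivially
# predictable; scattered reads across a large address range are not.
sequential = list(range(0, 64 * 1000, 8))
random.seed(0)
scattered = [random.randrange(0, 10**8) for _ in range(1000)]

print(hit_rate(sequential))  # nearly 1.0 -- the prefetcher stays ahead
print(hit_rate(scattered))   # nearly 0.0 -- almost every access stalls
```

Real prefetchers track strides and multiple streams, but the asymmetry the model shows is the heart of the problem the NC State work addresses: mispredicted prefetches waste exactly the CPU-to-RAM bandwidth that the cores are already fighting over.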
The layout of a current Sandy Bridge Intel processor. Note the Cache and Memory I/O.
A team of researchers at North Carolina State University has been studying the issues described above, which are inherent in multi-core processors. The team is part of NC State's Department of Electrical and Computer Engineering and includes Fang Liu and Yan Solihin, who were funded in part by the National Science Foundation. In a paper concluding their research, to be presented June 9th, 2011 at the International Conference on Measurement and Modeling of Computer Systems, they detail two methods for improving upon current bandwidth allocation and cache prefetching implementations.
Dr. Yan Solihin, associate professor and co-author of the paper, stated that certain processor cores require more bandwidth than others; therefore, by dynamically monitoring the type and amount of data being requested by each core, available bandwidth can be prioritized on a per-core basis. Solihin further stated that “by better distributing the bandwidth to the appropriate cores, the criteria are able to maximize system performance.”
Further, the researchers analyzed data from the processor's hardware counters and constructed a set of criteria that improve efficiency by dynamically turning prefetching on and off on a per-core basis, which again frees bandwidth for the cores that need it. By implementing both methods, the research team was able to improve multi-core performance by as much as 40 percent versus chips that do not prefetch data, and by 10 percent versus multi-core processors whose cores do prefetch data.
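The paper's exact criteria are not public until June 9th, so the following is only a hypothetical sketch of the general idea as described: split a fixed memory bandwidth budget among cores in proportion to their measured demand, and switch prefetching off for any core whose prefetch accuracy is too low to justify the bandwidth it burns. The function names and the 50% accuracy threshold are invented for illustration.

```python
def allocate_bandwidth(demands, total_bw):
    """Split a fixed bandwidth budget across cores in proportion to
    each core's measured memory demand (e.g. from miss counters)."""
    total = sum(demands)
    if total == 0:
        # No demand anywhere: fall back to an even split.
        return [total_bw / len(demands)] * len(demands)
    return [total_bw * d / total for d in demands]

def prefetching_decisions(accuracies, threshold=0.5):
    """Keep prefetching enabled only for cores whose prefetches are
    accurate enough to be worth the bandwidth they consume."""
    return [acc >= threshold for acc in accuracies]

# Four cores: one bandwidth-hungry, and one with a useless prefetcher.
print(allocate_bandwidth([6, 2, 1, 1], total_bw=100))
# -> [60.0, 20.0, 10.0, 10.0]
print(prefetching_decisions([0.9, 0.2, 0.7, 0.5]))
# -> [True, False, True, True]
```

In hardware these decisions would be re-evaluated continuously as the counters change; the sketch only shows a single snapshot of the policy.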
The researchers plan to detail their findings in a paper titled “Studying the Impact of Hardware Prefetching and Bandwidth Partitioning In Chip-Multiprocessors,” which will be publicly available on June 9th. The exact algorithms and criteria that they have determined will decrease the number of processor stalls and increase bandwidth efficiency will be extremely interesting to analyze. Further, it will be interesting to see if any of these improvements will be implemented by Intel or AMD in their future chips.
Recently, AMD launched two new AMD Embedded G-Series APUs (Accelerated Processing Units). The two new chips have TDP ratings of 5.5 and 6.4 watts, which represents a 39% improvement in power consumption over the previous iterations. The 361mm² chip package can be used in embedded systems without the need for a fan to cool it. The embedded chips include one or two low-power x86 Bobcat processor cores and a discrete-class DirectX 11 GPU on a single die.
AMD currently has three systems utilizing the new APUs, including a Pico-ITX form factor computer, a Qseven form factor computer, and a digital sign system. Buddy Broeker, the Director of Embedded Solutions for AMD stated that "today we take the ground-breaking AMD Fusion APU well below 7W TDP and shatter the accepted traditional threshold for across-the-board fanless enablement."
The two new chips are named the T40R and the T40E. Both run at 1.00GHz; however, the 6.4-watt TDP T40E is a dual-core chip while the 5.5-watt TDP T40R is a single-core variant. Both chips include an AMD Radeon 6250 GPU, a 64KB L1 cache, and a 512KB L2 cache per CPU core. Further, the chips feature an integrated DDR3 memory controller that can support up to 667MHz solder-down SODIMMs or two DIMM slots. More details on the series as a whole can be found here.
Mobile and embedded processors continue to get smaller and faster. Have you seen any AMD powered embedded technology in your town?