30 nanoseconds is way too slow, down with the latency gap!

Subject: General Tech | February 23, 2017 - 03:45 PM |
Tagged: hbll, cache, l3 cache, Last Level Cache

There is an insidious latency gap lurking in your computer, between your DRAM and your CPU's L3 cache.  The size of that gap depends on your processor, as not all L3 caches are created equal, but regardless there are wasted CPU cycles that could be reclaimed.  Piecemakers Technology, Taiwan's Industrial Technology Research Institute, and Intel are on the case, with a project to design something to fit into that niche between the CPU and DRAM.  Their prototype Last Level Cache is a chip with 17ns latency, which would improve the efficiency with which the L3 cache can be filled before data is passed on to the next level in the CPU.  The Register likens it to the way Intel has slotted XPoint into the gap between SSDs and DRAM.  It will be interesting to see how this finds its way to market.

dram_l3_cache_gap.jpg

"Jim Handy of Objective Analysis writes about this: 'Furthermore, there's a much larger latency gap between the processor's internal Level 3 cache and the system DRAM than there is between any adjacent cache levels.'"

Here is some more Tech News from around the web:

Tech Talk

Source: The Register

An academic collaboration leads to a GPU/CPU collaboration

Subject: General Tech | February 8, 2012 - 05:13 PM |
Tagged: gpgpu, l3 cache, APU

Over at North Carolina State University, students Yi Yang and Ping Xiang, Dr. Huiyang Zhou, and Mike Mantor of Advanced Micro Devices have been working on a way to improve how efficiently the GPU and CPU work together.  The current generations of APUs/GPGPUs, Llano and Sandy Bridge, have united the two processing units on a single die, but as of yet they cannot efficiently pass operations back and forth.  This project leverages the L3 cache of the CPU as a high speed bridge between the two processors, allowing the CPU to hand highly parallel tasks to the GPU for more efficient processing while the CPU deals with the complex operations it was designed for.

Along with that bridge comes a change in how L2 prefetching is used: shifting prefetch traffic to the L2 frees up the L3 for passing data between the CPU and GPU, courtesy of a specially designed preexecution unit, triggered by the GPU and running on the CPU, which enables synchronized memory fetch instructions.  The results have been impressive: in their tests the researchers saw an average performance improvement of 21.4%.
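The preexecution unit in the paper is dedicated hardware, but the core idea, running ahead of the consumer and fetching the addresses it will need so they are already waiting in a shared cache, can be sketched in plain C with software prefetches. The function names and the LOOKAHEAD distance below are illustrative assumptions, not the researchers' actual API:

```c
#include <stddef.h>

enum { LOOKAHEAD = 16 };   /* how far ahead of the consumer we prefetch */

/* Illustrative "preexecution" helper: touch the gather address the
   consumer will dereference LOOKAHEAD iterations from now, so it lands
   in cache before it is needed.  In the NCSU design this role is played
   by a unit triggered by the GPU and running on the CPU. */
static void preexecute(const float *data, const int *idx, size_t i, size_t n) {
    size_t j = i + LOOKAHEAD;
    if (j < n)
        __builtin_prefetch(&data[idx[j]], 0 /* read */, 1 /* low temporal locality */);
}

/* Stand-in for the GPU kernel: a gather-sum over an index stream, with
   the preexecution pass interleaved ahead of the real loads. */
float gather_sum(const float *data, const int *idx, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        preexecute(data, idx, i, n);   /* warm the cache ahead of use */
        acc += data[idx[i]];           /* the actual dependent load */
    }
    return acc;
}
```

The point of the sketch is the division of labor: one agent issues the fetches early, the other consumes them from fast cache, which is roughly what letting the L3 broker CPU/GPU traffic buys.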

APU.jpg

"Researchers from North Carolina State University have developed a new technique that allows graphics processing units (GPUs) and central processing units (CPUs) on a single chip to collaborate – boosting processor performance by an average of more than 20 percent.

"Chip manufacturers are now creating processors that have a 'fused architecture,'" says Dr. Huiyang Zhou, an associate professor of electrical and computer engineering who co-authored a paper on the research, "meaning that they include CPUs and GPUs on a single chip. This approach decreases manufacturing costs and makes computers more energy efficient. However, the CPU cores and GPU cores still work almost exclusively on separate functions. They rarely collaborate to execute any given program, so they aren't as efficient as they could be. That's the issue we're trying to resolve."

Here is some more Tech News from around the web:

Tech Talk