IDF 2014: Through Silicon Via - Connecting memory dies without wires

Subject: Storage, Shows and Expos | September 10, 2014 - 03:34 PM |
Tagged: TSV, Through Silicon Via, memory, idf 2014, idf

If you're a general computer user, you might have never heard the term "Through Silicon Via". If you geek out on photos of chip dies and wafers, and how chips are assembled and packaged, you might have heard about it. Regardless of your current knowledge of TSV, it's about to be a thing that impacts all of you in the near future.

Let's go into a bit of background first. We're going to talk about how chips are packaged. Micron has an excellent video on the process here:

The part we are going to focus on appears at 1:31 in the above video:

die wiring.png

This is how chip dies are currently connected to the outside world. The dies are stacked (four high in the above pic) and a machine has to individually wire them to a substrate, which in turn communicates with the rest of the system. As you might imagine, things get more complex with this process as you stack more and more dies on top of each other:

chip stacking.png

16 layer die stack, pic courtesy NovaChips

...so we have these microchips with extremely small features, but to connect them we are limited to a relatively bulky process (called package-on-package). Stacking these flat planes of storage is a tricky thing to do, and you would naturally want to limit how many of those wires need to be connected. The catch is that those wires also equate to available throughput from the device (i.e. one wire per bit of a data bus). So, just how can we improve this method and increase data bus widths, throughput, etc?
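To put the wires-equal-bandwidth point in concrete terms, here is a back-of-the-envelope sketch (all numbers are hypothetical, chosen for illustration rather than taken from any specific part):

```cpp
#include <cstdio>

int main() {
    // Hypothetical figures for illustration only -- not the specs of any real package.
    int       bus_width_bits    = 8;    // one bond wire per data-bus bit
    long long transfer_rate_mts = 400;  // mega-transfers per second, per wire

    // Peak throughput = bus width (bits) x transfer rate, converted to MB/s.
    long long throughput_mbs = bus_width_bits * transfer_rate_mts / 8;
    printf("%2d wires @ %lld MT/s -> ~%lld MB/s peak\n",
           bus_width_bits, transfer_rate_mts, throughput_mbs);

    // Doubling the bus width doubles peak throughput, but with wire bonding it
    // also doubles the number of physical wires the machine has to place.
    bus_width_bits *= 2;
    throughput_mbs = bus_width_bits * transfer_rate_mts / 8;
    printf("%2d wires @ %lld MT/s -> ~%lld MB/s peak\n",
           bus_width_bits, transfer_rate_mts, throughput_mbs);
    return 0;
}
```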

Before I answer that, let me lead up to it by showing how flash memory has just taken a leap in performance. Samsung has recently made the jump to VNAND:

vnand crop--.png

By stacking flash memory cells vertically within a die, Samsung was able to make many advances in flash memory, simply because they had more room within each die. Because of the complexity of the process, they also had to revert to an older (larger) feature size. That compromise means the capacity of each die is similar to current 2D NAND tech, but the bonus is that the new process brings speed, longevity, and power-consumption advantages.

I showed you the VNAND example because it bears a striking resemblance to what is now happening in the area of die stacking and packaging. Imagine if you could stack dies by punching holes straight through them and making the connections directly through the bottom of each die. As it turns out, that's actually a thing:

tsv cross section.png

Read on for more info about TSV!

Manufacturer: Intel

Core M 5Y70 Early Testing

During a press session today with Intel, I was able to get some early performance results on Broadwell-Y in the form of the upcoming Core M 5Y70 processor.

llama1.jpg

Testing was done on a reference design platform code-named Llama Mountain, and at the heart of the system is the Broadwell-Y based dual-core CPU, the Core M 5Y70, which is due out later this year. Power consumption of this system is low enough that Intel has built it with a fanless design. As we posted last week, this processor has a base frequency of just 1.10 GHz but it can boost as high as 2.6 GHz for extra performance when it's needed.

Before we dive into the actual results, you should keep a couple of things in mind. First, we weren't able to analyze the systems to check driver revisions, etc., so we are going on Intel's word that these are set up as you would expect to see them in the real world. Next, because of the disjointed nature of the tests we were able to run, the comparisons in our graphs aren't as complete as I would like. Still, the results for the Core M 5Y70 are here should you want to compare them to any other scores you like.

First, let's take a look at old faithful: CineBench 11.5.

cb11.png

UPDATE: A previous version of this graph showed the TDP for the Intel Core M 5Y70 as 15 watts, not the 4.5 watts listed here now. The reasons are complicated. Even though the Intel Ark website lists the TDP of the Core M 5Y70 as 4.5 watts, Intel has publicly stated the processor will make very short "spikes" at 15 watts when in its highest Turbo Boost modes. It comes down to a discussion of semantics, really. The cooling capability of the tablet is only targeted at 4.5-6.0 watts, and those very short 15 watt spikes can be dissipated without the need for extra heatsink surface...because they are so short. SDP anyone? END UPDATE

With a score of 2.77, the Core M 5Y70 processor puts up an impressive fight against CPUs with much higher TDP settings. For example, Intel's own Pentium G3258 gets a score of 2.71 in CB11, and did so with a considerably higher thermal envelope. The Core i3-4330 scores 38% higher than the Core M 5Y70 but it requires a TDP 3.6-times larger to do so. Both of AMD's APUs in the 45 watt envelope fail to keep up with Core M.
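To put those numbers in rough perspective, here is a quick points-per-watt comparison using the scores above. Treat it as an estimate: the Core M entry uses the 4.5 watt figure from the update (the 3.6x comparison in the text is relative to the 15 watt spike figure), the Pentium G3258 and Core i3-4330 use their published desktop TDPs of 53 and 54 watts, and the i3-4330 score is back-calculated from the "38% higher" statement.

```cpp
#include <cstdio>

struct CpuResult {
    const char* name;
    double      cb115_score;  // CineBench 11.5 multi-threaded score
    double      tdp_watts;    // rated TDP used for the comparison
};

int main() {
    // Scores from the article; the i3-4330 entry is estimated as 38% above the Core M.
    const CpuResult cpus[] = {
        {"Core M 5Y70",   2.77,        4.5},
        {"Pentium G3258", 2.71,        53.0},
        {"Core i3-4330",  2.77 * 1.38, 54.0},
    };

    for (const CpuResult& c : cpus) {
        printf("%-14s score %.2f @ %4.1f W -> %.3f points per watt\n",
               c.name, c.cb115_score, c.tdp_watts, c.cb115_score / c.tdp_watts);
    }
    return 0;
}
```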

Continue reading our preview of Intel Core M 5Y70 Performance!!

IDF 2014: Skylake Silicon Up and Running for 2H 2015 Release

Subject: Shows and Expos | September 9, 2014 - 05:27 PM |
Tagged: Skylake, Intel, idf 2014, idf, 14nm

2015 is shaping up to be an interesting year for Intel's consumer processor product lines. We still expect to see Broadwell make some kind of debut in socketed form in addition to the mobile releases trickling out beginning this holiday, but it looks like we will also get our first taste of Skylake late next year.

skylake1.jpg

Skylake is Intel's next microarchitecture and will be built on the same 14nm process technology currently shipping with Broadwell-Y. Intel stated that it expects to see dramatic improvements in all areas of measurement including performance, power consumption and silicon efficiency.

On stage, the company demoed Skylake running the 3DMark Fire Strike benchmark, though without providing any kind of performance result (obviously). The graphics demo was running on an engineering development board and platform, and though it looked incredibly good from where we were sitting, we can't make any guesses about performance quite yet.

skylake3.jpg

Intel then surprised us by bringing a notebook out from behind the monitor, showing Skylake up and running in a mobile form factor, decoding and playing back 4K video. Once again, the demo was smooth and impressive, though you would expect nothing less from an overly rehearsed keynote.

skylake2.jpg

Intel concluded that it was "excited about the health of Skylake" and that they should be in mass production in the first quarter of 2015 with samples going out to customers. Looking even further down the rabbit hole the company believes they have a "great line of sight to 10nm and beyond." 

Even though details were sparse, it is a good sign that Intel is willing to show Skylake this early, and yet I can't help but worry about a potentially shorter-than-expected life span for Broadwell in the desktop space. Mobile users will find the increased emphasis on power efficiency a big win for thin and light notebooks, but enthusiasts are still on the lookout for a new product to really drive performance up in the mainstream.

IDF 2014: Western Digital announces new Ae HDD series for archival / cold storage

Subject: Storage, Shows and Expos | September 9, 2014 - 04:51 PM |
Tagged: WDC, Western Digital, WD, idf 2014, idf, hdd, Cold, Archival, Ae

We talked about helium-filled, shingled HDDs from HGST earlier today. Helium may give you reduced power demands, but at the added expense of hermetically sealed enclosures compared with conventional HDDs. Shingling may give added capacity, but at the expense of being forced into specific writing methods. Now we know Western Digital's angle on archival / cold storage:

WD_AE_PRN.jpg

...so instead of going with newer, higher-cost technologies, WD is taking their consumer products and making them more robust. They are also getting rid of the conventional thinking on capacity increments and are moving to 100GB increments. The idea is that once a large company or distributor has qualified a specific HDD model on their hardware, that model will stick around for a while, but be continued at increased capacities as platter density yields improve over time. WD has also told me that capacities may even be mixed and matched within a 20-box of drives, so long as the average capacity matches the box label (there is a quick sketch of that averaging after the list below). This works in the field of archival / cold storage for a few reasons:

  • Archival storage systems generally do not use conventional RAID (where an entire array of matching capacity disks are spinning simultaneously). Drives are spun up and written to individually, or spun up individually to service the occasional read request. This saves power overall, and it also means the individual drives can vary in capacity with no ill effects.
  • Allowing for variable capacity binning helps WD ship more usable platters/drives overall (i.e. not rejecting drives that can't meet 6TB). This should drive overall costs down.
  • Increasing capacity by only a few hundred GB per drive turns into *huge* differences in cost when you scale that difference up to the number of drives you would need to handle a very large total capacity (i.e. Exabytes).
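Here is a minimal sketch of how that mixed-capacity 20-box might be validated. The capacities and the acceptance rule below are hypothetical, invented purely to illustrate the averaging idea; WD has not published its exact binning rules.

```cpp
#include <cstdio>
#include <numeric>
#include <vector>

// Hypothetical rule: a 20-box may mix drive capacities as long as the
// average capacity meets or exceeds the label printed on the box.
bool box_meets_label(const std::vector<double>& drive_tb, double label_tb) {
    double total = std::accumulate(drive_tb.begin(), drive_tb.end(), 0.0);
    return (total / drive_tb.size()) >= label_tb;
}

int main() {
    // Example: a box labeled 6.0 TB, with drives binned in 100GB (0.1 TB) steps.
    std::vector<double> box(20, 6.0);
    box[0] = 5.9;  // a couple of drives binned slightly under the label...
    box[1] = 5.9;
    box[2] = 6.1;  // ...offset by drives that binned slightly over it.
    box[3] = 6.1;

    double avg = std::accumulate(box.begin(), box.end(), 0.0) / box.size();
    printf("Average capacity: %.3f TB -> box %s the 6.0 TB label\n",
           avg, box_meets_label(box, 6.0) ? "meets" : "misses");
    return 0;
}
```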

So the idea here is that WD is choosing to stick with what they do best, and to potentially do it even more cheaply than with their consumer products. That said, this is really meant for enterprise use and not as a way for a home power user to save a few bucks on a half-dozen drives for their home NAS. You really need an infrastructure in place that can handle variable capacity drives seamlessly. While these drives do not employ SMR to get greater capacity, that may work out as a bonus, as writes can be performed in a way that all current systems are compatible with (even though I suspect they will be tuned more for sequential write workloads).

Here's an illustration of this difference:

capacity 1.png

The 'old' method meant that drives on the left half of the above bell curve would have to be sold as 5TB units.

capacity 2.png

With the 'new' method, drives can be sold based on a spec closer to their actual capacity yield. For a given model, shipping capacities would increase as time goes on (top to bottom of the above graphic).

To further clarify what is meant by the term 'cold storage' - the data itself is cold, as in rarely if ever accessed:

tiers.png

Examples of this would be Facebook posts / images from months or years ago. That data may be rarely touched, but it needs to be accessible enough to be browsed to via the internet. The few-second spin-up of an archival HDD can handle this sort of thing, while a tape system would take far too long and would likely cause the data request to time out.

WD's Ae press blast after the break.

IDF 2014: HGST announces 3.2TB NVMe SSDs, shingled 10TB HDDs

Subject: Storage, Shows and Expos | September 9, 2014 - 02:00 PM |
Tagged: ssd, SMR, pcie, NVMe, idf 2014, idf, hgst, hdd, 10TB

It's the first day of IDF, so it's only natural that we see a bunch of non-IDF news start pouring out :). I'll kick them off with a few announcements from HGST. First item up is their new SN100 line of PCIe SSDs:

Ultrastar_SN100_Family_CMYK_Master.jpg

These are NVMe-capable PCIe SSDs, available in capacities from 800GB to 3.2TB and in both 2.5" (PCIe-based, not SATA) and half-height PCIe add-in card form factors.

Next up is an expansion of their HelioSeal (Helium filled) drive line:

10TB_Market_applications_HR.jpg

Through the use of Shingled Magnetic Recording (SMR), HGST can make an even bigger improvement in storage density. This does not come completely free: because of the way SMR writes to the disk, it is primarily meant to be a sequential-write / random-read storage device. Picture roofing shingles, but for hard drives. The tracks are slightly overlapped as they are written to disk. This increases density greatly, but writing to the middle of a shingled section is not possible without potentially overwriting the neighboring shingled tracks at the same time. Think of it as CD-RW writing, but for hard disks. This tech is primarily geared towards 'cold storage', or data that is not actively being written. Think archival data. The ability to still read that data randomly and on demand makes these drives more appealing than retrieving the same data from tape-based archival methods.
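To make the shingling constraint concrete, here is a toy model of a single shingled section (often called a band). The band size is invented for illustration and real SMR firmware is far more sophisticated, but the core trade-off holds: updating a track in the middle forces every later track in the section to be rewritten, while reads remain unrestricted.

```cpp
#include <cstdio>

// Toy model of one shingled band: tracks overlap like roofing shingles, so
// writing track i clobbers the start of track i+1. To update a track in the
// middle, the drive must rewrite that track and every track after it in the
// band. (The band size here is invented for illustration.)
constexpr int kTracksPerBand = 8;

int tracks_rewritten_for_update(int track_in_band) {
    // Appending at the end of the band touches only one track;
    // updating track 0 forces the entire band to be rewritten.
    return kTracksPerBand - track_in_band;
}

int main() {
    for (int t = 0; t < kTracksPerBand; ++t) {
        printf("update track %d -> rewrite %d track(s)\n",
               t, tracks_rewritten_for_update(t));
    }
    // Reads carry no such penalty: any track can still be read directly,
    // which is what makes SMR attractive for cold / archival data.
    return 0;
}
```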

Details on the above releases are scarce at present, but we will keep you posted as they develop.

Full press blast for the SN100 after the break.

Source: HGST

Intel Developer Forum (IDF) 2014 Keynote Live Blog

Subject: Processors, Shows and Expos | September 9, 2014 - 11:02 AM |
Tagged: idf, idf 2014, Intel, keynote, live blog

Today is the beginning of the 2014 Intel Developer Forum in San Francisco!  Join me at 9am PT for the first of our live blogs of the main Intel keynote where we will learn what direction Intel is taking on many fronts!

intelicon.jpg

NVIDIA Announces GAME24: Global PC Gaming Event

Subject: General Tech, Shows and Expos | September 2, 2014 - 05:51 PM |
Tagged: nvidia, game24, pc gaming

At 6PM PDT on September 18th, 2014, NVIDIA and partners will be hosting GAME24. The event will start at that time, all around the world, and finish 24 hours later. The three main event locations are Los Angeles, California, USA; London, England; and Shanghai, China. Four smaller events will be held in Chicago, Illinois, USA; Indianapolis, Indiana, USA; Mission Viejo, California, USA; and Stockholm, Sweden. It will also be live streamed on the official website.

nvidia-game24-2014.png

Registration and attendance are free. If you will be in the area and want to join, sign up. Registration closes an hour before the event, but it is first-come, first-served. Good luck. Have fun. Good game.

Source: NVIDIA

Tune in this Saturday! Celebrate 30 Years of Graphics and Gaming

Subject: General Tech, Shows and Expos | August 22, 2014 - 04:53 PM |
Tagged: richard huddy, kick ass, amd

amd-radeon-graphics-30-years.png

Join AMD’s Chief Gaming Scientist, Richard Huddy on Saturday, Aug. 23, 2014 at 10:00 AM EDT/7:00 AM PDT to celebrate 30 Years of Graphics and Gaming.  The event will feature interviews with Raja Koduri, AMD’s Corporate VP, Visual Computing; John Byrne, AMD’s Senior VP and General Manager, Computing and Graphics Business Group; and several special guests.   You can also expect new product announcements along with stories covering the history of AMD.  You can watch the twitch.tv livestream below once the festivities kick off!

Watch live video from AMD on www.twitch.tv

There is also a contest for those who follow @AMDRadeon and retweet their tweet of "Follow @AMDRadeon Tune into #AMD30Live 8/23/14 at 9AM CT www.amd.com/AMD30Live – Follow & Retweet for a chance to win! www.amd.com/AMD30Live"

Source: AMD

Khronos Announces "Next" OpenGL & Releases OpenGL 4.5

Subject: General Tech, Graphics Cards, Shows and Expos | August 15, 2014 - 08:33 PM |
Tagged: siggraph 2014, Siggraph, OpenGL Next, opengl 4.5, opengl, nvidia, Mantle, Khronos, Intel, DirectX 12, amd

Let's be clear: there are two stories here. The first is the release of OpenGL 4.5 and the second is the announcement of the "Next Generation OpenGL Initiative". Both appear in the same press release, but they are two different statements.

OpenGL 4.5 Released

OpenGL 4.5 expands the core specification with a few extensions. Compatible hardware, with OpenGL 4.5 drivers, will be guaranteed to support these. This includes features like direct_state_access, which allows objects to be created and modified without first binding them to the context, and support for OpenGL ES 3.1 features that are traditionally missing from OpenGL 4, which allows easier porting of OpenGL ES 3.1 applications to OpenGL.
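As a quick illustration of what direct state access buys you, here is a sketch (error handling omitted; it assumes a valid OpenGL 4.5 context and a function loader such as GLEW or glad are already set up):

```cpp
#include <GL/glew.h>  // any OpenGL 4.5 function loader will do; GLEW is just an example

// Assumes a valid OpenGL 4.5 context is current on this thread.
void upload_vertices(const float* data, GLsizeiptr size) {
    // Pre-4.5 style: the buffer must be bound before it can be filled,
    // which also disturbs whatever was previously bound to GL_ARRAY_BUFFER.
    GLuint legacy;
    glGenBuffers(1, &legacy);
    glBindBuffer(GL_ARRAY_BUFFER, legacy);
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);

    // OpenGL 4.5 direct state access: the object is named explicitly,
    // so nothing has to be bound just to fill it with data.
    GLuint dsa;
    glCreateBuffers(1, &dsa);
    glNamedBufferData(dsa, size, data, GL_STATIC_DRAW);
}
```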

opengl_logo.jpg

It also adds a few new extensions as an option:

ARB_pipeline_statistics_query lets a developer ask the GPU what it has been doing. This could be useful for "profiling" an application (list completed work to identify optimization points).

ARB_sparse_buffer allows developers to perform calculations on pieces of generic buffers without loading the whole buffer into memory. This is similar to ARB_sparse_texture... except that that extension is for textures. Buffers are useful for things like vertex data (and so forth).

ARB_transform_feedback_overflow_query is apparently designed to let developers choose whether or not to draw objects based on whether the buffer is overflowed. I might be wrong, but it seems like this would be useful for deciding whether or not to draw objects generated by geometry shaders.

KHR_blend_equation_advanced allows new blending equations between objects. If you use Photoshop, this would be "multiply", "screen", "darken", "lighten", "difference", and so forth. On NVIDIA's side, this will be directly supported on Maxwell and Tegra K1 (and later). Fermi and Kepler will support the functionality, but the driver will perform the calculations with shaders. AMD has yet to comment, as far as I can tell.

nvidia-opengl-debugger.jpg

Image from NVIDIA GTC Presentation

If you are a developer, NVIDIA has launched 340.65 (340.23.01 for Linux) beta drivers for developers. If you are not looking to create OpenGL 4.5 applications, do not get this driver. You really should not have any use for it, at all.

Next Generation OpenGL Initiative Announced

The Khronos Group has also announced "a call for participation" to outline a new specification for graphics and compute. They want it to give developers explicit control over CPU and GPU tasks, to be multithreading-friendly, to have minimal overhead and a common shader language, and to undergo "rigorous conformance testing". This sounds a lot like the design goals of Mantle (and what we know of DirectX 12).

amd-mantle-queues.jpg

And really, from what I hear and understand, that is what OpenGL needs at this point. Graphics cards look nothing like they did a decade ago (or over two decades ago). They each have very similar interfaces and data structures, even if their fundamental architectures vary greatly. If we can draw a line in the sand, legacy APIs can be supported but not optimized heavily by the drivers. After a short time, available performance for legacy applications would be so high that it wouldn't matter, as long as they continue to run.

Add to that, next-generation drivers should be significantly easier to develop, considering the reduced error checking (and other responsibilities). As I said on Intel's DirectX 12 story, it is still unclear whether it will lead to enough of a performance increase to make most optimizations, such as those which increase workload or developer effort in exchange for queuing fewer GPU commands, unnecessary. We will need to wait for game developers to use it for a bit before we know.

Intel and Microsoft Show DirectX 12 Demo and Benchmark

Subject: General Tech, Graphics Cards, Processors, Mobile, Shows and Expos | August 13, 2014 - 09:55 PM |
Tagged: siggraph 2014, Siggraph, microsoft, Intel, DirectX 12, directx 11, DirectX

Along with GDC Europe and Gamescom, Siggraph 2014 is going on in Vancouver, BC. There, Intel had a DirectX 12 demo at its booth. The scene, containing 50,000 asteroids, each in its own draw call, was developed on both Direct3D 11 and Direct3D 12 code paths and could apparently be switched between while the demo is running. Intel claims to have measured both power and frame rate.

intel-dx12-LockedFPS.png

Variable power to hit a desired frame rate, DX11 and DX12.

The test system is a Surface Pro 3 with an Intel HD 4400 GPU. Doing a bit of digging, this would make it the i5-based Surface Pro 3. Removing another shovel-load of mystery, this would be the Intel Core i5-4300U with two cores, four threads, a 1.9 GHz base clock, up to a 2.9 GHz turbo clock, 3MB of cache, and (of course) the Haswell architecture.

While not top-of-the-line, it is also not bottom-of-the-barrel. It is a respectable CPU.

Intel's demo on this processor shows a significant power reduction in the CPU, and even a slight decrease in GPU power, for the same target frame rate. If power was not throttled, Intel's demo goes from 19 FPS all the way up to a playable 33 FPS.

Intel will discuss more during a video interview, tomorrow (Thursday) at 5pm EDT.

intel-dx12-unlockedFPS-1.jpg

Maximum power in DirectX 11 mode.

For my contribution to the story, I would like to address the first comment on the MSDN article. It claims that this is just an "ideal scenario" of a scene that is bottlenecked by draw calls. The thing is: that is the point. Sure, a game developer could optimize the scene to (maybe) instance objects together, and so forth, but that is unnecessary work. Why should programmers, or worse, artists, need to spend so much of their time developing art so that it can be batched together into fewer, bigger commands? Would it not be much easier, and all-around better, if the content could be developed as it most naturally comes together?

That, of course, depends on how much performance improvement we will see from DirectX 12, compared to theoretical max efficiency. If pushing two workloads through a DX12 GPU takes about the same time as pushing one, double-sized workload, then it allows developers to, literally, perform whatever solution is most direct.
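For context, the difference being argued over looks roughly like this in Direct3D 11 terms. This is a sketch only, not Intel's demo code: resource setup, shaders, and per-object constant updates are omitted.

```cpp
#include <d3d11.h>

// Assumes vertex/index buffers, input layout, and shaders are already bound.

void draw_asteroids_naive(ID3D11DeviceContext* ctx, UINT index_count, UINT asteroid_count) {
    // One draw call per asteroid: the most natural way to author the content,
    // but the CPU-side submission cost scales with the object count.
    for (UINT i = 0; i < asteroid_count; ++i) {
        // ...per-object constants (transform, material) would be updated here...
        ctx->DrawIndexed(index_count, 0, 0);
    }
}

void draw_asteroids_instanced(ID3D11DeviceContext* ctx, UINT index_count, UINT asteroid_count) {
    // The classic DX11 workaround: batch everything into one instanced call,
    // at the cost of restructuring per-object data into instance buffers.
    ctx->DrawIndexedInstanced(index_count, asteroid_count, 0, 0, 0);
}
```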

intel-dx12-unlockedFPS-2.jpg

Maximum power when switching to DirectX 12 mode.

If, on the other hand, pushing two workloads is 1000x slower than pushing a single, double-sized one, but DirectX 11 was 10,000x slower, then it could be less relevant because developers will still need to do their tricks in those situations. The closer it gets, the fewer occasions that strict optimization is necessary.

If there are any DirectX 11 game developers, artists, and producers out there, we would like to hear from you. How much would a (let's say) 90% reduction in draw call latency (which is around what Mantle claims) give you, in terms of fewer required optimizations? Can you afford to solve problems "the naive way" now? Some of the time? Most of the time? Would it still be worth it to do things like object instancing and fewer, larger materials and shaders? How often?