We don’t know much about it, but at the first annual Global Technology Conference hosted by GLOBALFOUNDRIES, AMD’s Chekib Akrout showed the first images of the upcoming Orochi core processor:

[Image: die shot of Orochi, AMD's 32nm 8-core Bulldozer-based CPU]

Here is what we know for sure about the upcoming Orochi processor: it will be the second 32nm product from AMD, following the upcoming Llano Fusion part, and it uses a set of 4 Bulldozer modules that provide 8 processing cores and 8 threads by way of AMD’s unique SMT alternative.

If you haven’t read details about the new Bulldozer core and what it has to offer, definitely check out our recent preview of the processor based on information revealed at the Hot Chips conference last month. 

Nothing else was shared about the Orochi CPU in particular, but we thought the hardware porn was worth the mention!

    A good way to sum up Bulldozer is “slimmed down, but double wide”.  In place of each traditional core, AMD has instituted a dual ALU design with robust floating point and SSE units.  Each module can handle two threads, like SMT, but it has separate integer execution units that each process an individual thread without sharing integer execution resources.

    Each unit features a single fetch and decode stage.  The decode stage comprises four units, but we do not yet know their inner workings.  In the previous K7 through K10.5 generations of parts, there were three complex decode units.  On the Intel side, Core 2 and Nehalem have three simple decode units and a single complex one.  AMD also did not cover subjects such as macro-ops and macro-op fusion.  AMD has beefed up its decode stage significantly, though.  It simply had to, because the decoder now feeds dual integer schedulers plus a floating point scheduler, which in turn feeds 2 x 128-bit FMACs and MMX units.
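
To put the "2 x 128-bit FMACs" figure in perspective, here is a rough back-of-the-envelope sketch of the theoretical peak floating point throughput it implies per module. This is not an AMD figure: it assumes both FMACs issue a fused multiply-add every cycle and that each fused multiply-add counts as two operations. The function and constant names are ours, purely for illustration.

```python
# Back-of-the-envelope peak FLOPs per cycle for one Bulldozer module.
# Inputs taken from the article: two FMAC units, each 128 bits wide.
FMAC_UNITS = 2
FMAC_WIDTH_BITS = 128

# Assumption (not from AMD): a fused multiply-add counts as two
# floating point operations, and both units issue one every cycle.
OPS_PER_FMA = 2


def peak_flops_per_cycle(element_bits: int) -> int:
    """Theoretical peak FLOPs per cycle for one module at a given precision."""
    lanes = FMAC_WIDTH_BITS // element_bits  # elements packed into one FMAC
    return FMAC_UNITS * lanes * OPS_PER_FMA


single_precision = peak_flops_per_cycle(32)  # 4 lanes per FMAC
double_precision = peak_flops_per_cycle(64)  # 2 lanes per FMAC

print("single precision:", single_precision)
print("double precision:", double_precision)
```

Real-world throughput will depend on how well the shared front end keeps both FMACs fed, which is exactly the part AMD has not detailed yet.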

    Fetch, decode, floating point/SSE, and the L2 cache are the shared components.  Since most workloads are integer based, AMD doubled the integer units.  These 128-bit packed integer pipes are a step above what was offered in the Phenom II.  In theory, there should be a sizeable per-clock increase in integer and floating point apps on Bulldozer over the Phenom II.  When a workload is more heavily threaded, we will see dramatic improvements in performance.  Each integer core features its own L1 D-cache.  AMD has again not clarified how much L1 or L2 cache there is for each discrete unit, or the L3 cache size for the entire processor.
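
The split between shared and per-core resources can be sketched as a small data model. This is only an illustration of the layout described above, assuming 4 modules with 2 integer cores each as stated for Orochi; the class and function names are hypothetical, and cache sizes are left out because AMD has not disclosed them.

```python
from dataclasses import dataclass, field

# Components the article lists as shared within one Bulldozer module.
SHARED_COMPONENTS = ("fetch", "decode", "floating point/SSE", "L2 cache")

# Components the article describes as private to each integer core.
PER_CORE_COMPONENTS = ("integer scheduler", "integer units", "L1 D-cache")


@dataclass
class IntegerCore:
    core_id: int
    private: tuple = PER_CORE_COMPONENTS


@dataclass
class Module:
    module_id: int
    shared: tuple = SHARED_COMPONENTS
    cores: list = field(default_factory=list)


def build_orochi(modules: int = 4, cores_per_module: int = 2) -> list:
    """Build an illustrative Orochi layout: modules holding integer cores."""
    chip = []
    for m in range(modules):
        module = Module(module_id=m)
        for c in range(cores_per_module):
            core_id = m * cores_per_module + c
            module.cores.append(IntegerCore(core_id=core_id))
        chip.append(module)
    return chip


chip = build_orochi()
total_cores = sum(len(module.cores) for module in chip)

for module in chip:
    ids = [core.core_id for core in module.cores]
    print(f"module {module.module_id}: cores {ids} share {', '.join(module.shared)}")
print("total cores:", total_cores)
```

Each pair of cores shares one front end and one floating point unit, which is why a module sits somewhere between a single SMT core and two fully independent cores.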