NVIDIA Reveals 64-bit Denver CPU Core Details, Headed to New Tegra K1 Powered Devices Later This Year

Subject: Processors | August 11, 2014 - 10:06 PM |
Tagged: tegra k1, project denver, nvidia, Denver, ARMv8, arm, Android, 64-bit

During GTC 2014 NVIDIA launched the Tegra K1, a new mobile SoC that contains a powerful Kepler-based GPU. Initial processors (and the resultant design wins such as the Acer Chromebook 13 and Xiaomi Mi Pad) utilized four ARM Cortex-A15 cores for the CPU side of things, but later this year NVIDIA is deploying a variant of the Tegra K1 SoC that switches out the four A15 cores for two custom (NVIDIA developed) Denver CPU cores.

Today at the Hot Chips conference, NVIDIA revealed most of the juicy details on those new custom cores announced in January which will be used in devices later this year.

The custom 64-bit Denver CPU cores use a 7-way superscalar design and run a custom instruction set. Denver is a wide but in-order architecture that allows up to seven operations per clock cycle. NVIDIA is using a custom ISA and on-the-fly binary translation to convert ARMv8 instructions to microcode before execution. A software layer and a 128MB cache support the Dynamic Code Optimization (DCO) technology by allowing the processor to examine and optimize the ARM code, convert it to the custom instruction set, and store the converted microcode of frequently used applications in that cache (which can be bypassed for infrequently processed code). Using the wider execution engine and DCO (which is transparent to ARM developers and does not require updated applications), NVIDIA touts the dual Denver core Tegra K1 as being at least as powerful as the quad and octa-core packing competition.
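To make the translate-and-cache idea more concrete, here is a minimal toy sketch in Python. It is not NVIDIA's design: the class name `TranslationCache`, the `HOT_THRESHOLD` value, and the "native:" tagging are all hypothetical stand-ins. It only illustrates the general pattern of counting how often a block of code runs, optimizing it once it becomes "hot", and serving later runs from the cache while cold code bypasses it.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: runs before a block counts as "hot"


class TranslationCache:
    """Toy model of translate-and-cache; not NVIDIA's actual design."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.run_counts = Counter()  # how often each block has executed
        self.optimized = {}          # block address -> cached translation
        self.translations = 0        # number of times the optimizer ran

    def _optimize(self, arm_block):
        # Stand-in for a real optimizer: just tag each instruction.
        self.translations += 1
        return tuple("native:" + insn for insn in arm_block)

    def execute(self, address, arm_block):
        """Return the path taken for this run: 'cached' or 'decoded'."""
        if address in self.optimized:
            return "cached"
        self.run_counts[address] += 1
        if self.run_counts[address] >= self.threshold:
            # The block is now hot: optimize once and keep the result.
            self.optimized[address] = self._optimize(arm_block)
        return "decoded"
```

In this sketch a loop body that runs five times is decoded on the fly for its first three runs and served from the cache afterwards, while a block that runs only once never enters the cache.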

Further, NVIDIA has claimed that at peak throughput (and in specific situations where application code and DCO can take full advantage of the 7-way execution engine) the Denver-based mobile SoC handily outpaces Intel’s Bay Trail, Apple’s A7 Cyclone, and Qualcomm’s Krait 400 CPU cores. In the results of a synthetic benchmark test provided to The Tech Report, the Denver cores were even challenging Intel’s Haswell-based Celeron 2955U processor. Keeping in mind that these are NVIDIA-provided numbers and likely the best results one can expect, Denver is still quite a bit more capable than existing cores. (Note that the Haswell chips would likely pull much farther ahead when presented with applications that cannot be easily executed in-order with limited instruction parallelism.)

NVIDIA Denver CPU Core 64bit ARMv8 Tegra K1.png

NVIDIA is ratcheting up mobile CPU performance with its Denver cores, but it is also aiming for an efficient chip and has implemented several power saving tweaks. Beyond the decision to go with an in-order execution engine (with DCO hopefully mostly making up for that), the beefy Denver cores reportedly feature low latency power state transitions (e.g. between active and idle states), power gating, dynamic voltage, and dynamic clock scaling. The company claims that “Denver's performance will rival some mainstream PC-class CPUs at significantly reduced power consumption.” In real terms this should mean that the two Denver cores in place of the quad core A15 design in the Tegra K1 should not result in significantly lower battery life. The two K1 variants are said to be pin compatible such that OEMs and developers can easily bring upgraded models to market with the faster Denver cores.

NVIDIA Denver CPU cores in Tegra K1.png

For those curious, in the Tegra K1, the two Denver cores (clocked at up to 2.5GHz) share a 16-way L2 cache and each has its own 128KB instruction and 64KB data L1 caches. The 128MB Dynamic Code Optimization cache is held in system memory.

Denver is the first (custom) 64-bit ARM processor for Android (with Apple’s A7 being the first 64-bit smartphone chip), and NVIDIA is working on supporting the next generation Android OS known as Android L.

The dual Denver core Tegra K1 is coming later this year and I am excited to see how it performs. The current K1 chip already has a powerful fully CUDA compliant Kepler-based GPU which has enabled awesome projects such as computer vision and even prototype self-driving cars. With the new Kepler GPU and Denver CPU pairing, I’m looking forward to seeing how NVIDIA’s latest chip is put to work and the kinds of devices it enables.

Are you excited for the new Tegra K1 SoC with NVIDIA’s first fully custom cores?

Source: NVIDIA

Do you know Juno?

Subject: General Tech | July 3, 2014 - 09:39 AM |
Tagged: linux, linaro, juno, google, armv8-a, ARMv8, arm, Android

By now you should have read Ryan's post or listened to Josh talk about Juno on the PCPer Podcast, but if you find yourself hungry for more information you can visit The Tech Report. They discuss how the 64-bit Linaro build is already able to take advantage of one of big.LITTLE's power efficiency optimizations, called Global Task Scheduling. As Linaro releases monthly updates you can expect to see more features and better implementations as their take on the Android Open Source Project evolves. Expect to see more of Juno and ARMv8 on review sites as we work out just how to benchmark these devices.

aosp.jpg

"ARM has created its own custom SoC and platform for 64-bit development. The folks at Linaro have used this Juno dev platform to port an early version of Android L to the ARMv8 instruction set. Here's a first look at the Juno hardware and the 64-bit software it enables."


ARM Ships Juno Development Platform for ARMv8-A Integration

Subject: Mobile | July 2, 2014 - 09:00 AM |
Tagged: linux, linaro, juno, google, armv8-a, ARMv8, arm, android l

Even though Apple has been shipping a 64-bit capable SoC since the release of the A7 part in September of 2013, the Android market has yet to see its first consumer 64-bit SoC release. That is about to change as we progress through the rest of 2014 and ARM is making sure that major software developers have the tools they need to be ready for the architecture shift. That help will come in the form of the Juno ARM Development Platform (ADP) and 64-bit ready software stack.

Apple's A7 is the first core to implement ARMv8 but companies like Qualcomm, NVIDIA and of course ARM have their own cores based on the 64-bit architecture. Much like we saw with the 64-bit transition in the x86 ecosystem, ARMv8 will improve access to large datasets and will deliver performance gains thanks to increased register sizes, larger virtual address spaces above 4GB and more. ARM also improved performance of NEON (SIMD) and cryptography support while they were in there fixing up the house.
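For reference, the 4GB ceiling mentioned above falls straight out of the address width. The short Python sketch below (the helper name `addressable_bytes` is just an illustration) shows how much memory a flat address of a given width can reach; the 48-bit case is included only as a point of comparison and is not a claim about any particular chip.

```python
GIB = 2 ** 30  # bytes in one gibibyte
TIB = 2 ** 40  # bytes in one tebibyte


def addressable_bytes(address_bits):
    """Bytes reachable with a flat address of the given width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")
print(addressable_bytes(48) // TIB, "TiB with 48-bit addresses")
```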

juno4.jpg

The Juno platform is the first 64-bit development platform to come directly from ARM and combines a host of components to create a reference hardware design for integrators and developers to target moving forward. Featuring a test chip built around Cortex-A57 (dual core), Cortex-A53 (quad core) and Mali-T624 (quad core), Juno allows software to target 64-bit development immediately without waiting for other SoC vendors to have product silicon ready. The hardware configuration implements big.LITTLE, OpenGL ES 3.0 support, thermal and power management, Secure OS capability and more. In theory, ARM has built a platform that will be very similar to SoCs built by its partners in the coming months.

juno2.jpg

ARM isn't quite talking about the specific availability of the Juno platform, but for the target audience ARM should be able to provide the number of development platforms necessary. Juno enables software development for 64-bit kernels, drivers, tools and virtual machine hypervisors but it's not necessarily going to help developers writing generic applications. Think of Juno as the development platform for the low level designers and coders, not those that are migrating Facebook or Flappy Bird to your next smartphone.

The Juno platform helps ARM in a couple of specific ways. From a software perspective, it creates a common foundation for the ARMv8 ecosystem and allows developer access to silicon before ARM's partners have prepared their own platforms. ARM claims that Juno is a fairly "neutral" platform so software developers won't feel like they are being funneled in one direction. I'd be curious what ARM's partners actually think about that though with the inclusion of Mali graphics, a product that ARM is definitely trying to promote in a competitive market.

juno1.jpg

Though the primary focus might be software, hardware partners will be able to benefit from Juno. On this board they will find the entire ARMv8 IP portfolio tested up to modern silicon. This should enable hardware vendors to see A57 and A53 working, in action and with the added benefit of a full big.LITTLE implementation. The hope is that this will dramatically accelerate the time to market for future 64-bit ARM designs.

The diagram above shows the full breakdown of the Juno SoC as well as some of the external connectivity on the board itself. The memory system is built around 8GB of DDR3 running at 12.8 GB/s and is extensible through the PCI Express slots and the FPGA options.
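As a rough sanity check on the 12.8 GB/s figure, peak DRAM bandwidth is simply the transfer rate multiplied by the bus width. The sketch below assumes DDR3-1600 (1600 MT/s) over a total of 64 data bits; that configuration is an assumption chosen because it reproduces the quoted number, not something ARM's material here spells out.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = bus_width_bits // 8
    total_bytes_per_s = mega_transfers_per_s * 1e6 * bytes_per_transfer * channels
    return total_bytes_per_s / 1e9


# Assumed configuration: DDR3-1600 with 64 data bits in total.
print(peak_bandwidth_gb_s(1600, 64))
```

Two 32-bit channels at the same speed give the same total, so the calculation cannot tell those layouts apart.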

linaro.jpg

Of course hardware is only half the story - today Linaro is releasing a 64-bit port of the Android Open Source Project (AOSP) that will run on Juno. That, along with the Linux kernel v3.14 with ARMv8-A support should give developers the tools needed to write the applications, middleware and kernels for future hardware. Also worth noting on June 25th at Google I/O was the announcement of developer access coming for Android L. This build will support ARMv8-A as well.

The switch to 64-bit technology on ARM devices isn't going to happen overnight but ARM and its partners have put together a collective ecosystem that will allow the software and hardware developers to make the transition as quick and, most importantly, as painless as possible. With outside pressure pushing on ARM and its low power processor designs, it is taking more of its fate in its own hands, pushing the 64-bit transition forward at an accelerated pace. This helps ARM in the mobile space, the consumer space as well as the enterprise markets, a key market for SoC growth.

AMD Shows Off ARM-Based Opteron A1100 Server Processor And Reference Motherboard

Subject: Processors | May 7, 2014 - 09:26 PM |
Tagged: TrustZone, server, seattle, PCI-E 3.0, opteron a1100, opteron, linux, Fedora, ddr4, ARMv8, arm, amd, 64-bit

AMD showed off its first ARM-based “Seattle” processor running on a reference platform motherboard at an event in San Francisco earlier this week. The new chip, which began sampling in March, is slated for general availability in Q4 2014. The “Seattle” processor will be officially labeled the AMD Opteron A1100.

During the press event, AMD demonstrated the Opteron A1100 running on a reference design motherboard (the Seattle Development Platform). The hardware was used to drive a LAMP software stack including an ARM optimized version of Linux based on RHEL, Apache 2.4.6, MySQL 5.5.35, and PHP 5.4.16. The server was then used to host a WordPress blog that included stream-able video.

AMD Seattle Development Platform Opteron A1100.jpg

Of course, the hardware itself is the new and interesting bit and thanks to the event we now have quite a few details to share.

The Opteron A1100 features eight ARM Cortex-A57 cores clocked at 2.0 GHz (or higher). AMD has further packed in an integrated memory controller, TrustZone encryption hardware, and floating point and NEON video acceleration hardware. Like a true SoC, the Opteron A1100 supports 8 lanes of PCI-E 3.0, eight SATA III 6Gbps ports, and two 10GbE network connections.

The Seattle processor has a total of 4MB of L2 cache (each pair of cores shares 1MB of L2) and 8MB L3 cache that all eight cores share. The integrated memory controller supports DDR3 and DDR4 memory in SO-DIMM, unbuffered DIMM, and registered ECC RDIMM forms (only one type per motherboard) enabling the ARM-based platform to be used in a wide range of server environments (enterprise, SMB, and home servers et al).

AMD has stated that the upcoming Opteron A1100 processor delivers between two and four times the performance of the existing Opteron X series (which uses four x86 Jaguar cores clocked at 1.9 GHz). The A1100 has a 25W TDP and is manufactured by Global Foundries. Despite the slight increase in TDP versus the Opteron X series (the Opteron X2150 is a 22W part), AMD claims the increased performance results in notable improvements in compute/watt performance.
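Taking AMD's figures at face value, the compute/watt claim can be worked out with simple arithmetic. The Python sketch below uses only the numbers quoted above (2x to 4x performance, 25W versus 22W) and treats TDP as a stand-in for power draw, which is an assumption: TDP is a thermal design target, not measured consumption.

```python
def perf_per_watt_gain(perf_ratio, new_tdp_w, old_tdp_w):
    """Relative performance-per-watt of the new part versus the old one."""
    return perf_ratio * old_tdp_w / new_tdp_w


# AMD's claimed 2x-4x performance, A1100 (25W) versus Opteron X2150 (22W).
low = perf_per_watt_gain(2, new_tdp_w=25, old_tdp_w=22)
high = perf_per_watt_gain(4, new_tdp_w=25, old_tdp_w=22)
print(f"{low:.2f}x to {high:.2f}x performance per watt")
```

Under those assumptions the slightly higher TDP trims the claimed 2x to 4x performance advantage to roughly 1.76x to 3.52x in performance per watt.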

AMD Opteron Server Processor.png

AMD has engineered a reference motherboard though partners will also be able to provide customized solutions. The combination of reference motherboard and ARM-based Opteron A1100 is known as the Seattle Development Platform. This reference motherboard features four registered DDR3 DIMM slots for up to 128GB of memory, eight SATA 6Gbps ports, support for standard ATX power supplies, and multiple PCI-E connectors that can be configured to run as a single PCI-E 3.0 x8 slot or two PCI-E 3.0 x4 slots.

The Opteron A1100 is an interesting move from AMD that will target low power servers. The ARM-based server chip has an uphill battle in challenging x86-64 in this space, but the SoC does have several advantages in terms of compute performance per watt and overall cost. AMD has taken the SoC elements (integrated IO, memory, companion processor hardware) of the Opteron X series and its APUs in general, removed the graphics portion, and crammed in as many low power 64-bit ARM cores as possible. This configuration will have advantages over the Opteron X CPU+GPU APU when running applications that use multiple serial threads and can take advantage of large amounts of memory per node (up to 128GB). The A1100 should excel in serving up files and web pages or acting as a caching server where data can be held in memory for fast access.

I am looking forward to the launch as the 64-bit ARM architecture makes its first major inroads into the server market. The benchmarks, and ultimately software stack support, will determine how well it is received and if it ends up being a successful product for AMD, but at the very least it keeps Intel on its toes and offers up an alternative and competitive option.

Source: Tech Report

Qualcomm Reveals New Flagship Snapdragon 808 and 810 64-Bit SoCs Coming In 2015

Subject: Mobile | April 8, 2014 - 04:47 PM |
Tagged: SoC, snapdragon, qualcomm, LTE, ARMv8, adreno, 64-bit

Qualcomm has announced two new flagship 64-bit SoCs with the Snapdragon 808 and Snapdragon 810. The new chips will begin sampling later this year and should start showing up in high end smartphones in the first half of 2015. The new 800-series parts join the previously announced mid-range Snapdragon 610 and 615 which are also 64-bit ARMv8 parts.

The Snapdragon 810 is Qualcomm's new flagship processor. The chip features four ARM Cortex A57 cores and four Cortex A53 cores in a big.LITTLE configuration, an Adreno 430 GPU, and support for Category 6 LTE (up to 300 Mbps downloads) and LPDDR4 memory. This flagship part uses the 64-bit ARMv8 ISA. The new Adreno 430 GPU integrated in the SoC is reportedly 30% faster than the Adreno 420 GPU in the Snapdragon 805 processor.

Qualcomm Snapdragon SoC.jpg

In addition to the flagship part, Qualcomm is also releasing the Snapdragon 808 which pairs two Cortex A57 CPU cores and four Cortex A53 CPU cores in a big.LITTLE configuration with an Adreno 418 (approximately 20% faster than the popular Adreno 320) GPU. This chip supports LPDDR3 memory and Qualcomm's new Category 6 LTE modem.

Both the 808 and 810 have Adreno GPUs which support OpenGL ES 3.1. The new chips support a slew of wireless I/O including Category 6 LTE, 802.11ac Wi-Fi, Bluetooth 4.1, and NFC.

Qualcomm is reportedly planning to produce these SoCs on a 20nm process. For reference, the mid-range 64-bit Snapdragon 610 and 615 use a 28nm LP manufacturing process. The new 20nm process (presumably from TSMC) should enable improved battery life and clockspeed headroom on the flagship parts. Exactly how big those gains will be depends on the specific manufacturing process: a bulk/planar process shrink would bring smaller gains, while more advanced methods such as FD-SOI would bring greater improvements, assuming the new chip on a 20nm process has the same transistor count as one on the 28nm process used in existing chips.
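For a sense of scale, an idealized shrink scales die area with the square of the feature size. The Python sketch below applies that textbook rule to a 28nm to 20nm move at a constant transistor count; it is a simplification, since node names do not map directly onto physical dimensions and real designs rarely shrink perfectly.

```python
def ideal_area_ratio(new_node_nm, old_node_nm):
    """Idealized die-area ratio for the same design moved to a new node."""
    return (new_node_nm / old_node_nm) ** 2


ratio = ideal_area_ratio(20, 28)
print(f"Ideal 20nm die area is about {ratio:.0%} of the 28nm area")
```

In other words, a perfect shrink would roughly halve the area for the same transistor count, which is the headroom that can go toward lower power, higher clocks, or more transistors.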

The 808 and 810 parts are the new high-end 64-bit chips which will effectively supplant the 32-bit Snapdragon 805, itself a marginal update over the Snapdragon 800. The naming conventions and product lineups are getting a bit crazy here, but suffice it to say that the 808 and 810 are the effective successors to the 800, while the 805 is a stop-gap upgrade as Qualcomm moves to 64-bit ARMv8 and secures manufacturing for the new chips. Those new chips should be slightly faster CPU-wise, notably faster GPU-wise, and more capable thanks to the faster cellular modem and 64-bit ISA support.

For those wondering, the press release also states that the company is still working on development of its custom 64-bit Krait CPU architecture. However, it does not appear that 64-bit Krait will be ready by the first half of 2015, which is why Qualcomm has opted to use ARM's Cortex A57 and A53 cores in its upcoming flagship 808 and 810 SoCs.

Source: Qualcomm

Computex 2013: MiTAC Announces High Density 7-Star ARMv8-Powered Server

Subject: General Tech, Systems | June 4, 2013 - 08:44 PM |
Tagged: computex 2013, computex, X-Gene, mitac, ARMv8, appliedmicro, 7-star, 64-bit

During Computex, MiTAC announced a new high density "7-Star" ARMv8 server. Aimed at the enterprise market, the 7-Star platform is a 4U server that holds up to 18 compute cards. Each compute card contains an eight-core ARMv8-based X-Gene processor from AppliedMicro, two DDR3 DIMM slots, and space for two 2.5"/3.5" internal storage drives (SSD or HDD). The compute cards use a 10G SFP+ and a single Gigabit Ethernet port for networking purposes.

MiTAC 7-Star Shown Off At Computex.jpg

Of course, the interesting bit about the 7-Star is that it is one of the first servers to use processors based on ARM's 64-bit ARMv8 architecture. MiTAC worked with ARM and AppliedMicro on the project, and it should be available later this year. It is currently being shown off at the ARM Holdings demo suite in Taipei, Taiwan. I'm interested to see how well these 64-bit ARM servers do, especially with new low power chips from Intel and AMD on the way!

Read more about ARMv8 at PC Perspective.


Source: MiTAC

ARM Details First Quarter 2013 Finances, Company Revenue Up 26% YoY

Subject: General Tech | April 24, 2013 - 10:14 PM |
Tagged: SoC, mobile, ARMv8, arm

British chip design company ARM recently released an unaudited financial report with details on its Q1 2013 performance. The mobile SoC giant announced that it saw 2.6 billion ARM-based chips shipped in the first quarter of this year, a 35% improvement over last year and further evidence that ARM still dominates the low-power mobile market.

In fact, the chip designer made $94.9 million in licensing all those ARM chips, which was a big chunk of the company’s total Q1 2013 revenue of $263.9 million. Revenue was up by 26% versus the first quarter of the previous year (Q1 2012), which was only $209.4 million. Further, ARM’s profit (pre-tax) is 89.4 million pounds or approximately $137 million USD.
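The percentages follow directly from the dollar figures quoted above. This short Python check (the helper name `yoy_growth_percent` is just for illustration) reproduces the 26% year-over-year growth and shows what share of total revenue the licensing figure represents.

```python
def yoy_growth_percent(current, previous):
    """Year-over-year growth as a percentage of the previous value."""
    return (current - previous) / previous * 100


q1_2013_revenue = 263.9    # USD millions, from the report figures above
q1_2012_revenue = 209.4    # USD millions
licensing_revenue = 94.9   # USD millions

growth = yoy_growth_percent(q1_2013_revenue, q1_2012_revenue)
licensing_share = licensing_revenue / q1_2013_revenue * 100
print(f"Revenue growth: {growth:.1f}%")
print(f"Licensing share of revenue: {licensing_share:.1f}%")
```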

ARM Logo.jpg

ARM saw revenue from both licensing and royalties increase year over year (YoY), by 24% and 33% respectively, which indicates that more companies are jumping into the mobile and embedded markets with ARM chips or licenses to make custom designs of their own. According to the report, the company sold five times more Mali GPUs, saw a 50% increase in ARM-powered embedded devices, and noticed a 25% increase in ARM mobile devices year over year. ARM has also started moving ARMv8 (64-bit ARM) licenses. Of the total 22 licenses in Q1 2013, 7 of the licenses were for ARM’s Cortex-A50 series processors along with a single ARMv8 license (a total of 9 to date). In Q1 2013, ARM also sold three Mali GPU licenses, and one of those was for the company’s high-end Skrymir GPU.

In all, ARM had a good first quarter and is showing signs of increased growth. With ARMv8 on the horizon, I am interested to see the company’s numbers next year and how they compare year over year as ARM attempts to take over the server room in particular. The profits and revenue are modest in comparison to x86 giant Intel's Q1 2013 results, but are not bad at all for a company that doesn’t produce chips itself!

You can find ARM's Q1 2013 report here.

Source: ARM

Calxeda gains some allies in the Server War

Subject: General Tech | October 10, 2012 - 10:57 AM |
Tagged: calxeda, arm, 64bit, ARMv8

There are two very big hurdles for Calxeda to overcome if it wants its ARM based servers to make any headway in the market. The first is OS support, which could be the hardest to overcome as they are dependent on programmers making Linux distributions like Ubuntu, Fedora, and openSUSE compatible with ARM chips; Microsoft has already announced that the first version of Windows Server 2012 will not support ARM. Compatibility is something that Calxeda cannot fix on its own; however, the lack of a 64-bit chip is something that they can work to solve, and thanks to the $55M they just received they can now move forward on finishing the chip design. That money came from an impressive list of allies including ATIC (the current parent company of GLOBALFOUNDRIES) as well as ARM Holdings, Battery Ventures, Flybridge Capital Partners, and Highland Capital Partners, and will be used to design the next Cortex A15 and an as of yet unnamed 64-bit chip. Check out The Register for more.

calxeda-ECore2.jpg

"ARM chip upstart Calxeda is lining its coffers as it prepares to do battle with its 32-bit EnergyCore ECX-1000 processors, and two more cores in its roadmap, to conquer some corner of the server world.

Calxeda now has more than 100 employees, who work in its Austin, Texas headquarters as well as in development labs in Silicon Valley and throughout Asia, and it needs cash as it ramps up sales and etches future EnergyCore processors to handle heavy duty workloads and 64-bit code."


Source: The Register

ARM, TSMC to Produce 64-bit Processors With 3D Transistors

Subject: Processors | July 24, 2012 - 09:07 AM |
Tagged: TSMC, ARMv8, arm, 64-bit, 3d transistors, 20nm

 

Yesterday ARM announced a multi-year partnership with fab TSMC to produce sub-20nm processors that utilize 3D FinFET transistors. The collaboration and data sharing between the two companies will allow the fabless ARM SoC company the ability to produce physical processors based on its designs and will allow TSMC a platform to further its process nodes and FinFET transistor technology. The first TSMC-produced processors will be based on the ARMv8 architecture and will be 64-bit compatible.

ARMv8.jpg

The addition of 3D transistors will allow the ARM processors to be even more power efficient and better suited to mobile devices. Alternatively, it could allow for higher clockspeeds at the same TDP ratings as current chips. The other big news is that the chips will be moving to a 64-bit compatible design, which is huge considering ARM processors have traditionally been 32-bit. By moving to 64-bit, ARM is positioning itself for server and workstation adoption, especially with the ARM-compatible Windows 8 build due to be released soon. Granted, ARM SoCs have a long way to go before taking market share from Intel and AMD in the desktop and server markets in a big way, but they are slowly but surely becoming more competitive with the x86-64 giants.

TSMC’s R&D Vice President Cliff Hou stated that the collaboration between ARM and TSMC will allow TSMC to optimize its FinFET process to target “high speed, low voltage and low leakage.” ARM further qualified that the partnership would give ARM early access to the 3D transistor FinFET process that could help create advanced SoC designs and ramp up volume production.

I think this is a very positive move for ARM, and it should allow them to make much larger inroads into the higher-end computing markets and see higher adoption beyond mobile devices. On the other hand, it is going to depend on TSMC to keep up and get the process down. Considering the issues with creating enough 28nm silicon to meet demand for AMD and NVIDIA’s latest graphics cards, a sub-20nm process may be asking a lot. Here’s hoping that it’s a successful venture for both companies, however.

You can find more information in the full press release.

Source: Maximum PC

ARM follows Intel and AMD's 64 bit lead

Subject: General Tech | October 31, 2011 - 08:57 AM |
Tagged: cortex, ARMv8, arm, 64bit

We now have some more detailed information on ARM's new 64 bit ARMv8 processor and its strengths and weaknesses. For the most part it resembles the 64 bit architecture that Intel and AMD use, an extended 32 bit architecture with several holdovers. Perhaps the most disappointing is that ARM has the same 48 bit limit to virtual address space that the competition has. If ARM had managed to overcome the limitations of canonical form addresses, they would have something that neither Intel nor AMD could bring to the server room. ARM desperately needs something to offer that the competition cannot if they are to convince admins to move from a familiar architecture to a brand new ARM architecture; power savings probably won't be enough. Drop by The Inquirer to read up on the improved exception levels and encryption acceleration of the new ARMv8 architecture.
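For readers unfamiliar with canonical form addresses, the rule on x86-64 is that only the low 48 bits of a 64-bit virtual address are implemented, and the remaining upper bits must all be copies of bit 47. The Python sketch below models that x86-64 rule only; the function name is illustrative, and it makes no claim about how ARMv8 handles its own upper address bits in detail.

```python
def is_canonical_x86_64(address, implemented_bits=48):
    """True if bits 63 down to (implemented_bits - 1) are all equal."""
    if not 0 <= address < 2 ** 64:
        raise ValueError("address must fit in 64 bits")
    # Keep the top implemented bit plus every unimplemented bit above it.
    upper = address >> (implemented_bits - 1)
    all_ones = (1 << (64 - implemented_bits + 1)) - 1
    return upper == 0 or upper == all_ones


print(is_canonical_x86_64(0x00007FFFFFFFFFFF))  # top of the lower half
print(is_canonical_x86_64(0xFFFF800000000000))  # bottom of the upper half
print(is_canonical_x86_64(0x0000800000000000))  # inside the unusable gap
```

The first two addresses sit at the edges of the two usable halves of the address space, while the third lands in the hole between them and would be rejected.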

arm_holdings_arm_v8.jpg

"At the ARM TechCon conference in Santa Clara on Thursday, the top brass at ARM Holdings, the company that controls the core designs and licenses them to a slew of chip makers for modification in smartphones, tablets, and other embedded devices, showed off the new ARMv8 architecture. It's an incremental improvement over the current v7 architecture, just like the 64-bit extensions to the original 32-bit x86 processors from Intel and AMD were."


Source: The Register