GPU - Real World Tech

SuperComputing 19: HPC Meets Machine Learning

November 25, 2019 by David Kanter

For me, SC19 was about the fusion of machine learning and scientific computing. I learned about new technologies from Nvidia, Graphcore, and Cerebras Systems and spoke on a panel about the role of MLPerf in benchmarking HPC systems for machine learning and the many lessons learned.

A Look Inside Apple’s Custom GPU for the iPhone

October 25, 2016 by David Kanter

Previously, Apple’s iPhones and iPads used PowerVR GPUs from Imagination Technologies for graphics. Based on our analysis, Apple has created a custom GPU that powers the A8, A9, and A10 processors, shipping in the iPhone 6 and later models, and some iPads. Using public documents, we demonstrate that the programmable shader cores inside Apple’s GPU are different from Imagination Technologies’ PowerVR and offer superior 16-bit floating-point performance and data conversion functions. We further believe that Apple has also developed a custom shader compiler and graphics driver. The proprietary design enables Apple to deliver best-in-class performance for graphics and for other tasks that use the GPU, such as image processing and machine learning.

Tile-based Rasterization in Nvidia GPUs

August 1, 2016 by David Kanter

Starting with the Maxwell GM20x architecture, Nvidia’s high-performance GPUs have borrowed techniques from low-power mobile graphics architectures. Specifically, Maxwell and Pascal use tile-based immediate-mode rasterizers that buffer pixel output, instead of conventional full-screen immediate-mode rasterizers. Using simple DirectX shaders, we demonstrate the tile-based rasterization in Nvidia’s Maxwell and Pascal GPUs and contrast this behavior with the immediate-mode rasterizer used by AMD.

Adaptive Clocking in AMD’s Steamroller

May 6, 2014 by David Kanter

My favorite paper from the ISSCC processor session describes an adaptive clocking technique implemented in AMD’s 28nm Steamroller core that compensates for power supply noise. Initial results show a 10-20% decrease in power consumption from reducing the voltage, with no loss in performance. This elegant technique is likely to be adopted across AMD’s entire product line including GPUs, x86 CPUs, ARM-based CPUs, and other critical blocks in highly integrated SoCs.

Intel’s Long Awaited Return to the Memory Business

April 23, 2013 by David Kanter

Graphics is a focal point of the upcoming Haswell platform, necessitating a high-bandwidth memory solution. To deliver high performance, Intel is returning to the DRAM market, which it exited in 1985. The memory that ships with Haswell will be a custom embedded DRAM mounted in the package and manufactured on a variant of Intel’s 22nm process. By avoiding the commodity memory market, Intel will preserve high margins by cannibalizing discrete GPUs and dedicated graphics memory.

Lessons in Technology and Innovation from the iPad 3 Graphics and Display

January 1, 2013 by David Kanter

The iPad 3 was an influential and successful tablet, but it is also an excellent example of an unbalanced system. In particular, the superb Retina display was not adequately matched by the GPU of the A5X, and the tablet represented a step backwards in terms of graphics capabilities. This article explores the challenges of designing innovative products given the underlying technical constraints, through the lens of the iPad 3 and its successors.

Intel’s Near-Threshold Voltage Computing and Applications

September 18, 2012 by David Kanter

Near-threshold voltage computing extends the voltage scaling associated with Moore’s Law and dramatically improves power and energy efficiency. The technology is superb for throughput, at the cost of latency, and best suited to Intel’s products for HPC and mobile graphics.

Computational Efficiency for CPUs and GPUs in 2012

July 25, 2012 by David Kanter

New compute efficiency data shows GPUs with a clear edge over CPUs, but the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a throughput CPU is the most energy efficient processor, offering hope for future architectures. Our data also shows some advantages of AMD’s Bulldozer, and the overhead associated with highly scalable server CPUs.

Intel’s Ivy Bridge Graphics Architecture

April 22, 2012 by David Kanter

The Ivy Bridge GPU takes advantage of Intel’s 22nm FinFET process to nearly double performance and enhance programmability with DX11 and OpenCL 1.1 support. The new scalable architecture features more powerful shader cores, distributed sampling pipelines, a high-bandwidth L3 cache, tessellation, and support for 4K resolution displays. Overall, Ivy Bridge should be the highest performance integrated GPU at launch and Intel’s first competitive graphics offering.

Impressions of Kepler

March 22, 2012 by David Kanter

Our first look at Kepler focuses on architectural changes to the shader core that emphasize graphics performance and the enhanced power management. Based on our analysis of Nvidia’s 28nm GPU strategy, we project a new shader core for throughput computing products and discuss the expected features.
