
3/12/07 CS Visit Days 1

A Sea Change in Processor Design

[Figure: Uniprocessor SPECint performance relative to the VAX-11/780, log scale from 1 to 10,000, years 1978 to 2006. Annotated growth rates, in chronological order: 25%/year, 52%/year, 20%/year. From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006.]

All processor companies betting their future on many-core

More instances of simpler processors are more power efficient

Simple model for hardware scaling

Moore’s Law and many-cores: 2X CPUs per chip / ~ 2 years
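The growth rates on the chart and the doubling model above can be turned into rough arithmetic. The following is a minimal sketch, not part of the original slides: the function names and the starting point of 2 cores per chip are assumptions chosen for illustration, and the model simply assumes constant exponential growth.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a constant annual growth rate (0.52 means 52%/year)."""
    return math.log(2) / math.log(1 + annual_growth)


def cores_per_chip(base_cores: float, years_elapsed: float,
                   doubling_period_years: float = 2.0) -> float:
    """Projected cores per chip if the count doubles every doubling_period_years."""
    return base_cores * 2 ** (years_elapsed / doubling_period_years)


if __name__ == "__main__":
    # Growth rates annotated on the uniprocessor performance chart.
    for rate in (0.25, 0.52, 0.20):
        print(f"{rate:.0%}/year doubles every {doubling_time_years(rate):.1f} years")

    # Hypothetical starting point (an assumption): 2 cores per chip at year 0.
    for years in (0, 4, 10, 18):
        print(f"after {years:2d} years: {cores_per_chip(2, years):.0f} cores per chip")
```

Under these assumptions, a chip that starts at 2 cores crosses 1000 cores after nine doublings, which is the scale the following slides set out to study.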



Problem with Sea Change: Compilers, operating systems, architectures not ready for 1000s of CPUs per chip

How do we do research on 1000-CPU systems in architecture, OS, compilers, apps?

RAMP: Research Accelerator for Multiple Processors

Solution: Create flexible (parameterized) 1000-CPU system from tens of FPGAs

Distribute out-of-the-box Massively Parallel Processor that runs standard binaries & OS

First system based on BEE2 (Berkeley Emulation Engine) - FPGA reconfigurable computing platform.

RAMP Description Language (RDL) defines and supports standard module interfaces and execution model. Used to describe “plumbing” to connect “gateware” units.



RAMP Blue

Message passing machine: MPI, UPC, clusters

8 BEE2 modules, all-to-all connectivity over 10 Gbps links.

Configuration, debugging, etc. over 100 Mbps Ethernet.

Initial applications: UPC NAS Parallel Benchmarks.

4 user FPGAs on each module hold 100 MHz Xilinx MicroBlaze soft cores running uClinux.

Currently 8 cores per FPGA, 256 cores total.

Each FPGA holds a packet network switch, shared memory controller, shared DP FPU, shared “console” switch
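The system size quoted above follows directly from the per-module numbers. The sketch below is illustrative only: the class and field names are hypothetical, and the link count assumes one point-to-point link per pair of modules, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RampBlueConfig:
    """Hypothetical container for the RAMP Blue figures quoted on the slide."""
    modules: int = 8                 # BEE2 modules
    user_fpgas_per_module: int = 4   # user FPGAs on each module
    cores_per_fpga: int = 8          # MicroBlaze soft cores per FPGA

    @property
    def total_fpgas(self) -> int:
        return self.modules * self.user_fpgas_per_module

    @property
    def total_cores(self) -> int:
        return self.total_fpgas * self.cores_per_fpga

    @property
    def module_pairs(self) -> int:
        # Distinct module pairs in an all-to-all topology: n * (n - 1) / 2.
        # Assumption: one link per pair.
        return self.modules * (self.modules - 1) // 2


if __name__ == "__main__":
    cfg = RampBlueConfig()
    print(f"user FPGAs:   {cfg.total_fpgas}")
    print(f"total cores:  {cfg.total_cores}")
    print(f"module pairs: {cfg.module_pairs}")
```

With the defaults, 8 modules times 4 FPGAs times 8 cores gives the 256 cores stated on the slide.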



Computer Architecture & Engineering

Core Faculty: David Culler, Kurt Keutzer, John Kubiatowicz, Dave Patterson, John Wawrzynek, Krste Asanovic (defecting from MIT Fall’07)

Others active in computer architecture research: Demmel, Kahan, Katz, Nikolic, Rabaey, Smith, Yelick

Topics Covered:

ManyCore Parallel Architectures and Systems

Emulation of Highly Parallel Systems

Self aware computing systems

Low-power system design

Reconfigurable computing

Architectures for novel substrates