03.MM - Overview of Processor Technology (Pregled Na Procesorska Tehnologija)

  • 7/28/2019 03.MM - Pregled Na Procesorska Tehnologija

    1/65


    • PC
    • PC

    • Instruction set
    • Clock cycle
    • Interrupt handlers

    • True/False

    • ON/OFF


    1988


    1997

    2003

    Mainframe

    Supercomputer

    2012

    • Multi- and many-core
    • Grid
    • Cloud


    ISA: instruction set architecture

    • First generation: 1971-78
    • Second generation: 1979-85
    • Third generation: 1985-89
    • Fourth generation: 1990-20..

    First generation: 1971-78

    • < 50,000 transistors
    • < 0.5 MIPS
    • 8-16 bit
    • BASIC
    • Examples:
        • Intel 4004, 8008, 8080, 8086
        • Zilog Z-80
        • Motorola 6800, MOS Technology 6502

    Intel 4004

    • 1971
    • 8-bit instructions, 4-bit data path
    • 2,300 transistors
    • < 0.1 MIPS
    • 8008: 8-bit, 1972
    • 3,500 transistors

    Intel 8080

    • 16-bit addressing
    • 1974
    • 4,800 transistors
    • < 0.2 MIPS
    • Altair 8800 (1975)
        • $300-$400
        • 256 bytes of memory; expandable to 64K!
        • 100-pin S-100 bus
    • Gates & Allen BASIC

    Intel 8086

    • 1978
    • < 0.5 MIPS
    • 16-bit
    • 8080
    • 29,000 transistors
    • FP
    • 1981: the IBM PC uses the 8088, a version of the 8086 with an 8-bit external bus

    Second generation: 1979-85

    • 32-bit (68000)
    • Workstations, Macs, PCs
    • > 50,000 transistors

    Motorola 68000

    • 32-bit
    • 16-bit
    • First flat 32-bit address
    • PDP-11
    • 1979
    • 68,000 transistors
    • < 1 MIPS
    • Apple Mac
    • Sun, Silicon Graphics, & Apollo workstations

    Third generation: 1985-89

    • Mainframes
    • RISC takes hold
    • < 500K transistors
    • > 5 MIPS
    • Examples:
        • MIPS R2000, R3000
        • Sun SPARC
        • HP PA-RISC

    MIPS R2000

    • RISC
    • 1 instruction/clock
    • 1985
    • 125,000 transistors
    • 5-8 MIPS

    Fourth generation: 1990-

    • 64-bit
    • Multiple-issue
    • > 1M transistors
    • Clock rate > 100 MHz
    • > 50 MIPS
    • Examples: Intel i860, Pentium, MIPS R4000, MIPS R10000, DEC Alpha, Sun UltraSPARC, HP PA-RISC, PowerPC

    Generation 4.5: 1995-

    • Alpha 21264, Pentium III & 4, Intel Itanium

    • 1.6x, 1985-

    • Instruction-level parallelism (ILP)

    • DRAM BW
    • CPU-DRAM: 30-50!
    • 1:
        • On-chip: 128 bytes (1984) to 100K+ bytes
        • 1986
        • 128KB to 4-16 MB

    • 2:
        • (1992)
        • (1994)
        • Aware combos
        • Prefetching

    MIPS R4000

    • 64-bit
    • On-chip
    • Off-chip
    • Floating point
    • 1991:
        • 1.4M transistors
        • 100 MHz
        • > 50 MIPS

    Intel i860

    • 2 instructions/clock
    • Issue
    • Novel push pipeline
    • Novel cache bypass
    • 1991:
        • 1.3M transistors
        • 50 MIPS

    MIPS R10000

    • Out-of-order
    • 4 per clock
    • 32 in-flight
    • 1996:
        • 6.8M transistors
        • 200 MHz

    Intel IA-64 Itanium

    • EPIC
    • ILP
    • Itanium (2001):
        • 25M transistors
        • 800 MHz
        • 130 Watts

    Intel


    [Figure: Performance (log scale, 0.1 to 100) versus year (1965 to 1995) for supercomputers, minicomputers, mainframes, and microprocessors]


    ILP

    VLIW vs. superscalar

    • Software techniques
        • Static scheduling
        • Static issue (i.e. VLIW)
        • Static branch prediction
        • Alias/pointer analysis
        • Static speculation
    • Hardware techniques
        • Dynamic scheduling
        • Dynamic issue (i.e. superscalar)
        • Dynamic branch prediction
        • Dynamic disambiguation
        • Dynamic speculation

    Software techniques: lower hardware complexity; more, longer-range analysis; more machine dependence.

    Hardware techniques: more stable performance; higher complexity; potential clock rate impact.
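
    The contrast between static and dynamic branch prediction listed above can be illustrated with a small Python sketch. It is not taken from the slides: the always-taken static policy, the single 2-bit saturating counter, and both sample branch histories are assumptions chosen only for illustration.

```python
def static_predict_taken(outcomes):
    """Static prediction: always guess 'taken'. Returns the number of correct guesses."""
    return sum(1 for taken in outcomes if taken)


def two_bit_predict(outcomes, state=2):
    """Dynamic prediction with one 2-bit saturating counter (values 0..3).

    Counter values 2 and 3 predict 'taken'; 0 and 1 predict 'not taken'.
    Returns the number of correct guesses.
    """
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# Hypothetical histories (True = branch taken).
loop_branch = ([True] * 9 + [False]) * 3          # loop back-edge, three runs of the loop
rare_branch = [False] * 20 + [True] + [False] * 9  # error check that almost never fires

for name, history in (("loop", loop_branch), ("rare", rare_branch)):
    print(name, static_predict_taken(history), two_bit_predict(history), len(history))
```

    On the loop-style history both schemes do equally well, while on the rarely taken branch the counter adapts after two outcomes and the fixed guess stays wrong. Real predictors use tables of such counters indexed by branch address and history; this sketch models a single branch only.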

    ILP

    [Figure: "ILP Mountain" with stages labelled simple pipelining, scheduled pipelines, multiple issue, dynamic scheduling, speculation; "Cache Mountain" with stages labelled simple caches, multilevel caches & buffers, critical word & early restart, compiler prefetching, multipath prefetching]

    • No performance wall, but steeper slopes ahead.
    • Easier territory is behind us.
    • Industry-research gap vanished.
    • Energy efficiency may be key limit.

    2007

    • January 8
        • Intel releases the Core 2 Quad Q6600 processor. Price is US$851 in quantities of 1000. [2388.8]
    • January
        • At the Consumer Electronics Show, Intel releases the 2.4 GHz Core 2 Quad processor, and the Xeon 3200 series in 2.13 GHz and 2.4 GHz speeds. [2301.24]
    • January 26
        • IBM and Intel announce a breakthrough in the production of the transistor, using hafnium to allow smaller traces at much greater energy efficiency. [2322]
    • (month unknown)
        • Intel shows an 80-core microprocessor. Performance is 1 teraflop, drawing 62 watts of power. Size is 275 square millimetres. The processor incorporates 100 million transistors in a 65 nanometre manufacturing process. [2387.8]
    • February 20
        • Advanced Micro Devices releases the Athlon 64 X2 Dual-Core 6000+ AM2 processor. Price is US$464 in quantities of 1000. [2388.8]
    • April
        • In Beijing, China, the Spring Intel Developer Forum is held. Intel demonstrates a processor with 80 cores, performing at 2 teraflops. [2396.8]
    • May 3
        • Advanced Micro Devices releases the AMD Turion TL-56 1.86 GHz processor. It features 1 MB cache and a 1.6 GHz HyperTransport bus. Price is US$184 in quantities of 1000. [2388.8]
        • Advanced Micro Devices releases the AMD Turion TL-60 2 GHz processor. It features 1 MB cache and a 1.6 GHz HyperTransport bus. Price is US$220 in quantities of 1000. [2388.8]
        • Advanced Micro Devices releases the AMD Turion TL-64 2.2 GHz processor. It features 1 MB cache and a 1.6 GHz HyperTransport bus. Price is US$354 in quantities of 1000. [2388.8]

    • May 9
        • Intel releases the 1.8 GHz Core 2 Duo Mobile T7100 processor, with 4 MB cache and 800 MHz front-side bus. Price is US$209 in quantities of 1000. Model T7300 (2 GHz) costs US$241; model T7500 (2.2 GHz) costs US$316; and model T7700 (2.4 GHz) costs US$530. [2390.8]
    • July 16
        • Intel releases the Core 2 Extreme Mobile X7800 processor, at US$851 in quantities of 1000. [2461.8]
        • Intel releases the Core 2 Duo E6540 and E6550 processors, at US$163 in quantities of 1000. Model Duo E6750 costs US$183; model Duo E6850 costs US$266; model Quad Q6700 costs US$530; model Extreme QX6800 costs US$999; model Extreme QX6850 (3 GHz, 8 MB cache, 1333 MHz front-side bus) costs US$999. [2461.8]
    • July 27
        • Intel releases the 2.4 GHz Core 2 Duo E6600 processor, with 4 MB cache and 1066 MHz front-side bus. Price is US$316 in 1000-unit quantities. Model E6700 (2.66 GHz) costs US$530; model X6800 (2.93 GHz) costs US$999. [2461.8]
    • August 20
        • Advanced Micro Devices releases the Athlon 64 X2 Dual-Core 6400+ processor, at US$251 in quantities of 1000. [2462.8]
    • September 10
        • Advanced Micro Devices unveils a new Opteron server chip, with four processor cores on one die. Each core can have its own clock speed, including idle, consuming no power. Code name during development was Barcelona. [2402.30] [2403.1]
    • November 11
        • Intel unveils new processors using a 45-nanometer process. Code name during development was Penryn. [2322]
    • November 12
        • Intel releases the Core 2 Extreme Quad QX9650 processor, 3.0 GHz, 12 MB cache, 45 nm process. Price is US$999 in 1000-unit quantities. [2503.8]
    • November 19
        • Advanced Micro Devices releases the Phenom 9500 processor, at US$251. Model 9600 costs US$283. [2463.8]
    • December 23
        • Advanced Micro Devices releases the Phenom 9600 Black Edition microprocessor. Price is US$251 each in 1000-unit quantities. [2503.8]

    2008

    • January 7
        • Intel releases the Core 2 Duo E8190 processor, using a 45 nm trace process. Price is US$163 each in 1000-unit quantities. Model E8200 is US$163; model E8400 is US$183; model E8500 is US$266. [2498.10]
        • Intel releases the Core 2 Quad Q9300 processor, using a 45 nm trace process. Price is US$266 each in 1000-unit quantities. Model Q9450 is US$316; model Q9550 is US$530; model E8500 is US$266. [2498.10]
    • February 19
        • Intel releases the Core 2 Extreme Quad QX9770 processor, 3.2 GHz, 12 MB cache, 1600 MHz FSB, 45 nm, for US$1399 each in 1000-unit quantities. Model QX9775 is US$1499. [2499.8]
    • March 2
        • Intel announces the name for its new low-power microprocessor family: "Atom". [2322]
    • March 26
        • Advanced Micro Devices unveils the "Phenom" microprocessor for desktop personal computers. [2322]
    • April 2
        • At the Intel Developer Forum in Shanghai, China, Intel introduces the low-power Atom microprocessor, in speeds up to 1.86 GHz. [2322]
    • July 3
        • Advanced Micro Devices releases the Phenom 9950 Black Edition microprocessor. Price is US$235 each in 1000-unit quantities. [2501.10]

    2010/2012

    • http://en.wikipedia.org/wiki/List_of_Intel_microprocessors


    Multithreading

    • Threads
    • Simultaneous multithreading (SMT) (Hyperthreading)


    SMT

    • 2-4, 1.3X (throughput)

    • L2
    • Chip communication
    • CMPs (chip-level multiprocessors)

    • e.g., DSP functionality
    • 2010


    • SMT
    • CMT


    Sequence for superscalar

    • Slot issuing (functional units) is shown in the figure.
    • The superscalar executes a single program or thread, from which it attempts to find multiple instructions to issue each cycle.
    • When it cannot, the issue slots go unused, and it incurs both horizontal waste (some slots in a cycle left empty) and vertical waste (an entire cycle left empty).
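
    The two kinds of waste can be counted with a short Python sketch. Horizontal waste here means slots left empty in a cycle that issued at least one instruction, and vertical waste means all slots of a cycle in which nothing issued. The function name `slot_waste`, the issue width of 4, and the per-cycle issue counts are hypothetical values chosen for illustration, not data from the slides.

```python
def slot_waste(issued_per_cycle, width):
    """Split the issue slots of a superscalar into used, horizontal and vertical waste.

    issued_per_cycle: number of instructions issued in each cycle (0..width).
    width: number of issue slots available per cycle.
    Returns (used, horizontal_waste, vertical_waste), all measured in slots.
    """
    if any(n < 0 or n > width for n in issued_per_cycle):
        raise ValueError("each cycle must issue between 0 and width instructions")
    used = sum(issued_per_cycle)
    # A completely empty cycle wastes every slot: vertical waste.
    vertical = sum(width for n in issued_per_cycle if n == 0)
    # A partly filled cycle wastes only its unused slots: horizontal waste.
    horizontal = sum(width - n for n in issued_per_cycle if 0 < n < width)
    return used, horizontal, vertical


# Hypothetical trace of a 4-wide machine over six cycles.
print(slot_waste([4, 2, 0, 0, 1, 3], width=4))  # (10, 6, 8)
```

    In this made-up trace only 10 of 24 slots do useful work, which is the kind of under-utilisation that motivates the multithreaded designs on the following slides.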

    Multithread superscalar

    • Multithreaded processors contain hardware state (a program counter and registers) for several threads.
    • On any given cycle, the processor executes instructions from one of the threads.
    • On the next cycle, it switches to a different thread context and executes instructions from the new thread.

    Multithread superscalar

    • The primary advantage is better tolerance of long-latency operations, effectively eliminating vertical waste.
    • However, they cannot remove horizontal waste.
    • Consequently, as instruction issue width continues to increase, multithreaded architectures will ultimately suffer the same fate as superscalars:
        • They will be limited by the ILP in a single thread.

    SMT sequence

    • Threads with high ILP can exploit parallelism by selecting instructions. The processor then dynamically schedules machine resources among the instructions, providing the highest hardware utilization.
    • Multiple threads with low ILP can be executed together.
    • So, SMT can recover issue slots lost to both horizontal and vertical waste.
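
    The three issue models described on these slides (single-thread superscalar, multithreading that runs one thread per cycle, and SMT) can be compared with a toy Python model. It rests on strong simplifying assumptions that are not from the slides: each thread is given as a fixed list of "instructions ready this cycle", unissued instructions are not carried over, and the multithreaded machine uses a round-robin choice that skips stalled threads. All function names and the sample traces are hypothetical.

```python
def superscalar_issue(threads, width):
    """Single-threaded superscalar: only thread 0 runs."""
    return sum(min(ready, width) for ready in threads[0])


def fine_grained_issue(threads, width):
    """Multithreaded superscalar: one thread per cycle, round-robin, skipping stalled threads."""
    total = 0
    count = len(threads)
    for cycle in range(len(threads[0])):
        for offset in range(count):
            ready = threads[(cycle + offset) % count][cycle]
            if ready > 0:
                total += min(ready, width)  # slots come from one thread only
                break
    return total


def smt_issue(threads, width):
    """SMT: every cycle, slots may be filled from any mix of threads."""
    cycles = len(threads[0])
    return sum(min(sum(t[c] for t in threads), width) for c in range(cycles))


# Hypothetical per-cycle ready counts for two threads on a 4-wide machine.
t0 = [2, 0, 0, 1, 3, 0]
t1 = [1, 2, 0, 0, 2, 1]
threads = [t0, t1]

print(superscalar_issue(threads, 4))   # 6
print(fine_grained_issue(threads, 4))  # 9
print(smt_issue(threads, 4))           # 11
```

    In this toy trace, switching threads fills cycles in which thread 0 is stalled (vertical waste), while only SMT also fills the leftover slots within a cycle (horizontal waste). The absolute numbers carry no meaning beyond this example.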

    Chip Multiprocessor

    • CMP is a very powerful technique for obtaining more performance in a power-efficient manner.
    • The idea is to place several microprocessors on a single die: a Multiprocessor System-on-Chip (MPSoC).
    • The performance of a small-scale CMP scales close to linearly with the number of microprocessors and is likely to exceed the performance of an equivalent multiprocessor system.

    Chip Multiprocessor model

    • CMP is an attractive option when moving towards a new technology, such as SoC.
    • Typical MPSoC applications are network processors, multimedia hubs, signal processors, etc.
    • Usually implemented as heterogeneous systems.
    • CMP and SMT can coexist: a CMP die can integrate several SMT microprocessors.

    [Figure: Superscalar processor structure]

    [Figure: Multithreading processor structure]

    [Figure: Chip multiprocessor structure]

    Comparison

    Characteristic                                        Superscalar   SMT             CMP
    Number of CPUs                                        1             1               8
    CPU issue width                                       12            12              2 per CPU
    Number of threads                                     1             8               1 per CPU
    Architecture registers (integer and floating point)   32            32 per thread   32 per CPU
    Physical registers (integer and floating point)       32 + 256      256 + 256       32 + 32 per CPU
    Instruction window size                               256           256             32 per CPU
    Branch predictor table size (entries)                 32,768        32,768          8 x 4,096
    Return stack size                                     64 entries    64 entries      8 x 8 entries
    Instruction (I) and data (D) cache organization       1 x 8 banks   1 x 8 banks     1 bank
    I and D cache sizes                                   128 kbytes    128 kbytes      16 kbytes per CPU
    I and D cache associativities                         4-way         4-way           4-way
    I and D cache line sizes (bytes)                      32            32              32
    I and D cache access times (cycles)                   2             2               1
    Secondary cache organization                          1 x 8 banks   1 x 8 banks     1 x 8 banks
    Secondary cache size (Mbytes)                         8             8               8
    Secondary cache associativity                         4-way         4-way           4-way
    Secondary cache line size (bytes)                     32            32              32
    Secondary cache access time (cycles)                  5             5               7
    Secondary cache occupancy per access (cycles)         1             1               1
    Memory organization (no. of banks)                    4             4               4