18
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 1 コンピュータアーキテクチャ 演習 の補足 情報工学系 吉瀬謙二 Kenji Kise, Department co Computer Science kise_at_c.titech.ac.jp Ver. 2020-10-12a 2020年度(令和2年)版 Course number: CSC.T363 Visit our support page https://www.arch.cs.titech.ac.jp/lecture/CA/

CSC.T363 Computer Architecture...2020/10/16  · CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 1 コンピュータアーキテクチャ演習の補足

  • Upload
    others

  • View
    21

  • Download
    0

Embed Size (px)

Citation preview

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 1

    コンピュータアーキテクチャ 演習の補足

    情報工学系 吉瀬謙二

    Kenji Kise, Department co Computer Sciencekise_at_c.titech.ac.jp

    Ver. 2020-10-12a2020年度(令和2年)版

    Course number: CSC.T363

    Visit our support page https://www.arch.cs.titech.ac.jp/lecture/CA/

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 2

    【重要】 ACRiルームのサーバの予約

    • ACRiルームのアカウントを使って、次のURLからログインする。

    • https://gw.acri.c.titech.ac.jp/

    • 「予約ページトップ」から、vs で始まるサーバの 10月20日(火) 12:00~15:00 の枠と、15:00~18:00の枠を予約すること。

    • 同じサーバーを連続して予約すること。

    • 12:00開始の枠は、30分前の 11:30 までに予約する必要があるので注意。

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 3

    コンピュータアーキテクチャComputer Architecture

    3. 半導体メモリMemory Technologies

    Ver. 2020-10-13a

    Course number: CSC.T363

    吉瀬 謙二 情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp

    2020年度(令和2年)版

    www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 4

    Memory Hierarchy Technologies

    ◼ Caches use SRAM (static random access memory) for speed and technology compatibility◼ Low density (6 transistor cells),

    high power, expensive, fast

    ◼ Static: content will last “forever” (until power turned off)

    ◼ Main Memory uses DRAM for size (density)◼ High density (1 transistor cells), low power, cheap, slow

    ◼ Dynamic: needs to be “refreshed” regularly (~ every 8 ms)

    ◼ 1% to 2% of the active cycles of the DRAM

    ◼ Addresses divided into 2 halves (row and column)

    ◼ RAS or Row Access Strobe triggering row decoder

    ◼ CAS or Column Access Strobe triggering column selector

    Dout[15-0]

    SRAM

    2M x 16

    Din[15-0]

    Address

    Chip select

    Output enable

    Write enable

    16

    16

    21

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 5

    Row

    Decoder

    rowaddress

    data bit or word

    RAM CellArray

    word (row) line

    bit (data) lines

    Each intersection

    represents a

    6-T SRAM cell or a

    1-T DRAM cell

    Column Selector &

    I/O Circuitscolumn

    address

    One memory row holds a block of

    data, so the column address selects

    the requested bit or word from that

    block

    Classical RAM Organization (~Square)

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 6

    data bitdata bit

    Row

    Decoder

    rowaddress

    Column Selector &I/O Circuits

    columnaddress

    data bit

    word (row) line

    bit (data) lines

    Each intersection

    represents a

    1-T DRAM cell

    The column addressselects the requested bit from the row in each plane

    RAM CellArray

    Classical RAM Organization (~Square Planes)

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 7

    Asynchronous (非同期式) SRAMメモリ

    Datasheet

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 8

    • DRAM Organization:

    • N rows x N column x M-bit

    • Read or Write M-bit at a time

    • Each M-bit access requiresa RAS / CAS cycle

    Row Address

    CAS

    RAS

    Col Address Row Address Col Address

    N r

    ow

    s

    N cols

    DRAM

    M bit planes

    Row

    Address

    Column

    Address

    M-bit Output

    1st M-bit Access 2nd M-bit Access

    Cycle Time

    Classical DRAM Operation

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 9

    N r

    ow

    s

    N cols

    DRAM

    Column Address

    M-bit Output

    M bit planesN x M SRAM

    Row

    Address

    • Page Mode DRAM

    • N x M SRAM to save a row

    • After a row is read into the SRAM “register”• Only CAS is needed to access other

    M-bit words on that row

    • RAS remains asserted while CAS is toggled

    Row Address

    CAS

    RAS

    Col Address Col Address Col Address Col Address

    1st M-bit Access 2nd M-bit 3rd M-bit 4th M-bit

    Cycle Time

    Page Mode DRAM Operation

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 10N

    row

    s

    N cols

    DRAM

    Column Address

    M-bit Output

    M bit planes

    Row

    Address

    ❑ After a row is read into the SRAM

    register

    Inputs CAS as the starting “burst”

    address along with a burst length

    Transfers a burst of data from a series of

    sequential addresses within that row

    +1

    Row Address

    CAS

    RAS

    Col Address

    1st M-bit Access 2nd M-bit 3rd M-bit 4th M-bit

    Cycle Time

    Row Add

    N x M SRAM

    Synchronous DRAM (SDRAM) Operation

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 11

    • Double Data Rate SDRAMs – DDR-SDRAMs (and DDR-SRAMs)

    • Double data rate because they transfer data on both the rising and falling edge of the clock

    • Are the most widely used form of SDRAMs

    • DDR2-SDRAMs

    • DDR3-SDRAMs

    Other DRAM Architectures

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 12

    DRAM Memory Latency & Bandwidth Milestones

    DRAM Page

    DRAM

    FastPage

    DRAM

    FastPage

    DRAM

    Synch

    DRAM

    DDR

    SDRAM

    Module Width 16b 16b 32b 64b 64b 64b

    Year 1980 1983 1986 1993 1997 2000

    Mb/chip 0.06 0.25 1 16 64 256

    Die size (mm2) 35 45 70 130 170 204

    Pins/chip 16 16 18 20 54 66

    BWidth (MB/s) 13 40 160 267 640 1600

    Latency (nsec) 225 170 125 75 62 52

    In the time that the memory to processor bandwidth doubles the memory latency improves by a factor of only 1.2 to 1.4

    To deliver such high bandwidth, the internal DRAM has to be organized as interleaved memory banks

    Patterson, CACM Vol 47, #10, 2004

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 13

    コンピュータアーキテクチャComputer Architecture

    4. キャッシュ:ダイレクトマップ方式Caches: Direct-Mapped

    Ver. 2020-10-13a

    Course number: CSC.T363

    吉瀬 謙二 情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp

    www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00

    2020年度(令和2年)版

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 14

    A Typical Memory Hierarchy

    Second

    Level

    Cache

    (SRAM)

    Control

    Datapath

    Secondary

    Memory

    (Disk)

    On-Chip Components

    Re

    gF

    ile

    Main

    Memory

    (DRAM)Data

    Ca

    ch

    eIn

    str

    Ca

    ch

    e

    ITL

    BD

    TL

    B

    Speed (%cycles): ½’s 1’s 10’s 100’s 1,000’s

    Size (bytes): 100’s K’s 10K’s M’s G’s to T’s

    Cost: highest lowest

    ❑ By taking advantage of the principle of locality (局所性)

    Present much memory in the cheapest technology

    at the speed of fastest technology

    TLB: Translation Lookaside Buffer

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 15

    パレートの法則

    • Vilfredo Federico Damaso Pareto

    • イタリアの経済学者(1948 – 1923)

    • パレートの法則

    • 全体の数値の大部分は,全体を構成するうちの一部の要素が生み出している

    • 80:20の法則

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 16

    The Memory Hierarchy: Why Does it Work?

    • Temporal Locality (時間的局所性,Locality in Time):

    Keep most recently accessed data items closer to the processor

    • Spatial Locality (空間的局所性,Locality in Space):

    Move blocks consisting of contiguous words to the upper levels

    Lower LevelMemoryUpper Level

    MemoryTo Processor

    From ProcessorBlock X

    Block Y

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 17

    Cache

    • Two questions to answer (in hardware):

    • Q1: How do we know if a data item is in the cache?

    • Q2: If it is, how do we find it?

    • Direct mapped

    • For each item of data at the lower level, there is exactly one location in the cache where it might be - so lots of items at the lower level must share locations in the upper level

    • Address mapping:(block address) modulo (# of blocks in the cache)

    • First, consider block sizes of one word

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 18

    Caching: A Simple First Example

    00

    01

    10

    11

    Cache

    0000xx

    0001xx

    0010xx

    0011xx

    0100xx

    0101xx

    0110xx

    0111xx

    1000xx

    1001xx

    1010xx

    1011xx

    1100xx

    1101xx

    1110xx

    1111xx

    Main Memory

    Tag Data

    Q1: Is it there?

    Compare the cache tagto the high order 2 memory address bitsto tell if the memory block is in the cache

    ValidTwo low order bits define the byte in the word (32-b words)

    Q2: How do we find it?

    Use next 2 low order memory address bits –the index – to determine which cache block

    (block address) modulo (# of blocks in the cache)

    Index