25
CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 1 コンピュータアーキテクチャ 演習 の補足 情報工学系 吉瀬謙二 Kenji Kise, Department co Computer Science kise_at_c.titech.ac.jp Ver. 2020-10-12a 2020年度(令和2年)版 Course number: CSC.T363 Visit our support page https://www.arch.cs.titech.ac.jp/lecture/CA/

CSC.T363 Computer Architecture...2020/10/13  · CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 3 UART (Universal Asynchronous Receiver/Transmitter) •

  • Upload
    others

  • View
    17

  • Download
    0

Embed Size (px)

Citation preview

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 1

    コンピュータアーキテクチャ 演習の補足

    情報工学系 吉瀬謙二

    Kenji Kise, Department co Computer Sciencekise_at_c.titech.ac.jp

    Ver. 2020-10-12a2020年度(令和2年)版

    Course number: CSC.T363

    Visit our support page https://www.arch.cs.titech.ac.jp/lecture/CA/

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 2

    【重要】 ACRiルームのサーバの予約

    • ACRiルームのアカウントを使って、次のURLからログインする。

    • https://gw.acri.c.titech.ac.jp/

    • 「予約ページトップ」から、vs で始まるサーバの 10月13日(火) 12:00~15:00 の枠と、15:00~18:00の枠を予約すること。

    • 同じサーバーを連続して予約すること。

    • 12:00開始の枠は、30分前の 11:30 までに予約する必要があるので注意。

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 3

    UART (Universal Asynchronous Receiver/Transmitter)

    • 調歩同期方式によるシリアル信号をパラレル信号に変換したり,その逆方向の変換をおこなう集積回路をUARTと呼ぶ.8ビット(1バイト)単位でデータを送信・受信する.

    • UARTを用いることで,FPGAとコンピュータの間でのお手軽なデータ通信が可能.

    • 例えば,’a’ という文字を送信する場合,’a’ は 8’h61, 8’b01100001 (次スライドのASCII Tableを参照)なので,下図のタイミングで送信線TXDを制御する.

    • データが送信されるまで送信線TXD を1とする.

    • まず,青色で示した0 (これをスタートビットと呼ぶ)を送信することで,データ送信の開始を明示.

    • 次に,黄色で示した様に送信したいデータ 8’b01100001 の最下位ビットから順番に送信する.

    • 最後に,赤色で示した1(これをストップビットと呼ぶ)を送信する.

    • 1ビットを送受信するための時間間隔は送信側と受信側で同じレートを用いる.これをボー・レート (baud) と呼ぶ.例えば、1000 baud であれば、1ビット送信の間隔は 1msec となる.

    3

    0 1 0 0 0 0 1 1 0

    Time

    1 1

    Startbit

    TXD

    Stopbit

    1

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 4

    Byte Addresses

    • Since 8-bit bytes are so useful, most architectures address individual bytes in memory

    • The memory address of a word must be a multiple of 4 (alignment restriction)

    • Big Endian:

    • Leftmost byte is word address

    • IBM 360/370, Motorola 68k, MIPS, SPARC, HP PA

    • Little Endian:

    • Rightmost byte is word address

    • Intel 80x86, DEC Vax, DEC Alpha (Windows NT)

    msb lsb

    3 2 1 0little endian byte 0

    0 1 2 3big endian byte 0

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 5

    コンピュータアーキテクチャComputer Architecture

    2. コンピュータの性能と消費電力の動向Trends in Performance and Power

    Ver. 2020-10-05a2020年度(令和2年)版

    Course number: CSC.T363

    www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00

    吉瀬 謙二 情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 6

    Performance Factors

    Want to distinguish elapsed time and the time spent on our task

    CPU execution time (CPU time) : time the CPU spends working on a task

    Does not include time waiting for I/O or running other programs

    CPU execution time # CPU clock cycles

    for a program for a program= x clock cycle time

    CPU execution time # CPU clock cycles for a program

    for a program clock rate = -------------------------------------------

    ◼ Can improve performance by reducing either the length of the clock cycleor the number of clock cycles required for a program

    or

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 7

    Performance Factors

    Performance = f x IPCf: frequency (clock rate)

    IPC: retired instructions per cycle

    CPU execution time # CPU clock cycles for a program

    for a program clock rate = -------------------------------------------

    int flag = 1;

    int foo(){

    while(flag);

    }

    Performance = clock rate x 1 / # CPU clock cycles for a program

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 8

    Pollack’s Rule

    • Pollack's Rule states that microprocessor "performance increase due to microarchitecture advances is roughly proportional to the square root of the increase in complexity". Complexity in this context means processor logic, i.e. its area.

    • Superscalar, vector

    • Instruction level parallelism, data level parallelism

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 9

    From multi-core era to many-core era

    Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, MICRO-36

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 10

    Power within a Microprocessor

    • Powerdynamic = 1/2 x Capacitive load x Voltage 2 x Frequency switched

    • Pdynamic = 1/2 x C x V2 x F

    • Power required per transistor

    • The first 32-bit microprocessors like Intel 80386 consumed less than two watt.

    • 3.3GHz Intel Core i7 consumes 130 watts.

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 11

    From multi-core era to many-core era

    Platform 2015: Intel® Processor and Platform Evolution for the Next Decade, 2005

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 12

    Intel Sandy Bridge, January 2011

    4 to 8 core

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 13

    アーキテクチャの異なる視点による分類

    Flynnによる命令とデータの流れに注目した並列計算機

    の分類(1966年)

    SISD (Single Instruction stream, Single Data stream)

    SIMD( Single Instruction stream, Multiple Data stream)

    MISD (Multiple Instruction stream, Single Data stream)

    MIMD (Multiple Instruction stream, Multiple Data stream)

    Instruction stream

    Data stream

    SISD SIMD MISD MIMD

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 14

    アーキテクチャの異なる視点による分類

    Flynnによる命令とデータの流れに注目した並列計算機

    の分類(1966年)

    SISD (Single Instruction stream, Single Data stream)

    SIMD( Single Instruction stream, Multiple Data stream)

    MISD (Multiple Instruction stream, Single Data stream)

    MIMD (Multiple Instruction stream, Multiple Data stream)

    MIMD

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 15

    コンピュータアーキテクチャComputer Architecture

    3. 半導体メモリMemory Technologies

    Ver. 2020-10-12a

    Course number: CSC.T363

    吉瀬 謙二 情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp

    2020年度(令和2年)版

    www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 16

    参考資料

    • MIG を使って DRAM メモリを動かそう (1)

    • https://www.acri.c.titech.ac.jp/wordpress/archives/6048

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 17

    コンピュータの古典的な要素

    出力制御

    データパス

    記憶

    入力

    出力

    プロセッサ

    コンピュータ

    インタフェース

    コンパイラ

    性能の評価

    Instruction Set Architecture (ISA), 命令セットアーキテクチャ

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 18

    DRAM (dynamic random access memory)

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 19

    Processor-Memory(DRAM) Performance Gap

    1

    10

    100

    1000

    10000

    1980

    1983

    1986

    1989

    1992

    1995

    1998

    2001

    2004

    Year

    Perf

    orm

    an

    ce

    “Moore’s Law”

    Processor

    55%/year

    (2X/1.5yr)

    DRAM

    7%/year

    (2X/10yrs)

    Processor-Memory

    Performance Gap

    (grows 50%/year)

    The “Memory Wall”

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 20

    The Memory System’s Fact and Goal

    Fact: Large memories are slow and fast memories are small

    How do we create a memory that gives the illusionof being large, cheap and fast ?

    With hierarchy (階層)

    With parallelism (並列性)

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 21

    A Typical Memory Hierarchy

    Second

    Level

    Cache

    (SRAM)

    Control

    Datapath

    Secondary

    Memory

    (Disk)

    On-Chip Components

    RegF

    ile

    Main

    Memory

    (DRAM)Da

    ta

    Cache

    Instr

    Ca

    ch

    e

    ITL

    BD

    TL

    B

    Speed (%cycles): ½’s 1’s 10’s 100’s 1,000’s

    Size (bytes): 100’s K’s 10K’s M’s G’s to T’s

    Cost: highest lowest

    ❑ By taking advantage of the principle of locality (局所性)

    Present much memory in the cheapest technology

    at the speed of fastest technology

    TLB: Translation Lookaside Buffer

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 22

    Cache

    • Cache memory consists of a small, fast memory that acts as a buffer for the large memory.

    • The nontechnical definition of cache is a safe place for hiding things.

    Intel Core 2 Duo

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 23

    Intel Sandy Bridge, January 2011

    Processor

    Main memoryDisk

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 24

    Characteristics of the Memory Hierarchy

    Increasing

    distance from

    the processor

    in access

    time

    L1$

    L2$

    Main Memory

    Secondary Memory

    Processor

    (Relative) size of the memory at each level

    Inclusive (包括)–what is in L1$ is a

    subset of what is in

    L2$.

    L2$ is a subset of

    what is in MM.

    MM is a subset of

    is in SM.

    4-8 bytes (word)

    1 to 4 blocks

    1,024+ bytes (disk sectors = page)

    8-32 bytes (block)

  • CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 25

    Memory Hierarchy Technologies

    ◼ Caches use SRAM (static random access memory) for speed and technology compatibility◼ Low density (6 transistor cells),

    high power, expensive, fast

    ◼ Static: content will last “forever” (until power turned off)

    ◼ Main Memory uses DRAM for size (density)◼ High density (1 transistor cells), low power, cheap, slow

    ◼ Dynamic: needs to be “refreshed” regularly (~ every 8 ms)

    ◼ 1% to 2% of the active cycles of the DRAM

    ◼ Addresses divided into 2 halves (row and column)

    ◼ RAS or Row Access Strobe triggering row decoder

    ◼ CAS or Column Access Strobe triggering column selector

    Dout[15-0]

    SRAM

    2M x 16

    Din[15-0]

    Address

    Chip select

    Output enable

    Write enable

    16

    16

    21