CSC.T363 Computer Architecture...2020/10/13 · CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 3 UART (Universal Asynchronous Receiver/Transmitter) •

CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 1

コンピュータアーキテクチャ演習の補足

情報工学系吉瀬謙二

Kenji Kise, Department co Computer Sciencekise_at_c.titech.ac.jp

Ver. 2020-10-12a2020年度（令和2年）版

Course number: CSC.T363

Visit our support page https://www.arch.cs.titech.ac.jp/lecture/CA/


【重要】 ACRiルームのサーバの予約

• ACRiルームのアカウントを使って、次のURLからログインする。

• https://gw.acri.c.titech.ac.jp/

• 「予約ページトップ」から、vs で始まるサーバの 10月13日(火) 12:00～15:00 の枠と、15:00～18:00の枠を予約すること。

• 同じサーバーを連続して予約すること。

• 12:00開始の枠は、30分前の 11:30 までに予約する必要があるので注意。


UART (Universal Asynchronous Receiver/Transmitter)

• 調歩同期方式によるシリアル信号をパラレル信号に変換したり，その逆方向の変換をおこなう集積回路をUARTと呼ぶ．8ビット（1バイト）単位でデータを送信・受信する．

• UARTを用いることで，FPGAとコンピュータの間でのお手軽なデータ通信が可能．

• 例えば，’a’ という文字を送信する場合，’a’ は 8’h61, 8’b01100001 （次スライドのASCII Tableを参照）なので，下図のタイミングで送信線TXDを制御する．

• データが送信されるまで送信線TXD を1とする．

• まず，青色で示した0 （これをスタートビットと呼ぶ）を送信することで，データ送信の開始を明示．

• 次に，黄色で示した様に送信したいデータ 8’b01100001 の最下位ビットから順番に送信する．

• 最後に，赤色で示した1（これをストップビットと呼ぶ）を送信する．

• 1ビットを送受信するための時間間隔は送信側と受信側で同じレートを用いる．これをボー・レート (baud) と呼ぶ．例えば、1000 baud であれば、１ビット送信の間隔は 1msec となる．

3

0 1 0 0 0 0 1 1 0

Time

1 1

Startbit

TXD

Stopbit

1


Byte Addresses

• Since 8-bit bytes are so useful, most architectures address individual bytes in memory

• The memory address of a word must be a multiple of 4 (alignment restriction)

• Big Endian:

• Leftmost byte is word address

• IBM 360/370, Motorola 68k, MIPS, SPARC, HP PA

• Little Endian:

• Rightmost byte is word address

• Intel 80x86, DEC Vax, DEC Alpha (Windows NT)

msb lsb

3 2 1 0little endian byte 0

0 1 2 3big endian byte 0


コンピュータアーキテクチャComputer Architecture

2. コンピュータの性能と消費電力の動向Trends in Performance and Power

Ver. 2020-10-05a2020年度（令和2年）版


www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00

吉瀬謙二情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp


Performance Factors

Want to distinguish elapsed time and the time spent on our task

CPU execution time (CPU time) : time the CPU spends working on a task

Does not include time waiting for I/O or running other programs

CPU execution time # CPU clock cycles

for a program for a program= x clock cycle time

CPU execution time # CPU clock cycles for a program

for a program clock rate = -------------------------------------------

◼ Can improve performance by reducing either the length of the clock cycleor the number of clock cycles required for a program

or


Performance Factors

Performance = f x IPCf: frequency (clock rate)

IPC: retired instructions per cycle

CPU execution time # CPU clock cycles for a program

for a program clock rate = -------------------------------------------

int flag = 1;

int foo(){

while(flag);

}

Performance = clock rate x 1 / # CPU clock cycles for a program


Pollack’s Rule

• Pollack's Rule states that microprocessor "performance increase due to microarchitecture advances is roughly proportional to the square root of the increase in complexity". Complexity in this context means processor logic, i.e. its area.

• Superscalar, vector

• Instruction level parallelism, data level parallelism


From multi-core era to many-core era

Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction, MICRO-36


Power within a Microprocessor

• Powerdynamic = 1/2 x Capacitive load x Voltage 2 x Frequency switched

• Pdynamic = 1/2 x C x V2 x F

• Power required per transistor

• The first 32-bit microprocessors like Intel 80386 consumed less than two watt.

• 3.3GHz Intel Core i7 consumes 130 watts.


From multi-core era to many-core era

Platform 2015: Intel® Processor and Platform Evolution for the Next Decade, 2005


Intel Sandy Bridge, January 2011

4 to 8 core


アーキテクチャの異なる視点による分類

Flynnによる命令とデータの流れに注目した並列計算機

の分類（1966年）

SISD (Single Instruction stream, Single Data stream)

SIMD( Single Instruction stream, Multiple Data stream)

MISD (Multiple Instruction stream, Single Data stream)

MIMD (Multiple Instruction stream, Multiple Data stream)

Instruction stream

Data stream

SISD SIMD MISD MIMD


アーキテクチャの異なる視点による分類

Flynnによる命令とデータの流れに注目した並列計算機

の分類（1966年）

SISD (Single Instruction stream, Single Data stream)

SIMD( Single Instruction stream, Multiple Data stream)

MISD (Multiple Instruction stream, Single Data stream)

MIMD (Multiple Instruction stream, Multiple Data stream)

MIMD


コンピュータアーキテクチャComputer Architecture

3. 半導体メモリMemory Technologies

Ver. 2020-10-12a


吉瀬謙二情報工学系Kenji Kise, Department of Computer Sciencekise _at_ c.titech.ac.jp

2020年度（令和2年）版

www.arch.cs.titech.ac.jp/lecture/CA/Tue 14:20 - 16:00, 16:15 - 17:55Fri 14:20 - 16:00


参考資料

• MIG を使って DRAM メモリを動かそう (1)

• https://www.acri.c.titech.ac.jp/wordpress/archives/6048


コンピュータの古典的な要素

出力制御

データパス

記憶

入力

出力

プロセッサ

コンピュータ

インタフェース

コンパイラ

性能の評価

Instruction Set Architecture (ISA), 命令セットアーキテクチャ


DRAM (dynamic random access memory)


Processor-Memory(DRAM) Performance Gap

1

10

100

1000

10000

1980

1983

1986

1989

1992

1995

1998

2001

2004

Year

Perf

orm

an

ce

“Moore’s Law”

Processor

55%/year

(2X/1.5yr)

DRAM

7%/year

(2X/10yrs)

Processor-Memory

Performance Gap

(grows 50%/year)

The “Memory Wall”


The Memory System’s Fact and Goal

Fact: Large memories are slow and fast memories are small

How do we create a memory that gives the illusionof being large, cheap and fast ?

With hierarchy （階層）

With parallelism （並列性）


A Typical Memory Hierarchy

Second

Level

Cache

(SRAM)

Control

Datapath

Secondary

Memory

(Disk)

On-Chip Components

RegF

ile

Main

Memory

(DRAM)Da

ta

Cache

Instr

Ca

ch

e

ITL

BD

TL

B

Speed (%cycles): ½’s 1’s 10’s 100’s 1,000’s

Size (bytes): 100’s K’s 10K’s M’s G’s to T’s

Cost: highest lowest

❑ By taking advantage of the principle of locality （局所性）

Present much memory in the cheapest technology

at the speed of fastest technology

TLB: Translation Lookaside Buffer


Cache

• Cache memory consists of a small, fast memory that acts as a buffer for the large memory.

• The nontechnical definition of cache is a safe place for hiding things.

Intel Core 2 Duo


Intel Sandy Bridge, January 2011

Processor

Main memoryDisk


Characteristics of the Memory Hierarchy

Increasing

distance from

the processor

in access

time

L1$

L2$

Main Memory

Secondary Memory

Processor

(Relative) size of the memory at each level

Inclusive (包括）–what is in L1$ is a

subset of what is in

L2$.

L2$ is a subset of

what is in MM.

MM is a subset of

is in SM.

4-8 bytes (word)

1 to 4 blocks

1,024+ bytes (disk sectors = page)

8-32 bytes (block)


Memory Hierarchy Technologies

◼ Caches use SRAM (static random access memory) for speed and technology compatibility◼ Low density (6 transistor cells),

high power, expensive, fast

◼ Static: content will last “forever” (until power turned off)

◼ Main Memory uses DRAM for size (density)◼ High density (1 transistor cells), low power, cheap, slow

◼ Dynamic: needs to be “refreshed” regularly (~ every 8 ms)

◼ 1% to 2% of the active cycles of the DRAM

◼ Addresses divided into 2 halves (row and column)

◼ RAS or Row Access Strobe triggering row decoder

◼ CAS or Column Access Strobe triggering column selector

Dout[15-0]

SRAM

2M x 16

Din[15-0]

Address

Chip select

Output enable

Write enable

16

16

21

Documents

CSC.T363 Computer Architecture...2020/10/13 · CSC.T363 Computer Architecture, Department of Computer Science, TOKYO TECH 3 UART (Universal Asynchronous Receiver/Transmitter) •