[537] Flash
Tyler Harter
pages.cs.wisc.edu/~harter/537/lec-24.pdf

Flash vs. Disk

Disk Overview
I/O requires: seek, rotate, transfer

Inherently:
- not parallel (only one head)
- slow (mechanical)
- poor random I/O (locality around disk head)

Random requests take 10ms+

Flash Overview
Flash holds charge in cells. No moving parts!

Inherently parallel.

No seeks!

Disks vs. Flash: Performance
Throughput:
- disk: ~130 MB/s (sequential)
- flash: ~400 MB/s

Latency:
- disk: ~10 ms (one op)
- flash:
  - read: 10-50 us
  - program: 200-500 us
  - erase: 2 ms

(program and erase are both types of write; more on this later…)

Source: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44830.pdf

Disks vs. Flash: Capacity
An obvious question is why are we talking about spinning disks at all, rather than SSDs, which have higher IOPS and are the “future” of storage. The root reason is that the cost per GB remains too high, and more importantly that the growth rates in capacity/$ between disks and SSDs are relatively close (at least for SSDs that have sufficient numbers of program-erase cycles to use in data centers), so that cost will not change enough in the coming decade. We do make extensive use of SSDs, but primarily for high performance workloads and caching, and this helps disks by shifting seeks to SSDs.

~ Eric Brewer et al.

Flash Architecture

SLC: Single-Level Cell
[figure: a NAND cell holds charge; one charge threshold encodes a single bit (1 or 0)]

MLC: Multi-Level Cell
[figure: a NAND cell with four charge levels encodes two bits (00, 01, 10, 11)]

Single- vs. Multi-Level Cell
- SLC: expensive, robust
- MLC: cheap, sensitive

Wearout
Problem: flash cells wear out after being overwritten too many times.
- MLC: ~10K overwrites
- SLC: ~100K overwrites

Usage strategy: wear leveling.
- prevents some cells from wearing out while others are still fresh.
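The details vary by device, but the core idea can be sketched in a few lines; this is a hypothetical illustration (block numbers and erase counts are made up), not how any particular SSD implements it:

# Wear-leveling sketch: always reuse the free block that has been erased the
# fewest times, so no single block wears out far ahead of the others.
erase_counts = {0: 120, 1: 5, 2: 5000, 3: 42}   # block -> lifetime erase count
free_blocks = {0, 1, 3}                          # blocks currently free

def pick_block():
    """Choose the least-worn free block for the next write."""
    return min(free_blocks, key=lambda b: erase_counts[b])

def erase(block):
    """Erasing a block wears it a little more and returns it to the free pool."""
    erase_counts[block] += 1
    free_blocks.add(block)

print(pick_block())   # 1 (only 5 erases so far)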

Banks
Flash devices are divided into banks (aka planes).
Banks can be accessed in parallel.
[figure: Bank 0 through Bank 3; reads are issued to multiple banks and their data returns in parallel]

Flash Writes
Writing 0’s:
- fast, fine-grained [pages]
- called “program”

Writing 1’s:
- slow, coarse-grained [blocks]
- called “erase”

Each bank contains many “blocks”.

[figure: one block, drawn as eight pages, each erased to all 1s (1111 1111); one row is highlighted as “one page”]

Program: flips selected bits of one page from 1 to 0 (in the figure, a page goes from 1111 1111 to 1001 1111, then to 1001 1100; another page is programmed to 1110 0001). A program can only clear bits; it cannot turn a 0 back into a 1.

Erase: resets every page of the block back to 1111 1111.

APIs
       disk           flash
read   read sector    read page
write  write sector   program page (0’s)
                      erase block (1’s)

Flash Hierarchy
Bank/plane: 1024 to 4096 blocks
- banks accessed in parallel

Block: 64 to 256 pages
- unit of erase

Page: 2 to 8 KB
- unit of read and program
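For a sense of scale, here is a quick sketch with hypothetical values picked from the ranges above (not the geometry of any particular device):

blocks_per_bank = 2048    # somewhere in the 1024-4096 range
pages_per_block = 128     # somewhere in the 64-256 range
page_kb = 4               # somewhere in the 2-8 KB range

block_kb = pages_per_block * page_kb                     # unit of erase: 512 KB
bank_gb = blocks_per_bank * block_kb / (1024 * 1024)     # ~1 GB per bank/plane
print(block_kb, bank_gb)   # 512 1.0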

Disk vs. Flash Performance
Throughput:
- disk: ~130 MB/s (sequential)
- flash: ~400 MB/s

Latency:
- disk: ~10 ms (one op)
- flash:
  - read: 10-50 us
  - program: 200-500 us
  - erase: 2 ms

Working with File Systems

Traditional File Systems
[figure: the File System sits on top of a Storage Device]
Traditional API:
- read sector
- write sector
(not the same as the flash API)

Options
1. Build/use new file systems designed for flash
   - JFFS, YAFFS
   - a lot of work!

2. Build the traditional API over the flash API.
   - use FFS, LFS, whatever we want

Traditional API with Flash

read(addr):
    return flash_read(addr)

write(addr, data):
    block_copy = flash_read(all pages of block)
    modify block_copy with data
    flash_erase(block of addr)
    flash_program(all pages of block_copy)
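As a runnable version of this pseudocode, here is a minimal sketch using a hypothetical in-memory flash with 4 pages per block; the class and its names are illustrative, not a real device interface:

PAGES_PER_BLOCK = 4

class Flash:
    """Toy flash: pages are 8-character bit strings; erased pages are all 1s."""
    def __init__(self, num_blocks):
        self.pages = ["1" * 8 for _ in range(num_blocks * PAGES_PER_BLOCK)]

    def read(self, page):
        return self.pages[page]

    def program(self, page, data):
        # programming can only clear bits (1 -> 0), never set them
        assert all(new == "0" or old == "1"
                   for old, new in zip(self.pages[page], data))
        self.pages[page] = data

    def erase(self, block):
        start = block * PAGES_PER_BLOCK
        for p in range(start, start + PAGES_PER_BLOCK):
            self.pages[p] = "1" * 8

def sector_write(flash, addr, data):
    """Emulate a disk-style write-sector using read/erase/program."""
    block = addr // PAGES_PER_BLOCK
    first = block * PAGES_PER_BLOCK
    block_copy = [flash.read(p) for p in range(first, first + PAGES_PER_BLOCK)]
    block_copy[addr - first] = data          # modify the target page in memory
    flash.erase(block)                       # whole block back to all 1s
    for i, page in enumerate(block_copy):    # re-program every page
        flash.program(first + i, page)

flash = Flash(num_blocks=3)
sector_write(flash, 0, "00000001")
print(flash.read(0))   # 00000001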

[figure: memory above, flash below; the flash has three blocks of four pages each:
 block 0 = 0000 0011 1100 1111, block 1 = 1111 1111 1111 1111, block 2 = 0001 1111 1111 1111]

The FS wants to write 0001 over the first page of block 0:
1. read all pages of block 0 into memory     (memory: 0000 0011 1100 1111)
2. modify the target page in memory          (memory: 0001 0011 1100 1111)
3. erase block 0                             (flash block 0: 1111 1111 1111 1111)
4. program all pages of block 0 from memory  (flash block 0: 0001 0011 1100 1111)

Write Amplification
Random writes are extremely expensive!

Writing one 2KB page may cause:
- a read, erase, and program of a 256KB block.
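A quick back-of-the-envelope number for the example above:

page_kb = 2       # the write the file system asked for
block_kb = 256    # what actually gets read, erased, and re-programmed

print(block_kb / page_kb)   # 128.0: one small random write can move 128x its size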

Would FFS or LFS be better with flash?

File Systems over Flash
A copy-on-write FS may prevent some expensive random writes.

What about wear leveling? LFS won’t do this.

What if we want to use some other FS?

Flash Translation Layer

Better Solution
Add a copy-on-write layer between the FS and the flash.
This avoids the RMW (read-modify-write) cycle.

Translate logical device addresses to physical addresses.

FTL: Flash Translation Layer.

Should translation use math or a data structure?
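Here is a minimal sketch of the data-structure answer: a page-level mapping table that redirects every overwrite to a fresh page (a toy model with a dict standing in for the flash; illustrative only):

class PageMappedFTL:
    """Toy page-mapped FTL: logical pages are remapped on every write."""
    def __init__(self, num_physical_pages):
        self.store = {}                               # physical page -> data (stand-in for flash)
        self.mapping = {}                             # logical page -> physical page
        self.free = list(range(num_physical_pages))   # erased, programmable pages
        self.garbage = set()                          # invalid pages awaiting GC

    def read(self, logical):
        return self.store[self.mapping[logical]]

    def write(self, logical, data):
        phys = self.free.pop(0)          # always program a fresh page: no read-modify-write
        self.store[phys] = data
        old = self.mapping.get(logical)
        if old is not None:
            self.garbage.add(old)        # the old copy is now stale
        self.mapping[logical] = phys

ftl = PageMappedFTL(num_physical_pages=8)
ftl.write(0, "1101")
ftl.write(0, "0110")      # overwrite lands on a new physical page
print(ftl.read(0))        # 0110
print(ftl.garbage)        # {0}: the first copy must eventually be garbage collected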

Flash Translation Layer
[figure: 8 physical pages in two blocks
 (block 0 = 0001 0010 0011 0000, block 1 = 1001 1111 1111 1111),
 with a logical-to-physical map above]

Write 1101: instead of a read-modify-write of the old block, the FTL programs 1101 into a free page of block 1 and remaps the logical address to it. The old physical page now holds stale data and must eventually be garbage collected.

FTL
Could be implemented as a device driver or in firmware (usually the latter).

Where to store the mapping? SRAM.

Physical pages can be in three states:
- valid, invalid, free

States
[figure: page state machine]
- free -> valid: program
- valid -> invalid: relocate (the FTL has written a newer copy elsewhere) or TRIM
- invalid -> free: erase

Why is file delete problematic? After a delete, the device has no way to know the deleted pages are now garbage.

TRIM: SSD-aware file systems can tell the device that a page is garbage.
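The same state machine, written out as a tiny lookup table (illustrative; real FTLs track this per page in their own metadata):

from enum import Enum

class PageState(Enum):
    FREE = "free"
    VALID = "valid"
    INVALID = "invalid"

# (current state, operation) -> next state
TRANSITIONS = {
    (PageState.FREE, "program"):   PageState.VALID,
    (PageState.VALID, "relocate"): PageState.INVALID,   # FTL wrote a newer copy elsewhere
    (PageState.VALID, "trim"):     PageState.INVALID,   # FS declared the page garbage
    (PageState.INVALID, "erase"):  PageState.FREE,      # erasing the block frees it
}

print(TRANSITIONS[(PageState.FREE, "program")])   # PageState.VALID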

SSD Architecture
[figure: flash banks behind an FTL, with the mapping table held in SRAM; to the host, the SSD looks like a disk]

Problem: Big Mapping Table
Assume a 200GB device, 2KB pages, and 4-byte mapping entries.

SRAM needed: (200GB / 2KB) * 4 bytes = 400 MB.

Too big; SRAM is expensive!
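The arithmetic, spelled out (the slide rounds the result to 400 MB):

device_bytes = 200 * 10**9     # 200 GB device
page_bytes = 2 * 1024          # 2 KB pages
entry_bytes = 4                # one 4-byte mapping entry per page

entries = device_bytes // page_bytes
print(entries, entries * entry_bytes / 10**6)   # ~97.7 million entries, ~391 MB of SRAM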

Page Translations
[figure: 8 physical pages (block 0 = 0001 0010 0011 0000, block 1 = 1001 0101 1111 1111); each logical page has its own mapping entry]

2-Page Translations
[figure: the same flash, but each mapping entry now covers two consecutive pages, halving the table size]

Larger Mappings
Advantage: larger mappings decrease table size.

Disadvantage?

2-Page Translations
[figure: write 1011 to one page of a 2-page mapping unit]
Because the mapping covers two pages, the unmodified partner page (0101) must be copied to the new location along with the new data (1011), and both old pages become garbage.

Larger Mappings
Advantage: larger mappings decrease table size.

Disadvantages?
- more read-modify-write updates
- more garbage
- less flexibility for placement

Hybrid FTL
Use coarse-grained mapping for most (e.g., 95%) of the data. Map at the block level.

Use fine-grained mapping for recent data. Map at the page level.

Hybrid FTL
[figure: logical pages A B C D W X Y Z stored physically as A B C Z W X Y D;
 a coarse block map covers most pages, while the recently written pages are
 tracked in a fine-grained page map: logical 3 -> physical 7, logical 7 -> physical 3]
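A minimal sketch of the hybrid lookup, assuming 4 pages per block; the page-map entries mirror the figure, while the block map values are just illustrative:

PAGES_PER_BLOCK = 4

page_map = {3: 7, 7: 3}      # fine-grained: recently written logical pages
block_map = {0: 0, 1: 1}     # coarse-grained: one entry per logical block (values illustrative)

def translate(logical_page):
    if logical_page in page_map:               # recent data: exact page known
        return page_map[logical_page]
    block, offset = divmod(logical_page, PAGES_PER_BLOCK)
    return block_map[block] * PAGES_PER_BLOCK + offset   # everything else: block granularity

print(translate(3))   # 7  (page-map hit)
print(translate(1))   # 1  (block map: logical block 0 -> physical block 0, offset 1)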

Strategy: Log Blocks
Write changed pages to designated log blocks.

After log blocks become full, merge the changes with the old data.

Eventually garbage collect the old pages.

Merging

Merging technique depends on the I/O pattern.

Three merge types:
- full merge
- partial merge
- switch merge

Full merge example:
[figure: block 0 holds pages A B C D; block 1 is an empty log block; block 2 is empty]
- write D2: the new version of D is appended to the first free page of log block 1, tracked by an expensive page-level mapping
- eventually we need to get rid of these page-level mappings
- COPY: A, B, and C (from block 0) and D2 (from the log block) are copied into block 2
- block 0 and the log block now contain only garbage

Partial merge example:
[figure: block 0 holds pages A B C D; block 1 is an empty log block]
- write D2: this time D2 is placed in the last page of the log block, the same offset D occupies in block 0 (why was this a better pick?)
- A, B, and C are copied from block 0 into the first three pages of the log block, which becomes A B C D2
- only the unmodified pages were copied; block 0 now contains only garbage

Switch merge example:
[figure: block 0 holds pages A B C D; block 1 is an empty log block]
- write A2, then B2, then C2, then D2: the FS writes the whole block sequentially, and each new page lands in the log block at the same offset as the page it replaces
- the log block (A2 B2 C2 D2) simply becomes the new block; nothing is copied
- block 0 now contains only garbage

Merging
Merging technique depends on the I/O pattern.

Three merge types:
- full merge [copy unmodified and modified]
- partial merge [copy unmodified]
- switch merge [no copy]

Why not always do a partial or switch merge?

Summary
Flash is much faster than disk, but…

It is more expensive.

It is not a drop-in replacement beneath a file system without a complex layer (the FTL) emulating the hard-disk API.
