Better I/O Through Byte-Addressable, Persistent Memory Better I/O Through Byte-Addressable, Persistent

  • View
    0

  • Download
    0

Embed Size (px)

Text of Better I/O Through Byte-Addressable, Persistent Memory Better I/O Through Byte-Addressable,...

  • Better I/O Through Byte-Addressable, Persistent Memory

    Jeremy Condit, Ed Nightingale, Chris Frost,

    Engin Ipek, Ben Lee, Doug Burger, Derrick Coetzee

  • A New World of Storage

    + Fast + Byte-addressable - Volatile

    + Non-volatile - Slow - Block-addressable

    Disk / Flash

    DRAM

    2

  • A New World of Storage

    BPRAM

    Byte-addressable, Persistent RAM

    3

    + Fast + Byte-addressable + Non-volatile

  • A New World of Storage

    BPRAM

    How do we build fast, reliable systems with BPRAM?

    Byte-addressable, Persistent RAM

    4

    + Fast + Byte-addressable + Non-volatile

  • Phase Change Memory

    • Most promising form of BPRAM

    • “Melting memory chips in mass production” – Nature, 9/25/09

    5

  • Phase Change Memory

    phase change material (chalcogenide)

    electrode

    slow cooling -> crystalline state (1)

    fast cooling -> amorphous state (0)

    Properties Reads: 2-4x DRAM Writes: 5-10x DRAM Endurance: 108+

    6

  • A New World of Storage

    + Non-volatile - Slow - Block-addressable

    BPRAM

    Disk / FlashHow do we build fast, reliable systems with BPRAM?

    Byte-addressable, Persistent RAM

    This talk: BPFS, a file system for BPRAM Result: Improved performance and reliability

    7

    + Fast + Byte-addressable + Non-volatile

  • Goal

    New mechanism: short-circuit shadow paging

    8

    New guarantees for applications

    • File system operations will commit atomically and in program order

    • Your data is durable as soon as the cache is flushed

  • Design Principles

    1. Eliminate the DRAM buffer cache; use the L1/L2 cache instead 

    3. Provide atomicity and ordering in hardware

    Write A Write B

    2. Put BPRAM on the memory bus

    9

  • Outline

    • Intro

    • File System

    • Hardware Support

    • Evaluation

    • Conclusion

    10

  • BPRAM in the PC

    L1

    L2

    DRAM

    HD / Flash

    PCI/IDE bus

    Memory bus

    11

  • BPRAM in the PC

    L1

    L2

    DRAM

    HD / Flash

    PCI/IDE bus

    Memory bus

    BPRAM

    • BPRAM and DRAM are addressable by the CPU

    • Physical address space is partitioned

    • BPRAM data may be cached in L1/L2

    12

  • BPRAM in the PC

    L1

    L2

    DRAM

    Memory bus

    BPRAM

    • BPRAM and DRAM are addressable by the CPU

    • Physical address space is partitioned

    • BPRAM data may be cached in L1/L2

    13

  • BPFS: A BPRAM File System

    • Guarantees that all file operations execute atomically and in program order

    • Despite guarantees, significant performance improvements over NTFS on the same media

    • Short-circuit shadow paging often allows atomic, in-place updates

    14

  • file directory

    inode file

    root pointer

    indirect blocks

    inodes

    BPFS: A BPRAM File System

    file

    15

  • file directory

    inode file

    root pointer

    indirect blocks

    inodes

    BPFS: A BPRAM File System

    file

    16

  • Enforcing FS Consistency Guarantees

    • What happens if we crash during an update?

    17

  • Enforcing FS Consistency Guarantees

    • What happens if we crash during an update?

    18

  • Enforcing FS Consistency Guarantees

    • What happens if we crash during an update?

    19

  • Enforcing FS Consistency Guarantees

    • What happens if we crash during an update?

    – Disk: Use journaling or shadow paging

    – BPRAM: Use short-circuit shadow paging

    20

  • Review 1: Journaling

    • Write to journal, then write to file system

    A B

    file system

    journal

    21

  • Review 1: Journaling

    • Write to journal, then write to file system

    A B

    file system

    journal A’ B’

    22

  • Review 1: Journaling

    • Write to journal, then write to file system

    A B

    file system

    journal A’ B’

    B’A’

    23

  • Review 1: Journaling

    • Write to journal, then write to file system

    A B

    file system

    journal A’ B’

    B’A’

    • Reliable, but all data is written twice

    24

  • Review 2: Shadow Paging

    • Use copy-on-write up to root of file system

    BA

    file’s root pointer

    25

  • Review 2: Shadow Paging

    • Use copy-on-write up to root of file system

    BA A’ B’

    file’s root pointer

    26

  • Review 2: Shadow Paging

    • Use copy-on-write up to root of file system

    BA A’ B’

    file’s root pointer

    27

  • Review 2: Shadow Paging

    • Use copy-on-write up to root of file system

    BA A’ B’

    file’s root pointer

    28

  • Review 2: Shadow Paging

    • Use copy-on-write up to root of file system

    BA A’ B’

    file’s root pointer

    29

  • Review 2: Shadow Paging

    • Use copy-on-write up to root of file system

    BA A’ B’

    file’s root pointer

    • Any change requires bubbling to the FS root

    • Small writes require large copying overhead 30

  • Short-Circuit Shadow Paging

    • Uses byte-addressability and atomic 64b writes

    BA

    file’s root pointer

    31

    • Inspired by shadow paging – Optimization: In-place update when possible

  • Short-Circuit Shadow Paging

    • Uses byte-addressability and atomic 64b writes

    BA A’ B’

    file’s root pointer

    32

    • Inspired by shadow paging – Optimization: In-place update when possible

  • Short-Circuit Shadow Paging

    • Uses byte-addressability and atomic 64b writes

    BA A’ B’

    file’s root pointer

    33

    • Inspired by shadow paging – Optimization: In-place update when possible

  • Short-Circuit Shadow Paging

    • Uses byte-addressability and atomic 64b writes

    BA A’ B’

    file’s root pointer

    34

    • Inspired by shadow paging – Optimization: In-place update when possible

  • Opt. 1: In-Place Writes

    • Aligned 64-bit writes are performed in place

    – Data and metadata

    file’s root pointer

    35

  • Opt. 1: In-Place Writes

    • Aligned 64-bit writes are performed in place

    – Data and metadata

    file’s root pointer

    in-place write

    36

  • Opt. 1: In-Place Writes

    • Aligned 64-bit writes are performed in place

    – Data and metadata

    file’s root pointer

    37

  • Opt. 1: In-Place Writes

    • Aligned 64-bit writes are performed in place

    – Data and metadata

    file’s root pointer

    38

  • Opt. 1: In-Place Writes

    • Aligned 64-bit writes are performed in place

    – Data and metadata

    file’s root pointer

    39

  • • Appends committed by updating file size

    file’s root pointer + size

    40

    Opt. 2: Exploit Data-Metadata Invariants

  • • Appends committed by updating file size

    file’s root pointer + size

    in-place append

    41

    Opt. 2: Exploit Data-Metadata Invariants

  • • Appends committed by updating file size

    file’s root pointer + size

    in-place append

    file size update

    42

    Opt. 2: Exploit Data-Metadata Invariants

  • BPFS Example

    directory filedirectory

    inode file

    root pointer

    indirect blocks

    inodes

    43

  • BPFS Example

    directory filedirectory

    inode file

    root pointer

    indirect blocks

    inodes

    add entry

    remove entry

    44

    • Cross-directory rename bubbles to common ancestor

  • BPFS Example

    directory filedirectory

    inode file

    root pointer

    indirect blocks

    inodes

    45

  • Outline

    • Intro

    • File System

    • Hardware Support

    • Evaluation

    • Conclusion

    46

  • BPRAM

    L1 / L2

    ...

    CoW

    Commit

    ...

    Problem 1: Ordering

    47

  • BPRAM

    L1 / L2

    ...

    CoW

    Commit

    ...

    Problem 1: Ordering

    48

  • BPRAM

    L1 / L2

    ...

    CoW

    Commit

    ...

    Problem 1: Ordering

    49

  • BPRAM

    L1 / L2

    ...

    CoW

    Commit

    ...

    Problem 1: Order