Better I/O ThroughByte-Addressable, Persistent Memory
Jeremy Condit, Ed Nightingale, Chris Frost,
Engin Ipek, Ben Lee, Doug Burger, Derrick Coetzee
A New World of Storage
+ Fast+ Byte-addressable- Volatile
+ Non-volatile- Slow- Block-addressable
Disk / Flash
DRAM
2
A New World of Storage
BPRAM
Byte-addressable, Persistent RAM
3
+ Fast+ Byte-addressable+ Non-volatile
A New World of Storage
BPRAM
How do we build fast, reliable systems with BPRAM?
Byte-addressable, Persistent RAM
4
+ Fast+ Byte-addressable+ Non-volatile
Phase Change Memory
• Most promising formof BPRAM
• “Melting memory chips in mass production”– Nature, 9/25/09
5
Phase Change Memory
phase change material(chalcogenide)
electrode
slow cooling -> crystalline state (1)
fast cooling -> amorphous state (0)
PropertiesReads: 2-4x DRAMWrites: 5-10x DRAMEndurance: 108+
6
A New World of Storage
+ Non-volatile- Slow- Block-addressable
BPRAM
Disk / FlashHow do we build fast, reliable systems with BPRAM?
Byte-addressable, Persistent RAM
This talk: BPFS, a file system for BPRAMResult: Improved performance and reliability
7
+ Fast+ Byte-addressable+ Non-volatile
Goal
New mechanism:short-circuit shadow paging
8
New guarantees for applications
• File system operations will commit atomically and in program order
• Your data is durable as soon as the cache is flushed
Design Principles
1. Eliminate the DRAM buffer cache;use the L1/L2 cache instead
3. Provide atomicity andordering in hardware
Write A Write B
2. Put BPRAM on the memory bus
9
Outline
• Intro
• File System
• Hardware Support
• Evaluation
• Conclusion
10
BPRAM in the PC
L1
L2
DRAM
HD / Flash
PCI/IDE bus
Memory bus
11
BPRAM in the PC
L1
L2
DRAM
HD / Flash
PCI/IDE bus
Memory bus
BPRAM
• BPRAM and DRAM are addressable by the CPU
• Physical address space is partitioned
• BPRAM data may be cached in L1/L2
12
BPRAM in the PC
L1
L2
DRAM
Memory bus
BPRAM
• BPRAM and DRAM are addressable by the CPU
• Physical address space is partitioned
• BPRAM data may be cached in L1/L2
13
BPFS: A BPRAM File System
• Guarantees that all file operations execute atomically and in program order
• Despite guarantees, significant performance improvements over NTFS on the same media
• Short-circuit shadow paging often allowsatomic, in-place updates
14
file directory
inodefile
root pointer
indirect blocks
inodes
BPFS: A BPRAM File System
file
15
file directory
inodefile
root pointer
indirect blocks
inodes
BPFS: A BPRAM File System
file
16
Enforcing FS Consistency Guarantees
• What happens if we crash during an update?
17
Enforcing FS Consistency Guarantees
• What happens if we crash during an update?
18
Enforcing FS Consistency Guarantees
• What happens if we crash during an update?
19
Enforcing FS Consistency Guarantees
• What happens if we crash during an update?
– Disk: Use journaling or shadow paging
– BPRAM: Use short-circuit shadow paging
20
Review 1: Journaling
• Write to journal, then write to file system
A B
file system
journal
21
Review 1: Journaling
• Write to journal, then write to file system
A B
file system
journal A’ B’
22
Review 1: Journaling
• Write to journal, then write to file system
A B
file system
journal A’ B’
B’A’
23
Review 1: Journaling
• Write to journal, then write to file system
A B
file system
journal A’ B’
B’A’
• Reliable, but all data is written twice
24
Review 2: Shadow Paging
• Use copy-on-write up to root of file system
BA
file’s root pointer
25
Review 2: Shadow Paging
• Use copy-on-write up to root of file system
BA A’ B’
file’s root pointer
26
Review 2: Shadow Paging
• Use copy-on-write up to root of file system
BA A’ B’
file’s root pointer
27
Review 2: Shadow Paging
• Use copy-on-write up to root of file system
BA A’ B’
file’s root pointer
28
Review 2: Shadow Paging
• Use copy-on-write up to root of file system
BA A’ B’
file’s root pointer
29
Review 2: Shadow Paging
• Use copy-on-write up to root of file system
BA A’ B’
file’s root pointer
• Any change requires bubbling to the FS root
• Small writes require large copying overhead30
Short-Circuit Shadow Paging
• Uses byte-addressability and atomic 64b writes
BA
file’s root pointer
31
• Inspired by shadow paging– Optimization: In-place update when possible
Short-Circuit Shadow Paging
• Uses byte-addressability and atomic 64b writes
BA A’ B’
file’s root pointer
32
• Inspired by shadow paging– Optimization: In-place update when possible
Short-Circuit Shadow Paging
• Uses byte-addressability and atomic 64b writes
BA A’ B’
file’s root pointer
33
• Inspired by shadow paging– Optimization: In-place update when possible
Short-Circuit Shadow Paging
• Uses byte-addressability and atomic 64b writes
BA A’ B’
file’s root pointer
34
• Inspired by shadow paging– Optimization: In-place update when possible
Opt. 1: In-Place Writes
• Aligned 64-bit writes are performed in place
– Data and metadata
file’s root pointer
35
Opt. 1: In-Place Writes
• Aligned 64-bit writes are performed in place
– Data and metadata
file’s root pointer
in-place write
36
Opt. 1: In-Place Writes
• Aligned 64-bit writes are performed in place
– Data and metadata
file’s root pointer
37
Opt. 1: In-Place Writes
• Aligned 64-bit writes are performed in place
– Data and metadata
file’s root pointer
38
Opt. 1: In-Place Writes
• Aligned 64-bit writes are performed in place
– Data and metadata
file’s root pointer
39
• Appends committed by updating file size
file’s root pointer + size
40
Opt. 2: Exploit Data-Metadata Invariants
• Appends committed by updating file size
file’s root pointer + size
in-place append
41
Opt. 2: Exploit Data-Metadata Invariants
• Appends committed by updating file size
file’s root pointer + size
in-place append
file size update
42
Opt. 2: Exploit Data-Metadata Invariants
BPFS Example
directory filedirectory
inodefile
root pointer
indirect blocks
inodes
43
BPFS Example
directory filedirectory
inodefile
root pointer
indirect blocks
inodes
add entry
remove entry
44
• Cross-directory rename bubbles to common ancestor
BPFS Example
directory filedirectory
inodefile
root pointer
indirect blocks
inodes
45
Outline
• Intro
• File System
• Hardware Support
• Evaluation
• Conclusion
46
BPRAM
L1 / L2
...
CoW
Commit
...
Problem 1: Ordering
47
BPRAM
L1 / L2
...
CoW
Commit
...
Problem 1: Ordering
48
BPRAM
L1 / L2
...
CoW
Commit
...
Problem 1: Ordering
49
BPRAM
L1 / L2
...
CoW
Commit
...
Problem 1: Ordering
50
BPRAM
L1 / L2
...
CoW
Commit
...
Problem 1: Ordering
51
...
CoW
Commit
...
Problem 2: Atomicity
L1 / L2
BPRAM
52
...
CoW
Commit
...
Problem 2: Atomicity
L1 / L2
BPRAM
53
...
CoW
Commit
...
Problem 2: Atomicity
L1 / L2
BPRAM
54
...
CoW
Commit
...
Problem 2: Atomicity
L1 / L2
BPRAM
55
Enforcing Ordering and Atomicity
• Ordering
– Solution: Epoch barriers to declare constraints
– Faster than write-through
– Important hardware primitive (cf. SCSI TCQ)
• Atomicity
– Solution: Capacitor on DIMM
– Simple and cheap!
56
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
57
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
1
1 1
58
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
1
1 1
59
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
1
1 1
2
60
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
1
1 1
2
Ineligible for eviction!
61
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
2
Ineligible for eviction!
62
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
2
63
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
64
...
CoW
Barrier
Commit
...
Ordering and Atomicity
L1 / L2
BPRAM
65
MP works too(see paper)
Outline
• Intro
• File System
• Hardware Support
• Evaluation
• Conclusion
66
Methodology
• Built and evaluated BPFS in Windows
• Three parts:
– Experimental: BPFS vs. NTFS on DRAM
– Simulation: Epoch barrier evaluation
– Analytical: BPFS on PCM
67
0
2
4
6
8
10
8 64 512 4096
Random n Byte Write
Microbenchmarks
0
0.4
0.8
1.2
1.6
2
8 64 512 4096
Tim
e (s
)
Append n Bytes
NTFS - DiskNTFS - RAMBPFS - RAM
68
NOT DURABLE!
NOT DURABLE!
DURABLE!
DURABLE!
BPFS Throughput On PCM
0
0.25
0.5
0.75
1
Exec
uti
on
Tim
e (v
s. N
TFS
/ D
isk)
NTFSDisk
NTFSRAM
BPFSRAM
69
BPFSPCM(Proj)
BPFS Throughput On PCM
0
0.25
0.5
0.75
1
Exec
uti
on
Tim
e (v
s. N
TFS
/ D
isk)
0
0.25
0.5
0.75
1
0 200 400 600 800
Sustained Throughput of PCM (MB/s)
ProjectedThroughput
BPFS - PCM
NTFSDisk
NTFSRAM
BPFSRAM
70
BPFSPCM(Proj)
Conclusions
• BPRAM changes the trade-offs for storage– Use consistency technique designed for medium
• Short-circuit shadow paging:– improves performance
– improves reliability
Bonus: PCM chips on display at poster session!
71