RAMCloud: a Low-Latency Datacenter Storage System
John Ousterhout
Stanford University
Introduction
● RAMCloud: new class of datacenter storage
  All data always in DRAM
● Large scale:
  100 - 10,000 storage servers
  100 TB - 1 PB total capacity
● Low latency: 5 µs remote access time from anywhere in datacenter
● Durable/available
● Overall goal: enable a new class of applications
Does this make sense for Radio Astronomy apps?
April 3, 2014 Exascale Radio Astronomy Conference Slide 2
Traditional Storage Choices
                             Local DRAM   Local Disk            Local Flash            Network Disk
Latency                      50-100 ns    5-10 ms               50-200 µs              5-10 ms
Bandwidth                    10-50 GB/s   100 MB/s (per disk)   250 MB/s (per drive)   1 GB/s/node (network)
Maximum capacity             1 TB         5-20 TB               1-5 TB                 10-100 PB
Maximum cores                24           24                    24                     scalable
Durable and fault tolerant?  no           partially             partially              yes
RAMCloud Architecture
[Figure: application servers (each running an application linked with a library) connect through the datacenter network to a coordinator and to storage servers (each running a master and a backup).]
● 1000 – 10,000 storage servers
● 1000 – 100,000 application servers
● Commodity servers: 64-256 GB per server
● High-speed networking:
  5 µs round-trip
  Full bisection bandwidth
Data Model: Key-Value Store
● Basic operations:
  read(tableId, key) => blob, version
  write(tableId, key, blob) => version
  delete(tableId, key)
● Other operations:
  cwrite(tableId, key, blob, version) => version (only overwrite if version matches)
  Enumerate objects in table
  Efficient multi-read, multi-write
  Atomic increment
● Under development:
  Secondary indexes
  Atomic updates of multiple objects
[Figure: tables contain objects; each object consists of a key (≤ 64 KB), a version (64 bits), and a blob (≤ 1 MB).]
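A minimal in-memory sketch of the data model above, assuming Python. This is not the RAMCloud client library: the class `ToyStore`, the exception `VersionMismatch`, and the store-wide version counter are hypothetical. Only the operation names, the size limits, and the "overwrite only if version matches" rule for cwrite come from the slide.

```python
MAX_KEY_BYTES = 64 * 1024        # key limit from the slide (<= 64 KB)
MAX_BLOB_BYTES = 1024 * 1024     # blob limit from the slide (<= 1 MB)


class VersionMismatch(Exception):
    """Raised by cwrite when the stored version differs from the expected one."""


class ToyStore:
    """Hypothetical single-process stand-in for the key-value model."""

    def __init__(self):
        self._objects = {}       # (table_id, key) -> (blob, version)
        self._next_version = 1   # assumption: one increasing counter per store

    def read(self, table_id, key):
        """Return (blob, version); raises KeyError if the object is absent."""
        return self._objects[(table_id, key)]

    def write(self, table_id, key, blob):
        """Unconditionally store blob and return its new version."""
        if len(key) > MAX_KEY_BYTES or len(blob) > MAX_BLOB_BYTES:
            raise ValueError("key or blob exceeds the size limit")
        version = self._next_version
        self._next_version += 1
        self._objects[(table_id, key)] = (blob, version)
        return version

    def cwrite(self, table_id, key, blob, version):
        """Overwrite only if the stored version equals `version`."""
        _, current = self._objects[(table_id, key)]
        if current != version:
            raise VersionMismatch(f"expected {version}, found {current}")
        return self.write(table_id, key, blob)

    def delete(self, table_id, key):
        """Remove the object if present."""
        self._objects.pop((table_id, key), None)


store = ToyStore()
v1 = store.write(1, b"antenna-7", b"sample-a")
blob, seen = store.read(1, b"antenna-7")
v2 = store.cwrite(1, b"antenna-7", b"sample-b", seen)   # version matches
try:
    store.cwrite(1, b"antenna-7", b"sample-c", v1)      # stale version
    stale_rejected = False
except VersionMismatch:
    stale_rejected = True
```

The conditional write is what lets a client do a read-modify-write without locks: if another client updated the object in between, the version check fails and the client can re-read and retry.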
Data Durability
● One copy of data in DRAM
● Multiple copies on disk/flash
  Each master’s backup data scattered across cluster
● Fast crash recovery
  Remaining servers work together to recover lost data
  Typical recovery time: 1-2 seconds
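A back-of-envelope sketch of why scattering backup data matters, assuming Python. It uses two figures from earlier slides (64 GB of DRAM per server at the low end, about 100 MB/s per disk) and assumes, for illustration only, that recovery time is limited purely by reading the lost data back from disk. The function name and the disk counts are hypothetical.

```python
def disk_read_seconds(data_bytes, disks, bytes_per_sec_per_disk=100e6):
    """Time to read data_bytes spread evenly over `disks` disks read in parallel."""
    return data_bytes / (disks * bytes_per_sec_per_disk)


lost_bytes = 64e9                                        # one crashed 64 GB master
one_disk = disk_read_seconds(lost_bytes, disks=1)        # about 640 s
scattered = disk_read_seconds(lost_bytes, disks=1000)    # about 0.64 s
print(f"1 disk: {one_disk:.0f} s, 1000 disks: {scattered:.2f} s")
```

Under these assumptions a single backup disk would need minutes to return the data, while a thousand disks reading in parallel bring the read phase under a second, which is the same order of magnitude as the 1-2 second recovery time quoted above.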
RAMCloud Performance
● Using Infiniband networking (24 Gb/s, kernel bypass)
  Other networking also supported, but slower
● Reads:
  100B objects: 5µs
  10KB objects: 10µs
  Single-server throughput (100B objects): 700 Kops/sec.
  Small-object multi-reads: 1-2M objects/sec.
● Durable writes:
  100B objects: 16µs
  10KB objects: 40µs
  Small-object multi-writes: 400-500K objects/sec.
1 client, 1 server
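A small sketch relating the read latency and single-server throughput figures above, assuming Python. It applies Little's law (requests in flight = throughput × latency) and assumes, as a simplification the slides do not state, that the 5 µs latency still holds when the server is fully loaded.

```python
read_latency_s = 5e-6          # 100 B read latency, from the slide
server_reads_per_s = 700e3     # single-server throughput, from the slide

# A client that waits for each read before issuing the next one
sync_client_reads_per_s = 1 / read_latency_s            # about 200,000 reads/sec

# Little's law: average number of requests in flight at the server
requests_in_flight = server_reads_per_s * read_latency_s   # about 3.5
```

Read this way, one synchronous client cannot saturate a server on its own; reaching the quoted throughput takes several requests outstanding at once, from multiple clients or from batched operations such as multi-reads.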
Comparisons
                             Local DRAM   Network Disk            RAMCloud
Latency                      50-100 ns    5-10 ms                 5 µs
Bandwidth                    10-50 GB/s   1 GB/s/node (network)   1 GB/s/node (network)
Maximum capacity             1 TB         10-100 PB               1-5 PB
Maximum cores                24           scalable                scalable
Durable and fault tolerant?  no           yes                     yes
RAMCloud Status
● Ongoing research project at Stanford
● Goal: production-quality system
  Source code freely available
  Version 1.0 tagged in January 2014 (first version suitable for real applications)
  Starting to work with early adopters
● System requirements:
  x86 servers (minimum cluster size: 10-20 servers)
  Linux operating system
  Need networking with kernel-bypass NICs
    ● Built-in support for Mellanox Infiniband
    ● Driver for SolarFlare 10 Gb/s Ethernet NICs under development
Is RAMCloud Right for You?
Issues to consider:
● Remote access data model
  Sparse vs. bulk
● Key-value store
● Durability
Large-Scale Applications
[Figure: nodes that each hold both computation (C) and data (D), connected by a network.]
● Computation, data colocated
● Works best for:
  Bulk processing (touch all data)
  High locality of access
● Performance dominated by bandwidth
● Examples: analytics
[Figure: computation nodes (C) connected through a network to separate storage nodes (D).]
● Remote data access
● Works best for:
  Sparse and unpredictable data accesses
  No locality
● Performance dominated by latency
● Example: transactional Web applications (Facebook)
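A rough sketch of the sparse-versus-bulk trade-off, assuming Python. The 5 µs remote read latency and the 10 GB/s local DRAM bandwidth (low end of the earlier table) come from the slides. The dataset size, the object size, and the cost model (one synchronous client, no multi-reads, no parallelism) are illustrative assumptions.

```python
REMOTE_READ_S = 5e-6       # per-object remote read latency (slides)
LOCAL_DRAM_BPS = 10e9      # local DRAM bandwidth, low end of 10-50 GB/s (slides)


def remote_sparse_seconds(num_objects):
    """Fetch objects one at a time over the network: latency-bound."""
    return num_objects * REMOTE_READ_S


def local_bulk_seconds(total_bytes):
    """Scan a colocated dataset once: bandwidth-bound."""
    return total_bytes / LOCAL_DRAM_BPS


dataset_bytes = 1e12                              # hypothetical 1 TB dataset
object_bytes = 100                                # hypothetical object size
total_objects = dataset_bytes / object_bytes      # 1e10 objects

touch_few = remote_sparse_seconds(1_000)                  # about 0.005 s
touch_all_remote = remote_sparse_seconds(total_objects)   # about 50,000 s
scan_local = local_bulk_seconds(dataset_bytes)            # about 100 s
```

In this simplified model, touching a thousand scattered objects remotely costs milliseconds, while touching every object one read at a time is far slower than a local scan. That matches the slide's split: remote access suits sparse, unpredictable workloads, and colocation suits bulk processing.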
Conclusion
● RAMCloud: general-purpose DRAM-based storage
  Scale
  Latency
● Goals:
  Harness full performance potential of DRAM-based storage
  Enable new applications: intensive manipulation of large-scale data
● What could you do with:
  1M cores
  1 petabyte data
  5-10µs access time