Optimizing Ext4 for Low Memory Environments

Theodore Ts'o

November 7, 2012

Agenda

Status of Ext4

Why do we care about Low Memory Environments: Cloud Computing

Optimizing Ext4 for Low Memory Environments

Conclusion

Ext4 Status

Now stable in the most common configurations

Some distributions are planning on replacing ext[23] with ext4

New features recently added to ext4

Punch hole support (fallocate() with FALLOC_FL_PUNCH_HOLE)

Metadata checksumming

Online resizing for > 16TB file systems

Advantages of ext4

“Modern” file system that is still reasonably simple

Lines of Code as a Proxy for Complexity (as of 3.6.5)

Minix: 2,441

Ext2: 9,703

Ext3: 19,304

Ext4: 41,249

Btrfs: 88,189

XFS: 94,591

Portions of the code base are (relatively) stable and are time-tested

Userspace utilities

Journal Block layer (also used by OCFS2)

Incremental development instead of “rip and replace”

Well understood performance characteristics

Disadvantages of ext4

Incremental development means that certain design decisions are very hard to change:

Fixed inode table

Bitmap based allocations

32-bit inode numbers

Currently RAID support is extremely weak

Lack of sexy new features

Compression

Filesystem-level snapshots (use thin provisioned snapshots instead)

FS-aware RAID and LVM

Common Ext4 Use Cases

Default File System for Desktop / Servers

Distributions may change this choice in the future

Android devices (Honeycomb / Ice Cream Sandwich)

Cloud storage servers

Rise of Cloud Computing

Or Grid Computing, Utility Computing, etc.

Challenges

Usability – How to deliver something useful to the user?

SaaS

PaaS

Custom programming for cloud/grid/utility computing

Security – Public vs. Private Clouds?

Economics – Is it really cheaper at the end of the day?

The economics of cloud computing

Really big, efficient data centers

More efficient use of servers

Traditional servers often don't use their resources efficiently

CPU

Disk

Networking Bandwidth

To make the cloud economics work, it is important to pack a lot of jobs onto a smaller number of servers

Virtualization

Containers

Using resources efficiently in file systems

Restricted memory means less caching available

Data Blocks

Metadata Blocks

Block allocation bitmaps are the big problem

When they get pushed out of memory, long unlink() and fallocate() times

Surprisingly, CPU can be a problem too

Especially for PCIe-attached flash (high IOPS)

Plenty of other uses for the CPU (transcoding video formats)

Also important for large-scale macro benchmarks (TPC-C)
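
To give a rough sense of scale for the block bitmap problem, here is a back-of-the-envelope sketch. It assumes 4 KiB blocks, so each block bitmap is one block and covers 8 × 4096 = 32,768 blocks (128 MiB). The 1 TiB file system size is a hypothetical example, not a figure from the talk.

```python
BLOCK_SIZE = 4096  # bytes; assumed 4 KiB file system blocks


def block_bitmap_bytes(fs_bytes, block_size=BLOCK_SIZE):
    """Total size of all block allocation bitmaps for a file system."""
    blocks_per_group = 8 * block_size            # one bit per block, one bitmap block per group
    group_bytes = blocks_per_group * block_size  # 128 MiB with 4 KiB blocks
    groups = -(-fs_bytes // group_bytes)         # ceiling division
    return groups * block_size                   # one bitmap block per group


one_tib = 1 << 40  # hypothetical file system size
print(block_bitmap_bytes(one_tib) // (1 << 20), "MiB of block bitmaps")
```

Under these assumptions a 1 TiB file system carries 32 MiB of block bitmaps, which is a meaningful share of the page cache in a tightly constrained container.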

Restricted Memory is a problem for Copy-on-Write file systems, too

Suggestion from the ZFS Open Solaris list:

“If you are using a laptop and not serving anything and performance is not a major concern and you're free to reboot whenever you want, then you can survive on 2G of ram. But a server presumably DOES stuff and you don't want to reboot frequently. I'd recommend 4G minimally, 8G standard, and if you run any applications (databases, web servers, symantec products) then add more.”

http://permalink.gmane.org/gmane.os.solaris.opensolaris.zfs/44928

A short aside about latency

Avoiding latency makes the users happy

“Fast is better than slow. We know your time is valuable, so when you’re seeking an answer on the web you want it right away–and we aim to please. We may be the only people in the world who can say our goal is to have people leave our homepage as quickly as possible.... And we continue to work on making it all go even faster.”

From “Ten things we know to be true”

A few slow requests slow the requests behind them

...

A few slow operations effectively slow down their peers in a distributed computation
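
The effect on a distributed computation can be illustrated with a small calculation. This is only a sketch: it assumes each peer is slow independently with the same probability, and the 1% figure is a made-up example rather than a measurement from the talk.

```python
def p_any_slow(n_peers, p_slow):
    """Probability that at least one of n independent peers is slow."""
    return 1.0 - (1.0 - p_slow) ** n_peers


# Hypothetical: each peer has a 1% chance of hitting a slow operation.
for n in (1, 10, 100, 1000):
    print(n, "peers:", round(p_any_slow(n, 0.01), 3))
```

With 100 peers that must all finish, a 1% per-peer chance of a slow operation means roughly 63% of computations wait on at least one straggler.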

Optimizing ext4 for low-memory environments

No Journal Mode

Smarter metadata caching

No Journal Mode for Ext4

General principle: Don't pay for features you don't need

A review of cluster storage at Google

The hardware

Thousands of machines in a data center

Tens of thousands of disks

GFS as a clustered file system

Replication at the clustered file system level (so we can survive loss of machines)

Checksumming done by the clustered file system (the end-to-end principle)

Journaling is not free

[Figure: FFSB Large File Creates, 2 CPUs using Direct I/O. Bar chart comparing ext4 and ext4 nojournal; y-axis: transactions per second, 2000 to 2350.]

No journal mode was one of the first Google changes to ext4

Wanted the improvements of extents, delayed allocation, etc.

Google had chosen not to use ext3 since journaling had significant costs

Ext4 in no journal mode is the best of both worlds
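
For readers who want to try no journal mode, the sketch below builds (but does not run) an mke2fs command line that clears the has_journal feature at file system creation time. The helper name and the device path /dev/sdX1 are hypothetical; the talk does not describe how Google configures its systems, so treat this as the generic e2fsprogs approach rather than their procedure.

```python
import shlex


def mke2fs_command(device, journal=True):
    """Build (but do not run) an mke2fs command line for an ext4 file system."""
    cmd = ["mke2fs", "-t", "ext4"]
    if not journal:
        cmd += ["-O", "^has_journal"]  # "^" clears the has_journal feature
    cmd.append(device)
    return cmd


# /dev/sdX1 is a placeholder; substitute a real, unused device before running.
print(shlex.join(mke2fs_command("/dev/sdX1", journal=False)))
```

Without a journal, recovery after a crash depends on a full fsck or, as in the cluster described above, on replication at a higher layer.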

Improving metadata caching

Small inodes

Ext2 only supported 128 byte inodes

Ext3/ext4 support larger inodes

256 byte default

Used to store extended attributes

Also used to store subsecond timestamps for ext4

Small inodes mean more inodes per block, which makes a huge difference in memory-limited environments
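
The inodes-per-block difference can be made concrete with a short calculation. This sketch assumes 4 KiB blocks and densely packed inodes; the count of one million cached inodes is a hypothetical workload, not a number from the talk.

```python
def inode_table_bytes(n_inodes, inode_size, block_size=4096):
    """Bytes of inode table blocks needed to hold n densely packed inodes."""
    inodes_per_block = block_size // inode_size
    blocks = -(-n_inodes // inodes_per_block)  # ceiling division
    return blocks * block_size


for size in (128, 256):
    per_block = 4096 // size
    cached = inode_table_bytes(1_000_000, size)  # hypothetical 1M hot inodes
    print(f"{size}-byte inodes: {per_block} per block, {cached // 1024} KiB cached")
```

Halving the inode size halves the page cache needed to keep the same set of inodes resident, at the cost of the in-inode extended attribute space and subsecond timestamps listed above.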

Effects of 128 byte inodes

[Figure: FFSB Large File Creates, 2 CPUs using Direct I/O. Bar chart comparing ext4, ext4-128I, ext4 nojournal, and ext4 128I NJ; y-axis: transactions per second, 1900 to 2500.]

Free block statistics for each block group

Ext4 now caches the size of the largest available free block

This allows a block group to be evaluated without needing to consult the block bitmap
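
The idea can be sketched as follows. This is a simplified illustration, not ext4's allocator: the GroupStats fields, pick_group, and the load_bitmap callback are hypothetical names, and the real allocator applies many more criteria.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Small in-memory summary of one block group (hypothetical fields)."""
    free_blocks: int
    largest_free_extent: int


def pick_group(groups, want_blocks, load_bitmap):
    """Return (index, bitmap) for the first group that can hold a contiguous
    run of want_blocks, reading a bitmap only for a plausible group."""
    for index, stats in enumerate(groups):
        if stats.largest_free_extent < want_blocks:
            continue  # rejected from the cached statistics alone; no bitmap I/O
        return index, load_bitmap(index)
    return None


bitmap_reads = []


def load_bitmap(index):
    """Stand-in for reading a block bitmap from disk; records each read."""
    bitmap_reads.append(index)
    return f"bitmap-{index}"


groups = [GroupStats(500, 8), GroupStats(900, 64), GroupStats(4000, 2048)]
print(pick_group(groups, 1024, load_bitmap), "bitmap reads:", bitmap_reads)
```

Only the group that can satisfy the request has its bitmap read; the first two are skipped using the cached statistics, which is what matters when the bitmaps have been evicted from memory.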

Inode extent information

Ext4's on-disk format uses 12 bytes/extent

4 in inode

340 in a 4k extent tree leaf block

Maximum 128M in an extent
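
The numbers on this slide can be reproduced with a little arithmetic. The sketch assumes 4 KiB blocks, a 12-byte extent header at the start of each extent area, a 60-byte i_block area in the inode, and a limit of 32,768 blocks per initialized extent; these constants come from the ext4 on-disk layout rather than from the slide itself.

```python
EXTENT_ENTRY_BYTES = 12       # from the slide: 12 bytes per extent
EXTENT_HEADER_BYTES = 12      # assumed: header preceding the entries
I_BLOCK_BYTES = 60            # assumed: i_block area inside the inode
MAX_EXTENT_BLOCKS = 1 << 15   # assumed: 32,768 blocks per initialized extent
BLOCK_SIZE = 4096             # assumed 4 KiB blocks


def extents_that_fit(area_bytes):
    """Number of extent entries that fit after the header in an area."""
    return (area_bytes - EXTENT_HEADER_BYTES) // EXTENT_ENTRY_BYTES


print("in inode:", extents_that_fit(I_BLOCK_BYTES))
print("in 4k leaf block:", extents_that_fit(BLOCK_SIZE))
print("max extent:", MAX_EXTENT_BLOCKS * BLOCK_SIZE // (1 << 20), "MiB")
```

Under these assumptions the calculation gives 4 extents in the inode, 340 in a leaf block, and 128 MiB per extent, matching the slide.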

Internal bigextent patch in Google

An in-memory b-tree which collapses adjacent extents

Originally because cache line misses were measurable while searching the on-disk representation on PCIe-attached flash

Takes less memory than a 4k extent block in most cases

Will be going upstream soon
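
To illustrate why collapsing adjacent extents saves memory, here is a minimal sketch. It is not the bigextent patch: it uses a sorted list instead of a b-tree, and the Extent type and collapse function are hypothetical names. It assumes the on-disk limit of 32,768 blocks per extent, so a large contiguous file needs several on-disk extents.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    logical: int   # first logical block in the file
    physical: int  # first physical block on disk
    length: int    # number of blocks


def collapse(extents: List[Extent]) -> List[Extent]:
    """Merge extents that are contiguous both logically and physically."""
    merged: List[Extent] = []
    for ext in sorted(extents):
        if merged:
            last = merged[-1]
            if (last.logical + last.length == ext.logical
                    and last.physical + last.length == ext.physical):
                merged[-1] = last._replace(length=last.length + ext.length)
                continue
        merged.append(ext)
    return merged


# A contiguous 300 MiB file (76,800 blocks) needs three on-disk extents.
on_disk = [Extent(0, 1000, 32768), Extent(32768, 33768, 32768),
           Extent(65536, 66536, 11264)]
print(collapse(on_disk))
```

The three on-disk extents collapse into a single in-memory entry, since an in-memory representation is not bound by the on-disk length field.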

Conclusion

General Purpose File System Myth

General Purpose File System Myth?

“There can only be one!”

Too hard for users to choose

File systems used to be used for many things at the same time

But... workloads are different

Design tradeoffs; optimizing for one workload can compromise another

How did this myth survive for so long?

Many workloads did not stress the file system

File systems were simpler – fewer features

Servers were more inefficiently run – more idle resources

Future ext4 work

Extent Status Tree

(provides SEEK_HOLE/SEEK_DATA support)

Inline data

RAID stripe awareness

Can also be used to make ext4 erase block aware for eMMC devices with primitive flash translation layers

Atomic msync()

Terence Kelly and Stan Park at HP
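
As a reminder of what SEEK_HOLE/SEEK_DATA support gives applications, the sketch below lists the data regions of a sparse file using os.lseek. It assumes Linux and Python 3.3 or later; the helper names and the file layout are made up for illustration. On a file system without real support, the kernel reports the whole file as data, so the result is coarser but still valid.

```python
import os
import tempfile


def data_segments(fd):
    """Return [(start, end)] byte ranges that the file system reports as data."""
    size = os.fstat(fd).st_size
    segments, offset = [], 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError:  # ENXIO: no data at or after this offset
            break
        end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
        segments.append((start, end))
        offset = end
    return segments


def make_sparse_demo():
    """Create a 1 MiB sparse temp file with 4 KiB of data at offset 512 KiB."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, 1 << 20)
        os.pwrite(fd, b"x" * 4096, 512 * 1024)
        return data_segments(fd)
    finally:
        os.close(fd)
        os.unlink(path)


print(make_sparse_demo())
```

Tools such as backup or copy utilities can use these ranges to skip holes instead of reading and writing long runs of zeros.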

Remember to optimize the entire storage stack

Functionality at the block device layer

Thin-provisioned snapshots

dm-cache / bcache

Optimizing userspace

The SQLite library

Applications

Improving abstractions up and down the storage stack

Thank You!