A brief introduction to Linux memory management, focusing on page reclamation. Swap and the I/O architecture are also covered.
VM and I/O Topics in Linux
Page Replacement, Swap and I/O
Jiannan Ouyang
Ph.D. Student
Computer Science Department
University of Pittsburgh
05/05/2011
Outline
• Overview of Linux Memory Management
• Page Reclamation
• Swap & I/O
Jiannan Ouyang, CS PhD@PITT 2
Describing Physical Memory
Node: a NUMA memory bank (pg_data_t)
Zone: a range of memory of one type (ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM)
struct page: one descriptor per physical page frame
Physical Page Allocation
Binary Buddy Allocator:
• If a block of the desired size is not available, a larger block is split in half, and the two halves are buddies of each other. One half is used for the allocation and the other is free. Blocks are halved repeatedly until a block of the desired size is available.
• When a block is later freed, its buddy is examined, and the two are coalesced if the buddy is also free.
Page Table Management
• Three-level mapping: page global directory (PGD) -> page middle directory (PMD) -> page table (PTE)
Kernel Memory Mapping
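Figure: the 4-GB virtual memory (kernel space starting at 0xC0000000) and 1 GB of physical memory (0x00000000 to 0x3FFFFFFF). The low 896 MB of physical memory is mapped directly into kernel space at 0xC0000000; display memory and device memory are also marked.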
User Memory Mapping
Figure: the virtual memory of one process, with kernel space above 3 GB and user space (text, data, stack) below it; each user region is mapped onto physical memory.
User Memory Mapping
Figure: the virtual memories of two processes, each with its own user space (text, data, stack) and the same kernel space, mapped onto one physical memory; the text is shared, while each process has its own data and stack frames.
Outline
• Overview of Linux Memory Management
• Page Reclamation
• Swap & I/O
Memory Customers
Figure: the memory customers (kernel code & data, user code & data, slab cache, page cache, icache & dcache) request pages from the buddy system, and pages are reclaimed from them back to it.
• All memory except “user code & data” is used by the kernel
• “User code & data” is managed in user space (e.g. with malloc/free); the kernel's only option for these pages is to swap them out
Slab Cache
• A cache of commonly used objects, kept in an initialized state and ready for use by the kernel.
• Saves the time of repeatedly allocating, initializing and freeing the same kind of object.
Disk related caches
• Dcache (metadata): dentry objects representing filesystem pathnames.
• Icache (metadata): inode objects representing disk inodes.
• Page Cache (data): data pages from disk; the main disk cache used by the kernel.
Memory Customers Review
Figure: the memory customers (kernel code & data, user code & data, slab cache, page cache, icache & dcache) request pages from the buddy system, and pages are reclaimed from them back to it.
Next we look at when the kernel starts reclaiming pages, which pages it reclaims, and the replacement policy it uses.
Reclamation: When?
Zone Watermarks
• Pages Low: when the number of free pages in a zone reaches pages low, the buddy allocator wakes up kswapd to start freeing pages. By default, pages low is twice pages min.
• Pages Min: when free pages reach pages min, the allocator does kswapd's work itself, synchronously; this is sometimes referred to as the direct-reclaim path.
• Pages High: once free pages are back at pages high, kswapd goes back to sleep. By default, pages high is three times pages min.
Reclamation: Which?
Reclamation: Which? (Cont.)
• Mapped & Anonymous Pages
– Mapped: backed by a file
– Anonymous: belongs to an anonymous memory region of a process
• Shared & Non-shared Pages
– A shared page must be unmapped from all the page table entries that reference it; reverse mapping, an important improvement in the Linux 2.6 kernel, finds all of them at once
Reclamation: Which? (Cont.)
shrink_caches() frees pages from the following caches, in order, until the target number of pages is met:
1. slab cache (kmem_cache_reap())
2. user pages & page cache (refill_inactive() & shrink_cache())
3. dcache and icache
Replacement Policy
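Figure: the active and inactive lists. Each page's (active, ref) state is one of {11, 10, 01, 00}; accesses move a page toward active=1; on the active list, a page with Ref=1 has its reference bit cleared, while a page with Ref=0 moves to the inactive list; pages are reclaimed from the inactive list.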
Moving pages across the lists
mark_page_accessed():
on each access, increment the 2-bit (active, ref) counter (00 -> 01 -> 10 -> 11);
when active becomes 1, move the page from the inactive list to the active list;
refill_inactive_zone():
if (ref == 1) { ref = 0; move the page to the head of the active list; }
else { move the page from the active list to the inactive list; }
Outline
• Overview of Linux Memory Management
• Page Reclamation
• Swap & I/O
Swap
• Swapping can reclaim any page frame obtained by a process, not only those that have an image on disk:
– anonymous pages (user stack or heap)
– dirty pages that belong to a private memory mapping of a process
– IPC shared pages
Swap (Cont.)
• Set up “swap areas” on disk
• Allocate and free “page slots” in the swap areas
• Provide functions both to “swap out” pages from RAM into a swap area and to “swap in” pages from a swap area into RAM
• Mark page table entries to keep track of where the data is stored in the swap areas
Example
Before (values in MB):
                   total     used     free   shared  buffers   cached
Mem:                2013     1811      201        0      157      872
-/+ buffers/cache:            782     1231
Swap:                397        0      397

Program:
while (1) {
    char *p = malloc(N);
    memset(p, 0, N);   // touching every page forces demand paging
}

$ free -m   (while the program runs; (+)/(-) mark increases/decreases)
                   total     used     free   shared  buffers   cached
Mem:                2013  1956(+)    56(-)        0     4(-)   109(-)
-/+ buffers/cache:        1842(+)   170(-)
Swap:                397        8      389
Linux I/O Architecture
• How can these layers be bypassed?
• The default file I/O APIs, such as fwrite(), are buffered
• File system: maps (dir, name, offset) -> LBA
• Device file: not a normal file
I/O Bypassing
• Disk Cache
– O_DIRECT
• File System
– Device file
• I/O Scheduler
– To be solved
Thanks Q&A
BACKUP SLIDES
Page Table Management
• Three-level mapping: page global directory (PGD) -> page middle directory (PMD) -> page table (PTE)
Page Table Management (Cont.)
Figure: the MMU translates a linear address into a physical address, starting from the PGD address.