View
213
Download
0
Tags:
Embed Size (px)
Citation preview
Mobile and Internet Systems Group
A Semantic-based Cache Replacement Algorithm for Mobile File Access
Sharun Santhosh and Weisong Shi
Department of Computer ScienceWayne State [email protected]
http://mist.cs.wayne.edu
Mobile and Internet Systems Group
Motivation
The Future Staying connected anywhere, anytime will become a
reality
How ? Cable modem or DSL connection at home High speed Ethernet network at work or school Satellite network in the car WiFi network at the airport or the neighborhood coffee
shop
Challenges Effectiveness - Adapt to the various underlying
connectivity Convenience - Adaptation should be transparent to the
user Security – secure access in resource constraint devices
Mobile and Internet Systems Group
Heterogeneous Environment
Wide Area Wide Area Network (WAN)Network (WAN)
9.6 Kbit/s <2Mbs9.6 Kbit/s <2Mbs• Voice• SMS• e-Mail• Web browsing
• mCommerce• Internet access• Document transfer• Low/high quality video
GPS
Local Area Network wLAN
802.11a,b,g802.11a,b,g
LAN
<100Mbs<100Mbs• Access•“hot spots”•LAN equivalent
WirelessBridge
WorkgroupSwitches
Personal Area Network (PAN)
<1Mbs<1Mbs• Access•Synchronization•10 Meters
Bluetooth
GSM/CDMA
Mobile and Internet Systems Group
Adaptive Comm
unication
Optimization (Fractal)
Conn
ectio
n Vi
ew b
ased
Secu
re a
nd T
rans
pare
nt
Reco
nnec
tion
Semantic based Caching
Our Solution
CEGORClosE and Go,
Open and Resume
Mobile and Internet Systems Group
Roadmap
Motivation
Caching
Semantic-based Caching
Simulation Results
Conclusion
Mobile and Internet Systems Group
Caching
Three basic steps involved in accessing data anywhere and anytime. Retrieve the files from the server Work on them locally Write the changes back to the server
A Cache optimizes this process Reduce frequency of disk operations performed Reduce frequency of requests to the fileservers Reducing network load
Problem being addressed Minimization of CommunicationMinimization of Communication
Mobile and Internet Systems Group
Why Study Caching?
It has been studied extensively yet LRU is the most commonly used algorithm Used in NFS, AFS, Sprite, CODA and most operating
systems buffer caches
Why ? It’s simple to implement. Cache misses are acceptable in existing systems. Number of files replaced do not matter high hit ratio vs. # of replacement
But in a heterogeneous environment Each miss implies additional communication Storage of work (when in a weakly connected or
disconnected state) Cannot assume a reliable link exists with the server
Mobile and Internet Systems Group
Usage Scenario
“Imagine a field engineer is accessing layout diagrams for a faulty electricity sub-station, half way through communications go down. A cache MISS may cause several minutes delay, perhaps longer, e.g.,
Which was the 10,000 volt cable?”
Mobile and Internet Systems Group
Is simple caching (LRU) enough??
Mobile and Internet Systems Group
Goals
Caches for distributed file systems, that operate across heterogeneous networks must
Provide the hit rates of conventional caches that operate over homogenous networks
Minimize communication overhead, i.e., minimize replacements which mean increased file availability and
Mobile and Internet Systems Group
Our Approach to Caching
File access patterns aren’t random
A semantic relationship exists between two files in a file access sequence User behavior Program execution
We define and investigate two kinds of such relations Inter-file relations Intra-file relations
We introduce the notion of eviction index for each cached item
Mobile and Internet Systems Group
Outline
Motivation
Caching
Semantic-based Caching
Simulation Results
Conclusion
Mobile and Internet Systems Group
Inter-file relations
Analysis of DFS traces
Mobile and Internet Systems Group
Inter-file relations
An inter-file relationship exits between two files i and j, if i is the next file opened following j being closed. File j is called file i’s precursor.
Xi - represents the number of times file i is accessed.
Ti - represents the time since the last access to file i.
Yj - represents the number of times file j precedes file i.
Mobile and Internet Systems Group
Intra-file relations
An intra-file relationship is said to exist between two files i and j if they are both open before they are closed.
Intra-file relations are based on shared time Si,j
defined below Where O(i) and C(i) are the time at which file i was
opened or closed respectively
i
j
C
C
O
O
Sij
Mobile and Internet Systems Group
Intra-file relations
Ti - represents the time since the last access to file i.
Tj - represents the time since the last access to file j where j is open before i is closed.
Si,j - represents the shared time of file i with respect to file j where i is closed before j.
Stotal - represents the total shared time with all files that are open before i is closed
Mobile and Internet Systems Group
Inter + Intra
Mobile and Internet Systems Group
Workload
DFS Traces from CMU were utilized during the simulation
Mobile and Internet Systems Group
Implementation
Seven replacement algorithms RR – Round Robin LRU – Least recently used LFU – Least frequently used GDS – Greedy dual size INTER – based only on inter-file relations INTRA – based on intra-file relations Both – based on both intra and inter file relations
Varying cache sizes 10KB, 25KB, 50KB, 100KB, 500KB …
Seven traces Simulator maintains a cache (hashlist), open list
(list of currently open files), close list (list of files that are closed).
Mobile and Internet Systems Group
Structure of simulator
Mobile and Internet Systems Group
Simulator pseudocode
Mobile and Internet Systems Group
Outline
Motivation
Caching
Semantic-based Caching
Simulation Results
Conclusion
Mobile and Internet Systems Group
Hit rates of all algorithms
INTER/BOTH
Mobile and Internet Systems Group
INTRA
Replace attempts of all algorithms
INTER/BOTH
Mobile and Internet Systems Group
Performance – DFS Traces
Mobile and Internet Systems Group
Performance – DFS Traces
Mobile and Internet Systems Group
Performance
Mobile and Internet Systems Group
The Need For File System Tracing
Traces haven’t been collected periodically enough to reflect present day usage activity
Publicly available traces such as traces collected at the disk driver level or web proxy traces do not give us relevant information on file system workload.
Mobile and Internet Systems Group
New Open
Data Logger
Trace Module
System Call Interception
Standard Library
int fopen(char *name,char *mode)
USER SPACE KERNEL SPACE
Original
Open
Original
Close
Open Close
System Call Table
Mobile and Internet Systems Group
Analysis Summary
Most files were opened for less than a hundredth of a second
Majority of files are accessed only a few times. There is a small percentage of very popular files
Majority of files are less than 100KB in size. Large file can be very large (heavy tail)
Almost half the accesses repeat within a short period of initially occurring
File throughput has greatly increased due to presence of large files
Majority of files accessed have a unique predecessor
Mobile and Internet Systems Group
MIST traces – Hit Rates
Mobile and Internet Systems Group
MIST Traces – Files Replaced
Mobile and Internet Systems Group
MIST Traces –Byte Hit Rate
Mobile and Internet Systems Group
Summary
We have presented a semantic-based caching algorithm and shown that it performs better than conventional caching approaches in terms of hit ratio and byte hit ratio
We have also shown that it does this performing far
fewer replacements
Compared to prevalent replacement strategies that ignore file relations and communication overhead, this approach would seem to better suit distributed file systems that operate across heterogeneous environments
Mobile and Internet Systems Group
Future Work
Collecting more state-of-the-art distributed file systems traces
Applying the cache replacement algorithm into a real wireless file system in computer-assisted surgery application
Investigating the idea into more general applications, such as mobile database access, etc.