Master’s Thesis
Semantic Query Caching in Mobile Environments
By: Jekkin Shah
Advisor: Dr. Konstantinos Kalpakis
Master’s Thesis http://www.csee.umbc.edu/~jekkin1/thesis
Semantic Query Caching in Mobile Environments
Introduction
Motivation
Contribution
Concept of Semantic Caching
Issues involved in semantic caching
System Architecture
Prototype and Experiments
Conclusion and further work
Introduction
Independent progress in:
Geographic Information System (GIS)
Global Positioning System (GPS)
Wireless Technology
Handheld devices
Convergence into Mobile Geographic Information Systems (mobile GIS)
Rapid growth of mobile GIS applications in all walks of life
Emphasis on spatial data: its storage, retrieval and manipulation
Mobile GIS
(Figure: GIS, GPS, Wireless and Handheld converging into Mobile GIS)
Growing List of Applications
Car navigation systems
Emergency services
Real-time stock quotes
Field services
Real-time tracking and routing of shipments
Environmental surveys
… and the list is growing rapidly
Motivation
Hungry! Let's find a nearby restaurant.
Query Q1:
FIND restaurants WHERE location = “nearby”
McDonalds 2 miles
Dominos Express 2.4 miles
Taco Bell 3 miles
Subway 4 miles
McDonalds 5 miles
…….
Found 37 matches
Example 1 (cont.)
Wait… we also need some gas!
Let's see if we can find a McDonalds with a gas station nearby.
Query Q2:
FIND “McDonalds” WHERE gas Station = “nearby”
McDonalds 5 miles
McDonalds 12.4 miles
Found 2 matches
Shouldn’t we speed up the process?
Query Q1 is in the local cache
Query Q1 subsumes query Q2
Why execute query Q2 from scratch?
We need a technique to determine that Q2 is contained in Q1 and to extract Q2's answer from Q1's cached result
Unfortunately, traditional techniques such as page caching do not help much in this case
(Figure: Q2 shown as a region inside Q1)
A new approach – Semantic Caching
Store the queries (query descriptors) in the cache along with their results
Use these query descriptors to determine if and how a new query can be answered from the cache:
  Check whether the required data is present in the cache
  Extract the data from the cache
Add, remove and merge data by performing the corresponding operations on the query descriptors
Manage the cache by managing the query descriptors
Think of query descriptors as intelligent pointer references that implicitly contain some information about the data they refer to
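The descriptor-as-pointer idea can be made concrete with a small sketch. It assumes, hypothetically, that a query descriptor is a conjunction of range predicates (attribute mapped to a closed interval) and that a new query is answered only when a single cached descriptor subsumes it; the distances reuse the restaurant example, with “nearby” taken to mean within 5 miles. `contains`, `matches` and `SemanticCache` are illustrative names, not the thesis's code.

```python
def contains(outer, inner):
    """True if every row satisfying `inner` also satisfies `outer`.

    A descriptor maps attribute -> closed interval (lo, hi); attributes it
    does not mention are unconstrained.
    """
    for attr, (lo, hi) in outer.items():
        ilo, ihi = inner.get(attr, (float("-inf"), float("inf")))
        if ilo < lo or ihi > hi:
            return False
    return True


def matches(pred, row):
    """True if the row satisfies every interval in the descriptor."""
    return all(lo <= row[attr] <= hi for attr, (lo, hi) in pred.items())


class SemanticCache:
    def __init__(self):
        self.entries = []  # (query descriptor, cached result rows)

    def admit(self, pred, rows):
        self.entries.append((pred, rows))

    def answer(self, pred):
        """Answer from a cached descriptor that subsumes `pred`; None on a miss."""
        for cached_pred, rows in self.entries:
            if contains(cached_pred, pred):
                # Extraction: filter the cached rows with the new predicate.
                return [r for r in rows if matches(pred, r)]
        return None


# Hypothetical data in the spirit of the restaurant example.
cache = SemanticCache()
cache.admit({"dist": (0, 5)}, [
    {"name": "McDonalds", "dist": 2.0},
    {"name": "Dominos Express", "dist": 2.4},
    {"name": "Taco Bell", "dist": 3.0},
    {"name": "Subway", "dist": 4.0},
    {"name": "McDonalds", "dist": 5.0},
])
print([r["name"] for r in cache.answer({"dist": (0, 3)})])
# ['McDonalds', 'Dominos Express', 'Taco Bell']
print(cache.answer({"dist": (0, 10)}))  # None: not subsumed by the cached query
```

No page is touched on the hit: the descriptor alone tells the cache that the answer is present and how to carve it out.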
Problems with traditional caching
Pointer references do not contain any implicit information
Q1: pages p1, p2, p3, p4, p5, p6
Q2: pages p7, p8, p9, p10, p11, p12
Q3: all the pages
Space constraints make it difficult to keep all the pages in the cache
(Figure: cached pages p1 to p12 and the data for Q3)
Contribution
An architecture for Semantic Caching in mobile environments
A system prototype as a “proof-of-concept” with the following building blocks:
  Query parser and validator
  A solver for determining query satisfiability
  An executor for processing partial and remainder queries
  A cache manager for efficiently managing the cache
A cache replacement algorithm
Techniques for query processing
Issues in semantic caching
Although the idea of semantic caching is straightforward (store query descriptors along with their results), the issues involved are much harder!
Simple concept, but difficult implementation
Issues
1. We need to decide whether the answer is present in the cache
2. If it is present, do we have sufficient information to extract it?
Answering Queries from Cache
Q1 Select * from db where A > 50
Q2 Select * from db where B < 550
Q3 Select * from db where (A > 200 and B < 300)
Is the result of Q3 present in (Q1 + Q2)?
Solving the implication problem
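Let $T = \{T_1, \ldots, T_n\}$ be the set of query descriptors already in the cache (e.g., $T = \{Q_1, Q_2\}$); a tuple is in the cache if it satisfies any $T_i$, so $T$ is read as $T_1 \lor \cdots \lor T_n$
We need to show that $Q \Rightarrow T$
We show that $\neg(Q \Rightarrow T)$ is FALSE, i.e., unsatisfiable:
$$
\begin{aligned}
\neg(Q \Rightarrow T) &\equiv \neg(\neg Q \lor T) \\
&\equiv Q \land \neg T \\
&\equiv Q \land \neg(T_1 \lor T_2 \lor \cdots \lor T_n) \\
&\equiv Q \land \neg T_1 \land \neg T_2 \land \cdots \land \neg T_n
\end{aligned}
$$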
This is the primary technique used in our thesis.
The algorithm is adapted from [LY85].
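The unsatisfiability test can be sketched for the simple case the slides use: conjunctions of range predicates over integer attributes (so `A > 50` becomes `A ∈ [51, ∞)`). This is an illustrative brute-force version, not the [LY85] algorithm itself: each $\neg T_i$ becomes a disjunction of "attribute below / above its interval" literals, the conjunction is expanded into DNF with `itertools.product`, and any satisfiable term is a witness that $Q$ is not covered. The names `negate`, `satisfiable` and `implies` are hypothetical.

```python
import itertools
import math

INF = math.inf

# A conjunctive range predicate maps attribute -> closed integer interval (lo, hi);
# e.g. "A > 50" on integers becomes {"A": (51, INF)}. Unlisted attributes are free.


def negate(pred):
    """Literals of ¬pred: ¬(a ∈ [lo, hi] ∧ ...) = (a < lo) ∨ (a > hi) ∨ ..."""
    literals = []
    for attr, (lo, hi) in pred.items():
        if lo > -INF:
            literals.append((attr, (-INF, lo - 1)))
        if hi < INF:
            literals.append((attr, (hi + 1, INF)))
    return literals


def satisfiable(query, literals):
    """Is query ∧ literal_1 ∧ ... ∧ literal_k satisfiable? Intersect intervals per attribute."""
    bounds = {attr: list(iv) for attr, iv in query.items()}
    if any(lo > hi for lo, hi in bounds.values()):
        return False
    for attr, (lo, hi) in literals:
        cur = bounds.setdefault(attr, [-INF, INF])
        cur[0], cur[1] = max(cur[0], lo), min(cur[1], hi)
        if cur[0] > cur[1]:
            return False
    return True


def implies(query, cached):
    """Q ⇒ (T1 ∨ ... ∨ Tn) iff Q ∧ ¬T1 ∧ ... ∧ ¬Tn is unsatisfiable."""
    negated = [negate(t) for t in cached]
    # Distribute ∧ over ∨: one DNF term per choice of one literal from each ¬Ti.
    # The number of terms is the product of the literal counts (exponential in n).
    for term in itertools.product(*negated):
        if satisfiable(query, term):
            return False  # a satisfiable term describes tuples outside the cache
    return True


Q1 = {"A": (51, INF)}                      # A > 50
Q2 = {"B": (-INF, 549)}                    # B < 550
Q3 = {"A": (201, INF), "B": (-INF, 299)}   # A > 200 and B < 300
print(implies(Q3, [Q1, Q2]))               # True
print(implies({"A": (0, INF)}, [Q1]))      # False: tuples with 0 <= A <= 50 are missing
```

The `itertools.product` expansion makes the cost visible: with $n$ cached descriptors of two literals each, up to $2^n$ terms are checked.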
Solving the implication problem (Cont.)
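Exponential growth in the number of terms to be solved: each $\neg T_i$ is a disjunction, so expanding $Q \land \neg T_1 \land \cdots \land \neg T_n$ into disjunctive normal form multiplies the number of conjunctive terms
Solution: clustering based on signatures
A query's signature is built from the attributes appearing in its predicates
The number of clusters created is restricted
Signatures are used to index the query descriptors
(Figure: clusters with signatures {A, B} and {X, D})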
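Signature-based clustering can be sketched as a dictionary keyed by the attribute set. The lookup rule below is an assumption, not taken from the slides: for single-descriptor containment, only clusters whose signature is a subset of the query's signature can hold a descriptor that contains the query, so the others are skipped. The cluster limit is enforced by refusing new signatures once `max_clusters` is reached, which is also an assumed policy; `signature` and `SignatureIndex` are hypothetical names.

```python
from collections import defaultdict


def signature(pred):
    """A query's signature: the set of attributes its predicates mention."""
    return frozenset(pred)


class SignatureIndex:
    def __init__(self, max_clusters):
        self.max_clusters = max_clusters
        self.clusters = defaultdict(list)  # signature -> descriptors in that cluster

    def add(self, pred):
        sig = signature(pred)
        if sig not in self.clusters and len(self.clusters) >= self.max_clusters:
            return False  # assumed policy: refuse to open a cluster beyond the limit
        self.clusters[sig].append(pred)
        return True

    def candidates(self, query):
        """Descriptors that could contain `query` on their own.

        A descriptor that constrains an attribute the query leaves free cannot
        contain it (unless that constraint spans the whole domain), so only
        clusters whose signature is a subset of the query's are searched.
        """
        qsig = signature(query)
        return [p for sig, preds in self.clusters.items() if sig <= qsig for p in preds]


INF = float("inf")
index = SignatureIndex(max_clusters=3)
index.add({"A": (51, INF)})              # Q1, signature {A}
index.add({"B": (-INF, 549)})            # Q2, signature {B}
index.add({"X": (0, 10), "D": (5, 7)})   # signature {X, D}
q3 = {"A": (201, INF), "B": (-INF, 299)}
print(len(index.candidates(q3)))         # 2: only Q1 and Q2 are examined
```

Note that when a query is covered only by the union of several descriptors, a descriptor with extra attributes can still contribute, so this filter applies to the single-descriptor test only.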
Data Extraction problem
Data1: Select * from db where A > 50
Data2: Select * from db where B < 550
Data3: Select * from db where (A > 200 and B < 300 and C = 100)
Can we extract Data3?
We fetch attribute C from the remote source and take a Cartesian product with the data already present in the cache
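Here is a sketch of that extraction step. It assumes, hypothetically, that the cached tuples carry a key `id` but not attribute C, and that the server can return `(id, C)` pairs for a list of ids (`fetch_c` stands in for that remote call). For the result to be correct, the Cartesian product has to be restricted to pairs with matching ids, which makes it an equi-join on `id`.

```python
def answer_q3(cached_rows, fetch_c):
    """Answer A > 200 and B < 300 and C = 100 when cached tuples lack attribute C.

    cached_rows: dicts with a key "id" plus attributes A and B (C was not cached).
    fetch_c:     stand-in for the remote call; given ids, returns (id, C) pairs.
    """
    # 1. Evaluate the predicates on A and B locally.
    local = [r for r in cached_rows if r["A"] > 200 and r["B"] < 300]
    # 2. Fetch only the missing attribute C, and only for the qualifying tuples.
    fetched = fetch_c([r["id"] for r in local])
    # 3. Cartesian product of the two sets, keeping pairs whose ids match
    #    (i.e. a join on id), then apply the remaining predicate C = 100.
    return [
        {**r, "C": c}
        for r in local
        for rid, c in fetched
        if r["id"] == rid and c == 100
    ]


# Hypothetical remote source: id -> value of C.
server = {1: 100, 2: 100, 3: 50}
rows = [{"id": 1, "A": 250, "B": 100}, {"id": 2, "A": 100, "B": 100}, {"id": 3, "A": 300, "B": 200}]
print(answer_q3(rows, lambda ids: [(i, server[i]) for i in ids]))
# [{'id': 1, 'A': 250, 'B': 100, 'C': 100}]
```

Only one column for the already-qualifying tuples crosses the network, rather than the whole Data3 result.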
Answering Partial Queries
What happens if $Q \Rightarrow T$ is FALSE?
Q and T may still have a non-empty intersection
Answer $Q \land T$ locally (partial match)
Send $Q \land \neg T$ to the server (remainder query)
(Figure: query Q partially overlapping cached regions T1 and T2)
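The split into a local part and a remainder can be sketched with box arithmetic, again assuming conjunctive range predicates over integer attributes. This uses box subtraction, a common way to represent $Q \land \neg T$ as a union of disjoint boxes; it is an illustration, not necessarily how the thesis's executor forms remainder queries. `intersect`, `subtract` and `split_query` are hypothetical names.

```python
import math

INF = math.inf

# Conjunctive range predicates: attribute -> closed integer interval (lo, hi).


def intersect(q, t):
    """Q ∧ T as a single predicate, or None if the intersection is empty."""
    out = dict(q)
    for attr, (lo, hi) in t.items():
        qlo, qhi = out.get(attr, (-INF, INF))
        nlo, nhi = max(qlo, lo), min(qhi, hi)
        if nlo > nhi:
            return None
        out[attr] = (nlo, nhi)
    return out


def subtract(q, t):
    """Q ∧ ¬T as a list of disjoint predicates (box subtraction)."""
    if intersect(q, t) is None:
        return [q]
    pieces, rest = [], dict(q)
    for attr, (lo, hi) in t.items():
        qlo, qhi = rest.get(attr, (-INF, INF))
        if qlo < lo:  # slice of the remaining box below T on this attribute
            pieces.append({**rest, attr: (qlo, lo - 1)})
        if qhi > hi:  # slice above T on this attribute
            pieces.append({**rest, attr: (hi + 1, qhi)})
        rest[attr] = (max(qlo, lo), min(qhi, hi))  # continue inside T's range
    return pieces


def split_query(q, cached):
    """Split Q into parts answered locally and a remainder for the server."""
    local = [p for t in cached if (p := intersect(q, t)) is not None]  # may overlap
    remainder = [q]
    for t in cached:
        remainder = [piece for r in remainder for piece in subtract(r, t)]
    return local, remainder


q = {"A": (0, 100), "B": (0, 100)}
t1 = {"A": (51, INF)}  # cached: A > 50
local, remainder = split_query(q, [t1])
print(local)      # [{'A': (51, 100), 'B': (0, 100)}]
print(remainder)  # [{'A': (0, 50), 'B': (0, 100)}]
```

An empty remainder means $Q \Rightarrow T$ held after all and nothing needs to be sent to the server.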
Semantic Caching Architecture
(Figure: architecture with the components Query parser and Validator, Solver (query implication), Cache manager, Executor, Local Cache and Remote db; a query enters the system and results are returned)
Cache Structure
The local cache is implemented as relational database tables
Query descriptors are stored in one table, indexed by their signatures
The corresponding query results (data) are stored in another table
An auxiliary table associates each query descriptor with its corresponding data
The cache manager interacts with the query descriptor table
Data is manipulated by manipulating the query descriptors
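A three-table layout of this kind can be sketched with Python's built-in `sqlite3` module. The table names (`descriptor`, `data`, `descriptor_data`), the columns, storing the predicate as text, and the helper functions are all hypothetical. The sketch also shows why deleting a region does not necessarily delete data: a tuple is purged only when no remaining descriptor refers to it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE descriptor (            -- one row per cached query (semantic region)
    qid       INTEGER PRIMARY KEY,
    signature TEXT NOT NULL,         -- sorted predicate attributes, e.g. 'A,B'
    predicate TEXT NOT NULL          -- the query descriptor itself
);
CREATE INDEX descriptor_by_signature ON descriptor(signature);
CREATE TABLE data (                  -- cached result tuples, each stored once
    rid  INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE descriptor_data (       -- auxiliary table: which tuples answer which query
    qid INTEGER NOT NULL REFERENCES descriptor(qid),
    rid INTEGER NOT NULL REFERENCES data(rid),
    PRIMARY KEY (qid, rid)
);
"""


def add_region(conn, qid, signature, predicate, rows):
    """Admit a query and its result; tuples already cached are reused."""
    conn.execute("INSERT INTO descriptor VALUES (?, ?, ?)", (qid, signature, predicate))
    conn.executemany("INSERT OR IGNORE INTO data VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO descriptor_data VALUES (?, ?)", [(qid, rid) for rid, _ in rows])


def rows_for(conn, qid):
    """Reach a region's data only through its descriptor."""
    return conn.execute(
        "SELECT d.rid, d.name FROM data d JOIN descriptor_data dd ON d.rid = dd.rid "
        "WHERE dd.qid = ? ORDER BY d.rid", (qid,)).fetchall()


def remove_region(conn, qid):
    """Drop a region; tuples still referenced by another region survive."""
    conn.execute("DELETE FROM descriptor_data WHERE qid = ?", (qid,))
    conn.execute("DELETE FROM descriptor WHERE qid = ?", (qid,))
    conn.execute("DELETE FROM data WHERE rid NOT IN (SELECT rid FROM descriptor_data)")


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_region(conn, 1, "A", "A > 50", [(10, "McDonalds"), (11, "Subway")])
add_region(conn, 2, "A,B", "A > 200 AND B < 300", [(10, "McDonalds")])
remove_region(conn, 1)    # region 1 goes away ...
print(rows_for(conn, 2))  # ... but tuple 10 survives: [(10, 'McDonalds')]
```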
Cache: Operations and Management
Cache Manager:
Replacement module
  Replacement: determines what needs to be cached and what can be purged
Management module
  Addition: the granularity of addition is a semantic region
  Deletion: removes a region, though not necessarily its data
  Merge: two or more regions can be merged to simplify query processing
  Decomposition: a very large region can be decomposed for efficiency
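The merge operation can be sketched for range-predicate regions: two regions merge into one descriptor when their union is itself a single box, that is, when they agree on every attribute except one and are adjacent or overlapping on that one (the merged region's data is then the union of both regions' tuples). The conservative rules for refusing a merge are assumptions of this sketch, and `try_merge` is a hypothetical name.

```python
def try_merge(p, q):
    """Merge two semantic regions (conjunctive range predicates) into one.

    Returns the merged predicate when the union of the two boxes is itself a
    box, otherwise None (the regions are then kept separate).
    """
    if p.keys() != q.keys():
        return None                 # conservatively refuse different signatures
    differing = [a for a in p if p[a] != q[a]]
    if not differing:
        return dict(p)              # identical regions
    if len(differing) > 1:
        return None                 # conservatively refuse: union is not a box in general
    a = differing[0]
    (plo, phi), (qlo, qhi) = p[a], q[a]
    if max(plo, qlo) > min(phi, qhi) + 1:
        return None                 # gap between the integer intervals
    return {**p, a: (min(plo, qlo), max(phi, qhi))}


print(try_merge({"A": (0, 50), "B": (0, 10)}, {"A": (51, 100), "B": (0, 10)}))
# {'A': (0, 100), 'B': (0, 10)}
print(try_merge({"A": (0, 50)}, {"A": (52, 100)}))  # None: A = 51 is not covered
```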
Cache Replacement
Theory and Assumptions
What is the performance metric?
Conventional caching schemes try to improve performance by optimizing one or more of the following parameters:
  Hit ratio
  Response time
  Data transmission time
Because of the dynamics of our application domain, none of these parameters truly reflects the performance of our applications
Theory and Assumptions (Cont.)
Cache hit rate: how do we define a hit?
  One: at least one data record is obtained from the local cache
  All: all data records are obtained from the local cache
  Mid: 50% of the data records are satisfied from the local cache
Response time: partially answered queries make it difficult to define response time accurately
Data transmission time: depends heavily on actual network parameters such as latency and bandwidth
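The three hit-rate definitions differ only in the threshold a query must reach to count as a hit. A minimal sketch, assuming each query is summarized by how many of its result records came from the cache and how many it returned in total, and reading “Mid” as “at least 50%”; `hit_rates` is a hypothetical helper.

```python
def hit_rates(per_query):
    """per_query: list of (records_from_cache, total_records) for each query.

    Returns the fraction of queries counted as hits under each definition.
    """
    n = len(per_query)
    one = sum(1 for local, total in per_query if local >= 1)
    all_ = sum(1 for local, total in per_query if total > 0 and local == total)
    mid = sum(1 for local, total in per_query if total > 0 and 2 * local >= total)
    return {"One": one / n, "All": all_ / n, "Mid": mid / n}


print(hit_rates([(10, 10), (3, 10), (6, 10), (0, 5)]))
# {'One': 0.75, 'All': 0.25, 'Mid': 0.5}
```

The same workload yields three quite different numbers, which is why the slide questions hit rate as the metric.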
Theory and Assumptions (Cont.)
Mobile environments: bandwidth is at a premium
Our goal: “To minimize the cost of servicing the requests that cannot be answered from the local cache”
Cost is measured in terms of time
Performance metric: byte hit rate (BHR), the ratio of the amount of data actually served from the local cache to the amount of data transferred from the remote source
Assumptions:
  Negligible query execution time
  Uniform latency and bandwidth across the network
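The BHR computation itself is a ratio of two byte totals. The sketch follows the slide's definition literally (bytes served locally divided by bytes transferred from the remote source); many caching studies instead divide by the total bytes requested, so check which normalization a reported figure uses. `byte_hit_rate` is a hypothetical helper.

```python
def byte_hit_rate(per_query):
    """per_query: list of (bytes_served_from_cache, bytes_transferred_from_remote).

    Cached bytes divided by remotely transferred bytes, as defined on the slide.
    """
    local = sum(cached for cached, _ in per_query)
    remote = sum(fetched for _, fetched in per_query)
    if remote == 0:
        return float("inf")  # nothing was fetched from the remote source
    return local / remote


print(byte_hit_rate([(300, 700), (400, 300)]))  # 0.7
```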
Replacement Algorithm
A Guiding Action Selection (GAS) function assigns a value to each semantic region:
GAS value = a + (s * f * b), where
  s = size of the data transferred from the remote source
  f = access frequency of the query
  a, b are domain-specific parameters:
    a = freshness count of the query
    b = 1/Sd, where Sd is the distance between the current location of the moving object and the location of the query
The value of each semantic region is computed with the GAS function
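The GAS formula translates directly into code. Two details are assumptions not fixed by the slides: a query's location is represented by a single point (for a rectangular query, its centre would be a natural choice), and Sd is the Euclidean distance; a small `eps` keeps b finite when the object is exactly at the query location. `Region` and `gas_value` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    size: float       # s: bytes transferred from the remote source for this query
    frequency: int    # f: how often the query has been accessed
    freshness: float  # a: freshness count of the query
    location: tuple   # (x, y) point representing the query's location (assumed)


def gas_value(region, current, eps=1e-9):
    """GAS value = a + s * f * b, with b = 1 / Sd."""
    sd = math.dist(current, region.location)  # Euclidean distance (assumption)
    b = 1.0 / max(sd, eps)                    # avoid dividing by zero at Sd = 0
    return region.freshness + region.size * region.frequency * b


print(gas_value(Region(size=1000, frequency=2, freshness=1, location=(3, 4)), (0, 0)))
# 401.0
```

Regions that are large, popular and close to the moving object score highest; as the object drives away, b shrinks and the region becomes a candidate for eviction.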
Replacement Algorithm (Cont.)
For each query i in the cache we have:
  GAS value (Vi)
  Weight (Wi)
We also have limits on the total size of the cache (W) and on the total number of queries (K) that can be admitted
Problem definition: “Given a set of rectangles, each with a weight and a value, choose at most K rectangles that give the maximum total value, provided their total weight does not exceed W”
The problem can be formulated as a 0-1 knapsack problem with an additional cardinality constraint
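For integer weights, the cardinality-constrained 0-1 knapsack can be solved exactly with a standard dynamic program in O(n·K·W) time, where `dp[k][w]` is the best total value using at most k regions of total weight at most w. This is a textbook technique shown for illustration; the slides do not say which solver the thesis uses. `select_regions` is a hypothetical name, values would be GAS values, and weights would be region sizes in some integer unit.

```python
def select_regions(values, weights, W, K):
    """0-1 knapsack with a cardinality constraint: at most K items, total weight <= W.

    dp[k][w] = best total value using at most k items with total weight <= w.
    Integer weights assumed. Returns (best_value, chosen_indices).
    """
    n = len(values)
    dp = [[0.0] * (W + 1) for _ in range(K + 1)]
    take = [[[False] * (W + 1) for _ in range(K + 1)] for _ in range(n)]
    for i in range(n):
        vi, wi = values[i], weights[i]
        # Iterate k downward so dp[k - 1] still holds values without item i.
        for k in range(K, 0, -1):
            for w in range(W, wi - 1, -1):
                cand = dp[k - 1][w - wi] + vi
                if cand > dp[k][w]:
                    dp[k][w] = cand
                    take[i][k][w] = True
    # Backtrack from the full budget to recover which regions to keep.
    chosen, k, w = [], K, W
    for i in range(n - 1, -1, -1):
        if k > 0 and take[i][k][w]:
            chosen.append(i)
            k -= 1
            w -= weights[i]
    return dp[K][W], sorted(chosen)


print(select_regions([60, 100, 120], [10, 20, 30], W=50, K=2))  # (220.0, [1, 2])
```

Regions not in the chosen set are the ones the replacement module would purge.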
Experiments (Setup)
Requirements:
  Workload (datasets and queries)
  Modeling the behavior of the moving object
  Query execution guidelines
Real datasets:
  Hard to obtain
  Complex to process, due to the complex structures of spatial objects
Synthetic dataset generator:
  Easily generated
  Various parameters can be controlled
Workload
Query load selection
Tables:
  Restaurants: LocX, LocY, Name, ID, tables, City, Zip
  Gas Stations: LocX, LocY, Name, ID, Low, Mid, High
Query specifications:
  Rectangular queries (select and project only)
  Number of queries issued per trip: 20 to 70
  Types of queries: location-aware, location-dependent and non-location-related
  Frequency of issuance: one query every 5 ms to 100 ms, chosen randomly
  Overlap rate: 10% to 25%
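A workload generator with these parameters might look like the following sketch. The space extent, the maximum rectangle side, the projected columns, and the table name `GasStations` (standing in for Gas Stations) are hypothetical choices, and the 10% to 25% overlap rate is not controlled here. Each query is a rectangular select on LocX/LocY with a projection, issued 5 ms to 100 ms after the previous one.

```python
import random
from dataclasses import dataclass

# Hypothetical schema names; "GasStations" stands in for the Gas Stations table.
TABLES = {
    "Restaurants": ("Name", "ID", "City"),
    "GasStations": ("Name", "ID", "Low", "Mid", "High"),
}


@dataclass
class RectQuery:
    issue_ms: float  # time at which the query is issued
    table: str
    columns: tuple   # projected columns
    x: tuple         # (lo, hi) on LocX
    y: tuple         # (lo, hi) on LocY

    def to_sql(self):
        return (f"SELECT {', '.join(self.columns)} FROM {self.table} "
                f"WHERE LocX BETWEEN {self.x[0]:.1f} AND {self.x[1]:.1f} "
                f"AND LocY BETWEEN {self.y[0]:.1f} AND {self.y[1]:.1f}")


def trip_workload(rng, extent=1000.0, max_side=100.0):
    """One trip's worth of rectangular select-project queries."""
    queries, t = [], 0.0
    for _ in range(rng.randint(20, 70)):  # 20 to 70 queries per trip
        t += rng.uniform(5, 100)          # next query 5 ms to 100 ms later
        table = rng.choice(sorted(TABLES))
        cols = tuple(rng.sample(TABLES[table], 2))
        x0, y0 = rng.uniform(0, extent), rng.uniform(0, extent)
        queries.append(RectQuery(t, table, cols,
                                 (x0, x0 + rng.uniform(1, max_side)),
                                 (y0, y0 + rng.uniform(1, max_side))))
    return queries


workload = trip_workload(random.Random(42))
print(len(workload), workload[0].to_sql())
```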
Experiments (Moving Object)
Behavior of the moving object: Generating Spatio-Temporal Dataset (GSTD) [PT00]
  Moves in a 2D space
  Static points and regions, called infrastructure, emulate real-life objects such as buildings, rivers and roads
Trajectories are generated using specific guidelines:
  Initial statistical distribution of infrastructure objects
  Source and destination location
  Speed of the moving object
  Direction of motion
  Duration of the journey
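GSTD itself is more elaborate (randomized movement, infrastructure constraints), so the following is only a simplified stand-in that shows how source, destination, speed, direction and duration interact: the object moves in a straight line toward its destination at constant speed, sampled every `dt` time units, and stops on arrival or when the journey time runs out. `trajectory` is a hypothetical name.

```python
import math


def trajectory(source, destination, speed, duration, dt=1.0):
    """Positions sampled every dt while moving in a straight line toward destination."""
    (x, y), (tx, ty) = source, destination
    points, t = [(x, y)], 0.0
    while t < duration:
        dx, dy = tx - x, ty - y
        remaining = math.hypot(dx, dy)
        if remaining == 0:
            break                          # arrived before the journey time ran out
        step = min(speed * dt, remaining)  # do not overshoot the destination
        x, y = x + step * dx / remaining, y + step * dy / remaining
        points.append((x, y))
        t += dt
    return points


print(trajectory((0, 0), (10, 0), speed=2, duration=100))
```

Each sampled position could then be fed to the GAS function as the object's current location.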
Query Execution Guidelines
Controllable parameters:
  Type of queries: location-dependent, location-aware, non-location-related
  Frequency of query issuance
  Selectivity of chosen queries
  Query overlap rate
Parameters are chosen in a variety of combinations:
  Random
  Gaussian distribution
  Skewed distribution
Results
Cache Size vs. Hit Rate (NEW vs. m-LRU)
The NEW replacement scheme performs roughly on par with the modified LRU (m-LRU) replacement scheme
BHR increases up to 70% as the cache size is progressively increased
Results
Hit Rates vs. Number of Queries (NEW scheme)
Increasing the number of queries in the system does not substantially increase the hit rates
Byte hit rate is nearly equal to Hit rate Mid
Conclusion
No assumption is made about spatial locality of reference
Query descriptors act as “intelligent” references
Can support content-based reasoning
Ability to take advantage of schema knowledge
Page / tuple caching schemes do not scale well in our GIS domain
Reasons:
• “Unintelligent” pointer references
• Questionable assumption of Spatial Locality of Reference
• Inability to take advantage of Semantic Overlaps
Advantages of Semantic Caching
Benefits of semantic caching:
  Leverages the semantic locality found in typical mobile GIS applications
  Adapts dynamically to the patterns of user queries rather than caching static clusters of tuples
  Minimizes the cost of cache lookup thanks to the compact representation of query descriptors
  Capable of providing partial and/or approximate answers to queries quickly
Conclusion (Cont.)
Shortcomings of semantic caching:
  Complicated cache management schemes
  Too restrictive: the solver can process only simple types of queries
  Captures the semantics of the query, not of the result objects; hence it fails to use cached objects when the query semantics do not match
Conclusion (Cont.)
Future work (lots of it):
  Make the solver more general, to handle different types of queries
  Make the caching scheme flexible enough to capture the semantics of the query descriptors as well as the result objects
  Simpler cache management
  Ability to share the cache with peers