Master’s Thesis Semantic Query Caching in Mobile Environments By: Jekkin Shah Advisor: Dr. Konstantinos Kalpakis

Master’s Thesis

Semantic Query Caching in Mobile Environments

By: Jekkin Shah

Advisor: Dr. Konstantinos Kalpakis

Master’s Thesis 22 http://www.csee.umbc.edu/~jekkin1/thesis

Semantic Query Caching in Mobile Environments

Introduction

Motivation

Contribution

Concept of Semantic Caching

Issues involved in semantic caching

System Architecture

Prototype and Experiments

Conclusion and further work

Introduction

Disparate lines of work and progress in:

Geographic Information System (GIS)

Global Positioning System (GPS)

Wireless Technology

Handheld devices

Convergence to Mobile Geographic Information System (mobile GIS)

Rapid growth in mobile GIS applications in all walks of life

Emphasis on spatial data, its storage, retrieval and manipulation

Mobile GIS

[Figure: GIS, GPS, wireless technology, and handheld devices converging into mobile GIS]

Growing List of Applications

Car navigation systems

Emergency services

Real time stock quotes

Field services

Real time tracking and routing of shipments

Environmental surveys

and the list is growing rapidly …

Motivation

Hungry! Let’s find a nearby restaurant.

query Q1:

FIND restaurants WHERE location = “nearby”

McDonalds 2 miles

Dominos Express 2.4 miles

Taco Bell 3 miles

Subway 4 miles

McDonalds 5 miles

…….

Found 37 matches

Example 1 (cont.)

Wait … we also need some gas!

Let’s see if we can find a gas station near a McDonalds.

query Q2:

FIND “McDonalds” WHERE gas Station = “nearby”

McDonalds 5 miles

McDonalds 12.4 miles

Found 2 matches

Shouldn’t we speed up the process?

Query Q1 is in the local cache

Query Q1 subsumes query Q2

Why do we need to execute query Q2 from scratch?

We need a technique to determine that Q2 is contained in Q1 and to extract the result of Q2 from the result of Q1

Unfortunately, traditional techniques such as page caching do not help much in this case

[Figure: the result of Q2 shown as a subset of the result of Q1]

A new approach – Semantic Caching

Store the queries in the cache along with their results

Use these queries (query descriptors) to determine if and how a new query can be answered from cache:

• Check if the required data is present in cache

• Extract the data from cache

Add, remove, merge data by performing corresponding operation on query descriptors

Manage cache by managing the query descriptors

Think of query descriptors as intelligent pointer references that implicitly contain some information about the data they refer to

Problems with traditional caching

Pointer references do not contain any implicit information

Q1 → p1, p2, p3, p4, p5, p6

Q2 → p7, p8, p9, p10, p11, p12

Q3 → all the pages

Space constraints will make it difficult to store all the pages in cache.

[Figure: pages p1 through p12]

Contribution

An architecture for Semantic Caching in mobile environments

A system prototype as a “proof-of-concept” with the following building blocks:

• Query parser and validator

• A Solver for determining query satisfiability

• An Executor for processing partial and remainder queries

• A Cache manager for efficiently managing the cache

A cache replacement algorithm

Techniques for query processing

Issues in semantic caching

Although the idea of semantic caching is straightforward (store query descriptors along with their results), the issues involved are much harder.

Simple concept but difficult implementation

Issues

1. We need to decide if the answer is present in cache

2. If present, do we have sufficient information to extract it?

Answering Queries from Cache

Q1 Select * from db where A > 50

Q2 Select * from db where B < 550

Q3 Select * from db where (A > 200 and B < 300)

Is the result of Q3 present in (Q1 + Q2)?

Solving the implication problem

Let T = { Q1, Q2 } be a set of query descriptors already in cache

We need to show that Q ⇒ T

We show that ¬(Q ⇒ T) is FALSE

¬(Q ⇒ T)

≡ ¬(¬Q ∨ T)

≡ Q ∧ (¬T)

≡ Q ∧ ¬(T1 ∨ T2 ∨ T3 ∨ T4)

≡ Q ∧ (¬T1) ∧ (¬T2) ∧ (¬T3) ∧ (¬T4)

This is the primary technique used in our thesis.

The algorithm is adopted from [LY85].
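
The refutation can be sketched in a few lines for the simple case of conjunctive range predicates. This is only an illustrative sketch and not the [LY85] algorithm itself: the atom representation `(attribute, operator, value)`, the real-valued attribute domain, and the names `satisfiable` and `implies` are assumptions made for the example. Each ¬Ti is a disjunction of negated atoms, so the sketch picks one negated atom per cached descriptor and checks that every resulting system is unsatisfiable.

```python
from itertools import product

# Negation of a single comparison atom (assumed real-valued attributes).
NEGATE = {"<": ">=", "<=": ">", ">": "<=", ">=": "<"}


def satisfiable(atoms):
    """Return True if a conjunction of (attr, op, value) atoms has a solution."""
    bounds = {}  # attr -> (low, low_is_strict, high, high_is_strict)
    for attr, op, val in atoms:
        lo, lo_strict, hi, hi_strict = bounds.get(
            attr, (float("-inf"), False, float("inf"), False)
        )
        if op in (">", ">="):
            strict = op == ">"
            if val > lo or (val == lo and strict):
                lo, lo_strict = val, strict
        else:
            strict = op == "<"
            if val < hi or (val == hi and strict):
                hi, hi_strict = val, strict
        bounds[attr] = (lo, lo_strict, hi, hi_strict)
    for lo, lo_strict, hi, hi_strict in bounds.values():
        if lo > hi or (lo == hi and (lo_strict or hi_strict)):
            return False
    return True


def implies(query, cached):
    """Return True if query => (T1 or T2 or ...), i.e. Q and not-T is unsatisfiable."""
    # One system per way of choosing a single atom to violate in each Ti.
    for choice in product(*cached):
        negated = [(attr, NEGATE[op], val) for attr, op, val in choice]
        if satisfiable(list(query) + negated):
            return False  # found a point in Q that lies outside every Ti
    return True


Q1 = [("A", ">", 50)]
Q2 = [("B", "<", 550)]
Q3 = [("A", ">", 200), ("B", "<", 300)]
print(implies(Q3, [Q1, Q2]))
```

The number of systems checked is the product of the descriptor sizes, which is where the exponential growth in the number of equations comes from.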

Solving the implication problem (Cont.)

Exponential growth in the number of equations to be solved.

Solution: Clustering based on Signatures

Signature created by taking into account the predicate attributes present in the query

Restriction on the number of clusters created

Signature used in indexing the query descriptors

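[Figure: query descriptors grouped into clusters by signature, e.g. one cluster for attributes A, B and another for attributes X, D]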
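
A minimal sketch of signature-based clustering, under stated assumptions: the signature is taken to be the set of attributes that appear in a descriptor's predicates, and the lookup rule (consider only clusters whose signature is a subset of the query's signature) is a heuristic chosen for this example, not a rule stated in the slides. The class name `SignatureIndex` and the atom format `(attribute, operator, value)` are hypothetical.

```python
from collections import defaultdict


def signature(descriptor):
    """Signature of a descriptor: the set of attributes used in its predicates."""
    return frozenset(attr for attr, _op, _val in descriptor)


class SignatureIndex:
    """Groups query descriptors into clusters keyed by their signature."""

    def __init__(self):
        self._clusters = defaultdict(list)

    def add(self, descriptor):
        self._clusters[signature(descriptor)].append(descriptor)

    def candidates(self, query):
        """Descriptors worth passing to the solver for this query."""
        q_sig = signature(query)
        return [
            descriptor
            for sig, members in self._clusters.items()
            if sig <= q_sig  # assumed pruning rule: signature is a subset
            for descriptor in members
        ]


index = SignatureIndex()
index.add([("A", ">", 50)])
index.add([("B", "<", 550)])
index.add([("X", ">", 1), ("D", "<", 5)])
print(index.candidates([("A", ">", 200), ("B", "<", 300)]))
```

Only the descriptors returned by `candidates` need to enter the implication check, which keeps the number of systems to solve small.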

Data Extraction problem

Data1: Select * from db where A > 50

Data2: Select * from db where B < 550

Data3: Select * from db where (A > 200 and B < 300 and C = 100)

Can we extract Data3?

We fetch attribute C from remote source and take a Cartesian product with the data already present in cache

Answering Partial Queries

What happens if Q ⇒ T is FALSE?

There may be a non-empty intersection between Q and T

Answer (Q ∧ T) locally (Partial match)

Send (Q ∧ ¬T) to the server (Remainder Query)

[Figure: query Q partially overlapping cached regions T1 and T2]
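
For rectangular queries, the split into a partial match and a remainder can be sketched geometrically. This is an illustrative sketch for a single cached region only; the `Rect` type and the functions `intersect` and `remainder` are hypothetical names, and axis-aligned rectangles with `x1 < x2`, `y1 < y2` are assumed.

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def intersect(q: Rect, t: Rect) -> Optional[Rect]:
    """Part of query q that the cached region t can answer (Q and T)."""
    x1, y1 = max(q.x1, t.x1), max(q.y1, t.y1)
    x2, y2 = min(q.x2, t.x2), min(q.y2, t.y2)
    if x1 >= x2 or y1 >= y2:
        return None
    return Rect(x1, y1, x2, y2)


def remainder(q: Rect, t: Rect) -> List[Rect]:
    """Disjoint rectangles covering the part of q outside t (Q and not T)."""
    hit = intersect(q, t)
    if hit is None:
        return [q]
    pieces = []
    if hit.x1 > q.x1:  # strip to the left of the overlap, full height
        pieces.append(Rect(q.x1, q.y1, hit.x1, q.y2))
    if hit.x2 < q.x2:  # strip to the right of the overlap, full height
        pieces.append(Rect(hit.x2, q.y1, q.x2, q.y2))
    if hit.y1 > q.y1:  # strip below the overlap
        pieces.append(Rect(hit.x1, q.y1, hit.x2, hit.y1))
    if hit.y2 < q.y2:  # strip above the overlap
        pieces.append(Rect(hit.x1, hit.y2, hit.x2, q.y2))
    return pieces


query = Rect(0, 0, 10, 10)
cached = Rect(5, 5, 20, 20)
print(intersect(query, cached))   # answered locally
print(remainder(query, cached))   # sent to the server
```

With several cached regions, the same step could be applied repeatedly to each remainder piece.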

Semantic Caching Architecture

Solver

(Query implication)

Query parser and Validator

Cache manager

Remote db

Local Cache

Executor

results

query

Cache Structure

Local Cache is implemented as relational database structures

Query descriptors are stored in one table indexed by their signatures

Corresponding query results (data) are stored in another table

An auxiliary table associates each query descriptor with its corresponding data

Cache manager interacts with query descriptor table

Manipulation of data is achieved through the manipulation of query descriptors
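
The three-table layout can be sketched with SQLite. This is a hypothetical schema for illustration only: the table and column names (`descriptor`, `result_row`, `descriptor_row`) and the helper functions are assumptions, not the prototype's actual design. The sketch also shows how removing a descriptor need not remove data that another descriptor still references.

```python
import sqlite3

# Hypothetical schema: descriptors, result rows, and a link table between them.
SCHEMA = """
CREATE TABLE descriptor (
    id        INTEGER PRIMARY KEY,
    signature TEXT NOT NULL,
    predicate TEXT NOT NULL
);
CREATE INDEX descriptor_by_signature ON descriptor(signature);
CREATE TABLE result_row (
    id      INTEGER PRIMARY KEY,
    payload TEXT NOT NULL
);
CREATE TABLE descriptor_row (
    descriptor_id INTEGER NOT NULL REFERENCES descriptor(id),
    row_id        INTEGER NOT NULL REFERENCES result_row(id),
    PRIMARY KEY (descriptor_id, row_id)
);
"""


def open_cache():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_region(conn, signature, predicate, rows):
    """Store a descriptor and link it to its result rows [(row_id, payload), ...]."""
    cur = conn.execute(
        "INSERT INTO descriptor (signature, predicate) VALUES (?, ?)",
        (signature, predicate),
    )
    descriptor_id = cur.lastrowid
    for row_id, payload in rows:
        conn.execute(
            "INSERT OR IGNORE INTO result_row (id, payload) VALUES (?, ?)",
            (row_id, payload),
        )
        conn.execute(
            "INSERT INTO descriptor_row VALUES (?, ?)", (descriptor_id, row_id)
        )
    return descriptor_id


def remove_region(conn, descriptor_id):
    """Drop a descriptor; keep rows still referenced by other descriptors."""
    conn.execute(
        "DELETE FROM descriptor_row WHERE descriptor_id = ?", (descriptor_id,)
    )
    conn.execute("DELETE FROM descriptor WHERE id = ?", (descriptor_id,))
    conn.execute(
        "DELETE FROM result_row WHERE id NOT IN (SELECT row_id FROM descriptor_row)"
    )


conn = open_cache()
d1 = add_region(conn, "A", "A > 50", [(1, "r1"), (2, "r2")])
d2 = add_region(conn, "B", "B < 550", [(2, "r2"), (3, "r3")])
remove_region(conn, d1)
remaining = [row[0] for row in conn.execute("SELECT id FROM result_row ORDER BY id")]
print(remaining)
```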

Cache: Operations and Management

Cache Manager:

Replacement module:

• Replacement: Determines what needs to be cached and what can be purged

Management module:

• Addition: Granularity of addition is a semantic region

• Deletion: Removal of a region, though not necessarily leading to the removal of data

• Merge: To simplify query processing, two or more regions can be merged

• Decomposition: A very large region can be decomposed for efficiency reasons

Cache Replacement

Theory and Assumptions

What is the performance metric?

Conventional caching schemes optimize one or more of the following parameters with the goal of improving performance:

• Hit ratio

• Response time

• Data transmission time

Due to the dynamics of our application domain, none of these parameters truly reflects the performance of our applications

Theory and Assumptions (Cont.)

Cache hit rate: how do we define hit rate?

• One: At least one data record obtained from local cache

• All: All data records obtained from local cache

• Mid: 50% of data records satisfied from local cache

Response time: Partially answered queries make it difficult to accurately define the response time

Data transmission time: Depends heavily on actual network parameters such as latency and bandwidth

Theory and Assumptions (Cont.)

Mobile environments: Premium on bandwidth

Our goal: “To minimize the cost of servicing the requests that cannot be answered from the local cache”

Cost is measured in terms of time

Performance metric is Byte hit rate (BHR): Ratio of actual amount of data served from local cache to the amount of data transferred from the remote source

Assumptions:

• Negligible query execution time

• Uniform latency and bandwidth across the network
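
The hit-rate variants (One, All, Mid) and the byte hit rate can be computed as follows. This is a sketch under assumptions: each query is summarized as a pair `(records_from_cache, records_total)` with `records_total > 0`, "Mid" is read as "at least 50% of records", and BHR follows the ratio stated above (cache bytes over remote bytes). The function names and the choice to return infinity when nothing is fetched remotely are hypothetical.

```python
def hit_rates(queries):
    """queries: list of (records_from_cache, records_total), records_total > 0."""
    n = len(queries)
    one = sum(1 for hit, _total in queries if hit >= 1)
    mid = sum(1 for hit, total in queries if hit >= 0.5 * total)  # assumed: at least half
    full = sum(1 for hit, total in queries if hit == total)
    return {"one": one / n, "mid": mid / n, "all": full / n}


def byte_hit_rate(bytes_from_cache, bytes_from_remote):
    """Ratio of data served from the local cache to data fetched remotely."""
    if bytes_from_remote == 0:
        return float("inf")  # assumed convention: everything came from cache
    return bytes_from_cache / bytes_from_remote


trace = [(0, 10), (5, 10), (10, 10), (2, 10)]
print(hit_rates(trace))
print(byte_hit_rate(300, 100))
```

The same trace yields three different hit rates, which illustrates why a single "hit rate" is ambiguous once queries can be partially answered.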

Replacement Algorithm

Guiding Action Selection function (GAS) to assign a value to each semantic region

GAS value = a + (s * f * b)

s = size of data transferred from the remote source

f = frequency of access of the query

a, b are domain-specific parameters

a = freshness count of each query

b = 1/Sd, where Sd is the distance between the current location of the moving object and the location of the query

Using the GAS function the value of each semantic region is calculated
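
A small sketch of the GAS computation, assuming Euclidean distance for Sd and a hypothetical `min_distance` floor to avoid division by zero when the moving object sits exactly on the query location. Neither assumption comes from the slides, and the function and parameter names are illustrative.

```python
import math


def gas_value(size, frequency, freshness, query_xy, current_xy, min_distance=1e-9):
    """GAS value = a + (s * f * b), with a = freshness and b = 1 / Sd."""
    # Sd: distance between the moving object and the query location
    # (Euclidean distance is an assumption of this sketch).
    sd = max(math.dist(query_xy, current_xy), min_distance)
    return freshness + size * frequency / sd


# A 100-unit result, accessed 3 times, freshness 2, located 5 units away.
print(gas_value(100, 3, 2, (3.0, 4.0), (0.0, 0.0)))
```

Regions that are large, frequently used, fresh, and close to the current position receive higher values and are therefore better candidates to keep.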

Replacement Algorithm (Cont.)

For each query in cache we have:

• GAS value (Vi)

• Weight (Wi)

Also, we have a limit on the total size of the cache (W) and on the total number of queries (K) that can be admitted

Problem definition: “Given a set of rectangles, each with a weight and a value, choose at most K rectangles that give maximum total value, provided the total weight does not exceed W”

The problem can be formulated as the 0-1 Knapsack problem with an additional cardinality constraint
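
One standard way to solve this formulation exactly is dynamic programming over both the weight budget and the item count. The sketch below is a textbook approach and not necessarily the method used in the thesis; it assumes non-negative integer weights, and the name `select_regions` is hypothetical.

```python
def select_regions(values, weights, capacity, max_items):
    """0-1 knapsack with a cardinality limit.

    Returns (best total value, indices of chosen regions), using at most
    max_items regions whose integer weights sum to at most capacity.
    """
    # best[k][w] = (value, chosen indices) using at most k regions, weight <= w
    best = [[(0.0, ()) for _ in range(capacity + 1)] for _ in range(max_items + 1)]
    for i, (value, weight) in enumerate(zip(values, weights)):
        # Iterate k downwards so best[k - 1] still excludes region i.
        for k in range(max_items, 0, -1):
            for w in range(capacity, weight - 1, -1):
                prev_value, prev_items = best[k - 1][w - weight]
                if prev_value + value > best[k][w][0]:
                    best[k][w] = (prev_value + value, prev_items + (i,))
    return best[max_items][capacity]


values = [10, 10, 10, 25]   # GAS values Vi
weights = [1, 1, 1, 3]      # weights Wi
print(select_regions(values, weights, capacity=3, max_items=3))
print(select_regions(values, weights, capacity=3, max_items=2))
```

The two calls differ only in K: with three regions allowed the three small regions win, while with two allowed the single large region is the better choice.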

Experiments (Setup)

Requirements:

• Workload (datasets and queries)

• Modeling the behavior of the moving object

• Query execution guidelines

Real datasets:

• Hard to obtain

• Complexity in processing due to complex structures of spatial objects

Synthetic dataset generator:

• Easily generated

• Various parameters can be controlled

Workload

Query load selection

Tables:

Restaurants: LocX, LocY, Name, ID, tables, City, Zip

Gas Stations: LocX, LocY, Name, ID, Low, Mid, High

Query specifications:

Rectangular queries (select and project only)

Number of queries issued per trip : 20-70

Type of queries: Location aware, location dependent and non-location related

Frequency of issuance: Selected randomly ranging from 5 ms to 100 ms

Overlap rate: 10-25%

Experiments (Moving Object)

Behavior of Moving Object:

Generating Spatio-Temporal Dataset (GSTD) [PT00]

Moves in a 2D space

Static points and regions called infrastructure emulate real life objects like buildings, rivers, roads etc.

Trajectories are generated using specific guidelines

Initial statistical distribution of infrastructure objects

Source and destination location

Speed of moving object

Direction of motion

Duration of journey

Query Execution Guidelines

Controllable parameters:

• Type of queries: Location dependent, Location aware, Non-location related

• Frequency of query issuance

• Selectivity of chosen queries

• Query overlap rate

Parameters are chosen in a variety of combinations:

• Random

• Gaussian distribution

• Skewed distribution

Results

Cache Size vs. Hit Rate (NEW vs. m-LRU)

The NEW replacement scheme performs roughly on par with the modified LRU (m-LRU) replacement scheme

BHR increases up to 70% as the cache size is progressively increased

Results (Cont.)

Hit rates vs. Number of queries (NEW scheme)

Increasing the number of queries in the system does not substantially increase the hit rates.

Byte hit rate is nearly equal to the Mid hit rate

Conclusion

No assumption made on Spatial Locality of Reference

Query descriptors act as “Intelligent” References

Can support Content Based Reasoning

Ability to take advantage of Schema Knowledge

Page / Tuple caching schemes do not scale well in our GIS domain

Reasons:

• “Unintelligent” pointer references

• Questionable assumption of Spatial Locality of Reference

• Inability to take advantage of Semantic Overlaps

Advantages of Semantic Caching

Benefits of Semantic Caching:

• Leverages semantic locality found in typical mobile GIS applications

• Adapts dynamically to the patterns of user queries rather than caching static clusters of tuples

• Minimizes cost of cache lookup due to compact representation of query descriptors

• Capable of providing partial and/or approximate answers to queries quickly

Conclusion (Cont.)

Shortcomings of Semantic Caching:

• Complicated cache management schemes

• Too restrictive: the Solver can process only simple types of queries

• Captures the semantics of the query and not of the result objects. Hence, it fails to utilize cached objects when the semantics of the query do not match

Conclusion (Cont.)

Future work … Lots of things:

• Make the solver more general to handle different types of queries

• Make the caching scheme flexible enough to capture the semantics of the query descriptors as well as the result objects

• Simpler cache management

• Ability to share cache with peers