Soheila Dehghanzadeh, Daniele Dell’Aglio, Shen Gao,
Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein
ICWE - 25 June 2015
Outline
● Introduction to Continuous Queries
● Motivating Example
● Problem Description
● Solution
● Experimental Results
● Conclusions
Introduction
•RDF stream processing engines usually register queries and execute them in a continuous fashion.
[Figure: time-based sliding window W(ω,β) over the stream elements S1 to S12 along the time axis t, where ω is the window width and β is the slide. Labels: RDF Stream Generator, Query, Evaluation.]
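The time-based sliding window can be sketched as follows. This is a minimal illustration, not the engine's implementation: the half-open interval convention, the start time `t0` and the helper names `window_bounds` and `window_content` are assumptions made for the example.

```python
from typing import List, Tuple

def window_bounds(t0: int, omega: int, beta: int, count: int) -> List[Tuple[int, int]]:
    """Return the first `count` windows as half-open intervals [start, end).

    omega is the window width and beta is the slide (assumed units: seconds).
    """
    return [(t0 + i * beta, t0 + i * beta + omega) for i in range(count)]

def window_content(stream: List[Tuple[int, str]], start: int, end: int) -> List[str]:
    """Select the stream elements whose timestamp falls inside [start, end)."""
    return [item for (ts, item) in stream if start <= ts < end]

# Hypothetical stream of (timestamp, element) pairs.
stream = [(1, "S1"), (2, "S2"), (4, "S3"), (5, "S4"), (7, "S5")]
bounds = window_bounds(t0=0, omega=4, beta=2, count=3)
contents = [window_content(stream, s, e) for s, e in bounds]
```

Because β is smaller than ω here, consecutive windows overlap, so the same element can take part in several evaluations.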
Introduction
•Complex continuous queries combine data streams with remote background data.
[Figure: a Join combines the output of the RDF Stream Generator with background data (SPARQL endpoint).]
Motivating Example: Finding Influential Users
•Influential users: users who have more than a given number of followers and are mentioned more than a given number of times in a given period (200 seconds).
•Follower number: stored in a remote endpoint.
•Mention number: computed by processing the stream of messages.
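The definition above can be sketched as a single evaluation over one window. This is an illustration only: the thresholds, the sample data and the function name `influential_users` are hypothetical, and the follower counts are passed in as a plain dictionary instead of being fetched from a remote endpoint.

```python
from collections import Counter
from typing import Dict, List

def influential_users(
    window_mentions: List[str],
    followers: Dict[str, int],
    min_followers: int,
    min_mentions: int,
) -> List[str]:
    """Users mentioned more than min_mentions times in the window
    who also have more than min_followers followers."""
    mention_counts = Counter(window_mentions)
    return sorted(
        user
        for user, n in mention_counts.items()
        if n > min_mentions and followers.get(user, 0) > min_followers
    )

# Hypothetical content of one 200-second window: one entry per mention.
window_mentions = ["alice", "bob", "alice", "alice", "carol", "bob"]
# Hypothetical follower counts (the background data).
followers = {"alice": 1200, "bob": 50, "carol": 9000}

result = influential_users(window_mentions, followers, min_followers=1000, min_mentions=2)
```

Only the mention count depends on the stream; the follower count is the part that has to come from the background data.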
Inspired by Chris Testa's SemTech 2011 talk: http://goo.gl/kLSqGo
Investigating the Scenario: Symmetrical Hash Join
•Drawbacks:
• Data access constraints.
• Background data is huge and has to be fetched at every evaluation, which is slow and wastes computational and financial resources.
Investigating the Scenario: Nested Loop Join
•Drawbacks:
• One invocation for each mapping from the WINDOW clause evaluation, which leads to a high number of requests to the server.
• API restrictions (e.g., a limited number of requests over time).
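The request cost of the nested loop join can be made concrete with a small sketch. The `CountingEndpoint` class is a hypothetical in-memory stand-in for the remote service, used only to count invocations; it does not model a real SPARQL client.

```python
from typing import Dict, List, Optional

class CountingEndpoint:
    """Hypothetical stand-in for the remote endpoint that counts requests."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.data = data
        self.requests = 0

    def lookup(self, user: str) -> Optional[int]:
        self.requests += 1  # every lookup is one remote invocation
        return self.data.get(user)

def nested_loop_join(window_mappings: List[dict], endpoint: CountingEndpoint) -> List[dict]:
    """Join each window mapping with the background data, one request per mapping."""
    joined = []
    for mapping in window_mappings:
        followers = endpoint.lookup(mapping["user"])
        if followers is not None:
            joined.append({**mapping, "followers": followers})
    return joined

endpoint = CountingEndpoint({"alice": 1200, "bob": 50})
window = [{"user": "alice"}, {"user": "bob"}, {"user": "alice"}]
joined = nested_loop_join(window, endpoint)
```

The number of requests grows with the number of window mappings, even when the same user appears more than once, which is what collides with request limits.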
Investigating the Scenario: Local Views
•Challenges:
• Data goes out of date
[Figure: the Join reads from the RDF Stream Generator and from a Local View of the background data (SPARQL endpoint).]
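A minimal sketch of why a local view goes out of date, assuming the view is a plain snapshot of the background data. The dictionaries and the helper names `join_with_local_view` and `stale_entries` are hypothetical.

```python
from typing import Dict, List

def join_with_local_view(window_users: List[str], local_view: Dict[str, int]) -> Dict[str, int]:
    """Join window mappings with cached follower counts; no remote request is made."""
    return {u: local_view[u] for u in window_users if u in local_view}

def stale_entries(local_view: Dict[str, int], remote: Dict[str, int]) -> List[str]:
    """Keys whose cached value no longer matches the remote value."""
    return sorted(k for k, v in local_view.items() if remote.get(k) != v)

remote = {"alice": 1200, "bob": 50}
local_view = dict(remote)   # snapshot taken before the query starts
remote["bob"] = 75          # the background data changes afterwards

joined = join_with_local_view(["alice", "bob"], local_view)
stale = stale_entries(local_view, remote)
```

The join is answered without any remote request, but the answer for `bob` is computed from an outdated value.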
Investigating the Scenario: Maintenance Processes
•Maintenance introduces a trade-off between response quality and time.
•We propose to manage this trade-off by fixing the time dimension based on the query constraints and maximizing the freshness of the response.
[Figure: Join, RDF Stream Generator, background data (SPARQL endpoint), Local View and Maintenance Process. Annotations: "Freshness decreases" and "Refresh cost/quality trade-off".]
Problem Description
The maintenance process should identify elements of the local view that maximize response freshness.
Requirements of the Maintenance Process
1. should satisfy the Quality of Service constraints on responsiveness and freshness of the answer;
2. should take into account the change rates of the data elements in the REST API;
3. should consider the dynamicity of the change rate values;
4. may consider the sliding window operator.
Hypotheses
•We formulated the following hypotheses to build the maintenance process:
•HP1: the freshness of the answer can be increased by maintaining the part of the local view involved in the current query evaluation.
•HP2: the freshness of the answer increases by refreshing the (possibly) stale local view entries that would remain fresh in a higher number of evaluations.
Solution: WSJ+WBM
[Figure: architecture with the Window, JOIN, WSJ, WBM, Refresher, Local View and background data (BKG), annotated with HP1 and HP2.]
[Figure: windows W1 to W4 along the time axis t; axis labels τ and t.]
Terminology
Best Before Time: the time at which an element will become stale, defined by:
Figure legend: mappings from the WINDOW clause; mappings in the LOCAL VIEW; compatible mappings.
WSJ
•WSJ identifies the candidate set: the possibly stale local view mappings involved in the current evaluation.
•WSJ analyzes the content of the current window evaluation and identifies the compatible mappings in the local view.
•The possibly stale mappings are identified by analyzing their associated best before times.
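The candidate selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: a mapping is treated as compatible when its key appears in the window, and as possibly stale when its best before time is not later than the current time. The `ViewEntry` type and the `candidate_set` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ViewEntry:
    key: str          # join key, e.g. a user id
    value: int        # cached value, e.g. the follower count
    best_before: int  # time at which the entry is expected to become stale

def candidate_set(window_keys: List[str], local_view: Dict[str, ViewEntry], now: int) -> List[ViewEntry]:
    """Local view entries that are compatible with the current window
    and whose best before time has already passed (assumed staleness test)."""
    return [
        local_view[k]
        for k in sorted(set(window_keys))
        if k in local_view and local_view[k].best_before <= now
    ]

local_view = {
    "a": ViewEntry("a", 10, best_before=5),
    "b": ViewEntry("b", 20, best_before=12),
    "c": ViewEntry("c", 30, best_before=3),
}
candidates = candidate_set(window_keys=["a", "b", "a"], local_view=local_view, now=8)
```

Entry `c` is possibly stale but is not involved in the current evaluation, so it is left out of the candidate set.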
[Figure: windows W1 to W4 along the time axis t, with a table whose columns are V, L and Score.]
WBM
•WBM ranks the candidate set to determine which mappings to update.
•The ranking is computed through two values: the renewed best before time and the remaining life time
•The top k elements are selected to be refreshed. The value k is selected according to the responsiveness constraint.
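A minimal sketch of the top-k selection, assuming the scores have already been computed. Deriving k by dividing a time budget by a fixed cost per refresh is an assumption made for this example, as are the names `refresh_budget` and `select_top_k`.

```python
from typing import Dict, List

def refresh_budget(time_budget_ms: int, cost_per_refresh_ms: int) -> int:
    """Number of refreshes that fit in the time budget (assumed fixed cost per refresh)."""
    return time_budget_ms // cost_per_refresh_ms

def select_top_k(scores: Dict[str, int], k: int) -> List[str]:
    """Keys of the k highest-scoring candidates; ties are broken by key."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [key for key, _ in ranked[:k]]

# Hypothetical scores for three candidate mappings.
scores = {"m1": 3, "m2": 1, "m3": 2}
k = refresh_budget(time_budget_ms=100, cost_per_refresh_ms=40)
to_refresh = select_top_k(scores, k)
```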
WBM: renewed best before time
•When would the mappings become stale if refreshed now?
•The renewed best before time V is computed as:
WBM: remaining life time and score
•In how many future evaluations is the mapping involved?
•The remaining life time L is computed as:
•WBM ranks the mappings by using a score:
Score=min(L,V)
•The mapping with the highest score is selected for maintenance.
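The scoring rule Score=min(L,V) can be illustrated with a few hypothetical values. The values of L and V below are made up for the example and are taken as given; how they are computed is not part of this sketch.

```python
from typing import Dict, Tuple

def score(remaining_life: int, renewed_best_before: int) -> int:
    """Score = min(L, V), as stated on the slide."""
    return min(remaining_life, renewed_best_before)

# Hypothetical candidates: key -> (L, V).
candidates: Dict[str, Tuple[int, int]] = {
    "m1": (3, 3),
    "m2": (1, 4),
    "m3": (3, 1),
}
scores = {key: score(l, v) for key, (l, v) in candidates.items()}
best = max(scores, key=lambda key: scores[key])
```

Taking the minimum means a refresh is only credited for evaluations in which the mapping is both still involved (L) and still fresh (V).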
Experiment: Data Collection
1. Streaming API
a. Twitter stream data for the mention count
2. Twitter APIs to get the number of followers
a. Create snapshots every minute
b. Simulate the changes based on users' predefined change rates.
[Figure: the collected data, labeled "Streaming dataset" and "Snapshots / synthetic data".]
Experimental setup
•We study our hypotheses using a comparative evaluation with two baselines:
• LRU: use the least recently updated elements for maintenance.
• RND: use a random subset of elements for maintenance.
•Error measure:
• Comparing the differences between consecutive evaluations of the motivating query against the cache and against the real/synthetic dataset.
•HP1: We compared the cumulative staleness with and without WSJ (i.e., GNR) for both baselines.
• GNR: the candidate set is the whole set of view entries.
•HP2: We compared the cumulative staleness of WBM and of the improved baselines.
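The two baseline policies can be sketched as simple selection functions over a candidate set. This is an illustration, not the experimental code: the function names, the `last_update` map and the seeded random generator are assumptions made for the example.

```python
import random
from typing import Dict, List

def lru_policy(candidates: List[str], last_update: Dict[str, int], k: int) -> List[str]:
    """LRU: pick the k least recently updated candidates (ties broken by key)."""
    return sorted(candidates, key=lambda c: (last_update[c], c))[:k]

def rnd_policy(candidates: List[str], k: int, rng: random.Random) -> List[str]:
    """RND: pick a random subset of at most k candidates."""
    pool = sorted(candidates)
    return sorted(rng.sample(pool, min(k, len(pool))))

# Hypothetical candidates and the times of their last refresh.
candidates = ["a", "b", "c"]
last_update = {"a": 5, "b": 1, "c": 3}

lru_choice = lru_policy(candidates, last_update, k=2)
rnd_choice = rnd_policy(candidates, k=2, rng=random.Random(0))
```

Passing the whole view as `candidates` corresponds to GNR, while passing only the entries involved in the current evaluation corresponds to the WSJ variant of each baseline.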
HP1: Maintaining involved entries of local view maximizes response accuracy.
[Figure: two result panels, Synthetic and Real.]
WSJ improves more than GNR as the update budget increases.
HP2: Maintaining possibly stale entries from local view that will stay fresh for a longer time maximizes response accuracy.
[Figure: two result panels, Synthetic and Real.]
WBM does not improve as much as WBM*, which shows that the error is caused by inaccurate estimation of the best before time (BBT). A more accurate BBT prediction is needed.
Conclusions and Future Work
•Conclusions:
• We proposed using materialization to optimize the processing of continuous queries.
• We proposed a policy that maximizes freshness under the time constraint of a continuous query.
• We tested our policy against baseline policies (LRU and Random).
•Future Work:
• Extending real continuous query processors with the proposed approach.
• Measuring the time overhead of maintenance.
• Investigating more complex queries that have complicated join patterns between the SERVICE and STREAM clauses.
• Dynamically estimating the change rate of users.
Soheila Dehghanzadeh, Daniele Dell’Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein
soheila.dehghanzadeh@insight-centre.org http://www.slideshare.net/sallyde