Upload
jean-paul-calbimonte
View
272
Download
3
Embed Size (px)
DESCRIPTION
RDF Stream processing and reactive systems
Citation preview
RDF Stream Processing: Let’s React!
Jean-Paul CalbimonteLSIR EPFL
OrdRing 2014. International Semantic Web Conference ISWC 2014
Riva del Garda, 20.10.2014
@jpcik
The RSP CommunityResearch workMany PapersPhD ThesisDatasetsPrototypesBenchmarks
RDF StreamsStream ReasoningComplex Event ProcessingStream Query ProcessingStream CompressionSemantic Sensor WebM
any
topi
cs
Tons
of w
ork
http://www.w3.org/community/rspW3C RSP Community Group
Effort to our work on RDF stream processingdiscussstandardizecombineformalizeevangelize
2
The RSP Community
3
Why Streams?Internet of ThingsSensor NetworksMobile NetworksSmart DevicesParticipatory SensingTransportation
Financial DataSocial MediaUrban PlanningHealth MonitoringMarketing
“It’s a streaming world!”[1]
4 Della Valle, et al : It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems
Why Streams?RDF
Web standardsData discoveryData sharingWeb queries G
o W
eb
SemanticsVocabulariesData HarvestingData linkingMatchingIn
tegr
ation Ontologies
ExpressivityInferenceRule processingKnowledge basesRe
ason
ing
Query languagesQuery answeringEfficient processingQuery FederationPr
oces
sing
Work in progress
Is this what we require?
5
Stonebreaker et al. The 8 requirement of Real-TimeStream Processing. SIGMOD Record. 2005.
Looking back 10 years ago…
“8 requirements of real-time
stream processing”[2]
Do we address them?Do we have more requirements?Do we need to do more?
Keep data movingQuery with stream SQLHandle imperfectionsPredictable outcomesIntegrate stored dataData safety & availabilityPartition & scaleRespond Instantaneously8
Requ
irem
ents
6
Reactive SystemsKeep data movingQuery with stream SQLHandle imperfectionsPredictable outcomesIntegrate stored dataData safety & availabilityPartition & scaleRespond Instantaneously8
Requ
irem
entsEvent-Driven
Jonas Boner. Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems. 2013.
Events:re
act t
o ScalableLoad:
ResilientFailure:
ResponsiveUsers:
Do we address them?Do we have more requirements?Do we need to do more?
7
Warning: You may see code
8
① Keep the data movingProcess data in-stream
Not required to store Active processing model
input streams
RSPqueries/
rules output streams/events
RDF Streams9
RDF Stream…Gi
Gi+1
Gi+2
…Gi+n
…unbo
unde
d se
quen
ce
Gi {(s1,p1,o1), (s2,p2,o2),…} [ti]
1+ triplesimplicit/explicit timestamp/interval
public class SensorsStreamer extends RdfStream implements Runnable { public void run() { .. while(true){ ... RdfQuadruple q=new RdfQuadruple(subject,predicate,object, System.currentTimeMillis()); this.put(q); } }} C-SP
ARQL
How do I code this?
something to run on a thread
timestamped triple
the stream is “observable”
Data structure, execution and callbacks are mixed
10
Observer patternTightly coupled listener
Actor Model
11
Actor1
Actor2
m No shared mutable stateAvoid blocking operatorsLightweight objectsLoose coupling
communicate through messages
mailboxstate
behaviornon-blocking response
send: fire-forget
More goodies later…
Implementations: e.g. Akka for Java/Scala
RDF Streamobject DemoStreams { ... def streamTriples={ Iterator.from(1) map{i=> ... new Triple(subject,predicate,object) } }
Data structureInfinite triple iterator
Executionval f=Future(DemoStreams.streamTriples)f.map{a=>a.foreach{triple=> //do something}}
Asynchronous iteration
Message passingf.map{a=>a.foreach{triple=> someSink ! triple}}
send triple to actor
Immutable RDF stream avoid shared mutable state avoid concurrent writes unbounded sequence
Ideas using akka actors
Disclai
mer: ideas
in
progre
ss
Futures non blocking composition concurrent computations work with not-yet-computed results
Actors message-based share-nothing async distributable
12
RDF Stream
13
… other issues: Graph implementation? Timestamps: application vs system? Serialization?
Loose coupling Immutable data streams Asynchronous message passing Well defined input/output
Event-driven
② Query using SQL on StreamsSPARQL
Model Continuou
s execution
Union, Join, Optional,
Filter
Aggregates
Time
window
Triple
window
R2S operato
r
Sequence, Co-
ocurrence
Time
functio
n
TA-SPARQL TA-RDF ✗ ✔ Limited ✗ ✗ ✗ ✗ ✗
tSPARQL tRDF ✗ ✔ ✗ ✗ ✗ ✗ ✗ ✗
Streaming SPARQL
RDF Stream ✔ ✔ ✗ ✔ ✔ ✗ ✗ ✗
C-SPARQL RDF Stream ✔ ✔ ✔ ✔ ✔ ✗ ✗ ✔
CQELS RDF Stream ✔ ✔ ✔ ✔ ✔ ✗ ✗ ✗
SPARQLStream (Virtual) RDF Stream
✔ ✔ ✔ ✔ ✗ ✔ ✗ ✗
EP-SPARQL RDF Stream ✔ ✔ ✔ ✗ ✗ ✗ ✔ ✗
Instans RDF ✔ ✔ ✔ ✗ ✗ ✗ ✗ ✗
W3C RSP review features in existing systems agree on fundamental operators discuss on possible semantics https://www.w3.org/community/rsp/wiki/RSP_Query_Features
Not e
xhau
stive
!
RSP is not always/only SPARQL-like queryingSPARQL protocol is not enoughRSP RESTful interfaces?
14
Powerful languages for continuous query processing
ExecContext context=new ExecContext(HOME, false);
String queryString =" SELECT ?person ?loc …ContinuousSelect selQuery=context.registerSelect(queryString);selQuery.register(new ContinuousListener(){ public void update(Mapping mapping){ String result=""; for(Iterator<Var> vars=mapping.vars();vars.hasNext();) result+=" "+ context.engine().decode(mapping.get(vars.next())); System.out.println(result); } });
RSP QueryingExample with CQELS (code.google.com/p/cqels)
CQELS continuous query: “Queries evaluated continuously against the changing dataset”[4]
Le Phuoc et al. An Native and adaptive approach for unifyied processing of Linked Streams and Linked Data. ISWC2011.
get result updates
adding listener
register query
15
SELECT ?person ?loc WHERE { STREAM <http://deri.org/streams/rfid> [RANGE 3s] {?person :detectedAt ?loc} }
CQELS
Tightly coupled listenersResults delivery: push & pull?
Dynamic Push-Pull
16
ProducerConsumer
m
data flow
demand flow
Push when consumer is fasterPull when producer is fasterDynamically switch modes
Communication is dynamic depending on demand vs supply
Event-drivenResponsive
③ Handle stream imperfectionsDelayed, missing, out-of-order …
17
val future = myActor.ask("hello")(5 seconds)
context.setReceiveTimeout(100 milliseconds)
def receive = { case "Hello" => //do something case ReceiveTimeout => throw new RuntimeException("Timed out")
Setting timeouts in message passing
timeout after 5 sec
Timeout receiving messages
in case of timeout
Async message passing helpsStill to be studied at protocol levelDifferent guarantee requirementsIn W3C RSP: usually ‘out-of-scope’
“order matters!”[
1]
ResponsiveResilient
④ Generate predictable outcomes
Correctness in RSP
18
“RSP engines produce different but correct results”[1]
RSP queries: operational semanticsCommon query execution modelComparable / predictable results
Work
in Progre
ss
Dell’Aglio Calbimonte, Balduini, Corcho, Della Valle. On Correctness on RDF Stream Processing Benchmarking. ISWC 2013
⑤ Integrate stored and streaming data
19
SELECT ?person1 ?person2 FROM NAMED <http://deri.org/floorplan/>WHERE {GRAPH <http://deri.org/floorplan/> {?loc1 lv:connected ?loc2}STREAM <http://deri.org/streams/rfid> [NOW] {?person1 lv:detectedAt ?loc1}
SELECT ?road ?speed WHERE { { ?road :slowTrafficDue ?observ } { ?observ rdfs:subClassOf :SlowTraffic { ?observ :speed ?speed } FILTER (getDURATION() < ”P1H”^^xsd:duration)}
Integration in RSP languages
stored RDF graph
RDF streamCQELS
RDF stream + stored
EP-SPARQL
“EP-SPARQL uses background ontologies to enable
stream reasoning”[1]
context.loadDataset("{URI OF GRAPH}", "{DIRECTORY TO LOAD}");
Loading background knowledge
More than just joins with stored dataStream reasoning / inferences
Clean query model for stream+stored dataWeb-ready inter-dataset queriesShared vocabularies/ontologies
Responsive?
Scalable?Anicic et al. EP-SPARQL: a unified language for event processing and stream reasoning. WWW2011.
⑥ Guarantee data safety and availability
Restart,Suspend,Stop,Escalate, etc
20
Parent
Actor1
Automatic supervisionIsolate failuresManage local failuresSupervision strategies: All-for-One One-for-oneDeath watch handling
Supervisionhierarchy
Supervision
Actor2
Actor3
Actor4 X
Resilient
Data availability
21
High load:input streams
RSPqueries/
rulesunde
r str
ess
“eviction: delete from cache of operators”[1]
binding1binding2..bindingX
binding3binding5..bindingY
X
Join operator
X
Evict cache items based on: recency LRU Likeliness of matchingResponsive
Resilient
Gao et al. The CLOCK Data-Aware Eviction Approach: Towards Processing Linked Data Streams with Limited Resources. ESWC 2014
⑦ Partition and scale automatically
22
Dynamite: Parallel MaterializationMaintaining a dynamic ontology
Parallel threads
Scalable
Actors everywhere
23
Actor1
Actor2
m No difference in one core many cores many servers
Actor3
Actor4
Transparent RemotingLocality optimizationDefine Routing policiesDefine actor clusters
m
m
Existing ‘map reduce’ for streams: Storm, S4, Spark, Akka StreamsCreate workflows of Stream Processors?
⑧ Process and respond instantaneously
Blocking queriesSync communicationBottlenecks No end to end reactivity
24
SPARQLStream
Virtual RDF Stream
DSMS CEP Sensor middleware
…
rewritten queries
users, applications
query processing
Morph-streams
data layer
“Query virtual RDF streams”[1]
Push:Using Websockets! Two way communicationResponsive answers
Responsive?
RSP Stream ReasoningExample with TrOWL (trowl.eu)
Ontologies evolve over time!
Adding and removing axioms over time
ONTO
+ axiom1+ axiom2
ONTO’- axiom3
ONTO’’
“reasoning with
changing knowledge”[3]
Ren, Pan, Zhao. Towards Scalable Reasoning on Ontology Streams via Syntactic Approximation. IWOD2010.
val onto=mgr.createOntology val reasoner = new RELReasonerFactory().createReasoner(onto) (1 to 10).foreach{i=> reasoner += (Indiv(rspEx+"sys"+i) ofClass system) reasoner.reclassify}println (reasoner.getInstances(SystemClass, true))
get instances of ‘System’ class
reclassify
add 10 axioms
25
trowl
Responsive
Powerful Dynamic Ontology MaintenanceStreaming query results?Notify new inferences?
A lot to do…
26
Reactive RSPsKeep data movingQuery with stream SQLHandle imperfectionsPredictable outcomesIntegrate stored dataData safety & availabilityPartition & scaleRespond Instantaneously8
Requ
irem
ents We go beyond only these
27
Data HeterogeneityData ModelingStream ReasoningData discoveryStream data linkingQuery optimization… more
Reactive PrinciplesNeeded if we want to build relevant systems
Reactive RSP workflows
28
MorphStreams
CSPARQL
s
Etalis
TrOWL
s
s CQELS
Dynamite
s
Minimal agreements: standards, serialization, interfacesFormal models for RSPs and reasoningWorking prototypes/systems!
Event-driven message passingAsync communicationImmutable streamsTransparent RemotingParallel and distributedSupervised Failure HandlingResponsive processing
Reactive RSPs
We like to do things
29
Muchas gracias!
Jean-Paul CalbimonteLSIR EPFL
@jpcik