EXCHANGING INTENSIONAL XML DATA
Tova Milo (INRIA & Tel-Aviv U.); Serge Abiteboul (INRIA); Bernd Amann (Cedric-CNAM); Omar Benjelloun (INRIA); Fred Dang Ngoc (INRIA)
H. GÜL ÇALIKLI 2002700743
MURAT KORAŞ 2002700797
INTRODUCTION
The emergence of Web Services as a standard means of publishing and accessing data on the web introduced a new class of XML documents called “intensional documents”.
Intensional Documents: XML documents in which
  some parts are defined explicitly;
  some parts are defined by programs that generate data.
Materialisation: the process of evaluating some of the programs included in an XML document and replacing them by their results.
GOAL of this PAPER:
  Study the new issues raised by the exchange of intensional XML documents between applications.
  Decide which data should be materialised before it is sent and which should not.
CONSIDERATIONS for MATERIALISATION
Performance:
  current system load
  cost of communication
Capabilities:
  inability to handle the intensional parts of a document
  lack of access rights (to a particular service)
Security:
  invoking service calls from an untrusted party may cause severe security violations
Functionalities:
  confidentiality reasons
  calling services may involve fees to be paid
[Figure: Data exchange scenario for intensional documents. A sender and a receiver, each with its own capabilities, ACL, cost, ..., exchange documents containing function calls (f, g, q, r) according to a Data Exchange Schema.]
THE MODEL and THE PROBLEM
SIMPLE INTENSIONAL XML:
Model intensional XML documents as labelled trees consisting of two types of nodes:
  Data nodes
  Function nodes, which correspond to “Service Calls”
Assume the existence of some disjoint domains:
  N : domain of NODES
  L : domain of LABELS
  F : domain of FUNCTION NAMES
  D : domain of DATA VALUES
SIMPLE INTENSIONAL XML (cont’d)
DEFINITION 1: An intensional document d is an expression (T, λ) where:
  T = (N, E, <) is an ordered tree:
    N : a finite set of nodes, taken from the domain of nodes
    E ⊆ N × N : the edges
    < : associates with each node in N a total order on its children
  λ : N → L ∪ F ∪ D is a labeling function for the nodes.
NOTE: only leaf nodes may be assigned data values from D.
SIMPLE INTENSIONAL XML (cont’d)
Nodes with a label in L ∪ D are called Data Nodes.
Nodes with a label in F are called Function Nodes.
The children subtrees of a function node are the Function Parameters.
When the function is called:
  these subtrees are passed to it;
  the return value replaces the function node in the document.
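[Figure: an example intensional document, drawn as a tree]
newspaper
  title
    “The Sun”
  date
    “04/10/2002”
  Get_Temp (function node)
    city
      “Paris”
  TimeOut (function node)
    “Exhibits”
[Callout in the figure: temp “16 ºC”, a value of the type returned by Get_Temp]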
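The tree model and the call-and-replace step can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Node` class, the `materialise` helper and the `get_temp` stand-in are hypothetical names that do not come from the paper, and a real service call would go through SOAP rather than a local function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    label: str                       # element label, function name, or data value
    kind: str = "data"               # "data" or "function"
    children: List["Node"] = field(default_factory=list)


Service = Callable[[List[Node]], List[Node]]


def materialise(node: Node, services: Dict[str, Service]) -> List[Node]:
    """Return the forest that replaces `node` after calling the known services."""
    # Materialise the parameters (children subtrees) first.
    kids = [r for child in node.children for r in materialise(child, services)]
    if node.kind == "function" and node.label in services:
        # The return value replaces the function node in the document.
        return services[node.label](kids)
    # Data nodes, and function nodes we choose not to call, stay in place.
    return [Node(node.label, node.kind, kids)]


def get_temp(params: List[Node]) -> List[Node]:
    # Hypothetical stand-in for the remote Get_Temp service.
    return [Node("temp", children=[Node("16 ºC")])]


doc = Node("newspaper", children=[
    Node("title", children=[Node("The Sun")]),
    Node("date", children=[Node("04/10/2002")]),
    Node("Get_Temp", "function", [Node("city", children=[Node("Paris")])]),
    Node("TimeOut", "function", [Node("Exhibits")]),
])

# Only Get_Temp is materialised; TimeOut stays intensional.
result = materialise(doc, {"Get_Temp": get_temp})[0]
labels = [child.label for child in result.children]
print(labels)
```

Passing only some services to `materialise` mirrors the sender's choice of which parts to materialise before sending.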
SIMPLE SCHEMA:
DEFINITION 2: A document schema s is an expression (L, F, τ) where:
  L : a finite set of labels
  F : a finite set of function names
  τ : a function that maps
    each label name l ∈ L to a regular expression over L ∪ F, or to the keyword data;
    each function name f ∈ F to a pair of expressions: τin(f), the input type of f, and τout(f), the output type of f.
SIMPLE SCHEMA (cont’d)
Example of a Schema:
data:
  τ(newspaper) = title.date.(Get_Temp|temp).(TimeOut|exhibit*)
  τ(title) = data
  τ(date) = data
  τ(temp) = data
  τ(city) = data
  τ(exhibit) = data
functions:
  τin(Get_Temp) = city
  τout(Get_Temp) = temp
  τin(TimeOut) = data
  τout(TimeOut) = (exhibit|performance)*
  τin(Get_Date) = title
  τout(Get_Date) = date
SIMPLE SCHEMA (cont’d):
DEFINITION 3: An intensional document t is an instance of a schema s = (L, F, τ) if, for each:
  data node n ∈ t with label l ∈ L, the labels of n’s children form a word in lang(τ(l));
  function node n ∈ t with label f ∈ F, the labels of n’s children form a word in lang(τin(f)).
Here lang(τ(l)) denotes the regular language defined by τ(l).
DEFINITION 3 (cont’d): Let f be a function name and t1, ..., tn a sequence of intensional trees.
  IF the root labels of t1, ..., tn form a word in lang(τin(f)) (respectively lang(τout(f))) AND all the trees are instances of s,
  THEN t1, ..., tn is an input instance (respectively output instance) of f.
Note: every subtree conforms to the same schema as the whole document.
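A minimal Python sketch of the instance check of Definition 3 follows. It rests on several assumptions of ours: child labels are encoded as a string of space-terminated names so that Python's `re` module can play the role of the regular expressions, the keyword data is taken to mean "exactly one data value child", and `Node`, `TAU`, `TAU_IN` and `leaf` are hypothetical names. The newspaper type is written with `exhibit*`, as in the structure quoted in the implementation part of the talk.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    kind: str = "label"              # "label", "function" or "value"
    children: List["Node"] = field(default_factory=list)


# tau for labels: a regex over space-terminated names, or the keyword "data".
TAU = {
    "newspaper": r"title date (Get_Temp |temp )(TimeOut |(exhibit )*)",
    "title": "data", "date": "data", "temp": "data",
    "city": "data", "exhibit": "data",
}
# tau_in for functions (output types are not needed for this check).
TAU_IN = {"Get_Temp": r"city ", "TimeOut": "data"}


def children_match(node: Node, expr: str) -> bool:
    if expr == "data":
        return len(node.children) == 1 and node.children[0].kind == "value"
    if any(c.kind == "value" for c in node.children):
        return False
    word = "".join(c.label + " " for c in node.children)
    return re.fullmatch(expr, word) is not None


def is_instance(node: Node) -> bool:
    """Check the node's children against its type, then recurse (Definition 3)."""
    if node.kind == "value":
        return True
    table = TAU_IN if node.kind == "function" else TAU
    expr: Optional[str] = table.get(node.label)
    return (expr is not None and children_match(node, expr)
            and all(is_instance(c) for c in node.children))


def leaf(label: str, text: str) -> Node:
    return Node(label, children=[Node(text, "value")])


doc = Node("newspaper", children=[
    leaf("title", "The Sun"),
    leaf("date", "04/10/2002"),
    Node("Get_Temp", "function", [leaf("city", "Paris")]),
    Node("TimeOut", "function", [Node("Exhibits", "value")]),
])
no_date = Node("newspaper", children=[doc.children[0]] + doc.children[2:])
print(is_instance(doc), is_instance(no_date))
```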
SIMPLE SCHEMA (cont’d):
DEFINITION 4 (about Rewritings): Let t, t’ be trees.
  IF t’ is obtained from t by
    selecting a function node v in t with some label f, and
    replacing it by an arbitrary output instance of f,
  THEN we say that t →v t’ (t rewrites into t’ by invoking v).
DEFINITION 4 (about Rewritings) (cont’d):
  IF t →v1 t1 →v2 t2 ... →vn tn, THEN we say that t →* tn (t rewrites into tn).
  The nodes v1, ..., vn are called a rewriting sequence.
  The set of all trees t’ such that t →* t’ is denoted ext(t).
DEFINITION 5 (about Rewritings): Let t be a tree and s a schema.
  1. IF ext(t) contains some instance of s, THEN t possibly rewrites into s.
  2. IF either t is already an instance of s, or there exists some node v in t such that all trees t’ where t →v t’ safely rewrite into s, THEN we say that t safely rewrites into s.
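The difference between the two notions of Definition 5 can be illustrated by a brute-force Python sketch. It simplifies heavily and is not the algorithm of the paper: it works on words (sequences of labels) rather than trees, it assumes each function has a small finite set of possible outputs (`OUT` is a hypothetical table), and it bounds the number of calls by `budget` so that the search terminates.

```python
import re
from typing import Dict, List, Tuple

Word = Tuple[str, ...]

# Hypothetical finite output sets; real output types are regular languages.
OUT: Dict[str, List[Word]] = {
    "Get_Temp": [("temp",)],
    "TimeOut": [("exhibit",), ("exhibit", "exhibit"), ("performance",)],
}


def in_lang(word: Word, regex: str) -> bool:
    # Labels are encoded as space-terminated names.
    return re.fullmatch(regex, "".join(x + " " for x in word)) is not None


def possibly(word: Word, target: str, budget: int) -> bool:
    """Some choice of calls AND some outputs lead to a word in the target."""
    if in_lang(word, target):
        return True
    if budget == 0:
        return False
    return any(possibly(word[:i] + out + word[i + 1:], target, budget - 1)
               for i, f in enumerate(word) if f in OUT
               for out in OUT[f])


def safely(word: Word, target: str, budget: int) -> bool:
    """Some call exists such that ALL of its outputs still rewrite safely."""
    if in_lang(word, target):
        return True
    if budget == 0:
        return False
    return any(all(safely(word[:i] + out + word[i + 1:], target, budget - 1)
                   for out in OUT[f])
               for i, f in enumerate(word) if f in OUT)


w: Word = ("title", "date", "Get_Temp", "TimeOut")
R1 = r"title date temp (TimeOut |(exhibit )*)"
R2 = r"title date temp (exhibit )*"
print(safely(w, R1, 2), safely(w, R2, 2), possibly(w, R2, 2))
```

With the first target it is enough to call Get_Temp; with the second, TimeOut must be called and may return a performance, so only a possible rewriting exists.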
DEFINITION 6: Let s and s’ be schemas, and let r be a distinguished label called the root label.
  IF all the instances t of s with root label r rewrite safely into instances of s’,
  THEN we say that s safely rewrites into s’.
A Richer Data Model:
Function Patterns:
  The schemas we have seen so far specify that a particular function, identified by its name, may appear in the document.
  But sometimes one does not know in advance which functions will be used at a given place.
  A common intensional schema for such documents should not require the use of a particular function, but rather allow for a set of functions that have a proper signature.
To specify such a set of functions we use Function Patterns.
Function Patterns: a function belongs to the pattern if its name satisfies the boolean predicate and its signature is the same as the required one.
EX:
  τname(Forecast) = UDDIF ∧ InACL
  τin(Forecast) = city
  τout(Forecast) = temp
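A function pattern can be pictured as a name predicate plus a required signature. The Python sketch below illustrates this reading only; `FunctionPattern`, `Signature`, the registry contents and the ACL are hypothetical, and the two predicates are local stand-ins for what would be remote checks (a UDDI lookup and an access-control check).

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Signature:
    input_type: str
    output_type: str


@dataclass(frozen=True)
class FunctionPattern:
    name_predicate: Callable[[str], bool]   # plays the role of tau_name
    signature: Signature                    # plays the role of tau_in / tau_out

    def matches(self, name: str, signature: Signature) -> bool:
        # A function belongs to the pattern if its name satisfies the
        # predicate and its signature equals the required one.
        return self.name_predicate(name) and signature == self.signature


# Hypothetical registry and access-control list.
UDDI_REGISTRY: Set[str] = {"Get_Temp", "Weather_Now", "Stock_Quote"}
ACL: Set[str] = {"Get_Temp", "Stock_Quote"}


def uddif(name: str) -> bool:
    return name in UDDI_REGISTRY


def in_acl(name: str) -> bool:
    return name in ACL


forecast = FunctionPattern(lambda n: uddif(n) and in_acl(n),
                           Signature("city", "temp"))

print(forecast.matches("Get_Temp", Signature("city", "temp")))
print(forecast.matches("Weather_Now", Signature("city", "temp")))
```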
A Richer Data Model (cont’d):
Restricted Service Invocations:
  We assumed so far that all the functions appearing in a document may be invoked in a rewriting, in order to match a given schema.
  This is not always the case, for reasons such as security, cost, access rights, etc.
  THUS, function names/patterns in the schema can be partitioned into two disjoint groups: invocable and noninvocable ones.
  A legal rewriting is then one that invokes only invocable functions.
EXCHANGING INTENSIONAL DATA
Rewriting Process:
1. Safe Rewriting:
  Check whether t safely rewrites to s.
  If so, find a rewriting sequence, i.e. a sequence of functions that need to be invoked to transform t into the required structure.
  Preferred rewriting sequence: the shortest/cheapest one.
Rewriting Process (cont’d):
2. Possible Rewriting:
  IF a safe rewriting does not exist, check whether t may at least rewrite to s.
  IF it is acceptable to do so (the sender accepts that the rewriting may fail), try to find a successful rewriting sequence, if one exists.
  Preferred rewriting sequence: the one with the least cost.
Rewriting Process (cont’d):
3. Mixed Approach:
  In the mixed approach, one could first invoke some function calls, then attempt to find safe rewritings from there.
Rewriting Process (cont’d):
DEFINITION 7: For a rewriting sequence t →v1 t1 →v2 ... →vn tn:
  IF v_j ∈ t_i but v_j ∉ t_(i-1), THEN we say that function node v_j depends on function node v_i.
  IF the dependency graph among the nodes contains no paths of length greater than k, THEN we say that the rewriting sequence is of depth k.
RESTRICTION:
“Consider only k-depth left-to-right rewritings.”
SAFE REWRITING
Algorithm for k-depth left-to-right safe rewriting. The algorithm is decomposed into three parts:
1. Rewriting Function Parameters:
  To invoke a function, its parameters should be of the right type; if not, they should be rewritten to fit that type.
  When rewriting the parameters, the functions in them can be invoked ONLY IF their own parameters are, or can be rewritten into, the expected input type.
1. Rewriting Function Parameters (cont’d):
  For the deepest functions: verify that their parameters are instances of the corresponding input types. If not, rewriting fails.
  Move upward (until all functions in the tree (forest) are done): for each function f, try to safely rewrite f’s own parameters into the required structure. If this is not possible, rewriting fails.
2. Top Down Traversal:
  In each iteration of the recursive procedure “Rewriting Function Parameters”, the parameters of the outermost functions of the tree (forest) are handled.
  In this part, safely rewrite the tree (forest) by invoking only these outermost functions.
  THUS: traverse the tree (forest) top down. At each step, treat a single node and its children.
2. Top Down Traversal (cont’d):
  Consider a node n with children whose labels form a word w.
  The subtree rooted at node n can be safely rewritten into the target schema s = (L, F, τ) IF and ONLY IF:
    1. w can be safely rewritten into a word in lang(τ(label(n))), AND
    2. each of n’s children can be safely rewritten into an instance of s.
3. Rewriting the children of a node n:
  Given: w, a word (the sequence of labels of n’s children).
  Goal: rewrite w so that it becomes a word in the regular language R = τ(label(n)).
  The process of rewriting involves:
    choosing some functions in w and replacing them by a possible output;
    then choosing some other functions (which might have been returned by previous calls) and replacing them by their output;
    and so on, up to depth k.
Safe Rewriting Algorithm:
  Given:
    a word w
    the output types R_f1, ..., R_fn of the available functions
    a target regular language R
  Purpose of the algorithm:
    to test whether w can be safely rewritten into a word in R;
    if so, to find a safe rewriting sequence.
Safe Rewriting Algorithm:
Note: for illustration purposes we use the newspaper document.
  w = title.date.Get_Temp.TimeOut (the word formed by the children labels)
  R = title.date.temp.(TimeOut|exhibit*) (the target: w must be safely rewritten into a word in R)
The Algorithm:
1) Build finite state automata for the following regular languages:
  1.1) An automaton A_w accepting w as a single word.
The Algorithm (cont’d):
  1.2) Build automata A_fi, i = 1, ..., n, each accepting the regular language R_fi.
  1.3) Build an automaton Ā accepting the complement of the regular language R. The automaton should be deterministic and complete.
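Step 1.3 can be sketched in Python for the newspaper target R = title.date.temp.(TimeOut|exhibit*). The sketch assumes a hand-written deterministic automaton for R (the state names `p0`..`p5` and `sink` are our own and need not match the figure), completes it with a sink state, and then swaps accepting and non-accepting states.

```python
from typing import Dict, Iterable, List, Set, Tuple

Delta = Dict[Tuple[str, str], str]


def complement(alphabet: Iterable[str], delta: Delta, start: str,
               accepting: Set[str], sink: str = "sink"):
    """Complete a deterministic automaton, then flip its accepting states."""
    alphabet = list(alphabet)
    states = {start, sink} | {s for s, _ in delta} | set(delta.values())
    # Completion: every missing transition goes to the sink state.
    full = {(s, a): delta.get((s, a), sink) for s in states for a in alphabet}
    return full, states - set(accepting)


def accepts(full: Delta, start: str, accepting: Set[str], word: List[str]) -> bool:
    state = start
    for symbol in word:
        state = full[(state, symbol)]
    return state in accepting


ALPHABET = ["title", "date", "temp", "Get_Temp", "TimeOut", "exhibit", "performance"]
# Deterministic automaton for R = title.date.temp.(TimeOut|exhibit*).
DELTA_R: Delta = {
    ("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
    ("p3", "TimeOut"): "p4", ("p3", "exhibit"): "p5", ("p5", "exhibit"): "p5",
}
ACCEPTING_R = {"p3", "p4", "p5"}

full, accepting_bar = complement(ALPHABET, DELTA_R, "p0", ACCEPTING_R)
print(accepts(full, "p0", accepting_bar, ["title", "date", "temp", "performance"]))
print(accepts(full, "p0", accepting_bar, ["title", "date", "temp", "exhibit"]))
```

Flipping the accepting states is only correct because the automaton is deterministic and complete, which is why step 1.3 asks for both properties.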
[Figure: the complement automaton Ā for the schema τ’(newspaper) = title.date.temp.(TimeOut|exhibit*). States p0 to p6; edges labelled title, date, temp, TimeOut, exhibit and *.]
The Algorithm (cont’d):
2) Let A_w^k := A_w.
3) For j = 1, ..., k:
  Consider all the edges e = (v, u) in A_w^k that are labelled by a function name f_i and were not treated in previous iterations.
  3.1) Extend A_w^k by attaching a copy of the automaton A_fi, with its initial and final states linked to v and u respectively by ε moves.
  3.2) Denote v as a fork node (for the edge e).
  3.3) The two fork options of v are e itself and the new outgoing ε edge.
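Steps 2 and 3 can be sketched in Python as follows. The representation is an assumption of ours: an automaton is a list of `(source, label, target)` edges with `None` standing for ε, the output-type automata in `OUTPUTS` are written by hand (TimeOut is taken to return any sequence of exhibit and performance elements), and `expand` is a hypothetical helper name.

```python
import itertools
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, Optional[str], str]          # (source, label, target); None = ε
Automaton = Tuple[List[Edge], str, Set[str]]   # (edges, start state, final states)


def expand(word: List[str], outputs: Dict[str, Automaton], k: int):
    """Build A_w^k: the word automaton plus attached output-type copies."""
    counter = itertools.count()
    fresh = lambda: f"q{next(counter)}"
    path = [fresh() for _ in range(len(word) + 1)]
    edges: List[Edge] = [(path[i], a, path[i + 1]) for i, a in enumerate(word)]
    forks: Dict[str, List[Tuple[Edge, Edge]]] = {}
    frontier = list(edges)                      # edges not treated yet
    for _ in range(k):
        added: List[Edge] = []
        for v, f, u in frontier:
            if f not in outputs:
                continue                        # not a function edge
            a_edges, a_start, a_finals = outputs[f]
            old = {a_start, *a_finals, *(x for s, _, t in a_edges for x in (s, t))}
            name = {s: fresh() for s in sorted(old)}      # fresh copy of A_f
            eps_in = (v, None, name[a_start])
            edges.append(eps_in)
            edges.extend((name[t], None, u) for t in sorted(a_finals))
            # v is a fork node: keep the function edge, or take the ε edge.
            forks.setdefault(v, []).append(((v, f, u), eps_in))
            added.extend((name[s], a, name[t]) for s, a, t in a_edges)
        edges.extend(added)
        frontier = added                        # only new edges are treated next
    return edges, path[0], path[-1], forks


OUTPUTS: Dict[str, Automaton] = {
    "Get_Temp": ([("s", "temp", "t")], "s", {"t"}),
    "TimeOut": ([("s", "exhibit", "s"), ("s", "performance", "s")], "s", {"s"}),
}
edges, start, final, forks = expand(["title", "date", "Get_Temp", "TimeOut"], OUTPUTS, 1)
for edge in edges:
    print(edge)
```

For k = 1 this yields eight states q0 to q7, with q2 and q3 as fork nodes.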
[Figure: the 1-depth automaton A_w^1 for the word w = title.date.Get_Temp.TimeOut. Main path: q0 -title-> q1 -date-> q2 -Get_Temp-> q3 -TimeOut-> q4. Attached copies: q2 -ε-> q5 -temp-> q6 -ε-> q3, and q3 -ε-> q7 -ε-> q4 with exhibit and performance edges at q7. q2 and q3 are fork nodes: the ε edge represents the choice of invoking the function, the function edge the choice of not invoking it.]
The Algorithm (cont’d):
4) Construct the Cartesian product automaton A× = A_w^k × Ā.
  The fork nodes and fork options in A× reflect those of A_w^k:
  4.1) the fork nodes are the nodes [q p] ∈ A× where q was a fork node in A_w^k;
  4.2) a fork option in A× consists of all edges originating from one fork option edge in A_w^k.
[Figure 6: the Cartesian product automaton A× = A_w^1 × Ā. States: [q0,p0], [q1,p1], [q2,p2], [q3,p3], [q5,p2], [q6,p3], [q4,p4], [q7,p3], [q4,p3], [q7,p5], [q5,p5], [q3,p6], [q7,p6], [q4,p6]; edges labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]
The Algorithm (cont’d):
5) Mark nodes in A×:
  5.1) Mark the states that are accepting states in both A_w^k and Ā.
  5.2) Iteratively mark:
    non-fork (regular) nodes: IF one of their outgoing edges points to a marked node;
    fork nodes: IF both of their fork options (for some f_i) contain an edge that points to a marked node.
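The marking of step 5 is a fixpoint computation, which the Python sketch below shows on a toy graph. The graph is hypothetical and much smaller than the product automaton of the example: `start` leads to a fork node, whose "call" option leads to a state `called` from which the service may return something acceptable (`ok`) or not (`bad`), and whose "keep" option leads to `kept`.

```python
from typing import Dict, Iterable, List, Set, Tuple


def mark(seed: Iterable[str],
         regular: Dict[str, List[str]],
         forks: Dict[str, Tuple[List[str], List[str]]]) -> Set[str]:
    """Propagate marks until nothing changes (step 5.2)."""
    marked = set(seed)                 # step 5.1: accepting in both automata
    changed = True
    while changed:
        changed = False
        for node, successors in regular.items():
            # Regular node: one marked successor is enough.
            if node not in marked and any(s in marked for s in successors):
                marked.add(node)
                changed = True
        for node, (keep, call) in forks.items():
            # Fork node: BOTH options must contain an edge to a marked node.
            if (node not in marked
                    and any(s in marked for s in keep)
                    and any(s in marked for s in call)):
                marked.add(node)
                changed = True
    return marked


regular = {"start": ["fork"], "called": ["ok", "bad"]}
forks = {"fork": (["kept"], ["called"])}

# Keeping the function is acceptable: only "bad" is a seed.
print("start" in mark({"bad"}, regular, forks))
# Keeping the function is not acceptable either: "kept" is also a seed.
print("start" in mark({"bad", "kept"}, regular, forks))
```

A marked state is one from which some sequence of service answers leads to a word outside the target language, whatever the rewriter chooses. A safe rewriting therefore exists exactly when the initial state stays unmarked.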
[Figure 6, shown again: the Cartesian product automaton A× = A_w^1 × Ā.]
The Algorithm (cont’d):
6) Try to obtain a SAFE REWRITING.
  “A safe rewriting exists IFF the initial state is not marked.”
  6.1) Follow a non-marked path (corresponding to w) starting from the initial state of A× to a state [q p] where q is an accepting state of A_w^k.
    6.1.1) The non-marked fork options on the path determine the rewriting choices (i.e. which functions to call).
    6.1.2) When a function is invoked, we continue the path with the new rewritten word rather than the word w.
The Algorithm (cont’d):
  6.2) To minimize the rewriting cost, choose a path with the minimal number/cost of function invocations.
EXIT % End of the algorithm
[Figure 7: the complement automaton Ā for the schema τ’’(newspaper) = title.date.temp.exhibit*. Edges labelled title, date, temp, exhibit and *.]
[Figure 8: the Cartesian product automaton A× = A_w^1 × Ā for the schema of Figure 7. States: [q0,p0], [q1,p1], [q2,p2], [q3,p3], [q5,p2], [q6,p3], [q7,p3], [q4,p3], [q7,p5], [q5,p5], [q3,p6], [q7,p6], [q4,p6]; edges labelled title, date, Get_Temp, temp, TimeOut, exhibit, performance and ε.]
Complexity of the Algorithm:
  s0 : schema of the sender
  s : agreed data exchange schema
The complexity is determined by the size of the Cartesian product automaton:
  1. Construct the Cartesian product.
  2. Traverse and mark the nodes of the resulting product.
THUS the complexity is bounded by:
  O(|A×|²) = O((|A_w^k| × |Ā|)²)
Maximum size of A_w^k: O((|s0| + |w|)^k)
The complexity is polynomial in the size of the schemas s and s0 (with the exponent determined by k).
POSSIBLE REWRITING
The Algorithm:
1. Build finite state automata for the following languages:
  1.1. An automaton A_w^k.
  1.2. An automaton A accepting the regular language R.
[Figure 10: an automaton A for the schema τ’’(newspaper) = title.date.temp.exhibit*. States p0 to p4; edges labelled title, date, temp and exhibit.]
The Algorithm (cont’d):
2. Construct the Cartesian product automaton A× = A_w^k × A.
[Figure 11: the Cartesian product automaton for the example. States: [q0,p0], [q1,p1], [q2,p2], [q3,p3], [q5,p2], [q6,p3], [q7,p3], [q4,p3], [q4,p4], [q7,p4]; edges labelled title, date, temp, exhibit and ε.]
The Algorithm (cont’d):
3. Mark all nodes in A× having some outgoing path leading to a final state.
4. IF the initial state is marked THEN a rewriting may exist. To obtain such a rewriting:
  Follow a marked path from the initial state of A× to a final one, with the fork options on the path determining the rewriting choices.
  Backtrack when a call returns a value that does not allow the path to continue to an accepting state.
  To minimize the rewriting cost, choose a path with the minimal number/cost of function invocations.
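The existence test of steps 2 to 4 can be sketched in Python on the newspaper example. Two assumptions should be kept in mind: the edge list for A_w^1 and the deterministic automaton for title.date.temp.exhibit* are written by hand, and instead of marking nodes backwards from the final states the sketch searches forwards from the initial state, building the product on the fly. Both answer the same question, namely whether a final state is reachable from the initial one.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, Optional[str], str]      # (source, label, target); None = ε


def may_rewrite(edges: List[Edge], start: str, final: str,
                dfa: Dict[Tuple[str, str], str],
                dfa_start: str, dfa_accepting: Set[str]) -> bool:
    """Is some state [final, accepting p] reachable in the product automaton?"""
    seen = {(start, dfa_start)}
    queue = deque(seen)
    while queue:
        q, p = queue.popleft()
        if q == final and p in dfa_accepting:
            return True
        for source, label, target in edges:
            if source != q:
                continue
            if label is None:
                nxt = (target, p)                   # ε move: only A_w^k advances
            elif (p, label) in dfa:
                nxt = (target, dfa[(p, label)])     # both automata advance
            else:
                continue                            # the target automaton is stuck
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# A_w^1 for w = title.date.Get_Temp.TimeOut (hand-written).
A_W1: List[Edge] = [
    ("q0", "title", "q1"), ("q1", "date", "q2"),
    ("q2", "Get_Temp", "q3"), ("q3", "TimeOut", "q4"),
    ("q2", None, "q5"), ("q5", "temp", "q6"), ("q6", None, "q3"),
    ("q3", None, "q7"), ("q7", "exhibit", "q7"),
    ("q7", "performance", "q7"), ("q7", None, "q4"),
]
# Deterministic automaton for title.date.temp.exhibit*.
DFA = {("p0", "title"): "p1", ("p1", "date"): "p2", ("p2", "temp"): "p3",
       ("p3", "exhibit"): "p4", ("p4", "exhibit"): "p4"}

print(may_rewrite(A_W1, "q0", "q4", DFA, "p0", {"p3", "p4"}))
```

The search only tells us that a rewriting may exist; whether it succeeds depends on what TimeOut actually returns, hence the backtracking in step 4.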
[Figure 11, shown again: the Cartesian product automaton for possible rewriting.]
IMPLEMENTATION
The implementation was performed in the Schema Enforcement Module of ActiveXML.
We’ll describe:
  how the intensional document and schema model map to XML, XML Schema, SOAP and WSDL;
  ActiveXML and the Schema Enforcement Module.
In the implementation, an intensional XML document is a syntactically well-formed XML document.
To distinguish the intensional parts from the rest of the document, the namespace http://www.activexml.com/ns/int is used; it is defined for function (service) calls.
[Figure: the newspaper document, drawn as a tree]
newspaper
  title
    “The Sun”
  date
    “04/10/2002”
  Get_Temp (function node)
    city
      “Paris”
  TimeOut (function node)
    “Exhibits”
XML representation of the newspaper document (annotations on the listing):
  The namespace is defined for function (service) calls.
  title and date are data nodes.
  Three attributes of the function nodes provide the necessary information to call the SOAP service:
    1. URL of the server
    2. Method name
    3. Associated namespace
  The function TimeOut carries the same three attributes.
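The Python sketch below shows how function nodes could be told apart from data nodes by namespace. The XML listing itself did not survive in these notes, so the element name `int:fun`, the attribute names `endpointURL`, `methodName` and `namespaceURI`, and the example URLs are assumptions chosen for illustration. Only the namespace URI is taken from the slides.

```python
import xml.etree.ElementTree as ET

INT_NS = "http://www.activexml.com/ns/int"   # namespace given in the slides

# Hypothetical serialisation of the newspaper document.
DOC = """\
<newspaper xmlns:int="http://www.activexml.com/ns/int">
  <title>The Sun</title>
  <date>04/10/2002</date>
  <int:fun endpointURL="http://example.org/forecast"
           methodName="Get_Temp"
           namespaceURI="urn:example-weather">
    <city>Paris</city>
  </int:fun>
  <int:fun endpointURL="http://example.org/timeout"
           methodName="TimeOut"
           namespaceURI="urn:example-timeout">Exhibits</int:fun>
</newspaper>
"""

root = ET.fromstring(DOC)

# Function nodes: elements in the intensional namespace. Their three
# attributes say which SOAP service to call.
calls = [
    (el.get("endpointURL"), el.get("methodName"), el.get("namespaceURI"))
    for el in root.iter(f"{{{INT_NS}}}fun")
]
# Data nodes: direct children outside that namespace.
data_children = [el.tag for el in root if not el.tag.startswith(f"{{{INT_NS}}}")]

print(calls)
print(data_children)
```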
XML Representation of Function Attributes:
  id attribute: identifies the function attributes.
  Attributes: designate the SOAP function that implements the boolean predicate used for the function pattern.
  The “contents” detail the function signature, i.e. the expected types of the input parameters and of the result of function calls.
Function Pattern “Forecast”:
  captures any function with one input parameter of element type “city”
  that returns an element of type “temp”.
Newspaper element with structure title.date.(Forecast|temp).(TimeOut|exhibit*)
ActiveXML System:
Active XML is a peer-to-peer system centered around intensional XML documents. Each peer:
  contains a repository of intensional documents;
  provides active features to enrich them by automatically triggering the function calls they contain;
  also provides some Web Services, defined declaratively as queries/updates on top of the repository documents.
All exchanges between the ActiveXML peers, and with Web Service providers/consumers, use the SOAP protocol.
The Role of the Schema Enforcement Module:
  1. to verify whether the call parameters conform to the WSDLint description of the service;
  2. if not, to try to rewrite them into the required structure;
  3. if 2 fails, to report an error.
NOTE: Similarly, before an ActiveXML service returns its answer, the Schema Enforcement Module performs the same three steps on the returned data.
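The three steps can be summarised as a small control-flow sketch in Python. This is not the ActiveXML code: `enforce`, `SchemaEnforcementError`, the `conforms` and `rewrite` callbacks and the `Get_City` function name are all hypothetical, and the data is reduced to a list of labels.

```python
from typing import Callable, List, Optional


class SchemaEnforcementError(Exception):
    """Raised when the data cannot be brought into the required structure."""


def enforce(data: List[str],
            conforms: Callable[[List[str]], bool],
            rewrite: Callable[[List[str]], Optional[List[str]]]) -> List[str]:
    # Step 1: verify whether the data already conforms.
    if conforms(data):
        return data
    # Step 2: if not, try to rewrite it into the required structure.
    rewritten = rewrite(data)
    if rewritten is not None and conforms(rewritten):
        return rewritten
    # Step 3: if step 2 fails, report an error.
    raise SchemaEnforcementError(f"cannot rewrite {data!r} into the expected type")


# Toy setting: the service expects exactly one "city" parameter, and a
# hypothetical function Get_City can be called to obtain one.
conforms = lambda word: word == ["city"]
rewrite = lambda word: ["city"] if word == ["Get_City"] else None

print(enforce(["city"], conforms, rewrite))
print(enforce(["Get_City"], conforms, rewrite))
```

The same function can be applied to the returned data before a service sends its answer, as the note above says.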
Implementation of the Schema Enforcement Module:
  The parser uses a standard SAX parser.
  It does not cover all the features of XML Schema, but implements the important ones, such as:
    complex types
    element/type references
    schema import
  It does not check simple types, inheritance and keys, but these could easily be added to the code.
Unlike the algorithm proposed, the implementation builds the automaton in a lazy mode:
  start from the initial state and construct only the needed parts.
The construction is pruned whenever a node can be marked directly, without looking at the remaining, unexplored branches.
Main ideas that guide this process:
  1. Sink nodes: once you get there you can’t get out
  2. Marked nodes
[Figure 12: the pruned automaton, i.e. the Cartesian product automaton of Figure 6 with the parts that the lazy construction does not need to build pruned away.]
CONCLUSION and RELATED WORK
XML documents with embedded calls to Web services are already present in several existing products.
WHAT’S NEW? The proposed extension of XML Schema with function types is a first step towards a more precise description of XML documents embedding computation.
MAIN OPEN PROBLEM: whether Safe Rewriting remains decidable when the k-depth restriction is removed.