A Program Slicing Based Method to Filter XML/DTD Documents
Josep F. Silva Galiana
Technical University of Valencia, Computer Science Department
SOFSEM’07 (22/01/2007)
Contents
• Motivation
  • Program Slicing
  • XML
    • DTD
    • XSLT
• Slicing XML Documents
  • Example
• Implementation
• Conclusions & Future Work
Program Slicing
• Definition: a program transformation that extracts the program statements that (potentially) affect the values computed at some point of interest.
• Origin: originally introduced by Weiser.
• Example:
      (1)  read(n)
      (2)  i = 1
      (3)  sum = 0
      (4)  product = 1
      (5)  while (i <= n) do
           begin
      (6)    sum = sum + i
      (7)    product = product * i
      (8)    i = i + 1
           end
      (9)  write(sum)
      (10) write(product)
  Slicing criterion = (10, product)
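A minimal Haskell sketch of this example, assuming the program dependence graph has already been built: the edges below were derived by hand from the reaching definitions and from the control dependence on the while loop, so no dependence analysis is performed. `pdg` and `backwardSlice` are hypothetical names. The slice for the criterion (10, product) is the set of statements reachable backwards from statement (10).

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hand-written dependence graph of the 10-statement example: each statement
-- maps to the statements it depends on (data dependences from reaching
-- definitions, plus control dependences on the while loop at (5)).
pdg :: Map.Map Int [Int]
pdg = Map.fromList
  [ (1, []), (2, []), (3, []), (4, [])
  , (5, [1, 2, 8])        -- while (i <= n): n from (1), i from (2) or (8)
  , (6, [2, 3, 5, 6, 8])  -- sum = sum + i
  , (7, [2, 4, 5, 7, 8])  -- product = product * i
  , (8, [2, 5, 8])        -- i = i + 1
  , (9, [3, 6])           -- write(sum)
  , (10, [4, 7])          -- write(product)
  ]

-- Backward slice: every statement found by walking dependence edges
-- backwards from the criterion statement (a simple worklist traversal).
backwardSlice :: Map.Map Int [Int] -> Int -> Set.Set Int
backwardSlice g start = go Set.empty [start]
  where
    go seen [] = seen
    go seen (n : rest)
      | n `Set.member` seen = go seen rest
      | otherwise = go (Set.insert n seen) (Map.findWithDefault [] n g ++ rest)

main :: IO ()
main = print (Set.toList (backwardSlice pdg 10))  -- [1,2,4,5,7,8,10]
```

The computed slice keeps (1), (2), (4), (5), (7), (8) and (10): every statement that only concerns sum is removed.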
• Applications: debugging, code understanding, specialization, etc.
These applications are typically based on Program Dependence Graphs (PDGs), which represent the structure and behaviour of programs.
What would happen if program slicing were applied to a data structure? Would it be interesting?
XML (eXtensible Markup Language)
• Origin: XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996.
• Structure: documents are trees composed of elements, which may contain attributes.
Example of an XML document
DTD (Document Type Definition)
• Objective: the purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.
• Structure: DTDs are graphs composed of element declarations, because a declaration may refer to other elements and even, recursively, to itself.
Example of a DTD document
XML (eXtensible Markup Language):
    <PersonalInfo>
      <Contact>
        <Status> Professor </Status>
        <Name> Ryan </Name>
        <Surname> Gibson </Surname>
      </Contact>
      <Teaching>
        <Subject>
          <Name> Logic </Name>
          <Sched> Mon/Wed 16-18 </Sched>
          <Course> 4-Mathematics </Course>
        </Subject>
        <Subject>
          <Name> Algebra </Name>
          <Sched> Mon/Tur 11-13 </Sched>
          <Course> 3-Mathematics </Course>
        </Subject>
        …
      </Teaching>
      <Research>
        <Project name="SysLog" year="2003-2004" budget="16000€"/>
      </Research>
    </PersonalInfo>

DTD (Document Type Definition):
    <!ELEMENT PersonalInfo (Contact, Teaching, Research)>
    <!ELEMENT Contact (Status, Name, Surname)>
    <!ELEMENT Status ANY>
    <!ELEMENT Name ANY>
    <!ELEMENT Surname ANY>
    <!ELEMENT Teaching (Subject+)>
    <!ELEMENT Subject (Name, Sched, Course)>
    <!ELEMENT Sched ANY>
    <!ELEMENT Course ANY>
    <!ELEMENT Research (Project)>
    <!ELEMENT Project ANY>
    <!ATTLIST Project
        name   CDATA #REQUIRED
        year   CDATA #REQUIRED
        budget CDATA #IMPLIED>
XSLT (eXtensible Stylesheet Language Transformations)
• Objective: XSLT is a language for transforming XML documents.
• Structure: an XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO.
• XSLT is a programming language.
Example of an XSLT document (source code)
Example of an XSLT document (result)
Slicing XML Documents
• We see XML documents and DTDs as trees.
    PersonalInfo
      Contact: Professor, Ryan, Gibson
      Teaching
        Subject: Logic, Mon/Wed 16-18, 4-Mathematics
        Subject: Algebra, Mon/Tur 11-13, 3-Mathematics
        …
      Research
        Project: SysLog, 2003-2004, 16000€
• The slicing criterion is a set of nodes in the tree.
• For each node in the slicing criterion, we extract from the tree all the nodes on the path from the root to that node.
[Figure: Web Page (Original) and Web Page (Slice); slicing criteria can be XML or DTD criteria, and slices can be forward or backward]
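The root-to-node rule can be sketched in Haskell as follows. This is an illustration under our own assumptions, not the authors' prototype: `Tree`, `Pos` and `backwardSlice` are hypothetical names, nodes are addressed by child-index paths, and text and attributes are omitted. The criterion [1, 0, 1] selects the Sched element of the Logic subject.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

-- A bare element tree: a label and its children (text and attributes are
-- left out to keep the sketch short).
data Tree = Tree String [Tree] deriving (Show, Eq)

-- A node is identified by the child indices on the path from the root,
-- e.g. [1, 0] is the first child of the root's second child.
type Pos = [Int]

-- Backward slice: keep a node iff it lies on the path from the root to some
-- criterion node, i.e. iff its position is a prefix of a criterion position.
backwardSlice :: [Pos] -> Tree -> Maybe Tree
backwardSlice crit = go []
  where
    go p (Tree label kids)
      | any (p `isPrefixOf`) crit =
          Just (Tree label (mapMaybe (\(i, k) -> go (p ++ [i]) k) (zip [0 ..] kids)))
      | otherwise = Nothing

-- Element structure of the example document (two subjects only).
example :: Tree
example =
  Tree "PersonalInfo"
    [ Tree "Contact" [Tree "Status" [], Tree "Name" [], Tree "Surname" []]
    , Tree "Teaching"
        [ Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        , Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        ]
    , Tree "Research" [Tree "Project" []]
    ]

main :: IO ()
main = print (backwardSlice [[1, 0, 1]] example)
```

The result keeps only PersonalInfo, Teaching, the first Subject and its Sched. That Subject no longer has the Name and Course children the original DTD requires, which is why validity with respect to the original DTD cannot be taken for granted.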
DTD backward slicing criterion
    PersonalInfo
      Contact: Status, Name, Surname
      Teaching
        Subject: Name, Sched, Course
      Research
        Project: Name, Year, Budget
[Figure: DTD tree and DTD document; panels: Web Page (Original), Web Page (Slice)]
XML backward slicing criterion
[Figure: XML tree and XML document of the example; panels: Web Page (Original), Web Page (Slice)]
• We distinguish between DTD and XML slicing criteria.
  • XML slicing criteria are more fine-grained than DTD slicing criteria: an XML criterion selects individual nodes of the document, whereas a DTD criterion selects positions of the DTD (element types), and thus every XML node of those types.
• We distinguish between forward and backward slices (or a combination of both).
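The forward and combined variants can be sketched by changing only the predicate that decides which positions survive. This is a sketch under stated assumptions, not the authors' definition: we assume that a forward slice keeps each criterion node with all its descendants, plus the path from the root so that the result is still a single tree, and that a backward-forward slice is the union of a backward and a forward slice. All names are hypothetical.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

data Tree = Tree String [Tree] deriving (Show, Eq)

type Pos = [Int]  -- child indices from the root

-- Generic top-down filter over positions. Because it stops at the first
-- rejected node, a predicate must accept every ancestor of any node it
-- accepts; the three predicates below all do.
sliceBy :: (Pos -> Bool) -> Tree -> Maybe Tree
sliceBy keep = go []
  where
    go p (Tree label kids)
      | keep p = Just (Tree label (mapMaybe (\(i, k) -> go (p ++ [i]) k) (zip [0 ..] kids)))
      | otherwise = Nothing

-- Backward: nodes on the path from the root to a criterion node.
backward :: [Pos] -> Pos -> Bool
backward crit p = any (p `isPrefixOf`) crit

-- Forward (assumed): criterion nodes with all their descendants, plus the
-- root path so that the result stays a single tree.
forward :: [Pos] -> Pos -> Bool
forward crit p = any (\c -> p `isPrefixOf` c || c `isPrefixOf` p) crit

-- Backward-forward (assumed): union of a backward and a forward criterion.
backwardForward :: [Pos] -> [Pos] -> Pos -> Bool
backwardForward bs fs p = backward bs p || forward fs p

example :: Tree
example =
  Tree "PersonalInfo"
    [ Tree "Contact" [Tree "Status" [], Tree "Name" [], Tree "Surname" []]
    , Tree "Teaching"
        [ Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        , Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        ]
    , Tree "Research" [Tree "Project" []]
    ]

main :: IO ()
main = do
  print (sliceBy (forward [[1, 0]]) example)                      -- the whole Logic subject
  print (sliceBy (backwardForward [[0, 1]] [[1, 0]]) example)     -- plus Contact's Name
```

With these predicates, the forward slice of the Logic subject keeps the whole Subject element, while a backward slice of the same node would keep none of its children.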
XML forward slicing criterion
[Figure: XML tree and XML document of the example; panels: Web Page (Original), Web Page (Slice)]
XML backward-forward slicing criterion
[Figure: XML tree and XML document of the example; panels: Web Page (Original), Web Page (Slice)]
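• What happens with DTDs? Slices are well-formed, but are they valid? For example, a slice in which Contact keeps only its Status child is no longer valid with respect to <!ELEMENT Contact (Status, Name, Surname)>.
• For each XML slice we produce a DTD slice, and vice versa.
• We guarantee that XML slices are valid with respect to DTD slices.
[Figure: the slicer takes a DTD document, an XML document and a slicing criterion, and produces a DTD slice document and an XML slice document]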
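A minimal sketch of the validity problem, with simplifications that are ours: content models are either ANY or a sequence of children marked as exactly-once or one-or-more, text and attributes are ignored, and `valid` and `dtdSlice` are hypothetical names. `dtdSlice` keeps the declarations of the types present in the XML slice and removes from each sequence the children that never occur below that type. This naive construction is only guaranteed to be valid when all nodes of the same type keep the same children; the slides state that the authors' method guarantees validity in general, and that method is not reproduced here.

```haskell
import qualified Data.Map as Map

data Tree = Tree String [Tree]

-- Simplified content models: ANY, or a sequence of child names, each with a
-- multiplicity (exactly one, or one-or-more as in Subject+).
data Mult = One | Many
data Model = Any | Seq [(String, Mult)]

type Dtd = Map.Map String Model

-- Does a list of child labels match a sequence model? (Greedy for Many.)
matches :: [(String, Mult)] -> [String] -> Bool
matches [] ls = null ls
matches ((n, One) : rest) (l : ls) = n == l && matches rest ls
matches ((n, Many) : rest) ls =
  let (same, others) = span (== n) ls
   in not (null same) && matches rest others
matches _ [] = False

-- Every element is declared and its children match its model.
-- Children of ANY elements are not checked further in this sketch.
valid :: Dtd -> Tree -> Bool
valid dtd (Tree l kids) = case Map.lookup l dtd of
  Nothing -> False
  Just Any -> True
  Just (Seq ms) -> matches ms [n | Tree n _ <- kids] && all (valid dtd) kids

-- Hypothetical helper: naive DTD slice induced by an XML slice.
dtdSlice :: Dtd -> Tree -> Dtd
dtdSlice dtd x = Map.mapWithKey restrict (Map.filterWithKey (\k _ -> k `elem` labels x) dtd)
  where
    labels (Tree l ks) = l : concatMap labels ks
    edges (Tree l ks) = [(l, c) | Tree c _ <- ks] ++ concatMap edges ks
    restrict t (Seq ms) = Seq [m | m@(n, _) <- ms, (t, n) `elem` edges x]
    restrict _ Any = Any

-- The example DTD (attributes omitted).
original :: Dtd
original = Map.fromList
  [ ("PersonalInfo", Seq [("Contact", One), ("Teaching", One), ("Research", One)])
  , ("Contact", Seq [("Status", One), ("Name", One), ("Surname", One)])
  , ("Status", Any), ("Name", Any), ("Surname", Any)
  , ("Teaching", Seq [("Subject", Many)])
  , ("Subject", Seq [("Name", One), ("Sched", One), ("Course", One)])
  , ("Sched", Any), ("Course", Any)
  , ("Research", Seq [("Project", One)])
  , ("Project", Any)
  ]

-- Backward slice for the Status node: PersonalInfo > Contact > Status.
xSlice :: Tree
xSlice = Tree "PersonalInfo" [Tree "Contact" [Tree "Status" []]]

main :: IO ()
main = do
  print (valid original xSlice)                    -- False
  print (valid (dtdSlice original xSlice) xSlice)  -- True
```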
• A simple slicing algorithm for XML slicing criteria
[Figure: algorithm listing]
• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq \mathit{Pos}(D)$, the algorithm would be the same, except that the first loop would be:
    For each $v_1.v_2.(\ldots).v_n \in C$ do
        $V' = V' \cup \{v_1,\ v_1.v_2,\ \ldots,\ v_1.v_2.(\ldots).v_n\}$
        $W' = W' \cup \{v_1|_i.v_2|_j.(\ldots).v_n|_k\}$ where $v_1.v_2.(\ldots).v_n \in V'$ and $v_1|_i.v_2|_j.(\ldots).v_n|_k \in X$
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion.
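One possible reading of this first loop, sketched in Haskell. The assumptions are ours: a DTD position is a list of element names, an XML position pairs each name with its 0-based index among its siblings (our interpretation of $v|_i$), and the loop is computed as two set comprehensions, V' first and then W', which gives the same final sets if "where … ∈ V'" ranges over the whole of V'. `firstLoop`, `DtdPos` and `XmlPos` are hypothetical names.

```haskell
import Data.List (inits)
import qualified Data.Set as Set

data Tree = Tree String [Tree]

-- v1.v2.(…).vn: a DTD position, i.e. a path of element names from the root.
type DtdPos = [String]

-- v1|i.v2|j.(…).vn|k: an XML position; each name is paired with an index,
-- assumed here to be the 0-based position among its siblings.
type XmlPos = [(String, Int)]

-- All XML positions of a document; the root gets index 0.
positions :: Tree -> [XmlPos]
positions = go 0
  where
    go i (Tree l kids) = [(l, i)] : [(l, i) : p | (j, k) <- zip [0 ..] kids, p <- go j k]

-- Forgetting the indices gives the DTD position (type path) of an XML node.
typePath :: XmlPos -> DtdPos
typePath = map fst

-- V' gets every non-empty prefix of each criterion position;
-- W' gets every XML position whose type path belongs to V'.
firstLoop :: [DtdPos] -> Tree -> (Set.Set DtdPos, Set.Set XmlPos)
firstLoop crit x = (vs, ws)
  where
    vs = Set.fromList (concatMap (drop 1 . inits) crit)
    ws = Set.fromList [w | w <- positions x, typePath w `Set.member` vs]

example :: Tree
example =
  Tree "PersonalInfo"
    [ Tree "Contact" [Tree "Status" [], Tree "Name" [], Tree "Surname" []]
    , Tree "Teaching"
        [ Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        , Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        ]
    , Tree "Research" [Tree "Project" []]
    ]

main :: IO ()
main = do
  let (vs, ws) = firstLoop [["PersonalInfo", "Teaching", "Subject", "Sched"]] example
  print (Set.size vs, Set.size ws)  -- (4,6)
  mapM_ print (Set.toList ws)
```

For the criterion PersonalInfo.Teaching.Subject.Sched, V' has four positions and W' six XML positions: the root, Teaching, both Subject elements and both Sched elements, so the Sched of every subject is kept.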
The following theorem states the correctness of the technique.
Theorem. Let D be a well-formed DTD and X a well-formed XML document that is valid with respect to D. Let D' and X' be the slices of D and X computed with an XML slicing criterion C, and let D'' and X'' be the slices of D and X computed with a DTD slicing criterion C'. Then:
  a) D' is well-formed and X' is valid with respect to D';
  b) D'' is well-formed and X'' is valid with respect to D''.
If all the elements in C are of one of the types in C', then:
  c) D' = D'';
  d) X' is a subtree of X''.
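Property (d) can be checked, not proved, on the running example. In this sketch (our own simplification, with hypothetical names) both slices are backward slices of the bare element tree: `sliceXml` keeps the root-to-node paths of an XML criterion, `sliceDtd` keeps all nodes whose type path leads to a DTD criterion position, and "subtree" is read as "obtained by deleting nodes while keeping the root and the order of siblings". The XML criterion selects one Sched node and the DTD criterion selects exactly its type, so the theorem's condition holds.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

data Tree = Tree String [Tree] deriving (Show, Eq)

-- XML criterion (child-index positions): keep the root-to-node paths.
sliceXml :: [[Int]] -> Tree -> Maybe Tree
sliceXml crit = go []
  where
    go p (Tree l ks)
      | any (p `isPrefixOf`) crit =
          Just (Tree l (mapMaybe (\(i, k) -> go (p ++ [i]) k) (zip [0 ..] ks)))
      | otherwise = Nothing

-- DTD criterion (type paths): keep every node whose type path is a prefix of
-- a criterion path, i.e. all nodes of the selected types and their ancestors.
sliceDtd :: [[String]] -> Tree -> Maybe Tree
sliceDtd crit = go []
  where
    go p (Tree l ks)
      | any (p' `isPrefixOf`) crit = Just (Tree l (mapMaybe (go p') ks))
      | otherwise = Nothing
      where
        p' = p ++ [l]

-- Same root label, and the children of the first tree match, in order, a
-- subsequence of the children of the second (greedy left-to-right matching).
subtreeOf :: Tree -> Tree -> Bool
subtreeOf (Tree a xs) (Tree b ys) = a == b && embed xs ys
  where
    embed [] _ = True
    embed _ [] = False
    embed (u : us) (v : vs)
      | u `subtreeOf` v = embed us vs
      | otherwise = embed (u : us) vs

example :: Tree
example =
  Tree "PersonalInfo"
    [ Tree "Contact" [Tree "Status" [], Tree "Name" [], Tree "Surname" []]
    , Tree "Teaching"
        [ Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        , Tree "Subject" [Tree "Name" [], Tree "Sched" [], Tree "Course" []]
        ]
    , Tree "Research" [Tree "Project" []]
    ]

main :: IO ()
main = do
  let x1 = sliceXml [[1, 0, 1]] example                                   -- X'
      x2 = sliceDtd [["PersonalInfo", "Teaching", "Subject", "Sched"]] example  -- X''
  print (subtreeOf <$> x1 <*> x2)  -- Just True
```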
Implementation
We have implemented a prototype in Haskell.
Haskell provides a formal basis with many advantages for the manipulation of XML documents:
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
type Name      = String
type Value     = String
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element
               | CText String
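To show how these structures describe a document, the sketch below encodes the Contact element of the running example and prints it back as XML. The types are restated so that the block compiles on its own; HaXml's own `Content` type has further constructors, and `render` is a hypothetical helper written for this illustration, not a HaXml function. It performs no escaping of special characters.

```haskell
type Name = String
type Value = String

data Element = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content = CElem Element | CText String

-- Serialise an Element as XML text (no escaping of '<', '&' or quotes).
render :: Element -> String
render (Elem n attrs cs) =
  "<" ++ n ++ concatMap attr attrs ++ ">" ++ concatMap content cs ++ "</" ++ n ++ ">"
  where
    attr (k, v) = " " ++ k ++ "=\"" ++ v ++ "\""
    content (CElem e) = render e
    content (CText s) = s

-- The Contact element of the example document.
contact :: Element
contact =
  Elem "Contact" []
    [ CElem (Elem "Status" [] [CText "Professor"])
    , CElem (Elem "Name" [] [CText "Ryan"])
    , CElem (Elem "Surname" [] [CText "Gibson"])
    ]

main :: IO ()
main = putStrLn (render contact)
-- <Contact><Status>Professor</Status><Name>Ryan</Name><Surname>Gibson</Surname></Contact>
```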
From XML slices to Web page slices
[Figure: XML (Data) + XSLT (Presentation) → WebPage]
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).
Both the XML data and the presentation labels are generated together.
This does not impose any restriction on the power of XSLT, since the same Web pages can still be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and the presentation data that depend on the same condition are put together.
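The guideline can be illustrated without XSLT. In this Haskell analogy, a field set to `Nothing` stands for an element removed by the slicer, and each row function plays the role of an XSLT template that prints a label and a value. The record `Subject` and the functions `rowGuided`, `rowUnguided` and `renderWith` are hypothetical.

```haskell
import Data.Maybe (fromMaybe)

-- A possibly sliced piece of data: a field is Nothing if the slice removed it.
data Subject = Subject {subjName :: Maybe String, subjSched :: Maybe String}

-- Guideline followed: label and value are produced under the same condition,
-- so a field removed from the data also disappears from the page.
rowGuided :: String -> Maybe String -> String
rowGuided label = maybe "" (\v -> label ++ ": " ++ v ++ "\n")

-- Guideline violated: the label is produced unconditionally, so the sliced
-- page keeps a dangling label with no data behind it.
rowUnguided :: String -> Maybe String -> String
rowUnguided label v = label ++ ": " ++ fromMaybe "" v ++ "\n"

renderWith :: (String -> Maybe String -> String) -> Subject -> String
renderWith row s = row "Name" (subjName s) ++ row "Sched" (subjSched s)

main :: IO ()
main = do
  let sliced = Subject (Just "Logic") Nothing  -- Sched was sliced away
  putStr (renderWith rowGuided sliced)
  putStr (renderWith rowUnguided sliced)
```

With the guideline, slicing away Sched also removes its label from the page; without it, the page still shows an empty "Sched:" entry.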
The implementation, some examples and other material are publicly available at:
www.dsic.upv.es/~jsilva/xml
Conclusions
We proposed the application of program slicing techniques to XML data structures.
We defined an algorithm to slice XML and DTD documents.
The XML and DTD slices obtained are well-formed and valid. Previous slicers can be used with a modest implementation effort.
Slicing Web pages: the slicer can use XSLT in order to slice Web pages. We proposed some guidelines for writing XSLT files.
Future work: migration to XML Schema; a new implementation based on XQuery.
2
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Program Slicing
3
Program Slicing
bull DefinitionDefinition Program transformation to extract the program statements that (potentially) affect the values computed at some point of interest
bull Origin Origin Originally introduced by Weiser
bull ExampleExample (1) read(n) (2) i=1(3) sum=0(4) product=1(5) while (ilt=n) do
begin(6) sum=sum+i(7) product=producti(8) i=i+1
end(9) write(sum)(10) write(product)
Slicing Criterion = (10 product)
4
Program Slicing
bull DefinitionDefinition Program transformation to extract the program statements that (potentially) affect the values computed at some point of interest
bull Origin Origin Originally introduced by Weiser
bull ExampleExample (1) read(n) (2) i=1(3) sum=0(4) product=1(5) while (ilt=n) do
begin(6) sum=sum+i(7) product=producti(8) i=i+1
end(9) write(sum)(10) write(product)
Slicing Criterion = (10 product)
5
Program Slicing
bull ApplicationsApplications Debugging Code understanding Specialization etc
All the applications are based on the Program Dependence Graphs (PDGs) (structure and behaviour of programs)
What would happen if Program Slicing was applied to a data structure Would it be interesting
6
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
XML
7
XML
bull OriginOrigin XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996
bull StructureStructure Documents are trees composed by lsquoELEMENTSrsquo which contain attributes
Example of XML document
XML XML (e(eXXtensible tensible MMarkup arkup LLanguage)anguage)
8
XML
bull ObjectiveObjective The purpose of a DTD is to define the legal building blocks of an XML document It defines the document structure with a list of legal elements
bull StructureStructure Documents are graphs composed by lsquoELEMENTSrsquo
Example of DTD document
DTD DTD ((DDocument ocument TType ype DDefinition)efinition)
9
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status Name Surname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
DTD DTD ((DDocument ocument TType ype DDefinition)efinition)XML XML (e(eXXtensible tensible MMarkup arkup LLanguage)anguage)
10
XML
bull ObjectiveObjective XSLT is a language for transforming XML
bull StructureStructure An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary such as (X)HTML or XSL-FO
bull XSLT is a programming language
Example of XSLT document(Source Code)
XSLT XSLT (e(eXXtensible tensible SStylesheet tylesheet LLanguage anguage TTransformations)ransformations)
Example of XSLT document(Result)
11
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Slicing XML Documents
12
Slicing XML Documentsbull We see XML documents and DTDs as trees
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
13
Slicing XML Documents
bull The Slicing Criterion is composed by a set of nodes in the tree
bull For each node in the slicing criterion we extract from the tree all those nodes that are in the path from the root to the node
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
14
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
15
Slicing XML Documentsbull XML backward slicing criterion
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Web Page(Original)
Web Page(Slice)
16
Slicing XML Documentsbull XML backward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
Implementation
From XML slices to web page slices
Figure: XML (data) and XSLT (presentation) produce the web page.
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).
Both the XML data and the presentation labels are generated together.
This does not impose any restriction on the power of XSLT, since the same web pages can still be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and presentation data that depend on the same condition are kept together.
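The guideline can be illustrated with a Haskell analogy rather than XSLT: in `renderSubject` each label is emitted under the same condition as its value, while `renderSubjectBad` emits labels unconditionally. Both functions, the simplified types and the HTML fragments they produce are hypothetical; they only show why a sliced XML document renders cleanly when the guideline is followed.

```haskell
-- Simplified XML model (an assumption for illustration).
type Name = String
data Element = Elem Name [(Name, String)] [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

-- Following the guideline: a label and its value are emitted under the
-- same condition (the child element exists), so a sliced document never
-- produces a label without data.
renderField :: Element -> Name -> [String]
renderField (Elem _ _ cs) field =
  [ "<b>" ++ field ++ ":</b> " ++ txt
  | CElem (Elem n _ [CText txt]) <- cs
  , n == field ]

renderSubject :: Element -> [String]
renderSubject subj = concatMap (renderField subj) ["Name", "Sched", "Course"]

-- Violating the guideline: labels are emitted unconditionally, so data
-- removed by slicing leaves dangling labels on the page.
renderSubjectBad :: Element -> [String]
renderSubjectBad (Elem _ _ cs) =
  [ "<b>" ++ field ++ ":</b> " ++ concat [txt | CElem (Elem n _ [CText txt]) <- cs, n == field]
  | field <- ["Name", "Sched", "Course"] ]
```

With a Subject slice that only keeps Name, the first function outputs a single line, whereas the second leaves empty Sched and Course labels on the page.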
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation, some examples and other material are publicly available at:
www.dsic.upv.es/~jsilva/xml
31
Contents
Motivation
Program Slicing
XML
• DTD
• XSLT
Slicing XML Documents
• Example
Implementation
Conclusions & Future Work
Conclusions & Future Work
32
Conclusions
• We proposed the application of program slicing techniques to XML data structures.
• We defined an algorithm to slice XML and DTD documents:
- the XML and DTD slices are well-formed and valid;
- previous slicers can be used with a modest implementation effort.
• Slicing web pages: the slicer can use XSLT to slice web pages. We proposed some guidelines for writing XSLT files.
Future Work
• Migration to XML Schema.
• A new implementation based on XQuery.
4
Program Slicing
bull DefinitionDefinition Program transformation to extract the program statements that (potentially) affect the values computed at some point of interest
bull Origin Origin Originally introduced by Weiser
bull ExampleExample (1) read(n) (2) i=1(3) sum=0(4) product=1(5) while (ilt=n) do
begin(6) sum=sum+i(7) product=producti(8) i=i+1
end(9) write(sum)(10) write(product)
Slicing Criterion = (10 product)
5
Program Slicing
bull ApplicationsApplications Debugging Code understanding Specialization etc
All the applications are based on the Program Dependence Graphs (PDGs) (structure and behaviour of programs)
What would happen if Program Slicing was applied to a data structure Would it be interesting
6
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
XML
7
XML
bull OriginOrigin XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996
bull StructureStructure Documents are trees composed by lsquoELEMENTSrsquo which contain attributes
Example of XML document
XML XML (e(eXXtensible tensible MMarkup arkup LLanguage)anguage)
8
XML
bull ObjectiveObjective The purpose of a DTD is to define the legal building blocks of an XML document It defines the document structure with a list of legal elements
bull StructureStructure Documents are graphs composed by lsquoELEMENTSrsquo
Example of DTD document
DTD DTD ((DDocument ocument TType ype DDefinition)efinition)
9
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status Name Surname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
DTD DTD ((DDocument ocument TType ype DDefinition)efinition)XML XML (e(eXXtensible tensible MMarkup arkup LLanguage)anguage)
10
XML
bull ObjectiveObjective XSLT is a language for transforming XML
bull StructureStructure An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary such as (X)HTML or XSL-FO
bull XSLT is a programming language
Example of XSLT document(Source Code)
XSLT XSLT (e(eXXtensible tensible SStylesheet tylesheet LLanguage anguage TTransformations)ransformations)
Example of XSLT document(Result)
11
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Slicing XML Documents
12
Slicing XML Documentsbull We see XML documents and DTDs as trees
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
13
Slicing XML Documents
bull The Slicing Criterion is composed by a set of nodes in the tree
bull For each node in the slicing criterion we extract from the tree all those nodes that are in the path from the root to the node
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
14
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
15
Slicing XML Documentsbull XML backward slicing criterion
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Web Page(Original)
Web Page(Slice)
16
Slicing XML Documentsbull XML backward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
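It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
type Name  = String               -- assumed synonym for element/attribute names
type Value = String               -- assumed synonym for attribute values
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)    -- a name/value pair, so a type synonym rather than a data declaration
data Content   = CElem Element    -- a nested element
               | CText String     -- character data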
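The sketch below uses the data structures above, with `Name` and `Value` simplified to `String` (the real HaXml types differ). It slices an element by a set of element types. An element whose type is in the criterion is kept with all its contents (the forward part). Its ancestors are kept with their attributes (the backward part). Everything else is removed. The combination and the name `sliceByTypes` are our own illustration, not the prototype's code.

```haskell
-- Sketch only: simplified HaXml-like types and a hypothetical slicing function.
type Name  = String
type Value = String

data Element = Elem Name [Attribute] [Content]
  deriving (Show, Eq)

type Attribute = (Name, Value)

data Content = CElem Element
             | CText String
  deriving (Show, Eq)

-- Slice an element with respect to a set of element types:
--   * an element whose type is in the criterion is kept with all its contents;
--   * an element with a kept descendant is kept with its attributes, but only
--     with the children that survive the slice (its text children are dropped);
--   * any other element is removed.
sliceByTypes :: [Name] -> Element -> Maybe Element
sliceByTypes crit e@(Elem n attrs cs)
  | n `elem` crit = Just e
  | null kept     = Nothing
  | otherwise     = Just (Elem n attrs kept)
  where
    kept = [CElem k | CElem c <- cs, Just k <- [sliceByTypes crit c]]
```

With the criterion `["Project"]`, the PersonalInfo document is reduced to PersonalInfo, Research and the complete Project element, including its `name` and `year` attributes.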
27
Implementation
From XML slices to web page slices
[Figure: XML (Data) + XSLT (Presentation) → WebPage]
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).
Both the XML data and the presentation labels are generated together.
This does not impose any restriction on the expressive power of XSLT, since the same web pages can still be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because information and presentation data that depend on the same condition are kept together.
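As an analogy only (Haskell standing in for XSLT, not an XSLT stylesheet), the guideline amounts to writing each rendering case so that it emits a piece of data together with its markup. With the renderer below, an element that slicing removes takes its HTML with it. The `Node` type and the `render` function are hypothetical and follow the PersonalInfo example.

```haskell
-- Sketch only: a hypothetical renderer mimicking the XSLT guideline.
-- A simplified XML element: type, text content and child elements.
data Node = Node String String [Node]
  deriving (Show, Eq)

-- Each equation produces the data and its presentation markup under the same
-- condition (the element being present), so a sliced-away element leaves no
-- stray HTML behind.
render :: Node -> String
render (Node "PersonalInfo" _ cs) = "<html><body>" ++ concatMap render cs ++ "</body></html>"
render (Node "Teaching" _ cs)     = "<h2>Teaching</h2><ul>" ++ concatMap render cs ++ "</ul>"
render (Node "Subject" _ cs)      = "<li>" ++ concatMap render cs ++ "</li>"
render (Node "Name" txt _)        = "<b>" ++ txt ++ "</b> "
render (Node "Sched" txt _)       = "<i>" ++ txt ++ "</i>"
-- Any other element contributes only the rendering of its children.
render (Node _ _ cs)              = concatMap render cs
```

A stylesheet that always wrote the Teaching heading but only conditionally wrote the subjects would break this property: slicing the data would leave an empty section on the page.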
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation, some examples and other material are publicly available at:
www.dsic.upv.es/~jsilva/xml
31
Contents
Motivation
Program Slicing
XML
• DTD
• XSLT
Slicing XML Documents
• Example
Implementation
Conclusions & Future Work
Conclusions & Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures.
We defined an algorithm to slice XML and DTD documents.
The XML and DTD slices produced are well-formed and valid. Previous slicers can be used with a modest implementation effort.
Slicing web pages: the slicer can use XSLT to slice web pages. We proposed some guidelines for generating XSLT files.
Future work: migration to XML Schema; a new implementation based on XQuery.
7
XML
• Origin: XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996.
• Structure: Documents are trees composed of 'ELEMENTS', which contain attributes.
Example of XML document
XML (eXtensible Markup Language)
8
XML
• Objective: The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.
• Structure: Documents are graphs composed of 'ELEMENTS'.
Example of DTD document
DTD (Document Type Definition)
9
<PersonalInfo>
  <Contact>
    <Status>Professor</Status>
    <Name>Ryan</Name>
    <Surname>Gibson</Surname>
  </Contact>
  <Teaching>
    <Subject>
      <Name>Logic</Name>
      <Sched>Mon/Wed 16-18</Sched>
      <Course>4-Mathematics</Course>
    </Subject>
    <Subject>
      <Name>Algebra</Name>
      <Sched>Mon/Tur 11-13</Sched>
      <Course>3-Mathematics</Course>
    </Subject>
    …
  </Teaching>
  <Research>
    <Project name="SysLog" year="2003-2004" budget="16000€"/>
  </Research>
</PersonalInfo>

<!ELEMENT PersonalInfo (Contact, Teaching, Research)>
<!ELEMENT Contact (Status, Name, Surname)>
<!ELEMENT Status ANY>
<!ELEMENT Name ANY>
<!ELEMENT Surname ANY>
<!ELEMENT Teaching (Subject+)>
<!ELEMENT Subject (Name, Sched, Course)>
<!ELEMENT Sched ANY>
<!ELEMENT Course ANY>
<!ELEMENT Research (Project)>
<!ELEMENT Project ANY>
<!ATTLIST Project
  name   CDATA #REQUIRED
  year   CDATA #REQUIRED
  budget CDATA #IMPLIED>

DTD (Document Type Definition)
XML (eXtensible Markup Language)
10
XML
• Objective: XSLT is a language for transforming XML documents.
• Structure: An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary such as (X)HTML or XSL-FO.
• XSLT is a programming language.
Example of XSLT document (source code)
XSLT (eXtensible Stylesheet Language Transformations)
Example of XSLT document (result)
11
Slicing XML Documents
12
Slicing XML Documents
• We see XML documents and DTDs as trees.
[Figure: the example XML document next to its tree, rooted at PersonalInfo with children Contact (Professor, Ryan, Gibson), Teaching (a Subject for Logic, Mon/Wed 16-18, 4-Mathematics; a Subject for Algebra, Mon/Tur 11-13, 3-Mathematics; …) and Research (Project: SysLog, 2003-2004, 16000€)]
13
Slicing XML Documents
• The slicing criterion is a set of nodes in the tree.
• For each node in the slicing criterion, we extract from the tree all the nodes on the path from the root to that node.
[Figure: original and sliced web pages, and a classification of slicing criteria as XML or DTD and as forward or backward]
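A minimal sketch of this backward slice, assuming the document is a rose tree and that each node is identified by its position, the list of zero-based child indices from the root. `Tree`, `Pos` and `backward` are hypothetical names; the sketch ignores attributes and the DTD.

```haskell
import Data.List (isPrefixOf)

-- Rose tree standing in for the XML document; text is a leaf node.
data Tree = Node String [Tree] deriving (Eq, Show)

-- Position of a node: zero-based child indices from the root (root = []).
type Pos = [Int]

-- Backward slice: keep a node iff its position is a prefix of some criterion
-- position, i.e. iff it lies on the path from the root to a criterion node.
-- Returns Nothing when no node is selected (e.g. an empty criterion).
backward :: [Pos] -> Tree -> Maybe Tree
backward crit = go []
  where
    go p (Node l ks)
      | any (p `isPrefixOf`) crit =
          Just (Node l [k' | (i, k) <- zip [0 ..] ks, Just k' <- [go (p ++ [i]) k]])
      | otherwise = Nothing

example :: Tree
example =
  Node "PersonalInfo"
    [ Node "Contact"  [Node "Name" [Node "Ryan" []], Node "Surname" [Node "Gibson" []]]
    , Node "Teaching" [ Node "Subject" [Node "Name" [Node "Logic" []]]
                      , Node "Subject" [Node "Name" [Node "Algebra" []]] ]
    ]
```

For instance, `backward [[1,1,0,0]] example` selects the text `Algebra` and keeps only `PersonalInfo`, `Teaching`, the second `Subject`, its `Name` and the text itself.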
14
Slicing XML Documents
• DTD backward slicing criterion
[Figure: the example DTD next to its tree, rooted at PersonalInfo with children Contact (Status, Name, Surname), Teaching (Subject: Name, Sched, Course) and Research (Project: name, year, budget), together with the original and sliced web pages]
15
Slicing XML Documents
• XML backward slicing criterion
[Figure: the example XML document and its tree, together with the original and sliced web pages]
16
Slicing XML Documents
• XML backward slicing criterion
[Figure: the example XML document and its tree, together with the original and sliced web pages]
17
Slicing XML Documents
• We distinguish between DTD and XML slicing criteria.
  • XML slicing criteria are more fine-grained than DTD slicing criteria.
• We distinguish between forward and backward slices (or a combination of both).
[Figure: original and sliced web pages, and a classification of slicing criteria as XML or DTD and as forward or backward]
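Under the same assumptions (rose tree, zero-based positions, hypothetical names), forward and combined slices can be sketched as follows. We read a forward slice as the criterion nodes together with all their descendants, and a backward-forward slice as the union of both directions; the slides show these only through examples, so this is an interpretation rather than the authors' definition.

```haskell
import Data.List (isPrefixOf)

data Tree = Node String [Tree] deriving (Eq, Show)
type Pos = [Int]   -- zero-based child indices from the root; the root is []

-- The subtree at a position, if that position exists.
subtreeAt :: Pos -> Tree -> Maybe Tree
subtreeAt [] t = Just t
subtreeAt (i : rest) (Node _ ks)
  | i >= 0 && i < length ks = subtreeAt rest (ks !! i)
  | otherwise               = Nothing

-- Forward slice: each criterion node together with all of its descendants.
forward :: [Pos] -> Tree -> [Tree]
forward crit t = [s | p <- crit, Just s <- [subtreeAt p t]]

-- Backward-forward slice: keep a node if it is an ancestor of a criterion
-- node, the criterion node itself, or one of its descendants.
backwardForward :: [Pos] -> Tree -> Maybe Tree
backwardForward crit = go []
  where
    related p c = p `isPrefixOf` c || c `isPrefixOf` p
    go p (Node l ks)
      | any (related p) crit =
          Just (Node l [k' | (i, k) <- zip [0 ..] ks, Just k' <- [go (p ++ [i]) k]])
      | otherwise = Nothing

example :: Tree
example =
  Node "PersonalInfo"
    [ Node "Contact"  [Node "Name" [Node "Ryan" []], Node "Surname" [Node "Gibson" []]]
    , Node "Teaching" [ Node "Subject" [Node "Name" [Node "Logic" []]]
                      , Node "Subject" [Node "Name" [Node "Algebra" []]] ]
    ]
```

`forward [[1,0]] example` returns the first `Subject` subtree on its own, whereas `backwardForward [[1,0]] example` returns the same subtree still attached to `Teaching` and `PersonalInfo`, so the result remains a single document.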
18
Slicing XML Documents
• DTD backward slicing criterion
[Figure: the example DTD next to its tree, rooted at PersonalInfo with children Contact (Status, Name, Surname), Teaching (Subject: Name, Sched, Course) and Research (Project: name, year, budget), together with the original and sliced web pages]
19
Slicing XML Documents
• XML forward slicing criterion
[Figure: the example XML document and its tree, together with the original and sliced web pages]
20
Slicing XML Documents
• XML backward-forward slicing criterion
[Figure: the example XML document and its tree, together with the original and sliced web pages]
21
Slicing XML Documents
• What happens with DTDs? Slices are well-formed, but are they valid?
• For each XML slice we produce a DTD slice, and vice versa.
• We guarantee that XML slices are valid with respect to DTD slices.
[Figure: the slicer takes a DTD document, an XML document and a slicing criterion, and produces a DTD slice document and an XML slice document]
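To see why a DTD slice is needed, the sketch below checks validity against a simplified DTD: content models are either `ANY` or a sequence of element names with multiplicities, and attributes are ignored. Removing `Status` and `Surname` from `Contact` makes the slice invalid against the original DTD but valid against a reduced one. All names are hypothetical, and `dtdSlice` is written by hand; the slides do not show how the slicer itself builds the DTD slice.

```haskell
-- Element and text nodes; attributes are ignored in this sketch.
data Tree = Elem String [Tree] | Text String deriving (Eq, Show)

-- Multiplicity of a child in a sequence content model: x, x?, x* or x+.
data Mult = One | Opt | Star | Plus deriving (Eq, Show)

-- Simplified content models: ANY, or a sequence such as (Name, Sched, Course).
data Model = Any | Seq [(String, Mult)] deriving (Eq, Show)

type DTD = [(String, Model)]

-- Match child element names against a sequence model, consuming greedily.
-- Assumes deterministic content models, which XML 1.0 requires.
matchSeq :: [(String, Mult)] -> [String] -> Bool
matchSeq [] names = null names
matchSeq ((n, m) : rest) names =
  case m of
    One  -> case names of
              x : xs | x == n -> matchSeq rest xs
              _               -> False
    Opt  -> case names of
              x : xs | x == n -> matchSeq rest xs
              _               -> matchSeq rest names
    Star -> matchSeq rest others
    Plus -> not (null same) && matchSeq rest others
  where
    (same, others) = span (== n) names

-- An element is valid if it is declared, its children match its content
-- model, and every child is itself valid.
valid :: DTD -> Tree -> Bool
valid _ (Text _) = True
valid dtd (Elem n ks) =
  case lookup n dtd of
    Nothing      -> False
    Just Any     -> all (valid dtd) ks
    Just (Seq s) -> all isElem ks
                    && matchSeq s [c | Elem c _ <- ks]
                    && all (valid dtd) ks
  where
    isElem (Elem _ _) = True
    isElem (Text _)   = False

-- Fragment of the example DTD: <!ELEMENT Contact (Status, Name, Surname)>.
dtd :: DTD
dtd =
  [ ("Contact", Seq [("Status", One), ("Name", One), ("Surname", One)])
  , ("Status", Any), ("Name", Any), ("Surname", Any) ]

-- Hypothetical, hand-written DTD slice for an XML slice that keeps only Name.
dtdSlice :: DTD
dtdSlice = [("Contact", Seq [("Name", One)]), ("Name", Any)]

contact, contactSlice :: Tree
contact =
  Elem "Contact"
    [Elem "Status" [Text "Professor"], Elem "Name" [Text "Ryan"], Elem "Surname" [Text "Gibson"]]
contactSlice = Elem "Contact" [Elem "Name" [Text "Ryan"]]
```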
22
Slicing XML Documents
• A simple slicing algorithm
23
Slicing XML Documents
• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq Pos(D)$, the algorithm is the same except that the first loop becomes:
For each $v_1.v_2.\ldots.v_n \in C$ do
  $V' = V' \cup \{v_1,\ v_1.v_2,\ \ldots,\ v_1.v_2.\ldots.v_n\}$
  $W' = W' \cup \{v_1|_i.v_2|_j.\ldots.v_n|_k\}$, where $v_1.v_2.\ldots.v_n \in V'$ and $v_1|_i.v_2|_j.\ldots.v_n|_k \in X$
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion.
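The modified loop can be sketched as a function from DTD positions to XML positions. We read v1.v2…vn as a path of element types and v1|i.v2|j…vn|k as the same path annotated with the index of each child taken; both readings, the 1-based indices and all names below are assumptions. The sketch shows why DTD criteria are coarser than XML criteria: a single DTD position selects every matching instance, here both `Subject` elements.

```haskell
import Data.List (inits, nub)

data Tree = Elem String [Tree] | Text String deriving (Eq, Show)

-- A DTD position: a path of element types from the root.
type DtdPos = [String]

-- An XML position: at each step, the element type and the 1-based index of
-- the child taken (our reading of the v|i notation).
type XmlPos = [(String, Int)]

-- V': every non-empty prefix of every criterion position.
closure :: [DtdPos] -> [DtdPos]
closure crit = nub [p | c <- crit, p <- drop 1 (inits c)]

-- Every element node, in document order, with its type path and XML position.
elementPositions :: Tree -> [(DtdPos, XmlPos)]
elementPositions = go [] [] 1
  where
    go _ _ _ (Text _) = []
    go tys pos i (Elem n ks) =
      let tys' = tys ++ [n]
          pos' = pos ++ [(n, i)]
      in (tys', pos') : concat [go tys' pos' j k | (j, k) <- zip [1 ..] ks]

-- W': the XML positions of all element nodes whose type path belongs to V'.
fromDtdCriterion :: [DtdPos] -> Tree -> [XmlPos]
fromDtdCriterion crit t = [pos | (tys, pos) <- elementPositions t, tys `elem` kept]
  where
    kept = closure crit

example :: Tree
example =
  Elem "PersonalInfo"
    [ Elem "Contact" [Elem "Name" [Text "Ryan"]]
    , Elem "Teaching" [ Elem "Subject" [Elem "Name" [Text "Logic"]]
                      , Elem "Subject" [Elem "Name" [Text "Algebra"]] ]
    ]
```

With the criterion `[["PersonalInfo", "Teaching", "Subject"]]`, the result contains the root, `Teaching` and both `Subject` elements, while `Contact` and every `Name` element are left out.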
24
Slicing XML Documents
The following theorem states the correctness of the technique.
Theorem. Let D be a well-formed DTD and X a well-formed XML document valid with respect to D. Let D′ and X′ be the slices of D and X computed with an XML slicing criterion C, and let D″ and X″ be the slices of D and X computed with a DTD slicing criterion C′. Then:
a) D′ is well-formed and X′ is valid with respect to D′.
b) D″ is well-formed and X″ is valid with respect to D″.
If every element in C has one of the types in C′, then:
c) D′ = D″.
d) X′ is a subtree of X″.
8
XML
bull ObjectiveObjective The purpose of a DTD is to define the legal building blocks of an XML document It defines the document structure with a list of legal elements
bull StructureStructure Documents are graphs composed by lsquoELEMENTSrsquo
Example of DTD document
DTD DTD ((DDocument ocument TType ype DDefinition)efinition)
9
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status Name Surname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
DTD DTD ((DDocument ocument TType ype DDefinition)efinition)XML XML (e(eXXtensible tensible MMarkup arkup LLanguage)anguage)
10
XML
bull ObjectiveObjective XSLT is a language for transforming XML
bull StructureStructure An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary such as (X)HTML or XSL-FO
bull XSLT is a programming language
Example of XSLT document(Source Code)
XSLT XSLT (e(eXXtensible tensible SStylesheet tylesheet LLanguage anguage TTransformations)ransformations)
Example of XSLT document(Result)
11
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Slicing XML Documents
12
Slicing XML Documentsbull We see XML documents and DTDs as trees
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
13
Slicing XML Documents
bull The Slicing Criterion is composed by a set of nodes in the tree
bull For each node in the slicing criterion we extract from the tree all those nodes that are in the path from the root to the node
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
14
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
15
Slicing XML Documentsbull XML backward slicing criterion
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Web Page(Original)
Web Page(Slice)
16
Slicing XML Documentsbull XML backward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
9
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status Name Surname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
DTD DTD ((DDocument ocument TType ype DDefinition)efinition)XML XML (e(eXXtensible tensible MMarkup arkup LLanguage)anguage)
10
XML
bull ObjectiveObjective XSLT is a language for transforming XML
bull StructureStructure An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary such as (X)HTML or XSL-FO
bull XSLT is a programming language
Example of XSLT document(Source Code)
XSLT XSLT (e(eXXtensible tensible SStylesheet tylesheet LLanguage anguage TTransformations)ransformations)
Example of XSLT document(Result)
11
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Slicing XML Documents
12
Slicing XML Documents
• We see XML documents and DTDs as trees
[Figure: the example XML document next to its tree representation]
PersonalInfo
  Contact: Professor, Ryan, Gibson
  Teaching
    Subject: Logic, Mon/Wed 16-18, 4-Mathematics
    Subject: Algebra, Mon/Tur 11-13, 3-Mathematics
    …
  Research
    Project: SysLog, 2003-2004, 16000€
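To make the tree view concrete, here is a minimal Haskell sketch that encodes the running example and enumerates its positions. It assumes the `Data.Tree` rose tree from the `containers` package and a Dewey-style encoding in which a position is the list of 0-based child indices from the root; both choices, and the names `personalInfo` and `positions`, are ours for illustration and not necessarily those of the prototype. Text content and the attribute values of `Project` are kept as leaf nodes.

```haskell
import Data.Tree (Tree (..))

-- A position is the list of 0-based child indices from the root ([] = root).
type Pos = [Int]

-- Helpers: a text (or attribute-value) leaf, and an element holding one text.
leaf :: String -> Tree String
leaf s = Node s []

textEl :: String -> String -> Tree String
textEl tag txt = Node tag [leaf txt]

-- The running example as a rose tree; the elided "…" subjects are omitted and
-- the attribute values of Project are flattened into leaves.
personalInfo :: Tree String
personalInfo =
  Node "PersonalInfo"
    [ Node "Contact"
        [textEl "Status" "Professor", textEl "Name" "Ryan", textEl "Surname" "Gibson"]
    , Node "Teaching"
        [ subject "Logic" "Mon/Wed 16-18" "4-Mathematics"
        , subject "Algebra" "Mon/Tur 11-13" "3-Mathematics" ]
    , Node "Research"
        [Node "Project" [leaf "SysLog", leaf "2003-2004", leaf "16000€"]]
    ]
  where
    subject n s c = Node "Subject" [textEl "Name" n, textEl "Sched" s, textEl "Course" c]

-- Every position of a tree in pre-order, paired with the label found there.
positions :: Tree a -> [(Pos, a)]
positions = go []
  where
    go p (Node x ts) = (p, x) : concat [go (p ++ [i]) t | (i, t) <- zip [0 ..] ts]
```

With this encoding, `[1,0,0]` is the `Name` element of the first `Subject` under `Teaching`, and `[1,0,0,0]` is its text `Logic`.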
13
Slicing XML Documents
• The slicing criterion is a set of nodes of the tree.
• For each node in the slicing criterion, we extract from the tree all the nodes on the path from the root to that node.
[Figure: XML and DTD criteria combined with forward and backward slicing, mapping the original web page to the sliced web page]
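The path-based definition above can be sketched in a few lines of Haskell. This is a minimal illustration, assuming `Data.Tree` from `containers` and positions encoded as lists of 0-based child indices; `backwardSlice` is a hypothetical name, not the prototype's function. A node is kept exactly when its position is a prefix of some criterion position.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)
import Data.Tree (Tree (..))

type Pos = [Int]  -- 0-based child indices from the root

-- Backward slice: keep exactly the nodes on a path from the root to some
-- criterion node, i.e. the positions that are prefixes of a criterion
-- position. Children of a criterion node are dropped unless they lie on the
-- path to another criterion node. Nothing means the slice is empty.
backwardSlice :: [Pos] -> Tree a -> Maybe (Tree a)
backwardSlice crit = go []
  where
    onPath p = any (p `isPrefixOf`) crit
    go p (Node x ts)
      | onPath p  = Just (Node x (mapMaybe (\(i, t) -> go (p ++ [i]) t) (zip [0 ..] ts)))
      | otherwise = Nothing
```

Because only the path is kept, a criterion on an element such as `Name` loses its text child; including the text node's position in the criterion keeps it.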
14
Slicing XML Documents
• DTD backward slicing criterion
[Figure: the example DTD and its tree (PersonalInfo with children Contact, Teaching and Research; Contact with Status, Name and Surname; Teaching with Subject, which has Name, Sched and Course; Research with Project, whose attributes are name, year and budget), shown before and after slicing, next to the original and the sliced web page]
15
Slicing XML Documents
• XML backward slicing criterion
[Figure: the example XML document and its tree, before and after slicing, next to the original and the sliced web page]
16
Slicing XML Documents
• XML backward slicing criterion
[Figure: the example XML document and its tree, before and after slicing, next to the original and the sliced web page]
17
Slicing XML Documents
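• We distinguish between DTD and XML slicing criteria.
  • XML slicing criteria are more fine-grained than DTD slicing criteria: an XML criterion selects individual nodes of the document, whereas a DTD criterion selects element types, and therefore every instance of them.
• We distinguish between forward and backward slices (or a combination of both).
[Figure: XML and DTD criteria combined with forward and backward slicing, mapping the original web page to the sliced web page]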
18
Slicing XML Documents
• DTD backward slicing criterion
[Figure: the example DTD and its tree (PersonalInfo with children Contact, Teaching and Research; Contact with Status, Name and Surname; Teaching with Subject, which has Name, Sched and Course; Research with Project, whose attributes are name, year and budget), shown before and after slicing, next to the original and the sliced web page]
19
Slicing XML Documents
• XML forward slicing criterion
[Figure: the example XML document and its tree, before and after slicing, next to the original and the sliced web page]
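The slides do not spell out the forward slice in text. By analogy with program slicing, one natural reading is "everything that depends on the criterion node", i.e. the node and all of its descendants. The Haskell sketch below implements that reading only; it assumes `Data.Tree` from `containers` and 0-based child-index positions, and `subtreeAt` and `forwardSlice` are hypothetical names.

```haskell
import Data.List (isPrefixOf, nub)
import Data.Tree (Tree (..))

type Pos = [Int]  -- 0-based child indices from the root

-- Subtree rooted at a position, if that position exists.
subtreeAt :: Pos -> Tree a -> Maybe (Tree a)
subtreeAt [] t = Just t
subtreeAt (i : is) (Node _ ts)
  | i >= 0 && i < length ts = subtreeAt is (ts !! i)
  | otherwise               = Nothing

-- Forward slice (our reading): each criterion node together with all of its
-- descendants. A criterion node inside another one's subtree is skipped so
-- that no node is returned twice; the result is a forest, not a document.
forwardSlice :: [Pos] -> Tree a -> [Tree a]
forwardSlice crit t =
  [ s | c <- cs
      , not (any (\d -> d /= c && d `isPrefixOf` c) cs)
      , Just s <- [subtreeAt c t] ]
  where
    cs = nub crit
```

On its own this yields detached subtrees; turning them back into a single document requires keeping the root-to-node paths as well.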
20
Slicing XML Documents
• XML backward-forward slicing criterion
[Figure: the example XML document and its tree, before and after slicing, next to the original and the sliced web page]
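One plausible reading of a backward-forward slice is the union of the two: the root-to-node paths (backward) plus the subtrees under the criterion nodes (forward), which again forms a single rooted document. The Haskell sketch below implements that reading under the same assumptions as before (`Data.Tree` from `containers`, 0-based child-index positions); `backwardForwardSlice` is a hypothetical name, not the prototype's.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)
import Data.Tree (Tree (..))

type Pos = [Int]  -- 0-based child indices from the root

-- Backward-forward slice (our reading): keep a node if it is on a path from
-- the root to a criterion node, or if it lies in a criterion node's subtree.
backwardForwardSlice :: [Pos] -> Tree a -> Maybe (Tree a)
backwardForwardSlice crit = go []
  where
    inForward  p = any (`isPrefixOf` p) crit  -- p is a criterion node or below one
    inBackward p = any (p `isPrefixOf`) crit  -- p is a criterion node or above one
    go p t@(Node x ts)
      | inForward p  = Just t  -- the whole subtree is kept
      | inBackward p = Just (Node x (mapMaybe (\(i, s) -> go (p ++ [i]) s) (zip [0 ..] ts)))
      | otherwise    = Nothing
```

For example, with the criterion `[[1,0]]` on the running example, this keeps `PersonalInfo`, `Teaching` and the whole first `Subject` (name, schedule and course), and drops everything else.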
21
Slicing XML Documents
• What happens with DTDs? Slices are well-formed, but are they valid?
• For each XML slice we produce a DTD slice, and vice versa.
• We guarantee that XML slices are valid with respect to DTD slices.
[Diagram: the Slicer takes a DTD document, an XML document and a slicing criterion, and produces a DTD slice document and an XML slice document]
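Why the question matters can be seen with a small validity checker. The Haskell sketch below assumes a deliberately simplified model of DTD content models (ANY, or a flat sequence of names with `?`, `*` or `+`; choices and nested groups are not handled), and `Occ`, `Model`, `matchSeq`, `valid` and `exampleDTD` are hypothetical names. It shows that a path-only slice of the example is well-formed but not valid with respect to the original DTD, because `PersonalInfo` requires `Contact`, `Teaching` and `Research`.

```haskell
import qualified Data.Map as Map
import Data.Tree (Tree (..))

-- Occurrence indicator of a particle: a, a?, a*, a+.
data Occ = One | Opt | Star | Plus deriving (Eq, Show)

-- Simplified content models: ANY, or a sequence such as (Name, Sched, Course).
data Model = Any | Seq [(String, Occ)] deriving (Eq, Show)

type DTD = Map.Map String Model

-- Do the children's element names match a sequence model? For each particle,
-- every allowed number of repetitions of its name is tried.
matchSeq :: [(String, Occ)] -> [String] -> Bool
matchSeq [] ns = null ns
matchSeq ((e, occ) : ps) ns = case occ of
  One  -> takeBetween 1 1
  Opt  -> takeBetween 0 1
  Star -> takeBetween 0 run
  Plus -> takeBetween 1 run
  where
    run = length (takeWhile (== e) ns)
    takeBetween lo hi = or [matchSeq ps (drop k ns) | k <- [lo .. min hi run]]

-- An element is valid if its children match its declared model and are valid
-- in turn. Undeclared labels are treated as text; ANY content is not inspected.
valid :: DTD -> Tree String -> Bool
valid dtd (Node x ts) = case Map.lookup x dtd of
  Just (Seq ps) -> matchSeq ps (map rootLabel ts) && all (valid dtd) ts
  _             -> True

-- The example DTD in this simplified form.
exampleDTD :: DTD
exampleDTD = Map.fromList
  [ ("PersonalInfo", Seq [("Contact", One), ("Teaching", One), ("Research", One)])
  , ("Contact",      Seq [("Status", One), ("Name", One), ("Surname", One)])
  , ("Teaching",     Seq [("Subject", Plus)])
  , ("Subject",      Seq [("Name", One), ("Sched", One), ("Course", One)])
  , ("Research",     Seq [("Project", One)])
  , ("Status", Any), ("Name", Any), ("Surname", Any)
  , ("Sched", Any), ("Course", Any), ("Project", Any) ]
```

This is why each XML slice is paired with a DTD slice: the slice is checked against a correspondingly reduced DTD rather than the original one.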
22
Slicing XML Documents
• A simple slicing algorithm
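The algorithm itself appeared as an image on the slide. Judging from the DTD variant described on the next slide, its first loop for an XML criterion collects a set W' of XML positions (every prefix of each criterion position) and a set V' of the DTD positions those nodes are instances of. The Haskell sketch below is a hypothetical reconstruction of that loop only, not the paper's algorithm; it assumes `Data.Tree` and `Data.Set` from `containers`, and identifies a DTD position with the path of element names from the root, which is unambiguous for the example DTD.

```haskell
import Data.List (inits)
import Data.Maybe (isJust)
import qualified Data.Set as Set
import Data.Tree (Tree (..))

type Pos      = [Int]     -- XML position: 0-based child indices from the root
type TypePath = [String]  -- DTD position, identified here by element names from the root

-- Element names from the root down to an XML position (inclusive);
-- Nothing if the position does not exist.
typePath :: Tree String -> Pos -> Maybe TypePath
typePath (Node x _) [] = Just [x]
typePath (Node x ts) (i : is)
  | i >= 0 && i < length ts = (x :) <$> typePath (ts !! i) is
  | otherwise               = Nothing

-- First loop for an XML criterion (hypothetical): W' gets every prefix of each
-- existing criterion position, V' the DTD positions of those XML nodes.
firstLoopXML :: Tree String -> [Pos] -> (Set.Set Pos, Set.Set TypePath)
firstLoopXML x crit = (w', v')
  where
    w' = Set.fromList [p | c <- crit, isJust (typePath x c), p <- inits c]
    v' = Set.fromList [tp | p <- Set.toList w', Just tp <- [typePath x p]]
```

Criterion positions that do not exist in the document are ignored rather than reported as errors.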
23
Slicing XML Documents
• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq Pos(D)$, the algorithm would be the same, except that the first loop would be:
For each $v_1.v_2.(\ldots).v_n \in C$ do
  $V' = V' \cup \{v_1,\ v_1.v_2,\ \ldots,\ v_1.v_2.(\ldots).v_n\}$
  $W' = W' \cup \{v_1|_i.v_2|_j.(\ldots).v_n|_k\}$, where $v_1.v_2.(\ldots).v_n \in V'$ and $v_1|_i.v_2|_j.(\ldots).v_n|_k \in X$
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
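The DTD variant of the first loop can be sketched in Haskell as follows, reading the `where` clause as "every XML node that is an instance of a DTD position in V'". That reading, the use of element-name paths as DTD positions, and the names `typedPositions` and `firstLoopDTD` are assumptions for illustration (with `Data.Tree` and `Data.Set` from `containers`). The example also shows why DTD criteria are coarser: one DTD position selects every instance of that element type.

```haskell
import Data.List (inits)
import qualified Data.Set as Set
import Data.Tree (Tree (..))

type Pos      = [Int]     -- XML position: 0-based child indices from the root
type TypePath = [String]  -- DTD position, identified here by element names from the root

-- All XML positions paired with the element-name path that leads to them.
typedPositions :: Tree String -> [(Pos, TypePath)]
typedPositions = go [] []
  where
    go p tp (Node x ts) =
      let tp' = tp ++ [x]
      in (p, tp') : concat [go (p ++ [i]) tp' t | (i, t) <- zip [0 ..] ts]

-- First loop for a DTD criterion: V' gets every non-empty prefix of each
-- criterion path; W' gets every XML node that is an instance of a path in V'.
firstLoopDTD :: Tree String -> [TypePath] -> (Set.Set TypePath, Set.Set Pos)
firstLoopDTD x crit = (v', w')
  where
    v' = Set.fromList [q | c <- crit, q <- drop 1 (inits c)]
    w' = Set.fromList [p | (p, tp) <- typedPositions x, tp `Set.member` v']
```

With a document `P` whose child `B` has two `C` children, the DTD criterion `P.B.C` selects both `C` nodes, while an XML criterion could pick just one of them.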
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem. Let D be a well-formed DTD and X a well-formed XML document valid with respect to D. Given a slice D' of D and a slice X' of X computed with an XML slicing criterion C, and given a slice D'' of D and a slice X'' of X computed with a DTD slicing criterion C', then:
a) D' is well-formed and X' is valid with respect to D';
b) D'' is well-formed and X'' is valid with respect to D''.
If all the elements in C are of one of the types in C', then:
c) D' = D'';
d) X' is a subtree of X''.
25
Implementation
26
Implementation
We have implemented a prototype in Haskell.
Haskell provides a formal basis with many advantages for manipulating XML documents:
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
-- Name and Value are String synonyms
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element | CText String
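As a small illustration of how these types represent a document, the sketch below mirrors them in self-contained Haskell and serialises an element back to XML text. The declarations follow the slide's simplified view (the real HaXml types differ in detail), and `escape`, `render` and `project` are hypothetical helpers, not part of HaXml.

```haskell
-- Simplified mirror of the types on the slide (illustration only).
type Name  = String
type Value = String

data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element | CText String

-- Escape the characters that cannot appear verbatim in text or attribute values.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Serialise an element back to XML text (no indentation); empty elements
-- use the self-closing form.
render :: Element -> String
render (Elem n as []) = "<" ++ n ++ attrs as ++ "/>"
render (Elem n as cs) = "<" ++ n ++ attrs as ++ ">" ++ concatMap content cs ++ "</" ++ n ++ ">"

attrs :: [Attribute] -> String
attrs = concatMap (\(k, v) -> " " ++ k ++ "=\"" ++ escape v ++ "\"")

content :: Content -> String
content (CElem e) = render e
content (CText s) = escape s

-- Part of the running example.
project :: Element
project = Elem "Project" [("name", "SysLog"), ("year", "2003-2004")] []
```

For instance, `render project` gives `<Project name="SysLog" year="2003-2004"/>`.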
27
Implementation
From XML slices to web page slices
[Diagram: XML (data) + XSLT (presentation) → web page, shown for the original document and for an XML slice]
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).
Both the XML data and the presentation labels are generated together.
This does not impose any restriction on the power of XSLT, since the same web pages can still be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and the presentation data that depend on the same condition are kept together.
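The guideline can be illustrated outside XSLT with a small Haskell analogue: each piece of presentation markup is emitted in the same branch that emits the data it decorates, so rendering a sliced data tree automatically yields the sliced page. This is a hypothetical sketch, not an XSLT stylesheet and not the prototype's code; it assumes data trees in `Data.Tree` form (element names with text leaves), and `page`, `row` and `cell` are illustrative names.

```haskell
import Data.Tree (Tree (..))

-- A hypothetical "stylesheet" following the guideline: markup is produced only
-- in the branch that produces the corresponding data, never unconditionally.
page :: Tree String -> String
page (Node "Teaching" subjects) = "<table>" ++ concatMap row subjects ++ "</table>"
page (Node _ cs)                = concatMap page cs

-- A table row exists if and only if a Subject node exists.
row :: Tree String -> String
row (Node "Subject" fields) = "<tr>" ++ concatMap cell fields ++ "</tr>"
row _                       = ""

-- A cell exists if and only if the field carries a text value.
cell :: Tree String -> String
cell (Node _ [Node txt []]) = "<td>" ++ txt ++ "</td>"
cell _                      = ""
```

Rendering a slice that keeps only the `Logic` subject produces a table with a single row; a stylesheet that printed, say, an empty header row unconditionally would leave presentation debris in the sliced page.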
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation, some examples and other material are publicly available at:
www.dsic.upv.es/~jsilva/xml
31
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
The XML and DTD slices produced are well-formed and valid. Previous slicers can be used with a modest implementation effort.
Slicing Web Pages: The slicer can use XSLT in order to slice web pages. We proposed some guidelines for writing XSLT files.
Future Work: Migration to XML Schema; a new implementation based on XQuery.
10
XML
bull ObjectiveObjective XSLT is a language for transforming XML
bull StructureStructure An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary such as (X)HTML or XSL-FO
bull XSLT is a programming language
Example of XSLT document(Source Code)
XSLT XSLT (e(eXXtensible tensible SStylesheet tylesheet LLanguage anguage TTransformations)ransformations)
Example of XSLT document(Result)
11
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Slicing XML Documents
12
Slicing XML Documentsbull We see XML documents and DTDs as trees
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
13
Slicing XML Documents
bull The Slicing Criterion is composed by a set of nodes in the tree
bull For each node in the slicing criterion we extract from the tree all those nodes that are in the path from the root to the node
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
14
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
15
Slicing XML Documentsbull XML backward slicing criterion
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Web Page(Original)
Web Page(Slice)
16
Slicing XML Documentsbull XML backward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
11
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Slicing XML Documents
12
Slicing XML Documentsbull We see XML documents and DTDs as trees
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
13
Slicing XML Documents
bull The Slicing Criterion is composed by a set of nodes in the tree
bull For each node in the slicing criterion we extract from the tree all those nodes that are in the path from the root to the node
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
14
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
15
Slicing XML Documentsbull XML backward slicing criterion
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Web Page(Original)
Web Page(Slice)
16
Slicing XML Documentsbull XML backward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
Slicing XML Documents
• XML forward slicing criterion
Figure: the example XML document and its tree (PersonalInfo → Contact, Teaching, Research), next to Web Page (Original) and Web Page (Slice)
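For the forward direction, here is a minimal sketch under a simplifying assumption of ours: the forward slice for a criterion node is the subtree rooted at that node (the node plus all its descendants, text included), and a set of criteria yields one such subtree per position. The `Node` encoding, `subtreeAt`, `forwardSlice` and the cut-down `example` are hypothetical.

```haskell
-- Element and text nodes of an XML document (hypothetical encoding).
data Node = Elem String [Node] | Text String deriving (Show, Eq)

-- The subtree rooted at a position (0-based indices into child lists).
subtreeAt :: [Int] -> Node -> Maybe Node
subtreeAt [] n = Just n
subtreeAt (i:is) (Elem _ cs)
  | i >= 0 && i < length cs = subtreeAt is (cs !! i)
subtreeAt _ _ = Nothing

-- Forward slice for a set of criterion positions: one subtree per valid position.
forwardSlice :: [[Int]] -> Node -> [Node]
forwardSlice ps doc = [ t | p <- ps, Just t <- [subtreeAt p doc] ]

-- A cut-down version of the example document.
example :: Node
example =
  Elem "PersonalInfo"
    [ Elem "Contact" [Elem "Name" [Text "Ryan"], Elem "Surname" [Text "Gibson"]]
    , Elem "Teaching"
        [ Elem "Subject" [Elem "Name" [Text "Logic"], Elem "Course" [Text "4-Mathematics"]]
        , Elem "Subject" [Elem "Name" [Text "Algebra"], Elem "Course" [Text "3-Mathematics"]]
        ]
    ]
```

Positions that do not exist in the document are silently skipped, so `forwardSlice [[0,0],[9]] example` returns just the `Name` element of the contact.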
20
Slicing XML Documents
• XML backward-forward slicing criterion
Figure: the example XML document and its tree (PersonalInfo → Contact, Teaching, Research), next to Web Page (Original) and Web Page (Slice)
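Combining both directions over a set of positions yields a single document again. In this sketch (hypothetical encoding and names; the merge rule is our assumption), a node is kept whole when it is at or below a criterion position, kept with its children filtered when it is a proper ancestor of one, and dropped otherwise.

```haskell
import Data.List (isPrefixOf)

data Node = Elem String [Node] | Text String deriving (Show, Eq)

-- Backward-forward slice for a set of criterion positions (0-based child indices).
-- p is the position of the node currently visited.
backwardForward :: [[Int]] -> Node -> Maybe Node
backwardForward crit = go []
  where
    go p n
      -- p is a criterion position or lies below one: keep the whole subtree (forward part).
      | any (`isPrefixOf` p) crit = Just n
      -- p is a proper ancestor of a criterion position: keep the node, filter its children (backward part).
      | any (p `isPrefixOf`) crit = case n of
          Elem l cs -> Just (Elem l [ c' | (i, c) <- zip [0 ..] cs, Just c' <- [go (p ++ [i]) c] ])
          Text _ -> Nothing
      | otherwise = Nothing

-- A cut-down version of the example document.
example :: Node
example =
  Elem "PersonalInfo"
    [ Elem "Contact" [Elem "Name" [Text "Ryan"], Elem "Surname" [Text "Gibson"]]
    , Elem "Teaching"
        [ Elem "Subject" [Elem "Name" [Text "Logic"], Elem "Course" [Text "4-Mathematics"]]
        , Elem "Subject" [Elem "Name" [Text "Algebra"], Elem "Course" [Text "3-Mathematics"]]
        ]
    ]
```

For example, `backwardForward [[0,1],[1,1,0]] example` keeps the whole `Surname` element and the `Name` of the second subject, together with all their ancestors; an empty criterion yields `Nothing`.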
21
Slicing XML Documents
• What happens with DTDs? Slices are well-formed, but are they valid?
• For each XML slice we produce a DTD slice, and vice versa
• We guarantee that XML slices are valid with respect to DTD slices
Figure: DTD document, XML document and Slicing Criterion → Slicer → DTD Slice document and XML Slice document
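Why a DTD slice is needed: a node removed by slicing may be required by its parent's content model, so the XML slice is well-formed but no longer valid against the original DTD. The sketch below assumes content models are either `ANY` or a fixed sequence of child elements, and derives a DTD slice by dropping from each sequence the element types that never occur under that element in the XML slice. This derivation rule and the names `Model`, `valid` and `restrict` are our own simplification, not necessarily the authors' algorithm; in particular it assumes all instances of an element keep the same children.

```haskell
data Node = Elem String [Node] | Text String deriving (Show, Eq)

-- Simplified content models: ANY, or an exact sequence of child element names.
data Model = Any | Seq [String] deriving (Show, Eq)

type DTD = [(String, Model)]

-- Names of the element children of a node (text children are ignored).
childNames :: Node -> [String]
childNames (Elem _ cs) = [ n | Elem n _ <- cs ]
childNames (Text _) = []

-- Valid if every element with a Seq model has exactly that sequence of children.
valid :: DTD -> Node -> Bool
valid _ (Text _) = True
valid dtd e@(Elem n cs) = local && all (valid dtd) cs
  where
    local = case lookup n dtd of
      Just (Seq names) -> childNames e == names
      _ -> True -- ANY, or not declared in this cut-down DTD

-- All element nodes of a document, in document order.
elements :: Node -> [Node]
elements e@(Elem _ cs) = e : concatMap elements cs
elements (Text _) = []

-- Derive a DTD slice: in each sequence keep only the child types that occur
-- under that element somewhere in the XML slice.
restrict :: DTD -> Node -> DTD
restrict dtd xSlice = [ (n, shrink n m) | (n, m) <- dtd ]
  where
    used n = concat [ childNames e | e@(Elem n' _) <- elements xSlice, n' == n ]
    shrink n (Seq names) = Seq (filter (`elem` used n) names)
    shrink _ Any = Any

contactDTD :: DTD
contactDTD =
  [ ("Contact", Seq ["Status", "Name", "Surname"])
  , ("Status", Any), ("Name", Any), ("Surname", Any) ]

-- An XML slice that keeps only the contact's name.
contactSlice :: Node
contactSlice = Elem "Contact" [Elem "Name" [Text "Ryan"]]
```

`contactSlice` is rejected by `contactDTD`, which requires Status, Name and Surname, but accepted by `restrict contactDTD contactSlice`, whose model for Contact is reduced to `Seq ["Name"]`.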
22
Slicing XML Documents
• A simple slicing algorithm
23
Slicing XML Documents
• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq \mathit{Pos}(D)$, the algorithm would be the same, except that the first loop would be:
For each $v_1.v_2.\ldots.v_n \in C$ do
    $V' = V' \cup \{v_1,\ v_1.v_2,\ \ldots,\ v_1.v_2.\ldots.v_n\}$
    $W' = W' \cup \{v_1|_i.v_2|_j.\ldots.v_n|_k\}$, where $v_1.v_2.\ldots.v_n \in V'$ and $v_1|_i.v_2|_j.\ldots.v_n|_k \in X$
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion.
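The loop can be prototyped directly. This sketch rests on our reading of the notation, which is an assumption: a DTD position is a path of element types, an XML position annotates each step with the index of the child taken (our reading of v|i), and W′ receives every XML position whose type path belongs to V′. `DTDPos`, `XMLPos`, `xmlPositions`, `firstLoop` and `example` are hypothetical names.

```haskell
import Data.List (inits, nub)

data Node = Elem String [Node] | Text String deriving (Show, Eq)

-- A DTD position: a path of element types, e.g. ["PersonalInfo","Teaching","Subject"].
type DTDPos = [String]

-- An XML position: each step carries the element type and the child index taken.
type XMLPos = [(String, Int)]

-- Every element position in the document, in document order (the root gets index 0).
xmlPositions :: Node -> [XMLPos]
xmlPositions = go [] 0
  where
    go path i (Elem l cs) =
      let here = path ++ [(l, i)]
      in here : concat [ go here j c | (j, c) <- zip [0 ..] cs ]
    go _ _ (Text _) = []

-- First loop for a DTD criterion c: V' collects every non-empty prefix of each
-- criterion path; W' collects every XML position whose type path lies in V'.
firstLoop :: [DTDPos] -> Node -> ([DTDPos], [XMLPos])
firstLoop c x = (v', w')
  where
    v' = nub [ pre | p <- c, pre <- drop 1 (inits p) ]
    w' = [ w | w <- xmlPositions x, map fst w `elem` v' ]

-- A cut-down version of the example document.
example :: Node
example =
  Elem "PersonalInfo"
    [ Elem "Contact" [Elem "Name" [Text "Ryan"], Elem "Surname" [Text "Gibson"]]
    , Elem "Teaching"
        [ Elem "Subject" [Elem "Name" [Text "Logic"], Elem "Course" [Text "4-Mathematics"]]
        , Elem "Subject" [Elem "Name" [Text "Algebra"], Elem "Course" [Text "3-Mathematics"]]
        ]
    ]
```

With the criterion `PersonalInfo.Teaching.Subject`, V′ receives the three prefixes of that path, and W′ receives the root, the Teaching element and both Subject elements.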
24
Slicing XML Documents
The following theorem states the correctness of the technique.
Theorem. Let D be a well-formed DTD and X a well-formed XML document that is valid with respect to D. Let D′ and X′ be the slices of D and X computed with an XML slicing criterion C, and let D″ and X″ be the slices of D and X computed with a DTD slicing criterion C′. Then:
a) D′ is well-formed and X′ is valid with respect to D′;
b) D″ is well-formed and X″ is valid with respect to D″.
If every element in C is of one of the types in C′, then:
c) D′ = D″;
d) X′ is a subtree of X″.
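Property d) reads naturally as "X′ can be obtained from X″ by deleting nodes, keeping the root and the order of siblings"; this reading of "subtree" is our assumption. A minimal checker for that relation, over a hypothetical tree encoding with hypothetical example slices, is:

```haskell
data Node = Elem String [Node] | Text String deriving (Show, Eq)

-- x `prunedFrom` y holds when x has the same root as y and x's children can be
-- matched, in order, with some of y's children, each one recursively pruned
-- from the child it is matched with.
prunedFrom :: Node -> Node -> Bool
prunedFrom (Text a) (Text b) = a == b
prunedFrom (Elem a xs) (Elem b ys) = a == b && embed xs ys
  where
    -- Greedy left-to-right matching suffices: matching a child with the
    -- earliest possible candidate leaves the longest remaining suffix of ys.
    embed [] _ = True
    embed _ [] = False
    embed (c:cs) (d:ds)
      | c `prunedFrom` d = embed cs ds
      | otherwise = embed (c:cs) ds
prunedFrom _ _ = False

-- Hypothetical slices: X'' (DTD criterion on Subject) keeps both subjects,
-- X' (XML criterion on the second subject) keeps only that one.
sliceDTDCrit :: Node
sliceDTDCrit =
  Elem "PersonalInfo"
    [ Elem "Teaching"
        [ Elem "Subject" [Elem "Name" [Text "Logic"]]
        , Elem "Subject" [Elem "Name" [Text "Algebra"]]
        ]
    ]

sliceXMLCrit :: Node
sliceXMLCrit =
  Elem "PersonalInfo" [Elem "Teaching" [Elem "Subject" [Elem "Name" [Text "Algebra"]]]]
```

`sliceXMLCrit `prunedFrom` sliceDTDCrit` holds, while the converse does not, matching the direction of property d).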
25
Contents
• Motivation
• Program Slicing
• XML
    • DTD
    • XSLT
• Slicing XML Documents
    • Example
• Implementation
• Conclusions & Future Work
Implementation
26
Implementation
We have implemented a prototype in Haskell.
Haskell provides a formal basis with many advantages for the manipulation of XML documents:
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
type Name      = String   -- assumed synonym, added to make the declarations self-contained
type Value     = String   -- assumed synonym for attribute values
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element
               | CText String
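With these types, a slice can be computed by ordinary recursion over `Element`. The sketch below declares local copies of the types above instead of importing HaXml, and computes a backward slice for one criterion position (0-based indices into the content lists) that keeps the attributes of every retained element. `backward` and `example` are hypothetical names.

```haskell
type Name = String
type Value = String

-- Local copies of the data types above, with Attribute as a type synonym.
data Element = Elem Name [Attribute] [Content] deriving (Show, Eq)
type Attribute = (Name, Value)
data Content = CElem Element | CText String deriving (Show, Eq)

-- Backward slice for one criterion position: keep each element on the path
-- (with its attributes) and drop all other content along the way.
backward :: [Int] -> Element -> Maybe Element
backward [] (Elem n attrs _) = Just (Elem n attrs [])
backward (i:is) (Elem n attrs cs)
  | i < 0 = Nothing
  | otherwise = case drop i cs of
      CElem e : _ -> fmap (\e' -> Elem n attrs [CElem e']) (backward is e)
      _ -> Nothing -- out of range, or the position points at text

-- A cut-down version of the example document.
example :: Element
example =
  Elem "PersonalInfo" []
    [ CElem (Elem "Contact" [] [CElem (Elem "Name" [] [CText "Ryan"])])
    , CElem (Elem "Research" []
        [CElem (Elem "Project" [("name", "SysLog"), ("year", "2003-2004")] [])])
    ]
```

Slicing at `[1,0]` keeps PersonalInfo, Research and Project, and the Project element retains its `name` and `year` attributes.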
27
Implementation
From XML slices to web page slices
Figure: XML (Data) + XSLT (Presentation) → WebPage
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).
Both the XML data and the presentation tags are generated together.
This does not impose any restriction on the power of XSLT, since the same web pages can still be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and the presentation data that depend on the same condition are kept together.
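The guideline can be illustrated outside XSLT. In the sketch below a Haskell function plays the role of the stylesheet (an analogy of ours, not the authors' tooling): the markup for each node is produced by the same equation that produces the node's data, so it appears if and only if that data appears. Rendering an XML slice with the same function then yields the corresponding web page slice. `Node`, `render`, `original` and `sliced` are hypothetical names.

```haskell
data Node = Elem String [Node] | Text String deriving (Show, Eq)

-- Guideline-conforming "stylesheet": presentation tags for Teaching and Subject
-- are emitted only in the equations that also emit their data.
render :: Node -> String
render (Text s) = s
render (Elem "Teaching" cs) = "<h2>Teaching</h2><ul>" ++ concatMap render cs ++ "</ul>"
render (Elem "Subject" cs) = "<li>" ++ concatMap render cs ++ "</li>"
render (Elem _ cs) = concatMap render cs

original :: Node
original =
  Elem "PersonalInfo"
    [ Elem "Contact" [Elem "Name" [Text "Ryan"]]
    , Elem "Teaching" [Elem "Subject" [Elem "Name" [Text "Logic"]]]
    ]

-- An XML slice that dropped the Teaching subtree.
sliced :: Node
sliced = Elem "PersonalInfo" [Elem "Contact" [Elem "Name" [Text "Ryan"]]]
```

A stylesheet that wrote the `<h2>Teaching</h2>` heading at the top level, regardless of whether any Teaching data exists, would leave an orphan heading in the page slice; the guideline rules that out.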
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation, some examples, and other material are publicly available at:
www.dsic.upv.es/~jsilva/xml
31
Contents
• Motivation
• Program Slicing
• XML
    • DTD
    • XSLT
• Slicing XML Documents
    • Example
• Implementation
• Conclusions & Future Work
Conclusions & Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures.
We defined an algorithm to slice XML and DTD documents:
The XML and DTD slices it produces are well-formed and valid. Previous slicers can be used with a modest implementation effort.
Slicing Web Pages: the slicer can use XSLT in order to slice web pages. We proposed some guidelines to generate XSLT files.
Future Work: migration to XML Schema; a new implementation based on XQuery.
14
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
15
Slicing XML Documents
• XML backward slicing criterion
<PersonalInfo>
  <Contact>
    <Status>Professor</Status>
    <Name>Ryan</Name>
    <Surname>Gibson</Surname>
  </Contact>
  <Teaching>
    <Subject>
      <Name>Logic</Name>
      <Sched>Mon/Wed 16-18</Sched>
      <Course>4-Mathematics</Course>
    </Subject>
    <Subject>
      <Name>Algebra</Name>
      <Sched>Mon/Tur 11-13</Sched>
      <Course>3-Mathematics</Course>
    </Subject>
    …
  </Teaching>
  <Research>
    <Project name="SysLog" year="2003-2004" budget="16000€"/>
  </Research>
</PersonalInfo>
[Figure: the web page for this document (Professor Ryan Gibson; subjects Logic, Mon/Wed 16-18, 4-Mathematics and Algebra, Mon/Tur 11-13, 3-Mathematics; project SysLog, 2003-2004, 16000 €) and its tree (PersonalInfo → Contact, Teaching, Research); panels: Web Page (Original), Web Page (Slice)]
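A backward slice with respect to a node keeps what that node depends on, here taken to be the path from the root down to it. The following is a minimal sketch under that assumption, using HaXml-like types simplified from the implementation slide and positions written as lists of child indices (both hypothetical encodings). The criterion node keeps its attributes and text; its siblings and element children are dropped.

```haskell
-- Simplified document types, modelled on the HaXml-style types of slide 26.
type Name = String
type Attribute = (Name, String)
data Element = Elem Name [Attribute] [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

-- A position: indices into successive content lists, starting at the root.
type Path = [Int]

-- Hypothetical helper: an element whose only child is a text node.
leaf :: Name -> String -> Content
leaf n s = CElem (Elem n [] [CText s])

-- The running example (abbreviated: one Subject, budget attribute omitted).
personalInfo :: Element
personalInfo = Elem "PersonalInfo" []
  [ CElem (Elem "Contact" [] [leaf "Status" "Professor", leaf "Name" "Ryan", leaf "Surname" "Gibson"])
  , CElem (Elem "Teaching" [] [CElem (Elem "Subject" [] [leaf "Name" "Logic", leaf "Sched" "Mon/Wed 16-18", leaf "Course" "4-Mathematics"])])
  , CElem (Elem "Research" [] [CElem (Elem "Project" [("name", "SysLog"), ("year", "2003-2004")] [])])
  ]

-- Backward slice (sketch): every ancestor of the node at the path, each with
-- only the child on the path, plus the node itself with its attributes and
-- text but without element children. Nothing if the path is invalid.
backward :: Path -> Element -> Maybe Element
backward [] (Elem n attrs cs) = Just (Elem n attrs [c | c@(CText _) <- cs])
backward (i : rest) (Elem n attrs cs) =
  case drop i cs of
    (CElem child : _) | i >= 0 -> do
      child' <- backward rest child
      Just (Elem n attrs [CElem child'])
    _ -> Nothing
```

Evaluating `backward [2, 0] personalInfo` in GHCi keeps only the chain PersonalInfo, Research, Project.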
16
Slicing XML Documents
• XML backward slicing criterion
[Figure: the XML document of slide 15 with its web page and tree; panels: Web Page (Original), Web Page (Slice)]
17
Slicing XML Documents
• We distinguish between DTD and XML slicing criteria
• XML slicing criteria are more fine-grained than DTD slicing criteria
• We distinguish between forward and backward slices (or a combination of both)
[Figure: panels Web Page (Original) and Web Page (Slice); criteria classified as XML or DTD, and as Forward or Backward]
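The two distinctions can be made concrete with a small data type for criteria and a function that expands a criterion into the XML positions it denotes. This is a hypothetical encoding, not the paper's notation. It shows why XML criteria are finer: a DTD criterion such as `Subject` covers every `Subject` element, whereas an XML criterion can single one out.

```haskell
type Name = String
type Attribute = (Name, String)
data Element = Elem Name [Attribute] [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

-- A position: indices into successive content lists, starting at the root.
type Path = [Int]

-- Hypothetical encoding of the distinctions on this slide.
data Direction = Backward | Forward | BackwardForward deriving (Eq, Show)
data Criterion
  = XMLCriterion [Path] -- concrete nodes of one XML document
  | DTDCriterion [Name] -- element types: every node of those types
  deriving (Eq, Show)

-- Positions of all elements of a given type, in document order.
positionsOf :: Name -> Element -> [Path]
positionsOf t (Elem n _ cs) =
  [[] | n == t] ++ concat [map (i :) (positionsOf t e) | (i, CElem e) <- zip [0 ..] cs]

-- The XML positions a criterion denotes: a DTD criterion expands to all
-- instances of its types.
toPositions :: Element -> Criterion -> [Path]
toPositions _ (XMLCriterion ps) = ps
toPositions doc (DTDCriterion ts) = concatMap (`positionsOf` doc) ts

leaf :: Name -> String -> Content
leaf n s = CElem (Elem n [] [CText s])

-- Abbreviated example document with two Subject elements.
doc :: Element
doc = Elem "PersonalInfo" []
  [ CElem (Elem "Contact" [] [leaf "Status" "Professor", leaf "Name" "Ryan", leaf "Surname" "Gibson"])
  , CElem (Elem "Teaching" [] [ CElem (Elem "Subject" [] [leaf "Name" "Logic"])
                              , CElem (Elem "Subject" [] [leaf "Name" "Algebra"]) ])
  , CElem (Elem "Research" [] [])
  ]
```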
18
Slicing XML Documents
• DTD backward slicing criterion
[Figure: the DTD of slide 14 and its tree; panels: Web Page (Original), Web Page (Slice)]
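A DTD backward slice can be sketched as a least fixpoint over the element-type graph: start from the criterion types and keep adding every type that has an already-kept type as a child. The sketch assumes a simplified encoding (element types mapped to child types, attributes and occurrence indicators ignored); `backwardDTD` and `restrictDTD` are hypothetical names, and the paper's algorithm may differ in detail.

```haskell
import Data.List (nub, sort)

type Name = String
type DTD = [(Name, [Name])] -- element type and its child element types

-- The example DTD; types without an entry are leaves.
exampleDTD :: DTD
exampleDTD =
  [ ("PersonalInfo", ["Contact", "Teaching", "Research"])
  , ("Contact", ["Status", "Name", "Surname"])
  , ("Teaching", ["Subject"])
  , ("Subject", ["Name", "Sched", "Course"])
  , ("Research", ["Project"])
  ]

-- Backward DTD slice (sketch): the criterion types plus every type from
-- which one of them can be reached, returned sorted.
backwardDTD :: DTD -> [Name] -> [Name]
backwardDTD dtd crit = sort (go (nub crit))
  where
    go s =
      let parents = nub [p | (p, cs) <- dtd, any (`elem` s) cs, p `notElem` s]
      in if null parents then s else go (s ++ parents)

-- Keep only the declarations of kept types, and only kept child types.
restrictDTD :: [Name] -> DTD -> DTD
restrictDTD keep dtd = [(p, filter (`elem` keep) cs) | (p, cs) <- dtd, p `elem` keep]
```

Note that a type reachable along several paths, such as `Name` (under both `Contact` and `Subject`), pulls in the ancestors on all of them.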
19
Slicing XML Documents
• XML forward slicing criterion
[Figure: the XML document of slide 15 with its web page and tree; panels: Web Page (Original), Web Page (Slice)]
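For a forward slice, the sketch below takes the slice to be the criterion node and all of its descendants, returned as a tree of its own. This is an assumption about the intended semantics; the types are simplified HaXml-like ones modelled on the implementation slide, positions are child-index lists, and `forward` is a hypothetical name.

```haskell
type Name = String
type Attribute = (Name, String)
data Element = Elem Name [Attribute] [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

-- A position: indices into successive content lists, starting at the root.
type Path = [Int]

-- Forward slice (sketch): the criterion node with all of its descendants.
-- Nothing if the path does not point at an element.
forward :: Path -> Element -> Maybe Element
forward [] e = Just e
forward (i : rest) (Elem _ _ cs) =
  case drop i cs of
    (CElem child : _) | i >= 0 -> forward rest child
    _ -> Nothing

leaf :: Name -> String -> Content
leaf n s = CElem (Elem n [] [CText s])

-- Abbreviated example document.
doc :: Element
doc = Elem "PersonalInfo" []
  [ CElem (Elem "Contact" [] [leaf "Status" "Professor", leaf "Name" "Ryan", leaf "Surname" "Gibson"])
  , CElem (Elem "Teaching" [] [CElem (Elem "Subject" [] [leaf "Name" "Logic", leaf "Sched" "Mon/Wed 16-18", leaf "Course" "4-Mathematics"])])
  , CElem (Elem "Research" [] [CElem (Elem "Project" [("name", "SysLog")] [])])
  ]
```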
20
Slicing XML Documents
• XML backward-forward slicing criterion
[Figure: the XML document of slide 15 with its web page and tree; panels: Web Page (Original), Web Page (Slice)]
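Combining both directions, one plausible reading keeps the path from the root to the criterion (the backward part) and the whole subtree below it (the forward part). A minimal sketch under that assumption, with simplified HaXml-like types and child-index positions; `backwardForward` is a hypothetical name.

```haskell
type Name = String
type Attribute = (Name, String)
data Element = Elem Name [Attribute] [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

type Path = [Int]

leaf :: Name -> String -> Content
leaf n s = CElem (Elem n [] [CText s])

-- Abbreviated parts of the example document.
contact, teaching, research :: Element
contact = Elem "Contact" [] [leaf "Status" "Professor", leaf "Name" "Ryan", leaf "Surname" "Gibson"]
teaching = Elem "Teaching" [] [CElem (Elem "Subject" [] [leaf "Name" "Logic", leaf "Sched" "Mon/Wed 16-18"])]
research = Elem "Research" [] [CElem (Elem "Project" [("name", "SysLog"), ("year", "2003-2004")] [])]

doc :: Element
doc = Elem "PersonalInfo" [] [CElem contact, CElem teaching, CElem research]

-- Backward-forward slice (sketch): ancestors keep only the child on the path;
-- the criterion node keeps its entire subtree. Nothing for an invalid path.
backwardForward :: Path -> Element -> Maybe Element
backwardForward [] e = Just e
backwardForward (i : rest) (Elem n attrs cs) =
  case drop i cs of
    (CElem child : _) | i >= 0 -> do
      child' <- backwardForward rest child
      Just (Elem n attrs [CElem child'])
    _ -> Nothing
```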
21
Slicing XML Documents
• What happens with DTDs? Slices are well-formed, but are they valid?
• For each XML slice we produce a DTD slice, and vice versa
• We guarantee that XML slices are valid with respect to DTD slices
[Figure: the Slicer takes a DTD document, an XML document and a slicing criterion, and produces a DTD slice document and an XML slice document]
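The slides do not show how the DTD slice is built. One conservative construction, stated as an assumption rather than as the paper's method, keeps only the declarations of element types occurring in the XML slice, removes the other types from content models, and makes each remaining child optional (exactly-once becomes `?`, `+` becomes `*`). The sliced DTD then accepts the slice, at the cost of precision, since it also accepts more documents. Elements are simplified to names and children; the leaf types declared `ANY` get an empty model because text is ignored. All names are hypothetical.

```haskell
import Data.List (nub)

type Name = String
-- Simplified element: attributes and text are ignored in this sketch.
data Element = Elem Name [Element] deriving (Eq, Show)

data Occ = One | Opt | Plus | Star deriving (Eq, Show)
type Model = [(Name, Occ)] -- a sequence model such as (Contact, Teaching, Research)
type DTD = [(Name, Model)]

exampleDTD :: DTD
exampleDTD =
  [ ("PersonalInfo", [("Contact", One), ("Teaching", One), ("Research", One)])
  , ("Contact", [("Status", One), ("Name", One), ("Surname", One)])
  , ("Teaching", [("Subject", Plus)])
  , ("Subject", [("Name", One), ("Sched", One), ("Course", One)])
  , ("Research", [("Project", One)])
  ] ++ [(t, []) | t <- ["Status", "Name", "Surname", "Sched", "Course", "Project"]]

-- Greedy check of child names against a sequence model; sufficient when
-- adjacent items of a model have different names, as in the example.
matches :: Model -> [Name] -> Bool
matches [] ns = null ns
matches ((t, o) : rest) ns = okCount && matches rest others
  where
    (same, others) = span (== t) ns
    k = length same
    okCount = case o of
      One -> k == 1
      Opt -> k <= 1
      Plus -> k >= 1
      Star -> True

-- An element is valid if its type is declared, its children match the
-- model, and all children are valid.
valid :: DTD -> Element -> Bool
valid dtd (Elem n cs) = case lookup n dtd of
  Nothing -> False
  Just m -> matches m [c | Elem c _ <- cs] && all (valid dtd) cs

-- Element types occurring in a document.
types :: Element -> [Name]
types (Elem n cs) = nub (n : concatMap types cs)

-- Accept zero occurrences, since slicing may have removed them.
weaken :: Occ -> Occ
weaken One = Opt
weaken Plus = Star
weaken o = o

sliceDTD :: [Name] -> DTD -> DTD
sliceDTD keep dtd =
  [(p, [(c, weaken o) | (c, o) <- m, c `elem` keep]) | (p, m) <- dtd, p `elem` keep]

-- An XML backward slice for the Project node.
xmlSlice :: Element
xmlSlice = Elem "PersonalInfo" [Elem "Research" [Elem "Project" []]]
```

The slice is invalid with respect to the original DTD, because `Contact` and `Teaching` are missing, but valid with respect to `sliceDTD (types xmlSlice) exampleDTD`.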
22
Slicing XML Documents
• A simple slicing algorithm
23
Slicing XML Documents
• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq \mathit{Pos}(D)$, the algorithm would be the same, except that the first loop would be:
For each $v_1 v_2 \ldots v_n \in C$ do
    $V' = V' \cup \{v_1,\ v_1 v_2,\ \ldots,\ v_1 v_2 \ldots v_n\}$
    $W' = W' \cup \{v_1|_i\, v_2|_j \ldots v_n|_k\}$, where $v_1 v_2 \ldots v_n \in V'$ and $v_1|_i\, v_2|_j \ldots v_n|_k \in X$
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
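The first loop collects every prefix of each DTD position into V', and then the XML positions whose element names follow one of those label paths into W'. A sketch assuming DTD positions are label paths and XML positions are child-index lists (both hypothetical encodings). Because V' is prefix-closed, the traversal can stop as soon as a label path falls outside it.

```haskell
import Data.List (inits, nub)

type Name = String
-- Simplified element: attributes and text are ignored in this sketch.
data Element = Elem Name [Element] deriving (Eq, Show)

-- V': every non-empty prefix v1, v1v2, ..., v1v2...vn of each path in C.
prefixClosure :: [[Name]] -> [[Name]]
prefixClosure c = nub [p | path <- c, p <- drop 1 (inits path)]

-- W': positions (child indices from the root) of XML nodes whose label path
-- belongs to V'. Since V' is prefix-closed, a node outside V' has no
-- descendant inside it, so that branch is pruned.
xmlPositions :: [[Name]] -> Element -> [[Int]]
xmlPositions v = go [] []
  where
    go labels pos (Elem n cs)
      | labels' `notElem` v = []
      | otherwise = reverse pos : concat [go labels' (i : pos) c | (i, c) <- zip [0 ..] cs]
      where
        labels' = labels ++ [n]

subject :: Element
subject = Elem "Subject" [Elem "Name" [], Elem "Sched" [], Elem "Course" []]

-- Skeleton of the example document.
doc :: Element
doc = Elem "PersonalInfo"
  [ Elem "Contact" [Elem "Status" [], Elem "Name" [], Elem "Surname" []]
  , Elem "Teaching" [subject, subject]
  , Elem "Research" [Elem "Project" []]
  ]
```

With the DTD position PersonalInfo/Teaching/Subject/Name, W' contains both `Subject` elements and their `Name` children, together with their ancestors.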
24
Slicing XML Documents
The following theorem states the correctness of the technique:
Theorem. Let D be a well-formed DTD and X a well-formed XML document valid with respect to D. Given a slice D' of D and a slice X' of X computed with an XML slicing criterion C, and given a slice D'' of D and a slice X'' of X computed with a DTD slicing criterion C', then
a) D' is well-formed and X' is valid with respect to D';
b) D'' is well-formed and X'' is valid with respect to D''.
If all the elements in C are of one of the types in C', then
c) D' = D'';
d) X' is a subtree of X''.
25
Implementation
26
Implementation
We have implemented a prototype in Haskell.
Haskell provides us with a formal basis that has many advantages for the manipulation of XML documents:
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
type Name      = String   -- simplified: names and values as plain strings
type Value     = String
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element
               | CText String
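With these types, the Project element of the running example and two small helpers can be written as follows. This is a sketch over the simplified types only, not the HaXml API: the real library's `Content` type has further constructors (for example for references and comments) and its types differ between versions. `attr` and `textOf` are hypothetical helpers.

```haskell
type Name = String
type Value = String
type Attribute = (Name, Value)
data Element = Elem Name [Attribute] [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

-- The Project element of the example, with its three attributes.
project :: Element
project = Elem "Project" [("name", "SysLog"), ("year", "2003-2004"), ("budget", "16000€")] []

-- Look up an attribute value of an element.
attr :: Name -> Element -> Maybe Value
attr k (Elem _ attrs _) = lookup k attrs

-- Concatenate all text below an element, in document order.
textOf :: Element -> String
textOf (Elem _ _ cs) = concatMap go cs
  where
    go (CText s) = s
    go (CElem e) = textOf e
```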
27
Implementation
From XML slices to web page slices
[Figure: XML (data) + XSLT (presentation) → web page]
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).
Both the XML data and the presentation labels are generated together.
This does not impose any restriction on the power of XSLT, since the same web pages can still be generated. On the contrary, this style of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and the presentation data that depend on the same condition are kept together.
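The guideline can be illustrated outside XSLT. In the Haskell sketch below, which stands in for a stylesheet, `renderTeaching` emits the heading and list markup inside the same iteration over `Teaching` elements, so the markup exists if and only if the data does; `renderTeachingBad` emits the heading unconditionally. In XSLT terms, the first corresponds to placing the heading inside the template or `xsl:for-each` that selects `Teaching`, the second to placing it outside. All names and the HTML output are hypothetical.

```haskell
type Name = String
-- Simplified types (attributes omitted).
data Element = Elem Name [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

childrenNamed :: Name -> Element -> [Element]
childrenNamed t (Elem _ cs) = [e | CElem e@(Elem n _) <- cs, n == t]

-- Text directly inside an element.
textOf :: Element -> String
textOf (Elem _ cs) = unwords [s | CText s <- cs]

renderSubject :: Element -> String
renderSubject (Elem _ cs) = "<li>" ++ unwords [textOf e | CElem e <- cs] ++ "</li>"

-- Follows the guideline: presentation is generated under the same
-- condition as the data it presents.
renderTeaching :: Element -> String
renderTeaching root =
  concat [ "<h2>Teaching</h2><ul>" ++ concatMap renderSubject (childrenNamed "Subject" t) ++ "</ul>"
         | t <- childrenNamed "Teaching" root ]

-- Violates the guideline: the heading survives even when slicing has
-- removed the Teaching data.
renderTeachingBad :: Element -> String
renderTeachingBad root =
  "<h2>Teaching</h2><ul>"
    ++ concat [concatMap renderSubject (childrenNamed "Subject" t) | t <- childrenNamed "Teaching" root]
    ++ "</ul>"

full, sliced :: Element
full = Elem "PersonalInfo"
  [CElem (Elem "Teaching" [CElem (Elem "Subject" [CElem (Elem "Name" [CText "Logic"]), CElem (Elem "Sched" [CText "Mon/Wed 16-18"])])])]
sliced = Elem "PersonalInfo" [] -- e.g. a slice in which Teaching was removed
```

Applied to `sliced`, the guideline-following renderer produces no Teaching markup at all, while the other still emits an empty heading and list.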
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation, some examples and other material are publicly available at
www.dsic.upv.es/~jsilva/xml
31
Conclusions & Future Work
32
Conclusions
• We proposed the application of program slicing techniques to XML data structures
• We defined an algorithm to slice XML and DTD documents
  - XML and DTD slices are well-formed and valid
  - Previous slicers can be used with a modest implementation effort
• Slicing web pages: the slicer can use XSLT in order to slice web pages; we proposed some guidelines for writing XSLT files
Future Work
• Migration to XML Schema
• New implementation based on XQuery
15
Slicing XML Documentsbull XML backward slicing criterion
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Web Page(Original)
Web Page(Slice)
16
Slicing XML Documentsbull XML backward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
16
Slicing XML Documentsbull XML backward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
17
Slicing XML Documents
bull We distinguish between DTD and XML slicing criterionsbull XML slicing criterions are more fine-grained than DTD slicing criterions
bull We distinguish between forward and backward slices (or a combination)
Web Page(Original)
Web Page(Slice)
XML DTD
Forward Backward
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
Slicing XML Documents
• XML forward slicing criterion
<PersonalInfo>
  <Contact>
    <Status> Professor </Status>
    <Name> Ryan </Name>
    <Surname> Gibson </Surname>
  </Contact>
  <Teaching>
    <Subject>
      <Name> Logic </Name>
      <Sched> Mon/Wed 16-18 </Sched>
      <Course> 4-Mathematics </Course>
    </Subject>
    <Subject>
      <Name> Algebra </Name>
      <Sched> Mon/Tur 11-13 </Sched>
      <Course> 3-Mathematics </Course>
    </Subject>
    …
  </Teaching>
  <Research>
    <Project name="SysLog" year="2003-2004" budget="16000euro"/>
  </Research>
</PersonalInfo>
[Figure: the document as a tree rooted at PersonalInfo, with panels Web Page (Original) and Web Page (Slice)]
20
Slicing XML Documents
• XML backward-forward slicing criterion
(The XML document is the same as on slide 19.)
[Figure: the document as a tree rooted at PersonalInfo, with panels Web Page (Original) and Web Page (Slice)]
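The three kinds of XML criterion can be pictured on a simplified tree. The Haskell sketch below makes an assumption by analogy with program slicing:

- a backward criterion collects the path from the root to the selected node;
- a forward criterion collects that node's subtree;
- a backward-forward criterion collects both.

The paper's slicer may keep further nodes, for instance to preserve validity. The Node type and all function names are hypothetical, and a criterion is written as a path of 0-based child indices.

```haskell
-- Hypothetical minimal XML tree: an element name and its children
-- (text and attributes are left out to keep the sketch short).
data Node = Node String [Node] deriving (Eq, Show)

-- Assumed criterion encoding: 0-based child indices from the root.
type Criterion = [Int]

-- Safe child lookup.
childAt :: Int -> [a] -> Maybe a
childAt i xs
  | i < 0 = Nothing
  | otherwise = case drop i xs of
      (x : _) -> Just x
      [] -> Nothing

-- Forward part: the subtree rooted at the criterion node.
forwardSlice :: Criterion -> Node -> Maybe Node
forwardSlice [] n = Just n
forwardSlice (i : is) (Node _ kids) = childAt i kids >>= forwardSlice is

-- Backward part: the path from the root down to the criterion node.
-- Every node on the path keeps only the child leading to the criterion,
-- and the criterion node itself keeps no children.
backwardSlice :: Criterion -> Node -> Maybe Node
backwardSlice [] (Node l _) = Just (Node l [])
backwardSlice (i : is) (Node l kids) = do
  k <- childAt i kids
  k' <- backwardSlice is k
  pure (Node l [k'])

-- Backward-forward: the root-to-criterion path plus the criterion's whole subtree.
backwardForwardSlice :: Criterion -> Node -> Maybe Node
backwardForwardSlice [] n = Just n
backwardForwardSlice (i : is) (Node l kids) = do
  k <- childAt i kids
  k' <- backwardForwardSlice is k
  pure (Node l [k'])
```

For example, if PersonalInfo's children are ordered Contact, Teaching, Research, the criterion [1] selects Teaching. backwardSlice [1] keeps PersonalInfo with an empty Teaching. backwardForwardSlice [1] also keeps every Subject below Teaching.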
21
Slicing XML Documents
• What happens with DTDs? Slices are well-formed, but are they valid?
• For each XML slice we produce a DTD slice, and vice versa.
• We guarantee that XML slices are valid with respect to DTD slices.
[Diagram: the Slicer takes a DTD document, an XML document and a slicing criterion, and produces a DTD slice document and an XML slice document]
22
Slicing XML Documents
• A simple slicing algorithm
23
Slicing XML Documents
• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq \mathit{Pos}(D)$, the algorithm would be the same, except that the first loop would be:
For each $v_1.v_2.(\ldots).v_n \in C$ do
    $V' = V' \cup \{v_1,\ v_1.v_2,\ \ldots,\ v_1.v_2.(\ldots).v_n\}$
    $W' = W' \cup \{v_1|_i.v_2|_j.(\ldots).v_n|_k\}$, where $v_1.v_2.(\ldots).v_n \in v'$ and $v_1|_i.v_2|_j.(\ldots).v_n|_k \in X$
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
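The first loop can be read as a prefix closure over element-type paths. Below is a minimal Haskell sketch under these assumptions:

- DTD positions are written as lists of element names.
- XML positions are written as lists of 0-based child indices.
- The notation v1|i.v2|j.(…).vn|k means "an XML node whose element-type path is v1.v2.(…).vn". This reading is our own interpretation.

prefixClosure and matchingNodes are hypothetical names.

```haskell
import Data.List (inits)
import qualified Data.Set as Set

-- A DTD position written as the element-type path from the root,
-- e.g. ["PersonalInfo", "Research", "Project"] stands for v1.v2.v3.
type TypePath = [String]

-- Hypothetical minimal XML tree (element names only).
data Node = Node String [Node]

-- V': for every criterion position v1.v2...vn, add v1, v1.v2, ..., v1.v2...vn.
prefixClosure :: [TypePath] -> Set.Set TypePath
prefixClosure crit = Set.fromList [p | path <- crit, p <- drop 1 (inits path)]

-- W' (assumed reading): every XML node, identified by its path of 0-based
-- child indices, whose element-type path from the root belongs to V'.
-- Because V' is prefix-closed, a node outside V' has no descendant inside it,
-- so its whole subtree can be skipped.
matchingNodes :: Set.Set TypePath -> Node -> [[Int]]
matchingNodes v = go [] []
  where
    go idx types (Node name kids)
      | here `Set.member` v =
          idx : concat [go (idx ++ [i]) here k | (i, k) <- zip [0 ..] kids]
      | otherwise = []
      where
        here = types ++ [name]
```

Consider the criterion {PersonalInfo.Research.Project}. V′ then holds three paths. On a document whose third child is Research, matchingNodes returns the root, the Research node and the Project node: [], [2] and [2, 0].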
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem. Let D be a well-formed DTD and X a well-formed XML document valid with respect to D. Given a slice D′ of D and a slice X′ of X computed with an XML slicing criterion C, and given a slice D″ of D and a slice X″ of X computed with a DTD slicing criterion C′, then:
a) D′ is well-formed and X′ is valid with respect to D′;
b) D″ is well-formed and X″ is valid with respect to D″.
If all the elements in C are of one of the types in C′, then:
c) D′ = D″;
d) X′ is a subtree of X″.
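Property d) relies on a precise notion of subtree. One possible reading, assumed here rather than taken from the paper, is that X′ is obtained from X″ by deleting whole subtrees, keeping the root and the document order. The following is a small Haskell check of that relation on a hypothetical element-name-only tree. The name prunedFrom is illustrative.

```haskell
-- Hypothetical minimal XML tree (element names only).
data Node = Node String [Node] deriving (Eq, Show)

-- Assumed reading of "X' is a subtree of X''": the roots carry the same
-- element name, and the children of X' embed, in document order, into the
-- children of X'' (each one recursively satisfying the same relation).
prunedFrom :: Node -> Node -> Bool
prunedFrom (Node a cs) (Node b ds) = a == b && embed cs ds
  where
    -- Greedy ordered embedding: matching each child to the earliest possible
    -- candidate never rules out a match for the children after it.
    embed [] _ = True
    embed _ [] = False
    embed (x : xs) (y : ys)
      | prunedFrom x y = embed xs ys
      | otherwise = embed (x : xs) ys
```

Under this reading, a slice that reorders siblings or changes the root is rejected. A slice that only drops branches is accepted.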
25
Contents
Motivation
Program Slicing
XML
• DTD
• XSLT
Slicing XML Documents
• Example
Implementation
Conclusions & Future Work
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides a formal basis with many advantages for the manipulation of XML documents:
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:
type Name      = String   -- assumed: names and values are plain strings
type Value     = String
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element
               | CText String
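To show how these types are used, here is a self-contained Haskell sketch. It redeclares the types, assuming Name and Value are plain strings, and adds a hypothetical name-based filter. The following points are assumptions:

- This is a plain filter, not the slicing algorithm from the paper.
- Recent HaXml versions use richer, parameterised types, so these declarations are not the library's exact API.

```haskell
-- Simplified document types mirroring the slide (not HaXml's exact API).
type Name = String
type Value = String
type Attribute = (Name, Value)

data Element = Elem Name [Attribute] [Content] deriving (Eq, Show)
data Content = CElem Element | CText String deriving (Eq, Show)

-- Hypothetical filter: keep an element only if `keep` accepts its name.
-- A rejected element is dropped together with its whole subtree;
-- text directly inside a kept element is preserved.
filterElem :: (Name -> Bool) -> Element -> Maybe Element
filterElem keep (Elem n attrs cs)
  | keep n = Just (Elem n attrs (concatMap step cs))
  | otherwise = Nothing
  where
    step (CText s) = [CText s]
    step (CElem e) = maybe [] (\e' -> [CElem e']) (filterElem keep e)

-- All character data below an element, in document order.
textOf :: Element -> String
textOf (Elem _ _ cs) = concatMap go cs
  where
    go (CText s) = s
    go (CElem e) = textOf e

-- The Contact part of the example document.
contact :: Element
contact =
  Elem "Contact" []
    [ CElem (Elem "Status" [] [CText "Professor"])
    , CElem (Elem "Name" [] [CText "Ryan"])
    , CElem (Elem "Surname" [] [CText "Gibson"])
    ]
```

filterElem (/= "Status") contact yields a Contact element holding only Name and Surname. The result is still well-formed. However, it is no longer valid against the example DTD, whose Contact declaration requires a Status child. This is why the slicer also produces a DTD slice.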
27
Implementation
From XML slices to web page slices
[Diagram, drawn twice: XML (Data) + XSLT (Presentation) → WebPage]
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).
Both the XML data and the presentation labels are generated together.
This does not impose any restriction on the power of XSLT, since the same web pages can be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and presentation data that depend on the same condition are kept together.
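The guideline can be illustrated outside XSLT. This hypothetical Haskell sketch treats a sliced record as a list of (label, value) pairs, where a value removed by the slice is Nothing. The two rendering functions behave differently:

- renderRow emits markup only together with its data, as the guideline asks.
- renderRowUnguarded emits its label unconditionally, so it would leave empty rows in a sliced page.

The function names and the HTML table layout are arbitrary choices for the example.

```haskell
-- (label, value); Nothing marks data removed by the slice.
type Field = (String, Maybe String)

-- Follows the guideline: the presentation markup of a field is produced in
-- the same branch as its data, so removing the data also removes the markup.
renderRow :: Field -> String
renderRow (_, Nothing) = ""
renderRow (label, Just v) = "<tr><td>" ++ label ++ "</td><td>" ++ v ++ "</td></tr>"

-- Violates the guideline: the label is emitted unconditionally.
renderRowUnguarded :: Field -> String
renderRowUnguarded (label, v) =
  "<tr><td>" ++ label ++ "</td><td>" ++ maybe "" id v ++ "</td></tr>"

-- Wraps the rendered rows in a table.
renderTable :: (Field -> String) -> [Field] -> String
renderTable row fs = "<table>" ++ concatMap row fs ++ "</table>"
```

With a Subject whose Sched was sliced away, renderTable renderRow produces rows only for Name and Course. The unguarded version would still print an empty Sched row.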
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation, some examples and other material are publicly available at:
www.dsic.upv.es/~jsilva/xml
31
Contents
Motivation
Program Slicing
XML
• DTD
• XSLT
Slicing XML Documents
• Example
Implementation
Conclusions & Future Work
Conclusions & Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures.
We defined an algorithm to slice XML and DTD documents:
- XML and DTD slices are well-formed and valid.
- Previous slicers can be used with a modest implementation effort.
Slicing web pages: the slicer can use XSLT in order to slice web pages. We proposed some guidelines to generate XSLT files.
Future work: migration to XML Schema; a new implementation based on XQuery.
18
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
Slicing XML Documentsbull DTD backward slicing criterion
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Name Sched Course
SubjectStatus Name Surname
Name Year Budget
Project
PersonalInfo
Contact Teaching Research
ltELEMENT PersonalInfo (Contact Teaching Research)gt
ltELEMENT Contact (Status NameSurname)gt
ltELEMENT Status ANYgtltELEMENT Name ANYgtltELEMENT Surname ANYgtltELEMENT Teaching (Subject+)gtltELEMENT Subject (Name Sched
Course)gtltELEMENT Sched ANYgtltELEMENT Course ANYgtltELEMENT Research (Project)gtltELEMENT Project ANYgtltATTLIST Project
name CDATA REQUIREDyear CDATA REQUIREDbudget CDATA IMPLIED
gt
Web Page(Original)
Web Page(Slice)
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
19
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
20
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
Slicing XML Documentsbull XML backward-forward slicing criterion
Logic MonWed 16-184-Mathematics
Subject
Algebra MonTur 11-133-Mathematics
Professor Ryan Gibson
Subject
Syslog 2003-2004 16000 euro
Project
hellip
PersonalInfo
Contact Teaching Research
hellip
ltPersonalInfogtltContactgtltStatusgt Professor ltStatusgt ltNamegt Ryan ltNamegtltSurnamegt Gibson ltSurnamegtltContactgt ltTeachinggtltSubjectgtltNamegt Logic ltNamegtltSchedgt MonWed 16-18 ltSchedgtltCoursegt 4-Mathematics ltCoursegtltSubjectgtltSubjectgt ltNamegt Algebra ltNamegtltSchedgt MonTur 11-13 ltSchedgtltCoursegt 3-Mathematics ltCoursegtltSubjectgt hellipltTeachinggtltResearchgtltProjectname = ldquoSysLogrsquorsquoyear = ldquo2003-2004rsquorsquobudget = ldquo16000eurorsquorsquo gtltResearchgt
ltPersonalInfogt
Web Page(Original)
Web Page(Slice)
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
21
Slicing XML Documents
bull What happens with DTDs Slices are well-formed but are they valid
bull For each XML slice we produce a DTD slice and viceversa
bull We guarantee that XML slices are valid with respect to DTD slices
DTD
document
SlicerSlicer
XMLdocument
DTD Slicedocument
XML SlicedocumentSlicing Criterion
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
22
Slicing XML Documents
bull A simple slicing algorithm
23
Slicing XML Documents
bull In the case of a DTD criterion composed by a set of positions C = p1hellippn Pos(D) the algorithm would be the same except that the first loop would be
For each v1v2(hellip)vn C do Vrsquo = Vrsquo v1 v1v2 hellip v1v2(hellip)vn Wrsquo = Wrsquo v1|iv2|j(hellip)vn|k Where v1v2(hellip)vn vrsquo and v1|iv2|j(hellip)vn|k X
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion
24
Slicing XML Documents
The following theorem states the correctness of the technique
Theorem Let D be a well-formed DTD and X a well-formed XML document valid with respect to D Given a slice Drsquo of D and a slice Xrsquo of X computed with an XML slicing criterion C and given a slice Drsquorsquo of D and a slice Xrsquorsquo of X computed with a DTD slicing criterion Crsquo then
a) Drsquo is well-formed and Xrsquo is valid with respect to Drsquob) Drsquorsquo is well-formed and Xrsquorsquo is valid with respect to Drsquorsquo
If all the elements in C are of one of the types in Crsquo then
c) Drsquo = Drsquorsquod) Xrsquo is a subtree of Xrsquorsquo
25
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Implementation
26
Implementation
We have implemented a prototype in Haskell
Haskell provides us a formal basis with many advantages for the manipulation of XML documents
- The HaXml library
It allows us to automatically translate XML or HTML documents into a Haskell representation In particular we use the following data structures that can represent any XMLHTML document
data Element = Elem Name [Attribute] [Content]data Attribute = (Name Value)data Content = CElem Element
| CText String
27
XML XSLT WebPage
(Data)(Presentation)
Implementation
From XML slices to Webpage slices
XML XSLT WebPage
(Data)(Presentation)
28
Implementation
XSLT Implementation Guidelines
XSLT documents must generate the information and the presentation elements under the same conditions (ie the former is generated if and only if the later is generated)
Both the XML data and the presentation labels are generated together
This does not imposes any restriction on the power of XSLT since the same webpages can be generated On the contrary this way of programming forces the programmer to build transformations that canbe easily reused and maintained because both the information and presentation data depending on the same condition are put together
29
Implementation
XSLT Implementation Guidelines
30
Implementation
The implementation some examples and other material is publicly available at
wwwdsicupves~jsilvaxml
31
MotivationProgram SlicingXML
bull DTDbull XSLT
Slicing XML Documentsbull Example
ImplementationConclusions amp Future Work
Contents
Conclusions amp Future Work
32
Conclusions
We proposed the application of program slicing techniques to XML data structures
We defined an algorithm to slice XML and DTD documents
XML and DTD slices that are well-formed and valid Previous slicers can be used with a modest
implementation effort
Slicing Web Pages The slicer can use XSLT in order to slice webpages We proposed some guidelines to generate XSLT files
Future Work Migration to XML Schema New implementation based on XQuery
23
Slicing XML Documents
• In the case of a DTD slicing criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq \mathit{Pos}(D)$, the algorithm would be the same, except that the first loop would be:
For each $v_1 v_2 \ldots v_n \in C$ do
  $V' = V' \cup \{v_1,\ v_1 v_2,\ \ldots,\ v_1 v_2 \ldots v_n\}$
  $W' = W' \cup \{v_1|_i\, v_2|_j \ldots v_n|_k\}$, where $v_1 v_2 \ldots v_n \in V'$ and $v_1|_i\, v_2|_j \ldots v_n|_k \in X$
Both algorithms produce valid XML and DTD slices with respect to the slicing criterion.
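The modified first loop can be sketched in Haskell as below. This is a minimal sketch under stated assumptions, not the paper's implementation. A DTD position $v_1 v_2 \ldots v_n$ is encoded as the list of element types on the path from the root. An XML position $v_1|_i\, v_2|_j \ldots v_n|_k$ is encoded as a list of (name, index) pairs. $W'$ is read literally as every XML position whose name sequence belongs to $V'$. The names `DtdPos`, `XmlPos`, `closureV` and `collectW` are hypothetical:

```haskell
import Data.List (inits, nub)

-- Assumed encoding of a DTD position v1 v2 ... vn: element types from the root.
type DtdPos = [String]
-- Assumed encoding of an XML position v1|i v2|j ... vn|k: (element name, index) steps.
type XmlPos = [(String, Int)]

-- V': every non-empty prefix v1, v1 v2, ..., v1 v2 ... vn of each criterion
-- position, with shared prefixes kept once (first-occurrence order).
closureV :: [DtdPos] -> [DtdPos]
closureV crit = nub [p | c <- crit, p <- drop 1 (inits c)]

-- W': the XML positions whose name sequence (indices dropped) is in V'.
collectW :: [DtdPos] -> [XmlPos] -> [XmlPos]
collectW crit xmlPositions = [x | x <- xmlPositions, map fst x `elem` v']
  where
    v' = closureV crit
```

Computing the prefix closure first means that every ancestor type of a criterion position is retained, which is what keeps the resulting DTD slice connected to its root.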
24
Slicing XML Documents
The following theorem states the correctness of the technique.
Theorem. Let D be a well-formed DTD and X a well-formed XML document valid with respect to D. Given a slice D′ of D and a slice X′ of X computed with an XML slicing criterion C, and given a slice D″ of D and a slice X″ of X computed with a DTD slicing criterion C′, then:
a) D′ is well-formed and X′ is valid with respect to D′;
b) D″ is well-formed and X″ is valid with respect to D″.
If all the elements in C are of one of the types in C′, then:
c) D′ = D″;
d) X′ is a subtree of X″.
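Property (d) can be made concrete with a small checker. The sketch below assumes that "X′ is a subtree of X″" means X′ can be obtained from X″ by deleting attributes and child nodes while preserving document order. This reading, the simplified types, and the names `isSubtreeOf` and `contentIn` are assumptions for illustration, not the paper's definition:

```haskell
-- Simplified stand-ins for the HaXml-style types.
type Name      = String
type Value     = String
data Element   = Elem Name [Attribute] [Content]
type Attribute = (Name, Value)
data Content   = CElem Element | CText String

-- Hypothetical check: same root name, a subset of the attributes, and
-- children that embed, in order, into the larger element's children.
isSubtreeOf :: Element -> Element -> Bool
isSubtreeOf (Elem n as cs) (Elem n' as' cs') =
  n == n' && all (`elem` as') as && embeds cs cs'
  where
    -- Greedy in-order embedding: matching each node to the earliest
    -- compatible sibling never rules out a valid embedding.
    embeds [] _ = True
    embeds _ [] = False
    embeds (x:xs) (y:ys)
      | contentIn x y = embeds xs ys
      | otherwise     = embeds (x:xs) ys

-- Text must match exactly; elements are compared recursively.
contentIn :: Content -> Content -> Bool
contentIn (CText s) (CText s') = s == s'
contentIn (CElem e) (CElem e') = isSubtreeOf e e'
contentIn _ _                  = False
```

Under this reading, an XML slice that keeps a `book` with only its `title` is a subtree of one that keeps both `title` and `author`, but not the other way round.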