No More Pain for XML’s Gain
XJ: Facilitating XML Processing in Java
Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar
Itay Maman
236826 Seminar lecture, 15 June 2005

The basic premise
• XML is getting increasingly popular
• XML manipulation is now a common programming task
• The lead question:
  – Do modern OO languages sufficiently support XML?

Introduction: Schema file (file: technioncatalog.xsd)

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="course" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="points" type="xs:int"/>
              <xs:element name="number" type="xs:int"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="teacher" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Introduction: XML document (file: short.xml)

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <course>
    <points>3</points>
    <number>234319</number>
    <name>Programming Languages</name>
    <teacher>Ron Pinter</teacher>
  </course>
  <course>
    <points>3</points>
    <number>234141</number>
    <name>Combinatorics for CS</name>
    <teacher>Ran El-Yaniv</teacher>
  </course>
</catalog>

Desired output:
“Combinatorics for CS (234141) by Ran El-Yaniv, 3 credit points”

Introduction: The XJ program

import java.io.*;
import technioncatalog.*;

public class Demo1 {
  public static void main(String[] args) throws Throwable {
    catalog cat = new catalog(new File("short.xml"));
    catalog.course c = cat [| /course[2] |];   // typed XPath: the second <course>
    printCourse(c);
  }

  private static void printCourse(catalog.course c) {
    String name = c [| /name |];
    String teacher = c [| /teacher |];
    int points = c [| /points |];
    int id = c [| /number |];
    System.out.println(name + " (" + id + ") by " + teacher + ", " + points + " credit points");
  }
}

Traditional XML processing (DOM, XPath APIs)

// Needs: javax.xml.parsers.*, javax.xml.xpath.*, org.w3c.dom.*
public static void main(String[] args) throws Throwable {
  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
  DocumentBuilder db = dbf.newDocumentBuilder();
  Document doc = db.parse(new java.io.File("short.xml"));
  XPath xp = XPathFactory.newInstance().newXPath();
  // Cast to the standard NodeList interface rather than the Xalan-internal DTMNodeList
  NodeList nodes = (NodeList) xp.evaluate("//course", doc, XPathConstants.NODESET);
  printCourse(nodes.item(1));   // second <course>: NodeList indices are 0-based
}

• XPath is a plain string. It may be:
  – Syntactically incorrect
  – Incompatible with the document
• The types of the XML objects (Node, Document) do not reflect the schema
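
The two failure modes of a string-typed XPath can be reproduced with the standard JAXP API alone. The following is a minimal sketch, not part of the original lecture: the class name StringXPathPitfalls and its helper methods are illustrative assumptions, and the XML is inlined so that the example needs no files.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative helper class (an assumption of this sketch, not an XJ or JAXP type).
class StringXPathPitfalls {
    static final String XML =
        "<catalog>"
      + "<course><points>3</points><number>234319</number>"
      + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
      + "<course><points>3</points><number>234141</number>"
      + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
      + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Number of nodes selected by the query string. */
    static int count(Document doc, String query) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate(query, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    /** True if the XPath engine rejects the query, which happens only at run time. */
    static boolean isRejected(Document doc, String query) {
        try {
            count(doc, query);
            return false;
        } catch (XPathExpressionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(count(doc, "//course"));        // matches the document
        System.out.println(count(doc, "//corse"));         // misspelled: silently selects nothing
        System.out.println(isRejected(doc, "//course["));  // malformed: fails when evaluated
    }
}
```

Both mistakes compile cleanly because javac sees only a String. This is the gap that XJ's statically checked queries aim to close.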

Traditional XML processing (DOM APIs)

private static void printCourse(Node n) {
  NodeList nodes = n.getChildNodes();
  // Odd indices only: in an indented document the even positions hold whitespace text nodes
  System.out.println(nodes.item(5).getTextContent() + " ("
      + nodes.item(3).getTextContent() + ") by "
      + nodes.item(7).getTextContent() + ", "
      + nodes.item(1).getTextContent() + " credit points");
}

• Assumption: Four child nodes must exist
• Assumption: 3rd child is the course number
• Assumption: 2nd child has no child elements
• What about reading the numeric value of an element?
• These assumptions will not hold if the schema is changed
  – => run-time errors
  – Problems remain, even if we identify nodes by name
• Possible Schema changes:
  – Allowing a new optional <students> sub-element
  – Changing the order of the sub-elements
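
How fragile positional access is can be seen by parsing the same course twice, once indented and once without whitespace. This is a sketch under assumptions: ChildIndexPitfall and childText are hypothetical names, and the default (non-validating) JAXP parser is used, which keeps whitespace between elements as text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative helper class (an assumption of this sketch).
class ChildIndexPitfall {
    static final String INDENTED =
        "<course>\n"
      + "  <points>3</points>\n"
      + "  <number>234141</number>\n"
      + "  <name>Combinatorics for CS</name>\n"
      + "  <teacher>Ran El-Yaniv</teacher>\n"
      + "</course>";

    // Same content with the inter-element whitespace removed.
    static final String COMPACT = INDENTED.replaceAll("\\s*\\n\\s*", "");

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    /** Text of the first direct child element with the given tag, or null if absent. */
    static String childText(Element parent, String tag) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            if (k.getNodeType() == Node.ELEMENT_NODE && tag.equals(k.getNodeName())) {
                return k.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Element indented = parseRoot(INDENTED);
        Element compact = parseRoot(COMPACT);
        // The same index names different elements depending on formatting.
        System.out.println(indented.getChildNodes().item(3).getNodeName());
        System.out.println(compact.getChildNodes().item(3).getNodeName());
        // Lookup by name survives reformatting, but still returns an untyped String.
        System.out.println(childText(indented, "number"));
        System.out.println(childText(compact, "number"));
    }
}
```

Lookup by name removes the dependence on formatting and ordering. As the slide notes, problems remain: a missing element only shows up as a null at run time, and the value is still untyped text.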

No easy solution
• Similar problems occur when:
  1. XML elements are created by the program
  2. Other libraries are used for reading/writing XML documents
     – Such as: Xalan, SAX
  3. The developer wraps several complex operations within a single function/method/class
• These are inherent problems of the language

Shaping the future
• What XML-related facilities do we want?
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class values

Has the future arrived yet?
• Significant effort has gone into integrating XML into modern programming languages
  – XJ
  – Scala
  – Cω
  – XTatic
  – …
• We will overview the constructs offered by XJ
  – A superset of Java
  – Available at: http://www.research.ibm.com/xj

XJ’s Type system
• Hierarchy of classes
  – A common root class: XMLObject
  – Automatic import: package com.ibm.xj.*
• Genericity: Sequence<T>, XMLCursor<T>
  – XMLCursor<T> is an iterator over Sequence<T>

Integration with Schema
• The rationale:
  1. An OO program is a collection of class definitions
  2. A Schema file is a collection of type definitions
• => Let’s integrate these definitions
• Any Schema type is also an XJ type
  – The XJ compiler generates a “logical class” for each such type
  – Schema file == package name
  – Using a schema == import schema_file_name;

XML literal in XJ code
• Invalid XML content triggers a compile-time error
• Resulting elements are typed!
• Curly braces allow “escaping” back into XJ

import technioncatalog.*;

public class Demo2 {
  public static void main(String[] args) throws Throwable {
    String x = "Algorithms 1";
    int y = 234247;
    catalog cat = buildCatalog(new catalog.course(
      <course>
        <points>3</points>
        <number>{y}</number>
        <name>{x}</name>
        <teacher>Shlomo Moran</teacher>
      </course>));
  }

  private static catalog buildCatalog(catalog.course c) {
    return new catalog(<catalog>{c}</catalog>);
  }
}

An ill-typed program

...
course c = new course(
  <course>
    <teacher>Shlomo Moran</teacher>
  </course>);        // error: wrong <course> element
buildCatalog(c);

XMLObject x = new course.teacher(<teacher>Shlomo Moran</teacher>);
buildCatalog(x);     // error: an XMLObject cannot be passed as a course element
...

private static catalog buildCatalog(catalog.course c) {
  return new catalog(<catalog>{c}</catalog>);
}

Embedding XPath Queries in XJ
• Syntax: XmlValue [| XPathQuery |]
• Requires a context provider:
  – An XML element over which the XPath query is invoked
  – (see the cat variable in the sample)
• Escaping: use a ‘$’ prefix

course doSomething(catalog cat, int courseNum) {
  return cat [| /course[./number = $courseNum] |];
}
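
For comparison, plain Java can bind a `$` variable through the standard XPathVariableResolver hook of JAXP. The sketch below is an assumption-laden illustration (the class XPathVariableDemo and the method courseName are made up for this example). It shows that the link between the variable name in the query and the Java parameter is established only at evaluation time.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative helper class (an assumption of this sketch).
class XPathVariableDemo {
    static final String XML =
        "<catalog>"
      + "<course><points>3</points><number>234319</number>"
      + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
      + "<course><points>3</points><number>234141</number>"
      + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
      + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Name of the course with the given number, or "" if there is none. */
    static String courseName(Document doc, int courseNum) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        // The resolver is consulted during evaluation; the compiler never checks
        // that "courseNum" in the query string matches the name tested here.
        xp.setXPathVariableResolver((QName name) ->
            "courseNum".equals(name.getLocalPart()) ? Double.valueOf(courseNum) : null);
        return xp.evaluate("/catalog/course[number = $courseNum]/name", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(courseName(doc, 234141));
        System.out.println(courseName(doc, 999999).isEmpty());
    }
}
```

In XJ the `$courseNum` reference is resolved against the enclosing Java scope by the compiler, so a misspelled variable would presumably be caught before the program runs.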

XPath Semantics
• Problem: the resulting type is sometimes not so clear
• Two options
  – Sequence<T>
    • If the compiler determines that all result elements are of type T
  – Sequence<XMLObject>
    • (Otherwise)
• Automatic conversion from a singleton sequence
• Static check of XPath queries
  – If the result is always empty => compile-time error
  – (The compiler cannot catch all cases)

Implicit coercions
• An atomic XML value can be seamlessly converted into a corresponding Java value
  – xsd:double => double
  – xsd:boolean => boolean
  – xsd:string => java.lang.String
  – …
• This reduces the verbosity of XML-related code:

import technioncatalog.*;
import technioncatalog.catalog.*;

public static String getTeacher(course c) {
  return c [| /teacher |];
}

Sequence<teacher> ► teacher ► String
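
To see what the implicit coercions save, here is roughly what the same reads look like with the standard JAXP XPath API, where every conversion is spelled out by hand. This is a sketch under assumptions: ExplicitCoercion and its methods are illustrative names, not part of XJ or JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative helper class (an assumption of this sketch).
class ExplicitCoercion {
    static final String XML =
        "<catalog>"
      + "<course><points>3</points><number>234319</number>"
      + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
      + "<course><points>3</points><number>234141</number>"
      + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
      + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Credit points of the course at the given 1-based position. */
    static int points(Document doc, int courseIndex) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        // XPathConstants.NUMBER always yields a Double, although the schema says xs:int.
        Double d = (Double) xp.evaluate(
            "/catalog/course[" + courseIndex + "]/points", doc, XPathConstants.NUMBER);
        return d.intValue();   // NaN (no such element) silently becomes 0
    }

    /** Teacher of the course at the given 1-based position. */
    static String teacher(Document doc, int courseIndex) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("/catalog/course[" + courseIndex + "]/teacher", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(teacher(doc, 2) + ", " + points(doc, 2) + " credit points");
    }
}
```

The cast, the requested result kind, and the narrowing to int are all the programmer's responsibility here. A query that matches nothing yields NaN, which narrows to 0 without any error.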

Updates: Assignment to Query Result
• An XPath expression returns a reference to an existing element
  – (No copying is involved)
  – Consistent with Java’s semantics for objects
• Thus, it can be assigned to
  – An XPath expression is a legal lvalue

public static void changePoint(catalog.course c, int p) {
  c [| /points |] = p;
}

• Bulk assignment
  – Occurs when the XPath expression denotes a sequence
  – Bulk assignment operator := allows multiple assignments
  – Double the credit points of each course:

cat [| //points |] *:= 2;
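
The one-line bulk update corresponds to an explicit loop in the DOM world. The following sketch (BulkUpdate and its methods are hypothetical names) doubles every points element with the standard JAXP and DOM APIs. As in XJ, the nodes returned by the query refer to the existing tree, so the update is visible through the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative helper class (an assumption of this sketch).
class BulkUpdate {
    static final String XML =
        "<catalog>"
      + "<course><points>3</points><number>234319</number>"
      + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course>"
      + "<course><points>3</points><number>234141</number>"
      + "<name>Combinatorics for CS</name><teacher>Ran El-Yaniv</teacher></course>"
      + "</catalog>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Doubles the value of every <points> element; returns how many were updated. */
    static int doublePoints(Document doc) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList pts = (NodeList) xp.evaluate("//points", doc, XPathConstants.NODESET);
        for (int i = 0; i < pts.getLength(); i++) {
            Node n = pts.item(i);
            // Manual text -> int -> text round trip; nothing guarantees the text is numeric.
            int value = Integer.parseInt(n.getTextContent().trim());
            n.setTextContent(Integer.toString(value * 2));
        }
        return pts.getLength();
    }

    /** Text of <points> for the course at the given 1-based position. */
    static String pointsOf(Document doc, int courseIndex) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        return xp.evaluate("/catalog/course[" + courseIndex + "]/points", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(doublePoints(doc) + " elements updated");
        System.out.println(pointsOf(doc, 1) + ", " + pointsOf(doc, 2));
    }
}
```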

Tree structure update
• Class XMLObject also defines methods, such as:
  – insertAfter()
  – insertBefore()
  – insertAsFirst()
  – detach()

public static void addCourse(catalog cat) {
  course c = new course(
    <course>
      <points>4</points>
      <number>234111</number>
      <name>Introduction to CS</name>
      <teacher>Roy Friedman</teacher>
    </course>);
  cat.insertAsLast(c);
}

Which object is being modified?

Problems: Type Consistency
• Definitions
  1. An XML update operation, u, is a mapping over XML values
     • u: T1 -> T2
  2. An update is consistent if T1 = T2
     • Why do we want the two types to be equal?
• Ideally, a compile-time error should be triggered for each inconsistent update in the program
• Unfortunately, this cannot be promised
  – Can you think of an example?
• The solution: an additional run-time check

Problems: Covariant subtyping (1/2)
• Covariance: change of type in signature is in the same direction as that of the inheritance

class X { }
class A { public void m(X x) { } }

class X1 extends X { }
class A1 extends A { public void m(X1 x) { } }
...
A a = new A1();
a.m(new X());

  – A1.m() is “spoiled”: it accepts only X1 objects
  – Which method should be invoked: A.m() or A1.m()?
• Java favors type-safety: a method with covariant arguments is considered to be an overloading rather than an overriding
  – The same approach is taken by C++, C#
• But covariance is allowed for arrays
  – Array assignments may fail at run-time
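
Both points can be checked directly in standard Java. In this small sketch the methods return a label so that the selected overload is observable; CovarianceDemo and its helper methods are names introduced only for this illustration.

```java
class X { }
class X1 extends X { }

class A {
    String m(X x) { return "A.m(X)"; }
}

class A1 extends A {
    // Narrower parameter type: this OVERLOADS m, it does not override A.m(X).
    String m(X1 x) { return "A1.m(X1)"; }
}

class CovarianceDemo {
    /** The slide's call: static type A, dynamic type A1, argument of type X. */
    static String callThroughBase() {
        A a = new A1();
        return a.m(new X());
    }

    /** Even with an X1 argument, the static type A only offers m(X). */
    static String callThroughBaseWithX1() {
        A a = new A1();
        return a.m(new X1());
    }

    /** Arrays are covariant, so the bad store is detected only at run time. */
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];   // accepted by the compiler
        try {
            objects[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(callThroughBase());
        System.out.println(callThroughBaseWithX1());
        System.out.println(new A1().m(new X1()));
        System.out.println(arrayStoreFails());
    }
}
```

Through a reference of type A, A.m(X) is always the method that runs, so A1.m(X1) can never receive a plain X. Arrays trade that static guarantee for a run-time check on every store.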

Problems: Covariant subtyping (2/2)
(Now let us get back to our technioncatalog schema…)
• A <course> value is also spoiled
  – It requires unique children: <points>, <name>, etc.
• But it also has an unspoiled super-class: XMLObject
  – All updates to XMLObject are legal at compile-time
• The following code compiles successfully:

public static void trick(course c) {
  XMLObject x = c;
  points p = new points(<points>4</points>);
  x.appendAsLast(p);   // Run-time error is here!
}
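
A plain-Java analogue of such a run-time check is to re-validate the tree against the schema after an untyped update. The sketch below uses the standard javax.xml.validation API and makes no claim about how XJ implements its own check. The class RuntimeValidation and its methods are hypothetical, and the schema is the technioncatalog schema inlined as a string.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Illustrative helper class (an assumption of this sketch).
class RuntimeValidation {
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='catalog'><xs:complexType><xs:sequence>"
      + "<xs:element name='course' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
      + "<xs:element name='points' type='xs:int'/>"
      + "<xs:element name='number' type='xs:int'/>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='teacher' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static final String XML =
        "<catalog><course><points>3</points><number>234319</number>"
      + "<name>Programming Languages</name><teacher>Ron Pinter</teacher></course></catalog>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);   // needed to validate a DOM tree against a schema
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** True if the current state of the tree is valid with respect to XSD. */
    static boolean conformsToSchema(Document doc) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    /** Mirrors trick(): appends a second <points> through the untyped DOM API. */
    static void trick(Document doc) {
        Element course = (Element) doc.getDocumentElement().getFirstChild();
        Element extra = doc.createElementNS(null, "points");
        extra.setTextContent("4");
        course.appendChild(extra);   // compiles and runs; the tree is now invalid
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(conformsToSchema(doc));
        trick(doc);
        System.out.println(conformsToSchema(doc));
    }
}
```

With the DOM the inconsistency goes unnoticed until someone validates explicitly. The slide's point is that XJ raises the error at the offending update itself.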

Shaping the future (revisited)
• Language constructs seen so far
  – Typed XML objects
  – Seamless translation of a Schema/DTD into a Java type
  – Two composition techniques
    • XML notation
    • Java’s object creation syntax
  – Two decomposition techniques
    • Typed XPath
    • Typed, named methods/fields
  – XPath expressions as first-class values

XPath expressions as first-class values
• What is a first-class value?
  – A value that can be used “naturally” in the program
    • Passed as an argument
    • Stored in a variable/field
    • Returned from a method
    • Created
• In XJ, XPath expressions do not meet these conditions
  – The main obstacle: the XPath part of the expression cannot be separated from its context provider

XPath expressions as first-class values (cont’d)
• Let’s speculate on XPath as an FCV…
• (The following code IS NOT a legal XJ program)

private static Sequence<teacher> teachers;

static Sequence<teacher> find(XPath<catalog,teacher> q) {
  catalog c = new catalog(new File("file1.xml"));
  return q.evaluate(c);
}

static void main(String[] args) {
  Sequence<teacher> all = find(<catalog>[| //teacher |]);
  Sequence<teacher> few = find(<catalog>[| //course[number = 234319]/teacher |]);
}

XPath expressions as first-class values (cont’d)
• Operators on XPath values
  – Composition
  – Conjunction
  – Disjunction
• These operators will allow the developer to easily create a rich array of safe XPath values
• The compiler must keep track of the type of each such value
  – Basically, an XPath value is a function T -> R, where both T and R are subclasses of XMLObject
  – When two XPath values are composed, the result type is deduced from the types of the operands
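
The idea of a path as a typed function T -> R can be approximated with Java generics. The sketch below is purely speculative, in the spirit of the slides: TypedPath, then, or, and the stand-in classes Catalog and Course are all invented for illustration and are unrelated to XJ's actual types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical typed path: maps a context of type T to results of type R. */
final class TypedPath<T, R> {
    private final Function<T, List<R>> step;

    TypedPath(Function<T, List<R>> step) { this.step = step; }

    List<R> evaluate(T context) { return step.apply(context); }

    /** Composition: (T -> R) followed by (R -> S) yields T -> S. */
    <S> TypedPath<T, S> then(TypedPath<R, S> next) {
        return new TypedPath<T, S>(ctx -> {
            List<S> out = new ArrayList<>();
            for (R r : evaluate(ctx)) out.addAll(next.evaluate(r));
            return out;
        });
    }

    /** Disjunction: results of this path followed by results of the other. */
    TypedPath<T, R> or(TypedPath<T, R> other) {
        return new TypedPath<T, R>(ctx -> {
            List<R> out = new ArrayList<>(evaluate(ctx));
            out.addAll(other.evaluate(ctx));
            return out;
        });
    }
}

// Hand-written stand-ins for schema-derived types (assumptions of this sketch).
final class Course {
    final String name;
    final String teacher;
    Course(String name, String teacher) { this.name = name; this.teacher = teacher; }
}

final class Catalog {
    final List<Course> courses;
    Catalog(Course... courses) { this.courses = Arrays.asList(courses); }
}

class TypedPathDemo {
    static final TypedPath<Catalog, Course> COURSES =
        new TypedPath<Catalog, Course>(c -> c.courses);
    static final TypedPath<Course, String> TEACHER =
        new TypedPath<Course, String>(c -> Arrays.asList(c.teacher));
    static final TypedPath<Course, String> NAME =
        new TypedPath<Course, String>(c -> Arrays.asList(c.name));

    // The result type Catalog -> String is deduced from the operand types.
    static final TypedPath<Catalog, String> ALL_TEACHERS = COURSES.then(TEACHER);
    // COURSES.then(COURSES) would be rejected by javac: a Course is not a Catalog.

    public static void main(String[] args) {
        Catalog cat = new Catalog(
            new Course("Programming Languages", "Ron Pinter"),
            new Course("Combinatorics for CS", "Ran El-Yaniv"));
        System.out.println(ALL_TEACHERS.evaluate(cat));
        System.out.println(COURSES.then(NAME.or(TEACHER)).evaluate(cat));
    }
}
```

Here the Java compiler does the bookkeeping the slide asks for: an ill-typed composition fails to compile. What generics cannot express is the schema-level knowledge of which paths are guaranteed to be non-empty.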

Scala: Composition of XML elements
• In Scala, types can be defined in a DTD file
  – A DTD can be translated into Scala classes via the dtd2scala utility
• Scala offers two options for composition of XML elements:
  – Using XML notation (similar to XJ)
  – Using case-class construction notation:

import Data._;       // import generated definitions
import scala.xml._;  // for creating PCDATA nodes

object Main with Application {
  val x = course(
    teacher(Text("Ran El-Yaniv")),
    points(Text("3")),
    name(Text("Combinatorics for CS")),
    number(Text("234141")));
  Console.println(x);
}

Typed, named methods/fields
• Usually, values aggregated by a Java object are accessed by fields/methods
  – Can we access XML sub-elements this way?
  – (The following code IS NOT a legal XJ program)

import technioncatalog.*;

void printTeachers(catalog cat) {
  for (int i = 0; i < cat.courses.length; ++i) {
    catalog.course c = cat.courses[i];
    System.out.println(c.teacher);
  }
}

Typed, named methods/fields (cont’d)
• Some of the difficulties:
  – Sub-elements are not always named
  – Schema supports optional types: <xsd:choice>
    • How can Java express an “optional” field?
• Observation: Java’s typing mechanisms cannot capture the wealth of Schema/DTD types
  – Missing features: virtual fields, inheritance without polymorphism
  – Other features can be found in functional languages
    • E.g.: variant types, immutability, structural conformance
    • But their popularity lags behind
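
As a partial answer to the “optional field” question: Java 8, released years after this lecture, added java.util.Optional, which can at least make the absence of a sub-element explicit in an accessor's type. The sketch below is an illustration under assumptions. CourseInfo is a hand-written, hypothetical class, and the optional students element is the schema change imagined earlier in the talk. It does not address xsd:choice, which calls for variant types.

```java
import java.util.Optional;

/** Hypothetical hand-written binding for a course with an optional <students> element. */
final class CourseInfo {
    private final String name;
    private final Integer students;   // null when <students> is absent

    CourseInfo(String name, Integer students) {
        this.name = name;
        this.students = students;
    }

    String name() { return name; }

    /** The return type itself tells callers that the value may be missing. */
    Optional<Integer> students() { return Optional.ofNullable(students); }

    String describe() {
        return name + students().map(n -> " (" + n + " students)").orElse("");
    }

    public static void main(String[] args) {
        System.out.println(new CourseInfo("Combinatorics for CS", null).describe());
        System.out.println(new CourseInfo("Programming Languages", 40).describe());
    }
}
```

This captures optionality only by convention. Nothing ties the class to the schema, so a schema change still has to be mirrored by hand.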

Summary
• XJ is a Java extension that has built-in support for XML
  – Type safety: many things are checked at compile time
  – Ease of use
• OO languages are not powerful enough (in terms of typing)
  – Some type information is lost in the transition Schema -> Java
-The End-