1
Using XML in .NET
Sorting through the choices
2
XML and APIs
• XML is a series of W3C recommendations
• Focus on structure and behavior
• W3C says very little about API implementation
• The only real API defined by the W3C is the Document Object Model (a.k.a. DOM)
• W3C does not mandate any particular API
3
Relevant W3C Recommendations
• Extensible Markup Language 1.0
  – Defines syntax
  – Defines document node types
    Core content represented as Elements, Attributes and Text
    Non-core content represented as Comments and Processing Instructions
4
W3C Recommendations (cont.)
• Namespaces
  – Defines rules for unique element and attribute naming
  – Uniqueness achieved through the use of Uniform Resource Identifiers (URI)
  – All .NET APIs support namespaces
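A minimal sketch of namespace-aware access through the DOM. The sample document, the `urn:example:invoices` URI, and the class and method names are assumptions made for illustration. Note that the prefix used in the query does not have to match the prefix used in the document; only the URI is compared.

```csharp
using System;
using System.Xml;

static class NamespaceDemo
{
    // Hypothetical sample document and namespace URI, for illustration only.
    const string Xml =
        "<inv:Invoice xmlns:inv='urn:example:invoices'>" +
        "<inv:Price>75</inv:Price></inv:Invoice>";

    public static string ReadPrice()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(Xml);

        // Map a query prefix ("i") to the namespace URI used by the document.
        XmlNamespaceManager ns = new XmlNamespaceManager(dom.NameTable);
        ns.AddNamespace("i", "urn:example:invoices");

        XmlNode price = dom.SelectSingleNode("/i:Invoice/i:Price", ns);
        return price.InnerText;
    }

    static void Main()
    {
        Console.WriteLine(ReadPrice());
    }
}
```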
5
W3C Recommendations (cont.)
• XML Information Set (Infoset)
  – Defines information items to represent XML node types
  – Abstracts data relationships from syntax
  – Abstracts application from document encoding
    Content represented to applications as Unicode
  – Most higher-level recommendations are based on the Infoset
  – .NET APIs mostly represent content consistently with the Infoset
6
W3C Recommendations (cont.)
• XPath
  – XML document query language
  – Syntactically similar to directory paths
    /Invoices/Invoice[descendant::Price > 50]
  – Identify what you want, not how to get it
    Loosely analogous to SQL
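The path expression above can be tried against a small in-memory document. This is only a sketch: the invoice data and the class and method names are made up for the example.

```csharp
using System;
using System.Xml;

static class XPathQueryDemo
{
    // Hypothetical invoice data: one invoice below the threshold, one above it.
    const string Xml =
        "<Invoices>" +
        "<Invoice id='1'><Item><Price>20</Price></Item></Invoice>" +
        "<Invoice id='2'><Item><Price>75</Price></Item></Invoice>" +
        "</Invoices>";

    public static int CountLargeInvoices()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(Xml);

        // Select every Invoice that has a descendant Price greater than 50.
        XmlNodeList hits =
            dom.SelectNodes("/Invoices/Invoice[descendant::Price > 50]");
        return hits.Count;
    }

    static void Main()
    {
        Console.WriteLine("Matching invoices: {0}", CountLargeInvoices());
    }
}
```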
7
Verifying Document Content
• Well-formed
  – Document conforms to the XML 1.0 recommendation
  – All .NET APIs enforce well-formedness
• Validation
  – Document conforms to defined content rules
  – .NET-supported validation types
    Document Type Definitions, W3C XML Schema, XML-Data Reduced (XDR)
  – Not all .NET APIs support validation
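A short sketch of the well-formedness check in practice: loading a document with a mismatched tag raises an `XmlException`. The helper name `IsWellFormed` and the sample strings are assumptions for illustration.

```csharp
using System;
using System.Xml;

static class WellFormedDemo
{
    // Returns true when the parser accepts the text as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects the document at the first syntax error.
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsWellFormed("<a><b/></a>"));  // closed properly
        Console.WriteLine(IsWellFormed("<a><b></a>"));   // <b> is never closed
    }
}
```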
8
.NET XML APIs
• Document Object Model (DOM)
  – Implements the W3C DOM recommendation
  – A collection of classes representing the various Infoset information items
  – Indirectly supports validation
  – Relatively resource intensive
  – Only API that both reads and writes
  – Excellent for random-access processing
  – Easiest migration if experienced with MSXML’s DOMDocument
9
.NET XML APIs (cont.)
• XPathNavigator
  – A scrollable, cursor-based document reader
  – Indirectly supports document validation
  – May be more resource efficient than DOM
  – Well-suited for processing document subsets
10
.NET XML APIs (cont.)
• XmlReader
  – Abstract (MustOverride) forward-only document reader
  – Well-suited for sequential processing
  – XmlTextReader
    Derived from XmlReader
    Fastest document reader
    Does not support validation
  – XmlValidatingReader
    Derived from XmlReader
    Directly supports validation
11
.NET XML APIs (cont.)
• XmlWriter
  – Abstract (MustOverride) forward-only document writer
  – Well-suited for sequential document generation
  – XmlTextWriter
    Derived from XmlWriter
    Fastest document writer
    Does not support validation
12
Where’s the SAX Parser?
• .NET does not provide a SAX parser
• Benefits of SAX available through XmlReader implementations
• Microsoft asserts that XmlReader provides several benefits over SAX
  – XmlReader’s “pull” model is simpler to program than SAX’s “push” model
  – Pull model allows program to be optimized for specific document structure
  – Simpler programming when multiple documents are processed simultaneously
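A minimal sketch of the pull model: the application asks for the next node when it is ready, rather than receiving callbacks from the parser. The sample document and the `CountElements` helper are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class PullModelDemo
{
    // Counts elements with the given name by pulling nodes one at a time.
    public static int CountElements(string xml, string name)
    {
        int count = 0;
        XmlTextReader rdr = new XmlTextReader(new StringReader(xml));

        // The application drives the loop; the reader never calls back.
        while (rdr.Read())
        {
            if (rdr.NodeType == XmlNodeType.Element && rdr.Name == name)
                count++;
        }
        rdr.Close();
        return count;
    }

    static void Main()
    {
        // Hypothetical sample document.
        string xml = "<Classes><Class/><Class/><Other/></Classes>";
        Console.WriteLine("Class elements: {0}", CountElements(xml, "Class"));
    }
}
```

Because the loop belongs to the application, it can stop early or skip ahead once it has what it needs, which is harder to arrange with push-style callbacks.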
13
Document Object Model
14
Document Object Model (DOM)
• A hierarchy of classes representing the various document nodes
• Classes in System.Xml
• Well-suited for random access and dynamic modification
• All node classes inherit from XmlNode
• XmlDocument is the only class an application creates directly; all other nodes are created through it
• Contents validated if populated through XmlValidatingReader
15
.NET DOM Class Hierarchy
Classes in System.Xml

System.Object
  XmlNode
    XmlAttribute
    XmlDocument
    XmlDocumentFragment
    XmlEntity
    XmlNotation
    XmlLinkedNode
      XmlCharacterData
        XmlCDataSection
        XmlComment
        XmlSignificantWhitespace
        XmlText
        XmlWhitespace
      XmlDeclaration
      XmlDocumentType
      XmlElement
      XmlEntityReference
      XmlProcessingInstruction
16
XmlNode
• XmlNode
  – Abstract (MustOverride) class that generically represents each node
  – Properties & methods to manage node relationships
  – Properties expose node type, name, namespace and content
  – Meaning of name and content varies depending on node type
17
XmlNode Name & Value Definitions
NodeType                Name             Value
Element                 Element QName    null
Attribute               Attribute QName  Attribute Value
Text                    #text            Text
Document                #document        null
XmlDeclaration          xml              Declaration Content
Comment                 #comment         Comment Text
Processing Instruction  PI Target        PI Data
18
XmlNode Relationships
• Relationships managed through read-only XmlNode reference properties
  – OwnerDocument
  – ParentNode
  – Siblings
    PreviousSibling, NextSibling
  – Children
    FirstChild, LastChild
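The relationship properties listed above can be exercised on a tiny document. A sketch only: the two-element sample and the `Describe` helper are assumptions for the example.

```csharp
using System;
using System.Xml;

static class RelationshipDemo
{
    public static string Describe()
    {
        XmlDocument dom = new XmlDocument();
        // Hypothetical sample document with two sibling elements.
        dom.LoadXml("<Classes><Class name='A'/><Class name='B'/></Classes>");

        XmlNode first = dom.DocumentElement.FirstChild;

        return string.Format("{0}|{1}|{2}|{3}",
            first.ParentNode.Name,                            // parent element
            first.NextSibling.Attributes["name"].Value,       // following sibling
            first.PreviousSibling == null,                    // no earlier sibling
            object.ReferenceEquals(first.OwnerDocument, dom)  // owning document
        );
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```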
19
DOM Tree Walker
static void Main(string[] args)
{
    XmlDocument dom = new XmlDocument();
    dom.Load(@"C:\DataFiles\Classes.xml");
    TreeWalk(dom);
}

// Depth-first walk: print the node, then its children, then its following siblings.
static void TreeWalk(XmlNode node)
{
    if (node == null) return;
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
                      node.NodeType, node.Name, node.Value);
    TreeWalk(node.FirstChild);
    TreeWalk(node.NextSibling);
}
20
XmlNode Collections
• Child Nodes
  – ChildNodes property returns an XmlNodeList
    Nodes accessed either through the Item method or the indexer []
  – XmlNode also exposes an indexer [] to access child elements by name
    Invoice["Price"]
21
XmlNode Collections (cont.)
• Attribute Nodes
  – Attributes property returns an XmlAttributeCollection
    Attributes accessed through an indexer [] by either name or position
    Attributes added or changed through the SetNamedItem method
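A brief sketch of attribute access by name and by position, plus adding an attribute with SetNamedItem. The `room` attribute, the sample element, and the helper name are assumptions for illustration.

```csharp
using System;
using System.Xml;

static class AttributeDemo
{
    public static string AddRoom()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml("<Class name='.NET XML'/>");  // hypothetical sample element
        XmlElement el = dom.DocumentElement;

        // New attribute nodes are created by the owning document.
        XmlAttribute room = dom.CreateAttribute("room");
        room.Value = "101";
        el.Attributes.SetNamedItem(room);  // adds, or replaces one with the same name

        return string.Format("{0}:{1}:{2}",
            el.Attributes.Count,           // number of attributes
            el.Attributes[0].Value,        // access by position
            el.Attributes["room"].Value);  // access by name
    }

    static void Main()
    {
        Console.WriteLine(AddRoom());
    }
}
```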
22
DOM Tree Walker Using Collections
static void CollectWalk(XmlNode node)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
                      node.NodeType, node.Name, node.Value);
    if (node.HasChildNodes)
    {
        XmlNodeList nodeList = node.ChildNodes;
        foreach (XmlNode child in nodeList)
            CollectWalk(child);
    }
}
23
DOM Tree Walker Using Collections
static void CollectWalk(XmlNode node)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
                      node.NodeType, node.Name, node.Value);

    // Attributes is null for node types that cannot carry attributes.
    if (node.Attributes != null)
        foreach (XmlAttribute Attr in node.Attributes)
            Console.WriteLine("\tAttr: {0}={1}", Attr.Name, Attr.Value);

    if (node.HasChildNodes)
    {
        XmlNodeList nodeList = node.ChildNodes;
        foreach (XmlNode child in nodeList)
            CollectWalk(child);
    }
}
24
XPath Support
• SelectNodes
  – Returns an XmlNodeList containing the selection result
• SelectSingleNode
  – Returns an XmlNode containing only the first node in the selection result

static void ShowStudentNames(XmlNode node)
{
    // Select the Name attribute of every Student child of the given node.
    XmlNodeList nl = node.SelectNodes("Student/@Name");
    foreach (XmlNode n in nl)
        Console.WriteLine("Student:{0}", n.Value);
}
25
DOM Modification
• Node Creation
  – New nodes must be created by XmlDocument
• XmlNode Placement Methods
  – InsertBefore, InsertAfter
  – PrependChild, AppendChild
  – RemoveChild, ReplaceChild, RemoveAll
• Modification events managed through delegates
  – NodeChanging, NodeChanged
  – NodeInserting, NodeInserted
  – NodeRemoving, NodeRemoved
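A small sketch of a modification event in use: a NodeInserted handler records each node as it is attached to the tree. The class name, the log format, and the element names are assumptions for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

static class EventDemo
{
    static StringBuilder log;

    // Called by the document after each node is inserted into the tree.
    static void OnInserted(object sender, XmlNodeChangedEventArgs e)
    {
        if (log.Length > 0) log.Append(",");
        log.Append(e.Node.Name);
    }

    public static string BuildAndLog()
    {
        log = new StringBuilder();
        XmlDocument dom = new XmlDocument();
        dom.NodeInserted += new XmlNodeChangedEventHandler(OnInserted);

        // Creating a node raises no event; inserting it does.
        XmlElement root = dom.CreateElement("Classes");
        dom.AppendChild(root);
        root.AppendChild(dom.CreateElement("Class"));

        return log.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BuildAndLog());
    }
}
```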
26
DOM Modification

static void BuildClass(XmlDocument dom, string Path)
{
    // Declaration node so the saved file starts with <?xml ... ?>
    dom.AppendChild(dom.CreateXmlDeclaration("1.0", "utf-8", null));

    XmlElement cs = dom.CreateElement("Classes");
    dom.AppendChild(cs);

    XmlElement c = dom.CreateElement("Class");
    c.SetAttribute("name", ".NET XML");
    cs.AppendChild(c);

    // AppendChild returns the node just added, so text can be attached to it.
    XmlNode n = c.AppendChild(dom.CreateElement("Students"));
    n.AppendChild(dom.CreateTextNode("12"));
    n = c.AppendChild(dom.CreateElement("Location"));
    n.AppendChild(dom.CreateTextNode("Maine Bytes"));
    n = c.AppendChild(dom.CreateElement("Instructor"));
    n.AppendChild(dom.CreateTextNode("Jim"));

    dom.Save(Path);
}

<?xml version="1.0" encoding="utf-8"?>
<Classes>
  <Class name=".NET XML">
    <Students>12</Students>
    <Location>Maine Bytes</Location>
    <Instructor>Jim</Instructor>
  </Class>
</Classes>
27
XPathNavigator
28
XPathNavigator
• Classes in System.Xml.XPath
• Read-only
• Provides a scrolling cursor “window” over the document
• Great support for document filtering
• Best XPath support
• Content is interpreted according to XPath specification
29
Creating the Navigator
• XPathNavigator is an abstract (MustOverride) class
• Must be created by a factory object
• Factory objects must implement IXPathNavigable
  – XPathDocument implementation creates an efficient navigator cache
    Can be populated from XmlValidatingReader
  – XmlNode implementation creates a navigator over the corresponding DOM instance
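A sketch of the two factories side by side: one navigator over an XPathDocument cache and one over a DOM instance, both answering the same query. The sample document and method names are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

static class NavigatorFactoryDemo
{
    // Hypothetical sample document.
    const string Xml = "<Classes><Class/><Class/></Classes>";

    // Navigator over the read-only XPathDocument cache.
    public static double CountViaXPathDocument()
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        XPathNavigator nav = doc.CreateNavigator();
        return (double)nav.Evaluate("count(//Class)");
    }

    // Navigator over an existing DOM instance.
    public static double CountViaDom()
    {
        XmlDocument dom = new XmlDocument();
        dom.LoadXml(Xml);
        XPathNavigator nav = dom.CreateNavigator();
        return (double)nav.Evaluate("count(//Class)");
    }

    static void Main()
    {
        Console.WriteLine(CountViaXPathDocument());
        Console.WriteLine(CountViaDom());
    }
}
```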
30
XPathNavigator Name & Value Definitions
NodeType                Name             Value
Element                 Element QName    Concatenated value of descendant text nodes in document order
Attribute               Attribute QName  Attribute Value
Text                    (empty string)   Text
Root                    (empty string)   Concatenated value of all text nodes in document order
Comment                 (empty string)   Comment Text
Processing Instruction  PI Target        PI Data
31
Cursor Navigation
• Cursor is controlled by “MoveTo…” methods
  – MoveToRoot
  – MoveToParent
  – MoveToFirstChild
  – Siblings
    MoveToFirst, MoveToPrevious, MoveToNext
  – Attributes
    MoveToFirstAttribute, MoveToNextAttribute
32
Navigator Tree Walker
static void Main(string[] args)
{
    XPathDocument doc = new XPathDocument(Path);
    XPathNavigator nav = doc.CreateNavigator();
    TreeWalk(nav);
}

static void TreeWalk(XPathNavigator nav)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
                      nav.NodeType, nav.Name, nav.Value);
    if (nav.HasChildren)
    {
        nav.MoveToFirstChild();
        TreeWalk(nav);
        nav.MoveToParent();   // return the cursor to where this call started
    }
    if (nav.MoveToNext())
        TreeWalk(nav);
}
33
Navigator Tree Walker
static void TreeWalk(XPathNavigator nav)
{
    Console.WriteLine("Type: {0}\tName: {1}\tValue: {2}",
                      nav.NodeType, nav.Name, nav.Value);

    // MoveToFirstAttribute must be called before MoveToNextAttribute;
    // it returns false when the current node has no attributes.
    if (nav.MoveToFirstAttribute())
    {
        do
        {
            Console.WriteLine("\tAttr: {0}={1}", nav.Name, nav.Value);
        } while (nav.MoveToNextAttribute());
        nav.MoveToParent();   // back from the attribute to its element
    }

    if (nav.HasChildren)
    {
        nav.MoveToFirstChild();
        TreeWalk(nav);
        nav.MoveToParent();
    }
    if (nav.MoveToNext())
        TreeWalk(nav);
}
34
Processing Document Subsets
• Selection Methods
  – SelectChildren, SelectDescendants, SelectAncestors
  – All return an XPathNodeIterator
• XPathNodeIterator
  – Represents a cursor over the selected set
  – MoveNext method
    Advances the cursor
  – Current property
    Returns an XPathNavigator positioned at the current node
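A minimal sketch of iterating a selected subset with SelectChildren and XPathNodeIterator. The sample document and the `ChildNames` helper are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

static class SubsetDemo
{
    public static string ChildNames()
    {
        // Hypothetical sample document.
        string xml = "<Classes><Class name='A'/><Note/><Class name='B'/></Classes>";
        XPathNavigator nav =
            new XPathDocument(new StringReader(xml)).CreateNavigator();

        nav.MoveToFirstChild();   // from the root node to the Classes element

        // Select only the Class children (empty string = no namespace).
        XPathNodeIterator it = nav.SelectChildren("Class", "");

        StringBuilder names = new StringBuilder();
        while (it.MoveNext())
        {
            if (names.Length > 0) names.Append(",");
            // Current is a navigator positioned on the selected node.
            names.Append(it.Current.GetAttribute("name", ""));
        }
        return names.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ChildNames());
    }
}
```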
35
XPath Support
• Select method
  – Generic XPath selection
  – Node set returned as an XPathNodeIterator
• Evaluate method
  – Returns typed XPath results
  – XPath supports arithmetic operations, logical operations and function calls
  – XPath statements can return numeric, string and Boolean results
36
Navigator XPath Support
static void LargeClasses(XPathNavigator nav)
{
    XPathNodeIterator nodes = nav.Select("//Class[Students > 10]");
    while (nodes.MoveNext())
    {
        XPathNavigator classNav = nodes.Current;
        classNav.MoveToAttribute("name", "");
        Console.WriteLine("Class: {0}", classNav.Value);
    }
}

static void CountStdnt(XPathNavigator nav)
{
    // sum() yields an XPath number, which Evaluate returns as a double.
    double total = (double) nav.Evaluate("sum(//Students)");
    Console.WriteLine("Total Students: {0}", total);
}
37
Enhancing XPath Performance
• XPath statements can be compiled to improve performance
  – XPathNavigator.Compile
    Compiles an XPath string into an XPathExpression
  – XPathExpression represents a compiled XPath statement
  – Select & Evaluate are overloaded to support XPathExpressions in addition to XPath strings

static void CountStudents(XPathNavigator nav)
{
    XPathExpression exp = nav.Compile("sum(//Students)");
    Console.WriteLine("Total:{0}", nav.Evaluate(exp));
}
38
Sequential Reading
39
XmlReader
• Abstract (MustOverride) class
• Represents a forward-only document reader
• Exposes information only for the current node position
  – NodeType, Name, NamespaceURI, Value
• Application must handle some lexical issues
40
XmlReader (cont.)
• Read method
  – Supports generic document processing
  – Reads the next hierarchical node
  – Application code must manage details of each node
• Attributes must be specifically read
  – MoveToFirstAttribute, MoveToNextAttribute
    Iterates through the attribute list
  – MoveToAttribute
    Moves to a named attribute or attribute position
41
XmlReader (cont.)
• ReadStartElement & ReadEndElement provide element optimizations
  – Reader verifies that node is an element
  – Overloads support name and namespace verification
• ReadElementString
  – Encapsulates node type, name & namespace verification
  – Reads element start & end tags and text child
  – Returns value of text child
• MoveToContent
  – Skips over white space, comments & Processing Instructions
42
XmlTextReader
• Derived from XmlReader
• Most performant .NET XML reader
• Adds properties to report file information
  – LineNumber, LinePosition, Encoding
• Adds methods to simplify large data block handling
  – ReadBase64, ReadBinHex, ReadChars
43
XmlTextReader
static void Main(string[] args)
{
    ClassInfo(new XmlTextReader(Path));
}

static void ClassInfo(XmlTextReader Rdr)
{
    Rdr.MoveToContent();
    Rdr.ReadStartElement("Classes");
    Rdr.MoveToContent();

    // Each pass reads one <Class> element; the loop ends at </Classes>.
    while (Rdr.Name != "Classes")
    {
        Console.Write("{0}|", Rdr["name"]);
        Rdr.ReadStartElement("Class");
        Console.Write("{0}|", Rdr.ReadElementString("Students"));
        Console.Write("{0}|", Rdr.ReadElementString("Location"));
        Console.WriteLine("{0}", Rdr.ReadElementString("Instructor"));
        Rdr.ReadEndElement();
        Rdr.MoveToContent();
    }
}
44
XmlValidatingReader
• Derived from XmlReader
• Provides document validation over an existing XmlReader instance
• Validation errors reported to a delegate or by throwing an exception
  – Delegates registered with the ValidationEventHandler event
  – If no delegate is registered then an XmlException is thrown
45
XmlValidatingReader
• XmlDocument and XPathDocument can be populated through XmlValidatingReader
void LoadDom(XmlDocument dom, string fileName)
{
    XmlTextReader TRdr = new XmlTextReader(fileName);
    XmlValidatingReader VRdr = new XmlValidatingReader(TRdr);
    VRdr.ValidationEventHandler += new ValidationEventHandler(vCallBack);
    dom.Load(VRdr);
}

void vCallBack(object sender, ValidationEventArgs args)
{
    . . .
}
46
XmlValidatingReader
• ValidationType & Schemas properties can be used to manage the validation process
• ReadTypedValue returns value as the proper CLR type
  – Must be using XML Schema or XDR validation
47
Sequential Writing
48
XmlWriter
• Abstract (MustOverride) class
• Represents a forward-only, sequential document writer
• Checks well-formedness of generated content
• Does not validate
49
XmlWriter (cont.)
• Provides “Write” methods for the various node types
  – WriteStartElement, WriteEndElement, WriteString, WriteComment, etc.
  – WriteElementString writes start tag, end tag and character child in a single call
• WriteDocType method supports writing DTD entries
• WriteRaw method allows pass-through writing of raw XML
  – Writer does not check well-formedness of raw writes
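A short sketch of the Write methods in sequence, writing to an in-memory StringWriter so the result can be inspected. The element names, comment text, and `Build` helper are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

static class WriterDemo
{
    public static string Build()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter wrt = new XmlTextWriter(sw);

        wrt.WriteStartElement("Classes");          // <Classes>
        wrt.WriteComment("sample");                // <!--sample-->
        wrt.WriteElementString("Students", "12");  // start tag, text, end tag
        wrt.WriteEndElement();                     // </Classes>
        wrt.Close();

        return sw.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Build());
    }
}
```

No XML declaration appears in the result because WriteStartDocument is never called in this sketch.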
50
XmlTextWriter
• Derives from XmlWriter
• Adds formatting control
  – Formatting, Indentation, IndentChar & QuoteChar properties
• Adds methods to simplify large data block handling
  – WriteBase64, WriteBinHex, WriteChars
51
XmlTextWriter

static void WriteClass(XmlTextWriter wrt)
{
    wrt.Formatting = Formatting.Indented;
    wrt.WriteStartDocument();
    wrt.WriteStartElement("Classes");
    wrt.WriteAttributeString("name", ".NET XML");
    wrt.WriteElementString("Students", "12");
    wrt.WriteElementString("Location", "Maine Bytes");
    wrt.WriteElementString("Instructor", "Jim");
    wrt.WriteEndElement();    // </Classes>
    wrt.WriteEndDocument();
}

static void Main(string[] args)
{
    XmlTextWriter wrt = new XmlTextWriter(Path, Encoding.UTF8);
    WriteClass(wrt);
    wrt.Close();
}

<?xml version="1.0" encoding="utf-8"?>
<Classes name=".NET XML">
  <Students>12</Students>
  <Location>Maine Bytes</Location>
  <Instructor>Jim</Instructor>
</Classes>
52
Summary
• Each API is optimized for a different use
• DOM
  – Random access, dynamic updates
• XPathNavigator
  – Document subsets, rich XPath support
• XmlReader
  – Abstract class, models sequential reading
• XmlTextReader
  – Most performant reader
53
Summary (cont.)
• XmlValidatingReader
  – Validation; usable directly or with DOM & XPathDocument
• XmlWriter
  – Abstract class, models sequential writing
• XmlTextWriter
  – Most performant writer
• Often the best solution is to use a combination of the APIs
54
Download & Contact Information
Presentation Download: http://www.jwhedgehog.com/MaineBytes0701
Sample Code Download: http://www.jwhedgehog.com/MaineBytes0701