8/14/2019 Session02 XML Syntax
1/36
2008 MindTree Consulting
XML Syntax
Rules of XML Language
Sep-2009
Slide 2
Agenda
Need for XML
Quiz
XML Syntax - Rules of XML language
Need For XML
Revision of previous session
Quiz
Slide 5
XML
How can you have both more tags and fewer tags in a single language?
To resolve this dilemma, XML makes essentially two changes to HTML:
It predefines no tags.
It is stricter.
Slide 6
What is Markup
In an electronic document, the markup is the set of codes embedded in the document text that store the information required for
electronic processing, such as font name, boldness or, in the case of
XML, the document structure. This is not specific to XML: every
electronic document standard uses some sort of markup.
Slide 7
Applications of XML
Publishing
XML is being used by an increasing number of publishers as the format for documents.
Example: an XML document for a monthly newsletter uses elements for the title, abstract, paragraphs, and other concepts common in publishing.
Business Document Exchange
For example, an order can be placed in XML rather than on paper. The advantage is that software can process it: an application could read the order and fulfill it automatically.
RSS / Atom
E.g. Bloglines
XML Introduction - Quiz
Basic questions on XML Introduction
Slide 9
XML Introduction - Quiz
XML stands for
XML is about the description of data, and not its presentation.
XML allows us to define our own tags, so we can create our own markup languages.
The XML specification is owned by the W3C.
XML is designed to be both machine readable and human readable.
XML provides a platform-neutral, language-independent means of describing data.
It is the markup that differentiates an XML document from plain text.
The XML Syntax
Start & End Tags, Elements, Element Nesting, XML Names, Attributes, XML Declaration, Entities, CDATA, Comments, Processing Instructions, Well-Formed XML
Slide 11
XML - Example
Listing 2.1: An Address Book in XML
John Doe
34 Fountain Square Plaza
OH
45202
Cincinnati
US
513-555-8889
513-555-7098
Jack
Smith
513-555-3465
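A minimal sketch of how the address book data above might be marked up and read back. Only the names address-book, entry, tel and preferred come from these slides; every other element name and the attribute value "true" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only address-book, entry, tel and preferred are
# names used in the slides; the remaining names are assumed.
ADDRESS_BOOK = """<?xml version="1.0"?>
<address-book>
  <entry>
    <name>John Doe</name>
    <address>
      <street>34 Fountain Square Plaza</street>
      <region>OH</region>
      <postal-code>45202</postal-code>
      <locality>Cincinnati</locality>
      <country>US</country>
    </address>
    <tel preferred="true">513-555-8889</tel>
    <tel>513-555-7098</tel>
  </entry>
  <entry>
    <name><fname>Jack</fname><lname>Smith</lname></name>
    <tel>513-555-3465</tel>
  </entry>
</address-book>"""

root = ET.fromstring(ADDRESS_BOOK)

# The root element holds two entry children; count the tel elements in each.
entries = root.findall("entry")
tel_counts = [len(entry.findall("tel")) for entry in entries]
```

Here root.tag is the name of the single root element, and tel_counts holds the number of telephone numbers per entry.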
Slide 12
Elements - Start and End Tags
The building block of XML is the element, as that is what XML documents are made of. Each element has a name and a content.
<tel>513-555-7098</tel>
The content of an element is delimited by special markup known
as the start tag and the end tag.
Unlike HTML, both start and end tags are required. The following is
not correct in XML:
<tel>513-555-7098
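The rule that both tags are required can be checked with any XML processor. The sketch below uses Python's standard xml.etree.ElementTree parser; the helper name parses is an assumption introduced for this example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the XML processor accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A tel element with both tags, and the same element without its end tag.
with_end_tag = parses("<tel>513-555-7098</tel>")
without_end_tag = parses("<tel>513-555-7098")
```

The first document is accepted; the second is rejected because the processor reaches the end of the input while the tel element is still open.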
Slide 13
Names in XML
Element names must follow certain rules. As we will see, there are other names in XML that follow the same rules.
Names in XML must start with either a letter or the underscore character
(_). The rest of the name consists of letters, digits, the underscore
character, the dot (.), or a hyphen (-). Spaces are not allowed in
names.
Finally, names cannot start with the string xml, which is reserved for the
XML specification itself.
Unlike HTML, names are case sensitive in XML.
By convention, XML elements are frequently written in lowercase. When a
name consists of several words, the words are usually separated by a
hyphen, as in address-book or written as AddressBook. Choose the
convention that works best for you but try to be consistent.
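The naming rules above can be approximated with a regular expression. This is a simplified sketch that covers only the rules listed on this slide and only ASCII letters; the XML specification itself allows a wider range of characters. The names NAME_PATTERN and is_valid_name are assumptions introduced here.

```python
import re

# First character: a letter or underscore. Remaining characters: letters,
# digits, underscore, dot or hyphen. ASCII only, which is a simplification.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_name(name):
    """Check a name against the simplified rules from the slide."""
    if name.lower().startswith("xml"):
        return False  # names starting with "xml" are reserved
    return NAME_PATTERN.fullmatch(name) is not None
```

For instance, is_valid_name("address-book") and is_valid_name("AddressBook") hold, while a name containing a space or starting with a digit is refused.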
Slide 14
Names in XML - Quiz
The following are examples of valid or invalid element names in XML:
Slide 15
Attributes
It is possible to attach additional information to elements in the form of attributes.
Attributes have a name and a value. The names follow the same rules as
element names.
The syntax is similar to HTML. Elements can have one or more attributes in
the start tag, and the name is separated from the value by the equal character (=).
The value of the attribute is enclosed in double or single quotation marks.
For example, the tel element can have a preferred attribute:
<tel preferred="true">513-555-8889</tel>
Unlike HTML, XML insists on the quotation marks. The XML processor would
reject the following:
<tel preferred=true>513-555-8889</tel>
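A short sketch of the quoting rule, using Python's standard parser. The attribute value "true" is an assumed example value; the slides only name the tel element and its preferred attribute.

```python
import xml.etree.ElementTree as ET

# Double and single quotation marks are both accepted around the value.
quoted = ET.fromstring('<tel preferred="true">513-555-8889</tel>')
single_quoted = ET.fromstring("<tel preferred='true'>513-555-8889</tel>")

# Without quotation marks the processor rejects the document.
try:
    ET.fromstring("<tel preferred=true>513-555-8889</tel>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
```

Both quoted forms yield the same attribute value, while the unquoted form raises a parse error.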
Slide 17
Empty Element
Elements that have no content are known as empty elements. Usually, they are included in the document for the value of their
attributes.
There is a shorthand notation for empty elements: the start and
end tags merge, and the slash from the end tag is added at the end of the opening tag.
For XML, the following two elements are identical:
Quiz
An empty element tag can have attributes. (Yes / No)
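The two notations for an empty element can be compared after parsing. In this sketch the element name email, the attribute href and the address are hypothetical, chosen only to have something to parse.

```python
import xml.etree.ElementTree as ET

# Long form (separate start and end tags) and shorthand form (merged tag).
long_form = ET.fromstring('<email href="mailto:jdoe@example.com"></email>')
short_form = ET.fromstring('<email href="mailto:jdoe@example.com"/>')

same_attributes = long_form.attrib == short_form.attrib
both_empty = (
    long_form.text is None and len(long_form) == 0
    and short_form.text is None and len(short_form) == 0
)
```

After parsing, the two forms carry the same attributes and neither has text or child elements, so an application cannot tell them apart.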
Slide 18
Nesting of Elements
Element content is not limited to text; elements can contain other elements that in turn can contain text or elements, and so on.
An XML document is a tree of elements. There is no limit to the depth of the tree, and elements can repeat. As you see in Listing 2.1, there are two entry elements in the address-book element. The entry for John Doe has two tel elements. Figure 2.1 is the tree of Listing 2.1. [Refer: XML Example slide]
An element that is enclosed in another element is called a child. The
element it is enclosed in is its parent.
Jack
Smith
Start and end tags must always be balanced, and children are always completely enclosed in their parents. Is the following legal or illegal?
Jack
Smith
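A sketch of proper nesting versus overlapping tags. The element names name, fname and lname are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Properly nested: each child is completely enclosed in its parent.
nested = ET.fromstring("<name><fname>Jack</fname><lname>Smith</lname></name>")
children = [child.tag for child in nested]

# Overlapping: lname opens inside fname but closes after it.
try:
    ET.fromstring("<name><fname>Jack<lname></fname>Smith</lname></name>")
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False
```

The nested document produces a name element with two children; the overlapping one is rejected with a mismatched tag error.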
Slide 19
Root
At the root of the document there must be one and only one element. In other words, all the elements in the document must be
enclosed in a single element.
Quiz: Is the following example legal or illegal?
John Doe
Jack
Smith
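The single-root rule can be tried out as follows. The entry and address-book names come from the slides; the name element is an assumption.

```python
import xml.etree.ElementTree as ET

# Two entry elements side by side, with no common parent.
two_roots = (
    "<entry><name>John Doe</name></entry>"
    "<entry><name>Jack Smith</name></entry>"
)
try:
    ET.fromstring(two_roots)
    two_roots_accepted = True
except ET.ParseError:
    two_roots_accepted = False

# Wrapping both entries in one root element makes the document acceptable.
wrapped = ET.fromstring("<address-book>" + two_roots + "</address-book>")
```

The processor refuses the first document because content follows the end of the first top-level element; the wrapped version parses and has two children.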
Slide 20
XML Declaration
The XML declaration is the first line of the document. The declaration identifies the document as an XML document. The
declaration also lists the version of XML used in the document.
The declaration can contain other attributes to support other features such as character set encoding.
The XML declaration is optional.
If the declaration is included, however, it must start on the first
character of the first line of the document. The XML recommendation suggests you include the declaration in every XML
document.
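A brief sketch of these declaration rules with Python's standard parser: the declaration is optional, but when present it has to be the very first thing in the document.

```python
import xml.etree.ElementTree as ET

# With and without a declaration: both documents are accepted.
declared = ET.fromstring('<?xml version="1.0"?><address-book/>')
undeclared = ET.fromstring("<address-book/>")

# A declaration that does not start on the first character is rejected.
try:
    ET.fromstring('\n<?xml version="1.0"?><address-book/>')
    late_declaration_accepted = True
except ET.ParseError:
    late_declaration_accepted = False
```

Even a single line break before the declaration is enough for the processor to report an error.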
Slide 21
XML Declaration - Stand-alone document
If an XML document can be read with no reference to external sources, it is said to
be a stand-alone document. Such documents can be annotated with a standalone attribute with a value of yes in the XML declaration. If an XML document requires external sources to be resolved to parse correctly and/or to construct the entire data tree (for example, a document with references to external general entities), then it is not a stand-alone document. Such documents may be marked standalone='no', but because this is the default, such an annotation rarely appears in
XML documents.
Slide 22
Comments
To insert comments in a document, enclose them between <!-- and -->.
Comments are used for notes, indication of ownership, and more.
They are intended for the human reader and they are ignored by
the XML processor.
Comments cannot be inserted inside the markup, that is, inside a tag. They must appear
before or after the markup.
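A small sketch of both points about comments, using Python's standard parser with its default settings (which discard comments). The comment text is made up for the example.

```python
import xml.etree.ElementTree as ET

# A comment between tags is skipped by the processor.
doc = ET.fromstring("<entry><!-- home number --><tel>513-555-8889</tel></entry>")
child_tags = [child.tag for child in doc]

# A comment placed inside a start tag is not allowed.
try:
    ET.fromstring("<tel <!-- home number -->>513-555-8889</tel>")
    comment_in_tag_accepted = True
except ET.ParseError:
    comment_in_tag_accepted = False
```

The entry element ends up with a single tel child and no trace of the comment, while the comment inside the tag causes a parse error.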
Slide 23
Unicode
Characters in XML documents follow the Unicode standard.
XML uses the Unicode (ISO/IEC 10646) character set, which is not limited to 16-bit characters.
An XML processor must recognize the UTF-8 and UTF-16 encodings.
Most processors support other encodings. In particular, for Western European languages, they support ISO 8859-1 (the official name for Latin-1).
Documents that use an encoding other than UTF-8 or UTF-16 must start with an XML declaration. The declaration must have an encoding attribute to announce the encoding used. For example, a document written in Latin-1 (such as with Windows Notepad) could use the following declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>
José Dupont
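A sketch of why the encoding attribute matters for Latin-1 documents. The name element is an assumed wrapper for the text; the bytes are produced by encoding the document as ISO 8859-1.

```python
import xml.etree.ElementTree as ET

# A Latin-1 document that announces its encoding in the declaration.
declared_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<name>Jos\u00e9 Dupont</name>"
).encode("iso-8859-1")
name = ET.fromstring(declared_bytes).text

# The same Latin-1 bytes without a declaration are read as UTF-8 and fail.
undeclared_bytes = "<name>Jos\u00e9 Dupont</name>".encode("iso-8859-1")
try:
    ET.fromstring(undeclared_bytes)
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
```

With the declaration, the accented letter is decoded correctly; without it, the single Latin-1 byte is not valid UTF-8 and the document is rejected.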
Slide 24
XML Declaration - Quiz
How can the XML processor read the encoding parameter? To reach the encoding parameter, the processor must read the
declaration. However, to read the declaration, the processor needs
to know which encoding is being used.
What about documents that have no declaration (since the declaration is optional)?
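One way out of this circle is that a declaration always begins with the same characters, so a processor can inspect the first few bytes before decoding anything. The function below is a simplified sketch of that idea, not a complete detection algorithm (for example, it ignores UTF-32 and EBCDIC); the name sniff_encoding and its return labels are assumptions introduced here.

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Guess the encoding family from the first bytes of an XML document."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"  # UTF-8 byte order mark
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(b"<\x00?\x00"):
        return "utf-16-le"  # byte order mark, or "<?" stored low byte first
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(b"\x00<\x00?"):
        return "utf-16-be"  # byte order mark, or "<?" stored high byte first
    if data.startswith(b"<?xml"):
        # ASCII-compatible encoding: the declaration can be read as ASCII
        # far enough to find the encoding attribute.
        return "ascii-compatible"
    return "utf-8"  # no byte order mark and no declaration: assume UTF-8
```

For a document with no declaration and no byte order mark, the sketch falls back to UTF-8, which is the default a processor assumes in that case.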
Slide 26
Predefined Entities in XML
XML predefines entities for the characters used in markup (angle brackets,
quotes, and so on). The entities are used to escape the characters from
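element or attribute content. The entities are:
&lt; left angle bracket (<); must always be escaped in content, because it starts a tag
&amp; ampersand (&); must always be escaped in content, because it starts an entity reference
&gt; right angle bracket (>); must be escaped with &gt; in the combination ]]>, which ends a CDATA section (see the following)
&apos; single quote ('); can be escaped with &apos;, mainly in attribute values
&quot; double quote ("); can be escaped with &quot;, mainly in attribute values
Quiz: Correct / Incorrect?
Mark & Spencer
Mark &amp; Spencer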
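A sketch of escaping in practice. The element name shop is hypothetical; escape is the helper from Python's standard xml.sax.saxutils module, which replaces the ampersand and both angle brackets.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The escaped ampersand is accepted and turned back into a plain "&".
escaped_text = ET.fromstring("<shop>Mark &amp; Spencer</shop>").text

# A bare ampersand in content is rejected.
try:
    ET.fromstring("<shop>Mark & Spencer</shop>")
    bare_ampersand_accepted = True
except ET.ParseError:
    bare_ampersand_accepted = False

# Escaping text before placing it in a document.
safe_text = escape("Mark & Spencer <Ltd>")
```

The processor hands the application the unescaped text, so the entity is only visible in the document itself.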
Slide 27
Character references
XML also supports character references, where a character is replaced by its
Unicode character code.
&#DecimalUnicodeValue;
Character references that start with &# provide a decimal representation of the character
code.
&#xHexadecimalUnicodeValue;
Character references that start with &#x provide a hexadecimal representation of the
character code.
Example - Character references
Martin
Français
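A sketch of the two forms of character reference for the letter ç, whose Unicode code is 231 in decimal and E7 in hexadecimal. The element name word is an assumption.

```python
import xml.etree.ElementTree as ET

# The same character written as a decimal and as a hexadecimal reference.
decimal = ET.fromstring("<word>Fran&#231;ais</word>").text
hexadecimal = ET.fromstring("<word>Fran&#xE7;ais</word>").text
```

Both references are resolved by the processor to the same character, so the two documents carry identical text.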
Slide 28
Processing Instructions
Processing instructions (abbreviated PI) are a mechanism to insert non-XML statements, such as scripts, in the document.
A processing instruction is enclosed in <? and ?>.
The first name is the target. It identifies the application or the
device to which the instructions are directed. The rest of the processing instruction is in a format specific to the target. It
does not have to be XML.
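A sketch of how a processor exposes a processing instruction, using Python's standard xml.dom.minidom module. The target printer, the instruction page-break and the report element are hypothetical names.

```python
from xml.dom import minidom

# A processing instruction placed before the root element.
doc = minidom.parseString("<?printer page-break?><report/>")
pi = doc.firstChild

# The first name is the target; the rest is passed through as plain data.
target = pi.target
data = pi.data
```

The processor does not interpret the data; it only hands the target and the remaining text to the application.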
Slide 29
CDATA Sections
As you have seen, markup characters (left angle bracket and ampersand) that appear in the content of an element must be escaped with an entity.
For some applications, it is difficult to escape markup characters, if only
because there are too many of them. Also, it is difficult to include an XML
document in an XML document.
CDATA (Character DATA) sections are intended for these cases. CDATA
sections are delimited by <![CDATA[ and ]]>. The XML processor ignores
all markup except for ]]>.
PCDATA stands for parsed character data and means the element can
contain text. #PCDATA is often (but not always) used for leaf elements.
The difference between CDATA and PCDATA is that PCDATA is parsed, so it cannot contain
unescaped markup characters.
Slide 31
CDATA Section - Example
The following example uses a CDATA section to insert an XML example into an XML document:
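A minimal sketch of a CDATA section that carries a fragment of XML as plain text. The outer element name example is an assumption, and the embedded fragment reuses the tel element from earlier slides.

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, "<" and "&" are not treated as markup.
doc = ET.fromstring(
    "<example><![CDATA[<tel>513-555-7098</tel> & more]]></example>"
)
content = doc.text
child_count = len(doc)
```

The embedded tel tags arrive as ordinary characters in the text of the example element, which has no child elements.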
Slide 32
Well Formed XML
The end tag matches the corresponding start tag, and there is:
No overlapping in element definitions.
No instances of multiple attributes with the same name for one element.
Syntax conforms to the XML specification:
Start-tags all have matching end-tags (or are empty-element tags).
Element tags do not overlap.
Attributes have unique names.
Markup characters are properly escaped.
Elements form a hierarchical tree, with a single root node.
There are no references to external entities, except if a DTD is provided.
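Several of the rules above can be checked by simply handing a document to a parser. The helper below is a sketch built on Python's standard parser; the name is_well_formed and the sample element names (dept, and the attribute value on tel) are assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# One document per rule: escaped markup, duplicate attribute, bare ampersand.
escaped_ok = is_well_formed("<dept>R&amp;D Services</dept>")
duplicate_attribute = is_well_formed('<tel preferred="true" preferred="false"/>')
unescaped_markup = is_well_formed("<dept>R&D Services</dept>")
```

Only the first document passes; the second repeats an attribute name and the third contains an unescaped ampersand.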
Slide 33
Well formed XML - example
Suraj
Kumar
Verma
IT Services
C2
, even these symbols don't bother it.]]>
AbhiDhar
R&D Services
Slide 34
Four Common Errors in XML Syntax
Forget End Tags
Forget That XML Is Case Sensitive
Introduce Spaces in the Name of Element
John Doe
Forget the Quotes for Attribute Value
<tel preferred=true>513-555-8889</tel>
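The four errors can each be reproduced with a one-line document. In this sketch the element names and the attribute value are assumptions; the point is only that a parser rejects all four.

```python
import xml.etree.ElementTree as ET

# One sample document per common error.
COMMON_ERRORS = {
    "missing end tag": "<entry><name>John Doe</entry>",
    "case mismatch": "<Name>John Doe</name>",
    "space in name": "<full name>John Doe</full name>",
    "unquoted attribute": "<tel preferred=true>513-555-8889</tel>",
}

def rejected(text):
    """Return True if the parser refuses the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False

results = {label: rejected(text) for label, text in COMMON_ERRORS.items()}
```

Every entry in results is True, meaning each of the four mistakes makes the document unusable for an XML processor.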
Thank you
XML Technology, Semester 4
SICSR Executive MBA(IT) @ MindTree, Bangalore, India
By Neeraj Singh (toneeraj(AT)gmail(DOT)com)