XML and the Semi-Structured Data Model



Motivation

• We have seen that relational databases are very convenient to query. However:
  – There is a LOT of data that is not in relational databases!

• Perhaps the most widely accessed database is the web, and it certainly isn’t a relational database.


Documents Vs. Databases

Documents                        Databases
Paragraphs, sentences            Tables, tuples
Easy for people to understand    Easy for computers to understand
Static                           Dynamic


Querying the Web

• The web can be queried using a search engine. However, we can’t ask questions like:
  – What is the weather in Zanzibar today?
  – What is the lowest price for which a Jaguar is sold on the web?

• Problems:
  – There are no facilities for asking complex questions, such as aggregation of data
  – Words have overloaded meanings (Jaguar)


Understanding the Web

• In order to query the web, we must be able to understand it.

• Two computer science approaches:
  – Artificial Intelligence approach
  – Database approach


Artificial Intelligence Approach

“The web is unstructured and we must deal with it”

• Use techniques for machine learning to understand the web.

• Example: To understand the word “Jaguar”, check whether it appears on a page with the words “car” or “automobile”, or rather with “jungle” and “rainforest”

• Problem: Such techniques tend to be inexact and have a large percentage of mistakes


Database Approach

“The web is unstructured and we will structure it”

• Sometimes problems that are very difficult can be solved easily by enforcing a standard

• Encourage the use of XML as a standard for data exchange on the web


Example XML Document

    <?xml version="1.0"?>
    <transaction>
      <account>89-344</account>
      <buy shares="100">
        <ticker exch="NASDAQ">WEBM</ticker>
      </buy>
      <sell shares="30">
        <ticker exch="NYSE">GE</ticker>
      </sell>
    </transaction>

(Figure labels on the slide: opening tag, attribute name, attribute value, element, closing tag.)


XML Representation of a Table

    <?xml version="1.0"?>
    <ROWSET>
      <ROW num="1">
        <ENAME>KING </ENAME>
        <SAL>5000</SAL>
      </ROW>
      <ROW num="2">
        <ENAME>SCOTT </ENAME>
        <SAL>3000</SAL>
      </ROW>
    </ROWSET>

    ENAME    SAL
    KING     5000
    SCOTT    3000
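The mapping also works in the other direction: the rows can be read back out of the XML. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the function and variable names are illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET

# The ROWSET document from the slide, with straight quotes.
ROWSET_XML = """<?xml version="1.0"?>
<ROWSET>
  <ROW num="1"><ENAME>KING </ENAME><SAL>5000</SAL></ROW>
  <ROW num="2"><ENAME>SCOTT </ENAME><SAL>3000</SAL></ROW>
</ROWSET>"""

def rowset_to_tuples(xml_text):
    """Turn each ROW element into an (ENAME, SAL) tuple."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall("ROW"):
        ename = row.findtext("ENAME").strip()  # drop the padding space
        sal = int(row.findtext("SAL"))
        rows.append((ename, sal))
    return rows

table = rowset_to_tuples(ROWSET_XML)
print(table)
```

Each ROW becomes one tuple and each child element becomes one column value, which is exactly the table shown above.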


Very Unstructured XML

    <?xml version="1.0"?>
    <DamageReport>
      The insured’s <Vehicle Make="Volks"> Beetle </Vehicle> broke through the guard rail and plummeted into the ravine. The cause was determined to be <Cause>faulty brakes </Cause>. Amazingly there were no casualties.
    </DamageReport>


XML Vs. HTML

• XML and HTML are brothers. They are both special cases of SGML.

• HTML has specific tag and attribute names. These are associated with a specific meaning

• XML can have any tag and attribute name. These are not associated with any meaning

• HTML is used to specify visual style

• XML is used to specify meaning


Rules for Creating XML Documents


Rule 1 – XML Declaration

• An XML document should begin with an XML declaration. A simple declaration is:

    <?xml version="1.0"?>

  Other things can be specified, such as the character encoding.


Rule 2 – Document Element

• Use exactly one top-level document element.

Legal:

    <?xml version="1.0"?>
    <Question> This is legal </Question>

Illegal (two top-level elements):

    <?xml version="1.0"?>
    <Question> Is this legal? </Question>
    <Answer> No. </Answer>


Rule 3 – Match Opening and Closing Tags

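• Every opening tag must have a matching closing tag, and elements must be properly nested. XML is case sensitive, so <Question> and </QUESTION> do not match. The following examples are both illegal:

    <Question> Is this legal? </QUESTION>
    <Question> <B> Is this legal? </Question> </B>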
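Rules 2 and 3 can be checked mechanically, because any conforming XML parser rejects documents that break them. A minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the example labels are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

examples = {
    "one document element": "<Question> This is legal </Question>",
    "two top-level elements": "<Question> Is this legal? </Question><Answer> No. </Answer>",
    "case mismatch": "<Question> Is this legal? </QUESTION>",
    "bad nesting": "<Question> <B> Is this legal? </Question> </B>",
}

# Map each label to the parser's verdict.
results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(label, "->", "legal" if ok else "illegal")
```

Only the first example is accepted; the other three raise a parse error.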


Rule 4 – Comments

• Comments are written between <!-- and -->. Comments can’t appear inside a tag or inside an attribute value.

Example:

    <!-- This is a legal comment -->
    <Question <!-- This is illegal -->>
      Why is this illegal
      <!-- This is a legal comment -->
    </Question>


Rule 5 – Element Names

• Element and attribute names must start with a letter or an underscore and continue with letters, digits, hyphens, underscores or periods. They may not contain spaces.

Legal names:

    <_legal> <This-is-OK>

Illegal names:

    <2-Part-Question> <Two Part Question>
    <Question 4You="Yes">
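The naming rule is easy to express as a regular expression. The sketch below is a simplified, ASCII-only approximation written in Python; it assumes the usual XML rule that a name starts with a letter or underscore and continues with letters, digits, hyphens, underscores or periods. The full XML specification also allows many non-ASCII letters and the colon, which this sketch ignores.

```python
import re

# Simplified, ASCII-only approximation of the XML name rule.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_legal_name(name):
    """Return True if the whole string is a legal (ASCII) XML name."""
    return NAME_RE.fullmatch(name) is not None

for candidate in ["_legal", "This-is-OK", "2-Part-Question",
                  "Two Part Question", "4You"]:
    print(candidate, "->", is_legal_name(candidate))
```

The first two names pass; the last three fail because they start with a digit or contain a space.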


Rule 6 – Attribute Values

• Attribute values:
  – go in opening tags
  – must be enclosed by matching quotes (' or ")
  – must contain only text and not tags

Legal example:

    <Question Poster="Yitzchak">Do you like XML? </Question>
    <Answer Poster='Yaakov'>I do.</Answer>


Rule 6 – Continued

Illegal examples:

    <Question Poster="Yitzchak'>Do you like XML? </Question>
    <Question>Do you like XML? </Question Poster="Yitzchak">
    <Question Poster="<first>Yitzchak</first>">Do you like XML? </Question>


Rule 7 – Empty Elements

• Empty elements are elements that do not contain text or nested elements. They can be written in a compact syntax:

    <Person First="Shmuel" Last="Levy"></Person>

  is the same as

    <Person First="Shmuel" Last="Levy" />
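That the two spellings are equivalent can be confirmed with a parser. A small sketch, assuming Python's standard xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<Person First="Shmuel" Last="Levy"></Person>')
short_form = ET.fromstring('<Person First="Shmuel" Last="Levy" />')

# Compare tag, attributes, text content and number of children.
same = (
    long_form.tag == short_form.tag
    and long_form.attrib == short_form.attrib
    and (long_form.text or "") == (short_form.text or "")
    and len(long_form) == len(short_form) == 0
)
print(same)
```

After parsing, nothing distinguishes the two forms: both yield a Person element with the same attributes and no content.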


Abstract View of XML


A Different Data Model

                  Relational        Semi-Structured
Abstract Model    Sets of tuples    Labeled directed graph
Concrete Model    Tables            XML documents
Standard for      Storing data      Data exchange


An Example

    <?xml version="1.0"?>
    <transaction>
      <account>89-344</account>
      <buy shares="100">
        <ticker exch="NASDAQ">WEBM</ticker>
      </buy>
      <sell shares="30">
        <ticker exch="NYSE">GE</ticker>
      </sell>
    </transaction>


Corresponding Tree

    transaction
      account
        89-344
      buy
        shares: 100
        ticker
          exch: NASDAQ
          WEBM
      sell
        shares: 30
        ticker
          exch: NYSE
          GE
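The tree view can be made concrete by listing the labeled edges of the document. The sketch below uses Python's standard xml.etree.ElementTree; the edge labels "child" and "text" and the function name edges are choices made for this illustration, not terminology from the slides.

```python
import xml.etree.ElementTree as ET

DOC = """<transaction>
  <account>89-344</account>
  <buy shares="100"><ticker exch="NASDAQ">WEBM</ticker></buy>
  <sell shares="30"><ticker exch="NYSE">GE</ticker></sell>
</transaction>"""

def edges(element):
    """Yield (parent, label, target) edges of the tree rooted at element."""
    for name, value in element.attrib.items():
        yield (element.tag, name, value)          # attribute edge
    text = (element.text or "").strip()
    if text:
        yield (element.tag, "text", text)         # text leaf
    for child in element:
        yield (element.tag, "child", child.tag)   # nested element
        yield from edges(child)

edge_list = list(edges(ET.fromstring(DOC)))
for edge in edge_list:
    print(edge)
```

Elements become inner nodes, while attribute values and text become leaves, matching the tree drawn on the slide.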


Using XML

• Querying XML: There are query languages that query XML and return XML. Examples: XQuery, XPath, SQL4X

• Displaying XML: An XML document can have an associated style sheet, which specifies how the document should be displayed or translated to HTML. Examples: CSS, XSL
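To give a feel for path-based querying, here is a small sketch in Python. It relies on the standard xml.etree.ElementTree module, which implements only a limited subset of XPath, and it reuses the transaction document from the earlier slides; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<transaction>
  <account>89-344</account>
  <buy shares="100"><ticker exch="NASDAQ">WEBM</ticker></buy>
  <sell shares="30"><ticker exch="NYSE">GE</ticker></sell>
</transaction>"""

root = ET.fromstring(DOC)

# All ticker symbols traded on NASDAQ, anywhere in the document.
nasdaq = [t.text for t in root.findall(".//ticker[@exch='NASDAQ']")]

# Total number of shares bought (an aggregation over attribute values).
bought = sum(int(b.get("shares")) for b in root.findall("buy"))

print(nasdaq, bought)
```

Unlike a keyword search, the query can distinguish the exchange from the ticker symbol and can aggregate values, because the tags carry the structure.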


Namespaces

• Namespaces are used to attach an accepted meaning to a set of tags.

• Syntax for defining a namespace:

    <SomeElement xmlns:prefixname="namespaceURL">

  The namespace will be recognized within the SomeElement element.


Example Namespace

    <irs:Form id="1040" xmlns:irs="http://www.irs.gov">
      <irs:Name>Tina Wells</irs:Name>
      <PhoneNumber>03-5655666</PhoneNumber>
    </irs:Form>

• In order for the namespace to be recognized in all elements, the declaration should be in the document element
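A parser makes the effect of the prefix visible. The sketch below, in Python with the standard xml.etree.ElementTree module, parses a well-formed version of the form above (closed with </irs:Form>) and shows that prefixed names are expanded with the namespace URL while unprefixed names are left alone.

```python
import xml.etree.ElementTree as ET

FORM = """<irs:Form id="1040" xmlns:irs="http://www.irs.gov">
  <irs:Name>Tina Wells</irs:Name>
  <PhoneNumber>03-5655666</PhoneNumber>
</irs:Form>"""

root = ET.fromstring(FORM)

# ElementTree writes qualified names as {namespace-URL}local-name.
tags = [root.tag] + [child.tag for child in root]

# Look up an element through a prefix-to-URL mapping.
name = root.findtext("irs:Name", namespaces={"irs": "http://www.irs.gov"})

print(tags)
print(name)
```

Form and Name belong to the http://www.irs.gov namespace; PhoneNumber has no prefix and therefore belongs to no namespace.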


XSQL Pages


What are XSQL Pages?

• XSQL pages are XML documents that have SQL queries embedded in them.

• When a user requests to view an XSQL page, the web server:
  1. Dynamically computes the embedded queries
  2. Translates the query results into XML
  3. Inserts the results in the proper places in the document
  4. Transforms the result to HTML if a stylesheet is given


A Simple Example

    <?xml version="1.0"?>
    <xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql">
      SELECT sname
      FROM Sailors
    </xsql:query>

You should specify the connection and the namespace on the document element.


Page Seen in Browser

    <?xml version="1.0"?>
    <ROWSET>
      <ROW num="1">
        <SNAME>Rusty</SNAME>
      </ROW>
      <ROW num="2">
        <SNAME>Justin </SNAME>
      </ROW>
    </ROWSET>

• A ROWSET element encloses the query result

• Each ROW element encloses one row

• Each column value in a row is enclosed in a tag named after its column
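The ROWSET/ROW convention is simple enough to reproduce by hand. The sketch below is an illustration in Python, not the XSQL servlet's implementation: an in-memory SQLite database stands in for the Oracle connection, and the function name query_to_rowset and the sample sid values are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_rowset(conn, sql):
    """Run sql and wrap the result in ROWSET/ROW elements."""
    cursor = conn.execute(sql)
    # Column names become upper-case element names, as on the slide.
    columns = [d[0].upper() for d in cursor.description]
    rowset = ET.Element("ROWSET")
    for num, values in enumerate(cursor, start=1):
        row = ET.SubElement(rowset, "ROW", num=str(num))
        for column, value in zip(columns, values):
            ET.SubElement(row, column).text = "" if value is None else str(value)
    return rowset

# SQLite stands in for the real database; the sid values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER, sname TEXT)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?)",
                 [(13, "Rusty"), (14, "Justin")])

rowset = query_to_rowset(conn, "SELECT sname FROM Sailors ORDER BY sid")
xml_text = ET.tostring(rowset, encoding="unicode")
print(xml_text)
```

The ORDER BY clause is there only to make the row numbering deterministic in this example.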


Another Example

    <?xml version="1.0"?>
    <RESULTS connection="scott" xmlns:xsql="urn:oracle-xsql">
      Here is something interesting:
      <xsql:query>
        SELECT sname, age + rating as ra
        FROM Sailors
        WHERE sid = 13
      </xsql:query>
    </RESULTS>


Resulting Document

    <?xml version="1.0"?>
    <RESULTS>
      Here is something interesting:
      <ROWSET>
        <ROW num="1">
          <SNAME>Rusty</SNAME>
          <RA>55</RA>
        </ROW>
      </ROWSET>
    </RESULTS>


Using Parameters

• Your page can use parameters. The value of a parameter param is the first of the following that is available:
  1. The value of the URL parameter param, if supplied
  2. The value of the HTTP session object param, if supplied
  3. The value of the closest ancestor’s attribute named param, if present
  4. An empty string
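The lookup order above amounts to a chain of fallbacks. The following Python sketch spells it out; the function name resolve_param and the way the three sources are passed in (plain dictionaries, with ancestor attributes listed closest first) are assumptions made for the illustration.

```python
def resolve_param(name, url_params, session, ancestor_attrs):
    """Resolve a parameter value using the four-step order.

    ancestor_attrs is a list of attribute dictionaries, closest
    ancestor first (a representation chosen for this sketch).
    """
    if name in url_params:            # 1. URL parameter
        return url_params[name]
    if name in session:               # 2. HTTP session object
        return session[name]
    for attrs in ancestor_attrs:      # 3. closest ancestor's attribute
        if name in attrs:
            return attrs[name]
    return ""                         # 4. empty string

ancestors = [{"sname": "Joe"}]
print(resolve_param("sname", {"sname": "Jim"}, {}, ancestors))
print(resolve_param("sname", {}, {}, ancestors))
```

An attribute on the page therefore acts as a default value that a URL parameter of the same name overrides.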


Example with Parameters

    <?xml version="1.0"?>
    <xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
                sname="Joe">
      SELECT *
      FROM Sailors
      WHERE sname = '{@sname}'
    </xsql:query>


Evaluating the Query

• Suppose the XSQL document is at:
    http://cs.huji.ac.il/~db/query1.xsql

• Then, requesting the URL:
    http://cs.huji.ac.il/~db/query1.xsql?sname=Jim
  will return all the details of Jim.

• Requesting:
    http://cs.huji.ac.il/~db/query1.xsql
  will return all the details of Joe (the default value).
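The effect of the two requests can be imitated with plain text substitution. This is a sketch in Python under the assumption that {@name} references are replaced textually before the query is sent to the database, as the notation suggests; the names PARAM_RE and substitute are invented for the example.

```python
import re

# Matches {@name} and captures the parameter name.
PARAM_RE = re.compile(r"\{@([A-Za-z_][\w.-]*)\}")

def substitute(template, url_params, defaults):
    """Replace each {@name} with the URL value, else the default, else ''."""
    def lookup(match):
        name = match.group(1)
        return url_params.get(name, defaults.get(name, ""))
    return PARAM_RE.sub(lookup, template)

TEMPLATE = "SELECT * FROM Sailors WHERE sname = '{@sname}'"
DEFAULTS = {"sname": "Joe"}   # from the sname attribute on the query element

with_jim = substitute(TEMPLATE, {"sname": "Jim"}, DEFAULTS)
with_default = substitute(TEMPLATE, {}, DEFAULTS)
print(with_jim)
print(with_default)
```

Because the value is spliced into the SQL text, a request could supply a value containing a quote character and change the meaning of the query, so pages built this way need to treat URL parameters with care.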


A Strange Example

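    <?xml version="1.0"?>
    <xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
                select="*" where="1=1" order="1">
      SELECT {@select}
      FROM {@from}
      WHERE {@where}
      ORDER BY {@order}
    </xsql:query>

Every clause of the query is a parameter here. The from parameter has no default value, so it must be supplied in the URL.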


Customizing Results

• The query tag can have different attributes that customize the query results. Here are some of the important options:
  – max-rows: The maximum number of rows returned
  – skip-rows: The number of rows to skip before returning rows
  – rowset-element: The name of the rowset element
  – row-element: The name of the row element


Customizing Results

    <?xml version="1.0"?>
    <xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql"
                skip="0" max-rows="2" skip-rows="{@skip}">
      SELECT *
      FROM Program
      ORDER BY url
    </xsql:query>

By calling the same page with different values for skip, we can page through the programs two at a time.
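The combined effect of skip-rows and max-rows is that of a window sliding over the ordered result. A minimal Python sketch of that behavior; the function name page and the sample URLs are hypothetical.

```python
def page(rows, skip_rows=0, max_rows=None):
    """Skip skip_rows rows, then return at most max_rows rows."""
    end = None if max_rows is None else skip_rows + max_rows
    return rows[skip_rows:end]

# Hypothetical, already ordered query result.
programs = ["a.html", "b.html", "c.html", "d.html", "e.html"]

first = page(programs, skip_rows=0, max_rows=2)    # skip=0
second = page(programs, skip_rows=2, max_rows=2)   # skip=2
last = page(programs, skip_rows=4, max_rows=2)     # skip=4
print(first, second, last)
```

The ORDER BY clause in the query matters: without a fixed order, consecutive pages could overlap or miss rows.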


Notes

• An XSQL document can have many queries.

• The queries can appear within arbitrary XML tags

• We can produce XML that has a more nested structure using the CURSOR function...


Remembering Subqueries in the SELECT Clause

• Subqueries in the SELECT clause must return a single value. What do we do if we want, for each boat, all the sailors who reserved it?

• We want each bid to be associated with a table of Sailors data!


Using the CURSOR Function

    <?xml version="1.0"?>
    <xsql:query connection="scott" xmlns:xsql="urn:oracle-xsql">
      SELECT bid,
             CURSOR(SELECT S.sid, sname
                    FROM Sailors S, Reserves R
                    WHERE S.sid = R.sid
                      AND R.bid = B.bid) AS Reservers
      FROM Boats B
    </xsql:query>


Resulting document:

    <?xml version="1.0"?>
    <ROWSET>
      <ROW num="1">
        <BID>113</BID>
        <RESERVERS>
          <RESERVERS_ROW num="1">
            <SID> 13 </SID>
            <SNAME> Joe </SNAME>
          </RESERVERS_ROW>
          <RESERVERS_ROW num="2">
            ....
          </RESERVERS_ROW>
        </RESERVERS>
      </ROW>
    </ROWSET>

Note that the alias of the CURSOR column (Reservers) is used for the inner elements: RESERVERS and RESERVERS_ROW take the place of the ROWSET and ROW tags.
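The naming convention for the nested result can be sketched as follows. This Python illustration builds the nested document from in-memory data rather than from a database; the function name nested_rowset and the sample boats and reservations are hypothetical, and the code only mirrors the element names visible in the result above.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: boat ids, and (sid, sname) pairs per boat.
boats = [113, 114]
reservers = {113: [(13, "Joe"), (15, "Ann")], 114: []}

def nested_rowset(boats, reservers, alias="RESERVERS"):
    """Build ROWSET/ROW with an inner alias/alias_ROW level per boat."""
    rowset = ET.Element("ROWSET")
    for num, bid in enumerate(boats, start=1):
        row = ET.SubElement(rowset, "ROW", num=str(num))
        ET.SubElement(row, "BID").text = str(bid)
        inner = ET.SubElement(row, alias)
        for inner_num, (sid, sname) in enumerate(reservers.get(bid, []), start=1):
            inner_row = ET.SubElement(inner, alias + "_ROW", num=str(inner_num))
            ET.SubElement(inner_row, "SID").text = str(sid)
            ET.SubElement(inner_row, "SNAME").text = sname
    return rowset

doc = nested_rowset(boats, reservers)
print(ET.tostring(doc, encoding="unicode"))
```

Each boat carries its own small table of sailors, which a flat relational result could express only by repeating the boat id on every row.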


Setting Page Level Parameters

• The following statement defines a parameter pname. The value of pname is the value in the first column of the first row of the query

• The variable pname will be recognized in the page

    <xsql:set-page-param name="pname">
      SELECT Statement
    </xsql:set-page-param>


Example

    <?xml version="1.0"?>
    <page connection="scott" xmlns:xsql="urn:oracle-xsql">
      <xsql:set-page-param name="num-stories">
        SELECT headings_num
        FROM user_prefs
        WHERE userid={@user}
      </xsql:set-page-param>
      <xsql:query max-rows="{@num-stories}">
        SELECT title, url
        FROM latest_news
      </xsql:query>
    </page>


Another Way to Define a Page Level Parameter

• Page level parameters can also be set with the statement:

    <xsql:set-page-param name="pname" value="val"/>

• For example:

    <xsql:set-page-param name="num-stories" value="10"/>


Additional Options

• The set-page-param element can have the following attributes:
  – only-if-unset: If the value is “yes” then the parameter will be set only if it has no value
  – ignore-empty-value: If the value is “yes” then the parameter will be set only if its new value is not an empty string
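The two options combine as two independent guards in front of the assignment. The Python sketch below models page parameters as a dictionary; the function name set_page_param mirrors the element name, and treating an existing empty string as "no value" is an assumption of this sketch rather than something the slides state.

```python
def set_page_param(params, name, value,
                   only_if_unset=False, ignore_empty_value=False):
    """Set params[name] = value unless one of the guards applies."""
    # Assumption: an existing empty string counts as "no value".
    if only_if_unset and params.get(name, "") != "":
        return params
    if ignore_empty_value and value == "":
        return params
    params[name] = value
    return params

params = {}
set_page_param(params, "num-stories", "10")
set_page_param(params, "num-stories", "20", only_if_unset=True)   # kept at 10
set_page_param(params, "user", "", ignore_empty_value=True)       # not set
print(params)
```

With both options set to "yes", a parameter is filled in once with the first non-empty value and then left alone.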


Setting Cookie Values

• The following statement sets a cookie named pname. Its value is the value in the first column of the first row of the query

• The value of pname will be available as a parameter until the cookie expires

    <xsql:set-cookie name="pname">
      SELECT Statement
    </xsql:set-cookie>


Additional Attributes for Set-Cookie

• The set-cookie element can have the following attributes:
  – max-age: The number of seconds before the cookie expires (defaults to expire when the user exits the current browser instance)
  – only-if-unset
  – ignore-empty-value


Example

    <?xml version="1.0"?>
    <page connection="scott" xmlns:xsql="urn:oracle-xsql">
      <xsql:set-cookie name="siteuser" max-age="31536000"
                       only-if-unset="yes" ignore-empty-value="yes">
        SELECT username
        FROM site_users
        WHERE username='{@username}' and password='{@password}'
      </xsql:set-cookie>
      <!-- Other Actions Here -->
    </page>


DML or PL/SQL

• We can do DML (update, insert, delete) or call PL/SQL procedures with the following basic syntax:

    <xsql:dml>
      DML Statement
    </xsql:dml>

  or

    <xsql:dml>
      BEGIN
        Any valid PL/SQL Statement
      END;
    </xsql:dml>


Example

    <xsql:dml>
      INSERT INTO page_requests_log(page, userid)
      VALUES('page12.xsql', '{@siteuser}')
    </xsql:dml>

If successful, the following element is added to the page:

    <xsql-status action="xsql:dml" rows="n" />

Otherwise, an error element is added:

    <xsql-error action="xsql:dml"> ... </xsql-error>