Ghislain Fourny Big Data for Engineers Spring 2018 11. Data Models pinkyone / 123RF Stock Photo



CSV (Comma-separated values)

This is syntax:

ID,Last name,First name,Theory
1,Einstein,Albert,"General, Special Relativity"
2,Gödel,Kurt,"""Incompleteness"" Theorem"

This is a data model:

ID   Last name   First name   Theory
1    Einstein    Albert       General, Special Relativity
2    Gödel       Kurt         "Incompleteness" Theorem
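The step from syntax to data model can be sketched with Python's standard csv module. This is only an illustration: the variable and function names (RAW, parse_table, rows) are assumptions and do not come from the slides.

```python
import csv
import io

# The physical view: a sequence of characters following the CSV syntax.
RAW = (
    'ID,Last name,First name,Theory\n'
    '1,Einstein,Albert,"General, Special Relativity"\n'
    '2,Gödel,Kurt,"""Incompleteness"" Theorem"\n'
)


def parse_table(text):
    """Turn CSV text into a list of rows, each a column-name-to-value map."""
    return list(csv.DictReader(io.StringIO(text)))


# The logical view: rows and columns, with quoting and escaping resolved.
rows = parse_table(RAW)
for row in rows:
    print(row["ID"], row["Last name"], row["Theory"])
```

The quotes and the doubled quotes belong to the syntax only; they no longer appear in the parsed values.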


Syntax vs. Data Models

Physical view (syntax):

ID,Last name,First name,Theory
1,Einstein,Albert,"General, Special Relativity"
2,Gödel,Kurt,"""Incompleteness"" Theorem"

Logical view (data model):

ID   Last name   First name   Theory
1    Einstein    Albert       General, Special Relativity
2    Gödel       Kurt         "Incompleteness" Theorem


Syntax vs. Data Models

Physical view (syntax):

<a>
  <d e="f"/>
  <c>This is <b>text</b>.</c>
</a>

Logical view (data model):

a
  d
    e = f
  c
    This is
    b
      text
    .
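A minimal sketch of how a parser turns the XML syntax into a tree, using Python's standard xml.etree.ElementTree. The function name outline and the "@" marker for attributes are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

DOCUMENT = '<a><d e="f"/><c>This is <b>text</b>.</c></a>'


def outline(element, depth=0):
    """Return the tree below element as a list of indented lines."""
    pad = "  " * depth
    lines = [pad + element.tag]
    for name, value in element.attrib.items():
        lines.append(f"{pad}  @{name} = {value}")
    if element.text and element.text.strip():
        lines.append(f'{pad}  "{element.text}"')
    for child in element:
        lines.extend(outline(child, depth + 1))
        # Text that follows a child belongs to the parent's content.
        if child.tail and child.tail.strip():
            lines.append(f'{pad}  "{child.tail}"')
    return lines


root = ET.fromstring(DOCUMENT)
print("\n".join(outline(root)))
```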


Edge vs. Node labeling

Labels are on the edges
Labels are on the nodes

[Figure: two example trees using the labels foo, bar, and foobar]


XML Data models

Information Set (Infoset): http://www.w3.org/TR/xml-infoset/
Post Schema-Validation Infoset (PSVI): http://www.w3.org/TR/xmlschema11-1/
XQuery and XPath Data Model (XDM): http://www.w3.org/TR/xpath-datamodel/


JSON Data Models

"original" (implicit) JSON Data Model: http://www.json.org/
JSON Schema Data Model: https://www.ietf.org/archive/id/draft-wright-json-schema-01.txt
JSONiq Data Model (JDM): http://www.jsoniq.org/docs/JSONiqExtensionToXQuery/html/section-jsoniq-data-model.html

HTML/XML Data model

Document Object Model (DOM): http://www.w3.org/TR/REC-DOM-Level-1/

XML Information Set

grigory_bruev / 123RF Stock Photo

Information Set

<?xml version="1.0" encoding="UTF-8"?>
<dc:metadata xmlns:dc="http://www.systems.ethz.ch">
  <title xml:lang="en" year="2008">Systems Group</title>
  <publisher>ETH Zurich</publisher>
</dc:metadata>


The 11 XML Information Items

Document
Element
Attribute
Processing Instruction
Character
Comment
Namespace
Unexpanded Entity Reference
DTD
Unparsed Entity
Notation


Document Information Items

Document Information Item
[children] Element Information Item
[document element] Element Information Item metadata
[notations] <empty>
[unparsed entities] <empty>
[base URI] file:///Users/bigdata/Documents/info.xml
[character encoding scheme] UTF-8
[standalone] <no value>
[version] 1.0

[Figure: tree with doc as the parent of metadata]

Element Information Items

Element Information Item
[namespace name] http://www.systems.ethz.ch
[local name] metadata
[prefix] dc
[children] Element Information Items
[attributes] <empty>
[namespace attributes] Attribute Information Item xmlns:dc
[in-scope namespaces] Namespace Information Items
[base URI] file:///Users/bigdata/Documents/info.xml
[parent] Document Information Item

[Figure: tree with doc as the parent of metadata, and metadata as the parent of title and publisher]

Attribute Information Items

Attribute Information Item
[namespace name] empty
[local name] year
[prefix] empty
[normalized value] 2008
[specified] true
[attribute type] <no value>
[references] unknown
[owner element] Element Information Item

[Figure: tree with title as the owner of the attribute year]
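Some of these properties can be inspected with Python's standard xml.dom.minidom. This is a rough sketch: the DOM is a different data model than the Infoset, so the property names only correspond approximately, and the helper element_properties is a hypothetical name.

```python
from xml.dom import minidom

DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8"?>
<dc:metadata xmlns:dc="http://www.systems.ethz.ch">
  <title xml:lang="en" year="2008">Systems Group</title>
  <publisher>ETH Zurich</publisher>
</dc:metadata>"""


def element_properties(element):
    """Collect a few Infoset-style properties of a DOM element."""
    return {
        "namespace name": element.namespaceURI,
        "local name": element.localName,
        "prefix": element.prefix,
        # Keep element children only; whitespace text nodes are skipped.
        "children": [child.localName for child in element.childNodes
                     if child.nodeType == child.ELEMENT_NODE],
    }


document = minidom.parseString(DOCUMENT)
metadata = document.documentElement
title = metadata.getElementsByTagName("title")[0]
year = title.getAttributeNode("year")

print(element_properties(metadata))
print(year.name, "=", year.value)
```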

XML Infoset - the tree

[Figure: the Infoset tree of the example document. doc is the parent of metadata. metadata carries the namespace attribute xmlns:dc and has the children title and publisher. title carries the attributes lang and year and contains the text "Systems Group". publisher contains the text "ETH Zurich". The in-scope namespace binding dc->systems appears three times in the figure.]

Post-Schema-Validation Infoset

Infoset + Types = Post-Schema-Validation Infoset (PSVI)


XPath and XQuery Data Model

Weerapat Kiatdumrong / 123RF Stock Photo

XDM: Sequences of Items

(item, item, item, item, item, item)


XDM: Sequence of one item

item = (item)


XDM: Sequences are flat

((item1, item2), item3) = (item1, item2, item3)
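The two sequence rules can be imitated in Python with tuples. This is a toy model, not an XDM implementation; the constructor name seq is an assumption.

```python
def seq(*parts):
    """Build a flat sequence: nested sequences are spliced into the result."""
    flat = []
    for part in parts:
        if isinstance(part, tuple):
            flat.extend(seq(*part))
        else:
            flat.append(part)
    return tuple(flat)


# Sequences are flat: nesting does not survive construction.
print(seq(seq("a", "b"), "c"))

# An item and the sequence containing only that item are the same thing.
print(seq("a") == seq(seq("a")))
```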

XDM: Items

Atomic
Node

XDM: Seven Kinds of XML Nodes

§ Document node
§ Element node
§ Attribute node
§ Text node
§ Comment node
§ Processing instruction node
§ Namespace node


XDM vs. Infoset

[Figure: Infoset and XDM side by side, with the annotation xs:untyped]


XDM: New Items in 3.0 and 3.1

Functions
Maps
Arrays

[Figure: example values lorem, ipsum, dolor, sit, amet]

XDM and Querying

Expression

for, if then else, where, order by, while, any, every, let, return, exit with, =, +


Types


Type Systems

Almost all type systems (Java, SQL, PSVI, JDM, Protocol buffers, Avro, Parquet, and so on) share the following properties:

- Distinction between atomic types and structured types

- Same categories of atomic types

- Lists and maps as structured types

- Sequence type cardinalities

Types (General)

Atomic Types vs. Structured Types


Atomic Types


Strings (character sequences with monoid structure)

"foo": f o o
"Zurich": Z u r i c h
"Ilsebill salzte nach.": I l s e b i l l ␣ s a l z t e ␣ n a c h .
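"Monoid structure" means that concatenation is associative and that the empty string is its identity element. A small Python sketch of both laws follows; the helper name concat_all is an assumption.

```python
from functools import reduce


def concat_all(parts):
    """Fold concatenation over parts, starting from the identity ""."""
    return reduce(lambda left, right: left + right, parts, "")


# Associativity: grouping does not matter.
print(("foo" + "bar") + "baz" == "foo" + ("bar" + "baz"))

# Identity: the empty string changes nothing.
print("" + "Zurich" == "Zurich" == "Zurich" + "")

print(concat_all(["Ilsebill", " ", "salzte", " ", "nach."]))
```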


Interval-based integer types (exist as signed and unsigned)

8-bit (Java's byte)
16-bit (Java's short)
32-bit (Java's int)
64-bit (Java's long)
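The interval covered by each width follows from the number of bits, assuming the usual two's complement encoding for the signed variants. A short sketch, with integer_range as a hypothetical helper name:

```python
def integer_range(bits, signed=True):
    """Return (smallest, largest) value of an integer type with this width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(bits, integer_range(bits), integer_range(bits, signed=False))
```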


Arbitrary precision decimals (and integers)

Any precision and scale

3141592653.5897932384626433832795

Float and Double (IEEE 754 standard)

single precision     double precision
32 bits              64 bits
ca. 7 digits         ca. 15 digits
3141592000           3141592653.58979
10^-37 to 10^37      10^-307 to 10^308
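The difference between arbitrary-precision decimals, doubles, and floats can be observed in Python. This sketch uses the standard decimal and struct modules; Python's own float is a double, so the helper to_single (a hypothetical name) rounds through a 32-bit encoding.

```python
import struct
from decimal import Decimal


def to_single(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


exact = Decimal("3141592653.5897932384626433832795")  # keeps every digit
as_double = float(exact)                               # ca. 15 digits survive
as_single = to_single(as_double)                       # ca. 7 digits survive

print(exact)
print(as_double)
print(as_single)
```

Only the leading digits of the single-precision result agree with the exact value, which is what "ca. 7 digits" refers to.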


Booleans

TRUE, t, true, y, yes, on, 1

FALSE, f, false, n, no, off, 0
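Many spellings map to only two values. A minimal sketch of such a mapping follows; which spellings a given system actually accepts varies, so the two sets and the name parse_boolean are assumptions for illustration.

```python
TRUE_FORMS = {"true", "t", "yes", "y", "on", "1"}
FALSE_FORMS = {"false", "f", "no", "n", "off", "0"}


def parse_boolean(text):
    """Map one of the accepted spellings to True or False (case-insensitive)."""
    form = text.strip().lower()
    if form in TRUE_FORMS:
        return True
    if form in FALSE_FORMS:
        return False
    raise ValueError(f"not a boolean: {text!r}")


print(parse_boolean("TRUE"), parse_boolean("off"))
```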


Dates and times

Date

Time

Timestamp

Duration

Dates (Gregorian calendar)

Year + Month + Day

2017 (AD) August 1st


Times

Hours + Minutes + Seconds

10 : 31 : 15.109378

Timestamps

Year + Month + Day + Hours + Minutes + Seconds

2017 (AD) August 1st 10 : 31 : 15.109378


Duration kinds

Year Month Day Hour Minute Second

Example: 2 years and 4 months

Example: 3 hours and 14 minutes


Atomic Types

Strings
Numbers
Booleans
Dates and Times
Time Intervals
Binaries
Null


Lexical space vs. value space

Lexical space: "1", "01", ...
Value space: [Figure: the value that these lexical forms denote]

Lexical space vs. value space

Lexical space: "4", "04", "100b", ...
Value space: [Figure: the value that these lexical forms denote]
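Several lexical forms can denote the same value. The sketch below maps the forms from the slides to integers; reading a trailing "b" as a binary marker is an assumption about the notation "100b", and parse_lexical is a hypothetical name.

```python
def parse_lexical(form):
    """Map a lexical form (a string) to the integer value it denotes."""
    if form.endswith("b"):
        # Assumed convention: a trailing "b" marks a binary literal.
        return int(form[:-1], 2)
    return int(form, 10)


print({form: parse_lexical(form) for form in ["1", "01", "4", "04", "100b"]})
```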

Subtypes

[Figure: the subtype's value space lies inside the supertype's value space]

Structured Types

Data Structure                      Examples
Associative Arrays (a.k.a. maps)    JSON Object, Protobuf Message, Set of XML Attributes
Ordered Lists                       JSON Array, XML Element, Protobuf repeated field


Cardinality

How many?       Common sign    Common adjective
One                            required
Zero or more    *
Zero or one     ?              optional
One or more     +
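Each sign stands for a pair of bounds on the number of occurrences. A small sketch of that reading, where the table CARDINALITY and the function satisfies are hypothetical names and the empty string stands for "no sign":

```python
# sign -> (minimum, maximum); None means "no upper bound".
CARDINALITY = {
    "": (1, 1),      # exactly one
    "*": (0, None),  # zero or more
    "?": (0, 1),     # zero or one
    "+": (1, None),  # one or more
}


def satisfies(count, sign=""):
    """Check whether count occurrences are allowed under the given sign."""
    low, high = CARDINALITY[sign]
    return count >= low and (high is None or count <= high)


for sign in ["", "*", "?", "+"]:
    print(repr(sign), [n for n in range(4) if satisfies(n, sign)])
```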

JSON Data Model

wklzzz / 123RF Stock Photo


JSON Values

Atomic values
  Strings
  Numbers
  Booleans
  Null

Structured values
  Objects: String-to-Value map
  Arrays: List of values

Recursion: the values inside objects and arrays are themselves JSON values.


Tree-based visual model

{
  "foo" : true,
  "bar" : [
    {
      "foobar" : "foo"
    },
    null
  ]
}

object
  foo: true
  bar: array
    object
      foobar: foo
    null
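The tree can be derived from the text with Python's standard json module. This is a sketch for illustration; the helper names kind and json_outline are assumptions.

```python
import json

TEXT = '{"foo" : true, "bar" : [{"foobar" : "foo"}, null]}'


def kind(value):
    """Name the JSON kind of an atomic Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before the number check
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"not an atomic JSON value: {value!r}")


def json_outline(value, depth=0):
    """Return the value as indented lines: objects, arrays, then atomics."""
    pad = "  " * depth
    if isinstance(value, dict):
        lines = [pad + "object"]
        for key, child in value.items():
            lines.append(f"{pad}  {key}:")
            lines.extend(json_outline(child, depth + 2))
        return lines
    if isinstance(value, list):
        lines = [pad + "array"]
        for child in value:
            lines.extend(json_outline(child, depth + 1))
        return lines
    return [f"{pad}{kind(value)} {json.dumps(value)}"]


tree = json_outline(json.loads(TEXT))
print("\n".join(tree))
```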

Validation

Burak Cakmak / 123RF Stock Photo


Validation: The Pipeline

Document -> Well-Formedness -> Validation
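The first stage of the pipeline can be sketched with Python's standard xml.etree.ElementTree, which rejects text that is not well-formed XML. The function name is_well_formed is an assumption; the second stage (validation against a schema) is not covered by this parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<foo>This is text.</foo>"))  # proper nesting
print(is_well_formed("<foo><bar></foo>"))          # bar is never closed
```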

On the oXygen Cheat Sheet

Validity
Well-Formedness


Validation vs. Annotation

Validation

Annotation


XML Schema

Empty Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
</xs:schema>


Simple Scenario

Schema (schema.xsd):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo" type="xs:string"/>
</xs:schema>

Instance (file.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="schema.xsd">
  This is text.
</foo>

Simple Scenario

Schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo" type="xs:integer"/>
</xs:schema>

Instance:

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="schema.xsd">
  142857
</foo>
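Python's standard library has no XML Schema validator, so the sketch below is a hand-written stand-in for what a validator would check in this scenario: the root element is foo and its content lies in the lexical space of xs:integer. The function name validate_foo_integer is hypothetical, and a real validator reads these rules from the schema instead of hard-coding them.

```python
import re
import xml.etree.ElementTree as ET

# Lexical space of xs:integer: an optional sign followed by decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def validate_foo_integer(text):
    """Return the integer value of <foo>, or raise ValueError if invalid."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as error:
        raise ValueError(f"not well-formed: {error}")
    if root.tag != "foo" or len(root) > 0:
        raise ValueError("expected a foo element without child elements")
    # Surrounding whitespace is not significant for xs:integer.
    lexical = (root.text or "").strip()
    if not INTEGER_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not an integer: {lexical!r}")
    return int(lexical)


print(validate_foo_integer("<foo> 142857 </foo>"))
```

The instance "This is text." is well-formed and valid against the xs:string schema, but this check rejects it, as the xs:integer schema would.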

Simple Types: Built-in

Strings: string, anyURI, QName
Numbers: decimal, integer, float, double, long, int, short, byte, positiveInteger, nonNegativeInteger, ..., unsignedLong, unsignedInt, ...
Booleans: boolean
Dates and Times: dateTime, time, date, gYearMonth, gMonthDay, gYear, gMonth, gDay, dateTimeStamp
Time Intervals: duration, yearMonthDuration, dayTimeDuration
Binaries: hexBinary, base64Binary
Null: -

Dates

2014-12-02 (date)

2014-12-02T10:15:00Z (dateTime)

01:15:00-08:00 (time)

Durations

P1Y2MT3H (1 year, 2 months, and 3 hours)
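The duration notation can be taken apart with a regular expression. This is a simplified sketch: the pattern and the name parse_duration are assumptions, and corner cases of the full specification may be handled differently.

```python
import re

DURATION = re.compile(
    r"(-)?P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(text):
    """Split a duration such as P1Y2MT3H into its components."""
    match = DURATION.fullmatch(text)
    # At least one component is required, and "T" must be followed by one.
    if match is None or text.endswith("T") or not any(match.groups()[1:]):
        raise ValueError(f"not a duration: {text!r}")
    sign, years, months, days, hours, minutes, seconds = match.groups()
    return {
        "negative": sign == "-",
        "years": int(years or 0),
        "months": int(months or 0),
        "days": int(days or 0),
        "hours": int(hours or 0),
        "minutes": int(minutes or 0),
        "seconds": float(seconds or 0),
    }


print(parse_duration("P1Y2MT3H"))
```

The letter M means months before the T and minutes after it, which is why the T separator is needed.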


User-defined types

Restriction
Union (not atomic)
List (not atomic)

Restriction

Schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="myFixedLengthString">
    <xs:restriction base="xs:string">
      <xs:length value="3"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="foo" type="myFixedLengthString"/>
</xs:schema>

Instance:

<?xml version="1.0" encoding="UTF-8"?>
<foo
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="schema.xsd">ZRH</foo>


List

Schema:

<xs:simpleType name="myList">
  <xs:list itemType="xs:string"/>
</xs:simpleType>

Instance:

<foo>foo bar foobar</foo>

Union

Schema:

    <xs:simpleType name="myUnion">
      <xs:union memberTypes="xs:integer xs:boolean"/>
    </xs:simpleType>

Instance:

    <foo>true</foo>
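
A union value is accepted if it is valid for at least one member type, and the member types are tried in the order in which they are listed. The Python sketch below mimics this for xs:integer and xs:boolean; the parser functions are simplified stand-ins written for this illustration, not part of any library.

```python
import re


def parse_xs_integer(lexical):
    """Accept an optional sign followed by decimal digits."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError("not an integer: " + lexical)
    return int(lexical)


def parse_xs_boolean(lexical):
    """Accept the four boolean spellings: true, false, 1, 0."""
    mapping = {"true": True, "false": False, "1": True, "0": False}
    if lexical not in mapping:
        raise ValueError("not a boolean: " + lexical)
    return mapping[lexical]


def parse_my_union(lexical):
    """Try the member types in declaration order; the first match wins."""
    text = lexical.strip()
    for parser in (parse_xs_integer, parse_xs_boolean):
        try:
            return parser(text)
        except ValueError:
            continue
    raise ValueError("valid for no member type: " + lexical)


print(parse_my_union("true"))     # True
print(parse_my_union("42"))       # 42
print(type(parse_my_union("1")))  # <class 'int'>
```

Because xs:integer comes first in memberTypes, the value 1 is read as an integer rather than as a boolean.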

Complex Types

Empty:              <foo/>

Simple content:     <foo>text</foo>

Complex content:    <foo><a/><b/></foo>

Mixed content:      <foo>Text<a/>Text<b/></foo>

Complex content

Schema:

    <xs:complexType name="complexContent">
      <xs:sequence>
        <xs:element name="twotofour" type="xs:string" minOccurs="2" maxOccurs="4"/>
        <xs:element name="zeroorone" type="xs:boolean" minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
    </xs:complexType>

Instance:

    <foo>
      <twotofour>foobar</twotofour>
      <twotofour>foobar</twotofour>
      <twotofour>foobar</twotofour>
      <zeroorone>true</zeroorone>
    </foo>
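
The occurrence constraints of the sequence above can be sketched in Python. This is a simplified check written for this illustration: it only counts child elements in order and ignores their text and types. PARTICLES and matches_sequence are hypothetical names, and the greedy matching works here because the two child names are distinct.

```python
import xml.etree.ElementTree as ET

# (child name, minOccurs, maxOccurs), in the order required by xs:sequence.
PARTICLES = [("twotofour", 2, 4), ("zeroorone", 0, 1)]


def matches_sequence(element, particles=PARTICLES):
    """Return True if the children respect the order and occurrence bounds."""
    names = [child.tag for child in element]
    position = 0
    for name, min_occurs, max_occurs in particles:
        count = 0
        while (position < len(names) and names[position] == name
               and count < max_occurs):
            position += 1
            count += 1
        if count < min_occurs:
            return False
    # Every child must have been consumed by some particle.
    return position == len(names)


valid = ET.fromstring(
    "<foo><twotofour>foobar</twotofour><twotofour>foobar</twotofour>"
    "<twotofour>foobar</twotofour><zeroorone>true</zeroorone></foo>"
)
too_few = ET.fromstring("<foo><twotofour>foobar</twotofour></foo>")

print(matches_sequence(valid))    # True
print(matches_sequence(too_few))  # False
```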

Empty content

Schema:

    <xs:complexType name="emptyType">
      <xs:sequence/>
    </xs:complexType>

Instance:

    <foo/>

Simple content

Schema:

    <xs:complexType name="dateCountry">
      <xs:simpleContent>
        <xs:extension base="xs:date">
          <xs:attribute name="country" type="xs:string"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>

Instance:

    <foo country="Switzerland">2014-12-02</foo>
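
Simple content means the element carries a simple-typed value as text and may additionally carry attributes. A rough Python sketch of reading such an element, assuming the date is written as 2014-12-02 and carries no time zone; read_date_country is a hypothetical helper name.

```python
import datetime
import xml.etree.ElementTree as ET


def read_date_country(instance_text):
    """Return the date in the element text and the value of @country."""
    element = ET.fromstring(instance_text)
    date = datetime.date.fromisoformat((element.text or "").strip())
    return date, element.get("country")


date, country = read_date_country('<foo country="Switzerland">2014-12-02</foo>')
print(date.year, date.month, date.day)  # 2014 12 2
print(country)                          # Switzerland
```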

Mixed content

Schema:

    <xs:complexType name="mixedContent" mixed="true">
      <xs:sequence>
        <xs:element name="b" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>

Instance:

    <foo>Some text and some <b>bold</b> text.</foo>
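
In mixed content, text and child elements are interleaved. As an illustration, Python's xml.etree.ElementTree exposes the text before the first child as text and the text following a child as that child's tail. The sketch below uses the instance from this slide.

```python
import xml.etree.ElementTree as ET

foo = ET.fromstring("<foo>Some text and some <b>bold</b> text.</foo>")
b = foo[0]  # the only child element

print(repr(foo.text))  # 'Some text and some '
print(repr(b.text))    # 'bold'
print(repr(b.tail))    # ' text.'

# Concatenating all text nodes in document order gives the plain sentence.
print("".join(foo.itertext()))  # Some text and some bold text.
```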

Simple type on attributes

Schema:

    <xs:complexType name="withAttribute">
      <xs:sequence/>
      <xs:attribute name="country"
                    type="xs:string"
                    default="Switzerland"/>
    </xs:complexType>

Instance:

    <foo country="Switzerland"/>
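
The default on the attribute declaration means that an instance omitting the attribute is treated as if it carried the default value. ElementTree does not read schemas, so in the sketch below the default is passed by hand; country_of is a hypothetical helper, and "France" is just an example value.

```python
import xml.etree.ElementTree as ET


def country_of(instance_text, default="Switzerland"):
    """Return @country, falling back to the default declared in the schema."""
    return ET.fromstring(instance_text).get("country", default)


print(country_of('<foo country="Switzerland"/>'))  # Switzerland
print(country_of("<foo/>"))                        # Switzerland
print(country_of('<foo country="France"/>'))       # France
```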

Named Types

Schema:

    <xs:complexType name="empty">
      <xs:sequence/>
    </xs:complexType>
    <xs:element name="c" type="empty">
    </xs:element>

Instance:

    <c/>

Anonymous Types

Schema:

    <xs:element name="c">
      <xs:complexType>
        <xs:sequence/>
      </xs:complexType>
    </xs:element>

Instance:

    <c/>

No namespaces

Schema:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema
      xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="foo" type="xs:string"/>
    </xs:schema>

Instance:

    <?xml version="1.0" encoding="UTF-8"?>
    <foo
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="schema.xsd">
      This is text.
    </foo>

With namespaces

Schema:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.example.com/bigdata"
      xmlns:big="http://www.example.com/bigdata">
      <xs:element name="foo" type="xs:string"/>
    </xs:schema>

Instance:

    <?xml version="1.0" encoding="UTF-8"?>
    <big:foo
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="
        http://www.example.com/bigdata schema.xsd"
      xmlns:big="http://www.example.com/bigdata">
      This is text.
    </big:foo>
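
What matters in the instance is the namespace the prefix is bound to, not the prefix itself. As a small illustration, Python's xml.etree.ElementTree expands every element name to the form {namespace}local-name, so two spellings of the same qualified name compare equal. The three documents below are shortened variants of the instance above, written for this sketch.

```python
import xml.etree.ElementTree as ET

with_prefix = ET.fromstring(
    '<big:foo xmlns:big="http://www.example.com/bigdata">This is text.</big:foo>'
)
default_namespace = ET.fromstring(
    '<foo xmlns="http://www.example.com/bigdata">This is text.</foo>'
)
no_namespace = ET.fromstring("<foo>This is text.</foo>")

print(with_prefix.tag)                           # {http://www.example.com/bigdata}foo
print(with_prefix.tag == default_namespace.tag)  # True
print(no_namespace.tag)                          # foo
```

The element without a namespace has a different name, which is why it matches the schema without targetNamespace but not the one with it.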

Warning: named types with namespaces

Schema:

    <xs:complexType name="empty">
      <xs:sequence/>
    </xs:complexType>
    <xs:element name="c" type="big:empty">
    </xs:element>

Instance:

    <big:c/>

@name: implicitly in the target namespace, always unprefixed

Bonus material: The Schema of Schemas

    <xs:schema
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.w3.org/2001/XMLSchema">
      <xs:element name="schema" id="schema">
        <xs:complexType>
          <xs:complexContent>
            ..
          </xs:complexContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="element" type="xs:topLevelElement" id="element"/>
      <xs:element name="simpleType" type="xs:topLevelSimpleType" id="simpleType"/>
      <xs:element name="complexType" type="xs:topLevelComplexType" id="complexType"/>
      <xs:complexType name="element" abstract="true">
        <xs:complexContent>
          ..
        </xs:complexContent>
      </xs:complexType>
    </xs:schema>

Alternate data models and validation formats

wklzzz / 123RF Stock Photo

Protocol buffers

    message Person {
      required string last_name = 1;
      repeated string first_name = 2;
      optional Title title = 3;
      optional Person boss = 4;
    }
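
To make the three field labels concrete, here is a rough Python analogue of the Person message using a dataclass. It is only a sketch of the data model, not generated Protocol Buffers code: required becomes a mandatory field, repeated becomes a list, and optional becomes a field that may be None. The slide does not define Title, so the enum and its values below are hypothetical, as are the example names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Title(Enum):
    # Hypothetical values; the slide does not show the definition of Title.
    DR = 1
    PROF = 2


@dataclass
class Person:
    last_name: str                                        # required
    first_name: List[str] = field(default_factory=list)   # repeated
    title: Optional[Title] = None                         # optional
    boss: Optional["Person"] = None                       # optional, recursive


boss = Person(last_name="Doe", first_name=["Jane"], title=Title.PROF)
employee = Person(last_name="Roe", first_name=["Richard", "Max"], boss=boss)

print(employee.boss.last_name)  # Doe
print(employee.title)           # None
```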

Avro

    fields      values
    name        ETH
    canton      ZH
    students    20,000

    {
      "type" : "record",
      "name" : "university",
      "fields" : [
        { "name" : "name", "type" : "string" },
        { "name" : "canton", "type" : "cantonal-code" },
        { "name" : "students", "type" : "long" }
      ]
    }
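
The relation between the field list and the row of values can be sketched in plain Python, without the Avro library. The check below is a simplification written for this illustration: conforms and PYTHON_TYPES are hypothetical names, and mapping the type "cantonal-code" to a Python string is an assumption, since the slide does not define that type.

```python
# The field declarations from the schema above.
FIELDS = [
    {"name": "name", "type": "string"},
    {"name": "canton", "type": "cantonal-code"},
    {"name": "students", "type": "long"},
]

# Assumed mapping from schema type names to Python types.
PYTHON_TYPES = {"string": str, "long": int, "cantonal-code": str}


def conforms(record, fields=FIELDS):
    """Return True if the record has exactly the declared fields and types."""
    if set(record) != {f["name"] for f in fields}:
        return False
    return all(
        isinstance(record[f["name"]], PYTHON_TYPES[f["type"]]) for f in fields
    )


eth = {"name": "ETH", "canton": "ZH", "students": 20000}

print(conforms(eth))                                                    # True
print(conforms({"name": "ETH", "canton": "ZH", "students": "20,000"}))  # False
print(conforms({"name": "ETH"}))                                        # False
```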